Naoki TAKEUCHI Taiki YAMAE Christopher L. AYALA Hideo SUZUKI Nobuyuki YOSHIKAWA
The adiabatic quantum-flux-parametron (AQFP) is an energy-efficient superconductor logic element based on the quantum flux parametron. AQFP circuits can operate with energy dissipation near the thermodynamic and quantum limits by maximizing the energy efficiency of adiabatic switching. We have established a design methodology for AQFP logic and developed various energy-efficient systems using AQFP logic, such as a low-power microprocessor, a reversible computer, a single-photon image sensor, and stochastic electronics. We have thus demonstrated the feasibility of applying AQFP logic widely in future information and communications technology. In this paper, we present a tutorial review of AQFP logic that provides insights into AQFP circuit technology as an introduction to this research field. We describe the historical background, operating principle, design methodology, and recent progress of AQFP logic.
Taiki YAMAE Naoki TAKEUCHI Nobuyuki YOSHIKAWA
The adiabatic quantum-flux-parametron (AQFP) is an energy-efficient superconductor logic device. In a previous study, we proposed a low-latency clocking scheme called delay-line clocking, and several low-latency AQFP logic gates have been demonstrated. In delay-line clocking, the latency between adjacent excitation phases is determined by the propagation delay of the excitation currents, so the rise time of the excitation currents must be sufficiently short; otherwise, an AQFP gate can switch before the preceding gate is fully excited. This means that delay-line clocking requires high clock frequencies, because typical excitation currents are sinusoidal and their rise time depends on the frequency. However, AQFP circuits need to be tested experimentally over a wide frequency range. Hence, in the present study, we investigate AQFP circuits adopting delay-line clocking with square excitation currents so that delay-line clocking can be applied in a low frequency range. Square excitation currents have a shorter rise time than sinusoidal excitation currents and thus enable low-frequency operation. We demonstrate an AQFP buffer chain with delay-line clocking using square excitation currents, in which the latency is approximately 20 ps per gate, and confirm that the operating margin of the buffer chain remains sufficiently wide at clock frequencies below 1 GHz, whereas in the sinusoidal case the operating margin shrinks below 500 MHz. These results indicate that AQFP circuits adopting delay-line clocking can operate in a low frequency range by using square excitation currents.
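The claim that a sinusoid's rise time depends on the clock frequency can be checked with simple arithmetic. The sketch below assumes that "rise time" means the 10%-to-90% rise of sin(2πft) from zero to its peak; this criterion is an assumption, and the paper may use a different one. It compares the result with the ~20 ps per-gate latency reported above.

```python
import math

# Assumption: "rise time" = 10%-90% rise of sin(2*pi*f*t) from zero to its peak.
def sine_rise_time_10_90(freq_hz: float) -> float:
    """Return the 10%-90% rise time (in seconds) of a sinusoid at freq_hz."""
    omega = 2.0 * math.pi * freq_hz
    return (math.asin(0.9) - math.asin(0.1)) / omega

GATE_LATENCY_S = 20e-12  # ~20 ps per gate, the latency reported above

for f_hz in (0.1e9, 0.5e9, 1e9, 5e9):
    t_r = sine_rise_time_10_90(f_hz)
    print(f"{f_hz / 1e9:4.1f} GHz: rise time {t_r * 1e12:7.1f} ps "
          f"= {t_r / GATE_LATENCY_S:5.1f} x per-gate latency")
```

Under this assumption, the rise time scales as 1/f and is about 162 ps at 1 GHz.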
Kei FUJIMOTO Masashi KANEKO Kenichi MATSUI Masayuki AKUTSU
Packet processing on commodity hardware is a cost-efficient and flexible alternative to specialized networking hardware. However, virtualizing dedicated networking hardware as a virtual machine (VM) or a container on a commodity server results in performance problems, such as longer latency and lower throughput. This paper focuses on obtaining a low-latency networking system in a VM and a container. We reveal mechanisms that cause millisecond-scale networking delays in a VM through a series of experiments. To eliminate such delays, we design and implement a low-latency networking system, kernel busy poll (KBP), which achieves three goals: (1) microsecond-scale tail delays and higher throughput than conventional solutions are achieved in a VM and a container; (2) application customization is not required, so applications can use the POSIX sockets application program interface; and (3) KBP software does not need to be developed for every Linux kernel security update. KBP can be applied to both a VM configuration and a container configuration. Evaluation results indicate that KBP achieves microsecond-scale tail delays in both a VM and a container. In the VM configuration, KBP reduces maximum round-trip latency by more than 98% and increases the throughput by up to three times compared with existing NAPI and Open vSwitch with the Data Plane Development Kit (OvS-DPDK). In the container configuration, KBP reduces maximum round-trip latency by 21% to 96% and increases the throughput by up to 1.28 times compared with NAPI.
This paper evaluates Bluetooth Low Energy (BLE) positioning systems that use sparse training data through comparison experiments. The sparse training data are extracted from a database containing enough data to realize highly accurate and precise positioning. First, we define sparse training data for BLE positioning systems in terms of the data collection time and the numbers of smartphones, directions, beacons, and reference points. Next, positioning performance evaluation experiments are conducted in two indoor environments: an indoor corridor as a one-dimensionally spread environment and a hall as a two-dimensionally spread environment. The compared algorithms are the conventional fingerprint algorithm and the hybrid algorithm previously proposed by the authors, which combines the proximity algorithm and the fingerprint algorithm. Based on the results, we confirm that the hybrid algorithm performs well in many cases even when using sparse training data. Consequently, the robustness of the authors' hybrid algorithm to sparse training data is shown.
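To make the two compared approaches concrete, here is a minimal sketch of a proximity/fingerprint hybrid. The abstract does not say how the authors combine the two, so the rule used here is a hypothetical illustration, not the authors' algorithm: if the strongest beacon exceeds a threshold, the phone is placed at that beacon (proximity); otherwise it falls back to k-nearest-neighbour fingerprinting. The function names, the `-60 dBm` threshold, and the database layout are all assumptions.

```python
import math

def fingerprint_knn(rssi, database, k=3):
    """Fingerprint positioning: average the positions of the k reference points
    whose stored RSSI vectors are closest (Euclidean) to the observed vector.
    database: list of (rssi_tuple, (x, y)) entries (hypothetical layout)."""
    ranked = sorted(database, key=lambda ref: math.dist(rssi, ref[0]))
    nearest = ranked[:k]
    x = sum(pos[0] for _, pos in nearest) / len(nearest)
    y = sum(pos[1] for _, pos in nearest) / len(nearest)
    return (x, y)

def hybrid_position(rssi, database, beacon_positions, threshold_dbm=-60.0, k=3):
    """Hypothetical hybrid rule: proximity when one beacon is received strongly,
    fingerprint otherwise."""
    strongest = max(range(len(rssi)), key=lambda i: rssi[i])
    if rssi[strongest] >= threshold_dbm:
        return beacon_positions[strongest]          # proximity estimate
    return fingerprint_knn(rssi, database, k)       # fingerprint fallback

beacons = [(0.0, 0.0), (10.0, 0.0)]
db = [((-50.0, -80.0), (1.0, 0.0)),
      ((-65.0, -65.0), (5.0, 0.0)),
      ((-80.0, -50.0), (9.0, 0.0))]
print(hybrid_position((-55.0, -85.0), db, beacons))       # strong beacon 0
print(hybrid_position((-66.0, -64.0), db, beacons, k=1))  # fingerprint fallback
```

Because the proximity branch needs only beacon positions, a rule of this kind depends less on a dense fingerprint database. That is one plausible reason a hybrid could tolerate sparse training data, though the paper's own explanation may differ.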
Shohei KAMAMURA Yuhei HAYASHI Yuki MIYOSHI Takeaki NISHIOKA Chiharu MORIOKA Hiroyuki OHNISHI
This paper proposes a fast and scalable traffic monitoring system called Fast xFlow Proxy. For efficiently provisioning and operating networks, xFlow technologies such as IPFIX and NetFlow are promising for visualizing the detailed traffic matrix of a network. However, Internet Protocol (IP) packets in a large carrier network are encapsulated with various outer headers, e.g., layer 2 tunneling protocol (L2TP) headers or multi-protocol label switching (MPLS) labels. Because native xFlow technologies are applied to the outer header, the desired inner information cannot be visualized. Motivated by this, we propose Fast xFlow Proxy, which inspects the carrier's complicated packets, properly extracts the inner information, and relays it to a general flow collector. Fast xFlow Proxy must perform various packet processing operations (e.g., header analysis, header elimination, and statistics) at wire rate. To realize the required processing speed, we implement Fast xFlow Proxy using the Data Plane Development Kit (DPDK) and a field-programmable gate array (FPGA). By optimizing the deployment of processes between DPDK and the FPGA, Fast xFlow Proxy achieves wire-rate processing. Our evaluations show that it achieves over 20 Gbps with a single server and 100 Gbps with a scale-out architecture. We also show that this performance is sufficiently practical for monitoring a nationwide carrier network.
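The core step of looking past an outer header can be illustrated in software for the MPLS case. The sketch below walks an MPLS label stack, whose 4-byte entries carry a 20-bit label and a bottom-of-stack bit as in RFC 3032, and then reads the inner IPv4 addresses. It is a plain-Python illustration, not the paper's DPDK/FPGA implementation. It assumes an untagged Ethernet frame and IPv4 under the label stack, and it does not handle L2TP. The function name is hypothetical.

```python
import ipaddress
import struct

ETH_P_IPV4 = 0x0800
ETH_P_MPLS_UC = 0x8847

def inner_ipv4_flow(frame: bytes):
    """Return (mpls_labels, src, dst, proto) of the inner IPv4 header, or None.
    Assumes an untagged Ethernet frame; MPLS payload is assumed to be IPv4."""
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    offset, labels = 14, []
    if ethertype == ETH_P_MPLS_UC:
        while True:
            (entry,) = struct.unpack_from("!I", frame, offset)
            offset += 4
            labels.append(entry >> 12)      # top 20 bits: label value
            if entry & 0x100:               # S bit: bottom of stack reached
                break
        if frame[offset] >> 4 != 4:         # MPLS has no type field; check IP version
            return None
    elif ethertype != ETH_P_IPV4:
        return None
    proto = frame[offset + 9]
    src = ipaddress.IPv4Address(frame[offset + 12:offset + 16])
    dst = ipaddress.IPv4Address(frame[offset + 16:offset + 20])
    return labels, str(src), str(dst), proto

eth = bytes(12) + struct.pack("!H", ETH_P_MPLS_UC)                  # dummy MACs
mpls = struct.pack("!II", (100 << 12) | 64, (200 << 12) | 0x100 | 64)
ipv4 = (bytes([0x45, 0]) + struct.pack("!H", 20) + bytes(4) + bytes([64, 6, 0, 0])
        + ipaddress.IPv4Address("10.0.0.1").packed
        + ipaddress.IPv4Address("192.168.1.2").packed)
print(inner_ipv4_flow(eth + mpls + ipv4))
```

A proxy would export the returned inner fields, instead of the outer header, as flow keys to the collector.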
Tingting HU Ryuji FUCHIKAMI Takeshi IKENAGA
High-frame-rate, ultra-low-delay vision systems, which can read and process a 1000 fps sequence within 1 ms/frame, are drawing increasing attention in robotics, where immediate feedback from the image processing core is required. Meanwhile, tracking plays an important role in many computer vision applications. Among various tracking algorithms, Lucas-Kanade (LK)-based template tracking, which tracks targets with sub-pixel accuracy, is key to robotic applications such as factory automation (FA). However, the substantial spatial iterative processing and complex computation in the LK algorithm make it difficult to achieve high-frame-rate, ultra-low-delay tracking with limited resources. Aiming at an LK-based template tracking system that reads and processes 1000 fps sequences within 1 ms/frame at a small resource cost, this paper proposes: 1) high-temporal-resolution-based temporal iterative tracking, which maps the spatial iterations into the temporal domain and thereby efficiently reduces the resource cost and delay caused by spatial iterative processing; and 2) label-scanner-based multi-stream spatial processing, which maps the local spatial processing onto the labeled input pixel stream and aggregates the results with a label scanner, enabling the local spatial processing of the LK algorithm to be implemented at a small resource cost. Algorithm evaluation shows that the proposed temporal iterative tracking performs dynamic tracking: it tracks an object with coarse accuracy while the object moves fast and achieves higher accuracy as it slows down. Hardware evaluation shows that the proposed label-scanner-based multi-stream architecture allows the system to be implemented on an FPGA (ZCU102) using less than 20% of its resources, and that the designed tracking system can read and process a 1000 fps sequence within 1 ms/frame.
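The idea of mapping spatial iterations into the temporal domain can be sketched in one dimension. Instead of iterating Lucas-Kanade (Gauss-Newton) updates to convergence on each frame, the tracker applies a single update per frame and warm-starts from the previous estimate. This is a simplified, translation-only illustration under stated assumptions (1-D signals, linear interpolation, NumPy). It is not the paper's hardware architecture, and the function names are hypothetical.

```python
import numpy as np

def lk_step(template, frame, xs, p):
    """One Gauss-Newton (Lucas-Kanade) update of a 1-D shift p so that
    frame(x + p) matches template(x)."""
    warped = np.interp(xs + p, xs, frame)   # frame sampled at x + p
    grad = np.gradient(warped, xs)          # spatial derivative of the warped frame
    error = template - warped
    return p + np.sum(grad * error) / np.sum(grad * grad)

def track_temporal(template, frames, xs, p0=0.0):
    """Temporal iteration: exactly one LK step per frame, warm-started from the
    previous frame's estimate instead of iterating to convergence."""
    p, estimates = p0, []
    for frame in frames:
        p = lk_step(template, frame, xs, p)
        estimates.append(p)
    return estimates

xs = np.linspace(-10.0, 10.0, 401)             # grid spacing 0.05
template = np.exp(-xs ** 2 / 2)
frames = [np.exp(-(xs - 1.5) ** 2 / 2)] * 20   # target resting at x = 1.5
est = track_temporal(template, frames, xs)
print([round(float(v), 3) for v in est[:4]])
```

At a high frame rate, the object moves little between frames, so successive single steps behave like the inner iterations of a conventional tracker. The first estimates are coarse and then refine while the target is slow or stationary.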
Wassapon WATANAKEESUNTORN Keichi TAKAHASHI Chawanat NAKASAN Kohei ICHIKAWA Hajimu IIDA
OpenFlow is a widely adopted implementation of the Software-Defined Networking (SDN) architecture. Since conventional network monitoring systems cannot cope with OpenFlow networks, researchers have developed various monitoring systems tailored to OpenFlow networks. However, these existing systems rely on either a specific controller framework or an API, neither of which is part of the OpenFlow specification, which limits their applicability. This article proposes a transparent, low-overhead monitoring system for OpenFlow networks, referred to as Opimon. Opimon monitors the network topology, switch statistics, and flow tables of an OpenFlow network and visualizes the results through a web interface in real time. Opimon monitors a network by interposing a proxy between the controller and the switches and intercepting every OpenFlow message exchanged. This design makes Opimon compatible with any OpenFlow switch or controller. We tested the functionality of Opimon on a virtual network built using Mininet and on a large-scale international OpenFlow testbed (PRAGMA-ENT). Furthermore, we measured the performance overhead incurred by Opimon and demonstrated that the overhead in latency and throughput was less than 3% and 5%, respectively.
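A proxy that intercepts every OpenFlow message must first split the TCP byte stream into messages. Every OpenFlow message begins with an 8-byte header (version, type, length, xid), and the length field covers the whole message. The sketch below shows only that framing step for one direction of a connection. The class name and the tuple it returns are hypothetical, and socket handling and forwarding are omitted.

```python
import struct

OFP_HEADER = struct.Struct("!BBHI")   # version, type, length, xid (8 bytes)

class OpenFlowFramer:
    """Accumulates bytes from one direction of a controller-switch connection
    and returns complete OpenFlow messages as (version, type, xid, body)."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, data: bytes):
        self._buf += data
        messages = []
        while len(self._buf) >= OFP_HEADER.size:
            version, msg_type, length, xid = OFP_HEADER.unpack_from(self._buf, 0)
            if length < OFP_HEADER.size:
                raise ValueError("malformed OpenFlow header")
            if len(self._buf) < length:
                break                       # wait for the rest of this message
            body = bytes(self._buf[OFP_HEADER.size:length])
            del self._buf[:length]
            messages.append((version, msg_type, xid, body))
        return messages

framer = OpenFlowFramer()
hello = OFP_HEADER.pack(0x04, 0, 8, 1)               # OFPT_HELLO, OpenFlow 1.3
echo = OFP_HEADER.pack(0x04, 2, 12, 2) + b"ping"     # OFPT_ECHO_REQUEST with payload
stream = hello + echo
print(framer.feed(stream[:5]))    # incomplete header -> []
print(framer.feed(stream[5:]))
```

In a transparent monitor, each complete message would be forwarded unchanged to the other side and a copy handed to the statistics and topology logic. This forwarding-plus-copy path is where the small latency overhead would come from.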
Guohua LIU Huabang ZHONG Cantianci GUO Zhiqun CHENG
This paper proposes a methodology for designing broadband class-B/J power amplifiers based on a mirrored lowpass filter matching structure. According to filter theory, the impedance in this design method depends mainly on the cutoff frequency. Series inductors and shunt capacitors filter out high frequencies, and the input impedance varies little with frequency in the passband, which suppresses higher harmonics and extends the bandwidth. To confirm the validity of the design method, a broadband high-efficiency power amplifier for the 1.3 to 3.9 GHz band is designed and fabricated. Measurement results show an output power greater than 40.5 dBm, a drain efficiency of 61.2% to 70.8%, and a gain greater than 10 dB.
Yuki IMAI Shinichi NISHIZAWA Kazuhito ITO
Energy-harvesting devices such as solar cells are used as power sources for IoT devices. Because such power sources have a large internal resistance, LSIs in IoT devices may malfunction: when an LSI operates at high speed, a large current flows and the supply voltage drops. In this paper, we propose a standard cell library of stacked-structure cells that increases the delay of logic circuits, within the range that does not exceed the clock cycle, thereby reducing the maximum current of the LSIs. We show that the maximum power consumption of LSIs can be reduced without increasing their energy consumption.
Limengnan ZHOU Qian KONG Hongyu HAN Xing LIU Hanzhou WU
Frequency-hopping sequence (FHS) sets with a low-hit zone (LHZ) are well suited to quasi-synchronous (QS) frequency-hopping multiple-access (FHMA) systems, where they reduce the mutual interference among users. Moreover, LHZ-FHS sets with the wide-gap (WG) property can effectively resist broadband blocking interference, single-frequency narrowband interference, multipath fading, and tracking interference. In this letter, a new family of WG-LHZ-FHS sets is constructed. In addition, these new WG-LHZ-FHS sets have optimal average periodic Hamming correlation (APHC) properties.
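The two properties named above can be checked numerically for a candidate set. The periodic Hamming correlation counts the "hits" (equal frequencies) between a sequence and a cyclic shift of another sequence. A low-hit zone bounds this count for small shifts. The wide-gap property requires consecutive hops to be at least some distance apart. The sketch below is a brute-force checker for toy sequences, not the letter's construction. It assumes the gap condition also applies cyclically across the sequence boundary, which is one common convention. APHC is not computed here.

```python
def hamming_corr(x, y, tau):
    """Periodic Hamming correlation H_{x,y}(tau): number of positions i where
    x[i] equals y[(i + tau) mod L], i.e. the number of frequency hits."""
    L = len(x)
    return sum(1 for i in range(L) if x[i] == y[(i + tau) % L])

def max_hits_in_zone(seqs, zone):
    """Largest auto-/cross-correlation over shifts |tau| <= zone, excluding
    the trivial in-phase autocorrelation (same sequence, tau = 0)."""
    worst = 0
    for a, x in enumerate(seqs):
        for b, y in enumerate(seqs):
            for tau in range(-zone, zone + 1):
                if a == b and tau == 0:
                    continue
                worst = max(worst, hamming_corr(x, y, tau))
    return worst

def min_gap(seq):
    """Smallest frequency jump between consecutive hops (cyclically); the
    wide-gap property requires this to be at least some minimum distance d."""
    L = len(seq)
    return min(abs(seq[(i + 1) % L] - seq[i]) for i in range(L))

x = [0, 2, 4, 1, 3]
y = [2, 4, 1, 3, 0]          # x cyclically shifted by one hop
print(min_gap(x), max_hits_in_zone([x], 2), max_hits_in_zone([x, y], 1))
```

Here x alone has no hits for any non-zero shift and a minimum gap of 2. The pair [x, y] collides on all five hops at a one-hop offset, so it would be a poor LHZ set even with a zone of width 1.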
Abbas JAMALIPOUR Forough SHIRIN ABKENAR
In this paper, we propose a novel Hybrid-Hierarchical spatial-aerial-Terrestrial Edge-Centric (H2TEC) architecture for space-air integrated Internet of Things (IoT) networks. H2TEC comprises unmanned aerial vehicles (UAVs) that act as mobile fog nodes, providing the required services to terminal nodes (TNs) in cooperation with satellites. TNs in H2TEC offload their generated tasks to the UAVs for further processing. Because of the limited energy budget of TNs, a novel task allocation protocol, named TOP, is proposed to minimize the energy consumption of TNs while guaranteeing the outage probability and network reliability; to this end, the transmission rate of TNs is optimized. TOP also exploits energy harvesting, whereby low earth orbit satellites transfer energy to the UAVs when the remaining energy of the UAVs falls below a predefined threshold. The harvested power of the UAVs is optimized jointly with the corresponding harvesting time so that the UAVs can improve the network throughput by processing more bits. Numerical results reveal that TOP outperforms the baseline method in critical situations in which more power is required to process the tasks. It is also found that, even in such situations, the energy harvesting mechanism in TOP yields higher network throughput.
Takanori HARA Masahiro SASABE Shoji KASAHARA
Traffic congestion in road networks has been studied as a congestion game in game theory. In existing work, each agent's road usage was assumed to be static during the whole time horizon of the agent's travel, as in the classical congestion game. This assumption, however, should be reconsidered because each agent uses the roads composing its route sequentially. In this paper, we propose a multi-agent distributed route selection scheme for vehicular networks based on a gradient descent method that considers the time dependency among agents' road usage. The proposed scheme first estimates the time-dependent flow on each road by considering the agents' probabilistic occupation under the first-in-first-out (FIFO) policy. It then calculates the optimal route choice probability of each route candidate using the gradient descent method and the estimated time-dependent flow. Finally, each agent selects one route according to the optimal route choice probabilities. We first prove that the proposed scheme converges exponentially to the steady state, with a convergence rate inversely proportional to the product of the number of agents and the number of individual route candidates. Through simulations on a grid-like network and a real road network, we show that the proposed scheme improves the actual travel time by 5.1% and 2.5%, respectively, compared with the conventional static-flow-based approach. In addition, we demonstrate that the proposed scheme is robust against incomplete information sharing among agents, which could be caused by a low penetration ratio or the limited transmission range of wireless communications.
In this letter, we study low-density parity-check (LDPC) codes for noisy channels with insertion and deletion (ID) errors. We first propose a design method for irregular LDPC codes for such channels, which can simultaneously yield degree distributions for different noise levels. We then show the asymptotic and finite-length decoding performance of the designed codes and compare it with the symmetric information rates of cascaded ID-noisy channels. Moreover, we examine the relationship between decoding performance and the code structure of irregular LDPC codes.
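A cascaded ID-noisy channel can be simulated in a few lines, which helps when reproducing finite-length experiments of this kind. The sketch assumes a Davey-MacKay-style ID model followed by BPSK over AWGN. For each input bit, a random bit is inserted with probability `p_ins` (and the same bit is considered again), the bit is deleted with probability `p_del`, or it is transmitted otherwise. The letter's exact channel parameters and model may differ, and the function name is hypothetical.

```python
import numpy as np

def id_awgn_channel(bits, p_ins, p_del, sigma, rng):
    """Davey-MacKay-style insertion/deletion channel followed by BPSK over AWGN
    (assumed model). Returns the noisy received samples; the output length
    generally differs from len(bits)."""
    out = []
    for b in bits:
        while True:
            u = rng.random()
            if u < p_ins:
                out.append(int(rng.integers(2)))   # insert a random bit, retry b
            elif u < p_ins + p_del:
                break                              # delete b
            else:
                out.append(int(b))                 # transmit b
                break
    x = 1.0 - 2.0 * np.array(out, dtype=float)     # BPSK: 0 -> +1, 1 -> -1
    return x + sigma * rng.standard_normal(x.shape)

rng = np.random.default_rng(0)
sent = rng.integers(2, size=20)
received = id_awgn_channel(sent, p_ins=0.05, p_del=0.05, sigma=0.5, rng=rng)
print(len(sent), len(received))   # lengths differ when insertions/deletions occur
```

The loss of synchronization visible in the changed length is what separates these channels from a plain AWGN channel. It is also why decoders for them typically need a synchronization stage in addition to LDPC decoding.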
Riku AKEMA Masao YAMAGISHI Isao YAMADA
The Canonical Polyadic Decomposition (CPD) is the tensor analog of the Singular Value Decomposition (SVD) of a matrix and has many data science applications, including signal processing and machine learning. The Alternating Least Squares (ALS) algorithm has been used extensively to compute the CPD. Although the ALS algorithm is simple, it is sensitive to noise in the data tensor in these applications. In this paper, we propose a novel strategy for noise suppression in the CPD. The proposed strategy consists of two steps: (Step 1) denoising the given tensor and (Step 2) computing the exact CPD of the denoised tensor. Step 1 can be realized by solving a structured low-rank approximation with the Douglas-Rachford splitting algorithm, and Step 2 can be realized by solving the simultaneous diagonalization of a matrix tuple constructed from the denoised tensor with the DODO method. Numerical experiments show that the proposed algorithm works well even in typical cases where the ALS algorithm suffers from the so-called bottleneck/swamp effect.
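For reference, here is a minimal sketch of the ALS baseline that the proposed method is compared against. It is not the proposed Douglas-Rachford/DODO method. Each factor matrix of a 3-way CPD is updated in turn by solving a linear least-squares problem against the corresponding unfolding of the tensor, with the other two factors fixed. The function names, iteration count, and random initialization are assumptions.

```python
import numpy as np

def khatri_rao(b, c):
    """Column-wise Kronecker product: (J x R), (K x R) -> (J*K x R),
    with row index j*K + k holding b[j, r] * c[k, r]."""
    return np.einsum("jr,kr->jkr", b, c).reshape(-1, b.shape[1])

def cp_reconstruct(a, b, c):
    """Tensor sum_r a_r (outer) b_r (outer) c_r."""
    return np.einsum("ir,jr,kr->ijk", a, b, c)

def cp_als(x, rank, iters=500, seed=0):
    """Plain ALS for a rank-R CPD of a 3-way tensor x."""
    rng = np.random.default_rng(seed)
    i, j, k = x.shape
    a, b, c = (rng.standard_normal((n, rank)) for n in (i, j, k))
    x1 = x.reshape(i, j * k)                      # mode-1 unfolding, columns (j, k)
    x2 = x.transpose(1, 0, 2).reshape(j, i * k)   # mode-2 unfolding, columns (i, k)
    x3 = x.transpose(2, 0, 1).reshape(k, i * j)   # mode-3 unfolding, columns (i, j)
    for _ in range(iters):
        a = x1 @ np.linalg.pinv(khatri_rao(b, c).T)   # least squares for a
        b = x2 @ np.linalg.pinv(khatri_rao(a, c).T)   # least squares for b
        c = x3 @ np.linalg.pinv(khatri_rao(a, b).T)   # least squares for c
    return a, b, c

rng = np.random.default_rng(1)
A, B, C = (rng.standard_normal((n, 2)) for n in (5, 4, 3))
X = cp_reconstruct(A, B, C)                       # noiseless rank-2 tensor
a, b, c = cp_als(X, rank=2)
rel_err = np.linalg.norm(X - cp_reconstruct(a, b, c)) / np.linalg.norm(X)
print(f"relative error: {rel_err:.2e}")
```

On a noiseless, well-conditioned tensor like this one, ALS usually converges quickly. Adding noise, or choosing nearly collinear factors, is how the slow "swamp" behaviour described above is typically provoked.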
Syful ISLAM Dong WANG Raula GAIKOVINA KULA Takashi ISHIO Kenichi MATSUMOTO
Third-party package usage has become a common practice in contemporary software development. Developers face various challenges during development, including choosing the right libraries, installation errors, discrepancies, environment setup, and build failures. The risks of maintaining a third-party package are well known, but it is unclear how useful the information on Stack Overflow (SO) is. This paper presents an empirical study exploring npm package co-usage examples on SO. From over 30,000 SO question posts, we extracted 2,100 posts with package usage information and matched them against 217,934 npm library packages. We find that popular and highly used libraries are not discussed as often on SO. However, the accepted answers may prove useful, as we believe that their usage examples and executable commands could be reused for tool support.
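Reusing the executable commands found in posts starts with extracting which packages are installed together. The sketch below pulls package groups from `npm install` / `npm i` lines in post text, skips flags, and strips version or tag suffixes while keeping scoped names such as `@types/node`. It is a hypothetical illustration, not the paper's extraction pipeline. The regular expression covers only the common command forms, and `shlex.split` raises `ValueError` on unbalanced quotes.

```python
import re
import shlex
from typing import List

# Lines such as "npm install express" or "$ npm i -D @types/node@14".
NPM_INSTALL = re.compile(r"^[ \t]*(?:\$[ \t]*)?npm[ \t]+(?:install|i)\b(.*)$", re.MULTILINE)

def co_used_packages(text: str) -> List[List[str]]:
    """For each `npm install` line in a post, return the package names it installs together."""
    groups = []
    for match in NPM_INSTALL.finditer(text):
        names = []
        for token in shlex.split(match.group(1)):
            if token.startswith("-"):
                continue                     # flags such as --save or -D
            at = token.rfind("@")
            # drop a version/tag suffix but keep a leading scope ("@types/node@14")
            names.append(token[:at] if at > 0 else token)
        if names:
            groups.append(names)
    return groups

post = ("First run:\n"
        "$ npm install --save express body-parser@1.19.0\n"
        "then npm start\n"
        "npm i -D @types/node@14\n")
print(co_used_packages(post))
```

The extracted names could then be matched against the npm registry, as the study does with its 217,934 packages, to discard tokens that are not real packages.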
This paper presents an X-band power-combined pulsed high power amplifier (HPA) based on a low-insertion-loss waveguide combiner. The relationship between the return loss and the isolation of the magic Tee (MT) is analyzed, and an accurate design technique is given. The combining network is validated by measurements of a single MT and a four-way passive network, and the combined HPA module is designed, fabricated, characterized, and discussed. The HPA delivers 200 W of output power with a power-added efficiency close to 40% over the frequency range of 7.8 GHz to 12.3 GHz. The combining efficiency is higher than 93%.
Hongjie XU Jun SHIOMI Hidetoshi ONODERA
Hardware accelerators are designed to support specialized processing dataflows for ever-changing deep neural networks (DNNs) under various processing environments. This paper introduces two hardware properties that describe the cost of data movement at each level of the memory hierarchy. Based on these hardware properties, this paper proposes a set of evaluation metrics that can evaluate the number of memory accesses and the required memory capacity for a given specialized processing dataflow. The proposed metrics can analytically predict the energy, throughput, and area of a hardware design without a detailed implementation. Once a processing dataflow and the hardware resource constraints are determined, the proposed evaluation metrics quickly quantify the expected hardware benefits, thereby reducing design time.
Xianghong HU Hongmin HUANG Xin ZHENG Yuan LIU Xiaoming XIONG
Elliptic curve cryptography (ECC), a form of asymmetric cryptography, is widely used in practical security applications, especially in Internet of Things (IoT) applications. This paper presents a low-power reconfigurable architecture for ECC that resists simple power analysis (SPA) attacks and can be configured to support all point operations and modular operations on 160/192/224/256-bit field orders over GF(p). Point multiplication (PM) is the most complex and time-consuming operation of ECC, while modular multiplication (MM) and modular division (MD) have high computational complexity among the modular operations. To decrease power dissipation and increase reconfigurability, a Reconfigurable Modular Multiplication Algorithm and a Reconfigurable Modular Division Algorithm are proposed, and MM and MD are implemented with two adder units. Combined with optimized operation scheduling for PM, the proposed architecture, implemented on a 55 nm CMOS ASIC platform, takes 0.96, 1.37, 1.87, and 2.44 ms and consumes 8.29, 11.86, 16.20, and 21.13 μJ to perform one PM on 160-bit, 192-bit, 224-bit, and 256-bit field orders, respectively. It occupies a 56.03k-gate area and consumes 8.66 mW. The implementation results demonstrate that the proposed architecture outperforms other contemporary designs reported in the literature in terms of area and configurability.
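A standard way to make point multiplication resistant to SPA is the Montgomery ladder. It performs exactly one point addition and one point doubling per scalar bit, so the operation sequence does not reveal the bits. The abstract does not state which PM algorithm the architecture uses, so this is a generic sketch, not the paper's design. It uses a toy curve y^2 = x^3 + 2x + 2 over GF(17) with base point (5, 1) of order 19, affine coordinates, and modular division via `pow(..., -1, p)`. The Python code itself is not constant-time and is for illustration only.

```python
def ec_add(P, Q, a, p):
    """Add points on y^2 = x^3 + a*x + b over GF(p); None is the point at infinity."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                                        # P + (-P) = O
    if P == Q:                                             # point doubling
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p   # modular division
    else:                                                  # point addition
        lam = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def ladder(k, P, a, p):
    """Montgomery ladder: one add and one double per bit, keeping R1 = R0 + P."""
    R0, R1 = None, P
    for bit in bin(k)[2:]:
        if bit == "1":
            R0, R1 = ec_add(R0, R1, a, p), ec_add(R1, R1, a, p)
        else:
            R0, R1 = ec_add(R0, R0, a, p), ec_add(R0, R1, a, p)
    return R0

def double_and_add(k, P, a, p):
    """Reference (non-SPA-resistant) left-to-right double-and-add."""
    R = None
    for bit in bin(k)[2:]:
        R = ec_add(R, R, a, p)
        if bit == "1":
            R = ec_add(R, P, a, p)
    return R

G, A_COEF, PRIME = (5, 1), 2, 17
print(ladder(2, G, A_COEF, PRIME), ladder(19, G, A_COEF, PRIME))  # 2G and 19G = O
```

Double-and-add skips the addition for zero bits, which is exactly the data-dependent pattern that SPA exploits. The ladder's uniform add-and-double per bit is why it is a common choice for SPA-resistant hardware.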
Hideya SO Kazuhiko FUKAWA Hayato SOYA Yuyuan CHANG
In unlicensed spectrum, wireless communications employing carrier sense multiple access with collision avoidance (CSMA/CA) suffer from longer transmission delay as the number of user terminals (UTs) increases, because packet collisions become more likely. To cope with this problem, this paper proposes a new multiuser detection (MUD) scheme that uses both request-to-send (RTS) and enhanced clear-to-send (eCTS) for highly reliable, low-latency wireless communications. In a conventional MUD scheme, metric-combining MUD (MC-MUD), log likelihood functions called metrics are calculated and accumulated for maximum likelihood detection (MLD). To avoid increasing the number of states for MLD, MC-MUD forces the relevant UTs to retransmit their packets until all the collided packets are correctly detected, which requires a kind of central control and reduces the system throughput. To overcome these drawbacks, the proposed scheme, referred to as cancelling MC-MUD (CMC-MUD), removes replicas of collided packets from the received signals once those packets are correctly detected during retransmission. This cancellation allows new UTs to transmit their packets and MLD to be performed without increasing the number of states, which improves the system throughput without increasing the complexity. In addition, the proposed scheme adopts RTS and eCTS. A UT that suffers from a packet collision transmits an RTS before retransmission. The corresponding access point (AP) then transmits an eCTS containing the addresses of the other UTs that experienced the same packet collision. To reproduce the same packet collision, these other UTs transmit their packets once they receive the eCTS.
Computer simulations with a single AP evaluate the range of average carrier-to-interference ratio (CIR) in which the proposed scheme is effective and clarify that the transmission delay of the proposed scheme is shorter than that of the conventional schemes. In environments with two APs, which can cause the hidden terminal problem, the proposed scheme is shown to achieve shorter transmission delays than the conventional scheme with RTS and conventional CTS.
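The two mechanisms described above, accumulating metrics over retransmissions and cancelling replicas of already-detected packets, can be illustrated with a toy model. It assumes one BPSK symbol per packet, one received sample per (re)transmission, and known real channel gains. Squared Euclidean distance stands in for the negative log-likelihood metric, which is proportional under Gaussian noise. These are simplifying assumptions, not the paper's system model, and the function names are hypothetical.

```python
import itertools
import numpy as np

def mc_mld(received, channels, constellation=(-1.0, 1.0)):
    """Metric-combining ML detection (toy model).
    received: shape (N,), one sample per (re)transmission of the collision;
    channels: shape (N, U), gain of each of the U colliding users per transmission.
    Each hypothesis's squared-distance metric is summed over all N transmissions
    and the hypothesis with the smallest total is returned."""
    best, best_metric = None, np.inf
    for hyp in itertools.product(constellation, repeat=channels.shape[1]):
        metric = np.sum(np.abs(received - channels @ np.array(hyp)) ** 2)
        if metric < best_metric:
            best, best_metric = hyp, metric
    return best

def cancel(received, known_channels, known_symbols):
    """Subtract replicas of packets that have already been correctly detected."""
    return received - known_channels @ np.asarray(known_symbols)

channels = np.array([[1.0, 1.0],     # first transmission of the collision
                     [1.0, -1.0]])   # retransmission with different gains
r = channels @ np.array([1.0, -1.0])  # noise-free for clarity
print(mc_mld(r[:1], channels[:1]))    # one transmission: (1, -1) and (-1, 1) tie
print(mc_mld(r, channels))            # combined metrics resolve the ambiguity
r_clean = cancel(r, channels[:, :1], [1.0])   # user 1 detected: remove its replica
print(mc_mld(r_clean, channels[:, 1:]))       # only user 2's symbol remains unknown
```

After cancellation, the hypothesis space shrinks to the users still undetected. This is what leaves room for a new UT to join the collision without enlarging the MLD state space.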
Cheng-Chung KUO Ding-Kai TSENG Chun-Wei TSAI Chu-Sing YANG
The development of efficient mechanisms for detecting malicious network traffic has been a critical research topic in network security in recent years. This study implemented a machine-learning-based intrusion detection system (IDS) that periodically converts and analyzes real network traffic in a campus environment in near real time. This study focuses on improving the detection rate of an IDS and on detecting more attacks on non-well-known ports than a traditional rule-based system can. Four new features are used to increase the discriminant accuracy. In addition, a data-balancing algorithm was used to construct the training data set, which also enables the learning model to reflect situations in real environments more accurately.
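The abstract does not name its data-balancing algorithm. As one common example of the idea, random oversampling duplicates randomly chosen samples of the minority classes until every class is as large as the largest one. The sketch below illustrates that generic technique only. It is not necessarily the method used in the study, and the function name and labels are hypothetical.

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until every class
    has as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y

X = [[0], [1], [2], [3], [4]]                 # toy feature vectors
Y = ["benign"] * 4 + ["attack"]               # heavily imbalanced labels
X_bal, Y_bal = random_oversample(X, Y)
print(Counter(Y_bal))
```

Oversampling only repeats existing minority samples, so it can encourage overfitting to them. Undersampling the majority class or synthesizing new minority samples are common alternatives.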