Akihito HIRAI Kazutomi MORI Masaomi TSURU Mitsuhiro SHIMOZAWA
This paper demonstrates that a 360° radio-frequency phase detector consisting of a combination of symmetrical mixers and 45° phase shifters with tunable devices can achieve a low phase-detection error over a wide frequency range. It is shown that the phase-detection error does not depend on the voltage gain of the 45° phase shifter. This allows tunable devices to be used as 45° phase shifters over a wide frequency range with low phase-detection errors. The fabricated phase detector, which employs tunable low-pass filters as the tunable devices, demonstrates phase-detection errors lower than 2.0° rms over the frequency range from 3.0 GHz to 10.5 GHz.
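As a generic illustration of the underlying principle (not the authors' circuit, which uses symmetrical mixers and 45° shifters), full 360° phase detection can be understood from two mixer outputs proportional to the cosine and sine of the phase difference; the sketch below recovers the phase with atan2, assuming ideal mixers and an exact quadrature offset between the two paths.

```python
import numpy as np

def detect_phase(v_i, v_q):
    """Recover a phase difference in [0, 360) deg from two DC mixer outputs.

    v_i, v_q: mixer outputs assumed proportional to cos(theta) and
    sin(theta) of the phase difference between the two RF inputs.
    """
    theta = np.degrees(np.arctan2(v_q, v_i))  # (-180, 180]
    return theta % 360.0                      # map to [0, 360)

# Example: a 250 deg phase difference with ideal unit-amplitude mixers
theta_true = np.radians(250.0)
print(detect_phase(np.cos(theta_true), np.sin(theta_true)))  # ~250.0
```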
Enze YANG Shuoyan LIU Yuxin LIU Kai FANG
Crowd flow prediction in high-density urban scenes is involved in a wide range of intelligent transportation and smart city applications, and it has become a significant topic in urban computing. In this letter, a CNN-based framework called Pyramidal Spatio-Temporal Network (PSTNet) is proposed for crowd flow prediction. Spatial encoding is employed for the spatial representation of external factors, and a prior pyramid enhances feature dependence across spatial-scale distances and temporal spans; a post pyramid is then proposed to fuse the heterogeneous spatio-temporal features of multiple scales. Experimental results on the TaxiBJ and MobileBJ datasets demonstrate that the proposed PSTNet outperforms state-of-the-art methods.
The sum-rate performance of nonlinear quantized precoding using Gibbs sampling is evaluated in a massive multiuser multiple-input multiple-output (MU-MIMO) system in this paper. Massive MU-MIMO is a key technology for handling the growth of data traffic. In a fully digital massive MU-MIMO system, however, the resolution of the digital-to-analogue converters (DACs) in the transmit antenna branches has to be low to keep power consumption acceptable. Thus, a combinatorial optimization problem is solved for the nonlinear quantized precoding to determine the transmit signals from the finite alphabets output by the low-resolution DACs. A conventional optimization criterion minimizes the errors between the desired signals and the signals received at the user equipments (UEs). However, the system sum rate may decrease because this criterion can increase the transmit power. This paper proposes two optimization criteria that take the transmit power into account in order to maximize the sum rate. Mixed Gibbs sampling is applied to obtain a suboptimal solution of the nonlinear optimization problem. Numerical results obtained through computer simulations show that the two proposed criteria achieve higher sum rates than the conventional criterion. Moreover, the sum-rate criterion achieves the largest sum rate, although it leads to lower throughput than the MMSE criterion on approximately 60% of subcarriers.
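As a minimal sketch of the search strategy (not the authors' exact criteria), mixed Gibbs sampling for quantized precoding sweeps the transmit antennas, resampling each antenna's symbol from the finite DAC alphabet according to a cost, and occasionally restarts from a random candidate to escape local minima. The cost below is the conventional error criterion; the proposed criteria would add a transmit-power term.

```python
import numpy as np

rng = np.random.default_rng(0)

def mixed_gibbs_precoding(H, s_desired, alphabet, n_sweeps=50,
                          beta=20.0, p_restart=0.05):
    """Choose per-antenna transmit symbols x from `alphabet` so that
    H @ x approximates the desired receive signals s_desired."""
    M = H.shape[1]                                   # transmit antennas
    x = rng.choice(alphabet, size=M)                 # random start
    best_x, best_cost = x.copy(), np.linalg.norm(s_desired - H @ x) ** 2
    for _ in range(n_sweeps):
        if rng.random() < p_restart:                 # "mixed": random restart
            x = rng.choice(alphabet, size=M)
        for m in range(M):                           # Gibbs sweep
            costs = np.array([np.linalg.norm(
                s_desired - H @ np.concatenate([x[:m], [a], x[m + 1:]])) ** 2
                for a in alphabet])
            probs = np.exp(-beta * (costs - costs.min()))
            idx = rng.choice(len(alphabet), p=probs / probs.sum())
            x[m] = alphabet[idx]
            if costs[idx] < best_cost:
                best_x, best_cost = x.copy(), costs[idx]
    return best_x

# Example: 4 UEs, 16 antennas, QPSK-like outputs of 1-bit I/Q DACs
H = rng.standard_normal((4, 16)) + 1j * rng.standard_normal((4, 16))
alphabet = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
print(mixed_gibbs_precoding(H, rng.standard_normal(4) + 0j, alphabet))
```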
Tomohiro TSUKUSHI Satoshi ONO Koji WADA
Realizing rectangular frequency characteristics using a planar circuit made of a normal conductor material, such as a printed circuit board (PCB), is difficult because the corners of the frequency response are rounded by the low unloaded quality factors of the resonators. Rectangular frequency characteristics are generally realized by a low-noise amplifier (LNA) with flat gain characteristics and a high-order bandpass filter (BPF) whose resonators have high unloaded quality factors. Here, we use an LNA and a fourth-order flat-passband BPF made of a PCB to realize the desired characteristics. We first calculate the signal and noise powers to confirm the effects of the insertion loss caused by the BPF. Next, we explain the design and fabrication of an LNA, since no suitable LNA had been developed for this research. Finally, the rectangular frequency characteristics are demonstrated by a circuit combining the fabricated LNA and the fabricated flat-passband BPF. We show that rectangular frequency characteristics can be realized using a flat-passband BPF technique.
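The signal/noise power check can be illustrated with the standard Friis cascade formula: placing the LNA in front of the lossy BPF divides the lossy stage's noise contribution by the LNA gain. The numbers below are illustrative assumptions, not values from the paper.

```python
import math

def cascade_noise_figure_db(stages):
    """Friis formula. `stages` is a list of (gain_dB, nf_dB) tuples."""
    f_total, g_prod = 0.0, 1.0
    for i, (g_db, nf_db) in enumerate(stages):
        f = 10 ** (nf_db / 10)
        f_total = f if i == 0 else f_total + (f - 1) / g_prod
        g_prod *= 10 ** (g_db / 10)
    return 10 * math.log10(f_total)

# Illustrative values: LNA (15 dB gain, 1.5 dB NF), BPF (3 dB insertion loss)
print(cascade_noise_figure_db([(15, 1.5), (-3, 3)]))  # LNA first: ~1.6 dB
print(cascade_noise_figure_db([(-3, 3), (15, 1.5)]))  # BPF first: ~4.5 dB
```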
Zhengjie LI Jiabao GAO Jinmei LAI
In recent years, FPGAs have become popular for CNN acceleration, and many CNN-to-FPGA toolchains have been proposed to rapidly deploy CNNs on FPGAs. For these toolchains, however, updating the CNN means regenerating the RTL code and re-implementing the design, which is time-consuming and may cause timing-closure problems. We therefore propose HBDCA: a toolchain and its corresponding accelerator. The CNN on HBDCA is defined by the content of BRAM. The toolchain integrates the UpdateMEM utility of Xilinx, which updates the content of BRAM without the re-synthesis and re-implementation processes. The toolchain also integrates TensorFlow Lite, which provides high-accuracy quantization. HBDCA supports 8-bit per-channel quantization of weights and 8-bit per-layer quantization of activations. Updating the CNN on the accelerator may change its kernel size, and the flexible structure of HBDCA supports kernel-level parallelism with three different sizes (3×3, 5×5, 7×7). HBDCA implements four types of parallelism in the convolution layers and two types of parallelism in the fully-connected layers. To reduce the number of memory accesses, both spatial and temporal data-reuse techniques are applied to the convolution and fully-connected layers. In particular, temporal reuse is adopted at both the row and column level of an input feature map of a convolution layer, so data are read only once from BRAM and reused in the following clock cycles. Experiments show that by updating the BRAM content with a single UpdateMEM command, three CNNs with different kernel sizes (3×3, 5×5, 7×7) are implemented on HBDCA. Compared with the traditional design flow, UpdateMEM reduces development time by 7.6X-9.1X for different synthesis or implementation strategies. For similar CNNs created by other toolchains, HBDCA has lower latency (9.97µs-50.73µs) and eliminates re-implementation when the CNN is updated. For similar CNNs created by dedicated designs, HBDCA also has the lowest latency (9.97µs), the highest accuracy (99.14%), and the lowest power (1.391W). For a different CNN created by a similar toolchain that eliminates the re-implementation process, HBDCA achieves a higher speedup of 120.28X.
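A minimal sketch of the quantization scheme named above (8-bit per-channel weights, 8-bit per-layer activations); the symmetric scale convention here is a generic assumption, not HBDCA's exact format:

```python
import numpy as np

def quantize_weights_per_channel(w):
    """Symmetric 8-bit quantization with one scale per output channel.
    w: float weights shaped (out_channels, in_channels, kh, kw)."""
    scale = np.abs(w).reshape(w.shape[0], -1).max(axis=1) / 127.0
    q = np.round(w / scale[:, None, None, None]).astype(np.int8)
    return q, scale

def quantize_activations_per_layer(a):
    """Symmetric 8-bit quantization with a single scale for the layer."""
    scale = np.abs(a).max() / 127.0
    return np.round(a / scale).astype(np.int8), scale

w = np.random.randn(16, 3, 3, 3).astype(np.float32)
q, scale = quantize_weights_per_channel(w)
print(np.abs(w - q * scale[:, None, None, None]).max())  # small error
```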
In this paper, to clarify the ITS information and communication systems desired from the viewpoints of both safety and social feasibility, and to prevent overengineering, we use a microscopic traffic flow simulator to discuss the required information acquisition rate of three types of safe-driving support systems: the sensor type, the communication type, and the sensor-communication fusion type. Performance is evaluated from the viewpoint of preventing overengineering using the “TsRm evaluation method”, which regards a vehicle approaching within R meters within T seconds as a vehicle with a high possibility of collision, and evaluates only those vehicles. The results show that, regarding the communication radius and the sensing range, overengineered performance may be estimated when all vehicles in the evaluation area are used for the evaluation without considering each vehicle's location, velocity, and acceleration, as in conventional evaluations. In addition, it is clarified that the sensor-communication fusion type system is advantageous because it effectively compensates for the defects of the sensor type and communication type systems.
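A minimal sketch of the selection rule implied by the “TsRm evaluation method” (constant-velocity prediction is our simplifying assumption; the evaluation also considers acceleration):

```python
def is_high_collision_risk(rel_pos, rel_vel, T=5.0, R=30.0, dt=0.1):
    """Return True if the other vehicle comes within R meters of the own
    vehicle at any time within the next T seconds.

    rel_pos, rel_vel: relative (x, y) position [m] and velocity [m/s].
    """
    steps = int(T / dt) + 1
    for k in range(steps):
        x = rel_pos[0] + rel_vel[0] * k * dt
        y = rel_pos[1] + rel_vel[1] * k * dt
        if (x * x + y * y) ** 0.5 <= R:
            return True
    return False

# A vehicle 60 m ahead closing at 10 m/s enters the 30 m range within 5 s
print(is_high_collision_risk((60.0, 0.0), (-10.0, 0.0)))  # True
```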
Nobuhide NONAKA Satoshi SUYAMA Tatsuki OKUYAMA Kazushi MURAOKA Yukihiko OKUMURA
In order to realize higher bit rates than the fifth-generation (5G) mobile communication system provides, massive MIMO technologies in higher frequency bands with wider bandwidths are being investigated for 5G evolution and 6G. One practical method to realize massive MIMO in the high frequency bands is hybrid beamforming (BF). With this approach, user selection is an important function because its performance is highly affected by inter-user interference. However, the computational complexity of user selection in multi-user massive MIMO is high because the MIMO channel matrix is excessively large. Furthermore, satisfying user fairness by the proportional fairness (PF) criterion further increases the complexity because the precoding and postcoding matrices must be recalculated for each combination of selected users. To realize a fair and low-complexity user selection algorithm for multi-user massive MIMO employing hybrid BF, this paper proposes a two-step user selection algorithm that combines PF-based user selection and chordal-distance-based user selection. Computer simulations show that the proposed two-step user selection algorithm achieves higher system performance than conventional user selection algorithms, with higher user fairness and lower computational complexity.
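A minimal sketch of the chordal-distance step (using the generic definition via orthonormal bases of the users' channel subspaces; the greedy combination with a precomputed PF ranking `pf_order` and the threshold 0.5 are our assumptions, not the paper's exact algorithm):

```python
import numpy as np

def chordal_distance(H1, H2):
    """Chordal distance between the row spaces of two channel matrices."""
    Q1, _ = np.linalg.qr(H1.conj().T)   # orthonormal basis, columns
    Q2, _ = np.linalg.qr(H2.conj().T)
    # d^2 = r - ||Q1^H Q2||_F^2 for r-dimensional subspaces
    r = min(Q1.shape[1], Q2.shape[1])
    return np.sqrt(max(r - np.linalg.norm(Q1.conj().T @ Q2, 'fro') ** 2, 0.0))

def select_users(channels, pf_order, n_select, threshold=0.5):
    """Step 1: rank users by the PF metric (given as `pf_order`).
    Step 2: greedily keep users whose channel subspaces are far, in
    chordal distance, from those already selected."""
    selected = [pf_order[0]]
    for u in pf_order[1:]:
        if len(selected) == n_select:
            break
        if all(chordal_distance(channels[u], channels[s]) > threshold
               for s in selected):
            selected.append(u)
    return selected
```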
Seiichi KOJIMA Noriaki SUETAKE
LIME is a method for low-light image enhancement. Although LIME significantly enhances the contrast in dark regions, its contrast enhancement tends to be insufficient in bright regions. In this letter, we propose an improved version of LIME. In the proposed method, the contrast in bright regions is improved while the contrast enhancement effect in dark regions is maintained.
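A minimal sketch of the baseline LIME idea (an illumination map from the per-pixel maximum of the RGB channels, followed by Retinex-style division); the gamma adjustment here is a simplification of LIME's optimization-based refinement, and the letter's bright-region improvement is not reproduced:

```python
import numpy as np

def lime_like_enhance(img, gamma=0.8, eps=1e-3):
    """img: float RGB image in [0, 1], shape (H, W, 3)."""
    t = img.max(axis=2, keepdims=True)      # initial illumination map
    t = np.clip(t, eps, 1.0) ** gamma       # simplified refinement
    return np.clip(img / t, 0.0, 1.0)       # Retinex-style enhancement

dark = np.random.rand(4, 4, 3) * 0.2        # synthetic dark image
print(lime_like_enhance(dark).mean())       # brighter than the input
```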
Jiabao GAO Yuchen YAO Zhengjie LI Jinmei LAI
A series of binarized neural networks (BNNs) achieve accepted accuracy in image classification tasks and excellent performance on field-programmable gate arrays (FPGAs). Nevertheless, we observe that existing BNN designs are quite time-consuming when the target BNN is changed or a new BNN is accelerated. Therefore, this paper presents FCA-BNN, a flexible and configurable accelerator that employs a layer-level configurable technique to seamlessly execute each layer of the target BNN. First, to save resources and improve energy efficiency, hardware-oriented optimal formulas are introduced to design an energy-efficient computing array for different sizes of padded-convolution and fully-connected layers. Moreover, to accelerate the target BNNs efficiently, we exploit an analytical model to explore the optimal design parameters for FCA-BNN. Finally, our proposed mapping flow changes the target network by entering its order, and accelerates a new network by compiling and loading the corresponding instructions, without generating and loading a new bitstream. Evaluations on three major BNN structures show that the differences between the inference accuracy of FCA-BNN and that of a GPU are only 0.07%, 0.31%, and 0.4% for LFC, VGG-like, and Cifar-10 AlexNet, respectively. Furthermore, our energy efficiency reaches 0.8× that of existing customized FPGA accelerators for LFC and 2.6× for VGG-like. For Cifar-10 AlexNet, FCA-BNN achieves 188.2× and 60.6× better energy efficiency than a CPU and GPU, respectively. To the best of our knowledge, FCA-BNN is the most efficient design for changing the target BNN and accelerating a new BNN, while keeping competitive performance.
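A minimal sketch of the core BNN arithmetic such an accelerator maps to hardware (XNOR plus popcount replacing multiply-accumulate; the bit packing and loop structure here are generic, not FCA-BNN's computing array):

```python
import numpy as np

def binary_dot(x_bits, w_bits, n):
    """Dot product of two {-1, +1} vectors packed as n-bit integers.
    With the 0<->-1, 1<->+1 encoding: dot = 2*popcount(XNOR) - n."""
    xnor = ~(x_bits ^ w_bits) & ((1 << n) - 1)
    return 2 * bin(xnor).count('1') - n

def pack(v):
    """Pack a numpy array of +/-1 into an integer, MSB first."""
    bits = 0
    for b in (v > 0).astype(int):
        bits = (bits << 1) | int(b)
    return bits

x = np.random.choice([-1, 1], 64)
w = np.random.choice([-1, 1], 64)
assert binary_dot(pack(x), pack(w), 64) == int(x @ w)
```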
Masato KIKUCHI Kento KAWAKAMI Kazuho WATANABE Mitsuo YOSHIDA Kyoji UMEMURA
Likelihood ratios (LRs), which are commonly used for probabilistic data processing, are often estimated based on the frequency counts of individual elements obtained from samples. In natural language processing, an element can be a continuous sequence of N items, called an N-gram, in which each item is a word, letter, etc. In this paper, we attempt to estimate LRs based on N-gram frequency information. A naive estimation approach that uses only N-gram frequencies is sensitive to low-frequency (rare) N-grams and not applicable to zero-frequency (unobserved) N-grams; these are known as the low- and zero-frequency problems, respectively. To address these problems, we propose a method for decomposing N-grams into item units and then applying their frequencies along with the original N-gram frequencies. Our method can obtain the estimates of unobserved N-grams by using the unit frequencies. Although using only unit frequencies ignores dependencies between items, our method takes advantage of the fact that certain items often co-occur in practice and therefore maintains their dependencies by using the relevant N-gram frequencies. We also introduce a regularization to achieve robust estimation for rare N-grams. Our experimental results demonstrate that our method is effective at solving both problems and can effectively control dependencies.
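A minimal sketch of the naive estimator and a unit-decomposed fallback (the additive smoothing constant and the back-off rule are illustrative assumptions, not the paper's exact estimator):

```python
from collections import Counter

def smoothed_lr(key, pos, neg, pos_n, neg_n, alpha=1.0):
    """Additive-smoothed LR from frequency counts (alpha is assumed)."""
    return ((pos[key] + alpha) / (pos_n + alpha)) / \
           ((neg[key] + alpha) / (neg_n + alpha))

def unit_lr(ngram, pos_u, neg_u, pos_n, neg_n, alpha=1.0):
    """Fallback LR from unit (item) counts, assuming independence;
    usable even when the N-gram itself was never observed."""
    lr = 1.0
    for item in ngram:
        lr *= smoothed_lr(item, pos_u, neg_u, pos_n, neg_n, alpha)
    return lr

pos, neg = Counter({("new", "york"): 8}), Counter({("new", "york"): 1})
pos_u, neg_u = Counter(new=10, york=9), Counter(new=6, york=2)
ng = ("new", "york")
# observed N-gram: use its own counts; unobserved: fall back to units
print(smoothed_lr(ng, pos, neg, 100, 100), unit_lr(ng, pos_u, neg_u, 100, 100))
```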
This paper reviews the evolutionary process that reduced the transmission loss of silica optical fibers from the 20dB/km reported by Corning in 1970 to the current record-low loss. At an early stage, the main effort was to remove impurities, especially hydroxy groups, from fibers with a GeO2-SiO2 core, resulting in a loss of 0.20dB/km in 1980. In order to suppress Rayleigh scattering due to composition fluctuations, pure-silica-core fibers were developed, and a loss of 0.154dB/km was achieved in 1986. As the residual main loss factor, Rayleigh scattering due to density fluctuations was actively investigated using IR and Raman spectroscopy in the 1990s and early 2000s. Now, ultra-low-loss fibers with a loss of 0.150dB/km are commercially available and deployed in trans-oceanic submarine cable systems.
Akira KITAYAMA Goichi ONO Tadashi KISHIMOTO Hiroaki ITO Naohiro KOHMU
Reducing power consumption is crucial for edge devices using convolutional neural networks (CNNs). The zero-skipping approach for CNNs is a processing technique widely known for its relatively low power consumption and high speed. This approach stops multiply-accumulate (MAC) operations when the multiplication result of the input data and weight is zero. However, this technique requires large logic circuits with around 5% overhead, and the average rate of MAC stopping is approximately 30%. In this paper, we propose a precise zero-skipping method that stops the multipliers and accumulators using the input data and simple logic circuits. We also propose an active data-skipping method that further reduces power consumption at the cost of slightly degraded recognition accuracy: each multiplier and accumulator is also stopped when the input is a small value (e.g., 1 or 2). We implemented the single shot multibox detector 500 (SSD500) network model on a Xilinx ZU9 and applied the proposed techniques. We verified that operations were stopped at a rate of 49.1%, recognition accuracy was degraded by only 0.29%, power consumption was reduced from 9.2 to 4.4 W (-52.3%), and circuit overhead was reduced from 5.1% to 2.7% (-45.9%). The proposed techniques are thus effective for lowering the power consumption of CNN-based edge devices such as FPGAs.
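A behavioral sketch of the two skipping rules (a software illustration of the hardware behavior; the threshold of 2 comes from the example values in the abstract, and the skip accounting is ours):

```python
import numpy as np

def mac_with_skipping(inputs, weights, threshold=2):
    """Accumulate input*weight, skipping MACs whose input is zero
    (precise zero-skipping) or small (active data-skipping)."""
    acc, skipped = 0, 0
    for x, w in zip(inputs, weights):
        if abs(x) <= threshold:       # threshold=0 gives pure zero-skipping
            skipped += 1              # multiplier/accumulator stay idle
            continue
        acc += x * w
    return acc, skipped / len(inputs)

x = np.random.randint(-8, 9, 1000)    # quantized activations (assumed)
w = np.random.randint(-8, 9, 1000)
acc, skip_rate = mac_with_skipping(x, w)
print(f"skip rate: {skip_rate:.1%}")   # fraction of stopped operations
```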
In this paper, we propose a robust parameter estimation algorithm for channel-coded systems based on low-density parity-check (LDPC) codes over fading channels with impulse noise. The estimated parameters are then used to generate bit log-likelihood ratios (LLRs) for a soft-input LDPC decoder. The expectation-maximization (EM) algorithm is used to estimate the parameters, including the channel gain and the parameters of the Bernoulli-Gaussian (B-G) impulse noise model. The parameters can be estimated accurately, and the average number of iterations of the proposed algorithm is acceptable. Simulation results show that, over a wide range of impulse noise power, the proposed algorithm approaches the optimal performance under different Rician channel factors and even under Middleton class-A (M-CA) impulse noise models.
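As a minimal sketch of the LLR generation step (BPSK and known parameters are assumed here; the paper estimates the parameters with EM): under the B-G model the noise is Gaussian with variance σ² with probability 1-p, and with variance σ²+σ_I² with probability p, so the likelihood is a two-component Gaussian mixture.

```python
import numpy as np

def bg_llr(y, h, p, sigma2, sigma2_i):
    """Bit LLR for BPSK (x = +/-1) received as y = h*x + noise, where the
    noise is Bernoulli-Gaussian: N(0, sigma2) w.p. 1-p,
    N(0, sigma2 + sigma2_i) w.p. p. Constant factors cancel in the ratio."""
    def likelihood(r):
        return ((1 - p) * np.exp(-r**2 / (2 * sigma2)) / np.sqrt(sigma2)
                + p * np.exp(-r**2 / (2 * (sigma2 + sigma2_i)))
                / np.sqrt(sigma2 + sigma2_i))
    return np.log(likelihood(y - h) / likelihood(y + h))

y = np.array([0.9, -1.4, 3.2])         # received samples (illustrative)
print(bg_llr(y, h=1.0, p=0.05, sigma2=0.1, sigma2_i=10.0))
```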
Yuya OMORI Ken NAKAMURA Takayuki ONISHI Daisuke KOBAYASHI Tatsuya OSAWA Hiroe IWASAKI
This paper describes a novel 4K 120fps (frames per second) real-time HEVC (High Efficiency Video Coding) encoder for high-frame-rate video encoding and transmission. Motion portrayal problems such as motion blur and jerkiness may occur in video scenes containing fast-moving objects or quick camera panning. A high frame rate solves such problems and provides a more immersive viewing experience that can express even fast-moving scenes without discomfort. It can also be used in remote operation for scenes with high motion, such as VAR (Video Assistant Referee) systems in sports. Providing such high-frame-rate video services requires real-time encoding of high-frame-rate videos with low latency and temporal scalability. The proposed encoder achieves full 4K/120fps real-time encoding, twice the current 4K service frame rate of 60fps, using a multichip configuration with two encoder LSIs. Exchanging reference picture data near the spatially divided slice boundary enables cross-chip motion estimation and maintains coding efficiency. The encoder supports a temporal-scalable coding mode, in which it outputs a temporally scalable stream transmitted over one or two transmission paths. It also supports a low-delay coding mode, in which it achieves 21.8msec low-latency processing through motion vector restriction. Evaluation of the proposed encoder's multichip configuration shows that the BD-bitrate (the average rate of bitrate increase) compared to simple slice division without inter-chip transfer is -2.86% at minimum and -2.41% on average in temporal-scalable coding mode. The proposed encoder system will open the door to the next generation of high-frame-rate UHDTV (ultra-high-definition television) services.
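A minimal sketch of how temporal scalability typically splits a 120fps stream (a generic two-layer assignment, not necessarily this encoder's exact GOP structure): even-indexed pictures form a 60fps base layer that can be sent on one path, and odd-indexed pictures form the enhancement layer for the second path.

```python
def temporal_id(poc):
    """Generic two-layer assignment: temporal_id 0 = 60fps base layer,
    temporal_id 1 = enhancement pictures completing 120fps."""
    return 0 if poc % 2 == 0 else 1

stream = [(poc, temporal_id(poc)) for poc in range(8)]
base_60fps = [poc for poc, tid in stream if tid == 0]   # decodable alone
print(stream)        # [(0, 0), (1, 1), (2, 0), (3, 1), ...]
print(base_60fps)    # [0, 2, 4, 6]
```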
Ruilin ZHANG Xingyu WANG Hirofumi SHINOHARA
In this paper, we describe a post-processing technique with high extraction efficiency (ExE) for de-biasing and de-correlating random bitstreams generated by true random number generators (TRNGs). This research is based on the N-bit von Neumann (VN_N) post-processing method, which improves the ExE of the original von Neumann method toward the Shannon entropy bound as N increases. However, as N increases, the complexity of the mapping table increases exponentially (2^N), which makes VN_N unsuitable for low-power TRNGs. To overcome this problem, at the algorithm level, we propose a waiting strategy to achieve high ExE with a small N value. At the architectural level, a Hamming-weight-mapping-based hierarchical structure is used to reconstruct the large mapping table from smaller tables. The hierarchical structure also decreases the correlation factor of the raw bitstream. To develop a technique with high ExE and low cost, we designed and fabricated an 8-bit von Neumann with waiting strategy (VN_8W) in a 130-nm CMOS. The maximum ExE of VN_8W is 62.21%, which is 2.49 times the ExE of the original von Neumann method. NIST SP 800-22 randomness test results proved the de-biasing and de-correlation abilities of VN_8W. Compared with the state-of-the-art optimized 7-element iterated von Neumann, VN_8W achieved more than 20% energy reduction with higher ExE. At 0.45V and 1MHz, VN_8W achieved a minimum energy of 0.18pJ/bit, making it suitable for sub-pJ low-energy TRNGs.
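For reference, the original von Neumann extractor that VN_N generalizes (the N-bit method maps length-N blocks through a table instead, and the Hamming-weight trick groups equiprobable blocks): the pairs 01 and 10, which are equiprobable for a biased but independent source, map to 0 and 1, while 00 and 11 are discarded, so at most one output bit is produced per two input bits (ExE ≤ 25% for an unbiased source, hence 62.21% ≈ 2.49 × 25%).

```python
def von_neumann(bits):
    """Original von Neumann de-biasing: consume bits pairwise;
    01 -> 0, 10 -> 1, 00/11 -> discarded."""
    out = []
    for a, b in zip(bits[::2], bits[1::2]):
        if a != b:
            out.append(a)
    return out

raw = [1, 1, 0, 1, 1, 0, 0, 0, 1, 0]   # biased raw bitstream
print(von_neumann(raw))                # [0, 1, 1]
```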
Zheng SUN Hanli LIU Dingxin XU Hongye HUANG Bangan LIU Zheng LI Jian PANG Teruki SOMEYA Atsushi SHIRANE Kenichi OKADA
This paper presents an injection-locked clock multiplier (ILCM) with high jitter performance that uses an ultra-low-power (ULP) voltage-controlled oscillator (VCO) for IoT applications in 65-nm CMOS. The proposed transformer-based VCO achieves a low flicker-noise corner and sub-100µW power consumption. Double cross-coupled NMOS transistors sharing the same current provide high transconductance, and a network using a high-Q-factor transformer (TF) provides a large tank impedance to minimize the current requirement. Thanks to the low bias current and small conduction angle of the ULP VCO design, the flicker noise of the proposed TF-based VCO is suppressed, and good phase noise (PN) is achieved in the flicker region (1/f³) with sub-100µW power consumption. Thus, a high figure of merit (FoM) is obtained at both 100kHz and 1MHz offsets without an additional inductor. The proposed VCO achieves a phase noise of -94.5/-115.3dBc/Hz at 100kHz/1MHz frequency offset with 97µW power consumption, which corresponds to a -193/-194dBc/Hz VCO FoM at a 2.62GHz oscillation frequency. The measurement results show that the 1/f³ corner is below 60kHz over the tuning range from 2.57GHz to 3.40GHz. Thanks to the proposed low-power VCO, the total ILCM achieves 78 fs RMS jitter while using a high-frequency reference clock, and 960 fs RMS jitter with a 40MHz common reference at a corresponding power of 107µW.
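The quoted FoM values can be checked against the standard VCO figure of merit, FoM = PN(Δf) - 20·log10(f0/Δf) + 10·log10(P/1mW):

```python
import math

def vco_fom(pn_dbc_hz, f0_hz, offset_hz, power_w):
    """Standard VCO figure of merit in dBc/Hz."""
    return (pn_dbc_hz - 20 * math.log10(f0_hz / offset_hz)
            + 10 * math.log10(power_w / 1e-3))

# Values quoted in the abstract: 2.62 GHz oscillation, 97 uW power
print(vco_fom(-94.5, 2.62e9, 100e3, 97e-6))   # ~ -193 dBc/Hz
print(vco_fom(-115.3, 2.62e9, 1e6, 97e-6))    # ~ -194 dBc/Hz
```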
This letter presents an efficient technique to reduce the computational complexity involved in training binary convolutional neural networks (BCNN). BCNN training conventionally focuses on optimizing the sign of each weight element rather than its exact value, and the sign of an element is unlikely to be flipped again once the element has been updated to a magnitude large enough to be clipped. The proposed technique stops updating such clipped-out elements and accordingly eliminates the computations involved in their optimization. The complexity reduction by the proposed technique is as high as 25.52% in training the BCNN model for the CIFAR-10 classification task, while the accuracy is maintained without severe degradation.
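A minimal sketch of the skipping rule on top of a plain SGD update of the latent real-valued weights (the freezing criterion, stop once |w| reaches the clip bound, follows the letter's idea; the training-loop details and stand-in gradients are generic assumptions):

```python
import numpy as np

CLIP = 1.0   # latent weights are clipped to [-CLIP, CLIP]

def update_weights(w, grad, lr, frozen):
    """SGD step on the latent weights of a BCNN layer.
    Elements already clipped out (frozen) are skipped entirely."""
    active = ~frozen
    w[active] -= lr * grad[active]        # computation only where needed
    w[:] = np.clip(w, -CLIP, CLIP)
    frozen |= np.abs(w) >= CLIP           # freeze newly clipped elements
    return frozen.mean()                  # fraction of skipped updates

w = np.random.uniform(-1, 1, 10000)
frozen = np.zeros_like(w, dtype=bool)
for step in range(100):
    grad = np.random.randn(w.size) * 0.5  # stand-in for real gradients
    skip_rate = update_weights(w, grad, lr=0.1, frozen=frozen)
print(f"updates skipped: {skip_rate:.1%}")
```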
Lingshu LI Jiangxing WU Wei ZENG Xiaotao CHENG
Existing cyber deception technologies (e.g., operating system obfuscation) can effectively disturb attackers' network reconnaissance and hide the fingerprint information of valuable cyber assets (e.g., containers). However, they are ineffective against skilled attackers. In this study, a proactive fingerprint deception method termed Continuously Anonymizing Containers' Fingerprints (CACF) is proposed, which modifies the containers' fingerprints in the cloud resource pool to satisfy the anonymization standard. As demonstrated by the experimental results, CACF can effectively increase the difficulty for attackers.
Seiki KOTACHI Takehiro SATO Ryoichi SHINKUMA Eiji OKI
A software-defined network (SDN) uses a centralized SDN controller to store flow entries in the flow table of each SDN switch; these entries control the packet flows passing through the switch. When a multicast service is provided in an SDN, the SDN controller stores a multicast entry dedicated to each multicast group in each SDN switch. Due to the limited capacity of each flow table, the number of flow entries required to set up a multicast tree must be suppressed. A conventional multicast routing scheme suppresses the number of multicast entries in one multicast tree by replacing some of them with unicast entries. However, since the conventional scheme determines a multicast tree for each request individually, unicast entries dedicated to the same receiver are distributed over various SDN switches when there are multiple multicast service requests. Therefore, a further reduction in the number of flow entries is still possible. In this paper, we propose a multicast routing model for multiple multicast requests that minimizes the number of flow entries. The model determines multiple multicast trees simultaneously so that a unicast entry dedicated to the same receiver and stored in the same SDN switch is shared among multicast trees. We formulate the proposed model as an integer linear programming (ILP) problem. In addition, we develop a heuristic algorithm that can be used when the ILP problem cannot be solved in practical time. Numerical results show that the proposed model reduces the required number of flow entries compared to two benchmark models; the maximum reduction ratio is 49.3% when the number of multicast requests is 40.
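A schematic of the sharing idea in the objective (our simplified notation, not the paper's full formulation): let m_{t,v} ∈ {0,1} indicate that tree t uses a multicast entry at switch v, and u_{v,r} ∈ {0,1} indicate that switch v stores a unicast entry toward receiver r.

```latex
\min \sum_{v \in V} \Bigl( \sum_{t \in T} m_{t,v} + \sum_{r \in R} u_{v,r} \Bigr)
\qquad \text{s.t.} \qquad
u_{v,r} \ge y_{t,v,r} \quad \forall t \in T,\; \forall v \in V,\; \forall r \in R
```

Here y_{t,v,r} = 1 when tree t forwards traffic toward receiver r through a unicast entry at switch v, so the shared entry u_{v,r} is counted once no matter how many trees rely on it; determining all trees simultaneously lets the solver steer different requests through the same switch to exploit this sharing.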
Koichiro YAMANAKA Keita TAKAHASHI Toshiaki FUJII Ryutaroh MATSUMOTO
Thanks to the excellent learning capability of deep convolutional neural networks (CNNs), CNN-based methods have achieved great success in computer vision and image recognition tasks. However, it has turned out that these methods often have inherent vulnerabilities, which makes us cautious about the potential risks of using them in real-world applications such as autonomous driving. To reveal such vulnerabilities, we propose a method of simultaneously attacking monocular depth estimation and optical flow estimation, both of which are common artificial-intelligence-based tasks that are intensively investigated for autonomous driving scenarios. Our method can generate a single adversarial patch that fools CNN-based monocular depth estimation and optical flow estimation methods simultaneously, simply by placing the patch in the input images. To the best of our knowledge, this is the first work to achieve simultaneous patch attacks on two or more CNNs developed for different tasks.
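A minimal sketch of the joint optimization loop (PyTorch-style; the loss weighting, the fixed patch placement, and the depth_net/flow_net models are stand-ins, not the authors' setup):

```python
import torch

def joint_patch_attack(depth_net, flow_net, frames, patch, steps=500, lr=0.01):
    """Optimize one patch so that pasting it into the input degrades both
    depth and flow predictions. `frames` is a batch of image pairs
    (B, 2, 3, H, W); `patch` is a (3, h, w) tensor with requires_grad=True."""
    with torch.no_grad():                    # clean reference predictions
        depth_ref = depth_net(frames[:, 0])
        flow_ref = flow_net(frames[:, 0], frames[:, 1])
    opt = torch.optim.Adam([patch], lr=lr)
    for _ in range(steps):
        imgs = frames.clone()
        imgs[..., :patch.shape[1], :patch.shape[2]] = patch  # paste patch
        depth = depth_net(imgs[:, 0])        # depth from the first frame
        flow = flow_net(imgs[:, 0], imgs[:, 1])
        # push both outputs away from the clean predictions (maximize error)
        loss = -(torch.mean((depth - depth_ref) ** 2)
                 + torch.mean((flow - flow_ref) ** 2))
        opt.zero_grad()
        loss.backward()
        opt.step()
        patch.data.clamp_(0, 1)              # keep the patch a valid image
    return patch
```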