Yun JIANG Huiyang LIU Xiaopeng JIAO Ji WANG Qiaoqiao XIA
In this letter, a novel projection algorithm is proposed in which projection onto a triangle formed by the three even-vertices closest to the vector to be projected replaces check polytope projection, achieving the same FER performance as the exact projection algorithm in both the high-iteration and low-iteration regimes. Simulation results show that, compared with the sparse affine projection algorithm (SAPA), it improves the FER performance by 0.2 dB and reduces the average number of iterations by 4.3%.
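As a rough illustration of what "the three even-vertices closest to the vector" means, the sketch below enumerates all even-weight binary vectors of a small dimension and keeps the three nearest in Euclidean distance. The brute-force search and the name `closest_even_vertices` are assumptions made for illustration only; the letter's own selection procedure and the subsequent triangle projection are not reproduced here.

```python
from itertools import product


def closest_even_vertices(v, k=3):
    """Return the k even-weight binary vectors closest to v (brute force).

    Only practical for small dimensions, since it enumerates 2**len(v) vectors.
    """
    even_vertices = [b for b in product((0, 1), repeat=len(v)) if sum(b) % 2 == 0]

    def sq_dist(b):
        # Squared Euclidean distance between v and the candidate vertex b.
        return sum((vi - bi) ** 2 for vi, bi in zip(v, b))

    return sorted(even_vertices, key=sq_dist)[:k]


# Example vector chosen so that the three nearest even vertices are unambiguous.
triangle = closest_even_vertices([0.9, 0.7, 0.2, 0.4])
```

Here `triangle` holds the three vertices that would span the triangle for this example input.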
Takao WAHO Akihisa KOYAMA Hitoshi HAYASHI
Signal processing using delta-sigma modulated bit streams is reviewed, along with related topics in stochastic computing (SC). The basic signal processing circuits, adders and multipliers, are covered. In particular, the possibility of preserving the noise-shaping properties inherent in delta-sigma modulation during these operations is discussed. Finally, the root mean square error for addition and multiplication is evaluated, and the performance improvement of signal processing in the delta-sigma domain compared with SC is verified.
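To make the notion of a delta-sigma modulated bit stream concrete, here is a minimal sketch of a first-order modulator in its accumulator-overflow form, applied to a constant input. The function name and the use of exact fractions are illustrative assumptions; the circuits reviewed in the paper (adders, multipliers) are not modeled.

```python
from fractions import Fraction


def delta_sigma_bits(x, n):
    """First-order delta-sigma modulation of a constant x in [0, 1] into n bits.

    The accumulator integrates the input; each overflow past 1 emits a 1 bit
    and feeds the quantized value back by subtracting 1.
    """
    acc = Fraction(0)
    bits = []
    for _ in range(n):
        acc += x
        if acc >= 1:
            bits.append(1)
            acc -= 1
        else:
            bits.append(0)
    return bits


bits = delta_sigma_bits(Fraction(3, 8), 16)
density = Fraction(sum(bits), len(bits))  # fraction of ones in the stream
```

For this input the ones are spread evenly through the stream rather than placed at random, which is the property that distinguishes such streams from the random bit streams used in stochastic computing.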
Fujihiko MATSUMOTO Hinano OHTSU
In the field of biomedical engineering, processing low-frequency biosignals requires not only low-pass filters for high-frequency elimination but also notch filters for suppressing powerline interference. For integration of low-frequency filters, chip implementation of large capacitances is a major difficulty. Capacitance multipliers are an effective way to enhance capacitance within a small chip area. This letter describes design considerations for an integrated low-frequency low-pass notch filter employing capacitance multipliers. Two main points are presented. Firstly, a new floating capacitance multiplier is proposed. Secondly, a technique to reduce the number of capacitance multipliers is proposed, which reduces power consumption. The proposed techniques are applied to a 3rd-order low-pass notch filter. Simulation results show the effectiveness of the proposed techniques.
Sho OBATA Koichi KOBAYASHI Yuh YAMASHITA
In a power network, it is important to detect cyber attacks. In this paper, we propose a method for detecting false data injection (FDI) attacks in distributed state estimation. The FDI attack is one of the typical cyber attacks on a power network. As a method of FDI attack detection, we consider calculating the residual (i.e., the difference between the observed and estimated values). The proposed detection method uses the tentative residual (estimated error) in ADMM (Alternating Direction Method of Multipliers), which is one of the powerful methods in distributed optimization. First, the effect of an FDI attack is analyzed. Next, based on the analysis result, a residual-based detection parameter is introduced. A detection method using this parameter is then proposed. Finally, the proposed method is demonstrated through a numerical example on the IEEE 14-bus system.
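The residual idea can be sketched with a deliberately simplified, centralized toy problem: several sensors measure the same scalar state, the least-squares estimate is their mean, and an alarm is raised when the residual norm exceeds a threshold. The measurement values, the threshold, and the function name are hypothetical; the paper's method works on the tentative residual inside distributed ADMM iterations, which this sketch does not attempt to reproduce.

```python
def residual_norm(z):
    """Residual 2-norm for the model z_i = x + e_i.

    For this model the least-squares estimate of x is the mean of z.
    """
    x_hat = sum(z) / len(z)
    return sum((zi - x_hat) ** 2 for zi in z) ** 0.5


THRESHOLD = 0.1  # hypothetical detection threshold, chosen for this toy data

clean = [1.00, 1.01, 0.99, 1.00]
attacked = [1.00, 1.01, 0.99, 1.50]  # false data injected into the last measurement

clean_alarm = residual_norm(clean) > THRESHOLD
attack_alarm = residual_norm(attacked) > THRESHOLD
```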
Yujin ZHENG Junwei ZHANG Yan LIN Qinglin ZHANG Qiaoqiao XIA
The Euclidean projection operation is the most complex and time-consuming part of alternating direction method of multipliers (ADMM) decoding algorithms, and it consumes a large amount of resources when deployed on hardware platforms. We propose a simplified line segment projection algorithm (SLSA) and present the hardware design and the quantization scheme of the SLSA. In simulation results, the proposed SLSA module performs better than the original algorithm at the same fixed bitwidths, owing to the centrosymmetric structure of SLSA. Furthermore, the proposed SLSA module, whose simpler structure omits hypercube projection, reduces time consumption by up to 72.2% and hardware resource usage by more than 87% compared to other Euclidean projection modules in the experiments.
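For orientation, the generic geometric operation behind any line-segment projection is shown below: the Euclidean projection of a point onto the segment between two endpoints. This is textbook geometry rather than the SLSA itself; how the paper chooses the endpoints and simplifies the computation is not stated in the abstract, and the function name is an assumption.

```python
def project_onto_segment(v, a, b):
    """Euclidean projection of point v onto the segment with endpoints a and b."""
    d = [bi - ai for ai, bi in zip(a, b)]
    dd = sum(di * di for di in d)
    if dd == 0:
        # Degenerate segment: both endpoints coincide.
        return list(a)
    # Parameter of the projection onto the infinite line, clamped to [0, 1].
    t = sum((vi - ai) * di for vi, ai, di in zip(v, a, d)) / dd
    t = min(1.0, max(0.0, t))
    return [ai + t * di for ai, di in zip(a, d)]


inside = project_onto_segment([2.0, 1.0], [0.0, 0.0], [4.0, 0.0])
clamped = project_onto_segment([-1.0, 3.0], [0.0, 0.0], [4.0, 0.0])
```

The first call lands in the interior of the segment, while the second is clamped to the nearer endpoint.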
Ibrahim ABDO Korkut Kaan TOKGOZ Atsushi SHIRANE Kenichi OKADA
This paper introduces several design techniques to improve the performance of CMOS frequency multipliers that operate at the sub-THz band without increasing the complexity and the power consumption of the circuit. The proposed techniques are applied to a device nonlinearity-based frequency tripler and to a push-push frequency doubler. By utilizing the fundamental and second harmonic feedback cancellation, the tripler achieves -2.9dBm output power with a simple single-ended circuit architecture reducing the required area and power consumption. The tripler operates at frequencies from 103GHz to 130GHz. The introduced modified push-push doubler provides 2.3dB conversion gain including the balun losses and it has good tolerance against balun mismatches. The output frequency of the doubler is from 118GHz to 124GHz. Both circuits were designed and fabricated using CMOS 65nm technology.
Yujin ZHENG Yan LIN Zhuo ZHANG Qinglin ZHANG Qiaoqiao XIA
Linear programming (LP) decoding based on the alternating direction method of multipliers (ADMM) has proved to be effective for low-density parity-check (LDPC) codes. However, for high-density parity-check (HDPC) codes, the ADMM-LP decoder encounters two problems: the high density of the check matrix and the large number of pseudocodewords in the fundamental polytope of HDPC codes. The former makes the check polytope projection extremely complex, and the latter leads to poor frame error rate (FER) performance. To address these issues, we introduce the even vertex algorithm (EVA) into the ADMM-LP decoding algorithm for HDPC codes and name the resulting decoder HDPC-EVA. HDPC-EVA can reduce the complexity of the projection process and improve the FER performance. We further enhance the proposed decoder by using the automorphism groups of codes to create diversity in the parity-check matrix. The simulation results show that the proposed decoder cuts the average decoding time per iteration by 30%-60% and achieves near maximum likelihood (ML) performance on some BCH codes.
Zheng SUN Hanli LIU Dingxin XU Hongye HUANG Bangan LIU Zheng LI Jian PANG Teruki SOMEYA Atsushi SHIRANE Kenichi OKADA
This paper presents a high jitter performance injection-locked clock multiplier (ILCM) using an ultra-low power (ULP) voltage-controlled oscillator (VCO) for IoT application in 65-nm CMOS. The proposed transformer-based VCO achieves low flicker noise corner and sub-100µW power consumption. Double cross-coupled NMOS transistors sharing the same current provide high transconductance. The network using high-Q factor transformer (TF) provides a large tank impedance to minimize the current requirement. Thanks to the low current bias with a small conduction angle in the ULP VCO design, the proposed TF-based VCO's flicker noise can be suppressed, and a good PN can be achieved in flicker region (1/f^3) with sub-100µW power consumption. Thus, a high figure-of-merit (FoM) can be obtained at both 100kHz and 1MHz without additional inductor. The proposed VCO achieves phase noise of -94.5/-115.3dBc/Hz at 100kHz/1MHz frequency offset with a 97µW power consumption, which corresponds to a -193/-194dBc/Hz VCO FoM at 2.62GHz oscillation frequency. The measurement results show that the 1/f^3 corner is below 60kHz over the tuning range from 2.57GHz to 3.40GHz. Thanks to the proposed low power VCO, the total ILCM achieves 78 fs RMS jitter while using a high reference clock. A 960 fs RMS jitter can be achieved with a 40MHz common reference and 107µW corresponding power.
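The quoted FoM values can be checked against the reported phase noise, oscillation frequency, and power, assuming the conventional VCO figure-of-merit definition FoM = PN - 20·log10(f_osc/Δf) + 10·log10(P/1 mW). The abstract does not state which definition was used, so the formula and the function name below are assumptions.

```python
import math


def vco_fom(pn_dbc_hz, f_osc_hz, f_offset_hz, power_w):
    """Conventional VCO figure of merit in dBc/Hz (assumed definition)."""
    return (
        pn_dbc_hz
        - 20 * math.log10(f_osc_hz / f_offset_hz)
        + 10 * math.log10(power_w / 1e-3)  # power normalized to 1 mW
    )


# Values taken from the abstract: 2.62 GHz oscillation, 97 uW power.
fom_100k = vco_fom(-94.5, 2.62e9, 100e3, 97e-6)
fom_1m = vco_fom(-115.3, 2.62e9, 1e6, 97e-6)
```

Under this definition the two results round to -193 and -194 dBc/Hz, in agreement with the figures stated above.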
The PCHS (Park-Chang-Hong-Seo) algorithm is a variant of the Karatsuba algorithm (KA) that utilizes a different splitting strategy with no overlap module. The algorithm has been applied to develop efficient hybrid GF(2^m) multipliers for irreducible trinomials and pentanomials. However, compared with KA-based hybrid multipliers, these multipliers usually match the space complexity but require more gate delays. In this paper, we propose a new design of a hybrid multiplier using the PCHS algorithm for irreducible all-one polynomials. The proposed scheme utilizes redundant representation to combine and simplify the computation of subexpressions, which results in a significant speedup of the implementation. As a main contribution, the proposed multiplier has exactly the same space and time complexities as the KA-based scheme. This is the first demonstration that a different splitting strategy for KA can also yield an equally efficient multiplier.
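As background, the classic Karatsuba recursion for polynomials over GF(2) can be sketched as follows, with polynomials packed into Python integers (bit i is the coefficient of x^i). This is the standard KA, not the PCHS splitting, and it omits reduction modulo an irreducible polynomial; the function names and the base-case size are illustrative assumptions.

```python
def clmul(a, b):
    """Schoolbook carry-less multiplication of bit-packed polynomials over GF(2)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def karatsuba_gf2(a, b, n):
    """Karatsuba product of two GF(2) polynomials with at most n coefficients."""
    if n <= 8:
        return clmul(a, b)
    h = n // 2
    mask = (1 << h) - 1
    a0, a1 = a & mask, a >> h  # a = a1 * x^h + a0
    b0, b1 = b & mask, b >> h
    lo = karatsuba_gf2(a0, b0, h)
    hi = karatsuba_gf2(a1, b1, n - h)
    # Over GF(2), addition and subtraction are both XOR.
    mid = karatsuba_gf2(a0 ^ a1, b0 ^ b1, n - h)
    return (hi << (2 * h)) ^ ((mid ^ lo ^ hi) << h) ^ lo


product_ka = karatsuba_gf2(0xDEADBEEF, 0x12345678, 32)
```

Three half-size products replace the four that a direct split would need, which is the saving that KA-style hybrid multipliers exploit.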
For low-density parity-check (LDPC) codes, the penalized decoding method based on the alternating direction method of multipliers (ADMM) can improve the decoding performance at low signal-to-noise ratios and also has low decoding complexity. There are three effective methods that could increase the ADMM penalized decoding speed, which are reducing the number of Euclidean projections in ADMM penalized decoding, designing an effective penalty function and selecting an appropriate layered scheduling strategy for message transmission. In order to further increase the ADMM penalized decoding speed, through reducing the number of Euclidean projections and using the vertical layered scheduling strategy, this paper designs a fast converging ADMM penalized decoding method based on the improved penalty function. Simulation results show that the proposed method not only improves the decoding performance but also reduces the average number of iterations and the average decoding time.
Yi GUO Heming SUN Ping LEI Shinji KIMURA
Approximate multiplier design is an effective technique to improve hardware performance at the cost of accuracy loss. Current approximate multipliers are mostly ASIC-based and are dedicated to one particular application. In contrast, the FPGA has been an attractive choice for many applications because of its high performance, reconfigurability, and fast development cycle. This paper presents a novel methodology for designing approximate multipliers by employing FPGA-based fabrics (primarily look-up tables and carry chains). The area and latency are significantly reduced by applying approximation to carry results and cutting the carry propagation path in the multiplier. Moreover, we explore the architectural space of higher-order multipliers by using our proposed small-size approximate multipliers as elementary modules. For different accuracy-hardware requirements, eight configurations of the approximate 8×8 multiplier are discussed. In terms of mean relative error distance (MRED), the error of the proposed 8×8 multiplier is as low as 1.06%. Compared with the exact multiplier, our proposed design can reduce area by 43.66% and power by 24.24%. The critical path latency reduction is up to 29.50%. The proposed multiplier design has a better accuracy-hardware tradeoff than other designs with comparable accuracy. Moreover, image sharpening is used to assess the efficiency of the approximate multipliers in an application.
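The MRED metric quoted here is commonly computed as the average of |approximate - exact| / exact over all operand pairs. The sketch below evaluates it exhaustively for an 8×8 multiplier. The `truncated_multiply` function is a hypothetical stand-in for an approximate multiplier, not the proposed design, and skipping zero operands (where the exact product is zero) is an assumption about how the undefined ratio is handled.

```python
def truncated_multiply(a, b, k=4):
    """Hypothetical approximate multiplier: zero the k lowest bits of the product."""
    return ((a * b) >> k) << k


def mred(approx_mul, bits=8):
    """Mean relative error distance over all nonzero operand pairs."""
    total, count = 0.0, 0
    for a in range(1, 1 << bits):
        for b in range(1, 1 << bits):
            exact = a * b
            total += abs(approx_mul(a, b) - exact) / exact
            count += 1
    return total / count


error = mred(truncated_multiply)
exact_error = mred(lambda a, b: a * b)  # an exact multiplier has zero MRED
```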
Yi GUO Heming SUN Ping LEI Shinji KIMURA
Approximate computing has emerged as a promising approach for error-tolerant applications to improve hardware performance at the cost of some loss of accuracy. Multiplication is a key arithmetic operation in these applications. In this paper, we propose a low-cost approximate multiplier design by employing new probability-driven inexact compressors. This compressor design is introduced to reduce the height of the partial product matrix to two rows, based on the probability distribution of the sum of the partial products. To compensate for the accuracy loss of the multiplier, a grouped error recovery scheme is proposed that achieves different levels of accuracy. In terms of mean relative error distance (MRED), the accuracy losses of the proposed multipliers range from 1.07% to 7.86%. Compared with the Wallace multiplier in a 40nm process, the most accurate variant of the proposed multipliers can reduce power by 59.75% and area by 42.47%. The critical path delay reduction is larger than 12.78%. The proposed multiplier design has a better accuracy-performance trade-off than other designs with comparable accuracy. In addition, the efficiency of the proposed multiplier design is assessed in an image processing application.
Yan LIN Qiaoqiao XIA Wenwu HE Qinglin ZHANG
Linear programming (LP) decoding based on the alternating direction method of multipliers (ADMM) for low-density parity-check (LDPC) codes has lower complexity than the original LP decoding. However, the development of the ADMM-LP decoding algorithm could still be limited by the computational complexity of Euclidean projections onto the parity check polytope. In this paper, we propose a bisection method iterative algorithm (BMIA) for projection onto the parity check polytope that avoids sorting operations and has linear complexity. In addition, the proposed algorithm converges more than three times as fast as the existing algorithm, and up to 10 times as fast for high input dimensions.
Nobutaka KITO Ryota ODAKA Kazuyoshi TAKAGI
A rapid single-flux-quantum (RSFQ) truncated multiplier based on bit-level processing is proposed. In the multiplier, two operands are transformed into two serialized patterns of bits (pulses), and the multiplication is carried out by processing those bits. The result is obtained by counting bits. By calculating at the bit level, the proposed multiplier can be implemented in a small area. The gate-level design of the multiplier is shown. The layout of the 4-bit multiplier was also designed.
Tongxin YANG Tomoaki UKEZONO Toshinori SATO
Many applications, such as image signal processing, have an inherent tolerance for insignificant inaccuracies. Multiplication is a key arithmetic function for many applications. Approximate multipliers are considered an efficient technique to trade off energy against performance and accuracy for error-tolerant applications. Here, we design and analyze four approximate multipliers that demonstrate lower power consumption and shorter critical path delay than the conventional multiplier. They employ an approximate tree compressor that halves the height of the partial product tree and generates a vector to compensate for the accuracy loss. Compared with the conventional Wallace tree multiplier, one of the evaluated 8-bit approximate multipliers reduces power consumption and critical path delay by 36.9% and 38.9%, respectively. With a 0.25% normalized mean error distance, the silicon area required to implement the multiplier is reduced by 50.3%. Our multipliers outperform the previously proposed approximate multipliers in power consumption, critical path delay, and design area. Results from two image processing applications also demonstrate that the images processed by our multipliers are of sufficient quality for such error-tolerant applications.
Song BIAN Masayuki HIROMOTO Takashi SATO
In this work, we provide the first practical secure email filtering scheme based on homomorphic encryption. Specifically, we construct a secure naïve Bayesian filter (SNBF) using the Paillier scheme, a partially homomorphic encryption (PHE) scheme. We first show that SNBF can be implemented with only the additive homomorphism, thus eliminating the need to employ expensive fully homomorphic schemes. In addition, the design space for a specialized hardware architecture realizing SNBF is explored. We utilize a recursive Karatsuba Montgomery structure to accelerate the homomorphic operations, in which multiplications of 2048-bit integers are carried out. In the experiment, both software and hardware versions of the SNBF are implemented. In software, SNBF achieves 10^4-10^5x runtime and 10^3x storage reduction compared to existing fully homomorphic approaches. By instantiating the designed hardware for SNBF, a further 33x runtime and 1919x power reduction are achieved. The proposed hardware implementation classifies an average-length email in under 0.5s, which is much more practical than existing solutions.
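The additive homomorphism that SNBF relies on can be illustrated with a toy Paillier instance: multiplying two ciphertexts modulo n² yields an encryption of the sum of the plaintexts. The primes, the fixed randomness values, and the function names below are illustrative assumptions; the parameters are far too small to be secure, real encryption draws fresh randomness for every ciphertext, and the paper's filter construction and 2048-bit hardware arithmetic are not shown.

```python
from math import gcd

P, Q = 293, 433                                # toy primes, insecure by design
N = P * Q
N2 = N * N
G = N + 1                                      # standard simple choice of generator
LAM = (P - 1) * (Q - 1) // gcd(P - 1, Q - 1)   # lcm(p - 1, q - 1)


def L(u):
    """The Paillier L function: L(u) = (u - 1) / n."""
    return (u - 1) // N


MU = pow(L(pow(G, LAM, N2)), -1, N)            # modular inverse (Python 3.8+)


def encrypt(m, r):
    """Encrypt m in [0, N) using randomness r, which must be coprime to N."""
    return (pow(G, m, N2) * pow(r, N, N2)) % N2


def decrypt(c):
    return (L(pow(c, LAM, N2)) * MU) % N


c1 = encrypt(15, 17)   # fixed r values keep the example reproducible
c2 = encrypt(27, 23)
total = decrypt((c1 * c2) % N2)  # decrypts to the sum of the two plaintexts
```

Because sums of encrypted values are all that is needed here, no ciphertext-by-ciphertext multiplication of plaintexts is required, which is what makes a partially homomorphic scheme sufficient.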
Tongxin YANG Tomoaki UKEZONO Toshinori SATO
Multiplication is a key fundamental function for many error-tolerant applications. Approximate multiplication is considered to be an efficient technique for trading off energy against performance and accuracy. This paper proposes an accuracy-controllable multiplier whose final product is generated by a carry-maskable adder. The proposed scheme can dynamically select the length of the carry propagation to satisfy the accuracy requirements flexibly. The partial product tree of the multiplier is approximated by the proposed tree compressor. An 8×8 multiplier design is implemented by employing the carry-maskable adder and the compressor. Compared with a conventional Wallace tree multiplier, the proposed multiplier reduced power consumption by between 47.3% and 56.2% and critical path delay by between 29.9% and 60.5%, depending on the required accuracy. Its silicon area was also 44.6% smaller. In addition, results from two image processing applications demonstrate that the quality of the processed images can be controlled by the proposed multiplier design.
The combination of large-scale antenna arrays and simultaneous wireless information and power transfer (SWIPT), which can provide an enormous increase in throughput and energy efficiency, is a promising key technology for next-generation wireless systems (5G). This paper investigates efficient transceiver design to minimize transmit power, subject to users' required data rates and energy harvesting, in a large-scale SWIPT system where the base station utilizes a very large number of antennas for transmitting both data and energy to multiple users equipped with time-switching (TS) or power-splitting (PS) receive structures. We first apply the well-known semidefinite relaxation (SDR) and Gaussian randomization techniques to solve the minimum transmit power problems. However, for these large-scale SWIPT problems, this scheme, which is based on the conventional SDR method, is not suitable due to its excessive computation costs, and a consensus alternating direction method of multipliers (ADMM) cannot be directly applied when TS or PS ratios are involved in the optimization problem. Therefore, in the second solution, our first step is to optimize the TS or PS ratios and thereby obtain simplified problems. We then propose fast algorithms for solving these problems, in which the outer loop of sequential parametric convex approximation (SPCA) is combined with the inner loop of ADMM. Numerical simulations show the fast convergence and superiority of the proposed solutions.
Shinsuke HARA Kosuke KATAYAMA Kyoya TAKANO Ruibing DONG Issei WATANABE Norihiko SEKINE Akifumi KASAMATSU Takeshi YOSHIDA Shuhei AMAKAWA Minoru FUJISHIMA
This paper presents low-noise amplifier (LNA)-less 300-GHz CMOS receivers that operate above the NMOS unity-power-gain frequency, f_max. The receivers consist of a down-conversion mixer with a doubler- or tripler-last multiplier chain that upconverts an LO_{1/n} signal into 300 GHz. The conversion gain of the receiver with the doubler-last multiplier is -19.5 dB and its noise figure, 3-dB bandwidth, and power consumption are 27 dB, 27 GHz, and 0.65 W, respectively. The conversion gain of the receiver with the tripler-last multiplier is -18 dB and its noise figure, 3-dB bandwidth, and power consumption are 25.5 dB, 33 GHz, and 0.41 W, respectively. The receivers achieve a wireless data rate of 32 Gb/s with 16QAM. This shows the potential of the moderate-f_max CMOS technology for ultrahigh-speed THz wireless communications.
Biao WANG Xiaopeng JIAO Jianjun MU Zhongfei WANG
By tracking the rate of change of hard decisions between every two consecutive iterations of alternating direction method of multipliers (ADMM) penalized decoding, an efficient early termination (ET) criterion is proposed to improve the convergence rate of the ADMM penalized decoder for low-density parity-check (LDPC) codes. Compared to the existing ET criterion for ADMM penalized decoding, the proposed method can significantly reduce the average number of iterations at low signal-to-noise ratios with negligible performance degradation.
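The quantity being tracked can be sketched as the fraction of hard decisions that flip between two consecutive iterations. The abstract does not say how this rate is turned into a stopping rule, so only the measurement is shown; the 0.5 decision threshold, the example values, and the function names are assumptions.

```python
def hard_decision(x):
    """Map soft values in [0, 1] to bits using a 0.5 threshold (assumed)."""
    return [1 if xi > 0.5 else 0 for xi in x]


def change_rate(prev_bits, cur_bits):
    """Fraction of hard decisions that differ between two consecutive iterations."""
    flips = sum(p != c for p, c in zip(prev_bits, cur_bits))
    return flips / len(cur_bits)


# Hypothetical soft outputs from two consecutive decoder iterations.
prev_iter = hard_decision([0.2, 0.7, 0.6, 0.1])
cur_iter = hard_decision([0.1, 0.9, 0.4, 0.0])
rate = change_rate(prev_iter, cur_iter)
```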