Takao WAHO Akihisa KOYAMA Hitoshi HAYASHI
Signal processing using delta-sigma modulated bit streams is reviewed, along with related topics in stochastic computing (SC). The basic signal processing circuits, adders and multipliers, are covered. In particular, the possibility of preserving the noise-shaping properties inherent in delta-sigma modulation during these operations is discussed. Finally, the root mean square error for addition and multiplication is evaluated, and the performance improvement of signal processing in the delta-sigma domain compared with SC is verified.
Yanyan CHANG Wei ZHANG Hao WANG Lina SHI Yanyan LIU
This letter introduces a prime-factor Galois field Fourier transform (PF-GFFT) architecture to frequency domain decoding (FDD) of cyclic codes. Firstly, a fast FDD scheme is designed which converts the original single longer Fourier transform to a multi-dimensional smaller transform. Furthermore, a ladder-shift architecture for PF-GFFT is explored to solve the rearrangement problem of input and output data. In this regard, PF-GFFT is considered as a lower order spectral calculation scheme, which has sufficient preponderance in reducing the computational complexity. Simulation results show that PF-GFFT compares favorably with the current general GFFT, simplified-GFFT (S-GFFT), and circular shifts-GFFT (CS-GFFT) algorithms in time-consuming cost, and is nearly an order of magnitude or smaller than them. The superiority is a benefit to improving the decoding speed and has potential application value in decoding cyclic codes with longer code lengths.
Kaizhan LIN Fangguo ZHANG Chang-An ZHAO
Supersingular isogeny Diffie-Hellman (SIDH) is attractive for its relatively small public key size, but it is still unsatisfactory due to its efficiency, compared to other post-quantum proposals. In this paper, we focus on the performance of SIDH when the starting curve is E6 : y2 = x3 + 6x2 + x, which is fixed in Round-3 SIKE implementation. Inspired by previous works [1], [2], we present several tricks to accelerate key generation of SIDH and each process of SIKE. Our experimental results show that the performance of this work is at least 6.09% faster than that of the SIKE implementation, and we can further improve the performance when large storage is available.
Dongyue JIN Luming CAO You WANG Xiaoxue JIA Yongan PAN Yuxin ZHOU Xin LEI Yuanyuan LIU Yingqi YANG Wanrong ZHANG
Fast switching speed, low power consumption, and good stability are some of the important properties of spin transfer torque assisted voltage controlled magnetic anisotropy magnetic tunnel junction (STT-assisted VCMA-MTJ) which makes the non-volatile full adder (NV-FA) based on it attractive for Internet of Things. However, the effects of process variations on the performances of STT-assisted VCMA-MTJ and NV-FA will be more and more obvious with the downscaling of STT-assisted VCMA-MTJ and the improvement of chip integration. In this paper, a more accurate electrical model of STT-assisted VCMA-MTJ is established on the basis of the magnetization dynamics and the process variations in film growth process and etching process. In particular, the write voltage is reduced to 0.7 V as the film thickness is reduced to 0.9 nm. The effects of free layer thickness variation (γtf) and oxide layer thickness variation (γtox) on the state switching as well as the effect of tunnel magnetoresistance ratio variation (β) on the sensing margin (SM) are studied in detail. Considering that the above process variations follow Gaussian distribution, Monte Carlo simulation is used to study the effects of the process variations on the writing and output operations of NV-FA. The result shows that the state of STT-assisted VCMA-MTJ can be switched under -0.3%≤γtf≤6% or -23%≤γtox≤0.2%. SM is reduced by 16.0% with β increases from 0 to 30%. The error rates of writing ‘0’ in the NV-FA can be reduced by increasing Vb1 or increasing positive Vb2. The error rates of writing ‘1’ can be reduced by increasing Vb1 or decreasing negative Vb2. The reduction of the output error rates can be realized effectively by increasing the driving voltage (Vdd).
Tomoyuki TANAKA Christopher L. AYALA Nobuyuki YOSHIKAWA
Extremely energy-efficient logic devices are required for future low-power high-performance computing systems. Superconductor electronic technology has a number of energy-efficient logic families. Among them is the adiabatic quantum-flux-parametron (AQFP) logic family, which adiabatically switches the quantum-flux-parametron (QFP) circuit when it is excited by an AC power-clock. When compared to state-of-the-art CMOS technology, AQFP logic circuits have the advantage of relatively fast clock rates (5 GHz to 10 GHz) and 5 - 6 orders of magnitude reduction in energy before cooling overhead. We have been developing extremely energy-efficient computing processor components using the AQFP. The adder is the most basic computational unit and is important in the development of a processor. In this work, we designed and measured a 16-bit parallel prefix carry look-ahead Kogge-Stone adder (KSA). We fabricated the circuit using the AIST 10 kA/cm2 High-speed STandard Process (HSTP). Due to a malfunction in the measurement system, we were not able to confirm the complete operation of the circuit at the low frequency of 100 kHz in liquid He, but we confirmed that the outputs that we did observe are correct for two types of tests: (1) critical tests and (2) 110 random input tests in total. The operation margin of the circuit is wide, and we did not observe any calculation errors during measurement.
Haichuan YANG Shangce GAO Rong-Long WANG Yuki TODO
In 2019, a completely new algorithm, spherical evolution (SE), was proposed. The brand new search style in SE has been proved to have a strong search capability. In order to take advantage of SE, we propose a novel method called the ladder descent (LD) method to improve the SE' population update strategy and thereafter propose a ladder spherical evolution search (LSE) algorithm. With the number of iterations increasing, the range of parent individuals eligible to produce offspring gradually changes from the entire population to the current optimal individual, thereby enhancing the convergence ability of the algorithm. Experiment results on IEEE CEC2017 benchmark functions indicate the effectiveness of LSE.
Jinghao YE Masao YANAGISAWA Youhua SHI
To solve the area and power problems in Finite Impulse Response (FIR) implementations, a faithfully truncated adder-based FIR design is presented in this paper for significant area and power savings while the predefined output accuracy can still be obtained. As a solution to the accuracy loss caused by truncated adders, a static error analysis on the utilization of truncated adders in FIRs was performed. According to the mathematical analysis, we show that, with a given accuracy constraint, the optimal truncated adder configuration for an area-power efficient FIR design can be effortlessly determined. Evaluation results on various FIR implementations by using the proposed faithfully truncated adder designs showed that up to 35.4% and 27.9% savings in area and power consumption can be achieved with less than 1 ulp accuracy loss for uniformly distributed random inputs. Moreover, as a case study for normally distributed signals, a fixed 6-tap FIR is implemented for electrocardiogram (ECG) signal filtering was implemented, in which even with the increased truncated bits up to 10, the mean absolute error (Ē) can be guaranteed to be less than 1 ulp while up to 29.7% and 25.3% savings in area and power can be obtained.
Jing SUN Yi-mu JI Shangdong LIU Fei WU
Software defect prediction (SDP) plays a vital role in allocating testing resources reasonably and ensuring software quality. When there are not enough labeled historical modules, considerable semi-supervised SDP methods have been proposed, and these methods utilize limited labeled modules and abundant unlabeled modules simultaneously. Nevertheless, most of them make use of traditional features rather than the powerful deep feature representations. Besides, the cost of the misclassification of the defective modules is higher than that of defect-free ones, and the number of the defective modules for training is small. Taking the above issues into account, we propose a cost-sensitive and sparse ladder network (CSLN) for SDP. We firstly introduce the semi-supervised ladder network to extract the deep feature representations. Besides, we introduce the cost-sensitive learning to set different misclassification costs for defective-prone and defect-free-prone instances to alleviate the class imbalance problem. A sparse constraint is added on the hidden nodes in ladder network when the number of hidden nodes is large, which enables the model to find robust structures of the data. Extensive experiments on the AEEEM dataset show that the CSLN outperforms several state-of-the-art semi-supervised SDP methods.
Tongxin YANG Toshinori SATO Tomoaki UKEZONO
Addition is a key fundamental function for many error-tolerant applications. Approximate addition is considered to be an efficient technique for trading off energy against performance and accuracy. This paper proposes a carry-maskable adder whose accuracy can be configured at runtime. The proposed scheme can dynamically select the length of the carry propagation to satisfy the quality requirements flexibly. Compared with a conventional ripple carry adder and a conventional carry look-ahead adder, the proposed 16-bit adder reduced the power consumption by 54.1% and 57.5%, respectively, and the critical path delay by 72.5% and 54.2%, respectively. In addition, results from an image processing application indicate that the quality of processed images can be controlled by the proposed adder. Good scalability of the proposed adder is demonstrated from the evaluation results using a 32-bit length.
Xuncheng ZOU Shigetoshi NAKATAKE
A low voltage stochastic flash ADC (analog-to-digital converter) is presented, with an inverter-based comparative unit which is used to replace comparator for comparison. Aiming at the low voltage and low power consumption, a key of our design is in the simplicity of the structure. The inverter-based comparative unit replacing a comparator enables us to decrease the number of transistors for area saving and power reduction. We insert the inverter-chain in front of the comparative unit for the signal stability and discuss an appropriate circuit structure for the resolution by analyzing three different ones. Finally, we design the whole stochastic flash ADC for verifying our idea, where the supply voltage can go down to 0.6V on the 65nm CMOS process, and through post-layout simulation result, we can observe its advantage visually in voltage, area and power consumption.
Toshinori SATO Tongxin YANG Tomoaki UKEZONO
Approximate computing is a promising paradigm to realize fast, small, and low power characteristics, which are essential for modern applications, such as Internet of Things (IoT) devices. This paper proposes the Carry-Predicting Adder (CPredA), an approximate adder that is scalable relative to accuracy and power consumption. The proposed CPredA improves the accuracy of a previously studied adder by performing carry prediction. Detailed simulations reveal that, compared to the existing approximate adder, accuracy is improved by approximately 50% with comparable energy efficiency. Two application-level evaluations demonstrate that the proposed approximate adder is sufficiently accurate for practical use.
Md Belayet ALI Takashi HIRAYAMA Katsuhisa YAMANAKA Yasuaki NISHITANI
In this paper, we propose a design of reversible adder/subtractor blocks and arithmetic logic units (ALUs). The main concept of our approach is different from that of the existing related studies; we emphasize the function design. Our approach of investigating the reversible functions includes (a) the embedding of irreversible functions into incompletely-specified reversible functions, (b) the operation assignment, and (c) the permutation of function outputs. We give some extensions of these techniques for further improvements in the design of reversible functions. The resulting reversible circuits are smaller than that of the existing design in terms of the number of multiple-control Toffoli gates. To evaluate the quantum cost of the obtained circuits, we convert the circuits to reduced quantum circuits for experiments. The results also show the superiority of our realization of adder/subtractor blocks and ALUs in quantum cost.
Tadashi KAWAI Kensuke NAGANO Akira ENOKIHARA
This paper presents a lumped-element Wilkinson power divider (WPD) using LC-ladder circuits composed of a capacitor and an inductor, and a series LR/CR circuit. The proposed WPD has only seven elements. As a result of designing the divider based on an even/odd mode analysis technique, we theoretically show that broadband WPDs can be realized compared to lumped-element WPDs composed of Π/T-networks and an isolation resistor. By designing the WPD to match at two operating frequencies, the relative bandwidth of about 42% can be obtained. This value is larger than that of the conventional WPD based on the distributed circuit theory. Electromagnetic simulation and experiment are performed to verify the design procedure for the lumped-element WPD designed at a center frequency of 922.5MHz, and good agreement with both is shown.
Ken HAYAMIZU Nozomu TOGAWA Masao YANAGISAWA Youhua SHI
Approximate computing is a promising solution for future energy-efficient designs because it can provide great improvements in performance, area and/or energy consumption over traditional exact-computing designs for non-critical error-tolerant applications. However, the most challenging issue in designing approximate circuits is how to guarantee the pre-specified computation accuracy while achieving energy reduction and performance improvement. To address this problem, this paper starts from the state-of-the-art general approximate adder model (GeAr) and extends it for more possible approximate design candidates by relaxing the design restrictions. And then a maximum-error-distance-based performance/accuracy formulation, which can be used to select the performance/energy-accuracy optimal design from the extended design space, is proposed. Our evaluation results show the effectiveness of the proposed method in terms of area overhead, performance, energy consumption, and computation accuracy.
Yosuke OKADA Tadashi KAWAI Akira ENOKIHARA
In this paper, we propose a design method of compact multi-way Wilkinson power divider with a multiband operation for size reduction and band broadening. The proposed divider consists of multisection LC-ladder circuits in the division arms and isolation circuits between the output ports. To validate design procedures, we fabricated a trial divider at VHF band. The circuit layout of the trial divider was decided by using an electromagnetic simulator (Sonnet EM). Because the proposed divider consists of lumped element circuits, we can realize great miniaturization of a circuit area compared to that of the conventional Wilkinson power divider. The circuit size of the trial divider is 35 mm square. The measurement results for the trial divider by using a vector network analyzer indicates a relative bandwidth of about 60% under -17 dB reflection, flat power division within ±0.1 dB, and very low phase imbalances under 1.0 degree over the wide frequency range.
Katsuhisa YAMANAKA Shin-ichi NAKANO
A ladder lottery, known as “Amidakuji” in Japan, is one of the most popular lotteries. In this paper, we consider the problems of enumeration, counting, and random generation of the ladder lotteries. For given two positive integers n and b, we give algorithms of enumeration, counting, and random generation of ladder lotteries with n lines and b bars. The running time of the enumeration algorithm is O(n + b) time for each. The running time of the counting algorithm is O(nb3) time. The random generation algorithm takes O(nb3) time for preprocess, and then it generates a ladder lottery in O(n + b) for each uniformly at random.
Pao-Lung CHEN Da-Chen LEE Wei-Chia LI
This work presents a novel counter-based randomization method for use in a flying-adder frequency synthesizer with a cost-effective structure that can replace the fractional accumulator. The proposed technique involves a counter, a comparator and a modified linear feedback shift register. The power consumption and speed bottleneck of the conventional flying-adder are significantly reduced. The modified linear shift feedback register is used as a pseudo random data generator, suppressing the spurious tones arise from the periodic carry sequences that is generated by the fractional accumulator. Furthermore, the proposed counter-based randomization method greatly reduces the large memory size that is required by the conventional approach to carry randomization. A test chip for the proposed counter-based randomization method is fabricated in the TSMC 0.18,$mu $m 1P6M CMOS process, with the core area of 0.093,mm$^{mathrm{2}}$. The output frequency had a range of 43.4,MHz, extasciitilde 225.8,MHz at 1.8,V with peak-to-peak jitter (Pk-Pk) jitter 139.2,ps at 225.8,MHz. Power consumption is 2.8,mW @ 225.8,MHz with 1.8 supply voltage.
Chin-Long WEY Ping-Chang JUI Muh-Tian SHIUE
A constant multiplier performs a multiplication of a data-input with a constant value. Constant multipliers are essential components in various types of arithmetic circuits, such as filters in digital signal processor (DSP) units, and they are prevalent in modern VLSI designs. This study presents an efficient algorithm and fast hardware implementation for performing multiply-by-(1+2k) operation with additions. No multiplications are needed. The value of (1+2k)N can be computed by adding N to its k-bit left-shifted value 2kN. The additions can be performed by the full-adder-based (FA-based) ripple carry adder (RCA) for simple architecture. This paper introduces the unit cells for additions (UCAs) to construct the UCA-based RCA which achieves 35% faster than the FA-based RCA in speed performance. Further, in order to improve the speed performance, a simple and modular hybrid adder is presented with the proposed UCA concept, where the carry lookahead adder (CLA) as a module and many of the CLA modules are serially connected in a fashion similar to the RCA. Results show that the hybrid adder significantly improves the speed performance.
Yinan SUN Yongpan LIU Zhibo WANG Huazhong YANG
Function speculation design with error recovery mechanisms is quite promising due to its high performance and low area overhead. Previous work has focused on two-stage function speculation and thus lacks a systematic way to address the challenge of the multistage function speculation approach. This paper proposes a multistage function speculation with adaptive predictors and applies it in a novel adder. We deduced the analytical performance and area models for the design and validated them in our experiments. Based on those models, a general methodology is presented to guide design optimization. Both analytical proofs and experimental results on the fabricated chips show that the proposed adder's delay and area have a logarithmic and linear relationship with its bit number, respectively. Compared with the DesignWare IP, the proposed adder provides the same performance with 6-17% area reduction under different bit lengths.
It is shown that an infinite lumped-element LC- ladder network generates all Bessel functions Jn(t) of the first kind as a response to a single non-zero initial condition. Closed-form expressions for the voltage responses in the time domain are presented if the LC- ladder is driven by a step-like input voltage.