Kyo TAKAHASHI Shingo SATO Tadamichi KUDO Yoshitaka TSUNEKAWA
In this report, we propose a high-performance pipelined VLSI architecture for the LMS adaptive filter, derived by a cut-set retiming technique. The proposed architecture has a distinctive pipelined form with 3 adaptation delays, and the FIR filter portion uses a special class of the transposed form that provides minimum output latency and coefficient delay. Both delays, the adaptation delay and the coefficient delay, are compensated by a look-ahead conversion. A new high-speed 4-input, 2-output CSA-type adder with low hardware cost is employed. The proposed architecture simultaneously achieves good convergence, a high sampling rate, minimum output latency, small hardware, and low power dissipation, and is well suited to VLSI implementation.
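To make the notion of an adaptation delay concrete, the following is a minimal software sketch of a plain delayed LMS update, in which the coefficient update uses an error that is `delay` samples old. It does not model the look-ahead compensation, the retimed pipeline, or the CSA adder described above; the function names, the step size `mu`, and the test system are illustrative assumptions.

```python
import random


def fir(x, h):
    """Reference FIR filter used to generate a desired signal (assumed system)."""
    hist = [0.0] * len(h)
    out = []
    for xn in x:
        hist = [xn] + hist[:-1]
        out.append(sum(a * b for a, b in zip(h, hist)))
    return out


def delayed_lms(x, d, num_taps, mu, delay):
    """Delayed LMS: w(n+1) = w(n) + mu * e(n - delay) * x(n - delay)."""
    w = [0.0] * num_taps
    taps = [0.0] * num_taps      # most recent input sample first
    pending = []                 # (error, tap vector) pairs awaiting use
    for xn, dn in zip(x, d):
        taps = [xn] + taps[:-1]  # new list each step, safe to store
        y = sum(wi * ti for wi, ti in zip(w, taps))
        pending.append((dn - y, taps))
        if len(pending) > delay:
            # Apply the update computed 'delay' samples ago.
            e_old, taps_old = pending.pop(0)
            w = [wi + mu * e_old * ti for wi, ti in zip(w, taps_old)]
    return w
```

With `delay=0` the loop reduces to the ordinary LMS update; larger delays generally require a smaller step size for stable convergence, which is the effect the look-ahead conversion is meant to offset.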
A new Hybrid-Carry-Selection (HCS) approach for deriving an efficient modulo $2^n-1$ addition is presented in this study. The resulting adder architecture is simple and applicable to all values of $n$. Based on 180-nm CMOS technology, the HCS-based modulo $2^n-1$ adder demonstrates its superiority in Area-Time (AT) performance over existing solutions.
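For readers unfamiliar with modulo $2^n-1$ addition, the sketch below shows the textbook end-around-carry formulation in software. It is not the HCS architecture, whose details the abstract does not give; the function name is an illustrative assumption.

```python
def add_mod_2n_minus_1(a, b, n):
    """Add two n-bit residues modulo 2**n - 1 using an end-around carry."""
    mask = (1 << n) - 1
    s = a + b
    # 2**n is congruent to 1, so the carry-out is added back at the LSB.
    s = (s & mask) + (s >> n)
    # The all-ones pattern is a second representation of zero; normalise it.
    return 0 if s == mask else s
```

The fold can never produce a second carry, because the folded sum is at most $2^n-1$ whenever the inputs fit in $n$ bits.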
Taeko MATSUNAGA Yusuke MATSUNAGA
This paper addresses parallel prefix adder synthesis that targets area minimization under given bitwise timing constraints. The problem is treated as one of synthesizing prefix graphs, which represent the global structures of parallel prefix adders at the technology-independent level, and a two-stage algorithm to minimize the area of prefix graphs is proposed. The first stage is dynamic programming based area minimization (DPAM), which focuses on a specific subset of prefix graphs and finds an exact minimum solution for that subset by dynamic programming. The subset is defined by imposing some restrictions on the structures of prefix graphs. By exploiting these restrictions, DPAM can find the minimum solutions efficiently for practical bit widths. The second stage is area reduction with re-structuring (ARRS), which removes the imposed restrictions and restructures the result of DPAM for further area reduction while satisfying the timing constraints. Experimental results show that a smaller area can be achieved than with existing methods, both at the prefix graph level and at the gate level.
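As background on what a prefix graph computes, the sketch below evaluates one fixed and well-known prefix structure (Kogge-Stone) in software. Each graph node combines (generate, propagate) pairs. This is not DPAM or ARRS, which search over graph structures; the function name is an illustrative assumption.

```python
def prefix_add(a, b, width):
    """Add two width-bit unsigned integers with a Kogge-Stone prefix graph."""
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]    # bit generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]  # bit propagate
    G, P = g[:], p[:]
    dist = 1
    while dist < width:
        new_G, new_P = G[:], P[:]
        for i in range(dist, width):
            # Prefix operator: (G, P) o (G', P') = (G | P & G', P & P')
            new_G[i] = G[i] | (P[i] & G[i - dist])
            new_P[i] = P[i] & P[i - dist]
        G, P = new_G, new_P
        dist *= 2
    total = 0
    for i in range(width):
        carry_in = G[i - 1] if i > 0 else 0  # G[i-1] is the carry into bit i
        total |= (p[i] ^ carry_in) << i
    return total | (G[width - 1] << width)   # append the carry-out
```

Other prefix graphs use the same node operator but connect the nodes differently, trading node count (area) against depth (delay), which is the design space the abstract refers to.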
Yukio TAKAHASHI Ryo ISHIKAWA Kazuhiko HONJO
Distortion characteristics caused by the thermal memory effect in power amplifiers were accurately predicted using a multi-stage thermal RC-ladder network derived by simplifying the heat diffusion equation. Assuming a steep gradient of heat diffusion near an intrinsic transistor region in a semiconductor substrate, the steady state temperature, as well as the transient thermal response at the transistor region, was estimated. The thermal resistances and thermal capacitances were adjusted to fit a temperature distribution characteristic and a step response characteristic of temperature in the substrate. These thermal characteristics were calculated by thermal FDTD simulation. For an InGaP/GaAs HBT, a step response characteristic for a square-wave voltage signal input was simulated using a large-signal model of the HBT connecting the multi-stage thermal RC-ladder network. The result was verified experimentally. Additionally, for an RF-amplifier using the HBT, the 3rd-order intermodulation distortion caused by the thermal memory effect was simulated and this result was also verified experimentally. From these verifications, a multi-stage thermal RC-ladder network can be used to accurately design super linear microwave power amplifiers and linearizers.
Hiroaki SUZUKI Woopyo JEONG Kaushik ROY
Demand for low-power VLSI has been pushing the development of aggressive design methodologies that drastically reduce power consumption. To meet this demand, we propose low-power adders that adaptively select supply voltages based on the input vector patterns. First, we apply the proposed scheme to the Ripple Carry Adder (RCA). A prototype design in a 0.18 µm CMOS technology shows that the Adaptive VDD 32-bit RCA achieves a 25% power improvement over the conventional RCA at similar speed. The proposed adder cancels out the delay penalty by using two techniques: carry-skip techniques on the checking operands, and Complementary Pass Transistor Logic (CPL) with dual supply voltage for level conversion. To cover faster adder architectures, we extend the proposal to Carry-Select Adders (CSA) composed of RCA sub-blocks. We achieved a 24% power improvement on the 128-bit CSA prototype over a conventional design. The proposed scheme also reduces stand-by leakage power: by 62% for the 32-bit Adaptive RCA and by 54% for the 128-bit Adaptive CSA.
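The structure of a carry-select adder built from RCA sub-blocks can be sketched behaviourally as follows. The sketch covers only the conventional arithmetic; the adaptive supply-voltage selection, carry-skip checking, and CPL level conversion are circuit-level features that it does not model. Function names and the `block` parameter are illustrative assumptions.

```python
def ripple_carry_add(a, b, carry_in, width):
    """Bit-serial ripple-carry addition; returns (sum, carry_out)."""
    total, carry = 0, carry_in
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        total |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return total, carry


def carry_select_add(a, b, width, block):
    """Carry-select adder: each block precomputes both carry-in cases."""
    result, carry = 0, 0
    for lo in range(0, width, block):
        w = min(block, width - lo)
        mask = (1 << w) - 1
        xa, xb = (a >> lo) & mask, (b >> lo) & mask
        sum0, c0 = ripple_carry_add(xa, xb, 0, w)  # assume carry-in = 0
        sum1, c1 = ripple_carry_add(xa, xb, 1, w)  # assume carry-in = 1
        # The real carry from the previous block selects one of the results.
        s, carry = (sum1, c1) if carry else (sum0, c0)
        result |= s << lo
    return result | (carry << width)
```

In hardware the two candidate sums of every block are computed in parallel, so only the selecting multiplexers lie on the carry path between blocks.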
Debatosh DEBNATH Tsutomu SASAO
This paper presents a design method for three-level programmable logic arrays (PLAs), which have input decoders and two-input EXOR gates at the outputs. The PLA realizes an EXOR of two sum-of-products expressions (EX-SOP) for multiple-valued input two-valued output functions. We developed an output phase optimization method for EX-SOPs in which some outputs of the function are minimized in the complemented form, and we present techniques to minimize EX-SOPs for adders by using an extension of Dubrova-Miller-Muzio's AOXMIN algorithm. For large adders with two-valued inputs, the proposed algorithm produces solutions with half as many products as an AOXMIN-like algorithm in 1/250 of the time. We also proved that an $n$-bit adder with two-valued inputs requires at most $3 \cdot 2^{n-2} + 7n - 5$ products in an EX-SOP, while it is known that a sum-of-products expression (SOP) requires $6 \cdot 2^{n} - 4n - 5$ products.
Tso-Bing JUANG Shen-Fu HSIAO Ming-Yu TSAI Jenq-Shiun JAN
In this paper, a cell-driven multiplier generator is developed that can produce high-performance gate-level netlists for multiplier-related arithmetic functional units, including multipliers, multiply-accumulate (MAC) units, and dot-product calculators. The generator optimizes the speed/area performance both in the partial product compression and in the final addition stage for the specified process technology. In addition to the conventional CMOS full adder cells, we have also designed fast compression elements based on pass-transistor logic to further improve the performance of the generated multipliers. Simulation results show that the proposed generator produces better multiplier-related functional units than those generated using the Synopsys DesignWare library or other previously proposed approaches.
Jeong-Gun LEE Jeong-A LEE Suk-Jin KIM Kiseon KIM
A mutated adder architecture utilizing a mixture of carry propagation schemes is proposed to design delay-area efficient adders that are not available in the ordinary design space. Further, we develop an optimization method based on integer linear programming to search the expanded design space of the mutated adder.
Kuo-Hsing CHENG Shun-Wen CHENG
The conditional sum adder (CSA) has been shown to outperform other adders in high-speed applications. This investigation proposes a modified CSA called the conditional carry adder (CCA). Based on the proposed adder architecture, six 64-bit hybrid dual-threshold CCAs for power-aware applications are discussed. The architectural modification of the CCA raises the operation speed, decreases the power dissipation, and lowers the hardware overhead. The proposed 64-bit CCA decreases the number of multiplexers and internal nodes in the adder design by around 27% compared to the 64-bit CSA. Furthermore, components on critical paths use a low threshold voltage to speed up operation, and other components use the normal threshold voltage to save power. This feature is very useful in implementing power-aware arithmetic systems. One of the proposed circuits has the lowest power-delay product and energy-delay product. The hybrid circuit represents a fine compromise between power and performance. Its power efficiency is better than that of the single threshold voltage circuit designs.
Jianxiao CHEN Tetsuya KAWANISHI Kaoru HIGUMA Satoshi SHINADA William S.C. CHANG Masayuki IZUTSU Paul K.L. YU
This paper presents a proposal for a novel integrated tunable coupler device called programmable coupler ladder, based on Titanium diffused lithium niobate waveguide and Y-junction reflector. Unlike the traditional serial to parallel converter, the coupler ladder sorts the output bits in the time axis using a built-in delay waveguide. With a proper control signal it can perform signal processing at the bit level. It also can generate coherent multi-channel outputs with theoretically arbitrary amplitude and phase from continuous input light source. Its application in optical microwave beam forming is briefly described. The key component, built-in delay line based on Y-junction reflector, has been experimentally verified via a loop resonator structure. 1 dB loss is found for each Y-junction reflector, which enables a practical coupler ladder. The loop itself is also an important device for optical signal processing.
Man Long HER Kun Ying LIN Yi Chyun CHIOU Chih Yuan HSIEH
In this study, an improved heterojunction bipolar transistor (HBT) monolithic microwave integrated circuit (MMIC) active mixer is designed and fabricated. The HBT MMIC active mixer that is integrated with a low-noise amplifier (LNA) and active power adder can not only achieve high isolation, but can also dispense with one active component and reduce power consumption at the same time. Measurement results show that the conversion gain, LO-RF isolation, and double sideband noise figure (DSB-NF) of the proposed mixer are 22 dB, 40 dB, and 7 dB, respectively.
Ioannis M. THOIDIS Dimitrios SOUDRIS Adonios THANAILAKIS
Novel designs of a multiple-valued logic (quaternary) half adder, full adder, and carry-lookahead adder are introduced. The proposed circuits are static and operate in voltage mode. Moreover, there is no current flow in steady states, and thus no static power dissipation. Although a comparison of transistor counts shows that each proposed quaternary circuit is larger than the two binary circuits it replaces, the use of multiple-valued logic brings benefits in parallel addition. Firstly, ripple-carry additions are faster because the number of carries is half that of the binary case and the propagation delay from the input carry to the output carry is relatively small. Secondly, the carry-lookahead scheme is less complex, which leads to an overall reduction in transistor count for additions with a large number of bits.
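The claim that a quaternary ripple-carry adder has half as many carries as a binary one can be checked with a small digit-level sketch. It models only the arithmetic, not the voltage-mode circuits; the function names and the `radix_bits` parameter are illustrative assumptions.

```python
def to_digits(value, radix, count):
    """Split a non-negative integer into 'count' digits, least significant first."""
    digits = []
    for _ in range(count):
        digits.append(value % radix)
        value //= radix
    return digits


def ripple_add(a, b, bits, radix_bits):
    """Ripple-carry addition in radix 2**radix_bits.

    Returns (sum, number_of_carry_stages)."""
    radix = 1 << radix_bits
    count = -(-bits // radix_bits)  # ceiling division: digits per operand
    total, carry = 0, 0
    digit_pairs = zip(to_digits(a, radix, count), to_digits(b, radix, count))
    for i, (da, db) in enumerate(digit_pairs):
        s = da + db + carry
        total += (s % radix) * radix ** i
        carry = s // radix          # the carry is always 0 or 1
    return total + carry * radix ** count, count
```

For 16-bit operands, `radix_bits=1` gives 16 carry stages and `radix_bits=2` gives 8, while both produce the same sum.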
Chan-Ho PARK Byung-Soo CHOI Suk-Jin KIM Eun-Gu JUNG Dong-Ik LEE
This paper presents a new asynchronous multiplier. The original array structure is divided into two asymmetric arrays, called an upper array and a lower array. For the lower array, a left-to-right scheme is applied to obtain both fast computation and low power consumption. Simulation results show that the proposed multiplier achieves a 40% performance improvement with relatively lower power consumption. The multiplier has been implemented in a 0.35 µm CMOS technology and proved functionally correct.
Yasuhiro TAKAHASHI Kei-ichi KONTA Kazukiyo TAKAHASHI Michio YOKOYAMA Kazuhiro SHOUNO Mitsuru MIZUNUMA
This paper describes the design of a Carry Propagation Free Adder/Subtracter (CPFA/S) VLSI using the Adiabatic Dynamic CMOS Logic (ADCL) circuit technology. Using a PSPICE simulator, the energy dissipation of the ADCL 1-bit CPFA/S is compared with that of the CMOS 1-bit CPFA/S. As a result, the energy dissipation of the proposed ADCL circuits is about 1/3 of that of the CMOS circuits. The transistor count, propagation-delay time, and energy dissipation of the ADCL 4-bit CPFA/S are compared with those of the ADCL 4-bit Ripple Carry Adder/Subtracter (RCA/S). The transistor count and propagation-delay time are found to be reduced by 7.02% and 57.1%, respectively. Also, energy dissipation is found to be reduced by 78.4%. Circuit operation and performance are evaluated using a chain of ADCL 1-bit CPFA/S circuits fabricated in a 1.2 µm CMOS process. The experimental results show that addition and subtraction operate correctly at clock frequencies up to about 1 MHz. In addition, the total power dissipation of the ADCL 1-bit CPFA/S is 28.7 µW including the power supply.
A pass-transistor logic is enhanced with a bootstrap configuration for sub-1 V operation at high speed and low power. The bootstrap configuration drives the output to full swing, which accelerates the signal transition and cuts off the short-circuit current of subsequent CMOS logic gates. The asynchronous or synchronous timing sequence of the input (drain) and the control (gate) signals ensures bootstrap operation. A 1-b arithmetic logic unit (ALU) and an EXNOR gate built with the bootstrap pass-transistor logic outperform those built with other types of pass-transistor logic. An experimental 16-b pass-transistor adder operates down to 0.4 V, with a delay time of 4.2 ns and a power dissipation of 2.8 µW/MHz at 0.5 V.
Kunitoshi KOMATSU Kaoru SEZAKI
Compatibility of conventional lossless discrete cosine transforms (LDCTs) with the discrete cosine transform (DCT) is not high due to rounding operations. In this paper, we design an LDCT which has high compatibility with the DCT. We first design an 8-point DCT (DCT3) by changing the row order of the transform matrix and the way the DCT is decomposed, in order to obtain an 8-point LDCT which has high compatibility with the DCT. Next we design an 8×8-point nonseparable 2D LDCT based on a 4-point lossless Walsh-Hadamard Transform (LWHT) which is multiplier-free. The DCT3 is used in designing the nonseparable 2D LDCT. Simulation results show that the compatibility of the nonseparable 2D LDCT with the separable 2D DCT is high. We also design an 8×8-point nonseparable 2D LWHT which is multiplier-free and show that its compatibility with the separable 2D Walsh-Hadamard Transform is high.
In this paper, a high-performance 32×32-bit multiplier for a DSP core is proposed. The multiplier is composed of a Booth encoder block, a data compression block, and a 64-bit adder block. In the Booth encoder block, a conditional sign decision Booth encoder that reduces the gate delay and power consumption is proposed. In the data compression block, 4-2 and 9-2 data compressors based on a novel compound logic are used for efficient compression of the extra sign bit. In the 64-bit adder block, an adaptive MUX-based conditional select adder with a separated carry generation block is proposed. The proposed 32×32-bit multiplier is designed by a full-custom method and contains about 28,000 transistors in an active area of 900 µm × 500 µm in 0.25 µm CMOS technology. From the experimental results, the multiplication time of the multiplier is about 3.2 ns at a 2.5 V power supply, and it consumes about 50 mW at 100 MHz.
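As background for the Booth encoder block, the sketch below shows standard radix-4 (modified) Booth recoding in software. The abstract does not state the radix or the details of the conditional sign decision encoder, so radix 4 and all function names here are assumptions made for illustration only.

```python
def booth_radix4_digits(multiplier, width):
    """Recode a width-bit two's-complement multiplier into radix-4 digits.

    Each digit lies in {-2, -1, 0, 1, 2}; 'width' must be even."""
    if width % 2:
        raise ValueError("width must be even")
    bits = (multiplier & ((1 << width) - 1)) << 1  # append y[-1] = 0
    digits = []
    for i in range(0, width, 2):
        y_prev = (bits >> i) & 1        # y[i-1]
        y_cur = (bits >> (i + 1)) & 1   # y[i]
        y_next = (bits >> (i + 2)) & 1  # y[i+1]
        digits.append(y_prev + y_cur - 2 * y_next)
    return digits


def booth_multiply(a, b, width):
    """Multiply a by the width-bit signed b by summing Booth partial products."""
    return sum(d * a * 4 ** k
               for k, d in enumerate(booth_radix4_digits(b, width)))
```

Under this assumption a 32-bit multiplier yields 16 partial products instead of 32, which is what makes the subsequent compression stage smaller.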
Woo-Chan PARK Cheol-Ho JEONG Tack-Don HAN
The format conversion operations between a floating-point number and an integer number and a round operation are the important standard floating-point operations. In most cases, these operations are implemented by adding additional hardware to the floating-point adder. The SR (simultaneous rounding) method, one of the techniques used to improve the performance of the floating-point adder, can perform addition and rounding operations at the same stage and is an efficient method with respect to the silicon area and its performance. In this paper, a hardware model to execute CRops (conversion and rounding operations) for the SR floating-point adder is presented and CRops are analyzed on the proposed hardware model. Implementation details are also discussed. The proposed scheme can maintain the advantages of the SR method and can perform each CRop with three pipeline stages.
Hiromitsu KIMURA Takahiro HANYU Michitaka KAMEYAMA
A new logic-in-memory circuit is proposed for a fine-grain pipelined VLSI system. Dynamic-storage elements are distributed over a logic-circuit plane. A functional pass gate is a key component, where a linear summation and threshold function are merged compactly using charge-storage and charge-coupling effect with a DRAM-cell-based circuit structure. The use of dynamic logic based on pass-transistor network using functional pass gates makes it possible to realize any logic circuits compactly with small power dissipation. As a typical example, a 54-bit pipelined multiplier is implemented by using the proposed circuit technology. Its power dissipation and chip area are reduced to about 63 percent and 72 percent, respectively, in comparison with those of a corresponding binary CMOS implementation under 0.35-µm CMOS technology.
Thanyapat SAKUNKONCHAK Sawasd TANTARATANA
In this paper, we propose a high-speed multiplier-free realization that uses ROMs to store the results of coefficient scalings, combined with a higher signal rate and pipelined operations, so that no hardware multipliers are needed. By varying some parameters, the proposed structure provides various combinations of hardware and clock speed (or throughput). Examples are given comparing the proposed realization with the distributed arithmetic (DA) realization and the direct-form realization with power-of-two coefficients. Results show that, with proper choices of the parameters, the proposed structure achieves a faster processing speed with less hardware than the DA realization, while it is much faster than the direct form with slightly more hardware.
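The basic idea of replacing multipliers with ROM lookups of pre-scaled values can be sketched as follows. The sketch uses one full-width table per coefficient and ignores the higher signal rate, pipelining, and parameter trade-offs of the proposed structure; the function names and the integer coefficient format are illustrative assumptions.

```python
def build_scaling_rom(coefficient, input_bits):
    """ROM contents: coefficient * x for every signed input_bits-wide x.

    The products are computed once, offline; none are computed at run time."""
    lo = -(1 << (input_bits - 1))
    hi = 1 << (input_bits - 1)
    return {x: coefficient * x for x in range(lo, hi)}


def fir_with_roms(samples, coefficients, input_bits):
    """FIR filtering in which each tap reads its product from a ROM."""
    roms = [build_scaling_rom(c, input_bits) for c in coefficients]
    history = [0] * len(coefficients)  # most recent sample first
    out = []
    for x in samples:
        history = [x] + history[:-1]
        # Only lookups and additions are needed per output sample.
        out.append(sum(rom[h] for rom, h in zip(roms, history)))
    return out
```

A full-width table grows as 2 to the power `input_bits` per coefficient, so this direct form is practical only for narrow inputs.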