Mohamed ABBAS Makoto IKEDA Kunihiro ASADA
In modern CMOS digital design, the noise immunity has come to have an almost equal importance to the power consumption. In the last decade, many low power design schemes have been presented. However, no one can simply judge which one is the best from the noise immunity point of view. In this paper, we investigate the noise immunity of the static CMOS low power design schemes in terms of logic and delay errors caused by different kinds of noise existing in the static CMOS digital circuits. To fulfill the aims of the paper, first a model representing the different sources of noise in deep submicron design is presented. Then the model is applied to the most famous low power design schemes to find out the most robust one with regard to noise. Our results show the advantages of the dual threshold voltage scheme over other schemes from the noise immunity point of view. Moreover, it indicates that noise should be carefully taken into account when designing low power circuits; otherwise circuit performance would be unexpected. The study is carried out on three circuits; each is designed in five different schemes. The analysis is done using HSPICE, assuming 0.18 µm CMOS technology.
Tetsuya IIZUKA Makoto IKEDA Kunihiro ASADA
This paper proposes flat and hierarchical approaches for generating a minimum-width transistor placement of CMOS cells in presence of non-dual P and N type transistors. Our approaches are the first exact method which can be applied to CMOS cells with any types of structure. Non-dual CMOS cells occupy a major part of an industrial standard-cell library. To generate the exact minimum-width transistor placement of non-dual CMOS cells, we formulate the transistor placement problem into Boolean Satisfiability (SAT) problem considering the P and N type transistors individually. Using the proposed method, the transistor placement problem of any types of CMOS cells can be solved exactly. In addition, the experimental results show that our flat approach generates smaller width placement for 29 out of 103 dual cells than that of the conventional method. Our hierarchical approach reduces the runtimes drastically. Although this approach has possibility to generate wider placements than that of the flat approach, the experimental results show that the width of only 3 out of 147 cells solved by our hierarchical approach are larger than that of the flat approach.
Tetsuya IIZUKA Meikan CHIN Toru NAKURA Kunihiro ASADA
This paper proposes a reference-clock-less quick-start-up CDR that resumes from a stand-by state only with a 4-bit preamble utilizing a phase generator with an embedded Time-to-Digital Converter (TDC). The phase generator detects 1-UI time interval by using its internal TDC and works as a self-tunable digitally-controlled delay line. Once the phase generator coarsely tunes the recovered clock period, then the residual time difference is finely tuned by a fine Digital-to-Time Converter (DTC). Since the tuning resolution of the fine DTC is matched by design with the time resolution of the TDC that is used as a phase detector, the fine tuning completes instantaneously. After the initial coarse and fine delay tuning, the feedback loop for frequency tracking is activated in order to improve Consecutive Identical Digits (CID) tolerance of the CDR. By applying the frequency tracking architecture, the proposed CDR achieves more than 100bits of CID tolerance. A prototype implemented in a 65nm bulk CMOS process operates at a 0.9-2.15Gbps continuous rate. It consumes 5.1-8.4mA in its active state and 42μA leakage current in its stand-by state from a 1.0V supply.
Toru NAKURA Makoto IKEDA Kunihiro ASADA
This paper compares a stub and a decoupling capacitor for power supply noise reduction. A quarter-length stub attached to the power supply line of an LSI chip works as a band-eliminate filter, and suppresses the power supply bounce of the designed frequency. The conditions where the stub is more effective than the same-area decoupling capacitor are clarified. The stub will work more efficiently and on-chip integration will be possible on high frequency operation LSIs.
Benjamin DEVLIN Makoto IKEDA Kunihiro ASADA
In this paper we show that self synchronous circuits can provide robust operation in both soft error prone and low voltage operating environments. Self synchronous circuits are shown to be self checking, where a soft error will either cause a detectable error or halt operation of the circuit. A watchdog circuit is proposed to autonomously detect dual-rail '11' errors and prevent propagation, with measurements in 65 nm CMOS showing seamless operation from 1.6 V to 0.37 V. Compared to a system without the watchdog circuit size and energy-per-operation is increased 6.9% and 16% respectively, while error tolerance to noise is improved 83% and 40% at 1.2 V and 0.4 V respectively. A circuit that uses the dual-pipeline circuit style as redundancy against permanent faults is also presented and 40 nm CMOS measurement results shows correct operation with throughput of 1.2 GHz and 810 MHz at 1.1 V before and after disabling a faulty pipeline stage respectively.
Some CMOS gates are topologically asymmetric in inputs, even though they are logically symmetric. It implies a possibility to reduce power consumption by optimizing signal assignment to the inputs. In this study we theoretically derive power consumption of 2-input NAND gate based on transition probability of input signals, with taking into account charging current due to an internal node. We also propose a signal assignment method to input terminals for reducing power consumption reduction by extending our method for large circuits, and demonstrate the effect of power consumption reduction by the present method.
Kazutoshi KODAMA Tetsuya IIZUKA Toru NAKURA Kunihiro ASADA
This paper proposes a high frequency resolution Digitally-Controlled Oscillator (DCO) using a single-period control bit switching scheme. The proposed scheme controls the tuning word of DCO in a single period for the fine frequency tuning. The LC type DCO is implemented to realize the proposed scheme, and is fabricated using a standard 65 nm CMOS technology. The measurement results show that the implemented DCO improves the frequency resolution from 560 kHz to 180 kHz without phase noise degradation with an additional area of 200 µm2.
In this paper, a high performance 3232-bit multiplier for a DSP core is proposed. The multiplier is composed of a block of Booth Encoder, a block of data compression, and a block of a 64-bit adder. In the block of Booth encoder, a conditional sign decision Booth encoder that reduces the gate delay and power consumption is proposed. In the block of data compression, 4-2 and 9-2 data compressors based on a novel compound logic are used for the efficient compressing of extra sign bit. In the block of 64-bit adder, an adaptive MUX-based conditional select adder with a separated carry generation block is proposed. The proposed 3232-bit multiplier is designed by a full-custom method and there are about 28,000 transistors in an active area of 900 µm 500 µm with 0.25 µm CMOS technology. From the experimental results, the multiplication time of the multiplier is about 3.2 ns at 2.5 V power supply, and it consumes about 50 mW at 100 MHz.
Toru NAKURA Makoto IKEDA Kunihiro ASADA
This paper demonstrates an on-chip di/dt detector circuit. The di/dt detector circuit consists of a power supply line, an underlying spiral inductor and an amplifier. The mutual inductor induces a di/dt proportional voltage, and the amplifier amplifies and outputs the value. The measurement results show that the di/dt detector output and the voltage difference between a resistor have good agreement. The di/dt reduction by a decoupling capacitor is also measured using the di/dt detector.
Currently, a typical 5454 bit multiplier is composed of a parallel structured architecture with the encoder block to implement the Modified Booth's algorithm, a block to implement the data compression, and a 108-bit Carry Look-Ahead (CLA) adder. The key idea in the present paper is a power optimization for the data compressors based on a Window Detector. The role of the Window Detector is detecting the input data, activating a selected operation unit, choosing the optimized output data, and driving the next stage. It can reduce the power consumption drastically because only one selected operation unit (a Window) is activated. The power consumption of the proposed data compressors is reduced by about 33%, compared with that of the conventional multiplier; while the propagation delay is nearly same as that of the conventional one. Furthermore, the power consumption dependent on the input data transition is shown for both the static CMOS logic and the nMOS pass transistor logic.
Yusuke OIKE Makoto IKEDA Kunihiro ASADA
We propose a high-sensitivity and wide-dynamic-range position sensor using logarithmic-response and correlation circuit. The 3-D measurement system using the proposed position sensor has advantages to applications, for example a walking robot and a recognition system on vehicles, which require both of availability in various backgrounds and safe light projection for human eyes. The position sensor with a 64 64 pixel array has been developed and successfully tested. We describe the sensitivity of position detection as SBR (Signal-to-Background Ratio). The minimum SBR of the sensor is -13.9 dB lower than standard sensors. High sensitivity under -10 dB SBR is realized in a dynamic range of 41.7 dB in terms of background illumination. Experimental results of position detection and 3-D measurement in a strong background illumination are also presented.
This paper demonstrates a pulse width controlled PLL without using an LPF. A pulse width controlled oscillator accepts the PFD output where its pulse width controls the oscillation frequency. In the pulse width controlled oscillator, the input pulse width is converted into soft thermometer code through a time to soft thermometer code converter and the code controls the ring oscillator frequency. By using this scheme, our PLL realizes LPF-less as well as quantization noise free operation. The prototype chip achieves 60 µm 20 µm layout area using 65 nm CMOS technology along with 1.73 ps rms jitter while consuming 2.81 mW under a 1.2 V supply with 3.125 GHz output frequency.
Minoru FUJISHIMA Makoto IKEDA Kunihiro ASADA Yasuhisa OMURA Katsutoshi IZUMI
Dynamic performance of ultra-thin SIMOX (Separation by IMplanted OXgen) CMOS circuits has been studied using ring oscillators. A novel concept of current-delay product, along with an equivalent linear resistance of MOSFETs, is applied for deriving effective load capacitance of near 0.1 µm gate CMOS circuits. Calculation results showed quatitative agreement with measurement data. It was found that the gate-fringing capacitance limits the delay time is the case of under 0.2 µm gate-length. The lower bound of power-delay product of SIMOX/SOI is expected as low as 0.2 fJ for the gate length of 0.15 µm at the supply voltage of 1.5 V.
Yusuke OIKE Makoto IKEDA Kunihiro ASADA
In this paper, we present a hierarchical multi-chip architecture which employs fully digital and word-parallel associative memories based on Hamming distance. High capacity scalability is critically important for associative memories since the required database capacity depends on the various applications. A multi-chip structure is most efficient for the capacity scalability as well as the standard memories, however, it is difficult for the conventional nearest-match associative memories. The present digital implementation is capable of detecting all the template data in order of the exact Hamming distance. Therefore, a hierarchical multi-chip structure is simply realized by using extra register buffers and an inter-chip pipelined priority decision circuit hierarchically embedded in multiple chips. It achieves fully chip- and word-parallel Hamming distance search with no throughput decrease, additional clock latency of O(log P), and inter-chip wires of O(P) in a P-chip structure. The feasibility of the architecture and circuit implementation has been demonstrated by post-layout simulations. The performance has been also estimated based on measurement results of a single-chip implementation.
This paper prsesnts a reduced swing signal data transmission method for the bus architectures in VLSIs, which consists of small size bus drivers of inverters, dual rail transmission lines, termination resistors and sense amplifiers for regenerating signal swing. The optimum value of signal swing and driving capacity of sense amplifier are given as functions of transmission line capacitance based on a criterion of areadelay2 for guideline. Using results of analysis, we propose a self-controlled data transmission module for the optimum reduced swing signal. Applying the method to a 32bit bus architecture, it is shown that total area, cycle time and total power consumption are 66,070[µm2], 0.90[ns], 32.2[mW], respectively, while those are 284,000[µm2], 1.12[ns], 173.4[mW], respectively, in the conventional chained buffer module. The proposed method is less noisy than the conventional chained buffer method.
Salih ERGUN Ulkuhan GULER Kunihiro ASADA
A novel random number generation method based on chaotic sampling of regular waveform is proposed. A high speed IC truly random number generator based on this method is also presented. Simulation and experimental results, verifying the feasibility of the circuit, are given. Numerical binary data obtained according to the proposed method pass the four basic tests of FIPS-140-2, while experimental data pass the full NIST-800-22 random number test suite without post-processing.
Toru NAKURA Tetsuya IIZUKA Kunihiro ASADA
This paper demonstrates a PLL compiler that generates the final GDSII data from a specification of input and output frequencies with PVT corner conditions. A Pulse Width Controlled PLLs (PWPLL) is composed of digital blocks, and thus suitable for being designed using a standard cell library and being layed out with a commercially available place-and-route (P&R) tool. A PWPLL has 8 design parameters. Our PLL compiler decides the 8 parameters and confirms the PLL operation with the following functions: 1) calculates rough parameter values based on an analytical model, 2) generates SPICE and gate-level verilog netlists with given parameter values, 3) runs SPICE simulations and analyzes the waveform, to examine the oscillation frequency or the voltage of specified nodes at a given time, 4) changes the parameter values to an appropriate direction depending on the waveform analyses to obtain the optimized parameter values, 5) generates scripts that can be used in commercial design tools and invokes the tools with the gate-level verilog netlist to get the final LVS/DRC-verified GDSII data from a P&R and a verification tools, and finally 6) generates the necessary characteristic summary sheets from the post-layout SPICE simulations extracted from the GDSII. Our compiler was applied to an 0.18µm standard CMOS technology to design a PLL with 600MHz output, 600/16MHz input frequency, and confirms the PLL operation with 1.2mW power and 85µm×85µm layout area.
Toru NAKURA Makoto IKEDA Kunihiro ASADA
This paper demonstrates a power supply noise reduction using on-board stubs. A quarter-length stub attached to the power supply line of an LSI chip works as a band-eliminate filter, and suppresses the power supply noise of the designed frequency. Preliminary experiments show that 87% of the designed frequency noise component is suppressed when stub patterns are written on a power supply area on a PCB board for a 1.25 GHz operating LSI. The results show the possibility of the stub on-chip integration when the operating frequency of LSIs becomes higher and the stub length becomes shorter.
A system level approach for a memory power reduction is proposed in this paper. The basic idea is allocating frequently executed object codes into a small subprogram memory and optimizing supply voltage and threshold voltage of the subprogram memory. Since large scale memory contains a lot of direct paths from power supply to ground, power dissipation caused by subthreshold leakage current is more serious than dynamic power dissipation. Our approach optimizes the size of subprogram memory, supply voltage, and threshold voltage so as to minimize memory power dissipation including static power dissipation caused by leakage current. A heuristic algorithm which determines code allocation, supply voltage, and threshold voltage simultaneously so as to minimize power dissipation of memories is proposed as well. Our experiments with some benchmark programs demonstrate significant energy reductions up to 80% over a program memory which does not employ our approach.