Daeho YUN Bongsub SONG Kyunghoon KIM Junan LEE Jinwook BURM
A low-power switching method using a bootstrapping circuit is proposed for a high-speed output driver of transmitter. Compared with a conventional output driver, the proposed scheme employs only nMOSFETs to transmit data. The bootstrapping circuit ensures the proper switching of nMOSFET. The proposed scheme is simulated and fabricated using a 0.18 µm CMOS technology, showing 10.2% lower power consumption than a conventional switching driver at 2.5 Gb/s data rate.
Juinn-Dar HUANG Chia-I CHEN Wan-Ling HSU Yen-Ting LIN Jing-Yang JOU
In deep-submicron era, wire delay is becoming a bottleneck while pursuing higher system clock speed. Several distributed register (DR) architectures are proposed to cope with this problem by keeping most wires local. In this article, we propose the distributed register-file microarchitecture with inter-island delay (DRFM-IID). Though DRFM-IID is also one of the DR-based architectures, it is considered more practical than the previously proposed DRFM, in terms of delay model. With such delay consideration, the synthesis task is inherently more complicated than the one without inter-island delay concern since uncertain interconnect latency is very likely to seriously impact on the whole system performance. Therefore we also develop a performance-driven architectural synthesis framework targeting DRFM-IID. Several factors for evaluating the quality of results, such as number of inter-island transfers, timing-criticality of transfer, and resource utilization balancing, are adopted as the guidance while performing architectural synthesis for better optimization outcomes. The experimental results show that the latency and the number of inter-cluster transfers can be reduced by 26.9% and 37.5% on average; and the latter is commonly regarded as an indicator for power consumption of on-chip communication.
Toshihiro KONISHI Hyeokjong LEE Shintaro IZUMI Takashi TAKEUCHI Masahiko YOSHIMOTO Hiroshi KAWAGUCHI
We propose a transfer gate phase coupler for a low-power multi-phase oscillator (MPOSC). The phase coupler is an nMOS transfer gate, which does not waste charge to the ground and thus achieves low power. The proposed MPOSC can set the number of outputs to an arbitrary number. The test circuit in a 180-nm process and a 65-nm process exhibits 20 phases, including 90 different angles. The designs in a 180-nm CMOS process and a 65-nm CMOS process were fabricated to confirm its process scalability; in the respective designs, we observed 36.6% and 38.3% improvements in a power-delay products, compared with the conventional MPOSCs using inverters and nMOS latches. In a 65-nm process, the measured DNL and 3σ period jitter are, respectively, less than 1.22 and 5.82 ps. The power is 284 µW at 1.85 GHz.
Sungho BECK Stephen T. KIM Michael LEE Kyutae LIM Joy LASKAR Manos M. TENTZERIS
This paper proposes a technique for two-stage operational amplifiers (OPAMPs) to optimize power consumption according to various channel conditions of wireless communication systems. The proposed OPAMP has the ability of reducing the quiescent current of each stage independently by introducing additional common-mode feedback, therefore more optimization is possible according to the channel conditions than conventional two-stage OPAMPs. The simulations verify the benefits of the technique. As a proof-of-concept topology, the proposed OPAMPs were used in a channel-selection filter for a multi-standard mobile-TV receiver. The power consumption of the filter, 3.4–5.0 mW, was adjustable according to the bandwidth, the noise, and the jammer level. The performance of the filter meets the requirements and verifies the effectiveness of the proposed approach. The filter was fabricated in 0.18-µm CMOS and occupied 0.64 mm2.
Indika U. K. BOGODA APPUHAMYLAGE Shunsuke OKURA Toru IDO Kenji TANIGUCHI
This paper proposes an area efficient, low power, fractional CMOS bandgap reference (BGR) utilizing switched-current and current-memory techniques. The proposed circuit uses only one parasitic bipolar transistor and built-in current source to generate reference voltage. Therefore significant area and power reduction is achieved, and bipolar transistor device mismatch is eliminated. In addition, output reference voltage can be set to almost any value. The proposed circuit is designed and simulated in 0.18 µm CMOS process, and simulation results are presented. With a 1.6 V supply, the reference produces an output of about 628.5 mV, and simulated results show that the temperature coefficient of output is less than 13.8 ppm/ in the temperature range from 0 to 100. The average current consumption is about 8.5 µA in the above temperature range. The core circuit, including current source, opamp, current mirrors and switched capacitor filters, occupies less than 0.0064 mm2 (80 µm×80 µm).
Kenji SUZUKI Mamoru UGAJIN Mitsuru HARADA
A fifth-order switched-capacitor (SC) complex filter was implemented in 0.2-µm CMOS technology. A novel SC integrator was developed to reduce the die size and current consumption of the filter. The filter is centered at 24.730.15 kHz (3δ) and has a bandwidth of 20.260.3 kHz (3δ). The image channel is attenuated by more than 42.6 dB. The in-band third-order harmonic input intercept point (IIP3) is 17.3 dBm, and the input referred RMS noise is 34.3 µVrms. The complex filter consumes 350 µA with a 2.0-V power supply. The die size is 0.578 mm2. Owing to the new SC integrator, the filter achieves a 27% reduction in die size without any degradation in its characteristics, including its noise performance, compared with the conventional equivalent.
Kosuke MIZUNO Hiroki NOGUCHI Guangji HE Yosuke TERACHI Tetsuya KAMINO Tsuyoshi FUJINAGA Shintaro IZUMI Yasuo ARIKI Hiroshi KAWAGUCHI Masahiko YOSHIMOTO
This paper describes a SIFT (Scale Invariant Feature Transform) descriptor generation engine which features a VLSI oriented SIFT algorithm, three-stage pipelined architecture and novel systolic array architectures for Gaussian filtering and key-point extraction. The ROI-based scheme has been employed for the VLSI oriented algorithm. The novel systolic array architecture drastically reduces the number of operation cycle and memory access. The cycle counts of Gaussian filtering module is reduced by 82%, compared with the SIMD architecture. The number of memory accesses of the Gaussian filtering module and the key-point extraction module are reduced by 99.8% and 66% respectively, compared with the results obtained assuming the SIMD architecture. The proposed schemes provide processing capability for HDTV resolution video (1920 1080 pixels) at 30 frames per second (fps). The test chip has been fabricated in 65 nm CMOS technology and occupies 4.2 4.2 mm2 containing 1.1 M gates and 1.38 Mbit on-chip memory. The measured data demonstrates 38.2 mW power consumption at 78 MHz and 1.2 V.
This paper proposes an energy efficient processor which can be used as a design alternative for the dynamic voltage scaling (DVS) processors in embedded system design. The processor consists of multiple PE (processing element) cores and a selective set-associative cache memory. The PE-cores have the same instruction set architecture but differ in their clock speeds and energy consumptions. Only a single PE-core is activated at a time and the other PE-cores are deactivated using clock gating and signal gating techniques. The major advantage over the DVS processors is a small overhead for changing its performance. The gate-level simulation demonstrates that our processor can change its performance within 1.5 microsecond and dissipates about 10 nano-joule while conventional DVS processors need hundreds of microseconds and dissipate a few micro-joule for the performance transition. This makes it possible to apply our multi-performance processor to many real-time systems and to perform finer grained and more sophisticated dynamic voltage control.
Jeonghun KIM Suki KIM Kwang-Hyun BAEK
This paper presents a low-power System on Chip (SOC) architecture for the v2.0+EDR (Enhanced Data Rate) Bluetooth and its applications. Our design includes a link controller, modem, RF transceiver, Sub-Band Codec (SBC), Expanded Instruction Set Computer (ESIC) processor, and peripherals. To decrease power consumption of the proposed SOC, we reduce data transfer using a dual-port memory, including a power management unit, and a clock gated approach. We also address some of issues and benefits of reusable and unified environment on a centralized data structure and SOC verification platform. This includes flexibility in meeting the final requirements using technology-independent tools wherever possible in various processes and for projects. The other aims of this work are to minimize design efforts by avoiding the same work done twice by different people and to reuse the similar environment and platform for different projects. This chip occupies a die size of 30 mm2 in 0.18 µm CMOS, and the worst-case current of the total chip is 54 mA.
Sung-Rae LEE Ser-Hoon LEE Sun-Young HWANG
This paper presents an efficient instruction scheduling algorithm which generates low-power codes for embedded system applications. Reordering and recoding are concurrently applied for low-power code generation in the proposed algorithm. By appropriate reordering of instruction sequences, the efficiency of instruction recoding is increased. The proposed algorithm constructs program codes on a basic-block basis by selecting a code sequence from among the schedules generated randomly and maintained by the system. By generating random schedules for each of the basic blocks constituting an application program, the proposed algorithm constructs a histogram graph for each of the instruction fields to estimate the figure-of-merits achievable by reordering instruction sequences. For further optimization, the system performs simulated annealing on the generated code. Experimental results for benchmark programs show that the codes generated by the proposed algorithm consume 37.2% less power on average when compared to the previous algorithm which performs list scheduling prior to instruction recoding.
Du-Hwi KIM Ji-Hye JANG Liyan JIN Jae-Hyung LEE Pan-Bong HA Young-Hee KIM
We propose a low-power eFuse one-time programmable (OTP) memory IP based on a bipolar CMOS DMOS (BCD) process. It is an eFuse OTP memory cell which uses separate transistors that are optimized in program and in read mode. The eFuse cell also uses poly-silicon gates having co-silicide. An asynchronous interface and a separate I/O method are used for the low-power and small-area eFuse OTP memory IP. Additionally, we propose a new circuit protecting a short-circuit current in the VDD-to-VIO voltage level translator circuit while the VDD voltage is being generated by the voltage regulator at power-up. A digital sensing circuit using clocked inverters is used to sense a bit-line (BL) datum. Furthermore, the poly-silicon of the IP is split into n+ poly-silicon and p+ poly-silicon to optimize the eFuse link. The layout size of the designed eFuse OTP memory IP with Dongbu HiTek's 0.18 µm BCD process is 283.565524.180 µm2. It is measured by manufactured test IPs with Dongbu HiTek's 0.18 µm BCD process that the programming voltage of the n+ gate poly-silicon is about 0.1 V less than that of the p+ gate poly-silicon.
An operational amplifier is one of the key functional blocks and is widely used in analog and mixed-signal circuits. For low-power consumption, many techniques such as class AB and slew-rate enhancement have been proposed. Although phase compensation is related to power consumption, it has not been clearly discussed from the viewpoint of the power consumption. In this paper, the conventional and the improved Miller compensations and the phase compensation by introducing a new zero are dicussed for low-power operational amplifiers.
A new fast-lock, low-power digital delay-locked loop (DLL) is presented. A subranging searching algorithm is employed to effectively make the loop locked within only four clock cycles. A half-delay circuit is utilized to cut down power consumption. The prototype DLL in a standard 0.13-µm CMOS process operates in the range from 50 MHz to 400 MHz with four clock cycle lock time and consumes 2.379 mW with 1-V supply at 400 MHz clock rate. The measured RMS jitter and peak-to-peak jitter at 400 MHz are 1.586 ps and 16.67 ps, respectively. It occupies an active area of 0.038 mm2.
Hisashi TANAKA Koichi TÁNNO Ryota MIWA Hiroki TAMURA Kenji MURAO
In this paper, a low-voltage, wide-common-mode-range and high-CMRR OTA is presented. The proposed OTA consists of two circuit blocks; one is the input stage and operates as a differential level shifter, and the other is a highly linear output stage. Furthermore, the OTA can be operated in both weak and strong inversion regions. The proposed OTA is evaluated through Star-HSPICE with 0.18 µm CMOS device parameters (LEVEL53). Simulation results demonstrate a CMRR of 158 dB, a common-mode-input-range of 65 mV to 720 mV and a current consumption of 1.2 µA when VDD=0.8 V.
Yong-Hee KIM Myoung-Jo JUNG Cheol-Hoon LEE
We propose a dynamic voltage scaling algorithm to exploit the temporal locality called TLDVS (Temporal Locality DVS) that can achieve significant energy savings while simultaneously preserving timeliness guarantees made by real-time scheduling. Traditionally hard real-time scheduling algorithms assume that the actual computation requirement of tasks would be varied continuously from time to time, but most real-time tasks have a limited number of operational modes changing with temporal locality. Such temporal locality can be exploited for energy savings by scaling down the operating frequency and the supply voltage accordingly. The proposed algorithm does not assume task periodicity, and requires only previous execution time among a priori information on the task set to schedule. Simulation results show that TLDVS achieves up to 25% energy savings compared with OLDVS, and up to 42% over the non-DVS scheduling.
Giuseppe CARUSO Alessio MACCHIARELLA
In this paper, a design methodology for the minimization of various performance metrics of MOS Current-Mode Logic (MCML) circuits is described. In particular, it allows to minimize the delay under a given power consumption, the power consumption under a given delay and the power-delay product. Design solutions can be evaluated graphically or by simple and effective automatic procedures implemented within the MATLAB environment. The methodology exploits the novel concepts of crossing-point current and crossing-point capacitance. A useful feature of it is that it provides the designer with useful insights into the dependence of the performance metrics on design variables and fan-out capacitance. The methodology was validated by designing several MCML circuits in an IBM 130 nm CMOS process.
Vishal V. KULKARNI Hiroki ISHIKURO Tadahiro KURODA
A CMOS wireless transceiver operating in the 14-18 GHz range is proposed. The receiver uses direct conversion architecture for demodulation with a fast carrier and symbol timing recovery scheme. The transmitter uses a PLL and an up-conversion mixer to generate BPSK modulated signal. A ring oscillator is used in the PLL to make faster switching for burst transmission obtaining high speed low power operation. The transceiver operation has been verified by system simulation while the transmitter test-chip was fabricated in 65 nm CMOS technology and verified with measured results. The transmitter generates a bi-phase modulated signal with a center frequency of 16 GHz at a maximum data rate of 4 Gb/s and consumes 61 mW of power. To the best knowledge of authors, this is lowest power consumption among the reported transmitters that operate over 1 Gb/s range. The transceiver is proposed for a target communication distance of 10 cm.
Shanq-Jang RUAN Jui-Yuan HSIEH Chia-Han LEE
This paper presents a gate-block selection algorithm, which can synthesize a proper parameter extractor of the pre-computation-based content-addressable memory (PB-CAM) to enhance power efficiency for specific applications such as embedded systems, microprocessor and SOC, etc. Furthermore, a novel CAM cell design with single bit-line is proposed. The proposed CAM cell design requires only one heavy loading bit-line and merely is constructed with eight transistors. The whole PB-CAM design was described in Spice with TSMC 0.35 µm double-poly quadruple-metal CMOS process. We used Synopsys Nanosim to estimate power consumption. With a 128 words by 32 bits CAM size, the experimental results showed that our proposed PB-CAM effectively reduces 18.21% of comparison operations in the CAM and saves 16.75% in power reduction by synthesizing a proper parameter extractor of the PB-CAM compared with the 1's count PB-CAM. This implies that our proposed PB-CAM is more flexible and adaptive for specific applications.
Guy TORFS Zhisheng LI Johan BAUWELINCK Xin YIN Jan VANDEWEGE Geert Van Der PLAS
A novel low-power kick-back reduced comparator for use in high-speed flash analog-to-digital converters (ADC) is presented. The proposed comparator combines cascode transistors to reduce the kick-back noise with a built-in threshold voltage to remove the static power consumption of a reference. Without degrading other figures, the kick-back noise is reduced by a factor 8, compared to a previous design without cascode transistors. An improved calibration structure is also proposed to improve linearity when used in an ADC. Simulated in a standard CMOS technology the comparator consumes 106.5 µW at 1.8 V power supply and 1 GHz clock frequency.
Woo Joo KIM Sung Hee LEE Sun Young HWANG
This paper presents a hierarchical NoC architecture to support GT (Guaranteed Throughput) signals to process multimedia data in embedded systems. The architecture provides a communication environment that meets the diverse conditions of communication constraints among IPs in power and area. With a system based on packet switching, which requires storage/control circuits to support GT signals, it is hard to satisfy design constraints in area, scalability and power consumption. This paper proposes a hierarchical 444 mesh-type NoC architecture based on circuit switching, which is capable of processing GT signals requiring high throughput. The proposed NoC architecture shows reduction in area by 50.2% and in power consumption by 57.4% compared with the conventional NoC architecture based on circuit switching. These figures amount to by 72.4% and by 86.1%, when compared with an NoC architecture based on packet switching. The proposed NoC architecture operates in the maximum throughput of 19.2 Gb/s.