Hye-Mi CHOI Ji-Hoon KIM In-Cheol PARK
Because turbo decoding is a highly memory-intensive algorithm that consumes considerable power, reducing power consumption is a major issue in practical implementations. This paper presents an efficient reverse calculation method that lowers power consumption by reducing the number of memory accesses required in turbo decoding. The reverse calculation method is proposed for the Max-log-MAP algorithm and is combined with a scaling technique to obtain a new decoding algorithm, called hybrid log-MAP, whose BER performance is similar to that of the log-MAP algorithm. For the W-CDMA standard, experimental results show that the proposed reverse calculation method reduces memory accesses by 80%. A hybrid log-MAP turbo decoder based on the proposed reverse calculation reduces power consumption and memory size by 34.4% and 39.2%, respectively.
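The relationship between log-MAP, Max-log-MAP, and a scaling correction can be illustrated with a minimal sketch. It shows only the textbook kernels (the exact Jacobian logarithm and its max approximation), not the hybrid log-MAP algorithm or the reverse calculation of the paper. The function names and the scale value 0.75 are assumptions chosen for illustration.

```python
import math


def max_star(a, b):
    """Exact Jacobian logarithm used by log-MAP: ln(e^a + e^b)."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def max_log(a, b):
    """Max-log-MAP approximation: the correction term is dropped."""
    return max(a, b)


def scale_extrinsic(llr, scale=0.75):
    """Scale an extrinsic log-likelihood ratio.

    Max-log-MAP tends to overestimate reliability, so the extrinsic
    value is commonly multiplied by a factor below 1. The value 0.75
    is an assumed, illustrative choice, not taken from the paper.
    """
    return scale * llr


if __name__ == "__main__":
    a, b = 1.0, 2.0
    print("log-MAP kernel    :", max_star(a, b))
    print("Max-log-MAP kernel:", max_log(a, b))
    print("scaled LLR        :", scale_extrinsic(4.0))
```

The gap between the two kernels is largest when the arguments are equal (ln 2) and vanishes as they move apart, which is why a simple scaling factor can recover much of the lost performance.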
Yohei FUKUMIZU Shuji OHNO Makoto NAGATA Kazuo TAKI
A highly collision-resistive RFID system multiplexes communications between thousands of tags and a single reader by combining time-domain multiplexing code division multiple access (TD-CDMA), CRC error detection, and re-transmission for error recovery. The collision probability due to the random selection of CDMA codes and TDMA channels bounds the number of IDs successfully transmitted to a reader during a limited time frame. However, theoretical analysis showed that re-transmission greatly reduced the collision probability and that an ID error rate of 2.5 × 10^-9 could be achieved when 1,000 ID tags responded within a time frame of 400 msec in ideal communication channels. The proposed collision-resistive communication scheme for a thousand multiplexed channels was modeled as a discrete-time digital expression, and an FPGA-based emulator was built to evaluate a practical ID error rate in the presence of background noise in the communication channels. Together with unsolicited re-transmission of ID data, adjusting the correlator thresholds significantly improves the error rate and provides simple noise-tolerant communication in a multiple-response RFID system. Thus, the proposed scheme does not require a reader to request ID transmission from erroneously responding tags. A reader can also lower the influence of noise by using correlator thresholds, since the scheme multiplexes IDs by CDMA-based communication. The effectiveness of the re-transmission was confirmed experimentally even in noisy channels, and the ID error rate derived from the emulation was 1.9 × 10^-5. The emulation was useful for deriving an optimum set of RFID system parameters to be used in the design of mixed analog and digital integrated circuits for RFID communication.
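The effect of re-transmission on collisions can be sketched with a simplified model. It assumes each tag picks one CDMA code and one TDMA slot uniformly at random on every attempt, that attempts are independent, and that an ID is lost only if every attempt collides. These assumptions, the function names, and the example parameters are illustrative and are not the analysis or the parameters used in the paper.

```python
def collision_probability(num_tags, num_codes, num_slots):
    """Probability that one tag shares its (code, slot) pair with
    at least one of the other tags, under uniform random selection."""
    channels = num_codes * num_slots
    return 1.0 - (1.0 - 1.0 / channels) ** (num_tags - 1)


def id_error_rate(num_tags, num_codes, num_slots, transmissions):
    """Probability that all independent transmissions of one ID collide."""
    p = collision_probability(num_tags, num_codes, num_slots)
    return p ** transmissions


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to show the trend.
    for attempts in (1, 2, 4, 8):
        rate = id_error_rate(num_tags=1000, num_codes=64,
                             num_slots=256, transmissions=attempts)
        print(attempts, rate)
```

Because the per-attempt collision probability is raised to the number of attempts, each additional re-transmission lowers the residual error rate geometrically in this model.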
An energy-efficient power-aware design is highly desirable for DSP functions that encounter a wide diversity of operating scenarios in battery-powered wireless sensor network systems. To address this issue, this letter presents a low-power, power-aware, scalable pipelined Booth multiplier for DSP applications that makes use of a dynamic-range detection unit, shared common functional units, an ensemble of optimized Wallace trees, and a 4-bit array-based adder tree.
Luca FANUCCI Sergio SAPONARA Massimiliano MELANI Pierangelo TERRENI
With reference to video motion estimation in the framework of the new H.264/AVC video coding standard, this paper presents algorithmic and architectural solutions for the implementation of context-aware coprocessors in real-time, low-power embedded systems. A low-complexity context-aware controller is added to a conventional Full Search (FS) motion estimation engine. While the FS coprocessor is working, the context-aware controller extracts information on the input signal statistics from the intermediate processing results and uses it to automatically configure the search area size and the number of reference frames of the coprocessor; thus unnecessary computations and memory accesses can be avoided. The achieved complexity saving factor ranges from 2.2 to 25 depending on the input signal, while motion estimation accuracy remains unaltered. The increased efficiency is exploited both for (i) processing time reduction in the case of software implementation on a programmable platform and (ii) power consumption reduction in the case of dedicated hardware implementation in CMOS technology.
Zhijun LU Yamu HU Mohamad SAWAN
In this paper, a low-voltage low-power sigma-delta modulator dedicated to implantable sensing devices is presented. This second-order single-loop sigma-delta modulator is implemented with half-delay integrators. These integrators are based on a new fully-differential CMOS class AB switched Operational Transconductance Amplifier (switched-OTA). An on-chip voltage doubler is introduced to locally boost the supply voltage at the input stage of a conventional OTA in order to allow rail-to-rail signal swing. Experimental results of the modulator fabricated in CMOS 0.18 µm technology confirm its expected features: a peak signal-to-noise ratio (SNR) of 72 dB, a signal-to-noise-and-distortion ratio (SNDR) of 62 dB in a 5 kHz signal bandwidth, and a power consumption lower than 66 µW with a 900 mV voltage supply.
Takahiro KUMURA Norio KAYAMA Shinichi SHIONOYA Kazuo KUMAGIRI Takao KUSANO Makoto YOSHIDA Masao IKEKAWA Ichiro KURODA Takao NISHITANI
This paper provides a performance evaluation of our audio and video CODEC by using a method for rapidly verifying and evaluating overall performance on real-time workloads of system LSIs integrated with SPXK5SC DSP cores. The SPXK5SC has been developed as a DSP core well-suited to system LSIs. Although it is very important to evaluate the overall performance of target LSIs on real workloads before actual LSI fabrication, software simulators are too slow to deal with real workloads and full hardware prototyping is unable to respond well to design improvements. Therefore, we have developed a hardware emulation approach to be used on system LSIs integrated with an SPXK5SC DSP core in order to evaluate the overall performance of an audio/video CODEC on a target system. Our emulation system, which uses a DSP core TEG with a bus interface and an FPGA, is suitable for overall system evaluation on real-time workloads as well as architectural investigation. In this paper, we discuss the use of the emulation system in evaluating performance during AV CODEC execution. In addition, an architecture design based on our emulation system is also described.
Minho KWON Youngcheol CHAE Gunhee HAN
In a switched-capacitor (SC) circuit, the major block is an operational transconductance amplifier (OTA) designed to form a feedback loop. However, the OTA is the block that consumes most of the power in SC circuits. This paper proposes the use of a class-C inverter instead of the OTA in SC circuits, together with a corresponding switch configuration, for extremely low power applications. A detailed analysis and design trade-offs are also provided. Simulation and experimental results show that sufficient performance can be obtained even though a class-C inverter is used. A second-order biquad filter and a second-order SC sigma-delta (ΣΔ) modulator based on a class-C inverter are designed. These circuits have been fabricated with a 0.35-µm CMOS process. The measurement results of the fabricated SC biquad filter show a 59-dB signal-to-noise-plus-distortion ratio (SNDR) for a 0.2-Vp-p input signal and 0.9-V dynamic ranges. The power consumption of the biquad filter is only 0.4 µW with a 1-V power supply. The measurement results of the fabricated ΣΔ modulator show a 61-dB peak SNR for a 1.6-kHz bandwidth with a sample rate of 200 kHz. The modulator consumes 0.8 µW with a 1-V power supply.
Shoji KAWAHITO Kazutaka HONDA Masanori FURUTA Nobuhiro KAWAI Daisuke MIYAZAKI
In this paper, low-power design techniques for high-speed A/D converters are reviewed and discussed. Pipeline and parallel-pipeline architectures are treated, as these are the dominant architectures when a high sampling rate and high resolution are required with reasonable power dissipation. A systematic approach to the power optimization of pipeline and parallel-pipeline ADCs is introduced, based on models of the noise and response time of a building block in the multiple-stage pipeline ADC. Finally, the theoretical minimum of the required power as a function of the sampling rate, resolution and SNR is discussed. The analysis shows that, with the development of new circuits and systems that approach the minimum, the power can be further reduced by more than a factor of 10 without changing the basic architectures.
Kyeong-Sik MIN Kouichi KANDA Hiroshi KAWAGUCHI Kenichi INAGAKI Fayez Robert SALIBA Hoon-Dae CHOI Hyun-Young CHOI Daejeong KIM Dong Myong KIM Takayasu SAKURAI
A new Row-by-Row Dynamic Source-Line Voltage Control (RRDSV) scheme is proposed to suppress leakage current by two orders of magnitude in SRAMs for sub-70 nm process technology with sub-1-V VDD. This two-order leakage reduction results from the combination of reverse body-to-source biasing and Drain Induced Barrier Lowering (DIBL) effects. In addition, this paper proposes inserting metal shields between the cell nodes and the bit lines so that the cell nodes are not flipped by external bit-line coupling noise. A test chip has been fabricated in a 0.18-µm CMOS process to verify the effectiveness of the RRDSV scheme with the metal shields. The retention voltages of SRAMs with the metal shields are measured to be improved by as much as 40-60 mV, without losing the stored data, compared to the SRAMs without the shields.
Luca FANUCCI Sergio SAPONARA Alexander MORELLO
Several IP cells are available on the market to implement 8051-compliant microcontrollers in embedded systems. Yet they frequently lack features that have become key in such systems, such as power optimization. This paper aims at lowering the power consumption of an 8051 IP core while keeping performance unaltered, through Register Transfer Level techniques such as clustered clock gating, operand isolation and state encoding. This approach preserves the high reusability and technology independence of the IP, as it consists only of modifications to the source VHDL code. A total power reduction of about 40% is achieved, with limited area overhead.
Won-Sup CHUNG Hyeong-Woo CHA Sang-Hee SON
A new bipolar linear transconductor for low-voltage low-power signal processing is proposed. The proposed circuit has a larger linear input range and smaller power dissipation than the conventional bipolar linear transconductor. The experimental results show that the transconductor with a transconductance of 50 µS has a linearity error of less than 0.02% over an input voltage range of 2.1 V at supply voltages of 3 V. The power dissipation of the transconductor is 3.15 mW.
This paper presents a multiple-voltage high-level synthesis approach for low power DSP applications using algorithmic transformation techniques. Our approach is motivated by the maximization of task mobilities, since increasing mobility raises the possibility of assigning tasks to low-voltage components. Mobility is the freedom to choose the starting time of a task. It is defined as the distance between the task's as-late-as-possible (ALAP) schedule time and its as-soon-as-possible (ASAP) schedule time. To gain task mobility, we use loop shrinking, retiming and unfolding techniques. Loop shrinking first reduces the iteration period bound (IPB), and the other techniques are then employed to shorten the iteration period (IP) as much as possible. The minimization of the IP results in high task mobilities. Finally, we assign tasks with high mobilities to low-voltage components and thus minimize energy under resource and latency constraints. Even when the overhead of level conversion is considered, our approach achieves significant power reduction. In the case of a third-order IIR filter, the proposed approach can save up to 40.2% of power consumption.
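The mobility definition above (ALAP start time minus ASAP start time) can be made concrete with a small sketch on an acyclic task graph. The function name, the task graph, the delays, and the latency bound are all hypothetical; loop shrinking, retiming, unfolding, and voltage assignment are not modeled here.

```python
def task_mobility(delays, edges, latency):
    """Return {task: ALAP start - ASAP start} for an acyclic task graph.

    delays  : {task: execution time}
    edges   : list of (producer, consumer) precedence pairs
    latency : overall latency constraint (assumed to be feasible)
    """
    preds = {t: [] for t in delays}
    succs = {t: [] for t in delays}
    for u, v in edges:
        succs[u].append(v)
        preds[v].append(u)

    # Topological order (Kahn's algorithm).
    indegree = {t: len(preds[t]) for t in delays}
    ready = [t for t in delays if indegree[t] == 0]
    order = []
    while ready:
        t = ready.pop()
        order.append(t)
        for s in succs[t]:
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)

    # ASAP: start as soon as every predecessor has finished.
    asap = {}
    for t in order:
        asap[t] = max((asap[p] + delays[p] for p in preds[t]), default=0)

    # ALAP: finish no later than the earliest successor start (or latency).
    alap = {}
    for t in reversed(order):
        finish = min((alap[s] for s in succs[t]), default=latency)
        alap[t] = finish - delays[t]

    return {t: alap[t] - asap[t] for t in delays}


if __name__ == "__main__":
    delays = {"a": 1, "b": 1, "c": 2, "d": 1}
    edges = [("a", "c"), ("c", "d"), ("b", "d")]
    print(task_mobility(delays, edges, latency=4))
```

In this example only task `b` is off the critical path, so it is the only task with nonzero mobility and the natural candidate for a slower, lower-voltage component.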
In this paper, we investigate a low-power architecture for designs modeled as an Extended Finite State Machine (EFSM). It is based on the general dynamic power management concept, in which redundant computation can be dynamically disabled to reduce the overall power dissipation. The main contribution of this paper is a systematic procedure to identify an almost maximal amount of redundant computation in a design given as an EFSM. Two levels of redundant computation are exploited: one is based on the machine state information, while the other is based on the transition information. After the extraction of the redundant computation, a low-power architecture using input gating is proposed to synthesize the final circuit. We tested the technique on a design that computes the modular inverse of a number. Experimental results show that a 31% power reduction can be achieved at the cost of a 2% timing penalty and a 16% area overhead.
Tetsuya HIROSE Ryuji YOSHIMURA Toru IDO Toshimasa MATSUOKA Kenji TANIGUCHI
We propose an ultra-low-power watch-dog circuit that uses MOSFETs operating in the subthreshold region. The circuit monitors the amount of product degradation, because the subthreshold current of a MOSFET emulates the rate of a general chemical reaction. Its operation was verified with both SPICE simulation and measurement of a prototype chip. The new circuit, embedded in a tag attached to any product, could dynamically monitor the degradation regardless of storage conditions.
Sung Woo CHUNG Gi Ho PARK Sung Bae PARK
Even in embedded processors, branch prediction accuracy significantly affects performance. In designing a branch predictor, microarchitects should consider area, delay and power consumption in addition to accuracy. We propose two techniques to reduce the power consumption; these techniques do not require any additional storage arrays, do not incur additional delay (except just one MUX delay) and never deteriorate accuracy. One is to look up two predictions at a time by increasing the width (decreasing the depth) of the PHT (Prediction History Table). The other is to reduce unnecessary accesses to the BTB (Branch Target Buffer) by accessing the PHT in advance. Analysis results with Samsung Memory Compiler show that the proposed techniques reduce the power consumption of the branch predictor by 15-52%.
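The idea of fetching two predictions per PHT access can be illustrated with a toy model. It assumes 2-bit saturating counters, rows that each hold two adjacent entries, and a single-row read buffer, and it counts array reads as a rough proxy for access energy. The class name, the organization, and the buffering policy are assumptions for illustration and are not the microarchitecture described in the paper.

```python
class WidePHT:
    """Toy PHT organized as half as many rows, each two counters wide."""

    def __init__(self, entries=1024):
        # 2-bit saturating counters, initialized to weakly not-taken (1).
        self.rows = [[1, 1] for _ in range(entries // 2)]
        self.array_reads = 0      # number of physical row reads
        self._buffered_row = None  # index of the last row read

    def predict(self, index):
        """Return True for predicted taken; index must be < entries."""
        row = index >> 1
        if row != self._buffered_row:
            # Only a new row costs an array access; the neighbouring
            # entry was already fetched with the previous read.
            self.array_reads += 1
            self._buffered_row = row
        return self.rows[row][index & 1] >= 2

    def update(self, index, taken):
        """Train the counter for this index with the branch outcome."""
        counters = self.rows[index >> 1]
        value = counters[index & 1]
        counters[index & 1] = min(3, value + 1) if taken else max(0, value - 1)


if __name__ == "__main__":
    pht = WidePHT(entries=8)
    pht.predict(0)
    pht.predict(1)  # same row as index 0, served from the buffer
    pht.predict(2)  # new row
    print("array reads:", pht.array_reads)
```

In this model, consecutive lookups that fall in the same row share one array read, which is the source of the saving; the extra selection between the two counters corresponds to the single MUX delay mentioned above.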
Takashi KAWANAMI Masakazu HIOKI Hiroshi NAGASE Toshiyuki TSUTSUMI Tadashi NAKAGAWA Toshihiro SEKIGAWA Hanpei KOIKE
The Flex Power FPGA is presented as a novel FPGA model offering the ability to configure the trade-off between power consumption and speed for each logic element by adjusting the threshold voltage. This FPGA model targets the reduction of static power consumption, which has become one of the most important issues in the development of future-generation devices. The present paper describes a preliminary simulation study of the Flex Power FPGA. A method to effectively assign threshold voltages to transistors at a prescribed granularity based on a timing analysis of the mapped circuit is implemented using the VPR simulator, and the static power reduction for 70 nm technologies is estimated using MCNC benchmark circuits. Simulation results show that the average static power can be reduced to as little as 1/30 of that in the corresponding conventional FPGA. This FPGA model is also demonstrated to be effective with future technologies, where the proportion of static power will be greater.
Sung Woo CHUNG Gi Ho PARK Sung Bae PARK
This letter proposes a low-power tournament branch predictor, in which the number of accesses to the branch predictors (local predictor or global predictor) is reduced. Analysis results with Samsung Memory Compiler show that the proposed branch predictor reduces the power consumption by 24-45%, compared to the conventional tournament branch predictor, not requiring any additional storage arrays, not incurring any additional delay and never harming accuracy.
Akihiro YAMAGISHI Mamoru UGAJIN Tsuneo TSUKAHARA
A 1-V 2.4-GHz-band fully monolithic PLL synthesizer was fabricated in 0.2-µm CMOS/SOI process technology. It includes a voltage-controlled oscillator (VCO) and a 3-GHz fully differential dual-modulus prescaler on a chip. A low-off-leakage-current charge pump is used for open-loop FSK modulation. When the PLL is in the open loop mode, the frequency drift of the output is lower than 2.5 Hz/µsec. The output phase noise is -104 dBc/Hz at 1-MHz offset frequency. The power consumption of the PLL-IC core is 17 mW at 1-V supply voltage. This PLL synthesizer is suitable for a 1-V Bluetooth RF transceiver LSI.
A CMOS voltage-to-current converter operating in weak inversion is presented in this Letter. It can operate at a low supply voltage, and its power consumption is also low. As the input voltage varies from -0.15 V to 0.15 V, the measured maximum linearity error of the proposed voltage-to-current converter is about 3.35%. Its power consumption is only 26 µW under a supply voltage of 2 V. The proposed voltage-to-current converter has been fabricated in a 0.5 µm N-well CMOS 2P2M process. The proposed circuit is expected to be useful in analog signal processing applications.
Yusuke KANNO Hiroyuki MIZUNO Nobuhiro OODAIRA Yoshihiko YASU Kazumasa YANAGISAWA
A power-aware interconnect circuit design--called µI/O architecture--has been developed to provide low-cost system solutions for System-on-Chip (SoC) and System-in-Package (SiP) technologies. The µI/O architecture provides a common interface throughout the module enabling hierarchical I/O design for SoC and SiP. The hierarchical I/O design allows the driver size to be optimized without increasing design complexity. Moreover, it includes a signal-level converter for integrating wide-voltage-range circuit blocks and a signal wall function for turning off each block independently--without invalid signal transmission--by using an internal power switch.