The search functionality is under construction.
The search functionality is under construction.

Keyword Search Result

[Keyword] low power design(34hit)

1-20hit(34hit)

  • An Accuracy Reconfigurable Vector Accelerator based on Approximate Logarithmic Multipliers for Energy-Efficient Computing

    Lingxiao HOU  Yutaka MASUDA  Tohru ISHIHARA  

     
    PAPER

      Pubricized:
    2022/09/02
      Vol:
    E106-A No:3
      Page(s):
    532-541

    The approximate logarithmic multiplier proposed by Mitchell provides an efficient alternative for processing dense multiplication or multiply-accumulate operations in applications such as image processing and real-time robotics. It offers the advantages of small area, high energy efficiency and is suitable for applications that do not necessarily achieve high accuracy. However, its maximum error of 11.1% makes it challenging to deploy in applications requiring relatively high accuracy. This paper proposes a novel operand decomposition method (OD) that decomposes one multiplication into the sum of multiple approximate logarithmic multiplications to widely reduce Mitchell multiplier errors while taking full advantage of its area savings. Based on the proposed OD method, this paper also proposes an accuracy reconfigurable multiply-accumulate (MAC) unit that provides multiple reconfigurable accuracies with high parallelism. Compared to a MAC unit consisting of accurate multipliers, the area is significantly reduced to less than half, improving the hardware parallelism while satisfying the required accuracy for various scenarios. The experimental results show the excellent applicability of our proposed MAC unit in image smoothing and robot localization and mapping application. We have also designed a prototype processor that integrates the minimum functionality of this MAC unit as a vector accelerator and have implemented a software-level accuracy reconfiguration in the form of an instruction set extension. We experimentally confirmed the correct operation of the proposed vector accelerator, which provides the different degrees of accuracy and parallelism at the software level.

  • Analysis of Body Bias Control Using Overhead Conditions for Real Time Systems: A Practical Approach

    Carlos Cesar CORTES TORRES  Hayate OKUHARA  Nobuyuki YAMASAKI  Hideharu AMANO  

     
    PAPER-Computer System

      Pubricized:
    2018/01/12
      Vol:
    E101-D No:4
      Page(s):
    1116-1125

    In the past decade, real-time systems (RTSs), which must maintain time constraints to avoid catastrophic consequences, have been widely introduced into various embedded systems and Internet of Things (IoTs). The RTSs are required to be energy efficient as they are used in embedded devices in which battery life is important. In this study, we investigated the RTS energy efficiency by analyzing the ability of body bias (BB) in providing a satisfying tradeoff between performance and energy. We propose a practical and realistic model that includes the BB energy and timing overhead in addition to idle region analysis. This study was conducted using accurate parameters extracted from a real chip using silicon on thin box (SOTB) technology. By using the BB control based on the proposed model, about 34% energy reduction was achieved.

  • Low-Power Motion Estimation Processor with 3D Stacked Memory

    Shuping ZHANG  Jinjia ZHOU  Dajiang ZHOU  Shinji KIMURA  Satoshi GOTO  

     
    PAPER

      Vol:
    E98-A No:7
      Page(s):
    1431-1441

    Motion estimation (ME) is a key encoding component of almost all modern video coding standards. ME contributes significantly to video coding efficiency, but, it also consumes the most power of any component in a video encoder. In this paper, an ME processor with 3D stacked memory architecture is proposed to reduce memory and core power consumption. First, a memory die is designed and stacked with ME die. By adding face-to-face (F2F) pads and through-silicon-via (TSV) definitions, 2D electronic design automation (EDA) tools can be extended to support the proposed 3D stacking architecture. Moreover, a special memory controller is applied to control data transmission and timing between the memory die and the ME processor die. Finally, a 3D physical design is completed for the entire system. This design includes TSV/F2F placement, floor plan optimization, and power network generation. Compared to 2D technology, the number of input/output (IO) pins is reduced by 77%. After optimizing the floor plan of the processor die and memory die, the routing wire lengths are reduced by 13.4% and 50%, respectively. The stacking static random access memory contributes the most power reduction in this work. The simulation results show that the design can support real-time 720p @ 60fps encoding at 8MHz using less than 65mW in power, which is much better compared to the state-of-the-art ME processor.

  • A Perpetuum Mobile 32bit CPU on 65nm SOTB CMOS Technology with Reverse-Body-Bias Assisted Sleep Mode

    Koichiro ISHIBASHI  Nobuyuki SUGII  Shiro KAMOHARA  Kimiyoshi USAMI  Hideharu AMANO  Kazutoshi KOBAYASHI  Cong-Kha PHAM  

     
    PAPER

      Vol:
    E98-C No:7
      Page(s):
    536-543

    A 32bit CPU, which can operate more than 15 years with 220mAH Li battery, or eternally operate with an energy harvester of in-door light is presented. The CPU was fabricated by using 65nm SOTB CMOS technology (Silicon on Thin Buried oxide) where gate length is 60nm and BOX layer thickness is 10nm. The threshold voltage was designed to be as low as 0.19V so that the CPU operates at over threshold region, even at lower supply voltages down to 0.22V. Large reverse body bias up to -2.5V can be applied to bodies of SOTB devices without increasing gate induced drain leak current to reduce the sleep current of the CPU. It operated at 14MHz and 0.35V with the lowest energy of 13.4 pJ/cycle. The sleep current of 0.14µA at 0.35V with the body bias voltage of -2.5V was obtained. These characteristics are suitable for such new applications as energy harvesting sensor network systems, and long lasting wearable computers.

  • A Low-Power LDPC Decoder for Multimedia Wireless Sensor Networks

    Meng XU  Xincun JI  Jianhui WU  Meng ZHANG  

     
    PAPER-Fundamental Theories for Communications

      Vol:
    E96-B No:4
      Page(s):
    939-947

    This paper presents a low-power LDPC decoder that can be used in Multimedia Wireless Sensor Networks. Three low power design techniques are proposed in the decoder design: a layered decoding algorithm, a modified Benes network and a modified memory bypassing scheme. The proposed decoder is implemented in TSMC 0.13 µm, 1.2 V CMOS process. Experiments show that when the clock frequency is 32 MHz, the power consumption of the proposed decoder is 38.4 mW, the energy efficiency is 53.3 pJ/bit/ite and the core area is 1.8 mm2.

  • Energy-Aware Task Scheduling for Real-Time Systems with Discrete Frequencies

    Dejun QIAN  Zhe ZHANG  Chen HU  Xincun JI  

     
    PAPER-Software System

      Vol:
    E94-D No:4
      Page(s):
    822-832

    Power-aware scheduling of periodic tasks in real-time systems has been extensively studied to save energy while still meeting the performance requirement. Many previous studies use the probability information of tasks' execution cycles to assist the scheduling. However, most of these approaches adopt heuristic algorithms to cope with realistic CPU models with discrete frequencies and cannot achieve the globally optimal solution. Sometimes they even show worse results than non-stochastic DVS schemes. This paper presents an optimal DVS scheme for frame-based real-time systems under realistic power models in which the processor provides only a limited number of speeds and no assumption is made on power/frequency relation. A suboptimal DVS scheme is also presented in this paper to work out a solution near enough to the optimal one with only polynomial time expense. Experiment results show that the proposed algorithm can save at most 40% more energy compared with previous ones.

  • Design Optimization of High-Speed and Low-Power Operational Transconductance Amplifier Using gm/ID Lookup Table Methodology

    Takayuki KONISHI  Kenji INAZU  Jun Gyu LEE  Masanori NATSUI  Shoichi MASUI  Boris MURMANN  

     
    PAPER-Electronic Circuits

      Vol:
    E94-C No:3
      Page(s):
    334-345

    We propose a design optimization flow for a high-speed and low-power operational transconductance amplifier (OTA) using a gm/ID lookup table design methodology in scaled CMOS. This methodology advantages from using gm/ID as a primary design parameter to consider all operation regions including strong, moderate, and weak inversion regions, and enables the lowest power design. SPICE-based lookup table approach is employed to optimize the operation region specified by the gm/ID with sufficient accuracy for short-channel transistors. The optimized design flow features 1) a proposal of the worst-case design scenario for specification and gm/ID lookup table generations from worst-case SPICE simulations, 2) an optimization procedure accomplished by the combination of analytical and simulation-based approaches in order to eliminate tweaking of circuit parameters, and 3) an additional use of gm/ID subplots to take second-order effects into account. A gain-boosted folded-cascode OTA for a switched capacitor circuit is adopted as a target topology to explore the effectiveness of the proposed design methodology for a circuit with complex topology. Analytical expressions of the gain-boosted folded-cascode OTA in terms of DC gain, frequency response and output noise are presented, and detailed optimization of gm/IDs as well as circuit parameters are illustrated. The optimization flow is verified for the application to a residue amplifier in a 10-bit 125 MS/s pipeline A/D converter implemented in a 0.18 µm CMOS technology. The optimized circuit satisfies the required specification for all corner simulations without additional tweaking of circuit parameters. We finally explore the possibility of applying this design methodology as a technology migration tool, and illustrate the failure analysis by comparing the differences in the gm/ID characteristics.

  • Power Minimization for Dual- and Triple-Supply Digital Circuits via Integer Linear Programming

    Ki-Yong AHN  Chong-Min KYUNG  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E92-A No:9
      Page(s):
    2318-2325

    This paper proposes an Integer Linear Programming (ILP)-based power minimization method by partitioning into regions, first, with three different VDD's(PM3V), and, secondly, with two different VDD's(PM2V). To reduce the solving time of triple-VDD case (PM3V), we also proposed a partitioned ILP method(p-PM3V). The proposed method provides 29% power saving on the average in the case of triple-VDD compared to the case of single VDD. Power reduction of PM3V compared to Clustered Voltage Scaling (CVS) was about 18%. Compared to the unpartitioned ILP formulation(PM3V), the partitioned ILP method(p-PM3V) reduced the total solution time by 46% at the cost of additional power consumption within 1.3%.

  • Architectural Exploration and Design of Time-Interleaved SAR Arrays for Low-Power and High Speed A/D Converters

    Sergio SAPONARA  Pierluigi NUZZO  Claudio NANI  Geert VAN DER PLAS  Luca FANUCCI  

     
    PAPER

      Vol:
    E92-C No:6
      Page(s):
    843-851

    Time-interleaved (TI) analog-to-digital converters (ADCs) are frequently advocated as a power-efficient solution to realize the high sampling rates required in single-chip transceivers for the emerging communication schemes: ultra-wideband, fast serial links, cognitive-radio and software-defined radio. However, the combined effects of multiple distortion sources due to channel mismatches (bandwidth, offset, gain and timing) severely affect system performance and power consumption of a TI ADC and need to be accounted for since the earlier design phases. In this paper, system-level design of TI ADCs is addressed through a platform-based methodology, enabling effective investigation of different speed/resolution scenarios as well as the impact of parallelism on accuracy, yield, sampling-rate, area and power consumption. Design space exploration of a TI successive approximation ADC is performed top-down via Monte Carlo simulations, by exploiting behavioral models built bottom-up after characterizing feasible implementations of the main building blocks in a 90-nm 1-V CMOS process. As a result, two implementations of the TI ADC are proposed that are capable to provide an outstanding figure-of-merit below 0.15 pJ/conversion-step.

  • Design of Low Power QPP Interleave Address Generator Using the Periodicity of QPP

    Won-Ho LEE  Chong Suck RIM  

     
    LETTER-VLSI Design Technology and CAD

      Vol:
    E92-A No:6
      Page(s):
    1538-1540

    This paper presents two power-saving designs for Quadratic Polynomial Permutation (QPP) interleave address generator of which interleave length K is fixed and unfixed, respectively. These designs are based on our observation that the quadratic term f2x2%K of f(x) = (f1x+f2x2)%K, which is the QPP address generating function, has a short period and is symmetric within the period. Power consumption is reduced by 27.4% in the design with fixed-K and 5.4% in the design with unfixed-K on the average for various values of K, when compared with existing designs.

  • Look-Ahead Dynamic Threshold Voltage Control Scheme for Improving Write Margin of SOI-7T-SRAM

    Masaaki IIJIMA  Kayoko SETO  Masahiro NUMA  Akira TADA  Takashi IPPOSHI  

     
    LETTER-Memory Design and Test

      Vol:
    E90-A No:12
      Page(s):
    2691-2694

    Instability of SRAM memory cells derived from aggressive technology scaling has been recently one of the most significant issues. Although a 7T-SRAM cell with an area-tolerable separated read port improves read margins even at sub-1V, it unfortunately results in degradation of write margins. In order to assist the write operation, we address a new memory cell employing a look-ahead body-bias which dynamically controls the threshold voltage. Simulation results have shown improvement in both the write margins and access time without increasing the leakage power derived from the body-bias.

  • A Fast Probability-Based Algorithm for Leakage Current Reduction Considering Controller Cost

    Tsung-Yi WU  Jr-Luen TZENG  

     
    PAPER-Circuit Synthesis

      Vol:
    E90-A No:12
      Page(s):
    2718-2726

    Because the leakage current of a digital circuit depends on the states of the circuit's logic gates, assigning a minimum leakage vector (MLV) for the primary inputs and the flip-flops' outputs of the circuit that operates in the sleep mode is a popular technique for leakage current reduction. In this paper, we propose a novel probability-based algorithm and technique that can rapidly find an MLV. Unlike most traditional techniques that ignore the leakage current overhead of the newborn vector controller, our technique can take this overhead into account. Ignoring this overhead during solution space exploration may bring a side effect that is misrecognizing a non-optimal solution as an optimal one. Experimental results show that our heuristic algorithm can reduce the leakage current up to 59.5% and can find the optimal solutions on most of the small MCNC benchmark circuits. Moreover, the required CPU time of our probability-based program is significantly less than that of a random search program.

  • Boosted Voltage Scheme with Active Body-Biasing Control on PD-SOI for Ultra Low Voltage Operation

    Masaaki IIJIMA  Masayuki KITAMURA  Masahiro NUMA  Akira TADA  Takashi IPPOSHI  Shigeto MAEGAWA  

     
    PAPER-Digital

      Vol:
    E90-C No:4
      Page(s):
    666-674

    In this paper, we propose an Active Body-biasing Controlled (ABC)-Bootstrap PTL (Pass-Transistor Logic) on PD-SOI for ultra low power design. Although simply lowering the supply voltage (VDD) causes a lack of driving power, our boosted voltage scheme employing a strong capacitive coupling with ABC-SOI improves a driving power and allows lower voltage operation. We also present an SOI-SRAM design boosting the word line (WL) voltage higher than VDD in short transition time without dual power supply rails. Simulation results have shown improvement in both the delay time and power consumption.

  • Adaptive Supply Voltage for Low-Power Ripple-Carry and Carry-Select Adders

    Hiroaki SUZUKI  Woopyo JEONG  Kaushik ROY  

     
    PAPER-Electronic Circuits

      Vol:
    E90-C No:4
      Page(s):
    865-876

    Demands for the low power VLSI have been pushing the development of aggressive design methodologies to reduce the power consumption drastically. To meet the growing demand, we propose low power adders that adaptively select supply voltages based on the input vector patterns. First, we apply the proposed scheme to the Ripple Carry Adder (RCA). A prototype design by a 0.18 µm CMOS technology shows that the Adaptive VDD 32-bit RCA achieves 25% power improvement over the conventional RCA with similar speed. The proposed adder cancels out the delay penalty, utilizing two innovative techniques: carry-skip techniques on the checking operands, and the use of Complementary Pass Transistor Logic (CPL) with dual supply voltage for level conversion. As an expansion to faster adder architectures, we extend the proposal to the Carry-Select Adders (CSA) composed of the RCA sub-blocks. We achieved 24% power improvement on the 128-bit CSA prototype over a conventional design. The proposed scheme also achieves stand-by leakage power reduction--for 32-bit and 128-bit Adaptive RCA and CSA, respectively, 62% and 54% leakage reduction was possible.

  • An Energy-Efficient Partitioned Instruction Cache Architecture for Embedded Processors

    CheolHong KIM  SungWoo CHUNG  ChuShik JHON  

     
    PAPER-Computer Systems

      Vol:
    E89-D No:4
      Page(s):
    1450-1458

    Energy efficiency of cache memories is crucial in designing embedded processors. Reducing energy consumption in the instruction cache is especially important, since the instruction cache consumes a significant portion of total processor energy. This paper proposes a new instruction cache architecture, named Partitioned Instruction Cache (PI-Cache), for reducing dynamic energy consumption in the instruction cache by partitioning it to smaller (less power-consuming) sub-caches. When the proposed PI-Cache is accessed, only one sub-cache is accessed by utilizing the temporal/spatial locality of applications. In the meantime, other sub-caches are not accessed, leading to dynamic energy reduction. The PI-Cache also reduces dynamic energy consumption by eliminating the energy consumed in tag lookup and comparison. Moreover, the performance gap between the conventional instruction cache and the proposed PI-Cache becomes little when the physical cache access time is considered. We evaluated the energy efficiency by running a cycle accurate simulator, SimpleScalar, with power parameters obtained from CACTI. Simulation results show that the PI-Cache improves the energy-delay product by 20%-54% compared to the conventional direct-mapped instruction cache.

  • Bitwidth Optimization for Low Power Digital FIR Filter Design

    Kosuke TARUMI  Akihiko HYODO  Masanori MUROYAMA  Hiroto YASUURA  

     
    PAPER

      Vol:
    E88-A No:4
      Page(s):
    869-875

    We propose a novel approach for designing a low power datapath in wireless communication systems. Especially, we focus on the digital FIR filter. Our proposed approach can reduce the power consumption and the circuit area of the digital FIR filter by optimizing the bitwidth of the each filter coefficient with keeping the filter calculation accuracy. At first, we formulate the constraints about keeping accuracy of the filter calculations. We define the problem to find the optimized bitwidth of each filter coefficient. Our defined problem can be solved by using the commercial optimization tool. We evaluate the effects of consuming power reduction by comparing the digital FIR filters designed in the same bitwidth of all coefficients. We confirm that our approach is effective for a low power digital FIR filter.

  • A 13.56 MHz CMOS RF Identification Passive Tag LSI with Ferroelectric Random Access Memory

    Shoichi MASUI  Toshiyuki TERAMOTO  

     
    INVITED PAPER

      Vol:
    E88-C No:4
      Page(s):
    601-607

    A radio frequency identification tag LSI operating with the carrier frequency of 13.56 MHz as well as storing nonvolatile information in embedded ferroelectric random access memory (FeRAM) has been developed. A full wave rectifier composed of PMOS transistor diodes and NMOS transistor switches achieves RF-to-DC power conversion efficiency over 54%. The entire 16 kbits write and read transaction time can be reduced to 2.1 sec by the use of FeRAM, which corresponds to 2.2 times speed enhancement over conventional EEPROM based tag LSIs. The communication range of the FeRAM based tag LSI can be effectively improved by storing antitheft information in a ferroelectric nonvolatile flip-flop, which can reduce the power consumption of FeRAM from 27 µW to 5 µW. The communication range for the antitheft gate system becomes 70 cm.

  • Datapath-Layout-Driven Design for Low-Power Standard-Cell LSI Implementation

    Takahiro KAKIMOTO  Hiroyuki OCHI  Takao TSUDA  

     
    LETTER-VLSI Design

      Vol:
    E85-A No:12
      Page(s):
    2795-2798

    As a design flow for low-power FPGA implementation, Datapath-Layout-Driven Design (DLDD) has been proposed. This letter reports the effect of DLDD for standard-cell-based ASIC implementation, and proposes necessary improvements. Experimental results shows that about 8.3% reduction of power dissipation is achieved in the best case.

  • An Efficient Nonlinear Charge Pump Cell for LCD Driver

    Min JIANG  Bing YANG  Lijiu JI  

     
    PAPER-Active Matrix Displays

      Vol:
    E85-C No:11
      Page(s):
    1844-1848

    In this paper a new MOS charge pump architecture is presented, where a clock generator is used in each pump stage of the charge pump circuit to elevate voltage exponentially with stages. This charge pump with a clock level shifter is designed to run at an optimized operation frequency, which can make an excellent compromise between the rise time and the dynamic power dissipation. With less stages than the linear-cascade circuit, the power dissipation and the area of the novel charge pump circuit are markedly decreased. The simulating comparison results based on 1.2 µm CMOS, p-substrate double-poly double-metal process parameters show that the nonlinear charge pump with a high pumping efficiency can supply a steady 1 mA, 16 v output for portable LCDs.

  • Data Driven Power Saving for DCT/IDCT VLSI Macrocell

    Luca FANUCCI  Sergio SAPONARA  

     
    LETTER-VLSI Design Technology and CAD

      Vol:
    E85-A No:7
      Page(s):
    1760-1765

    In this letter a low-complexity and low-power realization of the 2D discrete-cosine-transform and its inverse (DCT/IDCT) is presented. A VLSI circuit based on the Chen algorithm with the distributed arithmetic approach is described. Furthermore low-power design techniques, based on clock gating and data driven switching activity reduction, are used to decrease the circuit power consumption. To this aim, input signal statistics have been extracted from H.263/MPEG verification models. Finally, circuit performance is compared to known software solutions and dedicated full-custom ones.

1-20hit(34hit)