Shoya SONODA Jun SHIOMI Hidetoshi ONODERA
This paper refers to the optimal voltage pair, which minimizes the energy consumption of LSI circuits under a target delay constraint, as a Minimum Energy Point (MEP). This paper proposes an approximation-based implementation method for an MEP tracking system over a wide voltage region. This paper focuses on the MEP characteristics that the energy loss is sufficiently small even though the voltage point changes near the MEP. For example, the energy loss is less than 5% even though the estimated MEP differs by a few tens of millivolts in comparison with the actual MEP. Therefore, the complexity for determining the MEP is relaxed by approximating complex operations such as the logarithmic or the exponential functions in the MEP tracking algorithm, which leads to hardware-/software-efficient implementation. When the MEP tracking algorithm is implemented in software, the MEP estimation time is reduced from 1ms to 13µs by the proposed approximation. When implemented in hardware, the proposed method can reduce the area of an MEP estimation circuit to a quarter. Measurement results of a 32-bit RISC-V processor fabricated in a 65-nm SOTB process technology show that the energy loss introduced by the proposed approximation is less than 2% in comparison with the MEP operation. Furthermore, we show that the MEP can be tracked within about 45 microseconds by the proposed MEP tracking system.
Takumi KOMORI Yutaka MASUDA Jun SHIOMI Tohru ISHIHARA
In the upcoming Internet of Things era, reducing energy consumption of embedded processors is highly desired. Minimum Energy Point Tracking (MEPT) is one of the most efficient methods to reduce both dynamic and static energy consumption of a processor. Previous works proposed a variety of MEPT methods over the past years. However, none of them incorporate their algorithms with practical real-time operating systems, although edge computing applications often require low energy task execution with guaranteeing real-time properties. The difficulty comes from the time complexity for identifying an MEP and changing voltages, which often prevents real-time task scheduling. The conventional Dynamic Voltage and Frequency Scaling (DVFS) only scales the supply voltage. On the other hand, MEPT needs to adjust the body bias voltage in addition. This additional tuning knob makes MEPT much more complicated. This paper proposes an approximate MEPT algorithm, which reduces the complexity of identifying an MEP down to that of DVFS. The key idea is to linearly approximate the relationship between the processor frequency, supply voltage, and body bias voltage. Thanks to the approximation, optimal voltages for a specified clock frequency can be derived immediately. We also propose a task scheduling algorithm, which adjusts processor performance to the workload and then provides a soft real-time capability to the system. The operating system stochastically adjusts the average response time of the processor to be equal to a specified deadline. MEPT will be performed as a general task, and its overhead is considered in the calculation of the frequency. The experiments using a fabricated test chip and on-chip sensors show that the proposed algorithm is a maximum of 16 times more energy-efficient than DVFS. Also, the energy loss induced by the approximation is only 3% at most, and the algorithm does not sacrifice the fundamental real-time properties.
Shoya SONODA Jun SHIOMI Hidetoshi ONODERA
A method for runtime energy optimization based on the supply voltage (Vdd) and the threshold voltage (Vth) scaling is proposed. This paper refers to the optimal voltage pair, which minimizes the energy consumption of LSI circuits under a target delay constraint, as a Minimum Energy Point (MEP). The MEP dynamically fluctuates depending on the operating conditions determined by a target delay constraint, an activity factor and a chip temperature. In order to track the MEP, this paper proposes a closed-form continuous function that determines the MEP over a wide operating performance region ranging from the above-threshold region down to the sub-threshold region. Based on the MEP determination formula, an MEP tracking algorithm is also proposed. The MEP tracking algorithm estimates the MEP even though the operating conditions widely change. Measurement results based on a 32-bit RISC processor fabricated in a 65-nm Silicon On Thin Buried oxide (SOTB) process technology show that the proposed method estimates the MEP within a 5% energy loss in comparison with the actual MEP operation.
Kentaro NAGAI Jun SHIOMI Hidetoshi ONODERA
This paper proposes an area- and energy-efficient DLL-based body bias generator (BBG) for minimum energy operation that controls p-well and n-well bias independently. The BBG can minimize total energy consumption of target circuits under a skewed process condition between nMOSFETs and pMOSFETs. The proposed BBG is composed of digital cells compatible with cell-based design, which enables energy- and area-efficient implementation without additional supply voltages. A test circuit is implemented in a 65-nm FDSOI process. Measurement results using a 32-bit RISC processor on the same chip show that the proposed BBG can reduce energy consumption close to a minimum within a 3% energy loss. In this condition, energy and area overheads of the BBG are 0.2% and 0.12%, respectively.
Takuya KOYANAGI Jun SHIOMI Tohru ISHIHARA Hidetoshi ONODERA
Body bias generators are useful circuits that can reduce variability and power dissipation in LSI circuits. However, the amplifier implemented into the body bias generator is difficult to design because of its complexity. To overcome the difficulty, this paper proposes a clearer cell-based design method of the amplifier than the existing cell-based design methods. The proposed method is based on a simple analytical model, which enables to easily design the amplifiers under various operating conditions. First, we introduce a small signal equivalent circuit of two-stage amplifiers by which we approximate a three-stage amplifier, and introduce a method for determining its design parameters based on the analytical model. Second, we propose a method of tuning parameters such as cell-based phase compensation elements and drive-strength of the output stage. Finally, based on the test chip measurement, we show the advantage of the body bias generator we designed in a cell-based flow over existing designs.
Takuya KOJIMA Naoki ANDO Hayate OKUHARA Ng. Anh Vu DOAN Hideharu AMANO
Variable Pipeline Cool Mega Array (VPCMA) is a low power Coarse Grained Reconfigurable Architecture (CGRA) based on the concept of CMA (Cool Mega Array). It provides a pipeline structure in the PE array that can be configured so as to fit target algorithms and required performance. Also, VPCMA uses the Silicon On Thin Buried oxide (SOTB) technology, a type of Fully Depleted Silicon On Insulator (FDSOI), so it is possible to control its body bias voltage to provide a balance between performance and leakage power. In this paper, we study the optimization of the VPCMA body bias while considering simultaneously its variable pipeline structure. Through evaluations, we can observe that it is possible to achieve an average reduction of energy consumption, for the studied applications, of 17.75% and 10.49% when compared to respectively the zero bias (without body bias control) and the uniform (control of the whole PE array) cases, while respecting performance constraints. Besides, it is observed that, with appropriate body bias control, it is possible to extend the possible performance, hence enabling broader trade-off analyzes between consumption and performance. Considering the dynamic power as well as the static power, more appropriate pipeline structure and body bias voltage can be obtained. In addition, when the control of VDD is integrated, higher performance can be achieved with a steady increase of the power. These promising results show that applying an adequate optimization technique for the body bias control while simultaneously considering pipeline structures can not only enable further power reduction than previous methods, but also allow more trade-off analysis possibilities.
Carlos Cesar CORTES TORRES Hayate OKUHARA Nobuyuki YAMASAKI Hideharu AMANO
In the past decade, real-time systems (RTSs), which must maintain time constraints to avoid catastrophic consequences, have been widely introduced into various embedded systems and Internet of Things (IoTs). The RTSs are required to be energy efficient as they are used in embedded devices in which battery life is important. In this study, we investigated the RTS energy efficiency by analyzing the ability of body bias (BB) in providing a satisfying tradeoff between performance and energy. We propose a practical and realistic model that includes the BB energy and timing overhead in addition to idle region analysis. This study was conducted using accurate parameters extracted from a real chip using silicon on thin box (SOTB) technology. By using the BB control based on the proposed model, about 34% energy reduction was achieved.
Tohru KANEKO Koji HIROSE Akira MATSUZAWA
A current mirror circuit is often used in Gm-cells and current amplifiers in order to obtain high linearity and high accurate current gain. However, it is expected that a threshold voltage mismatch between transistors pair in the current mirror affects these performances in recent scaled technologies. In this paper, negative effects caused by the mismatch in the current mirror are considered and a new calibration technique for the mismatch issues is proposed. In the current mirror without the mismatch, the high-linearity operation is provided by distortion canceling under the condition that the transistors have the same operating points. The threshold voltage mismatch disturbs the cancellation, therefore the distortion is appeared. In order to address the issue, a new calibration technique using a backgating effect is considered. This calibration can reduce the threshold voltage mismatch directly by controlling the body bias voltage with DACs. According to simulation results with Monte Carlo sampling in 65nm CMOS process, owing to the proposed calibration, the worst HD2 and HD3 are improved by 18.4dB and 11.6dB, respectively. In addition, the standard deviation of the current gain is reduced from 399mdB to 34mdB.
Toshihiro KATASHITA Masakazu HIOKI Yohei HORI Hanpei KOIKE
Field-programmable gate array (FPGA) devices are applied for accelerating specific calculations and reducing power consumption in a wide range of areas. One of the challenges associated with FPGAs is reducing static power for enforcing their power effectiveness. We propose a method involving fine-grained reconfiguration of body biases of logic and net resources to reduce the static power of FPGA devices. In addition, we develop an FPGA device called Flex Power FPGA with SOTB technology and demonstrate its power reduction function with a 32-bit counter circuit. In this paper, we describe the construction of an experimental platform to precisely evaluate power consumption and the maximum operating frequency of the device under various operating voltages and body biases with various practical circuits. Using the abovementioned platform, we evaluate the Flex Power FPGA chip at operating voltages of 0.5-1.0 V and at body biases of 0.0-0.5 V. In the evaluation, we use a 32-bit adder, 16-bit multiplier, and an SBOX circuit for AES cryptography. We operate the chip virtually with uniformed body bias voltage to drive all of the logic resources with the same threshold voltage. We demonstrate the advantage of the Flex Power FPGA by comparing its performance with non-reconfigurable biasing.
Shu HOKIMOTO Tohru ISHIHARA Hidetoshi ONODERA
Scaling the supply voltage (Vdd) and threshold voltage (Vth) for minimizing the energy consumption of processors dynamically is highly desired for applications such as wireless sensor network and Internet of Things (IoT). In this paper, we refer to the pair of Vdd and Vth, which minimizes the energy consumption of the processor under a given operating condition, as a minimum energy point (MEP in short). Since the MEP is heavily dependent on an operating condition determined by a chip temperature, an activity factor, a process variation, and a performance required for the processor, it is not very easy to closely track the MEP at runtime. This paper proposes a simple but effective algorithm for dynamically tracking the MEP of a processor under a wide range of operating conditions. Gate-level simulation of a 32-bit RISC processor in a 65nm process demonstrates that the proposed algorithm tracks the MEP under a situation that operating condition widely vary.
Yusuke YOSHIDA Kimiyoshi USAMI
This paper describes a design of energy-efficient Standard Cell Memory (SCM) using Silicon-on-Thin-BOX (SOTB). We present automatic place and routing (P&R) methodology for optimal body-bias separation (BBS) for SCM, which enables to apply different body bias voltages to latches and to other peripheral circuits within SCM. Capability of SOTB to effectively reduce leakage by body biasing is fully exploited in BBS. Simulation results demonstrated that our approach allows us to design SCM with 40% smaller energy dissipation at the energy minimum voltage as compared to the conventional design flow. For the process and temperature variations, Adaptive Body Bias (ABB) for SCM with our BBS provided 70% smaller leakage energy than ABB for the conventional SCM, while achieving the same clock frequency.
Jun SHIOMI Tohru ISHIHARA Hidetoshi ONODERA
Scaling supply voltage (VDD) and threshold voltage (Vth) dynamically has a strong impact on energy efficiency of CMOS LSI circuits. Techniques for optimizing VDD and Vth simultaneously under dynamic workloads are thus widely investigated over the past 15 years. In this paper, we refer to the optimum pair of VDD and Vth, which minimizes the energy consumption of a circuit under a specific performance constraint, as a minimum energy point (MEP). Based on the simple transregional models of a CMOS circuit, this paper derives a simple necessary and sufficient condition for the MEP operation. The simple condition helps find the MEP of CMOS circuits. Measurement results using standard-cell based memories (SCMs) fabricated in a 65-nm process technology also validate the condition derived in this paper.
Yusuke MATSUSHITA Hayate OKUHARA Koichiro MASUYAMA Yu FUJITA Ryuta KAWANO Hideharu AMANO
Body biasing can be used to control the leakage power and performance by changing the threshold voltage of transistors after fabrication. Especially, a new process called Silicon-On-Thin Box (SOTB) CMOS can control their balance widely. When it is applied to a Coarse Grained Reconfigurable Array (CGRA), the leakage power can be much reduced by precise bias control with small domain size including a small number of PEs. On the other hand, the area overhead for separating power domain and delivering a lot of wires for body bias voltage supply increases. This paper explores the grain of domain size of an energy efficient CGRA called CMA (Cool Mega Array). By using Genetic Algorithm based body bias assignment method, the leakage reduction of various grain size was evaluated. As a result, a domain with 2x1 PEs achieved about 40% power reduction with a 6% area overhead. It has appeared that a combination of three body bias voltages; zero bias, weak reverse bias and strong reverse bias can achieve the optimal leakage reduction and area overhead balance in most cases.
Koki IGAWA Masao YANAGISAWA Nozomu TOGAWA
In this paper, we propose a floorplan aware high-level synthesis algorithm with body biasing for delay variation compensation, which minimizes the average leakage energy of manufactured chips. In order to realize floorplan-aware high-level synthesis, we utilize huddle-based distributed register architecture (HDR architecture). HDR architecture divides the chip area into small partitions called a huddle and we can control a body bias voltage for every huddle. During high-level synthesis, we iteratively obtain expected leakage energy for every huddle when applying a body bias voltage. A huddle with smaller expected leakage energy contributes to reducing expected leakage energy of the entire circuit more but can increase the latency. We assign control-data flow graph (CDFG) nodes in non-critical paths to the huddles with larger expected leakage energy and those in critical paths to the huddles with smaller expected leakage energy. We expect to minimize the entire leakage energy in a manufactured chip without increasing its latency. Experimental results show that our algorithm reduces the average leakage energy by up to 39.7% without latency and yield degradation compared with typical-case design with body biasing.
A Field Programmable Gate Array (FPGA) with fine-grained body biasing shows satisfactory static power reduction. Contrarily, the FPGA incurs high overhead because additional body bias selectors and electrical isolation regions are needed to program the threshold voltage (Vt) of elemental circuits such as MUX, buffer and LUT in the FPGA. In this paper, low overhead design of FPGA with fine-grained body biasing is described. The FPGA is designed and fabricated on 65-nm SOTB CMOS technology. By not only adopting a customized design rule specifying that reliability is verified by TEGs but downsizing a body bias selector, the FPGA tile area becomes small by 39% compared with the conventional design, resulting in 900 FPGA tiles with 4,4000 programmable Vt regions. In addition, the chip performance is evaluated by implementing 32-bit binary counter in the supply voltage range of 0.5V from 1.2V. The counter circuit operates at a frequency of 72MHz and 14MHz with the supply voltage of 1.2V and 0.5V respectively. The static power saving of 80% in elemental circuits of the FPGA at 0.5-V supply voltage and 0.5-V reverse body bias voltage is achieved in the best case. In the whole chip including configuration memory and body bias selector in addition to elemental circuits, effective static power reduction around 30% is maintained by applying 0.3-V reverse body bias voltage at each supply voltage.
YoungKyu JANG Ik-Joon CHANG Jinsang KIM
PD-SOI (Partial Depleted Silicon On Insulator) process is a good candidate technology for space system designs, since it features excellent insulation to the silicon substrate compared to the conventional bulk CMOS process. However, the radioactive particles from the low earth orbit can causes single event transient (SET) or abrupt charge collection in a circuit node, leading to a logical error in space systems. Also, the side effects such as the history effect and the kink effect in PD-SOI technology cause the threshold voltage variation, degrading the circuit performance. We propose SET-tolerant PD-SOI CMOS logic circuits using a novel active body-bias scheme. Simulation results show that the proposed circuits are more effective to SET and the side effects as well.
Norihiro KAMAE Akira TSUCHIYA Hidetoshi ONODERA
A forward/reverse body bias generator (BBG) which operates under wide supply-range is proposed. Fine-grained body biasing (FGBB) is effective to reduce variability and increase energy efficiency on digital LSIs. Since FGBB requires a number of BBGs to be implemented, simple design is preferred. We propose a BBG with charge pumps for reverse body bias and the BBG operates under wide supply-range from 0.5,V to 1.2,V. Layout of the BBG was designed in a cell-based flow with an AES core and fabricated in a 65~nm CMOS process. Area of the AES core is 0.22 mm$^2$ and area overhead of the BBG is 2.3%. Demonstration of the AES core shows a successful operation with the supply voltage from 0.5,V to 1.2,V which enables the reduction of power dissipation, for example, of 17% at 400,MHz operation.
Kosuke KATAYAMA Mizuki MOTOYOSHI Kyoya TAKANO Chen Yang LI Shuhei AMAKAWA Minoru FUJISHIMA
E-band communication is allocated to the frequency bands of 71-76 and 81-86GHz. Radio-frequency (RF) front-end components for E-band communication have been realized using compound semiconductor technology. To realize a CMOS LNA for E-band communication, we propose a gain-boosted cascode amplifier (GBCA) stage that simultaneously provides high gain and stability. Designing an LNA from scratch requires considerable time because the tuning of matching networks with consideration of the parasitic elements is complicated. In this paper, we model the characteristics of devices including the effects of their parasitic elements. Using these models, an optimizer can estimate the characteristic of a designed LNA precisely without electromagnetic simulations and gives us the design values of an LNA when the layout constraint is ignored. Starting from the values, a four-stage LNA with a GBCA stage is designed very easily even though the layout constraint is considered and fabricated by a 65nm LP CMOS process. The fabricated LNA is measured, and it is confirmed that it achieves 18.5GHz bandwidth and over 24.3dB gain with 50.6mW power consumption. This is the first LNA to achieve a gain bandwidth of over 300GHz in the E-band among the LNAs utilizing any kind of semiconductor technologies. In this paper, we have proved that CMOS technology, which is suitable for baseband and digital circuitry, is applicable to a communication system covering the entire E-band.
Norihiro KAMAE Akira TSUCHIYA Hidetoshi ONODERA
A body bias generator (BBG) for fine-grained body biasing (FGBB) is proposed. The FGBB is effective to reduce variability and power consumption in a system-on-chip (SoC). Since FGBB needs a number of BBGs, the BBG is preferred to be implemented in cell-based design procedure. In the cell-based design, it is inefficient to provide an extra supply voltage for BBGs. We invented a BBG with switched capacitor configuration and it enables BBG to operate with wide range of the supply voltage from 0.6V to 1.2V. We fabricated the BBG in a 65nm CMOS process to control 0.1mm2 of core circuit with the area overhead of 1.4% for the BBG.
Yasushi IGARASHI Tadashi CHIBA Shin-ichi O'UCHI Meishoku MASAHARA Kunihiro SAKAMOTO
Voltage multiplier (VM) circuits for RF (2.45GHz)-to-DC conversion are developed for battery-less sensor nodes. Converted DC power is charged on a storage capacitor before driving a wireless sensor module. A charging time of the storage capacitor of the proposed VM circuits is reduced 1/10 of the conventional VM circuits, because they have constant current characteristics owing to self-control of body bias in diode-connected SOI MOSFETs. The wireless sensor system composed of the fabricated VM chip and a commercially available sensor module is operated using an RF signal of a wireless LAN modem (2.45GHz) as a power source.