Yuki IMAI Shinichi NISHIZAWA Kazuhito ITO
Environmental power generation devices such as solar cells are used as power sources for IoT devices. Because such power sources have a large internal resistance, an LSI in an IoT device may malfunction when it operates at high speed: a large current flows and the supply voltage drops. In this paper, a standard cell library of stacked-structure cells is proposed that increases the delay of logic circuits, within a range that does not exceed the clock cycle, thereby reducing the maximum current of the LSIs. We show that the maximum power consumption of LSIs can be reduced without increasing their energy consumption.
Hiroyuki UZAWA Kazuhiko TERADA Koyo NITTA
The power consumption of optical network units (ONUs) is a major issue in optical access networks. The downstream buffer is one of the largest power consumers among the functional blocks of an ONU. A cyclic sleep scheme for reducing power has been reported, which periodically powers off not only the downstream buffer but also other components, such as optical transceivers, when the idle period is long. However, when the idle period is short, it cannot power off those components even if the input data rate is low. Therefore, as continuous traffic, such as video, increases, the power-reduction effect decreases. To resolve this issue, we propose another sleep scheme in which the downstream buffer can be partially powered off by cooperative operation with an optical line terminal. Simulation and experimental results indicate that the proposed scheme reduces ONU power consumption without causing frame loss even while the ONU continuously receives traffic and the idle period is short.
Takuya KOJIMA Naoki ANDO Hayate OKUHARA Ng. Anh Vu DOAN Hideharu AMANO
Variable Pipeline Cool Mega Array (VPCMA) is a low-power Coarse Grained Reconfigurable Architecture (CGRA) based on the concept of CMA (Cool Mega Array). It provides a pipeline structure in the PE array that can be configured to fit target algorithms and required performance. VPCMA also uses Silicon On Thin Buried oxide (SOTB) technology, a type of Fully Depleted Silicon On Insulator (FDSOI), so its body bias voltage can be controlled to balance performance and leakage power. In this paper, we study the optimization of the VPCMA body bias while simultaneously considering its variable pipeline structure. Evaluations show that, for the studied applications, energy consumption can be reduced on average by 17.75% compared with the zero bias case (without body bias control) and by 10.49% compared with the uniform case (control of the whole PE array), while respecting performance constraints. Moreover, appropriate body bias control extends the achievable performance range, enabling broader trade-off analyses between consumption and performance. By considering the dynamic power as well as the static power, a more appropriate pipeline structure and body bias voltage can be obtained. In addition, when control of VDD is integrated, higher performance can be achieved with a steady increase in power. These promising results show that applying an adequate optimization technique for body bias control while simultaneously considering pipeline structures not only enables further power reduction than previous methods but also allows more trade-off analysis possibilities.
Miseon HAN Yeoul NA Dongha JUNG Hokyoon LEE Seon WOOK KIM Youngsun HAN
A memory controller refreshes DRAM rows periodically in order to prevent DRAM cells from losing data over time. Refreshes consume a large amount of energy, and the problem will become worse as DRAM capacity grows. Previously proposed selective refreshing techniques are either conservative in exploiting the opportunity or expensive in terms of required implementation overhead. In this paper, we propose a novel DRAM selective refresh technique that uses page residence in a memory hierarchy with a hardware-managed TLB. Our technique maximizes the opportunity to optimize refreshing by activating/deactivating refreshes for DRAM pages when their PTEs are inserted into/evicted from the TLB or data caches, while the implementation cost is minimized by slightly extending the existing infrastructure. Our experiment shows that the proposed technique can reduce DRAM refresh power by 43.6% on average and EDP by 3.5% with a small amount of hardware overhead.
Masayoshi YOSHIMURA Yoshiyasu TAKAHASHI Hiroshi YAMAZAKI Toshinori HOSOKAWA
High power dissipation can be caused by high launch-induced switching activity when the response to a test pattern is captured by flip-flops (FFs) in at-speed scan testing, resulting in excessive IR drop. IR drop may cause significant capture-induced yield loss in the deep submicron era. It is known that test modification methods using X-identification and X-filling are effective in reducing power dissipation in the capture cycle. Conventional low power dissipation oriented X-filling methods consecutively select FFs and assign values to decrease the number of transitions on the FFs. In this paper, we propose a novel low power dissipation oriented X-filling method using SAT solvers that conducts simultaneous X-filling for some FFs. We also propose a selection order of FFs based on a correlation coefficient between transitions of FFs and power dissipation. Experimental results show that the proposed method was effective for ISCAS'89 and ITC'99 benchmark circuits compared with justification-probability-based fill.
Yusuke MATSUSHITA Hayate OKUHARA Koichiro MASUYAMA Yu FUJITA Ryuta KAWANO Hideharu AMANO
Body biasing can be used to control leakage power and performance by changing the threshold voltage of transistors after fabrication. In particular, a new process called Silicon-On-Thin-BOX (SOTB) CMOS can control this balance over a wide range. When it is applied to a Coarse Grained Reconfigurable Array (CGRA), the leakage power can be greatly reduced by precise bias control with a small domain size that includes only a few PEs. On the other hand, the area overhead for separating power domains and routing the many wires for body bias voltage supply increases. This paper explores the granularity of the domain size for an energy-efficient CGRA called CMA (Cool Mega Array). Using a Genetic Algorithm based body bias assignment method, the leakage reduction for various grain sizes was evaluated. As a result, a domain with 2x1 PEs achieved about 40% power reduction with a 6% area overhead. It was found that a combination of three body bias voltages (zero bias, weak reverse bias, and strong reverse bias) achieves the optimal balance between leakage reduction and area overhead in most cases.
The Helmholtz-Kohlrausch effect is a visual characteristic whereby humans perceive colors with higher saturation as brighter. In the proposed method, the pixel value is reduced by increasing the saturation while maintaining the hue and value in the HSV color space, which saves power in OLED displays because their power consumption depends directly on the pixel value. Although the luminance decreases, the brightness of the image is maintained by the Helmholtz-Kohlrausch effect. To suppress an excessive increase in saturation, the increase factor of saturation is reduced as brightness increases. As the maximum increase factor of saturation, kMAX, increases, more power is saved but unpleasant color changes occur. A subjective evaluation experiment with 23 test images consisting of skin, natural, and non-natural images found that kMAX should be less than 2.0 to suppress unpleasant color changes. When kMAX is 2.0, the power saving is 8.0%. The effectiveness of the proposed technique is confirmed using a smartphone with a 4.5-inch diagonal RGB AMOLED display.
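The saturation boost described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the linear taper of the increase factor from k_max at V = 0 down to 1 at V = 1, and the use of the RGB channel sum as a rough stand-in for OLED power are all assumptions made for illustration.

```python
import colorsys


def boost_saturation(rgb, k_max=2.0):
    """Increase saturation of one pixel while keeping hue and value (HSV).

    rgb: tuple of floats in [0, 1].
    k_max: maximum saturation increase factor (kMAX in the abstract).
    The linear taper below is a hypothetical choice; the abstract only says
    the factor is reduced as brightness increases.
    """
    r, g, b = rgb
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    k = k_max - (k_max - 1.0) * v      # assumed taper: k_max at v=0, 1 at v=1
    s_new = min(1.0, s * k)            # saturation cannot exceed 1
    return colorsys.hsv_to_rgb(h, s_new, v)


def channel_sum(rgb):
    """Crude power proxy (assumption): sum of the RGB channel values."""
    return sum(rgb)


if __name__ == "__main__":
    before = (0.8, 0.4, 0.4)
    after = boost_saturation(before)
    print(before, channel_sum(before))
    print(after, channel_sum(after))
```

Because V in HSV is the largest channel, keeping V fixed while raising S lowers only the smaller channels, which is why the pixel values (and this proxy) go down.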
Masahiko SEKI Masato FUJII Tomokazu SHIGA
This paper proposes an address power reduction method for plasma display panels (PDPs) using subfield data smoothing based on a visual masking effect. High-resolution, high-frame-rate PDPs have large address power loss caused by parasitic capacitance. Although the address power is reduced by smoothing the subfield data, noise is generated. The proposed method reduces the address power while maintaining the image quality by choosing the smoothing area of the address data based on the visual masking effect. The results of subjective assessment for the images based on smoothed address data indicate that image quality is maintained.
Kohei MIYASE Ryota SAKAI Xiaoqing WEN Masao ASO Hiroshi FURUKAWA Yuta YAMATO Seiji KAJIHARA
Test power has become a critical issue, especially for low-power devices with deeply optimized functional power profiles. Particularly, excessive capture power in at-speed scan testing may cause timing failures that result in test-induced yield loss. This has made capture-safety checking mandatory for test vectors. However, previous capture-safety checking metrics suffer from inadequate accuracy since they ignore the time relations among different transitions caused by a test vector in a circuit. This paper presents a novel metric called the Transition-Time-Relation-based (TTR) metric which takes transition time relations into consideration in capture-safety checking. Detailed analysis done on an industrial circuit has demonstrated the advantages of the TTR metric. Capture-safety checking with the TTR metric greatly improves the accuracy of test vector sign-off and low-capture-power test generation.
Akihito MORIMOTO Nobuhiko MIKI Yukihiko OKUMURA
In Long-Term Evolution (LTE)-Advanced, heterogeneous networks are important to further improve the system throughput per unit area. In heterogeneous network deployment, low power nodes such as picocells are overlaid onto macrocells. In the downlink, the combined usage of inter-cell interference coordination (ICIC), which is a technique that reduces the severe interference from macrocells by reducing the transmission power or stopping the transmission from the macrocells, and cell range expansion (CRE), which is a technique that expands the cell radius of picocells by biasing the received signal power, is very effective in improving the system and cell-edge user throughput. In this paper, we consider two types of ICIC. The first one reduces the transmission power from the macrocells (referred to as reduced power ICIC) and the second one stops the transmission from the macrocells (referred to as zero power ICIC). This paper investigates the impact of the reduction in the transmission power when using reduced power ICIC and the restriction on the modulation scheme caused by the reduction in the transmission power when using reduced power ICIC on the user throughput performance with the CRE offset value as a parameter. In addition, the throughput performance when applying reduced power ICIC is compared to that when applying zero power ICIC. Simulation results show that the user throughput with reduced power ICIC is not sensitive to the protected subframe ratio compared to that with zero power ICIC even if the modulation scheme is restricted to only QPSK in the protected subframes. This indicates that reduced power ICIC is more robust than zero power ICIC for non-optimum protected subframe ratios.
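As a rough illustration of how cell range expansion biases cell selection, the sketch below adds the CRE offset to the received power of picocells before choosing the strongest cell. The function name, the tuple layout, and the example power values are hypothetical; the abstract does not specify the selection rule in this detail.

```python
def select_serving_cell(cells, cre_offset_db):
    """Pick the serving cell by biased received signal power.

    cells: list of (name, kind, received_power_dbm), kind is "macro" or "pico".
    cre_offset_db: bias in dB added to picocells only (the CRE offset).
    """
    def biased_power(cell):
        _name, kind, power_dbm = cell
        return power_dbm + (cre_offset_db if kind == "pico" else 0.0)

    return max(cells, key=biased_power)[0]


if __name__ == "__main__":
    cells = [("macro-1", "macro", -80.0), ("pico-1", "pico", -85.0)]
    print(select_serving_cell(cells, 0.0))  # no bias
    print(select_serving_cell(cells, 8.0))  # biased toward the picocell
```

A user attached to a picocell through the offset still receives the macrocell more strongly, which is why ICIC in protected subframes matters for such users.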
In recent years, the demand for low-power design has remained undiminished. In this paper, a pseudo power gating (SPG) structure using a normal logic cell is proposed to extend power gating to an ultrafine-grained region at the gate level. In the proposed method, the controlling value of a logic element is used to control the switching activity of modules computing other inputs of the element. For each element, there exists a submodule controlled by an input to the element. Power reduction is maximized by controlling the order of the submodule selection. A basic algorithm and a switching-activity-first algorithm have been developed to optimize the power. In this application, a steady maximum depth constraint is added to prevent the depth increase caused by the insertion of the control signal. In this work, various factors affecting the power consumption of library-level circuits with the SPG are determined. Among these factors, glitches increase the power consumption, and a method to reduce their occurrence is proposed by considering the parity of inverters. The proposed SPG method was evaluated through simulation of the netlist extracted from the layout using the VDEC Rohm 0.18 µm process. Experiments on ISCAS'85 benchmarks show that the reduction in total power consumption is 13% on average with a 2.5% circuit delay degradation. Finally, the effectiveness of the proposed method under different primary input statistics is considered.
Xin MAN Takashi HORIYAMA Shinji KIMURA
Clock gating is supported by commercial tools as a power optimization feature based on the guard signal described in HDL (structural method). However, the identification of control signals for gated registers is hard and designer-intensive work. Besides, since the clock gating cells also consume power, it is imperative to minimize the number of inserted clock gating cells and their switching activities for power optimization. In this paper, we propose an automatic multi-stage clock gating algorithm with ILP (Integer Linear Programming) formulation, including clock gating control candidate extraction, constraint construction, and optimum control signal selection. With multi-stage clock gating, unnecessary clock pulses to clock gating cells can be blocked by other clock gating cells, so that the switching activity of clock gating cells can be reduced. We find that any multi-stage control signal is also a single-stage control signal, and any combination of signals can be selected from single-stage candidates. The proposed method can be applied to 3 or more cascaded stages. The multi-stage clock gating optimization problem is formulated as constraints in LP format for the selection of the cascaded clock-gating order of multi-stage candidate combinations, and a commercial ILP solver (IBM CPLEX) is applied to obtain the control signals for each register with minimum switching activity. Those signals are used to generate a gate-level description with guarded registers from the original design, and commercial synthesis and layout tools are applied to obtain the circuit with multi-stage clock gating. The proposed method is applied to a set of benchmark circuits and a Low Density Parity Check (LDPC) decoder (6.6k gates, 212 F.F.s), and actual power consumption is estimated using Synopsys NanoSim after layout. On average, a 31% actual power reduction has been obtained compared with original designs with structural clock gating, and more than 10% improvement has been achieved for some circuits compared with the single-stage optimization method. CPU time for optimum multi-stage control selection is several seconds for up to 25k variables in LP format. By applying the proposed clock gating, area can also be reduced since the multiplexers controlling register inputs are eliminated.
Xin MAN Takashi HORIYAMA Shinji KIMURA
Clock gating is the insertion of control signals for registers to switch off unnecessary clock signals selectively, without violating the functional correctness of the original design, so as to reduce the dynamic power consumption. Commercial EDA tools usually have a mechanism to generate clock gating logic based on the structural method, where the control signals specified by designers are used, and the effectiveness of the clock gating depends on the specified control signals. In this research, we focus on automatic clock gating logic generation and propose a method based on candidate extraction and control signal selection. We formalize the control signal selection using linear formulae and devise an optimization method based on BDD. The method is effective for circuits with many candidates shared by different registers. The method is applied to counter circuits, to check the correlation with power simulation results, and to a set of benchmark circuits. A 19.1-71.9% power reduction has been found on counter circuits after layout, and a 2.3-18.0% cost reduction on benchmark circuits.
Test data volume and test power are two major concerns when testing modern large circuits. Recently, selective encoding of scan slices was proposed to compress test data. This encoding technique, unlike many other compression techniques that encode all the bits, only encodes the target symbol by specifying a single bit index and copying group data. In this paper, we propose an extended selective encoding that presents two new techniques to optimize this method: a flexible grouping strategy, and an X-bit exploitation and filling strategy. The flexible grouping strategy can decrease the number of groups that need to be encoded and improve the test data compression ratio. The X-bit exploitation and filling strategy can exploit a large number of don't-care bits to reduce testing power with no loss in compression ratio. Experimental results show that the proposed technique needs less test data storage volume and reduces average weighted switching activity by 25.6% and peak weighted switching activity by 9.68% during scan shift compared to selective encoding.
In this paper, a new heuristic algorithm is proposed to optimize the power domain clustering in controlling-value-based (CV-based) power gating technology. In this algorithm, both the switching activity of sleep signals (p) and the overall number of sleep gates (gate count, N) are considered, and the sum of the products of p and N is optimized. The algorithm effectively realizes the total power reduction obtained from CV-based power gating. Even when the maximum depth is kept the same, the proposed algorithm can still achieve approximately 10% more power reduction than the prior algorithms. Furthermore, a detailed comparison between the proposed heuristic algorithm and other possible heuristic algorithms is also presented. HSPICE simulation results show that over 26% total power reduction can be obtained by using the new heuristic algorithm. In addition, the effect of dynamic power reduction through the CV-based power gating method and the delay overhead caused by the switching of sleep transistors are also shown in this paper.
Ching-Hwa CHENG Chin-Hsien WANG
CMOS circuits consume considerable dynamic power in switching. It has been proposed that energy transfer through a rising Vdd dissipates small amounts of energy. In typical power gate circuits, the high-performance PMOS transistors (PSW) that connect the circuit blocks to the power supply reduce leakage power by shutting off outer power (Vdd) to the idle blocks. We expand this technique by utilizing active PSW, which are turned on and off by the clock signal. The PSW are fully turned on only for half of each clock cycle. This means that sufficient Vdd is provided to the circuit continuously for half of each clock cycle. In this manner, the circuit's charge and discharge actions occur in different phases of the cycle, and a ramp Vdd is supplied to the designed circuit; we name this technique "CKVdd." CKVdd is a clock-controlled self-stabilized voltage technique, which generates a stable ramp voltage to suppress the currents effectively. It is proposed to reduce dynamic power dissipation in conventional CMOS digital circuits. Circuits using the CKVdd technique have several characteristics that differ from those of conventional circuits using a constant Vdd power source. First, the CKVdd technique combines the power source and clock signal; it is an efficient low-power technique. Second, CKVdd provides a feasible method to generate ramp-Vdd and low-Vdd, which can be conveniently used to design generic low-power digital circuits. Third, in normal CMOS circuits the dynamic power consumption increases in proportion to the clock frequency, whereas CKVdd results in a lower-than-usual frequency dependency, so it is suitable for designing high-clock-speed circuits. In an evaluation on MPEG VLD decoders against constant Vdd, the CKVdd circuit reduces the usual power dissipation by 48% and the usual peak current by 88% with a small delay penalty.
Lei CHEN Takashi HORIYAMA Yuichi NAKAMURA Shinji KIMURA
Leakage power consumption of logic elements has become a serious problem, especially in the sub-100-nanometer process. In this paper, a novel power gating approach using the controlling value of logic elements is proposed. In the proposed method, sleep signals of the power-gated blocks are extracted completely from the original circuits without any extra logic element. A basic algorithm and a probability-based heuristic algorithm have been developed to implement the basic idea. The steady maximum delay constraint has also been introduced to handle the delay issues. Experiments on the ISCAS'85 benchmarks show that on average 15-36% of logic elements could be power gated at a time for random input patterns, and 3-31% of elements could be stopped under the steady maximum delay constraints. We also show a power optimization method for AND/OR tree circuits, in which more than 80% of gates can be power-gated.
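The notion of a controlling value can be made concrete with a small sketch. A controlling value is an input value that fixes a gate's output regardless of the other inputs (0 for AND/NAND, 1 for OR/NOR), so the logic driving the other inputs is a candidate for sleeping while that value is present. The code below is a simplified illustration, not the algorithm from the paper: the function name is hypothetical, and always keeping the first controlling input as the sleep signal is an arbitrary assumption.

```python
# Controlling value per gate type; XOR/XNOR have none.
CONTROLLING_VALUE = {"AND": 0, "NAND": 0, "OR": 1, "NOR": 1}


def gateable_inputs(gate_type, input_values):
    """Return indices of inputs whose driving logic could be put to sleep.

    If some input carries the gate's controlling value, the output is already
    determined, so every other input's fan-in logic is a sleep candidate.
    The first controlling input is kept awake as the sleep signal (assumption).
    """
    cv = CONTROLLING_VALUE.get(gate_type)
    if cv is None:
        return []
    controllers = [i for i, v in enumerate(input_values) if v == cv]
    if not controllers:
        return []
    keep = controllers[0]
    return [i for i in range(len(input_values)) if i != keep]


if __name__ == "__main__":
    print(gateable_inputs("AND", [0, 1, 1]))
    print(gateable_inputs("OR", [0, 0]))
    print(gateable_inputs("XOR", [1, 0]))
```

Choosing which controlling input to keep, and in what order to select the gated blocks, is where the basic and probability-based algorithms mentioned in the abstract would come in.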
Maziar GOUDARZI Tadayuki MATSUMURA Tohru ISHIHARA
The share of leakage in cache power consumption increases with technology scaling. Choosing a higher threshold voltage (Vth) and/or gate-oxide thickness (Tox) for cache transistors reduces leakage, but impacts cell delay. We show that, due to uncorrelated random within-die delay variation, only some (not all) of the cells actually violate the cache delay after the above change. We propose to add a spare cache way to replace delay-violating cache lines separately in each cache set. By SPICE and gate-level simulations in a commercial 90 nm process, we show that choosing higher Vth and Tox and adding one spare way to a 4-way 16 KB cache reduces leakage power by 42%, which, depending on the share of leakage in total cache power, gives up to 22.59% and 41.37% reduction of total energy in the L1 instruction cache and L2 unified cache, respectively, with a negligible delay penalty and without sacrificing cache capacity or timing yield.
Yosuke TAKAHASHI Yukihide KOHIRA Atsushi TAKAHASHI
The reduction of the peak power consumption of LSI is required to reduce the instability of gate operation, the delay increase, the noise, etc. It is possible to reduce the peak power consumption by clock scheduling because it controls the switching timings of registers and combinational logic elements. In this paper, we propose a fast peak power wave estimation method for clock scheduling and fast clock scheduling methods for peak power reduction. Experiments show that the peak power wave estimated by the proposed method in a few seconds is highly correlated with the peak power wave obtained by HSPICE simulation in several days. By using the proposed peak power wave estimation method, the proposed clock scheduling methods find, in a few minutes, clock schedules that greatly reduce the peak power consumption.
In this paper, we propose a simple peak power reduction (PPR) method based on adaptive inversion of the parity-check block of the codeword in a BCH-coded OFDM system. In the proposed method, the entire parity-check block of the codeword is adaptively inverted by multiplying it by weighting factors (WFs) so as to minimize the PAPR of the OFDM signal, symbol by symbol. At the receiver, these WFs are estimated based on the property of BCH decoding. When a primitive BCH code with single error correction, such as the (31,26) code, is used, the proposed method estimates the WFs by employing a significant bit protection method, which assigns a significant bit to the best subcarrier selected among all possible subcarriers. Computer simulation shows that, when (31,26), (31,21) and (32,21) BCH codes are employed, the PAPR of the OFDM signal at a CCDF (Complementary Cumulative Distribution Function) of 10^-4 is reduced by about 1.9, 2.5 and 2.5 dB by applying the PPR method, while achieving BER performance comparable to the case with perfect WF estimation in an exponentially decaying 12-path Rayleigh fading condition.
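The transmitter-side choice of weighting factor can be sketched as follows. This is an illustrative simplification under several assumptions not stated in the abstract: a single codeword per OFDM symbol, BPSK symbols (+1/-1), one WF in {+1, -1} applied to the whole parity block, and PAPR measured on a 4x oversampled direct IDFT. The function names are hypothetical, and the receiver-side WF estimation is not shown.

```python
import cmath
import math


def papr_db(freq_symbols, oversample=4):
    """PAPR in dB of the time-domain OFDM signal for the given subcarrier symbols."""
    n = len(freq_symbols)
    m = n * oversample
    powers = []
    for t in range(m):
        x = sum(s * cmath.exp(2j * cmath.pi * k * t / m)
                for k, s in enumerate(freq_symbols))
        powers.append(abs(x) ** 2)
    mean_power = sum(powers) / m
    return 10.0 * math.log10(max(powers) / mean_power)


def best_weighting(info, parity):
    """Try WF = +1 and WF = -1 on the parity block; return (wf, papr_db) of the better one."""
    candidates = []
    for wf in (+1, -1):
        symbols = list(info) + [wf * p for p in parity]
        candidates.append((papr_db(symbols), wf))
    papr, wf = min(candidates)
    return wf, papr


if __name__ == "__main__":
    # Worst-case illustration: an all-ones (31,26)-sized block, 26 info + 5 parity symbols.
    info, parity = [1] * 26, [1] * 5
    print(papr_db(info + parity))
    print(best_weighting(info, parity))
```

The all-ones input is chosen only because its PAPR is easy to reason about (all subcarriers add in phase at t = 0); real codewords would come from a BCH encoder, which is omitted here.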