1-7hit |
Wei-Kai CHENG Jui-Hung HUNG Yi-Hsuan CHIU
As the increasing complexity of chip design, reducing both power consumption and clock skew becomes a crucial research topic in clock network synthesis. Among various clock network synthesis approaches, clock tree has less power consumption in comparison with clock mesh structure. In contrast, clock mesh has a higher tolerance of process variation and hence is easier to satisfy the clock skew constraint. To reduce the power consumption of clock mesh network, an effective way is to minimize the wire capacitance of stub wires. In addition, integration of clock gating and register clustering techniques on clock mesh network can further reduce dynamic power consumption. In this paper, under both enable timing constraint and clock skew constraint, we propose a methodology to reduce the switching capacitance by non-uniform clock mesh synthesis, clock gate insertion and register clustering. In comparison with clock mesh synthesis and clock gating technique individually, experimental results show that our methodology can improve both the clock skew and switching capacitance efficiently.
Naoya OKADA Yuichi NAKAMURA Shinji KIMURA
Nonvolatile flip-flop enables leakage power reduction in logic circuits and quick return from standby mode. However, it has limited write endurance, and its power consumption for writing is larger than that of conventional D flip-flop (DFF). For this reason, it is important to reduce the number of write operations. The write operations can be reduced by stopping the clock signal to synchronous flip-flops because write operations are executed only when the clock is applied to the flip-flops. In such clock gating, a method using Exclusive OR (XOR) of the current value and the new value as the control signal is well known. The XOR based method is effective, but there are several cases where the write operations can be reduced even if the current value and the new value are different. The paper proposes a method to detect such unnecessary write operations based on state transition analysis, and proposes a write control method to save power consumption of nonvolatile flip-flops. In the method, redundant bits are detected to reduce the number of write operations. If the next state and the outputs do not depend on some current bit, the bit is redundant and not necessary to write. The method is based on Binary Decision Diagram (BDD) calculation. We construct write control circuits to stop the clock signal by converting BDDs representing a set of states where write operations are unnecessary. Proposed method can be combined with the XOR based method and reduce the total write operations. We apply combined method to some benchmark circuits and estimate the power consumption with Synopsys NanoSim. On average, 15.0% power consumption can be reduced compared with only the XOR based method.
Xin MAN Takashi HORIYAMA Shinji KIMURA
Clock gating is supported by commercial tools as a power optimization feature based on the guard signal described in HDL (structural method). However, the identification of control signals for gated registers is hard and designer-intensive work. Besides, since the clock gating cells also consume power, it is imperative to minimize the number of inserted clock gating cells and their switching activities for power optimization. In this paper, we propose an automatic multi-stage clock gating algorithm with ILP (Integer Linear Programming) formulation, including clock gating control candidate extraction, constraints construction and optimum control signal selection. By multi-stage clock gating, unnecessary clock pulses to clock gating cells can be avoided by other clock gating cells, so that the switching activity of clock gating cells can be reduced. We find that any multi-stage control signals are also single-stage control signals, and any combination of signals can be selected from single-stage candidates. The proposed method can be applied to 3 or more cascaded stages. The multi-stage clock gating optimization problem is formulated as constraints in LP format for the selection of cascaded clock-gating order of multi-stage candidate combinations, and a commercial ILP solver (IBM CPLEX) is applied to obtain the control signals for each register with minimum switching activity. Those signals are used to generate a gate level description with guarded registers from original design, and a commercial synthesis and layout tools are applied to obtain the circuit with multi-stage clock gating. For a set of benchmark circuits and a Low Density Parity Check (LDPC) Decoder (6.6k gates, 212 F.F.s), the proposed method is applied and actual power consumption is estimated using Synopsys NanoSim after layout. On average, 31% actual power reduction has been obtained compared with original designs with structural clock gating, and more than 10% improvement has been achieved for some circuits compared with single-stage optimization method. CPU time for optimum multi-stage control selection is several seconds for up to 25k variables in LP format. By applying the proposed clock gating, area can also be reduced since the multiplexors controlling register inputs are eliminated.
Ki-Sung SOHN Da-In HAN Ki-Ju BAEK Nam-Soo KIM Yeong-Seuk KIM
A new clock gating circuit suitable for shift register is presented. The proposed clock gating circuit that consists of basic NOR gates is low power and small area. The power consumption of a 16-bit shift register implemented with the proposed clock gating circuit is about 66% lower than that found when using the conventional design.
Xin MAN Takashi HORIYAMA Shinji KIMURA
Clock gating is the insertion of control signal for registers to switch off unnecessary clock signals selectively without violating the functional correctness of the original design so as to reduce the dynamic power consumption. Commercial EDA tools usually have a mechanism to generate clock gating logic based on the structural method where the control signals specified by designers are used, and the effectiveness of the clock gating depends on the specified control signals. In the research, we focus on the automatic clock gating logic generation and propose a method based on the candidate extraction and control signal selection. We formalize the control signal selection using linear formulae and devise an optimization method based on BDD. The method is effective for circuits with a lot of shared candidates by different registers. The method is applied to counter circuits to check the co-relation with power simulation results and a set of benchmark circuits. 19.1-71.9% power reduction has been found on counter circuitsafter layout and 2.3-18.0% cost reduction on benchmark circuits.
Takashi HASHIMOTO Shunichi KUROMARU Masayoshi TOUJIMA Yasuo KOHASHI Masatoshi MATSUO Toshihiro MORIIWA Masahiro OHASHI Tsuyoshi NAKAMURA Mana HAMADA Yuji SUGISAWA Miki KUROMARU Tomonori YONEZAWA Satoshi KAJITA Takahiro KONDO Hiroki OTSUKI Kohkichi HASHIMOTO Hiromasa NAKAJIMA Taro FUKUNAGA Hiroaki TOIDA Yasuo IIZUKA Hitoshi FUJIMOTO Junji MICHIYAMA
A low power MPEG-4 video codec LSI with the capability for core profile decoding is presented. A 16-b DSP with a vector pipeline architecture and a 32-b arithmetic unit, eight dedicated hardware engines to accelerate MPEG-4 SP@L1 codec, CP@L1 decoding and post video processing, 20-Mb embedded DRAM, and three peripheral blocks are integrated together on a single chip. MPEG-4 SP@L1 codec, CP@L1 decoding and post video processing are realized with a hybrid architecture consisting of a programmable DSP and dedicated hardware engines at low operating frequency. In order to reduce the power consumption, clock gating technique is fully adopted in each hardware block and embedded DRAM is employed. The chip is implemented using 0.18-µm quad-metal CMOS technology, and its die area is 8.8 mm 8.6 mm. The power consumption is 90 mW at a SP@L1 codec and 110 mW at a CP@L1 decoding.
In this letter a low-complexity and low-power realization of the 2D discrete-cosine-transform and its inverse (DCT/IDCT) is presented. A VLSI circuit based on the Chen algorithm with the distributed arithmetic approach is described. Furthermore low-power design techniques, based on clock gating and data driven switching activity reduction, are used to decrease the circuit power consumption. To this aim, input signal statistics have been extracted from H.263/MPEG verification models. Finally, circuit performance is compared to known software solutions and dedicated full-custom ones.