Makoto SAITOH Masaaki AZUMA Atsushi TAKAHASHI
We introduce a clock schedule algorithm to obtain a clock schedule that achieves a shorter clock period and that can be realized by a light clock tree. A shorter clock period can be achieved by controlling the clock input timing of each register, but the required wire length and power consumption of a clock tree tends to be large if clock input timings are determined without considering the locations of registers. To overcome the drawback, our algorithm constructs a cluster that consists of registers with the same clock input timing located in a close area. The registers in each cluster are driven by a buffer and a shorter wire length can be achieved. In our algorithm, first registers are partitioned into clusters by their locations, and clusters are modified to improve the clock period while maintaining the radius of each cluster small. In our experiments, the clock period achieved in average is about 13% shorter than that achieved by a zero-skew clock tree, and about 4% longer than the theoretical minimum. The wire length and power consumption of a clock tree according to an obtained clock schedule is comparable to these of a zero skew tree.
Seiichiro ISHIJIMA Tetsuaki UTSUMI Tomohiro OTO Atsushi TAKAHASHI
A circuit in which the clock is assumed to be distributed periodically to each individual register though not necessarily to all registers simultaneously, called a semi-synchronous circuit, is expected to achieve higher frequency or a smaller clock tree compared with an ordinary synchronous circuit, called a complete-synchronous circuit. In this paper, we propose a circuit design method that realizes a semi-synchronous circuit with higher frequency by modifying the clock tree of a complete-synchronous circuit. We confirm that the proposed method is easy to incorporate with current practical design environment by designing a four stage pipelined processor compatible with MIPS operation code. The obtained processor circuit is the first semi-synchronous circuit designed systematically with theoretical background.
Keiichi KUROKAWA Takuya YASUI Yoichi MATSUMURA Masahiko TOYONAGA Atsushi TAKAHASHI
In several researches in recent years, it is shown that the circuit of a higher clock frequency can be obtained by controlling the clock-input timing of each register. However, the power consumption of the clock-tree obtained by them tends to be larger since the locations of registers are not well taken into account in clock scheduling. In this paper, we propose a novel clock tree synthesis that attains both the higher clock frequency and the lower power consumption. Our proposed algorithm determines the clock-input timings of registers step by step in constructing a clock tree structure. First, the clock period of a circuit is improved by controlling the clock-input timing of each register, and second, the clock-input timings are modified to construct a low power clock tree without deteriorating the obtained clock period. According to our experiments using several benchmark circuits, the power consumption of our clock trees attain about 9.5% smaller than previous methods.
Satoru HANZAWA Hiromasa NODA Takeshi SAKATA Osamu NAGASHIMA Sadayuki MORITA Masanori ISODA Michiyo SUZUKI Sadayuki OHKUMA Kyoko MURAKAMI
A hierarchical timing adjuster that operates with intermittent adjustment has been developed for use in low-power DDR SDRAMs. Intermittent adjustment reduces power consumption in both coarse- and fine-delay circuits. Furthermore, the current-controlled fine-tuning of delay is free of short-circuit current and achieves a resolution of about 0.1 ns. In a design with 0.16-µm node technology, these techniques make the hierarchical timing adjuster able to reduce the operating current to 4.8 mA, which is 20% for the value in a conventional scheme with every-cycle measurement. The proposed timing adjuster achieves a three-cycle lock-in and only generates an internal clock pulse that has coarse resolution in the second cycle. The circuit operates over the range from 60 to 150 MHz, and occupies 0.29 mm2.
In this letter a low-complexity and low-power realization of the 2D discrete-cosine-transform and its inverse (DCT/IDCT) is presented. A VLSI circuit based on the Chen algorithm with the distributed arithmetic approach is described. Furthermore low-power design techniques, based on clock gating and data driven switching activity reduction, are used to decrease the circuit power consumption. To this aim, input signal statistics have been extracted from H.263/MPEG verification models. Finally, circuit performance is compared to known software solutions and dedicated full-custom ones.
Hidekuni TAKAO Fumie INA Kazuaki SAWADA Makoto ISHIDA
In this paper, a novel method of clock feedthrough reduction in CMOS autozeroed operational amplifiers with three-phase clock operation is presented. The operational amplifiers in the method are configured by two autozeroed-gain stages. The differential input stage and the second output gain stage are autozeroed individually by a three-phase clock for autozeroing. The three-phase clock is provided so as to finish the compensation period of the input stage earlier than the end of the second stage compensation period. This operation makes it possible to absorb affection of clock feedthrough in the input stage with the second stage. As a result, residual error of offset compensation is much reduced by the voltage gain of the first stage. The effect of the two-stage autozeroing has been confirmed with SPICE simulation and fabricated CMOS circuit. The results of SPICE simulation showed that the two-stage autozeroed operational amplifier has significant advantage as compared to conventional configuration. Affection of clock feedthrough is reduced to about 1/50 in the two-stage configuration. Fabricated CMOS circuit also showed high potential of the two-stage autozeroed operational amplifier for feedthrough reduction. It has been proven experimentally that the two-stage autozeroing is an effective design approach to reduce clock feedthrough error in CMOS autozeroed operational amplifiers.
This paper describes a new circuit technique for performing clock recovery and data re-timing functions for high-speed source synchronous data communications, such as in burst-mode data transmission. The new clock recovery circuit is fully digital, non-PLL-based, and is capable of retiming the output clock with the received data within one data transition. The absence of analog filters or other analog blocks makes its area much smaller than conventional circuitry. It can also be described by any hardware description language, simulated, and synthesized into any digital process. This enables it to be ported from one technology to another and support system on a chip (SOC) designs. The design concept is demonstrated with T-Spice(R) simulations using a 0.25 µm digital CMOS technology. Static performance was evaluated in terms of supply and temperature dependent skews. The shifts in output clock due to these static conditions were within 40 pS. Also dynamic behaviours such as jitter generation and jitter transfer were evaluated. The circuit generates a jitter of 68 pS in response to a supply noise of 250 mV amplitude and 100 MHz frequency. Input data jitter transfer is within 0.1 dB up to a jitter frequency of 150 MHz.
Shinichi NODA Nozomu TOGAWA Masao YANAGISAWA Tatsuo OHTSUKI
At high-level synthesis for system VLSIs, their power consumption is efficiently reduced by applying gated clocks to them. Since using gated clocks causes the reduction of power consumption and the increase of area/delay, estimating trade-off between power and area/delay by applying gated clocks is very important. In this paper, we discuss the amount of variance of area, delay and power by applying gated clocks. We propose a simple gate-level circuit model and estimation equations. We vary parameters in our proposed circuit model, and evaluate power consumption by back-annotating gate-level simulation results to the original circuit. This paper also proposes a conditional expression for applying gated clocks. The expression shows whether or not we can reduce power consumption by applying gated clocks. We confirm the accuracy of proposed estimation equations by experiments.
Hidehiro TAKATA Rei AKIYAMA Tadao YAMANAKA Haruyuki OHKUMA Yasue SUETSUGU Toshihiro KANAOKA Satoshi KUMAKI Kazuya ISHIHARA Atsuo HANAMI Tetsuya MATSUMURA Tetsuya WATANABE Yoshihide AJIOKA Yoshio MATSUDA Syuhei IWADE
An on-chip, 64-Mb, embedded, DRAM MPEG-2 encoder LSI with a multimedia processor has been developed. To implement this large-scale and high-speed LSI, we have developed the hierarchical skew control of multi-clocks, with timing verification, in which cross-talk noise is considered, and simple measures taken against the IR drop in the power lines through decoupling capacitors. As a result, the target performance of 263 MHz at 1.5 V has been successfully attained and verified, the cross-talk noise has been considered, and, in addition, it has become possible to restrain the IR drop to 166 mV in the 162 MHz operation block.
Keiichi KUROKAWA Takuya YASUI Masahiko TOYONAGA Atsushi TAKAHASHI
In this paper, we propose a new clock tree synthesis method for semi-synchronous circuits. A clock tree obtained by the proposed method is a multi-level multi-way clock tree such that a clock-input timing of each register is a multiple of a predefined unit delay and the wire length from a clock buffer to an element driven by it is bounded. The clock trees are constructed for several practical circuits. The size of constructed clock tree is comparable to a zero skew clock tree. In order to assure the practical quality of the clock trees, they are examined under the five delay conditions, which cover various environmental and manufacturing conditions. As a result, they are proved stable under each condition and improve the clock speed up to 17.3% against the zero skew clock trees.
Xiaohong JIANG Susumu HORIGUCHI
Available statistical skew models are too conservative in estimating the expected clock skew of a well-balanced H-tree. New closed form expressions are presented for accurately estimating the expected values and the variances of both the clock skew and the largest clock delay of a well-balanced H-tree. Based on the new model, clock period optimizations of wafer scale H-tree clock network are investigated under both conventional clocking mode and pipelined clocking mode. It is found that when the conventional clocking mode is used, clock period optimization of wafer scale H-tree is reduced to the minimization of expected largest clock delay under both area restriction and power restriction. On the other hand, when the pipelined clocking mode is considered, the optimization is reduced to the minimization of expected clock skew under power restriction. The results obtained in this paper are very useful in the optimization design of wafer scale H-tree clock distribution networks.
Takashi SUGIHARA Kazuyuki ISHIDA Kenkichi SHIMOMURA Katsuhiro SHIMIZU Yukio KOBAYASHI
Using the chirped grating with temperature control, we demonstrated the adaptive dispersion compensation at 40 Gbit/s RZ transmission. The simple monitoring of the 40 GHz frequency component enables us to automatic control of the adaptive dispersion compensator.
Takashi SUGIHARA Kazuyuki ISHIDA Kenkichi SHIMOMURA Katsuhiro SHIMIZU Yukio KOBAYASHI
Using the chirped grating with temperature control, we demonstrated the adaptive dispersion compensation at 40 Gbit/s RZ transmission. The simple monitoring of the 40 GHz frequency component enables us to automatic control of the adaptive dispersion compensator.
Keiji KISHINE Noboru ISHIHARA Haruhiko ICHINO
This paper describes techniques for widening lock and pull-in ranges and suppressing jitter in clock and data recovery ICs. It is shown theoretically that using a duplicated loop control (DLC)-phase-locked loop (PLL) technique enables wider lock and pull-in ranges in clock and data recovery (CDR) without increasing the cut-off frequency of the jitter transfer function. A 2.5-Gb/s DLC-CDR IC fabricated with a 0.5-µm Si bipolar process provides 2.5 times the lock range and 1.5 times the pull-in range of a conventional CDR IC, and the jitter characteristics of the fabricated CDR IC meet all three STM-16 jitter specifications in ITU-T recommendation G.958.
Hiroyuki MATSUBARA Takahiro WATANABE Tadao NAKAMURA
In this paper, we propose a fine grain Cooled Logic architecture for low-power oriented processors. Cooled Logic detects, in novel hardware method with dual-rail logic, functional blocks to be active, and stops clocks to each of the functional blocks in order to make it inactive at certain periods. To confirm the effectiveness of our approach, we design a 4-bit and a 16-bit event-driven array multipliers, and analyze their power consumption by the HSPICE simulator. As a result, it is shown that Cooled Logic has a tendency to reduce power consumptions in both the functional blocks and the clock drivers of the multipliers.
Eiji ARITA Takashi FUJIWARA Kin-ichiro NISHIYAMA Akiko MAENO Yasuo MATSUNAMI Masahiko NAKAMURA Hirohisa MACHIDA Shuji MURAKAMI Hiroyuki NAKAYAMA Masahiko YOSHIMOTO
A complete single chip multi-format Phase Shift Keying (PSK) demodulator ULSI for Japanese BS digital broadcasting is reported. The carrier recovery system shows the pull-in range up to +/-5 MHz. The clock recovery system cancels the poor group delay characteristic and the orthogonality degradation caused by the analog front end, and improves the BER performance by 0.2 dB. Thus the requirement to the analog front end is relaxed. A digital PLL ensures minimum program clock reference jitter in the output data stream, which simplifies jitter management in the succeeding MPEG2 system decoder. It integrates two 8-bit 60 MHz ADCs, 58 MHz VCO, 1 Mbit SRAM and the 450 K-gate FEC-demodulator core. Implementation of 1 Mbit de-interleaver RAM facilitates the use of a low cost receiver. The 8.8 milion transistor chip occupies the 72 mm2 in a 0.25 µm triple-metal CMOS technology.
Tomoyuki YODA Atsushi TAKAHASHI
A semi-synchronous circuit is a circuit in which the clock is assumed to be distributed periodically to each individual register, though not necessarily to all registers simultaneously. In this paper, we propose an algorithm to achieve the target clock period by modifying a given target clock schedule as small as possible, where the realization cost of the target clock schedule is assumed to be minimum. The proposed algorithm iteratively improves a feasible clock schedule. The algorithm finds a set of registers that can reduce the cost by changing their clock timings with same amount, and changes the clock timing with optimal amount. Experiments show that the algorithm achieves the target clock period with fewer modifications.
Luc RYNDERS Patrick SCHAUMONT Serge VERNALDE Ivo BOLSENS
Timing verification of digital synchronous designs is a complex process that is traditionally carried out deep in the design cycle, at the gate level. A method, embodied in a C++ based design system, is presented that allows modeling and verification of clock regions at a higher level. By combining event-driven, clock-cycle true and behavioral simulation, we are able to perform static and dynamic timing analysis of the clock regions.
Xiaojing SHI Hiroki MATSUMOTO Kenji MURAO
A novel SC (Switched-Capacitor) offset- and gain-compensated sample/hold circuit is presented. It is implemented by a new topology which reduces the effects due to the imperfections of op-amp. Simulation results indicate that the circuit achieves high accuracy without requiring high-quality components.
Hiroyuki MATSUBARA Takahiro WATANABE Tadao NAKAMURA
This paper deals with a new low-power clocking scheme for dynamic logic circuits to reduce power dissipation. Although conventional clocking schemes for dynamic logic circuits are mainly used for high-speed applications like domino circuits, their peak-current are very large due to the concentration of precharging and discharging in a short period. It is hard for these schemes to accomplish both reductions of power dissipation and high performance at the same time. In the field of power engineering, leveling power means decreasing peak-to-peak of power keeping its amount. So, we propose a sophisticated clocking scheme leveling power dissipation of processing elements that mainly reduces power dissipation of clock drivers. Our proposed clocking scheme uses an over-lapped clock with a fine-grain power control, and peak-current becomes lower and power dissipation in short period is leveled without penalty of speed performance. Our proposed scheme is applied to a 4-bit array multiplier, and reductions of power dissipation of both the multiplier and clock driver are measured by the HSPICE simulator based on 0.5 µm CMOS technology. It is shown that power dissipation of clock drivers, 4-bit array multiplier, and the total are reduced by about 13.2 percent, 2.6 percent and 7.0 percent, respectively. As a result, our clocking scheme is effective in reduction of power dissipations of clock drivers.