Fengwei AN Lei CHEN Toshinobu AKAZAWA Shogo YAMASAKI Hans Jürgen MATTAUSCH
Nearest-neighbor-search classifiers are attractive, but their high intrinsic computational demands limit their practical application. In this paper, we propose a coprocessor for k-nearest-neighbor (kNN, k≥1) classification in which squared Euclidean distances (SEDs) are mapped into the clock domain to realize high search speed and energy efficiency. The minimal-SED search is carried out by weighted frequency dividers that drastically reduce the normally exponential increase of the worst-case search-clock number with the bit width of vector components to only a linear increase. This also results in low power dissipation and high area efficiency in comparison to the traditional method using large numbers of adders and comparators. The kNN classifier determines the class of an unknown input sample by a majority decision among the k nearest reference samples. The required majority-decision circuit is integrated with the clock-mapping-based minimal-SED searching architecture and proceeds with the classification immediately after identification of each of the k nearest references. A test chip in 180 nm CMOS technology, which can process 8 dimensions of 32 reference vectors in parallel, achieves low power dissipation of 40.32 mW (at 51.21 MHz clock frequency and 1.8 V supply voltage). Significantly, the distance search circuit consumes only 5.99 mW. Feature vectors with different dimensionality up to 2048 dimensions can be handled by the designed coprocessor due to a dimension extension circuit, enabling large flexibility for usage in different applications.
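As a software reference for the function the coprocessor computes, the kNN decision described above can be sketched as follows. This is only a minimal illustration of SED ranking followed by a majority vote; it does not model the clock-domain mapping or the weighted frequency dividers, and the names squared_euclidean and knn_classify as well as the sample data are hypothetical.

```python
from collections import Counter


def squared_euclidean(a, b):
    """Squared Euclidean distance (SED) between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_classify(sample, references, labels, k=1):
    """Return the majority class among the k references nearest to sample."""
    # Rank reference indices by SED; sorted() is stable, so equal
    # distances keep their original (lower-index-first) order.
    order = sorted(
        range(len(references)),
        key=lambda i: squared_euclidean(sample, references[i]),
    )
    # Majority decision over the labels of the k nearest references.
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 2-dimensional reference set with two classes.
refs = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5)]
labels = ["a", "a", "a", "b", "b"]
print(knn_classify((1, 1), refs, labels, k=3))
```

A software version sorts all distances, whereas the abstract describes a hardware search that identifies the k nearest references one after another and votes immediately after each identification.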
Jungnam BAE Saichandrateja RADHAPURAM Ikkyun JO Takao KIHARA Toshimasa MATSUOKA
We present a low-voltage digitally-controlled oscillator (DCO) with a third-order ΔΣ modulator for use in the medical implant communication service (MICS) frequency band. An optimized DCO core operating in the subthreshold region is designed, based on the gm/ID methodology. A thermometer coder with dynamic element matching and a ΔΣ modulator are implemented for frequency tuning. High frequency resolution is achieved by using the ΔΣ modulator. The ΔΣ-modulator-based LC-DCO implemented in a 130-nm CMOS technology achieves a phase noise of -115.3 dBc/Hz at 200 kHz offset frequency with a tuning range of 382 MHz to 412 MHz for the MICS band. It consumes 700 µW from a 0.7-V supply voltage and has a high frequency resolution of 18 kHz.
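The way a ΔΣ modulator raises frequency resolution can be illustrated with a much simpler first-order modulator than the third-order one used in the paper. The sketch below is an assumption-laden illustration, not the authors' circuit: an accumulator toggles the least significant tuning bit so that its time average equals a fractional tuning value. The function name and the 16-bit accumulator width are hypothetical.

```python
def delta_sigma_first_order(frac, n_steps, acc_bits=16):
    """Return a 0/1 stream whose mean approximates frac (0 <= frac < 1).

    First-order accumulator-based modulator; the paper uses a
    third-order modulator, which shapes quantization noise more steeply.
    """
    modulus = 1 << acc_bits
    step = int(round(frac * modulus))
    acc = 0
    out = []
    for _ in range(n_steps):
        acc += step
        if acc >= modulus:
            # Accumulator overflow: emit a 1 and keep the remainder.
            acc -= modulus
            out.append(1)
        else:
            out.append(0)
    return out


stream = delta_sigma_first_order(0.25, 1000)
print(sum(stream) / len(stream))
```

Averaged over many cycles, the dithered bit behaves like a fractional tuning step, which is how a coarse capacitor bank can reach a resolution finer than one unit element.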
This paper proposes a network clock system that detects degradation in the frequency accuracy of network clocks distributed across a network and finds the sources of the degradation. This system uses two factors to identify degradation in frequency accuracy and an algorithm that finds degradation sources by integrating and analyzing the evaluation results gathered from the entire network. Many frequency stability measurement systems have been proposed, and most are based on time synchronization protocols. These systems also realize avoidance of frequency degradation and identification of the sources of the degradation. Unfortunately, the use of time synchronization protocols is impractical if the service provider, such as NTT, has already installed a frequency synchronization system; the provider must replace massive amounts of equipment with new devices that support the time synchronization protocols. Considering the installation cost, this is an excessive burden on service providers. Therefore, a new system that can detect frequency degradation in network clocks and identify the degradation causes without requiring new equipment is strongly demanded. The proposals made here are implemented by installing new circuit cards in current equipment and a server that runs the algorithm. The proposed system is currently being installed in NTT's network.
Shinnosuke YOSHIDA Youhua SHI Masao YANAGISAWA Nozomu TOGAWA
As process technologies advance, timing-error correction techniques have become important as well. A suspicious timing-error prediction (STEP) technique has been proposed recently, which predicts timing errors by monitoring the middle points, or check points of several speed-paths in a circuit. However, if we insert STEP circuits (STEPCs) in the middle points of all the paths from primary inputs to primary outputs, we need many STEPCs and thus require too much area overhead. How to determine these check points is very important. In this paper, we propose an effective STEPC insertion algorithm minimizing area overhead. Our proposed algorithm moves the STEPC insertion positions to minimize inserted STEPC counts. We apply a max-flow and min-cut approach to determine the optimal positions of inserted STEPCs and reduce the required number of STEPCs to 1/10-1/80 and their area to 1/5-1/8 compared with a naive algorithm. Furthermore, our algorithm realizes 1.12X-1.5X overclocking compared with just inserting STEPCs into several speed-paths.
Shin-ya ABE Youhua SHI Kimiyoshi USAMI Masao YANAGISAWA Nozomu TOGAWA
In this paper, we first propose an HDR-mcd architecture, which integrates periodically all-in-phase based multiple clock domains and multi-cycle interconnect communication into high-level synthesis. In HDR-mcd, an entire chip is divided into several huddles. Huddles can realize synchronization between different clock domains in which interconnection delay should be considered during high-level synthesis. Next, we propose a high-level synthesis algorithm for HDR-mcd, which can reduce energy consumption by optimizing configuration and placement of huddles. Experimental results show that the proposed method achieves 32.5% energy-saving compared with the existing single clock domain based methods.
In traditional time delay estimation methods, it is usually implicitly assumed that the observed signals either propagate only along the direct path or are coherently received. In practice, multipath propagation and incoherent reception always exist simultaneously. In response to this situation, joint maximum likelihood (ML) estimation of the multipath delays and the system error is proposed, and estimation of the number of multipaths is considered as well for the specific incoherent signal model. Furthermore, an algorithm based on Gibbs sampling is developed to solve the multi-dimensional nonlinear ML estimation. The efficiency of the proposed estimator is demonstrated by simulation results.
This paper presents an inductive coupling interface using a relay transmission scheme and a low-skew 3D clock distribution network synchronized with an external reference clock source for 3D chip stacking. A relayed transmission scheme using one coil is proposed to reduce the number of coils in a data link. Coupled resonation is utilized for clock and data recovery (CDR) for the first time in the world, resulting in the elimination of a source-synchronous clock link. As a result, the total number of coils required is reduced to one-fifth of the conventional number required, yielding a significant improvement in data rate, layout area, and energy consumption. A low-skew 3D clock distribution network utilizes vertically coupled LC oscillators and horizontally coupled ring oscillators. The proposed frequency-locking and phase-pulling scheme widens the lock range to ±10%. Two test chips were designed and fabricated in 0.18 µm CMOS. The bandwidth of the proposed interface using relay transmission ThruChip Interface (TCI) is 2.7 Gb/s/mm²; energy consumption per chip is 0.9 pJ/b/chip. Clock skew is less than 18 ps and 25 ps under 1.8 V and 0.9 V supplies, respectively. The distributed RMS jitter is smaller than 1.72 ps.
Yanzi ZHOU Ryo TAKAHASHI Takashi HIKIHARA
In this letter, we establish a model of a digital clock synchronization method for power packet dispatching. The first-order control is carried out to a specified model to achieve the clock synchronization. From the experimental results, it is confirmed that power packets were recognized under autonomous synchronization.
Takashi KAWAMOTO Masato SUZUKI Takayuki NOTO
A serial ATA PHY fabricated in a 0.15-µm CMOS process performs the serial ATA operation in an asynchronous transition by using large variation in the reference clock. This technique calibrates a transmission signal frequency by utilizing the received signal. This is achieved by calibrating the divide ratio of a spread-spectrum clock generator (SSCG). This technique enables a serial ATA PHY to use reference oscillators with a production-frequency tolerance of less than 400ppm, i.e., higher than the permissible TX frequency variations (i.e., 350ppm). The calibrated transmission signal achieved a total jitter of 3.9ps.
Yukihide KOHIRA Atsushi TAKAHASHI
Multi-domain clock skew scheduling in the general-synchronous framework is an effective technique to improve the performance of sequential circuits by using a practical clock distribution network. Although the upper bound of the performance of a circuit increases as the number of clock domains increases in multi-domain clock skew scheduling, the improvement in performance becomes smaller while the cost of the clock distribution network increases considerably. In this paper, a linear-time algorithm that finds an optimum two-domain clock skew schedule in the general-synchronous framework is proposed. Experimental results on ISCAS89 benchmark circuits and artificial data show that optimum circuits are efficiently obtained by our method in a short time.
James LIN Masaya MIYAHARA Akira MATSUZAWA
This paper proposes an ultra-low-voltage, wide signal swing, and clock-scalable differential dynamic amplifier using a common-mode voltage detection technique. The essential characteristics of an amplifier, such as gain, linearity, power consumption, noise, etc., are analyzed. In measurement, the proposed dynamic amplifier achieves a 13dB gain with less than 1dB drop over a differential output signal swing of 340mVpp with a supply voltage of 0.5V. The attained maximum operating frequency is 700MHz. With a 0.7V supply, the gain increases to 16dB with a signal swing of 700mVpp. The prototype amplifier is fabricated in 90nm CMOS technology with the low threshold voltage and the deep N-well options.
In this paper, we propose a new design technique called asynchronous multi-frequency clocking for suppressing EMI at the chip design level by combining two independent EMI-suppressing approaches: multi-frequency clocking and asynchronous circuit design techniques. To show the effectiveness of our approach, a five-stage pipelined asynchronous MIPS with multi-frequency clocking has been implemented on a commercial Xilinx FPGA device. Our approach shows 11.05 dB and 5.88 dB reductions of peak EM radiation in the prototyped implementation when compared to conventional synchronous and bundled-data asynchronous circuit counterparts, respectively.
SinNyoung KIM Akira TSUCHIYA Hidetoshi ONODERA
This paper presents an analysis of radiation-induced clock perturbation in a phase-locked loop (PLL). Due to a trade-off between cost, performance, and reliability, radiation-hardened PLL design needs a robust strategy. Thus, evaluation of radiation vulnerability is important for choosing the robust strategy. The conventional evaluation method, however, is based on brute-force analysis, namely SPICE simulation and experiment. The presented analysis eliminates the brute-force analysis in the evaluation of radiation vulnerability. A set of equations makes it possible to predict the radiation-induced clock perturbation at every sub-circuit. From a demonstration, the most vulnerable nodes have been found, which are validated using a PLL fabricated in a 0.18µm CMOS process.
Susumu KOBAYASHI Fumihiro MINAMI
As the LSI process technology advances and the gate size becomes smaller, the signal delay on interconnect becomes a significant factor in the signal path delay. Also, as the size of interconnect structure becomes smaller, the interconnect process variations have become one of the dominant factors which influence the signal delay and thus clock skew. Therefore, controlling the influence of interconnect process variations on clock skew is a crucial issue in the advanced process technologies. In this paper, we propose a method for minimizing clock skew fluctuations caused by interconnect process variations. The proposed method identifies the suitable balance of clock buffer size and wire length in order to minimize the clock skew fluctuations caused by the interconnect process variations. Experimental results on test circuits of 28nm process technology show that the proposed method reduces the clock skew fluctuations by 30-92% compared to the conventional method.
Xin-Gang WANG Fei WANG Rui JIA Rui CHEN Tian ZHI Hai-Gang YANG
This paper proposes a coarse-fine Time-to-Digital Converter (TDC) based on a Ring-Tapped Delay Line (RTDL). The TDC achieves picosecond-level timing resolution and microsecond-level dynamic range at low cost. The TDC is composed of two coarse time measurement blocks, a time residue generator, and a fine time measurement block. In the coarse blocks, the RTDL is constructed by redesigning the conventional Tapped Delay Line (TDL) in a ring structure. A 12-bit counter is employed in one of the two coarse blocks to count the number of cycles the signal travels around the RTDL. In this way, the input range is increased up to 20.3µs without use of an external reference clock. Besides, the setup time of the soft-edged D-flip-flops (SDFFs) adopted in the RTDL is set to zero. The adjustable time residue generator picks up the time residue of the coarse block and propagates the residue to the fine block. In the fine block, we use a Vernier Ring Oscillator (VRO) with MOS capacitors to achieve a scalable timing resolution of 11.8ps (1 LSB). Experimental results show that the measured characteristic curve is highly linear; the measured DNL and INL are within ±0.6 LSB and ±1.5 LSB, respectively. When stimulated by a constant-interval input, the standard deviation of the system is below 0.35 LSB. The dead time of the proposed TDC is less than 650ps. When operating at 5 MSPS with a 3.3V power supply, the power consumption of the chip is 21.5mW. Owing to the use of the RTDL and VRO structures, the chip core area is only 0.35mm × 0.28mm in a 0.35µm CMOS process.
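The figures quoted above can be related with a short back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that the 20.3µs input range corresponds to the full span of the 12-bit cycle counter; the abstract does not state this, so the derived ring period is an estimate rather than a reported value. All constant and function names are hypothetical.

```python
import math

COUNTER_BITS = 12          # cycle counter width from the abstract
INPUT_RANGE_S = 20.3e-6    # reported input range
LSB_S = 11.8e-12           # reported fine resolution (1 LSB)


def implied_ring_period(input_range_s, counter_bits):
    """Ring period if the full counter span covers the whole input range
    (an assumption, not a figure from the abstract)."""
    return input_range_s / (2 ** counter_bits)


def effective_bits(input_range_s, lsb_s):
    """Number of bits needed to express the range in units of one LSB."""
    return math.log2(input_range_s / lsb_s)


ring_period = implied_ring_period(INPUT_RANGE_S, COUNTER_BITS)
bits = effective_bits(INPUT_RANGE_S, LSB_S)
print(f"implied ring period: {ring_period * 1e9:.2f} ns")
print(f"range / LSB: {INPUT_RANGE_S / LSB_S:.3e} codes, about {bits:.1f} bits")
```

Under that assumption one lap of the ring takes roughly 5 ns, and the ratio of range to resolution corresponds to a little under 21 bits, which is the span the coarse counter, the delay-line taps, and the Vernier stage have to cover together.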
Bongsub SONG Kyunghoon KIM Junan LEE Kwangsoo KIM Younglok KIM Jinwook BURM
A complete 4-level pulse amplitude modulation (4-PAM) serial link transceiver including a wide frequency range clock generator and clock data recovery (CDR) is proposed in this paper. A dual-loop architecture, consisting of a frequency locked loop (FLL) and a phase locked loop (PLL), is employed for the wide frequency range clocks. The generated clocks from the FLL (clock generator) and the PLL (CDR) are utilized for a transmitter clock and a receiver clock, respectively. Both FLL and PLL employ the identical voltage controlled oscillators consisting of ring-type delay-cells. To improve the frequency tuning range of the VCO, deep triode PMOS loads are utilized for each delay-cell, since the turn-on resistance of the deep triode PMOS varies substantially by the gate-voltage. As a result, fabricated in a 0.13-µm CMOS process, the proposed 4-PAM transceiver operates from 1.5 Gb/s to 9.7 Gb/s with a bit error rate of 10⁻¹². At the maximum data-rate, the entire power dissipation of the transceiver is 254 mW, and the measured jitter of the recovered clock is 1.61 ps rms.
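Because 4-PAM carries two bits per symbol, the symbol rate is half the bit rate. The sketch below illustrates this together with a bit-to-level mapping. The Gray-coded mapping and the level values -3, -1, +1, +3 are common conventions assumed here for illustration; the abstract does not specify the mapping used in the transceiver, and all names are hypothetical.

```python
# Assumed Gray-coded mapping: adjacent levels differ in exactly one bit.
GRAY_LEVELS = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}
LEVEL_TO_BITS = {level: bits for bits, level in GRAY_LEVELS.items()}


def pam4_encode(bits):
    """Map a bit sequence of even length to 4-PAM levels, two bits per symbol."""
    if len(bits) % 2:
        raise ValueError("4-PAM needs an even number of bits")
    return [GRAY_LEVELS[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]


def pam4_decode(levels):
    """Inverse of pam4_encode for ideal (noise-free) levels."""
    bits = []
    for level in levels:
        bits.extend(LEVEL_TO_BITS[level])
    return bits


def symbol_rate(bit_rate):
    """Symbol rate in baud for a 4-PAM link carrying bit_rate bits per second."""
    return bit_rate / 2


print(pam4_encode([0, 0, 0, 1, 1, 1, 1, 0]))
print(symbol_rate(9.7e9))
```

At the reported maximum of 9.7 Gb/s this gives 4.85 Gbaud, which is the rate the recovered clock and the ring-oscillator VCOs ultimately have to support, depending on the clocking architecture.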
Naoya OKADA Yuichi NAKAMURA Shinji KIMURA
A nonvolatile flip-flop enables leakage power reduction in logic circuits and quick return from standby mode. However, it has limited write endurance, and its power consumption for writing is larger than that of a conventional D flip-flop (DFF). For this reason, it is important to reduce the number of write operations. The write operations can be reduced by stopping the clock signal to synchronous flip-flops because write operations are executed only when the clock is applied to the flip-flops. For such clock gating, a method using the Exclusive OR (XOR) of the current value and the new value as the control signal is well known. The XOR-based method is effective, but there are several cases where the write operations can be reduced even if the current value and the new value are different. The paper proposes a method to detect such unnecessary write operations based on state transition analysis, and proposes a write control method to save power consumption of nonvolatile flip-flops. In the method, redundant bits are detected to reduce the number of write operations. If the next state and the outputs do not depend on some current bit, the bit is redundant and need not be written. The method is based on Binary Decision Diagram (BDD) calculation. We construct write control circuits to stop the clock signal by converting BDDs representing a set of states where write operations are unnecessary. The proposed method can be combined with the XOR-based method to reduce the total number of write operations. We apply the combined method to some benchmark circuits and estimate the power consumption with Synopsys NanoSim. On average, power consumption is reduced by 15.0% compared with the XOR-based method alone.
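The baseline XOR-based gating mentioned above can be illustrated by counting write operations on a register with and without gating. This is only a behavioral sketch under the assumption that each flip-flop is gated individually and that a write costs one operation per clocked bit; it does not implement the paper's BDD-based detection of redundant bits. The function name and the example sequence are hypothetical.

```python
def count_writes(values, width, gated):
    """Count bit-level write operations for a register loaded with `values`.

    Without gating, every one of the `width` flip-flops is clocked (written)
    each cycle. With XOR-based gating, a flip-flop is clocked only when its
    new value differs from its current value.
    """
    current = 0  # register assumed to start at all zeros
    writes = 0
    for new in values:
        if gated:
            # XOR marks the bits that change; only those are written.
            writes += bin(current ^ new).count("1")
        else:
            writes += width
        current = new
    return writes


sequence = [0b0001, 0b0001, 0b0011, 0b0010]
print(count_writes(sequence, width=4, gated=False))
print(count_writes(sequence, width=4, gated=True))
```

The paper's contribution goes beyond this baseline: it also suppresses writes to bits that do change but on which neither the next state nor the outputs depend.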
Takashi KAWAMOTO Masato SUZUKI Takayuki NOTO
A technique that enables an SSCG to fine-tune its output signal frequency and spread ratio is presented. The proposed SSCG achieves an output signal frequency from 1.2 GHz to 3.0 GHz and a spread ratio from 0 to 30000 ppm. The fine-tuning technique achieves 30 ppm adjustment of the output signal frequency and 200 ppm adjustment of the spread ratio. This technique is achieved by controlling the characteristics of a triangular modulation signal generated by a proposed digitally controlled wave generator. A proposed multi-modulus divider can have a divide ratio of 4/5 and 8/9. This SSCG has been fabricated in a 0.13-µm CMOS process. The output signal frequency-range and the spread ratio are achieved fluently from 0.1 to 3.0 GHz and from 0 to 30000 ppm, respectively. EMI noise is suppressed at less than 17.1 dB at the output signal frequency of 3.0 GHz and spread ratio of 30000 ppm.
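The relation between spread ratio and frequency deviation under triangular modulation can be sketched numerically. The example assumes down-spreading (the frequency is modulated only below the nominal value), which is a common choice but is not stated in the abstract; the function name and the number of sample points are hypothetical.

```python
def triangular_profile(f_nominal, spread_ppm, n_points):
    """Frequency samples over one modulation period of a triangular,
    down-spread profile (down-spreading is an assumption)."""
    depth = f_nominal * spread_ppm * 1e-6  # peak deviation in Hz
    profile = []
    for i in range(n_points):
        phase = i / n_points
        # Triangle wave rising from 0 to 1 and falling back to 0.
        tri = 2 * phase if phase < 0.5 else 2 * (1 - phase)
        profile.append(f_nominal - depth * tri)
    return profile


profile = triangular_profile(3.0e9, 30000, 100)
print(f"max {max(profile) / 1e9:.3f} GHz, min {min(profile) / 1e9:.3f} GHz")
```

With the reported 3.0 GHz output and 30000 ppm spread ratio, the peak deviation is 90 MHz, so fine-tuning the spread ratio in 200 ppm steps corresponds to adjusting that deviation in steps of 0.6 MHz.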
Yoshinobu HIGAMI Hiroshi TAKAHASHI Shin-ya KOBAYASHI Kewal K. SALUJA
This paper deals with delay faults on clock lines assuming the launch-on-capture test. In this realistic fault model, the amount of delay at the FF driven by the faulty clock line is such that the scan shift operation can perform correctly even in the presence of a fault, but during the system clock operation, capturing functional value(s) at faulty FF(s), i.e. FF(s) driven by the clock with delay, is delayed and correct value(s) may not be captured. We developed a fault simulator that can handle such faults and using this simulator we investigate the relation between the duration of the delay and the difficulty of detecting clock delay faults in the launch-on-capture test. Next, we propose test generation methods for detecting clock delay faults that affect a single or two FFs. Experimental results for benchmark circuits are given in order to establish the effectiveness of the proposed methods.
Wenpo ZHANG Kazuteru NAMBA Hideo ITO
As technology scales to 45 nm and below, the reliability of VLSI declines due to small delay defects, which are hard to detect at the functional clock frequency. To detect small delay defects, a method that measures the delay time of paths in the circuit under test (CUT) was proposed. However, because a large number of FFs exist in recent VLSI, the probability that a resistive defect occurs in the FFs is increased. A test method measuring path delay time including the transmission time of FFs is necessary. However, the path measured by the conventional on-chip path delay time measurement method does not include a part of a master latch. Thus, testing using the conventional measurement method cannot detect defects occurring in that part. This paper proposes an improved on-chip path delay time measurement method. Test coverage is improved by measuring the path delay time including the transmission time of a master latch. The proposed method uses a duty-cycle-modified clock signal. Evaluation results show that the proposed method improves test coverage by 5.25-11.28% with the same area overhead as the conventional method.