IEICE global.ieice.org Site

Keyword Search Result

[Keyword] skew(55hit)

21-40hit(55hit)

Clock Skew Evaluation Considering Manufacturing Variability in Mesh-Style Clock Distribution
Shinya ABE Masanori HASHIMOTO Takao ONOYE

PAPER-Device and Circuit Modeling and Analysis

Vol: E91-A No: 12
Page(s): 3481-3487
The influence of manufacturing variability on circuit performance has been increasing because of finer manufacturing processes and lowered supply voltages. In this paper, we focus on mesh-style clock distribution, which is believed to be effective for reducing clock skew, and we evaluate clock skew considering manufacturing and design variabilities. Considering MOS transistor variation (both random and spatially correlated) and non-uniform flip-flop (FF) placement, we demonstrate that spatially correlated variation and severely non-uniform FF distribution can be major sources of clock skew. We also examine the dependency of clock skew on design parameters and reveal that a finer clock mesh does not necessarily reduce clock skew.
Simultaneous Optimization of Skew and Control Step Assignments in RT-Datapath Synthesis
Takayuki OBATA Mineo KANEKO

PAPER-High-Level Synthesis and System-Level Design

Vol: E91-A No: 12
Page(s): 3585-3595
Just as the schedule affects system performance, the control skew, i.e., the arrival time difference of control signals between registers, can be used to improve system performance, enhance robustness against delay variations, etc. The simultaneous optimization of the control step assignment and the control skew assignment is a more powerful technique for improving performance. In this paper, we first prove that, even if the execution sequence of operations assigned to the same resource is fixed, the simultaneous optimization problem under a fixed clock period is NP-hard. Second, we propose a heuristic algorithm for simultaneous control step and skew optimization under a given clock period, and we show how much the simultaneous optimization improves system performance. This paper is the first to use intentional skew to shorten control steps under a specified clock period. The proposed algorithm has the potential to play a central role in various scenarios of skew-aware high-level synthesis.
Power and Skew Aware Point Diffusion Clock Network
Gunok JUNG Chunghee KIM Kyoungkuk CHAE Giho PARK Sung Bae PARK

LETTER-Integrated Electronics

Vol: E91-C No: 11
Page(s): 1832-1834
This letter presents a point diffusion clock network (PDCN) with a local clock tree synthesis (CTS) scheme. The clock network is implemented with metal line spacing ten times wider than that of typical mesh networks for low power, and it is used with CTS executed over a nine times smaller area to minimize clock skew. Measurement results show that the skew of the PDCN with local CTS is reduced to 36% and the latency is shrunk to 45% of the amount in a 4.81 mm² Cortex-A8 core in a 65 nm Samsung process.
Post-Silicon Clock-Timing Tuning Based on Statistical Estimation
Yuko HASHIZUME Yasuhiro TAKASHIMA Yuichi NAKAMURA

PAPER

Vol: E91-A No: 9
Page(s): 2322-2327
In deep-submicron technologies, process variations can significantly affect the performance and yield of VLSI chips. As a countermeasure to the variations, post-silicon tuning has been proposed. Deskew, in which the clock timing of flip-flops (FFs) is tuned by programmable delay elements (PDEs) inserted into the clock tree, is one such method. We propose a novel deskew method that decides the delay values of the elements by measuring the clock timing of a small number of FFs and estimating the clock timing of the remaining FFs based on a statistical model. In addition, our proposed method can determine the discrete PDE delay values because the rewriting constraint satisfies the condition of total unimodularity.
Noise-Induced Synchronization among Sub-RF CMOS Analog Oscillators for Skew-Free Clock Distribution
Akira UTAGAWA Tetsuya ASAI Tetsuya HIROSE Yoshihito AMEMIYA

PAPER-Electronic Circuits and Systems

Vol: E91-A No: 9
Page(s): 2475-2481
We present on-chip oscillator arrays synchronized by random noise, aiming at skew-free clock distribution in synchronous digital systems. Nakao et al. recently reported that independent neural oscillators can be synchronized by applying temporal random impulses to the oscillators [1],[2]. We regard neural oscillators as independent clock sources on LSIs; i.e., clock sources are distributed on LSIs, and they are forced to synchronize through the use of random noise. We designed neuron-based clock generators operating in the sub-RF region (< 1 GHz) by modifying the original neuron model into a new model that is suitable for CMOS implementation with 0.25-µm CMOS parameters. Through circuit simulations, we demonstrate that i) the clock generators are indeed synchronized by pseudo-random noise and ii) the clock generators exhibit phase-locked oscillations even when they have small device mismatches.
Skew-Frobenius Maps on Hyperelliptic Curves
Shunji KOZAKI Kazuto MATSUO Yasutomo SHIMBARA

LETTER-Cryptography and Information Security

Vol: E91-A No: 7
Page(s): 1839-1843
Scalar multiplication methods using the Frobenius maps are known as efficient methods for speeding up (hyper)elliptic curve cryptosystems. However, those methods are not efficient for cryptosystems constructed on fields of small extension degree because of the cost of the field operations. Iijima et al. showed that one can use certain automorphisms on the quadratic twists of elliptic curves for fast scalar multiplication without the drawback of the Frobenius maps. This paper extends these automorphisms to the Jacobians of hyperelliptic curves of arbitrary genus.
GDME: Grey Relational Clustering Applied to a Clock Tree Construction with Zero Skew and Minimal Delay
Chia-Chun TSAI Jan-Ou WU Trong-Yen LEE

PAPER-VLSI Design Technology and CAD

Vol: E91-A No: 1
Page(s): 365-374
This study demonstrates that clock tree construction in an SoC should be expanded to consider the intrinsic delay and skew of each IP's clock sink. A novel algorithm, called GDME, is proposed that combines grey relational clustering with the DME approach to solve the clock tree construction problem. Grey relational analysis can cluster the best pair of clock sinks and guide the tapping point search of a DME algorithm for constructing a clock tree with zero skew and minimal delay. Experimentally, the proposed algorithm always obtains an RC- or RLC-based clock tree with zero skew and minimal delay for all the test cases and benchmarks. Experimental results demonstrate that GDME improves total wire length by up to 3.74% in total average compared with other DME algorithms. Furthermore, our results for the zero-skew RLC-based clock trees, compared with Hspice, are 0.017% and 0.2% lower in absolute average in terms of skew and delay, respectively.
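The tapping point search mentioned in this abstract can be illustrated for the simpler RC case. The sketch below is not the GDME algorithm: it assumes the classic zero-skew merge under the Elmore delay model, handles only the case where the balance point lies on the connecting wire, and uses hypothetical names (`Subtree`, `tapping_point`, `branch_delay`, `r_unit`, `c_unit`) chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Subtree:
    delay: float  # Elmore delay from the subtree root to its sinks (s)
    cap: float    # total downstream capacitance seen at the root (F)


def tapping_point(a: Subtree, b: Subtree, length: float,
                  r_unit: float, c_unit: float) -> float:
    """Return x in [0, 1]: the fraction of the wire between a's root and the tap.

    r_unit and c_unit are the wire resistance and capacitance per unit length.
    The formula follows from equating the Elmore delays of the two branches.
    """
    num = (b.delay - a.delay) + r_unit * length * (b.cap + c_unit * length / 2)
    den = r_unit * length * (c_unit * length + a.cap + b.cap)
    x = num / den
    if not 0.0 <= x <= 1.0:
        # A full merge step would elongate (snake) one wire; not covered here.
        raise ValueError("balance point is off the wire")
    return x


def branch_delay(sub: Subtree, wire_len: float,
                 r_unit: float, c_unit: float) -> float:
    """Elmore delay from the tap through a wire of wire_len into the subtree."""
    r = r_unit * wire_len
    c = c_unit * wire_len
    return r * (c / 2 + sub.cap) + sub.delay
```

With made-up values, `tapping_point(Subtree(0.0, 50e-15), Subtree(10e-12, 80e-15), 1000.0, 0.1, 0.2e-15)` places the tap closer to the slower subtree, and `branch_delay` evaluated on both sides of the tap gives matching delays, which is the zero-skew condition at that merge.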
Variant X-Tree Clock Distribution Network and Its Performance Evaluations
Xu ZHANG Xiaohong JIANG Susumu HORIGUCHI

PAPER-Low-Power and High-Performance VLSI Circuit Technology

Vol: E90-C No: 10
Page(s): 1909-1918
The evolution of VLSI chips towards larger die size, smaller feature size and faster clock speed makes clock distribution an increasingly important issue. In this paper, we propose a new clock distribution network (CDN), namely Variant X-Tree, based on the idea of X-Architecture proposed recently for efficient wiring within VLSI chips. The Variant X-Tree CDN keeps the nice properties of equal-clock-path and symmetric structure of the typical H-Tree CDN, but results in both a lower maximal clock delay and a lower clock skew than its H-Tree counterpart, as verified by an extensive simulation study that simultaneously incorporates the effects of process variations and on-chip inductance. We also propose closed-form statistical models for evaluating the skew and delay of the Variant X-Tree CDN. The comparison between the theoretical results and the simulation results indicates that the proposed statistical models can be used to efficiently and rapidly evaluate the performance of Variant X-Tree CDNs.
A 100-Gb/s-Physical-Layer Architecture for Higher-Speed Ethernet for VSR and Backplane Applications
Hidehiro TOYODA Shinji NISHIMURA Michitaka OKUNO Matsuaki TERADA

PAPER-VLSI Architecture for Communication/Server Systems

Vol: E90-C No: 10
Page(s): 1957-1963
A high-speed physical-layer architecture for next-generation higher-speed Ethernet for VSR and backplane applications was developed. VSR and backplane networks provide 100-Gb/s data transmission in "mega data centers" and blade servers, which are new and broad potential markets for LAN technologies. The architecture supports 100-Gb/s-throughput, high-reliability, and low-latency data transmission, making it well suited to VSR and backplane applications for intra-building and intra-cabinet networks. Its links comprise ten 10-Gb/s high-speed serial lanes. Payload data are transmitted by ribbon fiber cables for very short reach and by copper channels for the backplane board. Ten lanes convey 320-bit data synchronously (32 bits × 10 lanes) and parity data of a forward-error correction code (a newly developed (544, 512) FEC code), providing highly reliable (BER<1E-22) data transmission with burst-error correction at low latency (31.0 ns on the transmitter (Tx) side and 111.6 ns on the receiver (Rx) side). A 64B/66B code-sequence-based skew compensation mechanism, which provides low-latency compensation for the lane-to-lane skew (less than 51 ns), is used for parallel transmission. Testing this physical-layer architecture in an ASIC showed that it can provide 100-Gb/s data transmission with a 772-kgate circuit, which is small enough for implementation in a single LSI.
Zero-Skew Driven Buffered RLC Clock Tree Construction
Jan-Ou WU Chia-Chun TSAI Chung-Chieh KUO Trong-Yen LEE

PAPER-VLSI Design Technology and CAD

Vol: E90-A No: 3
Page(s): 651-658
An unbalanced clock tree naturally exists in an SoC because the clock sinks of IPs have distinct input capacitive loads and internal delays. The construction of a bottom-up RLC clock tree with minimal clock delay and zero skew is crucial to ensure good SoC performance. This study proves that an RLC clock tree construction always has non-zero skew owing to upward skew propagation. Specifically, this study proposes inserting two unit-size buffers, associated with a binary search for a tapping point, into each pair of subtrees to interrupt the upward propagation of non-zero skew. This technique enables reliable construction of a buffered RLC clock tree with zero skew. The effectiveness of the proposed approach is demonstrated on benchmarks.
All-Digital Clock Deskew Buffer with Variable Duty Cycles
Shao-Ku KAO Shen-Iuan LIU

PAPER

Vol: E89-C No: 6
Page(s): 753-760
An all-digital clock deskew buffer with variable duty cycles is presented. The proposed circuit aligns the input and output clocks within two cycles. A pulsewidth detector using sequential time-to-digital conversion is employed to detect the duty cycle. An output clock with an adjustable duty cycle can be generated. The proposed circuit has been fabricated in a 0.35 µm CMOS technology. The measured duty cycle of the output clock can be adjusted from 30% to 70% in steps of 10%. The operation frequency range is from 400 MHz to 600 MHz.
Simultaneous Compensation of RC Mismatch and Clock Skew in Time-Interleaved S/H Circuits
Zheng LIU Masanori FURUTA Shoji KAWAHITO

PAPER

Vol: E89-C No: 6
Page(s): 710-716
The RC mismatch among S/H stages for time-interleaved ADCs causes a phase error and a gain error, of which the phase error is dominant. The paper points out that clock skew and the phase error caused by the RC mismatch have similar effects on the sampling error and can therefore be compensated with the clock skew compensation. Simulation results agree well with the theoretical analysis. With the phase error compensation of RC mismatch, the SNDR of a 14-bit ADC can be improved by more than 15 dB in the case that the bandwidth of the S/H circuits is 3 times the sampling frequency. This paper also proposes a method of clock skew and RC mismatch compensation in time-interleaved sample-and-hold (S/H) circuits by adjusting the sampling clock phase.
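The claim that the phase error dominates and behaves like clock skew can be made concrete with a small derivation. This is a sketch under an assumption not stated in the abstract: each S/H stage is modeled as a first-order RC low-pass filter, and the input frequency satisfies ωRC ≪ 1.

```latex
\begin{align}
H(j\omega) &= \frac{1}{1 + j\omega RC}, \\
|H(j\omega)| &= \frac{1}{\sqrt{1 + (\omega RC)^2}} \approx 1 - \tfrac{1}{2}(\omega RC)^2, \\
\angle H(j\omega) &= -\arctan(\omega RC) \approx -\omega RC
  \quad\Rightarrow\quad \tau = -\frac{\angle H(j\omega)}{\omega} \approx RC .
\end{align}
% Effect of a mismatch \Delta(RC) between two interleaved stages:
\begin{align}
\Delta\tau &\approx \Delta(RC), \\
|\Delta\phi| &\approx \omega RC \,\frac{|\Delta(RC)|}{RC}, \qquad
\bigl|\Delta|H|\bigr| \approx (\omega RC)^2 \,\frac{|\Delta(RC)|}{RC} .
\end{align}
```

Under this model the mismatch appears as a frequency-independent delay difference Δτ, which has the same form as a sampling clock skew. The gain error is smaller than the phase error by a factor of ωRC = f / f_b, where f_b = 1/(2πRC) is the S/H bandwidth. For f_b equal to 3 times the sampling frequency and inputs up to the Nyquist frequency, that factor is at most 1/6.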
100-Gb/s Physical-Layer Architecture for Next-Generation Ethernet
Hidehiro TOYODA Shinji NISHIMURA Michitaka OKUNO Kouji FUKUDA Kouji NAKAHARA Hiroaki NISHI

PAPER

Vol: E89-B No: 3
Page(s): 696-703
A high-speed physical-layer architecture for Ethernet is described that supports 100-Gb/s throughput and 40-km transmission, making it well suited for next-generation metro-area and intrabuilding networks. Its links comprise 12 × 10-Gb/s synchronized parallel optical lanes. Ethernet data frames are transmitted by a coarse wavelength division multiplexing link and bundled optical fibers. Ten of the lanes convey 640-bit data synchronously (64 bits × 10 lanes). One conveys forward error correction code ((132 b, 140 b) Hamming code), providing highly reliable (BER < 10^-12) data transmission, and the other conveys parity data, enabling fault-lane recovery. A newly developed 64B/66B code-sequence-based deskewing mechanism is used that provides low-latency compensation for the lane-to-lane skew, which is less than 88 ns. Testing of this physical-layer architecture in a field programmable gate array circuit demonstrated that it can provide 100-Gb/s data communication with a 590-kgate circuit, which is small enough for implementation in a single LSI circuit.
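The abstract states that one lane carries parity data to enable fault-lane recovery but does not describe the scheme. The sketch below assumes the simplest possibility, a bitwise XOR of the ten 64-bit data words carried on an eleventh lane, and leaves out the FEC lane entirely. The names `encode`, `recover`, and `NUM_DATA_LANES` are hypothetical and do not come from the paper.

```python
from functools import reduce
from typing import Iterable, List, Optional

NUM_DATA_LANES = 10  # ten data lanes of 64 bits each, as in the abstract


def xor_words(words: Iterable[int]) -> int:
    """Bitwise XOR of all words (0 for an empty input)."""
    return reduce(lambda x, y: x ^ y, words, 0)


def encode(data_words: List[int]) -> List[int]:
    """Append an XOR parity word to the ten data words (assumed scheme)."""
    if len(data_words) != NUM_DATA_LANES:
        raise ValueError("expected one word per data lane")
    return data_words + [xor_words(data_words)]


def recover(received: List[Optional[int]]) -> List[int]:
    """Rebuild the data words when at most one lane is marked failed (None)."""
    missing = [i for i, w in enumerate(received) if w is None]
    if len(missing) > 1:
        raise ValueError("XOR parity can rebuild only a single failed lane")
    words = list(received)
    if missing:
        # XOR of all surviving lanes (data and parity) equals the lost word.
        words[missing[0]] = xor_words(w for w in words if w is not None)
    return words[:NUM_DATA_LANES]
```

Because XOR is its own inverse, the same operation serves for both generating the parity word and rebuilding a lost lane. The limitation is that it tolerates only one failed lane, and the failed lane must be identified by some other means.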
On-Chip Thermal Gradient Analysis and Temperature Flattening for SoC Design
Takashi SATO Junji ICHIMIYA Nobuto ONO Koutaro HACHIYA Masanori HASHIMOTO

PAPER-Prediction and Analysis

Vol: E88-A No: 12
Page(s): 3382-3389
This paper quantitatively analyzes the thermal gradient of an SoC and proposes a thermal flattening procedure. First, the impact of dominant parameters, such as area occupancy of memory/logic blocks, power density, and floorplan, on the thermal gradient is studied quantitatively. Temperature difference is also evaluated from timing and reliability standpoints. Important results obtained here are that 1) the maximum temperature difference increases with higher memory area occupancy and 2) the difference is highly sensitive to the floorplan. Then, we propose a procedure to amend the thermal gradient. A slight floorplan modification using the proposed procedure improves the on-chip thermal gradient significantly.
Navigating Register Placement for Low Power Clock Network Design
Yongqiang LU Chin-Ngai SZE Xianlong HONG Qiang ZHOU Yici CAI Liang HUANG Jiang HU

PAPER-Floorplan and Placement

Vol: E88-A No: 12
Page(s): 3405-3411
With the development of VLSI design, the increasingly severe power problem calls for minimizing clock routing wirelength so that both power consumption and power supply noise can be alleviated. In contrast to most traditional works, which handle this problem only in clock routing, we propose to navigate standard cell register placement to locations that enable a further reduction in clock routing wirelength and power. To minimize adverse impacts on conventional cell placement goals such as signal net wirelength and critical path delay, the register placement is carried out in the context of a quadratic placement. The proposed technique is particularly effective for the recently popular prescribed skew clock routing. Experiments on benchmark circuits show encouraging results.
Statistical Analysis of Clock Skew Variation in H-Tree Structure
Masanori HASHIMOTO Tomonori YAMAMOTO Hidetoshi ONODERA

PAPER-Prediction and Analysis

Vol: E88-A No: 12
Page(s): 3375-3381
This paper discusses clock skew due to manufacturing variability and environmental change. In clock tree design, the transition time constraint is an important design parameter that controls clock skew and power dissipation. In this paper, we evaluate clock skew under several variability models and demonstrate the relationship among clock skew, transition time constraint and power dissipation. Experimental results show that a small transition time constraint reduces clock skew under manufacturing and supply voltage variabilities, whereas there is an optimum constraint value for temperature gradient. Our experiments in a 0.18 µm technology indicate that clock skew is minimized when the clock buffer is sized such that the ratio of output to input capacitance is four.
Address Computation in Configurable Parallel Memory Architecture
Eero AHO Jarno VANNE Kimmo KUUSILINNA Timo D. HAMALAINEN

PAPER-Networking and System Architectures

Vol: E87-D No: 7
Page(s): 1674-1681
Parallel memories increase memory bandwidth with several memory modules working in parallel and can be used to feed a processor with only the necessary data. The Configurable Parallel Memory Architecture (CPMA) enables a multitude of access formats and module assignment functions to be used within a single hardware implementation, which has not been possible in prior embedded parallel memory systems. This paper focuses on address computation in CPMA, which is implemented using several configurable computation units in parallel. One unit is dedicated to each type of access format and module assignment function that the implementation supports. Timing and area estimates are given for a 0.25-micron CMOS process. The utilized resources are shown to be linearly proportional to the number of memory modules.
A Decision Feedback Equalizing Receiver for the SSTL SDRAM Interface with Clock-Data Skew Compensation
Young-Soo SOHN Seung-Jun BAE Hong-June PARK Soo-In CHO

PAPER-Integrated Electronics

Vol: E87-C No: 5
Page(s): 809-817
A CMOS DFE (decision feedback equalization) receiver with clock-data skew compensation was implemented for the SSTL (stub-series terminated logic) SDRAM interface. The receiver consists of a 2-way interleaving DFE input buffer for ISI reduction and a ×2 over-sampling phase detector for finding the optimum sampling clock position. Measurement results at 1.2 Gbps operation showed that equalization increased the voltage margin by about 20% and decreased the time jitter of the recovered sampling clock by about 40% in an SSTL channel with a 2 pF 4-stub load. Active chip area and power consumption are 300 × 1000 µm² and 142 mW, respectively, with a 2.5 V, 0.25 µm CMOS process.
Signal Transmission and Coding Architecture for Next-Generation Ethernet
Hidehiro TOYODA Hiroaki NISHI Shinji NISHIMURA Hisaaki KANAI Katsuyoshi HARASAWA

PAPER

Vol: E86-D No: 11
Page(s): 2317-2324
The first practical approach to 100-Gigabit Ethernet, i.e., Ethernet with a throughput of 100-Gb/s, is proposed for use in the next generation of LANs for GRID computing and large-capacity data centers. New structures, including a coding architecture, de-skewing method and high-speed packaging techniques, are introduced to the PHY layer to obtain the required data rate. Our form of 100-Gigabit Ethernet uses 10-Gb/s 10-channel CWDM or parallel-optical links. The coding architecture is formed of 64B/66B codes, modified for the CWDM and parallel links. In the de-skewing of the parallel signals, specially designed IDLE characters are used to compensate for skewing of data in the respective signal lanes. Advanced packaging techniques, which suppress the propagation loss and reflection of the 10-Gb/s lanes to obtain high-speed, good integrity and low-noise signaling, are proposed and evaluated. The proposed architectural features make this 100-Gigabit Ethernet concept practical for next-generation LANs.
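The de-skewing step described in this abstract relies on special IDLE characters to line up the parallel lanes. The sketch below illustrates only the general idea of marker-based alignment and makes several assumptions: a single marker symbol (`MARKER`) stands in for the specially designed IDLE characters, each lane is a buffered list of symbols, and the marker appears once per lane in the buffered window. The function names are hypothetical.

```python
from typing import List, Sequence

MARKER = "IDLE"  # stand-in for the paper's specially designed IDLE characters


def lane_offsets(lanes: Sequence[Sequence[str]]) -> List[int]:
    """Position of the first marker in each lane (ValueError if absent)."""
    return [list(lane).index(MARKER) for lane in lanes]


def deskew(lanes: Sequence[Sequence[str]]) -> List[List[str]]:
    """Drop the symbols ahead of each lane's marker so all lanes line up.

    After alignment, index k in every lane refers to the same transmit slot.
    The lanes are trimmed to a common length for convenience.
    """
    offsets = lane_offsets(lanes)
    aligned = [list(lane)[off:] for lane, off in zip(lanes, offsets)]
    common = min(len(lane) for lane in aligned)
    return [lane[:common] for lane in aligned]
```

For example, lanes whose markers arrive at positions 1, 0 and 2 are returned with the marker at position 0 in every lane. A hardware implementation would typically realize the same effect with per-lane FIFOs and read pointers rather than by slicing lists.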
Physical Design Methodology for On-Chip 64-Mb DRAM MPEG-2 Encoding with a Multimedia Processor
Hidehiro TAKATA Rei AKIYAMA Tadao YAMANAKA Haruyuki OHKUMA Yasue SUETSUGU Toshihiro KANAOKA Satoshi KUMAKI Kazuya ISHIHARA Atsuo HANAMI Tetsuya MATSUMURA Tetsuya WATANABE Yoshihide AJIOKA Yoshio MATSUDA Syuhei IWADE

PAPER-Product Designs

Vol: E85-C No: 2
Page(s): 368-374
An on-chip, 64-Mb, embedded, DRAM MPEG-2 encoder LSI with a multimedia processor has been developed. To implement this large-scale and high-speed LSI, we have developed the hierarchical skew control of multi-clocks, with timing verification, in which cross-talk noise is considered, and simple measures taken against the IR drop in the power lines through decoupling capacitors. As a result, the target performance of 263 MHz at 1.5 V has been successfully attained and verified, the cross-talk noise has been considered, and, in addition, it has become possible to restrain the IR drop to 166 mV in the 162 MHz operation block.
