Pengjun WANG Yuejun ZHANG Jun HAN Zhiyi YU Yibo FAN Zhang ZHANG
In modern cryptographic systems, physical unclonable functions (PUFs) are efficient mechanisms for many security applications, which extract intrinsic random physical variations to generate secret keys. The classical PUFs mainly exhibit static challenge-response behaviors and generate static keys, while many practical cryptographic systems need reconfigurable PUFs which allow dynamic keys derived from the same circuit. In this paper, the concept of reconfigurable multi-port PUFs (RM-PUFs) is proposed. RM-PUFs not only allow updating the keys without physically replacement, but also generate multiple keys from different ports in one clock cycle. A practical RM-PUFs construction is designed based on asynchronous clock and fabricated in TSMC low-power 65 nm CMOS process. The area of test chip is 1.1 mm2, and the maximum clock frequency is 0.8 GHz at 1.2 V. The average power consumption is 27.6 mW at 27. Finally, test results show that the RM-PUFs generate four reconfigurable 128-bit secret keys, and the keys are secure and reliable over a range of environmental variations such as supply voltage and temperature.
Wisam K. HUSSAIN Loay D. KHALAF Mohammed HAWA
Initial cell search in wideband code-division multiple-access (W-CDMA) systems is a challenging process. On the one hand, channel impairments such as multipath fading, Doppler shift, and noise create frequency and time offsets in the received signal. On the other hand, the residual synchronization error of the crystal oscillator at the mobile station also causes time and frequency offsets. Such offsets can affect the ability of a mobile station to perform cell search. Previous work concentrated on cell synchronization algorithms that considered multipath channels and frequency offsets, but ignored clock and timing offsets due to device tolerances. This work discusses a robust initial cell search algorithm, and quantifies its performance in the presence of frequency and time offsets due to two co-existing problems: channel impairments and clock drift at the receiver. Another desired performance enhancement is the reduction of power consumption of the receiver, which is mainly due to the computational complexity of the algorithms. This power reduction can be achieved by reducing the computational complexity by a divide and conquer strategy during the synchronization process.
Tsutomu TAKEYA Tadahiro KURODA
In this paper, a symbol-rate clock recovery scheme for a receiver that uses an integrating decision feedback equalizer (DFE) is proposed. The proposed clock recovery using expected received signal amplitudes as the criterion realizes minimum mean square error (MMSE) clock recovery. A receiver architecture using an integrating DFE with the proposed symbol-rate clock recovery is also proposed. The proposed clock recovery algorithm successfully recovered the clock phase in a system level simulation only with a DFE. Higher jitter tolerance than 0.26 UIPP at 10 Gb/s operation was also confirmed in the simulation with an 11 dB channel loss at 5 GHz.
Clock network synthesis is one of the most important and limiting factors in VLSI designs. Hence, the clock skew variation reduction is one of the most important objectives in clock distribution methodology. Cross-link insertion is proposed in [1], however, it is based on empirical methods and does not use variation information for link insertion location choice. [17] considers the delay variation, but it is slow even for small clock trees. In this paper, we propose a fast link insertion algorithm that considers the delay variation information directly during link selection process. Experimental results show that our algorithm is very fast and achieves better skew variability reduction while utilizing considerably lesser routing resources compared with existing methods.
Monica FIGUEIREDO Rui L. AGUIAR
This paper presents a model to estimate jitter insertion and accumulation in clock repeaters. We propose expressions to estimate, with low computational effort, both static and dynamic clock jitter insertion in repeaters with different sizes, interconnects and slew-rates. It requires only the pre-characterization of a reference repeater, which can be accomplished with a small number of simulations or measurements. Furthermore, we propose expressions for dynamic jitter accumulation that considers the dual nature of power and ground noise impact on delay. The complete model can be used to replace time-consuming transient noise simulations when evaluating jitter in clock distribution systems, and provide valuable insights regarding the impact of design parameters on jitter. Presented results show that our models can estimate jitter insertion and accumulation with an error within 10% of simulation results, for typical designs, and accurately reflect the impact of changing design parameters.
Yanling ZHI Wai-Shing LUK Yi WANG Changhao YAN Xuan ZENG
Yield-driven clock skew scheduling was previously formulated as a minimum cost-to-time ratio cycle problem, by assuming that variational path delays are in Gaussian distributions. However in today's nanometer technology, process variations show growing impacts on this assumption, as variational delays with non-Gaussian distributions have been observed on these paths. In this paper, we propose a novel yield-driven clock skew scheduling method for arbitrary distributions of critical path delays. Firstly, a general problem formulation is proposed. By integrating the cumulative distribution function (CDF) of critical path delays, the formulation is able to handle path delays with any distributions. It also generalizes the previous formulations on yield-driven clock skew scheduling and indicates their statistical interpretations. Generalized Howard algorithm is derived for finding the critical cycles of the underlying timing constraint graphs. Moreover, an effective algorithm based on minimum balancing is proposed for the overall yield improvement. Experimental results on ISCAS89 benchmarks show that, compared with two representative existing methods, our method remarkably improves the yield by 10.25% on average (up to 14.66%).
The impact of clock-skew on circuit timing increases rapidly as technology scales. As a result, it becomes important to deal with clock-skew at the early stages of circuit designs. This paper presents a novel datapath design that aims at mitigating the impact of clock-skew in high-level synthesis, by integrating margin (evaluated as the maximum number of clock-cycles to absorb clock-skew) and ordered clocking into high-level synthesis tasks. As a first attempt to the proposed datapath design, this paper presents a 0-1 integer linear programming formulation that focuses on register binding to achieve the minimum cost (the minimum number of registers) under given scheduling result. Experimental results show the optimal results can be obtained without increasing the latency, and with a few extra registers compared to traditional high-level synthesis design.
Xin MAN Takashi HORIYAMA Shinji KIMURA
Clock gating is supported by commercial tools as a power optimization feature based on the guard signal described in HDL (structural method). However, the identification of control signals for gated registers is hard and designer-intensive work. Besides, since the clock gating cells also consume power, it is imperative to minimize the number of inserted clock gating cells and their switching activities for power optimization. In this paper, we propose an automatic multi-stage clock gating algorithm with ILP (Integer Linear Programming) formulation, including clock gating control candidate extraction, constraints construction and optimum control signal selection. By multi-stage clock gating, unnecessary clock pulses to clock gating cells can be avoided by other clock gating cells, so that the switching activity of clock gating cells can be reduced. We find that any multi-stage control signals are also single-stage control signals, and any combination of signals can be selected from single-stage candidates. The proposed method can be applied to 3 or more cascaded stages. The multi-stage clock gating optimization problem is formulated as constraints in LP format for the selection of cascaded clock-gating order of multi-stage candidate combinations, and a commercial ILP solver (IBM CPLEX) is applied to obtain the control signals for each register with minimum switching activity. Those signals are used to generate a gate level description with guarded registers from original design, and a commercial synthesis and layout tools are applied to obtain the circuit with multi-stage clock gating. For a set of benchmark circuits and a Low Density Parity Check (LDPC) Decoder (6.6k gates, 212 F.F.s), the proposed method is applied and actual power consumption is estimated using Synopsys NanoSim after layout. On average, 31% actual power reduction has been obtained compared with original designs with structural clock gating, and more than 10% improvement has been achieved for some circuits compared with single-stage optimization method. CPU time for optimum multi-stage control selection is several seconds for up to 25k variables in LP format. By applying the proposed clock gating, area can also be reduced since the multiplexors controlling register inputs are eliminated.
Ki-Sung SOHN Da-In HAN Ki-Ju BAEK Nam-Soo KIM Yeong-Seuk KIM
A new clock gating circuit suitable for shift register is presented. The proposed clock gating circuit that consists of basic NOR gates is low power and small area. The power consumption of a 16-bit shift register implemented with the proposed clock gating circuit is about 66% lower than that found when using the conventional design.
Wei DENG Kenichi OKADA Akira MATSUZAWA
This paper investigates a clock frequency generator for ultra-low-voltage sub-picosecond-jitter clock generation in future 0.5-V LSI and power aware LSI. To address the potential possible solution for ultra-low-voltage applications, a 0.5 V clock frequency generator is proposed and implemented. Significant performances, in terms of sub 1-ps jitter, 50 MHz-to-6.4 GHz frequency tuning range with 2 bands and sub 1-mW PDC, demonstrated the viable replacement of ring oscillators in low-voltage and low-jitter clock generator.
Jaesung CHOI Joonyoung SHIN Jeong Woo LEE
A new high-throughput turbo decoding scheme adopting double flow, sliding window and shuffled decoding is proposed. Analytical and numerical results show that the proposed scheme requires low number of clock cycles and small memory size to achieve a BER performance equivalent to those of existing schemes.
The operating speed scalability is demonstrated by using the forward body biasing method for a 1-V 0.18-µm CMOS true single-phase clocking (TSPC) dual-modulus prescaler. With the forward body bias voltage varying between 0 and 0.4 V, the maximum operating speed changes by about 40–50% and the maximum input sensitivity frequency changes by about 400%. This speed scalability is achieved with less than 0.5-dB phase noise degradation. This demonstration indicates that the forward body biasing method is instrumental to build a cost-saving power-efficient 1-V 0.18-µm CMOS radio for low-power WBAN and WSN applications.
Sho ENDO Takeshi SUGAWARA Naofumi HOMMA Takafumi AOKI Akashi SATOH
This paper presents a glitchy-clock generator integrated in FPGA for evaluating fault injection attacks and their countermeasures on cryptographic modules. The proposed generator exploits clock management capabilities, which are common in modern FPGAs, to generate clock signal with temporal voltage spike. The shape and timing of the glitchy-clock cycle are configurable at run time. The proposed generator can be embedded in a single FPGA without any external instrument (e.g., a pulse generator and a variable power supply). Such integration enables reliable and reproducible fault injection experiments. In this paper, we examine the characteristics of the proposed generator through experiments on Side-channel Attack Standard Evaluation Board (SASEBO). The result shows that the timing of the glitches can be controlled at the step of about 0.17 ns. We also demonstrate its application to the safe-error attack against an RSA processor.
Fumikazu MINAMIYAMA Hidetsugu KOGA Kentaro KOBAYASHI Masaaki KATAYAMA
For the control of multiple servomotors in a humanoid robots, a communication system is proposed. In the system, DC electric power, command/response signals and a common clock signal for precise synchronous movement of the servomotors are transmitted via the same wiring with a multi-drop bus. Because of the bandwidth limitation, the common clock signal and the command/response signals overlap each other. It is confirmed that the coexistence of both signals is possible by using interference cancellation at the reception of command/response signals.
Lechang LIU Takayasu SAKURAI Makoto TAKAMIYA
A 0.6-V voltage shifter and a 0.6-V clocked comparator are presented for sampling correlation-based impulse radio UWB receiver. The voltage shifter is used for a novel split swing level scheme-based CMOS transmission gate which can reduce the power consumption by four times. Compared to the conventional voltage shifter, the proposed voltage shifter can reduce the required capacitance area by half and eliminate the non-overlapping complementary clock generator. The proposed 0.6-V clocked comparator can operate at 100-MHz clock with the voltage shifter. To reduce the power consumption of the conventional continuous-time comparator based synchronization control unit, a novel clocked-comparator based control unit is presented, thereby achieving the lowest energy consumption of 3.9 pJ/bit in the correlation-based UWB receiver with the 0.5 ns timing step for data synchronization.
Koichi YAMAGUCHI Masayuki MIZUNO
Duobinary signaling has been introduced into asymmetric multi-chip communications such as DRAM or display interfaces, which allows a controlled amount of ISI to reduce signaling bandwidth by 2/3. A × 2 oversampled equalization has been developed to realize Duobinary signaling. Symbol-rate clock recovery form Duobinary signal has been developed to reduce power consumption for receivers. A Duobinary transmitter test chip was fabricated with 90-nm CMOS process. A 3.5 dB increase in eye height and a 1.5 times increase in eye width was observed.
Keisuke INOUE Mineo KANEKO Tsuyoshi IWAGAKI
For recent and future nanometer-technology VLSIs, static and dynamic delay variations become a serious problem. In many cases, the hold timing constraint, as well as the setup timing constraint, becomes critical for latching a correct signal under delay variations. While the timing violation due to the fail of the setup timing constraint can be fixed by tuning a clock frequency or using a delayed latch, the timing violation due to the fail of the hold timing constraint cannot be fixed by those methods in general. Our approach to delay variations (in particular, the hold timing constraint) proposed in this paper is a novel register assignment strategy in high-level synthesis, which guarantees safe clocking by Backward-Data-Direction (BDD) clocking. One of the drawbacks of the proposed register assignment is the increase in the number of required registers. After the formulation of this new register minimization problem, we prove NP-hardness of the problem, and then derive an integer linear programming formulation for the problem. The proposed method receives a scheduled data flow graph, and generates a datapath having (1) robustness against delay variations, which is ensured by BDD-based register assignment, and (2) the minimum possible number of registers. Experimental results show the effectiveness of the proposed method for some benchmark circuits.
Hiroaki KATSURAI Hideki KAMITSUNA Hiroshi KOIZUMI Jun TERADA Yusuke OHTOMO Tsugumichi SHIBATA
As a future passive optical network (PON) system, the 10 Gigabit Ethernet PON (10G-EPON) has been standardized in IEEE 802.3av. As conventional Gigabit Ethernet PON (GE-PON) systems have already been widely deployed, 1G/10G co-existence technologies are strongly required for the next system. A gated voltage-controlled-oscillator (G-VCO)-based 10-Gb/s burst-mode clock and data recovery (CDR) circuit is presented for a 1G/10G co-existence PON system. It employs two new circuits to improve jitter transfer and provide tolerance to 1G/10G operation. An injection-controlled jitter-reduction circuit reduces output-clock jitter by 7 dB from 200-MHz input data jitter while keeping a short lock time of 20 ns. A frequency-variation compensation circuit reduces frequency mismatch among the three VCOs on the chip and offers large tolerance to consecutive identical digits. With the compensation, the proposed CDR circuit can employ multi VCOs, which provide tolerance to the 1G/10G co-existence situation. It achieves error-free (bit-error rate < 10-12) operation for 10-G bursts following bursts of other rates, obviously including 1G bursts. It also provides tolerance to a 256-bit sequence without a transition in the data, which is more than enough tolerance for 65-bit CIDs in the 64B/66B code of 10 Gigabit Ethernet.
Kazuyoshi TAKAGI Yuki ITO Shota TAKESHIMA Masamitsu TANAKA Naofumi TAKAGI
In this paper, we propose a method for layout-driven skewed clock tree synthesis for SFQ logic circuits. For a given logic circuit without a clock tree, our algorithm outputs a circuit with a synthesized clock tree and timing adjustments achieving the given clock period and a rough placement of the clocked gates. In the proposed algorithm, clocked gates are grouped into levels and the clock tree is synthesized for each level. For each level, we estimate the clock timing for all possible placements of each gate, and then we search a placement of all gates that minimizes the total number of delay elements for timing adjustment. Once the placement is obtained, we synthesize a clock tree without wire intersections. We applied the proposed method to a moderate size circuit and confirmed that clock trees satisfying given timing requirements can be synthesized automatically.
Chia-Chun TSAI Chung-Chieh KUO Trong-Yen LEE
As the VLSI manufacturing technology shrinks to 65 nm and below, reducing the yield loss induced by via failures is a critical issue in design for manufacturability (DFM). Semiconductor foundries highly recommend using the double-via insertion (DVI) method to improve yield and reliability of designs. This work applies the DVI method in the post-stage of an X-architecture clock routing for double-via insertion rate improvement. The proposed DVI-X algorithm constructs the bipartite graphs of the partitioned clock routing layout with single vias and redundant-via candidates (RVCs). Then, DVI-X applies the augmenting path approach associated with the construction of the maximal cliques to obtain the matching solution from the bipartite graphs. Experimental results on benchmarks show that DVI-X can achieve higher double-via insertion rate by 3% and less running time by 68% than existing works. Moreover, a skew tuning technique is further applied to achieve zero skew because the inserted double vias affect the clock skew.