Takuma NAGAO Tomoki NAKAMURA Masuo KAJIYAMA Makoto EIKI Michiko INOUE Michihiro SHINTANI
Statistical wafer-level characteristic variation modeling offers an attractive method for reducing the measurement cost in large-scale integrated (LSI) circuit testing while maintaining test quality. In this method, the performance of unmeasured LSI circuits fabricated on a wafer is statistically predicted based on a few measured LSI circuits. Conventional statistical methods model spatially smooth variations in the wafers. However, actual wafers can exhibit discontinuous variations that are systematically caused by the manufacturing environment, such as shot dependence. In this paper, we propose a modeling method that considers discontinuous variations in wafer characteristics by applying the knowledge of manufacturing engineers to a model estimated using Gaussian process regression. In the proposed method, the process variation is decomposed into systematic discontinuous and global components to improve estimation accuracy. An evaluation performed using an industrial production test dataset indicates that the proposed method effectively reduces the estimation error for an entire wafer by over 36% compared with conventional methods.
Dongyue JIN Luming CAO You WANG Xiaoxue JIA Yongan PAN Yuxin ZHOU Xin LEI Yuanyuan LIU Yingqi YANG Wanrong ZHANG
Fast switching speed, low power consumption, and good stability are some of the important properties of spin transfer torque assisted voltage controlled magnetic anisotropy magnetic tunnel junction (STT-assisted VCMA-MTJ) which makes the non-volatile full adder (NV-FA) based on it attractive for Internet of Things. However, the effects of process variations on the performances of STT-assisted VCMA-MTJ and NV-FA will be more and more obvious with the downscaling of STT-assisted VCMA-MTJ and the improvement of chip integration. In this paper, a more accurate electrical model of STT-assisted VCMA-MTJ is established on the basis of the magnetization dynamics and the process variations in film growth process and etching process. In particular, the write voltage is reduced to 0.7 V as the film thickness is reduced to 0.9 nm. The effects of free layer thickness variation (γtf) and oxide layer thickness variation (γtox) on the state switching as well as the effect of tunnel magnetoresistance ratio variation (β) on the sensing margin (SM) are studied in detail. Considering that the above process variations follow Gaussian distribution, Monte Carlo simulation is used to study the effects of the process variations on the writing and output operations of NV-FA. The result shows that the state of STT-assisted VCMA-MTJ can be switched under -0.3%≤γtf≤6% or -23%≤γtox≤0.2%. SM is reduced by 16.0% with β increases from 0 to 30%. The error rates of writing ‘0’ in the NV-FA can be reduced by increasing Vb1 or increasing positive Vb2. The error rates of writing ‘1’ can be reduced by increasing Vb1 or decreasing negative Vb2. The reduction of the output error rates can be realized effectively by increasing the driving voltage (Vdd).
Ting-Chou LU Ming-Dou KER Hsiao-Wen ZAN
Process and temperature variations have become a serious concern for ultra-low voltage (ULV) technology. The clock generator is the essential component for the ULV very-large-scale integration (VLSI). MOSFETs that are operated in the sub-threshold region are widely applied for ULV technology. However, MOSFETs at subthreshold region have relatively high variations with process and temperature. In this paper, process and temperature variations on the clock generators have been studied. This paper presents an ultra-low voltage 2.4GHz CMOS voltage controlled oscillator with temperature and process compensation. A new all-digital auto compensated mechanism to reduce process and temperature variation without any laser trimming is proposed. With the compensated circuit, the VCO frequency-drift is 16.6 times the improvements of the uncompensated one as temperature changes. Furthermore, it also provides low jitter performance.
Yu HU Jing YE Zhiping SHI Xiaowei LI
Process variation has become prominent in the advanced CMOS technology, making the timing of fabricated circuits more uncertain. In this paper, we propose a Layout-Aware Path Selection (LAPS) technique to accurately estimate the circuit timing variation from a small set of paths. Three features of paths are considered during the path selection. Experiments conducted on benchmark circuits with process variation simulated with VARIUS show that, by selecting only hundreds of paths, the fitting errors of timing distribution are kept below 5.3% when both spatial correlated and spatial uncorrelated process variations exist.
Ahmed AWAD Atsushi TAKAHASHI Chikaaki KODAMA
With being pushed into sub-16nm regime, advanced technology nodes printing in optical micro-lithography relies heavily on aggressive Optical Proximity Correction (OPC) in the foreseeable future. Although acceptable pattern fidelity is utilized under process variations, mask design time and mask manufacturability form crucial parameters whose tackling in the OPC recipe is highly demanded by the industry. In this paper, we propose an intensity based OPC algorithm to find a highly manufacturable mask solution for a target pattern with acceptable pattern fidelity under process variations within a short computation time. This is achieved through utilizing a fast intensity estimation model in which intensity is numerically correlated with local mask density and kernel type to estimate the intensity in a short time and with acceptable estimation accuracy. This estimated intensity is used to guide feature shifting, alignment, and concatenation following linearly interpolated variational intensity error model to achieve high mask manufacturability with preserving acceptable pattern fidelity under process variations. Experimental results show the effectiveness of our proposed algorithm on the public benchmarks.
Koki IGAWA Masao YANAGISAWA Nozomu TOGAWA
In order to tackle a process-variation problem, we can define several scenarios, each of which corresponds to a particular LSI behavior, such as a typical-case scenario and a worst-case scenario. By designing a single LSI chip which realizes multiple scenarios simultaneously, we can have a process-variation-tolerant LSI chip. In this paper, we propose a multi-scenario high-level synthesis algorithm for variation-tolerant floorplan-driven design targeting new distributed-register architectures, called HDR architectures. We assume two scenarios, a typical-case scenario and a worst-case scenario, and realize them onto a single chip. We first schedule/bind each of the scenarios independently. After that, we commonize the scheduling/binding results for the typical-case and worst-case scenarios and thus generate a commonized area-minimized floorplan result. At that time, we can explicitly take into account interconnection delays by using distributed-register architectures. Experimental results show that our algorithm reduces the latency of the typical-case scenario by up to 50% without increasing the latency of the worst-case scenario, compared with several existing methods.
Michitarou YABUUCHI Ryo KISHIDA Kazutoshi KOBAYASHI
We analyze the correlation between BTI (Bias Temperature Instability) -induced degradations and process variations. Those reliability issues are correlated. BTI is one of the most significant aging-degradations on LSIs. Threshold voltages of MOSFETs increase with time when biases stress their gates. It shows a strong effect of BTI on highly scaled LSIs in the same way as the process variations. The accurate prediction of the combinational effects is indispensable. We should analyze both aging-degradations and process variations of MOSFETs to explain the correlation. We measure frequencies of ROs (Ring Oscillators) of 65-nm process test circuits on two types of LSIs, ASICs and FPGAs. There are 98 and 837 ROs on our ASICs and FPGAs respectively. The frequencies of ROs follow gaussian distributions. We describe the highest frequency group as the “fast” conditon, the average group as the “typical” conditon and the lowest group as the “slow” conditon. We measure the aging-degradations of the ROs of the three conditions on the accelerated test. The degradations can be approximated by logarithmic function of stress time. The degradation at the “fast” condition has a higher impact on the frequency than the “slow” one. The correlation coefficient is 0.338. In this case, we can define a smaller design margin for BTI-induced degradations than that without considering the correlation because the degradation at the “slow” conditon is smaller than the average and the fast.
Ryo HARADA Yukio MITSUYAMA Masanori HASHIMOTO Takao ONOYE
This paper presents a measurement circuit structure for capturing SET pulse-width suppressing pulse-width modulation and within-die process variation effects. For mitigating pulse-width modulation while maintaining area efficiency, the proposed circuit uses massively parallelized short inverter chains as a target circuit. Moreover, for each inverter chain on each die, pulse-width calibration is performed. In measurements, narrow SET pulses ranging 5ps to 215ps were obtained. We confirm that an overestimation of pulse-width may happen when ignoring die-to-die and within-die variation of the measurement circuit. Our evaluation results thus point out that calibration for within-die variation in addition to die-to-die variation of the measurement circuit is indispensable.
Shiho HAGIWARA Takanori DATE Kazuya MASU Takashi SATO
This paper proposes a novel and an efficient method termed hypersphere sampling to estimate the circuit yield of low-failure probability with a large number of variable sources. Importance sampling using a mean-shift Gaussian mixture distribution as an alternative distribution is used for yield estimation. Further, the proposed method is used to determine the shift locations of the Gaussian distributions. This method involves the bisection of cones whose bases are part of the hyperspheres, in order to locate probabilistically important regions of failure; the determination of these regions accelerates the convergence speed of importance sampling. Clustering of the failure samples determines the required number of Gaussian distributions. Successful static random access memory (SRAM) yield estimations of 6- to 24-dimensional problems are presented. The number of Monte Carlo trials has been reduced by 2-5 orders of magnitude as compared to conventional Monte Carlo simulation methods.
Susumu KOBAYASHI Fumihiro MINAMI
As the LSI process technology advances and the gate size becomes smaller, the signal delay on interconnect becomes a significant factor in the signal path delay. Also, as the size of interconnect structure becomes smaller, the interconnect process variations have become one of the dominant factors which influence the signal delay and thus clock skew. Therefore, controlling the influence of interconnect process variations on clock skew is a crucial issue in the advanced process technologies. In this paper, we propose a method for minimizing clock skew fluctuations caused by interconnect process variations. The proposed method identifies the suitable balance of clock buffer size and wire length in order to minimize the clock skew fluctuations caused by the interconnect process variations. Experimental results on test circuits of 28nm process technology show that the proposed method reduces the clock skew fluctuations by 30-92% compared to the conventional method.
Junya KAWASHIMA Hiroshi TSUTSUI Hiroyuki OCHI Takashi SATO
We investigate a design strategy for subthreshold circuits focusing on energy-consumption minimization and yield maximization under process variations. The design strategy is based on the following findings related to the operation of low-power CMOS circuits: (1) The minimum operation voltage (VDDmin) of a circuit is dominated by flip-flops (FFs), and VDDmin of an FF can be improved by upsizing a few key transistors, (2) VDDmin of an FF is stochastically modeled by a log-normal distribution, (3) VDDmin of a large circuit can be efficiently estimated by using the above model, which eliminates extensive Monte Carlo simulations, and (4) improving VDDmin may substantially contribute to decreasing energy consumption. The effectiveness of the proposed design strategy has been verified through circuit simulations on various circuits, which clearly show the design tradeoff between voltage scaling and transistor sizing.
Shuta KIMURA Masanori HASHIMOTO Takao ONOYE
Post-silicon tuning is attracting a lot of attention for coping with increasing process variation. However, its tuning cost via testing is still a crucial problem. In this paper, we propose tuning-friendly body bias clustering with multiple bias voltages. The proposed method provides a small set of compensation levels so that the speed and leakage current vary monotonically according to the level. Thanks to this monotonic leveling and limitation of the number of levels, the test-cost of post-silicon tuning is significantly reduced. During the body bias clustering, the proposed method explicitly estimates and minimizes the average leakage after the post-silicon tuning. Experimental results demonstrate that the proposed method reduces the average leakage by 25.3 to 51.9% compared to non clustering case. In a test case of four clusters, the number of necessary tests is reduced by 83% compared to the conventional exhaustive test approach. We reveal that two bias voltages are sufficient when only a small number of compensation levels are allowed for test-cost reduction. We also give an implication on how to synthesize a circuit to which post-silicon tuning will be applied.
Jun Gyu LEE Zule XU Shoichi MASUI
We propose a methodology of loop design optimization for fourth-order fractional-N phase locked loop (PLL) frequency synthesizers featuring a short settling time of 5 µsec for applications in an active RFID (radio frequency identification) and automobile smart-key systems. To establish the optimized design flow, equations presenting the relationship between the specification and PLL loop parameters in terms of settling time, loop bandwidth, phase margin, and phase noise are summarized. The proposed design flow overcomes the settling time inaccuracy in conventional second-order approximation methods by obtaining the accurate relationship between settling time and loop bandwidth with the MATLAB Control System Toolbox for the fourth-order PLLs. The proposed flow also features the worst-case design by taking account of the process, voltage, and temperature (PVT) variations in loop filter components, and considers the tradeoff between phase noise and area. The three-step optimization process consists of 1) the derivation of the accurate relationship between the settling time and loop bandwidth for various PVT conditions, 2) the derivation of phase noise and area as functions of area-dominant filter capacitance, and 3) the derivation of all PLL loop components values. The optimized design result is compared with circuit simulations using an actually designed fourth-order fractional-N PLL in a 1.8 V 0.18 µm CMOS technology. The error between the design and simulation for the setting time is reduced from 0.63 µsec in the second-order approximation to 0.23 µsec in the fourth-order optimization that proves the validity of the proposed method for the high-speed settling operations.
Yohei NAKATA Hiroshi KAWAGUCHI Masahiko YOSHIMOTO
As process technology is scaled down, a typical system on a chip (SoC) becomes denser. In scaled process technology, process variation becomes greater and increasingly affects the SoC circuits. Moreover, the process variation strongly affects network-on-chips (NoCs) that have a synchronous network across the chip. Therefore, its network frequency is degraded. We propose a process-variation-adaptive NoC with a variation-adaptive variable-cycle router (VAVCR). The proposed VAVCR can configure its cycle latency adaptively on a processor core basis, corresponding to the process variation. It can increase the network frequency, which is limited by the process variation in a conventional router. Furthermore, we propose a variable-cycle pipeline adaptive routing (VCPAR) method with VAVCR; the proposed VCPAR can reduce packet latency and has tolerance to network congestion. The total execution time reduction of the proposed VAVCR with VCPAR is 15.7%, on average, for five task graphs.
Bo LIU Bo YANG Shigetoshi NAKATAKE
Current sources are essential components for analog circuit designs, the mismatch of which causes the significant degradation of the circuit performance. This paper addresses the mismatch model of CMOS current sources, unlike the conventional modeling, focusing on the layout- and λ-dependency of the process variation, where λ is the output conductance parameter. To make it clear what variation parameter influences the mismatch, we implemented a test chip on 90 nm process technology, where we can collect the characteristics variation data for MOSFETs of various layouts. The test chip also includes D/A converters to check the differential non-linearity (DNL) caused by the mismatch of current sources when behaving as a DAC. Identifying the variation and the circuit-level errors in the measured DNLs, we reveal that our model can more accurately account for the current variation compared to the conventional mismatch model.
Kei MATSUMOTO Tetsuya HIROSE Yuji OSAKI Nobutaka KUROKI Masahiro NUMA
We propose a subthreshold Static Random Access Memory (SRAM) circuit architecture with improved write ability. Even though the circuits can achieve ultra-low power dissipation in subthreshold digital circuits, the performance is significantly degraded with threshold voltage variations due to the fabrication process and temperature. Because the write operation of SRAM is prone to failure due to the unbalance of threshold voltages between the nMOSFET and pMOSFET, stable operation cannot be ensured. To achieve robust write operation of SRAM, we developed a compensation technique by using an adaptive voltage scaling technique that uses an on-chip threshold voltage monitoring circuit. The monitoring circuit detects the threshold voltage of a MOSFET with the on-chip circuit configuration. By using the monitoring voltage as a supply voltage for SRAM cells, write operation can be compensated without degrading cell stability. Monte Carlo simulations demonstrated that the proposed SRAM architecture exhibits a smaller write operation failure rate and write time variation than a conventional 6T SRAM.
In this paper, a Stochastic Non-Homogeneous ARnoldi (SNHAR) method is proposed for the analysis of the on-chip power grid networks in the presence of process variations. In SNHAR method, the polynomial chaos based stochastic method is employed to handle the variations of power grids. Different from the existing StoEKS method which uses extended Krylov Subspace (EKS) method to compute the coefficients of the polynomial chaos, a computation-efficient and numerically stable Non-Homogeneous ARnoldi (NHAR) method is employed in SNHAR method to compute the coefficients of the polynomial chaos. Compared with EKS method, NHAR method has superior numerical stability and can achieve remarkably higher accuracy with even lower computational cost. As a result, SNHAR can capture the stochastic characteristics of the on-chip power grid networks with higher accuracy, but even lower computational cost than StoEKS.
Pei-Wen LUO Jwu-E CHEN Chin-Long WEY
Device mismatch plays an important role in the design of accurate analog circuits. The common centroid structure is commonly employed to reduce device mismatches caused by symmetrical layouts and processing gradients. Among the candidate placements generated by the common centroid approach, however, whichever achieves better matching is generally difficult to be determined without performing the time-consuming yield evaluation process. In addition, this rule-based methodology makes it difficult to achieve acceptable matching between multiple capacitors and to handle an irregular layout area. Based on a spatial correlation model, this study proposed a design methodology for yield enhancement of analog circuits using switched-capacitor techniques. An efficient and effective placement generator is developed to derive a placement for a circuit to achieve the highest or near highest correlation coefficient and thus accomplishing a better yield performance. A simple yield analysis is also developed to evaluate the achieved yield performance of a derived placement. Results show that the proposed methodology derives a placement which achieves better yield performance than those generated by the common centroid approach.
Mahmoud MOMTAZPOUR Maziar GOUDARZI Esmaeil SANAEI
Parameter variations reveal themselves as different frequency and leakage powers per instances of the same MPSoC. By the increasing variation with technology scaling, worst-case-based scheduling algorithms result in either increasingly less optimal schedules or otherwise more lost yield. To address this problem, this paper introduces a variation-aware task and communication scheduling algorithm for multiprocessor system-on-chip (MPSoC). We consider both delay and leakage power variations during the process of finding the best schedule so that leakier processors are less utilized and can be more frequently put in sleep mode to reduce power. Our algorithm takes advantage of event tables to accelerate the statistical timing and power analysis. We use genetic algorithm to find the best schedule that maximizes power-yield under a performance-yield constraint. Experimental results on real world benchmarks show that our proposed algorithm achieves 16.6% power-yield improvement on average over deterministic worst-case-based scheduling.
Jun TAO Xuan ZENG Wei CAI Yangfeng SU Dian ZHOU
In this paper, a Stochastic Collocation Algorithm combined with Sparse Grid technique (SSCA) is proposed to deal with the periodic steady-state analysis for nonlinear systems with process variations. Compared to the existing approaches, SSCA has several considerable merits. Firstly, compared with the moment-matching parameterized model order reduction (PMOR) which equally treats the circuit response on process variables and frequency parameter by Taylor approximation, SSCA employs Homogeneous Chaos to capture the impact of process variations with exponential convergence rate and adopts Fourier series or Wavelet Bases to model the steady-state behavior in time domain. Secondly, contrary to Stochastic Galerkin Algorithm (SGA), which is efficient for stochastic linear system analysis, the complexity of SSCA is much smaller than that of SGA for nonlinear case. Thirdly, different from Efficient Collocation Method, the heuristic approach which may result in "Rank deficient problem" and "Runge phenomenon," Sparse Grid technique is developed to select the collocation points needed in SSCA in order to reduce the complexity while guaranteing the approximation accuracy. Furthermore, though SSCA is proposed for the stochastic nonlinear steady-state analysis, it can be applied to any other kind of nonlinear system simulation with process variations, such as transient analysis, etc.