Song BIAN Michihiro SHINTANI Masayuki HIROMOTO Takashi SATO
As technology further scales semiconductor devices, aging-induced device degradation has become one of the major threats to device reliability. Hence, taking aging-induced degradation into account during the design phase can greatly improve the reliability of the manufactured devices. However, accurately estimating the aging effect for extremely large circuits, like processors, is time-consuming. In this research, we focus on negative bias temperature instability (NBTI) as the aging-induced degradation mechanism, and propose a fast and efficient way of estimating NBTI-induced delay degradation by utilizing static timing analysis (STA) and a simulation-based lookup table (LUT). We model each type of gate at different degradation levels, load capacitances, and input slews. Using these gate-delay models, path delays of arbitrary circuits can be efficiently estimated. With a typical five-stage pipelined processor as the design target, comparing the delay calculated from the LUT with the reference delay calculated by a commercial circuit simulator shows a 4114-fold speedup with a delay error within 5.6%.
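The LUT-based estimation described above can be illustrated with a minimal sketch. It assumes each gate type at each degradation level has a small delay table indexed by load capacitance and input slew, that delays between grid points are obtained by bilinear interpolation, and that a path delay is the plain sum of its gate delays. The function names, table layout, and the omission of slew propagation between stages are illustrative assumptions, not the authors' implementation.

```python
from bisect import bisect_right


def _bracket(axis, x):
    """Index i of the grid cell [axis[i], axis[i+1]] used for x (clamped to the table)."""
    i = bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))


def lut_delay(table, loads, slews, load, slew):
    """Bilinearly interpolate a gate delay from table[i_load][j_slew].

    Points outside the grid are extrapolated linearly from the nearest cell.
    """
    i = _bracket(loads, load)
    j = _bracket(slews, slew)
    tx = (load - loads[i]) / (loads[i + 1] - loads[i])
    ty = (slew - slews[j]) / (slews[j + 1] - slews[j])
    d00, d01 = table[i][j], table[i][j + 1]
    d10, d11 = table[i + 1][j], table[i + 1][j + 1]
    return (d00 * (1 - tx) * (1 - ty) + d10 * tx * (1 - ty)
            + d01 * (1 - tx) * ty + d11 * tx * ty)


def path_delay(stages, luts, loads, slews):
    """Sum the interpolated gate delays along one path.

    stages: list of (gate_type, degradation_level, load, slew) tuples.
    luts:   dict mapping (gate_type, degradation_level) to a delay table.
    """
    return sum(lut_delay(luts[(g, lvl)], loads, slews, c, s)
               for g, lvl, c, s in stages)


# Hypothetical tables (arbitrary units); real tables would come from circuit simulation.
LOADS = [1.0, 2.0]
SLEWS = [10.0, 20.0]
LUTS = {
    ("INV", "fresh"): [[10.0, 12.0], [14.0, 16.0]],
    ("INV", "aged"): [[11.0, 13.0], [15.0, 17.0]],
}
```

For example, `path_delay([("INV", "fresh", 1.5, 15.0), ("INV", "aged", 1.5, 15.0)], LUTS, LOADS, SLEWS)` adds one fresh and one aged inverter delay taken at the centre of the grid.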
Ing-Chao LIN Yen-Han LEE Sheng-Wei WANG
Ternary content addressable memory (TCAM), which can store 0, 1, or X in its cells, is widely used to store routing tables in network routers. Negative bias temperature instability (NBTI) and positive bias temperature instability (PBTI), which increase Vth and degrade transistor switching speed, have become major reliability challenges. This study analyzes the signal probability of routing tables. The results show that many cells retain static stress and suffer significant degradation caused by NBTI and PBTI effects. The bit flipping technique is improved and proactive power gating recovery is proposed to mitigate NBTI and PBTI effects. In order to maintain the functionality of TCAM after bit flipping, a novel TCAM cell design is proposed. Simulation results show that compared to the original architecture, the bit flipping technique improves read static noise margin (SNM) for data and mask cells by 16.84% and 29.94%, respectively, and reduces search time degradation by 12.95%. The power gating technique improves read SNM for data and mask cells by 12.31% and 20.92%, respectively, and reduces search time degradation by 17.57%. When both techniques are used, read SNM for data and mask cells is improved by 17.74% and 30.53%, respectively, and search time degradation is reduced by 21.01%.
Let G be a probabilistic graph, in which the vertices fail independently with known probabilities. Let K represent a specified subset of vertices. The K-terminal reliability of G is defined as the probability that all vertices in K are connected. When |K|=2, the K-terminal reliability is called the 2-terminal reliability, which is the probability that the source vertex is connected to the destination vertex. The problems of computing K-terminal reliability and 2-terminal reliability have been proven to be #P-complete in general. This work demonstrates that on multi-tolerance graphs, the 2-terminal reliability problem can be solved in polynomial-time and the results can be extended to the K-terminal reliability problem on bounded multi-tolerance graphs.
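To make the definition of 2-terminal reliability concrete, the following sketch computes it by brute-force enumeration of all vertex up/down states. This is exponential in the number of vertices and only suitable for tiny graphs; it is not the polynomial-time algorithm for multi-tolerance graphs. It also assumes that the source and destination vertices can fail like any other vertex, which is a modelling choice made here for illustration.

```python
from itertools import product


def connected(adj, alive, s, t):
    """Depth-first search from s to t using only vertices in `alive`."""
    if s not in alive or t not in alive:
        return False
    stack, seen = [s], {s}
    while stack:
        u = stack.pop()
        if u == t:
            return True
        for v in adj[u]:
            if v in alive and v not in seen:
                seen.add(v)
                stack.append(v)
    return False


def two_terminal_reliability(adj, p_fail, s, t):
    """Probability that s and t are connected when vertices fail independently."""
    nodes = list(adj)
    total = 0.0
    for states in product([True, False], repeat=len(nodes)):
        prob = 1.0
        for v, up in zip(nodes, states):
            prob *= (1.0 - p_fail[v]) if up else p_fail[v]
        alive = {v for v, up in zip(nodes, states) if up}
        if connected(adj, alive, s, t):
            total += prob
    return total


# Hypothetical three-vertex path s - a - t, each vertex failing with probability 0.1.
PATH = {"s": ["a"], "a": ["s", "t"], "t": ["a"]}
P_FAIL = {"s": 0.1, "a": 0.1, "t": 0.1}
```

On the path graph all three vertices must survive, so `two_terminal_reliability(PATH, P_FAIL, "s", "t")` equals 0.9 cubed up to floating-point rounding.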
Carlos PEREZ-LEGUIZAMO P. Josue HERNANDEZ-TORRES J.S. Guadalupe GODINEZ-BORJA Victor TAPIA-TEC
Recently, the Services Oriented Architectures (SOA) have been recognized as the key to the integration and interoperability of different applications and systems that coexist in an organization. However, even though the use of SOA has increased, some applications are unable to use it. That is the case of mission critical information applications, whose requirements such as high reliability, non-stop operation, high flexibility and high performance are not satisfied by conventional SOA infrastructures. In this article we present a novel approach of combining SOA with Autonomous Decentralized Systems (ADS) in order to provide an infrastructure that can satisfy those requirements. We have named this infrastructure Autonomous Decentralized Service Oriented Architecture (ADSOA). We present the concept and architecture of ADSOA, as well as the Loosely Couple Delivery Transaction and Synchronization Technology for assuring the data consistency and high reliability of the application. Moreover, a real implementation and evaluation of the proposal in a mission critical information system, the Uniqueness Verifying Public Key Infrastructure (UV-PKI), is shown in order to prove its effectiveness.
Chihiro TSUTAKE Yutaka NAKANO Toshiyuki YOSHIDA
This paper proposes a fast mode decision technique for intra prediction of High Efficiency Video Coding (HEVC) based on a reliability metric for motion vectors (RMMV). Since such a decision problem can be regarded as a kind of pattern classification, an efficient classifier is required to reduce the computational complexity. This paper employs the RMMV as a classifier because the RMMV can efficiently categorize image blocks into flat (uniform), active, and edge blocks, and can estimate the direction of an edge block as well. A local search for angular modes is introduced to further speed up the decision process. An experiment shows the advantage of our technique over other techniques.
Aibin YAN Huaguo LIANG Zhengfeng HUANG Cuiyun JIANG Maoxiang YI
In this paper, a self-recoverable, frequency-aware and cost-effective robust latch (referred to as RFC) is proposed in 45nm CMOS technology. By means of triple mutual-feedback Muller C-elements, the internal nodes and output node of the latch are self-recoverable from a single event upset (SEU), i.e., a logic upset induced by a particle strike, regardless of the energy of the striking particle. The proposed robust latch offers a much wider range of working clock frequencies on account of its smaller delay and its insensitivity to the high-impedance state. The proposed robust latch has lower power and area costs than most of the compared latches. SPICE simulation results demonstrate that the area-power-delay product is reduced by 73.74% on average compared with previous radiation-hardened latches.
In this letter we develop a software reliability modeling framework by introducing the Burr XII distributions to software fault-detection time. An extension to deal with software metrics data characterizing the product size, program complexity or testing expenditure is also proposed. Finally, we investigate the goodness-of-fit performance and compare our new models with the existing ones through real data analyses.
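As a rough illustration, the sketch below evaluates the Burr XII cumulative distribution function and uses it as the fault-detection time distribution in a mean value function of the form m(t) = ωF(t). That form is the common convention for non-homogeneous Poisson process based software reliability models and is assumed here; the letter's exact formulation, parameter names, and metrics-based extension are not reproduced.

```python
def burr_xii_cdf(t, c, k, scale=1.0):
    """Burr XII CDF: F(t) = 1 - (1 + (t/scale)^c)^(-k) for t > 0, else 0."""
    if t <= 0:
        return 0.0
    return 1.0 - (1.0 + (t / scale) ** c) ** (-k)


def expected_faults(t, omega, c, k, scale=1.0):
    """Assumed mean value function m(t) = omega * F(t).

    omega is the expected total number of faults (hypothetical parameter name).
    """
    return omega * burr_xii_cdf(t, c, k, scale)
```

With `c=2`, `k=1` and unit scale, half of the expected faults are detected by `t=1`, since F(1) = 1 - 1/2.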
In this paper we consider two non-parametric estimation methods for software reliability assessment without specifying the fault-detection time distribution, where the underlying stochastic process describing software fault counts in system testing is given by a non-homogeneous Poisson process. The resulting data-driven methodologies can give useful probabilistic information for software reliability assessment under incomplete knowledge of the fault-detection time distribution. Through examples with real software fault data, it is shown that the proposed methods provide more accurate estimation results than the common parametric approach.
Shuhei OTA Takao KAGEYAMA Mitsuhiro KIMURA
In this study, we investigate whether copula modeling contributes to the improvement of reliability evaluation in a cascading failure-occurrence environment. In particular, as a basic problem, we focus on a 2-unit parallel system whose units may fail dependently on each other. As a result, the reliability assessment of the system using the maximal copula provides a more accurate evaluation than the traditional Weibull analysis if the degree of dependency between the two units is high. We show this result through several simulation studies.
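The effect of dependence on a 2-unit parallel system can be sketched as follows. The system fails only when both units have failed, so its reliability at time t is 1 - C(F1(t), F2(t)), where C is the copula joining the two failure-time distributions. The sketch assumes Weibull marginals and takes the "maximal copula" to be the upper Fréchet-Hoeffding bound min(u, v); both are assumptions made for illustration, and the parameter values are invented.

```python
import math


def weibull_cdf(t, shape, scale):
    """Weibull failure-time CDF for t > 0, else 0."""
    if t <= 0:
        return 0.0
    return 1.0 - math.exp(-((t / scale) ** shape))


def independence_copula(u, v):
    """Copula of two independent units."""
    return u * v


def maximal_copula(u, v):
    """Upper Frechet-Hoeffding bound (perfect positive dependence)."""
    return min(u, v)


def parallel_reliability(t, unit1, unit2, copula):
    """R(t) = 1 - P(both units failed by t) = 1 - C(F1(t), F2(t)).

    unit1 and unit2 are (shape, scale) tuples for the Weibull marginals.
    """
    u = weibull_cdf(t, *unit1)
    v = weibull_cdf(t, *unit2)
    return 1.0 - copula(u, v)
```

For two identical units with `(shape, scale) = (1.0, 1.0)` evaluated at `t = math.log(2)`, each unit has failed with probability one half, so the independence assumption gives a system reliability of roughly 0.75 while the maximal copula gives roughly 0.5. Ignoring strong dependence therefore overstates the benefit of redundancy.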
We discuss software reliability assessment considering multiple changes in the software fault-detection phenomenon. The testing time at which the characteristics of the software failure-occurrence or fault-detection phenomenon change notably in the testing phase of a software development process is called a change-point. It is known that the occurrence of a change-point influences the accuracy of software reliability assessment based on software reliability growth models, which are mainly divided into software failure-occurrence time models and fault counting models. This paper discusses software reliability growth modeling frameworks that consider the effect of multiple change-point occurrences on the software reliability growth process in both failure-occurrence time and fault counting modeling. We also show numerical illustrations of software reliability analyses based on our models using actual data.
Takashi IMAGAWA Masayuki HIROMOTO Hiroyuki OCHI Takashi SATO
Time redundancy is sometimes the only option for enhancing circuit reliability when the circuit area is severely restricted. In this paper, a time-redundant error-correction scheme, which is particularly suitable for coarse-grained reconfigurable arrays (CGRAs), is proposed. It judges the correctness of the executions by comparing the results of two identical runs. Once a mismatch is found, the second run is terminated immediately to start a third run, under the assumption that errors tend to persist in many applications, and the correct result is selected from the three runs. The circuit area and reliability of the proposed method are compared with a straightforward implementation of time redundancy and with selective triple modular redundancy (TMR). A case study on a CGRA revealed that the area of the proposed method is 1% larger than that of the implementation for the selective TMR. The study also shows that the proposed scheme is up to 2.6x more reliable than full TMR when persistent errors are predominant.
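The control flow of such a time-redundant scheme can be sketched in software as below. The sketch compares complete results of two runs and launches a third only on a mismatch, then accepts whichever earlier result the third run agrees with. It does not model the early termination of the second run or any CGRA hardware detail; the `run` callback and its attempt-index argument are hypothetical names introduced for illustration.

```python
def time_redundant_execute(run):
    """Execute `run` two or three times and return (result, number_of_runs).

    run(attempt) performs one execution and returns its result.
    """
    first = run(0)
    second = run(1)
    if first == second:
        # Two identical runs agree: accept the result without a third run.
        return first, 2
    # Mismatch: a third run decides which of the two earlier results to trust.
    third = run(2)
    if third == first or third == second:
        return third, 3
    raise RuntimeError("all three runs disagree; the error cannot be corrected")
```

A usage example with an injected error in the second run: with `outputs = [42, 7, 42]`, calling `time_redundant_execute(lambda i: outputs[i])` selects 42 after three runs, whereas an error-free execution finishes after two.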
Lei WANG Xinrong GUAN Yueming CAI Weiwei YANG Wendong YANG
This work investigates the physical layer security for three cooperative automatic-repeat-request (CARQ) protocols, including the decode-and-forward (DF) CARQ, opportunistic DF (ODF) CARQ, and the distributed space-time code (DSTC) CARQ. Assuming that there is no instantaneous channel state information (CSI) of legitimate users' channel and eavesdropper's channel at the transmitter, the connection outage performance and secrecy outage performance are derived to evaluate the reliability and security of each CARQ protocol. Then, we redefine the concept of the secrecy throughput to evaluate the overall efficiency of the system in terms of maintaining both reliable and secure transmission. Furthermore, through an asymptotic analysis in the high signal-to-noise ratio (SNR) regime, the direct relationship between reliability and security is established via the reliability-security tradeoff (RST). Numerical results verify the analysis and show the efficiency of the CARQ protocols in terms of the improvement on the secrecy throughput. More interestingly, increasing the transmit SNR and the maximum number of transmissions of the ARQ protocols may not achieve a security performance gain. In addition, the RST results underline the importance of determining how to balance the reliability vs. security, and show the superiority of ODF CARQ in terms of RST.
Chunyan HOU Chen CHEN Jinsong WANG Kai SHI
With the rise of component-based software development, its reliability has attracted much attention from both academic and industry communities. Component-based software development focuses on architecture design, and thus it is important for reliability analysis to emphasize software architecture. Existing approaches to architecture-based software reliability analysis don't model the usage profile explicitly, and they ignore the difference between the testing profile and the practical profile of components, which limits their applicability and accuracy. In response to these issues, a new reliability modeling and prediction approach is introduced. The approach considers reliability-related architecture factors by explicitly modeling the system usage profile, and transforms the testing profile into the practical usage profile of components by representing the profile with input sub-domains. Finally, the evaluation experiment shows the potential of the approach.
Huimin LIANG Jiaxin YOU Zhaowen CAI Guofu ZHAI
The reliability of an electromagnetic relay (EMR) that contains a permanent magnet (PM) can be improved by a robust design method. In this parameter design process, the calculation of the electromagnetic system is very important. In analytical calculation, the PM is often treated as a lumped parameter model of one magnetic resistance and one magnetic potential, but this often causes significant error. To increase the accuracy, a distributed parameter calculation model (DPM) of a PM bar is established, and the solution procedure and verification condition of this model are given. In a case study of a single PM bar, the magnetic field line division method is adopted to build the DPM, and the starting point and section magnetic flux of each segment are solved. A comparison with the finite element method (FEM) and measured data verifies the accuracy of this magnetic field line based distributed parameter model (MFDPM) for the PM bar. The model is then applied to the electromagnetic system of a certain type of EMR: an electromagnetic system calculation model is established based on the MFDPM, and the static force is calculated under different rotation angles. Compared with the traditional lumped parameter model and FEM, it proves to have acceptable calculation accuracy and high calculation speed, which fit the requirements of robust design.
Michitarou YABUUCHI Ryo KISHIDA Kazutoshi KOBAYASHI
We analyze the correlation between BTI (Bias Temperature Instability)-induced degradations and process variations. These reliability issues are correlated. BTI is one of the most significant aging degradations in LSIs. Threshold voltages of MOSFETs increase with time when biases stress their gates. BTI has a strong effect on highly scaled LSIs, as do process variations. Accurate prediction of the combined effects is indispensable. We should analyze both the aging degradations and the process variations of MOSFETs to explain the correlation. We measure frequencies of ROs (Ring Oscillators) of 65-nm process test circuits on two types of LSIs, ASICs and FPGAs. There are 98 and 837 ROs on our ASICs and FPGAs, respectively. The frequencies of the ROs follow Gaussian distributions. We describe the highest frequency group as the “fast” condition, the average group as the “typical” condition and the lowest group as the “slow” condition. We measure the aging degradations of the ROs of the three conditions in an accelerated test. The degradations can be approximated by a logarithmic function of stress time. The degradation at the “fast” condition has a higher impact on the frequency than that at the “slow” one. The correlation coefficient is 0.338. In this case, we can define a smaller design margin for BTI-induced degradations than that without considering the correlation because the degradation at the “slow” condition is smaller than that at the average and fast conditions.
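The statement that degradation follows a logarithmic function of stress time can be illustrated with a small least-squares fit of the form d(t) = a + b ln(t). The model form with two free coefficients, the function names, and the sample points used below are assumptions for illustration; they are not measured data from the paper.

```python
import math


def fit_log_degradation(times, degradations):
    """Least-squares fit of d(t) = a + b * ln(t); returns (a, b).

    times must be positive and contain at least two distinct values.
    """
    xs = [math.log(t) for t in times]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(degradations) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, degradations))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict_degradation(t, a, b):
    """Evaluate the fitted model at stress time t."""
    return a + b * math.log(t)


# Synthetic points generated from a = 0.5, b = 0.2 (arbitrary units, not measurements).
TIMES = [1.0, 10.0, 100.0, 1000.0]
DEGRADATIONS = [0.5 + 0.2 * math.log(t) for t in TIMES]
```

Fitting the synthetic points with `fit_log_degradation(TIMES, DEGRADATIONS)` recovers the generating coefficients up to rounding, and `predict_degradation` can then extrapolate to longer stress times.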
Katherine Shu-Min LI Yingchieh HO Yu-Wei YANG Liang-Bi CHEN
Excessively high temperature in a chip may cause circuit malfunction and performance degradation, and thus should be avoided to improve system reliability. In this paper, a novel oscillation-based on-chip thermal sensing architecture for dynamically adjusting supply voltage and clock frequency in a System-on-a-Chip (SoC) is proposed. It is shown that the oscillation frequency of a ring oscillator decreases linearly as the temperature rises, and thus provides a good on-chip temperature sensing mechanism. An efficient Dynamic Voltage-to-Frequency Scaling (DF2VS) algorithm is proposed to dynamically adjust the supply voltage according to the oscillation frequencies of the ring oscillators distributed in the SoC so that thermal sensing can be carried out at all potential hot spots. An on-chip Dynamic Voltage Scaling or Dynamic Voltage and Frequency Scaling (DVS or DVFS) monitor selects the supply voltage level and clock frequency according to the outputs of all thermal sensors. Experimental results on SoC benchmark circuits show the effectiveness of the algorithm: a 10% reduction in supply voltage alone can achieve about 20% power reduction (DVS scheme), and nearly 50% reduction in power is achievable if the clock frequency is also scaled down (DVFS scheme). The chip temperature will be significantly lower due to the reduced power consumption.
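The reported power figures are consistent with the textbook CMOS dynamic power relation, in which power is proportional to the square of the supply voltage times the clock frequency. The sketch below uses that relation as an assumption; it ignores leakage power and is not the DF2VS algorithm itself, and the function names are hypothetical.

```python
def relative_dynamic_power(v_scale, f_scale=1.0):
    """Dynamic power relative to nominal, assuming P is proportional to V^2 * f.

    v_scale and f_scale are the scaled voltage and frequency divided by
    their nominal values (for example 0.9 for a 10% reduction).
    """
    return v_scale ** 2 * f_scale


def power_reduction(v_scale, f_scale=1.0):
    """Fractional power saving relative to nominal operation."""
    return 1.0 - relative_dynamic_power(v_scale, f_scale)


def frequency_scale_for_target(v_scale, target_reduction):
    """Frequency scale needed to reach a target saving at a given voltage scale."""
    return (1.0 - target_reduction) / v_scale ** 2
```

Under this model, `power_reduction(0.9)` is about 0.19, which matches the roughly 20% saving quoted for voltage scaling alone. Reaching a 50% saving at the same voltage would require `frequency_scale_for_target(0.9, 0.5)`, about 0.62 of the nominal clock frequency; the frequency settings actually used in the experiments are not stated in the abstract.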
Chunsheng HUA Juntong QI Jianda HAN Haiyuan WU
In this paper, we introduce a novel Kernel-Reliability-based K-Means (KRKM) clustering algorithm for categorizing an unknown dataset under noisy conditions. Compared with conventional clustering algorithms, the proposed KRKM algorithm measures both the reliability and the similarity for classifying data into its neighbor clusters using dynamic kernel functions, where noisy data are rejected by being given low reliability. The reliability for classifying a data item is measured by a dynamic kernel function whose window size is determined by the triangular relationship between this data item and its two nearest clusters. The similarity of a data item to its neighbor clusters is measured by another adaptive kernel function, which takes into account not only the similarity between the data item and the clusters but also that between its two nearest clusters. The main contribution of this work lies in introducing the dynamic kernel functions to evaluate both the reliability and similarity for clustering, which makes the proposed algorithm more efficient in dealing with very noisy data. Through various experiments, the efficiency and effectiveness of the proposed algorithm have been confirmed.
Hiromitsu KIMURA Zhiyong ZHONG Yuta MIZUOCHI Norihiro KINOUCHI Yoshinobu ICHIDA Yoshikazu FUJIMORI
A ferroelectric-based (FE-based) non-volatile logic is proposed for low-power LSI. Standby currents in a logic circuit can be cut off by using FE-based non-volatile flip-flops (NVFFs), and the standby power can be reduced to zero. Because the FE capacitor is accessed only when the power turns on or off, the performance of the NVFF is almost the same as that of the conventional flip-flop (FF) in a logic operation. The use of complementarily stored data in coupled FE capacitors makes it possible to realize a wide read voltage margin, which guarantees 10 years of retention at 85 degrees Celsius under less than 1.5V operation. The low supply voltage and electro-static discharge (ESD) detection technique prevents data destruction caused by illegal access to the FE capacitor during the standby state. When the proposed circuitry is applied to a CPU, the write and read operations for all FE capacitors in 1.6k-bit NVFFs are performed within 7µs and 3µs with access energies of 23.1nJ and 8.1nJ, respectively, using 130nm CMOS with Pb(Zr,Ti)O3 (PZT) thin films.
Takashi YAMAMOTO Shigemasa TAKAI
In this paper, we study conjunctive decentralized diagnosis of discrete event systems (DESs). In most existing works on decentralized diagnosis of DESs, it is implicitly assumed that the diagnosis decisions of all local diagnosers are available to detect a failure. However, some local diagnosis decisions may not be available for some reason. Letting n be the number of local diagnosers, the notion of (n,k)-conjunctive codiagnosability guarantees that the occurrence of any failure is detected in the conjunctive architecture as long as at least k of the n local diagnosis decisions are available. We propose an algorithm for verifying (n,k)-conjunctive codiagnosability. To construct a reliable conjunctive decentralized diagnoser, we need to compute the delay bound within which the occurrence of any failure can be detected as long as at least k of the n local diagnosis decisions are available. We show how to compute the delay bound.
Hiroaki KONOURA Toshihiro KAMEDA Yukio MITSUYAMA Masanori HASHIMOTO Takao ONOYE
Negative Bias Temperature Instability (NBTI) is one of the serious concerns for long-term circuit performance degradation. NBTI degrades PMOS transistors under negative bias, whereas they recover once the negative bias is removed. In this paper, we propose a mitigation method for NBTI-induced performance degradation that exploits the recovery property by shifting a random input sequence through scan paths. With this method, we prevent the consecutive stress that causes large degradation. Experimental results reveal that random scan-in vectors successfully mitigate NBTI and that the path delay degradation is reduced by 71% in a test case when standby mode occupies 10% of total time. We also confirmed that an 8-bit LFSR is capable of random number generation for this purpose with low area and power overhead.
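An 8-bit LFSR of the kind mentioned above can be modelled in a few lines. The sketch below is a Fibonacci LFSR whose feedback corresponds to the primitive polynomial x^8 + x^4 + x^3 + x^2 + 1, so it cycles through all 255 nonzero states. The choice of polynomial, the shift direction, and the function names are assumptions for illustration; the abstract does not specify the LFSR configuration used.

```python
def lfsr8_step(state):
    """Advance an 8-bit Fibonacci LFSR by one shift.

    The feedback bit is the XOR of bits 0, 2, 3 and 4, which gives the
    recurrence of the polynomial x^8 + x^4 + x^3 + x^2 + 1.
    """
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 4)) & 1
    return (state >> 1) | (bit << 7)


def lfsr8_sequence(seed, count):
    """Return `count` successive 8-bit states starting after `seed`."""
    if not 1 <= seed <= 0xFF:
        raise ValueError("seed must be a nonzero 8-bit value")
    states = []
    state = seed
    for _ in range(count):
        state = lfsr8_step(state)
        states.append(state)
    return states


def scan_in_bits(seed, count):
    """Pseudo-random bit stream (least significant bit of each state)."""
    return [s & 1 for s in lfsr8_sequence(seed, count)]
```

The all-zero state is excluded because it maps to itself. In a scan-based setting, a bit stream such as `scan_in_bits(0x5A, 64)` would be shifted into the scan chain during standby so that transistors are not held under constant stress.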