Xiaoqing WEN Seiji KAJIHARA Kohei MIYASE Yuta YAMATO Kewal K. SALUJA Laung-Terng WANG Kozo KINOSHITA
This paper proposes a new per-test fault diagnosis method based on the X-fault model. The X-fault model can represent all possible faulty behaviors of a physical defect or defects in a gate and/or on its fanout branches by assigning different X symbols assigned to the fanout branches. A partial symbolic fault simulation method is proposed for the X-fault model. Then, a novel technique is proposed for extracting more diagnostic information by analyzing matching details between observed and simulated responses. Furthermore, a unique method is proposed to score the results of fault diagnosis. Experimental results on benchmark circuits demonstrate the superiority of the proposed method over conventional per-test fault diagnosis based on the stuck-at fault model.
Yoshiyuki NAKAMURA Jacob SAVIR Hideo FUJIWARA
In [1] the impact of BIST on the chip defect level after test has been addressed. It was assumed in [1] that no measures are taken to ensure that the BIST circuitry is fault-free before launching the functional test. In this paper we assume that a BIST pretest is first conducted in order to get rid of all chips that fail it. Only chips whose BIST circuitry has passed the pretest are kept, while the rest are discarded. The BIST pretest, however, is assumed to have only a limited coverage against its own faults. This paper studies the product quality improvements as induced by the BIST pretest, and provides some insight as to when it may be worthwhile to perform it.
Daisuke TAKEMOTO Shigeaki TAGASHIRA Satoshi FUJITA
In this paper, we propose a new method to enhance the fault-tolerance of the Content Addressable Network (CAN), which is known as a typical pure P2P system based on the notion of Distributed Hash Table (DHT). The basic idea of the proposed method is to introduce redundancy to the management of index information distributed over the nodes in the given P2P network, by allowing each index to be assigned to several nodes, which was restricted to be one in the original CAN system. To keep the consistency among several copies of indices, we propose an efficient synchronization scheme based on the notion of labels assigned to each copy in a distinct manner. The performance of the proposed scheme is evaluated by simulation. The result of simulations indicates that the proposed scheme significantly enhances the fault-tolerance of the CAN system.
Constraint-based software specifications enable run-time monitoring to detect probable risk events and ensure the desired system behavior. SpecTRM-RL is a well-developed constraint-based specification method for computer-controlled systems. However, it is desirable to express constraints in familiar visual models. To provide better visualization and popularity, we developed methods to represent all the SpecTRM-RL constraint types in UML. We have also extended SpecTRM's constraints by adding relational and global constraints, and then expressed them in OCL. Safety verification of these specifications is also proposed. We developed a systematic way to construct fault trees for safety analysis based on UML diagrams. Due to the generality of UML as well as the defensive manner of constraints and fault tree analysis, our approach can be adapted for both general applications and safety-critical applications.
Multi-context FPGAs allow very quick reconfiguration by storing multiple configuration data at the same time. While testing for FPGAs with single-context memories has already been studied by many researchers, testing for multi-context FPGAs has not been proposed yet. This paper presents an architecture of testable multi-context FPGAs. In the proposed multi-context FPGA, configuration data stored in a context can be copied into another context. This paper also shows testing of the proposed multi-context FPGA. The proposed testing uses the testing for the traditional FPGAs with single-context. The testing is capable of detecting single stuck-at faults and single open faults which affect normal operations. The number of test configurations for the proposed testing is at most two more than that for the testing of FPGAs with single-context memories. The area overhead of the proposed architecture is 7% and 4% of the area of a multi-context FPGA without the proposed architecture when the number of contexts in a configuration memory is 8 and 16, respectively.
Mohammad Hossein KAHAEI Mehdi TORBATIAN Javad POSHTAN
This paper presents a new bearing fault detection algorithm based on analyzing singular points of vibration signals using the Haar wavelet. The proposed Haar Fault Detection (HFD) algorithm is compared with a previously-developed algorithm associated with the Morlet wavelet. We also substitute the Haar wavelet with Daubechies wavelets with larger compact supports and evaluate the results. Simulations carried on real data demonstrate that the HFD algorithm achieves a comparable accuracy while having a lower computational cost. This makes the HFD algorithm an appropriate candidate for fast processing of bearing faults.
Yoshiyuki NAKAMURA Thomas CLOUQUEUR Kewal K. SALUJA Hideo FUJIWARA
In this paper, we provide a practical formulation of the problem of identifying all error occurrences and all failed scan cells in at-speed scan based BIST environment. We propose a method that can be used to identify every error when the circuit test frequency is higher than the tester frequency. Our approach requires very little extra hardware for diagnosis and the test application time required to identify errors is a linear function of the frequency ratio between the CUT and the tester.
I-Shyan HWANG I-Feng HUANG Chih-Dar CHIEN David H. SU
This work proposes a distributed fault protection mechanism called the Dynamic-Shared Segment Protection (DSSP) algorithm for WDM (Wavelength Division Multiplexing) mesh networks. The objects are to assure high probability of path protection and efficient use of network resources. The proposed approach exploits the segment protection mode, which accommodates the characteristics of both path-based and link-based protections, for providing finer service granularities, to satisfy the versatile requirements of critical applications in the foreseeable future. To show that DSSP can improve performance efficiency, simulations are conducted using four networks (NSFNET, USANET, Mesh 66, Mesh 99) for a comparative study of the proposed DSSP versus ordinary shared protection schemes and SLSP (Short Leap Shared Protection). Simulation results reveal that the proposed DSSP method results in much lower blocking probability and has higher network utilization. Consequently, it is very useful for applications to a real-time WDM network, which changes status dynamically.
An algorithm is described for solving the node-to-set disjoint paths problem in bi-rotator graphs, which are obtained by making each edge of a rotator graph bi-directional. The algorithm is of polynomial order of n for an n-bi-rotator graph. It is based on recursion and divided into three cases according to the distribution of destination nodes in the classes into which the nodes in a bi-rotator graph are categorized. We estimated that it obtains 2n-3 disjoint paths with a time complexity of O(n5), that the sum of the path lengths is O(n3), and that the length of the maximum path is O(n2). Computer experiment showed that the average execution time was O(n3.9) and, the average sum of the path lengths was O(n3.0).
Hai JIN Xuanhua SHI Weizhong QIANG Deqing ZOU
Grid computing presents a new trend to distributed and Internet computing to coordinate large scale resources sharing and problem solving in dynamic, multi-institutional virtual organizations. Due to the diverse failures and error conditions in the grid environments, developing, deploying, and executing applications over the grid is a challenge, thus dependability is a key factor for grid computing. This paper presents a dependable grid computing framework, called DRIC, to provide an adaptive failure detection service and a policy-based failure handling mechanism. The failure detection service in DRIC is adaptive to users' QoS requirements and system conditions, and the failure-handling mechanism can be set optimized based on decision-making method by a policy engine. The performance evaluation results show that this framework is scalable, high efficiency and low overhead.
Gabriel RODRIGUEZ María J. MARTIN Patricia GONZALEZ Juan TOURIÑO
This paper presents CPPC (Controller/Precompiler for Portable Checkpointing), a checkpointing tool designed for heterogeneous clusters and Grid infrastructures through the use of portable protocols, portable checkpoint files and portable code. It works at variable level being user-directed, thus generating small checkpoint files. It allows parallel processes to checkpoint independently, without runtime coordination or message-logging. Consistency is achieved at restart time by negotiating the restart point. A directive-based checkpointing precompiler has also been implemented to ease up user's effort. CPPC was designed to work with parallel MPI programs, though it can be used with sequential ones, and easily extended to parallel programs written using different message-passing libraries, due to its highly modular design. Experimental results are shown using CPPC with different test applications.
Che-Wun CHIOU Chiou-Yng LEE An-Wen DENG Jim-Min LIN
Because fault-based attacks on cryptosystems have been proven effective, fault diagnosis and tolerance in cryptography have started a new surge of research and development activity in the field of applied cryptography. Without magnitude comparisons, the Montgomery multiplication algorithm is very attractive and popular for Elliptic Curve Cryptosystems. This paper will design a Montgomery multiplier array with a bit-parallel architecture in GF(2m) with concurrent error detection capability to protect it against fault-based attacks. The robust Montgomery multiplier array with concurrent error detection requires only about 0.2% extra space overhead (if m=512 is as an example) and requires four extra clock cycles compared to the original Montgomery multiplier array without concurrent error detection.
Giscard WEPIWE Plamen L. SIMEONOV
The paper presents HiPeer, a robust resource distribution and discovery algorithm that can be used for fast and fault-tolerant location of resources in P2P network environments. HiPeer defines a concentric multi-ring overlay networking topology, whereon dynamic network management methods are deployed. In terms of performance, HiPeer delivers of number of lowest bounds. We demonstrate that for any De Bruijn digraph of degree d 2 and diameter DDB HiPeer constructs a highly reliable network, where each node maintains a routing table with at most 2d+2 entries independent of the number N of nodes in the system. Further, we show that any existing resource in the network with at most d nodes can be found within at most DHiPeer = log d(N(d-1)+d)-1 overlay hops. This result is as close to the Moore bound [1] as the query path length in other outstanding P2P proposals based on the De Bruijn digraphs. Thus, we argue that HiPeer defines a highly connected network with connectivity d and the lowest yet known lookup bound DHiPeer. Moreover, we show that any node's "join or leave" operation in HiPeer implies a constant expected reorganization cost of the magnitude order of O(d) control messages.
Paola FLOCCHINI Antonio Mesa ENRIQUES Linda PAGLI Giuseppe PRENCIPE Nicola SANTORO
We consider the problem of computing the optimal swap edges of a shortest-path tree. This problem arises in designing systems that offer point-of-failure shortest-path rerouting service in presence of a single link failure: if the shortest path is not affected by the failed link, then the message will be delivered through that path; otherwise, the system will guarantee that, when the message reaches the node where the failure has occurred, the message will then be re-routed through the shortest detour to its destination. There exist highly efficient serial solutions for the problem, but unfortunately because of the structures they use, there is no known (nor foreseeable) efficient distributed implementation for them. A distributed protocol exists only for finding swap edges, not necessarily optimal ones. We present two simple and efficient distributed algorithms for computing the optimal swap edges of a shortest-path tree. One algorithm uses messages containing a constant amount of information, while the other is tailored for systems that allow long messages. The amount of data transferred by the protocols is the same and depends on the structure of the shortest-path spanning-tree; it is no more, and sometimes significantly less, than the cost of constructing the shortest-path tree.
There are three well-known approaches to using scan design to apply two-pattern testing: broadside testing (functional justification), skewed-load testing and enhanced scan testing. The broadside and skewed-load testing use the standard scan design, and thus the area overheads are not high. However fault coverage is low. The enhanced scan testing uses the enhanced scan design. The design uses extra latches, and allows scan-in any two-pattern testing. While this method achieves high fault coverage, it causes high area overhead because of extra latches. This paper presents a new scan design where two-pattern testing with high fault coverage can be performed with area overhead as low as the standard scan design. The proposed scan-FFs are based on master-slave FFs. The input of each scan-FF is connected to the output of the master latch and not the slave latch of the previous FF. Every scan-FF maintains the output value during scan-shift operations.
In recent years, considerable attention has been devoted to continuously running software systems whose performance characteristics are smoothly degrading in time. Software aging often affects the performance of a software system and eventually causes it to fail. A novel approach to handle transient software failures due to software aging is called software rejuvenation, which can be regarded as a preventive and proactive solution that is particularly useful for counteracting the aging phenomenon. In this paper, we focus on a high assurance software system with fault-tolerance and preventive rejuvenation, and analyze the stochastic behavior of such a highly critical software system. More precisely, we consider a fault-tolerant software system with two-version redundant structure and random rejuvenation schedule, and evaluate quantitatively some dependability measures like the steady-state system availability and MTTF based on the familiar Markovian analysis. In numerical examples, we examine the dependence of two fault tolerant techniques; design and environment diversity techniques, on the system dependability measures.
Myungseok KANG Jaeyun JUNG Younghoon WHANG Youngyong KIM Hagbae KIM
This paper presents a Fault-Tolerant Object Group (FTOG) model that provides the group management service and the fault-tolerance service for consistency maintenance and state transparency. Through Intelligent Home Network Simulator, we verify that FTOG model supports both of reliability and the stability of the distributed system.
Shigemasa TAKAI Toshimitsu USHIO
The conventional decentralized supervisory control architectures for discrete event systems assume that default control of controllable events is static. In this paper, we propose a new decentralized supervisory control architecture using dynamic default control of controllable events. We present necessary and sufficient conditions for the existence of a decentralized supervisor in the proposed architecture. Then, we give an example of a language that is achieved in the proposed architecture, but not in the conventional architectures using static default control.
Mohammad Reza AGHAEBRAHIMI Hassan KHORASHADI-ZADEH
A novel application of fuzzy-neuro approach to protection of double circuit transmission line is demonstrated in this paper. Different system faults on a protected transmission line should be detected and classified rapidly and correctly. Using the proposed approach, fault detection, classification and faulted phase selection could be achieved within a quarter of cycle. Results of performance studies show that the proposed fuzzy-neuro-based module can improve the performance of conventional fault selection algorithms.
Arindam MALLIK Matthew C. WILDRICK Gokhan MEMIK
Faults in computer systems can occur due to a variety of reasons. These include internal effects such as coupling and external effects such as alpha particles. As we move towards smaller manufacturing technologies, the probability of errors for a single transistor is likely to increase. Even if this probability remains the same, the probability of a fault in a processor will increase linearly with the boost in the number of transistors per chip. In many systems, an error has a binary effect, i.e., the output is either correct or erroneous. However, networking systems exhibit different properties. For example, although a portion of the code behaves incorrectly due to a fault, the application can still work correctly. Therefore, measuring the effects of faults on the network processor applications require new measurement metrics to be developed. Particularly, hardware faults need to be measured in the context of their effect on the application behavior. In this paper, we highlight essential application properties and data structures that can be used to measure the error behavior of network processors. Using these metrics, we study the error behavior of seven representative networking applications under different cache access fault probabilities. With this study, we hope to bridge the gap between the circuit-level phenomena and their impact on the application behavior.