The search functionality is under construction.
The search functionality is under construction.

Keyword Search Result

[Keyword] fault(493hit)

261-280hit(493hit)

  • A Self-Stabilizing Distributed Algorithm for the Steiner Tree Problem

    Sayaka KAMEI  Hirotsugu KAKUGAWA  

     
    PAPER

      Vol:
    E87-D No:2
      Page(s):
    299-307

    Self-stabilization is a theoretical framework of non-masking fault-tolerant distributed algorithms. In this paper, we investigate the Steiner tree problem in distributed systems, and propose a self-stabilizing heuristic solution to the problem. Our algorithm is constructed by four layered modules (sub-algorithms): construction of a shortest path forest, transformation of the network, construction of a minimum spanning tree, and pruning unnecessary links and processes. Competitiveness is 2(1-1/l), where l is the number of leaves of optimal solution.

  • Determining Consistent Global Checkpoints of a Distributed Computation

    Dakshnamoorthy MANIVANNAN  

     
    PAPER-Computer Systems

      Vol:
    E87-D No:1
      Page(s):
    164-174

    Determining consistent global checkpoints of a distributed computation has applications in the areas such as rollback recovery, distributed debugging, output commit and others. Netzer and Xu introduced the notion of zigzag paths and presented necessary and sufficient conditions for a set of checkpoints to be part of a consistent global checkpoint. This result also reveals that determining the existence of zigzag paths between checkpoints is crucial for determining consistent global checkpoints. Recent research also reveals that determining zigzag paths on-line is not possible. In this paper, we present an off-line method for determining the existence of zigzag paths between checkpoints.

  • A Novel Learning Algorithm Which Makes Multilayer Neural Networks Multiple-Weight-Fault Tolerant

    Itsuo TAKANAMI  Yasuhiro OYAMA  

     
    PAPER-Dependable Systems

      Vol:
    E86-D No:12
      Page(s):
    2536-2543

    We propose an efficient algorithm for making multi-layered neural networks (MLN) fault-tolerant to all multiple weight faults in a multi-dimensional interval by injecting intentionally two extreme multi-dimensional values in the interval into the weights of the selected multiple links in a learning phase. The degree of fault-tolerance to a multiple weight fault is measured by the number of essential multiple links. First, we analytically discuss how to choose effectively the multiple links to be injected, and present a learning algorithm for making MLNs fault tolerant to all multiple (i.e., simultaneous) faults in the interval defined by two multi-dimensional extreme points. Then it is proved that after the learning algorithm successfully finishes, MLNs become fault tolerant to all multiple faults in the interval. It is also shown that the time in a weight modification cycle depends little on multiplicity of faults k for small k. These are confirmed by simulation.

  • Dependability Evaluation with Fault Injection Experiments

    Piotr GAWKOWSKI  Janusz SOSNOWSKI  

     
    PAPER-Verification and Dependability Analysis

      Vol:
    E86-D No:12
      Page(s):
    2642-2649

    In the paper we evaluate program susceptibility to hardware faults using fault injector. The performed experiments cover many applications with different features. The effectiveness of software techniques improving system dependability is analyzed. Practical aspects of embedding these techniques in real programs are discussed. They have significant impact on the final fault robustness.

  • An Algorithm for Node-to-Set Disjoint Paths Problem in Burnt Pancake Graphs

    Keiichi KANEKO  

     
    PAPER-Dependable Communication

      Vol:
    E86-D No:12
      Page(s):
    2588-2594

    A burnt pancake graph is a variant of Cayley graphs and its topology is suitable for massively parallel systems. However, for a burnt pancake graph, there is much room for further research. Hence, in this study, we focus on n-burnt pancake graphs and propose an algorithm to obtain n disjoint paths from a source node to n destination nodes in polynomial order time of n, n being the degree of the graph. In addition, we estimate the time complexity of the algorithm and the sum of path lengths. We also give a proof of correctness of the algorithm. Moreover, we report the results of computer simulation to evaluate the average performance of the algorithm.

  • PREGMA: A New Fault Tolerant Cluster Using COTS Components for Internet Services

    Takeshi MISHIMA  Takeshi AKAIKE  

     
    PAPER-Dependable Systems

      Vol:
    E86-D No:12
      Page(s):
    2517-2526

    We propose a new dependable system called PREGMA (Platform for Reliable Environment based on a General-purpose Machine Architecture). PREGMA aims to meet two requirements -- fault tolerance and low cost -- for Internet services. It can provide fault tolerance, so we can avoid system failure and prevent data corruption, even if faults occur. That is, it masks the faults by running multiple replicated servers, each possessing its own data, in a loosely synchronized manner and delivering the majority vote as output to clients. Moreover, PREGMA is composed of COTS (Commercial Off-The-Shelf) components without modification, which makes it possible to offer the services at a low cost. We investigated two approaches for achieving redundancy of the Coordinator, which is the core of PREGMA: using the primary backup method and the active replication method. We evaluated the effectiveness of PREGMA in terms of throughput overhead, data integrity and recovery time. The results for a prototype show that PREGMA using the Coordinator with the primary backup method outperforms that with the active replication method and has throughput only 3% lower than a non-redundant system. The results also show that, in the event of failure, the recovery time is only less than one second and no data corruption occurs.

  • Consideration of Fault Tolerance in Autonomic Computing Environment

    Yoshihiro TOHMA  

     
    INVITED PAPER

      Vol:
    E86-D No:12
      Page(s):
    2503-2507

    Since the characteristic to current information systems is the dynamic and concurrent change of their configurations and scales with non-stop provision of their services, the system management should inevitably rely on autonomic computing. Since fault tolerance is the one of important system management issues, it should also be incorporated in autonomic computing environment. This paper argues what should be taken into consideration and what approach could be available to realize the fault tolerance in such environments.

  • Delay Fault Testing for CMOS Iterative Logic Arrays with a Constant Number of Patterns

    Shyue-Kung LU  

     
    PAPER-Test

      Vol:
    E86-D No:12
      Page(s):
    2659-2665

    Iterative Logic Arrays (ILAs) are widely used in many applications, e.g., general-purpose processors, digital signal processors, and embedded processors. Owing to the advanced VLSI technology, new defect mechanisms exist in the fabricated circuits. In order to ascertain the quality of manufactured products, the traditional single cell fault model is not sufficient. Therefore, more realistic fault models such as sequential fault models and delay fault models should also be adopted. A cell delay fault occurs if and only if an input transition cannot be propagated to the cell's output through a path in the cell in a specified clock period. It has been shown that all SIC (single input change) pairs of a circuit are sufficient to detect all robustly detectable path delay faults within the circuit. We extend the concept of SIC pairs for iterative logic arrays. We say that an ILA is C-testable for cell delay faults if it is possible to apply all SIC pairs of a cell to each cell of the array in such a way that the number of test pairs for the array is a constant. This is based on a novel fault model, called Realistic Sequential Cell Fault Model (RS-CFM). Necessary conditions for sending this test set to each cell in the array and propagating faulty effects to the primary outputs are derived. An efficient algorithm is also presented to obtain such a test sequence. We use the pipelined array multiplier as an example to illustrate our approach. The number of test pairs for completely testing of the array is only 84. Moreover, the hardware overhead to make it delay fault testable is about 5.66%.

  • An Alternative Test Generation for Path Delay Faults by Using Ni-Detection Test Sets

    Hiroshi TAKAHASHI  Kewal K. SALUJA  Yuzo TAKAMATSU  

     
    PAPER-Test

      Vol:
    E86-D No:12
      Page(s):
    2650-2658

    In this paper, we propose an alternative method that does not generate a test for each path delay fault directly to generate tests for path delay faults. The proposed method generates an N-propagation test-pair set by using an Ni-detection test set for single stuck-at faults. The N-propagation test-pair set is a set of vector pairs which contains N distinct vector pairs for every transition faults at a check point. Check points consist of primary inputs and fanout branches in a circuit. We do not target the path delay faults for test generation, instead, the N-propagation test-pair set is generated for the transition (both rising and falling) faults of check points in the circuit. After generating tests, tests are simulated to determine their effectiveness for singly testable path delay faults and robust path delay faults. Results of experiments on the ISCAS'85 benchmark circuits show that the N-propagation test-pair sets obtained by our method are effective in testing path delay faults.

  • A Transparent Transient Faults Tolerance Mechanism for Superscalar Processors

    Toshinori SATO  

     
    PAPER-Dependable Systems

      Vol:
    E86-D No:12
      Page(s):
    2508-2516

    In this paper, we propose a fault-tolerance mechanism for microprocessors, which detects transient faults and recovers from them. The investigation of fault-tolerance techniques for microprocessors is driven by two issues: One regards deep submicron fabrication technologies. Future semiconductor technologies could become more susceptible to alpha particles and other cosmic radiation. The other is the increasing popularity of mobile platforms. Cellular telephones are currently used for applications which are critical to our financial security, such as mobile banking, mobile trading, and making airline ticket reservations. Such applications demand that computer systems work correctly. In light of this, we propose a mechanism which is based on an instruction reissue technique for incorrect data speculation recovery and utilizes time redundancy, and evaluate our proposal using a timing simulator.

  • An Efficiently Self-Reconstructing Array System Using E-1-Track Switches

    Tadayoshi HORITA  Itsuo TAKANAMI  

     
    PAPER-Fault Tolerance

      Vol:
    E86-D No:12
      Page(s):
    2743-2752

    The E-1-track switch torus array model and the "EAR" reconfiguration method are proposed for fault tolerance of mesh or torus-connected processor arrays, where the original idea of EAR is in EAM. The comparison among these and others is described in terms of the (run-time) array reliability, hardware overhead, and/or reconfiguration time. When a designer chooses one among fault tolerant methods, he should consider their features synthetically case by case, and we consider that the results given by this paper are useful for the choice.

  • Using VHDL-Based Fault Injection for the Early Diagnosis of a TTP/C Controller

    Joaquín GRACIA  Juan C. BARAZA  Daniel GIL  Pedro J. GIL  

     
    PAPER-Verification and Dependability Analysis

      Vol:
    E86-D No:12
      Page(s):
    2634-2641

    Nowadays, the use of dependable systems is generalising, and diagnosis is an important step during their design . A diagnosis in early phases of the design cycle allows to save time and money. Fault injection can be used during the design process of the system, and using Hardware Description Languages, particularly VHDL, it is possible to accomplish this early diagnosis. During last years, the Time-Triggered Architecture (TTA) has emerged as a hard real-time fault-tolerant architecture for embedded systems. This novel architecture is gaining adepts mainly in the avionics and automotive industries ( x-by-wire ). The TTA implements a synchronous protocol with static scheduling that has been specifically targeted at hard real-time fault-tolerant distributed system. In this work, we present the study of the VHDL model of a communication controller based on the TTA, where a number of fault injection campaigns have been carried out. We comment the results produced and suggest some solutions to problems detected.

  • Impact of Internal and External Software Faults on the Linux Kernel

    Tahar JARBOUI  Jean ARLAT  Yves CROUZET  Karama KANOUN  Thomas MARTEAU  

     
    PAPER-Dependable Software

      Vol:
    E86-D No:12
      Page(s):
    2571-2578

    The application of fault injection in the context of dependability benchmarking is far from being straightforward. One decisive issue to be addressed is to what extent injected faults are representative of the considered faults. This paper proposes an approach to analyze the effects of real and injected faults.

  • Evaluation of Delay Testing Based on Path Selection

    Masayasu FUKUNAGA  Seiji KAJIHARA  Sadami TAKEOKA  Shinichi YOSHIMURA  

     
    LETTER-Timing Verification and Test Generation

      Vol:
    E86-A No:12
      Page(s):
    3208-3210

    Since a logic circuit often has too many paths to test delay of all paths, it is necessary for path delay testing to limit the number of paths to be tested. The paths to be tested should have large delay because such paths more likely cause a fault. Additionally, a test set for the paths are required to detect other models of faults as many as possible. In this paper, we investigate two typical criteria of path selection for path delay testing. From our experiments, we observe that test patterns for the longest paths cannot cover many local delay defects such as transition faults.

  • A Method of Test Generation for Acyclic Sequential Circuits Using Single Stuck-at Fault Combinational ATPG

    Hideyuki ICHIHARA  Tomoo INOUE  

     
    PAPER-Timing Verification and Test Generation

      Vol:
    E86-A No:12
      Page(s):
    3072-3078

    A test generation method with time-expansion model can achieve high fault efficiency for acyclic sequential circuits, which can be obtained by partial scan design. This method, however, requires combinational test pattern generation algorithm that can deal with multiple stuck-at faults, even if the target faults are single stuck-at faults. In this paper, we propose a test generation method for acyclic sequential circuits with a circuit model, called MS-model, which can express multiple stuck-at faults in time-expansion model as single stuck-at faults. Our procedure can generate test sequences for acyclic sequential circuits with just combinational test pattern generation algorithm for single stuck-at faults. Experimental results show that test sequences for acyclic sequential circuits with high fault efficiency are generated in small computational effort.

  • A Three-tier Active Replication Protocol for Large Scale Distributed Systems

    Carlo MARCHETTI  Sara Tucci PIERGIOVANNI  Roberto BALDONI  

     
    PAPER-Dependable Software

      Vol:
    E86-D No:12
      Page(s):
    2544-2552

    The deployment of server replicas of a service across an asynchronous distributed system (e.g., Internet) is a real practical challenge. This target cannot be indeed achieved by classical software replication techniques (e.g., passive and active replication) as these techniques usually rely on group communication toolkits that require server replicas to run over a partially synchronous distributed system to solve the underlying agreement problem. This paper proposes a three-tier architecture for software replication that encapsulates the need of partial synchrony in a specific software component of a mid-tier to free replicas and clients from the need of underlying partial synchrony assumptions. Then we propose how to specialize the mid-tier in order to manage active replication of server replicas.

  • Multidimensional Characterization of the Impact of Faulty Drivers on the Operating Systems Behavior

    João DURÃES  Henrique MADEIRA  

     
    PAPER-Dependable Software

      Vol:
    E86-D No:12
      Page(s):
    2563-2570

    This paper presents the results of a continuing research work on the practical characterization of operating systems (OS) behavior in the presence of software faults in OS components, such as faulty device drivers. The methodology used is based on the emulation of software faults in device drivers and observation of the behavior of the overall system regarding a comprehensive set of failure modes, analyzed according to different dimensions related to multiple user perspectives. The emulation of the software faults is done through the injection of specific mutations at machine-code level that reproduce the code generated by compilers when typical programming errors occur in the high level language code. Two important aspects of this methodology are the independence of source code availability and the use of simple and established practices to evaluate operating systems failure modes, thus allowing its use as a dependability benchmarking technique. The generalization of the methodology to any software system built of discrete and identifiable components is also discussed.

  • Fault-Tolerant Execution of Collaborating Mobile Agents

    Taesoon PARK  

     
    LETTER-Reliability, Maintainability and Safety Analysis

      Vol:
    E86-A No:11
      Page(s):
    2897-2900

    Fault-tolerant execution of a mobile agent is an important design issue to build a reliable mobile agent system. Several fault-tolerant schemes for a single agent system have been proposed, however, there has been little research result on the multi-agent system. For the cooperating mobile agents, fault-tolerant schemes should consider the inter-agent dependency as well as the mobility; and try to localize the effect of a failure. In this paper, we investigate properties of inter-agent dependency and agent mobility; and then characterize rollback propagation caused by the dependency and the mobility. We then suggest some schemes to localize rollback propagation.

  • Discrete Availability Models to Rejuvenate a Telecommunication Billing Application

    Tadashi DOHI  Kazuki IWAMOTO  Hiroyuki OKAMURA  Naoto KAIO  

     
    PAPER-Network Systems and Applications

      Vol:
    E86-B No:10
      Page(s):
    2931-2939

    Software rejuvenation is a proactive fault management technique that has been extensively studied in the recent literature. In this paper, we focus on an example for a telecommunication billing application considered in Huang et al. (1995) and develop the discrete-time stochastic models to estimate the optimal software rejuvenation schedule. More precisely, two software availability models with rejuvenation are formulated via the discrete semi-Markov processes, and the optimal software rejuvenation schedules which maximize the steady-state availabilities are derived analytically. Further, we develop statistically non-parametric algorithms to estimate the optimal software rejuvenation schedules, provided that the complete sample data of failure times are given. Then, a new statistical device, called the discrete total time on test statistics, is introduced. Finally, we examine asymptotic properties for the statistical estimation algorithms proposed in this paper through a simulation experiment.

  • QoS Certification of Real-Time Distributed Computing Systems: Issues and Promising Approaches

    K.H. (Kane) KIM  

     
    INVITED PAPER

      Vol:
    E86-D No:10
      Page(s):
    2077-2086

    The general public is expected to demand in not too distant future instituting more stringent certification procedures for computing parts of traditional and new-generation safety-critical application systems. Such quality-of-service (QoS) certification processes will not and can not rely solely on the testing approach. Design-time guaranteeing of timely service capabilities of various subsystems is an inevitable part of such processes. Although some promising developments in this area have been occurring in recent years, the technological challenges yet to be overcome are enormous. This paper is a summary of the author's perspective on the remaining challenges and promising directions for tackling them.

261-280hit(493hit)