The search functionality is under construction.
The search functionality is under construction.

Keyword Search Result

[Keyword] fault-tolerant(42hit)

21-40hit(42hit)

  • Using VHDL-Based Fault Injection for the Early Diagnosis of a TTP/C Controller

    Joaquín GRACIA  Juan C. BARAZA  Daniel GIL  Pedro J. GIL  

     
    PAPER-Verification and Dependability Analysis

      Vol:
    E86-D No:12
      Page(s):
    2634-2641

    Nowadays, the use of dependable systems is generalising, and diagnosis is an important step during their design . A diagnosis in early phases of the design cycle allows to save time and money. Fault injection can be used during the design process of the system, and using Hardware Description Languages, particularly VHDL, it is possible to accomplish this early diagnosis. During last years, the Time-Triggered Architecture (TTA) has emerged as a hard real-time fault-tolerant architecture for embedded systems. This novel architecture is gaining adepts mainly in the avionics and automotive industries ( x-by-wire ). The TTA implements a synchronous protocol with static scheduling that has been specifically targeted at hard real-time fault-tolerant distributed system. In this work, we present the study of the VHDL model of a communication controller based on the TTA, where a number of fault injection campaigns have been carried out. We comment the results produced and suggest some solutions to problems detected.

  • Fault-Tolerant Execution of Collaborating Mobile Agents

    Taesoon PARK  

     
    LETTER-Reliability, Maintainability and Safety Analysis

      Vol:
    E86-A No:11
      Page(s):
    2897-2900

    Fault-tolerant execution of a mobile agent is an important design issue to build a reliable mobile agent system. Several fault-tolerant schemes for a single agent system have been proposed, however, there has been little research result on the multi-agent system. For the cooperating mobile agents, fault-tolerant schemes should consider the inter-agent dependency as well as the mobility; and try to localize the effect of a failure. In this paper, we investigate properties of inter-agent dependency and agent mobility; and then characterize rollback propagation caused by the dependency and the mobility. We then suggest some schemes to localize rollback propagation.

  • Escape and Restoration Routing: Suspensive Deadlock Recovery in Interconnection Networks

    Toshinori TAKABATAKE  Masato KITAKAMI  Hideo ITO  

     
    PAPER-Computer Systems

      Vol:
    E85-D No:5
      Page(s):
    824-832

    In interconnection networks, deadlock recovery has been studied in routing strategy. The routing strategy for the deadlock recovery is intended to optimize the routing performance when deadlocks do not occur. On the other hand, it is important to improve the routing performance by handling deadlocks if they occur. In this paper, a routing strategy for suspensive deadlock recovery called an escape-restoration routing is proposed and its performance is evaluated. In the principle of the proposed techniques, a small amount of exclusive buffer (escape-buffer) at each router is prepared for handling one of deadlocked packets. The transmission of the packet is suspended by temporarily escaping it to the escape-buffer. After the other deadlocked packets were sent, the suspended transmission resumes by restoring the escaped packet. Evaluation results show that the proposed techniques can improve the routing performance more than that of the previous recovery-based techniques in handling deadlocks.

  • A Fault-Tolerant Deadlock-Free Routing Algorithm in a Meshed Network

    Deogkyoo LEE  Daekeun MOON  Ilgu YUN  Hagbae KIM  

     
    PAPER-Fault Tolerance

      Vol:
    E85-D No:4
      Page(s):
    722-726

    Since components faults occurring at arbitrary places (primarily on the links) affect seriously network performance and reliability, the multicomputers operating in harsh environments should be designed to guarantee normal network-missions in presence of those faults. One solution to the end is a fault-tolerant routing scheme, which enables messages to safely reach their destinations avoiding failed links when transmission of messages is blocked by certain faults. In the paper, we develop a fault-tolerant routing algorithm with deadlock freedom in an n-dimensional meshed network, and validate its efficiency and effectiveness through proper simulations. The aspects of fault-tolerance is adopted by appending partial-adaptiveness and detouring to the e-cube algorithm, while using a wormhole routing for the backbone routing method. The phenomenon of deadlock incurred due to its adaptiveness is eliminated by classifying a physical channel into a couple of virtual channels.

  • Effective Scheduling of Duplicated Tasks for Fault Tolerance in Multiprocessor Systems

    Koji HASHIMOTO  Tatsuhiro TSUCHIYA  Tohru KIKUNO  

     
    PAPER-Fault Tolerance

      Vol:
    E85-D No:3
      Page(s):
    525-534

    In this paper, we propose a new scheduling algorithm to achieve fault tolerance in multiprocessor systems. This algorithm first partitions a parallel program into subsets of tasks, based on the notion of height of a task graph. For each subset, the algorithm then duplicates and schedules the tasks in the subset successively. We prove that schedules obtained by the proposed algorithm can tolerate a single processor failure and show that the computational complexity of the algorithm is O(|V|4) where V is the set of nodes of a task graph. We conduct simulations by applying the algorithm to two kinds of practical task graphs (Gaussian elimination and LU-decomposition). The results of this experiment show that fault tolerance can be achieved at the cost of small degree of time redundancy, and that performance in the case of a processor failure is improved compared to a previous algorithm.

  • Crash Recovery for Distributed Mobile Computing Systems

    Tong-Ying Tony JUANG  

     
    PAPER-Mobile Information Network and Personal Communications

      Vol:
    E84-A No:2
      Page(s):
    668-674

    One major breakthrough on the communication society recently is the extension of networking from wired to wireless networks. This has made possible creating a mobile distributed computing environment and has brought us several new challenges in distributed protocol design. Obviously, wireless networks do have some fundamental differences from wired networks that need to be paid special attention of, such as lower communication bandwidth compared to wired networks, limited electrical power due to battery capacity, and mobility of processes. These new issues make traditional recovery algorithm unsuitable. In this paper, we propose an efficient algorithm with O(nr) message complexity where O(nr) is the total number of mobile hosts (MHs) related to the failed MH. In addition, these MHs only need to rollback once and can immediately resume its operation without waiting for any coordination message from other MHs. During normal operation, the application message needs O(1) additional information when it transmitted between MHs and mobile support stations (MSSs). Each MSS must keep an ntotal_h*n cell_h dependency matrix, where O(ntotal_h) is the total number of MHs in the system and ncell_h is the total number of MHs in its cell. Finally, one related issue of resending lost messages is also considered.

  • Fault-Tolerant Routing Algorithms for Hypercube Interconnection Networks

    Keiichi KANEKO  Hideo ITO  

     
    PAPER-Fault Tolerance

      Vol:
    E84-D No:1
      Page(s):
    121-128

    Many researchers have used hypercube interconnection networks for their good properties to construct many parallel processing systems. However, as the number of processors increases, the probability of occurrences of faulty nodes also increases. Hence, for hypercube interconnection networks which have faulty nodes, several efficient dynamic routing algorithms have been proposed which allow each node to hold status information of its neighbor nodes. In this paper, we propose an improved version of the algorithm proposed by Chiu and Wu by introducing the notion of full reachability. A fully reachable node is a node that can reach all nonfaulty nodes which have Hamming distance l from the node via paths of length l. In addition, we further improve the algorithm by classifying the possibilities of detours with respect to each Hamming distance between current and target nodes. We propose an initialization procedure which makes use of an equivalent condition to perform this classification efficiently. Moreover, we conduct a simulation to measure the improvement ratio and to compare our algorithms with others. The simulation results show that the algorithms are effective when they are applied to low-dimensional hypercube interconnection networks.

  • Evaluation of Two Load-Balancing Primary-Backup Process Allocation Schemes

    Heejo LEE  Jong KIM  Sung Je HONG  

     
    PAPER-Fault Tolerant Computing

      Vol:
    E82-D No:12
      Page(s):
    1535-1544

    In this paper, we show two process allocation schemes to tolerate multiple faults when the primary-backup replication method is used. The first scheme, called multiple backup scheme, is running multiple backup processes for each process to tolerate multiple faults. The second scheme, called regenerative backup scheme, is running only one backup process for each process, but re-generates backup processes for processes that do not have a backup process after a fault occurrence to keep the primary-backup process pair available. In both schemes, we propose heuristic process allocation methods for balancing loads in spite of the occurrence of faults. Then we evaluate and compare the performance of the proposed heuristic process allocation methods using simulation. Next, we analyze the reliability of two schemes based on their fault-tolerance capability. For the analysis of fault-tolerance capability, we find the degree of fault tolerance for each scheme. Then we find the reliability of each scheme using Markov chains. The comparison results of two schemes indicate that the regenerative single backup process allocation scheme is more suitable than the multiple backup allocation scheme.

  • Fault-Tolerant Adaptive Wormhole Routing in 2D Mesh

    Seong-Pyo KIM  Taisook HAN  

     
    PAPER-Fault Tolerant Computing

      Vol:
    E81-D No:10
      Page(s):
    1064-1071

    A fault-tolerant wormhole routing algorithm on mesh-connected processors is proposed. The proposed algorithm is based on the solid fault model and allows the fault polygons to be overlapped. The algorithm compares the position of fault region relative to current channel with the fault direction field of a misrouted message to route around overlapped fault polygons. A node deactivating algorithm to convert non-solid fault region into solid fault region is also proposed. The proposed routing algorithm uses four virtual channels and is deadlock and livelock free.

  • A Reconfigurable Digital Signal Processor

    Boon Keat TAN  Toru OGAWA  Ryuji YOSHIMURA  Kenji TANIGUCHI  

     
    PAPER

      Vol:
    E81-C No:9
      Page(s):
    1424-1430

    This paper describes a new architecture-based DSP processor, which consists of n n mesh multiprocessor for digital signal processing. A prototype chip, RCDSP9701 has been designed and implemented using a CMOS 0. 6 µm process. This architecture has better performance compare to the traditional microprocessor solution to Digital Signal Processing. The proposed method poses remarkable flexibility compare to ASIC (Application Specified Integrated Circuits) approach for Digital Signal Processing applications. In addition, the proposed architecture is fault tolerant and suitable for parallel computing applications. In this paper, an implementation into a silicon chip of the new architecture is presented to give a better understanding of our work.

  • A Fault-Tolerant Wormhole Routing Algorithm in Two Dimensional Mesh Networks

    Jinsoo KIM  Ji-Yun KIM  Hyunsoo YOON  Seung Ryoul MAENG  Jung Wan CHO  

     
    PAPER-Fault Tolerant Computing

      Vol:
    E81-D No:6
      Page(s):
    532-544

    We propose a fault-tolerant routing algorithm for 2D meshes. Our routing algorithm can tolerate any number of concave fault regions. It is based on xy-routing and uses the concept of the fault ring/chain composed of fault-free elements surrounding faults. Three virtual channels per physical link are used for deadlock-free routing on a fault ring. Four virtual channels are needed for a fault chain. For a concave fault ring, fault-free nodes in the concave region have been deactivated to avoid deadlock in the previous algorithms, which results in excessive loss of the computational power. Our algorithm ensures deadlock-freedom by restricting the virtual channel usage in the concave region, and it minimizes the loss of the computational power. We also extend the proposed routing scheme for adaptive fault-tolerant routing. The adaptive version requires the same number of virtual channels as the deterministic one.

  • Fault-Tolerant Hypercubes with Small Degree

    Toshinori YAMADA  Shuichi UENO  

     
    PAPER

      Vol:
    E81-A No:5
      Page(s):
    807-813

    For a given N-vertex graph H, a graph G obtained from H by adding t vertices and some edges is called a t-FT (t-fault-tolerant) graph for H if even after deleting any t vertices from G, the remaining graph contains H as a subgraph. For the n-dimensional cube Q(n) with N vertices, a t-FT graph with an optimal number O(tN+t2) of added edges and maximum degree of O(N+t), and a t-FT graph with O(tNlog N) added edges and maximum degree of O(tlog N) have been known. In this paper, we introduce some t-FT graphs for Q(n) with an optimal number O(tN+t2) of added edges and small maximum degree. In particular, we show a t-FT graph for Q(n) with 2ctN+ct2((logN)/C)C added edges and maximum degree of O(N/(logC/2N))+4ct.

  • Fault-Tolerant Meshes with Efficient Layouts

    Toshinori YAMADA  Shuichi UENO  

     
    PAPER-Fault Tolerant Computing

      Vol:
    E81-D No:1
      Page(s):
    56-65

    This paper presents a practical fault-tolerant architecture for mesh parallel machines that has t spare processors and has 2(t+2) communication links per processor while tolerating at most t+1 processor and link faults. We also show that the architecture presented here can be laid out efficiently in a linear area with wire length at most O(t).

  • Fault-Tolerant Graphs for Hypercubes and Tori*

    Toshinori YAMADA  Koji YAMAMOTO  Shuichi UENO  

     
    PAPER-Fault Diagnosis/Tolerance

      Vol:
    E79-D No:8
      Page(s):
    1147-1152

    Motivated by the design of fault-tolerant multiprocessor interconnection networks, this paper considers the following problem: Given a positive integer t and a graph H, construct a graph G from H by adding a minimum number Δ(t, H) of edges such that even after deleting any t edges from G the remaining graph contains H as a subgraph. We estimate Δ(t, H) for the hypercube and torus, which are well-known as important interconnection networks for multiprocessor systems. If we denote the hypercube and the square torus on N vertices by QN and DN respectively, we show, among others, that Δ(t, QN) = O(tN log(log N/t + log 2e)) for any t and N (t 2), and Δ(1, DN) = N/2 for N even.

  • A High Performance Fault-Tolerant Switching Network for ATM

    Jeen-Fong LIN  Sheng-De WANG  

     
    PAPER-Switching and Communication Processing

      Vol:
    E78-B No:11
      Page(s):
    1518-1528

    A new high-performance fault-tolerant ATM switching network is proposed. This network contains the baseline network and has many redundant switching elements to enhance the fault tolerance and throughput of the conventional multistage interconnection networks. The presented routing algorithm is very simple and can support a very huge number of paths between each input-output pair. The paths can be used to route cells when internal cell contentions occur in switching elements. The redundant switching elements at the last stage offer two access points to the output ports to resolve the output conflict. Performance analysis and simulation results show that this network has better maximum throughput even for faulty conditions. Among various networks, it has the largest number of redundant paths, and the greatest unit node contribution and unit edge contribution.

  • A New Scheduling Scheme in Responsive Systems

    Seongbae EUN  Seung Ryoul MAENG  Jung Wan CHO  

     
    PAPER-Fault Tolerant Computing

      Vol:
    E78-D No:10
      Page(s):
    1282-1287

    The integration of both real-time systems and fault-tolerant systems has been emerged as one of the greatest challenges of this decade. It is called a responsive system, which has the objective to optimeze both timeliness and reliability. The performance measure in responsive systems is responsiveness that tells how probable a system executes correctly on time with faults occurred. While there have been some achievements in communication protocols and specification, we believe that scheduling problems in responsive systems are not understood deeply and sufficiently, yet. In this paper, we discuss the scheduling problem in responsive systems. At first, we investigate the issues in the scheduling and propose the precise definition of the responsiveness. We also suggest a scheduling algorithm called Responsive Earliest Deadline First (REDF) for preemptive aperiodic tasks in a uniprocessor system. We show that REDF is optimal to obtain the maximum responsiveness, and the time complexity is analyzed to be (N 2N). By illustrating a contradictory example, it is shown that REDF can be enhanced if a constraint on tasks is released.

  • The Number of Permutations Realizable in Fault-Tolerant Multistage Interconnection Networks

    Hiroshi MASUYAMA  Tetsuo ICHIMORI  

     
    PAPER-Computer Networks

      Vol:
    E77-D No:9
      Page(s):
    1032-1041

    In this paper we estimate the number of permutations realizable in fault-tolerant multistage interconnection networks designed to tolerate faults on any switching element. The Parallel Omega network and the INDRA network are representative types of fault-tolerate multistage interconnection networks designed to tolerate a single fault. In order to evaluate the enhancement in the function of network by preparing the hardware redundancy for fault-tolerance, we estimate the number of permutations realizable in fault-tolerant networks. This result enables us to set up a standard to evaluate the hardware redundancy required to tolerate multifaults from the viewpoint of the enhancement of network function. This paper concludes that in the case where the number of inputs is up to 32 the increase ratio of the number of realizable permutations is no more than 1/0.73 even if the tolerance to multifaults is prepared instead of the tolerance to a single fault.

  • Design of Robust-Fault-Tolerant Multiple-Valued Arithmetic Circuits and Their Evaluation

    Takeshi KASUGA  Michitaka KAMEYAMA  Tatsuo HIGUCHI  

     
    PAPER

      Vol:
    E76-C No:3
      Page(s):
    428-435

    Robust-fault tolerance is a property that a computational result becomes nearly equal to the correct one at the occurrence of faults in digital system. There are many cases where the safety of digital control systems can be maintained if the property is satisfied. In this paper, robust-fault-tolerant three-valued arithmetic modules such as an adder and a multiplier are proposed. The positive and negative integers are represented by the number of 1's and 1's, respectively. The design concept of the arithmetic modules is that a fault makes linearly additive effect with a small value to the final result. Each arithmetic module consists of identical submodules linearly connected, so that multi-stage structure is formed to generate the final output from the last submodule. Between the input and output digits in the submodule some simple functional relation is satisfied with respect to the number of 1's and 1's. Moreover, the output digit value depends on very small portion of the submodules including the input digits. These properties make the linearly additive effect with a small value to the final result in the arithmetic modules even if multiple faults are occurred at the input and output of any gates in the submodules. Not only direct three-valued representation but also the use of three-valued logic circuits is inherently suitable for efficient implementation of the arithmetic VLSI system. The evaluation of the robust-fault-tolerant three-valued arithmetic modules is done with regard to the chip size and the speed using the standard CMOS design rule. As a result, it is made clear that the chip size can be greatly reduced.

  • Applying Attribute Grammars to Construct Fault-Tolerant Environments for Distributed Software Development

    An FENG  Tohru KIKUNO  Koji TORII  

     
    PAPER

      Vol:
    E75-D No:6
      Page(s):
    810-818

    When a group of developers are involved in the distributed development of some software product, they must communicate with one another frequently to exchange information about the product. To reduce the penalty of communication, the support environment should provide developers with their necessary information and update the information automatically while the product is modified by developers. Furthermore, the environment must meet the following requirements despite of workstation failures: whether a specific information is correct or not should always be decidable; as much information as possible should be updated correctly and efficiently. This paper presents a framework to construct such a fault-tolerant environment based on attribute grammars. In the framework, a product is represented by an attributed tree, which is partitioned into several subtrees {T1,,Tm}. Attribute values in each subtree Ti(1im) express the information about the product required by a developer. We introduce a set of redundant data and algorithms to meet the fault-tolerance requirements mentioned above. The correctness of an attribute value in Ti can then be decided in O(mn0log n) time, where n0n, and n is the number of attribute instances in Ti. All available attribute values can be updated with time complexity O(m2n1 log n) and communication complexity O(m2), where n1 is the number of attribute instances that must be reevaluated.

  • Modeling and Simulation of the Sliding Window Algorithm for Fault-Tolerant Clock Synchronization

    Manfred J. PFLUEGL  Douglas M. BLOUGH  

     
    PAPER

      Vol:
    E75-D No:6
      Page(s):
    792-796

    Synchronous clocks are an essential requirement for a variety of distributed system applications. Many of these applications are safety-critical and require fault tolerance. In this paper, a general probabilistic clock synchronization model is presented. This model is uniformly probabilistic, incorporating random message delays, random clock drifts, and random fault occurrences. The model allows faults in any system component and of any type. Also, a new Sliding Window Clock Synchronization Algorithm (SWA) providing increased fault tolerance is proposed. The probabilistic model is used for an evaluation of SWA which shows that SWA is capable of tolerating significantly more faults than other algorithms and that the synchronization tightness is as good or better than that of other algorithms.

21-40hit(42hit)