Xuechun WANG Yuan JI Wendong CHEN Feng RAN Aiying GUO
Hardware implementation of neural networks usually have high computational complexity that increase exponentially with the size of a circuit, leading to more uncertain and unreliable circuit performance. This letter presents a novel Radial Basis Function (RBF) neural network based on parallel fault tolerant stochastic computing, in which number is converted from deterministic domain to probabilistic domain. The Gaussian RBF for middle layer neuron is implemented using stochastic structure that reduce the hardware resources significantly. Our experimental results from two pattern recognition tests (the Thomas gestures and the MIT faces) show that the stochastic design is capable to maintain equivalent performance when the stream length set to 10Kbits. The stochastic hidden neuron uses only 1.2% hardware resource compared with the CORDIC algorithm. Furthermore, the proposed algorithm is very flexible in design tradeoff between computing accuracy, power consumption and chip area.
Motoki AMAGASAKI Yuki NISHITANI Kazuki INOUE Masahiro IIDA Morihiro KUGA Toshinori SUEYOSHI
Fault tolerance is an important feature for the system LSIs used in reliability-critical systems. Although redundancy techniques are generally used to provide fault tolerance, these techniques have significantly hardware costs. However, FPGAs can easily provide high reliability due to their reconfiguration ability. Even if faults occur, the implemented circuit can perform correctly by reconfiguring to a fault-free region of the FPGA. In this paper, we examine an FPGA-IP core loaded in SoC and introduce a fault-tolerant technology based on fault detection and recovery as a CAD-level approach. To detect fault position, we add a route to the manufacturing test method proposed in earlier research and identify fault areas. Furthermore, we perform fault recovery at the logic tile and multiplexer levels using reconfiguration. The evaluation results for the FPGA-IP core loaded in the system LSI demonstrate that it was able to completely identify and avoid fault areas relative to the faults in the routing area.
As semiconductor technologies have advanced, the reliability problem caused by soft-errors is becoming one of the serious issues in LSIs. Moreover, multiple component errors due to single soft-errors also have become a serious problem. In this paper, we propose a method to synthesize multiple component soft-error tolerant application-specific datapaths via high-level synthesis. The novel feature of our method is speculative resource sharing between the retry parts and the secondary parts for time overhead mitigation. A scheduling algorithm using a special priority function to maximize speculative resource sharing is also an important feature of this study. Our approach can reduce the latency (schedule length) in many applications without deterioration of reliability and chip area compared with conventional datapaths without speculative resource sharing. We also found that our method is more effective when a computation algorithm possesses higher parallelism and a smaller number of resources is available.
Motoki AMAGASAKI Qian ZHAO Masahiro IIDA Morihiro KUGA Toshinori SUEYOSHI
In this paper, we propose fault-tolerant field-programmable gate array (FPGA) architectures and their design framework for intellectual property (IP) cores in system-on-chip (SoC). Unlike discrete FPGAs, in which the integration scale can be made relatively large, programmable IP cores must correspond to arrays of various sizes. The key features of our architectures are a regular tile structure, spare modules and bypass wires for fault avoidance, and a configuration mechanism for single-cycle reconfiguration. In addition, we utilize routing tools, namely EasyRouter for proposed architecture. This tool can handle various array sizes corresponding to developed programmable IP cores. In this evaluation, we compared the performances of conventional FPGAs and the proposed fault-tolerant FPGA architectures. On average, our architectures have less than 1.82 times the area and 1.11 times the delay compared with traditional island-style FPGAs. At the same time, our FPGA shows a higher fault tolerant performance.
Ali MORADI AMANI Ahmad AFSHAR Mohammad Bagher MENHAJ
In this paper, the problem of control reconfiguration in the presence of actuator failure preserving the nominal controller is addressed. In the actuator failure condition, the processing algorithm of the control signal should be adapted in order to re-achieve the desired performance of the control loop. To do so, the so-called reconfiguration block, is inserted into the control loop to reallocate nominal control signals among the remaining healthy actuators. This block can be either a constant mapping or a dynamical system. In both cases, it should be designed so that the states or output of the system are fully recovered. All these situations are completely analysed in this paper using a novel structural approach leading to some theorems which are supported in each section by appropriate simulations.
This paper presents a fault-tolerance scheme based on mobile agents for the reliable mobile computing systems. Mobility of the agent is suitable to trace the mobile hosts and the intelligence of the agent makes it efficient to support the fault tolerance services. This paper presents two approaches to implement the mobile agent based fault tolerant service and their performances are evaluated and compared with other fault-tolerant schemes.
Noritaka SHIGEI Hiromi MIYAJIMA
This paper considers a reconfiguration problem on a processor array model based on single-and-half-track switches, which is proposed for a fault tolerance technique at the fabrication time. The focus of this paper is to achieve the optimal reconfigurability, which means that whenever there exists a solution for successful reconfiguration, the designed method can find the solution. The paper consists of two parts. In the first part, we show two essential constraints that have been assumed in most of the previous studies, and make four reconfiguration classes that differ in the assumed essential constraints. Then, we present some inclusion relations among the four reconfiguration classes. As a result, it becomes clear that the most restrictive class including most of the previous methods never achieves the truly optimal reconfigurability. In the second part, we present a reconfiguration method based on sequential routing (RMSR). Although the worst-case time complexity of the RMSR is exponential in the number of processing elements, the reconfigurability of the RMSR is optimal within the most restrictive reconfiguration class. The effectiveness of the RMSR is shown by a computer simulation.
Jihoon PARK Jongkyu PARK Ilseok HAN Hagbae KIM
The network duplicating can achieve significant improvements of the Local Area Network (LAN)'s performance, availability, and security. For LAN duplicating, a Dual-Path Ethernet Module (DPEM) is developed. Since a DPEM is simply located at the front end of any network device as a transparent add-on type independent hardware machine, it does not require sophisticated server reconfiguration. We examine the desirable properties and the characteristics on the Dual-LAN structure. Our evaluation results show that the developed scheme is more efficient than the conventional Single-LAN structures in various aspects.
Wen-Tzeng HUANG Yen-Chu CHUANG Jimmy Jiann-Mean TAN Lih-Hsing HSU
An n-dimensional crossed cube, CQn, is a variation of the hypercube. In this paper, we prove that CQn is (n-2)-Hamiltonian and (n-3)-Hamiltonian connected. That is, a ring of length 2n-fv can be embedded in a faulty CQn with fv faulty nodes and fe faulty edges, where fv+fen-2 and n3. In other words, we show that the faulty CQn is still Hamiltonian with n-2 faults. In addition, we also prove that there exists a Hamiltonian path between any pair of vertices in a faulty CQn with n-3 faults. The above results are optimum in the sense that the fault-tolerant Hamiltonicity (fault-tolerant Hamiltonian connectivity respectively) of CQn is at most n-2 (n-3 respectively). A recent result has shown that a ring of length 2n-2fv can be embedded in a faulty hypercube, if fv+fen-1 and n4, with a few additional constraints. Our results, in comparison to the hypercube, show that longer rings can be embedded in CQn without additional constraints.
Boon-Keat TAN Ryuji YOSHIMURA Toshimasa MATSUOKA Kenji TANIGUCHI
This paper describes a new architecture-based microprocessor, a dynamically programmable parallel processor (DPPP), that consists of large numbers of simplified ALUs (sALU) as processing blocks. All sALUs are interconnected via a code division multiple-access bus interface that provides complete routing flexibility by establishing connections virtually through code-matching instead of physical wires. This feature is utilized further to achieve high parallelism and fault tolerance. High fault tolerance is realized without the limitations of conventional fabrication-based techniques nor providing spare elements. Another feature of the DPPP is its simple programmability, as it can be configured by compiling numerical formula input using the provided user auto-program interface. A prototype chip based on the proposed architecture has been implemented on a 4.5 mm 4.5 mm chip using 0.6 µm CMOS process.
Tadayoshi HORITA Itsuo TAKANAMI
We gave in [1] the software and hardware algorithms for reconfiguring 1 1/2-track switch 2-D mesh arrays with faults of processing elements, avoiding them. This paper shows an implementation of the hardware algorithm using an FPGA device, and by the logical simulation confirms the correctness of the behavior and evaluates reconfiguration time. From the result it is found that a self-repairable system is realizable and the system is useful for the run-time as well as fabrication-time reconfiguration because it requires no host computer to execute the reconfiguration algorithm and the reconfiguration time is very short.
Adel CHERIF Masato SUZUKI Takuya KATAYAMA
We present a novel replication technique for parallel applications where instances of the replicated application are active on different group of processors called replicas. The replication technique is based on the FTAG (Fault Tolerant Attribute Grammar) computation model. FTAG is a functional and attribute based model. The developed replication technique implements "active parallel replication," that is, all replicas are active and compute concurrently a different piece of the application parallel code. In our model replicas cooperate not only to detect and mask failures but also to perform parallel computation. The replication mechanisms are supported by FTAG run time system and are fully application-transparent. Different novel mechanisms for checkpointing and recovery are developed. In our model during rollback recovery only that part of the computation that was detected faulty is discarded. The replication technique takes full advantage of parallel computing to reduce overall computation time.
The construction of fault-tolerant processor arrays with interconnections of cube-connected cycles (CCCs) by using an advanced spare-connection scheme for k-out-of-n redundancies called "generalized additional bypass linking" is described. The connection scheme uses bypass links with wired OR connections to spare processing elements (PEs) without external switches, and can reconfigure complete arrays by tolerating faulty portions in these PEs and links. The spare connections are designed as a node-coloring problem of a CCC graph with a minimum distance of 3: the chromatic numbers corresponding to the number of spare PE connections were evaluated theoretically. The proposed scheme can be used for constructing various k-out-of-n configurations capable of quick broadcasting by using spare circuits, and is superior to conventional schemes in terms of extra PE connections and reconfiguration control. In particular, it allows construction of optimal r-fault-tolerant configurations that provide r spare PEs and r extra connections per PE for CCCs with 4x PEs (x: integer) in each cycle.
In contrast to previous algorithms for reconfiguring processor arrays under the assumption that spare rows and columns are placed on the perimeter of the array or on fixed positions, our new algorithm employs movable and partitionable spare rows and columns. The objective of moving and partitioning spare rows and/or columns is the elimination of faulty processors each of which is blocked in all directions to spare processors. The results of our computer simulation indicate that reconfigurability can significantly be improved.
Shinichiro YAMAGUCHI Tetsuaki NAKAMIKAWA Naoto MIYAZAKI Yuuichirou MORITA Yoshihiro MIYAZAKI Sakou ISHIKAWA
The fault tolerant computer (FTC) is applied as a communication or database server in the information service and computer aided process control fields. User requires of the FTC are to provide the current level of performance and software transparency needing no additional dedicated program for fault tolerance. To meet these requirements, we propose quadprocessor redundancy (QPR) architecture that combines dualRISC based duplicated CPUs integrating main memories, and duplicated I/O subsystems by using some additional hardware. Duplicated CPUs run under the instruction level synchronization (lock step operation) , and the duplicated I/O subsystems are managed by an operating system. When a fault is detected, the faulty CPU is isolated by hardware. User program is continuously executed by the remaining CPU. We applied the QPR to our UNIX servers, and achieved satisfactory levels of performance.
In this paper, we study the following node-to-node and node-to-set routing problems in r-dimensional torus Trn with r 2 and n 4 nodes in each dimension: given at most 2r - 1 faulty nodes and non-faulty nodes s and t in Trn, find a fault-free path s t; and given at most 2r - k faulty nodes and non-faulty nodes s and t1,..., tk in Trn, find k fault-free node-disjoint paths s ti, 1 i k. We give an O(r2) time algorithm which finds a fault-free path s t of length at most d(Trn) + 1 for the node-to-node routing, where d(Trn) is the diameter of Trn. For node-to-set routing, we show an O(r3) time algorithm which finds k fault-free node-disjoint paths s ti, 1 i k, of length at most d(Trn) + 1. The upper bounds on the length of the found paths are optimal. From this, Rabin diameter of Trn is d(Trn) + 1.
In this paper, we give an algorithm which, given a set F of at most (n - 1) - k faulty nodes, and two sets S = {s1,..., sk} and T = {t1,..., tk}, 1 k n - 1, of nonfaulty nodes in n-dimensional star graphs Gn, finds k fault-free node disjoint paths si tji, where (j1,..., jk) is a permutation of (1,..., k), of length at most d(Gn) + 5 in O(kn) optimal time, where d(Gn) = 3(n-1)/2 is the diameter of Gn.
Qian Ping GU Satoshi OKAWA Shietung PENG
In this paper, we give an algorithm which, given a set F of at most n-k faulty nodes, and two sets S={s1,
In this paper, we study the following node-to-node fault tolerant routing problem: In the presence of up to n-1 faulty nodes, find a fault-free path which connects any two non-faulty nodes s and t in an n-connected graph. For node-to-node fault tolerant routing in n-dimensional hypercubes Hn, we give an algorithm which finds a fault-free path s t of length at most in O(n) time, where d(s, t) is the distance between s and t. We also show that a fault-free path s t in Hn of length at most d(s, t)2i, 1i, can be found in time. For node-to-node fault tolerant routing in n-dimensional star graphs Gn, we give an algorithm which finds a fault-free path s t of length at most min{d(Gn)3, d(s, t)6} in O(n) time, where is the diameter of Gn. It is previously known that, in Hn, a fault-free path s t of length at most d(s, t) for d(s, t)n and at most d(s, t)2 for d(s, t)n can be found in O(d(s, t)n) time, and in Gn, a fault-free path s t of length at most min{d(Gn)1, d(s, t)4}can be found in O(d(s, t)n) time. When the time efficiency of finding the routing path is more important than the length of the path, the algorithms in this paper are better than the previous ones.
This paper proposes a large capacity high-speed file memory system implemented with wafer scale RAM which adopts a novel defect-tolerant technique. Based on set-associative mapping, the defective memory blocks on the wafer are repaired by switching with the spare memory blocks. In order to repair the clustered defective blocks, these are permuted logically with other blocks by adding some constant value to the input block addresses. The defective blocks remaining even after applying the above two methods are repaired by using error control codes which correct soft errors induced by alpha particles in an on-line operation as well as hard errors induced by the remaining defective blocks. By using the proposed technique, this paper demonstrates a large capacity high-speed WSI file memory system implemented with high fabrication yield and low redundancy rate.