IEICE global.ieice.org Site

Keyword Search Result

[Keyword] fault(493hit)

301-320hit(493hit)

A Fault-Tolerant Deadlock-Free Routing Algorithm in a Meshed Network
Deogkyoo LEE Daekeun MOON Ilgu YUN Hagbae KIM

PAPER-Fault Tolerance

Vol:
E85-D No:4
Page(s):
722-726
Since component faults occurring at arbitrary places (primarily on the links) seriously affect network performance and reliability, multicomputers operating in harsh environments should be designed to guarantee normal network missions in the presence of those faults. One solution to this end is a fault-tolerant routing scheme, which enables messages to reach their destinations safely by avoiding failed links when message transmission is blocked by certain faults. In this paper, we develop a deadlock-free fault-tolerant routing algorithm for an n-dimensional meshed network, and validate its efficiency and effectiveness through simulations. Fault tolerance is achieved by adding partial adaptiveness and detouring to the e-cube algorithm, while using wormhole routing as the backbone routing method. The deadlock that this adaptiveness would otherwise cause is eliminated by dividing each physical channel into a couple of virtual channels.
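For orientation, the baseline that the abstract extends can be sketched as plain dimension-order (e-cube) routing in an n-dimensional mesh. This is only an illustrative sketch of the unmodified e-cube rule. It does not reproduce the paper's partially adaptive detouring or its virtual-channel assignment, and the function names and the link representation are assumptions made for this example.

```python
def ecube_route(src, dst):
    """Nodes visited by dimension-order (e-cube) routing from src to dst.

    src and dst are coordinate tuples of equal length (one entry per mesh
    dimension). Dimensions are corrected one at a time, lowest index first.
    """
    if len(src) != len(dst):
        raise ValueError("src and dst must have the same number of dimensions")
    cur = list(src)
    path = [tuple(cur)]
    for dim in range(len(cur)):
        step = 1 if dst[dim] > cur[dim] else -1
        while cur[dim] != dst[dim]:
            cur[dim] += step
            path.append(tuple(cur))
    return path


def first_blocked_hop(path, failed_links):
    """Return the first hop (a, b) on path that uses a failed link, or None.

    failed_links is a set of frozensets, each holding the two endpoints of a
    bidirectional link (a hypothetical representation chosen for this sketch).
    """
    for a, b in zip(path, path[1:]):
        if frozenset((a, b)) in failed_links:
            return (a, b)
    return None


path = ecube_route((0, 0), (2, 1))
blocked = first_blocked_hop(path, {frozenset({(1, 0), (2, 0)})})
```

A fault-tolerant scheme of the kind described would take over at the hop reported by `first_blocked_hop`, which is where plain e-cube routing has no alternative.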
FT_HORB : A Fault-Tolerant Distributed Programming Environment Based on RMI
Shik KIM Muyong HYUN Jiro YAMAKITA

PAPER-Computer Systems

Vol:
E85-D No:3
Page(s):
510-517
In distributed systems, provision for failure recovery is always a hot design issue, yet fault-tolerant features have not been extensively considered in current RMI, CORBA and other OODP environments. As a result, application developers have to implement their own fault-tolerant mechanisms. In this paper, we propose FT_HORB, a fault-tolerant development environment based on one kind of RMI, as a Java extension for reliable distributed computing with checkpoint and rollback-recovery mechanisms. FT_HORB is implemented on Sun Ultra10 workstations connected through a 100 Mbps network. We observe that experimental applications on FT_HORB can continue their operations in spite of hardware and software failures. Three benchmarks, namely the n-queens problem, the traveling salesman problem and Gaussian elimination, are run on FT_HORB to evaluate its performance. The results show that the performance of FT_HORB is acceptable. In addition, the experiments demonstrate that it can be extended to fully support our optimal design goal.
A New Diagnostic Method Using Probabilistic Temporal Fault Models
Kazuo HASHIMOTO Kazunori MATSUMOTO Norio SHIRATORI

INVITED PAPER-Artificial Intelligence,Cognitive Science

Vol:
E85-D No:3
Page(s):
444-454
This paper introduces a probabilistic model of alarm observation delay, and presents a novel method of model-based diagnosis for time-series observations. First, a fault model is defined by associating an event tree rooted at each fault hypothesis with probabilistic variables representing temporal delay. The most probable hypothesis is then obtained by selecting the one whose Akaike information criterion (AIC) is minimal. Simulation shows that the AIC-based hypothesis selection achieves high precision in diagnosis.
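The selection rule in the abstract can be illustrated with the standard definition AIC = 2k - 2 ln L, where k is the number of fitted parameters and L is the maximized likelihood. The sketch below is a minimal illustration of minimum-AIC selection only. The hypothesis names and numbers are made up for the example, and the paper's event-tree delay model is not reproduced.

```python
def aic(log_likelihood, num_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * num_params - 2 * log_likelihood


def select_hypothesis(candidates):
    """Return the name of the candidate with the smallest AIC.

    candidates maps a hypothesis name to a (log_likelihood, num_params) pair.
    """
    return min(candidates, key=lambda name: aic(*candidates[name]))


# Hypothetical fault hypotheses with illustrative fit results.
candidates = {
    "link_down": (-10.0, 2),  # AIC = 4 + 20 = 24
    "node_down": (-9.5, 4),   # AIC = 8 + 19 = 27
}
best = select_hypothesis(candidates)
```

Here the second hypothesis fits slightly better but uses more parameters, so the penalty term makes the simpler hypothesis the minimum-AIC choice.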
Effective Scheduling of Duplicated Tasks for Fault Tolerance in Multiprocessor Systems
Koji HASHIMOTO Tatsuhiro TSUCHIYA Tohru KIKUNO

PAPER-Fault Tolerance

Vol:
E85-D No:3
Page(s):
525-534
In this paper, we propose a new scheduling algorithm to achieve fault tolerance in multiprocessor systems. This algorithm first partitions a parallel program into subsets of tasks, based on the notion of height of a task graph. For each subset, the algorithm then duplicates and schedules the tasks in the subset successively. We prove that schedules obtained by the proposed algorithm can tolerate a single processor failure and show that the computational complexity of the algorithm is O(|V|^4), where V is the set of nodes of a task graph. We conduct simulations by applying the algorithm to two kinds of practical task graphs (Gaussian elimination and LU-decomposition). The results of this experiment show that fault tolerance can be achieved at the cost of a small degree of time redundancy, and that performance in the case of a processor failure is improved compared to a previous algorithm.
Potential of Constructive Timing-Violation
Toshinori SATO Itsujiro ARITA

PAPER-High-Performance Technologies

Vol:
E85-C No:2
Page(s):
323-330
This paper proposes constructive timing-violation (CTV) and evaluates its potential. CTV can be utilized both for increasing clock frequency and for reducing energy consumption. Increasing the clock frequency beyond that determined by the critical paths causes timing violations. On the other hand, while supply voltage reduction can result in substantial power savings, it also increases gate delay, and thus the clock must be slowed down in order not to violate the timing constraints of critical paths. However, if a mechanism that tolerates timing violations is provided, it is not necessary to keep the constraints. Rather, the violations become constructive for high clock frequency or for energy savings. From these observations, we propose CTV, which is supported by a tolerance mechanism based on contemporary speculative execution mechanisms. We evaluate CTV using a cycle-by-cycle simulator and show that its potential is considerably promising.
Optimal Diagnosable Systems on Cayley Graphs
Toru ARAKI Yukio SHIBATA

PAPER-Graphs and Networks

Vol:
E85-A No:2
Page(s):
455-462
In this paper, we investigate self-diagnosable multiprocessor systems, known as the one-step t-diagnosable systems introduced by Preparata et al. Kohda has proposed the "highly structured system" for designing diagnosable systems in which faulty processors are diagnosed efficiently. Meanwhile, Cayley graphs have been investigated as good models for the architectures of large-scale parallel processor systems. We investigate conditions under which Cayley graphs are topologies for optimal highly structured diagnosable systems, and present several examples of optimal diagnosable systems represented by Cayley graphs.
A System for Efficiently Self-Reconstructing 1(1/2)-Track Switch Torus Arrays
Tadayoshi HORITA Itsuo TAKANAMI

PAPER-Fault Tolerance

Vol:
E84-D No:12
Page(s):
1801-1809
A mesh-connected processor array consists of many similar processing elements (PEs), which can operate in both parallel and pipelined fashion. To implement an array with a large number of processors, it is necessary to consider fault-tolerance issues in order to enhance the (fabrication-time) yield and the (run-time) reliability. In this paper, we introduce the 1(1/2)-track switch torus array by changing the connections in the 1(1/2)-track switch mesh array, and we apply our approximate reconfiguration algorithm to the torus array. We describe the reconfiguration strategy for the 1(1/2)-track switch torus array and its realization using WSI, especially its 3-dimensional realization. A hardware realization of the algorithm is proposed and simulation results on the array reliability are shown. These imply that a self-reconfigurable system with no host computer can be realized using our method; hence our method is effective in enhancing the run-time reliability as well as the fabrication-time yield of processor arrays.
Reliable Data Routing for Spatial-Temporal TMR Multiprocessor Systems
Mineo KANEKO

PAPER-Fault Tolerance

Vol:
E84-D No:12
Page(s):
1790-1800
This paper treats the data routing problem for fault-tolerant systolic arrays based on Triple Modular Redundancy (TMR) in a mixed spatial-temporal domain. The number of logical links required in a TMR systolic array is basically 9 times larger than that of the corresponding non-fault-tolerant systolic array. Link sharing is a promising method for reducing the number of physical links, but it may degrade the fault tolerance of the TMR system. This paper proposes several robust data-routing and resource-sharing schemes (plural data transfers share a physical link, or a data transfer and a computational task share a PE, which acts as a relay node for the former and as a processor for the latter), by which certain classes of fault-tolerant properties are guaranteed. A stage and a dominated set are introduced to characterize the features of routing/resource-sharing in TMR systems, and conditions on the dominated set and their resultant fault-tolerant properties are derived.
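As background for the factor of 9, a common reading (an assumption here, not stated in the abstract) is that each of the three copies of a producer sends its result to each of the three copies of the consumer, and every consumer copy votes on the three values it receives. The sketch below illustrates that voting step and the resulting link count. It is not the routing or resource-sharing method proposed in the paper, and the function names are hypothetical.

```python
from collections import Counter


def majority_vote(values):
    """Return the value held by a strict majority of the replicas.

    Raises ValueError when no value has a strict majority.
    """
    value, count = Counter(values).most_common(1)[0]
    if count * 2 <= len(values):
        raise ValueError("no majority among replica outputs")
    return value


def logical_links(base_links, replicas=3):
    """Logical links needed when every producer copy feeds every consumer copy."""
    return base_links * replicas * replicas


voted = majority_vote([1, 1, 0])   # one faulty replica is masked
links = logical_links(4)           # 4 original links become 4 * 3 * 3
```

Under this assumption a single faulty replica or link corrupts at most one of the three inputs seen by each voter, which is why sharing physical links among these logical links needs care.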
Dynamically Programmable Parallel Processor (DPPP): A Novel Reconfigurable Architecture with Simple Program Interface
Boon-Keat TAN Ryuji YOSHIMURA Toshimasa MATSUOKA Kenji TANIGUCHI

PAPER

Vol:
E84-D No:11
Page(s):
1521-1527
This paper describes a microprocessor based on a new architecture, the dynamically programmable parallel processor (DPPP), which consists of a large number of simplified ALUs (sALUs) as processing blocks. All sALUs are interconnected via a code division multiple-access bus interface that provides complete routing flexibility by establishing connections virtually through code-matching instead of physical wires. This feature is utilized further to achieve high parallelism and fault tolerance. High fault tolerance is realized without the limitations of conventional fabrication-based techniques and without providing spare elements. Another feature of the DPPP is its simple programmability, as it can be configured by compiling numerical formula input using the provided user auto-program interface. A prototype based on the proposed architecture has been implemented on a 4.5 mm × 4.5 mm chip using a 0.6 µm CMOS process.
Fault-Tolerant Ring- and Toroidal Mesh-Connected Processor Arrays Able to Enhance Emulation of Hypercubes
Nobuo TSUDA

PAPER

Vol:
E84-D No:11
Page(s):
1452-1461
An advanced spare-connection scheme for K-out-of-N redundancy is proposed for constructing fault-tolerant ring- or toroidal mesh-connected processing-node arrays able to enhance emulation of binary hypercubes by using bypass networks. With this scheme, a component redundancy configuration for a base array with a fixed number of primary nodes, such as an 8-node ring or a 32-node toroidal mesh, can be constructed by using bypass links with a segmented bus structure to selectively connect the primary nodes to a spare node in parallel. These bypass links are allocated to the primary nodes by graph-node coloring with a minimum inter-node distance of three, so that the bypass links can be used as the hypercube connections and so that strong fault tolerance is attained for reconfiguring the base array with the primary network topology. An extended redundancy configuration for a large fault-tolerant array can be constructed by connecting the component configurations through external hub-type switches provided at the bus nodes of the bypass links. This configuration has a network topology of parallel star-connections of sub-hypercubes whose diameter is smaller than that of the regular hypercube.
Design of Fault Tolerant Multistage Interconnection Networks with Dilated Links
Naotake KAMIURA Takashi KODERA Nobuyuki MATSUI

PAPER

Vol:
E84-D No:11
Page(s):
1500-1507
In this paper we propose a MIN (Multistage Interconnection Network) whose performance in the faulty case degrades as gracefully as possible. We focus on a two-dilated baseline network as one type of MIN. The link connection pattern in our MIN is determined so that all the available paths established between an input terminal and an output terminal via an identical input of an SE (Switching Element) in some stage never pass through an identical SE in the next stage. Extra links are useful in improving the performance of the MIN and do not complicate the routing scheme. There is no difference between our MIN and others constructed from a baseline network with regard to the numbers of links and of cross points in all SEs. Theoretical computation and a simulation-based study show that our MIN is superior to others in performance, especially in robustness against concentrated SE faults in an identical stage.
The Evolutionary Algorithm-Based Reasoning System
Moritoshi YASUNAGA Ikuo YOSHIHARA Jung Hwan KIM

PAPER

Vol:
E84-D No:11
Page(s):
1508-1520
In this paper, we propose the evolutionary algorithm-based reasoning system and its design methodology. In the proposed design methodology, reasoning rules behind the past cases in each task (in each case database) are extracted through genetic algorithms and are expressed as truth tables (we call them 'evolved truth tables'). Circuits for the reasoning systems are synthesized from the evolved truth tables. Parallelism in each task can be embedded directly in the circuits by the hardware implementation of the evolved truth tables, so that a high-speed reasoning system with small or acceptable hardware size is achieved. We developed a prototype system using Xilinx Virtex FPGA chips and applied it to gene boundary reasoning (GBR) and English pronunciation reasoning (EPR), which are very important practical tasks in the genome science and language processing fields, respectively. The GBR and EPR prototype systems are evaluated in terms of reasoning accuracy, circuit size, and processing speed, and compared with conventional approaches in parallel AI and artificial neural networks. Fault injection experiments are also carried out using the prototype system, and they demonstrate its high fault tolerance, that is, graceful degradation against defective circuits, which suits hardware implementation using wafer-scale LSIs.
A Graph-Theoretic Approach to Minimizing the Number of Dangerous Processors in Fault-Tolerant Mesh-Connected Processor Arrays
Itsuo TAKANAMI

PAPER

Vol:
E84-D No:11
Page(s):
1462-1470
First, we give a graph-theoretic formalization of the spare assignment problems for two cases of reconfiguring N×N mesh-connected processor arrays with spares on a diagonal line in the array or on two orthogonal lines at the edges of the array. Second, we discuss the problems of minimizing the numbers of "dangerous processors" for these cases. Here, a dangerous processor is a nonfaulty one for which there remains no spare processor to be assigned if it becomes faulty, without modifying the spare assignments to other faulty processors. The problem for the latter case, originally presented by Melhem, has already been discussed and solved by the O(N^2) algorithm in [3], but its procedure is very complicated. Using the above graph-theoretic formalization, we give efficient plain algorithms for minimizing the numbers of dangerous processors, by which the problems for both cases can be solved in O(N) time.
A High Assurance On-Line Recovery Technology for a Space On-Board Computer
Hiroyuki YASHIRO Teruo FUJIWARA Kinji MORI

PAPER-Issues

Vol:
E84-D No:10
Page(s):
1350-1359
A high assurance on-line recovery technology for a space on-board computer that can be realized using commercial devices is proposed whereby a faulty processor node confirms its normality and then recovers without affecting the other processor nodes in operation. Also, the result of an evaluation test using the breadboard model implementing this technology is reported. Because this technology enables simple and assured recovery of a faulty processor node regardless of its degree of redundancy, it can be applied to various applications, such as a launch vehicle, a satellite, and a reusable launch vehicle. As a result, decreasing the cost of an on-board computer is possible while maintaining its high reliability.
A Learning Algorithm with Activation Function Manipulation for Fault Tolerant Neural Networks
Naotake KAMIURA Yasuyuki TANIGUCHI Yutaka HATA Nobuyuki MATSUI

PAPER-Fault Tolerance

Vol:
E84-D No:7
Page(s):
899-905
In this paper we propose a learning algorithm to enhance the fault tolerance of feedforward neural networks (NNs for short) by manipulating the gradient of the sigmoid activation function of the neuron. We assume stuck-at-0 and stuck-at-1 faults of the connection link. For the output layer, we employ a function with a relatively gentle gradient to enhance its fault tolerance. To enhance the fault tolerance of the hidden layer, we steepen the gradient of the function after convergence. The experimental results for a character recognition problem show that our NN is superior in fault tolerance, learning cycles and learning time to other NNs trained with algorithms employing fault injection, forcible weight limit and the calculation of the relevance of each weight to the output error. Moreover, the gradient manipulation incorporated in our algorithm never spoils the generalization ability.
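One common way to make a sigmoid gentler or steeper is a gain factor inside the exponent, f(x) = 1 / (1 + exp(-a x)), whose slope is a f(x) (1 - f(x)) and equals a/4 at x = 0. The sketch below illustrates only that relationship. The `gain` parameter is an assumption made for this example, and the abstract does not say how the paper parameterizes or schedules the gradient.

```python
import math


def sigmoid(x, gain=1.0):
    """Sigmoid activation with a hypothetical gain factor controlling steepness."""
    return 1.0 / (1.0 + math.exp(-gain * x))


def sigmoid_slope(x, gain=1.0):
    """Derivative of sigmoid(x, gain) with respect to x."""
    y = sigmoid(x, gain)
    return gain * y * (1.0 - y)


gentle = sigmoid_slope(0.0, gain=0.5)  # gentler curve, slope gain/4 at the origin
steep = sigmoid_slope(0.0, gain=4.0)   # steeper curve, slope gain/4 at the origin
```

With a gentle slope a given change in the weighted input moves the output less, while a steep slope pushes outputs toward 0 or 1. These are the two regimes the abstract assigns to the output and hidden layers respectively.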
Multiagent-Based Reservation of Backup Virtual Paths in ATM Networks
Shinji INOUE Yoshiaki KAKUDA

PAPER

Vol:
E84-B No:6
Page(s):
1541-1552
In order to make the ATM network fault-tolerant and the network service flexible, a method for setting up backup virtual paths (VP's for short) using multiagents is effective with respect to adaptability to changes in network resources and user requirements, examples of which are failures of nodes and links and the addition of VP's, respectively. In this method, under the assumption that candidates of backup VP's between different pairs of source and destination nodes are given, the optimum backup VP's are obtained by exchanging information among agents autonomously. First, this paper proposes measures for determining backup VP's between different pairs of source and destination nodes. Next, this paper presents simulation results to evaluate the adaptability of the method. The results show that the method efficiently obtains the optimum backup VP's even when the number of backup VP's increases, and that using a different idle time at each destination node makes it possible to shorten the total processing time while keeping complete detection of shared links.
A Hierarchical Approach to Dependability Evaluation of Distributed Systems with Replicated Resources
Eun Hye CHOI Tatsuhiro TSUCHIYA Tohru KIKUNO

PAPER-Fault Tolerance

Vol:
E84-D No:6
Page(s):
692-699
We propose a two-level hierarchical method for dependability evaluation of distributed systems with replicated programs and data files. Since Markov modeling is limited only to each component in this method, state explosion can be circumvented successfully. Simulation results show that the method can accomplish evaluation even for large systems for which Markov modeling is not feasible.
Testable Static CMOS PLA for IDDQ Testing
Masaki HASHIZUME Hiroshi HOSHIKA Hiroyuki YOTSUYANAGI Takeomi TAMESADA

PAPER

Vol:
E84-A No:6
Page(s):
1488-1495
A new IDDQ testable design method is proposed for static CMOS PLA circuits. A testable PLA circuit of NOR-NOR type is designed using this method. It is shown that all bridging faults in the NOR planes of a PLA circuit designed with this method can be detected by IDDQ testing with 4 sets of test input vectors. The test input vectors are independent of the logical functions to be realized in the PLA circuit. PLA circuits are designed using this method so that the quiescent supply current generated when they are tested will be zero. Thus, high resolution of IDDQ tests for the PLA circuits can be obtained by using the testable design method. Results of IDDQ tests of PLA circuits designed using this method confirm that the circuits are fabricated without bridging faults in the NOR planes, rather than that the expected output can be generated from the circuits. Since bridging faults often occur in state-of-the-art IC fabrication, the testable design is indispensable for realizing highly reliable logic systems.
Error Models and Fault-Secure Scheduling in Multiprocessor Systems
Koji HASHIMOTO Tatsuhiro TSUCHIYA Tohru KIKUNO

PAPER-Fault Tolerance

Vol:
E84-D No:5
Page(s):
635-650
A schedule for a parallel program is said to be 1-fault-secure if a system that uses the schedule can either produce correct output for the program or detect the presence of any faults in a single processor. Although several fault-secure scheduling algorithms have been proposed, they can all only be applied to a class of tree-structured task graphs with a uniform computation cost. Besides, they assume a stringent error model, called the redeemable error model, that considers extremely unlikely cases. In this paper, we first propose two new plausible error models which restrict the manner of error propagation. Then we present three fault-secure scheduling algorithms, one for each of the three models. Unlike previous algorithms, the proposed algorithms can deal with any task graphs with arbitrary computation and communication costs. Through experiments, we evaluate these algorithms and study the impact of the error models on the lengths of fault-secure schedules.
Analysis of I_DDQ Occurrence in Testing
Arabi KESHK Yukiya MIURA Kozo KINOSHITA

LETTER-Computer System Element

Vol:
E84-D No:4
Page(s):
534-536
This work presents an analysis of the dependency of IDDQ on the primary current that flows through the bridging fault and on the driven-gate current. The maximum primary current depends only on the test vectors that minimize the channel resistances of the transistors. The driven-gate current is generated when an intermediate voltage occurs on the faulty node, creating a current path between VDD and GND through the driven gates, and its value depends on circuit parameters such as transistor sizes and the fan-in number of the driven gates.
