In this paper, we propose a new scheduling algorithm to achieve fault tolerance in multiprocessor systems. This algorithm first partitions a parallel program into subsets of tasks, based on the notion of height of a task graph. For each subset, the algorithm then duplicates and schedules the tasks in the subset successively. We prove that schedules obtained by the proposed algorithm can tolerate a single processor failure and show that the computational complexity of the algorithm is O(|V|4) where V is the set of nodes of a task graph. We conduct simulations by applying the algorithm to two kinds of practical task graphs (Gaussian elimination and LU-decomposition). The results of this experiment show that fault tolerance can be achieved at the cost of small degree of time redundancy, and that performance in the case of a processor failure is improved compared to a previous algorithm.
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Copy
Koji HASHIMOTO, Tatsuhiro TSUCHIYA, Tohru KIKUNO, "Effective Scheduling of Duplicated Tasks for Fault Tolerance in Multiprocessor Systems" in IEICE TRANSACTIONS on Information,
vol. E85-D, no. 3, pp. 525-534, March 2002, doi: .
Abstract: In this paper, we propose a new scheduling algorithm to achieve fault tolerance in multiprocessor systems. This algorithm first partitions a parallel program into subsets of tasks, based on the notion of height of a task graph. For each subset, the algorithm then duplicates and schedules the tasks in the subset successively. We prove that schedules obtained by the proposed algorithm can tolerate a single processor failure and show that the computational complexity of the algorithm is O(|V|4) where V is the set of nodes of a task graph. We conduct simulations by applying the algorithm to two kinds of practical task graphs (Gaussian elimination and LU-decomposition). The results of this experiment show that fault tolerance can be achieved at the cost of small degree of time redundancy, and that performance in the case of a processor failure is improved compared to a previous algorithm.
URL: https://global.ieice.org/en_transactions/information/10.1587/e85-d_3_525/_p
Copy
@ARTICLE{e85-d_3_525,
author={Koji HASHIMOTO, Tatsuhiro TSUCHIYA, Tohru KIKUNO, },
journal={IEICE TRANSACTIONS on Information},
title={Effective Scheduling of Duplicated Tasks for Fault Tolerance in Multiprocessor Systems},
year={2002},
volume={E85-D},
number={3},
pages={525-534},
abstract={In this paper, we propose a new scheduling algorithm to achieve fault tolerance in multiprocessor systems. This algorithm first partitions a parallel program into subsets of tasks, based on the notion of height of a task graph. For each subset, the algorithm then duplicates and schedules the tasks in the subset successively. We prove that schedules obtained by the proposed algorithm can tolerate a single processor failure and show that the computational complexity of the algorithm is O(|V|4) where V is the set of nodes of a task graph. We conduct simulations by applying the algorithm to two kinds of practical task graphs (Gaussian elimination and LU-decomposition). The results of this experiment show that fault tolerance can be achieved at the cost of small degree of time redundancy, and that performance in the case of a processor failure is improved compared to a previous algorithm.},
keywords={},
doi={},
ISSN={},
month={March},}
Copy
TY - JOUR
TI - Effective Scheduling of Duplicated Tasks for Fault Tolerance in Multiprocessor Systems
T2 - IEICE TRANSACTIONS on Information
SP - 525
EP - 534
AU - Koji HASHIMOTO
AU - Tatsuhiro TSUCHIYA
AU - Tohru KIKUNO
PY - 2002
DO -
JO - IEICE TRANSACTIONS on Information
SN -
VL - E85-D
IS - 3
JA - IEICE TRANSACTIONS on Information
Y1 - March 2002
AB - In this paper, we propose a new scheduling algorithm to achieve fault tolerance in multiprocessor systems. This algorithm first partitions a parallel program into subsets of tasks, based on the notion of height of a task graph. For each subset, the algorithm then duplicates and schedules the tasks in the subset successively. We prove that schedules obtained by the proposed algorithm can tolerate a single processor failure and show that the computational complexity of the algorithm is O(|V|4) where V is the set of nodes of a task graph. We conduct simulations by applying the algorithm to two kinds of practical task graphs (Gaussian elimination and LU-decomposition). The results of this experiment show that fault tolerance can be achieved at the cost of small degree of time redundancy, and that performance in the case of a processor failure is improved compared to a previous algorithm.
ER -