Effective Scheduling of Duplicated Tasks for Fault Tolerance in Multiprocessor Systems

Koji HASHIMOTO; Tatsuhiro TSUCHIYA; Tohru KIKUNO

Effective Scheduling of Duplicated Tasks for Fault Tolerance in Multiprocessor Systems

Koji HASHIMOTO, Tatsuhiro TSUCHIYA, Tohru KIKUNO

Full Text Views

0

Cite this

Summary :

In this paper, we propose a new scheduling algorithm to achieve fault tolerance in multiprocessor systems. This algorithm first partitions a parallel program into subsets of tasks, based on the notion of height of a task graph. For each subset, the algorithm then duplicates and schedules the tasks in the subset successively. We prove that schedules obtained by the proposed algorithm can tolerate a single processor failure and show that the computational complexity of the algorithm is O(|V|⁴) where V is the set of nodes of a task graph. We conduct simulations by applying the algorithm to two kinds of practical task graphs (Gaussian elimination and LU-decomposition). The results of this experiment show that fault tolerance can be achieved at the cost of small degree of time redundancy, and that performance in the case of a processor failure is improved compared to a previous algorithm.

Publication: IEICE TRANSACTIONS on Information Vol.E85-D No.3 pp.525-534

Publication Date: 2002/03/01

Publicized

Online ISSN

DOI

Type of Manuscript: PAPER

Category: Fault Tolerance

Cite this

Copy

Koji HASHIMOTO, Tatsuhiro TSUCHIYA, Tohru KIKUNO, "Effective Scheduling of Duplicated Tasks for Fault Tolerance in Multiprocessor Systems" in IEICE TRANSACTIONS on Information, vol. E85-D, no. 3, pp. 525-534, March 2002, doi: .
Abstract: In this paper, we propose a new scheduling algorithm to achieve fault tolerance in multiprocessor systems. This algorithm first partitions a parallel program into subsets of tasks, based on the notion of height of a task graph. For each subset, the algorithm then duplicates and schedules the tasks in the subset successively. We prove that schedules obtained by the proposed algorithm can tolerate a single processor failure and show that the computational complexity of the algorithm is O(|V|⁴) where V is the set of nodes of a task graph. We conduct simulations by applying the algorithm to two kinds of practical task graphs (Gaussian elimination and LU-decomposition). The results of this experiment show that fault tolerance can be achieved at the cost of small degree of time redundancy, and that performance in the case of a processor failure is improved compared to a previous algorithm.
URL: https://global.ieice.org/en_transactions/information/10.1587/e85-d_3_525/_p

Copy

@ARTICLE{e85-d_3_525,
author={Koji HASHIMOTO, Tatsuhiro TSUCHIYA, Tohru KIKUNO, },
journal={IEICE TRANSACTIONS on Information},
title={Effective Scheduling of Duplicated Tasks for Fault Tolerance in Multiprocessor Systems},
year={2002},
volume={E85-D},
number={3},
pages={525-534},
abstract={In this paper, we propose a new scheduling algorithm to achieve fault tolerance in multiprocessor systems. This algorithm first partitions a parallel program into subsets of tasks, based on the notion of height of a task graph. For each subset, the algorithm then duplicates and schedules the tasks in the subset successively. We prove that schedules obtained by the proposed algorithm can tolerate a single processor failure and show that the computational complexity of the algorithm is O(|V|⁴) where V is the set of nodes of a task graph. We conduct simulations by applying the algorithm to two kinds of practical task graphs (Gaussian elimination and LU-decomposition). The results of this experiment show that fault tolerance can be achieved at the cost of small degree of time redundancy, and that performance in the case of a processor failure is improved compared to a previous algorithm.},
keywords={},
doi={},
ISSN={},
month={March},}

Copy

TY - JOUR
TI - Effective Scheduling of Duplicated Tasks for Fault Tolerance in Multiprocessor Systems
T2 - IEICE TRANSACTIONS on Information
SP - 525
EP - 534
AU - Koji HASHIMOTO
AU - Tatsuhiro TSUCHIYA
AU - Tohru KIKUNO
PY - 2002
DO -
JO - IEICE TRANSACTIONS on Information
SN -
VL - E85-D
IS - 3
JA - IEICE TRANSACTIONS on Information
Y1 - March 2002
AB - In this paper, we propose a new scheduling algorithm to achieve fault tolerance in multiprocessor systems. This algorithm first partitions a parallel program into subsets of tasks, based on the notion of height of a task graph. For each subset, the algorithm then duplicates and schedules the tasks in the subset successively. We prove that schedules obtained by the proposed algorithm can tolerate a single processor failure and show that the computational complexity of the algorithm is O(|V|⁴) where V is the set of nodes of a task graph. We conduct simulations by applying the algorithm to two kinds of practical task graphs (Gaussian elimination and LU-decomposition). The results of this experiment show that fault tolerance can be achieved at the cost of small degree of time redundancy, and that performance in the case of a processor failure is improved compared to a previous algorithm.
ER -