Failure Detection in P2P-Grid System

Huan WANG; Hideroni NAKAZATO

doi:10.1587/transinf.2015PAP0014

IEICE TRANSACTIONS on Information

Summary:

Peer-to-peer (P2P)-Grid systems are being investigated as a platform that converges Grid and P2P networks for the construction of large-scale distributed applications. The highly dynamic nature of P2P-Grid systems strongly affects the execution of distributed programs: uncertainty caused by arbitrary node failures and departures significantly affects the availability of computing resources and system performance. Checkpoint-and-restart is the most common fault-tolerance scheme because it periodically saves execution progress to stable storage. In this paper, we propose a checkpoint-and-restart mechanism as a fault-tolerant method for applications on P2P-Grid systems. A failure detection mechanism is a necessary prerequisite for fault tolerance and fault recovery in general, and given the highly dynamic nature of nodes in P2P-Grid systems, every failure must be detected to ensure effective task execution. We therefore study failure detection as an integral part of P2P-Grid systems and discuss how the design of various failure detection algorithms affects their performance in terms of the average failure detection time of nodes. Numerical analysis results and an implementation evaluation show the average failure detection times of the various algorithms in real systems. The comparison shows that the WP failure detector achieves the shortest average failure detection time, by 8.8 s. Our mechanism also achieves the lowest mean time to recovery (MTTR), with a reduction in time consumption of about 5.5 s compared with its counterparts.
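The summary treats failure detection as the step that must complete before checkpoint-and-restart recovery can begin. As a rough illustration only, the Python sketch below shows a generic timeout-based heartbeat failure detector of the kind commonly used in distributed systems. It is an assumption-laden sketch, not the WP failure detector evaluated in the paper; the class name `HeartbeatFailureDetector`, the `timeout` parameter, the injectable `clock`, and the `forget` method are all hypothetical names chosen for this example.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


@dataclass
class HeartbeatFailureDetector:
    """Generic timeout-based heartbeat failure detector (illustrative sketch).

    Assumption: this is NOT the WP detector from the paper; it only shows the
    basic idea of suspecting a node once its heartbeats stop arriving.
    """

    timeout: float  # seconds of silence before a node is suspected
    clock: Callable[[], float] = time.monotonic  # injectable clock (for tests)
    _last_seen: Dict[str, float] = field(default_factory=dict)

    def heartbeat(self, node_id: str) -> None:
        """Record a heartbeat from node_id at the current clock time."""
        self._last_seen[node_id] = self.clock()

    def suspected(self) -> Set[str]:
        """Return every node whose last heartbeat is older than the timeout."""
        now = self.clock()
        return {n for n, t in self._last_seen.items() if now - t > self.timeout}

    def forget(self, node_id: str) -> None:
        """Stop monitoring node_id, e.g. after its task was restarted elsewhere."""
        self._last_seen.pop(node_id, None)


if __name__ == "__main__":
    # Simulated clock so the example is deterministic.
    now = [0.0]
    fd = HeartbeatFailureDetector(timeout=3.0, clock=lambda: now[0])
    fd.heartbeat("peer-a")
    fd.heartbeat("peer-b")
    now[0] = 2.0
    fd.heartbeat("peer-a")  # peer-b has crashed and sends nothing more
    now[0] = 4.0
    print(sorted(fd.suspected()))  # ['peer-b']
```

Choosing `timeout` is the central trade-off in this kind of detector: a shorter timeout lowers the failure detection time, and therefore the delay before checkpoint-and-restart recovery can start, but makes it more likely that a slow yet live peer is wrongly suspected.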

Publication: IEICE TRANSACTIONS on Information Vol.E98-D No.12 pp.2123-2131

Publication Date: 2015/12/01

Publicized: 2015/09/15

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2015PAP0014

Type of Manuscript: Special Section PAPER (Special Section on Parallel and Distributed Computing and Networking)

Category: Grid System

Authors

Huan WANG
Waseda University
Hideroni NAKAZATO
Waseda University

Keyword

P2P-grid systems, fault tolerance, failure detection, failure recovery

Cite this

Huan WANG, Hideroni NAKAZATO, "Failure Detection in P2P-Grid System" in IEICE TRANSACTIONS on Information, vol. E98-D, no. 12, pp. 2123-2131, December 2015, doi: 10.1587/transinf.2015PAP0014.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2015PAP0014/_p

@ARTICLE{e98-d_12_2123,
author={Huan WANG and Hideroni NAKAZATO},
journal={IEICE TRANSACTIONS on Information},
title={Failure Detection in P2P-Grid System},
year={2015},
volume={E98-D},
number={12},
pages={2123-2131},
keywords={P2P-grid systems, fault tolerance, failure detection, failure recovery},
doi={10.1587/transinf.2015PAP0014},
ISSN={1745-1361},
month={December},}

TY - JOUR
TI - Failure Detection in P2P-Grid System
T2 - IEICE TRANSACTIONS on Information
SP - 2123
EP - 2131
AU - Huan WANG
AU - Hideroni NAKAZATO
PY - 2015
DO - 10.1587/transinf.2015PAP0014
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E98-D
IS - 12
JA - IEICE TRANSACTIONS on Information
Y1 - December 2015
ER -
