IEICE TRANSACTIONS on Information and Systems

A Concurrent Partial Snapshot Algorithm for Large-Scale and Dynamic Distributed Systems

Yonghwan KIM, Tadashi ARARAGI, Junya NAKAMURA, Toshimitsu MASUZAWA

Summary:

Checkpoint-rollback recovery, a universal method for restoring distributed systems after faults, requires a sophisticated snapshot algorithm, especially when the system is large-scale, because repeatedly taking global snapshots of the whole system incurs unacceptable communication cost. One such algorithm is the partial snapshot algorithm, which, instead of a global snapshot of the whole system, takes a snapshot of a subsystem consisting only of the nodes that are communication-related to the initiator. In this paper, we modify the previous partial snapshot algorithm to obtain a new one that takes partial snapshots more efficiently, especially when multiple nodes initiate the algorithm concurrently. Experiments show that the proposed algorithm greatly reduces the amount of communication needed to take partial snapshots.
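
As a rough illustration of what a partial snapshot covers, the sketch below computes the set of nodes that are communication-related to an initiator. It is not the algorithm proposed in the paper. It assumes that "communication-related" means reachable from the initiator through a chain of message exchanges, and the function name `partial_snapshot_group` and the `exchanged` log are hypothetical names introduced only for this example.

```python
from collections import deque


def partial_snapshot_group(initiator, exchanged):
    """Return the nodes reachable from `initiator` via message exchanges.

    `exchanged` is a hypothetical log mapping each node to the peers it has
    exchanged messages with. Assumption: the partial snapshot group is the
    transitive closure of this relation, starting at the initiator.
    """
    group = {initiator}
    queue = deque([initiator])
    while queue:
        node = queue.popleft()
        for peer in exchanged.get(node, ()):
            if peer not in group:
                group.add(peer)
                queue.append(peer)
    return group


# Hypothetical five-node system with two independent communication clusters.
exchanged = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b"},
    "d": {"e"},
    "e": {"d"},
}

print(sorted(partial_snapshot_group("a", exchanged)))  # ['a', 'b', 'c']
```

In this toy system, a snapshot initiated by node a involves only a, b, and c, while d and e are left undisturbed, which is the source of the communication savings over a global snapshot.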

Publication
IEICE TRANSACTIONS on Information and Systems Vol.E97-D No.1 pp.65-76
Publication Date
2014/01/01
Online ISSN
1745-1361
DOI
10.1587/transinf.E97.D.65
Type of Manuscript
PAPER
Category
Dependable Computing

Authors

Yonghwan KIM
  Osaka University
Tadashi ARARAGI
  NTT Corporation
Junya NAKAMURA
  Osaka University
Toshimitsu MASUZAWA
  Osaka University

Keyword