Communication-induced checkpointing (CIC) has two main advantages: first, it allows processes in a distributed computation to take asynchronous checkpoints, and second, it avoids the domino effect. To achieve these, CIC algorithms piggyback information on application messages and take forced local checkpoints when they recognize potentially dangerous patterns. The main disadvantages of CIC algorithms are the per-message overhead and the induced storage overhead. In this paper we present a communication-induced checkpointing algorithm called Scalable Fully-Informed (S-FI) that addresses the problem of message overhead. To do so, our algorithm modifies the Fully-Informed algorithm by integrating it with the immediate dependency principle. The S-FI algorithm was simulated, and the results show that it is scalable: the message overhead grows sublinearly as the number of processes and/or the message density increases.
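The general CIC mechanism the abstract describes (piggybacking control information on application messages and taking a forced checkpoint before delivery when a risky pattern is detected) can be illustrated with a minimal sketch. This is not the S-FI or Fully-Informed algorithm; it assumes a much simpler index-based rule in which the only piggybacked data is the sender's checkpoint index. The names `Message`, `Process`, `basic_checkpoint`, `send`, and `receive` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    payload: str
    sender_index: int  # piggybacked control data: sender's checkpoint index


@dataclass
class Process:
    pid: int
    index: int = 0  # index of the current checkpoint interval
    checkpoints: list = field(default_factory=list)  # (index, kind) records

    def basic_checkpoint(self) -> None:
        """Checkpoint taken asynchronously, on the process's own schedule."""
        self.index += 1
        self.checkpoints.append((self.index, "basic"))

    def send(self, payload: str) -> Message:
        """Attach the current checkpoint index to the outgoing message."""
        return Message(payload, self.index)

    def receive(self, msg: Message) -> bool:
        """Deliver msg; return True if a forced checkpoint was taken first."""
        forced = msg.sender_index > self.index
        if forced:
            # Assumed rule: the sender is in a later interval, so checkpoint
            # before delivery and adopt the sender's index.
            self.index = msg.sender_index
            self.checkpoints.append((self.index, "forced"))
        return forced


# Usage: p checkpoints on its own, then its message forces q to checkpoint.
p, q = Process(pid=0), Process(pid=1)
p.basic_checkpoint()
first = q.receive(p.send("hello"))
second = q.receive(p.send("again"))
```

In this simplified scheme the per-message overhead is a single integer. Protocols that piggyback richer dependency information can avoid some forced checkpoints, but at the cost of larger control data per message, which is the trade-off the abstract refers to.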
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Alberto CALIXTO SIMON, Saul E. POMARES HERNANDEZ, Jose Roberto PEREZ CRUZ, Pilar GOMEZ-GIL, Khalil DRIRA, "A Scalable Communication-Induced Checkpointing Algorithm for Distributed Systems" in IEICE TRANSACTIONS on Information,
vol. E96-D, no. 4, pp. 886-896, April 2013, doi: 10.1587/transinf.E96.D.886.
Abstract: Communication-induced checkpointing (CIC) has two main advantages: first, it allows processes in a distributed computation to take asynchronous checkpoints, and secondly, it avoids the domino effect. To achieve these, CIC algorithms piggyback information on the application messages and take forced local checkpoints when they recognize potentially dangerous patterns. The main disadvantages of CIC algorithms are the amount of overhead per message and the induced storage overhead. In this paper we present a communication-induced checkpointing algorithm called Scalable Fully-Informed (S-FI) that attacks the problem of message overhead. For this, our algorithm modifies the Fully-Informed algorithm by integrating it with the immediate dependency principle. The S-FI algorithm was simulated and the result shows that the algorithm is scalable since the message overhead presents an under-linear growth as the number of processes and/or the message density increase.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E96.D.886/_p
@ARTICLE{e96-d_4_886,
author={Alberto CALIXTO SIMON and Saul E. POMARES HERNANDEZ and Jose Roberto PEREZ CRUZ and Pilar GOMEZ-GIL and Khalil DRIRA},
journal={IEICE TRANSACTIONS on Information},
title={A Scalable Communication-Induced Checkpointing Algorithm for Distributed Systems},
year={2013},
volume={E96-D},
number={4},
pages={886--896},
abstract={Communication-induced checkpointing (CIC) has two main advantages: first, it allows processes in a distributed computation to take asynchronous checkpoints, and secondly, it avoids the domino effect. To achieve these, CIC algorithms piggyback information on the application messages and take forced local checkpoints when they recognize potentially dangerous patterns. The main disadvantages of CIC algorithms are the amount of overhead per message and the induced storage overhead. In this paper we present a communication-induced checkpointing algorithm called Scalable Fully-Informed (S-FI) that attacks the problem of message overhead. For this, our algorithm modifies the Fully-Informed algorithm by integrating it with the immediate dependency principle. The S-FI algorithm was simulated and the result shows that the algorithm is scalable since the message overhead presents an under-linear growth as the number of processes and/or the message density increase.},
keywords={},
doi={10.1587/transinf.E96.D.886},
ISSN={1745-1361},
month={April},
}
TY - JOUR
TI - A Scalable Communication-Induced Checkpointing Algorithm for Distributed Systems
T2 - IEICE TRANSACTIONS on Information
SP - 886
EP - 896
AU - Alberto CALIXTO SIMON
AU - Saul E. POMARES HERNANDEZ
AU - Jose Roberto PEREZ CRUZ
AU - Pilar GOMEZ-GIL
AU - Khalil DRIRA
PY - 2013
DO - 10.1587/transinf.E96.D.886
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E96-D
IS - 4
JA - IEICE TRANSACTIONS on Information
Y1 - April 2013
AB - Communication-induced checkpointing (CIC) has two main advantages: first, it allows processes in a distributed computation to take asynchronous checkpoints, and secondly, it avoids the domino effect. To achieve these, CIC algorithms piggyback information on the application messages and take forced local checkpoints when they recognize potentially dangerous patterns. The main disadvantages of CIC algorithms are the amount of overhead per message and the induced storage overhead. In this paper we present a communication-induced checkpointing algorithm called Scalable Fully-Informed (S-FI) that attacks the problem of message overhead. For this, our algorithm modifies the Fully-Informed algorithm by integrating it with the immediate dependency principle. The S-FI algorithm was simulated and the result shows that the algorithm is scalable since the message overhead presents an under-linear growth as the number of processes and/or the message density increase.
ER -