PREGMA: A New Fault Tolerant Cluster Using COTS Components for Internet Services

Takeshi MISHIMA; Takeshi AKAIKE

Summary:

We propose a new dependable system called PREGMA (Platform for Reliable Environment based on a General-purpose Machine Architecture). PREGMA aims to meet two requirements of Internet services: fault tolerance and low cost. Its fault tolerance lets a service avoid system failure and prevent data corruption even when faults occur. PREGMA masks faults by running multiple replicated servers, each holding its own copy of the data, in a loosely synchronized manner and delivering the majority vote of their outputs to clients. Moreover, PREGMA is built from unmodified COTS (Commercial Off-The-Shelf) components, which makes it possible to offer the services at low cost. We investigated two approaches for making the Coordinator, the core of PREGMA, redundant: the primary backup method and the active replication method. We evaluated PREGMA in terms of throughput overhead, data integrity, and recovery time. Results from a prototype show that PREGMA with a primary-backup Coordinator outperforms PREGMA with an actively replicated Coordinator, and that its throughput is only 3% lower than that of a non-redundant system. The results also show that, in the event of a failure, recovery takes less than one second and no data corruption occurs.
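
The summary says PREGMA masks faults by delivering the majority vote of the replicated servers' outputs to clients. The Python sketch below illustrates only that voting step, under stated assumptions: each replica's reply to the same request is compared as an opaque byte string, a strict majority of all replicas is required, and the helper name `majority_vote` is hypothetical. It is not the authors' Coordinator, whose internals the summary does not describe.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(responses: Sequence[bytes]) -> Optional[bytes]:
    """Return the reply produced by a strict majority of replicas.

    Hypothetical helper (not from the paper): each element of `responses`
    is one replica's reply to the same client request. Returns None when
    no reply has a strict majority, i.e. the fault cannot be masked.
    """
    if not responses:
        return None
    # Count identical replies and take the most frequent one.
    reply, count = Counter(responses).most_common(1)[0]
    # Require more than half of all replicas to agree, so one faulty
    # replica out of three is outvoted by the two correct ones.
    if count * 2 > len(responses):
        return reply
    return None


# Usage: three replicas, one of which returns corrupted data.
replies = [b"balance=100", b"balance=100", b"balance=999"]
print(majority_vote(replies))  # b'balance=100'
```

With three replicas this tolerates one arbitrarily wrong reply; an even split (for example two against two) yields None rather than guessing.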

Publication: IEICE TRANSACTIONS on Information, Vol. E86-D, No. 12, pp. 2517-2526

Publication Date: 2003/12/01

Type of Manuscript: Special Section PAPER (Special Issue on Dependable Computing)

Category: Dependable Systems

Takeshi MISHIMA, Takeshi AKAIKE, "PREGMA: A New Fault Tolerant Cluster Using COTS Components for Internet Services," in IEICE TRANSACTIONS on Information, vol. E86-D, no. 12, pp. 2517-2526, December 2003.
URL: https://global.ieice.org/en_transactions/information/10.1587/e86-d_12_2517/_p

@ARTICLE{e86-d_12_2517,
author={Takeshi MISHIMA and Takeshi AKAIKE},
journal={IEICE TRANSACTIONS on Information},
title={{PREGMA}: A New Fault Tolerant Cluster Using {COTS} Components for {Internet} Services},
year={2003},
volume={E86-D},
number={12},
pages={2517-2526},
abstract={We propose a new dependable system called PREGMA (Platform for Reliable Environment based on a General-purpose Machine Architecture). PREGMA aims to meet two requirements -- fault tolerance and low cost -- for Internet services. It can provide fault tolerance, so we can avoid system failure and prevent data corruption, even if faults occur. That is, it masks the faults by running multiple replicated servers, each possessing its own data, in a loosely synchronized manner and delivering the majority vote as output to clients. Moreover, PREGMA is composed of COTS (Commercial Off-The-Shelf) components without modification, which makes it possible to offer the services at a low cost. We investigated two approaches for achieving redundancy of the Coordinator, which is the core of PREGMA: using the primary backup method and the active replication method. We evaluated the effectiveness of PREGMA in terms of throughput overhead, data integrity and recovery time. The results for a prototype show that PREGMA using the Coordinator with the primary backup method outperforms that with the active replication method and has throughput only 3% lower than a non-redundant system. The results also show that, in the event of failure, the recovery time is only less than one second and no data corruption occurs.},
month=dec,
}

TY  - JOUR
TI  - PREGMA: A New Fault Tolerant Cluster Using COTS Components for Internet Services
T2  - IEICE TRANSACTIONS on Information
SP  - 2517
EP  - 2526
AU  - MISHIMA, Takeshi
AU  - AKAIKE, Takeshi
PY  - 2003
JO  - IEICE TRANSACTIONS on Information
VL  - E86-D
IS  - 12
JA  - IEICE TRANSACTIONS on Information
Y1  - December 2003
AB  - We propose a new dependable system called PREGMA (Platform for Reliable Environment based on a General-purpose Machine Architecture). PREGMA aims to meet two requirements -- fault tolerance and low cost -- for Internet services. It can provide fault tolerance, so we can avoid system failure and prevent data corruption, even if faults occur. That is, it masks the faults by running multiple replicated servers, each possessing its own data, in a loosely synchronized manner and delivering the majority vote as output to clients. Moreover, PREGMA is composed of COTS (Commercial Off-The-Shelf) components without modification, which makes it possible to offer the services at a low cost. We investigated two approaches for achieving redundancy of the Coordinator, which is the core of PREGMA: using the primary backup method and the active replication method. We evaluated the effectiveness of PREGMA in terms of throughput overhead, data integrity and recovery time. The results for a prototype show that PREGMA using the Coordinator with the primary backup method outperforms that with the active replication method and has throughput only 3% lower than a non-redundant system. The results also show that, in the event of failure, the recovery time is only less than one second and no data corruption occurs.
ER  - 