We propose a new dependable system called PREGMA (Platform for Reliable Environment based on a General-purpose Machine Architecture). PREGMA aims to meet two requirements -- fault tolerance and low cost -- for Internet services. It can provide fault tolerance, so we can avoid system failure and prevent data corruption, even if faults occur. That is, it masks the faults by running multiple replicated servers, each possessing its own data, in a loosely synchronized manner and delivering the majority vote as output to clients. Moreover, PREGMA is composed of COTS (Commercial Off-The-Shelf) components without modification, which makes it possible to offer the services at a low cost. We investigated two approaches for achieving redundancy of the Coordinator, which is the core of PREGMA: using the primary backup method and the active replication method. We evaluated the effectiveness of PREGMA in terms of throughput overhead, data integrity and recovery time. The results for a prototype show that PREGMA using the Coordinator with the primary backup method outperforms that with the active replication method and has throughput only 3% lower than a non-redundant system. The results also show that, in the event of failure, the recovery time is only less than one second and no data corruption occurs.
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Copy
Takeshi MISHIMA, Takeshi AKAIKE, "PREGMA: A New Fault Tolerant Cluster Using COTS Components for Internet Services" in IEICE TRANSACTIONS on Information,
vol. E86-D, no. 12, pp. 2517-2526, December 2003, doi: .
Abstract: We propose a new dependable system called PREGMA (Platform for Reliable Environment based on a General-purpose Machine Architecture). PREGMA aims to meet two requirements -- fault tolerance and low cost -- for Internet services. It can provide fault tolerance, so we can avoid system failure and prevent data corruption, even if faults occur. That is, it masks the faults by running multiple replicated servers, each possessing its own data, in a loosely synchronized manner and delivering the majority vote as output to clients. Moreover, PREGMA is composed of COTS (Commercial Off-The-Shelf) components without modification, which makes it possible to offer the services at a low cost. We investigated two approaches for achieving redundancy of the Coordinator, which is the core of PREGMA: using the primary backup method and the active replication method. We evaluated the effectiveness of PREGMA in terms of throughput overhead, data integrity and recovery time. The results for a prototype show that PREGMA using the Coordinator with the primary backup method outperforms that with the active replication method and has throughput only 3% lower than a non-redundant system. The results also show that, in the event of failure, the recovery time is only less than one second and no data corruption occurs.
URL: https://global.ieice.org/en_transactions/information/10.1587/e86-d_12_2517/_p
Copy
@ARTICLE{e86-d_12_2517,
author={Takeshi MISHIMA, Takeshi AKAIKE, },
journal={IEICE TRANSACTIONS on Information},
title={PREGMA: A New Fault Tolerant Cluster Using COTS Components for Internet Services},
year={2003},
volume={E86-D},
number={12},
pages={2517-2526},
abstract={We propose a new dependable system called PREGMA (Platform for Reliable Environment based on a General-purpose Machine Architecture). PREGMA aims to meet two requirements -- fault tolerance and low cost -- for Internet services. It can provide fault tolerance, so we can avoid system failure and prevent data corruption, even if faults occur. That is, it masks the faults by running multiple replicated servers, each possessing its own data, in a loosely synchronized manner and delivering the majority vote as output to clients. Moreover, PREGMA is composed of COTS (Commercial Off-The-Shelf) components without modification, which makes it possible to offer the services at a low cost. We investigated two approaches for achieving redundancy of the Coordinator, which is the core of PREGMA: using the primary backup method and the active replication method. We evaluated the effectiveness of PREGMA in terms of throughput overhead, data integrity and recovery time. The results for a prototype show that PREGMA using the Coordinator with the primary backup method outperforms that with the active replication method and has throughput only 3% lower than a non-redundant system. The results also show that, in the event of failure, the recovery time is only less than one second and no data corruption occurs.},
keywords={},
doi={},
ISSN={},
month={December},}
Copy
TY - JOUR
TI - PREGMA: A New Fault Tolerant Cluster Using COTS Components for Internet Services
T2 - IEICE TRANSACTIONS on Information
SP - 2517
EP - 2526
AU - Takeshi MISHIMA
AU - Takeshi AKAIKE
PY - 2003
DO -
JO - IEICE TRANSACTIONS on Information
SN -
VL - E86-D
IS - 12
JA - IEICE TRANSACTIONS on Information
Y1 - December 2003
AB - We propose a new dependable system called PREGMA (Platform for Reliable Environment based on a General-purpose Machine Architecture). PREGMA aims to meet two requirements -- fault tolerance and low cost -- for Internet services. It can provide fault tolerance, so we can avoid system failure and prevent data corruption, even if faults occur. That is, it masks the faults by running multiple replicated servers, each possessing its own data, in a loosely synchronized manner and delivering the majority vote as output to clients. Moreover, PREGMA is composed of COTS (Commercial Off-The-Shelf) components without modification, which makes it possible to offer the services at a low cost. We investigated two approaches for achieving redundancy of the Coordinator, which is the core of PREGMA: using the primary backup method and the active replication method. We evaluated the effectiveness of PREGMA in terms of throughput overhead, data integrity and recovery time. The results for a prototype show that PREGMA using the Coordinator with the primary backup method outperforms that with the active replication method and has throughput only 3% lower than a non-redundant system. The results also show that, in the event of failure, the recovery time is only less than one second and no data corruption occurs.
ER -