Data replication in cloud storage systems brings a lot of benefits, such as fault tolerance, data availability, data locality and load balancing both from reliability and performance perspectives. However, each time a datanode fails, data blocks stored on the failed datanode must be restored to maintain replication level. This may be a large burden for the system in which resources are highly utilized with users' application workloads. Although there have been many proposals for replication, the approach of re-replication has not been properly addressed yet. In this paper, we present a deferred re-replication algorithm to dynamically shift the re-replication workload based on current resource utilization status of the system. As workload pattern varies depending on the time of the day, simulation results from synthetic workload demonstrate a large opportunity for minimizing impacts on users' application workloads with the simple algorithm that adjusts re-replication based on current resource utilization. Our approach can reduce performance impacts on users' application workloads while ensuring the same reliability level as default HDFS can provide.
Thanda SHWE
Kumamoto University
Masayoshi ARITSUGI
Kumamoto University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Copy
Thanda SHWE, Masayoshi ARITSUGI, "Avoiding Performance Impacts by Re-Replication Workload Shifting in HDFS Based Cloud Storage" in IEICE TRANSACTIONS on Information,
vol. E101-D, no. 12, pp. 2958-2967, December 2018, doi: 10.1587/transinf.2018PAP0017.
Abstract: Data replication in cloud storage systems brings a lot of benefits, such as fault tolerance, data availability, data locality and load balancing both from reliability and performance perspectives. However, each time a datanode fails, data blocks stored on the failed datanode must be restored to maintain replication level. This may be a large burden for the system in which resources are highly utilized with users' application workloads. Although there have been many proposals for replication, the approach of re-replication has not been properly addressed yet. In this paper, we present a deferred re-replication algorithm to dynamically shift the re-replication workload based on current resource utilization status of the system. As workload pattern varies depending on the time of the day, simulation results from synthetic workload demonstrate a large opportunity for minimizing impacts on users' application workloads with the simple algorithm that adjusts re-replication based on current resource utilization. Our approach can reduce performance impacts on users' application workloads while ensuring the same reliability level as default HDFS can provide.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2018PAP0017/_p
Copy
@ARTICLE{e101-d_12_2958,
author={Thanda SHWE, Masayoshi ARITSUGI, },
journal={IEICE TRANSACTIONS on Information},
title={Avoiding Performance Impacts by Re-Replication Workload Shifting in HDFS Based Cloud Storage},
year={2018},
volume={E101-D},
number={12},
pages={2958-2967},
abstract={Data replication in cloud storage systems brings a lot of benefits, such as fault tolerance, data availability, data locality and load balancing both from reliability and performance perspectives. However, each time a datanode fails, data blocks stored on the failed datanode must be restored to maintain replication level. This may be a large burden for the system in which resources are highly utilized with users' application workloads. Although there have been many proposals for replication, the approach of re-replication has not been properly addressed yet. In this paper, we present a deferred re-replication algorithm to dynamically shift the re-replication workload based on current resource utilization status of the system. As workload pattern varies depending on the time of the day, simulation results from synthetic workload demonstrate a large opportunity for minimizing impacts on users' application workloads with the simple algorithm that adjusts re-replication based on current resource utilization. Our approach can reduce performance impacts on users' application workloads while ensuring the same reliability level as default HDFS can provide.},
keywords={},
doi={10.1587/transinf.2018PAP0017},
ISSN={1745-1361},
month={December},}
Copy
TY - JOUR
TI - Avoiding Performance Impacts by Re-Replication Workload Shifting in HDFS Based Cloud Storage
T2 - IEICE TRANSACTIONS on Information
SP - 2958
EP - 2967
AU - Thanda SHWE
AU - Masayoshi ARITSUGI
PY - 2018
DO - 10.1587/transinf.2018PAP0017
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E101-D
IS - 12
JA - IEICE TRANSACTIONS on Information
Y1 - December 2018
AB - Data replication in cloud storage systems brings a lot of benefits, such as fault tolerance, data availability, data locality and load balancing both from reliability and performance perspectives. However, each time a datanode fails, data blocks stored on the failed datanode must be restored to maintain replication level. This may be a large burden for the system in which resources are highly utilized with users' application workloads. Although there have been many proposals for replication, the approach of re-replication has not been properly addressed yet. In this paper, we present a deferred re-replication algorithm to dynamically shift the re-replication workload based on current resource utilization status of the system. As workload pattern varies depending on the time of the day, simulation results from synthetic workload demonstrate a large opportunity for minimizing impacts on users' application workloads with the simple algorithm that adjusts re-replication based on current resource utilization. Our approach can reduce performance impacts on users' application workloads while ensuring the same reliability level as default HDFS can provide.
ER -