Avoiding Performance Impacts by Re-Replication Workload Shifting in HDFS Based Cloud Storage

Thanda SHWE; Masayoshi ARITSUGI

doi:10.1587/transinf.2018PAP0017

IEICE TRANSACTIONS on Information

Summary :

Data replication in cloud storage systems brings many benefits from both reliability and performance perspectives, such as fault tolerance, data availability, data locality, and load balancing. However, each time a datanode fails, the data blocks stored on the failed datanode must be restored to maintain the replication level. This can be a heavy burden for a system whose resources are already highly utilized by users' application workloads. Although there have been many proposals for replication, re-replication has not yet been properly addressed. In this paper, we present a deferred re-replication algorithm that dynamically shifts the re-replication workload based on the current resource utilization of the system. Because workload patterns vary with the time of day, simulation results on a synthetic workload demonstrate a large opportunity to minimize the impact on users' application workloads with a simple algorithm that adjusts re-replication based on current resource utilization. Our approach can reduce the performance impact on users' application workloads while ensuring the same reliability level that default HDFS provides.
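
The summary does not spell out the algorithm itself, so the following is only a minimal sketch of the general idea of deferring re-replication while the cluster is busy. It is not the authors' method. The class name `DeferredReReplicator`, the utilization threshold, the tick-based deadline, and the batch size are all hypothetical parameters introduced for illustration; the deadline stands in for the requirement that deferral must not weaken reliability.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class DeferredReReplicator:
    """Toy scheduler: defer re-replication while utilization is high.

    All names and default values here are assumptions for illustration,
    not taken from the paper or from HDFS.
    """
    utilization_threshold: float = 0.7  # hypothetical "busy" cutoff (0..1)
    max_defer_ticks: int = 6            # hypothetical upper bound on deferral
    pending: deque = field(default_factory=deque)  # (block_id, enqueue_tick)

    def on_datanode_failure(self, block_ids, now):
        """Queue every block that lost a replica on the failed datanode."""
        for block_id in block_ids:
            self.pending.append((block_id, now))

    def tick(self, now, utilization, batch_size=2):
        """Return the block ids to re-replicate during this tick."""
        scheduled = []
        # Blocks deferred for too long are scheduled regardless of load,
        # so deferral cannot postpone recovery indefinitely.
        while self.pending and now - self.pending[0][1] >= self.max_defer_ticks:
            scheduled.append(self.pending.popleft()[0])
        # When the system is lightly loaded, drain up to batch_size blocks.
        if utilization < self.utilization_threshold:
            while self.pending and len(scheduled) < batch_size:
                scheduled.append(self.pending.popleft()[0])
        return scheduled


if __name__ == "__main__":
    sched = DeferredReReplicator(utilization_threshold=0.7, max_defer_ticks=3)
    sched.on_datanode_failure(["b1", "b2", "b3"], now=0)
    for tick, load in [(1, 0.9), (2, 0.5), (3, 0.95)]:
        print(tick, load, sched.tick(now=tick, utilization=load))
```

In this sketch, work is held back at high load, released in small batches at low load, and forced once the deadline passes. A real policy would also need to prioritize blocks by how many replicas remain, which the sketch omits.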

Publication: IEICE TRANSACTIONS on Information Vol.E101-D No.12 pp.2958-2967

Publication Date: 2018/12/01

Publicized: 2018/09/18

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2018PAP0017

Type of Manuscript: Special Section PAPER (Special Section on Parallel and Distributed Computing and Networking)

Category: Cloud Computing

Authors

Thanda SHWE
Kumamoto University
Masayoshi ARITSUGI
Kumamoto University

Keywords

re-replication, fault tolerance, data reliability, HDFS

Cite this

Thanda SHWE, Masayoshi ARITSUGI, "Avoiding Performance Impacts by Re-Replication Workload Shifting in HDFS Based Cloud Storage" in IEICE TRANSACTIONS on Information, vol. E101-D, no. 12, pp. 2958-2967, December 2018, doi: 10.1587/transinf.2018PAP0017.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2018PAP0017/_p

@ARTICLE{e101-d_12_2958,
author={Thanda SHWE and Masayoshi ARITSUGI},
journal={IEICE TRANSACTIONS on Information},
title={Avoiding Performance Impacts by Re-Replication Workload Shifting in HDFS Based Cloud Storage},
year={2018},
volume={E101-D},
number={12},
pages={2958-2967},
abstract={Data replication in cloud storage systems brings a lot of benefits, such as fault tolerance, data availability, data locality and load balancing both from reliability and performance perspectives. However, each time a datanode fails, data blocks stored on the failed datanode must be restored to maintain replication level. This may be a large burden for the system in which resources are highly utilized with users' application workloads. Although there have been many proposals for replication, the approach of re-replication has not been properly addressed yet. In this paper, we present a deferred re-replication algorithm to dynamically shift the re-replication workload based on current resource utilization status of the system. As workload pattern varies depending on the time of the day, simulation results from synthetic workload demonstrate a large opportunity for minimizing impacts on users' application workloads with the simple algorithm that adjusts re-replication based on current resource utilization. Our approach can reduce performance impacts on users' application workloads while ensuring the same reliability level as default HDFS can provide.},
keywords={re-replication, fault tolerance, data reliability, HDFS},
doi={10.1587/transinf.2018PAP0017},
ISSN={1745-1361},
month={December}
}

TY - JOUR
TI - Avoiding Performance Impacts by Re-Replication Workload Shifting in HDFS Based Cloud Storage
T2 - IEICE TRANSACTIONS on Information
SP - 2958
EP - 2967
AU - Thanda SHWE
AU - Masayoshi ARITSUGI
PY - 2018
DO - 10.1587/transinf.2018PAP0017
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E101-D
IS - 12
JA - IEICE TRANSACTIONS on Information
Y1 - December 2018
AB - Data replication in cloud storage systems brings a lot of benefits, such as fault tolerance, data availability, data locality and load balancing both from reliability and performance perspectives. However, each time a datanode fails, data blocks stored on the failed datanode must be restored to maintain replication level. This may be a large burden for the system in which resources are highly utilized with users' application workloads. Although there have been many proposals for replication, the approach of re-replication has not been properly addressed yet. In this paper, we present a deferred re-replication algorithm to dynamically shift the re-replication workload based on current resource utilization status of the system. As workload pattern varies depending on the time of the day, simulation results from synthetic workload demonstrate a large opportunity for minimizing impacts on users' application workloads with the simple algorithm that adjusts re-replication based on current resource utilization. Our approach can reduce performance impacts on users' application workloads while ensuring the same reliability level as default HDFS can provide.
ER -
