System Status Aware Hadoop Scheduling Methods for Job Performance Improvement

Masatoshi KAWARASAKI; Hyuma WATANABE

doi:10.1587/transinf.2014EDP7385

Summary:

MapReduce and its open-source implementation, Hadoop, are now widely deployed for big data analysis. Because MapReduce runs over a large cluster of machines, data transfer often becomes a bottleneck in job processing. In this paper, we explore the influence of data transfer on job processing performance and analyze how job performance deteriorates under data-transfer-induced congestion at disk I/O and/or network I/O. Based on this analysis, we extend Hadoop's Heartbeat messages to carry the real-time system status of each machine, such as disk I/O and link usage rates. This enhancement makes Hadoop's scheduler aware of each machine's workload so that it can make more accurate scheduling decisions. We evaluate the effectiveness of the enhanced scheduling methods experimentally and compare the proposed scheduling policies.
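
The general idea of status-aware scheduling can be illustrated with a minimal Python sketch. It is not the paper's implementation and does not use Hadoop's actual API: the `Heartbeat` fields, the `StatusAwareScheduler` class, and the 0.8 utilization thresholds are all hypothetical names and values chosen for illustration. It only shows one plausible way a scheduler could use disk and link utilization reported in heartbeats when choosing a node for the next task.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Heartbeat:
    """Hypothetical heartbeat payload; field names are illustrative only."""
    node: str
    free_slots: int
    disk_io_util: float  # fraction of disk I/O capacity in use, 0.0 to 1.0
    link_util: float     # fraction of network link capacity in use, 0.0 to 1.0


class StatusAwareScheduler:
    """Toy scheduler that skips congested nodes (assumed policy, not the paper's)."""

    def __init__(self, disk_limit: float = 0.8, link_limit: float = 0.8):
        self.disk_limit = disk_limit
        self.link_limit = link_limit
        self.latest: Dict[str, Heartbeat] = {}

    def on_heartbeat(self, hb: Heartbeat) -> None:
        # Keep only the most recent status report per node.
        self.latest[hb.node] = hb

    def pick_node(self) -> Optional[str]:
        # Consider nodes with a free slot whose disk and link are below the limits.
        candidates = [
            hb for hb in self.latest.values()
            if hb.free_slots > 0
            and hb.disk_io_util < self.disk_limit
            and hb.link_util < self.link_limit
        ]
        if not candidates:
            return None  # every node is busy or congested; defer the task
        # Prefer the node whose most loaded resource is least utilized.
        best = min(candidates, key=lambda hb: max(hb.disk_io_util, hb.link_util))
        return best.node
```

A caller would feed each incoming heartbeat to `on_heartbeat` and call `pick_node` whenever a task is ready; returning `None` models deferring the assignment rather than adding load to an already congested machine.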

Publication: IEICE TRANSACTIONS on Information Vol.E98-D No.7 pp.1275-1285

Publication Date: 2015/07/01

Publicized: 2015/03/26

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2014EDP7385

Type of Manuscript: PAPER

Category: Fundamentals of Information Systems

Authors

Masatoshi KAWARASAKI
University of Tsukuba
Hyuma WATANABE
University of Tsukuba

Keywords

Hadoop, MapReduce, distributed computing, task scheduling, job performance

Masatoshi KAWARASAKI, Hyuma WATANABE, "System Status Aware Hadoop Scheduling Methods for Job Performance Improvement" in IEICE TRANSACTIONS on Information, vol. E98-D, no. 7, pp. 1275-1285, July 2015, doi: 10.1587/transinf.2014EDP7385.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2014EDP7385/_p

@ARTICLE{e98-d_7_1275,
author={Masatoshi KAWARASAKI and Hyuma WATANABE},
journal={IEICE TRANSACTIONS on Information},
title={System Status Aware Hadoop Scheduling Methods for Job Performance Improvement},
year={2015},
volume={E98-D},
number={7},
pages={1275-1285},
abstract={MapReduce and its open software implementation Hadoop are now widely deployed for big data analysis. As MapReduce runs over a cluster of massive machines, data transfer often becomes a bottleneck in job processing. In this paper, we explore the influence of data transfer to job processing performance and analyze the mechanism of job performance deterioration caused by data transfer oriented congestion at disk I/O and/or network I/O. Based on this analysis, we update Hadoop's Heartbeat messages to contain the real time system status for each machine, like disk I/O and link usage rate. This enhancement makes Hadoop's scheduler be aware of each machine's workload and make more accurate decision of scheduling. The experiment has been done to evaluate the effectiveness of enhanced scheduling methods and discussions are provided to compare the several proposed scheduling policies.},
keywords={Hadoop, MapReduce, distributed computing, task scheduling, job performance},
doi={10.1587/transinf.2014EDP7385},
ISSN={1745-1361},
month={July}
}

TY - JOUR
TI - System Status Aware Hadoop Scheduling Methods for Job Performance Improvement
T2 - IEICE TRANSACTIONS on Information
SP - 1275
EP - 1285
AU - Masatoshi KAWARASAKI
AU - Hyuma WATANABE
PY - 2015
DO - 10.1587/transinf.2014EDP7385
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E98-D
IS - 7
JA - IEICE TRANSACTIONS on Information
Y1 - July 2015
AB - MapReduce and its open software implementation Hadoop are now widely deployed for big data analysis. As MapReduce runs over a cluster of massive machines, data transfer often becomes a bottleneck in job processing. In this paper, we explore the influence of data transfer to job processing performance and analyze the mechanism of job performance deterioration caused by data transfer oriented congestion at disk I/O and/or network I/O. Based on this analysis, we update Hadoop's Heartbeat messages to contain the real time system status for each machine, like disk I/O and link usage rate. This enhancement makes Hadoop's scheduler be aware of each machine's workload and make more accurate decision of scheduling. The experiment has been done to evaluate the effectiveness of enhanced scheduling methods and discussions are provided to compare the several proposed scheduling policies.
ER -