An Efficiency-Aware Scheduling for Data-Intensive Computations on MapReduce Clusters

Hui ZHAO; Shuqiang YANG; Hua FAN; Zhikun CHEN; Jinghu XU

doi:10.1587/transinf.E96.D.2654

IEICE TRANSACTIONS on Information

Summary:

Scheduling plays a key role in MapReduce systems. In this paper, we explore the efficiency of a MapReduce cluster running many independent, continuously arriving MapReduce jobs. Data locality and load balancing are two important factors in improving computation efficiency in MapReduce systems for data-intensive computations. Traditional cluster scheduling technologies are not well suited to the MapReduce environment, and although several schedulers are in use for the popular open-source Hadoop MapReduce implementation, none of them optimizes both factors well. Our main objective is to minimize the total flowtime of all jobs; because this is a strongly NP-hard problem, we adopt effective heuristics to seek satisfactory solutions. We formalize the scheduling problem as a job selection problem and propose a load-balance-aware job selection algorithm. At the task level, we design a strict data-locality task scheduling algorithm for map tasks on map machines and a load-balance-aware scheduling algorithm for reduce tasks on reduce machines. Comprehensive experiments compare our scheduling strategy with well-known Hadoop scheduling strategies, and the results validate the efficiency of the proposed strategy.
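The abstract names the scheduler's components but not their exact rules, so the following Python sketch only illustrates the general shape of the ideas under stated assumptions; it is not the authors' algorithm. Total flowtime is taken as the sum over jobs of (C_j - r_j), completion time minus arrival time. The names `Job`, `select_job`, `place_map_task`, `place_reduce_task`, `remaining_tasks` and `local_machines` are hypothetical. The job-selection rule (fewest remaining tasks among jobs whose input lives on the requesting machine) is a swapped-in shortest-remaining-work heuristic, and reading "load-balance-aware" reduce placement as "pick the least-loaded machine" is likewise an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Job:
    job_id: str
    arrival: float                      # release time r_j
    remaining_tasks: int                # unscheduled tasks (hypothetical bookkeeping)
    local_machines: Set[str] = field(default_factory=set)  # machines holding input blocks
    completion: Optional[float] = None  # completion time C_j, set when the job finishes


def total_flowtime(jobs: List[Job]) -> float:
    """Objective: sum of (C_j - r_j) over jobs that have finished."""
    return sum(j.completion - j.arrival for j in jobs if j.completion is not None)


def select_job(pending: List[Job], machine: str) -> Optional[Job]:
    """Hypothetical job selection (assumption, not the paper's rule): among jobs
    with input data on `machine`, prefer the fewest remaining tasks, breaking
    ties by earlier arrival."""
    candidates = [j for j in pending
                  if machine in j.local_machines and j.remaining_tasks > 0]
    if not candidates:
        return None
    return min(candidates, key=lambda j: (j.remaining_tasks, j.arrival))


def place_map_task(replicas: List[str], free_slots: Dict[str, int]) -> Optional[str]:
    """Strict data locality: run a map task only on a machine that stores a
    replica of its input block; return None (the task waits) rather than
    falling back to a remote machine."""
    for m in replicas:
        if free_slots.get(m, 0) > 0:
            return m
    return None


def place_reduce_task(reduce_load: Dict[str, float]) -> str:
    """Assumed load-balance rule: pick the reduce machine with the smallest
    current load."""
    return min(reduce_load, key=reduce_load.get)


if __name__ == "__main__":
    jobs = [Job("a", arrival=0.0, remaining_tasks=4, local_machines={"m1", "m2"}),
            Job("b", arrival=1.0, remaining_tasks=2, local_machines={"m2"})]
    print(select_job(jobs, "m2").job_id)                      # b (fewer remaining tasks)
    print(place_map_task(["m3", "m1"], {"m1": 1, "m3": 0}))   # m1 (m3 has no free slot)
    print(place_reduce_task({"r1": 3.0, "r2": 1.5}))          # r2 (least loaded)
    jobs[0].completion, jobs[1].completion = 10.0, 4.0
    print(total_flowtime(jobs))                               # (10 - 0) + (4 - 1) = 13.0
```

Returning `None` from `place_map_task` is what makes locality strict in this sketch: a map task waits for a free slot on a replica holder instead of running remotely, trading some waiting time for avoided network transfer.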

Publication: IEICE TRANSACTIONS on Information Vol.E96-D No.12 pp.2654-2662

Publication Date: 2013/12/01

Online ISSN: 1745-1361

DOI: 10.1587/transinf.E96.D.2654

Type of Manuscript: Special Section PAPER (Special Section on Parallel and Distributed Computing and Networking)

Authors

Hui ZHAO
  National University of Defense Technology
Shuqiang YANG
  National University of Defense Technology
Hua FAN
  National University of Defense Technology
Zhikun CHEN
  National University of Defense Technology
Jinghu XU
  National University of Defense Technology

Keyword

data-intensive computation, MapReduce, Hadoop, algorithm design, scheduling, grid computing, data locality, cloud computing, flowtime

Hui ZHAO, Shuqiang YANG, Hua FAN, Zhikun CHEN, Jinghu XU, "An Efficiency-Aware Scheduling for Data-Intensive Computations on MapReduce Clusters" in IEICE TRANSACTIONS on Information, vol. E96-D, no. 12, pp. 2654-2662, December 2013, doi: 10.1587/transinf.E96.D.2654.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E96.D.2654/_p

@ARTICLE{e96-d_12_2654,
author={Hui ZHAO and Shuqiang YANG and Hua FAN and Zhikun CHEN and Jinghu XU},
journal={IEICE TRANSACTIONS on Information},
title={An Efficiency-Aware Scheduling for Data-Intensive Computations on MapReduce Clusters},
year={2013},
volume={E96-D},
number={12},
pages={2654-2662},
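keywords={data-intensive computation, MapReduce, Hadoop, algorithm design, scheduling, grid computing, data locality, cloud computing, flowtime},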
doi={10.1587/transinf.E96.D.2654},
ISSN={1745-1361},
month={December}
}

TY - JOUR
TI - An Efficiency-Aware Scheduling for Data-Intensive Computations on MapReduce Clusters
T2 - IEICE TRANSACTIONS on Information
SP - 2654
EP - 2662
AU - Hui ZHAO
AU - Shuqiang YANG
AU - Hua FAN
AU - Zhikun CHEN
AU - Jinghu XU
PY - 2013
DO - 10.1587/transinf.E96.D.2654
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E96-D
IS - 12
JA - IEICE TRANSACTIONS on Information
Y1 - 2013/12/01
ER -