Hadoop, one of the most popular MapReduce frameworks, has been actively studied as a means of analyzing large-scale data efficiently. Most large-scale data analysis applications, such as data clustering, must apply the same map and reduce functions repeatedly. However, Hadoop cannot provide optimal performance for iterative MapReduce jobs because it derives a result from a single phase of map and reduce functions. To solve this problem, we propose a new efficient resource management framework for iterative MapReduce processing in large-scale data analysis. First, we design an iterative job state machine for managing iterative MapReduce jobs. Second, we propose an invariant data caching mechanism for reducing the I/O cost of data accesses. Third, we propose an iterative resource management technique for efficiently managing the resources of a Hadoop cluster. Fourth, we devise a stop condition check mechanism for preventing unnecessary computation. Finally, we show that the proposed framework outperforms existing frameworks by comparing it against them.
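The iterative pattern the abstract describes can be illustrated with a small, single-process sketch. It is not the authors' framework and uses no Hadoop API: the function names (`map_phase`, `reduce_phase`, `iterate`), the one-dimensional clustering task, and the `tolerance` and `max_iterations` parameters are all assumptions chosen for illustration. It shows only the general idea of reusing invariant input data across iterations and stopping once the result no longer changes.

```python
from collections import defaultdict


def map_phase(points, centroids):
    """Map: emit (index of nearest centroid, point) for every point."""
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
        yield nearest, p


def reduce_phase(pairs, centroids):
    """Reduce: average the points assigned to each centroid."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # A centroid with no assigned points keeps its previous value.
    return [sum(groups[i]) / len(groups[i]) if groups[i] else c
            for i, c in enumerate(centroids)]


def iterate(points, centroids, tolerance=1e-9, max_iterations=100):
    """Repeat map and reduce until the centroids stop moving."""
    # Invariant data: materialized once and reused by every iteration
    # (a stand-in for caching; hypothetical, not the paper's mechanism).
    cached_points = list(points)
    iteration = 0
    for iteration in range(1, max_iterations + 1):
        updated = reduce_phase(map_phase(cached_points, centroids), centroids)
        shift = max(abs(a - b) for a, b in zip(updated, centroids))
        centroids = updated
        # Stop condition check: skip further iterations once converged.
        if shift <= tolerance:
            break
    return centroids, iteration


if __name__ == "__main__":
    final, rounds = iterate([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], [0.0, 5.0])
    print(final, rounds)
```

In this sketch only the centroids change between iterations, so the points are the invariant data; on a real cluster, avoiding a re-read of that input on every iteration is where the I/O saving the abstract mentions would come from.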
Seungtae HONG
Electronics and Telecommunications Research Institute
Kyongseok PARK
Korea Institute of Science and Technology Information
Chae-Deok LIM
Electronics and Telecommunications Research Institute
Jae-Woo CHANG
Chonbuk National University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Seungtae HONG, Kyongseok PARK, Chae-Deok LIM, Jae-Woo CHANG, "A New Efficient Resource Management Framework for Iterative MapReduce Processing in Large-Scale Data Analysis" in IEICE TRANSACTIONS on Information,
vol. E100-D, no. 4, pp. 704-717, April 2017, doi: 10.1587/transinf.2016DAP0013.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2016DAP0013/_p
@ARTICLE{e100-d_4_704,
author={Seungtae HONG and Kyongseok PARK and Chae-Deok LIM and Jae-Woo CHANG},
journal={IEICE TRANSACTIONS on Information},
title={A New Efficient Resource Management Framework for Iterative MapReduce Processing in Large-Scale Data Analysis},
year={2017},
volume={E100-D},
number={4},
pages={704-717},
abstract={To analyze large-scale data efficiently, studies on Hadoop, one of the most popular MapReduce frameworks, have been actively done. Meanwhile, most of the large-scale data analysis applications, e.g., data clustering, are required to do the same map and reduce functions repeatedly. However, Hadoop cannot provide an optimal performance for iterative MapReduce jobs because it derives a result by doing one phase of map and reduce functions. To solve the problems, in this paper, we propose a new efficient resource management framework for iterative MapReduce processing in large-scale data analysis. For this, we first design an iterative job state-machine for managing the iterative MapReduce jobs. Secondly, we propose an invariant data caching mechanism for reducing the I/O costs of data accesses. Thirdly, we propose an iterative resource management technique for efficiently managing the resources of a Hadoop cluster. Fourthly, we devise a stop condition check mechanism for preventing unnecessary computation. Finally, we show the performance superiority of the proposed framework by comparing it with the existing frameworks.},
keywords={},
doi={10.1587/transinf.2016DAP0013},
ISSN={1745-1361},
month={April},}
TY - JOUR
TI - A New Efficient Resource Management Framework for Iterative MapReduce Processing in Large-Scale Data Analysis
T2 - IEICE TRANSACTIONS on Information
SP - 704
EP - 717
AU - Seungtae HONG
AU - Kyongseok PARK
AU - Chae-Deok LIM
AU - Jae-Woo CHANG
PY - 2017
DO - 10.1587/transinf.2016DAP0013
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E100-D
IS - 4
JA - IEICE TRANSACTIONS on Information
Y1 - April 2017
AB - To analyze large-scale data efficiently, studies on Hadoop, one of the most popular MapReduce frameworks, have been actively done. Meanwhile, most of the large-scale data analysis applications, e.g., data clustering, are required to do the same map and reduce functions repeatedly. However, Hadoop cannot provide an optimal performance for iterative MapReduce jobs because it derives a result by doing one phase of map and reduce functions. To solve the problems, in this paper, we propose a new efficient resource management framework for iterative MapReduce processing in large-scale data analysis. For this, we first design an iterative job state-machine for managing the iterative MapReduce jobs. Secondly, we propose an invariant data caching mechanism for reducing the I/O costs of data accesses. Thirdly, we propose an iterative resource management technique for efficiently managing the resources of a Hadoop cluster. Fourthly, we devise a stop condition check mechanism for preventing unnecessary computation. Finally, we show the performance superiority of the proposed framework by comparing it with the existing frameworks.
ER -