Lei CHEN, Wei LU, Ergude BAO, Liqiang WANG, Weiwei XING, Yuanyuan CAI
MapReduce is an effective framework for processing large datasets in parallel over a cluster. Data locality and data skew on the reduce side are two essential issues in MapReduce. Improving data locality can decrease network traffic by moving reduce tasks to the nodes where the reducer input data is located, while data skew leads to load imbalance among reducer nodes. Partitioning is an important feature of MapReduce because it determines the reducer nodes to which map output results are sent. Therefore, an effective partitioner can improve MapReduce performance by increasing data locality and decreasing data skew on the reduce side. Previous studies considering both issues can be divided into two categories: those that preferentially improve data locality, such as LEEN, and those that preferentially improve load balance, such as CLP. However, these studies ignore the fact that, for different types of jobs, prioritizing data locality or reduce-side data skew can affect the execution time differently. In this paper, we propose a naive Bayes classifier based partitioner, named BAPM, which achieves better performance by automatically choosing the more suitable algorithm (LEEN or CLP) with a naive Bayes classifier that takes job type and bandwidth as classification attributes. Our experiments are performed on a Hadoop cluster, and the results show that BAPM boosts the computing performance of MapReduce. The selection accuracy reaches 95.15%. Furthermore, compared with other popular algorithms, BAPM achieves an improvement of up to 31.31% under specific bandwidths.
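As a rough illustration of the selection step described above, the following Python sketch trains a tiny naive Bayes classifier on labeled runs and uses it to pick between LEEN and CLP. It is a minimal sketch, not the authors' implementation: the attribute values, training records, and function names are assumptions for illustration only.

```python
# Minimal sketch (not the BAPM implementation): a naive Bayes classifier that
# picks "LEEN" or "CLP" from two categorical attributes, job type and
# bandwidth level. The training records below are hypothetical placeholders.
from collections import Counter, defaultdict

# Each record: (job_type, bandwidth_level) -> best partitioner observed offline.
TRAINING = [
    (("shuffle-heavy", "low"),  "LEEN"),
    (("shuffle-heavy", "high"), "CLP"),
    (("cpu-bound",     "low"),  "LEEN"),
    (("cpu-bound",     "high"), "CLP"),
    # ... more labeled runs would be collected in practice
]

def train(records):
    """Estimate P(class) and P(attribute=value | class) with simple add-one smoothing."""
    prior = Counter(label for _, label in records)
    cond = defaultdict(Counter)            # (attr_index, label) -> value counts
    for attrs, label in records:
        for i, v in enumerate(attrs):
            cond[(i, label)][v] += 1
    return prior, cond, len(records)

def choose_partitioner(attrs, prior, cond, total):
    """Return the class (LEEN or CLP) with the highest posterior score."""
    best, best_score = None, float("-inf")
    for label, count in prior.items():
        score = count / total
        for i, v in enumerate(attrs):
            counts = cond[(i, label)]
            # add-one smoothing, with one extra slot reserved for unseen values
            score *= (counts[v] + 1) / (sum(counts.values()) + len(counts) + 1)
        if score > best_score:
            best, best_score = label, score
    return best

prior, cond, total = train(TRAINING)
print(choose_partitioner(("shuffle-heavy", "low"), prior, cond, total))  # prints LEEN
```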
Wei LU, Weidong WANG, Ergude BAO, Liqiang WANG, Weiwei XING, Yue CHEN
Web Service Composition (WSC) has been well recognized as a convenient and flexible way of service sharing and integration in service-oriented application fields. WSC aims at selecting and composing a set of initial services with respect to the Quality of Service (QoS) values of their attributes (e.g., price), in order to complete a complex task and meet user requirements. A major research challenge of the QoS-aware WSC problem is to select a proper set of services that maximizes the QoS of the composite service while meeting several QoS constraints on various attributes, e.g., total price or runtime. In this article, a fast algorithm based on QoS-aware sampling (FAQS) is proposed, which can efficiently find a near-optimal composition from sampled services. FAQS consists of five steps. 1) QoS normalization is performed to unify the different metrics of the QoS attributes. 2) The normalized services are sampled and categorized such that each class contains a similar number of services. 3) The frequencies of the sampled services are calculated so that the composed services are the most frequent ones, which ensures that the sampled services cover as many of the initial services as possible. 4) The sampled services are composed by solving a linear programming problem. 5) The initial composition results are further optimized by solving a modified multi-choice multi-dimensional knapsack problem (MMKP). Experimental results indicate that FAQS is much faster than existing algorithms and obtains stable, near-optimal results.
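To make step 1 (QoS normalization) concrete, the following Python sketch min-max normalizes benefit and cost attributes onto a common [0, 1] scale where larger is always better. This is a minimal sketch under assumed conventions, not the FAQS implementation; the attribute names and sample values are hypothetical.

```python
# Minimal sketch (not the FAQS implementation): normalize heterogeneous QoS
# attributes to [0, 1] so that larger is always better. "Cost" attributes
# (price, response time) are inverted; "benefit" attributes (availability)
# are scaled directly. Attribute names and sample values are hypothetical.

BENEFIT_ATTRS = {"availability"}           # higher raw value is better
COST_ATTRS = {"price", "response_time"}    # lower raw value is better

def normalize(services):
    """Min-max normalize each QoS attribute across all candidate services."""
    attrs = BENEFIT_ATTRS | COST_ATTRS
    bounds = {a: (min(s[a] for s in services), max(s[a] for s in services))
              for a in attrs}
    normalized = []
    for s in services:
        ns = {}
        for a in attrs:
            lo, hi = bounds[a]
            if hi == lo:                   # constant attribute: neutral score
                ns[a] = 1.0
            elif a in BENEFIT_ATTRS:
                ns[a] = (s[a] - lo) / (hi - lo)
            else:                          # cost attribute: invert the scale
                ns[a] = (hi - s[a]) / (hi - lo)
        normalized.append(ns)
    return normalized

candidates = [
    {"price": 5.0, "response_time": 120, "availability": 0.99},
    {"price": 2.0, "response_time": 300, "availability": 0.95},
    {"price": 3.5, "response_time": 200, "availability": 0.97},
]
for ns in normalize(candidates):
    print(ns)
```

After this normalization, every attribute of every candidate service lies on the same scale, which is what allows the later sampling, linear-programming, and MMKP steps to compare and combine services across different QoS dimensions.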