IEICE global.ieice.org Site

Keyword Search Result

[Keyword] MapReduce(26hit)

21-26hit(26hit)

Design and Evaluation of Materialized View as a Service for Smart City Services with Large-Scale House Log
Shintaro YAMAMOTO Shinsuke MATSUMOTO Sachio SAIKI Masahide NAKAMURA

PAPER

Vol:
E97-D No:7
Page(s):
1709-1718
Smart city services are implemented using various data collected from houses and infrastructure within a city. As the volume and variety of the smart city data becomes huge, individual services have suffered from expensive computation effort and large processing time. In order to reduce the effort and time, this paper proposes a concept of Materialized View as a Service (MVaaS). Using the MVaaS, every application can easily and dynamically construct its own materialized view, in which the raw data is converted and stored in a convenient format with appropriate granularity. Thus, once the view is constructed, the application can quickly access necessary data. In this paper, we design a framework of MVaaS specifically for large-scale house log, managed in a smart-city data platform. In the framework, each application first specifies how the raw data should be filtered, grouped and aggregated. For a given data specification, MVaaS dynamically constructs a MapReduce batch program that converts the raw data into a desired view. The batch is then executed on Hadoop, and the resultant view is stored in HBase. We present case studies using house log in a real home network system. We also conduct an experimental evaluation to compare the response time between cases with and without MVaaS.
VAWS: Constructing Trusted Open Computing System of MapReduce with Verified Participants Open Access
Yan DING Huaimin WANG Lifeng WEI Songzheng CHEN Hongyi FU Xinhai XU

PAPER

Vol:
E97-D No:4
Page(s):
721-732
MapReduce is commonly used as a parallel massive data processing model. When deploying it as a service over the open systems, the computational integrity of the participants is becoming an important issue due to the untrustworthy workers. Current duplication-based solutions can effectively solve non-collusive attacks, yet most of them require a centralized worker to re-compute additional sampled tasks to defend collusive attacks, which makes the worker a bottleneck. In this paper, we try to explore a trusted worker scheduling framework, named VAWS, to detect collusive attackers and assure the integrity of data processing without extra re-computation. Based on the historical results of verification, we construct an Integrity Attestation Graph (IAG) in VAWS to identify malicious mappers and remove them from the framework. To further improve the efficiency of identification, a verification-couple selection method with the IAG guidance is introduced to detect the potential accomplices of the confirmed malicious worker. We have proven the effectiveness of our proposed method on the improvement of system performance in theoretical analysis. Intensive experiments show the accuracy of VAWS is over 97% and the overhead of computation is closed to the ideal value of 2 with the increasing of the number of map tasks in our scheme.
Towards Trusted Result Verification in Mass Data Processing Service
Yan DING Huaimin WANG Peichang SHI Hongyi FU Xinhai XU

PAPER

Vol:
E97-B No:1
Page(s):
19-28
Computation integrity is difficult to verify when mass data processing is outsourced. Current integrity protection mechanisms and policies verify results generated by participating nodes within a computing environment of service providers (SP), which cannot prevent the subjective cheating of SPs. This paper provides an analysis and modeling of computation integrity for mass data processing services. A third-party sampling-result verification method, named TS-TRV, is proposed to prevent lazy cheating by SPs. TS-TRV is a general solution of verification on the intermediate results of common MapReduce jobs, and it utilizes the powerful computing capability of SPs to support verification computing, thus lessening the computing and transmission burdens of the verifier. Theoretical analysis indicates that TS-TRV is effective on detecting the incorrect results with no false positivity and almost no false negativity, while ensuring the authenticity of sampling. Intensive experiments show that the cheating detection rate of TS-TRV achieves over 99% with only a few samples needed, the computation overhead is mainly on the SP, while the network transmission overhead of TS-TRV is only O(log N).
An Efficiency-Aware Scheduling for Data-Intensive Computations on MapReduce Clusters
Hui ZHAO Shuqiang YANG Hua FAN Zhikun CHEN Jinghu XU

PAPER

Vol:
E96-D No:12
Page(s):
2654-2662
Scheduling plays a key role in MapReduce systems. In this paper, we explore the efficiency of an MapReduce cluster running lots of independent and continuously arriving MapReduce jobs. Data locality and load balancing are two important factors to improve computation efficiency in MapReduce systems for data-intensive computations. Traditional cluster scheduling technologies are not well suitable for MapReduce environment, there are some in-used schedulers for the popular open-source Hadoop MapReduce implementation, however, they can not well optimize both factors. Our main objective is to minimize total flowtime of all jobs, given it's a strong NP-hard problem, we adopt some effective heuristics to seek satisfied solution. In this paper, we formalize the scheduling problem as job selection problem, a load balance aware job selection algorithm is proposed, in task level we design a strict data locality tasks scheduling algorithm for map tasks on map machines and a load balance aware scheduling algorithm for reduce tasks on reduce machines. Comprehensive experiments have been conducted to compare our scheduling strategy with well-known Hadoop scheduling strategies. The experimental results validate the efficiency of our proposed scheduling strategy.
Scalable and Adaptive Graph Querying with MapReduce
Song-Hyon KIM Kyong-Ha LEE Inchul SONG Hyebong CHOI Yoon-Joon LEE

LETTER-Fundamentals of Information Systems

Vol:
E96-D No:9
Page(s):
2126-2130
We address the problem of processing graph pattern matching queries over a massive set of data graphs in this letter. As the number of data graphs is growing rapidly, it is often hard to process such queries with serial algorithms in a timely manner. We propose a distributed graph querying algorithm, which employs feature-based comparison and a filter-and-verify scheme working on the MapReduce framework. Moreover, we devise an efficient scheme that adaptively tunes a proper feature size at runtime by sampling data graphs. With various experiments, we show that the proposed method outperforms conventional algorithms in terms of scalability and efficiency.
Skew-Tolerant Key Distribution for Load Balancing in MapReduce
Jihoon SON Hyunsik CHOI Yon Dohn CHUNG

LETTER-Data Engineering, Web Information Systems

Vol:
E95-D No:2
Page(s):
677-680
MapReduce is a parallel processing framework for large scale data. In the reduce phase, MapReduce employs the hash scheme in order to distribute data sharing the same key across cluster nodes. However, this approach is not robust for the skewed data distribution. In this paper, we propose a skew-tolerant key distribution method for MapReduce. The proposed method assigns keys to cluster nodes balancing their workloads. We implemented our proposed method on Hadoop. Through experiments, we evaluate the performance of the proposed method in comparison with the conventional method.