The search functionality is under construction.

Keyword Search Result

[Keyword] MapReduce(26hit)

21-26hit(26hit)

  • Design and Evaluation of Materialized View as a Service for Smart City Services with Large-Scale House Log

    Shintaro YAMAMOTO  Shinsuke MATSUMOTO  Sachio SAIKI  Masahide NAKAMURA  

     
    PAPER

      Vol:
    E97-D No:7
      Page(s):
    1709-1718

    Smart city services are implemented using various data collected from houses and infrastructure within a city. As the volume and variety of the smart city data becomes huge, individual services have suffered from expensive computation effort and large processing time. In order to reduce the effort and time, this paper proposes a concept of Materialized View as a Service (MVaaS). Using the MVaaS, every application can easily and dynamically construct its own materialized view, in which the raw data is converted and stored in a convenient format with appropriate granularity. Thus, once the view is constructed, the application can quickly access necessary data. In this paper, we design a framework of MVaaS specifically for large-scale house log, managed in a smart-city data platform. In the framework, each application first specifies how the raw data should be filtered, grouped and aggregated. For a given data specification, MVaaS dynamically constructs a MapReduce batch program that converts the raw data into a desired view. The batch is then executed on Hadoop, and the resultant view is stored in HBase. We present case studies using house log in a real home network system. We also conduct an experimental evaluation to compare the response time between cases with and without MVaaS.

  • VAWS: Constructing Trusted Open Computing System of MapReduce with Verified Participants Open Access

    Yan DING  Huaimin WANG  Lifeng WEI  Songzheng CHEN  Hongyi FU  Xinhai XU  

     
    PAPER

      Vol:
    E97-D No:4
      Page(s):
    721-732

    MapReduce is commonly used as a parallel massive data processing model. When deploying it as a service over the open systems, the computational integrity of the participants is becoming an important issue due to the untrustworthy workers. Current duplication-based solutions can effectively solve non-collusive attacks, yet most of them require a centralized worker to re-compute additional sampled tasks to defend collusive attacks, which makes the worker a bottleneck. In this paper, we try to explore a trusted worker scheduling framework, named VAWS, to detect collusive attackers and assure the integrity of data processing without extra re-computation. Based on the historical results of verification, we construct an Integrity Attestation Graph (IAG) in VAWS to identify malicious mappers and remove them from the framework. To further improve the efficiency of identification, a verification-couple selection method with the IAG guidance is introduced to detect the potential accomplices of the confirmed malicious worker. We have proven the effectiveness of our proposed method on the improvement of system performance in theoretical analysis. Intensive experiments show the accuracy of VAWS is over 97% and the overhead of computation is closed to the ideal value of 2 with the increasing of the number of map tasks in our scheme.

  • Towards Trusted Result Verification in Mass Data Processing Service

    Yan DING  Huaimin WANG  Peichang SHI  Hongyi FU  Xinhai XU  

     
    PAPER

      Vol:
    E97-B No:1
      Page(s):
    19-28

    Computation integrity is difficult to verify when mass data processing is outsourced. Current integrity protection mechanisms and policies verify results generated by participating nodes within a computing environment of service providers (SP), which cannot prevent the subjective cheating of SPs. This paper provides an analysis and modeling of computation integrity for mass data processing services. A third-party sampling-result verification method, named TS-TRV, is proposed to prevent lazy cheating by SPs. TS-TRV is a general solution of verification on the intermediate results of common MapReduce jobs, and it utilizes the powerful computing capability of SPs to support verification computing, thus lessening the computing and transmission burdens of the verifier. Theoretical analysis indicates that TS-TRV is effective on detecting the incorrect results with no false positivity and almost no false negativity, while ensuring the authenticity of sampling. Intensive experiments show that the cheating detection rate of TS-TRV achieves over 99% with only a few samples needed, the computation overhead is mainly on the SP, while the network transmission overhead of TS-TRV is only O(log N).

  • An Efficiency-Aware Scheduling for Data-Intensive Computations on MapReduce Clusters

    Hui ZHAO  Shuqiang YANG  Hua FAN  Zhikun CHEN  Jinghu XU  

     
    PAPER

      Vol:
    E96-D No:12
      Page(s):
    2654-2662

    Scheduling plays a key role in MapReduce systems. In this paper, we explore the efficiency of an MapReduce cluster running lots of independent and continuously arriving MapReduce jobs. Data locality and load balancing are two important factors to improve computation efficiency in MapReduce systems for data-intensive computations. Traditional cluster scheduling technologies are not well suitable for MapReduce environment, there are some in-used schedulers for the popular open-source Hadoop MapReduce implementation, however, they can not well optimize both factors. Our main objective is to minimize total flowtime of all jobs, given it's a strong NP-hard problem, we adopt some effective heuristics to seek satisfied solution. In this paper, we formalize the scheduling problem as job selection problem, a load balance aware job selection algorithm is proposed, in task level we design a strict data locality tasks scheduling algorithm for map tasks on map machines and a load balance aware scheduling algorithm for reduce tasks on reduce machines. Comprehensive experiments have been conducted to compare our scheduling strategy with well-known Hadoop scheduling strategies. The experimental results validate the efficiency of our proposed scheduling strategy.

  • Scalable and Adaptive Graph Querying with MapReduce

    Song-Hyon KIM  Kyong-Ha LEE  Inchul SONG  Hyebong CHOI  Yoon-Joon LEE  

     
    LETTER-Fundamentals of Information Systems

      Vol:
    E96-D No:9
      Page(s):
    2126-2130

    We address the problem of processing graph pattern matching queries over a massive set of data graphs in this letter. As the number of data graphs is growing rapidly, it is often hard to process such queries with serial algorithms in a timely manner. We propose a distributed graph querying algorithm, which employs feature-based comparison and a filter-and-verify scheme working on the MapReduce framework. Moreover, we devise an efficient scheme that adaptively tunes a proper feature size at runtime by sampling data graphs. With various experiments, we show that the proposed method outperforms conventional algorithms in terms of scalability and efficiency.

  • Skew-Tolerant Key Distribution for Load Balancing in MapReduce

    Jihoon SON  Hyunsik CHOI  Yon Dohn CHUNG  

     
    LETTER-Data Engineering, Web Information Systems

      Vol:
    E95-D No:2
      Page(s):
    677-680

    MapReduce is a parallel processing framework for large scale data. In the reduce phase, MapReduce employs the hash scheme in order to distribute data sharing the same key across cluster nodes. However, this approach is not robust for the skewed data distribution. In this paper, we propose a skew-tolerant key distribution method for MapReduce. The proposed method assigns keys to cluster nodes balancing their workloads. We implemented our proposed method on Hadoop. Through experiments, we evaluate the performance of the proposed method in comparison with the conventional method.

21-26hit(26hit)