Keyword Search Results

[Keyword] deduplication (7 hits)

Results 1-7 of 7
  • Parity Data De-Duplication in All Flash Array-Based OpenStack Cloud Block Storage

    Huiseong HEO  Cheongjin AHN  Deok-Hwan KIM  

    LETTER-Data Engineering, Web Information Systems

    Publicized: 2016/02/02
    Vol: E99-D No:5
    Page(s): 1384-1387

    In recent years, the need to build solid state drive (SSD)-based cloud storage systems has been growing in order to process the big data generated by the many Internet of Things devices and Internet users. Because such cloud systems require high-performance, reliable storage, the use of flash-based Redundant Array of Independent Disks (RAID) will increase. In flash-based RAID storage, however, parity data must be updated with every data write operation, which exhausts the SSDs' limited lifespan more quickly. To solve this problem, this letter proposes parity data deduplication for OpenStack cloud storage systems using an all flash array. Unlike the traditional data deduplication method, it removes only parity data, which are stored in the parity disks of the all flash array. Experiments show that the proposed parity data deduplication method reduces the number of parity data write operations more efficiently than the traditional data deduplication method.
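
    To make the idea concrete, here is a minimal sketch (not the authors' implementation; ParityDedupStore and its members are invented names) in which each parity block is fingerprinted before it reaches the parity SSD and written only when the same content has not been stored before:

        import hashlib

        class ParityDedupStore:
            """Writes a parity block to the parity disk only if identical
            content has not been stored before; otherwise it records a
            reference to the existing copy."""

            def __init__(self):
                self.fingerprints = {}   # SHA-1 digest -> parity disk address
                self.parity_disk = []    # stand-in for the parity SSD
                self.device_writes = 0   # actual writes reaching the device

            def write_parity(self, block: bytes) -> int:
                digest = hashlib.sha1(block).digest()
                addr = self.fingerprints.get(digest)
                if addr is None:                   # new content: write it
                    addr = len(self.parity_disk)
                    self.parity_disk.append(block)
                    self.fingerprints[digest] = addr
                    self.device_writes += 1
                return addr                        # duplicate: reference only

        # Usage: RAID parity is the XOR of a stripe's data blocks, so
        # identical stripes yield identical parity that dedup can absorb.
        store = ParityDedupStore()
        parity = bytes(a ^ b for a, b in zip(b"\x00" * 4096, b"\xff" * 4096))
        store.write_parity(parity)
        store.write_parity(parity)   # second write never reaches the device
        assert store.device_writes == 1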

  • Offline Selective Data Deduplication for Primary Storage Systems

    Sejin PARK  Chanik PARK  

    PAPER-Data Engineering, Web Information Systems

    Publicized: 2015/10/26
    Vol: E99-D No:2
    Page(s): 370-382

    Data deduplication is a technology that eliminates redundant data to save storage space. Most previous studies on data deduplication target backup storage, where the deduplication ratio and throughput are what matter. However, data deduplication on primary storage has recently been receiving attention; in this case, I/O latency must be weighed equally with the deduplication ratio. Unfortunately, data deduplication causes a high sequential-read-latency problem. When a file is created, the file system allocates physically contiguous blocks to support low sequential-read latency. However, the data deduplication process rearranges the block mapping information to eliminate duplicate blocks. This rearrangement breaks the physical sequentiality of the blocks in a file, so a sequential-read request slows down to the speed of a random-read operation. In this paper, we propose a selective data deduplication scheme for primary storage systems. A selective scheme can achieve a high deduplication ratio and low I/O latency by applying different data-chunking methods to files according to their access characteristics. In the proposed system, file accesses are characterized by the recency and frequency of access to each file. No chunking is applied to update-intensive files, since deduplicating them yields little benefit. For sequential-read-intensive files, we apply big chunking to preserve their sequentiality on the media. For random-read-intensive files, small chunking is used to increase the deduplication ratio. Experimental evaluation showed that the proposed method achieves up to 86% of the ideal deduplication ratio and 97% of the sequential-read performance of a native file system.
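
    The selection policy lends itself to a short sketch. The following is a hedged illustration only: the thresholds, chunk sizes, and access statistics are invented here, not taken from the paper:

        # Illustrative parameters; the paper's actual policy and
        # statistics may differ.
        BIG_CHUNK = 1 << 20     # big chunks preserve on-media sequentiality
        SMALL_CHUNK = 4 << 10   # small chunks raise the deduplication ratio

        def choose_chunking(writes: int, seq_reads: int, rand_reads: int):
            """Map a file's access profile to a chunking method: None for
            update-intensive files, big chunks for sequential-read-intensive
            files, small chunks for random-read-intensive files."""
            total = writes + seq_reads + rand_reads
            if total == 0 or writes / total > 0.5:
                return None            # update-intensive: skip deduplication
            if seq_reads >= rand_reads:
                return BIG_CHUNK       # sequential-read-intensive
            return SMALL_CHUNK         # random-read-intensive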

  • A Deduplication-Enabled P2P Protocol for VM Image Distribution

    Choonhwa LEE  Sungho KIM  Eunsam KIM  

    LETTER-Information Network

    Publicized: 2015/02/19
    Vol: E98-D No:5
    Page(s): 1108-1111

    This paper presents a novel peer-to-peer protocol to efficiently distribute virtual machine images in a datacenter. Its primary idea is to improve the performance of peer-to-peer content delivery by employing deduplication to exploit the similarity both among and within VM images in cloud datacenters. The efficacy of the proposed scheme is validated through an evaluation that demonstrates substantial performance gains.
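
    As a rough sketch of how deduplication helps a peer-to-peer transfer (illustrative only; the chunk size and function names are assumptions, not the paper's protocol), a peer can advertise chunk fingerprints and download only the chunks it does not already hold:

        import hashlib

        CHUNK = 256 * 1024   # illustrative fixed chunk size

        def fingerprints(image: bytes) -> list:
            """Split a VM image into chunks and fingerprint each one."""
            return [hashlib.sha1(image[i:i + CHUNK]).digest()
                    for i in range(0, len(image), CHUNK)]

        def chunks_to_fetch(wanted: list, local: set) -> list:
            """A peer downloads only the chunks it does not already hold;
            chunks shared with images it has seen before are reused
            locally instead of transferred over the network."""
            return [i for i, fp in enumerate(wanted) if fp not in local]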

  • Fast Recovery and Low Cost Coexist: When Continuous Data Protection Meets the Cloud

    Yu GU  Chuanyi LIU  Dongsheng WANG  

    PAPER

    Vol: E97-D No:7
    Page(s): 1700-1708

    Cloud computing has risen as a popular new service paradigm with typical advantages such as ease of use, seemingly unlimited resources, and a pay-as-you-go pricing model. Cloud resources are more flexible and cost-effective than private or colocation resources, and thus better suited for storing the outdated backup data that continuous data protection (CDP) systems access infrequently. However, while the cloud achieves low cost, it may slow down the recovery procedure because of its low bandwidth and high latency. In this paper, a novel block-level CDP system architecture, MYCDP, is proposed that uses cloud resources as the back-end storage. Unlike traditional delta-encoding-based CDP approaches, which must traverse all dependent versions and decode the recovery point, MYCDP adopts a data deduplication mechanism to eliminate redundancy between all versions of all blocks and constructs a version index over all versions of the protected storage, so it can recover version data with a query-and-fetch process. With a specific version index data structure and a disk/memory hybrid cache module, MYCDP reduces storage space consumption and data transfer between local and cloud. It also supports deletion of arbitrary versions without the risk of invalidating other versions. Experimental results demonstrate that MYCDP achieves much lower cost than traditional local CDP approaches, while retaining almost the same recovery speed as a local deduplication approach for most recovery cases. Furthermore, MYCDP obtains both faster recovery and lower cost than cloud-based delta-encoding CDP approaches for any recovery point, and it yields greater savings when protecting multiple systems together.
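
    The query-and-fetch recovery path can be sketched as follows (a toy model; VersionIndex and its layout are invented for illustration and do not reproduce MYCDP's actual index structure or cache module):

        import bisect

        class VersionIndex:
            """Per-block sorted history of (timestamp, fingerprint) pairs."""

            def __init__(self):
                self.history = {}   # block number -> [(ts, fingerprint), ...]

            def record(self, block: int, ts: float, fp: bytes):
                bisect.insort(self.history.setdefault(block, []), (ts, fp))

            def version_at(self, block: int, ts: float):
                """Latest fingerprint of a block at or before time ts."""
                versions = self.history.get(block, [])
                i = bisect.bisect_right(versions, (ts, b"\xff" * 20)) - 1
                return versions[i][1] if i >= 0 else None

        def recover(index: VersionIndex, blocks, ts: float, chunk_store: dict):
            """Query-and-fetch recovery: one index lookup and one fetch from
            the deduplicated chunk store per block, with no delta decoding
            of intermediate versions."""
            return [chunk_store.get(index.version_at(b, ts)) for b in blocks]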

  • Efficient and Secure File Deduplication in Cloud Storage

    Youngjoo SHIN  Kwangjo KIM  

    PAPER-Fundamentals of Information Systems

    Vol: E97-D No:2
    Page(s): 184-197

    Outsourcing data to cloud storage brings new challenges for utilizing computing resources efficiently while maintaining the privacy and security of the outsourced data. Data deduplication is a technique that eliminates redundant data on the storage and the network, and it is considered one of the most promising technologies for efficient resource utilization in cloud computing. In terms of data security, however, deduplication obstructs encrypting the outsourced data and even creates a side channel through which information can leak. Achieving both efficient resource utilization and data security remains an open problem. This paper addresses this challenging issue and proposes a novel solution that enables data deduplication while also providing the required data security and privacy. We achieve this goal by constructing and utilizing equality predicate encryption schemes, which reveal only equality relations between encrypted data. We also utilize a hybrid approach to data deduplication to prevent information leakage through the side channel. The performance and security analyses indicate that the proposed scheme manages outsourced data in cloud computing efficiently and securely.
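
    The equality-testing interface can be illustrated with a deliberately simplified stand-in (this is not the paper's equality predicate encryption construction; a deterministic keyed tag merely shows what "the server learns only equality" means):

        import hashlib
        import hmac

        def equality_tag(shared_key: bytes, plaintext: bytes) -> bytes:
            """Deterministic keyed tag: equal plaintexts yield equal tags,
            so the server can test equality without seeing the plaintext."""
            return hmac.new(shared_key, plaintext, hashlib.sha256).digest()

        def is_duplicate(tag: bytes, stored_tags: set) -> bool:
            """Server-side duplicate check: equality between uploads is
            all the tag reveals about the underlying data."""
            return tag in stored_tags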

  • Stride Static Chunking Algorithm for Deduplication System

    Young-Woong KO  Ho-Min JUNG  Wan-Yeon LEE  Min-Ja KIM  Chuck YOO  

    LETTER-Computer Systems

    Vol: E96-D No:7
    Page(s): 1544-1547

    In this paper, we propose a stride static chunking deduplication algorithm that uses a hybrid approach to exploit the advantages of the static chunking and byte-shift chunking algorithms. The key contribution of our approach is to reduce computation time and enhance deduplication performance. We assume that duplicated data blocks are generally gathered into groups; thus, once we find one duplicated data block using byte-shift chunking, we can find the subsequent blocks with the static chunking approach. Experimental results show that the stride static chunking algorithm gives significant benefits over the static chunking, byte-shift chunking, and variable-length chunking algorithms, particularly in reducing processing time and storage space.
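
    The hybrid scan can be sketched in a few lines (illustrative only; the block size and names are assumptions): slide byte by byte until a duplicate block is found, then jump block by block while the matches continue:

        import hashlib

        BLOCK = 4096   # illustrative fixed block size

        def stride_static_dedup(data: bytes, known: set) -> list:
            """Slide byte by byte (byte-shift) until a duplicate block is
            found, then advance block by block (static chunking), since
            duplicate blocks tend to be clustered. Returns the offsets of
            blocks whose fingerprints are already in the known set."""
            hits, pos = [], 0
            while pos + BLOCK <= len(data):
                fp = hashlib.sha1(data[pos:pos + BLOCK]).digest()
                if fp in known:
                    hits.append(pos)
                    pos += BLOCK   # duplicate found: switch to fixed stride
                else:
                    pos += 1       # no match yet: keep byte-shifting
            return hits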

  • Implementation of a Memory Disclosure Attack on Memory Deduplication of Virtual Machines

    Kuniyasu SUZAKI  Kengo IIJIMA  Toshiki YAGI  Cyrille ARTHO  

    PAPER-System Security

    Vol: E96-A No:1
    Page(s): 215-224

    Memory deduplication improves the utilization of physical memory by sharing identical blocks of data. Although memory deduplication is most effective when many virtual machines with the same operating system run on one CPU, cross-user memory deduplication is a covert channel and enables a serious memory disclosure attack: it reveals the existence of an application or file on another virtual machine. The covert channel is the difference in write access time on deduplicated memory pages, which are re-created by Copy-On-Write, but the measurement is subject to interference from the execution environment. This paper shows that the attack faces implementation issues caused by memory alignment, self-reflection between the page cache and the heap, and run-time modification (swap-out, anonymous pages, ASLR, preloading mechanisms, and self-modifying code), and that these problems can be avoided with suitable techniques. In our experiments on KSM (kernel samepage merging) with the KVM virtual machine, the attack could detect the security level of the attacked operating system, find vulnerable applications, and confirm the status of attacked applications.
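
    The timing probe at the heart of the attack can be sketched as follows (illustrative only; it ignores the alignment and run-time interference issues the paper addresses, and the wait time is a guess):

        import mmap
        import time

        PAGE = 4096

        def probe_write_latency(candidate: bytes, wait: float = 60.0) -> float:
            """Fill an anonymous page with candidate content, give the host
            time to merge identical pages, then time one write; a write to
            a merged page forces a Copy-On-Write break and is measurably
            slower than a write to an unshared page."""
            buf = mmap.mmap(-1, PAGE)
            buf.write(candidate[:PAGE].ljust(PAGE, b"\x00"))
            time.sleep(wait)            # let the host scan and merge pages
            start = time.perf_counter()
            buf[0:1] = b"\x01"          # triggers COW if the page was merged
            return time.perf_counter() - start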