1-7hit |
Huiseong HEO Cheongjin AHN Deok-Hwan KIM
In recent years, the need to build solid state drive (SSD)-based cloud storage systems has been increasing in order to process the big data generated by lots of Internet of Things devices and Internet users. Because these kinds of cloud systems require high performance and reliable storage, the use of flash-based Redundant Array of Independent Disks (RAID) will increase. But in flash-based RAID storage, parity data must be updated with every data write operation, which can more quickly overwhelm SSD's lifespan. To solve this problem, this letter proposes parity data deduplication for OpenStack cloud storage systems using an all flash array. Unlike the traditional data deduplication method, it only removes parity data, which will be stored in the parity disks of the all flash array. Experiments show that the proposed parity data deduplication method can efficiently reduce the number of parity data write operations, compared to the traditional data deduplication method.
Data deduplication is a technology that eliminates redundant data to save storage space. Most previous studies on data deduplication target backup storage, where the deduplication ratio and throughput are important. However, data deduplication on primary storage has recently been receiving attention; in this case, I/O latency should be considered equally with the deduplication ratio. Unfortunately, data deduplication causes high sequential-read-latency problems. When a file is created, the file system allocates physically contiguous blocks to support low sequential-read latency. However, the data deduplication process rearranges the block mapping information to eliminate duplicate blocks. Because of this rearrangement, the physical sequentiality of blocks in a file is broken. This makes a sequential-read request slower because it operates like a random-read operation. In this paper, we propose a selective data deduplication scheme for primary storage systems. A selective scheme can achieve a high deduplication ratio and a low I/O latency by applying different data-chunking methods to the files, according to their file access characteristics. In the proposed system, file accesses are characterized by recent access time and the access frequency of each file. No chunking is applied to update-intensive files since they are meaningless in terms of data deduplication. For sequential-read-intensive files, we apply big chunking to preserve their sequentiality on the media. For random-read-intensive files, small chunking is used to increase the deduplication ratio. Experimental evaluation showed that the proposed method achieves a maximum of 86% of an ideal deduplication ratio and 97% of the sequential-read performance of a native file system.
Choonhwa LEE Sungho KIM Eunsam KIM
This paper presents a novel peer-to-peer protocol to efficiently distribute virtual machine images in a datacenter. A primary idea of it is to improve the performance of peer-to-peer content delivery by employing deduplication to take advantage of similarity both among and within VM images in cloud datacenters. The efficacy of the proposed scheme is validated through an evaluation that demonstrates substantial performance gains.
Yu GU Chuanyi LIU Dongsheng WANG
Cloud computing has rising as a new popular service paradigm with typical advantages as ease of use, unlimited resources and pay-as-you-go pricing model. Cloud resources are more flexible and cost-effective than private or colocation resources thus more suitable for storing the outdated backup data that are infrequently accessed by continuous data protection (CDP) systems. However, the cloud achieves low cost at the same time may slow down the recovery procedure due to its low bandwidth and high latency. In this paper, a novel block-level CDP system architecture: MYCDP is proposed to utilize cloud resources as the back-end storage. Unlike traditional delta-encoding based CDP approaches which should traverse all the dependent versions and decode the recovery point, MYCDP adopts data deduplication mechanism to eliminate data redundancy between all versions of all blocks, and constructs a version index for all versions of the protected storage, thus it can use a query-and-fetch process to recover version data. And with a specific version index data structure and a disk/memory hybrid cache module, MYCDP reduces the storage space consumption and data transfer between local and cloud. It also supports deletion of arbitrary versions without risk of invalidating some other versions. Experimental results demonstrate that MYCDP can achieve much lower cost than traditional local based CDP approaches, while remaining almost the same recovery speed with the local based deduplication approach for most recovery cases. Furthermore, MYCDP can obtain both faster recovery and lower cost than cloud based delta-encoding CDP approaches for any recovery points. And MYCDP gets more profits while protecting multiple systems together.
Outsourcing to a cloud storage brings forth new challenges for the efficient utilization of computing resources as well as simultaneously maintaining privacy and security for the outsourced data. Data deduplication refers to a technique that eliminates redundant data on the storage and the network, and is considered to be one of the most-promising technologies that offers efficient resource utilization in the cloud computing. In terms of data security, however, deduplication obstructs applying encryption on the outsourced data and even causes a side channel through which information can be leaked. Achieving both efficient resource utilization and data security still remains open. This paper addresses this challenging issue and proposes a novel solution that enables data deduplication while also providing the required data security and privacy. We achieve this goal by constructing and utilizing equality predicate encryption schemes which allow to know only equivalence relations between encrypted data. We also utilize a hybrid approach for data deduplication to prevent information leakage due to the side channel. The performance and security analyses indicate that the proposed scheme is efficient to securely manage the outsourced data in the cloud computing.
Young-Woong KO Ho-Min JUNG Wan-Yeon LEE Min-Ja KIM Chuck YOO
In this paper, we propose a stride static chunking deduplication algorithm using a hybrid approach that exploits the advantages of static chunking and byte-shift chunking algorithm. The key contribution of our approach is to reduce the computation time and enhance deduplication performance. We assume that duplicated data blocks are generally gathered into groups; thus, if we find one duplicated data block using byte-shift, then we can find subsequent data blocks with the static chunking approach. Experimental results show that stride static chunking algorithm gives significant benefits over static chunking, byte-shift chunking and variable-length chunking algorithm, particularly for reducing processing time and storage space.
Kuniyasu SUZAKI Kengo IIJIMA Toshiki YAGI Cyrille ARTHO
Memory deduplication improves the utilization of physical memory by sharing identical blocks of data. Although memory deduplication is most effective when many virtual machines with same operating systems run on a CPU, cross-user memory deduplication is a covert channel and causes serious memory disclosure attack. It reveals the existence of an application or file on another virtual machine. The covert channel is a difference in write access time on deduplicated memory pages that are re-created by Copy-On-Write, but it has some interferences caused by execution environments. This paper indicates that the attack includes implementation issues caused by memory alignment, self-reflection between page cache and heap, and run-time modification (swap-out, anonymous pages, ASLR, preloading mechanism, and self-modification code). However, these problems are avoidable with some techniques. In our experience on KSM (kernel samepage merging) with the KVM virtual machine, the attack could detect the security level of attacked operating systems, find vulnerable applications, and confirm the status of attacked applications.