The search functionality is under construction.

Author Search Result

[Author] Dongsheng WANG(8hit)

1-8hit
  • Content-Aware Write Reduction Mechanism of 3D Stacked Phase-Change RAM Based Frame Store in H.264 Video Codec System

    Sanchuan GUO  Zhenyu LIU  Guohong LI  Takeshi IKENAGA  Dongsheng WANG  

     
    PAPER

      Vol:
    E96-A No:6
      Page(s):
    1273-1282

    H.264 video codec system requires big capacity and high bandwidth of Frame Store (FS) for buffering reference frames. The up-to-date three dimensional (3D) stacked Phase change Random Access Memory (PRAM) is the promising approach for on-chip caching the reference signals, as 3D stacking offers high memory bandwidth, while PRAM possesses the advantages in terms of high density and low leakage power. However, the write endurance problem, that is a PRAM cell can only tolerant limited number of write operations, becomes the main barrier in practical applications. This paper studies the wear reduction techniques of PRAM based FS in H.264 codec system. On the basis of rate-distortion theory, the content oriented selective writing mechanisms are proposed to reduce bit updates in the reference frame buffers. With the proposed control parameter a, our methods make the quantitative trade off between the quality degradation and the PRAM lifetime prolongation. Specifically, taking a in the range of [0.2,2], experimental results demonstrate that, our methods averagely save 29.9–35.5% bit-wise write operations and reduce 52–57% power, at the cost of 12.95–20.57% BDBR bit-rate increase accordingly.

  • Cryptanalysis of Remote Data Integrity Checking Protocol Proposed by L. Chen for Cloud Storage

    Shaojing FU  Dongsheng WANG  Ming XU  Jiangchun REN  

     
    LETTER-Cryptography and Information Security

      Vol:
    E97-A No:1
      Page(s):
    418-420

    Remote data possession checking for cloud storage is very important, since data owners can check the integrity of outsourced data without downloading a copy to their local computers. In a previous work, Chen proposed a remote data possession checking protocol using algebraic signature and showed that it can resist against various known attacks. In this paper, we find serious security flaws in Chen's protocol, and shows that it is vulnerable to replay attack by a malicious cloud server. Finally, we propose an improved version of the protocol to guarantee secure data storage for data owners.

  • Reusing the Results of Queries in MapReduce Systems by Adopting Shared Storage

    Zhanye WANG  Chuanyi LIU  Dongsheng WANG  

     
    PAPER

      Vol:
    E99-B No:2
      Page(s):
    315-325

    Over the last few years, Apache MapReduce has become the prevailing framework for large scale data processing. Instead of writing MapReduce programs which are too obscure to express, many developers usually adopt high level query languages, such as Hive or Pig Latin, to finish their complex queries. These languages automatically compile each query into a workflow of MapReduce jobs, so they greatly facilitate the querying and management of large datasets. One option to speed up the execution of workflows is to save the results produced previously and reuse them in the future if needed. In this paper we present SuperRack, which uses shared storage devices to store the results of each workflow and allows a new query to reuse these results in order to avoid redundant computation and hasten execution. We propose several novel techniques to improve the access and storage efficiency of the previous results. We also evaluate SuperRack to verify its feasibility and effectiveness. Experiments show that our solution outperforms Hive significantly under TPC-H benchmark and real life workloads.

  • Improving Cache Partitioning Algorithms for Pseudo-LRU Policies

    Xi ZHANG  Chuanyi LIU  Zhenyu LIU  Dongsheng WANG  

     
    PAPER

      Vol:
    E96-D No:12
      Page(s):
    2514-2523

    As the number of concurrently running applications on the chip multiprocessors (CMPs) is increasing, efficient management of the shared last-level cache (LLC) is crucial to guarantee overall performance. Recent studies have shown that cache partitioning can provide benefits in throughput, fairness and quality of service. Most prior arts apply true Least Recently Used (LRU) as the underlying cache replacement policy and rely on its stack property to work properly. However, in commodity processors, pseudo-LRU policies without stack property are commonly used instead of LRU for their simplicity and low storage overhead. Therefore, this study sets out to understand whether LRU-based cache partitioning techniques can be applied to commodity processors. In this work, we instead propose a cache partitioning mechanism for two popular pseudo-LRU policies: Not Recently Used (NRU) and Binary Tree (BT). Without the help of true LRU's stack property, we propose a profiling logic that applies curve approximation methods to derive the hit curve (hit counts under varied way allocations) for an application. We then propose a hybrid partitioning mechanism, which mitigates the gap between the predicted hit curve and the actual statistics. Simulation results demonstrate that our proposal can improve throughput by 15.3% on average and outperforms the stack-estimate proposal by 12.6% on average. Similar results can be achieved in weighted speedup. For the cache configurations under study, it requires less than 0.5% storage overhead compared to the last-level cache. In addition, we also show that profiling mechanism with only one true LRU ATD achieves comparable performance and can further reduce the hardware cost by nearly two thirds compared with the hybrid mechanism.

  • Architecture and Circuit Optimization of Hardwired Integer Motion Estimation Engine for H.264/AVC

    Zhenyu LIU  Dongsheng WANG  Takeshi IKENAGA  

     
    PAPER-Image Processing

      Vol:
    E93-A No:11
      Page(s):
    2065-2073

    Variable block size motion estimation developed by the latest video coding standard H.264/AVC is the efficient approach to reduce the temporal redundancies. The intensive computational complexity coming from the variable block size technique makes the hardwired accelerator essential, for real-time applications. Propagate partial sums of absolute differences (Propagate Partial SAD) and SAD Tree hardwired engines outperform other counterparts, especially considering the impact of supporting variable block size technique. In this paper, the authors apply the architecture-level and the circuit-level approaches to improve the maximum operating frequency and reduce the hardware overhead of Propagate Partial SAD and SAD Tree, while other metrics, in terms of latency, memory bandwidth and hardware utilization, of the original architectures are maintained. Experiments demonstrate that by using the proposed approaches, at 110.8 MHz operating frequency, compared with the original architectures, 14.7% and 18.0% gate count can be saved for Propagate Partial SAD and SAD Tree, respectively. With TSMC 0.18 µm 1P6M CMOS technology, the proposed Propagate Partial SAD architecture achieves 231.6 MHz operating frequency at a cost of 84.1 k gates. Correspondingly, the maximum work frequency of the optimized SAD Tree architecture is improved to 204.8 MHz, which is almost two times of the original one, while its hardware overhead is merely 88.5 k-gate.

  • Bayesian Theory Based Adaptive Proximity Data Accessing for CMP Caches

    Guohong LI  Zhenyu LIU  Sanchuan GUO  Dongsheng WANG  

     
    PAPER

      Vol:
    E96-A No:6
      Page(s):
    1293-1305

    As the number of cores and the working sets of parallel workloads increase, shared L2 caches exhibit fewer misses than private L2 caches by making a better use of the total available cache capacity, but they induce higher overall L1 miss latencies because of the longer average distance between the requestor and the home node, and the potential congestions at certain nodes. We observed that there is a high probability that the target data of an L1 miss resides in the L1 cache of a neighbor node. In such cases, these long-distance accesses to the home nodes can be potentially avoided. In order to leverage the aforementioned property, we propose Bayesian Theory based Adaptive Proximity Data Accessing (APDA). In our proposal, we organize the multi-core into clusters of 2x2 nodes, and introduce the Proximity Data Prober (PDP) to detect whether an L1 miss can be served by one of the cluster L1 caches. Furthermore, we devise the Bayesian Decision Classifier (BDC) to adaptively select the remote L2 cache or the neighboring L1 node as the server according to the minimum miss cost. We evaluate this approach on a 64-node multi-core using SPLASH-2 and PARSEC benchmarks, and we find that the APDA can reduce the execution time by 20% and reduce the energy by 14% compared to a standard multi-core with a shared L2. The experimental results demonstrate that our proposal outperforms the up-to-date mechanisms, such as ASR, DCC and RNUCA.

  • A Novel Cache Replacement Policy via Dynamic Adaptive Insertion and Re-Reference Prediction

    Xi ZHANG  Chongmin LI  Zhenyu LIU  Haixia WANG  Dongsheng WANG  Takeshi IKENAGA  

     
    PAPER

      Vol:
    E94-C No:4
      Page(s):
    468-476

    Previous research illustrates that LRU replacement policy is not efficient when applications exhibit a distant re-reference interval. Recently RRIP policy is proposed to improve the performance for such kind of workloads. However, the lack of access recency information in RRIP confuses the replacement policy to make the accurate prediction. To enhance the robustness of RRIP for recency-friendly workloads, we propose an Dynamic Adaptive Insertion and Re-reference Prediction (DAI-RRP) policy which evicts data based on both re-reference prediction value and the access recency information. DAI-RRP makes adaptive adjustment on insertion position and prediction value for different access patterns, which makes the policy robust across different workloads and different phases. Simulation results show that DAI-RRP outperforms LRU and RRIP. For a single-core processor with a 1 MB 16-way set last-level cache (LLC), DAI-RRP reduces CPI over LRU and Dynamic RRIP by an average of 8.1% and 2.7% respectively. Evaluations on quad-core CMP with a 4 MB shared LLC show that DAI-RRP outperforms LRU and Dynamic RRIP (DRRIP) on the weighted speedup metric by an average of 8.1% and 15.7% respectively. Furthermore, compared to LRU, DAI-RRP consumes the similar hardware for 16-way cache, or even less hardware for high-associativity cache. In summary, the proposed policy is practical and can be easily integrated into existing hardware approximations of LRU.

  • Fast Recovery and Low Cost Coexist: When Continuous Data Protection Meets the Cloud

    Yu GU  Chuanyi LIU  Dongsheng WANG  

     
    PAPER

      Vol:
    E97-D No:7
      Page(s):
    1700-1708

    Cloud computing has rising as a new popular service paradigm with typical advantages as ease of use, unlimited resources and pay-as-you-go pricing model. Cloud resources are more flexible and cost-effective than private or colocation resources thus more suitable for storing the outdated backup data that are infrequently accessed by continuous data protection (CDP) systems. However, the cloud achieves low cost at the same time may slow down the recovery procedure due to its low bandwidth and high latency. In this paper, a novel block-level CDP system architecture: MYCDP is proposed to utilize cloud resources as the back-end storage. Unlike traditional delta-encoding based CDP approaches which should traverse all the dependent versions and decode the recovery point, MYCDP adopts data deduplication mechanism to eliminate data redundancy between all versions of all blocks, and constructs a version index for all versions of the protected storage, thus it can use a query-and-fetch process to recover version data. And with a specific version index data structure and a disk/memory hybrid cache module, MYCDP reduces the storage space consumption and data transfer between local and cloud. It also supports deletion of arbitrary versions without risk of invalidating some other versions. Experimental results demonstrate that MYCDP can achieve much lower cost than traditional local based CDP approaches, while remaining almost the same recovery speed with the local based deduplication approach for most recovery cases. Furthermore, MYCDP can obtain both faster recovery and lower cost than cloud based delta-encoding CDP approaches for any recovery points. And MYCDP gets more profits while protecting multiple systems together.