The search functionality is under construction.
The search functionality is under construction.

Keyword Search Result

[Keyword] parallel memory(4hit)

1-4hit
  • A Highly Parallel Architecture for Deblocking Filter in H.264/AVC

    Lingfeng LI  Satoshi GOTO  Takeshi IKENAGA  

     
    PAPER-Parallel and/or Distributed Processing Systems

      Vol:
    E88-D No:7
      Page(s):
    1623-1629

    This paper presents a highly parallel architecture for deblocking filter in H.264/AVC. We adopt various parallel schemes in memory sub-system and datapath. A 2-dimensional parallel memory scheme is employed to support efficient parallel access in both horizontal and vertical directions in order to speed up the whole filtering process. This parallel memory also eliminates the need for a transpose circuit. In the datapath, an algorithm optimization is performed to implement parallel filtering with hardware reuse. Pipeline techniques are also adopted to improve the throughput of filtering operations. Our design is implemented under TSMC 0.18 µm technology. Results show that the core size is 0.821.13 mm2 when the maximum frequency is 230 MHz. Compared to other existing architectures, our design has advantages in both speed and area.

  • Address Computation in Configurable Parallel Memory Architecture

    Eero AHO  Jarno VANNE  Kimmo KUUSILINNA  Timo D. HAMALAINEN  

     
    PAPER-Networking and System Architectures

      Vol:
    E87-D No:7
      Page(s):
    1674-1681

    Parallel memories increase memory bandwidth with several memory modules working in parallel and can be used to feed a processor with only necessary data. The Configurable Parallel Memory Architecture (CPMA) enables a multitude of access formats and module assignment functions to be used within a single hardware implementation, which has not been possible in prior embedded parallel memory systems. This paper focuses on address computation in CPMA, which is implemented using several configurable computation units in parallel. One unit is dedicated for each type of access formats and module assignment functions that the implementation supports. Timing and area estimates are given for a 0.25-micron CMOS process. The utilized resources are shown to be linearly proportional to the number of memory modules.

  • Data Distribution and Alignment Scheme for Conflict-Free Memory Access in Parallel Image Processing System

    Gil-Yoon KIM  Yunju BAEK  Heung-Kyu LEE  

     
    PAPER-Computer Hardware and Design

      Vol:
    E81-D No:8
      Page(s):
    806-812

    In this paper, we give a solution to the problem of conflict-free access of various slices of data in parallel processor for image processing. Image processing operations require a memory system that permits parallel and conflict-free access of rows, columns, forward diagonals, backward diagonals, and blocks of two-dimensional image array for an arbitrary location. Linear skewing schemes are useful methods for those requirements, but these schemes require complex Euclidean division by prime number. On the contrary, nonlinear skewing schemes such as XOR-schemes have more advantages than the linear ones in address generation, but these schemes allow conflict-free access of some array slices in restricted region. In this paper, we propose a new XOR-scheme which allows conflict-free access of arbitrarily located various slices of data for image processing, with a two-fold the number of memory modules than that of processing elements. Further, we propose an efficient data alignment network which consists of log N + 2-stage multistage interconnection network utilizing Omega network.

  • Scalable Parallel Memory Architecture with a Skew Scheme

    Tadayuki SAKAKIBARA  Katsuyoshi KITAI  Tadaaki ISOBE  Shigeko YAZAWA  Teruo TANAKA  Yasuhiro INAGAMI  Yoshiko TAMAKI  

     
    PAPER-Computer Architecture

      Vol:
    E80-D No:9
      Page(s):
    933-941

    We present a scalable parallel memory architecture with a skew scheme by which permanent-concentration-free strides, if any, do not depend on the number of ways in parallel memory interleaving. The permanent-concentration is a kind of memory access conflict. With conventional skew schemes, permanent-concentration-free strides depended on the number of banks (or bank groups) in parallel memory (=number of ways in parallel memory interleaving). We analyze two kinds of cause of conflicts: permanent-concentration occurs when memory access requests concentrate in limited number of banks (or bank groups) in parallel memory, and transient-concentration, when memory access requests transiently concentrate in some banks (or bank groups) in parallel memory. We have identified permanent-concentration-free strides, which are independent of the number of banks (or bank groups) in parallel memory, by solving two concentrations separately. The strategy is to increase the size of address block of shifting address assignment to the parallel memory in order to reduce permanent-concentrations, and make the size of the buffer for each banks (or bank groups) in the parallel memory match the size of address block of shifting in order to absorb transient-concentrations. The skew scheme uses the same size of address block of shifting address assignment for memory systems for different numbers of banks (or bank groups) in parallel memory. As a result, scalability for permanent-concentration-free strides is achieved independent of the number of banks (or bank groups) in parallel memory.