Keyword Search Result

[Keyword] SDRAM(9hit)

1-9hit
  • Implementation of the Complete Predictor for DDR3 SDRAM

    Vladimir V. STANKOVIC  Nebojsa Z. MILENKOVIC  Oliver M. VOJINOVIC  

     
    LETTER-Computer System

      Vol:
    E97-D No:3
      Page(s):
    589-592

    In the arsenal of resources for improving computer memory system performance, predictors have gained an increasing role in the past few years. They can suppress the latencies when accessing cache or main memory. In our previous work we proposed predictors that not only close the opened DRAM row but also predict the next row to be opened, hence the name ‘Complete Predictor’. It requires less than 10 kB of SRAM for a 2 GB SDRAM system. In this paper we evaluate how much additional hardware is needed and whether the activations of the predictors will slow down the DRAM controller.

  • Window Memory Layout Scheme for Alternate Row-Wise/Column-Wise Matrix Access

    Lei GUO  Yuhua TANG  Yong DOU  Yuanwu LEI  Meng MA  Jie ZHOU  

     
    PAPER-Computer System

      Vol:
    E96-D No:12
      Page(s):
    2765-2775

    The effective bandwidth of dynamic random-access memory (DRAM) is very low for the alternate row-wise/column-wise matrix access (AR/CMA) mode, which is a basic characteristic of scientific and engineering applications. Therefore, we propose the window memory layout scheme (WMLS), a matrix layout scheme that does not require transposition, for AR/CMA applications. This scheme maps one row of a logical matrix into a rectangular memory window of the DRAM to balance the bandwidth of row-wise and column-wise matrix access and to increase the DRAM I/O bandwidth. The optimal window configuration is theoretically analyzed to minimize the total number of no-data-visit operations of the DRAM. Different WMLS implementations are presented according to the memory structure of field-programmable gate array (FPGA), CPU, and GPU platforms. Experimental results show that the proposed WMLS can significantly improve DRAM bandwidth for AR/CMA applications. Speedup factors of 1.6× and 2.0× are achieved for the general-purpose CPU and GPU platforms, respectively. For the FPGA platform, the WMLS DRAM controller is a custom design. The maximum bandwidth for the AR/CMA mode reaches 5.94 GB/s, which is a 73.6% improvement over the traditional row-wise access mode. Finally, we apply the WMLS scheme to a Chirp Scaling SAR application; compared with the traditional access approach, maximum speedup factors of 4.73×, 1.33×, and 1.56× are achieved for the FPGA, CPU, and GPU platforms, respectively.
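
The idea of mapping one logical matrix row into a rectangular DRAM window can be illustrated with a small address-mapping sketch. This is not the mapping from the paper: the function names, the 64×64 matrix, the 64-element DRAM row, and the 8×8 window are all assumptions chosen for illustration. The sketch counts how many distinct DRAM rows one logical row or one logical column touches, as a rough proxy for row-activation cost.

```python
def window_address(r, c, win_h, win_w, dram_row_len):
    """Map logical element (r, c) to (dram_row, dram_col) with a window layout.

    Hypothetical mapping: logical row r occupies one win_h x win_w window,
    and windows are tiled side by side across the DRAM rows.
    """
    windows_per_band = dram_row_len // win_w
    band, slot = divmod(r, windows_per_band)   # which window holds row r
    wy, wx = divmod(c, win_w)                  # position inside that window
    return band * win_h + wy, slot * win_w + wx


def row_major_address(r, c, n_cols, dram_row_len):
    """Conventional row-major layout, for comparison."""
    return divmod(r * n_cols + c, dram_row_len)


def rows_touched(addresses):
    """Number of distinct DRAM rows in a sequence of (row, col) addresses."""
    return len({row for row, _ in addresses})


# Assumed sizes: 64x64 matrix, 64 elements per DRAM row, 8x8 window.
N, DRAM_ROW_LEN, WIN_H, WIN_W = 64, 64, 8, 8

win_row = rows_touched(window_address(0, c, WIN_H, WIN_W, DRAM_ROW_LEN) for c in range(N))
win_col = rows_touched(window_address(r, 0, WIN_H, WIN_W, DRAM_ROW_LEN) for r in range(N))
rm_row = rows_touched(row_major_address(0, c, N, DRAM_ROW_LEN) for c in range(N))
rm_col = rows_touched(row_major_address(r, 0, N, DRAM_ROW_LEN) for r in range(N))

print("window layout:    row access", win_row, "DRAM rows, column access", win_col)
print("row-major layout: row access", rm_row, "DRAM rows, column access", rm_col)
```

Under these assumed sizes the window layout touches 8 DRAM rows in either direction, whereas row-major touches 1 row for a row-wise access and 64 for a column-wise access. That is the balancing effect the abstract describes, in toy form.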

  • DDR3 SDRAM with a Complete Predictor

    Vladimir V. STANKOVIC  Nebojsa Z. MILENKOVIC  

     
    LETTER-Computer System

      Vol:
    E93-D No:9
      Page(s):
    2635-2638

    In the arsenal of resources for improving computer memory system performance, predictors have gained an increasing role in the past few years. They can hide the latencies of accessing cache or main memory. In our previous work we proposed a DDR SDRAM controller with predictors that not only close the opened DRAM row but also predict the next row to be opened. In this paper we explore applying the same techniques to the latest type of DRAM, DDR3 SDRAM, with further improvements of the predictors.

  • Cache Optimization for H.264/AVC Motion Compensation

    Sangyong YOON  Soo-Ik CHAE  

     
    LETTER-Image Processing and Video Processing

      Vol:
    E91-D No:12
      Page(s):
    2902-2905

    In this letter, we propose a cache organization that substantially reduces the memory bandwidth of motion compensation (MC) in H.264/AVC decoders. To reduce duplicated memory accesses to P and B pictures, we employ a four-way set-associative cache whose index bits are composed of horizontal and vertical address bits of the frame buffer and in which each line stores an 8×2 block of pixel data from the reference frames. Moreover, we alleviate the data fragmentation problem by selecting a line size equal to the minimum access size of the DDR SDRAM. Averaged over five QCIF IBBP image sequences, the optimized cache requires only 129% of the essential bandwidth of an H.264/AVC MC.
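
A cache index built from both horizontal and vertical frame-buffer address bits can be sketched as follows. The bit widths (3 horizontal and 3 vertical index bits, giving 64 sets), the function name, and the reading of the line as an 8-wide by 2-high pixel block are assumptions for illustration, not values taken from the letter.

```python
LINE_W, LINE_H = 8, 2                 # assumed: one cache line holds an 8x2 pixel block
X_INDEX_BITS, Y_INDEX_BITS = 3, 3     # assumed index widths (64 sets in total)


def cache_set_and_tag(x, y):
    """Return (set_index, tag) for the cache line containing pixel (x, y)."""
    bx, by = x // LINE_W, y // LINE_H              # block coordinates in the frame
    x_low = bx & ((1 << X_INDEX_BITS) - 1)         # low horizontal block bits
    y_low = by & ((1 << Y_INDEX_BITS) - 1)         # low vertical block bits
    set_index = (y_low << X_INDEX_BITS) | x_low    # index mixes both directions
    tag = (by >> Y_INDEX_BITS, bx >> X_INDEX_BITS) # remaining high bits
    return set_index, tag


# Blocks that are neighbours horizontally or vertically land in different sets.
print(cache_set_and_tag(0, 0))   # block (0, 0)
print(cache_set_and_tag(8, 0))   # right neighbour
print(cache_set_and_tag(0, 2))   # neighbour below
```

Because the index takes low-order bits from both coordinates, a two-dimensional reference window spreads over many sets instead of conflicting in a few, which is the motivation for composing the index this way.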

  • Efficient Memory Utilization for High-Speed FPGA-Based Hardware Emulators with SDRAMs

    Kohei HOSOKAWA  Katsunori TANAKA  Yuichi NAKAMURA  

     
    PAPER-System Level Design

      Vol:
    E90-A No:12
      Page(s):
    2810-2817

    FPGA-based hardware emulators are often used for the verification of LSI functions. They generally have dedicated external memories, such as SDRAMs, to compensate for the lack of memory capacity in FPGAs. In such a case, access between the FPGAs and the dedicated external memory may represent a major bottleneck with respect to emulation speed since the dedicated external memory may have to emulate a large number of memory blocks. In this paper, we propose three methods, "Dynamic Clock Control (DCC)," "Memory Mapping Optimization (MMO)," and "Efficient Access Scheduling (EAS)," to avoid this bottleneck. DCC controls an emulation clock dynamically in accord with the number of memory accesses within one emulation clock cycle. EAS optimizes the ordering of memory access to the dedicated external memory, and MMO optimizes the arrangement of the dedicated external memory addresses to which respective memories will be emulated. With them, emulation speed can be made 29.0 times faster, as evaluated in actual LSI emulations.

  • A Decision Feedback Equalizing Receiver for the SSTL SDRAM Interface with Clock-Data Skew Compensation

    Young-Soo SOHN  Seung-Jun BAE  Hong-June PARK  Soo-In CHO  

     
    PAPER-Integrated Electronics

      Vol:
    E87-C No:5
      Page(s):
    809-817

    A CMOS DFE (decision feedback equalization) receiver with clock-data skew compensation was implemented for the SSTL (stub-series terminated logic) SDRAM interface. The receiver consists of a 2-way interleaving DFE input buffer for ISI reduction and a ×2 over-sampling phase detector for finding the optimum sampling clock position. Measurement results at 1.2 Gbps operation showed that equalization increased the voltage margin by about 20% and decreased the time jitter in the recovered sampling clock by about 40% in an SSTL channel with a 2 pF × 4 stub load. Active chip area and power consumption are 300×1000 µm² and 142 mW, respectively, with a 2.5 V, 0.25 µm CMOS process.

  • A Hierarchical Timing Adjuster Featuring Intermittent Measurement for Use in Low-Power DDR SDRAMs

    Satoru HANZAWA  Hiromasa NODA  Takeshi SAKATA  Osamu NAGASHIMA  Sadayuki MORITA  Masanori ISODA  Michiyo SUZUKI  Sadayuki OHKUMA  Kyoko MURAKAMI  

     
    PAPER-Optoelectronics

      Vol:
    E85-C No:8
      Page(s):
    1625-1633

    A hierarchical timing adjuster that operates with intermittent adjustment has been developed for use in low-power DDR SDRAMs. Intermittent adjustment reduces power consumption in both coarse- and fine-delay circuits. Furthermore, the current-controlled fine-tuning of delay is free of short-circuit current and achieves a resolution of about 0.1 ns. In a design with 0.16-µm node technology, these techniques enable the hierarchical timing adjuster to reduce the operating current to 4.8 mA, which is 20% of the value in a conventional scheme with every-cycle measurement. The proposed timing adjuster achieves a three-cycle lock-in and only generates an internal clock pulse that has coarse resolution in the second cycle. The circuit operates over the range from 60 to 150 MHz, and occupies 0.29 mm².

  • High-Level Synthesis with SDRAMs and RAMBUS DRAMs

    Asheesh KHARE  Preeti R. PANDA  Nikil D. DUTT  Alexandru NICOLAU  

     
    PAPER

      Vol:
    E82-A No:11
      Page(s):
    2347-2355

    Newer off-chip DRAM families, including Synchronous DRAMs (SDRAMs) and RAMBUS DRAMs (RDRAMs), are becoming standard choices for the design of high-performance systems. Although previous work in High-Level Synthesis (HLS) has addressed exploiting features of page-mode DRAMs, techniques do not exist for exploiting the two key features of these newer DRAM families that boost memory performance and help overcome bandwidth limitations: (1) burst mode access, and (2) interleaved access through multiple banks. We address pre-synthesis optimizations on the input behavior that extract and exploit the burst mode and multiple bank interleaved access modes of these newer DRAM families, so that these features can be exploited fully during the HLS trajectory. Our experiments, run on a suite of memory-intensive benchmarks using a contemporary SDRAM library, demonstrate significant performance improvements of up to 62.5% over the naive approach, and improvements of up to 16.7% over the previous approach that considered only page-mode or extended-data-out (EDO) DRAMs.
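
The two features named in the abstract, burst mode access and interleaved access through multiple banks, can be pictured with a toy address decoder. The burst length of 4, the 4 banks, and the function name are assumptions for illustration; the paper's pre-synthesis optimizations themselves are not reproduced here.

```python
BURST_LEN = 4    # assumed: words transferred per burst
NUM_BANKS = 4    # assumed: number of independent banks


def decode(addr):
    """Split a word address into (bank, block_in_bank, offset_in_burst).

    Consecutive bursts rotate through the banks, so one bank can be
    prepared while another is transferring data.
    """
    burst_index, offset = divmod(addr, BURST_LEN)
    block_in_bank, bank = divmod(burst_index, NUM_BANKS)
    return bank, block_in_bank, offset


# A sequential sweep: each group of BURST_LEN words stays in one bank,
# and the next group moves to the next bank.
for addr in range(0, 20, BURST_LEN):
    print(addr, decode(addr))
```

With this layout a sequential array traversal issues one burst per bank in turn, which is the access pattern that lets bank preparation overlap with data transfer.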

  • A 180 MHz Multiple-Registered 16 Mbit SDRAM with Flexible Timing Scheme

    Hisashi IWAMOTO  Naoya WATANABE  Akira YAMAZAKI  Seiji SAWADA  Yasumitsu MURAI  Yasuhiro KONISHI  Hiroshi ITOH  Masaki KUMANOYA  

     
    PAPER-DRAM

      Vol:
    E77-C No:8
      Page(s):
    1328-1333

    A multiple-registered architecture is described for a 180 MHz 16 Mbit synchronous DRAM. The proposed architecture realizes flexible control of critical timings such as I/O line busy time and achieves operation at a 180 MHz clock rate with an area penalty of only 5.4% over the conventional DRAM.