IEICE global.ieice.org Site

Keyword Search Result

[Keyword] SDRAM(9hit)

1-9hit

Implementation of the Complete Predictor for DDR3 SDRAM
Vladimir V. STANKOVIC Nebojsa Z. MILENKOVIC Oliver M. VOJINOVIC

LETTER-Computer System

Vol:
E97-D No:3
Page(s):
589-592
In the arsenal of resources for improving computer memory system performance, predictors have gained an increasing role in the past few years. They can suppress the latencies when accessing cache or main memory. In our previous work we proposed predictors that not only close the opened DRAM row but also predict the next row to be opened, hence the name ‘Complete Predictor’. It requires less than 10kB of SRAM for a 2GB SDRAM system. In this paper we evaluate how much additional hardware is needed and whether the activations of the predictors will slow down the DRAM controller.
Window Memory Layout Scheme for Alternate Row-Wise/Column-Wise Matrix Access
Lei GUO Yuhua TANG Yong DOU Yuanwu LEI Meng MA Jie ZHOU

PAPER-Computer System

Vol:
E96-D No:12
Page(s):
2765-2775
The effective bandwidth of the dynamic random-access memory (DRAM) for the alternate row-wise/column-wise matrix access (AR/CMA) mode, which is a basic characteristic in scientific and engineering applications, is very low. Therefore, we propose the window memory layout scheme (WMLS), which is a matrix layout scheme that does not require transposition, for AR/CMA applications. This scheme maps one row of a logical matrix into a rectangular memory window of the DRAM to balance the bandwidth of the row- and column-wise matrix access and to increase the DRAM IO bandwidth. The optimal window configuration is theoretically analyzed to minimize the total number of no-data-visit operations of the DRAM. Different WMLS implementationsare presented according to the memory structure of field-programmable gata array (FPGA), CPU, and GPU platforms. Experimental results show that the proposed WMLS can significantly improve DRAM bandwidth for AR/CMA applications. achieved speedup factors of 1.6× and 2.0× are achieved for the general-purpose CPU and GPU platforms, respectively. For the FPGA platform, the WMLS DRAM controller is custom. The maximum bandwidth for the AR/CMA mode reaches 5.94 GB/s, which is a 73.6% improvement compared with that of the traditional row-wise access mode. Finally, we apply WMLS scheme for Chirp Scaling SAR application, comparing with the traditional access approach, the maximum speedup factors of 4.73X, 1.33X and 1.56X can be achieved for FPGA, CPU and GPU platform, respectively.
DDR3 SDRAM with a Complete Predictor
Vladimir V. STANKOVIC Nebojsa Z. MILENKOVIC

LETTER-Computer System

Vol:
E93-D No:9
Page(s):
2635-2638
In the arsenal of resources for improving computer memory system performance, predictors have gained an increasing role in the past few years. They enable hiding the latencies when accessing cache or main memory. In our previous work we proposed a DDR SDRAM controller with predictors that not only close the opened DRAM row but also predict the next row to be opened. In this paper we explore the possibilities of trying the same techniques on the latest type of DRAM memory, DDR3 SDRAM, with further improvements of the predictors.
Cache Optimization for H.264/AVC Motion Compensation
Sangyong YOON Soo-Ik CHAE

LETTER-Image Processing and Video Processing

Vol:
E91-D No:12
Page(s):
2902-2905
In this letter, we propose a cache organization that substantially reduces the memory bandwidth of motion compensation (MC) in the H.264/AVC decoders. To reduce duplicated memory accesses to P and B pictures, we employ a four-way set-associative cache in which its index bits are composed of horizontal and vertical address bits of the frame buffer and each line stores an 8 2 pixel data in the reference frames. Moreover, we alleviate the data fragmentation problem by selecting its line size that equals the minimum access size of the DDR SDRAM. The bandwidth of the optimized cache averaged over five QCIF IBBP image sequences requires only 129% of the essential bandwidth of an H.264/AVC MC.
Efficient Memory Utilization for High-Speed FPGA-Based Hardware Emulators with SDRAMs
Kohei HOSOKAWA Katsunori TANAKA Yuichi NAKAMURA

PAPER-System Level Design

Vol:
E90-A No:12
Page(s):
2810-2817
FPGA-based hardware emulators are often used for the verification of LSI functions. They generally have dedicated external memories, such as SDRAMs, to compensate for the lack of memory capacity in FPGAs. In such a case, access between the FPGAs and the dedicated external memory may represent a major bottleneck with respect to emulation speed since the dedicated external memory may have to emulate a large number of memory blocks. In this paper, we propose three methods, "Dynamic Clock Control (DCC)," "Memory Mapping Optimization (MMO)," and "Efficient Access Scheduling (EAS)," to avoid this bottleneck. DCC controls an emulation clock dynamically in accord with the number of memory accesses within one emulation clock cycle. EAS optimizes the ordering of memory access to the dedicated external memory, and MMO optimizes the arrangement of the dedicated external memory addresses to which respective memories will be emulated. With them, emulation speed can be made 29.0 times faster, as evaluated in actual LSI emulations.
A Decision Feedback Equalizing Receiver for the SSTL SDRAM Interface with Clock-Data Skew Compensation
Young-Soo SOHN Seung-Jun BAE Hong-June PARK Soo-In CHO

PAPER-Integrated Electronics

Vol:
E87-C No:5
Page(s):
809-817
A CMOS DFE (decision feedback equalization) receiver with a clock-data skew compensation was implemented for the SSTL (stub-series terminated logic) SDRAM interface. The receiver consists of a 2 way interleaving DFE input buffer for ISI reduction and a X2 over-sampling phase detector for finding the optimum sampling clock position. The measurement results at 1.2 Gbps operation showed the increase of voltage margin by about 20% and the decrease of time jitter in the recovered sampling clock by about 40% by equalization in an SSTL channel with 2 pF 4 stub load. Active chip area and power consumption are 3001000 µm2 and 142 mW, respectively, with a 2.5 V, 0.25 µm CMOS process.
A Hierarchical Timing Adjuster Featuring Intermittent Measurement for Use in Low-Power DDR SDRAMs
Satoru HANZAWA Hiromasa NODA Takeshi SAKATA Osamu NAGASHIMA Sadayuki MORITA Masanori ISODA Michiyo SUZUKI Sadayuki OHKUMA Kyoko MURAKAMI

PAPER-Optoelectronics

Vol:
E85-C No:8
Page(s):
1625-1633
A hierarchical timing adjuster that operates with intermittent adjustment has been developed for use in low-power DDR SDRAMs. Intermittent adjustment reduces power consumption in both coarse- and fine-delay circuits. Furthermore, the current-controlled fine-tuning of delay is free of short-circuit current and achieves a resolution of about 0.1 ns. In a design with 0.16-µm node technology, these techniques make the hierarchical timing adjuster able to reduce the operating current to 4.8 mA, which is 20% for the value in a conventional scheme with every-cycle measurement. The proposed timing adjuster achieves a three-cycle lock-in and only generates an internal clock pulse that has coarse resolution in the second cycle. The circuit operates over the range from 60 to 150 MHz, and occupies 0.29 mm2.
High-Level Synthesis with SDRAMs and RAMBUS DRAMs
Asheesh KHARE Preeti R. PANDA Nikil D. DUTT Alexandru NICOLAU

PAPER

Vol:
E82-A No:11
Page(s):
2347-2355
Newer off-chip DRAM families, including Synchronous DRAMs (SDRAMs) and RAMBUS DRAMs (RDRAMs), are becoming standard choices for the design of high-performance systems. Although previous work in High-Level Synthesis (HLS) has addressed exploiting features of page-mode DRAMs, techniques do not exist for exploiting the two key features of these newer DRAM families that boost memory performance and help overcome bandwidth limitations: (1) burst mode access, and (2) interleaved access through multiple banks. We address pre-synthesis optimizations on the input behavior that extract and exploit the burst mode and multiple bank interleaved access modes of these newer DRAM families, so that these features can be exploited fully during the HLS trajectory. Our experiments, run on a suite of memory-intensive benchmarks using a contemporary SDRAM library, demonstrate significant performance improvements of up to 62.5% over the naive approach, and improvements of up to 16.7% over the previous approach that considered only page-mode or extended-data-out (EDO) DRAMS.
A 180 MHz Multiple-Registered 16 Mbit SDRAM with Flexible Timing Scheme
Hisashi IWAMOTO Naoya WATANABE Akira YAMAZAKI Seiji SAWADA Yasumitsu MURAI Yasuhiro KONISHI Hiroshi ITOH Masaki KUMANOYA

PAPER-DRAM

Vol:
E77-C No:8
Page(s):
1328-1333
A multiple-registered architecture is described for 180 MHz 16 Mbit synchronous DRAM. The proposed architecture realizes a flexible control of critical timings such as I/O line busy time and achieves an operation at 180 MHz clock rate with area penalty of only 5.4% over the conventional DRAM.