The search functionality is under construction.
The search functionality is under construction.

Keyword Search Result

[Keyword] DRAM(107hit)

1-20hit(107hit)

  • Evaluating Energy-Efficiency of DRAM Channel Interleaving Schemes for Multithreaded Programs

    Satoshi IMAMURA  Yuichiro YASUI  Koji INOUE  Takatsugu ONO  Hiroshi SASAKI  Katsuki FUJISAWA  

     
    PAPER-Computer System

      Pubricized:
    2018/06/08
      Vol:
    E101-D No:9
      Page(s):
    2247-2257

    The power consumption of server platforms has been increasing as the amount of hardware resources equipped on them is increased. Especially, the capacity of DRAM continues to grow, and it is not rare that DRAM consumes higher power than processors on modern servers. Therefore, a reduction in the DRAM energy consumption is a critical challenge to reduce the system-level energy consumption. Although it is well known that improving row buffer locality(RBL) and bank-level parallelism (BLP) is effective to reduce the DRAM energy consumption, our preliminary evaluation on a real server demonstrates that RBL is generally low across 15 multithreaded benchmarks. In this paper, we investigate the memory access patterns of these benchmarks using a simulator and observe that cache line-grained channel interleaving schemes, which are widely applied to modern servers including multiple memory channels, hurt the RBL each of the benchmarks potentially possesses. In order to address this problem, we focus on a row-grained channel interleaving scheme and compare it with three cache line-grained schemes. Our evaluation shows that it reduces the DRAM energy consumption by 16.7%, 12.3%, and 5.5% on average (up to 34.7%, 28.2%, and 12.0%) compared to the other schemes, respectively.

  • Energy-Efficient DRAM Selective Refresh Technique with Page Residence in a Memory Hierarchy of Hardware-Managed TLB

    Miseon HAN  Yeoul NA  Dongha JUNG  Hokyoon LEE  Seon WOOK KIM  Youngsun HAN  

     
    PAPER-Integrated Electronics

      Vol:
    E101-C No:3
      Page(s):
    170-182

    A memory controller refreshes DRAM rows periodically in order to prevent DRAM cells from losing data over time. Refreshes consume a large amount of energy, and the problem becomes worse with the future larger DRAM capacity. Previously proposed selective refreshing techniques are either conservative in exploiting the opportunity or expensive in terms of required implementation overhead. In this paper, we propose a novel DRAM selective refresh technique by using page residence in a memory hierarchy of hardware-managed TLB. Our technique maximizes the opportunity to optimize refreshing by activating/deactivating refreshes for DRAM pages when their PTEs are inserted to/evicted from TLB or data caches, while the implementation cost is minimized by slightly extending the existing infrastructure. Our experiment shows that the proposed technique can reduce DRAM refresh power 43.6% on average and EDP 3.5% with small amount of hardware overhead.

  • Power-Efficient Instancy Aware DRAM Scheduling

    Gung-Yu PAN  Chih-Yen LAI  Jing-Yang JOU  Bo-Cheng Charles LAI  

     
    PAPER-Systems and Control

      Vol:
    E98-A No:4
      Page(s):
    942-953

    Nowadays, computer systems are limited by the power and memory wall. As the Dynamic Random Access Memory (DRAM) has dominated the power consumption in modern devices, developing power-saving approaches on DRAM has become more and more important. Among several techniques on different abstract levels, scheduling-based power management policies can be applied to existing memory controllers to reduce power consumption without causing severe performance degradation. Existing power-aware schedulers cluster memory requests into sets, so that the large portion of the DRAM can be switched into the power saving mode; however, only the target addresses are taken into consideration when clustering, while we observe the types (read or write) of requests can play an important role. In this paper, we propose two scheduling-based power management techniques on the DRAM controller: the inter-rank read-write aware clustering approach greatly reduces the active standby power, and the intra-rank read-write aware reordering approach mitigates the performance degradation. The simulation results show that the proposed techniques effectively reduce 75% DRAM power on average. Compared with the existing policy, the power reduction is 10% more on average with comparable or less performance degradation for the proposed techniques.

  • Implementation of the Complete Predictor for DDR3 SDRAM

    Vladimir V. STANKOVIC  Nebojsa Z. MILENKOVIC  Oliver M. VOJINOVIC  

     
    LETTER-Computer System

      Vol:
    E97-D No:3
      Page(s):
    589-592

    In the arsenal of resources for improving computer memory system performance, predictors have gained an increasing role in the past few years. They can suppress the latencies when accessing cache or main memory. In our previous work we proposed predictors that not only close the opened DRAM row but also predict the next row to be opened, hence the name ‘Complete Predictor’. It requires less than 10kB of SRAM for a 2GB SDRAM system. In this paper we evaluate how much additional hardware is needed and whether the activations of the predictors will slow down the DRAM controller.

  • Window Memory Layout Scheme for Alternate Row-Wise/Column-Wise Matrix Access

    Lei GUO  Yuhua TANG  Yong DOU  Yuanwu LEI  Meng MA  Jie ZHOU  

     
    PAPER-Computer System

      Vol:
    E96-D No:12
      Page(s):
    2765-2775

    The effective bandwidth of the dynamic random-access memory (DRAM) for the alternate row-wise/column-wise matrix access (AR/CMA) mode, which is a basic characteristic in scientific and engineering applications, is very low. Therefore, we propose the window memory layout scheme (WMLS), which is a matrix layout scheme that does not require transposition, for AR/CMA applications. This scheme maps one row of a logical matrix into a rectangular memory window of the DRAM to balance the bandwidth of the row- and column-wise matrix access and to increase the DRAM IO bandwidth. The optimal window configuration is theoretically analyzed to minimize the total number of no-data-visit operations of the DRAM. Different WMLS implementationsare presented according to the memory structure of field-programmable gata array (FPGA), CPU, and GPU platforms. Experimental results show that the proposed WMLS can significantly improve DRAM bandwidth for AR/CMA applications. achieved speedup factors of 1.6× and 2.0× are achieved for the general-purpose CPU and GPU platforms, respectively. For the FPGA platform, the WMLS DRAM controller is custom. The maximum bandwidth for the AR/CMA mode reaches 5.94 GB/s, which is a 73.6% improvement compared with that of the traditional row-wise access mode. Finally, we apply WMLS scheme for Chirp Scaling SAR application, comparing with the traditional access approach, the maximum speedup factors of 4.73X, 1.33X and 1.56X can be achieved for FPGA, CPU and GPU platform, respectively.

  • A Proposal of Spatio-Temporal Reconstruction Method Based on a Fast Block-Iterative Algorithm Open Access

    Tatsuya KON  Takashi OBI  Hideaki TASHIMA  Nagaaki OHYAMA  

     
    PAPER-Medical Image Processing

      Vol:
    E96-D No:4
      Page(s):
    819-825

    Parametric images can help investigate disease mechanisms and vital functions. To estimate parametric images, it is necessary to obtain the tissue time activity curves (tTACs), which express temporal changes of tracer activity in human tissue. In general, the tTACs are calculated from each voxel's value of the time sequential PET images estimated from dynamic PET data. Recently, spatio-temporal PET reconstruction methods have been proposed in order to take into account the temporal correlation within each tTAC. Such spatio-temporal algorithms are generally quite computationally intensive. On the other hand, typical algorithms such as the preconditioned conjugate gradient (PCG) method still does not provide good accuracy in estimation. To overcome these problems, we propose a new spatio-temporal reconstruction method based on the dynamic row-action maximum-likelihood algorithm (DRAMA). As the original algorithm does, the proposed method takes into account the noise propagation, but it achieves much faster convergence. Performance of the method is evaluated with digital phantom simulations and it is shown that the proposed method requires only a few reconstruction processes, thereby remarkably reducing the computational cost required to estimate the tTACs. The results also show that the tTACs and parametric images from the proposed method have better accuracy.

  • Evaluation of Performance in Vertical 1T-DRAM and Planar 1T-DRAM

    Yuto NORIFUSA  Tetsuo ENDOH  

     
    PAPER

      Vol:
    E95-C No:5
      Page(s):
    847-853

    The performances of the conventional planar type 1T DRAM and the Vertical type 1T DRAM are compared based on structure difference using a fully-consistent device simulator. We discuss the structural advantage of the Vertical type 1T-DRAM in comparison with the conventional planar type 1T-DRAM, and evaluate their performance in each operating mode such as write, erase, read, and hold; and discuss its cell performances such as Cell Current Margin and data retention. These results provide a useful guideline designing the high performance Vertical type 1T-DRAM cell.

  • A Low-Vt Small-Offset Gated-Preamplifier for Sub-1-V DRAM Mid-Point Sensing

    Satoru AKIYAMA  Riichiro TAKEMURA  Tomonori SEKIGUCHI  Akira KOTABE  Kiyoo ITOH  

     
    PAPER

      Vol:
    E95-C No:4
      Page(s):
    600-608

    A gated sense amplifier (GSA) consisting of a low-Vt gated preamplifier (LGA) and a high-Vt sense amplifier (SA) is proposed. The gating scheme of the LGA enables quick amplification of an initial cell signal voltage (vS0) because of its low Vt and prevents the cell signal from degrading due to interference noise between data lines. As for a conventional sense amplifier (CSA), this new type of noise causes sensing error, and the noise-generation mechanism was clarified for the first time by analysis of vS0. The high-Vt SA holds the amplified signal and keeps subthreshold current low. Moreover, the gating scheme of the low-Vt MOSFETs in the LGA drives the I/O line quickly. The GSA thus simultaneously achieves fast sensing, low-leakage data holding, and fast I/O driving, even for sub-1-V mid-point sensing. The GSA is promising for future sub-1-V gigabit dynamic random-access memory (DRAM) because of reduced variations in the threshold voltage of MOSFETs; thus, the offset voltage of the LGA is reduced. The effectiveness of the GSA was verified with a 70-nm 512-Mbit DRAM chip. It demonstrated row access time (tRCD) of 16.4 ns and read access (tAA) of 14.3 ns at array voltage of 0.9 V.

  • Small-Sized Leakage-Controlled Gated Sense Amplifier for 0.5-V Multi-Gigabit DRAM Arrays

    Akira KOTABE  Riichiro TAKEMURA  Yoshimitsu YANAGAWA  Tomonori SEKIGUCHI  Kiyoo ITOH  

     
    PAPER

      Vol:
    E95-C No:4
      Page(s):
    594-599

    A small-sized leakage-controlled gated sense amplifier (SA) and relevant circuits are proposed for 0.5-V multi-gigabit DRAM arrays. The proposed SA consists of a high-VT PMOS amplifier and a low-VT NMOS amplifier which is composed of high-VT NMOSs and a low-VT cross-coupled NMOS, and achieves 46% area reduction compared to a conventional SA with a low-VT CMOS preamplifier. Separation of the proposed SA and a data-line pair achieves a sensing time of 6 ns and a writing time of 0.6 ns. Momentarily overdriving the PMOS amplifier achieves a restoring time of 13 ns. The gate level control of the high-VT NMOSs and the gate level compensation circuit for PVT variations reduce the leakage current of the proposed SA to 2% of that without the control, and its effectiveness was confirmed using a 50-nm test chip.

  • An Area-Efficient, Low-VDD, Highly Reliable Multi-Cell Antifuse System Fully Operative in DRAMs

    Jong-Pil SON  Jin Ho KIM  Woo Song AHN  Seung Uk HAN  Satoru YAMADA  Byung-Sick MOON  Churoo PARK  Hong-Sun HWANG  Seong-Jin JANG  Joo Sun CHOI  Young-Hyun JUN  Soo-Won KIM  

     
    PAPER-Integrated Electronics

      Vol:
    E94-C No:10
      Page(s):
    1690-1697

    A reliable antifuse scheme has been very hard to build, which has precluded its implementation in DRAM products. We devised a very reliable multi-cell structure to cope with the large process variation in the DRAM-cell-capacitor type antifuse system. The programming current did not rise above 564 µA even in the nine-cell case. The cumulative distribution of the successful rupture in the multi-cell structure could be curtailed dramatically to less than 15% of the single-cell's case and the recovery problem of programmed cells after the thermal stress (300) had disappeared. In addition, we also presented a Post-Package Repair (PPR) scheme that could be directly coupled to the external high-voltage power rail via an additional pin with small protection circuits, saving the chip area otherwise consumed by the internal pump circuitry. A 1 Gbit DDR SDRAM was fabricated using Samsung's advanced 50 nm DRAM technology, successfully proving the feasibility of the proposed antifuse system implemented in it.

  • A New 1T DRAM Cell: Cone Type 1T DRAM Cell

    Gil Sung LEE  Doo-Hyun KIM  Seongjae CHO  Byung-Gook PARK  

     
    PAPER

      Vol:
    E94-C No:5
      Page(s):
    681-685

    We propose a new cone-type DRAM cell as a 1T DRAM cell. The superiority of cone shape is already reported, in that the electric field concentration effect encourages impact ionization phenomenon. So the device has improved DRAM characteristics compared with cylinder type 1T DRAM Cell (SGVC Cell). To confirm the memory operation of the cone-type DRAM cell, simulation works were carried out. Also, retention characteristic shows the device can be used practically.

  • Impact of Floating Body Type DRAM with the Vertical MOSFET

    Yuto NORIFUSA  Tetsuo ENDOH  

     
    PAPER

      Vol:
    E94-C No:5
      Page(s):
    705-711

    Several kinds of capacitor-less DRAM cells based on planar SOI-MOSFET technology have been proposed and researched to overcome the integration limit of the conventional DRAM. In this paper, we propose the Floating Body type DRAM cell array architecture with the Vertical MOSFET and discuss its basic operation using a 3-D device simulator. In contrast to previous planar SOI-MOSFET technology, the Floating Body type DRAM with the Vertical MOSFET achieves a cell area of 4F2 and obtain its floating body cell by isolating the body from the substrate vertically by the bottom-electrode. Therefore, the necessity for a SOI substrate is eliminated. In this paper, the cell array architecture of Floating Body type 1T-DRAM is proposed, and furthermore, the basic memory operations of read, write, and erase for Vertical type 1 transistor (1T) DRAM in the 45 nm technology node are shown. In addition, the retention and disturb characteristics of the Vertical type 1T-DRAM are discussed.

  • Novel 1T DRAM Cell for Low-Voltage Operation and Long Data Retention Time

    Woojun LEE  Kwangsoo KIM  Woo Young CHOI  

     
    PAPER-Integrated Electronics

      Vol:
    E94-C No:1
      Page(s):
    110-115

    A novel one-transistor dynamic random access memory (1T DRAM) cell has been proposed for a low-voltage operation and longer data retention time. The proposed 1T DRAM cell has three features compared with a conventional 1T DRAM cell: low body doping concentration, a recessed gate structure, and a P + poly-Si gate. Simulation results show that the proposed 1T DRAM cell has < 1-ns program time and > 100-ms data retention time under the condition of sub-1-V operating voltage.

  • DDR3 SDRAM with a Complete Predictor

    Vladimir V. STANKOVIC  Nebojsa Z. MILENKOVIC  

     
    LETTER-Computer System

      Vol:
    E93-D No:9
      Page(s):
    2635-2638

    In the arsenal of resources for improving computer memory system performance, predictors have gained an increasing role in the past few years. They enable hiding the latencies when accessing cache or main memory. In our previous work we proposed a DDR SDRAM controller with predictors that not only close the opened DRAM row but also predict the next row to be opened. In this paper we explore the possibilities of trying the same techniques on the latest type of DRAM memory, DDR3 SDRAM, with further improvements of the predictors.

  • A Bandwidth Optimized, 64 Cycles/MB Joint Parameter Decoder Architecture for Ultra High Definition H.264/AVC Applications

    Jinjia ZHOU  Dajiang ZHOU  Xun HE  Satoshi GOTO  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E93-A No:8
      Page(s):
    1425-1433

    In this paper, VLSI architecture of a joint parameter decoder is proposed to realize the calculation of motion vector (MV), intra prediction mode (IPM) and boundary strength (BS) for ultra high definition H.264/AVC applications. For this architecture, a 64-cycle-per-MB pipeline with simplified control modes is designed to increase system throughput and reduce hardware cost. Moreover, in order to save memory bandwidth, the data which includes the motion information for the co-located picture and the last decoded line, is pre-processed before being stored to DRAM. A partition based storage format is applied to condense the MB level data, while variable length coding based compression method is utilized to reduce the data size in each partition. Experimental results show our design is capable of real-time 38402160@60 fps decoding at less than 133 MHz, with 37.2 k logic gates. Meanwhile, by applying the proposed scheme, 85-98% bandwidth saving is achieved, compared with storing the original information for every 44 block to DRAM.

  • Adaptive Circuits for the 0.5-V Nanoscale CMOS Era Open Access

    Kiyoo ITOH  Masanao YAMAOKA  Takashi OSHIMA  

     
    INVITED PAPER

      Vol:
    E93-C No:3
      Page(s):
    216-233

    The minimum operating voltage, Vmin, of nanoscale CMOS LSIs is investigated to breach the 1-V wall that we are facing in the 65-nm device generation, and open the door to the below 0.5-V era. A new method using speed variation is proposed to evaluate Vmin. It shows that Vmin is very sensitive to the lowest necessary threshold voltage, Vt0, of MOSFETs and to threshold-voltage variations, Δ Vt, which become more significant with device scaling. There is thus a need for low-Vt0 circuits and ΔVt-immune MOSFETs to reduce Vmin. For memory-rich LSIs, the SRAM block is particularly problematic because it has the highest Vmin. Various techniques are thus proposed to reduce the Vmin: using RAM repair, shortening the data line, up-sizing, and using more relaxed MOSFET scaling. To effectively reduce Vmin of other circuit blocks, dual-Vt0 and dual-VDD circuits using gate-source reverse biasing, temporary activation, and series connection of another small low-Vt0 MOSFET are proposed. They are dynamic logic circuits enabling the power-delay product of the conventional static CMOS inverter to be reduced to 0.09 at a 0.2-V supply, and a DRAM dynamic sense amplifier and power switches operable at below 0.5 V. In addition, a fully-depleted structure (FD-SOI) and fin-type structure (FinFET) for ΔVt-immune MOSFETs are discussed in terms of their low-voltage potential and challenges. As a result, the height up-scalable FinFETs turns out to be quite effective to reduce Vmin to less than 0.5 V, if combined with the low-Vt0 circuits. For mixed-signal LSIs, investigation of low-voltage potential of analog circuits, especially for comparators and operational amplifiers, reveals that simple inverter op-amps, in which the low gain and nonlinearity are compensated for by digitally assisted analog designs, are crucial to 0.5-V operations. Finally, it is emphasized that the development of relevant devices and fabrication processes is the key to the achievement of 0.5-V nanoscale LSIs.

  • Solid-State Disk with Double Data Rate DRAM Interface for High-Performance PCs

    Dong KIM  Kwanhu BANG  Seung-Hwan HA  Chanik PARK  Sung Woo CHUNG  Eui-Young CHUNG  

     
    LETTER-Computer Systems

      Vol:
    E92-D No:4
      Page(s):
    727-731

    We propose a Solid-State Disk (SSD) with a Double Data Rate (DDR) DRAM interface for high-performance PCs. Traditional SSDs simply inherit the interface protocol of Hard Disk Drives (HDD) such as Parallel Advanced Technology Attachment (PATA) or Serial-ATA (SATA) for maintaining the compatibility. However, SSD itself provides much higher performance than HDD, hence the interface also needs to be enhanced. Unlike the traditional SSDs, the proposed SSD with DDR DRAM interface is placed in the North Bridge which provides two or more DDR DRAM interface ports in high-performance PCs. The novelty of our work is on DQS signaling scheme which allows arbitrary Column Address Strobe (CAS) latency unlike typical DDR DRAM interface scheme. The experimental results show that the proposed SSD maximally outperforms the traditional SSD by 8.7 times in read mode, by 1.5 times in write mode. Also, for synthetic workloads, the proposed scheme shows performance improvement over the conventional architecture by a factor of 1.6 times.

  • Reducing On-Chip DRAM Energy via Data Transfer Size Optimization

    Takatsugu ONO  Koji INOUE  Kazuaki MURAKAMI  Kenji YOSHIDA  

     
    PAPER

      Vol:
    E92-C No:4
      Page(s):
    433-443

    This paper proposes a software-controllable variable line-size (SC-VLS) cache architecture for low power embedded systems. High bandwidth between logic and a DRAM is realized by means of advanced integrated technology. System-in-Silicon is one of the architectural frameworks to realize the high bandwidth. An ASIC and a specific SRAM are mounted onto a silicon interposer. Each chip is connected to the silicon interposer by eutectic solder bumps. In the framework, it is important to reduce the DRAM energy consumption. The specific DRAM needs a small cache memory to improve the performance. We exploit the cache to reduce the DRAM energy consumption. During application program executions, an adequate cache line size which produces the lowest cache miss ratio is varied because the amount of spatial locality of memory references changes. If we employ a large cache line size, we can expect the effect of prefetching. However, the DRAM energy consumption is larger than a small line size because of the huge number of banks are accessed. The SC-VLS cache is able to change a line size to an adequate one at runtime with a small area and power overheads. We analyze the adequate line size and insert line size change instructions at the beginning of each function of a target program before executing the program. In our evaluation, it is observed that the SC-VLS cache reduces the DRAM energy consumption up to 88%, compared to a conventional cache with fixed 256 B lines.

  • DRAM Controller with a Complete Predictor

    Vladimir V. STANKOVIC  Nebojsa Z. MILENKOVIC  

     
    PAPER-Computer Systems

      Vol:
    E92-D No:4
      Page(s):
    584-593

    In the arsenal of resources for computer memory system performance improvement, predictors have gained an increasing role in the past years. They can suppress the latencies when accessing cache or main memory. In paper [1] it is shown how temporal parameters of cache memory access, defined as live time, dead time and access interval could be used for prediction of data prefetching. This paper examines the feasibility of applying an analog technique on controlling of opening/closing DRAM memory rows, with various improvements. The results described herein confirm the feasibility, and allow us to propose a DRAM controller with predictors that not only close the opened DRAM row, but also predict the next row to be opened.

  • Cache Optimization for H.264/AVC Motion Compensation

    Sangyong YOON  Soo-Ik CHAE  

     
    LETTER-Image Processing and Video Processing

      Vol:
    E91-D No:12
      Page(s):
    2902-2905

    In this letter, we propose a cache organization that substantially reduces the memory bandwidth of motion compensation (MC) in the H.264/AVC decoders. To reduce duplicated memory accesses to P and B pictures, we employ a four-way set-associative cache in which its index bits are composed of horizontal and vertical address bits of the frame buffer and each line stores an 8 2 pixel data in the reference frames. Moreover, we alleviate the data fragmentation problem by selecting its line size that equals the minimum access size of the DDR SDRAM. The bandwidth of the optimized cache averaged over five QCIF IBBP image sequences requires only 129% of the essential bandwidth of an H.264/AVC MC.

1-20hit(107hit)