
Keyword Search Result

[Keyword] memory(653hit)

141-160hit(653hit)

  • Reducing Aging Effects on Ternary CAM

    Ing-Chao LIN  Yen-Han LEE  Sheng-Wei WANG  

     
    PAPER-Integrated Electronics

      Vol:
    E99-C No:7
      Page(s):
    878-891

    Ternary content addressable memory (TCAM), which can store 0, 1, or X in its cells, is widely used to store routing tables in network routers. Negative bias temperature instability (NBTI) and positive bias temperature instability (PBTI), which increase Vth and degrade transistor switching speed, have become major reliability challenges. This study analyzes the signal probability of routing tables. The results show that many cells retain static stress and suffer significant degradation caused by NBTI and PBTI effects. The bit flipping technique is improved and proactive power gating recovery is proposed to mitigate NBTI and PBTI effects. In order to maintain the functionality of TCAM after bit flipping, a novel TCAM cell design is proposed. Simulation results show that compared to the original architecture, the bit flipping technique improves read static noise margin (SNM) for data and mask cells by 16.84% and 29.94%, respectively, and reduces search time degradation by 12.95%. The power gating technique improves read SNM for data and mask cells by 12.31% and 20.92%, respectively, and reduces search time degradation by 17.57%. When both techniques are used, read SNM for data and mask cells is improved by 17.74% and 30.53%, respectively, and search time degradation is reduced by 21.01%.
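
A minimal software sketch of ternary matching combined with the bit flipping idea. It assumes each entry stores a data word, a care mask (1 = compare, 0 = X), and a hypothetical `flipped` flag. When the stored data bits are inverted to alternate static stress, inverting the search key keeps the match result unchanged. The paper's cell-level design is not reproduced here, and flipping of mask cells is omitted.

```python
"""Ternary match with a per-entry bit-flip flag (illustrative sketch, not the paper's cell design)."""
from dataclasses import dataclass


@dataclass(frozen=True)
class TcamEntry:
    data: int              # stored data bits (possibly inverted)
    care: int              # mask: 1 = compare this bit, 0 = don't care (X)
    flipped: bool = False  # hypothetical flag: True if `data` is stored inverted


def matches(entry: TcamEntry, key: int, width: int) -> bool:
    """Return True if `key` agrees with the entry on every 'care' bit."""
    full = (1 << width) - 1
    # Instead of restoring the stored bits, invert the search key.
    probe = (~key & full) if entry.flipped else key
    return ((probe ^ entry.data) & entry.care) == 0


def flip(entry: TcamEntry, width: int) -> TcamEntry:
    """Store the complement of the data bits so each cell sees the opposite stress."""
    full = (1 << width) - 1
    return TcamEntry(~entry.data & full, entry.care, not entry.flipped)
```

For example, the rule `10X1` is `TcamEntry(data=0b1001, care=0b1101)`; it matches keys `1001` and `1011` both before and after `flip`.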

  • PBGC: Proxy Block-Based Garbage Collection for Index Structures in NAND Flash Memory

    Seon Hwan KIM  Ju Hee CHOI  Jong Wook KWAK  

     
    LETTER-Computer System

      Publicized:
    2016/04/01
      Vol:
    E99-D No:7
      Page(s):
    1928-1932

    In this letter, we propose a novel garbage collection technique for index structures in flash memory systems, called Proxy Block-based Garbage Collection (PBGC). Many index structures have been proposed for flash memory systems. They exploit buffers and logs to resolve the update propagation problem, one of the main causes of performance degradation in index structures. However, these studies overlooked the fact that not only record operations but also garbage collection induces the update propagation problem. The proposed PBGC exploits a proxy block and a block mapping table to solve the update propagation problem caused by the page and block changes that garbage collection makes. Experiments show that PBGC decreased the execution time of garbage collection by up to 39% compared with previous garbage collection techniques.
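
The toy model below illustrates the proxy-block idea under stated assumptions: index nodes store a (proxy block, page) pair, and a hypothetical block mapping table translates proxy blocks to physical blocks. Garbage collection then relocates pages and updates a single table entry instead of rewriting the parent index nodes. Class and method names are illustrative, and invalid-page tracking and erase-before-write constraints are omitted.

```python
"""Proxy-block indirection sketch: GC remaps a block without touching index nodes."""


class FlashWithProxyBlocks:
    def __init__(self) -> None:
        self.block_map: dict[int, int] = {}               # proxy block -> physical block
        self.physical: dict[int, dict[int, bytes]] = {}   # physical block -> {page: data}
        self.next_free = 0

    def _alloc_block(self) -> int:
        pbn = self.next_free
        self.next_free += 1
        self.physical[pbn] = {}
        return pbn

    def write(self, proxy_block: int, page: int, data: bytes) -> tuple[int, int]:
        """Toy write: an overwrite simply replaces the page entry."""
        if proxy_block not in self.block_map:
            self.block_map[proxy_block] = self._alloc_block()
        self.physical[self.block_map[proxy_block]][page] = data
        # The index node keeps this (proxy block, page) pair, never a physical address.
        return proxy_block, page

    def read(self, proxy_block: int, page: int) -> bytes:
        return self.physical[self.block_map[proxy_block]][page]

    def garbage_collect(self, proxy_block: int) -> None:
        """Copy pages to a fresh block; only the block mapping table changes."""
        old = self.block_map[proxy_block]
        new = self._alloc_block()
        self.physical[new] = dict(self.physical[old])  # migrate pages at the same offsets
        del self.physical[old]                         # the victim block is erased
        self.block_map[proxy_block] = new
```

Because the index keeps `(7, 0)` rather than a physical location, a read through the same reference still succeeds after `garbage_collect(7)`.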

  • A Dynamic Switching Flash Translation Layer Based on Page-Level Mapping

    Dongchul PARK  Biplob DEBNATH  David H.C. DU  

     
    PAPER-Computer System

      Publicized:
    2016/03/14
      Vol:
    E99-D No:6
      Page(s):
    1502-1511

    The Flash Translation Layer (FTL) is a firmware layer inside NAND flash memory that allows existing disk-based applications to use it without significant modification. Since the FTL has a critical impact on the performance and reliability of flash-based storage, a variety of FTLs have been proposed. Existing FTLs, however, are designed to perform well for either read-intensive or write-intensive workloads, but not both, because of their internal address mapping schemes. To overcome this limitation, we propose a novel hybrid FTL scheme named Convertible Flash Translation Layer (CFTL). CFTL adapts to data access patterns with the help of our hot data identification design, which adopts multiple Bloom filters. Thus, CFTL can dynamically switch its mapping scheme to either page-level mapping or block-level mapping to fully exploit the benefits of both schemes. In addition, we design a spatial locality-aware caching mechanism and adaptive cache partitioning to further improve CFTL performance. Consequently, both the adaptive switching scheme and the judicious caching mechanism enable CFTL to achieve good read and write performance. Our extensive evaluations demonstrate that CFTL outperforms existing FTLs. In particular, our specially designed caching mechanism improves the cache hit ratio by an average of 2.4×, and achieves much higher hit ratios (up to 8.4×) for random read-intensive workloads.
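
A sketch of hot-data identification with multiple Bloom filters, assuming a round-robin design: each write of a logical page number (LPN) is recorded in a filter that does not yet hold it, a page counts as hot when enough filters contain it, and the oldest filter is periodically cleared so that old writes decay. All parameter values (4 filters, 1024 bits, 2 hashes, threshold 2) are hypothetical, not CFTL's actual configuration.

```python
"""Hot-data identification with several rotating Bloom filters (sketch)."""
import hashlib


class MultiBloomHotDetector:
    def __init__(self, num_filters: int = 4, bits: int = 1024, hashes: int = 2,
                 hot_threshold: int = 2, decay_interval: int = 512) -> None:
        self.filters = [0] * num_filters   # each Bloom filter is an int bitmap
        self.bits, self.hashes = bits, hashes
        self.hot_threshold = hot_threshold
        self.decay_interval = decay_interval
        self.current = 0                   # filter that receives new writes first
        self.writes = 0

    def _positions(self, lpn: int) -> list[int]:
        out = []
        for i in range(self.hashes):
            digest = hashlib.blake2b(f"{i}:{lpn}".encode(), digest_size=8).digest()
            out.append(int.from_bytes(digest, "little") % self.bits)
        return out

    def _contains(self, f: int, lpn: int) -> bool:
        return all((self.filters[f] >> p) & 1 for p in self._positions(lpn))

    def record_write(self, lpn: int) -> None:
        # Insert into the next filter that lacks the LPN, so the number of
        # filters holding it approximates its recent write frequency.
        n = len(self.filters)
        for k in range(n):
            f = (self.current + k) % n
            if not self._contains(f, lpn):
                for p in self._positions(lpn):
                    self.filters[f] |= 1 << p
                break
        self.writes += 1
        if self.writes % self.decay_interval == 0:
            # Aging: advance to the oldest filter and clear it.
            self.current = (self.current + 1) % n
            self.filters[self.current] = 0

    def is_hot(self, lpn: int) -> bool:
        count = sum(self._contains(f, lpn) for f in range(len(self.filters)))
        return count >= self.hot_threshold
```

Using bit-level filters instead of per-page counters keeps the identification state small, which is the usual motivation for this kind of design.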

  • A New High-Density 10T CMOS Gate-Array Base Cell for Two-Port SRAM Applications

    Nobutaro SHIBATA  Yoshinori GOTOH  Takako ISHIHARA  

     
    PAPER-Integrated Electronics

      Vol:
    E99-C No:6
      Page(s):
    717-726

    Two-port SRAMs are frequently installed in gate-array VLSIs to implement smart functions. This paper presents a new high-density 10T CMOS base cell for gate-array-based two-port SRAM applications. Using a single base cell alone, we can implement a two-port memory cell whose bitline contacts are shared with the memory cell adjacent to one of two dedicated sides, greatly reducing the parasitic capacitance of the bitlines. To evaluate the overall performance obtained with the base cell, a plain two-port SRAM macro was designed and fabricated with a 0.35-µm low-cost logic process. Each of the two 10-bit power-saving address decoders was built with 36% fewer base cells by employing complex gates and a subdecoder. The new sense amplifier with a complementary sensing scheme had a fine sensitivity of 35 mVpp, which allowed the required read bitline signal to be reduced from 250 to 70 mVpp. For the macro with 1024 memory cells per bitline, the address access time under typical conditions of a 2.5-V power supply and 25°C was 4.0 ns (equal to that obtained with full-custom design), and the power consumption during simultaneous 200-MHz operation of both ports was 6.7 mW for an I/O data width of 1 bit.

  • Bias Polarity Dependent Resistive Switching Behaviors in Silicon Nitride-Based Memory Cell

    Sungjun KIM  Min-Hwi KIM  Seongjae CHO  Byung-Gook PARK  

     
    BRIEF PAPER

      Vol:
    E99-C No:5
      Page(s):
    547-550

    In this work, the bias-polarity-dependent resistive switching behaviors of a Cu/Si3N4/p+ Si RRAM memory cell were closely studied. Different switching characteristics in unipolar and bipolar modes after positive forming are investigated. The bipolar switching did not need a forming process and showed better characteristics, including endurance cycling, uniformity of switching parameters, and on/off resistance ratio. The resistive switching characteristics after positive and negative forming are also compared. Both unipolar and bipolar modes after negative forming were confirmed to exhibit inferior resistive switching performance due to the high forming voltage and current.

  • Variation of SCM/NAND Flash Hybrid SSD Performance, Reliability and Cost by Using Different SSD Configurations and Error Correction Strengths

    Hirofumi TAKISHITA  Shuhei TANAKAMARU  Sheyang NING  Ken TAKEUCHI  

     
    PAPER

      Vol:
    E99-C No:4
      Page(s):
    444-451

    A hybrid Solid-State Drive (SSD) combining Storage-Class Memory (SCM) and NAND flash offers higher performance and lower power consumption than a NAND-flash-only SSD. In this paper, three SSD configurations are first investigated. Three different SCMs with read/write latencies of 0.1 µs, 1 µs, and 10 µs are used, and the SCM/NAND flash capacity ratios required to maintain the same SSD performance are analyzed. Next, using the three SSD configurations, the variation of SSD reliability, performance, and cost is analyzed while changing the error correction strength. The acceptable SCM and NAND flash Bit Error Rates (BERs), which determine SSD reliability, are limited by the need to achieve the specified SSD performance with error correction and/or by the SCM and NAND flash parity size and SSD cost. Lastly, the SSD replacement cost is analyzed, taking into account the limited number of NAND flash write/erase cycles. The purpose of this paper is to provide a design guideline for a high-performance, highly reliable, and cost-effective SCM/NAND hybrid SSD with ECC.

  • HaWL: Hidden Cold Block-Aware Wear Leveling Using Bit-Set Threshold for NAND Flash Memory

    Seon Hwan KIM  Ju Hee CHOI  Jong Wook KWAK  

     
    LETTER-Computer System

      Publicized:
    2016/01/13
      Vol:
    E99-D No:4
      Page(s):
    1242-1245

    In this letter, we propose a novel wear leveling technique called Hidden cold block-aware Wear Leveling (HaWL), which uses a bit-set threshold. HaWL prolongs the lifetime of flash memory devices by using a bit array table in wear leveling. The bit array table records the history of block erasures over a period and distinguishes cold blocks from all other blocks. In addition, HaWL can reduce the size of the bit array table by using a one-to-many mode, in which one bit is associated with many blocks. Moreover, to prevent degradation of wear leveling in the one-to-many mode, HaWL uses a bit-set threshold (BST) to increase the accuracy of the cold block information. Our experimental results show that HaWL prolongs the lifetime of flash memory by up to 48% compared with previous wear leveling techniques.
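
A sketch of one-to-many bit array tracking with a bit-set threshold. It assumes each bit covers `blocks_per_bit` consecutive blocks and that a small per-group erase counter (an assumption, not stated in the abstract) decides when the bit is set. Blocks whose group bit stays clear for the whole period are reported as cold. Names are hypothetical.

```python
"""One-to-many bit array with a bit-set threshold (BST) for cold-block detection (sketch)."""


class ColdBlockTracker:
    def __init__(self, num_blocks: int, blocks_per_bit: int, bit_set_threshold: int) -> None:
        self.num_blocks = num_blocks
        self.blocks_per_bit = blocks_per_bit
        self.bst = bit_set_threshold
        groups = (num_blocks + blocks_per_bit - 1) // blocks_per_bit
        self.bits = [False] * groups   # one bit per group of blocks
        self.erases = [0] * groups     # assumed per-group erase counter for this period

    def on_erase(self, block: int) -> None:
        group = block // self.blocks_per_bit
        self.erases[group] += 1
        # Set the bit only after BST erases, so a single erase does not
        # mark every block in the group as hot.
        if self.erases[group] >= self.bst:
            self.bits[group] = True

    def cold_blocks(self) -> list[int]:
        """Blocks whose group bit is still clear in this period."""
        return [b for b in range(self.num_blocks)
                if not self.bits[b // self.blocks_per_bit]]

    def end_period(self) -> None:
        """Start a new observation period."""
        self.bits = [False] * len(self.bits)
        self.erases = [0] * len(self.erases)
```

With 8 blocks, 4 blocks per bit, and a BST of 2, one erase of block 1 leaves all blocks cold; a second erase in the same group marks blocks 0 to 3 as hot.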

  • A SOI Cache-Tag Memory with Dual-Rail Wordline Scheme

    Nobutaro SHIBATA  Takako ISHIHARA  

     
    PAPER-Integrated Electronics

      Vol:
    E99-C No:2
      Page(s):
    316-330

    Cache memories are the major application of high-speed SRAMs, and they are frequently installed in high-performance logic VLSIs, including microprocessors. This paper presents a 4-way set-associative SOI cache-tag memory. To obtain higher operating speed with less power dissipation, we devised an I/O-separated memory cell with a dual-rail wordline, which is used to transmit complementary selection signals. The address decoding delay was shortened using CMOS dual-rail logic. To enhance the maximum operating frequency, bitline recovery operations after data writes were eliminated by using a memory array configuration without half-selected cells. Moreover, conventional differential amplifiers, which are sensitive but slow, were removed from the data I/O circuitry by means of a hierarchical bitline scheme. For stored-data management, we devised a new hardware-oriented LRU data replacement algorithm based on a 6-bit directed graph. Experimental results obtained with a test chip fabricated in a 0.25-µm CMOS/SIMOX process show that the core of the cache-tag memory with a 1024-set configuration can achieve a 1.5-ns address access time under typical conditions of a 2-V power supply and 25°C. The power dissipation was less than 14 µW during standby and 13-83 mW at 500-MHz operation, depending on the bit-stream data pattern.
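
For 4 ways there are C(4,2) = 6 pairs of ways, so one bit per pair (the direction of an edge in a directed graph) is enough to record the full recency order. The sketch below assumes that pairwise-order encoding; the paper's exact hardware bit assignment is not given in the abstract.

```python
"""4-way LRU state kept as 6 pairwise-order bits, one edge per pair of ways."""
from itertools import combinations

PAIRS = list(combinations(range(4), 2))  # (0,1),(0,2),(0,3),(1,2),(1,3),(2,3)


def touch(state: int, way: int) -> int:
    """Mark `way` as most recently used: it becomes newer than every other way."""
    for bit, (a, b) in enumerate(PAIRS):
        if way == a:
            state |= 1 << bit     # bit = 1: way a is newer than way b
        elif way == b:
            state &= ~(1 << bit)  # bit = 0: way b is newer than way a
    return state


def victim(state: int) -> int:
    """Return the least recently used way, i.e. the one older than all three others."""
    for way in range(4):
        older_than_all = True
        for bit, (a, b) in enumerate(PAIRS):
            a_newer = bool((state >> bit) & 1)
            if (way == a and a_newer) or (way == b and not a_newer):
                older_than_all = False
        if older_than_all:
            return way
    raise ValueError("inconsistent LRU state")
```

After accessing ways 0, 1, 2, 3 in order, `victim` returns 0; touching way 0 again makes way 1 the victim. Each access only sets or clears the three bits that involve the accessed way, which suits a hardware implementation.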

  • Energy-Scalable 4KB LDPC Decoding Architecture for NAND-Flash-Based Storage Systems

    Youngjoo LEE  Jaehwan JUNG  In-Cheol PARK  

     
    PAPER-Electronic Circuits

      Vol:
    E99-C No:2
      Page(s):
    293-301

    This paper presents a novel low-power decoder architecture for the (36420, 32778) binary LDPC code targeting energy-efficient NAND-flash-based mobile devices. The proposed energy-scalable decoding algorithm reduces the operating bit-width of decoding function units at the early-use stage where the channel condition is good enough to lower the precision of computation. Based on a flexible adder structure, the decoding energy of the proposed LDPC decoder can be reduced by freezing the unnecessary parts of hardware resources. A prototype 4KB LDPC decoder is designed in a 65nm CMOS technology, which achieves an average decoding throughput of 8.13Gb/s with 1.2M equivalent gates. The power consumption of the decoder ranges from 397mW to 563mW depending on operating conditions.

  • Performance of Dynamic Instruction Window Resizing for a Given Power Budget under DVFS Control

    Hideki ANDO  Ryota SHIOYA  

     
    PAPER-Computer System

      Publicized:
    2015/11/12
      Vol:
    E99-D No:2
      Page(s):
    341-350

    Dynamic instruction window resizing (DIWR) is a scheme that effectively exploits both memory-level parallelism and instruction-level parallelism by configuring the instruction window size appropriately for each type of parallelism. Although a previous study showed that the DIWR processor achieves a significant speedup, its power consumption has not been explored. DIWR increases power consumption because the instruction window resources are enlarged in memory-intensive phases. If the power consumption exceeds the power budget determined by certain requirements, the DIWR processor must save power, and thus the previously reported performance cannot be achieved. In this paper, we explore to what extent the DIWR processor can improve performance for a given power budget, assuming that dynamic voltage and frequency scaling (DVFS) is used as the power-saving technique. Evaluation results using the SPEC2006 benchmark programs show that the DIWR processor, even with a constrained power budget, achieves a speedup over the conventional processor across a wide range of power budgets. At the most important power budget point, i.e., when the power a conventional processor consumes without any power constraint is supplied, DIWR achieves a 16% speedup.

  • Code Generation Limiting Maximum and Minimum Hamming Distances for Non-Volatile Memories

    Tatsuro KOJO  Masashi TAWADA  Masao YANAGISAWA  Nozomu TOGAWA  

     
    PAPER-High-Level Synthesis and System-Level Design

      Vol:
    E98-A No:12
      Page(s):
    2484-2493

    Data stored in non-volatile memories may be corrupted by crosstalk and radiation, but they can be restored using error-correcting codes. However, non-volatile memories consume a large amount of energy in writing. Reducing the maximum number of written bits while still using error-correcting codes is one of the challenges in non-volatile memory design. In this paper, we first propose the Doughnut code, which is based on state encoding that limits the maximum and minimum Hamming distances. We then propose a code expansion method that improves the maximum and minimum Hamming distances. Applying our code expansion method to the Doughnut code yields a code that reduces the maximum number of flipped bits and has error-correcting ability equal to that of the Hamming code. Experimental results show that the proposed code efficiently reduces the maximum number of written bits.
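
To make the two quantities concrete, the sketch below computes the minimum and maximum pairwise Hamming distances of a code. It uses the standard Hamming(7,4) code as an example, not the Doughnut code: its minimum distance of 3 gives single-error correction, while its maximum distance of 7 (it contains both 0000000 and 1111111) means that overwriting one codeword with another can flip all seven bits.

```python
"""Minimum and maximum Hamming distances of a code, using Hamming(7,4) as an example."""
from itertools import combinations, product


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def hamming74_codewords() -> list[int]:
    """All 16 codewords of the standard Hamming(7,4) code."""
    words = []
    for d1, d2, d3, d4 in product((0, 1), repeat=4):
        p1 = d1 ^ d2 ^ d4
        p2 = d1 ^ d3 ^ d4
        p3 = d2 ^ d3 ^ d4
        bits = (p1, p2, d1, p3, d2, d3, d4)  # conventional positions 1..7
        words.append(int("".join(map(str, bits)), 2))
    return words


def distance_range(code: list[int]) -> tuple[int, int]:
    """(minimum, maximum) Hamming distance over all pairs of distinct codewords."""
    distances = [hamming(a, b) for a, b in combinations(code, 2)]
    return min(distances), max(distances)


print(distance_range(hamming74_codewords()))  # prints (3, 7)
```

A write-efficient code aims to lower the maximum while keeping the minimum large enough for the required error correction.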

  • ECC-Based Bit-Write Reduction Code Generation for Non-Volatile Memory

    Masashi TAWADA  Shinji KIMURA  Masao YANAGISAWA  Nozomu TOGAWA  

     
    PAPER-High-Level Synthesis and System-Level Design

      Vol:
    E98-A No:12
      Page(s):
    2494-2504

    Non-volatile memory has many advantages, such as high density and low leakage power, but it consumes more write energy than SRAM. Reducing write energy is therefore essential in non-volatile memory design. In this paper, we propose write-reduction codes based on error-correcting codes that reduce write energy in non-volatile memory by decreasing the number of written bits. When data are written into memory cells, they are not written directly but are first encoded into a codeword. In our write-reduction codes, every data word corresponds to an information vector of an error-correcting code, and each information vector corresponds not to a single codeword but to a set of write-reduction codewords. Given the data to be written and the current memory bits, we can deterministically select a particular write-reduction codeword corresponding to the data such that the maximum number of flipped bits is theoretically minimized. As a result, the number of bits written into memory cells is also minimized. Experimental results demonstrate that our codes reduce the number of written bits by an average of 51% and energy by an average of 33% compared with non-encoded memory.
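
The selection step can be sketched as picking, among the candidate codewords for the data, the one closest in Hamming distance to the current memory contents. As a simplified stand-in for the paper's ECC-derived codeword sets, each 4-bit data value here has two candidates, the data with a 0 flag bit and its complement with a 1 flag bit (as in the Flip-N-Write scheme), so at most 2 of the 5 stored bits ever flip. `WIDTH`, `candidates`, and `decode` are hypothetical names.

```python
"""Select the candidate codeword that flips the fewest stored bits (Flip-N-Write stand-in)."""

WIDTH = 4                   # data bits (assumed)
MASK = (1 << WIDTH) - 1


def candidates(data: int) -> list[int]:
    """Stored word = data bits followed by a flag bit; the two candidates are complements."""
    return [(data << 1) | 0, ((~data & MASK) << 1) | 1]


def decode(word: int) -> int:
    """Recover the data: invert the payload when the flag bit is 1."""
    payload = word >> 1
    return (~payload & MASK) if word & 1 else payload


def select_codeword(data: int, current: int) -> int:
    """Choose the candidate with the minimum Hamming distance to the current memory bits."""
    return min(candidates(data), key=lambda c: bin(c ^ current).count("1"))


print(bin(select_codeword(0b1111, 0b00000)))  # prints 0b1
```

Because the two candidates differ in all five positions, their distances to any current word sum to 5, which bounds the flips per write at 2.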

  • On Two Strong Converse Theorems for Discrete Memoryless Channels

    Yasutada OOHAMA  

     
    LETTER-Shannon Theory

      Vol:
    E98-A No:12
      Page(s):
    2471-2475

    In 1973, Arimoto proved the strong converse theorem for discrete memoryless channels, which states that when the transmission rate R is above the channel capacity C, the decoding error probability goes to one as the block length n of the code tends to infinity. He proved the theorem by deriving an exponent function for the probability of correct decoding that is positive if and only if R > C. Subsequently, in 1979, Dueck and Körner determined the optimal exponent of correct decoding. Arimoto's bound has been said to be equal to the bound of Dueck and Körner; however, a rigorous proof of this has not been presented so far. In this paper, we give a rigorous proof that Arimoto's bound is equivalent to that of Dueck and Körner.
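
In symbols, with $P_e^{(n)}$ the decoding error probability and $P_c^{(n)} = 1 - P_e^{(n)}$ the probability of correct decoding at block length $n$, the strong converse and the general shape of an exponential bound like Arimoto's can be written as follows. $E(R)$ stands for the exponent function; its exact expression is not reproduced here.

```latex
R > C \;\Longrightarrow\; \lim_{n \to \infty} P_e^{(n)} = 1,
\qquad
P_c^{(n)} \le \exp\{-n\,E(R)\}, \quad E(R) > 0 \iff R > C .
```

Since $E(R) > 0$ exactly when $R > C$, the right-hand bound drives $P_c^{(n)}$ to zero above capacity, which is the strong converse.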

  • Low-Power Motion Estimation Processor with 3D Stacked Memory

    Shuping ZHANG  Jinjia ZHOU  Dajiang ZHOU  Shinji KIMURA  Satoshi GOTO  

     
    PAPER

      Vol:
    E98-A No:7
      Page(s):
    1431-1441

    Motion estimation (ME) is a key encoding component of almost all modern video coding standards. ME contributes significantly to video coding efficiency, but it also consumes more power than any other component of a video encoder. In this paper, an ME processor with a 3D stacked memory architecture is proposed to reduce memory and core power consumption. First, a memory die is designed and stacked with the ME die. By adding face-to-face (F2F) pad and through-silicon via (TSV) definitions, 2D electronic design automation (EDA) tools can be extended to support the proposed 3D stacking architecture. Moreover, a dedicated memory controller manages data transmission and timing between the memory die and the ME processor die. Finally, a 3D physical design is completed for the entire system, including TSV/F2F placement, floor plan optimization, and power network generation. Compared with 2D technology, the number of input/output (IO) pins is reduced by 77%. After optimizing the floor plans of the processor die and memory die, the routing wire lengths are reduced by 13.4% and 50%, respectively. The stacked static random access memory (SRAM) contributes the most to the power reduction in this work. Simulation results show that the design can support real-time 720p at 60 fps encoding at 8 MHz using less than 65 mW, which is much better than state-of-the-art ME processors.
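
As an illustration of what an ME core computes, the sketch below performs full-search block matching with the sum of absolute differences (SAD). This is a generic textbook algorithm assumed here for illustration; the abstract does not state which search the processor uses. Every candidate position reads a block from the reference frame, which suggests why reducing memory access power matters for ME.

```python
"""Full-search block matching with SAD (generic sketch, not the paper's processor)."""
from typing import Optional

Frame = list[list[int]]


def sad(cur: Frame, ref: Frame, bx: int, by: int, dx: int, dy: int, n: int) -> int:
    """SAD between the n x n current block at (bx, by) and the reference block moved by (dx, dy)."""
    return sum(
        abs(cur[by + i][bx + j] - ref[by + dy + i][bx + dx + j])
        for i in range(n)
        for j in range(n)
    )


def full_search(cur: Frame, ref: Frame, bx: int, by: int, n: int,
                search_range: int) -> Optional[tuple[int, int, int]]:
    """Return (best SAD, dx, dy) over all displacements within +/- search_range."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            top, left = by + dy, bx + dx
            if top < 0 or left < 0 or top + n > height or left + n > width:
                continue  # candidate block falls outside the reference frame
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

If the current frame is the reference shifted so that `cur[y][x] == ref[y + 1][x + 2]`, the search returns a zero-SAD match at `dx = 2, dy = 1`.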

  • Hybrid Quaternionic Hopfield Neural Network

    Masaki KOBAYASHI  

     
    PAPER-Nonlinear Problems

      Vol:
    E98-A No:7
      Page(s):
    1512-1518

    In recent years, applications of complex-valued neural networks have become widespread. Quaternions are an extension of complex numbers, and neural networks with quaternions have been proposed. Because quaternion multiplication is non-commutative, two orders of multiplication can be considered when calculating the weighted input. However, both orders provide almost the same performance. We propose hybrid quaternionic Hopfield neural networks, which use both orders of multiplication. Computer simulations show that these networks outperform conventional quaternionic Hopfield neural networks in noise tolerance. We discuss why hybrid quaternionic Hopfield neural networks improve noise tolerance from the standpoint of rotational invariance.
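
The two multiplication orders can be seen directly in the Hamilton product: for the units i and j, i·j = k but j·i = -k. The sketch below computes a neuron's weighted input with the weight on the left or on the right. The function names are hypothetical, and how the hybrid network assigns the two orders is not reproduced.

```python
"""Quaternion Hamilton product and the two orders of weighted input (sketch)."""
from dataclasses import dataclass


@dataclass(frozen=True)
class Quaternion:
    w: float
    x: float
    y: float
    z: float

    def __add__(self, o: "Quaternion") -> "Quaternion":
        return Quaternion(self.w + o.w, self.x + o.x, self.y + o.y, self.z + o.z)

    def __mul__(self, o: "Quaternion") -> "Quaternion":
        """Hamilton product; in general self * o != o * self."""
        return Quaternion(
            self.w * o.w - self.x * o.x - self.y * o.y - self.z * o.z,
            self.w * o.x + self.x * o.w + self.y * o.z - self.z * o.y,
            self.w * o.y - self.x * o.z + self.y * o.w + self.z * o.x,
            self.w * o.z + self.x * o.y - self.y * o.x + self.z * o.w,
        )


ZERO = Quaternion(0, 0, 0, 0)


def weighted_input(weights: list[Quaternion], states: list[Quaternion],
                   weight_on_left: bool = True) -> Quaternion:
    """Sum of w_j * s_j (weight on the left) or s_j * w_j (weight on the right)."""
    total = ZERO
    for w, s in zip(weights, states):
        total = total + (w * s if weight_on_left else s * w)
    return total
```

With a single weight i and state j, the left-multiplied input is k and the right-multiplied input is -k, so the two orders really do produce different network dynamics in general.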

  • Memoryless and Adaptive State Feedback Controller for a Chain of Integrators with an Unknown Delay in the Input

    Ho-Lim CHOI  

     
    LETTER-Systems and Control

      Vol:
    E98-A No:7
      Page(s):
    1565-1568

    For systems with a delay in the input, the predictor method has often been used in state feedback controllers for system stabilization or regulation. In this letter, we show that for a chain of integrators, even with an unknown input delay, a much simpler, memoryless controller is a good candidate for system regulation. With an adaptive gain-scaling factor, the proposed state feedback controller can handle an unknown time-varying delay in the input. An example is given for illustration.

  • Resistance-Switching Characteristics of Si-rich Oxide Evaluated by Using Ni Nanodots as Electrodes in Conductive AFM Measurements

    Akio OHTA  Chong LIU  Takashi ARAI  Daichi TAKEUCHI  Hai ZHANG  Katsunori MAKIHARA  Seiichi MIYAZAKI  

     
    PAPER

      Vol:
    E98-C No:5
      Page(s):
    406-410

    Ni nanodots (NDs) used as nano-scale top electrodes were formed on a 10-nm-thick Si-rich oxide (SiOx)/Ni bottom electrode by exposing a 2-nm-thick Ni layer to remote H₂-plasma (H₂-RP) without external heating, and the resistance-switching behaviors of SiOx were investigated from current-voltage (I-V) curves. Atomic force microscope (AFM) analyses confirmed the formation of electrically isolated Ni NDs as a result of surface migration and agglomeration of Ni atoms promoted by the surface recombination of H radicals. From local I-V measurements performed by contacting a single Ni ND as a top electrode with a Rh-coated Si cantilever, a distinct unipolar resistance-switching behavior was observed repeatedly despite an average contact area between the Ni ND and the SiOx as small as ~1.9 × 10⁻¹² cm². This local I-V measurement technique is a simple method for evaluating the size scalability of switching properties.

  • A Detection and Measurement Approach for Memory Leaked Objects in Java Programs

    Qiao YU  Shujuan JIANG  Yingqi LIU  

     
    PAPER-Software Engineering

      Publicized:
    2015/02/04
      Vol:
    E98-D No:5
      Page(s):
    1053-1061

    A memory leak occurs when objects that are no longer useful cannot be released for a long time during program execution. Leaked objects may cause memory overflow and system performance degradation, and can even crash the system when the leak becomes serious. This paper presents a dynamic approach for detecting and measuring memory leaked objects in Java programs. First, our approach tracks the program using the Java Debug Interface (JDI) and records heap information to find potentially leaked objects. Second, we introduce a memory leaking confidence measure to quantify the influence of these objects on the program. Finally, we select three open-source programs to evaluate the efficiency of our approach. Furthermore, we choose ten programs from the DaCapo 9.12 benchmark suite to measure the time overhead of our approach. The experimental results show that our approach can detect and measure memory leaked objects efficiently.

  • High-Speed Design of Conflictless Name Lookup and Efficient Selective Cache on CCN Router

    Atsushi OOKA  Shingo ATA  Kazunari INOUE  Masayuki MURATA  

     
    PAPER-Network

      Vol:
    E98-B No:4
      Page(s):
    607-620

    Content-centric networking (CCN) is an innovative network architecture that is being considered as a successor to the Internet. In recent years, CCN has received increasing attention worldwide because its novel technologies (e.g., caching, multicast, and request aggregation) and its communication based on names that act as addresses for content have the potential to resolve various problems facing the Internet. Implementing these technologies, however, requires routers with performance far superior to that of today's Internet routers. Although many researchers have proposed various router components, such as caching and name lookup mechanisms, there are few router-level designs incorporating all the necessary components. The design and evaluation of a complete router is the primary contribution of this paper. We provide a concrete hardware design for a router model that uses three basic tables, namely the forwarding information base (FIB), the pending interest table (PIT), and the content store (CS), and incorporates two entities that we propose. One is the name lookup entity, which looks up a name address in content-addressable memory within a few cycles by using a Bloom filter; the other is the interest count entity, which counts interest packets that request certain content and selects content worth caching. Our contributions are (1) presenting a suitable algorithm for looking up and matching name addresses in CCN communication, (2) proposing a method to process CCN packets in a way that achieves high throughput and very low latency, and (3) demonstrating feasible performance and cost on the basis of a concrete hardware design using distributed content-addressable memory.
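
A sketch of Bloom-filter-assisted name lookup, assuming longest-prefix matching over "/"-separated name components: the Bloom filter is checked first, and the exact table (a Python dict standing in for content-addressable memory) is probed only when the filter reports a possible match. The filter size, hash construction, and class names are illustrative assumptions, not the paper's hardware design.

```python
"""Bloom filter in front of an exact-match table for CCN name lookup (sketch)."""
import hashlib
from typing import Iterator, Optional


class BloomFilter:
    def __init__(self, m_bits: int = 1 << 12, k: int = 3) -> None:
        self.m, self.k, self.bits = m_bits, k, 0

    def _positions(self, key: str) -> Iterator[int]:
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}|{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key: str) -> bool:
        """False means definitely absent; True may be a false positive."""
        return all((self.bits >> p) & 1 for p in self._positions(key))


class NameFib:
    def __init__(self) -> None:
        self.bloom = BloomFilter()
        self.table: dict[str, int] = {}  # stand-in for content-addressable memory
        self.table_probes = 0

    def insert(self, prefix: str, face: int) -> None:
        self.bloom.add(prefix)
        self.table[prefix] = face

    def lookup(self, name: str) -> Optional[int]:
        """Longest-prefix match; the exact table is probed only on a Bloom hit."""
        comps = [c for c in name.split("/") if c]
        for n in range(len(comps), 0, -1):
            prefix = "/" + "/".join(comps[:n])
            if not self.bloom.might_contain(prefix):
                continue
            self.table_probes += 1
            face = self.table.get(prefix)
            if face is not None:  # a Bloom false positive falls through here
                return face
        return None
```

Skipping table probes for prefixes the filter rules out is what saves lookup cycles; false positives cost an extra probe but never a wrong result.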

  • Improved Resilience through Extended KVS-Based Messaging System

    Masafumi KINOSHITA  Osamu TAKADA  Izumi MIZUTANI  Takafumi KOIKE  Kenji LEIBNITZ  Masayuki MURATA  

     
    PAPER-Internet Operation and Management

      Publicized:
    2014/12/11
      Vol:
    E98-D No:3
      Page(s):
    578-587

    In the big data era, messaging systems are required to process large volumes of message traffic with high scalability and availability. However, conventional systems have two issues regarding availability. The first is that failover processing itself carries a risk of failure. The second is finding a trade-off between consistency and availability. We propose a resilient messaging system based on a distributed in-memory key-value store (KVS). Its servers are interconnected, and messages are distributed to multiple servers in the normal processing state. This architecture can continue messaging services without failover processing, wherever server or process failures occur in the messaging system. Furthermore, we propose two methods for improved resilience: the round-robin method with slowdown KVS exclusion, and two counter-rotating logical KVS rings that provide short-term availability in the messaging system. Evaluation results demonstrate that the proposed system can continue service without failover processing. Compared with the conventional method, our proposed distribution method reduced error responses to clients caused by server failures by 92%.
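
A sketch of round-robin distribution with slowdown exclusion, assuming a server is excluded when its reported latency exceeds a fixed threshold and readmitted once it recovers. The threshold rule, server names, and class name are hypothetical; the counter-rotating ring method is not shown.

```python
"""Round-robin message distribution that skips slowed-down KVS servers (sketch)."""
import itertools


class RoundRobinDistributor:
    def __init__(self, servers: list[str], slow_threshold_ms: float) -> None:
        self.servers = list(servers)
        self.threshold = slow_threshold_ms
        self.excluded: set[str] = set()
        self._rr = itertools.cycle(range(len(self.servers)))

    def report_latency(self, server: str, latency_ms: float) -> None:
        """Exclude a server that responds slower than the threshold; readmit it when fast again."""
        if latency_ms > self.threshold:
            self.excluded.add(server)
        else:
            self.excluded.discard(server)

    def next_server(self) -> str:
        """Next non-excluded server in round-robin order."""
        for _ in range(len(self.servers)):
            server = self.servers[next(self._rr)]
            if server not in self.excluded:
                return server
        raise RuntimeError("no responsive KVS server available")
```

Excluding a slow server, rather than waiting for it to fail, keeps messages flowing to the remaining servers without invoking failover processing.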
