This study proposes a heterogeneous integration of precise and approximate storage in data center storage. The storage control engine allocates precise and error-tolerant applications to precise and approximate storage, respectively. The appropriate use of both precise and approximate storage is examined by applying a non-volatile memory capacity algorithm. To respond to the changes in application over time, the non-volatile memory capacity algorithm changes capacity of storage class memories (SCMs), namely the memory-type SCM (M-SCM) and storage-type SCM (S-SCM), in non-volatile memory resource. A three-dimensional triple-level cell (TLC) NAND flash is used as a large capacity memory. The results indicate that precise storage exhibits a high performance when the maximum storage cost is high. By contrast, with a low maximum storage cost, approximate storage exhibits high performance using a low bit cost approximate multiple-level cell (MLC) S-SCM.
Yoshiki TAKAI Mamoru FUKUCHI Chihiro MATSUI Reika KINOSHITA Ken TAKEUCHI
This paper analyzes the optimal SSD configuration including emerging non-volatile memories such as quadruple-level cell (QLC) NAND flash memory [1] and storage class memories (SCMs). First, SSD performance and SSD endurance lifetime of hybrid SSD are evaluated in four configurations: 1) single-level cell (SLC)/QLC NAND flash, 2) SCM/QLC NAND flash, 3) SCM/triple-level cell (TLC)/QLC NAND flash and 4) SCM/TLC NAND flash. Furthermore, these four configurations are compared in limited cost. In case of cold workloads or high total SSD cost assumption, SCM/TLC NAND flash hybrid configuration is recommended in both SSD performance and endurance lifetime. For hot workloads with low total SSD cost assumption, however, SLC/QLC NAND flash hybrid configuration is recommended with emphasis on SSD endurance lifetime. Under the same conditions as above, SCM/TLC/QLC NAND flash tri-hybrid is the best configuration in SSD performance considering cost. In particular, for prxy_0 (write-hot workload), SCM/TLC/QLC NAND flash tri-hybrid achieves 67% higher IOPS/cost than SCM/TLC NAND flash hybrid. Moreover, the configurations with the highest IOPS/cost in each workload and cost limit are picked up and analyzed with various types of SCMs. For all cases except for the case of prxy_1 with high total SSD cost assumption, middle-end SCM (write latency: 1us, read latency: 1us) is recommended in performance considering cost. However, for prxy_1 (read-hot workload) with high total SSD cost assumption, high-end SCM (write latency: 100ns, read latency: 100ns) achieves the best performance.
Mamoru FUKUCHI Chihiro MATSUI Ken TAKEUCHI
This paper analyzes the system-level performance of Storage Class Memory (SCM)/NAND flash hybrid solid-state drives (SSDs) and SCM/NAND flash/NAND flash tri-hybrid SSDs in difference types of NAND flash memory. There are several types of NAND flash memory, i.e. 2-dimensional (2D) or 3-dimensional (3D), charge-trap type (CT) and floating-gate type (FG) and multi-level cell (MLC) or triple-level cell (TLC). In this paper, the following four types of NAND flash memory are analyzed: 1) 3D CT TLC, 2) 3D FG TLC, 3) 2D FG TLC, and 4) 2D FG MLC NAND flash. Regardless of read- and write-intensive workloads, SCM/NAND flash hybrid SSD with low cost 3D CT TLC NAND flash achieves the best performance that is 20% higher than that with higher cost 2D FG MLC NAND flash. The performance improvement of 3D CT TLC NAND flash can be obtained by the short write latency. On the other hand, in case of tri-hybrid SSD, SCM/3D CT TLC/3D CT TLC NAND flash tri-hybrid SSD improves the performance 102% compared to SCM/2D FG MLC/3D CT TLC NAND flash tri-hybrid SSD. In addition, SCM/2D FG MLC/2D FG MLC NAND flash tri-hybrid SSD shows 49% lower performance than SCM/2D FG MLC/3D CT TLC NAND flash tri-hybrid SSD. Tri-hybrid SSD flash with 3D CT TLC NAND flash is the best performance in tri-hybrid SSD thanks to larger block size and word-line (WL) write. Therefore, in 3D CT TLC NAND flash based SSDs, higher cost MLC NAND flash is not necessary for hybrid SSD and tri-hybrid SSD for data center applications.
In this letter, we propose a static wear leveling technique, called Recency-based Wear Leveling (RbWL). The basic idea of RbWL is to execute static wear leveling at minimum levels, because the frequent migrations of cold data by static wear leveling cause significant overhead in a NAND flash memory system. RbWL adjusts the execution frequency according to a threshold value that reflects the lifetime difference of the hot/cold blocks and the total lifetime of the NAND flash memory system. The evaluation results show that RbWL improves the lifetime of NAND flash memory systems by 52%, and it also reduces the overhead of wear leveling from 8% to 42% and from 13% to 51%, in terms of the number of erase operations and the number of page migrations of valid pages, respectively, compared with other algorithms.
Joon-Young PAIK Rize JIN Tae-Sun CHUNG
In terms of system reliability, data recovery is a crucial capability. The lack of data recovery leads to the permanent loss of valuable data. This paper aims at improving data recovery in flash-based storage devices where extremely poor data recovery is shown. For this, we focus on garbage collection that determines the life span of data which have high possibility of data recovery requests by users. A new garbage collection mechanism with awareness of data recovery is proposed. First, deleted or overwritten data are categorized into shallow invalid data and deep invalid data based on the possibility of data recovery requests. Second, the proposed mechanism selects victim area for reclamation of free space, considering the shallow invalid data that have the high possibility of data recovery requests. Our proposal prohibits more shallow invalid data from being eliminated during garbage collections. The experimental results show that our garbage collection mechanism can improve data recovery with minor performance degradation.
Hirofumi TAKISHITA Yutaka ADACHI Chihiro MATSUI Ken TAKECUHI
NAND flash memories used in solid-state drives (SSDs) will be replaced with storage-class memories (SCMs), which are comparable with NAND flash in their cost, and with DRAM in their speed. This paper describes the performance difference of the SCM/NAND flash hybrid SSD and the SCM-based SSD with between sector-unit read (512 Byte) and page-unit read (16 KByte, NAND flash page-size) using synthetic and real workload. Also, effect of the SCM read-unit size on SSD performance are analyzed. When SCM write/read latency is 0.1 us, performance difference of the SCM/NAND flash hybrid SSD with between page- and sector-unit read is about 1% and 6% at most for the write-intensive and read-intensive workloads, respectively. However, performance of the SCM-based SSD is significantly improved when sector-unit read is used because extra read latency does not occur. Especially, the SCM-based SSD IOPS is improved by 131% for proj_3 (read-hot-random), because its read request size is small but its read request ratio is large. This paper also shows IOPS of SCM-based SSD write/read with sector-unit read can be predicted by the average write/read request size of workloads.
Yusuke YAMAGA Chihiro MATSUI Yukiya SAKAKI Ken TAKEUCHI
In order to reduce the memory cell errors in real-usage of NAND flash-based SSD, real usage-based precise reliability test for NAND flash of SSDs has been proposed. Reliability of the NAND flash memories of the SSDs is seriously degraded as the scaling of memory cells. However, conventional simple reliability tests of read-disturb and data-retention cannot give the same result as the real-life VTH shift and memory cell errors. To solve this problem, the proposed reliability test precisely reproduces the real memory cell failures by emulating the complicated read, write, and data-retention with SSD emulator. In this paper, the real-life VTH shift and memory cell errors between two generations of NAND flash memory with different characterized real workloads are provided. Using the proposed test method, 1.6-times BER difference is observed when write-cold and read-hot workload (hm_1) and write-hot and read-hot workload (prxy_1) are compared in 1Ynm MLC NAND flash. In addition, by NAND flash memory scaling from 1Xnm to 1Ynm generations, the discrepancy of error numbers between the conventional reliability test result and actual reliability measured by proposed reliability test is increased by 6.3-times. Finally, guidelines for read reference voltage shifts and strength of ECCs are given to achieve high memory cell reliability for various workloads.
The use of flash memory based storage devices is rapidly increasing, and user demands for high performance are also constantly increasing. The performance of the flash storage device is greatly influenced by cleaning operations of Flash Translation Layer (FTL). Various studies have been conducted to lower the cost of cleaning operations. However, there are limits to achieve sufficient performance improvement of flash storages without help of a host system, with only limited information in storage devices. Recently, SCSI, eMMC, and UFS standards provide an interface for sending semantic information from a host system to a storage device. In this paper, we analyze effects of semantic information on performance and lifetime of flash storage devices. We evaluate performance and lifetime improvement through SA-FTL (Semantic Aware Flash Translation Layer), which can take advantage of semantic information in storage devices. Experiments show that SA-FTL improves performance and lifetime of flash based storages by up to 30 and 35%, respectively, compared to a simple page-level FTL.
Tomoaki YAMADA Chihiro MATSUI Ken TAKEUCHI
In order to realize solid-state drives (SSDs) with high performance, low energy consumption and high reliability, storage class memory (SCM)/multi-level cell (MLC) NAND flash hybrid SSD has been proposed. Algorithm of the hybrid SSD should be designed according to SCM specifications and workload characteristics. In this paper, SCMs are used as non-volatile cache. Cache operation guidelines and optimal SCM specifications for the hybrid SSD are provided for various workload characteristics. Three kinds of non-volatile cache operation for the hybrid SSD are discussed: i) write cache, ii) read-write cache without space control (RW cache) and iii) read-write cache with space control (RW cache w/ SC). SSD workloads are categorized into eight according to read/write ratio, access frequency and access data size. From evaluation result, the write cache algorithm is suitable for write-intensive workloads and read-cold-sequential workloads, while the RW cache algorithm is suitable for read-cold-random workloads to achieve the highest performance of the hybrid SSD. In contrast, as for read-hot-random workloads, write cache is appropriate when the SCM capacity is less than 3% of the NAND flash capacity. On the other hand, RW cache should be used in case that SCM capacity is more than 5% of NAND flash capacity. The effect of Memory-type SCM (M-SCM) and Storage-type SCM (S-SCM) on the hybrid SSD performance is also analyzed. The M-SCM latency is below 1 us (high speed) but the capacity is only 2% of the NAND flash capacity (small capacity). On the other hand, the S-SCM capacity is assumed to be 5% of the NAND flash capacity (large capacity) but S-SCM speed is larger than 1 us (low speed). If the additional SCM cost is limited to 20% of MLC NAND flash cost, up to 7-times and 8-times performance improvement are achieved in write-hot-random workload and read-hot-random workloads, respectively. Moreover, if the additional SCM cost is the same as MLC NAND flash cost, M-SCM/MLC NAND flash hybrid SSD achieves 24-times performance improvement.
Liyu WANG Lan CHEN Xiaoran HAO
NAND flash memory has been widely used in storage systems. Aiming to design an efficient buffer policy for NAND flash memory, a life-aware buffer management algorithm named LAB-LRU is proposed, which manages the buffer by three LRU lists. A life value is defined for every page and the active pages with higher life value can stay longer in the buffer. The definition of life value considers the effect of access frequency, recency and the cost of flash read and write operations. A series of trace-driven simulations are carried out and the experimental results show that the proposed LAB-LRU algorithm outperforms the previous best-known algorithms significantly in terms of the buffer hit ratio, the numbers of flash write and read operations and overall runtime.
Seon Hwan KIM Ju Hee CHOI Jong Wook KWAK
In this letter, we propose a novel garbage collection technique for index structures based on flash memory systems, called Proxy Block-based Garbage Collection (PBGC). Many index structures have been proposed for flash memory systems. They exploit buffers and logs to resolve the update propagation problem, one of the a main cause of performance degradation of the index structures. However, these studies overlooked the fact that not only the record operation but also garbage collection induces the update propagation problem. The proposal, PBGC, exploits a proxy block and a block mapping table to solve the update propagation problem, which is caused by the changes in the page and block caused by garbage collection. Experiments show that PBGC decreased the execution time of garbage collection by up to 39%, compared with previous garbage collection techniques.
Seon Hwan KIM Ju Hee CHOI Jong Wook KWAK
In this letter, we propose a novel wear leveling technique we call Hidden cold block-aware Wear Leveling (HaWL) using a bit-set threshold. HaWL prolongs the lifetime of flash memory devices by using a bit array table in wear leveling. The bit array table saves the histories of block erasures for a period and distinguishes cold blocks from all blocks. In addition, HaWL can reduce the size of the bit array table by using a one-to-many mode, where one bit is related to many blocks. Moreover, to prevent degradation of wear leveling in the one-to-many mode, HaWL uses bit-set threshold (BST) and increases the accuracy of the cold block information. The performance results illustrate that HaWL prolongs the lifetime of flash memory by up to 48% compared with previous wear leveling techniques in our experiments.
Hirofumi TAKISHITA Shuhei TANAKAMARU Sheyang NING Ken TAKEUCHI
Storage-Class Memory (SCM) and NAND flash hybrid Solid-State Drive (SSD) has advantages of high performance and low power consumption compared with NAND flash only SSD. In this paper, first, three SSD configurations are investigated. Three different SCMs are used with 0.1 µs, 1 µs and 10 µs read/write latencies, respectively, and the required SCM/NAND flash capacity ratios are analyzed to maintain the same SSD performance. Next, by using the three SSD configurations, the variation of SSD reliability, performance and cost are analyzed by changing error correction strengths. The SSD reliability of acceptable SCM and NAND flash Bit Error Rates (BERs) is limited by achieving specified SSD performance with error correction, and/or limited by SCM and NAND flash parity size and SSD cost. Lastly, the SSD replacement cost is also analyzed by considering the limitation of NAND flash write/erase cycles. The purpose of this paper is to provide a design guideline for obtaining high performance, highly reliable and cost-effective SCM/NAND hybrid structure SSD with ECC.
Youngjoo LEE Jaehwan JUNG In-Cheol PARK
This paper presents a novel low-power decoder architecture for the (36420, 32778) binary LDPC code targeting energy-efficient NAND-flash-based mobile devices. The proposed energy-scalable decoding algorithm reduces the operating bit-width of decoding function units at the early-use stage where the channel condition is good enough to lower the precision of computation. Based on a flexible adder structure, the decoding energy of the proposed LDPC decoder can be reduced by freezing the unnecessary parts of hardware resources. A prototype 4KB LDPC decoder is designed in a 65nm CMOS technology, which achieves an average decoding throughput of 8.13Gb/s with 1.2M equivalent gates. The power consumption of the decoder ranges from 397mW to 563mW depending on operating conditions.
Shuhei TANAKAMARU Masafumi DOI Ken TAKEUCHI
A design strategy (the required ECC strength and the judgment method of the dominant error mode) of error-prediction low-density parity-check (EP-LDPC) error-correcting code (ECC) and error-recovery schemes for scaled NAND flash memories is discussed in this paper. The reliability characteristics of NAND flash memories are investigated with 1X, 2X and 3Xnm NAND flash memories. Moreover, the system-level reliability of SSDs is analyzed from the acceptable data-retention time of the SSD. The reliability of the NAND flash memory is continuously degrading as the design rule shrinks due to various problems. As a result, future SSDs will not be able to maintain system-level reliability unless advanced ECCs with signal processing are adopted. Therefore, EP-LDPC and error-recovery (ER) schemes are previously proposed to improve the reliability. The reliability characteristics such as the bit-error rate (BER) versus the data-retention time and the effect of the cell-to-cell interference on the BER are measured. These reliability characteristics obtained in this paper are stored in an SSD as a reliability table, which plays a principal role in EP-LDPC scheme. The effectiveness of the EP-LDPC scheme with the scaling of the NAND flash memory is also discussed by analyzing the cell-to-cell interference. An interference factor $alpha$ is proposed to discuss the impact of the cell-to-cell coupling. As a result, the EP-LDPC scheme is assumed to be effective down to 1Xnm NAND flash memory. On the other hand, the ER scheme applies different voltage pulses to memory cells, according to the dominant error mode: program-disturb or data-retention error dominant mode. This paper examines when the error mode changes, corresponding to which pulse should be applied. Additionally, the estimation methods of the dominant error mode by ER scheme are also discussed. Finally, as a result of the system-level reliability analysis, it is concluded that the use of the EP-LDPC scheme can maintain the reliability of the NAND flash memory in 1Xnm technology node.
The objective of this research is to design a high-performance NAND flash memory system with a data buffer. The proposed buffer system in the NAND flash memory consists of two parts, i.e., a fully associative temporal buffer for temporal locality and a fully associative spatial buffer for spatial locality. We propose a new operating mechanism for reducing overhead of flash memory, that is, erase and write operations. According to our simulation results, the proposed buffer system can reduce the write and erase operations by about 73% and 79% for spec application respectively, compared with a fully associative buffer with two times more space. Futhermore, the average memory access time can improve by about 60% compared with other large buffer systems.
NAND Flash memories are widely used as data storages today. The memories are not intrinsically error free because they are affected by several physical disturbances. Technology scaling and introduction of multi-level cell (MLC) has improved data density, but it has made error effect more significant. Error control codes (ECC) are essential to improve reliability of NAND Flash memories. Efficiency of codes depends on error characteristic of systems, and codes are required to be designed to reflect this characteristic. In MLC Flash memories, errors tend to direct values to neighborhood. These errors are a class of M-ary asymmetric symbol error. Some codes which reflect the asymmetric property were proposed. They are designed to correct only 1 level shift errors because almost all of the errors in the memories are in such errors. But technology scaling, increase of program/erase (P/E) cycles, and MLC storing the large number of bits can cause multiple-level shift. This paper proposes single error control codes which can correct an error of more than 1 levels shift. Because the number of levels to be corrected is selectable, we can fit it into noise magnitude. Furthermore, it is possible to add error detecting function for error of the larger shift. Proposed codes are equivalent to a conventional integer codes, which can correct 1 level shift, on a certain parameter. Therefore, the codes are said to be generalization of conventional integer codes. Evaluation results show information lengths to respective check symbol lengths are larger than nonbinary Hamming codes and other M-ary asymmetric symbol error correcting codes.
Recently, the 3-D vertical Floating Gate (FG) type NAND cell arrays with the Sidewall Control Gate (SCG), such as ESCG, DC-SF and S-SCG, are receiving attention to overcome the reliability issues of Charge Trap (CT) type device. Using this novel cell structure, highly reliable flash cell operations were successfully implemented without interference effect on the FG type cell. However, the 3-D vertical FG type cell has large cell size by about 60% for the cylindrical FG structure. In this point of view, we intensively investigate the scalability of the FG width of the 3-D vertical FG NAND cells. In case of the planar FG type NAND cell, the FG height cannot be scaled down due to the necessity of obtaining sufficient coupling ratio and high program speed. In contrast, for the 3-D vertical FG NAND with SCG, the FG is formed cylindrically, which is fully covered with surrounded CG, and very high CG coupling ratio can be achieved. As results, the scaling of FG width of the 3-D vertical FG NAND cell with S-SCG can be successfully demonstrated at 10 nm regime, which is almost the same as the CT layer of recent BE-SONOS NAND.
Se Hwan PARK Yoon KIM Wandong KIM Joo Yun SEO Hyungjin KIM Byung-Gook PARK
We propose a new three-dimensional (3D) NAND flash memory array having Tied Bit-line and Ground Select Transistor (TiGer) [1]. Channels are stacked in the vertical direction to increase the memory density without the device size scaling. To distinguish stacked channels, a novel operation scheme is introduced instead of adding supplementary control gates. The stacked layers are selected by using ground select line (GSL) and common source line (CSL). Device structure and fabrication process are described. Operation scheme and simulation results for program inhibition are also discussed.
Kousuke MIYAJI Ryoji YAJIMA Teruyoshi HATANAKA Mitsue TAKAHASHI Shigeki SAKAI Ken TAKEUCHI
Initialize and weak-program erasing scheme is proposed to achieve high-performance and high-reliability Ferroelectric (Fe-) NAND flash solid-state drive (SSD). Bit-by-bit erase VTH control is achieved by the proposed erasing scheme and history effects in Fe-NAND is also suppressed. History effects change the future erase VTH shift characteristics by the past program voltage. The proposed erasing scheme decreases VTH shift variation due to history effects from ±40% to ±2% and the erase VTH distribution width is reduced from over 0.4 V to 0.045 V. As a result, the read and VPASS disturbance decrease by 42% and 37%, respectively. The proposed erasing scheme is immune to VTH variations and voltage stress. The proposed erasing scheme also suppresses the power and bandwidth degradation of SSD.