Naoko KIFUNE Hironori UCHIKAWA
In flash memory, each stored data frame is protected from random errors by error correction codes (ECC) such as Bose-Chaudhuri-Hocquenghem (BCH) codes. Exclusive-OR (XOR) based erasure codes like RAID-5 have also been employed in flash memory to protect against memory block defects. Conventionally, the ECC and erasure codes are used separately since their target errors are different. Due to recent aggressive technology scaling, additional error correction capability for random errors is required without adding redundancy. We propose an algorithm that improves error correction capability by using XOR parity with a simple counter that counts the number of unreliable bits in the XOR stripe. We also propose to apply Chase decoding to the proposed algorithm. The counter makes it possible to reduce false corrections and to execute Chase decoding efficiently. We show that combining the proposed algorithm with Chase decoding can significantly improve the decoding performance.
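The role of the counter can be illustrated with a minimal sketch. This is only one plausible reading of the abstract, not the authors' algorithm: it assumes that each read bit carries a hypothetical "unreliable" flag, and that a bit of the failing frame is rebuilt from the XOR parity only when it is the single unreliable bit in its stripe column. The names `xor_parity`, `unreliable_count`, and `repair_frame` are made up for illustration.

```python
from functools import reduce
from typing import List


def xor_parity(frames: List[List[int]]) -> List[int]:
    """Column-wise XOR over equal-length bit frames (RAID-5 style parity)."""
    return [reduce(lambda a, b: a ^ b, column) for column in zip(*frames)]


def unreliable_count(flags: List[List[bool]]) -> List[int]:
    """Per-column counter of bits flagged as unreliable across the stripe."""
    return [sum(column) for column in zip(*flags)]


def repair_frame(read: List[List[int]], flags: List[List[bool]],
                 parity: List[int], target: int) -> List[int]:
    """Rebuild unreliable bits of frame `target` from the XOR parity.

    Assumption: a column is trusted only when the counter equals 1 and the
    flagged bit belongs to the target frame; otherwise the bit is left
    untouched to avoid a false correction.
    """
    counts = unreliable_count(flags)
    repaired = list(read[target])
    for j in range(len(repaired)):
        if flags[target][j] and counts[j] == 1:
            others = 0
            for i, frame in enumerate(read):
                if i != target:
                    others ^= frame[j]
            repaired[j] = others ^ parity[j]
    return repaired


# Three stored frames and their parity.
stored = [[1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 0, 0]]
parity = xor_parity(stored)

# Frame 0 is read back with bit 2 flipped; column 1 has two unreliable bits.
read = [[1, 0, 0, 1], [0, 1, 1, 0], [1, 1, 0, 0]]
flags = [[False, True, True, False],
         [False, True, False, False],
         [False, False, False, False]]
repaired = repair_frame(read, flags, parity, target=0)
```

In this sketch, column 2 is repaired because its counter is 1, while column 1 is skipped because two frames are unreliable there and the parity alone cannot tell which one is wrong.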
This paper reviews and discusses a brief history of Nyquist ADCs. The review covers bipolar flash ADCs for the early development stage of HDTV and digital oscilloscopes, a Bi-CMOS two-step flash ADC using resistive interpolation for home HDTV receivers, a CMOS two-step flash ADC using capacitive interpolation for handy camcorders, pipelined ADCs using CMOS operational amplifiers, CMOS flash ADCs using dynamic comparators and digital offset compensation, SAR ADCs using low-noise dynamic comparators and MOM capacitors, and hybrid ADCs.
This study proposes a heterogeneous integration of precise and approximate storage in data center storage. The storage control engine allocates precise and error-tolerant applications to precise and approximate storage, respectively. The appropriate use of both precise and approximate storage is examined by applying a non-volatile memory capacity algorithm. To respond to changes in applications over time, the non-volatile memory capacity algorithm changes the capacity of the storage class memories (SCMs), namely the memory-type SCM (M-SCM) and the storage-type SCM (S-SCM), in the non-volatile memory resource. A three-dimensional triple-level cell (TLC) NAND flash is used as a large-capacity memory. The results indicate that precise storage exhibits high performance when the maximum storage cost is high. By contrast, with a low maximum storage cost, approximate storage exhibits high performance by using a low-bit-cost approximate multiple-level cell (MLC) S-SCM.
Takehiro KITAMURA Mahfuzul ISLAM Takashi HISAKADO Osami WADA
High-speed flash ADCs are useful in high-speed applications such as communication receivers. Due to offset voltage variation in sub-micron processes, suppressing the variation significantly increases power consumption and area. As an alternative to suppressing the variation, we have developed a flash ADC architecture that selects comparators based on offset voltage ranking for reference generation. Specifically, with order statistics as a basis, our method selects the minimum number of comparators needed to obtain equally spaced reference values. Because the proposed ADC utilizes offset voltages as references, no resistor ladder is required. We also developed a time-domain sorting mechanism for the offset voltages to achieve on-chip comparator selection. We first perform a detailed analysis of the order-statistics-based selection method, and then design a 4-bit ADC in a commercial 65-nm process and perform transistor-level simulation. When using 127 comparators, the INLs of 20 virtual chips are in the range of -0.34 LSB/+0.29 LSB to -0.83 LSB/+0.74 LSB, and the DNLs are in the range of -0.33 LSB/+0.24 LSB to -0.77 LSB/+1.18 LSB at 1-GS/s operation. Our ADC achieves an SNDR of 20.9 dB at Nyquist-frequency input and a power consumption of 0.84 mW.
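The idea of choosing comparators by rank can be sketched as follows. This is an illustrative approximation, not the authors' procedure: it assumes Gaussian offsets, uses Blom's approximation for the expected value of the k-th smallest offset among n comparators, and picks for each equally spaced target reference the rank whose expected offset is closest. The reference span of ±1.5σ and the function names are hypothetical.

```python
from statistics import NormalDist
from typing import List


def expected_offsets(n: int, sigma: float = 1.0) -> List[float]:
    """Approximate expected value of each order statistic (rank 1..n).

    Uses Blom's approximation E[X_(k)] ~ sigma * Phi^-1((k - 0.375) / (n + 0.25)),
    assuming zero-mean Gaussian offsets (an assumption of this sketch).
    """
    nd = NormalDist()
    return [sigma * nd.inv_cdf((k - 0.375) / (n + 0.25)) for k in range(1, n + 1)]


def select_ranks(n: int, levels: int, span: float, sigma: float = 1.0) -> List[int]:
    """Pick one rank per reference level, spaced equally over [-span, +span]."""
    expected = expected_offsets(n, sigma)
    step = 2.0 * span / (levels - 1)
    targets = [-span + i * step for i in range(levels)]
    ranks = []
    for target in targets:
        # Index of the rank whose expected offset is closest to the target.
        best = min(range(n), key=lambda i: abs(expected[i] - target))
        ranks.append(best + 1)  # ranks are 1-based
    return ranks


# 127 comparators, 15 reference levels for a 4-bit flash ADC,
# references spread over +/-1.5 sigma (hypothetical span).
ranks = select_ranks(n=127, levels=15, span=1.5)
```

Because only ranks are needed, an on-chip sorter that orders the comparators by offset is sufficient in this sketch; the absolute offset values never have to be measured.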
Naoki KAWASAKI Yuuki MACHIDA Takayuki MISU Keiichi ABE Hiroshi SUGIMURA Makiko OKUMURA
A line display that utilizes saccades has been proposed. When an observer moves his or her eyes across a one-dimensional fixed line display, two-dimensional information is perceived on the retina. In this paper, a high-speed flashing line display was developed using a CPLD and a PIC microcontroller. The flashing period was reduced to 20 µs, which is less than half that of our previous system. The relationship between the flashing frequency and the optimum distance at which the display can be perceived with the least distortion was clarified. The results show that the higher the flashing frequency is, the more information can be perceived from a farther position. Calculated values, which were based on the relationship between the flashing period and the width of the light source, were almost identical to the measured values at flashing frequencies from 3.3 kHz to 10 kHz. Owing to the short flashing period, the developed line display was not only visible at a distance of 15 m or more, which makes it suitable for outdoor use, but also realized 16 gray levels.
Jisu KWON Moon Gi SEOK Daejin PARK
IoT devices operate on batteries and store their embedded firmware in flash memory. If the embedded firmware is not kept up to date, the device may fail to interoperate with other IoT networks, so the latest firmware must be maintained through frequent updates. However, because firmware updates require developers and equipment, they consume manpower and time. Additionally, because the device must be active during the update, frequent flash memory access prevents low-power operation. Furthermore, if an unexpected interruption occurs during an update, the device becomes unavailable, so the update must be reliable. Therefore, this paper aims to improve update reliability and low-power operation by proposing a technique that performs firmware updates at high speed. We propose a technique that updates only a part of the firmware stored in nonvolatile flash memory, without pre-processing to generate delta files. The firmware is divided into function blocks, and their addresses are collectively managed in a separate area called a function map. When updating the firmware, only the new function block to be updated is transmitted from the host downloader, and the bootloader proceeds with the update using the function blocks stored in the flash memory. Compared with transmitting the entire new firmware and writing it to memory, using only function blocks reduces the amount of resources required for updating. Function blocks are called indirectly through the function map, so an update can be completed by modifying only the function map, regardless of the physical location of the block. Our evaluation results show that the proposed technique effectively reduces the time cost, energy consumption, and additional memory usage overhead that can occur when updating firmware.
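The function-map indirection described above can be sketched in a few lines. This is a toy model under stated assumptions, not the authors' bootloader: flash is modeled as a dictionary from address to callable, the address allocator and the `Firmware` class are hypothetical, and the function-map entry is switched only after the new block has been written.

```python
from typing import Callable, Dict


class Firmware:
    """Toy model of firmware split into function blocks plus a function map."""

    BLOCK_SIZE = 0x100  # hypothetical fixed block size

    def __init__(self) -> None:
        self.flash: Dict[int, Callable] = {}      # address -> function block
        self.function_map: Dict[str, int] = {}    # function name -> address
        self.next_free = 0x1000                   # hypothetical first free address

    def install(self, name: str, block: Callable) -> int:
        """Write a block to a free address, then point the map entry at it.

        The map entry is modified last, so an interruption before that point
        leaves the old block reachable (assumption of this sketch).
        """
        address = self.next_free
        self.flash[address] = block
        self.next_free += self.BLOCK_SIZE
        self.function_map[name] = address
        return address

    def call(self, name: str, *args):
        """Indirect call: look up the address in the map, then run the block."""
        return self.flash[self.function_map[name]](*args)


fw = Firmware()
old_address = fw.install("scale", lambda x: x * 2)
before = fw.call("scale", 3)

# Partial update: only the changed function block is sent and written.
new_address = fw.install("scale", lambda x: x * 3)
after = fw.call("scale", 3)
```

Callers never hold a physical address, so in this model the update touches one new block and one map entry and leaves every other block in place.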
This paper formulates the minimal word-line (WL) delay time with pre-emphasis pulses in order to design the pulse width as a function of the overdrive voltage for large memory arrays such as 3D NAND. Circuit theory is discussed for a single RC line with only capacitance to ground, for one with only coupling capacitance, and for the general case where RC lines have both grounded and coupling capacitance, to provide an optimum pre-emphasis pulse width that minimizes the delay time. The theory is expanded to include cases where the resistance of the RC line driver is not negligibly small. The minimum delay time formulas for a single RC delay line and for capacitively coupled RC lines were in good agreement (i.e., within 5% error) with measurement. With this research, circuit designers can estimate an optimum pre-emphasis pulse width and the delay time for an RC line in the initial design phase.
As NAND flash-based storage has become established, the flash translation layer (FTL) has been in charge of mapping data addresses onto NAND flash memory. Many FTLs implement various mapping schemes, and the amount of mapping data depends on the mapping level. However, the FTL must maintain mapping consistency irrespective of how much mapping data resides in the storage. Furthermore, the cost of recovering from inconsistency needs to be considered to achieve a faster storage reboot time. This letter proposes a novel method that enhances consistency for a page-mapping level FTL running a legacy logging policy. Moreover, the recovery cost of page mappings also decreases. The method adopts a virtually-shrunk segment and deactivates page-mapping logs by assembling and storing the segments. In our previous study, this segment scheme already improved the response time of embedded NAND flash-based storage. In addition to that improvement, the new scheme maximizes page-mapping consistency and therefore reduces the recovery cost compared with the legacy page-mapping FTL.
Yoshiki TAKAI Mamoru FUKUCHI Chihiro MATSUI Reika KINOSHITA Ken TAKEUCHI
This paper analyzes the optimal SSD configuration including emerging non-volatile memories such as quadruple-level cell (QLC) NAND flash memory [1] and storage class memories (SCMs). First, the performance and endurance lifetime of hybrid SSDs are evaluated in four configurations: 1) single-level cell (SLC)/QLC NAND flash, 2) SCM/QLC NAND flash, 3) SCM/triple-level cell (TLC)/QLC NAND flash, and 4) SCM/TLC NAND flash. Furthermore, these four configurations are compared under a limited cost. For cold workloads or under a high total SSD cost assumption, the SCM/TLC NAND flash hybrid configuration is recommended in terms of both SSD performance and endurance lifetime. For hot workloads under a low total SSD cost assumption, however, the SLC/QLC NAND flash hybrid configuration is recommended when SSD endurance lifetime is emphasized. Under the same conditions, the SCM/TLC/QLC NAND flash tri-hybrid is the best configuration in terms of SSD performance considering cost. In particular, for prxy_0 (write-hot workload), the SCM/TLC/QLC NAND flash tri-hybrid achieves 67% higher IOPS/cost than the SCM/TLC NAND flash hybrid. Moreover, the configurations with the highest IOPS/cost for each workload and cost limit are selected and analyzed with various types of SCMs. For all cases except prxy_1 under a high total SSD cost assumption, a middle-end SCM (write latency: 1 µs, read latency: 1 µs) is recommended in terms of performance considering cost. However, for prxy_1 (read-hot workload) under a high total SSD cost assumption, a high-end SCM (write latency: 100 ns, read latency: 100 ns) achieves the best performance.
Mamoru FUKUCHI Chihiro MATSUI Ken TAKEUCHI
This paper analyzes the system-level performance of Storage Class Memory (SCM)/NAND flash hybrid solid-state drives (SSDs) and SCM/NAND flash/NAND flash tri-hybrid SSDs with different types of NAND flash memory. There are several types of NAND flash memory: 2-dimensional (2D) or 3-dimensional (3D), charge-trap type (CT) or floating-gate type (FG), and multi-level cell (MLC) or triple-level cell (TLC). In this paper, the following four types of NAND flash memory are analyzed: 1) 3D CT TLC, 2) 3D FG TLC, 3) 2D FG TLC, and 4) 2D FG MLC NAND flash. For both read- and write-intensive workloads, the SCM/NAND flash hybrid SSD with low-cost 3D CT TLC NAND flash achieves the best performance, which is 20% higher than that with higher-cost 2D FG MLC NAND flash. The performance improvement of 3D CT TLC NAND flash results from its short write latency. In the case of the tri-hybrid SSD, on the other hand, the SCM/3D CT TLC/3D CT TLC NAND flash tri-hybrid SSD improves the performance by 102% compared with the SCM/2D FG MLC/3D CT TLC NAND flash tri-hybrid SSD. In addition, the SCM/2D FG MLC/2D FG MLC NAND flash tri-hybrid SSD shows 49% lower performance than the SCM/2D FG MLC/3D CT TLC NAND flash tri-hybrid SSD. Among tri-hybrid SSDs, the one with 3D CT TLC NAND flash achieves the best performance thanks to its larger block size and word-line (WL) write. Therefore, in 3D CT TLC NAND flash based SSDs, higher-cost MLC NAND flash is not necessary for hybrid and tri-hybrid SSDs for data center applications.
Takashi KONO Yasuhiko TAITO Hideto HIDAKA
Embedded system approaches to edge computing in IoT implementations are proposed and discussed. The rationales for edge computing and the essential core capabilities for IoT data supply innovation are identified. Then, the innovative roles and development of MCUs and embedded flash memory are illustrated in terms of technology and applications, expanding from CPS to big data and the nomadic/autonomous elements of IoT requirements. Finally, the construction of a technology roadmap specific to IoT is proposed.
Yongju SONG Sungkyun LEE Dong Hyun KANG Young Ik EOM
Flash storage suffers from severe performance degradation due to its inherent internal synchronization overhead. In particular, flushing the L2P (logical address to physical address) mapping table contributes significantly to the performance degradation. To relieve the problem, we propose an efficient L2P mapping table management scheme for flash storage, which works along with a small-sized NVRAM. It selectively flushes the L2P mapping table from DRAM to either NVRAM or flash memory. In our experiments, the proposed scheme shows up to 9.37× better performance than conventional schemes.
Xuncheng ZOU Shigetoshi NAKATAKE
A low-voltage stochastic flash ADC (analog-to-digital converter) is presented, in which an inverter-based comparative unit replaces the comparator. Aiming at low voltage and low power consumption, the key to our design is the simplicity of the structure. Replacing the comparator with the inverter-based comparative unit decreases the number of transistors, saving area and reducing power. We insert an inverter chain in front of the comparative unit for signal stability and determine an appropriate circuit structure for the resolution by analyzing three different candidates. Finally, we design the whole stochastic flash ADC to verify our idea; the supply voltage can go down to 0.6 V in a 65-nm CMOS process, and post-layout simulation results demonstrate its advantages in voltage, area, and power consumption.
This paper proposes a method to absorb flash crowds in P2P video streaming systems. The idea of the proposed method is to reduce the time before a newly arrived node becomes an uploader by explicitly constructing a group of newly arrived nodes called a flash crowd absorber (FCA). The FCA grows continuously while serving a video stream to the members of the group, and it is explicitly controlled so that the upload capacity of the nodes is fully utilized and a nearly optimal stream latency is attained during a flash crowd. A numerical comparison with a naive tree-based scheme is also given.
Kazuichi OE Mitsuru SATO Takeshi NANRI
The response times of solid state drives (SSDs) have decreased dramatically due to the growing use of non-volatile memory express (NVMe) devices. Such devices have response times of less than 100 microseconds on average. The response times of all-flash-array systems have also decreased dramatically through the use of NVMe SSDs. However, there are applications, particularly virtual desktop infrastructure and in-memory database systems, that require storage systems with even shorter response times. Their workloads tend to contain many input-output (IO) concentrations, which are aggregations of IO accesses. They target narrow regions of the storage volume and can continue for up to an hour. These narrow regions occupy a few percent of the logical unit number capacity, are the target of most IO accesses, and appear at unpredictable logical block addresses. To drastically reduce the response times for such workloads, we developed an automated tiered storage system called “automated tiered storage with fast memory and slow flash storage” (ATSMF) in which the data in targeted regions are migrated between storage devices depending on the predicted remaining duration of the concentration. The assumed environment is a server with non-volatile memory and directly attached SSDs, with the user applications executed on the server as this reduces the average response time. Our system predicts the effect of migration by using the previously monitored values of the increase in response time during migration and the change in response time after migration. These values are consistent for each type of workload if the system is built using both non-volatile memory and SSDs.
In particular, the system predicts the remaining duration of an IO concentration, calculates the expected response-time increase during migration and the expected response-time decrease after migration, and migrates the data in the targeted regions if the total response-time decrease after migration exceeds the total response-time increase during migration. Experimental results indicate that ATSMF is at least 20% faster than a flash-storage-only system and that its memory access ratio is more than 50%.
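The migration rule stated above amounts to a simple cost-benefit comparison, sketched below. The parameter names and the way the two totals are accumulated (a constant IO rate, a fixed per-IO penalty during migration, and a fixed per-IO gain afterwards) are assumptions made for illustration; the abstract does not specify them.

```python
def should_migrate(remaining_s: float, migration_s: float, iops: float,
                   rt_increase_us: float, rt_decrease_us: float) -> bool:
    """Decide whether migrating a hot region is expected to pay off.

    remaining_s    -- predicted remaining duration of the IO concentration
    migration_s    -- time needed to migrate the region (hypothetical input)
    iops           -- IO rate to the region, assumed constant
    rt_increase_us -- per-IO response-time increase while migrating
    rt_decrease_us -- per-IO response-time decrease once migrated
    """
    if remaining_s <= migration_s:
        # The concentration would end before the migration finishes.
        return False
    total_increase = iops * migration_s * rt_increase_us
    total_decrease = iops * (remaining_s - migration_s) * rt_decrease_us
    return total_decrease > total_increase


long_burst = should_migrate(600, 30, 10_000, 20, 50)
short_burst = should_migrate(40, 30, 10_000, 100, 50)
too_short = should_migrate(20, 30, 10_000, 20, 50)
```

In this sketch a ten-minute concentration justifies a 30-second migration, while a 40-second one does not, because the penalty paid during migration outweighs the short period of faster access.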
Akira YAMAWAKI Hiroshi KAMABE Shan LU
In multilevel flash memory, multiple read thresholds are generally required to read a single logical page. Random I/O (RIO) code, introduced by Sharon and Alrod, is a coding scheme that enables the reading of one logical page using a single read threshold. It was shown that the construction of RIO codes is equivalent to the construction of write-once memory (WOM) codes. Yaakobi and Motwani proposed a family of RIO codes, called parallel RIO (P-RIO) codes, in which all logical pages are encoded in parallel. In this paper, we utilize coset coding with Hamming codes to construct P-RIO codes. Coset coding is a technique to construct WOM codes using linear binary codes. We leverage information on the data of all pages to encode each page. Our P-RIO codes can store more pages than RIO codes constructed via coset coding, and they have parameters for which RIO codes do not exist.
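Coset coding itself can be illustrated with a small WOM example based on the [7,4] Hamming code. This sketch shows only the general coset-coding idea (the stored message is the syndrome of the cell state, and each write may only change cells from 0 to 1); it is not the P-RIO construction of the paper. The brute-force search and the function names are choices made for this illustration.

```python
from itertools import combinations
from typing import List

N = 7  # cells; column j (1-based) of the parity-check matrix is j in binary


def syndrome(cells: List[int]) -> int:
    """3-bit syndrome of the cell state: XOR of the indices of programmed cells."""
    s = 0
    for j, c in enumerate(cells, start=1):
        if c:
            s ^= j
    return s


def write(cells: List[int], message: int) -> List[int]:
    """Return a new state whose syndrome equals `message` (0..7).

    Only unprogrammed cells may be set; the smallest set of extra cells whose
    indices XOR to the required syndrome change is found by brute force.
    """
    delta = syndrome(cells) ^ message
    if delta == 0:
        return list(cells)
    free = [j for j in range(1, N + 1) if not cells[j - 1]]
    for weight in range(1, len(free) + 1):
        for subset in combinations(free, weight):
            acc = 0
            for j in subset:
                acc ^= j
            if acc == delta:
                new_cells = list(cells)
                for j in subset:
                    new_cells[j - 1] = 1
                return new_cells
    raise ValueError("no further write is possible without erasing")


state0 = [0] * N
state1 = write(state0, 5)   # first write stores 5
state2 = write(state1, 3)   # rewrite to 3 without erasing
state3 = write(state2, 5)   # rewrite to 5 again
```

Reading is just `syndrome(state)`, and each rewrite in this example only adds programmed cells, which is the write-once constraint that WOM codes are designed around.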
In this letter, we propose a static wear leveling technique called Recency-based Wear Leveling (RbWL). The basic idea of RbWL is to execute static wear leveling as rarely as possible, because the frequent migration of cold data by static wear leveling causes significant overhead in a NAND flash memory system. RbWL adjusts the execution frequency according to a threshold value that reflects the lifetime difference between hot and cold blocks and the total lifetime of the NAND flash memory system. The evaluation results show that, compared with other algorithms, RbWL improves the lifetime of NAND flash memory systems by 52% and reduces the overhead of wear leveling by 8% to 42% in terms of the number of erase operations and by 13% to 51% in terms of the number of migrations of valid pages.
Joon-Young PAIK Rize JIN Tae-Sun CHUNG
In terms of system reliability, data recovery is a crucial capability. The lack of data recovery leads to the permanent loss of valuable data. This paper aims at improving data recovery in flash-based storage devices, where data recovery is extremely poor. To this end, we focus on garbage collection, which determines the life span of data that are likely to be the target of data recovery requests by users. A new garbage collection mechanism with awareness of data recovery is proposed. First, deleted or overwritten data are categorized into shallow invalid data and deep invalid data based on the possibility of data recovery requests. Second, the proposed mechanism selects the victim area for reclaiming free space while considering the shallow invalid data, which have a high possibility of data recovery requests. Our proposal prevents more shallow invalid data from being eliminated during garbage collection. The experimental results show that our garbage collection mechanism can improve data recovery with minor performance degradation.
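A recovery-aware victim selection of this kind might look like the sketch below. The scoring rule is an assumption made for illustration (the abstract does not give the actual policy): each block's reclaimable space is discounted by a hypothetical `shallow_penalty` applied to its shallow invalid pages, so blocks holding mostly deep invalid data are reclaimed first.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    block_id: int
    valid: int            # pages still in use
    shallow_invalid: int  # invalid pages likely to be requested for recovery
    deep_invalid: int     # invalid pages unlikely to be requested


def pick_victim(blocks: List[Block], shallow_penalty: float = 0.5) -> Block:
    """Choose the block to reclaim.

    With shallow_penalty = 0 this reduces to a greedy policy that maximizes
    the number of invalid pages; larger values protect shallow invalid data.
    The linear score is a hypothetical choice, not the authors' formula.
    """
    def score(block: Block) -> float:
        return block.deep_invalid + (1.0 - shallow_penalty) * block.shallow_invalid

    return max(blocks, key=score)


blocks = [
    Block(block_id=0, valid=10, shallow_invalid=50, deep_invalid=4),
    Block(block_id=1, valid=20, shallow_invalid=5, deep_invalid=39),
]
greedy_choice = pick_victim(blocks, shallow_penalty=0.0)
aware_choice = pick_victim(blocks, shallow_penalty=0.5)
```

Here the greedy policy erases block 0 and destroys 50 recoverable pages, while the recovery-aware score prefers block 1 at the cost of copying more valid pages, which mirrors the trade-off between recoverability and performance mentioned in the abstract.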
Hirofumi TAKISHITA Yutaka ADACHI Chihiro MATSUI Ken TAKEUCHI
NAND flash memories used in solid-state drives (SSDs) will be replaced with storage-class memories (SCMs), which are comparable with NAND flash in cost and with DRAM in speed. This paper describes the performance difference between sector-unit read (512 Byte) and page-unit read (16 KByte, the NAND flash page size) for the SCM/NAND flash hybrid SSD and the SCM-based SSD, using synthetic and real workloads. The effect of the SCM read-unit size on SSD performance is also analyzed. When the SCM write/read latency is 0.1 µs, the performance difference of the SCM/NAND flash hybrid SSD between page- and sector-unit read is at most about 1% and 6% for the write-intensive and read-intensive workloads, respectively. However, the performance of the SCM-based SSD is significantly improved when sector-unit read is used because extra read latency does not occur. In particular, the SCM-based SSD IOPS is improved by 131% for proj_3 (read-hot-random), because its read request size is small but its read request ratio is large. This paper also shows that the write/read IOPS of the SCM-based SSD with sector-unit read can be predicted from the average write/read request size of the workload.
Yusuke YAMAGA Chihiro MATSUI Yukiya SAKAKI Ken TAKEUCHI
To reduce memory cell errors in the real usage of NAND flash-based SSDs, a real usage-based precise reliability test for the NAND flash of SSDs is proposed. The reliability of NAND flash memories in SSDs is seriously degraded as memory cells are scaled. However, conventional simple reliability tests of read-disturb and data-retention cannot give the same result as the real-life VTH shift and memory cell errors. To solve this problem, the proposed reliability test precisely reproduces real memory cell failures by emulating the complicated read, write, and data-retention behavior with an SSD emulator. In this paper, the real-life VTH shift and memory cell errors of two generations of NAND flash memory under real workloads with different characteristics are provided. Using the proposed test method, a 1.6-times BER difference is observed between a write-cold and read-hot workload (hm_1) and a write-hot and read-hot workload (prxy_1) in 1Ynm MLC NAND flash. In addition, with NAND flash memory scaling from the 1Xnm to the 1Ynm generation, the discrepancy in the number of errors between the conventional reliability test result and the actual reliability measured by the proposed reliability test increases by 6.3 times. Finally, guidelines for read reference voltage shifts and ECC strength are given to achieve high memory cell reliability for various workloads.