
Author Search Result

[Author] Chihiro MATSUI (7 hits)

  • Workload-Based Co-Design of Non-Volatile Cache Algorithm and Storage Class Memory Specifications for Storage Class Memory/NAND Flash Hybrid SSDs

    Tomoaki YAMADA  Chihiro MATSUI  Ken TAKEUCHI  

     
    PAPER

    Vol: E100-C No:4, Page(s): 373-381

    To realize solid-state drives (SSDs) with high performance, low energy consumption and high reliability, a storage class memory (SCM)/multi-level cell (MLC) NAND flash hybrid SSD has been proposed. The algorithm of the hybrid SSD should be designed according to the SCM specifications and the workload characteristics. In this paper, SCMs are used as a non-volatile cache. Cache operation guidelines and optimal SCM specifications for the hybrid SSD are provided for various workload characteristics. Three kinds of non-volatile cache operation for the hybrid SSD are discussed: i) write cache, ii) read-write cache without space control (RW cache) and iii) read-write cache with space control (RW cache w/ SC). SSD workloads are categorized into eight types according to read/write ratio, access frequency and access data size. The evaluation results show that, to achieve the highest performance of the hybrid SSD, the write cache algorithm is suitable for write-intensive workloads and read-cold-sequential workloads, while the RW cache algorithm is suitable for read-cold-random workloads. For read-hot-random workloads, write cache is appropriate when the SCM capacity is less than 3% of the NAND flash capacity, whereas RW cache should be used when the SCM capacity is more than 5% of the NAND flash capacity. The effect of Memory-type SCM (M-SCM) and Storage-type SCM (S-SCM) on the hybrid SSD performance is also analyzed. The M-SCM latency is below 1 us (high speed), but its capacity is only 2% of the NAND flash capacity (small capacity). On the other hand, the S-SCM capacity is assumed to be 5% of the NAND flash capacity (large capacity), but the S-SCM latency is longer than 1 us (low speed). If the additional SCM cost is limited to 20% of the MLC NAND flash cost, performance improvements of up to 7 times and 8 times are achieved in write-hot-random and read-hot-random workloads, respectively. Moreover, if the additional SCM cost is the same as the MLC NAND flash cost, the M-SCM/MLC NAND flash hybrid SSD achieves a 24-times performance improvement.
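The cache-selection guidelines in this abstract can be summarized as a lookup over the eight workload categories. The Python sketch below is illustrative only: it assumes the eight categories are the product of read/write-intensive, hot/cold and random/sequential (the abstract names read/write ratio, access frequency and access data size, but not the exact split), and the function `recommend_cache_policy` and its parameters are hypothetical. It returns `None` wherever the abstract states no guideline, for example read-hot-random workloads whose SCM capacity lies between 3% and 5% of the NAND flash capacity.

```python
from enum import Enum
from itertools import product
from typing import Optional

# Assumed split of the eight workload categories (2 x 2 x 2).
CATEGORIES = [
    f"{rw}-{freq}-{pattern}"
    for rw, freq, pattern in product(
        ("read", "write"), ("hot", "cold"), ("random", "sequential")
    )
]


class Policy(Enum):
    WRITE_CACHE = "write cache"
    RW_CACHE = "RW cache"


def recommend_cache_policy(write_intensive: bool, hot: bool, is_random: bool,
                           scm_to_nand_capacity: float) -> Optional[Policy]:
    """Return the cache policy the abstract recommends, or None if it gives none.

    scm_to_nand_capacity is SCM capacity divided by NAND flash capacity
    (0.03 means 3%). The function name and signature are hypothetical.
    """
    if write_intensive:
        return Policy.WRITE_CACHE  # all write-intensive workloads
    if not hot:
        # read-cold: RW cache for random access, write cache for sequential
        return Policy.RW_CACHE if is_random else Policy.WRITE_CACHE
    if is_random:
        # read-hot-random: depends on SCM capacity relative to NAND capacity
        if scm_to_nand_capacity < 0.03:
            return Policy.WRITE_CACHE
        if scm_to_nand_capacity > 0.05:
            return Policy.RW_CACHE
        return None  # 3%-5%: no stated guideline
    return None  # read-hot-sequential: not stated in the abstract
```

For example, `recommend_cache_policy(False, True, True, 0.02)` returns `Policy.WRITE_CACHE`, matching the stated rule for read-hot-random workloads with less than 3% SCM capacity.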

  • Analysis on Hybrid SSD Configuration with Emerging Non-Volatile Memories Including Quadruple-Level Cell (QLC) NAND Flash Memory and Various Types of Storage Class Memories (SCMs)

    Yoshiki TAKAI  Mamoru FUKUCHI  Chihiro MATSUI  Reika KINOSHITA  Ken TAKEUCHI  

     
    PAPER-Integrated Electronics

    Vol: E103-C No:4, Page(s): 171-180

    This paper analyzes the optimal SSD configuration including emerging non-volatile memories such as quadruple-level cell (QLC) NAND flash memory [1] and storage class memories (SCMs). First, the SSD performance and SSD endurance lifetime of hybrid SSDs are evaluated in four configurations: 1) single-level cell (SLC)/QLC NAND flash, 2) SCM/QLC NAND flash, 3) SCM/triple-level cell (TLC)/QLC NAND flash and 4) SCM/TLC NAND flash. Furthermore, these four configurations are compared under a limited cost. For cold workloads or a high total SSD cost assumption, the SCM/TLC NAND flash hybrid configuration is recommended for both SSD performance and endurance lifetime. For hot workloads with a low total SSD cost assumption, however, the SLC/QLC NAND flash hybrid configuration is recommended when SSD endurance lifetime is emphasized. Under the same conditions, the SCM/TLC/QLC NAND flash tri-hybrid is the best configuration in SSD performance considering cost. In particular, for prxy_0 (write-hot workload), the SCM/TLC/QLC NAND flash tri-hybrid achieves 67% higher IOPS/cost than the SCM/TLC NAND flash hybrid. Moreover, the configurations with the highest IOPS/cost for each workload and cost limit are selected and analyzed with various types of SCMs. For all cases except prxy_1 with a high total SSD cost assumption, middle-end SCM (write latency: 1 us, read latency: 1 us) is recommended for performance considering cost. However, for prxy_1 (read-hot workload) with a high total SSD cost assumption, high-end SCM (write latency: 100 ns, read latency: 100 ns) achieves the best performance.
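The IOPS/cost comparison and the selection of the best configuration for each cost limit can be illustrated with a small Python sketch. The `Config` type, the rule that a configuration is eligible only if its cost does not exceed the limit, and any numbers passed in are assumptions for illustration; the abstract does not give the paper's cost model or raw IOPS values.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Config:
    name: str
    iops: float  # performance of the configuration (placeholder input)
    cost: float  # total SSD cost in arbitrary units (placeholder input)

    @property
    def iops_per_cost(self) -> float:
        return self.iops / self.cost


def best_config(configs: Iterable[Config], cost_limit: float) -> Optional[Config]:
    """Highest-IOPS/cost configuration among those within the cost limit (assumed rule)."""
    affordable = [c for c in configs if c.cost <= cost_limit]
    if not affordable:
        return None
    return max(affordable, key=lambda c: c.iops_per_cost)


def relative_gain_percent(a: Config, b: Config) -> float:
    """How much higher a's IOPS/cost is than b's, in percent (67.0 means 67% higher)."""
    return (a.iops_per_cost / b.iops_per_cost - 1.0) * 100.0
```

A figure such as "67% higher IOPS/cost" corresponds to `relative_gain_percent` returning 67.0 for the two configurations being compared.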

  • Analysis of SCM-Based SSD Performance in Consideration of SCM Access Unit Size, Write/Read Latencies and Application Request Size

    Hirofumi TAKISHITA  Yutaka ADACHI  Chihiro MATSUI  Ken TAKEUCHI  

     
    PAPER

    Vol: E101-C No:4, Page(s): 253-262

    NAND flash memories used in solid-state drives (SSDs) will be replaced with storage-class memories (SCMs), which are comparable with NAND flash in cost and with DRAM in speed. This paper describes the performance difference between sector-unit read (512 Byte) and page-unit read (16 KByte, the NAND flash page size) for the SCM/NAND flash hybrid SSD and the SCM-based SSD, using synthetic and real workloads. The effect of the SCM read-unit size on SSD performance is also analyzed. When the SCM write/read latency is 0.1 us, the performance difference of the SCM/NAND flash hybrid SSD between page- and sector-unit read is at most about 1% and 6% for the write-intensive and read-intensive workloads, respectively. However, the performance of the SCM-based SSD is significantly improved when sector-unit read is used, because no extra read latency occurs. In particular, the SCM-based SSD IOPS is improved by 131% for proj_3 (read-hot-random), because its read request size is small but its read request ratio is large. This paper also shows that the IOPS of the SCM-based SSD with sector-unit read can be predicted from the average write/read request size of the workloads.
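The gap between page-unit and sector-unit read comes from fetching more data than a small request needs. The Python sketch below computes how many bytes a device that reads in whole units must fetch for one request, assuming the request maps to a contiguous byte range at a known offset. The helper names and the `(kind, offset, size)` trace format are hypothetical, and the paper's actual IOPS prediction model is not reproduced; only the per-workload average request size it is based on is computed.

```python
SECTOR = 512          # bytes: sector-unit read size given in the abstract
PAGE = 16 * 1024      # bytes: page-unit read size (NAND flash page) given in the abstract


def bytes_read(offset: int, size: int, unit: int) -> int:
    """Bytes fetched when [offset, offset + size) must be read in whole units."""
    if size <= 0:
        raise ValueError("size must be positive")
    first = offset // unit
    last = (offset + size - 1) // unit
    return (last - first + 1) * unit


def read_amplification(offset: int, size: int, unit: int) -> float:
    """Fetched bytes divided by requested bytes (1.0 means no extra data)."""
    return bytes_read(offset, size, unit) / size


def average_request_size(trace, op: str) -> float:
    """Mean size of requests of kind op ("R" or "W") in a (kind, offset, size) trace."""
    sizes = [size for kind, _offset, size in trace if kind == op]
    return sum(sizes) / len(sizes) if sizes else 0.0
```

A 512-byte read at offset 512 needs one 512-byte sector but a full 16-KByte page, an amplification of 32, which is consistent with the abstract's explanation that small, frequent reads such as those of proj_3 gain most from sector-unit read.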

  • Reliability Analysis of Scaled NAND Flash Memory Based SSDs with Real Workload Characteristics by Using Real Usage-Based Precise Reliability Test

    Yusuke YAMAGA  Chihiro MATSUI  Yukiya SAKAKI  Ken TAKEUCHI  

     
    PAPER

    Vol: E101-C No:4, Page(s): 243-252

    To reduce memory cell errors in real usage of NAND flash-based SSDs, a real usage-based precise reliability test for the NAND flash of SSDs has been proposed. The reliability of the NAND flash memories in SSDs is seriously degraded as memory cells are scaled. However, conventional simple reliability tests of read-disturb and data-retention cannot reproduce the real-life VTH shift and memory cell errors. To solve this problem, the proposed reliability test precisely reproduces the real memory cell failures by emulating the complicated read, write, and data-retention behavior with an SSD emulator. In this paper, the real-life VTH shift and memory cell errors are provided for two generations of NAND flash memory under real workloads with different characteristics. Using the proposed test method, a 1.6-times BER difference is observed when a write-cold and read-hot workload (hm_1) and a write-hot and read-hot workload (prxy_1) are compared in 1Ynm MLC NAND flash. In addition, as NAND flash memory is scaled from the 1Xnm to the 1Ynm generation, the discrepancy in error numbers between the conventional reliability test result and the actual reliability measured by the proposed reliability test increases by 6.3 times. Finally, guidelines for read reference voltage shifts and ECC strength are given to achieve high memory cell reliability for various workloads.
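BER comparisons such as the 1.6-times difference above start from counting flipped bits between written and read-back data. The Python sketch below is a minimal illustration, assuming the written data is known (for example, a test pattern) and both buffers are compared byte by byte; the helper names are hypothetical, and the abstract does not describe how the SSD emulator actually computes BER.

```python
def count_bit_errors(written: bytes, read_back: bytes) -> int:
    """Number of bits that differ between the written and read-back data."""
    if len(written) != len(read_back):
        raise ValueError("written and read-back data must have the same length")
    # XOR marks differing bits; count the 1s in each byte.
    return sum(bin(w ^ r).count("1") for w, r in zip(written, read_back))


def bit_error_rate(written: bytes, read_back: bytes) -> float:
    """Flipped bits divided by total bits compared."""
    if not written:
        raise ValueError("no data to compare")
    return count_bit_errors(written, read_back) / (8 * len(written))
```

The ratio of two such BER values, measured for different workloads on the same device generation, yields a workload-to-workload comparison of the kind reported for hm_1 and prxy_1.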

  • System Performance Comparison of 3D Charge-Trap TLC NAND Flash and 2D Floating-Gate MLC NAND Flash Based SSDs

    Mamoru FUKUCHI  Chihiro MATSUI  Ken TAKEUCHI  

     
    PAPER-Integrated Electronics

    Vol: E103-C No:4, Page(s): 161-170

    This paper analyzes the system-level performance of Storage Class Memory (SCM)/NAND flash hybrid solid-state drives (SSDs) and SCM/NAND flash/NAND flash tri-hybrid SSDs with different types of NAND flash memory. NAND flash memory comes in several types: 2-dimensional (2D) or 3-dimensional (3D), charge-trap (CT) or floating-gate (FG), and multi-level cell (MLC) or triple-level cell (TLC). In this paper, the following four types of NAND flash memory are analyzed: 1) 3D CT TLC, 2) 3D FG TLC, 3) 2D FG TLC, and 4) 2D FG MLC NAND flash. Regardless of whether workloads are read- or write-intensive, the SCM/NAND flash hybrid SSD with low-cost 3D CT TLC NAND flash achieves the best performance, 20% higher than that with higher-cost 2D FG MLC NAND flash. This performance improvement of 3D CT TLC NAND flash comes from its short write latency. In the tri-hybrid case, the SCM/3D CT TLC/3D CT TLC NAND flash tri-hybrid SSD improves performance by 102% compared to the SCM/2D FG MLC/3D CT TLC NAND flash tri-hybrid SSD. In addition, the SCM/2D FG MLC/2D FG MLC NAND flash tri-hybrid SSD shows 49% lower performance than the SCM/2D FG MLC/3D CT TLC NAND flash tri-hybrid SSD. Tri-hybrid SSDs with 3D CT TLC NAND flash achieve the best performance among tri-hybrid SSDs thanks to the larger block size and word-line (WL) write. Therefore, in 3D CT TLC NAND flash based SSDs, higher-cost MLC NAND flash is not necessary for hybrid and tri-hybrid SSDs in data center applications.
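Both tri-hybrid percentages are quoted against the same baseline, the SCM/2D FG MLC/3D CT TLC tri-hybrid SSD, so they can be combined. Writing $P_{\text{ref}}$ for the baseline performance, and assuming both figures refer to the same performance metric and workload conditions (the abstract does not break this down), the best and worst tri-hybrid configurations differ by roughly a factor of four:

```latex
\begin{aligned}
P_{\text{3D/3D}} &= (1 + 1.02)\,P_{\text{ref}} = 2.02\,P_{\text{ref}},\\
P_{\text{2D/2D}} &= (1 - 0.49)\,P_{\text{ref}} = 0.51\,P_{\text{ref}},\\
\frac{P_{\text{3D/3D}}}{P_{\text{2D/2D}}} &= \frac{2.02}{0.51} \approx 3.96.
\end{aligned}
```

Here $P_{\text{3D/3D}}$ denotes the SCM/3D CT TLC/3D CT TLC tri-hybrid and $P_{\text{2D/2D}}$ the SCM/2D FG MLC/2D FG MLC tri-hybrid.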

  • Heterogeneous Integration of Precise and Approximate Storage for Error-Tolerant Workloads

    Chihiro MATSUI  Ken TAKEUCHI  

     
    PAPER

    Publicized: 2022/09/05
    Vol: E106-A No:3, Page(s): 491-503

    This study proposes a heterogeneous integration of precise and approximate storage in data center storage. The storage control engine allocates precise and error-tolerant applications to precise and approximate storage, respectively. The appropriate use of precise and approximate storage is examined by applying a non-volatile memory capacity algorithm. To respond to changes in applications over time, the non-volatile memory capacity algorithm changes the capacities of the storage class memories (SCMs), namely the memory-type SCM (M-SCM) and the storage-type SCM (S-SCM), in the non-volatile memory resource. A three-dimensional triple-level cell (TLC) NAND flash is used as a large-capacity memory. The results indicate that precise storage exhibits high performance when the maximum storage cost is high. By contrast, with a low maximum storage cost, approximate storage exhibits high performance by using a low-bit-cost approximate multiple-level cell (MLC) S-SCM.
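The allocation step performed by the storage control engine can be pictured as a simple router. The Python sketch below assumes each application carries a hypothetical `error_tolerant` flag; the abstract does not say how error tolerance is determined, and the capacity algorithm that resizes M-SCM and S-SCM over time is not modeled here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Application:
    name: str
    error_tolerant: bool  # hypothetical flag: can the app accept approximate data?


def allocate(apps: List[Application]) -> Dict[str, List[str]]:
    """Route error-tolerant apps to approximate storage and the rest to precise storage."""
    placement: Dict[str, List[str]] = {"precise": [], "approximate": []}
    for app in apps:
        target = "approximate" if app.error_tolerant else "precise"
        placement[target].append(app.name)
    return placement
```

Keeping the routing decision separate from capacity management mirrors the abstract's division between the storage control engine and the non-volatile memory capacity algorithm.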

  • Write Variation & Reliability Error Compensation by Layer-Wise Tunable Retraining of Edge FeFET LM-GA CiM

    Shinsei YOSHIKIYO  Naoko MISAWA  Kasidit TOPRASERTPONG  Shinichi TAKAGI  Chihiro MATSUI  Ken TAKEUCHI  

     
    PAPER

    Publicized: 2022/12/19
    Vol: E106-C No:7, Page(s): 352-364

    This paper proposes a layer-wise tunable retraining method for edge FeFET Computation-in-Memory (CiM) that compensates for the accuracy degradation of neural networks (NNs) caused by FeFET device errors. The proposed retraining can tune the number of layers to be retrained in order to reduce the inference accuracy degradation caused by errors that occur after retraining. Weights of the original NN model, accurately trained in a cloud data center, are written into the edge FeFET CiM. The written weights are changed by FeFET device errors in the field. By partially retraining the written NN model, the proposed method combines the error-affected layers of the NN model with the retrained layers, and the inference accuracy is thus recovered. After retraining, the retrained layers are re-written to the CiM and affected by device errors again. The evaluation first analyzes the recovery capability of the NN model under partial retraining and then evaluates the inference accuracy after re-writing. Recovery capability is evaluated with typical non-volatile memory (NVM) errors: normal distribution, uniform shift, and bit-inversion. For all types of errors, more than 50% of the lost inference accuracy is recovered by retraining only the final fully-connected (FC) layer of ResNet-32. To simulate FeFET Local-Multiply and Global-accumulate (LM-GA) CiM, recovery capability is also evaluated with FeFET errors modeled on FeFET measurements. Retraining only the FC layer achieves recovery rates of up to 53%, 66%, and 72% for FeFET write variation, read-disturb, and data-retention, respectively. In addition, adding just two more retraining layers improves the recovery rate by 20-30%. To tune the number of retraining layers, the inference accuracy after re-writing is evaluated by simulating the errors that occur after retraining. When typical NVM errors are injected, it is optimal to retrain the FC layer and 3-6 convolution layers of ResNet-32. The optimal number of layers can be increased or decreased depending on the balance between the size of the errors before retraining and that of the errors after retraining.
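Two quantities in this abstract can be made concrete: the recovery rate, read here as the fraction of the error-induced accuracy drop that retraining wins back, and the choice of which layers to retrain. The Python sketch below assumes that definition of recovery rate and a model given as an ordered list of layer names ending with the FC layer; both helpers and the layer names are hypothetical, and the sketch performs no actual retraining.

```python
from typing import List


def recovery_rate(acc_original: float, acc_degraded: float, acc_retrained: float) -> float:
    """Fraction of the error-induced accuracy drop recovered by retraining (assumed definition)."""
    drop = acc_original - acc_degraded
    if drop <= 0:
        raise ValueError("accuracy did not degrade; recovery rate is undefined")
    return (acc_retrained - acc_degraded) / drop


def layers_to_retrain(layer_names: List[str], extra_layers: int) -> List[str]:
    """Final layer (assumed to be the FC layer) plus up to extra_layers preceding ones."""
    if extra_layers < 0:
        raise ValueError("extra_layers must be non-negative")
    k = min(1 + extra_layers, len(layer_names))
    return layer_names[-k:]
```

With hypothetical numbers, an accuracy drop from 92% to 80% that retraining brings back to 86% corresponds to a recovery rate of 0.5, and `layers_to_retrain(names, 2)` selects the FC layer plus the two layers before it, mirroring the "two more retraining layers" case.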