Hiroshi HAGA Takuya ASAI Shin TAKEUCHI Harue SASAKI Hirotsugu YAMAMOTO Koji SHIGEMURA
We developed an 8.4-inch electrostatic-tactile touch display using a segmented-electrode array (30×20) as both tactile pixels and touch sensors. Each pixel can be excited independently so that the electrostatic-tactile touch display allows presenting real localized tactile textures in any shape. A driving scheme in which the tactile strength is independent of the grounding state of the human body by employing two-phased actuation was also proposed and demonstrated. Furthermore, tactile crosstalk was investigated to find it was due to the voltage fluctuation in the human body and it was diminished by applying the aforementioned driving scheme.
Ryutaro DOI Xu BAI Toshitsugu SAKAMOTO Masanori HASHIMOTO
FPGA that exploits via-switches, which are a kind of non-volatile resistive RAMs, for crossbar implementation is attracting attention due to its high integration density and energy efficiency. Via-switch crossbar is responsible for the signal routing in the interconnections by changing on/off-states of via-switches. To verify the via-switch crossbar functionality after manufacturing, fault testing that checks whether we can turn on/off via-switches normally is essential. This paper confirms that a general differential pair comparator successfully discriminates on/off-states of via-switches, and clarifies fault modes of a via-switch by transistor-level SPICE simulation that injects stuck-on/off faults to atom switch and varistor, where a via-switch consists of two atom switches and two varistors. We then propose a fault diagnosis methodology for via-switches in the crossbar that diagnoses the fault modes according to the comparator response difference between the normal and faulty via-switches. The proposed method achieves 100% fault detection by checking the comparator responses after turning on/off the via-switch. In case that the number of faulty components in a via-switch is one, the ratio of the fault diagnosis, which exactly identifies the faulty varistor and atom switch inside the faulty via-switch, is 100%, and in case of up to two faults, the fault diagnosis ratio is 79%.
Yoshiki KAYANO Kazuaki MIYANAGA Hiroshi INOUE
In the design of electrical contacts, it is required to pursue a solution which satisfies simultaneously multi-objective (electrical, mechanical, and thermal) performances including conflicting requirements. Preference Set-Based Design (PSD) has been proposed as practical procedure of the fuzzy set-based design method. This brief paper newly attempts to propose a concurrent design method by PSD to electrical contact, specifically a design of a shape of cantilever in relay contacts. In order to reduce the calculation (and/or experimental) cost, this paper newly attempt to apply Design of Experiments (DoE) for meta-modeling to PSD. The number of the calculation for the meta-modeling can be reduced to $rac{1}{729}$ by using DoE. The design parameters (width and length) of a cantilever for drive an electrical contact, which satisfy required performance (target deflection), are obtained in ranges successfully by PSD. The validity of the design parameters is demonstrated by numerical modeling.
Min Gee KIM Masakazu KATAOKA Rengie Mark D. MAILIG Shun-ichiro OHMI
Ferroelectric gate field-effect transistors (MFSFETs) were investigated utilizing nondoped HfO2 deposited by RF magnetron sputtering utilizing Hf target. After the post-metallization annealing (PMA) process with Pt top gate at 500°C/30s, ferroelectric characteristic of 10nm thick nondoped HfO2 was obtained. The fabricated MFSFETs showed the memory window of 1.7V when the voltage sweep range was from -3 to 3V.
Takahiro HIROFUCHI Ryousei TAKANO
In this prompt report, we present the basic performance evaluation of Intel Optane Data Center Persistent Memory Module (Optane DCPMM), which is the first commercially-available, byte-addressable non-volatile memory modules released in April 2019. Since at the moment of writing only a few reports on its performance were published, this letter is intended to complement other performance studies. Through experiments using our own measurement tools, we obtained that the latency of random read-only access was approximately 374 ns. That of random writeback-involving access was 391 ns. The bandwidths of read-only and writeback-involving access for interleaved memory modules were approximately 38 GB/s and 3 GB/s, respectively.
Yoshiki TAKAI Mamoru FUKUCHI Chihiro MATSUI Reika KINOSHITA Ken TAKEUCHI
This paper analyzes the optimal SSD configuration including emerging non-volatile memories such as quadruple-level cell (QLC) NAND flash memory [1] and storage class memories (SCMs). First, SSD performance and SSD endurance lifetime of hybrid SSD are evaluated in four configurations: 1) single-level cell (SLC)/QLC NAND flash, 2) SCM/QLC NAND flash, 3) SCM/triple-level cell (TLC)/QLC NAND flash and 4) SCM/TLC NAND flash. Furthermore, these four configurations are compared in limited cost. In case of cold workloads or high total SSD cost assumption, SCM/TLC NAND flash hybrid configuration is recommended in both SSD performance and endurance lifetime. For hot workloads with low total SSD cost assumption, however, SLC/QLC NAND flash hybrid configuration is recommended with emphasis on SSD endurance lifetime. Under the same conditions as above, SCM/TLC/QLC NAND flash tri-hybrid is the best configuration in SSD performance considering cost. In particular, for prxy_0 (write-hot workload), SCM/TLC/QLC NAND flash tri-hybrid achieves 67% higher IOPS/cost than SCM/TLC NAND flash hybrid. Moreover, the configurations with the highest IOPS/cost in each workload and cost limit are picked up and analyzed with various types of SCMs. For all cases except for the case of prxy_1 with high total SSD cost assumption, middle-end SCM (write latency: 1us, read latency: 1us) is recommended in performance considering cost. However, for prxy_1 (read-hot workload) with high total SSD cost assumption, high-end SCM (write latency: 100ns, read latency: 100ns) achieves the best performance.
Dohyeon PARK Jinho LEE Jung-Won KANG Jae-Gon KIM
The emerging Versatile Video Coding (VVC) standard currently adopts Triangular Partitioning Mode (TPM) to make more flexible inter prediction. Due to the motion search and motion storage for TPM, the complexity of the encoder and decoder is significantly increased. This letter proposes two simplifications of TPM for reducing the complexity of the current design. One simplification is to reduce the number of combinations of motion vectors for both partitions to be checked. The method gives 4% encoding time decrease with negligible BD-rate loss. Another one is to remove the reference picture remapping process in the motion vector storage of TPM. It reduces the complexity of the encoder and decoder without a BD-rate change for the random-access configuration.
Dokeun LEE Seongjin LEE Youjip WON
Indexing is one of the fields where the non-volatile memory (NVM) has the advantages of byte-addressable characteristics and fast read/write speed. The existing index structures for NVM have been developed based on the fact that the size of cache line and the atomicity guarantee unit of NVM are different and they tried to overcome the weakness of consistency from the difference. To overcome the weakness, an expensive flush operation is required which results in a lower performance than a basic B+tree index. Recent studies have shown that the I/O units of the NVM can be matched with the atomicity guarantee units under limited circumstances. In this paper, we propose a Cache line sized Atomic Write B+tree (CAWBT), which is a minimal B+tree structure that shows higher performance than a basic b+ tree and designed for NVM. CAWBT has almost same performance compared to basic B+tree without consistency guarantee and shows remarkable performance improvement compared to other B+tree indexes for NVM.
Atsushi KOSHIBA Takahiro HIROFUCHI Ryousei TAKANO Mitaro NAMIKI
Non-volatile memory (NVM) is a promising technology for low-energy and high-capacity main memory of computers. The characteristics of NVM devices, however, tend to be fundamentally different from those of DRAM (i.e., the memory device currently used for main memory), because of differences in principles of memory cells. Typically, the write latency of an NVM device such as PCM and ReRAM is much higher than its read latency. The asymmetry in read/write latencies likely affects the performance of applications significantly. For analyzing behavior of applications running on NVM-based main memory, most researchers use software-based emulation tools due to the limited number of commercial NVM products. However, these existing emulation tools are too slow to emulate a large-scale, realistic workload or too simplistic to investigate the details of application behavior on NVM with asymmetric read/write latencies. This paper therefore proposes a new NVM emulation mechanism that is not only light-weight but also aware of a read/write latency gap in NVM-based main memory. We implemented the prototype of the proposed mechanism for the Intel CPU processors of the Haswell architecture. We also evaluated its accuracy and performed case studies for practical benchmarks. The results showed that our prototype accurately emulated write-latencies of NVM-based main memory: it emulated the NVM write latencies in a range from 200 ns to 1000 ns with negligible errors from 0.2% to 1.1%. We confirmed that the use of our emulator enabled us to successfully estimate performance of practical workloads for NVM-based main memory, while an existing light-weight emulation model misestimated.
Yongju SONG Sungkyun LEE Dong Hyun KANG Young Ik EOM
Flash storage suffers from severe performance degradation due to its inherent internal synchronization overhead. Especially, flushing an L2P (logical address to physical address) mapping table significantly contributes to the performance degradation. To relieve the problem, we propose an efficient L2P mapping table management scheme on the flash storage, which works along with a small-sized NVRAM. It flushes L2P mapping table from DRAM to NVRAM or flash memory selectively. In our experiments, the proposed scheme shows up to 9.37× better performance than conventional schemes.
Masanori HAYASHIKOSHI Hiroaki TANIZAKI Yasumitsu MURAI Takaharu TSUJI Kiyoshi KAWABATA Koji NII Hideyuki NODA Hiroyuki KONDO Yoshio MATSUDA Hideto HIDAKA
A 1-Transistor 4-Magnetic Tunnel Junction (1T-4MTJ) memory cell has been proposed for field type of Magnetic Random Access Memory (MRAM). Proposed 1T-4MTJ memory cell array is achieved 44% higher density than that of conventional 1T-1MTJ thanks to the common access transistor structure in a 4-bit memory cell. A self-reference sensing scheme which can read out with write-back in four clock cycles has been also proposed. Furthermore, we add to estimate with considering sense amplifier variation and show 1T-4MTJ cell configuration is the best solution in IoT applications. A 1-Mbit MRAM test chip is designed and fabricated successfully using 130-nm CMOS process. By applying 1T-4MTJ high density cell and partially embedded wordline driver peripheral into the cell array, the 1-Mbit macro size is 4.04 mm2 which is 35.7% smaller than the conventional one. Measured data shows that the read access is 55 ns at 1.5 V typical supply voltage and 25C. Combining with conventional high-speed 1T-1MTJ caches and proposed high-density 1T-4MTJ user memories is an effective on-chip hierarchical non-volatile memory solution, being implemented for low-power MCUs and SoCs of IoT applications.
The needs for ultra-high speed short- to medium-reach optical fiber links beyond 100-Gbit/s is becoming larger and larger especially for intra and inter-data center applications. In recent intensity-modulated/direct-detection (IM/DD) high-speed optical transceivers with the channel bit rate of 50 and/or 100 Gbit/s, multilevel pulse amplitude modulation (PAM) is finally adopted to lower the signaling speed. To further increase the transmission capacity for the next-generation optical transceivers, various signaling techniques have been studied, especially thanks to advanced digital signal processing (DSP). In this paper, we review various signaling technologies proposed so far for short-to-medium reach applications.
Kazuichi OE Mitsuru SATO Takeshi NANRI
The response times of solid state drives (SSDs) have decreased dramatically due to the growing use of non-volatile memory express (NVMe) devices. Such devices have response times of less than 100 micro seconds on average. The response times of all-flash-array systems have also decreased dramatically through the use of NVMe SSDs. However, there are applications, particularly virtual desktop infrastructure and in-memory database systems, that require storage systems with even shorter response times. Their workloads tend to contain many input-output (IO) concentrations, which are aggregations of IO accesses. They target narrow regions of the storage volume and can continue for up to an hour. These narrow regions occupy a few percent of the logical unit number capacity, are the target of most IO accesses, and appear at unpredictable logical block addresses. To drastically reduce the response times for such workloads, we developed an automated tiered storage system called “automated tiered storage with fast memory and slow flash storage” (ATSMF) in which the data in targeted regions are migrated between storage devices depending on the predicted remaining duration of the concentration. The assumed environment is a server with non-volatile memory and directly attached SSDs, with the user applications executed on the server as this reduces the average response time. Our system predicts the effect of migration by using the previously monitored values of the increase in response time during migration and the change in response time after migration. These values are consistent for each type of workload if the system is built using both non-volatile memory and SSDs. In particular, the system predicts the remaining duration of an IO concentration, calculates the expected response-time increase during migration and the expected response-time decrease after migration, and migrates the data in the targeted regions if the sum of response-time decrease after migration exceeds the sum of response-time increase during migration. Experimental results indicate that ATSMF is at least 20% faster than flash storage only and that its memory access ratio is more than 50%.
Dawei XU Jinfeng HUANG Yuta NAKANE Tomoo YOKOYAMA Takashi HORIYAMA Ryuhei UEHARA
Last year, a new notion of rep-cube was proposed. A rep-cube is a polyomino that is a net of a cube, and it can be divided into some polyominoes such that each of them can be folded into a cube. This notion was inspired by the notions of polyomino and rep-tile, which were introduced by Solomon W. Golomb. It was proved that there are infinitely many distinct rep-cubes. In this paper, we investigate this new notion and show further results.
Sipeng ZHANG Wei JIANG Shin'ichi SATOH
In this paper, a multilevel thresholding color image segmentation method is proposed using a modified Artificial Bee Colony(ABC) algorithm. In this work, in order to improve the local search ability of ABC algorithm, Krill Herd algorithm is incorporated into its onlooker bees phase. The proposed algorithm is named as Krill herd-inspired modified Artificial Bee Colony algorithm (KABC algorithm). Experiment results verify the robustness of KABC algorithm, as well as its improvement in optimizing accuracy and convergence speed. In this work, KABC algorithm is used to solve the problem of multilevel thresholding for color image segmentation. To deal with luminance variation, rather than using gray scale histogram, a HSV space-based pre-processing method is proposed to obtain 1D feature vector. KABC algorithm is then applied to find thresholds of the feature vector. At last, an additional local search around the quasi-optimal solutions is employed to improve segmentation accuracy. In this stage, we use a modified objective function which combines Structural Similarity Index Matrix (SSIM) with Kapur's entropy. The pre-processing method, the global optimization with KABC algorithm and the local optimization stage form the whole color image segmentation method. Experiment results show enhance in accuracy of segmentation with the proposed method.
Tatsuro KOJO Masashi TAWADA Masao YANAGISAWA Nozomu TOGAWA
Non-volatile memories are a promising alternative to memory design but data stored in them still may be destructed due to crosstalk and radiation. The data stored in them can be restored by using error-correcting codes but they require extra bits to correct bit errors. One of the largest problems in non-volatile memories is that they consume ten to hundred times more energy than normal memories in bit-writing. It is quite necessary to reduce writing bits. Recently, a REC code (bit-write-reducing and error-correcting code) is proposed for non-volatile memories which can reduce writing bits and has a capability of error correction. The REC code is generated from a linear systematic error-correcting code but it must include the codeword of all 1's, i.e., 11…1. The codeword bit length must be longer in order to satisfy this condition. In this letter, we propose a method to generate a relaxed REC code which is generated from a relaxed error-correcting code, which does not necessarily include the codeword of all 1's and thus its codeword bit length can be shorter. We prove that the maximum flipping bits of the relaxed REC code is still limited theoretically. Experimental results show that the relaxed REC code efficiently reduce the number of writing bits.
This paper proposes 0-1-A-Ā LUT, a new programmable logic using atom switches, and a delay-optimal mapping algorithm for it. Atom switch is a non-volatile memory device of very small geometry which is fabricated between metal layers of a VLSI, and it can be used as a switch device of very small on-resistance and parasitic capacitance. While considerable area reduction of Look Up Tables (LUTs) used in conventional Field Programmable Gate Arrays (FPGAs) has been achieved by simply replacing each SRAM element with a memory element using a pair of atom switches, our 0-1-A-Ā LUT achieves further area and delay reduction. Unlike the conventional atom-switch-based LUT in which all k input signals are fed to a MUX, one of input signals is fed to the switch array, resulting area reduction due to the reduced number of inputs of the MUX from 2k to 2k-1, as well as delay reduction due to reduced fanout load of the input buffers. Since the fanout of this input buffers depends on the mapped logic function, this paper also proposes technology mapping algorithms to select logic function of fewer number of fanouts of input buffers to achieve further delay reduction. From our experiments, the circuit delay using our k-LUT is 0.94% smaller in the best case compared with using the conventional atom-switch-based k-LUT.
Sheikh Rashel Al AHMED Kiyoteru KOBAYASHI
The electron retention characteristics of memory capacitors with blocking oxide-silicon carbonitride (SiCN)-tunnel oxide stacked films were investigated for application in embedded charge trapping nonvolatile memories (NVMs). Long-term data retention in the SiCN memory capacitors was estimated to be more than 10 years at 85 °C. We presented an improved method to analyze the energy distribution of electron trap states numerically. Using the presented analytical method, electron trap states in the SiCN film were revealed to be distributed from 0.8 to 1.3 eV below the conduction band edge in the SiCN band gap. The presence of energetically deep trap states leads us to suggest that the SiCN dielectric films can be employed as the charge trapping film of embedded NVMs.
Wenzhe ZHANG Kai LU Xiaoping WANG Jie JIAN
New volatile memory (e.g. Phase Change Memroy) presents fast access, large capacity, byte-addressable, and non-volatility features. These features will bring impacts on the design of current software system. It has become a hot research topic of how to manage it and provide what kind of interface for upper application to use it. This paper proposes FP-Heap. FP-Heap supports direct access to non-volatile memory through a persistent heap interface. With FP-Heap, traditional persistent object systems can benefit directly from the byte-persistency of non-volatile memory. FP-Heap extends current virtual memory manager (VMM) to manage non-volatile memory and maintain a persistent mapping relationship. Also, FP-Heap offers a lightweight transaction mechanism to support atomic update of persistent data, a simple namespace to facilitate data indexing, and a basic access control mechanism to support data sharing. Compared with previous work Mnemosyne, FP-Heap achieves higher performance by its customized VMM and optimized transaction mechanism.
Rapid process scaling and the introduction of the multilevel cell (MLC) concept have lowered costs of NAND Flash memories, but also degraded reliability. For this reason, the memories are depending on strong error correcting codes (ECCs), and this has enabled the memories to be used in wide range of storage applications, including solid-state drives (SSDs). Meanwhile, too strong error correcting capability requires excessive decoding complexity and check bits. In NAND Flash memories, cell errors to neighborhood voltage levels are more probable than those to distant levels. Several ECCs reflecting this characteristics, including limited-magnitude ECCs which correct only errors with a certain limited magnitude and low-density parity check (LDPC) codes, have been proposed. However, as most of these ECCs need the multiple bits in a cell for encoding, they cannot be used with multipage programing, a high speed programming method currently employed in the memories. Also, binary ECCs with Gray codes are no longer optimal when multilevel voltage shifts (MVSs) occur. In this paper, an error correction method reflecting the error characteristic is presented. This method detects errors by a binary ECC as a conventional manner, but a nonbinary value or whole the bits in a cell, are subjected to error correction, so as to be corrected into the most probable neighborhood value. The amount of bit error rate (BER) improvement is depending on the probability of the each error magnitude. In case of 2bit/cell, if only errors of magnitude 1 and 2 can occur and the latter occupies 5% of cell errors, acceptable BER is improved by 4%. This is corresponding to extending 2.4% of endurance. This method needs about 15% longer average latency, 19% longer maximum latency, and 15% lower throughput. However, with using the conventional method until the memories' lifetime number of program/erase cycling, and the proposed method after that, BER improvement can be utilized for extending endurance without latency and throughput degradation until the switch of the methods.