1-11hit |
Hideyuki NODA Katsumi DOSAKA Hans Jurgen MATTAUSCH Tetsushi KOIDE Fukashi MORISHITA Kazutami ARIMOTO
This paper describes a novel TCAM architecture designed for enhancing the soft-error immunity. An associated embedded DRAM and ECC circuits are placed next to TCAM macro to implement a unique methodology of recovering upset bits due to soft errors. The proposed configuration allows an improvement of soft-error immunity by 6 orders of magnitude compared with the conventional TCAM. We also propose a novel testing methodology of the soft-error rate with a fast parallel multi-bit test. In addition, the proposed architecture resolves the critical problem of the look-up table maintenance of TCAM. The design techniques reported in this paper are especially attractive for realizing soft-error immune, high-performance TCAM chips.
Akira YAMAZAKI Fukashi MORISHITA Naoya WATANABE Teruhiko AMANO Masaru HARAGUCHI Hideyuki NODA Atsushi HACHISUKA Katsumi DOSAKA Kazutami ARIMOTO Setsuo WAKE Hideyuki OZAKI Tsutomu YOSHIHARA
The voltage margin of an embedded DRAM's sense operation has been shrinking with the scaling of process technology. A method to estimate this margin would be a key to optimizing the memory array configuration and the size of the sense transistor. In this paper, the voltage margin of the sense operation is theoretically analyzed. The accuracy of the proposed voltage margin model was confirmed on a 0.13-µm eDRAM test chip, and the results of calculation were generally in agreement with the measured results.
Kazunari INOUE Hideyuki NODA Kazutami ARIMOTO Hans Jurgen MATTAUSCH Tetsushi KOIDE
A signature-matching co-processor in 130 nm CMOS technology for application in the network-security field is presented. Two key search technologies, implemented with fully-parallel CAM-based search cores, enable the removal of misused packets from Giga-bit-per-second (G-bps) networks in real-time without disturbing the normal network traffic. The first technology is a thorough search through packet header as well as payload in byte-shifting manner and is capable of detecting viruses, even if they are hidden at an arbitrary position within the packet. A 1.125 Mbit ternary CAM, operated at the speed of 125 Mega-searches per second (M-sps), integrates the primary lookup table for thorough packet search. The second technology applies an additional relational search with programmable logical operations to detect recently appearing more complicated misused packets. A small 192-bit binary CAM operated at 31.25 M-sps is also included for this purpose. Power dissipation, being a major concern of CAM-based application-specific LSIs, is addressed in the light of the signature-matching application, which has a high probability of multiple matches and which doesn't require to mask individual bits of the search word. Consequently, two application-driven power-reduction methods are implemented, namely an improved pipelined search for efficiently reducing power even in the case of a large number of multiple matches, and a search-line encoding for cutting search-line related power dissipation. As a result the signature-matching co-processor features low power dissipation between 0.4 W and 1.1 W for the best case and the worst case search configurations, respectively.
Takeshi KUMAKI Masakatsu ISHIZAKI Tetsushi KOIDE Hans Jurgen MATTAUSCH Yasuto KURODA Hideyuki NODA Katsumi DOSAKA Kazutami ARIMOTO Kazunori SAITO
This paper reports an efficient Discrete Cosine Transform (DCT) processing method for images using a massive-parallel memory-embedded SIMD matrix processor. The matrix-processing engine has 2,048 2-bit processing elements, which are connected by a flexible switching network, and supports 2-bit 2,048-way bit-serial and word-parallel operations with a single command. For compatibility with this matrix-processing architecture, the conventional DCT algorithm has been improved in arithmetic order and the vertical/horizontal-space 1 Dimensional (1D)-DCT processing has been further developed. Evaluation results of the matrix-engine-based DCT processing show that the necessary clock cycles per image block can be reduced by 87% in comprison to a conventional DSP architecture. The determined performances in MOPS and MOPS/mm2 are factors 8 and 5.6 better than with a conventional DSP, respectively.
Takeshi KUMAKI Yasuto KURODA Masakatsu ISHIZAKI Tetsushi KOIDE Hans Jurgen MATTAUSCH Hideyuki NODA Katsumi DOSAKA Kazutami ARIMOTO Kazunori SAITO
This paper presents a novel optimized real-time Huffman encoder using a pipelined data path based on CAM technology and a parallel code-word-table optimizer. The exploitation of CAM technology enables fast parallel search of the code word table. At the same time, the code word table is optimized according to the frequency of received input symbols and is up-dated in real-time. Since these two functions work in parallel, the proposed architecture realizes fast parallel encoding and keeps a constantly high compression ratio. Evaluation results for the JPEG application show that the proposed architecture can achieve up to 28% smaller encoded picture sizes than the conventional architectures. The obtained encoding time can be reduced by 95% in comparison to a conventional SRAM-based architecture, which is suitable even for the latest end-user-devices requiring fast frame-rates. Furthermore, the proposed architecture provides the only encoder that can simultaneously realize small compressed data size and fast processing speed.
Akira YAMAZAKI Takeshi FUJINO Kazunari INOUE Isamu HAYASHI Hideyuki NODA Naoya WATANABE Fukashi MORISHITA Katsumi DOSAKA Yoshikazu MOROOKA Shinya SOEDA Kazutami ARIMOTO Setsuo WAKE Kazuyasu FUJISHIMA Hideyuki OZAKI
A 23.3 mm2 32 Mb embedded DRAM (eDRAM) macro has been fabricated using 0.18 µm triple-well 4-metal embedded DRAM process technology to realize an accelerated 3-D graphics controller. The array architecture, using a dual-port sense amplifier, achieves the column access latency of two cycles at 222 MHz and a peak data rate of 14.2 4 GB/s at 4 macros. The process cost has been kept low by using VT-MOS circuit technology and taking advantage of a characteristic of dual-gate oxide process technology. A tRAC of 11.6 ns at 2.0 V is achieved using a 'pre-detect redundancy' circuit.
Takeshi KUMAKI Masakatsu ISHIZAKI Tetsushi KOIDE Hans Jurgen MATTAUSCH Yasuto KURODA Takayuki GYOHTEN Hideyuki NODA Katsumi DOSAKA Kazutami ARIMOTO Kazunori SAITO
This paper presents an integration architecture of content addressable memory (CAM) and a massive-parallel memory-embedded SIMD matrix for constructing a versatile multimedia processor. The massive-parallel memory-embedded SIMD matrix has 2,048 2-bit processing elements, which are connected by a flexible switching network, and supports 2-bit 2,048-way bit-serial and word-parallel operations with a single command. The SIMD matrix architecture is verified to be a better way for processing the repeated arithmetic operation types in multimedia applications. The proposed architecture, reported in this paper, exploits in addition CAM technology and enables therefore fast pipelined table-lookup coding operations. Since both arithmetic and table-lookup operations execute extremely fast, the proposed novel architecture can realize consequently efficient and versatile multimedia data processing. Evaluation results of the proposed CAM-enhanced massive-parallel SIMD matrix processor for the example of the frequently used JPEG image-compression application show that the necessary clock cycle number can be reduced by 86% in comparison to a conventional mobile DSP architecture. The determined performances in Mpixel/mm2 are factors 3.3 and 4.4 better than with a CAM-less massive-parallel memory-embedded SIMD matrix processor and a conventional mobile DSP, respectively.
Fukashi MORISHITA Hideyuki NODA Isamu HAYASHI Takayuki GYOHTEN Mako OKAMOTO Takashi IPPOSHI Shigeto MAEGAWA Katsumi DOSAKA Kazutami ARIMOTO
We propose a novel capacitorless twin-transistor random access memory (TTRAM). The 2 Mb test device has been fabricated on 130 nm SOI-CMOS process. We demonstrate the TTRAM cell has two data-storage states and confirm the data retention time of 100 ms at 80. TTRAM process is compatible with the conventional SOI-CMOS and never requires any additional processes. A 6.1 ns row-access time is achieved and 250 MHz operation can be realized by using 2 bank 8 b-burst mode.
Takayuki GYOHTEN Fukashi MORISHITA Isamu HAYASHI Mako OKAMOTO Hideyuki NODA Katsumi DOSAKA Kazutami ARIMOTO Yasutaka HORIBA
Adaptive voltage management (AVM) scheme is proposed for worst-caseless lower voltage SoC design. The AVM scheme detects the temperature accurately by using two oscillators with different temperature characteristics, and sets supply voltage most suitable with a table look-up method corresponding to the process variation. Also, the AVM can supply the stable voltage with a local shift type regulator even at lower voltage. Thereby, this supply-voltage control system considering PVT variations can control the internal voltage corresponding to process and temperature variations and can realize a wide-operating-margin, DFM function for low voltage SoC. The experimental chip is fabricated on a 90 nm CMOS process, and it was confirmed that the proposed architecture controls the voltage accurately and has a wide operating margin at a lower voltage.
Masanori HAYASHIKOSHI Hiroaki TANIZAKI Yasumitsu MURAI Takaharu TSUJI Kiyoshi KAWABATA Koji NII Hideyuki NODA Hiroyuki KONDO Yoshio MATSUDA Hideto HIDAKA
A 1-Transistor 4-Magnetic Tunnel Junction (1T-4MTJ) memory cell has been proposed for field type of Magnetic Random Access Memory (MRAM). Proposed 1T-4MTJ memory cell array is achieved 44% higher density than that of conventional 1T-1MTJ thanks to the common access transistor structure in a 4-bit memory cell. A self-reference sensing scheme which can read out with write-back in four clock cycles has been also proposed. Furthermore, we add to estimate with considering sense amplifier variation and show 1T-4MTJ cell configuration is the best solution in IoT applications. A 1-Mbit MRAM test chip is designed and fabricated successfully using 130-nm CMOS process. By applying 1T-4MTJ high density cell and partially embedded wordline driver peripheral into the cell array, the 1-Mbit macro size is 4.04 mm2 which is 35.7% smaller than the conventional one. Measured data shows that the read access is 55 ns at 1.5 V typical supply voltage and 25C. Combining with conventional high-speed 1T-1MTJ caches and proposed high-density 1T-4MTJ user memories is an effective on-chip hierarchical non-volatile memory solution, being implemented for low-power MCUs and SoCs of IoT applications.
Hideyuki NODA Kazunari INOUE Hans Jurgen MATTAUSCH Tetsushi KOIDE Katsumi DOSAKA Kazutami ARIMOTO Kazuyasu FUJISHIMA Kenji ANAMI Tsutomu YOSHIHARA
This paper describes a dynamic TCAM architecture with planar complementary capacitors, transparently scheduled refresh (TSR), autonomous power management (APM) and address-input-free writing scheme. The complementary cell structure of the planar dynamic TCAM (PD-TCAM) allows small cell size of 4.79 µm2 in 130 nm CMOS technology, and realizes stable TCAM operation even with very small storage capacitance. Due to the TSR architecture, the PD-TCAM maintains functional compatibility with a conventional SRAM-based TCAM. The combined effects of the compact PD-TCAM array matrix and the APM technique result in up to 50% reduction of the total power consumption during search operation. In addition, an intelligent address-input-free writing scheme is also introduced to facilitate the PD-TCAM application for the user. Consequently the proposed architecture is quite attractive for realizing compact and low-power embedded TCAM macros for the design of system VLSI solutions in the field of networking applications.