1-16hit |
Yoshihiro NAGURA Yoshinori FUJIWARA Katsuya FURUE Ryuji OHMURA Tatsunori KOMOIKE Takenori OKITAKA Tetsushi TANIZAKI Katsumi DOSAKA Kazutami ARIMOTO Yukiyoshi KODA Tetsuo TADA
The increase of test time of embedded DRAMs (e-DRAM) is one of the key issues of System-on-chip (SOC) device test. This paper proposes to put the repair analysis function on chip as Built In Self Repair (BISR). BISR is performed at 166 MHz as at-speed of e-DRAM with using low cost automatic test equipment (ATE). The area of the BISR is 1.7 mm2. Using error storage table form contributes to realize small area penalty of repair analysis function. e-DRAM function test time by BISR was about 20% less than the conventional method at wafer level testing. Moreover, representative samples are produced to confirm repair analysis ability. The results show that all of the samples are actually repaired by repair information generated by BISR.
Hideyuki NODA Katsumi DOSAKA Hans Jurgen MATTAUSCH Tetsushi KOIDE Fukashi MORISHITA Kazutami ARIMOTO
This paper describes a novel TCAM architecture designed for enhancing the soft-error immunity. An associated embedded DRAM and ECC circuits are placed next to TCAM macro to implement a unique methodology of recovering upset bits due to soft errors. The proposed configuration allows an improvement of soft-error immunity by 6 orders of magnitude compared with the conventional TCAM. We also propose a novel testing methodology of the soft-error rate with a fast parallel multi-bit test. In addition, the proposed architecture resolves the critical problem of the look-up table maintenance of TCAM. The design techniques reported in this paper are especially attractive for realizing soft-error immune, high-performance TCAM chips.
Hiroki SHIMANO Fukashi MORISHITA Katsumi DOSAKA Kazutami ARIMOTO
The advanced-DFM (Design For Manufacturability) RAM provides the solution for the limitation of SRAM voltage scaling down and the countermeasure of the process fluctuations. The characteristics of this RAM are the voltage scalability (@0.6 V operation) with wide operating margin and the reliability of long data retention time. The memory cell consists of 2 Cell/bit with the complementary dynamic memory operation and has the 1 Cell/bit test mode for the accelerated screening against the marginal cells. The GND bitline pre-charge sensing scheme and SSW (Sense Synchronized Write) peripheral circuit technologies are also adopted for the low voltage and DFV (Dynamic Frequency and Voltage) controllable SoC which will be strongly required from the many kinds of applications. This RAM supports the DFM functions with both good cell/bit for advanced process technologies and the voltage scalable SoC memory platform.
Akira YAMAZAKI Fukashi MORISHITA Naoya WATANABE Teruhiko AMANO Masaru HARAGUCHI Hideyuki NODA Atsushi HACHISUKA Katsumi DOSAKA Kazutami ARIMOTO Setsuo WAKE Hideyuki OZAKI Tsutomu YOSHIHARA
The voltage margin of an embedded DRAM's sense operation has been shrinking with the scaling of process technology. A method to estimate this margin would be a key to optimizing the memory array configuration and the size of the sense transistor. In this paper, the voltage margin of the sense operation is theoretically analyzed. The accuracy of the proposed voltage margin model was confirmed on a 0.13-µm eDRAM test chip, and the results of calculation were generally in agreement with the measured results.
Takeshi KUMAKI Masakatsu ISHIZAKI Tetsushi KOIDE Hans Jurgen MATTAUSCH Yasuto KURODA Hideyuki NODA Katsumi DOSAKA Kazutami ARIMOTO Kazunori SAITO
This paper reports an efficient Discrete Cosine Transform (DCT) processing method for images using a massive-parallel memory-embedded SIMD matrix processor. The matrix-processing engine has 2,048 2-bit processing elements, which are connected by a flexible switching network, and supports 2-bit 2,048-way bit-serial and word-parallel operations with a single command. For compatibility with this matrix-processing architecture, the conventional DCT algorithm has been improved in arithmetic order and the vertical/horizontal-space 1 Dimensional (1D)-DCT processing has been further developed. Evaluation results of the matrix-engine-based DCT processing show that the necessary clock cycles per image block can be reduced by 87% in comprison to a conventional DSP architecture. The determined performances in MOPS and MOPS/mm2 are factors 8 and 5.6 better than with a conventional DSP, respectively.
Takeshi KUMAKI Yasuto KURODA Masakatsu ISHIZAKI Tetsushi KOIDE Hans Jurgen MATTAUSCH Hideyuki NODA Katsumi DOSAKA Kazutami ARIMOTO Kazunori SAITO
This paper presents a novel optimized real-time Huffman encoder using a pipelined data path based on CAM technology and a parallel code-word-table optimizer. The exploitation of CAM technology enables fast parallel search of the code word table. At the same time, the code word table is optimized according to the frequency of received input symbols and is up-dated in real-time. Since these two functions work in parallel, the proposed architecture realizes fast parallel encoding and keeps a constantly high compression ratio. Evaluation results for the JPEG application show that the proposed architecture can achieve up to 28% smaller encoded picture sizes than the conventional architectures. The obtained encoding time can be reduced by 95% in comparison to a conventional SRAM-based architecture, which is suitable even for the latest end-user-devices requiring fast frame-rates. Furthermore, the proposed architecture provides the only encoder that can simultaneously realize small compressed data size and fast processing speed.
Akira YAMAZAKI Takeshi FUJINO Kazunari INOUE Isamu HAYASHI Hideyuki NODA Naoya WATANABE Fukashi MORISHITA Katsumi DOSAKA Yoshikazu MOROOKA Shinya SOEDA Kazutami ARIMOTO Setsuo WAKE Kazuyasu FUJISHIMA Hideyuki OZAKI
A 23.3 mm2 32 Mb embedded DRAM (eDRAM) macro has been fabricated using 0.18 µm triple-well 4-metal embedded DRAM process technology to realize an accelerated 3-D graphics controller. The array architecture, using a dual-port sense amplifier, achieves the column access latency of two cycles at 222 MHz and a peak data rate of 14.2 4 GB/s at 4 macros. The process cost has been kept low by using VT-MOS circuit technology and taking advantage of a characteristic of dual-gate oxide process technology. A tRAC of 11.6 ns at 2.0 V is achieved using a 'pre-detect redundancy' circuit.
Naoya WATANABE Fukashi MORISHITA Yasuhiko TAITO Akira YAMAZAKI Tetsushi TANIZAKI Katsumi DOSAKA Yoshikazu MOROOKA Futoshi IGAUE Katsuya FURUE Yoshihiro NAGURA Tatsunori KOMOIKE Toshinori MORIHARA Atsushi HACHISUKA Kazutami ARIMOTO Hideyuki OZAKI
This paper describes an Embedded DRAM Hybrid Macro, which supports various memory specifications. The eDRAM module generator with Hybrid Macro provides more than 120,000 eDRAM configurations. This eDRAM includes a new architecture called Auto Signal Management (ASM) architecture, which automatically adjusts the timing of the control signals for various eDRAM configurations, and reduces the design Turn Around Time. An Enhanced-on-chip Tester performs the maximum 512b I/O pass/fail simultaneous judgments and the real time repair analysis. The eDRAM testing time is reduced to about 1/64 of the time required using the conventional technique. A test chip is fabricated using a 0.18 µm 4-metal embedded DRAM technology, which utilizes the triple-well, dual-Tox, and Co salicide process technologies. This chip achieves a wide voltage range operation of 1.2 V at 100 MHz to 1.8 V at 200 MHz.
Katsumi DOSAKA Akira YAMAZAKI Naoya WATANABE Hideaki ABE Jun OHTANI Toshiyuki OGAWA Kazunori ISHIHARA Masaki KUMANOYA
This paper describes a system integrated memory with direct interface to CPU which integrates an SRAM, a DRAM, and control circuitry, including a tag memory (TAG). This memory realizes a computer system without glue chips, and thus enables a computer system which is low cost, low power, and compact size, but still with sufficient performance. And fast clock cycle time and access time is realized using a newly proposed clock driver and internal signal generator. This memory is fabricated with a quad-polysilicon double-metal 0.55-µm CMOS process which is the same as used in a conventional 16-Mb DRAM. The chip size of 145.3mm2 is only a 12% increase over the conventional 16-Mb DRAM. The maximum operating frequency is 90-MHz and the operating current at cache-hit is 156-mA. This memory is suitable for various types of computer systems such as personal digital assistants(PDA's), personal computer systems, and embedded controller applications.
Katsumi DOSAKA Daisuke OGAWA Takahito KUSUMOTO Masayuki MIYAMA Yoshio MATSUDA
Architecture of a low power Ternary Content Addressable Memory (TCAM) is proposed. The TCAM is a powerful engine for search and sort processing, but it has two serious problems, large power consumption and large power line noise. To solve these problems, we have developed a charge recycling scheme for match lines and search lines. A combination of the newly introduced PMOS CAM cell together with the conventional NMOS CAM cell realizes match line charge recycling. A checkerboard arrangement of the NMOS and the PMOS cell array enables search line charge recycling. By using these technologies, the power consumption of the TCAM can be reduced to 50% of conventional designs, and as a result, the power line noise is also reduced. An experimental chip has been fabricated in 180-nm 6-metal process. The power consumption of this chip is 6.3 fJ/bit/search, which is half of the conventional scheme.
Takeshi KUMAKI Masakatsu ISHIZAKI Tetsushi KOIDE Hans Jurgen MATTAUSCH Yasuto KURODA Takayuki GYOHTEN Hideyuki NODA Katsumi DOSAKA Kazutami ARIMOTO Kazunori SAITO
This paper presents an integration architecture of content addressable memory (CAM) and a massive-parallel memory-embedded SIMD matrix for constructing a versatile multimedia processor. The massive-parallel memory-embedded SIMD matrix has 2,048 2-bit processing elements, which are connected by a flexible switching network, and supports 2-bit 2,048-way bit-serial and word-parallel operations with a single command. The SIMD matrix architecture is verified to be a better way for processing the repeated arithmetic operation types in multimedia applications. The proposed architecture, reported in this paper, exploits in addition CAM technology and enables therefore fast pipelined table-lookup coding operations. Since both arithmetic and table-lookup operations execute extremely fast, the proposed novel architecture can realize consequently efficient and versatile multimedia data processing. Evaluation results of the proposed CAM-enhanced massive-parallel SIMD matrix processor for the example of the frequently used JPEG image-compression application show that the necessary clock cycle number can be reduced by 86% in comparison to a conventional mobile DSP architecture. The determined performances in Mpixel/mm2 are factors 3.3 and 4.4 better than with a CAM-less massive-parallel memory-embedded SIMD matrix processor and a conventional mobile DSP, respectively.
Fukashi MORISHITA Hideyuki NODA Isamu HAYASHI Takayuki GYOHTEN Mako OKAMOTO Takashi IPPOSHI Shigeto MAEGAWA Katsumi DOSAKA Kazutami ARIMOTO
We propose a novel capacitorless twin-transistor random access memory (TTRAM). The 2 Mb test device has been fabricated on 130 nm SOI-CMOS process. We demonstrate the TTRAM cell has two data-storage states and confirm the data retention time of 100 ms at 80. TTRAM process is compatible with the conventional SOI-CMOS and never requires any additional processes. A 6.1 ns row-access time is achieved and 250 MHz operation can be realized by using 2 bank 8 b-burst mode.
Takayuki GYOHTEN Fukashi MORISHITA Isamu HAYASHI Mako OKAMOTO Hideyuki NODA Katsumi DOSAKA Kazutami ARIMOTO Yasutaka HORIBA
Adaptive voltage management (AVM) scheme is proposed for worst-caseless lower voltage SoC design. The AVM scheme detects the temperature accurately by using two oscillators with different temperature characteristics, and sets supply voltage most suitable with a table look-up method corresponding to the process variation. Also, the AVM can supply the stable voltage with a local shift type regulator even at lower voltage. Thereby, this supply-voltage control system considering PVT variations can control the internal voltage corresponding to process and temperature variations and can realize a wide-operating-margin, DFM function for low voltage SoC. The experimental chip is fabricated on a 90 nm CMOS process, and it was confirmed that the proposed architecture controls the voltage accurately and has a wide operating margin at a lower voltage.
Hideyuki NODA Kazunari INOUE Hans Jurgen MATTAUSCH Tetsushi KOIDE Katsumi DOSAKA Kazutami ARIMOTO Kazuyasu FUJISHIMA Kenji ANAMI Tsutomu YOSHIHARA
This paper describes a dynamic TCAM architecture with planar complementary capacitors, transparently scheduled refresh (TSR), autonomous power management (APM) and address-input-free writing scheme. The complementary cell structure of the planar dynamic TCAM (PD-TCAM) allows small cell size of 4.79 µm2 in 130 nm CMOS technology, and realizes stable TCAM operation even with very small storage capacitance. Due to the TSR architecture, the PD-TCAM maintains functional compatibility with a conventional SRAM-based TCAM. The combined effects of the compact PD-TCAM array matrix and the APM technique result in up to 50% reduction of the total power consumption during search operation. In addition, an intelligent address-input-free writing scheme is also introduced to facilitate the PD-TCAM application for the user. Consequently the proposed architecture is quite attractive for realizing compact and low-power embedded TCAM macros for the design of system VLSI solutions in the field of networking applications.
Masaki KUMANOYA Toshiyuki OGAWA Yasuhiro KONISHI Katsumi DOSAKA Kazuhiro SHIMOTORI
Various kinds of new architectures have been proposed to enhance operating performance of the DRAM. This paper reviews these architectures including EDO, SDRAM, RDRAM, EDRAM, and CDRAM. The EDO slightly modifies the output control of the conventional DRAM architecture. Other innovative architectures try to enhance the performance by taking advantage of DRAM's internal multiple bits architecture with internal pipeline, parallel-serial conversion, or static buffers/on-chip cache. A quantitative analysis based on an assumption of wait cycles was made to compare PC system performance with some architectures. The calculation indicated the effectiveness of external or on-chip cache. Future trends cover high-speed I/O interface, unified memory architecture, and system integrated memory. The interface includes limited I/O swing such as HSTL and SSTL to realize more than 100MHz operation. Also, Ramlink and SyncLink are briefly reviewed as candidates for next generation interface. Unified memory architecture attempts to save total memory capacity by combining graphics and main memory. Advanced device technology enables system integration which combine system logic and memory. It suggests one potential direction towards system on a chip in the future.
Hiroki SHIMANO Fukashi MORISHITA Katsumi DOSAKA Kazutami ARIMOTO
The proposed built-in Power-Cut scheme intended for a wide range of dynamically data retaining memories including embedded SoC memories enables the system-level power management to handle SoC on which the several high density and low voltage scalable memory macros are embedded. This scheme handles the deep standby mode in which the SoC memories keep the stored data in the ultra low standby current, and quick recovery to the normal operation mode and precise power management are realized, in addition to the conventional full power-off mode in which the SoC memories stay in the negligibly low standby current but allow the stored data to disappear. The unique feature of the statically or dynamically changeable internal voltages of memory in the deep standby mode brings about much further reduction of the standby current. This scheme will contribute to the further lowering power of the mobile applications requiring larger memory capacity embedded SoC memories.