1-4hit |
Takeshi KUMAKI Masakatsu ISHIZAKI Tetsushi KOIDE Hans Jurgen MATTAUSCH Yasuto KURODA Takayuki GYOHTEN Hideyuki NODA Katsumi DOSAKA Kazutami ARIMOTO Kazunori SAITO
This paper presents an integration architecture of content addressable memory (CAM) and a massive-parallel memory-embedded SIMD matrix for constructing a versatile multimedia processor. The massive-parallel memory-embedded SIMD matrix has 2,048 2-bit processing elements, which are connected by a flexible switching network, and supports 2-bit 2,048-way bit-serial and word-parallel operations with a single command. The SIMD matrix architecture is verified to be a better way for processing the repeated arithmetic operation types in multimedia applications. The proposed architecture, reported in this paper, exploits in addition CAM technology and enables therefore fast pipelined table-lookup coding operations. Since both arithmetic and table-lookup operations execute extremely fast, the proposed novel architecture can realize consequently efficient and versatile multimedia data processing. Evaluation results of the proposed CAM-enhanced massive-parallel SIMD matrix processor for the example of the frequently used JPEG image-compression application show that the necessary clock cycle number can be reduced by 86% in comparison to a conventional mobile DSP architecture. The determined performances in Mpixel/mm2 are factors 3.3 and 4.4 better than with a CAM-less massive-parallel memory-embedded SIMD matrix processor and a conventional mobile DSP, respectively.
Fukashi MORISHITA Hideyuki NODA Isamu HAYASHI Takayuki GYOHTEN Mako OKAMOTO Takashi IPPOSHI Shigeto MAEGAWA Katsumi DOSAKA Kazutami ARIMOTO
We propose a novel capacitorless twin-transistor random access memory (TTRAM). The 2 Mb test device has been fabricated on 130 nm SOI-CMOS process. We demonstrate the TTRAM cell has two data-storage states and confirm the data retention time of 100 ms at 80. TTRAM process is compatible with the conventional SOI-CMOS and never requires any additional processes. A 6.1 ns row-access time is achieved and 250 MHz operation can be realized by using 2 bank 8 b-burst mode.
Takayuki GYOHTEN Fukashi MORISHITA Isamu HAYASHI Mako OKAMOTO Hideyuki NODA Katsumi DOSAKA Kazutami ARIMOTO Yasutaka HORIBA
Adaptive voltage management (AVM) scheme is proposed for worst-caseless lower voltage SoC design. The AVM scheme detects the temperature accurately by using two oscillators with different temperature characteristics, and sets supply voltage most suitable with a table look-up method corresponding to the process variation. Also, the AVM can supply the stable voltage with a local shift type regulator even at lower voltage. Thereby, this supply-voltage control system considering PVT variations can control the internal voltage corresponding to process and temperature variations and can realize a wide-operating-margin, DFM function for low voltage SoC. The experimental chip is fabricated on a 90 nm CMOS process, and it was confirmed that the proposed architecture controls the voltage accurately and has a wide operating margin at a lower voltage.
Hans Jurgen MATTAUSCH Koji KISHI Takayuki GYOHTEN
The recent trend towards highly parallel on-chip data processing, as e.g. in single-chip processors with parallel execution capability of multiple instructions, leads to the requirement of on-chip data storage with high random-access bandwidth, parallel access capability and large capacity. The first two requirements call for the application of multi-ported memories. However, the conventional architecture, based on multi-port storage cells for each bit, cannot efficiently realize the large storage capacity, because cell area explodes due to a quadratic increase with port number (N). A promising method for obtaining area efficiency is to increase the size of the smallest unit with N-port capability, e.g. by introducing N-port capability on the level of blocks of 1-port cells and not for each cell. We report a quantitative analysis of this method for the SRAM case, which is based on design data in a 0.5 µm, 2-metal CMOS technology. Achievable area-reduction magnitudes in comparison to the conventional architecture are found to be enormous and to accelerate as a function of N. Reduction factors to areas < 1/2, < 1/5, < 1/14 and < 1/30 are estimated for 4, 8, 16 and 32 ports, respectively. Since the demerit of the proposed approach is an increased access-rejection probability, a trade-off between area reduction and allowable access-rejection probability is always necessary for practical applications. This is discussed for the application of multi-port cache memories.