1-9hit |
Takahiro SEKI Satoshi AKUI Katsunori SENO Masakatsu NAKAI Tetsumasa MEGURO Tetsuo KONDO Akihiko HASHIGUCHI Hirokazu KAWAHARA Kazuo KUMANO Masayuki SHIMURA
In this paper, a Dynamic Voltage and Frequency Management (DVFM) scheme introduced in a microprocessor for handheld devices with wideband embedded DRAM is reported. Our DVFM scheme reduces the power consumption effectively by cooperation of the autonomous clock frequency control and the adaptive supply voltage control. The clock frequency is controlled using hardware activity information to determine the minimum value required by the current processor load. This clock frequency control is realized without special power management software. The supply voltage is controlled according to the delay information provided from a delay synthesizer circuit, which consists of three programmable delay components, gate delay, RC delay and a rise/fall delay. The delay synthesizer circuit emulates the critical-path delay within 4% voltage accuracy over the full range of process deviation and voltage. This accurate tracking ability realizes the supply voltage scaling according to the fluctuation of the LSI's characteristic caused by the temperature and process deviation. The DVFM contributes not only the dynamic power reduction, but also the leakage power reduction. This microprocessor, fabricated in 0.18 µm CMOS embedded DRAM technology achieves 82% power reduction in a Personal Information Management scheduler (PIM) application and 40% power reduction in a MPEG4 movie playback application. As process technology shrinks, the DVFM scheme with leakage power compensation effect will become more important realizing in high-performance and low-power mobile consumer applications.
Takashi HASHIMOTO Shunichi KUROMARU Masayoshi TOUJIMA Yasuo KOHASHI Masatoshi MATSUO Toshihiro MORIIWA Masahiro OHASHI Tsuyoshi NAKAMURA Mana HAMADA Yuji SUGISAWA Miki KUROMARU Tomonori YONEZAWA Satoshi KAJITA Takahiro KONDO Hiroki OTSUKI Kohkichi HASHIMOTO Hiromasa NAKAJIMA Taro FUKUNAGA Hiroaki TOIDA Yasuo IIZUKA Hitoshi FUJIMOTO Junji MICHIYAMA
A low power MPEG-4 video codec LSI with the capability for core profile decoding is presented. A 16-b DSP with a vector pipeline architecture and a 32-b arithmetic unit, eight dedicated hardware engines to accelerate MPEG-4 SP@L1 codec, CP@L1 decoding and post video processing, 20-Mb embedded DRAM, and three peripheral blocks are integrated together on a single chip. MPEG-4 SP@L1 codec, CP@L1 decoding and post video processing are realized with a hybrid architecture consisting of a programmable DSP and dedicated hardware engines at low operating frequency. In order to reduce the power consumption, clock gating technique is fully adopted in each hardware block and embedded DRAM is employed. The chip is implemented using 0.18-µm quad-metal CMOS technology, and its die area is 8.8 mm 8.6 mm. The power consumption is 90 mW at a SP@L1 codec and 110 mW at a CP@L1 decoding.
Naoya WATANABE Fukashi MORISHITA Yasuhiko TAITO Akira YAMAZAKI Tetsushi TANIZAKI Katsumi DOSAKA Yoshikazu MOROOKA Futoshi IGAUE Katsuya FURUE Yoshihiro NAGURA Tatsunori KOMOIKE Toshinori MORIHARA Atsushi HACHISUKA Kazutami ARIMOTO Hideyuki OZAKI
This paper describes an Embedded DRAM Hybrid Macro, which supports various memory specifications. The eDRAM module generator with Hybrid Macro provides more than 120,000 eDRAM configurations. This eDRAM includes a new architecture called Auto Signal Management (ASM) architecture, which automatically adjusts the timing of the control signals for various eDRAM configurations, and reduces the design Turn Around Time. An Enhanced-on-chip Tester performs the maximum 512b I/O pass/fail simultaneous judgments and the real time repair analysis. The eDRAM testing time is reduced to about 1/64 of the time required using the conventional technique. A test chip is fabricated using a 0.18 µm 4-metal embedded DRAM technology, which utilizes the triple-well, dual-Tox, and Co salicide process technologies. This chip achieves a wide voltage range operation of 1.2 V at 100 MHz to 1.8 V at 200 MHz.
Yoshihiro NAGURA Yoshinori FUJIWARA Katsuya FURUE Ryuji OHMURA Tatsunori KOMOIKE Takenori OKITAKA Tetsushi TANIZAKI Katsumi DOSAKA Kazutami ARIMOTO Yukiyoshi KODA Tetsuo TADA
The increase of test time of embedded DRAMs (e-DRAM) is one of the key issues of System-on-chip (SOC) device test. This paper proposes to put the repair analysis function on chip as Built In Self Repair (BISR). BISR is performed at 166 MHz as at-speed of e-DRAM with using low cost automatic test equipment (ATE). The area of the BISR is 1.7 mm2. Using error storage table form contributes to realize small area penalty of repair analysis function. e-DRAM function test time by BISR was about 20% less than the conventional method at wafer level testing. Moreover, representative samples are produced to confirm repair analysis ability. The results show that all of the samples are actually repaired by repair information generated by BISR.
Akira YAMAZAKI Takeshi FUJINO Kazunari INOUE Isamu HAYASHI Hideyuki NODA Naoya WATANABE Fukashi MORISHITA Katsumi DOSAKA Yoshikazu MOROOKA Shinya SOEDA Kazutami ARIMOTO Setsuo WAKE Kazuyasu FUJISHIMA Hideyuki OZAKI
A 23.3 mm2 32 Mb embedded DRAM (eDRAM) macro has been fabricated using 0.18 µm triple-well 4-metal embedded DRAM process technology to realize an accelerated 3-D graphics controller. The array architecture, using a dual-port sense amplifier, achieves the column access latency of two cycles at 222 MHz and a peak data rate of 14.2 4 GB/s at 4 macros. The process cost has been kept low by using VT-MOS circuit technology and taking advantage of a characteristic of dual-gate oxide process technology. A tRAC of 11.6 ns at 2.0 V is achieved using a 'pre-detect redundancy' circuit.
Tadaaki YAMAUCHI Lance HAMMOND Oyekunle A. OLUKOTUN Kazutami ARIMOTO
A microprocessor integrated with DRAM on the same die has the potential to improve system performance by reducing memory latency and improving memory bandwidth. In this paper we evaluate the performance of a single chip multiprocessor integrated with DRAM when the DRAM is organized as on-chip main memory and as on-chip cache. We compare the performance of this architecture with that of a more conventional chip which only has SRAM-based on-chip cache. The DRAM-based architecture with four processors outperforms the SRAM-based architecture on floating point applications which are effectively parallelized and have large working sets. This performance difference is significantly better than that possible in a uniprocessor DRAM-based architecture, which performs only slightly faster than an SRAM-based architecture on the same applications. In addition, on multiprogrammed workloads, in which independent processes are assigned to every processor in a single chip multiprocessor, the large bandwidth of on-chip DRAM can handle the inter-access contention better. These results demonstrate that a multiprocessor takes better advantage of the large bandwidth provided by the on-chip DRAM than a uniprocessor.
Yoshiharu AIMOTO Tohru KIMURA Yoshikazu YABE Hideki HEIUCHI Youetsu NAKAZAWA Masato MOTOMURA Takuya KOGA Yoshihiro FUJITA Masayuki HAMADA Takaho TANIGAWA Hajime NOBUSAWA Kuniaki KOYAMA
We have developed a parallel image processing RAM (PIP-RAM) which integrates a 16-Mb DRAM and 128 processor elements (PEs) by means of 0. 38-µm CMOS 64-Mb DRAM process technology. It achieves 7. 68-GIPS processing performance and 3. 84-GB/s memory bandwidth with only 1-W power dissipation (@ 30-MHz), and the key to this performance is the DRAM design. This paper presents the key circuit techniques employed in the DRAM design: 1) a paged-segmentation accessing scheme that reduces sense amplifier power dissipation, and 2) a clocked low-voltage-swing differential-charge-transfer scheme that reduces data line power dissipation with the help of a multi-phase synchronization DRAM control scheme. These techniques have general importance for the design of LSIs in which DRAMs and logic are tightly integrated on single chips.
Akira YAMAZAKI Tadato YAMAGATA Yutaka ARITA Makoto TANIGUCHI Michihiro YAMADA
The features for the integration of 1Tr/1C DRAM and logic for graphic and multimedia applications are surveyed. The key circuit/process technology for large scale embedded DRAM cores is described. The methods to improve transistor performance and gate density are shown. Noise immunity design and easy customization techniques are also introduced.
Takao WATANABE Ryo FUJITA Kazumasa YANAGISAWA
The advantages of DRAM-logic integration were demonstrated through a comparison with a conventional separate-chip architecture. Although the available DRAM capacity is restricted by chip size, the integration provides a high throughput and low I/O-power dissipation due to a large number of on-chip I/O lines with small load capacitance. These features result in smaller chip counts as well as lower power dissipation for systems requiring high data throughput and having relatively small memory capacity. The chip count and I/O-power dissipation were formulated for multimedia systems. For the 3-D computer graphics system with a frame of 12801024 pixels requiring a 60-Mbit memory capacity and a 4.8-Gbyte/s throughput, DRAM-logic integration enabled a 1/12 smaller chip count and 1/10 smaller I/O-power dissipation. For the 200-MIPS hand-held portable computing system that had a 16-Mbit memory capacity and required a 416-Mbyte/s throughput, DRAM-logic integration enabled a 1/4 smaller chip count and 1/17 smaller I/O-power dissipation. In addition, innovative architectures that enhance the advantages of DRAM-logic integration were discussed. Pipeline access for a DRAM macro having a cascaded multi-bank structure, an on-chip cache DRAM, and parallel processing with a reduced supply voltage were introduced.