Tetsuya YAMADA Makoto ISHIKAWA Yuji OGATA Takanobu TSUNODA Takahiro IRITA Saneaki TAMAKI Kunihiko NISHIYAMA Tatsuya KAMEI Ken TATEZAWA Fumio ARAKAWA Takuichiro NAKAZAWA Toshihiro HATTORI Kunio UCHIYAMA
A 32-bit embedded RISC microprocessor core integrating a DSP has been developed using a 0.18-µm five-layer-metal CMOS technology. The integrated DSP has a single MAC and exploits CPU resources to reduce hardware; it occupies only 0.5 mm². The processor core includes a large 128-kB on-chip SRAM called U-memory. A large-capacity on-chip memory decreases the amount of traffic to external memory, which is effective for low-power, high-performance operation. To reduce the power dissipated by U-memory accesses, the active ratio of U-memory access is reduced. The critical path is the load path from the U-memory, and we optimized this path through the whole chip. The chip achieves 0.79 mA/MHz executing Dhrystone 1.1 at 108 MHz, which is suitable for mobile applications.
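As a quick reading of the 0.79 mA/MHz figure, the sketch below converts it to an absolute supply current at the quoted 108 MHz clock. The function name is hypothetical, and the linear scaling with frequency is an assumption of the sketch, not something the abstract states.

```python
def supply_current_ma(ma_per_mhz, clock_mhz):
    """Supply current in mA, assuming current scales linearly with clock frequency."""
    return ma_per_mhz * clock_mhz


# Figures quoted in the abstract: 0.79 mA/MHz at 108 MHz.
print(round(supply_current_ma(0.79, 108), 2))  # 85.32
```

The abstract does not give a supply voltage, so the sketch stops at current and does not estimate power.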
Kunio UCHIYAMA Fumio ARAKAWA Yasuhiko SAITO Koki NOGUCHI Atsushi HASEGAWA Shinichi YOSHIOKA Naohiko IRIE Takeshi KITAHARA Mark DEBBAGE Andy STURGES
A 64-bit architecture for an embedded processor targeted at next-generation digital consumer products has been developed. It has dual-mode instruction sets and is optimized both for high multimedia performance, provided by SIMD/floating-point vector instructions in a 32-bit-length ISA, and for small code size, provided by a conventional 16-bit-length ISA. Large register files (64×64b and 64×32b), a split-branch mechanism, and a virtual cache are also adopted in the architecture. A 714 MIPS/9.6 GOPS/400 MHz processor core with the 64-bit architecture and a system LSI containing the core have been developed using 0.15-µm technology. The LSI includes a 3.2 GB/s high-bandwidth on-chip bus, a high-speed DRAM interface, an SRAM/Flash/ROM/multiplexed-bus interface, and a 66 MHz PCI interface, which together provide the performance required for next-generation multimedia applications.
Yoshitaka HIRAMATSU Hasitha Muthumala WAIDYASOORIYA Masanori HARIYAMA Toru NOJIRI Kunio UCHIYAMA Michitaka KAMEYAMA
The large data-transfer time among different cores is a major problem in heterogeneous multi-core processors. This paper presents a method to accelerate data transfers by exploiting data-transfer units together with complex memory allocation. We used block matching, which is very common in image processing, to evaluate our technique. The proposed method reduces the data-transfer time by more than 42% compared to earlier works that use CPU-based data transfers. Moreover, the total processing time is only 15 ms for a VGA image with 16×16 pixel blocks.
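For readers unfamiliar with the block-matching workload used in this evaluation, the following is a minimal sketch of exhaustive block matching with a sum-of-absolute-differences (SAD) cost. It is a generic illustration, not the paper's implementation: the function names, the SAD cost, the search range, and the plain list-of-lists image layout are all assumptions, and it says nothing about the data-transfer units or memory allocation the paper proposes.

```python
def sad(ref, cand, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `ref` at (bx, by)
    and the block of `cand` displaced from it by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(ref[by + y][bx + x] - cand[by + dy + y][bx + dx + x])
    return total


def best_match(ref, cand, bx, by, n=16, search=4):
    """Try every displacement in [-search, search] on both axes and return
    (dx, dy, cost) for the lowest-cost block that lies fully inside `cand`."""
    height, width = len(cand), len(cand[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x0, y0 = bx + dx, by + dy
            # Skip candidate blocks that would fall outside the image.
            if x0 < 0 or y0 < 0 or x0 + n > width or y0 + n > height:
                continue
            cost = sad(ref, cand, bx, by, dx, dy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best
```

A 640×480 VGA frame divides into 40 × 30 = 1200 non-overlapping 16×16 blocks, and each block reads a neighbourhood of the other image during the search, which is why the cost of moving pixel data between cores matters for this workload.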
Naohiko IRIE Fumio ARAKAWA Kunio UCHIYAMA Shinichi YOSHIOKA Atsushi HASEGAWA Kevin IADONATE Mark DEBBAGE David SHEPHERD Margaret GEARTY
An embedded processor core using a split-branch architecture has been developed. This processor core targets 400 MHz using 0.18-µm technology, and its higher frequency requires a deeper pipeline than that of conventional processors. To solve the increasing branch-penalty problem caused by a deeper pipeline, this processor adopts an active preload mechanism that preloads the target instructions into internal buffers in order to hide the instruction cache latency. The processor also uses multiple instruction buffers to reduce the penalty cycles of branch misprediction. The performance estimation shows that about 70% of branch overhead cycles can be eliminated relative to the conventional implementation. This branch mechanism occupies only 1% of the total core area, which is smaller than the conventional branch target buffer (BTB) scheme, and helps to achieve low power and low cost.
Hideo MAEJIMA Masahiro KAINAGA Kunio UCHIYAMA
This paper describes the design and architecture of a newly developed microprocessor suitable for consumer applications, which we call SuperH. To achieve both low power and high speed, the SuperH architecture includes a 16-bit fixed-length instruction code and several power-saving features. The 16-bit fixed-length instruction code enables the SuperH to achieve excellent code efficiency on the SPECint benchmarks when compared with conventional microcontrollers and with RISCs for workstations and PCs. As a result, the SuperH provides almost the same code efficiency as 8-bit microcontrollers, and also achieves performance similar to that of RISCs with a 32-bit fixed-length instruction code. The SuperH also incorporates several power-reduction techniques through the control of clock frequency and clock distribution. Thus, the 16-bit code format, power-saving features, and other architectural innovations make the SuperH particularly well suited to portable multimedia applications.
Koichiro ISHIBASHI Hisayuki HIGUCHI Toshinobu SHIMBO Kunio UCHIYAMA Kenji SHIOZAWA Naotaka HASHIMOTO Shuji IKEDA
There are various kinds of analog CMOS circuits in microprocessors. I/Os, clock distribution circuits including PLLs, and memories are the main analog circuits. The circuit techniques used to achieve low power dissipation combined with high performance in the newest prototype chip of the SuperH RISC engines are described. The TLB delay can be decreased by using a CAM with a differential amplifier to generate the match signal. An accelerator circuit also helps to speed up the TLB circuit, enabling single-cycle operation. A fabricated 96-mm² test chip with the SuperH architecture, using 0.35-µm four-metal CMOS technology, is capable of 167-MHz operation at 300 Dhrystone MIPS with 2.0-W power dissipation.
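The headline figures in this abstract can be combined into two common efficiency ratios. The sketch below does only that arithmetic; the function names are hypothetical, and the derived ratios are not quoted in the abstract itself.

```python
def mips_per_watt(mips, watts):
    """Performance per unit power, in Dhrystone MIPS per watt."""
    return mips / watts


def mips_per_mhz(mips, clock_mhz):
    """Performance per unit clock frequency, in Dhrystone MIPS per MHz."""
    return mips / clock_mhz


# Figures quoted in the abstract: 300 Dhrystone MIPS, 167 MHz, 2.0 W.
print(mips_per_watt(300, 2.0))           # 150.0
print(round(mips_per_mhz(300, 167), 2))  # 1.8
```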