Kunio UCHIYAMA Fumio ARAKAWA Yasuhiko SAITO Koki NOGUCHI Atsushi HASEGAWA Shinichi YOSHIOKA Naohiko IRIE Takeshi KITAHARA Mark DEBBAGE Andy STURGES
A 64-bit architecture for an embedded processor targeted at next-generation digital consumer products has been developed. It has dual-mode instruction sets and is optimized both for high multimedia performance, provided by SIMD/floating-point vector instructions in a 32-bit-length ISA, and for small code size, provided by a conventional 16-bit-length ISA. Large register files (64×64b and 64×32b), a split-branch mechanism, and a virtual cache are also adopted in the architecture. A 714 MIPS/9.6 GOPS/400 MHz processor core with the 64-bit architecture, and a system LSI containing the core, have been developed using 0.15-µm technology. The LSI includes a 3.2 GB/s high-bandwidth on-chip bus, a high-speed DRAM interface, an SRAM/Flash/ROM/multiplexed-bus interface, and a 66 MHz PCI interface, which together provide the performance required for next-generation multimedia applications.
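As a quick consistency check on the quoted throughput figures, dividing each by the 400 MHz clock gives the average work per cycle. This is simple arithmetic on the numbers in the abstract, not a breakdown given by the authors.

```latex
\[
\frac{714\ \text{MIPS}}{400\ \text{MHz}}
  = \frac{714 \times 10^{6}\ \text{instructions/s}}{400 \times 10^{6}\ \text{cycles/s}}
  \approx 1.79\ \text{instructions/cycle}
\]
\[
\frac{9.6\ \text{GOPS}}{400\ \text{MHz}}
  = \frac{9.6 \times 10^{9}\ \text{operations/s}}{400 \times 10^{6}\ \text{cycles/s}}
  = 24\ \text{operations/cycle}
\]
```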
Naohiko IRIE Fumio ARAKAWA Kunio UCHIYAMA Shinichi YOSHIOKA Atsushi HASEGAWA Kevin IADONATE Mark DEBBAGE David SHEPHERD Margaret GEARTY
An embedded processor core using a split-branch architecture has been developed. The core targets 400 MHz in 0.18-µm technology, and this higher frequency requires a deeper pipeline than that of conventional processors. To address the larger branch penalty that a deeper pipeline causes, the processor adopts an active preload mechanism that preloads branch-target instructions into internal buffers, hiding the instruction-cache latency. The processor also uses multiple instruction buffers to reduce the penalty cycles of branch mispredictions. Performance estimation shows that about 70% of branch overhead cycles are eliminated compared with the conventional implementation. The branch mechanism occupies only 1% of the total core area, less than a conventional branch target buffer (BTB) scheme requires, which helps achieve low power and low cost.
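To make the latency-hiding idea concrete, here is a toy cycle model. It is a minimal sketch under assumptions made here for illustration, not the authors' pipeline: a taken branch is assumed to expose the full instruction-cache latency when the target fetch starts at the branch itself, while a preload started `prepare_distance` cycles earlier hides that many cycles. The names `Branch`, `taken_branch_penalty`, and `total_penalty`, the 3-cycle latency, and the trace are all hypothetical, and the model ignores the multiple instruction buffers used for mispredictions.

```python
from dataclasses import dataclass


@dataclass
class Branch:
    """A taken branch in a hypothetical instruction trace."""
    # Cycles between the (hypothetical) target-preload request and the branch.
    prepare_distance: int


def taken_branch_penalty(icache_latency: int, branch: Branch, preload: bool) -> int:
    """Stall cycles for one taken branch in this toy model.

    Without preload, the target fetch starts at the branch, so the whole
    instruction-cache latency is exposed. With preload, the fetch starts
    `prepare_distance` cycles earlier, hiding up to that many cycles.
    """
    if not preload:
        return icache_latency
    return max(0, icache_latency - branch.prepare_distance)


def total_penalty(icache_latency: int, branches: list[Branch], preload: bool) -> int:
    """Sum of stall cycles over all taken branches in the trace."""
    return sum(taken_branch_penalty(icache_latency, b, preload) for b in branches)


if __name__ == "__main__":
    trace = [Branch(0), Branch(1), Branch(3), Branch(5)]
    conventional = total_penalty(3, trace, preload=False)  # 3 + 3 + 3 + 3
    split_branch = total_penalty(3, trace, preload=True)   # 3 + 2 + 0 + 0
    print(conventional, split_branch)  # prints "12 5"
```

With this hypothetical trace the stall count falls from 12 to 5 cycles. The reduction depends entirely on how far ahead the preload can be issued, so these numbers say nothing about the 70% figure reported for the real design.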
Naohiko IRIE Toshihiro HATTORI
SoCs have driven the evolution of embedded systems and consumer electronics. Multi-core/multi-IP design is the key technology for integrating many functions on a single SoC for future embedded applications. This paper describes, using cellular phones as an example, how SoCs have evolved and which functions they are required to provide. It then presents state-of-the-art homogeneous and heterogeneous multi-core technologies. When many cores and IPs are integrated on a chip, collaboration among them becomes important for meeting system requirements. To achieve this, the "MPSoC Platform" concept and the elemental technologies for this platform are described.
Tetsuya YAMADA Naohiko IRIE Takanobu TSUNODA Takahiro IRITA Kenji KITAGAWA Ryohei YOSHIDA Keisuke TOYAMA Motoaki SATOYAMA
We have developed a hardware accelerator for Java platforms, integrated on a SuperH microprocessor core, using a 130-nm CMOS process. The Java accelerator, a bytecode translation unit (BTU), is tightly coupled with the CPU to share resources. The BTU supports 159 basic bytecodes and 5 or 6 optional bytecodes, and it supports both Connected Device Configuration (CDC) 1.0 and Connected Limited Device Configuration (CLDC) 1.0.4 technologies. The BTU is designed for the dual-issue superscalar CPU and applies a new method, control sharing, in which the BTU continuously tracks the pipeline status of the CPU and the Java program is processed jointly by the BTU and the CPU. To implement this method, we developed several acceleration techniques: fast branch requests, enhanced CPU instructions, hardware for detecting Java runtime exceptions, and fewer overhead cycles for handover between the BTU and the CPU. In particular, the BTU can detect Java runtime exceptions in parallel with other processing, such as an array access; with previous methods, CPU efficiency drops for Java-specific processing such as array index bounds checking. The sample chip was fabricated in Renesas 130-nm, five-layer Cu, dual-Vth low-power CMOS technology. The chip runs at 216 MHz and 1.2 V, and the BTU has 75k gates. A benchmark on an evaluation board showed 6.55 embedded caffeine marks (ECM)/MHz on the CLDC 1.0.4 configuration, a tenfold speedup over execution without the BTU at roughly the same power consumption. In other words, a 90 percent power saving was achieved at the same performance.
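The step from a tenfold speedup to a 90 percent power saving can be written out as an energy comparison for a fixed workload. This is a sketch that takes "roughly the same power" at face value; the symbols P (power), T (run time without the BTU), and E (energy) are introduced here for illustration.

```latex
\[
E_{\text{no BTU}} = P\,T, \qquad
E_{\text{BTU}} \approx P \cdot \frac{T}{10}
\]
\[
1 - \frac{E_{\text{BTU}}}{E_{\text{no BTU}}} = 1 - \frac{1}{10} = 0.9 = 90\%
\]
```

Reading the claim instead as "same performance at lower power" additionally assumes that power scales roughly linearly with clock frequency, so that running at one tenth of the clock costs about one tenth of the power.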