1-2hit |
Tetsuya YAMADA Naohiko IRIE Takanobu TSUNODA Takahiro IRITA Kenji KITAGAWA Ryohei YOSHIDA Keisuke TOYAMA Motoaki SATOYAMA
We have developed a hardware accelerator for Java platforms, integrated on a SuperH microprocessor core, using a 130-nm CMOS process. The Java accelerator, a bytecode translation unit (BTU), is tightly coupled with the CPU to share resources. The BTU supports 159 basic bytecodes and 5 or 6 optional bytecodes. It supports both connected device configuration (CDC) 1.0 and connected limited device configuration (CLDC) 1.0.4 technologies. The BTU corresponds to the dual-issued superscalar CPU and applies a new method, control-sharing. With this method, the BTU always grasps the pipeline status of the CPU, and the Java program is processed by both the BTU and the CPU. To implement this method, we developed some acceleration techniques: fast branch requests, enhanced CPU instructions, Java runtime exception detection hardware, and fewer overhead cycles of handover between the BTU and the CPU. In particular, the BTU can detect Java runtime exceptions in parallel with other processing, such as an array access. With previous methods, there is a disadvantage in that CPU efficiency decreases for Java-specific processing, such as array index bounds checking. The sample chip was fabricated in Renesas 130-nm, five-layer Cu, dual-vth low-power CMOS technology. The chip runs at 216 MHz and 1.2 V. The BTU has 75 kG. The benchmark on an evaluation board showed 6.55 embedded caffeine marks (ECM)/MHz on the CLDC 1.0.4 configuration, a tenfold speed increase without the BTU for roughly the same power consumption. In other words, power savings of 90 percent with the same performance were achieved.
Tetsuya YAMADA Makoto ISHIKAWA Yuji OGATA Takanobu TSUNODA Takahiro IRITA Saneaki TAMAKI Kunihiko NISHIYAMA Tatsuya KAMEI Ken TATEZAWA Fumio ARAKAWA Takuichiro NAKAZAWA Toshihiro HATTORI Kunio UCHIYAMA
A 32-bit embedded RISC microprocessor core integrating a DSP has been developed using a 0.18-µm five-layer-metal CMOS technology. The integrated DSP has a single-MAC and exploits CPU resources to reduce hardware. The DSP occupies only 0.5 mm2. The processor core includes a large on-chip 128 kB SRAM called U-memory. A large capacity on-chip memory decreases the amount of traffic with an external memory. And it is effective for low-power and high-performance operation. To realize low-power dissipation for the U-memory access, the active ratio of U-memory's access is reduced. The critical path is a load path from the U-memory, and we optimized the path through the whole chip. The chip achieves 0.79 mA/MHz executing Dhrystone 1.1 at 108 MHz, which is suitable for mobile applications.