
Author Search Result

[Author] Naohiko IRIE (4 hits)

  • Embedded Processor Core with 64-Bit Architecture and Its System-On-Chip Integration for Digital Consumer Products

    Kunio UCHIYAMA  Fumio ARAKAWA  Yasuhiko SAITO  Koki NOGUCHI  Atsushi HASEGAWA  Shinichi YOSHIOKA  Naohiko IRIE  Takeshi KITAHARA  Mark DEBBAGE  Andy STURGES  

     
    PAPER

    Vol: E84-C  No: 2  Page(s): 139-149

    A 64-bit architecture for an embedded processor targeting next-generation digital consumer products has been developed. It has dual-mode instruction sets and is optimized for both high multimedia performance, provided by SIMD/floating-point vector instructions in a 32-bit-length ISA, and small code size, provided by a conventional 16-bit-length ISA. Large register files (64×64b and 64×32b), a split-branch mechanism, and a virtual cache are also adopted in the architecture. A 714 MIPS/9.6 GOPS/400 MHz processor core with the 64-bit architecture and a system LSI containing the core were developed using 0.15-µm technology. The LSI includes a 3.2 GB/sec high-bandwidth on-chip bus, a high-speed DRAM interface, an SRAM/Flash/ROM/multiplexed-bus interface, and a 66 MHz PCI interface, which together provide the performance required for next-generation multimedia applications.

  • Branch Micro-Architecture of an Embedded Processor with Split Branch Architecture for Digital Consumer Products

    Naohiko IRIE  Fumio ARAKAWA  Kunio UCHIYAMA  Shinichi YOSHIOKA  Atsushi HASEGAWA  Kevin IADONATE  Mark DEBBAGE  David SHEPHERD  Margaret GEARTY  

     
    PAPER-High-Performance Technologies

    Vol: E85-C  No: 2  Page(s): 315-322

    An embedded processor core using a split branch architecture has been developed. The core targets 400 MHz in 0.18 µm technology, and this higher frequency requires a deeper pipeline than that of a conventional processor. To counter the increased branch penalty caused by the deeper pipeline, the processor uses an active preload mechanism that preloads target instructions into internal buffers to hide instruction cache latency. The processor also uses multiple instruction buffers to reduce the penalty cycles of branch misprediction. Performance estimation shows that about 70% of the branch overhead cycles of a conventional implementation can be eliminated. The branch mechanism occupies only 1% of the total core area, which is smaller than a conventional branch target buffer (BTB) scheme, and helps to achieve low power and low cost.

  • Multi-Core/Multi-IP Technology for Embedded Applications (Open Access)

    Naohiko IRIE  Toshihiro HATTORI  

     
    INVITED PAPER

    Vol: E92-C  No: 10  Page(s): 1232-1239

    SoCs have driven the evolution of embedded systems and consumer electronics. Multi-core/multi-IP is the key technology for integrating many functions on an SoC for future embedded applications. This paper describes the transition of SoCs and their required functions, using cellular phones as an example, and presents state-of-the-art multi-core technology of both the homogeneous and heterogeneous types. When many cores and IPs are integrated on a chip, collaboration between cores and IPs becomes important for meeting requirements. To realize this collaboration, the "MPSoC Platform" concept and the elementary technologies for this platform are described.

  • A Hardware Accelerator for Java™ Platforms on a 130-nm Embedded Processor Core

    Tetsuya YAMADA  Naohiko IRIE  Takanobu TSUNODA  Takahiro IRITA  Kenji KITAGAWA  Ryohei YOSHIDA  Keisuke TOYAMA  Motoaki SATOYAMA  

     
    PAPER-Integrated Electronics

    Vol: E90-C  No: 2  Page(s): 523-530

    We have developed a hardware accelerator for Java platforms, integrated on a SuperH microprocessor core, using a 130-nm CMOS process. The Java accelerator, a bytecode translation unit (BTU), is tightly coupled with the CPU to share resources. The BTU supports 159 basic bytecodes and 5 or 6 optional bytecodes. It supports both connected device configuration (CDC) 1.0 and connected limited device configuration (CLDC) 1.0.4 technologies. The BTU works with the dual-issue superscalar CPU and applies a new method, control-sharing. With this method, the BTU always tracks the pipeline status of the CPU, and the Java program is processed by both the BTU and the CPU. To implement this method, we developed several acceleration techniques: fast branch requests, enhanced CPU instructions, Java runtime exception detection hardware, and fewer overhead cycles for handover between the BTU and the CPU. In particular, the BTU can detect Java runtime exceptions in parallel with other processing, such as an array access. Previous methods have the disadvantage that CPU efficiency decreases for Java-specific processing, such as array index bounds checking. The sample chip was fabricated in Renesas 130-nm, five-layer Cu, dual-Vth low-power CMOS technology. The chip runs at 216 MHz and 1.2 V. The BTU has 75 kG. The benchmark on an evaluation board showed 6.55 embedded caffeine marks (ECM)/MHz on the CLDC 1.0.4 configuration, a tenfold speed increase over operation without the BTU for roughly the same power consumption. In other words, power savings of 90 percent at the same performance were achieved.
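
The closing figures of the Java accelerator abstract can be checked with a little arithmetic. The sketch below is only an illustration: the three constants are quoted from the abstract, while the function names and the assumption that power scales roughly in proportion to the work rate (so equal performance at one tenth of the activity costs one tenth of the power) are not from the paper.

```python
# Figures quoted in the abstract above.
ECM_PER_MHZ = 6.55   # embedded caffeine marks per MHz (CLDC 1.0.4)
CLOCK_MHZ = 216      # chip clock frequency in MHz
SPEEDUP = 10.0       # "tenfold speed increase" with the BTU


def ecm_score(ecm_per_mhz: float, clock_mhz: float) -> float:
    """Absolute benchmark score implied by a per-MHz figure (hypothetical helper)."""
    return ecm_per_mhz * clock_mhz


def power_saving_at_equal_performance(speedup: float) -> float:
    """Fraction of power saved when delivering the baseline performance.

    Assumption (not stated in the abstract): power is roughly proportional
    to the rate of work, so matching the baseline needs only 1/speedup of
    the power.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


if __name__ == "__main__":
    print(f"Implied score at {CLOCK_MHZ} MHz: {ecm_score(ECM_PER_MHZ, CLOCK_MHZ):.1f} ECM")
    print(f"Power saving at equal performance: {power_saving_at_equal_performance(SPEEDUP):.0%}")
```

Under that proportionality assumption, a tenfold speedup at equal power corresponds to the 90 percent saving quoted in the abstract, and the per-MHz figure implies a score of roughly 1415 ECM at 216 MHz.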