Author Search Result

[Author] Takashi MIYAMORI(3hit)

  • Architecture and Evaluation of Low Power Many-Core SoC with Two 32-Core Clusters

    Takashi MIYAMORI  Hui XU  Hiroyuki USUI  Soichiro HOSODA  Toru SANO  Kazumasa YAMAMOTO  Takeshi KODAKA  Nobuhiro NONOGAKI  Nau OZAKI  Jun TANABE  

     
    PAPER

    Vol: E97-C  No: 4  Page(s): 360-368

    New media processing applications such as image recognition and AR (Augmented Reality) have become practical on embedded systems for automotive, digital-consumer, and mobile products. Many-core processors have been proposed to realize much higher performance than multi-core processors. We have developed a low-power many-core SoC for multimedia applications in 40nm CMOS technology. Within a 210mm² die, two 32-core clusters are integrated with dynamically reconfigurable processors, hardware accelerators, 2-channel DDR3 I/Fs, and other peripherals. Processor cores in the cluster share a 2MB L2 cache connected through a tree-based Network-on-Chip (NoC). The total peak performance exceeds 1.5TOPS (Tera Operations Per Second). High scalability and low power consumption are achieved by parallelized software for multimedia applications. For face detection, the performance scales up to 64 cores and the SoC consumes only 2.21W. Moreover, it can execute 1080p 48fps H.264 decoding at about 520mW using 28 cores, and 4K2K 15fps super resolution at about 770mW using 32 cores in one cluster. By exploiting parallelism with low-power processor cores, the many-core SoC provides several tens of times better energy efficiency than a high-performance desktop quad-core processor.

  • REMARC: Reconfigurable Multimedia Array Coprocessor

    Takashi MIYAMORI  Kunle OLUKOTUN  

     
    PAPER-Computer Hardware and Design

    Vol: E82-D  No: 2  Page(s): 389-397

    This paper describes a new reconfigurable processor architecture called REMARC (Reconfigurable Multimedia Array Coprocessor). REMARC is a small array processor that is tightly coupled to a main RISC processor. It consists of a global control unit and 64 16-bit processors called nano processors. REMARC is designed to accelerate multimedia applications, such as video compression, decompression, and image processing. These applications typically use 8-bit or 16-bit data; therefore, each nano processor has a 16-bit datapath that is much wider than those of other reconfigurable coprocessors. We have developed a programming environment for REMARC and several realistic application programs: DES encryption, MPEG-2 decoding, and MPEG-2 encoding. REMARC can implement various parallel algorithms that appear in these multimedia applications. For instance, REMARC can implement SIMD-type instructions, similar to multimedia instruction extensions, for motion compensation in MPEG-2 decoding. Furthermore, highly pipelined algorithms, such as the systolic algorithms that appear in motion estimation for MPEG-2 encoding, can also be implemented efficiently. REMARC achieves speedups ranging from a factor of 2.3 to 21.2 over the base processor, which is either a single-issue processor or a 2-issue superscalar processor. We also compare its performance with multimedia instruction extensions. Using more processing resources, REMARC can achieve higher performance than multimedia instruction extensions.

  • A 4GOPS 3 Way-VLIW Image Recognition Processor Based on a Configurable Media Processor

    Hiroyuki TAKANO  Takashi MIYAMORI  Yasuhiro TANIGUCHI  Yoshihisa KONDO  

     
    PAPER-Product Designs

    Vol: E85-C  No: 2  Page(s): 347-351

    A 4GOPS 3-way VLIW image recognition processor for an automobile system has been developed. The processor is based on a configurable and extensible media processor that enables optimization for a specific application by means of design-time configuration. Using a VLIW coprocessor extension, the processor can satisfy the performance requirements of the system. The overhead of VLIW-mode instructions is only 7%. The VLIW coprocessor occupies only 12% of the die area. Thus, this configurable media processor can achieve good cost-performance for media processing in each embedded system.