Author Search Result

[Author] Mitsuhisa SATO (6 hits)

  • Evaluation of Performance and Power Consumption on Supercomputer Fugaku Using SPEC HPC Benchmarks

    Yuetsu KODAMA  Masaaki KONDO  Mitsuhisa SATO  

     
    PAPER

      Publicized:
    2022/12/12
      Vol:
    E106-C No:6
      Page(s):
    303-311

    The supercomputer “Fugaku”, which ranked number one in multiple supercomputing lists, including the Top500 in June 2020, has various power control features, such as (1) an eco mode that uses only one of the two floating-point pipelines while decreasing the power supply to the chip; (2) a boost mode that increases the clock frequency; and (3) a core retention feature that puts unused cores into a low-power state. By orchestrating these power-performance features while considering the characteristics of the running applications, we can potentially achieve even better system-level energy efficiency. In this paper, we report on the performance and power consumption of Fugaku using SPEC HPC benchmarks. We confirmed that combining boost mode and eco mode can reduce energy by about 17% while improving performance by about 2% relative to normal mode.
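
    As a rough reading of the two figures above (an illustration, not a result from the paper): if "performance" is taken to mean the inverse of execution time, which is an assumption, then energy = average power × time implies that average power falls by roughly 15%. The function name is hypothetical.

```python
def relative_average_power(energy_reduction, performance_gain):
    """Average power relative to normal mode (1.0 = unchanged).

    Assumes performance is the inverse of execution time, so a
    2% performance gain means the run takes 1/1.02 of the time.
    """
    relative_energy = 1.0 - energy_reduction
    relative_time = 1.0 / (1.0 + performance_gain)
    # energy = power * time, so power = energy / time
    return relative_energy / relative_time


ratio = relative_average_power(0.17, 0.02)
print(f"{ratio:.3f}")  # prints 0.847
```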

  • Cylindrical Active Phased Array Antenna

    Mitsuhisa SATO  Masayuki SUGANO  Kazuo IKEBA  Koichi FUKUTANI  Atushi TERADA  Tsugio YAMAZAKI  

     
    PAPER-Radar System

      Vol:
    E76-B No:10
      Page(s):
    1243-1248

    A cylindrical active phased array antenna was developed. A primary surveillance radar (PSR) antenna and a secondary surveillance radar (SSR) antenna are integrated conformally. The PSR antenna employs two-dimensional electronic beam scanning. The SSR antenna employs electronic beam scanning in azimuth. The advantages of this antenna, the design architecture employed, and the measured characteristics are described.

  • Processor Pipeline Design for Fast Network Message Handling in RWC-1 Multiprocessor

    Hiroshi MATSUOKA  Kazuaki OKAMOTO  Hideo HIRONO  Mitsuhisa SATO  Takashi YOKOTA  Shuichi SAKAI  

     
    PAPER

      Vol:
    E81-C No:9
      Page(s):
    1391-1397

    In this paper, we describe the pipeline design and enhanced hardware for fast message handling in the RICA-1 processor, a processing element (PE) of the RWC-1 multiprocessor. The RWC-1 is based on the reduced inter-processor communication architecture (RICA), in which communication is combined with computation in the processor pipeline. The pipeline is enhanced with hardware mechanisms to support fine-grain parallel execution. The data paths of the RICA-1 superscalar processor are shared by communication and instruction execution to minimize implementation cost. A 128-PE system was built in January 1998, and it is currently used for hardware debugging, software development, and performance evaluation.

  • Dynamic Multiple Work Stealing Strategy for Flexible Load Balancing

    ADNAN  Mitsuhisa SATO  

     
    PAPER-Fundamentals of Information Systems

      Vol:
    E95-D No:6
      Page(s):
    1565-1576

    Lazy-task creation is an efficient method of overcoming the overhead of the grain-size problem in parallel computing. Work stealing is an effective load balancing strategy for parallel computing. In this paper, we present dynamic work stealing strategies in a lazy-task creation technique for efficient fine-grain task scheduling. The basic idea is to control load balancing granularity depending on the number of task parents in a stack. The dynamic-length strategy of work stealing uses run-time information on the load of the victim to determine the number of tasks that a thief is allowed to steal. We compare it with the bottommost-first work stealing strategy used in StackThread/MP and with the fixed-length strategy of work stealing, where a thief requests to steal a fixed number of tasks, as well as with other multithreaded frameworks such as Cilk and OpenMP task implementations. The experiments show that the dynamic-length strategy of work stealing performs well on irregular workloads such as the UTS benchmarks, as well as on regular workloads such as Fibonacci, Strassen's matrix multiplication, FFT, and Sparse-LU factorization. The dynamic-length strategy works better than the fixed-length strategy because it is more flexible and can avoid load imbalance due to overstealing.
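
    The contrast between the fixed-length and dynamic-length strategies can be sketched as follows. This is a minimal single-threaded illustration, not the paper's implementation: the abstract does not state how the steal count is derived from the victim's load, so taking half of the victim's queued tasks is an assumption, and the function names are hypothetical.

```python
from collections import deque


def steal_fixed(victim, count):
    """Fixed-length: take up to `count` of the victim's oldest tasks."""
    n = min(count, len(victim))
    return [victim.popleft() for _ in range(n)]


def steal_dynamic(victim):
    """Dynamic-length: size the steal from the victim's current load.

    Assumption for illustration: steal half of the queued tasks.
    """
    n = len(victim) // 2
    return [victim.popleft() for _ in range(n)]


# A lightly loaded victim: a fixed request of 4 empties its queue
# (overstealing), while the dynamic request leaves it with work.
victim = deque(["t0", "t1", "t2"])
print(steal_fixed(victim, 4), len(victim))

victim = deque(["t0", "t1", "t2"])
print(steal_dynamic(victim), len(victim))
```

    In this sketch the oldest tasks sit at the left end of the deque, so the thief takes from the opposite end to the one the owner would normally work on.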

  • Development and Implementation of an Interactive Parallelization Assistance Tool for OpenMP: iPat/OMP

    Makoto ISHIHARA  Hiroki HONDA  Mitsuhisa SATO  

     
    PAPER-Parallel/Distributed Programming Models, Paradigms and Tools

      Vol:
    E89-D No:2
      Page(s):
    399-407

    iPat/OMP is an interactive parallelization assistance tool for OpenMP. In the present paper, we describe the design concept of iPat/OMP, the parallelization sequence achieved by the tool and its current implementation status. In addition, we present an evaluation of the performance of the implemented functionalities. The experimental results show that iPat/OMP can detect parallelism and create an appropriate OpenMP directive for several for-loops.

  • Message-Based Efficient Remote Memory Access on a Highly Parallel Computer EM-X

    Yuetsu KODAMA  Hirohumi SAKANE  Mitsuhisa SATO  Hayato YAMANA  Shuichi SAKAI  Yoshinori YAMAGUCHI  

     
    PAPER-Architectures

      Vol:
    E79-D No:8
      Page(s):
    1065-1071

    Communication latency is central to multiprocessor design. This study presents the design principles of the EM-X distributed-memory multiprocessor for tolerating communication latency. The EM-X overlaps computation with communication by multithreading. In particular, we present two types of hardware support for remote memory access: (1) priority-based packet scheduling for thread invocation, and (2) direct remote memory access. The priority-based scheduling policy extends a FIFO-ordered thread invocation policy to adapt to different computational needs. The direct remote memory access is designed to overlap remote memory operations with thread execution. An 80-processor prototype of the EM-X has been developed and has been operational since December 1995. We execute several programs on the machine and evaluate how effectively the EM-X overlaps computation with communication to tolerate communication latency in high-performance parallel computing.
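
    The idea of extending FIFO-ordered invocation with priorities can be modeled in software as below. This is only a sketch under stated assumptions: the EM-X mechanism is implemented in hardware, the abstract does not give the number of priority levels (two are assumed here), and the class and method names are hypothetical.

```python
import heapq
from itertools import count


class PacketQueue:
    """Priority-based packet queue that stays FIFO within each level."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # arrival order, used as a tie-breaker

    def push(self, packet, high_priority=False):
        level = 0 if high_priority else 1  # lower value is served first
        heapq.heappush(self._heap, (level, next(self._seq), packet))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


queue = PacketQueue()
queue.push("a")
queue.push("b")
queue.push("c", high_priority=True)
queue.push("d")
order = [queue.pop() for _ in range(len(queue))]
print(order)  # prints ['c', 'a', 'b', 'd']
```

    With every packet at the same level the queue degenerates to plain FIFO, which is the sense in which the priority policy extends the FIFO one.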