Yuetsu KODAMA Masaaki KONDO Mitsuhisa SATO
The supercomputer “Fugaku”, which ranked number one in multiple supercomputing lists, including the Top500 in June 2020, has various power control features, such as (1) an eco mode that uses only one of the two floating-point pipelines while decreasing the power supply to the chip; (2) a boost mode that increases the clock frequency; and (3) a core retention feature that puts unused cores into a low-power state. By orchestrating these power-performance features while considering the characteristics of running applications, we can potentially achieve even better system-level energy efficiency. In this paper, we report on the performance and power consumption of Fugaku using SPEC HPC benchmarks. As a result, we confirmed that combining boost mode and eco mode can reduce energy by about 17% while improving performance by about 2% relative to the normal mode.
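The relationship between the two reported figures can be checked with a little arithmetic. The sketch below is illustrative only: it assumes that "performance" is the inverse of execution time and that both figures refer to the same workload, neither of which the abstract states explicitly. The function name `relative_average_power` is hypothetical.

```python
def relative_average_power(energy_ratio: float, speedup: float) -> float:
    """Average power relative to the baseline.

    energy = average_power * time, and time scales as 1 / speedup
    (assumption: performance is the inverse of execution time), so
    power_ratio = energy_ratio * speedup.
    """
    if energy_ratio <= 0 or speedup <= 0:
        raise ValueError("ratios must be positive")
    return energy_ratio * speedup


# Approximate figures from the abstract: about 17% less energy,
# about 2% higher performance than the normal mode.
power_ratio = relative_average_power(energy_ratio=0.83, speedup=1.02)
print(f"average power relative to normal mode: {power_ratio:.3f}")
```

Under these assumptions, the combined boost and eco modes would draw roughly 85% of the normal-mode average power while finishing slightly sooner.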
Yuetsu KODAMA Toshihiro KATASHITA Kenji SAYANO
REX is a reconfigurable experimental system for evaluating and developing parallel computer systems. It consists of large-scale FPGAs and allows everything from the processors to the network topology to be reconfigured in support of evaluation and development. We evaluated REX using several implementations of parallel computer systems and showed that it scales sufficiently in gate count, memory throughput, and network throughput. We also showed that its emulation speed and reconfigurability make REX an effective tool for system development.
Ryousei TAKANO Tomohiro KUDOH Yuetsu KODAMA Fumihiro OKAZAKI
Packet pacing is a well-known technique for reducing the short-time-scale burstiness of traffic, and software-based packet pacing has been categorized into two approaches: the timer interrupt-based approach and the gap packet-based approach. The former was originally hard to implement for Gigabit-class networks because it requires the operating system to handle overly frequent periodic timer interrupts, thus incurring a large overhead. On the other hand, a gap packet-based packet pacing mechanism achieves precise pacing without depending on the timer resolution. However, in order to guarantee the accuracy of rate control, the system must be able to transmit packets at the wire rate. In this paper, we propose a high-resolution timer-based packet pacing mechanism that determines the transmission timing of packets by using a sub-microsecond resolution timer. The high-resolution timer is a lightweight mechanism compared to the traditional low-resolution periodic timer. With recent progress in hardware protocol offload technologies and multicore-aware network protocol stacks, we believe high-resolution timer-based packet pacing has become practical. Our experimental results show that the proposed mechanism can work on a wider range of systems without degrading the accuracy of rate control. However, a higher CPU load is observed when the number of traffic classes increases, compared to a gap packet-based pacing mechanism.
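The two pacing approaches can be compared with a small calculation. The following is a minimal sketch, not the authors' implementation: it ignores Ethernet framing overhead (preamble, inter-frame gap), and the function names `packet_interval_ns` and `gap_bytes` are hypothetical. A timer-based pacer needs the interval between packet starts; a gap packet-based pacer needs the amount of filler to send at wire rate between real packets.

```python
def packet_interval_ns(packet_bytes: int, target_bps: float) -> float:
    """Time between packet starts (ns) so the average rate equals target_bps."""
    if packet_bytes <= 0 or target_bps <= 0:
        raise ValueError("packet size and rate must be positive")
    return packet_bytes * 8 / target_bps * 1e9


def gap_bytes(packet_bytes: int, target_bps: float, wire_bps: float) -> float:
    """Filler (bytes at wire rate) to follow each packet in gap-based pacing.

    Simplification: framing overhead is ignored, so a real pacer would
    need to account for it separately.
    """
    if not 0 < target_bps <= wire_bps:
        raise ValueError("target rate must be in (0, wire rate]")
    return packet_bytes * (wire_bps / target_bps - 1)


# Example: pace 1500-byte packets to 500 Mbit/s on a 1 Gbit/s link.
interval = packet_interval_ns(1500, 500e6)
gap = gap_bytes(1500, 500e6, 1e9)
print(f"timer interval: {interval:.0f} ns, gap: {gap:.0f} bytes")
```

At these rates the interval between packets is on the order of tens of microseconds, which illustrates why a coarse periodic timer is too imprecise and a sub-microsecond timer is attractive.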
Yuetsu KODAMA Hirohumi SAKANE Mitsuhisa SATO Hayato YAMANA Shuichi SAKAI Yoshinori YAMAGUCHI
Communication latency is central to multiprocessor design. This study presents the design principles of the EM-X distributed-memory multiprocessor for tolerating communication latency. The EM-X tolerates latency by using multithreading to overlap computation with communication. In particular, we present two types of hardware support for remote memory access: (1) priority-based packet scheduling for thread invocation, and (2) direct remote memory access. The priority-based scheduling policy extends a FIFO-ordered thread invocation policy to adapt to different computational needs. The direct remote memory access is designed to overlap remote memory operations with thread execution. An 80-processor prototype of the EM-X was developed and has been operational since December 1995. We execute several programs on the machine and evaluate how effectively the EM-X overlaps computation with communication to tolerate communication latency in high-performance parallel computing.
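The idea of extending FIFO-ordered thread invocation with priorities can be sketched in software. This is an illustration only: the abstract does not give the number of priority levels or the hardware queue organization, so the two-level default and the class name `PriorityFifoScheduler` are assumptions.

```python
from collections import deque


class PriorityFifoScheduler:
    """One FIFO queue per priority level; level 0 is served first."""

    def __init__(self, levels: int = 2):
        # Assumption: two levels by default; the real EM-X design may differ.
        self._queues = [deque() for _ in range(levels)]

    def enqueue(self, packet, priority: int = 0) -> None:
        """Append a packet to the FIFO for its priority level."""
        self._queues[priority].append(packet)

    def dequeue(self):
        """Return the oldest packet of the highest non-empty priority, or None."""
        for queue in self._queues:
            if queue:
                return queue.popleft()
        return None


sched = PriorityFifoScheduler()
sched.enqueue("thread-a", priority=1)
sched.enqueue("reply-b", priority=0)
sched.enqueue("thread-c", priority=1)
order = [sched.dequeue() for _ in range(3)]
print(order)
```

Within a level the order stays FIFO, so with a single level the scheduler reduces to the plain FIFO policy that the priority scheme extends.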
Yoshinori YAMAGUCHI Shuichi SAKAI Yuetsu KODAMA
This paper presents the synchronization mechanisms of the highly parallel dataflow machine EM-4, together with some measurement results. First, various synchronization mechanisms of parallel computers are surveyed and compared, including dataflow synchronization. Then, the fundamental synchronization mechanisms of the EM-4 are shown, along with the reasons for adopting them. There are three types of synchronization: (1) strongly connected instruction sequencing, (2) instruction-level direct matching, and (3) function-level synchronization. These mechanisms are preliminarily evaluated on the EM-4 prototype, and the results are reported and analyzed. Next, synchronization mechanisms for resource management are described.