Keyword Search Result

[Keyword] HPC (10 hits)

1-10 hits
  • Evaluation of Performance and Power Consumption on Supercomputer Fugaku Using SPEC HPC Benchmarks

    Yuetsu KODAMA  Masaaki KONDO  Mitsuhisa SATO  

     
    PAPER

    Publicized: 2022/12/12
    Vol: E106-C No: 6
    Page(s): 303-311

    The supercomputer “Fugaku”, which ranked number one in multiple supercomputing lists, including the Top500 in June 2020, has various power control features, such as (1) an eco mode that utilizes only one of two floating-point pipelines while decreasing the power supply to the chip; (2) a boost mode that increases clock frequency; and (3) a core retention feature that puts unused cores into a low-power state. By orchestrating these power-performance features while considering the characteristics of running applications, we can potentially gain even better system-level energy efficiency. In this paper, we report on the performance and power consumption of Fugaku using SPEC HPC benchmarks. As a result, we confirmed that combining boost mode and eco mode can reduce energy by about 17% while improving performance by about 2% relative to the normal mode.

  • Hybrid Electrical/Optical Switch Architectures for Training Distributed Deep Learning in Large-Scale

    Thao-Nguyen TRUONG  Ryousei TAKANO  

     
    PAPER-Information Network

    Publicized: 2021/04/23
    Vol: E104-D No: 8
    Page(s): 1332-1339

    Data parallelism is the dominant method used to train deep learning (DL) models on High-Performance Computing systems such as large-scale GPU clusters. When training a DL model on a large number of nodes, inter-node communication becomes a bottleneck because of its relatively higher latency and lower link bandwidth (compared with intra-node communication). Although some communication techniques have been proposed to cope with this problem, all of these approaches aim to deal with the large message size issue while diminishing the effect of the limitation of the inter-node network. In this study, we investigate the benefit of increasing inter-node link bandwidth by using hybrid switching systems, i.e., Electrical Packet Switching and Optical Circuit Switching. We found that the typical data transfer of synchronous data-parallel training is long-lived and rarely changes, so it can be sped up with optical switching. Simulation results on the SimGrid simulator show that our approach shortens the training time of deep learning applications, especially at large scale.

  • Efficient and Precise Profiling, Modeling and Management on Power and Performance for Power Constrained HPC Systems

    Yuan HE  Yasutaka WADA  Wenchao LUO  Ryuichi SAKAMOTO  Guanqin PAN  Thang CAO  Masaaki KONDO  

     
    PAPER

    Publicized: 2020/12/01
    Vol: E104-C No: 6
    Page(s): 237-246

    Due to the slowdown of Moore's Law, power limitation has been one of the most critical issues for current and future HPC systems. To utilize HPC systems more efficiently when power budgets or deadlines are given, it is highly desirable to accurately estimate the performance or power consumption of applications before conducting their tuned production runs on any specific system. To ease such estimations, we showcase a straightforward yet effective method, based on the enhanced power management framework and DSL we developed, to help HPC users clarify the performance and power relationships of their applications. This method demonstrates an easy process for profiling, modeling, and managing both the performance and power of HPC systems and applications. In our evaluations, only a few (up to 3) profiled runs are necessary before very precise models of HPC applications can be obtained through this method (and algorithm), which has dramatically improved the efficiency of, and lowered the difficulty in, utilizing HPC systems under limited power budgets.

  • Application Mapping and Scheduling of Uncertain Communication Patterns onto Non-Random and Random Network Topologies

    Yao HU  Michihiro KOIBUCHI  

     
    PAPER-Computer System

    Publicized: 2020/07/20
    Vol: E103-D No: 12
    Page(s): 2480-2493

    Due to recent technological progress based on big-data processing, many applications present irregular or unpredictable communication patterns among compute nodes in high-performance computing (HPC) systems. Traditional communication infrastructures, e.g., torus or fat-tree interconnection networks, may not handle the matchmaking problems with these newly emerging applications well. There are already many communication-efficient application mapping algorithms for these typical non-random network topologies, which use nearby compute nodes to reduce the network distances. However, for the above unpredictable communication patterns, it is difficult to efficiently map their applications onto the non-random network topologies. In this context, we recommend using random network topologies as the communication infrastructures, which have drawn increasing attention for use as HPC interconnects due to their small diameter and average shortest path length (ASPL). We conduct a comparative study to analyze the impact of application mapping performance on non-random and random network topologies. We propose using topology embedding metrics, i.e., diameter and ASPL, and list several diameter/ASPL-based application mapping algorithms to compare their job scheduling performance, assuming that the communication pattern of each application is unpredictable to the computing system. Evaluation with a large compound application workload shows that, when compared to non-random topologies, random topologies can reduce the average turnaround time by up to 39.3% with a random connected mapping method and by up to 72.1% with a diameter/ASPL-based mapping algorithm. Moreover, when compared to the baseline topology mapping method, the proposed diameter/ASPL-based topology mapping strategy can reduce makespan by up to 48.0% and average turnaround time by up to 78.1%, and improve system utilization by up to 1.9x over random topologies.

  • A Multidimensional Configurable Processor Array — Vocalise

    Jiang LI  Yusuke ATSUMARI  Hiromasa KUBO  Yuichi OGISHIMA  Satoru YOKOTA  Hakaru TAMUKOH  Masatoshi SEKINE  

     
    PAPER-Computer System

    Publicized: 2014/10/27
    Vol: E98-D No: 2
    Page(s): 313-324

    A processing system with multiple field programmable gate array (FPGA) cards is described. Each FPGA card can interconnect using six I/O (up, down, left, right, front, and back) terminals. The communication network among FPGAs is scalable according to user design. When the system runs multi-dimensional applications, transmission efficiency among FPGAs is improved through user-adjusted dimensionality and network topologies for different applications. We provide a fast and flexible circuit configuration method for the FPGAs of a multi-dimensional FPGA array. To demonstrate the effectiveness of the proposed method, we assess the performance and power consumption of a circuit that calculates 3D Poisson equations using the finite difference method.

  • EDISON Science Gateway: A Cyber-Environment for Domain-Neutral Scientific Computing

    Hoon RYU  Jung-Lok YU  Duseok JIN  Jun-Hyung LEE  Dukyun NAM  Jongsuk LEE  Kumwon CHO  Hee-Jung BYUN  Okhwan BYEON  

     
    PAPER-Scientific Application

    Vol: E97-D No: 8
    Page(s): 1953-1964

    We discuss a new high performance computing service (HPCS) platform that has been developed to provide domain-neutral computing service with governmental support from the “EDucation-research Integration through Simulation On the Net” (EDISON) project. With a first focus on technical features, we not only present in-depth explanations of the implementation details, but also describe the strengths of the EDISON platform against the successful nanoHUB.org gateway. To validate the performance and utility of the platform, we provide benchmarking results for the resource virtualization framework, and demonstrate the stability and promptness of the EDISON platform in processing simulation requests by analyzing several statistical datasets obtained from a three-month trial service in the initiative area of computational nanoelectronics. We firmly believe that this work provides a good opportunity for understanding the first science gateway project undertaken in the Republic of Korea, and that the technical details presented here can serve as a useful guideline for any potential designs of HPCS platforms.

  • Cooperative VM Migration: A Symbiotic Virtualization Mechanism by Leveraging the Guest OS Knowledge

    Ryousei TAKANO  Hidemoto NAKADA  Takahiro HIROFUCHI  Yoshio TANAKA  Tomohiro KUDOH  

     
    PAPER

    Vol: E96-D No: 12
    Page(s): 2675-2683

    Virtual machine (VM) migration is useful for improving flexibility and maintainability in cloud computing environments. However, VM monitor (VMM)-bypass I/O technologies, including PCI passthrough and SR-IOV, which can significantly reduce the overhead of I/O virtualization, make VM migration impossible. This paper proposes a novel and practical mechanism, called Symbiotic Virtualization (SymVirt), for enabling migration and checkpoint/restart on a virtualized cluster with VMM-bypass I/O devices, without virtualization overhead during normal operations. SymVirt allows a VMM to cooperate with a message passing layer on the guest OS, and it then realizes VM-level migration and checkpoint/restart by combining user-level dynamic device configuration with coordination of distributed VMMs. We have implemented the proposed mechanism on top of QEMU/KVM and the Open MPI system. All PCI devices, including InfiniBand, Ethernet, and Myrinet, are supported without implementing specific para-virtualized drivers, and it is not necessary to modify either the MPI runtime or applications. Using the proposed mechanism, we demonstrate reactive and proactive fault tolerance (FT) mechanisms on a virtualized InfiniBand cluster. We have confirmed its effectiveness using both a memory-intensive micro benchmark and the NAS parallel benchmark.

  • SSM-HPC: Front View Gait Recognition Using Spherical Space Model with Human Point Clouds

    Jegoon RYU  Sei-ichiro KAMATA  Alireza AHRARY  

     
    PAPER-Image Recognition, Computer Vision

    Vol: E95-D No: 7
    Page(s): 1969-1978

    In this paper, we propose a novel gait recognition framework, Spherical Space Model with Human Point Clouds (SSM-HPC), to recognize the front view of human gait. A new gait representation, Marching in Place (MIP) gait, is also introduced, which preserves the spatiotemporal characteristics of an individual's gait manner. In contrast to previous studies on gait recognition, which usually use human silhouette images from image sequences, this research applies three-dimensional (3D) point cloud data of the human body obtained from a stereo camera. The proposed framework exhibits gait recognition rates superior to those of other gait recognition methods.

  • Study of a Microwave Simulation Dedicated Computer, FDTD/FIT Data Flow Machine

    Shun-suke MATSUOKA  Katsunori OHMI  Hideki KAWAGUCHI  

     
    PAPER

    Vol: E86-C No: 11
    Page(s): 2199-2206

    For High Performance Computing (HPC) of electromagnetic microwave simulations, the authors present a concept for a dedicated microwave simulation computer, the FDTD/FIT data flow machine. By constructing a dedicated computer customized to the data flow of the FDTD or FIT scheme, we can obtain maximum performance from FDTD/FIT simulations and achieve TFLOPS-class computing performance with a much smaller computer system than conventional supercomputers. In addition to the basic idea, this paper identifies solutions to some other factors that are needed to execute practical simulations (e.g., boundary condition setting, power input, simulation result data upload to a PC, etc.). Moreover, the VHDL design and logical simulation of the 2D data flow machine are also shown as a first step in the development of the FDTD/FIT data flow machine.

  • The Development of the Earth Simulator

    Shinichi HABATA  Mitsuo YOKOKAWA  Shigemune KITAWAKI  

     
    INVITED PAPER

    Vol: E86-D No: 10
    Page(s): 1947-1954

    The Earth Simulator (ES), developed under the Japanese government's initiative "Earth Simulator project," is a highly parallel vector supercomputer system. In May 2002, the ES was proven to be the most powerful computer in the world by achieving 35.86 teraflops on the LINPACK benchmark and 26.58 teraflops for a global atmospheric circulation model with the spectral method. Three architectural features enabled these achievements: vector processors, shared memory, and a high-bandwidth non-blocking crossbar interconnection network. In this paper, an overview of the ES, the three architectural features, and the results of the performance evaluation are described, with particular attention to the hardware realization of the interconnection among 640 processor nodes.