Keyword Search Result

[Keyword] supercomputer(7hit)

1-7hit
  • Evaluation of Performance and Power Consumption on Supercomputer Fugaku Using SPEC HPC Benchmarks

    Yuetsu KODAMA  Masaaki KONDO  Mitsuhisa SATO  

     
    PAPER

      Publicized:
    2022/12/12
      Vol:
    E106-C No:6
      Page(s):
    303-311

    The supercomputer “Fugaku”, which ranked number one in multiple supercomputing lists, including the Top500 in June 2020, has various power control features, such as (1) an eco mode that uses only one of two floating-point pipelines while decreasing the power supply to the chip; (2) a boost mode that increases the clock frequency; and (3) a core retention feature that puts unused cores into a low-power state. By orchestrating these power-performance features while considering the characteristics of running applications, we can potentially gain even better system-level energy efficiency. In this paper, we report on the performance and power consumption of Fugaku using SPEC HPC benchmarks. As a result, we confirmed that combining boost mode and eco mode can reduce energy consumption by about 17% while improving performance by about 2% relative to normal mode.

  • What Factors Affect the Performance of Software after Migration: A Case Study on Sunway TaihuLight Supercomputer

    Jie TAN  Jianmin PANG  Cong LIU  

     
    LETTER

      Publicized:
    2021/10/21
      Vol:
    E105-D No:1
      Page(s):
    26-30

    Due to the rapid development of different processors, e.g., x86 and Sunway, software porting between platforms is becoming more frequent. However, the execution efficiency of migrated software on the target platform differs from that on the source platform, and most previous studies have investigated how to improve this efficiency from the hardware perspective. To the best of our knowledge, this is the first paper to focus exclusively on which software factors can cause a performance change after software migration. To perform our study, we used SonarQube to detect and measure five software factors, namely Duplicated Lines (DL), Code Smell Density (CSD), Big Functions (BF), Cyclomatic Complexity (CC), and Complex Functions (CF), in 13 selected projects of the SPEC CPU2006 benchmark suite. Then, we measured the change in software performance by calculating the acceleration ratio of execution time before (x86) and after (Sunway) software migration. Finally, we fitted a multiple linear regression model to analyze the relationship between the software performance change and the software factors. The results indicate that the performance change of software migrated from the x86 platform to the Sunway platform is mainly affected by three software factors, i.e., Code Smell Density (CSD), Cyclomatic Complexity (CC), and Complex Functions (CF). The findings can benefit both researchers and practitioners.
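The analysis steps in this abstract (an acceleration ratio per project, then a multiple linear regression on software factors) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, the ratio is assumed to be the x86 time divided by the Sunway time, and the factor values are synthetic rather than SonarQube measurements.

```python
import numpy as np

def acceleration_ratio(time_source, time_target):
    """Assumed definition: execution time before migration / time after."""
    return np.asarray(time_source, dtype=float) / np.asarray(time_target, dtype=float)

def fit_linear_model(factors, response):
    """Ordinary least squares with an intercept column.

    Returns [intercept, coefficient_1, ..., coefficient_k].
    """
    design = np.column_stack([np.ones(len(factors)), factors])
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef

# Synthetic stand-in data: 13 projects, 3 factors (hypothetical values).
rng = np.random.default_rng(0)
factors = rng.uniform(0.0, 1.0, size=(13, 3))
true_coef = np.array([1.5, -0.8, 0.3, -0.5])  # made-up ground truth
ratio = true_coef[0] + factors @ true_coef[1:]  # noise-free response

coef = fit_linear_model(factors, ratio)
```

Because the synthetic response is noise-free, the fit recovers the made-up coefficients; with real measurements one would also inspect significance and goodness of fit before drawing conclusions.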

  • Set-to-Set Disjoint Paths Routing in Torus-Connected Cycles

    Antoine BOSSARD  Keiichi KANEKO  

     
    LETTER-Dependable Computing

      Publicized:
    2016/08/10
      Vol:
    E99-D No:11
      Page(s):
    2821-2823

    Extending the very popular torus interconnection networks [1]-[3], Torus-Connected Cycles (TCC) have been proposed as a novel network topology for massively parallel systems [5]. Here, the set-to-set disjoint paths routing problem in a TCC is solved. In a TCC(k,n), it is proved that paths of lengths at most kn^2+2n can be selected in O(kn^2) time.

  • Proposal of a Desk-Side Supercomputer with Reconfigurable Data-Paths Using Rapid Single-Flux-Quantum Circuits

    Naofumi TAKAGI  Kazuaki MURAKAMI  Akira FUJIMAKI  Nobuyuki YOSHIKAWA  Koji INOUE  Hiroaki HONDA  

     
    INVITED PAPER

      Vol:
    E91-C No:3
      Page(s):
    350-355

    We propose a desk-side supercomputer with large-scale reconfigurable data-paths (LSRDPs) using superconducting rapid single-flux-quantum (RSFQ) circuits. It has several computing units, each consisting of a general-purpose microprocessor, an LSRDP, and a memory. An LSRDP consists of many (e.g., a few thousand) floating-point units (FPUs) and operand routing networks (ORNs) that connect the FPUs. We reconfigure the LSRDP to fit a computation, i.e., a group of floating-point operations that appears in a 'for' loop of a numerical program, by setting the routes in the ORNs before the loop executes. We propose to implement the LSRDPs with RSFQ circuits. The processors and the memories can be implemented with semiconductor technology. We expect that a 10 TFLOPS supercomputer, together with a refrigerating engine, will be housed in a desk-side rack using a near-future RSFQ process technology, such as a 0.35 µm process.

  • The Development of the Earth Simulator

    Shinichi HABATA  Mitsuo YOKOKAWA  Shigemune KITAWAKI  

     
    INVITED PAPER

      Vol:
    E86-D No:10
      Page(s):
    1947-1954

    The Earth Simulator (ES), developed under the Japanese government's "Earth Simulator Project" initiative, is a highly parallel vector supercomputer system. In May 2002, the ES was proven to be the most powerful computer in the world by achieving 35.86 teraflops on the LINPACK benchmark and 26.58 teraflops for a global atmospheric circulation model with the spectral method. Three architectural features enabled these achievements: vector processors, shared memory, and a high-bandwidth non-blocking crossbar interconnection network. This paper gives an overview of the ES, its three architectural features, and the results of the performance evaluation, with particular attention to the hardware realization of the interconnection among its 640 processor nodes.

  • Interprocessor Memory Access Arbitrating Scheme for TCMP Type Vector Supercomputer

    Tadayuki SAKAKIBARA  Katsuyoshi KITAI  Tadaaki ISOBE  Shigeko YAZAWA  Teruo TANAKA  Yoshiko TAMAKI  Yasuhiro INAGAMI  

     
    PAPER-Computer Architecture

      Vol:
    E80-D No:9
      Page(s):
    925-932

    We propose an instruction-based variable priority scheme (IBVPS) which achieves high sustained memory throughput on a TCMP type vector supercomputer. Generally, there are two approaches to arbitrating interprocessor memory access conflicts: request level priority control and fixed priority control. Each approach, however, affects performance in its own way: in the case of request level priority control, mutual obstruction causes performance degradation, and in the case of fixed priority control, memory bank monopoly causes performance degradation. Mutual obstruction refers to the interference among access requests coming from different instructions; memory bank monopoly refers to the uninterrupted accessing of the same memory bank by a series of higher priority instructions. The strategy of the instruction-based variable priority scheme consists of: (a) generally changing the priority assignment of all load/store pipelines at the end of any instruction running in the system, and (b) changing the priority assignment of all load/store pipelines more than once in the middle of an access instruction with a stride greater than 1 or an indirect access instruction, either of which may monopolize some memory banks for an extended period of time. This strategy reduces mutual obstruction because the priority assignment is reshuffled for the entire group of load/store pipelines at one time. It also reduces memory bank monopoly because the opportunity for memory access is made equal among different instructions by changing the priority assignment at the end of an instruction. Moreover, it prevents memory bank monopoly by a memory access instruction with a stride greater than 1 or an indirect access instruction, by changing the priority assignment more frequently. Consequently, high sustained memory throughput is achieved on TCMP type vector supercomputers.
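The contrast between fixed priority and a priority assignment that changes at instruction boundaries can be illustrated with a toy arbiter. This is a deliberately simplified sketch, not the IBVPS hardware described in the abstract: it assumes every pipeline requests the same bank on every cycle, that one request is granted per cycle, and that priority rotates by one position every `instr_len` cycles (a hypothetical stand-in for "the end of an instruction").

```python
def arbitrate(num_pipelines, cycles, instr_len=None):
    """Count grants per pipeline when all pipelines contend for one bank.

    instr_len=None  -> fixed priority (pipeline 0 always wins).
    instr_len=k     -> the highest-priority position rotates every k cycles.
    """
    grants = [0] * num_pipelines
    top = 0  # pipeline currently holding the highest priority
    for cycle in range(cycles):
        if instr_len is not None and cycle > 0 and cycle % instr_len == 0:
            top = (top + 1) % num_pipelines  # reshuffle at an instruction boundary
        grants[top] += 1  # the highest-priority requester gets the bank
    return grants

fixed = arbitrate(num_pipelines=4, cycles=16)
rotating = arbitrate(num_pipelines=4, cycles=16, instr_len=4)
```

Under these assumptions, fixed priority lets one pipeline monopolize the bank, while the rotating assignment spreads grants evenly across pipelines.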

  • Scalable Parallel Memory Architecture with a Skew Scheme

    Tadayuki SAKAKIBARA  Katsuyoshi KITAI  Tadaaki ISOBE  Shigeko YAZAWA  Teruo TANAKA  Yasuhiro INAGAMI  Yoshiko TAMAKI  

     
    PAPER-Computer Architecture

      Vol:
    E80-D No:9
      Page(s):
    933-941

    We present a scalable parallel memory architecture with a skew scheme in which permanent-concentration-free strides, if any, do not depend on the number of ways in parallel memory interleaving. Permanent-concentration is a kind of memory access conflict. With conventional skew schemes, permanent-concentration-free strides depend on the number of banks (or bank groups) in parallel memory (= the number of ways in parallel memory interleaving). We analyze two kinds of causes of conflict: permanent-concentration, which occurs when memory access requests concentrate in a limited number of banks (or bank groups) in parallel memory, and transient-concentration, which occurs when memory access requests transiently concentrate in some banks (or bank groups) in parallel memory. We have identified permanent-concentration-free strides that are independent of the number of banks (or bank groups) in parallel memory by addressing the two kinds of concentration separately. The strategy is to increase the size of the address block used for shifting the address assignment to the parallel memory in order to reduce permanent-concentrations, and to make the size of the buffer for each bank (or bank group) in the parallel memory match the size of that address block in order to absorb transient-concentrations. The skew scheme uses the same address block size for shifting the address assignment in memory systems with different numbers of banks (or bank groups) in parallel memory. As a result, scalability for permanent-concentration-free strides is achieved independent of the number of banks (or bank groups) in parallel memory.
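The general idea of a skew scheme can be sketched with a generic block-skewed bank mapping. This is a textbook-style illustration, not the specific scheme proposed in the paper: the mapping `(addr + addr // block_size) % num_banks` and all function names are assumptions chosen for the example.

```python
def bank_interleaved(addr, num_banks):
    """Plain interleaving: consecutive addresses go to consecutive banks."""
    return addr % num_banks

def bank_skewed(addr, num_banks, block_size):
    """Generic skew: shift the bank assignment by one for each address block."""
    return (addr + addr // block_size) % num_banks

def banks_touched(mapper, stride, count):
    """Set of banks hit by `count` accesses at a constant stride from address 0."""
    return {mapper(i * stride) for i in range(count)}

NUM_BANKS = 8
plain = banks_touched(lambda a: bank_interleaved(a, NUM_BANKS),
                      stride=NUM_BANKS, count=NUM_BANKS)
skewed = banks_touched(lambda a: bank_skewed(a, NUM_BANKS, block_size=NUM_BANKS),
                       stride=NUM_BANKS, count=NUM_BANKS)
```

With a stride equal to the number of banks, plain interleaving sends every access to a single bank, whereas the skewed mapping in this sketch spreads the same accesses over all banks.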