Author Search Result

[Author] Yukinori SATO(6hit)

1-6hit
  • Identifying Program Loop Nesting Structures during Execution of Machine Code

    Yukinori SATO  Yasushi INOGUCHI  Tadao NAKAMURA  

     
    PAPER-Computer System

      Vol:
    E97-D No:9
      Page(s):
    2371-2385

    This paper presents a mechanism for detecting dynamic loop and procedure nesting on the fly during actual program execution. The mechanism aims primarily at enabling better strategies for performance tuning or parallelization. Taking pre-compiled application machine code as input, our mechanism statically generates simple but precise markers that indicate loop entries and loop exits, and dynamically monitors the loop nesting that appears during actual execution together with the call context tree. To keep loop structures precise at all times, we monitor the indirect jumps that enter loop regions and the setjmp/longjmp functions that cause irregular function call transfers. We also present a novel representation called the Loop-Call Context Graph that can keep track of inter-procedural loop nests. We implement our mechanism and evaluate it using the SPEC CPU2006 benchmark suite. The results confirm that our mechanism can successfully reveal the precise inter-procedural loop nest structures from all SPEC CPU2006 benchmark executions without any particular compiler support. The results also show that it can reduce runtime loop detection overheads compared with the existing loop profiling method.
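
    As a rough illustration of the dynamic monitoring described in this abstract, the Python sketch below tracks loop nesting from a stream of loop-entry and loop-exit marker events. The event format and the name `track_loop_nesting` are assumptions made for illustration and are not taken from the paper; call contexts, indirect jumps, and setjmp/longjmp handling are left out.

```python
def track_loop_nesting(events):
    """Return (max nesting depth, set of (parent, child) loop pairs).

    `events` is an iterable of (kind, loop_id) tuples, where kind is
    "enter" or "exit". This event format is hypothetical.
    """
    stack = []       # currently active loops, outermost first
    edges = set()    # (parent_loop, child_loop) nesting relations observed
    max_depth = 0
    for kind, loop_id in events:
        if kind == "enter":
            parent = stack[-1] if stack else None
            edges.add((parent, loop_id))
            stack.append(loop_id)
            max_depth = max(max_depth, len(stack))
        elif kind == "exit":
            # An exit may leave several inner loops at once, so pop until
            # the exiting loop itself has been removed from the stack.
            if loop_id in stack:
                while stack.pop() != loop_id:
                    pass
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return max_depth, edges
```

    A stack is used here because loop entries and exits are properly nested in the regular case; the pop-until-match step is one simple way to tolerate an exit that skips inner-loop exit markers.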

  • On Nonuniform Traffic Pattern of Modified Hierarchical 3D-Torus Network

    M.M. Hafizur RAHMAN  Yukinori SATO  Yasushi INOGUCHI  

     
    LETTER-Computer System

      Vol:
    E94-D No:5
      Page(s):
    1109-1112

    A Modified Hierarchical 3D-Torus (MH3DT) network is a 3D-torus network consisting of multiple basic modules, in which each basic module itself is a 3D-torus network. Its inter-node communication performance has been evaluated using dimension-order routing and 2 virtual channels (VCs) under uniform traffic patterns, but not under non-uniform traffic patterns. In this paper, we evaluate the inter-node communication performance of MH3DT under five non-uniform traffic patterns and compare it with that of other networks. We found that under non-uniform traffic patterns, the MH3DT yields high throughput and low latency, providing better inter-node communication performance than H3DT, TESH, mesh, and torus networks. We also found that the MH3DT network achieves higher throughput under non-uniform traffic patterns than under uniform traffic.
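
    For readers unfamiliar with dimension-order routing, the sketch below shows the idea on a plain (non-hierarchical) 3D torus: the packet corrects one coordinate at a time, taking the shorter way around each ring. The function names are hypothetical, and the hierarchical structure of the MH3DT and its virtual channels are not modeled.

```python
def torus_step(src, dst, size):
    """Signed step (+1, -1, or 0) along one ring of `size` nodes, shortest way."""
    if src == dst:
        return 0
    forward = (dst - src) % size
    # Ties (exactly half the ring) are broken in the +1 direction.
    return 1 if forward <= size - forward else -1


def dimension_order_route(src, dst, dims):
    """List of nodes visited from src to dst, correcting x, then y, then z."""
    path = [tuple(src)]
    cur = list(src)
    for axis, size in enumerate(dims):
        while cur[axis] != dst[axis]:
            step = torus_step(cur[axis], dst[axis], size)
            cur[axis] = (cur[axis] + step) % size   # wrap around the ring
            path.append(tuple(cur))
    return path
```

    For example, on a 4x4x4 torus, `dimension_order_route((0, 0, 0), (3, 1, 2), (4, 4, 4))` first uses the wraparound link in x, then moves one hop in y and two hops in z.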

  • Power Estimation of Partitioned Register Files in a Clustered Architecture with Performance Evaluation

    Yukinori SATO  Ken-ichi SUZUKI  Tadao NAKAMURA  

     
    PAPER-VLSI Systems

      Vol:
    E90-D No:3
      Page(s):
    627-636

    High power consumption and slow access of enlarged and multiported register files make it difficult to design high-performance superscalar processors. The clustered architecture, where the conventional monolithic register file is partitioned into several smaller register files, is expected to overcome these register file issues. In the clustered architecture, the more a monolithic register file is partitioned, the lower the power and the faster the access of the resulting register files. However, the partitioning causes losses of IPC (instructions per clock cycle) due to communication among register files. Therefore, the degree of partitioning has a strong impact on the trade-off between power consumption and performance. In addition, the organization of partitioned register files also affects the trade-off. In this paper, we investigate appropriate degrees of partitioning and organizations of partitioned register files in a clustered architecture to assess the trade-off. From the results of execution-driven simulation, we find that the organization of register files and the degree of partitioning have a strong impact on the IPC, and that the configuration with non-consistent register files can make use of the partitioned resources more effectively. From the results of register file access time and energy modeling, we find that the configurations with the highly partitioned non-consistent register file organization can benefit from the partitioning in terms of operating frequency and access energy of register files. Further, we examine the relationship between IPS (instructions per second) and the product of IPC and operating frequency of register files. The results suggest that highly partitioned non-consistent configurations tend to gain more advantage in performance and power.
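
    The trade-off in this abstract can be made concrete with the identity IPS = IPC x clock frequency. The sketch below assumes, for illustration only, that the processor clock is set by the register file operating frequency; all numbers are invented and do not come from the paper.

```python
def instructions_per_second(ipc, frequency_hz):
    """IPS as the product of instructions per cycle and cycles per second."""
    return ipc * frequency_hz


# Hypothetical configurations (values invented for illustration only):
# partitioning lowers IPC but allows a faster register file clock.
monolithic = instructions_per_second(ipc=2.0, frequency_hz=1.0e9)
partitioned = instructions_per_second(ipc=1.8, frequency_hz=1.5e9)

# Ratio > 1 means the frequency gain outweighs the IPC loss.
speedup = partitioned / monolithic
```

    With these made-up values the partitioned configuration comes out ahead by roughly a third, which shows how a 10% IPC loss can be outweighed by a faster register file.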

  • A Prediction-Based Green Scheduler for Datacenters in Clouds

    Truong Vinh Truong DUY  Yukinori SATO  Yasushi INOGUCHI  

     
    PAPER-Fundamentals of Information Systems

      Vol:
    E94-D No:9
      Page(s):
    1731-1741

    With energy shortages and global climate change leading our concerns these days, the energy consumption of datacenters has become a key issue. Obviously, a substantial reduction in energy consumption can be made by powering down servers when they are not in use. This paper aims at designing, implementing and evaluating a Green Scheduler for reducing energy consumption of datacenters in Cloud computing platforms. It is composed of four algorithms: prediction, ON/OFF, task scheduling, and evaluation algorithms. The prediction algorithm employs a neural predictor to predict future load demand based on historical demand. According to the prediction, the ON/OFF algorithm dynamically adjusts server allocations to minimize the number of servers running, thus minimizing the energy use at the points of consumption to benefit all other levels. The task scheduling algorithm is responsible for directing request traffic away from powered-down servers and toward active servers. The performance is monitored by the evaluation algorithm to balance the system's adaptability against stability. For evaluation, we perform simulations with two load traces. The results show that the prediction mode, with a combination of dynamic training and dynamic provisioning of 20% additional servers, can reduce energy consumption by 49.8% with a drop rate of 0.02% on one load trace, and a drop rate of 0.16% with an energy consumption reduction of 55.4% on the other. Our method is also proven to have a distinct advantage over its counterparts.
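
    A minimal sketch of the ON/OFF step follows, assuming a predicted load and a fixed per-server capacity are already available. The function name, its parameters, and the rounding rule are assumptions for illustration; the abstract states only that 20% additional servers are provisioned, not how the paper computes the count.

```python
import math


def servers_to_keep_on(predicted_load, capacity_per_server,
                       total_servers, headroom_percent=20, min_on=1):
    """Number of servers to leave powered on for the next interval.

    All parameter names are hypothetical. `predicted_load` and
    `capacity_per_server` must use the same unit (e.g. requests/s).
    """
    # Servers needed to carry the predicted load, rounded up.
    needed = math.ceil(predicted_load / capacity_per_server)
    # Extra servers as a safety margin against under-prediction,
    # computed with integer arithmetic and rounded up.
    extra = (needed * headroom_percent + 99) // 100
    # Never go below the minimum or above the installed server count.
    return max(min_on, min(total_servers, needed + extra))
```

    For a predicted load of 950 requests/s, 100 requests/s per server, and 16 installed servers, this keeps 12 servers on (10 for the load plus 2 spare); the remaining 4 could be powered down.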

  • Energy-Performance Modeling of Speculative Checkpointing for Exascale Systems

    Muhammad ALFIAN AMRIZAL  Atsuya UNO  Yukinori SATO  Hiroyuki TAKIZAWA  Hiroaki KOBAYASHI  

     
    PAPER-High performance computing

      Publicized:
    2017/07/14
      Vol:
    E100-D No:12
      Page(s):
    2749-2760

    Coordinated checkpointing is a widely used checkpoint/restart (CPR) protocol for fault tolerance in large-scale HPC systems. However, this protocol involves massive I/O concentration, resulting in considerably high checkpoint overhead and high energy consumption. This paper focuses on speculative checkpointing, a CPR mechanism that allows for temporal distribution of checkpoints to avoid I/O concentration. We propose execution time and energy models for speculative checkpointing, and investigate energy-performance characteristics when speculative checkpointing is adopted in exascale systems. Using these models, we study the benefit of speculative checkpointing over coordinated checkpointing under various realistic scenarios for exascale HPC systems. We show that, compared to coordinated checkpointing, speculative checkpointing can achieve up to an 11% energy reduction at the cost of a relatively small increase in the execution time. In addition, a significant energy-performance trade-off is expected when the system scale exceeds 1.2 million nodes.

  • TTN: A High Performance Hierarchical Interconnection Network for Massively Parallel Computers

    M.M. Hafizur RAHMAN  Yasushi INOGUCHI  Yukinori SATO  Susumu HORIGUCHI  

     
    PAPER-Computer Systems

      Vol:
    E92-D No:5
      Page(s):
    1062-1078

    Interconnection networks play a crucial role in the performance of massively parallel computers. Hierarchical interconnection networks provide high performance at low cost by exploiting the locality that exists in the communication patterns of massively parallel computers. A Tori connected Torus Network (TTN) is a 2D-torus network of multiple basic modules, in which the basic modules are 2D-torus networks that are hierarchically interconnected to form higher-level networks. This paper addresses the architectural details of the TTN and explores aspects such as node degree, network diameter, cost, average distance, arc connectivity, bisection width, and wiring complexity. We also present a deadlock-free routing algorithm for the TTN using four virtual channels and evaluate the network's dynamic communication performance using the proposed routing algorithm under uniform and various non-uniform traffic patterns. We evaluate the dynamic communication performance of TTN, TESH, MH3DT, mesh, and torus networks by computer simulation. It is shown that the TTN possesses several attractive features, including constant node degree, small diameter, low cost, small average distance, moderate (neither too low nor too high) bisection width, high throughput, and very low zero-load latency, which provide better dynamic communication performance than that of other conventional and hierarchical networks.
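
    Two of the static metrics named in this abstract, network diameter and average distance, can be computed for a single 2D-torus basic module with a breadth-first search, as sketched below. This covers only one plain m x n torus, not the hierarchical TTN; the function name is hypothetical, and the average is taken over the other nodes, which is one of several conventions.

```python
from collections import deque


def torus_2d_metrics(m, n):
    """(diameter, average distance) of an m x n 2D torus with m * n >= 2.

    A torus looks the same from every node, so one BFS from (0, 0) suffices.
    """
    dist = {(0, 0): 0}
    queue = deque([(0, 0)])
    while queue:
        x, y = queue.popleft()
        # Each node has four neighbours, with wraparound links at the edges.
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = ((x + dx) % m, (y + dy) % n)
            if nxt not in dist:
                dist[nxt] = dist[(x, y)] + 1
                queue.append(nxt)
    diameter = max(dist.values())
    # Average hop count from (0, 0) to every other node.
    average = sum(dist.values()) / (m * n - 1)
    return diameter, average
```

    For a 4x4 module this gives a diameter of 4 hops, matching the closed form floor(m/2) + floor(n/2) for a 2D torus.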