IEICE global.ieice.org Site

Author Search Result

[Author] Yasushi INOGUCHI(11hit)

A Prediction-Based Green Scheduler for Datacenters in Clouds
Truong Vinh Truong DUY Yukinori SATO Yasushi INOGUCHI

PAPER-Fundamentals of Information Systems

Vol: E94-D, No: 9, Page(s): 1731-1741
With energy shortages and global climate change leading our concerns these days, the energy consumption of datacenters has become a key issue. Obviously, a substantial reduction in energy consumption can be made by powering down servers when they are not in use. This paper aims at designing, implementing and evaluating a Green Scheduler for reducing energy consumption of datacenters in Cloud computing platforms. It is composed of four algorithms: prediction, ON/OFF, task scheduling, and evaluation algorithms. The prediction algorithm employs a neural predictor to predict future load demand based on historical demand. According to the prediction, the ON/OFF algorithm dynamically adjusts server allocations to minimize the number of servers running, thus minimizing the energy use at the points of consumption to benefit all other levels. The task scheduling algorithm is responsible for directing request traffic away from powered-down servers and toward active servers. The performance is monitored by the evaluation algorithm to balance the system's adaptability against stability. For evaluation, we perform simulations with two load traces. The results show that the prediction mode, with a combination of dynamic training and dynamic provisioning of 20% additional servers, can reduce energy consumption by 49.8% with a drop rate of 0.02% on one load trace, and a drop rate of 0.16% with an energy consumption reduction of 55.4% on the other. Our method is also proven to have a distinct advantage over its counterparts.
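
The ON/OFF step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the per-server capacity parameter, and the reading of "20% additional servers" as a margin on top of the predicted requirement are all assumptions made for illustration, and the neural predictor is replaced by a plain number passed in by the caller.

```python
import math

def servers_to_keep_on(predicted_load, capacity_per_server, total_servers,
                       extra_percent=20):
    """Hypothetical ON/OFF sizing rule: cover the predicted load, then add a margin.

    predicted_load and capacity_per_server use the same unit (e.g. requests/s).
    extra_percent is the assumed safety margin of additional servers.
    """
    if capacity_per_server <= 0:
        raise ValueError("capacity_per_server must be positive")
    # Servers strictly needed to serve the predicted load.
    needed = math.ceil(predicted_load / capacity_per_server)
    # Integer ceiling of needed * extra_percent / 100, avoiding float rounding.
    extra = -(-needed * extra_percent // 100)
    # Keep at least one server on, and never exceed the datacenter size.
    return max(1, min(total_servers, needed + extra))
```

For example, a predicted load of 950 requests/s with an assumed capacity of 100 requests/s per server needs 10 servers, and the 20% margin adds 2 more; the remaining servers could be powered down until the next prediction.
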
Dynamic Task Flow Scheduling for Heterogeneous Distributed Computing: Algorithm and Strategy
Wei SUN Yuanyuan ZHANG Yasushi INOGUCHI

PAPER-Computer Systems

Vol: E90-D, No: 4, Page(s): 736-744
Heterogeneous distributed computing environments are well suited to meeting fast-increasing computational demands. Task scheduling is very important for a heterogeneous distributed system to satisfy the large computational demands of applications. The performance of a scheduler in a heterogeneous distributed system normally depends on the dynamic task flow; that is, the scheduler always suffers from the heterogeneity of task sizes and the variety of task arrivals. From a long-term viewpoint, it is both necessary and possible to improve the performance of a scheduler serving a dynamic task flow. In this paper we propose a task scheduling method comprising a scheduling strategy that adapts to the dynamic task flow and a genetic algorithm that achieves a short completion time for a batch of tasks. The strategy and the genetic algorithm work together to enhance the scheduler's efficiency and performance. We simulated a task flow with a sufficient number of tasks, a scheduler using our strategy and algorithm, and schedulers using other strategies and algorithms. We also simulated a complex scenario with a varying task arrival rate and heterogeneous computational nodes. The simulation results show that our scheduler achieves much better scheduling results than the others in terms of the average waiting time, the average response time, and the finish time of all tasks.
Dynamic Scheduling Real-Time Task Using Primary-Backup Overloading Strategy for Multiprocessor Systems
Wei SUN Chen YU Xavier DEFAGO Yasushi INOGUCHI

PAPER-Dependable Computing

Vol: E91-D, No: 3, Page(s): 796-806
The scheduling of real-time tasks with fault-tolerant requirements has been an important problem in multiprocessor systems. The primary-backup (PB) approach is often used as a fault-tolerant technique to guarantee the deadlines of tasks despite the presence of faults. In this paper we propose a dynamic PB-based task scheduling approach, wherein an allocation parameter is used to search the available time slots for a newly arriving task, and the previously scheduled tasks can be re-scheduled when there is no available time slot for the newly arriving task. In order to improve the schedulability we also propose an overloading strategy for PB-overloading and Backup-backup (BB) overloading. Our proposed task scheduling algorithm is compared with some existing scheduling algorithms in the literature through simulation studies. The results have shown that the task rejection ratio of our real-time task scheduling algorithm is almost 50% lower than the compared algorithms.
TTN: A High Performance Hierarchical Interconnection Network for Massively Parallel Computers
M.M. Hafizur RAHMAN Yasushi INOGUCHI Yukinori SATO Susumu HORIGUCHI

PAPER-Computer Systems

Vol: E92-D, No: 5, Page(s): 1062-1078
Interconnection networks play a crucial role in the performance of massively parallel computers. Hierarchical interconnection networks provide high performance at low cost by exploring the locality that exists in the communication patterns of massively parallel computers. A Tori connected Torus Network (TTN) is a 2D-torus network of multiple basic modules, in which the basic modules are 2D-torus networks that are hierarchically interconnected for higher-level networks. This paper addresses the architectural details of the TTN and explores aspects such as node degree, network diameter, cost, average distance, arc connectivity, bisection width, and wiring complexity. We also present a deadlock-free routing algorithm for the TTN using four virtual channels and evaluate the network's dynamic communication performance using the proposed routing algorithm under uniform and various non-uniform traffic patterns. We evaluate the dynamic communication performance of TTN, TESH, MH3DT, mesh, and torus networks by computer simulation. It is shown that the TTN possesses several attractive features, including constant node degree, small diameter, low cost, small average distance, moderate (neither too low, nor too high) bisection width, and high throughput and very low zero load latency, which provide better dynamic communication performance than that of other conventional and hierarchical networks.
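
Two of the static metrics named in this abstract, network diameter and average distance, can be computed by breadth-first search. The sketch below covers only a single k x k 2D-torus, the kind of network used as a basic module; it does not model the hierarchical interconnection of the TTN, whose details are not given here. The function name and the choice to average over the other nodes only are assumptions.

```python
from collections import deque

def torus_metrics(k):
    """Return (diameter, average distance) of a k x k 2D-torus, with k >= 2.

    A torus is vertex-symmetric, so a single BFS from node (0, 0) is enough.
    The average is taken over the k*k - 1 other nodes.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    start = (0, 0)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        # Each node links to its four neighbours, with wrap-around edges.
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = ((x + dx) % k, (y + dy) % k)
            if nxt not in dist:
                dist[nxt] = dist[(x, y)] + 1
                queue.append(nxt)
    diameter = max(dist.values())
    average = sum(dist.values()) / (len(dist) - 1)
    return diameter, average
```

The diameter returned matches the closed form 2 * floor(k / 2) for a k x k torus, which gives a quick sanity check on the search.
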
A New Dimension Analysis on Blocking Behavior in Banyan-Based Optical Switching Networks
Chen YU Yasushi INOGUCHI Susumu HORIGUCHI

PAPER-Networks

Vol: E91-D, No: 7, Page(s): 1991-1998
Vertically stacked optical banyan (VSOB) is an attractive architecture for constructing banyan-based optical switches. Blocking behavior analysis is an effective approach to studying network performance and finding a graceful compromise among hardware cost, blocking probability, and crosstalk tolerance; however, little has been done to analyze the blocking behavior of VSOB networks under a crosstalk constraint, which adds a new dimension to the switching performance. In this paper, we study the overall blocking behavior of a VSOB network under various degrees of crosstalk and develop an upper bound on the blocking probability of the network. The upper bound accurately depicts the overall blocking behavior of a VSOB network, as verified by extensive simulation results, and it agrees with the strictly nonblocking condition of the network. The derived upper bound is significant because it reveals the inherent relationship between blocking probability and network hardware cost, by which a desirable tradeoff can be made between them under various degrees of crosstalk constraint. Also, the upper bound shows how crosstalk adds a new dimension to the theory of switching systems.
Influence of Inaccurate Performance Prediction on Task Scheduling in a Grid Environment
Yuanyuan ZHANG Yasushi INOGUCHI

PAPER-Performance Evaluation

Vol: E89-D, No: 2, Page(s): 479-486
Efficient task scheduling is critical for achieving high performance in grid computing systems. Existing task scheduling algorithms for grid environments usually assume that the performance prediction for both tasks and resources is perfectly accurate. In practice, however, it is very difficult to achieve such an accurate prediction in a heterogeneous and dynamic grid environment. Therefore, the performance of a task scheduling algorithm may be significantly influenced by prediction inaccuracy. In this paper, we study the influence of inaccurate predictions on task scheduling in the contexts of task selection and processor selection, which are two critical phases in task scheduling algorithms. We develop formulas for the misprediction degree, which is defined as the probability that the predicted values for the performances of tasks and processors reveal different orders from their real values. Based on these formulas, we also investigate the effect of several key parameters on the misprediction degree. Finally, we conduct extensive simulation for the sensitivities of some existing task scheduling algorithms to the prediction errors.
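
The misprediction degree defined in this abstract, the probability that predicted values are ordered differently from the real values, can be estimated numerically for a pair of values. The paper develops formulas; the sketch below instead uses Monte Carlo simulation, and its error model (independent, uniformly distributed relative errors) is purely an assumption, as are the function and parameter names.

```python
import random

def misprediction_probability(real_a, real_b, max_rel_error,
                              trials=10000, seed=0):
    """Estimate how often predictions of two values swap their true order.

    Assumed error model: each prediction is the real value scaled by
    1 + e, with e drawn uniformly from [-max_rel_error, +max_rel_error].
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    flips = 0
    for _ in range(trials):
        pred_a = real_a * (1 + rng.uniform(-max_rel_error, max_rel_error))
        pred_b = real_b * (1 + rng.uniform(-max_rel_error, max_rel_error))
        # A misprediction occurs when the predicted order differs from the real one.
        if (pred_a < pred_b) != (real_a < real_b):
            flips += 1
    return flips / trials
```

Under this assumed model, two values that are far apart relative to the error bound are never misordered, while values that are close together are misordered often, which is the kind of parameter sensitivity the abstract refers to.
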
CPU Load Predictions on the Computational Grid
Yuanyuan ZHANG Wei SUN Yasushi INOGUCHI

PAPER-Grid Computing

Vol: E90-D, No: 1, Page(s): 40-47
To make the best use of the resources in a shared grid environment, an application scheduler must predict the performance available on each resource. In this paper, we examine the problem of predicting available CPU performance in a time-shared grid system. We present and evaluate a new method to predict the one-step-ahead CPU load in a grid. Our prediction strategy forecasts the future CPU load based on the tendency of variation in several past steps and in previous similar patterns, and uses a polynomial fitting method. Our experimental results on large load traces collected from four different kinds of machines demonstrate that this new prediction strategy achieves average prediction errors that are between 22% and 86% lower than those incurred by four previous methods.
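
A one-step-ahead forecast from recent samples can be sketched with polynomial extrapolation. This is a stand-in, not the method of the paper: it passes a Lagrange interpolating polynomial exactly through the last few samples rather than fitting one, and it ignores the "previous similar patterns" component. The function name, the default degree, and the clamp at zero are assumptions.

```python
def predict_next_load(history, degree=2):
    """Extrapolate the next sample from the last degree + 1 load measurements.

    Samples are assumed to be equally spaced in time. If the history is
    shorter than degree + 1, a lower-degree polynomial is used.
    """
    if not history:
        raise ValueError("history must contain at least one sample")
    points = history[-(degree + 1):]
    n = len(points)
    x_next = n  # samples sit at x = 0 .. n-1, so the next step is x = n
    prediction = 0.0
    for i, y_i in enumerate(points):
        # Lagrange basis polynomial for sample i, evaluated at x_next.
        weight = 1.0
        for j in range(n):
            if j != i:
                weight *= (x_next - j) / (i - j)
        prediction += y_i * weight
    # A CPU load cannot be negative, so clamp the extrapolated value.
    return max(0.0, prediction)
```

With a steadily rising history such as 0.1, 0.2, 0.3, the sketch continues the trend; with a single sample it simply repeats the last value.
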
Identifying Program Loop Nesting Structures during Execution of Machine Code
Yukinori SATO Yasushi INOGUCHI Tadao NAKAMURA

PAPER-Computer System

Vol: E97-D, No: 9, Page(s): 2371-2385
This paper presents a mechanism for detecting dynamic loop and procedure nesting during the actual program execution on-the-fly. This mechanism aims primarily at making better strategies for performance tuning or parallelization. Using a pre-compiled application executable machine code as an input, our mechanism statically generates simple but precise markers that indicate loop entries and loop exits, and dynamically monitors loop nesting that appears during the actual execution together with call context tree. To keep precise loop structures all the time, we monitor the indirect jumps that enter the loop regions and the setjmp/longjmp functions that cause irregular function call transfers. We also present a novel representation called Loop-Call Context Graph that can keep track of inter-procedural loop nests. We implement our mechanism and evaluate it using SPEC CPU2006 benchmark suite. The results confirm that our mechanism can successfully reveal the precise inter-procedural loop nest structures from all of SPEC CPU2006 benchmark executions without any particular compiler support. The results also show that it can reduce runtime loop detection overheads compared with the existing loop profiling method.
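
The idea of monitoring loop nesting from entry and exit markers can be illustrated with a stack. The sketch below is a simplified assumption, not the mechanism of the paper: the event format is hypothetical, it tracks loops only (no call context tree or Loop-Call Context Graph), and it handles an irregular exit merely by unwinding the stack to the matching loop.

```python
def loop_nest_edges(events):
    """Collect (parent_loop, child_loop) pairs seen in a stream of loop markers.

    events is an iterable of ("enter", loop_id) or ("exit", loop_id) tuples,
    a hypothetical format. A parent of None marks an outermost loop.
    """
    stack = []     # loops currently being executed, outermost first
    edges = set()  # nesting relations observed so far
    for kind, loop_id in events:
        if kind == "enter":
            parent = stack[-1] if stack else None
            edges.add((parent, loop_id))
            stack.append(loop_id)
        elif kind == "exit" and loop_id in stack:
            # Unwind to the matching loop; this also discards inner loops
            # that were left without an explicit exit marker.
            while stack.pop() != loop_id:
                pass
    return edges
```

For a run that enters loop A, executes inner loops B and C one after the other, and then leaves A, the result records A as outermost and both B and C as nested inside A.
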
On Nonuniform Traffic Pattern of Modified Hierarchical 3D-Torus Network
M.M. Hafizur RAHMAN Yukinori SATO Yasushi INOGUCHI

LETTER-Computer System

Vol: E94-D, No: 5, Page(s): 1109-1112
A Modified Hierarchical 3D-Torus (MH3DT) network is a 3D-torus network consisting of multiple basic modules, in which each basic module itself is a 3D-torus network. Inter-node communication performance has been evaluated using dimension-order routing and 2 virtual channels (VCs) under uniform traffic patterns but not under non-uniform traffic patterns. In this paper, we evaluate the inter-node communication performance of MH3DT under five non-uniform traffic patterns and compare it with other networks. We found that under non-uniform traffic patterns, the MH3DT yields high throughput and low latency, providing better inter-node communication performance compared to H3DT, TESH, mesh, and torus networks. Also, we found that non-uniform traffic patterns have higher throughput than uniform traffic in the MH3DT network.
Modified Hierarchical 3D-Torus Network
M.M. Hafizur RAHMAN Yasushi INOGUCHI Susumu HORIGUCHI

PAPER-Computer Systems

Vol: E88-D, No: 2, Page(s): 177-186
Three-dimensional (3D) wafer stacked implementation (WSI) has been proposed as a promising technology for massively parallel computers. A hierarchical 3D-torus (H3DT) network, which is a 3D-torus network of multiple basic modules in which the basic modules are 3D-mesh networks, has been proposed for efficient 3D-WSI. However, the restricted use of physical links between basic modules in the higher level networks reduces the dynamic communication performance of this network. A torus network has better dynamic communication performance than a mesh network. Therefore, we have modified the H3DT network by replacing the 3D-mesh modules by 3D-tori, calling it a Modified H3DT (MH3DT) network. This paper addresses the architectural details of the MH3DT network and explores aspects such as degree, diameter, cost, average distance, arc connectivity, bisection width, and wiring complexity. We also present a deadlock-free routing algorithm for the MH3DT network using two virtual channels and evaluate the network's dynamic communication performance under the uniform traffic pattern, using the proposed routing algorithm. It is shown that the MH3DT network possesses several attractive features including small diameter, small cost, small average distance, better bisection width, and better dynamic communication performance.
High-Performance Training of Conditional Random Fields for Large-Scale Applications of Labeling Sequence Data
Xuan-Hieu PHAN Le-Minh NGUYEN Yasushi INOGUCHI Susumu HORIGUCHI

PAPER-Parallel Processing System

Vol: E90-D, No: 1, Page(s): 13-21
Conditional random fields (CRFs) have been successfully applied to various applications of predicting and labeling structured data, such as natural language tagging and parsing, image segmentation and object recognition, and protein secondary structure prediction. The key advantages of CRFs are the ability to encode a variety of overlapping, non-independent features from empirical data and the capability of reaching global normalization and optimization. However, estimating parameters for CRFs is very time-consuming because of the intensive forward-backward computation needed to estimate the likelihood function and its gradient during training. This paper presents a high-performance training method for CRFs on massively parallel processing systems that allows us to handle huge datasets with hundreds of thousands of data sequences and millions of features. We performed experiments on an important natural language processing task (text chunking) on large-scale corpora and achieved significant results in terms of both the reduction of computational time and the improvement of prediction accuracy.