The search functionality is under construction.

Author Search Result

[Author] Saneyasu YAMAGUCHI(7hit)

1-7hit
  • Improving Dynamic Scaling Performance of Cassandra

    Saneyasu YAMAGUCHI  Yuki MORIMITSU  

     
    PAPER

      Pubricized:
    2017/01/17
      Vol:
    E100-D No:4
      Page(s):
    682-692

    Load size for a service on the Internet changes remarkably every hour. Thus, it is expected for service system scales to change dynamically according to load size. KVS (key-value store) is a scalable DBMS (database management system) widely used in largescale Internet services. In this paper, we focus on Cassandra, a popular open-source KVS implementation, and discuss methods for improving dynamic scaling performance. First, we evaluate node joining time, which is the time to complete adding a node to a running KVS system, and show that its bottleneck process is disk I/O. Second, we analyze disk accesses in the nodes and indicate that some heavily accessed files cause a large number of disk accesses. Third, we propose two methods for improving elasticity, which means decreasing node adding and removing time, of Cassandra. One method reduces disk accesses significantly by keeping the heavily accessed file in the page cache. The other method optimizes I/O scheduler behavior. Lastly, we evaluate elasticity of our methods. Our experimental results demonstrate that the methods can improve the scaling-up and scaling-down performance of Cassandra.

  • Power-Effective File Layout Based on Large Scale Data-Intensive Application in Virtualized Environment

    Shunsuke YAGAI  Masato OGUCHI  Miyuki NAKANO  Saneyasu YAMAGUCHI  

     
    PAPER-Database system

      Pubricized:
    2017/07/14
      Vol:
    E100-D No:12
      Page(s):
    2761-2770

    In data centers, large numbers of computers are run simultaneously. These computers consume an enormous amount of energy. Several challenges related to this issue have been published. An energy-efficient storage management method that cooperates with applications was one effective approach. In this method, data and storage devices are managed using application support and the power consumption of storage devices is significantly decreased. However, existing studies do not take the virtualized environment into account. Recently, many data-intensive applications have been run in a virtualized environment, such as the cloud computing environment. In this paper, we focus on a virtualized environment wherein multiple virtual machines run on a physical computer and a data intensive application runs on each virtual machine. We discuss a method for reducing storage device power consumption using application support. First, we propose two storage management methods using application information. One method optimizes the inter-HDD file layout. This method removes frequently-accessed files from a certain HDD and switches the HDD to power-off mode. To balance loads and reduce seek distances, this method separates a heavily accessed file and consolidates files in a virtual machine with low access frequency. The other method optimizes the intra-HDD file layout, in addition to performing inter-HDD optimization. This method places frequently accessed files near each other. Second, we present our experimental results and demonstrate that the proposed methods can create sufficiently long HDD access intervals that power-off mode can be used, and thereby, reduce the power consumption of storage devices.

  • Hadoop I/O Performance Improvement by File Layout Optimization

    Eita FUJISHIMA  Kenji NAKASHIMA  Saneyasu YAMAGUCHI  

     
    PAPER-Data Engineering, Web Information Systems

      Pubricized:
    2017/11/22
      Vol:
    E101-D No:2
      Page(s):
    415-427

    Hadoop is a popular open-source MapReduce implementation. In the cases of jobs, wherein huge scale of output files of all relevant Map tasks are transmitted into Reduce tasks, such as TeraSort, the Reduce tasks are the bottleneck tasks and are I/O bounded for processing many large output files. In most cases, including TeraSort, the intermediate data, which include the output files of the Map tasks, are large and accessed sequentially. For improving the performance of these jobs, it is important to increase the sequential access performance. In this paper, we propose methods for improving the performance of Reduce tasks of such jobs by considering the following two things. One is that these files are accessed sequentially on an HDD, and the other is that each zone in an HDD has different sequential I/O performance. The proposed methods control the location to store intermediate data by modifying block bitmap of filesystem, which manages utilization (free or used) of blocks in an HDD. In addition, we propose striping layout for applying these methods for virtualized environment using image files. We then present performance evaluation of the proposed method and demonstrate that our methods improve the Hadoop application performance.

  • I/O Performance Improvement of FHE Apriori with Striping File Layout Considering Storage of Intermediate Data

    Atsuki KAMO  Saneyasu YAMAGUCHI  

     
    LETTER-Data Engineering, Web Information Systems

      Pubricized:
    2023/03/13
      Vol:
    E106-D No:6
      Page(s):
    1183-1185

    Fully homomorphic encryption (FHE) enables secret computations. Users can perform computation using data encrypted with FHE without decryption. Uploading private data without encryption to a public cloud has the risk of data leakage, which makes many users hesitant to utilize a public cloud. Uploading data encrypted with FHE avoids this risk, while still providing the computing power of the public cloud. In many cases, data are stored in HDDs because the data size increases significantly when FHE is used. One important data analysis is Apriori data mining. In this application, two files are accessed alternately, and this causes long-distance seeking on its HDD and low performance. In this paper, we propose a new striping layout with reservations for write areas. This method intentionally fragments files and arranges blocks to reduce the distance between blocks in a file and another file. It reserves the area for intermediate files of FHE Apriori. The performance of the proposed method was evaluated based on the I/O processing of a large FHE Apriori, and the results showed that the proposed method could improve performance by up to approximately 28%.

  • Deeply Programmable Application Switch for Performance Improvement of KVS in Data Center Open Access

    Satoshi ITO  Tomoaki KANAYA  Akihiro NAKAO  Masato OGUCHI  Saneyasu YAMAGUCHI  

     
    PAPER

      Pubricized:
    2024/01/17
      Vol:
    E107-D No:5
      Page(s):
    659-673

    The concepts of programmable switches and software-defined networking (SDN) give developers flexible and deep control over the behavior of switches. We expect these concepts to dramatically improve the functionality of switches. In this paper, we focus on the concept of Deeply Programmable Networks (DPN), where data planes are programmable, and application switches based on DPN. We then propose a method to improve the performance of a key-value store (KVS) through an application switch. First, we explain the DPN and application switches. The DPN is a network that makes not only control planes but also data planes programmable. An application switch is a switch that implements some functions of network applications, such as database management system (DBMS). Second, we propose a method to improve the performance of Cassandra, one of the most popular key-value based DBMS, by implementing a caching function in a switch in a dedicated network such as a data center. The proposed method is expected to be effective even though it is a simple and traditional way because it is in the data path and the center of the network application. Third, we implement a switch with the caching function, which monitors the accessed data described in packets (Ethernet frames) and dynamically replaces the cached data in the switch, and then show that the proposed caching switch can significantly improve the KVS transaction performance with this implementation. In the case of our evaluation, our method improved the KVS transaction throughput by up to 47%.

  • Job-Aware File-Storage Optimization for Improved Hadoop I/O Performance

    Makoto NAKAGAMI  Jose A.B. FORTES  Saneyasu YAMAGUCHI  

     
    PAPER-Software System

      Pubricized:
    2020/06/30
      Vol:
    E103-D No:10
      Page(s):
    2083-2093

    Hadoop is a popular data-analytics platform based on Google's MapReduce programming model. Hard-disk drives (HDDs) are generally used in big-data analysis, and the effectiveness of the Hadoop platform can be optimized by enhancing its I/O performance. HDD performance varies depending on whether the data are stored in the inner or outer disk zones. This paper proposes a method that utilizes the knowledge of job characteristics to realize efficient data storage in HDDs, which in turn, helps improve Hadoop performance. Per the proposed method, job files that need to be frequently accessed are stored in outer disk tracks which are capable of facilitating sequential-access speeds that are higher than those provided by inner tracks. Thus, the proposed method stores temporary and permanent files in the outer and inner zones, respectively, thereby facilitating fast access to frequently required data. Results of performance evaluation demonstrate that the proposed method improves Hadoop performance by 15.4% when compared to normal cases when file placement is not used. Additionally, the proposed method outperforms a previously proposed placement approach by 11.1%.

  • A Multi-Agent System for Dynamic Network Routing

    Ryokichi ONISHI  Saneyasu YAMAGUCHI  Hiroaki MORINO  Hitoshi AIDA  Tadao SAITO  

     
    PAPER-Mobile Agent

      Vol:
    E84-B No:10
      Page(s):
    2721-2728

    Single-hop communication methods of the current wireless network cannot meet new demands in new domains, especially ITS (Intelligent Transport Systems). Even though the ad-hoc network architecture is expected to solve this problem, but the nature of a dynamic topology makes this routing hard to be realized. This paper introduces a new ad-hoc routing algorithm, which is inspired by [1]. In their system, some control agents explore the network and update routing tables on their own knowledge. Using these routing tables, other agents deliver messages. They considered the feasibility of the agent-based routing system, but did not refer to an efficient algorithm. In this paper, we consider that algorithm without increasing network load. We propose multiple entries for each destination in the routing table to store much more information from agents and evaluating them to make better use of information, which succeeded in raising the network connectivity by about 40% by simulation.