Author Search Result

[Author] Haruo YOKOTA (11 hits)

Results 1-11 of 11
  • File and Task Abstraction in Task Workflow Patterns for File Recommendation Using File-Access Log Open Access

    Qiang SONG  Takayuki KAWABATA  Fumiaki ITOH  Yousuke WATANABE  Haruo YOKOTA  

     
    PAPER

      Vol:
    E97-D No:4
      Page(s):
    634-643

    The number of files in file systems has increased dramatically in recent years, and office workers spend much time and effort searching for the documents required for their jobs. To reduce these costs, we propose a new method for recommending files and operations on them. Existing recommendation technologies, such as collaborative filtering, suffer from two problems. First, they can only work with documents that have been accessed in the past, so they cannot make recommendations when only newly generated documents are given as input. Second, they cannot easily handle sequences whose elements are similar or differently ordered, because of the strict matching applied to the access sequences; such minor variations should be ignored. In our proposed method, we introduce the concepts of abstract files, which group similar files used for a similar purpose; abstract tasks, which group similar tasks; and frequent abstract workflows, which group similar workflows, i.e., sequences of abstract tasks. In experiments using real file-access logs, we confirmed that our proposed method could extract workflow patterns with longer sequences and higher support counts, which are more suitable as recommendations. In addition, the F-measure for the recommendation results improved significantly, from 0.301 to 0.598, compared with a method that did not use the concepts of abstract tasks and abstract workflows.
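
    The abstraction step can be pictured with a toy example. The sketch below is only illustrative: the log format, the name-based grouping of files into abstract files, the time-gap rule for forming tasks, and the length-two workflows are all assumptions made for this sketch, not the clustering and mining procedures used in the paper.

```python
# Illustrative sketch of abstracting a file-access log into workflows.
# Log format, grouping rules, and thresholds are assumptions for the sketch.
import re
from collections import Counter

# (timestamp_in_seconds, file_path) pairs, assumed sorted by time
log = [
    (0, "report_2013_v1.docx"), (40, "sales_2013.xlsx"),
    (300, "report_2014_v2.docx"), (330, "sales_2014.xlsx"),
    (900, "minutes_jan.txt"),
]

def abstract_file(path: str) -> str:
    """Map a concrete file to an 'abstract file' by dropping digits/versions."""
    return re.sub(r"\d+|_v\d+", "", path)

def tasks(events, gap=120):
    """Split the log into tasks: accesses separated by <= `gap` seconds."""
    task, last = [], None
    for t, f in events:
        if last is not None and t - last > gap:
            yield task
            task = []
        task.append(abstract_file(f))
        last = t
    if task:
        yield task

# An 'abstract task' is the set of abstract files touched together.
abstract_tasks = [frozenset(t) for t in tasks(log)]

# A (length-2) 'abstract workflow' is a pair of consecutive abstract tasks;
# counting them gives the support values used for recommendation.
workflows = Counter(zip(abstract_tasks, abstract_tasks[1:]))
for wf, support in workflows.most_common():
    print(support, [sorted(t) for t in wf])
```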

  • Cache-Conscious Data Access for DBMS in Multicore Environments

    Fang XI  Takeshi MISHIMA  Haruo YOKOTA  

     
    PAPER

      Publicized:
    2015/01/21
      Vol:
    E98-D No:5
      Page(s):
    1001-1012

    In recent years, dramatic improvements have been made to computer hardware. In particular, the number of cores on a chip has been growing exponentially, enabling an ever-increasing number of processes to be executed in parallel. Having been originally developed for single-core processors, database (DB) management systems (DBMSs) running on multicore processors suffer from cache conflicts as the number of concurrently executing DB processes (DBPs) increases. Therefore, a cache-efficient solution for arranging the execution of concurrent DBPs on multicore platforms would be highly attractive for DBMSs. In this paper, we propose CARIC-DA, middleware that achieves higher DBMS performance on multicore processors by reducing cache misses with a new cache-conscious dispatcher for concurrent queries. CARIC-DA logically range-partitions the dataset into multiple subsets. This enables different processor cores to access different subsets by ensuring that different DBPs are pinned to different cores and by dispatching queries to DBPs according to the data-partitioning information. In this way, CARIC-DA is expected to achieve better performance via a higher cache hit rate for the private cache of each core. It can also balance the load between cores by changing the range of each subset. Note that CARIC-DA is pure middleware, meaning that it avoids any modification to existing operating systems (OSs) and DBMSs, thereby making it more practical. This is important because the source code of existing DBMSs is large and complex, making it very expensive to modify. We implemented a prototype that uses unmodified Linux and PostgreSQL environments, and evaluated the effectiveness of our proposal on three different multicore platforms. The performance evaluation against benchmarks revealed that CARIC-DA achieved improved cache hit rates and higher performance.
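
    The dispatching idea can be sketched as follows. This is a rough, hypothetical illustration: the partition boundaries, the use of os.sched_setaffinity for core pinning, and the worker/queue model are assumptions made for the sketch; the actual middleware sits between clients and unmodified PostgreSQL back ends.

```python
# Sketch of cache-conscious dispatching in the spirit of CARIC-DA.
# Key ranges, worker model, and query format are illustrative assumptions.
import bisect
import multiprocessing as mp
import os

BOUNDARIES = [2500, 5000, 7500]  # range-partition the key space into 4 subsets

def worker(core_id, queries):
    """A DB process (DBP) stand-in, pinned to one core so that the rows of
    its subset stay hot in that core's private cache."""
    try:
        os.sched_setaffinity(0, {core_id})  # Linux-specific; real systems may differ
    except (AttributeError, OSError):
        pass                                # portability fallback for the sketch
    while True:
        key = queries.get()
        if key is None:
            break
        print(f"core {core_id} handles key {key}")  # a real DBP would run the query

def dispatch(key, queues):
    """Route a query to the DBP owning the subset that contains `key`."""
    queues[bisect.bisect_right(BOUNDARIES, key)].put(key)

if __name__ == "__main__":
    queues = [mp.Queue() for _ in range(len(BOUNDARIES) + 1)]
    procs = [mp.Process(target=worker, args=(i, q)) for i, q in enumerate(queues)]
    for p in procs:
        p.start()
    for key in (42, 3100, 9999):
        dispatch(key, queues)
    for q in queues:
        q.put(None)                          # shut the workers down
    for p in procs:
        p.join()
```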

  • UPRISE: Unified Presentation Slide Retrieval by Impression Search Engine

    Haruo YOKOTA  Takashi KOBAYASHI  Taichi MURAKI  Satoshi NAOI  

     
    PAPER

      Vol:
    E87-D No:2
      Page(s):
    397-406

    A combination of the slides used in a presentation and a video recording of the presentation itself is quite useful for many applications, such as e-learning. However, creating new content from these with current authoring tools requires considerable effort from the author, and the resulting products have reduced flexibility. In this paper, we propose a unifying function that avoids creating new content manually. We also propose a new approach for searching unified presentation manuscripts for slides that match given keywords, by considering features peculiar to presentation slides. We propose impression indicators to express how well a slide matches the given keywords, and a system for retrieving a sequence of desired presentation slides from archives of the combined slides and video. We named the system Unified Presentation Slide Retrieval by Impression Search Engine, or UPRISE. We describe the system configuration of UPRISE and the experiments undertaken to evaluate the effect of the proposed indicators and to compare the results with those of the traditional tf.idf retrieval method.
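
    As a purely hypothetical illustration of how a slide-aware score might differ from plain tf.idf, the sketch below boosts keywords that occur in slide titles. The features and weights are assumptions made for this sketch and are not the impression indicators defined in the paper.

```python
# Contrast between plain tf-idf and a slide-aware, impression-style score.
# Slide data, features, and weights below are illustrative assumptions.
import math

slides = [
    {"title": "B-tree basics", "body": "balanced tree index structure"},
    {"title": "Concurrency control", "body": "latch coupling for B-tree index"},
    {"title": "Evaluation", "body": "throughput results on a PC cluster"},
]

def tfidf(term, slide, corpus):
    """Plain tf-idf over the concatenated title and body text of a slide."""
    tf = (slide["title"] + " " + slide["body"]).lower().split().count(term)
    df = sum(1 for s in corpus
             if term in (s["title"] + " " + s["body"]).lower().split())
    return tf * math.log(len(corpus) / df) if df else 0.0

def impression(term, slide, corpus, title_weight=3.0):
    """Boost keywords appearing in the slide title, on the assumption that
    titles leave a stronger impression than body text."""
    in_title = term in slide["title"].lower().split()
    return tfidf(term, slide, corpus) + (title_weight if in_title else 0.0)

for s in slides:
    print(s["title"], tfidf("b-tree", s, slides), impression("b-tree", s, slides))
```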

  • MARK-OPT: A Concurrency Control Protocol for Parallel B-Tree Structures to Reduce the Cost of SMOs

    Tomohiro YOSHIHARA  Dai KOBAYASHI  Haruo YOKOTA  

     
    PAPER-Database

      Vol:
    E90-D No:8
      Page(s):
    1213-1224

    In this paper, we propose a new concurrency control protocol for parallel B-tree structures that is capable of reducing the cost of structure modification operations (SMOs) compared with conventional protocols such as ARIES/IM and INC-OPT. We call this protocol the MARK-OPT protocol because it marks the lowest SMO occurrence point during optimistic latch-coupling operations. The marking reduces the intermediate phases needed to spread an X latch and removes needless X latches. In addition, we propose three variations of MARK-OPT that focus on tree structure changes made by other transactions. Moreover, the proposed protocols are deadlock-free and satisfy the physical consistency requirements for B-trees. These properties indicate that the proposed protocols are suitable as concurrency control protocols for B-tree structures. To compare the performance of the proposed protocols, INC-OPT, and ARIES/IM, we implemented them on an autonomous disk system adopting the Fat-Btree structure, a form of parallel B-tree. Experimental results in various environments indicate that the proposed protocols always improve system throughput and that 2P-REP-MARK-OPT is the most useful protocol in high-update environments. Additionally, to mitigate access skew, data should be migrated between PEs. We also demonstrate that MARK-OPT improves system throughput during data migration and reduces the time needed for data migration to balance the load distribution.
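
    The marking idea can be sketched on a toy tree: during the optimistic descent, remember the deepest node on the search path that cannot be affected by a split below it, so exclusive latches are later needed only from that point downward. The node layout, the capacity, and the omission of actual latching, restarts, and the protocol variations are simplifications assumed for this sketch.

```python
# Minimal sketch of the "marking" idea during an optimistic descent.
# Real MARK-OPT also handles latching, restarts, and deadlock freedom.
from dataclasses import dataclass, field

CAPACITY = 3                      # max keys per node in this toy tree

@dataclass
class Node:
    keys: list = field(default_factory=list)
    children: list = field(default_factory=list)   # empty => leaf

    def is_safe(self) -> bool:
        """A node that is not full absorbs a child split without splitting."""
        return len(self.keys) < CAPACITY

def descend_with_mark(root, key):
    """Walk root-to-leaf (S-latched in the real protocol) and return the
    search path plus the index of the mark: the deepest safe node on it."""
    path, mark, node = [], 0, root
    while True:
        path.append(node)
        if node.is_safe():
            mark = len(path) - 1          # splits cannot propagate above here
        if not node.children:
            return path, mark
        i = sum(1 for k in node.keys if k <= key)
        node = node.children[i]

# X latches for an insert that splits the leaf are taken only on path[mark:].
leaf_a, leaf_b = Node(keys=[1, 2, 3]), Node(keys=[7, 8])
root = Node(keys=[5], children=[leaf_a, leaf_b])
for key in (2, 7):
    path, mark = descend_with_mark(root, key)
    print(f"insert {key}: X-latch {len(path) - mark} of {len(path)} path nodes")
```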

  • Software Cache Techniques for Memory Nodes in Distributed Memory Parallel Production Systems

    Jun MIYAZAKI   Haruo YOKOTA  

     
    PAPER-Architectures

      Vol:
    E79-D No:8
      Page(s):
    1046-1054

    Because the match phase in OPS5-type production systems accounts for most of the system's execution time and memory accesses, we previously proposed CPPS (Clustered Parallel Production Systems), hash-based parallel production systems based on the RETE algorithm for distributed-memory parallel computers (multicomputers), to reduce this bottleneck. CPPS was effective in speeding up the match phase but still left room for optimization. In this paper, we introduce software cache techniques for the memory nodes in CPPS as one such optimization and implement them on an nCUBE2 multicomputer. The benchmark results show that CPPS with the software cache is about 2-fold faster than the original and more than 7-fold faster than the simple hash method proposed by Acharya et al. for a large-scale problem. The speed-up can be attributed to decreased communication costs.
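
    The role of a software cache at a memory node can be illustrated with a small LRU cache placed in front of a token store. The store, the keys, the cache size, and the replacement policy below are assumptions made for the sketch, not the CPPS implementation.

```python
# Sketch of a software cache a memory node might keep in front of its local
# token store; the store, keys, and LRU policy are illustrative assumptions.
from collections import OrderedDict

class SoftwareCache:
    """Tiny LRU cache: recently matched tokens stay resident so repeated
    match-phase lookups avoid touching the slower backing store."""
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = self.misses = 0

    def lookup(self, key, backing_store):
        if key in self.entries:
            self.entries.move_to_end(key)       # refresh LRU position
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        value = backing_store[key]              # expensive remote/local fetch
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)    # evict least recently used
        return value

store = {("goal", i): f"token-{i}" for i in range(10)}   # stand-in token memory
cache = SoftwareCache(capacity=4)
for i in [0, 1, 0, 2, 0, 1, 5, 0]:                       # skewed access pattern
    cache.lookup(("goal", i), store)
print(f"hits={cache.hits} misses={cache.misses}")        # hits=4 misses=4
```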

  • FOREWORD

    Haruo YOKOTA  

     
    FOREWORD

      Vol:
    E86-D No:12
      Page(s):
    2501-2502

  • Concurrency Control and Performance Evaluation of Parallel B-tree Structures

    Jun MIYAZAKI  Haruo YOKOTA  

     
    PAPER-Databases

      Vol:
    E85-D No:8
      Page(s):
    1269-1283

    The Fat-Btree, a new parallel B-tree structure, has been proposed to improve the access performance of shared-nothing parallel database systems. Because the Fat-Btree holds only part of the index nodes on each processing element, it can reduce the synchronization cost of update operations. For these reasons, both retrieval and update operations can be processed at higher throughput than with previously proposed parallel B-tree structures for shared-nothing computers. We tried to apply conventional concurrency control methods designed for shared-everything machines, e.g., B-OPT and ARIES/IM, to the Fat-Btree, but found that these methods are not always appropriate for it. In this paper, we show that the conventional methods are not suitable for the Fat-Btree and other parallel B-trees, and we propose a new deadlock-free concurrency control protocol, named INC-OPT, that improves the performance of the Fat-Btree more effectively than B-OPT and ARIES/IM. Furthermore, to demonstrate the impact of the Fat-Btree on the performance of shared-nothing parallel databases, we compare the real performance of three types of parallel B-tree structures, the Fat-Btree, Copy-Whole-Btree, and Single-Index-Btree, on an nCUBE3 machine with INC-OPT applied.
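
    The placement property that makes the Fat-Btree attractive, namely that each PE holds only the index nodes on the root-to-leaf paths of its own leaves, can be illustrated with a toy tree. The tree shape and the leaf-to-PE assignment below are assumptions made for the sketch.

```python
# Sketch of the Fat-Btree placement idea: each PE keeps only the index nodes
# lying on root-to-leaf paths of the leaves it stores.  The fixed toy tree
# and the leaf-to-PE assignment are assumptions for illustration.
tree = {                         # parent -> children (leaves are L*)
    "root": ["I1", "I2"],
    "I1": ["L1", "L2"],
    "I2": ["L3", "L4"],
}
leaf_owner = {"L1": "PE0", "L2": "PE0", "L3": "PE1", "L4": "PE1"}

def path_to(leaf, node="root", path=()):
    """Return the root-to-leaf path as a tuple of node names."""
    path = path + (node,)
    if node == leaf:
        return path
    for child in tree.get(node, []):
        found = path_to(leaf, child, path)
        if found:
            return found
    return None

index_nodes_on = {}
for leaf, pe in leaf_owner.items():
    # every internal node on the path must be present on the owning PE
    index_nodes_on.setdefault(pe, set()).update(path_to(leaf)[:-1])

for pe, nodes in sorted(index_nodes_on.items()):
    print(pe, sorted(nodes))
# PE0 ['I1', 'root'] / PE1 ['I2', 'root'] -- only the root is replicated on
# both PEs, so most index updates are synchronized within a single PE.
```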

  • Concurrency Control Protocol for Parallel B-Tree Structures That Improves the Efficiency of Request Transfers and SMOs within a Node

    Tomohiro YOSHIHARA  Dai KOBAYASHI  Haruo YOKOTA  

     
    PAPER-Data Engineering, Web Information Systems

      Publicized:
    2017/10/18
      Vol:
    E101-D No:1
      Page(s):
    152-170

    Many concurrency control protocols for B-trees use latch-coupling because its execution is efficient on a single machine. Some studies have indicated that latch-coupling may become a performance bottleneck when using multicore processors in a shared-everything environment, but no studies have considered the possible bottleneck caused by sending messages between processing elements (PEs) in shared-nothing environments. We propose two new concurrency control protocols, "LCFB" and "LCFB-link", which require no latch-coupling in optimistic processes. LCFB-link additionally applies a B-link approach within each PE to reduce the cost of modifications in the PE, as a solution to the difficulty of managing the consistency of the side pointers in a parallel B-tree. The B-link algorithm is well known as a protocol without latch-coupling, but it has difficulty guaranteeing the consistency of the side pointers in a parallel B-tree. Experimental results in various environments indicate that the system throughput of the proposed protocols is always superior to that of the conventional protocols, particularly in large-scale configurations, and that LCFB-link is effective for higher update ratios. In addition, to mitigate access skew, data should migrate between PEs. We have demonstrated that our protocols always improve the system throughput and are effective as concurrency controls for data migration.
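
    The B-link-style traversal that LCFB-link builds on can be sketched as follows: a reader releases the parent before reading the child, and if a concurrent split has already moved its key to a right sibling, it follows the side pointer instead of restarting. The node fields and the single-threaded simulation of a stale parent are assumptions made for this sketch.

```python
# Sketch of a B-link-style traversal without latch-coupling.  A split that
# races ahead of the reader is recovered via the side pointer.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    keys: list
    high_key: float                       # upper bound of keys stored here
    right: Optional["Node"] = None        # side (B-link) pointer to right sibling
    children: list = field(default_factory=list)

def search_leaf(node, key):
    while True:
        # no latch on the parent is held here; a split may have raced ahead
        while key > node.high_key and node.right is not None:
            node = node.right             # chase the side pointer, no restart
        if not node.children:
            return node
        i = sum(1 for k in node.keys if k <= key)
        node = node.children[i]

# a leaf [1..4] has split into [1,2] and [3,4], but the parent has not yet
# been updated with the new separator, so routing is momentarily stale
leaf_r = Node(keys=[3, 4], high_key=float("inf"))
leaf_l = Node(keys=[1, 2], high_key=2, right=leaf_r)
root = Node(keys=[], high_key=float("inf"), children=[leaf_l])
print(search_leaf(root, 4).keys)          # [3, 4] -- found via the side pointer
```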

  • NDCouplingHDFS: A Coupling Architecture for a Power-Proportional Hadoop Distributed File System

    Hieu Hanh LE  Satoshi HIKIDA  Haruo YOKOTA  

     
    PAPER-Data Engineering, Web Information Systems

      Vol:
    E97-D No:2
      Page(s):
    213-222

    Energy-aware distributed file systems are increasingly moving toward power-proportional designs. However, current work has not considered the cost of updating data sets that were modified in a low-power mode, in which a subset of nodes is powered off. Specifically, when the system moves to a high-power mode, it must internally replicate the updated data to the reactivated nodes. Effectively reflecting the updated data is vital to making a distributed file system, such as the Hadoop Distributed File System (HDFS), power-proportional. In the current HDFS design, when the system changes power mode, the block replication process is bottlenecked at the single NameNode because of congested access to block metadata. This paper presents a novel architecture, the NameNode and DataNode Coupling Hadoop Distributed File System (NDCouplingHDFS), which effectively reflects the updated blocks when the system moves into high-power mode. This is achieved by coupling metadata management and data management at each node, thereby localizing the range of blocks maintained by each metadata manager. Experiments using actual machines show that NDCouplingHDFS significantly reduces the execution time required to move updated blocks, by 46% relative to normal HDFS. Moreover, NDCouplingHDFS is capable of increasing the throughput of the system supporting MapReduce by applying an index in metadata management.
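
    The coupling idea can be sketched by assigning each node the metadata for its own block range, so that locating blocks updated during low-power mode becomes a local scan per node rather than a request to one central NameNode. The block ranges, the update log, and the replication step below are assumptions made for the sketch.

```python
# Sketch of coupled metadata management: each node owns the metadata for its
# own block range.  Block ranges, update log, and the replication step are
# illustrative assumptions, not the NDCouplingHDFS implementation.
NODES = ["node0", "node1", "node2"]

def owner(block_id):
    """Localized metadata management: block ranges are coupled to nodes."""
    return NODES[block_id // 100]          # e.g. blocks 0-99 -> node0, ...

# blocks written while node2 was powered off, with the node currently
# holding the only up-to-date replica of each block
updated_blocks = {12: "node0", 150: "node1", 7: "node0"}

# on switching to high-power mode, every node finds the updated blocks it is
# responsible for by scanning only its *local* metadata
per_node_work = {n: [] for n in NODES}
for block_id, holder in updated_blocks.items():
    per_node_work[owner(block_id)].append((block_id, holder))

for node, work in per_node_work.items():
    for block_id, holder in work:
        print(f"{node}: replicate block {block_id} from {holder} to reactivated nodes")
```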

  • Accordion: An Efficient Gear-Shifting for a Power-Proportional Distributed Data-Placement Method

    Hieu Hanh LE  Satoshi HIKIDA  Haruo YOKOTA  

     
    PAPER

      Pubricized:
    2015/01/21
      Vol:
    E98-D No:5
      Page(s):
    1013-1026

    Power-aware distributed file systems for efficient Big Data processing are increasingly moving towards power-proportional designs. However, current data placement methods for such systems have not given careful consideration to the effect of gear-shifting during operation. If the system wants to shift to a higher gear, it must reallocate the updated datasets that were modified in a lower gear, when a subset of the nodes was inactive, without disrupting the servicing of requests from clients. Inefficient gear-shifting that requires a large amount of data reallocation greatly degrades system performance. To address this challenge, this paper proposes a data placement method known as Accordion, which uses data replication to arrange the data layout comprehensively and provide efficient gear-shifting. Compared with current methods, Accordion reduces the amount of data transferred, which significantly shortens the period required to reallocate the updated data during gear-shifting and thereby improves system performance. The effect of this reduction is larger at higher gears, so Accordion is suitable for smooth gear-shifting in multigear systems. Moreover, the times when the active nodes serve requests are well distributed, so Accordion is capable of higher scalability than existing methods in terms of I/O throughput. Accordion does not impose any strict constraint on the number of nodes in the system; therefore, our proposed method is expected to work well in practical environments. Extensive empirical experiments using actual machines with an Accordion prototype based on the Hadoop Distributed File System demonstrated that our proposed method significantly reduced the period required to transfer updated data, i.e., by 66% compared with an existing method.
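
    One way to picture a gear-oriented layout is to confine the i-th replica of each block to the i-th node group, so that running only the first k groups (gear k) still keeps one replica of every block online. The group sizes, replica count, and hashing below are assumptions made for this sketch, not Accordion's actual placement rules.

```python
# Sketch of a gear-oriented replica layout: the i-th replica of a block is
# confined to the i-th node group, so gear k (first k groups active) keeps
# one replica of everything online.  Sizes and hashing are assumptions.
GROUPS = [["n0"], ["n1", "n2"], ["n3", "n4", "n5"]]   # gears 1, 2, 3 add groups

def place(block_id):
    """Return one replica location per group for this block."""
    return [group[hash(block_id) % len(group)] for group in GROUPS]

def active_nodes(gear):
    return {n for group in GROUPS[:gear] for n in group}

blocks = {b: place(b) for b in range(6)}
for gear in (1, 2, 3):
    online = active_nodes(gear)
    # every block keeps at least one replica on an active node in every gear
    assert all(any(r in online for r in replicas) for replicas in blocks.values())
    print(f"gear {gear}: {sorted(online)} serve all {len(blocks)} blocks")
```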

  • A Compound Parallel Btree for High Scalability and Availability on Chained Declustering Parallel Systems

    Min LUO  Akitsugu WATANABE  Haruo YOKOTA  

     
    PAPER

      Vol:
    E94-D No:3
      Page(s):
    587-601

    Scalability and availability are the key features of parallel database systems. To realize scalability, many dynamic load-balancing methods with data placement and parallel index structures on shared-nothing parallel infrastructure have been proposed. Data migration with range-partitioned placement using a parallel Btree is one solution. The combination of range partitioning and chained declustered replicas provides high availability (HA) while preserving scalability. However, independent treatment of the primary and backup data in each node requires long failover times. We propose a novel method for the compound treatment of chained declustered replicas using a parallel Btree, termed the Fat-Btree. In the proposed method, a single Fat-Btree provides access paths to both the primary and backup data of all processor elements (PEs), which greatly reduces failover time. Moreover, these access paths overlap between two neighboring PEs, which enables dynamic load balancing without physical data migration by dynamically redirecting the access paths. In addition, this compound treatment improves memory space utilization to enable index processing with good scalability. Experiments using PostgreSQL on a 160-node PC cluster demonstrate the effectiveness of the high scalability and availability of our proposed method.
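
    The routing behind the compound treatment can be sketched with chained declustering: PE i stores the primary of partition i and the backup of partition i-1, so when a PE fails its partition remains reachable on the next PE in the chain through the shared index. The partition count and the routing rule below are illustrative assumptions.

```python
# Sketch of chained declustering routing under one shared index.  The
# partition count and routing rule are assumptions for illustration only.
N_PE = 4

def holders(partition):
    """PEs holding this partition: (primary, backup) under chained declustering,
    where PE i keeps the backup of partition i-1."""
    return partition % N_PE, (partition + 1) % N_PE

def route(partition, failed=frozenset()):
    """Pick a live PE whose access paths (in the compound Fat-Btree) cover
    this partition; with both copies indexed, failover is just re-routing."""
    for pe in holders(partition):
        if pe not in failed:
            return pe
    raise RuntimeError("both replicas unavailable")

print([route(p) for p in range(N_PE)])                 # primaries: [0, 1, 2, 3]
print([route(p, failed={2}) for p in range(N_PE)])     # partition 2 -> PE 3
```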