Author Search Result

[Author] Kazuo AOYAMA (5 hits)

Hits 1-5
  • Efficient K-Nearest Neighbor Graph Construction Using MapReduce for Large-Scale Data Sets

    Tomohiro WARASHINA  Kazuo AOYAMA  Hiroshi SAWADA  Takashi HATTORI  

     
    PAPER-Data Engineering, Web Information Systems

    Vol: E97-D No:12
    Page(s): 3142-3154

    This paper presents an efficient method using Hadoop MapReduce for constructing a K-nearest neighbor graph (K-NNG) from a large-scale data set. K-NNG has been utilized as a data structure for data analysis techniques in various applications. Applying these techniques to a large-scale data set calls for an efficient K-NNG construction method. We focus on NN-Descent, a recently proposed method that efficiently constructs an approximate K-NNG. NN-Descent is implemented on a shared-memory system with OpenMP-based parallelization, and its extension to the Hadoop MapReduce framework is suggested for data sets too large for a shared-memory system to handle. However, a simple extension to the Hadoop MapReduce framework is impractical: it demands extremely high system performance because of the high memory consumption and low data transmission efficiency of its MapReduce jobs. The proposed method relaxes this requirement by improving the MapReduce jobs with an appropriate key-value pair format and an efficient sampling strategy. Experiments on large-scale data sets demonstrate that the proposed method works efficiently and scales with the data size, the number of machine nodes, and the graph structural parameter K.

  • Efficient Similarity Search with a Pivot-Based Complete Binary Tree

    Yuki YAMAGISHI  Kazuo AOYAMA  Kazumi SAITO  Tetsuo IKEDA  

     
    PAPER-Data Engineering, Web Information Systems

    Publicized: 2017/07/04
    Vol: E100-D No:10
    Page(s): 2526-2536

    This paper presents an efficient similarity search method for large-scale, high-dimensional data sets that uses as its index a complete binary tree (CBT) based on optimized pivots. A similarity search method generally requires high-speed performance in both offline index construction and online similarity search. To meet this requirement, we introduce novel techniques into the index construction algorithm and the similarity search algorithm of the proposed method for a range query. The index construction algorithm recursively applies two main functions, resulting in a CBT index. One is a pivot generation function that obtains one effective pivot at each node by efficiently maximizing a defined objective function. The other is a node bisection function that partitions the set of objects at a node into two almost equal-sized subsets based on the optimized pivot. The similarity search algorithm employs a three-stage process that narrows down candidate objects within a given range by pruning unnecessary branches and filtering objects in each stage. Experimental results on a high-dimensional real image data set of one million objects demonstrate that the proposed method finds an exact solution for a range query at around one-quarter to one-half of the computational cost of one of the state-of-the-art methods, using a CBT index constructed offline at a reasonable computational cost.
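The node bisection step described in this abstract can be illustrated with a minimal sketch. It assumes Euclidean distance and a randomly chosen pivot; the paper optimizes each pivot against an objective function that the abstract does not specify, so the random choice here is only a stand-in. The names `bisect_node` and `build_tree` are hypothetical.

```python
import math
import random

def bisect_node(objects, pivot):
    """Split objects into two almost equal-sized halves by distance to the pivot."""
    ranked = sorted(objects, key=lambda x: math.dist(x, pivot))
    mid = (len(ranked) + 1) // 2  # the near half gets the extra object when the count is odd
    return ranked[:mid], ranked[mid:]

def build_tree(objects, leaf_size=2, rng=None):
    """Recursively bisect nodes, yielding a balanced binary tree of nested dicts."""
    rng = rng or random.Random(0)
    if len(objects) <= leaf_size:
        return {"leaf": objects}
    # Assumption: a random pivot stands in for the paper's optimized pivot.
    pivot = rng.choice(objects)
    near, far = bisect_node(objects, pivot)
    return {
        "pivot": pivot,
        "radius": math.dist(near[-1], pivot),  # largest distance inside the near half
        "near": build_tree(near, leaf_size, rng),
        "far": build_tree(far, leaf_size, rng),
    }
```

Splitting at the median distance keeps the two subsets within one object of each other in size, which is what makes the resulting tree balanced regardless of how the data are distributed.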

  • Accelerating a Lloyd-Type k-Means Clustering Algorithm with Summable Lower Bounds in a Lower-Dimensional Space

    Kazuo AOYAMA  Kazumi SAITO  Tetsuo IKEDA  

     
    PAPER-Artificial Intelligence, Data Mining

    Publicized: 2018/08/02
    Vol: E101-D No:11
    Page(s): 2773-2783

    This paper presents an efficient acceleration algorithm for Lloyd-type k-means clustering that is suitable for large-scale, high-dimensional data sets with potentially numerous classes. The algorithm employs a novel projection-based filter (PRJ) to avoid unnecessary distance calculations, achieving high speed while producing the same results as the standard Lloyd's algorithm. The PRJ exploits a summable lower bound on a squared distance, defined in a lower-dimensional space to which the data points are projected. The summable lower bound can be tightened dynamically by incrementally adding components in the lower-dimensional space within each iteration, whereas the existing lower bounds used in other acceleration algorithms work only once as a fixed filter. Experimental results on large-scale, high-dimensional real image data sets demonstrate that the proposed algorithm runs at high speed and with low memory consumption for large k values, compared with the state-of-the-art algorithms.
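The idea of a lower bound that tightens as components are added can be sketched as follows. This is not the paper's PRJ filter: it only assumes that the projection uses an orthonormal basis, in which case the partial sum of squared component differences never exceeds the full squared distance. The function names and the `basis` argument are hypothetical.

```python
def project(v, basis):
    """Coordinates of v along each vector of an orthonormal basis."""
    return [sum(a * b for a, b in zip(v, u)) for u in basis]

def nearest_centroid(x, centroids, basis):
    """Return (index of nearest centroid, number of exact distance calculations)."""
    px = project(x, basis)
    projected = [project(c, basis) for c in centroids]
    best, best_d2, exact = -1, float("inf"), 0
    for j, c in enumerate(centroids):
        lower = 0.0
        pruned = False
        for a, b in zip(px, projected[j]):
            lower += (a - b) ** 2  # each added component can only raise the bound
            if lower >= best_d2:   # the bound already rules this centroid out
                pruned = True
                break
        if pruned:
            continue
        d2 = sum((a - b) ** 2 for a, b in zip(x, c))  # exact squared distance
        exact += 1
        if d2 < best_d2:
            best, best_d2 = j, d2
    return best, exact
```

Because a centroid is skipped only when its lower bound is at least the best squared distance found so far, the assignment is the same as the one an exhaustive scan would produce.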

  • Pivot Generation Algorithm with a Complete Binary Tree for Efficient Exact Similarity Search

    Yuki YAMAGISHI  Kazuo AOYAMA  Kazumi SAITO  Tetsuo IKEDA  

     
    PAPER-Data Engineering, Web Information Systems

    Publicized: 2017/10/20
    Vol: E101-D No:1
    Page(s): 142-151

    This paper presents a pivot-set generation algorithm for accelerating exact similarity search in a large-scale data set. To deal with a large-scale data set, it is important both to construct a search index efficiently offline and to perform fast exact similarity search online. Our proposed algorithm efficiently generates competent pivots with two novel techniques: hierarchical data partitioning and fast pivot optimization. To make effective use of a small number of pivots, the former recursively partitions a data set into two subsets of the same size according to the rank order from each of two assigned pivots, resulting in a complete binary tree. The latter calculates a defined objective function for pivot optimization at a low computational cost by efficiently manipulating data objects mapped into a pivot space. Since the generated pivots provide tight lower bounds on the distances between a query object and the data objects, an exact similarity search algorithm effectively avoids unnecessary distance calculations. We demonstrate that the search algorithm using the pivots generated by the proposed algorithm reduces distance calculations at an extremely high rate for range queries on real large-scale image data sets.
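How pivots yield lower bounds that avoid distance calculations can be shown with a generic pivot-table filter. The sketch assumes Euclidean distance and takes the pivots as given; it does not reproduce the paper's pivot generation or its tree-based partitioning. The names `build_pivot_table` and `range_query` are hypothetical.

```python
import math

def build_pivot_table(objects, pivots):
    """Offline step: store the distance from every object to every pivot."""
    return [[math.dist(o, p) for p in pivots] for o in objects]

def range_query(q, r, objects, pivots, table):
    """Return (objects within distance r of q, number of exact distance calculations)."""
    dq = [math.dist(q, p) for p in pivots]
    hits, computed = [], 0
    for o, row in zip(objects, table):
        # Triangle inequality: |d(q, p) - d(o, p)| <= d(q, o) for every pivot p.
        lower = max(abs(a - b) for a, b in zip(dq, row))
        if lower > r:
            continue  # o cannot lie within the query range, so skip the exact distance
        computed += 1
        if math.dist(q, o) <= r:
            hits.append(o)
    return hits, computed
```

The filter never discards a true answer, so the search stays exact; the tighter the lower bounds that the pivots provide, the fewer exact distances remain to be computed.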

  • Dependence of CMOS/SIMOX Inverter Delay Time on Gate Overlap Capacitance

    Takakuni DOUSEKI  Kazuo AOYAMA  Yasuhisa OMURA  

     
    PAPER-Electronic Circuits

    Vol: E76-C No:8
    Page(s): 1325-1332

    This paper describes the dependence of the delay time of a CMOS/SIMOX inverter on the gate-overlap capacitance. An analytical delay-time equation for the CMOS/SIMOX inverter, which includes the gate-overlap capacitance, is derived. This equation shows that the feed-forward effect dominates the characteristics of inverters with a small fanout. The validity of the delay-time equation is confirmed by comparison with experimental measurements of 0.4-µm CMOS/SIMOX devices. Moreover, a sensitivity analysis shows that reducing the gate-drain overlap capacitance is very important for fabricating high-speed scaled-down CMOS/SIMOX devices.