IEICE global.ieice.org Site

Keyword Search Result

[Keyword] pivot (6 hits)

Pivot Generation Algorithm with a Complete Binary Tree for Efficient Exact Similarity Search
Yuki YAMAGISHI Kazuo AOYAMA Kazumi SAITO Tetsuo IKEDA

PAPER-Data Engineering, Web Information Systems

Publicized: 2017/10/20
Vol: E101-D No:1
Page(s): 142-151
This paper presents a pivot-set generation algorithm for accelerating exact similarity search in a large-scale data set. To handle such a data set, it is important both to construct a search index efficiently offline and to perform fast exact similarity search online. Our proposed algorithm efficiently generates competent pivots with two novel techniques: hierarchical data partitioning and fast pivot optimization. To make effective use of a small number of pivots, the former recursively partitions a data set into two subsets of the same size depending on the rank order from each of two assigned pivots, resulting in a complete binary tree. The latter calculates a defined objective function for pivot optimization at a low computational cost by skillfully operating on data objects mapped into a pivot space. Since the generated pivots provide tight lower bounds on distances between a query object and the data objects, an exact similarity search algorithm effectively avoids unnecessary distance calculations. We demonstrate that the search algorithm using the pivots generated by the proposed algorithm reduces distance calculations at an extremely high rate for a range query problem on real large-scale image data sets.
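The lower-bound idea mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the pivots here are chosen by hand rather than optimized, and the names `build_pivot_table` and `range_query` are hypothetical. It only shows how precomputed object-to-pivot distances let a range query skip exact distance calculations through the triangle inequality, assuming a metric distance (Euclidean in this example).

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_pivot_table(data, pivots, dist=euclidean):
    """Offline step: distance from every data object to every pivot."""
    return [[dist(obj, p) for p in pivots] for obj in data]

def range_query(query, radius, data, pivots, table, dist=euclidean):
    """Return (indices within radius, number of exact distances computed)."""
    q_to_p = [dist(query, p) for p in pivots]
    hits, computed = [], 0
    for idx, obj in enumerate(data):
        # Triangle inequality: |d(q,p) - d(o,p)| <= d(q,o) for every pivot p.
        lower = max(abs(qp - op) for qp, op in zip(q_to_p, table[idx]))
        if lower > radius:
            continue  # object cannot be in range; skip the exact distance
        computed += 1
        if dist(query, obj) <= radius:
            hits.append(idx)
    return hits, computed

# Toy data and hand-picked pivots (assumptions for illustration only).
data = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (9.0, 9.0)]
pivots = [(0.0, 0.0), (10.0, 10.0)]
table = build_pivot_table(data, pivots)
hits, computed = range_query((0.5, 0.0), 1.0, data, pivots, table)
print(hits, computed)
```

In this toy run only two of the four objects need an exact distance calculation; tighter lower bounds, which the paper attributes to its generated pivots, would filter more objects.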
Efficient Similarity Search with a Pivot-Based Complete Binary Tree
Yuki YAMAGISHI Kazuo AOYAMA Kazumi SAITO Tetsuo IKEDA

PAPER-Data Engineering, Web Information Systems

Publicized: 2017/07/04
Vol: E100-D No:10
Page(s): 2526-2536
This paper presents an efficient similarity search method for a large-scale, high-dimensional data set that uses as its index a complete binary tree (CBT) based on optimized pivots. A similarity search method, in general, requires high-speed performance in both off-line index construction and the online similarity search itself. To fulfill this requirement, we introduce novel techniques into the index construction algorithm and the similarity search algorithm of the proposed method for a range query. The index construction algorithm recursively employs the following two main functions, resulting in a CBT index. One is a pivot generation function that obtains one effective pivot at each node by efficiently maximizing a defined objective function. The other is a node bisection function that partitions a set of objects at a node into two almost equal-sized subsets based on the optimized pivot. The similarity search algorithm employs a three-stage process that narrows down candidate objects within a given range by pruning unnecessary branches and filtering objects in each stage. Experimental results on a real, high-dimensional image data set of one million objects demonstrate that the proposed method finds an exact solution for a range query at around one-quarter to one-half of the computational cost of one of the state-of-the-art methods, by using a CBT index constructed off-line at a reasonable computational cost.
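The node bisection step described in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the published method: the pivot at each node is just the first object (a placeholder for the paper's objective-function optimization), and `build_tree` and `collect` are hypothetical names. Objects are ranked by distance to the pivot and split at the median, which yields two almost equal-sized subsets at every node.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_tree(objects, leaf_size=1, dist=euclidean):
    """Recursively bisect objects by rank order of distance to a pivot."""
    if len(objects) <= leaf_size:
        return {"leaf": list(objects)}
    pivot = objects[0]  # placeholder choice, NOT the paper's optimized pivot
    ranked = sorted(objects, key=lambda o: dist(o, pivot))
    mid = len(ranked) // 2
    return {
        "pivot": pivot,
        # Every object in "inner" is at most this far from the pivot,
        # and every object in "outer" is at least this far.
        "split": dist(ranked[mid], pivot),
        "inner": build_tree(ranked[:mid], leaf_size, dist),
        "outer": build_tree(ranked[mid:], leaf_size, dist),
    }

def collect(node):
    """Flatten a (sub)tree back into a list of its objects."""
    if "leaf" in node:
        return list(node["leaf"])
    return collect(node["inner"]) + collect(node["outer"])

points = [(float(i), 0.0) for i in range(8)]
tree = build_tree(points)
print(len(collect(tree["inner"])), len(collect(tree["outer"])))
```

The stored `split` distance is what a range search could use to prune a branch: if the query's distance to the pivot differs from `split` by more than the search radius, one side cannot contain any answer.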
Nearest Neighbor Search with the Revised TLAESA
Dong WANG Hiroyuki MITSUHARA Masami SHISHIBORI

PAPER

Vol: E98-D No:1
Page(s): 65-77
It is important to develop better search methods to handle the rapidly increasing volume of multimedia data. For NN (Nearest Neighbor) search in metric spaces, the TLAESA (Tree Linear Approximating and Eliminating Search Algorithm) is a state-of-the-art fast search method. This paper proposes a method that improves the TLAESA by revising the tree structure, placing an optimal number of selected global pivots in the higher levels as representatives, and employing the best-first search strategy. Building on an improved version of the TLAESA that uses the best-first search strategy to greatly reduce distance calculations, this method addresses its drawback: fewer calculations come at the price of a lower pruning rate of branches. The lower pruning rate can in turn lead to lower search efficiency, because the priority queue used in the best-first search strategy stores the information of visited but unpruned nodes and needs to be frequently accessed and sorted. To enhance the pruning rate of branches, the improved method tries to locate more of the selected global pivots in the higher levels of the search tree as representatives. Because more real distances, instead of lower-bound estimations of the node representatives, are used for approximating the closest node and for "branch and bound", the method can evaluate more effectively which nodes are close to the query object and can also enhance the pruning rate of branches. Experiments show that for k-NN queries in Euclidean space, with a proper pivot selection strategy, the proposed method can match the minimal number of distance calculations achieved by the LAESA (Linear Approximating and Eliminating Search Algorithm), which requires fewer calculations than the TLAESA, and can achieve a higher search efficiency than the TLAESA.
Optimal Pivot Selection Method Based on the Partition and the Pruning Effect for Metric Space Indexes
Hisashi KURASAWA Daiji FUKAGAWA Atsuhiro TAKASU Jun ADACHI

PAPER

Vol: E94-D No:3
Page(s): 504-514
This paper proposes a new method to reduce the cost of nearest neighbor searches in metric spaces. Many similarity search indexes recursively divide a region into subregions by using pivots, and construct a tree-structured index. Most recently developed indexes focus on pruning objects and do not pay much attention to tree balancing. As a result, indexes with an imbalanced tree structure may be constructed, which increases the search cost. We propose a similarity search index called the Partitioning Capacity (PC) Tree. It selects the optimal pivot in terms of the PC, which quantifies the balance of the regions partitioned by a pivot as well as the estimated effectiveness of the search pruning by the pivot. As a result, the PCTree reduces the search cost for various data distributions. We experimentally compared the PCTree with four indexes using synthetic data and five real datasets. The experimental results show that the PCTree successfully reduces the search cost.
Efficient Storage and Querying of Horizontal Tables Using a PIVOT Operation in Commercial Relational DBMSs
Sung-Hyun SHIN Yang-Sae MOON Jinho KIM Sang-Wook KIM

PAPER-Database

Vol: E91-D No:6
Page(s): 1719-1729
In recent years, horizontal tables with a large number of attributes have been widely used in OLAP and e-business applications to analyze multidimensional data efficiently. For efficient storing and querying of horizontal tables, recent works have tried to transform a horizontal table into a traditional vertical table. Existing works, however, have the drawback of not considering an optimized PIVOT operation provided (or to be provided) in recent commercial RDBMSs. In this paper we propose a formal approach that exploits the optimized PIVOT operation of commercial RDBMSs for storing and querying horizontal tables. To achieve this goal, we first provide an overall framework that stores and queries a horizontal table using an equivalent vertical table. Under the proposed framework, we then formally define 1) a method that stores a horizontal table in an equivalent vertical table and 2) a PIVOT operation that converts a stored vertical table to an equivalent horizontal view. Next, we propose a novel method that transforms a user-specified query on horizontal tables to an equivalent PIVOT-included query on vertical tables. In particular, by providing transformation rules for all five elementary operations in relational algebra as theorems, we prove that our method is theoretically applicable to commercial RDBMSs. Experimental results show that, compared with the earlier work, our method reduces storage space significantly and also improves average performance by several orders of magnitude. These results indicate that our method provides an excellent framework to maximize performance in handling horizontal tables by exploiting the optimized PIVOT operation in commercial RDBMSs.
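The relationship between a horizontal table and its vertical equivalent can be illustrated in plain Python. This is a toy sketch, not the paper's formal definitions or any RDBMS's PIVOT implementation; the functions `to_vertical` and `pivot`, the `id` key column, and the sample attributes are all assumptions made for illustration. The vertical form stores one (id, attribute, value) triple per non-NULL cell, and the pivot step rebuilds the horizontal view.

```python
def to_vertical(rows, key="id"):
    """Store a horizontal table as (id, attribute, value) triples,
    dropping NULL (None) cells."""
    return [
        (row[key], attr, val)
        for row in rows
        for attr, val in row.items()
        if attr != key and val is not None
    ]

def pivot(triples, attributes, key="id"):
    """Rebuild the horizontal view; missing cells become None."""
    out = {}
    for oid, attr, val in triples:
        row = out.setdefault(oid, {key: oid, **{a: None for a in attributes}})
        row[attr] = val
    return [out[oid] for oid in sorted(out)]

# Hypothetical sparse horizontal table.
horizontal = [
    {"id": 1, "color": "red", "size": None, "weight": 2.5},
    {"id": 2, "color": None, "size": "L", "weight": None},
]
vertical = to_vertical(horizontal)
restored = pivot(vertical, ["color", "size", "weight"])
print(vertical)
print(restored == horizontal)
```

Because NULL cells are not stored, a sparse horizontal table takes fewer vertical rows than it has cells. One limitation of this toy version is that a row whose attributes are all NULL leaves no triples and so disappears from the restored view.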
Structuring Search Space for Accelerating Large Set Character Recognition
Yiping YANG Bilan ZHU Masaki NAKAGAWA

PAPER-Search Space for Character Recognition

Vol: E88-D No:8
Page(s): 1799-1806
This paper proposes a "structuring search space" (SSS) method that aims to accelerate recognition of large character sets. We divide the feature space of character categories into smaller clusters and derive the centroid of each cluster as a pivot. An input pattern is compared with all the pivots, and only a limited number of clusters whose pivots have higher similarity (or smaller distance) to the input pattern are searched, thus accelerating recognition. This is based on the assumption that the search space is a distance space. We also consider two ways of candidate selection and finally combine them. The method has been applied to a practical off-line Japanese character recognizer, with the result that the coarse classification time is reduced to 56% and the whole recognition time is reduced to 52% while keeping the recognition rate the same as the original.
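The coarse-classification idea in this abstract can be sketched in Python as follows. This is an illustrative sketch only: the clusters are given by hand rather than produced by the paper's clustering, Euclidean distance stands in for the recognizer's actual similarity measure, and `centroid`, `coarse_search`, and the `n_probe` parameter are hypothetical names. The input is compared with every cluster centroid (pivot), and only the clusters with the nearest pivots are searched in detail.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(len(vectors[0])))

def coarse_search(query, clusters, pivots, n_probe=1, dist=euclidean):
    """Return (best label, number of distance comparisons made)."""
    # Stage 1: compare the input with all pivots.
    order = sorted(range(len(pivots)), key=lambda c: dist(query, pivots[c]))
    compared = len(pivots)
    best_label, best_d = None, math.inf
    # Stage 2: search only the n_probe clusters with the nearest pivots.
    for c in order[:n_probe]:
        for label, vec in clusters[c]:
            compared += 1
            d = dist(query, vec)
            if d < best_d:
                best_label, best_d = label, d
    return best_label, compared

# Hand-made clusters of (label, feature vector) pairs; illustration only.
clusters = [
    [("a", (0.0, 0.0)), ("b", (1.0, 0.0))],
    [("c", (10.0, 10.0)), ("d", (11.0, 10.0))],
]
pivots = [centroid([vec for _, vec in cluster]) for cluster in clusters]
label, compared = coarse_search((0.9, 0.1), clusters, pivots)
print(label, compared)
```

Raising `n_probe` searches more clusters, trading speed for a lower chance of missing the true category when it lies in a cluster whose pivot is not the nearest.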