Pivot Generation Algorithm with a Complete Binary Tree for Efficient Exact Similarity Search

Yuki YAMAGISHI; Kazuo AOYAMA; Kazumi SAITO; Tetsuo IKEDA

doi:10.1587/transinf.2017EDP7077

Summary:

This paper presents a pivot-set generation algorithm for accelerating exact similarity search in a large-scale data set. For such data sets, it is important both to construct a search index efficiently offline and to perform exact similarity search quickly online. The proposed algorithm efficiently generates effective pivots using two novel techniques: hierarchical data partitioning and fast pivot optimization. To make effective use of a small number of pivots, the former recursively partitions a data set into two equal-size subsets according to the rank order from each of two assigned pivots, resulting in a complete binary tree. The latter evaluates a defined objective function for pivot optimization at low computational cost by operating on data objects mapped into a pivot space. Because the generated pivots provide tight lower bounds on the distances between a query object and the data objects, an exact similarity search algorithm can avoid unnecessary distance calculations. We demonstrate that, for range queries on real large-scale image data sets, the search algorithm using pivots generated by the proposed algorithm eliminates a very high proportion of distance calculations.
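
The role of the pivot-based lower bounds can be illustrated with a minimal sketch. It is not the paper's pivot generation algorithm: the pivots here are fixed by hand, the distance is assumed to be Euclidean, and the names `build_pivot_table` and `range_query` are hypothetical. It only shows how precomputed object-to-pivot distances, combined with the triangle inequality, let a range query skip exact distance calculations without losing any result.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance; any metric satisfying the triangle inequality would do."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_pivot_table(data, pivots, dist=euclidean):
    """Offline step: map each object into pivot space (its distance to every pivot)."""
    return [[dist(obj, p) for p in pivots] for obj in data]


def range_query(query, radius, data, pivots, table, dist=euclidean):
    """Return (indices of objects within radius, number of exact distances computed)."""
    q_to_pivots = [dist(query, p) for p in pivots]
    hits, computed = [], 0
    for i, row in enumerate(table):
        # Triangle inequality: |d(q, p) - d(x, p)| <= d(q, x) for every pivot p,
        # so the largest such gap is a lower bound on the true distance.
        lower_bound = max(abs(qp - xp) for qp, xp in zip(q_to_pivots, row))
        if lower_bound > radius:
            continue  # x cannot lie within the radius; skip the exact distance
        computed += 1
        if dist(query, data[i]) <= radius:
            hits.append(i)
    return hits, computed


# Illustrative usage on synthetic 2-D points (assumed data, not from the paper).
random.seed(0)
data = [(random.random(), random.random()) for _ in range(1000)]
pivots = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
table = build_pivot_table(data, pivots)
hits, computed = range_query((0.5, 0.5), 0.1, data, pivots, table)
```

The tighter the lower bounds, the more objects are rejected before an exact distance is computed, which is why the choice of pivots matters for search cost.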

Publication: IEICE TRANSACTIONS on Information, Vol. E101-D, No. 1, pp. 142-151

Publication Date: 2018/01/01

Publicized: 2017/10/20

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2017EDP7077

Type of Manuscript: PAPER

Category: Data Engineering, Web Information Systems

Authors

Yuki YAMAGISHI
  University of Shizuoka
Kazuo AOYAMA
  NTT Communication Science Laboratories
Kazumi SAITO
  University of Shizuoka
Tetsuo IKEDA
  University of Shizuoka

Cite this

Yuki YAMAGISHI, Kazuo AOYAMA, Kazumi SAITO, Tetsuo IKEDA, "Pivot Generation Algorithm with a Complete Binary Tree for Efficient Exact Similarity Search" in IEICE TRANSACTIONS on Information, vol. E101-D, no. 1, pp. 142-151, January 2018, doi: 10.1587/transinf.2017EDP7077.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2017EDP7077/_p

@ARTICLE{e101-d_1_142,
author={Yuki YAMAGISHI and Kazuo AOYAMA and Kazumi SAITO and Tetsuo IKEDA},
journal={IEICE TRANSACTIONS on Information},
title={Pivot Generation Algorithm with a Complete Binary Tree for Efficient Exact Similarity Search},
year={2018},
volume={E101-D},
number={1},
pages={142--151},
abstract={This paper presents a pivot-set generation algorithm for accelerating exact similarity search in a large-scale data set. To deal with the large-scale data set, it is important to efficiently construct a search index offline as well as to perform fast exact similarity search online. Our proposed algorithm efficiently generates competent pivots with two novel techniques: hierarchical data partitioning and fast pivot optimization techniques. To make effective use of a small number of pivots, the former recursively partitions a data set into two subsets with the same size depending on the rank order from each of two assigned pivots, resulting in a complete binary tree. The latter calculates a defined objective function for pivot optimization with a low computational cost by skillfully operating data objects mapped into a pivot space. Since the generated pivots provide the tight lower bounds on distances between a query object and the data objects, an exact similarity search algorithm effectively avoids unnecessary distance calculations. We demonstrate that the search algorithm using the pivots generated by the proposed algorithm reduces distance calculations with an extremely high rate regarding a range query problem for real large-scale image data sets.},
keywords={},
doi={10.1587/transinf.2017EDP7077},
ISSN={1745-1361},
month={January}
}

TY  - JOUR
TI  - Pivot Generation Algorithm with a Complete Binary Tree for Efficient Exact Similarity Search
T2  - IEICE TRANSACTIONS on Information
SP  - 142
EP  - 151
AU  - Yuki YAMAGISHI
AU  - Kazuo AOYAMA
AU  - Kazumi SAITO
AU  - Tetsuo IKEDA
PY  - 2018
DO  - 10.1587/transinf.2017EDP7077
JO  - IEICE TRANSACTIONS on Information
SN  - 1745-1361
VL  - E101-D
IS  - 1
JA  - IEICE TRANSACTIONS on Information
Y1  - January 2018
AB  - This paper presents a pivot-set generation algorithm for accelerating exact similarity search in a large-scale data set. To deal with the large-scale data set, it is important to efficiently construct a search index offline as well as to perform fast exact similarity search online. Our proposed algorithm efficiently generates competent pivots with two novel techniques: hierarchical data partitioning and fast pivot optimization techniques. To make effective use of a small number of pivots, the former recursively partitions a data set into two subsets with the same size depending on the rank order from each of two assigned pivots, resulting in a complete binary tree. The latter calculates a defined objective function for pivot optimization with a low computational cost by skillfully operating data objects mapped into a pivot space. Since the generated pivots provide the tight lower bounds on distances between a query object and the data objects, an exact similarity search algorithm effectively avoids unnecessary distance calculations. We demonstrate that the search algorithm using the pivots generated by the proposed algorithm reduces distance calculations with an extremely high rate regarding a range query problem for real large-scale image data sets.
ER  -