Exact Algorithm to Solve Continuous Similarity Search for Evolving Queries and Its Variant

Tomohiro YAMAZAKI; Hisashi KOGA

doi:10.1587/transinf.2021DAP0003

Exact Algorithm to Solve Continuous Similarity Search for Evolving Queries and Its Variant

Tomohiro YAMAZAKI, Hisashi KOGA

Full Text Views

0

Cite this

Summary :

We study the continuous similarity search problem for evolving queries which has recently been formulated. Given a data stream and a database composed of n sets of items, the purpose of this problem is to maintain the top-k most similar sets to the query which evolves over time and consists of the latest W items in the data stream. For this problem, the previous exact algorithm adopts a pruning strategy which, at the present time T, decides the candidates of the top-k most similar sets from past similarity values and computes the similarity values only for them. This paper proposes a new exact algorithm which shortens the execution time by computing the similarity values only for sets whose similarity values at T can change from time T-1. We identify such sets very fast with frequency-based inverted lists (FIL). Moreover, we derive the similarity values at T in O(1) time by updating the previous values computed at time T-1. Experimentally, our exact algorithm runs faster than the previous exact algorithm by one order of magnitude and as fast as the previous approximation algorithm.

Publication: IEICE TRANSACTIONS on Information Vol.E105-D No.5 pp.898-908

Publication Date: 2022/05/01

Publicized: 2022/02/07

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2021DAP0003

Type of Manuscript: Special Section PAPER (Special Section on Data Engineering and Information Management)

Category

Authors

Tomohiro YAMAZAKI
Engineering, the Univeristy of Electro-Communications
Hisashi KOGA
Engineering, the Univeristy of Electro-Communications

Keyword

Cite this

Copy

Tomohiro YAMAZAKI, Hisashi KOGA, "Exact Algorithm to Solve Continuous Similarity Search for Evolving Queries and Its Variant" in IEICE TRANSACTIONS on Information, vol. E105-D, no. 5, pp. 898-908, May 2022, doi: 10.1587/transinf.2021DAP0003.
Abstract: We study the continuous similarity search problem for evolving queries which has recently been formulated. Given a data stream and a database composed of n sets of items, the purpose of this problem is to maintain the top-k most similar sets to the query which evolves over time and consists of the latest W items in the data stream. For this problem, the previous exact algorithm adopts a pruning strategy which, at the present time T, decides the candidates of the top-k most similar sets from past similarity values and computes the similarity values only for them. This paper proposes a new exact algorithm which shortens the execution time by computing the similarity values only for sets whose similarity values at T can change from time T-1. We identify such sets very fast with frequency-based inverted lists (FIL). Moreover, we derive the similarity values at T in O(1) time by updating the previous values computed at time T-1. Experimentally, our exact algorithm runs faster than the previous exact algorithm by one order of magnitude and as fast as the previous approximation algorithm.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2021DAP0003/_p

Copy

@ARTICLE{e105-d_5_898,
author={Tomohiro YAMAZAKI, Hisashi KOGA, },
journal={IEICE TRANSACTIONS on Information},
title={Exact Algorithm to Solve Continuous Similarity Search for Evolving Queries and Its Variant},
year={2022},
volume={E105-D},
number={5},
pages={898-908},
abstract={We study the continuous similarity search problem for evolving queries which has recently been formulated. Given a data stream and a database composed of n sets of items, the purpose of this problem is to maintain the top-k most similar sets to the query which evolves over time and consists of the latest W items in the data stream. For this problem, the previous exact algorithm adopts a pruning strategy which, at the present time T, decides the candidates of the top-k most similar sets from past similarity values and computes the similarity values only for them. This paper proposes a new exact algorithm which shortens the execution time by computing the similarity values only for sets whose similarity values at T can change from time T-1. We identify such sets very fast with frequency-based inverted lists (FIL). Moreover, we derive the similarity values at T in O(1) time by updating the previous values computed at time T-1. Experimentally, our exact algorithm runs faster than the previous exact algorithm by one order of magnitude and as fast as the previous approximation algorithm.},
keywords={},
doi={10.1587/transinf.2021DAP0003},
ISSN={1745-1361},
month={May},}

Copy

TY - JOUR
TI - Exact Algorithm to Solve Continuous Similarity Search for Evolving Queries and Its Variant
T2 - IEICE TRANSACTIONS on Information
SP - 898
EP - 908
AU - Tomohiro YAMAZAKI
AU - Hisashi KOGA
PY - 2022
DO - 10.1587/transinf.2021DAP0003
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E105-D
IS - 5
JA - IEICE TRANSACTIONS on Information
Y1 - May 2022
AB - We study the continuous similarity search problem for evolving queries which has recently been formulated. Given a data stream and a database composed of n sets of items, the purpose of this problem is to maintain the top-k most similar sets to the query which evolves over time and consists of the latest W items in the data stream. For this problem, the previous exact algorithm adopts a pruning strategy which, at the present time T, decides the candidates of the top-k most similar sets from past similarity values and computes the similarity values only for them. This paper proposes a new exact algorithm which shortens the execution time by computing the similarity values only for sets whose similarity values at T can change from time T-1. We identify such sets very fast with frequency-based inverted lists (FIL). Moreover, we derive the similarity values at T in O(1) time by updating the previous values computed at time T-1. Experimentally, our exact algorithm runs faster than the previous exact algorithm by one order of magnitude and as fast as the previous approximation algorithm.
ER -