IEICE TRANSACTIONS on Information

Continuous Similarity Search for Dynamic Text Streams

Yuma TSUCHIDA, Kohei KUBO, Hisashi KOGA

Summary:

Similarity search for data streams has attracted much attention for information recommendation. In this context, recent leading works regard the latest W items in a data stream as an evolving set and reduce similarity search for data streams to set similarity search. Whereas those works consider standard sets composed of items, this paper studies similarity search for text streams and treats evolving sets whose elements are texts. Specifically, we formulate a new continuous range search problem named the CTS problem (Continuous similarity search for Text Sets). The task in the CTS problem is to find all the text streams in the database whose similarity to the query exceeds a threshold ε. It abstracts a scenario in which a user-based recommendation system searches for similar users on social networking services. The CTS problem is important because it allows both the query and the database to change dynamically. We develop a fast pruning-based algorithm for the CTS problem. Moreover, we discuss how to speed it up with an inverted index.
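To make the problem setting concrete, the following is a minimal brute-force sketch of a continuous range search over text sets. It is not the pruning-based algorithm of the paper, and the summary does not specify the similarity measure, so everything here is an assumption for illustration: texts are modeled as word sets, `text_set_similarity` is a hypothetical measure (the mean best-match Jaccard score), and the names `TextStream` and `range_search` are invented for this example.

```python
from collections import deque


def jaccard(a, b):
    """Jaccard similarity between two word sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def text_set_similarity(xs, ys):
    """Hypothetical text-set similarity (an assumption, not the paper's
    measure): mean over texts in xs of the best Jaccard match in ys."""
    if not xs or not ys:
        return 0.0
    return sum(max(jaccard(x, y) for y in ys) for x in xs) / len(xs)


class TextStream:
    """Keeps only the latest `window` texts of a stream (the evolving set)."""

    def __init__(self, window):
        self.texts = deque(maxlen=window)

    def push(self, text):
        # Each text is reduced to a set of lowercase whitespace tokens.
        self.texts.append(frozenset(text.lower().split()))


def range_search(query, database, eps):
    """Return names of database streams whose similarity to the query
    is strictly larger than eps. Recomputes everything from scratch."""
    return [
        name
        for name, stream in database.items()
        if text_set_similarity(query.texts, stream.texts) > eps
    ]
```

Calling `range_search` again after every `push` on the query or on a database stream gives the continuous behavior described above, at the cost of recomputing every similarity each time. That cost is what pruning and an inverted index aim to reduce.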

Publication
IEICE TRANSACTIONS on Information Vol.E106-D No.12 pp.2026-2035
Publication Date
2023/12/01
Publicized
2023/09/21
Online ISSN
1745-1361
DOI
10.1587/transinf.2022EDP7229
Type of Manuscript
PAPER
Category
Data Engineering, Web Information Systems

Authors

Yuma TSUCHIDA
  University of Electro-Communications
Kohei KUBO
  University of Electro-Communications
Hisashi KOGA
  University of Electro-Communications

Keyword