1-4hit |
Tomohiro YAMAZAKI Hisashi KOGA
We study the continuous similarity search problem for evolving queries which has recently been formulated. Given a data stream and a database composed of n sets of items, the purpose of this problem is to maintain the top-k most similar sets to the query which evolves over time and consists of the latest W items in the data stream. For this problem, the previous exact algorithm adopts a pruning strategy which, at the present time T, decides the candidates of the top-k most similar sets from past similarity values and computes the similarity values only for them. This paper proposes a new exact algorithm which shortens the execution time by computing the similarity values only for sets whose similarity values at T can change from time T-1. We identify such sets very fast with frequency-based inverted lists (FIL). Moreover, we derive the similarity values at T in O(1) time by updating the previous values computed at time T-1. Experimentally, our exact algorithm runs faster than the previous exact algorithm by one order of magnitude and as fast as the previous approximation algorithm.
Gibran FUENTES PINEDA Hisashi KOGA Toshinori WATANABE
We present a scalable approach to automatically discovering particular objects (as opposed to object categories) from a set of images. The basic idea is to search for local image features that consistently appear in the same images under the assumption that such co-occurring features underlie the same object. We first represent each image in the set as a set of visual words (vector quantized local image features) and construct an inverted file to memorize the set of images in which each visual word appears. Then, our object discovery method proceeds by searching the inverted file and extracting visual word sets whose elements tend to appear in the same images; such visual word sets are called co-occurring word sets. Because of unstable and polysemous visual words, a co-occurring word set typically represents only a part of an object. We observe that co-occurring word sets associated with the same object often share many visual words with one another. Hence, to obtain the object models, we further cluster highly overlapping co-occurring word sets in an agglomerative manner. Remarkably, we accelerate both extraction and clustering of co-occurring word sets by Min-Hashing. We show that the models generated by our method can effectively discriminate particular objects. We demonstrate our method on the Oxford buildings dataset. In a quantitative evaluation using a set of ground truth landmarks, our method achieved higher scores than the state-of-the-art methods.
Yuma TSUCHIDA Kohei KUBO Hisashi KOGA
Similarity search for data streams has attracted much attention for information recommendation. In this context, recent leading works regard the latest W items in a data stream as an evolving set and reduce similarity search for data streams to set similarity search. Whereas they consider standard sets composed of items, this paper uniquely studies similarity search for text streams and treats evolving sets whose elements are texts. Specifically, we formulate a new continuous range search problem named the CTS problem (Continuous similarity search for Text Sets). The task of the CTS problem is to find all the text streams from the database whose similarity to the query becomes larger than a threshold ε. It abstracts a scenario in which a user-based recommendation system searches similar users from social networking services. The CTS is important because it allows both the query and the database to change dynamically. We develop a fast pruning-based algorithm for the CTS. Moreover, we discuss how to speed up it with the inverted index.