When we search from a huge amount of documents, we often specify several keywords and use conjunctive queries to narrow the result of the search. Though the searched documents contain all keywords, positions of the keywords are usually not considered. As a result, the search result contains some meaningless documents. It is therefore effective to rank documents according to proximity of keywords in the documents. This ranking is regarded as a kind of text data mining. In this paper, we propose two algorithms for finding documents in which all given keywords appear in neighboring places. One is based on plane-sweep algorithm and the other is based on divide-and-conquer approach. Both algorithms run in O(n log n) time where n is the number of occurrences of given keywords. We run the algorithms on a large collection of html files and verify its effectiveness.
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Copy
Kunihiko SADAKANE, Hiroshi IMAI, "Fast Algorithms for k-Word Proximity Search" in IEICE TRANSACTIONS on Fundamentals,
vol. E84-A, no. 9, pp. 2311-2318, September 2001, doi: .
Abstract: When we search from a huge amount of documents, we often specify several keywords and use conjunctive queries to narrow the result of the search. Though the searched documents contain all keywords, positions of the keywords are usually not considered. As a result, the search result contains some meaningless documents. It is therefore effective to rank documents according to proximity of keywords in the documents. This ranking is regarded as a kind of text data mining. In this paper, we propose two algorithms for finding documents in which all given keywords appear in neighboring places. One is based on plane-sweep algorithm and the other is based on divide-and-conquer approach. Both algorithms run in O(n log n) time where n is the number of occurrences of given keywords. We run the algorithms on a large collection of html files and verify its effectiveness.
URL: https://global.ieice.org/en_transactions/fundamentals/10.1587/e84-a_9_2311/_p
Copy
@ARTICLE{e84-a_9_2311,
author={Kunihiko SADAKANE, Hiroshi IMAI, },
journal={IEICE TRANSACTIONS on Fundamentals},
title={Fast Algorithms for k-Word Proximity Search},
year={2001},
volume={E84-A},
number={9},
pages={2311-2318},
abstract={When we search from a huge amount of documents, we often specify several keywords and use conjunctive queries to narrow the result of the search. Though the searched documents contain all keywords, positions of the keywords are usually not considered. As a result, the search result contains some meaningless documents. It is therefore effective to rank documents according to proximity of keywords in the documents. This ranking is regarded as a kind of text data mining. In this paper, we propose two algorithms for finding documents in which all given keywords appear in neighboring places. One is based on plane-sweep algorithm and the other is based on divide-and-conquer approach. Both algorithms run in O(n log n) time where n is the number of occurrences of given keywords. We run the algorithms on a large collection of html files and verify its effectiveness.},
keywords={},
doi={},
ISSN={},
month={September},}
Copy
TY - JOUR
TI - Fast Algorithms for k-Word Proximity Search
T2 - IEICE TRANSACTIONS on Fundamentals
SP - 2311
EP - 2318
AU - Kunihiko SADAKANE
AU - Hiroshi IMAI
PY - 2001
DO -
JO - IEICE TRANSACTIONS on Fundamentals
SN -
VL - E84-A
IS - 9
JA - IEICE TRANSACTIONS on Fundamentals
Y1 - September 2001
AB - When we search from a huge amount of documents, we often specify several keywords and use conjunctive queries to narrow the result of the search. Though the searched documents contain all keywords, positions of the keywords are usually not considered. As a result, the search result contains some meaningless documents. It is therefore effective to rank documents according to proximity of keywords in the documents. This ranking is regarded as a kind of text data mining. In this paper, we propose two algorithms for finding documents in which all given keywords appear in neighboring places. One is based on plane-sweep algorithm and the other is based on divide-and-conquer approach. Both algorithms run in O(n log n) time where n is the number of occurrences of given keywords. We run the algorithms on a large collection of html files and verify its effectiveness.
ER -