Question answering (QA) is the task of retrieving an answer in response to a question by analyzing documents. Although most efforts in developing QA systems are devoted to electronic text, we consider it also necessary to develop systems for document images. In this paper, we propose a method of document image retrieval for such QA systems. Since the task is not to retrieve all relevant documents but to find the answer somewhere in the documents, retrieval should be precision oriented. The main contribution of this paper is a method of improving the precision of document image retrieval by taking into account the co-occurrence of successive terms in a question. The indexing scheme is based on two-dimensional distributions of terms, and the weight of co-occurrence is measured by calculating the density distributions of terms. The proposed method was tested on 1253 pages of documents about Major League Baseball with 20 questions and was found to be superior to the baseline method proposed by the authors.
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Koichi KISE, Shota FUKUSHIMA, Keinosuke MATSUMOTO, "Document Image Retrieval for QA Systems Based on the Density Distributions of Successive Terms" in IEICE TRANSACTIONS on Information,
vol. E88-D, no. 8, pp. 1843-1851, August 2005, doi: 10.1093/ietisy/e88-d.8.1843.
URL: https://global.ieice.org/en_transactions/information/10.1093/ietisy/e88-d.8.1843/_p
@ARTICLE{e88-d_8_1843,
author={Koichi KISE and Shota FUKUSHIMA and Keinosuke MATSUMOTO},
journal={IEICE TRANSACTIONS on Information},
title={Document Image Retrieval for QA Systems Based on the Density Distributions of Successive Terms},
year={2005},
volume={E88-D},
number={8},
pages={1843-1851},
abstract={Question answering (QA) is the task of retrieving an answer in response to a question by analyzing documents. Although most efforts in developing QA systems are devoted to electronic text, we consider it also necessary to develop systems for document images. In this paper, we propose a method of document image retrieval for such QA systems. Since the task is not to retrieve all relevant documents but to find the answer somewhere in the documents, retrieval should be precision oriented. The main contribution of this paper is a method of improving the precision of document image retrieval by taking into account the co-occurrence of successive terms in a question. The indexing scheme is based on two-dimensional distributions of terms, and the weight of co-occurrence is measured by calculating the density distributions of terms. The proposed method was tested on 1253 pages of documents about Major League Baseball with 20 questions and was found to be superior to the baseline method proposed by the authors.},
keywords={},
doi={10.1093/ietisy/e88-d.8.1843},
ISSN={},
month={August},
}
TY - JOUR
TI - Document Image Retrieval for QA Systems Based on the Density Distributions of Successive Terms
T2 - IEICE TRANSACTIONS on Information
SP - 1843
EP - 1851
AU - Koichi KISE
AU - Shota FUKUSHIMA
AU - Keinosuke MATSUMOTO
PY - 2005
DO - 10.1093/ietisy/e88-d.8.1843
JO - IEICE TRANSACTIONS on Information
SN -
VL - E88-D
IS - 8
JA - IEICE TRANSACTIONS on Information
Y1 - August 2005
AB - Question answering (QA) is the task of retrieving an answer in response to a question by analyzing documents. Although most efforts in developing QA systems are devoted to electronic text, we consider it also necessary to develop systems for document images. In this paper, we propose a method of document image retrieval for such QA systems. Since the task is not to retrieve all relevant documents but to find the answer somewhere in the documents, retrieval should be precision oriented. The main contribution of this paper is a method of improving the precision of document image retrieval by taking into account the co-occurrence of successive terms in a question. The indexing scheme is based on two-dimensional distributions of terms, and the weight of co-occurrence is measured by calculating the density distributions of terms. The proposed method was tested on 1253 pages of documents about Major League Baseball with 20 questions and was found to be superior to the baseline method proposed by the authors.
ER -