In this paper, we propose two weighting techniques to improve the performance of query expansion in biomedical document retrieval, especially when a short biomedical term in a query is expanded with its synonymous multi-word terms. When a query contains synonymous terms of different lengths, a traditional IR model ranks a document containing the longer term highly, because a longer term has more chances to match the query. Such a preference is inappropriate and often yields unsatisfactory results. To alleviate this weighting bias, we devise a method for normalizing the weights of the query terms in a long multi-word biomedical term, and a method for discriminating terms by inverse terminology frequency, a novel statistic estimated in the query domain. Experimental results on the MEDLINE corpus show that these two simple techniques improve retrieval performance by correcting the undue preference for long multi-word terms in an expanded query.
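The abstract does not give the formulas, so the following Python sketch is only an illustration of the two ideas under stated assumptions, not the authors' method. It assumes (a) that normalization means each synonymous term receives a total weight of 1, split evenly over its words, and (b) that inverse terminology frequency is an IDF-like statistic counted over a set of terms standing in for the query domain. The function names and the example terms are hypothetical.

```python
import math
from collections import Counter


def normalized_term_weights(terminologies):
    """Assumed normalization: every terminology contributes a total weight
    of 1, shared equally among its words, so a three-word synonym does not
    outweigh a one-word synonym."""
    weights = Counter()
    for term in terminologies:
        words = term.lower().split()
        for word in words:
            weights[word] += 1.0 / len(words)
    return dict(weights)


def inverse_terminology_frequency(terminologies):
    """Assumed IDF-like form: words that occur in many terminologies of the
    domain are less discriminative and receive a lower score."""
    total = len(terminologies)
    containing = Counter()
    for term in terminologies:
        # Count each word once per terminology, as document frequency does.
        for word in set(term.lower().split()):
            containing[word] += 1
    return {word: math.log(1 + total / count)
            for word, count in containing.items()}


# Hypothetical expanded query: a short term plus multi-word related terms.
expanded = [
    "BSE",
    "bovine spongiform encephalopathy",
    "mad cow disease",
    "prion disease",
]

weights = normalized_term_weights(expanded)
itf = inverse_terminology_frequency(expanded)

# Combine both signals into one query-term weight (also an assumption).
query_weights = {word: weights[word] * itf[word] for word in weights}
```

Under these assumptions, the three words of "bovine spongiform encephalopathy" together carry the same weight as the single word "BSE", and the shared word "disease" receives a lower inverse terminology frequency than words that occur in only one term.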
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Young-In SONG, Kyoung-Soo HAN, So-Young PARK, Sang-Bum KIM, Hae-Chang RIM, "Simple Weighting Techniques for Query Expansion in Biomedical Document Retrieval" in IEICE TRANSACTIONS on Information,
vol. E90-D, no. 11, pp. 1873-1876, November 2007, doi: 10.1093/ietisy/e90-d.11.1873.
URL: https://global.ieice.org/en_transactions/information/10.1093/ietisy/e90-d.11.1873/_p
@ARTICLE{e90-d_11_1873,
author={Young-In SONG and Kyoung-Soo HAN and So-Young PARK and Sang-Bum KIM and Hae-Chang RIM},
journal={IEICE TRANSACTIONS on Information},
title={Simple Weighting Techniques for Query Expansion in Biomedical Document Retrieval},
year={2007},
volume={E90-D},
number={11},
pages={1873-1876},
abstract={In this paper, we propose two weighting techniques to improve performances of query expansion in biomedical document retrieval, especially when a short biomedical term in a query is expanded with its synonymous multi-word terms. When a query contains synonymous terms of different lengths, a traditional IR model highly ranks a document containing a longer terminology because a longer terminology has more chance to be matched with a query. However, such preference is clearly inappropriate and it often yields an unsatisfactory result. To alleviate the bias weighting problem, we devise a method of normalizing the weights of query terms in a long multi-word biomedical term, and a method of discriminating terms by using inverse terminology frequency which is a novel statistics estimated in a query domain. The experiment results on MEDLINE corpus show that our two simple techniques improve the retrieval performance by adjusting the inadequate preference for long multi-word terminologies in an expanded query.},
keywords={},
doi={10.1093/ietisy/e90-d.11.1873},
ISSN={1745-1361},
month={November},
}
TY - JOUR
TI - Simple Weighting Techniques for Query Expansion in Biomedical Document Retrieval
T2 - IEICE TRANSACTIONS on Information
SP - 1873
EP - 1876
AU - Young-In SONG
AU - Kyoung-Soo HAN
AU - So-Young PARK
AU - Sang-Bum KIM
AU - Hae-Chang RIM
PY - 2007
DO - 10.1093/ietisy/e90-d.11.1873
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E90-D
IS - 11
JA - IEICE TRANSACTIONS on Information
Y1 - November 2007
AB - In this paper, we propose two weighting techniques to improve performances of query expansion in biomedical document retrieval, especially when a short biomedical term in a query is expanded with its synonymous multi-word terms. When a query contains synonymous terms of different lengths, a traditional IR model highly ranks a document containing a longer terminology because a longer terminology has more chance to be matched with a query. However, such preference is clearly inappropriate and it often yields an unsatisfactory result. To alleviate the bias weighting problem, we devise a method of normalizing the weights of query terms in a long multi-word biomedical term, and a method of discriminating terms by using inverse terminology frequency which is a novel statistics estimated in a query domain. The experiment results on MEDLINE corpus show that our two simple techniques improve the retrieval performance by adjusting the inadequate preference for long multi-word terminologies in an expanded query.
ER -