Unsupervised Word-Sense Disambiguation Using Bilingual Comparable Corpora

Hiroyuki KAJI, Yasutsugu MORIMOTO

Summary:

An unsupervised method for word-sense disambiguation using bilingual comparable corpora was developed. First, it extracts word associations, i.e., statistically significant pairs of associated words, from the corpus of each language. Then, it aligns the word associations across languages by consulting a bilingual dictionary and calculates the correlation between each sense of a target polysemous word and its associated words, which serve as clues for identifying the sense of the target word. To overcome both the disparity in topical coverage between the corpora of the two languages and the ambiguity in word-association alignment, an algorithm that iteratively calculates a sense-vs.-clue correlation matrix for each target word was devised. Each instance of the target word is disambiguated by selecting the sense that maximizes a score, namely a weighted sum of the correlations between that sense and the clues appearing in the context of the instance. An experiment using Wall Street Journal and Nihon Keizai Shimbun corpora together with the EDR bilingual dictionary showed promising performance: the F-measure of sense selection was 74.6%, compared with a baseline of 62.8%. The method could potentially be extended into a fully unsupervised method that features automatic division and definition of word senses.
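
The sense-selection step described in the summary can be illustrated with a minimal Python sketch. It assumes the sense-vs.-clue correlation matrix has already been computed and is stored as a nested dictionary. The function name `select_sense`, the sense labels, the clue words, the correlation values, and the uniform default weights are all hypothetical and chosen only for illustration; the summary does not specify how the weights are defined or how the iterative matrix calculation works, so neither is reproduced here.

```python
def select_sense(correlation, context_clues, clue_weights=None):
    """Pick the sense with the highest weighted sum of sense-clue correlations.

    correlation   : dict mapping sense -> {clue: correlation value}
                    (assumed to be precomputed; not derived here)
    context_clues : words found in the context of one target-word instance
    clue_weights  : optional dict mapping clue -> weight; when omitted,
                    every clue gets weight 1.0 (an assumption of this sketch)
    """
    scores = {}
    for sense, clue_corr in correlation.items():
        total = 0.0
        for clue in context_clues:
            weight = 1.0 if clue_weights is None else clue_weights.get(clue, 0.0)
            # Context words that are not clues for this sense contribute 0.
            total += weight * clue_corr.get(clue, 0.0)
        scores[sense] = total
    best_sense = max(scores, key=scores.get)
    return best_sense, scores


# Hypothetical toy matrix for the target word "bank".
correlation = {
    "bank/financial": {"loan": 0.9, "interest": 0.8, "river": 0.0},
    "bank/riverside": {"loan": 0.0, "interest": 0.1, "river": 0.95},
}

best, scores = select_sense(correlation, ["loan", "interest", "the"])
print(best, scores)
```

In this toy case the context contains two clues that correlate strongly with the financial sense, so that sense receives the larger score. Passing a `clue_weights` dictionary lets some clues count more than others, which is one possible reading of the "weighted sum" in the summary.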

Publication
IEICE TRANSACTIONS on Information Vol.E88-D No.2 pp.289-301
Publication Date
2005/02/01
DOI
10.1093/ietisy/e88-d.2.289
Type of Manuscript
PAPER
Category
Natural Language Processing
