
IEICE TRANSACTIONS on Information

An EM-Based Approach for Mining Word Senses from Corpora

Thatsanee CHAROENPORN, Canasai KRUENGKRAI, Thanaruk THEERAMUNKONG, Virach SORNLERTLAMVANICH

Summary:

Manually collecting contexts of a target word and grouping them by meaning yields a set of word senses, but the task is quite tedious. Towards automated lexicography, this paper proposes a word-sense discrimination method based on two modern techniques: the EM algorithm and principal component analysis (PCA). A spherical Gaussian EM algorithm, enhanced with PCA for robust initialization, is proposed to cluster the senses of a target word automatically. Three variants of the algorithm, namely PCA, sGEM, and PCA-sGEM, are investigated using a gold-standard dataset of two polysemous words. The clustering results are evaluated using the measures of purity and entropy as well as a more recent measure called normalized mutual information (NMI). The experimental results indicate that the proposed algorithms achieve promising performance in discriminating word senses and that PCA-sGEM outperforms the other two methods to some extent.
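
The three evaluation measures named in the summary can be illustrated with a minimal sketch. The summary does not give the exact formulas, so the definitions below are assumptions: purity is the fraction of contexts that carry the majority sense of their cluster, entropy is the size-weighted per-cluster sense entropy normalized by the log of the number of senses, and NMI is mutual information divided by the geometric mean of the two entropies. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def _entropy(counts, n):
    """Shannon entropy (natural log) of a distribution given as raw counts."""
    return -sum((v / n) * math.log(v / n) for v in counts if v > 0)


def purity(clusters, senses):
    """Fraction of contexts that carry the majority sense of their cluster."""
    majority_total = 0
    for c in set(clusters):
        members = [s for k, s in zip(clusters, senses) if k == c]
        majority_total += Counter(members).most_common(1)[0][1]
    return majority_total / len(clusters)


def cluster_entropy(clusters, senses):
    """Size-weighted per-cluster sense entropy, scaled to [0, 1] (assumed form)."""
    n = len(clusters)
    q = len(set(senses))
    if q < 2:
        return 0.0  # a single gold sense cannot be mixed
    total = 0.0
    for c in set(clusters):
        members = [s for k, s in zip(clusters, senses) if k == c]
        h = _entropy(Counter(members).values(), len(members))
        total += (len(members) / n) * h / math.log(q)
    return total


def nmi(clusters, senses):
    """Mutual information normalized by sqrt(H(clusters) * H(senses)) (assumed form)."""
    n = len(clusters)
    cluster_counts = Counter(clusters)
    sense_counts = Counter(senses)
    h_c = _entropy(cluster_counts.values(), n)
    h_s = _entropy(sense_counts.values(), n)
    if h_c == 0.0 or h_s == 0.0:
        return 0.0  # degenerate case: one cluster or one sense
    joint = Counter(zip(clusters, senses))
    mi = sum(
        (v / n) * math.log(v * n / (cluster_counts[c] * sense_counts[s]))
        for (c, s), v in joint.items()
    )
    return mi / math.sqrt(h_c * h_s)


# Toy example: six contexts of one target word, two clusters, two gold senses.
toy_clusters = [0, 0, 0, 1, 1, 1]
toy_senses = ["a", "a", "b", "b", "b", "b"]
print(purity(toy_clusters, toy_senses))
print(cluster_entropy(toy_clusters, toy_senses))
print(nmi(toy_clusters, toy_senses))
```

Under these definitions, higher purity and NMI and lower entropy indicate clusters that agree more closely with the gold-standard senses. Other normalizations of NMI exist, so scores computed this way may not be directly comparable to the figures reported in the paper.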

Publication
IEICE TRANSACTIONS on Information Vol.E90-D No.4 pp.775-782
Publication Date
2007/04/01
Online ISSN
1745-1361
DOI
10.1093/ietisy/e90-d.4.775
Type of Manuscript
PAPER
Category
Natural Language Processing
