
Author Search Result

[Author] Hiroyuki KAJI (3 hits)

  • Adapting a Bilingual Dictionary to Domains

    Hiroyuki KAJI

    PAPER-Natural Language Processing

    Vol: E88-D No: 2
    Page(s): 302-312

    Two methods using comparable corpora to select translation equivalents appropriate to a domain were devised and evaluated. The first method ranks translation equivalents of a target word according to similarity of their contexts to that of the target word. The second method ranks translation equivalents according to the ratio of associated words that suggest them. An experiment using the EDR bilingual dictionary together with Wall Street Journal and Nihon Keizai Shimbun corpora showed that the method using the ratio of associated words outperforms the method based on contextual similarity. Namely, in a quantitative evaluation using pseudo words, the maximum F-measure of the former method was 86%, while that of the latter method was 82%. The key feature of the method using the ratio of associated words is that it outputs selected translation equivalents together with representative associated words, enabling the translation equivalents to be validated.
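The second method in the abstract above (ranking by the ratio of associated words) can be illustrated with a minimal sketch. The abstract does not say how an associated word is judged to "suggest" a translation equivalent, so the sketch takes that mapping as a given input. The function name, the `top_k` parameter, and the romanized example words are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def rank_by_associated_word_ratio(suggestions, top_k=3):
    """Rank translation equivalents by the share of associated words suggesting them.

    suggestions: dict mapping each associated word of the target word to the
    translation equivalent it suggests, or None if it suggests none.
    (How this mapping is obtained is an assumption left outside the sketch.)
    Returns a list of (equivalent, ratio, representative_associated_words).
    """
    total = len(suggestions)
    support = defaultdict(list)
    for assoc_word, equivalent in suggestions.items():
        if equivalent is not None:
            support[equivalent].append(assoc_word)

    ranked = []
    for equivalent, words in support.items():
        ratio = len(words) / total if total else 0.0
        # Keep a few associated words so a person can validate the choice.
        ranked.append((equivalent, ratio, sorted(words)[:top_k]))

    # Highest ratio first; ties broken alphabetically for determinism.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


# Hypothetical example: English "bank" with two Japanese equivalents.
example = {
    "loan": "ginko",
    "deposit": "ginko",
    "interest": "ginko",
    "river": "dote",
    "weather": None,
}
ranking = rank_by_associated_word_ratio(example)
```

Returning the supporting associated words alongside each equivalent mirrors the key feature named in the abstract: the selected equivalents can be checked against the words that suggested them.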

  • Extracting Translation Equivalents from Bilingual Comparable Corpora

    Hiroyuki KAJI

    PAPER-Natural Language Processing

    Vol: E88-D No: 2
    Page(s): 313-323

    An improved method for extracting translation equivalents from bilingual comparable corpora according to contextual similarity was developed. This method has two main features. First, a seed bilingual lexicon--which is used to bridge contexts in different languages--is adapted to the corpora from which translation equivalents are to be extracted. Second, the contextual similarity is evaluated by using a combination of similarity measures defined in opposite directions. An experiment using Wall Street Journal and Nihon Keizai Shimbun corpora, together with the EDR bilingual dictionary, demonstrated the effectiveness of the method; it produced lists of candidate translation equivalents with an accuracy of around 30% for frequently occurring unknown words. The method thus proved to be useful for improving the coverage of a bilingual lexicon.
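The idea of bridging contexts with a seed lexicon and combining similarity measures defined in opposite directions might be sketched as follows. The abstract does not define the individual measures or how they are combined, so the weighted-overlap measure and the harmonic-mean combination below are assumptions chosen for illustration only. All function names and example words are hypothetical.

```python
def directional_overlap(context, other_context, lexicon):
    """Share of context weight whose translations appear in other_context.

    context, other_context: dicts mapping context words to weights.
    lexicon: dict mapping a word to a set of its translations.
    """
    total = sum(context.values())
    if total == 0:
        return 0.0
    covered = sum(
        weight
        for word, weight in context.items()
        if any(t in other_context for t in lexicon.get(word, ()))
    )
    return covered / total


def invert_lexicon(lexicon):
    """Build the reverse lexicon (translation -> set of source words)."""
    inverted = {}
    for src, targets in lexicon.items():
        for tgt in targets:
            inverted.setdefault(tgt, set()).add(src)
    return inverted


def combined_similarity(src_context, tgt_context, lexicon):
    """Combine the two directional measures (harmonic mean: an assumption)."""
    forward = directional_overlap(src_context, tgt_context, lexicon)
    backward = directional_overlap(tgt_context, src_context, invert_lexicon(lexicon))
    if forward + backward == 0:
        return 0.0
    return 2 * forward * backward / (forward + backward)


# Hypothetical seed lexicon and context vectors.
seed = {"stock": {"kabu"}, "price": {"kakaku"}, "market": {"shijo"}}
src = {"stock": 2.0, "price": 1.0, "rain": 1.0}
tgt = {"kabu": 1.0, "kakaku": 1.0, "tenki": 2.0}
score = combined_similarity(src, tgt, seed)
```

A combination that is low whenever either direction is low penalizes candidate pairs where only one word's context is well covered by the other, which is one plausible motivation for measuring in both directions.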

  • Unsupervised Word-Sense Disambiguation Using Bilingual Comparable Corpora

    Hiroyuki KAJI, Yasutsugu MORIMOTO

    PAPER-Natural Language Processing

    Vol: E88-D No: 2
    Page(s): 289-301

    An unsupervised method for word-sense disambiguation using bilingual comparable corpora was developed. First, it extracts word associations, i.e., statistically significant pairs of associated words, from the corpus of each language. Then, it aligns word associations by consulting a bilingual dictionary and calculates correlation between senses of a target polysemous word and its associated words, which can be regarded as clues for identifying the sense of the target word. To overcome the problem of disparity of topical coverage between corpora of the two languages as well as the problem of ambiguity in word-association alignment, an algorithm for iteratively calculating a sense-vs.-clue correlation matrix for each target word was devised. Word-sense disambiguation for each instance of the target word is done by selecting the sense that maximizes the score, i.e., a weighted sum of the correlations between each sense and clues appearing in the context of the instance. An experiment using Wall Street Journal and Nihon Keizai Shimbun corpora together with the EDR bilingual dictionary showed that the new method has promising performance; namely, the F-measure of its sense selection was 74.6% compared to a baseline of 62.8%. The developed method will possibly be extended into a fully unsupervised method that features automatic division and definition of word senses.
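The final selection step described above, choosing the sense that maximizes a weighted sum of sense-vs.-clue correlations over the clues in an instance's context, can be sketched as below. The iterative computation of the correlation matrix is not reproduced; the matrix is assumed to be given. The abstract does not specify the weights, so they default to 1.0 here, and the function name, sense labels, and numbers are hypothetical.

```python
def select_sense(correlation, context_words, clue_weights=None):
    """Pick the sense with the highest weighted sum of clue correlations.

    correlation: dict mapping sense -> {clue word: correlation value}
                 (assumed to be precomputed; not derived in this sketch).
    context_words: words appearing in the context of one target-word instance.
    clue_weights: optional dict of per-clue weights (default weight 1.0,
                  an assumption, since the abstract does not define them).
    Returns (best_sense, scores_by_sense).
    """
    if not correlation:
        raise ValueError("correlation matrix has no senses")
    clue_weights = clue_weights or {}

    scores = {}
    for sense, clue_corr in correlation.items():
        scores[sense] = sum(
            clue_weights.get(word, 1.0) * clue_corr[word]
            for word in context_words
            if word in clue_corr  # only words that are clues for this sense
        )

    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical correlation matrix for the polysemous word "bank".
matrix = {
    "financial": {"loan": 0.8, "interest": 0.6},
    "riverside": {"river": 0.9, "water": 0.5},
}
context = ["loan", "interest", "river"]
best, scores = select_sense(matrix, context)
reweighted_best, _ = select_sense(matrix, context, {"river": 2.0})
```

With uniform weights the financial sense wins on this context; doubling the weight of "river" flips the choice, which shows how much the weighting scheme can matter in practice.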