IEICE TRANSACTIONS on Information

Splitting Input for Machine Translation Using N-gram Language Model Together with Utterance Similarity

Takao DOI, Eiichiro SUMITA

Summary:

To improve the translation quality of corpus-based MT systems for speech translation, splitting an input utterance is a promising technique. Many previous methods rely on word-sequence characteristics, such as N-gram clues at splitting positions. In this paper, we supplement such methods with another clue: similarity based on edit distance. Our method generates candidates for utterance splitting based on N-grams and selects the best one by measuring utterance similarity against a corpus. This selection rests on the assumption that a corpus-based MT system can correctly translate an utterance that is similar to an utterance in its training corpus. We conducted experiments using three MT systems: two EBMT systems, one using a phrase as the translation unit and the other an utterance, and an SMT system. The translation results under various conditions were evaluated with objective measures and a subjective measure. The experimental results demonstrate that the proposed method is valuable for all three systems and that using utterance similarity can improve translation quality.
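
The selection step described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation. The candidate split positions are assumed to come from an N-gram model that is not shown here, the similarity is a plain word-level edit distance normalized by length, and the length-weighted averaging of segment scores as well as every function name (`edit_distance`, `similarity`, `corpus_similarity`, `split_at`, `best_split`) are assumptions made for illustration only.

```python
from itertools import combinations


def edit_distance(a, b):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        cur = [i]
        for j, wb in enumerate(b, 1):
            cost = 0 if wa == wb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1]


def similarity(a, b):
    """Normalized similarity in [0, 1]; 1.0 means identical token lists.

    Assumption: normalizing by the longer length is one simple choice,
    not necessarily the measure used in the paper.
    """
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def corpus_similarity(segment, corpus):
    """Highest similarity between the segment and any corpus utterance."""
    return max(similarity(segment, ref) for ref in corpus)


def split_at(tokens, positions):
    """Cut the token list before each index in `positions` (sorted)."""
    bounds = [0, *positions, len(tokens)]
    return [tokens[s:e] for s, e in zip(bounds, bounds[1:])]


def best_split(tokens, candidate_positions, corpus):
    """Pick the split whose segments look most like corpus utterances.

    `candidate_positions` stands in for the output of an N-gram based
    candidate generator (hypothetical; not implemented here). Every
    subset of these positions, including "no split", is scored by the
    length-weighted mean similarity of its segments to the corpus.
    """
    best, best_score = None, -1.0
    for k in range(len(candidate_positions) + 1):
        for positions in combinations(candidate_positions, k):
            segments = split_at(tokens, positions)
            score = sum(len(s) * corpus_similarity(s, corpus)
                        for s in segments) / len(tokens)
            if score > best_score:
                best, best_score = segments, score
    return best, best_score


# Toy data, invented for this illustration.
corpus = [["thank", "you"], ["where", "is", "the", "station"]]
tokens = "thank you where is the station".split()
segments, score = best_split(tokens, [2, 4], corpus)
print(segments, score)
```

With this toy corpus, splitting after "thank you" yields two segments that each match a corpus utterance exactly, so that candidate is preferred over leaving the input whole. Enumerating every subset of candidate positions is exponential in their number, which is acceptable only because the candidate list is assumed to be short.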

Publication: IEICE TRANSACTIONS on Information Vol.E88-D No.6 pp.1256-1264
Publication Date: 2005/06/01
DOI: 10.1093/ietisy/e88-d.6.1256
Type of Manuscript: PAPER
Category: Natural Language Processing
