Splitting Input for Machine Translation Using N-gram Language Model Together with Utterance Similarity

Takao DOI; Eiichiro SUMITA

doi:10.1093/ietisy/e88-d.6.1256

IEICE TRANSACTIONS on Information and Systems

Summary:

In order to boost the translation quality of corpus-based MT systems for speech translation, the technique of splitting an input utterance appears promising. In previous research, many methods used word-sequence characteristics like N-gram clues among splitting positions. In this paper, to supplement splitting methods based on word-sequence characteristics, we introduce another clue using similarity based on edit-distance. In our splitting method, we generate candidates for utterance splitting based on N-grams, and select the best one by measuring the utterance similarity against a corpus. This selection is founded on the assumption that a corpus-based MT system can correctly translate an utterance that is similar to an utterance in its training corpus. We conducted experiments using three MT systems: two EBMT systems, one of which uses a phrase as a translation unit and the other of which uses an utterance, and an SMT system. The translation results under various conditions were evaluated by objective measures and a subjective measure. The experimental results demonstrate that the proposed method is valuable for the three systems. Using utterance similarity can improve the translation quality.
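The summary describes a two-stage procedure: N-gram scoring proposes split candidates, and edit-distance similarity to the training corpus picks among them. The Python sketch below illustrates that idea only. It is not the authors' implementation, and several details are assumptions: a bigram model with add-one smoothing, at most `max_splits` cut points per utterance, the `top_k` best N-gram candidates passed to the second stage, and a candidate score equal to the mean, over its segments, of each segment's best normalized word-level Levenshtein similarity to a training utterance. All function names (`train_bigram`, `split_utterance`, and so on) and the toy corpus are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

BOS, EOS = "<s>", "</s>"


def train_bigram(corpus):
    """Count histories and bigrams over training utterances wrapped in <s> ... </s>."""
    histories, bigrams = Counter(), Counter()
    for utt in corpus:
        toks = [BOS] + utt.split() + [EOS]
        histories.update(toks[:-1])
        bigrams.update(zip(toks, toks[1:]))
    # Assumption: add-one smoothing over predicted token types plus one unseen-word slot.
    v = len({w for _, w in bigrams}) + 1
    return histories, bigrams, v


def logprob(segments, model):
    """Bigram log-probability of a split; each segment is scored as a separate utterance."""
    histories, bigrams, v = model
    total = 0.0
    for seg in segments:
        toks = [BOS] + seg + [EOS]
        for h, w in zip(toks, toks[1:]):
            total += math.log((bigrams[(h, w)] + 1) / (histories[h] + v))
    return total


def split_candidates(words, max_splits=2):
    """Yield every segmentation of `words` with at most `max_splits` cut points."""
    n = len(words)
    for k in range(max_splits + 1):
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            yield [words[a:b] for a, b in zip(bounds, bounds[1:])]


def edit_distance(a, b):
    """Word-level Levenshtein distance (unit cost for insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def similarity(seg, corpus_utts):
    """Best normalized similarity of a segment to any corpus utterance (1.0 = exact match)."""
    return max(1 - edit_distance(seg, u) / max(len(seg), len(u)) for u in corpus_utts)


def split_utterance(utterance, corpus, top_k=3, max_splits=2):
    model = train_bigram(corpus)
    corpus_utts = [u.split() for u in corpus]
    # Stage 1: keep the top_k segmentations by bigram score.
    ranked = sorted(split_candidates(utterance.split(), max_splits),
                    key=lambda segs: logprob(segs, model), reverse=True)[:top_k]
    # Stage 2 (assumed scoring): pick the survivor whose segments are, on average,
    # most similar to some training utterance.
    best = max(ranked, key=lambda segs: sum(similarity(s, corpus_utts) for s in segs) / len(segs))
    return [" ".join(seg) for seg in best]


if __name__ == "__main__":
    corpus = ["yes please", "i would like a room", "how much is it",
              "i would like a single room", "yes", "how much is the room"]
    print(split_utterance("yes please i would like a room", corpus))
    # ['yes please', 'i would like a room']
```

With `top_k=1` the selection reduces to the bigram score alone and, on this toy corpus, leaves the utterance unsplit; widening the candidate pool lets the similarity stage recover the split whose two segments each match a training utterance exactly.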

Publication: IEICE TRANSACTIONS on Information and Systems, Vol.E88-D, No.6, pp.1256-1264

Publication Date: 2005/06/01

DOI: 10.1093/ietisy/e88-d.6.1256

Type of Manuscript: PAPER

Category: Natural Language Processing

Cite this

Takao DOI, Eiichiro SUMITA, "Splitting Input for Machine Translation Using N-gram Language Model Together with Utterance Similarity" in IEICE TRANSACTIONS on Information and Systems, vol. E88-D, no. 6, pp. 1256-1264, June 2005, doi: 10.1093/ietisy/e88-d.6.1256.
URL: https://global.ieice.org/en_transactions/information/10.1093/ietisy/e88-d.6.1256/_p

@ARTICLE{e88-d_6_1256,
author={Doi, Takao and Sumita, Eiichiro},
journal={IEICE TRANSACTIONS on Information and Systems},
title={Splitting Input for Machine Translation Using {N}-gram Language Model Together with Utterance Similarity},
year={2005},
volume={E88-D},
number={6},
pages={1256--1264},
abstract={In order to boost the translation quality of corpus-based MT systems for speech translation, the technique of splitting an input utterance appears promising. In previous research, many methods used word-sequence characteristics like N-gram clues among splitting positions. In this paper, to supplement splitting methods based on word-sequence characteristics, we introduce another clue using similarity based on edit-distance. In our splitting method, we generate candidates for utterance splitting based on N-grams, and select the best one by measuring the utterance similarity against a corpus. This selection is founded on the assumption that a corpus-based MT system can correctly translate an utterance that is similar to an utterance in its training corpus. We conducted experiments using three MT systems: two EBMT systems, one of which uses a phrase as a translation unit and the other of which uses an utterance, and an SMT system. The translation results under various conditions were evaluated by objective measures and a subjective measure. The experimental results demonstrate that the proposed method is valuable for the three systems. Using utterance similarity can improve the translation quality.},
doi={10.1093/ietisy/e88-d.6.1256},
month=jun
}

TY  - JOUR
TI  - Splitting Input for Machine Translation Using N-gram Language Model Together with Utterance Similarity
T2  - IEICE TRANSACTIONS on Information and Systems
SP  - 1256
EP  - 1264
AU  - Doi, Takao
AU  - Sumita, Eiichiro
PY  - 2005
DO  - 10.1093/ietisy/e88-d.6.1256
JO  - IEICE TRANSACTIONS on Information and Systems
VL  - E88-D
IS  - 6
JA  - IEICE TRANSACTIONS on Information and Systems
Y1  - 2005/06/01
AB  - In order to boost the translation quality of corpus-based MT systems for speech translation, the technique of splitting an input utterance appears promising. In previous research, many methods used word-sequence characteristics like N-gram clues among splitting positions. In this paper, to supplement splitting methods based on word-sequence characteristics, we introduce another clue using similarity based on edit-distance. In our splitting method, we generate candidates for utterance splitting based on N-grams, and select the best one by measuring the utterance similarity against a corpus. This selection is founded on the assumption that a corpus-based MT system can correctly translate an utterance that is similar to an utterance in its training corpus. We conducted experiments using three MT systems: two EBMT systems, one of which uses a phrase as a translation unit and the other of which uses an utterance, and an SMT system. The translation results under various conditions were evaluated by objective measures and a subjective measure. The experimental results demonstrate that the proposed method is valuable for the three systems. Using utterance similarity can improve the translation quality.
ER  - 
