Automatic alignment of bilingual texts is useful to example-based machine translation by facilitating the creation of example pairs of translation for the machine. Two main approaches to automatic alignment have been reported in the literature: the lexical approach and the statistical approach. The former looks for relationships between lexical contents of the bilingual texts in order to find alignment pairs, while the latter uses statistical correlation between sentence lengths of the bilingual texts as the basis of matching. This paper describes a combination of the two approaches in aligning Japanese-Chinese bilingual texts by allowing kanji contents and sentence lengths in the texts to work together in achieving an alignment process. Because of the sentential structure differences between Japanese and Chinese, matching at the sentence level may result in frequent matching between a number of sentences en masse. In view of this, the current work also attempts to create shorter alignment pairs by permitting sentences to be matched with clauses or phrases of the other text if possible. While such matching is more difficult and error-prone, the reliance on kanji contents has proven to be very useful in minimizing the errors. The current research has thus found solutions to problems that are unique to the present work.
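The idea of letting kanji contents and sentence lengths work together can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details the abstract does not give. It assumes that shared ideographs have identical Unicode code points in both texts (often untrue for Japanese shinjitai versus simplified Chinese forms), and it only groups whole sentences, not clauses or phrases. The function names, the 0.7 weight, and the 1-1, 1-2, and 2-1 grouping patterns are all hypothetical choices made for illustration.

```python
def han_chars(text):
    """Return the set of CJK unified ideographs (U+4E00..U+9FFF) in text."""
    return {ch for ch in text if "\u4e00" <= ch <= "\u9fff"}


def kanji_overlap(ja, zh):
    """Dice coefficient over the ideographs the two strings share (lexical cue)."""
    a, b = han_chars(ja), han_chars(zh)
    if not a or not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def length_similarity(ja, zh):
    """Ratio of the shorter to the longer string length (statistical cue)."""
    if not ja or not zh:
        return 0.0
    return min(len(ja), len(zh)) / max(len(ja), len(zh))


def pair_score(ja, zh, weight=0.7):
    """Weighted mix of both cues; the weight is an arbitrary illustrative value."""
    return weight * kanji_overlap(ja, zh) + (1 - weight) * length_similarity(ja, zh)


def align(ja_sents, zh_sents, weight=0.7):
    """Dynamic programming over 1-1, 1-2 and 2-1 sentence groupings.

    Returns a list of (japanese_group, chinese_group) pairs that maximizes
    the summed pair_score, or raises ValueError if no grouping covers both texts.
    """
    n, m = len(ja_sents), len(zh_sents)
    neg_inf = float("-inf")
    best = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == neg_inf:
                continue  # this prefix pair cannot be reached
            for di, dj in ((1, 1), (1, 2), (2, 1)):
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                score = best[i][j] + pair_score(
                    "".join(ja_sents[i:ni]), "".join(zh_sents[j:nj]), weight
                )
                if score > best[ni][nj]:
                    best[ni][nj] = score
                    back[ni][nj] = (i, j)
    if best[n][m] == neg_inf:
        raise ValueError("no alignment with the allowed grouping patterns")
    # Walk the back-pointers from the end to recover the chosen groups.
    pairs, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        pairs.append((ja_sents[pi:i], zh_sents[pj:j]))
        i, j = pi, pj
    return pairs[::-1]


ja = ["私は学生です。", "京都大学で日本文学を研究します。"]
zh = ["我是学生。", "我在京都大学。", "研究日本文学。"]
for ja_group, zh_group in align(ja, zh):
    print(ja_group, "<->", zh_group)
```

In this toy input the second Japanese sentence is paired with the last two Chinese sentences together, the kind of many-sentence match the abstract attributes to structural differences between the two languages.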
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Chew Lim TAN, Makoto NAGAO, "Automatic Alignment of Japanese-Chinese Bilingual Texts" in IEICE TRANSACTIONS on Information,
vol. E78-D, no. 1, pp. 68-76, January 1995.
Abstract: Automatic alignment of bilingual texts is useful to example-based machine translation by facilitating the creation of example pairs of translation for the machine. Two main approaches to automatic alignment have been reported in the literature: the lexical approach and the statistical approach. The former looks for relationships between lexical contents of the bilingual texts in order to find alignment pairs, while the latter uses statistical correlation between sentence lengths of the bilingual texts as the basis of matching. This paper describes a combination of the two approaches in aligning Japanese-Chinese bilingual texts by allowing kanji contents and sentence lengths in the texts to work together in achieving an alignment process. Because of the sentential structure differences between Japanese and Chinese, matching at the sentence level may result in frequent matching between a number of sentences en masse. In view of this, the current work also attempts to create shorter alignment pairs by permitting sentences to be matched with clauses or phrases of the other text if possible. While such matching is more difficult and error-prone, the reliance on kanji contents has proven to be very useful in minimizing the errors. The current research has thus found solutions to problems that are unique to the present work.
URL: https://global.ieice.org/en_transactions/information/10.1587/e78-d_1_68/_p
@ARTICLE{e78-d_1_68,
author={Chew Lim TAN and Makoto NAGAO},
journal={IEICE TRANSACTIONS on Information},
title={Automatic Alignment of Japanese-Chinese Bilingual Texts},
year={1995},
volume={E78-D},
number={1},
pages={68--76},
abstract={Automatic alignment of bilingual texts is useful to example-based machine translation by facilitating the creation of example pairs of translation for the machine. Two main approaches to automatic alignment have been reported in the literature: the lexical approach and the statistical approach. The former looks for relationships between lexical contents of the bilingual texts in order to find alignment pairs, while the latter uses statistical correlation between sentence lengths of the bilingual texts as the basis of matching. This paper describes a combination of the two approaches in aligning Japanese-Chinese bilingual texts by allowing kanji contents and sentence lengths in the texts to work together in achieving an alignment process. Because of the sentential structure differences between Japanese and Chinese, matching at the sentence level may result in frequent matching between a number of sentences en masse. In view of this, the current work also attempts to create shorter alignment pairs by permitting sentences to be matched with clauses or phrases of the other text if possible. While such matching is more difficult and error-prone, the reliance on kanji contents has proven to be very useful in minimizing the errors. The current research has thus found solutions to problems that are unique to the present work.},
month={January}
}
TY - JOUR
TI - Automatic Alignment of Japanese-Chinese Bilingual Texts
T2 - IEICE TRANSACTIONS on Information
SP - 68
EP - 76
AU - Chew Lim TAN
AU - Makoto NAGAO
PY - 1995
JO - IEICE TRANSACTIONS on Information
VL - E78-D
IS - 1
JA - IEICE TRANSACTIONS on Information
Y1 - January 1995
AB - Automatic alignment of bilingual texts is useful to example-based machine translation by facilitating the creation of example pairs of translation for the machine. Two main approaches to automatic alignment have been reported in the literature: the lexical approach and the statistical approach. The former looks for relationships between lexical contents of the bilingual texts in order to find alignment pairs, while the latter uses statistical correlation between sentence lengths of the bilingual texts as the basis of matching. This paper describes a combination of the two approaches in aligning Japanese-Chinese bilingual texts by allowing kanji contents and sentence lengths in the texts to work together in achieving an alignment process. Because of the sentential structure differences between Japanese and Chinese, matching at the sentence level may result in frequent matching between a number of sentences en masse. In view of this, the current work also attempts to create shorter alignment pairs by permitting sentences to be matched with clauses or phrases of the other text if possible. While such matching is more difficult and error-prone, the reliance on kanji contents has proven to be very useful in minimizing the errors. The current research has thus found solutions to problems that are unique to the present work.
ER -