An improved method for extracting translation equivalents from bilingual comparable corpora according to contextual similarity was developed. This method has two main features. First, a seed bilingual lexicon--which is used to bridge contexts in different languages--is adapted to the corpora from which translation equivalents are to be extracted. Second, the contextual similarity is evaluated by using a combination of similarity measures defined in opposite directions. An experiment using Wall Street Journal and Nihon Keizai Shimbun corpora, together with the EDR bilingual dictionary, demonstrated the effectiveness of the method; it produced lists of candidate translation equivalents with an accuracy of around 30% for frequently occurring unknown words. The method thus proved to be useful for improving the coverage of a bilingual lexicon.
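The abstract describes a method that scores a foreign-language candidate by how well its context matches an unknown word's context. The seed lexicon carries one context into the other language, and the comparison is made in both directions. A short Python sketch can illustrate that general idea. It is not the paper's algorithm. Several choices here are assumptions made for brevity: the co-occurrence counts, cosine similarity, splitting counts evenly across ambiguous lexicon entries, and using a geometric mean to combine the two directional scores. The lexicon-adaptation step is omitted. Every function name and toy entry is hypothetical.

```python
import math
from collections import defaultdict

# Minimal sketch (not the paper's method): context vectors are plain dicts of
# co-occurrence counts, and the seed lexicon maps a word to its translations.


def translate_context(ctx, lexicon):
    """Project a context vector into the other language via the seed lexicon.

    Assumption: a word with k translations contributes count/k to each one;
    context words missing from the lexicon are simply dropped.
    """
    projected = defaultdict(float)
    for word, count in ctx.items():
        translations = lexicon.get(word, [])
        for t in translations:
            projected[t] += count / len(translations)
    return dict(projected)


def invert(lexicon):
    """Build the reverse (target -> source) lexicon from the forward one."""
    inverse = defaultdict(list)
    for src, targets in lexicon.items():
        for t in targets:
            inverse[t].append(src)
    return dict(inverse)


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def bidirectional_similarity(src_ctx, tgt_ctx, lexicon):
    """Combine source->target and target->source similarities.

    Assumption: geometric mean, so both directions must agree for a high score.
    """
    forward = cosine(translate_context(src_ctx, lexicon), tgt_ctx)
    backward = cosine(translate_context(tgt_ctx, invert(lexicon)), src_ctx)
    return math.sqrt(forward * backward)


def rank_candidates(src_ctx, candidates, lexicon, top_n=10):
    """Return the top_n (word, score) pairs, best first."""
    scored = [(w, bidirectional_similarity(src_ctx, ctx, lexicon))
              for w, ctx in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical toy data: an English word absent from the seed lexicon,
# described by the counts of its (English) context words.
seed_lexicon = {"stock": ["株"], "price": ["価格"], "market": ["市場"]}
unknown_word_ctx = {"stock": 5, "market": 3}
japanese_candidates = {
    "ナスダック": {"株": 4, "市場": 3},
    "天気": {"雨": 6, "価格": 1},
}

ranked = rank_candidates(unknown_word_ctx, japanese_candidates, seed_lexicon)
print(ranked)
```

On this toy data, the unknown word's English context overlaps with the context of ナスダック once the seed pairs are applied, so ナスダック ranks first and 天気 scores 0.0. Because the two directions are combined with a geometric mean, a candidate scores well only when both projections agree. A minimum or an arithmetic mean would be equally plausible choices for this illustration.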
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Hiroyuki KAJI, "Extracting Translation Equivalents from Bilingual Comparable Corpora," in IEICE TRANSACTIONS on Information and Systems, vol. E88-D, no. 2, pp. 313-323, February 2005, doi: 10.1093/ietisy/e88-d.2.313.
URL: https://global.ieice.org/en_transactions/information/10.1093/ietisy/e88-d.2.313/_p
@ARTICLE{e88-d_2_313,
author={Hiroyuki KAJI},
journal={IEICE TRANSACTIONS on Information and Systems},
title={Extracting Translation Equivalents from Bilingual Comparable Corpora},
year={2005},
volume={E88-D},
number={2},
pages={313-323},
abstract={An improved method for extracting translation equivalents from bilingual comparable corpora according to contextual similarity was developed. This method has two main features. First, a seed bilingual lexicon--which is used to bridge contexts in different languages--is adapted to the corpora from which translation equivalents are to be extracted. Second, the contextual similarity is evaluated by using a combination of similarity measures defined in opposite directions. An experiment using Wall Street Journal and Nihon Keizai Shimbun corpora, together with the EDR bilingual dictionary, demonstrated the effectiveness of the method; it produced lists of candidate translation equivalents with an accuracy of around 30% for frequently occurring unknown words. The method thus proved to be useful for improving the coverage of a bilingual lexicon.},
doi={10.1093/ietisy/e88-d.2.313},
month=feb,
}
TY  - JOUR
TI  - Extracting Translation Equivalents from Bilingual Comparable Corpora
T2  - IEICE TRANSACTIONS on Information and Systems
SP  - 313
EP  - 323
AU  - KAJI, Hiroyuki
PY  - 2005
DO  - 10.1093/ietisy/e88-d.2.313
JO  - IEICE TRANSACTIONS on Information and Systems
VL  - E88-D
IS  - 2
JA  - IEICE TRANSACTIONS on Information and Systems
Y1  - 2005/02//
AB  - An improved method for extracting translation equivalents from bilingual comparable corpora according to contextual similarity was developed. This method has two main features. First, a seed bilingual lexicon--which is used to bridge contexts in different languages--is adapted to the corpora from which translation equivalents are to be extracted. Second, the contextual similarity is evaluated by using a combination of similarity measures defined in opposite directions. An experiment using Wall Street Journal and Nihon Keizai Shimbun corpora, together with the EDR bilingual dictionary, demonstrated the effectiveness of the method; it produced lists of candidate translation equivalents with an accuracy of around 30% for frequently occurring unknown words. The method thus proved to be useful for improving the coverage of a bilingual lexicon.
ER  - 