1-4hit |
Qing-dao-er-ji REN Yuan LI Shi BAO Yong-chao LIU Xiu-hong CHEN
As the mainstream approach in the field of machine translation, neural machine translation (NMT) has achieved great improvements on many rich-source languages, but performance of NMT for low-resource languages ae not very good yet. This paper uses data enhancement technology to construct Mongolian-Chinese pseudo parallel corpus, so as to improve the translation ability of Mongolian-Chinese translation model. Experiments show that the above methods can improve the translation ability of the translation model. Finally, a translation model trained with large-scale pseudo parallel corpus and integrated with soft context data enhancement technology is obtained, and its BLEU value is 39.3.
In this paper, we propose a method to find similar sentences based on language resources for building a parallel corpus between English and Korean from Wikipedia. We use a Wiki-dictionary consisted of document titles from the Wikipedia and bilingual example sentence pairs from Web dictionary instead of traditional machine readable dictionary. In this way, we perform similarity calculation between sentences using sequential matching of the language resources, and evaluate the extracted parallel sentences. In the experiments, the proposed parallel sentences extraction method finally shows 65.4% of F1-score.
Yoshihide KATO Shigeki MATSUBARA
This paper proposes a method of correcting annotation errors in a treebank. By using a synchronous grammar, the method transforms parse trees containing annotation errors into the ones whose errors are corrected. The synchronous grammar is automatically induced from the treebank. We report an experimental result of applying our method to the Penn Treebank. The result demonstrates that our method corrects syntactic annotation errors with high precision.
Masahiko HARUNO Satoru IKEHARA
This paper describes a new method for learning bilingual collocations from sentence-aligned parallel corpora. Our method comprises two steps: (1) extracting useful word chunks (n-grams) in each language by word-level sorting and (2) constructing bilingual collocations by combining the word-chunks acquired in stage (1). We apply the method to a two kinds of Japanese-English texts; (1) scientific articles that comprise relatively literal translations and (2) more challenging texts: a stock market bulletin in Japanese and its abstract in English. In both cases, domain specific collocations are well captured even if they were not contained in the dictionaries of specialized terms.