IEICE global.ieice.org Site

Keyword Search Result

[Keyword] parallel corpus (4 hits)


Research on Mongolian-Chinese Translation Model Based on Transformer with Soft Context Data Augmentation Technique
Qing-dao-er-ji REN, Yuan LI, Shi BAO, Yong-chao LIU, Xiu-hong CHEN

PAPER-Neural Networks and Bioengineering

Publicized:
2021/11/19
Vol:
E105-A No:5
Page(s):
871-876
As the mainstream approach to machine translation, neural machine translation (NMT) has achieved great improvements for many high-resource languages, but its performance on low-resource languages is still poor. This paper uses data augmentation techniques to construct a Mongolian-Chinese pseudo-parallel corpus in order to improve the translation ability of a Mongolian-Chinese translation model. Experiments show that these methods improve the translation ability of the model. Finally, a translation model trained on a large-scale pseudo-parallel corpus and integrated with soft context data augmentation is obtained, achieving a BLEU score of 39.3.
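The abstract does not say how soft context data augmentation is implemented. As a rough illustration of the general idea only, the sketch below replaces a randomly chosen token's embedding with the expected embedding under a language model's distribution over the vocabulary, so the model sees a "soft" mixture of plausible substitutes instead of one word. The names `soft_augment`, `lm_distribution`, and `gamma` are hypothetical, and the toy language model in the usage is a stand-in.

```python
import random


def soft_augment(tokens, embeddings, lm_distribution, gamma=0.15, rng=None):
    """Return one embedding vector per token (hypothetical sketch).

    With probability `gamma`, a token's embedding is replaced by the
    probability-weighted sum of vocabulary embeddings, using a distribution
    returned by `lm_distribution(tokens, i)` (assumed: dict word -> prob).
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed for reproducible augmentation
    dim = len(next(iter(embeddings.values())))
    out = []
    for i, tok in enumerate(tokens):
        if rng.random() < gamma:
            # Soft word: expectation of embeddings under the LM distribution.
            soft = [0.0] * dim
            for word, p in lm_distribution(tokens, i).items():
                vec = embeddings[word]
                for d in range(dim):
                    soft[d] += p * vec[d]
            out.append(soft)
        else:
            # Keep the original (hard) embedding.
            out.append(list(embeddings[tok]))
    return out


if __name__ == "__main__":
    emb = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
    toy_lm = lambda tokens, i: {"a": 0.5, "b": 0.5}  # stand-in language model
    print(soft_augment(["a", "b"], emb, toy_lm, gamma=1.0))
```

Raising `gamma` augments more positions per sentence; in practice the distribution would come from a trained language model rather than a fixed dictionary.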
Automatically Extracting Parallel Sentences from Wikipedia Using Sequential Matching of Language Resources
Juryong CHEON, Youngjoong KO

LETTER-Natural Language Processing

Publicized:
2016/11/11
Vol:
E100-D No:2
Page(s):
405-408
In this paper, we propose a method for finding similar sentences based on language resources in order to build an English-Korean parallel corpus from Wikipedia. Instead of a traditional machine-readable dictionary, we use a Wiki-dictionary built from Wikipedia document titles and bilingual example sentence pairs from a Web dictionary. We compute the similarity between sentences by sequentially matching these language resources and evaluate the extracted parallel sentences. In the experiments, the proposed parallel sentence extraction method achieves an F1-score of 65.4%.
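The abstract does not give the similarity formula. The sketch below is one plausible reading of "sequential matching", offered as an assumption: for each English token, the bilingual resources are consulted in order (for example the Wiki-dictionary first, then a lexicon derived from example sentence pairs), and the token counts as matched at the first resource whose translation appears in the Korean sentence. The score is the fraction of matched tokens. `sequential_match_score`, `extract_parallel`, and `threshold` are hypothetical names.

```python
def sequential_match_score(src_tokens, tgt_tokens, resources):
    """Fraction of source tokens whose translation occurs in the target.

    `resources` is an ordered list of dicts mapping a source word to a list
    of candidate translations (hypothetical format). Resources are tried in
    order; a later one is consulted only if earlier ones give no match.
    """
    if not src_tokens:
        return 0.0
    tgt = set(tgt_tokens)
    matched = 0
    for tok in src_tokens:
        for resource in resources:
            if any(t in tgt for t in resource.get(tok, ())):
                matched += 1
                break  # stop at the first resource that yields a match
    return matched / len(src_tokens)


def extract_parallel(src_sents, tgt_sents, resources, threshold=0.5):
    """Pair each source sentence with its best-scoring target sentence,
    keeping the pair only if the score reaches `threshold` (assumed)."""
    pairs = []
    for s in src_sents:
        scored = [(sequential_match_score(s, t, resources), t) for t in tgt_sents]
        if not scored:
            continue
        score, best = max(scored, key=lambda x: x[0])
        if score >= threshold:
            pairs.append((s, best))
    return pairs


if __name__ == "__main__":
    wiki_dict = {"Seoul": ["서울"]}          # from article titles (toy)
    example_lex = {"capital": ["수도"]}      # from example pairs (toy)
    src = ["Seoul", "is", "the", "capital"]
    candidates = [["부산", "은", "항구", "이다"], ["서울", "은", "수도", "이다"]]
    print(extract_parallel([src], candidates, [wiki_dict, example_lex]))
```

Ordering the resources lets a high-precision source such as title translations take priority, with the broader example-sentence lexicon acting as a fallback.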
Correcting Syntactic Annotation Errors Using a Synchronous Tree Substitution Grammar
Yoshihide KATO, Shigeki MATSUBARA

LETTER-Natural Language Processing

Vol:
E93-D No:9
Page(s):
2660-2663
This paper proposes a method for correcting annotation errors in a treebank. Using a synchronous grammar, the method transforms parse trees that contain annotation errors into corrected trees. The synchronous grammar is induced automatically from the treebank. We report the results of applying our method to the Penn Treebank, which demonstrate that it corrects syntactic annotation errors with high precision.
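As a toy illustration of how a synchronous rule can rewrite a tree, the sketch below pairs a source tree fragment with a target tree fragment whose variables (`?d`, `?w`) are linked, so the subtrees bound on the source side are substituted into the target side. This is a simplification: the abstract states that the grammar is induced automatically from the treebank, whereas here a single hand-written rule stands in for it, and rules are applied deterministically on first match. The rule (a noun mistagged `VBG`) and the names `match`, `substitute`, and `correct` are hypothetical.

```python
def match(pattern, tree, bindings):
    """Match `pattern` against `tree`. Strings starting with '?' are
    variables that bind whole subtrees (the linked substitution sites)."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in bindings:
            return bindings[pattern] == tree
        bindings[pattern] = tree
        return True
    if isinstance(pattern, str) or isinstance(tree, str):
        return pattern == tree
    if len(pattern) != len(tree):
        return False
    return all(match(p, t, bindings) for p, t in zip(pattern, tree))


def substitute(template, bindings):
    """Build the target-side tree, filling variables with bound subtrees."""
    if isinstance(template, str):
        return bindings.get(template, template) if template.startswith("?") else template
    return tuple(substitute(t, bindings) for t in template)


def correct(tree, rules):
    """Top-down rewrite: apply the first rule whose source side matches the
    node, then recurse into the children. Trees are (label, *children)."""
    if isinstance(tree, str):
        return tree
    for source, target in rules:
        bindings = {}
        if match(source, tree, bindings):
            tree = substitute(target, bindings)
            break
    if isinstance(tree, str):
        return tree
    return (tree[0],) + tuple(correct(child, rules) for child in tree[1:])


if __name__ == "__main__":
    # Hypothetical rule: a noun wrongly tagged VBG inside an NP becomes NN.
    rules = [(("NP", "?d", ("VBG", "?w")), ("NP", "?d", ("NN", "?w")))]
    tree = ("S", ("NP", ("DT", "the"), ("VBG", "meeting")), ("VP", ("VBD", "ended")))
    print(correct(tree, rules))
```

Because the variables link subtrees across both sides, everything outside the rule's fragment is copied unchanged, which is what keeps such corrections local.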
Two-Step Extraction of Bilingual Collocations by Using Word-Level Sorting
Masahiko HARUNO, Satoru IKEHARA

PAPER-Artificial Intelligence and Cognitive Science

Vol:
E81-D No:10
Page(s):
1103-1110
This paper describes a new method for learning bilingual collocations from sentence-aligned parallel corpora. The method comprises two steps: (1) extracting useful word chunks (n-grams) in each language by word-level sorting, and (2) constructing bilingual collocations by combining the word chunks acquired in step (1). We apply the method to two kinds of Japanese-English texts: (1) scientific articles with relatively literal translations, and (2) more challenging texts, namely a stock market bulletin in Japanese and its abstract in English. In both cases, domain-specific collocations are captured well, even when they do not appear in dictionaries of specialized terms.
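A minimal sketch of the two steps, assuming tokenized, sentence-aligned input. Step (1) sorts all word-level suffixes so that repeated n-grams end up adjacent and can be counted in one scan. Step (2) pairs chunks that co-occur in aligned sentences; the Dice coefficient used for scoring is our assumption, since the abstract does not name the association measure. `frequent_chunks`, `bilingual_collocations`, `min_freq`, and `min_dice` are hypothetical names.

```python
from collections import Counter
from itertools import product


def frequent_chunks(sentences, max_n=3, min_freq=2):
    """Step (1): n-grams (2 <= n <= max_n) occurring at least `min_freq`
    times, found by sorting word-level suffixes and counting adjacent runs."""
    suffixes = sorted(tuple(sent[i:]) for sent in sentences for i in range(len(sent)))
    chunks = {}
    for n in range(2, max_n + 1):
        prev, run = None, 0
        for suf in suffixes:
            if len(suf) < n:
                continue
            gram = suf[:n]  # identical n-grams are contiguous after sorting
            if gram == prev:
                run += 1
            else:
                if prev is not None and run >= min_freq:
                    chunks[prev] = run
                prev, run = gram, 1
        if prev is not None and run >= min_freq:
            chunks[prev] = run
    return chunks


def contains(sent, gram):
    n = len(gram)
    return any(tuple(sent[i:i + n]) == gram for i in range(len(sent) - n + 1))


def bilingual_collocations(pairs, max_n=3, min_freq=2, min_dice=0.8):
    """Step (2): pair source and target chunks by sentence co-occurrence,
    scored with Dice = 2 * joint / (count_src + count_tgt) (assumed)."""
    src_chunks = frequent_chunks([s for s, _ in pairs], max_n, min_freq)
    tgt_chunks = frequent_chunks([t for _, t in pairs], max_n, min_freq)
    src_df, tgt_df, joint = Counter(), Counter(), Counter()
    for s, t in pairs:
        xs = {g for g in src_chunks if contains(s, g)}
        ys = {g for g in tgt_chunks if contains(t, g)}
        src_df.update(xs)
        tgt_df.update(ys)
        joint.update(product(xs, ys))
    result = [(x, y, 2 * c / (src_df[x] + tgt_df[y])) for (x, y), c in joint.items()]
    return sorted((r for r in result if r[2] >= min_dice), key=lambda r: -r[2])


if __name__ == "__main__":
    pairs = [
        (["日経", "平均", "株価", "が", "上昇"], ["the", "Nikkei", "average", "rose"]),
        (["日経", "平均", "株価", "が", "下落"], ["the", "Nikkei", "average", "fell"]),
        (["円", "相場", "が", "上昇"], ["the", "yen", "rose"]),
    ]
    for x, y, dice in bilingual_collocations(pairs)[:3]:
        print(" ".join(x), "<->", " ".join(y), round(dice, 2))
```

Counting n-grams from sorted suffixes avoids enumerating every n-gram into a hash table up front, which is the usual motivation for sorting-based chunk extraction on large corpora.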