
Keyword Search Result

[Keyword] parallel corpus (4 hits)

Hits 1-4
  • Research on Mongolian-Chinese Translation Model Based on Transformer with Soft Context Data Augmentation Technique

    Qing-dao-er-ji REN  Yuan LI  Shi BAO  Yong-chao LIU  Xiu-hong CHEN  

     
    PAPER-Neural Networks and Bioengineering

    Publicized: 2021/11/19
    Vol: E105-A No: 5
    Page(s): 871-876

    As the mainstream approach in machine translation, neural machine translation (NMT) has achieved great improvements on many resource-rich languages, but its performance on low-resource languages is not yet very good. This paper uses data augmentation to construct a Mongolian-Chinese pseudo-parallel corpus in order to improve the translation ability of the Mongolian-Chinese translation model. Experiments show that these methods improve the model's translation ability. Finally, a translation model trained on a large-scale pseudo-parallel corpus and integrated with soft context data augmentation is obtained; its BLEU score is 39.3.

  • Automatically Extracting Parallel Sentences from Wikipedia Using Sequential Matching of Language Resources

    Juryong CHEON  Youngjoong KO  

     
    LETTER-Natural Language Processing

    Publicized: 2016/11/11
    Vol: E100-D No: 2
    Page(s): 405-408

    In this paper, we propose a method that uses language resources to find similar sentences, in order to build an English-Korean parallel corpus from Wikipedia. Instead of a traditional machine-readable dictionary, we use a Wiki-dictionary consisting of Wikipedia document titles together with bilingual example sentence pairs from a Web dictionary. We calculate the similarity between sentences by sequential matching of these language resources and evaluate the extracted parallel sentences. In the experiments, the proposed parallel sentence extraction method achieves an F1-score of 65.4%.

  • Correcting Syntactic Annotation Errors Using a Synchronous Tree Substitution Grammar

    Yoshihide KATO  Shigeki MATSUBARA  

     
    LETTER-Natural Language Processing

    Vol: E93-D No: 9
    Page(s): 2660-2663

    This paper proposes a method of correcting annotation errors in a treebank. Using a synchronous grammar, the method transforms parse trees that contain annotation errors into trees in which the errors are corrected. The synchronous grammar is automatically induced from the treebank. We report the experimental result of applying our method to the Penn Treebank. The result demonstrates that our method corrects syntactic annotation errors with high precision.

  • Two-Step Extraction of Bilingual Collocations by Using Word-Level Sorting

    Masahiko HARUNO  Satoru IKEHARA  

     
    PAPER-Artificial Intelligence and Cognitive Science

    Vol: E81-D No: 10
    Page(s): 1103-1110

    This paper describes a new method for learning bilingual collocations from sentence-aligned parallel corpora. Our method comprises two steps: (1) extracting useful word chunks (n-grams) in each language by word-level sorting and (2) constructing bilingual collocations by combining the word chunks acquired in step (1). We apply the method to two kinds of Japanese-English texts: (1) scientific articles that comprise relatively literal translations and (2) more challenging texts, namely a stock market bulletin in Japanese and its abstract in English. In both cases, domain-specific collocations are well captured even if they are not contained in dictionaries of specialized terms.
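
Step (1) of the abstract above, extracting word chunks by word-level sorting, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `repeated_ngrams`, the `min_count` threshold, and the sample sentence are assumptions made for illustration. The idea shown is only that sorting word-level suffixes of a tokenized text places identical n-grams next to each other, so repeated chunks can be counted in one pass.

```python
from itertools import groupby


def repeated_ngrams(tokens, n, min_count=2):
    """Return {n-gram: count} for n-grams seen at least min_count times.

    Hypothetical helper: sorts word-level suffixes (truncated to n words)
    so that identical n-grams become adjacent, then counts each run.
    """
    # One truncated suffix per start position that has n words available.
    suffixes = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    suffixes.sort()  # lexicographic sort over words, not characters
    counts = {}
    for gram, run in groupby(suffixes):
        size = sum(1 for _ in run)
        if size >= min_count:
            counts[gram] = size
    return counts


# Illustrative input only; not data from the paper.
tokens = "the stock market fell as the stock market opened".split()
print(repeated_ngrams(tokens, 3))
```

Truncating each suffix to n words keeps the sketch short; a fuller version would presumably sort complete suffixes once and read off chunks of every length from the shared prefixes of neighbouring entries.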