Automatically Extracting Parallel Sentences from Wikipedia Using Sequential Matching of Language Resources

Juryong CHEON, Youngjoong KO

Summary:

In this paper, we propose a method that uses language resources to find similar sentences, with the goal of building an English-Korean parallel corpus from Wikipedia. Instead of a traditional machine-readable dictionary, we use a Wiki-dictionary consisting of Wikipedia document titles, together with bilingual example sentence pairs from a Web dictionary. We calculate the similarity between sentences by sequentially matching these language resources, and we evaluate the extracted parallel sentences. In the experiments, the proposed parallel sentence extraction method achieves an F1-score of 65.4%.
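
To make the idea of sequential matching concrete, here is a minimal sketch in Python. It is not the authors' implementation: the summary does not give the scoring formula, so the function name `sequential_match_similarity`, the rule of consulting resources in a fixed order, the one-to-one use of Korean tokens, and the normalization by English sentence length are all assumptions made for illustration. The dictionary entries are toy data, not taken from the paper.

```python
from typing import Dict, List, Set


def sequential_match_similarity(
    en_tokens: List[str],
    ko_tokens: List[str],
    resources: List[Dict[str, Set[str]]],
) -> float:
    """Fraction of English tokens that find a translation in the Korean sentence.

    Assumption: resources are consulted in order, and a token is counted at the
    first resource that yields a match; later resources are skipped for it.
    """
    if not en_tokens:
        return 0.0
    ko_remaining = list(ko_tokens)
    matched = 0
    for token in en_tokens:
        for resource in resources:
            candidates = resource.get(token.lower(), set())
            hit = next((k for k in ko_remaining if k in candidates), None)
            if hit is not None:
                ko_remaining.remove(hit)  # each Korean token is used at most once
                matched += 1
                break
    return matched / len(en_tokens)


# Toy, hand-made resources (hypothetical entries, not from the paper).
wiki_dictionary = {"seoul": {"서울"}, "korea": {"한국", "대한민국"}}
example_dictionary = {"capital": {"수도"}, "is": {"이다"}}

en = ["Seoul", "is", "the", "capital", "of", "Korea"]
ko = ["서울", "한국", "수도", "이다"]
score = sequential_match_similarity(en, ko, [wiki_dictionary, example_dictionary])
print(round(score, 3))
```

In this sketch, a sentence pair would be kept as a parallel candidate when its score exceeds some threshold; the threshold value and the tokenization of Korean text (which in practice needs morphological analysis) are left open here.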

Publication
IEICE TRANSACTIONS on Information Vol.E100-D No.2 pp.405-408
Publication Date
2017/02/01
Publicized
2016/11/11
Online ISSN
1745-1361
DOI
10.1587/transinf.2016EDL8135
Type of Manuscript
LETTER
Category
Natural Language Processing

Authors

Juryong CHEON
  Dong-A University
Youngjoong KO
  Dong-A University

Keyword