Since deep learning was introduced, a series of advances has been published in the field of machine translation (MT). However, Korean-Vietnamese MT systems face many challenges because of a lack of data, the multiple meanings of individual words, and grammatical diversity that depends on context. As a result, the quality of Korean-Vietnamese MT systems is still sub-optimal. This paper discusses a method for applying Named Entity Recognition (NER) and Part-of-Speech (POS) tagging to Vietnamese sentences to improve the performance of Korean-Vietnamese MT systems. For the implementation, we used a tool to tag NER and POS in Vietnamese sentences, together with a Korean-Vietnamese parallel corpus of more than 450K sentence pairs obtained in our previous research. The experimental results indicate that tagging NER and POS in Vietnamese sentences can improve the quality of Korean-Vietnamese Neural MT (NMT) in terms of the Bilingual Evaluation Understudy (BLEU) and Translation Error Rate (TER) scores. After both NER and POS tagging were applied to the Vietnamese corpus, our MT systems gained 1.21 BLEU points and reduced TER by 2.33 points on average (lower TER is better). Owing to the structural features of the two languages, MT systems in the Korean-to-Vietnamese direction consistently achieve better BLEU and TER results than those in the reverse direction.
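The abstract does not say how the NER and POS tags are encoded in the NMT input. As one illustration only, the sketch below assumes the (unnamed) tagging tool returns a `(word, POS, NER)` triple for each word-segmented Vietnamese word and attaches the tags to each word as inline suffixes. The function names `annotate` and `strip_tags`, the `|` separator, the underscore-joined multi-syllable words, and the example tag labels (`Np`, `B-LOC`, and so on) are hypothetical choices, not the paper's actual scheme.

```python
from typing import List, Tuple

# Hypothetical tagger output: one (word, POS, NER) triple per Vietnamese word.
# Multi-syllable words are assumed to be pre-joined with "_" by a word segmenter.
TaggedWord = Tuple[str, str, str]


def annotate(tagged: List[TaggedWord], sep: str = "|") -> str:
    """Attach the POS label, and any non-"O" NER label, to each word.

    Illustrative encoding only; assumes no word itself contains `sep`.
    """
    out = []
    for word, pos, ner in tagged:
        token = f"{word}{sep}{pos}"
        if ner != "O":  # "O" marks words outside any named entity (BIO scheme)
            token += f"{sep}{ner}"
        out.append(token)
    return " ".join(out)


def strip_tags(annotated: str, sep: str = "|") -> str:
    """Recover plain text by keeping only the part of each token before `sep`."""
    return " ".join(tok.split(sep, 1)[0] for tok in annotated.split())


if __name__ == "__main__":
    # Hypothetical example sentence ("I live in Hanoi.") with illustrative tags.
    sentence = [
        ("Tôi", "P", "O"),
        ("sống", "V", "O"),
        ("ở", "E", "O"),
        ("Hà_Nội", "Np", "B-LOC"),
        (".", "CH", "O"),
    ]
    annotated = annotate(sentence)
    print(annotated)              # Tôi|P sống|V ở|E Hà_Nội|Np|B-LOC .|CH
    print(strip_tags(annotated))  # Tôi sống ở Hà_Nội .
```

Inline suffixes keep the data in plain text that any NMT toolkit can read, at the cost of a larger vocabulary; an alternative is to feed tags as separate input factors. The abstract does not indicate which design the authors used, and when Vietnamese is the target side, a step like `strip_tags` would be needed before scoring the output with BLEU or TER.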
Van-Hai VU
University of Ulsan
Quang-Phuoc NGUYEN
University of Ulsan
Kiem-Hieu NGUYEN
Hanoi University of Science and Technology
Joon-Choul SHIN
University of Ulsan
Cheol-Young OCK
University of Ulsan
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Van-Hai VU, Quang-Phuoc NGUYEN, Kiem-Hieu NGUYEN, Joon-Choul SHIN, Cheol-Young OCK, "Korean-Vietnamese Neural Machine Translation with Named Entity Recognition and Part-of-Speech Tags," in IEICE Transactions on Information and Systems, vol. E103-D, no. 4, pp. 866-873, April 2020, doi: 10.1587/transinf.2019EDP7154.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2019EDP7154/_p
@ARTICLE{e103-d_4_866,
author={Van-Hai VU and Quang-Phuoc NGUYEN and Kiem-Hieu NGUYEN and Joon-Choul SHIN and Cheol-Young OCK},
journal={IEICE Transactions on Information and Systems},
title={Korean-Vietnamese Neural Machine Translation with Named Entity Recognition and Part-of-Speech Tags},
year={2020},
volume={E103-D},
number={4},
pages={866--873},
abstract={Since deep learning was introduced, a series of achievements has been published in the field of automatic machine translation (MT). However, Korean-Vietnamese MT systems face many challenges because of a lack of data, multiple meanings of individual words, and grammatical diversity that depends on context. Therefore, the quality of Korean-Vietnamese MT systems is still sub-optimal. This paper discusses a method for applying Named Entity Recognition (NER) and Part-of-Speech (POS) tagging to Vietnamese sentences to improve the performance of Korean-Vietnamese MT systems. In terms of implementation, we used a tool to tag NER and POS in Vietnamese sentences. In addition, we had access to a Korean-Vietnamese parallel corpus with more than 450K paired sentences from our previous research paper. The experimental results indicate that tagging NER and POS in Vietnamese sentences can improve the quality of Korean-Vietnamese Neural MT (NMT) in terms of the Bi-Lingual Evaluation Understudy (BLEU) and Translation Error Rate (TER) score. On average, our MT system improved by 1.21 BLEU points or 2.33 TER scores after applying both NER and POS tagging to the Vietnamese corpus. Due to the structural features of language, the MT systems in the Korean to Vietnamese direction always give better BLEU and TER results than translation machines in the reverse direction.},
keywords={},
doi={10.1587/transinf.2019EDP7154},
ISSN={1745-1361},
month={April},
}
TY  - JOUR
TI  - Korean-Vietnamese Neural Machine Translation with Named Entity Recognition and Part-of-Speech Tags
T2  - IEICE Transactions on Information and Systems
SP  - 866
EP  - 873
AU  - Van-Hai VU
AU  - Quang-Phuoc NGUYEN
AU  - Kiem-Hieu NGUYEN
AU  - Joon-Choul SHIN
AU  - Cheol-Young OCK
PY  - 2020
DO  - 10.1587/transinf.2019EDP7154
JO  - IEICE Transactions on Information and Systems
SN  - 1745-1361
VL  - E103-D
IS  - 4
JA  - IEICE Trans. Inf. & Syst.
Y1  - 2020/04//
AB  - Since deep learning was introduced, a series of achievements has been published in the field of automatic machine translation (MT). However, Korean-Vietnamese MT systems face many challenges because of a lack of data, multiple meanings of individual words, and grammatical diversity that depends on context. Therefore, the quality of Korean-Vietnamese MT systems is still sub-optimal. This paper discusses a method for applying Named Entity Recognition (NER) and Part-of-Speech (POS) tagging to Vietnamese sentences to improve the performance of Korean-Vietnamese MT systems. In terms of implementation, we used a tool to tag NER and POS in Vietnamese sentences. In addition, we had access to a Korean-Vietnamese parallel corpus with more than 450K paired sentences from our previous research paper. The experimental results indicate that tagging NER and POS in Vietnamese sentences can improve the quality of Korean-Vietnamese Neural MT (NMT) in terms of the Bi-Lingual Evaluation Understudy (BLEU) and Translation Error Rate (TER) score. On average, our MT system improved by 1.21 BLEU points or 2.33 TER scores after applying both NER and POS tagging to the Vietnamese corpus. Due to the structural features of language, the MT systems in the Korean to Vietnamese direction always give better BLEU and TER results than translation machines in the reverse direction.
ER  - 