Inference Discrepancy Based Curriculum Learning for Neural Machine Translation

Lei ZHOU; Ryohei SASANO; Koichi TAKEDA

doi:10.1587/transinf.2023EDP7048

Summary:

In practice, even a well-trained neural machine translation (NMT) model can still make biased inferences on the training set due to distribution shifts. In human learning, if we cannot reproduce something correctly after studying it multiple times, we consider it more difficult. Likewise, a training example that causes a large discrepancy between inference and reference implies higher learning difficulty for the NMT model. We therefore propose to adopt the inference discrepancy of each training example as the difficulty criterion and to rank training examples from easy to hard accordingly. In this way, a trained model can guide the curriculum learning process of an initial model identical to itself. The scheme is analogous to a pretrained vanilla model guiding the learning process of a curriculum NMT model. In this paper, we assess the effectiveness of the proposed training scheme and examine the influence of translation direction, evaluation metrics, and curriculum schedules. Experimental results on the translation benchmarks WMT14 English ⇒ German, WMT17 Chinese ⇒ English, and Multitarget TED Talks Task (MTTT) English ⇔ German, English ⇔ Chinese, and English ⇔ Russian demonstrate that the proposed method consistently improves translation performance over the advanced Transformer baseline.
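
The ranking step described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the summary does not specify how the discrepancy is measured, so the unigram-F1 stand-in `token_f1`, the `translate` callable (representing the pretrained vanilla model), and the helper names `rank_by_discrepancy` and `split_into_shards` are all assumptions made for illustration.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, str]  # (source sentence, reference translation)


def token_f1(hypothesis: str, reference: str) -> float:
    """Unigram F1 overlap; a stand-in for whatever metric the paper uses."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp or not ref:
        return 0.0
    # Multiset intersection counts each shared token at most as often
    # as it appears in both sentences.
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(hyp), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def rank_by_discrepancy(
    examples: Sequence[Example],
    translate: Callable[[str], str],
) -> List[Example]:
    """Sort training examples from easy (low discrepancy) to hard.

    `translate` is a hypothetical wrapper around the already trained model;
    discrepancy is assumed here to be 1 - token_f1(inference, reference).
    """
    scored = [
        (1.0 - token_f1(translate(src), ref), (src, ref))
        for src, ref in examples
    ]
    scored.sort(key=lambda pair: pair[0])  # stable: ties keep input order
    return [example for _, example in scored]


def split_into_shards(
    ranked: Sequence[Example], num_shards: int
) -> List[List[Example]]:
    """Cut the ranked data into consecutive difficulty shards."""
    if not ranked:
        return []
    size = -(-len(ranked) // num_shards)  # ceiling division
    return [list(ranked[i:i + size]) for i in range(0, len(ranked), size)]
```

A curriculum schedule would then decide when each shard becomes available to the freshly initialized model, for example by starting with the easiest shard and adding harder ones as training proceeds. Which schedule works best is one of the questions the paper examines, so the sharding shown here is only one possible arrangement.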

Publication: IEICE TRANSACTIONS on Information Vol.E107-D No.1 pp.135-143

Publication Date: 2024/01/01

Publicized: 2023/10/18

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2023EDP7048

Type of Manuscript: PAPER

Category: Natural Language Processing

Authors

Lei ZHOU
  Nagoya University
Ryohei SASANO
  Nagoya University
Koichi TAKEDA
  Nagoya University

Keywords

curriculum learning, machine translation, inference discrepancy, self-paced learning

Cite this

Lei ZHOU, Ryohei SASANO, Koichi TAKEDA, "Inference Discrepancy Based Curriculum Learning for Neural Machine Translation" in IEICE TRANSACTIONS on Information, vol. E107-D, no. 1, pp. 135-143, January 2024, doi: 10.1587/transinf.2023EDP7048.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2023EDP7048/_p

@ARTICLE{e107-d_1_135,
author={Lei ZHOU and Ryohei SASANO and Koichi TAKEDA},
journal={IEICE TRANSACTIONS on Information},
title={Inference Discrepancy Based Curriculum Learning for Neural Machine Translation},
year={2024},
volume={E107-D},
number={1},
pages={135--143},
keywords={curriculum learning, machine translation, inference discrepancy, self-paced learning},
doi={10.1587/transinf.2023EDP7048},
ISSN={1745-1361},
month={January},}

TY - JOUR
TI - Inference Discrepancy Based Curriculum Learning for Neural Machine Translation
T2 - IEICE TRANSACTIONS on Information
SP - 135
EP - 143
AU - Lei ZHOU
AU - Ryohei SASANO
AU - Koichi TAKEDA
PY - 2024
DO - 10.1587/transinf.2023EDP7048
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E107-D
IS - 1
JA - IEICE TRANSACTIONS on Information
Y1 - January 2024
ER -