Since a concept can be represented by different vocabularies, styles, and levels of detail, translation resembles a many-to-many mapping from a distribution of sentences in the source language to a distribution of sentences in the target language. This viewpoint, however, is not fully realized in current neural machine translation (NMT), which performs one-to-one sentence mapping. In this study, we represent the distribution itself as multiple paraphrased sentences, which enriches the model's understanding of context and enables it to produce diverse hypotheses. We use visually grounded paraphrasing (VGP), which uses images to constrain the concept being paraphrased, to guarantee that the generated paraphrases remain within the intended distribution. In this way, our method can also be seen as incorporating image information into NMT without using the images themselves. We implement this idea by crowdsourcing a paraphrasing corpus that realizes VGP and by constructing neural paraphrasers that act as expert models in an NMT system. Our experimental results show that the proposed VGP augmentation strategies improve over a vanilla NMT baseline.
Johanes EFFENDI
Nara Institute of Science and Technology / RIKEN Center for Advanced Intelligence Project (AIP)
Sakriani SAKTI
Nara Institute of Science and Technology / RIKEN Center for Advanced Intelligence Project (AIP)
Katsuhito SUDOH
Nara Institute of Science and Technology / RIKEN Center for Advanced Intelligence Project (AIP)
Satoshi NAKAMURA
Nara Institute of Science and Technology / RIKEN Center for Advanced Intelligence Project (AIP)
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Johanes EFFENDI, Sakriani SAKTI, Katsuhito SUDOH, Satoshi NAKAMURA, "Leveraging Neural Caption Translation with Visually Grounded Paraphrase Augmentation" in IEICE TRANSACTIONS on Information,
vol. E103-D, no. 3, pp. 674-683, March 2020, doi: 10.1587/transinf.2019EDP7065.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2019EDP7065/_p
@ARTICLE{e103-d_3_674,
  author={Johanes EFFENDI and Sakriani SAKTI and Katsuhito SUDOH and Satoshi NAKAMURA},
  journal={IEICE TRANSACTIONS on Information},
  title={Leveraging Neural Caption Translation with Visually Grounded Paraphrase Augmentation},
  year={2020},
  volume={E103-D},
  number={3},
  pages={674--683},
  abstract={Since a concept can be represented by different vocabularies, styles, and levels of detail, translation resembles a many-to-many mapping from a distribution of sentences in the source language to a distribution of sentences in the target language. This viewpoint, however, is not fully realized in current neural machine translation (NMT), which performs one-to-one sentence mapping. In this study, we represent the distribution itself as multiple paraphrased sentences, which enriches the model's understanding of context and enables it to produce diverse hypotheses. We use visually grounded paraphrasing (VGP), which uses images to constrain the concept being paraphrased, to guarantee that the generated paraphrases remain within the intended distribution. In this way, our method can also be seen as incorporating image information into NMT without using the images themselves. We implement this idea by crowdsourcing a paraphrasing corpus that realizes VGP and by constructing neural paraphrasers that act as expert models in an NMT system. Our experimental results show that the proposed VGP augmentation strategies improve over a vanilla NMT baseline.},
  doi={10.1587/transinf.2019EDP7065},
  ISSN={1745-1361},
  month={March},
}
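If the BibTeX record above needs to be consumed programmatically (for example, to build a reference list), its fields can be extracted with a small stdlib-only sketch. `parse_bibtex_fields` is a hypothetical helper, not part of any library, and it assumes simple `key={value}` fields without nested braces, which holds for this record:

```python
import re

def parse_bibtex_fields(entry: str) -> dict:
    """Extract simple key={value} fields from a single BibTeX entry.

    Assumes field values contain no nested braces.
    """
    return {key.lower(): value.strip()
            for key, value in re.findall(r"(\w+)\s*=\s*\{([^{}]*)\}", entry)}

record = """@ARTICLE{e103-d_3_674,
  author={Johanes EFFENDI and Sakriani SAKTI and Katsuhito SUDOH and Satoshi NAKAMURA},
  journal={IEICE TRANSACTIONS on Information},
  title={Leveraging Neural Caption Translation with Visually Grounded Paraphrase Augmentation},
  year={2020},
  volume={E103-D},
  number={3},
  pages={674--683},
  doi={10.1587/transinf.2019EDP7065},
}"""

fields = parse_bibtex_fields(record)
print(fields["doi"])     # 10.1587/transinf.2019EDP7065
print(fields["volume"])  # E103-D
```

For entries with nested braces or `@string` macros, a full parser such as the `bibtexparser` package would be a safer choice than this regex sketch.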
TY - JOUR
TI - Leveraging Neural Caption Translation with Visually Grounded Paraphrase Augmentation
T2 - IEICE TRANSACTIONS on Information
SP - 674
EP - 683
AU - Johanes EFFENDI
AU - Sakriani SAKTI
AU - Katsuhito SUDOH
AU - Satoshi NAKAMURA
PY - 2020
DO - 10.1587/transinf.2019EDP7065
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E103-D
IS - 3
JA - IEICE TRANSACTIONS on Information
Y1 - 2020/03//
AB - Since a concept can be represented by different vocabularies, styles, and levels of detail, translation resembles a many-to-many mapping from a distribution of sentences in the source language to a distribution of sentences in the target language. This viewpoint, however, is not fully realized in current neural machine translation (NMT), which performs one-to-one sentence mapping. In this study, we represent the distribution itself as multiple paraphrased sentences, which enriches the model's understanding of context and enables it to produce diverse hypotheses. We use visually grounded paraphrasing (VGP), which uses images to constrain the concept being paraphrased, to guarantee that the generated paraphrases remain within the intended distribution. In this way, our method can also be seen as incorporating image information into NMT without using the images themselves. We implement this idea by crowdsourcing a paraphrasing corpus that realizes VGP and by constructing neural paraphrasers that act as expert models in an NMT system. Our experimental results show that the proposed VGP augmentation strategies improve over a vanilla NMT baseline.
ER -