Articulatory Modeling for Pronunciation Error Detection without Non-Native Training Data Based on DNN Transfer Learning

Richeng DUAN; Tatsuya KAWAHARA; Masatake DANTSUJI; Jinsong ZHANG

doi:10.1587/transinf.2017EDP7019

IEICE TRANSACTIONS on Information and Systems

Summary:

Aiming at detecting pronunciation errors produced by second language learners and providing corrective feedback related to articulation, we address effective articulatory models based on deep neural networks (DNNs). Articulatory attributes are defined for manner and place of articulation. Because non-native speech data are difficult to collect on a large scale, several transfer-learning-based modeling methods are explored to train these models for non-native speech without such data. We first investigate three closely related secondary tasks aimed at effective learning of DNN articulatory models. We also propose to exploit large speech corpora of the native and target languages to model inter-language phenomena. This kind of transfer learning can provide a better feature representation of non-native speech. Related-task transfer and language transfer learning are further combined at the network level. Compared with the conventional DNN used as the baseline, all proposed methods improved performance. In the native attribute recognition task, the network-level combination method reduced the recognition error rate by more than 10% relative for all articulatory attributes. The method was also applied to pronunciation error detection in Mandarin Chinese pronunciation learning by Japanese native speakers, where it achieved relative improvements of up to 17.0% in detection accuracy and up to 19.9% in F-score, also outperforming the lattice-based combination.
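The gains quoted in the summary are relative changes with respect to a baseline system, not absolute percentage points. The minimal Python sketch below shows that arithmetic together with the standard precision, recall and F-score formulas. The function names are hypothetical, all counts are invented for illustration, and the definition of detection accuracy as (correctly flagged errors + correctly accepted segments) / all segments is an assumption, since the summary does not state which definition the paper uses.

```python
"""Hypothetical helpers for reading relative gains in CAPT evaluation results.

All numbers below are invented for illustration; none come from the paper.
"""


def relative_reduction(baseline_error: float, new_error: float) -> float:
    """Relative error-rate reduction: (baseline - new) / baseline."""
    if baseline_error <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline_error - new_error) / baseline_error


def relative_improvement(baseline_score: float, new_score: float) -> float:
    """Relative gain of a higher-is-better metric: (new - baseline) / baseline."""
    if baseline_score <= 0:
        raise ValueError("baseline score must be positive")
    return (new_score - baseline_score) / baseline_score


def f_score(precision: float, recall: float) -> float:
    """F1: harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def detection_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Assumed convention: a 'positive' is a segment flagged as mispronounced.

    tp: real errors flagged      fp: correct segments wrongly flagged
    fn: real errors missed       tn: correct segments accepted
    """
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        # Assumed definition of detection accuracy: correct decisions / all decisions.
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f_score": f_score(precision, recall),
    }


if __name__ == "__main__":
    # Hypothetical: a 20% -> 18% error rate is 2 points absolute but 10% relative.
    print(f"{relative_reduction(20.0, 18.0):.1%}")  # prints 10.0%
    # Hypothetical confusion counts for a baseline and an improved detector.
    baseline = detection_metrics(tp=40, fp=10, fn=20, tn=130)
    improved = detection_metrics(tp=48, fp=8, fn=12, tn=132)
    gain = relative_improvement(baseline["f_score"], improved["f_score"])
    print(f"relative F-score gain: {gain:.1%}")
```

Read this way, "more than 10% relative" means each attribute's error rate fell by more than a tenth of its baseline value, which can correspond to a much smaller drop in absolute percentage points.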

Publication: IEICE TRANSACTIONS on Information and Systems, Vol. E100-D, No. 9, pp. 2174-2182

Publication Date: 2017/09/01

Publicized: 2017/05/26

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2017EDP7019

Type of Manuscript: PAPER

Category: Speech and Hearing

Authors

Richeng DUAN
  Kyoto University
Tatsuya KAWAHARA
  Kyoto University
Masatake DANTSUJI
  Kyoto University
Jinsong ZHANG
  Beijing Language and Culture University

Keywords

CALL, CAPT, pronunciation error detection, articulation modeling, transfer learning

Richeng DUAN, Tatsuya KAWAHARA, Masatake DANTSUJI, Jinsong ZHANG, "Articulatory Modeling for Pronunciation Error Detection without Non-Native Training Data Based on DNN Transfer Learning," in IEICE TRANSACTIONS on Information and Systems, vol. E100-D, no. 9, pp. 2174-2182, September 2017, doi: 10.1587/transinf.2017EDP7019.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2017EDP7019/_p

@ARTICLE{e100-d_9_2174,
author={Richeng DUAN and Tatsuya KAWAHARA and Masatake DANTSUJI and Jinsong ZHANG},
journal={IEICE TRANSACTIONS on Information and Systems},
title={Articulatory Modeling for Pronunciation Error Detection without Non-Native Training Data Based on DNN Transfer Learning},
year={2017},
volume={E100-D},
number={9},
pages={2174--2182},
keywords={CALL, CAPT, pronunciation error detection, articulation modeling, transfer learning},
doi={10.1587/transinf.2017EDP7019},
ISSN={1745-1361},
month=sep,
}

TY  - JOUR
TI  - Articulatory Modeling for Pronunciation Error Detection without Non-Native Training Data Based on DNN Transfer Learning
T2  - IEICE TRANSACTIONS on Information and Systems
SP  - 2174
EP  - 2182
AU  - Richeng DUAN
AU  - Tatsuya KAWAHARA
AU  - Masatake DANTSUJI
AU  - Jinsong ZHANG
PY  - 2017
DO  - 10.1587/transinf.2017EDP7019
JO  - IEICE TRANSACTIONS on Information and Systems
SN  - 1745-1361
VL  - E100-D
IS  - 9
JA  - IEICE TRANSACTIONS on Information and Systems
Y1  - 2017/09/01/
ER  - 
