IEICE TRANSACTIONS on Information

End-to-End Multilingual Speech Recognition System with Language Supervision Training

Danyang LIU, Ji XU, Pengyuan ZHANG

Summary:

End-to-end (E2E) multilingual automatic speech recognition (ASR) systems aim to recognize speech in multiple languages within a unified framework. In the current E2E multilingual ASR framework, the output prediction for a specific language is not constrained in the range of modeling units it may emit. In this paper, a language supervision training strategy is proposed that uses language masks to constrain the output distribution of the neural network. To simulate the multilingual ASR scenario in which the language identity is unknown, a language identification (LID) classifier is applied to estimate the language masks. On four Babel corpora, the proposed E2E multilingual ASR system achieved an average absolute word error rate (WER) reduction of 2.6% compared with the multilingual baseline system.
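
The idea of constraining the output distribution with a language mask can be illustrated with a minimal sketch. This is not the authors' implementation: the five-unit shared inventory, the language names "A" and "B", the function names, and the hard argmax decision over LID posteriors are all assumptions made for illustration. The summary states only that masks constrain the output distribution and that a LID classifier estimates them.

```python
import math

# Hypothetical shared inventory of 5 modeling units.
# Language "A" uses units 0-2 and language "B" uses units 2-4 (assumed).
LANGUAGE_MASKS = {
    "A": [1, 1, 1, 0, 0],
    "B": [0, 0, 1, 1, 1],
}


def masked_softmax(logits, mask):
    """Softmax over logits, restricted to the units where mask is 1."""
    if not any(mask):
        raise ValueError("mask must allow at least one unit")
    # Disallowed units are set to -inf, so they receive probability 0.
    masked = [x if m else float("-inf") for x, m in zip(logits, mask)]
    peak = max(masked)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]


def mask_from_lid(lid_posteriors, language_masks):
    """Select the mask of the most probable language.

    A hard decision is an assumption here; the summary does not say
    how LID outputs are turned into masks.
    """
    best = max(lid_posteriors, key=lid_posteriors.get)
    return language_masks[best]


# Unit 3 has the highest raw logit but lies outside language "A".
logits = [0.5, 1.0, 0.2, 3.0, -1.0]
mask = mask_from_lid({"A": 0.9, "B": 0.1}, LANGUAGE_MASKS)
probs = masked_softmax(logits, mask)
best_unit = probs.index(max(probs))
```

In this sketch the unmasked prediction would be unit 3, while the masked distribution puts all probability mass on units 0 to 2 and selects unit 1. When the language identity is known, the mask can be looked up directly instead of being estimated from LID posteriors.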

Publication: IEICE TRANSACTIONS on Information Vol.E103-D No.6 pp.1427-1430
Publication Date: 2020/06/01
Publicized: 2020/03/19
Online ISSN: 1745-1361
DOI: 10.1587/transinf.2019EDL8214
Type of Manuscript: LETTER
Category: Speech and Hearing

Authors

Danyang LIU
  Chinese Academy of Sciences, University of Chinese Academy of Sciences
Ji XU
  Chinese Academy of Sciences, University of Chinese Academy of Sciences
Pengyuan ZHANG
  Chinese Academy of Sciences, University of Chinese Academy of Sciences

Keyword