IEICE global.ieice.org Site

Author Search Result

[Author] Ji XU (3 hits)

End-to-End Multilingual Speech Recognition System with Language Supervision Training
Danyang LIU Ji XU Pengyuan ZHANG

LETTER-Speech and Hearing

Publicized: 2020/03/19
Vol: E103-D No: 6
Page(s): 1427-1430
End-to-end (E2E) multilingual automatic speech recognition (ASR) systems aim to recognize multilingual speech in a unified framework. In the current E2E multilingual ASR framework, the output prediction for a specific language lacks constraints on the output scope of modeling units. In this paper, a language supervision training strategy is proposed that uses language masks to constrain the neural network output distribution. To simulate the multilingual ASR scenario in which the language identity is unknown, a language identification (LID) classifier is applied to estimate the language masks. On four Babel corpora, the proposed E2E multilingual ASR system achieved an average absolute word error rate (WER) reduction of 2.6% compared with the multilingual baseline system.
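
The idea of constraining an output distribution with a language mask can be illustrated with a masked softmax. The following is a minimal sketch, not the authors' implementation: the vocabulary, the language names `lang_x` and `lang_y`, and the helper functions are all hypothetical, and the language label is simply given here rather than estimated by an LID classifier.

```python
import math

# Hypothetical shared output vocabulary for a multilingual model (assumption).
VOCAB = ["<blank>", "a", "b", "ç", "ş"]

# Hypothetical per-language sets of allowed modeling units (assumption).
LANGUAGE_UNITS = {
    "lang_x": {"<blank>", "a", "b"},
    "lang_y": {"<blank>", "a", "ç", "ş"},
}


def language_mask(language):
    """Return a 0/1 mask over VOCAB marking the units allowed for `language`."""
    allowed = LANGUAGE_UNITS[language]
    return [1 if unit in allowed else 0 for unit in VOCAB]


def masked_softmax(logits, mask):
    """Softmax over `logits` restricted to positions where `mask` is 1.

    Masked-out positions receive probability exactly 0, so all probability
    mass is spread over the units the mask allows.
    """
    if len(logits) != len(mask):
        raise ValueError("logits and mask must have the same length")
    if not any(mask):
        raise ValueError("mask must allow at least one unit")
    # Subtract the largest allowed logit for numerical stability.
    peak = max(x for x, m in zip(logits, mask) if m)
    exps = [math.exp(x - peak) if m else 0.0 for x, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


# Example: the unmasked argmax would be "ç", which lang_x does not use.
logits = [0.5, 2.0, 1.0, 3.0, -1.0]
probs = masked_softmax(logits, language_mask("lang_x"))
best_unit = VOCAB[probs.index(max(probs))]
```

In this sketch the highest raw logit belongs to a unit outside `lang_x`, so the mask changes the most likely unit from `ç` to `a`. When the language identity is unknown, the mask would have to come from a predicted language instead of a given label, as the abstract describes.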

A Two-Fold Cross-Validation Training Framework Combined with Meta-Learning for Code-Switching Speech Recognition
Zheying HUANG Ji XU Qingwei ZHAO Pengyuan ZHANG

LETTER-Speech and Hearing

Publicized: 2022/06/20
Vol: E105-D No: 9
Page(s): 1639-1642
Although end-to-end speech recognition research for Mandarin-English code-switching has attracted increasing interest, it remains challenging due to data scarcity. The meta-learning approach is popular for low-resource modeling using high-resource data, but it does not make full use of low-resource code-switching data. Therefore, we propose a two-fold cross-validation training framework combined with the meta-learning approach. Experiments on the SEAME corpus demonstrate the effectiveness of our method.

Automatic Speech Recognition System with Output-Gate Projected Gated Recurrent Unit
Gaofeng CHENG Pengyuan ZHANG Ji XU

PAPER-Speech and Hearing

Publicized: 2018/11/19
Vol: E102-D No: 2
Page(s): 355-363
The long short-term memory recurrent neural network (LSTM) has achieved tremendous success in automatic speech recognition (ASR). However, the complicated gating mechanism of the LSTM introduces a massive computational cost and limits its application in some scenarios. In this paper, we describe our work on accelerating decoding and improving decoding accuracy. First, we propose an architecture called the Projected Gated Recurrent Unit (PGRU) for ASR tasks, and show that the PGRU consistently outperforms the standard GRU. Second, to improve the generalization of the PGRU, particularly on large-scale ASR tasks, we propose the Output-gate PGRU (OPGRU). In addition, the time delay neural network (TDNN) and normalization methods are found to be beneficial for the OPGRU. We apply the OPGRU to both the acoustic model and the recurrent neural network language model (RNN-LM). Finally, we evaluate the PGRU on the total Eval2000 / RT03 test sets, and the proposed OPGRU single ASR system achieves a 0.9% / 0.9% absolute (8.2% / 8.6% relative) reduction in word error rate (WER) compared to our previous best LSTM single ASR system. Furthermore, the OPGRU ASR system achieves a significant speed-up on both the acoustic model and language model rescoring.
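
The abstract reports each WER reduction both as an absolute and as a relative figure. The two are linked by relative = absolute / baseline, which the short sketch below works through. The function names are hypothetical, and the implied baseline WERs are only approximations derived from the rounded figures above; the abstract itself does not report them.

```python
def relative_reduction(baseline_wer, absolute_reduction):
    """Relative WER reduction: the absolute drop divided by the baseline WER."""
    return absolute_reduction / baseline_wer


def implied_baseline(absolute_reduction, relative):
    """Baseline WER implied by an absolute and a relative reduction.

    Both inputs are rounded in the abstract, so the result is approximate.
    """
    return absolute_reduction / relative


# Figures from the abstract: 0.9 points absolute, 8.2% / 8.6% relative.
eval2000_baseline = implied_baseline(0.9, 0.082)  # roughly 11.0% WER (approximate)
rt03_baseline = implied_baseline(0.9, 0.086)      # roughly 10.5% WER (approximate)

# Round trip: the relative reduction recovered from the implied baseline.
eval2000_relative = relative_reduction(eval2000_baseline, 0.9)
```

The same absolute reduction of 0.9 points therefore corresponds to a slightly larger relative reduction on the test set whose baseline WER is lower.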
