The long short-term memory (LSTM) recurrent neural network has achieved tremendous success in automatic speech recognition (ASR). However, the LSTM's complicated gating mechanism carries a high computational cost, which limits its use in some scenarios. In this paper, we describe our work on accelerating decoding and improving recognition accuracy. First, we propose an architecture for ASR called the Projected Gated Recurrent Unit (PGRU) and show that it consistently outperforms the standard GRU. Second, to improve the PGRU's generalization, particularly on large-scale ASR tasks, we propose the Output-gate PGRU (OPGRU). We also find that the time delay neural network (TDNN) and normalization methods benefit the OPGRU. We apply the OPGRU to both the acoustic model and the recurrent neural network language model (RNN-LM). Finally, we evaluate the proposed units on the full Eval2000 / RT03 test sets: the OPGRU single ASR system achieves 0.9% / 0.9% absolute (8.2% / 8.6% relative) reductions in word error rate (WER) over our previous best LSTM single ASR system, while delivering significant speed-ups in both acoustic model computation and language model rescoring.
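The paper's equations are not reproduced in this excerpt, so the following is only a hedged illustration: a minimal NumPy sketch of one recurrent step, assuming the PGRU is a standard GRU whose hidden state is fed back through a low-dimensional linear projection (in the spirit of the projected LSTM), and that the OPGRU drops the reset gate and applies an elementwise output gate before that projection. All parameter names (W_z, U_z, W_p, ...) are hypothetical, and the exact gate placement may differ from the paper's formulation.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def opgru_step(x, y_prev, h_prev, p):
    # One illustrative OPGRU time step (a sketch, not the paper's exact equations).
    # x: input at time t; y_prev: projected state from t-1; h_prev: hidden
    # state from t-1; p: dict of hypothetical parameter matrices.
    # Update gate, conditioned on the small projected state instead of the
    # full hidden state, which shrinks the recurrent weight matrices.
    z = sigmoid(p["W_z"] @ x + p["U_z"] @ y_prev)
    # Candidate state; the GRU reset gate is omitted here, one plausible
    # simplification behind the OPGRU's cheaper gating (an assumption).
    h_cand = np.tanh(p["W_h"] @ x + p["U_h"] @ y_prev)
    # Interpolate between the previous and candidate states, as in a GRU.
    h = z * h_prev + (1.0 - z) * h_cand
    # Elementwise output gate (the "O" in OPGRU), applied before projection.
    o = sigmoid(p["W_o"] @ x + p["U_o"] @ y_prev)
    # Linear projection (the "P" in PGRU/OPGRU) down to a smaller state.
    y = p["W_p"] @ (o * h)
    return y, h

# Toy usage: input size 3, hidden size 4, projection size 2, random weights.
rng = np.random.default_rng(0)
n_in, n_hid, n_proj = 3, 4, 2
shapes = {"W_z": (n_hid, n_in), "U_z": (n_hid, n_proj),
          "W_h": (n_hid, n_in), "U_h": (n_hid, n_proj),
          "W_o": (n_hid, n_in), "U_o": (n_hid, n_proj),
          "W_p": (n_proj, n_hid)}
p = {k: rng.standard_normal(s) for k, s in shapes.items()}
y, h = opgru_step(rng.standard_normal(n_in), np.zeros(n_proj), np.zeros(n_hid), p)

A plain PGRU step would be the same minus the output gate o. With a projection size r smaller than the hidden size n, the recurrent matrices shrink from n-by-n to n-by-r, which is where much of the decoding speed-up comes from. For scale, the quoted results imply baseline WERs of roughly 0.9 / 0.082 ≈ 11.0% on Eval2000 and 0.9 / 0.086 ≈ 10.5% on RT03.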
Gaofeng CHENG
Institute of Acoustics, Beijing
Pengyuan ZHANG
Institute of Acoustics, Beijing
Ji XU
Institute of Acoustics, Beijing
Gaofeng CHENG, Pengyuan ZHANG, Ji XU, "Automatic Speech Recognition System with Output-Gate Projected Gated Recurrent Unit," IEICE Transactions on Information and Systems, vol. E102-D, no. 2, pp. 355-363, February 2019, doi: 10.1587/transinf.2018EDP7155.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2018EDP7155/_p
@ARTICLE{e102-d_2_355,
  author={Gaofeng CHENG and Pengyuan ZHANG and Ji XU},
  journal={IEICE Transactions on Information and Systems},
  title={Automatic Speech Recognition System with Output-Gate Projected Gated Recurrent Unit},
  year={2019},
  volume={E102-D},
  number={2},
  pages={355-363},
  doi={10.1587/transinf.2018EDP7155},
  ISSN={1745-1361},
  month={February},
}
TY - JOUR
TI - Automatic Speech Recognition System with Output-Gate Projected Gated Recurrent Unit
T2 - IEICE Transactions on Information and Systems
SP - 355
EP - 363
AU - Gaofeng CHENG
AU - Pengyuan ZHANG
AU - Ji XU
PY - 2019
DO - 10.1587/transinf.2018EDP7155
JO - IEICE Transactions on Information and Systems
SN - 1745-1361
VL - E102-D
IS - 2
JA - IEICE Transactions on Information and Systems
Y1 - 2019/02//
ER -