Automatic Speech Recognition (ASR) performance degrades dramatically in noisy environments. To alleviate this problem, a variety of deep networks based on convolutional and recurrent neural networks have been proposed, typically trained with an L1 or L2 loss. In this Letter, we propose a new orthogonal gradient penalty (OGP) method for Wasserstein Generative Adversarial Networks (WGAN) applied to denoising and despeeching models. The WGAN integrates a multi-task autoencoder that estimates not only speech features but also noise features from noisy speech. The proposed OGP improves the Wasserstein distance convergence rate by 14.1%, and the enhanced features, when tested in ASR, yield WER improvements of 9.7%, 8.6%, 6.2%, and 4.8% over the DDAE, MTAE, R-CED (CNN), and RNN models, respectively.
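For readers unfamiliar with gradient penalties, the sketch below shows the standard WGAN-GP critic regularizer (Gulrajani et al., 2017) in PyTorch, which penalizes deviations of the critic's gradient norm from 1 on samples interpolated between real and generated feature batches. The Letter's orthogonal gradient penalty replaces this term; its exact form is not given in the abstract, so the function name, tensor shapes, and lambda_gp value here are illustrative assumptions, not the authors' code.

import torch

def wgan_gradient_penalty(critic, real, fake, lambda_gp=10.0):
    """Standard WGAN-GP term, shown for context only.
    The Letter's orthogonal gradient penalty (OGP) modifies this
    regularizer; its exact form is not reproduced here."""
    # Random interpolation between real and generated feature batches (B, F).
    eps = torch.rand(real.size(0), 1, device=real.device)
    interp = (eps * real + (1.0 - eps) * fake).requires_grad_(True)
    score = critic(interp)
    # Gradient of the critic's output with respect to the interpolated inputs.
    grads = torch.autograd.grad(outputs=score.sum(), inputs=interp,
                                create_graph=True)[0]
    # Penalize deviation of the gradient's L2 norm from 1.
    return lambda_gp * ((grads.norm(2, dim=1) - 1.0) ** 2).mean()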
Chao-Yuan KAO
Korea University
Sangwook PARK
Johns Hopkins University
Alzahra BADI
Korea University
David K. HAN
US Army Research Laboratory (ARL)
Hanseok KO
Korea University
Chao-Yuan KAO, Sangwook PARK, Alzahra BADI, David K. HAN, and Hanseok KO, "Orthogonal Gradient Penalty for Fast Training of Wasserstein GAN Based Multi-Task Autoencoder toward Robust Speech Recognition," IEICE Transactions on Information and Systems, vol. E103-D, no. 5, pp. 1195-1198, May 2020, doi: 10.1587/transinf.2019EDL8183.
Abstract: Automatic Speech Recognition (ASR) performance degrades dramatically in noisy environments. To alleviate this problem, a variety of deep networks based on convolutional and recurrent neural networks have been proposed, typically trained with an L1 or L2 loss. In this Letter, we propose a new orthogonal gradient penalty (OGP) method for Wasserstein Generative Adversarial Networks (WGAN) applied to denoising and despeeching models. The WGAN integrates a multi-task autoencoder that estimates not only speech features but also noise features from noisy speech. The proposed OGP improves the Wasserstein distance convergence rate by 14.1%, and the enhanced features, when tested in ASR, yield WER improvements of 9.7%, 8.6%, 6.2%, and 4.8% over the DDAE, MTAE, R-CED (CNN), and RNN models, respectively.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2019EDL8183/_p
@ARTICLE{e103-d_5_1195,
author={Chao-Yuan KAO and Sangwook PARK and Alzahra BADI and David K. HAN and Hanseok KO},
journal={IEICE Transactions on Information and Systems},
title={Orthogonal Gradient Penalty for Fast Training of Wasserstein GAN Based Multi-Task Autoencoder toward Robust Speech Recognition},
year={2020},
volume={E103-D},
number={5},
pages={1195-1198},
abstract={Automatic Speech Recognition (ASR) performance degrades dramatically in noisy environments. To alleviate this problem, a variety of deep networks based on convolutional and recurrent neural networks have been proposed, typically trained with an L1 or L2 loss. In this Letter, we propose a new orthogonal gradient penalty (OGP) method for Wasserstein Generative Adversarial Networks (WGAN) applied to denoising and despeeching models. The WGAN integrates a multi-task autoencoder that estimates not only speech features but also noise features from noisy speech. The proposed OGP improves the Wasserstein distance convergence rate by 14.1%, and the enhanced features, when tested in ASR, yield WER improvements of 9.7%, 8.6%, 6.2%, and 4.8% over the DDAE, MTAE, R-CED (CNN), and RNN models, respectively.},
keywords={},
doi={10.1587/transinf.2019EDL8183},
ISSN={1745-1361},
month={May},}
TY - JOUR
TI - Orthogonal Gradient Penalty for Fast Training of Wasserstein GAN Based Multi-Task Autoencoder toward Robust Speech Recognition
T2 - IEICE Transactions on Information and Systems
SP - 1195
EP - 1198
AU - Chao-Yuan KAO
AU - Sangwook PARK
AU - Alzahra BADI
AU - David K. HAN
AU - Hanseok KO
PY - 2020
DO - 10.1587/transinf.2019EDL8183
JO - IEICE Transactions on Information and Systems
SN - 1745-1361
VL - E103-D
IS - 5
JA - IEICE Transactions on Information and Systems
Y1 - May 2020
AB - Automatic Speech Recognition (ASR) performance degrades dramatically in noisy environments. To alleviate this problem, a variety of deep networks based on convolutional and recurrent neural networks have been proposed, typically trained with an L1 or L2 loss. In this Letter, we propose a new orthogonal gradient penalty (OGP) method for Wasserstein Generative Adversarial Networks (WGAN) applied to denoising and despeeching models. The WGAN integrates a multi-task autoencoder that estimates not only speech features but also noise features from noisy speech. The proposed OGP improves the Wasserstein distance convergence rate by 14.1%, and the enhanced features, when tested in ASR, yield WER improvements of 9.7%, 8.6%, 6.2%, and 4.8% over the DDAE, MTAE, R-CED (CNN), and RNN models, respectively.
ER -