This paper introduces new methods of robust speech recognition using discrete-mixture HMMs (DMHMMs). The aim of this work is to develop robust speech recognition for adverse conditions that contain both stationary and non-stationary noise. In particular, we focus on the issue of impulsive noise, which is a major problem in practical speech recognition system. In this paper, two strategies were utilized to solve the problem. In the first strategy, adverse conditions are represented by an acoustic model. In this case, a large amount of training data and accurate acoustic models are required to present a variety of acoustic environments. This strategy is suitable for recognition in stationary or slow-varying noise conditions. The second is based on the idea that the corrupted frames are treated to reduce the adverse effect by compensation method. Since impulsive noise has a wide variety of features and its modeling is difficult, the second strategy is employed. In order to achieve those strategies, we propose two methods. Those methods are based on DMHMM framework which is one type of discrete HMM (DHMM). First, an estimation method of DMHMM parameters based on MAP is proposed aiming to improve trainability. The second is a method of compensating the observation probabilities of DMHMMs by threshold to reduce adverse effect of outlier values. Observation probabilities of impulsive noise tend to be much smaller than those of normal speech. The motivation in this approach is that flooring the observation probability reduces the adverse effect caused by impulsive noise. Experimental evaluations on Japanese LVCSR for read newspaper speech showed that the proposed method achieved the average error rate reduction of 48.5% in impulsive noise conditions. Also the experimental results in adverse conditions that contain both stationary and impulsive noises showed that the proposed method achieved the average error rate reduction of 28.1%.
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Copy
Tetsuo KOSAKA, Masaharu KATOH, Masaki KOHDA, "Robust Speech Recognition Using Discrete-Mixture HMMs" in IEICE TRANSACTIONS on Information,
vol. E88-D, no. 12, pp. 2811-2818, December 2005, doi: 10.1093/ietisy/e88-d.12.2811.
Abstract: This paper introduces new methods of robust speech recognition using discrete-mixture HMMs (DMHMMs). The aim of this work is to develop robust speech recognition for adverse conditions that contain both stationary and non-stationary noise. In particular, we focus on the issue of impulsive noise, which is a major problem in practical speech recognition system. In this paper, two strategies were utilized to solve the problem. In the first strategy, adverse conditions are represented by an acoustic model. In this case, a large amount of training data and accurate acoustic models are required to present a variety of acoustic environments. This strategy is suitable for recognition in stationary or slow-varying noise conditions. The second is based on the idea that the corrupted frames are treated to reduce the adverse effect by compensation method. Since impulsive noise has a wide variety of features and its modeling is difficult, the second strategy is employed. In order to achieve those strategies, we propose two methods. Those methods are based on DMHMM framework which is one type of discrete HMM (DHMM). First, an estimation method of DMHMM parameters based on MAP is proposed aiming to improve trainability. The second is a method of compensating the observation probabilities of DMHMMs by threshold to reduce adverse effect of outlier values. Observation probabilities of impulsive noise tend to be much smaller than those of normal speech. The motivation in this approach is that flooring the observation probability reduces the adverse effect caused by impulsive noise. Experimental evaluations on Japanese LVCSR for read newspaper speech showed that the proposed method achieved the average error rate reduction of 48.5% in impulsive noise conditions. Also the experimental results in adverse conditions that contain both stationary and impulsive noises showed that the proposed method achieved the average error rate reduction of 28.1%.
URL: https://global.ieice.org/en_transactions/information/10.1093/ietisy/e88-d.12.2811/_p
Copy
@ARTICLE{e88-d_12_2811,
author={Tetsuo KOSAKA, Masaharu KATOH, Masaki KOHDA, },
journal={IEICE TRANSACTIONS on Information},
title={Robust Speech Recognition Using Discrete-Mixture HMMs},
year={2005},
volume={E88-D},
number={12},
pages={2811-2818},
abstract={This paper introduces new methods of robust speech recognition using discrete-mixture HMMs (DMHMMs). The aim of this work is to develop robust speech recognition for adverse conditions that contain both stationary and non-stationary noise. In particular, we focus on the issue of impulsive noise, which is a major problem in practical speech recognition system. In this paper, two strategies were utilized to solve the problem. In the first strategy, adverse conditions are represented by an acoustic model. In this case, a large amount of training data and accurate acoustic models are required to present a variety of acoustic environments. This strategy is suitable for recognition in stationary or slow-varying noise conditions. The second is based on the idea that the corrupted frames are treated to reduce the adverse effect by compensation method. Since impulsive noise has a wide variety of features and its modeling is difficult, the second strategy is employed. In order to achieve those strategies, we propose two methods. Those methods are based on DMHMM framework which is one type of discrete HMM (DHMM). First, an estimation method of DMHMM parameters based on MAP is proposed aiming to improve trainability. The second is a method of compensating the observation probabilities of DMHMMs by threshold to reduce adverse effect of outlier values. Observation probabilities of impulsive noise tend to be much smaller than those of normal speech. The motivation in this approach is that flooring the observation probability reduces the adverse effect caused by impulsive noise. Experimental evaluations on Japanese LVCSR for read newspaper speech showed that the proposed method achieved the average error rate reduction of 48.5% in impulsive noise conditions. Also the experimental results in adverse conditions that contain both stationary and impulsive noises showed that the proposed method achieved the average error rate reduction of 28.1%.},
keywords={},
doi={10.1093/ietisy/e88-d.12.2811},
ISSN={},
month={December},}
Copy
TY - JOUR
TI - Robust Speech Recognition Using Discrete-Mixture HMMs
T2 - IEICE TRANSACTIONS on Information
SP - 2811
EP - 2818
AU - Tetsuo KOSAKA
AU - Masaharu KATOH
AU - Masaki KOHDA
PY - 2005
DO - 10.1093/ietisy/e88-d.12.2811
JO - IEICE TRANSACTIONS on Information
SN -
VL - E88-D
IS - 12
JA - IEICE TRANSACTIONS on Information
Y1 - December 2005
AB - This paper introduces new methods of robust speech recognition using discrete-mixture HMMs (DMHMMs). The aim of this work is to develop robust speech recognition for adverse conditions that contain both stationary and non-stationary noise. In particular, we focus on the issue of impulsive noise, which is a major problem in practical speech recognition system. In this paper, two strategies were utilized to solve the problem. In the first strategy, adverse conditions are represented by an acoustic model. In this case, a large amount of training data and accurate acoustic models are required to present a variety of acoustic environments. This strategy is suitable for recognition in stationary or slow-varying noise conditions. The second is based on the idea that the corrupted frames are treated to reduce the adverse effect by compensation method. Since impulsive noise has a wide variety of features and its modeling is difficult, the second strategy is employed. In order to achieve those strategies, we propose two methods. Those methods are based on DMHMM framework which is one type of discrete HMM (DHMM). First, an estimation method of DMHMM parameters based on MAP is proposed aiming to improve trainability. The second is a method of compensating the observation probabilities of DMHMMs by threshold to reduce adverse effect of outlier values. Observation probabilities of impulsive noise tend to be much smaller than those of normal speech. The motivation in this approach is that flooring the observation probability reduces the adverse effect caused by impulsive noise. Experimental evaluations on Japanese LVCSR for read newspaper speech showed that the proposed method achieved the average error rate reduction of 48.5% in impulsive noise conditions. Also the experimental results in adverse conditions that contain both stationary and impulsive noises showed that the proposed method achieved the average error rate reduction of 28.1%.
ER -