Improved HMM Separation for Distant-Talking Speech Recognition

Tetsuya TAKIGUCHI; Masafumi NISHIMURA

IEICE TRANSACTIONS on Information


Summary:

In distant-talking speech recognition, recognition accuracy is seriously degraded by reverberation and environmental noise. A robust speech recognition technique for such environments, HMM separation and composition, has been described in earlier work. HMM separation estimates the model parameters of the acoustic transfer function using adaptation data uttered from an unknown position in noisy and reverberant environments, and HMM composition builds an HMM of noisy and reverberant speech using the acoustic transfer function estimated by HMM separation. Previously, HMM separation was applied to an acoustic transfer function modeled by a single Gaussian distribution. However, the improvement was smaller than expected for impulse responses with long reverberation. This is because the variance of the acoustic transfer function in each frame increases when the impulse response of the room reverberation is longer than the spectral analysis window. In this paper, HMM separation is extended to estimate the acoustic transfer function using Gaussian mixture components in order to compensate for the greater variability of the acoustic transfer function, and the re-estimation formulae are derived. In addition, this paper introduces a technique to adapt the noise weight for each mel-spaced frequency in order to improve the performance of HMM separation in the linear-spectral domain, since HMM separation in the linear-spectral domain sometimes produces a negative mean output due to the subtraction operation. The extended HMM separation is evaluated on distant-talking speech recognition tasks, and the experimental results demonstrate the effectiveness of the proposed method.
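The negative-mean problem mentioned in the summary can be illustrated with a small numeric sketch. It assumes the common linear-spectral model `observed = clean * H + noise` for mean vectors, which the summary does not state explicitly. The function names, the `keep_fraction` parameter, and the weight rule are hypothetical stand-ins chosen for illustration; they are not the re-estimation formulae derived in the paper.

```python
def separate_transfer_mean(observed, clean, noise, noise_weight=None):
    """Per-band transfer-function mean under the assumed model
    observed = clean * H + noise, solved as H = (observed - w * noise) / clean."""
    if noise_weight is None:
        noise_weight = [1.0] * len(observed)  # plain subtraction
    return [(o - w * n) / s
            for o, s, n, w in zip(observed, clean, noise, noise_weight)]


def floor_noise_weight(observed, noise, keep_fraction=0.1):
    """Hypothetical per-band weight: the largest w in [0, 1] that leaves at
    least keep_fraction of the observed mean after subtracting w * noise."""
    weights = []
    for o, n in zip(observed, noise):
        if n <= 0.0:
            weights.append(1.0)  # nothing to subtract in this band
        else:
            weights.append(max(0.0, min(1.0, (1.0 - keep_fraction) * o / n)))
    return weights


# Two illustrative mel bands; in the second the noise mean exceeds the observed mean.
observed = [4.0, 1.5]
clean = [2.0, 1.0]
noise = [1.0, 2.0]

plain = separate_transfer_mean(observed, clean, noise)
weights = floor_noise_weight(observed, noise)
weighted = separate_transfer_mean(observed, clean, noise, weights)
```

With unit weights the second band yields a negative transfer-function mean, which is the failure the summary attributes to the subtraction operation. Scaling the noise term per band keeps the estimate positive, at the cost of under-subtracting noise in that band.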

Publication: IEICE TRANSACTIONS on Information Vol.E87-D No.5 pp.1127-1137

Publication Date: 2004/05/01


Type of Manuscript: Special Section PAPER (Special Section on Speech Dynamics by Ear, Eye, Mouth and Machine)

Cite this

Tetsuya TAKIGUCHI, Masafumi NISHIMURA, "Improved HMM Separation for Distant-Talking Speech Recognition," in IEICE TRANSACTIONS on Information, vol. E87-D, no. 5, pp. 1127-1137, May 2004.
URL: https://global.ieice.org/en_transactions/information/10.1587/e87-d_5_1127/_p


@ARTICLE{e87-d_5_1127,
author={Tetsuya TAKIGUCHI and Masafumi NISHIMURA},
journal={IEICE TRANSACTIONS on Information},
title={Improved HMM Separation for Distant-Talking Speech Recognition},
year={2004},
volume={E87-D},
number={5},
pages={1127-1137},
abstract={In distant-talking speech recognition, recognition accuracy is seriously degraded by reverberation and environmental noise. A robust speech recognition technique for such environments, HMM separation and composition, has been described in earlier work. HMM separation estimates the model parameters of the acoustic transfer function using adaptation data uttered from an unknown position in noisy and reverberant environments, and HMM composition builds an HMM of noisy and reverberant speech using the acoustic transfer function estimated by HMM separation. Previously, HMM separation was applied to an acoustic transfer function modeled by a single Gaussian distribution. However, the improvement was smaller than expected for impulse responses with long reverberation. This is because the variance of the acoustic transfer function in each frame increases when the impulse response of the room reverberation is longer than the spectral analysis window. In this paper, HMM separation is extended to estimate the acoustic transfer function using Gaussian mixture components in order to compensate for the greater variability of the acoustic transfer function, and the re-estimation formulae are derived. In addition, this paper introduces a technique to adapt the noise weight for each mel-spaced frequency in order to improve the performance of HMM separation in the linear-spectral domain, since HMM separation in the linear-spectral domain sometimes produces a negative mean output due to the subtraction operation. The extended HMM separation is evaluated on distant-talking speech recognition tasks, and the experimental results demonstrate the effectiveness of the proposed method.},
month={May},
}


TY - JOUR
TI - Improved HMM Separation for Distant-Talking Speech Recognition
T2 - IEICE TRANSACTIONS on Information
SP - 1127
EP - 1137
AU - Tetsuya TAKIGUCHI
AU - Masafumi NISHIMURA
PY - 2004
DO -
JO - IEICE TRANSACTIONS on Information
SN -
VL - E87-D
IS - 5
JA - IEICE TRANSACTIONS on Information
Y1 - May 2004
AB - In distant-talking speech recognition, recognition accuracy is seriously degraded by reverberation and environmental noise. A robust speech recognition technique for such environments, HMM separation and composition, has been described in earlier work. HMM separation estimates the model parameters of the acoustic transfer function using adaptation data uttered from an unknown position in noisy and reverberant environments, and HMM composition builds an HMM of noisy and reverberant speech using the acoustic transfer function estimated by HMM separation. Previously, HMM separation was applied to an acoustic transfer function modeled by a single Gaussian distribution. However, the improvement was smaller than expected for impulse responses with long reverberation. This is because the variance of the acoustic transfer function in each frame increases when the impulse response of the room reverberation is longer than the spectral analysis window. In this paper, HMM separation is extended to estimate the acoustic transfer function using Gaussian mixture components in order to compensate for the greater variability of the acoustic transfer function, and the re-estimation formulae are derived. In addition, this paper introduces a technique to adapt the noise weight for each mel-spaced frequency in order to improve the performance of HMM separation in the linear-spectral domain, since HMM separation in the linear-spectral domain sometimes produces a negative mean output due to the subtraction operation. The extended HMM separation is evaluated on distant-talking speech recognition tasks, and the experimental results demonstrate the effectiveness of the proposed method.
ER -
