We propose a blind dereverberation method based on spectral subtraction using a multi-channel least mean squares (MCLMS) algorithm for distant-talking speech recognition. In a distant-talking environment, the channel impulse response is longer than the short-term spectral analysis window. Treating the late reverberation as additive noise, we use a noise reduction technique based on spectral subtraction to estimate the power spectrum of the clean speech from the power spectra of the distorted speech and of the unknown impulse responses. To estimate the power spectra of the impulse responses, a variable step-size unconstrained MCLMS (VSS-UMCLMS) algorithm for identifying the impulse responses in the time domain is extended to the frequency domain. To reduce the effect of the estimation error of the channel impulse response, we normalize the early reverberation by cepstral mean normalization (CMN) instead of by spectral subtraction using the estimated impulse response. Furthermore, the proposed method is combined with conventional delay-and-sum beamforming. We conducted recognition experiments on distorted speech simulated by convolving multi-channel impulse responses with clean speech. The proposed method achieved a relative error reduction rate of 22.4% compared with conventional CMN. Combining the proposed method with beamforming achieved a relative error reduction rate of 24.5% compared with conventional CMN with beamforming, using only an isolated word (about 0.6 s long) to estimate the spectrum of the impulse response.
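The two normalization steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `alpha` and `floor` parameters, and the assumption that late reverberation in a frame is a gain-weighted sum of earlier clean-speech estimates are all hypothetical choices made for illustration, and the per-delay gains are supplied by hand rather than estimated with VSS-UMCLMS.

```python
from typing import List, Sequence


def cepstral_mean_normalize(cepstra: Sequence[Sequence[float]]) -> List[List[float]]:
    """Plain CMN: subtract each coefficient's mean over all frames."""
    n_frames = len(cepstra)
    n_coeffs = len(cepstra[0])
    means = [sum(frame[i] for frame in cepstra) / n_frames for i in range(n_coeffs)]
    return [[frame[i] - means[i] for i in range(n_coeffs)] for frame in cepstra]


def subtract_late_reverberation(
    power: Sequence[Sequence[float]],
    late_gains: Sequence[Sequence[float]],
    alpha: float = 1.0,   # hypothetical over-subtraction factor
    floor: float = 0.05,  # hypothetical spectral floor, as a fraction of the observation
) -> List[List[float]]:
    """Generic power spectral subtraction with flooring.

    power[t][k]        observed power spectrum, frame t, frequency bin k
    late_gains[d-1][k] assumed power gain of the impulse response at a delay
                       of d frames (illustrative input, not estimated here)
    """
    estimates: List[List[float]] = []
    for t, frame in enumerate(power):
        est_frame = []
        for k, observed in enumerate(frame):
            # Late reverberation modelled as earlier clean-speech estimates
            # weighted by the per-delay gains (an assumption of this sketch).
            late = sum(
                late_gains[d - 1][k] * estimates[t - d][k]
                for d in range(1, len(late_gains) + 1)
                if t - d >= 0
            )
            # Flooring keeps the estimated power strictly positive.
            est_frame.append(max(observed - alpha * late, floor * observed))
        estimates.append(est_frame)
    return estimates
```

The floor matters because subtracting an overestimated reverberation term would otherwise produce negative power values, which have no physical meaning and break the subsequent log and cepstral steps.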

- Publication: IEICE TRANSACTIONS on Information Vol.E94-D No.3 pp.659-667
- Publication Date: 2011/03/01
- Online ISSN: 1745-1361
- DOI: 10.1587/transinf.E94.D.659
- Type of Manuscript: PAPER
- Category: Speech and Hearing

The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.

Longbiao WANG, Norihide KITAOKA, Seiichi NAKAGAWA, "Distant-Talking Speech Recognition Based on Spectral Subtraction by Multi-Channel LMS Algorithm," in IEICE TRANSACTIONS on Information, vol. E94-D, no. 3, pp. 659-667, March 2011, doi: 10.1587/transinf.E94.D.659.

URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E94.D.659/_p

@ARTICLE{e94-d_3_659,
author={Longbiao WANG and Norihide KITAOKA and Seiichi NAKAGAWA},
journal={IEICE TRANSACTIONS on Information},
title={Distant-Talking Speech Recognition Based on Spectral Subtraction by Multi-Channel LMS Algorithm},
year={2011},
volume={E94-D},
number={3},
pages={659--667},
abstract={We propose a blind dereverberation method based on spectral subtraction using a multi-channel least mean squares (MCLMS) algorithm for distant-talking speech recognition. In a distant-talking environment, the channel impulse response is longer than the short-term spectral analysis window. By treating the late reverberation as additive noise, a noise reduction technique based on spectral subtraction was proposed to estimate the power spectrum of the clean speech using power spectra of the distorted speech and the unknown impulse responses. To estimate the power spectra of the impulse responses, a variable step-size unconstrained MCLMS (VSS-UMCLMS) algorithm for identifying the impulse responses in a time domain is extended to a frequency domain. To reduce the effect of the estimation error of the channel impulse response, we normalize the early reverberation by cepstral mean normalization (CMN) instead of spectral subtraction using the estimated impulse response. Furthermore, our proposed method is combined with conventional delay-and-sum beamforming. We conducted recognition experiments on a distorted speech signal simulated by convolving multi-channel impulse responses with clean speech. The proposed method achieved a relative error reduction rate of 22.4% in relation to conventional CMN. By combining the proposed method with beamforming, a relative error reduction rate of 24.5% in relation to the conventional CMN with beamforming was achieved using only an isolated word (with duration of about 0.6 s) to estimate the spectrum of the impulse response.},
doi={10.1587/transinf.E94.D.659},
ISSN={1745-1361},
month={March},
}

TY - JOUR
TI - Distant-Talking Speech Recognition Based on Spectral Subtraction by Multi-Channel LMS Algorithm
T2 - IEICE TRANSACTIONS on Information
SP - 659
EP - 667
AU - Longbiao WANG
AU - Norihide KITAOKA
AU - Seiichi NAKAGAWA
PY - 2011
DO - 10.1587/transinf.E94.D.659
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E94-D
IS - 3
JA - IEICE TRANSACTIONS on Information
Y1 - March 2011
AB - We propose a blind dereverberation method based on spectral subtraction using a multi-channel least mean squares (MCLMS) algorithm for distant-talking speech recognition. In a distant-talking environment, the channel impulse response is longer than the short-term spectral analysis window. By treating the late reverberation as additive noise, a noise reduction technique based on spectral subtraction was proposed to estimate the power spectrum of the clean speech using power spectra of the distorted speech and the unknown impulse responses. To estimate the power spectra of the impulse responses, a variable step-size unconstrained MCLMS (VSS-UMCLMS) algorithm for identifying the impulse responses in a time domain is extended to a frequency domain. To reduce the effect of the estimation error of the channel impulse response, we normalize the early reverberation by cepstral mean normalization (CMN) instead of spectral subtraction using the estimated impulse response. Furthermore, our proposed method is combined with conventional delay-and-sum beamforming. We conducted recognition experiments on a distorted speech signal simulated by convolving multi-channel impulse responses with clean speech. The proposed method achieved a relative error reduction rate of 22.4% in relation to conventional CMN. By combining the proposed method with beamforming, a relative error reduction rate of 24.5% in relation to the conventional CMN with beamforming was achieved using only an isolated word (with duration of about 0.6 s) to estimate the spectrum of the impulse response.
ER -