In this paper, we propose a new technique of filter bank subtraction for robust speech recognition under various acoustic conditions. Spectral subtraction is a simple and useful technique for reducing the influence of additive noise. Conventional spectral subtraction assumes accurate estimation of the noise spectrum and no correlation between speech and noise. Those assumptions, however, are rarely satisfied in reality, leading to the degradation of speech recognition accuracy. Moreover, the recognition improvement attained by conventional methods is slight when the input SNR changes sharply. We propose a new method in which the output values of filter banks are used for noise estimation and subtraction. By estimating noise at each filter bank, instead of at each frequency point, the method alleviates the necessity for precise estimation of noise. We also take into consideration expected phase differences between the spectra of speech and noise in the subtraction and control a subtraction coefficient theoretically. Recognition experiments on test sets at several SNRs showed that the filter bank subtraction technique improved the word accuracy significantly and got better results than conventional spectral subtraction on all the test sets. In other experiments, on recognizing speech from TV news field reports with environmental noise, the proposed subtraction method yielded better results than the conventional method.
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Copy
Kazuo ONOE, Hiroyuki SEGI, Takeshi KOBAYAKAWA, Shoei SATO, Shinichi HOMMA, Toru IMAI, Akio ANDO, "Filter Bank Subtraction for Robust Speech Recognition" in IEICE TRANSACTIONS on Information,
vol. E86-D, no. 3, pp. 483-488, March 2003, doi: .
Abstract: In this paper, we propose a new technique of filter bank subtraction for robust speech recognition under various acoustic conditions. Spectral subtraction is a simple and useful technique for reducing the influence of additive noise. Conventional spectral subtraction assumes accurate estimation of the noise spectrum and no correlation between speech and noise. Those assumptions, however, are rarely satisfied in reality, leading to the degradation of speech recognition accuracy. Moreover, the recognition improvement attained by conventional methods is slight when the input SNR changes sharply. We propose a new method in which the output values of filter banks are used for noise estimation and subtraction. By estimating noise at each filter bank, instead of at each frequency point, the method alleviates the necessity for precise estimation of noise. We also take into consideration expected phase differences between the spectra of speech and noise in the subtraction and control a subtraction coefficient theoretically. Recognition experiments on test sets at several SNRs showed that the filter bank subtraction technique improved the word accuracy significantly and got better results than conventional spectral subtraction on all the test sets. In other experiments, on recognizing speech from TV news field reports with environmental noise, the proposed subtraction method yielded better results than the conventional method.
URL: https://global.ieice.org/en_transactions/information/10.1587/e86-d_3_483/_p
Copy
@ARTICLE{e86-d_3_483,
author={Kazuo ONOE, Hiroyuki SEGI, Takeshi KOBAYAKAWA, Shoei SATO, Shinichi HOMMA, Toru IMAI, Akio ANDO, },
journal={IEICE TRANSACTIONS on Information},
title={Filter Bank Subtraction for Robust Speech Recognition},
year={2003},
volume={E86-D},
number={3},
pages={483-488},
abstract={In this paper, we propose a new technique of filter bank subtraction for robust speech recognition under various acoustic conditions. Spectral subtraction is a simple and useful technique for reducing the influence of additive noise. Conventional spectral subtraction assumes accurate estimation of the noise spectrum and no correlation between speech and noise. Those assumptions, however, are rarely satisfied in reality, leading to the degradation of speech recognition accuracy. Moreover, the recognition improvement attained by conventional methods is slight when the input SNR changes sharply. We propose a new method in which the output values of filter banks are used for noise estimation and subtraction. By estimating noise at each filter bank, instead of at each frequency point, the method alleviates the necessity for precise estimation of noise. We also take into consideration expected phase differences between the spectra of speech and noise in the subtraction and control a subtraction coefficient theoretically. Recognition experiments on test sets at several SNRs showed that the filter bank subtraction technique improved the word accuracy significantly and got better results than conventional spectral subtraction on all the test sets. In other experiments, on recognizing speech from TV news field reports with environmental noise, the proposed subtraction method yielded better results than the conventional method.},
keywords={},
doi={},
ISSN={},
month={March},}
Copy
TY - JOUR
TI - Filter Bank Subtraction for Robust Speech Recognition
T2 - IEICE TRANSACTIONS on Information
SP - 483
EP - 488
AU - Kazuo ONOE
AU - Hiroyuki SEGI
AU - Takeshi KOBAYAKAWA
AU - Shoei SATO
AU - Shinichi HOMMA
AU - Toru IMAI
AU - Akio ANDO
PY - 2003
DO -
JO - IEICE TRANSACTIONS on Information
SN -
VL - E86-D
IS - 3
JA - IEICE TRANSACTIONS on Information
Y1 - March 2003
AB - In this paper, we propose a new technique of filter bank subtraction for robust speech recognition under various acoustic conditions. Spectral subtraction is a simple and useful technique for reducing the influence of additive noise. Conventional spectral subtraction assumes accurate estimation of the noise spectrum and no correlation between speech and noise. Those assumptions, however, are rarely satisfied in reality, leading to the degradation of speech recognition accuracy. Moreover, the recognition improvement attained by conventional methods is slight when the input SNR changes sharply. We propose a new method in which the output values of filter banks are used for noise estimation and subtraction. By estimating noise at each filter bank, instead of at each frequency point, the method alleviates the necessity for precise estimation of noise. We also take into consideration expected phase differences between the spectra of speech and noise in the subtraction and control a subtraction coefficient theoretically. Recognition experiments on test sets at several SNRs showed that the filter bank subtraction technique improved the word accuracy significantly and got better results than conventional spectral subtraction on all the test sets. In other experiments, on recognizing speech from TV news field reports with environmental noise, the proposed subtraction method yielded better results than the conventional method.
ER -