A solution to the problem of improving robustness to noise in automatic speech recognition is presented in the framework of multi-band, multi-SNR, and multi-path approaches. In our word recognizer, the whole frequency band is divided into seven-overlapped sub-bands, and then sub-band noisy phoneme HMMs are trained on speech data mixed with the filtered white Gaussian noise at multiple SNRs. The acoustic model of a word is built as a set of concatenations of clean and noisy sub-band phoneme HMMs arranged in parallel. A Viterbi decoder allows a search path to transit to another SNR condition at a phoneme boundary. The recognition scores of the sub-bands are then recombined to give the score for a word. Experiments show that the overlapped seven-band system yields the best performance under nonstationary ambient noises. It is also shown that the use of filtered white Gaussian noise is advantageous for training noisy phoneme HMMs.
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Copy
Yutaka TSUBOI, Takehiro IHARA, Kazuyuki TAKAGI, Kazuhiko OZEKI, "The Use of Overlapped Sub-Bands in Multi-Band, Multi-SNR, Multi-Path Recognition of Noisy Word Utterances" in IEICE TRANSACTIONS on Information,
vol. E91-D, no. 6, pp. 1774-1782, June 2008, doi: 10.1093/ietisy/e91-d.6.1774.
Abstract: A solution to the problem of improving robustness to noise in automatic speech recognition is presented in the framework of multi-band, multi-SNR, and multi-path approaches. In our word recognizer, the whole frequency band is divided into seven-overlapped sub-bands, and then sub-band noisy phoneme HMMs are trained on speech data mixed with the filtered white Gaussian noise at multiple SNRs. The acoustic model of a word is built as a set of concatenations of clean and noisy sub-band phoneme HMMs arranged in parallel. A Viterbi decoder allows a search path to transit to another SNR condition at a phoneme boundary. The recognition scores of the sub-bands are then recombined to give the score for a word. Experiments show that the overlapped seven-band system yields the best performance under nonstationary ambient noises. It is also shown that the use of filtered white Gaussian noise is advantageous for training noisy phoneme HMMs.
URL: https://global.ieice.org/en_transactions/information/10.1093/ietisy/e91-d.6.1774/_p
Copy
@ARTICLE{e91-d_6_1774,
author={Yutaka TSUBOI, Takehiro IHARA, Kazuyuki TAKAGI, Kazuhiko OZEKI, },
journal={IEICE TRANSACTIONS on Information},
title={The Use of Overlapped Sub-Bands in Multi-Band, Multi-SNR, Multi-Path Recognition of Noisy Word Utterances},
year={2008},
volume={E91-D},
number={6},
pages={1774-1782},
abstract={A solution to the problem of improving robustness to noise in automatic speech recognition is presented in the framework of multi-band, multi-SNR, and multi-path approaches. In our word recognizer, the whole frequency band is divided into seven-overlapped sub-bands, and then sub-band noisy phoneme HMMs are trained on speech data mixed with the filtered white Gaussian noise at multiple SNRs. The acoustic model of a word is built as a set of concatenations of clean and noisy sub-band phoneme HMMs arranged in parallel. A Viterbi decoder allows a search path to transit to another SNR condition at a phoneme boundary. The recognition scores of the sub-bands are then recombined to give the score for a word. Experiments show that the overlapped seven-band system yields the best performance under nonstationary ambient noises. It is also shown that the use of filtered white Gaussian noise is advantageous for training noisy phoneme HMMs.},
keywords={},
doi={10.1093/ietisy/e91-d.6.1774},
ISSN={1745-1361},
month={June},}
Copy
TY - JOUR
TI - The Use of Overlapped Sub-Bands in Multi-Band, Multi-SNR, Multi-Path Recognition of Noisy Word Utterances
T2 - IEICE TRANSACTIONS on Information
SP - 1774
EP - 1782
AU - Yutaka TSUBOI
AU - Takehiro IHARA
AU - Kazuyuki TAKAGI
AU - Kazuhiko OZEKI
PY - 2008
DO - 10.1093/ietisy/e91-d.6.1774
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E91-D
IS - 6
JA - IEICE TRANSACTIONS on Information
Y1 - June 2008
AB - A solution to the problem of improving robustness to noise in automatic speech recognition is presented in the framework of multi-band, multi-SNR, and multi-path approaches. In our word recognizer, the whole frequency band is divided into seven-overlapped sub-bands, and then sub-band noisy phoneme HMMs are trained on speech data mixed with the filtered white Gaussian noise at multiple SNRs. The acoustic model of a word is built as a set of concatenations of clean and noisy sub-band phoneme HMMs arranged in parallel. A Viterbi decoder allows a search path to transit to another SNR condition at a phoneme boundary. The recognition scores of the sub-bands are then recombined to give the score for a word. Experiments show that the overlapped seven-band system yields the best performance under nonstationary ambient noises. It is also shown that the use of filtered white Gaussian noise is advantageous for training noisy phoneme HMMs.
ER -