Takehiro IHARA Takayuki NAGAI Kazuhiko OZEKI Akira KUREMATSU
We present a novel approach to single-channel noise reduction of speech signals contaminated by additive noise. The system requires speech samples uttered in advance by the same speaker as the input signal, and these samples must have enough phonetic variety to reconstruct the input. In the proposed method, which we call referential reconstruction, a small database is built from example speech, referred to as reference signals. Referential reconstruction takes an example-based approach: the objective is to find, for each noisy input frame, the candidate reference frame most similar to the underlying clean frame, even though only the noise-contaminated frame is observed. Once candidate frames are found, they serve as the final output without any further processing. A correlation coefficient is used as the similarity measure for finding candidate frames. Automatic speech recognition experiments show that the proposed method is effective, particularly for low-SNR speech corrupted by white noise or noise in high-frequency bands. Since a direct implementation would incur infeasible computational cost in searching through the reference signals, a coarse-to-fine search strategy is also introduced in this paper.
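The abstract does not give implementation details. As a minimal sketch of the core matching step, assuming frames are fixed-length sample vectors and using the Pearson correlation coefficient as the similarity measure (the frame extraction, reference database, and coarse-to-fine search are omitted; all function names here are illustrative, not from the paper):

```python
import math

def correlation(x, y):
    # Pearson correlation coefficient between two equal-length frames.
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def best_reference_frame(noisy_frame, reference_frames):
    # Exhaustive search: return the reference frame whose correlation
    # with the noisy input frame is highest. The selected frame would
    # be emitted directly as the reconstructed (clean) output frame.
    return max(reference_frames, key=lambda f: correlation(noisy_frame, f))
```

This exhaustive scan is what makes the direct implementation costly; the paper's coarse-to-fine strategy would prune `reference_frames` before this final comparison.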
Yutaka TSUBOI Takehiro IHARA Kazuyuki TAKAGI Kazuhiko OZEKI
A solution to the problem of improving noise robustness in automatic speech recognition is presented within a multi-band, multi-SNR, and multi-path framework. In our word recognizer, the full frequency band is divided into seven overlapping sub-bands, and sub-band noisy phoneme HMMs are trained on speech data mixed with filtered white Gaussian noise at multiple SNRs. The acoustic model of a word is built as a set of concatenations of clean and noisy sub-band phoneme HMMs arranged in parallel. A Viterbi decoder allows a search path to transition to another SNR condition at a phoneme boundary. The recognition scores of the sub-bands are then recombined to give the score for a word. Experiments show that the overlapping seven-band system yields the best performance under nonstationary ambient noise. It is also shown that the use of filtered white Gaussian noise is advantageous for training noisy phoneme HMMs.
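The abstract leaves the recombination rule unspecified. As a minimal sketch, assuming each word hypothesis carries one log-likelihood score per sub-band and that recombination is an equal-weight sum (a common choice, not confirmed by the abstract; the HMM decoding itself is omitted and all names are illustrative):

```python
def recombine_scores(subband_scores):
    # subband_scores: dict mapping each word hypothesis to its list of
    # per-sub-band log-likelihood scores (here, seven entries per word).
    # Equal-weight recombination: sum the sub-band log scores.
    return {word: sum(scores) for word, scores in subband_scores.items()}

def recognize(subband_scores):
    # Pick the word hypothesis with the highest recombined score.
    totals = recombine_scores(subband_scores)
    return max(totals, key=totals.get)
```

In the paper's recognizer, each per-sub-band score would itself come from a Viterbi path that may switch SNR condition at phoneme boundaries; only the final recombination step is sketched here.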