IEICE global.ieice.org Site

Author Search Result

[Author] Nobuo HATAOKA(2hit)


Normalization of Time-Derivative Parameters for Robust Speech Recognition in Small Devices
Yasunari OBUCHI, Nobuo HATAOKA, Richard M. STERN

PAPER-Speech and Hearing

Vol: E87-D, No: 4, Page(s): 1004-1011
In this paper we describe a new framework of feature compensation for robust speech recognition that is especially suitable for small devices. We introduce Delta-cepstrum Normalization (DCN), which normalizes not only cepstral coefficients but also their time derivatives. Cepstral Mean Normalization (CMN) and Mean and Variance Normalization (MVN) are fast and efficient algorithms for environmental adaptation and have been widely used. In those algorithms, normalization was applied to cepstral coefficients to reduce the irrelevant information in them, but it was not applied to time-derivative parameters because the reduction of irrelevant information was not sufficient. However, Histogram Equalization (HEQ) provides better compensation and can be applied even to the delta and delta-delta cepstra. We investigate various implementations of DCN and show that the best performance is achieved when the normalizations of the cepstra and the delta cepstra are mutually interdependent. We evaluate the performance of DCN using speech data recorded by a PDA. DCN provides significant improvements over HEQ: a 15% relative reduction in word error rate. We also examine the possibility of combining Vector Taylor Series (VTS) and DCN. Although some combinations do not improve on the performance of VTS, the best combination performs better than VTS alone. Finally, the advantage of DCN in terms of computation speed is discussed.
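
To make the terms in this abstract concrete, the following is a minimal sketch of MVN, rank-based HEQ, and a delta (time-derivative) computation applied to one cepstral coefficient track. It is not the paper's DCN: the abstract does not specify how the cepstra and delta cepstra normalizations are coupled, so this sketch normalizes them independently. The function names, the two-point difference used for the delta, and the choice of a standard normal as the HEQ target distribution are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right
from statistics import NormalDist, fmean, pstdev


def mvn(track):
    """Mean and variance normalization of one coefficient over an utterance."""
    mu = fmean(track)
    sigma = pstdev(track)
    if sigma == 0.0:
        return [0.0 for _ in track]
    return [(x - mu) / sigma for x in track]


def heq(track):
    """Histogram equalization: map the empirical CDF onto a standard normal.

    The standard-normal target is an assumption; other reference
    distributions could be used instead.
    """
    ordered = sorted(track)
    n = len(ordered)
    target = NormalDist()
    out = []
    for x in track:
        # Mid-rank CDF estimate; always strictly inside (0, 1).
        cdf = (bisect_left(ordered, x) + bisect_right(ordered, x)) / (2 * n)
        out.append(target.inv_cdf(cdf))
    return out


def delta(track):
    """Two-point symmetric difference with edge replication (assumed form)."""
    n = len(track)
    return [(track[min(t + 1, n - 1)] - track[max(t - 1, 0)]) / 2.0
            for t in range(n)]


def normalize_static_and_delta(track, normalizer=heq):
    """Normalize the static track and its delta track independently."""
    return normalizer(track), normalizer(delta(track))


if __name__ == "__main__":
    c1 = [1.0, 2.0, 4.0, 3.0, 5.0]  # made-up values for one cepstral coefficient
    static, d = normalize_static_and_delta(c1)
    print(static)
    print(d)
```

Passing `normalizer=mvn` gives the MVN variant of the same pipeline, which makes it easy to compare the two normalizers on the same track.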
Multi-Input Feature Combination in the Cepstral Domain for Practical Speech Recognition Systems
Yasunari OBUCHI Nobuo HATAOKA

PAPER-Speech and Hearing

Vol: E92-D, No: 4, Page(s): 662-670
In this paper we describe a new framework of feature combination in the cepstral domain for multi-input robust speech recognition. The general framework of working in the cepstral domain has various advantages over working in the time or hypothesis domain. It is stable, easy to maintain, and less expensive because it does not require precise calibration. It is also easy to configure in a complex speech recognition system. However, it is not straightforward to improve recognition performance by increasing the number of inputs, so we introduce the concept of variance re-scaling to compensate for the negative effect of averaging several input features. Finally, we propose to exploit another advantage of working in the cepstral domain. The speech can be modeled using hidden Markov models, and the model can be used as prior knowledge. This approach is formulated as a new algorithm, referred to as Hypothesis-Based Feature Combination. The effectiveness of the various algorithms is evaluated using two sets of speech databases. We also touch on automatic optimization of some parameters in the proposed algorithms.
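
The averaging and variance re-scaling idea mentioned in this abstract can be illustrated with a small sketch. Averaging several input feature tracks tends to shrink their variance, so the averaged track is stretched back around its mean. The abstract does not say which target variance the paper uses; taking the mean of the per-input variances is an assumption made here, and the function names are hypothetical.

```python
from math import sqrt
from statistics import fmean, pvariance


def average_inputs(tracks):
    """Frame-by-frame average of the same cepstral coefficient from each input."""
    return [fmean(frame) for frame in zip(*tracks)]


def rescale_variance(track, target_var):
    """Stretch a track around its mean so its variance equals target_var."""
    mu = fmean(track)
    var = pvariance(track)
    if var == 0.0:
        return list(track)
    gain = sqrt(target_var / var)
    return [mu + gain * (x - mu) for x in track]


def combine(tracks):
    """Average the inputs, then restore the variance lost by averaging.

    Assumption: the target is the mean of the per-input variances.
    """
    averaged = average_inputs(tracks)
    target_var = fmean([pvariance(t) for t in tracks])
    return rescale_variance(averaged, target_var)


if __name__ == "__main__":
    mic1 = [1.0, 3.0, 2.0, 4.0]  # made-up values from a first input
    mic2 = [2.0, 2.0, 3.0, 5.0]  # made-up values from a second input
    print(combine([mic1, mic2]))
```

Because the operation works on already extracted features, it needs no sample-level alignment between inputs, which is consistent with the abstract's point that the cepstral domain avoids precise calibration.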
