Multi-Input Feature Combination in the Cepstral Domain for Practical Speech Recognition Systems

Yasunari OBUCHI; Nobuo HATAOKA

doi:10.1587/transinf.E92.D.662

IEICE TRANSACTIONS on Information

Summary :

In this paper we describe a new framework for feature combination in the cepstral domain for multi-input robust speech recognition. Working in the cepstral domain has several advantages over working in the time or hypothesis domain: it is stable, easy to maintain, and less expensive because it does not require precise calibration. It is also easy to configure in a complex speech recognition system. However, it is not straightforward to improve recognition performance by increasing the number of inputs, so we introduce the concept of variance re-scaling to compensate for the negative effect of averaging several input features. Finally, we propose to exploit another advantage of working in the cepstral domain: the speech can be modeled using hidden Markov models, and the model can be used as prior knowledge. This approach is formulated as a new algorithm, referred to as Hypothesis-Based Feature Combination. The effectiveness of the various algorithms is evaluated using two sets of speech databases. We also discuss automatic optimization of some parameters in the proposed algorithms.
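The summary does not give the formulas for averaging or variance re-scaling, so the following is only a minimal sketch of one plausible reading, not the authors' algorithm. It assumes that cepstral vectors from several inputs are averaged frame by frame, and that each cepstral dimension is then stretched around its mean so its variance over time matches the average variance of the individual inputs. The function name `combine_cepstra`, the choice of target variance, and the `eps` guard are all assumptions made for illustration.

```python
from statistics import fmean, pvariance


def combine_cepstra(channels, eps=1e-12):
    """Average cepstral features over inputs, then re-scale the variance.

    channels[k][t][d] is cepstral coefficient d of frame t from input k.
    Assumption (not from the paper): the target variance of each dimension
    is the mean of the per-input variances of that dimension.
    """
    n_frames = len(channels[0])
    n_dims = len(channels[0][0])

    # Frame-by-frame average across inputs.
    averaged = [
        [fmean(ch[t][d] for ch in channels) for d in range(n_dims)]
        for t in range(n_frames)
    ]

    combined = [row[:] for row in averaged]
    for d in range(n_dims):
        # Variance over time that each single input had in this dimension.
        target_var = fmean(
            pvariance([ch[t][d] for t in range(n_frames)]) for ch in channels
        )
        track = [averaged[t][d] for t in range(n_frames)]
        mu = fmean(track)
        # Averaging shrinks the variance; stretch it back around the mean.
        scale = (target_var / (pvariance(track) + eps)) ** 0.5
        for t in range(n_frames):
            combined[t][d] = mu + (track[t] - mu) * scale
    return combined


# Toy example: two inputs, four frames, one cepstral dimension.
mic_a = [[0.0], [2.0], [0.0], [2.0]]
mic_b = [[0.0], [1.0], [0.0], [3.0]]
features = combine_cepstra([mic_a, mic_b])
```

In this toy example the plain average has a smaller variance over time than either input, and the re-scaling step restores it to the mean of the two input variances while leaving the mean unchanged.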

Publication: IEICE TRANSACTIONS on Information Vol.E92-D No.4 pp.662-670

Publication Date: 2009/04/01

Online ISSN: 1745-1361

DOI: 10.1587/transinf.E92.D.662

Type of Manuscript: PAPER

Category: Speech and Hearing

Yasunari OBUCHI, Nobuo HATAOKA, "Multi-Input Feature Combination in the Cepstral Domain for Practical Speech Recognition Systems" in IEICE TRANSACTIONS on Information, vol. E92-D, no. 4, pp. 662-670, April 2009, doi: 10.1587/transinf.E92.D.662.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E92.D.662/_p

@ARTICLE{e92-d_4_662,
author={Yasunari OBUCHI and Nobuo HATAOKA},
journal={IEICE TRANSACTIONS on Information},
title={Multi-Input Feature Combination in the Cepstral Domain for Practical Speech Recognition Systems},
year={2009},
volume={E92-D},
number={4},
pages={662--670},
abstract={In this paper we describe a new framework for feature combination in the cepstral domain for multi-input robust speech recognition. Working in the cepstral domain has several advantages over working in the time or hypothesis domain: it is stable, easy to maintain, and less expensive because it does not require precise calibration. It is also easy to configure in a complex speech recognition system. However, it is not straightforward to improve recognition performance by increasing the number of inputs, so we introduce the concept of variance re-scaling to compensate for the negative effect of averaging several input features. Finally, we propose to exploit another advantage of working in the cepstral domain: the speech can be modeled using hidden Markov models, and the model can be used as prior knowledge. This approach is formulated as a new algorithm, referred to as Hypothesis-Based Feature Combination. The effectiveness of the various algorithms is evaluated using two sets of speech databases. We also discuss automatic optimization of some parameters in the proposed algorithms.},
keywords={},
doi={10.1587/transinf.E92.D.662},
ISSN={1745-1361},
month={April},}

TY - JOUR
TI - Multi-Input Feature Combination in the Cepstral Domain for Practical Speech Recognition Systems
T2 - IEICE TRANSACTIONS on Information
SP - 662
EP - 670
AU - Yasunari OBUCHI
AU - Nobuo HATAOKA
PY - 2009
DO - 10.1587/transinf.E92.D.662
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E92-D
IS - 4
JA - IEICE TRANSACTIONS on Information
Y1 - April 2009
AB - In this paper we describe a new framework for feature combination in the cepstral domain for multi-input robust speech recognition. Working in the cepstral domain has several advantages over working in the time or hypothesis domain: it is stable, easy to maintain, and less expensive because it does not require precise calibration. It is also easy to configure in a complex speech recognition system. However, it is not straightforward to improve recognition performance by increasing the number of inputs, so we introduce the concept of variance re-scaling to compensate for the negative effect of averaging several input features. Finally, we propose to exploit another advantage of working in the cepstral domain: the speech can be modeled using hidden Markov models, and the model can be used as prior knowledge. This approach is formulated as a new algorithm, referred to as Hypothesis-Based Feature Combination. The effectiveness of the various algorithms is evaluated using two sets of speech databases. We also discuss automatic optimization of some parameters in the proposed algorithms.
ER -
