A Bayesian Framework Using Multiple Model Structures for Speech Recognition

Sayaka SHIOTA; Kei HASHIMOTO; Yoshihiko NANKAKU; Keiichi TOKUDA

doi:10.1587/transinf.E96.D.939

IEICE TRANSACTIONS on Information

Summary :

This paper proposes an acoustic modeling technique for speech recognition based on a Bayesian framework using multiple model structures. The aim of the Bayesian approach is to obtain good predictions of observations by marginalizing all variables related to the generative process. Although the effectiveness of marginalizing model parameters has recently been reported in speech recognition, most of these systems use only a single model structure, where a structure comprises, e.g., the HMM topology, the number of states and mixtures, the type of state output distributions, and the parameter tying structure. A single structure is insufficient to represent the true distribution, because in most practical cases a family of such models does not include it. One solution to this problem is to use multiple model structures. Although several approaches using multiple model structures have already been proposed, a consistent integration of multiple model structures based on the Bayesian approach has not been seen in speech recognition. This paper focuses on integrating multiple phonetic decision trees based on the Bayesian framework in HMM-based acoustic modeling. The proposed method is derived from a new marginal likelihood function that includes the model structure as a latent variable in addition to the HMM state sequences and model parameters, and the posterior distributions of these latent variables are obtained using the variational Bayesian method. Furthermore, to improve the optimization algorithm, the deterministic annealing EM (DAEM) algorithm is applied to the training process. The proposed method effectively utilizes multiple model structures, especially in the early stage of training, which leads to better predictive distributions and improved recognition performance.
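The remark about the early stage of training can be illustrated with a toy calculation. The sketch below is not the paper's algorithm. It assumes only that each candidate structure m (for example, one phonetic decision tree) has a log marginal likelihood or lower bound L_m and a prior p(m), and that a DAEM-style inverse temperature beta flattens the structure posterior as q(m) proportional to (p(m) exp(L_m))^beta. The function name and all numeric values are hypothetical.

```python
import math

def tempered_structure_posterior(log_evidences, log_priors, beta):
    """Return q(m) proportional to (p(m) * exp(L_m)) ** beta.

    log_evidences: per-structure log marginal likelihood (or a lower bound on it).
    log_priors: per-structure log prior, log p(m).
    beta: inverse temperature in (0, 1]; beta = 1 gives the ordinary posterior.
    """
    scores = [beta * (le + lp) for le, lp in zip(log_evidences, log_priors)]
    peak = max(scores)  # subtract the max before exponentiating, for stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical values for three candidate structures with a uniform prior.
log_evidences = [-1050.0, -1042.0, -1047.0]
log_priors = [math.log(1.0 / 3.0)] * 3

# A small beta stands in for early training, beta = 1 for the final stage.
for beta in (0.01, 0.1, 1.0):
    q = tempered_structure_posterior(log_evidences, log_priors, beta)
    print(beta, [round(w, 3) for w in q])
```

Under these assumptions, a small beta keeps the weights close to uniform, so every structure contributes to the averaged model, while beta = 1 concentrates nearly all of the mass on the structure with the highest evidence.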

Publication: IEICE TRANSACTIONS on Information Vol.E96-D No.4 pp.939-948

Publication Date: 2013/04/01

Online ISSN: 1745-1361

DOI: 10.1587/transinf.E96.D.939

Type of Manuscript: PAPER

Category: Speech and Hearing

Sayaka SHIOTA, Kei HASHIMOTO, Yoshihiko NANKAKU, Keiichi TOKUDA, "A Bayesian Framework Using Multiple Model Structures for Speech Recognition" in IEICE TRANSACTIONS on Information, vol. E96-D, no. 4, pp. 939-948, April 2013, doi: 10.1587/transinf.E96.D.939.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E96.D.939/_p

@ARTICLE{e96-d_4_939,
author={Sayaka SHIOTA and Kei HASHIMOTO and Yoshihiko NANKAKU and Keiichi TOKUDA},
journal={IEICE TRANSACTIONS on Information},
title={A Bayesian Framework Using Multiple Model Structures for Speech Recognition},
year={2013},
volume={E96-D},
number={4},
pages={939-948},
keywords={},
doi={10.1587/transinf.E96.D.939},
ISSN={1745-1361},
month={April},}

TY - JOUR
TI - A Bayesian Framework Using Multiple Model Structures for Speech Recognition
T2 - IEICE TRANSACTIONS on Information
SP - 939
EP - 948
AU - Sayaka SHIOTA
AU - Kei HASHIMOTO
AU - Yoshihiko NANKAKU
AU - Keiichi TOKUDA
PY - 2013
DO - 10.1587/transinf.E96.D.939
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E96-D
IS - 4
JA - IEICE TRANSACTIONS on Information
Y1 - April 2013
ER -
