A novel speech-feature-generation-based acoustic model training method for robust speaker-independent speech recognition is proposed. For decades, speaker adaptation methods have been widely used, but all of these methods require adaptation data. In contrast, our proposed method aims to create speaker-independent acoustic models that cover not only known but also unknown speakers. We achieve this by adopting inverse maximum likelihood linear regression (MLLR) transformation-based feature generation, and then training our models using these features. First, we obtain MLLR transformation matrices from a limited number of existing speakers. Then we extract the bases of the MLLR transformation matrices using PCA. The distribution of the weight parameters that express the transformation matrices of the existing speakers is then estimated. Next, we construct pseudo-speaker transformations by sampling weight parameters from this distribution, and apply these transformations to the normalized features of the existing speakers to generate features of pseudo-speakers. Finally, using these features, we train the acoustic models. Evaluation results show that acoustic models trained using our proposed method are robust to unknown speakers.
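The generation pipeline the abstract describes can be sketched in a few lines of numpy. This is a minimal toy illustration, not the paper's implementation: the dimensions, the diagonal-Gaussian model of the PCA weights, and all variable names are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: MLLR transforms W_s of shape (d, d+1) for S existing
# speakers (hypothetical sizes; real systems use e.g. 39-dim features).
S, d = 20, 4
speaker_transforms = rng.normal(size=(S, d, d + 1))

# 1) Flatten each speaker's transform and extract PCA bases via SVD.
X = speaker_transforms.reshape(S, -1)
mean = X.mean(axis=0)
Xc = X - mean
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
K = 5                      # number of retained bases (assumed)
bases = Vt[:K]             # PCA bases of the transformation matrices

# 2) Weights of the existing speakers in the PCA subspace, and a
#    (diagonal Gaussian) estimate of their distribution.
weights = Xc @ bases.T           # shape (S, K)
w_mean = weights.mean(axis=0)
w_std = weights.std(axis=0)

# 3) Sample pseudo-speaker weights and reconstruct a transform.
w_new = rng.normal(w_mean, w_std)
W_pseudo = (mean + w_new @ bases).reshape(d, d + 1)

# 4) Apply the pseudo-speaker transform to normalized features
#    (augmented with a bias term) to generate new training features.
T = 100                                    # frames
feats = rng.normal(size=(T, d))            # normalized features of a speaker
aug = np.hstack([feats, np.ones((T, 1))])  # append bias dimension
pseudo_feats = aug @ W_pseudo.T            # (T, d) pseudo-speaker features
```

Repeating steps 3-4 with fresh weight samples yields features for as many pseudo-speakers as desired, which are then pooled to train the acoustic models.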
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Arata ITOH, Sunao HARA, Norihide KITAOKA, Kazuya TAKEDA, "Acoustic Model Training Using Pseudo-Speaker Features Generated by MLLR Transformations for Robust Speaker-Independent Speech Recognition" in IEICE TRANSACTIONS on Information,
vol. E95-D, no. 10, pp. 2479-2485, October 2012, doi: 10.1587/transinf.E95.D.2479.
Abstract: A novel speech-feature-generation-based acoustic model training method for robust speaker-independent speech recognition is proposed. For decades, speaker adaptation methods have been widely used, but all of these methods require adaptation data. In contrast, our proposed method aims to create speaker-independent acoustic models that cover not only known but also unknown speakers. We achieve this by adopting inverse maximum likelihood linear regression (MLLR) transformation-based feature generation, and then training our models using these features. First, we obtain MLLR transformation matrices from a limited number of existing speakers. Then we extract the bases of the MLLR transformation matrices using PCA. The distribution of the weight parameters that express the transformation matrices of the existing speakers is then estimated. Next, we construct pseudo-speaker transformations by sampling weight parameters from this distribution, and apply these transformations to the normalized features of the existing speakers to generate features of pseudo-speakers. Finally, using these features, we train the acoustic models. Evaluation results show that acoustic models trained using our proposed method are robust to unknown speakers.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E95.D.2479/_p
@ARTICLE{e95-d_10_2479,
author={Arata ITOH and Sunao HARA and Norihide KITAOKA and Kazuya TAKEDA},
journal={IEICE TRANSACTIONS on Information},
title={Acoustic Model Training Using Pseudo-Speaker Features Generated by MLLR Transformations for Robust Speaker-Independent Speech Recognition},
year={2012},
volume={E95-D},
number={10},
pages={2479--2485},
abstract={A novel speech-feature-generation-based acoustic model training method for robust speaker-independent speech recognition is proposed. For decades, speaker adaptation methods have been widely used, but all of these methods require adaptation data. In contrast, our proposed method aims to create speaker-independent acoustic models that cover not only known but also unknown speakers. We achieve this by adopting inverse maximum likelihood linear regression (MLLR) transformation-based feature generation, and then training our models using these features. First, we obtain MLLR transformation matrices from a limited number of existing speakers. Then we extract the bases of the MLLR transformation matrices using PCA. The distribution of the weight parameters that express the transformation matrices of the existing speakers is then estimated. Next, we construct pseudo-speaker transformations by sampling weight parameters from this distribution, and apply these transformations to the normalized features of the existing speakers to generate features of pseudo-speakers. Finally, using these features, we train the acoustic models. Evaluation results show that acoustic models trained using our proposed method are robust to unknown speakers.},
doi={10.1587/transinf.E95.D.2479},
ISSN={1745-1361},
month={October},}
TY - JOUR
TI - Acoustic Model Training Using Pseudo-Speaker Features Generated by MLLR Transformations for Robust Speaker-Independent Speech Recognition
T2 - IEICE TRANSACTIONS on Information
SP - 2479
EP - 2485
AU - Arata ITOH
AU - Sunao HARA
AU - Norihide KITAOKA
AU - Kazuya TAKEDA
PY - 2012
DO - 10.1587/transinf.E95.D.2479
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E95-D
IS - 10
JA - IEICE TRANSACTIONS on Information
Y1 - October 2012
AB - A novel speech-feature-generation-based acoustic model training method for robust speaker-independent speech recognition is proposed. For decades, speaker adaptation methods have been widely used, but all of these methods require adaptation data. In contrast, our proposed method aims to create speaker-independent acoustic models that cover not only known but also unknown speakers. We achieve this by adopting inverse maximum likelihood linear regression (MLLR) transformation-based feature generation, and then training our models using these features. First, we obtain MLLR transformation matrices from a limited number of existing speakers. Then we extract the bases of the MLLR transformation matrices using PCA. The distribution of the weight parameters that express the transformation matrices of the existing speakers is then estimated. Next, we construct pseudo-speaker transformations by sampling weight parameters from this distribution, and apply these transformations to the normalized features of the existing speakers to generate features of pseudo-speakers. Finally, using these features, we train the acoustic models. Evaluation results show that acoustic models trained using our proposed method are robust to unknown speakers.
ER -