Utterance-Based Selective Training for the Automatic Creation of Task-Dependent Acoustic Models

Tobias CINCAREK, Tomoki TODA, Hiroshi SARUWATARI, Kiyohiro SHIKANO

Summary:

To obtain a robust acoustic model for a given speech recognition task, a large amount of speech data is necessary. However, preparing speech data, including recording and transcription, is very costly and time-consuming. Although there are attempts to build generic acoustic models that are portable across different applications, speech recognition performance is typically task-dependent. This paper introduces a method for automatically building task-dependent acoustic models based on selective training. Instead of setting up a new database, only a small amount of task-specific development data needs to be collected. Based on the likelihood of the target model parameters given this development data, utterances that are acoustically close to the development data are selected from existing speech data resources. Since the number of possible subsets of a large database is in general far too large to search exhaustively, a heuristic has to be employed. The proposed algorithm deletes single utterances temporarily or alternates between successive deletion and addition of multiple utterances. To make selective training computationally practical, model retraining and likelihood calculation need to be fast. It is shown that the model likelihood can be calculated quickly and easily from sufficient statistics, without explicit reconstruction of the model parameters. The algorithm is applied to obtain an infant-dependent and an elderly-dependent acoustic model with only very little development data available. Word accuracy improves by up to 9% in comparison to conventional EM training without selection. Furthermore, the approach also outperforms MLLR and MAP adaptation with the development data.
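
The deletion heuristic and the use of sufficient statistics described in the summary can be illustrated with a heavily simplified sketch. It is not the paper's implementation: it assumes a single one-dimensional Gaussian in place of an HMM-based acoustic model, so no state alignment is needed, and it only performs greedy single-utterance deletion. All names (`stats`, `dev_loglik`, `select`, `var_floor`) and the toy data are hypothetical.

```python
import math

def stats(frames):
    """Sufficient statistics of one utterance: (count, sum, sum of squares)."""
    return (len(frames), sum(frames), sum(x * x for x in frames))

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dev_loglik(pool, dev, var_floor=1e-3):
    """Log-likelihood of the development data under the maximum-likelihood
    Gaussian implied by the pooled statistics (assumed toy model)."""
    n, s, q = pool
    mean = s / n
    var = max(q / n - mean * mean, var_floor)  # floor is an assumption
    dn, ds, dq = dev
    # Sum of (x - mean)^2 over development frames, expanded so that only
    # the development statistics are needed, not the frames themselves.
    sq_err = dq - 2.0 * mean * ds + dn * mean * mean
    return -0.5 * (dn * math.log(2.0 * math.pi * var) + sq_err / var)

def select(utterances, dev_frames):
    """Greedy deletion: drop an utterance whenever removing its statistics
    from the pool raises the development-data likelihood."""
    utt_stats = [stats(u) for u in utterances]
    dev = stats(dev_frames)
    pool = (0, 0.0, 0.0)
    for st in utt_stats:
        pool = add(pool, st)
    keep = [True] * len(utt_stats)
    best = dev_loglik(pool, dev)
    improved = True
    while improved:
        improved = False
        for i, st in enumerate(utt_stats):
            if not keep[i]:
                continue
            trial = sub(pool, st)  # "retraining" is a subtraction of statistics
            if trial[0] == 0:      # never delete the last remaining utterance
                continue
            score = dev_loglik(trial, dev)
            if score > best:       # keep the deletion only if it helps
                pool, best, keep[i] = trial, score, False
                improved = True
    return [i for i, k in enumerate(keep) if k], best

# Toy data: utterance 2 is acoustically far from the development data.
utterances = [[0.9, 1.1, 1.0], [1.2, 0.8, 1.0], [5.0, 5.2, 4.8], [1.1, 0.9]]
dev_frames = [1.0, 1.05, 0.95]
selected, score = select(utterances, dev_frames)
print(selected, score)
```

Because each trial model is obtained by subtracting one utterance's statistics from the pool, no pass over the training frames is needed per candidate, which is the property the summary relies on to keep selection fast. With very little development data a purely greedy criterion like this one can discard more utterances than is desirable, so a practical variant would need a stopping rule or the alternating deletion and addition the summary mentions.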

Publication
IEICE TRANSACTIONS on Information Vol.E89-D No.3 pp.962-969
Publication Date
2006/03/01
Online ISSN
1745-1361
DOI
10.1093/ietisy/e89-d.3.962
Type of Manuscript
Special Section PAPER (Special Section on Statistical Modeling for Speech Processing)
Category
Speech Recognition
