The search functionality is under construction.
The search functionality is under construction.

Duration Modeling with Decreased Intra-Group Temporal Variation for HMM-Based Phoneme Recognition

Nobuaki MINEMATSU, Keikichi HIROSE

  • Full Text Views

    0

  • Cite this

Summary :

A new clustering method was proposed to increase the effect of duration modeling on the HMM-based phoneme recognition. A precise observation on the temporal correspondences between a phoneme HMM with output probabilities by single Gaussian modeling and its training data indicated that there were two extreme cases, one with several types of correspondences in a phoneme class completely different from each other, and the other with only one type of correspondence. Although duration modeling was commonly used to incorporate the temporal information in the HMMs, a good modeling could not be obtained for the former case. Further observation for phoneme HMMs with output probabilities by Gaussian mixture modeling also showed that some HMMs still had multiple temporal correspondences, though the number of such phonemes was reduced as compared to the case of single Gaussian modeling. An appropriate duration modeling cannot be obtained for these phoneme HMMs by the conventional methods, where the duration distribution for each HMM state is represented by a distribution function. In order to cope with the problem, a new method was proposed which was based on the clustering of phoneme classes with plural types of temporal correspondences into sub-classes. The clustering was conducted so as to reduce the variations of the temporal correspondences in sub-classes. After the clustering, an HMM was constructed for each sub-class. Using the proposed method, speaker dependent recognition experiments were performed for phonemes segmented from isolated words. A few-percent increase was realized in the recognition rate, which was not obtained by another method based on the duration modeling with a Gaussian mixture.

Publication
IEICE TRANSACTIONS on Information Vol.E78-D No.6 pp.654-661
Publication Date
1995/06/25
Publicized
Online ISSN
DOI
Type of Manuscript
Special Section PAPER (Special Issue on Spoken Language Processing)
Category

Authors

Keyword