Duration Modeling with Decreased Intra-Group Temporal Variation for HMM-Based Phoneme Recognition

Nobuaki MINEMATSU; Keikichi HIROSE

Summary:

A new clustering method was proposed to increase the effect of duration modeling on the HMM-based phoneme recognition. A precise observation on the temporal correspondences between a phoneme HMM with output probabilities by single Gaussian modeling and its training data indicated that there were two extreme cases, one with several types of correspondences in a phoneme class completely different from each other, and the other with only one type of correspondence. Although duration modeling was commonly used to incorporate the temporal information in the HMMs, a good modeling could not be obtained for the former case. Further observation for phoneme HMMs with output probabilities by Gaussian mixture modeling also showed that some HMMs still had multiple temporal correspondences, though the number of such phonemes was reduced as compared to the case of single Gaussian modeling. An appropriate duration modeling cannot be obtained for these phoneme HMMs by the conventional methods, where the duration distribution for each HMM state is represented by a distribution function. In order to cope with the problem, a new method was proposed which was based on the clustering of phoneme classes with plural types of temporal correspondences into sub-classes. The clustering was conducted so as to reduce the variations of the temporal correspondences in sub-classes. After the clustering, an HMM was constructed for each sub-class. Using the proposed method, speaker dependent recognition experiments were performed for phonemes segmented from isolated words. A few-percent increase was realized in the recognition rate, which was not obtained by another method based on the duration modeling with a Gaussian mixture.
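The idea of splitting a phoneme class into sub-classes with less temporal variation can be illustrated with a small sketch. The summary does not state the clustering criterion, so everything below is an assumption for illustration only: each training token is represented by the number of frames aligned to each HMM state, the counts are normalized to proportions, and a plain 2-means pass groups the tokens. The token data and all function names are hypothetical and are not taken from the paper.

```python
from statistics import pvariance

def proportions(durations):
    """Convert per-state frame counts into fractions of the token length."""
    total = sum(durations)
    return [d / total for d in durations]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_vec(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def two_means(vectors, iters=20):
    """Plain 2-means (an assumed stand-in for the paper's clustering).

    Seeded with the first vector and the vector farthest from it.
    """
    centers = [vectors[0], max(vectors, key=lambda v: sq_dist(v, vectors[0]))]
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [min((0, 1), key=lambda k: sq_dist(v, centers[k])) for v in vectors]
        for k in (0, 1):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centers[k] = mean_vec(members)
    return labels

def total_variance(vectors):
    """Sum of per-state variances of the duration proportions."""
    return sum(pvariance(col) for col in zip(*vectors))

# Hypothetical tokens of one phoneme: frames spent in each of 3 HMM states.
# Half dwell in the first state, half in the last (two temporal correspondences).
tokens = [[6, 2, 2], [7, 2, 1], [6, 3, 1], [2, 2, 6], [1, 3, 6], [2, 2, 7]]
vectors = [proportions(t) for t in tokens]
labels = two_means(vectors)

groups = {k: [v for v, lab in zip(vectors, labels) if lab == k] for k in (0, 1)}
before = total_variance(vectors)
after = max(total_variance(g) for g in groups.values())
print(labels, before, after)
```

In this toy data the temporal variation within each sub-class is smaller than in the undivided class, which is the condition under which a single duration distribution per state becomes a better fit. A separate HMM would then be trained for each sub-class, as the summary describes.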

Publication: IEICE TRANSACTIONS on Information Vol.E78-D No.6 pp.654-661

Publication Date: 1995/06/25

Type of Manuscript: Special Section PAPER (Special Issue on Spoken Language Processing)

Nobuaki MINEMATSU, Keikichi HIROSE, "Duration Modeling with Decreased Intra-Group Temporal Variation for HMM-Based Phoneme Recognition" in IEICE TRANSACTIONS on Information, vol. E78-D, no. 6, pp. 654-661, June 1995.
URL: https://global.ieice.org/en_transactions/information/10.1587/e78-d_6_654/_p

@ARTICLE{e78-d_6_654,
author={Nobuaki MINEMATSU and Keikichi HIROSE},
journal={IEICE TRANSACTIONS on Information},
title={Duration Modeling with Decreased Intra-Group Temporal Variation for HMM-Based Phoneme Recognition},
year={1995},
volume={E78-D},
number={6},
pages={654--661},
abstract={A new clustering method was proposed to increase the effect of duration modeling on the HMM-based phoneme recognition. A precise observation on the temporal correspondences between a phoneme HMM with output probabilities by single Gaussian modeling and its training data indicated that there were two extreme cases, one with several types of correspondences in a phoneme class completely different from each other, and the other with only one type of correspondence. Although duration modeling was commonly used to incorporate the temporal information in the HMMs, a good modeling could not be obtained for the former case. Further observation for phoneme HMMs with output probabilities by Gaussian mixture modeling also showed that some HMMs still had multiple temporal correspondences, though the number of such phonemes was reduced as compared to the case of single Gaussian modeling. An appropriate duration modeling cannot be obtained for these phoneme HMMs by the conventional methods, where the duration distribution for each HMM state is represented by a distribution function. In order to cope with the problem, a new method was proposed which was based on the clustering of phoneme classes with plural types of temporal correspondences into sub-classes. The clustering was conducted so as to reduce the variations of the temporal correspondences in sub-classes. After the clustering, an HMM was constructed for each sub-class. Using the proposed method, speaker dependent recognition experiments were performed for phonemes segmented from isolated words. A few-percent increase was realized in the recognition rate, which was not obtained by another method based on the duration modeling with a Gaussian mixture.},
month={June},}

TY - JOUR
TI - Duration Modeling with Decreased Intra-Group Temporal Variation for HMM-Based Phoneme Recognition
T2 - IEICE TRANSACTIONS on Information
SP - 654
EP - 661
AU - Nobuaki MINEMATSU
AU - Keikichi HIROSE
PY - 1995
JO - IEICE TRANSACTIONS on Information
VL - E78-D
IS - 6
JA - IEICE TRANSACTIONS on Information
Y1 - June 1995
AB - A new clustering method was proposed to increase the effect of duration modeling on the HMM-based phoneme recognition. A precise observation on the temporal correspondences between a phoneme HMM with output probabilities by single Gaussian modeling and its training data indicated that there were two extreme cases, one with several types of correspondences in a phoneme class completely different from each other, and the other with only one type of correspondence. Although duration modeling was commonly used to incorporate the temporal information in the HMMs, a good modeling could not be obtained for the former case. Further observation for phoneme HMMs with output probabilities by Gaussian mixture modeling also showed that some HMMs still had multiple temporal correspondences, though the number of such phonemes was reduced as compared to the case of single Gaussian modeling. An appropriate duration modeling cannot be obtained for these phoneme HMMs by the conventional methods, where the duration distribution for each HMM state is represented by a distribution function. In order to cope with the problem, a new method was proposed which was based on the clustering of phoneme classes with plural types of temporal correspondences into sub-classes. The clustering was conducted so as to reduce the variations of the temporal correspondences in sub-classes. After the clustering, an HMM was constructed for each sub-class. Using the proposed method, speaker dependent recognition experiments were performed for phonemes segmented from isolated words. A few-percent increase was realized in the recognition rate, which was not obtained by another method based on the duration modeling with a Gaussian mixture.
ER -