This paper presents a new approach to modeling speech spectra and pitch for text-independent speaker identification using Gaussian mixture models based on the multi-space probability distribution (MSD-GMM). The MSD-GMM allows us to model continuous pitch values of voiced frames and discrete symbols for unvoiced frames in a unified framework. Spectral and pitch features are jointly modeled by a two-stream MSD-GMM. We derive maximum likelihood (ML) estimation formulae and a minimum classification error (MCE) training procedure for the MSD-GMM parameters. The MSD-GMM speaker models are evaluated on text-independent speaker identification tasks. The experimental results show that the MSD-GMM can efficiently model the spectral and pitch features of each speaker and that it outperforms conventional speaker models. The results also demonstrate the utility of MCE training of the MSD-GMM parameters and its robustness to inter-session variability.
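The unified treatment of voiced and unvoiced frames described in the abstract can be illustrated with a minimal sketch. It is not the paper's estimation formulae, only one plausible reading of the idea: a voiced frame is scored by weighted one-dimensional Gaussians over its pitch value, while an unvoiced frame falls in a zero-dimensional space and contributes only that space's weight. The function names, parameter names, the use of `None` to mark an unvoiced frame, and the equal weighting of the two streams are all assumptions made for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def msd_pitch_likelihood(pitch, voiced_weights, voiced_means, voiced_vars,
                         unvoiced_weight):
    """Likelihood of one pitch observation under a toy MSD mixture.

    pitch is a float (for example log F0) for a voiced frame, or None for
    an unvoiced frame. The voiced weights plus unvoiced_weight are assumed
    to sum to 1 (hypothetical parameterization, not taken from the paper).
    """
    if pitch is None:
        # Zero-dimensional space: there is no density term, only the weight.
        return unvoiced_weight
    # One-dimensional space: weighted sum of Gaussian densities.
    return sum(w * gaussian_pdf(pitch, m, v)
               for w, m, v in zip(voiced_weights, voiced_means, voiced_vars))


def frame_log_likelihood(spectral_loglik, pitch, pitch_model):
    """Two-stream score for one frame.

    Adds the spectral-stream log-likelihood (computed elsewhere) to the
    pitch-stream log-likelihood, assuming equal stream weights.
    """
    return spectral_loglik + math.log(msd_pitch_likelihood(pitch, **pitch_model))


# Hypothetical pitch model: two voiced components and one unvoiced space.
pitch_model = {
    "voiced_weights": [0.4, 0.3],
    "voiced_means": [4.8, 5.2],
    "voiced_vars": [0.05, 0.08],
    "unvoiced_weight": 0.3,
}

voiced_score = frame_log_likelihood(-10.0, 5.0, pitch_model)
unvoiced_score = frame_log_likelihood(-10.0, None, pitch_model)
```

Under this reading, a speaker identification system would sum such frame scores over an utterance for each speaker model and pick the speaker with the highest total.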
Chiyomi MIYAJIMA, Yosuke HATTORI, Keiichi TOKUDA, Takashi MASUKO, Takao KOBAYASHI, Tadashi KITAMURA, "Text-Independent Speaker Identification Using Gaussian Mixture Models Based on Multi-Space Probability Distribution" in IEICE TRANSACTIONS on Information, vol. E84-D, no. 7, pp. 847-855, July 2001.
Abstract: This paper presents a new approach to modeling speech spectra and pitch for text-independent speaker identification using Gaussian mixture models based on multi-space probability distribution (MSD-GMM). MSD-GMM allows us to model continuous pitch values of voiced frames and discrete symbols for unvoiced frames in a unified framework. Spectral and pitch features are jointly modeled by a two-stream MSD-GMM. We derive maximum likelihood (ML) estimation formulae and minimum classification error (MCE) training procedure for MSD-GMM parameters. The MSD-GMM speaker models are evaluated for text-independent speaker identification tasks. The experimental results show that the MSD-GMM can efficiently model spectral and pitch features of each speaker and outperforms conventional speaker models. The results also demonstrate the utility of the MCE training of the MSD-GMM parameters and the robustness for the inter-session variability.
URL: https://global.ieice.org/en_transactions/information/10.1587/e84-d_7_847/_p
@ARTICLE{e84-d_7_847,
author={Chiyomi MIYAJIMA and Yosuke HATTORI and Keiichi TOKUDA and Takashi MASUKO and Takao KOBAYASHI and Tadashi KITAMURA},
journal={IEICE TRANSACTIONS on Information},
title={Text-Independent Speaker Identification Using Gaussian Mixture Models Based on Multi-Space Probability Distribution},
year={2001},
volume={E84-D},
number={7},
pages={847--855},
abstract={This paper presents a new approach to modeling speech spectra and pitch for text-independent speaker identification using Gaussian mixture models based on multi-space probability distribution (MSD-GMM). MSD-GMM allows us to model continuous pitch values of voiced frames and discrete symbols for unvoiced frames in a unified framework. Spectral and pitch features are jointly modeled by a two-stream MSD-GMM. We derive maximum likelihood (ML) estimation formulae and minimum classification error (MCE) training procedure for MSD-GMM parameters. The MSD-GMM speaker models are evaluated for text-independent speaker identification tasks. The experimental results show that the MSD-GMM can efficiently model spectral and pitch features of each speaker and outperforms conventional speaker models. The results also demonstrate the utility of the MCE training of the MSD-GMM parameters and the robustness for the inter-session variability.},
month={July},
}
TY - JOUR
TI - Text-Independent Speaker Identification Using Gaussian Mixture Models Based on Multi-Space Probability Distribution
T2 - IEICE TRANSACTIONS on Information
SP - 847
EP - 855
AU - Chiyomi MIYAJIMA
AU - Yosuke HATTORI
AU - Keiichi TOKUDA
AU - Takashi MASUKO
AU - Takao KOBAYASHI
AU - Tadashi KITAMURA
PY - 2001
JO - IEICE TRANSACTIONS on Information
VL - E84-D
IS - 7
JA - IEICE TRANSACTIONS on Information
Y1 - July 2001
AB - This paper presents a new approach to modeling speech spectra and pitch for text-independent speaker identification using Gaussian mixture models based on multi-space probability distribution (MSD-GMM). MSD-GMM allows us to model continuous pitch values of voiced frames and discrete symbols for unvoiced frames in a unified framework. Spectral and pitch features are jointly modeled by a two-stream MSD-GMM. We derive maximum likelihood (ML) estimation formulae and minimum classification error (MCE) training procedure for MSD-GMM parameters. The MSD-GMM speaker models are evaluated for text-independent speaker identification tasks. The experimental results show that the MSD-GMM can efficiently model spectral and pitch features of each speaker and outperforms conventional speaker models. The results also demonstrate the utility of the MCE training of the MSD-GMM parameters and the robustness for the inter-session variability.
ER -