In this paper we describe a method that allows the likelihood normalization technique, widely used for speaker verification, to be implemented in a text-independent speaker identification system. The essence of this method is to apply likelihood normalization at the frame level instead of at the utterance level, as is usually done. Every frame of the test utterance is input to all the reference models in parallel. In this procedure, likelihoods from all the models are available for each frame, so they can be normalized at every frame. A special kind of likelihood normalization, called Weighting Models Rank, is also tested experimentally. We have implemented these techniques in a speaker identification system based on either VQ-distortion codebooks or Gaussian Mixture Models. Evaluation results showed that the frame-level likelihood normalization technique gives higher speaker identification rates than the standard accumulated likelihood approach.
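The abstract contrasts two approaches. One sums raw likelihoods over the whole utterance. The other normalizes the scores of all reference models at every frame before summing. The Python sketch below illustrates that contrast under several assumptions:

- The per-frame score matrix is made up. It could hold GMM log-likelihoods or negated VQ distortions.
- The normalization compares each model's frame log-likelihood against the mean likelihood of the *other* models. This is a common cohort-style choice and not necessarily the paper's exact formula.
- `rank_weighted_decision` uses a hypothetical rank weighting in the spirit of Weighting Models Rank. The actual weights are not given here.
- All function names are illustrative.

```python
import math
from typing import List, Sequence

# Hypothetical input: scores[t][i] is the log-likelihood of frame t under
# reference model i (e.g. a GMM), or the negated VQ distortion of a codebook.
Scores = List[List[float]]


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def argmax(values: Sequence[float]) -> int:
    return max(range(len(values)), key=lambda i: values[i])


def accumulated_decision(scores: Scores) -> int:
    """Baseline: sum raw frame log-likelihoods over the utterance per model."""
    n_models = len(scores[0])
    totals = [sum(frame[i] for frame in scores) for i in range(n_models)]
    return argmax(totals)


def frame_normalized_decision(scores: Scores) -> int:
    """Normalize every frame across all models, then accumulate.

    Assumed normalization (needs at least two models): each model's frame
    log-likelihood minus the log of the mean likelihood of all other models
    on that frame, i.e. a cohort-style log-likelihood ratio.
    """
    n_models = len(scores[0])
    totals = [0.0] * n_models
    for frame in scores:
        for i in range(n_models):
            others = [frame[j] for j in range(n_models) if j != i]
            cohort = logsumexp(others) - math.log(n_models - 1)
            totals[i] += frame[i] - cohort
    return argmax(totals)


def rank_weighted_decision(scores: Scores, top_k: int = 2) -> int:
    """Rank-based frame scoring in the spirit of Weighting Models Rank.

    Hypothetical weighting: at each frame the best model receives top_k
    points, the second best top_k - 1, and so on; models ranked below
    top_k receive nothing.
    """
    n_models = len(scores[0])
    totals = [0.0] * n_models
    for frame in scores:
        order = sorted(range(n_models), key=lambda i: frame[i], reverse=True)
        for rank, i in enumerate(order[:top_k]):
            totals[i] += top_k - rank
    return argmax(totals)


# Made-up scores for 3 frames and 3 reference models (illustration only).
example = [
    [-3.0, -6.0, 0.0],
    [0.0, 2.0, -50.0],
    [-0.5, 0.0, -5.0],
]
print(accumulated_decision(example))       # 0
print(frame_normalized_decision(example))  # 1
print(rank_weighted_decision(example))     # 1
```

With these invented numbers, the plain accumulated score favours model 0. The frame-normalized and rank-weighted rules both favour model 1. This only shows that the decisions can differ; it is not evidence about real data.

One design point matters here. Subtracting a term shared by every model at each frame, such as the log of the summed likelihood of all models, would leave the accumulated ranking unchanged, because the same amount is removed from every model's total. That is why this sketch normalizes each model against the other models only.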
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Konstantin P. MARKOV, Seiichi NAKAGAWA, "Text-Independent Speaker Identification Utilizing Likelihood Normalization Technique," in IEICE TRANSACTIONS on Information, vol. E80-D, no. 5, pp. 585-593, May 1997, doi: .
URL: https://global.ieice.org/en_transactions/information/10.1587/e80-d_5_585/_p
@ARTICLE{e80-d_5_585,
author={Konstantin P. MARKOV and Seiichi NAKAGAWA},
journal={IEICE TRANSACTIONS on Information},
title={Text-Independent Speaker Identification Utilizing Likelihood Normalization Technique},
year={1997},
volume={E80-D},
number={5},
pages={585-593},
abstract={In this paper we describe a method that allows the likelihood normalization technique, widely used for speaker verification, to be implemented in a text-independent speaker identification system. The essence of this method is to apply likelihood normalization at the frame level instead of at the utterance level, as is usually done. Every frame of the test utterance is input to all the reference models in parallel. In this procedure, likelihoods from all the models are available for each frame, so they can be normalized at every frame. A special kind of likelihood normalization, called Weighting Models Rank, is also tested experimentally. We have implemented these techniques in a speaker identification system based on either VQ-distortion codebooks or Gaussian Mixture Models. Evaluation results showed that the frame-level likelihood normalization technique gives higher speaker identification rates than the standard accumulated likelihood approach.},
keywords={},
doi={},
ISSN={},
month={May},}
TY  - JOUR
TI  - Text-Independent Speaker Identification Utilizing Likelihood Normalization Technique
T2  - IEICE TRANSACTIONS on Information
SP  - 585
EP  - 593
AU  - Konstantin P. MARKOV
AU  - Seiichi NAKAGAWA
PY  - 1997
DO  -
JO  - IEICE TRANSACTIONS on Information
SN  -
VL  - E80-D
IS  - 5
JA  - IEICE TRANSACTIONS on Information
Y1  - 1997/05//
AB  - In this paper we describe a method that allows the likelihood normalization technique, widely used for speaker verification, to be implemented in a text-independent speaker identification system. The essence of this method is to apply likelihood normalization at the frame level instead of at the utterance level, as is usually done. Every frame of the test utterance is input to all the reference models in parallel. In this procedure, likelihoods from all the models are available for each frame, so they can be normalized at every frame. A special kind of likelihood normalization, called Weighting Models Rank, is also tested experimentally. We have implemented these techniques in a speaker identification system based on either VQ-distortion codebooks or Gaussian Mixture Models. Evaluation results showed that the frame-level likelihood normalization technique gives higher speaker identification rates than the standard accumulated likelihood approach.
ER  -