Robust Speaker Identification System Based on Multilayer Eigen-Codebook Vector Quantization

Ching-Tang HSIEH; Eugene LAI; Wan-Chen CHEN

IEICE TRANSACTIONS on Information

Robust Speaker Identification System Based on Multilayer Eigen-Codebook Vector Quantization

Ching-Tang HSIEH, Eugene LAI, Wan-Chen CHEN

Full Text Views

0

Cite this

Summary :

This paper presents some effective methods for improving the performance of a speaker identification system. Based on the multiresolution property of the wavelet transform, the input speech signal is decomposed into various frequency subbands in order not to spread noise distortions over the entire feature space. For capturing the characteristics of the vocal tract, the linear predictive cepstral coefficients (LPCC) of the lower frequency subband for each decomposition process are calculated. In addition, a hard threshold technique for the lower frequency subband in each decomposition process is also applied to eliminate the effect of noise interference. Furthermore, cepstral domain feature vector normalization is applied to all computed features in order to provide similar parameter statistics in all acoustic environments. In order to effectively utilize all these multiband speech features, we propose a modified vector quantization as the identifier. This model uses the multilayer concept to eliminate the interference among the multiband speech features and then uses the principal component analysis (PCA) method to evaluate the codebooks for capturing a more detailed distribution of the speaker's phoneme characteristics. The proposed method is evaluated using the KING speech database for text-independent speaker identification. Experimental results show that the recognition performance of the proposed method is better than those of the vector quantization (VQ) and the Gaussian mixture model (GMM) using full-band LPCC and mel-frequency cepstral coefficients (MFCC) features in both clean and noisy environments. Also, a satisfactory performance can be achieved in low SNR environments.

Publication: IEICE TRANSACTIONS on Information Vol.E87-D No.5 pp.1185-1193

Publication Date: 2004/05/01

Publicized

Online ISSN

DOI

Type of Manuscript: Special Section PAPER (Special Section on Speech Dynamics by Ear, Eye, Mouth and Machine)

Category

Cite this

Copy

Ching-Tang HSIEH, Eugene LAI, Wan-Chen CHEN, "Robust Speaker Identification System Based on Multilayer Eigen-Codebook Vector Quantization" in IEICE TRANSACTIONS on Information, vol. E87-D, no. 5, pp. 1185-1193, May 2004, doi: .
Abstract: This paper presents some effective methods for improving the performance of a speaker identification system. Based on the multiresolution property of the wavelet transform, the input speech signal is decomposed into various frequency subbands in order not to spread noise distortions over the entire feature space. For capturing the characteristics of the vocal tract, the linear predictive cepstral coefficients (LPCC) of the lower frequency subband for each decomposition process are calculated. In addition, a hard threshold technique for the lower frequency subband in each decomposition process is also applied to eliminate the effect of noise interference. Furthermore, cepstral domain feature vector normalization is applied to all computed features in order to provide similar parameter statistics in all acoustic environments. In order to effectively utilize all these multiband speech features, we propose a modified vector quantization as the identifier. This model uses the multilayer concept to eliminate the interference among the multiband speech features and then uses the principal component analysis (PCA) method to evaluate the codebooks for capturing a more detailed distribution of the speaker's phoneme characteristics. The proposed method is evaluated using the KING speech database for text-independent speaker identification. Experimental results show that the recognition performance of the proposed method is better than those of the vector quantization (VQ) and the Gaussian mixture model (GMM) using full-band LPCC and mel-frequency cepstral coefficients (MFCC) features in both clean and noisy environments. Also, a satisfactory performance can be achieved in low SNR environments.
URL: https://global.ieice.org/en_transactions/information/10.1587/e87-d_5_1185/_p

Copy

@ARTICLE{e87-d_5_1185,
author={Ching-Tang HSIEH, Eugene LAI, Wan-Chen CHEN, },
journal={IEICE TRANSACTIONS on Information},
title={Robust Speaker Identification System Based on Multilayer Eigen-Codebook Vector Quantization},
year={2004},
volume={E87-D},
number={5},
pages={1185-1193},
abstract={This paper presents some effective methods for improving the performance of a speaker identification system. Based on the multiresolution property of the wavelet transform, the input speech signal is decomposed into various frequency subbands in order not to spread noise distortions over the entire feature space. For capturing the characteristics of the vocal tract, the linear predictive cepstral coefficients (LPCC) of the lower frequency subband for each decomposition process are calculated. In addition, a hard threshold technique for the lower frequency subband in each decomposition process is also applied to eliminate the effect of noise interference. Furthermore, cepstral domain feature vector normalization is applied to all computed features in order to provide similar parameter statistics in all acoustic environments. In order to effectively utilize all these multiband speech features, we propose a modified vector quantization as the identifier. This model uses the multilayer concept to eliminate the interference among the multiband speech features and then uses the principal component analysis (PCA) method to evaluate the codebooks for capturing a more detailed distribution of the speaker's phoneme characteristics. The proposed method is evaluated using the KING speech database for text-independent speaker identification. Experimental results show that the recognition performance of the proposed method is better than those of the vector quantization (VQ) and the Gaussian mixture model (GMM) using full-band LPCC and mel-frequency cepstral coefficients (MFCC) features in both clean and noisy environments. Also, a satisfactory performance can be achieved in low SNR environments.},
keywords={},
doi={},
ISSN={},
month={May},}

Copy

TY - JOUR
TI - Robust Speaker Identification System Based on Multilayer Eigen-Codebook Vector Quantization
T2 - IEICE TRANSACTIONS on Information
SP - 1185
EP - 1193
AU - Ching-Tang HSIEH
AU - Eugene LAI
AU - Wan-Chen CHEN
PY - 2004
DO -
JO - IEICE TRANSACTIONS on Information
SN -
VL - E87-D
IS - 5
JA - IEICE TRANSACTIONS on Information
Y1 - May 2004
AB - This paper presents some effective methods for improving the performance of a speaker identification system. Based on the multiresolution property of the wavelet transform, the input speech signal is decomposed into various frequency subbands in order not to spread noise distortions over the entire feature space. For capturing the characteristics of the vocal tract, the linear predictive cepstral coefficients (LPCC) of the lower frequency subband for each decomposition process are calculated. In addition, a hard threshold technique for the lower frequency subband in each decomposition process is also applied to eliminate the effect of noise interference. Furthermore, cepstral domain feature vector normalization is applied to all computed features in order to provide similar parameter statistics in all acoustic environments. In order to effectively utilize all these multiband speech features, we propose a modified vector quantization as the identifier. This model uses the multilayer concept to eliminate the interference among the multiband speech features and then uses the principal component analysis (PCA) method to evaluate the codebooks for capturing a more detailed distribution of the speaker's phoneme characteristics. The proposed method is evaluated using the KING speech database for text-independent speaker identification. Experimental results show that the recognition performance of the proposed method is better than those of the vector quantization (VQ) and the Gaussian mixture model (GMM) using full-band LPCC and mel-frequency cepstral coefficients (MFCC) features in both clean and noisy environments. Also, a satisfactory performance can be achieved in low SNR environments.
ER -

IEICE TRANSACTIONS on Information

Robust Speaker Identification System Based on Multilayer Eigen-Codebook Vector Quantization

Summary :

Authors

Keyword

Latest Issue

Contents

Links

Call for Papers

Submit to IEICE Trans.

Transactions NEWS

Popular articles

IEICE TRANSACTIONS on Information

Robust Speaker Identification System Based on Multilayer Eigen-Codebook Vector Quantization

Summary :

Authors

Keyword

Latest Issue

Contents

Copyrights notice of machine-translated contents

Cite this

Links

Call for Papers

Submit to IEICE Trans.

Transactions NEWS

Popular articles