Speaker-Phonetic I-Vector Modeling for Text-Dependent Speaker Verification with Random Digit Strings

Shengyu YAO; Ruohua ZHOU; Pengyuan ZHANG

doi:10.1587/transinf.2018EDP7310

IEICE TRANSACTIONS on Information

Summary:

This paper proposes a speaker-phonetic i-vector modeling method for text-dependent speaker verification with random digit strings, in which the enrollment and test utterances do not share the same phrase. The core of the proposed method is the use of digit alignment information within the i-vector framework. By utilizing forced-alignment information, verification scores for the test trials can be computed as in the fixed-phrase situation, in which the speech segments compared between the enrollment and test utterances have the same phonetic content. Specifically, utterances are segmented into digits, and a unique phonetically-constrained i-vector extractor is then applied to obtain a speaker and channel variability representation for every digit segment. Probabilistic linear discriminant analysis (PLDA) and s-norm are subsequently used for channel compensation and score normalization, respectively. The final score is obtained by combining the digit scores, which are computed by scoring individual digit segments of the test utterance against the corresponding ones of the enrollment. Experimental results on Part 3 of the Robust Speaker Recognition (RSR2015) database demonstrate that the proposed approach significantly outperforms GMM-UBM by 52.3% and 53.5% relative in equal error rate (EER) for male and female speakers, respectively.
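
The digit-level scoring described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: cosine similarity stands in for PLDA scoring, each utterance is assumed to be already segmented into one vector per digit, and the digit scores are combined by a plain average, which the summary does not specify. The names `cosine`, `s_norm`, and `trial_score` are hypothetical.

```python
import math
from statistics import mean, pstdev


def cosine(a, b):
    """Cosine similarity between two vectors (a stand-in for PLDA scoring)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def s_norm(raw, enroll_cohort_scores, test_cohort_scores):
    """Symmetric normalization: average of the two z-scores of `raw`,
    one against the enrollment-side cohort scores and one against the
    test-side cohort scores. Assumes neither cohort has zero spread."""
    z_enroll = (raw - mean(enroll_cohort_scores)) / pstdev(enroll_cohort_scores)
    z_test = (raw - mean(test_cohort_scores)) / pstdev(test_cohort_scores)
    return 0.5 * (z_enroll + z_test)


def trial_score(enroll, test, cohort):
    """Score one trial.

    enroll, test: dict mapping a digit label to that digit's vector.
    cohort: dict mapping a digit label to a list of impostor vectors.
    Only digits present in both utterances are compared, so every
    comparison is between segments with the same phonetic content.
    """
    digit_scores = []
    for digit, test_vec in test.items():
        if digit not in enroll:
            continue  # no matching enrollment segment for this digit
        raw = cosine(enroll[digit], test_vec)
        enroll_side = [cosine(enroll[digit], c) for c in cohort[digit]]
        test_side = [cosine(test_vec, c) for c in cohort[digit]]
        digit_scores.append(s_norm(raw, enroll_side, test_side))
    if not digit_scores:
        raise ValueError("enrollment and test utterances share no digits")
    # Assumed combination rule: simple mean of the normalized digit scores.
    return mean(digit_scores)
```

In this sketch a test digit with no counterpart in the enrollment is skipped, which reflects the idea that random digit strings are scored only on shared content.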

Publication: IEICE TRANSACTIONS on Information Vol.E102-D No.2 pp.346-354

Publication Date: 2019/02/01

Publicized: 2018/11/19

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2018EDP7310

Type of Manuscript: PAPER

Category: Speech and Hearing

Authors

Shengyu YAO
  Chinese Academy of Sciences, University of Chinese Academy of Sciences
Ruohua ZHOU
  Chinese Academy of Sciences, University of Chinese Academy of Sciences
Pengyuan ZHANG
  Chinese Academy of Sciences, University of Chinese Academy of Sciences

Keywords

speaker verification, text-dependent, speaker-phonetic, random digit strings, i-vector, phonetically-constrained

Cite this

Shengyu YAO, Ruohua ZHOU, Pengyuan ZHANG, "Speaker-Phonetic I-Vector Modeling for Text-Dependent Speaker Verification with Random Digit Strings" in IEICE TRANSACTIONS on Information, vol. E102-D, no. 2, pp. 346-354, February 2019, doi: 10.1587/transinf.2018EDP7310.
Abstract: This paper proposes a speaker-phonetic i-vector modeling method for text-dependent speaker verification with random digit strings, in which the enrollment and test utterances do not share the same phrase. The core of the proposed method is the use of digit alignment information within the i-vector framework. By utilizing forced-alignment information, verification scores for the test trials can be computed as in the fixed-phrase situation, in which the speech segments compared between the enrollment and test utterances have the same phonetic content. Specifically, utterances are segmented into digits, and a unique phonetically-constrained i-vector extractor is then applied to obtain a speaker and channel variability representation for every digit segment. Probabilistic linear discriminant analysis (PLDA) and s-norm are subsequently used for channel compensation and score normalization, respectively. The final score is obtained by combining the digit scores, which are computed by scoring individual digit segments of the test utterance against the corresponding ones of the enrollment. Experimental results on Part 3 of the Robust Speaker Recognition (RSR2015) database demonstrate that the proposed approach significantly outperforms GMM-UBM by 52.3% and 53.5% relative in equal error rate (EER) for male and female speakers, respectively.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2018EDP7310/_p

@ARTICLE{e102-d_2_346,
author={Shengyu YAO and Ruohua ZHOU and Pengyuan ZHANG},
journal={IEICE TRANSACTIONS on Information},
title={Speaker-Phonetic I-Vector Modeling for Text-Dependent Speaker Verification with Random Digit Strings},
year={2019},
volume={E102-D},
number={2},
pages={346--354},
abstract={This paper proposes a speaker-phonetic i-vector modeling method for text-dependent speaker verification with random digit strings, in which the enrollment and test utterances do not share the same phrase. The core of the proposed method is the use of digit alignment information within the i-vector framework. By utilizing forced-alignment information, verification scores for the test trials can be computed as in the fixed-phrase situation, in which the speech segments compared between the enrollment and test utterances have the same phonetic content. Specifically, utterances are segmented into digits, and a unique phonetically-constrained i-vector extractor is then applied to obtain a speaker and channel variability representation for every digit segment. Probabilistic linear discriminant analysis (PLDA) and s-norm are subsequently used for channel compensation and score normalization, respectively. The final score is obtained by combining the digit scores, which are computed by scoring individual digit segments of the test utterance against the corresponding ones of the enrollment. Experimental results on Part 3 of the Robust Speaker Recognition (RSR2015) database demonstrate that the proposed approach significantly outperforms GMM-UBM by 52.3\% and 53.5\% relative in equal error rate (EER) for male and female speakers, respectively.},
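keywords={speaker verification, text-dependent, speaker-phonetic, random digit strings, i-vector, phonetically-constrained},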
doi={10.1587/transinf.2018EDP7310},
ISSN={1745-1361},
month={February},}

TY - JOUR
TI - Speaker-Phonetic I-Vector Modeling for Text-Dependent Speaker Verification with Random Digit Strings
T2 - IEICE TRANSACTIONS on Information
SP - 346
EP - 354
AU - Shengyu YAO
AU - Ruohua ZHOU
AU - Pengyuan ZHANG
PY - 2019
DO - 10.1587/transinf.2018EDP7310
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E102-D
IS - 2
JA - IEICE TRANSACTIONS on Information
Y1 - February 2019
AB - This paper proposes a speaker-phonetic i-vector modeling method for text-dependent speaker verification with random digit strings, in which the enrollment and test utterances do not share the same phrase. The core of the proposed method is the use of digit alignment information within the i-vector framework. By utilizing forced-alignment information, verification scores for the test trials can be computed as in the fixed-phrase situation, in which the speech segments compared between the enrollment and test utterances have the same phonetic content. Specifically, utterances are segmented into digits, and a unique phonetically-constrained i-vector extractor is then applied to obtain a speaker and channel variability representation for every digit segment. Probabilistic linear discriminant analysis (PLDA) and s-norm are subsequently used for channel compensation and score normalization, respectively. The final score is obtained by combining the digit scores, which are computed by scoring individual digit segments of the test utterance against the corresponding ones of the enrollment. Experimental results on Part 3 of the Robust Speaker Recognition (RSR2015) database demonstrate that the proposed approach significantly outperforms GMM-UBM by 52.3% and 53.5% relative in equal error rate (EER) for male and female speakers, respectively.
ER -