Assessment of On-Line Model Quality and Threshold Estimation in Speaker Verification

Javier R. SAETA; Javier HERNANDO

doi:10.1093/ietisy/e90-d.4.759

Summary :

Selecting the most representative utterances from a speaker is essential for automatic enrollment to perform well in speaker verification. Model quality measures and threshold estimation methods mainly have to cope with the scarcity of data and the difficulty of obtaining impostor data in real applications. Conventional methods estimate the quality of the training utterances only after the model has been created. In that case, it is not possible to ask the user for more utterances during the training session if necessary; a new training session must be started. This is especially impractical in applications where only one or two enrollment sessions are allowed. In this paper, a new on-line quality method based on a male and a female Universal Background Model (UBM) is introduced. The two models act as a reference for new utterances: they indicate whether the utterances belong to the same speaker and, at the same time, provide a measure of their quality. The estimation of the verification threshold is also strongly influenced by the previous selection of the speaker's utterances. In this context, potential outliers, i.e., client scores that lie far from the mean, can lead to incorrect estimates of the client mean and variance. To alleviate this problem, some efficient threshold estimation methods based on removing or weighting scores are proposed here. Before the threshold is estimated, the client scores catalogued as outliers are removed, pruned or weighted, which improves subsequent estimates. Text-dependent experiments have been carried out using a multi-session telephone database in Spanish. The database was recorded by the authors and contains 184 speakers.
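The idea of discarding outlying client scores before estimating the threshold can be illustrated with a minimal sketch. The summary does not give the authors' formulas, so everything specific here is an assumption made for illustration: the function names `prune_outliers` and `estimate_threshold`, the rule "drop scores farther than `k` standard deviations from the mean", the client-only threshold form "mean minus `alpha` standard deviations", and the example scores.

```python
from statistics import mean, pstdev


def prune_outliers(scores, k=1.5):
    """Drop scores farther than k standard deviations from the mean.

    Hypothetical pruning rule; the paper's actual criterion may differ.
    """
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        # All scores are identical, so there is nothing to prune.
        return list(scores)
    return [s for s in scores if abs(s - mu) <= k * sigma]


def estimate_threshold(scores, alpha=1.0):
    """Client-only threshold: mean minus alpha standard deviations.

    Assumed threshold form, chosen only to show the effect of pruning.
    """
    return mean(scores) - alpha * pstdev(scores)


# Invented client scores: five consistent utterances and one poor one.
client_scores = [2.1, 2.3, 1.9, 2.2, 2.0, -1.5]

kept = prune_outliers(client_scores)
raw_threshold = estimate_threshold(client_scores)
pruned_threshold = estimate_threshold(kept)

print("kept scores:", kept)
print("threshold without pruning:", round(raw_threshold, 3))
print("threshold with pruning:", round(pruned_threshold, 3))
```

In this toy example the single low score pulls the mean down and inflates the variance, so the unpruned threshold ends up far below the bulk of the client scores; removing it first gives a threshold close to the consistent utterances. A weighting variant would keep every score but reduce the influence of distant ones instead of discarding them.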

Publication: IEICE TRANSACTIONS on Information Vol.E90-D No.4 pp.759-765

Publication Date: 2007/04/01

Online ISSN: 1745-1361

DOI: 10.1093/ietisy/e90-d.4.759

Type of Manuscript: PAPER

Category: Speech and Hearing

Cite this

Javier R. SAETA, Javier HERNANDO, "Assessment of On-Line Model Quality and Threshold Estimation in Speaker Verification" in IEICE TRANSACTIONS on Information, vol. E90-D, no. 4, pp. 759-765, April 2007, doi: 10.1093/ietisy/e90-d.4.759.
URL: https://global.ieice.org/en_transactions/information/10.1093/ietisy/e90-d.4.759/_p

@ARTICLE{e90-d_4_759,
author={Javier R. SAETA and Javier HERNANDO},
journal={IEICE TRANSACTIONS on Information},
title={Assessment of On-Line Model Quality and Threshold Estimation in Speaker Verification},
year={2007},
volume={E90-D},
number={4},
pages={759-765},
keywords={},
doi={10.1093/ietisy/e90-d.4.759},
ISSN={1745-1361},
month={April},}

TY - JOUR
TI - Assessment of On-Line Model Quality and Threshold Estimation in Speaker Verification
T2 - IEICE TRANSACTIONS on Information
SP - 759
EP - 765
AU - Javier R. SAETA
AU - Javier HERNANDO
PY - 2007
DO - 10.1093/ietisy/e90-d.4.759
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E90-D
IS - 4
JA - IEICE TRANSACTIONS on Information
Y1 - April 2007
ER -