This paper investigates a new method for creating robust speaker models that cope with the inter-session variation of a speaker in a continuous HMM-based speaker verification system. The new method estimates session-independent parameters by decomposing inter-session variation into two distinct parts: session-dependent and session-independent. The parameters of the speaker models are estimated using the speaker adaptive training algorithm in conjunction with the equalization of session-dependent variation. The resulting models capture session-independent speaker characteristics more reliably than conventional models, and their discriminative power improves accordingly. Moreover, we make the models more invariant to handset variation in a public switched telephone network (PSTN) by treating session-dependent variation and handset-dependent distortion separately. Text-independent speech data recorded by 20 speakers in seven sessions over 16 months was used to evaluate the new approach. The proposed method reduces the error rate by a relative 15%. Compared with the popular cepstral mean normalization, it reduces the error rate by a relative 24% when the speaker models are recreated using speech data recorded in four or more sessions.
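The abstract describes the method only at a high level. As an illustration, the following is a minimal, hypothetical Python sketch of the two ideas it names: the cepstral mean normalization (CMN) baseline, and the decomposition of a speaker's features into a session-independent part and session-dependent parts that are equalized away before the speaker model is estimated. The sketch assumes a single Gaussian as a stand-in for the continuous HMM and bias-only session transforms; the paper's actual speaker-adaptive-training procedure operates on full HMM parameters and is not reproduced here, and all function names and data below are illustrative assumptions.

```python
import numpy as np

def cepstral_mean_normalization(feats):
    """CMN baseline: subtract the per-utterance cepstral mean to cancel
    stationary channel effects (the method the paper compares against)."""
    return feats - feats.mean(axis=0, keepdims=True)

def decompose_sessions(sessions):
    """Toy session-dependent/-independent decomposition.

    sessions: list of (T_i, D) arrays of cepstral features, one per
    recording session of the same speaker.  A per-session bias plays the
    role of the session-dependent transform; equalizing it away before
    pooling yields the session-independent model parameters.
    """
    mu = np.vstack(sessions).mean(axis=0)            # initial shared mean
    # Session-dependent part: bias mapping the shared mean onto each session
    biases = [s.mean(axis=0) - mu for s in sessions]
    # Equalization: remove each session's bias before pooling the features
    equalized = np.vstack([s - b for s, b in zip(sessions, biases)])
    # Session-independent speaker model (single Gaussian stand-in for an HMM)
    return equalized.mean(axis=0), equalized.var(axis=0), biases

# Synthetic usage: three sessions of 12-dimensional cepstra with
# session-specific offsets standing in for channel/handset variation.
rng = np.random.default_rng(0)
sessions = [rng.normal(loc=rng.normal(size=12), size=(200, 12))
            for _ in range(3)]
mu_si, var_si, biases = decompose_sessions(sessions)
```

In the actual method, the session-dependent component would be estimated jointly with the HMM parameters inside speaker adaptive training, rather than in the single closed-form step shown here.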
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Tomoko MATSUI, Kiyoaki AIKAWA, "Robust Model for Speaker Verification against Session-Dependent Utterance Variation" in IEICE TRANSACTIONS on Information, vol. E86-D, no. 4, pp. 712-718, April 2003.
URL: https://global.ieice.org/en_transactions/information/10.1587/e86-d_4_712/_p
@ARTICLE{e86-d_4_712,
author={Tomoko MATSUI and Kiyoaki AIKAWA},
journal={IEICE TRANSACTIONS on Information},
title={Robust Model for Speaker Verification against Session-Dependent Utterance Variation},
year={2003},
volume={E86-D},
number={4},
pages={712-718},
month={April}
}
TY - JOUR
TI - Robust Model for Speaker Verification against Session-Dependent Utterance Variation
T2 - IEICE TRANSACTIONS on Information
SP - 712
EP - 718
AU - Tomoko MATSUI
AU - Kiyoaki AIKAWA
PY - 2003
JO - IEICE TRANSACTIONS on Information
VL - E86-D
IS - 4
JA - IEICE TRANSACTIONS on Information
Y1 - April 2003
ER -