An Unsupervised Speaker Adaptation Method for Lecture-Style Spontaneous Speech Recognition Using Multiple Recognition Systems

Seiichi NAKAGAWA; Tomohiro WATANABE; Hiromitsu NISHIZAKI; Takehito UTSURO

doi:10.1093/ietisy/e88-d.3.463

Summary:

This paper describes an accurate unsupervised speaker adaptation method for lecture-style spontaneous speech recognition using multiple LVCSR systems. In an unsupervised speaker adaptation framework, the improvement in recognition performance obtained by adapting acoustic models depends strongly on the accuracy of labels such as phonemes and syllables. Extracting adaptation data guided by a confidence measure is therefore effective for unsupervised adaptation. In this paper, we identified high-confidence portions based on the agreement between two LVCSR systems, adapted the acoustic models using these portions together with their highly accurate labels, and thereby improved the recognition accuracy. We applied our method to the Corpus of Spontaneous Japanese (CSJ), and it improved the recognition rate by about 2.1% in comparison with a traditional method.
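
The agreement-based selection mentioned in the summary can be illustrated with a small sketch. The summary does not say how the two system outputs are aligned or which agreement criterion the authors used, so everything here is an assumption made for illustration: the function name `agreed_spans`, the `min_len` threshold, the toy syllable sequences, and the use of Python's `difflib` for alignment are not taken from the paper.

```python
from difflib import SequenceMatcher


def agreed_spans(hyp_a, hyp_b, min_len=2):
    """Return runs of labels on which two recognizer outputs agree.

    hyp_a, hyp_b: label sequences (e.g. syllables) from two systems.
    min_len: hypothetical threshold; shorter agreeing runs are dropped.
    Each result is (start index in hyp_a, list of agreed labels).
    """
    # autojunk=False keeps difflib from discarding frequent labels.
    matcher = SequenceMatcher(a=hyp_a, b=hyp_b, autojunk=False)
    spans = []
    for block in matcher.get_matching_blocks():
        # The final block is a zero-length sentinel and is skipped here.
        if block.size >= min_len:
            spans.append((block.a, hyp_a[block.a:block.a + block.size]))
    return spans


# Toy outputs from two hypothetical systems that differ in one label.
hyp_a = ["ko", "re", "wa", "o", "n", "se", "i"]
hyp_b = ["ko", "re", "ga", "o", "n", "se", "i"]
print(agreed_spans(hyp_a, hyp_b))
```

In a setting like the one the summary describes, only the agreed spans and their labels would be passed on as adaptation data, and the disputed label would be left out.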

Publication: IEICE TRANSACTIONS on Information Vol.E88-D No.3 pp.463-471

Publication Date: 2005/03/01

DOI: 10.1093/ietisy/e88-d.3.463

Type of Manuscript: Special Section PAPER (Special Section on Corpus-Based Speech Technologies)

Category: Spoken Language Systems

Seiichi NAKAGAWA, Tomohiro WATANABE, Hiromitsu NISHIZAKI, Takehito UTSURO, "An Unsupervised Speaker Adaptation Method for Lecture-Style Spontaneous Speech Recognition Using Multiple Recognition Systems" in IEICE TRANSACTIONS on Information, vol. E88-D, no. 3, pp. 463-471, March 2005, doi: 10.1093/ietisy/e88-d.3.463.
URL: https://global.ieice.org/en_transactions/information/10.1093/ietisy/e88-d.3.463/_p

@ARTICLE{e88-d_3_463,
author={Seiichi NAKAGAWA and Tomohiro WATANABE and Hiromitsu NISHIZAKI and Takehito UTSURO},
journal={IEICE TRANSACTIONS on Information},
title={An Unsupervised Speaker Adaptation Method for Lecture-Style Spontaneous Speech Recognition Using Multiple Recognition Systems},
year={2005},
volume={E88-D},
number={3},
pages={463--471},
abstract={This paper describes an accurate unsupervised speaker adaptation method for lecture-style spontaneous speech recognition using multiple LVCSR systems. In an unsupervised speaker adaptation framework, the improvement in recognition performance obtained by adapting acoustic models depends strongly on the accuracy of labels such as phonemes and syllables. Extracting adaptation data guided by a confidence measure is therefore effective for unsupervised adaptation. In this paper, we identified high-confidence portions based on the agreement between two LVCSR systems, adapted the acoustic models using these portions together with their highly accurate labels, and thereby improved the recognition accuracy. We applied our method to the Corpus of Spontaneous Japanese (CSJ), and it improved the recognition rate by about 2.1\% in comparison with a traditional method.},
doi={10.1093/ietisy/e88-d.3.463},
month={March},
}

TY  - JOUR
TI  - An Unsupervised Speaker Adaptation Method for Lecture-Style Spontaneous Speech Recognition Using Multiple Recognition Systems
T2  - IEICE TRANSACTIONS on Information
SP  - 463
EP  - 471
AU  - Seiichi NAKAGAWA
AU  - Tomohiro WATANABE
AU  - Hiromitsu NISHIZAKI
AU  - Takehito UTSURO
PY  - 2005
DO  - 10.1093/ietisy/e88-d.3.463
JO  - IEICE TRANSACTIONS on Information
VL  - E88-D
IS  - 3
JA  - IEICE TRANSACTIONS on Information
Y1  - 2005/03/01
AB  - This paper describes an accurate unsupervised speaker adaptation method for lecture-style spontaneous speech recognition using multiple LVCSR systems. In an unsupervised speaker adaptation framework, the improvement in recognition performance obtained by adapting acoustic models depends strongly on the accuracy of labels such as phonemes and syllables. Extracting adaptation data guided by a confidence measure is therefore effective for unsupervised adaptation. In this paper, we identified high-confidence portions based on the agreement between two LVCSR systems, adapted the acoustic models using these portions together with their highly accurate labels, and thereby improved the recognition accuracy. We applied our method to the Corpus of Spontaneous Japanese (CSJ), and it improved the recognition rate by about 2.1% in comparison with a traditional method.
ER  - 