A Study on Acoustic Modeling of Pauses for Recognizing Noisy Conversational Speech

Jin-Song ZHANG; Konstantin MARKOV; Tomoko MATSUI; Satoshi NAKAMURA

IEICE TRANSACTIONS on Information

Summary:

This paper presents a study on modeling inter-word pauses to improve the robustness of acoustic models for recognizing noisy conversational speech. When precise contextual modeling is used for pauses, their frequent occurrence and varying acoustics in noisy conversational speech make it difficult to automatically generate an accurate phonetic transcription of the training data for developing robust acoustic models. This paper proposes exploiting reliable phonetic heuristics of pauses in speech to aid the detection of varying pauses. Based on this proposal, a stepwise approach to optimizing pause HMMs was applied to the data of the DARPA SPINE2 project, and a more accurate phonetic transcription was achieved. The cross-word triphone HMMs developed using this method achieved an absolute 9.2% word error reduction compared to the conventional method with only context-free modeling of pauses. For the same pause modeling method, the use of the optimized phonetic segmentation brought about an absolute 5.2% improvement.

Publication: IEICE TRANSACTIONS on Information Vol.E86-D No.3 pp.489-496

Publication Date: 2003/03/01

Type of Manuscript: Special Section PAPER (Special Issue on Speech Information Processing)

Category: Robust Speech Recognition and Enhancement

Jin-Song ZHANG, Konstantin MARKOV, Tomoko MATSUI, Satoshi NAKAMURA, "A Study on Acoustic Modeling of Pauses for Recognizing Noisy Conversational Speech," in IEICE TRANSACTIONS on Information, vol. E86-D, no. 3, pp. 489-496, March 2003.
URL: https://global.ieice.org/en_transactions/information/10.1587/e86-d_3_489/_p

@ARTICLE{e86-d_3_489,
  author={Jin-Song ZHANG and Konstantin MARKOV and Tomoko MATSUI and Satoshi NAKAMURA},
  journal={IEICE TRANSACTIONS on Information},
  title={A Study on Acoustic Modeling of Pauses for Recognizing Noisy Conversational Speech},
  year={2003},
  volume={E86-D},
  number={3},
  pages={489--496},
  abstract={This paper presents a study on modeling inter-word pauses to improve the robustness of acoustic models for recognizing noisy conversational speech. When precise contextual modeling is used for pauses, their frequent occurrence and varying acoustics in noisy conversational speech make it difficult to automatically generate an accurate phonetic transcription of the training data for developing robust acoustic models. This paper proposes exploiting reliable phonetic heuristics of pauses in speech to aid the detection of varying pauses. Based on this proposal, a stepwise approach to optimizing pause HMMs was applied to the data of the DARPA SPINE2 project, and a more accurate phonetic transcription was achieved. The cross-word triphone HMMs developed using this method achieved an absolute 9.2\% word error reduction compared to the conventional method with only context-free modeling of pauses. For the same pause modeling method, the use of the optimized phonetic segmentation brought about an absolute 5.2\% improvement.},
  month={March}
}

TY  - JOUR
TI  - A Study on Acoustic Modeling of Pauses for Recognizing Noisy Conversational Speech
T2  - IEICE TRANSACTIONS on Information
SP  - 489
EP  - 496
AU  - Jin-Song ZHANG
AU  - Konstantin MARKOV
AU  - Tomoko MATSUI
AU  - Satoshi NAKAMURA
PY  - 2003
JO  - IEICE TRANSACTIONS on Information
VL  - E86-D
IS  - 3
JA  - IEICE TRANSACTIONS on Information
Y1  - 2003/03/01
AB  - This paper presents a study on modeling inter-word pauses to improve the robustness of acoustic models for recognizing noisy conversational speech. When precise contextual modeling is used for pauses, their frequent occurrence and varying acoustics in noisy conversational speech make it difficult to automatically generate an accurate phonetic transcription of the training data for developing robust acoustic models. This paper proposes exploiting reliable phonetic heuristics of pauses in speech to aid the detection of varying pauses. Based on this proposal, a stepwise approach to optimizing pause HMMs was applied to the data of the DARPA SPINE2 project, and a more accurate phonetic transcription was achieved. The cross-word triphone HMMs developed using this method achieved an absolute 9.2% word error reduction compared to the conventional method with only context-free modeling of pauses. For the same pause modeling method, the use of the optimized phonetic segmentation brought about an absolute 5.2% improvement.
ER  - 
