Sequence-Based Pronunciation Variation Modeling for Spontaneous ASR Using a Noisy Channel Approach

Hansjorg HOFMANN; Sakriani SAKTI; Chiori HORI; Hideki KASHIOKA; Satoshi NAKAMURA; Wolfgang MINKER

doi:10.1587/transinf.E95.D.2084

IEICE TRANSACTIONS on Information

Summary:

The performance of English automatic speech recognition (ASR) systems degrades on spontaneous speech, mainly because utterances contain multiple pronunciation variants. Previous approaches address this problem by modeling pronunciation alteration at the phoneme-to-phoneme level. However, the phonetic transformation effects induced by the pronunciation of the whole sentence have not yet been considered. In this article, sequence-based pronunciation variation is modeled using a noisy channel approach, in which the spontaneous phoneme sequence is treated as a “noisy” string and the goal is to recover the “clean” string of the word sequence. In this way, the whole word sequence and its effect on the alteration of the phonemes are taken into account. Moreover, the system learns not only the phoneme transformation but also the mapping from phonemes to words directly. In this study, the phonemes are first recognized with the existing recognition system, and the pronunciation variation model based on the noisy channel approach then maps from the phoneme level to the word level. Two well-known natural language processing approaches are adopted and derived from noisy channel model theory: joint-sequence models and statistical machine translation. Both are applied, and various experiments are conducted on microphone and telephone recordings of spontaneous speech.
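
The noisy channel idea in the summary can be illustrated with a minimal sketch. It assumes the standard noisy channel decomposition, in which the recovered word sequence is the one that maximizes P(phonemes | words) · P(words). The probability tables, the example words, and the function name `decode` below are hypothetical and invented purely for illustration; they are not the authors' joint-sequence or statistical machine translation models, and they are not taken from the paper.

```python
import math

# Hypothetical toy tables, invented for illustration only (not from the paper).
# PRIOR[w] plays the role of the language model P(W).
PRIOR = {"going to": 0.6, "gonna": 0.1, "gone": 0.3}

# CHANNEL[w][p] plays the role of the channel model P(phonemes | W):
# how likely a word sequence is to surface as a given spontaneous
# phoneme string.
CHANNEL = {
    "going to": {"g ow ih ng t uw": 0.5, "g ah n ah": 0.5},
    "gonna": {"g ah n ah": 1.0},
    "gone": {"g ao n": 1.0},
}


def decode(phonemes):
    """Return the word sequence W maximizing P(phonemes | W) * P(W).

    Returns None if no candidate can produce the phoneme string.
    """
    best, best_score = None, -math.inf
    for words, prior in PRIOR.items():
        likelihood = CHANNEL[words].get(phonemes, 0.0)
        if likelihood == 0.0:
            continue  # this candidate cannot explain the observation
        # Work in log space so products become sums.
        score = math.log(likelihood) + math.log(prior)
        if score > best_score:
            best, best_score = words, score
    return best


if __name__ == "__main__":
    # The reduced form "g ah n ah" is mapped back to the "clean" words.
    print(decode("g ah n ah"))
```

In this toy setting the reduced pronunciation "g ah n ah" is explained by both "going to" and "gonna", and the prior decides between them. A real system would search over whole word sequences rather than a fixed candidate list.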

Publication: IEICE TRANSACTIONS on Information Vol.E95-D No.8 pp.2084-2093

Publication Date: 2012/08/01

Online ISSN: 1745-1361

DOI: 10.1587/transinf.E95.D.2084

Type of Manuscript: PAPER

Category: Speech and Hearing

Hansjorg HOFMANN, Sakriani SAKTI, Chiori HORI, Hideki KASHIOKA, Satoshi NAKAMURA, Wolfgang MINKER, "Sequence-Based Pronunciation Variation Modeling for Spontaneous ASR Using a Noisy Channel Approach" in IEICE TRANSACTIONS on Information, vol. E95-D, no. 8, pp. 2084-2093, August 2012, doi: 10.1587/transinf.E95.D.2084.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E95.D.2084/_p

@ARTICLE{e95-d_8_2084,
author={Hansjorg HOFMANN and Sakriani SAKTI and Chiori HORI and Hideki KASHIOKA and Satoshi NAKAMURA and Wolfgang MINKER},
journal={IEICE TRANSACTIONS on Information},
title={Sequence-Based Pronunciation Variation Modeling for Spontaneous ASR Using a Noisy Channel Approach},
year={2012},
volume={E95-D},
number={8},
pages={2084-2093},
keywords={},
doi={10.1587/transinf.E95.D.2084},
ISSN={1745-1361},
month={August},}

TY - JOUR
TI - Sequence-Based Pronunciation Variation Modeling for Spontaneous ASR Using a Noisy Channel Approach
T2 - IEICE TRANSACTIONS on Information
SP - 2084
EP - 2093
AU - Hansjorg HOFMANN
AU - Sakriani SAKTI
AU - Chiori HORI
AU - Hideki KASHIOKA
AU - Satoshi NAKAMURA
AU - Wolfgang MINKER
PY - 2012
DO - 10.1587/transinf.E95.D.2084
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E95-D
IS - 8
JA - IEICE TRANSACTIONS on Information
Y1 - August 2012
ER -
