This paper proposes a new method for compensating acoustic scores in the Viterbi search for robust speech recognition. The method introduces noise models to represent a wide variety of noises and realizes robust decoding together with conventional techniques of subtraction and adaptation. The likelihoods of the noise models are used in two ways. One is to calculate a confidence factor for each input frame by comparing the likelihoods of the speech models and the noise models; the weight of the acoustic score for a noisy frame is then reduced according to the value of the confidence factor. The other is to use the likelihood of a noise model in place of that of a silence model when the input is noisy. Since a lower confidence factor compresses acoustic scores, the decoder relies more on language scores and keeps more hypotheses within a fixed search depth for a noisy frame. An experiment using commentary transcriptions of a broadcast sports program (MLB: Major League Baseball) showed that the proposed method obtained a 6.7% relative word error reduction. The method also reduced the relative error rate of keywords by 17.9%, which is expected to lead to an improvement in metadata extraction accuracy.
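The two uses of the noise-model likelihood described above can be sketched as follows. This is a minimal illustration of the general idea only, assuming a posterior-style ratio for the confidence factor and a multiplicative down-weighting of the acoustic log-score; the paper's exact formulas may differ, and the function names are hypothetical.

```python
import math

def frame_confidence(speech_loglik, noise_loglik):
    # Hypothetical confidence factor for one frame: compare the
    # speech-model and noise-model likelihoods. Returns a value in
    # (0, 1); 0.5 means the two models explain the frame equally well.
    ls = math.exp(speech_loglik)
    ln = math.exp(noise_loglik)
    return ls / (ls + ln)

def compensated_acoustic_score(acoustic_logscore, confidence):
    # Down-weight the (negative) acoustic log-score by the confidence
    # factor. A low confidence compresses the score toward zero, so the
    # decoder leans more on the language score for that frame and keeps
    # more hypotheses within a fixed search depth.
    return confidence * acoustic_logscore
```

With this sketch, a clean frame (speech likelihood well above noise likelihood) passes its acoustic score through nearly unchanged, while a noisy frame has its score compressed, which is the effect the abstract describes.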
Shoei SATO, Kazuo ONOE, Akio KOBAYASHI, Toru IMAI, "Robust Speech Recognition by Using Compensated Acoustic Scores" in IEICE Transactions on Information and Systems,
vol. E89-D, no. 3, pp. 915-921, March 2006, doi: 10.1093/ietisy/e89-d.3.915.
Abstract: This paper proposes a new method for compensating acoustic scores in the Viterbi search for robust speech recognition. The method introduces noise models to represent a wide variety of noises and realizes robust decoding together with conventional techniques of subtraction and adaptation. The likelihoods of the noise models are used in two ways. One is to calculate a confidence factor for each input frame by comparing the likelihoods of the speech models and the noise models; the weight of the acoustic score for a noisy frame is then reduced according to the value of the confidence factor. The other is to use the likelihood of a noise model in place of that of a silence model when the input is noisy. Since a lower confidence factor compresses acoustic scores, the decoder relies more on language scores and keeps more hypotheses within a fixed search depth for a noisy frame. An experiment using commentary transcriptions of a broadcast sports program (MLB: Major League Baseball) showed that the proposed method obtained a 6.7% relative word error reduction. The method also reduced the relative error rate of keywords by 17.9%, which is expected to lead to an improvement in metadata extraction accuracy.
URL: https://global.ieice.org/en_transactions/information/10.1093/ietisy/e89-d.3.915/_p
@ARTICLE{e89-d_3_915,
author={Shoei SATO and Kazuo ONOE and Akio KOBAYASHI and Toru IMAI},
journal={IEICE Transactions on Information and Systems},
title={Robust Speech Recognition by Using Compensated Acoustic Scores},
year={2006},
volume={E89-D},
number={3},
pages={915--921},
abstract={This paper proposes a new method for compensating acoustic scores in the Viterbi search for robust speech recognition. The method introduces noise models to represent a wide variety of noises and realizes robust decoding together with conventional techniques of subtraction and adaptation. The likelihoods of the noise models are used in two ways. One is to calculate a confidence factor for each input frame by comparing the likelihoods of the speech models and the noise models; the weight of the acoustic score for a noisy frame is then reduced according to the value of the confidence factor. The other is to use the likelihood of a noise model in place of that of a silence model when the input is noisy. Since a lower confidence factor compresses acoustic scores, the decoder relies more on language scores and keeps more hypotheses within a fixed search depth for a noisy frame. An experiment using commentary transcriptions of a broadcast sports program (MLB: Major League Baseball) showed that the proposed method obtained a 6.7% relative word error reduction. The method also reduced the relative error rate of keywords by 17.9%, which is expected to lead to an improvement in metadata extraction accuracy.},
keywords={},
doi={10.1093/ietisy/e89-d.3.915},
ISSN={1745-1361},
month={March}
}
TY - JOUR
TI - Robust Speech Recognition by Using Compensated Acoustic Scores
T2 - IEICE Transactions on Information and Systems
SP - 915
EP - 921
AU - Shoei SATO
AU - Kazuo ONOE
AU - Akio KOBAYASHI
AU - Toru IMAI
PY - 2006
DO - 10.1093/ietisy/e89-d.3.915
JO - IEICE Transactions on Information and Systems
SN - 1745-1361
VL - E89-D
IS - 3
JA - IEICE Transactions on Information and Systems
Y1 - March 2006
AB - This paper proposes a new method for compensating acoustic scores in the Viterbi search for robust speech recognition. The method introduces noise models to represent a wide variety of noises and realizes robust decoding together with conventional techniques of subtraction and adaptation. The likelihoods of the noise models are used in two ways. One is to calculate a confidence factor for each input frame by comparing the likelihoods of the speech models and the noise models; the weight of the acoustic score for a noisy frame is then reduced according to the value of the confidence factor. The other is to use the likelihood of a noise model in place of that of a silence model when the input is noisy. Since a lower confidence factor compresses acoustic scores, the decoder relies more on language scores and keeps more hypotheses within a fixed search depth for a noisy frame. An experiment using commentary transcriptions of a broadcast sports program (MLB: Major League Baseball) showed that the proposed method obtained a 6.7% relative word error reduction. The method also reduced the relative error rate of keywords by 17.9%, which is expected to lead to an improvement in metadata extraction accuracy.
ER -