Utterance Verification Using State-Level Log-Likelihood Ratio with Frame and State Selection

Suk-Bong KWON; Hoirin KIM

doi:10.1587/transinf.E93.D.647

Summary:

This paper proposes an utterance verification system that uses state-level log-likelihood ratios with frame and state selection. Hidden Markov models serve as both the acoustic models and the anti-phone models for speech recognition and utterance verification. Each hidden Markov model has three states, and each state represents different characteristics of a phone. We therefore propose an algorithm that computes a state-level log-likelihood ratio and weights the states to obtain a more reliable confidence measure for recognized phones. We also propose a frame selection algorithm that computes the confidence measure only on frames of the input that contain proper speech. In general, the phone segmentation obtained from a speaker-independent speech recognition system is inaccurate, because triphone-based acoustic models are difficult to train well enough to cover diverse pronunciations and coarticulation effects. Finding correctly matched states from the state segmentation is harder still, so a state selection algorithm is proposed to find valid states. The proposed method achieves an 18.1% relative reduction in equal error rate compared with a baseline system that uses simple phone-level log-likelihood ratios.
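The summary does not give the formulas, so the following is only a minimal sketch of how a weighted state-level log-likelihood ratio with frame and state selection might be organized. All names (`select_frames`, `state_llr`, `phone_confidence`), the speech mask, the per-state weights, and the `min_frames` rule are hypothetical illustrations, not the authors' method.

```python
from typing import List, Sequence, Tuple

Frames = Sequence[float]  # per-frame log-likelihoods for one HMM state


def select_frames(target_ll: Frames, anti_ll: Frames,
                  is_speech: Sequence[bool]) -> Tuple[List[float], List[float]]:
    """Keep only frames flagged as speech (assumed frame-selection rule)."""
    kept = [(t, a) for t, a, keep in zip(target_ll, anti_ll, is_speech) if keep]
    return [t for t, _ in kept], [a for _, a in kept]


def state_llr(target_ll: Frames, anti_ll: Frames) -> float:
    """Mean per-frame log-likelihood ratio: target model minus anti-phone model."""
    if not target_ll:
        raise ValueError("state has no frames")
    return sum(t - a for t, a in zip(target_ll, anti_ll)) / len(target_ll)


def phone_confidence(states: Sequence[Tuple[Frames, Frames]],
                     weights: Sequence[float],
                     min_frames: int = 2) -> float:
    """Weighted average of state-level LLRs over the states judged valid.

    A state is treated as valid when it has at least `min_frames` frames;
    this threshold is an assumed stand-in for the paper's state selection.
    """
    num = 0.0
    den = 0.0
    for (target_ll, anti_ll), w in zip(states, weights):
        if len(target_ll) < min_frames:
            continue  # skip states whose segmentation looks unreliable
        num += w * state_llr(target_ll, anti_ll)
        den += w
    return num / den if den > 0 else 0.0
```

A phone would then be accepted when `phone_confidence(...)` exceeds a decision threshold; sweeping that threshold is what yields the equal error rate used for evaluation.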

Publication: IEICE TRANSACTIONS on Information Vol.E93-D No.3 pp.647-650

Publication Date: 2010/03/01

Online ISSN: 1745-1361

DOI: 10.1587/transinf.E93.D.647

Type of Manuscript: LETTER

Category: Speech and Hearing

Cite this

Suk-Bong KWON, Hoirin KIM, "Utterance Verification Using State-Level Log-Likelihood Ratio with Frame and State Selection" in IEICE TRANSACTIONS on Information, vol. E93-D, no. 3, pp. 647-650, March 2010, doi: 10.1587/transinf.E93.D.647.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E93.D.647/_p

@ARTICLE{e93-d_3_647,
author={Suk-Bong KWON and Hoirin KIM},
journal={IEICE TRANSACTIONS on Information},
title={Utterance Verification Using State-Level Log-Likelihood Ratio with Frame and State Selection},
year={2010},
volume={E93-D},
number={3},
pages={647--650},
doi={10.1587/transinf.E93.D.647},
ISSN={1745-1361},
month={March},
}

TY - JOUR
TI - Utterance Verification Using State-Level Log-Likelihood Ratio with Frame and State Selection
T2 - IEICE TRANSACTIONS on Information
SP - 647
EP - 650
AU - Suk-Bong KWON
AU - Hoirin KIM
PY - 2010
DO - 10.1587/transinf.E93.D.647
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E93-D
IS - 3
JA - IEICE TRANSACTIONS on Information
Y1 - March 2010
ER -