This paper proposes an utterance verification system that uses a state-level log-likelihood ratio with frame and state selection. We use hidden Markov models as the acoustic models and anti-phone models for speech recognition and utterance verification. Each hidden Markov model has three states, and each state represents different characteristics of a phone. We therefore propose an algorithm that computes a state-level log-likelihood ratio and weights the states to obtain a more reliable confidence measure for recognized phones. Additionally, we propose a frame selection algorithm that computes the confidence measure only on frames of the input that contain proper speech. In general, the phone segmentation information obtained from a speaker-independent speech recognition system is not accurate, because triphone-based acoustic models are difficult to train effectively to cover diverse pronunciations and coarticulation effects. It is therefore even more difficult to find the correctly matched states when obtaining state segmentation information, so a state selection algorithm is proposed for finding valid states. The proposed method, using the state-level log-likelihood ratio with frame and state selection, achieves an 18.1% relative reduction in equal error rate compared to a baseline system using simple phone-level log-likelihood ratios.
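The abstract does not give the formulas, so the following is only a minimal sketch of how a weighted state-level log-likelihood ratio with frame and state selection might be computed. Everything in it is an assumption made for illustration rather than the authors' method: the `Frame` fields, the `is_speech` flag standing in for frame selection, the rule that a state with no selected frames is invalid, and the weight normalization are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Frame:
    target_ll: float  # log-likelihood under the recognized phone's HMM state
    anti_ll: float    # log-likelihood under the anti-phone model
    is_speech: bool   # hypothetical frame-selection decision


def state_llr(frames: Sequence[Frame]) -> Optional[float]:
    """Mean log-likelihood ratio over the selected frames of one state.

    Returns None when no frame survives frame selection, which this
    sketch treats as an invalid state (assumed state-selection rule).
    """
    selected = [f for f in frames if f.is_speech]
    if not selected:
        return None
    return sum(f.target_ll - f.anti_ll for f in selected) / len(selected)


def phone_confidence(
    states: Sequence[Sequence[Frame]], weights: Sequence[float]
) -> Optional[float]:
    """Weighted average of state-level LLRs over the valid states only."""
    if len(states) != len(weights):
        raise ValueError("one weight is required per state")
    weighted_sum = 0.0
    weight_total = 0.0
    for frames, weight in zip(states, weights):
        llr = state_llr(frames)
        if llr is None:
            continue  # state selection: skip states with no usable frames
        weighted_sum += weight * llr
        weight_total += weight
    if weight_total <= 0.0:
        return None  # no valid state, so no confidence can be computed
    return weighted_sum / weight_total


# A three-state phone; the third state has no speech frames and is skipped.
phone = [
    [Frame(-1.0, -3.0, True), Frame(-2.0, -2.0, False)],
    [Frame(-1.0, -2.0, True), Frame(-1.0, -4.0, True)],
    [Frame(-5.0, -1.0, False)],
]
confidence = phone_confidence(phone, weights=[0.25, 0.5, 0.25])
```

A phone would then be accepted or rejected by comparing `confidence` against a threshold. Sweeping that threshold until the false-acceptance and false-rejection rates coincide gives the equal error rate used for the comparison above.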

- Publication: IEICE TRANSACTIONS on Information Vol.E93-D No.3 pp.647-650
- Publication Date: 2010/03/01
- Online ISSN: 1745-1361
- DOI: 10.1587/transinf.E93.D.647
- Type of Manuscript: LETTER
- Category: Speech and Hearing

The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.

Suk-Bong KWON, Hoirin KIM, "Utterance Verification Using State-Level Log-Likelihood Ratio with Frame and State Selection" in IEICE TRANSACTIONS on Information,
vol. E93-D, no. 3, pp. 647-650, March 2010, doi: 10.1587/transinf.E93.D.647.

URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E93.D.647/_p

@ARTICLE{e93-d_3_647,
author={Suk-Bong KWON and Hoirin KIM},
journal={IEICE TRANSACTIONS on Information},
title={Utterance Verification Using State-Level Log-Likelihood Ratio with Frame and State Selection},
year={2010},
volume={E93-D},
number={3},
pages={647--650},
abstract={This paper proposes an utterance verification system that uses a state-level log-likelihood ratio with frame and state selection. We use hidden Markov models as the acoustic models and anti-phone models for speech recognition and utterance verification. Each hidden Markov model has three states, and each state represents different characteristics of a phone. We therefore propose an algorithm that computes a state-level log-likelihood ratio and weights the states to obtain a more reliable confidence measure for recognized phones. Additionally, we propose a frame selection algorithm that computes the confidence measure only on frames of the input that contain proper speech. In general, the phone segmentation information obtained from a speaker-independent speech recognition system is not accurate, because triphone-based acoustic models are difficult to train effectively to cover diverse pronunciations and coarticulation effects. It is therefore even more difficult to find the correctly matched states when obtaining state segmentation information, so a state selection algorithm is proposed for finding valid states. The proposed method, using the state-level log-likelihood ratio with frame and state selection, achieves an 18.1\% relative reduction in equal error rate compared to a baseline system using simple phone-level log-likelihood ratios.},
keywords={},
doi={10.1587/transinf.E93.D.647},
ISSN={1745-1361},
month={March},
}

TY  - JOUR
TI  - Utterance Verification Using State-Level Log-Likelihood Ratio with Frame and State Selection
T2  - IEICE TRANSACTIONS on Information
SP  - 647
EP  - 650
AU  - Suk-Bong KWON
AU  - Hoirin KIM
PY  - 2010
DO  - 10.1587/transinf.E93.D.647
JO  - IEICE TRANSACTIONS on Information
SN  - 1745-1361
VL  - E93-D
IS  - 3
JA  - IEICE TRANSACTIONS on Information
Y1  - March 2010
AB  - This paper proposes an utterance verification system that uses a state-level log-likelihood ratio with frame and state selection. We use hidden Markov models as the acoustic models and anti-phone models for speech recognition and utterance verification. Each hidden Markov model has three states, and each state represents different characteristics of a phone. We therefore propose an algorithm that computes a state-level log-likelihood ratio and weights the states to obtain a more reliable confidence measure for recognized phones. Additionally, we propose a frame selection algorithm that computes the confidence measure only on frames of the input that contain proper speech. In general, the phone segmentation information obtained from a speaker-independent speech recognition system is not accurate, because triphone-based acoustic models are difficult to train effectively to cover diverse pronunciations and coarticulation effects. It is therefore even more difficult to find the correctly matched states when obtaining state segmentation information, so a state selection algorithm is proposed for finding valid states. The proposed method, using the state-level log-likelihood ratio with frame and state selection, achieves an 18.1% relative reduction in equal error rate compared to a baseline system using simple phone-level log-likelihood ratios.
ER  - 