
Keyword Search Result

[Keyword] confidence measure (8 hits)

Results 1-8 of 8
  • Confidence Measure Based on Context Consistency Using Word Occurrence Probability and Topic Adaptation for Spoken Term Detection

    Haiyang LI  Tieran ZHENG  Guibin ZHENG  Jiqing HAN  

     
    PAPER-Speech and Hearing

    Vol: E97-D No:3  Page(s): 554-561

    In this paper, we propose a novel confidence measure to improve the performance of spoken term detection (STD). The proposed confidence measure is based on the context consistency between a hypothesized word and its context in a word lattice. The main contribution of this paper is to compute the context consistency while accounting for both the uncertainty in the speech recognition results and the effect of topic. To measure the uncertainty of the context, we employ the word occurrence probability, obtained by combining the overlapping hypotheses in a word posterior lattice. To handle the effect of topic, we propose a topic adaptation method that first classifies the spoken document by topic and then computes the context consistency of the hypothesized word with a topic-specific measure of semantic similarity. We apply the topic-specific similarity measure in two ways: one uses only the top-1 topic from topic classification, and the other uses a mixture of all topics. Experiments conducted on the Hub-4NE Mandarin database show that both the occurrence probability of the context words and the topic adaptation are effective for the confidence measure of STD. The proposed confidence measure outperforms variants that ignore the uncertainty of the context or that use a non-topic method.
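    As a rough illustration of the context-consistency idea above: a hypothesized word's confidence is an average of its semantic similarity to each context word, with each context word weighted by its occurrence probability from the lattice. All names, similarity values, and probabilities below are hypothetical stand-ins, not the paper's actual models.

    ```python
    def context_consistency(hyp_word, context, occurrence_prob, similarity):
        """Average similarity between a hypothesized word and its context words,
        each context word weighted by its occurrence probability in the lattice."""
        total = sum(occurrence_prob[w] * similarity(hyp_word, w) for w in context)
        norm = sum(occurrence_prob[w] for w in context)
        return total / norm if norm > 0 else 0.0

    # Toy example with a hand-made similarity table (illustrative values only)
    sim_table = {("bank", "money"): 0.8, ("bank", "river"): 0.3}
    sim = lambda a, b: sim_table.get((a, b), 0.0)
    probs = {"money": 0.9, "river": 0.4}
    score = context_consistency("bank", ["money", "river"], probs, sim)
    ```

    A topic-adapted variant would simply swap in a topic-specific `similarity` function, or average such scores over the topic mixture.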

  • Enhancing the Robustness of the Posterior-Based Confidence Measures Using Entropy Information for Speech Recognition

    Yanqing SUN  Yu ZHOU  Qingwei ZHAO  Pengyuan ZHANG  Fuping PAN  Yonghong YAN  

     
    PAPER-Robust Speech Recognition

    Vol: E93-D No:9  Page(s): 2431-2439

    In this paper, the robustness of posterior-based confidence measures is improved by utilizing entropy information, which is calculated for speech-unit-level posteriors using only the best recognition result, without requiring a larger computational load than conventional methods. Using different normalization methods, two posterior-based entropy confidence measures are proposed. Practical details are discussed for two typical levels of hidden Markov model (HMM)-based posterior confidence measures, and the two levels are compared in terms of their performance. Experiments show that the entropy information yields significant improvements in the posterior-based confidence measures. On our embedded test sets, the absolute improvements in the out-of-vocabulary (OOV) rejection rate exceed 20% for both the phoneme-level and the state-level confidence measures, without a significant decline in in-vocabulary accuracy.
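    The core intuition of an entropy-based confidence measure can be sketched as follows: a peaked posterior distribution (one dominant hypothesis) implies high confidence, while a flat one implies low confidence. The normalization shown (dividing by the maximum entropy) is one simple choice among the several the paper compares; the numbers are illustrative.

    ```python
    import math

    def entropy_confidence(posteriors):
        """Confidence from the normalized entropy of unit-level posteriors:
        low entropy (one dominant hypothesis) -> confidence near 1,
        uniform posteriors -> confidence 0."""
        h = -sum(p * math.log(p) for p in posteriors if p > 0)
        h_max = math.log(len(posteriors))
        return 1.0 - h / h_max if h_max > 0 else 1.0

    peaked = entropy_confidence([0.9, 0.05, 0.05])   # dominant hypothesis
    flat = entropy_confidence([1 / 3, 1 / 3, 1 / 3]) # maximally uncertain
    ```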

  • Utterance Verification Using State-Level Log-Likelihood Ratio with Frame and State Selection

    Suk-Bong KWON  Hoirin KIM  

     
    LETTER-Speech and Hearing

    Vol: E93-D No:3  Page(s): 647-650

    This paper proposes an utterance verification system using a state-level log-likelihood ratio with frame and state selection. We use hidden Markov models as the acoustic models for speech recognition and as the anti-phone models for utterance verification. The hidden Markov models have three states, and each state represents different characteristics of a phone. We therefore propose an algorithm that computes state-level log-likelihood ratios and weights the states to obtain a more reliable confidence measure for recognized phones. Additionally, we propose a frame selection algorithm that computes the confidence measure only on frames containing actual speech in the input. In general, phone segmentation information obtained from a speaker-independent speech recognition system is not accurate, because triphone-based acoustic models are difficult to train effectively enough to cover diverse pronunciations and coarticulation effects. This makes it even harder to find correctly matched states when obtaining state segmentation information, so a state selection algorithm is suggested for finding valid states. The proposed method using the state-level log-likelihood ratio with frame and state selection achieves a relative reduction in equal error rate of 18.1% compared to the baseline system using simple phone-level log-likelihood ratios.
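    A minimal sketch of the two-stage idea described above: per state, average the frame-level log-likelihood ratio (target model minus anti-model) over only the selected frames, then combine the per-state LLRs with state weights. The selection flags, weights, and log-likelihoods are made-up illustrative values, not the paper's trained quantities.

    ```python
    def selected_frame_llr(target_ll, anti_ll, selected):
        """Mean frame-level log-likelihood ratio over the selected frames of a state."""
        pairs = [(t, a) for t, a, s in zip(target_ll, anti_ll, selected) if s]
        if not pairs:
            return 0.0
        return sum(t - a for t, a in pairs) / len(pairs)

    def phone_confidence(state_llrs, state_weights):
        """State-weighted combination of per-state LLRs for one phone;
        weights reflect how reliably each HMM state discriminates the phone."""
        return sum(w, l in ())  # placeholder replaced below
    def phone_confidence(state_llrs, state_weights):
        return sum(w * l for w, l in zip(state_weights, state_llrs)) / sum(state_weights)

    # One state: frame 3 is excluded by frame selection (e.g. non-speech)
    llr = selected_frame_llr([-5.0, -4.0, -9.0], [-6.0, -6.0, -5.0],
                             [True, True, False])
    conf = phone_confidence([llr, 0.5, 1.0], [0.2, 0.5, 0.3])
    ```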

  • Utterance Verification Using Word Voiceprint Models Based on Probabilistic Distributions of Phone-Level Log-Likelihood Ratio and Phone Duration

    Suk-Bong KWON  HoiRin KIM  

     
    LETTER-Speech and Hearing

    Vol: E91-D No:11  Page(s): 2746-2750

    This paper proposes word voiceprint models to verify the recognition results obtained from a speech recognition system. Word voiceprint models carry word-dependent information based on the distributions of the phone-level log-likelihood ratio and phone duration. We can thus obtain a more reliable confidence score for a recognized word by using its word voiceprint models, which better represent the word's characteristics for utterance verification. Additionally, when computing the log-likelihood-ratio-based word voiceprint score, this paper proposes a new log-scale normalization function based on the distribution of the phone-level log-likelihood ratio, instead of the sigmoid function widely used for normalizing phone-level log-likelihood ratios. This function emphasizes mis-recognized phones in a word, and this word-specific information helps achieve a more discriminative score against out-of-vocabulary words. The proposed method requires additional memory, but it achieves a relative reduction in equal error rate of 16.9% compared to the baseline system using simple phone log-likelihood ratios.
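    The motivation for a log-scale normalization can be sketched numerically: a sigmoid saturates near zero for very negative LLRs, so one badly mis-recognized phone barely changes the word score, whereas a log-scale function (here the log of the sigmoid, an illustrative stand-in for the paper's distribution-based function) stays unbounded below and keeps penalizing it.

    ```python
    import math

    def sigmoid_norm(llr):
        """Conventional sigmoid normalization: saturates for very negative LLRs."""
        return 1.0 / (1.0 + math.exp(-llr))

    def log_scale_norm(llr):
        """Illustrative log-scale normalization (log of the sigmoid): unbounded
        below, so a badly mis-recognized phone keeps pulling the word score
        down instead of saturating near zero. Stand-in for the paper's
        distribution-based function."""
        return -math.log1p(math.exp(-llr))

    # A phone with LLR -20 vs. -10: the sigmoid barely distinguishes them,
    # while the log-scale score roughly doubles the penalty.
    sig_gap = sigmoid_norm(-10.0) - sigmoid_norm(-20.0)
    log_gap = log_scale_norm(-10.0) - log_scale_norm(-20.0)
    ```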

  • Verification of Speech Recognition Results Incorporating In-domain Confidence and Discourse Coherence Measures

    Ian R. LANE  Tatsuya KAWAHARA  

     
    PAPER-Speech Recognition

    Vol: E89-D No:3  Page(s): 931-938

    Conventional confidence measures for assessing the reliability of ASR (automatic speech recognition) output are typically derived from "low-level" information obtained during speech recognition decoding. In contrast to these approaches, we propose a novel utterance verification framework that incorporates "high-level" knowledge sources. Specifically, we investigate two application-independent measures: in-domain confidence, the degree of match between the input utterance and the application domain of the back-end system, and discourse coherence, the consistency between consecutive utterances in a dialogue session. A joint confidence score is generated by combining these two measures with an orthodox measure based on GPP (generalized posterior probability). The proposed framework was evaluated on an utterance verification task for spontaneous dialogue performed via an (English/Japanese) speech-to-speech translation system. Incorporating the two proposed measures significantly improved utterance verification accuracy compared to using GPP alone, realizing reductions in CER (confidence error rate) of 11.4% and 8.1% for the English and Japanese sides, respectively. When negligible ASR errors (those that do not affect translation) were ignored, further improvement was achieved on the English side, realizing a reduction in CER of up to 14.6% compared to the GPP case.
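    A minimal sketch of how such a joint score might be formed: a weighted combination of the GPP-based measure with the two high-level measures, thresholded for accept/reject. The weights and threshold below are illustrative stand-ins; the paper would tune the combination on held-out data.

    ```python
    def joint_confidence(gpp, in_domain, coherence, weights=(0.6, 0.25, 0.15)):
        """Linear combination of GPP with in-domain confidence and discourse
        coherence; the weights are hypothetical, not the paper's values."""
        w_g, w_d, w_c = weights
        return w_g * gpp + w_d * in_domain + w_c * coherence

    def verify(gpp, in_domain, coherence, threshold=0.5):
        """Accept the utterance when the joint score clears the threshold."""
        return joint_confidence(gpp, in_domain, coherence) >= threshold

    accepted = verify(0.9, 0.8, 0.7)   # all measures agree: accept
    rejected = verify(0.2, 0.1, 0.1)   # all measures low: reject
    ```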

  • Bayesian Confidence Scoring and Adaptation Techniques for Speech Recognition

    Tae-Yoon KIM  Hanseok KO  

     
    LETTER-Multimedia Systems for Communications

    Vol: E88-B No:4  Page(s): 1756-1759

    Bayesian combining of confidence measures is proposed for speech recognition. Bayesian combining is achieved by estimating the joint pdf of the confidence feature vector in the correct and incorrect hypothesis classes. In addition, adaptation of the confidence score using these pdfs is presented. The proposed methods reduced the classification error rate by 18% relative to a conventional single-feature confidence scoring method in an isolated-word out-of-vocabulary rejection test.
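    The Bayesian combining idea reduces to Bayes' rule over the two class pdfs: the combined score is the posterior probability that the hypothesis is correct given the confidence features. The sketch below uses a 1-D Gaussian per class with made-up means and variances as stand-ins for the estimated joint pdfs.

    ```python
    import math

    def gauss_pdf(x, mu, sigma):
        """1-D Gaussian density (illustrative stand-in for an estimated pdf)."""
        return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

    def bayes_confidence(x, prior_correct=0.5):
        """Posterior probability that the hypothesis is correct, via Bayes'
        rule over the correct/incorrect class pdfs (hypothetical parameters)."""
        p_c = gauss_pdf(x, mu=0.8, sigma=0.15) * prior_correct
        p_i = gauss_pdf(x, mu=0.3, sigma=0.2) * (1.0 - prior_correct)
        return p_c / (p_c + p_i)

    high = bayes_confidence(0.8)  # feature near the "correct" class mean
    low = bayes_confidence(0.3)   # feature near the "incorrect" class mean
    ```

    With a multi-dimensional feature vector, `gauss_pdf` would become a joint density (e.g. a full-covariance Gaussian or a mixture), but the Bayes-rule combination is identical.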

  • An Unsupervised Speaker Adaptation Method for Lecture-Style Spontaneous Speech Recognition Using Multiple Recognition Systems

    Seiichi NAKAGAWA  Tomohiro WATANABE  Hiromitsu NISHIZAKI  Takehito UTSURO  

     
    PAPER-Spoken Language Systems

    Vol: E88-D No:3  Page(s): 463-471

    This paper describes an accurate unsupervised speaker adaptation method for lecture-style spontaneous speech recognition using multiple LVCSR systems. In an unsupervised speaker adaptation framework, the improvement in recognition performance from adapting acoustic models depends heavily on the accuracy of labels such as phonemes and syllables. Therefore, extracting the adaptation data guided by a confidence measure is effective for unsupervised adaptation. In this paper, we located high-confidence portions based on the agreement between two LVCSR systems, adapted the acoustic models using these portions with their highly accurate labels, and thereby improved recognition accuracy. We applied our method to the Corpus of Spontaneous Japanese (CSJ), and the method improved the recognition rate by about 2.1% compared with a traditional method.
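    The agreement-based selection can be sketched very simply: keep only the positions where the two recognizers output the same label, and use those as adaptation data. Real systems would align the two hypotheses by time rather than by word position; the toy transcripts below are hypothetical.

    ```python
    def agreed_portions(hyp_a, hyp_b):
        """Indices where two recognizers output the same word; only these
        high-confidence portions are fed to acoustic-model adaptation.
        (Simple position alignment; real systems align by time.)"""
        return [i for i, (a, b) in enumerate(zip(hyp_a, hyp_b)) if a == b]

    # Toy outputs from two LVCSR systems (made-up example)
    sys_a = ["kyou", "wa", "ii", "tenki", "desu"]
    sys_b = ["kyou", "wa", "li", "tenki", "desu"]
    adaptation_idx = agreed_portions(sys_a, sys_b)
    ```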

  • Confidence Scoring for Accurate HMM-Based Speech Recognition by Using Monophone-Level Normalization Based on Subspace Method

    Muhammad GHULAM  Takaharu SATO  Takashi FUKUDA  Tsuneo NITTA  

     
    PAPER-Speech and Speaker Recognition

    Vol: E86-D No:3  Page(s): 430-437

    In this paper, a novel confidence scoring method applied to the N-best hypotheses (word candidates) output from an HMM-based classifier is proposed. In the first pass of the proposed method, the HMM-based classifier with monophone models outputs the N-best hypotheses and the boundaries of all monophones in the hypotheses. In the second pass, an SM (Subspace Method)-based verifier tests the hypotheses by comparing confidence scores. To test the hypotheses, the SM-based verifier first calculates the similarity between phone vectors and an eigenvector set of monophones; this similarity score is then converted into a likelihood score with normalization of acoustic quality; finally, an HMM-based word-level likelihood and an SM-based monophone-level likelihood are combined to form the confidence measure. Two kinds of experiments were performed to evaluate this confidence measure on speaker-independent word recognition. The results showed that the proposed confidence scoring method significantly reduced the word error rate from the 4.7% obtained by the standard HMM classifier to 2.0%, and in unknown-word rejection it reduced the equal error rate from 9.0% to 6.5%.
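    The subspace-method similarity at the heart of the verifier can be sketched as the sum of squared projections of a feature vector onto a class's orthonormal eigenvectors, normalized by the vector's squared length: 1.0 when the vector lies entirely in the class subspace, 0.0 when it is orthogonal to it. The basis and vectors below are illustrative.

    ```python
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    def subspace_similarity(x, eigenvectors):
        """Subspace-method similarity: sum of squared projections of x onto
        the class's orthonormal eigenvectors, divided by |x|^2."""
        return sum(dot(x, e) ** 2 for e in eigenvectors) / dot(x, x)

    # Orthonormal 2-D subspace of 3-D space (toy example)
    basis = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    in_sub = subspace_similarity((3.0, 4.0, 0.0), basis)   # lies in the subspace
    out_sub = subspace_similarity((0.0, 0.0, 2.0), basis)  # orthogonal to it
    ```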