Author Search Result

[Author] Hoirin KIM(8hit)

  • Response Time Reduction of Speech Recognizers Using Single Gaussians

    Sangbae JEONG  Hoirin KIM  Minsoo HAHN  

     
    LETTER-Speech and Hearing

    Vol: E90-D  No: 5  Page(s): 868-871

    In this paper, we propose an algorithm that reduces the response time of HMM-based speech recognizers. The algorithm uses single Gaussians to select promising HMM states. During recognition, HMM state likelihoods are first evaluated with the corresponding single Gaussians; likelihoods from the original full Gaussians are then computed, and substituted, only for the HMM states with relatively large likelihoods. This significantly reduces the pattern-matching time without any noticeable loss in recognition rate. In addition, we cluster the single Gaussians into groups by measuring the distance between Gaussians, which further reduces the extra memory required. In our 10,000-word Korean POI (point-of-interest) recognition task, the proposed algorithm reduces the response time by 35.57% relative to the baseline system at the cost of a 10% degradation in WER.
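
The two-pass evaluation described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes diagonal-covariance Gaussians, assumes the "full Gaussians" of a state form a mixture, and selects the promising states with a hypothetical `top_n` cutoff (the abstract only says "relatively large likelihoods"). All function and parameter names are invented for the example.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def log_gmm(x, mixture):
    """Log density of a mixture given as a list of (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gauss(x, m, v) for w, m, v in mixture]
    peak = max(terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def two_pass_scores(x, single_gaussians, full_mixtures, top_n):
    """Score every state cheaply, then rescore only the top_n states exactly.

    single_gaussians: dict state -> (mean, var), one Gaussian per state
    full_mixtures:    dict state -> list of (weight, mean, var)
    top_n:            assumed selection rule; a likelihood threshold would also work
    """
    # Pass 1: approximate likelihood for all states.
    scores = {s: log_gauss(x, m, v) for s, (m, v) in single_gaussians.items()}
    # Pass 2: replace the approximation for the most promising states only.
    promising = sorted(scores, key=scores.get, reverse=True)[:top_n]
    for s in promising:
        scores[s] = log_gmm(x, full_mixtures[s])
    return scores, promising
```

States outside the selected set keep their single-Gaussian score, so the cost of the exact mixture evaluation is paid only for `top_n` states per frame.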

  • Noise Robust Speaker Identification Using Sub-Band Weighting in Multi-Band Approach

    Sungtak KIM  Mikyong JI  Youngjoo SUH  Hoirin KIM  

     
    LETTER-Speech and Hearing

    Vol: E90-D  No: 12  Page(s): 2110-2114

    Recently, many techniques have been proposed to improve speaker identification in noisy environments. Among these, we consider the feature recombination technique for the multi-band approach to noise-robust speaker identification. Conventional feature recombination is very effective under band-limited noise, but under broad-band noise it does not provide a notable performance improvement over the full-band system. Even when speech is corrupted by broad-band noise, the degree of corruption differs from sub-band to sub-band. Conventional feature recombination for speaker identification uses all sub-band features to compute the multi-band likelihood score, but this computation does not effectively exploit the merit of the multi-band approach, even though the sub-band features are extracted independently. Here we propose a new sub-band likelihood computation with sub-band weighting in the feature recombination method. The signal-to-noise ratio (SNR) is used to compute the sub-band weights. The proposed sub-band-weighted likelihood computation makes a speaker identification system more robust to noise. Experimental results show that the average error reduction rate (ERR) across various noise environments is more than 24% compared with the conventional feature recombination-based speaker identification system.
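
As a rough illustration of SNR-driven sub-band weighting, the sketch below combines per-band log-likelihoods with weights derived from per-band SNR. The abstract does not give the weighting formula, so the softmax-style mapping in `subband_weights` (including its scale of 10 dB) is purely an assumption, as are all names used here.

```python
import math


def subband_weights(snr_db):
    """Map per-band SNR values (dB) to positive weights that sum to 1.

    The softmax-style mapping is an assumption made for this example;
    the paper's actual weighting function is not stated in the abstract.
    """
    peak = max(snr_db)
    raw = [math.exp((s - peak) / 10.0) for s in snr_db]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_score(band_loglikes, weights):
    """Weighted sum of sub-band log-likelihoods for one speaker."""
    return sum(w * ll for w, ll in zip(weights, band_loglikes))


def identify(speaker_band_loglikes, snr_db):
    """Return the speaker whose weighted multi-band score is highest.

    speaker_band_loglikes: dict speaker -> list of per-band log-likelihoods
    """
    weights = subband_weights(snr_db)
    return max(
        speaker_band_loglikes,
        key=lambda spk: weighted_score(speaker_band_loglikes[spk], weights),
    )
```

With equal SNR in every band the weights are uniform and the score reduces to a plain average over bands; when one band is much noisier, its likelihood contributes less to the decision.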

  • Cepstral Domain Feature Extraction Utilizing Entropic Distance-Based Filterbank

    Youngjoo SUH  Hoirin KIM  

     
    LETTER-Speech and Hearing

    Vol: E93-D  No: 2  Page(s): 392-394

    The selection of effective features is especially important for achieving highly accurate speech recognition. Although the mel-cepstrum is a popular and effective feature for speech recognition, it is still unclear whether the filterbank adopted in the mel-cepstrum always produces optimal performance regardless of the phonetic environment of a specific speech recognition task. In this paper, we propose a new cepstral domain feature extraction approach utilizing the entropic distance-based filterbank for highly accurate speech recognition. Experimental results showed that cepstral features employing the proposed filterbank reduce the relative error by 31% for clean as well as noisy speech compared to the mel-cepstral features.

  • Histogram Equalization Utilizing Window-Based Smoothed CDF Estimation for Feature Compensation

    Youngjoo SUH  Hoirin KIM  Munchurl KIM  

     
    LETTER-Speech and Hearing

    Vol: E91-D  No: 8  Page(s): 2199-2202

    In this letter, we propose a new histogram equalization method to compensate for acoustic mismatches in speech recognition caused mainly by additive noise and channel distortion. The proposed method employs an improved test cumulative distribution function (CDF), obtained by using window functions to smooth the conventional order statistics-based test CDF more accurately, for robust feature compensation. Experiments on the AURORA 2 framework confirmed that the proposed method is effective in compensating speech recognition features, reducing the averaged relative error by 13.12% over the conventional order statistics-based histogram equalization method and by 58.02% over the mel-cepstral-based features for the three test sets.
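
For orientation, the conventional order statistics-based histogram equalization that the letter uses as its baseline can be sketched as below. This shows only the baseline idea; the window-based smoothing proposed in the letter is not reproduced, since the abstract does not specify it. The standard normal reference distribution and the `(rank - 0.5) / N` CDF estimate are assumptions chosen for the example.

```python
from statistics import NormalDist


def order_statistic_cdf(values):
    """Estimate the test CDF at each value as (rank - 0.5) / N.

    Returned in the same order as the input values.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    cdf = [0.0] * n
    for rank, i in enumerate(order, start=1):
        cdf[i] = (rank - 0.5) / n
    return cdf


def equalize(values, reference=None):
    """Map one feature dimension onto a reference distribution.

    Each value is replaced by the reference quantile at its estimated
    test CDF. The reference defaults to a standard normal (an assumption).
    """
    if reference is None:
        reference = NormalDist(0.0, 1.0)
    return [reference.inv_cdf(p) for p in order_statistic_cdf(values)]
```

The mapping is monotonic, so the ordering of the frames within the utterance is preserved while their distribution is pulled toward the reference.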

  • Utterance Verification Using State-Level Log-Likelihood Ratio with Frame and State Selection

    Suk-Bong KWON  Hoirin KIM  

     
    LETTER-Speech and Hearing

    Vol: E93-D  No: 3  Page(s): 647-650

    This paper proposes an utterance verification system using a state-level log-likelihood ratio with frame and state selection. We use hidden Markov models as the acoustic models and anti-phone models for speech recognition and utterance verification. Each hidden Markov model has three states, and each state represents different characteristics of a phone. We therefore propose an algorithm that computes a state-level log-likelihood ratio and assigns weights to states to obtain a more reliable confidence measure for recognized phones. Additionally, we propose a frame selection algorithm that computes the confidence measure only on frames of the input that contain proper speech. In general, the phone segmentation information obtained from a speaker-independent speech recognition system is not accurate, because triphone-based acoustic models are difficult to train effectively to cover diverse pronunciations and coarticulation effects. It is therefore even more difficult to find correctly matched states when obtaining state segmentation information, so a state selection algorithm is proposed for finding valid states. The proposed method, using the state-level log-likelihood ratio with frame and state selection, achieves a relative reduction in equal error rate of 18.1% compared to the baseline system using simple phone-level log-likelihood ratios.
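
One way to picture a weighted state-level log-likelihood ratio is the sketch below. It is an assumption-laden illustration rather than the paper's algorithm: the per-state ratio is taken as the mean frame-level difference between target and anti-phone log-likelihoods, states with no aligned frames are simply skipped as a stand-in for state selection, and the weights are supplied by the caller. All names are hypothetical.

```python
def state_llr(target_frames, anti_frames):
    """Mean frame-level log-likelihood ratio for one HMM state."""
    diffs = [t - a for t, a in zip(target_frames, anti_frames)]
    return sum(diffs) / len(diffs)


def phone_confidence(states, weights):
    """Weighted average of state-level LLRs for one recognized phone.

    states:  list of (target_frames, anti_frames), one pair per HMM state,
             each a list of per-frame log-likelihoods
    weights: one weight per state (how they are chosen is not specified
             in the abstract, so they are inputs here)
    States with no aligned frames are skipped (assumed selection rule).
    """
    num = 0.0
    den = 0.0
    for (target, anti), w in zip(states, weights):
        if not target:
            continue
        num += w * state_llr(target, anti)
        den += w
    return num / den if den else 0.0
```

A phone would then be accepted when its confidence exceeds a threshold tuned on held-out data.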

  • Text-Independent Speaker Identification in a Distant-Talking Multi-Microphone Environment

    Mikyong JI  Sungtak KIM  Hoirin KIM  

     
    LETTER-Speech and Hearing

    Vol: E90-D  No: 11  Page(s): 1892-1895

    With the aim of improving speaker identification, we propose a likelihood-based integration method that combines the speaker identification results obtained through multiple microphones. In many cases, the composite result has a lower error rate than that of any single channel. The proposed integration method can achieve more reliable identification performance in the ubiquitous robot companion (URC) environment in which the robot is connected to a server through an extremely high broadband penetration rate.
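
A simple form of likelihood-based integration is to add each speaker's log-likelihoods across microphones and pick the best total. The sketch below assumes exactly that rule, with equal trust in every channel; the abstract does not say how the paper combines the channels, and the function name is invented for the example.

```python
def integrate_channels(channel_loglikes):
    """Combine per-microphone speaker scores by summing log-likelihoods.

    channel_loglikes: list of dicts (one per microphone), each mapping
                      speaker -> log-likelihood of the utterance
    Returns (best_speaker, combined_scores).
    Summation with equal channel weights is an assumption of this sketch.
    """
    speakers = channel_loglikes[0].keys()
    combined = {
        spk: sum(channel[spk] for channel in channel_loglikes)
        for spk in speakers
    }
    return max(combined, key=combined.get), combined
```

Because the decision uses the summed evidence, a channel that prefers the wrong speaker by a small margin can be outvoted by channels that prefer the right one more strongly.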

  • Soft Counting Poisson Mixture Model-Based Polling Method for Speech/Nonspeech Classification

    Youngjoo SUH  Hoirin KIM  Minsoo HAHN  Yongju LEE  

     
    LETTER-Speech and Hearing

    Vol: E89-D  No: 12  Page(s): 2994-2997

    In this letter, a new segment-level speech/nonspeech classification method based on the Poisson polling technique is proposed. The proposed method makes two modifications to the baseline Poisson polling method to further improve classification accuracy. The first employs Poisson mixture models to represent more accurately the various segmental patterns of the observed frequencies for frame-level input features. The second is soft counting-based frequency estimation, which improves the reliability of the observed frequencies. The effectiveness of the proposed method is confirmed by experimental results showing a maximum error reduction of 39% compared to the segmentally accumulated log-likelihood ratio-based method.
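
The two ingredients named in the abstract, a Poisson mixture and soft counting, can be illustrated with the sketch below. It is one possible reading, not the letter's method: the soft count is taken to be the sum of frame-level posteriors over a segment, the segment is classified by a log-likelihood ratio between two Poisson mixtures, and every name, rate, and threshold is hypothetical.

```python
import math


def log_poisson(k, lam):
    """Log Poisson probability; lgamma lets k be a fractional soft count."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def log_poisson_mixture(k, components):
    """Log-likelihood under a mixture given as a list of (weight, rate)."""
    terms = [math.log(w) + log_poisson(k, lam) for w, lam in components]
    peak = max(terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def soft_count(frame_posteriors):
    """Soft frequency: sum of frame-level posteriors instead of hard votes."""
    return sum(frame_posteriors)


def classify_segment(count, speech_model, nonspeech_model, threshold=0.0):
    """Label a segment from the log-likelihood ratio of the two mixtures."""
    llr = log_poisson_mixture(count, speech_model) - log_poisson_mixture(
        count, nonspeech_model
    )
    return ("speech" if llr > threshold else "nonspeech"), llr
```

A hard-counting baseline would round each posterior to 0 or 1 before summing; the soft count keeps the frame-level uncertainty in the segment statistic.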

  • Utterance Verification Using Word Voiceprint Models Based on Probabilistic Distributions of Phone-Level Log-Likelihood Ratio and Phone Duration

    Suk-Bong KWON  HoiRin KIM  

     
    LETTER-Speech and Hearing

    Vol: E91-D  No: 11  Page(s): 2746-2750

    This paper proposes word voiceprint models to verify the recognition results obtained from a speech recognition system. Word voiceprint models carry word-dependent information based on the distributions of the phone-level log-likelihood ratio and phone duration. We can thus obtain a more reliable confidence score for a recognized word by using its word voiceprint models, which better represent the word's characteristics for utterance verification. Additionally, for computing the log-likelihood ratio-based word voiceprint score, this paper proposes a new log-scale normalization function that uses the distribution of the phone-level log-likelihood ratio, instead of the sigmoid function widely used for the phone-level log-likelihood ratio. This function serves to emphasize a misrecognized phone in a word. This word-specific information helps achieve a more discriminative score against out-of-vocabulary words. The proposed method requires additional memory, but it achieves a relative reduction in equal error rate of 16.9% compared to the baseline system using simple phone log-likelihood ratios.