Author Search Result

[Author] Youngjoo SUH (4 hits)

  • Noise Robust Speaker Identification Using Sub-Band Weighting in Multi-Band Approach

    Sungtak KIM  Mikyong JI  Youngjoo SUH  Hoirin KIM  

     
    LETTER-Speech and Hearing

    Vol: E90-D  No: 12  Page(s): 2110-2114

    Recently, many techniques have been proposed to improve speaker identification in noisy environments. Among these techniques, we consider the feature recombination technique for the multi-band approach to noise-robust speaker identification. The conventional feature recombination technique is very effective under band-limited noise, but under broad-band noise it does not provide a notable performance improvement over the full-band system. Even when speech is corrupted by broad-band noise, the degree of noise corruption differs from one sub-band to another. In conventional feature recombination for speaker identification, all sub-band features are used to compute the multi-band likelihood score, but this likelihood computation does not exploit the advantage of the multi-band approach effectively, even though the sub-band features are extracted independently. Here we propose a new technique for sub-band likelihood computation with sub-band weighting in the feature recombination method. The signal-to-noise ratio (SNR) is used to compute the sub-band weights. The proposed sub-band-weighted likelihood computation makes a speaker identification system more robust to noise. Experimental results show that the average error reduction rate (ERR) in various noise environments is more than 24% compared with the conventional feature recombination-based speaker identification system.
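
    The idea of SNR-based sub-band weighting can be illustrated with a minimal sketch. The abstract states only that the SNR is used to compute the sub-band weights, so the weighting rule below (weights proportional to linear-scale SNR, normalized to sum to 1) is an assumption, as are the function names and the toy numbers; it is not the authors' formulation.

```python
def subband_weights(snr_db):
    """Map per-sub-band SNRs (dB) to weights that sum to 1.

    Assumption: weights are proportional to the linear-scale SNR.
    The abstract does not specify the actual weighting function.
    """
    linear = [10.0 ** (s / 10.0) for s in snr_db]
    total = sum(linear)
    return [v / total for v in linear]


def weighted_score(subband_loglik, weights):
    """Combine per-sub-band log-likelihoods into one weighted score."""
    return sum(w * ll for w, ll in zip(weights, subband_loglik))


def identify(loglik_by_speaker, snr_db):
    """Return the speaker whose weighted multi-band score is highest."""
    weights = subband_weights(snr_db)
    return max(
        loglik_by_speaker,
        key=lambda spk: weighted_score(loglik_by_speaker[spk], weights),
    )


# Hypothetical example: three sub-bands, the second one heavily corrupted.
snr_db = [20.0, 0.0, 10.0]
scores = {
    "spk_a": [-10.0, -50.0, -12.0],  # poor match only in the noisy band
    "spk_b": [-14.0, -5.0, -15.0],   # good match only in the noisy band
}
best = identify(scores, snr_db)  # the noisy band is down-weighted
```

    With equal weights the noisy second band would dominate the decision; with the assumed SNR-proportional weights it contributes very little, which is the behavior the sub-band weighting is meant to achieve.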

  • Cepstral Domain Feature Extraction Utilizing Entropic Distance-Based Filterbank

    Youngjoo SUH  Hoirin KIM  

     
    LETTER-Speech and Hearing

    Vol: E93-D  No: 2  Page(s): 392-394

    The selection of effective features is especially important in achieving highly accurate speech recognition. Although the mel-cepstrum is a popular and effective feature for speech recognition, it is still unclear whether the filterbank adopted in the mel-cepstrum always produces optimal performance regardless of the phonetic environment of a specific speech recognition task. In this paper, we propose a new cepstral domain feature extraction approach utilizing an entropic distance-based filterbank for highly accurate speech recognition. Experimental results showed that the cepstral features employing the proposed filterbank reduce the relative error by 31% for clean as well as noisy speech compared to the mel-cepstral features.

  • Histogram Equalization Utilizing Window-Based Smoothed CDF Estimation for Feature Compensation

    Youngjoo SUH  Hoirin KIM  Munchurl KIM  

     
    LETTER-Speech and Hearing

    Vol: E91-D  No: 8  Page(s): 2199-2202

    In this letter, we propose a new histogram equalization method to compensate for acoustic mismatches in speech recognition, which are mainly caused by additive noise and channel distortion. The proposed method employs an improved test cumulative distribution function (CDF), obtained by more accurately smoothing the conventional order statistics-based test CDF with window functions, for robust feature compensation. Experiments on the AURORA 2 framework confirmed that the proposed method is effective in compensating speech recognition features, reducing the average relative error by 13.12% over the conventional order statistics-based histogram equalization method and by 58.02% over the mel-cepstral-based features for the three test sets.
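
    As a rough illustration of histogram equalization with a smoothed test CDF, the sketch below compares a rank-based (order statistics) CDF estimate with a window-smoothed one and maps features onto a reference distribution. The abstract does not give the smoothing procedure, so the Hann-window kernel estimator, the `half_width` parameter, and the standard normal reference distribution are all assumptions made for illustration only.

```python
import math
from statistics import NormalDist


def order_statistics_cdf(values):
    """Rank-based CDF: the r-th smallest of N values gets (r - 0.5) / N."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    cdf = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        cdf[idx] = (rank - 0.5) / n
    return cdf


def hann_step(u):
    """Integral of a unit-area Hann window on [-1, 1]: a smooth 0-to-1 step."""
    if u <= -1.0:
        return 0.0
    if u >= 1.0:
        return 1.0
    return 0.5 * (1.0 + u + math.sin(math.pi * u) / math.pi)


def smoothed_cdf(values, half_width):
    """Window-smoothed CDF (assumed form): average of smooth steps."""
    n = len(values)
    return [
        sum(hann_step((x - y) / half_width) for y in values) / n
        for x in values
    ]


def equalize(values, half_width=0.5):
    """Map each feature value through the smoothed test CDF, then through
    the inverse CDF of the reference (assumed standard normal)."""
    ref = NormalDist()
    # Each value contributes 0.5 / N for itself, so every CDF value lies
    # strictly inside (0, 1) and inv_cdf is always defined.
    return [ref.inv_cdf(p) for p in smoothed_cdf(values, half_width)]


features = [3.0, 1.0, 2.0, 10.0]  # hypothetical one-dimensional feature track
compensated = equalize(features)
```

    When `half_width` is smaller than the spacing between samples, the smoothed estimate coincides with the order statistics-based one; larger windows blend neighboring samples and yield a smoother mapping.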

  • Soft Counting Poisson Mixture Model-Based Polling Method for Speech/Nonspeech Classification

    Youngjoo SUH  Hoirin KIM  Minsoo HAHN  Yongju LEE  

     
    LETTER-Speech and Hearing

    Vol: E89-D  No: 12  Page(s): 2994-2997

    In this letter, a new segment-level speech/nonspeech classification method based on the Poisson polling technique is proposed. The proposed method makes two modifications to the baseline Poisson polling method to further improve the classification accuracy. The first is to employ Poisson mixture models to more accurately represent the various segmental patterns of the observed frequencies for frame-level input features. The second is soft counting-based frequency estimation, which improves the reliability of the observed frequencies. The effectiveness of the proposed method is confirmed by experimental results showing a maximum error reduction of 39% compared to the segmentally accumulated log-likelihood ratio-based method.
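
    The two modifications can be sketched under simplifying assumptions. Here each frame is assumed to carry a posterior over a small set of codewords, soft counts are the per-codeword sums of those posteriors over a segment, and each class is modeled by a mixture of independent Poisson distributions over the counts. The abstract does not specify these details, so the model structure, the use of `lgamma` to extend the Poisson likelihood to non-integer counts, and every name and number below are hypothetical.

```python
import math


def soft_counts(frame_posteriors):
    """Sum per-frame codeword posteriors to get soft segment-level counts."""
    k = len(frame_posteriors[0])
    return [sum(frame[j] for frame in frame_posteriors) for j in range(k)]


def poisson_logpmf(count, rate):
    """Poisson log-likelihood; lgamma allows non-integer (soft) counts.

    Assumes rate > 0.
    """
    return count * math.log(rate) - rate - math.lgamma(count + 1.0)


def mixture_loglik(counts, mixture):
    """Log-likelihood of the counts under a Poisson mixture.

    `mixture` is a list of (weight, rates) pairs, one rate per codeword.
    """
    comp = [
        math.log(w) + sum(poisson_logpmf(n, r) for n, r in zip(counts, rates))
        for w, rates in mixture
    ]
    peak = max(comp)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(c - peak) for c in comp))


def classify(frame_posteriors, models):
    """Pick the class whose Poisson mixture best explains the soft counts."""
    counts = soft_counts(frame_posteriors)
    return max(models, key=lambda label: mixture_loglik(counts, models[label]))


# Hypothetical segment of three frames over two codewords.
frames = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]]
models = {
    "speech": [(0.5, [2.5, 0.5]), (0.5, [1.5, 1.5])],
    "nonspeech": [(1.0, [0.5, 2.5])],
}
label = classify(frames, models)
```

    Hard counting would assign each frame entirely to its most likely codeword; soft counting keeps the fractional evidence, which is the property the abstract credits with more reliable observed frequencies.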