
Author Search Result

[Author] Seung Ho CHOI (5 hits)

  • A Deep Learning-Based Approach to Non-Intrusive Objective Speech Intelligibility Estimation

    Deokgyu YUN, Hannah LEE, Seung Ho CHOI

    LETTER-Speech and Hearing

    Publicized: 2018/01/09
    Vol: E101-D No:4
    Page(s): 1207-1208

    This paper proposes a deep learning-based method for non-intrusive objective speech intelligibility estimation using a recurrent neural network (RNN) with a long short-term memory (LSTM) structure. Conventional non-intrusive estimation methods, such as the standard P.563, perform poorly and inconsistently, especially in various noisy and reverberant environments. The proposed method trains the LSTM RNN model parameters using the short-time objective intelligibility (STOI) measure, the standard intrusive intelligibility estimation method that requires a reference speech signal. The input and output of the LSTM RNN are the mel-frequency cepstral coefficient (MFCC) vector and the frame-wise STOI value, respectively. Experimental results show that the proposed objective intelligibility estimation method outperforms the conventional standard P.563 in various noisy and reverberant environments.
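As a rough illustration of the data flow described above (MFCC frames in, one intelligibility value per frame out), the following Python/NumPy sketch runs an untrained single-layer LSTM forward pass. Everything here is an assumption for illustration: the class name `TinyLSTMEstimator`, 13 MFCCs per frame, 32 hidden units, the sigmoid squashing of the output, and averaging frame scores into one utterance score. It is not the authors' network, and the weights are random, so the outputs only become meaningful after training against frame-wise STOI targets computed with clean reference speech.

```python
import numpy as np

# Hypothetical sketch, not the paper's model: an untrained LSTM regressor
# mapping MFCC frames to one score per frame.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyLSTMEstimator:
    def __init__(self, n_in=13, n_hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(n_in + n_hidden)
        # Stacked weights for the input, forget, cell-candidate, and output gates.
        self.W = rng.normal(0.0, scale, size=(4 * n_hidden, n_in + n_hidden))
        self.b = np.zeros(4 * n_hidden)
        self.w_out = rng.normal(0.0, scale, size=n_hidden)
        self.b_out = 0.0
        self.n_hidden = n_hidden

    def forward(self, mfcc):
        """mfcc: array of shape (T, n_in). Returns frame-wise scores of shape (T,)."""
        H = self.n_hidden
        h = np.zeros(H)
        c = np.zeros(H)
        scores = np.empty(len(mfcc))
        for t, x in enumerate(mfcc):
            z = self.W @ np.concatenate([x, h]) + self.b
            i = sigmoid(z[:H])            # input gate
            f = sigmoid(z[H:2 * H])       # forget gate
            g = np.tanh(z[2 * H:3 * H])   # candidate cell state
            o = sigmoid(z[3 * H:])        # output gate
            c = f * c + i * g
            h = o * np.tanh(c)
            # STOI values typically fall in [0, 1], so squash the output with a sigmoid.
            scores[t] = sigmoid(self.w_out @ h + self.b_out)
        return scores

def mse_loss(pred, target):
    """Regression loss that training (not shown) would minimise against frame-wise STOI."""
    return float(np.mean((pred - target) ** 2))

model = TinyLSTMEstimator()
frames = np.random.default_rng(1).normal(size=(100, 13))  # stand-in for 100 MFCC frames
frame_scores = model.forward(frames)
utterance_score = float(frame_scores.mean())  # hypothetical pooling into one score
```

At test time only the degraded signal's MFCCs are needed, which is what makes such an estimator non-intrusive; the reference signal is used only to produce STOI training targets.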

  • Dynamic Cepstral Representations Based on Order-Dependent Windowing Methods

    Hong Kook KIM, Seung Ho CHOI, Hwang Soo LEE

    PAPER-Speech Processing and Acoustics

    Vol: E81-D No:5
    Page(s): 434-440

    In this paper, we propose dynamic cepstral representations that effectively capture the temporal information of cepstral coefficients. The number of speech frames used in the regression analysis to extract a dynamic cepstral coefficient is made inversely proportional to the cepstral order, since higher-order cepstral coefficients fluctuate more than lower-order ones. By exploiting the relationship between the window length used to extract a dynamic cepstral coefficient and the statistical variance of that cepstral coefficient, we propose three windowing methods: an utterance-specific variance-ratio windowing method, a statistical variance-ratio windowing method, and an inverse-lifter windowing method. Intra-speaker, inter-speaker, and speaker-independent recognition tests on 100 phonetically balanced words are conducted to evaluate the performance of the proposed order-dependent windowing methods.
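The order-dependent idea can be sketched as a standard first-order regression (delta) computation whose half-length K_n shrinks with the cepstral order n, so that the window for order n spans 2K_n + 1 frames. The rule `K_n = max(k_min, round(k_max / n))` and the defaults `k_max=6`, `k_min=1` are hypothetical placeholders for "inversely proportional to the order"; the paper's three methods instead derive window lengths from cepstral variance ratios or an inverse lifter, which this sketch does not implement.

```python
import numpy as np

def window_half_lengths(n_orders, k_max=6, k_min=1):
    """Hypothetical rule: half-length K_n ~ k_max / n for cepstral orders n = 1..n_orders."""
    orders = np.arange(1, n_orders + 1)
    return np.maximum(k_min, np.round(k_max / orders)).astype(int)

def regression_delta(track, K):
    """First-order regression (delta) of one cepstral track with half-length K.

    d_t = sum_{k=1..K} k * (c_{t+k} - c_{t-k}) / (2 * sum_{k=1..K} k^2),
    with edge frames replicated so every frame has K neighbours on each side.
    """
    T = len(track)
    padded = np.concatenate([np.repeat(track[0], K), track, np.repeat(track[-1], K)])
    denom = 2.0 * sum(k * k for k in range(1, K + 1))
    delta = np.zeros(T)
    for k in range(1, K + 1):
        # padded[K + t] == track[t], so these slices are c_{t+k} and c_{t-k}.
        delta += k * (padded[K + k:K + k + T] - padded[K - k:K - k + T])
    return delta / denom

def order_dependent_deltas(cepstra, k_max=6, k_min=1):
    """cepstra: (T, N) array whose column j holds cepstral order j + 1."""
    _, N = cepstra.shape
    Ks = window_half_lengths(N, k_max, k_min)
    deltas = np.column_stack([regression_delta(cepstra[:, j], Ks[j]) for j in range(N)])
    return deltas, Ks

# Toy check: linear tracks with slopes 1..12, whose delta should equal the slope
# away from the edges regardless of the window length.
ramp = np.arange(50, dtype=float)
cepstra = np.column_stack([ramp * (j + 1) for j in range(12)])
deltas, Ks = order_dependent_deltas(cepstra)
print(Ks)
```

Low orders get long, smoothing windows while high orders, which fluctuate more, get short ones; replacing `window_half_lengths` with a variance-based rule is where the paper's methods would plug in.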

  • Spectral Domain Noise Modeling in Compressive Sensing-Based Tonal Signal Detection

    Chenlin HU, Jin Young KIM, Seung Ho CHOI, Chang Joo KIM

    LETTER-Digital Signal Processing

    Vol: E98-A No:5
    Page(s): 1122-1125

    Tonal signals appear as spectral peaks in the frequency domain. When the number of spectral peaks is small and the spectrum is therefore sparse, Compressive Sensing (CS) can be adopted to locate the peaks with a low-cost sensing system. In the CS scheme, a time-domain signal is modelled as $\boldsymbol{y}=\Phi F^{-1}\boldsymbol{s}$, where $\boldsymbol{y}$ and $\boldsymbol{s}$ are signal vectors in the time and frequency domains, and $F^{-1}$ and $\Phi$ are an inverse DFT matrix and a random-sampling matrix, respectively. For a given $\boldsymbol{y}$ and $\Phi$, the CS method attempts to estimate $\boldsymbol{s}$ by $\ell_0$ or $\ell_1$ optimization. To generate the peak candidates, we adopt the frequency-domain information of $\breve{\boldsymbol{s}}=F\breve{\boldsymbol{y}}$, where $\breve{\boldsymbol{y}}$ is the extended version of $\boldsymbol{y}$ and $\breve{y}(n)$ is zero when $n$ is not one of the CS time instances. In this paper, we develop Gaussian statistics of $\breve{\boldsymbol{s}}$; that is, we examine the mean and variance of $\breve{s}(k)$.
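The signal model and the zero-filled spectrum can be checked numerically. In this NumPy sketch, `np.fft.ifft` plays the role of $F^{-1}$ and `np.fft.fft` that of $F$, `Phi` keeps M randomly chosen rows of the identity matrix, and `Phi.T @ y` is exactly the zero-filled extension $\breve{\boldsymbol{y}}$. The sizes, the on-bin tone positions, and taking the largest-magnitude bins of $\breve{\boldsymbol{s}}$ as peak candidates are illustrative assumptions; the Gaussian statistics developed in the paper and the subsequent $\ell_0$/$\ell_1$ recovery are not reproduced.

```python
import numpy as np

N, M = 256, 64          # DFT length and number of random time samples (hypothetical)
tone_bins = [20, 57]    # hypothetical on-bin tones, so s is exactly 2-sparse

# Sparse frequency-domain vector s; amplitude N gives unit-amplitude tones in time
# because np.fft.ifft includes the 1/N factor.
s = np.zeros(N, dtype=complex)
s[tone_bins] = N

rng = np.random.default_rng(0)
idx = np.sort(rng.choice(N, size=M, replace=False))  # CS time instances
Phi = np.eye(N)[idx]                                 # (M, N) random-sampling matrix

y = Phi @ np.fft.ifft(s)     # y = Phi F^{-1} s
y_ext = Phi.T @ y            # extended y: zero except at the CS time instances
s_breve = np.fft.fft(y_ext)  # s_breve = F y_ext

# Peak candidates: the few bins of s_breve with the largest magnitude.
n_candidates = 4
candidates = np.sort(np.argsort(np.abs(s_breve))[-n_candidates:])
print(candidates)
```

At a tone bin the M sampled phases add coherently, while elsewhere they add with pseudo-random phases, which is why the magnitude of $\breve{s}(k)$ is informative for choosing candidates and why its statistics at non-peak bins matter.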

  • A Non-Intrusive Speech Intelligibility Estimation Method Based on Deep Learning Using Autoencoder Features

    Yoonhee KIM, Deokgyu YUN, Hannah LEE, Seung Ho CHOI

    LETTER-Speech and Hearing

    Publicized: 2019/12/11
    Vol: E103-D No:3
    Page(s): 714-715

    This paper presents a deep learning-based non-intrusive speech intelligibility estimation method using the bottleneck features of an autoencoder. The conventional standard non-intrusive speech intelligibility estimation method, P.563, estimates intelligibility poorly in various noise environments. We propose a more accurate speech intelligibility estimation method based on a long short-term memory (LSTM) neural network whose input and output are autoencoder bottleneck features and a short-time objective intelligibility (STOI) score, respectively, where STOI is a standard tool for measuring intrusive speech intelligibility with reference speech signals. Comparisons with the conventional standard P.563 and with mel-frequency cepstral coefficient (MFCC) feature-based intelligibility estimation methods show that the proposed method performs better for speech signals in various noise environments.
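To make the "bottleneck feature" idea concrete, the sketch below uses a linear autoencoder, for which the squared-error optimum is known in closed form: its bottleneck spans the top principal subspace of the centred data, so an SVD gives one optimal encoder/decoder pair. This is a simplifying stand-in, not the paper's autoencoder, whose architecture is not given here; the 40-dimensional input frames, the 8-unit bottleneck, and the function names are hypothetical. The resulting `(T, d)` feature matrix is what would be fed frame by frame to the LSTM in place of MFCCs.

```python
import numpy as np

def fit_linear_autoencoder(X, d):
    """Closed-form linear autoencoder with a d-dimensional bottleneck.

    X: (T, D) training frames. For squared-error loss, an optimal linear
    encoder/decoder pair spans the top-d principal subspace, so the SVD of
    the centred data yields one optimal solution (with tied weights).
    """
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    W_enc = Vt[:d].T   # (D, d) encoder with orthonormal columns
    W_dec = Vt[:d]     # (d, D) decoder, the transpose of the encoder
    return mu, W_enc, W_dec

def encode(X, mu, W_enc):
    """Bottleneck features, shape (T, d)."""
    return (X - mu) @ W_enc

def decode(Z, mu, W_dec):
    """Reconstruction of the input frames from bottleneck features."""
    return Z @ W_dec + mu

# Toy data: 500 frames of 40-dim features that really live near an 8-dim subspace.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 8))
mixing = rng.normal(size=(8, 40))
X = latent @ mixing + 0.01 * rng.normal(size=(500, 40))

mu, W_enc, W_dec = fit_linear_autoencoder(X, d=8)
Z = encode(X, mu, W_enc)
X_hat = decode(Z, mu, W_dec)
rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X - mu)
```

A trained nonlinear autoencoder would replace `encode` with its learned encoder network; the downstream interface, one compact feature vector per frame, stays the same.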

  • A Statistical Approach to Error Compensation in Spectral Quantization

    Seung Ho CHOI, Hong Kook KIM

    LETTER-Speech and Hearing

    Vol: E90-D No:9
    Page(s): 1460-1464

    In this paper, we propose a statistical approach to improving the spectral quantization performance of speech coders. The proposed techniques compensate for the distortion in a decoded line spectrum pair (LSP) vector using a statistical mapping function between the decoded LSP vector and its corresponding original LSP vector. We first develop two codebook-based probabilistic matching (CBPM) methods by investigating the distribution of LSP vectors, and then propose an iterative procedure for the two CBPMs. Next, the proposed techniques are applied to the predictive vector quantizer (PVQ) used in the IS-641 speech coder. Experimental results show that, compared with the PVQ without any compensation, the proposed techniques reduce the average spectral distortion by around 0.064 dB and also reduce the percentage of outliers, resulting in transparent quality of spectral quantization. Finally, a speech quality comparison using the perceptual evaluation of speech quality (PESQ) measure shows that the IS-641 speech coder employing the proposed techniques produces better decoded speech quality than the standard IS-641 speech coder.
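The results above are stated as average spectral distortion (SD) and outlier percentages. A common way to compute these, sketched below, is the per-frame RMS difference between the log LPC model spectra of the original and quantized filters, with outliers counted in the 2-4 dB and above-4 dB ranges. Assumptions: the inputs are LPC polynomials (decoded LSP vectors would first be converted to LPC, which is not shown), the distance is taken over the full band on a 512-point FFT grid, and the function names are hypothetical. The CBPM compensation itself is not implemented.

```python
import numpy as np

def lpc_power_db(a, n_fft=512):
    """10*log10 of the LPC model spectrum 1/|A(e^{jw})|^2 on n_fft//2 + 1 bins.

    a: predictor polynomial [1, a1, ..., ap] of A(z) = 1 + a1 z^-1 + ... + ap z^-p.
    """
    A = np.fft.rfft(a, n=n_fft)
    return -10.0 * np.log10(np.abs(A) ** 2)

def spectral_distortion_db(a_orig, a_quant, n_fft=512):
    """Per-frame RMS log-spectral distance in dB over the full band 0..fs/2."""
    diff = lpc_power_db(a_orig, n_fft) - lpc_power_db(a_quant, n_fft)
    return float(np.sqrt(np.mean(diff ** 2)))

def sd_summary(frames_orig, frames_quant):
    """Average SD and the percentages of 2-4 dB and >4 dB outlier frames."""
    sd = np.array([spectral_distortion_db(a, b) for a, b in zip(frames_orig, frames_quant)])
    return {
        "avg_sd_db": float(sd.mean()),
        "pct_2_4_db": 100.0 * float(np.mean((sd > 2.0) & (sd <= 4.0))),
        "pct_over_4_db": 100.0 * float(np.mean(sd > 4.0)),
    }

# Toy example with first-order filters: one exact frame and one badly mismatched frame.
frames_orig = [[1.0, -0.9], [1.0, -0.9]]
frames_quant = [[1.0, -0.9], [1.0, 0.9]]
summary = sd_summary(frames_orig, frames_quant)
print(summary)
```

A commonly cited criterion treats spectral quantization as transparent when the average SD is about 1 dB, fewer than 2% of frames fall in the 2-4 dB range, and no frame exceeds 4 dB.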