The search functionality is under construction.

Keyword Search Result

[Keyword] cepstral analysis (4 hits)

Hits 1-4
  • Integration of Spectral Feature Extraction and Modeling for HMM-Based Speech Synthesis

    Kazuhiro NAKAMURA  Kei HASHIMOTO  Yoshihiko NANKAKU  Keiichi TOKUDA  

     
    PAPER-HMM-based Speech Synthesis

    Vol: E97-D  No: 6  Page(s): 1438-1448

    This paper proposes a novel approach for integrating spectral feature extraction and acoustic modeling in hidden Markov model (HMM) based speech synthesis. The statistical modeling process of speech waveforms is typically divided into two component modules: the frame-by-frame feature extraction module and the acoustic modeling module. In the feature extraction module, the statistical mel-cepstral analysis technique has been used, and the objective function is the likelihood of mel-cepstral coefficients for given speech waveforms. In the acoustic modeling module, the objective function is the likelihood of model parameters for given mel-cepstral coefficients. Improving the performance of each component module is important for achieving higher-quality synthesized speech. However, the final objective of a speech synthesis system is to generate natural speech waveforms from given texts, and improving each component module does not always improve the quality of the synthesized speech. Ideally, therefore, all objective functions should be optimized under an integrated criterion that represents subjective speech quality as perceived by humans. In this paper, we propose an approach that models speech waveforms directly and optimizes the final objective function. Experimental results show that the proposed method outperformed the conventional methods in objective and subjective measures.

  • Perceptually Weighted Mel-Cepstrum Analysis of Speech Based on Psychoacoustic Model

    Hongwu YANG  Dezhi HUANG  Lianhong CAI  

     
    LETTER-Speech and Hearing

    Vol: E89-D  No: 12  Page(s): 2998-3001

    This letter proposes a novel approach for mel-cepstral analysis based on the psychoacoustic model of MPEG. A perceptual weighting function is developed by applying cubic spline interpolation to the signal-to-mask ratios (SMRs) obtained from the psychoacoustic model. Experiments on speaker identification and speech re-synthesis showed that the proposed method improved both the speaker recognition performance and the quality of the re-synthesized speech.

  • Spectral Peak-Weighted Liftering of Cepstral Coefficients for Speech Recognition

    Hong Kook KIM  Hwang Soo LEE  

     
    PAPER-Speech and Hearing

    Vol: E83-D  No: 7  Page(s): 1540-1549

    In this paper, we propose a peak-weighted cepstral lifter (PWL) for enhancing the spectral peaks of an all-pole model spectrum in the cepstral domain. The design parameter of the PWL is the degree of pole enhancement, that is, of pole shifting toward the unit circle. The optimal pole shifting factor is chosen by considering the sensitivity to spectral resonance peaks, the variability of cepstral variances, and the recognition accuracy. Next, we generalize the PWL so that the optimal shifting factor is determined adaptively on a frame-by-frame basis. Compared with other cepstral lifters, a speech recognizer employing the frame-adaptive PWL provides better recognition performance.

  • A 16 kb/s Wideband CELP-Based Speech Coder Using Mel-Generalized Cepstral Analysis

    Kazuhito KOISHIDA  Gou HIRABAYASHI  Keiichi TOKUDA  Takao KOBAYASHI  

     
    PAPER-Speech and Hearing

    Vol: E83-D  No: 4  Page(s): 876-883

    We propose a wideband CELP-type speech coder at 16 kb/s based on a mel-generalized cepstral (MGC) analysis technique. Compared to linear predictive (LP) analysis, MGC analysis makes it possible to obtain a more accurate representation of spectral zeros and to take a perceptual frequency scale into account. A major advantage of the proposed coder is that the benefits of the MGC representation of speech spectra can be incorporated into the CELP coding process. Subjective tests show that the proposed coder at 16 kb/s achieves a significant improvement in performance over a conventional 16 kb/s CELP coder under the same coding framework and bit allocation. Moreover, the proposed coder is found to outperform the ITU-T G.722 standard at 64 kb/s.
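
The pole shifting mentioned in the liftering abstract above (Kim and Lee) can be illustrated with a small sketch. It relies on a standard identity: for a stable all-pole model 1 / prod(1 - p_i z^-1), the cepstral coefficients are c_n = (1/n) * sum_i p_i^n, so weighting c_n by gamma^n is equivalent to scaling every pole radius by gamma. The exponential lifter below is only an assumed stand-in chosen to show this effect; it is not claimed to be the PWL defined in the paper. The pole values, the factor `gamma`, and the function names are hypothetical.

```python
import cmath
import math

def allpole_cepstrum(poles, n_coef):
    """Cepstral coefficients c_1..c_n of the all-pole model 1 / prod(1 - p_i z^-1)."""
    # For poles inside the unit circle, c_n = (1/n) * sum_i p_i**n.
    # Imaginary parts cancel when the poles come in conjugate pairs.
    return [sum(p ** n for p in poles).real / n for n in range(1, n_coef + 1)]

def exponential_lifter(cep, gamma):
    """Weight c_n by gamma**n (illustrative lifter, not the paper's PWL)."""
    return [c * gamma ** n for n, c in enumerate(cep, start=1)]

# Hypothetical example: two resonances given as conjugate pole pairs.
poles = [cmath.rect(0.90, 0.3 * math.pi), cmath.rect(0.90, -0.3 * math.pi),
         cmath.rect(0.80, 0.7 * math.pi), cmath.rect(0.80, -0.7 * math.pi)]
gamma = 1.05  # > 1 moves poles outward; 0.90 * 1.05 < 1 keeps the model stable

cep = allpole_cepstrum(poles, 12)
liftered = exponential_lifter(cep, gamma)
shifted = allpole_cepstrum([gamma * p for p in poles], 12)

# Liftering the cepstrum and shifting the poles should agree up to rounding.
max_diff = max(abs(a - b) for a, b in zip(liftered, shifted))
print(max_diff)
```

Moving poles closer to the unit circle sharpens the corresponding spectral peaks, which is the sense in which a lifter of this kind "enhances" resonances; choosing the factor per frame, as the abstract describes, would require a criterion that this sketch does not attempt to reproduce.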