IEICE global.ieice.org Site

Keyword Search Result

[Keyword] cepstral analysis (4 hits)

Integration of Spectral Feature Extraction and Modeling for HMM-Based Speech Synthesis
Kazuhiro NAKAMURA Kei HASHIMOTO Yoshihiko NANKAKU Keiichi TOKUDA

PAPER-HMM-based Speech Synthesis

Vol: E97-D, No: 6, Page(s): 1438-1448
This paper proposes a novel approach for integrating spectral feature extraction and acoustic modeling in hidden Markov model (HMM)-based speech synthesis. The statistical modeling of speech waveforms is typically divided into two modules: a frame-by-frame feature extraction module and an acoustic modeling module. In the feature extraction module, statistical mel-cepstral analysis is used, and its objective function is the likelihood of the mel-cepstral coefficients given the speech waveforms. In the acoustic modeling module, the objective function is the likelihood of the model parameters given the mel-cepstral coefficients. Improving each module is important for achieving higher-quality synthesized speech. However, the final objective of a speech synthesis system is to generate natural speech waveforms from given texts, and improving each module separately does not always improve the quality of the synthesized speech. Ideally, therefore, all objective functions should be optimized under an integrated criterion that closely reflects the subjective speech quality perceived by human listeners. In this paper, we propose an approach that models speech waveforms directly and optimizes the final objective function. Experimental results show that the proposed method outperformed the conventional methods in both objective and subjective measures.

Perceptually Weighted Mel-Cepstrum Analysis of Speech Based on Psychoacoustic Model
Hongwu YANG Dezhi HUANG Lianhong CAI

LETTER-Speech and Hearing

Vol: E89-D, No: 12, Page(s): 2998-3001
This letter proposes a novel approach to mel-cepstral analysis based on the MPEG psychoacoustic model. A perceptual weighting function is developed by applying cubic spline interpolation to the signal-to-mask ratios (SMRs) obtained from the psychoacoustic model. Experiments on speaker identification and speech re-synthesis showed that the proposed method improved both speaker identification performance and the quality of the re-synthesized speech.
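The weighting step above interpolates SMR values, which the MPEG psychoacoustic model computes per subband, onto a finer frequency grid with a cubic spline. A minimal sketch of that interpolation follows, assuming a natural cubic spline (zero second derivative at the end knots), SMRs in dB given at hypothetical band-centre frequencies, and a curve held constant outside the outermost centres. The names `natural_cubic_spline` and `smr_curve` are hypothetical, and how the letter turns the interpolated curve into its final weighting function is not reproduced here.

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Return f(x) interpolating (xs[i], ys[i]) with a natural cubic spline.

    xs must be strictly increasing and contain at least two points.
    Outside [xs[0], xs[-1]] the value is held constant (an assumption).
    """
    n = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    m = [0.0] * n  # second derivatives at the knots; m[0] = m[-1] = 0 (natural)
    size = n - 2
    if size > 0:
        # Tridiagonal system for m[1..n-2], solved with the Thomas algorithm.
        sub = [h[i - 1] for i in range(1, n - 1)]
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n - 1)]
        sup = [h[i] for i in range(1, n - 1)]
        rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
               for i in range(1, n - 1)]
        for i in range(1, size):  # forward elimination
            w = sub[i] / diag[i - 1]
            diag[i] -= w * sup[i - 1]
            rhs[i] -= w * rhs[i - 1]
        sol = [0.0] * size
        sol[-1] = rhs[-1] / diag[-1]
        for i in range(size - 2, -1, -1):  # back substitution
            sol[i] = (rhs[i] - sup[i] * sol[i + 1]) / diag[i]
        m[1:n - 1] = sol

    def f(x):
        x = min(max(x, xs[0]), xs[-1])  # clamp: constant extension at both ends
        i = min(bisect_right(xs, x) - 1, n - 2)  # index of the interval holding x
        a = (xs[i + 1] - x) / h[i]
        b = (x - xs[i]) / h[i]
        return (a * ys[i] + b * ys[i + 1]
                + ((a ** 3 - a) * m[i] + (b ** 3 - b) * m[i + 1]) * h[i] ** 2 / 6.0)

    return f


def smr_curve(band_centres_hz, smr_db, fft_size, fs):
    """Hypothetical helper: SMR (dB) interpolated at FFT bins 0..fft_size//2."""
    spline = natural_cubic_spline(band_centres_hz, smr_db)
    return [spline(k * fs / fft_size) for k in range(fft_size // 2 + 1)]
```

For example, `smr_curve([500.0, 1500.0, 3000.0, 6000.0], [12.0, 8.0, 3.0, -2.0], 512, 16000)` returns 257 values, one per bin from 0 Hz to 8 kHz; the input numbers are placeholders, not data from the letter. A natural spline is chosen here only because it needs no extra boundary slopes.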

Spectral Peak-Weighted Liftering of Cepstral Coefficients for Speech Recognition
Hong Kook KIM Hwang Soo LEE

PAPER-Speech and Hearing

Vol: E83-D, No: 7, Page(s): 1540-1549
In this paper, we propose a peak-weighted cepstral lifter (PWL) for enhancing the spectral peaks of an all-pole model spectrum in the cepstral domain. The design parameter of the PWL is the degree of pole enhancement, that is, how far the poles are shifted toward the unit circle. The optimal pole-shifting factor is chosen by considering the sensitivity to spectral resonance peaks, the variability of cepstral variances, and the recognition accuracy. Next, we generalize the PWL so that the optimal shifting factor is determined adaptively on a frame-by-frame basis. A speech recognizer employing the frame-adaptive PWL achieves better recognition performance than recognizers using other cepstral lifters.
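The sketch below does not reproduce the PWL itself; it only illustrates the identity that makes pole shifting expressible in the cepstral domain. For an all-pole model 1/A(z) with poles p_k, the cepstrum is c_n = sum_k p_k^n / n for n >= 1, so scaling every pole by rho multiplies c_n by rho^n; rho > 1 moves the poles toward the unit circle and sharpens the spectral peaks. The sketch assumes the predictor sign convention A(z) = 1 - sum_k a_k z^-k, and the function names are hypothetical.

```python
def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum c_1..c_n_ceps of 1/A(z), A(z) = 1 - sum_{k=1}^{p} a[k-1] z^-k.

    Standard LPC-to-cepstrum recursion; the gain term c_0 is omitted.
    Returns a list whose element j holds c_{j+1}.
    """
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] unused (gain term)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0  # a_n term exists only for n <= p
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]


def pole_shift_lifter(c, rho):
    """Scale every pole of 1/A(z) by rho, applied in the cepstral domain.

    c[j] holds c_{j+1}; the result holds rho**(j+1) * c_{j+1}.
    rho > 1 moves poles toward the unit circle; keep rho * max|pole| < 1.
    """
    return [rho ** n * cn for n, cn in enumerate(c, start=1)]


def pole_shift_lpc(a, rho):
    """The same pole scaling in the LPC domain: a_k -> rho**k * a_k."""
    return [rho ** k * ak for k, ak in enumerate(a, start=1)]
```

With poles 0.5 and -0.3, A(z) = 1 - 0.2z^-1 - 0.15z^-2, so `a = [0.2, 0.15]`; `lpc_to_cepstrum(a, 3)` gives approximately [0.2, 0.17, 0.032667], matching (0.5^n + (-0.3)^n)/n, and `pole_shift_lifter(lpc_to_cepstrum(a, 10), 1.2)` agrees with `lpc_to_cepstrum(pole_shift_lpc(a, 1.2), 10)` up to rounding.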

A 16 kb/s Wideband CELP-Based Speech Coder Using Mel-Generalized Cepstral Analysis
Kazuhito KOISHIDA Gou HIRABAYASHI Keiichi TOKUDA Takao KOBAYASHI

PAPER-Speech and Hearing

Vol: E83-D, No: 4, Page(s): 876-883
We propose a wideband CELP-type speech coder at 16 kb/s based on a mel-generalized cepstral (MGC) analysis technique. Compared with linear predictive (LP) analysis, MGC analysis represents spectral zeros more accurately and can take a perceptual frequency scale into account. A major advantage of the proposed coder is that the benefits of the MGC representation of speech spectra can be incorporated into the CELP coding process. Subjective tests show that the proposed coder at 16 kb/s achieves a significant improvement over a conventional 16 kb/s CELP coder with the same coding framework and bit allocation. Moreover, the proposed coder is found to outperform the ITU-T G.722 standard at 64 kb/s.
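Two ingredients of MGC analysis relate to the properties cited above: the generalized logarithm s_gamma(w) = (w^gamma - 1)/gamma (with log w as the limit gamma -> 0), which spans the all-pole model (gamma = -1, with alpha = 0) and the cepstrum (gamma = 0), and the first-order all-pass substitution z~^-1 = (z^-1 - alpha)/(1 - alpha z^-1), whose phase response warps the frequency axis toward a mel-like scale for alpha > 0. The sketch below only evaluates these two functions; it is not the MGC estimation algorithm or the coder, and the function names and the example alpha value are assumptions.

```python
import math


def generalized_log(w, gamma):
    """Generalized logarithm s_gamma(w) for w > 0.

    (w**gamma - 1) / gamma for gamma != 0; log(w) for gamma == 0 (the limit).
    """
    if w <= 0:
        raise ValueError("w must be positive")
    if gamma == 0:
        return math.log(w)
    return (w ** gamma - 1.0) / gamma


def warped_frequency(omega, alpha):
    """Frequency warping of the all-pass z~^-1 = (z^-1 - alpha) / (1 - alpha z^-1).

    Returns beta(omega) = omega + 2 atan(alpha sin(omega) / (1 - alpha cos(omega))),
    which maps [0, pi] onto [0, pi]; |alpha| < 1 is required.
    alpha > 0 stretches the low-frequency region (finer resolution there).
    """
    if not -1.0 < alpha < 1.0:
        raise ValueError("|alpha| must be < 1")
    return omega + 2.0 * math.atan(alpha * math.sin(omega) / (1.0 - alpha * math.cos(omega)))
```

With an assumed alpha = 0.42, `warped_frequency(math.pi / 4, 0.42)` comes out slightly above pi/2, so the lowest quarter of the band occupies about half of the warped axis, while 0 and pi stay fixed.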
