
Keyword Search Results

[Keyword] Mandarin speech (3 hits)

Showing 1-3 of 3 hits
  • Tone Recognition of Continuous Mandarin Speech Based on Tone Nucleus Model and Neural Network

    Xiao-Dong WANG  Keikichi HIROSE  Jin-Song ZHANG  Nobuaki MINEMATSU  

     
    PAPER-Pattern Recognition

    Vol: E91-D No:6  Page(s): 1748-1755

    A method was developed for automatic recognition of syllable tone types in continuous Mandarin speech by integrating two techniques: tone nucleus modeling and a neural network classifier. Tone nucleus modeling views a syllable F0 contour as consisting of three parts: the onset course, the tone nucleus, and the offset course. The two courses are transitions from/to the F0 contours of neighboring syllables, while the tone nucleus is the intrinsic part of the contour. By viewing only the tone nucleus, acoustic features less affected by neighboring syllables are obtained. When tone nucleus modeling is used, automatic detection of the tone nucleus becomes crucial, and an improvement was added to the original detection method. Distinctive acoustic features for tone types are not limited to F0 contours; other prosodic features, such as waveform power and syllable duration, are also useful for tone recognition. Such heterogeneous features are rather difficult to handle simultaneously in hidden Markov models (HMMs) but easy to handle in neural networks, so a multi-layer perceptron (MLP) was adopted as the neural network. Tone recognition experiments were conducted for both speaker-dependent and speaker-independent cases. To show the effect of the integration, experiments were also conducted for two baselines: an HMM classifier with tone nucleus modeling, and an MLP classifier viewing the entire syllable instead of the tone nucleus. The integrated method achieved a tone recognition rate of 87.1% in the speaker-dependent case and 80.9% in the speaker-independent case, about a 10% relative error reduction compared to the baselines.
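
    As a rough illustration of the classifier stage, the sketch below feeds heterogeneous prosodic features (tone-nucleus F0 statistics, waveform power, syllable duration) into an MLP. It is a minimal stand-in under stated assumptions, not the authors' implementation: the feature layout, network size, and random placeholder data are all illustrative.

        # A minimal sketch, assuming tone-nucleus F0 features, power, and
        # duration are already extracted; shapes and values are illustrative.
        import numpy as np
        from sklearn.neural_network import MLPClassifier
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler

        # Each row: [mean F0, F0 slope, F0 range over the tone nucleus,
        #            mean waveform power, syllable duration]
        X_train = np.random.rand(200, 5)          # placeholder training features
        y_train = np.random.randint(0, 4, 200)    # Mandarin tones 1-4 as labels 0-3

        # An MLP accepts these heterogeneous features in one input vector,
        # which is the abstract's motivation for preferring it over an HMM here.
        clf = make_pipeline(StandardScaler(),
                            MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000))
        clf.fit(X_train, y_train)
        print(clf.predict(np.random.rand(3, 5)))  # tone labels for 3 test syllables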

  • Comparison of Classification Methods for Detecting Emotion from Mandarin Speech

    Tsang-Long PAO  Yu-Te CHEN  Jun-Heng YEH  

     
    PAPER-Human-computer Interaction

    Vol: E91-D No:4  Page(s): 1074-1081

    It is said that technology emerges from humanity, and the very essence of humanity is emotion: emotion underlies all human expression and everything that is done, said, thought, or imagined. If computers can perceive and respond to human emotion, human-computer interaction will become more natural. Several classifiers have been adopted for automatically assigning an emotion category, such as anger, happiness, or sadness, to a speech utterance. These classifiers were designed independently and tested on various emotional speech corpora, making it difficult to compare and evaluate their performance. In this paper, we first compared several popular classification methods and evaluated their performance by applying them to a Mandarin speech corpus covering five basic emotions: anger, happiness, boredom, sadness, and neutral. The extracted feature streams contain MFCC, LPCC, and LPC features. The experimental results show that the proposed WD-MKNN classifier achieves an accuracy of 81.4% for 5-class emotion recognition and outperforms other classification techniques, including KNN, MKNN, DW-KNN, LDA, QDA, GMM, HMM, SVM, and BPNN. Then, to verify the advantage of the proposed method, we compared these classifiers by applying them to another Mandarin expressive speech corpus consisting of two emotions. The experimental results again show that the proposed WD-MKNN outperforms the others.
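
    The paper's WD-MKNN is a specific weighted KNN variant; as a generic illustration only, the sketch below uses scikit-learn's distance-weighted KNN, which is not the paper's method. The feature dimensionality and random data are placeholders for the MFCC/LPCC/LPC streams described in the abstract.

        # A minimal sketch of distance-weighted KNN emotion classification,
        # a generic stand-in for the paper's WD-MKNN variant.
        import numpy as np
        from sklearn.neighbors import KNeighborsClassifier

        EMOTIONS = ["anger", "happiness", "boredom", "sadness", "neutral"]

        X_train = np.random.rand(500, 39)        # placeholder 39-dim feature vectors
        y_train = np.random.randint(0, 5, 500)   # one emotion label per utterance

        # weights="distance" gives nearer neighbors larger votes, the basic
        # idea behind the weighted KNN variants compared in the paper
        knn = KNeighborsClassifier(n_neighbors=5, weights="distance")
        knn.fit(X_train, y_train)
        pred = knn.predict(np.random.rand(1, 39))
        print(EMOTIONS[pred[0]])                 # predicted emotion for one utterance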

  • A Robust Speaker Identification System Based on Wavelet Transform

    Ching-Tang HSIEH  You-Chuang WANG  

     
    PAPER

    Vol: E84-D No:7  Page(s): 839-846

    A new approach for extracting significant characteristics from the speech signal to distinguish speakers is presented. Based on the multiresolution property of the wavelet transform, quadrature mirror filters (QMFs) derived by Daubechies are used to decompose the input signal into distinct frequency channels. Owing to the decorrelation property of each resolution derived from the QMFs, the Linear Predictive Coding Cepstrum (LPCC) of the lower frequency region and the entropy of the higher frequency region are calculated at each decomposition level as the speech feature vectors. In addition, a hard thresholding technique is applied to the lower resolution at each decomposition level to remove the effect of noise interference. Experimental results show that this mechanism not only effectively reduces the effect of noise interference but also improves the recognition rate. The proposed feature extraction algorithm is evaluated on the MAT telephone speech database for text-independent speaker identification using vector quantization (VQ). Several popular existing methods are also evaluated for comparison. Experimental results show that the proposed method is more effective and robust than the existing methods: for 80 speakers and 2-second utterances, the identification rate is 98.52%. In addition, the performance of the method remains satisfactory even at low SNR.
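
    The sketch below illustrates the decomposition idea only, using PyWavelets' Daubechies wavelet as the QMF pair. It is a minimal sketch under assumptions, not the authors' implementation: the threshold value, level count, and entropy definition are illustrative, and the LPCC computation on the low band is omitted.

        # A minimal sketch of multilevel QMF decomposition with hard
        # thresholding on the low band and entropy on the high band.
        import numpy as np
        import pywt

        def band_entropy(coeffs):
            """Shannon entropy of the normalized coefficient energies."""
            e = coeffs ** 2
            p = e / (e.sum() + 1e-12)
            return float(-(p * np.log2(p + 1e-12)).sum())

        signal = np.random.randn(16000)      # placeholder 1-second speech frame
        features = []
        approx = signal
        for level in range(3):               # three decomposition levels (illustrative)
            approx, detail = pywt.dwt(approx, "db4")           # Daubechies QMF split
            approx = pywt.threshold(approx, 0.1, mode="hard")  # denoise the low band
            # the paper computes LPCC on the low band; only the high-band
            # entropy feature is shown here
            features.append(band_entropy(detail))

        print(features)                      # one entropy value per level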