
Author Search Result

[Author] Yanqing SUN (3 hits)

  • Enhancing the Robustness of the Posterior-Based Confidence Measures Using Entropy Information for Speech Recognition

    Yanqing SUN  Yu ZHOU  Qingwei ZHAO  Pengyuan ZHANG  Fuping PAN  Yonghong YAN  

     
    PAPER-Robust Speech Recognition

    Vol: E93-D No:9  Page(s): 2431-2439

    In this paper, the robustness of posterior-based confidence measures is improved by utilizing entropy information, which is calculated for speech-unit-level posteriors using only the best recognition result, without requiring a larger computational load than conventional methods. Using different normalization methods, two posterior-based entropy confidence measures are proposed. Practical details are discussed for two typical levels of hidden Markov model (HMM)-based posterior confidence measures, and the two levels are compared in terms of their performance. Experiments show that the entropy information yields significant improvements in the posterior-based confidence measures. The absolute improvement in the out-of-vocabulary (OOV) rejection rate is more than 20% for both the phoneme-level and the state-level confidence measures on our embedded test sets, without a significant decline in in-vocabulary accuracy.
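    As a rough illustration of the idea, the sketch below computes an entropy-based confidence score from the per-frame posteriors of a single speech unit. The array layout, the maximum-entropy normalization and the averaging over frames are assumptions made for illustration, not the paper's exact formulation (the paper proposes two normalization schemes, at the phoneme and state levels).

    ```python
    import numpy as np

    def entropy_confidence(posteriors, eps=1e-12):
        """Illustrative entropy-based confidence for one speech unit.

        posteriors: (num_frames, num_classes) per-frame posterior probabilities
        taken from the best recognition result (phoneme- or state-level,
        depending on which of the two levels is used).
        Returns a score in [0, 1]; low entropy -> high confidence.
        """
        p = np.clip(posteriors, eps, 1.0)
        p = p / p.sum(axis=1, keepdims=True)           # renormalize each frame
        frame_entropy = -(p * np.log(p)).sum(axis=1)   # entropy per frame
        max_entropy = np.log(p.shape[1])               # entropy of a uniform distribution
        normalized = frame_entropy / max_entropy       # one possible normalization
        return 1.0 - float(normalized.mean())          # average over the unit's frames
    ```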

  • Acoustic Feature Optimization Based on F-Ratio for Robust Speech Recognition

    Yanqing SUN  Yu ZHOU  Qingwei ZHAO  Yonghong YAN  

     
    PAPER-Robust Speech Recognition

    Vol: E93-D No:9  Page(s): 2417-2430

    This paper focuses on the problem of performance degradation in mismatched speech recognition. The F-Ratio analysis method is used to assess the significance of different frequency bands for speech unit classification, and we find that frequencies around 1 kHz and 3 kHz, which are the upper bounds of the first and second formants for most vowels, should be emphasized more than they are in the Mel-frequency cepstral coefficients (MFCC). This analysis result is further observed to be stable in several typical mismatched situations. Analogous to the Mel-frequency scale, a new frequency scale called the F-Ratio-scale is therefore proposed to optimize the filter-bank design for the MFCC features so that each subband carries equal significance for speech unit classification. Under comparable conditions, the modified features give a relative decrease in sentence error rate, compared with MFCC, of 43.20% for emotion-affected speech recognition, 35.54% and 23.03% for noisy speech recognition at 15 dB and 0 dB SNR (signal-to-noise ratio) respectively, and 64.50% for the three years' 863 test data. Applying the F-Ratio analysis to the clean training set of the Aurora2 database demonstrates its robustness across languages, texts and sampling rates.
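    For reference, the classical F-Ratio statistic used in this kind of band-significance analysis can be sketched as below; the band energies and speech-unit labels are hypothetical inputs, and the mapping from these ratios to the proposed F-Ratio-scale filter bank follows the paper rather than this snippet.

    ```python
    import numpy as np

    def band_f_ratio(band_energy, labels):
        """F-Ratio of a single frequency band for speech unit classification.

        band_energy: (num_frames,) energy of this band per frame.
        labels:      (num_frames,) speech-unit class id (e.g. phoneme) per frame.
        Returns between-class variance / within-class variance; a larger value
        means the band contributes more to separating the classes.
        """
        band_energy = np.asarray(band_energy, dtype=float)
        labels = np.asarray(labels)
        overall_mean = band_energy.mean()
        between, within = 0.0, 0.0
        for c in np.unique(labels):
            x = band_energy[labels == c]
            between += len(x) * (x.mean() - overall_mean) ** 2
            within += ((x - x.mean()) ** 2).sum()
        return between / within
    ```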

  • A Hybrid Speech Emotion Recognition System Based on Spectral and Prosodic Features

    Yu ZHOU  Junfeng LI  Yanqing SUN  Jianping ZHANG  Yonghong YAN  Masato AKAGI  

     
    PAPER-Human-computer Interaction

    Vol: E93-D No:10  Page(s): 2813-2821

    In this paper, we present a hybrid speech emotion recognition system that exploits both spectral and prosodic features of speech. To capture the emotional information in the spectral domain, we propose a new spectral feature extraction method that applies a novel non-uniform subband processing instead of the mel-frequency subbands used in Mel-Frequency Cepstral Coefficients (MFCC). For prosodic features, a set of features closely correlated with speech emotional states is selected. In the proposed hybrid system, because of the inherently different characteristics of these two kinds of features (e.g., data size), the newly extracted spectral features are modeled with a Gaussian Mixture Model (GMM) and the selected prosodic features with a Support Vector Machine (SVM). The final result is obtained by combining the outputs of the two subsystems. Experimental results show that (1) the proposed non-uniform spectral features are more effective than the traditional MFCC features for emotion recognition, and (2) the proposed hybrid system using both spectral and prosodic features yields a relative recognition error reduction of 17.0% over recognition systems using only the spectral features, and of 62.3% over those using only the prosodic features.
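    A minimal sketch of such a hybrid scheme, assuming scikit-learn's GaussianMixture and SVC and a simple weighted score fusion (the actual combination rule in the paper may differ), is given below; all function names and parameters are illustrative.

    ```python
    import numpy as np
    from sklearn.mixture import GaussianMixture
    from sklearn.svm import SVC

    def train_hybrid(spectral_frames_by_emotion, prosodic_vectors, prosodic_labels):
        """One GMM per emotion on frame-level spectral features,
        one SVM on utterance-level prosodic feature vectors."""
        gmms = {emo: GaussianMixture(n_components=8).fit(frames)
                for emo, frames in spectral_frames_by_emotion.items()}
        svm = SVC(probability=True).fit(prosodic_vectors, prosodic_labels)
        return gmms, svm

    def classify(gmms, svm, spectral_frames, prosodic_vector, alpha=0.5):
        """Fuse the two subsystems with a weighted sum of their posteriors
        (alpha and the fusion rule are assumptions for illustration)."""
        emotions = sorted(gmms)                      # must match svm.classes_ ordering
        gmm_ll = np.array([gmms[e].score(spectral_frames) for e in emotions])
        gmm_post = np.exp(gmm_ll - gmm_ll.max())
        gmm_post /= gmm_post.sum()                   # softmax over average log-likelihoods
        svm_post = svm.predict_proba([prosodic_vector])[0]
        fused = alpha * gmm_post + (1.0 - alpha) * svm_post
        return emotions[int(np.argmax(fused))]
    ```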