IEICE global.ieice.org Site

Keyword Search Result

[Keyword] CMN (4 hits)


Complex Noisy Independent Component Analysis by Negentropy Maximization
Guobing QIAN, Liping LI, Hongshu LIAO

LETTER-Noise and Vibration

Vol: E97-A No: 12
Page(s): 2641-2644
Maximizing non-Gaussianity is an effective approach to solving the complex independent component analysis (ICA) problem. However, the traditional complex maximization of non-Gaussianity (CMN) algorithm does not account for noise. In this letter, a modification of the fixed-point algorithm is proposed for the more practical complex noisy ICA model. Simulations show that, when the sample size is sufficient, the proposed method performs significantly better than the traditional CMN algorithm on the noisy ICA model.
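
To illustrate what a complex fixed-point update looks like, here is a minimal NumPy sketch of the classic noiseless one-unit update in the style of complex FastICA (Bingham and Hyvärinen). It is neither the noisy-ICA modification proposed in the letter nor the full CMN algorithm. It assumes whitened mixtures of second-order circular sources and a kurtosis-type contrast G(u) = u^2 / 2 with u = |w^H z|^2; the function names whiten and complex_fixed_point_one_unit are hypothetical.

```python
import numpy as np


def whiten(x):
    """Center and whiten complex data x of shape (n_channels, n_samples)."""
    x = x - x.mean(axis=1, keepdims=True)
    cov = (x @ x.conj().T) / x.shape[1]      # Hermitian sample covariance
    d, e = np.linalg.eigh(cov)
    v = e @ np.diag(d ** -0.5) @ e.conj().T  # symmetric whitening matrix
    return v @ x


def complex_fixed_point_one_unit(z, n_iter=200, tol=1e-10, seed=0):
    """Estimate one demixing vector w from whitened complex data z.

    Assumed contrast: G(u) = u**2 / 2 with u = |w^H z|^2,
    so g(u) = u and g'(u) = 1. Additive noise is ignored.
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(z.shape[0]) + 1j * rng.standard_normal(z.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        y = w.conj() @ z                      # y = w^H z for every sample
        u = np.abs(y) ** 2
        g, dg = u, np.ones_like(u)
        # w+ = E{z y^* g(u)} - E{g(u) + u g'(u)} w
        w_new = (z * (y.conj() * g)).mean(axis=1) - (g + u * dg).mean() * w
        w_new /= np.linalg.norm(w_new)
        # Converged when w stops changing up to a complex phase factor.
        done = abs(abs(np.vdot(w_new, w)) - 1.0) < tol
        w = w_new
        if done:
            break
    return w
```

Applied to whitened mixtures of two QPSK sources, y = w^H z recovers one source up to a phase factor. Because this baseline ignores additive noise, it corresponds to the setting whose limitation the letter addresses.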
Distant Speech Recognition Using a Microphone Array Network
Alberto Yoshihiro NAKANO, Seiichi NAKAGAWA, Kazumasa YAMAMOTO

PAPER-Microphone Array

Vol: E93-D No: 9
Page(s): 2451-2462
In this work, an artificial neural network (ANN) estimates spatial information about an acoustic source, namely its position and orientation angle. The estimated position of a speaker in an enclosed space is used to refine the estimated time delays for a delay-and-sum beamformer, thereby enhancing the output signal. The orientation angle, in turn, is used to restrict the lexicon used in the recognition phase, assuming that the speaker faces a particular direction while speaking. To compensate for the effect of the transmission channel within a short analysis window, a new cepstral mean normalization (CMN) method based on a Gaussian mixture model (GMM) is investigated; it outperforms conventional CMN on short utterances. The proposed method is evaluated through Japanese digit/command recognition experiments.
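
The step in which an estimated speaker position refines the beamformer delays can be sketched as follows. This is a minimal delay-and-sum example, not the paper's implementation: it assumes free-field propagation, a speed of sound of 343 m/s, and integer-sample delays, and the function name delay_and_sum is hypothetical. The ANN position estimate would simply be passed in as source_position.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def delay_and_sum(signals, mic_positions, source_position, fs):
    """Align microphone signals on a known source position and average them.

    signals:         (n_mics, n_samples) array
    mic_positions:   (n_mics, 3) array in metres
    source_position: (3,) array in metres, e.g. an estimated speaker position
    fs:              sampling rate in Hz
    """
    dists = np.linalg.norm(mic_positions - source_position, axis=1)
    # Extra propagation time relative to the closest microphone.
    delays = (dists - dists.min()) / SPEED_OF_SOUND
    shifts = np.round(delays * fs).astype(int)  # integer-sample approximation
    n_out = signals.shape[1] - shifts.max()
    # A farther microphone hears the wavefront later, so skip its first samples.
    aligned = np.stack([sig[k:k + n_out] for sig, k in zip(signals, shifts)])
    return aligned.mean(axis=0)
```

Averaging the aligned channels reinforces the direct-path speech while uncorrelated noise partially cancels; a more accurate position estimate gives more accurate shifts and therefore better alignment.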
Effective Acoustic Modeling for Pronunciation Quality Scoring of Strongly Accented Mandarin Speech
Fengpei GE, Changliang LIU, Jian SHAO, Fuping PAN, Bin DONG, Yonghong YAN

PAPER-Speech and Hearing

Vol: E91-D No: 10
Page(s): 2485-2492
In this paper, we investigate how to improve the performance of our computer-assisted language learning (CALL) system by exploiting the acoustic model and features within the speech recognition framework. First, to alleviate channel distortion, we adopt speaker-dependent cepstral mean normalization (CMN), which improves the average correlation coefficient (average CC) between machine and expert scores from 78.00% to 84.14%. Second, we adopt heteroscedastic linear discriminant analysis (HLDA) to enhance the discriminability of the acoustic model, which increases the average CC from 84.14% to 84.62%. HLDA also makes the scoring accuracy more stable across pronunciation proficiency levels, increasing the speaker correct-rank rate from 85.59% to 90.99%. Finally, we use maximum a posteriori (MAP) estimation to adapt the acoustic model to strongly accented test speech, which improves the average CC from 84.62% to 86.57%. Together, these three techniques improve the accuracy of pronunciation quality evaluation.
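
CMN works because a convolutional channel becomes an additive constant in the cepstral domain, so subtracting a cepstral mean removes it. The sketch below contrasts conventional per-utterance CMN with a speaker-dependent variant; it assumes the speaker mean is pooled over all frames of that speaker's utterances, which is one plausible reading rather than the paper's documented procedure. The function names are hypothetical.

```python
from collections import defaultdict

import numpy as np


def utterance_cmn(cepstra):
    """Conventional CMN: subtract the mean over the frames of one utterance."""
    return cepstra - cepstra.mean(axis=0, keepdims=True)


def speaker_cmn(utterances, speaker_ids):
    """Speaker-dependent CMN (assumed form): one mean pooled over all frames of a speaker.

    utterances:  list of (n_frames, n_coeffs) cepstral arrays
    speaker_ids: speaker label for each utterance
    """
    pooled = defaultdict(list)
    for cep, spk in zip(utterances, speaker_ids):
        pooled[spk].append(cep)
    means = {spk: np.concatenate(ceps, axis=0).mean(axis=0) for spk, ceps in pooled.items()}
    return [cep - means[spk] for cep, spk in zip(utterances, speaker_ids)]
```

Pooling over a speaker's utterances gives a more stable mean estimate than a single short utterance, while a constant channel offset shared by those utterances is still removed exactly.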
Robust Speech Recognition by Combining Short-Term and Long-Term Spectrum Based Position-Dependent CMN with Conventional CMN
Longbiao WANG, Seiichi NAKAGAWA, Norihide KITAOKA

PAPER-ASR under Reverberant Conditions

Vol: E91-D No: 3
Page(s): 457-466
In a distant-talking environment, the channel impulse response is longer than the short-term spectral analysis window. Conventional short-term spectrum based cepstral mean normalization (CMN) is therefore not effective under these conditions. In this paper, we propose a robust speech recognition method that combines short-term spectrum based CMN with long-term spectrum based CMN. We assume that a static speech segment affected by reverberation, such as a vowel, can be modeled by long-term cepstral analysis. Thus, the effect of long reverberation on a static speech segment may be compensated by long-term spectrum based CMN. The cepstral distance between neighboring frames is used to discriminate static speech segments (long-term spectrum) from non-static speech segments (short-term spectrum), and the cepstra of the static and non-static segments are normalized by the corresponding cepstral means. In a previous study, we proposed an environmentally robust speech recognition method based on Position-Dependent CMN (PDCMN), which compensates for channel distortion that depends on speaker position and is more efficient than conventional CMN. In this paper, the concept of combining short-term and long-term spectrum based CMN is extended to PDCMN; we call this Variable Term spectrum based PDCMN (VT-PDCMN). Because a position-dependent cepstral mean contains the average characteristics of all speakers, PDCMN and VT-PDCMN cannot normalize speaker variations, so we also combine PDCMN/VT-PDCMN with conventional CMN. We evaluated the proposed method on limited-vocabulary (100-word) distant-talking isolated word recognition in a real environment. It achieved a relative error reduction rate of 60.9% over conventional short-term spectrum based CMN and 30.6% over short-term spectrum based PDCMN.
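
The frame-wise choice between short-term and long-term CMN can be sketched as below. This is an illustration under stated assumptions, not the authors' code: frame-aligned short-term and long-term cepstra are assumed to be computed elsewhere, the Euclidean cepstral distance and the threshold value are hypothetical choices, and "corresponding cepstral mean" is taken to mean each stream's own mean. Passing precomputed position-dependent means instead of utterance means gives a PDCMN-style variant; the function name variable_term_cmn is hypothetical.

```python
import numpy as np


def variable_term_cmn(short_cep, long_cep, threshold, short_mean=None, long_mean=None):
    """Normalize each frame with short-term or long-term CMN depending on stationarity.

    short_cep, long_cep: frame-aligned (n_frames, n_coeffs) cepstra from a short and
        a long analysis window (at least two frames assumed).
    threshold: hypothetical distance below which a frame is treated as static.
    short_mean, long_mean: optional precomputed means, e.g. position-dependent means;
        they default to the utterance means of each stream.
    Returns the normalized cepstra and the boolean static-frame mask.
    """
    if short_mean is None:
        short_mean = short_cep.mean(axis=0)
    if long_mean is None:
        long_mean = long_cep.mean(axis=0)
    # Euclidean cepstral distance between neighbouring short-term frames.
    dist = np.linalg.norm(np.diff(short_cep, axis=0), axis=1)
    dist = np.concatenate([dist[:1], dist])  # reuse the first distance for frame 0
    static = dist < threshold
    # Static frames use long-term CMN; non-static frames use short-term CMN.
    normalized = np.where(static[:, None], long_cep - long_mean, short_cep - short_mean)
    return normalized, static
```

Because the means are parameters, the same selection logic covers both the utterance-level variant and the position-dependent variant; only the source of the means changes.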
