IEICE global.ieice.org Site

Author Search Result

[Author] Jae Sam YOON (2 hits)

HMM-Based Mask Estimation for a Speech Recognition Front-End Using Computational Auditory Scene Analysis
Ji Hun PARK, Jae Sam YOON, Hong Kook KIM

LETTER-Speech and Hearing

Vol. E91-D, No. 9, pp. 2360-2364
In this paper, we propose a new mask estimation method for the computational auditory scene analysis (CASA) of speech using two microphones. The proposed method is based on a hidden Markov model (HMM), which incorporates the observation that mask information should be correlated across contiguous analysis frames. In other words, an HMM is used to estimate the mask information, represented by the interaural time difference (ITD) and the interaural level difference (ILD) of the two channel signals, and the estimated mask is then used to separate the desired speech from noisy speech. To show the effectiveness of the proposed mask estimation, we compare the speech recognition performance of the proposed method with that of a Gaussian kernel-based estimation method. As a result, the proposed HMM-based mask estimation method provided an average word error rate reduction of 61.4% compared with the Gaussian kernel-based mask estimation method.
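The abstract does not give the model details, so the following is only a minimal sketch of why an HMM helps: it assumes a two-state HMM per frequency channel (state 0 = noise-dominated, state 1 = target-dominated) with diagonal-Gaussian emissions on precomputed [ITD, ILD] pairs, decoded with the Viterbi algorithm. Sticky transition probabilities encode the assumption that mask values persist across contiguous frames. The function names, feature layout, and every parameter value are hypothetical. The `framewise_mask` function is a generic frame-independent classifier used for contrast; it is not the Gaussian kernel-based method the paper compares against.

```python
import numpy as np

def log_gauss(x, mean, var):
    """Diagonal-Gaussian log-density of each row of x (shape (T, D))."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var, axis=-1)

def emission_loglik(feats, means, variances):
    """Log-likelihood of every frame under each of the two states; shape (T, 2)."""
    return np.stack([log_gauss(feats, means[s], variances[s]) for s in range(2)], axis=1)

def viterbi_mask(feats, means, variances, trans, prior):
    """Most likely 0/1 mask sequence over frames for one frequency channel.

    feats holds one hypothetical [ITD, ILD] pair per frame;
    trans[i, j] = P(state j at frame t | state i at frame t-1).
    """
    log_b = emission_loglik(feats, means, variances)
    log_a = np.log(trans)
    T = len(feats)
    delta = np.log(prior) + log_b[0]
    backptr = np.zeros((T, 2), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_a          # scores[i, j]: move from state i to state j
        backptr[t] = np.argmax(scores, axis=0)   # best predecessor of each state
        delta = scores.max(axis=0) + log_b[t]
    path = np.zeros(T, dtype=int)
    path[-1] = np.argmax(delta)
    for t in range(T - 1, 0, -1):                # backtrack the best state sequence
        path[t - 1] = backptr[t, path[t]]
    return path

def framewise_mask(feats, means, variances):
    """Independent per-frame decision with no temporal model, for comparison."""
    return np.argmax(emission_loglik(feats, means, variances), axis=1)

# Toy example with hypothetical, already-normalized ITD/ILD cues.
means = np.array([[1.0, -3.0],    # state 0: noise-dominated
                  [0.0, 0.0]])    # state 1: target-dominated
variances = np.array([[0.25, 4.0], [0.25, 4.0]])
trans = np.array([[0.95, 0.05], [0.05, 0.95]])   # masks tend to persist across frames
prior = np.array([0.5, 0.5])

true_mask = np.array([1] * 10 + [0] * 10)
observed = true_mask.copy()
observed[5] = 0                   # one target frame whose cues look noise-dominated
feats = means[observed]           # (20, 2) feature matrix

hmm_mask = viterbi_mask(feats, means, variances, trans, prior)
frame_mask = framewise_mask(feats, means, variances)
```

In this toy run the frame-wise decision flips frame 5 to "noise", while the Viterbi path keeps it as "target" because leaving and re-entering state 1 costs more than the single frame's likelihood gain. That is the temporal-correlation effect the abstract attributes to the HMM.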

A MFCC-Based CELP Speech Coder for Server-Based Speech Recognition in Network Environments
Jae Sam YOON, Gil Ho LEE, Hong Kook KIM

PAPER-Speech/Audio Processing

Vol. E90-A, No. 3, pp. 626-632
Existing standard speech coders provide high-quality speech communication. However, they tend to degrade the performance of automatic speech recognition (ASR) systems that operate on the reconstructed speech. The main cause of the degradation is that the linear predictive coefficients (LPCs), which are the typical spectral envelope parameters in speech coding, are optimized for speech quality rather than for speech recognition performance. In this paper, we propose a speech coder that uses mel-frequency cepstral coefficients (MFCCs) instead of LPCs to improve the performance of a server-based speech recognition system in network environments. To achieve a low bit rate, we first exploit the interframe correlation of MFCCs, which leads to a predictive quantization scheme for MFCCs. Second, we propose a safety-net scheme to make the MFCC-based speech coder robust to channel errors. The result is an 8.7 kbps MFCC-based CELP coder. The proposed coder provides speech quality comparable to that of the 8 kbps G.729 coder, and the ASR system using it achieves a relative word error rate reduction of 6.8% compared with the ASR system using G.729 on a large-vocabulary task (AURORA4).
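The abstract does not describe the quantizer design, so the following is a minimal sketch of switched predictive / safety-net quantization as the idea is commonly described, not the paper's actual scheme. It assumes first-order interframe prediction with a single coefficient and uniform scalar quantizers (the paper may well use vector quantization). For each frame, the encoder tries a predictive mode that codes the residual x_t - ALPHA * xhat_{t-1} on a fine, narrow grid and a memoryless safety-net mode that codes x_t directly on a coarse, wide grid. It keeps whichever has lower squared error and signals the choice with one flag bit. `ALPHA`, `LEVELS`, `PRED_RANGE`, `SN_RANGE`, and all function names are hypothetical.

```python
import numpy as np

# Hypothetical codec parameters (not taken from the paper).
ALPHA = 0.7          # first-order interframe prediction coefficient
LEVELS = 16          # 4-bit scalar quantizer per coefficient
PRED_RANGE = 2.0     # residual grid covers [-2, 2]: fine steps
SN_RANGE = 20.0      # safety-net grid covers [-20, 20]: coarse steps

def dequantize(idx, lim):
    """Map integer indices to points of a uniform LEVELS-point grid on [-lim, lim]."""
    return -lim + idx * (2 * lim / (LEVELS - 1))

def quantize(x, lim):
    """Nearest grid-point index for each element of x, clipped to the grid."""
    step = 2 * lim / (LEVELS - 1)
    return np.clip(np.round((x + lim) / step), 0, LEVELS - 1).astype(int)

def encode(frames):
    """Closed-loop switched predictive / safety-net quantization of MFCC vectors."""
    prev = np.zeros(frames.shape[1])       # decoder state, mirrored in the encoder
    flags, indices, recon = [], [], []
    for x in frames:
        pred = ALPHA * prev
        idx_p = quantize(x - pred, PRED_RANGE)          # predictive mode: code the residual
        cand_p = pred + dequantize(idx_p, PRED_RANGE)
        idx_s = quantize(x, SN_RANGE)                   # safety-net mode: code x directly
        cand_s = dequantize(idx_s, SN_RANGE)
        use_pred = np.sum((x - cand_p) ** 2) <= np.sum((x - cand_s) ** 2)
        flags.append(bool(use_pred))                    # one flag bit per frame
        indices.append(idx_p if use_pred else idx_s)
        prev = cand_p if use_pred else cand_s           # predict from the reconstruction
        recon.append(prev)
    return np.array(flags), np.array(indices), np.array(recon)

def decode(flags, indices):
    """Rebuild MFCC vectors from the transmitted flags and indices."""
    prev = np.zeros(indices.shape[1])
    out = []
    for use_pred, idx in zip(flags, indices):
        if use_pred:
            prev = ALPHA * prev + dequantize(idx, PRED_RANGE)
        else:
            prev = dequantize(idx, SN_RANGE)            # independent of earlier frames
        out.append(prev)
    return np.array(out)

# Toy 2-D "MFCC" track: slow drift, then an abrupt change at frame 5 (e.g., an onset).
t = np.arange(10)
frames = np.where((t < 5)[:, None],
                  np.stack([0.2 * t, -0.1 * t], axis=1),
                  np.array([8.0, -6.0]))

flags, idx, recon = encode(frames)
clean = decode(flags, idx)

# Simulate a channel error: corrupt one index in predictive frame 2.
bad_idx = idx.copy()
bad_idx[2, 0] = 0
corrupted = decode(flags, bad_idx)
```

Slowly varying frames take the fine predictive path, and the abrupt change at frame 5 falls back to the safety-net quantizer. Because a safety-net frame does not depend on earlier reconstructions, the error injected at frame 2 propagates only until frame 5 and leaves later frames unaffected. This is the kind of channel-error robustness the safety-net scheme targets.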
