HMM-Based Mask Estimation for a Speech Recognition Front-End Using Computational Auditory Scene Analysis

Ji Hun PARK; Jae Sam YOON; Hong Kook KIM

doi:10.1093/ietisy/e91-d.9.2360

Summary:

In this paper, we propose a new mask estimation method for the computational auditory scene analysis (CASA) of speech using two microphones. The proposed method is based on a hidden Markov model (HMM) so that it can exploit the observation that mask information should be correlated over contiguous analysis frames. In other words, an HMM is used to estimate the mask information, represented as the interaural time difference (ITD) and the interaural level difference (ILD) of the two channel signals, and the estimated mask information is then employed to separate the desired speech from noisy speech. To show the effectiveness of the proposed mask estimation, we compare the proposed method with a Gaussian kernel-based estimation method in terms of speech recognition performance. As a result, the proposed HMM-based mask estimation method provided an average word error rate reduction of 61.4% when compared with the Gaussian kernel-based mask estimation method.
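
The idea that mask values should be correlated across neighboring frames can be illustrated with a small sketch. The abstract does not give the paper's state definition, observation model, or HMM topology, so everything below is an assumption made for illustration: a generic two-state HMM (0 = noise-dominant, 1 = speech-dominant) for a single frequency channel, decoded with the standard Viterbi algorithm. The function name `viterbi_mask` and all probability values are hypothetical, and the per-frame likelihoods stand in for whatever ITD/ILD-based observation model is actually used.

```python
import math

def viterbi_mask(log_lik, log_trans, log_prior):
    """Most likely 0/1 mask sequence for one frequency channel.

    log_lik[t][s]   : log-likelihood of the frame-t observation under state s
    log_trans[i][j] : log-probability of moving from state i to state j
    log_prior[s]    : log-probability of starting in state s
    """
    n_states = len(log_prior)
    # Best path score ending in each state at frame 0.
    score = [log_prior[s] + log_lik[0][s] for s in range(n_states)]
    back = []  # back[t-1][j] = best predecessor of state j at frame t
    for t in range(1, len(log_lik)):
        new_score, pointers = [], []
        for j in range(n_states):
            best_i = max(range(n_states),
                         key=lambda i: score[i] + log_trans[i][j])
            pointers.append(best_i)
            new_score.append(score[best_i] + log_trans[best_i][j] + log_lik[t][j])
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

# Hypothetical numbers: "sticky" transitions encode frame-to-frame correlation.
log_prior = [math.log(0.5), math.log(0.5)]
log_trans = [[math.log(0.9), math.log(0.1)],
             [math.log(0.1), math.log(0.9)]]
# Per-frame likelihoods (state 0, state 1); frame 2 weakly favors state 0.
lik = [(0.2, 0.8), (0.2, 0.8), (0.55, 0.45), (0.2, 0.8), (0.2, 0.8)]
log_lik = [[math.log(a), math.log(b)] for a, b in lik]

framewise = [0 if a > b else 1 for a, b in lik]
smoothed = viterbi_mask(log_lik, log_trans, log_prior)
print(framewise)  # [1, 1, 0, 1, 1]
print(smoothed)   # [1, 1, 1, 1, 1]
```

In this toy case a frame-by-frame decision flips the mask on the one ambiguous frame, while the HMM keeps it speech-dominant because a single-frame switch is penalized by the transition probabilities.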

Publication: IEICE TRANSACTIONS on Information Vol.E91-D No.9 pp.2360-2364

Publication Date: 2008/09/01

Online ISSN: 1745-1361

DOI: 10.1093/ietisy/e91-d.9.2360

Type of Manuscript: LETTER

Category: Speech and Hearing

Ji Hun PARK, Jae Sam YOON, Hong Kook KIM, "HMM-Based Mask Estimation for a Speech Recognition Front-End Using Computational Auditory Scene Analysis" in IEICE TRANSACTIONS on Information, vol. E91-D, no. 9, pp. 2360-2364, September 2008, doi: 10.1093/ietisy/e91-d.9.2360.
URL: https://global.ieice.org/en_transactions/information/10.1093/ietisy/e91-d.9.2360/_p

@ARTICLE{e91-d_9_2360,
author={Ji Hun PARK and Jae Sam YOON and Hong Kook KIM},
journal={IEICE TRANSACTIONS on Information},
title={HMM-Based Mask Estimation for a Speech Recognition Front-End Using Computational Auditory Scene Analysis},
year={2008},
volume={E91-D},
number={9},
pages={2360-2364},
abstract={In this paper, we propose a new mask estimation method for the computational auditory scene analysis (CASA) of speech using two microphones. The proposed method is based on a hidden Markov model (HMM) in order to incorporate an observation that the mask information should be correlated over contiguous analysis frames. In other words, HMM is used to estimate the mask information represented as the interaural time difference (ITD) and the interaural level difference (ILD) of two channel signals, and the estimated mask information is finally employed in the separation of desired speech from noisy speech. To show the effectiveness of the proposed mask estimation, we then compare the performance of the proposed method with that of a Gaussian kernel-based estimation method in terms of the performance of speech recognition. As a result, the proposed HMM-based mask estimation method provided an average word error rate reduction of 61.4% when compared with the Gaussian kernel-based mask estimation method.},
keywords={},
doi={10.1093/ietisy/e91-d.9.2360},
ISSN={1745-1361},
month={September},}

TY  - JOUR
TI  - HMM-Based Mask Estimation for a Speech Recognition Front-End Using Computational Auditory Scene Analysis
T2  - IEICE TRANSACTIONS on Information
SP  - 2360
EP  - 2364
AU  - Ji Hun PARK
AU  - Jae Sam YOON
AU  - Hong Kook KIM
PY  - 2008
DO  - 10.1093/ietisy/e91-d.9.2360
JO  - IEICE TRANSACTIONS on Information
SN  - 1745-1361
VL  - E91-D
IS  - 9
JA  - IEICE TRANSACTIONS on Information
Y1  - September 2008
AB  - In this paper, we propose a new mask estimation method for the computational auditory scene analysis (CASA) of speech using two microphones. The proposed method is based on a hidden Markov model (HMM) in order to incorporate an observation that the mask information should be correlated over contiguous analysis frames. In other words, HMM is used to estimate the mask information represented as the interaural time difference (ITD) and the interaural level difference (ILD) of two channel signals, and the estimated mask information is finally employed in the separation of desired speech from noisy speech. To show the effectiveness of the proposed mask estimation, we then compare the performance of the proposed method with that of a Gaussian kernel-based estimation method in terms of the performance of speech recognition. As a result, the proposed HMM-based mask estimation method provided an average word error rate reduction of 61.4% when compared with the Gaussian kernel-based mask estimation method.
ER  - 