Keyword Search Result

[Keyword] mask estimation (2 hits)

Results 1-2 of 2
  • A Two-Stage Phase-Aware Approach for Monaural Multi-Talker Speech Separation

    Lu YIN  Junfeng LI  Yonghong YAN  Masato AKAGI  

     
    PAPER-Speech and Hearing

      Publicized:
    2020/04/20
      Vol:
    E103-D No:7
      Page(s):
    1732-1743

    Simultaneous utterances degrade the listening ability of hearing-impaired persons and the performance of automatic speech recognition systems. Recently, deep neural networks have dramatically improved speech separation performance. However, most previous works estimate only the speech magnitude and reuse the mixture phase for speech reconstruction; this reliance on the mixture phase has become a critical limitation on separation performance. This study proposes a two-stage phase-aware approach for multi-talker speech separation that jointly recovers the magnitude and the phase. For phase recovery, the Multiple Input Spectrogram Inversion (MISI) algorithm is adopted for its effectiveness and simplicity. The study implements a mask-based MISI algorithm and shows that the ideal amplitude mask (IAM) is the optimal mask for mask-based MISI phase recovery, introducing less phase distortion. To compensate for the phase-recovery error and minimize signal distortion, an advanced mask is proposed for magnitude estimation. The IAM and the proposed mask are estimated at different stages to recover the phase and the magnitude, respectively. Two neural-network frameworks are evaluated for magnitude estimation in the second stage, demonstrating the effectiveness and flexibility of the proposed approach. Experimental results show that the proposed approach significantly reduces the distortion of the separated speech.
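
    A minimal sketch of mask-based MISI phase recovery, in the spirit of the abstract above, is given below. It assumes the per-talker magnitudes come from an IAM-style mask (clean-to-mixture magnitude ratio) applied to the mixture spectrogram; the function name misi, the STFT parameters, and the iteration count are illustrative and not taken from the paper.

        # Illustrative sketch, not the paper's code: mask-based MISI phase recovery.
        # est_mags are per-talker magnitude spectrograms, e.g. an estimated IAM
        # applied to the mixture magnitude, computed with the same STFT parameters
        # used below.
        import numpy as np
        import librosa

        def misi(mixture, est_mags, n_fft=512, hop=128, n_iter=20):
            """Refine per-talker phases so the estimates stay consistent with the mixture."""
            mix_spec = librosa.stft(mixture, n_fft=n_fft, hop_length=hop)
            # Common baseline: initialise every talker with the mixture phase.
            phases = [np.angle(mix_spec) for _ in est_mags]
            n_src = len(est_mags)
            for _ in range(n_iter):
                # Resynthesise each talker from its current magnitude/phase estimate.
                sigs = [librosa.istft(m * np.exp(1j * p), hop_length=hop,
                                      length=len(mixture))
                        for m, p in zip(est_mags, phases)]
                # Mixture-consistency error, split evenly across talkers
                # (the standard MISI update).
                err = (mixture - np.sum(sigs, axis=0)) / n_src
                # Re-analyse the corrected signals and keep only their phases.
                phases = [np.angle(librosa.stft(s + err, n_fft=n_fft, hop_length=hop))
                          for s in sigs]
            return [m * np.exp(1j * p) for m, p in zip(est_mags, phases)]

    The even split of the mixture-consistency error across talkers is the standard MISI update; the estimated magnitudes themselves are kept fixed and only the phases are refined.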

  • HMM-Based Mask Estimation for a Speech Recognition Front-End Using Computational Auditory Scene Analysis

    Ji Hun PARK  Jae Sam YOON  Hong Kook KIM  

     
    LETTER-Speech and Hearing

      Vol:
    E91-D No:9
      Page(s):
    2360-2364

    In this paper, we propose a new mask estimation method for the computational auditory scene analysis (CASA) of speech using two microphones. The proposed method is based on a hidden Markov model (HMM) in order to exploit the observation that mask information is correlated over contiguous analysis frames. In other words, an HMM is used to estimate the mask information, represented by the interaural time difference (ITD) and the interaural level difference (ILD) of the two-channel signals, and the estimated mask is then employed to separate the desired speech from noisy speech. To show the effectiveness of the proposed mask estimation, we compare the performance of the proposed method with that of a Gaussian kernel-based estimation method in terms of speech recognition performance. As a result, the proposed HMM-based mask estimation method provided an average word error rate reduction of 61.4% compared with the Gaussian kernel-based mask estimation method.
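
    The frame-correlation idea behind this HMM-based mask estimation can be illustrated roughly as follows: extract ILD and an interaural-phase (ITD proxy) feature from the two-channel STFTs, then decode a two-state (target- vs. interference-dominant) sequence over frames with Viterbi so the mask decision is smoothed across contiguous frames. All names, state definitions, and parameters below are assumptions for illustration, not the paper's trained models.

        # Illustrative sketch only: two-state HMM smoothing of a mask decision
        # over frames, driven by ILD and an interaural-phase (ITD proxy) feature.
        import numpy as np
        import librosa

        def binaural_features(left, right, n_fft=512, hop=128):
            """Per time-frequency ILD (dB) and interaural phase difference."""
            L = librosa.stft(left, n_fft=n_fft, hop_length=hop)
            R = librosa.stft(right, n_fft=n_fft, hop_length=hop)
            ild = 20.0 * np.log10((np.abs(L) + 1e-8) / (np.abs(R) + 1e-8))
            ipd = np.angle(L * np.conj(R))  # phase difference as an ITD proxy
            return ild, ipd

        def viterbi_mask(obs_loglik, p_stay=0.9):
            """Most likely state per frame (0 = interference-, 1 = target-dominant).

            obs_loglik : array of shape (2, n_frames) with per-state log-likelihoods.
            """
            n_frames = obs_loglik.shape[1]
            log_trans = np.log(np.array([[p_stay, 1.0 - p_stay],
                                         [1.0 - p_stay, p_stay]]))
            delta = np.zeros((2, n_frames))           # best path score per state
            psi = np.zeros((2, n_frames), dtype=int)  # backpointers
            delta[:, 0] = obs_loglik[:, 0]
            for t in range(1, n_frames):
                scores = delta[:, t - 1, None] + log_trans  # rows: previous state
                psi[:, t] = np.argmax(scores, axis=0)
                delta[:, t] = np.max(scores, axis=0) + obs_loglik[:, t]
            path = np.zeros(n_frames, dtype=int)
            path[-1] = np.argmax(delta[:, -1])
            for t in range(n_frames - 2, -1, -1):       # backtrack the best path
                path[t] = psi[path[t + 1], t + 1]
            return path

    In practice, the per-state observation log-likelihoods could come from Gaussian models of ILD/ITD fitted to target- and interference-dominant frames, with the decoder run independently per frequency band; the paper's HMM may be structured differently.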