Underdetermined Blind Separation of Convolutive Mixtures of Speech Using Time-Frequency Mask and Mixing Matrix Estimation

Audrey BLIN; Shoko ARAKI; Shoji MAKINO

doi:10.1093/ietfec/e88-a.7.1693

Underdetermined Blind Separation of Convolutive Mixtures of Speech Using Time-Frequency Mask and Mixing Matrix Estimation

Audrey BLIN, Shoko ARAKI, Shoji MAKINO

Full Text Views

0

Cite this

Summary :

This paper focuses on the underdetermined blind source separation (BSS) of three speech signals mixed in a real environment from measurements provided by two sensors. To date, solutions to the underdetermined BSS problem have mainly been based on the assumption that the speech signals are sufficiently sparse. They involve designing binary masks that extract signals at time-frequency points where only one signal was assumed to exist. The major issue encountered in previous work relates to the occurrence of distortion, which affects a separated signal with loud musical noise. To overcome this problem, we propose combining sparseness with the use of an estimated mixing matrix. First, we use a geometrical approach to detect when only one source is active and to perform a preliminary separation with a time-frequency mask. This information is then used to estimate the mixing matrix, which allows us to improve our separation. Experimental results show that this combination of time-frequency mask and mixing matrix estimation provides separated signals of better quality (less distortion, less musical noise) than those extracted without using the estimated mixing matrix in reverberant conditions where the reverberant time (TR) was 130 ms and 200 ms. Furthermore, informal listening tests clearly show that musical noise is deeply lowered by the proposed method comparatively to the classical approaches.

Publication: IEICE TRANSACTIONS on Fundamentals Vol.E88-A No.7 pp.1693-1700

Publication Date: 2005/07/01

Publicized

Online ISSN

DOI: 10.1093/ietfec/e88-a.7.1693

Type of Manuscript: Special Section PAPER (Special Section on Multi-channel Acoustic Signal Processing)

Category: Blind Source Separation

Cite this

Copy

Audrey BLIN, Shoko ARAKI, Shoji MAKINO, "Underdetermined Blind Separation of Convolutive Mixtures of Speech Using Time-Frequency Mask and Mixing Matrix Estimation" in IEICE TRANSACTIONS on Fundamentals, vol. E88-A, no. 7, pp. 1693-1700, July 2005, doi: 10.1093/ietfec/e88-a.7.1693.
Abstract: This paper focuses on the underdetermined blind source separation (BSS) of three speech signals mixed in a real environment from measurements provided by two sensors. To date, solutions to the underdetermined BSS problem have mainly been based on the assumption that the speech signals are sufficiently sparse. They involve designing binary masks that extract signals at time-frequency points where only one signal was assumed to exist. The major issue encountered in previous work relates to the occurrence of distortion, which affects a separated signal with loud musical noise. To overcome this problem, we propose combining sparseness with the use of an estimated mixing matrix. First, we use a geometrical approach to detect when only one source is active and to perform a preliminary separation with a time-frequency mask. This information is then used to estimate the mixing matrix, which allows us to improve our separation. Experimental results show that this combination of time-frequency mask and mixing matrix estimation provides separated signals of better quality (less distortion, less musical noise) than those extracted without using the estimated mixing matrix in reverberant conditions where the reverberant time (TR) was 130 ms and 200 ms. Furthermore, informal listening tests clearly show that musical noise is deeply lowered by the proposed method comparatively to the classical approaches.
URL: https://global.ieice.org/en_transactions/fundamentals/10.1093/ietfec/e88-a.7.1693/_p

Copy

@ARTICLE{e88-a_7_1693,
author={Audrey BLIN, Shoko ARAKI, Shoji MAKINO, },
journal={IEICE TRANSACTIONS on Fundamentals},
title={Underdetermined Blind Separation of Convolutive Mixtures of Speech Using Time-Frequency Mask and Mixing Matrix Estimation},
year={2005},
volume={E88-A},
number={7},
pages={1693-1700},
abstract={This paper focuses on the underdetermined blind source separation (BSS) of three speech signals mixed in a real environment from measurements provided by two sensors. To date, solutions to the underdetermined BSS problem have mainly been based on the assumption that the speech signals are sufficiently sparse. They involve designing binary masks that extract signals at time-frequency points where only one signal was assumed to exist. The major issue encountered in previous work relates to the occurrence of distortion, which affects a separated signal with loud musical noise. To overcome this problem, we propose combining sparseness with the use of an estimated mixing matrix. First, we use a geometrical approach to detect when only one source is active and to perform a preliminary separation with a time-frequency mask. This information is then used to estimate the mixing matrix, which allows us to improve our separation. Experimental results show that this combination of time-frequency mask and mixing matrix estimation provides separated signals of better quality (less distortion, less musical noise) than those extracted without using the estimated mixing matrix in reverberant conditions where the reverberant time (TR) was 130 ms and 200 ms. Furthermore, informal listening tests clearly show that musical noise is deeply lowered by the proposed method comparatively to the classical approaches.},
keywords={},
doi={10.1093/ietfec/e88-a.7.1693},
ISSN={},
month={July},}

Copy

TY - JOUR
TI - Underdetermined Blind Separation of Convolutive Mixtures of Speech Using Time-Frequency Mask and Mixing Matrix Estimation
T2 - IEICE TRANSACTIONS on Fundamentals
SP - 1693
EP - 1700
AU - Audrey BLIN
AU - Shoko ARAKI
AU - Shoji MAKINO
PY - 2005
DO - 10.1093/ietfec/e88-a.7.1693
JO - IEICE TRANSACTIONS on Fundamentals
SN -
VL - E88-A
IS - 7
JA - IEICE TRANSACTIONS on Fundamentals
Y1 - July 2005
AB - This paper focuses on the underdetermined blind source separation (BSS) of three speech signals mixed in a real environment from measurements provided by two sensors. To date, solutions to the underdetermined BSS problem have mainly been based on the assumption that the speech signals are sufficiently sparse. They involve designing binary masks that extract signals at time-frequency points where only one signal was assumed to exist. The major issue encountered in previous work relates to the occurrence of distortion, which affects a separated signal with loud musical noise. To overcome this problem, we propose combining sparseness with the use of an estimated mixing matrix. First, we use a geometrical approach to detect when only one source is active and to perform a preliminary separation with a time-frequency mask. This information is then used to estimate the mixing matrix, which allows us to improve our separation. Experimental results show that this combination of time-frequency mask and mixing matrix estimation provides separated signals of better quality (less distortion, less musical noise) than those extracted without using the estimated mixing matrix in reverberant conditions where the reverberant time (TR) was 130 ms and 200 ms. Furthermore, informal listening tests clearly show that musical noise is deeply lowered by the proposed method comparatively to the classical approaches.
ER -