We propose utilizing subband-based blind source separation (BSS) for convolutive mixtures of speech. This is motivated by the drawback of frequency-domain BSS, i.e., when a long frame with a fixed long frame-shift is used to cover reverberation, the number of samples in each frequency decreases and the separation performance is degraded. In subband BSS, (1) by using a moderate number of subbands, a sufficient number of samples can be held in each subband, and (2) by using FIR filters in each subband, we can manage long reverberation. We confirm that subband BSS achieves better performance than frequency-domain BSS. Moreover, subband BSS allows us to select a separation method suited to each subband. Using this advantage, we propose efficient separation procedures that consider the frequency characteristics of room reverberation and speech signals (3) by using longer unmixing filters in low frequency bands and (4) by adopting an overlap-blockshift in BSS's batch adaptation in low frequency bands. Consequently, frequency-dependent subband processing is successfully realized with the proposed subband BSS.
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Copy
Shoko ARAKI, Shoji MAKINO, Robert AICHNER, Tsuyoki NISHIKAWA, Hiroshi SARUWATARI, "Subband-Based Blind Separation for Convolutive Mixtures of Speech" in IEICE TRANSACTIONS on Fundamentals,
vol. E88-A, no. 12, pp. 3593-3603, December 2005, doi: 10.1093/ietfec/e88-a.12.3593.
Abstract: We propose utilizing subband-based blind source separation (BSS) for convolutive mixtures of speech. This is motivated by the drawback of frequency-domain BSS, i.e., when a long frame with a fixed long frame-shift is used to cover reverberation, the number of samples in each frequency decreases and the separation performance is degraded. In subband BSS, (1) by using a moderate number of subbands, a sufficient number of samples can be held in each subband, and (2) by using FIR filters in each subband, we can manage long reverberation. We confirm that subband BSS achieves better performance than frequency-domain BSS. Moreover, subband BSS allows us to select a separation method suited to each subband. Using this advantage, we propose efficient separation procedures that consider the frequency characteristics of room reverberation and speech signals (3) by using longer unmixing filters in low frequency bands and (4) by adopting an overlap-blockshift in BSS's batch adaptation in low frequency bands. Consequently, frequency-dependent subband processing is successfully realized with the proposed subband BSS.
URL: https://global.ieice.org/en_transactions/fundamentals/10.1093/ietfec/e88-a.12.3593/_p
Copy
@ARTICLE{e88-a_12_3593,
author={Shoko ARAKI, Shoji MAKINO, Robert AICHNER, Tsuyoki NISHIKAWA, Hiroshi SARUWATARI, },
journal={IEICE TRANSACTIONS on Fundamentals},
title={Subband-Based Blind Separation for Convolutive Mixtures of Speech},
year={2005},
volume={E88-A},
number={12},
pages={3593-3603},
abstract={We propose utilizing subband-based blind source separation (BSS) for convolutive mixtures of speech. This is motivated by the drawback of frequency-domain BSS, i.e., when a long frame with a fixed long frame-shift is used to cover reverberation, the number of samples in each frequency decreases and the separation performance is degraded. In subband BSS, (1) by using a moderate number of subbands, a sufficient number of samples can be held in each subband, and (2) by using FIR filters in each subband, we can manage long reverberation. We confirm that subband BSS achieves better performance than frequency-domain BSS. Moreover, subband BSS allows us to select a separation method suited to each subband. Using this advantage, we propose efficient separation procedures that consider the frequency characteristics of room reverberation and speech signals (3) by using longer unmixing filters in low frequency bands and (4) by adopting an overlap-blockshift in BSS's batch adaptation in low frequency bands. Consequently, frequency-dependent subband processing is successfully realized with the proposed subband BSS.},
keywords={},
doi={10.1093/ietfec/e88-a.12.3593},
ISSN={},
month={December},}
Copy
TY - JOUR
TI - Subband-Based Blind Separation for Convolutive Mixtures of Speech
T2 - IEICE TRANSACTIONS on Fundamentals
SP - 3593
EP - 3603
AU - Shoko ARAKI
AU - Shoji MAKINO
AU - Robert AICHNER
AU - Tsuyoki NISHIKAWA
AU - Hiroshi SARUWATARI
PY - 2005
DO - 10.1093/ietfec/e88-a.12.3593
JO - IEICE TRANSACTIONS on Fundamentals
SN -
VL - E88-A
IS - 12
JA - IEICE TRANSACTIONS on Fundamentals
Y1 - December 2005
AB - We propose utilizing subband-based blind source separation (BSS) for convolutive mixtures of speech. This is motivated by the drawback of frequency-domain BSS, i.e., when a long frame with a fixed long frame-shift is used to cover reverberation, the number of samples in each frequency decreases and the separation performance is degraded. In subband BSS, (1) by using a moderate number of subbands, a sufficient number of samples can be held in each subband, and (2) by using FIR filters in each subband, we can manage long reverberation. We confirm that subband BSS achieves better performance than frequency-domain BSS. Moreover, subband BSS allows us to select a separation method suited to each subband. Using this advantage, we propose efficient separation procedures that consider the frequency characteristics of room reverberation and speech signals (3) by using longer unmixing filters in low frequency bands and (4) by adopting an overlap-blockshift in BSS's batch adaptation in low frequency bands. Consequently, frequency-dependent subband processing is successfully realized with the proposed subband BSS.
ER -