An online version of convolutive non-negative sparse coding (CNSC) with the generalized Kullback-Leibler (KL) divergence is proposed to adaptively learn spectral-temporal bases from speech streams. The proposed scheme processes training data piece by piece and incrementally updates the learned bases with accumulated statistics, overcoming the inefficiency of its offline counterpart on large-scale or streaming data. Unlike conventional non-negative sparse coding, the proposed method employs a convolutive model, so that each basis describes a relatively long temporal span of the signal, which improves the representation power of the model. Moreover, by incorporating a voice activity detector (VAD), we propose an unsupervised enhancement algorithm that adaptively updates the noise dictionary from non-speech intervals; during speech intervals, the speech bases are adaptively learned while the noise bases are kept fixed. Experimental results show that the proposed algorithm substantially outperforms competing algorithms, especially when the background noise is highly non-stationary.
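To make the convolutive model concrete, the sketch below shows a minimal *batch* convolutive NMF with multiplicative KL-divergence updates and an L1 sparsity penalty on the activations — the standard formulation V ≈ Σ_t W_t · H→t (Smaragdis-style), not the paper's online algorithm. All function names and the sparsity weight are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def shift(X, t):
    """Shift the columns of X by t frames (t > 0: right, zero-padded;
    t < 0: left, zero-padded)."""
    Y = np.zeros_like(X)
    if t == 0:
        Y[:] = X
    elif t > 0:
        Y[:, t:] = X[:, :X.shape[1] - t]
    else:
        Y[:, :t] = X[:, -t:]
    return Y

def cnsc(V, K, T, n_iter=100, sparsity=0.1, rng=None):
    """Batch convolutive non-negative sparse coding (illustrative sketch).
    V: (F, N) non-negative magnitude spectrogram
    K: number of bases, T: temporal span of each basis (frames)."""
    rng = np.random.default_rng(rng)
    F, N = V.shape
    W = rng.random((T, F, K)) + 1e-3   # T time-slices of F x K bases
    H = rng.random((K, N)) + 1e-3      # sparse activations
    eps = 1e-12
    ones = np.ones_like(V)
    for _ in range(n_iter):
        # reconstruction Lambda = sum_t W[t] @ shift(H, t)
        Lam = sum(W[t] @ shift(H, t) for t in range(T)) + eps
        R = V / Lam                    # ratio term of the KL gradient
        # update H, averaging contributions from all time-slices
        num = np.zeros_like(H)
        den = np.zeros_like(H)
        for t in range(T):
            num += W[t].T @ shift(R, -t)
            den += W[t].T @ ones
        H *= num / (den + sparsity + eps)   # sparsity enters the denominator
        # update each basis slice W[t] with H fixed
        Lam = sum(W[t] @ shift(H, t) for t in range(T)) + eps
        R = V / Lam
        for t in range(T):
            Ht = shift(H, t)
            W[t] *= (R @ Ht.T) / (ones @ Ht.T + eps)
    return W, H
```

The paper's online variant would replace the full-batch updates with per-piece statistics accumulated over the stream; this sketch only illustrates the convolutive bases and KL multiplicative updates the abstract refers to.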
Yinan LI
PLA University of Science and Technology
Xiongwei ZHANG
PLA University of Science and Technology
Meng SUN
PLA University of Science and Technology
Yonggang HU
PLA University of Science and Technology
Li LI
PLA University of Science and Technology
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Yinan LI, Xiongwei ZHANG, Meng SUN, Yonggang HU, Li LI, "Online Convolutive Non-Negative Bases Learning for Speech Enhancement" in IEICE TRANSACTIONS on Fundamentals,
vol. E99-A, no. 8, pp. 1609-1613, August 2016, doi: 10.1587/transfun.E99.A.1609.
URL: https://global.ieice.org/en_transactions/fundamentals/10.1587/transfun.E99.A.1609/_p
@ARTICLE{e99-a_8_1609,
author={Yinan LI and Xiongwei ZHANG and Meng SUN and Yonggang HU and Li LI},
journal={IEICE TRANSACTIONS on Fundamentals},
title={Online Convolutive Non-Negative Bases Learning for Speech Enhancement},
year={2016},
volume={E99-A},
number={8},
pages={1609-1613},
keywords={},
doi={10.1587/transfun.E99.A.1609},
ISSN={1745-1337},
month={August}
}
TY - JOUR
TI - Online Convolutive Non-Negative Bases Learning for Speech Enhancement
T2 - IEICE TRANSACTIONS on Fundamentals
SP - 1609
EP - 1613
AU - Yinan LI
AU - Xiongwei ZHANG
AU - Meng SUN
AU - Yonggang HU
AU - Li LI
PY - 2016
DO - 10.1587/transfun.E99.A.1609
JO - IEICE TRANSACTIONS on Fundamentals
SN - 1745-1337
VL - E99-A
IS - 8
JA - IEICE TRANSACTIONS on Fundamentals
Y1 - August 2016
ER -