Semi-Supervised Speech Enhancement Combining Nonnegative Matrix Factorization and Robust Principal Component Analysis

Yonggang HU; Xiongwei ZHANG; Xia ZOU; Meng SUN; Yunfei ZHENG; Gang MIN

doi:10.1587/transfun.E100.A.1714

IEICE TRANSACTIONS on Fundamentals

Summary:

Nonnegative matrix factorization (NMF) is one of the most popular machine learning tools for speech enhancement. Supervised NMF-based speech enhancement proceeds by iterative updates that rely on prior knowledge of the clean-speech and noise spectral bases. However, in many real-world scenarios, it is not always possible to conduct any prior training. The traditional semi-supervised NMF (SNMF) overcomes this shortcoming, but at the cost of degraded performance. In this letter, without any prior knowledge of the speech and noise, we present an improved semi-supervised NMF-based speech enhancement algorithm that combines NMF with robust principal component analysis (RPCA). In this approach, fixed speech bases are obtained offline from training samples chosen from a public dataset. The noise samples used to train the noise bases, instead of being characterized a priori as usual, are obtained on the fly via the RPCA algorithm. This letter also studies whether the time length of the estimated noise samples affects the performance of the algorithm. Three metrics, PESQ, SDR, and SNR, are used to evaluate the algorithms in experiments on TIMIT with 20 noise types at various signal-to-noise ratio levels. Extensive experimental results demonstrate the superiority of the proposed algorithm over the competing speech enhancement algorithm.
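
The decomposition step described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`nmf`, `activations`, `enhance`), the Euclidean cost with Lee-Seung multiplicative updates, and the Wiener-style mask are all assumptions made for illustration, and the letter may use a different cost or reconstruction rule. The RPCA step is not implemented here; the noise estimate `N_est` is simply passed in as a nonnegative magnitude spectrogram, standing in for whatever RPCA would produce on the fly.

```python
import numpy as np

EPS = 1e-10  # guards against division by zero in the updates


def nmf(V, rank, n_iter=200, seed=0):
    """Euclidean NMF (V ~ W @ H) via multiplicative updates (assumed cost)."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + EPS
    H = rng.random((rank, V.shape[1])) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
        W *= (V @ H.T) / (W @ H @ H.T + EPS)
    return W, H


def activations(V, W, n_iter=200, seed=0):
    """Fit only the activations H while the bases W stay fixed."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
    return H


def enhance(V_noisy, W_speech, N_est, noise_rank=4):
    """Hypothetical enhancement step.

    V_noisy  : noisy magnitude spectrogram (freq x frames)
    W_speech : fixed speech bases trained offline (freq x k)
    N_est    : noise magnitude estimate (assumed to come from RPCA)
    """
    # Train noise bases from the estimated noise only.
    W_noise, _ = nmf(N_est, noise_rank)
    # Decompose the noisy input on the concatenated, fixed bases.
    W = np.hstack([W_speech, W_noise])
    H = activations(V_noisy, W)
    k = W_speech.shape[1]
    S_hat = W_speech @ H[:k]   # speech part of the model
    N_hat = W_noise @ H[k:]    # noise part of the model
    # Wiener-style soft mask (an assumption, not taken from the letter).
    mask = S_hat / (S_hat + N_hat + EPS)
    return mask * V_noisy
```

Because the mask lies between 0 and 1, the output never exceeds the noisy magnitude in any time-frequency bin; the phase of the noisy signal would still be needed to resynthesize a waveform.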

Publication: IEICE TRANSACTIONS on Fundamentals, Vol. E100-A, No. 8, pp. 1714-1719

Publication Date: 2017/08/01

Online ISSN: 1745-1337

DOI: 10.1587/transfun.E100.A.1714

Type of Manuscript: LETTER

Category: Speech and Hearing

Authors

Yonggang HU
  Australian National University
Xiongwei ZHANG
  PLA University of Science and Technology
Xia ZOU
  PLA University of Science and Technology
Meng SUN
  PLA University of Science and Technology
Yunfei ZHENG
  PLA University of Science and Technology, the Army Officer Academy of PLA and Key Laboratory of Polarization Imaging Detection Technology
Gang MIN
  Xi'an Communications Institute

Keywords

non-negative matrix factorization, robust principal component analysis, improved semi-supervised NMF

Cite this

Yonggang HU, Xiongwei ZHANG, Xia ZOU, Meng SUN, Yunfei ZHENG, Gang MIN, "Semi-Supervised Speech Enhancement Combining Nonnegative Matrix Factorization and Robust Principal Component Analysis" in IEICE TRANSACTIONS on Fundamentals, vol. E100-A, no. 8, pp. 1714-1719, August 2017, doi: 10.1587/transfun.E100.A.1714.
URL: https://global.ieice.org/en_transactions/fundamentals/10.1587/transfun.E100.A.1714/_p

@ARTICLE{e100-a_8_1714,
author={Yonggang HU and Xiongwei ZHANG and Xia ZOU and Meng SUN and Yunfei ZHENG and Gang MIN},
journal={IEICE TRANSACTIONS on Fundamentals},
title={Semi-Supervised Speech Enhancement Combining Nonnegative Matrix Factorization and Robust Principal Component Analysis},
year={2017},
volume={E100-A},
number={8},
pages={1714-1719},
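keywords={non-negative matrix factorization, robust principal component analysis, improved semi-supervised NMF},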
doi={10.1587/transfun.E100.A.1714},
ISSN={1745-1337},
month={August}
}

TY - JOUR
TI - Semi-Supervised Speech Enhancement Combining Nonnegative Matrix Factorization and Robust Principal Component Analysis
T2 - IEICE TRANSACTIONS on Fundamentals
SP - 1714
EP - 1719
AU - Yonggang HU
AU - Xiongwei ZHANG
AU - Xia ZOU
AU - Meng SUN
AU - Yunfei ZHENG
AU - Gang MIN
PY - 2017
DO - 10.1587/transfun.E100.A.1714
JO - IEICE TRANSACTIONS on Fundamentals
SN - 1745-1337
VL - E100-A
IS - 8
JA - IEICE TRANSACTIONS on Fundamentals
Y1 - August 2017
ER -