DNN-Based Low-Musical-Noise Single-Channel Speech Enhancement Based on Higher-Order-Moments Matching

Satoshi MIZOGUCHI; Yuki SAITO; Shinnosuke TAKAMICHI; Hiroshi SARUWATARI

doi:10.1587/transinf.2021EDP7041

DNN-Based Low-Musical-Noise Single-Channel Speech Enhancement Based on Higher-Order-Moments Matching

Satoshi MIZOGUCHI, Yuki SAITO, Shinnosuke TAKAMICHI, Hiroshi SARUWATARI

Full Text Views

0

Cite this

Summary :

We propose deep neural network (DNN)-based speech enhancement that reduces musical noise and achieves better auditory impressions. The musical noise is an artifact generated by nonlinear signal processing and negatively affects the auditory impressions. We aim to develop musical-noise-free speech enhancement methods that suppress the musical noise generation and produce perceptually-comfortable enhanced speech. DNN-based speech enhancement using a soft mask achieves high noise reduction but generates musical noise in non-speech regions. Therefore, first, we define kurtosis matching for DNN-based low-musical-noise speech enhancement. Kurtosis is the fourth-order moment and is known to correlate with the amount of musical noise. The kurtosis matching is a penalty term of the DNN training and works to reduce the amount of musical noise. We further extend this scheme to standardized-moment matching. The extended scheme involves using moments whose orders are higher than kurtosis and generalizes the conventional musical-noise-free method based on kurtosis matching. We formulate standardized-moment matching and explore how effectively the higher-order moments reduce the amount of musical noise. Experimental evaluation results 1) demonstrate that kurtosis matching can reduce musical noise without negatively affecting noise suppression and 2) newly reveal that the sixth-moment matching also achieves low-musical-noise speech enhancement as well as kurtosis matching.

Publication: IEICE TRANSACTIONS on Information Vol.E104-D No.11 pp.1971-1980

Publication Date: 2021/11/01

Publicized: 2021/07/30

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2021EDP7041

Type of Manuscript: PAPER

Category: Speech and Hearing

Authors

Satoshi MIZOGUCHI
  University of Tokyo
Yuki SAITO
  University of Tokyo
Shinnosuke TAKAMICHI
  University of Tokyo
Hiroshi SARUWATARI
  University of Tokyo

Keyword

speech enhancement, musical noise, kurtosis, moment matching, deep learning

Cite this

Copy

Satoshi MIZOGUCHI, Yuki SAITO, Shinnosuke TAKAMICHI, Hiroshi SARUWATARI, "DNN-Based Low-Musical-Noise Single-Channel Speech Enhancement Based on Higher-Order-Moments Matching" in IEICE TRANSACTIONS on Information, vol. E104-D, no. 11, pp. 1971-1980, November 2021, doi: 10.1587/transinf.2021EDP7041.
Abstract: We propose deep neural network (DNN)-based speech enhancement that reduces musical noise and achieves better auditory impressions. The musical noise is an artifact generated by nonlinear signal processing and negatively affects the auditory impressions. We aim to develop musical-noise-free speech enhancement methods that suppress the musical noise generation and produce perceptually-comfortable enhanced speech. DNN-based speech enhancement using a soft mask achieves high noise reduction but generates musical noise in non-speech regions. Therefore, first, we define kurtosis matching for DNN-based low-musical-noise speech enhancement. Kurtosis is the fourth-order moment and is known to correlate with the amount of musical noise. The kurtosis matching is a penalty term of the DNN training and works to reduce the amount of musical noise. We further extend this scheme to standardized-moment matching. The extended scheme involves using moments whose orders are higher than kurtosis and generalizes the conventional musical-noise-free method based on kurtosis matching. We formulate standardized-moment matching and explore how effectively the higher-order moments reduce the amount of musical noise. Experimental evaluation results 1) demonstrate that kurtosis matching can reduce musical noise without negatively affecting noise suppression and 2) newly reveal that the sixth-moment matching also achieves low-musical-noise speech enhancement as well as kurtosis matching.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2021EDP7041/_p

Copy

@ARTICLE{e104-d_11_1971,
author={Satoshi MIZOGUCHI, Yuki SAITO, Shinnosuke TAKAMICHI, Hiroshi SARUWATARI, },
journal={IEICE TRANSACTIONS on Information},
title={DNN-Based Low-Musical-Noise Single-Channel Speech Enhancement Based on Higher-Order-Moments Matching},
year={2021},
volume={E104-D},
number={11},
pages={1971-1980},
abstract={We propose deep neural network (DNN)-based speech enhancement that reduces musical noise and achieves better auditory impressions. The musical noise is an artifact generated by nonlinear signal processing and negatively affects the auditory impressions. We aim to develop musical-noise-free speech enhancement methods that suppress the musical noise generation and produce perceptually-comfortable enhanced speech. DNN-based speech enhancement using a soft mask achieves high noise reduction but generates musical noise in non-speech regions. Therefore, first, we define kurtosis matching for DNN-based low-musical-noise speech enhancement. Kurtosis is the fourth-order moment and is known to correlate with the amount of musical noise. The kurtosis matching is a penalty term of the DNN training and works to reduce the amount of musical noise. We further extend this scheme to standardized-moment matching. The extended scheme involves using moments whose orders are higher than kurtosis and generalizes the conventional musical-noise-free method based on kurtosis matching. We formulate standardized-moment matching and explore how effectively the higher-order moments reduce the amount of musical noise. Experimental evaluation results 1) demonstrate that kurtosis matching can reduce musical noise without negatively affecting noise suppression and 2) newly reveal that the sixth-moment matching also achieves low-musical-noise speech enhancement as well as kurtosis matching.},
keywords={},
doi={10.1587/transinf.2021EDP7041},
ISSN={1745-1361},
month={November},}

Copy

TY - JOUR
TI - DNN-Based Low-Musical-Noise Single-Channel Speech Enhancement Based on Higher-Order-Moments Matching
T2 - IEICE TRANSACTIONS on Information
SP - 1971
EP - 1980
AU - Satoshi MIZOGUCHI
AU - Yuki SAITO
AU - Shinnosuke TAKAMICHI
AU - Hiroshi SARUWATARI
PY - 2021
DO - 10.1587/transinf.2021EDP7041
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E104-D
IS - 11
JA - IEICE TRANSACTIONS on Information
Y1 - November 2021
AB - We propose deep neural network (DNN)-based speech enhancement that reduces musical noise and achieves better auditory impressions. The musical noise is an artifact generated by nonlinear signal processing and negatively affects the auditory impressions. We aim to develop musical-noise-free speech enhancement methods that suppress the musical noise generation and produce perceptually-comfortable enhanced speech. DNN-based speech enhancement using a soft mask achieves high noise reduction but generates musical noise in non-speech regions. Therefore, first, we define kurtosis matching for DNN-based low-musical-noise speech enhancement. Kurtosis is the fourth-order moment and is known to correlate with the amount of musical noise. The kurtosis matching is a penalty term of the DNN training and works to reduce the amount of musical noise. We further extend this scheme to standardized-moment matching. The extended scheme involves using moments whose orders are higher than kurtosis and generalizes the conventional musical-noise-free method based on kurtosis matching. We formulate standardized-moment matching and explore how effectively the higher-order moments reduce the amount of musical noise. Experimental evaluation results 1) demonstrate that kurtosis matching can reduce musical noise without negatively affecting noise suppression and 2) newly reveal that the sixth-moment matching also achieves low-musical-noise speech enhancement as well as kurtosis matching.
ER -