Low-Complexity and Accurate Noise Suppression Based on an a Priori SNR Model for Robust Speech Recognition on Embedded Systems and Its Evaluation in a Car Environment

Masanori TSUJIKAWA; Yoshinobu KAJIKAWA

doi:10.1587/transfun.2022EAP1130

IEICE TRANSACTIONS on Fundamentals

Low-Complexity and Accurate Noise Suppression Based on an a Priori SNR Model for Robust Speech Recognition on Embedded Systems and Its Evaluation in a Car Environment

Masanori TSUJIKAWA, Yoshinobu KAJIKAWA

Full Text Views

0

Cite this

Summary :

In this paper, we propose a low-complexity and accurate noise suppression based on an a priori SNR (Speech to Noise Ratio) model for greater robustness w.r.t. short-term noise-fluctuation. The a priori SNR, the ratio of speech spectra and noise spectra in the spectral domain, represents the difference between speech features and noise features in the feature domain, including the mel-cepstral domain and the logarithmic power spectral domain. This is because logarithmic operations are used for domain conversions. Therefore, an a priori SNR model can easily be expressed in terms of the difference between the speech model and the noise model, which are modeled by the Gaussian mixture models, and it can be generated with low computational cost. By using a priori SNRs accurately estimated on the basis of an a priori SNR model, it is possible to calculate accurate coefficients of noise suppression filters taking into account the variance of noise, without serious increase in computational cost over that of a conventional model-based Wiener filter (MBW). We have conducted in-car speech recognition evaluation using the CENSREC-2 database, and a comparison of the proposed method with a conventional MBW showed that the recognition error rate for all noise environments was reduced by 9%, and that, notably, that for audio-noise environments was reduced by 11%. We show that the proposed method can be processed with low levels of computational and memory resources through implementation on a digital signal processor.

Publication: IEICE TRANSACTIONS on Fundamentals Vol.E106-A No.9 pp.1224-1233

Publication Date: 2023/09/01

Publicized: 2023/02/28

Online ISSN: 1745-1337

DOI: 10.1587/transfun.2022EAP1130

Type of Manuscript: PAPER

Category: Digital Signal Processing

Authors

Masanori TSUJIKAWA
Kansai University,NEC Corporation
Yoshinobu KAJIKAWA
Kansai University

Keyword

noise-robust speech recognition, model-based noise suppression, a priori SNR model, embedded system

Cite this

Copy

Masanori TSUJIKAWA, Yoshinobu KAJIKAWA, "Low-Complexity and Accurate Noise Suppression Based on an a Priori SNR Model for Robust Speech Recognition on Embedded Systems and Its Evaluation in a Car Environment" in IEICE TRANSACTIONS on Fundamentals, vol. E106-A, no. 9, pp. 1224-1233, September 2023, doi: 10.1587/transfun.2022EAP1130.
Abstract: In this paper, we propose a low-complexity and accurate noise suppression based on an a priori SNR (Speech to Noise Ratio) model for greater robustness w.r.t. short-term noise-fluctuation. The a priori SNR, the ratio of speech spectra and noise spectra in the spectral domain, represents the difference between speech features and noise features in the feature domain, including the mel-cepstral domain and the logarithmic power spectral domain. This is because logarithmic operations are used for domain conversions. Therefore, an a priori SNR model can easily be expressed in terms of the difference between the speech model and the noise model, which are modeled by the Gaussian mixture models, and it can be generated with low computational cost. By using a priori SNRs accurately estimated on the basis of an a priori SNR model, it is possible to calculate accurate coefficients of noise suppression filters taking into account the variance of noise, without serious increase in computational cost over that of a conventional model-based Wiener filter (MBW). We have conducted in-car speech recognition evaluation using the CENSREC-2 database, and a comparison of the proposed method with a conventional MBW showed that the recognition error rate for all noise environments was reduced by 9%, and that, notably, that for audio-noise environments was reduced by 11%. We show that the proposed method can be processed with low levels of computational and memory resources through implementation on a digital signal processor.
URL: https://global.ieice.org/en_transactions/fundamentals/10.1587/transfun.2022EAP1130/_p

Copy

@ARTICLE{e106-a_9_1224,
author={Masanori TSUJIKAWA, Yoshinobu KAJIKAWA, },
journal={IEICE TRANSACTIONS on Fundamentals},
title={Low-Complexity and Accurate Noise Suppression Based on an a Priori SNR Model for Robust Speech Recognition on Embedded Systems and Its Evaluation in a Car Environment},
year={2023},
volume={E106-A},
number={9},
pages={1224-1233},
abstract={In this paper, we propose a low-complexity and accurate noise suppression based on an a priori SNR (Speech to Noise Ratio) model for greater robustness w.r.t. short-term noise-fluctuation. The a priori SNR, the ratio of speech spectra and noise spectra in the spectral domain, represents the difference between speech features and noise features in the feature domain, including the mel-cepstral domain and the logarithmic power spectral domain. This is because logarithmic operations are used for domain conversions. Therefore, an a priori SNR model can easily be expressed in terms of the difference between the speech model and the noise model, which are modeled by the Gaussian mixture models, and it can be generated with low computational cost. By using a priori SNRs accurately estimated on the basis of an a priori SNR model, it is possible to calculate accurate coefficients of noise suppression filters taking into account the variance of noise, without serious increase in computational cost over that of a conventional model-based Wiener filter (MBW). We have conducted in-car speech recognition evaluation using the CENSREC-2 database, and a comparison of the proposed method with a conventional MBW showed that the recognition error rate for all noise environments was reduced by 9%, and that, notably, that for audio-noise environments was reduced by 11%. We show that the proposed method can be processed with low levels of computational and memory resources through implementation on a digital signal processor.},
keywords={},
doi={10.1587/transfun.2022EAP1130},
ISSN={1745-1337},
month={September},}

Copy

TY - JOUR
TI - Low-Complexity and Accurate Noise Suppression Based on an a Priori SNR Model for Robust Speech Recognition on Embedded Systems and Its Evaluation in a Car Environment
T2 - IEICE TRANSACTIONS on Fundamentals
SP - 1224
EP - 1233
AU - Masanori TSUJIKAWA
AU - Yoshinobu KAJIKAWA
PY - 2023
DO - 10.1587/transfun.2022EAP1130
JO - IEICE TRANSACTIONS on Fundamentals
SN - 1745-1337
VL - E106-A
IS - 9
JA - IEICE TRANSACTIONS on Fundamentals
Y1 - September 2023
AB - In this paper, we propose a low-complexity and accurate noise suppression based on an a priori SNR (Speech to Noise Ratio) model for greater robustness w.r.t. short-term noise-fluctuation. The a priori SNR, the ratio of speech spectra and noise spectra in the spectral domain, represents the difference between speech features and noise features in the feature domain, including the mel-cepstral domain and the logarithmic power spectral domain. This is because logarithmic operations are used for domain conversions. Therefore, an a priori SNR model can easily be expressed in terms of the difference between the speech model and the noise model, which are modeled by the Gaussian mixture models, and it can be generated with low computational cost. By using a priori SNRs accurately estimated on the basis of an a priori SNR model, it is possible to calculate accurate coefficients of noise suppression filters taking into account the variance of noise, without serious increase in computational cost over that of a conventional model-based Wiener filter (MBW). We have conducted in-car speech recognition evaluation using the CENSREC-2 database, and a comparison of the proposed method with a conventional MBW showed that the recognition error rate for all noise environments was reduced by 9%, and that, notably, that for audio-noise environments was reduced by 11%. We show that the proposed method can be processed with low levels of computational and memory resources through implementation on a digital signal processor.
ER -

IEICE TRANSACTIONS on Fundamentals