Speech captured by an in-ear microphone placed inside an occluded ear has a high signal-to-noise ratio; however, it has different sound characteristics compared to normal speech captured through air conduction. In this study, a method for blind speech quality enhancement is proposed that can convert speech captured by an in-ear microphone to one that resembles normal speech. The proposed method estimates an input-dependent enhancement function by using a neural network in the feature domain and enhances the captured speech via time-domain filtering. Subjective and objective evaluations confirm that the speech enhanced using our proposed method sounds more similar to normal speech than that enhanced using conventional equalizer-based methods.
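The abstract describes a pipeline in which a neural network estimates an input-dependent enhancement function from feature-domain input, and the speech is then enhanced by time-domain filtering. The sketch below is a hypothetical illustration of that kind of pipeline, not the paper's actual method: `gain_net` is a stand-in for the trained network, and the log-magnitude features, FFT size, and filter construction are all assumptions for the example.

```python
import numpy as np

def enhance_frame(x, gain_net, n_fft=512):
    """Apply an input-dependent enhancement filter to one speech frame.

    gain_net: callable mapping a log-spectral feature vector to per-band
    gains; it stands in for the paper's neural network (hypothetical).
    """
    # Feature-domain analysis: log-magnitude spectrum of the in-ear frame
    spec = np.fft.rfft(x, n_fft)
    feat = np.log(np.abs(spec) + 1e-8)

    # The network estimates the enhancement function (per-band gains)
    gains = gain_net(feat)

    # Turn the gains into an FIR filter and apply it in the time domain,
    # matching the abstract's "time-domain filtering" step
    h = np.fft.irfft(gains, n_fft)
    h = np.roll(h, n_fft // 2)  # center the impulse response
    return np.convolve(x, h, mode="same")

# Toy stand-in "network": boost the high bands that occlusion attenuates
frame = np.random.randn(512)
toy_net = lambda feat: np.linspace(1.0, 2.0, feat.shape[0])
out = enhance_frame(frame, toy_net)
```

Because the gains are estimated per input frame, the filter adapts to the captured signal, which is what distinguishes this approach from a fixed equalizer.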
Hochong PARK
Kwangwoon University
Yong-Shik SHIN
RippleBuds Ltd.
Seong-Hyeon SHIN
Kwangwoon University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Hochong PARK, Yong-Shik SHIN, Seong-Hyeon SHIN, "Speech Quality Enhancement for In-Ear Microphone Based on Neural Network" in IEICE TRANSACTIONS on Information,
vol. E102-D, no. 8, pp. 1594-1597, August 2019, doi: 10.1587/transinf.2018EDL8249.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2018EDL8249/_p
@ARTICLE{e102-d_8_1594,
author={Hochong PARK and Yong-Shik SHIN and Seong-Hyeon SHIN},
journal={IEICE TRANSACTIONS on Information},
title={Speech Quality Enhancement for In-Ear Microphone Based on Neural Network},
year={2019},
volume={E102-D},
number={8},
pages={1594-1597},
doi={10.1587/transinf.2018EDL8249},
ISSN={1745-1361},
month={August}
}
TY - JOUR
TI - Speech Quality Enhancement for In-Ear Microphone Based on Neural Network
T2 - IEICE TRANSACTIONS on Information
SP - 1594
EP - 1597
AU - Hochong PARK
AU - Yong-Shik SHIN
AU - Seong-Hyeon SHIN
PY - 2019
DO - 10.1587/transinf.2018EDL8249
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E102-D
IS - 8
JA - IEICE TRANSACTIONS on Information
Y1 - August 2019
ER -