This paper describes a hands-free speech recognition technique based on adapting the acoustic model to reverberant speech. In hands-free speech recognition, reverberation degrades recognition accuracy because each segment of speech is affected by the reflection energy of the preceding segment. To compensate for the reflection signal, we introduce a frame-by-frame adaptation method that adds the reflection signal to the means of the acoustic model. The reflection signal is approximated by a first-order linear prediction from the observation signal at the preceding frame, and the linear prediction coefficient is estimated by maximum likelihood using the EM algorithm on the adaptation data. The effectiveness of the method is confirmed by word recognition experiments on reverberant speech.
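The estimation step summarized in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a single scalar coefficient `a`, a diagonal-covariance Gaussian mixture standing in for the acoustic model, and an additive model o[t] ~ N(mean_j + a * o[t-1], var_j) applied directly in whatever feature domain the means live in. The function names `estimate_lp_coefficient` and `simulate`, and all parameters, are hypothetical and chosen only for illustration.

```python
import numpy as np


def estimate_lp_coefficient(obs, means, variances, weights, n_iter=20):
    """EM estimate of a scalar coefficient `a` under the ASSUMED model
    o[t] ~ N(means[j] + a * o[t-1], diag(variances[j])).

    obs: (T, D) frames; means, variances: (J, D); weights: (J,).
    """
    cur, prev = obs[1:], obs[:-1]  # o[t] and o[t-1]
    centered = cur[:, None, :] - means[None, :, :]  # (T-1, J, D)
    a = 0.0  # start from the unadapted model
    for _ in range(n_iter):
        # E-step: component posteriors under the frame-by-frame adapted means.
        resid = centered - a * prev[:, None, :]
        log_p = -0.5 * np.sum(
            resid ** 2 / variances + np.log(2 * np.pi * variances), axis=2
        )
        log_p += np.log(weights)
        log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
        gamma = np.exp(log_p)
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M-step: closed-form maximizer of the expected log-likelihood in `a`.
        num = np.sum(gamma[:, :, None] * centered * prev[:, None, :] / variances)
        den = np.sum(gamma[:, :, None] * prev[:, None, :] ** 2 / variances)
        a = num / den
    return a


def simulate(a_true, means, variances, weights, n_frames, rng):
    """Draw synthetic frames from the same assumed model (toy data only)."""
    dim = means.shape[1]
    obs = np.zeros((n_frames, dim))
    for t in range(1, n_frames):
        j = rng.choice(len(weights), p=weights)
        clean = means[j] + np.sqrt(variances[j]) * rng.standard_normal(dim)
        obs[t] = clean + a_true * obs[t - 1]  # add the predicted reflection
    return obs
```

On synthetic frames generated by `simulate`, the estimate from `estimate_lp_coefficient` should land near the coefficient used to generate them. A real system would replace the mixture posteriors with HMM state occupancies computed on the adaptation data.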
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Tetsuya TAKIGUCHI, Masafumi NISHIMURA, Yasuo ARIKI, "Acoustic Model Adaptation Using First-Order Linear Prediction for Reverberant Speech," in IEICE TRANSACTIONS on Information and Systems, vol. E89-D, no. 3, pp. 908-914, March 2006, doi: 10.1093/ietisy/e89-d.3.908.
URL: https://global.ieice.org/en_transactions/information/10.1093/ietisy/e89-d.3.908/_p
@ARTICLE{e89-d_3_908,
author={Tetsuya TAKIGUCHI and Masafumi NISHIMURA and Yasuo ARIKI},
journal={IEICE TRANSACTIONS on Information and Systems},
title={Acoustic Model Adaptation Using First-Order Linear Prediction for Reverberant Speech},
year={2006},
volume={E89-D},
number={3},
pages={908-914},
abstract={This paper describes a hands-free speech recognition technique based on acoustic model adaptation to reverberant speech. In hands-free speech recognition, the recognition accuracy is degraded by reverberation, since each segment of speech is affected by the reflection energy of the preceding segment. To compensate for the reflection signal we introduce a frame-by-frame adaptation method adding the reflection signal to the means of the acoustic model. The reflection signal is approximated by a first-order linear prediction from the observation signal at the preceding frame, and the linear prediction coefficient is estimated with a maximum likelihood method by using the EM algorithm, which maximizes the likelihood of the adaptation data. Its effectiveness is confirmed by word recognition experiments on reverberant speech.},
keywords={},
doi={10.1093/ietisy/e89-d.3.908},
ISSN={1745-1361},
month={March},}
TY - JOUR
TI - Acoustic Model Adaptation Using First-Order Linear Prediction for Reverberant Speech
T2 - IEICE TRANSACTIONS on Information and Systems
SP - 908
EP - 914
AU - Tetsuya TAKIGUCHI
AU - Masafumi NISHIMURA
AU - Yasuo ARIKI
PY - 2006
DO - 10.1093/ietisy/e89-d.3.908
JO - IEICE TRANSACTIONS on Information and Systems
SN - 1745-1361
VL - E89-D
IS - 3
JA - IEICE TRANSACTIONS on Information and Systems
Y1 - March 2006
AB - This paper describes a hands-free speech recognition technique based on acoustic model adaptation to reverberant speech. In hands-free speech recognition, the recognition accuracy is degraded by reverberation, since each segment of speech is affected by the reflection energy of the preceding segment. To compensate for the reflection signal we introduce a frame-by-frame adaptation method adding the reflection signal to the means of the acoustic model. The reflection signal is approximated by a first-order linear prediction from the observation signal at the preceding frame, and the linear prediction coefficient is estimated with a maximum likelihood method by using the EM algorithm, which maximizes the likelihood of the adaptation data. Its effectiveness is confirmed by word recognition experiments on reverberant speech.
ER -