Sound event detection aims to identify sound events in audio recordings and has widespread real-life applications. Recently, convolutional recurrent neural network (CRNN) models have achieved state-of-the-art performance on this task owing to their ability to learn representative features. However, CRNN models are highly complex, with millions of parameters to be trained, which limits their use on mobile and embedded devices with limited computational resources. Model distillation is an effective way to transfer the knowledge of a complex model to a smaller one that can be deployed on devices with limited computational power. In this letter, we propose a novel multi-model-based distillation approach for sound event detection that exploits the knowledge of multiple teacher models that are complementary in detecting sound events. Extensive experimental results demonstrate that our approach achieves a compression ratio of about 50 times. In addition, better performance is obtained on the sound event detection task.
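The abstract describes distilling knowledge from multiple complementary teachers into a compact student. The letter's exact loss is not given in this record, so the following is only a generic sketch of multi-teacher distillation in the style of Hinton et al.'s soft-target formulation: the teachers' temperature-softened predictions are averaged into a single target distribution, and the student is penalized by the KL divergence to that target. All function names here are illustrative, not taken from the paper.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def multi_teacher_distillation_loss(student_logits, teacher_logits_list,
                                    temperature=2.0):
    """KL divergence between the student's softened distribution and the
    average of the teachers' softened distributions (soft targets).

    student_logits:       (batch, classes) array of student outputs.
    teacher_logits_list:  list of (batch, classes) arrays, one per teacher.
    """
    # Average the complementary teachers' softened predictions into one target.
    target = np.mean([softmax(t, temperature) for t in teacher_logits_list],
                     axis=0)
    pred = softmax(student_logits, temperature)
    # KL(target || pred); the T^2 factor keeps gradient magnitudes comparable
    # across temperatures, as in the standard distillation formulation.
    eps = 1e-12
    kl = np.sum(target * (np.log(target + eps) - np.log(pred + eps)), axis=-1)
    return (temperature ** 2) * kl.mean()
```

When the student's logits match a lone teacher's exactly, the loss is zero; disagreement with the averaged teachers yields a positive penalty that training would minimize alongside the usual hard-label loss.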
Yingwei FU
National Key Laboratory of Parallel and Distributed Processing, National University of Defense Technology
Kele XU
National Key Laboratory of Parallel and Distributed Processing, National University of Defense Technology
Haibo MI
National Key Laboratory of Parallel and Distributed Processing, National University of Defense Technology
Qiuqiang KONG
University of Surrey
Dezhi WANG
National University of Defense Technology
Huaimin WANG
National Key Laboratory of Parallel and Distributed Processing, National University of Defense Technology
Tie HONG
National University of Defense Technology
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Yingwei FU, Kele XU, Haibo MI, Qiuqiang KONG, Dezhi WANG, Huaimin WANG, Tie HONG, "Multi Model-Based Distillation for Sound Event Detection" in IEICE TRANSACTIONS on Information,
vol. E102-D, no. 10, pp. 2055-2058, October 2019, doi: 10.1587/transinf.2019EDL8062.
Abstract: Sound event detection aims to identify sound events in audio recordings and has widespread real-life applications. Recently, convolutional recurrent neural network (CRNN) models have achieved state-of-the-art performance on this task owing to their ability to learn representative features. However, CRNN models are highly complex, with millions of parameters to be trained, which limits their use on mobile and embedded devices with limited computational resources. Model distillation is an effective way to transfer the knowledge of a complex model to a smaller one that can be deployed on devices with limited computational power. In this letter, we propose a novel multi-model-based distillation approach for sound event detection that exploits the knowledge of multiple teacher models that are complementary in detecting sound events. Extensive experimental results demonstrate that our approach achieves a compression ratio of about 50 times. In addition, better performance is obtained on the sound event detection task.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2019EDL8062/_p
@ARTICLE{e102-d_10_2055,
author={Yingwei FU and Kele XU and Haibo MI and Qiuqiang KONG and Dezhi WANG and Huaimin WANG and Tie HONG},
journal={IEICE TRANSACTIONS on Information},
title={Multi Model-Based Distillation for Sound Event Detection},
year={2019},
volume={E102-D},
number={10},
pages={2055-2058},
abstract={Sound event detection aims to identify sound events in audio recordings and has widespread real-life applications. Recently, convolutional recurrent neural network (CRNN) models have achieved state-of-the-art performance on this task owing to their ability to learn representative features. However, CRNN models are highly complex, with millions of parameters to be trained, which limits their use on mobile and embedded devices with limited computational resources. Model distillation is an effective way to transfer the knowledge of a complex model to a smaller one that can be deployed on devices with limited computational power. In this letter, we propose a novel multi-model-based distillation approach for sound event detection that exploits the knowledge of multiple teacher models that are complementary in detecting sound events. Extensive experimental results demonstrate that our approach achieves a compression ratio of about 50 times. In addition, better performance is obtained on the sound event detection task.},
keywords={},
doi={10.1587/transinf.2019EDL8062},
ISSN={1745-1361},
month={October},}
TY - JOUR
TI - Multi Model-Based Distillation for Sound Event Detection
T2 - IEICE TRANSACTIONS on Information
SP - 2055
EP - 2058
AU - Yingwei FU
AU - Kele XU
AU - Haibo MI
AU - Qiuqiang KONG
AU - Dezhi WANG
AU - Huaimin WANG
AU - Tie HONG
PY - 2019
DO - 10.1587/transinf.2019EDL8062
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E102-D
IS - 10
JA - IEICE TRANSACTIONS on Information
Y1 - October 2019
AB - Sound event detection aims to identify sound events in audio recordings and has widespread real-life applications. Recently, convolutional recurrent neural network (CRNN) models have achieved state-of-the-art performance on this task owing to their ability to learn representative features. However, CRNN models are highly complex, with millions of parameters to be trained, which limits their use on mobile and embedded devices with limited computational resources. Model distillation is an effective way to transfer the knowledge of a complex model to a smaller one that can be deployed on devices with limited computational power. In this letter, we propose a novel multi-model-based distillation approach for sound event detection that exploits the knowledge of multiple teacher models that are complementary in detecting sound events. Extensive experimental results demonstrate that our approach achieves a compression ratio of about 50 times. In addition, better performance is obtained on the sound event detection task.
ER -