Acoustic scene classification (ASC) is an important research area in human-computer interaction. Real-world recordings often contain considerable noise and silent segments, which makes it difficult for earlier ASC approaches to isolate the scene-relevant information in the audio. Moreover, scene information may be scattered across many audio frames, so selecting scene-related frames is crucial for ASC. To address this, an integrated convolutional neural network with a fusion attention mechanism (ICNN-FA) is proposed for ASC. First, segmented mel-spectrograms are used as the input of the ICNN, helping the model learn short-term time-frequency correlations. The ICNN then learns segment-level features from these inputs. In addition, the proposed global attention layer gathers global information by integrating the segment features. Finally, the fusion attention layer fuses all segment-level features, and the classifier distinguishes the various scenes. Experiments on the DCASE 2018 and 2019 ASC datasets demonstrate the effectiveness of the proposed method.
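The segment-then-fuse pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration of the general idea only, not the authors' model: the segment length, feature dimensions, and the stand-in for the CNN feature extractor are all assumptions.

```python
import numpy as np

def split_into_segments(mel_spec, seg_len):
    """Split a mel-spectrogram (n_mels, n_frames) into fixed-length segments.

    Returns an array of shape (n_segments, n_mels, seg_len); trailing frames
    that do not fill a whole segment are dropped.
    """
    n_mels, n_frames = mel_spec.shape
    n_seg = n_frames // seg_len
    return (mel_spec[:, :n_seg * seg_len]
            .reshape(n_mels, n_seg, seg_len)
            .transpose(1, 0, 2))

def attention_fuse(seg_feats, w):
    """Fuse segment-level feature vectors with softmax attention weights."""
    scores = seg_feats @ w                   # one score per segment
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                     # weights sum to 1
    return alpha @ seg_feats                 # weighted sum -> clip-level feature

rng = np.random.default_rng(0)
mel = rng.standard_normal((40, 500))         # hypothetical: 40 mel bands, 500 frames
segs = split_into_segments(mel, seg_len=50)  # -> (10, 40, 50)
feats = segs.mean(axis=2)                    # stand-in for CNN segment features
fused = attention_fuse(feats, rng.standard_normal(40))
print(segs.shape, fused.shape)               # (10, 40, 50) (40,)
```

In the paper the segment features would come from the ICNN rather than a mean, and the attention parameters would be learned jointly with the classifier; the sketch only shows how attention lets the model weight scene-related segments more heavily than noisy or silent ones.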
Pengxu JIANG
Southeast University
Yue XIE
Southeast University
Cairong ZOU
Southeast University
Li ZHAO
Southeast University
Qingyun WANG
Nanjing Institute of Technology
Pengxu JIANG, Yue XIE, Cairong ZOU, Li ZHAO, Qingyun WANG, "An Integrated Convolutional Neural Network with a Fusion Attention Mechanism for Acoustic Scene Classification" in IEICE TRANSACTIONS on Fundamentals,
vol. E106-A, no. 8, pp. 1057-1061, August 2023, doi: 10.1587/transfun.2022EAL2091.
URL: https://global.ieice.org/en_transactions/fundamentals/10.1587/transfun.2022EAL2091/_p
@ARTICLE{e106-a_8_1057,
author={Pengxu JIANG and Yue XIE and Cairong ZOU and Li ZHAO and Qingyun WANG},
journal={IEICE TRANSACTIONS on Fundamentals},
title={An Integrated Convolutional Neural Network with a Fusion Attention Mechanism for Acoustic Scene Classification},
year={2023},
volume={E106-A},
number={8},
pages={1057-1061},
keywords={},
doi={10.1587/transfun.2022EAL2091},
ISSN={1745-1337},
month={August},}
TY - JOUR
TI - An Integrated Convolutional Neural Network with a Fusion Attention Mechanism for Acoustic Scene Classification
T2 - IEICE TRANSACTIONS on Fundamentals
SP - 1057
EP - 1061
AU - Pengxu JIANG
AU - Yue XIE
AU - Cairong ZOU
AU - Li ZHAO
AU - Qingyun WANG
PY - 2023
DO - 10.1587/transfun.2022EAL2091
JO - IEICE TRANSACTIONS on Fundamentals
SN - 1745-1337
VL - E106-A
IS - 8
JA - IEICE TRANSACTIONS on Fundamentals
Y1 - August 2023
ER -