Acoustic scene classification (ASC) is an important research area in human-computer interaction. Real-world recordings often contain considerable noise and silent segments, which makes it difficult for earlier ASC approaches to isolate the scene-relevant information in the audio. Moreover, scene information may be scattered across many audio frames, so selecting scene-related frames is crucial for ASC. To address this, an integrated convolutional neural network with a fusion attention mechanism (ICNN-FA) is proposed for ASC. First, segmented mel-spectrograms are used as the input of the ICNN, helping the model learn short-term time-frequency correlations. The ICNN then learns segment-level features from these inputs. In addition, the proposed global attention layer gathers global information by integrating the segment features. Finally, the fusion attention layer fuses all segment-level features, and the classifier distinguishes the various scenes. Experiments on the DCASE 2018 and 2019 ASC datasets demonstrate the effectiveness of the proposed method.
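The segment-then-fuse pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration of the general idea only, not the authors' model: the segment length, feature dimensions, and the stand-in for the CNN feature extractor are all assumptions.

```python
import numpy as np

def split_into_segments(mel_spec, seg_len):
    """Split a mel-spectrogram (n_mels, n_frames) into fixed-length segments.

    Returns an array of shape (n_segments, n_mels, seg_len); trailing frames
    that do not fill a whole segment are dropped.
    """
    n_mels, n_frames = mel_spec.shape
    n_seg = n_frames // seg_len
    return (mel_spec[:, :n_seg * seg_len]
            .reshape(n_mels, n_seg, seg_len)
            .transpose(1, 0, 2))

def attention_fuse(seg_feats, w):
    """Fuse segment-level feature vectors with softmax attention weights."""
    scores = seg_feats @ w                   # one score per segment
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                     # weights sum to 1
    return alpha @ seg_feats                 # weighted sum -> clip-level feature

rng = np.random.default_rng(0)
mel = rng.standard_normal((40, 500))         # hypothetical: 40 mel bands, 500 frames
segs = split_into_segments(mel, seg_len=50)  # -> (10, 40, 50)
feats = segs.mean(axis=2)                    # stand-in for CNN segment features
fused = attention_fuse(feats, rng.standard_normal(40))
print(segs.shape, fused.shape)               # (10, 40, 50) (40,)
```

In the paper the segment features would come from the ICNN rather than a mean, and the attention parameters would be learned jointly with the classifier; the sketch only shows how attention lets the model weight scene-related segments more heavily than noisy or silent ones.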
Pengxu JIANG
Southeast University
Yue XIE
Southeast University
Cairong ZOU
Southeast University
Li ZHAO
Southeast University
Qingyun WANG
Nanjing Institute of Technology
Pengxu JIANG, Yue XIE, Cairong ZOU, Li ZHAO, Qingyun WANG, "An Integrated Convolutional Neural Network with a Fusion Attention Mechanism for Acoustic Scene Classification" in IEICE TRANSACTIONS on Fundamentals,
vol. E106-A, no. 8, pp. 1057-1061, August 2023, doi: 10.1587/transfun.2022EAL2091.
URL: https://global.ieice.org/en_transactions/fundamentals/10.1587/transfun.2022EAL2091/_p
@ARTICLE{e106-a_8_1057,
author={Pengxu JIANG and Yue XIE and Cairong ZOU and Li ZHAO and Qingyun WANG},
journal={IEICE TRANSACTIONS on Fundamentals},
title={An Integrated Convolutional Neural Network with a Fusion Attention Mechanism for Acoustic Scene Classification},
year={2023},
volume={E106-A},
number={8},
pages={1057-1061},
keywords={},
doi={10.1587/transfun.2022EAL2091},
ISSN={1745-1337},
month={August},}
TY - JOUR
TI - An Integrated Convolutional Neural Network with a Fusion Attention Mechanism for Acoustic Scene Classification
T2 - IEICE TRANSACTIONS on Fundamentals
SP - 1057
EP - 1061
AU - Pengxu JIANG
AU - Yue XIE
AU - Cairong ZOU
AU - Li ZHAO
AU - Qingyun WANG
PY - 2023
DO - 10.1587/transfun.2022EAL2091
JO - IEICE TRANSACTIONS on Fundamentals
SN - 1745-1337
VL - E106-A
IS - 8
JA - IEICE TRANSACTIONS on Fundamentals
Y1 - August 2023
ER -