IEICE TRANSACTIONS on Information

A CNN-Based Multi-Scale Pooling Strategy for Acoustic Scene Classification

Rong HUANG, Yue XIE

Summary:

Acoustic scene classification (ASC) is a fundamental task among artificial intelligence classification problems. ASC models are commonly based on convolutional neural networks (CNNs) that take log-Mel spectrograms as input to extract acoustic features. In this paper, we design a CNN-based multi-scale pooling (MSP) strategy for ASC. The log-Mel spectrogram used as the CNN input is partitioned into four segments along the frequency axis, and four CNN channels each receive the input from a distinct frequency range. The high-level features extracted from the different frequency bands are integrated through frequency pyramid average pooling layers at multiple levels. A softmax classifier then classifies the scenes. Tests on two acoustic datasets show that the designed model significantly improves classification performance.
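
The data flow described in the summary (band splitting, pyramid average pooling along frequency, softmax) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the CNN branches are replaced by an identity stand-in, and the pyramid levels (1, 2, 4), the toy spectrogram size, the example logits, and all function names are assumptions chosen only for illustration.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]  # rows are frequency bins, columns are time frames


def split_frequency_bands(spec: Matrix, n_bands: int = 4) -> List[Matrix]:
    """Split a (frequency x time) spectrogram into contiguous frequency segments."""
    n_freq = len(spec)
    if n_freq % n_bands != 0:
        raise ValueError("number of frequency bins must be divisible by n_bands")
    size = n_freq // n_bands
    return [spec[i * size:(i + 1) * size] for i in range(n_bands)]


def frequency_pyramid_avg_pool(fmap: Matrix,
                               levels: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Average-pool a map into `level` frequency regions per level and concatenate.

    The levels (1, 2, 4) are an assumption; the summary only says "multiple levels".
    """
    n_freq = len(fmap)
    if n_freq < max(levels):
        raise ValueError("need at least as many frequency bins as the finest level")
    pooled: List[float] = []
    for level in levels:
        for k in range(level):
            start = (k * n_freq) // level
            end = ((k + 1) * n_freq) // level
            values = [v for row in fmap[start:end] for v in row]
            pooled.append(sum(values) / len(values))
    return pooled


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over class scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy "log-Mel spectrogram": 16 frequency bins x 3 frames (hypothetical sizes).
spec = [[float(f)] * 3 for f in range(16)]
bands = split_frequency_bands(spec, n_bands=4)

# Identity stand-in for the four CNN channels: pool each band directly,
# then concatenate the per-band descriptors into one feature vector.
features = [x for band in bands for x in frequency_pyramid_avg_pool(band)]

# Hypothetical class scores; in a real model a trained layer would produce them.
probs = softmax([2.0, 1.0, 0.1])
```

With three pyramid levels each band yields 1 + 2 + 4 = 7 pooled values, so the four bands together give a 28-dimensional descriptor regardless of how many time frames the input has. In a trained model the pooling would act on CNN feature maps rather than on the raw spectrogram bands.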

Publication
IEICE TRANSACTIONS on Information Vol.E107-D No.1 pp.153-156
Publication Date
2024/01/01
Publicized
2023/10/17
Online ISSN
1745-1361
DOI
10.1587/transinf.2023EDL8048
Type of Manuscript
LETTER
Category
Speech and Hearing

Authors

Rong HUANG
  Nanjing University of Posts and Telecommunications
Yue XIE
  Nanjing Institute of Technology

Keyword