The Comparison of Attention Mechanisms with Different Embedding Modes for Performance Improvement of Fine-Grained Classification

Wujian YE; Run TAN; Yijun LIU; Chin-Chen CHANG

doi:10.1587/transinf.2022DLP0006

Summary:

Fine-grained image classification is a key basic task in computer vision. Traditional deep convolutional neural networks (DCNNs) combined with attention mechanisms can focus on partial and local features of fine-grained images, but existing work gives little consideration to how the different attention modules are embedded in the network, which leads to unsatisfactory classification results. To address this problem, three attention mechanisms, the SE, CBAM and ECA modules, are introduced into DCNNs (such as ResNet and VGGNet) so that the network can better focus on the key local features of salient regions in the image. We also adopt three embedding modes for the attention modules, namely serial, residual and parallel, to further improve the performance of the classification model. The experimental results show that all three attention modules, combined with the three embedding modes, effectively improve DCNN performance. Moreover, compared with SE and ECA, CBAM has a stronger feature extraction capability. In particular, the parallel-embedded CBAM makes the local information attended to by the DCNN richer and more accurate and gives the best result, which is 1.98% and 1.57% higher than that of the original VGG16 and ResNet34 on the CUB-200-2011 dataset, respectively. The visualization analysis also indicates that the attention modules can be easily embedded into DCNNs, especially in the parallel mode, with stronger generality and universality.
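The summary names the three embedding modes but does not define them, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It uses an SE-style channel attention in NumPy and one plausible reading of each mode: serial applies attention to the block output, residual adds the attended output back to the block output, and parallel runs attention on the block input alongside the block. The function names (`se_attention`, `embed`), the weight shapes, and the toy `block` standing in for a convolutional stage are all hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def se_attention(x, w1, w2):
    """SE-style channel attention on a feature map x of shape (C, H, W).

    w1 has shape (C // r, C) and w2 has shape (C, C // r); both are
    hypothetical weights for the two fully connected layers.
    """
    squeezed = x.mean(axis=(1, 2))           # squeeze: global average pool -> (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)  # excitation, first layer + ReLU
    scale = sigmoid(w2 @ hidden)             # per-channel weights in [0, 1]
    return x * scale[:, None, None]          # reweight each channel


def embed(x, block, attention, mode):
    """Combine a network block with an attention module.

    The three definitions below are assumptions made for illustration;
    the summary does not specify how each mode is wired.
    """
    if mode == "serial":
        return attention(block(x))           # attention after the block
    if mode == "residual":
        y = block(x)
        return y + attention(y)              # skip connection around attention
    if mode == "parallel":
        return block(x) + attention(x)       # attention branch beside the block
    raise ValueError(f"unknown embedding mode: {mode}")


# Toy usage: an 8-channel 4x4 feature map and a shape-preserving stand-in block.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))
w1 = rng.standard_normal((2, 8))             # reduction ratio r = 4 (assumed)
w2 = rng.standard_normal((8, 2))
block = lambda t: np.maximum(t, 0.0)         # placeholder for a conv stage
attention = lambda t: se_attention(t, w1, w2)
outputs = {m: embed(x, block, attention, m) for m in ("serial", "residual", "parallel")}
```

In this reading, the parallel mode requires the block to preserve the input shape so that the two branches can be summed; a real network would need a projection wherever the block changes the channel count or resolution.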

Publication: IEICE TRANSACTIONS on Information Vol.E106-D No.5 pp.590-600

Publication Date: 2023/05/01

Publicized: 2021/12/22

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2022DLP0006

Type of Manuscript: Special Section PAPER (Special Section on Deep Learning Technologies: Architecture, Optimization, Techniques, and Applications)

Category: Core Methods

Authors

Wujian YE
  Guangdong University of Technology
Run TAN
  Guangdong University of Technology
Yijun LIU
  Guangdong University of Technology
Chin-Chen CHANG
  Feng Chia University

Keyword

fine-grained classification, attention block, embedding mode, attention visualization

Cite this

Wujian YE, Run TAN, Yijun LIU, Chin-Chen CHANG, "The Comparison of Attention Mechanisms with Different Embedding Modes for Performance Improvement of Fine-Grained Classification" in IEICE TRANSACTIONS on Information, vol. E106-D, no. 5, pp. 590-600, May 2023, doi: 10.1587/transinf.2022DLP0006.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2022DLP0006/_p

@ARTICLE{e106-d_5_590,
author={Wujian YE and Run TAN and Yijun LIU and Chin-Chen CHANG},
journal={IEICE TRANSACTIONS on Information},
title={The Comparison of Attention Mechanisms with Different Embedding Modes for Performance Improvement of Fine-Grained Classification},
year={2023},
volume={E106-D},
number={5},
pages={590-600},
keywords={fine-grained classification, attention block, embedding mode, attention visualization},
doi={10.1587/transinf.2022DLP0006},
ISSN={1745-1361},
month={May},}

TY - JOUR
TI - The Comparison of Attention Mechanisms with Different Embedding Modes for Performance Improvement of Fine-Grained Classification
T2 - IEICE TRANSACTIONS on Information
SP - 590
EP - 600
AU - Wujian YE
AU - Run TAN
AU - Yijun LIU
AU - Chin-Chen CHANG
PY - 2023
DO - 10.1587/transinf.2022DLP0006
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E106-D
IS - 5
JA - IEICE TRANSACTIONS on Information
Y1 - May 2023
ER -