Attention-Guided Spatial Transformer Networks for Fine-Grained Visual Recognition

Dichao LIU; Yu WANG; Jien KATO

doi:10.1587/transinf.2019EDP7045

IEICE TRANSACTIONS on Information

Summary:

The aim of this paper is to propose effective attentional regions for fine-grained visual recognition. Based on the Spatial Transformers' capability of spatial manipulation within networks, we propose an extension model, the Attention-Guided Spatial Transformer Networks (AG-STNs). This model can guide the Spatial Transformers with hard-coded attentional regions at first. Then such guidance can be turned off, and the network model will adjust the region learning in terms of the location and scale. Such adjustment is conditioned to the classification loss so that it is actually optimized for better recognition results. With this model, we are able to successfully capture detailed attentional information. Also, the AG-STNs are able to capture attentional information in multiple levels, and different levels of attentional information are complementary to each other in our experiments. A fusion of them brings better results.
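
The summary describes regions that are adjusted "in terms of the location and scale", guided at first by hard-coded regions. As a rough illustration only, the sketch below assumes a three-parameter transform (one isotropic scale plus a 2-D translation) in normalised [-1, 1] image coordinates, and models the guidance switch as a plain choice between a hard-coded region and a predicted one. The names `region_to_theta`, `sampling_grid` and `select_theta` are hypothetical and do not come from the paper; the authors' actual parameterisation and guidance mechanism may differ.

```python
from typing import List, Tuple

# Hypothetical attention transform: (scale, tx, ty) in normalised [-1, 1] coordinates.
Theta = Tuple[float, float, float]


def region_to_theta(cx: float, cy: float, scale: float) -> Theta:
    """Encode a hard-coded region (centre and relative size) as transform parameters."""
    return (scale, cx, cy)


def sampling_grid(theta: Theta, out_h: int, out_w: int) -> List[List[Tuple[float, float]]]:
    """Map every output pixel to a source location.

    x_source = scale * x_target + tx
    y_source = scale * y_target + ty
    """
    s, tx, ty = theta
    grid = []
    for i in range(out_h):
        # Target coordinates are spread evenly over [-1, 1].
        y_t = -1.0 + 2.0 * i / (out_h - 1) if out_h > 1 else 0.0
        row = []
        for j in range(out_w):
            x_t = -1.0 + 2.0 * j / (out_w - 1) if out_w > 1 else 0.0
            row.append((s * x_t + tx, s * y_t + ty))
        grid.append(row)
    return grid


def select_theta(predicted: Theta, guide: Theta, guided: bool) -> Theta:
    """Assumed guidance switch: use the hard-coded region while guidance is on,
    and the network's own prediction once it is turned off."""
    return guide if guided else predicted


# A region centred in the upper half of the image, half the image size.
guide = region_to_theta(cx=0.0, cy=-0.5, scale=0.5)
identity = (1.0, 0.0, 0.0)  # stand-in for a network prediction

guided_grid = sampling_grid(select_theta(identity, guide, guided=True), 3, 3)
free_grid = sampling_grid(select_theta(identity, guide, guided=False), 3, 3)
```

In a real Spatial Transformer the grid would feed a differentiable (typically bilinear) sampler so that the classification loss can update the transform parameters; that part is omitted here.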

Publication: IEICE TRANSACTIONS on Information Vol.E102-D No.12 pp.2577-2586

Publication Date: 2019/12/01

Publicized: 2019/09/04

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2019EDP7045

Type of Manuscript: PAPER

Category: Image Recognition, Computer Vision

Authors

Dichao LIU
  Nagoya University
Yu WANG
  Ritsumeikan University
Jien KATO
  Ritsumeikan University

Keywords

recognition, attention, fine-grained, deep learning

Cite this

Dichao LIU, Yu WANG, Jien KATO, "Attention-Guided Spatial Transformer Networks for Fine-Grained Visual Recognition" in IEICE TRANSACTIONS on Information, vol. E102-D, no. 12, pp. 2577-2586, December 2019, doi: 10.1587/transinf.2019EDP7045.
Abstract: The aim of this paper is to propose effective attentional regions for fine-grained visual recognition. Based on the Spatial Transformers' capability of spatial manipulation within networks, we propose an extension model, the Attention-Guided Spatial Transformer Networks (AG-STNs). This model can guide the Spatial Transformers with hard-coded attentional regions at first. Then such guidance can be turned off, and the network model will adjust the region learning in terms of the location and scale. Such adjustment is conditioned to the classification loss so that it is actually optimized for better recognition results. With this model, we are able to successfully capture detailed attentional information. Also, the AG-STNs are able to capture attentional information in multiple levels, and different levels of attentional information are complementary to each other in our experiments. A fusion of them brings better results.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2019EDP7045/_p

@ARTICLE{e102-d_12_2577,
author={Dichao LIU and Yu WANG and Jien KATO},
journal={IEICE TRANSACTIONS on Information},
title={Attention-Guided Spatial Transformer Networks for Fine-Grained Visual Recognition},
year={2019},
volume={E102-D},
number={12},
pages={2577--2586},
abstract={The aim of this paper is to propose effective attentional regions for fine-grained visual recognition. Based on the Spatial Transformers' capability of spatial manipulation within networks, we propose an extension model, the Attention-Guided Spatial Transformer Networks (AG-STNs). This model can guide the Spatial Transformers with hard-coded attentional regions at first. Then such guidance can be turned off, and the network model will adjust the region learning in terms of the location and scale. Such adjustment is conditioned to the classification loss so that it is actually optimized for better recognition results. With this model, we are able to successfully capture detailed attentional information. Also, the AG-STNs are able to capture attentional information in multiple levels, and different levels of attentional information are complementary to each other in our experiments. A fusion of them brings better results.},
keywords={recognition, attention, fine-grained, deep learning},
doi={10.1587/transinf.2019EDP7045},
ISSN={1745-1361},
month={December},}

TY  - JOUR
TI  - Attention-Guided Spatial Transformer Networks for Fine-Grained Visual Recognition
T2  - IEICE TRANSACTIONS on Information
SP  - 2577
EP  - 2586
AU  - Dichao LIU
AU  - Yu WANG
AU  - Jien KATO
PY  - 2019
DO  - 10.1587/transinf.2019EDP7045
JO  - IEICE TRANSACTIONS on Information
SN  - 1745-1361
VL  - E102-D
IS  - 12
JA  - IEICE TRANSACTIONS on Information
Y1  - December 2019
AB  - The aim of this paper is to propose effective attentional regions for fine-grained visual recognition. Based on the Spatial Transformers' capability of spatial manipulation within networks, we propose an extension model, the Attention-Guided Spatial Transformer Networks (AG-STNs). This model can guide the Spatial Transformers with hard-coded attentional regions at first. Then such guidance can be turned off, and the network model will adjust the region learning in terms of the location and scale. Such adjustment is conditioned to the classification loss so that it is actually optimized for better recognition results. With this model, we are able to successfully capture detailed attentional information. Also, the AG-STNs are able to capture attentional information in multiple levels, and different levels of attentional information are complementary to each other in our experiments. A fusion of them brings better results.
ER  - 
