Triplet Attention Network for Video-Based Person Re-Identification

Rui SUN; Qili LIANG; Zi YANG; Zhenghui ZHAO; Xudong ZHANG

doi:10.1587/transinf.2021EDL8037

Summary:

Video-based person re-identification (re-ID) aims to retrieve a person across non-overlapping cameras and has achieved promising results owing to deep convolutional neural networks. Due to the dynamic properties of video, the problems of background clutter and occlusion are more serious than in image-based person re-ID. In this letter, we present a novel triple attention network (TriANet) that simultaneously utilizes temporal, spatial, and channel context information by employing the self-attention mechanism to obtain robust and discriminative features. Specifically, the network has two parts. The first part introduces a residual attention subnetwork, which contains a channel attention module that captures cross-dimension dependencies by using rotation and transformation, and a spatial attention module that focuses on pedestrian features. In the second part, a time attention module is designed to judge the quality score of each pedestrian image and to reduce the weight of incomplete pedestrian images, which alleviates the occlusion problem. We evaluate the proposed architecture on three datasets: iLIDS-VID, PRID2011, and MARS. Extensive comparative experiments show that the proposed method achieves state-of-the-art results.
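The summary says the time attention module scores each pedestrian image and down-weights incomplete ones, but it does not give the exact formulation. The following is a minimal sketch of one common way to do this, assuming the per-frame quality scores are normalized with a softmax over time and used as weights in a weighted average of frame features. The function name `temporal_attention_pool`, the feature vectors, and the score values are all hypothetical and are not taken from the letter.

```python
import math


def temporal_attention_pool(frame_features, quality_logits):
    """Fuse per-frame feature vectors into a single clip-level vector.

    frame_features: list of T equal-length lists of floats
        (hypothetical per-frame embeddings).
    quality_logits: list of T floats
        (hypothetical unnormalized quality scores, one per frame).
    Returns (pooled_feature, weights).
    """
    if not frame_features or len(frame_features) != len(quality_logits):
        raise ValueError("need at least one frame and one score per frame")

    # Softmax over the time axis; subtracting the max keeps exp() stable.
    max_logit = max(quality_logits)
    exps = [math.exp(q - max_logit) for q in quality_logits]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of frame features: low-quality frames contribute less.
    dim = len(frame_features[0])
    pooled = [
        sum(w * feat[d] for w, feat in zip(weights, frame_features))
        for d in range(dim)
    ]
    return pooled, weights


# Toy clip: the third frame stands in for an occluded image (assumed values).
frames = [[1.0, 0.0], [1.0, 0.2], [0.0, 5.0]]
logits = [2.0, 2.0, -2.0]
pooled, weights = temporal_attention_pool(frames, logits)
```

In this toy clip the third frame receives a much smaller weight than the other two, so the pooled feature stays close to the two unoccluded frames instead of being pulled toward the outlier, as a plain average would be.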

Publication: IEICE TRANSACTIONS on Information Vol.E104-D No.10 pp.1775-1779

Publication Date: 2021/10/01

Publicized: 2021/07/21

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2021EDL8037

Type of Manuscript: LETTER

Category: Image Recognition, Computer Vision

Authors

Rui SUN
  Hefei University of Technology
Qili LIANG
  Hefei University of Technology
Zi YANG
  Hefei University of Technology
Zhenghui ZHAO
  Hefei University of Technology
Xudong ZHANG
  Hefei University of Technology

Keywords

person re-identification, convolutional neural network, attention mechanism, deep learning

Cite this

Rui SUN, Qili LIANG, Zi YANG, Zhenghui ZHAO, Xudong ZHANG, "Triplet Attention Network for Video-Based Person Re-Identification" in IEICE TRANSACTIONS on Information, vol. E104-D, no. 10, pp. 1775-1779, October 2021, doi: 10.1587/transinf.2021EDL8037.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2021EDL8037/_p

@ARTICLE{e104-d_10_1775,
author={Rui SUN and Qili LIANG and Zi YANG and Zhenghui ZHAO and Xudong ZHANG},
journal={IEICE TRANSACTIONS on Information},
title={Triplet Attention Network for Video-Based Person Re-Identification},
year={2021},
volume={E104-D},
number={10},
pages={1775--1779},
abstract={Video-based person re-identification (re-ID) aims to retrieve a person across non-overlapping cameras and has achieved promising results owing to deep convolutional neural networks. Due to the dynamic properties of video, the problems of background clutter and occlusion are more serious than in image-based person re-ID. In this letter, we present a novel triple attention network (TriANet) that simultaneously utilizes temporal, spatial, and channel context information by employing the self-attention mechanism to obtain robust and discriminative features. Specifically, the network has two parts. The first part introduces a residual attention subnetwork, which contains a channel attention module that captures cross-dimension dependencies by using rotation and transformation, and a spatial attention module that focuses on pedestrian features. In the second part, a time attention module is designed to judge the quality score of each pedestrian image and to reduce the weight of incomplete pedestrian images, which alleviates the occlusion problem. We evaluate the proposed architecture on three datasets: iLIDS-VID, PRID2011, and MARS. Extensive comparative experiments show that the proposed method achieves state-of-the-art results.},
keywords={person re-identification, convolutional neural network, attention mechanism, deep learning},
doi={10.1587/transinf.2021EDL8037},
ISSN={1745-1361},
month={October},
}

TY - JOUR
TI - Triplet Attention Network for Video-Based Person Re-Identification
T2 - IEICE TRANSACTIONS on Information
SP - 1775
EP - 1779
AU - Rui SUN
AU - Qili LIANG
AU - Zi YANG
AU - Zhenghui ZHAO
AU - Xudong ZHANG
PY - 2021
DO - 10.1587/transinf.2021EDL8037
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E104-D
IS - 10
JA - IEICE TRANSACTIONS on Information
Y1 - 2021/10/01
AB - Video-based person re-identification (re-ID) aims to retrieve a person across non-overlapping cameras and has achieved promising results owing to deep convolutional neural networks. Due to the dynamic properties of video, the problems of background clutter and occlusion are more serious than in image-based person re-ID. In this letter, we present a novel triple attention network (TriANet) that simultaneously utilizes temporal, spatial, and channel context information by employing the self-attention mechanism to obtain robust and discriminative features. Specifically, the network has two parts. The first part introduces a residual attention subnetwork, which contains a channel attention module that captures cross-dimension dependencies by using rotation and transformation, and a spatial attention module that focuses on pedestrian features. In the second part, a time attention module is designed to judge the quality score of each pedestrian image and to reduce the weight of incomplete pedestrian images, which alleviates the occlusion problem. We evaluate the proposed architecture on three datasets: iLIDS-VID, PRID2011, and MARS. Extensive comparative experiments show that the proposed method achieves state-of-the-art results.
ER -