Video-based person re-identification (re-ID) aims at retrieving person across non-overlapping camera and has achieved promising results owing to deep convolutional neural network. Due to the dynamic properties of the video, the problems of background clutters and occlusion are more serious than image-based person Re-ID. In this letter, we present a novel triple attention network (TriANet) that simultaneously utilizes temporal, spatial, and channel context information by employing the self-attention mechanism to get robust and discriminative feature. Specifically, the network has two parts, where the first part introduces a residual attention subnetwork, which contains channel attention module to capture cross-dimension dependencies by using rotation and transformation and spatial attention module to focus on pedestrian feature. In the second part, a time attention module is designed to judge the quality score of each pedestrian, and to reduce the weight of the incomplete pedestrian image to alleviate the occlusion problem. We evaluate our proposed architecture on three datasets, iLIDS-VID, PRID2011 and MARS. Extensive comparative experimental results show that our proposed method achieves state-of-the-art results.
Rui SUN
Hefei University of Technology
Qili LIANG
Hefei University of Technology
Zi YANG
Hefei University of Technology
Zhenghui ZHAO
Hefei University of Technology
Xudong ZHANG
Hefei University of Technology
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Copy
Rui SUN, Qili LIANG, Zi YANG, Zhenghui ZHAO, Xudong ZHANG, "Triplet Attention Network for Video-Based Person Re-Identification" in IEICE TRANSACTIONS on Information,
vol. E104-D, no. 10, pp. 1775-1779, October 2021, doi: 10.1587/transinf.2021EDL8037.
Abstract: Video-based person re-identification (re-ID) aims at retrieving person across non-overlapping camera and has achieved promising results owing to deep convolutional neural network. Due to the dynamic properties of the video, the problems of background clutters and occlusion are more serious than image-based person Re-ID. In this letter, we present a novel triple attention network (TriANet) that simultaneously utilizes temporal, spatial, and channel context information by employing the self-attention mechanism to get robust and discriminative feature. Specifically, the network has two parts, where the first part introduces a residual attention subnetwork, which contains channel attention module to capture cross-dimension dependencies by using rotation and transformation and spatial attention module to focus on pedestrian feature. In the second part, a time attention module is designed to judge the quality score of each pedestrian, and to reduce the weight of the incomplete pedestrian image to alleviate the occlusion problem. We evaluate our proposed architecture on three datasets, iLIDS-VID, PRID2011 and MARS. Extensive comparative experimental results show that our proposed method achieves state-of-the-art results.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2021EDL8037/_p
Copy
@ARTICLE{e104-d_10_1775,
author={Rui SUN, Qili LIANG, Zi YANG, Zhenghui ZHAO, Xudong ZHANG, },
journal={IEICE TRANSACTIONS on Information},
title={Triplet Attention Network for Video-Based Person Re-Identification},
year={2021},
volume={E104-D},
number={10},
pages={1775-1779},
abstract={Video-based person re-identification (re-ID) aims at retrieving person across non-overlapping camera and has achieved promising results owing to deep convolutional neural network. Due to the dynamic properties of the video, the problems of background clutters and occlusion are more serious than image-based person Re-ID. In this letter, we present a novel triple attention network (TriANet) that simultaneously utilizes temporal, spatial, and channel context information by employing the self-attention mechanism to get robust and discriminative feature. Specifically, the network has two parts, where the first part introduces a residual attention subnetwork, which contains channel attention module to capture cross-dimension dependencies by using rotation and transformation and spatial attention module to focus on pedestrian feature. In the second part, a time attention module is designed to judge the quality score of each pedestrian, and to reduce the weight of the incomplete pedestrian image to alleviate the occlusion problem. We evaluate our proposed architecture on three datasets, iLIDS-VID, PRID2011 and MARS. Extensive comparative experimental results show that our proposed method achieves state-of-the-art results.},
keywords={},
doi={10.1587/transinf.2021EDL8037},
ISSN={1745-1361},
month={October},}
Copy
TY - JOUR
TI - Triplet Attention Network for Video-Based Person Re-Identification
T2 - IEICE TRANSACTIONS on Information
SP - 1775
EP - 1779
AU - Rui SUN
AU - Qili LIANG
AU - Zi YANG
AU - Zhenghui ZHAO
AU - Xudong ZHANG
PY - 2021
DO - 10.1587/transinf.2021EDL8037
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E104-D
IS - 10
JA - IEICE TRANSACTIONS on Information
Y1 - October 2021
AB - Video-based person re-identification (re-ID) aims at retrieving person across non-overlapping camera and has achieved promising results owing to deep convolutional neural network. Due to the dynamic properties of the video, the problems of background clutters and occlusion are more serious than image-based person Re-ID. In this letter, we present a novel triple attention network (TriANet) that simultaneously utilizes temporal, spatial, and channel context information by employing the self-attention mechanism to get robust and discriminative feature. Specifically, the network has two parts, where the first part introduces a residual attention subnetwork, which contains channel attention module to capture cross-dimension dependencies by using rotation and transformation and spatial attention module to focus on pedestrian feature. In the second part, a time attention module is designed to judge the quality score of each pedestrian, and to reduce the weight of the incomplete pedestrian image to alleviate the occlusion problem. We evaluate our proposed architecture on three datasets, iLIDS-VID, PRID2011 and MARS. Extensive comparative experimental results show that our proposed method achieves state-of-the-art results.
ER -