Although Siamese trackers have attracted much attention in recent years due to their scalability and efficiency, most ignore background appearance, which limits their ability to recognize arbitrary target objects under various appearance variations, especially in complex scenarios with background clutter and distractors. In this paper, we present a simple yet effective Siamese tracker, in which shifted-windows multi-head self-attention is employed to learn the characteristics of a specific given target object for visual tracking. To validate the effectiveness of the proposed tracker, we use the Swin Transformer as the backbone network and introduce an auxiliary feature enhancement network. Extensive experimental results on two evaluation datasets demonstrate that the proposed tracker outperforms other baselines.
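The shifted-windows self-attention mentioned above is the core operation of the Swin Transformer: attention is computed within non-overlapping local windows, and alternate layers cyclically shift the feature map by half a window so that tokens on window borders can interact. The paper does not include code; the following is a minimal NumPy sketch of only the window partitioning and cyclic shift (the attention itself and the window size of 4 are illustrative assumptions, not the authors' implementation).

```python
import numpy as np

def window_partition(x, win):
    """Split an (H, W, C) feature map into non-overlapping (win, win, C) windows."""
    H, W, C = x.shape
    assert H % win == 0 and W % win == 0, "map must tile exactly into windows"
    x = x.reshape(H // win, win, W // win, win, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, win, win, C)

def shifted_window_partition(x, win):
    """Cyclically shift the map by win//2 in both spatial axes before
    partitioning, so the next attention layer mixes tokens that sat on
    window borders in the previous layer."""
    shifted = np.roll(x, shift=(-(win // 2), -(win // 2)), axis=(0, 1))
    return window_partition(shifted, win)

# Toy 8x8 single-channel feature map, window size 4 -> 2x2 = 4 windows.
feat = np.arange(8 * 8, dtype=np.float32).reshape(8, 8, 1)
regular = window_partition(feat, 4)
shifted = shifted_window_partition(feat, 4)
print(regular.shape, shifted.shape)  # (4, 4, 4, 1) (4, 4, 4, 1)
```

In a full Swin block, multi-head self-attention would then run independently inside each of the returned windows, with an attention mask hiding the wrap-around tokens introduced by `np.roll`.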
Peng GAO
Qufu Normal University
Xin-Yue ZHANG
Qufu Normal University
Xiao-Li YANG
Qufu Normal University
Jian-Cheng NI
Qufu Normal University
Fei WANG
Harbin Institute of Technology
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Peng GAO, Xin-Yue ZHANG, Xiao-Li YANG, Jian-Cheng NI, Fei WANG, "Robust Visual Tracking Using Hierarchical Vision Transformer with Shifted Windows Multi-Head Self-Attention" in IEICE TRANSACTIONS on Information,
vol. E107-D, no. 1, pp. 161-164, January 2024, doi: 10.1587/transinf.2023EDL8053.
Abstract: Although Siamese trackers have attracted much attention in recent years due to their scalability and efficiency, most ignore background appearance, which limits their ability to recognize arbitrary target objects under various appearance variations, especially in complex scenarios with background clutter and distractors. In this paper, we present a simple yet effective Siamese tracker, in which shifted-windows multi-head self-attention is employed to learn the characteristics of a specific given target object for visual tracking. To validate the effectiveness of the proposed tracker, we use the Swin Transformer as the backbone network and introduce an auxiliary feature enhancement network. Extensive experimental results on two evaluation datasets demonstrate that the proposed tracker outperforms other baselines.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2023EDL8053/_p
@ARTICLE{e107-d_1_161,
author={Peng GAO and Xin-Yue ZHANG and Xiao-Li YANG and Jian-Cheng NI and Fei WANG},
journal={IEICE TRANSACTIONS on Information},
title={Robust Visual Tracking Using Hierarchical Vision Transformer with Shifted Windows Multi-Head Self-Attention},
year={2024},
volume={E107-D},
number={1},
pages={161-164},
abstract={Although Siamese trackers have attracted much attention in recent years due to their scalability and efficiency, most ignore background appearance, which limits their ability to recognize arbitrary target objects under various appearance variations, especially in complex scenarios with background clutter and distractors. In this paper, we present a simple yet effective Siamese tracker, in which shifted-windows multi-head self-attention is employed to learn the characteristics of a specific given target object for visual tracking. To validate the effectiveness of the proposed tracker, we use the Swin Transformer as the backbone network and introduce an auxiliary feature enhancement network. Extensive experimental results on two evaluation datasets demonstrate that the proposed tracker outperforms other baselines.},
keywords={},
doi={10.1587/transinf.2023EDL8053},
ISSN={1745-1361},
month={January},}
TY - JOUR
TI - Robust Visual Tracking Using Hierarchical Vision Transformer with Shifted Windows Multi-Head Self-Attention
T2 - IEICE TRANSACTIONS on Information
SP - 161
EP - 164
AU - Peng GAO
AU - Xin-Yue ZHANG
AU - Xiao-Li YANG
AU - Jian-Cheng NI
AU - Fei WANG
PY - 2024
DO - 10.1587/transinf.2023EDL8053
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E107-D
IS - 1
JA - IEICE TRANSACTIONS on Information
Y1 - January 2024
AB - Although Siamese trackers have attracted much attention in recent years due to their scalability and efficiency, most ignore background appearance, which limits their ability to recognize arbitrary target objects under various appearance variations, especially in complex scenarios with background clutter and distractors. In this paper, we present a simple yet effective Siamese tracker, in which shifted-windows multi-head self-attention is employed to learn the characteristics of a specific given target object for visual tracking. To validate the effectiveness of the proposed tracker, we use the Swin Transformer as the backbone network and introduce an auxiliary feature enhancement network. Extensive experimental results on two evaluation datasets demonstrate that the proposed tracker outperforms other baselines.
ER -