Peng GAO Xin-Yue ZHANG Xiao-Li YANG Jian-Cheng NI Fei WANG
Although Siamese trackers have attracted much attention in recent years owing to their scalability and efficiency, they largely ignore background appearance, which limits their ability to recognize arbitrary target objects under diverse variations, especially in complex scenarios with background clutter and distractors. In this paper, we present a simple yet effective Siamese tracker in which shifted-window multi-head self-attention is introduced to learn the characteristics of the specific given target object for visual tracking. To validate the effectiveness of the proposed tracker, we use the Swin Transformer as the backbone network and introduce an auxiliary feature enhancement network. Extensive experimental results on two evaluation datasets demonstrate that the proposed tracker outperforms other baselines.
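As a minimal illustration of the shifted-window attention that a Swin-style backbone relies on, the NumPy sketch below partitions a feature map into local windows, runs self-attention inside each window, and optionally applies a cyclic shift so that windows straddle the previous partition boundaries. The Q/K/V projections are omitted (identity) and the window size and head count are hypothetical, so this is a sketch of the mechanism, not the tracker's actual backbone.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def window_mhsa(x, num_heads=4, window=7, shift=0):
    # x: (H, W, C) feature map; H, W multiples of `window`,
    # C divisible by num_heads (simplifying assumptions).
    H, W, C = x.shape
    d = C // num_heads
    if shift:  # cyclic shift so new windows straddle old window borders
        x = np.roll(x, (-shift, -shift), axis=(0, 1))
    # partition into non-overlapping windows -> (n_windows, window*window, C)
    xw = x.reshape(H // window, window, W // window, window, C)
    xw = xw.transpose(0, 2, 1, 3, 4).reshape(-1, window * window, C)
    out = np.empty_like(xw)
    for h in range(num_heads):  # per-head attention (identity Q/K/V projections)
        q = k = v = xw[..., h * d:(h + 1) * d]
        attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d))
        out[..., h * d:(h + 1) * d] = attn @ v
    # merge windows back to (H, W, C) and undo the shift
    out = out.reshape(H // window, W // window, window, window, C)
    out = out.transpose(0, 2, 1, 3, 4).reshape(H, W, C)
    return np.roll(out, (shift, shift), axis=(0, 1)) if shift else out
```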
Chenchen MENG Jun WANG Chengzhi DENG Yuanyun WANG Shengqian WANG
Feature representation is a key component of most visual tracking algorithms. Low-level hand-crafted features have weak representation capacity and therefore struggle with complex appearance changes. In this paper, we propose a novel tracking algorithm that combines joint dictionary pair learning with convolutional neural networks (CNNs). We utilize a CNN model trained on ImageNet-Vid to extract target features. The CNN consists of three convolutional layers and two fully connected layers, and the dictionary pair learning follows the second fully connected layer. The joint dictionary pair is learned on the deep features extracted by the trained CNN model, so that temporal variations of the target appearance are captured during dictionary learning. We then use the learned dictionaries to encode target candidates, representing each candidate as a linear combination of atoms in the learned dictionary. Extensive experimental evaluations on OTB2015 demonstrate superior performance against state-of-the-art trackers.
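Since each target candidate is represented as a linear combination of dictionary atoms, the following sketch shows one standard way such codes can be computed: ISTA iterations for a lasso objective over a learned dictionary. The regularization weight and iteration count are hypothetical, and the paper's exact dictionary-pair objective is not reproduced here.

```python
import numpy as np

def encode_candidate(D, y, lam=0.1, n_iter=200):
    # D: (feature_dim, n_atoms) dictionary learned on deep features
    # y: (feature_dim,) candidate feature; returns alpha with y ~ D @ alpha
    # ISTA iterations for min_a 0.5*||y - D a||^2 + lam*||a||_1
    L = np.linalg.norm(D, 2) ** 2        # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = a - D.T @ (D @ a - y) / L    # gradient step
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return a
```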
Xin ZENG Lin ZHANG Zhongqiang LUO Xingzhong XIONG Chengjie LI
In recent years visual tracking has advanced steadily, yet some methods still suffer from low tracking accuracy and success rates, while more accurate trackers tend to cost more time. To address this trade-off, we propose a reinforced tracker based on Hierarchical Convolutional Features (HCF for short). HOG, color-naming, and grayscale features are combined with different weights to supplement the convolutional features, which enhances tracking robustness. At the same time, we improve the model update strategy to reduce the time cost. The resulting tracker is called RHCF, and the code is published at https://github.com/z15846/RHCF. Experiments on the OTB2013 dataset show that our tracker achieves clear gains in both precision and success rate.
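A minimal sketch of the weighted feature fusion described above: per-feature correlation response maps (convolutional, HOG, color-naming, grayscale) are normalized and blended with fixed weights before the peak is taken as the target location. The weights are placeholders, not the values used by RHCF.

```python
import numpy as np

def fuse_responses(responses, weights):
    # responses: list of (H, W) response maps from convolutional, HOG,
    # color-naming and grayscale features; weights are placeholder values.
    w = np.asarray(weights, dtype=float)
    w /= w.sum()                                     # normalize the weights
    fused = sum(wi * r / (r.max() + 1e-12) for wi, r in zip(w, responses))
    return np.unravel_index(np.argmax(fused), fused.shape)   # peak location
```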
Ying KANG Cong LIU Ning WANG Dianxi SHI Ning ZHOU Mengmeng LI Yunlong WU
Siamese visual tracking, viewed as a problem of maximum-similarity matching against a target template, has attracted increasing attention in computer vision. However, current Siamese trackers struggle to balance accuracy in real-time tracking against robustness in long-term tracking. This work proposes a new Siamese tracker with a dual-pipeline correlated fusion network (named ADF-SiamRPN), which consists of an initial template for robust correlation and a transient template with adaptive feature selection for accurate correlation. A learnable correlation-response fusion network then combines the two pipelines, yielding an overall improvement in tracking performance. To compare ADF-SiamRPN with state-of-the-art trackers, we conduct extensive experiments on the OTB100, UAV123, VOT2016, VOT2018, GOT-10k, LaSOT, and TrackingNet benchmarks. The results demonstrate that ADF-SiamRPN outperforms all compared trackers and achieves the best balance between accuracy and robustness.
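The dual-pipeline idea can be illustrated as two cross-correlations, one against the fixed initial template and one against the transient template, whose response maps are then fused. In the sketch below the learnable fusion network is replaced by a single scalar weight alpha, which is a hypothetical stand-in.

```python
from scipy.signal import correlate2d

def dual_template_response(z_init, z_trans, x, alpha=0.6):
    # z_init, z_trans: (h, w, C) initial/transient template features (h<=H, w<=W)
    # x: (H, W, C) search-region features; alpha stands in for the
    # learnable response-fusion network (hypothetical scalar weight)
    def xcorr(z):  # sum of per-channel cross-correlations
        return sum(correlate2d(x[..., c], z[..., c], mode='valid')
                   for c in range(x.shape[-1]))
    return alpha * xcorr(z_init) + (1.0 - alpha) * xcorr(z_trans)
```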
Recently, visual trackers based on the kernelized correlation filter (KCF) framework have achieved robust and accurate results. These trackers must learn information about the object from each frame, so changes in the object's state affect tracking performance. To deal with such state changes, we propose a novel KCF tracker that uses the filter response map, namely a confidence map, together with an adaptive model. The method first applies a skipped scale-pool scheme that varies the window size every two frames. Second, the location of the object is estimated by combining the filter response with the similarity of the luminance histogram at multiple points in the confidence map. Moreover, we re-detect the object at the multiple peaks of the confidence map to prevent target drift and reduce the influence of illumination. Third, the learning rate used to update the object model is adjusted according to the filter response and the luminance-histogram similarity, taking the state of the object into account. Experiments show that the proposed tracker (CFCA) achieves outstanding performance on the challenging OTB2013 and OTB2015 benchmarks.
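The sketch below illustrates the second step: candidate peaks of the confidence map are re-ranked by combining the filter response with the luminance-histogram similarity to the template. The weighting beta and the histogram binning are assumptions, not the paper's exact settings.

```python
import numpy as np

def rescore_peaks(response, frame_gray, peaks, tmpl_hist, box, beta=0.5):
    # peaks: list of (row, col) candidate locations in the confidence map;
    # tmpl_hist: 32-bin normalized luminance histogram of the template;
    # box: (h, w) of the target; beta weights the two cues (hypothetical).
    h, w = box
    scores = []
    for r, c in peaks:
        patch = frame_gray[r:r + h, c:c + w]
        hist, _ = np.histogram(patch, bins=32, range=(0, 256))
        hist = hist / max(patch.size, 1)
        sim = np.minimum(hist, tmpl_hist).sum()    # histogram intersection
        scores.append(beta * response[r, c] + (1 - beta) * sim)
    return peaks[int(np.argmax(scores))]           # best re-ranked peak
```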
Yao GE Rui CHEN Ying TONG Xuehong CAO Ruiyu LIANG
We combine a Siamese network with a recurrent regression network, proposing a two-stage tracking framework termed SiamReg. Our method addresses the problem that the classic Siamese network cannot judge the target size precisely, and it simplifies the regression procedure in both training and testing. We perform experiments on three challenging tracking datasets: VOT2016, OTB100, and VOT2018. The results indicate that, after offline training, SiamReg obtains a higher expected average overlap measure.
Zuopeng ZHAO Hongda ZHANG Yi LIU Nana ZHOU Han ZHENG Shanyi SUN Xiaoman LI Sili XIA
Although correlation filter-based trackers have demonstrated excellent performance in visual object tracking, several challenges remain. In this work, we propose a novel tracker built on the correlation filter framework. Traditional trackers have difficulty adapting accurately to changes in target scale when the target moves quickly, so we suggest a scale-adaptive scheme based on predicted scales. We also incorporate a speed-based adaptive model update method to further improve overall tracking performance. Experiments on samples from the OTB100 and KITTI datasets demonstrate that our method outperforms existing state-of-the-art tracking algorithms in fast-motion scenes.
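A minimal sketch of a speed-based adaptive update: the inter-frame displacement of the target scales the learning rate of a linearly interpolated appearance model, so fast motion triggers faster adaptation. All constants are hypothetical, not the paper's values.

```python
import numpy as np

def adaptive_update(model, new_obs, prev_pos, cur_pos, base_lr=0.02, k=0.05):
    # Inter-frame speed (pixels/frame) raises the learning rate so the
    # appearance model adapts quickly under fast motion.
    speed = np.linalg.norm(np.subtract(cur_pos, prev_pos))
    lr = min(base_lr * (1.0 + k * speed), 0.25)    # capped blending rate
    return (1.0 - lr) * model + lr * new_obs
```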
Tiansa ZHANG Chunlei HUO Zhiqiang ZHOU Bo WANG
By taking advantage of deep learning and reinforcement learning, ADNet (Action Decision Network) outperforms other approaches. However, its speed and performance are still limited by factors such as unreliable confidence-score estimation and redundant historical actions. To address these limitations, a faster and more accurate approach named Faster-ADNet is proposed in this paper. By optimizing the tracking process with a status re-identification network, the proposed approach is more efficient and six times faster than ADNet. At the same time, accuracy and stability are enhanced by removing redundant historical actions. Experiments demonstrate the advantages of Faster-ADNet.
Chenggang GUO Dongyi CHEN Zhiqi HUANG
Sparse representation has been successfully applied to visual tracking. Recent progress in sparse tracking has mainly been made within the particle filter framework. However, most sparse trackers need to extract complex feature representations for each particle in a limited sample space, leading to expensive computation and inferior tracking performance. To deal with these issues, we propose a novel sparse tracking method based on the circulant reverse lasso model. Benefiting from the properties of circulant matrices, densely sampled target candidates are implicitly generated by cyclically shifting the base feature descriptors and then embedded as a dictionary into a reverse sparse reconstruction model to encode a robust appearance template. The alternating direction method of multipliers is employed to solve the reverse sparse model, and the optimization can be carried out efficiently in the frequency domain, which enables the proposed tracker to run in real time. The resulting sparse coefficient map represents the similarity scores between the template and the circularly shifted samples, so the target location can be predicted directly from the coordinates of the peak coefficient. A scale-aware template updating strategy is combined with correlation filter template learning to account for both appearance deformations and scale variations. Quantitative and qualitative evaluations on two challenging tracking benchmarks demonstrate that the proposed algorithm performs favorably against several state-of-the-art sparse representation-based tracking methods.
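The key computational trick is that correlating a template against all cyclic shifts of a base feature reduces to an element-wise product in the frequency domain. The one-dimensional sketch below shows only this circulant property; the full method embeds the shifted samples as a dictionary in a reverse sparse model solved by ADMM, which is not reproduced here.

```python
import numpy as np

def circulant_similarity(template, base_feature):
    # Scores the template against every cyclic shift of the 1-D base
    # feature with one frequency-domain product (cross-correlation theorem);
    # the peak index is the predicted shift of the target.
    scores = np.real(np.fft.ifft(np.conj(np.fft.fft(base_feature))
                                 * np.fft.fft(template)))
    return int(np.argmax(scores))
```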
Jiatian PI Shaohua ZENG Qing ZUO Yan WEI
Visual tracking has been studied for several decades but continues to draw significant attention because of its critical role in many applications. This letter handles the problem of the fixed template size in the Kernelized Correlation Filter (KCF) tracker with no significant decrease in speed. Extensive experiments are performed on the new OTB dataset.
Shan JIANG Cheng HAN Xiaoqiang DI
Sparse representation has been widely applied to visual tracking for several years. In the sparse representation framework, the tracking problem is converted into an L1-minimization problem. However, during tracking, the appearance of the target is affected by the external environment. We therefore propose a robust tracking algorithm that combines traditional sparse representation with a particle filter framework. First, we obtain the observation image set from the particle filter. Second, we apply a 2D transformation to the observation image set, which makes the set of target candidates more robust to misalignment in complex scenes. Moreover, we adopt an occlusion detection mechanism before template updating, which effectively reduces drift. Experimental evaluations on five public challenging sequences exhibiting occlusion, illumination variation, scale change, and motion blur demonstrate that our tracker is accurate and robust in comparison with the state of the art.
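One common way to implement an occlusion check before template updating is to reconstruct the tracked patch from the template set and skip the update when the residual is large, so the occluder is never learned into the model. The sketch below follows this idea with a least-squares reconstruction and a hypothetical threshold; the paper's exact criterion may differ.

```python
import numpy as np

def should_update_template(candidate, templates, occl_thresh=0.2):
    # candidate: (h, w) tracked patch; templates: (n, h, w) template set.
    # Large least-squares reconstruction residual => likely occlusion,
    # so the template update is skipped. The threshold is hypothetical.
    T = templates.reshape(templates.shape[0], -1).T     # (h*w, n)
    y = candidate.ravel().astype(float)
    coef, *_ = np.linalg.lstsq(T, y, rcond=None)
    residual = np.linalg.norm(y - T @ coef) / (np.linalg.norm(y) + 1e-12)
    return residual < occl_thresh
```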
Jun WANG Yuanyun WANG Chengzhi DENG Shengqian WANG Yong QIN
Developing a robust appearance model is challenging because of appearance variations such as partial occlusion, illumination variation, rotation, and background clutter. Existing tracking algorithms represent target appearances as linear combinations of target templates, which is not accurate enough to cope with such variations: the underlying relationship between target candidates and target templates is highly nonlinear. To address this, this paper presents a regularized kernel representation for visual tracking. The feature vectors of target appearances are mapped into a higher-dimensional feature space, in which a target candidate is approximately represented by a nonlinear combination of target templates. The kernel-based appearance model thus captures the nonlinear relationship and the nonlinear similarity between target candidates and target templates, while l2 regularization on the coding coefficients makes the approximate target representation more stable. Comprehensive experiments demonstrate superior performance in comparison with state-of-the-art trackers.
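The l2-regularized kernel coding described above has a closed-form solution. Assuming a Gaussian kernel, the sketch below solves (K + lam*I)c = k for the coefficients and scores a candidate by its reconstruction error in the mapped feature space; lam and the kernel width are hypothetical.

```python
import numpy as np

def kernel_code(templates, candidate, lam=0.01, sigma=1.0):
    # templates: (n, d) feature vectors; candidate: (d,) feature vector.
    # Solves min_c ||phi(y) - Phi(T) c||^2 + lam*||c||^2 in closed form:
    # c = (K + lam*I)^{-1} k  (Gaussian kernel; lam, sigma hypothetical).
    def rbf(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    K = rbf(templates, templates)                   # template Gram matrix
    k = rbf(templates, candidate[None, :])[:, 0]    # kernel vector to candidate
    c = np.linalg.solve(K + lam * np.eye(len(K)), k)
    err = 1.0 - k @ c   # reconstruction error in feature space (k(y,y) = 1),
    return c, err       # up to the regularization term
```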
Yulong XU Yang LI Jiabao WANG Zhuang MIAO Hang LI Yafei ZHANG Gang TAO
The feature extractor is an important component of a tracker, and convolutional neural networks (CNNs) have demonstrated excellent performance in visual tracking. However, CNN features do not perform well under low illumination. To address this issue, we propose a novel deep correlation tracker with backtracking, which consists of target translation, backtracking, and scale estimation. We employ four correlation filters, one with a histogram of oriented gradients (HOG) descriptor and the other three with CNN features, to estimate the translation. In particular, we propose a backtracking algorithm to reconfirm the translation location. Comprehensive experiments on a large-scale challenging benchmark dataset show that the proposed algorithm outperforms state-of-the-art methods in accuracy and robustness.
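The backtracking step can be understood as a forward-backward consistency check: track the newly estimated location back to the previous frame and accept the translation only if it returns close to the old position. The sketch below expresses this general idea with a generic track_fn and a hypothetical pixel tolerance; the paper's concrete reconfirmation rule is not reproduced.

```python
import numpy as np

def confirm_translation(track_fn, prev_frame, cur_frame,
                        prev_pos, new_pos, max_err=4.0):
    # track_fn(frame_a, frame_b, pos): position in frame_b of the target
    # located at pos in frame_a. Tracking backwards from the new position
    # should land near the old one; max_err (pixels) is hypothetical.
    back_pos = track_fn(cur_frame, prev_frame, new_pos)
    return np.linalg.norm(np.subtract(back_pos, prev_pos)) <= max_err
```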
Jieyan LIU Ao MA Jingjing LI Ke LU
The subspace representation model is an important family of visual tracking algorithms. Compared with models that operate on the original data space, a subspace representation can effectively reduce computational complexity and filter out high-dimensional noise. However, in complicated situations, e.g., dramatic illumination changes, large occlusions, and abrupt object drift, traditional subspace representation models may fail to handle the visual tracking task. In this paper, we propose a novel subspace representation algorithm for robust visual tracking using low-rank representation with graph constraints (LRGC). Low-rank representation is well known for handling corrupted samples, and graph constraints flexibly characterize sample relationships; we exploit the benefits of both to tackle challenging visual tracking problems. Specifically, we first propose a novel graph structure that characterizes the relationship of the target object across different observation states. We then learn a subspace by jointly optimizing the low-rank representation and the graph embedding in a unified framework. Finally, the learned subspace is embedded into a Bayesian inference framework through the dynamical model and the observation model. Experiments on several video benchmarks demonstrate that our algorithm performs better than traditional ones, especially under dynamic changes and drift.
Kai FANG Shuoyan LIU Chunjie XU Hao XUE
In this paper, an adaptively updating probabilistic model is proposed to track an object in real-world environments that include motion blur, illumination changes, pose variations, and occlusions. The model adaptively updates the tracker through a searching process and an updating process: the searching process focuses on learning an appropriate tracker, while the updating process corrects it into a robust and efficient tracker for unconstrained real-world environments. Specifically, according to changes in the object's appearance and the tracker probability matrix (TPM), the tracker probability is estimated in an Expectation-Maximization (EM) manner. When tracking in each frame is completed, the estimated object state is used to update the current TPM and tracker probability by running EM in a similar manner. The tracker with the highest probability determines the object location in every frame. Experimental results demonstrate that our method tracks targets accurately and robustly in real-world tracking environments.
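A minimal sketch of how per-tracker probabilities could be refreshed in an EM manner, assuming the TPM row for the current state is a vector of tracker probabilities: the E-step turns per-tracker likelihoods into responsibilities, and the M-step blends them into the probabilities with a hypothetical smoothing rate.

```python
import numpy as np

def update_tracker_probs(probs, likelihoods, lr=0.1):
    # probs: current tracker probabilities (one TPM row); likelihoods:
    # how well each tracker explains the new observation.
    resp = probs * likelihoods
    resp /= resp.sum()                      # E-step: responsibilities
    probs = (1.0 - lr) * probs + lr * resp  # M-step: smoothed re-estimation
    return probs / probs.sum()
```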
Yulong XU Yang LI Jiabao WANG Zhuang MIAO Hang LI Yafei ZHANG
The feature extractor plays an important role in visual tracking, but most state-of-the-art methods employ the same feature representation in all scenes. Given the diversity of scenes, a tracker should choose different features for different videos. In this work, we propose a novel feature-adaptive correlation tracker that decomposes the tracking task into translation and scale estimation. According to the luminance of the target, our approach automatically selects either hierarchical convolutional features or histogram of oriented gradients features for translation estimation. Furthermore, we employ a discriminative correlation filter to handle scale variations. Extensive experiments on a large-scale challenging benchmark dataset show that the proposed algorithm outperforms state-of-the-art trackers in accuracy and robustness.
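The feature selection rule can be sketched as a simple luminance test: dark targets fall back to gradient-based HOG features, which do not depend on absolute intensity, while well-lit targets use hierarchical CNN features. The threshold below is hypothetical.

```python
import numpy as np

def select_features(target_patch_gray, lum_thresh=80.0):
    # Dark targets -> HOG (gradients are insensitive to absolute intensity);
    # well-lit targets -> hierarchical CNN features. Threshold on the mean
    # 0-255 luminance is hypothetical.
    return 'hog' if float(np.mean(target_patch_gray)) < lum_thresh else 'cnn'
```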
Yulong XU Zhuang MIAO Jiabao WANG Yang LI Hang LI Yafei ZHANG Weiguang XU Zhisong PAN
Correlation filter-based approaches achieve competitive results in visual tracking, but traditional correlation tracking methods fail to exploit the color information in videos. To address this issue, we propose a novel tracker that incorporates color features into a correlation filter framework: it extracts not only gray but also color information as feature maps and computes the maximum response location via multi-channel correlation filters. In particular, we modify the label function of the conventional classifier to improve positioning accuracy, and we employ a discriminative correlation filter to handle scale variations. Experiments performed on 35 challenging benchmark color sequences clearly show that our method outperforms state-of-the-art tracking approaches while operating in real time.
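A multi-channel correlation filter evaluates one filter per feature channel and sums the channel responses before taking the peak. The MOSSE/DCF-style frequency-domain sketch below illustrates this for gray-plus-color feature maps; the filters themselves and the conjugation convention are assumptions.

```python
import numpy as np

def multichannel_response(filters_f, features):
    # filters_f: per-channel correlation filters in the frequency domain;
    # features: same-sized spatial feature channels (gray + color channels).
    # Channel responses are summed before locating the peak.
    resp_f = sum(hf * np.conj(np.fft.fft2(ch))
                 for hf, ch in zip(filters_f, features))
    resp = np.real(np.fft.ifft2(resp_f))
    return np.unravel_index(np.argmax(resp), resp.shape)
```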
We propose a new visual tracking method in which the target appearance is represented by combining a color distribution with keypoints. First, the object is localized via a keypoint-based tracking and matching strategy, where a new clustering method removes outliers. Second, the tracking confidence is evaluated using the color template, and according to this confidence, local or global keypoint matching is performed adaptively. Finally, we propose a target appearance update method in which new appearance can be learned and added to the target model. The proposed tracker is compared with five state-of-the-art tracking methods on a recent benchmark dataset. Both qualitative and quantitative evaluations show that our method performs favorably.
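As a stand-in for the clustering-based outlier removal mentioned above, the sketch below drops keypoint matches whose motion vectors deviate from the median offset, keeping the inliers of an assumed dominant translation. The tolerance is hypothetical and the paper's actual clustering method may differ.

```python
import numpy as np

def filter_matches_by_offset(kps_tmpl, kps_frame, matches, tol=10.0):
    # kps_tmpl, kps_frame: (n, 2) keypoint coordinates; matches: (i, j) pairs.
    # Inlier matches of a dominant translation share a similar offset, so
    # matches far from the median offset are dropped (tol is hypothetical).
    offsets = np.array([kps_frame[j] - kps_tmpl[i] for i, j in matches])
    med = np.median(offsets, axis=0)
    keep = np.linalg.norm(offsets - med, axis=1) < tol
    return [m for m, k in zip(matches, keep) if k], med
```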
Jiatian PI Keli HU Yuzhang GU Lei QU Fengrong LI Xiaolin ZHANG Yunlong ZHAN
Visual tracking has been studied for several decades but continues to draw significant attention because of its critical role in many applications. Recent years have seen greater interest in the use of correlation filters in visual tracking systems, owing to their extremely compelling results in different competitions and benchmarks. However, there is still a need to improve the overall tracking capability to counter various tracking issues, including large scale variation, occlusion, and deformation. This paper presents an appealing tracker with robust scale estimation, which handles the problem of the fixed template size in the Kernelized Correlation Filter (KCF) tracker with no significant decrease in speed. We apply a discriminative correlation filter for scale estimation as an independent step after finding the optimal translation with the KCF tracker. Compared to an exhaustive scale-space search scheme, our approach provides improved performance while remaining computationally efficient. To reveal the effectiveness of our approach, we use benchmark sequences annotated with 11 attributes to evaluate how well the tracker handles different conditions. Numerous experiments demonstrate that the proposed algorithm performs favorably against several state-of-the-art algorithms, and appealing results in both accuracy and robustness are achieved on all 51 benchmark sequences, demonstrating the effectiveness of our tracker.
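The scale search that follows the KCF translation step can be sketched as scoring a small pyramid of scaled patches around the located target with an independent scale filter. The pyramid settings below follow common DSST-style defaults and are assumptions, not values taken from the paper.

```python
import numpy as np

def crop(frame, center, size):
    # Crop a (h, w) patch around center, clipped to the frame bounds.
    h, w = int(size[0]), int(size[1])
    r0 = max(int(center[0] - h // 2), 0)
    c0 = max(int(center[1] - w // 2), 0)
    return frame[r0:r0 + h, c0:c0 + w]

def estimate_scale(frame, center, base_size, score_fn, n_scales=17, step=1.02):
    # score_fn(patch) stands for the independent discriminative scale
    # filter's response on a patch resized to the template size.
    scales = step ** (np.arange(n_scales) - n_scales // 2)
    responses = [score_fn(crop(frame, center, (base_size[0] * s,
                                               base_size[1] * s)))
                 for s in scales]
    return scales[int(np.argmax(responses))]   # best relative scale factor
```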
Suofei ZHANG Zhixin SUN Xu CHENG Lin ZHOU
This work presents an object tracking framework based on the integration of Deformable Part-based Models (DPMs) and Dynamic Conditional Random Fields (DCRFs). In this framework, we propose a novel DCRF-based way to track an object and its details at multiple resolutions simultaneously. Meanwhile, we use DPMs to tackle drastic variations in target appearance such as pose, view, scale, and illumination changes. To embed DPMs into the DCRF, we design specific temporal potential functions between vertices by explicitly formulating deformation and partial occlusion, respectively. Furthermore, temporal transition functions between mixture models bring higher robustness to perspective and pose changes. To evaluate the efficacy of the proposed method, quantitative tests on six challenging video sequences are conducted and the results are analyzed. Experimental results indicate that the method effectively addresses serious problems in object tracking and performs favorably against state-of-the-art trackers.