Tongwei LU, Hao ZHANG, Feng MIN, Shihai JIA
Vehicle re-identification (ReID) based on convolutional neural networks (CNNs) has inherent drawbacks, such as the information loss caused by downsampling operations. We therefore propose a vehicle ReID method based on the vision transformer (ViT) to address this problem. To improve the feature representation of the vision transformer and make full use of additional vehicle information, we present the following methods. (I) We propose a Quadratic Split Architecture (QSA) to learn both global and local features. More precisely, we split an image into patches as the “global part” and further split each patch into smaller sub-patches as the “local part”. Features of the global and local parts are aggregated to enhance the representation ability. (II) We propose Auxiliary Information Embedding (AIE), which improves the robustness of the model by plugging a learnable camera/viewpoint embedding into ViT. Experimental results on several benchmarks indicate that our method outperforms many advanced vehicle ReID methods.
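The two-level split and the auxiliary embedding described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the patch size of 16, the sub-patch size of 4, and the choice to add the camera embedding to every token are all assumptions made for illustration. The abstract does not specify how the global and local features are aggregated, so that step is omitted.

```python
import numpy as np


def split_patches(img, patch):
    """Split an (H, W, C) array into non-overlapping (patch, patch, C) tiles.

    Tiles are returned in row-major order with shape (N, patch, patch, C).
    """
    h, w, c = img.shape
    if h % patch or w % patch:
        raise ValueError("image size must be divisible by the patch size")
    gh, gw = h // patch, w // patch
    tiles = img.reshape(gh, patch, gw, patch, c).transpose(0, 2, 1, 3, 4)
    return tiles.reshape(gh * gw, patch, patch, c)


def quadratic_split(img, patch=16, sub_patch=4):
    """Hypothetical two-level split: patches ("global part"), then
    sub-patches of each patch ("local part"). Sizes are assumed values."""
    global_part = split_patches(img, patch)  # (N, P, P, C)
    local_part = np.stack(
        [split_patches(p, sub_patch) for p in global_part]
    )  # (N, M, S, S, C)
    return global_part, local_part


def add_aux_embedding(tokens, cam_table, cam_id, scale=1.0):
    """Add one camera/viewpoint embedding row to every token.

    tokens: (N, D) token features; cam_table: (num_cameras, D) table that
    would be learnable in a real model. Additive injection is an assumption.
    """
    return tokens + scale * cam_table[cam_id]
```

For a 32×32 RGB input with the assumed sizes, `quadratic_split` returns a global part of shape `(4, 16, 16, 3)` and a local part of shape `(4, 16, 4, 4, 3)`, that is, 4 patches with 16 sub-patches each. In a full model, each part would be linearly projected to token embeddings before entering the transformer blocks.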