Tomoya NITTA, Tsubasa HIRAKAWA, Hironobu FUJIYOSHI, Toru TAMAKI
In this paper, we propose an extension of the Attention Branch Network (ABN) that uses instance segmentation to generate sharper attention maps for action recognition. Visual explanation methods such as Grad-CAM usually produce blurry maps that are not intuitive for humans to interpret, particularly when recognizing the actions of people in videos. Our proposed method, Object-ABN, tackles this issue by introducing a new mask loss that makes the generated attention maps close to the instance segmentation result. Furthermore, the Prototype Conformity (PC) loss and multiple attention maps are introduced to sharpen the maps and improve classification performance. Experimental results on UCF101 and SSv2 show that the maps generated by the proposed method are much clearer, both qualitatively and quantitatively, than those of the original ABN.
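As a rough illustration of how a mask loss of this kind might be formed, the sketch below assumes the attention map and the instance segmentation mask are same-sized tensors with values in [0, 1] and penalizes their squared distance; the function name and the use of a mean-squared-error distance are assumptions for illustration, not the authors' implementation.

# Hypothetical sketch (not the paper's code): pull an attention map toward
# an instance segmentation mask, assuming both are (B, H, W) tensors in [0, 1].
import torch
import torch.nn.functional as F

def mask_loss(attention_map: torch.Tensor, instance_mask: torch.Tensor) -> torch.Tensor:
    # Squared distance between the attention map and the segmentation mask.
    return F.mse_loss(attention_map, instance_mask)

# Example: a batch of 2 random 14x14 attention maps vs. binary person masks.
attn = torch.rand(2, 14, 14, requires_grad=True)
mask = (torch.rand(2, 14, 14) > 0.5).float()
loss = mask_loss(attn, mask)
loss.backward()
print(loss.item())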
Lin DU, Chang TIAN, Mingyong ZENG, Jiabao WANG, Shanshan JIAO, Qing SHEN, Guodong WU
Feature learning based on deep networks has been shown to benefit person re-identification (Re-ID) in recent years. However, most studies use a single network as the baseline, without considering the fusion of different deep features. By analyzing the attention maps of different networks, we find that the information learned by different networks is complementary. Therefore, a novel Dual Network Fusion (DNF) framework is proposed. DNF is designed with a trunk branch and two auxiliary branches. In the trunk branch, deep features are cascaded directly along the channel dimension. One auxiliary branch is a channel attention branch, which allocates weights to the different deep features; the other is a multi-loss training branch. To verify the performance of DNF, we test it on three benchmark datasets: CUHK03NP, Market-1501 and DukeMTMC-reID. The results show that DNF performs significantly better than a single network and is comparable to most state-of-the-art methods.
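A minimal sketch of the kind of fusion described here, assuming channel-wise concatenation of two backbones' feature maps followed by an SE-style channel attention branch that re-weights the concatenated channels; the module name, layer sizes and reduction ratio are illustrative assumptions rather than the authors' implementation.

# Hypothetical sketch (not the paper's code): dual feature fusion with
# channel attention over the concatenated features.
import torch
import torch.nn as nn

class DualFeatureFusion(nn.Module):
    def __init__(self, c1: int, c2: int, reduction: int = 16):
        super().__init__()
        c = c1 + c2
        # Channel attention branch: squeeze (global pooling) then excite (1x1 convs).
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(c, c // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(c // reduction, c, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, f1: torch.Tensor, f2: torch.Tensor) -> torch.Tensor:
        # Trunk branch: cascade the two deep features along the channel axis.
        fused = torch.cat([f1, f2], dim=1)
        # Re-weight each channel with the learned attention weights.
        return fused * self.attn(fused)

# Example: fuse feature maps from two different backbones.
f_a = torch.randn(4, 2048, 8, 4)   # e.g. ResNet-50 features
f_b = torch.randn(4, 1024, 8, 4)   # e.g. a second backbone's features
fusion = DualFeatureFusion(2048, 1024)
print(fusion(f_a, f_b).shape)      # torch.Size([4, 3072, 8, 4])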
Kazuhiro HOTTA, Masaru TANAKA, Takio KURITA, Taketoshi MISHIMA
This paper presents a Dynamic Attention Map based on the Ising model for face detection. In general, a face detector does not know in advance where faces are or how many there are, so it must search the entire image, which requires considerable computation time. To speed up the search, the information obtained at previous search points should be used effectively. In order to exploit the face likelihood measured at previous search points, the Ising model is adopted for face detection. The Ising model has two-state spins, "up" and "down"; the state of a spin is updated depending on its neighboring spins and an external magnetic field. The Ising spins are assigned to the "face" and "non-face" states of face detection, and the measured face likelihood is integrated into the energy function of the Ising model as the external magnetic field. We confirm that face candidates are reduced effectively by the spin-flip dynamics. To further improve the search performance, the single-level Ising search is extended to a multilevel Ising search, in which the interactions between two layers, characterized by the renormalization group method, are used to reduce the face candidates. The effectiveness of the multilevel Ising search is also confirmed by comparison with the single-level Ising search.
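A minimal sketch of spin-flip dynamics with the face likelihood acting as an external field, assuming Metropolis updates on a toroidal grid; the coupling strength, field scale, temperature, grid size and the random stand-in likelihood are illustrative assumptions, not the authors' settings.

# Hypothetical sketch (not the paper's code): Metropolis spin-flip dynamics on a
# grid of "face"/"non-face" spins, where the measured face likelihood enters the
# energy as an external magnetic field.
import numpy as np

rng = np.random.default_rng(0)
H, W = 32, 32
J, h_scale, T = 1.0, 2.0, 0.5             # coupling, field strength, temperature

likelihood = rng.random((H, W))            # stand-in for the measured face likelihood
field = h_scale * (2 * likelihood - 1)     # map [0, 1] likelihood to [-h, +h]
spins = rng.choice([-1, 1], size=(H, W))   # +1 = "face", -1 = "non-face"

def local_energy(s, i, j):
    # Energy terms involving spin (i, j): neighbor coupling plus external field.
    nb = s[(i + 1) % H, j] + s[(i - 1) % H, j] + s[i, (j + 1) % W] + s[i, (j - 1) % W]
    return -J * s[i, j] * nb - field[i, j] * s[i, j]

for _ in range(20 * H * W):                # spin-flip sweeps
    i, j = rng.integers(H), rng.integers(W)
    dE = -2 * local_energy(spins, i, j)    # energy change if spin (i, j) flips
    if dE <= 0 or rng.random() < np.exp(-dE / T):
        spins[i, j] *= -1

print("remaining face candidates:", int((spins == 1).sum()))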