1-1hit |
Xinyu ZHU Jun ZHANG Gengsheng CHEN
Recent top-performing object detectors usually depend on a two-stage approach, which benefits from its region proposal and refining practice but suffers low detection speed. By contrast, one-stage approaches have the advantage of high efficiency while sacrifice their accuracies to some extent. In this paper, we propose a novel single-shot object detection network which inherits the merits of both. Motivated by the idea of semantic enrichment to the convolutional features within a typical deep detector, we propose two novel modules: 1) by modeling the semantic interactions between channels and the long-range dependencies between spatial positions, the self-attending module generates both channel and position attention, and enhance the original convolutional features in a self-guided manner; 2) leveraging the class-discriminative localization ability of classification-trained CNN, the semantic activating module learns a semantic meaningful convolutional response which augments low-level convolutional features with strong class-specific semantic information. The so called self-attending and semantic activating network (ASAN) achieves better accuracy than two-stage methods and is able to fulfil real-time processing. Comprehensive experiments on PASCAL VOC indicates that ASAN achieves state-of-the-art detection performance with high efficiency.