Dipankar DAS, Yoshinori KOBAYASHI, Yoshinori KUNO
This paper proposes an integrated approach to the simultaneous detection and localization of multiple object categories using both generative and discriminative models. Our approach first generates a set of hypotheses for each object category using a generative model, probabilistic latent semantic analysis (pLSA), with each object represented as a bag of visual words. The pLSA model automatically fits an optimal number of topics according to the variation of objects within a category. The discriminative part then verifies each hypothesis using a multi-class SVM classifier with merging features that combine the spatial shape and appearance of an object. In the post-processing stage, environmental context information is used together with the probabilistic output of the SVM classifier to improve the overall performance of the system. Our integrated approach with merging features and context information allows reliable detection and localization of various object categories in the same image. The performance of the proposed framework is evaluated on several standard datasets (MIT-CSAIL, UIUC, TUD, etc.) and on the authors' own datasets. In experiments, we achieved results superior to those of some state-of-the-art methods on a number of standard datasets. An extensive experimental evaluation on up to ten diverse object categories over thousands of images demonstrates that our system detects and localizes multiple objects within an image in the presence of cluttered backgrounds, substantial occlusion, and significant scale changes.
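To make the generative stage more concrete, the following is a minimal sketch of textbook pLSA fitted with EM on a toy bag-of-visual-words count matrix. It is not the authors' implementation: the function names (`fit_plsa`, `log_likelihood`), the toy counts, and the fixed number of topics are all assumptions made for illustration. The automatic selection of the number of topics, the hypothesis generation, and the SVM verification described above are not shown.

```python
import math
import random


def normalize(values):
    """Scale non-negative values so they sum to 1 (uniform if all are zero)."""
    total = sum(values)
    if total == 0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def fit_plsa(counts, num_topics, num_iters=50, seed=0):
    """Fit standard pLSA with EM (illustrative helper, not from the paper).

    counts[d][w] is how often visual word w occurs in image d.
    Returns (p_w_z, p_z_d), where p_w_z[z][w] = P(w|z) and p_z_d[d][z] = P(z|d).
    """
    rng = random.Random(seed)
    num_docs, num_words = len(counts), len(counts[0])
    # Random initialisation of P(w|z) and P(z|d).
    p_w_z = [normalize([rng.random() for _ in range(num_words)])
             for _ in range(num_topics)]
    p_z_d = [normalize([rng.random() for _ in range(num_topics)])
             for _ in range(num_docs)]

    for _ in range(num_iters):
        new_w_z = [[0.0] * num_words for _ in range(num_topics)]
        new_z_d = [[0.0] * num_topics for _ in range(num_docs)]
        for d in range(num_docs):
            for w in range(num_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: posterior P(z|d,w), proportional to P(w|z) * P(z|d).
                post = normalize([p_w_z[z][w] * p_z_d[d][z]
                                  for z in range(num_topics)])
                # Accumulate expected counts for the M-step.
                for z in range(num_topics):
                    new_w_z[z][w] += n * post[z]
                    new_z_d[d][z] += n * post[z]
        # M-step: renormalise the expected counts into distributions.
        p_w_z = [normalize(row) for row in new_w_z]
        p_z_d = [normalize(row) for row in new_z_d]
    return p_w_z, p_z_d


def log_likelihood(counts, p_w_z, p_z_d):
    """Log-likelihood of the counts under P(w|d) = sum_z P(w|z) P(z|d)."""
    total = 0.0
    for d, row in enumerate(counts):
        for w, n in enumerate(row):
            if n > 0:
                p_w_d = sum(p_w_z[z][w] * p_z_d[d][z]
                            for z in range(len(p_w_z)))
                total += n * math.log(p_w_d)
    return total


# Toy data (hypothetical): 4 images, 4 visual words.
toy_counts = [[5, 4, 0, 0], [4, 6, 0, 0], [0, 0, 5, 5], [0, 1, 4, 6]]
topic_words, image_topics = fit_plsa(toy_counts, num_topics=2)
for d, mixture in enumerate(image_topics):
    print(f"image {d}: P(z|d) = {[round(p, 3) for p in mixture]}")
```

In a pipeline of the kind described above, the per-image topic mixtures P(z|d) would be the quantity used to propose candidate object regions, which a discriminative classifier then accepts or rejects. How the paper maps topics to hypotheses is not specified in this abstract.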