Keyword Search Result

[Keyword] visual words (6 hits)

Results 1-6 of 6
  • Improving Image Pair Selection for Large Scale Structure from Motion by Introducing Modified Simpson Coefficient

    Takaharu KATO  Ikuko SHIMIZU  Tomas PAJDLA  

     
    PAPER-Image Recognition, Computer Vision

    Publicized: 2022/06/08
    Vol: E105-D No:9
    Page(s): 1590-1599

    Selecting visually overlapping image pairs without any prior information is an essential task in large-scale structure from motion (SfM) pipelines. To address this problem, many state-of-the-art image retrieval systems adopt the bag of visual words (BoVW) model for computing image-pair similarity. In this paper, we present a method for improving image pair selection with BoVW. Our method combines a conventional vector-based approach with a set-based approach. For the set similarity, we introduce a modified version of the Simpson (m-Simpson) coefficient. We show the advantage of this measure over three typical set similarity measures and demonstrate that combining vector similarity with the m-Simpson coefficient effectively reduces false positives and increases accuracy. To examine the choice of vocabulary construction, we prepared both a vocabulary sampled from an evaluation dataset and a basic vocabulary pre-trained on a training dataset, and we tested our method on vocabularies of different sizes. Our experimental results show that the proposed method dramatically improves precision scores, especially on the sampled vocabulary, and performs better than state-of-the-art methods that use pre-trained vocabularies. We further introduce a method to determine the k value of top-k relevant search for each image and show that it achieves higher precision at the same recall.
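
The paper defines its own modified (m-)Simpson coefficient; as a rough illustration of the underlying idea of blending vector- and set-based BoVW similarity, the sketch below uses the classic Simpson coefficient |A∩B|/min(|A|,|B|) as a stand-in, with a hypothetical weighting parameter alpha:

```python
# Illustrative blend of vector- and set-based BoVW similarity for image
# pair selection. NOTE: the classic Simpson coefficient below stands in
# for the paper's modified version, and alpha is a hypothetical weight.
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Vector-based similarity of two (e.g. tf-idf weighted) BoVW histograms."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v) / denom if denom > 0 else 0.0

def simpson(a: set, b: set) -> float:
    """Set overlap of the visual words present in each image."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))

def pair_score(hist_u: np.ndarray, hist_v: np.ndarray, alpha: float = 0.5) -> float:
    """Combined score used to rank candidate image pairs for SfM matching."""
    words_u = set(np.flatnonzero(hist_u))  # visual words present in image u
    words_v = set(np.flatnonzero(hist_v))
    return alpha * cosine_similarity(hist_u, hist_v) + \
           (1.0 - alpha) * simpson(words_u, words_v)
```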

  • Topic-Based Knowledge Transfer Algorithm for Cross-View Action Recognition

    Changhong CHEN  Shunqing YANG  Zongliang GAN  

     
    LETTER-Pattern Recognition

    Vol: E97-D No:3
    Page(s): 614-617

    Cross-view action recognition is a challenging research field in human motion analysis, because appearance-based features are not reliable when the viewpoint changes. In this paper, a new framework is proposed for cross-view action recognition via topic-based knowledge transfer. First, spatio-temporal descriptors are extracted from the action videos, and each video is modeled as a bag of visual words (BoVW) over a codebook constructed with the k-means clustering algorithm. Second, Latent Dirichlet Allocation (LDA) is employed to assign topics to the BoVW representation. The topic distribution over visual words (ToVW) is normalized and taken as the feature vector. Third, to bridge different views, we transform ToVW into bilingual ToVW by constructing bilingual dictionaries, which guarantees that the same action has the same representation across views. We demonstrate the effectiveness of the proposed algorithm on the IXMAS multi-view dataset.
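
A minimal sketch of the BoVW-to-topic (ToVW) part of this pipeline using scikit-learn; descriptor extraction and the bilingual-dictionary transfer step are omitted, and n_words / n_topics are illustrative values, not the paper's:

```python
# BoVW -> LDA topic distribution (ToVW) per video.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation

def video_tovw(descriptors_per_video, n_words=200, n_topics=20):
    # 1) Codebook: k-means clustering over all spatio-temporal descriptors.
    all_desc = np.vstack(descriptors_per_video)
    codebook = KMeans(n_clusters=n_words, n_init=10).fit(all_desc)
    # 2) BoVW: histogram of visual-word assignments for each video.
    bovw = np.stack([np.bincount(codebook.predict(d), minlength=n_words)
                     for d in descriptors_per_video])
    # 3) ToVW: per-video topic distribution from LDA (rows sum to 1).
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    return lda.fit_transform(bovw)  # one normalized feature vector per video
```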

  • Online Learned Player Recognition Model Based Soccer Player Tracking and Labeling for Long-Shot Scenes

    Weicun XU  Qingjie ZHAO  Yuxia WANG  Xuanya LI  

     
    PAPER-Pattern Recognition

    Vol: E97-D No:1
    Page(s): 119-129

    Soccer player tracking and labeling suffer from the similar appearance of players on the same team, especially in long-shot scenes where the players' faces and jersey numbers are too blurry to identify. In this paper, we propose an efficient multi-player tracking system that takes the detection responses of a human detector as input. To enable real-time player detection, we generate a spatial proposal that minimizes the scanning scope of the detector. The tracking system uses discriminative appearance models trained with online Boosting to reduce the data-association ambiguity caused by the players' similar appearance. We also propose an online learned player recognition model that can be embedded in the tracking system to perform online player recognition and labeling in long-shot scenes, in two stages. In the first stage, to build the model, we use fast k-means clustering instead of classic k-means clustering to build and update a visual word vocabulary in an efficient online manner, using informative descriptors extracted from training samples drawn at each time step of multi-player tracking; this stage finishes when the vocabulary is ready. In the second stage, given the obtained visual word vocabulary, an incremental vector quantization strategy is used to recognize and label each tracked player. We also validate each recognition to avoid mistakenly identifying an outlier, i.e., a person we do not need to recognize, as a player. Both quantitative and qualitative experimental results on long-shot clips of a real soccer game video demonstrate that the proposed player recognition model performs much better than several state-of-the-art online learned models, and that our tracking system remains effective even in very complicated situations.
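
A rough sketch of the two-stage idea, with scikit-learn's MiniBatchKMeans standing in for the paper's fast k-means; the class layout, the rejection threshold, and all sizes are illustrative assumptions, not the paper's design:

```python
# Stage 1: build/update a visual word vocabulary online; stage 2: label
# tracked players by incremental vector quantization, with a validation
# threshold to reject outliers (people who are not players).
import numpy as np
from sklearn.cluster import MiniBatchKMeans

class OnlinePlayerRecognizer:
    def __init__(self, n_words=64, n_players=22):
        self.vocab = MiniBatchKMeans(n_clusters=n_words)
        self.n_words = n_words
        self.models = np.zeros((n_players, n_words))  # BoVW model per player

    def update_vocabulary(self, descriptors):
        """Stage 1: refine the vocabulary at each tracking time step."""
        self.vocab.partial_fit(descriptors)

    def add_training_sample(self, player_id, descriptors):
        """Accumulate quantized descriptors into a player's appearance model."""
        words = self.vocab.predict(descriptors)
        self.models[player_id] += np.bincount(words, minlength=self.n_words)

    def recognize(self, descriptors, reject_thresh=0.2):
        """Stage 2: quantize, match, and validate (reject non-players)."""
        h = np.bincount(self.vocab.predict(descriptors), minlength=self.n_words)
        h = h / max(h.sum(), 1)
        m = self.models / np.maximum(self.models.sum(axis=1, keepdims=True), 1)
        scores = m @ h
        best = int(np.argmax(scores))
        return best if scores[best] >= reject_thresh else None  # outlier
```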

  • Incorporating Contextual Information into Bag-of-Visual-Words Framework for Effective Object Categorization

    Shuang BAI  Tetsuya MATSUMOTO  Yoshinori TAKEUCHI  Hiroaki KUDO  Noboru OHNISHI  

     
    PAPER-Image Recognition, Computer Vision

    Vol: E95-D No:12
    Page(s): 3060-3068

    Bag of visual words is a promising approach to object categorization. In this framework, however, encoding patches with visual words is ambiguous because of the information loss caused by vector quantization. In this paper, we propose to incorporate patch-level contextual information into the bag of visual words to reduce this ambiguity. To achieve this goal, we construct a hierarchical codebook in which visual words in the upper hierarchy carry contextual information for visual words in the lower hierarchy. In the proposed method, we extract patches of different scales from each sample point and describe all of them with the SIFT descriptor. We then build the hierarchical codebook, placing visual words created from coarse-scale patches in the upper hierarchy and visual words created from fine-scale patches in the lower hierarchy; using the correspondence between the extracted patches, visual words in different hierarchies are associated with each other. We then design a method to assign patch pairs, whose patches are extracted from the same sample point, to the constructed codebook. Furthermore, to use image information effectively, we apply the proposed method to two sets of features extracted with different sampling strategies and fuse them using a probabilistic approach. Finally, we evaluate the proposed method on the Caltech 101 and Caltech 256 datasets, and the experimental results demonstrate its effectiveness.
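
A toy sketch of a two-level hierarchical codebook, assuming coarse- and fine-scale descriptors (e.g., SIFT) are already extracted at the same sample points; the joint pair encoding is one plausible reading of the assignment step, and all sizes are illustrative:

```python
# Coarse-scale words supply context for fine-scale words extracted at the
# same sample points, so identical fine words from different contexts map
# to different joint codes.
import numpy as np
from sklearn.cluster import KMeans

def build_hierarchical_codebook(coarse_desc, fine_desc, n_coarse=50, n_fine=200):
    """coarse_desc[i] and fine_desc[i] come from the same sample point."""
    upper = KMeans(n_clusters=n_coarse, n_init=10).fit(coarse_desc)  # context
    lower = KMeans(n_clusters=n_fine, n_init=10).fit(fine_desc)
    return upper, lower

def encode_image(upper, lower, coarse_desc, fine_desc, n_fine=200):
    """Assign each patch pair a joint (context word, fine word) index, so the
    same fine word is disambiguated by its coarse-scale context."""
    c = upper.predict(coarse_desc)
    f = lower.predict(fine_desc)
    joint = c * n_fine + f  # pair -> single joint visual-word code
    return np.bincount(joint, minlength=upper.n_clusters * n_fine)
```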

  • Scene Categorization with Classified Codebook Model

    Xu YANG  De XU  Songhe FENG  Yingjun TANG  Shuoyan LIU  

     
    LETTER-Image Recognition, Computer Vision

    Vol: E94-D No:6
    Page(s): 1349-1352

    This paper presents an efficient yet powerful codebook model, named the classified codebook model, for natural scene categorization. Current codebook models typically resort to a large codebook to obtain higher performance for scene categorization, which severely limits their practical applicability. Our model formulates the codebook within the theory of vector quantization and thus adopts the well-known technique of classified vector quantization for scene-category modeling. Its key property is that it benefits scene categorization, especially at small codebook sizes, while greatly reducing the computational complexity of quantization. We evaluate the proposed model on a well-known challenging scene dataset, 15 Natural Scenes. The experiments demonstrate that our model decreases the computation time for codebook generation. Moreover, it achieves better scene categorization performance, and the gain becomes more pronounced at small codebook sizes.
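
A minimal sketch of classified vector quantization applied to a codebook, assuming each patch descriptor already comes with a class label from some patch classifier; the per-class codebook size is an illustrative choice:

```python
# Classified vector quantization (CVQ): each patch is routed to a class and
# quantized against a small per-class codebook, so nearest-word search drops
# from O(K) to roughly O(K / n_classes).
import numpy as np
from sklearn.cluster import KMeans

def train_cvq(descriptors, classes, n_classes, words_per_class=32):
    """Fit one small codebook per patch class (classes[i] in [0, n_classes))."""
    return [KMeans(n_clusters=words_per_class, n_init=10)
            .fit(descriptors[classes == c]) for c in range(n_classes)]

def quantize(desc, cls, codebooks, words_per_class=32):
    """Quantize a single descriptor within its class only."""
    local = int(codebooks[cls].predict(desc[None, :])[0])
    return cls * words_per_class + local  # global visual-word index
```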

  • Discriminating Semantic Visual Words for Scene Classification

    Shuoyan LIU  De XU  Songhe FENG  

     
    PAPER-Pattern Recognition

    Vol: E93-D No:6
    Page(s): 1580-1588

    Bag-of-Visual-Words representation has recently become popular for scene classification. However, learning visual words in an unsupervised manner suffers when patches with similar appearances correspond to distinct semantic concepts. This paper proposes a novel supervised learning framework that takes full advantage of label information to address this problem. Specifically, Gaussian Mixture Modeling (GMM) is first applied to obtain a "semantic interpretation" of patches using scene labels: each scene induces a probability density on the low-level visual feature space, and patches are represented as vectors of posterior probabilities of the scene semantic concepts. The Information Bottleneck (IB) algorithm is then introduced to cluster the patches into "visual words" in a supervised manner, from the perspective of these semantic interpretations; this maximizes the semantic information carried by the visual words. Once the visual words are obtained, the frequencies with which they appear in a given image form a histogram, which is subsequently used in the scene categorization task with a Support Vector Machine (SVM) classifier. Experiments on a challenging dataset show that the proposed visual words perform the scene classification task better than most existing methods.
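
A rough sketch of the first two steps under stated assumptions: per-scene GMMs produce posterior "semantic interpretation" vectors for patches, which are then clustered into visual words. scikit-learn has no Information Bottleneck implementation, so plain agglomerative clustering stands in for the IB step here, and n_mix / n_words are illustrative; the final histogram-plus-SVM stage is standard and omitted:

```python
# Step 1: posterior p(scene | patch) from per-scene GMMs (uniform prior).
# Step 2: cluster the posteriors into visual words (IB stand-in).
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.cluster import AgglomerativeClustering

def semantic_interpretation(patch_feats, patch_scene_labels, n_scenes, n_mix=8):
    """Represent each patch as a posterior over scene semantic concepts."""
    gmms = [GaussianMixture(n_components=n_mix, random_state=0)
            .fit(patch_feats[patch_scene_labels == s]) for s in range(n_scenes)]
    log_lik = np.stack([g.score_samples(patch_feats) for g in gmms], axis=1)
    log_lik -= log_lik.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(log_lik)
    return post / post.sum(axis=1, keepdims=True)

def semantic_visual_words(posteriors, n_words=100):
    """Cluster patches into visual words in the semantic-interpretation space."""
    return AgglomerativeClustering(n_clusters=n_words).fit_predict(posteriors)
```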