Incorporating Contextual Information into Bag-of-Visual-Words Framework for Effective Object Categorization

Shuang BAI; Tetsuya MATSUMOTO; Yoshinori TAKEUCHI; Hiroaki KUDO; Noboru OHNISHI

doi:10.1587/transinf.E95.D.3060

IEICE TRANSACTIONS on Information

Summary :

Bag of visual words is a promising approach to object categorization. However, encoding patches by visual words is ambiguous in this framework, because vector quantization discards information. In this paper, we propose incorporating patch-level contextual information into bag of visual words to reduce this ambiguity. To this end, we construct a hierarchical codebook in which visual words in the upper level of the hierarchy carry contextual information about visual words in the lower level. In the proposed method, we extract patches of different scales from each sample point and describe all of them with the SIFT descriptor. We then build the hierarchical codebook by placing visual words created from coarse-scale patches in the upper level and visual words created from fine-scale patches in the lower level. At the same time, we use the correspondence among patches extracted from the same sample point to associate visual words in different levels with each other. Next, we design a method for assigning patch pairs, whose patches are extracted from the same sample point, to the constructed codebook. Furthermore, to use image information effectively, we apply the proposed method to two sets of features extracted with different sampling strategies and fuse them using a probabilistic approach. Finally, we evaluate the proposed method on the Caltech 101 and Caltech 256 datasets. Experimental results demonstrate the effectiveness of the proposed method.
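The summary does not specify how the codebook is clustered or how patch pairs are assigned, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes plain k-means at both levels, builds the lower-level words for each upper-level word from the fine-scale patches whose coarse-scale partner fell into that word, and assigns a patch pair by first choosing the upper word and then the nearest of its associated lower words. The function names (`kmeans`, `nearest`, `build_hierarchical_codebook`, `encode`), the codebook sizes, and the random vectors standing in for 128-dimensional SIFT descriptors are all hypothetical.

```python
import numpy as np

def nearest(X, centers):
    """Index of the closest center (squared Euclidean) for each row of X."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed choice); returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    k = min(k, len(X))  # never ask for more centers than points
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        labels = nearest(X, centers)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers, nearest(X, centers)

def build_hierarchical_codebook(coarse, fine, k_upper, k_lower):
    """coarse[i] and fine[i] describe patches from the same sample point."""
    upper, upper_labels = kmeans(coarse, k_upper)
    lower = []
    for u in range(len(upper)):
        members = fine[upper_labels == u]
        if len(members) == 0:  # fallback for an empty upper word
            members = fine
        children, _ = kmeans(members, k_lower, seed=u + 1)
        lower.append(children)  # lower words associated with upper word u
    return upper, lower

def encode(coarse, fine, upper, lower, k_lower):
    """Normalized histogram over (upper word, lower word) pairs for one image."""
    hist = np.zeros(len(upper) * k_lower)
    upper_idx = nearest(coarse, upper)
    for u, f in zip(upper_idx, fine):
        l = nearest(f[None, :], lower[u])[0]  # search only the children of u
        hist[u * k_lower + l] += 1
    return hist / max(hist.sum(), 1)

# Random stand-ins for SIFT descriptors of 200 coarse/fine patch pairs.
rng = np.random.default_rng(42)
coarse = rng.random((200, 128))
fine = rng.random((200, 128))

upper, lower = build_hierarchical_codebook(coarse, fine, k_upper=4, k_lower=3)
hist = encode(coarse[:50], fine[:50], upper, lower, k_lower=3)
```

In this sketch the coarse-scale patch acts as context: two fine-scale patches with similar descriptors land in different histogram bins when their surrounding coarse-scale patches fall into different upper-level words.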

Publication: IEICE TRANSACTIONS on Information Vol.E95-D No.12 pp.3060-3068

Publication Date: 2012/12/01

Online ISSN: 1745-1361

DOI: 10.1587/transinf.E95.D.3060

Type of Manuscript: PAPER

Category: Image Recognition, Computer Vision

Shuang BAI, Tetsuya MATSUMOTO, Yoshinori TAKEUCHI, Hiroaki KUDO, Noboru OHNISHI, "Incorporating Contextual Information into Bag-of-Visual-Words Framework for Effective Object Categorization" in IEICE TRANSACTIONS on Information, vol. E95-D, no. 12, pp. 3060-3068, December 2012, doi: 10.1587/transinf.E95.D.3060.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E95.D.3060/_p

@ARTICLE{e95-d_12_3060,
author={Shuang BAI and Tetsuya MATSUMOTO and Yoshinori TAKEUCHI and Hiroaki KUDO and Noboru OHNISHI},
journal={IEICE TRANSACTIONS on Information},
title={Incorporating Contextual Information into Bag-of-Visual-Words Framework for Effective Object Categorization},
year={2012},
volume={E95-D},
number={12},
pages={3060-3068},
keywords={},
doi={10.1587/transinf.E95.D.3060},
ISSN={1745-1361},
month={December},}

TY - JOUR
TI - Incorporating Contextual Information into Bag-of-Visual-Words Framework for Effective Object Categorization
T2 - IEICE TRANSACTIONS on Information
SP - 3060
EP - 3068
AU - Shuang BAI
AU - Tetsuya MATSUMOTO
AU - Yoshinori TAKEUCHI
AU - Hiroaki KUDO
AU - Noboru OHNISHI
PY - 2012
DO - 10.1587/transinf.E95.D.3060
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E95-D
IS - 12
JA - IEICE TRANSACTIONS on Information
Y1 - December 2012
ER -
