Efficient Two-Step Middle-Level Part Feature Extraction for Fine-Grained Visual Categorization

Hideki NAKAYAMA; Tomoya TSUDA

doi:10.1587/transinf.2015EDP7358

IEICE TRANSACTIONS on Information

Summary:

Fine-grained visual categorization (FGVC) has drawn increasing attention as an emerging research field in recent years. In contrast to generic-domain visual recognition, FGVC is characterized by high intra-class and subtle inter-class variations. To distinguish conceptually and visually similar categories, highly discriminative visual features must be extracted. Moreover, because FGVC is highly specialized and task-specific, it is not always easy to obtain a sufficiently large training dataset. Therefore, the key to success in practical FGVC systems is to efficiently exploit discriminative features from a limited number of training examples. In this paper, we propose an efficient two-step dimensionality compression method to derive compact middle-level part-based features. To this end, we compare space-first and feature-first convolution schemes and investigate their effectiveness. Because our approach relies only on simple linear algebra and analytic solutions, it is far more scalable than current one-vs-one or one-vs-all approaches, making it possible to quickly train middle-level features from many pairwise part regions. We experimentally demonstrate the effectiveness of our method on the standard Caltech-Birds and Stanford-Cars datasets.
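The abstract does not state the compression criterion, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes each part region is described by a feature map X with S spatial cells and D channels, reads "space-first" and "feature-first" as the order in which the spatial and channel dimensions are compressed, and swaps in unsupervised PCA (a closed-form eigendecomposition) for whatever discriminative criterion the paper actually uses. The helper names pca_basis, fit_two_step and encode are hypothetical. Because both steps are linear, the final code for a map is the bilinear projection U^T X V; the order matters only because the second projection is learned on the output of the first.

```python
import numpy as np


def pca_basis(samples, k):
    """Top-k principal directions (as columns) of the rows of `samples`."""
    centered = samples - samples.mean(axis=0)
    cov = centered.T @ centered / max(len(samples) - 1, 1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # symmetric matrix: eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]     # indices of the k largest eigenvalues
    return eigvecs[:, top]


def fit_two_step(maps, k_space, k_feat, order="feature_first"):
    """Hypothetical two-step compression of part feature maps (PCA stand-in).

    maps: array of shape (N, S, D), i.e. N part regions, S spatial cells, D channels.
    Returns U (S x k_space) and V (D x k_feat), both with orthonormal columns.
    """
    n, s, d = maps.shape
    if order == "feature_first":
        # Step 1: compress channels, treating every spatial cell as one D-dim sample.
        V = pca_basis(maps.reshape(n * s, d), k_feat)
        reduced = maps @ V  # (N, S, k_feat)
        # Step 2: compress space, treating each reduced channel's spatial profile as a sample.
        U = pca_basis(reduced.transpose(0, 2, 1).reshape(n * k_feat, s), k_space)
    elif order == "space_first":
        # Step 1: compress space, treating each channel's spatial profile as an S-dim sample.
        U = pca_basis(maps.transpose(0, 2, 1).reshape(n * d, s), k_space)
        reduced = np.einsum("nsd,sk->nkd", maps, U)  # (N, k_space, D)
        # Step 2: compress the channels of the spatially compressed maps.
        V = pca_basis(reduced.reshape(n * k_space, d), k_feat)
    else:
        raise ValueError(f"unknown order: {order!r}")
    return U, V


def encode(maps, U, V):
    """Bilinear projection U^T X V of every map, flattened to a k_space * k_feat vector."""
    codes = np.einsum("sk,nsd,dj->nkj", U, maps, V)
    return codes.reshape(len(maps), -1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    maps = rng.normal(size=(50, 16, 64))  # synthetic: 50 parts, 4x4 grid, 64 channels
    for order in ("feature_first", "space_first"):
        U, V = fit_two_step(maps, k_space=4, k_feat=8, order=order)
        print(order, encode(maps, U, V).shape)  # (50, 32) for both orders
```

In this sketch each projection comes from a single eigendecomposition rather than from training one classifier per class or class pair, which illustrates the kind of scalability the abstract contrasts with one-vs-one and one-vs-all training.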

Publication: IEICE TRANSACTIONS on Information Vol.E99-D No.6 pp.1626-1634

Publication Date: 2016/06/01

Publicized: 2016/02/23

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2015EDP7358

Type of Manuscript: PAPER

Category: Image Recognition, Computer Vision

Authors

Hideki NAKAYAMA
University of Tokyo
Tomoya TSUDA
University of Tokyo

Keywords

image classification, fine-grained categorization, part-based features, dimensionality reduction

Cite this

Hideki NAKAYAMA, Tomoya TSUDA, "Efficient Two-Step Middle-Level Part Feature Extraction for Fine-Grained Visual Categorization" in IEICE TRANSACTIONS on Information, vol. E99-D, no. 6, pp. 1626-1634, June 2016, doi: 10.1587/transinf.2015EDP7358.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2015EDP7358/_p

@ARTICLE{e99-d_6_1626,
author={Hideki NAKAYAMA and Tomoya TSUDA},
journal={IEICE TRANSACTIONS on Information},
title={Efficient Two-Step Middle-Level Part Feature Extraction for Fine-Grained Visual Categorization},
year={2016},
volume={E99-D},
number={6},
pages={1626-1634},
keywords={image classification, fine-grained categorization, part-based features, dimensionality reduction},
doi={10.1587/transinf.2015EDP7358},
ISSN={1745-1361},
month=jun,
}

TY  - JOUR
TI  - Efficient Two-Step Middle-Level Part Feature Extraction for Fine-Grained Visual Categorization
T2  - IEICE TRANSACTIONS on Information
SP  - 1626
EP  - 1634
AU  - Hideki NAKAYAMA
AU  - Tomoya TSUDA
PY  - 2016
DO  - 10.1587/transinf.2015EDP7358
JO  - IEICE TRANSACTIONS on Information
SN  - 1745-1361
VL  - E99-D
IS  - 6
JA  - IEICE TRANSACTIONS on Information
Y1  - 2016/06/01
ER  - 
