Recent studies have obtained superior performance in image recognition tasks by using, as an image representation, the fully connected layer activations of Convolutional Neural Networks (CNNs) trained with various kinds of images. However, this CNN representation is not well suited to fine-grained image recognition tasks such as food image recognition. To improve the performance of the CNN representation in food image recognition, we propose a novel image representation composed of the covariances of convolutional layer feature maps. In an experiment on the ETHZ Food-101 dataset, our method achieved an average accuracy of 58.65%, outperforming previous methods such as the Bag-of-Visual-Words Histogram, the Improved Fisher Vector, and CNN-SVM.
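The core idea of the abstract, a representation built from covariances of convolutional layer feature maps, can be illustrated with a minimal sketch. This is not the authors' implementation: the `(C, H, W)` array layout, the function name `covariance_descriptor`, and the choice to flatten the upper triangle of the covariance matrix into a vector are all assumptions made here for illustration, and any normalization or layer selection used in the paper is not stated in the abstract and is therefore omitted.

```python
import numpy as np


def covariance_descriptor(feature_maps: np.ndarray) -> np.ndarray:
    """Covariance-based descriptor of one convolutional layer (illustrative only).

    feature_maps: array of shape (C, H, W), i.e. C feature maps of size H x W.
    This layout is an assumption, not something the abstract specifies.
    """
    c, h, w = feature_maps.shape
    # Treat every spatial position as one C-dimensional observation.
    samples = feature_maps.reshape(c, h * w)
    # np.cov treats rows as variables (channels) and columns as observations.
    cov = np.atleast_2d(np.cov(samples))
    # The covariance matrix is symmetric, so the upper triangle
    # (diagonal included) carries all of its information.
    rows, cols = np.triu_indices(c)
    return cov[rows, cols]


# Hypothetical activations standing in for a real CNN layer output.
rng = np.random.default_rng(0)
maps = rng.standard_normal((8, 6, 6))
desc = covariance_descriptor(maps)
print(desc.shape)  # (36,) because 8 * 9 / 2 = 36
```

A descriptor of this kind has C(C+1)/2 dimensions regardless of the spatial size of the feature maps, which is one reason a vector like this could be passed to a conventional classifier; how the paper actually classifies the representation is not described in the abstract.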
Atsushi TATSUMA
Toyohashi University of Technology
Masaki AONO
Toyohashi University of Technology
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Atsushi TATSUMA, Masaki AONO, "Food Image Recognition Using Covariance of Convolutional Layer Feature Maps" in IEICE TRANSACTIONS on Information, vol. E99-D, no. 6, pp. 1711-1715, June 2016, doi: 10.1587/transinf.2015EDL8212.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2015EDL8212/_p
@ARTICLE{e99-d_6_1711,
author={Atsushi TATSUMA and Masaki AONO},
journal={IEICE TRANSACTIONS on Information},
title={Food Image Recognition Using Covariance of Convolutional Layer Feature Maps},
year={2016},
volume={E99-D},
number={6},
pages={1711--1715},
abstract={Recent studies have obtained superior performance in image recognition tasks by using, as an image representation, the fully connected layer activations of Convolutional Neural Networks (CNNs) trained with various kinds of images. However, this CNN representation is not well suited to fine-grained image recognition tasks such as food image recognition. To improve the performance of the CNN representation in food image recognition, we propose a novel image representation composed of the covariances of convolutional layer feature maps. In an experiment on the ETHZ Food-101 dataset, our method achieved an average accuracy of 58.65%, outperforming previous methods such as the Bag-of-Visual-Words Histogram, the Improved Fisher Vector, and CNN-SVM.},
keywords={},
doi={10.1587/transinf.2015EDL8212},
ISSN={1745-1361},
month={June},}
TY - JOUR
TI - Food Image Recognition Using Covariance of Convolutional Layer Feature Maps
T2 - IEICE TRANSACTIONS on Information
SP - 1711
EP - 1715
AU - Atsushi TATSUMA
AU - Masaki AONO
PY - 2016
DO - 10.1587/transinf.2015EDL8212
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E99-D
IS - 6
JA - IEICE TRANSACTIONS on Information
Y1 - June 2016
AB - Recent studies have obtained superior performance in image recognition tasks by using, as an image representation, the fully connected layer activations of Convolutional Neural Networks (CNNs) trained with various kinds of images. However, this CNN representation is not well suited to fine-grained image recognition tasks such as food image recognition. To improve the performance of the CNN representation in food image recognition, we propose a novel image representation composed of the covariances of convolutional layer feature maps. In an experiment on the ETHZ Food-101 dataset, our method achieved an average accuracy of 58.65%, outperforming previous methods such as the Bag-of-Visual-Words Histogram, the Improved Fisher Vector, and CNN-SVM.
ER -