Human action recognition in videos attracts significant research interest in computer vision. The Bag-of-Words (BoW) model is commonly used to obtain video-level representations; however, the BoW model coarsely assigns each feature vector to its nearest visual word, and the resulting collection of unordered words discards the interest points' spatial information, inevitably causing nontrivial quantization errors and limiting gains in classification accuracy. To address these drawbacks, we propose an approach for action recognition that encodes spatio-temporal log-Euclidean covariance matrix (ST-LECM) features within a low-rank and sparse representation framework. Motivated by low-rank matrix recovery, local descriptors in a spatio-temporal neighborhood have similar representations and should therefore be approximately low rank. The learned coefficients can not only capture the global data structure but also preserve local consistency. Experimental results show that the proposed approach yields excellent recognition performance on synthetic video datasets and is robust to action variability, view variations, and partial occlusion.
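As a rough illustration of the two ingredients the abstract names, the sketch below first builds a log-Euclidean covariance descriptor from local features, then codes a set of such descriptors over a dictionary under joint nuclear-norm (low-rank) and ℓ1 (sparse) penalties. Everything here (function names, regularization weights, the alternating proximal heuristic) is an illustrative assumption, not the paper's actual implementation.

```python
import numpy as np

def log_euclidean_descriptor(features, eps=1e-6):
    # Covariance of local descriptors in one spatio-temporal cell,
    # regularized to be strictly positive definite, then mapped through
    # the matrix logarithm so Euclidean operations on the result are
    # meaningful (the log-Euclidean metric is flat).
    cov = np.cov(features, rowvar=False) + eps * np.eye(features.shape[1])
    w, v = np.linalg.eigh(cov)              # symmetric eigendecomposition
    log_cov = (v * np.log(w)) @ v.T         # matrix log via eigenvalues
    iu = np.triu_indices(log_cov.shape[0])  # vectorize the upper triangle
    return log_cov[iu]

def svt(M, tau):
    # Singular value thresholding: prox of tau * nuclear norm.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def soft(M, tau):
    # Soft thresholding: prox of tau * l1 norm.
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def low_rank_sparse_code(X, D, lam_lr=0.01, lam_sp=0.01, n_iter=300):
    # Toy proximal-gradient loop for
    #   min_Z 0.5*||X - D Z||_F^2 + lam_lr*||Z||_* + lam_sp*||Z||_1,
    # alternating the two proximal maps as a heuristic.
    step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1 / Lipschitz constant
    Z = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        Z = Z - step * (D.T @ (D @ Z - X))  # gradient step on the fit term
        Z = soft(svt(Z, step * lam_lr), step * lam_sp)
    return Z
```

In such a pipeline, the descriptors from many spatio-temporal cells would form the columns of X, and the coefficient matrix Z would feed a downstream classifier.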
Shilei CHENG
University of Electronic Science and Technology of China
Song GU
Chengdu Aeronautic Polytechnic
Maoquan YE
University of Electronic Science and Technology of China
Mei XIE
University of Electronic Science and Technology of China
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Shilei CHENG, Song GU, Maoquan YE, Mei XIE, "Action Recognition Using Low-Rank Sparse Representation" in IEICE TRANSACTIONS on Information,
vol. E101-D, no. 3, pp. 830-834, March 2018, doi: 10.1587/transinf.2017EDL8176.
Abstract: Human action recognition in videos attracts significant research interest in computer vision. The Bag-of-Words (BoW) model is commonly used to obtain video-level representations; however, the BoW model coarsely assigns each feature vector to its nearest visual word, and the resulting collection of unordered words discards the interest points' spatial information, inevitably causing nontrivial quantization errors and limiting gains in classification accuracy. To address these drawbacks, we propose an approach for action recognition that encodes spatio-temporal log-Euclidean covariance matrix (ST-LECM) features within a low-rank and sparse representation framework. Motivated by low-rank matrix recovery, local descriptors in a spatio-temporal neighborhood have similar representations and should therefore be approximately low rank. The learned coefficients can not only capture the global data structure but also preserve local consistency. Experimental results show that the proposed approach yields excellent recognition performance on synthetic video datasets and is robust to action variability, view variations, and partial occlusion.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2017EDL8176/_p
@ARTICLE{e101-d_3_830,
author={Shilei CHENG and Song GU and Maoquan YE and Mei XIE},
journal={IEICE TRANSACTIONS on Information},
title={Action Recognition Using Low-Rank Sparse Representation},
year={2018},
volume={E101-D},
number={3},
pages={830-834},
abstract={Human action recognition in videos attracts significant research interest in computer vision. The Bag-of-Words (BoW) model is commonly used to obtain video-level representations; however, the BoW model coarsely assigns each feature vector to its nearest visual word, and the resulting collection of unordered words discards the interest points' spatial information, inevitably causing nontrivial quantization errors and limiting gains in classification accuracy. To address these drawbacks, we propose an approach for action recognition that encodes spatio-temporal log-Euclidean covariance matrix (ST-LECM) features within a low-rank and sparse representation framework. Motivated by low-rank matrix recovery, local descriptors in a spatio-temporal neighborhood have similar representations and should therefore be approximately low rank. The learned coefficients can not only capture the global data structure but also preserve local consistency. Experimental results show that the proposed approach yields excellent recognition performance on synthetic video datasets and is robust to action variability, view variations, and partial occlusion.},
keywords={},
doi={10.1587/transinf.2017EDL8176},
ISSN={1745-1361},
month={March},}
TY - JOUR
TI - Action Recognition Using Low-Rank Sparse Representation
T2 - IEICE TRANSACTIONS on Information
SP - 830
EP - 834
AU - Shilei CHENG
AU - Song GU
AU - Maoquan YE
AU - Mei XIE
PY - 2018
DO - 10.1587/transinf.2017EDL8176
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E101-D
IS - 3
JA - IEICE TRANSACTIONS on Information
Y1 - 2018/03//
AB - Human action recognition in videos attracts significant research interest in computer vision. The Bag-of-Words (BoW) model is commonly used to obtain video-level representations; however, the BoW model coarsely assigns each feature vector to its nearest visual word, and the resulting collection of unordered words discards the interest points' spatial information, inevitably causing nontrivial quantization errors and limiting gains in classification accuracy. To address these drawbacks, we propose an approach for action recognition that encodes spatio-temporal log-Euclidean covariance matrix (ST-LECM) features within a low-rank and sparse representation framework. Motivated by low-rank matrix recovery, local descriptors in a spatio-temporal neighborhood have similar representations and should therefore be approximately low rank. The learned coefficients can not only capture the global data structure but also preserve local consistency. Experimental results show that the proposed approach yields excellent recognition performance on synthetic video datasets and is robust to action variability, view variations, and partial occlusion.
ER -