Human action recognition in videos attracts significant research interest in computer vision. The Bag-of-Words (BoW) model is commonly used to obtain video-level representations; however, the BoW model coarsely assigns each feature vector to its nearest visual word, and the resulting collection of unordered words discards the interest points' spatial information, inevitably causing nontrivial quantization errors and limiting gains in classification accuracy. To address these drawbacks, we propose an approach for action recognition that encodes spatio-temporal log-Euclidean covariance matrix (ST-LECM) features within a low-rank and sparse representation framework. Motivated by low-rank matrix recovery, local descriptors in a spatio-temporal neighborhood have similar representations and should therefore be approximately low rank. The learned coefficients can not only capture the global data structure but also preserve local consistency. Experimental results show that the proposed approach yields excellent recognition performance on synthetic video datasets and is robust to action variability, view variations, and partial occlusion.
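As a rough illustration of the two ingredients the abstract names, the sketch below first builds a log-Euclidean covariance descriptor from local features, then codes a set of such descriptors over a dictionary under joint nuclear-norm (low-rank) and ℓ1 (sparse) penalties. Everything here (function names, regularization weights, the alternating proximal heuristic) is an illustrative assumption, not the paper's actual implementation.

```python
import numpy as np

def log_euclidean_descriptor(features, eps=1e-6):
    # Covariance of local descriptors in one spatio-temporal cell,
    # regularized to be strictly positive definite, then mapped through
    # the matrix logarithm so Euclidean operations on the result are
    # meaningful (the log-Euclidean metric is flat).
    cov = np.cov(features, rowvar=False) + eps * np.eye(features.shape[1])
    w, v = np.linalg.eigh(cov)              # symmetric eigendecomposition
    log_cov = (v * np.log(w)) @ v.T         # matrix log via eigenvalues
    iu = np.triu_indices(log_cov.shape[0])  # vectorize the upper triangle
    return log_cov[iu]

def svt(M, tau):
    # Singular value thresholding: prox of tau * nuclear norm.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def soft(M, tau):
    # Soft thresholding: prox of tau * l1 norm.
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def low_rank_sparse_code(X, D, lam_lr=0.01, lam_sp=0.01, n_iter=300):
    # Toy proximal-gradient loop for
    #   min_Z 0.5*||X - D Z||_F^2 + lam_lr*||Z||_* + lam_sp*||Z||_1,
    # alternating the two proximal maps as a heuristic.
    step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1 / Lipschitz constant
    Z = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        Z = Z - step * (D.T @ (D @ Z - X))  # gradient step on the fit term
        Z = soft(svt(Z, step * lam_lr), step * lam_sp)
    return Z
```

In such a pipeline, the descriptors from many spatio-temporal cells would form the columns of X, and the coefficient matrix Z would feed a downstream classifier.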
Shilei CHENG
University of Electronic Science and Technology of China
Song GU
Chengdu Aeronautic Polytechnic
Maoquan YE
University of Electronic Science and Technology of China
Mei XIE
University of Electronic Science and Technology of China
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Shilei CHENG, Song GU, Maoquan YE, Mei XIE, "Action Recognition Using Low-Rank Sparse Representation" in IEICE TRANSACTIONS on Information,
vol. E101-D, no. 3, pp. 830-834, March 2018, doi: 10.1587/transinf.2017EDL8176.
Abstract: Human action recognition in videos attracts significant research interest in computer vision. The Bag-of-Words (BoW) model is commonly used to obtain video-level representations; however, the BoW model coarsely assigns each feature vector to its nearest visual word, and the resulting collection of unordered words discards the interest points' spatial information, inevitably causing nontrivial quantization errors and limiting gains in classification accuracy. To address these drawbacks, we propose an approach for action recognition that encodes spatio-temporal log-Euclidean covariance matrix (ST-LECM) features within a low-rank and sparse representation framework. Motivated by low-rank matrix recovery, local descriptors in a spatio-temporal neighborhood have similar representations and should therefore be approximately low rank. The learned coefficients can not only capture the global data structure but also preserve local consistency. Experimental results show that the proposed approach yields excellent recognition performance on synthetic video datasets and is robust to action variability, view variations, and partial occlusion.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2017EDL8176/_p
@ARTICLE{e101-d_3_830,
author={Shilei CHENG and Song GU and Maoquan YE and Mei XIE},
journal={IEICE TRANSACTIONS on Information},
title={Action Recognition Using Low-Rank Sparse Representation},
year={2018},
volume={E101-D},
number={3},
pages={830-834},
abstract={Human action recognition in videos attracts significant research interest in computer vision. The Bag-of-Words (BoW) model is commonly used to obtain video-level representations; however, the BoW model coarsely assigns each feature vector to its nearest visual word, and the resulting collection of unordered words discards the interest points' spatial information, inevitably causing nontrivial quantization errors and limiting gains in classification accuracy. To address these drawbacks, we propose an approach for action recognition that encodes spatio-temporal log-Euclidean covariance matrix (ST-LECM) features within a low-rank and sparse representation framework. Motivated by low-rank matrix recovery, local descriptors in a spatio-temporal neighborhood have similar representations and should therefore be approximately low rank. The learned coefficients can not only capture the global data structure but also preserve local consistency. Experimental results show that the proposed approach yields excellent recognition performance on synthetic video datasets and is robust to action variability, view variations, and partial occlusion.},
keywords={},
doi={10.1587/transinf.2017EDL8176},
ISSN={1745-1361},
month={March},}
TY - JOUR
TI - Action Recognition Using Low-Rank Sparse Representation
T2 - IEICE TRANSACTIONS on Information
SP - 830
EP - 834
AU - Shilei CHENG
AU - Song GU
AU - Maoquan YE
AU - Mei XIE
PY - 2018
DO - 10.1587/transinf.2017EDL8176
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E101-D
IS - 3
JA - IEICE TRANSACTIONS on Information
Y1 - 2018/03//
AB - Human action recognition in videos attracts significant research interest in computer vision. The Bag-of-Words (BoW) model is commonly used to obtain video-level representations; however, the BoW model coarsely assigns each feature vector to its nearest visual word, and the resulting collection of unordered words discards the interest points' spatial information, inevitably causing nontrivial quantization errors and limiting gains in classification accuracy. To address these drawbacks, we propose an approach for action recognition that encodes spatio-temporal log-Euclidean covariance matrix (ST-LECM) features within a low-rank and sparse representation framework. Motivated by low-rank matrix recovery, local descriptors in a spatio-temporal neighborhood have similar representations and should therefore be approximately low rank. The learned coefficients can not only capture the global data structure but also preserve local consistency. Experimental results show that the proposed approach yields excellent recognition performance on synthetic video datasets and is robust to action variability, view variations, and partial occlusion.
ER -