Traditional action recognition approaches process space-time information with pre-defined rigid regions, e.g., spatial pyramids or cuboids. However, most action categories occur in an unconstrained manner: the same action can appear at different locations in different videos. A better video representation is therefore needed to handle these space-time variations. In this paper, we introduce the idea of mining spatial-temporal saliency. To better handle the uniqueness of each video, we adopt a space-time over-segmentation approach, namely supervoxels. We choose three different saliency measures that take into account not only appearance cues but also motion cues. Furthermore, we design a category-specific mining process to exploit the discriminative power of each action category. Experiments on the UCF11 and HMDB51 action recognition datasets show that the proposed spatial-temporal saliency video representation can match or surpass some of the state-of-the-art alternatives.
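The letter itself does not publish code, but the pipeline its abstract describes (supervoxel over-segmentation, appearance-plus-motion saliency, a saliency-based video representation) can be sketched. The following is a minimal, illustrative Python sketch, not the authors' implementation: real supervoxels and descriptors would come from external tools, the global-contrast measure and the linear fusion weight alpha are stand-ins for the paper's three saliency measures, and every function name here is hypothetical.

# A minimal, illustrative sketch (not the authors' code) of a
# saliency-weighted video representation over supervoxels. Assumes
# supervoxels and their appearance/motion descriptors come from
# external tools; the global-contrast measure and the fusion weight
# `alpha` are stand-ins for the letter's three saliency measures.
import numpy as np

def contrast_saliency(features):
    # Score each supervoxel by its mean feature distance to all
    # others: a simple global-contrast notion of saliency.
    # features: (n_supervoxels, d) descriptor matrix.
    diffs = features[:, None, :] - features[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)   # pairwise distances
    return dists.mean(axis=1)                # higher = more distinct

def combined_saliency(appearance, motion, alpha=0.5):
    # Fuse appearance and motion cues into one score per supervoxel;
    # the linear fusion rule is an assumption for illustration.
    sa = contrast_saliency(appearance)
    sm = contrast_saliency(motion)
    sa = (sa - sa.min()) / (sa.max() - sa.min() + 1e-8)  # normalize to [0, 1]
    sm = (sm - sm.min()) / (sm.max() - sm.min() + 1e-8)
    return alpha * sa + (1.0 - alpha) * sm

def saliency_pooled_descriptor(descriptors, saliency):
    # Pool per-supervoxel descriptors into one video-level vector,
    # weighting each supervoxel by its normalized saliency.
    w = saliency / (saliency.sum() + 1e-8)
    return (descriptors * w[:, None]).sum(axis=0)

# Toy usage: 50 supervoxels, 16-d appearance and 8-d motion features.
rng = np.random.default_rng(0)
app = rng.normal(size=(50, 16))
mot = rng.normal(size=(50, 8))
video_vec = saliency_pooled_descriptor(app, combined_saliency(app, mot))
print(video_vec.shape)   # -> (16,)

The category-specific mining step the abstract mentions would then operate on top of such saliency-weighted representations to select what is discriminative per class; that selection is not sketched here.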
Yinan LIU, Qingbo WU, Linfeng XU, and Bo WU
University of Electronic Science and Technology of China
Yinan LIU, Qingbo WU, Linfeng XU, Bo WU, "Mining Spatial Temporal Saliency Structure for Action Recognition," IEICE Transactions on Information and Systems, vol. E99-D, no. 10, pp. 2643-2646, October 2016, doi: 10.1587/transinf.2016EDL8093.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2016EDL8093/_p
@ARTICLE{e99-d_10_2643,
author={Yinan LIU and Qingbo WU and Linfeng XU and Bo WU},
journal={IEICE Transactions on Information and Systems},
title={Mining Spatial Temporal Saliency Structure for Action Recognition},
year={2016},
volume={E99-D},
number={10},
pages={2643-2646},
doi={10.1587/transinf.2016EDL8093},
ISSN={1745-1361},
month={October}
}
TY - JOUR
TI - Mining Spatial Temporal Saliency Structure for Action Recognition
T2 - IEICE Transactions on Information and Systems
SP - 2643
EP - 2646
AU - Yinan LIU
AU - Qingbo WU
AU - Linfeng XU
AU - Bo WU
PY - 2016
DO - 10.1587/transinf.2016EDL8093
JO - IEICE Transactions on Information and Systems
SN - 1745-1361
VL - E99-D
IS - 10
JA - IEICE Transactions on Information and Systems
Y1 - October 2016
ER -