
Self-Supervised Learning of Video Representation for Anticipating Actions in Early Stage

Yinan LIU, Qingbo WU, Liangzhi TANG, Linfeng XU

Summary

In this paper, we propose a novel self-supervised approach to learning video representations that can anticipate the category of a video from only a short clip of its early stage. The key idea is to employ a Siamese convolutional network that models self-supervised feature learning as two different image matching problems. Through frame encoding, the proposed video representation can be extracted at different temporal scales. We further refine the training process with a motion-based temporal segmentation strategy. The learned video representations can be applied not only to action anticipation but also to action recognition. We verify the effectiveness of the proposed approach on both tasks using two datasets, UCF101 and HMDB51. The experiments show that we achieve results comparable with state-of-the-art self-supervised learning methods on both tasks.
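To make the Siamese matching idea concrete, the following is a minimal sketch of a two-branch, shared-weight network that scores whether a short early clip and a longer clip come from the same video; this is one plausible reading of the abstract, not the authors' actual architecture. The backbone (`FrameEncoder`), the temporal mean pooling, the cosine-similarity score, and the margin-based `matching_loss` are all illustrative assumptions; the paper's exact network, matching formulation, and loss are not given on this page.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FrameEncoder(nn.Module):
    """Small conv encoder for a single RGB frame (illustrative stand-in
    for the paper's unspecified backbone)."""
    def __init__(self, dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, dim)

    def forward(self, x):                    # x: (B, 3, H, W)
        h = self.features(x).flatten(1)      # (B, 64)
        return self.fc(h)                    # (B, dim)


class SiameseMatcher(nn.Module):
    """Two branches with shared weights: encode each clip frame by
    frame, pool over time, and compare the two clip embeddings."""
    def __init__(self, dim=128):
        super().__init__()
        self.encoder = FrameEncoder(dim)     # shared by both branches

    def encode_clip(self, frames):           # frames: (B, T, 3, H, W)
        B, T = frames.shape[:2]
        f = self.encoder(frames.flatten(0, 1))       # (B*T, dim)
        return f.view(B, T, -1).mean(dim=1)          # temporal pooling

    def forward(self, clip_a, clip_b):
        za = self.encode_clip(clip_a)
        zb = self.encode_clip(clip_b)
        return F.cosine_similarity(za, zb)   # (B,) match score in [-1, 1]


def matching_loss(score, label, margin=0.5):
    """Contrastive-style objective (an assumption, not the paper's loss):
    pairs from the same video (label=1) should score high, mismatched
    pairs (label=0) should score below the margin."""
    pos = label * (1.0 - score)
    neg = (1.0 - label) * torch.clamp(score - margin, min=0.0)
    return (pos + neg).mean()


if __name__ == "__main__":
    model = SiameseMatcher()
    early = torch.randn(4, 8, 3, 64, 64)    # short early-stage clip
    later = torch.randn(4, 32, 3, 64, 64)   # longer clip, same or other video
    label = torch.tensor([1.0, 0.0, 1.0, 0.0])
    loss = matching_loss(model(early, later), label)
    loss.backward()
    print(float(loss))
```

Under this kind of pretext task, the encoder never sees action labels during pretraining: the supervision comes from whether two clips match, after which the learned frame features can be reused for anticipation or recognition.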

Publication
IEICE TRANSACTIONS on Information and Systems Vol.E101-D No.5 pp.1449-1452
Publication Date
2018/05/01
Publicized
2018/02/21
Online ISSN
1745-1361
DOI
10.1587/transinf.2018EDL8013
Type of Manuscript
LETTER
Category
Pattern Recognition

Authors

Yinan LIU
  University of Electronic Science and Technology of China
Qingbo WU
  University of Electronic Science and Technology of China
Liangzhi TANG
  University of Electronic Science and Technology of China
Linfeng XU
  University of Electronic Science and Technology of China

Keyword