Attentive Sequences Recurrent Network for Social Relation Recognition from Video

Jinna LV; Bin WU; Yunlei ZHANG; Yunpeng XIAO

doi:10.1587/transinf.2019EDP7104

IEICE TRANSACTIONS on Information

Open Access

Summary:

Recently, social relation analysis has received increasing attention, on data ranging from text to images. However, social relation analysis from video, although an important problem, remains largely unexplored in the current literature. Two challenges remain: 1) it is hard to learn a satisfactory mapping function from low-level pixels to the high-level social relation space; 2) it is hard to efficiently select the most relevant information from noisy and unsegmented video. In this paper, we present an Attentive Sequences Recurrent Network model, called ASRN, to address these challenges. First, to exploit multiple cues, we design a Multiple Feature Attention (MFA) mechanism that fuses multiple visual features (i.e., image, motion, body, and face). In this way, we can generate an appropriate mapping function from low-level video pixels to the high-level social relation space. Second, we design a sequence recurrent network based on a Global and Local Attention (GLA) mechanism. Specifically, GLA uses an attention mechanism to integrate the global feature with local sequence features and select the sequences most relevant to the recognition task. The GLA module can therefore better handle noisy and unsegmented video. Finally, extensive experiments on the SRIV dataset demonstrate the performance of our ASRN model.
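
The summary does not give the equations behind MFA or GLA, so the following is only a minimal sketch of generic soft attention (dot-product scores, softmax weights, weighted sum), not the authors' formulation. The function names, the toy feature values, and the `query` vector are all hypothetical. The same helper is used twice: once to fuse the four visual cues, as MFA is described as doing, and once to weight local video segments against a global feature, as GLA is described as doing. Using the mean of the segments as the global feature is an assumption made purely for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(vectors, query):
    """Generic soft attention: score each vector against the query,
    normalize the scores with softmax, and return (weights, weighted sum)."""
    weights = softmax([dot(v, query) for v in vectors])
    dim = len(vectors[0])
    fused = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    return weights, fused


# MFA-style fusion: one toy 3-d feature per visual cue (made-up values).
cues = {
    "image": [0.2, 0.1, 0.0],
    "motion": [0.0, 0.3, 0.1],
    "body": [0.5, 0.5, 0.2],
    "face": [0.9, 0.1, 0.4],
}
query = [1.0, 0.0, 1.0]  # hypothetical stand-in for a learned parameter
cue_weights, fused_feature = attend(list(cues.values()), query)

# GLA-style selection: weight local segments by similarity to a global feature.
# Assumption: the global feature is simply the mean of the local features.
segments = [[0.1, 0.0, 0.0], [0.8, 0.7, 0.9], [0.0, 0.1, 0.0]]
global_feature = [sum(s[i] for s in segments) / len(segments) for i in range(3)]
segment_weights, video_repr = attend(segments, global_feature)

print(dict(zip(cues, cue_weights)))
print(segment_weights)
```

In a trained model the scoring function and the query would be learned rather than fixed; the sketch only shows how attention weights let one cue or one segment dominate the fused representation while noisy inputs are down-weighted.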

Publication: IEICE TRANSACTIONS on Information Vol.E102-D No.12 pp.2568-2576

Publication Date: 2019/12/01

Publicized: 2019/09/02

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2019EDP7104

Type of Manuscript: PAPER

Category: Image Recognition, Computer Vision

Authors

Jinna LV
  Beijing University of Posts and Telecommunications, Beijing Information Science & Technology University
Bin WU
  Beijing University of Posts and Telecommunications
Yunlei ZHANG
  Beijing University of Posts and Telecommunications
Yunpeng XIAO
  Chongqing University of Posts and Telecommunications

Keywords

social relation recognition, video analysis, deep learning, LSTM, attention mechanism

Cite this

Jinna LV, Bin WU, Yunlei ZHANG, Yunpeng XIAO, "Attentive Sequences Recurrent Network for Social Relation Recognition from Video" in IEICE TRANSACTIONS on Information, vol. E102-D, no. 12, pp. 2568-2576, December 2019, doi: 10.1587/transinf.2019EDP7104.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2019EDP7104/_p

@ARTICLE{e102-d_12_2568,
author={Jinna LV and Bin WU and Yunlei ZHANG and Yunpeng XIAO},
journal={IEICE TRANSACTIONS on Information},
title={Attentive Sequences Recurrent Network for Social Relation Recognition from Video},
year={2019},
volume={E102-D},
number={12},
pages={2568-2576},
keywords={social relation recognition, video analysis, deep learning, LSTM, attention mechanism},
doi={10.1587/transinf.2019EDP7104},
ISSN={1745-1361},
month={December},}

TY - JOUR
TI - Attentive Sequences Recurrent Network for Social Relation Recognition from Video
T2 - IEICE TRANSACTIONS on Information
SP - 2568
EP - 2576
AU - Jinna LV
AU - Bin WU
AU - Yunlei ZHANG
AU - Yunpeng XIAO
PY - 2019
DO - 10.1587/transinf.2019EDP7104
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E102-D
IS - 12
JA - IEICE TRANSACTIONS on Information
Y1 - December 2019
ER -
