
Keyword Search Result

[Keyword] video representation (4 hits)

  • Spatio-Temporal Self-Attention Weighted VLAD Neural Network for Action Recognition

    Shilei CHENG  Mei XIE  Zheng MA  Siqi LI  Song GU  Feng YANG  

     
    LETTER-Biocybernetics, Neurocomputing

  Publicized:
    2020/10/01
      Vol:
    E104-D No:1
      Page(s):
    220-224

    As characterizing videos simultaneously from spatial and temporal cues has been shown to be crucial for video processing, and as soft assignment lacks temporal information, the vector of locally aggregated descriptors (VLAD) should be considered a suboptimal framework for learning spatio-temporal video representations. Motivated by the development of attention mechanisms in natural language processing, in this work we present a novel model that combines VLAD with spatio-temporal self-attention operations, named spatio-temporal self-attention weighted VLAD (ST-SAWVLAD). In particular, sequential convolutional feature maps extracted from two modalities, i.e., RGB and Flow, are respectively fed into the self-attention module to learn soft spatio-temporal assignment parameters, which enables aggregating not only detailed spatial information but also fine motion information from successive video frames. In experiments, we evaluate ST-SAWVLAD on competitive action recognition datasets, UCF101 and HMDB51; the results show outstanding performance. The source code is available at: https://github.com/badstones/st-sawvlad.
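
The abstract builds on soft-assignment VLAD aggregation. As a rough sketch of that base operation only (not the authors' ST-SAWVLAD model; the function name and the `alpha` temperature are illustrative assumptions), assuming `N` local descriptors of dimension `D` and `K` cluster centers:

```python
import numpy as np

def soft_assign_vlad(descriptors, centers, alpha=1.0):
    """Soft-assignment VLAD: aggregate residuals of local descriptors
    against cluster centers, weighted by a softmax over distances.

    descriptors: (N, D) local features; centers: (K, D) cluster centers.
    Returns a flattened (K*D,) vector, intra-normalized then L2-normalized.
    """
    # Residuals of every descriptor against every center: (N, K, D)
    residuals = descriptors[:, None, :] - centers[None, :, :]
    # Squared distances between descriptors and centers: (N, K)
    d2 = (residuals ** 2).sum(axis=-1)
    # Soft assignment weights: softmax over centers, sharper for larger alpha
    w = np.exp(-alpha * d2)
    w /= w.sum(axis=1, keepdims=True)
    # Weighted residual aggregation per center: (K, D)
    vlad = np.einsum('nk,nkd->kd', w, residuals)
    # Intra-normalization per center, then global L2 normalization
    vlad /= np.linalg.norm(vlad, axis=1, keepdims=True) + 1e-12
    vlad = vlad.flatten()
    return vlad / (np.linalg.norm(vlad) + 1e-12)
```

In ST-SAWVLAD as described above, the assignment weights would come from the self-attention module rather than from descriptor-to-center distances alone.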

  • Video Retrieval System for Bridging the Semantic Gap

    Min Young JUNG  Sung Han PARK  

     
    LETTER-Database

      Vol:
    E92-D No:12
      Page(s):
    2516-2519

    We propose a video ontology system to overcome the semantic gap in video retrieval. The proposed video ontology is aimed at bridging the gap between the semantic nature of user queries and raw video content. In addition, semantic retrieval returns not only the concept named by the topic keyword but also its sub-concepts via semantic query expansion; through this process, our method is likely to achieve high recall. Experiments comparing the proposed scene-based indexing with keyframe-based indexing demonstrate better results on several kinds of videos.
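
As a toy illustration of the semantic query expansion described above (the ontology contents and function name are hypothetical, not the authors' actual ontology), a topic keyword is expanded with its sub-concepts before retrieval:

```python
# Hypothetical toy ontology: topic keyword -> sub-concepts (illustrative only)
ontology = {
    "vehicle": ["car", "bus", "truck"],
    "animal": ["dog", "cat", "bird"],
}

def expand_query(keyword, ontology):
    """Expand a topic keyword with its sub-concepts so retrieval also
    matches videos annotated at the finer level of the hierarchy."""
    return [keyword] + ontology.get(keyword, [])

print(expand_query("vehicle", ontology))  # ['vehicle', 'car', 'bus', 'truck']
```

Matching videos annotated with any term in the expanded list is what raises recall relative to exact-keyword retrieval.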

  • Representation of Dynamic 3D Objects Using the Coaxial Camera System

    Takayuki YASUNO  Jun'ichi ICHIMURA  Yasuhiko YASUDA  

     
    PAPER

      Vol:
    E79-B No:10
      Page(s):
    1484-1490

    3D model-based coding, which requires 3D reconstruction techniques, has been proposed for next-generation image coding. A method is presented that reconstructs the 3D shapes of dynamic objects from image sequences captured using two cameras, thus avoiding the stereo correspondence problem. A coaxial camera system consisting of one moving and one static camera was developed. The optical axes of both cameras are precisely adjusted to the same orientation using an optical system with true and half mirrors. The moving camera travels along a straight horizontal line. This method can reconstruct the 3D shapes of static as well as dynamic objects using motion vectors calculated from the moving camera images and revised using the static camera image. The method was tested successfully on real images by reconstructing a moving human shape.
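
The coaxial setup recovers shape from motion vectors induced by the laterally translating camera. As a simplified sketch of the underlying motion-parallax relation only (not the authors' full method with the static-camera revision step; the function name and numbers are illustrative), a camera translating laterally by baseline b shifts the image of a static point at depth Z by roughly u = f*b/Z pixels, so Z = f*b/u:

```python
def depth_from_lateral_motion(motion_px, baseline_m, focal_px):
    """Estimate the depth of a static point from its image motion when the
    camera translates laterally by `baseline_m` between two frames.

    motion_px: horizontal motion-vector magnitude in pixels (> 0).
    Returns depth along the optical axis in meters: Z = f * b / u.
    """
    if motion_px <= 0:
        raise ValueError("motion must be positive for a translating camera")
    return focal_px * baseline_m / motion_px

# e.g. a 10-pixel shift, a 0.05 m camera step, and an 800-pixel focal
# length correspond to a point about 4 m away.
print(depth_from_lateral_motion(10.0, 0.05, 800.0))  # 4.0
```

For dynamic objects this relation alone is ambiguous, which is why the method above revises the moving-camera motion vectors using the static camera image.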

  • Object Surface Representation Using Occlusion Analysis of Spatiotemporal Images*

    Takayuki YASUNO  Satoshi SUZUKI  Yasuhiko YASUDA  

     
    PAPER

      Vol:
    E79-D No:6
      Page(s):
    764-771

    Three-dimensional model-based coding methods have been proposed as next-generation image coding methods. These new representations require 3D reconstruction techniques. This paper presents a method that extracts the surfaces of static objects that occlude other objects from a spatiotemporal image captured with straight-line camera motion. We propose the concept of occlusion types and show that they are restricted to only eight patterns; furthermore, we show that pairs of occlusion types contain information that confirms the existence of surfaces. Occlusion information gives strong cues for segmentation and representation. The method can estimate not only the 3D positions of edge points but also the surfaces bounded by those edge points. The method was tested successfully on real images by reconstructing flat and curved surfaces. Videos can be hierarchically structured with the method, which makes various applications possible, such as object-selective image communication and object-selective video editing.