Keyword Search Result

[Keyword] human action recognition (9 hits)

1-9 of 9 hits
  • Spatio-Temporal Self-Attention Weighted VLAD Neural Network for Action Recognition

    Shilei CHENG  Mei XIE  Zheng MA  Siqi LI  Song GU  Feng YANG  

     
    LETTER-Biocybernetics, Neurocomputing

      Publicized:
    2020/10/01
      Vol:
    E104-D No:1
      Page(s):
    220-224

    Since characterizing videos simultaneously from spatial and temporal cues has been shown to be crucial for video processing, and since its soft assignment lacks temporal information, the vector of locally aggregated descriptors (VLAD) is a suboptimal framework for learning spatio-temporal video representations. Motivated by the development of attention mechanisms in natural language processing, we present a novel model that combines VLAD with spatio-temporal self-attention operations, named spatio-temporal self-attention weighted VLAD (ST-SAWVLAD). In particular, sequential convolutional feature maps extracted from two modalities, i.e., RGB and optical flow, are respectively fed into the self-attention module to learn soft spatio-temporal assignment parameters, which enables aggregating not only detailed spatial information but also fine motion information from successive video frames. In experiments on the competitive action recognition datasets UCF101 and HMDB51, ST-SAWVLAD shows outstanding performance. The source code is available at: https://github.com/badstones/st-sawvlad.
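
    To make the aggregation step concrete, the following is a minimal NumPy sketch of attention-weighted soft-assignment VLAD. It is an illustration only: the generic dot-product attention and the Gaussian soft assignment below stand in for the authors' modules, whose released implementation is linked above.

      import numpy as np

      def attention_weights(X):
          # generic dot-product self-attention over N spatio-temporal locations
          scores = X @ X.T / np.sqrt(X.shape[1])
          scores -= scores.max(axis=1, keepdims=True)   # numerical stability
          A = np.exp(scores)
          A /= A.sum(axis=1, keepdims=True)             # row-wise softmax
          return A.mean(axis=0)                         # (N,) location importance

      def sawvlad(X, centers):
          # attention-weighted VLAD: weighted, softly assigned residual sums
          w = attention_weights(X)
          resid = X[:, None, :] - centers[None, :, :]   # (N, K, D) residuals
          d = (resid ** 2).sum(-1)                      # squared distances (N, K)
          a = np.exp(-(d - d.min(axis=1, keepdims=True)))
          a /= a.sum(axis=1, keepdims=True)             # soft assignments
          V = np.einsum('n,nk,nkd->kd', w, a, resid)    # per-center aggregation
          V /= np.linalg.norm(V, axis=1, keepdims=True) + 1e-12  # intra-normalize
          v = V.ravel()
          return v / (np.linalg.norm(v) + 1e-12)        # L2-normalized descriptor

      X = np.random.randn(196, 64)       # stand-in for a flattened feature map
      centers = np.random.randn(16, 64)  # VLAD cluster centers
      descriptor = sawvlad(X, centers)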

  • Action Recognition Using Low-Rank Sparse Representation

    Shilei CHENG  Song GU  Maoquan YE  Mei XIE  

     
    LETTER-Image Recognition, Computer Vision

      Publicized:
    2017/11/24
      Vol:
    E101-D No:3
      Page(s):
    830-834

    Human action recognition in videos draws huge research interest in computer vision. The bag-of-words (BoW) model is commonly used to obtain video-level representations; however, it roughly assigns each feature vector to its nearest visual word, and its collection of unordered words ignores the spatial information of interest points, inevitably causing nontrivial quantization errors and impairing classification rates. To address these drawbacks, we propose an approach for action recognition that encodes spatio-temporal log-Euclidean covariance matrix (ST-LECM) features within a low-rank and sparse representation framework. Motivated by low-rank matrix recovery, local descriptors in a spatio-temporal neighborhood have similar representations and should be approximately low rank. The learned coefficients can not only capture the global data structure but also preserve local consistency. Experimental results show that the proposed approach yields excellent recognition performance on synthetic video datasets and is robust to action variability, view variations, and partial occlusion.
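
    As a concrete anchor for the low-rank-and-sparse machinery, here is a minimal NumPy sketch of robust PCA via inexact ALM, a standard low-rank recovery routine; the ST-LECM feature construction and the classifier are not shown, and the default hyperparameters are common illustrative choices, not the paper's.

      import numpy as np

      def shrink(M, tau):
          # soft-thresholding: proximal operator of the l1 norm
          return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

      def rpca(D, lam=None, mu=None, iters=200):
          # decompose D into low-rank L plus sparse E (D ~ L + E)
          m, n = D.shape
          lam = lam or 1.0 / np.sqrt(max(m, n))
          mu = mu or 0.25 * m * n / (np.abs(D).sum() + 1e-12)
          L, E, Y = np.zeros_like(D), np.zeros_like(D), np.zeros_like(D)
          for _ in range(iters):
              # singular value thresholding updates the low-rank part
              U, s, Vt = np.linalg.svd(D - E + Y / mu, full_matrices=False)
              L = (U * shrink(s, 1.0 / mu)) @ Vt
              # elementwise shrinkage updates the sparse part
              E = shrink(D - L + Y / mu, lam / mu)
              Y += mu * (D - L - E)              # dual ascent on the constraint
          return L, E

      L, E = rpca(np.random.randn(60, 40))       # stand-in feature matrix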

  • Learning a Similarity Constrained Discriminative Kernel Dictionary from Concatenated Low-Rank Features for Action Recognition

    Shijian HUANG  Junyong YE  Tongqing WANG  Li JIANG  Changyuan XING  Yang LI  

     
    LETTER-Pattern Recognition

      Publicized:
    2015/11/16
      Vol:
    E99-D No:2
      Page(s):
    541-544

    Traditional low-rank features lose the temporal information of an action sequence. To retain this temporal information, we split an action video into multiple subsequences and concatenate the low-rank features of all subsequences in time order. We then recognize actions by learning a novel dictionary model from the concatenated low-rank features. However, traditional dictionary learning models usually neglect the similarity among coding coefficients and perform poorly on non-linearly separable data. To overcome these shortcomings, we present a novel similarity-constrained discriminative kernel dictionary learning method for action recognition. The effectiveness of the proposed method is verified on three benchmarks, and the experimental results demonstrate its promise for action recognition.
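
    The feature side of this pipeline can be sketched as follows, assuming per-frame feature vectors; the rank-r SVD truncation below is one simple reading of "low-rank feature", and the kernel dictionary learning stage is omitted.

      import numpy as np

      def lowrank_feature(block, r=5):
          # rank-r approximation of a (T, D) subsequence, flattened to a vector
          U, s, Vt = np.linalg.svd(block, full_matrices=False)
          return ((U[:, :r] * s[:r]) @ Vt[:r]).ravel()

      def concat_lowrank(video, n_subseq=4, r=5):
          # split in time order and concatenate per-subsequence features
          chunks = np.array_split(video, n_subseq, axis=0)
          return np.concatenate([lowrank_feature(c, r) for c in chunks])

      video = np.random.randn(64, 128)   # 64 frames, 128-dim frame features
      feature = concat_lowrank(video)    # temporal order is preserved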

  • Gradient-Flow Tensor Divergence Feature for Human Action Recognition

    Ngoc Nam BUI  Jin Young KIM  Hyoung-Gook KIM  

     
    LETTER-Vision

      Vol:
    E99-A No:1
      Page(s):
    437-440

    Current research in computer vision has increasingly targeted the recognition of human actions, owing to its potential utility in various applications. Among many approaches, combining Gaussian Mixture Model (GMM) supervectors with a Support Vector Machine (SVM) and a nonlinear GMM KL kernel has been shown to improve performance in recognizing human activities. In this study, based on tensor analysis, we develop and exploit an extended class of action features that we refer to as gradient-flow tensor divergence. The proposed method achieves a best recognition rate of 96.3% on the KTH dataset with reduced processing time.
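
    The GMM-supervector representation referenced above can be sketched with scikit-learn; this is a generic MAP mean-adaptation supervector, not the paper's tensor-divergence feature, and the relevance factor is illustrative.

      import numpy as np
      from sklearn.mixture import GaussianMixture

      def gmm_supervector(ubm, X, relevance=16.0):
          # MAP-adapt the UBM means to the descriptors X and stack them
          post = ubm.predict_proba(X)              # (N, K) responsibilities
          nk = post.sum(axis=0)                    # soft counts per component
          xk = post.T @ X                          # (K, D) weighted sums
          alpha = (nk / (nk + relevance))[:, None]
          mu = alpha * (xk / np.maximum(nk, 1e-8)[:, None]) \
               + (1.0 - alpha) * ubm.means_
          # scale by per-component std so distances approximate the KL kernel
          return (mu / np.sqrt(ubm.covariances_)).ravel()

      ubm = GaussianMixture(n_components=8, covariance_type='diag',
                            random_state=0)
      ubm.fit(np.random.randn(2000, 32))           # stand-in descriptor pool
      sv = gmm_supervector(ubm, np.random.randn(300, 32))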

  • Statistics on Temporal Changes of Sparse Coding Coefficients in Spatial Pyramids for Human Action Recognition

    Yang LI  Junyong YE  Tongqing WANG  Shijian HUANG  

     
    LETTER-Pattern Recognition

      Publicized:
    2015/06/01
      Vol:
    E98-D No:9
      Page(s):
    1711-1714

    Traditional sparse representation-based methods for human action recognition usually pool over the entire video to form the final feature representation, neglecting the spatio-temporal layout of features. To exploit this spatio-temporal information, we present a novel histogram representation computed from statistics on the frame-by-frame temporal changes of sparse coding coefficients within spatial pyramids constructed from videos. The histograms are then fed into a support vector machine with a spatial pyramid matching kernel for final action classification. We validate our method on two benchmarks, KTH and UCF Sports, and the experimental results show its effectiveness for human action recognition.
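
    The core descriptor can be sketched for a single pyramid cell; the dictionary below is random for illustration, and the spatial-pyramid partitioning and SPM-kernel SVM are omitted.

      import numpy as np
      from sklearn.decomposition import SparseCoder

      D = np.random.randn(64, 128)                 # 64 atoms, 128-dim features
      D /= np.linalg.norm(D, axis=1, keepdims=True)
      coder = SparseCoder(dictionary=D, transform_algorithm='lasso_lars',
                          transform_alpha=0.1)

      frames = np.random.randn(30, 128)            # one cell, frame by frame
      codes = coder.transform(frames)              # (30, 64) sparse coefficients
      deltas = np.abs(np.diff(codes, axis=0))      # frame-to-frame coefficient change
      hist = deltas.sum(axis=0)                    # per-atom temporal-change statistic
      hist /= np.linalg.norm(hist) + 1e-12         # normalized histogram feature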

  • Contextual Max Pooling for Human Action Recognition

    Zhong ZHANG  Shuang LIU  Xing MEI  

     
    LETTER-Image Recognition, Computer Vision

      Publicized:
    2015/01/19
      Vol:
    E98-D No:4
      Page(s):
    989-993

    The bag-of-words (BoW) model has been extensively adopted by recent human action recognition methods. The pooling operation, which aggregates local descriptor encodings into a single representation, is a key determiner of the performance of BoW-based methods. However, the spatio-temporal relationships among interest points have rarely been considered in the pooling step, which results in imprecise representations of human actions. In this paper, we propose a novel pooling strategy named contextual max pooling (CMP) to overcome this limitation. We add a constraint term to the objective function under the framework of max pooling, which forces the weights of interest points to be consistent with their probabilities. In this way, CMP explicitly considers the spatio-temporal contextual relationships among interest points while inheriting the positive properties of max pooling. Our method is verified on three challenging datasets (KTH, UCF Sports, and UCF Films), and the results demonstrate that it outperforms state-of-the-art methods for human action recognition.
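
    The idea can be rendered in toy form: write max pooling as a weighted sum with one-hot weights, then pull those weights toward the interest points' probabilities. This is an illustrative reading of the constraint, not the paper's exact objective; lam is a made-up blending parameter.

      import numpy as np

      def max_pool(codes):
          return codes.max(axis=0)                 # standard max pooling (N, K) -> (K,)

      def contextual_max_pool(codes, probs, lam=0.5):
          # max pooling as a weighted sum: one-hot weights on each word's argmax
          w = (codes == codes.max(axis=0, keepdims=True)).astype(float)
          # constraint term: pull weights toward the points' probabilities
          w = (1.0 - lam) * w + lam * probs[:, None]
          w /= w.sum(axis=0, keepdims=True)
          return (w * codes).sum(axis=0)

      codes = np.abs(np.random.randn(50, 64))      # 50 interest points, 64 words
      probs = np.random.dirichlet(np.ones(50))     # contextual probabilities
      pooled = contextual_max_pool(codes, probs)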

  • Topic-Based Knowledge Transfer Algorithm for Cross-View Action Recognition

    Changhong CHEN  Shunqing YANG  Zongliang GAN  

     
    LETTER-Pattern Recognition

      Vol:
    E97-D No:3
      Page(s):
    614-617

    Cross-view action recognition is a challenging research field in human motion analysis, since appearance-based features are not reliable when the viewpoint changes. In this paper, a new framework for cross-view action recognition is proposed based on topic-based knowledge transfer. First, spatio-temporal descriptors are extracted from the action videos, and each video is modeled as a bag of visual words (BoVW) over a codebook constructed by the k-means clustering algorithm. Second, Latent Dirichlet Allocation (LDA) is employed to assign topics to the BoVW representation, and the normalized topic distribution of visual words (ToVW) is taken as the feature vector. Third, to bridge different views, we transform ToVW into bilingual ToVW by constructing bilingual dictionaries, which guarantee that the same action has the same representation across views. We demonstrate the effectiveness of the proposed algorithm on the IXMAS multi-view dataset.
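
    The first two steps map directly onto scikit-learn; the sketch below builds the codebook, the BoVW counts, and the ToVW topic features, while the bilingual-dictionary transfer of the third step is not shown. All sizes are illustrative.

      import numpy as np
      from sklearn.cluster import KMeans
      from sklearn.decomposition import LatentDirichletAllocation

      descriptors = np.random.randn(5000, 96)      # pooled training descriptors
      km = KMeans(n_clusters=100, n_init=10, random_state=0).fit(descriptors)

      def bovw(video_desc):
          # histogram of visual-word occurrences for one video
          return np.bincount(km.predict(video_desc), minlength=100)

      counts = np.stack([bovw(np.random.randn(200, 96)) for _ in range(20)])
      lda = LatentDirichletAllocation(n_components=10, random_state=0).fit(counts)
      tovw = lda.transform(counts)                 # normalized topic distributions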

  • Selecting Effective and Discriminative Spatio-Temporal Interest Points for Recognizing Human Action

    Hongbo ZHANG  Shaozi LI  Songzhi SU  Shu-Yuan CHEN  

     
    PAPER-Image Processing and Video Processing

      Vol:
    E96-D No:8
      Page(s):
    1783-1792

    Many successful methods for recognizing human action are based on spatio-temporal interest points (STIPs). Given a test video sequence, a matching-based method with a voting mechanism lets each test STIP cast a vote for each action class based on its mutual information with respect to that class, measured in terms of class likelihood probability. Two issues must therefore be addressed to improve the accuracy of action recognition. First, effective STIPs in the training set must be selected as references for accurately estimating this probability. Second, discriminative STIPs in the test set must be selected for voting. This work uses ε-nearest neighbors as effective STIPs for estimating the class probability and a variance filter for selecting discriminative STIPs. Experimental results verify that the proposed method is more accurate than existing action recognition methods.
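
    The voting mechanism can be sketched as follows; k-nearest neighbors stand in for the paper's ε-nearest neighbors, and the variance threshold plays the role of the variance filter. All constants are illustrative.

      import numpy as np

      def vote(test_stips, train_stips, train_labels, n_classes,
               k=8, var_thresh=0.01):
          scores = np.zeros(n_classes)
          for x in test_stips:
              d = np.linalg.norm(train_stips - x, axis=1)
              nn = np.argsort(d)[:k]               # nearest reference STIPs
              # class likelihood estimated from neighbor label frequencies
              p = np.bincount(train_labels[nn], minlength=n_classes) / k
              if p.var() < var_thresh:             # variance filter: skip
                  continue                         # indiscriminative STIPs
              scores += np.log(p + 1e-8)           # accumulate the vote
          return int(scores.argmax())

      train = np.random.randn(500, 72)             # stand-in training STIPs
      labels = np.random.randint(0, 6, 500)
      pred = vote(np.random.randn(40, 72), train, labels, n_classes=6)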

  • A Vision-Based Emergency Response System with a Paramedic Mobile Robot

    Il-Woong JEONG  Jin CHOI  Kyusung CHO  Yong-Ho SEO  Hyun Seung YANG  

     
    PAPER

      Vol:
    E93-D No:7
      Page(s):
    1745-1753

    Detecting emergency situations is very important for surveillance systems that monitor people such as the elderly living alone. This paper presents a vision-based emergency response system with a paramedic mobile robot. The proposed system consists of a vision-based emergency detection system and a mobile robot acting as a paramedic. The detection system identifies emergencies by tracking people and recognizing their actions in image sequences acquired by a single surveillance camera. To recognize human actions, interest regions are segmented from the background using a blob extraction method and tracked continuously using a generic model. A Motion History Image (MHI) for each tracked person is then constructed from the silhouettes of the region blobs, and actions are modeled. The emergency situation is finally detected by feeding this information to a neural network. When an emergency is detected, the mobile robot can help diagnose the status of the person involved. To send the robot to the proper position, we implement a mobile-robot navigation algorithm based on the distance between the person and the robot. We validate our system by reporting the emergency detection rate and demonstrating emergency response with the mobile robot.
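
    The MHI construction is standard and easy to sketch from silhouette masks: recent motion is bright and older motion decays linearly. The history length tau is illustrative, and the tracking, blob extraction, and neural-network stages are omitted.

      import numpy as np

      def motion_history(silhouettes, tau=20):
          # silhouettes: (T, H, W) boolean masks of the tracked person
          H = np.zeros(silhouettes.shape[1:], dtype=np.float32)
          for mask in silhouettes:
              H = np.where(mask, float(tau), np.maximum(H - 1.0, 0.0))
          return H / tau                           # normalized MHI in [0, 1]

      masks = np.random.rand(30, 64, 64) > 0.95    # stand-in silhouette masks
      mhi = motion_history(masks)                  # fed downstream as a feature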