Di YANG Songjiang LI Zhou PENG Peng WANG Junhui WANG Huamin YANG
Accurate traffic flow prediction is a precondition for many applications in Intelligent Transportation Systems, such as traffic control and route guidance. Traditional data-driven traffic flow prediction models tend to ignore traffic self-features (e.g., periodicities) and commonly suffer from shifts caused by various complex factors (e.g., weather and holidays), which reduces their precision and robustness. To tackle this problem, we propose a CNN-based multi-feature predictive model (MF-CNN) that collectively predicts network-scale traffic flow from multiple spatiotemporal features and external factors (weather and holidays). Specifically, we classify traffic self-features into temporal continuity as a short-term feature and daily and weekly periodicity as long-term features, then map them to three two-dimensional spaces, each spanned by time and space and represented by a two-dimensional matrix. The high-level spatiotemporal features learned by CNNs from the matrices with different time lags are further fused with the external factors by a logistic regression layer to derive the final prediction. Experimental results indicate that the MF-CNN model, by considering multiple features, improves predictive performance compared with five baseline models and achieves a trade-off between accuracy and efficiency.
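The mapping of short-term and long-term features to time-space matrices might look roughly like the following. This is only a sketch under stated assumptions, not the authors' implementation: the function name `build_lag_matrices`, the `depth` parameter (number of rows per matrix), and the choice to sample one row per step, per day, and per week are all hypothetical, since the abstract does not specify the lags.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # rows = time steps, columns = road segments or sensors


def build_lag_matrices(
    flow: Matrix, t: int, depth: int, steps_per_day: int
) -> Tuple[Matrix, Matrix, Matrix]:
    """Build three time-by-space matrices for predicting time step t.

    Assumed layout (hypothetical): flow[i][j] is the flow at step i, sensor j.
    recent -> the `depth` steps immediately before t (temporal continuity)
    daily  -> the same time of day on the `depth` previous days
    weekly -> the same time of day and weekday in the `depth` previous weeks
    """
    steps_per_week = 7 * steps_per_day
    if t > len(flow) or t - depth * steps_per_week < 0:
        raise ValueError("not enough history for the requested lags")
    lags = range(depth, 0, -1)  # oldest row first
    recent = [flow[t - k] for k in lags]
    daily = [flow[t - k * steps_per_day] for k in lags]
    weekly = [flow[t - k * steps_per_week] for k in lags]
    return recent, daily, weekly


# Toy data: 30 time steps, 2 sensors, 2 steps per "day" (all values made up).
toy_flow = [[float(i), float(i) + 0.5] for i in range(30)]
recent, daily, weekly = build_lag_matrices(toy_flow, t=29, depth=2, steps_per_day=2)
print(recent, daily, weekly)
```

Each returned matrix has `depth` rows and one column per sensor, so it could be fed to a separate convolutional branch before the external factors are fused in.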
Ruicong ZHI Hairui XU Ming WAN Tingting LI
Facial micro-expressions are momentary and subtle facial reactions, and recognizing them automatically with high accuracy remains challenging in practical applications. Extracting spatiotemporal features from facial image sequences is essential for facial micro-expression recognition. In this paper, we employ 3D Convolutional Neural Networks (3D-CNNs) for self-learned feature extraction to represent facial micro-expressions effectively, since 3D-CNNs can extract spatiotemporal features from facial image sequences well. Moreover, transfer learning is used to deal with the problem of insufficient samples in facial micro-expression databases. We first pre-trained the 3D-CNNs on the normal facial expression database Oulu-CASIA by supervised learning, and then transferred the pre-trained model to the target domain, the facial micro-expression recognition task. The proposed method was evaluated on two available facial micro-expression datasets, CASME II and SMIC-HS. We obtained an overall accuracy of 97.6% on CASME II and 97.4% on SMIC, which were 3.4% and 1.6% higher, respectively, than the 3D-CNN model without transfer learning. The experimental results demonstrate that our method achieves superior performance compared with state-of-the-art methods.
Truc Hung NGO Yen-Wei CHEN Naoki MATSUSHIRO Masataka SEO
Facial paralysis is a common clinical condition occurring in 30 to 40 patients per 100,000 people per year. A quantitative tool to support medical diagnosis is necessary. This paper proposes a simple, visual and robust method that objectively measures the degree of facial paralysis using spatiotemporal features. The main contribution of this paper is an effective spatiotemporal feature extraction method based on the tracking of landmarks. Our method overcomes the drawbacks of other techniques, such as the influence of irrelevant regions, noise, illumination changes and time-consuming processing. In addition, the method is simple and visual. Its simplicity helps to reduce processing time. Also, the movements of landmarks, which relate to muscle movement ability, can be visualized, and this visualization helps reveal regions of serious facial paralysis. In terms of recognition rate, experimental results show that our proposed method outperformed the other techniques tested on a dynamic facial expression image database.
Jingjie YAN Wenming ZHENG Minhai XIN Jingwei YAN
In this letter, we study the use of face and gesture image sequences for video-based bimodal emotion recognition, applying both the Harris plus cuboids spatio-temporal feature (HST) and a sparse canonical correlation analysis (SCCA) fusion method. To extract the spatio-temporal features effectively, we adopt the Harris 3D feature detector proposed by Laptev and Lindeberg to find interest points in both face and gesture videos, and then apply the cuboids feature descriptor to extract the facial expression and gesture emotion features [1],[2]. To further extract the common emotion features from the facial expression feature set and the gesture feature set, the SCCA method is applied, and the extracted emotion features are used for bimodal emotion classification with the K-nearest neighbor classifier and the SVM classifier, respectively. We test this method on the bimodal face and body gesture (FABO) database, and the experimental results demonstrate better recognition accuracy compared with other methods.
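The final classification step can be illustrated with a plain K-nearest neighbor vote over already fused feature vectors. This is a minimal sketch under stated assumptions: the Euclidean distance, the value of `k`, the function name `knn_predict`, and the toy feature vectors and labels are all hypothetical, and the Harris 3D, cuboids and SCCA stages are not reproduced here.

```python
import math
from collections import Counter
from typing import List, Sequence


def knn_predict(
    train_x: Sequence[Sequence[float]],
    train_y: Sequence[str],
    query: Sequence[float],
    k: int = 3,
) -> str:
    """Return the majority label among the k training vectors nearest to query."""
    if len(train_x) != len(train_y):
        raise ValueError("train_x and train_y must have the same length")
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training samples")
    # Rank training samples by Euclidean distance to the query vector.
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy fused features (made up): two well-separated emotion classes.
features: List[List[float]] = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
labels = ["happiness", "happiness", "happiness", "anger", "anger", "anger"]
print(knn_predict(features, labels, [0.2, 0.1], k=3))
```

In practice the vectors passed in would be the projections produced by the fusion stage, and `k` would be chosen by validation.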
Wen ZHOU Chunheng WANG Baihua XIAO Zhong ZHANG Yunxue SHAO
Recognizing human action in complex scenes is a challenging problem in computer vision. Some action-unrelated concepts, such as camera position, can significantly affect the appearance of local spatio-temporal features, and the performance of methods based on low-level features degrades as a result. In this letter, we define an action-unrelated concept, the position of the camera, as a high-level feature. We observe that it can serve as a prior to local spatio-temporal features for human action recognition. We encode this prior by modeling interactions between spatio-temporal features and camera position features. We infer camera position features from local spatio-temporal features via these interactions. The parameters of this model are estimated by a new max-margin algorithm. We evaluate the proposed method on the KTH, IXMAS and YouTube action datasets. Experimental results show the effectiveness of the proposed method.
Yotaro KUBO Shigeki OKAWA Akira KUREMATSU Katsuhiko SHIRAI
We have attempted to recognize reverberant speech using a novel speech recognition system that depends not only on the spectral envelope and amplitude modulation but also on frequency modulation. Most of the features used by modern speech recognition systems, such as MFCC, PLP, and TRAPS, are derived from the energy envelopes of narrowband signals by discarding the information in the carrier signals. However, some experiments show that apart from the spectral/time envelope and its modulation, the information on the zero-crossing points of the carrier signals also plays a significant role in human speech recognition. In realistic environments, a feature that depends on limited properties of the signal may easily be corrupted. In order to use an automatic speech recognizer in an unknown environment, it is important to combine information obtained from other signal properties to minimize the effects of the environment. In this paper, we propose a method to analyze the carrier signals that are discarded in most speech recognition systems. Our system consists of two nonlinear discriminant analyzers that use multilayer perceptrons. One of the nonlinear discriminant analyzers is HATS, which can capture the amplitude modulation of narrowband signals efficiently. The other is a pseudo-instantaneous frequency analyzer proposed in this paper, which can capture the frequency modulation of narrowband signals efficiently. The two analyzers are combined by the method based on the entropy of the feature introduced by Okawa et al. In Sect. 2, we first introduce pseudo-instantaneous frequencies to capture a property of the carrier signal. The previous AM analysis method is described in Sect. 3. The proposed system is described in Sect. 4. The experimental setup is presented in Sect. 5, and the results are discussed in Sect. 6.
We evaluate the performance of the proposed method on continuous digit recognition of reverberant speech. The proposed system exhibits considerable improvement over the MFCC feature extraction system.
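To illustrate why zero-crossing points of a carrier carry frequency information, the sketch below estimates the frequency of a narrowband signal from its zero-crossing count. It is a generic zero-crossing estimate, not the pseudo-instantaneous frequency analyzer proposed in the paper, whose definition is not given in the abstract; the function name `zero_crossing_frequency`, the sample rate, and the test tone are assumptions chosen for the example.

```python
import math
from typing import Sequence


def zero_crossing_frequency(samples: Sequence[float], sample_rate: float) -> float:
    """Estimate the frequency (Hz) of a narrowband signal from its sign changes.

    A sinusoid crosses zero twice per period, so
    frequency ~= crossings / (2 * duration).
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    crossings = 0
    for prev, cur in zip(samples, samples[1:]):
        # A crossing occurs when consecutive samples differ in sign
        # (zero is treated as non-negative).
        if (prev < 0) != (cur < 0):
            crossings += 1
    duration = (len(samples) - 1) / sample_rate  # seconds spanned by the samples
    return crossings / (2.0 * duration)


# Hypothetical carrier: a 100 Hz tone sampled at 8 kHz for one second,
# with a small phase offset so that no sample falls exactly on zero.
rate = 8000.0
tone = [math.sin(2 * math.pi * 100.0 * n / rate + 0.3) for n in range(8001)]
print(zero_crossing_frequency(tone, rate))  # close to 100 Hz
```

Applying such an estimate over short sliding windows of each narrowband channel would give a frequency trajectory over time, which is the kind of carrier information that envelope-based features discard.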