Jingjie YAN Wenming ZHENG Minhai XIN Jingwei YAN
In this letter, we investigate the use of face and gesture image sequences for video-based bimodal emotion recognition, combining Harris plus cuboids spatio-temporal features (HST) with a sparse canonical correlation analysis (SCCA) fusion method. To extract the spatio-temporal features effectively, we adopt the Harris 3D feature detector proposed by Laptev and Lindeberg to find interest points in both the face and gesture videos, and then apply the cuboids feature descriptor to extract the facial expression and gesture emotion features [1],[2]. To further extract the emotion features common to the facial expression feature set and the gesture feature set, we apply the SCCA method and use the extracted features for bimodal emotion classification, with the K-nearest neighbor classifier and the SVM classifier each used in turn for this purpose. We test this method on the bimodal face and body gesture (FABO) database, and the experimental results show better recognition accuracy than the other methods compared.
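The fusion step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact SCCA formulation, so the code assumes a rank-1 variant that alternates soft-thresholded power iterations on the cross-covariance matrix. The toy data stand in for the HST features of the two modalities, and all function names, penalty values, and dimensions are hypothetical.

```python
import math
import random
from collections import Counter

def center(M):
    """Subtract each column's mean from a row-major matrix."""
    n = len(M)
    means = [sum(col) / n for col in zip(*M)]
    return [[x - m for x, m in zip(row, means)] for row in M]

def soft_threshold(vec, lam):
    """Shrink every entry toward zero by lam (the sparsity-inducing step)."""
    return [math.copysign(max(abs(x) - lam, 0.0), x) for x in vec]

def normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def sparse_cca_rank1(X, Y, lam_u=0.3, lam_v=0.3, iters=20):
    """Assumed SCCA variant: one pair of sparse projection vectors (u, v)."""
    Xc, Yc = center(X), center(Y)
    n, p, q = len(Xc), len(Xc[0]), len(Yc[0])
    # Cross-covariance between the two feature sets (p x q).
    C = [[sum(Xc[i][a] * Yc[i][b] for i in range(n)) / n for b in range(q)]
         for a in range(p)]
    u, v = [0.0] * p, normalize([1.0] * q)
    for _ in range(iters):
        u = normalize(soft_threshold(
            [sum(C[a][b] * v[b] for b in range(q)) for a in range(p)], lam_u))
        v = normalize(soft_threshold(
            [sum(C[a][b] * u[a] for a in range(p)) for b in range(q)], lam_v))
    return u, v

def fuse(x, y, u, v):
    """Project each modality onto its sparse direction and concatenate."""
    return [sum(a * b for a, b in zip(x, u)), sum(a * b for a, b in zip(y, v))]

def knn_predict(train_feats, train_labels, feat, k=3):
    """Majority vote among the k nearest training samples."""
    ranked = sorted(range(len(train_feats)),
                    key=lambda i: math.dist(train_feats[i], feat))
    return Counter(train_labels[i] for i in ranked[:k]).most_common(1)[0][0]

# Toy data (hypothetical): a shared latent variable z drives two "face"
# features and one "gesture" feature; the remaining features are noise.
random.seed(0)
def make_sample():
    z = random.gauss(0, 1)
    x = [z + 0.3 * random.gauss(0, 1), z + 0.3 * random.gauss(0, 1),
         random.gauss(0, 1), random.gauss(0, 1)]
    y = [z + 0.3 * random.gauss(0, 1), random.gauss(0, 1), random.gauss(0, 1)]
    return x, y, int(z > 0)

samples = [make_sample() for _ in range(400)]
train, test = samples[:300], samples[300:]
u, v = sparse_cca_rank1([s[0] for s in train], [s[1] for s in train])
train_feats = [fuse(x, y, u, v) for x, y, _ in train]
train_labels = [label for _, _, label in train]
correct = sum(knn_predict(train_feats, train_labels, fuse(x, y, u, v)) == label
              for x, y, label in test)
accuracy = correct / len(test)
print("u =", u, "v =", v, "accuracy =", accuracy)
```

On this toy data the soft-thresholding should drive the weights of the pure-noise features to zero, so that only the correlated features of each modality contribute to the fused representation. An SVM could replace the nearest-neighbor classifier in the last step without changing the fusion code.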