1-7hit |
Jingjie YAN Wenming ZHENG Minghai XIN Jingwei YAN
In this letter, a new sparse locality preserving projection (SLPP) algorithm is developed and applied to facial expression recognition. In comparison with the original locality preserving projection (LPP) algorithm, the presented SLPP algorithm is able to simultaneously find the intrinsic manifold of facial feature vectors and deal with facial feature selection. This is realized by the use of l1-norm regularization in the LPP objective function, which is directly formulated as a least squares regression pattern. We use two real facial expression databases (JAFFE and Ekman's POFA) to testify the proposed SLPP method and certain experiments show that the proposed SLPP approach respectively gains 77.60% and 82.29% on JAFFE and POFA database.
Ping LU Wenming ZHENG Ziyan WANG Qiang LI Yuan ZONG Minghai XIN Lenan WU
In this letter, a micro-expression recognition method is investigated by integrating both spatio-temporal facial features and a regression model. To this end, we first perform a multi-scale facial region division for each facial image and then extract a set of local binary patterns on three orthogonal planes (LBP-TOP) features corresponding to divided facial regions of the micro-expression videos. Furthermore, we use GSLSR model to build the linear regression relationship between the LBP-TOP facial feature vectors and the micro expressions label vectors. Finally, the learned GSLSR model is applied to the prediction of the micro-expression categories for each test micro-expression video. Experiments are conducted on both CASME II and SMIC micro-expression databases to evaluate the performance of the proposed method, and the results demonstrate that the proposed method is better than the baseline micro-expression recognition method.
Peng SONG Yun JIN Li ZHAO Minghai XIN
A major challenge for speech emotion recognition is that when the training and deployment conditions do not use the same speech corpus, the recognition rates will obviously drop. Transfer learning, which has successfully addressed the cross-domain classification or recognition problem, is presented for cross-corpus speech emotion recognition. First, by using the maximum mean discrepancy embedding (MMDE) optimization and dimension reduction algorithms, two close low-dimensional feature spaces are obtained for source and target speech corpora, respectively. Then, a classifier function is trained using the learned low-dimensional features in the labeled source corpus, and directly applied to the unlabeled target corpus for emotion label recognition. Experimental results demonstrate that the transfer learning method can significantly outperform the traditional automatic recognition technique for cross-corpus speech emotion recognition.
Yun JIN Peng SONG Wenming ZHENG Li ZHAO Minghai XIN
In this paper, a two-layer Multiple Kernel Learning (MKL) scheme for speaker-independent speech emotion recognition is presented. In the first layer, MKL is used for feature selection. The training samples are separated into n groups according to some rules. All groups are used for feature selection to obtain n sparse feature subsets. The intersection and the union of all feature subsets are the result of our feature selection methods. In the second layer, MKL is used again for speech emotion classification with the selected features. In order to evaluate the effectiveness of our proposed two-layer MKL scheme, we compare it with state-of-the-art results. It is shown that our scheme results in large gain in performance. Furthermore, another experiment is carried out to compare our feature selection method with other popular ones. And the result proves the effectiveness of our feature selection method.
Peng SONG Wenming ZHENG Xinran ZHANG Yun JIN Cheng ZHA Minghai XIN
Most of the current voice conversion methods are conducted based on parallel speech, which is not easily obtained in practice. In this letter, a novel iterative speaker model alignment (ISMA) method is proposed to address this problem. First, the source and target speaker models are each trained from the background model by adopting maximum a posteriori (MAP) algorithm. Then, a novel ISMA method is presented for alignment and transformation of spectral features. Finally, the proposed ISMA approach is further combined with a Gaussian mixture model (GMM) to improve the conversion performance. A series of objective and subjective experiments are carried out on CMU ARCTIC dataset, and the results demonstrate that the proposed method significantly outperforms the state-of-the-art approach.
Hao WANG Li ZHAO Wenjiang PEI Jiakuo ZUO Qingyun WANG Minghai XIN
The optimal design of an extrapolated impulse response (EIR) filter (in the mini-max sense) is a non-linear programming problem. In this paper, the optimal design of the EIR filter by the semi-infinite programming (SIP) is investigated and an iterative technique for optimally designing the EIR filter is proposed. The simulation experiment validates the effectiveness of the SIP technique and the proposed iterative technique in the optimal design of the EIR filter.
Chen WU Yifeng ZHANG Yuhui SHI Li ZHAO Minghai XIN
Recently, design of sparse finite impulse response (FIR) digital filters has attracted much attention due to its ability to reduce the implementation cost. However, finding a filter with the fewest number of nonzero coefficients subject to prescribed frequency domain constraints is a rather difficult problem because of its non-convexity. In this paper, an algorithm based on binary particle swarm optimization (BPSO) is proposed, which successively thins the filter coefficients until no sparser solution can be obtained. The proposed algorithm is evaluated on a set of examples, and better results can be achieved than other existing algorithms.