Hongbin SUO Ming LI Ping LU Yonghong YAN
Robust automatic language identification (LID) is the task of identifying the language from a short utterance spoken by an unknown speaker. The mainstream approaches include parallel phone recognition language modeling (PPRLM), support vector machines (SVMs) and general Gaussian mixture models (GMMs). These systems use classifiers to map the cepstral features of spoken utterances into high-level scores. In this paper, in order to increase the dimension of the score vector and alleviate the inter-speaker variability within the same language, multiple data groups based on supervised speaker clustering are employed to generate discriminative language characterization score vectors (DLCSV). Back-end SVM classifiers are used to model the probability distribution of each target language in the DLCSV space. Finally, the output scores of the back-end classifiers are calibrated by a pair-wise posterior probability estimation (PPPE) algorithm. The proposed language identification frameworks are evaluated on the 2003 NIST Language Recognition Evaluation (LRE) databases, and the experiments show that the system described in this paper produces results comparable to those of existing systems. In particular, the SVM framework achieves an equal error rate (EER) of 4.0% in the 30-second task and outperforms the state-of-the-art systems by more than 30% relative error reduction. In addition, the proposed PPRLM and GMM frameworks achieve EERs of 5.1% and 5.0%, respectively.
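The equal error rate quoted above is the operating point at which the miss rate and the false-alarm rate coincide. As a rough illustration only, the sketch below approximates it by sweeping a decision threshold over the observed scores. The function name `equal_error_rate` and the toy scores are assumptions made for this example; this is not the NIST scoring tool or the authors' evaluation code.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), None
    for th in thresholds:
        # Miss: a target-language trial scored below the threshold.
        p_miss = sum(s < th for s in target_scores) / len(target_scores)
        # False alarm: a non-target trial scored at or above the threshold.
        p_fa = sum(s >= th for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(p_miss - p_fa)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (p_miss + p_fa) / 2
    return eer


# Hypothetical toy scores, not taken from the paper.
targets = [0.9, 0.8, 0.7, 0.3]
nontargets = [0.6, 0.4, 0.2, 0.1]
print(equal_error_rate(targets, nontargets))
```

With few trials the two error curves may not cross exactly, so the sketch reports the average of the two rates at the closest threshold.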
Xiang ZHANG Ping LU Hongbin SUO Qingwei ZHAO Yonghong YAN
In this letter, a recently proposed clustering algorithm named affinity propagation is introduced for the task of speaker clustering. This novel algorithm exhibits fast execution speed and finds clusters with low error. However, experiments show that the speaker purity obtained with affinity propagation is not satisfactory. Thus, we propose a hybrid approach that combines affinity propagation with agglomerative hierarchical clustering to improve the clustering performance. Experiments show that, compared with traditional agglomerative hierarchical clustering, the hybrid method achieves better performance on the test corpora.
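To make the notion of speaker purity concrete, the sketch below computes an average speaker purity under one common definition from the speaker clustering literature: for each speaker, the sum of squared per-cluster segment counts divided by the squared total, averaged with weights proportional to the number of segments per speaker. Whether the letter uses exactly this definition, and whether it counts segments or frames, is an assumption; the function name and the toy labels are hypothetical.

```python
from collections import Counter, defaultdict


def average_speaker_purity(speaker_labels, cluster_labels):
    """Average speaker purity; 1.0 means each speaker sits in a single cluster."""
    # counts[speaker][cluster] = number of that speaker's segments in the cluster
    counts = defaultdict(Counter)
    for spk, clu in zip(speaker_labels, cluster_labels):
        counts[spk][clu] += 1
    total = len(speaker_labels)
    weighted = 0.0
    for per_cluster in counts.values():
        n_spk = sum(per_cluster.values())
        # Purity drops when a speaker's segments are spread over many clusters.
        purity = sum(n * n for n in per_cluster.values()) / (n_spk * n_spk)
        weighted += purity * n_spk
    return weighted / total


# Hypothetical labels: speaker "a" is split across two clusters.
speakers = ["a", "a", "a", "a", "b", "b"]
clusters = [0, 0, 1, 1, 2, 2]
print(average_speaker_purity(speakers, clusters))
```

A clustering that produces too many clusters tends to score low on this measure, which is consistent with the motivation for merging clusters in a later agglomerative stage.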
Ping LU Wenming ZHENG Ziyan WANG Qiang LI Yuan ZONG Minghai XIN Lenan WU
In this letter, a micro-expression recognition method is investigated by integrating both spatio-temporal facial features and a regression model. To this end, we first perform a multi-scale facial region division for each facial image and then extract a set of local binary patterns on three orthogonal planes (LBP-TOP) features corresponding to the divided facial regions of the micro-expression videos. Furthermore, we use a GSLSR model to build a linear regression relationship between the LBP-TOP facial feature vectors and the micro-expression label vectors. Finally, the learned GSLSR model is applied to predict the micro-expression category of each test micro-expression video. Experiments are conducted on both the CASME II and SMIC micro-expression databases to evaluate the performance of the proposed method, and the results demonstrate that the proposed method outperforms the baseline micro-expression recognition method.
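As a rough illustration of the LBP-TOP building block, the sketch below computes basic 8-neighbor LBP codes for a single voxel on the XY, XT and YT planes of a video stored as `video[t][y][x]`. The function names, the neighbor ordering and the `>=` comparison are assumptions chosen for this example. A full LBP-TOP descriptor would additionally histogram these codes per plane and per facial region and concatenate the histograms, which is omitted here, and the letter's exact settings are not reproduced.

```python
def lbp_code(patch):
    """8-bit LBP code of the center of a 3x3 patch (a list of three rows)."""
    center = patch[1][1]
    # Neighbors visited clockwise, starting from the top-left corner.
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(offsets):
        # Set the bit when the neighbor is at least as bright as the center.
        if patch[r][c] >= center:
            code |= 1 << bit
    return code


def lbp_top_codes(video, t, y, x):
    """LBP codes at voxel (t, y, x) on the XY, XT and YT planes."""
    xy = [[video[t][y + dy][x + dx] for dx in (-1, 0, 1)] for dy in (-1, 0, 1)]
    xt = [[video[t + dt][y][x + dx] for dx in (-1, 0, 1)] for dt in (-1, 0, 1)]
    yt = [[video[t + dt][y + dy][x] for dy in (-1, 0, 1)] for dt in (-1, 0, 1)]
    return lbp_code(xy), lbp_code(xt), lbp_code(yt)


# Hypothetical 3-frame, 3x3-pixel clip with constant intensity.
flat_clip = [[[5] * 3 for _ in range(3)] for _ in range(3)]
print(lbp_top_codes(flat_clip, 1, 1, 1))
```

The XY plane captures spatial texture, while the XT and YT planes capture how a row or column of pixels changes over time, which is what makes the descriptor spatio-temporal.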