Author Search Result

[Author] Peng SONG (16 hits)

1-16 of 16 hits
  • A Joint Convolutional Bidirectional LSTM Framework for Facial Expression Recognition

    Jingwei YAN  Wenming ZHENG  Zhen CUI  Peng SONG  

     
    LETTER-Biocybernetics, Neurocomputing

      Publicized:
    2018/01/11
      Vol:
    E101-D No:4
      Page(s):
    1217-1220

    Facial expressions are generated by the actions of the facial muscles located in different facial regions. The spatial dependencies between different facial regions are worth exploring and can improve the performance of facial expression recognition. In this letter, we propose a joint convolutional bidirectional long short-term memory (JCBLSTM) framework to jointly model discriminative facial textures and the spatial relations between different regions. We treat each row or column of the feature maps output by the CNN as an individual ordered sequence and employ LSTM to model the spatial dependencies within it. Moreover, a shortcut connection for the convolutional feature maps is introduced for joint feature representation. We conduct experiments on two databases to evaluate the proposed JCBLSTM method. The experimental results demonstrate that JCBLSTM achieves state-of-the-art performance on Multi-PIE and a very competitive result on FER-2013.
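
    As a minimal sketch of the row-as-sequence idea (an illustration assuming PyTorch, not the authors' implementation), each row of a CNN feature map of shape (batch, channels, height, width) can be flattened into a sequence of per-column channel vectors and fed to a bidirectional LSTM:

        import torch
        import torch.nn as nn

        class RowBiLSTM(nn.Module):
            """Treat each row of a CNN feature map as an ordered sequence of
            per-column channel vectors and model it with a bidirectional LSTM."""
            def __init__(self, channels, hidden_size):
                super().__init__()
                self.lstm = nn.LSTM(input_size=channels, hidden_size=hidden_size,
                                    batch_first=True, bidirectional=True)

            def forward(self, fmap):                  # fmap: (B, C, H, W)
                b, c, h, w = fmap.shape
                rows = fmap.permute(0, 2, 3, 1)       # (B, H, W, C)
                rows = rows.reshape(b * h, w, c)      # each row is a length-W sequence
                out, _ = self.lstm(rows)              # (B*H, W, 2*hidden_size)
                return out.reshape(b, h, w, -1)       # restore the spatial layout

        # toy usage: a 64-channel 7x7 feature map from two images
        feat = torch.randn(2, 64, 7, 7)
        print(RowBiLSTM(64, 32)(feat).shape)          # torch.Size([2, 7, 7, 64])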

  • Transfer Semi-Supervised Non-Negative Matrix Factorization for Speech Emotion Recognition

    Peng SONG  Shifeng OU  Xinran ZHANG  Yun JIN  Wenming ZHENG  Jinglei LIU  Yanwei YU  

     
    LETTER-Speech and Hearing

      Publicized:
    2016/07/01
      Vol:
    E99-D No:10
      Page(s):
    2647-2650

    In practice, emotional speech utterances are often collected from different devices or under different conditions, which leads to a discrepancy between the training and testing data and results in a sharp decrease in recognition rates. To solve this problem, in this letter, a novel transfer semi-supervised non-negative matrix factorization (TSNMF) method is presented. A semi-supervised non-negative matrix factorization (SNMF) algorithm, utilizing both labeled source and unlabeled target data, is adopted to learn common feature representations. Meanwhile, the maximum mean discrepancy (MMD) is employed as a similarity measurement to reduce the distance between the feature distributions of the two databases. Finally, the TSNMF algorithm, which optimizes the SNMF and MMD functions together, is proposed to obtain robust feature representations across databases. Extensive experiments demonstrate that, in comparison to state-of-the-art approaches, our proposed method can significantly improve cross-corpus recognition rates.
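
    For illustration only (a generic empirical estimate, not the paper's exact formulation), the squared maximum mean discrepancy between source features Xs and target features Xt can be computed with a Gaussian kernel as follows:

        import numpy as np

        def rbf_kernel(a, b, gamma=1.0):
            # pairwise squared distances, then the Gaussian (RBF) kernel
            d2 = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2 * a @ b.T
            return np.exp(-gamma * d2)

        def mmd2(xs, xt, gamma=1.0):
            """Biased empirical estimate of squared MMD (rows are samples)."""
            return (rbf_kernel(xs, xs, gamma).mean()
                    + rbf_kernel(xt, xt, gamma).mean()
                    - 2.0 * rbf_kernel(xs, xt, gamma).mean())

        xs = np.random.randn(100, 40)          # e.g., 40-dim source-corpus features
        xt = np.random.randn(80, 40) + 0.5     # shifted target-corpus features
        print(mmd2(xs, xt))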

  • Shared Latent Embedding Learning for Multi-View Subspace Clustering

    Zhaohu LIU  Peng SONG  Jinshuai MU  Wenming ZHENG  

     
    LETTER-Artificial Intelligence, Data Mining

      Publicized:
    2023/10/17
      Vol:
    E107-D No:1
      Page(s):
    148-152

    Most existing multi-view subspace clustering approaches only capture the inter-view similarities between different views and ignore the optimal local geometric structure of the original data. To this end, in this letter, we put forward a novel method named shared latent embedding learning for multi-view subspace clustering (SLE-MSC), which can efficiently learn a better latent space. To be specific, we introduce a pseudo-label constraint to capture the intra-view similarities within each view. Meanwhile, we utilize a novel optimal graph Laplacian to learn the consistent latent representation, in which the common manifold is considered as the optimal manifold to obtain a more reasonable local geometric structure. Comprehensive experimental results indicate the superiority and effectiveness of the proposed method.
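
    For background (the standard form of such a regularizer, not SLE-MSC's exact objective), a graph-Laplacian term over a latent representation Z whose rows z_i are the embedded samples, with affinity matrix W, can be written as

        \frac{1}{2}\sum_{i,j} W_{ij}\,\lVert z_i - z_j \rVert_2^2 \;=\; \operatorname{tr}\!\left(Z^{\top} L Z\right),
        \qquad L = D - W, \quad D_{ii} = \sum_{j} W_{ij},

    so that samples connected by large affinities are pulled close together in the latent space.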

  • Robust Transferable Subspace Learning for Cross-Corpus Facial Expression Recognition

    Dongliang CHEN  Peng SONG  Wenjing ZHANG  Weijian ZHANG  Bingui XU  Xuan ZHOU  

     
    LETTER-Pattern Recognition

      Publicized:
    2020/07/20
      Vol:
    E103-D No:10
      Page(s):
    2241-2245

    In this letter, we propose a novel robust transferable subspace learning (RTSL) method for cross-corpus facial expression recognition. In this method, on the one hand, we present a novel distance metric algorithm, which jointly considers the local and global distance distribution measures, to reduce the cross-corpus mismatch. On the other hand, we design a label guidance strategy to improve the discriminative ability of the subspace. Thus, RTSL is much more robust to the cross-corpus recognition problem than traditional transfer learning methods. We conduct extensive experiments on several facial expression corpora to evaluate the recognition performance of RTSL. The results demonstrate the superiority of the proposed method over some state-of-the-art methods.

  • An Immersive VR System for Sports Education

    Peng SONG  Shuhong XU  Wee Teck FONG  Ching Ling CHIN  Gim Guan CHUA  Zhiyong HUANG  

     
    PAPER-Signal Processing

      Vol:
    E95-D No:5
      Page(s):
    1324-1331

    The development of new technologies has undoubtedly promoted the advances of modern education, among which Virtual Reality (VR) technologies have made education more visually accessible to students. However, classroom education has been the focus of VR applications, whereas little research has been done on promoting sports education with VR technologies. In this paper, an immersive VR system is designed and implemented to create a more intuitive and visual way of teaching tennis. A scalable system architecture is proposed in addition to the hardware setup layout, which can be used for various immersive interactive applications such as architecture walkthroughs, military training simulations, other sports game simulations, interactive theaters, and telepresent exhibitions. A realistic interaction experience is achieved through accurate and robust hybrid tracking technology, while the virtual human opponent is animated in real time using shader-based skin deformation. Potential future extensions are also discussed to improve the teaching/learning experience.

  • A Novel Adaptive Weighted Transfer Subspace Learning Method for Cross-Database Speech Emotion Recognition

    Keke ZHAO  Peng SONG  Shaokai LI  Wenjing ZHANG  Wenming ZHENG  

     
    LETTER-Speech and Hearing

      Publicized:
    2022/06/09
      Vol:
    E105-D No:9
      Page(s):
    1643-1646

    In this letter, we present an adaptive weighted transfer subspace learning (AWTSL) method for cross-database speech emotion recognition (SER), which can efficiently eliminate the discrepancy between source and target databases. Specifically, on one hand, a subspace projection matrix is first learned to project the cross-database features into a common subspace. At the same time, each target sample can be represented by the source samples by using a sparse reconstruction matrix. On the other hand, we design an adaptive weighted matrix learning strategy, which can improve the reconstruction contribution of important features and eliminate the negative influence of redundant features. Finally, we conduct extensive experiments on four benchmark databases, and the experimental results demonstrate the efficacy of the proposed method.

  • Transform Electric Power Curve into Dynamometer Diagram Image Using Deep Recurrent Neural Network

    Junfeng SHI  Wenming MA  Peng SONG  

     
    LETTER-Artificial Intelligence, Data Mining

      Publicized:
    2018/05/09
      Vol:
    E101-D No:8
      Page(s):
    2154-2158

    To learn the working situation of rod-pumped wells underground, we need to analyze dynamometer diagrams, which are generated from the load sensor and the displacement sensor. Rod-pumped wells are usually located in places with extreme weather, and these sensors are installed on special oilfield equipment in the open air. As time goes by, the sensors are prone to generating unstable and incorrect data. Unfortunately, load sensors are too expensive to reinstall frequently. Therefore, the resulting dynamometer diagrams sometimes cannot support an accurate diagnosis. Instead, as an indispensable component of the rod-pumped well, the electric motor has a much longer life and is not easily affected by the weather. The electric power curve during a swabbing period can also reflect the working situation underground, but it is much harder to interpret than the dynamometer diagram. This letter presents a novel deep learning architecture that can transform the electric power curve into a dimensionless dynamometer diagram image. We conduct our experiments on a real-world dataset, and the results show that our method achieves impressive transformation accuracy.
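
    A heavily simplified sketch of one possible curve-to-image mapping (purely illustrative; the class name, layer sizes, and architecture below are assumptions, not the letter's actual network), assuming PyTorch, a power curve sampled at T points, and a 64x64 output diagram:

        import torch
        import torch.nn as nn

        class CurveToImage(nn.Module):
            """Encode a 1-D electric power curve with a GRU, then decode the final
            hidden state into a fixed-size dynamometer-diagram image."""
            def __init__(self, hidden=128, img_size=64):
                super().__init__()
                self.rnn = nn.GRU(input_size=1, hidden_size=hidden, batch_first=True)
                self.decoder = nn.Sequential(
                    nn.Linear(hidden, img_size * img_size),
                    nn.Sigmoid(),                         # pixel intensities in [0, 1]
                )
                self.img_size = img_size

            def forward(self, curve):                     # curve: (B, T)
                _, h = self.rnn(curve.unsqueeze(-1))      # h: (1, B, hidden)
                img = self.decoder(h.squeeze(0))
                return img.view(-1, 1, self.img_size, self.img_size)

        print(CurveToImage()(torch.randn(4, 200)).shape)  # torch.Size([4, 1, 64, 64])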

  • A Novel Transferable Sparse Regression Method for Cross-Database Facial Expression Recognition

    Wenjing ZHANG  Peng SONG  Wenming ZHENG  

     
    LETTER-Image Recognition, Computer Vision

      Publicized:
    2021/10/12
      Vol:
    E105-D No:1
      Page(s):
    184-188

    In this letter, we propose a novel transferable sparse regression (TSR) method for cross-database facial expression recognition (FER). In TSR, we first present a novel regression function to regress the data into a latent representation space instead of a strict binary label space. To further alleviate the influence of outliers and overfitting, we impose a row-sparsity constraint on the regression term, and a pairwise relation term is introduced to guide the feature transfer learning. Second, we design a global graph to transfer knowledge, which can well preserve the cross-database manifold structure. Moreover, we introduce a low-rank constraint on the graph regularization term to uncover additional structural information. Finally, several experiments are conducted on three popular facial expression databases, and the results validate that the proposed TSR method is superior to other non-deep and deep transfer learning methods.
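
    For reference (a standard definition, not TSR-specific), low-rank constraints of this kind are usually relaxed to the nuclear norm, the convex surrogate of matrix rank,

        \lVert A \rVert_{*} \;=\; \sum_{i} \sigma_{i}(A),

    i.e., the sum of the singular values of A, so that penalizing it encourages a low-rank structure in the constrained term.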

  • Unsupervised Cross-Database Micro-Expression Recognition Using Target-Adapted Least-Squares Regression

    Lingyan LI  Xiaoyan ZHOU  Yuan ZONG  Wenming ZHENG  Xiuzhen CHEN  Jingang SHI  Peng SONG  

     
    LETTER-Pattern Recognition

      Publicized:
    2019/03/26
      Vol:
    E102-D No:7
      Page(s):
    1417-1421

    Over the past several years, research on micro-expression recognition (MER) has become an active topic in affective computing and computer vision because of its potential value in many application fields, e.g., lie detection. However, most previous works assumed an ideal scenario in which both training and testing samples belong to the same micro-expression database, an assumption that is easily broken in practice. In this letter, we hence consider a more challenging scenario in which the training and testing samples come from different micro-expression databases, and investigate unsupervised cross-database MER, where the source database is labeled while the label information of the target database is entirely unseen. To solve this interesting problem, we propose an effective method called target-adapted least-squares regression (TALSR). The basic idea of TALSR is to learn a regression coefficient matrix based on the source samples and their provided label information, and also to enable this learned regression coefficient matrix to suit the target micro-expression database. We are thus able to use the learned regression coefficient matrix to predict the micro-expression categories of the target micro-expression samples. Extensive experiments on the CASME II and SMIC micro-expression databases are conducted to evaluate the proposed TALSR. The experimental results show that our TALSR performs better than many recent well-performing domain adaptation methods in dealing with unsupervised cross-database MER tasks.
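
    For background (the plain least-squares regression form only; TALSR's target-adaptation terms are not reproduced here), learning a regression coefficient matrix C from a source feature matrix X_s and label indicator matrix Y_s, and applying it to target features X_t, typically takes the form

        \min_{C}\; \lVert X_s^{\top} C - Y_s \rVert_F^{2} + \lambda \lVert C \rVert_F^{2},
        \qquad \hat{Y}_t = X_t^{\top} C,

    where the predicted category of a target sample is the column index of the largest entry in its row of \hat{Y}_t.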

  • Speech Emotion Recognition Based on Sparse Transfer Learning Method

    Peng SONG  Wenming ZHENG  Ruiyu LIANG  

     
    LETTER-Speech and Hearing

      Publicized:
    2015/04/10
      Vol:
    E98-D No:7
      Page(s):
    1409-1412

    In traditional speech emotion recognition systems, when the training and testing utterances are obtained from different corpora, the recognition rates decrease dramatically. To tackle this problem, in this letter, inspired by recent developments in sparse coding and transfer learning, a novel sparse transfer learning method is presented for speech emotion recognition. First, a sparse coding algorithm is employed to learn a robust sparse representation of emotional features. Then, a novel sparse transfer learning approach is presented, where the distance between the feature distributions of the source and target datasets is considered and used to regularize the objective function of sparse coding. The experimental results demonstrate that, compared with the automatic recognition approach, the proposed method achieves promising improvements in recognition rates and significantly outperforms the classic dimension-reduction-based transfer learning approach.
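
    In schematic form (a generic sketch of this family of objectives, not the letter's exact formulation), sparse coding over a shared dictionary D with codes A_s and A_t for the source and target sets, regularized by a distribution-distance term, can be written as

        \min_{D,\,A_s,\,A_t}\; \lVert X_s - D A_s \rVert_F^{2} + \lVert X_t - D A_t \rVert_F^{2}
        + \lambda\left(\lVert A_s \rVert_{1} + \lVert A_t \rVert_{1}\right)
        + \beta\, \mathrm{dist}(A_s, A_t),

    where dist(·,·) measures the discrepancy between the two code distributions and β trades reconstruction quality against cross-corpus alignment.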

  • Speech Emotion Recognition Using Transfer Learning

    Peng SONG  Yun JIN  Li ZHAO  Minghai XIN  

     
    LETTER-Speech and Hearing

      Vol:
    E97-D No:9
      Page(s):
    2530-2532

    A major challenge for speech emotion recognition is that when the training and deployment conditions do not use the same speech corpus, the recognition rates drop noticeably. Transfer learning, which has successfully addressed cross-domain classification and recognition problems, is applied here to cross-corpus speech emotion recognition. First, by using the maximum mean discrepancy embedding (MMDE) optimization and dimension reduction algorithms, two close low-dimensional feature spaces are obtained for the source and target speech corpora, respectively. Then, a classifier function is trained using the learned low-dimensional features in the labeled source corpus and directly applied to the unlabeled target corpus for emotion label recognition. Experimental results demonstrate that the transfer learning method can significantly outperform the traditional automatic recognition technique for cross-corpus speech emotion recognition.

  • A Novel Discriminative Virtual Label Regression Method for Unsupervised Feature Selection

    Zihao SONG  Peng SONG  Chao SHENG  Wenming ZHENG  Wenjing ZHANG  Shaokai LI  

     
    LETTER-Pattern Recognition

      Publicized:
    2021/10/19
      Vol:
    E105-D No:1
      Page(s):
    175-179

    Unsupervised feature selection is an important dimensionality reduction technique for coping with high-dimensional data. It does not require prior label information and has recently attracted much attention. However, it cannot fully utilize the discriminative information of samples, which may affect the feature selection performance. To tackle this problem, in this letter, we propose a novel discriminative virtual label regression method (DVLR) for unsupervised feature selection. In DVLR, we develop a virtual label regression function to guide the subspace-learning-based feature selection, which can select more discriminative features. Moreover, a linear discriminant analysis (LDA) term is used to make the model more discriminative. To further make the model more robust and select more representative features, we impose the ℓ2,1-norm on the regression and feature selection terms. Finally, extensive experiments are carried out on several public datasets, and the results demonstrate that our proposed DVLR achieves better performance than several state-of-the-art unsupervised feature selection methods.
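
    For reference (the standard definition, not anything specific to DVLR), the ℓ2,1-norm of a matrix W with rows w_i is

        \lVert W \rVert_{2,1} \;=\; \sum_{i} \lVert w_{i} \rVert_{2} \;=\; \sum_{i}\sqrt{\sum_{j} W_{ij}^{2}},

    i.e., an ℓ1-norm over the ℓ2-norms of the rows; minimizing it drives entire rows toward zero, which is why it is commonly used to induce row sparsity in regression and feature selection terms.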

  • Micro-Expression Recognition by Leveraging Color Space Information

    Minghao TANG  Yuan ZONG  Wenming ZHENG  Jisheng DAI  Jingang SHI  Peng SONG  

     
    LETTER-Image Recognition, Computer Vision

      Publicized:
    2019/03/13
      Vol:
    E102-D No:6
      Page(s):
    1222-1226

    Micro-expressions are a special type of facial expression that usually occurs when people try to hide their true emotions. Therefore, recognizing micro-expressions has potential value in many applications, e.g., lie detection. In this letter, we focus on this meaningful topic and investigate how to take full advantage of the color information provided by micro-expression samples to deal with the micro-expression recognition (MER) problem. To this end, we propose a novel method called the color space fusion learning (CSFL) model to fuse the spatiotemporal features extracted in different color spaces, such that the fused spatiotemporal features are better at describing micro-expressions. To verify the effectiveness of the proposed CSFL method, extensive MER experiments are conducted on SMIC, a widely used spatiotemporal micro-expression database. The experimental results show that CSFL can significantly improve the performance of spatiotemporal features in coping with MER tasks.
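
    As a small illustrative sketch (assuming OpenCV; the specific color spaces and stacking scheme below are an example, not the CSFL configuration), frames can be represented in several color spaces before per-color-space feature extraction and fusion:

        import cv2
        import numpy as np

        def multi_color_space(frame_bgr):
            """Stack the same frame in BGR, YCrCb, and HSV so that spatiotemporal
            features can later be extracted per color space and fused."""
            ycrcb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YCrCb)
            hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
            return np.concatenate([frame_bgr, ycrcb, hsv], axis=-1)   # (H, W, 9)

        frame = np.random.randint(0, 256, (128, 128, 3), dtype=np.uint8)
        print(multi_color_space(frame).shape)                         # (128, 128, 9)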

  • Speaker-Independent Speech Emotion Recognition Based on Two-Layer Multiple Kernel Learning

    Yun JIN  Peng SONG  Wenming ZHENG  Li ZHAO  Minghai XIN  

     
    LETTER-Speech and Hearing

      Vol:
    E96-D No:10
      Page(s):
    2286-2289

    In this paper, a two-layer multiple kernel learning (MKL) scheme for speaker-independent speech emotion recognition is presented. In the first layer, MKL is used for feature selection. The training samples are separated into n groups according to some rules, and all groups are used for feature selection to obtain n sparse feature subsets. The intersection and the union of all feature subsets are the results of our feature selection methods. In the second layer, MKL is used again for speech emotion classification with the selected features. In order to evaluate the effectiveness of the proposed two-layer MKL scheme, we compare it with state-of-the-art results, and it is shown that our scheme yields a large gain in performance. Furthermore, another experiment is carried out to compare our feature selection method with other popular ones, and the results confirm its effectiveness.
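
    A toy sketch of the subset-combination step only (the per-group MKL-based selector is replaced here by a simple correlation-based placeholder, purely for illustration):

        import numpy as np

        def select_features(x_group, y_group, k=20):
            # placeholder for the per-group MKL selector: keep the k features
            # with the largest absolute correlation to the labels
            scores = np.abs([np.corrcoef(x_group[:, j], y_group)[0, 1]
                             for j in range(x_group.shape[1])])
            return set(np.argsort(scores)[-k:])

        def combine_subsets(groups):
            """groups: list of (X, y) pairs; returns (intersection, union) of subsets."""
            subsets = [select_features(x, y) for x, y in groups]
            return set.intersection(*subsets), set.union(*subsets)

        # toy data: two sample groups with 50-dimensional features
        rng = np.random.default_rng(0)
        groups = [(rng.standard_normal((30, 50)), rng.integers(0, 2, 30))
                  for _ in range(2)]
        inter, union = combine_subsets(groups)
        print(len(inter), len(union))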

  • A Novel Iterative Speaker Model Alignment Method from Non-Parallel Speech for Voice Conversion

    Peng SONG  Wenming ZHENG  Xinran ZHANG  Yun JIN  Cheng ZHA  Minghai XIN  

     
    LETTER-Speech and Hearing

      Vol:
    E98-A No:10
      Page(s):
    2178-2181

    Most current voice conversion methods are based on parallel speech, which is not easily obtained in practice. In this letter, a novel iterative speaker model alignment (ISMA) method is proposed to address this problem. First, the source and target speaker models are each trained from the background model by adopting the maximum a posteriori (MAP) algorithm. Then, a novel ISMA method is presented for the alignment and transformation of spectral features. Finally, the proposed ISMA approach is further combined with a Gaussian mixture model (GMM) to improve the conversion performance. A series of objective and subjective experiments are carried out on the CMU ARCTIC dataset, and the results demonstrate that the proposed method significantly outperforms the state-of-the-art approach.
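
    For background (the standard relevance-MAP mean update commonly used when deriving a speaker model from a background model; the ISMA algorithm itself is not reproduced here), the adapted mean of mixture component k is

        \hat{\mu}_k = \alpha_k\, \bar{x}_k + (1 - \alpha_k)\, \mu_k^{\mathrm{UBM}},
        \qquad \alpha_k = \frac{n_k}{n_k + r},

    where \bar{x}_k and n_k are the posterior-weighted mean and soft count of the adaptation data for component k, \mu_k^{UBM} is the background-model mean, and r is the relevance factor.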

  • Learning Corpus-Invariant Discriminant Feature Representations for Speech Emotion Recognition

    Peng SONG  Shifeng OU  Zhenbin DU  Yanyan GUO  Wenming MA  Jinglei LIU  Wenming ZHENG  

     
    LETTER-Speech and Hearing

      Publicized:
    2017/02/02
      Vol:
    E100-D No:5
      Page(s):
    1136-1139

    As a hot topic in speech signal processing, speech emotion recognition methods have developed rapidly in recent years, and some satisfactory results have been achieved. However, it should be noted that most of these methods are trained and evaluated on the same corpus. In reality, the training data and testing data are often collected from different corpora, and the features of different datasets often follow different distributions. These discrepancies greatly affect the recognition performance. To tackle this problem, a novel corpus-invariant discriminant feature representation algorithm, called transfer discriminant analysis (TDA), is presented for speech emotion recognition. The basic idea of TDA is to integrate the kernel LDA algorithm and a similarity measurement of distributions into one objective function. Experimental results under cross-corpus conditions show that our proposed method can significantly improve the recognition rates.