
Author Search Result

[Author] Huawei TAO (6 hits)

1-6 of 6 hits
  • A Novel Hybrid Network Model Based on Attentional Multi-Feature Fusion for Deception Detection

    Yuanbo FANG  Hongliang FU  Huawei TAO  Ruiyu LIANG  Li ZHAO  

     
    LETTER-Speech and Hearing

  Publicized:
    2020/09/24
      Vol:
    E104-A No:3
      Page(s):
    622-626

Speech-based deception detection using deep learning is one of the technologies for realizing future deception detection systems with high recognition rates. Multi-network feature extraction can effectively improve the recognition performance of a system, but limited labeled data and the lack of effective feature fusion methods constrain network performance. To address this, a novel hybrid network model based on attentional multi-feature fusion (HN-AMFF) is proposed. Firstly, the static features of large amounts of unlabeled speech data are fed into a DAE for unsupervised training. Secondly, the frame-level features and static features of a small amount of labeled speech data are simultaneously fed into the LSTM network and the encoder output of the DAE for joint supervised training. Finally, a feature fusion algorithm based on an attention mechanism is proposed, which obtains an optimal feature set during training. Simulation results show that the proposed feature fusion method is significantly better than traditional feature fusion methods, and that the model achieves advanced performance with only a small amount of labeled data.
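The attention-weighted fusion step the abstract describes can be sketched in a few lines. This is a minimal numpy illustration of softmax-weighted fusion of two feature streams, not the authors' HN-AMFF implementation; the fixed score vector and the stand-in feature vectors are placeholders for the learned attention scores and the LSTM/DAE outputs.

```python
import numpy as np

def attention_fuse(features, scores):
    """Fuse feature vectors of equal length with softmax attention weights.

    features: list of 1-D arrays (e.g. frame-level and static-feature outputs).
    scores:   raw relevance scores, one per feature vector.
    """
    scores = np.asarray(scores, dtype=float)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                 # softmax: weights sum to 1
    stacked = np.stack(features)             # shape (n_features, dim)
    return weights @ stacked, weights        # weighted sum plus the weights

frame_feat = np.ones(4)                      # stand-in for the LSTM output
static_feat = np.full(4, 3.0)                # stand-in for the DAE encoder output
fused, w = attention_fuse([frame_feat, static_feat], scores=[0.0, 0.0])
# equal scores give equal weights, i.e. the element-wise mean of the inputs
```

In the actual model the scores would themselves be produced by a trained layer; here they are fixed only to make the weighting behavior visible.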

  • Spectral Features Based on Local Normalized Center Moments for Speech Emotion Recognition

    Huawei TAO  Ruiyu LIANG  Xinran ZHANG  Li ZHAO  

     
    LETTER-Speech and Hearing

      Vol:
    E99-A No:10
      Page(s):
    1863-1866

To examine whether rotational invariance plays the main role in spectrogram features, new spectral features based on local normalized center moments, denoted LNCMSF, are proposed. LNCMSF first adopts 2nd-order normalized center moments to describe the local energy distribution of the logarithmic energy spectrum, yielding the normalized center moment spectrograms NC1 and NC2. Secondly, the DCT (Discrete Cosine Transform) is used to eliminate the correlation of NC1 and NC2, from which the high-order cepstral coefficients TNC1 and TNC2 are obtained. Finally, LNCMSF is generated by combining NC1, NC2, TNC1 and TNC2. A rotational invariance test shows that rotational invariance is not a necessary property of partial spectrogram features. Recognition experiments show that the maximum UA (Unweighted Average of Class-Wise Recall Rate) of LNCMSF is improved by at least 10.7% and 1.2%, respectively, compared to that of MFCC (Mel Frequency Cepstrum Coefficient) and HuWSF (Weighted Spectral Features Based on Local Hu Moments).
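The core operation above, computing 2nd-order normalized center moments over local blocks of a log-energy spectrum, can be sketched as follows. This is a minimal numpy illustration; the block size and the random test spectrogram are assumptions for demonstration, not values from the paper.

```python
import numpy as np

def normalized_central_moment(block, p, q):
    """Normalized central moment eta_pq of a 2-D energy block."""
    block = np.asarray(block, dtype=float)
    h, w = block.shape
    y, x = np.mgrid[0:h, 0:w]
    m00 = block.sum()
    xc, yc = (x * block).sum() / m00, (y * block).sum() / m00
    mu_pq = (((x - xc) ** p) * ((y - yc) ** q) * block).sum()
    return mu_pq / m00 ** ((p + q) / 2 + 1)   # scale normalization

def lncms_map(spec, size=4):
    """Block-wise eta_20 map over a log-energy spectrogram (an NC-style map)."""
    h, w = spec.shape
    return np.array([[normalized_central_moment(spec[i:i + size, j:j + size], 2, 0)
                      for j in range(0, w - size + 1, size)]
                     for i in range(0, h - size + 1, size)])

# toy non-negative "spectrogram"; an 8x8 input with 4x4 blocks gives a 2x2 map
spec = np.abs(np.random.default_rng(0).normal(size=(8, 8))) + 1e-6
nc = lncms_map(spec)
```

In the paper this map would then be decorrelated with a DCT to produce the cepstral coefficients; that step is omitted here for brevity.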

  • Spectral Features Based on Local Hu Moments of Gabor Spectrograms for Speech Emotion Recognition

    Huawei TAO  Ruiyu LIANG  Cheng ZHA  Xinran ZHANG  Li ZHAO  

     
    LETTER-Pattern Recognition

  Publicized:
    2016/05/06
      Vol:
    E99-D No:8
      Page(s):
    2186-2189

To improve the recognition rate of speech emotion, new spectral features based on local Hu moments of Gabor spectrograms, denoted GSLHu-PCA, are proposed. Firstly, the logarithmic energy spectrum of the emotional speech is computed. Secondly, Gabor spectrograms are obtained by convolving the logarithmic energy spectrum with Gabor wavelets. Thirdly, Gabor local Hu moment (GLHu) spectrograms are obtained through a block-wise Hu strategy, and the discrete cosine transform (DCT) is used to eliminate correlation among the components of the GLHu spectrograms. Fourthly, statistical features are extracted from the cepstral coefficients of the GLHu spectrograms and assembled into a feature vector. Finally, principal component analysis (PCA) is used to reduce feature redundancy. Experimental results on the EmoDB and ABC databases validate the effectiveness of GSLHu-PCA.
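The Hu-moment step in the pipeline above can be illustrated directly. The sketch below computes the first two Hu invariants of a local block with plain numpy; the 4x4 test patch is an assumption for demonstration, and the paper's block strategy over full Gabor spectrograms is not reproduced. For 90-degree rotations the invariance holds exactly, which makes it easy to check.

```python
import numpy as np

def hu_first_two(block):
    """First two Hu moment invariants of a 2-D block."""
    block = np.asarray(block, dtype=float)
    h, w = block.shape
    y, x = np.mgrid[0:h, 0:w]
    m00 = block.sum()
    xc, yc = (x * block).sum() / m00, (y * block).sum() / m00

    def eta(p, q):  # normalized central moment
        mu = (((x - xc) ** p) * ((y - yc) ** q) * block).sum()
        return mu / m00 ** ((p + q) / 2 + 1)

    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return phi1, phi2

patch = np.arange(1, 17, dtype=float).reshape(4, 4)   # toy energy block
p1, p2 = hu_first_two(patch)
r1, r2 = hu_first_two(np.rot90(patch))                # same invariants after rotation
```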

  • Convolutional Auto-Encoder and Adversarial Domain Adaptation for Cross-Corpus Speech Emotion Recognition

    Yang WANG  Hongliang FU  Huawei TAO  Jing YANG  Hongyi GE  Yue XIE  

     
    LETTER-Artificial Intelligence, Data Mining

  Publicized:
    2022/07/12
      Vol:
    E105-D No:10
      Page(s):
    1803-1806

This letter focuses on the cross-corpus speech emotion recognition (SER) task, in which the training and testing speech signals belong to different speech corpora. Existing algorithms are incapable of effectively extracting the common sentiment information shared across corpora to facilitate knowledge transfer. To address this challenging problem, a novel convolutional auto-encoder and adversarial domain adaptation (CAEADA) framework for cross-corpus SER is proposed. The framework first constructs a one-dimensional convolutional auto-encoder (1D-CAE) for feature processing, which can explore the correlation among adjacent one-dimensional statistical features, and the feature representation is enhanced by the encoder-decoder architecture. Subsequently, the adversarial domain adaptation (ADA) module alleviates the feature distribution discrepancy between the source and target domains by confusing the domain discriminator, and specifically employs maximum mean discrepancy (MMD) to better accomplish feature transformation. To evaluate the proposed CAEADA, extensive experiments were conducted on the EmoDB, eNTERFACE, and CASIA speech corpora, and the results show that the proposed method outperforms other approaches.
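The MMD term used for domain alignment above can be sketched concretely. This is a minimal numpy version of the (biased) squared MMD estimator with an RBF kernel, not the CAEADA implementation; the bandwidth `gamma` and the Gaussian toy "domains" are assumptions for illustration.

```python
import numpy as np

def mmd_rbf(X, Y, gamma=1.0):
    """Biased squared MMD between sample sets X and Y under an RBF kernel."""
    def k(A, B):  # pairwise RBF kernel matrix
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=(64, 3))        # "source domain" features
tgt_near = rng.normal(0.0, 1.0, size=(64, 3))   # matched target distribution
tgt_far = rng.normal(4.0, 1.0, size=(64, 3))    # shifted target distribution
# matched distributions give a much smaller MMD than shifted ones
```

Minimizing such a term over learned features is what pulls the source and target feature distributions together during adaptation.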

  • A Salient Feature Extraction Algorithm for Speech Emotion Recognition

    Ruiyu LIANG  Huawei TAO  Guichen TANG  Qingyun WANG  Li ZHAO  

     
    LETTER-Speech and Hearing

  Publicized:
    2015/05/29
      Vol:
    E98-D No:9
      Page(s):
    1715-1718

A salient feature extraction algorithm is proposed to improve the recognition rate of speech emotion. Firstly, the spectrogram of the emotional speech is calculated. Secondly, imitating the selective attention mechanism, the color, direction, and brightness maps of the spectrogram are computed. Each map is normalized and down-sampled to form a low-resolution feature matrix. Then, each feature matrix is converted to a row vector, and principal component analysis (PCA) is used to reduce feature redundancy and make the subsequent classification algorithm more practical. Finally, the speech emotion is classified with a support vector machine. Compared with traditional features, the improvement in recognition rate reaches 15%.
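The PCA step applied to the row-vector features can be sketched with a plain SVD. This is a generic numpy illustration of the dimensionality reduction the abstract describes, not the authors' code; the feature matrix size and component count are assumptions.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project row-vector features X onto their top principal components."""
    Xc = X - X.mean(axis=0)                        # center each feature column
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T                # shape (n_samples, n_components)

rng = np.random.default_rng(1)
feats = rng.normal(size=(20, 50))    # e.g. 20 utterances, 50-dim saliency features
reduced = pca_reduce(feats, 8)       # compact vectors for the SVM classifier
```

The projected components come out in order of decreasing variance, so the leading columns carry the most information for the classifier.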

  • Cross-Corpus Speech Emotion Recognition Based on Causal Emotion Information Representation Open Access

    Hongliang FU  Qianqian LI  Huawei TAO  Chunhua ZHU  Yue XIE  Ruxue GUO  

     
    LETTER-Speech and Hearing

  Publicized:
    2024/04/12
      Vol:
    E107-D No:8
      Page(s):
    1097-1100

Speech emotion recognition (SER) is a key technology for realizing the third generation of artificial intelligence, and is widely used in human-computer interaction, emotion diagnosis, interpersonal communication, and other fields. However, the aliasing of linguistic and semantic information in speech tends to distort the alignment of emotion features, which affects the performance of cross-corpus SER systems. This paper proposes a cross-corpus SER model based on causal emotion information representation (CEIR). The model uses the reconstruction loss of a deep autoencoder network together with source-domain label information to achieve a preliminary separation of causal features. Then, a causal correlation matrix is constructed and combined with local maximum mean discrepancy (LMMD) feature alignment so that the causal features of different dimensions become jointly independently distributed. Finally, supervised fine-tuning on labeled data is used to achieve effective extraction of causal emotion information. Experimental results show that the average unweighted average recall (UAR) of the proposed algorithm is increased by 3.4% to 7.01% compared with recent algorithms in the field.
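LMMD refines plain MMD by aligning the two domains class by class. The sketch below is a simplified numpy illustration using hard labels for both domains, whereas practical LMMD typically weights target samples by predicted class probabilities; the RBF bandwidth and the toy two-class data are assumptions for demonstration.

```python
import numpy as np

def lmmd_rbf(Xs, ys, Xt, yt, gamma=1.0):
    """Local MMD: average per-class squared MMD between source and target."""
    def k(A, B):  # pairwise RBF kernel matrix
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    vals = []
    for c in np.intersect1d(ys, yt):               # classes present in both domains
        Sc, Tc = Xs[ys == c], Xt[yt == c]
        vals.append(k(Sc, Sc).mean() + k(Tc, Tc).mean() - 2 * k(Sc, Tc).mean())
    return float(np.mean(vals))

rng = np.random.default_rng(2)
ys = np.repeat([0, 1], 30)
yt = ys.copy()
# two emotion classes per domain; target initially matches the source
Xs = np.vstack([rng.normal(0.0, 1.0, (30, 2)), rng.normal(3.0, 1.0, (30, 2))])
Xt = np.vstack([rng.normal(0.0, 1.0, (30, 2)), rng.normal(3.0, 1.0, (30, 2))])
# shifting the target domain inflates the per-class discrepancy
```

Minimizing this class-conditional term keeps each emotion category aligned across corpora rather than only matching the global feature distributions.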