
Author Search Result

[Author] Xinran ZHANG (4 hits)

Results 1-4 of 4
  • A Novel Iterative Speaker Model Alignment Method from Non-Parallel Speech for Voice Conversion

    Peng SONG  Wenming ZHENG  Xinran ZHANG  Yun JIN  Cheng ZHA  Minghai XIN  

     
    LETTER-Speech and Hearing

    Vol: E98-A No:10  Page(s): 2178-2181

    Most current voice conversion methods are based on parallel speech, which is not easily obtained in practice. In this letter, a novel iterative speaker model alignment (ISMA) method is proposed to address this problem. First, the source and target speaker models are each trained from a background model by adopting the maximum a posteriori (MAP) algorithm. Then, a novel ISMA method is presented for the alignment and transformation of spectral features. Finally, the proposed ISMA approach is further combined with a Gaussian mixture model (GMM) to improve conversion performance. A series of objective and subjective experiments are carried out on the CMU ARCTIC dataset, and the results demonstrate that the proposed method significantly outperforms the state-of-the-art approach.
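    The MAP model-training step mentioned in the abstract can be illustrated with the standard GMM-UBM relevance-MAP mean update. This is a minimal sketch under our own assumptions (diagonal covariances, a relevance factor of 16), not the authors' implementation; all function and variable names are hypothetical.

    ```python
    import numpy as np

    def map_adapt_means(ubm_means, ubm_weights, ubm_vars, X, relevance=16.0):
        """MAP-adapt GMM component means to speaker data X (T x D frames).

        Classic relevance-factor mean update from GMM-UBM systems;
        diagonal covariances assumed. Illustrative sketch only.
        """
        T, D = X.shape
        K = ubm_means.shape[0]
        # log-responsibility of each component for each frame
        log_resp = np.zeros((T, K))
        for k in range(K):
            diff = X - ubm_means[k]
            log_resp[:, k] = (np.log(ubm_weights[k])
                              - 0.5 * np.sum(np.log(2 * np.pi * ubm_vars[k]))
                              - 0.5 * np.sum(diff ** 2 / ubm_vars[k], axis=1))
        log_resp -= log_resp.max(axis=1, keepdims=True)
        resp = np.exp(log_resp)
        resp /= resp.sum(axis=1, keepdims=True)
        n_k = resp.sum(axis=0)                                   # soft counts
        ex_k = resp.T @ X / np.maximum(n_k[:, None], 1e-10)      # first-order stats
        alpha = n_k / (n_k + relevance)                          # adaptation weights
        # interpolate between data statistics and the UBM prior means
        return alpha[:, None] * ex_k + (1 - alpha[:, None]) * ubm_means
    ```

    Components that see many frames move toward the speaker data, while unobserved components stay at their UBM prior, which is what makes the adapted source and target models comparable component-by-component.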

  • Transfer Semi-Supervised Non-Negative Matrix Factorization for Speech Emotion Recognition

    Peng SONG  Shifeng OU  Xinran ZHANG  Yun JIN  Wenming ZHENG  Jinglei LIU  Yanwei YU  

     
    LETTER-Speech and Hearing

    Publicized: 2016/07/01  Vol: E99-D No:10  Page(s): 2647-2650

    In practice, emotional speech utterances are often collected from different devices or under different conditions, which leads to a discrepancy between the training and testing data and a sharp decrease in recognition rates. To solve this problem, in this letter, a novel transfer semi-supervised non-negative matrix factorization (TSNMF) method is presented. A semi-supervised non-negative matrix factorization (SNMF) algorithm, utilizing both labeled source and unlabeled target data, is adopted to learn common feature representations. Meanwhile, the maximum mean discrepancy (MMD) is employed as a similarity measure to reduce the distance between the feature distributions of the two databases. Finally, the TSNMF algorithm, which optimizes the SNMF and MMD objectives jointly, is proposed to obtain robust feature representations across databases. Extensive experiments demonstrate that, in comparison to state-of-the-art approaches, the proposed method significantly improves cross-corpus recognition rates.
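    The MMD term used as the distribution-distance measure can be sketched generically. The letter does not specify a kernel, so the RBF kernel and the names `mmd_rbf` and `gamma` below are our assumptions; this illustrates the quantity being minimized, not the authors' TSNMF code.

    ```python
    import numpy as np

    def mmd_rbf(Xs, Xt, gamma=1.0):
        """Squared empirical maximum mean discrepancy between two
        feature sets under an RBF kernel (illustrative sketch)."""
        def k(A, B):
            # pairwise squared Euclidean distances, then RBF kernel
            sq = (np.sum(A ** 2, 1)[:, None]
                  + np.sum(B ** 2, 1)[None, :]
                  - 2.0 * A @ B.T)
            return np.exp(-gamma * sq)
        m, n = len(Xs), len(Xt)
        return (k(Xs, Xs).sum() / m ** 2
                + k(Xt, Xt).sum() / n ** 2
                - 2.0 * k(Xs, Xt).sum() / (m * n))
    ```

    When the two feature distributions coincide, the cross-kernel term cancels the within-set terms and the estimate approaches zero; a large shift between corpora makes the cross term vanish and the estimate grow, which is the signal the transfer objective penalizes.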

  • Spectral Features Based on Local Normalized Center Moments for Speech Emotion Recognition

    Huawei TAO  Ruiyu LIANG  Xinran ZHANG  Li ZHAO  

     
    LETTER-Speech and Hearing

    Vol: E99-A No:10  Page(s): 1863-1866

    To examine whether rotational invariance plays the main role in spectrogram features, new spectral features based on local normalized center moments, denoted LNCMSF, are proposed. The proposed LNCMSF first adopts 2nd-order normalized center moments to describe the local energy distribution of the logarithmic energy spectrum, yielding the normalized center moment spectrograms NC1 and NC2. Second, the discrete cosine transform (DCT) is used to eliminate the correlation within NC1 and NC2, yielding the high-order cepstral coefficients TNC1 and TNC2. Finally, LNCMSF is generated by combining NC1, NC2, TNC1 and TNC2. The rotational invariance test experiment shows that rotational invariance is not a necessary property of partial spectrogram features. The recognition experiment shows that the maximum UA (unweighted average of class-wise recall rate) of LNCMSF is improved by at least 10.7% and 1.2%, respectively, compared to that of MFCC (Mel frequency cepstral coefficients) and HuWSF (weighted spectral features based on local Hu moments).
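    The two feature-extraction steps of LNCMSF, 2nd-order normalized center moments of local patches followed by DCT decorrelation, can be sketched generically. The patch handling, normalization exponent, and names below are our assumptions, not the published implementation.

    ```python
    import numpy as np

    def local_eta(patch):
        """2nd-order normalized center moments (eta20, eta02, eta11) of one
        local patch; the patch is assumed non-negative (e.g. shifted
        log-energy), since the moments treat it as a mass distribution."""
        h, w = patch.shape
        y, x = np.mgrid[0:h, 0:w]
        m00 = patch.sum()
        if m00 == 0:
            return np.zeros(3)
        xc, yc = (x * patch).sum() / m00, (y * patch).sum() / m00
        mu20 = ((x - xc) ** 2 * patch).sum()
        mu02 = ((y - yc) ** 2 * patch).sum()
        mu11 = ((x - xc) * (y - yc) * patch).sum()
        # normalization exponent (p+q)/2 + 1 = 2 for 2nd-order moments
        return np.array([mu20, mu02, mu11]) / m00 ** 2

    def dct2_coeffs(v, n_keep):
        """Orthonormal DCT-II of a 1-D sequence (the decorrelation step),
        keeping the first n_keep cepstral-style coefficients."""
        N = len(v)
        n = np.arange(N)
        basis = np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
        scale = np.full(N, np.sqrt(2.0 / N))
        scale[0] = np.sqrt(1.0 / N)
        return (scale * (basis @ v))[:n_keep]
    ```

    Sliding `local_eta` over patches of the log-energy spectrum yields moment spectrograms in the spirit of NC1/NC2, and `dct2_coeffs` applied along each moment trajectory yields decorrelated coefficients in the spirit of TNC1/TNC2.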

  • Spectral Features Based on Local Hu Moments of Gabor Spectrograms for Speech Emotion Recognition

    Huawei TAO  Ruiyu LIANG  Cheng ZHA  Xinran ZHANG  Li ZHAO  

     
    LETTER-Pattern Recognition

    Publicized: 2016/05/06  Vol: E99-D No:8  Page(s): 2186-2189

    To improve the recognition rate of speech emotion, new spectral features based on local Hu moments of Gabor spectrograms are proposed, denoted GSLHu-PCA. First, the logarithmic energy spectrum of the emotional speech is computed. Second, Gabor spectrograms are obtained by convolving the logarithmic energy spectrum with Gabor wavelets. Third, Gabor local Hu moment (GLHu) spectrograms are obtained through a block Hu strategy, and the discrete cosine transform (DCT) is then used to eliminate correlation among the components of the GLHu spectrograms. Fourth, statistical features are extracted from the cepstral coefficients of the GLHu spectrograms, and all the statistical features form a feature vector. Finally, principal component analysis (PCA) is used to reduce the redundancy of the features. Experimental results on the EmoDB and ABC databases validate the effectiveness of GSLHu-PCA.
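    The Gabor-filtering and block-Hu steps can be illustrated with a generic sketch: one real Gabor kernel, a valid-mode 2-D convolution, and the first Hu invariant phi1 = eta20 + eta02 per block. The letter uses a full Gabor wavelet family and the full set of Hu moments; we show a single kernel and only phi1 for brevity, and all parameter values and names are our assumptions.

    ```python
    import numpy as np

    def gabor_kernel(size=9, sigma=2.0, theta=0.0, lam=4.0):
        """Real part of a 2-D Gabor kernel at one orientation and scale."""
        half = size // 2
        y, x = np.mgrid[-half:half + 1, -half:half + 1]
        xr = x * np.cos(theta) + y * np.sin(theta)
        return (np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
                * np.cos(2.0 * np.pi * xr / lam))

    def convolve2d_valid(img, ker):
        """Plain valid-mode 2-D convolution (no padding)."""
        kh, kw = ker.shape
        H, W = img.shape
        out = np.zeros((H - kh + 1, W - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = (img[i:i + kh, j:j + kw] * ker).sum()
        return out

    def first_hu_moment(block):
        """phi1 = eta20 + eta02, the first Hu invariant of one block."""
        block = block - block.min()   # moments assume non-negative mass
        m00 = block.sum()
        if m00 == 0:
            return 0.0
        h, w = block.shape
        y, x = np.mgrid[0:h, 0:w]
        xc, yc = (x * block).sum() / m00, (y * block).sum() / m00
        eta20 = ((x - xc) ** 2 * block).sum() / m00 ** 2
        eta02 = ((y - yc) ** 2 * block).sum() / m00 ** 2
        return eta20 + eta02
    ```

    Applying `first_hu_moment` to each block of a Gabor-filtered spectrogram gives a GLHu-style map; the invariant is unchanged when a pattern merely shifts within its block, which is the robustness property the block-Hu strategy exploits.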