In practice, emotional speech utterances are often collected with different devices or under different conditions, which leads to a discrepancy between the training and testing data and a sharp decrease in recognition rates. To solve this problem, this letter presents a novel transfer semi-supervised non-negative matrix factorization (TSNMF) method. A semi-supervised non-negative matrix factorization (SNMF) algorithm, utilizing both labeled source and unlabeled target data, is adopted to learn common feature representations. Meanwhile, the maximum mean discrepancy (MMD) is employed as a similarity measure to reduce the distance between the feature distributions of the two databases. Finally, the TSNMF algorithm, which optimizes the SNMF and MMD functions jointly, is proposed to obtain robust feature representations across databases. Extensive experiments demonstrate that, in comparison to state-of-the-art approaches, the proposed method can significantly improve cross-corpus recognition rates.
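As a rough illustration of the MMD term mentioned in the abstract, the sketch below computes the empirical squared MMD under a linear kernel, which reduces to the squared Euclidean distance between the mean feature vectors of the two databases. This is only an assumption-laden toy: the abstract does not state which kernel the letter uses, nor how the MMD term is combined with the SNMF objective, and the function names and sample features here are hypothetical.

```python
from typing import List, Sequence


def column_means(rows: Sequence[Sequence[float]]) -> List[float]:
    """Return the per-feature mean of a non-empty list of feature vectors."""
    n = len(rows)
    dim = len(rows[0])
    return [sum(row[j] for row in rows) / n for j in range(dim)]


def linear_mmd_squared(
    source: Sequence[Sequence[float]],
    target: Sequence[Sequence[float]],
) -> float:
    """Empirical squared MMD with a linear kernel.

    With a linear kernel the estimate is the squared Euclidean distance
    between the source and target feature means. (Assumption: the letter
    may use a different kernel or a matrix-trace formulation.)
    """
    mu_s = column_means(source)
    mu_t = column_means(target)
    return sum((a - b) ** 2 for a, b in zip(mu_s, mu_t))


# Hypothetical toy features standing in for two emotional speech corpora.
source_feats = [[1.0, 2.0], [3.0, 4.0]]
target_feats = [[2.0, 3.0], [4.0, 5.0]]

shifted = linear_mmd_squared(source_feats, target_feats)
same = linear_mmd_squared(source_feats, source_feats)
```

A value of zero means the two feature means coincide; a larger value indicates a larger gap between the corpora in feature space, which is the quantity a transfer method of this kind would try to shrink.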
Peng SONG
Yantai University
Shifeng OU
Yantai University
Xinran ZHANG
Southeast University
Yun JIN
Southeast University
Wenming ZHENG
Southeast University
Jinglei LIU
Yantai University
Yanwei YU
Yantai University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Peng SONG, Shifeng OU, Xinran ZHANG, Yun JIN, Wenming ZHENG, Jinglei LIU, Yanwei YU, "Transfer Semi-Supervised Non-Negative Matrix Factorization for Speech Emotion Recognition" in IEICE TRANSACTIONS on Information,
vol. E99-D, no. 10, pp. 2647-2650, October 2016, doi: 10.1587/transinf.2016EDL8067.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2016EDL8067/_p
@ARTICLE{e99-d_10_2647,
author={Peng SONG and Shifeng OU and Xinran ZHANG and Yun JIN and Wenming ZHENG and Jinglei LIU and Yanwei YU},
journal={IEICE TRANSACTIONS on Information},
title={Transfer Semi-Supervised Non-Negative Matrix Factorization for Speech Emotion Recognition},
year={2016},
volume={E99-D},
number={10},
pages={2647-2650},
abstract={In practice, emotional speech utterances are often collected from different devices or conditions, which will lead to discrepancy between the training and testing data, resulting in sharp decrease of recognition rates. To solve this problem, in this letter, a novel transfer semi-supervised non-negative matrix factorization (TSNMF) method is presented. A semi-supervised non-negative matrix factorization algorithm, utilizing both labeled source and unlabeled target data, is adopted to learn common feature representations. Meanwhile, the maximum mean discrepancy (MMD) as a similarity measurement is employed to reduce the distance between the feature distributions of two databases. Finally, the TSNMF algorithm, which optimizes the SNMF and MMD functions together, is proposed to obtain robust feature representations across databases. Extensive experiments demonstrate that in comparison to the state-of-the-art approaches, our proposed method can significantly improve the cross-corpus recognition rates.},
keywords={},
doi={10.1587/transinf.2016EDL8067},
ISSN={1745-1361},
month={October},}
TY - JOUR
TI - Transfer Semi-Supervised Non-Negative Matrix Factorization for Speech Emotion Recognition
T2 - IEICE TRANSACTIONS on Information
SP - 2647
EP - 2650
AU - Peng SONG
AU - Shifeng OU
AU - Xinran ZHANG
AU - Yun JIN
AU - Wenming ZHENG
AU - Jinglei LIU
AU - Yanwei YU
PY - 2016
DO - 10.1587/transinf.2016EDL8067
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E99-D
IS - 10
JA - IEICE TRANSACTIONS on Information
Y1 - October 2016
AB - In practice, emotional speech utterances are often collected from different devices or conditions, which will lead to discrepancy between the training and testing data, resulting in sharp decrease of recognition rates. To solve this problem, in this letter, a novel transfer semi-supervised non-negative matrix factorization (TSNMF) method is presented. A semi-supervised non-negative matrix factorization algorithm, utilizing both labeled source and unlabeled target data, is adopted to learn common feature representations. Meanwhile, the maximum mean discrepancy (MMD) as a similarity measurement is employed to reduce the distance between the feature distributions of two databases. Finally, the TSNMF algorithm, which optimizes the SNMF and MMD functions together, is proposed to obtain robust feature representations across databases. Extensive experiments demonstrate that in comparison to the state-of-the-art approaches, our proposed method can significantly improve the cross-corpus recognition rates.
ER -