Jiaxin WU Bing LI Li ZHAO Xinzhou XU
The task of Speech Emotion Detection (SED) is to judge whether a speaker expresses a positive or a negative emotion. SED performance depends heavily on the diversity and discriminability of the emotional features extracted from the speech. However, most existing research focuses on a single feature source and hand-crafted features. We therefore propose an SED approach using multi-source low-level information based recurrent branches (MSIR). Fusing multi-source low-level information yields varied and discriminative representations of speech emotion signals. In addition, the focal-loss function mitigates class imbalance by reducing the contribution of well-classified samples and increasing the weights of difficult samples in SED tasks. Experiments on the IEMOCAP corpus demonstrate the effectiveness of the proposed method. Compared with the baselines, MSIR achieves significant performance improvements in terms of Unweighted Average Recall and F1-score.
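The class-imbalance mechanism described above follows the standard binary focal loss of Lin et al. (2017). The sketch below is illustrative only; the `alpha` and `gamma` values are common defaults, not settings taken from this paper.

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss for a single sample.

    p: predicted probability of the positive class, in (0, 1).
    y: ground-truth label, 0 or 1.
    alpha, gamma: common defaults (assumed, not from the paper).
    The (1 - p_t)**gamma factor shrinks the loss of well-classified
    samples, so hard samples dominate the gradient.
    """
    p_t = p if y == 1 else 1.0 - p          # prob. of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With `gamma = 2`, a well-classified sample (`p_t = 0.9`) contributes roughly 500 times less loss than a hard one (`p_t = 0.3`), which is the down-weighting effect the abstract refers to.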