
Speech Emotion Recognition Using Multihead Attention in Both Time and Feature Dimensions

Yue XIE, Ruiyu LIANG, Zhenlin LIANG, Xiaoyan ZHAO, Wenhao ZENG

Summary

To enhance emotion features and improve the performance of speech emotion recognition, an attention mechanism is employed to identify the important information in both the time and feature dimensions. In the time dimension, multi-head attention is modified so that the query is derived from the last state of the long short-term memory (LSTM) output, matching the time-accumulation characteristic of the LSTM. In the feature dimension, scaled dot-product attention is replaced with additive attention, which draws on the state-update method of the LSTM to construct multi-head attention; that is, a nonlinear transformation replaces the linear mapping of classical multi-head attention. Experiments on the IEMOCAP dataset demonstrate that this attention mechanism can enhance emotional information and improve the performance of speech emotion recognition.
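The sketch below illustrates, in PyTorch, the two ideas the abstract describes: time-dimension multi-head attention whose query comes from the last LSTM state, and a feature-dimension attention that uses a nonlinear (additive, tanh-based) transform instead of the linear projections of scaled dot-product attention. It is an illustrative reconstruction only; the layer sizes, head counts, and exact form of the nonlinear head are assumptions, not the authors' released code.

```python
# Minimal sketch of the two attention mechanisms described in the abstract.
# NOT the authors' implementation; dimensions, head counts, and the exact
# nonlinear head are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TimeMultiHeadAttention(nn.Module):
    """Multi-head attention over the time axis, with the query built from
    the LSTM's last hidden state (matching LSTM time accumulation)."""

    def __init__(self, dim, num_heads=4):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.q_proj = nn.Linear(dim, dim)  # query from last LSTM state
        self.k_proj = nn.Linear(dim, dim)  # keys from all time steps
        self.v_proj = nn.Linear(dim, dim)  # values from all time steps

    def forward(self, lstm_out, last_state):
        # lstm_out: (B, T, dim); last_state: (B, dim)
        B, T, _ = lstm_out.shape
        q = self.q_proj(last_state).view(B, self.num_heads, 1, self.head_dim)
        k = self.k_proj(lstm_out).view(B, T, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(lstm_out).view(B, T, self.num_heads, self.head_dim).transpose(1, 2)
        scores = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5  # (B, H, 1, T)
        weights = F.softmax(scores, dim=-1)
        return (weights @ v).reshape(B, -1)  # pooled context, (B, dim)


class FeatureAdditiveAttention(nn.Module):
    """Additive (tanh-gated) attention over the feature axis: a nonlinear
    transform replaces the linear mapping of scaled dot-product attention,
    in the spirit of an LSTM-style state update."""

    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, dim))
            for _ in range(num_heads)
        )

    def forward(self, x):
        # x: (B, dim) pooled utterance representation
        outs = []
        for head in self.heads:
            weights = torch.sigmoid(head(x))  # per-feature attention weights
            outs.append(weights * x)          # re-weight the feature dimension
        return torch.cat(outs, dim=-1)        # (B, num_heads * dim)
```

As a usage note, one plausible pipeline runs frame-level acoustic features through an LSTM, applies the time-dimension attention to pool over frames, and then applies the feature-dimension attention before the emotion classifier.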

Publication
IEICE TRANSACTIONS on Information Vol.E106-D No.5 pp.1098-1101
Publication Date
2023/05/01
Publicized
2023/02/21
Online ISSN
1745-1361
DOI
10.1587/transinf.2022EDL8084
Type of Manuscript
LETTER
Category
Speech and Hearing

Authors

Yue XIE
  Nanjing Institute of Technology
Ruiyu LIANG
  Nanjing Institute of Technology
Zhenlin LIANG
  Southeast University
Xiaoyan ZHAO
  Nanjing Institute of Technology
Wenhao ZENG
  Nanjing Institute of Technology
