To enhance emotion features and improve the performance of speech emotion recognition, an attention mechanism is employed to identify important information in both the time and feature dimensions. In the time dimension, multi-head attention is modified with the last state of the long short-term memory (LSTM) output to match the temporal accumulation characteristic of the LSTM. In the feature dimension, scaled dot-product attention is replaced with additive attention, which follows the form of the LSTM state update, to construct multi-head attention. This means that a nonlinear transformation replaces the linear mapping in classical multi-head attention. Experiments on the IEMOCAP dataset demonstrate that the attention mechanism can enhance emotional information and improve the performance of speech emotion recognition.
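The paper itself does not publish layer shapes here, but the two ideas the abstract names — using the last LSTM state as the attention query, and scoring with additive (Bahdanau-style) attention instead of scaled dot-product — can be sketched roughly as follows. All dimensions and weight names (`Wq`, `Wk`, `v`) are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def additive_multihead_attention(H, q, Wq, Wk, v):
    """Hypothetical sketch of multi-head additive attention over time.

    H:  (T, d)         LSTM output sequence (T frames, d features)
    q:  (d,)           last LSTM state, used as the query
    Wq: (heads, d, a)  per-head query projections (assumed shapes)
    Wk: (heads, d, a)  per-head key projections
    v:  (heads, a)     per-head scoring vectors
    Returns (heads, d): one attention-pooled context vector per head.
    """
    contexts = []
    for Wq_h, Wk_h, v_h in zip(Wq, Wk, v):
        # additive score per frame: v^T tanh(Wq q + Wk h_t),
        # a nonlinear transformation in place of a scaled dot product
        scores = np.tanh(q @ Wq_h + H @ Wk_h) @ v_h   # (T,)
        alpha = softmax(scores)                        # weights over time
        contexts.append(alpha @ H)                     # weighted sum of frames
    return np.stack(contexts)

# toy usage: 5 frames, 4 features, 2 heads, hidden score size 3
rng = np.random.default_rng(0)
H = rng.standard_normal((5, 4))
out = additive_multihead_attention(
    H, H[-1],
    rng.standard_normal((2, 4, 3)),
    rng.standard_normal((2, 4, 3)),
    rng.standard_normal((2, 3)),
)
```

Each head here pools the whole sequence into one vector; the paper's actual network may combine the heads and apply attention along the feature dimension differently.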
Yue XIE
Nanjing Institute of Technology
Ruiyu LIANG
Nanjing Institute of Technology
Zhenlin LIANG
Southeast University
Xiaoyan ZHAO
Nanjing Institute of Technology
Wenhao ZENG
Nanjing Institute of Technology
Yue XIE, Ruiyu LIANG, Zhenlin LIANG, Xiaoyan ZHAO, Wenhao ZENG, "Speech Emotion Recognition Using Multihead Attention in Both Time and Feature Dimensions" in IEICE TRANSACTIONS on Information,
vol. E106-D, no. 5, pp. 1098-1101, May 2023, doi: 10.1587/transinf.2022EDL8084.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2022EDL8084/_p
@ARTICLE{e106-d_5_1098,
author={Yue XIE and Ruiyu LIANG and Zhenlin LIANG and Xiaoyan ZHAO and Wenhao ZENG},
journal={IEICE TRANSACTIONS on Information},
title={Speech Emotion Recognition Using Multihead Attention in Both Time and Feature Dimensions},
year={2023},
volume={E106-D},
number={5},
pages={1098--1101},
abstract={To enhance emotion features and improve the performance of speech emotion recognition, an attention mechanism is employed to identify important information in both the time and feature dimensions. In the time dimension, multi-head attention is modified with the last state of the long short-term memory (LSTM) output to match the temporal accumulation characteristic of the LSTM. In the feature dimension, scaled dot-product attention is replaced with additive attention, which follows the form of the LSTM state update, to construct multi-head attention. This means that a nonlinear transformation replaces the linear mapping in classical multi-head attention. Experiments on the IEMOCAP dataset demonstrate that the attention mechanism can enhance emotional information and improve the performance of speech emotion recognition.},
keywords={},
doi={10.1587/transinf.2022EDL8084},
ISSN={1745-1361},
month={May},}
TY - JOUR
TI - Speech Emotion Recognition Using Multihead Attention in Both Time and Feature Dimensions
T2 - IEICE TRANSACTIONS on Information
SP - 1098
EP - 1101
AU - Yue XIE
AU - Ruiyu LIANG
AU - Zhenlin LIANG
AU - Xiaoyan ZHAO
AU - Wenhao ZENG
PY - 2023
DO - 10.1587/transinf.2022EDL8084
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E106-D
IS - 5
JA - IEICE TRANSACTIONS on Information
Y1 - 2023/05
AB - To enhance emotion features and improve the performance of speech emotion recognition, an attention mechanism is employed to identify important information in both the time and feature dimensions. In the time dimension, multi-head attention is modified with the last state of the long short-term memory (LSTM) output to match the temporal accumulation characteristic of the LSTM. In the feature dimension, scaled dot-product attention is replaced with additive attention, which follows the form of the LSTM state update, to construct multi-head attention. This means that a nonlinear transformation replaces the linear mapping in classical multi-head attention. Experiments on the IEMOCAP dataset demonstrate that the attention mechanism can enhance emotional information and improve the performance of speech emotion recognition.
ER -