Despite the widespread use of deep learning for speech emotion recognition, such networks are severely restricted by the information loss in the higher layers of deep neural networks as well as by the degradation problem. To utilize information efficiently and alleviate degradation, an attention-based dense long short-term memory (LSTM) network is proposed for speech emotion recognition. LSTM networks, which are able to process time series such as speech, are constructed, and attention-based dense connections are introduced into them. That is, weight coefficients are added to the skip-connections of each layer to distinguish the emotional information carried by different layers and to prevent redundant information from the lower layers from interfering with the effective information of the upper layers. Experiments demonstrate that the proposed method improves recognition performance by 12% and 7% on the eNTERFACE and IEMOCAP corpora, respectively.
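The architecture described in the abstract can be pictured as a stack of LSTM layers in which every layer receives, besides the original acoustic features, the outputs of all preceding layers, each scaled by a learned attention weight. The following is a minimal sketch of that idea, not the authors' implementation: it assumes PyTorch, and the feature dimension, hidden size, number of layers, softmax-normalized scalar skip-weights, and the mean-pooling classifier head are all illustrative choices.

import torch
import torch.nn as nn

class AttentionDenseLSTM(nn.Module):
    # Illustrative sketch of an attention-based dense LSTM stack; layer sizes
    # and the classifier head are assumptions, not the paper's exact settings.
    def __init__(self, input_dim=40, hidden_dim=128, num_layers=3, num_emotions=6):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            # Each LSTM layer sees the original features plus the outputs of all
            # previous layers (dense connections), so its input width grows with depth.
            self.layers.append(nn.LSTM(input_dim + i * hidden_dim, hidden_dim, batch_first=True))
        # One learnable weight per incoming skip-connection of each layer, so the
        # network can down-weight redundant lower-layer information.
        self.skip_weights = nn.ParameterList(
            [nn.Parameter(torch.ones(i + 1)) for i in range(num_layers)]
        )
        self.classifier = nn.Linear(hidden_dim, num_emotions)

    def forward(self, x):
        # x: (batch, time, input_dim) frame-level acoustic features.
        outputs = [x]
        for lstm, w in zip(self.layers, self.skip_weights):
            alpha = torch.softmax(w, dim=0)          # attention over skip-connections
            weighted = [a * o for a, o in zip(alpha, outputs)]
            h, _ = lstm(torch.cat(weighted, dim=-1))
            outputs.append(h)
        # Mean-pool the top layer over time and classify the utterance.
        return self.classifier(outputs[-1].mean(dim=1))

# Toy usage with random tensors standing in for frame-level acoustic features.
model = AttentionDenseLSTM()
feats = torch.randn(4, 300, 40)   # (batch, frames, feature_dim)
logits = model(feats)             # (4, num_emotions)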
Yue XIE
Southeast University
Ruiyu LIANG
Nanjing Institute of Technology
Zhenlin LIANG
Southeast University
Li ZHAO
Southeast University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Yue XIE, Ruiyu LIANG, Zhenlin LIANG, Li ZHAO, "Attention-Based Dense LSTM for Speech Emotion Recognition" in IEICE TRANSACTIONS on Information,
vol. E102-D, no. 7, pp. 1426-1429, July 2019, doi: 10.1587/transinf.2019EDL8019.
Abstract: Despite the widespread use of deep learning for speech emotion recognition, such networks are severely restricted by the information loss in the higher layers of deep neural networks as well as by the degradation problem. To utilize information efficiently and alleviate degradation, an attention-based dense long short-term memory (LSTM) network is proposed for speech emotion recognition. LSTM networks, which are able to process time series such as speech, are constructed, and attention-based dense connections are introduced into them. That is, weight coefficients are added to the skip-connections of each layer to distinguish the emotional information carried by different layers and to prevent redundant information from the lower layers from interfering with the effective information of the upper layers. Experiments demonstrate that the proposed method improves recognition performance by 12% and 7% on the eNTERFACE and IEMOCAP corpora, respectively.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2019EDL8019/_p
@ARTICLE{e102-d_7_1426,
author={Yue XIE and Ruiyu LIANG and Zhenlin LIANG and Li ZHAO},
journal={IEICE TRANSACTIONS on Information},
title={Attention-Based Dense LSTM for Speech Emotion Recognition},
year={2019},
volume={E102-D},
number={7},
pages={1426-1429},
abstract={Despite the widespread use of deep learning for speech emotion recognition, such networks are severely restricted by the information loss in the higher layers of deep neural networks as well as by the degradation problem. To utilize information efficiently and alleviate degradation, an attention-based dense long short-term memory (LSTM) network is proposed for speech emotion recognition. LSTM networks, which are able to process time series such as speech, are constructed, and attention-based dense connections are introduced into them. That is, weight coefficients are added to the skip-connections of each layer to distinguish the emotional information carried by different layers and to prevent redundant information from the lower layers from interfering with the effective information of the upper layers. Experiments demonstrate that the proposed method improves recognition performance by 12% and 7% on the eNTERFACE and IEMOCAP corpora, respectively.},
keywords={},
doi={10.1587/transinf.2019EDL8019},
ISSN={1745-1361},
month={July},}
TY - JOUR
TI - Attention-Based Dense LSTM for Speech Emotion Recognition
T2 - IEICE TRANSACTIONS on Information
SP - 1426
EP - 1429
AU - Yue XIE
AU - Ruiyu LIANG
AU - Zhenlin LIANG
AU - Li ZHAO
PY - 2019
DO - 10.1587/transinf.2019EDL8019
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E102-D
IS - 7
JA - IEICE TRANSACTIONS on Information
Y1 - July 2019
AB - Despite the widespread use of deep learning for speech emotion recognition, such networks are severely restricted by the information loss in the higher layers of deep neural networks as well as by the degradation problem. To utilize information efficiently and alleviate degradation, an attention-based dense long short-term memory (LSTM) network is proposed for speech emotion recognition. LSTM networks, which are able to process time series such as speech, are constructed, and attention-based dense connections are introduced into them. That is, weight coefficients are added to the skip-connections of each layer to distinguish the emotional information carried by different layers and to prevent redundant information from the lower layers from interfering with the effective information of the upper layers. Experiments demonstrate that the proposed method improves recognition performance by 12% and 7% on the eNTERFACE and IEMOCAP corpora, respectively.
ER -