An Intra- and Inter-Emotion Transformer-Based Fusion Model with Homogeneous and Diverse Constraints Using Multi-Emotional Audiovisual Features for Depression Detection

Shiyu TENG; Jiaqing LIU; Yue HUANG; Shurong CHAI; Tomoko TATEYAMA; Xinyin HUANG; Lanfen LIN; Yen-Wei CHEN

doi:10.1587/transinf.2023HCP0006

IEICE TRANSACTIONS on Information

An Intra- and Inter-Emotion Transformer-Based Fusion Model with Homogeneous and Diverse Constraints Using Multi-Emotional Audiovisual Features for Depression Detection

Shiyu TENG, Jiaqing LIU, Yue HUANG, Shurong CHAI, Tomoko TATEYAMA, Xinyin HUANG, Lanfen LIN, Yen-Wei CHEN

Full Text Views

3

Cite this

Summary :

Depression is a prevalent mental disorder affecting a significant portion of the global population, leading to considerable disability and contributing to the overall burden of disease. Consequently, designing efficient and robust automated methods for depression detection has become imperative. Recently, deep learning methods, especially multimodal fusion methods, have been increasingly used in computer-aided depression detection. Importantly, individuals with depression and those without respond differently to various emotional stimuli, providing valuable information for detecting depression. Building on these observations, we propose an intra- and inter-emotional stimulus transformer-based fusion model to effectively extract depression-related features. The intra-emotional stimulus fusion framework aims to prioritize different modalities, capitalizing on their diversity and complementarity for depression detection. The inter-emotional stimulus model maps each emotional stimulus onto both invariant and specific subspaces using individual invariant and specific encoders. The emotional stimulus-invariant subspace facilitates efficient information sharing and integration across different emotional stimulus categories, while the emotional stimulus specific subspace seeks to enhance diversity and capture the distinct characteristics of individual emotional stimulus categories. Our proposed intra- and inter-emotional stimulus fusion model effectively integrates multimodal data under various emotional stimulus categories, providing a comprehensive representation that allows accurate task predictions in the context of depression detection. We evaluate the proposed model on the Chinese Soochow University students dataset, and the results outperform state-of-the-art models in terms of concordance correlation coefficient (CCC), root mean squared error (RMSE) and accuracy.

Publication: IEICE TRANSACTIONS on Information Vol.E107-D No.3 pp.342-353

Publication Date: 2024/03/01

Publicized: 2023/12/15

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2023HCP0006

Type of Manuscript: Special Section PAPER (Special Section on Human Communication V)

Category

Authors

Shiyu TENG
  Ritsumeikan University
Jiaqing LIU
  Ritsumeikan University
Yue HUANG
  Zhejiang Provincial Clinical Research Center for Mental Disorder
Shurong CHAI
  Ritsumeikan University
Tomoko TATEYAMA
  Fujita Health University
Xinyin HUANG
  Soochow University
Lanfen LIN
  Zhejiang University
Yen-Wei CHEN
  Ritsumeikan University

Keyword

depression detection, emotion fusion, stimuli, transformer, audiovisual, deep learning

Cite this

Copy

Shiyu TENG, Jiaqing LIU, Yue HUANG, Shurong CHAI, Tomoko TATEYAMA, Xinyin HUANG, Lanfen LIN, Yen-Wei CHEN, "An Intra- and Inter-Emotion Transformer-Based Fusion Model with Homogeneous and Diverse Constraints Using Multi-Emotional Audiovisual Features for Depression Detection" in IEICE TRANSACTIONS on Information, vol. E107-D, no. 3, pp. 342-353, March 2024, doi: 10.1587/transinf.2023HCP0006.
Abstract: Depression is a prevalent mental disorder affecting a significant portion of the global population, leading to considerable disability and contributing to the overall burden of disease. Consequently, designing efficient and robust automated methods for depression detection has become imperative. Recently, deep learning methods, especially multimodal fusion methods, have been increasingly used in computer-aided depression detection. Importantly, individuals with depression and those without respond differently to various emotional stimuli, providing valuable information for detecting depression. Building on these observations, we propose an intra- and inter-emotional stimulus transformer-based fusion model to effectively extract depression-related features. The intra-emotional stimulus fusion framework aims to prioritize different modalities, capitalizing on their diversity and complementarity for depression detection. The inter-emotional stimulus model maps each emotional stimulus onto both invariant and specific subspaces using individual invariant and specific encoders. The emotional stimulus-invariant subspace facilitates efficient information sharing and integration across different emotional stimulus categories, while the emotional stimulus specific subspace seeks to enhance diversity and capture the distinct characteristics of individual emotional stimulus categories. Our proposed intra- and inter-emotional stimulus fusion model effectively integrates multimodal data under various emotional stimulus categories, providing a comprehensive representation that allows accurate task predictions in the context of depression detection. We evaluate the proposed model on the Chinese Soochow University students dataset, and the results outperform state-of-the-art models in terms of concordance correlation coefficient (CCC), root mean squared error (RMSE) and accuracy.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2023HCP0006/_p

Copy

@ARTICLE{e107-d_3_342,
author={Shiyu TENG, Jiaqing LIU, Yue HUANG, Shurong CHAI, Tomoko TATEYAMA, Xinyin HUANG, Lanfen LIN, Yen-Wei CHEN, },
journal={IEICE TRANSACTIONS on Information},
title={An Intra- and Inter-Emotion Transformer-Based Fusion Model with Homogeneous and Diverse Constraints Using Multi-Emotional Audiovisual Features for Depression Detection},
year={2024},
volume={E107-D},
number={3},
pages={342-353},
abstract={Depression is a prevalent mental disorder affecting a significant portion of the global population, leading to considerable disability and contributing to the overall burden of disease. Consequently, designing efficient and robust automated methods for depression detection has become imperative. Recently, deep learning methods, especially multimodal fusion methods, have been increasingly used in computer-aided depression detection. Importantly, individuals with depression and those without respond differently to various emotional stimuli, providing valuable information for detecting depression. Building on these observations, we propose an intra- and inter-emotional stimulus transformer-based fusion model to effectively extract depression-related features. The intra-emotional stimulus fusion framework aims to prioritize different modalities, capitalizing on their diversity and complementarity for depression detection. The inter-emotional stimulus model maps each emotional stimulus onto both invariant and specific subspaces using individual invariant and specific encoders. The emotional stimulus-invariant subspace facilitates efficient information sharing and integration across different emotional stimulus categories, while the emotional stimulus specific subspace seeks to enhance diversity and capture the distinct characteristics of individual emotional stimulus categories. Our proposed intra- and inter-emotional stimulus fusion model effectively integrates multimodal data under various emotional stimulus categories, providing a comprehensive representation that allows accurate task predictions in the context of depression detection. We evaluate the proposed model on the Chinese Soochow University students dataset, and the results outperform state-of-the-art models in terms of concordance correlation coefficient (CCC), root mean squared error (RMSE) and accuracy.},
keywords={},
doi={10.1587/transinf.2023HCP0006},
ISSN={1745-1361},
month={March},}

Copy

TY - JOUR
TI - An Intra- and Inter-Emotion Transformer-Based Fusion Model with Homogeneous and Diverse Constraints Using Multi-Emotional Audiovisual Features for Depression Detection
T2 - IEICE TRANSACTIONS on Information
SP - 342
EP - 353
AU - Shiyu TENG
AU - Jiaqing LIU
AU - Yue HUANG
AU - Shurong CHAI
AU - Tomoko TATEYAMA
AU - Xinyin HUANG
AU - Lanfen LIN
AU - Yen-Wei CHEN
PY - 2024
DO - 10.1587/transinf.2023HCP0006
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E107-D
IS - 3
JA - IEICE TRANSACTIONS on Information
Y1 - March 2024
AB - Depression is a prevalent mental disorder affecting a significant portion of the global population, leading to considerable disability and contributing to the overall burden of disease. Consequently, designing efficient and robust automated methods for depression detection has become imperative. Recently, deep learning methods, especially multimodal fusion methods, have been increasingly used in computer-aided depression detection. Importantly, individuals with depression and those without respond differently to various emotional stimuli, providing valuable information for detecting depression. Building on these observations, we propose an intra- and inter-emotional stimulus transformer-based fusion model to effectively extract depression-related features. The intra-emotional stimulus fusion framework aims to prioritize different modalities, capitalizing on their diversity and complementarity for depression detection. The inter-emotional stimulus model maps each emotional stimulus onto both invariant and specific subspaces using individual invariant and specific encoders. The emotional stimulus-invariant subspace facilitates efficient information sharing and integration across different emotional stimulus categories, while the emotional stimulus specific subspace seeks to enhance diversity and capture the distinct characteristics of individual emotional stimulus categories. Our proposed intra- and inter-emotional stimulus fusion model effectively integrates multimodal data under various emotional stimulus categories, providing a comprehensive representation that allows accurate task predictions in the context of depression detection. We evaluate the proposed model on the Chinese Soochow University students dataset, and the results outperform state-of-the-art models in terms of concordance correlation coefficient (CCC), root mean squared error (RMSE) and accuracy.
ER -

IEICE TRANSACTIONS on Information