Keyword Search Results

[Keyword] multimodal fusion (2 hits)

Hits 1-2 of 2
  • Multimodal Named Entity Recognition with Bottleneck Fusion and Contrastive Learning

    Peng WANG, Xiaohang CHEN, Ziyu SHANG, Wenjun KE

    PAPER-Natural Language Processing

    Publicized: 2023/01/18
    Vol: E106-D No:4
    Page(s): 545-555

    Multimodal named entity recognition (MNER) is the task of recognizing named entities in a multimodal context. Existing methods focus on co-attention mechanisms to discover relationships between modalities, but they still have two deficiencies. First, they fail to fuse the multimodal representations in a fine-grained way, which may introduce noise from the visual modality. Second, they do not bridge the semantic gap between heterogeneous modalities. To address these issues, we propose a novel MNER method with bottleneck fusion and contrastive learning (BFCL). Specifically, we first incorporate a transformer-based bottleneck fusion mechanism, so that information can be exchanged between modalities only through a small set of bottleneck tokens, thereby reducing noise propagation. We then propose two decoupled image-text contrastive losses to align the unimodal representations, bringing semantically related representations closer together and pushing semantically unrelated ones farther apart. Experimental results demonstrate that our method is competitive with state-of-the-art models, achieving F1-scores of 74.54% and 85.70% on the Twitter-2015 and Twitter-2017 datasets, respectively.
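
    To make the bottleneck-fusion idea concrete, here is a minimal sketch, assuming PyTorch; the names BottleneckFusion, num_bottleneck, and info_nce are invented for illustration and this is not the authors' implementation. Each modality self-attends only over its own tokens plus a small set of shared bottleneck tokens, so cross-modal information can flow only through those tokens, and an InfoNCE-style helper hints at how an image-text contrastive term could be added.

      # Illustrative sketch only (not the authors' code): a generic attention-bottleneck
      # fusion module plus an InfoNCE-style contrastive helper. Assumes PyTorch; the
      # names BottleneckFusion, num_bottleneck and info_nce are invented here.
      import torch
      import torch.nn as nn
      import torch.nn.functional as F

      class BottleneckFusion(nn.Module):
          """Text and image tokens never attend to each other directly; they exchange
          information only through a small set of shared bottleneck tokens."""

          def __init__(self, dim=256, num_bottleneck=4, num_heads=4, num_layers=2):
              super().__init__()
              self.bottleneck = nn.Parameter(torch.randn(1, num_bottleneck, dim))
              self.text_layers = nn.ModuleList(
                  [nn.TransformerEncoderLayer(d_model=dim, nhead=num_heads, batch_first=True)
                   for _ in range(num_layers)])
              self.image_layers = nn.ModuleList(
                  [nn.TransformerEncoderLayer(d_model=dim, nhead=num_heads, batch_first=True)
                   for _ in range(num_layers)])

          def forward(self, text_tokens, image_tokens):
              b = text_tokens.size(0)
              btl = self.bottleneck.expand(b, -1, -1)   # shared bottleneck tokens
              n = btl.size(1)
              for t_layer, v_layer in zip(self.text_layers, self.image_layers):
                  # Each modality self-attends over its own tokens plus the bottleneck.
                  t_out = t_layer(torch.cat([text_tokens, btl], dim=1))
                  v_out = v_layer(torch.cat([image_tokens, btl], dim=1))
                  text_tokens, t_btl = t_out[:, :-n], t_out[:, -n:]
                  image_tokens, v_btl = v_out[:, :-n], v_out[:, -n:]
                  # Cross-modal exchange happens only via the bottleneck tokens.
                  btl = 0.5 * (t_btl + v_btl)
              return text_tokens, image_tokens

      def info_nce(text_vec, image_vec, temperature=0.07):
          """One direction of an InfoNCE-style image-text contrastive loss:
          matched (text, image) pairs are pulled together, mismatched pairs pushed apart."""
          t = F.normalize(text_vec, dim=-1)
          v = F.normalize(image_vec, dim=-1)
          logits = t @ v.t() / temperature
          targets = torch.arange(t.size(0), device=t.device)
          return F.cross_entropy(logits, targets)

      if __name__ == "__main__":
          fusion = BottleneckFusion()
          txt = torch.randn(2, 12, 256)   # (batch, text tokens, dim)
          img = torch.randn(2, 49, 256)   # (batch, image patches, dim)
          t_out, v_out = fusion(txt, img)
          print(t_out.shape, v_out.shape)

    Averaging the two modality-specific bottleneck states after each layer is just one simple way to share them; the paper's exact fusion scheme and the decoupled formulation of its two contrastive losses are not reproduced here.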

  • Multimodal Affect Recognition Using Boltzmann Zippers

    Kun LU, Xin ZHANG

    LETTER-Image Recognition, Computer Vision

    Vol: E96-D No:11
    Page(s): 2496-2499

    This letter presents a novel approach to automatic multimodal affect recognition. The audio and visual channels provide complementary information for recognizing human affective states, and we use Boltzmann zippers for model-level fusion to learn the intrinsic correlations between the two modalities. We extract effective audio and visual feature streams at different time scales and feed them into two component Boltzmann chains, respectively. The hidden units of the two chains are interconnected to form a Boltzmann zipper, which effectively avoids local energy minima during training. Second-order methods are applied to the Boltzmann zippers to speed up the learning and pruning processes. Experimental results on audio-visual emotion data, both recorded by ourselves in Wizard-of-Oz scenarios and collected from the SEMAINE naturalistic database, demonstrate that our approach is robust and outperforms state-of-the-art methods.
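
    As a rough, hypothetical illustration of the coupled-chain idea, the sketch below (plain NumPy, toy sizes, invented variable names) defines an energy over two chains of binary hidden units, one per modality, with within-chain couplings plus cross-links that "zip" corresponding hidden units together, and runs one Gibbs sweep over the audio chain. The paper's feature extraction, second-order learning, and pruning are not reproduced.

      # Illustrative sketch only -- a toy Boltzmann-zipper-style energy and one
      # Gibbs sweep; names and coupling layout are assumptions, not the paper's code.
      import numpy as np

      rng = np.random.default_rng(0)

      T = 10                      # number of time steps in each chain
      H = 4                       # hidden units per time step and modality
      W_a = rng.normal(scale=0.1, size=(T - 1, H, H))   # audio within-chain couplings
      W_v = rng.normal(scale=0.1, size=(T - 1, H, H))   # visual within-chain couplings
      W_c = rng.normal(scale=0.1, size=(T, H, H))       # cross-links audio <-> visual

      def energy(h_a, h_v):
          """Energy of a joint configuration of the two hidden chains."""
          e = 0.0
          for t in range(T - 1):
              e -= h_a[t] @ W_a[t] @ h_a[t + 1]     # audio chain, step t -> t+1
              e -= h_v[t] @ W_v[t] @ h_v[t + 1]     # visual chain, step t -> t+1
          for t in range(T):
              e -= h_a[t] @ W_c[t] @ h_v[t]         # cross-modal "zipper" link
          return e

      def sigmoid(x):
          return 1.0 / (1.0 + np.exp(-x))

      def gibbs_sweep(h_a, h_v):
          """One Gibbs sweep over the audio hidden chain, conditioned on the visual chain."""
          for t in range(T):
              field = h_v[t] @ W_c[t].T             # input from the visual chain
              if t > 0:
                  field += h_a[t - 1] @ W_a[t - 1]  # input from the previous time step
              if t < T - 1:
                  field += W_a[t] @ h_a[t + 1]      # input from the next time step
              h_a[t] = (rng.random(H) < sigmoid(field)).astype(float)
          # (A symmetric sweep would update h_v given h_a in the same way.)
          return h_a, h_v

      h_a = rng.integers(0, 2, size=(T, H)).astype(float)
      h_v = rng.integers(0, 2, size=(T, H)).astype(float)
      print("energy before sweep:", energy(h_a, h_v))
      h_a, h_v = gibbs_sweep(h_a, h_v)
      print("energy after sweep :", energy(h_a, h_v))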