IEICE TRANSACTIONS on Information

A Visual Question Answering Network Merging High- and Low-Level Semantic Information

Huimin LI, Dezhi HAN, Chongqing CHEN, Chin-Chen CHANG, Kuan-Ching LI, Dun LI

Summary:

Visual Question Answering (VQA) models usually use deep attention mechanisms to learn the fine-grained visual content of images and the textual content of questions. However, deep attention mechanisms learn only high-level semantic information and ignore the impact of low-level semantic information on answer prediction. To address this, we design a High- and Low-Level Semantic Information Network (HLSIN), which employs two strategies to fuse high-level and low-level semantic information. The first strategy, adaptive weight learning, lets each level of semantic information learn its own weights. The second, a gate-sum mechanism, suppresses invalid information at each level and fuses the valid information. On the benchmark VQA-v2 dataset, we evaluate HLSIN quantitatively and qualitatively and conduct extensive ablation studies to explore the reasons behind its effectiveness. Experimental results demonstrate that HLSIN significantly outperforms the previous state-of-the-art, achieving an overall accuracy of 70.93% on test-dev.
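
The summary names the two fusion strategies but does not give their equations. The sketch below shows one plausible reading of each, and it should not be taken as the authors' formulation. It assumes that adaptive weight learning means a softmax over one learnable scalar per semantic level, and that gate-sum means a per-level sigmoid gate that scales the features before the levels are summed. The function names, parameter shapes, and the NumPy setting are all hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def adaptive_weighted_sum(levels, logits):
    """Assumed form of adaptive weight learning.

    levels: list of (n, d) feature arrays, one per semantic level.
    logits: one learnable scalar per level (hypothetical parameters).
    Returns the softmax-weighted sum of the levels, shape (n, d).
    """
    weights = softmax(np.asarray(logits, dtype=float))
    return sum(w * x for w, x in zip(weights, levels))


def gate_sum(levels, gate_weights, gate_biases):
    """Assumed form of a gate-sum fusion.

    Each level gets its own gate g = sigmoid(x @ W + b) with values in (0, 1).
    The gate scales that level's features (values near 0 suppress them),
    and the gated levels are then summed.
    """
    fused = np.zeros(levels[0].shape, dtype=float)
    for x, W, b in zip(levels, gate_weights, gate_biases):
        gate = sigmoid(x @ W + b)  # shape (n, d)
        fused = fused + gate * x
    return fused
```

Under these assumptions, equal logits reduce the first function to a plain average of the levels, and a strongly negative gate bias drives a level's contribution in the second function toward zero.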

Publication
IEICE TRANSACTIONS on Information Vol.E106-D No.5 pp.581-589
Publication Date
2023/05/01
Publicized
2022/01/06
Online ISSN
1745-1361
DOI
10.1587/transinf.2022DLP0002
Type of Manuscript
Special Section PAPER (Special Section on Deep Learning Technologies: Architecture, Optimization, Techniques, and Applications)
Category
Core Methods

Authors

Huimin LI
  Shanghai Maritime University
Dezhi HAN
  Shanghai Maritime University
Chongqing CHEN
  Shanghai Maritime University
Chin-Chen CHANG
  Feng Chia University
Kuan-Ching LI
  Providence University
Dun LI
  Shanghai Maritime University

Keyword