Image Captioning Algorithm Based on Multi-Branch CNN and Bi-LSTM

Shan HE; Yuanyao LU; Shengnan CHEN

doi:10.1587/transinf.2020EDP7227

Summary:

The development of deep learning and neural networks has opened broad prospects for computer vision and natural language processing. The image captioning task combines cutting-edge methods from these two fields. Building an end-to-end encoder-decoder model can greatly improve captioning performance. In this paper, a multi-branch deep convolutional neural network is used as the encoder to extract image features, and a recurrent neural network is used to generate descriptive text that matches the input image. We conducted experiments on the Flickr8k, Flickr30k and MSCOCO datasets. According to the evaluation metrics, the proposed model can effectively perform image captioning, and its performance is better than that of classic image captioning models such as neural image annotation models.
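
The data flow described in the summary (several encoder branches whose features are combined, followed by a bidirectional recurrent pass) might be sketched roughly as follows. This is only a toy illustration, not the paper's model: the random linear maps stand in for CNN branches, a plain tanh recurrent cell stands in for the LSTM cell, and every name, size, and the choice to append the image features to each word embedding are assumptions made for the example.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the toy run is repeatable

def rand_matrix(rows, cols):
    """Small random weight matrix (hypothetical, untrained parameters)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    """Matrix-vector product on plain Python lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def encode(image_vec, branch_weights):
    """Run each branch (linear map + ReLU, a stand-in for a CNN branch)
    on the same input and concatenate the branch outputs."""
    feats = []
    for w in branch_weights:
        feats.extend(max(0.0, z) for z in matvec(w, image_vec))
    return feats

def recurrent_pass(inputs, w_in, w_rec):
    """Simple tanh recurrent cell (a stand-in for an LSTM cell)."""
    h = [0.0] * len(w_rec)
    states = []
    for x in inputs:
        h = [math.tanh(a + b) for a, b in zip(matvec(w_in, x), matvec(w_rec, h))]
        states.append(h)
    return states

def bidirectional_pass(inputs, fwd, bwd):
    """Read the sequence left-to-right and right-to-left, then
    concatenate the two hidden states at every time step."""
    forward = recurrent_pass(inputs, *fwd)
    backward = recurrent_pass(inputs[::-1], *bwd)[::-1]
    return [f + b for f, b in zip(forward, backward)]

# Toy sizes, chosen only for illustration.
IMAGE_DIM, BRANCH_DIM, NUM_BRANCHES = 16, 4, 3
EMBED_DIM, HIDDEN_DIM, SEQ_LEN = 5, 6, 7

branch_weights = [rand_matrix(BRANCH_DIM, IMAGE_DIM) for _ in range(NUM_BRANCHES)]
image_vec = [rng.random() for _ in range(IMAGE_DIM)]
image_feats = encode(image_vec, branch_weights)

# Assumption: image features are appended to every word embedding.
word_embeddings = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)]
                   for _ in range(SEQ_LEN)]
step_inputs = [emb + image_feats for emb in word_embeddings]
in_dim = EMBED_DIM + len(image_feats)

fwd = (rand_matrix(HIDDEN_DIM, in_dim), rand_matrix(HIDDEN_DIM, HIDDEN_DIM))
bwd = (rand_matrix(HIDDEN_DIM, in_dim), rand_matrix(HIDDEN_DIM, HIDDEN_DIM))
states = bidirectional_pass(step_inputs, fwd, bwd)

print(len(image_feats), len(states), len(states[0]))  # prints: 12 7 12
```

A real captioning model would add a trained vocabulary projection and a decoding loop on top of these states; the sketch only shows how multi-branch features and the two reading directions are combined.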

Publication: IEICE TRANSACTIONS on Information Vol.E104-D No.7 pp.941-947

Publication Date: 2021/07/01

Publicized: 2021/04/19

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2020EDP7227

Type of Manuscript: PAPER

Category: Artificial Intelligence, Data Mining

Authors

Shan HE
  North China University of Technology
Yuanyao LU
  North China University of Technology
Shengnan CHEN
  North China University of Technology

Keywords

image captioning, multi-branch CNN, Bi-LSTM, encoder-decoder

Cite this

Shan HE, Yuanyao LU, Shengnan CHEN, "Image Captioning Algorithm Based on Multi-Branch CNN and Bi-LSTM" in IEICE TRANSACTIONS on Information, vol. E104-D, no. 7, pp. 941-947, July 2021, doi: 10.1587/transinf.2020EDP7227.
Abstract: The development of deep learning and neural networks has brought broad prospects to computer vision and natural language processing. The image captioning task combines cutting-edge methods in two fields. By building an end-to-end encoder-decoder model, its description performance can be greatly improved. In this paper, the multi-branch deep convolutional neural network is used as the encoder to extract image features, and the recurrent neural network is used to generate descriptive text that matches the input image. We conducted experiments on Flickr8k, Flickr30k and MSCOCO datasets. According to the analysis of the experimental results on evaluation metrics, the model proposed in this paper can effectively achieve image caption, and its performance is better than classic image captioning models such as neural image annotation models.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2020EDP7227/_p

@ARTICLE{e104-d_7_941,
author={He, Shan and Lu, Yuanyao and Chen, Shengnan},
journal={IEICE TRANSACTIONS on Information},
title={Image Captioning Algorithm Based on Multi-Branch CNN and Bi-LSTM},
year={2021},
volume={E104-D},
number={7},
pages={941--947},
abstract={The development of deep learning and neural networks has brought broad prospects to computer vision and natural language processing. The image captioning task combines cutting-edge methods in two fields. By building an end-to-end encoder-decoder model, its description performance can be greatly improved. In this paper, the multi-branch deep convolutional neural network is used as the encoder to extract image features, and the recurrent neural network is used to generate descriptive text that matches the input image. We conducted experiments on Flickr8k, Flickr30k and MSCOCO datasets. According to the analysis of the experimental results on evaluation metrics, the model proposed in this paper can effectively achieve image caption, and its performance is better than classic image captioning models such as neural image annotation models.},
keywords={image captioning, multi-branch CNN, Bi-LSTM, encoder-decoder},
doi={10.1587/transinf.2020EDP7227},
ISSN={1745-1361},
month={July},
}

TY  - JOUR
TI  - Image Captioning Algorithm Based on Multi-Branch CNN and Bi-LSTM
T2  - IEICE TRANSACTIONS on Information
SP  - 941
EP  - 947
AU  - He, Shan
AU  - Lu, Yuanyao
AU  - Chen, Shengnan
PY  - 2021
DO  - 10.1587/transinf.2020EDP7227
JO  - IEICE TRANSACTIONS on Information
SN  - 1745-1361
VL  - E104-D
IS  - 7
JA  - IEICE TRANSACTIONS on Information
Y1  - July 2021
AB  - The development of deep learning and neural networks has brought broad prospects to computer vision and natural language processing. The image captioning task combines cutting-edge methods in two fields. By building an end-to-end encoder-decoder model, its description performance can be greatly improved. In this paper, the multi-branch deep convolutional neural network is used as the encoder to extract image features, and the recurrent neural network is used to generate descriptive text that matches the input image. We conducted experiments on Flickr8k, Flickr30k and MSCOCO datasets. According to the analysis of the experimental results on evaluation metrics, the model proposed in this paper can effectively achieve image caption, and its performance is better than classic image captioning models such as neural image annotation models.
ER  - 