The development of deep learning and neural networks has brought broad prospects to computer vision and natural language processing. The image captioning task combines cutting-edge methods from these two fields: by building an end-to-end encoder-decoder model, description performance can be greatly improved. In this paper, a multi-branch deep convolutional neural network is used as the encoder to extract image features, and a recurrent neural network is used to generate descriptive text that matches the input image. We conducted experiments on the Flickr8k, Flickr30k, and MSCOCO datasets. According to the analysis of the experimental results on standard evaluation metrics, the model proposed in this paper can effectively generate image captions, and its performance is better than that of classic image captioning models such as neural image annotation models.
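The abstract describes an encoder-decoder pipeline: a multi-branch CNN encodes the image into a feature vector, and a recurrent decoder emits caption tokens. The toy sketch below illustrates only that data flow; the branch maps, the single-step recurrence, the vocabulary, and all dimensions are illustrative assumptions, not the authors' actual multi-branch CNN or Bi-LSTM.

```python
import numpy as np

rng = np.random.default_rng(0)

def branch_features(image, weights):
    """One 'branch' of the encoder: a linear map + ReLU stands in
    for a deep CNN branch (illustrative only)."""
    return np.maximum(image @ weights, 0.0)

def encode(image, branch_weights):
    """Multi-branch encoder: run each branch and concatenate the features."""
    return np.concatenate([branch_features(image, w) for w in branch_weights])

def decode_greedy(feat, W_h, W_out, vocab, max_len=5):
    """Greedy recurrent decoder: a single tanh recurrence stands in
    for the paper's Bi-LSTM."""
    h = np.tanh(feat @ W_h)           # initialise hidden state from image features
    caption = []
    for _ in range(max_len):
        logits = h @ W_out
        tok = int(np.argmax(logits))  # greedy token choice
        caption.append(vocab[tok])
        if vocab[tok] == "<eos>":
            break
        h = np.tanh(h + 0.1)          # trivial state update, for illustration only
    return caption

# Fake image vector and weights; a real model would learn these end to end.
vocab = ["a", "dog", "runs", "<eos>"]
image = rng.standard_normal(8)
branch_weights = [rng.standard_normal((8, 4)) for _ in range(2)]
feat = encode(image, branch_weights)   # two 4-d branches -> 8-d feature vector
W_h = rng.standard_normal((8, 6))
W_out = rng.standard_normal((6, len(vocab)))
print(decode_greedy(feat, W_h, W_out, vocab))
```

The point of the sketch is the shape of the computation: branch outputs are fused by concatenation before decoding, which is what distinguishes a multi-branch encoder from a single backbone.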
Shan HE
North China University of Technology
Yuanyao LU
North China University of Technology
Shengnan CHEN
North China University of Technology
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Shan HE, Yuanyao LU, Shengnan CHEN, "Image Captioning Algorithm Based on Multi-Branch CNN and Bi-LSTM" in IEICE TRANSACTIONS on Information,
vol. E104-D, no. 7, pp. 941-947, July 2021, doi: 10.1587/transinf.2020EDP7227.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2020EDP7227/_p
@ARTICLE{e104-d_7_941,
author={Shan HE and Yuanyao LU and Shengnan CHEN},
journal={IEICE TRANSACTIONS on Information},
title={Image Captioning Algorithm Based on Multi-Branch CNN and Bi-LSTM},
year={2021},
volume={E104-D},
number={7},
pages={941-947},
abstract={The development of deep learning and neural networks has brought broad prospects to computer vision and natural language processing. The image captioning task combines cutting-edge methods from these two fields: by building an end-to-end encoder-decoder model, description performance can be greatly improved. In this paper, a multi-branch deep convolutional neural network is used as the encoder to extract image features, and a recurrent neural network is used to generate descriptive text that matches the input image. We conducted experiments on the Flickr8k, Flickr30k, and MSCOCO datasets. According to the analysis of the experimental results on standard evaluation metrics, the model proposed in this paper can effectively generate image captions, and its performance is better than that of classic image captioning models such as neural image annotation models.},
keywords={},
doi={10.1587/transinf.2020EDP7227},
ISSN={1745-1361},
month={July},
}
TY - JOUR
TI - Image Captioning Algorithm Based on Multi-Branch CNN and Bi-LSTM
T2 - IEICE TRANSACTIONS on Information
SP - 941
EP - 947
AU - Shan HE
AU - Yuanyao LU
AU - Shengnan CHEN
PY - 2021
DO - 10.1587/transinf.2020EDP7227
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E104-D
IS - 7
JA - IEICE TRANSACTIONS on Information
Y1 - 2021/07//
AB - The development of deep learning and neural networks has brought broad prospects to computer vision and natural language processing. The image captioning task combines cutting-edge methods from these two fields: by building an end-to-end encoder-decoder model, description performance can be greatly improved. In this paper, a multi-branch deep convolutional neural network is used as the encoder to extract image features, and a recurrent neural network is used to generate descriptive text that matches the input image. We conducted experiments on the Flickr8k, Flickr30k, and MSCOCO datasets. According to the analysis of the experimental results on standard evaluation metrics, the model proposed in this paper can effectively generate image captions, and its performance is better than that of classic image captioning models such as neural image annotation models.
ER -