
IEICE TRANSACTIONS on Information

Effectively Utilizing the Category Labels for Image Captioning

Junlong FENG, Jianping ZHAO

Summary:

As a further investigation of the image captioning task, some works have extended vision-text datasets for specific subtasks, such as stylized caption generation. The corpora in such datasets are usually composed of obvious sentiment-bearing words. However, in some special cases, the captions are classified according to image category. This leads to a latent problem: the generated sentences are close in semantic meaning but belong to different or even opposite categories. Exploring an effective way to utilize the image category label to enhance the caption difference is therefore a worthwhile issue. In this paper, we propose an image captioning network with a label control mechanism (LCNET). First, to further improve the caption difference, LCNET employs a semantic enhancement module to provide the decoder with global semantic vectors. Then, through the proposed label control LSTM, LCNET dynamically modulates caption generation according to the image category labels. Finally, the decoder integrates the spatial image features with the global semantic vectors to output the caption. Evaluation on all the standard metrics shows that our model outperforms the compared models. Caption analysis demonstrates that our approach improves semantic representation. Compared with other label control mechanisms, our model is able to enhance the caption difference according to the labels while keeping better consistency with the image content.
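The abstract does not give the exact formulation of the label control LSTM, but the general idea of conditioning a decoding step on a category label and a global semantic vector can be sketched as follows. This is a minimal, hypothetical PyTorch sketch, not the authors' implementation: the class name, the gating of the semantic vector by a label embedding, and all dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LabelControlDecoderStep(nn.Module):
    """Hypothetical decoding step: the image category label gates how much the
    global semantic vector contributes to the LSTM input at each time step."""

    def __init__(self, vocab_size, embed_dim, feat_dim, sem_dim, label_count, hidden_dim):
        super().__init__()
        self.word_embed = nn.Embedding(vocab_size, embed_dim)     # previous word
        self.label_embed = nn.Embedding(label_count, sem_dim)     # category label
        self.lstm = nn.LSTMCell(embed_dim + feat_dim + sem_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, prev_word, image_feat, semantic_vec, label, state):
        # Label-dependent gate over the global semantic vector (assumed mechanism).
        gate = torch.sigmoid(self.label_embed(label))              # (B, sem_dim)
        controlled_sem = gate * semantic_vec                       # (B, sem_dim)
        # Fuse word embedding, spatial/pooled image feature, and gated semantics.
        x = torch.cat([self.word_embed(prev_word), image_feat, controlled_sem], dim=1)
        h, c = self.lstm(x, state)
        return self.out(h), (h, c)                                 # logits, next state


# Illustrative usage with made-up sizes:
step = LabelControlDecoderStep(vocab_size=10000, embed_dim=300, feat_dim=2048,
                               sem_dim=512, label_count=5, hidden_dim=512)
batch = 4
state = (torch.zeros(batch, 512), torch.zeros(batch, 512))
logits, state = step(torch.zeros(batch, dtype=torch.long),
                     torch.randn(batch, 2048), torch.randn(batch, 512),
                     torch.randint(0, 5, (batch,)), state)
```

The design choice illustrated here is that the same decoder weights are reused for every category, with the label only modulating the semantic input, so captions for different categories can diverge without training separate decoders; the paper's actual mechanism may differ.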

Publication
IEICE TRANSACTIONS on Information Vol.E106-D No.5 pp.617-624
Publication Date
2023/05/01
Publicized
2021/12/13
Online ISSN
1745-1361
DOI
10.1587/transinf.2022DLP0013
Type of Manuscript
Special Section PAPER (Special Section on Deep Learning Technologies: Architecture, Optimization, Techniques, and Applications)
Category
Core Methods

Authors

Junlong FENG
  Changchun University of Science and Technology
Jianping ZHAO
  Changchun University of Science and Technology
