Image Caption Generation Model with Visual Attention and Dynamic Semantic Information Guiding

doi:10.3778/j.issn.1673-9418.1704047

Journal of Frontiers of Computer Science and Technology, 2017, Vol. 11, Issue (12): 2033-2040. DOI: 10.3778/j.issn.1673-9418.1704047

ZHANG Wei+, ZHOU Zhiping

Engineering Research Center of Internet of Things Technology Applications of Ministry of Education, Jiangnan University, Wuxi, Jiangsu 214122, China

Online: 2017-12-01; Published: 2017-12-07

Abstract

To address the problem that current image caption generation models describe the details of objects in an image inadequately, this paper proposes an image captioning model that combines dynamic semantic guidance from the image with an adaptive attention mechanism. Based on the word predicted at the previous time step, the attention mechanism adaptively selects the image region to be processed at the next time step. In addition, the model builds dense attribute information for the image as additional supervision, so that it can describe image content by combining the image's semantic information with the attention information. Training and testing were carried out on the Flickr8K and Flickr30K datasets, and results under several evaluation metrics show that the proposed model performs well. In particular, compared with the Guiding-Long Short-Term Memory model, the scores increase by 4.1, 1.8, 2.4, 0.8, and 3.1, which are relative gains of 6.3%, 4.0%, 7.9%, 3.9%, and 17.3%. Compared with Soft-Attention, the scores increase by 1.9, 2.4, 3.3, 1.5, and 2.74, which are relative gains of 2.8%, 5.5%, 11.1%, 7.5%, and 14.8%.

Key words: image caption generation, image description, deep neural networks, visual attention mechanism, semantic information
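
The attention step described in the abstract can be illustrated with a minimal sketch of generic soft attention over image regions. This is not the authors' implementation: the abstract does not give the scoring function, so the dot-product score, the function name `soft_attention`, and the toy feature vectors below are all assumptions chosen for illustration.

```python
import math

def soft_attention(region_features, hidden_state):
    """Return (weights, context) for a generic dot-product soft attention.

    region_features: list of equal-length feature vectors, one per image region.
    hidden_state: decoder state vector of the same length; it stands in for the
    information carried over from the previously predicted word (assumption).
    """
    # Relevance score of each region: dot product with the decoder state.
    scores = [sum(f * h for f, h in zip(feat, hidden_state))
              for feat in region_features]
    # Numerically stable softmax turns scores into weights that sum to 1.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted average of the region features.
    dim = len(hidden_state)
    context = [sum(w * feat[d] for w, feat in zip(weights, region_features))
               for d in range(dim)]
    return weights, context

# Hypothetical toy inputs: three image regions with 2-dimensional features.
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
state = [2.0, 0.0]
weights, context = soft_attention(regions, state)
```

In this toy case the first region aligns best with the decoder state, so it receives the largest weight and dominates the context vector passed to the next decoding step.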

ZHANG Wei, ZHOU Zhiping. Image Caption Generation Model with Visual Attention and Dynamic Semantic Information Guiding[J]. Journal of Frontiers of Computer Science and Technology, 2017, 11(12): 2033-2040.
