Image Caption Generation Model with Visual Attention and Dynamic Semantic Information Guiding

doi:10.3778/j.issn.1673-9418.1704047

Journal of Frontiers of Computer Science and Technology ›› 2017, Vol. 11 ›› Issue (12): 2033-2040. DOI: 10.3778/j.issn.1673-9418.1704047

• Artificial Intelligence and Pattern Recognition •

ZHANG Wei+, ZHOU Zhiping

Engineering Research Center of Internet of Things Technology Applications of Ministry of Education, Jiangnan University, Wuxi, Jiangsu 214122, China

Online: 2017-12-01 Published: 2017-12-07

Abstract

Abstract: Current image caption generation models do not adequately describe the details of objects in an image. To address this problem, this paper proposes an image captioning model that combines dynamic semantic guidance with an adaptive attention mechanism. The model predicts the word at the next time step from the information at the previous time step, and the adaptive attention mechanism selects the image region that the model should process at the next time step. In addition, the model builds dense attribute information for the image as additional supervision, so that it can describe image content by combining the image's semantic information with the attention information. The model is trained and tested on the Flickr8K and Flickr30K datasets and evaluated with several different metrics. The experimental results show a considerable improvement in performance. Compared with the Guiding-Long Short-Term Memory model, the scores increase by 4.1, 1.8, 2.4, 0.8 and 3.1, which are relative gains of 6.3%, 4.0%, 7.9%, 3.9% and 17.3%. Compared with Soft-Attention, the scores increase by 1.9, 2.4, 3.3, 1.5 and 2.74, which are relative gains of 2.8%, 5.5%, 11.1%, 7.5% and 14.8%.

Key words: image caption generation, image description, deep neural networks, visual attention mechanism, semantic information
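The attention step mentioned in the abstract (choosing which image region to process at the next time step) can be illustrated with a generic soft-attention sketch. This is not the authors' implementation: the dot-product scoring function, the function names `softmax` and `attend`, and the toy feature vectors are all assumptions made for illustration, since the abstract does not specify how regions are scored.

```python
import math


def softmax(scores):
    """Turn raw relevance scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(region_features, hidden_state):
    """Weight image regions by their relevance to the decoder state.

    Assumption: relevance is a plain dot product between each region's
    feature vector and the previous hidden state. The paper's actual
    scoring function is not given in the abstract.
    """
    scores = [
        sum(f * h for f, h in zip(region, hidden_state))
        for region in region_features
    ]
    weights = softmax(scores)
    dim = len(hidden_state)
    # Context vector: attention-weighted average of the region features.
    context = [
        sum(w * region[d] for w, region in zip(weights, region_features))
        for d in range(dim)
    ]
    return weights, context


# Hypothetical toy input: three image regions with 2-dimensional features.
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
hidden = [2.0, 0.0]  # hypothetical decoder state from the previous step
weights, context = attend(regions, hidden)
print(weights, context)
```

In this toy input the first region aligns best with the hidden state, so it receives the largest weight. In a captioning model the resulting context vector would be fed to the decoder, together with any semantic guidance, to predict the next word.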

ZHANG Wei, ZHOU Zhiping. Image Caption Generation Model with Visual Attention and Dynamic Semantic Information Guiding[J]. Journal of Frontiers of Computer Science and Technology, 2017, 11(12): 2033-2040.


