计算机科学与探索 (Journal of Frontiers of Computer Science and Technology), 2021, Vol. 15, Issue (5): 922-930. DOI: 10.3778/j.issn.1673-9418.2006062

• Artificial Intelligence •

Object Feature Based Deep Hashing for Cross-Modal Retrieval

ZHU Jie, BAI Hongyu, ZHANG Zhongyu, XIE Bojun, ZHANG Junsan   

1. Department of Information Management, The National Police University for Criminal Justice, Baoding, Hebei 071000, China
    2. College of Mathematics and Information Science, Hebei University, Baoding, Hebei 071002, China
    3. College of Computer Science and Technology, China University of Petroleum, Qingdao, Shandong 266580, China
  • Online: 2021-05-01  Published: 2021-04-30

Abstract:

With the rapid growth of data of different modalities on the Internet, cross-modal retrieval has gradually become a hot research topic. Owing to their efficiency and effectiveness, hashing-based methods have become one of the most popular strategies for large-scale cross-modal retrieval. Most deep image-text cross-modal retrieval methods are designed to make the deep features of an image similar to the deep features of its corresponding text. However, such methods incorporate the background information of images into feature learning, which degrades retrieval performance. To solve this problem, OFBDH (object feature based deep hashing) is proposed. It learns optimized, discriminative maximum activations of convolutions from the feature maps as object features, and integrates them into image-text cross-modal network learning. Experimental results show that OFBDH obtains satisfactory cross-modal retrieval results on the MIRFLICKR-25K, IAPR TC-12 and NUS-WIDE datasets.
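The core operation the abstract alludes to, taking the maximum activation of each convolutional feature map as a background-suppressing object descriptor (a MAC-style feature), can be sketched as follows. This is a minimal illustrative sketch, assuming a VGG-16 backbone, a 64-bit hash length, and a tanh relaxation; the paper's actual network architecture, cross-modal loss and training procedure are not reproduced here.

    import torch
    import torch.nn as nn
    import torchvision.models as models

    class MACObjectFeature(nn.Module):
        """Sketch: maximum activations of convolutions (MAC) as object
        features, projected to a relaxed binary hash code. The backbone,
        hash length and tanh relaxation are illustrative assumptions."""

        def __init__(self, hash_bits=64):
            super().__init__()
            backbone = models.vgg16(weights=models.VGG16_Weights.DEFAULT)
            self.features = backbone.features            # conv maps (B, 512, H, W)
            self.hash_layer = nn.Linear(512, hash_bits)  # MAC vector -> hash logits

        def forward(self, x):
            fmap = self.features(x)                      # (B, 512, H, W)
            # MAC: spatial max per channel keeps only the strongest
            # activation, suppressing low-response background regions.
            mac = torch.amax(fmap, dim=(2, 3))           # (B, 512)
            return torch.tanh(self.hash_layer(mac))      # relaxed codes in (-1, 1)

    model = MACObjectFeature(hash_bits=64)
    codes = torch.sign(model(torch.randn(2, 3, 224, 224)))  # binarize for retrieval

At retrieval time the relaxed codes would be binarized with a sign function, as above, before Hamming-distance comparison against the codes of the other modality.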

Key words: object feature, cross-modal loss, network parameter learning, retrieval