计算机科学与探索 (Journal of Frontiers of Computer Science and Technology), 2021, Vol. 15, Issue (8): 1390-1404. DOI: 10.3778/j.issn.1673-9418.2101092

• Survey · Exploration •


Literature Review of Cross-Modal Retrieval Research

CHEN Ning, DUAN Youxiang, SUN Qifeng   

  1. School of Computer Science and Technology, China University of Petroleum (East China), Qingdao, Shandong 266580, China
  • Online: 2021-08-01  Published: 2021-08-02


Abstract:

With the vigorous development of Internet technology and the popularization of smart devices, multimedia data are not only growing explosively in volume but also becoming increasingly diverse in form. People's information needs are no longer satisfied by single-modal data retrieval, and realizing cross-modal retrieval through knowledge collaboration across different modalities has become a research hotspot in recent years. Building on an in-depth analysis of the research background and progress of cross-modal retrieval, and taking its key technique, common subspace modeling, as the main thread, this paper surveys the three major families of cross-modal retrieval methods: traditional statistical analysis, deep learning, and hash learning. It compares these families from multiple angles in terms of research content, key techniques, limitations, applicability and characteristics, and conducts experiments for a more in-depth comparison. Finally, the open problems in cross-modal retrieval, future research directions, mainstream design ideas of recent years and development trends are discussed, providing a theoretical basis for further research.

Key words: cross-modal retrieval, multimedia data, knowledge collaboration, common subspace