Journal of Frontiers of Computer Science and Technology ›› 2019, Vol. 13 ›› Issue (1): 97-105. DOI: 10.3778/j.issn.1673-9418.1801016

• Artificial Intelligence and Pattern Recognition •

Zero-Shot Multi-Label Image Classification Based on Deep Instance Differentiation

JI Zhong, LI Huihui, HE Yuqing+

  1. School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China
  • Online: 2019-01-01 Published: 2019-01-09

Abstract: Zero-shot multi-label image classification aims to tag images with multiple labels when the test labels have no corresponding training samples. Previous studies show that the categories of multi-label images are correlated, and that making sound use of these label relationships is key to multi-label image classification. The key problems in zero-shot multi-label classification are therefore how to transfer a model from seen classes to unseen classes and how to exploit label correlations to classify the unseen classes. A deep instance differentiation algorithm is proposed for this challenging learning task. Specifically, a deep embedding network first maps images from the visual feature space to the label semantic feature space (a cross-modal mapping), and an instance differentiation algorithm then performs multi-label classification in the semantic space. Comparative experiments against existing algorithms on the widely used Natural Scene and IAPRTC-12 datasets verify that the proposed method is advanced and effective, and also verify the advantage of the embedding network.

Key words: zero-shot learning, multi-label classification, cross-modal mapping, multi-instance learning
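
The two-stage idea in the abstract (map visual features into the label semantic space, then classify there) can be illustrated with a minimal sketch. It is not the authors' method: a ridge-regression linear map stands in for the deep embedding network, and plain cosine scoring against unseen-label vectors stands in for the instance differentiation step. All names (`fit_linear_embedding`, `score_unseen_labels`) and the toy data, including the 3-dimensional "semantic" vectors, are hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_linear_embedding(X, S, reg=1e-3):
    """Ridge regression: learn W so that X @ W approximates the semantic targets S.

    X: (n, d) visual features; S: (n, k) semantic targets.
    Assumption: a linear map replaces the paper's deep embedding network.
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ S)

def score_unseen_labels(x, W, unseen_vecs):
    """Project one image into semantic space and return its cosine
    similarity to each unseen label vector (assumed scoring rule)."""
    z = x @ W
    z = z / (np.linalg.norm(z) + 1e-12)
    U = unseen_vecs / (np.linalg.norm(unseen_vecs, axis=1, keepdims=True) + 1e-12)
    return U @ z

# Toy semantic space: 3 seen labels and 2 unseen labels (hypothetical vectors).
seen_vecs = np.eye(3)
unseen_vecs = np.array([[1.0, 1.0, 0.0],
                        [0.0, 1.0, 1.0]])

# Multi-label training annotations over the seen labels (one row per image).
Y = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1],
              [1, 1, 0], [0, 1, 1], [1, 0, 1]], dtype=float)

# Semantic target of an image: mean vector of its seen labels.
S = (Y @ seen_vecs) / Y.sum(axis=1, keepdims=True)

# Synthetic visual features: a fixed linear function of the semantic targets.
A = np.array([[1.0, 0.0, 0.0, 1.0],
              [0.0, 1.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 1.0]])
X = S @ A

W = fit_linear_embedding(X, S)

# A test image generated from the second unseen label.
x_test = unseen_vecs[1] @ A
scores = score_unseen_labels(x_test, W, unseen_vecs)
```

In a multi-label setting one would keep every unseen label whose score exceeds a threshold rather than only the top one; how that threshold is chosen is left open here.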