Journal of Frontiers of Computer Science and Technology ›› 2015, Vol. 9 ›› Issue (11): 1351-1361. DOI: 10.3778/j.issn.1673-9418.1503017

• Artificial Intelligence and Pattern Recognition •

Product Image Sentence Annotation Based on Multiple Kernel Learning

ZHANG Hongbin1,2+, JI Donghong1, REN Yafeng1, YIN Lan1   

  1. Computer School, Wuhan University, Wuhan 430072, China
  2. School of Software, East China Jiaotong University, Nanchang 330013, China
  • Online: 2015-11-01 Published: 2015-11-03

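Abstract (translated from Chinese): Sentences carry rich semantic information; annotating product images with sentences can accurately characterize products and improve information retrieval accuracy. Existing sentence annotation methods for product images suffer from insufficient feature learning and a single form of feature representation. To address these problems, feature learning based on efficient match kernels (EMK) is proposed: shape kernel features with better discriminative power are extracted to describe product images, and shape, texture, gradient and other image features are combined within a multiple kernel learning model into a multiple kernel feature (MKF), which enriches the feature representation and better explains the shape and texture characteristics of the images. Image classification is then performed based on MKF, and key texts are retrieved to annotate the product images. Experiments show that MKF achieves the best image classification accuracy, and that product images with distinct texture or shape characteristics obtain better MAP (mean average precision) values. BLEU (bilingual evaluation understudy) scores further show that the semantic information in the annotated sentences is close to the content of the product images, and that the sentences are more coherent and readable, giving the method high practical value.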

Abstract: Sentences carry rich semantic information, so annotating product images with sentences can describe product characteristics accurately and improve information retrieval performance. However, current sentence annotation methods suffer from insufficient feature learning and overly simple feature representations. To address these problems, image feature learning is performed with EMK (efficient match kernels), and a shape EMK feature with stronger discriminative ability is extracted to describe the product image. Shape, texture and gradient features are then fused by multiple kernel learning into a new feature named MKF (multiple kernel feature), which captures the shape and texture characteristics of product images well. Finally, product images are classified based on MKF, and key texts are retrieved to annotate them. Experimental results show that MKF achieves the best classification performance, and that products with distinct shape and texture characteristics obtain better MAP (mean average precision) values. The BLEU (bilingual evaluation understudy) scores of sentences generated by the MK-SVM model exceed those of state-of-the-art baselines. Moreover, the semantic information in the generated sentences is close to the content of the product images, and the sentences are more coherent and readable than those produced by traditional models, which indicates high practical value.
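
The kernel fusion described in the abstract can be illustrated with a minimal sketch. It assumes one RBF base kernel per feature type and fixed, hand-picked weights; the abstract does not specify the base kernels, and in multiple kernel learning proper the weights are learned jointly with the classifier. The function names, feature values and weights below are hypothetical and are not taken from the paper.

```python
import math


def rbf_gram(features, gamma):
    """Gram matrix of an RBF kernel over a list of feature vectors."""
    n = len(features)
    gram = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
            gram[i][j] = math.exp(-gamma * sq_dist)
    return gram


def combine_kernels(grams, weights):
    """Convex combination K = sum_m beta_m * K_m of base Gram matrices."""
    if len(grams) != len(weights):
        raise ValueError("one weight per base kernel is required")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(grams[0])
    return [
        [sum(w * g[i][j] for g, w in zip(grams, weights)) for j in range(n)]
        for i in range(n)
    ]


# Toy descriptors for three images (hypothetical values, not real EMK features).
shape = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1]]
texture = [[0.5, 0.5], [0.6, 0.4], [0.0, 1.0]]
gradient = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]

grams = [rbf_gram(f, gamma=1.0) for f in (shape, texture, gradient)]

# Fixed weights stand in for the coefficients an MKL solver would learn.
mkf_gram = combine_kernels(grams, weights=[0.5, 0.3, 0.2])
```

The combined Gram matrix could then be passed to any kernel classifier that accepts a precomputed kernel. Images 0 and 1 are close in all three feature spaces, so their fused similarity is higher than that of images 0 and 2.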

Key words: multiple kernel learning, efficient match kernels, product image, sentence annotation, natural language generation
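
For readers unfamiliar with the MAP metric cited in the abstract, the following minimal sketch computes it under the common information-retrieval definition. It assumes each query yields a ranked list of binary relevance flags and that every relevant item appears somewhere in that list; the paper's exact evaluation protocol is not given in the abstract, and the function names and toy rankings are hypothetical.

```python
def average_precision(ranked_relevance):
    """Average precision for one ranked list of 0/1 relevance flags."""
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            # Precision at this rank, counted only at relevant positions.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(queries):
    """Mean of the per-query average precision values."""
    if not queries:
        raise ValueError("at least one query is required")
    return sum(average_precision(q) for q in queries) / len(queries)


# Two toy queries: relevant items at ranks 1 and 3, then at rank 2.
toy_queries = [[1, 0, 1], [0, 1]]
map_score = mean_average_precision(toy_queries)
```

For the first toy query the average precision is (1/1 + 2/3) / 2, and for the second it is 1/2, so the mean over both queries is 2/3.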