Journal of Frontiers of Computer Science and Technology, 2011, Vol. 5, Issue (7): 577-587.

• Academic Research •

Discovering Images for Wikipedia Articles

SHOU Sicong¹, YAO Conglei², LI Xiaoming¹

  1. School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China
  2. HP Labs China, Beijing 100084, China
  • Online: 2011-07-01 Published: 2011-07-01

Abstract:

Wikipedia provides a large number of high-quality, human-edited articles on popular concepts in most domains, and an article illustrated with diverse images is more valuable than one with none. However, most Wikipedia articles have no images or only a few. This paper formulates the problem of discovering images for Wikipedia articles with high precision, high recall and high diversity, and proposes WIMAGE, a general framework that addresses it. WIMAGE comprises an approach that generates queries for the different paragraphs of each Wikipedia article, and two methods for ranking the retrieved images. The effectiveness of WIMAGE is evaluated on 40 Wikipedia articles from 4 popular Wikipedia categories. Experimental results show that WIMAGE discovers images for Wikipedia articles with high precision, high recall and high diversity, and that the ranking method that considers both visual similarity and text similarity performs best.

Key words: Wikipedia article, image discovery, diversity, image ranking
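The abstract states that the best-performing ranking method takes both visual similarity and text similarity into account, but it does not say how the two signals are computed or combined. The Python sketch below is hypothetical and is not WIMAGE's actual ranking method. It assumes a bag-of-words cosine similarity between a paragraph query and the text surrounding each image, a "visual consensus" score (cosine similarity of each image's feature vector to the centroid of all candidates), and a linear fusion weight `alpha`. The names `Candidate`, `rank`, `text_similarity`, and `alpha` are illustrative only.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one retrieved image."""
    image_id: str
    surrounding_text: str         # text found near the image on its source page
    visual_features: list[float]  # e.g. a color histogram; assumed fixed length


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def text_similarity(query: str, text: str) -> float:
    """Bag-of-words cosine similarity over lowercase whitespace tokens."""
    q, t = Counter(query.lower().split()), Counter(text.lower().split())
    vocab = sorted(set(q) | set(t))
    return cosine([q[w] for w in vocab], [t[w] for w in vocab])


def rank(query: str, candidates: list[Candidate], alpha: float = 0.5) -> list[tuple[str, float]]:
    """Rank by alpha * text score + (1 - alpha) * visual score (hypothetical fusion)."""
    if not candidates:
        return []
    # Visual-consensus assumption: images close to the centroid of all
    # candidates' features are treated as more likely to be relevant.
    dim = len(candidates[0].visual_features)
    centroid = [sum(c.visual_features[i] for c in candidates) / len(candidates) for i in range(dim)]
    scored = [
        (c.image_id,
         alpha * text_similarity(query, c.surrounding_text)
         + (1 - alpha) * cosine(c.visual_features, centroid))
        for c in candidates
    ]
    # Highest combined score first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    cands = [
        Candidate("a.jpg", "great wall of china at dusk", [0.9, 0.1, 0.0]),
        Candidate("b.jpg", "company logo", [0.0, 0.0, 1.0]),
        Candidate("c.jpg", "the great wall near beijing", [0.8, 0.2, 0.1]),
    ]
    for image_id, score in rank("great wall beijing", cands):
        print(image_id, round(score, 3))
```

Setting `alpha=1.0` reduces the sketch to a text-only ranking. This offers one simple way to contrast text-only ranking with combined text-and-visual ranking, under the assumptions above.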