Journal of Frontiers of Computer Science and Technology ›› 2013, Vol. 7 ›› Issue (6): 494-504. DOI: 10.3778/j.issn.1673-9418.1212007

• Academic Research •

PageRank Algorithm for Scientific Data Ranking

LI Jianhui1, LAN Jinsong1,2, SHEN Zhihong1, TENG Changyan1,2, ZHOU Yuanchun1+   

  1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China
  2. University of Chinese Academy of Sciences, Beijing 100190, China
  • Online: 2013-06-01 Published: 2013-05-30

Abstract: As scientific research advances, the volume of scientific data is growing rapidly. At this scale, data retrieval services become critical. However, traditional scientific data retrieval systems only match keywords between records and queries, which leads to poorly ranked results. To address this issue, this paper proposes a technique for extracting link information from structured scientific data and, on that basis, applies PageRank link analysis to scientific data ranking. The algorithm takes the importance of each scientific data resource into account during the ranking stage in order to obtain better ranking results. Experimental results on Voovle, a scientific data retrieval system, indicate that ranking combined with PageRank is more reasonable and better meets user needs.

Key words: scientific data, search engine, link extraction, PageRank
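
The abstract does not spell out how links are extracted or how PageRank is computed, so the following is only a minimal sketch of the general idea under stated assumptions. The record layout, the field names (`genus`, `family`), and the rule "a record links to the record its reference field names" are hypothetical and are not taken from the paper. The PageRank part is the standard power iteration with a damping factor of 0.85 and uniform redistribution of rank from records that have no outgoing links.

```python
def extract_links(records, ref_fields):
    """Return directed edges (source_id, target_id).

    Assumption: a record links to another record when one of its
    reference fields holds that record's id (a foreign-key-style link).
    """
    ids = {r["id"] for r in records}
    edges = set()
    for r in records:
        for field in ref_fields:
            target = r.get(field)
            if target in ids and target != r["id"]:
                edges.add((r["id"], target))
    return edges


def pagerank(nodes, edges, damping=0.85, tol=1e-10, max_iter=200):
    """Standard PageRank by power iteration; returns {node: score}."""
    nodes = list(nodes)
    n = len(nodes)
    out_links = {v: [] for v in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without outgoing links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}
        for src, targets in out_links.items():
            if not targets:
                continue
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical structured records: two species referencing one genus,
# which in turn references a family.
records = [
    {"id": "species:1", "genus": "genus:1"},
    {"id": "species:2", "genus": "genus:1"},
    {"id": "genus:1", "family": "family:1"},
    {"id": "family:1"},
]
edges = extract_links(records, ref_fields=["genus", "family"])
ranks = pagerank([r["id"] for r in records], edges)

# Records that are referenced more often, directly or indirectly, score higher.
for record_id in sorted(ranks, key=ranks.get, reverse=True):
    print(record_id, round(ranks[record_id], 4))
```

In a retrieval system, such a score would typically be combined with the keyword-match score at ranking time; how Voovle combines the two is not stated in the abstract.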