Journal of Frontiers of Computer Science and Technology ›› 2010, Vol. 4 ›› Issue (10): 890-898. DOI: 10.3778/j.issn.1673-9418.2010.10.003

• Academic Research •

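Efficient Method for Database Selection*

HUANG Weihuang+; LI Guoliang; FENG Jianhua

  

  1. Department of Computer Science and Technology, Tsinghua University, Beijing 100084
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2010-10-01 Published: 2010-10-01
  • Corresponding author: HUANG Weihuang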

Efficient Method for Database Selection*

HUANG Weihuang+; LI Guoliang; FENG Jianhua

  

  1. Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China

  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2010-10-01 Published: 2010-10-01
  • Contact: HUANG Weihuang

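Abstract: With the rapid development of keyword search techniques and the explosive growth of Internet data, efficient and accurate data source selection has become highly valuable. This paper proposes a data source selection method based on inverted lists. The method quickly selects the most relevant data sources and executes the search only within them, which reduces query time and improves the user's search experience. Experimental results show that the method works well in real systems such as air-ticket search systems. To implement the algorithms efficiently on large-scale data sets, the min-hash algorithm is applied to similarity estimation, which reduces the space and time cost of queries. A comparison with traditional algorithms shows that the min-hash algorithm achieves high accuracy while greatly reducing running time.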

Keywords: data source selection, keyword search, summary, min-hash algorithm
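
The abstract does not spell out how the inverted lists are built or scored, so the following Python snippet is only a minimal sketch of the general idea: one posting list per keyword, mapping to the data sources that contain it, and a ranking step that keeps the best-matching sources. The names `SOURCES`, `build_inverted_index`, and `select_sources`, the sample records, and the scoring rule (summed keyword counts over sources that contain every query keyword) are all illustrative assumptions, not the authors' method.

```python
from collections import defaultdict

# Hypothetical toy data: each data source is a list of text records.
SOURCES = {
    "airline_a": ["beijing shanghai morning", "beijing guangzhou evening"],
    "airline_b": ["shanghai shenzhen morning"],
    "airline_c": ["beijing shanghai evening", "beijing shanghai night"],
}


def build_inverted_index(sources):
    """Map each keyword to {source_id: number of occurrences in that source}."""
    index = defaultdict(dict)
    for source_id, records in sources.items():
        for record in records:
            for term in record.lower().split():
                index[term][source_id] = index[term].get(source_id, 0) + 1
    return index


def select_sources(index, query, top_k=2):
    """Return up to top_k sources that contain every query keyword.

    Assumed scoring rule: sum of the keyword counts, ties broken by source id.
    """
    terms = query.lower().split()
    if not terms:
        return []
    postings = [index.get(term, {}) for term in terms]
    # Only sources that appear in every posting list are candidates.
    candidates = set(postings[0]).intersection(*postings[1:])
    scored = sorted(
        ((sum(p[s] for p in postings), s) for s in candidates),
        key=lambda pair: (-pair[0], pair[1]),
    )
    return [source_id for _, source_id in scored[:top_k]]


index = build_inverted_index(SOURCES)
print(select_sources(index, "beijing shanghai"))  # ['airline_c', 'airline_a']
```

Because only the posting lists for the query keywords are touched, the selection step never scans the sources themselves; the actual search would then run only on the returned sources.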

Abstract: With the rapid growth and deployment of distributed databases over the Internet, new and efficient search methods over multiple structured data sources are needed. This paper proposes a keyword-search method for effective database selection using inverted lists. The method achieves interactive speed and thus improves the user experience. It has been implemented on air-ticket search systems, and experimental results show that it achieves high search performance. For large-scale data, a min-hash based algorithm is adopted to select highly relevant data sources, which improves performance while achieving high precision.

Key words: database selection, keyword search, database summary, min-hash based algorithm
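
The abstract states that min-hash is used for similarity estimation but gives no details. As a hedged illustration of the standard technique, the sketch below estimates the Jaccard similarity of two keyword sets from fixed-length min-hash signatures. The function names, the use of seeded BLAKE2b as the hash family, and the choice of 256 hash functions are assumptions made for this example, not details taken from the paper.

```python
import hashlib


def _hash(term, seed):
    """Deterministic 64-bit hash of a term under a given seed."""
    data = f"{seed}:{term}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(terms, num_hashes=256):
    """Keep, for each seeded hash function, the minimum hash over the set."""
    terms = set(terms)
    if not terms:
        raise ValueError("cannot build a signature for an empty set")
    return [min(_hash(t, seed) for t in terms) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree; estimates |A ∩ B| / |A ∪ B|."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def exact_jaccard(a, b):
    """Exact Jaccard similarity, for comparison with the estimate."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


set_a = {f"kw{i}" for i in range(0, 100)}
set_b = {f"kw{i}" for i in range(50, 150)}
estimate = estimate_jaccard(minhash_signature(set_a), minhash_signature(set_b))
print(f"estimated={estimate:.3f} exact={exact_jaccard(set_a, set_b):.3f}")
```

The signature has a fixed size regardless of how large the underlying keyword set is, which is where the space and time savings on large data sets would come from; more hash functions lower the estimation error at the cost of a longer signature.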
