Efficient Method for Database Selection*

Journal of Frontiers of Computer Science and Technology, 2010, Vol. 4, Issue 10: 890-898. DOI: 10.3778/j.issn.1673-9418.2010.10.003

HUANG Weihuang⁺; LI Guoliang; FENG Jianhua

Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China

Received: 1900-01-01 Revised: 1900-01-01 Online: 2010-10-01 Published: 2010-10-01
Contact: HUANG Weihuang

Abstract

With the rapid development of keyword search and the rapid growth of data on the Internet, efficient and accurate database selection has become important for searching over multiple structured data sources. This paper proposes a database selection method based on inverted lists. The method quickly selects the data sources most relevant to a keyword query and runs the search only on those sources, which reduces query time and improves the user experience. The method has been implemented on air-ticket search systems, and experimental results show that it performs well in such real systems. To implement the algorithms efficiently on large-scale data sets, the min-hash algorithm is applied to similarity estimation, which reduces the space and time cost of queries. A comparison with traditional algorithms shows that the min-hash algorithm achieves high precision and greatly reduces running time.

Key words: database selection, keyword search, database summary, min-hash based algorithm

CLC Number: TP311.133.1
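The abstract names two building blocks, inverted lists and min-hash similarity estimation, without giving their details. The following is only a minimal sketch of how the two could fit together, not the authors' algorithm. It assumes each data source is summarized by the min-hash signature of its keyword set, that the inverted list maps a keyword to the sources containing it, and that sources are ranked by estimated Jaccard similarity to the query. All names (`make_hashes`, `signature`, `build_summaries`, `select_sources`) and the sample data are hypothetical.

```python
import random
import zlib
from collections import defaultdict

PRIME = (1 << 61) - 1  # large prime modulus for the linear hash family


def make_hashes(k, seed=0):
    """Return k (a, b) pairs, each defining h(x) = (a * x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(k)]


def signature(terms, hashes):
    """Min-hash signature of a non-empty collection of keyword strings."""
    codes = [zlib.crc32(t.encode("utf-8")) for t in terms]
    return [min((a * c + b) % PRIME for c in codes) for a, b in hashes]


def estimate_jaccard(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of agreeing positions."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def build_inverted_list(sources):
    """Map each keyword to the set of source names that contain it."""
    inverted = defaultdict(set)
    for name, terms in sources.items():
        for term in terms:
            inverted[term].add(name)
    return inverted


def build_summaries(sources, hashes):
    """Precompute one signature per source (assumed summary format)."""
    return {name: signature(terms, hashes) for name, terms in sources.items()}


def select_sources(query, inverted, summaries, hashes, top_n=2):
    """Rank sources sharing a query keyword by estimated similarity."""
    candidates = set().union(*(inverted.get(t, set()) for t in query))
    if not candidates:
        return []
    q_sig = signature(query, hashes)
    scored = [(estimate_jaccard(q_sig, summaries[n]), n) for n in candidates]
    # Highest score first; ties are broken by source name.
    return [n for _, n in sorted(scored, key=lambda p: (-p[0], p[1]))[:top_n]]


if __name__ == "__main__":
    # Hypothetical sources, loosely modeled on an air-ticket search scenario.
    sources = {
        "flights": {"beijing", "shanghai", "flight", "ticket"},
        "hotels": {"beijing", "hotel", "room"},
        "books": {"novel", "author"},
    }
    hashes = make_hashes(64, seed=1)
    inverted = build_inverted_list(sources)
    summaries = build_summaries(sources, hashes)
    print(select_sources({"beijing", "flight"}, inverted, summaries, hashes))
```

In this sketch the inverted list prunes sources that share no keyword with the query, and only the fixed-length signatures are compared afterwards, so the cost per candidate depends on the signature length rather than on the size of the source. The scoring function used in the paper may differ.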

HUANG Weihuang, LI Guoliang, FENG Jianhua. Efficient Method for Database Selection[J]. Journal of Frontiers of Computer Science and Technology, 2010, 4(10): 890-898.
