Journal of Frontiers of Computer Science and Technology, 2016, Vol. 10, Issue (4): 495-503. DOI: 10.3778/j.issn.1673-9418.1506045

• Database Technology •

Skyline Query of Massive Incomplete Data Based on Combinational Dimensions

WANG Yan1,2, YIN Biao1, LIU Genghao1, SONG Baoyan1+, WANG Junlu1   

  1. School of Information, Liaoning University, Shenyang 110036, China
  2. School of Information Science and Engineering, Northeastern University, Shenyang 110819, China
  • Online: 2016-04-01 Published: 2016-04-01

Abstract: With the rapid development of the Internet, the Internet of Things, and other information technologies, multi-dimensional data are growing quickly, and these massive data sets often contain a large amount of incomplete data. Efficiently retrieving the approximate result set that users need from massive incomplete data is therefore an urgent problem. This paper proposes a Skyline query algorithm based on dimension combinations for massive, high-dimensional incomplete data sets. The algorithm builds a RankList data structure to improve query efficiency and to reduce the impact of incomplete data on the query results. It uses different combinations of dimensions to partition the query space into subspaces and progressively retrieves the highest-priority point of each subspace, thereby obtaining Skyline points that are uniformly distributed over the massive incomplete data set. Experimental results show that, compared with the Iskyline algorithm, the proposed algorithm improves query efficiency by 85% on average, and that it is more efficient than ordinary methods when the data volume is large and the dimensionality is high.

Key words: massive incomplete data, combinational dimensions, Skyline
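
The abstract does not define dominance over incomplete data or the layout of RankList, so the following is only a minimal sketch of the general setting, not the paper's algorithm. It assumes a convention that is common for Skyline queries over incomplete data: missing values are marked with `None`, smaller values are preferred, and two points are compared only on the dimensions that both of them have. The names `dominates`, `subspaces`, `project`, and `naive_skyline` are hypothetical helpers introduced for illustration.

```python
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

Point = Sequence[Optional[float]]  # None marks a missing value (assumption)


def dominates(p: Point, q: Point) -> bool:
    """Return True if p dominates q on the dimensions both points have.

    Assumed convention: smaller is better; points that share no observed
    dimension are incomparable.
    """
    common = [(a, b) for a, b in zip(p, q) if a is not None and b is not None]
    if not common:
        return False
    return all(a <= b for a, b in common) and any(a < b for a, b in common)


def subspaces(num_dims: int, size: int) -> List[Tuple[int, ...]]:
    """Enumerate every combination of `size` dimensions out of `num_dims`."""
    return list(combinations(range(num_dims), size))


def project(p: Point, dims: Sequence[int]) -> Tuple[Optional[float], ...]:
    """Restrict a point to the dimensions of one subspace."""
    return tuple(p[d] for d in dims)


def naive_skyline(points: Sequence[Point]) -> List[Point]:
    """Quadratic baseline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    data = [(1, None, 3), (2, 2, None), (None, 1, 1), (3, 3, 3)]
    print(naive_skyline(data))
    for dims in subspaces(3, 2):
        print(dims, naive_skyline([project(p, dims) for p in data]))
```

Under this convention dominance is not transitive, so a quadratic baseline like this one can return very few points or none at all on some inputs. That difficulty is one reason specialised structures and subspace-wise processing are attractive for large incomplete data sets.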