计算机科学与探索 ›› 2019, Vol. 13 ›› Issue (11): 1829-1838.DOI: 10.3778/j.issn.1673-9418.1812054

• 数据库技术 • 上一篇    下一篇

非完整数据库Skyline-join查询

鲍斌国,秦小麟,李星罗,张彤   

  1. 南京航空航天大学 计算机科学与技术学院,南京 211106
  • 出版日期:2019-11-01 发布日期:2019-11-07

Skyline-join Queries in Incomplete Database

BAO Binguo, QIN Xiaolin, LI Xingluo, ZHANG Tong   

  1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
  • Online:2019-11-01 Published:2019-11-07

摘要: 传统的Skyline-join查询仅适用于完整数据库,随着新的应用需要的出现,实际应用中考虑到非完整数据库中的Skyline-join查询。概率Skyline利用概率值表示非完整数据项之间的支配关系,有效地避免了传统非完整数据库Skyline查询存在的支配性丢失问题。在分析概率Skyline无法有效处理多关系查询的基础上,对概率Skyline定义进行了扩充,使其适用于多关系查询,并提出了基于多层次分组的PSkyline-join算法。该算法首先基于连接键值及缺失位图对各个关系进行多层次分组,再计算各组数据项的局部Skyline概率上界,然后连接数据项并更新数据项的全局Skyline概率上界,最后利用全局Skyline概率上界与全局Skyline概率下界设计了两种剪枝策略,高效地计算全局概率Skyline结果集。在模拟数据集上验证了PSkyline-join算法效率相较传统算法有着几十倍的提升。

关键词: 非完整数据库, Skyline-join查询, 概率Skyline

Abstract: Traditional Skyline-join queries are only available for complete databases. As new application requirements arise, Skyline-join queries in incomplete databases are required in the actual application. Probabilistic Skyline uses probability values to represent the dominance relationship between incomplete tuples, which avoids the issue of dominance loss in the traditional incomplete database Skyline query. Since probabilistic Skyline can’t deal with multi-relational queries, the definition of probability Skyline is extended to make it suitable for multi-relational queries, and the PSkyline-join algorithm based on multi-level grouping is proposed. Firstly, the algorithm performs multi-level grouping on each relation based on the join key value and the missing bitmap. Next, the local Skyline probability upper bound of each group of data items is calculated. After that, the tuples are connected and the global Skyline probability upper bound of the tuples is updated. Finally, two kinds of pruning strategies are designed by using the global Skyline probability upper bound and the global Skyline probability lower bound to efficiently?calculate the global probabilistic Skyline results. Experimental results on the simulated data set demonstrate that PSkyline-join algorithm is tens of times faster than the traditional algorithm.

Key words: incomplete database, Skyline-join queries, probabilistic Skyline