Journal of Frontiers of Computer Science and Technology ›› 2018, Vol. 12 ›› Issue (10): 1559-1570. DOI: 10.3778/j.issn.1673-9418.1709033

• Database Technology •

Parallel Approximate Aggregation Query for Spatial Online Analysis

SHEN Jinxin, WU Ye+, CHEN Luo, JING Ning

  1. College of Electronic Science, National University of Defense Technology, Changsha 410073, China
  • Online: 2018-10-01 Published: 2018-10-08

Abstract: Spatial aggregation queries are an effective way to analyze rapidly growing volumes of spatial data, although they can be compute-intensive. Traditional stand-alone serial methods can no longer meet the demands of online analysis, yet aggregation indexes designed specifically for spatial data on parallel, scalable computing architectures have received little study. Two new index-based methods are therefore proposed to support parallel spatial online aggregation analysis. In the first method, indexes are organized in two layers: a global grid index filters the relevant local indexes, and the local indexes accelerate the aggregate query locally, which improves the efficiency of exact aggregation queries. The second method builds on the first by applying random sampling, adaptive data-brick partitioning, dynamic caching, and other optimization techniques. For any given confidence level, it returns aggregation results with confidence intervals, and the accuracy improves as more samples are obtained. Experimental and analytical results on billion-scale data verify the effectiveness and scalability of these methods.

Key words: aggregation computation, approximate query, spatial index, online analysis
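
The abstract's second method returns aggregation results with confidence intervals that tighten as more samples arrive. The abstract does not give the estimator, so the following is only a minimal sketch of that general idea and not the authors' algorithm. It assumes a plain shuffle as the random sampler and a normal (central limit theorem) approximation for the interval. The function name `online_mean` and its parameters are hypothetical.

```python
import math
import random
from statistics import NormalDist


def online_mean(values, confidence=0.95, batch=1000, seed=0):
    """Yield (samples_seen, estimate, half_width) after each batch.

    Assumptions of this sketch: values are visited in a random order
    (shuffle), and the interval uses the normal approximation without a
    finite-population correction.
    """
    # Two-sided critical value for the requested confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    order = list(values)
    random.Random(seed).shuffle(order)

    n, mean, m2 = 0, 0.0, 0.0  # Welford running mean and squared deviations
    for v in order:
        n += 1
        delta = v - mean
        mean += delta / n
        m2 += delta * (v - mean)
        if n % batch == 0 or n == len(order):
            var = m2 / (n - 1) if n > 1 else 0.0
            yield n, mean, z * math.sqrt(var / n)


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic points (x, y, value); the rectangle filter stands in for
    # whatever spatial index would select the candidates.
    points = [(rng.random(), rng.random(), rng.gauss(10.0, 2.0))
              for _ in range(20000)]
    x0, y0, x1, y1 = 0.2, 0.2, 0.8, 0.8
    hits = [v for x, y, v in points if x0 <= x <= x1 and y0 <= y <= y1]
    for n, est, hw in online_mean(hits, batch=2000):
        print(f"{n:6d} samples: AVG = {est:.3f} +/- {hw:.3f}")
```

Each yielded triple is a progressively refined answer: a caller can stop as soon as the half-width is small enough for its purpose, which is what makes an aggregation query of this kind "online".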