计算机科学与探索 (Journal of Frontiers of Computer Science and Technology), 2015, Vol. 9, Issue (1): 24-35. DOI: 10.3778/j.issn.1673-9418.1407037

• Database Technology •

Spatial Co-location Patterns Mining Algorithm over Massive Spatial Data Sets

YAO Huachuan, WANG Lizhen+, CHEN Hongmei, ZOU Muquan   

  1. Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming 650091, China
  • Published: 2015-01-01 Online: 2014-12-31

Abstract: Spatial co-location pattern mining is an important task in spatial data mining. However, existing algorithms, whether they mine deterministic or uncertain data, are inefficient in both running time and memory, let alone on massive data sets. Based on an analysis of the root causes of the excessive time and space consumption of traditional mining approaches, this paper proposes a grid differential algorithm for mining spatial co-location patterns. The algorithm subdivides each traditional grid cell into differential grid cells and computes, within each differential cell, the centroid of the instances that belong to the same feature. It then mines co-location patterns from these centroids using multiresolution pruning. While maintaining a high accuracy rate, the algorithm substantially improves on the efficiency of traditional approaches, making it practical to mine spatial co-location patterns from massive data sets. Extensive experiments show that the grid differential algorithm is efficient, robust, and accurate.

Key words: grid differential algorithm; centroid; σ² differential grid; compression ratio of spatial instances
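The abstract describes the centroid compression only in outline. The Python sketch below illustrates one possible reading of it, under explicit assumptions:

- Each traditional grid cell has side d (the neighbour distance threshold) and is split into σ × σ differential cells of side d/σ.
- The same-feature instances in each differential cell are replaced by their centroid, which records how many instances it stands for.
- A size-2 participation index, the usual prevalence measure in co-location mining, is then evaluated on the centroids, with each centroid weighted by that count.

The function names, the count weighting, and the compression-ratio definition (centroids divided by original instances) are hypothetical and not taken from the paper. The multiresolution pruning step is not shown.

```python
import math
from collections import defaultdict
from typing import NamedTuple


class Instance(NamedTuple):
    feature: str
    x: float
    y: float


class Centroid(NamedTuple):
    feature: str
    x: float
    y: float
    count: int  # number of original instances this centroid replaces


def compress(instances, d, sigma):
    """Replace the same-feature instances in each differential cell by their centroid.

    Assumption: a traditional grid cell has side d and is split into
    sigma x sigma differential cells of side d / sigma.
    """
    side = d / sigma
    groups = defaultdict(list)
    for inst in instances:
        cell = (math.floor(inst.x / side), math.floor(inst.y / side))
        groups[(inst.feature, cell)].append(inst)
    centroids = []
    for (feature, _cell), members in groups.items():
        n = len(members)
        cx = sum(m.x for m in members) / n
        cy = sum(m.y for m in members) / n
        centroids.append(Centroid(feature, cx, cy, n))
    return centroids


def compression_ratio(instances, centroids):
    # Hypothetical definition: fraction of points left after compression.
    return len(centroids) / len(instances)


def participation_index(centroids, fa, fb, d):
    """Participation index of the size-2 pattern {fa, fb}, computed on centroids.

    Each centroid is weighted by its instance count (an assumption), so the
    participation ratios approximate those of the uncompressed data.
    """
    a = [c for c in centroids if c.feature == fa]
    b = [c for c in centroids if c.feature == fb]
    if not a or not b:
        return 0.0

    def ratio(src, other):
        # Weighted share of src centroids with at least one neighbour in other
        # (brute-force search over all pairs, for clarity only).
        total = sum(c.count for c in src)
        hit = sum(c.count for c in src
                  if any(math.dist((c.x, c.y), (o.x, o.y)) <= d for o in other))
        return hit / total

    return min(ratio(a, b), ratio(b, a))


# Usage: the three points near the origin collapse into one A centroid
# (count 2) and one B centroid; the two distant points stay separate.
points = [Instance("A", 0.1, 0.1), Instance("A", 0.2, 0.15), Instance("B", 0.3, 0.2),
          Instance("A", 5.0, 5.0), Instance("B", 9.0, 9.0)]
centroids = compress(points, d=1.0, sigma=2)
print(len(centroids))                                   # 4
print(compression_ratio(points, centroids))             # 0.8
print(participation_index(centroids, "A", "B", d=1.0))  # 0.5
```

A centroid lies inside its differential cell, so it is at most √2·d/σ from every instance it replaces. In this sketch, a larger σ therefore tightens the approximation at the cost of less compression. The pairwise neighbour search is quadratic; a practical version would restrict it to adjacent grid cells.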