计算机科学与探索 ›› 2022, Vol. 16 ›› Issue (4): 806-821.DOI: 10.3778/j.issn.1673-9418.2010015
收稿日期:
2020-10-09
修回日期:
2020-12-01
出版日期:
2022-04-01
发布日期:
2020-12-08
通讯作者:
+ E-mail: lzhwang@ynu.edu.cn作者简介:
胡自松(1995—),男,云南曲靖人,硕士研究生,主要研究方向为空间数据挖掘。基金资助:
HU Zisong, WANG Lizhen+(), Vanha Tran, ZHOU Lihua
Received:
2020-10-09
Revised:
2020-12-01
Online:
2022-04-01
Published:
2020-12-08
About author:
HU Zisong, born in 1995, M.S. candidate. His research interest is spatial data mining.Supported by:
摘要:
空间频繁并置模式(SPCP)是一组空间特征的子集,它们的实例在地理空间中频繁地出现在一起。基于内存物化实例邻近关系并搜索模式实例效率较高,但实例信息会被重复存储。图数据库技术能高效地对具有复杂关联关系的数据建模,但基于实例邻近关系图移植已有的挖掘方法不能发挥图遍历的优势。针对上述问题,探索了基于图数据库的空间频繁并置模式挖掘方法。首先,利用图数据库对空间实例及其邻近关系建模,即将实例和关系存储在图数据库中。然后,基于图数据库设计了一个称为子图(团)搜索的基本算法,采用团查找的方式生成模式的表实例从而获得参与实例,避免了传统方法中效率较低的组合或连接操作。考虑到通过生成表实例收集参与实例的效率较低,设计了参与实例验证算法,包括过滤阶段和验证阶段。过滤阶段判断一个中心实例的邻居集中所涉及的特征是否完全包含了待计算模式中的特征,验证阶段则是判断是否存在一个模式实例包含该中心实例。参与实例验证算法每次验证一个中心实例都尽可能多地去确定参与对象,从而有效地减小了搜索空间和减少了团的搜索次数。此外,对提出算法的正确性和完备性进行了证明。最后,在真实和合成数据集上做了大量的实验,验证了所提算法的效率和有效性。
中图分类号:
胡自松, 王丽珍, Vanha Tran, 周丽华. 基于图数据库的空间频繁并置模式挖掘[J]. 计算机科学与探索, 2022, 16(4): 806-821.
HU Zisong, WANG Lizhen, Vanha Tran, ZHOU Lihua. Mining Spatial Prevalent Co-location Patterns Based on Graph Databases[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(4): 806-821.
数据集 | 特征数 | 实例数 | 范围(D×D) |
---|---|---|---|
真实数据集1 | 80 | 10 264 | 25 000×60 000 |
真实数据集2 | 26 | 25 276 | 14 000×25 000 |
真实数据集3 | 56 | 207 891 | 50 000×30 000 |
表1 真实数据集说明
Table 1 Description of real datasets
数据集 | 特征数 | 实例数 | 范围(D×D) |
---|---|---|---|
真实数据集1 | 80 | 10 264 | 25 000×60 000 |
真实数据集2 | 26 | 25 276 | 14 000×25 000 |
真实数据集3 | 56 | 207 891 | 50 000×30 000 |
参数 | 意义 |
---|---|
| 距离阈值 |
| 最小频繁性阈值 |
| 特征数量 |
| 实例数量 |
| 相同邻域内生成的模式实例数量 |
| 不同模式中重叠实例数占总实例数的比例 |
表2 合成数据参数说明
Table 2 Description of synthetic data parameters
参数 | 意义 |
---|---|
| 距离阈值 |
| 最小频繁性阈值 |
| 特征数量 |
| 实例数量 |
| 相同邻域内生成的模式实例数量 |
| 不同模式中重叠实例数占总实例数的比例 |
数据集 | 距离阈值/m | 频繁度阈值 |
---|---|---|
真实数据集1 | 150 | 0.2 |
真实数据集2 | 400 | 0.3 |
真实数据集3 | 50 | 0.2 |
表3 真实数据集默认参数
Table 3 Default parameters of real datasets
数据集 | 距离阈值/m | 频繁度阈值 |
---|---|---|
真实数据集1 | 150 | 0.2 |
真实数据集2 | 400 | 0.3 |
真实数据集3 | 50 | 0.2 |
数据集 | 执行时间/s | 时间效率提升倍数 | ||||
---|---|---|---|---|---|---|
GDBFracScore | CliqueSearch | InstanceValidation | | | | |
真实数据集1 | 130 | 13 | 4 | 10.0 | 32.5 | 3.2 |
真实数据集2 | 137 | 10 | 5 | 13.7 | 27.4 | 2.0 |
真实数据集3 | 568 | 81 | 49 | 7.0 | 11.5 | 1.6 |
表4 提出算法与基准的执行时间和加速比
Table 4 Execution time and speed-up ratio of proposed algorithm and benchmark
数据集 | 执行时间/s | 时间效率提升倍数 | ||||
---|---|---|---|---|---|---|
GDBFracScore | CliqueSearch | InstanceValidation | | | | |
真实数据集1 | 130 | 13 | 4 | 10.0 | 32.5 | 3.2 |
真实数据集2 | 137 | 10 | 5 | 13.7 | 27.4 | 2.0 |
真实数据集3 | 568 | 81 | 49 | 7.0 | 11.5 | 1.6 |
[1] | YOO J S, BOULWARE D, KIMMEY D. A parallel spatial co-location mining algorithm based on MapReduce[C]// Pro-ceedings of the 2014 IEEE International Congress on Big Data, Anchorage, Jun 27-Jul 2, 2014. Washington: IEEE Computer Society, 2014: 25-31. |
[2] | CHAN H K, LONG C, YAN D, et al. Fraction-Score: a new support measure for co-location pattern mining[C]// Procee-dings of the 35th IEEE International Conference on Data Engineering, Macao, China, Apr 8-11, 2019. Piscataway: IEEE, 2019: 1514-1525. |
[3] |
HUANG Y, SHEKHAR S, XIONG H. Discovering colocation patterns from spatial data sets: a general approach[J]. IEEE Transactions on Knowledge and Data Engineering, 2004, 16(12):1472-1485.
DOI URL |
[4] | YOO J S, SHEKHAR S, CELIK M. A join-less approach for co-location pattern mining: a summary of results[C]// Proceedings of the 5th IEEE International Conference on Data Mining, Houston, Nov 27-30, 2005. Washington: IEEE Computer Society, 2005: 813-816. |
[5] |
YAO X J, JIANG X F, WANG D C, et al. Efficiently mining maximal co-locations in a spatial continuous field under directed road networks[J]. Information Sciences, 2020, 542:357-379.
DOI URL |
[6] |
BAO X G, WANG L Z. A clique-based approach for co-location pattern mining[J]. Information Sciences, 2019, 490:244-264.
DOI URL |
[7] |
YU W H. Spatial co-location pattern mining for location-based services in road networks[J]. Expert Systems with Applications, 2016, 46:324-335.
DOI URL |
[8] |
MENNIS J L, LIU J W. Mining association rules in spatio-temporal data: an analysis of urban socioeconomic and land cover change[J]. Transactions in GIS, 2010, 9(1):5-17.
DOI URL |
[9] |
LEIBOVICI D G, CLARAMUNT C, LE GUYADER D, et al. Local and global spatio-temporal entropy indices based on distance-ratios and co-occurrences[J]. International Journal of Geographical Information Science, 2014, 28(5):1061-1084.
DOI URL |
[10] |
YUE H, ZHU X Y, YE X Y, et al. The local colocation patterns of crime and land-use features in Wuhan, China[J]. ISPRS International Journal of Geo-Information, 2017, 6(10):307.
DOI URL |
[11] |
AKBARI M, SAMADZADEGAN F, WEIBEL R. A generic regional spatio-temporal co-occurrence pattern mining model: a case study for air pollution[J]. Journal of Geographical Systems, 2015, 17(3):249-274.
DOI URL |
[12] |
SHAH F, CASTELLTORT A, LAURENT A. Handling missing values for mining gradual patterns from NoSQL graph data-bases[J]. Future Generation Computer Systems, 2020, 111:523-538.
DOI URL |
[13] | YOO J S, SHEKHAR S. A partial join approach for mining co-location patterns[C]// Proceedings of the 12th ACM Inter-national Workshop on Geographic Information Systems, Nov 2-13, 2004. New York: ACM, 2004: 241-249. |
[14] |
YOO J S, BOULWARE D, KIMMEY D. Parallel co-location mining with MapReduce and NoSQL systems[J]. Knowledge and Information Systems, 2020, 62(4):1433-1463.
DOI URL |
[15] |
SHESHIKALA M, RAO D R, PRAKASH R V. Parallel approach for finding co-location pattern—a map reduce framework[J]. Procedia Computer Science, 2016, 89:341-348.
DOI URL |
[16] |
YANG P Z, WANG L Z, WANG X X. A MapReduce approach for spatial co-location pattern mining via ordered-clique-growth[J]. Distributed and Parallel Databases, 2020, 38(2):531-560.
DOI URL |
[17] | KOPERSKI K, HAN J W. Discovery of spatial association rules in geographic information databases[C]// LNCS 951: Proceedings of the 4th International Symposium on Spatial Databases, Portland, Aug 6-9, 1995. Berlin, Heidelberg: Springer, 1995: 47-66. |
[18] | SHEKHAR S, HUANG Y. Discovering spatial co-location patterns: a summary of results[C]// LNCS 2121: Proceedings of the 7th International Symposium on Spatial and Temporal Databases, Redondo Beach, Jul 12-15, 2001. Berlin, Heidel-berg: Springer, 2001: 236-256. |
[19] | WANG L Z, BAO Y Z, LU J, et al. A new join-less approach for co-location pattern mining[C]// Proceedings of the 8th IEEE International Conference on Computer and Information Technology, Sydney, Jul 8-11, 2008. Washington: IEEE Computer Society, 2008: 197-202. |
[20] |
SAINJU A M, AGHAJARIAN D, JIANG Z, et al. Parallel grid-based colocation mining algorithms on GPUs for big spatial event data[J]. IEEE Transactions on Big Data, 2020, 6(1):107-118.
DOI URL |
[21] |
YOO J S, BOW M. A framework for generating condensed co-location sets from spatial databases[J]. Intelligent Data Analysis, 2019, 23(2):333-355.
DOI URL |
[22] | XIAO X Y, XIE X, LUO Q, et al. Density-based co-location pattern discovery[C]// Proceedings of the 16th ACM SIG-SPATIAL International Symposium on Advances in Geo-graphic Information Systems, Irvine, Nov 5-7, 2008. New York: ACM, 2008: 29. |
[23] |
YAO X J, CHEN L J, PENG L, et al. A co-location pattern-mining algorithm with a density-weighted distance thre-sholding consideration[J]. Journal of Information Sciences, 2017, 396:144-161.
DOI URL |
[24] | CELIK M, KANG J M, SHEKHAR S. Zonal co-location pattern discovery with dynamic parameters[C]// Proceedings of the 7th IEEE International Conference on Data Mining, Omaha, Oct 28-31, 2007. Washington: IEEE Computer Society, 2007: 433-438. |
[25] | DAI B R, LIN M Y. Efficiently mining dynamic zonal co-location patterns based on maximal co-locations[C]// Pro-ceedings of the 11th International Conference on Data Mining Workshops, Vancouver, Dec 11, 2011. Washington: IEEE Computer Society, 2011: 861-868. |
[26] | QIAN F, CHIEW K, HE Q M, et al. Mining regional co-location patterns with kNNG[J]. Journal of Intelligent Infor-mation Systems, 2014, 42(3):485-505. |
[27] | 王光耀, 王丽珍, 杨培忠, 等. 极小负co-location模式及有效的挖掘算法[J]. 计算机科学与探索, 2021, 15(2):366-378. |
WANG G Y, WANG L Z, YANG P Z, et al. Minimal negative co-location patterns and effective mining algorithm[J]. Journal of Frontiers of Computer Science and Technology, 2021, 15(2):366-378. | |
[28] | 姚晓婧. 面向城市服务设施数据的同位模式挖掘研究[J]. 测绘学报, 2018, 47(10):1426. |
YAO X J. Co-location researches on urban public-service facilities[J]. Acta Geodaetica et Cartographica Sinica, 2018, 47(10):1426. | |
[29] | 刘文凯, 刘启亮, 蔡建南. 自然邻域支持下的空间同位模式挖掘方法[J]. 测绘学报, 2019, 48(1):95-105. |
LIU W K, LIU Q L, CAI J N. Discovery of co-location patterns based on natural neighborhood[J]. Acta Geodaetica et Cartographica Sinica, 2019, 48(1):95-105. | |
[30] | 马董, 陈红梅, 王丽珍, 等. 空间亚频繁co-location模式的主导特征挖掘[J]. 计算机应用, 2020, 40(2):465-472. |
MA D, CHEN H M, WANG L Z, et al. Dominant feature mining of spatial sub-prevalent co-location patterns[J]. Journal of Computer Applications, 2020, 40(2):465-472. |
[1] | 王光耀, 王丽珍, 杨培忠, 陈红梅. 极小负co-location模式及有效的挖掘算法[J]. 计算机科学与探索, 2021, 15(2): 366-378. |
[2] | 储传鑫,王丽珍,周丽华,李旭阳. 恶性肿瘤与工业污染之间的模糊关系挖掘[J]. 计算机科学与探索, 2020, 14(12): 2061-2071. |
[3] | 李文鹏,王建彬,林泽琦,赵俊峰,邹艳珍,谢冰. 面向开源软件项目的软件知识图谱构建方法[J]. 计算机科学与探索, 2017, 11(6): 851-862. |
[4] | 胡新,王丽珍,周丽华,温佛生. 空间极大co-location模式挖掘研究[J]. 计算机科学与探索, 2014, 8(2): 150-160. |
[5] | 欧阳志平,王丽珍,周丽华. 实例位置模糊的空间co-location模式挖掘研究[J]. 计算机科学与探索, 2012, 6(12): 1144-1152. |
阅读次数 | ||||||
全文 |
|
|||||
摘要 |
|
|||||