计算机科学与探索 ›› 2022, Vol. 16 ›› Issue (4): 806-821.DOI: 10.3778/j.issn.1673-9418.2010015

• 数据库技术 • 上一篇    下一篇

基于图数据库的空间频繁并置模式挖掘

胡自松, 王丽珍+(), Vanha Tran, 周丽华   

  1. 云南大学 信息学院,昆明 650504
  • 收稿日期:2020-10-09 修回日期:2020-12-01 出版日期:2022-04-01 发布日期:2020-12-08
  • 通讯作者: + E-mail: lzhwang@ynu.edu.cn
  • 作者简介:胡自松(1995—),男,云南曲靖人,硕士研究生,主要研究方向为空间数据挖掘。
    王丽珍(1962—),女,山西武乡人,博士,教授,博士生导师,CCF高级会员,主要研究方向为空间数据挖掘、交互式数据挖掘、大数据分析及其应用等。
    陈文和(1988—),男,越南人,博士研究生,主要研究方向为空间数据库、数据挖掘。
    周丽华(1968—),女,云南华坪人,博士,教授,博士生导师,CCF会员,主要研究方向为数据挖掘、机器学习、社会网络分析。
  • 基金资助:
    国家自然科学基金(61966036);国家自然科学基金(61662086);云南省创新团队建设项目(2018HC019)

Mining Spatial Prevalent Co-location Patterns Based on Graph Databases

HU Zisong, WANG Lizhen+(), Vanha Tran, ZHOU Lihua   

  1. School of Information Science and Engineering, Yunnan University, Kunming 650504, China
  • Received:2020-10-09 Revised:2020-12-01 Online:2022-04-01 Published:2020-12-08
  • About author:HU Zisong, born in 1995, M.S. candidate. His research interest is spatial data mining.
    WANG Lizhen, born in 1962, Ph.D., professor, Ph.D. supervisor, senior member of CCF. Her research interests include spatial data mining, interactive data mining, big data analytics and their applications, etc.
    Vanha Tran, born in 1988, Ph.D. candidate. His research interests include spatial database and data mining.
    ZHOU Lihua, born in 1968, Ph.D., professor, Ph.D. supervisor, member of CCF. Her research interests include data mining, machine learning and social network analysis.
  • Supported by:
    National Natural Science Foundation of China(61966036);National Natural Science Foundation of China(61662086);Project of Innovative Research Team of Yunnan Province(2018HC019)

摘要:

空间频繁并置模式(SPCP)是一组空间特征的子集,它们的实例在地理空间中频繁地出现在一起。基于内存物化实例邻近关系并搜索模式实例效率较高,但实例信息会被重复存储。图数据库技术能高效地对具有复杂关联关系的数据建模,但基于实例邻近关系图移植已有的挖掘方法不能发挥图遍历的优势。针对上述问题,探索了基于图数据库的空间频繁并置模式挖掘方法。首先,利用图数据库对空间实例及其邻近关系建模,即将实例和关系存储在图数据库中。然后,基于图数据库设计了一个称为子图(团)搜索的基本算法,采用团查找的方式生成模式的表实例从而获得参与实例,避免了传统方法中效率较低的组合或连接操作。考虑到通过生成表实例收集参与实例的效率较低,设计了参与实例验证算法,包括过滤阶段和验证阶段。过滤阶段判断一个中心实例的邻居集中所涉及的特征是否完全包含了待计算模式中的特征,验证阶段则是判断是否存在一个模式实例包含该中心实例。参与实例验证算法每次验证一个中心实例都尽可能多地去确定参与对象,从而有效地减小了搜索空间和减少了团的搜索次数。此外,对提出算法的正确性和完备性进行了证明。最后,在真实和合成数据集上做了大量的实验,验证了所提算法的效率和有效性。

关键词: 空间数据挖掘, 图数据库, 空间并置模式, 子图搜索

Abstract:

A spatial prevalent co-location pattern (SPCP) is a subset of spatial features whose instances frequently appear together in geographic space. Memory-based neighbor relationship materialization method to search for pattern’s instances is efficient, but instance information is stored repeatedly. Graph database technology can efficiently model data with complex associations. Thus, it is possible to consider using the graph database technology to materialize neighbor relationships (i.e., to construct neighborhood graph), but directly transplanting existing mining methods cannot exert the advantages of graph traversal. To solve the above problem, this paper explores the graph database-based approach to mine spatial prevalent co-location patterns. Firstly, the graph database is utilized to model the spatial instances and their neighbor relations, i.e., the instances and relations are stored in the graph database to construct the neighborhood graph. Then, a basic algorithm called subgraph (or clique) search is designed based on the graph database, using the way of clique search strategy to generate a pattern’s table instance to obtain the participating instances, and avoid the inefficient combination or join operations in the traditional method. Considering the low efficiency of collecting participating instances by generating table instances, a participating instance verification algorithm is designed, including the filtering and verification phases. The filtering phase determines whether the features involved in the center instance’s neighborhoods fully contain the features in the pattern, and the verification phase determines whether there is pattern instance containing the central instance. The participating instance verification algorithm determines as many participating instances as possible each time, thereby effectively reducing the search space and the number of clique searches. In addition, the correctness and completeness of the proposed algorithms are proven. Finally, extensive experiments are conducted on real and synthetic datasets to verify the efficiency and effectiveness of the proposed algorithms.

Key words: spatial data mining, graph database, spatial co-location pattern, subgraph search

中图分类号: