计算机科学与探索 ›› 2015, Vol. 9 ›› Issue (11): 1326-1334.DOI: 10.3778/j.issn.1673-9418.1412056

• 数据库技术 • 上一篇    下一篇

基于平面图覆盖的大规模图可达查询处理

段雨晴1,李世峰2,丁琳琳2+   

  1. 1. 香港科技大学 电子及计算机工程学系,中国香港
    2. 辽宁大学 信息学院,沈阳 110036
  • 出版日期:2015-11-01 发布日期:2015-11-03

Planar Graph Cover Based Reachability Query Processing in Large Graph

DUAN Yuqing1, LI Shifeng2, DING Linlin2+   

  1. 1. Department of Electronic & Computer Engineering, Hong Kong University of Science and Technology, Hong Kong, China
    2. School of Information, Liaoning University, Shenyang 110036, China
  • Online:2015-11-01 Published:2015-11-03

摘要: 随着语义网络、社交网络、生物信息网络等新兴应用的涌现及普及,图数据的规模不断增大,针对大规模图数据的研究成为当今的研究热点和难点。可达查询是图数据处理中频繁使用的基础性查询,一些复杂的查询能够分解成包含多个可达查询的操作集合,其高效处理具有重要意义。针对大规模图的可达查询,提出了一种基于平面图覆盖的大规模图可达查询处理方法。首先给出了一种基于平面图覆盖的可达标签索引方法(planar graph cover based reachability labeling index method,PGCL)。该方法将最优树作为预处理应用于平面图覆盖,通过最优树创建、最优树分解以及树分解平面化处理,得到有向无环图(directed acyclic graph,DAG)的平面图覆盖,最大限度地保留了原图的可达性信息,从而基于覆盖顶点创建二维标签,用于压缩可达传递闭包。设计了基于PGCL的可达查询算法,有效实现了大规模图的可达查询。通过大量实验证明了提出的查询方法在保证查询的高效性情况下,更好地压缩了传递闭包,提高了可达查询的处理效率。

关键词: 大规模有向图, 平面图覆盖, 标签索引方法, 可达查询

Abstract: With the springing up and wide use of emerging applications, such as semantic networks, social networks and bioinformatics networks, the research on increasingly large-scale graph data has become a hot and difficult problem. Reachability query is a fundamental query frequently used in graph data analysis and processing which has important effectiveness. Some complex queries can be decomposed into operation sets containing multiple reachability queries. For processing the reachability query of large graph, this paper proposes planar graph cover based reachability query processing in large graph. Firstly, this paper presents a planar graph cover based reachability labeling index method (PGCL). PGCL uses the optimal tree as the preprocessing to process the planar graph cover. By creating the optimal tree, optimal tree decomposition and some plane processing, this paper obtains the planar graph cover of a DAG (directed acyclic graph), which maximally retains the accessibility information of the original image, and creates labels for vertexes and compresses the reachability transitive closure. Then, this paper designs a reachability query algorithm based on PGCL to effectively process the reachability query of large graph. The experimental results on real data sets show that the proposed query method has better performance of compressing the transitive closure, and enhances the performance of reachability query.

Key words:  large-scale digraph, planar graph cover, labeling index method, reachability query