计算机科学与探索 ›› 2023, Vol. 17 ›› Issue (10): 2426-2434.DOI: 10.3778/j.issn.1673-9418.2212016

• 理论·算法 • 上一篇    下一篇

适用于稀疏图的基于关键点标记的可达性算法

苗伟华,危辉   

  1. 复旦大学 计算机科学技术学院/软件学院 认知算法模型实验室,上海 200438
  • 出版日期:2023-10-01 发布日期:2023-10-01

Reachability Algorithm Based on Key Points Labeling for Sparse Graphs

MIAO Weihua, WEI Hui   

  1. Laboratory of Algorithms for Cognitive Models, School of Computer Science/School of Software,  Fudan University, Shanghai 200438, China
  • Online:2023-10-01 Published:2023-10-01

摘要: 有向图中任意两点间的可达性查询是研究各种网络问题时的一个基础操作,如在社交网络中查询两个人是否相互关注等。但随着网络规模的日益扩大,传统算法因巨大的时间或空间复杂度而变得难以被应用。因此需要根据网络结构特点针对性地使用合适的可达性算法。稀疏图可以看作由若干有向生成树与少量非树边组成,GRKPL算法将稀疏图中的可达性问题拆分成两部分:树上可达性问题与加入非树边后带来的影响。前一部分使用区间标记法解决;后一部分通过构造关键点集,将原图中所有的可达性查询转化为关键点集中的查询后得以解决。关键点集包括所有被非树边覆盖的节点,以及这些节点按照前序遍历的顺序排序后相邻节点之间的最近公共祖先。证明了关键点集的大小与原图中非树边的规模具有相同的数量级。最后在10个中小规模与4个大规模现实数据集上进行了测试,GRKPL在中小规模数据集上表现优异,查询处理时间相较于其他算法平均减少49.8%,空间占用平均减少65.1%。

关键词: 可达性, 稀疏图, 有向图, 强连通, 最近公共祖先, 位运算

Abstract: Reachability queries of directed graphs are fundamental when studying various network problems, such as querying whether two persons follow each other in a social network. However, with the increasing size of networks, traditional algorithms have become unacceptable due to their huge time or space complexity. Therefore, it is necessary to use a suitable reachability algorithm according to the structural characteristics of the network. A sparse graph consists of several directed spanning trees with a small number of non-tree edges. The GRKPL (graph reachability indexing via key points labeling) algorithm splits the reachability problem in sparse graphs into two parts: the tree reachability problem and the impact of adding non-tree edges. The first part is solved by the interval labeling method, and the second part is solved by constructing a key point set, which transforms all the reachability queries in the original graph into queries in the key point set. The key point set includes all nodes covered by non-tree edges and the lowest common ancestors between adjacent nodes after these nodes are sorted in the preorder traversal order. And it is proven that the size of the key point set is of the same order of magnitude as the size of the non-tree edges in the original graph. Finally, GRKPL is tested on ten small- and medium-scale and four large-scale realistic datasets. GRKPL performs well on small- and medium-scale datasets, with an average 49.8% reduction in query processing time and 65.1% reduction in space occupation compared to other algorithms.

Key words: reachability, sparse graphs, directed graphs, strongly connected components, lowest common ancestor, bitwise operations