计算机科学与探索 ›› 2024, Vol. 18 ›› Issue (5): 1211-1222.DOI: 10.3778/j.issn.1673-9418.2303079

• 理论·算法 • 上一篇    下一篇

满足强连通性的有向团枚举算法研究

陈久健,代强强,李荣华,王国仁   

  1. 北京理工大学 计算机学院,北京 100081
  • 出版日期:2024-05-01 发布日期:2024-04-29

Research on Directed Clique Enumeration with Strongly Connected Constraint

CHEN Jiujian, DAI Qiangqiang, LI Ronghua, WANG Guoren   

  1. School of Computer Science & Technology, Beijing Institute of Technology, Beijing 100081, China
  • Online:2024-05-01 Published:2024-04-29

摘要: 有向图的有向边可以表示关系的指向或者数据的传递,在稠密子图的挖掘中引入连通性的约束可以增加顶点之间的联系。为此,结合极大团与强连通分量的定义,底图是完全子图且顶点之间满足强连通性的子图结构被称为有向团。已有工作给出了枚举极大有向团的输出敏感算法,然而其存在大量重复枚举和判重操作复杂等不足之处。为了解决这些问题,基于深度优先搜索的思想和有向团的扩展性质,提出一种新颖的递归枚举算法。算法对于出边邻居和入边邻居分别划分候选集与排除集,维护完全子图结构的同时,不断尝试扩展有向团并保证满足强连通性,并且引入基于共同邻居的支撑点剪枝策略,在稠密图上获得上千倍的效率优化。算法还针对搜索空间给出两种优化设计:一是添加了分割子图的预处理,限制递归调用的搜索范围;二是基于位向量压缩表示顶点集合,提高集合运算的效率。在真实图数据上的实验结果表明,相比现有工作中的输出敏感算法,提出的算法具有50倍以上的加速比。

关键词: 图数据挖掘, 有向团, 强连通性, 支撑点剪枝, 位向量压缩

Abstract: Directed edges in a directed graph can represent the direction of relationships or the transmission of data. Introducing connectivity constraints in the mining of dense subgraphs can enhance the connections between vertices. Accordingly, by combining the definitions of maximal cliques and strongly connected components, a directed clique is a subgraph structure where the underlying graph is a complete subgraph and the vertices are strongly connected. Existing work has proposed an output-sensitive algorithm for enumerating maximal directed cliques, but it suffers from lots of repeated enumerations and complex deduplication operations. To address these issues, a novel recursive enumeration algorithm based on depth-first search and the characteristics of directed clique extension is proposed. The algorithm divides the outgoing and incoming neighbors into candidate and excluded sets respectively. While preserving the structure of complete subgraph, it constantly tries to extend the directed cliques and ensures that they are strongly connected. Additionally, the algorithm adopts a pivot pruning strategy based on common neighbors, achieving efficiency optimization of over a thousand times on dense graphs. Furthermore, the algorithm utilizes two optimization designs for the search space: firstly, a pre-processing step is introduced to divide subgraphs and narrow the search for the recursion; secondly, a bitwise compression representation is proposed for vertex sets to improve the efficiency of set operations. Experimental results on real-world graphs demonstrate that the proposed algorithm achieves a speedup of more than 50x compared with the existing output-sensitive algorithm.

Key words: graph data mining, directed clique, strongly connected, pivot pruning, bitwise compression