计算机科学与探索 ›› 2017, Vol. 11 ›› Issue (9): 1379-1388.DOI: 10.3778/j.issn.1673-9418.1607016

• 数据库技术 • 上一篇    下一篇

多样性度量的Top-K区分子图挖掘

王章辉,赵宇海+,王国仁,李  源   

  1. 东北大学 计算机科学与工程学院,沈阳 110819
  • 出版日期:2017-09-01 发布日期:2017-09-06

Top-K Discriminative Subgraph Mining Based on Diversity Measure

WANG Zhanghui, ZHAO Yuhai+, WANG Guoren, LI Yuan   

  1. School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China
  • Online:2017-09-01 Published:2017-09-06

摘要: 区分子图可以用来描述复杂的图数据结构和构建高效的图分类模型。提出了多样性度量的Top-K区分子图挖掘问题,避免了挖掘结果之间出现高度相关的子图模式,提高了区分子图模式的可用性。通过组合图结构相似性与支持集相似性约束,给出图模式的多样性度量标准。提出两个高效算法Greedy-TopK和Leap-TopK挖掘多样性度量的Top-K区分子图。Greedy-TopK算法采用两阶段的增量式贪婪方法快速挖掘K个区分子图模式。Leap-TopK算法通过在挖掘过程中限制扩展结构相似的图模式,实现了跳跃搜索子图模式空间。实验结果表明,Leap-TopK算法的效率明显优于Greedy-TopK算法;在可用性方面,利用Leap-TopK算法与Greedy-TopK算法挖掘结果构建的图分类器具有相似的分类精度,且都优于传统区分子图挖掘算法产生的结果。

关键词: 图挖掘, 图分类, 子图模式, 区分子图, 多样性

Abstract: Discriminative subgraph can be used to characterize complex graph structures and construct efficient graph classification model. This paper proposes the Top-K discriminative subgraph mining problem based on diversity measure. The diversity measure can be used to mine low correlation subgraph patterns in the mining result, which can enhance the usefulness of the discriminative subgraph patterns. By exploiting the graph structure similarity and support set similarity restraints, this paper introduces the criterion of graph pattern diversity measure. Then this paper proposes two efficient algorithms, Greedy-TopK and Leap-TopK, for the problem. Greedy-TopK algorithm employs two step strategies to incrementally and greedily mine K discriminative subgraphs. By limiting the structure similarity graph pattern extension in the discriminative subgraph mining process, Leap-TopK algorithm can leap the graph pattern searching space. Extensive experimental results demonstrate that Leap-TopK algorithm is more efficient than Greedy-TopK algorithm. Besides, when the mining results of discriminative subgraphs are considered, the classification accuracies of the two algorithms are almost the same. But they are all superior to the traditional discriminative subgraph mining algorithm in terms of usefulness.

Key words: graph mining, graph classification, subgraph pattern, discriminative subgraph, diversity