计算机科学与探索 ›› 2025, Vol. 19 ›› Issue (11): 2967-2980.DOI: 10.3778/j.issn.1673-9418.2502013

• 理论·算法 • 上一篇    下一篇

IMLS:用于大规模属性图的迭代有损图摘要方法

赵丹枫,马健,贺琪,郑小罗,李明刚   

  1. 上海海洋大学 信息学院,上海 201306
  • 出版日期:2025-11-01 发布日期:2025-10-30

IMLS: Iterative Lossy Graph Summarization for Large-Scale Attributed Graphs

ZHAO Danfeng, MA Jian, HE Qi, ZHENG Xiaoluo, LI Minggang   

  1. College of Information Technology, Shanghai Ocean University, Shanghai 201306, China
  • Online:2025-11-01 Published:2025-10-30

摘要: 随着计算资源的进步,越来越多的现实世界数据以图的形式存储在计算机中。然而,直接处理大规模图所需的时间和资源成本不断攀升。有损图摘要技术在确保保留图的整体结构和主要属性的前提下,对图进行精炼和删减,从而生成一个近似输入图的紧凑表示,来获得较小存储需求和更高计算效率。然而,已有的工作通常面临计算效率与摘要质量的不平衡,或缺乏节点属性的纳入。为了解决这些问题,提出了一种基于迭代最小哈希局部敏感哈希(MinHash-LSH)的有损图摘要方法IMLS。方法通过引入结合属性权重的得分函数评估节点合并收益,迭代地合并节点对,生成一个限制节点数量的摘要图,旨在最大限度地减少重构误差与存储大小,同时最大限度地提高节点同质性。实验结果表明,IMLS在压缩比相似的情况下,运行时间比同类方法快约51.3倍,且能够生成较同类方法重构误差低约91.87%的摘要图。此外,IMLS方法具有线性可扩展性,适用于大规模图,并且通过控制属性权重参数,能够确保生成的摘要图具有可控的节点同质性。

关键词: 图计算, 图摘要, 图压缩, 图简化

Abstract: With the advancement of computing resources, an increasing amount of real-world data is stored in the form of graphs. However, the time and resource costs required to process large-scale graphs continue to rise. Lossy graph summarization techniques refine and reduce the size of the graph while preserving its overall structure and key attributes, generating a compact representation that approximates the input graph. This approach helps to achieve lower storage requirements and higher computational efficiency. However, existing methods often face a trade-off between computational efficiency and summary quality, or fail to incorporate node attributes. To address these issues, this paper proposes an iterative MinHash-LSH (MinHash locality-sensitive hashing) lossy graph summarization method IMLS. The method introduces a scoring function that combines attribute weights to evaluate the benefit of node merging. By iteratively merging node pairs, it generates a summary graph with a restricted number of nodes. The method aims to minimize reconstruction error and storage size while maximizing node homogeneity. Experimental results show that, with similar compression ratios, IMLS is 51.3 times faster than comparable methods and produces summary graphs with 91.87% lower reconstruction error. Additionally, IMLS exhibits linear scalability, making it suitable for large-scale graphs. By controlling the attribute weight parameter, it ensures that the generated summary graphs maintain controllable node homogeneity.

Key words: graph computation, graph summarization, graph compression, graph simplification