Journal of Frontiers of Computer Science and Technology, 2025, Vol. 19, Issue (11): 3072-3082. DOI: 10.3778/j.issn.1673-9418.2507064

• Artificial Intelligence·Pattern Recognition •

Vulnerability Detection Method Integrating Global Graph Topology and Multi-scale Masked Convolution

HUANG Anbo, QU Haicheng, JIANG Qingling   

  1. College of Software, Liaoning Technical University, Huludao, Liaoning 125105, China
  2. College of Science, Tieling Normal College, Tieling, Liaoning 112000, China
  • Online: 2025-11-01 Published: 2025-10-30

Abstract: Sequence-based deep learning methods are limited in their ability to model the structural characteristics of source code. Graph neural networks (GNNs) can aggregate neighborhood information to enrich node representations, but they struggle to capture global graph representations and long-range dependencies among nodes. To address these two problems, this paper proposes GTMC-VD (global graph topology and multi-scale masked convolution network for vulnerability detection). Source code is first transformed into a code property graph (CPG) using the Joern tool, and the Word2Vec algorithm is then applied to embed the nodes of the graph, producing initial node representations. A global graph topology encoder is designed and implemented that uses the outputs of a graph convolutional network (GCN) as node importance scores. The graph structure is simplified according to these scores, and the adjacency matrix and node features are updated accordingly. A hierarchical pooling strategy extracts multi-scale topological information, which is aggregated into a comprehensive global graph representation. In the multi-scale masked convolution module, two convolutional kernels of different scales capture relationships between both distant and neighboring nodes, and a masking mechanism handles variable-length graph data by suppressing the noise introduced by padding nodes. Finally, a gating mechanism adaptively fuses the outputs of the two components to produce the final vulnerability detection result. Extensive experiments on two public datasets show that the proposed method effectively addresses both problems, improving Accuracy, Precision, Recall, and F1-Score by 6.69, 4.43, 13.63, and 8.17 percentage points, respectively, over the baseline model (Devign). In summary, GTMC-VD captures global graph features while mitigating the limitation of GCN-based models in capturing long-range dependencies, providing a more robust and effective solution for vulnerability detection tasks.
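
The score-based graph simplification and hierarchical readout mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names (`gcn_scores`, `topk_pool`, `readout`, `global_representation`), the mean-over-neighbors normalization, the `tanh` feature gating, the pooling `ratio`, and the mean-plus-max readout are all assumptions chosen for illustration, in the spirit of common top-k graph pooling schemes. The weights are fixed here, whereas a real model would learn them.

```python
import math

def gcn_scores(adj, feats, weight):
    """One GCN-style layer with a scalar output per node (assumed form):
    project each node's features, then average over the node and its neighbors."""
    n = len(adj)
    projected = [sum(x * w for x, w in zip(row, weight)) for row in feats]
    scores = []
    for i in range(n):
        group = [j for j in range(n) if adj[i][j] or j == i]
        scores.append(sum(projected[j] for j in group) / len(group))
    return scores

def topk_pool(adj, feats, scores, ratio=0.5):
    """Keep the ceil(ratio * n) highest-scoring nodes, scale their features by
    tanh(score), and keep only the edges among the surviving nodes."""
    n = len(adj)
    k = max(1, math.ceil(ratio * n))
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:k])
    new_feats = [[math.tanh(scores[i]) * x for x in feats[i]] for i in keep]
    new_adj = [[adj[i][j] for j in keep] for i in keep]
    return new_adj, new_feats, keep

def readout(feats):
    """Concatenate mean-pooled and max-pooled node features."""
    dim = len(feats[0])
    mean = [sum(row[d] for row in feats) / len(feats) for d in range(dim)]
    peak = [max(row[d] for row in feats) for d in range(dim)]
    return mean + peak

def global_representation(adj, feats, weights, ratio=0.5):
    """Pool once per weight vector and sum the per-level readouts."""
    total = None
    for weight in weights:
        scores = gcn_scores(adj, feats, weight)
        adj, feats, _ = topk_pool(adj, feats, scores, ratio)
        level = readout(feats)
        total = level if total is None else [a + b for a, b in zip(total, level)]
    return total

# Toy path graph 0 - 1 - 2 - 3 with two-dimensional node features.
adj = [[0, 1, 0, 0],
       [1, 0, 1, 0],
       [0, 1, 0, 1],
       [0, 0, 1, 0]]
feats = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0]]

scores = gcn_scores(adj, feats, [1.0, 0.0])
pooled_adj, pooled_feats, kept = topk_pool(adj, feats, scores, ratio=0.5)
print(kept)        # → [2, 3]
print(pooled_adj)  # → [[0, 1], [1, 0]]
graph_vector = global_representation(adj, feats, [[1.0, 0.0], [1.0, 0.0]])
```

Summing the readouts of successive pooling levels is one simple way to combine coarse and fine topological views into a single fixed-length vector. The paper may aggregate its levels differently.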

Key words: vulnerability detection, graph neural networks, graph topology, multi-scale convolution, mask mechanism
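
The masking mechanism and gated fusion named above can likewise be sketched in a few lines. This is a simplified illustration under stated assumptions, not the paper's architecture: nodes carry a single scalar feature, the kernels (`small_kernel`, `large_kernel`) and the gate logits are fixed hypothetical values standing in for learned parameters, and `masked_conv1d` and `gated_fusion` are illustrative names.

```python
import math

def masked_conv1d(nodes, mask, kernel):
    """Same-padded 1-D convolution along the node axis that ignores padding.
    Padded positions neither contribute to a sum nor receive an output."""
    n, half = len(nodes), len(kernel) // 2
    out = []
    for i in range(n):
        if not mask[i]:
            out.append(0.0)  # padded slot: output stays zero
            continue
        acc = 0.0
        for offset, w in enumerate(kernel):
            j = i + offset - half
            if 0 <= j < n and mask[j]:  # skip out-of-range and padded neighbors
                acc += w * nodes[j]
        out.append(acc)
    return out

def gated_fusion(global_vec, local_vec, gate_logits):
    """Element-wise convex combination controlled by a sigmoid gate."""
    fused = []
    for g_in, l_in, z in zip(global_vec, local_vec, gate_logits):
        g = 1.0 / (1.0 + math.exp(-z))
        fused.append(g * g_in + (1.0 - g) * l_in)
    return fused

# Three real nodes followed by two padding slots holding junk values.
nodes = [1.0, 2.0, 3.0, 99.0, 99.0]
mask = [True, True, True, False, False]
small_kernel = [1.0, 1.0, 1.0]            # sees immediate neighbors
large_kernel = [1.0, 1.0, 1.0, 1.0, 1.0]  # sees nodes up to two steps away

near = masked_conv1d(nodes, mask, small_kernel)
far = masked_conv1d(nodes, mask, large_kernel)
print(near)  # → [3.0, 6.0, 5.0, 0.0, 0.0]
print(far)   # → [6.0, 6.0, 6.0, 0.0, 0.0]

# A gate logit of 0 gives a gate of 0.5, i.e. a plain average of both inputs.
print(gated_fusion([2.0], [4.0], [0.0]))  # → [3.0]
```

Because the mask is checked for both the center position and each neighbor, the junk values in the padding slots never reach the output, which is the effect the abstract attributes to the masking mechanism.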
