计算机科学与探索 (Journal of Frontiers of Computer Science and Technology)

• Academic Research •

多维度边优化溯源图改进的APT攻击检测方法

何厚翰,芦天亮,张岚泽,袁梦娇,曾高俊   

  1. 中国人民公安大学 信息网络安全学院, 北京 100038

Improved APT detection with multi-dimensional edge optimization in provenance graph

HE Houhan,  LU Tianliang,  ZHANG Lanze,  YUAN Mengjiao,  ZENG Gaojun   

  1. College of Information and Cyber Security, People’s Public Security University of China, Beijing 100038, China

摘要: 考虑到高级持续性威胁(APT)的复杂特性,利用溯源图能够将系统事件以因果关系联系起来,现有研究尝试将溯源图技术应用于检测此类攻击及取证分析。针对溯源图规模爆炸、数据样本不均衡引起的过度平滑现象,以及系统事件类型过度多样化导致的关系型算法数据稀疏性问题,提出了一种多维度边优化溯源图改进的APT攻击检测方法。首先,面向系统内核日志设计多模块化的前端解析方案,解决海量日志的溯源图建模问题,并对原有上下文语义缺失进行补全;然后,通过边缩减优化策略的K阶子图采样方法关注与攻击活动相关的局部结构,将提取到的多维度边特征利用图嵌入技术学习并融合为边属性的嵌入表达;最后,通过在图注意力网络(GAT)中引入多维度边属性与节点特征的注意力计算,并与节点间的注意力计算相融合以构建混合注意力机制。调参及消融实验结果表明,所提方法有效缩减了溯源图规模,同时具备较低的计算资源消耗与算法时间复杂度。对比实验结果验证了所提方法在数据不均衡及事件类型多样化背景下,模型的综合检测性能有较大提升,相比R-GCN等传统关系型算法,Precision、Recall和F1值分别提高5.70%、4.35%和5.08%。

关键词: 溯源图, APT攻击检测, 图注意力网络, 边缩减优化, 图采样

Abstract: Advanced persistent threats (APTs) are complex, and provenance graphs can link system events through causal relationships, so existing research has applied provenance graph techniques to detecting such attacks and to forensic analysis. To address the explosive growth in provenance graph size, the over-smoothing caused by imbalanced data samples, and the data sparsity that relational algorithms suffer when system event types are excessively diverse, this paper proposes an improved APT detection method based on multi-dimensional edge optimization of the provenance graph. First, a multi-module front-end parsing scheme is designed for system kernel logs; it handles provenance graph modeling over massive log data and completes the contextual semantics missing from the original data. Next, a K-hop subgraph sampling method with an edge reduction optimization strategy focuses on the local structures related to attack activities, and graph embedding techniques learn the extracted multi-dimensional edge features and fuse them into an embedded representation of edge attributes. Finally, attention computed between multi-dimensional edge attributes and node features is introduced into the graph attention network (GAT) and fused with inter-node attention to build a hybrid attention mechanism. Hyperparameter tuning and ablation experiments show that the proposed method effectively reduces the scale of the provenance graph while keeping computational resource consumption and algorithmic time complexity low. Comparative experiments verify that, under data imbalance and diverse event types, the overall detection performance of the model improves substantially: compared with traditional relational algorithms such as R-GCN, Precision, Recall, and F1 increase by 5.70%, 4.35%, and 5.08%, respectively.

Key words: provenance graph, APT detection, graph attention networks, edge optimization, graph sampling
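
The abstract names K-hop subgraph sampling with an edge reduction strategy but does not spell out either step. The following is a minimal sketch of one plausible reading, not the authors' implementation: it assumes that edge reduction merges repeated events of the same type between the same pair of entities, and that sampling keeps every node within K undirected hops of a seed node. The event tuples, entity names, and the functions `reduce_edges` and `k_hop_subgraph` are hypothetical and serve only as illustration.

```python
from collections import defaultdict, deque


def reduce_edges(events):
    """Merge repeated (src, dst, event_type) events into a single edge.

    Each merged edge keeps (count, first_timestamp, last_timestamp).
    This merging rule is an assumption; the abstract does not define it.
    """
    merged = {}
    for src, dst, etype, ts in events:
        key = (src, dst, etype)
        if key in merged:
            count, first, last = merged[key]
            merged[key] = (count + 1, min(first, ts), max(last, ts))
        else:
            merged[key] = (1, ts, ts)
    return merged


def k_hop_subgraph(edges, seed, k):
    """Return the nodes within k hops of seed and the edges among them.

    Hops ignore edge direction, so both causes and effects of the seed
    are reached by the breadth-first search.
    """
    neighbors = defaultdict(set)
    for src, dst, _etype in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)

    depth = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if depth[node] == k:
            continue  # do not expand beyond k hops
        for nxt in neighbors[node]:
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)

    kept = {
        key: attrs
        for key, attrs in edges.items()
        if key[0] in depth and key[1] in depth
    }
    return set(depth), kept


# Hypothetical kernel-log events: (source, target, event type, timestamp).
events = [
    ("sshd", "bash", "fork", 0),
    ("bash", "curl", "fork", 1),
    ("curl", "10.0.0.5:443", "connect", 2),
    ("curl", "/tmp/payload", "write", 3),
    ("curl", "/tmp/payload", "write", 4),
    ("curl", "/tmp/payload", "write", 5),
    ("bash", "/tmp/payload", "execute", 6),
    ("cron", "logrotate", "fork", 7),
    ("logrotate", "/var/log/syslog", "write", 8),
]

edges = reduce_edges(events)
nodes_1hop, edges_1hop = k_hop_subgraph(edges, "/tmp/payload", 1)
nodes_2hop, edges_2hop = k_hop_subgraph(edges, "/tmp/payload", 2)
print(len(events), len(edges), len(edges_1hop), len(edges_2hop))
```

In this toy log the three repeated writes collapse into one edge that carries a count, and the unrelated `cron` activity falls outside the sampled neighborhood. The count and time span kept on each merged edge are examples of the kind of per-edge statistics that could feed a multi-dimensional edge feature vector; which features the method actually uses is not stated in the abstract.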