Journal of Frontiers of Computer Science and Technology (计算机科学与探索)

• Academic Research •

TCGCL:基于图对比学习的复杂网络流量分类算法

胡仲则, 秦宏超, 李振军, 李艳辉, 李荣华, 王国仁   

  1. 北京理工大学 计算机学院, 北京 100081
  2. 深圳城市职业学院 信息与通信学院, 广东 深圳 518038
  3. 重庆交通大学 信息科学与工程学院, 重庆 400074
  4. 深圳市龙岗区智能供应链技术重点实验室, 广东 深圳 518100

TCGCL: A Complex Network Traffic Classification Algorithm Based on Graph Contrastive Learning

HU Zhongze, QIN Hongchao, LI Zhenjun, LI Yanhui, LI Ronghua, WANG Guoren   

  1. School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
  2. School of Information and Communication, Shenzhen City Polytechnic, Shenzhen, Guangdong 518038, China
  3. School of Information Science and Engineering, Chongqing Jiaotong University, Chongqing 400074, China
  4. Key Laboratory of Intelligent Supply Chain Technology in Longgang District, Shenzhen, Guangdong 518100, China

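摘要 (Abstract): Network traffic classification plays a key role in network security. Modern network architectures are highly complex, and network traffic inevitably encounters various anomalies during transmission. To address this, this paper proposes a stability metric that evaluates an algorithm's resistance to interference from data anomalies. It also proposes TCGCL (Traffic Classification Graph Contrastive Learning), a traffic classification algorithm based on graph contrastive learning that simultaneously extracts the payload features within network traffic and the connectivity features between traffic, preserving more of the useful information in the data. On this basis, data augmentation is used to simulate how network traffic behaves in anomalous states, which substantially improves the algorithm's classification stability when the data are anomalous. In addition, drawing on protocol analysis, TCGCL examines how graph-structured data are constructed during traffic classification and proposes a high-quality, low-dimensional attribute generation method. Experiments show that, at nearly the same accuracy as the baseline algorithm, TCGCL reduces the sample input dimension by about 80%. Furthermore, to reflect complex network communication environments, the test samples were perturbed with noise to simulate anomalous traffic. The results show that TCGCL still maintains high classification accuracy when traffic is anomalous, and that its stability metric is far ahead of the baseline algorithm.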

关键词 (Key words): traffic classification, graph neural networks, contrastive learning, protocol analysis

Abstract: Network traffic classification plays a crucial role in network security. Modern network architectures are highly complex, and various anomalies are inevitably encountered during network traffic transmission. To this end, this paper proposes a stability index that evaluates an algorithm's resistance to interference from data anomalies. It also proposes TCGCL (Traffic Classification Graph Contrastive Learning), a traffic classification algorithm based on graph contrastive learning. TCGCL simultaneously extracts the payload features within network traffic and the connectivity features between traffic, preserving the useful information in the data more comprehensively. On this basis, data augmentation is used to simulate anomalous states of network traffic, which greatly improves the classification stability of the algorithm when the data are anomalous. In addition, based on protocol analysis techniques, TCGCL studies the construction of graph-structured data for traffic classification and proposes a high-quality, low-dimensional attribute generation method. Experiments show that, compared with the baseline algorithm, TCGCL reduces the sample input dimension by about 80% at almost the same accuracy. For complex network communication environments, the test samples were perturbed with noise to simulate anomalous traffic. The results show that TCGCL still maintains high classification accuracy under anomalous traffic conditions, and that its stability index is significantly ahead of the baseline algorithm.

Key words: traffic classification, graph neural networks, contrastive learning, protocol analysis
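
The abstract does not specify TCGCL's encoder, augmentations, or loss function, so the following is only a minimal sketch of the general contrastive-learning idea it builds on: two views of the same sample (for example, an original feature vector and a noise-perturbed copy) should be more similar to each other than to views of other samples. It uses the widely used InfoNCE objective on plain Python lists rather than graph embeddings. The function names `mask_features` and `info_nce`, the feature-masking augmentation, and the temperature value are illustrative assumptions, not details taken from the paper.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)  # epsilon guards against zero vectors


def mask_features(x, drop_prob, rng):
    """Hypothetical augmentation: zero each feature with probability drop_prob.

    This stands in for "simulating anomalous traffic"; the paper's actual
    augmentations are not described in the abstract.
    """
    return [0.0 if rng.random() < drop_prob else a for a in x]


def info_nce(view_a, view_b, temperature=0.5):
    """InfoNCE loss over two aligned lists of embeddings.

    Row i of view_a and row i of view_b form the positive pair; every other
    row of view_b serves as a negative for that anchor.
    """
    total = 0.0
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denom - logits[i]
    return total / len(view_a)


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [[1.0, 0.2, 0.0, 0.5], [0.0, 1.0, 0.3, 0.0], [0.4, 0.0, 1.0, 0.9]]
    noisy = [mask_features(x, drop_prob=0.25, rng=rng) for x in samples]
    print("contrastive loss between clean and masked views:", info_nce(samples, noisy))
```

In a full pipeline the vectors passed to `info_nce` would be embeddings produced by a trainable encoder, and minimizing the loss would push the encoder toward representations that change little under the simulated anomalies.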