TCGCL: Complex Network Traffic Classification Algorithm Based on Graph Contrastive Learning

doi:10.3778/j.issn.1673-9418.2407095

Abstract

Abstract: Network traffic classification plays a key role in network security. Modern network architectures are highly complex, and traffic inevitably encounters various anomalies during transmission. To address this, this paper proposes a stability index that evaluates an algorithm's resistance to interference from anomalous data. It also proposes TCGCL (traffic classification graph contrastive learning), a traffic classification algorithm based on graph contrastive learning. TCGCL simultaneously extracts the payload features inside network traffic and the connectivity features among network traffic, so that more of the useful information in the data is preserved. On this basis, data augmentation is used to simulate abnormal traffic states, which greatly improves the classification stability of the algorithm when the data are anomalous. Drawing on protocol analysis, the paper also studies how graph-structured data are constructed during traffic classification and proposes a high-quality, low-dimensional attribute generation method. Experiments show that, compared with the baseline algorithm, TCGCL reduces the sample input dimension by about 80% while achieving nearly the same accuracy. To reflect complex network communication environments, the test samples are perturbed with noise to simulate abnormal traffic. The results show that TCGCL still maintains high classification accuracy under abnormal traffic conditions, and that its stability index is far ahead of the baseline algorithm.
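The abstract does not state the loss function or the augmentation that TCGCL uses. As a rough illustration of the general idea only, the sketch below builds two corrupted "views" of the same attribute vectors and scores them with the InfoNCE loss that is commonly used in contrastive learning. The random attribute dropping, the temperature value, and every function name here are assumptions made for illustration; in a graph contrastive setting the vectors would normally be embeddings produced by a graph neural network encoder rather than raw attributes.

```python
import math
import random


def corrupt(features, drop_prob, rng):
    """Zero each attribute with probability drop_prob.

    Hypothetical stand-in for an augmentation that mimics anomalous traffic.
    """
    return [0.0 if rng.random() < drop_prob else x for x in features]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def info_nce(view_a, view_b, temperature=0.5):
    """Mean InfoNCE loss where row i of view_a is the positive of row i of view_b."""
    total = 0.0
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / len(view_a)


# Toy data: 8 samples with 16 attributes each (assumed sizes).
rng = random.Random(0)
nodes = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(8)]
view_a = [corrupt(x, 0.2, rng) for x in nodes]
view_b = [corrupt(x, 0.2, rng) for x in nodes]
loss = info_nce(view_a, view_b)
print(f"InfoNCE loss between two corrupted views: {loss:.4f}")
```

Minimizing this loss pulls the two corrupted views of the same sample together and pushes views of different samples apart, which is one plausible way a model could learn representations that tolerate the kind of anomalies the augmentation imitates.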

Key words: traffic classification, graph neural networks, contrastive learning, protocol analysis

HU Zhongze, QIN Hongchao, LI Zhenjun, LI Yanhui, LI Ronghua, WANG Guoren. TCGCL: Complex Network Traffic Classification Algorithm Based on Graph Contrastive Learning[J]. Journal of Frontiers of Computer Science and Technology, 2025, 19(5): 1230-1240.
