Research on Generative Techniques for Identifying and Extracting Tactics, Techniques and Procedures

doi:10.3778/j.issn.1673-9418.2405088

Abstract

The MITRE ATT&CK framework defines 14 tactics and 625 techniques covering the full course of a cyber attack, and it has gradually become the de facto standard for tactics, techniques, and procedures (TTP) intelligence. Building on this taxonomy, existing studies cast TTP identification and extraction as sentence-level multi-class classification over tactic and technique categories, using either deep learning models or large language models driven by prompt engineering. However, because small-sample categories make up a large share of the datasets and multi-class models hit a performance bottleneck, both category coverage and accuracy remain low. This paper proposes a method that combines ChatGPT-based data augmentation with instruction-supervised fine-tuning of a large language model to address sentence-level multi-class classification of technique categories. The ChatGPT-based augmentation enriches sample diversity while preserving the semantics of the original samples, providing high-quality training data for accurate recognition of small-sample categories; experimental results confirm the advantage of this augmentation method. Instruction-supervised fine-tuning overcomes the performance bottleneck of deep learning multi-class models and covers all 625 technique categories, with Precision, Recall, and F1-score reaching 86.2%, 89.9%, and 88.0%, respectively, which exceeds existing studies.

Key words: cyber threat intelligence (CTI); tactics, techniques and procedures (TTPs); ATT&CK; data augmentation; large language model; supervised fine-tuning (SFT)
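The abstract describes the task as mapping one sentence to one ATT&CK technique through instruction-supervised fine-tuning. The Python sketch below illustrates what a single training record for such a setup might look like, and checks that the reported Precision and Recall are consistent with the reported F1-score under the usual harmonic-mean formula. The record layout (`instruction`, `input`, `output`), the prompt wording, the function names, and the example sentence are all assumptions made for illustration; the paper's actual prompt template and averaging scheme are not given in the abstract.

```python
import json


def build_sft_record(sentence: str, technique_id: str, technique_name: str) -> dict:
    """Build one instruction-tuning sample (hypothetical Alpaca-style layout)."""
    return {
        # Assumed prompt wording, not taken from the paper.
        "instruction": (
            "Identify the MITRE ATT&CK technique described in the sentence. "
            "Answer with the technique ID and name."
        ),
        "input": sentence,
        "output": f"{technique_id} {technique_name}",
    }


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Illustrative sentence, written for this example only.
    record = build_sft_record(
        "The group delivered the payload through an email with a malicious attachment.",
        "T1566.001",
        "Spearphishing Attachment",
    )
    print(json.dumps(record, indent=2))

    # Precision and Recall as reported in the abstract.
    print(f"F1 = {f1_score(0.862, 0.899):.1%}")
```

Treating the label as generated text rather than as one of 625 softmax outputs is what lets a single fine-tuned model cover every technique category; the cost is that generated answers must be normalized back to a valid technique ID before scoring.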

YU Fengrui, DU Yanhui. Research on Generative Techniques for Identifying and Extracting Tactics, Techniques and Procedures[J]. Journal of Frontiers of Computer Science and Technology, 2025, 19(1): 118-131.
