Journal of Frontiers of Computer Science and Technology ›› 2021, Vol. 15 ›› Issue (5): 881-892. DOI: 10.3778/j.issn.1673-9418.2009066

• Network and Information Security •

Research on Malicious Code Family Classification Combining Attention Mechanism

WANG Runzheng, GAO Jian, TONG Xin, YANG Mengqi

  1. College of Information and Cyber Security, People's Public Security University of China, Beijing 100038, China
  2. Key Laboratory of Safety Precautions and Risk Assessment, Ministry of Public Security, Beijing 102623, China
  • Online: 2021-05-01 Published: 2021-04-30

Abstract:

In recent years, malicious code families have produced increasingly diverse variants, and adversarial techniques such as obfuscation have grown stronger, so traditional malicious code detection methods struggle to classify samples well. To address this, a malicious code family classification model that incorporates attention mechanisms is proposed. First, a reverse-engineering disassembly tool is used to extract the features of each section of a malicious sample, and visualization is used to map the sections to the channels of an RGB color image. Second, channel-domain and spatial-domain attention mechanisms are introduced to build a depthwise separable convolutional network based on mixed-domain attention, which extracts image texture features of the malicious samples along both the channel and spatial dimensions. Finally, nine malicious code families are selected to train and test the model. The experimental results show that classifying malicious code families with the features of a single section yields low accuracy, whereas the fused features distinguish the families effectively. The proposed model also outperforms traditional neural network models, reaching a classification accuracy of 98.38%.
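The visualization step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which sections are used, how features are derived from them, or how image dimensions are chosen, so everything below is an assumption: the function name `sections_to_rgb`, the use of raw section bytes as channel values, zero-padding, the fixed `width`, and the example section names are all hypothetical.

```python
import math

def sections_to_rgb(sections, width=4):
    """Map three byte strings to the R, G and B channels of one image.

    Assumption: `sections` holds exactly three bytes objects, one per
    channel. Shorter channels are zero-padded to the longest one.
    Returns (height, rows), where rows[y][x] is an (r, g, b) tuple.
    """
    if len(sections) != 3:
        raise ValueError("expected exactly three sections, one per channel")
    longest = max(len(s) for s in sections)
    height = max(1, math.ceil(longest / width))
    size = width * height
    # Zero-pad every channel to the same number of pixels.
    padded = [bytes(s) + bytes(size - len(s)) for s in sections]
    rows = []
    for y in range(height):
        row = []
        for x in range(width):
            i = y * width + x
            row.append((padded[0][i], padded[1][i], padded[2][i]))
        rows.append(row)
    return height, rows

# Hypothetical section contents, chosen only for illustration.
text_bytes = bytes([0x55, 0x8B, 0xEC, 0x83, 0xEC])
data_bytes = bytes([1, 2, 3])
rsrc_bytes = bytes([255])
height, rows = sections_to_rgb([text_bytes, data_bytes, rsrc_bytes], width=2)
```

Each pixel then carries one byte from each section, so a convolutional network sees the three sections as aligned color channels rather than as one grayscale byte stream.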

Key words: malicious family, multi-class classification, mixed-domain attention mechanism, depthwise separable convolution, fused features
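As background for the keyword "depthwise separable convolution", the short sketch below compares weight counts for a standard convolution and its depthwise separable factorization (one k×k filter per input channel, followed by a 1×1 pointwise convolution). The layer sizes are illustrative assumptions, not values taken from the paper, and bias terms are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depthwise k x k convolution plus a 1 x 1 pointwise one."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 convolution that mixes channels
    return depthwise + pointwise

# Assumed example: 3 x 3 kernel, RGB input (3 channels), 64 output channels.
standard = standard_conv_params(3, 3, 64)          # 1728
separable = depthwise_separable_params(3, 3, 64)   # 219
ratio = separable / standard                       # equals 1/64 + 1/9
```

For these assumed sizes the factorized layer needs roughly one eighth of the weights of the standard layer.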