Journal of Frontiers of Computer Science and Technology

• Science Researches •

Cross-Architecture Vulnerability Detection Combining Semantic and Attribute Features

LI Kun, LI Bin, ZHU Wenjing, ZHOU Qinglei   

  1. College of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China
  2. Key Laboratory of Network Cryptography Technology of Henan Province, Zhengzhou 450001, China
  3. National Supercomputing Center in Zhengzhou, Zhengzhou 450001, China

Abstract: Binary vulnerability detection plays an important role in program security. To cope with large-scale detection tasks, neural network techniques are increasingly applied to cross-architecture vulnerability detection and have significantly improved detection accuracy. However, existing methods still suffer from problems such as extracting only a single kind of information or being unable to perform cross-architecture detection. To address these problems, a cross-architecture vulnerability detection method that fuses semantic and attribute features is proposed. Firstly, the assembly code and attributed control flow graph (ACFG) of a binary function are taken as input, and the semantic information of all assembly code in each basic block is extracted. The basic-block-level semantic information is then fused with the attribute feature information to generate a 139-dimensional basic-block-level vector, so that the semantic and attribute information of the function is represented more comprehensively. Secondly, a Siamese network model based on a convolutional neural network generates function-level embedding vectors, which extracts features at different spatial hierarchies across basic blocks while reducing the number of network parameters. The distance between the function-level embedding vectors is then calculated to determine whether the two binary functions under test are similar. Finally, cross-architecture vulnerability detection requires only the assembly code and ACFGs of the functions in a binary file and of the known vulnerable functions as input. Experimental results show that the method achieves a detection accuracy of 95.64% and an AUC (area under curve) of 0.9969. Compared with existing methods, accuracy improves by 0.26% to 7.04% and AUC by 0.11% to 1.59%, and the method performs well in vulnerability detection in real-world environments.

Key words: Vulnerability detection, Neural network, Cross-architecture, Feature fusion, Function level
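
The data flow summarized in the abstract (fuse per-block features, embed each function with a shared model, compare the two embeddings) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract states only the 139-dimensional total, so the 128 + 11 split is an assumption. Mean pooling is a placeholder for the convolutional Siamese branch, and cosine similarity with a 0.9 threshold stands in for the unspecified distance measure. All function names are hypothetical.

```python
import math

SEMANTIC_DIM = 128   # assumption: the abstract gives only the 139-dim total
ATTRIBUTE_DIM = 11   # assumption: chosen so that 128 + 11 = 139
BLOCK_DIM = SEMANTIC_DIM + ATTRIBUTE_DIM


def fuse_block(semantic, attributes):
    """Fuse one basic block's semantic and attribute vectors by concatenation."""
    if len(semantic) != SEMANTIC_DIM or len(attributes) != ATTRIBUTE_DIM:
        raise ValueError("unexpected feature dimensions")
    return list(semantic) + list(attributes)


def embed_function(blocks):
    """Placeholder for one Siamese branch: mean-pool the block vectors.

    The paper uses a CNN here; mean pooling only keeps the sketch runnable.
    """
    if not blocks:
        raise ValueError("function has no basic blocks")
    n = len(blocks)
    return [sum(b[i] for b in blocks) / n for i in range(BLOCK_DIM)]


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def is_similar(func_a, func_b, threshold=0.9):
    """Embed both functions with the same model and threshold their similarity."""
    score = cosine_similarity(embed_function(func_a), embed_function(func_b))
    return score >= threshold


# Usage: a function is represented as a list of fused basic-block vectors.
block = fuse_block([0.5] * SEMANTIC_DIM, [1.0] * ATTRIBUTE_DIM)
print(len(block))                            # 139
print(is_similar([block], [block, block]))   # True
```

Applying the same `embed_function` to both inputs is what makes the comparison Siamese in structure: the two branches share every parameter, so architecture-specific differences have to be absorbed by the block features and the learned embedding rather than by separate models.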
