Journal of Frontiers of Computer Science and Technology, 2020, Vol. 14, Issue (7): 1173-1182. DOI: 10.3778/j.issn.1673-9418.1908022

• Network and Information Security •

Automatic Classification of Computer Vulnerability Based on S-C Feature Extraction

REN Jiadong, WANG Qian, WANG Fei, LI Yazhou, LIU Jiaxin   

  1. College of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066001, China
  2. Computer Virtual Technology and System Integration Laboratory of Hebei Province, Qinhuangdao, Hebei 066001, China
  • Online: 2020-07-01 Published: 2020-08-12

Abstract:

In recent years, the number of unknown computer vulnerabilities has grown enormously, and analyzing and classifying large volumes of vulnerability data in a timely and accurate manner is an important open problem. This paper therefore proposes a text classification method for computer vulnerability descriptions that combines feature extraction based on information entropy and a comprehensive function (S-C) with an averaged one-dependence estimators (AODE) classifier, a Bayesian model that accounts for the relationships among feature words. First, feature words are extracted with the S-C method: the comprehensive function C, which combines a word's inter-class and intra-class importance, measures how important each word is to each class. Then the information entropy S of a word's distribution over the classes is used to down-weight words whose occurrences are scattered across classes and therefore discriminate poorly, which yields an accurate feature word set. Finally, the vulnerability data set is classified with AODE, which models the dependencies among the selected feature words. Experimental comparisons show that the S-C method extracts an accurate feature word set and that, combined with the AODE classifier, it achieves higher classification accuracy than traditional classifier models.
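The abstract does not state the exact formulas for C and S, so the Python sketch below is only one hypothetical way to realize the described idea. It assumes that C(w, c) is the product of an inter-class share P(c | w) and an intra-class document coverage df(w, c)/N_c, takes S(w) as the entropy of the word's distribution over classes, and scales the best C by (1 - S/log K) so that a word spread evenly over all K classes scores zero. The function names `sc_scores` and `select_features` and this weighting scheme are illustrative assumptions, not the paper's definitions.

```python
import math
from collections import Counter, defaultdict


def sc_scores(docs, labels):
    """Hypothetical S-C style scores. docs: list of token lists; labels: class per doc."""
    classes = sorted(set(labels))
    class_size = Counter(labels)
    df = defaultdict(Counter)  # df[word][cls] = number of class-cls docs containing word
    for tokens, y in zip(docs, labels):
        for w in set(tokens):
            df[w][y] += 1
    # Maximum possible entropy over K classes, used to normalize S into [0, 1]
    max_entropy = math.log(len(classes)) if len(classes) > 1 else 1.0
    scores = {}
    for w, per_class in df.items():
        total = sum(per_class.values())
        # Assumed C(w, c): inter-class share P(c|w) times intra-class coverage df(w,c)/N_c
        c_best = max((per_class[c] / total) * (per_class[c] / class_size[c]) for c in classes)
        # S(w): entropy of the word's document distribution over classes
        s = -sum((n / total) * math.log(n / total) for n in per_class.values() if n > 0)
        # Down-weight words whose class distribution is scattered (high entropy)
        scores[w] = c_best * (1.0 - s / max_entropy)
    return scores


def select_features(docs, labels, k):
    """Return the k highest-scoring words (ties broken alphabetically)."""
    scores = sc_scores(docs, labels)
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


docs = [["buffer", "overflow", "remote"], ["buffer", "overflow"],
        ["sql", "injection", "remote"], ["sql", "injection"]]
labels = ["memory", "memory", "web", "web"]
print(select_features(docs, labels, 4))
```

In this toy corpus "remote" occurs equally often in both classes, so its entropy is maximal and its score drops to zero, while class-specific words such as "buffer" and "sql" keep a score of 1.0 and are selected.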

Key words: computer vulnerability, text classification, feature extraction, information entropy
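The final classification step uses AODE, which averages, over every sufficiently frequent feature x_i acting as a "super-parent", the estimate P(c, x_i) · ∏_{j≠i} P(x_j | c, x_i). The sketch below is a generic AODE over binary word-presence features with Laplace smoothing. The class name `AODE`, the `min_parent_count` threshold (default 1 for toy data), and the binary encoding of the selected feature words are assumptions for illustration rather than the paper's configuration.

```python
from collections import defaultdict


class AODE:
    """Averaged one-dependence estimators over binary (word present/absent) features."""

    def __init__(self, min_parent_count=1):
        self.m = min_parent_count  # minimum frequency for x_i to act as a super-parent

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.n, self.d = len(X), len(X[0])
        self.attr = defaultdict(int)    # (i, xi) -> count
        self.pair = defaultdict(int)    # (c, i, xi) -> count
        self.triple = defaultdict(int)  # (c, i, xi, j, xj) -> count
        for row, c in zip(X, y):
            for i, xi in enumerate(row):
                self.attr[(i, xi)] += 1
                self.pair[(c, i, xi)] += 1
                for j, xj in enumerate(row):
                    if j != i:
                        self.triple[(c, i, xi, j, xj)] += 1
        return self

    def _joint(self, c, row):
        parents = [i for i, xi in enumerate(row) if self.attr[(i, xi)] >= self.m]
        if not parents:  # fall back to using every attribute as a parent
            parents = list(range(self.d))
        k = len(self.classes)
        total = 0.0
        for i in parents:
            xi = row[i]
            n_cxi = self.pair[(c, i, xi)]
            # P(c, x_i) with Laplace smoothing; each feature has 2 values
            p = (n_cxi + 1) / (self.n + 2 * k)
            for j, xj in enumerate(row):
                if j != i:
                    # P(x_j | c, x_i) with Laplace smoothing
                    p *= (self.triple[(c, i, xi, j, xj)] + 1) / (n_cxi + 2)
            total += p
        return total / len(parents)

    def predict_proba(self, row):
        scores = {c: self._joint(c, row) for c in self.classes}
        z = sum(scores.values())
        return {c: s / z for c, s in scores.items()}

    def predict(self, row):
        proba = self.predict_proba(row)
        return max(proba, key=proba.get)


# Columns: buffer, overflow, sql, injection (1 = word present)
X = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0],
     [0, 0, 1, 1], [0, 0, 1, 0], [0, 0, 0, 1]]
y = ["memory"] * 3 + ["web"] * 3
model = AODE().fit(X, y)
print(model.predict([1, 1, 0, 0]), model.predict([0, 0, 1, 1]))
```

For a realistic vocabulary the products should be accumulated in log space to avoid floating-point underflow; the sketch keeps raw probabilities for readability.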