Journal of Frontiers of Computer Science and Technology ›› 2014, Vol. 8 ›› Issue (12): 1442-1451. DOI: 10.3778/j.issn.1673-9418.1410007

• System Software and Software Engineering •

A Software Defect Prediction Method Based on Cost-Sensitive Classification

李  勇1,2+,黄志球1,房丙午1,王  勇1   

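  1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
  2. Key Laboratory of Network Information Security and Public Opinion Analysis, Xinjiang Normal University, Urumqi 830054, China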
  • Publication date: 2014-12-01 Release date: 2014-12-08

Using Cost-Sensitive Classification for Software Defects Prediction

LI Yong1,2+, HUANG Zhiqiu1, FANG Bingwu1, WANG Yong1   

  1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
  2. Key Laboratory of Network Information Security and Public Opinion Analysis, Xinjiang Normal University, Urumqi 830054, China
  • Online: 2014-12-01 Published: 2014-12-08

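摘要 (Abstract): Software defect prediction is an important way to improve software testing efficiency and ensure software reliability. Because a defect prediction model incurs different costs for different kinds of misclassification of software modules, this paper proposes a method for building software defect prediction models based on cost-sensitive classification. For code attribute metric data, training samples are drawn randomly with replacement multiple times, in the manner of Bagging, to build cost-sensitive decision tree base classifiers. The base classifiers are then combined by voting to predict defects in software modules, and a method is given for determining the optimal value of the cost factor during model construction. Simulation experiments on the public NASA software defect prediction datasets show that, while maintaining the defect prediction rate, the method markedly reduces the false alarm rate, and that its comprehensive evaluation metrics, AUC and F-measure, are both better than those of existing methods.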

关键词 (Keywords): software defect prediction, cost-sensitive classification, optimal cost factor, decision tree, ensemble algorithm

Abstract: Software defect prediction is regarded as an effective means of optimizing quality assurance activities. Because software defect prediction models incur different misclassification costs when classifying unseen software modules, this paper proposes a cost-sensitive classification method for constructing such models. First, for code attribute metric data, a decision tree algorithm is used to construct cost-sensitive base classifiers from Bagging samples drawn with replacement. Then, the defect prediction model is built by combining the base classifiers under the majority rule, and an approach for obtaining an approximately optimal cost-factor value is studied. Experimental results on the NASA software defect prediction datasets show that the proposed method is, on average, superior to conventional methods, with a lower probability of false alarm and higher comprehensive evaluation values.

Keywords: software defect prediction, cost-sensitive classification, optimal cost factor, decision tree, ensemble algorithm
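
The pipeline outlined in the abstract (bootstrap sampling, cost-sensitive tree base classifiers, majority voting) can be illustrated with a minimal Python sketch. This is not the authors' implementation. To stay short it swaps full decision trees for depth-one trees (decision stumps), and it assumes a false alarm costs 1 while a missed defect costs `cost_factor`. The function names `fit_stump`, `fit_bagged`, and `rates`, and the toy data, are hypothetical. The paper's procedure for choosing the optimal cost factor is not reproduced here.

```python
import random


def fit_stump(X, y, cost_factor):
    """Fit a depth-one decision tree that minimises total misclassification cost.

    Labels: 1 = defective module, 0 = clean module.
    Assumption: a missed defect costs `cost_factor`, a false alarm costs 1.
    """
    def leaf(labels):
        defects = sum(labels)
        clean = len(labels) - defects
        cost_if_clean = cost_factor * defects  # every defect in the leaf is missed
        cost_if_defect = clean                 # every clean module is a false alarm
        if cost_if_defect < cost_if_clean:
            return 1, cost_if_defect
        return 0, cost_if_clean

    best = None
    for j in range(len(X[0])):                   # try every feature
        for t in sorted({row[j] for row in X}):  # and every observed threshold
            left, right = [], []
            for row, label in zip(X, y):
                (left if row[j] <= t else right).append(label)
            (l_label, l_cost), (r_label, r_cost) = leaf(left), leaf(right)
            if best is None or l_cost + r_cost < best[0]:
                best = (l_cost + r_cost, j, t, l_label, r_label)
    _, j, t, l_label, r_label = best
    return lambda row: l_label if row[j] <= t else r_label


def fit_bagged(X, y, cost_factor, n_estimators=25, seed=0):
    """Train stumps on bootstrap samples and combine them by majority vote."""
    rng = random.Random(seed)
    n = len(X)
    stumps = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], cost_factor))

    def predict(row):
        votes = sum(stump(row) for stump in stumps)
        return 1 if 2 * votes > len(stumps) else 0
    return predict


def rates(y_true, y_pred):
    """Return (detection rate, false alarm rate); needs both classes in y_true."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp / (tp + fn), fp / (fp + tn)


# Hypothetical toy data: one code metric per module, one defective module.
X = [[1], [2], [3], [4], [5], [6]]
y = [0, 0, 0, 0, 1, 0]
for c in (1, 5):
    stump = fit_stump(X, y, c)
    print(c, rates(y, [stump(row) for row in X]))
```

On this toy data a single stump with `cost_factor=1` labels every module clean, while `cost_factor=5` catches the defective module at the price of one false alarm. That is the trade-off a cost factor controls, which is why its value has to be chosen with care.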