Journal of Frontiers of Computer Science and Technology (计算机科学与探索) ›› 2016, Vol. 10 ›› Issue (1): 43-55. DOI: 10.3778/j.issn.1673-9418.1505089

• System Software and Software Engineering •

Cross-Project Software Defect Prediction Based on Instance Transfer

MAO Fagui (毛发贵), LI Biwen (李碧雯), SHEN Beijun (沈备军)+

  1. School of Software, Shanghai Jiao Tong University, Shanghai 200240, China
  • Online: 2016-01-01 Published: 2016-01-07

Abstract: Cross-project defect prediction is an effective way to cope with the shortage of training data early in a project. However, differences between projects reduce its prediction accuracy. To address this problem, this paper proposes a cross-project defect prediction approach based on instance transfer. The approach combines transfer learning with adaptive boosting to extract, from the datasets of other projects, training data that is highly related to the target dataset and to transfer it to the target project, yielding a more effective combined classification model. Comparative experiments on the PROMISE datasets show that the proposed approach avoids the polarized results seen in single-source, single-target defect prediction and achieves higher precision and recall. When the target project has little data of its own, the approach matches or even exceeds within-project defect prediction trained on ample data.

Key words: cross-project defect prediction, transfer learning, instance-based transfer, adaptive boosting
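
The abstract names instance transfer and adaptive boosting but does not spell out the algorithm. As an illustration only, the sketch below shows one well-known way to combine the two, a TrAdaBoost-style weight update: source-project instances that the current weak learner misclassifies are down-weighted, while misclassified target-project instances are up-weighted. This is an assumed stand-in, not the paper's method. The function name `tradaboost_reweight`, its parameters, and the error clipping are hypothetical choices made for this example.

```python
import math

def tradaboost_reweight(src_w, tgt_w, src_wrong, tgt_wrong, n_rounds):
    """One TrAdaBoost-style weight update (illustrative, hypothetical helper).

    src_w, tgt_w         : current weights of source / target instances
    src_wrong, tgt_wrong : True where the weak learner misclassified
    n_rounds             : total number of boosting rounds planned
    Assumes at least one source instance and one target instance.
    """
    n_src = len(src_w)
    # Fixed shrink factor for misclassified source instances, in (0, 1].
    beta_src = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n_src) / n_rounds))
    # Weighted error of the weak learner, measured on target data only.
    eps = sum(w for w, bad in zip(tgt_w, tgt_wrong) if bad) / sum(tgt_w)
    # Assumption: clip the error so that beta_tgt stays inside (0, 1).
    eps = min(max(eps, 1e-10), 0.499)
    beta_tgt = eps / (1.0 - eps)
    # Source instances that look unlike the target lose influence.
    new_src = [w * beta_src if bad else w for w, bad in zip(src_w, src_wrong)]
    # Hard target instances gain influence, as in ordinary AdaBoost.
    new_tgt = [w / beta_tgt if bad else w for w, bad in zip(tgt_w, tgt_wrong)]
    # Renormalise so that all weights sum to 1.
    total = sum(new_src) + sum(new_tgt)
    return [w / total for w in new_src], [w / total for w in new_tgt]

# Toy usage: four source and four target instances, one error in each group.
src, tgt = tradaboost_reweight(
    [0.125] * 4, [0.125] * 4,
    [True, False, False, False],
    [True, False, False, False],
    n_rounds=10,
)
```

Repeating this update over several rounds gradually concentrates weight on the source instances that behave like the target project, which is the intuition behind selecting training data that is highly related to the target dataset.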