Journal of Frontiers of Computer Science and Technology ›› 2023, Vol. 17 ›› Issue (12): 3052-3064. DOI: 10.3778/j.issn.1673-9418.2209087

• Software Engineering •

Impact of Hyperparameter Optimization on Cross-Version Defect Prediction: An Empirical Study

HAN Hui, YU Qiao, ZHU Yi

  1. School of Computer Science and Technology, Jiangsu Normal University, Xuzhou, Jiangsu 221116, China
  • Online: 2023-12-01 Published: 2023-12-01

Abstract: In machine learning, hyperparameters are one of the key factors affecting model performance. Previous studies have shown that hyperparameter optimization can significantly improve the performance of within-version and cross-project defect prediction, but its impact on cross-version defect prediction remains unclear. This paper selects five classical defect prediction models (decision tree, K-nearest neighbors, random forest, support vector machine, and multi-layer perceptron) and four common hyperparameter optimization algorithms (TPE-based Bayesian optimization, SMAC-based Bayesian optimization, random search, and simulated annealing), and conducts an empirical study on the PROMISE dataset to explore how hyperparameter optimization affects cross-version defect prediction performance. The results show that: first, the AUC of cross-version defect prediction improves significantly after the hyperparameters of the decision tree, K-nearest neighbors, and multi-layer perceptron models are optimized; second, the optimized models remain about as stable as the models with default hyperparameter settings; third, except for the more complex multi-layer perceptron, hyperparameter optimization takes 1 to 2 minutes per model on average, so optimizing model hyperparameters is feasible in cross-version defect prediction experiments. These results suggest that cross-version defect prediction should consider optimizing model hyperparameters to improve prediction performance.

Key words: software defect prediction, cross-version defect prediction, hyperparameter optimization
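
The comparison the abstract describes (train on an earlier version, tune hyperparameters, then measure AUC on a later version) can be illustrated with a minimal sketch. This is not the authors' experimental setup: the synthetic data from `make_version`, the hand-written K-nearest neighbors scorer, the search range for `k`, the 70/30 validation split, and all seeds are assumptions chosen only to keep the example self-contained. The default of `k = 5` mirrors the common library default for K-nearest neighbors, and random search stands in for the four optimizers studied.

```python
import random

def make_version(seed, n=200, shift=0.0):
    """Hypothetical stand-in for one release: a list of (metrics, label) per module."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        defective = 1 if rng.random() < 0.3 else 0
        center = 1.0 + shift if defective else 0.0
        data.append(([rng.gauss(center, 1.0), rng.gauss(center, 1.0)], defective))
    return data

def auc(labels, scores):
    """Probability that a defective module is scored above a clean one (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def knn_scores(train, test, k):
    """Score each test module by the defective fraction among its k nearest training modules."""
    scores = []
    for x, _ in test:
        nearest = sorted(
            train, key=lambda m: sum((a - b) ** 2 for a, b in zip(m[0], x))
        )[:k]
        scores.append(sum(y for _, y in nearest) / k)
    return scores

def evaluate(train, test, k):
    """AUC on `test` of a k-NN scorer fitted on `train`."""
    return auc([y for _, y in test], knn_scores(train, test, k))

def random_search(train, n_iter=10, seed=0):
    """Pick k by random search, using only a held-out part of the training version."""
    rng = random.Random(seed)
    shuffled = train[:]
    rng.shuffle(shuffled)
    cut = int(0.7 * len(shuffled))
    fit_part, val_part = shuffled[:cut], shuffled[cut:]
    best_k, best_auc = None, -1.0
    for _ in range(n_iter):
        k = rng.randint(1, 30)  # assumed search range
        score = evaluate(fit_part, val_part, k)
        if score > best_auc:
            best_k, best_auc = k, score
    return best_k, best_auc

# Earlier version is used for training and tuning; later version only for testing.
old_version = make_version(seed=1)
new_version = make_version(seed=2, shift=0.2)

default_k = 5
tuned_k, validation_auc = random_search(old_version)
default_auc = evaluate(old_version, new_version, default_k)
tuned_auc = evaluate(old_version, new_version, tuned_k)

print(f"default k={default_k}: cross-version AUC = {default_auc:.3f}")
print(f"tuned   k={tuned_k}: cross-version AUC = {tuned_auc:.3f}")
```

The later version never enters the search, so the tuned AUC is an honest cross-version estimate. On synthetic data like this, tuning may or may not beat the default; the sketch shows the protocol, not the paper's result.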