Journal of Frontiers of Computer Science and Technology

• Science Researches •

Research on Software Defect Prediction Models Combining Static Analysis Warnings

WU Haitao, MA Jingyue, GAO Jianhua   

  1. The College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 200234, China

融合静态分析警告的软件缺陷预测模型及其应用研究

吴海涛, 马景悦, 高建华   

  1. 上海师范大学 信息与机电工程学院, 上海 200234

Abstract: Static analysis warnings are an important software quality metric and are widely used to identify potential violations in source code. Recent studies have applied static analysis warnings to code smell detection and just-in-time defect prediction, but not to the early stages of a project, when commit history is lacking. To address this gap, the paper takes warning information from three popular static analysis tools and incorporates it as a new metric into an existing defect prediction model, yielding a model that covers both software development and code maintainability. The paper investigates the potential relationship between static analysis warnings and defects, the impact of incorporating warnings on the performance of defect prediction models, and their influence in cross-project scenarios. The experimental results indicate that the number of warnings is closely and positively correlated with the distribution of defects, which suggests that warnings have considerable potential as a metric in defect prediction models; in addition, the warnings reported for defective data are often related to coding standards. After warnings were incorporated, the defect prediction model's average precision improved by 1.4%-14.7%, average recall by 0.2%-2.4%, average F1 by 0.3%-3.0%, and average AUC by 0.2%-1.4% across projects. In cross-project scenarios, the metric CODE+SAW_VIF yields the best-performing defect prediction model. Overall, incorporating static analysis warnings improves the model's ability to identify defects.

Key words: Software defects, Static analysis tools, Static analysis warnings, Code metrics, Cross-project scenario prediction

摘要: 静态分析警告作为一种重要的软件质量指标,被广泛用于识别源代码中潜在的违规问题。近期的研究表明,静态分析警告在代码异味检测以及即时缺陷预测中有所应用,但在项目早期缺少提交修改记录的情况下没有涉及,针对上述问题,文中利用三种流行的静态分析工具的警告信息,在原有的缺陷预测模型中融合静态分析警告这一新的度量,构建一个涵盖软件开发和代码可维护性两个方面的缺陷预测模型,并探究静态分析警告与缺陷的潜在关系,融合警告对软件缺陷预测模型性能的影响以及在跨项目场景中的影响。实验结果表明,警告数量往往与缺陷分布密切相关,呈现正相关的关系,即警告这一度量在软件缺陷预测模型中有相当大的潜力,并且在有缺陷数据中报告的警告信息往往与编码规范相关;融合警告之后,缺陷预测模型在各项目的平均精度提高1.4%-14.7%,平均召回率提高0.2%-2.4%,平均F1提高0.3%-3.0%,平均AUC提高0.2%-1.4%。在跨项目场景中,确定了CODE+SAW_VIF这一度量提供了最佳性能的缺陷预测模型。从性能来看,融合静态分析警告能够提升模型识别缺陷的性能。

关键词: 软件缺陷, 静态分析工具, 静态分析警告, 代码度量, 跨项目场景预测
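
The abstract reports a positive correlation between warning counts and defects, and it evaluates models by precision, recall and F1. As an illustration only, the sketch below shows how such quantities are commonly computed. It is not the authors' implementation: the per-module numbers and labels are hypothetical, and the choice of the Pearson coefficient is an assumption, since the abstract does not name the correlation measure used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for binary labels (1 = defective)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical per-module data, invented for this sketch.
warning_counts = [0, 2, 5, 1, 8, 3]   # static analysis warnings per module
defect_counts = [0, 1, 2, 0, 4, 1]    # known defects per module
r = pearson(warning_counts, defect_counts)

# Hypothetical ground-truth labels and model predictions.
y_true = [0, 1, 1, 0, 1, 1]
y_pred = [0, 1, 1, 1, 1, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)

print(f"correlation: {r:.3f}")
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

A value of `r` above zero for real project data would be consistent with the positive relationship the abstract describes. The averaged improvements it reports would come from repeating the metric computation per project, with and without the warning features.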