Journal of Frontiers of Computer Science and Technology ›› 2018, Vol. 12 ›› Issue (9): 1420-1433. DOI: 10.3778/j.issn.1673-9418.1707016

• System Software and Software Engineering •

Multi-Objective Optimization Based Feature Selection Method for Software Defect Prediction

CHEN Xiang1,2, SHEN Yuxiang1, MENG Shaoqing3+, CUI Zhanqi4, JU Xiaolin1,2, WANG Zan5

  1. School of Computer Science and Technology, Nantong University, Nantong, Jiangsu 226019, China
  2. Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
  3. Information and Network Center, Tianjin University, Tianjin 300072, China
  4. Computer School, Beijing Information Science and Technology University, Beijing 100101, China
  5. School of Computer Software, Tianjin University, Tianjin 300072, China
  • Online: 2018-09-01 Published: 2018-09-10

Abstract: Software defect prediction identifies potentially defective modules in advance, so that testers can allocate more testing resources to those modules and improve software quality. When defect prediction datasets are gathered, measuring program modules with many different metrics (i.e., features) can lead to the curse of dimensionality. Feature selection is an effective way to alleviate this problem: it aims to identify and remove as many redundant and irrelevant features as possible. However, designing effective feature selection methods is challenging. This paper formulates feature selection for software defect prediction as a multi-objective optimization problem. One objective is to minimize the number of selected features; the other is to maximize the performance of the defect prediction model subsequently trained on them. The paper then proposes a novel method, MOFES (multi-objective optimization feature selection), to balance these two potentially conflicting objectives. To verify the effectiveness of MOFES, the paper uses the PROMISE and RELINK datasets, gathered from real open source projects, and compares MOFES with several classical baseline methods, such as GFS, GBS and SOFS. The results show that, at an acceptable computational cost, MOFES selects fewer features and achieves better prediction performance in most cases.

Key words: software defect prediction, search based software engineering, feature selection, multi-objective optimization
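
The two-objective formulation described in the abstract can be illustrated with a minimal Python sketch. It is not the MOFES implementation: the abstract does not specify the search strategy or the performance measure, so this sketch enumerates every non-empty subset of four features exhaustively and uses a hypothetical `toy_score` function in place of a trained defect prediction model. The names `evaluate`, `dominates`, `pareto_front`, and `toy_score` are all assumptions introduced for illustration.

```python
from itertools import product
from typing import Callable, Iterable, List, Tuple

Subset = Tuple[int, ...]        # bit mask: 1 means the feature is kept
Objectives = Tuple[int, float]  # (subset size, prediction score)


def evaluate(subset: Subset, score: Callable[[Subset], float]) -> Objectives:
    """Return both objectives: size (to minimize) and score (to maximize)."""
    return sum(subset), score(subset)


def dominates(a: Objectives, b: Objectives) -> bool:
    """a dominates b if it is no worse on both objectives and better on one."""
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def pareto_front(
    subsets: Iterable[Subset], score: Callable[[Subset], float]
) -> List[Tuple[Subset, Objectives]]:
    """Keep only the subsets that no other candidate dominates."""
    scored = [(s, evaluate(s, score)) for s in subsets]
    return [
        (s, obj)
        for s, obj in scored
        if not any(dominates(other, obj) for _, other in scored)
    ]


def toy_score(subset: Subset) -> float:
    """Hypothetical stand-in for model performance (not from the paper).

    Features 0 and 2 are useful, feature 1 is redundant with feature 0,
    and feature 3 is irrelevant.
    """
    redundant_gain = 0.0 if subset[0] else 20.0 * subset[1]
    return 50.0 + 30.0 * subset[0] + 15.0 * subset[2] + redundant_gain


# All non-empty subsets of four features; a real search would sample instead.
candidates = [s for s in product((0, 1), repeat=4) if any(s)]

for subset, (size, performance) in pareto_front(candidates, toy_score):
    print(subset, size, performance)
```

With this toy scoring function the front contains two subsets: feature 0 alone, and features 0 and 2 together. Every subset that adds the redundant or irrelevant feature is dominated, which is the trade-off the two objectives are meant to expose. Exhaustive enumeration is only feasible for a handful of features, which is why a search-based method is needed on real datasets.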