Journal of Frontiers of Computer Science and Technology ›› 2016, Vol. 10 ›› Issue (10): 1451-1458. DOI: 10.3778/j.issn.1673-9418.1509042

• Artificial Intelligence and Pattern Recognition •

Backtracking Algorithm for Cost-Sensitive Feature Selection Based on Divide and Conquer Strategy

HUANG Weiting1+, ZHAO Hong2, ZHU William2

  1. School of Computer Science, Minnan Normal University, Zhangzhou, Fujian 363000, China
  2. Key Laboratory of Granular Computing and Its Applications, Minnan Normal University, Zhangzhou, Fujian 363000, China
  • Online: 2016-10-01  Published: 2016-09-29

Abstract: Cost-sensitive feature selection is an important research area in data mining. It aims to obtain the attribute subset with the minimum total cost by trading off test cost against misclassification cost. To address the long running time of the classical backtracking algorithm, this paper proposes an improved backtracking algorithm that incorporates the divide-and-conquer strategy. The improved algorithm introduces two related parameters, adapts them to the size of the dataset, and splits the dataset according to them, reducing the problem size and thereby improving the efficiency of the classical backtracking algorithm. Experiments on relatively large datasets show that, while preserving the quality of the result, the improved algorithm runs at least 20% faster than the classical backtracking algorithm; compared with the heuristic algorithm, it obtains an attribute set with a smaller total cost at comparable efficiency, and is therefore applicable to practical problems.

Key words: rough sets, granular computing, cost-sensitive, feature selection, adaptive divide and conquer
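The abstract only summarizes the approach; the authors' actual algorithm, parameters, and cost model are not given here. Purely as an illustrative sketch of the general problem setting (not the paper's method), a plain backtracking search over attribute subsets that minimizes total cost, i.e. accumulated test cost plus a misclassification-cost term, with a simple lower-bound prune, might look like the following. All names, cost figures, and the `misclass_cost` model are hypothetical stand-ins.

```python
def backtrack_select(test_costs, misclass_cost, n_attrs):
    """Return (best_total_cost, best_subset) over all subsets of attributes.

    test_costs:    dict mapping attribute index -> cost of measuring it
    misclass_cost: function(subset) -> expected misclassification cost;
                   assumed non-negative, so accumulated test cost is a
                   valid lower bound on the total cost (safe to prune on)
    """
    best = [float("inf"), frozenset()]  # [best total cost, best subset]

    def search(i, chosen, tc):
        # Prune: if test cost alone already reaches the best known total,
        # no completion of this branch can improve on it.
        if tc >= best[0]:
            return
        if i == n_attrs:  # all attributes decided: evaluate this subset
            total = tc + misclass_cost(chosen)
            if total < best[0]:
                best[0], best[1] = total, frozenset(chosen)
            return
        # Branch 1: include attribute i (pay its test cost).
        chosen.add(i)
        search(i + 1, chosen, tc + test_costs[i])
        chosen.remove(i)
        # Branch 2: exclude attribute i.
        search(i + 1, chosen, tc)

    search(0, set(), 0.0)
    return best[0], best[1]

# Toy example with made-up numbers: 3 attributes, and a stand-in error
# model where misclassification cost falls as more attributes are chosen.
costs = {0: 2.0, 1: 5.0, 2: 1.0}

def mc(subset):
    return 10.0 / (1 + len(subset))  # placeholder for a real error model

total, subset = backtrack_select(costs, mc, 3)
# With these toy costs the optimum is {2}: total = 1.0 + 10/2 = 6.0
```

The paper's divide-and-conquer contribution — splitting the dataset by two adaptively chosen parameters before searching — sits on top of such a search and is not reproduced here, since the abstract does not specify the splitting scheme.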