Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2013, Vol. 7, Issue (2): 126-135. DOI: 10.3778/j.issn.1673-9418.1206053

• Academic Research •

Decision-Theoretic Rough Set and Cost-Sensitive Classification

LI Huaxiong (李华雄)1,2+, ZHOU Xianzhong (周献中)1, HUANG Bing (黄兵)3, ZHAO Jiabao (赵佳宝)1

  1. School of Management and Engineering, Nanjing University, Nanjing 210093, China
  2. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China
  3. School of Information Science, Nanjing Audit University, Nanjing 211815, China
  • Online: 2013-02-01 Published: 2013-02-01

Abstract: This paper proposes a cost-sensitive classification method that combines cost-sensitive learning with the decision-theoretic rough set model. Using decision-theoretic rough set theory and attribute reduction, the method computes an optimal test attribute set for each sample to be predicted, so that classifying the sample on that set incurs the lowest misclassification cost and test cost; the label obtained on that set is the sample's minimum-total-cost classification. Because searching for the globally optimal test attribute set has high computational complexity, the paper presents a heuristic search algorithm for a locally optimal test attribute set. The algorithm takes each single attribute's contribution to reducing the total classification cost as its heuristic function, searches for the locally optimal test attribute set of each sample, and outputs the sample's cost-sensitive classification on that set. Experiments on UCI datasets show that the proposed algorithm effectively reduces both the total classification cost and the number of test attributes, so that the classification results have both a lower misclassification cost and a lower test cost.

Key words: decision-theoretic rough set, cost-sensitive, attribute reduction, misclassification cost, test cost
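
The abstract gives no formulas, so the following is only a minimal sketch of the general idea of a per-sample heuristic search, not the authors' algorithm. It assumes that the total cost of a test attribute set is the minimum expected misclassification cost (estimated from the training rows that agree with the sample on those attributes) plus the sum of the attributes' test costs, and it uses the raw decrease in total cost as a stand-in for the paper's contribution-rate heuristic. All names (`min_risk_label`, `greedy_test_attributes`, `miscost`, `test_cost`) and the toy data are hypothetical.

```python
from collections import Counter


def min_risk_label(train, sample, attrs, miscost):
    """Return (label, risk) with minimum expected misclassification cost.

    Probabilities are estimated from the training rows that agree with
    `sample` on every attribute in `attrs` (its equivalence class).
    `miscost[true][pred]` is the cost of predicting `pred` when the
    actual label is `true`.
    """
    block = [y for x, y in train if all(x[a] == sample[a] for a in attrs)]
    if not block:
        # Assumption: no matching rows means no evidence, so never pick this set.
        return None, float("inf")
    counts = Counter(block)
    n = len(block)
    labels = sorted({y for _, y in train})
    risks = {
        pred: sum(counts[true] / n * miscost[true][pred] for true in labels)
        for pred in labels
    }
    best = min(labels, key=lambda pred: risks[pred])
    return best, risks[best]


def greedy_test_attributes(train, sample, test_cost, miscost):
    """Greedily add the attribute that lowers total cost the most.

    Total cost = expected misclassification cost + summed test costs.
    Stops when no remaining attribute lowers the total (a local optimum).
    """
    chosen = []
    label, total = min_risk_label(train, sample, chosen, miscost)
    remaining = set(test_cost)
    while remaining:
        best = None
        spent = sum(test_cost[a] for a in chosen)
        for a in sorted(remaining):  # sorted for deterministic tie-breaking
            lab, risk = min_risk_label(train, sample, chosen + [a], miscost)
            cand_total = risk + spent + test_cost[a]
            if cand_total < total and (best is None or cand_total < best[1]):
                best = (a, cand_total, lab)
        if best is None:
            break
        chosen.append(best[0])
        total, label = best[1], best[2]
        remaining.remove(best[0])
    return chosen, label, total


# Toy data (hypothetical): attribute "a" determines the label, "b" is noise.
train = [
    ({"a": 0, "b": 0}, "neg"),
    ({"a": 0, "b": 1}, "neg"),
    ({"a": 1, "b": 0}, "pos"),
    ({"a": 1, "b": 1}, "pos"),
]
miscost = {"pos": {"pos": 0, "neg": 10}, "neg": {"pos": 5, "neg": 0}}
sample = {"a": 1, "b": 0}

print(greedy_test_attributes(train, sample, {"a": 1, "b": 1}, miscost))
```

In this toy setting the search tests only attribute `a` when its test cost is low, and tests nothing when the test cost outweighs the expected reduction in misclassification cost. That is the trade-off between misclassification cost and test cost that the abstract describes.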