Journal of Frontiers of Computer Science and Technology ›› 2022, Vol. 16 ›› Issue (11): 2619-2627. DOI: 10.3778/j.issn.1673-9418.2104117

• Theory and Algorithm •

Attribute Selection via Maximizing Independent-and-Effective Classification Information Ratio

LIU Ye1,2, DAI Jianhua1,2,+, CHEN Jiaolong1,2

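  1. Hunan Provincial Key Laboratory of Intelligent Computing and Language Information Processing, Hunan Normal University, Changsha 410081
  2. College of Information Science and Engineering, Hunan Normal University, Changsha 410081
  • Received: 2021-04-30 Revised: 2021-06-15 Publication date: 2022-11-01 Release date: 2021-06-17
  • Corresponding author: + E-mail: jhdai@hunnu.edu.cn
  • About the authors: LIU Ye (born 1996), female, from Jiangxi, M.S. candidate. Her main research interests are knowledge discovery and artificial intelligence.
    DAI Jianhua (born 1977), male, from Hubei, Ph.D., professor, Ph.D. supervisor. His main research interests are artificial intelligence, soft computing, granular computing, knowledge discovery, and intelligent information processing.
    CHEN Jiaolong (born 1996), male, from Hunan, M.S. candidate. His main research interests are rough sets and fuzzy sets.
  • Funding:
    National Natural Science Foundation of China (61976089); National Natural Science Foundation of China (61473259); Science and Technology Project of Hunan Province (2018RS3065); Science and Technology Project of Hunan Province (2018TP1018); Postgraduate Scientific Research Innovation Project of Hunan Province (CX20200552)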

Attribute Selection via Maximizing Independent-and-Effective Classification Information Ratio

LIU Ye1,2, DAI Jianhua1,2,+, CHEN Jiaolong1,2

  1. Hunan Provincial Key Laboratory of Intelligent Computing and Language Information Processing, Hunan Normal University, Changsha 410081, China
  2. College of Information Science and Engineering, Hunan Normal University, Changsha 410081, China
  • Received: 2021-04-30 Revised: 2021-06-15 Online: 2022-11-01 Published: 2021-06-17
  • About author: LIU Ye, born in 1996, M.S. candidate. Her research interests include knowledge discovery and artificial intelligence.
    DAI Jianhua, born in 1977, Ph.D., professor, Ph.D. supervisor. His research interests include artificial intelligence, soft computing, granular computing, knowledge discovery and intelligent information processing.
    CHEN Jiaolong, born in 1996, M.S. candidate. His research interests include rough sets and fuzzy sets.
  • Supported by:
    National Natural Science Foundation of China (61976089); National Natural Science Foundation of China (61473259); Science and Technology Project of Hunan Province (2018RS3065); Science and Technology Project of Hunan Province (2018TP1018); Innovation Foundation for Postgraduate of Hunan Province (CX20200552)

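Abstract:

Attribute selection in rough set theory has significant practical value. Most existing attribute selection methods overlook the relationship among three quantities: the classification information provided by a candidate attribute, the redundant information it carries, and the classification information retained by the already selected attributes when the candidate is added. To address this, an attribute significance evaluation function, the effective classification information ratio, is first defined using traditional mutual information, and an attribute selection method based on it is proposed. This method effectively selects candidate attributes that provide a large amount of effective classification information while carrying little redundant information. In addition, considering the influence of a newly added candidate attribute on the classification information retained by the selected attributes, the concept of the independent-and-effective classification information ratio is further proposed, and an improved attribute selection method based on it is constructed. The improved method helps balance the effective classification information and the redundant information of attributes while improving the overall discriminative ability of the attribute subset. Finally, comparative experiments against existing attribute selection methods are conducted in terms of classification performance and statistical tests, and the results demonstrate the effectiveness of the two proposed methods.

Key words: rough set theory, attribute selection, independent-and-effective classification information ratio, mutual information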

Abstract:

Attribute selection in rough set theory has wide practical value. Most existing attribute selection approaches neglect the relationship among the classification information and the redundant information brought by a candidate attribute and the classification information retained by the selected attributes when the candidate attribute is added. Therefore, a significance evaluation function based on the effective classification information ratio is defined for attribute selection, and an attribute selection approach using this ratio is proposed, which effectively selects attributes that provide abundant effective classification information and little redundant information. Furthermore, considering the influence of the candidate attribute on the classification information retained by the selected attributes, a second significance evaluation function, the independent-and-effective classification information ratio, is introduced, and an improved attribute selection approach is proposed. This approach helps balance the effective classification information and the redundant information of the attributes and improves the overall recognition ability of the selected attribute subset. Finally, comparative experiments are conducted in terms of classification performance and the statistical Bonferroni-Dunn test, and the experimental results show that the proposed attribute selection approaches are effective.

Key words: rough set theory, attribute selection, independent-and-effective classification information ratio, mutual information
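
The abstract does not give the formulas for the two ratios, so the following is only a rough sketch of the general idea it describes: greedy forward selection that rewards the class information a candidate attribute adds and penalizes the information it shares with attributes already selected. The function `illustrative_ratio`, its `1 + redundancy` denominator, and the toy data layout (rows of discrete values, attributes and decision addressed by column index) are assumptions made for illustration and are not the paper's definitions.

```python
from collections import Counter
from math import log2


def entropy(columns, rows):
    """Shannon entropy (in bits) of the joint distribution of the given columns."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    n = len(rows)
    return -sum((k / n) * log2(k / n) for k in counts.values())


def mutual_information(xs, ys, rows):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) for two lists of column indices."""
    return entropy(xs, rows) + entropy(ys, rows) - entropy(xs + ys, rows)


def illustrative_ratio(candidate, selected, decision, rows):
    """Hypothetical score (NOT the paper's formula): new class info / (1 + redundancy)."""
    # Class information the candidate adds beyond the selected attributes:
    # I(S + {a}; D) - I(S; D).
    gain = mutual_information(selected + [candidate], [decision], rows)
    redundancy = 0.0
    if selected:
        gain -= mutual_information(selected, [decision], rows)
        # Information the candidate shares with the already selected attributes.
        redundancy = mutual_information([candidate], selected, rows)
    return gain / (1.0 + redundancy)


def greedy_select(attributes, decision, rows, k):
    """Pick up to k attributes, stopping early when no candidate adds class information."""
    selected = []
    remaining = list(attributes)
    while remaining and len(selected) < k:
        best = max(
            remaining,
            key=lambda a: illustrative_ratio(a, selected, decision, rows),
        )
        if illustrative_ratio(best, selected, decision, rows) <= 1e-12:
            break
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Toy table: columns 0-2 are attributes, column 3 is the decision.
    # Column 1 duplicates column 0, so it is purely redundant.
    table = [(0, 0, 0, 0), (0, 0, 1, 1), (1, 1, 0, 2), (1, 1, 1, 3)]
    print(greedy_select([0, 1, 2], 3, table, 3))
```

On the toy table, the duplicated attribute in column 1 adds no class information once column 0 is selected, so the loop stops after choosing columns 0 and 2. A ratio form was chosen here only because the abstract speaks of an "information ratio"; the paper's actual evaluation functions may combine these quantities differently.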

CLC Number: