Attribute Selection via Maximizing Independent-and-Effective Classification Information Ratio

doi:10.3778/j.issn.1673-9418.2104117

Journal of Frontiers of Computer Science and Technology, 2022, Vol. 16, Issue (11): 2619-2627. DOI: 10.3778/j.issn.1673-9418.2104117

• Theory and Algorithm •

LIU Ye¹,², DAI Jianhua¹,²,⁺, CHEN Jiaolong¹,²

1. Hunan Provincial Key Laboratory of Intelligent Computing and Language Information Processing, Hunan Normal University, Changsha 410081, China
2. College of Information Science and Engineering, Hunan Normal University, Changsha 410081, China

Received: 2021-04-30   Revised: 2021-06-15   Online: 2022-11-01   Published: 2021-06-17
About author: LIU Ye, born in 1996, M.S. candidate. Her research interests include knowledge discovery and artificial intelligence.
DAI Jianhua, born in 1977, Ph.D., professor, Ph.D. supervisor. His research interests include artificial intelligence, soft computing, granular computing, knowledge discovery and intelligent information processing.
CHEN Jiaolong, born in 1996, M.S. candidate. His research interests include rough sets and fuzzy sets.
Supported by:
National Natural Science Foundation of China (61976089); National Natural Science Foundation of China (61473259); Science and Technology Project of Hunan Province (2018RS3065); Science and Technology Project of Hunan Province (2018TP1018); Innovation Foundation for Postgraduate of Hunan Province (CX20200552)

Corresponding author: ⁺ E-mail: jhdai@hunnu.edu.cn

Abstract

Attribute selection in rough set theory has considerable practical value. When evaluating a candidate attribute, most existing approaches neglect the relationship among three quantities: the classification information the candidate provides, the redundant information it introduces, and the classification information retained by the already selected attributes once the candidate is added. To address this, a significance evaluation function called the effective classification information ratio is first defined on the basis of traditional mutual information, and an attribute selection approach built on it is proposed; this approach favors attributes that provide a large amount of effective classification information while carrying little redundant information. Furthermore, to account for the influence of a candidate attribute on the classification information retained by the selected attributes, a second significance evaluation function, the independent-and-effective classification information ratio, is introduced together with an improved attribute selection approach. The improved approach helps balance the effective classification information against the redundant information of the attributes and improves the overall discriminative ability of the selected attribute subset. Finally, the two approaches are compared with existing attribute selection methods in terms of classification performance and the Bonferroni-Dunn statistical test, and the experimental results indicate that both proposed approaches are effective.
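
As a rough illustration of the kind of mutual-information-driven greedy selection the abstract describes, the sketch below scores each candidate attribute by a ratio of relevance to relevance plus redundancy. The abstract does not give the actual formulas of the two proposed ratios, so the score used here (`toy_ratio`) is a hypothetical stand-in and not the authors' evaluation function; the function names and the toy decision table are likewise assumptions made for illustration only.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (bits) of a sequence of hashable symbols."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two discrete columns."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))

def toy_ratio(candidate, selected, decision):
    """Hypothetical score (NOT the paper's formula): relevance to the
    decision divided by relevance plus mean redundancy with the
    already selected attributes."""
    relevance = mutual_information(candidate, decision)
    if not selected:
        return relevance
    redundancy = sum(mutual_information(candidate, s) for s in selected) / len(selected)
    total = relevance + redundancy
    return relevance / total if total > 0 else 0.0

def greedy_select(columns, decision, k):
    """Forward selection: repeatedly add the attribute with the best score."""
    chosen = []
    remaining = dict(columns)
    while remaining and len(chosen) < k:
        best = max(remaining, key=lambda name: toy_ratio(
            remaining[name], [columns[c] for c in chosen], decision))
        chosen.append(best)
        del remaining[best]
    return chosen

# Toy decision table (assumed data): b duplicates a, c is independent of a.
columns = {
    "a": [0, 0, 0, 0, 1, 1, 1, 1],
    "b": [0, 0, 0, 0, 1, 1, 1, 1],  # exact copy of a, hence fully redundant
    "c": [0, 0, 1, 1, 0, 0, 1, 1],
}
decision = [2 * x + y for x, y in zip(columns["a"], columns["c"])]
print(greedy_select(columns, decision, k=2))  # ['a', 'c']
```

On this toy table all three attributes are equally relevant on their own, yet the second pick is `c` rather than the duplicate `b`, because `b` adds only information that `a` already supplies. That is the trade-off between classification information and redundancy that the abstract refers to, shown here with a simplified score.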

Key words: rough set theory, attribute selection, independent-and-effective classification information ratio, mutual information

CLC Number: TP18

LIU Ye, DAI Jianhua, CHEN Jiaolong. Attribute Selection via Maximizing Independent-and-Effective Classification Information Ratio[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(11): 2619-2627.

Dataset	Samples	Attributes	Classes
Arrh	452	206	13
Car	1 728	6	4
Chess	3 196	36	2
Clean1	476	166	2
Colon	62	2 000	2
Glass	214	9	7
Libras	360	90	15
Lung	73	326	7
Lymph	148	18	4
Musk2	707	166	2
Vote	435	16	2
Wpbc33	198	32	2
Zoo	101	16	7

Classification accuracy (Acc/%)
Dataset	DISR	NJMIM	GainRatio	MIFS	mRMR	NMIFS	CIFE	MRI	ASECIR	ASIECIR
Arrh	62.42	62.85	60.42	61.74	61.74	62.85	56.21	61.08	60.86	61.30
Car	91.03	91.03	91.03	91.03	91.03	91.03	91.03	91.03	95.95	95.95
Chess	95.78	95.90	95.62	94.49	95.96	95.59	95.84	96.06	96.28	96.28
Clean1	87.60	82.54	84.87	85.70	83.81	83.82	83.38	84.45	84.24	85.90
Colon	90.24	95.24	86.90	91.67	88.33	91.90	88.81	95.00	93.33	93.33
Glass	57.84	57.84	56.97	58.38	55.54	54.59	57.42	57.42	57.38	57.86
Libras	61.94	70.00	64.17	73.33	66.67	61.39	65.56	66.11	68.89	70.28
Lung	92.14	89.11	90.54	89.11	91.79	91.96	77.14	91.96	87.68	89.46
Lymph	80.29	78.33	80.43	74.33	73.57	75.71	81.14	79.14	74.05	76.33
Musk2	90.36	91.51	91.80	90.38	92.07	90.80	93.07	91.23	91.09	91.66
Vote	93.31	92.63	92.18	94.01	93.56	93.56	92.41	92.40	96.55	97.02
Wpbc33	67.24	70.26	74.32	69.74	72.71	71.82	67.79	67.79	74.79	74.79
Zoo	92.88	93.57	93.10	92.65	93.79	93.79	91.72	94.71	93.55	93.57
Avg. Acc/%	81.77	82.37	81.72	82.04	81.58	81.45	80.12	82.18	82.66	83.36
Avg. Rank	5.77	5.04	6.42	5.81	5.62	6.04	6.96	5.08	5.08	3.19
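
The Avg. Rank row can be read as a mean rank across datasets: on each dataset the methods are ranked by accuracy (1 = best), tied methods share the mean of the positions they occupy, and the ranks are then averaged over the datasets. The sketch below shows that computation on the Car and Vote rows copied from the table above. The tie-handling rule is an assumption that is consistent with the fractional values in the row but is not stated on this page, and the function names are illustrative.

```python
def rank_row(row):
    """Rank the accuracies of one dataset (1 = highest); ties share the mean rank."""
    order = sorted(range(len(row)), key=lambda i: -row[i])
    ranks = [0.0] * len(row)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next accuracy equals the current one.
        while end + 1 < len(order) and row[order[end + 1]] == row[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1 .. end+1
        for j in range(pos, end + 1):
            ranks[order[j]] = shared
        pos = end + 1
    return ranks

def mean_rank_per_method(rows):
    """Average the per-dataset ranks column by column."""
    per_row = [rank_row(r) for r in rows]
    return [sum(col) / len(rows) for col in zip(*per_row)]

# Column order: DISR, NJMIM, GainRatio, MIFS, mRMR, NMIFS, CIFE, MRI, ASECIR, ASIECIR
car = [91.03, 91.03, 91.03, 91.03, 91.03, 91.03, 91.03, 91.03, 95.95, 95.95]
vote = [93.31, 92.63, 92.18, 94.01, 93.56, 93.56, 92.41, 92.40, 96.55, 97.02]

print(rank_row(car))
print(rank_row(vote))
print(mean_rank_per_method([car, vote]))
```

On the Car row the two tied leaders each receive rank 1.5 and the eight tied remaining methods each receive 6.5, so every row's ranks still sum to 55 for ten methods.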

Classification accuracy (Acc/%)
Dataset	DISR	NJMIM	GainRatio	MIFS	mRMR	NMIFS	CIFE	MRI	ASECIR	ASIECIR
Arrh	58.19	56.86	56.42	59.73	58.86	56.86	55.55	54.41	56.64	57.09
Car	95.37	95.37	94.97	95.37	95.37	95.37	95.37	95.37	96.99	96.99
Chess	99.25	98.69	99.25	97.50	99.25	99.12	99.12	99.25	99.09	99.19
Clean1	82.78	81.72	79.21	82.55	83.40	80.04	79.39	80.90	81.91	83.63
Colon	90.24	90.24	93.14	90.48	85.71	91.90	93.50	90.00	96.67	96.67
Glass	65.80	65.80	55.15	63.07	64.46	64.46	63.51	63.48	66.26	66.26
Libras	56.94	57.22	62.78	68.33	61.94	58.33	59.72	67.78	70.00	66.39
Lung	61.61	64.11	64.64	55.89	61.61	61.61	68.93	65.54	61.43	60.36
Lymph	69.52	70.90	75.57	72.29	71.52	73.57	71.52	72.24	76.95	82.43
Musk2	89.39	89.82	90.38	88.53	88.55	88.26	90.10	89.40	90.94	90.80
Vote	97.01	97.01	97.01	95.40	96.55	96.55	95.87	96.78	96.55	96.79
Wpbc33	72.76	73.79	70.82	69.76	71.79	71.74	69.26	70.26	74.29	74.29
Zoo	96.55	97.01	97.01	95.87	96.55	96.55	96.10	96.78	95.40	96.78
Avg. Acc/%	79.65	79.89	79.72	79.60	79.66	79.57	79.84	80.17	81.78	82.13
Avg. Rank	5.42	5.46	5.31	6.92	5.65	6.42	6.69	5.85	4.23	3.04

Classification accuracy (Acc/%)
Dataset	DISR	NJMIM	GainRatio	MIFS	mRMR	NMIFS	CIFE	MRI	ASECIR	ASIECIR
Arrh	70.36	69.90	67.70	67.05	69.91	70.57	68.15	70.13	69.25	68.37
Car	83.45	83.45	83.45	83.45	83.45	83.45	83.45	83.45	83.57	83.57
Chess	95.71	95.03	95.46	93.77	95.34	95.12	95.65	95.62	95.53	95.53
Clean1	81.32	79.43	82.98	81.94	81.53	81.33	80.43	80.68	82.35	82.99
Colon	91.67	90.00	88.57	85.48	90.00	93.33	88.57	88.57	90.24	90.24
Glass	46.28	46.75	48.14	47.71	47.73	47.73	47.73	47.71	49.16	48.20
Libras	63.61	60.28	61.94	69.44	61.67	57.22	63.89	63.06	64.72	66.94
Lung	80.46	81.09	80.90	80.90	80.05	80.06	76.68	77.73	80.66	81.30
Lymph	75.62	74.24	79.00	74.90	76.24	79.05	78.48	76.95	78.29	81.00
Musk2	88.68	90.81	82.18	86.14	91.94	91.80	88.97	90.10	90.81	91.37
Vote	96.09	95.63	96.55	96.55	96.32	96.32	96.55	96.79	96.32	96.10
Wpbc33	77.27	65.65	62.62	59.59	72.72	70.70	74.24	76.26	76.26	77.27
Zoo	90.09	89.09	94.00	94.09	90.09	93.09	96.00	96.00	96.00	95.00
Avg. Acc/%	80.05	78.57	78.73	78.54	79.77	79.98	79.91	80.23	81.01	81.38
Avg. Rank	5.73	7.58	5.85	6.81	5.96	5.27	5.65	5.35	3.73	3.08
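
The abstract mentions a Bonferroni-Dunn test. In Demšar's formulation, such a test compares the average rank of a control method with that of every other method through a critical difference CD = q_α · sqrt(k(k+1)/(6N)), where k is the number of methods and N the number of datasets. The sketch below evaluates it for k = 10 and N = 13, the sizes of the tables on this page, using the Avg. Rank row directly above. The significance level of 0.05 and the choice of ASIECIR as the control are assumptions, since this page states neither. q_α is computed as the two-tailed normal quantile with a Bonferroni correction over k − 1 comparisons.

```python
from math import sqrt
from statistics import NormalDist

def bonferroni_dunn_cd(k, n, alpha):
    """Critical difference of average ranks for k methods over n datasets."""
    # Two-tailed normal quantile, Bonferroni-corrected for k - 1 comparisons.
    q_alpha = NormalDist().inv_cdf(1 - alpha / (2 * (k - 1)))
    return q_alpha * sqrt(k * (k + 1) / (6 * n))

methods = ["DISR", "NJMIM", "GainRatio", "MIFS", "mRMR",
           "NMIFS", "CIFE", "MRI", "ASECIR", "ASIECIR"]
# Avg. Rank row of the table directly above.
avg_rank = [5.73, 7.58, 5.85, 6.81, 5.96, 5.27, 5.65, 5.35, 3.73, 3.08]

cd = bonferroni_dunn_cd(k=len(methods), n=13, alpha=0.05)  # alpha is an assumption
control = avg_rank[methods.index("ASIECIR")]               # control choice is an assumption
beyond_cd = [m for m, r in zip(methods, avg_rank) if r - control > cd]
print(f"CD = {cd:.3f}; methods trailing the control by more than CD: {beyond_cd}")
```

A method whose average rank exceeds the control's by more than CD would be judged significantly worse at the assumed level; a smaller gap leaves the comparison inconclusive under this test.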