Journal of Frontiers of Computer Science and Technology ›› 2020, Vol. 14 ›› Issue (1): 83-95. DOI: 10.3778/j.issn.1673-9418.1901060

• Artificial Intelligence •

Multi-Objective Feature Selection Method Based on Hybrid MI and PSO Algorithm

WANG Jinjie, LI Wei   

  1. School of Computer Science and Technology, Anhui University, Hefei 230601, China
  2. Key Laboratory of Intelligent Computing & Signal Processing, Ministry of Education, Anhui University, Hefei 230039, China
  • Online: 2020-01-01 Published: 2020-01-09

Abstract: Feature selection is an important preprocessing step in data mining because datasets often contain many redundant and irrelevant features. This paper proposes HMIPSO, a hybrid filter-wrapper multi-objective feature selection method that combines mutual information (MI) with particle swarm optimization (PSO). An adaptive mutation strategy, driven by the number of iterations since each particle's pbest was last updated, perturbs the population to keep it from becoming trapped in local optima. A new set concept is also defined from the Pareto front and the external archive. Combining mutual information with this set, a local search strategy lets particles on the Pareto front delete irrelevant and redundant features; an elitist strategy then updates the Pareto front using the solutions from before and after this learning step. Finally, HMIPSO is compared with four other multi-objective algorithms on 15 UCI datasets. The experimental results show that HMIPSO reduces both the number of features and the classification error rate more effectively than the compared algorithms.

Key words: multi-objective optimization, feature selection, mutual information (MI), particle swarm optimization (PSO), Pareto front, archive
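
Two ideas named in the abstract can be illustrated with a short Python sketch: the Pareto front over the two objectives (number of selected features, classification error rate) and a mutation whose strength depends on how long a particle's pbest has gone without an update. This is not the HMIPSO implementation. The abstract does not give the mutation formula, so the linear rule in `mutation_probability`, its parameters `max_stagnation` and `p_max`, and all function names are assumptions made purely for illustration.

```python
import random
from typing import List, Tuple

# (number of selected features, classification error rate); both are minimized.
Objectives = Tuple[int, float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def pareto_front(points: List[Objectives]) -> List[Objectives]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def mutation_probability(stagnation: int,
                         max_stagnation: int = 10,
                         p_max: float = 0.5) -> float:
    """Hypothetical rule (an assumption, not taken from the paper): the
    per-bit flip probability grows linearly with the number of iterations
    since pbest was last updated, and is capped at p_max."""
    return p_max * min(stagnation, max_stagnation) / max_stagnation


def mutate(mask: List[int], stagnation: int, rng: random.Random) -> List[int]:
    """Flip each feature-selection bit independently with the
    stagnation-dependent probability; returns a new mask."""
    p = mutation_probability(stagnation)
    return [1 - bit if rng.random() < p else bit for bit in mask]


if __name__ == "__main__":
    candidates = [(5, 0.10), (3, 0.12), (6, 0.10), (3, 0.15), (8, 0.05)]
    print(pareto_front(candidates))
    print(mutate([1, 0, 1, 1, 0, 0], stagnation=8, rng=random.Random(0)))
```

In this sketch a particle whose pbest was updated in the current iteration (`stagnation=0`) is left unchanged, while a long-stagnant particle has each feature bit flipped with probability up to `p_max`. A feature subset is kept on the front only if no other subset is at least as good on both objectives and strictly better on one.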