Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2019, Vol. 13, Issue 6: 990-1004. DOI: 10.3778/j.issn.1673-9418.1812017

• Artificial Intelligence •

Improved Particle Swarm Optimization Method for Feature Selection

LI Wei1, CHAO Xiuqin2+

  1. School of Computer Science and Technology, Anhui University, Hefei 230601, China
  2. Key Laboratory of Intelligent Computing and Signal Processing, Ministry of Education, Anhui University, Hefei 230601, China
  • Online: 2019-06-01   Published: 2019-06-14

Abstract: Feature selection is an important data preprocessing step in data mining: selecting an optimal feature subset can effectively reduce the data dimensionality and computational cost of a learning algorithm. In this paper, binary particle swarm optimization (BPSO) is used to optimize the feature selection process. First, a population initialization strategy based on feature clustering information is proposed, in which the features are clustered by a community partitioning algorithm; the resulting partition is used during initialization to reduce information redundancy and improve the quality of the initial population. Second, an adaptive local search strategy based on decision space similarity is proposed, in which the similarity index of a particle is determined by the similarity of particles in the decision space. During evolution, particles are adaptively selected for local search to avoid premature convergence. Finally, three representative optimization algorithms are compared with the proposed method on 11 UCI datasets. The experimental results show that the improved BPSO algorithm clearly outperforms the comparison algorithms in reducing the number of features, and that classification accuracy is also significantly improved.

Key words: binary particle swarm optimization, feature clustering, interactive manipulation, particle density, swarm intelligence algorithms
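
For readers unfamiliar with BPSO, the following is a minimal sketch of the standard binary PSO loop applied to a 0/1 feature mask. It is not the improved algorithm described in the abstract: the cluster-based initialization and the similarity-driven local search are not reproduced, and the population is simply initialized at random. The function names (`bpso_select`, `toy_fitness`), the parameter values, and the toy fitness function are all assumptions made for illustration; in practice the fitness would typically be a classifier's accuracy on the selected features.

```python
import math
import random


def bpso_select(fitness, n_features, n_particles=20, n_iters=50,
                w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Plain binary PSO: maximise `fitness` over 0/1 feature masks."""
    rng = random.Random(seed)
    # Random initial masks and velocities (the paper instead seeds the swarm
    # from feature clusters; that step is not reproduced here).
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[rng.uniform(-1.0, 1.0) for _ in range(n_features)]
           for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                v = max(-v_max, min(v_max, v))  # clamp the velocity
                vel[i][d] = v
                # Sigmoid transfer: the velocity becomes P(bit = 1).
                prob = 1.0 / (1.0 + math.exp(-v))
                pos[i][d] = 1 if rng.random() < prob else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


# Toy fitness (hypothetical): features 0, 3 and 5 are "informative", and
# every selected feature costs a small penalty, mimicking the trade-off
# between classification accuracy and subset size.
INFORMATIVE = {0, 3, 5}


def toy_fitness(mask):
    hits = sum(1 for d in INFORMATIVE if mask[d])
    return hits - 0.1 * sum(mask)


best_mask, best_fit = bpso_select(toy_fitness, n_features=10)
print(best_mask, round(best_fit, 2))
```

The returned mask marks the selected features with 1. The run is deterministic for a fixed `seed`, but as with any stochastic search there is no guarantee that the best possible subset is found.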