Journal of Frontiers of Computer Science and Technology ›› 2022, Vol. 16 ›› Issue (8): 1923-1932.DOI: 10.3778/j.issn.1673-9418.2012099

• Theory and Algorithm •

Rough K-means Clustering Algorithm Combined with Artificial Bee Colony Optimization

YE Tingyu1, YE Jun1,+, WANG Hui1,2, WANG Lei1,2

  1. School of Information Engineering, Nanchang Institute of Technology, Nanchang 330000, China
    2. Jiangxi Province Key Laboratory of Water Information Cooperative Sensing and Intelligent Processing (Nanchang Institute of Technology), Nanchang 330000, China
  • Received:2020-12-28 Revised:2021-03-05 Online:2022-08-01 Published:2021-04-08
  • About author:YE Tingyu, born in 1997, M.S. candidate. His research interests include evolutionary computing, swarm intelligence and machine learning.
    YE Jun, born in 1968, M.S., professor, member of CCF. His research interests include rough sets, granular computing, knowledge discovery, data mining, etc.
    WANG Hui, born in 1982, Ph.D., professor, member of CCF. His research interests include evolutionary computing, swarm intelligence, dispatch and water resource optimization.
    WANG Lei, born in 1967, Ph.D., associate professor, member of CCF. His research interests include rough sets, granular computing, knowledge discovery, data mining, etc.
  • Supported by:
    the National Natural Science Foundation of China (61562061); the National Natural Science Foundation of China (61663028); the Natural Science Foundation of Jiangxi Province (20212BAB202022); the Technology Project of Ministry of Education of Jiangxi Province (GJJ170995)

结合人工蜂群优化的粗糙K-means聚类算法

叶廷宇1, 叶军1,+, 王晖1,2, 王磊1,2

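  1. School of Information Engineering, Nanchang Institute of Technology, Nanchang 330000, China
    2. Jiangxi Province Key Laboratory of Water Information Cooperative Sensing and Intelligent Processing (Nanchang Institute of Technology), Nanchang 330000, China
  • Corresponding author: + E-mail: 2003992646@nit.edu.cn
  • About the authors: YE Tingyu (born 1997), male, from Nanchang, Jiangxi, M.S. candidate. His research interests include evolutionary computing, swarm intelligence and machine learning.
    YE Jun (born 1968), male, from Wan'an, Jiangxi, M.S., professor, member of CCF. His research interests include rough sets and granular computing, knowledge discovery and data mining.
    WANG Hui (born 1982), male, from Hong'an, Hubei, Ph.D., professor, member of CCF. His research interests include evolutionary computing, swarm intelligence, scheduling and water resource optimization.
    WANG Lei (born 1967), male, from Ezhou, Hubei, Ph.D., associate professor, member of CCF. His research interests include rough sets and granular computing, knowledge discovery and data mining.
  • Supported by:
    the National Natural Science Foundation of China (61562061); the National Natural Science Foundation of China (61663028); the Natural Science Foundation of Jiangxi Province (20212BAB202022); the Science and Technology Project of the Education Department of Jiangxi Province (GJJ170995)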

Abstract:

The rough K-means clustering algorithm handles data with uncertain boundaries well. However, it is sensitive to the choice of initial cluster centers, and its fixed weights and threshold lead to unstable clustering results and reduced accuracy. Much research has addressed these problems from different angles. This paper introduces the artificial bee colony (ABC) algorithm and improves rough K-means in three respects. First, a more reasonable method for dynamically adjusting the weights of the lower approximation and the boundary set is designed, based on the ratio of the number of objects in the lower approximation and the boundary set to the product of the differences among the objects in the dataset. Second, to speed up convergence, an adaptive threshold ε tied to the number of iterations is given. Third, a fitness function for nectar source positions is constructed to guide the bee colony in a global search for high-quality nectar sources. The best nectar source position obtained in each iteration is taken as the initial cluster centers, and clustering is carried out on this basis. Experimental results show that the improved algorithm yields more stable clustering results and a better clustering effect.

Key words: rough K-means algorithm, artificial bee colony (ABC) algorithm, nectar source, cluster center, fitness function
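
The abstract does not give the paper's formulas, so the following is only a minimal sketch of the commonly described rough K-means loop that the paper builds on, not the authors' method. Several parts are assumptions made for illustration: the weight `w_lower` is fixed (the paper adjusts it dynamically), `epsilon_at` is a hypothetical linear decay standing in for the paper's iteration-dependent threshold ε, and the initial centers are simply passed in by the caller, where the paper would take them from the best nectar source found by ABC.

```python
import math


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def epsilon_at(t, max_iter, eps0):
    """Hypothetical schedule: threshold decays linearly with iteration t."""
    return eps0 * (1 - t / max_iter)


def assign(data, centers, eps):
    """Split objects into lower approximations and boundary regions.

    An object whose distance to another center is within eps of its
    nearest-center distance is ambiguous and goes to the boundary of
    every such cluster; otherwise it joins one lower approximation.
    """
    k = len(centers)
    lower = [[] for _ in range(k)]
    boundary = [[] for _ in range(k)]
    for x in data:
        d = [math.dist(x, c) for c in centers]
        i = min(range(k), key=lambda j: d[j])
        close = [j for j in range(k) if j != i and d[j] - d[i] <= eps]
        if close:
            for j in [i] + close:
                boundary[j].append(x)
        else:
            lower[i].append(x)
    return lower, boundary


def update(centers, lower, boundary, w_lower):
    """Weighted center update; w_lower is a fixed weight (assumption)."""
    new_centers = []
    for c, lo, bd in zip(centers, lower, boundary):
        if lo and bd:
            m_lo, m_bd = mean(lo), mean(bd)
            new_centers.append(tuple(
                w_lower * a + (1 - w_lower) * b for a, b in zip(m_lo, m_bd)))
        elif lo:
            new_centers.append(mean(lo))
        elif bd:
            new_centers.append(mean(bd))
        else:
            new_centers.append(c)  # empty cluster: keep the old center
    return new_centers


def rough_kmeans(data, init_centers, eps0=1.0, w_lower=0.7, max_iter=20):
    """Iterate assignment and update until the centers stop changing."""
    centers = [tuple(c) for c in init_centers]
    eps = eps0
    for t in range(max_iter):
        eps = epsilon_at(t, max_iter, eps0)
        lower, boundary = assign(data, centers, eps)
        new_centers = update(centers, lower, boundary, w_lower)
        if new_centers == centers:
            break
        centers = new_centers
    # Recompute the regions so they match the returned centers.
    lower, boundary = assign(data, centers, eps)
    return centers, lower, boundary
```

Calling `rough_kmeans(data, init_centers)` with a list of coordinate tuples returns the final centers together with the lower approximation and boundary region of each cluster. An object that lies about equally far from two centers ends up in both boundary regions rather than being forced into one cluster, which is the behavior the abstract refers to as handling uncertain boundaries.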

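Abstract:

The rough K-means clustering algorithm has a strong ability to handle data with uncertain boundaries, but it is sensitive to the choice of initial cluster centers, and its use of fixed weights and a fixed threshold leads to unstable clustering results and reduced accuracy. Much research has addressed these problems from different angles. The artificial bee colony (ABC) algorithm is introduced to improve the algorithm in three respects. First, a more reasonable method for dynamically adjusting the weights of the lower approximation and the boundary set is designed, based on the ratio of the products of the number of data objects in the lower approximation and the boundary set and the difference in the objects' spatial distribution in the dataset. Second, to speed up convergence, an adaptive threshold ε tied to the number of iterations is given. Finally, a fitness function for nectar source positions is constructed to guide the bee colony in a global search for high-quality nectar sources; the best nectar source position obtained by the colony in each iteration is taken as the initial cluster centers, and alternating clustering is carried out on this basis. Experimental results show that the improved algorithm makes the clustering results more stable and achieves a better clustering effect.

Key words: rough K-means clustering algorithm, artificial bee colony (ABC) algorithm, nectar source, cluster center, fitness function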

CLC Number: