Journal of Frontiers of Computer Science and Technology ›› 2018, Vol. 12 ›› Issue (6): 1004-1012. DOI: 10.3778/j.issn.1673-9418.1702048

• Theory and Algorithm •

Potential Energy Clustering Algorithm with Automatic Determination of Cluster Centers

YU Xiaofei1, GE Hongwei1,2+

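  1. School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122
  2. Key Laboratory of Advanced Process Control for Light Industry (Ministry of Education), Jiangnan University, Wuxi, Jiangsu 214122
  • Publication date: 2018-06-01  Release date: 2018-06-06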

Potential Clustering by Automatic Determination of Cluster Centers

YU Xiaofei1, GE Hongwei1,2+   

  1. School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China
  2. Ministry of Education Key Laboratory of Advanced Process Control for Light Industry, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Online:2018-06-01 Published:2018-06-06

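Abstract: The potential-based fast hierarchical clustering algorithm uses a novel similarity measure and obtains clustering results more efficiently. However, the algorithm requires the number of clusters to be set manually, and it assigns samples based only on a distance measure, which weakens the influence of potential. To address these problems, a potential clustering algorithm that automatically determines cluster centers is proposed. The new algorithm determines cluster centers automatically from two features, the physical meaning of potential and the distance between each data point and its parent node, and its assignment mechanism takes both potential and distance into account. Experiments on artificial and real data sets show that the new algorithm not only determines the number of clusters automatically but also produces better clustering results.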

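Keywords: clustering, potential-based fast hierarchical clustering (PHA), potential clustering, automatic determination of the number of clusters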

Abstract: Potential-based hierarchical agglomerative clustering (PHA) uses a new similarity metric to obtain clustering results more efficiently. However, it cannot determine the number of clusters automatically, and it assigns samples according to a distance measure alone, which ignores the influence of potential. To overcome these shortcomings, this paper proposes a new algorithm that determines the number of clusters automatically. First, two variables are used to find the cluster centers automatically: the potential of each point and the distance from each point to its parent node. Then, both the distance and the potential are used to assign the remaining points. Finally, experiments on artificial and real data sets show that the new algorithm not only determines the number of clusters automatically but also yields better clustering results.

Key words: clustering, potential-based hierarchical agglomerative clustering (PHA), potential clustering, automatically determining the number of clusters
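
The two-stage idea summarized in the abstract (find centers from potential and parent distance, then assign the remaining points) can be illustrated with a minimal Python sketch. This is not the authors' implementation, and the abstract does not give the formulas, so every concrete choice here is an assumption made for illustration: the gravity-like potential with a distance floor `delta`, the definition of a point's parent as its nearest neighbor with lower potential, the center score `gamma = |potential| * parent distance`, the threshold of mean plus `n_sigma` standard deviations, and the rule that each remaining point inherits its parent's label. The function name `potential_clustering_sketch` is hypothetical.

```python
import numpy as np

def potential_clustering_sketch(X, n_sigma=2.0):
    """Illustrative sketch only; all formulas below are assumptions."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise Euclidean distances, with the diagonal masked out as infinity.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    off_diag = D + np.diag(np.full(n, np.inf))

    # Assumed distance floor: mean nearest-neighbor distance
    # (must be positive, so the data cannot consist only of duplicates).
    delta = off_diag.min(axis=1).mean()

    # Assumed potential: phi_i = -sum_j 1 / max(r_ij, delta), j != i.
    phi = -(1.0 / np.maximum(off_diag, delta)).sum(axis=1)

    # Parent of a point: its nearest point that ranks lower in potential.
    order = np.argsort(phi, kind="stable")
    parent = np.full(n, -1)
    parent_dist = np.empty(n)
    root = order[0]                      # lowest potential, has no parent
    parent_dist[root] = D[root].max()    # assumed convention for the root
    for rank in range(1, n):
        i = order[rank]
        lower = order[:rank]
        j = lower[np.argmin(D[i, lower])]
        parent[i] = j
        parent_dist[i] = D[i, j]

    # Assumed center rule: unusually large |potential| * parent distance.
    gamma = np.abs(phi) * parent_dist
    is_center = gamma > gamma.mean() + n_sigma * gamma.std()
    is_center[root] = True

    # Assignment: walk points from low to high potential; a non-center
    # point takes the label of its parent, which is already labeled.
    labels = np.full(n, -1)
    next_label = 0
    for i in order:
        if is_center[i]:
            labels[i] = next_label
            next_label += 1
        else:
            labels[i] = labels[parent[i]]
    return labels, np.flatnonzero(is_center)

# Toy data: two well-separated 3x3 grids of points.
grid = np.array([(x, y) for x in (-1, 0, 1) for y in (-1, 0, 1)], dtype=float)
X = np.vstack([grid, grid + 20.0])
labels, centers = potential_clustering_sketch(X)
```

On this toy input the number of clusters is not passed in; it falls out of the threshold on `gamma`. The threshold rule is only one plausible way to automate the choice and may differ from the criterion used in the paper.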