Journal of Frontiers of Computer Science and Technology ›› 2016, Vol. 10 ›› Issue (2): 268-276.DOI: 10.3778/j.issn.1673-9418.1505034

Previous Articles     Next Articles

Fast Multi-Exemplar Affinity Propagation Algorithm for Large Data Sets

LIU Ji+, CHEN Xiuhong, HANG Wenlong   

  1. School of Digital Media, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Online:2016-02-01 Published:2016-02-03

面向大规模数据的快速多代表点仿射传播算法

刘  季+,陈秀宏,杭文龙   

  1. 江南大学 数字媒体学院,江苏 无锡 214122

Abstract: The traditional affinity propagation (AP) is difficult to handle with multi-exemplar clustering, and has large space and time complexity, so it is not suitable for large data sets. To address these problems, this paper proposes a multi-exemplar affinity propagation using fast reduced set density estimator (FRSMEAP). At the beginning of clustering, fast reduced set density estimator (FRSDE) is introduced to preprocess large-scale data sets, and then the condensed set fully representing sample properties can be obtained. Multi-exemplar affinity propagation (MEAP) algorithm is used to cluster the condensed set, which can find better decision boundaries than AP. So the accuracy of clustering is improved. In order to get the final data partition, the K-nearest neighbor (KNN) is used to assign the remained data. The simulation results on synthetic and standard data sets show that the proposed algorithm can not only cluster on large-scale data sets, but also has the advantage of high precision and fast speed.

Key words: affinity propagation, clustering, large data sets, multi-exemplar

摘要: 针对原始的仿射传播(affinity propagation,AP)聚类算法难以处理多代表点聚类,以及空间和时间开销过大等问题,提出了快速多代表点仿射传播(multi-exemplar affinity propagation using fast reduced set density estimator,FRSMEAP)聚类算法。该算法在聚类初始阶段,引入快速压缩集密度估计算法(fast reduced set density estimator,FRSDE)对大规模数据集进行预处理,得到能够充分代表样本属性的压缩集;在聚类阶段,使用多代表点仿射传播(multi-exemplar affinity propagation,MEAP)聚类算法,获得比AP更加明显的聚类决策边界,从而提高聚类的精度;最后再利用K-邻近(K-nearest neighbor,KNN)算法分配剩余点得到最终的数据划分。在人工数据集和真实数据集上的仿真实验结果表明,该算法不仅能在大规模数据集上进行聚类,而且具有聚类精度高和运行速度快等优点。

关键词: 仿射传播, 聚类, 大数据, 多代表点