Journal of Frontiers of Computer Science and Technology, 2019, Vol. 13, Issue (9): 1553-1566. DOI: 10.3778/j.issn.1673-9418.1807063

• Artificial Intelligence and Pattern Recognition •

Incremental Clustering Algorithm with Anti-Noise Performance and Suitable for High Dimensional Data

SHAO Junjian (邵俊健), WANG Shitong (王士同)

  1. School of Digital Media, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Online: 2019-09-01 Published: 2019-09-06

Abstract: To cluster high-dimensional noisy data, this paper proposes ANFCM(c+p) (anti-noise fuzzy (c+p) means clustering), an incremental clustering algorithm built on a new distance metric. Because the traditional fuzzy C-means (FCM) algorithm is sensitive to the initialization of cluster centers, the proposed algorithm combines the incremental mechanism of single-pass FCM (SpFCM) with the cluster-center initialization strategy used in fuzzy (c+p) means clustering (FCPM). That is, several sample points nearest to the cluster centers of the previous data block are added to the next data block for clustering, which avoids the sensitivity of FCM to noise. In addition, the algorithm pairs the improved distance metric with a modified constraint condition and objective function. These changes make it possible to effectively distinguish the different degrees of influence that known and unknown classes have in the algorithm, and they strengthen the interaction between classes. Experimental results show that the algorithm achieves good clustering performance and robustness on high-dimensional noisy data.

Key words: Gaussian noise, incremental clustering algorithm, distance measure, high-dimensional data, fuzzy (c+p) means clustering (FCPM) algorithm
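
The block-wise mechanism described in the abstract can be illustrated with a minimal sketch. It is not the authors' ANFCM(c+p): the new distance metric, the modified constraint condition, and the modified objective function are not specified on this page, so the sketch falls back on plain FCM with Euclidean distance. The function names (`fcm`, `nearest_points`, `incremental_fcm`), the parameter `p` (number of carried points per center), and the choice to initialize each block with the previous block's centers are all assumptions made for illustration.

```python
import math

def fcm(points, centers, m=2.0, iters=50, eps=1e-9):
    """Plain fuzzy C-means with squared Euclidean distance (stand-in metric)."""
    dim = len(points[0])
    centers = [list(c) for c in centers]
    for _ in range(iters):
        num = [[0.0] * dim for _ in centers]   # weighted coordinate sums
        den = [0.0] * len(centers)             # sums of weights u^m
        for x in points:
            # Squared distance to every center, floored to avoid division by zero.
            d2 = [max(sum((a - b) ** 2 for a, b in zip(x, c)), eps) for c in centers]
            # Membership u_k is proportional to d2_k^(-1/(m-1)), normalized to sum to 1.
            inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
            total = sum(inv)
            for k, v in enumerate(inv):
                w = (v / total) ** m
                den[k] += w
                for j, a in enumerate(x):
                    num[k][j] += w * a
        centers = [[n / dk for n in row] for row, dk in zip(num, den)]
    return centers

def nearest_points(points, center, p):
    """Return the p points closest to a center."""
    return sorted(points, key=lambda x: math.dist(x, center))[:p]

def incremental_fcm(blocks, init_centers, p=2):
    """Cluster block by block, carrying the p nearest points per center forward."""
    centers, carried = init_centers, []
    for block in blocks:
        data = list(block) + carried
        # Assumption: the previous block's centers seed the next block.
        centers = fcm(data, centers)
        carried = [x for c in centers for x in nearest_points(data, c, p)]
    return centers

# Hypothetical toy data: two well-separated groups split across two blocks.
blocks = [
    [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9)],
    [(0.1, 0.2), (-0.1, 0.0), (4.9, 5.0), (5.1, 5.2)],
]
centers = incremental_fcm(blocks, [(0.5, 0.5), (4.0, 4.0)], p=2)
```

Carrying a few representative samples forward, rather than only the centers, keeps each new block anchored to the structure already found, which is the idea the abstract attributes to the FCPM-style initialization.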