计算机科学与探索 (Journal of Frontiers of Computer Science and Technology) ›› 2017, Vol. 11 ›› Issue (3): 406-413. DOI: 10.3778/j.issn.1673-9418.1603046

• Artificial Intelligence and Pattern Recognition •

Application of Effective Distance in Clustering Algorithms (有效距离在聚类算法中的应用)

GUANG Junye (光俊叶)1, LIU Mingxia (刘明霞)1,2, ZHANG Daoqiang (张道强)1+

  1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
  2. College of Information Science and Technology, Taishan University, Taian, Shandong 271021, China
  • Online: 2017-03-01 Published: 2017-03-09

Abstract: Cluster analysis is an important part of data mining, and distance metric learning is a key step in it. Conventional clustering algorithms usually measure distance with the Euclidean metric, which considers only the pairwise distance between two samples and ignores the global distribution structure of the data. To take this global structure into account, this paper proposes a new global distance metric, the effective distance metric, whose main idea is to compute the effective distance between samples through sparse reconstruction. The similarity between two samples is evaluated using not only the distance between them but also the distances between one specific sample and all other related samples, and sparse reconstruction coefficients are used to reflect this global relationship. The effective distance is then applied to three classical clustering algorithms, K-means, K-medoids and FCM (fuzzy C-means), yielding three effective-distance-based algorithms: EK-means, EK-medoids and EFCM. Experiments on UCI benchmark datasets, compared against the conventional algorithms, show that the effective-distance-based algorithms significantly improve clustering performance.

Key words: clustering, distance metric, metric learning, effective distance
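
The abstract names the ingredients (sparse reconstruction coefficients, a distance derived from them, and a classical clustering algorithm run on that distance) but gives no formulas. The following is a minimal sketch of how these pieces might fit together, not the authors' implementation. Several choices are assumptions introduced only for illustration: the non-negative L1-penalised reconstruction solved by projected ISTA, the conversion `1 - log(p)` from normalised weights to distances, the averaging step that makes the matrix symmetric, and all function and parameter names (`sparse_weights`, `effective_distance_matrix`, `k_medoids`, `lam`, `eps`).

```python
import math


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def sparse_weights(X, i, lam=0.01, iters=500):
    """Reconstruct X[i] from the other samples with non-negative, L1-penalised
    weights, using projected iterative soft-thresholding (assumed solver)."""
    others = [j for j in range(len(X)) if j != i]
    gram = [[dot(X[a], X[b]) for b in others] for a in others]
    target = [dot(X[a], X[i]) for a in others]
    # The trace of the Gram matrix bounds its largest eigenvalue: 1/trace is a safe step.
    step = 1.0 / max(sum(gram[k][k] for k in range(len(others))), 1e-12)
    w = [0.0] * len(others)
    for _ in range(iters):
        grad = [dot(gram[k], w) - target[k] for k in range(len(others))]
        w = [max(0.0, w[k] - step * (grad[k] + lam)) for k in range(len(others))]
    return dict(zip(others, w))


def effective_distance_matrix(X, lam=0.01, eps=1e-12):
    """Assumed form: D[i][j] = 1 - log(p), where p is sample j's share of the
    reconstruction of sample i; the result is then averaged to be symmetric."""
    n = len(X)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        w = sparse_weights(X, i, lam)
        total = sum(w.values()) + eps
        for j, wj in w.items():
            D[i][j] = 1.0 - math.log(max(wj / total, eps))
    for i in range(n):
        for j in range(i + 1, n):
            D[i][j] = D[j][i] = 0.5 * (D[i][j] + D[j][i])
    return D


def assign(D, medoids):
    """Label each sample with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: D[i][medoids[c]])
            for i in range(len(D))]


def k_medoids(D, medoids, iters=20):
    """Alternating K-medoids on a precomputed distance matrix."""
    medoids = list(medoids)
    for _ in range(iters):
        labels = assign(D, medoids)
        updated = []
        for c in range(len(medoids)):
            members = [i for i in range(len(D)) if labels[i] == c]
            if members:
                # New medoid: the member with the smallest total distance to its cluster.
                updated.append(min(members, key=lambda m: sum(D[m][i] for i in members)))
            else:
                updated.append(medoids[c])
        if updated == medoids:
            break
        medoids = updated
    return assign(D, medoids), medoids


# Toy data: two groups that occupy disjoint coordinates.
X = [[1.0, 0.2, 0.0, 0.0], [0.8, 0.4, 0.0, 0.0], [0.9, 0.1, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.2], [0.0, 0.0, 0.8, 0.4], [0.0, 0.0, 0.9, 0.1]]
D = effective_distance_matrix(X)
labels, medoids = k_medoids(D, medoids=[0, 3])
print(labels)
```

K-medoids is used here because it needs only a pairwise distance matrix. How EK-means and EFCM handle centroids that are not data samples is not described in the abstract, so the sketch does not attempt them.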