Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2018, Vol. 12, Issue (6): 859-871. DOI: 10.3778/j.issn.1673-9418.1705030

• Academic Research •

Interactive Clustering Visualization Algorithm with Local Principal Direction
(利用局部主方向实现交互式聚类可视化)

LU Ying (卢颖)1, ZHANG Zhihao (张志豪)1,2, ZHANG Junping (张军平)1,2+

  1. School of Computer Science, Fudan University, Shanghai 200433, China
  2. Shanghai Key Laboratory of Intelligent Information Processing, Shanghai 200433, China
  • Online: 2018-06-01  Published: 2018-06-06

Abstract: Clustering groups unlabeled data into clusters whose members share a similar structure. However, existing clustering algorithms do not give users an intuitive view of how the data are distributed, especially in a high-dimensional space. Dimension reduction helps by mapping high-dimensional data into a low-dimensional space that users can inspect, but overlapping points in the low-dimensional projection degrade the visualization. To address this problem, this paper proposes an interactive clustering visualization method based on the local principal directions of the data. Specifically, a dimension reduction method is first applied to give users an initial visualization; the local principal direction and the corresponding frequency histogram are then computed and displayed. By reading the frequency histogram along the local principal direction, users can understand and exploit the statistical characteristics of the data and interactively shrink or stretch the distances between points to separate projected points that overlap. Experiments on artificial and real-world datasets indicate that the proposed method effectively improves the separability of data points in the low-dimensional subspace and gives users a better visualization for clustering analysis and further exploration. The method also reduces the number of iterations required by a widely used dimension reduction algorithm while maintaining competitive clustering performance, which improves the efficiency of clustering analysis.

Key words: local principal direction, visual clustering analysis, visual analysis, interactive method
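
The pipeline outlined in the abstract (principal direction of a local region, frequency histogram along it, then stretching or shrinking distances) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the local region is selected or how the stretch is parameterized, so the function names (`principal_direction`, `project`, `frequency_histogram`, `stretch_along`), the closed-form 2-D PCA, and the single scalar `factor` are all assumptions made for illustration. The input is assumed to be the already-projected 2-D points of one user-selected region.

```python
import math

def principal_direction(points):
    """Return (center, unit direction of largest variance) for 2-D points.

    Assumption: closed-form PCA of the 2x2 covariance matrix stands in for
    whatever estimator the paper actually uses.
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Angle of the leading eigenvector of [[sxx, sxy], [sxy, syy]].
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (mx, my), (math.cos(theta), math.sin(theta))

def project(points, center, direction):
    """Signed coordinate of each point along `direction`, measured from `center`."""
    return [(p[0] - center[0]) * direction[0] + (p[1] - center[1]) * direction[1]
            for p in points]

def frequency_histogram(values, bins):
    """Count how many values fall into each of `bins` equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    if width == 0.0:
        width = 1.0  # all values identical: everything lands in the first bin
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts

def stretch_along(points, center, direction, factor):
    """Scale each point's coordinate along `direction` by `factor` (hypothetical
    interaction: factor > 1 stretches, factor < 1 shrinks); the orthogonal
    component is left unchanged."""
    out = []
    for p, t in zip(points, project(points, center, direction)):
        shift = (factor - 1.0) * t
        out.append((p[0] + shift * direction[0], p[1] + shift * direction[1]))
    return out

# Two small groups of points lying close together along the x-axis.
region = [(0.0, 0.0), (0.2, 0.1), (0.4, -0.1), (1.0, 0.0), (1.2, 0.1), (1.4, -0.1)]
center, direction = principal_direction(region)
coords = project(region, center, direction)
print(frequency_histogram(coords, 3))  # [3, 0, 3]: an empty middle bin suggests two groups
separated = stretch_along(region, center, direction, 2.0)
```

In this toy case the empty middle bin is the cue a user might act on: stretching along the principal direction doubles the gap between the two groups while leaving the spread orthogonal to that direction untouched.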