Journal of Frontiers of Computer Science and Technology ›› 2018, Vol. 12 ›› Issue (6): 961-971.DOI: 10.3778/j.issn.1673-9418.1705023

Previous Articles     Next Articles

Correlation Clustering Based on Drop of Water Principle

HUA Jialin1,2+, YU Jian1,2   

  1. 1. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
    2. Beijing Key Laboratory of Traffic Data Analysis and Mining, Beijing 100044, China
  • Online:2018-06-01 Published:2018-06-06

基于滴水原理的关联聚类算法

华佳林1,2+,于  剑1,2   

  1. 1. 北京交通大学 计算机与信息技术学院,北京 100044
    2. 交通数据分析与挖掘北京市重点实验室,北京 100044

Abstract: With the development of various new media, there are many new issues popped up in data mining. Correlation clustering is one of these issues, that receives more and more attentions recently. It has been widely used in real applications such as image segmentation and spam filtering. Many methods have been proposed to process the issues. However, few of them can process large signed graphs which become common in practice. This paper proposes a novel algorithm based on the drop of water principle to tackle the problem of clustering large scale signed graphs. The algorithm is constructed according to the following procedures. Firstly, the signed graphs are shrank based on the drop of water principle. The points within a stream can be taken as a new point in the graph, the scale of signed graphs can be enormously decreased in this procedure. Secondly, some important points are chosen as seeds, and based on integer linear programming, the seeds can decide whether the neighbors joint the seeds or not. The experimental results show that the proposed method can cluster large scale graphs effectively and efficiently.

Key words: drop of water principle, signed graphs, correlation clustering, integer linear programming

摘要: 随着各种新兴媒体的发展,在数据挖掘领域出现了越来越多的新问题和新任务,关联聚类问题就是其中之一,最近受到越来越多的关注。现实中有很多问题可以使用关联聚类技术来处理,比如图像分割和垃圾邮件过滤等。大规模有符号图的出现越来越频繁,虽然之前有很多关联聚类算法被提出,但是很少算法能够处理规模很大的有符号图。提出了一个基于滴水原理的算法来处理大规模有符号图的聚类问题。算法过程包括:根据滴水原理来收缩图的规模,将一个水流上的所有点看成是一个新的点,这样可以极大地减小图的规模;在新的图中选出重要的点,并根据整数线性规划来判断邻居点是否合并。实验结果表明,该算法能够快速有效地进行大规模有符号图的聚类。

关键词: 滴水原理, 有符号图, 关联聚类, 整数线性规划