Journal of Frontiers of Computer Science and Technology ›› 2014, Vol. 8 ›› Issue (12): 1494-1501. DOI: 10.3778/j.issn.1673-9418.1408010

• Artificial Intelligence and Pattern Recognition •

Clustering Algorithm over Uncertain Data Streams Based on Rough Fuzzy Set

JIANG Yuankai (姜元凯)+, ZHENG Hongyuan (郑洪源)

  1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
  • Online: 2014-12-01 Published: 2014-12-08

Abstract: To address the clustering of data streams with high dimensionality and a high uncertainty level, this paper proposes HFMicro, a clustering algorithm for uncertain data streams. Rough fuzzy set theory is introduced to define a new model of uncertain data streams, and the upper and lower approximations of the membership degree are used to describe micro-clusters. The most suitable micro-cluster is selected according to the degree of similarity between rough fuzzy sets. A dynamic decay window model is applied to improve the algorithm's efficiency and clustering quality. Because an offline clustering mode is adopted, the algorithm has good real-time performance. Experimental results show that the algorithm handles data streams with high dimensionality and a high uncertainty level well, supports both existential uncertainty and attribute-level uncertainty, and performs better than existing algorithms.

Key words: uncertain data streams, rough fuzzy set, clustering, membership degree
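
The notions named in the abstract can be illustrated with a minimal sketch. It is not the HFMicro algorithm, whose definitions are not given on this page. It assumes the standard Dubois-Prade rough fuzzy set construction, in which the lower (upper) approximation of a fuzzy set on an equivalence class is the minimum (maximum) membership degree in that class. The similarity measure, the exponential decay weight, and all function names are hypothetical choices made only for illustration.

```python
from typing import Dict, Hashable, List, Tuple

# Class index -> (lower approximation, upper approximation) of the membership degree.
Approx = Dict[int, Tuple[float, float]]


def rough_fuzzy_approximations(
    membership: Dict[Hashable, float],
    partition: List[List[Hashable]],
) -> Approx:
    """Dubois-Prade style approximations: min and max membership per equivalence class."""
    approx: Approx = {}
    for index, block in enumerate(partition):
        degrees = [membership[x] for x in block]
        approx[index] = (min(degrees), max(degrees))
    return approx


def similarity(a: Approx, b: Approx) -> float:
    """Assumed measure: 1 minus the mean absolute gap between the bounds on shared classes."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    gap = sum(abs(a[k][0] - b[k][0]) + abs(a[k][1] - b[k][1]) for k in shared)
    return 1.0 - gap / (2 * len(shared))


def most_similar(candidate: Approx, micro_clusters: List[Approx]) -> int:
    """Index of the micro-cluster description most similar to the candidate."""
    return max(range(len(micro_clusters)),
               key=lambda i: similarity(candidate, micro_clusters[i]))


def decay_weight(age: float, lam: float = 0.1) -> float:
    """Assumed exponential decay: older data contribute less (weight 1 at age 0)."""
    return 2.0 ** (-lam * age)


if __name__ == "__main__":
    mu = {"a": 0.2, "b": 0.8, "c": 0.5, "d": 0.5}
    classes = [["a", "b"], ["c", "d"]]
    point = rough_fuzzy_approximations(mu, classes)
    clusters = [{0: (0.9, 1.0), 1: (0.0, 0.1)}, {0: (0.2, 0.7), 1: (0.5, 0.6)}]
    print(point, most_similar(point, clusters), decay_weight(10))
```

In this toy setup the gap between the lower and upper bound of a class reflects how uncertain the membership is within that class; a class whose members all share the same degree has identical bounds.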