计算机科学与探索 (Journal of Frontiers of Computer Science and Technology), 2021, Vol. 15, Issue (3): 468-477. DOI: 10.3778/j.issn.1673-9418.2008094

Academic Research


Dynamic Matrix Clustering Method for Time Series Events

MA Ruiqiang, SONG Baoyan, DING Linlin, WANG Junlu   

  1. School of Information, Liaoning University, Shenyang 110036, China
  Online: 2021-03-01; Published: 2021-03-05


Abstract:

Time series event clustering is the basis for event classification and mining analysis. Most existing clustering methods operate directly on continuous events, which carry time attributes and have complex structures, without considering any transformation of the clustering objects; as a result, clustering accuracy is low and efficiency is limited. To address these problems, RDMC, a dynamic matrix clustering method for time series events, is proposed. First, an r-nearest-neighbor evaluation system is established to measure the representativeness of each event by its evaluation value, and a candidate set of RDS (representative and diversifying sequences) is constructed with a backward-difference calculation strategy over the nearest-neighbor scores. Second, an RDS selection method based on combinatorial optimization is proposed to quickly obtain the optimal RDS from the candidate set. Finally, the distance matrix between the RDS and the data set is constructed dynamically, and a K-means-based matrix clustering method is proposed to partition the time series events into their categories. Experimental results show that, compared with existing methods, the proposed method has clear advantages in clustering accuracy, clustering reliability, and clustering efficiency.
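The abstract describes four stages: r-nearest-neighbor scoring, a backward-difference cut that forms the RDS candidate set, RDS selection from the candidates, and K-means over the RDS-to-data distance matrix. The Python sketch below shows how those stages might fit together. It is a minimal illustration under stated assumptions, not the authors' RDMC. Assumed here: equal-length series compared with Euclidean distance; an r-NN score defined as the mean distance to the r nearest series, with lower scores read as more representative; a candidate cut placed at the largest backward difference of the sorted scores; and greedy farthest-first selection standing in for the paper's combinatorial optimization. All function names (`rnn_scores`, `candidate_set`, `select_rds`, `rdmc_sketch`, and so on) are hypothetical.

```python
import math
from typing import List, Sequence

Series = Sequence[float]


def euclidean(a: Series, b: Series) -> float:
    """Euclidean distance between two equal-length series (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rnn_scores(data: Sequence[Series], r: int) -> List[float]:
    """Hypothetical r-NN score: mean distance to the r nearest other series.
    In this sketch a lower score means 'more representative'."""
    scores = []
    for i, s in enumerate(data):
        dists = sorted(euclidean(s, t) for j, t in enumerate(data) if j != i)
        nearest = dists[:r]
        scores.append(sum(nearest) / len(nearest))
    return scores


def candidate_set(scores: List[float], min_size: int) -> List[int]:
    """Assumed reading of the backward-difference step: sort the scores,
    take backward differences s[k] - s[k-1], and keep every series that
    comes before the largest jump (at least min_size of them)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    s = [scores[i] for i in order]
    diffs = [s[k] - s[k - 1] for k in range(1, len(s))]
    cut = max(range(len(diffs)), key=lambda k: diffs[k]) + 1
    return order[:max(cut, min_size)]


def farthest_first(points: Sequence[Series], pool: List[int], m: int) -> List[int]:
    """Greedy max-min choice of m indices from pool, starting at pool[0]."""
    chosen = [pool[0]]
    while len(chosen) < m:
        rest = [p for p in pool if p not in chosen]
        chosen.append(max(rest, key=lambda p: min(euclidean(points[p], points[c]) for c in chosen)))
    return chosen


def select_rds(data: Sequence[Series], candidates: List[int], m: int) -> List[int]:
    """Stand-in for the paper's combinatorial optimization: start from the
    best-scoring candidate, then greedily add the most distant one (diversity)."""
    return farthest_first(data, candidates, m)


def distance_matrix(data: Sequence[Series], rds: List[int]) -> List[List[float]]:
    """Row i holds the distances from series i to each selected RDS."""
    return [[euclidean(s, data[j]) for j in rds] for s in data]


def kmeans(rows: List[List[float]], k: int, iters: int = 100) -> List[int]:
    """Plain Lloyd's K-means on the matrix rows, seeded deterministically
    with farthest-first centers."""
    centers = [list(rows[i]) for i in farthest_first(rows, list(range(len(rows))), k)]
    labels: List[int] = []
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: euclidean(row, centers[c])) for row in rows]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def rdmc_sketch(data: Sequence[Series], k: int, r: int = 2) -> List[int]:
    """Hypothetical end-to-end pipeline; assumes one RDS per cluster (m = k)."""
    scores = rnn_scores(data, r)
    candidates = candidate_set(scores, min_size=k)
    rds = select_rds(data, candidates, k)
    return kmeans(distance_matrix(data, rds), k)


if __name__ == "__main__":
    groups = [[0, 0, 0, 0], [0.1, 0, 0.1, 0], [10, 10, 10, 10], [10.1, 10, 9.9, 10]]
    print(rdmc_sketch(groups, k=2))  # prints [0, 0, 1, 1]
```

In this sketch, each series becomes a short fixed-length row of distances to the selected RDS. This is one way to read the abstract's "transformation of clustering objects": plain K-means can then run on vectors rather than on the raw, time-structured events. How the paper actually scores, cuts, and optimizes is not specified in the abstract, so each of those steps should be treated as a placeholder.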

Key words: clustering, backward difference, combinatorial optimization, K-means