Dynamic Matrix Clustering Method for Time Series Events

doi:10.3778/j.issn.1673-9418.2008094

Abstract

Clustering time series events is the basis for event classification and mining analysis. Most existing clustering methods operate directly on continuous events that carry time attributes and have complex structure, without considering a transformation of the clustering objects; as a result, clustering accuracy is low and efficiency is poor. To address these problems, a dynamic matrix clustering method for time series events, RDMC, is proposed. First, an r-nearest neighbor evaluation system is established to measure the representativeness of each event by its evaluation value, and a candidate set of RDS (representative and diversifying sequences) is constructed using a backward difference calculation strategy on the nearest neighbor scores. Second, an RDS selection method based on combinatorial optimization is proposed to quickly obtain the optimal RDS from the candidate set. Finally, the distance matrix between the RDS and the data set is constructed dynamically, and a K-means-based matrix clustering method is proposed to effectively partition the time series events into classes. Experimental results show that, compared with existing methods, the proposed method has clear advantages in clustering accuracy, clustering reliability, and clustering efficiency.

Keywords: clustering, backward difference, combinatorial optimization, K-means
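
As a rough illustration of the final step described in the abstract (building a distance matrix between the RDS and the data set, then clustering its rows with K-means), a minimal Python sketch might look like the following. It is not the RDMC implementation. The representatives are hand-picked rows standing in for the r-nearest neighbor scoring, backward difference, and combinatorial selection steps, which the abstract does not describe in enough detail to reproduce. The function names `distance_matrix` and `kmeans`, the use of Euclidean distance, the farthest-first initialization, and the synthetic data are all assumptions made for this example.

```python
import numpy as np


def distance_matrix(series, representatives):
    """Euclidean distance from every series (rows) to every representative (columns)."""
    diff = series[:, None, :] - representatives[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def kmeans(points, k, n_iter=50):
    """Plain Lloyd's K-means with a deterministic farthest-first initialization
    (an assumption of this sketch). Returns one cluster label per row."""
    centers = [points[0]]
    for _ in range(1, k):
        # Pick the point farthest from all centers chosen so far.
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    centers = np.array(centers)

    for _ in range(n_iter):
        # Assign each row to its nearest center.
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster is empty.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Synthetic data (hypothetical): five noisy sine waves and five noisy flat series.
t = np.linspace(0, 2 * np.pi, 20)
rng = np.random.default_rng(1)
group_a = np.sin(t) + 0.05 * rng.standard_normal((5, 20))
group_b = 3.0 + 0.05 * rng.standard_normal((5, 20))
series = np.vstack([group_a, group_b])

# Hand-picked representatives, standing in for the selected RDS.
rds = series[[0, 5]]

# Each series is re-described by its distances to the representatives,
# and K-means runs on those rows instead of on the raw series.
D = distance_matrix(series, rds)
labels = kmeans(D, k=2)
print(labels)
```

The point of the sketch is the change of clustering object: each length-20 series becomes a short row of distances to the representatives, so K-means works in a space whose dimension equals the number of representatives.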

MA Ruiqiang, SONG Baoyan, DING Linlin, WANG Junlu. Dynamic Matrix Clustering Method for Time Series Events[J]. Journal of Frontiers of Computer Science and Technology, 2021, 15(3): 468-477.
