Non-Equal Time Series Clustering Algorithm with Sliding Window STS Distance

Journal of Frontiers of Computer Science and Technology, 2015, Vol. 9, Issue 11: 1301-1313. DOI: 10.3778/j.issn.1673-9418.1411044

LIU Qin, WANG Kaile, RAO Weixiong+

School of Software Engineering, Tongji University, Shanghai 201804, China

Online: 2015-11-01 Published: 2015-11-03

Abstract

Abstract: Time series clustering is an important step in analyzing and forecasting how the search index of Internet search terms and the popularity of social network topics change over time. However, existing time series clustering algorithms suffer from two shortcomings. Firstly, existing studies mostly work only on time series partitioned into equal lengths, which often loses many important feature points and has a negative impact on the clustering results. Secondly, using the raw observed values directly cannot accurately measure the shape similarity of time series. To address these problems, this paper proposes a clustering algorithm for time series of non-equal length. At first, this paper uses z_score standardization to remove the effect of differences in the order of magnitude of the observed values. Next, based on a sliding window, this paper extends the STS (short time series) distance into a distance measure for time series of non-equal length, and designs a method for computing the center curve of a k-means-like clustering algorithm. Together these yield a clustering algorithm based on the sliding window STS distance for non-equal time series. Extensive experiments on two real datasets, collected from search engines and public data, show that the proposed algorithm removes the effect of magnitude differences, handles time series of non-equal length, and outperforms existing algorithms in clustering accuracy and quality.

Key words: clustering, time series, k-means algorithm
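The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The STS distance follows the standard definition (Euclidean distance between the slopes of consecutive segments), here assuming unit time steps. The function `sliding_sts_distance` is a hypothetical reading of the sliding window extension: it slides the shorter series along the longer one and keeps the smallest STS distance, which may differ from the definition used in the paper. All function names are illustrative.

```python
import math


def z_score(series):
    """Standardize a series to zero mean and unit (population) standard deviation."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    if std == 0:
        return [0.0] * n  # a constant series carries no shape information
    return [(v - mean) / std for v in series]


def sts_distance(x, y):
    """STS distance between two equal-length series sampled at unit time steps."""
    if len(x) != len(y):
        raise ValueError("sts_distance expects series of equal length")
    # Compare the slopes of consecutive segments rather than the raw values.
    return math.sqrt(
        sum(((y[k + 1] - y[k]) - (x[k + 1] - x[k])) ** 2 for k in range(len(x) - 1))
    )


def sliding_sts_distance(a, b):
    """Assumed extension to non-equal lengths: best alignment of the shorter series."""
    short, long_ = (a, b) if len(a) <= len(b) else (b, a)
    m = len(short)
    # Slide a window of length m over the longer series; keep the minimum distance.
    return min(
        sts_distance(short, long_[i:i + m]) for i in range(len(long_) - m + 1)
    )


# Two series with the same shape but different magnitudes and lengths.
topic_a = z_score([1, 2, 3, 2, 1])
topic_b = z_score([10, 10, 20, 30, 20, 10, 10])
print(sliding_sts_distance(topic_a, topic_b))
```

In a k-means-like loop, a distance of this kind would replace the Euclidean distance in the assignment step. The center curve computation mentioned in the abstract is not sketched here because the abstract does not describe it.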

LIU Qin, WANG Kaile, RAO Weixiong. Non-Equal Time Series Clustering Algorithm with Sliding Window STS Distance[J]. Journal of Frontiers of Computer Science and Technology, 2015, 9(11): 1301-1313.


