Journal of Frontiers of Computer Science and Technology

• Science Researches •

A Review of Multivariate Time Series Clustering Algorithms

ZHENG Desheng,  SUN Hanming,  WANG Liyuan,  DUAN Yaoxin,  LI Xiaoyu   

  1. School of Computer Science and Software Engineering, Southwest Petroleum University, Chengdu 610500, China
  2. School of Automation, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  3. School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China

多元时间序列聚类算法综述

郑德生,孙涵明,王立远,段垚鑫,李晓瑜   

  1. 西南石油大学 计算机与软件学院,成都 610500
  2. 重庆邮电大学 自动化学院,重庆 400065
  3. 电子科技大学 信息与软件工程学院,成都 611731

Abstract: Multivariate time series (MTS) data record how multiple variables of a system change over time and serve as a key data basis for intelligent technologies in many domains. Clustering, a core data mining tool, partitions data into clusters according to structural similarity, revealing the structure and internal relationships of the data and thereby exposing system development patterns and correlations among variables. The complex structure of MTS data, the correlations among variables, and high dimensionality all pose challenges for cluster analysis, and a substantial body of research has addressed them in China and abroad. However, a comprehensive and systematic review of this topic is still lacking. This article therefore surveys clustering algorithms for multivariate time series data. First, existing MTS clustering algorithms are compared according to classification criteria such as feature extraction method, similarity measure, and clustering partition framework. For each category of MTS clustering technique, a detailed summary and analysis is given, covering algorithm principles, representative methods, advantages and disadvantages, and the problems addressed. Commonly used evaluation criteria and publicly available datasets for MTS clustering are then discussed. Finally, starting from the distinctive structure of multivariate temporal data, the remaining challenges and future research directions are outlined.

Key words: multivariate time series, clustering algorithm, feature representation, similarity measure, clustering evaluation index

摘要: 多元时间序列(Multivariate Time Series,MTS)作为众多领域智能化技术的关键数据依据,其随时间推移记录了系统中多个变量的状态变化。聚类技术作为一个数据挖掘核心工具可以将数据按照其结构相似性划分为不同的簇,通过识别数据的结构和内在关系挖掘系统发展规律和变量相关关系。面对多元时间序列数据结构的复杂性、变量之间的关联性以及数据高维性等为聚类分析带来的挑战,国内外已经开展了大量相关研究工作。鉴于此,对多元时间序列数据场景下的聚类分析算法进行综述。首先基于特征提取方式、相似性度量算法、聚类划分框架等分类标准,对现有多元时间序列聚类算法进行对比分析。对于每一类多元时间序列聚类技术,从算法原理、代表性方法、算法优缺点以及解决的问题等方面进行详细总结与剖析。进一步讨论了常用的评价标准,以及多元时间序列聚类相关公开数据集。最后,从多变量时序数据结构特殊性出发对现有多元时间序列聚类存在的挑战及未来发展方向进行了总结与展望。

关键词: 多元时间序列, 聚类算法, 特征表示, 相似性度量, 评价指标