计算机科学与探索 (Journal of Frontiers of Computer Science and Technology), 2021, Vol. 15, Issue (2): 195-205. DOI: 10.3778/j.issn.1673-9418.2003063

• Survey · Exploration •

Survey on Feature Representation and Similarity Measurement of Time Series

SUN Dongpu (孙冬璞), QU Li (曲丽)

  1. School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080, China
  • Online: 2021-02-01 Published: 2021-02-01

Abstract:

A time series is a sequence of random values of a single indicator arranged in chronological order. With the rapid development of science and technology, time series are applied ever more widely in data mining. This paper reviews the recent literature on time series in data mining and describes methods for time series feature representation and similarity measurement. Feature representation methods are presented in three groups: non-data-adaptive methods, data-adaptive methods, and model-based methods. For each main method, the research status, advantages and disadvantages, application domains, characteristics, and limitations are compared and analyzed. Similarity measurement methods are likewise described systematically in three groups: shape-based methods, model-based methods, and data-compression-based methods. The advantages, disadvantages, and application domains of the main methods are introduced, and the methods are compared on whether they support comparison between time series of unequal length, whether they support shifting (translation), and whether they satisfy the triangle inequality. Finally, future research directions for time series are discussed.

Key words: data mining, time series, feature representation, similarity measurement
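
The comparison criteria named in the abstract (support for unequal-length series, and whether the triangle inequality holds) can be illustrated with a minimal sketch. The abstract does not name specific measures, so the two used here are assumptions: lock-step Euclidean distance and dynamic time warping (DTW) with an absolute-difference local cost. The function names `euclidean` and `dtw` and the sample series are hypothetical and chosen only for illustration.

```python
import math


def euclidean(a, b):
    """Lock-step Euclidean distance; defined only for equal-length series."""
    if len(a) != len(b):
        raise ValueError("Euclidean distance needs equal-length series")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a, b):
    """Dynamic time warping distance with absolute-difference local cost.

    Illustrative textbook formulation (no window constraint, no normalization).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # d[i][j] = cheapest cost of aligning a[:i] with b[:j]
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


# Hypothetical series of different lengths.
x, y, z = [0], [1, 2], [2, 3, 3]

# DTW accepts unequal lengths, but here it violates the triangle inequality:
# dtw(x, z) = 8.0 exceeds dtw(x, y) + dtw(y, z) = 3.0 + 3.0.
print(dtw(x, y), dtw(y, z), dtw(x, z))  # prints: 3.0 3.0 8.0
```

In this sketch the Euclidean function rejects series of different lengths, while DTW compares them at the cost of no longer being a metric. That trade-off is one instance of the properties the survey compares across similarity measures.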