Journal of Frontiers of Computer Science and Technology (计算机科学与探索) ›› 2021, Vol. 15 ›› Issue (11): 2142-2150. DOI: 10.3778/j.issn.1673-9418.2008100

• Database Technology •

Correlation-Based Method for Tracing Multi-dimensional Time Series Data Anomalies

WANG Muxian, DING Xiaoou, WANG Hongzhi, LI Jianzhong

  1. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150000, China
  • Online: 2021-11-01 Published: 2021-11-09

Abstract:

This paper proposes a method, based on statistical correlation analysis, for detecting and analyzing anomalies in multi-dimensional time series data and for tracing the source of data flagged as anomalous. Data that reflect system failures are distinguished from data caused by sensor quality problems, so that real system failures are identified and false detections are avoided. First, a time series correlation graph is built from the correlations between series and is further abstracted into a time series correlation cycle model. By searching for and confirming correlation cycles on the graph and extracting their features, a time series correlation set is obtained. This set is then used to detect the source of anomalies, and the detection result is used to assess the probability that the sensor data correspond to a system failure. Extensive experiments on a real data set of industrial equipment sensor series verify the effectiveness of the method for anomaly detection on high-dimensional time series data. Comparative experiments show that the method outperforms baseline algorithms based on statistics and on machine learning models in both stability and efficiency, and that the improvement over the baselines grows as the dimensionality of the time series increases. By mining correlation knowledge from multi-dimensional time series, the method both saves computational cost and accurately identifies the source of multi-dimensional anomalous data.

Key words: multi-dimensional time series, anomaly detection, correlation analysis, graph algorithm, industrial big data, anomaly source tracing
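
The abstract does not spell out the algorithm, so the following Python sketch only illustrates the general idea of tracing an anomaly through a correlation graph and is not the authors' implementation. Everything in it is an assumption made for illustration: Pearson correlation as the correlation measure, the 0.8 threshold, triangles as the simplest form of correlation cycle, the voting rule in `classify`, and all function names.

```python
from itertools import combinations

import numpy as np


def correlation_graph(window, threshold=0.8):
    """Return edges (i, j), i < j, whose |Pearson r| >= threshold.

    window: array of shape (n_samples, n_series).
    """
    r = np.corrcoef(window, rowvar=False)
    n = r.shape[0]
    return {(i, j) for i, j in combinations(range(n), 2)
            if abs(r[i, j]) >= threshold}


def triangles(edges, n):
    """Enumerate 3-cycles, used here as the simplest correlation cycles."""
    return [(i, j, k) for i, j, k in combinations(range(n), 3)
            if {(i, j), (i, k), (j, k)} <= edges]


def broken_edges(reference_edges, test_window, threshold=0.8):
    """Reference edges whose correlation is no longer present in the test window."""
    return reference_edges - correlation_graph(test_window, threshold)


def classify(series, cycles, broken):
    """Guess the source of an anomaly on an already-flagged series.

    Assumed heuristic: if the flagged series lost its correlation with both
    peers in a cycle while the peers stay correlated with each other, only
    that sensor disagrees, which votes for a sensor problem. Any other
    pattern votes for a system-level change.
    """
    sensor_votes = system_votes = 0
    for cycle in cycles:
        if series not in cycle:
            continue
        a, b = [v for v in cycle if v != series]
        own = [tuple(sorted((series, a))), tuple(sorted((series, b)))]
        peers = tuple(sorted((a, b)))
        if all(e in broken for e in own) and peers not in broken:
            sensor_votes += 1
        else:
            system_votes += 1
    if sensor_votes == 0 and system_votes == 0:
        return "unknown"  # the series lies on no cycle, so no evidence
    return "sensor" if sensor_votes > system_votes else "system"


# Synthetic demo: three sensors driven by one underlying signal.
rng = np.random.default_rng(0)
base = np.sin(np.linspace(0, 20, 200))
reference = np.outer(base, [1.0, 2.0, -1.5]) + 0.05 * rng.normal(size=(200, 3))

# Sensor 0 starts reporting unrelated noise; sensors 1 and 2 are unchanged.
faulty = reference.copy()
faulty[:, 0] = rng.normal(size=200)

edges = correlation_graph(reference)
cycles = triangles(edges, reference.shape[1])
print(classify(0, cycles, broken_edges(edges, faulty)))
```

In this toy setting only the edges touching sensor 0 break, so the vote points at the sensor rather than the system. The enumeration of triangles costs O(n^3) in the number of series and is kept only for brevity; a search restricted to the neighbourhood of the flagged series would be the natural refinement.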