Journal of Frontiers of Computer Science and Technology, 2020, Vol. 14, Issue (11): 1828-1837. DOI: 10.3778/j.issn.1673-9418.2002015


Research on Anomaly Detection System of Online Multi-node Log Flow

WANG Xiaodong, ZHAO Yining, XIAO Haili, WANG Xiaoning, CHI Xuebin   

  1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Online:2020-11-01 Published:2020-11-09





As the volume of logs produced by the nodes of CNGrid grows, traditional manual analysis of abnormal logs can no longer keep up with daily operational needs. To analyze logs automatically and efficiently, this paper proposes a two-stage detection method. In the first stage, log entries are classified into log types during preprocessing, and principal component analysis (PCA) is then used for anomaly detection. A sequence of log types is defined as a log flow pattern; under this definition, abnormal flow patterns are extracted from the anomalies that PCA detects. A hierarchical clustering algorithm then condenses the extracted flow patterns, and the results are saved. In the second stage, the detection model and flow patterns obtained in the first stage are used to monitor and analyze the log flow in real time and to match incoming logs against the corresponding flow patterns. The method is evaluated on real CNGrid logs, with the results visualized in real time, and it greatly reduces the manual work of operations staff.
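The abstract does not give the details of the PCA step, so the following is only a minimal sketch of one common way to use PCA for anomaly detection on logs, not the authors' implementation. It assumes that preprocessing has already turned the log stream into a matrix of per-window counts (one row per time window, one column per log type), and it flags a window when its squared reconstruction error outside the principal subspace exceeds a quantile threshold. The function names, the number of components, and the 0.99 quantile are all hypothetical choices made for illustration.

```python
import numpy as np


def fit_pca_detector(X, n_components=1, quantile=0.99):
    """Fit a PCA residual detector on a (windows x log types) count matrix.

    Assumption: X mostly contains normal behaviour, so the leading
    principal directions describe the normal mix of log types.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    components = vt[:n_components]
    # Residual: the part of each window not explained by the normal subspace.
    residual = Xc - Xc @ components.T @ components
    spe = np.sum(residual ** 2, axis=1)
    # Hypothetical threshold rule: an empirical quantile of training errors.
    threshold = np.quantile(spe, quantile)
    return mean, components, threshold


def spe_score(x, mean, components):
    """Squared prediction error of one count vector (or a batch of them)."""
    xc = x - mean
    residual = xc - xc @ components.T @ components
    return np.sum(residual ** 2, axis=-1)


def is_anomalous(x, mean, components, threshold):
    """Return True when the window deviates from the learned normal subspace."""
    return bool(spe_score(x, mean, components) > threshold)
```

In a two-stage setup like the one described above, `fit_pca_detector` would correspond to the offline first stage and `is_anomalous` to the per-window check in the online second stage; windows flagged this way would be the ones whose log type sequences are examined as candidate abnormal flow patterns.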

Key words: principal component analysis (PCA), log flow pattern, hierarchical clustering, visualization


