Journal of Frontiers of Computer Science and Technology, 2019, Vol. 13, Issue (3): 374-382. DOI: 10.3778/j.issn.1673-9418.1712002


Design of Event Stream System in CNGrid

ZHAO Yining+, XIAO Haili

  1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China
  • Online: 2019-03-01 Published: 2019-03-11

Abstract: CNGrid is a large high-performance computing environment that aggregates the computing clusters of many national supercomputing centers, research institutes, and universities in China, and it provides high-quality computing resources to researchers across many research areas. To keep the environment running normally and stably, system administrators and maintainers need timely information about the events occurring inside it so that they can respond quickly to the problems that arise. To meet this need, this paper presents the design of an event stream processing and distribution system for CNGrid, which gathers the events produced in CNGrid, classifies them by type, and distributes them to the applications that need them. In this system, the event factory module decodes events arriving in multiple formats, filters and preprocesses them, and finally packs them into a unified interface format for publication. The basic functionality of each part of the system has been implemented and deployed in the CNGrid environment, and the event processing latency of the system, i.e. the delay introduced by the event factory module, has been measured. The experimental results show that the delay introduced by the event processing and distribution steps is very low and satisfies the requirement of delivering events to the related applications with low latency.

Key words: log processing, event stream, big-data analysis, high-performance computing
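The event factory pipeline summarized in the abstract (decode multi-format events, filter and preprocess them, pack them into one unified format, then distribute them by type) can be illustrated with a minimal in-process sketch. Everything below is hypothetical and only for illustration: the two input formats (a JSON record and a whitespace-separated log line), the `Event` fields, the severity threshold used as the filter, and the `EventFactory`, `parse_raw`, and `subscribe` names are assumptions, not the system's actual interfaces. A real deployment would presumably sit behind a message broker rather than in-process callbacks.

```python
import json
import time
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical unified event format that every decoded event is packed into.
@dataclass
class Event:
    event_type: str          # category used for routing, e.g. "job" or "node"
    source: str              # host or service that produced the raw event
    severity: str            # normalized to upper case, e.g. "INFO", "ERROR"
    message: str
    received_at: float = 0.0  # time.perf_counter() value at ingestion

# Assumed severity levels; unknown levels are treated as lowest and dropped.
SEVERITY_RANK = {"DEBUG": 0, "INFO": 1, "WARN": 2, "ERROR": 3}

def parse_raw(raw: str) -> Optional[Event]:
    """Decode one raw event in either of two assumed formats; None if malformed."""
    raw = raw.strip()
    if raw.startswith("{"):
        # Assumed format 1: JSON record with "type", "source", "level", "msg" keys.
        try:
            d = json.loads(raw)
            return Event(d["type"], d["source"], str(d["level"]).upper(), d["msg"])
        except (json.JSONDecodeError, KeyError):
            return None
    # Assumed format 2: "<type> <source> <level> <free-text message>".
    parts = raw.split(maxsplit=3)
    if len(parts) < 4:
        return None
    return Event(parts[0], parts[1], parts[2].upper(), parts[3])

class EventFactory:
    """Hypothetical factory: decode -> filter -> preprocess -> publish by type."""

    def __init__(self, min_severity: str = "INFO") -> None:
        self.min_rank = SEVERITY_RANK[min_severity]
        self.subscribers: Dict[str, List[Callable[[Event], None]]] = defaultdict(list)
        self.latencies: List[float] = []  # seconds spent per published event

    def subscribe(self, event_type: str, callback: Callable[[Event], None]) -> None:
        self.subscribers[event_type].append(callback)

    def ingest(self, raw: str) -> bool:
        """Process one raw event; return True if it reached at least one subscriber."""
        start = time.perf_counter()
        event = parse_raw(raw)
        # Filter: drop malformed events and events below the severity threshold.
        if event is None or SEVERITY_RANK.get(event.severity, -1) < self.min_rank:
            return False
        # Preprocess: collapse runs of whitespace in the message text.
        event.message = " ".join(event.message.split())
        event.received_at = start
        targets = self.subscribers.get(event.event_type, [])
        if not targets:
            return False  # nobody subscribed to this event type
        for callback in targets:
            callback(event)
        # Record the time from ingestion to the end of delivery.
        self.latencies.append(time.perf_counter() - start)
        return True

if __name__ == "__main__":
    factory = EventFactory(min_severity="WARN")
    received: List[Event] = []
    factory.subscribe("node", received.append)
    factory.ingest('{"type": "node", "source": "cn01", "level": "error", "msg": "disk  full"}')
    factory.ingest("node cn02 info heartbeat ok")    # filtered: below WARN
    factory.ingest("job cn03 ERROR job 42 failed")   # no "job" subscriber
    print(len(received), received[0].message)        # 1 disk full
```

Routing by `event_type` mirrors the classify-and-distribute step, and the `latencies` list records the per-event time spent inside the factory, which is the kind of processing delay the experiment evaluates; how the authors actually instrumented their measurement is not described in this abstract.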