Journal of Frontiers of Computer Science and Technology, 2019, Vol. 13, Issue (12): 1995-2007. DOI: 10.3778/j.issn.1673-9418.1808054

• Academic Research •

Container Migration Based on Combination of Remote Direct Memory Access and Check Point

ZHAO Qian, XIE Shangqin, HAN Ke, GONG Qingze, FENG Guangsheng, LIN Junyu

  1. School of Computer and Information Engineering, Harbin University of Commerce, Harbin 150028, China
  2. College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China
  3. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China
  • Online: 2019-12-01 Published: 2019-12-10

Abstract: As cloud services become more widely adopted, the number of containers in cloud computing clusters keeps growing. When a node in a cluster fails, migrating the services on the faulty node to a reliable node becomes an important problem in maintaining the cluster. Traditional cluster fault-tolerance methods use backup hosts as fault-tolerant nodes. Because of the constraints of the service runtime environment, one physical host can serve as the backup for only one kind of service. To improve the utilization of backup machines while reducing the fault-tolerant migration rejection rate and the fault-tolerant migration delay, a container migration mechanism based on a container fault-tolerant pool is proposed. It combines the checkpoint mechanism with remote direct memory access (RDMA) to reduce the impact of task-recovery environment coupling on task migration, without affecting the normal operation of the container virtual cluster. Extensive simulations conducted in a laboratory environment verify the availability and effectiveness of the proposed mechanism.

Key words: cloud computing cluster, fault-tolerant pool, container migration
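
As a rough illustration of the fault-tolerant pool idea summarized in the abstract, the sketch below models a shared pool of standby hosts that can accept a checkpointed container of any service type, and counts a rejection when no host has room. Everything in it is an assumption made for illustration: the names `Checkpoint`, `StandbyHost`, `FaultTolerantPool`, `migrate` and `fail_over`, the slot-based capacity model, and the first-fit placement are not taken from the paper, and the RDMA transfer of checkpoint data is reduced to a plain in-memory copy.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Checkpoint:
    """Saved state of one container (hypothetical structure)."""
    container_id: str
    service_type: str
    state: bytes


@dataclass
class StandbyHost:
    """A standby machine in the pool with a fixed number of free slots."""
    name: str
    free_slots: int
    restored: Dict[str, Checkpoint] = field(default_factory=dict)

    def restore(self, ckpt: Checkpoint) -> None:
        # Stand-in for an RDMA transfer followed by a checkpoint restore:
        # the checkpoint is simply recorded in the host's table.
        self.restored[ckpt.container_id] = ckpt
        self.free_slots -= 1


class FaultTolerantPool:
    """Shared pool: any host may take a container of any service type."""

    def __init__(self, hosts: List[StandbyHost]) -> None:
        self.hosts = hosts
        self.rejected = 0  # migrations refused because the pool was full

    def migrate(self, ckpt: Checkpoint) -> Optional[str]:
        # First-fit placement (an assumption, not the paper's policy).
        for host in self.hosts:
            if host.free_slots > 0:
                host.restore(ckpt)
                return host.name
        self.rejected += 1
        return None


def fail_over(pool: FaultTolerantPool,
              checkpoints: List[Checkpoint]) -> Dict[str, Optional[str]]:
    """Migrate every container of a failed node; return the placement map."""
    return {c.container_id: pool.migrate(c) for c in checkpoints}


pool = FaultTolerantPool([StandbyHost("standby-a", 2), StandbyHost("standby-b", 1)])
failed_node = [
    Checkpoint("c1", "web", b"w1"),
    Checkpoint("c2", "db", b"d1"),
    Checkpoint("c3", "cache", b"k1"),
    Checkpoint("c4", "web", b"w2"),
]
placement = fail_over(pool, failed_node)
print(placement)
print("rejected:", pool.rejected)
```

In this toy run, three containers of different service types land on the two standby hosts and the fourth is rejected because the pool has only three slots. With one dedicated backup host per service type, the same three service types would need three separate backup machines.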