Journal of Frontiers of Computer Science and Technology ›› 2007, Vol. 1 ›› Issue (2): 191-199.
• Academic Research •
HUANG Chun+,LIU Yongpeng,YANG Xuejun
黄 春+,刘勇鹏,杨学军
Abstract: Checkpoint/restart is an important approach to software fault tolerance. This paper presents a checkpoint/restart mechanism for OpenMP programs that coordinates system-level and application-level support. The system-level support provides transparency and lets all threads save the shared data together. The application-level OpenMP checkpoint protocol separates the OpenMP-specific semantic operations from the low-level system and makes them independent of it, which improves the portability of the checkpoint system. Based on this mechanism, the CCRG OpenMP Checkpoint/Restart system has been implemented. Experiments with benchmarks such as NPB3.2-OMP show that the overhead of checkpointing and restarting is small enough for the system to be used with large-scale programs.
Keywords: OpenMP, checkpoint/restart, system-level and application-level coordination
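Abstract (translated from the Chinese): Checkpoint/restart is one of the important approaches to software fault tolerance. The paper describes a hybrid system-level and application-level checkpoint mechanism for OpenMP. The system-level support not only makes the checkpoint system transparent, but also means that shared data is no longer saved by the master thread alone, which gives good data locality. The application-level OpenMP protocol isolates the OpenMP-related protocol handling, which improves the portability of the system. NPB3.2-OMP test results show that the time overhead of checkpointing and restarting is small and can meet the practical needs of large-scale programs.
Keywords (translated from the Chinese): OpenMP, checkpoint/restart, system-level and application-level coordination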
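To make the idea of a coordinated checkpoint concrete, the following is a minimal application-level sketch in C with OpenMP. It is not the CCRG system and does not reproduce the paper's protocol or its system-level support. The function names (`ckpt_save`, `ckpt_restore`, `run`, `demo`), the file layout (an iteration counter followed by the array), and the toy update are all assumptions made for illustration. It shows only two points from the abstract: all threads synchronize at a consistent point, and each thread saves its own slice of the shared data rather than leaving the work to the master thread.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helpers so the sketch also builds without OpenMP. */
static int my_thread(void) {
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}
static int thread_count(void) {
#ifdef _OPENMP
    return omp_get_num_threads();
#else
    return 1;
#endif
}

/* Called by EVERY thread of the enclosing parallel region.
 * Assumed file layout: [int iter][n doubles]. */
static void ckpt_save(const char *path, int iter, const double *data, int n) {
    assert(n > 0);
    #pragma omp barrier   /* all threads reach a consistent point */
    #pragma omp single
    {
        /* one thread creates the file and writes the header */
        FILE *h = fopen(path, "wb");
        if (h) { fwrite(&iter, sizeof iter, 1, h); fclose(h); }
    }                     /* implicit barrier at the end of single */
    int t = my_thread(), nt = thread_count();
    int lo = (int)((long)n * t / nt), hi = (int)((long)n * (t + 1) / nt);
    long off = (long)sizeof iter + (long)lo * (long)sizeof(double);
    FILE *f = fopen(path, "r+b");   /* each thread uses its own handle */
    if (f) {
        fseek(f, off, SEEK_SET);
        fwrite(data + lo, sizeof(double), (size_t)(hi - lo), f);
        fclose(f);
    }
    #pragma omp barrier   /* checkpoint is complete before anyone resumes */
}

/* Serial restore; returns 1 on success, 0 on failure. */
static int ckpt_restore(const char *path, int *iter, double *data, int n) {
    FILE *f = fopen(path, "rb");
    if (!f) return 0;
    int ok = fread(iter, sizeof *iter, 1, f) == 1 &&
             fread(data, sizeof(double), (size_t)n, f) == (size_t)n;
    fclose(f);
    return ok;
}

/* Toy computation: iterations [start, end), checkpoint every `every` steps. */
static void run(double *data, int n, int start, int end, int every,
                const char *path) {
    #pragma omp parallel
    {
        for (int it = start; it < end; ++it) {
            #pragma omp for
            for (int i = 0; i < n; ++i) data[i] += i;
            /* same condition on all threads, so all of them enter ckpt_save */
            if ((it + 1) % every == 0) ckpt_save(path, it + 1, data, n);
        }
    }
}

/* Returns 1 if an interrupted-and-restarted run matches an uninterrupted one. */
static int demo(const char *path) {
    enum { N = 1000, STEPS = 10, EVERY = 4 };
    double ref[N] = {0}, a[N] = {0}, b[N];
    int iter = 0;
    run(ref, N, 0, STEPS, STEPS + 1, path);  /* reference run, no checkpoint */
    run(a, N, 0, 6, EVERY, path);            /* "fails" after step 6; checkpoint at 4 */
    if (!ckpt_restore(path, &iter, b, N)) return 0;
    run(b, N, iter, STEPS, EVERY, path);     /* resume from the saved iteration */
    for (int i = 0; i < N; ++i)
        if (b[i] != ref[i]) return 0;
    return iter == 4;
}
```

Calling `demo("ckpt.bin")` from a `main` function exercises the save, restore and resume path. The sketch compiles with or without `-fopenmp`; without it, the pragmas are ignored and the code runs on a single thread. A real system along the lines described in the abstract would also have to handle locks, in-flight work-sharing constructs and thread-private state, none of which is attempted here.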
HUANG Chun+,LIU Yongpeng,YANG Xuejun. A new hybrid mechanism for Checkpoint/Restart in OpenMP programs[J]. Journal of Frontiers of Computer Science and Technology, 2007, 1(2): 191-199.
黄 春+,刘勇鹏,杨学军. 面向OpenMP的混合检查点机制[J]. 计算机科学与探索, 2007, 1(2): 191-199.
URL: http://fcst.ceaj.org/EN/Y2007/V1/I2/191