Journal of Frontiers of Computer Science and Technology, 2009, Vol. 3, Issue (4): 413-422. DOI: 10.3778/j.issn.1673-9418.2009.04.008

• Academic Research •

A Centralized Differential Archiving Method for File Level Continuous Data Protection

SHENG Yonghong1,3+, LIU Chuanyi1, JU Dapeng2, WANG Dongsheng1,2   

  1. Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
  2. Research Institute of Information Technology, Tsinghua University, Beijing 100084, China
  3. College of Information Engineering, University of Information Engineering, Zhengzhou 450002, China
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2009-07-15 Published: 2009-07-15
  • Contact: SHENG Yonghong

Abstract: A file-level continuous data protection (CDP) system operates above the file system. It captures changes to individual files in real time and can recover a file to any point in time. For transmission, the system uses a delta algorithm to compute the differences between two versions of a file and sends only those differences, which makes effective use of network bandwidth. On the server, file versions are stored as a combination of mirror copies and deltas, which saves storage space. Commands are classified as synchronous or asynchronous according to their estimated execution time: synchronous commands are executed immediately, while asynchronous commands are placed in a single queue and executed by multiple worker threads. This design keeps response times short for small tasks and improves concurrency under heavy workloads. An index-file-based method is also presented for quickly locating a file by name across all of its historical versions. Finally, the performance of the service is tested and analyzed for single-task and concurrent multi-task execution.

Key words: continuous data protection, incremental archive, queue, index
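
The command handling summarized in the abstract (immediate execution of synchronous commands, and a queue drained by multiple worker threads for asynchronous ones) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class `CommandServer`, its methods, the command names in `SYNC_COMMANDS`, and the choice of four workers are all assumptions made for illustration, since the abstract does not list the actual command set or thread count.

```python
import queue
import threading

# Hypothetical command names; the abstract does not list the real command set.
SYNC_COMMANDS = {"stat", "list_versions"}


class CommandServer:
    """Runs short commands inline and queues long ones for worker threads."""

    def __init__(self, num_workers=4):
        self._tasks = queue.Queue()   # one queue shared by every worker
        self._results = {}            # task_id -> result (or raised exception)
        self._lock = threading.Lock()
        self._workers = [
            threading.Thread(target=self._worker_loop, daemon=True)
            for _ in range(num_workers)
        ]
        for worker in self._workers:
            worker.start()

    def submit(self, task_id, name, handler, *args):
        if name in SYNC_COMMANDS:
            # Synchronous path: execute immediately in the caller's thread.
            return handler(*args)
        # Asynchronous path: enqueue the work and return at once.
        self._tasks.put((task_id, handler, args))
        return None

    def _worker_loop(self):
        while True:
            item = self._tasks.get()
            if item is None:          # sentinel: stop this worker
                self._tasks.task_done()
                break
            task_id, handler, args = item
            try:
                result = handler(*args)
            except Exception as exc:  # keep the worker alive on failure
                result = exc
            with self._lock:
                self._results[task_id] = result
            self._tasks.task_done()

    def wait(self):
        """Block until every queued command has been processed."""
        self._tasks.join()

    def result(self, task_id):
        with self._lock:
            return self._results.get(task_id)

    def shutdown(self):
        for _ in self._workers:
            self._tasks.put(None)     # one sentinel per worker
        for worker in self._workers:
            worker.join()
```

In this sketch a call such as `server.submit(2, "store_delta", handler, data)` returns immediately, and the result is read later with `server.result(2)` after `server.wait()`. Because all workers pull from the same queue, any idle thread takes the next job, so a short job does not have to wait behind a long one that happens to be assigned to a particular thread.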