Journal of Frontiers of Computer Science and Technology, 2018, Vol. 12, Issue (2): 218-230. DOI: 10.3778/j.issn.1673-9418.1611041

• Database Technology •

Research on Provenance Collection and Storage Based on Object-Based Storage System

LIAO Xuelong1, XIE Yulai1+, RONG Zhen2, QIN Leihua1, CHEN Jianxi2, FENG Dan2

  1. School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
  2. Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan 430074, China
  • Online: 2018-02-01 Published: 2018-01-31

Abstract: Provenance is metadata that describes the ancestry or history of a digital object. It enhances the value of the data it describes by answering questions such as: How was this object created? Which other objects does it depend on? How do the ancestries of two objects differ? This paper analyzes the advantages of using an object-based storage system to store and manage provenance information, and designs and implements a method that uses the object-based storage architecture to collect and store provenance. On the object-based storage client, the system collects kernel information from system-status files, invokes the JHOVE application to analyze and encapsulate file formats, and uses the Linux audit facility to monitor ordinary user applications. It then encapsulates the collected provenance information into objects and stores them in a Berkeley DB database or in log files on the object-based storage devices. Test results show that the object-based provenance storage system performs well in collecting, storing, and querying different kinds of provenance information.

Key words: provenance, object-based storage system, file format analysis, provenance-aware system
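
The collect-encapsulate-store flow summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `ProvenanceObject`, `read_kernel_info`, `append_record`, and `ancestors` are hypothetical, `/proc/version` is only an assumed example of a system-status file, and the JSON-lines log is an assumed stand-in for the log-file backend (the JHOVE, Linux audit, and Berkeley DB components are not exercised here).

```python
import json
import platform
import time
from dataclasses import dataclass, field, asdict


@dataclass
class ProvenanceObject:
    """Hypothetical provenance record wrapped as a storage object."""
    object_id: str                               # data object being described
    operation: str                               # e.g. "create", "write"
    inputs: list = field(default_factory=list)   # IDs of objects it depends on
    kernel_info: str = ""                        # text from a system-status file
    timestamp: float = field(default_factory=time.time)


def read_kernel_info(path="/proc/version"):
    """Return kernel information from a status file (assumed path)."""
    try:
        with open(path, encoding="utf-8") as f:
            return f.read().strip()
    except OSError:
        # File missing or unreadable: fall back to the standard library.
        return platform.platform()


def append_record(log_path, record):
    """Append one provenance object to the log as a single JSON line."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(asdict(record)) + "\n")


def ancestors(log_path, object_id):
    """Return every object that object_id transitively depends on."""
    deps = {}
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            rec = json.loads(line)
            deps.setdefault(rec["object_id"], set()).update(rec["inputs"])
    seen, stack = set(), [object_id]
    while stack:
        for parent in deps.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen
```

Walking the logged `inputs` fields is one simple way to answer the "which other objects does it depend on?" question; a keyed store such as Berkeley DB would presumably avoid rescanning the whole log on each query.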