Journal of Frontiers of Computer Science and Technology ›› 2014, Vol. 8 ›› Issue (2): 129-138. DOI: 10.3778/j.issn.1673-9418.1306050

• Database Technology •

Object Store System for Complex Data

兰  超1,2,3+,张  勇2,3,邢春晓2,3   

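  1. Department of Computer Science and Technology, Tsinghua University, Beijing 100084
  2. Research Institute of Information Technology, Tsinghua University, Beijing 100084
  3. Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing 100084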
  • Publication date: 2014-02-01 Release date: 2014-01-26

Elastic Object Store System for Complex Data

LAN Chao1,2,3+, ZHANG Yong2,3, XING Chunxiao2,3   

  1. Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
  2. Research Institute of Information Technology, Tsinghua University, Beijing 100084, China
  3. Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing 100084, China
  • Published: 2014-02-01 Online: 2014-01-26

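Abstract: This paper studies efficient storage and indexing mechanisms for digital objects in digital library applications, and proposes and designs PuntTable, a data repository system for digital libraries. PuntTable stores and manages objects in a non-relational manner and supports queries by building indices inside the data objects. PuntTable consists of two main modules: PuntStore, a data storage system with multiple storage engines, and PuntIndex, an indexing system that supports multiple index types. PuntTable provides high-throughput, low-latency object storage, and both the indices and the content of data objects can be placed on the most suitable storage layer. PuntTable is tested and evaluated on data from a real digital library. In the test data sets, each data set uses a different record length, which brings the tests closer to real applications. The experimental results show that using different storage models for different data sets can significantly increase the throughput of the database system and effectively reduce latency.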

Key words: digital library, object store, data management architecture, big data

Abstract: This paper studies the problem of efficient object storage and indexing in digital libraries and proposes a data repository system called PuntTable. PuntTable stores and retrieves objects in a schema-free way and builds indices to support queries on the fields inside the objects. To achieve high throughput and low latency, PuntTable is designed to use multiple content storage engines and index storage engines through two interfaces, PuntStore and PuntIndex, which together form the storage layer of PuntTable. Both the object content and its indices can be placed on the storage layer most suitable for a specific data set. PuntTable is tested and evaluated on its performance in processing object data under different combinations of content store and index store, using a variety of data sets with different single-record sizes. These data sets are drawn from a digital library to simulate real application scenarios. The results show that a proper storage-layer configuration for a particular data set can significantly improve throughput and reduce latency.

Key words: digital library, object store, digital infrastructure, big data
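
The separation described in the abstract, schema-free objects whose content and indices go to independently chosen storage engines, can be illustrated with a minimal Python sketch. This is not PuntTable's actual interface: the class names `InMemoryContentEngine`, `HashIndexEngine`, and `ObjectTable`, their methods, and the in-memory engines are all hypothetical stand-ins chosen for illustration. The sketch assumes that indexed field values are hashable and that only exact-match lookups are needed.

```python
from typing import Any, Dict, Hashable, Iterable, List, Optional, Set, Tuple

Record = Dict[str, Any]  # a schema-free object: any set of fields


class InMemoryContentEngine:
    """Hypothetical content engine: keeps whole objects in a dict."""

    def __init__(self) -> None:
        self._objects: Dict[str, Record] = {}

    def put(self, key: str, obj: Record) -> None:
        self._objects[key] = dict(obj)  # store a shallow copy

    def get(self, key: str) -> Optional[Record]:
        obj = self._objects.get(key)
        return dict(obj) if obj is not None else None


class HashIndexEngine:
    """Hypothetical index engine: exact-match lookup on (field, value)."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, Hashable], Set[str]] = {}

    def add(self, field: str, value: Hashable, key: str) -> None:
        self._entries.setdefault((field, value), set()).add(key)

    def remove(self, field: str, value: Hashable, key: str) -> None:
        self._entries.get((field, value), set()).discard(key)

    def lookup(self, field: str, value: Hashable) -> Set[str]:
        return set(self._entries.get((field, value), set()))


class ObjectTable:
    """Hypothetical facade combining one content engine and one index engine."""

    def __init__(self, content: InMemoryContentEngine, index: HashIndexEngine,
                 indexed_fields: Iterable[str]) -> None:
        self._content = content
        self._index = index
        self._indexed_fields = tuple(indexed_fields)

    def _index_entries(self, obj: Record) -> List[Tuple[str, Hashable]]:
        # Only fields that are configured as indexed AND present in the object.
        return [(f, obj[f]) for f in self._indexed_fields if f in obj]

    def put(self, key: str, obj: Record) -> None:
        old = self._content.get(key)
        if old is not None:  # overwriting: drop stale index entries first
            for field, value in self._index_entries(old):
                self._index.remove(field, value, key)
        self._content.put(key, obj)
        for field, value in self._index_entries(obj):
            self._index.add(field, value, key)

    def get(self, key: str) -> Optional[Record]:
        return self._content.get(key)

    def find(self, field: str, value: Hashable) -> List[Record]:
        # Resolve keys through the index, then fetch content for each key.
        keys = sorted(self._index.lookup(field, value))
        return [obj for obj in (self._content.get(k) for k in keys) if obj is not None]
```

Because `ObjectTable` only calls `put`/`get` on the content engine and `add`/`remove`/`lookup` on the index engine, either one could be swapped for a different implementation (for example a disk-backed store) without changing the other, which is the kind of per-data-set configuration the abstract refers to. In this sketch, `find` on a field that was not configured as indexed simply returns an empty list.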