Journal of Frontiers of Computer Science and Technology ›› 2018, Vol. 12 ›› Issue (10): 1547-1558. DOI: 10.3778/j.issn.1673-9418.1709041

• Database Technology •

RStore: Relational Storage System Built on Top of BigTable

LU Pengkai, JIANG Dawei+, CHEN Ke, SHOU Lidan, CHEN Gang   

  1. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
  • Online: 2018-10-01 Published: 2018-10-08

Abstract: In recent years, building data management systems to support big data applications has attracted increasing interest. Ideally, a big data management system should offer application developers two capabilities: (1) rapid application development (RAD), i.e., the ability to focus on the logical data model of the business data rather than on the physical storage layout; and (2) good scalability together with efficient data access. Unfortunately, existing solutions fail to provide both at the same time. Relational database management systems support RAD because they employ a data-independent programming model, but they are hard to scale to the level that big data requires. NoSQL databases such as BigTable, on the other hand, scale extremely well, but to access data efficiently, application developers must carefully design and exploit BigTable's physical storage structure, so these systems do not support RAD. This paper presents RStore, a big data storage system that provides the relational programming model needed for RAD and automatically persists relational data into BigTable, thereby achieving both rapid application development and efficient data access. Experiments on TPC-C applications confirm that the proposed approach is feasible and efficient.

Keywords: big data management, BigTable, relational databases
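
The abstract's central idea, persisting relational rows in a BigTable-style store, can be pictured with a minimal sketch. This is not RStore's actual mapping, which the abstract does not describe. It assumes a store modeled as a plain Python dict keyed by (row key, column), a hypothetical `table#pk` row-key scheme, and a single hypothetical column family `d`. The `warehouse` columns are illustrative, loosely following TPC-C naming.

```python
from typing import Any, Dict, Tuple

# A BigTable-like store modeled as a map: (row key, "family:qualifier") -> value.
# This is an in-memory stand-in, not a real BigTable client.
Cell = Tuple[str, str]
Store = Dict[Cell, Any]


def row_key(table: str, pk: Tuple[Any, ...]) -> str:
    """Build a row key from the table name and primary-key values (assumed scheme)."""
    return table + "#" + "#".join(str(v) for v in pk)


def put_row(store: Store, table: str, pk_cols: Tuple[str, ...], row: Dict[str, Any]) -> None:
    """Write one relational row as one cell per column under a single row key."""
    key = row_key(table, tuple(row[c] for c in pk_cols))
    for col, val in row.items():
        store[(key, "d:" + col)] = val  # "d" is a hypothetical column family


def get_row(store: Store, table: str, pk: Tuple[Any, ...]) -> Dict[str, Any]:
    """Reassemble a relational row from the cells stored under its row key."""
    key = row_key(table, pk)
    return {q.split(":", 1)[1]: v for (k, q), v in store.items() if k == key}


store: Store = {}
put_row(store, "warehouse", ("w_id",), {"w_id": 1, "w_name": "north", "w_tax": 0.08})
restored = get_row(store, "warehouse", (1,))
```

In this sketch the application works only with tables, primary keys, and columns, while the row-key layout stays hidden inside `put_row` and `get_row`. That is the kind of separation between logical model and physical layout the abstract argues for.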