Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2013, Vol. 7, Issue (9): 800-810. DOI: 10.3778/j.issn.1673-9418.1305057

• Academic Research •

Research on Automatic Partitioning of Appended Data in Parallel OLTP Systems

WANG Xiaoyan1,2,3, CHEN Jinchuan1,3, DU Xiaoyong1,3+, FAN Xu1,3   

1. School of Information, Renmin University of China, Beijing 100872, China
2. School of Information and Electrical Engineering, Ludong University, Yantai, Shandong 264025, China
3. Key Laboratory of Data Engineering and Knowledge Engineering, Ministry of Education, Beijing 100872, China
• Online: 2013-09-01  Published: 2013-09-04

Abstract: With the rapid growth of data volumes, more and more large-scale applications are being deployed in distributed environments. To optimize the performance of parallel OLTP (on-line transaction processing) systems, these applications rely on data partitioning to carefully distribute both the original data and newly appended data across data nodes. This paper presents TDPS (table dependency partitioning strategy), a novel data partitioning strategy for allocating both static and appended data. TDPS first partitions the initial data according to the dependencies among tables. When new data arrive, it automatically assigns each data fragment to the partition most closely related to it. This paper conducts a series of experiments on TPC-C datasets and transactions. The results show that, compared with previous methods, TDPS effectively improves system performance.

Key words: data partitioning, on-line transaction processing (OLTP), appended data
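The abstract describes TDPS only at a high level. The Python sketch below (Python 3.9+) illustrates the general idea of dependency-driven placement. It is not the authors' TDPS algorithm. It derives a parents-first load order from table dependencies, spreads root tuples round-robin, and places every dependent or newly appended tuple on the partition that already holds most of the tuples it references. Everything here is a hypothetical illustration: the class `DependencyPartitioner`, the simplified TPC-C-like dependency map `TABLE_DEPS`, the tuple keys, and both the round-robin and majority-vote rules.

```python
from collections import Counter
from graphlib import TopologicalSorter

# HYPOTHETICAL table dependencies, loosely modeled on the TPC-C schema:
# each table maps to the set of tables it references via foreign keys.
TABLE_DEPS = {
    "WAREHOUSE": set(),
    "DISTRICT": {"WAREHOUSE"},
    "CUSTOMER": {"DISTRICT"},
    "ORDERS": {"CUSTOMER"},
}

# Load the initial data parents-first, so referenced tuples are already
# placed when the tuples that depend on them arrive.
LOAD_ORDER = list(TopologicalSorter(TABLE_DEPS).static_order())


class DependencyPartitioner:
    """Illustrative sketch (not TDPS itself): co-locate tuples with what they reference."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self.location = {}   # (table, key) -> partition id
        self.next_root = 0   # round-robin cursor for tuples with no placed references

    def assign(self, table, key, refs=()):
        """Place tuple (table, key); `refs` lists the (table, key) pairs it references."""
        # Count how many already-placed referenced tuples live in each partition.
        votes = Counter(self.location[r] for r in refs if r in self.location)
        if votes:
            # Majority partition; ties go to the lowest partition id.
            part = max(votes.items(), key=lambda kv: (kv[1], -kv[0]))[0]
        else:
            # Root tuple (e.g. a warehouse): spread round-robin across partitions.
            part = self.next_root % self.num_partitions
            self.next_root += 1
        self.location[(table, key)] = part
        return part


# Usage: an initial load, then an appended ORDERS tuple that arrives later.
p = DependencyPartitioner(num_partitions=2)
p.assign("WAREHOUSE", 1)
p.assign("WAREHOUSE", 2)
p.assign("DISTRICT", (2, 1), refs=[("WAREHOUSE", 2)])
p.assign("CUSTOMER", (2, 1, 7), refs=[("DISTRICT", (2, 1))])
new_order_part = p.assign("ORDERS", (2, 1, 100), refs=[("CUSTOMER", (2, 1, 7))])
print(LOAD_ORDER, new_order_part)  # prints: ['WAREHOUSE', 'DISTRICT', 'CUSTOMER', 'ORDERS'] 1
```

In this sketch the new order lands with its customer, district and warehouse, so a transaction on that customer's orders stays within one partition. A tuple whose references span several partitions is where a real strategy would need a richer notion of "most closely related" than this simple vote.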