计算机科学与探索 ›› 2011, Vol. 5 ›› Issue (8): 719-729.

• 学术研究 • 上一篇    下一篇

采用元组聚类的增量式数据分区方法

吕 晨, 房 俊, 韩燕波   

  1. 1. 中国科学院 计算技术研究所 集成应用中心, 北京 100190
    2. 中国科学院 研究生院, 北京 100049
  • 收稿日期:1900-01-01 修回日期:1900-01-01 出版日期:2011-08-01 发布日期:2011-08-01

Incremental Data Partitioning Approach Based on Tuple Clustering

LV Chen, FANG Jun, HAN Yanbo   

  1. 1. Integration Application Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China 2. Graduate University, Chinese Academy of Sciences, Beijing 100049, China
  • Received:1900-01-01 Revised:1900-01-01 Online:2011-08-01 Published:2011-08-01

摘要: 数据分区是提升数据库可扩展能力的有效方法。在事务查询密集的系统中, 合理的分区策略可减少分布式事务查询数量, 并提高事务查询响应速度。提出了一种基于元组聚类的增量式分区方法, 通过将元组聚簇和采用分区感知的数据筛选策略来降低算法的复杂度。首先依据时间窗口模型聚类元组, 并构建簇节点图, 然后利用分区感知策略对图进行删减, 最后采用图划分算法对图进行子图划分来得到分区。与现有方法相比, 该方法减少了分区响应时间, 保证了较少的分布式事务数量, 并提高了分区事务查询速度。

关键词: 数据分区, 可扩展性, 元组聚类

Abstract: Data partitioning is an effective method to advance scalability of databases. In transaction-intensive systems, reasonable methods of data partition can reduce the number of distributed transactions and the response time of transactions. This paper proposes a novel approach based on tuple clustering to partition data incrementally. Al-gorithm complexity is reduced by the method of tuple cluster and the method of data filter based on partition-aware. Firstly, the related tuples are clustered by using time window model, and then the clustered node graph is con-structed. Secondly, some nodes in the graph are deleted according to partition-aware strategy. Lastly, partitions are obtained by dividing the graph into subgraphs. Compared with existing methods, the new method spends less time on data partitioning, and obtains less distributed transactions and faster speed on query for partitioning databases.

Key words: data partitioning, scalability, tuple clustering