计算机科学与探索 ›› 2010, Vol. 4 ›› Issue (9): 850-858.DOI: 10.3778/j.issn.1673-9418.2010.09.008

• 学术研究 • 上一篇    下一篇

列存储数据查询中的连接策略选择方法*

李 静+;孙 莉; 王 梅

  

  1. 东华大学 计算机科学与技术学院, 上海 201620
  • 收稿日期:1900-01-01 修回日期:1900-01-01 出版日期:2010-09-09 发布日期:2010-09-09
  • 通讯作者: 李 静

A Selection Method of Join Strategy in Column Store Based Query*

LI Jing+; SUN Li; WANG Mei

  

  1. School of Computer Science and Technology, Donghua University, Shanghai 201620, China
  • Received:1900-01-01 Revised:1900-01-01 Online:2010-09-09 Published:2010-09-09
  • Contact: LI Jing

摘要: 列的连接策略优化是列存储数据查询中的重要问题。现有的列存储系统中, 列的连接存在策略单一, 缺少优化处理, 无法满足复杂查询等缺陷。针对这些问题, 提出一种连接策略选择方法。该方法首先定义简单规则过滤代价过大的查询计划, 生成候选查询计划树。进而根据动态Huffman树原理提出动态优化树算法, 对候选查询计划树中的查询执行顺序进行改进。根据列存储数据的特点, 候选计划中每个连接节点的执行策略被归纳为两种:串行连接和并行连接。在此基础上构建代价估计模型, 集中针对这两种连接策略进行代价估计和策略选择, 从而以较小的时间复杂度获得优化的查询执行策略。

关键词: 列存储, 连接策略, 查询优化, 基于代价

Abstract: The join strategy optimization among columns is the most important problem in column store based queries. Current column-oriented systems use single join strategy through a whole query, with less optimization, so the performance remains dissatisfactory. A selection method of join strategy is presented. Firstly, the query plans with too much cost is removed by defining several simple rules, and the candidate query plan tree is obtained. Then the dynamic optimization algorithm is proposed to improve the candidate query plan under the principle of Huffman tree. According to the column-oriented data storage characteristics, the join execution for each join node in the plan can be summarized into two ways: Pipeline strategy and parallel strategy. A cost model is then proposed to select the optimal strategy by estimating the cost of the pipeline and parallel strategies with low time complexity.

Key words: column store, join strategy, query optimization, cost-based

中图分类号: