Journal of Frontiers of Computer Science and Technology ›› 2012, Vol. 6 ›› Issue (9): 852-864.DOI: 10.3778/j.issn.1673-9418.2012.09.009

Previous Articles    

Multi-Query Optimization Strategy in Column-Based OLAP System

LU Xuchen+, WANG Mei, LE Jiajin   

  1. School of Computer Science and Technology, Donghua University, Shanghai 201620, China
  • Online:2012-09-01 Published:2012-09-03

列存储中的OLAP多查询优化方法

陆戌辰+,王  梅,乐嘉锦   

  1. 东华大学 计算机科学与技术学院,上海 201620

Abstract: This paper provides a multi-query optimization strategy to achieve the share and reuse of column scan, column join and aggregation operations in column store OLAP (on-line analytical processing) systems which can easily lead to large I/O and CPU overhead. According to the features of column-stores and OLAP applications, the paper proposes a series of transformation rules to generate a single global query plan for a set of related queries mapped from a certain OLAP require. Aiming at the share and reuse of operations, the paper also introduces four new defined nodes: the filter node, the group by node, the merge node and the aggregation node. At the same time, it makes an improvement to the MuGA (multiply group by algorithm), and uses the filter node, merge node and join node to mark group number for each tuple in dimension and fact tables, to achieve the share of column scan and column join operations. For the aggregation node, the paper proposes a multi-phase aggregation algorithm combined with compound group number of the fact table to implement effective aggregation reuse. The experimental results on the benchmark data sets SSB (star schema benchmark) also verify the effectiveness of the multi-query optimization strategy.

Key words: column store, on-line analytical processing (OLAP), multi-query optimization, global query plan, operation reuse

摘要: 为了使列存储OLAP(on-line analytical processing)操作中I/O和CPU开销较大的扫描、连接、聚集操作实现有效的共享和复用,提出了一个多查询优化技术。根据列存储以及OLAP操作的特点,提出了一系列转换规则,为OLAP查询请求产生的一组相关查询语句生成一个单一全局查询计划。为了达到共享复用的目的,在全局计划中引入新的过滤结点、分组结点、合并结点和聚集结点。同时,借用MuGA(multiply group by algorithm)算法,通过分组结点、合并结点、连接结点实现维表及事实表元组的分组序号标记,从而实现列扫描、列连接的共享。并为聚集结点提出了一个多阶段聚集算法,结合最终生成的事实表复合分组序号,实现聚集操作的复用。在SSB(star schema benchmark)数据集上设计实验,证明了该多查询优化策略的有效性。

关键词: 列存储, 联机分析处理(OLAP), 多查询优化, 全局计划, 操作复用