计算机科学与探索 ›› 2013, Vol. 7 ›› Issue (5): 385-393.DOI: 10.3778/j.issn.1673-9418.1212011

• 学术研究 • 上一篇    下一篇

星型模型上的高效百分点计算方法

覃雄派1,2+   

  1. 1. 中国人民大学信息学院,北京100872
    2. 中国人民大学教育部数据工程与知识工程重点实验室,北京100872
  • 出版日期:2013-05-01 发布日期:2013-05-03

Efficient Percentile Computing over Star Schema

QIN Xiongpai1,2+   

  1. 1. School of Information, Renmin University of China, Beijing 100872, China
    2. Key Laboratory of Data Engineering and Knowledge Engineering, Ministry of Education, Renmin University of
    China, Beijing 100872, China
  • Online:2013-05-01 Published:2013-05-03

摘要: 提出了星型模型扁平化编码方法上的百分点聚集函数的并行算法。把星型模型中维表上与查询相关的维度层次信息编码到事实表里,该编码方法使得经过改写的聚集查询,在查询处理过程中无需进行事实表和维表之间的连接,数据可以均匀分布到机群上,利用并行处理提高查询性能。百分点计算不具有天然的并行性,提出了基于采样预测的并行迭代式算法,通过付出采样数据的网络传输开销,使得算法快速收敛,解决了大规模机群上的百分点聚集函数计算的性能问题。实验结果证实,该算法不仅能快速收敛,而且其网络传输开销也是可以接受的。

关键词: 星型模型, 百分点, 并行算法, 迭代

Abstract: This paper proposes a parallel percentile computing algorithm over flattening encoded star schema. Firstlyhierarchies of all dimensions pertaining to queries in a star schema are encoded into the fact table, the encodingscheme enables processing queries without joining between the fact table and dimension tables after query rewriting,thus data can be partitioned onto a large cluster evenly, and high performance is achieved through parallel processing.Percentile computing is not naturally parallelizable, this paper also puts forward an iterative parallel algorithm based on prediction. Through paying out the cost of data sampling, the algorithm converges quickly, and resolves the problem of computing percentile on a large scale cluster. Experiment results demonstrate that not only the algorithm converges quickly, but also the network overhead is acceptable.

Key words: star schema, percentile, parallel algorithm, iterative