Journal of Frontiers of Computer Science and Technology, 2022, Vol. 16, Issue (1): 106-119. DOI: 10.3778/j.issn.1673-9418.2009099
Database Technology
ZONG Fengbo¹, ZHAO Yuhai¹,⁺, WANG Guoren², JI Hangxu¹
Received: 2020-08-06; Revised: 2020-10-16; Online: 2022-01-01; Published: 2020-11-06
About author: ZONG Fengbo, born in 1995 in Tangshan, Hebei, M.S. candidate. His research interest is big data.
Corresponding author (+): ZHAO Yuhai, E-mail: zhaoyuhai@mail.neu.edu.cn
ZONG Fengbo, ZHAO Yuhai, WANG Guoren, JI Hangxu. Optimization Method of Projection and Order for Multiple Tables Join[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(1): 106-119.
URL: http://fcst.ceaj.org/EN/10.3778/j.issn.1673-9418.2009099
Source table | TPC-H table |
---|---|
Source0 | lineitem |
Source1 | orders |
Source2 | partsupp |
Source3 | part |
Source4 | customer |
Source5 | supplier |
Source6 | nation |
Source7 | region |
Table 1 TPC-H tables corresponding to the Source tables
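Table 1 only renames the inputs; to place the Source tables in a join plan, each adjacent pair also needs a join predicate. The sketch below assumes the standard key/foreign-key columns of the TPC-H schema (for example `l_orderkey` = `o_orderkey`); the paper's exact predicates are not given here, and `JOIN_KEYS` and `join_predicate` are hypothetical names used for illustration only.

```python
from typing import Dict, List, Optional, Tuple

# Source table -> TPC-H table, as listed in Table 1.
SOURCE_TO_TPCH: Dict[str, str] = {
    "Source0": "lineitem",
    "Source1": "orders",
    "Source2": "partsupp",
    "Source3": "part",
    "Source4": "customer",
    "Source5": "supplier",
    "Source6": "nation",
    "Source7": "region",
}

# Assumption: equi-join columns taken from the standard TPC-H key/foreign-key pairs.
# Key: (left table, right table); value: list of (left column, right column).
JOIN_KEYS: Dict[Tuple[str, str], List[Tuple[str, str]]] = {
    ("lineitem", "orders"): [("l_orderkey", "o_orderkey")],
    ("lineitem", "partsupp"): [("l_partkey", "ps_partkey"), ("l_suppkey", "ps_suppkey")],
    ("partsupp", "part"): [("ps_partkey", "p_partkey")],
    ("partsupp", "supplier"): [("ps_suppkey", "s_suppkey")],
    ("orders", "customer"): [("o_custkey", "c_custkey")],
    ("customer", "nation"): [("c_nationkey", "n_nationkey")],
    ("supplier", "nation"): [("s_nationkey", "n_nationkey")],
    ("nation", "region"): [("n_regionkey", "r_regionkey")],
}


def join_predicate(src_a: str, src_b: str) -> Optional[str]:
    """Return the equi-join predicate between two Source tables, or None if no join edge exists."""
    a, b = SOURCE_TO_TPCH[src_a], SOURCE_TO_TPCH[src_b]
    if (a, b) in JOIN_KEYS:
        pairs = JOIN_KEYS[(a, b)]
    elif (b, a) in JOIN_KEYS:
        # Stored the other way round: swap so the columns of `a` come first.
        pairs = [(col_a, col_b) for col_b, col_a in JOIN_KEYS[(b, a)]]
    else:
        return None
    return " AND ".join(f"{a}.{col_a} = {b}.{col_b}" for col_a, col_b in pairs)


if __name__ == "__main__":
    print(join_predicate("Source0", "Source2"))
    print(join_predicate("Source0", "Source7"))
```

Pairs without a direct edge in this assumed graph (such as `lineitem` and `region`) return `None`, which a join-order search can use to avoid cross products.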
Join table | 100 MB dataset | 500 MB dataset | 600 MB dataset | 700 MB dataset | 800 MB dataset |
---|---|---|---|---|---|
lineitem | 71 MB | 360 MB | 433 MB | 506 MB | 579 MB |
orders | 17 MB | 82 MB | 98 MB | 115 MB | 131 MB |
partsupp | 12 MB | 57 MB | 68 MB | 80 MB | 91 MB |
customer | 2.4 MB | 12 MB | 14 MB | 17 MB | 19 MB |
part | 2.3 MB | 12 MB | 14 MB | 17 MB | 19 MB |
supplier | 140 KB | 692 KB | 828 KB | 964 KB | 1.1 MB |
region | 4.0 KB | 4.0 KB | 4.0 KB | 4.0 KB | 4.0 KB |
nation | 4.0 KB | 4.0 KB | 4.0 KB | 4.0 KB | 4.0 KB |
Table 2 Data size and distribution of TPC-H tables
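Table 2 shows that `lineitem` dominates at every dataset size, while `region` and `nation` stay fixed at 4.0 KB. As a point of reference only, a naive smallest-first heuristic can turn these sizes into a join order: start from the smallest table, then repeatedly add the smallest table that shares a join edge with those already joined, so no cross product is formed. This is not the authors' projection-and-order method; it ignores join selectivity and intermediate result sizes, and both the join graph (standard TPC-H key relationships) and the `greedy_join_order` helper are assumptions for illustration.

```python
from typing import Dict, List, Set

# Table sizes in MB for the 800 MB dataset (Table 2); 4.0 KB converted assuming 1 MB = 1024 KB.
SIZE_MB_800: Dict[str, float] = {
    "lineitem": 579, "orders": 131, "partsupp": 91, "customer": 19,
    "part": 19, "supplier": 1.1, "region": 4.0 / 1024, "nation": 4.0 / 1024,
}

# Assumption: join edges follow the standard TPC-H key/foreign-key relationships.
EDGES = [
    ("lineitem", "orders"), ("lineitem", "partsupp"), ("partsupp", "part"),
    ("partsupp", "supplier"), ("orders", "customer"), ("customer", "nation"),
    ("supplier", "nation"), ("nation", "region"),
]


def neighbors(table: str) -> Set[str]:
    """Tables that share a join edge with `table`."""
    return {b if a == table else a for a, b in EDGES if table in (a, b)}


def greedy_join_order(sizes: Dict[str, float]) -> List[str]:
    """Smallest-first greedy order restricted to connected tables (no cross products).

    Ties on size are broken alphabetically so the result is deterministic.
    """
    def rank(t: str):
        return (sizes[t], t)

    order = [min(sizes, key=rank)]
    joined = set(order)
    while len(joined) < len(sizes):
        # Candidate tables: not yet joined, but connected to something already joined.
        frontier = set().union(*(neighbors(t) for t in joined)) - joined
        nxt = min(frontier, key=rank)
        order.append(nxt)
        joined.add(nxt)
    return order


if __name__ == "__main__":
    print(greedy_join_order(SIZE_MB_800))
```

With the 800 MB sizes, the order starts from the two 4.0 KB tables (`nation` before `region` only because of the alphabetical tie-break) and joins `lineitem` last. A cost-based optimizer would instead estimate the size of each intermediate result, which is where projection of unneeded columns matters.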
Intermediate node | Data shipped at 0% redundancy | Data shipped at 9.70% redundancy |
---|---|---|
JoinNode0 | 54.4 MB | 54.4 MB |
JoinNode1 | 320 MB | 411 MB |
JoinNode2 | 205 MB | 594 MB |
JoinNode3 | 228 MB | 228 MB |
JoinNode4 | 44.7 GB | 62.6 GB |
JoinNode5 | 44.7 GB | 62.6 GB |
JoinNode6 | 44.2 GB | 44.2 GB |
Table 3 Data shipped by intermediate nodes at redundancy ratios of 0% and 9.70%
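The effect of redundancy in Table 3 can be quantified per node. The sketch below converts every entry to MB, assuming 1 GB = 1024 MB (using 1000 shifts the totals but leaves the percentages essentially unchanged), and computes the relative increase in data shipped when the redundancy ratio rises from 0% to 9.70%. `SHIPPED_MB`, `relative_increase` and `summarize` are hypothetical names.

```python
from typing import Dict, Tuple

# Assumption: Table 3 uses binary units, so 1 GB = 1024 MB.
GB_IN_MB = 1024

# Data shipped by each intermediate node, in MB, copied from Table 3:
# (redundancy ratio 0%, redundancy ratio 9.70%)
SHIPPED_MB: Dict[str, Tuple[float, float]] = {
    "JoinNode0": (54.4, 54.4),
    "JoinNode1": (320, 411),
    "JoinNode2": (205, 594),
    "JoinNode3": (228, 228),
    "JoinNode4": (44.7 * GB_IN_MB, 62.6 * GB_IN_MB),
    "JoinNode5": (44.7 * GB_IN_MB, 62.6 * GB_IN_MB),
    "JoinNode6": (44.2 * GB_IN_MB, 44.2 * GB_IN_MB),
}


def relative_increase(before: float, after: float) -> float:
    """Fractional growth from `before` to `after` (0.4 means +40%)."""
    return (after - before) / before


def summarize(shipped: Dict[str, Tuple[float, float]]):
    """Per-node relative increase plus total MB shipped at each redundancy ratio."""
    per_node = {node: relative_increase(a, b) for node, (a, b) in shipped.items()}
    total_before = sum(a for a, _ in shipped.values())
    total_after = sum(b for _, b in shipped.values())
    return per_node, total_before, total_after


if __name__ == "__main__":
    per_node, before, after = summarize(SHIPPED_MB)
    for node, inc in per_node.items():
        print(f"{node}: {inc:+.1%}")
    print(f"total: {before / GB_IN_MB:.1f} GB -> {after / GB_IN_MB:.1f} GB "
          f"({relative_increase(before, after):+.1%})")
```

Under these assumptions JoinNode2 ships about 190% more data, JoinNode4 and JoinNode5 about 40% more, and the total grows from about 134.4 GB to about 170.7 GB, roughly 27%; JoinNode0, JoinNode3 and JoinNode6 are unaffected.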