计算机科学与探索 ›› 2020, Vol. 14 ›› Issue (7): 1081-1103.DOI: 10.3778/j.issn.1673-9418.1911063
宋雨萌,谷峪,李芳芳,于戈
出版日期:
2020-07-01
发布日期:
2020-08-12
SONG Yumeng, GU Yu, LI Fangfang, YU Ge
Online:
2020-07-01
Published:
2020-08-12
摘要:
数据查询处理与优化作为数据管理中最具挑战性的问题之一,一直受到广泛关注。传统的查询处理与优化技术在实际使用中需要针对特定的工作负载和数据集进行大量的手动调优,因而已经无法满足现代数据库系统的发展需求。受人工智能(AI)成功应用于多领域研究的启发,近期人工智能赋能的查询处理与优化新技术相继被提出并取得了一定的研究成果。针对这些研究工作,首先给出了人工智能赋能的查询处理与优化技术的主要任务,分析了与传统人工智能任务的区别。其次梳理了该领域的主要研究进展,并总结了主要优势与应用瓶颈。接着讨论了当前所面临的主要技术挑战。最后对该领域的未来发展进行了展望。
宋雨萌,谷峪,李芳芳,于戈. 人工智能赋能的查询处理与优化新技术研究综述[J]. 计算机科学与探索, 2020, 14(7): 1081-1103.
SONG Yumeng, GU Yu, LI Fangfang, YU Ge. Survey on AI Powered New Techniques for Query Processing and Optimization[J]. Journal of Frontiers of Computer Science and Technology, 2020, 14(7): 1081-1103.
[1] Zhou A Y, Jin C Q, Wang G R, et al. A survey on the management of uncertain data[J]. Chinese Journal of Computers, 2009, 32(1): 1-16. 周傲英, 金澈清, 王国仁, 等. 不确定性数据管理技术研究综述[J]. 计算机学报, 2009, 32(1): 1-16. [2] Kraska T, Alizadeh M, Beutel A, et al. Sagedb: a learned data- base system[C]//Proceedings of the 9th Biennial Conference on Innovative Data Systems Research, Asilomar, Jan 13-16, 2019: 13. [3] Wang W, Zhang M, Chen G, et al. Database meets deep lear-ning: challenges and opportunities[J]. ACM SIGMOD Record, 2016, 45(2): 17-22. [4] Sun L M, Zhang S M, Ji T, et al. Survey of data management techniques powered by artificial intelligence[J]. Journal of Software, 2020, 31(3): 600-619. 孙路明, 张少敏, 姬涛, 等. 人工智能赋能的数据管理技术研究[J]. 软件学报, 2020, 31(3): 600-619. [5] Li G L, Zhou X H, Sun J, et al. A survey of machine-learning-based database techniques[J/OL]. Chinese Journal of Computers (2019-11-04) [2020-02-20]. http://kns.cnki.net/kcms/detail/11.1826.TP.20191104.1009.002.html. 李国良, 周煊赫, 孙佶, 等. 基于机器学习的数据库技术综述[J/OL]. 计算机学报(2019-11-04) [2020-02-20]. http://kns.cnki.net/kcms/detail/11.1826.TP.20191104.1009.002.html. [6] Meng X F, Ma C H, Yang C. Survey on machine learning for database systems[J]. Journal of Computer Research and Development, 2019, 56(9): 1803-1820. 孟小峰, 马超红, 杨晨. 机器学习化数据库系统研究综述[J]. 计算机研究与发展, 2019, 56(9): 1803-1820. [7] LeCun Y, Bengio Y, Hinton G. Deep learning[J]. Nature, 2015, 521(7553): 436-444. [8] Kohonen T. An introduction to neural computing[J]. Neural Networks, 1988, 1(1): 3-16. [9] Sutton R S, Barto A G. Reinforcement learning: an introduction[M]. Cambridge: MIT Press, 2018. [10] Hu B, Lu Z, Li H, et al. Convolutional neural network architectures for matching natural language sentences[C]//Proceedings of the 2014 Annual Conference on Neural Information Processing Systems, Montreal, Dec 8-13, 2014. Cambridge: MIT Press, 2014: 2042-2050. [11] Lipton Z C, Berkowitz J, Elkan C. A critical review of recurrent neural networks for sequence learning[J]. arXiv:1506. 00019, 2015. [12] Graefe G, Larson P A. B-tree indexes and CPU caches[C]//Proceedings of the 17th International Conference on Data Engineering, Heidelberg, Apr 2-6, 2001. Piscataway: IEEE, 2001: 349-358. [13] Bayer R, McCreight E. Organization and maintenance of large ordered indexes[M]//Software Pioneers. Berlin, Heidelberg: Springer, 2002: 245-262. [14] Lehman T J, Carey M J. A study of index structures for main memory database management systems[R]. University of Wisconsin-Madison. Department of Computer Sciences, 1985. [15] Bayer R. Symmetric binary B-trees: data structure and maintenance algorithms[J]. Acta Informatica, 1972, 1(4): 290-306. [16] Galakatos A, Markovitch M, Binnig C, et al. A-tree: a bounded approximate index structure[J]. arXiv:1801.10207, 2018. [17] Hadian A, Heinis T. Interpolation-friendly B-trees: bridging the gap between algorithmic and learned indexes[C]//Proceedings of the 22nd International Conference on Extending Database Technology, Lisbon, Mar 26, 2019. Berlin, Heidelberg: Springer, 2019: 710-713. [18] Kraska T, Beutel A, Chi E H, et al. The case for learned index structures[C]//Proceedings of the 2018 International Conference on Management of Data, Houston, Jun 10-15, 2018. New York: ACM, 2018: 489-504. [19] Mitzenmacher M. Optimizing learned bloom filters by sandwiching[J]. arXiv:1803.01474, 2018. [20] Oosterhuis H, Culpepper J S, De Rijke M. The potential of learned index structures for index compression[J]. arXiv:1811.06678, 2018. [21] Hadian A, Heinis T. Considerations for handling updates in learned index structures[C]//Proceedings of the 2nd International Workshop on Exploiting Artificial Intelligence Techniques for Data Management, Amsterdam, Jul 5, 2019. New York: ACM, 2019: 3. [22] Marcus R, Papaemmanouil O. Towards a hands-free query optimizer through deep learning[J]. arXiv:1809.10212, 2018. [23] Wang Z, Bapst V, Heess N, et al. Sample efficient actor-critic with experience replay[J]. arXiv:1611.01224, 2016. [24] The PostgreSQL Global Development Group. PostgreSQL: the world??s most advanced open source database[EB/OL]. [2020-03-16]. http://www.postgresql.org/. [25] Microsoft. SQL server[EB/OL]. [2020-03-16]. https://microsoft.com/sql-server/. [26] Marcus R, Papaemmanouil O. Deep reinforcement learning for join order enumeration[C]//Proceedings of the 1st International Workshop on Exploiting Artificial Intelligence Techniques for Data Management, Houston, Jun 10, 2018. New York: ACM, 2018: 3. [27] Krishnan S, Yang Z, Goldberg K, et al. Learning to optimize join queries with deep reinforcement learning[J]. arXiv: 1808.03196, 2018. [28] Trummer I, Wang J, Maram D, et al. SkinnerDB: regret-bounded query evaluation via reinforcement learning[C]//Proceedings of the 2019 International Conference on Management of Data, Amsterdam, Jun 30-Jul 5, 2019. New York: ACM, 2019: 1153-1170. [29] Kocsis L, Szepesvári C. Bandit based Monte-Carlo planning[C]//Proceedings of the 2006 European Conference on Machine Learning, Berlin, Sep 18-22, 2006. Berlin, Heidelberg: Springer, 2006: 282-293. [30] Leis V, Gubichev A, Mirchev A, et al. How good are query optimizers, really?[J]. Proceedings of the VLDB Endowment, 2015, 9(3): 204-215. [31] Akdere M, Çetintemel U, Riondato M, et al. Learning-based query performance modeling and prediction[C]//Proceedings of the 28th IEEE International Conference on Data Engineering, Washington, Apr 1-5, 2012. Piscataway: IEEE, 2012: 390-401. [32] Marcus R, Papaemmanouil O. Plan-structured deep neural network models for query performance prediction[J]. arXiv:1902.00132, 2019. [33] Idreos S, Zoumpatianos K, Hentschel B, et al. The data calculator: data structure design and cost synthesis from first principles and learned cost models[C]//Proceedings of the 2018 International Conference on Management of Data, Houston, Jun 10-15, 2018. New York: ACM, 2018: 535-550. [34] Sun J, Li G. An end-to-end learning-based cost estimator[J]. arXiv:1906.02560, 2019. [35] Karnagel T, Habich D, Lehner W. Local vs. global optimization: operator placement strategies in heterogeneous environments[C]//Proceedings of the EDBT/ICDT Workshops, Brussels, Mar 27, 2015. Berlin, Heidelberg: Springer, 2015: 48-55. [36] Leis V, Radke B, Gubichev A, et al. Cardinality estimation done right: index-based join sampling[C]//Proceedings of the 8th Biennial Conference on Innovative Data Systems Research, Chaminade, Jan 8-11, 2017: 8. [37] Kipf A, Vorona D, Müller J, et al. Estimating cardinalities with deep sketches[J]. arXiv:1904.08223, 2019. [38] Lakshmi S, Zhou S. Selectivity estimation in extensible databases—a neural network approach[C]//Proceedings of the 24th International Conference on Very Large Data Bases, New York, Aug 24-27, 1998, New York: ACM, 1998: 623-627. [39] Malik T, Burns R C, Chawla N V. A black-box approach to query cardinality estimation[C]//Proceedings of the 3rd Biennial Conference on Innovative Data Systems Research, Asilomar, Jan 7-10, 2007: 56-67. [40] Liu H, Xu M, Yu Z, et al. Cardinality estimation using neural networks[C]//Proceedings of the 25th Annual International Conference on Computer Science and Software Engineering, Markham, Nov 2-4, 2015. New York: ACM, 2015: 53-59. [41] Hasan S, Thirumuruganathan S, Augustine J, et al. Multi-attribute selectivity estimation using deep learning[J]. arXiv: 1903.09999, 2019. [42] Kipf A, Kipf T, Radke B, et al. Learned cardinalities: estimating correlated joins with deep learning[J]. arXiv:1809. 00677, 2018. [43] Woltmann L, Hartmann C, Thiele M, et al. Cardinality estimation with local deep learning models[C]//Proceedings of the 2nd International Workshop on Exploiting Artificial Intelligence Techniques for Data Management, Amsterdam, Jul 5, 2019. New York: ACM, 2019: 5. [44] Ortiz J, Balazinska M, Gehrke J, et al. Learning state representations for query optimization with deep reinforcement learning[J]. arXiv:1803.08604, 2018. [45] Germain M, Gregor K, Murray I, et al. Made: masked autoencoder for distribution estimation[C]//Proceedings of the 2015 International Conference on Machine Learning, Lille, Jul 6-11, 2015. New York: ACM, 2015: 881-889. [46] Ma Q, Triantafillou P. DBEst: revisiting approximate query processing engines with machine learning models[C]//Proceedings of the 2019 International Conference on Management of Data, Amsterdam, Jun 30 - Jul 5, 2019. New York: ACM, 2019: 1553-1570. [47] Thirumuruganathan S, Hasan S, Koudas N, et al. Approximate query processing using deep generative models[J]. arXiv:1903.10000, 2019. [48] Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: machine learning in Python[J]. Journal of Machine Learning Research, 2011, 12: 2825-2830. [49] Chen T, Guestrin C. Xgboost: a scalable tree Boosting system[C]//Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, Aug 13-17, 2016. New York: ACM, 2016: 785-794. [50] Prokhorenkova L, Gusev G, Vorobev A, et al. CatBoost: unbiased Boosting with categorical features[C]//Proceedings of the 2018 Annual Conference on Neural Information Processing Systems, Montréal, Dec 3-8, 2018. Cambridge: MIT Press, 2018: 6638-6648. [51] Ke G, Meng Q, Finley T, et al. Lightgbm: a highly efficient gradient Boosting decision tree[C]//Proceedings of the 2017 Annual Conference on Neural Information Processing Systems, Long Beach, Dec 4-9, 2017. Cambridge: MIT Press, 2017: 3146-3154. [52] Friedman J H. Stochastic gradient Boosting[J]. Computational Statistics & Data Analysis, 2002, 38(4): 367-378. [53] Mozafari B. Approximate query engines: commercial challenges and research opportunities[C]//Proceedings of the 2017 ACM International Conference on Management of Data, Chicago, May 14-19, 2017. New York: ACM, 2017: 521-524. [54] Mozafari B, Niu N. A handbook for building an approximate query engine[J]. IEEE Data Engineering Bulletin, 2015, 38(3): 3-29. [55] Grover A, Gummadi R, Lazaro-Gredilla M, et al. Variational rejection sampling[J]. arXiv:1804.01712, 2018. [56] Park Y, Mozafari B, Sorenson J, et al. Verdictdb: universalizing approximate query processing[C]//Proceedings of the 2018 International Conference on Management of Data, Houston, Jun 10-15, 2018. New York: ACM, 2018: 1461-1476. [57] Agrawal S, Chaudhuri S, Kollar L, et al. Database tuning advisor for microsoft SQL server 2005[C]//Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, Baltimore, Jun 14-16, 2005. New York: ACM, 2005: 930-932. [58] Bruno N, Chaudhuri S. An online approach to physical design tuning[C]//Proceedings of the 23rd IEEE International Conference on Data Engineering, Istanbul, Apr 15-20, 2007. Piscataway: IEEE, 2007: 826-835. [59] Chaudhuri S, Narasayya V R. An efficient, cost-driven index selection tool for Microsoft SQL server[C]//Proceedings of the 23rd International Conference on Very Large Data Bases, Athens, Aug 25-29, 1997. New York: ACM, 1997: 146-155. [60] Chaudhuri S, Narasayya V. AutoAdmin “what-if” index analysis utility[C]//Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data. New York: ACM, 1998: 367-378. [61] Chaudhuri S, Narasayya V. Self-tuning database systems: a decade of progress[C]//Proceedings of the 33rd International Conference on Very Large Data Bases, Austria, Sep 23-27, 2007. New York: ACM, 2007: 3-14. [62] Ding B, Das S, Marcus R, et al. Ai meets ai: leveraging query executions to improve index recommendations[C]//Proceedings of the 2019 International Conference on Management of Data, Amsterdam, Jun 30-Jul 5, 2019. New York: ACM, 2019: 1241-1258. [63] Vu T. Deep query optimization[C]//Proceedings of the 2019 International Conference on Management of Data, Amsterdam, Jun 30-Jul 5, 2019. New York: ACM, 2019: 1856-1858. [64] Qiu T, Wang B, Shu Z W, et al. Intelligent index tuning approach for relational databases[J]. Journal of Software, 2020, 31(3): 634-647. 邱涛, 王斌, 舒昭维, 等. 一种面向关系数据库的智能索引调优方法[J]. 软件学报, 2020, 31(3): 634-647. [65] Sun J, Li G L. An end-to-end query optimization engine based on deep learning[J]. Journal of Chifeng University(Natural Science Edition), 2019, 35(1): 1-5. 孙佶, 李国良. 一个端到端的基于深度学习的查询优化引擎[J]. 赤峰学院学报(自然科学版), 2019, 35(1): 1-5. [66] Marcus R, Negi P, Mao H, et al. Neo: a learned query optimizer[J]. arXiv:1904.03711, 2019. [67] Mou L, Li G, Zhang L, et al. Convolutional neural networks over tree structures for programming language processing[C]//Proceedings of the 30th AAAI Conference on Artificial Intelligence, Phoenix, Feb 12-17, 2016. Menlo Park: AAAI, 2016: 1287-1293. [68] Dechter R, Pearl J. Generalized best-first search strategies and the optimality of A[J]. Journal of the ACM, 1985, 32(3): 505-536. [69] Bellman R. A Markovian decision process[J]. Indiana University Mathematics Journal, 1957, 6(4): 15. [70] Xu L, Cole R L, Ting D. Learning to optimize federated queries[C]//Proceedings of the 2nd International Workshop on Exploiting Artificial Intelligence Techniques for Data Management, Amsterdam, Jul 5, 2019. New York: ACM, 2019: 1-7. [71] Cao W. Survey on automatic physical database design[J]. Application Research of Computers, 2012(5): 12-18. 曹巍. 数据库物理自调优研究技术综述[J]. 计算机应用研究, 2012(5): 12-18. [72] Lu J, Chen Y, Herodotou H, et al. Speedup your analytics: automatic parameter tuning for databases and big data systems[J]. Proceedings of the VLDB Endowment, 2019, 12(12): 1970-1973. [73] Rodd S F, Kulkarni U P. Adaptive tuning algorithm for performance tuning of database management system[J]. arXiv:1005.0972, 2010. [74] Zheng C, Ding Z, Hu J. Self-tuning performance of database systems with neural network[C]//Proceedings of the 2014 International Conference on Intelligent Computing, Taiyuan, Aug 3-6, 2014. Berlin, Heidelberg: Springer, 2014: 1-12. [75] Van Aken D, Pavlo A, Gordon G J, et al. Automatic database management system tuning through large-scale machine learning[C]//Proceedings of the 2017 ACM International Conference on Management of Data, Chicago, May 14-19, 2017. New York: ACM, 2017: 1009-1024. [76] Scikit-learn documentation—factor analysis[EB/OL]. [2019-11-28]. http://scikit-learn.org/stable/modules/generated/sklearn. decomposition.FactorAnalysis.html. [77] Scikit-learn documentation—KMeans[EB/OL]. [2019-11-28]. http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html. [78] Tibshirani R. Regression shrinkage and selection via the lasso[J]. Journal of the Royal Statistical Society: Series B (Methodological), 1996, 58(1): 267-288. [79] Rasmussen C E. Gaussian processes in machine learning[C]//LNCS 3176: Advanced Lectures on Machine Learning. Berlin, Heidelberg: Springer, 2003: 63-71. [80] Zhang J, Liu Y, Zhou K, et al. An end-to-end automatic cloud database tuning system using deep reinforcement learning[C]//Proceedings of the 2019 International Conference on Management of Data, Amsterdam, Jun 30-Jul 5, 2019. New York: ACM, 2019: 415-432. [81] Lillicrap T P, Hunt J J, Pritzel A, et al. Continuous control with deep reinforcement learning[J]. arXiv:1509.02971, 2015. [82] Li G, Zhou X, Li S, et al. QTune: a query-aware database tuning system with deep reinforcement learning[J]. Procee-dings of the VLDB Endowment, 2019, 12(12): 2118-2130. [83] Ma L, Van Aken D, Hefny A, et al. Query-based workload forecasting for self-driving database management systems[C]//Proceedings of the 2018 International Conference on Management of Data, Houston, Jun 10-15, 2018. New York: ACM, 2018: 631-645. [84] Li G L, Zhou X H. Xuan Yuan: an AI-native database systems[J]. Journal of Software, 2020, 31(3): 831-844. 李国良, 周煊赫. 轩辕: AI原生数据库系统[J]. 软件学报, 2020, 31(3): 831-844. [85] Oracle. Oracle??s autonomous database[EB/OL]. [2020-02-20]. https://www.oracle.com/database/autonomous-database. html. [86] Huawei. Huawei database & storage product launch[EB/OL]. [2020-02-20]. http://e.huawei.com/topic/database-storage-launch2019/cn/. [87] Carnegie Mellon University Database Group. Peloton database management system[EB/OL]. [2020-02-20]. http://pelotondb.org. |
[1] | 王迪聪,白晨帅,邬开俊. 基于深度学习的视频目标检测综述[J]. 计算机科学与探索, 2021, 15(9): 1563-1577. |
[2] | 张晓旭,马志强,刘志强,朱方圆,王春喻. Transformer在语音识别任务中的研究现状与展望[J]. 计算机科学与探索, 2021, 15(9): 1578-1594. |
[3] | 陈璠,彭力. 多层级重叠条纹特征融合的行人重识别[J]. 计算机科学与探索, 2021, 15(9): 1753-1761. |
[4] | 武家伟,孙艳春. 融合知识图谱和深度学习方法的问诊推荐系统[J]. 计算机科学与探索, 2021, 15(8): 1432-1440. |
[5] | 马煜,杜慧敏,毛智礼,张霞. 深度语义分割人群密度检测技术[J]. 计算机科学与探索, 2021, 15(8): 1469-1475. |
[6] | 荣欢,马廷淮. 利用收益预测与策略梯度两阶段众包评论集成[J]. 计算机科学与探索, 2021, 15(8): 1476-1489. |
[7] | 马玉琨,徐姚文,赵欣,徐涛,王泽瑞. 人脸识别系统的活体检测综述[J]. 计算机科学与探索, 2021, 15(7): 1195-1206. |
[8] | 葛轶洲,许翔,杨锁荣,周青,申富饶. 序列数据的数据增强方法综述[J]. 计算机科学与探索, 2021, 15(7): 1207-1219. |
[9] | 方钧婷,谭晓阳. 注意力级联网络的金属表面缺陷检测算法[J]. 计算机科学与探索, 2021, 15(7): 1245-1254. |
[10] | 杨悦,王士同. 随机特征映射的四层神经网络及其增量学习[J]. 计算机科学与探索, 2021, 15(7): 1265-1278. |
[11] | 田萱,丁琪,廖子慧,孙国栋. 基于深度学习的新闻推荐算法研究综述[J]. 计算机科学与探索, 2021, 15(6): 971-998. |
[12] | 能文鹏,陆军,赵彩虹. 基于关系归纳偏置的睡眠分期综述[J]. 计算机科学与探索, 2021, 15(6): 1026-1037. |
[13] | 吕昊远,俞璐,周星宇,邓祥. 半监督深度学习图像分类方法研究综述[J]. 计算机科学与探索, 2021, 15(6): 1038-1048. |
[14] | 马宇,张丽果,杜慧敏,毛智礼. 卷积神经网络的交通标志语义分割[J]. 计算机科学与探索, 2021, 15(6): 1114-1121. |
[15] | 汤凌燕,熊聪聪,王嫄,周宇博,赵子健. 基于深度学习的短文本情感倾向分析综述[J]. 计算机科学与探索, 2021, 15(5): 794-811. |
阅读次数 | ||||||
全文 |
|
|||||
摘要 |
|
|||||