计算机科学与探索 (Journal of Frontiers of Computer Science and Technology) ›› 2020, Vol. 14 ›› Issue (8): 1315-1326. DOI: 10.3778/j.issn.1673-9418.1908023

• Database Technology •

Research and Implementation of Document-Relational Data Query Execution Technology

MA Zhicheng (马志程), YUAN Haifeng (袁海峰), GU Yang (谷洋), LIU Yaru (刘亚茹), ZHANG Xiao (张孝)

  1. State Grid Gansu Electric Power Research Institute, Lanzhou 730070, China
  2. Key Laboratory of Data Engineering and Knowledge Engineering (Renmin University of China), Ministry of Education, Beijing 100872, China
  3. School of Information, Renmin University of China, Beijing 100872, China
  • Online: 2020-08-01 Published: 2020-08-07

Abstract:

With the arrival of the big data era, Internet applications generate data of many different types. Storing, querying, and organizing data of diverse structures in an integrated way is a research hotspot for big data management systems. This work manages a relational database and a NoSQL document database in a unified way: two database engines, one supporting structured data and one supporting semi-structured data, are integrated into a big data management system, and a query engine, ENTIA, is implemented to carry out query processing. ENTIA provides users with a unified query interface based on a global view, so end users need not care about the type, structure, or physical storage location of the data; they only send requests to ENTIA according to their business needs. After extensive preliminary experiments, queries are optimized with heuristic rules: a single query is rewritten into multiple sub-tasks that can be executed in parallel, and computation is pushed down to the appropriate database engine, which makes full use of the system's computing resources and greatly improves query performance. With the relational database PostgreSQL and the document database MongoDB as two representative peer engines, ENTIA's ability to query multiple data types and to optimize queries is implemented. Functional conformance experiments verify that ENTIA executes hybrid queries correctly, and several groups of performance comparison experiments demonstrate the effectiveness of the optimization method.

Key words: relational database, document database, hybrid query, query optimization
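
The rewriting step described in the abstract, where one query is split into sub-tasks that run in parallel on the engine holding each part of the data, can be illustrated with a minimal sketch. This is not ENTIA's implementation: the in-memory lists stand in for a PostgreSQL table and a MongoDB collection, and every name (`relational_subtask`, `document_subtask`, `hybrid_query`, the sample fields) is hypothetical, chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for the two peer engines (assumption:
# a real system would issue SQL to PostgreSQL and a query to MongoDB).
RELATIONAL_ROWS = [  # plays the role of a relational table
    {"user_id": 1, "name": "Ann", "city": "Lanzhou"},
    {"user_id": 2, "name": "Bo", "city": "Beijing"},
    {"user_id": 3, "name": "Cy", "city": "Beijing"},
]
DOCUMENTS = [  # plays the role of a document collection
    {"user_id": 1, "order": {"item": "meter", "amount": 120}},
    {"user_id": 2, "order": {"item": "cable", "amount": 40}},
    {"user_id": 3, "order": {"item": "relay", "amount": 300}},
]


def relational_subtask(city):
    """Sub-task pushed to the relational side: filter rows by city."""
    return [r for r in RELATIONAL_ROWS if r["city"] == city]


def document_subtask(min_amount):
    """Sub-task pushed to the document side: filter documents by amount."""
    return [d for d in DOCUMENTS if d["order"]["amount"] >= min_amount]


def hybrid_query(city, min_amount):
    """Split one global query into two sub-tasks, run them in parallel, then join."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        rel_future = pool.submit(relational_subtask, city)
        doc_future = pool.submit(document_subtask, min_amount)
        rows, docs = rel_future.result(), doc_future.result()
    # Join the already-filtered partial results on user_id.
    docs_by_user = {d["user_id"]: d for d in docs}
    return [
        {"name": r["name"], "item": docs_by_user[r["user_id"]]["order"]["item"]}
        for r in rows
        if r["user_id"] in docs_by_user
    ]


print(hybrid_query("Beijing", 100))
```

Each filter runs where its data lives, so only the reduced partial results reach the join; that is the intuition behind pushing computation to the appropriate engine, under the simplifying assumptions stated above.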