Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2020, Vol. 14, Issue 7: 1133-1141. DOI: 10.3778/j.issn.1673-9418.1908025

• Database Technology •

自然语言生成多表SQL查询语句技术研究

曹金超,黄滔,陈刚,吴晓凡,陈珂   

    1. 浙江大学 计算机科学与技术学院,杭州 310027
    2. 浙江邦盛科技有限公司,杭州 310012
    3. 浙江大学 浙江省大数据智能计算重点实验室,杭州 310027
    4. 网易(杭州)网络有限公司,杭州 310051
  • 出版日期:2020-07-01 发布日期:2020-08-12

Research on Technology of Generating Multi-table SQL Query Statement by Natural Language

CAO Jinchao, HUANG Tao, CHEN Gang, WU Xiaofan, CHEN Ke

    1. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
    2. Zhejiang Bangsun Technology Co., Ltd., Hangzhou 310012, China
    3. Key Laboratory of Big Data Intelligent Computing of Zhejiang Province, Zhejiang University, Hangzhou 310027, China
    4. Netease (Hangzhou) Network Co., Ltd., Hangzhou 310051, China
  • Online: 2020-07-01 Published: 2020-08-12

摘要:

自然语言生成SQL查询不仅是构建智能数据库查询系统的一个重要组成部分,亦是新型供电轨道交通系统混合时态大数据个性化运维的难点之一。目前利用深度学习模型的方法专注于数据库中单表SQL查询生成,无法解决数据库中多表SQL查询生成。针对这个问题,采用一种基于SQL语句模板填充的方法,将序列生成问题转化为多个分类问题,在训练深度学习模型的过程中充分利用SQL子句不同预测成分之间的依赖关系。在FROM子句的多表JOIN路径生成方面,将其建模为斯坦纳树问题,采用一种全局最优的算法来进行求解。在一个开放的文本生成SQL数据集Spider上对模型和算法进行实验验证,实验结果表明该方法能有效地提升多表SQL查询生成的查询匹配准确率。

关键词: 自然语言, SQL查询生成, 多表, 模板填充, 深度学习

Abstract:

Generating SQL (structured query language) queries from natural language is an important component of intelligent database query systems, and it is also one of the difficulties in the personalized operation and maintenance of hybrid temporal big data in new power supply rail transit systems. Existing deep learning approaches focus on generating SQL queries over a single table and cannot handle queries that span multiple tables. To address this problem, this paper adopts a SQL sketch filling method that transforms the sequence generation problem into multiple classification problems, and makes full use of the dependencies between the predicted components of SQL clauses when training the deep learning models. The multi-table JOIN path of the FROM clause is modeled as a Steiner tree problem and solved with a globally optimal algorithm. The model and algorithm are validated on Spider, an open text-to-SQL dataset. The experimental results show that the method effectively improves the query-match accuracy of multi-table SQL query generation.

Key words: natural language, SQL (structured query language) query generation, multi-table, sketch filling, deep learning
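
Illustrative sketch (not taken from the paper): the abstract states that the JOIN path of the FROM clause is modeled as a Steiner tree problem, but it does not specify the algorithm. The Python example below shows one way the idea could look, under several assumptions: tables are graph nodes, foreign-key links are undirected edges of equal weight, and the schema is small enough for exhaustive search. The table names, `FK_EDGES`, and the function `join_tree` are hypothetical. With equal weights, a tree over n tables has n - 1 joins, so the first connected set found when adding 0, 1, 2, ... bridge tables is a minimum Steiner tree.

```python
from collections import deque
from itertools import combinations


def _spanning_tree(adj, nodes):
    """BFS inside `nodes`; return tree edges, or None if `nodes` is not connected."""
    start = min(nodes)
    seen, edges, queue = {start}, [], deque([start])
    while queue:
        u = queue.popleft()
        for v in sorted(adj[u] & nodes):
            if v not in seen:
                seen.add(v)
                edges.append((u, v))
                queue.append(v)
    return edges if seen == nodes else None


def join_tree(fk_edges, required):
    """Return a minimum-size list of foreign-key links connecting `required` tables.

    Assumes every link has the same cost (an assumption of this sketch).
    Returns None if the required tables cannot be connected.
    """
    required = set(required)
    if not required:
        return []
    # Undirected adjacency map built from the foreign-key links.
    adj = {t: set() for t in required}
    for a, b in fk_edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    optional = sorted(set(adj) - required)
    # Try every choice of k extra bridge tables, smallest k first.
    for k in range(len(optional) + 1):
        for extra in combinations(optional, k):
            tree = _spanning_tree(adj, required | set(extra))
            if tree is not None:
                return tree
    return None


# Hypothetical schema: each pair is one foreign-key link between two tables.
FK_EDGES = [
    ("singer_in_concert", "singer"),
    ("singer_in_concert", "concert"),
    ("concert", "stadium"),
]

if __name__ == "__main__":
    # Tables mentioned by the SELECT/WHERE parts of a predicted query.
    for a, b in join_tree(FK_EDGES, ["singer", "stadium"]):
        print(f"JOIN {a} <-> {b}")
```

In this sketch, a query that mentions only `singer` and `stadium` pulls in `singer_in_concert` and `concert` as bridge tables, giving three joins. The exhaustive search is exponential in the number of optional tables, so it is practical only for small schemas; weighted edges would need a different exact method.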