Journal of Frontiers of Computer Science and Technology, 2022, Vol. 16, Issue (11): 2487-2504. DOI: 10.3778/j.issn.1673-9418.2204089

• Surveys and Frontiers •

Survey of Deep Learning Table-to-Text Generation

HU Kang1,2, XI Xuefeng1,2,3,+, CUI Zhiming1,2,3, ZHOU Yueyao1,2, QIU Yajin1,2

  1. School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiangsu 215000, China
    2. Suzhou Key Laboratory of Virtual Reality Intelligent Interaction and Application Technology, Suzhou, Jiangsu 215000, China
    3. Suzhou Smart City Research Institute, Suzhou, Jiangsu 215000, China
  • Received: 2022-04-06 Revised: 2022-05-30 Online: 2022-11-01 Published: 2022-11-16
  • About author: HU Kang, born in 1998, a native of Luzhou, Sichuan, M.S. candidate. His research interests include natural language processing and text generation.
    XI Xuefeng, born in 1978, a native of Suzhou, Jiangsu, Ph.D., associate professor. His research interests include natural language processing, machine learning and software engineering.
    CUI Zhiming, born in 1961, a native of Shanghai, Ph.D., professor. His research interests include data mining and machine learning.
    ZHOU Yueyao, born in 1997, a native of Suzhou, Jiangsu, M.S. candidate. His research interest is natural language processing.
    QIU Yajin, born in 1995, a native of Huai'an, Jiangsu, M.S. candidate. His research interest is natural language processing.
  • Corresponding author: + E-mail: xfxi2009@qq.com
  • Supported by:
    National Natural Science Foundation of China (61876217); National Natural Science Foundation of China (62176175); "Six Talent Peaks" High-Level Talent Project of Jiangsu Province (XYDXX-086); Science and Technology Development Project of Suzhou (SGC2021078)

Abstract:

Text generation is an active area of natural language processing. As the capacity to collect information grows, more and more structured data, such as tables, is being gathered. Coping with information overload, understanding what a table means, and describing its content are important problems for artificial intelligence, which has given rise to the task of table-to-text generation. In table-to-text generation, a language model takes table data as input and generates a corresponding text description of the table. The generated description should be fluent, fully express the information in the table, and not deviate from the facts in the table. This paper first describes the background of the table-to-text generation task and defines it in detail, analyzes the main difficulties of the task, and introduces the mainstream research methods. Table-to-text generation involves two major questions: what to describe and how to describe it. This paper reviews the methods that researchers have proposed to address these two questions and summarizes the characteristics, advantages, and disadvantages of the proposed models. The performance of these models on the mainstream datasets is compared and analyzed, and the models are also grouped by model type for a horizontal comparison. This paper also introduces the evaluation methods commonly used in table-to-text generation and summarizes the characteristics, advantages, and disadvantages of each. Finally, it discusses future development trends for the table-to-text generation task.

Key words: natural language processing, text generation, structured data, table-to-text generation
