Journal of Frontiers of Computer Science and Technology

• Academic Research •

Chinese Text Summarization with Knowledge Distillation

CUI Jian,  WANG Yongwei,  LI Feiyang,  LI Qiang,  SU Beirong,  ZHANG Xiaojian   

  1. College of Graduate Studies, PLA Strategic Support Force Information Engineering University, Zhengzhou 450000, China

Abstract: Text summarization is one of the main research directions in natural language processing. To address the weak semantic extraction ability of current Chinese summarization models, the unstable generation quality of large models, and their high deployment resource requirements, a Chinese text summarization method incorporating knowledge distillation is proposed. First, the training data are augmented by invoking the large model's API with multiple threads, and prompt engineering is introduced to align summary quality and generate reference labels. Then, exploiting the strength of distillation in knowledge transfer, an offline knowledge distillation scheme passes the large model's outputs to the student network as knowledge, improving the accuracy and readability of the student network's summaries while reducing training cost and resource consumption. Finally, the student network is improved: a copy mechanism is used to mitigate the out-of-vocabulary (OOV) problem, and a bidirectional similarity computation over the outputs is used to optimize the loss function, further improving model stability. Experiments on the NLPCC2017 dataset show that the proposed method outperforms existing mainstream summarization methods in overall performance on the ROUGE metrics. In terms of summary quality, the proposed method improves the accuracy and fluency of Chinese summarization; in terms of deployment requirements, it has a small number of parameters and supports lightweight, low-overhead offline deployment.
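The abstract describes an offline, sequence-level distillation setup in which the large model's summaries serve as reference labels for the student network, and the training loss is further optimized with a bidirectional similarity computed over the outputs. The sketch below illustrates one plausible form of such a combined objective in PyTorch; the function names, the greedy-matching similarity computed in both directions, and the weighting factor lam are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F


def bidirectional_similarity(gen_emb, ref_emb):
    """Greedy-matching cosine similarity computed in both directions (an assumption
    about what "bidirectional similarity" means here, in the spirit of BERTScore).

    gen_emb: (m, d) token embeddings of the student-generated summary.
    ref_emb: (n, d) token embeddings of the teacher (reference) summary.
    Returns a scalar; higher means the two summaries agree more closely.
    """
    gen = F.normalize(gen_emb, dim=-1)
    ref = F.normalize(ref_emb, dim=-1)
    sim = gen @ ref.t()                         # (m, n) pairwise cosine similarities
    gen_to_ref = sim.max(dim=1).values.mean()   # each generated token vs. the reference
    ref_to_gen = sim.max(dim=0).values.mean()   # each reference token vs. the generation
    return 0.5 * (gen_to_ref + ref_to_gen)


def distillation_loss(student_logits, labels, gen_emb, ref_emb, lam=0.1, pad_id=0):
    """Cross-entropy against teacher-written reference labels (offline distillation)
    plus a bidirectional-similarity regularizer; lam and pad_id are illustrative."""
    ce = F.cross_entropy(
        student_logits.reshape(-1, student_logits.size(-1)),
        labels.reshape(-1),
        ignore_index=pad_id,
    )
    sim = bidirectional_similarity(gen_emb, ref_emb)
    return ce + lam * (1.0 - sim)               # penalize low mutual similarity
```

Under this reading, the teacher's API-generated summaries act as the hard targets of the cross-entropy term, while the similarity term pulls the student's generated summary toward the reference in both matching directions.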

Key words: text summarization, knowledge distillation, prompt engineering, large language model