Journal of Frontiers of Computer Science and Technology ›› 2024, Vol. 18 ›› Issue (9): 2349-2360. DOI: 10.3778/j.issn.1673-9418.2406023

• Special Topic on Construction and Application of Vertical-Domain Large Models •

Research on Science and Technology Policy and Regulation Q&A System Driven by Large Models

XIANG Xiaowei, SHEN Yanguang, HU Minghao, YAN Tianwei, LUO Wei, LUO Zhunchen

  1. School of Information and Electrical Engineering, Hebei University of Engineering, Handan, Hebei 056038, China
  2. Military Science Information Research Center, Academy of Military Sciences, Beijing 100142, China
  3. College of Computer, National University of Defense Technology, Changsha 410037, China
  • Online: 2024-09-01 Published: 2024-09-01

Abstract: Question-and-answer (Q&A) systems for science and technology (S&T) policies and regulations play a key role in helping the public understand and apply these regulations. Large language models (LLMs) can significantly improve the accuracy and efficiency of such systems. However, LLM-based S&T policy and regulation Q&A systems still face two problems. First, large-scale, high-quality S&T policy and regulation Q&A datasets are lacking, and existing methods for automatically constructing large-scale datasets fall short in citing and integrating policy and regulation knowledge. Second, when handling S&T policy and regulation questions, Q&A systems lack professionalism and accuracy, and the models' knowledge updates lag behind. To address these problems, this paper proposes a retrieval-augmented self-prompting method for constructing Q&A datasets and uses it to build a large-scale, high-quality S&T policy and regulation Q&A dataset. It also builds an S&T policy and regulation Q&A system that combines an LLM fine-tuned with low-rank adaptation (LoRA) with an S&T policy and regulation knowledge base, and applies prompt learning to guide the system toward accurate answers. Experimental results show that the constructed dataset cites and integrates S&T policy and regulation knowledge significantly better than datasets built with traditional methods, and that the proposed Q&A system clearly outperforms Q&A systems driven by general-purpose LLMs on all evaluation metrics.

Key words: large language model, question-and-answer dataset, low-rank adaptation fine-tuning, prompt learning, science and technology policy and regulation, question-and-answer system