Journal of Frontiers of Computer Science and Technology

• Academic Research •


A stable-output method for retrieval-augmented large language models in private question-answering systems

LI Boxin   

  1. Xiaomi AI Lab, Beijing 100085, China
  2. Chinese Information Processing Laboratory, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China

Abstract: Question-answering systems built on large language models (LLMs) suffer from the semantic inconsistency of LLMs, which manifests as unstable outputs; this instability undermines the safety, robustness, and credibility of a question-answering system and severely degrades the user experience. To address this issue, this paper proposes a stable-output method for retrieval-augmented LLMs in private question-answering systems. The method optimizes the prompt so that the LLM first outputs num_k synonymous rewrites of the user's query and only then outputs the final answer; when generating the answer, the LLM can condition on the num_k synonymous queries it has already produced, which makes its outputs more stable. To handle problems caused by the weak instruction-following ability of open-source LLMs, such as an unstable number of generated synonymous queries and output formats that cannot be parsed, this paper uses data distillation with a closed-source LLM to automatically construct an open-domain retrieval-augmented instruction dataset, on which an open-source LLM is then fine-tuned. In addition, an evaluation set for a private question-answering scenario is built to validate the effectiveness of the method. Experimental results on this evaluation set show that the proposed method significantly outperforms the baseline on both consistency and performance metrics. In particular, compared with the baseline, the consistency metrics ROUGE-1, ROUGE-2, ROUGE-L, and BLEU improve by 18.9, 30.1, 24.5, and 30.6 respectively, and accuracy improves by 17.4%.
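A minimal sketch of the prompt structure the abstract describes (rewrites first, answer second). The template wording, the JSON output format, the NUM_K value, and both helper functions are illustrative assumptions of this sketch, not the paper's actual prompt:

```python
import json

NUM_K = 3  # illustrative value; num_k is a tunable parameter of the method

# Assumed template: the abstract specifies the structure (synonymous rewrites
# first, answer second) but not the exact wording or output format.
PROMPT_TEMPLATE = """You are answering a question over the retrieved passages below.

Passages:
{passages}

User query: {query}

First, write {num_k} synonymous rewrites of the user query.
Then, referring to those rewrites, answer the query.
Reply strictly as JSON: {{"synonymous_queries": [...], "answer": "..."}}"""


def build_prompt(query: str, passages: list[str], num_k: int = NUM_K) -> str:
    """Fill the retrieval-augmented prompt with the query and retrieved passages."""
    joined = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return PROMPT_TEMPLATE.format(passages=joined, query=query, num_k=num_k)


def parse_response(raw: str, num_k: int = NUM_K) -> dict:
    """Parse the model reply; reject it when the format or rewrite count is wrong.

    These two checks mirror the failure modes the paper fine-tunes away:
    an unstable number of synonymous queries and unparseable output formats.
    """
    data = json.loads(raw)  # raises JSONDecodeError (a ValueError) on bad format
    if len(data.get("synonymous_queries", [])) != num_k:
        raise ValueError("unexpected number of synonymous queries")
    if "answer" not in data:
        raise ValueError("missing answer field")
    return data
```

Generating the rewrites before the answer forces the answer tokens to be conditioned on a paraphrase context of the query, which is the mechanism the abstract credits for the more stable outputs.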

Key words: large language models, retrieval-augmented generation, stability of large language models, question-answering systems
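The consistency metrics above compare outputs across repeated runs of the same query. A sketch under the assumption of average pairwise ROUGE-L F1; the rouge-score package and the aggregation scheme are choices of this sketch, not specified by the abstract:

```python
from itertools import combinations

from rouge_score import rouge_scorer  # pip install rouge-score


def consistency_rouge_l(outputs: list[str]) -> float:
    """Average pairwise ROUGE-L F1 over repeated outputs for one query.

    Higher means repeated runs agree more, i.e. the system is more stable.
    Pairwise averaging is an assumption of this sketch; the abstract does
    not state the exact aggregation scheme.
    """
    assert len(outputs) >= 2, "need at least two runs to measure consistency"
    scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
    scores = [scorer.score(a, b)["rougeL"].fmeasure
              for a, b in combinations(outputs, 2)]
    return sum(scores) / len(scores)


# Example: three answers from repeated runs of the same user query.
runs = [
    "The warranty covers repairs for two years.",
    "Repairs are covered by a two-year warranty.",
    "The product warranty lasts two years and covers repairs.",
]
print(f"ROUGE-L consistency: {consistency_rouge_l(runs):.3f}")
```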