Journal of Frontiers of Computer Science and Technology ›› 2025, Vol. 19 ›› Issue (1): 132-140. DOI: 10.3778/j.issn.1673-9418.2406060

• Constructions and Applications of Large Language Models •

Method of Retrieval-Augmented Large Language Models with Stable Outputs for Private Question-Answering Systems

LI Boxin   

  1. Xiaomi AI Lab, Beijing 100085, China
  2. Chinese Information Processing Laboratory, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
  • Online: 2025-01-01  Published: 2024-12-31


Abstract: Question-answering systems built on large language models (LLMs) suffer from the semantic inconsistency of LLMs, which manifests as unstable outputs: semantically equivalent queries can yield different answers. This instability undermines the safety, robustness, and credibility of a question-answering system and severely degrades the user experience. To address this issue, this paper proposes a retrieval-augmented method that stabilizes LLM outputs for private question-answering systems. The method optimizes the prompt so that the LLM first outputs num_k synonymous variants of the user's query and only then outputs the final answer; because the answer is generated after, and conditioned on, the num_k synonymous queries, the LLM's output becomes more stable. Open-source LLMs, whose instruction-following ability is comparatively weak, tend to generate an unstable number of synonymous queries and output formats that cannot be parsed. To tackle these issues, this paper uses data distillation with a closed-source LLM to automatically construct an open-domain retrieval-augmented instruction dataset, and then fine-tunes an open-source LLM on this dataset. In addition, an evaluation dataset is built for a private question-answering scenario to validate the effectiveness of the proposed method. Experimental results on this evaluation dataset show that the proposed method significantly outperforms the baseline on both consistency and performance metrics: the consistency metrics ROUGE-1, ROUGE-2, ROUGE-L, and BLEU improve by 18.9, 30.1, 24.5, and 30.6 percentage points respectively, and the performance metric accuracy improves by 17.4 percentage points.
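The abstract describes a two-step prompt: the model first emits num_k synonymous rewrites of the user query and only then the final answer, conditioned on those rewrites. Below is a minimal Python sketch of that idea; the prompt wording, the Q1:/Answer: output labels, and the build_prompt and parse_output helpers are illustrative assumptions, since the paper's exact template and parser are not given in the abstract.

    import re

    NUM_K = 3  # assumed value; the paper treats num_k as a tunable parameter

    # Hypothetical prompt template: synonymous rewrites first, answer last.
    PROMPT_TEMPLATE = (
        "Reference passages retrieved from the private knowledge base:\n"
        "{passages}\n\n"
        "User query: {query}\n\n"
        "Step 1: Rewrite the user query as {num_k} synonymous queries, "
        "one per line, labelled Q1: through Q{num_k}:.\n"
        "Step 2: On a final line beginning with 'Answer:', answer the user "
        "query using only the passages and the rewrites above.\n"
    )

    def build_prompt(query, passages, num_k=NUM_K):
        # Number the retrieved passages so the model can cite them implicitly.
        joined = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
        return PROMPT_TEMPLATE.format(passages=joined, query=query, num_k=num_k)

    def parse_output(text, num_k=NUM_K):
        """Return (synonymous_queries, answer), or raise ValueError when the
        model breaks the requested format -- the instability the paper's
        instruction fine-tuning is meant to remove."""
        queries = re.findall(r"^Q\d+:\s*(.+)$", text, flags=re.MULTILINE)
        answer = re.search(r"^Answer:\s*(.+)\Z", text.strip(),
                           flags=re.MULTILINE | re.DOTALL)
        if len(queries) != num_k or answer is None:
            raise ValueError("output does not follow the two-step format")
        return queries, answer.group(1).strip()

A parse failure on a raw open-source model's output corresponds to the "unstable number of synonymous queries, unparseable format" failure mode the abstract mentions; fine-tuning on the distilled instruction dataset is what makes parse_output succeed reliably.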

Key words: large language models, retrieval-augmented generation, stability of large language models, question-answering systems
