Journal of Frontiers of Computer Science and Technology ›› 2025, Vol. 19 ›› Issue (8): 2161-2173. DOI: 10.3778/j.issn.1673-9418.2412085

• Artificial Intelligence · Pattern Recognition •

Research on Case Information Element Extraction Method Based on Instruction Fine-Tuning of Large Language Models

WANG Jintao, MENG Qixiang, GAO Zhilin, BU Fanliang

  1. College of Information and Cyber Security, People's Public Security University of China, Beijing 100038, China
  • Online: 2025-08-01 Published: 2025-07-31

Abstract: With the rapid development of artificial intelligence, the "technology-driven policing" strategy has become an important means of modernizing public security work. Against this background, public security organs must process large volumes of unstructured case text, and traditional manual processing can no longer meet current requirements. Large language models, an emerging artificial intelligence technology with strong language understanding and generation capabilities, can automatically extract key information elements from case texts, such as the persons involved, time, location, and nature of the case, and thereby support case analysis, evidence collection, and decision-making. This paper therefore studies a case information element extraction method based on instruction fine-tuning of large language models, with the aim of improving the efficiency and accuracy of case information processing in public security organs and further advancing the informatization of public security work. The method improves the information extraction capability of large language models by combining the parameter-efficient fine-tuning technique LoRA, instruction fine-tuning, data augmentation, and in-context learning. Experimental results show that the method achieves a significant performance improvement on a self-built case text dataset, with extraction accuracy and recall both exceeding those of traditional methods.

Key words: large language model, information extraction, instruction fine-tuning, police affairs, named entity recognition
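
The abstract names instruction fine-tuning and in-context learning but does not give the prompt template, element schema, or scoring procedure. The sketch below is only an illustration of what an instruction-style training record with optional in-context demonstrations might look like, together with an exact-match scorer. The schema `ELEMENT_TYPES`, the functions `build_instruction_sample` and `precision_recall`, the prompt wording, and the sample data are all hypothetical and are not taken from the paper; treating the reported accuracy as precision over (type, value) pairs is likewise an assumption.

```python
import json

# Hypothetical element schema; the paper's actual label set is not stated in the abstract.
ELEMENT_TYPES = ["person", "time", "location", "case_nature"]


def build_instruction_sample(text, elements, demonstrations=()):
    """Build one instruction-tuning record with instruction / input / output fields.

    `demonstrations` is an optional sequence of (text, elements) pairs that are
    prepended to the input as worked examples (in-context learning).
    """
    instruction = (
        "Extract the following elements from the case text and answer in JSON: "
        + ", ".join(ELEMENT_TYPES)
    )
    demo_block = "".join(
        f"Text: {d_text}\nAnswer: {json.dumps(d_elems, ensure_ascii=False)}\n\n"
        for d_text, d_elems in demonstrations
    )
    return {
        "instruction": instruction,
        "input": f"{demo_block}Text: {text}\nAnswer:",
        "output": json.dumps(elements, ensure_ascii=False),
    }


def precision_recall(predicted, gold):
    """Exact-match precision and recall over (element type, value) pairs."""
    pred_pairs = {(k, v) for k, values in predicted.items() for v in values}
    gold_pairs = {(k, v) for k, values in gold.items() for v in values}
    hits = len(pred_pairs & gold_pairs)
    precision = hits / len(pred_pairs) if pred_pairs else 0.0
    recall = hits / len(gold_pairs) if gold_pairs else 0.0
    return precision, recall


if __name__ == "__main__":
    # Invented example data, for illustration only.
    gold = {
        "person": ["Zhang San"],
        "time": ["2024-03-01"],
        "location": ["Beijing"],
        "case_nature": ["theft"],
    }
    sample = build_instruction_sample(
        "On 2024-03-01 Zhang San was reported for theft in Beijing.", gold
    )
    print(json.dumps(sample, ensure_ascii=False, indent=2))

    predicted = {
        "person": ["Zhang San"],
        "time": ["2024-03-01"],
        "location": [],
        "case_nature": ["fraud"],
    }
    print(precision_recall(predicted, gold))
```

Records of this shape could then be fed to any supervised fine-tuning pipeline, with LoRA adapters limiting the number of trainable parameters. Which base model, adapter rank, and augmentation strategy the authors used cannot be determined from the abstract alone.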