Journal of Frontiers of Computer Science and Technology (计算机科学与探索) ›› 2025, Vol. 19 ›› Issue (12): 3353-3367. DOI: 10.3778/j.issn.1673-9418.2507017

• Artificial Intelligence · Pattern Recognition •


Imbalanced Instruction Filtering Strategy for Fine-Tuning Audit Large Language Models

HUANG Jiajia, ZHU Haoran, JIANG Maowei, CHEN Yong, XU Chao   

  1. School of Computer Science, Nanjing Audit University, Nanjing 211815, China
    2. Jiangsu Provincial Engineering Research Center for Integrated Application Technology of Modern Intelligent Auditing, Nanjing Audit University, Nanjing 211815, China
  • Online: 2025-12-01  Published: 2025-12-01



Abstract: In instruction fine-tuning of vertical-domain large language models (LLMs) for regulatory applications such as legal consultation and audit judgment, multi-task instruction datasets suffer from an imbalance between high-resource and low-resource task instruction data. Existing instruction filtering strategies often neglect task synergies and domain-specific requirements. To address these problems, this paper proposes a phased imbalanced instruction filtering strategy (IIFS) that selects a high-quality instruction subset from an imbalanced multi-task instruction dataset for model fine-tuning. IIFS systematically evaluates high-resource task instruction data along three dimensions: instruction redundancy, fine-tuning necessity, and task relevance. Firstly, it filters redundant data via dynamic clustering. Secondly, it evaluates the target model's response to each instruction via text similarity to determine whether fine-tuning on that instruction is necessary. Thirdly, it quantifies the semantic relationship between high-resource task instructions and core tasks to measure task relevance. Finally, an adaptive dynamic sampling mechanism that integrates quality scores with cluster size selects a high-quality instruction subset for fine-tuning. With IIFS sampling, the ratio of high-resource to low-resource task data drops from 11.5∶1 to 2.8∶1 and the redundancy rate falls by 75.61%, mitigating data imbalance while preserving domain adaptability. Experiments on evaluation datasets demonstrate that LLMs fine-tuned on the IIFS-selected subset outperform models fine-tuned on the full instruction set by 3.57 percentage points overall; notably, F1-score on the audit case classification task improves by 4.84 percentage points. This work provides an economical and efficient automated solution for industrial-scale LLM instruction fine-tuning in vertical domains.
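The four-stage pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustrative sketch only: the function names, the cosine-threshold clustering, the equal score weights, and the square-root cluster-size damping are all assumptions for illustration, not the paper's actual formulas.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def greedy_cluster(embs, threshold=0.9):
    """Stage 1 (redundancy): group instructions whose embeddings are
    near-duplicates of a cluster's first member."""
    clusters = []
    for i, e in enumerate(embs):
        for c in clusters:
            if cosine(e, embs[c[0]]) >= threshold:
                c.append(i)
                break
        else:
            clusters.append([i])
    return clusters

def quality_score(necessity, relevance, w_nec=0.5, w_rel=0.5):
    """Stages 2-3 folded into one score (weights are assumptions).
    necessity: how poorly the target model already answers, e.g.
    1 - similarity(model response, reference answer);
    relevance: similarity(instruction, core-task semantics)."""
    return w_nec * necessity + w_rel * relevance

def adaptive_sample(clusters, scores, budget):
    """Stage 4: per-cluster quota damped by sqrt(cluster size), so large
    (high-resource) clusters are down-sampled; keep top-scoring items."""
    total = sum(math.sqrt(len(c)) for c in clusters)
    selected = []
    for c in clusters:
        quota = max(1, round(budget * math.sqrt(len(c)) / total))
        ranked = sorted(c, key=lambda i: scores[i], reverse=True)
        selected.extend(ranked[:quota])
    return selected[:budget]

# Toy run: four instruction embeddings, the first two nearly identical.
embs = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [1.0, 1.0]]
clusters = greedy_cluster(embs, threshold=0.95)   # [[0, 1], [2], [3]]
scores = [quality_score(n, r) for n, r in
          [(0.9, 0.9), (0.2, 0.8), (0.7, 0.9), (0.1, 0.3)]]
subset = adaptive_sample(clusters, scores, budget=2)  # keeps 0 and 2
```

The sub-linear (square-root) quota is one simple way to obtain the behavior the abstract reports: redundant high-resource clusters are trimmed hard while every small, low-resource cluster keeps at least one representative, which is what moves the high-to-low-resource ratio toward balance.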

Key words: large language models, multi-task instruction fine-tuning, imbalanced instruction dataset, instruction filtering strategy