
Journal of Frontiers of Computer Science and Technology ›› 2025, Vol. 19 ›› Issue (12): 3131-3152. DOI: 10.3778/j.issn.1673-9418.2502027
BA Zezhi, ZHANG Hui, XIE Zhenghan, ZUO Xiaodong, HOU Jianwei
Online: 2025-12-01
Published: 2025-12-01
Abstract: Prompt engineering grounded in prompt learning is essential for improving the technical accessibility of large language models and for accelerating their diffusion and application development. Traditional prompt engineering relies heavily on the prompt designer's domain knowledge and hands-on experience, and it struggles with tasks whose prompt spaces are large. In contrast, automatic prompt engineering can generate or optimize prompts automatically or semi-automatically, exploring large-scale prompt combinations and improving the stability of prompt generation through automatic optimization techniques. However, a systematic review of research on automatic prompting is still lacking. This survey therefore tracks the latest research in the field, organizes and reviews the implementation forms of automatic prompt engineering in detail, and proposes future research directions. Based on how these implementation forms trade off along the two dimensions of logical reasoning and efficiency orientation, they are divided into chain-of-thought-based automatic prompt engineering, automatic prompt engineering based on machine-learning-like models, automatic prompt engineering based on evolutionary algorithms, and plug-and-play systems using pre-trained packages. Automatic prompt engineering techniques are comprehensively evaluated, a theoretical framework explaining how they work is constructed, and the applicability and limitations of each implementation form are assessed. Finally, the development trends of automatic prompt engineering in multimodal large models, strong reasoning models, and agents are discussed.
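To make the evolutionary-algorithm category of the taxonomy concrete, the following is a minimal illustrative sketch, not code from any surveyed system: it evolves a small population of instruction prompts through selection, crossover, and mutation. The `score_prompt` and `mutate_prompt` helpers here are hypothetical stand-ins; in EvoPrompt-style methods, fitness would come from evaluating an LLM on a labeled development set, and mutation and crossover are typically performed by the LLM itself.

```python
# Illustrative sketch of an evolutionary-algorithm-style automatic prompt
# optimization loop. All helpers are toy placeholders for illustration only.
import random

CANDIDATE_INSTRUCTIONS = [
    "Answer the question.",
    "Let's think step by step, then answer.",
    "Explain your reasoning briefly before giving the final answer.",
]

MUTATION_SNIPPETS = [
    "Be concise.",
    "Show intermediate steps.",
    "Double-check the final answer.",
]

def score_prompt(prompt: str) -> float:
    """Hypothetical fitness function: a real system would run the LLM with
    this prompt on a development set and return task accuracy."""
    # Toy heuristic purely for illustration: reward reasoning-oriented wording.
    return sum(kw in prompt.lower() for kw in ("step", "reason", "check")) + random.random() * 0.1

def mutate_prompt(prompt: str) -> str:
    """Hypothetical mutation operator: append an instruction fragment.
    In LLM-driven variants, the model itself rewrites the prompt."""
    return prompt + " " + random.choice(MUTATION_SNIPPETS)

def crossover(parent_a: str, parent_b: str) -> str:
    """Simple crossover: splice the first half of one prompt with the second half of the other."""
    return (parent_a[: len(parent_a) // 2] + parent_b[len(parent_b) // 2 :]).strip()

def evolve_prompts(population: list[str], generations: int = 5, population_size: int = 6) -> str:
    """Iteratively select, recombine, and mutate prompts, keeping the fittest."""
    for _ in range(generations):
        scored = sorted(population, key=score_prompt, reverse=True)
        parents = scored[: max(2, population_size // 2)]  # truncation selection
        children = [mutate_prompt(crossover(*random.sample(parents, 2)))
                    for _ in range(population_size - len(parents))]
        population = parents + children
    return max(population, key=score_prompt)

if __name__ == "__main__":
    print("Best prompt found:", evolve_prompts(CANDIDATE_INSTRUCTIONS))
```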
BA Zezhi, ZHANG Hui, XIE Zhenghan, ZUO Xiaodong, HOU Jianwei. Automatic Prompt Engineering Technology for Large Language Models: a Survey[J]. Journal of Frontiers of Computer Science and Technology, 2025, 19(12): 3131-3152.