
Journal of Frontiers of Computer Science and Technology ›› 2025, Vol. 19 ›› Issue (12): 3131-3152. DOI: 10.3778/j.issn.1673-9418.2502027
BA Zezhi, ZHANG Hui, XIE Zhenghan, ZUO Xiaodong, HOU Jianwei
Online: 2025-12-01
Published: 2025-12-01
Abstract: Prompt engineering grounded in prompt learning is essential for improving the technical accessibility of large language models and for accelerating their diffusion and application development. Traditional prompt engineering relies heavily on the prompt designer's domain knowledge and hands-on experience, and it struggles with tasks whose prompt spaces are large. In contrast, automatic prompt engineering can generate or optimize prompts automatically or semi-automatically, exploring large-scale prompt combinations and improving the stability of prompt generation through automatic optimization techniques. However, a systematic review of research on automatic prompting is still lacking. This survey therefore tracks the latest research in the field, organizes and reviews the implementation forms of automatic prompt engineering in detail, and proposes future research directions. Based on how these implementation forms trade off along the two dimensions of logical reasoning and efficiency orientation, they are divided into chain-of-thought-based automatic prompt engineering, automatic prompt engineering based on machine-learning-like models, automatic prompt engineering based on evolutionary algorithms, and plug-and-play systems using pre-trained packages. Automatic prompt engineering techniques are comprehensively evaluated, a theoretical framework explaining how they work is constructed, and the applicability and limitations of each implementation form are assessed. Finally, the development trends of automatic prompt engineering in multimodal large models, strong reasoning models, and agents are discussed.
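To make the evolutionary-algorithm category of the taxonomy concrete, the following is a minimal illustrative sketch, not code from any surveyed system: it evolves a small population of instruction prompts through selection, crossover, and mutation. The `score_prompt` and `mutate_prompt` helpers here are hypothetical stand-ins; in EvoPrompt-style methods, fitness would come from evaluating an LLM on a labeled development set, and mutation and crossover are typically performed by the LLM itself.

```python
# Illustrative sketch of an evolutionary-algorithm-style automatic prompt
# optimization loop. All helpers are toy placeholders for illustration only.
import random

CANDIDATE_INSTRUCTIONS = [
    "Answer the question.",
    "Let's think step by step, then answer.",
    "Explain your reasoning briefly before giving the final answer.",
]

MUTATION_SNIPPETS = [
    "Be concise.",
    "Show intermediate steps.",
    "Double-check the final answer.",
]

def score_prompt(prompt: str) -> float:
    """Hypothetical fitness function: a real system would run the LLM with
    this prompt on a development set and return task accuracy."""
    # Toy heuristic purely for illustration: reward reasoning-oriented wording.
    return sum(kw in prompt.lower() for kw in ("step", "reason", "check")) + random.random() * 0.1

def mutate_prompt(prompt: str) -> str:
    """Hypothetical mutation operator: append an instruction fragment.
    In LLM-driven variants, the model itself rewrites the prompt."""
    return prompt + " " + random.choice(MUTATION_SNIPPETS)

def crossover(parent_a: str, parent_b: str) -> str:
    """Simple crossover: splice the first half of one prompt with the second half of the other."""
    return (parent_a[: len(parent_a) // 2] + parent_b[len(parent_b) // 2 :]).strip()

def evolve_prompts(population: list[str], generations: int = 5, population_size: int = 6) -> str:
    """Iteratively select, recombine, and mutate prompts, keeping the fittest."""
    for _ in range(generations):
        scored = sorted(population, key=score_prompt, reverse=True)
        parents = scored[: max(2, population_size // 2)]  # truncation selection
        children = [mutate_prompt(crossover(*random.sample(parents, 2)))
                    for _ in range(population_size - len(parents))]
        population = parents + children
    return max(population, key=score_prompt)

if __name__ == "__main__":
    print("Best prompt found:", evolve_prompts(CANDIDATE_INSTRUCTIONS))
```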
BA Zezhi, ZHANG Hui, XIE Zhenghan, ZUO Xiaodong, HOU Jianwei. Automatic Prompt Engineering Technology for Large Language Models: a Survey[J]. Journal of Frontiers of Computer Science and Technology, 2025, 19(12): 3131-3152.