[1] ZHAO W X, ZHOU K, LI J, et al. A survey of large language models[EB/OL]. [2024-08-18]. https://arxiv.org/abs/2303.18223.
[2] RADFORD A, NARASIMHAN K, SALIMANS T, et al. Improving language understanding by generative pre-training[EB/OL]. [2024-08-18]. https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf.
[3] RADFORD A, WU J, CHILD R, et al. Language models are unsupervised multitask learners[J]. OpenAI Blog, 2019, 1(8): 9.
[4] BROWN T, MANN B, RYDER N, et al. Language models are few-shot learners[C]//Advances in Neural Information Processing Systems 33, 2020: 1877-1901.
[5] SHAHNAZARYAN L, BELOUCIF M. Defining boundaries: the impact of domain specification on cross-language and cross-domain transfer in machine translation[EB/OL]. [2024-08-18]. https://arxiv.org/abs/2408.11926.
[6] MAN Z, HUANG Z, ZHANG Y, et al. WDSRL: multi-domain neural machine translation with word-level domain-sensitive representation learning[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023, 32: 577-590.
[7] KOEHN P, KNOWLES R. Six challenges for neural machine translation[EB/OL]. [2024-08-18]. https://arxiv.org/abs/1706.03872.
[8] RAFFEL C, SHAZEER N, ROBERTS A, et al. Exploring the limits of transfer learning with a unified text-to-text transformer[J]. Journal of Machine Learning Research, 2020, 21(1): 5485-5551.
[9] GURURANGAN S, MARASOVIĆ A, SWAYAMDIPTA S, et al. Don't stop pretraining: adapt language models to domains and tasks[EB/OL]. [2024-08-18]. https://arxiv.org/abs/2004.10964.
[10] 李亚超, 熊德意, 张民. 神经机器翻译综述[J]. 计算机学报, 2018, 41(12): 2734-2755.
LI Y C, XIONG D Y, ZHANG M. A survey of neural machine translation[J]. Chinese Journal of Computers, 2018, 41(12): 2734-2755.
[11] 袁小于. 基于规则的机器翻译技术综述[J]. 重庆文理学院学报(自然科学版), 2011(3): 56-59.
YUAN X Y. Rule-based machine translation technology review[J]. Journal of Chongqing University of Arts and Sciences (Natural Science Edition), 2011(3): 56-59.
[12] 刘占一, 李生, 刘挺, 等. 利用统计搭配模型改进基于实例的机器翻译[J]. 软件学报, 2012, 23(6): 1472-1485.
LIU Z Y, LI S, LIU T, et al. Improving example-based machine translation with statistical collocation model[J]. Journal of Software, 2012, 23(6): 1472-1485.
[13] ZAREMBA W, SUTSKEVER I, VINYALS O. Recurrent neural network regularization[EB/OL]. [2024-08-18]. https://arxiv.org/abs/1409.2329.
[14] HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780.
[15] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Advances in Neural Information Processing Systems 30, 2017: 5998-6008.
[16] YANG S H, WANG Y X, CHU X W. A survey of deep learning techniques for neural machine translation[EB/OL]. [2024-08-18]. https://arxiv.org/abs/2002.07526.
[17] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[EB/OL]. [2024-08-18]. https://arxiv.org/abs/1810.04805.
[18] LEWIS M, LIU Y H, GOYAL N, et al. BART: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension[EB/OL]. [2024-09-14]. https://arxiv.org/abs/1910.13461.
[19] UNANUE I J, PARNELL J, PICCARDI M. BERTTune: fine-tuning neural machine translation with BERTScore[EB/OL]. [2024-09-14]. https://arxiv.org/abs/2106.02208.
[20] ZHU J H, XIA Y C, WU L J, et al. Incorporating BERT into neural machine translation[EB/OL]. [2024-09-14]. https://arxiv.org/abs/2002.06823.
[21] XU H, VAN DURME B, MURRAY K. BERT, mBERT, or BiBERT? A study on contextualized embeddings for neural machine translation[EB/OL]. [2024-09-14]. https://arxiv.org/abs/2109.04588.
[22] LIU Y H, GU J T, GOYAL N, et al. Multilingual denoising pre-training for neural machine translation[EB/OL]. [2024-09-14]. https://arxiv.org/abs/2001.08210.
[23] VERMA N, MURRAY K, DUH K. Strategies for adapting multilingual pre-training for domain-specific machine translation[C]//Proceedings of the 15th Biennial Conference of the Association for Machine Translation in the Americas, 2022: 31-44.
[24] YE J J, CHEN X T, XU N, et al. A comprehensive capability analysis of GPT-3 and GPT-3.5 series models[EB/OL]. [2024-09-14]. https://arxiv.org/abs/2303.10420.
[25] ACHIAM J, ADLER S, AGARWAL S, et al. GPT-4 technical report[EB/OL]. [2024-09-14]. https://arxiv.org/abs/2303.08774.
[26] LIALIN V, DESHPANDE V, RUMSHISKY A. Scaling down to scale up: a guide to parameter-efficient fine-tuning[EB/OL]. [2024-09-14]. https://arxiv.org/abs/2303.15647.
[27] XEZONAKI D, KHALIL T, STAP D, et al. Improving domain robustness in neural machine translation with fused topic knowledge embeddings[C]//Proceedings of the 2023 Machine Translation Summit XIX, Vol. 1: Research Track, 2023: 209-221.
[28] MAN Z B, ZHANG Y J, CHEN Y M, et al. Exploring domain-shared and domain-specific knowledge in multi-domain neural machine translation[C]//Proceedings of the 2023 Machine Translation Summit XIX, Vol. 1: Research Track, 2023: 99-110.
[29] SAUNDERS D, DENEEFE S. Domain adapted machine translation: what does catastrophic forgetting forget and why?[C]//Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2024: 12660-12671.
[30] THOMPSON B, GWINNUP J, KHAYRALLAH H, et al. Overcoming catastrophic forgetting during domain adaptation of neural machine translation[C]//Proceedings of the 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics. Stroudsburg: ACL, 2019: 2062-2068.
[31] ZHANG X, RAJABI N, DUH K, et al. Machine translation with large language models: prompting, few-shot learning, and fine-tuning with QLoRA[C]//Proceedings of the 8th Conference on Machine Translation. Stroudsburg: ACL, 2023: 468-481.
[32] LIU Y H, OTT M, GOYAL N, et al. RoBERTa: a robustly optimized BERT pretraining approach[EB/OL]. [2024-09-25]. https://arxiv.org/abs/1907.11692.
[33] HOULSBY N, GIURGIU A, JASTRZEBSKI S, et al. Parameter-efficient transfer learning for NLP[C]//Proceedings of the 36th International Conference on Machine Learning, 2019: 2790-2799.
[34] BEN ZAKEN E, RAVFOGEL S, GOLDBERG Y. BitFit: simple parameter-efficient fine-tuning for transformer-based masked language-models[EB/OL]. [2024-09-25]. https://arxiv.org/abs/2106.10199.
[35] AGHAJANYAN A, ZETTLEMOYER L, GUPTA S. Intrinsic dimensionality explains the effectiveness of language model fine-tuning[EB/OL]. [2024-09-25]. https://arxiv.org/abs/2012.13255.
[36] HU J E, SHEN Y L, WALLIS P, et al. LoRA: low-rank adaptation of large language models[EB/OL]. [2024-09-25]. https://arxiv.org/abs/2106.09685.
[37] ZHENG J, HONG H, WANG X, et al. Fine-tuning large language models for domain-specific machine translation[EB/OL]. [2024-09-25]. https://arxiv.org/abs/2402.15061.
[38] ZHANG Q R, CHEN M S, BUKHARIN A, et al. AdaLoRA: adaptive budget allocation for parameter-efficient fine-tuning[EB/OL]. [2024-09-25]. https://arxiv.org/abs/2303.10512.
[39] DETTMERS T, PAGNONI A, HOLTZMAN A, et al. QLoRA: efficient finetuning of quantized LLMs[EB/OL]. [2024-09-25]. https://arxiv.org/abs/2305.14314.
[40] 张钦彤, 王昱超, 王鹤羲, 等. 大语言模型微调技术的研究综述[J]. 计算机工程与应用, 2024, 60(17): 17-33.
ZHANG Q T, WANG Y C, WANG H X, et al. Comprehensive review of large language model fine-tuning[J]. Computer Engineering and Applications, 2024, 60(17): 17-33.
[41] LAI W, CHRONOPOULOU A, FRASER A. m4Adapter: multilingual multi-domain adaptation for machine translation with a meta-adapter[EB/OL]. [2024-09-25]. https://arxiv.org/abs/2210.11912.
[42] WU Z L, LUO Y C, WEI D M, et al. HW-TSC's submission to the CCMT 2024 machine translation tasks[EB/OL]. [2024-09-25]. https://arxiv.org/abs/2409.14842.
[43] SCHICK T, SCHÜTZE H. Exploiting cloze-questions for few-shot text classification and natural language inference[EB/OL]. [2024-09-25]. https://arxiv.org/abs/2001.07676.
[44] DONG Q X, LI L, DAI D M, et al. A survey on in-context learning[EB/OL]. [2024-09-25]. https://arxiv.org/abs/2301.00234.
[45] ZHU S, CUI M, XIONG D. Towards robust in-context learning for machine translation with large language models[C]//Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, 2024: 16619-16629.
[46] ZHANG B, HADDOW B, BIRCH A, et al. Prompting large language model for machine translation[C]//Proceedings of the 40th International Conference on Machine Learning, 2023: 41092-41110.
[47] WEI J, BOSMA M, ZHAO V Y, et al. Finetuned language models are zero-shot learners[EB/OL]. [2024-09-25]. https://arxiv.org/abs/2109.01652.
[48] RIOS M. Instruction-tuned large language models for machine translation in the medical domain[EB/OL]. [2024-09-25]. https://arxiv.org/abs/2408.16440.
[49] WEI J, WANG X Z, SCHUURMANS D, et al. Chain-of-thought prompting elicits reasoning in large language models[C]//Advances in Neural Information Processing Systems 35, 2022: 24824-24837.
[50] KOJIMA T, GU S S, REID M, et al. Large language models are zero-shot reasoners[C]//Advances in Neural Information Processing Systems 35, 2022: 22199-22213.
[51] HU T, ZHANG P, YANG B, et al. Large language model for multi-domain translation: benchmarking and domain CoT fine-tuning[EB/OL]. [2024-11-10]. https://arxiv.org/abs/2410.02631.