
Journal of Frontiers of Computer Science and Technology ›› 2025, Vol. 19 ›› Issue (10): 2635-2647. DOI: 10.3778/j.issn.1673-9418.2410024
XIAO Zeng, WANG Siriguleng, SI Qintu
Online: 2025-10-01
Published: 2025-09-30
Abstract: Zero-shot translation is an important research direction in multilingual neural machine translation (MNMT): it aims to enable a model to translate language pairs never seen during training, thereby achieving cross-lingual transfer learning. However, when handling unseen language pairs, existing multilingual models still suffer from problems such as semantic drift, unstable translation quality, and asymmetry across language directions, which seriously undermine the reliability and consistency of translation results. To give a systematic account of the state of research in this field, this survey is organized around one core question: how the way a multilingual model is constructed affects zero-shot translation performance, with the goal of providing theoretical support and methodological reference for future research. Zero-shot translation is of great significance for translation tasks on language pairs with scarce training corpora, and it greatly reduces translation costs. Starting from the perspective of corpus resources, this paper introduces the research background, basic definition, and core principles of zero-shot translation, together with its practical value in scenarios such as cross-cultural communication and support for newly added languages. The current mainstream approaches to zero-shot translation modeling are then reviewed along three directions for building multilingual neural machine translation systems: methods based on pre-trained models, on bilingual supervised training, and on large language models. Finally, future research trends for zero-shot translation in multilingual neural machine translation are analyzed, providing a reference for further research in this field.
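To make the core principle mentioned above concrete, the following is a minimal sketch (toy data and a hypothetical `make_example` helper, not any specific system's implementation) of the classic target-language-token mechanism behind many zero-shot MNMT systems: every source sentence is prefixed with a token naming the desired target language, a single shared model is trained on the mixed multilingual corpus, and at inference time the same token can request a direction that never appeared as a training pair.

```python
# Minimal sketch (toy data, hypothetical helper name) of target-language
# tagging, the mechanism underlying many zero-shot multilingual NMT systems.

def make_example(src_sentence: str, tgt_sentence: str, tgt_lang: str) -> tuple[str, str]:
    """Prefix the source side with a token naming the desired target language."""
    return f"<2{tgt_lang}> {src_sentence}", tgt_sentence

# Supervised directions seen during training: en->fr, en->de, fr->en.
train_pairs = [
    make_example("hello", "bonjour", "fr"),
    make_example("hello", "hallo", "de"),
    make_example("bonjour", "hello", "en"),
]

# Zero-shot at inference: de->fr was never a training pair, but the same
# tagging convention can still request it; cross-lingual transfer inside
# the shared encoder-decoder is what determines output quality.
zero_shot_input = "<2fr> hallo"
print(train_pairs)
print(zero_shot_input)
```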
XIAO Zeng, WANG Siriguleng, SI Qintu. Survey of Zero-Shot Multilingual Neural Machine Translation[J]. Journal of Frontiers of Computer Science and Technology, 2025, 19(10): 2635-2647.
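For the pre-trained-model direction surveyed in the abstract, the sketch below shows one common workflow: loading a massively multilingual pretrained translation model and forcing the decoder to start with a target-language code. It assumes the Hugging Face transformers library and the public facebook/mbart-large-50-many-to-many-mmt checkpoint; whether a given direction counts as truly zero-shot depends on that checkpoint's training data.

```python
# Sketch of translating with a massively multilingual pretrained model.
# Assumes: pip install transformers torch, and the public mBART-50 checkpoint.
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

name = "facebook/mbart-large-50-many-to-many-mmt"
model = MBartForConditionalGeneration.from_pretrained(name)
tokenizer = MBart50TokenizerFast.from_pretrained(name)

tokenizer.src_lang = "de_DE"                     # declare the source language
inputs = tokenizer("Guten Morgen.", return_tensors="pt")

# Force the decoder to begin with the target-language code; this is how the
# model is told which language to generate.
out = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("fr_XX"),
)
print(tokenizer.batch_decode(out, skip_special_tokens=True))
```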
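For the large-language-model direction, zero-shot translation is typically requested through an instruction alone, optionally with in-context demonstrations drawn from other language pairs. Below is a minimal prompt-construction sketch; the `build_prompt` helper and the prompt wording are hypothetical, and no specific LLM or API is assumed.

```python
# Minimal sketch (hypothetical helper and prompt wording, no specific LLM
# assumed) of requesting a translation direction purely via instructions;
# demonstrations may come from *other* language pairs, mirroring the
# zero-shot setting.
from collections.abc import Sequence

def build_prompt(
    text: str,
    src: str,
    tgt: str,
    demos: Sequence[tuple[str, str, str, str]] = (),
) -> str:
    """demos: (src_lang, tgt_lang, src_text, tgt_text) in-context examples."""
    parts = [
        f"Translate from {d_src} to {d_tgt}: {d_in}\n{d_out}"
        for d_src, d_tgt, d_in, d_out in demos
    ]
    parts.append(f"Translate from {src} to {tgt}: {text}\n")
    return "\n\n".join(parts)

print(build_prompt(
    "Guten Morgen.", "German", "French",
    demos=[("English", "Spanish", "Good morning.", "Buenos días.")],
))
```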