Survey of Research on Curriculum Learning in Neural Machine Translation

doi:10.3778/j.issn.1673-9418.2403008

Abstract

Curriculum learning, an emerging technique, has attracted growing attention in recent years. It mirrors how humans learn, moving from simple to difficult and from shallow to deep: the model first learns simple, basic concepts and then gradually transitions to more complex, higher-level content. In neural machine translation, curriculum learning is a training strategy that helps the model learn in a systematic order. It has been shown to accelerate model convergence and to improve translation quality and stability. This paper first introduces the definition and basic framework of curriculum learning from a machine learning perspective, and then examines its application to neural machine translation. Two approaches, predefined curriculum learning and dynamic curriculum learning, are discussed in detail in terms of sample difficulty evaluation and model training scheduling strategies. Predefined curriculum learning fixes the difficulty order of samples before training and guides the model step by step from simple to complex tasks. In contrast, dynamic curriculum learning adjusts sample difficulty according to the model's current learning state, which offers a more flexible training approach. Finally, this paper analyzes future research trends of curriculum learning in neural machine translation and proposes three promising research directions.

Key words: curriculum learning, neural machine translation, difficulty measure, training schedule
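
The distinction between the two approaches can be illustrated with a minimal Python sketch. It is not taken from the surveyed methods: the source-length difficulty proxy, the linear pacing function, and all names (`length_difficulty`, `linear_pacing`, `predefined_curriculum`, `dynamic_curriculum`, `current_loss`) are assumptions chosen only for illustration.

```python
import math

def length_difficulty(pair):
    """Assumed difficulty proxy: number of whitespace tokens in the source."""
    src, _tgt = pair
    return len(src.split())

def linear_pacing(step, total_steps, start_fraction=0.25):
    """Assumed training schedule: fraction of the ranked data visible at a step."""
    progress = min(1.0, step / total_steps)
    return min(1.0, start_fraction + (1.0 - start_fraction) * progress)

def visible_count(n_samples, step, total_steps):
    """Number of easiest-first samples exposed at this step (at least one)."""
    return max(1, math.ceil(linear_pacing(step, total_steps) * n_samples))

def predefined_curriculum(corpus, step, total_steps):
    """Predefined: the difficulty order is fixed before training starts."""
    ranked = sorted(corpus, key=length_difficulty)
    return ranked[:visible_count(len(ranked), step, total_steps)]

def dynamic_curriculum(corpus, current_loss, step, total_steps):
    """Dynamic: samples are re-ranked from the model's current state.

    `current_loss` is a hypothetical callable mapping a sentence pair to the
    loss of the model at this point in training.
    """
    ranked = sorted(corpus, key=current_loss)
    return ranked[:visible_count(len(ranked), step, total_steps)]

# Toy parallel corpus with source lengths 8, 1, 4 and 2 tokens.
corpus = [
    ("a b c d e f g h", "A B C D E F G H"),
    ("a", "A"),
    ("a b c d", "A B C D"),
    ("a b", "A B"),
]

early = predefined_curriculum(corpus, step=0, total_steps=10)
late = predefined_curriculum(corpus, step=10, total_steps=10)
```

In this sketch `early` holds only the shortest pair and `late` holds the whole corpus. Passing a different `current_loss` at each call to `dynamic_curriculum` changes the ranking as training progresses, whereas the predefined ranking never changes.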

HU Chunyue, SI Qintu, WANG Siriguleng. Survey of Research on Curriculum Learning in Neural Machine Translation[J]. Journal of Frontiers of Computer Science and Technology, 2025, 19(2): 334-343.

胡春月, 斯琴图, 王斯日古楞. 神经机器翻译中课程学习研究综述[J]. 计算机科学与探索, 2025, 19(2): 334-343.
