Journal of Frontiers of Computer Science and Technology ›› 2024, Vol. 18 ›› Issue (7): 1725-1747. DOI: 10.3778/j.issn.1673-9418.2311027
Survey of Neural Machine Translation Based on Knowledge Distillation
MA Chang, TIAN Yonghong, ZHENG Xiaoli, SUN Kangkang
Online: 2024-07-01
Published: 2024-06-28
Abstract: Machine translation (MT) is the process of using a computer to convert text in one language into semantically equivalent text in another language. With the advent of neural networks, neural machine translation (NMT) has become a powerful translation technology and has achieved remarkable success in automatic translation and artificial intelligence. Because traditional neural translation models suffer from redundant parameters and structures, knowledge distillation (KD) has been introduced to compress NMT models and accelerate inference, an approach that has attracted wide attention in machine learning and natural language processing. This paper systematically surveys and compares translation models that incorporate knowledge distillation, mainly from the perspectives of evaluation metrics and technical innovation. It first briefly reviews the development history, mainstream frameworks, and evaluation metrics of machine translation; it then introduces knowledge distillation techniques in detail; next, it elaborates on the development of KD-based neural machine translation from four perspectives, namely multilingual models, multimodal translation, low-resource languages, and autoregressive versus non-autoregressive models, and briefly reviews research in other related areas; finally, it analyzes the open problems of existing large language models, zero-resource languages, and multimodal machine translation, and discusses future trends in neural machine translation.
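The knowledge distillation objective referred to in the abstract can be made concrete with a short sketch. Below is a minimal, illustrative word-level KD loss in PyTorch, not taken from the surveyed paper: the student is trained jointly on the gold target tokens and on the teacher's temperature-softened output distribution. The function name and the defaults for the temperature T and the mixing weight alpha are assumptions chosen for the example.

```python
# Minimal word-level knowledge distillation loss for NMT (illustrative sketch).
# Shapes: student_logits, teacher_logits -> (batch, seq_len, vocab); gold_ids -> (batch, seq_len).
import torch
import torch.nn.functional as F

def word_level_kd_loss(student_logits, teacher_logits, gold_ids, pad_id, alpha=0.5, T=2.0):
    vocab = student_logits.size(-1)
    # Standard cross-entropy against the gold reference, ignoring padding positions.
    ce = F.cross_entropy(
        student_logits.reshape(-1, vocab),
        gold_ids.reshape(-1),
        ignore_index=pad_id,
    )
    # KL divergence toward the teacher's softened distribution; the teacher is detached
    # so no gradient flows into it, and the T*T factor keeps gradient magnitudes
    # comparable across temperatures.
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits.detach() / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    return (1.0 - alpha) * ce + alpha * kd
```

Setting alpha to 0 recovers ordinary maximum-likelihood training; sequence-level distillation, by contrast, retrains the student on the teacher's decoded outputs rather than matching per-token distributions.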
MA Chang, TIAN Yonghong, ZHENG Xiaoli, SUN Kangkang. Survey of Neural Machine Translation Based on Knowledge Distillation[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(7): 1725-1747.