Journal of Frontiers of Computer Science and Technology ›› 2024, Vol. 18 ›› Issue (7): 1725-1747. DOI: 10.3778/j.issn.1673-9418.2311027
• Frontiers·Surveys •
Survey of Neural Machine Translation Based on Knowledge Distillation
MA Chang, TIAN Yonghong, ZHENG Xiaoli, SUN Kangkang
Online: 2024-07-01
Published: 2024-06-28
马畅,田永红,郑晓莉,孙康康
MA Chang, TIAN Yonghong, ZHENG Xiaoli, SUN Kangkang. Survey of Neural Machine Translation Based on Knowledge Distillation[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(7): 1725-1747.
马畅, 田永红, 郑晓莉, 孙康康. 基于知识蒸馏的神经机器翻译综述[J]. 计算机科学与探索, 2024, 18(7): 1725-1747.
URL: http://fcst.ceaj.org/EN/10.3778/j.issn.1673-9418.2311027