[1] SENNRICH R, HADDOW B, BIRCH A. Improving neural machine translation models with monolingual data[C]//Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Aug 7-12, 2016. Stroudsburg: ACL, 2016: 86-96.
[2] FADAEE M, BISAZZA A, MONZ C. Data augmentation for low-resource neural machine translation[C]//Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, Jul 30-Aug 4, 2017. Stroudsburg: ACL, 2017: 567-573.
[3] XIA M, KONG X, ANASTASOPOULOS A, et al. Generalized data augmentation for low-resource translation[C]//Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Jul 28-Aug 2, 2019. Stroudsburg: ACL, 2019: 5786-5796.
[4] SRIVASTAVA N, HINTON G, KRIZHEVSKY A, et al. Dropout: a simple way to prevent neural networks from overfitting[J]. Journal of Machine Learning Research, 2014, 15(1): 1929-1958.
[5] MÜLLER R, KORNBLITH S, HINTON G. When does label smoothing help?[C]//Advances in Neural Information Processing Systems 32, Vancouver, Dec 8-14, 2019: 4696-4705.
[6] GAO Y, WANG W, HEROLD C, et al. Towards a better understanding of label smoothing in neural machine translation[C]//Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, Suzhou, Dec 4-7, 2020. Stroudsburg: ACL, 2020: 212-223.
[7] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Advances in Neural Information Processing Systems 30, Long Beach, Dec 4-9, 2017: 5998-6008.
[8] MURRAY K, KINNISON J, NGUYEN T Q, et al. Auto-sizing the transformer network: improving speed, efficiency, and performance for low-resource machine translation[C]//Proceedings of the 3rd Workshop on Neural Generation and Translation, Hong Kong, China, Nov 4, 2019. Stroudsburg: ACL, 2019: 231-240.
[9] FENG Y, SHAO C Z. Frontiers in neural machine translation: a literature review[J]. Journal of Chinese Information Processing, 2020, 34(7): 1-18.
[10] BAHDANAU D, CHO K, BENGIO Y. Neural machine translation by jointly learning to align and translate[C]//Proceedings of the 3rd International Conference on Learning Representations, San Diego, May 7-9, 2015.
[11] SUTSKEVER I, VINYALS O, LE Q V. Sequence to sequence learning with neural networks[C]//Advances in Neural Information Processing Systems 27, Montreal, Dec 8-13, 2014: 3104-3112.
[12] SENNRICH R, HADDOW B, BIRCH A. Neural machine translation of rare words with subword units[C]//Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Aug 7-12, 2016. Stroudsburg: ACL, 2016: 1715-1725.
[13] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[C]//Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics, Minneapolis, Jun 2-7, 2019. Stroudsburg: ACL, 2019: 4171-4186.
[14] GU S, ZHANG J, MENG F, et al. Token-level adaptive training for neural machine translation[C]//Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, Nov 16-20, 2020. Stroudsburg: ACL, 2020: 1035-1046.
[15] XU Y, LIU Y, MENG F, et al. Bilingual mutual information based adaptive training for neural machine translation[C]//Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics, Bangkok, Aug 1-6, 2021. Stroudsburg: ACL, 2021: 511-516.
[16] BAZIOTIS C, HADDOW B, BIRCH A. Language model prior for low-resource neural machine translation[C]//Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, Nov 16-20, 2020. Stroudsburg: ACL, 2020: 7622-7634.
[17] ZHU J, XIA Y, WU L, et al. Incorporating BERT into neural machine translation[C]//Proceedings of the 8th International Conference on Learning Representations, Apr 26-May 1, 2020.
[18] ZHANG S, LIU Y, MENG F, et al. Conditional bilingual mutual information based adaptive training for neural machine translation[C]//Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Dublin, May 22-27, 2022. Stroudsburg: ACL, 2022: 2377-2389.
[19] BOTTOU L. Large-scale machine learning with stochastic gradient descent[C]//Proceedings of the 19th International Conference on Computational Statistics, Paris, Aug 22-27, 2010: 177-186.
[20] DUCHI J, HAZAN E, SINGER Y. Adaptive subgradient methods for online learning and stochastic optimization[J]. Journal of Machine Learning Research, 2011, 12: 2121-2159.
[21] ZEILER M D. ADADELTA: an adaptive learning rate method[J]. arXiv:1212.5701, 2012.
[22] KINGMA D P, BA J. Adam: a method for stochastic optimization[C]//Proceedings of the 3rd International Conference on Learning Representations, San Diego, May 7-9, 2015.
[23] REDDI S J, KALE S, KUMAR S. On the convergence of Adam and beyond[C]//Proceedings of the 6th International Conference on Learning Representations, Vancouver, Apr 30-May 3, 2018.
[24] KNOWLES R, LARKIN S, STEWART D, et al. NRC systems for low resource German-Upper Sorbian machine translation 2020: transfer learning with lexical modifications[C]//Proceedings of the 5th Conference on Machine Translation, Nov 19-20, 2020. Stroudsburg: ACL, 2020: 1112-1122.
[25] POST M, CALLISON-BURCH C, OSBORNE M. Constructing parallel corpora for six Indian languages via crowdsourcing[C]//Proceedings of the 7th Workshop on Statistical Machine Translation, Montréal, Jun 7-8, 2012. Stroudsburg: ACL, 2012: 401-409.
[26] PAPINENI K, ROUKOS S, WARD T, et al. BLEU: a method for automatic evaluation of machine translation[C]//Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, Philadelphia, Jul 6-12, 2002. Stroudsburg: ACL, 2002: 311-318.
[27] KYLE K, CROSSLEY S A, JARVIS S. Assessing the validity of lexical diversity indices using direct judgements[J]. Language Assessment Quarterly, 2021, 18(2): 154-170.