[1] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Advances in Neural Information Processing Systems 30, 2017: 5998-6008.
[2] OUYANG L, WU J, XU J, et al. Training language models to follow instructions with human feedback[C]//Proceedings of the 36th International Conference on Neural Information Processing Systems, 2022: 27730-27744.
[3] GU J, BRADBURY J, XIONG C, et al. Non-autoregressive neural machine translation[C]//Proceedings of the 6th International Conference on Learning Representations, 2018.
[4] SOHL-DICKSTEIN J, WEISS E A, MAHESWARANATHAN N, et al. Deep unsupervised learning using nonequilibrium thermodynamics[C]//Proceedings of the 32nd International Conference on Machine Learning - Volume 37, 2015: 2256-2265.
[5] LI X L, THICKSTUN J, GULRAJANI I, et al. Diffusion-LM improves controllable text generation[C]//Proceedings of the 36th International Conference on Neural Information Processing Systems, 2022: 4328-4343.
[6] GONG S, LI M, FENG J, et al. DiffuSeq: sequence to sequence text generation with diffusion models[C]//Proceedings of the 11th International Conference on Learning Representations, 2023.
[7] HAN X C, KUMAR S, TSVETKOV Y. SSD-LM: semi-autoregressive simplex-based diffusion language model for text generation and modular control[C]//Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2023: 11575-11596.
[8] HOOGEBOOM E, NIELSEN D, JAINI P, et al. Argmax flows and multinomial diffusion[C]//Proceedings of the 35th International Conference on Neural Information Processing Systems, 2021: 12454-12465.
[9] HE Z F, SUN T X, TANG Q, et al. DiffusionBERT: improving generative masked language models with diffusion models[C]//Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2023: 4521-4534.
[10] ZHENG L, YUAN J B, YU L, et al. A reparameterized discrete diffusion model for text generation[EB/OL]. [2024-02-12]. https://arxiv.org/abs/2302.05737.
[11] LIN Z, GONG Y, SHEN Y, et al. Text generation with diffusion language models: a pre-training approach with continuous paragraph denoise[C]//Proceedings of the 40th International Conference on Machine Learning, 2023: 21051-21064.
[12] YE J S, ZHENG Z X, BAO Y, et al. DINOISER: diffused conditional sequence learning by manipulating noises[EB/OL]. [2024-02-12]. https://arxiv.org/abs/2302.10025.
[13] BENGIO Y, LOURADOUR J, COLLOBERT R, et al. Curriculum learning[C]//Proceedings of the 26th Annual International Conference on Machine Learning, 2009: 41-48.
[14] AUSTIN J, JOHNSON D D, HO J, et al. Structured denoising diffusion models in discrete state-spaces[C]//Proceedings of the 35th International Conference on Neural Information Processing Systems, 2021: 17981-17993.
[15] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[C]//Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg: ACL, 2019: 4171-4186.
[16] CHEN J A, ZHANG A, LI M, et al. A cheaper and better diffusion language model with soft-masked noise[C]//Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2023: 4765-4775.
[17] STRUDEL R, TALLEC C, ALTCHÉ F, et al. Self-conditioned embedding diffusion for text generation[C]//Proceedings of the 11th International Conference on Learning Representations, 2023.
[18] QIAN L H, ZHOU H, BAO Y, et al. Glancing transformer for non-autoregressive neural machine translation[C]//Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing. Stroudsburg: ACL, 2021: 1993-2003.
[19] GHAZVININEJAD M, LEVY O, LIU Y H, et al. Mask-predict: parallel decoding of conditional masked language models[C]//Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. Stroudsburg: ACL, 2019: 6112-6121.
[20] OTT M, EDUNOV S, BAEVSKI A, et al. fairseq: a fast, extensible toolkit for sequence modeling[C]//Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Demonstrations). Stroudsburg: ACL, 2019: 48-53.
[21] CETTOLO M, NIEHUES J, STÜKER S, et al. Report on the 11th IWSLT evaluation campaign[C]//Proceedings of the 11th International Workshop on Spoken Language Translation: Evaluation Campaign, 2014: 2-17.
[22] BOJAR O, BUCK C, FEDERMANN C, et al. Findings of the 2014 workshop on statistical machine translation[C]//Proceedings of the 9th Workshop on Statistical Machine Translation. Stroudsburg: ACL, 2014: 12-58.
[23] BOJAR O, CHATTERJEE R, FEDERMANN C, et al. Findings of the 2016 conference on machine translation[C]//Proceedings of the 1st Conference on Machine Translation. Stroudsburg: ACL, 2016: 131-198.
[24] LEE J, MANSIMOV E, CHO K. Deterministic non-autoregressive neural sequence modeling by iterative refinement[C]//Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2018: 1173-1182.
[25] DHINGRA B, MAZAITIS K, COHEN W W. Quasar: datasets for question answering by search and reading[EB/OL]. [2024-02-12]. https://arxiv.org/abs/1707.03904.
[26] SHARMA L, GRAESSER L, NANGIA N, et al. Natural language understanding with the quora question pairs dataset[EB/OL]. [2024-02-12]. https://arxiv.org/abs/1907.01041.
[27] SAVINOV N, CHUNG J, BINKOWSKI M, et al. Step-unrolled denoising autoencoders for text generation[C]//Proceedings of the 10th International Conference on Learning Representations, 2022.
[28] GHAZVININEJAD M, LEVY O, ZETTLEMOYER L. Semi-autoregressive training improves mask-predict decoding[EB/OL]. [2024-02-12]. https://arxiv.org/abs/2001.08785.
[29] HUANG X, PÉREZ F, VOLKOVS M. Improving non-autoregressive translation models without distillation[C]//Proceedings of the 10th International Conference on Learning Representations, 2022.