Journal of Frontiers of Computer Science and Technology

• Academic Research •

Sequence-to-Sequence Text Generation with Discrete Diffusion Models

JIANG Hang, CAI Guoyong, LI Sihui   

  1. College of Computer and Information Security, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
  2. Guangxi Key Lab of Trusted Software, Guilin, Guangxi 541004, China

Abstract: Diffusion language models are currently the most promising non-autoregressive language models and are expected to replace autoregressive language models, which suffer from slow inference, achieving efficient text generation without loss of quality. Sequence-to-sequence (Seq2Seq) generation tasks such as text summarization, machine translation, and dialogue generation are common practical application scenarios for diffusion language models, and achieving high-quality Seq2Seq text generation with low latency has long been a research focus in natural language processing. To this end, this paper simplifies the training process of the discrete diffusion model by deriving an upper bound on its training objective, and then introduces and adapts the mask-and-predict decoding strategy of the conditional masked language model as the inference algorithm of the diffusion model, improving generation quality. To further improve the quality of the text generated in the first few rounds of inference, this paper also proposes a sinusoidal noise schedule: compared with the original linear noise schedule, the high-noise interval of the time steps becomes larger, so the model focuses more on learning to recover data from heavily noised inputs, which are exactly what it encounters in the first few rounds of inference. Inspired by curriculum learning, this paper further designs a new sampling distribution over time steps, realizing an easy-to-hard learning strategy by manipulating which time steps are sampled. Experiments on public datasets show that the proposed methods effectively improve model performance; on the WMT16 EN-RO dataset, the diffusion model achieves generation quality comparable to the autoregressive baseline with only half the inference time.
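
To make the noise-schedule and time-step-sampling ideas concrete, the sketch below shows one plausible instantiation in Python. The abstract does not give the exact formulas, so the function names, the sine-based masking probability, and the linearly widening sampling range are all illustrative assumptions; they only show how a sinusoidal schedule enlarges the high-noise region and how an easy-to-hard curriculum can be realized by manipulating which time steps are sampled during training.

```python
import numpy as np

def linear_mask_prob(t, T):
    """Baseline linear schedule: the expected fraction of masked tokens
    grows in direct proportion to the time step t (1 <= t <= T)."""
    return t / T

def sinusoidal_mask_prob(t, T):
    """Assumed sinusoidal schedule: the masking probability rises quickly
    and then saturates, so a larger share of time steps falls in the
    high-noise region than under the linear schedule."""
    return np.sin(0.5 * np.pi * t / T)

def curriculum_timestep(step, total_steps, T, rng):
    """Assumed easy-to-hard sampler: early in training only small
    (lightly noised, easy) time steps are drawn; the admissible range
    grows towards the full horizon T as training progresses."""
    progress = step / total_steps                       # 0 -> 1 over training
    upper = max(1, int(round(T * (0.1 + 0.9 * progress))))
    return int(rng.integers(1, upper + 1))              # uniform over [1, upper]

if __name__ == "__main__":
    T = 1000
    rng = np.random.default_rng(0)
    # Halfway through the diffusion horizon the sinusoidal schedule already
    # masks about 71% of tokens, versus 50% under the linear schedule.
    print(linear_mask_prob(500, T), sinusoidal_mask_prob(500, T))
    # Time steps drawn at the start, middle, and end of training.
    for step in (0, 5000, 10000):
        print(step, [curriculum_timestep(step, 10000, T, rng) for _ in range(5)])
```

Under a schedule shaped like this, more of the training time steps correspond to heavily corrupted sequences, which is the regime the model faces during its first decoding rounds.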

Key words: diffusion model, language model, text generation, sequence to sequence, non-autoregressive model
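
For the inference side, the following is a minimal sketch of a mask-and-predict decoding loop in the spirit of the conditional masked language model mentioned in the abstract: all target positions start masked, the model fills them in, and the lowest-confidence positions are re-masked and re-predicted over a fixed number of rounds. The predict_fn interface, the MASK_ID constant, and the linear re-masking ratio are assumptions for illustration, not the paper's exact algorithm.

```python
import numpy as np

MASK_ID = 0  # hypothetical id of the [MASK] token in the target vocabulary

def mask_predict_decode(predict_fn, src, tgt_len, num_iters=10):
    """Minimal mask-and-predict decoding loop.

    predict_fn(src, tokens) is assumed to return, for every target position,
    a predicted token id and a confidence score (e.g. its probability).
    """
    tokens = np.full(tgt_len, MASK_ID, dtype=np.int64)   # start fully masked
    conf = np.zeros(tgt_len)
    for it in range(num_iters):
        pred, pred_conf = predict_fn(src, tokens)
        # Only positions that are currently masked take the new predictions.
        masked = tokens == MASK_ID
        tokens = np.where(masked, pred, tokens)
        conf = np.where(masked, pred_conf, conf)
        # Linearly decay how many tokens get re-masked for the next round.
        n_mask = int(tgt_len * (num_iters - 1 - it) / num_iters)
        if n_mask == 0:
            break
        # Re-mask the lowest-confidence positions and predict them again.
        remask = np.argsort(conf)[:n_mask]
        tokens[remask] = MASK_ID
        conf[remask] = 0.0
    return tokens

# Toy usage with a stand-in "model" that guesses random tokens.
def dummy_predict(src, tokens):
    rng = np.random.default_rng(len(tokens))
    return rng.integers(1, 1000, size=tokens.shape), rng.random(tokens.shape)

print(mask_predict_decode(dummy_predict, src=None, tgt_len=8, num_iters=4))
```

Because the number of rounds can be much smaller than the target length, decoding latency does not grow token by token, which is consistent with the reported result that the diffusion model matches the autoregressive baseline's quality in half the inference time.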