Journal of Frontiers of Computer Science and Technology (计算机科学与探索) ›› 2023, Vol. 17 ›› Issue (6): 1387-1394. DOI: 10.3778/j.issn.1673-9418.2109075

• Artificial Intelligence · Pattern Recognition •

Poetry Generation Algorithm with Automatic Expansion of Keyword Semantic Information

WANG Yongchao, ZHOU Lingzhi, ZHAO Yaping, XU Duanqing

  1. Center of Information & Technology, Zhejiang University, Hangzhou 310027, China
  2. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
  • Online: 2023-06-01 Published: 2023-06-01

Abstract: Most current poetry generation models take user-supplied keywords as input and generate poems that conform to metrical rules and tonal patterns. Because keywords carry little semantic information, the quality of the generated poems is hard to guarantee, and contextual theme drift is likely to occur. To address this problem, this paper proposes a generative model based on the conditional variational autoencoder (CVAE) that generates poems under the guidance of richer semantic information, so that they match the keywords more closely and better satisfy users. The model samples human-written poems to introduce additional semantic information related to the keywords, and uses it to estimate the prior distribution of the CVAE so that the prior lies closer to the true distribution. Because the model automatically expands the keyword information, it narrows the gap between the semantic information in the input and in the output, and alleviates the over-translation problem common in previous models. Experimental results show that the proposed model outperforms the compared models in both automatic and human evaluation, reduces the frequency of over-translation, and improves the fluency of the generated poems. Varying the sampling range also makes it possible to control the writing style of the generated poems, which further demonstrates the effectiveness of the algorithm.

Key words: natural language processing, natural language generation, poetry generation, conditional variational autoencoder
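
The abstract describes estimating the CVAE prior from sampled human-written poems instead of using an uninformed prior. The following is only a minimal sketch of that general idea, not the authors' implementation. It assumes, purely for illustration, that poems are already embedded as fixed-length vectors and that both the prior and the recognition posterior are diagonal Gaussians. The names `estimate_prior` and `kl_diag_gaussians`, the 8-dimensional latent space, and the random "embeddings" are all hypothetical.

```python
import numpy as np


def estimate_prior(sample_embeddings, min_var=1e-6):
    """Fit a diagonal Gaussian to embeddings of sampled, keyword-related poems.

    Assumption: each row is a (hypothetical) embedding of one human-written poem.
    """
    mu = sample_embeddings.mean(axis=0)
    var = sample_embeddings.var(axis=0) + min_var  # keep variances strictly positive
    return mu, var


def kl_diag_gaussians(mu_q, var_q, mu_p, var_p):
    """KL(q || p) between two diagonal Gaussians, summed over dimensions."""
    return 0.5 * np.sum(
        np.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0
    )


rng = np.random.default_rng(0)

# Stand-in for embeddings of 32 sampled poems related to the keywords.
sampled = rng.normal(loc=1.0, scale=0.5, size=(32, 8))
mu_p, var_p = estimate_prior(sampled)

# Stand-in for the recognition posterior of one training poem on the same theme.
mu_q, var_q = np.full(8, 1.0), np.full(8, 0.25)

# KL term of the CVAE objective under the estimated prior vs. a standard normal prior.
kl_informed = kl_diag_gaussians(mu_q, var_q, mu_p, var_p)
kl_standard = kl_diag_gaussians(mu_q, var_q, np.zeros(8), np.ones(8))
print(f"KL to estimated prior: {kl_informed:.3f}")
print(f"KL to N(0, I) prior:   {kl_standard:.3f}")
```

In this toy setting the prior fitted to theme-related samples sits closer to the posterior than a standard normal does, which is one way to read the abstract's claim that the estimated prior is closer to the true distribution. Restricting which poems are sampled (for example, to one poet or period) would shift `mu_p` and `var_p`, loosely mirroring the style control mentioned above.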