Journal of Frontiers of Computer Science and Technology ›› 2020, Vol. 14 ›› Issue (10): 1681-1692. DOI: 10.3778/j.issn.1673-9418.1909008

• Academic Research •

Exploiting Multi-layer Interactive Attention for Abstractive Text Summarization

HUANG Yuxin, YU Zhengtao, XIANG Yan, GAO Shengxiang, GUO Junjun

  1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
    2. Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, China
  • Online: 2020-10-01 Published: 2020-10-12

Exploiting Multi-layer Interactive Attention for Abstractive Text Summarization

HUANG Yuxin, YU Zhengtao, XIANG Yan, GAO Shengxiang, GUO Junjun   

  1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
    2. Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, China
  • Online: 2020-10-01 Published: 2020-10-12

Abstract:

Attention-based encoder-decoder models have been widely applied to sequence-to-sequence tasks such as text summarization and machine translation. In the deep learning framework, deep neural networks can extract different feature representations of the input data, so conventional encoder-decoder models usually stack multiple decoder layers to improve performance. However, existing models use only the information of the last encoder layer during decoding and ignore the features of the remaining encoder layers. In view of this, a summarization model based on a multi-layer recurrent neural network and a multi-layer interactive attention mechanism is proposed, in which the multi-layer interactive attention extracts feature information from different encoder layers to guide summary generation. To handle the information redundancy introduced by incorporating features from different layers, a variational information bottleneck is introduced to compress data noise. Finally, experiments on the Gigaword and DUC2004 summarization datasets show that the proposed method achieves the best performance.
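The abstract describes the mechanism only at a high level. As a rough illustration, the sketch below (hypothetical PyTorch code, not the authors' released implementation; all module and variable names are assumptions) shows one way a decoder state could attend to every encoder layer and fuse the per-layer contexts, which is the core idea of the multi-layer interactive attention described above.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiLayerInteractiveAttention(nn.Module):
    def __init__(self, hidden_size, num_layers):
        super().__init__()
        # One additive-attention scorer per encoder layer.
        self.scorers = nn.ModuleList(
            [nn.Linear(2 * hidden_size, 1) for _ in range(num_layers)]
        )
        # Fuse the per-layer contexts back into a single vector.
        self.fuse = nn.Linear(num_layers * hidden_size, hidden_size)

    def forward(self, dec_state, enc_layers):
        # dec_state:  (batch, hidden), current decoder hidden state
        # enc_layers: list of (batch, src_len, hidden), one tensor per encoder layer
        contexts = []
        for scorer, enc_out in zip(self.scorers, enc_layers):
            src_len = enc_out.size(1)
            query = dec_state.unsqueeze(1).expand(-1, src_len, -1)
            scores = scorer(torch.cat([query, enc_out], dim=-1)).squeeze(-1)
            weights = F.softmax(scores, dim=-1)   # (batch, src_len)
            contexts.append(torch.bmm(weights.unsqueeze(1), enc_out).squeeze(1))
        # Concatenate contexts from all layers and project back to hidden size.
        return torch.tanh(self.fuse(torch.cat(contexts, dim=-1)))

The fused context would then replace the single last-layer context used by an otherwise standard attention-based decoder.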

Key words: text summarization, encoder-decoder model, multi-layer interactive attention mechanism, variational information bottleneck

Abstract:

Attention-based encoder-decoder models have been widely used in sequence-to-sequence tasks such as text summarization and machine translation. In the deep learning framework, a multi-layer neural network can extract different feature representations of the input data; therefore, conventional encoder-decoder models usually stack multiple decoder layers to improve performance. However, existing models attend only to the output of the last encoder layer during decoding and ignore the information of the other layers. In view of this, this paper proposes a novel abstractive text summarization model based on a multi-layer recurrent neural network and a multi-layer interactive attention mechanism. The multi-layer interactive attention mechanism extracts contextual information from different levels of the encoder to guide the generation of summaries. To deal with the information redundancy caused by introducing context from different levels, the variational information bottleneck is adopted to compress data noise. Finally, this paper conducts experiments on the Gigaword and DUC2004 datasets, and the results show that the proposed method achieves state-of-the-art performance.
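As with the attention sketch above, the following is only an illustrative reconstruction of the variational information bottleneck step, assuming it is applied to the fused attention context; the class and parameter names are hypothetical, not taken from the paper.

import torch
import torch.nn as nn

class VariationalBottleneck(nn.Module):
    def __init__(self, hidden_size, latent_size):
        super().__init__()
        self.to_mu = nn.Linear(hidden_size, latent_size)
        self.to_logvar = nn.Linear(hidden_size, latent_size)

    def forward(self, context):
        # context: (batch, hidden), fused multi-layer attention context
        mu, logvar = self.to_mu(context), self.to_logvar(context)
        # Reparameterization trick: sample a compressed representation.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        # KL divergence to a standard normal prior; adding it to the training
        # loss limits how much information the compressed context may keep.
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1).mean()
        return z, kl

During training, the overall objective would then be the usual cross-entropy loss plus a small coefficient times the KL term, trading off summary quality against how aggressively redundant information is compressed away.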

Key words: text summarization, encoder-decoder model, multi-layer interactive attention, variational information bottleneck