Journal of Frontiers of Computer Science and Technology ›› 2020, Vol. 14 ›› Issue (10): 1681-1692. DOI: 10.3778/j.issn.1673-9418.1909008

• Academic Research •

Exploiting Multi-layer Interactive Attention for Abstractive Text Summarization

HUANG Yuxin, YU Zhengtao, XIANG Yan, GAO Shengxiang, GUO Junjun

  1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
    2. Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, China
  • Online: 2020-10-01 Published: 2020-10-12

Exploiting Multi-layer Interactive Attention for Abstractive Text Summarization

HUANG Yuxin, YU Zhengtao, XIANG Yan, GAO Shengxiang, GUO Junjun   

  1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
    2. Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, China
  • Online: 2020-10-01 Published: 2020-10-12

Abstract:

Attention-based encoder-decoder models have been widely applied to sequence-to-sequence tasks such as text summarization and machine translation. In the deep learning framework, deep neural networks can extract different feature representations of the input data, so conventional encoder-decoder models usually stack multiple decoder layers to improve performance. However, existing models use only the information of the last encoder layer during decoding and ignore the features of the remaining encoder layers. In view of this, a summarization model based on a multi-layer recurrent neural network and a multi-layer interactive attention mechanism is proposed, in which the multi-layer interactive attention extracts feature information from different encoder layers to guide summary generation. To handle the information redundancy introduced by incorporating features from different layers, a variational information bottleneck is introduced to compress data noise. Finally, experiments on the Gigaword and DUC2004 summarization datasets show that the proposed method achieves the best performance.
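The abstract describes the mechanism only at a high level. As a rough illustration, the sketch below (hypothetical PyTorch code, not the authors' released implementation; all module and variable names are assumptions) shows one way a decoder state could attend to every encoder layer and fuse the per-layer contexts, which is the core idea of the multi-layer interactive attention described above.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiLayerInteractiveAttention(nn.Module):
    def __init__(self, hidden_size, num_layers):
        super().__init__()
        # One additive-attention scorer per encoder layer.
        self.scorers = nn.ModuleList(
            [nn.Linear(2 * hidden_size, 1) for _ in range(num_layers)]
        )
        # Fuse the per-layer contexts back into a single vector.
        self.fuse = nn.Linear(num_layers * hidden_size, hidden_size)

    def forward(self, dec_state, enc_layers):
        # dec_state:  (batch, hidden), current decoder hidden state
        # enc_layers: list of (batch, src_len, hidden), one tensor per encoder layer
        contexts = []
        for scorer, enc_out in zip(self.scorers, enc_layers):
            src_len = enc_out.size(1)
            query = dec_state.unsqueeze(1).expand(-1, src_len, -1)
            scores = scorer(torch.cat([query, enc_out], dim=-1)).squeeze(-1)
            weights = F.softmax(scores, dim=-1)   # (batch, src_len)
            contexts.append(torch.bmm(weights.unsqueeze(1), enc_out).squeeze(1))
        # Concatenate contexts from all layers and project back to hidden size.
        return torch.tanh(self.fuse(torch.cat(contexts, dim=-1)))

The fused context would then replace the single last-layer context used by an otherwise standard attention-based decoder.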

Key words: text summarization, encoder-decoder model, multi-layer interactive attention mechanism, variational information bottleneck

Abstract:

Attention-based encoder-decoder models have been widely used in sequence-to-sequence tasks such as text summarization and machine translation. In the deep learning framework, a multi-layer neural network can extract different feature representations of the input data; therefore, conventional encoder-decoder models usually stack multiple decoder layers to improve performance. However, existing models attend only to the output of the last encoder layer during decoding and ignore the information of the other layers. In view of this, this paper proposes a novel abstractive text summarization model based on a multi-layer recurrent neural network and a multi-layer interactive attention mechanism. The multi-layer interactive attention mechanism extracts contextual information from different levels of the encoder to guide the generation of summaries. To deal with the information redundancy caused by introducing context from different levels, the variational information bottleneck is adopted to compress data noise. Finally, this paper conducts experiments on the Gigaword and DUC2004 datasets, and the results show that the proposed method achieves state-of-the-art performance.
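As with the attention sketch above, the following is only an illustrative reconstruction of the variational information bottleneck step, assuming it is applied to the fused attention context; the class and parameter names are hypothetical, not taken from the paper.

import torch
import torch.nn as nn

class VariationalBottleneck(nn.Module):
    def __init__(self, hidden_size, latent_size):
        super().__init__()
        self.to_mu = nn.Linear(hidden_size, latent_size)
        self.to_logvar = nn.Linear(hidden_size, latent_size)

    def forward(self, context):
        # context: (batch, hidden), fused multi-layer attention context
        mu, logvar = self.to_mu(context), self.to_logvar(context)
        # Reparameterization trick: sample a compressed representation.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        # KL divergence to a standard normal prior; adding it to the training
        # loss limits how much information the compressed context may keep.
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1).mean()
        return z, kl

During training, the overall objective would then be the usual cross-entropy loss plus a small coefficient times the KL term, trading off summary quality against how aggressively redundant information is compressed away.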

Key words: text summarization, encoder-decoder model, multi-layer interactive attention, variational information bottleneck