Journal of Frontiers of Computer Science and Technology, 2022, Vol. 16, Issue (4): 877-887. DOI: 10.3778/j.issn.1673-9418.2010066

• Artificial Intelligence •

Extractive News Text Automatic Summarization Combined with Hierarchical Attention

WANG Hongbin1,2, JIN Ziling1,2, MAO Cunli1,2,+

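  1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
    2. Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, China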
  • Received: 2020-10-26; Revised: 2021-01-06; Published: 2022-04-01; Released online: 2021-02-03
  • Corresponding author (+): MAO Cunli, E-mail: maocunli@163.com
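  • About the authors: WANG Hongbin, born in 1983, male, from Qujing, Yunnan, Ph.D., associate professor, M.S. supervisor. His research interests include intelligent information systems, natural language processing and data analysis.
    JIN Ziling, born in 1995, female, from Luxi, Yunnan, M.S. candidate. Her research interest is natural language processing.
    MAO Cunli, born in 1977, male, from Qujing, Yunnan, Ph.D., associate professor, M.S. supervisor, member of CCF. His research interests include natural language processing, information retrieval and machine translation.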
  • Supported by:
    National Natural Science Foundation of China (61966020)

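Abstract:

Sentence selection in extractive summarization depends heavily on subjective human judgment, so it cannot accurately and objectively measure how important each sentence of an article actually is to the summary, or how much each word contributes to the importance of its sentence; this lowers the quality of the extracted summaries. To address this problem, this paper proposes an extractive automatic summarization method for news text that incorporates hierarchical attention. First, the method encodes English news text hierarchically, applying word-level attention and then sentence-level attention, to obtain a text representation that incorporates hierarchical attention. Second, a neural network constructs a dynamic scoring function, and at each step the candidate sentence with the highest score is selected as a summary sentence. Finally, the selected sentences form the summary of the English news text. Experiments on the public CNN/Daily Mail, New York Times and Multi-News datasets show that the ROUGE scores of the proposed method are comparable to those of the current best models, and that its ROUGE F1 scores exceed the baseline by 1.78, 0.70 and 1.44 percentage points, respectively. These results indicate that the method generalizes across datasets and is effective for extractive summarization of English news text, with some advantages over existing methods.

Key words: English news, extractive summarization, hierarchical attention, scoring function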
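The two steps the abstract describes (hierarchical attention encoding, then greedy selection of the highest-scoring sentence under a dynamic score) can be sketched in plain Python as below. This is a simplified illustration under stated assumptions, not the authors' implementation: word and sentence vectors are given as inputs rather than produced by neural encoders; each attention layer scores vectors by a dot product with a fixed context vector instead of learned parameters; and the paper's neural scoring function is replaced by a hand-written Maximal Marginal Relevance (MMR)-style score, relevance to the document vector minus a penalty for similarity to sentences already selected. All function names and the redundancy_weight parameter are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return 0.0 if na == 0.0 or nb == 0.0 else dot(a, b) / (na * nb)


def attention_pool(vectors: List[Vector], context: Vector) -> Vector:
    """Weight each vector by softmax(dot(vector, context)) and return the weighted sum.

    Assumption: a fixed context vector stands in for the learned attention parameters.
    """
    weights = softmax([dot(v, context) for v in vectors])
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]


def encode_document(
    doc: List[List[Vector]], word_ctx: Vector, sent_ctx: Vector
) -> Tuple[List[Vector], Vector]:
    # Word-level attention: pool each sentence's word vectors into one sentence vector.
    sent_vecs = [attention_pool(words, word_ctx) for words in doc]
    # Sentence-level attention: pool the sentence vectors into one document vector.
    doc_vec = attention_pool(sent_vecs, sent_ctx)
    return sent_vecs, doc_vec


def marginal_score(
    i: int, selected: List[int], sent_vecs: List[Vector], doc_vec: Vector, redundancy_weight: float
) -> float:
    # Hypothetical MMR-style stand-in for the paper's learned scoring function.
    relevance = cosine(sent_vecs[i], doc_vec)
    redundancy = max((cosine(sent_vecs[i], sent_vecs[j]) for j in selected), default=0.0)
    return relevance - redundancy_weight * redundancy


def greedy_extract(
    sent_vecs: List[Vector], doc_vec: Vector, k: int, redundancy_weight: float = 0.5
) -> List[int]:
    """Return indices of up to k sentences, picked one at a time by the highest current score."""
    selected: List[int] = []
    candidates = list(range(len(sent_vecs)))
    while candidates and len(selected) < k:
        # Re-score every remaining candidate given the current selection (the "dynamic" part).
        best = max(
            candidates,
            key=lambda i: marginal_score(i, selected, sent_vecs, doc_vec, redundancy_weight),
        )
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    # Toy 2-D "word embeddings" for three sentences; sentence 2 nearly duplicates sentence 0.
    doc = [
        [[1.0, 0.0], [0.9, 0.1]],
        [[0.0, 1.0], [0.1, 0.9]],
        [[1.0, 0.1], [0.8, 0.0]],
    ]
    sent_vecs, doc_vec = encode_document(doc, word_ctx=[1.0, 1.0], sent_ctx=[1.0, 0.0])
    print(greedy_extract(sent_vecs, doc_vec, k=2, redundancy_weight=1.0))  # → [2, 1]
```

In this toy example sentence 2 is picked first because it is most similar to the document vector. With redundancy_weight=1.0 the second pick is the dissimilar sentence 1 rather than sentence 0, which nearly duplicates sentence 2; with redundancy_weight=0.0 the selection becomes [2, 0]. Re-scoring after every pick is what makes the score dynamic: a candidate's value depends on the sentences already chosen.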
