Journal of Frontiers of Computer Science and Technology ›› 2016, Vol. 10 ›› Issue (3): 372-380. DOI: 10.3778/j.issn.1673-9418.1509085

• Artificial Intelligence and Pattern Recognition •

Automatic Summarization Method of News Texts Using Keywords Expansion

LI Feng1,2+, HUANG Jinzhu3, LI Zhoujun1, YANG Weiming2

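  1. School of Computer Science and Engineering, Beihang University, Beijing 100191, China
  2. Logistics Science Research Institute of PLA, Beijing 100166, China
  3. Department of Language Engineering, PLA University of Foreign Languages, Luoyang, Henan 471003, China
  • Publication date: 2016-03-01 Release date: 2016-03-11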

Automatic Summarization Method of News Texts Using Keywords Expansion

LI Feng1,2+, HUANG Jinzhu3, LI Zhoujun1, YANG Weiming2   

  1. School of Computer Science and Engineering, Beihang University, Beijing 100191, China
  2. Logistics Science Research Institute of PLA, Beijing 100166, China
  3. Department of Language Engineering, PLA University of Foreign Languages, Luoyang, Henan 471003, China
  • Online: 2016-03-01 Published: 2016-03-11

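Abstract: This paper proposes an automatic summarization method for news texts that uses keyword expansion. The method extracts texts on topics close to the input document from a large-scale corpus to form a background corpus, then expands the keywords on the basis of that background corpus. This strengthens the role of keywords as indicators of summary sentences and thereby improves the quality of extractive news summarization. Research and experiments show that, in ROUGE-1 and ROUGE-2 evaluation, the method outperforms keyword-based, TextRank-based and Manifold Ranking-based methods. The work also compiled four standard Chinese news text evaluation sets covering 100 news texts, built a keyword-expansion-based automatic summarization system for Chinese news texts, and developed a ROUGE-based automatic evaluation system for Chinese news summaries, an initial step from theory to practice.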

Key words: keyword expansion, similar texts, automatic summarization, graph algorithm, system implementation

Abstract: This paper proposes an automatic summarization method for news texts that uses keyword expansion. The method retrieves texts whose topics are similar to the input text from large-scale data to form background data, and then expands the keywords on the basis of that background data. The expanded keywords are stronger indicators of summary sentences, which improves the quality of news text summarization. Experiments show that the method achieves better ROUGE-1 and ROUGE-2 results than methods based on keywords, TextRank and Manifold Ranking. The paper also constructs a Chinese evaluation set that covers 100 news texts divided into 4 groups, and develops a keyword-expansion-based automatic summarization system for Chinese news texts and a ROUGE-based automatic evaluation system for Chinese news summaries. These systems implement and test the approach proposed in the paper.

Key words: keyword expansion, similar topic text, automatic summarization, graph algorithm, system implementation
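
The pipeline outlined in the abstract (select background texts similar to the input, expand the keywords from them, then score sentences) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the term-frequency keyword extraction, the cosine similarity measure, the 1.0/0.5 weights and the whitespace tokenizer are all hypothetical choices. Chinese input would first need word segmentation, and the graph algorithm named in the key words is not reproduced here.

```python
import math
from collections import Counter

PUNCT = ".,;:!?\"'()"

def tokenize(text):
    """Lowercase whitespace tokenizer (assumption: stands in for Chinese word segmentation)."""
    tokens = (w.strip(PUNCT).lower() for w in text.split())
    return [t for t in tokens if t]

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_terms(counts, k, stopwords):
    """Return the k most frequent non-stopword terms; ties are broken alphabetically."""
    items = [(t, c) for t, c in counts.items() if t not in stopwords]
    items.sort(key=lambda item: (-item[1], item[0]))
    return [t for t, _ in items[:k]]

def summarize(sentences, corpus, stopwords, n_background=2,
              n_keywords=2, n_expanded=2, n_summary=1):
    doc_tf = Counter(t for s in sentences for t in tokenize(s))

    # Step 1: the corpus texts most similar to the input form the background corpus.
    ranked = sorted(corpus, key=lambda d: cosine(doc_tf, Counter(tokenize(d))),
                    reverse=True)
    background = ranked[:n_background]

    # Step 2: keywords of the input document (assumption: plain term frequency).
    keywords = top_terms(doc_tf, n_keywords, stopwords)

    # Step 3: expansion terms are frequent words of background texts that
    # mention at least one keyword.
    bg_tf = Counter()
    for text in background:
        tokens = tokenize(text)
        if any(k in tokens for k in keywords):
            bg_tf.update(t for t in tokens if t not in keywords)
    expanded = top_terms(bg_tf, n_expanded, stopwords)

    # Step 4: score sentences; expanded terms count less than original keywords
    # (the 1.0 and 0.5 weights are illustrative assumptions).
    weights = {k: 1.0 for k in keywords}
    weights.update({e: 0.5 for e in expanded})

    def score(sentence):
        tokens = set(tokenize(sentence))
        return sum(w for t, w in weights.items() if t in tokens)

    order = sorted(range(len(sentences)), key=lambda i: (-score(sentences[i]), i))
    chosen = sorted(order[:n_summary])  # restore document order
    return [sentences[i] for i in chosen], keywords, expanded

# Toy data, invented purely for illustration.
doc_sentences = [
    "The city opened a new subway line today.",
    "Officials said the subway line will shorten the commute.",
    "Weather was sunny.",
]
corpus = [
    "The subway line extension reduces commute congestion for commuters.",
    "A new recipe for apple pie wins the contest.",
    "Subway commuters welcome the line and praise shorter commute.",
]
stopwords = {"the", "a", "for", "and", "was", "will", "said", "new", "today"}

summary, keywords, expanded = summarize(doc_sentences, corpus, stopwords)
print(summary, keywords, expanded)
```

In this toy run the first two sentences tie on the original keywords alone; the expansion term "commute", drawn from the background texts, is what tips the selection toward the second sentence, which is the indicator-strengthening effect the abstract describes.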