Journal of Frontiers of Computer Science and Technology ›› 2013, Vol. 7 ›› Issue (4): 368-376. DOI: 10.3778/j.issn.1673-9418.1205024

• Academic Research •

Clustering Chinese News Subtopic Integrating Content and Time Features

ZHONG Zhaoman (仲兆满)1+, LI Cunhua (李存华)1, DAI Hongwei (戴红伟)1, LIU Zongtian (刘宗田)2

  1. School of Computer Engineering, Huaihai Institute of Technology, Lianyungang, Jiangsu 222005, China
  2. School of Computer Engineering and Science, Shanghai University, Shanghai 200072, China
  • Online: 2013-04-01 Published: 2013-04-02

Abstract: A subtopic is a further division of a topic, and subtopic analysis is an emerging research direction that works at a finer granularity than topic analysis. Subtopic clustering is the basis for analyzing the evolution relations within a topic. This paper proposes a method for clustering Chinese news subtopics that integrates content features and time features. It focuses on how subtopic content features manifest in text, and studies the weight computation and dimension reduction of subtopic feature words. Experiments on 18 subtopics drawn from 5 topics show that the proposed method performs significantly better than existing subtopic clustering methods.

Key words: topic evolution, subtopic clustering, content features, time features
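
The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea of combining content and time features when clustering news reports. Everything in it is an assumption made for illustration and is not the authors' method: the cosine similarity over raw word counts, the exponential time decay with a hypothetical `half_life_days`, the linear mixing weight `alpha`, the `threshold`, and the greedy single-pass clustering loop.

```python
import math
from collections import Counter
from datetime import date


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def time_similarity(d1, d2, half_life_days=3.0):
    """Assumed decay: 1.0 for the same day, 0.5 at half_life_days apart."""
    gap = abs((d1 - d2).days)
    return 0.5 ** (gap / half_life_days)


def combined_similarity(doc_a, doc_b, alpha=0.7):
    """Linear mix of content and time similarity (alpha is illustrative)."""
    content = cosine_similarity(doc_a["words"], doc_b["words"])
    temporal = time_similarity(doc_a["date"], doc_b["date"])
    return alpha * content + (1 - alpha) * temporal


def single_pass_cluster(docs, threshold=0.5, alpha=0.7):
    """Greedy single-pass clustering; returns lists of document indices."""
    clusters = []
    for i, doc in enumerate(docs):
        best_score, best_cluster = 0.0, None
        for cluster in clusters:
            # Average similarity between the new report and the cluster members.
            score = sum(
                combined_similarity(doc, docs[j], alpha) for j in cluster
            ) / len(cluster)
            if score > best_score:
                best_score, best_cluster = score, cluster
        if best_cluster is not None and best_score >= threshold:
            best_cluster.append(i)
        else:
            clusters.append([i])  # start a new subtopic
    return clusters


# Toy reports about one topic (invented data, for illustration only).
docs = [
    {"words": Counter({"earthquake": 2, "rescue": 3}), "date": date(2010, 4, 14)},
    {"words": Counter({"earthquake": 1, "rescue": 2, "team": 1}), "date": date(2010, 4, 15)},
    {"words": Counter({"earthquake": 1, "reconstruction": 3, "fund": 2}), "date": date(2010, 5, 20)},
]
clusters = single_pass_cluster(docs)
print(clusters)
```

In this toy data the two rescue reports are close in both vocabulary and date, so they fall into one cluster, while the later reconstruction report starts a new one. A real system would first need Chinese word segmentation and a proper term-weighting scheme, neither of which is shown here.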