计算机科学与探索 (Journal of Frontiers of Computer Science and Technology), 2013, Vol. 7, Issue (8): 747-753. DOI: 10.3778/j.issn.1673-9418.1305004

• Academic Research •

Extracting Overlapping Topics from Micro-Blog Based on Mixture Model (混合模型的微博交叉话题发现)

ZHAN Yong (詹勇), YANG Yan+ (杨燕), WANG Hongjun (王红军)

  1. School of Information Science and Technology, Southwest Jiaotong University, Chengdu 610031, China
  • Online: 2013-08-01  Published: 2013-08-06

Abstract: Micro-blogs have become a new platform for sharing and disseminating information quickly, and they carry a huge volume of scattered, diverse information. Most traditional topic extraction algorithms are partition-based: each post is assigned to exactly one topic, so relationships between topics are not considered, which limits these methods. This paper therefore focuses on news topic extraction from large-scale collections of short micro-blog posts. The text is first segmented with the Chinese word segmentation system developed by the Institute of Noetics and Wisdom, Southwest Jiaotong University, which offers high segmentation accuracy and ambiguity recognition. An overlapping topic detection algorithm based on a mixture model is then proposed. The experimental results indicate that the algorithm is feasible and effective.

Key words: micro-blog, overlapping topic detection, mixture model
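
The abstract does not say which mixture model the paper uses, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a PLSA-style mixture fitted with EM, posts that are already segmented into tokens, and a hypothetical `threshold` parameter: a post is attached to every topic whose weight reaches the threshold, so one post may belong to several topics. All function names and the toy posts are illustrative.

```python
import random
from collections import Counter


def normalize(values):
    """Scale a list of positive numbers so that it sums to 1."""
    total = sum(values)
    return [v / total for v in values]


def fit_plsa(docs, n_topics, n_iter=50, seed=0):
    """Fit a small PLSA-style mixture with EM.

    Returns (p_z_d, p_w_z, vocab), where p_z_d[d][z] approximates P(topic z | post d)
    and p_w_z[z][w] approximates P(word w | topic z).
    """
    rng = random.Random(seed)
    counts = [Counter(doc) for doc in docs]
    vocab = sorted({w for doc in docs for w in doc})

    # Random positive initialisation of both distributions.
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(n_topics)]) for _ in docs]
    p_w_z = [
        dict(zip(vocab, normalize([rng.random() + 0.1 for _ in vocab])))
        for _ in range(n_topics)
    ]

    eps = 1e-9  # small smoothing constant so no probability collapses to zero
    for _ in range(n_iter):
        new_z_d = [[eps] * n_topics for _ in docs]
        new_w_z = [{w: eps for w in vocab} for _ in range(n_topics)]
        for d, doc_counts in enumerate(counts):
            for w, n in doc_counts.items():
                # E-step: posterior P(z | d, w) for this (post, word) pair.
                post = normalize([p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)])
                # Accumulate expected counts for the M-step.
                for z in range(n_topics):
                    new_z_d[d][z] += n * post[z]
                    new_w_z[z][w] += n * post[z]
        # M-step: renormalise the expected counts.
        p_z_d = [normalize(row) for row in new_z_d]
        p_w_z = [dict(zip(vocab, normalize([t[w] for w in vocab]))) for t in new_w_z]
    return p_z_d, p_w_z, vocab


def overlapping_topics(weights, threshold=0.3):
    """Return every topic index with weight >= threshold, plus the top topic."""
    best = max(range(len(weights)), key=weights.__getitem__)
    return sorted({best} | {z for z, p in enumerate(weights) if p >= threshold})


# Toy, pre-segmented posts (illustrative data, not from the paper).
posts = [
    ["earthquake", "rescue", "relief"],
    ["earthquake", "relief", "donation"],
    ["football", "match", "goal"],
    ["football", "league", "goal"],
    ["football", "match", "donation", "earthquake", "relief"],
]

p_z_d, p_w_z, vocab = fit_plsa(posts, n_topics=2)
for post, weights in zip(posts, p_z_d):
    print(post, [round(p, 3) for p in weights], overlapping_topics(weights))
```

A partition-based method would keep only the highest-weight topic for each post. Keeping every topic above the threshold is what lets a post that mixes two subjects appear under both, at the cost of one more parameter to tune.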