Journal of Frontiers of Computer Science and Technology, 2016, Vol. 10, Issue (4): 573-581. DOI: 10.3778/j.issn.1673-9418.1509078

• Artificial Intelligence and Pattern Recognition •

Micro-Blog Hot Topic Detection Based on Heat Co-ranking

LIU Peiyu1,2, HOU Xiuyan1,2+, ZHU Zhenfang3, LIU Fang1,2, CAI Xiaohong1,2   

  1. School of Information Science & Engineering, Shandong Normal University, Jinan 250014, China
  2. Shandong Provincial Key Laboratory for Distributed Computer Software Novel Technology, Jinan 250014, China
  3. College of Information Science and Electrical Engineering, Shandong Traffic Institute, Jinan 250357, China
  • Online: 2016-04-01 Published: 2016-04-01

Abstract: Micro-blog hot topic detection plays an important role in public opinion analysis and opinion mining. To reduce the impact of data sparsity on topic detection, this paper proposes an approach to micro-blog hot topic detection based on heat co-ranking and builds a unified model framework that integrates the relationships between micro-blog texts and topic keywords. The approach considers both the authority of micro-blog users and the time-period characteristics of topic keywords, so that the heat of micro-blog texts and the heat of topic keywords are ranked jointly and reinforce each other. Topic keywords in the resulting heat sequence are then clustered, using the combination support of topic keywords as a threshold, to represent hot topics. Experimental results show that the proposed method extracts hot keywords and detects hot topics with high accuracy, and that it can discover micro-blog hot topics within a specific time period in a timely and effective manner.

Key words: hot topic, topic keywords, micro-blog text, co-ranking, heat sequence
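
The two steps named in the abstract, mutually reinforcing heat ranking followed by support-thresholded keyword clustering, can be illustrated with a minimal sketch. It is not the authors' model: the update rule is a generic HITS-style iteration, the time-period factor is omitted, and the names `co_rank`, `cluster_keywords`, `authority`, and `min_support` are hypothetical, chosen only for this illustration.

```python
from itertools import combinations


def co_rank(texts, authority, iterations=50):
    """Jointly rank posts and keywords by mutual reinforcement.

    texts: list of keyword sets, one per post (assumed input format).
    authority: positive weight per post author (assumed, not from the paper).
    """
    vocab = sorted(set().union(*texts))
    text_heat = [1.0] * len(texts)
    word_heat = {w: 1.0 for w in vocab}
    for _ in range(iterations):
        # A keyword is hot when the posts that contain it are hot.
        new_word = {w: 0.0 for w in vocab}
        for heat, words in zip(text_heat, texts):
            for w in words:
                new_word[w] += heat
        norm = sum(new_word.values()) or 1.0
        word_heat = {w: h / norm for w, h in new_word.items()}
        # A post is hot when its keywords are hot, scaled by author weight.
        new_text = [a * sum(word_heat[w] for w in words)
                    for a, words in zip(authority, texts)]
        norm = sum(new_text) or 1.0
        text_heat = [h / norm for h in new_text]
    return text_heat, word_heat


def cluster_keywords(texts, keywords, min_support):
    """Merge keywords whose pairwise co-occurrence support reaches min_support."""
    parent = {w: w for w in keywords}

    def find(w):
        # Union-find lookup with path halving.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    n = len(texts)
    for a, b in combinations(keywords, 2):
        # Support = fraction of posts that contain both keywords.
        support = sum(1 for words in texts if a in words and b in words) / n
        if support >= min_support:
            parent[find(a)] = find(b)
    groups = {}
    for w in keywords:
        groups.setdefault(find(w), set()).add(w)
    return list(groups.values())
```

In this sketch each resulting keyword group stands for one candidate topic. Pairwise co-occurrence support is used as a stand-in for the combination support mentioned in the abstract, whose exact definition is not given here.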