计算机科学与探索 (Journal of Frontiers of Computer Science and Technology), 2012, Vol. 6, Issue 12: 1076-1086. DOI: 10.3778/j.issn.1673-9418.2012.12.002

• Academic Research •

EDM: An Efficient Algorithm for Event Detection in Microblogs (EDM:高效的微博事件检测算法)

TONG Wei (童薇)+, CHEN Wei (陈威), MENG Xiaofeng (孟小峰)

  1. School of Information, Renmin University of China, Beijing 100872, China
  • Online: 2012-12-01 Published: 2012-12-03

Abstract: Microblog data are real-time and dynamic, so events in real life can be detected by analyzing them. At the same time, the massive volume, short texts, and rich social relationships of microblog data pose new challenges for event detection. This paper proposes EDM (event detection in microblogs), an event detection algorithm for microblog data that jointly considers textual features (retweets, comments, embedded shortened URLs, hashtags, and named entities), semantic features, temporal features, and social relationships. It also proposes a method for building an event summary from the key elements of an event: keywords, named entities, posting time, and user sentiment orientation. In experiments against an event detection algorithm based on the LDA (latent Dirichlet allocation) model, EDM achieves better event detection results and provides more intuitive, readable event summaries.

Key words: event detection, event summarization, feature selection, microblog
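
The abstract names four elements of an event summary (keywords, named entities, posting time, and user sentiment) but does not describe how they are extracted. The following is only a minimal sketch of what such a summary could look like as a data structure, assuming the posts of one detected event have already been tokenized, tagged with named entities, and labeled with a sentiment. The names `Post`, `EventSummary`, and `summarize_event`, and the frequency-count and majority-vote rules, are hypothetical illustrations and not the paper's method.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Post:
    """One microblog post belonging to a detected event (hypothetical schema)."""
    tokens: list[str]      # pre-tokenized words of the post
    entities: list[str]    # named entities found in the post
    posted_at: datetime    # posting time
    sentiment: str         # e.g. "positive", "negative", "neutral"


@dataclass
class EventSummary:
    """The four summary elements listed in the abstract."""
    keywords: list[str]
    entities: list[str]
    start: datetime
    end: datetime
    sentiment: str


def summarize_event(posts: list[Post], top_k: int = 3) -> EventSummary:
    """Build a summary by simple counting (an assumption, not the EDM procedure)."""
    if not posts:
        raise ValueError("an event needs at least one post")

    keyword_counts = Counter(t for p in posts for t in p.tokens)
    entity_counts = Counter(e for p in posts for e in p.entities)
    sentiment_counts = Counter(p.sentiment for p in posts)
    times = [p.posted_at for p in posts]

    return EventSummary(
        keywords=[w for w, _ in keyword_counts.most_common(top_k)],
        entities=[e for e, _ in entity_counts.most_common(top_k)],
        start=min(times),                                  # earliest post
        end=max(times),                                    # latest post
        sentiment=sentiment_counts.most_common(1)[0][0],   # majority label
    )
```

Calling `summarize_event(posts, top_k=2)` on the posts of a single event returns the two most frequent keywords and entities, the time span covered by the posts, and the most common sentiment label. A real system would likely weight terms rather than count them, but the abstract gives no detail on that point.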