Journal of Frontiers of Computer Science and Technology ›› 2017, Vol. 11 ›› Issue (1): 91-98. DOI: 10.3778/j.issn.1673-9418.1509090

• Network and Information Security •

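A Negative Information Classification Method for Micro Blog Evolutionary Networks

ZHAO Yi1, HE Keqing1, LI Zhao2+, HUANG Yiwang1

  1. State Key Laboratory of Software Engineering, Computer School, Wuhan University, Wuhan 430072, China
  2. College of Computer and Information Technology, Three Gorges University, Yichang, Hubei 443002, China
  • Online: 2017-01-01 Published: 2017-01-10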

Micro Blog Evolutionary Network to Classification Method of Negative Information

ZHAO Yi1, HE Keqing1, LI Zhao2+, HUANG Yiwang1   

  1. State Key Laboratory of Software Engineering, Computer School, Wuhan University, Wuhan 430072, China
  2. College of Computer and Information Technology, Three Gorges University, Yichang, Hubei 443002, China
  • Online: 2017-01-01 Published: 2017-01-10

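Abstract: Based on the retweet relationships among Sina Weibo posts, this paper builds an evolutionary network over the posts that users forward and applies the SMO SVM (sequential minimal optimization support vector machine) classification algorithm to classify the posts, filtering out malicious posts, spam advertisements, and spam marketing messages so that users can accurately block unwanted posts and bloggers. First, posts across Sina Weibo are classified using the retweet-based evolutionary network and the SVM classification algorithm. Second, complex network techniques are used to label bloggers who frequently send malicious advertisements, so that they can be blocked in the network. Finally, the sources of spam are located and each blogger is judged to be a malicious forwarder or not, which helps curb the spread of spam at the macro level. Compared against actual user feedback from the UCI data set, the experimental results show that the machine learning classification reaches 89% agreement.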

Key words: sequential minimal optimization (SMO), support vector machine (SVM), evolutionary network, UCI data set, negative information

Abstract: Using the forwarding relationships among Sina micro blog posts, this paper builds an evolving network of the posts that users forward and classifies those posts with the SMO SVM (sequential minimal optimization support vector machine) algorithm, identifying malicious posts, spam, and junk marketing information. The method enables users to accurately block unwanted posts and bloggers. In the first step, posts across Sina micro blog are classified based on the evolving network of forwarding relationships and the SVM classification algorithm. In the second step, bloggers who often send malicious advertisements are annotated using complex network techniques, so that they can be blocked in the network when they send messages. Finally, the sources of spam are found and each blogger is identified as malicious or not, which better curbs the spread of spam at the macro level. The results are compared with actual user feedback from the UCI data set; the agreement of the machine learning classification results reaches 89%.
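The second and third steps of the abstract (annotating bloggers and finding the source of spam) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the record layout, the names `build_network`, `trace_source`, and `flag_bloggers`, the toy data, and the `min_count` threshold are hypothetical and do not come from the paper, which does not describe its data structures in the abstract.

```python
from collections import defaultdict

# Hypothetical forwarding records: (post_id, author, parent_post_id).
# parent_post_id is None for an original (non-forwarded) post.
RECORDS = [
    ("p1", "alice", None),
    ("p2", "bob", "p1"),
    ("p3", "carol", "p2"),
    ("p4", "dave", None),
    ("p5", "bob", "p4"),
    ("p6", "erin", "p5"),
]

def build_network(records):
    """Return (parent, author) lookup tables describing the forwarding forest."""
    parent = {pid: par for pid, _, par in records}
    author = {pid: who for pid, who, _ in records}
    return parent, author

def trace_source(post_id, parent):
    """Follow forwarding edges upward until the original post is reached."""
    while parent[post_id] is not None:
        post_id = parent[post_id]
    return post_id

def flag_bloggers(records, negative_posts, min_count=2):
    """Flag bloggers who posted or forwarded at least `min_count` negative posts.

    `negative_posts` stands in for the classifier output; `min_count` is an
    illustrative threshold, not a value taken from the paper.
    """
    counts = defaultdict(int)
    for pid, who, _ in records:
        if pid in negative_posts:
            counts[who] += 1
    return {who for who, n in counts.items() if n >= min_count}

parent, author = build_network(RECORDS)
negative = {"p1", "p2", "p4", "p5"}  # assumed classifier output
sources = {author[trace_source(p, parent)] for p in negative}
flagged = flag_bloggers(RECORDS, negative)
print(sorted(sources), sorted(flagged))
```

In this toy data, the original authors of the negative posts are reported as sources, while a blogger who repeatedly forwards negative posts is flagged separately, mirroring the distinction the abstract draws between spam sources and malicious forwarders.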

Key words: sequential minimal optimization (SMO), support vector machine (SVM), evolutionary network, UCI data set, negative information
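To make the SMO SVM keyword concrete, the following is a minimal sketch of a simplified SMO solver for a linear soft-margin SVM in plain Python. It is not the authors' implementation: the second index is chosen by a simple scan instead of Platt's heuristics, only a linear kernel is used, and the two features and the toy data are invented for illustration (label +1 for a negative-information post, -1 for a normal post).

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def train_smo(X, y, C=1.0, tol=1e-3, max_sweeps=200):
    """Simplified SMO for a linear soft-margin SVM; returns (w, b)."""
    n = len(X)
    K = [[dot(a, c) for c in X] for a in X]  # linear kernel matrix
    alpha = [0.0] * n
    b = 0.0

    def f(i):
        # Decision value for training point i under the current alpha and b.
        return sum(alpha[k] * y[k] * K[k][i] for k in range(n)) + b

    for _ in range(max_sweeps):
        changed = 0
        for i in range(n):
            Ei = f(i) - y[i]
            violates = (y[i] * Ei < -tol and alpha[i] < C) or \
                       (y[i] * Ei > tol and alpha[i] > 0)
            if not violates:
                continue
            for j in range(n):  # scan for a partner that allows progress
                if j == i:
                    continue
                Ej = f(j) - y[j]
                ai, aj = alpha[i], alpha[j]
                if y[i] != y[j]:
                    lo, hi = max(0.0, aj - ai), min(C, C + aj - ai)
                else:
                    lo, hi = max(0.0, ai + aj - C), min(C, ai + aj)
                eta = 2 * K[i][j] - K[i][i] - K[j][j]
                if lo >= hi or eta >= 0:
                    continue
                # Optimize alpha_j analytically, then clip to the box [lo, hi].
                aj_new = min(hi, max(lo, aj - y[j] * (Ei - Ej) / eta))
                if abs(aj_new - aj) < 1e-7:
                    continue
                ai_new = ai + y[i] * y[j] * (aj - aj_new)
                b1 = b - Ei - y[i] * (ai_new - ai) * K[i][i] - y[j] * (aj_new - aj) * K[i][j]
                b2 = b - Ej - y[i] * (ai_new - ai) * K[i][j] - y[j] * (aj_new - aj) * K[j][j]
                alpha[i], alpha[j] = ai_new, aj_new
                if 0 < ai_new < C:
                    b = b1
                elif 0 < aj_new < C:
                    b = b2
                else:
                    b = (b1 + b2) / 2
                changed += 1
                break
        if changed == 0:  # a full sweep without updates: stop
            break
    w = [sum(alpha[k] * y[k] * X[k][d] for k in range(n)) for d in range(len(X[0]))]
    return w, b

def predict(w, b, x):
    return 1 if dot(w, x) + b >= 0 else -1

# Hypothetical features per post: (links in post, promotional keywords).
X = [(1, 1), (1, 2), (2, 1), (4, 4), (5, 4), (4, 5)]
y = [-1, -1, -1, 1, 1, 1]
w, b = train_smo(X, y)
print([predict(w, b, x) for x in X])
```

Each update optimizes two Lagrange multipliers jointly so that the equality constraint of the SVM dual stays satisfied, which is the core idea of SMO; a production system would more likely rely on an established SVM library than on a hand-written solver.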