Journal of Frontiers of Computer Science and Technology ›› 2022, Vol. 16 ›› Issue (6): 1334-1342. DOI: 10.3778/j.issn.1673-9418.2101030

• Artificial Intelligence •

Rumor Detection Based on Representative User Characteristics Learning Through Propagation

XIE Xintong1,2, HU Yueyang2,5, LIU Xuanzhe1,2, ZHAO Yaoshuai3,4, JIANG Hai’ou5,6,+

    1. School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China
    2. Key Laboratory of High Confidence Software Technologies of Ministry of Education (Peking University), Beijing 100871, China
    3. TravelSky Technology Limited, Beijing 101318, China
    4. Key Laboratory of Intelligent Passenger Service of Civil Aviation, Civil Aviation Administration of China, Beijing 101318, China
    5. School of Software and Microelectronics, Peking University, Beijing 102600, China
    6. Peking University Information Technology Institute (Tianjin Binhai), Tianjin 300452, China
  • Received: 2021-01-07 Revised: 2021-04-25 Online: 2022-06-01 Published: 2021-04-29
  • About the authors: XIE Xintong, born in 1998, M.S. candidate. Her research interest is data analysis.
    HU Yueyang, born in 1997, M.S. candidate. His research interests include big data and blockchain.
    LIU Xuanzhe, born in 1980, Ph.D., associate professor. His research interests include services computing and system software.
    ZHAO Yaoshuai, born in 1977, M.S., senior software engineer. His research interests include big data and artificial intelligence.
    JIANG Hai’ou, born in 1987, Ph.D., assistant researcher. Her research interests include cloud computing, big data and machine learning.
  • Supported by:
    National Key Research and Development Program of China (2018YFB1004400); Beijing Outstanding Young Scientist Program (BJJWZYJH01201910001004)

传播用户代表性特征学习的谣言检测方法

谢欣彤1,2, 胡悦阳2,5, 刘譞哲1,2, 赵耀帅3,4, 姜海鸥5,6,+

    1. 北京大学 信息科学技术学院,北京 100871
    2. 高可信软件技术教育部重点实验室(北京大学),北京 100871
    3. 中国民航信息网络股份有限公司,北京 101318
    4. 中国民用航空局 民航旅客服务智能化应用技术重点实验室,北京 101318
    5. 北京大学 软件与微电子学院,北京 102600
    6. 北京大学(天津滨海)新一代信息技术研究院,天津 300452
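  • Corresponding author: + E-mail: seagullwill@foxmail.com
  • About the authors: 谢欣彤 (born 1998), female, from Dabu, Guangdong, M.S. candidate. Her main research interest is data analysis.
    胡悦阳 (born 1997), male, from Bengbu, Anhui, M.S. candidate. His main research interests are big data and blockchain.
    刘譞哲 (born 1980), male, from Lanzhou, Gansu, Ph.D., associate professor. His main research interests are services computing and system software.
    赵耀帅 (born 1977), male, from Jining, Shandong, M.S., senior software engineer. His main research interests are big data and artificial intelligence.
    姜海鸥 (born 1987), female, from Dandong, Liaoning, Ph.D., assistant researcher. Her main research interests are cloud computing, big data and machine learning.
  • Funding:
    National Key Research and Development Program of China (2018YFB1004400); Beijing Outstanding Young Scientist Program (BJJWZYJH01201910001004)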

Abstract:

Effective rumor detection and management has become an essential part of the Internet Plus government services initiative. The Internet era has made communication far more convenient, but it has also accelerated the propagation of rumors, which interferes with people's daily lives and undermines social trust. Existing rumor debunking on the Internet relies mostly on manual work, namely public tip-offs and screening, which is time-consuming and labor-intensive. Meanwhile, rumor detection algorithms based on data mining and machine learning depend heavily on text content, which is scarce during the early stage of rumor propagation. This paper constructs a new dataset, Weibo2020, composed of both rumors and normal information, extracts representative user characteristics through statistical analysis, and then proposes an early-stage rumor detection algorithm based on brief propagation paths, named RPPC (representative propagation path classification). The experimental results indicate that, compared under the same conditions with an algorithm that is likewise based on propagation paths, the proposed method improves prediction accuracy by 2.57 percentage points while reducing the input data scale by 50%. The proposed method can also assess the authenticity of messages posted within 5 minutes, with an accuracy of about 80%. The proposed method therefore achieves good results on a dataset of limited size and can, to some degree, help with the governance of online public opinion and improve the efficiency and quality of government services.

Key words: rumor detection, machine learning, characteristic analysis, propagation path, Internet plus government services initiative, public opinion management
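
The abstract describes classifying a message from a brief propagation path of user characteristics collected early in its spread. The following is a minimal sketch of how such an input could be assembled, not the authors' implementation: the `Repost` type, the feature names (`followers`, `verified`), the zero-padding scheme, and the `max_users` path length are all hypothetical, and only the 5-minute window is taken from the abstract. The classifier that would consume the path is out of scope here.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Repost:
    """One repost event; all field names are hypothetical."""
    seconds_after_source: float      # delay relative to the original post
    user_features: Dict[str, float]  # per-user statistics (illustrative)


def build_propagation_path(
    reposts: Sequence[Repost],
    feature_names: Sequence[str],
    window_seconds: float = 300.0,   # 5 minutes, as mentioned in the abstract
    max_users: int = 4,              # illustrative path length, not from the paper
) -> List[List[float]]:
    """Return a fixed-length sequence of user feature vectors.

    Keeps only reposts inside the time window, orders them by time,
    truncates to max_users, and zero-pads paths that are too short.
    """
    early = sorted(
        (r for r in reposts if 0 <= r.seconds_after_source <= window_seconds),
        key=lambda r: r.seconds_after_source,
    )[:max_users]
    path = [
        [float(r.user_features.get(name, 0.0)) for name in feature_names]
        for r in early
    ]
    padding = [[0.0] * len(feature_names) for _ in range(max_users - len(path))]
    return path + padding


example = [
    Repost(400.0, {"followers": 10.0, "verified": 0.0}),   # outside the window
    Repost(30.0, {"followers": 1200.0, "verified": 1.0}),
    Repost(120.0, {"followers": 85.0}),                    # missing feature -> 0.0
]
path = build_propagation_path(example, ["followers", "verified"], max_users=3)
print(path)
```

A fixed-length path keeps the input shape constant regardless of how many users have reposted so far, which is one common way to feed early, sparse propagation data to a sequence classifier.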

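Abstract (Chinese version, translated):

The timely detection and effective control of rumors is an important part of public opinion governance within Internet Plus government services. The development of the Internet and the mobile Internet has made communication more convenient, but it has also accelerated the speed and widened the reach of rumor propagation, greatly increasing the influence and harm of rumors, disrupting people's work and daily lives, and seriously affecting social order. Existing rumor-debunking work on online platforms relies mostly on manual reporting and screening, which often consumes a great deal of time and effort. Rumor detection algorithms built on data mining and machine learning are mostly based on text information and are commonly used for retrospective rumor detection, so they are ill-suited to the early stage of rumor diffusion, when data is insufficient. This paper first collects and annotates the latest online platform data to construct the dataset Weibo2020, statistically analyzes the distribution of user characteristics in it, and selects representative user characteristics; it then proposes an early rumor detection method based on learning the representative characteristics of propagating users (RPPC). Experiments show that, compared under the same conditions with an algorithm that is likewise based on propagation paths, RPPC improves accuracy by 2.57 percentage points while reducing the input data scale by 50%. In addition, the method can examine messages posted within 5 min, quickly identifying suspected rumors in Internet content with an accuracy of nearly 80%. The proposed method can therefore be considered to perform well on the existing dataset and can, to some extent, assist government departments in public opinion governance, improving the timeliness and quality of government services.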

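Keywords (translated): rumor detection, machine learning, feature analysis, propagation path, Internet Plus government services, public opinion governance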

CLC Number: