Journal of Frontiers of Computer Science and Technology ›› 2023, Vol. 17 ›› Issue (2): 428-441. DOI: 10.3778/j.issn.1673-9418.2104090

• Artificial Intelligence · Pattern Recognition •

Imbalanced Fake Reviews Detection with Ensemble Hierarchical Graph Attention Network

ZHAO Min, ZHANG Yueqin, DOU Yingtong, ZHANG Zehua

  1. College of Information and Computer, Taiyuan University of Technology, Taiyuan 030024, China
  2. Department of Computer Science, University of Illinois at Chicago, Chicago 60607, USA
  • Online:2023-02-01 Published:2023-02-01

Abstract: Graph neural networks (GNNs), a current focus of machine learning research, have recently been applied to fraud detection on user reviews. In practice, collected reviews span many different domains and carry complex, heterogeneous information, and fraudulent content makes up only a small fraction of the massive volume of user-generated content, so existing GNN-based detection methods identify fake reviews poorly. To address feature heterogeneity and imbalanced data distribution, the review system is modeled as a heterogeneous network and a new ensemble hierarchical graph attention network (En-HGAN) detection method is proposed. A hierarchical attention structure makes fuller use of the rich user behavior information in the heterogeneous network to learn semantically richer representations of reviews. Multiple differentiated HGAN sub-models are then aggregated under the Bagging ensemble learning framework, with random under-sampling used to diversify the base learners, which reduces the loss of useful information and strengthens the detection of fraudulent reviews. Experimental results on the real-world YelpChi and Amazon datasets show that En-HGAN achieves good anomaly detection performance and, compared with several recent methods, is robust in detecting fraudulent entities when the class distribution is skewed.

Key words: fake review detection, hierarchical graph attention network, network representation learning, ensemble learning, imbalanced data classification
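
The ensemble step named in the abstract, Bagging over base learners that are each trained on a randomly under-sampled subset, can be illustrated with a minimal sketch. This is not the authors' implementation: the base learner here is a toy nearest-centroid classifier standing in for an HGAN sub-model, the aggregation is a simple majority vote (the abstract does not specify the aggregation rule), and all names (`undersample`, `CentroidClassifier`, `fit_bagging`, `predict_bagging`) and the toy data are hypothetical.

```python
import random
from statistics import mean


def undersample(labels, rng):
    """Return indices of a balanced subset: every minority-class item plus
    an equally sized random draw (without replacement) from the majority."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    return minority + rng.sample(majority, len(minority))


class CentroidClassifier:
    """Placeholder base learner (NOT an HGAN): assigns a point to the class
    whose training centroid is nearer in squared Euclidean distance."""

    def fit(self, X, y):
        dims = range(len(X[0]))
        self.c1 = [mean(x[d] for x, t in zip(X, y) if t == 1) for d in dims]
        self.c0 = [mean(x[d] for x, t in zip(X, y) if t == 0) for d in dims]
        return self

    def predict(self, x):
        d1 = sum((a - b) ** 2 for a, b in zip(x, self.c1))
        d0 = sum((a - b) ** 2 for a, b in zip(x, self.c0))
        return 1 if d1 < d0 else 0


def fit_bagging(X, y, n_models=5, seed=0):
    """Train n_models base learners, each on its own under-sampled subset,
    so every learner sees all fraud items but different benign items."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = undersample(y, rng)
        models.append(
            CentroidClassifier().fit([X[i] for i in idx], [y[i] for i in idx])
        )
    return models


def predict_bagging(models, x):
    """Aggregate the base learners by majority vote (an assumption)."""
    votes = sum(m.predict(x) for m in models)
    return 1 if 2 * votes > len(models) else 0


# Toy imbalanced data: 95 benign points near the origin, 5 fraud points
# near (3, 3). Label 1 marks the minority (fraud) class.
X = [[(i % 10) * 0.1, (i // 10) * 0.1] for i in range(95)]
X += [[3.0 + 0.1 * k, 3.0 - 0.1 * k] for k in range(5)]
y = [0] * 95 + [1] * 5

models = fit_bagging(X, y)
print(predict_bagging(models, [3.0, 3.0]), predict_bagging(models, [0.2, 0.2]))
```

Because each base learner draws a different random subset of the majority class, the ensemble as a whole sees more of the benign data than any single under-sampled model would, which is the intuition behind reducing information loss through under-sampling inside Bagging.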