Imbalanced Fake Reviews Detection with Ensemble Hierarchical Graph Attention Network

doi:10.3778/j.issn.1673-9418.2104090

Abstract

Graph neural networks (GNNs), an active topic in machine learning, have recently been applied to fraud detection on user reviews. In practice, collected user reviews span many domains and carry complex information, and fraudulent content makes up only a small minority of the massive volume of user-generated content, so existing GNN-based methods detect fake reviews poorly. To address heterogeneous features and imbalanced data distribution, this paper models the review system as a heterogeneous network and proposes a new ensemble hierarchical graph attention network (En-HGAN) detection method. Hierarchical attention exploits the user behavior information in the heterogeneous network to learn semantically richer representations of reviews. Multiple diverse HGAN sub-models are then aggregated under the Bagging framework, with random under-sampling providing diversity among the base learners, which reduces the loss of useful information and strengthens the detection of fraudulent reviews. Experiments on the real-world YelpChi and Amazon datasets show that En-HGAN achieves good anomaly detection performance and, compared with several state-of-the-art methods, is robust to fraudulent entities when the class distribution is skewed.

Key words: fake review detection, hierarchical graph attention network, network representation learning, ensemble learning, imbalanced data classification
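
The Bagging step with random under-sampling mentioned in the abstract can be illustrated with a minimal sketch. It covers only the ensemble part, not the hierarchical graph attention model. `CentroidScorer` is a hypothetical stand-in for one HGAN sub-model, and averaging the sub-model scores is an assumption, since the abstract does not state how the outputs are combined.

```python
import math
import random


def undersample(samples, labels, rng):
    """Keep every minority-class sample plus an equally sized random draw
    (without replacement) from the majority class."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    keep = minority + rng.sample(majority, len(minority))
    return [samples[i] for i in keep], [labels[i] for i in keep]


def centroid(points):
    """Component-wise mean of a list of feature vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


class CentroidScorer:
    """Hypothetical stand-in for one HGAN sub-model (not the paper's model)."""

    def fit(self, X, y):
        self.mu_pos = centroid([x for x, t in zip(X, y) if t == 1])
        self.mu_neg = centroid([x for x, t in zip(X, y) if t == 0])
        return self

    def score(self, x):
        # Fraud score in (0, 1): higher when x lies closer to the fraud centroid.
        gap = math.dist(x, self.mu_neg) - math.dist(x, self.mu_pos)
        return 1.0 / (1.0 + math.exp(-gap))


def bagging_scores(X_train, y_train, X_test, n_models=5, seed=0):
    """Train n_models base learners, each on its own balanced subsample,
    and average their fraud scores (averaging is an assumption)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        X_bal, y_bal = undersample(X_train, y_train, rng)
        models.append(CentroidScorer().fit(X_bal, y_bal))
    return [sum(m.score(x) for m in models) / n_models for x in X_test]
```

Because each base learner sees a different random subset of the majority class, the ensemble as a whole uses more of the majority-class data than any single under-sampled model would.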

ZHAO Min, ZHANG Yueqin, DOU Yingtong, ZHANG Zehua. Imbalanced Fake Reviews Detection with Ensemble Hierarchical Graph Attention Network[J]. Journal of Frontiers of Computer Science and Technology, 2023, 17(2): 428-441.

赵敏, 张月琴, 窦英通, 张泽华. 集成层级图注意力网络检测非均衡虚假评论[J]. 计算机科学与探索, 2023, 17(2): 428-441.

ZHAO Min, born in 1997 in Weinan, Shaanxi, M.S. candidate. Her research interests include deep learning and anomaly detection.