Method of Credit Fraud Detection by Combining Sub-graph Selection and Neighborhood Filtering

doi:10.3778/j.issn.1673-9418.2402040

Abstract

Abstract: Credit fraud detection is an active and difficult research topic in financial fraud detection, particularly in large-scale financial credit transaction scenarios. Two problems have long been major challenges in credit fraud review: the number of fraudster nodes is extremely imbalanced, and fraudster nodes disguise (camouflage) themselves. This paper therefore proposes RSRF-GNN, a graph neural network that integrates reconstructed sub-graph selection and reinforced neighborhood filtering, for large-scale dynamic Internet financial credit graphs. To improve the effectiveness of credit fraud review, the method first defines the imbalanced distribution of fraudster counts and the fraud camouflage problem from a data perspective. It then builds a reconstructed balanced sub-graph selection module from node category and in-/out-degree information to address the imbalance in the number of fraudsters. Next, to counter fraud camouflage, it introduces a reinforcement learning framework and designs a neighborhood filtering module that dynamically filters neighboring node embeddings. In addition, an edge aggregation module aggregates the embeddings of the edges in the central node's neighborhood, further enriching the central node's neighborhood representation. Finally, experiments on the real-world dataset DGraph-Fin show that RSRF-GNN is significantly more effective than existing models: it improves AUC by 5 to 8 percentage points and AP by 18 to 29 percentage points.
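The abstract says only that the balanced sub-graph selection module is built from node category and in-/out-degree information. The Python sketch below shows one way such a selection could work. It illustrates the idea under stated assumptions and is not RSRF-GNN's actual module: the function names, the `ratio` parameter, and the rule of keeping every fraudster while preferring high-degree normal users are all hypothetical.

```python
from collections import Counter


def degree_stats(edges):
    """Count in-degree and out-degree for every node in a directed edge list."""
    in_deg, out_deg = Counter(), Counter()
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    return in_deg, out_deg


def select_balanced_subgraph(labels, edges, ratio=1.0):
    """Hypothetical balanced sub-graph selection (not the paper's exact module).

    labels: dict mapping node id -> 1 (fraudster) or 0 (normal user)
    edges:  list of directed (src, dst) pairs
    ratio:  assumed number of normal nodes kept per fraudster node
    """
    in_deg, out_deg = degree_stats(edges)
    fraud = [n for n, y in labels.items() if y == 1]
    normal = [n for n, y in labels.items() if y == 0]
    # Assumption: prefer normal nodes with larger total degree (in + out),
    # breaking ties by node id so the result is deterministic.
    normal.sort(key=lambda n: (-(in_deg[n] + out_deg[n]), n))
    k = min(len(normal), round(ratio * len(fraud)))
    keep = set(fraud) | set(normal[:k])
    # Induced sub-graph: keep only edges whose two endpoints were selected.
    sub_edges = [(s, d) for s, d in edges if s in keep and d in keep]
    return keep, sub_edges


labels = {0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 0}
edges = [(0, 1), (0, 2), (0, 4), (1, 2), (3, 4), (5, 0)]
nodes, sub = select_balanced_subgraph(labels, edges)
print(sorted(nodes), sub)  # [0, 4] [(0, 4)]
```

Sorting by degree keeps the example deterministic and easy to check; random sampling weighted by degree would be an equally plausible reading of the abstract.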

Key words: credit fraud detection, large-scale dynamic graph, distribution imbalance, fraud disguise, graph neural network

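Abstract: Credit fraud review is a research focus and a difficult problem in financial fraud detection, especially in large-scale financial credit transaction scenarios. However, the extremely imbalanced distribution of fraudster nodes and the self-camouflage of fraudster nodes during credit fraud review have long been major challenges. Accordingly, for large-scale dynamic Internet financial credit graphs, a graph neural network that fuses reconstructed sub-graph selection and reinforced neighborhood filtering (RSRF-GNN) is proposed to improve the effectiveness of credit fraud review. The method defines the imbalance in the number of fraudsters and the fraud camouflage problem from a data perspective. A reconstructed balanced sub-graph selection module, designed from node category and in-/out-degree information, addresses the imbalance in the number of fraudsters. For the fraud camouflage problem, a neighborhood filtering module that dynamically filters neighboring node embeddings is designed. An edge aggregation module aggregates the neighborhood edge embeddings of the central node, further improving the expressiveness of the central node's neighborhood embedding. To verify the effectiveness of RSRF-GNN, experiments are conducted on the real-world dataset DGraph-Fin. The results show that RSRF-GNN substantially outperforms existing models, improving AUC by 5 to 8 percentage points and AP by 18 to 29 percentage points, a significant performance advantage.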
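For fraud camouflage, the abstract describes a neighborhood filter that dynamically drops neighbor embeddings under a reinforcement learning framework, followed by an edge aggregation module. The sketch below substitutes a simplified scheme in the spirit of CARE-GNN's similarity-based neighbor selector: neighbors are ranked by cosine similarity to the center node, a keep ratio is adjusted by a +1/-1 reward depending on whether the mean similarity of kept neighbors improved, and the center node's neighborhood edge embeddings are averaged and concatenated. Every name here (`filter_neighbors`, `update_keep_ratio`, `aggregate`), the step size 0.1, and the use of cosine similarity as the filter score are hypothetical choices, not details taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def filter_neighbors(center, neighbors, emb, keep_ratio):
    """Keep the ceil(keep_ratio * |N|) neighbors most similar to the center node."""
    if not neighbors:
        return []
    ranked = sorted(neighbors, key=lambda n: cosine(emb[center], emb[n]), reverse=True)
    k = max(1, math.ceil(keep_ratio * len(ranked)))
    return ranked[:k]


def update_keep_ratio(keep_ratio, prev_sim, curr_sim, step=0.1):
    """Simplified reward-driven update: if the mean similarity of kept neighbors
    did not drop since the last epoch, reward +1 and keep more neighbors;
    otherwise reward -1 and keep fewer. The ratio is clipped to [step, 1.0]."""
    reward = 1 if curr_sim >= prev_sim else -1
    return min(1.0, max(step, keep_ratio + reward * step))


def mean_vec(vectors, dim):
    """Element-wise mean of a list of vectors; zeros if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def aggregate(center, neighbors, kept, emb, edge_emb):
    """Concatenate [self | mean of kept neighbor embeddings | mean of all
    neighborhood edge embeddings]. Averaging every incident edge, not only
    edges to kept neighbors, is an assumption."""
    node_part = mean_vec([emb[n] for n in kept], len(emb[center]))
    edge_dim = len(next(iter(edge_emb.values())))
    edge_part = mean_vec([edge_emb[(center, n)] for n in neighbors], edge_dim)
    return emb[center] + node_part + edge_part


emb = {0: [1.0, 0.0], 1: [1.0, 0.1], 2: [0.0, 1.0], 3: [-1.0, 0.0]}
edge_emb = {(0, 1): [2.0], (0, 2): [4.0], (0, 3): [6.0]}
nbrs = [1, 2, 3]
kept = filter_neighbors(0, nbrs, emb, keep_ratio=0.3)
h = aggregate(0, nbrs, kept, emb, edge_emb)
print(kept, h)  # [1] [1.0, 0.0, 1.0, 0.1, 4.0]
```

The lower clip at `step` keeps the ratio positive, and `filter_neighbors` always retains at least one neighbor, so a center node never loses its entire neighborhood during filtering.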

Key words: credit fraud review, large-scale dynamic graph, distribution imbalance, fraud camouflage, graph neural network

TANG Xiaoyong, WANG Haodong. Method of Credit Fraud Detection by Combining Sub-graph Selection and Neighborhood Filtering[J]. Journal of Frontiers of Computer Science and Technology, 2025, 19(2): 465-475.

唐小勇, 王浩东. 融合子图选择和邻域过滤的信贷欺诈审核方法[J]. 计算机科学与探索, 2025, 19(2): 465-475.
