Graph Oversampling Method Based on Graph Latent Representation Distribution Learning

doi:10.3778/j.issn.1673-9418.2407119

Abstract

Many real-world graph datasets suffer from class imbalance, which typically appears at the node, edge, and graph levels. Common oversampling-based methods for graph-level imbalance tend to cause model overfitting because the generated samples lack diversity. To address this issue, this paper proposes GLRD-GAN, a graph oversampling method based on graph latent representation distribution learning. First, a graph latent representation distribution learning method is introduced: a pre-trained variational graph auto-encoder (VGAE) and a fully connected neural network learn the latent representation distribution of minority class graph samples in a low-dimensional space. Latent representations are randomly sampled from this distribution and fused with the original minority class latent representations, which ensures the diversity of the minority class latent representations. Second, a dual-decoder graph generator is designed, in which the pre-trained inner product decoder and graph convolution decoder use the sampled latent representations to generate the topological structure and the node features of the graph data, respectively. Finally, a GAN discriminator checks both the authenticity and the class of the generated graphs, supervising the validity of the generated samples so that diverse minority class graph samples are produced. Comparative experiments and visualization analyses were conducted on five representative long-tail graph datasets. The results show that the proposed method outperforms other methods by 1% to 4% on average in accuracy (Acc) and F1 score, and that it generates effective minority class graph samples.

Key words: long-tail recognition, variational graph auto-encoder, graph latent representation, generative adversarial network
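
The sampling, fusion, and topology-decoding steps described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not state the fusion rule, so the convex combination and its weight `alpha` are assumptions, as are the function names `sample_and_fuse` and `inner_product_decode` and the 0.5 edge threshold. The sigmoid of the inner product is the standard VGAE decoder form. The graph convolution decoder for node features and the GAN discriminator are omitted.

```python
import numpy as np


def sample_and_fuse(z_orig, mu, log_var, alpha=0.5, rng=None):
    """Sample latent vectors from N(mu, exp(log_var)) and fuse them with z_orig.

    z_orig, mu, log_var: arrays of shape (num_nodes, latent_dim).
    alpha: hypothetical mixing weight (assumption; the abstract gives no fusion rule).
    """
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal(mu.shape)
    # Reparameterization: z = mu + sigma * eps, with sigma = exp(0.5 * log_var).
    z_sampled = mu + np.exp(0.5 * log_var) * eps
    # Assumed fusion: convex combination of original and sampled latents.
    return alpha * z_orig + (1.0 - alpha) * z_sampled


def inner_product_decode(z, threshold=0.5):
    """Decode edge probabilities as sigmoid(z @ z.T), then threshold to an adjacency matrix."""
    logits = z @ z.T
    probs = 1.0 / (1.0 + np.exp(-logits))
    adj = (probs > threshold).astype(np.int64)
    np.fill_diagonal(adj, 0)  # drop self-loops in the binarized graph
    return probs, adj


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num_nodes, latent_dim = 6, 4
    z_orig = rng.standard_normal((num_nodes, latent_dim))
    mu = np.zeros((num_nodes, latent_dim))
    log_var = np.zeros((num_nodes, latent_dim))

    z_new = sample_and_fuse(z_orig, mu, log_var, alpha=0.7, rng=rng)
    probs, adj = inner_product_decode(z_new)
    print(probs.shape, adj.sum())
```

With `alpha` close to 1 the generated latent stays near the original minority sample. Smaller values draw more heavily on the learned distribution, which is where the added diversity would come from under this assumed fusion rule.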

REN Bo, DONG Minggang, YU Yang, LU Xianrui. Graph Oversampling Method Based on Graph Latent Representation Distribution Learning[J]. Journal of Frontiers of Computer Science and Technology, DOI: 10.3778/j.issn.1673-9418.2407119.
