Semi-supervised Learning on Graphs Using Adversarial Training with Generated Sample

doi:10.3778/j.issn.1673-9418.2104119

Abstract

Given a graph with a small number of labeled nodes and a large number of unlabeled nodes, semi-supervised learning on graphs aims to assign labels to the unlabeled nodes. Generative adversarial networks (GANs) have shown strong ability in semi-supervised learning, but few studies apply them to semi-supervised learning on graphs. Existing work mainly generates unlabeled samples in low-density regions to weaken information propagation between subgraphs and thereby sharpen the decision boundary. However, the scarcity of labeled samples remains the main challenge for such methods. To address this problem, this paper proposes a semi-supervised learning algorithm on graphs that uses adversarial training with generated samples. Built on generative adversarial networks, the algorithm generates labeled samples that follow the real sample distribution and unlabeled samples that differ from it. The generated labeled samples expand the supervised information, while the generated unlabeled samples reduce the influence of neighboring nodes in the density gap, thus improving semi-supervised classification on graphs. Compared with existing methods, the proposed algorithm accounts for the effects of both labeled and unlabeled samples on graph-based semi-supervised learning, which strengthens its classification ability. Extensive experiments on different datasets verify the effectiveness of the method.

Key words: semi-supervised learning on graphs, generative adversarial networks (GAN), adversarial training, generated sample, graph embedding

WANG Cong, WANG Jie, LIU Quanming, LIANG Jiye. Semi-supervised Learning on Graphs Using Adversarial Training with Generated Sample[J]. Journal of Frontiers of Computer Science and Technology, 2023, 17(2): 367-375.
