Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2023, Vol. 17, Issue 2: 367-375. DOI: 10.3778/j.issn.1673-9418.2104119

• Theory and Algorithms •

Semi-supervised Learning on Graphs Using Adversarial Training with Generated Sample

WANG Cong (王聪), WANG Jie (王杰), LIU Quanming (刘全明), LIANG Jiye (梁吉业)

  1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China
  2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, China
  • Online: 2023-02-01 Published: 2023-02-01

Abstract: Given a graph composed of a small number of labeled nodes and a large number of unlabeled nodes, semi-supervised learning on graphs aims to assign labels to the unlabeled nodes. Generative adversarial networks have shown strong capability in semi-supervised learning, but few studies have applied them to semi-supervised learning on graphs. Existing work mainly generates unlabeled samples in low-density regions to weaken information propagation between subgraphs and thereby sharpen the decision boundary. However, the scarcity of labeled samples remains the main challenge for such methods. To address this problem, this paper proposes a semi-supervised learning algorithm on graphs that uses adversarial training with generated samples. Built on generative adversarial networks, the algorithm generates labeled samples that follow the real sample distribution and unlabeled samples that differ from it. The generated labeled samples expand the supervised information, while the generated unlabeled samples reduce the influence of neighboring nodes in the density gap, thus improving semi-supervised classification on graphs. Compared with existing methods, the proposed algorithm considers the effects of both labeled and unlabeled samples on graph-based semi-supervised learning, which strengthens its classification ability. Extensive experiments on different datasets verify the effectiveness of the method.

Key words: semi-supervised learning on graphs, generative adversarial networks (GAN), adversarial training, generated sample, graph embedding
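
The abstract does not spell out the training objective, so the following is only a minimal sketch, assuming a classifier with K real classes plus one extra "fake" class, as is common in GAN-based semi-supervised learning. The function name `classifier_loss`, the four loss terms, and their equal weighting are illustrative assumptions and are not taken from the paper. The sketch shows how generated labeled samples could add supervision while generated unlabeled samples are pushed into the fake class.

```python
import numpy as np


def log_softmax(logits):
    # Numerically stable log-softmax over the last axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def cross_entropy(logits, targets):
    # Mean negative log-likelihood of integer class targets.
    logp = log_softmax(logits)
    return -logp[np.arange(len(targets)), targets].mean()


def classifier_loss(logits_labeled, y_labeled,
                    logits_gen_labeled, y_gen,
                    logits_unlabeled, logits_gen_unlabeled):
    """Hypothetical (K+1)-class objective; every logits array has shape (n, K + 1).

    Column K is the assumed extra 'fake' class. This is an illustration,
    not the loss used in the paper.
    """
    fake = logits_labeled.shape[1] - 1

    # Real labeled nodes: ordinary supervised cross-entropy.
    loss_sup = cross_entropy(logits_labeled, y_labeled)

    # Generated labeled samples: treated as extra supervised examples.
    loss_gen_sup = cross_entropy(logits_gen_labeled, y_gen)

    # Real unlabeled nodes: should not be assigned to the fake class.
    p_fake_real = np.exp(log_softmax(logits_unlabeled)[:, fake])
    loss_unl = -np.log(1.0 - p_fake_real + 1e-12).mean()

    # Generated unlabeled samples: should be recognised as fake.
    fake_targets = np.full(len(logits_gen_unlabeled), fake)
    loss_gen_unl = cross_entropy(logits_gen_unlabeled, fake_targets)

    return loss_sup + loss_gen_sup + loss_unl + loss_gen_unl
```

In practice the logits would come from a graph-embedding-based classifier and the two kinds of generated samples from separate generators. Both are left out here because the abstract gives no details about them.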