Journal of Frontiers of Computer Science and Technology, 2017, Vol. 11, Issue (4): 520-527. DOI: 10.3778/j.issn.1673-9418.1611073

• Academic Research •

Transductive Network Representation Learning

张  霞+,陈维政,谢正茂,闫宏飞   

  1. School of Electronics Engineering and Computer Science, Peking University, Beijing 100871
  • Publication date: 2017-04-12 Release date: 2017-04-12

Learning Transductive Network Embedding

ZHANG Xia+, CHEN Weizheng, XIE Zhengmao, YAN Hongfei   

  1. School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China
  • Online: 2017-04-12 Published: 2017-04-12

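Abstract (Chinese version): Network representation learning is a classical learning problem whose goal is to represent a high-dimensional network in a low-dimensional vector space. Most current network representation learning methods are unsupervised and ignore label information. Inspired by the LINE (large-scale information network embedding) algorithm, this paper proposes TLINE, a semi-supervised learning algorithm. TLINE is a transductive representation learning algorithm: it optimizes the LINE part of the objective function to preserve the local properties of the network, and for the label information it uses a linear support vector machine (SVM) to make the labeled nodes easier to separate. Edge sampling, negative sampling, and asynchronous stochastic gradient descent reduce the complexity of the algorithm, so TLINE can handle large networks. Finally, experiments on the paper citation dataset CiteSeer and the co-authorship dataset DBLP show that TLINE clearly outperforms the classical unsupervised network representation learning algorithms DeepWalk and LINE.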

Keywords (Chinese version): transductive, network representation learning, node classification

Abstract: Network embedding is a classical task that aims to project a network into a low-dimensional space. Most existing embedding methods are unsupervised and therefore ignore useful label information. This paper proposes TLINE, a semi-supervised extension of the LINE (large-scale information network embedding) algorithm. TLINE is a transductive network embedding method: it optimizes the loss function of LINE to preserve local network structure, and applies an SVM (support vector machine) to maximize the margin between labeled nodes of different classes. Edge sampling, negative sampling, and asynchronous stochastic gradient descent are used during optimization to reduce the computational complexity of TLINE, so that it can handle large-scale networks. To evaluate performance on the node classification task, the proposed method is tested on two real-world network datasets, CiteSeer and DBLP. The experimental results indicate that TLINE outperforms the state-of-the-art unsupervised baselines DeepWalk and LINE and is suitable for large-scale networks.
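
The training scheme summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not give the exact objective, so several choices here are assumptions: the structure term uses a first-order, LINE-style logistic loss with uniformly drawn negative samples; the label term uses a multiclass hinge loss over linear scores; the two terms are combined with a hypothetical weight `beta`; and updates run in a single thread rather than asynchronously. The names `train_sketch` and `predict` are illustrative only.

```python
import math
import random


def sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def predict(vec, w):
    """Return the class whose linear score w[c] . vec is largest."""
    scores = [dot(wc, vec) for wc in w]
    return scores.index(max(scores))


def train_sketch(edges, labels, n_nodes, n_classes, dim=8, steps=5000,
                 n_neg=3, lr=0.05, beta=1.0, seed=0):
    """Joint SGD over an edge loss and a hinge loss (assumes n_classes >= 2)."""
    rng = random.Random(seed)
    emb = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(n_nodes)]
    w = [[0.0] * dim for _ in range(n_classes)]
    for _ in range(steps):
        # Edge sampling: one edge per step (uniform, because edges are unweighted here).
        u, v = rng.choice(edges)
        if rng.random() < 0.5:
            u, v = v, u  # treat the edge as undirected
        # Structure term: one positive target plus n_neg uniformly drawn negatives.
        targets = [(v, 1.0)] + [(rng.randrange(n_nodes), 0.0) for _ in range(n_neg)]
        grad_u = [0.0] * dim
        for t, y in targets:
            if y == 0.0 and t in (u, v):
                continue  # skip negatives that collide with the sampled edge
            g = lr * (y - sigmoid(dot(emb[u], emb[t])))
            for k in range(dim):
                grad_u[k] += g * emb[t][k]
                emb[t][k] += g * emb[u][k]
        for k in range(dim):
            emb[u][k] += grad_u[k]
        # Label term: hinge loss on the most violating wrong class, labeled nodes only.
        if u in labels:
            y = labels[u]
            scores = [dot(wc, emb[u]) for wc in w]
            r = max((c for c in range(n_classes) if c != y), key=lambda c: scores[c])
            if 1.0 + scores[r] - scores[y] > 0.0:
                step = lr * beta
                for k in range(dim):
                    e_k = emb[u][k]
                    emb[u][k] += step * (w[y][k] - w[r][k])
                    w[y][k] += step * e_k
                    w[r][k] -= step * e_k
    return emb, w
```

In a transductive setting the unlabeled nodes are embedded in the same run as the labeled ones, so after training a call such as `predict(emb[i], w)` can be applied to any node `i` in the graph. A production version would more likely draw negatives from a degree-based distribution and sample weighted edges with an alias table, as LINE does.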

Key words: transductive, network embedding learning, node classification