Journal of Frontiers of Computer Science and Technology, 2019, Vol. 13, Issue (7): 1081-1093. DOI: 10.3778/j.issn.1673-9418.1903056

• Survey · Exploration •

Survey on Representation Learning Methods Oriented to Heterogeneous Information Network

ZHOU Hui, ZHAO Zhongying+, LI Chao   

  1. College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao, Shandong 266590, China
  • Online: 2019-07-01  Published: 2019-07-08

Abstract: Network representation learning aims to learn low-dimensional representation vectors for the components of a network (nodes, edges, subgraphs, etc.) such that these vectors preserve, as far as possible, the properties the components have in the original network. A heterogeneous information network is a network composed of multiple types of nodes, multiple types of links, and attribute information. Such networks are dynamic, large-scale, and heterogeneous, and they are ubiquitous in real life. Integrating diverse heterogeneous information into network representation learning can alleviate the data sparsity problem to some extent and helps learn representation vectors with strong discriminative and inferential power. At the same time, it raises the challenges of handling complex data relationships effectively and of balancing heterogeneous information. In recent years, researchers have designed a variety of representation learning algorithms for heterogeneous information networks, which have greatly advanced this field. This paper first proposes a unified classification framework for these algorithms, then summarizes and compares the representative algorithms in each category and analyzes their time complexities, strengths, and weaknesses. In addition, the datasets commonly used in experiments are categorized and summarized in a table. Finally, the paper discusses the challenges facing this field and possible directions for future research.

Key words: network representation learning, heterogeneous information network, network analysis
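To make the two definitions in the abstract concrete, here is a minimal Python sketch that is not taken from the paper. It is a toy container that stores typed nodes, typed links, and node attributes, plus a table that assigns every node a low-dimensional vector. The class name `HeteroInfoNetwork`, the helper `init_embeddings`, the author/paper/venue example, and the "more than one node type or more than one link type" test for heterogeneity are illustrative assumptions. The random vectors are only an initialization; no learning is performed here, and an actual representation learning algorithm would then optimize them.

```python
import random
from dataclasses import dataclass, field


@dataclass
class HeteroInfoNetwork:
    """Hypothetical toy container for a heterogeneous information network."""
    node_type: dict = field(default_factory=dict)   # node id -> node type
    node_attrs: dict = field(default_factory=dict)  # node id -> attribute dict
    edges: list = field(default_factory=list)       # (source, link type, target)

    def add_node(self, node, ntype, **attrs):
        self.node_type[node] = ntype
        self.node_attrs[node] = attrs

    def add_edge(self, src, relation, dst):
        # Both endpoints must already have been added as typed nodes.
        if src not in self.node_type or dst not in self.node_type:
            raise KeyError("add both endpoints with add_node() first")
        self.edges.append((src, relation, dst))

    def is_heterogeneous(self):
        # Assumed criterion: more than one node type or more than one link type.
        node_types = set(self.node_type.values())
        link_types = {rel for _, rel, _ in self.edges}
        return len(node_types) > 1 or len(link_types) > 1


def init_embeddings(network, dim, seed=0):
    """Give each node a random dim-dimensional vector (initialization only;
    a representation learning method would optimize these values)."""
    rng = random.Random(seed)
    return {node: [rng.gauss(0.0, 0.1) for _ in range(dim)]
            for node in network.node_type}


# Hypothetical bibliographic example: authors write papers, papers appear in venues.
net = HeteroInfoNetwork()
net.add_node("a1", "author", name="Alice")
net.add_node("p1", "paper", year=2019)
net.add_node("v1", "venue")
net.add_edge("a1", "writes", "p1")
net.add_edge("p1", "published_in", "v1")

emb = init_embeddings(net, dim=4)
print(net.is_heterogeneous())    # True
print(len(emb), len(emb["a1"]))  # 3 4
```

Keeping the node type and the link type as explicit labels is what distinguishes this from a plain adjacency list: a method designed for heterogeneous networks can use those labels to treat, say, "writes" and "published_in" links differently, whereas a homogeneous method would see only anonymous edges.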