Journal of Frontiers of Computer Science and Technology ›› 2022, Vol. 16 ›› Issue (11): 2537-2546. DOI: 10.3778/j.issn.1673-9418.2104081

• Artificial Intelligence •

Feature-Enhanced Latent Summarization Model of Heterogeneous Network

XU Zhengxiang1,2, WANG Ying1,2, WANG Hongji3, WANG Xin3,+

  1. College of Computer Science and Technology, Jilin University, Changchun 130012, China
  2. Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Changchun 130012, China
  3. College of Artificial Intelligence, Jilin University, Changchun 130012, China
  • Received: 2021-04-12 Revised: 2021-06-04 Online: 2022-11-01 Published: 2021-06-08
  • About the authors: XU Zhengxiang, born in 1996, M.S. candidate. His research interests include network representation, graph summarization and graph clustering.
    WANG Ying, born in 1981, Ph.D., professor, Ph.D. supervisor. Her research interests include data mining, machine learning, social computing and search engines.
    WANG Hongji, born in 1993, M.S. candidate. His research interests include machine learning, data mining and knowledge graph completion.
    WANG Xin, born in 1981, assistant professor. His research interests include machine learning, data mining, graph deep learning, knowledge graphs and recommender systems.
  • Supported by:
    National Natural Science Foundation of China (61872161); Natural Science Foundation of Jilin Province (2018101328JC); Foundation of Development and Reform Commission of Jilin Province (2019C053-8); Foundation of Jilin Provincial Education Department (JJKH20191257KJ); Interdisciplinary Integration and Innovation Project of Jilin University (419021421615)

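Feature-Enhanced Latent Summarization Model for Heterogeneous Networks

XU Zhengxiang1,2, WANG Ying1,2, WANG Hongji3, WANG Xin3,+

  1. College of Computer Science and Technology, Jilin University, Changchun 130012
  2. Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Changchun 130012
  3. College of Artificial Intelligence, Jilin University, Changchun 130012
  • Corresponding author: + E-mail: xinwang@jlu.edu.cn
  • About the authors: XU Zhengxiang (born in 1996), male, from Xinyang, Henan, M.S. candidate. His main research interests are network representation, graph summarization and graph clustering.
    WANG Ying (born in 1981), female, from Changchun, Jilin, Ph.D., professor, Ph.D. supervisor. Her main research interests are data mining, machine learning, social computing and search engines.
    WANG Hongji (born in 1993), male, from Daqing, Heilongjiang, M.S. candidate. His main research interests are machine learning, data mining and knowledge graph completion.
    WANG Xin (born in 1981), male, from Chifeng, Inner Mongolia, assistant professor. His main research interests are machine learning, data mining, graph deep learning, knowledge graphs and recommender systems.
  • Supported by:
    General Program of the National Natural Science Foundation of China (61872161); Natural Science Foundation of Jilin Province (2018101328JC); Project Fund of the Development and Reform Commission of Jilin Province (2019C053-8); Fund of the Education Department of Jilin Province (JJKH20191257KJ); Interdisciplinary Integration and Innovation Project of Jilin University (419021421615)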

Abstract:

With the rapid growth of network data, the storage and representation of large-scale heterogeneous networks have become active research topics. Two distinct tasks are considered: generating graph summaries and generating node representations. Graph summarization seeks a compact representation of the input graph that supports compressed storage and faster queries, while network representation extracts the structural information in network data and produces embeddings for downstream tasks. On large-scale networks, however, generating both summaries and embeddings remains challenging. To address the computation and storage problems posed by large-scale heterogeneous networks, this paper proposes a feature-enhanced latent summarization model (FELS), which obtains a summary representation of large-scale network data by incorporating node features and graph attributes. First, different node features of the original graph are used as base features, and a variety of relational operators are applied to capture high-order subgraph structure information. Second, according to the different graph attributes, the latent subspace of the contextual structural information is learned through a bucket mapping method. Finally, singular value decomposition is applied to the learned context feature matrix to obtain the latent summary representation of the heterogeneous network: a compact latent graph summary that is independent of the size and dimensionality of the input graph and from which node representations can also be obtained. Experimental results show that, compared with traditional methods, FELS produces better latent summaries with lower model complexity and achieves higher efficiency and accuracy in link prediction.

Key words: latent summarization, network representation, structure learning, relational operators, feature selection
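
The three steps named in the abstract (base features plus relational operators, a mapping into a context matrix, and a matrix decomposition) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It rests on several assumptions made only for illustration: node degree is the single base feature, the relational operators are neighbor sum, mean and max, the mapping step is a logarithmic bucket histogram, the decomposition is a truncated singular value decomposition, and node types (the heterogeneous part of the model) are ignored. All function names are hypothetical.

```python
import numpy as np

def base_features(adj):
    """Base node feature: the degree only (an illustrative assumption)."""
    return adj.sum(axis=1, keepdims=True).astype(float)

def apply_relational_operators(adj, feats, n_layers=2):
    """Aggregate neighbor features with sum, mean and max, layer by layer.

    Each layer applies the operators to the previous layer's output, so
    layer k summarizes information from the k-hop neighborhood.
    """
    deg = np.maximum(adj.sum(axis=1, keepdims=True), 1)
    layers, current = [feats], feats
    for _ in range(n_layers):
        nbr_sum = adj @ current
        nbr_mean = nbr_sum / deg
        # masked[i, j, k] is feature k of node j if j neighbors i, else -inf
        masked = np.where(adj[:, :, None] > 0, current[None, :, :], -np.inf)
        nbr_max = masked.max(axis=1)
        nbr_max[~np.isfinite(nbr_max)] = 0.0  # isolated nodes get 0
        current = np.hstack([nbr_sum, nbr_mean, nbr_max])
        layers.append(current)
    return np.hstack(layers)

def bucket_map(adj, feats, n_buckets=4):
    """Histogram each node's neighbors' feature values into log-scale buckets.

    Bucket b holds values x with 2**b <= x + 1 < 2**(b + 1); larger values
    fall into the last bucket. Features are assumed to be non-negative.
    """
    n, f = feats.shape
    idx = np.minimum(np.floor(np.log2(feats + 1.0)).astype(int), n_buckets - 1)
    context = np.zeros((n, f * n_buckets))
    for j in range(f):
        one_hot = np.eye(n_buckets)[idx[:, j]]  # shape (n, n_buckets)
        context[:, j * n_buckets:(j + 1) * n_buckets] = adj @ one_hot
    return context

def latent_summary(context, rank=3):
    """Truncated SVD: the summary has one row per context column, not per node."""
    _, s, vt = np.linalg.svd(context, full_matrices=False)
    rank = min(rank, len(s))
    summary = vt[:rank].T           # shape (n_context_columns, rank)
    embeddings = context @ summary  # node representations, shape (n, rank)
    return summary, embeddings

# Toy undirected graph with 5 nodes (hypothetical data).
adj = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)

feats = apply_relational_operators(adj, base_features(adj), n_layers=2)
context = bucket_map(adj, feats, n_buckets=4)
summary, embeddings = latent_summary(context, rank=3)
print(summary.shape, embeddings.shape)
```

In this sketch the shape of `summary` depends only on the number of features and buckets, not on the number of nodes, which mirrors the abstract's claim that the latent summary is independent of the input graph's size. Node representations are then derived from the stored summary and the context matrix rather than stored separately.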

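Abstract (translated from the Chinese version):

With the rapid growth of network data, the storage and network representation of large-scale heterogeneous network data have become research hotspots. Two different tasks are posed: generating graph summaries and generating node representations of graphs. The goal of graph summarization is to find a compact representation of the input graph for compressed storage and accelerated queries; network representation can effectively extract the structural information in network data and generate node representations for downstream tasks. However, for large-scale network data, several challenges remain in generating graph summaries and embedding representations. To overcome the scientific computing and storage problems brought by large-scale heterogeneous network data, a feature-enhanced latent summarization model for heterogeneous networks (FELS) is proposed, which obtains a summary representation of large-scale heterogeneous network data by fusing node features and graph attributes. First, different node features in the original graph are taken as base features, and multiple relational operators are applied to capture high-order subgraph structure information; then, according to the different graph attributes, the latent subspace structure of the context is learned by bucket mapping; finally, singular value decomposition is applied to the learned context feature matrix to obtain the latent summary representation of the heterogeneous network, that is, a compact latent graph summary independent of the size and dimensionality of the input graph, from which node representations can also be obtained. Experimental results show that, compared with traditional methods, the proposed FELS model obtains higher-quality latent summaries with lower model complexity and achieves higher efficiency and accuracy on the link prediction task.

Key words: latent summarization, network representation, structure learning, relational operators, feature selection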
