Journal of Frontiers of Computer Science and Technology (计算机科学与探索) ›› 2022, Vol. 16 ›› Issue (11): 2537-2546. DOI: 10.3778/j.issn.1673-9418.2104081
Corresponding author: + E-mail: xinwang@jlu.edu.cn
XU Zhengxiang1,2, WANG Ying1,2, WANG Hongji3, WANG Xin3,+
Received:
2021-04-12
Revised:
2021-06-04
Online:
2022-11-01
Published:
2021-06-08
About author:
XU Zhengxiang, born in 1996 in Xinyang, Henan, M.S. candidate. His research interests include network representation, graph summarization and graph clustering.
Supported by:
Abstract:
With the rapid growth of network data, the storage and representation of large-scale heterogeneous networks have become active research topics. Two distinct tasks have been proposed: graph summarization and node representation learning. Graph summarization aims to find a compact representation of the input graph for compressed storage and accelerated queries, while network representation learning extracts structural information from network data and produces node representations for downstream tasks. However, several challenges remain when generating graph summaries and embeddings for large-scale network data. To address the scientific computing and storage problems posed by large-scale heterogeneous network data, this paper proposes a feature-enhanced latent summarization model of heterogeneous networks (FELS), which fuses node features and graph attributes to obtain a summary representation of large-scale heterogeneous network data. First, the different node features of the original graph are taken as base features, and multiple relational operators are applied to capture higher-order subgraph structural information. Then, contextual latent subspace structures are learned via bucket mapping according to different graph attributes. Finally, singular value decomposition is applied to the learned contextual feature matrix to obtain the latent summary representation of the heterogeneous network, i.e., a compact latent graph summary whose dimensionality is independent of the input graph size, from which node representations can also be derived. Experimental results show that, compared with traditional methods, the proposed FELS model yields higher-quality latent summaries with lower model complexity, and achieves higher efficiency and accuracy on link prediction tasks.
CLC Number:
XU Zhengxiang, WANG Ying, WANG Hongji, WANG Xin. Feature-Enhanced Latent Summarization Model of Heterogeneous Network[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(11): 2537-2546.
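The abstract describes a three-stage pipeline: base node features expanded by relational operators into higher-order subgraph features, bucket mapping of those features per graph attribute into a contextual matrix, and SVD of that matrix to obtain a latent summary whose size is independent of the input graph. The sketch below is a minimal, hypothetical Python illustration of that kind of pipeline, not the authors' implementation: the degree-based base features, sum aggregation as the only relational operator, log-scale bucketing with `num_buckets` bins, and the use of node types as contexts are all assumptions made for the example.

```python
# Illustrative sketch (not the authors' code) of the pipeline outlined in the
# abstract: base features -> relational operators for higher-order subgraph
# features -> bucket mapping per context -> SVD for a size-independent summary.
import numpy as np
import scipy.sparse as sp

def base_features(A):
    """Simple base features per node: out-degree, in-degree, total degree."""
    out_deg = np.asarray(A.sum(axis=1)).ravel()
    in_deg = np.asarray(A.sum(axis=0)).ravel()
    return np.column_stack([out_deg, in_deg, out_deg + in_deg])

def apply_relational_ops(A, X, orders=2):
    """Stack higher-order features by repeatedly aggregating neighbor features."""
    feats, cur = [X], X
    for _ in range(orders):
        cur = A @ cur                      # one relational operator: sum over neighbors
        feats.append(cur)
    return np.hstack(feats)

def bucket_histograms(F, node_context, num_buckets=10):
    """Map each feature column to log-scale buckets and build one normalized
    histogram per (context, feature) pair, so the result no longer depends on
    the number of nodes in the graph."""
    rows = []
    for c in np.unique(node_context):
        mask = node_context == c
        hists = []
        for j in range(F.shape[1]):
            col = np.log2(F[mask, j] + 1.0)
            edges = np.linspace(0.0, max(col.max(), 1.0), num_buckets + 1)
            h, _ = np.histogram(col, bins=edges)
            hists.append(h / max(mask.sum(), 1))
        rows.append(np.concatenate(hists))
    return np.vstack(rows)                 # shape: (#contexts, #features * num_buckets)

def latent_summary(A, node_context, rank=8):
    """Compact latent summary via SVD of the contextual feature matrix."""
    F = apply_relational_ops(A, base_features(A))
    C = bucket_histograms(F, node_context)
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    k = min(rank, len(s))
    return U[:, :k] * s[:k], Vt[:k]

# Toy usage: a random sparse graph with two node types used as contexts.
A = sp.random(100, 100, density=0.05, format="csr")
types = np.random.randint(0, 2, size=100)
S, V = latent_summary(A, types)
print(S.shape)
```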
Table 1 Datasets

| Dataset | Nodes | Edges | Node types | Graph type |
|---|---|---|---|---|
| Hhar3 | 10 299 | 30 897 | 6 | unweighted |
| Hhar10 | 10 299 | 102 988 | 6 | unweighted |
| Reut3 | 10 000 | 30 006 | 4 | unweighted |
| Reut10 | 10 000 | 100 000 | 4 | unweighted |
| Movie | 28 138 | 286 739 | 3 | unweighted |
| American | 6 386 | 217 662 | 1 | weighted |
Table 2 Link prediction results using node embeddings derived from latent summaries versus baseline embedding methods (unit: %)

| Dataset | Metric | GF | HOPE | LP | LLE | Dwalk | LINE | SDNE | S2vec | N2vec | MLS | FELS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Hhar3 | AUC | 73.61 | 67.45 | 71.63 | 48.41 | 69.64 | 61.97 | 63.13 | 50.37 | 67.75 | 95.87 | 97.30 |
| | ACC | 73.58 | 67.72 | 72.06 | 48.24 | 69.60 | 62.46 | 63.15 | 50.70 | 67.78 | 96.01 | 97.22 |
| Hhar10 | AUC | 77.07 | 63.31 | 64.48 | 49.25 | 64.99 | 62.17 | 60.79 | 49.69 | 64.33 | 95.13 | 96.17 |
| | ACC | 77.80 | 62.55 | 64.74 | 49.11 | 65.44 | 61.49 | 59.88 | 50.24 | 64.45 | 95.15 | 96.05 |
| Reut3 | AUC | 67.04 | 49.02 | 50.61 | 50.43 | 50.49 | 49.60 | 50.74 | 49.72 | 51.18 | 92.44 | 94.12 |
| | ACC | 66.35 | 49.81 | 50.77 | 49.89 | 50.71 | 49.25 | 51.26 | 49.93 | 50.99 | 92.51 | 93.80 |
| Reut10 | AUC | 70.16 | 50.10 | 50.39 | 50.04 | 50.08 | 50.01 | 50.18 | 49.76 | 49.63 | 83.23 | 83.94 |
| | ACC | 70.34 | 49.84 | 49.87 | 49.87 | 49.72 | 50.13 | 50.02 | 49.43 | 48.89 | 83.56 | 84.04 |
| Movie | AUC | 50.50 | 49.20 | 66.10 | OOT | 45.03 | 50.70 | 54.75 | 60.40 | 56.57 | 82.78 | 85.59 |
| | ACC | 54.69 | 49.54 | 65.83 | OOT | 45.36 | 50.77 | 54.33 | 60.13 | 56.37 | 82.33 | 85.53 |
| American | AUC | 49.83 | 50.05 | 49.83 | OOT | 50.21 | 50.62 | 50.11 | 49.91 | 50.30 | 79.52 | 80.52 |
| | ACC | 49.15 | 50.57 | 50.56 | OOT | 50.17 | 51.25 | 50.92 | 50.14 | 51.03 | 79.20 | 80.98 |
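Table 2 reports AUC and ACC for link prediction on the learned embeddings. The snippet below shows one common evaluation protocol as an assumed example (this page does not spell out the paper's exact setup): edge features are built as element-wise products of the two endpoint embeddings, a logistic-regression classifier is trained on observed edges versus sampled non-edges, and AUC and accuracy are computed on a held-out split. The function `evaluate_link_prediction` and its parameters are hypothetical names introduced for illustration.

```python
# Hypothetical link prediction evaluation sketch (not necessarily the paper's
# protocol): Hadamard edge features + logistic regression, scored by AUC / ACC.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, accuracy_score

def evaluate_link_prediction(emb, pos_edges, neg_edges, test_ratio=0.3, seed=0):
    rng = np.random.default_rng(seed)
    pairs = np.vstack([pos_edges, neg_edges])
    X = np.vstack([emb[u] * emb[v] for u, v in pairs])        # Hadamard edge features
    y = np.concatenate([np.ones(len(pos_edges)), np.zeros(len(neg_edges))])
    idx = rng.permutation(len(y))
    split = int(len(y) * (1 - test_ratio))
    train, test = idx[:split], idx[split:]
    clf = LogisticRegression(max_iter=1000).fit(X[train], y[train])
    prob = clf.predict_proba(X[test])[:, 1]
    return roc_auc_score(y[test], prob), accuracy_score(y[test], (prob > 0.5).astype(float))

# Toy usage with random embeddings and randomly sampled edge / non-edge pairs.
emb = np.random.rand(100, 16)
pos = np.random.randint(0, 100, size=(200, 2))
neg = np.random.randint(0, 100, size=(200, 2))
auc, acc = evaluate_link_prediction(emb, pos, neg)
print(f"AUC={auc:.3f}, ACC={acc:.3f}")
```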
Table 3 Comparison of storage space

| Dataset | Nodes | Edges | Node types | Graph type | G/MB | EMB/MB | FELS/MB |
|---|---|---|---|---|---|---|---|
| Yahoo | 100 058 | 1 057 050 | 2 | unweighted | 74 | 91 | 9 |
| Digg | 283 183 | 4 742 055 | 2 | unweighted | 109 | 321 | 9 |
| Dbpedia | 495 940 | 921 710 | 4 | unweighted | 13 | 448 | 16 |
| Bibsonomy | 977 914 | 3 754 828 | 3 | weighted | 178 | 883 | 16 |
Table 4 Link prediction comparison of FELS-L

| Dataset | AUC/% | AUC/% | AUC/% |
|---|---|---|---|
| Hhar3 | 96.43 | 97.30 | 97.42 |
| Hhar10 | 95.13 | 96.17 | 96.38 |
| Reut3 | 93.42 | 94.12 | 94.07 |
| Reut10 | 84.04 | 83.94 | 83.39 |
| Movie | 86.54 | 85.59 | 83.70 |
| American | 79.88 | 80.52 | 80.29 |
[1] | WU Y, WANG Y, WANG X, et al. Motif-based hypergraph convolution network for semi-supervised node classification on heterogeneous graph[J]. Chinese Journal of Computers, 2021, 44(11): 2248-2260. |
[2] | LIU Y, SAFAVI T, DIGHE A, et al. Graph summarization methods and applications: a survey[J]. ACM Computing Surveys, 2018, 51(3): 1-34. |
[3] | QIU J, DONG Y, MA H, et al. Network embedding as matrix factorization: unifying DeepWalk, LINE, PTE, and Node2vec[C]// Proceedings of the 11th ACM International Conference on Web Search and Data Mining, Marina Del Rey, Feb 5-9, 2018. New York: ACM, 2018: 459-467. |
[4] | DE RIDDER D, KOUROPTEVA O, OKUN O, et al. Supervised locally linear embedding[C]// LNCS 2714: Proceedings of the Joint International Conference ICANN/ICONIP 2003, Istanbul, Jun 26-29, 2003. Berlin, Heidelberg: Springer, 2003: 333-341. |
[5] | BELKIN M, NIYOGI P. Laplacian Eigenmaps and spectral techniques for embedding and clustering[C]// Advances in Neural Information Processing Systems 14, Vancouver, Dec 3-8, 2001: 585-591. |
[6] | AHMED A, SHERVASHIDZE N, NARAYANAMURTHY S, et al. Distributed large-scale natural graph factorization[C]// Proceedings of the 22nd International Conference on World Wide Web, Rio de Janeiro, Brazil, May 13-17, 2013. New York: ACM, 2013: 37-48. |
[7] | CAO S, LU W, XU Q. GraRep: learning graph representations with global structural information[C]// Proceedings of the 24th ACM International Conference on Information and Knowledge Management, Melbourne, Oct 19-23, 2015. New York: ACM, 2015: 891-900. |
[8] | OU M, CUI P, PEI J, et al. Asymmetric transitivity preserving graph embedding[C]// Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, Aug 13-17, 2016. New York: ACM, 2016: 1105-1114. |
[9] | PEROZZI B, AL-RFOU R, SKIENA S. DeepWalk: online learning of social representations[C]// Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, Aug 24-27, 2014. New York: ACM, 2014: 701-710. |
[10] | TANG J, QU M, WANG M, et al. LINE: large-scale information network embedding[C]// Proceedings of the 24th International Conference on World Wide Web, Florence, May 18-22, 2015. New York: ACM, 2015: 1067-1077. |
[11] | GROVER A, LESKOVEC J. Node2vec: scalable feature learning for networks[C]// Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, Aug 13-17, 2016. New York: ACM, 2016: 855-864. |
[12] | KIPF T N, WELLING M. Semi-supervised classification with graph convolutional networks[J]. arXiv:1609.02907, 2016. |
[13] | WANG D, CUI P, ZHU W. Structural deep network embedding[C]// Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, Aug 13-17, 2016. New York: ACM, 2016: 1225-1234. |
[14] | GUO S, WANG Y, YUAN H, et al. TAERT: triple-attentional explainable recommendation with temporal convolutional network[J]. Information Sciences, 2021, 567: 185-200. |
[15] | RIBEIRO L F R, SAVERESE P H P, FIGUEIREDO D R. Struc2vec: learning node representations from structural identity[C]// Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, Aug 13-17, 2017. New York: ACM, 2017: 385-394. |
[16] | NAVLAKHA S, RASTOGI R, SHRIVASTAVA N. Graph summarization with bounded error[C]// Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, Vancouver, Jun 10-12, 2008. New York: ACM, 2008: 419-432. |
[17] | MACCIONI A, ABADI D J. Scalable pattern matching over compressed graphs via dedensification[C]// Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, Aug 13-17, 2016. New York: ACM, 2016: 1755-1764. |
[18] | ZHU L, GHASEMI-GOL M, SZEKELY P, et al. Unsupervised entity resolution on multi-type graphs[C]// LNCS 9981: Proceedings of the 15th International Semantic Web Conference, Kobe, Oct 17-21, 2016. Cham: Springer, 2016: 649-667. |
[19] | LI C T, LIN S D. Egocentric information abstraction for heterogeneous social networks[C]// Proceedings of the 2009 International Conference on Advances in Social Network Analysis and Mining, Athens, Jul 20-22, 2009. Washington: IEEE Computer Society, 2009: 255-260. |
[20] | SHAH N, KOUTRA D, ZOU T, et al. TimeCrunch: interpretable dynamic graph summarization[C]// Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, Aug 10-13, 2015. New York: ACM, 2015: 1055-1064. |
[21] | JIN D, ROSSI R A, KOH E, et al. Latent network summarization: bridging network embedding and summarization[C]// Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Anchorage, Aug 4-8, 2019. New York: ACM, 2019: 987-997. |
[22] | ZHOU C D, ZENG B Q, WANG S Y, et al. Chinese summarization research on combination of local attention and convolutional neural network[J]. Computer Engineering and Applications, 2019, 55(8): 132-137. |
[23] | TIAN K K, ZHOU R Y, DONG H Y, et al. An abstractive summarization method based on encoder-sharing and gated network[J]. Acta Scientiarum Naturalium Universitatis Pekinensis, 2020, 56(1): 61-67. |