Journal of Frontiers of Computer Science and Technology (计算机科学与探索) ›› 2022, Vol. 16 ›› Issue (11): 2537-2546. DOI: 10.3778/j.issn.1673-9418.2104081
Corresponding author: + E-mail: xinwang@jlu.edu.cn
XU Zhengxiang1,2, WANG Ying1,2, WANG Hongji3, WANG Xin3,+
Received:
2021-04-12
Revised:
2021-06-04
Online:
2022-11-01
Published:
2021-06-08
About author:
XU Zhengxiang, born in 1996 in Xinyang, Henan, M.S. candidate. His research interests include network representation, graph summarization and graph clustering.
Supported by:
Abstract:
With the rapid growth of network data, the storage and representation of large-scale heterogeneous networks have become active research topics. Two distinct tasks have been proposed: graph summarization and node representation learning. Graph summarization aims to find a compact representation of the input graph for compressed storage and accelerated queries, while network representation learning extracts structural information from network data and produces node representations for downstream tasks. However, several challenges remain when generating graph summaries and embeddings for large-scale network data. To address the scientific computing and storage problems posed by large-scale heterogeneous network data, this paper proposes a feature-enhanced latent summarization model of heterogeneous networks (FELS), which fuses node features and graph attributes to obtain a summary representation of large-scale heterogeneous network data. First, the different node features of the original graph are taken as base features, and multiple relational operators are applied to capture higher-order subgraph structural information. Then, contextual latent subspace structures are learned via bucket mapping according to different graph attributes. Finally, singular value decomposition is applied to the learned contextual feature matrix to obtain the latent summary representation of the heterogeneous network, i.e., a compact latent graph summary whose dimensionality is independent of the input graph size, from which node representations can also be derived. Experimental results show that, compared with traditional methods, the proposed FELS model yields higher-quality latent summaries with lower model complexity, and achieves higher efficiency and accuracy on link prediction tasks.
CLC Number:
XU Zhengxiang, WANG Ying, WANG Hongji, WANG Xin. Feature-Enhanced Latent Summarization Model of Heterogeneous Network[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(11): 2537-2546.
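The abstract describes a three-stage pipeline: base node features expanded by relational operators into higher-order subgraph features, bucket mapping of those features per graph attribute into a contextual matrix, and SVD of that matrix to obtain a latent summary whose size is independent of the input graph. The sketch below is a minimal, hypothetical Python illustration of that kind of pipeline, not the authors' implementation: the degree-based base features, sum aggregation as the only relational operator, log-scale bucketing with `num_buckets` bins, and the use of node types as contexts are all assumptions made for the example.

```python
# Illustrative sketch (not the authors' code) of the pipeline outlined in the
# abstract: base features -> relational operators for higher-order subgraph
# features -> bucket mapping per context -> SVD for a size-independent summary.
import numpy as np
import scipy.sparse as sp

def base_features(A):
    """Simple base features per node: out-degree, in-degree, total degree."""
    out_deg = np.asarray(A.sum(axis=1)).ravel()
    in_deg = np.asarray(A.sum(axis=0)).ravel()
    return np.column_stack([out_deg, in_deg, out_deg + in_deg])

def apply_relational_ops(A, X, orders=2):
    """Stack higher-order features by repeatedly aggregating neighbor features."""
    feats, cur = [X], X
    for _ in range(orders):
        cur = A @ cur                      # one relational operator: sum over neighbors
        feats.append(cur)
    return np.hstack(feats)

def bucket_histograms(F, node_context, num_buckets=10):
    """Map each feature column to log-scale buckets and build one normalized
    histogram per (context, feature) pair, so the result no longer depends on
    the number of nodes in the graph."""
    rows = []
    for c in np.unique(node_context):
        mask = node_context == c
        hists = []
        for j in range(F.shape[1]):
            col = np.log2(F[mask, j] + 1.0)
            edges = np.linspace(0.0, max(col.max(), 1.0), num_buckets + 1)
            h, _ = np.histogram(col, bins=edges)
            hists.append(h / max(mask.sum(), 1))
        rows.append(np.concatenate(hists))
    return np.vstack(rows)                 # shape: (#contexts, #features * num_buckets)

def latent_summary(A, node_context, rank=8):
    """Compact latent summary via SVD of the contextual feature matrix."""
    F = apply_relational_ops(A, base_features(A))
    C = bucket_histograms(F, node_context)
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    k = min(rank, len(s))
    return U[:, :k] * s[:k], Vt[:k]

# Toy usage: a random sparse graph with two node types used as contexts.
A = sp.random(100, 100, density=0.05, format="csr")
types = np.random.randint(0, 2, size=100)
S, V = latent_summary(A, types)
print(S.shape)
```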
Table 1 Datasets

| Dataset | Nodes | Edges | Node types | Graph type |
|---|---|---|---|---|
| Hhar3 | 10 299 | 30 897 | 6 | unweighted |
| Hhar10 | 10 299 | 102 988 | 6 | unweighted |
| Reut3 | 10 000 | 30 006 | 4 | unweighted |
| Reut10 | 10 000 | 100 000 | 4 | unweighted |
| Movie | 28 138 | 286 739 | 3 | unweighted |
| American | 6 386 | 217 662 | 1 | weighted |
Table 2 Link prediction results using node embeddings derived from latent summaries versus baseline embedding methods (unit: %)

| Dataset | Metric | GF | HOPE | LP | LLE | Dwalk | LINE | SDNE | S2vec | N2vec | MLS | FELS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Hhar3 | AUC | 73.61 | 67.45 | 71.63 | 48.41 | 69.64 | 61.97 | 63.13 | 50.37 | 67.75 | 95.87 | 97.30 |
| | ACC | 73.58 | 67.72 | 72.06 | 48.24 | 69.60 | 62.46 | 63.15 | 50.70 | 67.78 | 96.01 | 97.22 |
| Hhar10 | AUC | 77.07 | 63.31 | 64.48 | 49.25 | 64.99 | 62.17 | 60.79 | 49.69 | 64.33 | 95.13 | 96.17 |
| | ACC | 77.80 | 62.55 | 64.74 | 49.11 | 65.44 | 61.49 | 59.88 | 50.24 | 64.45 | 95.15 | 96.05 |
| Reut3 | AUC | 67.04 | 49.02 | 50.61 | 50.43 | 50.49 | 49.60 | 50.74 | 49.72 | 51.18 | 92.44 | 94.12 |
| | ACC | 66.35 | 49.81 | 50.77 | 49.89 | 50.71 | 49.25 | 51.26 | 49.93 | 50.99 | 92.51 | 93.80 |
| Reut10 | AUC | 70.16 | 50.10 | 50.39 | 50.04 | 50.08 | 50.01 | 50.18 | 49.76 | 49.63 | 83.23 | 83.94 |
| | ACC | 70.34 | 49.84 | 49.87 | 49.87 | 49.72 | 50.13 | 50.02 | 49.43 | 48.89 | 83.56 | 84.04 |
| Movie | AUC | 50.50 | 49.20 | 66.10 | OOT | 45.03 | 50.70 | 54.75 | 60.40 | 56.57 | 82.78 | 85.59 |
| | ACC | 54.69 | 49.54 | 65.83 | OOT | 45.36 | 50.77 | 54.33 | 60.13 | 56.37 | 82.33 | 85.53 |
| American | AUC | 49.83 | 50.05 | 49.83 | OOT | 50.21 | 50.62 | 50.11 | 49.91 | 50.30 | 79.52 | 80.52 |
| | ACC | 49.15 | 50.57 | 50.56 | OOT | 50.17 | 51.25 | 50.92 | 50.14 | 51.03 | 79.20 | 80.98 |
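Table 2 reports AUC and ACC for link prediction on the learned embeddings. The snippet below shows one common evaluation protocol as an assumed example (this page does not spell out the paper's exact setup): edge features are built as element-wise products of the two endpoint embeddings, a logistic-regression classifier is trained on observed edges versus sampled non-edges, and AUC and accuracy are computed on a held-out split. The function `evaluate_link_prediction` and its parameters are hypothetical names introduced for illustration.

```python
# Hypothetical link prediction evaluation sketch (not necessarily the paper's
# protocol): Hadamard edge features + logistic regression, scored by AUC / ACC.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, accuracy_score

def evaluate_link_prediction(emb, pos_edges, neg_edges, test_ratio=0.3, seed=0):
    rng = np.random.default_rng(seed)
    pairs = np.vstack([pos_edges, neg_edges])
    X = np.vstack([emb[u] * emb[v] for u, v in pairs])        # Hadamard edge features
    y = np.concatenate([np.ones(len(pos_edges)), np.zeros(len(neg_edges))])
    idx = rng.permutation(len(y))
    split = int(len(y) * (1 - test_ratio))
    train, test = idx[:split], idx[split:]
    clf = LogisticRegression(max_iter=1000).fit(X[train], y[train])
    prob = clf.predict_proba(X[test])[:, 1]
    return roc_auc_score(y[test], prob), accuracy_score(y[test], (prob > 0.5).astype(float))

# Toy usage with random embeddings and randomly sampled edge / non-edge pairs.
emb = np.random.rand(100, 16)
pos = np.random.randint(0, 100, size=(200, 2))
neg = np.random.randint(0, 100, size=(200, 2))
auc, acc = evaluate_link_prediction(emb, pos, neg)
print(f"AUC={auc:.3f}, ACC={acc:.3f}")
```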
Table 3 Comparison of storage space

| Dataset | Nodes | Edges | Node types | Graph type | G/MB | EMB/MB | FELS/MB |
|---|---|---|---|---|---|---|---|
| Yahoo | 100 058 | 1 057 050 | 2 | unweighted | 74 | 91 | 9 |
| Digg | 283 183 | 4 742 055 | 2 | unweighted | 109 | 321 | 9 |
| Dbpedia | 495 940 | 921 710 | 4 | unweighted | 13 | 448 | 16 |
| Bibsonomy | 977 914 | 3 754 828 | 3 | weighted | 178 | 883 | 16 |
Table 4 Link prediction comparison of FELS-L

| Dataset | AUC/% | AUC/% | AUC/% |
|---|---|---|---|
| Hhar3 | 96.43 | 97.30 | 97.42 |
| Hhar10 | 95.13 | 96.17 | 96.38 |
| Reut3 | 93.42 | 94.12 | 94.07 |
| Reut10 | 84.04 | 83.94 | 83.39 |
| Movie | 86.54 | 85.59 | 83.70 |
| American | 79.88 | 80.52 | 80.29 |
[1] | WU Y, WANG Y, WANG X, et al. Motif-based hypergraph convolution network for semi-supervised node classification on heterogeneous graph[J]. Chinese Journal of Computers, 2021, 44(11): 2248-2260. |
[2] | LIU Y, SAFAVI T, DIGHE A, et al. Graph summarization methods and applications: a survey[J]. ACM Computing Surveys, 2018, 51(3): 1-34. |
[3] | QIU J, DONG Y, MA H, et al. Network embedding as matrix factorization: unifying DeepWalk, LINE, PTE, and Node2vec[C]// Proceedings of the 11th ACM International Conference on Web Search and Data Mining, Marina Del Rey, Feb 5-9, 2018. New York: ACM, 2018: 459-467. |
[4] | DE RIDDER D, KOUROPTEVA O, OKUN O, et al. Supervised locally linear embedding[C]// LNCS 2714: Proceedings of the Joint International Conference ICANN/ICONIP 2003, Istanbul, Jun 26-29, 2003. Berlin, Heidelberg: Springer, 2003: 333-341. |
[5] | BELKIN M, NIYOGI P. Laplacian Eigenmaps and spectral techniques for embedding and clustering[C]// Advances in Neural Information Processing Systems 14, Vancouver, Dec 3-8, 2001: 585-591. |
[6] | AHMED A, SHERVASHIDZE N, NARAYANAMURTHY S, et al. Distributed large-scale natural graph factorization[C]// Proceedings of the 22nd International Conference on World Wide Web, Rio de Janeiro, Brazil, May 13-17, 2013. New York: ACM, 2013: 37-48. |
[7] | CAO S, LU W, XU Q. GraRep: learning graph representations with global structural information[C]// Proceedings of the 24th ACM International Conference on Information and Knowledge Management, Melbourne, Oct 19-23, 2015. New York: ACM, 2015: 891-900. |
[8] | OU M, CUI P, PEI J, et al. Asymmetric transitivity preserving graph embedding[C]// Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, Aug 13-17, 2016. New York: ACM, 2016: 1105-1114. |
[9] | PEROZZI B, AL-RFOU R, SKIENA S. DeepWalk: online learning of social representations[C]// Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, Aug 24-27, 2014. New York: ACM, 2014: 701-710. |
[10] | TANG J, QU M, WANG M, et al. LINE: large-scale information network embedding[C]// Proceedings of the 24th International Conference on World Wide Web, Florence, May 18-22, 2015. New York: ACM, 2015: 1067-1077. |
[11] | GROVER A, LESKOVEC J. Node2vec: scalable feature learning for networks[C]// Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, Aug 13-17, 2016. New York: ACM, 2016: 855-864. |
[12] | KIPF T N, WELLING M. Semi-supervised classification with graph convolutional networks[J]. arXiv:1609.02907, 2016. |
[13] | WANG D, CUI P, ZHU W. Structural deep network embedding[C]// Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, Aug 13-17, 2016. New York: ACM, 2016: 1225-1234. |
[14] | GUO S, WANG Y, YUAN H, et al. TAERT: triple-attentional explainable recommendation with temporal convolutional network[J]. Information Sciences, 2021, 567: 185-200. |
[15] | RIBEIRO L F R, SAVERESE P H P, FIGUEIREDO D R. Struc2vec: learning node representations from structural identity[C]// Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, Aug 13-17, 2017. New York: ACM, 2017: 385-394. |
[16] | NAVLAKHA S, RASTOGI R, SHRIVASTAVA N. Graph summarization with bounded error[C]// Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, Vancouver, Jun 10-12, 2008. New York: ACM, 2008: 419-432. |
[17] | MACCIONI A, ABADI D J. Scalable pattern matching over compressed graphs via dedensification[C]// Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, Aug 13-17, 2016. New York: ACM, 2016: 1755-1764. |
[18] | ZHU L, GHASEMI-GOL M, SZEKELY P, et al. Unsupervised entity resolution on multi-type graphs[C]// LNCS 9981: Proceedings of the 15th International Semantic Web Conference, Kobe, Oct 17-21, 2016. Cham: Springer, 2016: 649-667. |
[19] | LI C T, LIN S D. Egocentric information abstraction for heterogeneous social networks[C]// Proceedings of the 2009 International Conference on Advances in Social Network Analysis and Mining, Athens, Jul 20-22, 2009. Washington: IEEE Computer Society, 2009: 255-260. |
[20] | SHAH N, KOUTRA D, ZOU T, et al. TimeCrunch: interpretable dynamic graph summarization[C]// Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, Aug 10-13, 2015. New York: ACM, 2015: 1055-1064. |
[21] | JIN D, ROSSI R A, KOH E, et al. Latent network summarization: bridging network embedding and summarization[C]// Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Anchorage, Aug 4-8, 2019. New York: ACM, 2019: 987-997. |
[22] | ZHOU C D, ZENG B Q, WANG S Y, et al. Chinese summarization research on combination of local attention and convolutional neural network[J]. Computer Engineering and Applications, 2019, 55(8): 132-137. |
[23] | TIAN K K, ZHOU R Y, DONG H Y, et al. An abstractive summarization method based on encoder-sharing and gated network[J]. Acta Scientiarum Naturalium Universitatis Pekinensis, 2020, 56(1): 61-67. |