Protein-HVGAE: Protein Encoding Method in Hyperbolic Space

doi:10.3778/j.issn.1673-9418.2105041

Abstract

Protein function prediction, protein-protein interaction (PPI) prediction, and complex identification in PPI networks are important tasks in bioinformatics, and all of them depend heavily on how proteins are encoded. A PPI network is a scale-free network dominated by a small number of hub nodes. As a result, embedding methods in traditional Euclidean space struggle to capture its hierarchical structure and produce unsatisfactory protein embeddings. This paper proposes Protein-HVGAE (hyperbolic graph auto-encoder for protein interaction networks), a protein auto-encoder in hyperbolic space. The model uses two hyperbolic graph convolutional networks as encoders, which compute the mean and variance of the hidden layer. It captures the hierarchical structure of the PPI network in hyperbolic spaces of different curvatures, so that each node receives a distinguishable low-dimensional representation. The decoder is the Fermi-Dirac function, and the network is reconstructed through an inner product operation in hyperbolic space. Experiments on three PPI networks show that the model outperforms previous Euclidean-space encoding methods on two downstream tasks, PPI prediction and protein function prediction. Compared with the VGAE model, it improves AUC in PPI prediction by about 0.07 and Macro-F1 in protein function prediction by about 0.02.
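The abstract names the model's ingredients but gives no formulas. The Python sketch below shows how such pieces commonly fit together, under stated assumptions. It uses the hyperboloid (Lorentz) model of curvature -1/k. A node embedding is sampled with the standard VGAE reparameterization in the tangent space at the origin and then lifted onto the hyperboloid with the exponential map. An edge is scored with the Fermi-Dirac function of the squared geodesic distance, and that distance is derived from the Minkowski inner product. This follows the formulation used for hyperbolic graph convolutional networks (Chami et al., 2019) and is not taken from the Protein-HVGAE paper. All function names and the defaults `r=2.0`, `t=1.0` are hypothetical, and the hyperbolic graph convolution layers are omitted.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def minkowski_inner(x: Sequence[float], y: Sequence[float]) -> float:
    """Lorentzian (Minkowski) inner product: -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


def expmap0(v: Sequence[float], k: float = 1.0) -> Vector:
    """Exponential map at the hyperboloid origin (sqrt(k), 0, ..., 0).

    Lifts a Euclidean (tangent) vector v onto {x : <x, x>_L = -k, x0 > 0},
    a hyperbolic space of constant curvature -1/k.
    """
    sqrt_k = math.sqrt(k)
    norm = math.sqrt(sum(a * a for a in v))
    if norm == 0.0:
        return [sqrt_k] + [0.0] * len(v)
    theta = norm / sqrt_k
    scale = sqrt_k * math.sinh(theta) / norm
    return [sqrt_k * math.cosh(theta)] + [scale * a for a in v]


def hyperbolic_distance(x: Sequence[float], y: Sequence[float], k: float = 1.0) -> float:
    """Geodesic distance sqrt(k) * arcosh(-<x, y>_L / k), built from the inner product."""
    arg = -minkowski_inner(x, y) / k
    # Clamp: rounding error can push the argument slightly below 1.
    return math.sqrt(k) * math.acosh(max(arg, 1.0))


def fermi_dirac(dist: float, r: float = 2.0, t: float = 1.0) -> float:
    """Edge probability 1 / (exp((d^2 - r) / t) + 1), evaluated without overflow."""
    z = (dist * dist - r) / t
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def reparameterize(mu: Sequence[float], log_sigma: Sequence[float],
                   rng: random.Random) -> Vector:
    """VGAE-style sample mu + sigma * eps, drawn in the tangent space at the origin."""
    return [m + math.exp(s) * rng.gauss(0.0, 1.0) for m, s in zip(mu, log_sigma)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical encoder outputs (mean, log std) for two proteins.
    za = expmap0(reparameterize([0.1, 0.2], [-2.0, -2.0], rng), k=1.0)
    zb = expmap0(reparameterize([0.1, 0.25], [-2.0, -2.0], rng), k=1.0)
    print("edge probability:", fermi_dirac(hyperbolic_distance(za, zb)))
```

Working on the hyperboloid keeps the distance a simple function of a single inner product, which is one way to read the abstract's "inner product operation in hyperbolic space". Varying `k` changes the curvature of the space in which embeddings live.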

Key words: protein-protein interaction network, hyperbolic space, graph convolution, variational graph auto-encoder (VGAE), protein function prediction

WANG Haobai, SHEN Xin, HUANG Weijian, CHEN Kejia. Protein-HVGAE: Protein Encoding Method in Hyperbolic Space[J]. Journal of Frontiers of Computer Science and Technology, 2023, 17(3): 701-708.

王皓白, 沈昕, 黄尉健, 陈可佳. Protein-HVGAE：一种双曲空间中的蛋白质编码方法[J]. 计算机科学与探索, 2023, 17(3): 701-708.

References

[1] HAMILTON W L, YING Z T, LESKOVEC J. Inductive re-presentation learning on large graphs[C]//Advances in Neu-ral Information Processing Systems 30, Dec 4-9, 2017. Red Hook: Curran Associates, 2017: 1024-1035.
[2] VELI V C, KOVI C P, CUCURULL G, et al. Graph attention networks[J]. arXiv:1710.10903, 2017.
[3] LUCK K, KIM D, LAMBOURNE L, et al. A reference map of the human binary protein interactome[J]. Nature, 2020, 580(7803): 402-408.
[4] SUN T, ZHOU B, LAI L, et al. Sequence-based prediction of protein protein interaction using a deep-learning algori-thm[J]. BMC Bioinformatics, 2017, 18(1): 1-8.
[5] YOU Z, CHAN K C, HU P. Predicting protein-protein inte-ractions from primary protein sequences using a novel multi-scale local feature representation scheme and the random forest[J]. PLoS One, 2015, 10(5): e125811.
[6] YOU Z, LI X, CHAN K C. An improved sequence-based pre-diction protocol for protein-protein interactions using amino acids substitution matrix and rotation forest ensemble clas-sifiers[J]. Neurocomputing, 2017, 228: 277-282.
[7] YOU Z, YU J, ZHU L, et al. A MapReduce based parallel SVM for large-scale predicting protein-protein interactions[J]. Neurocomputing, 2014, 145: 37-43.
[8] YUE X, WANG Z, HUANG J, et al. Graph embedding on biomedical networks: methods, applications and evaluations[J]. Bioinformatics, 2020, 36(4): 1241-1251.
[9] KULMANOV M, KHAN M A, HOEHNDORF R. DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier[J]. Bioinformatics,2018, 34(4): 660-668.
[10] YAO H, GUAN J, LIU T. Denoising protein-protein interac-tion network via variational graph auto-encoder for protein complex detection[J]. Journal of Bioinformatics and Compu-tational Biology, 2020, 18(3): 2040010.
[11] ZHU L, YOU Z, HUANG D. Increasing the reliability of protein-protein interaction networks via non-convex semantic embe-dding[J]. Neurocomputing, 2013, 121: 99-107.
[12] GROVER A, LESKOVEC J. node2vec: scalable feature lear-ning for networks[C]//Proceedings of the 22nd ACM SIG-KDD International Conference on Knowledge Discovery and Data Mining, San Francisco, Aug 13-17, 2016. New York: ACM, 2016: 855-864.
[13] LEI C, RUAN J. A novel link prediction algorithm for re-constructing protein-protein interaction networks by topo-logical similarity[J]. Bioinformatics, 2013, 29(3): 355-364.
[14] KIPF T N, WELLING M. Semi-supervised classification with graph convolutional networks[J]. arXiv:1609.02907, 2016.
[15] NICKEL M, KIELA D. Poincaré embeddings for learning hierarchical representations[C]//Advances in Neural Infor-mation Processing Systems 30, Long Beach, Dec 4-9, 2017. Red Hook: Curran Associates, 2017: 6338-6347.
[16] 王强, 江昊, 羿舒文, 等. 复杂网络的双曲空间表征学习方法[J]. 软件学报, 2021, 32(1): 93-117.
WANG Q, JIANG H, YI S W, et al. Hyperbolic representa-tion learning for complex networks[J]. Journal of Software, 2021, 32(1): 93-117.
[17] NICKEL M, KIELA D. Learning continuous hierarchies in the Lorentz model of hyperbolic geometry[C]//Proceedings of the 35th International Conference on Machine Learning, Stockholmsm?ssan, Jul 10-15, 2018: 3776-3785.
[18] CHAMI I, YING R, Ré C, et al. Hyperbolic graph convo-lutional neural networks[C]//Advances in Neural Informa-tion Processing Systems 32, Vancouver, Dec 8-14, 2019: 4869-4880.
[19] KIPF T N, WELLING M. Variational graph auto-encoders[J]. arXiv:1611.07308, 2016.
[20] CHO H, BERGER B, PENG J. Compact integration of multi-network topology for functional analysis of genes[J]. Cell Systems, 2016, 3(6): 540-548.
[21] GLIGORIJEVI C V, BAROT M, BONNEAU R. deepNF: deep network fusion for protein function prediction[J]. Bioinfor-matics, 2018, 34(22): 3873-3881.
[22] HU W, LIU B, GOMES J, et al. Strategies for pre-training graph neural networks[J]. arXiv:1905.12265, 2019.
[23] IOANNIDIS V N, MARQUES A G, GIANNAKIS G B. Graph neural networks for predicting protein functions[C]//Proceedings of the 8th IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Proces-sing, Le Gosier, Dec 15-18, 2019. Piscataway: IEEE, 2019: 221-225.
[24] LIU Q, NICKEL M, KIELA D. Hyperbolic graph neural net-works[C]//Advances in Neural Information Processing Sys-tems 32, Vancouver, Dec 8-14, 2019: 8230-8241.
[25] KRIOUKOV D, PAPADOPOULOS F, KITSAK M, et al. Hyperbolic geometry of complex networks[J]. Physical Re-view E, 2010, 82(3): 36106.
[26] PAPADOPOULOS F, KITSAK M, SERRANO M A N, et al. Popularity versus similarity in growing networks[J]. Na-ture, 2012, 489(7417): 537-540.
[27] TIFREA A, CIGNEUL G, GANEA O. Poincare GloVe: hyperbolic word embeddings[J]. arXiv:1810.06546, 2018.
[28] GULCEHRE C, DENIL M, MALINOWSKI M, et al. Hy-perbolic attention networks[J]. arXiv:1805.09786, 2018.
[29] PEROZZI B, AL-RFOU R, SKIENA S. DeepWalk: online learning of social representations[C]//Proceedings of the 20th ACM SIGKDD International Conference on Know-ledge Discovery and Data Mining, New York, Aug 24-27, 2014. New York: ACM, 2014: 701-710.
[30] RIBEIRO L F R, SAVERESE P H P, FIGUEIREDO D R. Struc2vec: learning node representations from structural iden-tity[C]//Proceedings of the 23rd ACM SIGKDD Internatio-nal Conference on Knowledge Discovery and Data Mining, Halifax, Aug 13-17, 2017. New York: ACM, 2017: 385-394.
[31] GROMOV M. Hyperbolic groups[M]//GERSTEN S M.Essays in Group Theory. Berlin, Heidelberg: Springer, 1987.