Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2021, Vol. 15, Issue 11: 2193-2205. DOI: 10.3778/j.issn.1673-9418.2006096

• Artificial Intelligence •


Recognition of RNA-Binding Protein by Fusion of Multi-view and Multi-label Learning

YANG Haitao, DENG Zhaohong, WANG Shitong   

  1. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Online: 2021-11-01  Published: 2021-11-09


Abstract:

RNA-binding protein (RBP) is the collective term for proteins that bind to RNA (ribonucleic acid) and accompany it through its regulatory and metabolic processes. A single RBP may have multiple target RNAs, and defects in its expression can cause a variety of diseases. Most existing methods build a binary classification model for one specific RBP to predict whether a given RNA binds to it, but they ignore the binding similarity and correlation between different RBPs. To address this, iDeepM applies multi-label deep learning: it combines multi-label techniques with a long short-term memory (LSTM) network to learn the binding similarity between different RBPs and to predict how a given RNA binds to multiple RBPs. However, iDeepM does not perform sufficient feature learning or multi-label learning on the RNA sequences, and its prediction accuracy is relatively low. Building on the multi-label approach of iDeepM, this paper proposes a new method, RNA-RBP multiview learning (RRMVL). It is the first to address multi-label RBP recognition with multi-view data composed of four views: the RNA sequence view, the amino acid sequence view, the RNA sequence semantic view and the multi-gap dipeptide composition view. To exploit the different learning strengths of these views, RRMVL fuses the deep features extracted from all four views and applies logistic regression to learn multi-label features from them. The resulting weighted feature vectors are then fed into a multi-label classifier chain, which is trained to achieve optimal multi-label chain learning. Experiments show that the RBP recognition model combining multi-view and multi-label learning achieves significantly higher prediction accuracy than previous single-view methods.
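The abstract describes the final stage of RRMVL only at a high level: fused multi-view features are fed into a multi-label classifier chain. The following is a minimal, generic sketch of a classifier chain in plain Python, not the authors' implementation. It rests on several assumptions: each view is represented by a small toy numeric vector standing in for the deep features, fusion is plain concatenation, and each link of the chain is an ordinary binary logistic regression. It does not reproduce the paper's logistic-regression feature-weighting step, whose details the abstract does not give. The names `fuse_views`, `BinaryLogisticRegression` and `ClassifierChain` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def fuse_views(views: Sequence[Sequence[float]]) -> Vector:
    """Fuse per-view feature vectors by concatenation (an assumed fusion operator)."""
    fused: Vector = []
    for view in views:
        fused.extend(float(v) for v in view)
    return fused


class BinaryLogisticRegression:
    """Minimal logistic regression fitted by full-batch gradient descent on the mean log-loss."""

    def __init__(self, lr: float = 0.5, epochs: int = 2000) -> None:
        self.lr = lr
        self.epochs = epochs
        self.w: Vector = []
        self.b = 0.0

    @staticmethod
    def _sigmoid(z: float) -> float:
        # Numerically stable logistic function.
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)

    def predict_proba(self, x: Sequence[float]) -> float:
        return self._sigmoid(self.b + sum(w * v for w, v in zip(self.w, x)))

    def fit(self, X: List[Vector], y: List[int]) -> "BinaryLogisticRegression":
        n, d = len(X), len(X[0])
        self.w, self.b = [0.0] * d, 0.0
        for _ in range(self.epochs):
            grad_w, grad_b = [0.0] * d, 0.0
            for x, target in zip(X, y):
                err = self.predict_proba(x) - target  # derivative of log-loss w.r.t. the logit
                for j in range(d):
                    grad_w[j] += err * x[j]
                grad_b += err
            # Simultaneous update of all weights using the averaged gradient.
            self.w = [w - self.lr * g / n for w, g in zip(self.w, grad_w)]
            self.b -= self.lr * grad_b / n
        return self


class ClassifierChain:
    """Link k sees the features plus labels 0..k-1: true labels when fitting,
    the chain's own earlier predictions when predicting."""

    def __init__(self, n_labels: int) -> None:
        self.links = [BinaryLogisticRegression() for _ in range(n_labels)]

    def fit(self, X: List[Vector], Y: List[List[int]]) -> "ClassifierChain":
        for k, link in enumerate(self.links):
            X_aug = [x + [float(v) for v in labels[:k]] for x, labels in zip(X, Y)]
            link.fit(X_aug, [labels[k] for labels in Y])
        return self

    def predict(self, X: List[Vector]) -> List[List[int]]:
        results = []
        for x in X:
            preds: List[int] = []
            for link in self.links:
                p = link.predict_proba(x + [float(v) for v in preds])
                preds.append(1 if p >= 0.5 else 0)
            results.append(preds)
        return results


# Toy data: two hypothetical "views" per RNA and three hypothetical RBP labels.
view_a = [[0, 1], [1, 0], [1, 1], [0, 0]]
view_b = [[1], [1], [0], [0]]
X = [fuse_views([a, b]) for a, b in zip(view_a, view_b)]
Y = [[0, 1, 0], [1, 1, 1], [1, 0, 0], [0, 0, 0]]

chain = ClassifierChain(n_labels=3).fit(X, Y)
print(chain.predict(X))  # → [[0, 1, 0], [1, 1, 1], [1, 0, 0], [0, 0, 0]]
```

In this toy data the third label is the logical AND of the first two, so the last link can exploit the earlier labels it receives as extra inputs; this is the kind of inter-label correlation a chain is meant to capture. Because each link consumes earlier predictions rather than ground truth at prediction time, an early error can propagate down the chain, so the label order is a real design choice.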

Key words: multi-view deep feature learning, multi-label feature learning, optimal multi-label chain learning, RNA-binding protein recognition