Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2019, Vol. 13, Issue (6): 982-989. DOI: 10.3778/j.issn.1673-9418.1806028

• Artificial Intelligence •

Study of Protein Subcellular Localization Based on Convolutional-LSTM

WANG Chunyu1, XU Shanshan1, GUO Maozu1,2+, CHE Kai1, LIU Xiaoyan1

  1. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
  2. School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044, China
  • Online: 2019-06-01 Published: 2019-06-14

Abstract: Predicting the subcellular location of proteins is one of the key problems in proteomics and bioinformatics. A protein's subcellular location determines its biological function, so studying subcellular localization is important for understanding protein function. Because proteins are sequential in structure, this paper applies sequence models to the task. Two models, a convolutional neural network (CNN) and a long short-term memory (LSTM) network, are first used to mine the information contained in amino acid sequences and predict subcellular location. An integrated Convolutional-LSTM model is then constructed: a CNN extracts features from the protein sequence data, the features are combined, and the combined features are fed into an LSTM network for feature representation learning, which yields the subcellular localization result. The model reaches a classification accuracy of 0.8165, a clear improvement over traditional methods.

Key words: protein subcellular location, convolutional neural network (CNN), long short-term memory (LSTM) network, classification
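The pipeline outlined in the abstract (convolutional feature extraction over the amino acid sequence, followed by an LSTM and a classifier) can be illustrated with a minimal forward-pass sketch in plain Python. This is not the authors' implementation. The one-hot input encoding, the layer sizes, the number of location classes, the random untrained weights, the omission of bias terms, and all function names are assumptions made only for illustration. The paper's feature-combination step is not modeled separately.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(sequence):
    """Encode each residue as a 20-dimensional indicator vector (assumed encoding)."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    vectors = []
    for residue in sequence:
        vec = [0.0] * len(AMINO_ACIDS)
        if residue in index:  # unknown residues stay all-zero
            vec[index[residue]] = 1.0
        vectors.append(vec)
    return vectors


def rand_matrix(rng, rows, cols):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def conv1d_relu(inputs, filters, width):
    """Valid 1D convolution along the sequence axis, followed by ReLU (no bias)."""
    outputs = []
    for start in range(len(inputs) - width + 1):
        window = [x for step in inputs[start:start + width] for x in step]
        outputs.append([max(0.0, v) for v in matvec(filters, window)])
    return outputs


def lstm_last_state(inputs, weights, hidden_size):
    """Run a single-layer LSTM (no bias) and return the final hidden state."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in inputs:
        z = matvec(weights, x + h)  # stacked pre-activations: i, f, o, g
        i = [sigmoid(v) for v in z[0:hidden_size]]
        f = [sigmoid(v) for v in z[hidden_size:2 * hidden_size]]
        o = [sigmoid(v) for v in z[2 * hidden_size:3 * hidden_size]]
        g = [math.tanh(v) for v in z[3 * hidden_size:4 * hidden_size]]
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(sequence, num_classes=10, num_filters=8, width=3, hidden_size=6, seed=0):
    """CNN -> LSTM -> softmax over hypothetical location classes."""
    rng = random.Random(seed)
    filters = rand_matrix(rng, num_filters, width * len(AMINO_ACIDS))
    lstm_w = rand_matrix(rng, 4 * hidden_size, num_filters + hidden_size)
    out_w = rand_matrix(rng, num_classes, hidden_size)
    features = conv1d_relu(one_hot(sequence), filters, width)
    return softmax(matvec(out_w, lstm_last_state(features, lstm_w, hidden_size)))


if __name__ == "__main__":
    probs = predict("MKTAYIAKQRQISFVKSHFSRQ")  # made-up example sequence
    print(len(probs), max(probs))
```

Because the weights are random, the output probabilities carry no biological meaning. The sketch only shows how the convolutional feature maps become the time steps consumed by the LSTM, whose final hidden state feeds the classifier.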