Journal of Frontiers of Computer Science and Technology ›› 2021, Vol. 15 ›› Issue (10): 1980-1989. DOI: 10.3778/j.issn.1673-9418.2007073

• Artificial Intelligence •

Research on Protein Complex Recognition Using Hidden Markov Model

LI Peng, LUO Aijing, MIN Hui, TAN Sunyi, GUO Huimin

  1. The Third Xiangya Hospital of Central South University, Changsha 410006, China
  2. School of Informatics, Hunan University of Chinese Medicine, Changsha 410208, China
  3. Key Laboratory of Medical Information Research (Central South University), College of Hunan Province, Changsha 410006, China
  4. Software Department, Hunan College of Information, Changsha 410200, China
  • Online: 2021-10-01  Published: 2021-09-30

Abstract:

The construction of dynamic protein networks and the recognition of protein complexes are active research topics in bioinformatics. To address the shortcomings of existing algorithms on these problems, a protein complex recognition algorithm based on the hidden Markov model (HMM-PC) is proposed. First, an initial protein network is constructed from the gene co-expression characteristics of proteins, and this network is then weighted using shared functional annotations, shared domains, and connection strength to obtain a dynamic protein network. On this basis, taking into account the influence of the network topology at the previous time point on the topology at the current time point, a hidden Markov model (HMM) is used to describe the relationship between protein complexes and the individuals in the network. The recognition of protein complexes in the dynamic protein network is thereby modeled as the problem of finding the optimal state sequence in an HMM, and the complexes are identified with the Viterbi algorithm. Finally, theoretical analysis shows that the proposed algorithm has low complexity. Yeast protein networks from the DIP and MIPS data sets are used as test objects, and extensive simulation results show that HMM-PC is robust and outperforms existing complex recognition algorithms in terms of recall, precision, F-measure, and efficiency.

Key words: dynamic protein network, protein complex, hidden Markov model, state sequence, Viterbi algorithm
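
The abstract casts complex recognition as finding the optimal state sequence of an HMM and solves it with the Viterbi algorithm, but it does not give the states, observations, or probabilities used by HMM-PC. The following is therefore only a minimal, generic sketch of Viterbi decoding for a discrete HMM. The function name `viterbi`, the state labels `in_complex`/`outside`, the observation labels `strong`/`weak`, and all probability values are hypothetical and chosen purely for illustration; they are not taken from the paper.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (best_state_path, best_log_probability) for a discrete HMM."""
    if not observations:
        return [], 0.0

    def log(p):
        # Work in log space so long sequences do not underflow.
        return math.log(p) if p > 0 else float("-inf")

    # score[s]: best log-probability of any state path that ends in state s.
    score = {s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}
    backpointers = []

    for obs in observations[1:]:
        new_score, pointer = {}, {}
        for s in states:
            # Best predecessor of s at the previous time step.
            prev = max(states, key=lambda r: score[r] + log(trans_p[r][s]))
            new_score[s] = score[prev] + log(trans_p[prev][s]) + log(emit_p[s][obs])
            pointer[s] = prev
        score = new_score
        backpointers.append(pointer)

    # Backtrack from the best final state to recover the full path.
    last = max(states, key=lambda s: score[s])
    path = [last]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    path.reverse()
    return path, score[last]


# Toy example (all labels and numbers are illustrative assumptions).
states = ["in_complex", "outside"]
start_p = {"in_complex": 0.5, "outside": 0.5}
trans_p = {
    "in_complex": {"in_complex": 0.8, "outside": 0.2},
    "outside": {"in_complex": 0.2, "outside": 0.8},
}
emit_p = {
    "in_complex": {"strong": 0.9, "weak": 0.1},
    "outside": {"strong": 0.2, "weak": 0.8},
}

path, log_prob = viterbi(["strong", "strong", "weak", "weak"],
                         states, start_p, trans_p, emit_p)
print(path)  # ['in_complex', 'in_complex', 'outside', 'outside']
```

For T observations and N hidden states, this decoding takes O(T·N²) time. How HMM-PC defines its states and estimates the transition and emission probabilities from the weighted dynamic network is described in the full paper, not in this abstract.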