Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2009, Vol. 3, Issue 3: 293-302. DOI: 10.3778/j.issn.1673-9418.2009.03.007

• Academic Research •

Identification of Protein Interface Residues Based on a Naïve Bayes Classifier (基于朴素贝叶斯分类器的蛋白质界面残基识别)

WANG Chishe (王池社)1,2+, CHENG Jiaxing (程家兴)1, SU Shoubao (苏守宝)1, XU Dongzhe (徐栋哲)3

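  1. Key Laboratory of Intelligent Computing and Signal Processing, Ministry of Education, Anhui University, Hefei 230039
  2. Department of Computer Science and Technology, Chaohu College, Chaohu, Anhui 238000
  3. Department of Modern Mechanics, University of Science and Technology of China, Hefei 230026
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2009-05-15 Published: 2009-05-15
  • Corresponding author: WANG Chishe (王池社)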

Identification of Interface Residues Involved in Protein-protein Interactions Using Naïve Bayes Classifier

WANG Chishe1,2+, CHENG Jiaxing1, SU Shoubao1, XU Dongzhe3   

  1. Key Laboratory of Intelligent Computing and Signal Processing, Ministry of Education, Anhui University, Hefei 230039, China
  2. Department of Computer Science and Technology, Chaohu College, Chaohu, Anhui 238000, China
  3. Department of Modern Mechanics, University of Science and Technology of China, Hefei 230026, China
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2009-05-15 Published: 2009-05-15
  • Contact: WANG Chishe

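Abstract: Identifying interface residues in protein-protein interactions has broad applications in areas such as drug design and the study of metabolism. Because a naïve Bayes classifier requires its attributes to be conditionally independent given the class, a feature model of protein interactions is built from the protein sequence profile and the solvent-accessible surface area. On a representative dataset of protein hetero-complexes, the method achieves 68.1% accuracy, a correlation coefficient of 0.201, 40.2% specificity, and 49.9% sensitivity, which is better than other methods and far better than random prediction. A three-dimensional visualization of the predictions further supports the effectiveness of the method.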
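As a rough illustration of how the conditional-independence assumption is used, the following is a minimal sketch of a naïve Bayes classifier in plain Python. It is not the authors' implementation: the Gaussian likelihood, the function names `fit_gaussian_nb` and `predict`, and the two-feature toy data (a profile score and a relative accessible surface area per residue) are all assumptions made for this example.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate the log prior and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small floor keeps the variance positive for constant features.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / len(X)), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest log posterior."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        # Independence given the class: per-feature log-likelihoods add up.
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances)
        )
    return max(model, key=log_posterior)


# Hypothetical toy residues: [profile score, relative accessible surface area].
# Label 1 = interface residue, label 0 = non-interface residue.
X = [[0.9, 0.8], [0.8, 0.7], [0.7, 0.9], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]
y = [1, 1, 1, 0, 0, 0]
model = fit_gaussian_nb(X, y)
print(predict(model, [0.85, 0.75]), predict(model, [0.15, 0.2]))
```

Because each feature contributes an independent term to the log posterior, adding a feature only adds one more term to the sum, which is why the choice of weakly dependent features matters for this kind of classifier.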

Key words: naïve Bayes classifier, protein-protein interaction interface, sequence profile, residue solvent-accessible surface area

Abstract: The identification of interface residues involved in protein-protein interactions (PPIs) has broad applications in rational drug design, the study of metabolism, and related areas. A naïve Bayes classifier for predicting PPI interface residues is proposed, with features comprising the protein sequence profile and the residue accessible surface area. The feature set is chosen to suit the naïve Bayes assumption that the attributes are independent given the class. On a diverse dataset consisting only of hetero-complex proteins, the method achieves 68.1% overall accuracy, a correlation coefficient of 0.201, 40.2% specificity, and 49.9% sensitivity in identifying interface residues, as estimated by leave-one-out cross-validation. This result indicates that the method performs substantially better than chance (zero correlation). Examination of the predictions in the context of the three-dimensional structures of the proteins demonstrates the effectiveness of the method in identifying protein-protein interaction sites.
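The abstract does not define its four measures, so the sketch below only shows the common confusion-matrix definitions, assuming the correlation coefficient is the Matthews correlation coefficient. Overall accuracy is a weighted average of sensitivity and the true-negative rate, so it always lies between the two. A 68.1% accuracy alongside 49.9% sensitivity therefore suggests that "specificity" here means TP/(TP+FP), the fraction of predicted interface residues that are correct, rather than TN/(TN+FP); that reading is an inference, not something the abstract states. The function name `interface_metrics` and the counts are hypothetical.

```python
import math


def _ratio(numerator, denominator):
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def interface_metrics(tp, fp, tn, fn):
    """Compute per-residue evaluation measures from confusion-matrix counts."""
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": _ratio(tp, tp + fn),          # recall on interface residues
        "precision": _ratio(tp, tp + fp),            # assumed meaning of "specificity"
        "true_negative_rate": _ratio(tn, tn + fp),   # the other common "specificity"
        "correlation": _ratio(tp * tn - fp * fn, mcc_denominator),  # Matthews
    }


# Hypothetical counts, not taken from the paper.
m = interface_metrics(tp=40, fp=60, tn=260, fn=40)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

A predictor that labels residues at random has an expected correlation of zero, which is the baseline the abstract compares against.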

Key words: naïve Bayes classifier, protein-protein interactions, sequence profile, residue accessible surface area