Journal of Frontiers of Computer Science and Technology ›› 2014, Vol. 8 ›› Issue (1): 103-110.DOI: 10.3778/j.issn.1673-9418.1307017


Protein Structure Class Prediction Based on Autocorrelation Coefficient and PseAAC

ZHANG Yanping1,2, ZHA Yongliang1,2, ZHAO Shu1,2, DU Xiuquan1,2+   

  1. School of Computer Science and Technology, Anhui University, Hefei 230601, China
  2. Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Anhui University, Hefei 230601, China
  • Online:2014-01-01 Published:2014-01-03

Abstract: Traditional prediction methods consider only amino acid composition when constructing the feature vector. The autocorrelation coefficient, by contrast, reflects both the positions of the amino acids in a sequence and the interactions between amino acids at different positions. This paper first designs a method that combines amino acid composition with the autocorrelation coefficient to construct the feature vector. Second, building on the pseudo-amino acid composition (PseAAC) model proposed by Chou, it reconstructs the PseAAC model by extending the information it encodes, and combines the reconstructed model with the autocorrelation coefficient to construct the feature vector. Using each of the two encodings, several experiments are conducted on the datasets Z277 and Z498 and the independent test set D138, with a support vector machine (SVM) as the prediction tool. The comparison shows that the two new methods improve accuracy over the traditional amino acid composition method by 7.43% and 8.53% on average, respectively, which indicates that the new methods are effective.

Key words: protein structure class prediction, autocorrelation coefficient, pseudo-amino acid composition (PseAAC), support vector machine (SVM)
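
As a rough illustration of the first encoding described in the abstract, the following minimal sketch concatenates the 20-dimensional amino acid composition with autocorrelation coefficients of a physicochemical property along the sequence. The abstract does not give the paper's exact formula or property scale, so several things here are assumptions: the Kyte-Doolittle hydropathy index as the property, the unnormalized form r_k = (1/(L-k)) Σ h_i h_{i+k}, the default `max_lag=5`, and the function names `composition`, `autocorrelation`, and `feature_vector`, which are hypothetical.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order for the composition vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index. Assumption: the paper may use a
# different physicochemical property or a normalized scale.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(seq):
    """Return the fraction of each of the 20 amino acids in seq."""
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]


def autocorrelation(seq, max_lag):
    """Return r_1..r_max_lag, where r_k averages h_i * h_(i+k) over the sequence.

    Assumed form, not necessarily the paper's definition. Raises KeyError
    for residues outside the 20 standard amino acids.
    """
    values = [HYDROPATHY[a] for a in seq]
    n = len(values)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the sequence length")
    return [
        sum(values[i] * values[i + k] for i in range(n - k)) / (n - k)
        for k in range(1, max_lag + 1)
    ]


def feature_vector(seq, max_lag=5):
    """Concatenate composition (20 values) and autocorrelation (max_lag values)."""
    return composition(seq) + autocorrelation(seq, max_lag)
```

A vector built this way has 20 + `max_lag` entries per sequence and could be passed to any SVM implementation; the choice of lag range would need to be tuned on the training data.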
