Accurate prediction of protein structural classes using functional domains and predicted secondary structure sequences

Authors:	Amin Ahmadi Adl, Abbas Nowzari-Dalini, Bin Xue, Vladimir N. Uversky

Affiliation:	1. Department of Computer Science & Engineering, University of South Florida, Tampa, FL 33620, USA; 2. Center of Excellence in Bioinformatics, School of Mathematics, Statistics, and Computer Science, University of Tehran, Tehran, Iran; 3. Department of Molecular Medicine, College of Medicine, University of South Florida, Tampa, FL 33612, USA; 4. Institute for Biological Instrumentation, Russian Academy of Sciences, 142290 Pushchino, Moscow Region, Russia

Abstract:	Protein structural class prediction is one of the challenging problems in bioinformatics. Earlier methods based directly on the similarity of amino acid (AA) sequences have been shown to be insufficient for low-similarity protein datasets. To improve prediction accuracy for such low-similarity proteins, several recently proposed methods explore novel feature sets based on predicted secondary structure propensities. In this paper, we focus on protein structural class prediction using combinations of these novel features, including secondary structure propensities as well as functional domain (FD) features extracted from the InterPro signature database. Our comprehensive experimental results on several benchmark datasets show that integrating the new FD features substantially improves the accuracy of structural class prediction for low-similarity proteins, because these features capture meaningful relationships among AA residues that are far apart in the protein sequence. The proposed method has also been used to predict structural classes for partially disordered proteins with reasonable accuracy. This is a more difficult problem than structural class prediction on the commonly used benchmark datasets and, to the best of our knowledge, has not been attempted before. In addition, to avoid overfitting with a large number of features, feature selection is applied to retain the discriminating features that contribute to high prediction accuracy. The selected features are shown to give stable prediction performance across different benchmark datasets.

Keywords:	Protein structural class prediction; functional domains; predicted secondary structure sequences; protein secondary structure propensity; disordered proteins; support vector machines (SVMs); feature selection
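
Illustration:	The abstract describes combining features from a predicted secondary structure sequence with functional domain (FD) features from InterPro. The following is a minimal sketch of how such a combined feature vector might be assembled, not the authors' actual feature set. It assumes a three-state predicted secondary structure string (H, E, C), and the function names, the particular composition and longest-segment features, and the placeholder InterPro accessions in `FD_VOCABULARY` are all hypothetical choices made for illustration.

```python
from itertools import groupby

# Placeholder InterPro accessions used only for illustration (an assumption,
# not the signatures selected in the paper).
FD_VOCABULARY = ["IPR000001", "IPR000002", "IPR000003"]


def secondary_structure_features(ss):
    """Return, for each state H, E, C: its fraction of the sequence and the
    length of its longest uninterrupted segment divided by sequence length."""
    n = len(ss)
    if n == 0:
        raise ValueError("empty secondary structure string")
    longest = {state: 0 for state in "HEC"}
    for state, run in groupby(ss):
        if state in longest:
            longest[state] = max(longest[state], len(list(run)))
    features = []
    for state in "HEC":
        features.append(ss.count(state) / n)
        features.append(longest[state] / n)
    return features


def functional_domain_features(hits, vocabulary=FD_VOCABULARY):
    """Binary indicator per vocabulary entry: 1.0 if the protein matched it."""
    hits = set(hits)
    return [1.0 if accession in hits else 0.0 for accession in vocabulary]


def build_feature_vector(ss, hits):
    """Concatenate secondary structure and functional domain features."""
    return secondary_structure_features(ss) + functional_domain_features(hits)


vector = build_feature_vector("HHHHCCEECC", ["IPR000002"])
print(vector)
```

In a full pipeline, vectors like this would presumably pass through feature selection and then into a classifier such as the support vector machines named in the keywords; those stages are omitted here.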
