Similar literature
1.
Knowledge of structural class plays an important role in understanding protein folding patterns. In this study, a simple and powerful computational method, which combines a support vector machine with the PSI-BLAST profile, is proposed to predict protein structural class for low-similarity sequences. The evolutionary information encoded in the PSI-BLAST profiles is converted into a series of fixed-length feature vectors by extracting amino acid composition and dipeptide composition from the profiles. The resulting vectors are then fed to a support vector machine classifier for the prediction of protein structural class. To evaluate the performance of the proposed method, jackknife cross-validation tests are performed on two widely used benchmark datasets, 1189 (containing 1092 proteins) and 25PDB (containing 1673 proteins), with sequence similarity lower than 40% and 25%, respectively. The overall accuracies reach 70.7% and 72.9% for the 1189 and 25PDB datasets, respectively. Comparison of our results with other methods shows that our method is very promising for predicting protein structural class, particularly for low-similarity datasets, and may at least play an important complementary role to existing methods.
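The abstract in item 1 does not give formulas for the profile-based compositions, so the following is only a minimal sketch under assumptions: the profile is taken to be an L x 20 matrix of scores (one row per residue), the amino acid composition is assumed to be the column mean, and the dipeptide composition is assumed to be the mean product of scores at consecutive positions. The function names `pssm_aac`, `pssm_dpc` and `profile_features` are hypothetical.

```python
def pssm_aac(pssm):
    """Assumed profile-based amino acid composition: mean of each of the 20 columns."""
    length = len(pssm)
    return [sum(row[j] for row in pssm) / length for j in range(20)]


def pssm_dpc(pssm):
    """Assumed profile-based dipeptide composition: for each ordered pair (a, b),
    the mean of score[k][a] * score[k + 1][b] over consecutive positions."""
    length = len(pssm)
    features = []
    for a in range(20):
        for b in range(20):
            total = sum(pssm[k][a] * pssm[k + 1][b] for k in range(length - 1))
            features.append(total / (length - 1))
    return features


def profile_features(pssm):
    """Concatenate both compositions into one fixed-length (20 + 400) vector."""
    return pssm_aac(pssm) + pssm_dpc(pssm)


# Toy profile with three residues; real profiles would come from PSI-BLAST.
toy_pssm = [[1.0] * 20, [2.0] * 20, [3.0] * 20]
vector = profile_features(toy_pssm)
print(len(vector))
```

Whatever the sequence length, the vector has 420 entries, which is what allows proteins of different lengths to be passed to the same classifier.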

2.
Accurately identifying protein structural class using only information extracted from the protein sequence is a complicated task in computational biology. Prediction of protein structural class for low-similarity sequences remains a challenging problem. In this study, a new computational method has been developed to predict protein structural class by fusing sequence information and evolutionary information to represent a protein sample. To evaluate the performance of the proposed method, jackknife cross-validation tests are performed on two widely used benchmark datasets, 1189 and 25PDB, with sequence similarity lower than 40% and 25%, respectively. Comparison of our results with other methods shows that the proposed method is very promising and may provide a cost-effective alternative for predicting protein structural class, in particular for low-similarity datasets.

4.
Protein structural class prediction is one of the challenging problems in bioinformatics. Previous methods based directly on the similarity of amino acid (AA) sequences have been shown to be insufficient for low-similarity protein datasets. To improve the prediction accuracy for such low-similarity proteins, several methods have recently been proposed that explore novel feature sets based on predicted secondary structure propensities. In this paper, we focus on protein structural class prediction using combinations of these novel features, including secondary structure propensities as well as functional domain (FD) features extracted from the InterPro signature database. Our comprehensive experimental results on several benchmark datasets show that integrating the new FD features substantially improves the accuracy of structural class prediction for low-similarity proteins, as they capture meaningful relationships among AA residues that are far apart in the protein sequence. The proposed method has also been tested on partially disordered proteins, where it predicts structural classes with reasonable accuracy; this is a more difficult problem than structural class prediction on commonly used benchmark datasets and, to the best of our knowledge, has never been attempted before. In addition, to avoid overfitting with a large number of features, feature selection is applied to select discriminating features that contribute to achieving high prediction accuracy. The selected features have been shown to achieve stable prediction performance across different benchmark datasets.

6.
Structural class characterizes the overall folding type of a protein or its domain. A number of computational methods have been proposed to predict structural class based on primary sequences; however, the accuracy of these methods is strongly affected by sequence homology. This paper proposes an ensemble classification method and a compact feature-based sequence representation. This method improves prediction accuracy for the four main structural classes compared to competing methods, and provides highly accurate predictions for sequences of widely varying homologies. The experimental evaluation of the proposed method shows superior results for sequences spanning the entire homology spectrum, from 25% to 90% homology. The error rates were reduced by over 20% compared with using individual prediction methods and the most commonly used composition vector representation of protein sequences. Comparisons with competing methods on three large benchmark datasets consistently show the superiority of the proposed method.

7.
《Biochimie》2013,95(9):1741-1744
In this study, a 12-dimensional feature vector is constructed to reflect the general contents and spatial arrangements of the secondary structural elements of a given protein sequence. Among the 12 features, 6 novel features are specially designed to improve the prediction accuracies for the α/β and α + β classes, based on the distributions of α-helices and β-strands and the characteristics of parallel and anti-parallel β-sheets. To evaluate our method, the jackknife cross-validation test is employed on two widely used datasets, 25PDB and 1189, with sequence similarity lower than 25% and 40%, respectively. Our method outperforms recently reported methods in most cases, and the 6 newly designed features have a significant positive effect on the prediction accuracies, especially for the α/β and α + β classes.
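The 12 features of item 7 are not listed in the abstract. As a purely illustrative sketch of what "contents and spatial arrangements" of secondary structural elements can mean, the code below computes, from a predicted secondary structure string over the assumed alphabet H (helix), E (strand), C (coil), the content, the normalized longest segment, and the segment count for helices and strands. The function name `ss_features` and the choice of features are assumptions, not the paper's feature set.

```python
from itertools import groupby


def ss_features(ss):
    """Content and segment statistics for helix (H) and strand (E) states."""
    n = len(ss)
    # Collapse the string into runs, e.g. "HHHCC" -> [("H", 3), ("C", 2)].
    runs = [(state, len(list(group))) for state, group in groupby(ss)]
    features = {}
    for state in "HE":
        lengths = [length for s, length in runs if s == state]
        features[f"content_{state}"] = ss.count(state) / n
        features[f"max_seg_{state}"] = max(lengths, default=0) / n
        features[f"n_seg_{state}"] = len(lengths)
    return features


# Hypothetical predicted secondary structure for a 13-residue fragment.
print(ss_features("CHHHHCCEEECHH"))
```

Content features alone cannot separate α/β from α + β proteins, since both contain helices and strands; segment-level statistics such as these are one simple way to start describing how the elements are arranged.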

8.
This paper presents a novel feature vector based on the physicochemical properties of amino acids for predicting protein structural classes. The proposed method is divided into three stages. First, protein sequences are given a discrete time series representation using physicochemical scales. Next, a wavelet-based time-series technique is proposed for extracting features from the mapped amino acid sequence, and a fixed-length feature vector for classification is constructed. The proposed feature space summarizes the variance information of ten different biological properties of amino acids. Finally, an optimized support vector machine model is constructed for the prediction of each protein structural class. The proposed approach is evaluated using leave-one-out cross-validation tests on two standard datasets. Comparison of our results with existing approaches shows that the overall accuracy achieved by our approach is better than that of existing methods.

9.
Computational prediction of protein structural class based on sequence data remains a challenging problem in current protein science. In this paper, a new feature extraction approach based on relative polypeptide composition is introduced. This approach takes into account the background distribution of a given k-mer under a Markov model of order k-2, and avoids the curse of dimensionality as k increases by using a T-statistic feature selection strategy. The selected features are then fed to a support vector machine to perform the prediction. To verify the performance of our method, jackknife cross-validation tests are performed on four widely used benchmark datasets. Comparison of our results with existing methods shows that our method provides satisfactory performance for structural class prediction.
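The abstract in item 9 does not state how the background distribution is estimated. A common choice for a Markov model of order k-2 is to estimate the expected count of a k-mer from the counts of its two (k-1)-mers and its inner (k-2)-mer, and the sketch below assumes that estimator and a relative deviation (observed minus expected, divided by expected) as the feature. The function names are hypothetical and the sketch covers k >= 3 only.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count all overlapping k-mers in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def relative_kmer_scores(seq, k):
    """Relative deviation of each observed k-mer from its expected count
    under an assumed Markov model of order k-2:
        E(w1..wk) = N(w1..w(k-1)) * N(w2..wk) / N(w2..w(k-1))
    """
    if k < 3:
        raise ValueError("this sketch handles k >= 3 only")
    observed = kmer_counts(seq, k)
    shorter = kmer_counts(seq, k - 1)
    inner = kmer_counts(seq, k - 2)
    scores = {}
    for word, count in observed.items():
        expected = shorter[word[:-1]] * shorter[word[1:]] / inner[word[1:-1]]
        scores[word] = (count - expected) / expected
    return scores


print(relative_kmer_scores("AAB", 3))
```

Because the number of possible k-mers grows as 20^k for proteins, a feature selection step such as the T-statistic filter mentioned in the abstract is needed before classification; that step is not shown here.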

10.
Liu T  Geng X  Zheng X  Li R  Wang J 《Amino acids》2012,42(6):2243-2249
Computational prediction of protein structural class based solely on sequence data remains a challenging problem in protein science. Existing methods differ in the protein sequence representation models and prediction engines adopted. In this study, a powerful feature extraction method, which combines the position-specific score matrix (PSSM) with the auto covariance (AC) transformation, is introduced. A sample protein is thus represented by a series of discrete components, which can partially incorporate the long-range sequence order information and the evolutionary information reflected in the PSI-BLAST profile. To verify the performance of our method, jackknife cross-validation tests are performed on four widely used benchmark datasets. Comparison of our results with existing methods shows that our method provides state-of-the-art performance for structural class prediction. A Web server that implements the proposed method is freely available at http://202.194.133.5/xinxi/AAC_PSSM_AC/index.htm.
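As a rough illustration of the auto covariance transformation named in item 10, the sketch below uses the usual definition: for each PSSM column and each lag, the mean product of the centered scores at positions separated by that lag. The exact normalization and the maximum lag used in the paper are not given in the abstract, so both are assumptions here, as is the function name `pssm_auto_covariance`.

```python
def pssm_auto_covariance(pssm, max_lag):
    """Auto covariance of each PSSM column for lags 1..max_lag.

    Returns n_columns * max_lag features, independent of sequence length.
    """
    length = len(pssm)
    n_cols = len(pssm[0])
    if max_lag >= length:
        raise ValueError("max_lag must be smaller than the sequence length")
    features = []
    for j in range(n_cols):
        column = [row[j] for row in pssm]
        mean = sum(column) / length
        for lag in range(1, max_lag + 1):
            total = sum(
                (column[i] - mean) * (column[i + lag] - mean)
                for i in range(length - lag)
            )
            features.append(total / (length - lag))
    return features


# Toy two-column profile; a real PSSM has 20 columns.
toy = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0]]
print(pssm_auto_covariance(toy, 2))
```

The lag is what carries sequence-order information: a plain column average discards the positions of residues, whereas the lagged products depend on how scores co-vary along the chain.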

11.
12.
Knowledge of protein structural class can provide important information about folding patterns. Many approaches have been developed for the prediction of protein structural classes. However, the information used by these approaches is primarily based on amino acid sequences. In this study, a novel method is presented to predict protein structural classes using chemical shift (CS) information derived from nuclear magnetic resonance spectra. First, a dataset of 399 non-homologous (about 15% identity) proteins was constructed to investigate the distribution of averaged CS values of six nuclei (¹³CO, ¹³Cα, ¹³Cβ, ¹HN, ¹Hα and ¹⁵N) in three protein structural classes. Subsequently, a support vector machine was used to predict the three protein structural classes from the averaged CS information of the six nuclei. The overall accuracy in jackknife cross-validation is 87.0%. Finally, a feature selection technique is applied to exclude redundant information and find an optimized feature set. Results show that the overall accuracy increases to 88.0% when using the averaged CSs of ¹³CO, ¹Hα and ¹⁵N. The proposed approach outperformed other state-of-the-art methods in terms of predictive accuracy, in particular for low-similarity protein data. We expect that our proposed approach will be an excellent alternative to traditional methods for protein structural class prediction.

13.
One major problem with existing algorithms for the prediction of protein structural classes is their low accuracy for proteins from the α/β and α+β classes. In this study, three novel features were rationally designed to model the differences between proteins from these two classes. In combination with other rationally designed features, an 11-dimensional vector prediction method was proposed. With this method, the overall prediction accuracy on the 25PDB dataset was 1.5% higher than that of the previous best-performing method, MODAS. Furthermore, the prediction accuracy for proteins from the α+β class on the 25PDB dataset was 5% higher than that of the previous best-performing method, SCPRED. The prediction accuracies obtained with the D675 and FC699 datasets were also improved.

14.
We evaluated the occurrence frequency of i-peptides in the protein sequences of two datasets, which include proteins with sequence similarity lower than 25% and 40%, respectively. We developed a new structural class prediction algorithm using the most frequent i-peptides (with i=2, 3, 4) that characterize the four structural classes. The best results were obtained with the tri-peptides, which capture much more structural information from sequences than the di-peptides. Compared to other methods similarly founded on peptide occurrence frequencies, our method achieves the best prediction accuracy. We also compared it with methods founded on more sophisticated computational approaches.

15.
Ding S  Zhang S  Li Y  Wang T 《Biochimie》2012,94(5):1166-1171
Knowledge of structural classes plays an important role in understanding protein folding patterns. In this paper, features based on the predicted secondary structure sequence and the corresponding E–H sequence are extracted. Then, an 11-dimensional feature vector is selected using a wrapper feature selection algorithm and a support vector machine (SVM). Among the 11 selected features, 4 novel features are newly designed to model the differences between the α/β class and the α + β class, and the other 7 rational features were proposed by previous researchers. A total of 5 datasets are used to design and test the proposed method. The results show that the proposed method achieves prediction accuracies competitive with existing methods (SCPRED, RKS-PPSC and MODAS), and the 4 new features are shown to be essential for differentiating the α/β and α + β classes. A standalone version of the proposed method is written in Java and can be downloaded from http://web.xidian.edu.cn/slzhang/paper.html.

16.
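β-turns, as a type of protein secondary structure, play important roles in protein folding, protein stability, and molecular recognition. Existing β-turn prediction methods do not map the structural information of homologous sequences already present in structure databases such as the PDB onto the protein sequence to be predicted. The PDB now holds more than 70,000 structures, so for a newly determined sequence there is a good chance of finding homologous sequences in the PDB. This paper combines homologous structural information extracted from the PDB (for each query sequence, only homologous information deposited in the PDB before that sequence is used) with NetTurnP predictions to propose a new β-turn prediction method, BTMapping. On the classic BT426 dataset and the EVA937 dataset constructed in this work, the prediction performance measured by the Matthews correlation coefficient is 0.56 and 0.52, respectively, compared with 0.50 and 0.46 for NetTurnP alone; measured by Qtotal it is 81.4% and 80.4%, respectively, compared with 78.2% and 77.3% for NetTurnP alone. The results confirm that combining homologous structural information with an advanced β-turn predictor such as NetTurnP helps improve β-turn identification. The BTMapping program and related datasets are available at http://www.bio530.weebly.com.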
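Item 16 reports its results with the Matthews correlation coefficient and Qtotal. For readers unfamiliar with these measures, the sketch below computes both from per-residue binary labels (1 for turn, 0 for non-turn) using their standard definitions; the function names and the toy labels are illustrative and are not taken from BTMapping.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def matthews_cc(y_true, y_pred):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def q_total(y_true, y_pred):
    """Fraction of residues predicted correctly."""
    tp, tn, _, _ = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


observed = [1, 1, 0, 0, 1, 0, 0, 0]
predicted = [1, 0, 0, 0, 1, 1, 0, 0]
print(matthews_cc(observed, predicted), q_total(observed, predicted))
```

Because turn residues are a minority class, Qtotal can look high for a predictor that rarely predicts turns, which is why the coefficient is reported alongside it.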

17.
Subcellular location is an important functional annotation of proteins. An automatic, reliable and efficient prediction system for protein subcellular localization is necessary for large-scale genome analysis. This paper describes a protein subcellular localization method which extracts features from protein profiles rather than from amino acid sequences. The protein profile represents a protein family and discards the part of the sequence information that is not conserved throughout the family, and is therefore more sensitive than the amino acid sequence. The amino acid compositions of the whole profile and of the N-terminus of the profile are extracted to train and test the probabilistic neural network classifiers. On two benchmark datasets, the overall accuracies of the proposed method reach 89.1% and 68.9%, respectively. The prediction results show that the proposed method performs better than methods based on amino acid sequences. The prediction results of the proposed method are also compared with Subloc on two redundancy-reduced datasets.

18.
Knowledge of structural class plays an important role in understanding protein folding patterns, so it is necessary to develop effective and reliable computational methods for predicting protein structural class. To this end, we present a new method called NN-CDM, a nearest neighbor classifier with a complexity-based distance measure. Instead of extracting features from protein sequences as done previously, the distance between each pair of protein sequences is evaluated directly by a complexity measure of symbol sequences. The nearest neighbor classifier is then adopted as the predictive engine. To verify the performance of this method, jackknife cross-validation tests are performed on several benchmark datasets. Results show that our approach achieves higher prediction accuracy than some classical methods.
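The abstract in item 18 does not say which complexity measure NN-CDM uses. The sketch below illustrates the general idea with a swapped-in stand-in: the compressed size from Python's `zlib` approximates sequence complexity, and the distance between two sequences is the compressed size of their concatenation divided by the sum of their individual compressed sizes. The sequences, labels and function names are invented for illustration.

```python
import zlib


def compressed_size(sequence):
    """Approximate the complexity of a sequence by its zlib-compressed length."""
    return len(zlib.compress(sequence.encode("ascii"), 9))


def cdm(x, y):
    """Compression-based dissimilarity: close to 0.5 for very similar
    sequences and close to 1 for unrelated ones."""
    return compressed_size(x + y) / (compressed_size(x) + compressed_size(y))


def predict_class(query, training_set):
    """Return the label of the training sequence nearest to the query."""
    nearest = min(training_set, key=lambda item: cdm(query, item[0]))
    return nearest[1]


# Invented toy sequences; real use would take a benchmark such as 25PDB.
training = [
    ("AEELLKKAEELLKKAEELLKK" * 4, "all-alpha"),
    ("VTVTVYVSVTVKVTVEV" * 4, "all-beta"),
]
query = "AEELLKKAEELLKKAEELLKK" * 3
print(predict_class(query, training))
```

No feature vector is built at any point: the classifier only ever sees pairwise distances, which is the distinguishing design choice described in the abstract. General-purpose compressors carry fixed overhead on short inputs, so a dedicated complexity measure would likely behave better on real protein sequences.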

19.
Li ZC  Zhou XB  Lin YR  Zou XY 《Amino acids》2008,35(3):581-590
Structural class characterizes the overall folding type of a protein or its domain. Most existing methods for determining the structural class of a protein are based on a group of features that possess only one kind of discriminative information for the prediction of protein structural class. Other types of discriminative information associated with the primary sequence are therefore missed entirely, which undoubtedly reduces the success rate of prediction. We present a novel method for the prediction of protein structural class that couples an improved genetic algorithm (GA) with the support vector machine (SVM). The improved GA was applied to the selection of an optimized feature subset and the optimization of SVM parameters. Jackknife tests on the working datasets indicated that the prediction accuracies for the different classes were in the range of 97.8–100%, with an overall accuracy of 99.5%. The results indicate that the approach has a high potential to become a useful tool in bioinformatics.

20.
Prediction of protein structure from amino acid sequence is still a challenging problem. A complete physicochemical understanding of protein folding is essential for accurate structure prediction. Knowledge of residue solvent accessibility gives useful insights into protein structure prediction and function prediction. In this work, we propose a random forest method, RSARF, to predict residue accessible surface area from protein sequence information. Training and testing were performed using 120 proteins containing 22006 residues. For each residue, the buried or exposed state was computed using five thresholds (0%, 5%, 10%, 25%, and 50%). The prediction accuracies for the 0%, 5%, 10%, 25%, and 50% thresholds are 72.9%, 78.25%, 78.12%, 77.57% and 72.07%, respectively. Further, comparison of RSARF with other methods on a benchmark dataset containing 20 proteins shows that our approach is useful for predicting residue solvent accessibility from protein sequence without using structural information. The RSARF program, datasets and supplementary data are available at http://caps.ncbs.res.in/download/pugal/RSARF/.
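To make the five-threshold labelling in item 20 concrete, the sketch below turns relative solvent accessibility values (in percent) into buried/exposed states at each threshold. The convention assumed here, exposed when the value is strictly greater than the threshold, is a common one but is not stated in the abstract; the function name and input values are illustrative.

```python
THRESHOLDS = (0, 5, 10, 25, 50)  # percent, as listed in the abstract


def burial_states(rsa_percent, thresholds=THRESHOLDS):
    """Label each residue 'E' (exposed) or 'B' (buried) at every threshold.

    Assumed convention: exposed if relative accessibility > threshold.
    """
    return {
        t: ["E" if value > t else "B" for value in rsa_percent]
        for t in thresholds
    }


# Invented relative accessibilities for four residues.
states = burial_states([0.0, 7.5, 30.0, 60.0])
for threshold, labels in states.items():
    print(threshold, "".join(labels))
```

Each threshold defines a separate two-class problem with a different class balance, which is one reason the reported accuracy varies across the five thresholds.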
