Similar articles
1.
Information on relative solvent accessibility (RSA) of amino acid residues in proteins provides valuable clues to the prediction of protein structure and function. A two-stage approach with support vector machines (SVMs) is proposed, where an SVM predictor is introduced to the output of the single-stage SVM approach to take into account the contextual relationships among solvent accessibilities for the prediction. By using the position-specific scoring matrices (PSSMs) generated by PSI-BLAST, the two-stage SVM approach achieves accuracies up to 90.4% and 90.2% on the Manesh data set of 215 protein structures and the RS126 data set of 126 nonhomologous globular proteins, respectively, which are better than the highest published scores on both data sets to date. A Web server for protein RSA prediction using a two-stage SVM method has been developed and is available (http://birc.ntu.edu.sg/~pas0186457/rsa.html).

2.
3.
Wang JY  Lee HM  Ahmad S 《Proteins》2007,68(1):82-91
A number of methods for predicting levels of solvent accessibility or accessible surface area (ASA) of amino acid residues in proteins have been developed. These methods either predict regularly spaced states of relative solvent accessibility or an analogue real value indicating relative solvent accessibility. While discrete states of exposure can be easily obtained by post-prediction assignment of thresholds to the predicted or computed real values of ASA, the reverse, that is, obtaining a real value from quantized states of predicted ASA, is not straightforward, as a two-state prediction in such cases would give large real-valued errors. However, predicting ASA into a larger number of states and then finding a corresponding scheme for real-value prediction may help integrate the two approaches to ASA prediction. We report a novel method of obtaining numerical real values of solvent accessibility, using an accumulation cutoff set and a support vector machine. This so-called SVM-Cabins method first predicts discrete states of ASA of amino acid residues from their evolutionary profile and then maps the predicted states onto a real-valued linear space by simple algebraic methods. The resulting performance of such a rigorous approach using 13-state ASA prediction is at least comparable with the best methods of ASA prediction reported so far. The mean absolute error of this method reaches its best value of 15.1% on the tested data set of 502 proteins, with a correlation coefficient of 0.66. Since the method starts with the prediction of discrete states of ASA and leads to real-value predictions, prediction performance in binary states and in real values is simultaneously optimized.
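The asymmetry described in this abstract (thresholding a real value is easy, recovering a real value from a state is lossy) can be illustrated with a small sketch. This is not the SVM-Cabins procedure: the equal-width bins, the midpoint mapping, and the names `asa_to_state` and `state_to_asa` are assumptions made only for illustration.

```python
def asa_to_state(rsa, n_states=13):
    """Quantize a relative accessibility (percent, 0-100) into an equal-width bin index.

    Equal-width bins are an assumption; the abstract does not give the cutoffs.
    """
    if not 0.0 <= rsa <= 100.0:
        raise ValueError("rsa must be a percentage between 0 and 100")
    width = 100.0 / n_states
    # Clamp so that rsa == 100 falls into the last bin.
    return min(int(rsa // width), n_states - 1)


def state_to_asa(state, n_states=13):
    """Map a bin index back to a real value (the bin midpoint, an assumed scheme)."""
    if not 0 <= state < n_states:
        raise ValueError("state index out of range")
    width = 100.0 / n_states
    return (state + 0.5) * width
```

Under these assumptions the worst-case round-trip error is half a bin width: 25 percentage points with two states, but about 3.8 with 13 states, which is why a finer state set makes the reverse mapping more useful.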

4.
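Prediction of protein solvent accessibility based on support vector machines   Total citations: 1, self-citations: 0, citations by others: 1
Residues in protein sequences were divided by relative solvent accessibility into two classes (surface/interior) and three classes (surface/intermediate/interior) for prediction. Different window widths and parameters were used for training and prediction to obtain the best classification performance, and the results were compared with other existing methods. Predictions on the same data set at different classification thresholds show that the support vector machine method gives better overall prediction of protein solvent accessibility than neural network and information theory methods. The best classification accuracy reached 79.0% for the two-class data and 67.5% for the three-class data, indicating that the support vector machine is an effective method for predicting the solvent accessibility of protein residues.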

5.
Ahmad S 《Gene》2009,428(1-2):25-30
Solvent accessibility of amino acid residues in proteins has been widely studied and many methods for its prediction from sequence and evolutionary information are available. Some of the advantages of studying amino acid solvent accessibility also apply to DNA. However, currently there are no methods to estimate the solvent accessibility of nucleotides, as most works on DNA structures have focused on elastic deformations and other structural attributes. In this work, an attempt has been made to analyze the distribution of different nucleotides in various accessibility ranges. Effect of neighboring nucleotides on the predictability of exposure has been evaluated by developing a linear perceptron model that takes sequence information as the input. Five different types of solvent accessibility (overall nucleotide, side chain, main chain, polar and non-polar) have been predicted. From the analysis, it is observed that Thymine stands out in terms of its higher exposed surface area, particularly its side chain and non-polar atoms. It is also concluded that the solvent accessibility of a nucleotide strongly depends on its sequence neighbors and can be predicted with fair success using this information.

6.
Wang JY  Ahmad S  Gromiha MM  Sarai A 《Biopolymers》2004,75(3):209-216
We developed dictionaries of two-, three-, and five-residue patterns in proteins and computed the average solvent accessibility of the central residues in their native proteins. These dictionaries serve as a look-up table for making subsequent predictions of solvent accessibility of amino acid residues. We find that predictions made in this way are very close to those made using more sophisticated methods of solvent accessibility prediction. We also analyzed the effect of immediate neighbors on the solvent accessibility of residues. This helps us in understanding how the same residue type may have different accessible surface areas in different proteins and in different positions of the same protein. We observe that certain residues have a tendency to increase or decrease the solvent accessibility of their neighboring residues in C- or N-terminal positions. Interestingly, the C-terminal and N-terminal neighbor residues are found to have asymmetric roles in modifying solvent accessibility of residues. As expected, similar neighbors enhance the hydrophobic or hydrophilic character of residues. Detailed look-up tables are provided on the web at www.netasa.org/look-up/.
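The look-up idea in this abstract can be sketched roughly as below. This is an illustration under stated assumptions, not the authors' code: it handles only odd-length patterns (three residues for `flank=1`, five for `flank=2`), returns a caller-supplied default for unseen patterns and chain ends, and the names `build_lookup` and `predict_from_lookup` are hypothetical.

```python
from collections import defaultdict


def build_lookup(chains, flank=1):
    """Build {pattern: mean accessibility of the central residue}.

    chains: iterable of (sequence, accessibility_values) pairs of equal length.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for seq, values in chains:
        if len(seq) != len(values):
            raise ValueError("sequence and accessibility lengths differ")
        for i in range(flank, len(seq) - flank):
            pattern = seq[i - flank:i + flank + 1]
            sums[pattern] += values[i]
            counts[pattern] += 1
    return {pattern: sums[pattern] / counts[pattern] for pattern in sums}


def predict_from_lookup(seq, table, flank=1, default=None):
    """Predict accessibility per residue by looking up its surrounding pattern."""
    predictions = []
    for i in range(len(seq)):
        if i < flank or i >= len(seq) - flank:
            # No full pattern is available at the chain ends.
            predictions.append(default)
        else:
            predictions.append(table.get(seq[i - flank:i + flank + 1], default))
    return predictions
```

A table built from two chains `("AGA", [10, 20, 30])` and `("AGA", [0, 40, 0])` would store a mean of 30.0 for the central glycine of the pattern `AGA`.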

7.
Li X  Pan XM 《Proteins》2001,42(1):1-5
A novel method was developed for predicting solvent accessibility. Based on single-sequence data, this method achieved 71.5% accuracy with a correlation coefficient of 0.42 in a database of 704 proteins, with a threshold of 20% for a two-state definition of solvent accessibility. Prediction in a data subset of 341 monomeric proteins achieved 72.7% accuracy with a correlation coefficient of 0.43. On average, prediction over short chains gives better results than that over long chains. With a solvent accessibility threshold of 20%, prediction over 236 monomeric proteins with chain length < 300 amino acid residues achieved 75.3% accuracy with a correlation coefficient of 0.44 by jackknife analysis, which is higher than that obtained by previous methods using multiple sequence alignments.

8.
Faraggi E  Xue B  Zhou Y 《Proteins》2009,74(4):847-856
This article attempts to increase the prediction accuracy of residue solvent accessibility and real-value backbone torsion angles of proteins through improved learning. Most methods developed for improving the backpropagation algorithm of artificial neural networks are limited to small neural networks. Here, we introduce a guided-learning method suitable for networks of any size. The method employs a part of the weights for guiding and the other part for training and optimization. We demonstrate this technique by predicting residue solvent accessibility and real-value backbone torsion angles of proteins. In this application, the guiding factor is designed to satisfy the intuitive condition that for most residues, the contribution of a residue to the structural properties of another residue is smaller for greater separation in the protein-sequence distance between the two residues. We show that the guided-learning method makes a 2-4% reduction in 10-fold cross-validated mean absolute errors (MAE) for predicting residue solvent accessibility and backbone torsion angles, regardless of the size of the database, the number of hidden layers and the size of input windows. This, together with the introduction of a two-layer neural network with a bipolar activation function, leads to a new method that has an MAE of 0.11 for residue solvent accessibility, 36 degrees for psi, and 22 degrees for phi. The method is available as the Real-SPINE 3.0 server at http://sparks.informatics.iupui.edu.

9.
Wang  Cui-cui  Fang  Yaping  Xiao  Jiamin  Li  Menglong 《Amino acids》2011,40(1):239-248
RNA–protein interactions play a pivotal role in various biological processes, such as mRNA processing, protein synthesis, assembly, and function of ribosome. In this work, we have introduced a computational method for predicting RNA-binding sites in proteins based on support vector machines by using a variety of features from amino acid sequence information including position-specific scoring matrix (PSSM) profiles, physicochemical properties and predicted solvent accessibility. Considering the influence of the surrounding residues of an amino acid and the dependency effect from the neighboring amino acids, a sliding window and a smoothing window are used to encode the PSSM profiles. The outer fivefold cross-validation method is evaluated on the data set of 77 RNA-binding proteins (RBP77). It achieves an overall accuracy of 88.66% with the Matthew’s correlation coefficient (MCC) of 0.69. Furthermore, an independent data set of 39 RNA-binding proteins (RBP39) is employed to further evaluate the performance and achieves an overall accuracy of 82.36% with the MCC of 0.44. The result shows that our method has good generalization abilities in predicting RNA-binding sites for novel proteins. Compared with other previous methods, our method performs well on the same data set. The prediction results suggest that the used features are effective in predicting RNA-binding sites in proteins. The code and all data sets used in this article are freely available at .
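A sliding-window encoding of a PSSM, as mentioned in this abstract, might look like the sketch below. Only the sliding window is shown (not the smoothing window), and the window half-width, the zero padding at the chain ends, and the name `window_features` are assumptions rather than details taken from the paper.

```python
def window_features(pssm, half_width=2):
    """Return one flat feature vector per residue from a sliding window over a PSSM.

    pssm: list of rows, one row of scores per residue.
    Positions beyond either terminus are filled with zeros (an assumed convention).
    """
    if not pssm:
        return []
    n_cols = len(pssm[0])
    padding_row = [0.0] * n_cols
    features = []
    for i in range(len(pssm)):
        vector = []
        for j in range(i - half_width, i + half_width + 1):
            row = pssm[j] if 0 <= j < len(pssm) else padding_row
            vector.extend(row)
        features.append(vector)
    return features
```

Each residue is then described by `(2 * half_width + 1) * n_cols` numbers, so the classifier sees the profile of the residue together with that of its sequence neighbors.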

10.
Discriminating outer membrane proteins (OMPs) from other folding types of globular and membrane proteins is an important task both for identifying outer membrane proteins from genomic sequences and for the successful prediction of their secondary and tertiary structures. In this work, we have analyzed the influence of physico-chemical, energetic and conformational properties of amino acid residues for discriminating outer membrane proteins using different machine learning algorithms, such as Bayes rules, logistic functions, neural networks, support vector machines and decision trees. We observed that most of the properties discriminate the OMPs with similar accuracy. The neural network method with the free energy change property could discriminate the OMPs from other folding types of globular and membrane proteins at a 5-fold cross-validation accuracy of 94.4% in a dataset of 1,088 proteins, which is better than that obtained with amino acid composition. The accuracy of discriminating globular proteins is 94.3% and that of transmembrane helical (TMH) proteins is 91.8%. Further, the neural network method was tested with globular proteins belonging to 30 major folding types, and it could successfully exclude 99.4% of the considered 1,612 non-redundant proteins. These accuracy levels are comparable to or better than other methods in the literature. We suggest that this method could be effectively used to discriminate OMPs and for detecting OMPs in genomic sequences.

11.
Bhardwaj N  Lu H 《FEBS letters》2007,581(5):1058-1066
Protein-DNA interactions are crucial to many cellular activities such as expression-control and DNA-repair. These interactions between amino acids and nucleotides are highly specific and any aberrance at the binding site can render the interaction completely incompetent. In this study, we have three aims focusing on DNA-binding residues on the protein surface: to develop an automated approach for fast and reliable recognition of DNA-binding sites; to improve the prediction by distance-dependent refinement; and to use these predictions to identify DNA-binding proteins. We use a support vector machine (SVM)-based approach to harness the features of the DNA-binding residues to distinguish them from non-binding residues. Features used for distinction include the residue's identity, charge, solvent accessibility, average potential, the secondary structure it is embedded in, neighboring residues, and location in a cationic patch. These features, collected from 50 proteins, are used to train the SVM. Testing is then performed on another set of 37 proteins, much larger than any testing set used in previous studies. The testing set has no more than 20% sequence identity not only among its pairs, but also with the proteins in the training set, thus removing any undesired redundancy due to homology. This set also has proteins with an unseen DNA-binding structural class not present in the training set. With the above features, an accuracy of 66% with balanced sensitivity and specificity is achieved without relying on homology or evolutionary information. We then develop a post-processing scheme to improve the prediction using the relative location of the predicted residues. Balanced success is then achieved with average sensitivity, specificity and accuracy pegged at 71.3%, 69.3% and 70.5%, respectively. Average net prediction is also around 70%. Finally, we show that the number of predicted DNA-binding residues can be used to differentiate DNA-binding proteins from non-DNA-binding proteins with an accuracy of 78%. Results presented here demonstrate that machine-learning can be applied to automated identification of DNA-binding residues and that the success rate can be ameliorated as more features are added. Such functional site prediction protocols can be useful in guiding consequent works such as site-directed mutagenesis and macromolecular docking.

12.
Knowing the coordination number and relative solvent accessibility of all the residues in a protein is crucial for deriving constraints useful in modeling protein folding and protein structure and in scoring remote homology searches. We develop ensembles of bidirectional recurrent neural network architectures to improve the state of the art in both contact and accessibility prediction, leveraging a large corpus of curated data together with evolutionary information. The ensembles are used to discriminate between two different states of residue contacts or relative solvent accessibility, higher or lower than a threshold determined by the average value of the residue distribution or the accessibility cutoff. For coordination numbers, the ensemble achieves performances ranging within 70.6-73.9% depending on the radius adopted to discriminate contacts (6-12 Å). These performances represent gains of 16-20% over the baseline statistical predictor, always assigning an amino acid to the largest class, and are 4-7% better than any previous method. A combination of different radius predictors further improves performance. For accessibility thresholds in the relevant 15-30% range, the ensemble consistently achieves a performance above 77%, which is 10-16% above the baseline prediction and better than other existing predictors, by up to several percentage points. For both problems, we quantify the improvement due to evolutionary information in the form of PSI-BLAST-generated profiles over BLAST profiles. The prediction programs are implemented in the form of two web servers, CONpro and ACCpro, available at http://promoter.ics.uci.edu/BRNN-PRED/.

13.
Ho SY  Yu FC  Chang CY  Huang HL 《Bio Systems》2007,90(1):234-241
In this paper, we investigate the design of accurate predictors for DNA-binding sites in proteins from amino acid sequences. As a result, we propose a hybrid method using a support vector machine (SVM) in conjunction with evolutionary information of amino acid sequences in terms of their position-specific scoring matrices (PSSMs) for prediction of DNA-binding sites. Considering that the numbers of binding and non-binding residues in proteins are significantly unequal, two additional weights as well as SVM parameters are analyzed and adopted to maximize net prediction (NP, an average of sensitivity and specificity) accuracy. To evaluate the generalization ability of the proposed method SVM-PSSM, a DNA-binding dataset PDC-59 consisting of 59 protein chains with low sequence identity to each other is additionally established. The SVM-based method using the same six-fold cross-validation procedure and PSSM features has NP=80.15% for the training dataset PDNA-62 and NP=69.54% for the test dataset PDC-59, which are much better than the existing neural network-based method, increasing the NP values for training and test accuracies by up to 13.45% and 16.53%, respectively. Simulation results reveal that SVM-PSSM performs well in predicting DNA-binding sites of novel proteins from amino acid sequences.
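Net prediction as defined in this abstract (the average of sensitivity and specificity) can be computed from confusion-matrix counts as in the sketch below. The function names and the example counts are illustrative assumptions, not values from the paper.

```python
def net_prediction(tp, fn, tn, fp):
    """Average of sensitivity and specificity from confusion-matrix counts."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    sensitivity = tp / (tp + fn)  # fraction of binding residues recovered
    specificity = tn / (tn + fp)  # fraction of non-binding residues recovered
    return (sensitivity + specificity) / 2.0


def accuracy(tp, fn, tn, fp):
    """Plain fraction of correct predictions, for comparison with NP."""
    total = tp + fn + tn + fp
    if total == 0:
        raise ValueError("no examples")
    return (tp + tn) / total
```

With hypothetical counts of 10 true positives, 10 false negatives, 180 true negatives and 0 false positives, plain accuracy is 0.95 while NP is 0.75, which shows why NP is the more informative target when binding residues are much rarer than non-binding ones.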

14.
Joo K  Lee SJ  Lee J 《Proteins》2012,80(7):1791-1797
We present a method to predict the solvent accessibility of proteins which is based on a nearest neighbor method applied to the sequence profiles. Using the method, continuous real-value prediction as well as two-state and three-state discrete predictions can be obtained. The method utilizes the z-score value of the distance measure in the feature vector space to estimate the relative contribution among the k-nearest neighbors for prediction of the discrete and continuous solvent accessibility. The solvent accessibility database is constructed from 5717 proteins extracted from the PISCES culling server with a cutoff of 25% sequence identity. Using optimal parameters, prediction accuracies (for discrete predictions) of 78.38% (two-state prediction with the threshold of 25%) and 65.1% (three-state prediction with the thresholds of 9 and 36%), and a Pearson correlation coefficient (between the predicted and true RSAs for continuous prediction) of 0.676 are achieved. An independent benchmark test was performed with the CASP8 targets, where we find that the proposed method outperforms existing methods. The prediction accuracies are 80.89% (for two-state prediction with the threshold of 25%) and 67.58% (three-state prediction), and the Pearson correlation coefficient is 0.727 (for continuous prediction) with a mean absolute error of 0.148. We have also investigated the effect of increasing database sizes on the prediction accuracy, where additional improvement in the accuracy is observed as the database size increases. The SANN web server is available at http://lee.kias.re.kr/~newton/sann/.
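The two continuous-prediction metrics quoted in this abstract, mean absolute error and the Pearson correlation coefficient, can be computed as in the following sketch. The function names are hypothetical and the code is a generic implementation of the standard definitions, not the SANN evaluation script.

```python
import math


def mean_absolute_error(predicted, observed):
    """Mean of |predicted - observed| over all residues."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


def pearson_correlation(predicted, observed):
    """Pearson correlation coefficient between predicted and observed values."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("need two equal-length inputs with at least two values")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    covariance = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined for constant input")
    return covariance / math.sqrt(var_p * var_o)
```

The two metrics are complementary: a predictor can correlate well with the true RSA while being systematically offset, which the mean absolute error would reveal.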

15.
Prediction of protein surface accessibility with information theory   Total citations: 8, self-citations: 0, citations by others: 8
A new, simple method based on information theory is introduced to predict the solvent accessibility of amino acid residues in various states defined by their different thresholds. Prediction is achieved by the application of information obtained from a single amino acid position or pair-information for a window of seventeen amino acids around the desired residue. Results obtained by pairwise information values are better than results from single amino acids. This reinforces the effect of the local environment on the accessibility of amino acid residues. The prediction accuracy of this method in a jackknife test system for two and three states is better than 70% and 60%, respectively. A comparison of the results with those reported by others involving the same data set also testifies to a better prediction accuracy in our case.

16.
NETASA: neural network based prediction of solvent accessibility   Total citations: 3, self-citations: 0, citations by others: 3
MOTIVATION: Prediction of the tertiary structure of a protein from its amino acid sequence is one of the most important problems in molecular biology. The successful prediction of solvent accessibility will be very helpful to achieve this goal. In the present work, we have implemented a server, NETASA, for predicting solvent accessibility of amino acids using our newly optimized neural network algorithm. Several new features in the neural network architecture and training method have been introduced, and the network learns faster to provide accuracy values which are comparable to or better than other methods of ASA prediction. RESULTS: Predictions in two- and three-state classification systems with several thresholds are provided. Our prediction method achieved an accuracy level of up to 90% for training and 88% for test data sets. Three-state prediction results provide a maximum 65% accuracy for training and 63% for the test data. Applicability of neural networks for ASA prediction has been confirmed with a larger data set and wider range of state thresholds. Salient differences between a linear and exponential network for ASA prediction have been analysed. AVAILABILITY: Online predictions are freely available at: http://www.netasa.org. Linux ix86 binaries of the program written for this work may be obtained by email from the corresponding author.

17.
Accuracy of predicting protein secondary structure and solvent accessibility from sequence information has been improved significantly by using information contained in multiple sequence alignments as input to a neural network system. For the Asilomar meeting, predictions for 13 proteins were generated automatically using the publicly available prediction method PHD. The results confirm the estimate of 72% three-state prediction accuracy. The fairly accurate predictions of secondary structure segments made the tool useful as a starting point for modeling of higher dimensional aspects of protein structure. © 1995 Wiley-Liss, Inc.

18.
Afonnikov  D. A.  Morozov  A. V.  Kolchanov  N. A. 《Biophysics》2008,51(1):56-60

The profile of contact numbers of amino acid residues in proteins contains important information about the protein structure and is connected with the accessibility of residues to solvent. Here we propose a method for predicting the profile of contact numbers of residues in protein from its amino acid sequence. The method is based on regression using a neural network algorithm. The algorithm predicts two types of profiles, namely, the total number of contacts and the number of close contacts with the neighbors in the chain. The Pearson coefficient of correlation between the actual and predicted values of total contact numbers amounted to 0.526–0.703. As for the number of close contacts, this coefficient was higher (0.662–0.743) for all the considered threshold contact distances (6, 8, 10, and 12 Å). The program for prediction of contact numbers CONNP is available at http://wwwmgs2.bionet.nsc.ru/reloaded.
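For reference, the target quantity (a contact-number profile at a given threshold distance) can be computed from known coordinates roughly as follows. This sketch only derives contact numbers from a structure; it does not predict them from sequence. The use of one representative point per residue, the default `min_separation`, and the name `contact_numbers` are assumptions, and `math.dist` requires Python 3.8 or later.

```python
import math


def contact_numbers(coords, radius=8.0, min_separation=1):
    """Count, for each residue, the other residues within `radius` angstroms.

    coords: one (x, y, z) point per residue, e.g. a C-alpha position (assumed).
    min_separation: minimum sequence distance |i - j| for a pair to be counted.
    """
    if min_separation < 1:
        raise ValueError("min_separation must be at least 1")
    n = len(coords)
    counts = [0] * n
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) <= radius:
                # A contact is symmetric, so both residues gain one.
                counts[i] += 1
                counts[j] += 1
    return counts
```

Running it at each of the thresholds named in the abstract (6, 8, 10, and 12 Å) would give one profile per threshold.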


19.
Prediction of protein structure from its amino acid sequence is still a challenging problem. A complete physicochemical understanding of protein folding is essential for accurate structure prediction. Knowledge of residue solvent accessibility gives useful insights into protein structure prediction and function prediction. In this work, we propose a random forest method, RSARF, to predict residue accessible surface area from protein sequence information. Training and testing were performed using 120 proteins containing 22006 residues. For each residue, the buried or exposed state was computed using five thresholds (0%, 5%, 10%, 25%, and 50%). The prediction accuracies for the 0%, 5%, 10%, 25%, and 50% thresholds are 72.9%, 78.25%, 78.12%, 77.57% and 72.07%, respectively. Further, comparison of RSARF with other methods using a benchmark dataset containing 20 proteins shows that our approach is useful for prediction of residue solvent accessibility from protein sequence without using structural information. The RSARF program, datasets and supplementary data are available at http://caps.ncbs.res.in/download/pugal/RSARF/.
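The two-state labelling at several thresholds described in this abstract can be sketched as below. The convention that a residue is exposed when its relative accessibility is strictly greater than the threshold is an assumption (the abstract does not say how ties are handled), and `label_states` and `two_state_accuracy` are hypothetical names.

```python
THRESHOLDS = (0.0, 5.0, 10.0, 25.0, 50.0)  # percent, as listed in the abstract


def label_states(rsa_values, threshold):
    """Label each residue 'E' (exposed) or 'B' (buried) at one threshold.

    Assumed convention: exposed means RSA strictly greater than the threshold.
    """
    return ["E" if value > threshold else "B" for value in rsa_values]


def two_state_accuracy(predicted, observed):
    """Fraction of residues whose predicted state matches the observed state."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(1 for p, o in zip(predicted, observed) if p == o)
    return matches / len(predicted)
```

Under this convention the 0% threshold marks only residues with exactly zero accessibility as buried, so the class balance shifts strongly as the threshold moves from 0% to 50%.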

20.
We have investigated amino acid features that determine secondary structure: (1) the solvent accessibility of each side chain, and (2) the interaction of each side chain with others one to four residues apart. Solvent accessibility is a simple model that distinguishes residue environment. The pairwise interactions represent a simple model of local side chain to side chain interactions. To test the importance of these features we developed an algorithm to separate alpha-helices, beta-strands, and "other" structure. Single residue and pairwise probabilities were determined for 25,141 samples from proteins with <30% homology. Combining the features of solvent accessibility with pairwise probabilities allows us to distinguish the three structures after cross validation at the 82.0% level. We gain 1.4% to 2.0% accuracy by optimizing the propensities, demonstrating that probabilities do not necessarily reflect propensities. Optimization of residue exposures, weights of all probabilities, and propensities increased accuracy to 84.0%.
