Similar articles
 Found 20 similar articles (search time: 15 ms)
1.

Background

The fluctuation of atoms around their average positions in protein structures provides important information regarding protein dynamics. This flexibility of protein structures is associated with various biological processes. Predicting flexibility of residues from protein sequences is significant for analyzing the dynamic properties of proteins which will be helpful in predicting their functions.

Results

In this paper, we introduce an approach for improving the accuracy of protein flexibility prediction. A neural network method for predicting flexibility in three states is implemented. The method incorporates sequence and evolutionary information, context-based scores, predicted secondary structures and solvent accessibility, and amino acid properties. Context-based statistical scores are derived, using the mean-field potentials approach, to describe the different preferences of protein residues for flexibility states, taking their amino acid context into consideration. The 7-fold cross-validated accuracy reached 61% when context-based scores and predicted structural states were incorporated in the training process of the flexibility predictor.

Conclusions

Incorporating context-based statistical scores together with predicted structural states is important for improving the performance of protein flexibility prediction, as shown by our computational results. Our prediction method is implemented as a web service called “FLEXc” and is available online at: http://hpcr.cs.odu.edu/flexc.
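
The 7-fold cross-validation reported in the Results above can be illustrated with a minimal sketch. This is not the FLEXc implementation: the function names, the `train_fn` and `predict_fn` callbacks, and the random shuffling are all assumptions made for illustration, and in practice the folds would usually be formed over whole proteins rather than individual residues.

```python
import random


def k_fold_indices(n_items, k=7, seed=0):
    """Split range(n_items) into k disjoint folds of near-equal size."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Striding through the shuffled list gives folds whose sizes differ by at most 1.
    return [indices[i::k] for i in range(k)]


def cross_validated_accuracy(items, labels, train_fn, predict_fn, k=7, seed=0):
    """Train on k-1 folds, test on the held-out fold, and pool the results.

    train_fn(items, labels) -> model and predict_fn(model, item) -> label
    are hypothetical callbacks standing in for any classifier.
    """
    correct = 0
    for held_out in k_fold_indices(len(items), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(items)) if i not in held]
        model = train_fn([items[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct += sum(predict_fn(model, items[i]) == labels[i]
                       for i in held_out)
    return correct / len(items)
```

Every item is held out exactly once, so the pooled accuracy is computed over the full dataset while no item is ever scored by a model that saw it during training.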

2.
We present a protocol for predicting protein flexibility from NMR chemical shifts. The protocol consists of (i) ensuring that the chemical shift assignments are correctly referenced or, if not, performing a reference correction using information derived from the chemical shift index, (ii) calculating the random coil index (RCI), and (iii) predicting the expected root mean square fluctuations (RMSFs) and order parameters (S²) of the protein from the RCI. The key advantages of this protocol over existing methods for studying protein dynamics are that (i) it does not require prior knowledge of a protein's tertiary structure, (ii) it is not sensitive to the protein's overall tumbling and (iii) it does not require additional NMR measurements beyond the standard experiments for backbone assignments. When chemical shift assignments are available, protein flexibility parameters, such as S² and RMSF, can be calculated within 1-2 h using a spreadsheet program.

3.
Protein folding rates vary by several orders of magnitude and they depend on the topology of the fold and the size and composition of the sequence. Although recent works show that the rates can be predicted from the sequence, allowing for high‐throughput annotations, they consider only the sequence and its predicted secondary structure. We propose a novel sequence‐based predictor, PFR‐AF, which utilizes solvent accessibility and residue flexibility predicted from the sequence, to improve predictions and provide insights into the folding process. The predictor includes three linear regressions for proteins with two‐state, multistate, and unknown (mixed‐state) folding kinetics. PFR‐AF on average outperforms current methods when tested on three datasets. The proposed approach provides high‐quality predictions in the absence of similarity between the predicted and the training sequences. The PFR‐AF's predictions are characterized by high (between 0.71 and 0.95, depending on the dataset) correlation and the lowest (between 0.75 and 0.9) mean absolute errors with respect to the experimental rates, as measured using out‐of‐sample tests. Our models reveal that for the two‐state chains inclusion of solvent‐exposed Ala may accelerate the folding, while increased content of Ile may reduce the folding speed. We also demonstrate that increased flexibility of coils facilitates faster folding and that proteins with larger content of solvent‐exposed strands may fold at a slower pace. The increased flexibility of the solvent‐exposed residues is shown to elongate folding, which also holds, with a lower correlation, for buried residues. Two case studies are included to support our findings. Proteins 2010. © 2010 Wiley‐Liss, Inc.

4.
The SLoop database of supersecondary fragments, first described by Donate et al. (Protein Sci., 1996, 5, 2600-2616), contains protein loops, classified according to structural similarity. The database has recently been updated and currently contains over 10 000 loops up to 20 residues in length, which cluster into over 560 well-populated classes. The database can be found at http://www-cryst.bioc.cam.ac.uk/~sloop. In this paper, we identify conserved structural features such as main chain conformation and hydrogen bonding. Using the original approach of Rufino and co-workers (1997), the correct structural class is predicted with the highest SLoop score for 35% of loops. This rises to 65% by considering the three highest scoring class predictions and to 75% in the top five scoring class predictions. Inclusion of residues from the neighbouring secondary structures and use of substitution tables derived using a reduced definition of secondary structure increase these prediction accuracies to 58, 78 and 85%, respectively. This suggests that capping residues can stabilize the loop conformation as well as that of the secondary structure. Further increases are achieved if only well-populated classes are considered in the prediction. These results correspond to an average loop root mean square deviation of between 0.4 and 2.6 Å for loops up to five residues in length.

5.
The increasing number and diversity of protein sequence families requires new methods to define and predict details regarding function. Here, we present a method for analysis and prediction of functional sub-types from multiple protein sequence alignments. Given an alignment and set of proteins grouped into sub-types according to some definition of function, such as enzymatic specificity, the method identifies positions that are indicative of functional differences by comparison of sub-type specific sequence profiles, and analysis of positional entropy in the alignment. Alignment positions with significantly high positional relative entropy correlate with those known to be involved in defining sub-types for nucleotidyl cyclases, protein kinases, lactate/malate dehydrogenases and trypsin-like serine proteases. We highlight new positions for these proteins that suggest additional experiments to elucidate the basis of specificity. The method is also able to predict sub-type for unclassified sequences. We assess several variations on a prediction method, and compare them to simple sequence comparisons. For assessment, we remove close homologues to the sequence for which a prediction is to be made (by a sequence identity above a threshold). This simulates situations where a protein is known to belong to a protein family, but is not a close relative of another protein of known sub-type. Considering the four families above, and a sequence identity threshold of 30 %, our best method gives an accuracy of 96 % compared to 80 % obtained for sequence similarity and 74 % for BLAST. We describe the derivation of a set of sub-type groupings derived from an automated parsing of alignments from PFAM and the SWISSPROT database, and use this to perform a large-scale assessment. The best method gives an average accuracy of 94 % compared to 68 % for sequence similarity and 79 % for BLAST. 
We discuss implications for experimental design, genome annotation and the prediction of protein function and protein intra-residue distances.
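
The comparison of sub-type specific profiles by positional relative entropy, described in the abstract above, can be sketched roughly as follows. The abstract does not give the exact formula, so this sketch makes several assumptions: it scores each alignment column by the Kullback-Leibler divergence between the residue distribution inside one sub-type and the distribution over all other sequences, ignores gaps, and uses a uniform pseudocount. All function and parameter names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_distribution(residues, pseudocount=1.0):
    """Residue frequencies for one alignment column (gaps are ignored).

    The uniform pseudocount is an assumption; it keeps every probability
    non-zero so that the logarithm below is always defined.
    """
    counts = Counter(r for r in residues if r in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS}


def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(p[a] * math.log2(p[a] / q[a]) for a in AMINO_ACIDS)


def subtype_relative_entropy(alignment, subtypes, target):
    """Score each column by how strongly `target` differs from the rest.

    alignment: equal-length aligned sequences; subtypes: one label per sequence.
    """
    inside = [s for s, t in zip(alignment, subtypes) if t == target]
    outside = [s for s, t in zip(alignment, subtypes) if t != target]
    scores = []
    for col in range(len(alignment[0])):
        p = column_distribution([s[col] for s in inside])
        q = column_distribution([s[col] for s in outside])
        scores.append(relative_entropy(p, q))
    return scores
```

Columns where the sub-type uses the same residues as the other sequences score zero, while columns with sub-type specific residues score higher, which is the kind of signal the abstract associates with positions that define functional specificity.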

6.
Schlessinger A  Rost B 《Proteins》2005,61(1):115-126
Structural flexibility has been associated with various biological processes such as molecular recognition and catalytic activity. In silico studies of protein flexibility have attempted to characterize and predict flexible regions based on simple principles. B-values derived from experimental data are widely used to measure residue flexibility. Here, we present the most comprehensive large-scale analysis of B-values. We used this analysis to develop a neural network-based method that predicts flexible-rigid residues from amino acid sequence. The system uses both global and local information (i.e., features from the entire protein such as secondary structure composition, protein length, and fraction of surface residues, and features from a local window of sequence-consecutive residues). The most important local feature was the evolutionary exchange profile reflecting sequence conservation in a family of related proteins. To illustrate its potential, we applied our method to 4 different case studies, each of which related our predictions to aspects of function. The first 2 were the prediction of regions that undergo conformational switches upon environmental changes (switch II region in Ras) and the prediction of surface regions, the rigidity of which is crucial for their function (tunnel in propeller folds). Both were correctly captured by our method. The third study established that residues in active sites of enzymes are predicted by our method to have unexpectedly low B-values. The final study demonstrated how well our predictions correlated with NMR order parameters to reflect motion. Our method had not been set up to address any of the tasks in those 4 case studies. Therefore, we expect that this method will assist in many attempts at inferring aspects of function.

7.
8.

Background  

Relating features of protein sequences to structural hinges is important for identifying domain boundaries, understanding structure-function relationships, and designing flexibility into proteins. Efforts in this field have been hampered by the lack of a proper dataset for studying characteristics of hinges.

9.
The discovery of intrinsic disorder in proteins and peptide regions has given a new and useful insight into the working of biological systems. Due to their enormous plasticity and heterogeneity, intrinsically disordered proteins or regions in proteins can perform a myriad of functions. The flexibility in disordered proteins allows them to undergo conformational transitions to form homopolymers of proteins called amyloids. Amyloids are highly structured protein aggregates associated with many neurodegenerative diseases. However, amyloids have gained much appreciation in recent years due to their functional roles. A functional amyloid fiber called curli is assembled on the bacterial cell surface as a part of the extracellular matrix during biofilm formation. The extracellular matrix that encases cells in a biofilm protects the cells and provides resistance against many environmental stresses. Several of the Csg (curli specific genes) proteins that are required for curli amyloid assembly are predicted to be intrinsically disordered. Therefore, curli amyloid formation is highly orchestrated so that these intrinsically disordered proteins do not inappropriately aggregate at the wrong time or place. The curli proteins are compartmentalized and there are chaperone-like proteins that prevent inappropriate aggregation and allow the controlled assembly of curli amyloids. Here we review the biogenesis of curli amyloids and the role that intrinsically disordered proteins play in the process.

10.
Clark WT  Radivojac P 《Proteins》2011,79(7):2086-2096
Understanding protein function is one of the keys to understanding life at the molecular level. It is also important in the context of human disease because many conditions arise as a consequence of alterations of protein function. The recent availability of relatively inexpensive sequencing technology has resulted in thousands of complete or partially sequenced genomes with millions of functionally uncharacterized proteins. Such a large volume of data, combined with the lack of high-throughput experimental assays to functionally annotate proteins, contributes to the growing importance of automated function prediction. Here, we study proteins annotated by Gene Ontology (GO) terms and estimate the accuracy of functional transfer from protein sequence only. We find that the transfer of GO terms by pairwise sequence alignments is only moderately accurate, showing a surprisingly small influence of sequence identity (SID) in a broad range (30-100%). We developed and evaluated a new predictor of protein function, functional annotator (FANN), from amino acid sequence. The predictor exploits a multioutput neural network framework which is well suited to simultaneously modeling dependencies between functional terms. Experiments provide evidence that FANN-GO (predictor of GO terms; available from http://www.informatics.indiana.edu/predrag) outperforms standard methods such as transfer by global or local SID as well as GOtcha, a method that incorporates the structure of GO.

11.
12.
Length-dependent prediction of protein intrinsic disorder   (total citations: 2; self-citations: 0; citations by others: 2)

Background  

Due to the functional importance of intrinsically disordered proteins or protein regions, prediction of intrinsic protein disorder from amino acid sequence has become an area of active research, as witnessed in the 6th experiment on Critical Assessment of Techniques for Protein Structure Prediction (CASP6). Since the initial work by Romero et al. (Identifying disordered regions in proteins from amino acid sequences, IEEE Int. Conf. Neural Netw., 1997), our group has developed several predictors optimized for long disordered regions (>30 residues) with prediction accuracy exceeding 85%. However, these predictors are less successful on short disordered regions (≤30 residues). A probable cause is the length dependence of the amino acid compositions and sequence properties of disordered regions.

13.
A grand challenge in the proteomics and structural genomics era is the prediction of protein structure, including identification of those proteins that are partially or wholly unstructured. A number of predictors for identification of intrinsically disordered proteins (IDPs) have been developed over the last decade, but none can be taken as fully reliable on its own. Using a single model for prediction is typically inadequate because prediction based on only the most accurate model ignores model uncertainty. In this paper, we present an empirical method to specify and measure uncertainty associated with disorder predictions. In particular, we analyze the uncertainty in the reference model itself and the uncertainty in data. This is achieved by training a set of models and developing several meta-predictors on top of them. The best meta-predictor achieved comparable or better results than any other single model, suggesting that incorporating different aspects of protein disorder prediction is important for the disorder prediction task. In addition, the best meta-predictor had more balanced sensitivity and specificity than any individual model. We also assessed the effects of changes in disorder prediction as a function of changes in the protein sequence. For collections of homologous sequences, we found that mutations caused many of the predicted disordered residues to be flipped to be predicted as ordered residues, while the reverse was observed much less frequently. These results suggest that disorder tendencies are more sensitive to allowed mutations than structure tendencies and the conservation of disorder is indeed less stable than conservation of structure. Availability: five meta-predictors and four single models developed for this study will be publicly freely accessible for non-commercial use.

14.
In proteins, immunogenic determinants that can induce protein-reactive antipeptide antibodies reside mostly in those parts of the molecule that have a high tendency to form beta-turns. A program for an IBM personal computer which predicts protein immunogenic determinants is described. The program predicts potential immunogenic determinants from protein amino acid sequences according to a Chou-Fasman-based probability of a beta-turn occurrence, p > 1.5 × 10⁻⁴ (P. Y. Chou and G. D. Fasman, 1978, Adv. Enzymol. 47, 46-148). Oncopeptides (whose efficacy in generating protein-reactive antipeptide antibodies has been described) with a beta-turn probability of p > 1.5 × 10⁻⁴ elicited antipeptide antibodies that reacted with the parent oncoprotein at a rate of 96%, thus showing a surprisingly good correlation between the tendency to form a beta-turn and the protein reactivity of antipeptide antibodies. Potential immunogenic determinants were predicted on myohemerythrin and myoglobin.

15.
Li X  Pan XM 《Proteins》2001,42(1):1-5
A novel method was developed for predicting solvent accessibility. Based on single sequence data, this method achieved 71.5% accuracy with a correlation coefficient of 0.42 in a database of 704 proteins with a threshold of 20% for a two-state definition of solvent accessibility. Prediction in a data subset of 341 monomeric proteins achieved 72.7% accuracy with a correlation coefficient of 0.43. On average, prediction over short chains gives better results than that over long chains. With a solvent accessibility threshold of 20%, prediction over 236 monomeric proteins with chain length < 300 amino acid residues achieved 75.3% accuracy with a correlation coefficient of 0.44 by jackknife analysis, which is higher than that obtained by previous methods using multiple sequence alignments.

16.
Prediction of protein structure from its amino acid sequence is still a challenging problem. A complete physicochemical understanding of protein folding is essential for accurate structure prediction. Knowledge of residue solvent accessibility gives useful insights into protein structure prediction and function prediction. In this work, we propose a random forest method, RSARF, to predict residue accessible surface area from protein sequence information. Training and testing were performed using 120 proteins containing 22006 residues. For each residue, the buried or exposed state was computed using five thresholds (0%, 5%, 10%, 25%, and 50%). The prediction accuracies for the 0%, 5%, 10%, 25%, and 50% thresholds are 72.9%, 78.25%, 78.12%, 77.57% and 72.07%, respectively. Further, comparison of RSARF with other methods using a benchmark dataset containing 20 proteins shows that our approach is useful for prediction of residue solvent accessibility from protein sequence without using structural information. The RSARF program, datasets and supplementary data are available at http://caps.ncbs.res.in/download/pugal/RSARF/.
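
The two-state labelling at several thresholds mentioned in the abstract above can be illustrated with a short sketch. This is not RSARF code: the function names are hypothetical, the inputs are assumed to be relative solvent accessibilities in percent, and the convention that a residue is exposed when its value is strictly greater than the threshold is an assumption. Only the five thresholds come from the abstract.

```python
THRESHOLDS = (0, 5, 10, 25, 50)  # percent, as listed in the abstract


def two_state_labels(relative_accessibility, threshold):
    """Label each residue 'E' (exposed) or 'B' (buried).

    Assumed convention: exposed if the relative accessibility (percent)
    is strictly greater than the threshold, buried otherwise.
    """
    return ["E" if rsa > threshold else "B" for rsa in relative_accessibility]


def accuracy(predicted, observed):
    """Fraction of residues whose predicted state matches the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("label lists must have the same length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


def accuracy_by_threshold(predicted_rsa, observed_rsa, thresholds=THRESHOLDS):
    """Two-state accuracy at each threshold, from real-valued accessibilities."""
    return {
        t: accuracy(two_state_labels(predicted_rsa, t),
                    two_state_labels(observed_rsa, t))
        for t in thresholds
    }
```

Because the same real-valued accessibilities yield a different buried/exposed split at each threshold, accuracies reported at different thresholds are not directly comparable with one another.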

17.
The nucleotide sequence of the alanine racemase (EC 5.1.1.1) gene from a thermophile, Bacillus stearothermophilus, was determined by the dideoxy chain termination method with universal and synthetic site-specific primers. The amino acid sequence of the enzyme predicted from the nucleotide sequence was confirmed by peptide sequence information derived from the N-terminal amino acid residues and several tryptic fragments. The alanine racemase gene consists of 1158 base pairs encoding a protein of 386 amino acid residues; the molecular weight of the apoenzyme is estimated as 43,341. The racemase gene of B. stearothermophilus is very similar in size (1158 vs 1167 base pairs) to the gene of a mesophile, B. subtilis, but shows a higher preference for codons ending in G or C. A comparison of the amino acid sequence with those of Bacillus subtilis and Salmonella typhimurium dadB and alr enzymes revealed overall sequence homologies of 31-54%, including an identical octapeptide bearing the pyridoxal 5'-phosphate binding site. Although the residues common in the four racemases are not continuously arrayed, these constitute distinct domains and their hydropathy profiles are very similar. The secondary structure of B. stearothermophilus alanine racemase was predicted from the results obtained by theoretical analysis and circular dichroism measurement.

18.
Protein flexibility and intrinsic disorder   (total citations: 6; self-citations: 0; citations by others: 6)
Comparisons were made among four categories of protein flexibility: (1) low-B-factor ordered regions, (2) high-B-factor ordered regions, (3) short disordered regions, and (4) long disordered regions. Amino acid compositions of the four categories were found to be significantly different from each other, with high-B-factor ordered and short disordered regions being the most similar pair. The high-B-factor (flexible) ordered regions are characterized by a higher average flexibility index, higher average hydrophilicity, higher average absolute net charge, and higher total charge than disordered regions. The low-B-factor regions are significantly enriched in hydrophobic residues and depleted in the total number of charged residues compared to the other three categories. We examined the predictability of the high-B-factor regions and developed a predictor that discriminates between regions of low and high B-factors. This predictor achieved an accuracy of 70% and a correlation of 0.43 with experimental data, outperforming the 64% accuracy and 0.32 correlation of predictors based solely on flexibility indices. To further clarify the differences between short disordered regions and ordered regions, a predictor of short disordered regions was developed. Its relatively high accuracy of 81% indicates considerable differences between ordered and disordered regions. The distinctive amino acid biases of high-B-factor ordered regions, short disordered regions, and long disordered regions indicate that the sequence determinants for these flexibility categories differ from one another, whereas the significantly-greater-than-chance predictability of these categories from sequence suggests that flexible ordered regions, short disorder, and long disorder are, to a significant degree, encoded at the primary structure level.

19.
Several computational and experimental methods exist for identifying disordered residues within proteins. Computational algorithms can now identify these disordered sequences and predict their occurrence within genomes with relatively high accuracy. Recent advances in NMR and mass spectroscopy permit faster and more detailed studies of disordered states at atomic resolutions. Combining prediction, computation and experimentation is proposed to accelerate and enhance the characterization of intrinsically disordered proteins.

20.
Using a protein design algorithm that considers side-chain packing quantitatively, the effect of explicit backbone motion on the selection of amino acids in protein design was assessed in the core of the streptococcal protein G beta 1 domain (G beta 1). Concerted backbone motion was introduced by varying G beta 1's supersecondary structure parameter values. The stability and structural flexibility of seven of the redesigned proteins were determined experimentally and showed that core variants containing as many as 6 of 10 possible mutations retain native-like properties. This result demonstrates that backbone flexibility can be combined explicitly with amino acid side-chain selection and that the selection algorithm is sufficiently robust to tolerate perturbations as large as 15% of G beta 1's native supersecondary structure parameter values.
