Similar documents
Found 20 similar documents (search time: 546 ms)
1.
Side-chain modeling with an optimized scoring function   Cited by: 1 (self-citations: 0; citations by others: 1)
Modeling side-chain conformations on a fixed protein backbone has wide application in structure prediction and molecular design. Each effort in this field requires decisions about a rotamer set, scoring function, and search strategy. We have developed a new and simple scoring function, which operates on side-chain rotamers and consists of the following energy terms: contact surface, volume overlap, backbone dependency, electrostatic interactions, and desolvation energy. The weights of these energy terms were optimized to achieve the minimal average root mean square (rms) deviation between the lowest energy rotamer and real side-chain conformation on a training set of high-resolution protein structures. In the course of optimization, for every residue, its side chain was replaced by varying rotamers, whereas conformations for all other residues were kept as they appeared in the crystal structure. We obtained prediction accuracy of 90.4% for chi(1), 78.3% for chi(1 + 2), and 1.18 Å overall rms deviation. Furthermore, the derived scoring function combined with a Monte Carlo search algorithm was used to place all side chains onto a protein backbone simultaneously. The average prediction accuracy was 87.9% for chi(1), 73.2% for chi(1 + 2), and 1.34 Å rms deviation for 30 protein structures. Our approach was compared with available side-chain construction methods and showed improvement over the best among them: 4.4% for chi(1), 4.7% for chi(1 + 2), and 0.21 Å for rms deviation. We hypothesize that the scoring function, rather than the search strategy, is the main obstacle in side-chain modeling. Additionally, we show that a more detailed rotamer library is expected to increase chi(1 + 2) prediction accuracy but may have little effect on chi(1) prediction accuracy.
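The scoring scheme described in this abstract (a weighted sum of energy terms, evaluated per rotamer with all other residues held fixed) can be illustrated with a minimal Python sketch. The term names follow the abstract; the `Rotamer` class, the function names, and every numeric value are hypothetical placeholders, not the optimized weights or the implementation from the paper.

```python
from dataclasses import dataclass

# Energy terms named in the abstract. The weights used below are
# illustrative placeholders, NOT the optimized values from the paper.
TERMS = ("contact_surface", "volume_overlap", "backbone_dependency",
         "electrostatics", "desolvation")


@dataclass
class Rotamer:
    name: str
    terms: dict  # term name -> raw (unweighted) value for this rotamer


def score(rotamer, weights):
    """Weighted sum of the energy terms for one rotamer."""
    return sum(weights[t] * rotamer.terms[t] for t in TERMS)


def best_rotamer(rotamers, weights):
    """Lowest-scoring rotamer for one residue; all other residues stay fixed."""
    return min(rotamers, key=lambda r: score(r, weights))


# Hypothetical weights and two hypothetical candidate rotamers.
weights = {t: 1.0 for t in TERMS}
weights["volume_overlap"] = 5.0  # penalize steric overlap more heavily

candidates = [
    Rotamer("g+", {"contact_surface": -2.0, "volume_overlap": 0.8,
                   "backbone_dependency": 0.1, "electrostatics": -0.3,
                   "desolvation": 0.2}),
    Rotamer("t", {"contact_surface": -1.5, "volume_overlap": 0.0,
                  "backbone_dependency": 0.4, "electrostatics": -0.1,
                  "desolvation": 0.3}),
]

print(best_rotamer(candidates, weights).name)
```

In the training procedure the abstract describes, the weights would be adjusted so that the rotamer selected this way lies as close as possible (in rms deviation) to the crystallographic side chain; that optimization loop is omitted here.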

2.
To improve tertiary structure predictions of more difficult targets, the next generation of TASSER, TASSER_2.0, has been developed. TASSER_2.0 incorporates more accurate side-chain contact restraint predictions from a new approach, the composite-sequence method, based on consensus restraints generated by an improved threading algorithm, PROSPECTOR_3.5, which uses computationally evolved and wild-type template sequences as input. TASSER_2.0 was tested on a large-scale benchmark set of 2591 nonhomologous, single-domain proteins ≤200 residues that cover the Protein Data Bank at 35% pairwise sequence identity. Compared with the average fraction of accurately predicted side-chain contacts of 0.37 using PROSPECTOR_3.5 with wild-type template sequences, the average accuracy of the composite-sequence method increases to 0.60. The resulting TASSER_2.0 models are closer to their native structures, with an average root mean-square deviation of 4.99 Å compared to the 5.31 Å result of TASSER. Defining a successful prediction as a model with a root mean-square deviation to native <6.5 Å, the success rate of TASSER_2.0 (TASSER) for Medium targets (targets with good templates/poor alignments) is 74.3% (64.7%) and 40.8% (35.5%) for the Hard targets (incorrect templates/alignments). For Easy targets (good templates/alignments), the success rate slightly increases from 86.3% to 88.4%.

3.
Improved side-chain modeling for protein-protein docking   Cited by: 1 (self-citations: 0; citations by others: 1)
Success in high-resolution protein-protein docking requires accurate modeling of side-chain conformations at the interface. Most current methods either leave side chains fixed in the conformations observed in the unbound protein structures or allow the side chains to sample a set of discrete rotamer conformations. Here we describe a rapid and efficient method for sampling off-rotamer side-chain conformations by torsion space minimization during protein-protein docking starting from discrete rotamer libraries supplemented with side-chain conformations taken from the unbound structures, and show that the new method improves side-chain modeling and increases the energetic discrimination between good and bad models. Analysis of the distribution of side-chain interaction energies within and between the two protein partners shows that the new method leads to more native-like distributions of interaction energies and that the neglect of side-chain entropy produces a small but measurable increase in the number of residues whose interaction energy cannot compensate for the entropic cost of side-chain freezing at the interface. The power of the method is highlighted by a number of predictions of unprecedented accuracy in the recent CAPRI (Critical Assessment of PRedicted Interactions) blind test of protein-protein docking methods.

4.
M. F. Thorpe, S. Banu Ozkan. Proteins, 2015, 83(12): 2279-2292
The most successful protein structure prediction methods to date have been template-based modeling (TBM) or homology modeling, which predicts protein structure based on experimental structures. These high-accuracy predictions sometimes retain structural errors due to incorrect templates or a lack of accurate templates in the case of low sequence similarity, making these structures inadequate in drug-design studies or molecular dynamics simulations. We have developed a new physics-based approach to the protein refinement problem by mimicking the mechanism of chaperones that rehabilitate misfolded proteins. The template structure is unfolded by selective (targeted) pulling on different portions of the protein using the geometry-based technique FRODA, and then refolded using hierarchically restrained replica exchange molecular dynamics simulations (hr-REMD). FRODA unfolding is used to create a diverse set of topologies for surveying near-native-like structures from a template and to provide a set of persistent contacts to be employed during refolding. We have tested our approach on 13 previous CASP targets and observed that this method of folding an ensemble of partially unfolded structures, through the hierarchical addition of contact restraints (that is, first local and then nonlocal interactions), leads to a refolding of the structure along with refinement in most cases (12/13). Although this approach yields refined models through advancement in sampling, the task of blind selection of the best refined models still needs to be solved. Overall, the method can be useful for improved sampling for low-resolution models where certain portions of the structure are incorrectly modeled. Proteins 2015; 83:2279–2292. © 2015 Wiley Periodicals, Inc.

5.
Prediction of side-chain conformations is an important component of several biological modeling applications. In this work, we have developed and tested an advanced Monte Carlo sampling strategy for predicting side-chain conformations. Our method is based on a cooperative rearrangement of atoms that belong to a group of neighboring side chains. This rearrangement is accomplished by deleting groups of atoms from the side chains in a particular region, and regrowing them with the generation of trial positions that depends on both a rotamer library and a molecular mechanics potential function. This method allows us to incorporate flexibility about the rotamers in the library and explore phase space in a continuous fashion about the primary rotamers. We have tested our algorithm on a set of 76 proteins using the all-atom AMBER99 force field and electrostatics that are governed by a distance-dependent dielectric function. When the tolerance for correct prediction of the dihedral angles is a deviation of <20 degrees from the native state, our prediction accuracies for chi1 are 83.3% and for chi1 and chi2 are 65.4%. The accuracies of our predictions are comparable to the best results in the literature that often used Hamiltonians that have been specifically optimized for side-chain packing. We believe that the continuous exploration of phase space enables our method to overcome limitations inherent in using discrete rotamers as trials.
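The accuracy criterion quoted in this abstract (a dihedral counts as correct when it deviates by less than 20 degrees from the native value) can be sketched as follows. This is an illustration under stated assumptions, not the authors' evaluation code: the function names and input layout are hypothetical, and the sketch ignores the 180-degree symmetry of terminal dihedrals in residues such as Phe, Tyr, and Asp, which real benchmarks usually account for.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def chi_accuracy(predicted, native, tol=20.0):
    """Fraction of residues with chi1 correct, and with chi1 AND chi2 correct.

    predicted, native: lists of (chi1, chi2) tuples in degrees; chi2 is None
    for residues that have no chi2. chi1+2 accuracy is computed only over
    residues where both structures define chi2.
    """
    n1 = n12 = ok1_count = ok12_count = 0
    for (p1, p2), (q1, q2) in zip(predicted, native):
        ok1 = angle_diff(p1, q1) < tol
        n1 += 1
        ok1_count += ok1
        if p2 is not None and q2 is not None:
            n12 += 1
            ok12_count += ok1 and angle_diff(p2, q2) < tol
    chi12 = ok12_count / n12 if n12 else float("nan")
    return ok1_count / n1, chi12


# Hypothetical three-residue example; the third residue has no chi2.
predicted = [(-60.0, 180.0), (175.0, 60.0), (60.0, None)]
native = [(-65.0, 170.0), (-178.0, 100.0), (-60.0, None)]
chi1_acc, chi12_acc = chi_accuracy(predicted, native)
print(chi1_acc, chi12_acc)
```

Note the wrap-around handling in `angle_diff`: 175 and -178 degrees differ by 7 degrees, not 353, so the second residue counts as chi1-correct.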

6.
Accurate prediction of the placement and conformations of protein side chains given only the backbone trace has a wide range of uses in protein design, structure prediction, and functional analysis. Prediction has most often relied on discrete rotamer libraries so that the fitness of side-chain rotamers can be rapidly assessed against some scoring function. Scoring functions are generally based on experimental parameters from small-molecule studies or empirical parameters based on determined protein structures. Here, we describe the NCN algorithm for predicting the placement of side chains. A predominantly first-principles approach was taken to develop the potential energy function incorporating van der Waals and electrostatics based on the OPLS parameters, and a hydrogen bonding term. The only empirical knowledge used is the frequency of rotameric states from the PDB. The rotamer library includes nearly 50,000 rotamers, and is the most extensive discrete library used to date. Although the computational time tends to be longer than most other algorithms, the overall accuracy exceeds all algorithms in the literature when placing rotamers on an accurate backbone trace. Considering only the most buried residues, 80% of the total residues tested, the placement accuracy reaches 92% for chi(1), and 83% for chi(1 + 2), and an overall RMS deviation of 1 Å. Additionally, we show that if information is available to restrict chi(1) to one rotamer well, then this algorithm can generate structures with an average RMS deviation of 1.0 Å for all heavy side-chain atoms and a corresponding overall chi(1 + 2) accuracy of 85.0%.

7.
We present direct evidence for a change in protein structural specificity due to hydrophobic core packing. High-resolution structural analysis of a designed core variant of ubiquitin reveals that the protein is in slow exchange between two conformations. Examination of side-chain rotamers indicates that this dynamic response and the lower stability of the protein are coupled to greater strain and mobility in the core. The results suggest that manipulating the level of side-chain strain may be one way of fine-tuning the stability and specificity of proteins.

8.
A graph-theory algorithm for rapid protein side-chain prediction   Cited by: 19 (self-citations: 0; citations by others: 19)
Fast and accurate side-chain conformation prediction is important for homology modeling, ab initio protein structure prediction, and protein design applications. Many methods have been presented, although only a few computer programs are publicly available. The SCWRL program is one such method and is widely used because of its speed, accuracy, and ease of use. A new algorithm for SCWRL is presented that uses results from graph theory to solve the combinatorial problem encountered in the side-chain prediction problem. In this method, side chains are represented as vertices in an undirected graph. Any two residues that have rotamers with nonzero interaction energies are considered to have an edge in the graph. The resulting graph can be partitioned into connected subgraphs with no edges between them. These subgraphs can in turn be broken into biconnected components, which are graphs that cannot be disconnected by removal of a single vertex. The combinatorial problem is reduced to finding the minimum energy of these small biconnected components and combining the results to identify the global minimum energy conformation. This algorithm is able to complete predictions on a set of 180 proteins with 34342 side chains in <7 min of computer time. The total chi(1) and chi(1 + 2) dihedral angle accuracies are 82.6% and 73.7% using a simple energy function based on the backbone-dependent rotamer library and a linear repulsive steric energy. The new algorithm will allow for use of SCWRL in more demanding applications such as sequence design and ab initio structure prediction, as well as the addition of a more complex energy function and conformational flexibility, leading to increased accuracy.
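The graph construction described in this abstract can be sketched in a few lines of Python. The sketch below is not SCWRL code: the data layout (`pair_energies` keyed by residue pairs) and all function names are assumptions made for illustration. It builds the residue interaction graph, splits it into connected components, and finds articulation points, which are the vertices at which a connected component breaks into biconnected components. The per-component energy minimization is omitted.

```python
def build_graph(residues, pair_energies):
    """Add an edge between two residues if any rotamer pair interacts.

    pair_energies: {(res_i, res_j): table}, where table[r][s] is the
    interaction energy of rotamer r of res_i with rotamer s of res_j.
    """
    adj = {r: set() for r in residues}
    for (i, j), table in pair_energies.items():
        if any(e != 0.0 for row in table for e in row):
            adj[i].add(j)
            adj[j].add(i)
    return adj


def connected_components(adj):
    """Partition the graph into connected subgraphs (iterative DFS)."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        seen.add(start)
        while stack:
            u = stack.pop()
            comp.add(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        comps.append(comp)
    return comps


def articulation_points(adj):
    """Vertices whose removal disconnects their component (Tarjan's DFS)."""
    disc, low, points = {}, {}, set()

    def dfs(u, parent):
        disc[u] = low[u] = len(disc)
        children = 0
        for v in sorted(adj[u]):
            if v == parent:
                continue
            if v in disc:
                low[u] = min(low[u], disc[v])
            else:
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if parent is not None and low[v] >= disc[u]:
                    points.add(u)
        if parent is None and children > 1:
            points.add(u)

    for u in adj:
        if u not in disc:
            dfs(u, None)
    return points
```

For a hypothetical graph with a triangle A-B-C, a pendant residue D attached to C, a separate pair E-F, and an isolated residue G, this yields three connected components and the single articulation point C. The recursive DFS is adequate for a sketch; very large components would call for an iterative version.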

9.
Lim Heo, Michael Feig. Proteins, 2020, 88(5): 637-642
Protein structure prediction has long been available as an alternative to experimental structure determination, especially via homology modeling based on templates from related sequences. Recently, models based on distance restraints from coevolutionary analysis via machine learning have significantly expanded the ability to predict structures for sequences without templates. One such method, AlphaFold, also performs well on sequences where templates are available but without using such information directly. Here we show that combining machine-learning based models from AlphaFold with state-of-the-art physics-based refinement via molecular dynamics simulations further improves predictions to outperform any other prediction method tested during the latest round of CASP. The resulting models have highly accurate global and local structures, including high accuracy at functionally important interface residues, and they are highly suitable as initial models for crystal structure determination via molecular replacement.

10.
The problem of protein side-chain packing for a given backbone trace is investigated using 3 different prediction models. The first requires an exhaustive search of all possible combinations of side-chain conformers, using the dead-end elimination theorem. The second considers only side-chain-backbone interactions, whereas the third neglects side-chain-backbone interactions and instead keeps side-chain-side-chain interactions. Predictions of side-chain conformations for 11 proteins using all 3 models show that removal of side-chain-side-chain interactions does not cause a large decrease in the prediction accuracy, whereas the model having only side-chain-side-chain interactions still retains a significant level of accuracy. These results suggest that the 2 classes of interactions, side-chain-backbone and side-chain-side-chain, are consistent with each other and work concurrently to stabilize the native conformations. This is confirmed by analyses of energy spectra of the side-chain conformations derived from the fourth prediction model, the Independent model, which gives almost the same quality of the prediction as the dead-end elimination. The analyses indicate that the 2 classes of interactions simultaneously increase the energy difference between the native and nonnative conformations.
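The abstract names the dead-end elimination (DEE) theorem without stating it. As a hedged illustration, the sketch below applies the original single-rotamer DEE criterion: rotamer r at position i can be discarded if some competitor t at the same position satisfies E(i_r) + Σ_j min_s E(i_r, j_s) > E(i_t) + Σ_j max_s E(i_t, j_s), because r then cannot be part of the global minimum. The data layout and function names are assumptions for illustration, not taken from the paper, and whether the authors used this exact variant of the criterion is not stated in the abstract.

```python
def pair_energy(pair, i, r, j, s):
    """Look up E(i_r, j_s); tables are stored once per pair with i < j."""
    return pair[(i, j)][r][s] if i < j else pair[(j, i)][s][r]


def dee_sweep(self_e, pair, alive):
    """One pass of the single-rotamer DEE criterion. Mutates `alive`.

    alive: {position: set of surviving rotamer indices}.
    Returns True if at least one rotamer was eliminated.
    """
    eliminated = False
    for i in alive:
        others = [j for j in alive if j != i]
        for r in sorted(alive[i]):
            if len(alive[i]) == 1:
                break  # never remove the last rotamer at a position
            # Best case for r: most favorable partner at every other position.
            best_r = self_e[i][r] + sum(
                min(pair_energy(pair, i, r, j, s) for s in alive[j])
                for j in others)
            for t in sorted(alive[i] - {r}):
                # Worst case for t: least favorable partner everywhere.
                worst_t = self_e[i][t] + sum(
                    max(pair_energy(pair, i, t, j, s) for s in alive[j])
                    for j in others)
                if best_r > worst_t:
                    alive[i].discard(r)
                    eliminated = True
                    break
    return eliminated


def dead_end_elimination(self_e, pair):
    """Repeat sweeps until no further rotamer can be eliminated."""
    alive = {i: set(range(len(e))) for i, e in self_e.items()}
    while dee_sweep(self_e, pair, alive):
        pass
    return alive


# Hypothetical two-residue system with two rotamers each.
self_e = {0: [0.0, 5.0], 1: [0.0, 0.5]}
pair = {(0, 1): [[0.0, 1.0], [1.0, 0.0]]}
print(dead_end_elimination(self_e, pair))
```

In this toy case the criterion alone reduces each position to a single rotamer. In general DEE only prunes the search space, and the surviving combinations still have to be enumerated or searched.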

11.
SCWRL and MolIDE are software applications for prediction of protein structures. SCWRL is designed specifically for the task of prediction of side-chain conformations given a fixed backbone usually obtained from an experimental structure determined by X-ray crystallography or NMR. SCWRL is a command-line program that typically runs in a few seconds. MolIDE provides a graphical interface for basic comparative (homology) modeling using SCWRL and other programs. MolIDE takes an input target sequence and uses PSI-BLAST to identify and align templates for comparative modeling of the target. The sequence alignment to any template can be manually modified within a graphical window of the target-template alignment and visualization of the alignment on the template structure. MolIDE builds the model of the target structure on the basis of the template backbone, predicted side-chain conformations with SCWRL and a loop-modeling program for insertion-deletion regions with user-selected sequence segments. SCWRL and MolIDE can be obtained at (http://dunbrack.fccc.edu/Software.php).

12.
We assume that each class of protein has a core structure that is defined by internal residues, and that the external, solvent-contacting residues contribute to the stability of the structure, are of primary importance to function, but do not determine the architecture of the core portions of the polypeptide chain. An algorithm has been developed to supply a list of permitted sequences of internal residues compatible with a known core structure. This list is referred to as the tertiary template for that structure. In general the positions in the template are not sequentially adjacent and are distributed throughout the polypeptide chain. The template is derived using the fixed positions for the main-chain and beta-carbon atoms in the test structure and selected stereochemical rules. The focus of this paper is on the use of two packing criteria: avoidance of steric overlap and complete filling of available space. The program also notes potential polar group interactions and disulfide bonds as well as possible burial of formal charges. Central to the algorithm is the side-chain rotamer library. In an update of earlier studies by others, we show that 17 of the 20 amino acids (omitting Met, Lys and Arg) can be represented adequately by 67 side-chain rotamers. A list of chi angles and their standard deviations is given. The newer, high-resolution, refined structures in the Brookhaven Protein Data Bank show similar mean chi values, but have much smaller deviations than those of earlier studies. This suggests that a rotamer library may be a better structural approximation than was previously thought. In using packing constraints, it has been found essential to include all hydrogen atoms specifically. The "unified atom" representation is not adequate. The permitted rotamer sequences are severely restricted by the main-chain plus beta-carbon atoms of the test structure. Further restriction is introduced if the full set of atoms of the external residues are held fixed, the full-chain model. The space-filling requirement has a major role in restricting the template lists. The preliminary tests reported here make it appear likely that templates prepared from the currently known core structures will be able to discriminate between these structures. The templates should thus be useful in deciding whether a sequence of unknown tertiary structure fits any of the known core classes and, if a fit is found, how the sequence should be aligned in three dimensions to fit the core of that class. (ABSTRACT TRUNCATED AT 400 WORDS)

13.
The performance of the self-consistent mean field theory (SCMFT) method for side-chain modeling, employing rotamer energies calculated with the flexible rotamer model (FRM), is evaluated in the context of comparative modeling of protein structure. Predictions were carried out on a test set of 56 model backbones of varying accuracy, to allow side-chain prediction accuracy to be analyzed as a function of backbone accuracy. A progressive decrease in the accuracy of prediction was observed as backbone accuracy decreased. However, even for very low backbone accuracy, prediction accuracy was substantially higher than random, indicating that the FRM can, in part, compensate for the errors in the modeled tertiary environment. It was also investigated whether the introduction in the FRM-SCMFT method of knowledge-based biases, derived from a backbone-dependent rotamer library, could enhance its performance. A bias derived from the backbone-dependent rotamer conformations alone did not improve prediction accuracy. However, a bias derived from the backbone-dependent rotamer probabilities improved prediction accuracy considerably. This bias was incorporated through two different strategies. In one (the indirect strategy), rotamer probabilities were used to reject unlikely rotamers a priori, thus restricting prediction by FRM-SCMFT to a subset containing only the most probable rotamers in the library. In the other (the direct strategy), rotamer probabilities were transformed into pseudo-energies that were added to the average potential energies of the respective rotamers, thereby creating hybrid energy-based/knowledge-based average rotamer energies, which were used by the FRM-SCMFT method for prediction. For all degrees of backbone accuracy, an optimal strength of the knowledge-based bias existed for both strategies for which predictions were more accurate than pure energy-based predictions, and also than pure knowledge-based predictions. Hybrid knowledge-based/energy-based methods were obtained from both strategies and compared with the SCWRL method, a hybrid method based on the same backbone-dependent rotamer library. The accuracy of the indirect method was approximately the same as that of the SCWRL method, but that of the direct method was significantly higher.
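The "direct strategy" in this abstract can be illustrated with a small mean-field sketch. Everything below is an assumption made for illustration: the abstract does not give the functional form of the pseudo-energy, so the common choice -strength · kT · ln(p) is used, the flexible rotamer model is not reproduced, and the update is a plain self-consistent iteration whose convergence is not guaranteed in general.

```python
import math

KT = 0.593  # kcal/mol, roughly RT near 298 K


def pseudo_energy(prob, strength=1.0, kt=KT):
    """Turn a library probability into an energy-like bias (assumed form)."""
    return -strength * kt * math.log(prob)


def mean_field(self_e, pair_e, n_iter=50, kt=KT):
    """Self-consistent mean-field rotamer probabilities.

    self_e[i][r]: energy of rotamer r at residue i (including any bias).
    pair_e[(i, j)][r][s] with i < j: rotamer-rotamer interaction energy.
    """
    n = len(self_e)
    probs = [[1.0 / len(e)] * len(e) for e in self_e]  # uniform start
    for _ in range(n_iter):
        new_probs = []
        for i in range(n):
            eff = []
            for r in range(len(self_e[i])):
                e = self_e[i][r]
                for j in range(n):
                    if j == i:
                        continue
                    for s, p in enumerate(probs[j]):
                        if i < j:
                            e += p * pair_e[(i, j)][r][s]
                        else:
                            e += p * pair_e[(j, i)][s][r]
                eff.append(e)
            lowest = min(eff)  # shift for numerical stability
            w = [math.exp(-(x - lowest) / kt) for x in eff]
            z = sum(w)
            new_probs.append([x / z for x in w])
        probs = new_probs
    return probs


# Hypothetical two-residue system: physics slightly prefers rotamer 0 at
# residue 0, but the (made-up) library says rotamer 1 is far more common.
self_e = [[0.0, 0.2], [0.0, 0.0]]
pair_e = {(0, 1): [[0.0, 0.0], [0.0, 0.0]]}
library = [[0.1, 0.9], [0.5, 0.5]]

plain = mean_field(self_e, pair_e)
biased_self = [[e + pseudo_energy(p) for e, p in zip(es, ps)]
               for es, ps in zip(self_e, library)]
biased = mean_field(biased_self, pair_e)
print(plain[0], biased[0])
```

The `strength` parameter corresponds loosely to the tunable bias strength the abstract mentions: 0 recovers the pure energy-based prediction, and very large values approach a purely library-driven one.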

14.
The possible applicability of the new template CoMFA methodology to the prediction of unknown biological affinities was explored. For twelve selected targets, all ChEMBL binding affinities were used as training and/or prediction sets, making these 3D-QSAR models the most structurally diverse and among the largest ever. For six of the targets, X-ray crystallographic structures provided the aligned templates required as input (BACE, cdk1, chk2, carbonic anhydrase-II, factor Xa, PTP1B). For all targets including the other six (hERG, cyp3A4 binding, endocrine receptor, COX2, D2, and GABAa), six modeling protocols applied to only three familiar ligands provided six alternate sets of aligned templates. The statistical qualities of the six or seven models thus resulting for each individual target were remarkably similar. Also, perhaps unexpectedly, the standard deviations of the errors of cross-validation predictions accompanying model derivations were indistinguishable from the standard deviations of the errors of truly prospective predictions. These standard deviations of prediction ranged from 0.70 to 1.14 log units and averaged 0.89 (8x in concentration units) over the twelve targets, representing an average reduction of almost 50% in uncertainty, compared to the null hypothesis of "predicting" an unknown affinity to be the average of known affinities. These errors of prediction are similar to those from Tanimoto coefficients of fragment occurrence frequencies, the predominant approach to side effect prediction, which template CoMFA can augment by identifying additional active structural classes, by improving Tanimoto-only predictions, by yielding quantitative predictions of potency, and by providing interpretable guidance for avoiding or enhancing any specific target response.

15.
Protein structure prediction by comparative modeling benefits greatly from the use of multiple sequence alignment information to improve the accuracy of structural template identification and the alignment of target sequences to structural templates. Unfortunately, this benefit is limited to those protein sequences for which at least several natural sequence homologues exist. We show here that the use of large diverse alignments of computationally designed protein sequences confers many of the same benefits as natural sequences in identifying structural templates for comparative modeling targets. A large-scale massively parallelized application of an all-atom protein design algorithm, including a simple model of peptide backbone flexibility, has allowed us to generate 500 diverse, non-native, high-quality sequences for each of 264 protein structures in our test set. PSI-BLAST searches using the sequence profiles generated from the designed sequences ("reverse" BLAST searches) give near-perfect accuracy in identifying true structural homologues of the parent structure, with 54% coverage. In 41 of 49 genomes scanned using reverse BLAST searches, at least one novel structural template (not found by the standard method of PSI-BLAST against PDB) is identified. Further improvements in coverage, through optimizing the scoring function used to design sequences and continued application to new protein structures beyond the test set, will allow this method to mature into a useful strategy for identifying distantly related structural templates.

16.
Side-chain conformational entropy in protein folding.   Cited by: 14 (self-citations: 11; citations by others: 3)
An important, but often neglected, contribution to the thermodynamics of protein folding is the loss of entropy that results from restricting the number of accessible side-chain conformers in the native structure. Conformational entropy changes can be found by comparing the number of accessible rotamers in the unfolded and folded states, or by estimating fusion entropies. Comparison of several sets of results using different techniques shows that the mean conformational free energy change (TΔS) is 1 kcal·mol⁻¹ per side chain or 0.5 kcal·mol⁻¹ per bond. Changes in vibrational entropy appear to be negligible compared to the entropy change resulting from the loss of accessible rotamers. Side-chain entropies can help rationalize alpha-helix propensities, predict protein/inhibitor complex structures, and account for the distribution of side chains on the protein surface or interior.
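The rotamer-counting estimate mentioned in this abstract can be made concrete with a short calculation. The sketch assumes equally populated rotamers, so that the conformational entropy is S = R ln W and the cost of folding is TΔS = RT ln(W_unfolded / W_folded). The rotamer counts used are illustrative round numbers, not data from the paper, and the function name is hypothetical.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol*K)


def t_delta_s(n_unfolded, n_folded=1, temperature=300.0):
    """Entropic cost T*dS (kcal/mol) of restricting equally populated rotamers."""
    return R * temperature * math.log(n_unfolded / n_folded)


# One rotatable bond with three staggered rotamers frozen into one:
per_bond = t_delta_s(3)
# Two such bonds (3 x 3 = 9 combined states) frozen into one conformer:
per_side_chain = t_delta_s(9)
print(round(per_bond, 2), round(per_side_chain, 2))
```

Under these assumptions the estimate is about 0.65 kcal/mol for one bond and about 1.3 kcal/mol for two, the same order of magnitude as the averages quoted in the abstract. Real rotamers are not equally populated in the unfolded state, which lowers the entropy lost on folding.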

17.
Misura KM, Baker D. Proteins, 2005, 59(1): 15-29
Achieving atomic level accuracy in de novo structure prediction presents a formidable challenge even in the context of protein models with correct topologies. High-resolution refinement is a fundamental test of force field accuracy and sampling methodology, and its limited success in both comparative modeling and de novo prediction contexts highlights the limitations of current approaches. We constructed four tests to identify bottlenecks in our current approach and to guide progress in this challenging area. The first three tests showed that idealized native structures are stable under our refinement simulation conditions and that the refinement protocol can significantly decrease the root mean square deviation (RMSD) of perturbed native structures. In the fourth test we applied the refinement protocol to de novo models and showed that accurate models could be identified based on their energies, and in several cases many of the buried side chains adopted native-like conformations. We also showed that the differences in backbone and side-chain conformations between the refined de novo models and the native structures are largely localized to loop regions and regions where the native structure has unusual features such as rare rotamers or atypical hydrogen bonding between beta-strands. The refined de novo models typically have higher energies than refined idealized native structures, indicating that sampling of local backbone conformations and side-chain packing arrangements in a condensed state is a primary obstacle.

18.
The thermodynamic stability of a protein provides an experimental metric for the relationship of protein sequence and native structure. We have investigated an approach based on an analysis of the structural database for stability engineering of an immunoglobulin variable domain. The most frequently occurring residues in specific positions of beta-turn motifs were predicted to increase the folding stability of mutants that were constructed by site-directed mutagenesis. Even in positions in which different residues are conserved in immunoglobulin sequences, the predictions were confirmed. Frequently, mutants with increased beta-turn propensities display increased folding cooperativities, suggesting pronounced effects on the unfolded state independent of the expected effect on conformational entropy. We conclude that structural motifs with predominantly local interactions can serve as templates with which patterns of sequence preferences can be extracted from the database of protein structures. Such preferences can predict the stability effects of mutations for protein engineering and design.

19.
Catalytic site structure is normally highly conserved between distantly related enzymes. As a consequence, templates representing catalytic sites have the potential to succeed at function prediction in cases where methods based on sequence or overall structure fail. There are many methods for searching protein structures for matches to structural templates, but few validated template libraries to use with these methods. We present a library of structural templates representing catalytic sites, based on information from the scientific literature. Furthermore, we analyse homologous template families to discover the diversity within families and the utility of templates for active site recognition. Templates representing the catalytic sites of homologous proteins mostly differ by less than 1 Å root mean square deviation, even when the sequence similarity between the two proteins is low. Within these sets of homologues there is usually no discernible relationship between catalytic site structure similarity and sequence similarity. Because of this structural conservation of catalytic sites, the templates can discriminate between matches to related proteins and random matches with over 85% sensitivity and predictive accuracy. Templates based on protein backbone positions are more discriminating than those based on side-chain atoms. These analyses show encouraging prospects for prediction of functional sites in structural genomics structures of unknown function, and will be of use in analyses of convergent evolution and exploring relationships between active site geometry and chemistry. The template library can be queried via a web server and is available for download.

20.
Since Anfinsen demonstrated in 1973 that the information encoded in a protein's amino acid sequence determines its structure, solving the protein structure prediction problem has been the Holy Grail of structural biology. The goal of protein structure prediction approaches is to utilize computational modeling to determine the spatial location of every atom in a protein molecule starting from only its amino acid sequence. Depending on whether homologous structures can be found in the Protein Data Bank (PDB), structure prediction methods have been historically categorized as template-based modeling (TBM) or template-free modeling (FM) approaches. Until recently, TBM has been the most reliable approach to predicting protein structures, and in the absence of reliable templates, the modeling accuracy sharply declines. Nevertheless, the results of the most recent community-wide assessment of protein structure prediction (CASP14) have demonstrated that the protein structure prediction problem can be largely solved through the use of end-to-end deep machine learning techniques, where correct folds could be built for nearly all single-domain proteins without using the PDB templates. Critically, the model quality exhibited little correlation with the quality of available template structures, as well as the number of sequence homologs detected for a given target protein. Thus, the implementation of deep-learning techniques has essentially broken through the 50-year-old modeling border between TBM and FM approaches and has made the success of high-resolution structure prediction significantly less dependent on template availability in the PDB library.
