Similar articles
Found 20 similar articles (search time: 46 ms)
1.
There are several knowledge-based energy functions that can distinguish the native fold from a pool of grossly misfolded decoys for a given sequence of amino acids. These decoys, which are typically generated by mounting, or “threading”, the sequence onto the backbones of unrelated protein structures, tend to be non-compact and quite different from the native structure: the root-mean-squared (RMS) deviations from the native are commonly in the range of 15 to 20 Å. Effective energy functions should also demonstrate a similar recognition capability when presented with compact decoys that depart only slightly in conformation from the correct structure (i.e. those with RMS deviations of ∼5 Å or less). Recently, we developed a simple yet powerful method for native fold recognition based on the tendency for native folds to form hydrophobic cores. Our energy measure, which we call the hydrophobic fitness score, is challenged to recognize the native fold from 2000 near-native structures generated for each of five small monomeric proteins. First, 1000 conformations for each protein were generated by molecular dynamics simulation at room temperature. The average RMS deviation of this set of 5000 was 1.5 Å. A total of 323 decoys had energies lower than native; however, none of these had RMS deviations greater than 2 Å. Another 1000 structures were generated for each at high temperature, in which a greater range of conformational space was explored (4.3 Å average RMS deviation). Out of this set, only seven decoys were misrecognized. The hydrophobic fitness energy of a conformation is strongly dependent upon the RMS deviation. On average our potential yields energy values which are lowest for the population of structures generated at room temperature, intermediate for those produced at high temperature and highest for those constructed by threading methods. In general, the lowest energy decoy conformations have backbones very close to native structure. 
The possible utility of our method for screening backbone candidates for the purpose of modelling by side-chain packing optimization is discussed.
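
The bookkeeping behind the counts in this abstract can be sketched as follows. This is a minimal illustration and not the authors' code: the function names `rms_deviation` and `misrecognized` are hypothetical, the two coordinate sets are assumed to be already superimposed (no optimal fitting is performed), and the energies are assumed to come from whatever scoring function is being tested. The hydrophobic fitness score itself is not implemented here.

```python
import math


def rms_deviation(coords_a, coords_b):
    """RMS deviation between two equally sized lists of (x, y, z) points.

    Assumption: the structures are already superimposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def misrecognized(native_energy, decoys):
    """Return the decoys, given as (energy, rmsd) pairs, that score below the native."""
    return [decoy for decoy in decoys if decoy[0] < native_energy]
```

Listing the RMS deviations of the returned decoys is one way to check a statement such as "none of these had RMS deviations greater than 2 Å".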

2.
We present a knowledge‐based function to score protein decoys based on their similarity to native structure. A set of features is constructed to describe the structure and sequence of the entire protein chain. Furthermore, a qualitative relationship is established between the calculated features and the underlying electromagnetic interaction that dominates this scale. The features we use are associated with residue–residue distances, residue–solvent distances, pairwise knowledge‐based potentials and a four‐body potential. In addition, we introduce a new target to be predicted, the fitness score, which measures the similarity of a model to the native structure. This new approach enables us to obtain information both from decoys and from native structures. It is also devoid of previous problems associated with knowledge‐based potentials. These features were obtained for a large set of native and decoy structures and a back‐propagating neural network was trained to predict the fitness score. Overall this new scoring potential proved to be superior to the knowledge‐based scoring functions used as its inputs. In particular, in the latest CASP (CASP10) experiment our method was ranked third for all targets, and second for freely modeled hard targets among about 200 groups for top model prediction. Ours was the only method ranked in the top three for all targets and for hard targets. This shows that initial results from the novel approach are able to capture details that were missed by a broad spectrum of protein structure prediction approaches. Source code and executables from this work are freely available at http://mathmed.org/#Software and http://mamiris.com/ . Proteins 2014; 82:752–759. © 2013 Wiley Periodicals, Inc.

3.
Proteins, 2018, 86(5): 581-591
We compare side chain prediction and packing of core and non‐core regions of soluble proteins, protein‐protein interfaces, and transmembrane proteins. We first identified or created comparable databases of high‐resolution crystal structures of these 3 protein classes. We show that the solvent‐inaccessible cores of the 3 classes of proteins are equally densely packed. As a result, the side chains of core residues at protein‐protein interfaces and in the membrane‐exposed regions of transmembrane proteins can be predicted by the hard‐sphere plus stereochemical constraint model with the same high prediction accuracies (>90%) as core residues in soluble proteins. We also find that for all 3 classes of proteins, as one moves away from the solvent‐inaccessible core, the packing fraction decreases as the solvent accessibility increases. However, the side chain predictability remains high (80% within ) up to a relative solvent accessibility, , for all 3 protein classes. Our results show that % of the interface regions in protein complexes are “core”, that is, densely packed with side chain conformations that can be accurately predicted using the hard‐sphere model. We propose packing fraction as a metric that can be used to distinguish real protein‐protein interactions from designed, non‐binding, decoys. Our results also show that cores of membrane proteins are the same as cores of soluble proteins. Thus, the computational methods we are developing for the analysis of the effect of hydrophobic core mutations in soluble proteins will be equally applicable to analyses of mutations in membrane proteins.

4.
The amino acid sequences of soluble, ordered proteins with stable structures have evolved due to biological and physical requirements, thus distinguishing them from random sequences. Previous analyses have focused on extracting the features that frequently appear in protein substructures, such as α‐helix and β‐sheet, but the universal features of protein sequences have not been addressed. To clarify the differences between native protein sequences and random sequences, we analyzed 7368 soluble, ordered protein sequences by comparing the observed occurrences of the 400 amino acid pairs in local proximity, up to 10 residues apart along the sequence, with their expected occurrences in random sequences. We found that hydrophobic residue pairs and polar residue pairs are significantly decreased, whereas pairs between a hydrophobic residue and a polar residue are increased. This trend was universally observed regardless of the secondary structure content but was not observed in protein sequences that include intrinsically disordered regions, indicating that it can be a general rule of protein foldability. The possible benefits of this rule are discussed from the viewpoints of protein aggregation and disorder, which are both caused by low‐complexity regions of hydrophobic or polar residues.
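
A comparison of observed and expected pair occurrences of this kind might look like the sketch below. It is an illustration under stated assumptions, not the procedure used in the study: the function name `pair_ratios` is hypothetical, pairs are treated as ordered (20 × 20 = 400 combinations), and the expected count assumes a random sequence with the same overall residue composition as the input set.

```python
from collections import Counter


def pair_ratios(sequences, separation):
    """Observed/expected ratio for ordered residue pairs (a, b),
    where b lies `separation` positions after a in the sequence.

    Assumption: the expected count is that of a random sequence with
    the same residue composition as the pooled input sequences.
    """
    pair_counts = Counter()
    residue_counts = Counter()
    total_pairs = 0
    total_residues = 0
    for seq in sequences:
        residue_counts.update(seq)
        total_residues += len(seq)
        for i in range(len(seq) - separation):
            pair_counts[(seq[i], seq[i + separation])] += 1
            total_pairs += 1
    ratios = {}
    for (a, b), observed in pair_counts.items():
        freq_a = residue_counts[a] / total_residues
        freq_b = residue_counts[b] / total_residues
        ratios[(a, b)] = observed / (total_pairs * freq_a * freq_b)
    return ratios
```

Calling the function for each separation from 1 to 10 would cover the local range mentioned in the abstract; a ratio below 1 marks a pair that occurs less often than composition alone predicts.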

5.
A major challenge of the protein docking problem is to define scoring functions that can distinguish near‐native protein complex geometries from a large number of non‐native geometries (decoys) generated with noncomplexed protein structures (unbound docking). In this study, we have constructed a neural network that employs the information from atom‐pair distance distributions of a large number of decoys to predict protein complex geometries. We found that docking prediction can be significantly improved using two different types of polar hydrogen atoms. To train the neural network, 2000 near‐native decoys of even distance distribution were used for each of the 185 considered protein complexes. The neural network normalizes the information from different protein complexes using an additional protein complex identity input neuron for each complex. The parameters of the neural network were determined such that they mimic a scoring funnel in the neighborhood of the native complex structure. The neural network approach avoids the reference state problem, which occurs in deriving knowledge‐based energy functions for scoring. We show that a distance‐dependent atom pair potential performs much better than a simple atom‐pair contact potential. We have compared the performance of our scoring function with other empirical and knowledge‐based scoring functions such as ZDOCK 3.0, ZRANK, ITScore‐PP, EMPIRE, and RosettaDock. In spite of the simplicity of the method and its functional form, our neural network‐based scoring function achieves a reasonable performance in rigid‐body unbound docking of proteins. Proteins 2010. © 2009 Wiley‐Liss, Inc.

6.
De novo design of the hydrophobic cores of proteins (cited by: 22; self-citations: 17; citations by others: 5)
We have developed and experimentally tested a novel computational approach for the de novo design of hydrophobic cores. A pair of computer programs has been written, the first of which creates a "custom" rotamer library for potential hydrophobic residues, based on the backbone structure of the protein of interest. The second program uses a genetic algorithm to globally optimize for a low energy core sequence and structure, using the custom rotamer library as input. Success of the programs in predicting the sequences of native proteins indicates that they should be effective tools for protein design. Using these programs, we have designed and engineered several variants of the phage 434 cro protein, containing five, seven, or eight sequence changes in the hydrophobic core. As controls, we have produced a variant consisting of a randomly generated core with six sequence changes but equal volume relative to the native core and a variant with a "minimalist" core containing predominantly leucine residues. Two of the designs, including one with eight core sequence changes, have thermal stabilities comparable to the native protein, whereas the third design and the minimalist protein are significantly destabilized. The randomly designed control is completely unfolded under equivalent conditions. These results suggest that rational de novo design of hydrophobic cores is feasible, and stress the importance of specific packing interactions for the stability of proteins. A surprising aspect of the results is that all of the variants display highly cooperative thermal denaturation curves and reasonably dispersed NMR spectra. This suggests that the non-core residues of a protein play a significant role in determining the uniqueness of the folded structure.

7.
We describe the development of a scoring function based on the decomposition P(structure/sequence) proportional to P(sequence/structure) * P(structure), which outperforms previous scoring functions in correctly identifying native-like protein structures in large ensembles of compact decoys. The first term captures sequence-dependent features of protein structures, such as the burial of hydrophobic residues in the core; the second term captures universal sequence-independent features, such as the assembly of beta-strands into beta-sheets. The efficacies of a wide variety of sequence-dependent and sequence-independent features of protein structures for recognizing native-like structures were systematically evaluated using ensembles of approximately 30,000 compact conformations with fixed secondary structure for each of 17 small protein domains. The best results were obtained using a core scoring function with P(sequence/structure) parameterized similarly to our previous work (Simons et al., J Mol Biol 1997;268:209-225) and P(structure) focused on secondary structure packing preferences; while several additional features had some discriminatory power on their own, they did not provide any additional discriminatory power when combined with the core scoring function. Our results, on both the training set and the independent decoy set of Park and Levitt (J Mol Biol 1996;258:367-392), suggest that this scoring function should contribute to the prediction of tertiary structure from knowledge of sequence and secondary structure.
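
The decomposition named in this abstract is Bayes' rule applied to a fixed sequence. The short derivation below writes the conditional probabilities with a vertical bar; the symbol S for the resulting score and the use of a negative logarithm are assumptions made for illustration, since the abstract does not state the functional form.

```latex
\begin{align}
P(\text{structure} \mid \text{sequence})
  &= \frac{P(\text{sequence} \mid \text{structure})\, P(\text{structure})}{P(\text{sequence})} \\
  &\propto P(\text{sequence} \mid \text{structure})\, P(\text{structure}), \\
S(\text{structure})
  &= -\ln P(\text{sequence} \mid \text{structure}) - \ln P(\text{structure}).
\end{align}
```

P(sequence) is the same for every decoy of a given sequence, so dropping it does not change the ranking of decoys. Under the assumed logarithmic form, the two terms add, and a lower S corresponds to a more probable structure.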

8.
Protein residues that are critical for structure and function are expected to be conserved throughout evolution. Here, we investigate the extent to which these conserved residues are clustered in three-dimensional protein structures. In 92% of the proteins in a data set of 79 proteins, the most conserved positions in multiple sequence alignments are significantly more clustered than randomly selected sets of positions. The comparison to random subsets is not necessarily appropriate, however, because the signal could be the result of differences in the amino acid composition of sets of conserved residues compared to random subsets (hydrophobic residues tend to be close together in the protein core), or differences in sequence separation of the residues in the different sets. In order to overcome these limits, we compare the degree of clustering of the conserved positions on the native structure and on alternative conformations generated by the de novo structure prediction method Rosetta. For 65% of the 79 proteins, the conserved residues are significantly more clustered in the native structure than in the alternative conformations, indicating that the clustering of conserved residues in protein structures goes beyond that expected purely from sequence locality and composition effects. The differences in the spatial distribution of conserved residues can be utilized in de novo protein structure prediction: We find that for 79% of the proteins, selection of the Rosetta generated conformations with the greatest clustering of the conserved residues significantly enriches the fraction of close-to-native structures.
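
The first comparison described here, conserved positions against randomly selected sets of positions, can be sketched as a simple resampling test. Everything in the block is an assumption made for illustration: the abstract does not say which clustering measure was used, so mean pairwise distance stands in for it, and the function names, the number of trials and the seed are hypothetical. The second comparison, against Rosetta-generated conformations, is not covered.

```python
import math
import random


def mean_pairwise_distance(coords, positions):
    """Mean distance between all pairs of the selected residue positions."""
    points = [coords[i] for i in positions]
    distances = [
        math.dist(p, q)
        for index, p in enumerate(points)
        for q in points[index + 1:]
    ]
    return sum(distances) / len(distances)


def clustering_p_value(coords, conserved, trials=1000, seed=0):
    """Fraction of random position sets at least as tightly clustered
    as the conserved set (a small value means unusually clustered)."""
    rng = random.Random(seed)
    observed = mean_pairwise_distance(coords, conserved)
    hits = 0
    for _ in range(trials):
        sample = rng.sample(range(len(coords)), len(conserved))
        if mean_pairwise_distance(coords, sample) <= observed:
            hits += 1
    return hits / trials
```

As the abstract points out, a random-subset baseline of this kind ignores amino acid composition and sequence separation, which is why the authors go on to compare against alternative conformations of the same sequence.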

9.
Yunqi Li, Yang Zhang. Proteins, 2009, 76(3): 665-676
Protein structure prediction approaches usually perform modeling simulations based on reduced representation of protein structures. For biological applications, it is an important step to construct full atomic models from the reduced structure decoys. Most current full atomic model reconstruction procedures have defects: they either fail to completely remove the steric clashes among backbone atoms or generate final atomic models whose topology is less similar to the native structure than that of the reduced models. In this work, we develop a new protocol, called REMO, to generate full atomic protein models by optimizing the hydrogen‐bonding network with basic fragments matched from a newly constructed backbone isomer library of solved protein structures. The algorithm is benchmarked on 230 nonhomologous proteins with reduced structure decoys generated by I‐TASSER simulations. The results show that REMO has a significant ability to remove steric clashes while retaining the good topology of the reduced model. The hydrogen‐bonding network of the final models is dramatically improved during the procedure. The REMO algorithm was used in the recent CASP8 experiment, which demonstrated significant improvements of the I‐TASSER models in both atomic‐level structural refinement and hydrogen‐bonding network construction. Proteins 2009. © 2009 Wiley‐Liss, Inc.

10.
Convergence of the vast sequence space of proteins into a highly restricted fold/conformational space suggests a simple yet unique underlying mechanism of protein folding that has been the subject of much debate in the last several decades. One of the major challenges related to the understanding of protein folding or in silico protein structure prediction is the discrimination of non-native structures/decoys from the native structure. Applications of knowledge-based potentials to attain this goal have been extensively reported in the literature. Also, scoring functions based on accessible surface area and amino acid neighbourhood considerations were used in discriminating the decoys from native structures. In this article, we have explored the potential of protein structure network (PSN) parameters to validate the native proteins against a large number of decoy structures generated by diverse methods. We are guided by two principles: (a) the PSNs capture the local properties from a global perspective and (b) inclusion of non-covalent interactions, at all-atom level, including the side-chain atoms, in the network construction accommodates the sequence dependent features. Several network parameters such as the size of the largest cluster, community size, and clustering coefficient are evaluated and scored on the basis of the rank of the native structures and the Z-scores. The network analysis of decoy structures highlights the importance of the global properties contributing to the uniqueness of native structures. The analysis also shows that the network parameters can be used as metrics to identify the native structures and filter out non-native structures/decoys in a large number of data-sets; they thus also have the potential to be used in the protein ‘structure prediction’ problem.
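
Scoring a parameter "on the basis of the rank of the native structures and the Z-scores" can be illustrated with the following sketch. It is not taken from the article: the function name `native_rank_and_z` is hypothetical, lower scores are assumed to be better (for a parameter where larger is better, the sign would be flipped), and the population standard deviation of the decoy scores is used.

```python
import statistics


def native_rank_and_z(native_score, decoy_scores):
    """Rank of the native among the decoys (1 = best) and its Z-score.

    Assumptions: lower score is better; Z is measured against the
    mean and population standard deviation of the decoy scores.
    """
    rank = 1 + sum(1 for score in decoy_scores if score < native_score)
    mean = statistics.mean(decoy_scores)
    spread = statistics.pstdev(decoy_scores)
    return rank, (native_score - mean) / spread
```

Under these conventions a rank of 1 together with a strongly negative Z-score indicates that the native structure is well separated from the decoy set.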

11.
Zhou R, Silverman BD, Royyuru AK, Athma P. Proteins, 2003, 52(4): 561-572
A recent study of 30 soluble globular protein structures revealed a quasi-invariant called the hydrophobic ratio. This invariant, which is the ratio of the distance at which the second order hydrophobic moment vanished to the distance at which the zero order moment vanished, was found to be 0.75 ± 0.05 for 30 protein structures. This report first describes the results of the hydrophobic profiling of 5,387 non-redundant globular protein domains of the Protein Data Bank, which yields a hydrophobic ratio of 0.71 ± 0.08. Then, a new hydrophobic score is defined based on the hydrophobic profiling to discriminate native-like proteins from decoy structures. This is tested on three widely used decoy sets, namely the Holm and Sander decoys, Park and Levitt decoys, and Baker decoys. Since the hydrophobic moment profiling characterizes a global feature and requires reasonably good statistics, this imposes a constraint upon the size of the protein structures in order to yield relatively smooth moment profiles. We show that even subject to the limitations of protein size (both Park & Levitt and Baker sets are small protein decoys), the hydrophobic moment profiling and hydrophobic score can provide useful information that should be complementary to the information provided by force field calculations.

12.
The identification of protein–protein interactions is vital for understanding protein function, elucidating interaction mechanisms, and for practical applications in drug discovery. With the exponentially growing protein sequence data, fully automated computational methods that predict interactions between proteins are becoming essential components of system‐level function inference. A thorough analysis of protein complex structures demonstrated that binding site locations as well as the interfacial geometry are highly conserved across evolutionarily related proteins. Because the conformational space of protein–protein interactions is highly covered by experimental structures, sensitive protein threading techniques can be used to identify suitable templates for the accurate prediction of interfacial residues. Toward this goal, we developed eFindSitePPI, an algorithm that uses the three‐dimensional structure of a target protein, evolutionarily remotely related templates and machine learning techniques to predict binding residues. Using crystal structures, the average sensitivity (specificity) of eFindSitePPI in interfacial residue prediction is 0.46 (0.92). For weakly homologous protein models, these values only slightly decrease to 0.40–0.43 (0.91–0.92) demonstrating that eFindSitePPI performs well not only using experimental data but also tolerates structural imperfections in computer‐generated structures. In addition, eFindSitePPI detects specific molecular interactions at the interface; for instance, it correctly predicts approximately one half of hydrogen bonds and aromatic interactions, as well as one third of salt bridges and hydrophobic contacts. Comparative benchmarks against several dimer datasets show that eFindSitePPI outperforms other methods for protein‐binding residue prediction. It also features a carefully tuned confidence estimation system, which is particularly useful in large‐scale applications using raw genomic data. 
eFindSitePPI is freely available to the academic community at http://www.brylinski.org/efindsiteppi . Copyright © 2014 John Wiley & Sons, Ltd.
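
The sensitivity and specificity figures quoted in this abstract follow the usual definitions for a per-residue binary prediction. The sketch below only illustrates those definitions; the function name and the representation of predictions as sets of residue indices are assumptions, and nothing here reproduces eFindSitePPI itself.

```python
def sensitivity_specificity(predicted, actual, n_residues):
    """Sensitivity and specificity of an interface-residue prediction.

    predicted, actual: iterables of residue indices labelled as interfacial.
    n_residues: total number of residues in the protein.
    """
    predicted, actual = set(predicted), set(actual)
    true_pos = len(predicted & actual)
    false_neg = len(actual - predicted)
    false_pos = len(predicted - actual)
    true_neg = n_residues - true_pos - false_neg - false_pos
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity
```

A sensitivity of 0.46 with a specificity of 0.92 therefore means that just under half of the true interface residues are recovered while few non-interface residues are wrongly flagged.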

13.
We present a new structurally derived pair-to-pair substitution matrix (P2PMAT). This matrix is constructed from a very large number of integrated high quality multiple sequence alignments (Blocks) and protein structures. It evaluates the likelihoods of all 160,000 pair-to-pair substitutions. The P2PMAT matrix implicitly accounts for evolutionary conservation, correlated mutations, and residue-residue contact potentials. The usefulness of the matrix for structural predictions is shown in this article. Prediction of protein residue-residue contacts from sequence information alone by our method (P2PConPred) is particularly accurate in the protein cores, where it performs better than other basic contact prediction methods (increasing accuracy by 25-60%). The method's mean accuracy for protein cores is 24% for 59 diverse families and 34% for a subset of proteins shorter than 100 residues. This is above the level that was recently shown to be sufficient to significantly improve ab initio protein structure prediction. We also demonstrate the ability of our approach to identify native structures within large sets of (300-2000) protein decoys. On the basis of evolutionary information alone our method ranks the native structure in the top 0.3% of the decoys in 4/10 of the sets, and in 8/10 of sets the native structure is ranked in the top 10% of the decoys. The method can, thus, be used to assist in filtering out wrong models, complementing traditional scoring functions.

14.
Betancourt MR. Proteins, 2003, 53(4): 889-907
A protein model that is simple enough to be used in protein-folding simulations but accurate enough to identify a protein native fold is described. Its geometry consists of describing the residues by one, two, or three pseudoatoms, depending on the residue size. Its energy is given by a pairwise, knowledge-based potential obtained for all the pseudoatoms as a function of their relative distance. The pseudoatomic potential is also a function of the primary chain separation and residue order. The model is tested by gapless threading on a large, representative set of known protein and decoy structures obtained from the "Decoys 'R' Us" database. It is also tested by threading on gapped decoys generated for proteins with many homologs. The gapless threading tests show near 98% native-structure recognition as the lowest energy structure and almost 100% as one of the three lowest energy structures for over 2200 test proteins. In decoy threading tests, the model recognized the majority of the native structures. It is also able to recognize native structures among gapped decoys, in spite of close structural similarities. The results indicate that the pseudoatomic model has native recognition ability similar to comparable atomic-based models but much better than equivalent residue-based models.

15.
This work presents a novel Cα-Cα distance-dependent force field which is successful in selecting native structures from an ensemble of high resolution near-native conformers. An enhanced and diverse protein set, along with an improved decoy generation technique, contributes to the effectiveness of this potential. High quality decoys were generated for 1489 nonhomologous proteins and used to train an optimization based linear programming formulation. The goal in developing a set of high resolution decoys was to develop a simple, distance-dependent force field that yields the native structure as the lowest energy structure and assigns higher energies to decoy structures that are quite similar as well as those that are less similar. The model also includes a set of physical constraints that were based on experimentally observed physical behavior of the amino acids. The force field was tested on two sets of test decoys not in the training set and was found to excel on all the metrics that are widely used to measure the effectiveness of a force field. The high resolution force field was successful in correctly identifying 113 native structures out of 150 test cases and the average rank obtained for this test was 1.87. All the high resolution structures (training and testing) used for this work are available online and can be downloaded from http://titan.princeton.edu/HRDecoys.

16.
Highly fluorinated analogs of hydrophobic amino acids are well known to increase the stability of proteins toward thermal unfolding and chemical denaturation, but there is very little data on the structural consequences of fluorination. We have determined the structures and folding energies of three variants of a de novo designed 4‐helix bundle protein whose hydrophobic cores contain either hexafluoroleucine (hFLeu) or t‐butylalanine (tBAla). Although the buried hydrophobic surface area is the same for all three proteins, the incorporation of tBAla causes a rearrangement of the core packing, resulting in the formation of a destabilizing hydrophobic cavity at the center of the protein. In contrast, incorporation of hFLeu causes no changes in core packing with respect to the structure of the nonfluorinated parent protein, which contains only leucine in the core. These results support the idea that fluorinated residues are especially effective at stabilizing proteins because they closely mimic the shape of the natural residues they replace while increasing buried hydrophobic surface area.

17.
A low-resolution scoring function for the selection of native and near-native structures from a set of predicted structures for a given protein sequence has been developed. The scoring function, ProVal (Protein Validate), used several variables that each describe an aspect of protein structure for which the proximity to the native structure can be assessed quantitatively. Among the parameters included are a packing estimate, surface areas, and the contact order. A partial least squares for latent variables (PLS) model was built for each candidate set of the 28 decoy sets of structures generated for 22 different proteins using the described parameters as independent variables. The Cα RMS of the candidate structures versus the experimental structure was used as the dependent variable. The final generalized scoring function was an average of all models derived, ensuring that the function was not optimized for specific fold classes or method of structure generation of the candidate folds. The results show that the crystal structure was scored best in 64% of the 28 test sets and was clearly separated from the decoys in many examples. In all the other cases in which the crystal structure did not rank first, it ranked within the top 10%. Thus, although ProVal could not distinguish between predicted structures that were similar overall in fold quality due to its inherently low resolution, it can clearly be used as a primary filter to eliminate approximately 90% of fold candidates generated by current prediction methods from all-atom modeling and further evaluation. The correlation between the predicted and actual Cα RMS values varies considerably between the candidate fold sets.

18.
Protein structure refinement by optimization (cited by: 1; self-citations: 0; citations by others: 1)
Martin Carlsen, Peter Røgen. Proteins, 2015, 83(9): 1616-1624
Knowledge‐based protein potentials are simplified potentials designed to improve the quality of protein models, which is important as more accurate models are more useful for biological and pharmaceutical studies. Consequently, knowledge‐based potentials often are designed to be efficient in ordering a given set of deformed structures, denoted decoys, according to how close they are to the relevant native protein structure. This, however, does not necessarily imply that energy minimization of this potential will bring the decoys closer to the native structure. In this study, we introduce an iterative strategy to improve the convergence of decoy structures. It works by adding energy optimized decoys to the pool of decoys used to construct the next and improved knowledge‐based potential. We demonstrate that this strategy results in significantly improved decoy convergence on Titan high resolution decoys and refinement targets from Critical Assessment of protein Structure Prediction competitions. Our potential is formulated in Cartesian coordinates and has a fixed backbone potential that restricts motions to be close to those of a dihedral model, a fixed hydrogen‐bonding potential and a variable coarse grained carbon alpha potential consisting of a pair potential and a novel solvent potential that are B‐spline based, as we use explicit gradient and Hessian for efficient energy optimization. Proteins 2015; 83:1616–1624. © 2015 Wiley Periodicals, Inc.

19.
π-π, cation-π, and hydrophobic packing interactions contribute specificity to protein folding and stability to the native state. As a step towards developing improved models of these interactions in proteins, we compare the side-chain packing arrangements in native proteins to those found in compact decoys produced by the Rosetta de novo structure prediction method. We find enrichments in the native distributions for T-shaped and parallel offset arrangements of aromatic residue pairs, in parallel stacked arrangements of cation-aromatic pairs, in parallel stacked pairs involving proline residues, and in parallel offset arrangements for aliphatic residue pairs. We then investigate the extent to which the distinctive features of native packing can be explained using Lennard-Jones and electrostatics models. Finally, we derive orientation-dependent π-π, cation-π and hydrophobic interaction potentials based on the differences between the native and compact decoy distributions and investigate their efficacy for high-resolution protein structure prediction. Surprisingly, the orientation-dependent potential derived from the packing arrangements of aliphatic side-chain pairs distinguishes the native structure from compact decoys better than the orientation-dependent potentials describing π-π and cation-π interactions.

20.
Locating sequences compatible with a protein structural fold is the well‐known inverse protein‐folding problem. While significant progress has been made, the success rate of protein design remains low. As a result, a library of designed sequences or profile of sequences is currently employed for guiding experimental screening or directed evolution. Sequence profiles can be computationally predicted by iterative mutations of a random sequence to produce energy‐optimized sequences, or by combining sequences of structurally similar fragments in a template library. The latter approach is computationally more efficient but yields less accurate profiles than the former because it lacks tertiary structural information. Here we present a method called SPIN that predicts Sequence Profiles by Integrated Neural network based on fragment‐derived sequence profiles and structure‐derived energy profiles. SPIN improves over the fragment‐derived profile by 6.7% (from 23.6 to 30.3%) in sequence identity between predicted and wild‐type sequences. The method also reduces the number of residues in low‐complexity regions by 15.7% and has a significantly better balance of hydrophilic and hydrophobic residues at the protein surface. The accuracy of the sequence profiles obtained is comparable to that of profiles generated by the protein design program RosettaDesign 3.5. This highly efficient method for predicting sequence profiles from structures will be useful as a single‐body scoring term for improving scoring functions used in protein design and fold recognition. It also complements protein design programs in guiding experimental design of the sequence library for screening and directed evolution of designed sequences. The SPIN server is available at http://sparks-lab.org . Proteins 2014; 82:2565–2573. © 2014 Wiley Periodicals, Inc.
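
The sequence identity quoted in this abstract is the percentage of positions at which the predicted residue matches the wild type. The sketch below shows that calculation only; the function name is hypothetical, and it assumes the predicted sequence has been reduced to a single residue per position (for example the top-scoring residue of a profile) and has the same length as the wild-type sequence.

```python
def sequence_identity(predicted, wild_type):
    """Percentage of positions where two equal-length sequences agree."""
    if len(predicted) != len(wild_type) or not predicted:
        raise ValueError("sequences must be non-empty and equal in length")
    matches = sum(p == w for p, w in zip(predicted, wild_type))
    return 100.0 * matches / len(predicted)
```

The reported gain of 6.7% is an absolute difference between two such percentages (30.3 minus 23.6), not a relative improvement.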
