Similar literature
20 similar articles found (search time: 511 ms)
1.
For a minimalist model of protein folding, which we introduced recently, we investigate various methods to obtain folding sequences. A detailed study of random sequences shows that, for this model, such sequences usually do not fold to their ground states during simulations. Straightforward techniques for the construction of folding sequences, based solely on the target structure, fail. We describe in detail an optimization algorithm, based on genetic algorithms, for the “simulated breeding” of folding sequences in this model. We find that, for any target structure studied, there is not only a single folding sequence but a patch of sequences in sequence space that fold to this structure. In addition, we show that, much as in real proteins, nonhomologous sequences may fold to the same target structure. © 1997 John Wiley & Sons, Inc.
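The "simulated breeding" idea can be illustrated with a minimal genetic-algorithm sketch. Everything in it is an assumption made for illustration: the two-letter alphabet, the `toy_fitness` function (a stand-in that counts matches to a target pattern, where the paper would instead test whether a sequence folds to the target in simulation), and all parameter values. It is not the authors' algorithm.

```python
import random

ALPHABET = "HP"  # assumed two-letter alphabet (hydrophobic/polar), for illustration only


def toy_fitness(sequence, target_pattern):
    """Hypothetical stand-in for a folding test: count positions matching a target pattern."""
    return sum(a == b for a, b in zip(sequence, target_pattern))


def breed(target_pattern, population_size=20, generations=50, mutation_rate=0.05, seed=0):
    """Evolve sequences toward higher toy fitness with crossover, mutation, and elitism."""
    rng = random.Random(seed)
    length = len(target_pattern)
    population = ["".join(rng.choice(ALPHABET) for _ in range(length))
                  for _ in range(population_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda s: toy_fitness(s, target_pattern), reverse=True)
        parents = ranked[: population_size // 2]
        children = [ranked[0]]  # elitism: the current best always survives
        while len(children) < population_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, length)  # one-point crossover
            child = mother[:cut] + father[cut:]
            # point mutations: each position is redrawn with probability mutation_rate
            child = "".join(rng.choice(ALPHABET) if rng.random() < mutation_rate else c
                            for c in child)
            children.append(child)
        population = children
    return max(population, key=lambda s: toy_fitness(s, target_pattern))
```

Calling `breed("HPHPPHHPHP")` returns one evolved sequence. In a real study the fitness evaluation would dominate the cost, since each candidate would have to be tested in a folding simulation.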

2.
Computational protein design is a reverse procedure of protein folding and structure prediction, where constructing structures from evolutionarily related proteins has been demonstrated to be the most reliable method for protein 3-dimensional structure prediction. Following this spirit, we developed a novel method to design new protein sequences based on evolutionarily related protein families. For a given target structure, a set of proteins having similar fold are identified from the PDB library by structural alignments. A structural profile is then constructed from the protein templates and used to guide the conformational search of amino acid sequence space, where physicochemical packing is accommodated by single-sequence based solvation, torsion angle, and secondary structure predictions. The method was tested on a computational folding experiment based on a large set of 87 protein structures covering different fold classes, which showed that the evolution-based design significantly enhances the foldability and biological functionality of the designed sequences compared to the traditional physics-based force field methods. Without using homologous proteins, the designed sequences can be folded with an average root-mean-square-deviation of 2.1 Å to the target. As a case study, the method is extended to redesign all 243 structurally resolved proteins in the pathogenic bacteria Mycobacterium tuberculosis, which is the second leading cause of death from infectious disease. On a smaller scale, five sequences were randomly selected from the design pool and subjected to experimental validation. The results showed that all the designed proteins are soluble with distinct secondary structure and three have well ordered tertiary structure, as demonstrated by circular dichroism and NMR spectroscopy. 
Together, these results demonstrate a new avenue in computational protein design that uses knowledge of evolutionary conservation from protein structural families to engineer new protein molecules of improved fold stability and biological functionality.
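The root-mean-square deviation quoted in this abstract (2.1 Å on average) measures how far a model lies from its target. A minimal sketch of the calculation follows, assuming the two structures are given as equally long lists of (x, y, z) coordinates that have already been optimally superimposed; the superposition step itself is omitted here.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) points, assumed already superimposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))
```

The result is in the same length unit as the input coordinates, so coordinates in Å give an RMSD in Å.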

3.
Yunqi Li  Yang Zhang 《Proteins》2009,76(3):665-676
Protein structure prediction approaches usually perform modeling simulations based on reduced representation of protein structures. For biological utilizations, it is an important step to construct full atomic models from the reduced structure decoys. Most of the current full atomic model reconstruction procedures have defects which either could not completely remove the steric clashes among backbone atoms or generate final atomic models with worse topology similarity relative to the native structures than the reduced models. In this work, we develop a new protocol, called REMO, to generate full atomic protein models by optimizing the hydrogen‐bonding network with basic fragments matched from a newly constructed backbone isomer library of solved protein structures. The algorithm is benchmarked on 230 nonhomologous proteins with reduced structure decoys generated by I‐TASSER simulations. The results show that REMO has a significant ability to remove steric clashes, and meanwhile retains good topology of the reduced model. The hydrogen‐bonding network of the final models is dramatically improved during the procedure. The REMO algorithm has been exploited in the recent CASP8 experiment which demonstrated significant improvements of the I‐TASSER models in both atomic‐level structural refinement and hydrogen‐bonding network construction. Proteins 2009. © 2009 Wiley‐Liss, Inc.

4.
Naturally occurring proteins comprise a special subset of all plausible sequences and structures selected through evolution. Simulating protein evolution with simplified and all-atom models has shed light on the evolutionary dynamics of protein populations, the nature of evolved sequences and structures, and the extent to which today's proteins are shaped by selection pressures on folding, structure and function. Extensive mapping of the native structure, stability and folding rate in sequence space using lattice proteins has revealed organizational principles of the sequence/structure map important for evolutionary dynamics. Evolutionary simulations with lattice proteins have highlighted the importance of fitness landscapes, evolutionary mechanisms, population dynamics and sequence space entropy in shaping the generic properties of proteins. Finally, evolutionary-like simulations with all-atom models, in particular computational protein design, have helped identify the dominant selection pressures on naturally occurring protein sequences and structures.

5.
Multistate computational protein design (MSD) with backbone ensembles approximating conformational flexibility can predict higher quality sequences than single‐state design with a single fixed backbone. However, it is currently unclear what characteristics of backbone ensembles are required for the accurate prediction of protein sequence stability. In this study, we aimed to improve the accuracy of protein stability predictions made with MSD by using a variety of backbone ensembles to recapitulate the experimentally measured stability of 85 Streptococcal protein G domain β1 sequences. Ensembles tested here include an NMR ensemble as well as those generated by molecular dynamics (MD) simulations, by Backrub motions, and by PertMin, a new method that we developed involving the perturbation of atomic coordinates followed by energy minimization. MSD with the PertMin ensembles resulted in the most accurate predictions by providing the highest number of stable sequences in the top 25, and by correctly binning sequences as stable or unstable with the highest success rate (≈90%) and the lowest number of false positives. The performance of PertMin ensembles is due to the fact that their members closely resemble the input crystal structure and have low potential energy. Conversely, the NMR ensemble as well as those generated by MD simulations at 500 or 1000 K reduced prediction accuracy due to their low structural similarity to the crystal structure. The ensembles tested herein thus represent on‐ or off‐target models of the native protein fold and could be used in future studies to design for desired properties other than stability. Proteins 2014; 82:771–784. © 2013 Wiley Periodicals, Inc.

6.
Protein folding and design are major biophysical problems, the solution of which would lead to important applications especially in medicine. Here we provide evidence of how a novel parametrization of the Caterpillar model may be used for both quantitative protein design and folding. With computer simulations it is shown that, for a large set of real protein structures, the model produces designed sequences with similar physical properties to the corresponding naturally occurring sequences. The designed sequences require further experimental testing. For an independent set of proteins, previously used as benchmark, the correct folded structure of both the designed and the natural sequences is also demonstrated. The equilibrium folding properties are characterized by free energy calculations. The resulting free energy profiles not only are consistent among natural and designed proteins, but also show a remarkable precision when the folded structures are compared to the experimentally determined ones. Ultimately, the updated Caterpillar model is unique in the combination of its three fundamental features: its simplicity, its ability to produce natural foldable designed sequences, and its structure prediction precision. It is also remarkable that low frustration sequences can be obtained with such a simple and universal design procedure, and that the folding of natural proteins shows funnelled free energy landscapes without the need of any potentials based on the native structure.

7.
Designing protein sequences that can fold into a given structure is a well‐known inverse protein‐folding problem. One important characteristic to attain for a protein design program is the ability to recover wild‐type sequences given their native backbone structures. The highest average sequence identity accuracy achieved by current protein‐design programs in this problem is around 30%, achieved by our previous system, SPIN. SPIN is a program that predicts sequences compatible with a provided structure using a neural network with fragment‐based local and energy‐based nonlocal profiles. Our new model, SPIN2, uses a deep neural network and additional structural features to improve on SPIN. SPIN2 achieves over 34% in sequence recovery in 10‐fold cross‐validation and independent tests, an improvement of 4 percentage points over the previous version. The sequence profiles generated from SPIN2 are expected to be useful for improving existing fold recognition and protein design techniques. SPIN2 is available at http://sparks-lab.org .
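Sequence recovery, the metric reported in this abstract, is the fraction of positions at which a designed sequence reproduces the wild-type residue. A minimal sketch follows, assuming the two sequences are already aligned position by position and have equal length; the function name is illustrative and is not part of SPIN or SPIN2.

```python
def sequence_recovery(designed, native):
    """Fraction of aligned positions at which the designed residue equals the native one."""
    if len(designed) != len(native) or not native:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)
```

A value of 0.34 from this function corresponds to the 34% recovery level quoted above.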

8.
Currently, one of the most serious problems in protein-folding simulations for de novo structure prediction is conformational sampling of medium-to-large proteins. In vivo, folding of these proteins is mediated by molecular chaperones. Inspired by the functions of chaperonins, we designed a simple chaperonin-like simulation protocol within the framework of the standard fragment assembly method: in our protocol, the strength of the hydrophobic interaction is periodically modulated to help the protein escape from misfolded structures. We tested this protocol for 38 proteins and found that, using a certain defined criterion of success, our method could successfully predict the native structures of 14 targets, whereas only those of 10 targets were successfully predicted using the standard protocol. In particular, for non-α-helical proteins, our method yielded significantly better predictions than the standard approach. This chaperonin-inspired protocol that enhanced de novo structure prediction using folding simulations may, in turn, provide new insights into the working principles underlying the chaperonin system.
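The periodic modulation described in this abstract can be pictured as a schedule that scales the hydrophobic term of the scoring function as the simulation proceeds. The sketch below is a guess at one possible form: the square-wave shape, the `low` and `high` weights, and the `period` parameter are all assumptions, since the abstract states only that the strength is modulated periodically.

```python
def hydrophobic_weight(step, period, low=0.5, high=1.0):
    """Square-wave schedule: full strength in the first half of each period, weakened in the second."""
    if period <= 0:
        raise ValueError("period must be positive")
    in_first_half = (step % period) < period / 2
    return high if in_first_half else low
```

In a fragment-assembly loop, the returned value would multiply the hydrophobic energy term at each step, so that misfolded compact states are periodically destabilized.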

9.
Systematic Monte Carlo simulations of simple lattice models show that the final stage of protein folding is an ordered process where native contacts get locked (i.e., the residues come into contact and remain in contact for the duration of the folding process) in a well‐defined order. The detailed study of the folding dynamics of protein‐like sequences designed so as to exhibit different contact energy distributions, as well as different degrees of sequence optimization (i.e., participation of non‐native interactions in the folding process), reveals significant differences in the corresponding locking scenarios (the collection of native contacts and their average locking times), which are largely ascribable to the dynamics of non‐native contacts. Furthermore, strong evidence for a positive role played by non‐native contacts at an early folding stage was also found. Interestingly, for topologically simple target structures, a positive interplay between native and non‐native contacts is observed also toward the end of the folding process, suggesting that non‐native contacts may indeed affect the overall folding process. For target models exhibiting clear two‐state kinetics, the relation between the nucleation mechanism of folding and the locking scenario is investigated. Our results suggest that the stabilization of the folding transition state can be achieved through the establishment of a very small network of native contacts that are the first to lock during the folding process.
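The definition of a locked contact given in this abstract (the residues come into contact and stay in contact until the end of the run) translates directly into a small routine. The sketch below assumes a trajectory stored as a list of sets of residue-index pairs, one set per frame; that data layout and the function name are illustrative choices, not taken from the paper.

```python
def locking_time(trajectory, contact):
    """Return the first frame index from which `contact` is present in every later frame, or None."""
    locked_at = None
    for frame_index, contacts in enumerate(trajectory):
        if contact in contacts:
            if locked_at is None:
                locked_at = frame_index
        else:
            locked_at = None  # contact broke, so any earlier formation was not a lock
    return locked_at
```

Averaging this value over many folding runs for each native contact would give the average locking times that make up a locking scenario.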

10.
There have been steady improvements in protein structure prediction during the past 2 decades. However, current methods are still far from consistently predicting structural models accurately with computing power accessible to common users. Toward achieving more accurate and efficient structure prediction, we developed a number of novel methods and integrated them into a software package, MUFOLD. First, a systematic protocol was developed to identify useful templates and fragments from Protein Data Bank for a given target protein. Then, an efficient process was applied for iterative coarse‐grain model generation and evaluation at the Cα or backbone level. In this process, we construct models using interresidue spatial restraints derived from alignments by multidimensional scaling, evaluate and select models through clustering and static scoring functions, and iteratively improve the selected models by integrating spatial restraints and previous models. Finally, the full‐atom models were evaluated using molecular dynamics simulations based on structural changes under simulated heating. We have continuously improved the performance of MUFOLD by using a benchmark of 200 proteins from the Astral database, where no template with >25% sequence identity to any target protein is included. The average root‐mean‐square deviation of the best models from the native structures is 4.28 Å, which shows significant and systematic improvement over our previous methods. The computing time of MUFOLD is much shorter than many other tools, such as Rosetta. MUFOLD demonstrated some success in the 2008 community‐wide experiment for protein structure prediction CASP8. Proteins 2010. © 2009 Wiley‐Liss, Inc.

11.
A significant number of protein sequences in a given proteome have no obvious evolutionarily related protein in the database of solved protein structures, the PDB. Under these conditions, ab initio or template-free modeling methods are the sole means of predicting protein structure. To assess its expected performance on proteomes, the TASSER structure prediction algorithm is benchmarked in the ab initio limit on a representative set of 1129 nonhomologous sequences ranging from 40 to 200 residues that cover the PDB at 30% sequence identity and which adopt alpha, alpha + beta, and beta secondary structures. For sequences in the 40-100 (100-200) residue range, as assessed by their root mean square deviation from native, RMSD, the best of the top five ranked models of TASSER has a global fold that is significantly close to the native structure for 25% (16%) of the sequences, and with a correct identification of the structure of the protein core for 59% (36%). In the absence of a native structure, the structural similarity among the top five ranked models is a moderately reliable predictor of folding accuracy. If we classify the sequences according to their secondary structure content, then 64% (36%) of alpha, 43% (24%) of alpha + beta, and 20% (12%) of beta sequences in the 40-100 (100-200) residue range have a significant TM-score (TM-score ≥ 0.4). TASSER performs best on helical proteins because there are fewer secondary structural elements to arrange in a helical protein than in a beta protein of equal length, since the average length of a helix is longer than that of a strand. In addition, helical proteins have shorter loops and dangling tails. If we exclude these flexible fragments, then TASSER has similar accuracy for sequences containing the same number of secondary structural elements, irrespective of whether they are helices and/or strands.
Thus, it is the effective configurational entropy of the protein that dictates the average likelihood of correctly arranging the secondary structure elements.
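The TM-score threshold used in this abstract refers to a length-normalized similarity measure. A minimal sketch of the commonly cited formula follows, evaluated for one fixed superposition; the full TM-score takes the maximum over superpositions, which is not attempted here. The input format (a list of distances in Å between aligned residue pairs) and the lower bound of 0.5 Å on the scale `d0` are assumptions of this sketch.

```python
def tm_score(distances, target_length):
    """TM-score for one superposition, from distances (in Å) between aligned residue pairs."""
    if target_length <= 15:
        raise ValueError("the d0 formula below is only defined for target_length > 15")
    # length-dependent distance scale; the 0.5 floor is an assumption of this sketch
    d0 = max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length
```

Unaligned residues contribute nothing to the sum but still count in `target_length`, so a perfect match over half the target scores 0.5.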

12.
Joshi S  Rana S  Wangikar P  Durani S 《Biopolymers》2006,83(2):122-134
Artificial proteins potentially barrier-free in the folding kinetics are approached computationally under the guidance of protein-folding theories. The smallest and fastest folding globular protein triple-helix-bundle (THB) is so modified as to minimize or eliminate its presumed barriers in folding speed. As the barriers may reside in the ordering of either secondary or tertiary structure, the elements of both secondary and tertiary structure in the protein are targeted for prenucleation with suitable stereochemically constrained amino acid residues. The required elements of topology and sequence for the THB are optimized independently; first the topology is optimized with simulated annealing in polypeptides of highly simplified alphabet; next, the sequence in side chains is optimized using the standard inverse design methods. The resultant three best-adapted THBs, variable in topology and distinctive in sequences, are assessed by comparing them with a few benchmark proteins. The results of mainly molecular dynamics (MD) comparisons, undertaken in explicit water at different temperatures, show that the designed sequences are favorably placed against the chosen benchmarks as THB proteins potentially thermostable in the native folds. Folding simulation experiments with MD establish that the designed sequences are rapid in the folding of individual helices, but not in the evolution of tertiary structure; energetic cum topological frustrations remain but could be the artifacts of the starting conformations that were chosen in the THBs in the folding simulations. Overall, a practical high-throughput approach for de novo protein design has been developed that may have fruitful application for any type of tertiary structure.

13.
Models of protein energetics that neglect interactions between amino acids that are not adjacent in the native state, such as the Gō model, encode or underlie many influential ideas on protein folding. Implicit in this simplification is a crucial assumption that has never been critically evaluated in a broad context: Detailed mechanisms of protein folding are not biased by nonnative contacts, typically argued to be a consequence of sequence design and/or topology. Here we present, using computer simulations of a well-studied lattice heteropolymer model, the first systematic test of this oft-assumed correspondence over the statistically significant range of hundreds of thousands of amino acid sequences that fold to the same native structure. Contrary to previous conjectures, we find a multiplicity of folding mechanisms, suggesting that Gō-like models cannot be justified by considerations of topology alone. Instead, we find that the crucial factor in discriminating among topological pathways is the heterogeneity of native contact energies: The order in which native contacts accumulate is profoundly insensitive to omission of nonnative interactions, provided that native contact heterogeneity is retained. This robustness holds over a surprisingly wide range of folding rates for our designed sequences. Mirroring predictions based on the principle of minimum frustration, fast-folding sequences match their Gō-like counterparts in both topological mechanism and transit times. Less optimized sequences dwell much longer in the unfolded state and/or off-pathway intermediates than do Gō-like models. For dynamics that bridge unfolded and folded states, however, even slow folders exhibit topological mechanisms and transit times nearly identical with those of their Gō-like counterparts.
Our results do not imply a direct correspondence between folding trajectories of Gō-like models and those of real proteins, but they do help to clarify key topological and energetic assumptions that are commonly used to justify such caricatures.
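The difference between a full contact model and its Gō-like counterpart can be made concrete with a small energy function. The sketch below assumes contacts are stored as residue-index pairs and that pair energies come from a lookup table; the function and argument names are hypothetical. Keeping a separate energy per native pair is what preserves the native contact heterogeneity that the abstract identifies as the crucial factor.

```python
def contact_energy(contacts, energies, native_contacts=None):
    """Sum pair energies over formed contacts; with native_contacts given, count only those (Gō-like)."""
    total = 0.0
    for pair in contacts:
        if native_contacts is not None and pair not in native_contacts:
            continue  # Gō-like model: nonnative contacts contribute nothing
        total += energies[pair]
    return total
```

Calling the function without `native_contacts` gives the full-model energy of a conformation, and passing the native contact set gives the energy of the same conformation under the Gō-like simplification.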

14.
Scott KA  Daggett V 《Biochemistry》2007,46(6):1545-1556
The problem of how a protein folds from a linear chain of amino acids to the three-dimensional structure necessary for function is often investigated using proteins with a low degree of sequence identity that adopt different folds. The design of pairs of proteins with a high degree of sequence identity but different folds offers the opportunity for a complementary study; in two highly similar sequences, which residues are the most important in directing folding to a particular structure? Here we use molecular dynamics simulations to characterize the folding-unfolding pathways of a pair of proteins designed by Bryan and co-workers [Alexander, P. A., et al. (2005) Biochemistry 44, 14045-14054; He, Y. N., et al. (2005) Biochemistry 44, 14055-14061]. Despite being 59% identical, the two protein sequences fold to two different structures. The first sequence folds to the alpha+beta protein G structure and the second to the all-alpha-helical protein A structure. We show that the final protein structure is determined early along the folding pathway. In folding to the protein G structure, the single alpha-helix (alpha1) and the beta3-beta4 turn fold early. Formation of the hairpin turn essentially prevents folding to helical structure in this region of the protein. This early structure is then consolidated by formation of long-range hydrophobic interactions between alpha1 and the beta3-beta4 turn. The protein A sequence differs both in the residues that form the beta3-beta4 turn and also in many of the residues that form the early hydrophobic interactions in the protein G structure. Instead, in the protein A sequence, a more hierarchical mechanism is observed, with helices folding before many of the tertiary interactions are formed. We find that small, but critical, sequence differences determine the topology of the protein early along the folding pathway, which help to explain the process by which one fold can evolve into another.

15.
De novo sequence design of foldable proteins provides a way of investigating principles of protein architecture. We performed fully automated sequence design for a target structure having a three-helix bundle topology and synthesized the designed sequences. Our design principle is different from the conventional approach, in that instead of optimizing interactions within the target structure, we design the global shape of the protein folding funnel. This includes automated implementation of negative design by explicitly requiring higher free energy of the denatured state. The designed sequences do not have significant similarity to those of any natural proteins. The NMR and CD spectroscopic data indicated that one designed sequence has a well-defined three-dimensional structure as well as alpha-helical content consistent with the target.

16.
Xu D  Zhang Y 《Proteins》2012,80(7):1715-1735
Ab initio protein folding is one of the major unsolved problems in computational biology owing to the difficulties in force field design and conformational search. We developed a novel program, QUARK, for template-free protein structure prediction. Query sequences are first broken into fragments of 1-20 residues where multiple fragment structures are retrieved at each position from unrelated experimental structures. Full-length structure models are then assembled from fragments using replica-exchange Monte Carlo simulations, which are guided by a composite knowledge-based force field. A number of novel energy terms and Monte Carlo movements are introduced and the particular contributions to enhancing the efficiency of both force field and search engine are analyzed in detail. QUARK prediction procedure is depicted and tested on the structure modeling of 145 nonhomologous proteins. Although no global templates are used and all fragments from experimental structures with template modeling score >0.5 are excluded, QUARK can successfully construct 3D models of correct folds in one-third cases of short proteins up to 100 residues. In the ninth community-wide Critical Assessment of protein Structure Prediction experiment, QUARK server outperformed the second and third best servers by 18 and 47% based on the cumulative Z-score of global distance test-total scores in the FM category. Although ab initio protein folding remains a significant challenge, these data demonstrate new progress toward the solution of the most important problem in the field.
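Replica-exchange Monte Carlo, the search method named in this abstract, runs copies of the system at different temperatures and periodically attempts to swap configurations between them. The sketch below shows the standard Metropolis-style swap criterion in terms of inverse temperatures; it is a generic textbook form and is not taken from QUARK, whose actual move set and acceptance rules are not described here.

```python
import math


def swap_probability(beta_i, beta_j, energy_i, energy_j):
    """Metropolis acceptance probability for exchanging configurations between two replicas."""
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)
```

A swap is always accepted when the colder replica (larger beta) currently holds the higher-energy configuration, which is how low-energy structures migrate toward low temperatures.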

17.
The protein folding problem is often studied by comparing the mechanisms of proteins sharing the same structure but different sequence. The recent design of the two proteins GA88 and GB88, displaying different structures and functions while sharing 88% sequence identity (49 out of 56 amino acids), allows the unique opportunity for a complementary approach. At which stage of its folding pathway does a protein commit to a given topology? Which residues are crucial in directing folding mechanisms to a given structure? By using a combination of biophysical and computational techniques, we have characterized the folding of both GA88 and GB88. We show that, contrary to expectation, GB88, characterized by a native α+β fold, displays in the denatured state a content of native-like helical structure greater than GA88, which is all-α in its native state. Both experiments and simulations indicate that such residual structure may be tuned by changing pH. Thus, despite the high sequence identity, the folding pathways for these two proteins appear to diverge as early as in the denatured state. Our results suggest a mechanism whereby protein topology is committed very early along the folding pathway, being imprinted in the residual structure of the denatured state.
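The 88% figure in this abstract follows from the residue counts it gives, as the short calculation below shows. It assumes only that identity is the number of identical positions divided by the sequence length.

```latex
\text{identity} = \frac{49}{56} = 0.875 \approx 88\%,
\qquad
\text{differing positions} = 56 - 49 = 7
```

The two sequences therefore differ at only seven positions, yet adopt different folds.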

18.
Three-dimensional RNA structure prediction and folding is of significant interest in the biological research community. Here, we present iFoldRNA, a novel web-based methodology for RNA structure prediction with near atomic resolution accuracy and analysis of RNA folding thermodynamics. iFoldRNA rapidly explores RNA conformations using discrete molecular dynamics simulations of input RNA sequences. Starting from simplified linear-chain conformations, RNA molecules (<50 nt) fold to native-like structures within half an hour of simulation, facilitating rapid RNA structure prediction. All-atom reconstruction of energetically stable conformations generates iFoldRNA predicted RNA structures. The predicted RNA structures are within 2-5 Å root mean square deviations (RMSDs) from corresponding experimentally derived structures. RNA folding parameters including specific heat, contact maps, simulation trajectories, gyration radii, RMSDs from native state, and fraction of native-like contacts are accessible from iFoldRNA. We expect iFoldRNA will serve as a useful resource for RNA structure prediction and folding thermodynamic analyses. AVAILABILITY: http://iFoldRNA.dokhlab.org.

19.
20.
Protein folding research during the past decade has emphasized the dominant role of native state topology in determining the speed and mechanism of folding for small proteins; this has been illustrated by simulations using minimalist protein models. The advantages of minimalist protein models lie in their ability to rapidly collect meaningful statistics about folding pathways and kinetics, their ease of characterization with coarse-grained order parameters and their concentration on the essential physics of the problem to connect with experimental observables for a target protein. The maturation of experimental protein folding has driven the need for more quantitative protein simulations to better understand the balance between sequence details and fold topology. In the past year, we have seen the emergence of more complex minimalist models, ranging from all-atom Gō potentials to coarse-grained bead models in which Gō interactions are replaced or supplemented by more physically motivated potentials. The reduced computational cost at the coarse-grained level of abstraction will potentially enable both folding studies on a genomic scale and systematic application in protein design.
