Similar Articles
20 similar articles found
1.
Protein design aims at designing new protein molecules of desired structure and functionality. One of the major obstacles to large-scale protein design is the extensive time and manpower required for experimental validation of designed sequences. Recent advances in protein structure prediction have made it possible to assess designed sequences automatically via folding simulations. We present a new protocol for protein design and validation. The sequence space is initially searched by Monte Carlo sampling guided by a public atomic potential, with candidate sequences selected by clustering the sequence decoys. The designed sequences are then assessed by I-TASSER folding simulations, which generate full-length atomic structural models by the iterative assembly of threading fragments. The protocol is tested on 52 nonhomologous single-domain proteins, with an average sequence identity of 24% between the designed sequences and the native sequences. Despite this low sequence identity, three-dimensional models predicted for the first designed sequence have an RMSD of < 2 Å to the target structure in 62% of cases. This percentage increases to 77% if we consider the three-dimensional models from the top 10 designed sequences. Such striking consistency between the target structure and the structural predictions from nonhomologous sequences, despite the fact that the design and folding algorithms adopt completely different force fields, indicates that the design algorithm captures the features essential to the global fold of the target. On average, the designed sequences have a free energy 0.39 kcal/(mol·residue) lower than that of the native sequences, potentially affording greater stability to the synthesized target folds.
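
The sequence-space search described above is Monte Carlo sampling guided by an energy function. The sketch below illustrates only the Metropolis acceptance logic, assuming a hypothetical toy energy (a reward for hydrophobic residues at buried positions and polar residues at exposed ones) in place of the atomic potential used in the article; the decoy clustering and I-TASSER validation steps are not shown.

```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AVILMFWC")  # hypothetical grouping used by the toy energy


def toy_energy(seq, buried):
    """Hypothetical stand-in for an atomic potential: -1 for every residue
    whose hydrophobic/polar character matches its buried/exposed position."""
    return -sum(1 for aa, b in zip(seq, buried) if (aa in HYDROPHOBIC) == b)


def metropolis_design(buried, steps=5000, temperature=0.3, seed=0):
    """Metropolis Monte Carlo over sequences with single-point mutation moves."""
    rng = random.Random(seed)
    seq = [rng.choice(ALPHABET) for _ in buried]
    energy = toy_energy(seq, buried)
    best_seq, best_energy = list(seq), energy
    for _ in range(steps):
        pos = rng.randrange(len(seq))
        old = seq[pos]
        seq[pos] = rng.choice(ALPHABET)
        new_energy = toy_energy(seq, buried)
        delta = new_energy - energy
        # Accept downhill moves always, uphill moves with probability exp(-dE/T).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            energy = new_energy
            if energy < best_energy:
                best_seq, best_energy = list(seq), energy
        else:
            seq[pos] = old  # rejected: restore the previous residue
    return "".join(best_seq), best_energy
```

At a low temperature the sampler settles quickly into sequences that satisfy the burial pattern; raising `temperature` accepts more uphill moves and explores more of sequence space.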

2.
The protein folding problem: when will it be solved? (Total citations: 5; self-citations: 0; citations by others: 5)
The protein folding problem can be viewed as three different problems: defining the thermodynamic folding code; devising a good computational structure prediction algorithm; and answering Levinthal's question regarding the kinetic mechanism by which proteins fold so quickly. Once regarded as a grand challenge, protein folding has seen much progress in recent years. Folding codes are now being used to successfully design proteins and non-biological foldable polymers, and, aided by the Critical Assessment of Techniques for Structure Prediction (CASP) competition, protein structure prediction has become quite good. Even the once-challenging Levinthal puzzle now seems to have an answer: a protein can avoid searching irrelevant conformations and fold quickly by making local independent decisions first, followed by non-local global decisions later.
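
Levinthal's question rests on simple arithmetic. As an illustration, assuming textbook-style numbers that are not taken from this article (3 backbone conformations per residue, a 100-residue chain, and 10^13 conformations sampled per second), an exhaustive search would take about 1.6 × 10^27 years, compared with roughly 1.4 × 10^10 years for the age of the universe:

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def levinthal_search_years(n_residues, states_per_residue=3, rate_per_s=1e13):
    """Years needed to visit every backbone conformation once (illustrative)."""
    n_conformations = states_per_residue ** n_residues
    return n_conformations / rate_per_s / SECONDS_PER_YEAR


years = levinthal_search_years(100)
print(f"exhaustive search: {years:.1e} years")
```

The exponential growth in `n_conformations` is what makes a local-first, non-exhaustive search necessary.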

3.
Internal symmetry is observed in the majority of fundamental protein folds. Meanwhile, there is substantial evidence that nascent polypeptide chains have the potential to begin folding co-translationally, and this process allows mRNA to carry additional information about protein structure. In this paper, we study the relationship between gene sequences and protein structures from the viewpoint of symmetry, to explore how gene sequences encode structural symmetry in proteins. We found that, for a set of two-fold symmetric proteins from the left-handed beta-helix fold, intragenic symmetry always exists in their corresponding gene sequences. Meanwhile, codon usage bias and local mRNA structure might be involved in modulating translation speed for the formation of structural symmetry: a major decrease in local codon usage bias in the middle of the codon sequence can be identified as a common feature, and major or consecutive decreases in local mRNA folding energy near the boundaries of the symmetric substructures can also be observed. The results suggest that gene duplication and fusion may be an evolutionarily conserved process for this protein fold. In addition, the usage of rare codons and the formation of higher-order secondary structure near the boundaries of symmetric substructures might have coevolved as conserved mechanisms to slow down translation elongation and to facilitate effective folding of symmetric substructures. These findings provide valuable insights into the mechanisms of translation and its evolution, as well as into the design of proteins via symmetric modules.
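
A local codon-usage analysis of this kind can be pictured as a sliding window along the coding sequence. The sketch below reports the fraction of rare codons in each window, assuming a hypothetical rare-codon set (AGG, AGA, ATA, CTA, which are rare in E. coli and are used here purely for illustration) rather than whichever codon-usage index the authors used. A local peak in rare-codon fraction corresponds to a local dip in an index such as the codon adaptation index.

```python
RARE_CODONS = {"AGG", "AGA", "ATA", "CTA"}  # illustrative assumption only


def rare_codon_profile(cds, window=10):
    """Fraction of rare codons in each sliding window of `window` codons.

    Any trailing partial codon at the end of `cds` is ignored.
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    profile = []
    for start in range(len(codons) - window + 1):
        chunk = codons[start:start + window]
        profile.append(sum(c in RARE_CODONS for c in chunk) / window)
    return profile
```

Locating peaks in the profile, and comparing their positions with the boundaries of symmetric substructures, is the kind of comparison the abstract describes.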

4.

Background  

Ever since the ground-breaking work of Anfinsen et al., in which a denatured protein was found to refold to its native state, the protein fold prediction community has frequently stated that all the information required for protein folding lies in the amino acid sequence. Recent in vitro experiments and in silico computational studies, however, have shown that cotranslation may affect the folding pathway of some proteins, especially those of ancient folds. In this paper, aspects of cotranslational folding have been incorporated into a protein structure prediction algorithm by adapting the Rosetta program to fold proteins as the nascent chain elongates. This makes it possible to conduct a pairwise comparison of folding accuracy by comparing folds created sequentially from each end of the protein.
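
The pairwise comparison relies on folding growing fragments from each terminus. The sketch below generates only that fragment schedule; the step size and minimum length are assumptions, and each fragment would be handed to a hypothetical `fold_fragment()` structure-prediction call (the article adapts Rosetta, whose interface is not reproduced here).

```python
def growth_schedule(sequence, step=10, min_len=20):
    """Yield (direction, fragment) pairs that mimic chain elongation.

    "N->C" fragments grow from the N-terminus, as a nascent chain emerges
    from the ribosome; "C->N" fragments grow from the C-terminus and serve
    as the reverse-direction comparison.
    """
    n = len(sequence)
    lengths = list(range(min_len, n, step)) + [n]
    for length in lengths:
        yield "N->C", sequence[:length]
    for length in lengths:
        yield "C->N", sequence[n - length:]


for direction, fragment in growth_schedule("MKT" * 15):
    # Each fragment would be passed to the hypothetical fold_fragment() here.
    print(direction, len(fragment))
```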

5.
Protein functional annotation relies on the identification of accurate relationships, with sequence divergence being a key factor. This is especially evident when distant protein relationships can be demonstrated only with three-dimensional structures. To address this challenge, we describe a computational approach to purposefully bridge gaps between related protein families through the directed design of protein-like “linker” sequences. For this, we represented SCOP domain families, integrated with sequence homologues, as multiple profiles and performed HMM-HMM alignments between related domain families. Where convincing alignments were achieved, we applied a roulette wheel-based method to design 3,611,010 protein-like sequences corresponding to 374 SCOP folds. To analyze their ability to link proteins in homology searches, we used 3024 queries to search two databases, one containing only natural sequences and another additionally containing designed sequences. Our results showed that searches of the augmented database improved fold coverage by up to 30% for over 74% of the folds, with 52 folds achieving all theoretically possible connections. Although sequences could not be designed between some families, the availability of designed sequences between other families within the fold established the sequence continuum needed to demonstrate 373 difficult relationships. Ultimately, as a practical and realistic extension, we demonstrate that such protein-like sequences can be “plugged into” routine and generic sequence database searches to empower not only remote homology detection but also fold recognition. Our statistically well-supported findings show that complementary searches in both databases will increase the effectiveness of sequence-based searches in recognizing all homologues sharing a common fold.
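
Roulette-wheel selection picks each item with probability proportional to its weight. The sketch below samples designed sequences position by position from a column-wise residue-frequency profile. This is a simplification under stated assumptions: the article derives its profiles from HMM-HMM alignments between families, and the toy profile used here is hypothetical.

```python
import bisect
import itertools
import random


def roulette_pick(weights, rng):
    """Return an index chosen with probability proportional to its weight."""
    cumulative = list(itertools.accumulate(weights))
    spin = rng.random() * cumulative[-1]
    # bisect_right skips zero-weight entries; min() guards against float round-off.
    return min(bisect.bisect_right(cumulative, spin), len(weights) - 1)


def design_from_profile(profile, n_sequences=5, seed=0):
    """Sample sequences column by column from a residue-frequency profile.

    profile: list of dicts mapping residue -> frequency, one per column.
    """
    rng = random.Random(seed)
    designs = []
    for _ in range(n_sequences):
        residues = []
        for column in profile:
            letters = list(column)
            weights = [column[aa] for aa in letters]
            residues.append(letters[roulette_pick(weights, rng)])
        designs.append("".join(residues))
    return designs
```

For example, `design_from_profile([{"A": 1.0}, {"L": 0.5, "V": 0.5}])` returns sequences that always start with A and carry L or V at the second position in roughly equal proportion.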

6.
Protein folding and design are major biophysical problems whose solution would lead to important applications, especially in medicine. Here we provide evidence of how a novel parametrization of the Caterpillar model may be used for both quantitative protein design and folding. Computer simulations show that, for a large set of real protein structures, the model produces designed sequences with physical properties similar to those of the corresponding naturally occurring sequences. The designed sequences require further experimental testing. For an independent set of proteins, previously used as a benchmark, the correct folded structure of both the designed and the natural sequences is also demonstrated. The equilibrium folding properties are characterized by free energy calculations. The resulting free energy profiles are not only consistent between natural and designed proteins but also show remarkable precision when the folded structures are compared to the experimentally determined ones. Ultimately, the updated Caterpillar model is unique in combining three fundamental features: its simplicity, its ability to produce naturally foldable designed sequences, and its structure prediction precision. It is also remarkable that low-frustration sequences can be obtained with such a simple and universal design procedure, and that the folding of natural proteins shows funnelled free energy landscapes without the need for any potentials based on the native structure.
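
Free energy profiles of this kind are commonly obtained from equilibrium sampling by Boltzmann inversion, F(x) = -k_B T ln P(x), where P(x) is the probability of observing a reaction coordinate x such as the fraction of native contacts. The sketch below assumes a histogram estimate of P and a hypothetical list of sampled coordinate values; it is a generic illustration, not the Caterpillar model's own procedure.

```python
import math
from collections import Counter


def free_energy_profile(samples, n_bins=10, lo=0.0, hi=1.0, kT=1.0):
    """Return {bin_index: F} with F = -kT ln P, shifted so the minimum is 0.

    Assumes every sample lies in [lo, hi]; empty bins (F = infinity) are omitted.
    """
    width = (hi - lo) / n_bins
    counts = Counter(min(int((x - lo) / width), n_bins - 1) for x in samples)
    total = len(samples)
    free = {b: -kT * math.log(c / total) for b, c in counts.items()}
    f_min = min(free.values())
    return {b: f - f_min for b, f in sorted(free.items())}
```

A funnelled landscape shows up as a profile that decreases toward the native-like end of the coordinate.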

7.
Diverse proteins with similar structures are grouped into families of homologs and analogs if their sequence similarity is higher or lower, respectively, than 20%–30%. It has been suggested that protein homologs and analogs originate from a common ancestor and diverge on distinct evolutionary time scales, emerging as a consequence of the physical properties of the protein sequence space. Although a number of studies have determined key signatures of protein family organization, the sequence-structure factors that differentiate the two evolutionarily related protein families remain unknown. Here, we postulate that subtle structural changes, which appear due to accumulating mutations in the homologous families, lead to distinct packing of the protein core and, thus, to novel compositions of core residues. The latter process leads to the formation of distinct families of homologs. We propose that such differentiation results in the formation of analogous families. To test our postulate, we developed a molecular modeling and design toolkit, Medusa, to computationally design protein sequences that correspond to the same fold family. We find that analogous proteins emerge when a backbone structure deviates by only 1–2 Å root-mean-square deviation (RMSD) from the original structure. For close homologs, core residues are highly conserved. However, when the overall sequence similarity drops to ~25%–30%, the composition of core residues starts to diverge, thereby forming novel families of protein homologs. This direct observation of the formation of protein homologs within a specific fold family supports our hypothesis. The conservation of amino acids in designed sequences recapitulates that of naturally occurring sequences, thereby validating our computational design methodology.

8.
The aim of de novo protein design is to find the amino acid sequences that will fold into a desired 3-dimensional structure with improvements in specific properties, such as binding affinity, agonist or antagonist behavior, or stability, relative to the native sequence. Protein design lies at the center of current advances in drug design and discovery. Not only does protein design provide predictions for potentially useful drug targets, but it also enhances our understanding of the protein folding process and protein-protein interactions. Experimental methods such as directed evolution have shown success in protein design. However, such methods are restricted by the limited sequence space that can be searched tractably. In contrast, computational design strategies allow for the screening of a much larger set of sequences covering a wide variety of properties and functionality. We have developed a range of computational de novo protein design methods capable of tackling several important areas of protein design. These include the design of monomeric proteins for increased stability and of complexes for increased binding affinity. To disseminate these methods for broader use, we present Protein WISDOM (http://www.proteinwisdom.org), a tool that provides automated methods for a variety of protein design problems. Structural templates are submitted to initialize the design process. The first stage of design is an optimization-based sequence selection stage that aims to improve stability through minimization of potential energy in the sequence space. Selected sequences are then run through a fold specificity stage and a binding affinity stage. A rank-ordered list of the sequences for each step of the process, along with relevant designed structures, provides the user with a comprehensive quantitative assessment of the design. Here we provide the details of each design method, as well as several notable experimental successes attained through the use of the methods.

9.
Typically, protein spatial structures are more conserved in evolution than amino acid sequences. However, the recent explosion of sequence and structure information, accompanied by the development of powerful computational methods, has led to the accumulation of examples of homologous proteins with globally distinct structures. Significant sequence conservation, local structural resemblance, and functional similarity strongly indicate evolutionary relationships between these proteins despite pronounced structural differences at the fold level. Several mechanisms, such as insertions/deletions/substitutions, circular permutations, and rearrangements in beta-sheet topologies, account for the majority of detected structural irregularities. The existence of evolutionarily related proteins that possess different folds brings new challenges to homology modeling techniques and structure classification strategies, and offers new opportunities for protein design in experimental studies.

10.

Background  

The reliable prediction of protein tertiary structure from the amino acid sequence remains challenging even for small proteins. We have developed an all-atom free-energy protein forcefield (PFF01) with which we could fold several small proteins from completely extended conformations. Because the computational cost of de novo folding studies rises steeply with system size, this approach is unsuitable for structure prediction purposes. We therefore investigate here a low-cost free-energy relaxation protocol for protein structure prediction that combines heuristic methods for model generation with all-atom free-energy relaxation in PFF01.

11.
Functional RNA structures tend to be conserved during evolution. This finding is exploited, for example, by comparative methods for RNA secondary structure prediction, which currently represent the state of the art in prediction accuracy. Here we provide strong evidence that homologous RNA genes not only fold into similar final RNA structures, but that their folding pathways also share common transient structural features that have been evolutionarily conserved. To this end, we compile and investigate a non-redundant data set of 32 sequences with known transient and final RNA secondary structures and devise a dedicated computational analysis pipeline.

12.
Computational protein design has progressed rapidly in recent years. A number of design methods have been proposed and tested. In this paper, we report the successful application of a fragment-based method for protein design. The method uses statistical information on tetrapeptide backbone conformations. The previously published artificial fold of Top7 (Kuhlman et al., Science, 2003; 302:1364-1368) was chosen as the template. A series of polypeptide sequences were created that were predicted to fold into this target structure. Two of the designed proteins, M5 and M7, were expressed and characterized by fluorescence spectroscopy, circular dichroism and NMR. They showed the hallmarks of well-ordered tertiary structure as well as cooperative folding/unfolding transitions. Furthermore, the two novel proteins were found to be highly stable against temperature- and denaturant-induced unfolding.

13.
Klepeis JL, Wei Y, Hecht MH, Floudas CA. Proteins. 2005;58(3):560-570.
Ab initio structure prediction and de novo protein design are two problems at the forefront of research in the fields of structural biology and chemistry. The goal of ab initio structure prediction of proteins is to correctly characterize the 3D structure of a protein using only the amino acid sequence as input. De novo protein design involves the production of novel protein sequences that adopt a desired fold. In this work, the results of a double-blind study are presented in which a new ab initio method was successfully used to predict the 3D structure of a protein designed through an experimental approach using binary patterned combinatorial libraries of de novo sequences. The predicted structure, which was produced before the experimental structure was known and without consideration of the design goals, and the final NMR analysis both characterize this protein as a 4-helix bundle. The similarity of these structures is evidenced by both small RMSD values between the coordinates of the two structures and a detailed analysis of the helical packing.
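
RMSD comparisons between predicted and experimental coordinates are normally made after optimal rigid-body superposition. The sketch below uses the Kabsch algorithm, assuming two equal-length N × 3 arrays of matched atoms (for example C-alpha atoms); NumPy is the only dependency.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between matched N x 3 coordinate sets after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    P = P - P.mean(axis=0)  # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q  # 3 x 3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

A structure compared with a rotated and translated copy of itself gives an RMSD of essentially zero, which is a quick sanity check before comparing real models.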

14.
Protein sequences have evolved to fold into functional structures, resulting in families of diverse protein sequences that all share the same overall fold. One can harness protein family sequence data to infer likely contacts between pairs of residues. In the current study, we combine this kind of inference from coevolutionary information with a coarse‐grained protein force field ordinarily used with single sequence input, the Associative memory, Water mediated, Structure and Energy Model (AWSEM), to achieve improved structure prediction. The resulting Associative memory, Water mediated, Structure and Energy Model with Evolutionary Restraints (AWSEM‐ER) yields a significant improvement in the quality of protein structure prediction over the single sequence prediction from AWSEM when a sufficiently large number of homologous sequences are available. Free energy landscape analysis shows that the addition of the evolutionary term shifts the free energy minimum to more native‐like structures, which explains the improvement in the quality of structures when performing predictions using simulated annealing. Simulations using AWSEM without coevolutionary information have proved useful in elucidating not only protein folding behavior, but also mechanisms of protein function. The success of AWSEM‐ER in de novo structure prediction suggests that the enhanced model opens the door to functional studies of proteins even when no experimentally solved structures are available.
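
Inferring contacts from coevolution starts from a multiple sequence alignment and scores pairs of columns that vary together. The sketch below uses plain mutual information between columns, a simpler and noisier stand-in for the direct-coupling-style methods usually used to derive evolutionary restraints; the toy alignment in the usage example is hypothetical.

```python
import math
from collections import Counter


def column_mi(msa, i, j):
    """Mutual information (nats) between alignment columns i and j."""
    n = len(msa)
    fi = Counter(s[i] for s in msa)
    fj = Counter(s[j] for s in msa)
    fij = Counter((s[i], s[j]) for s in msa)
    return sum(
        (c / n) * math.log((c / n) / ((fi[a] / n) * (fj[b] / n)))
        for (a, b), c in fij.items()
    )


def top_pairs(msa, k=3, min_sep=1):
    """Rank column pairs (i, j) with j - i >= min_sep by mutual information."""
    length = len(msa[0])
    scores = [
        (column_mi(msa, i, j), i, j)
        for i in range(length)
        for j in range(i + min_sep, length)
    ]
    return sorted(scores, reverse=True)[:k]
```

In the hypothetical alignment `["ACGK", "DCGE", "ACTK", "DATE", "AAGK", "DCTE"]`, columns 0 and 3 covary perfectly (A pairs with K, D with E), so that pair ranks first with a score of ln 2.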

15.
For a minimalist model of protein folding, which we introduced recently, we investigate various methods to obtain folding sequences. A detailed study of random sequences shows that, for this model, such sequences usually do not fold to their ground states during simulations. Straightforward techniques for the construction of folding sequences, based solely on the target structure, fail. We describe in detail an optimization algorithm, based on genetic algorithms, for the “simulated breeding” of folding sequences in this model. We find that, for any target structure studied, there is not only a single folding sequence but a patch of sequences in sequence space that fold to this structure. In addition, we show that, much as in real proteins, nonhomologous sequences may fold to the same target structure. © 1997 John Wiley & Sons, Inc.
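
"Simulated breeding" is a genetic algorithm over sequences: a population is scored, the fittest sequences are kept, and offspring are produced by crossover and mutation. The sketch below assumes a reduced H/P alphabet and a hypothetical fitness (the number of positions matching a target hydrophobic/polar pattern) in place of the folding simulations of the minimalist model.

```python
import random


def fitness(seq, target):
    """Hypothetical fitness: number of positions matching the target H/P pattern."""
    return sum(a == b for a, b in zip(seq, target))


def breed(target, pop_size=30, generations=60, mutation_rate=0.02, seed=0):
    """Genetic algorithm with truncation selection, one-point crossover, mutation."""
    rng = random.Random(seed)
    n = len(target)
    population = ["".join(rng.choice("HP") for _ in range(n)) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda s: fitness(s, target), reverse=True)
        parents = population[: pop_size // 2]  # keep the fitter half unchanged
        children = []
        while len(children) < pop_size - len(parents):
            mom, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover position
            child = [
                rng.choice("HP") if rng.random() < mutation_rate else c
                for c in mom[:cut] + dad[cut:]
            ]
            children.append("".join(child))
        population = parents + children
    return max(population, key=lambda s: fitness(s, target))
```

Because the parents survive each generation unchanged, the best fitness in the population never decreases. A real application would replace `fitness` with a measure of how well each sequence folds to the target.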

16.
MOTIVATION: A method for recognizing the three-dimensional fold from the protein amino acid sequence based on a combination of hidden Markov models (HMMs) and secondary structure prediction was recently developed for proteins in the Mainly-Alpha structural class. Here, this methodology is extended to Mainly-Beta and Alpha-Beta class proteins. Compared to other fold recognition methods based on HMMs, this approach is novel in that only secondary structure information is used. Each HMM is trained from known secondary structure sequences of proteins having a similar fold. Secondary structure prediction is performed for the amino acid sequence of a query protein. The predicted fold of a query protein is the fold described by the model fitting the predicted sequence the best. RESULTS: After model cross-validation, the success rate on 44 test proteins covering the three structural classes was found to be 59%. On seven fold predictions performed prior to the publication of experimental structure, the success rate was 71%. In conclusion, this approach manages to capture important information about the fold of a protein embedded in the length and arrangement of the predicted helices, strands and coils along the polypeptide chain. When a more extensive library of HMMs representing the universe of known structural families is available (work in progress), the program will allow rapid screening of genomic databases and sequence annotation when fold similarity is not detectable from the amino acid sequence. AVAILABILITY: FORESST web server at http://absalpha.dcrt.nih.gov:8008/ for the library of HMMs of structural families used in this paper. FORESST web server at http://www.tigr.org/ for a more extensive library of HMMs (work in progress). CONTACT: valedf@tigr.org; munson@helix.nih.gov; garnier@helix.nih.gov
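
The assignment rule above, choosing the fold whose model best fits the predicted secondary-structure string, can be sketched with the forward algorithm. This assumes each fold is described by a small discrete HMM emitting the symbols H, E and C; the two two-state models below are hypothetical toys, not FORESST's trained models.

```python
import math


def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_forward(obs, start, trans, emit):
    """log P(obs | HMM) via the forward algorithm in log space."""
    states = list(start)
    alpha = {s: math.log(start[s] * emit[s][obs[0]]) for s in states}
    for symbol in obs[1:]:
        alpha = {
            s: math.log(emit[s][symbol])
            + logsumexp([alpha[r] + math.log(trans[r][s]) for r in states])
            for s in states
        }
    return logsumexp(list(alpha.values()))


# Hypothetical toy models: (start, transition, emission) over H/E/C symbols.
LOOP = {"H": 0.1, "E": 0.1, "C": 0.8}
MODELS = {
    "mainly-alpha": (
        {"helix": 0.6, "loop": 0.4},
        {"helix": {"helix": 0.9, "loop": 0.1}, "loop": {"helix": 0.3, "loop": 0.7}},
        {"helix": {"H": 0.9, "E": 0.02, "C": 0.08}, "loop": LOOP},
    ),
    "mainly-beta": (
        {"strand": 0.6, "loop": 0.4},
        {"strand": {"strand": 0.8, "loop": 0.2}, "loop": {"strand": 0.4, "loop": 0.6}},
        {"strand": {"H": 0.02, "E": 0.9, "C": 0.08}, "loop": LOOP},
    ),
}


def recognise_fold(predicted_ss):
    """Return the name of the model with the highest likelihood."""
    return max(MODELS, key=lambda name: log_forward(predicted_ss, *MODELS[name]))
```

Working in log space avoids numerical underflow on long chains; all probabilities in the toy models are non-zero so that every logarithm is defined.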

17.
Scott KA, Daggett V. Biochemistry. 2007;46(6):1545-1556.
The problem of how a protein folds from a linear chain of amino acids into the three-dimensional structure necessary for function is often investigated using proteins with a low degree of sequence identity that adopt different folds. The design of pairs of proteins with a high degree of sequence identity but different folds offers the opportunity for a complementary study: in two highly similar sequences, which residues are the most important in directing folding to a particular structure? Here we use molecular dynamics simulations to characterize the folding-unfolding pathways of a pair of proteins designed by Bryan and co-workers [Alexander, P. A., et al. (2005) Biochemistry 44, 14045-14054; He, Y. N., et al. (2005) Biochemistry 44, 14055-14061]. Despite being 59% identical, the two protein sequences fold into two different structures. The first sequence folds into the alpha+beta protein G structure and the second into the all-alpha-helical protein A structure. We show that the final protein structure is determined early along the folding pathway. In folding to the protein G structure, the single alpha-helix (alpha1) and the beta3-beta4 turn fold early. Formation of the hairpin turn essentially prevents folding to helical structure in this region of the protein. This early structure is then consolidated by the formation of long-range hydrophobic interactions between alpha1 and the beta3-beta4 turn. The protein A sequence differs both in the residues that form the beta3-beta4 turn and in many of the residues that form the early hydrophobic interactions in the protein G structure. Instead, in the protein A sequence, a more hierarchical mechanism is observed, with helices folding before many of the tertiary interactions are formed. We find that small but critical sequence differences determine the topology of the protein early along the folding pathway, which helps to explain the process by which one fold can evolve into another.
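
Percent identity figures such as the 59% quoted here are computed over an alignment. The sketch below assumes two already-aligned sequences of equal length, with "-" marking gaps, and normalises by the number of aligned non-gap positions; this is one of several conventions in use, and others divide by the full alignment length or by the shorter sequence.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over positions where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)
```

Because the denominator convention changes the number, identity values from different sources are only comparable when the convention is known.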

18.
One of the major goals of molecular biology is to understand how protein chains fold into a unique 3-dimensional structure. Given this knowledge, perhaps the most exciting prospect will be the possibility of designing new proteins to perform designated tasks, an application that could prove to be of great importance in medicine and biotechnology. It is possible that effective protein design may be achieved without the requirement for a full understanding of the protein folding process. In this paper a simple method is described for designing an amino acid sequence to fit a given 3-dimensional structure. The compatibility of a designed sequence with a given fold is assessed by means of a set of statistically determined potentials (including interresidue pairwise and solvation terms), which have been previously applied to the problem of protein fold recognition. In order to generate sequences that best fit the fold, a genetic algorithm is used, whereby the sequence is optimized by a stochastic search in the style of natural selection.
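
The compatibility score described above combines pairwise and solvation terms. The sketch below assumes a contact list and per-residue burial flags derived from the target fold, together with tiny hypothetical energy tables over a reduced H/P alphabet; real statistical potentials are derived from residue-pair and burial frequencies in known structures and cover all 20 amino acids.

```python
# Hypothetical toy tables over a reduced alphabet (H = hydrophobic, P = polar).
PAIR_ENERGY = {("H", "H"): -1.0, ("H", "P"): 0.0, ("P", "H"): 0.0, ("P", "P"): 0.2}
SOLVATION = {("H", True): -0.5, ("H", False): 0.5, ("P", True): 0.5, ("P", False): -0.3}


def fold_compatibility(seq, contacts, buried):
    """Lower is better: pair terms over contacts plus per-residue solvation terms.

    contacts: list of (i, j) residue index pairs in contact in the target fold.
    buried: list of booleans, True where the target position is buried.
    """
    pair = sum(PAIR_ENERGY[(seq[i], seq[j])] for i, j in contacts)
    solv = sum(SOLVATION[(aa, b)] for aa, b in zip(seq, buried))
    return pair + solv
```

In a genetic-algorithm search such as the one described in the abstract, a score like this would serve as the fitness to minimise.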

19.
To test the hypothesis that the folding pathways of evolutionarily related proteins with similar three-dimensional structures but widely different sequences should be similar, the folding pathway of apoleghemoglobin has been characterized using stopped-flow circular dichroism, heteronuclear NMR pulse labeling techniques and mass spectrometry. The pathway of folding was found to differ significantly from that of a protein of the same family, apomyoglobin, although both proteins appear to fold through helical burst phase intermediates. For leghemoglobin, the burst phase intermediate exhibits stable helical structure in the G and H helices, together with a small region in the center of the E helix. The A and B helices are not stabilized until later stages of the folding process. The structure of the burst phase folding intermediate thus differs from that of apomyoglobin, in which stable helical structure is formed in the A, B, G and H helix regions.


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Registration No. 09084417