Similar articles
20 similar articles found.
1.
To what extent does natural selection act to optimize the details of protein folding kinetics? In an effort to address this question, the relationship between an amino acid's evolutionary conservation and its role in protein folding kinetics has been investigated intensively. Despite this effort, no consensus has been reached regarding the degree to which residues involved in native-like transition state structure (the folding nucleus) are conserved. Here we report the results of an exhaustive, systematic study of sequence conservation among residues known to participate in the experimentally (Phi-value) defined folding nuclei of all of the appropriately characterized proteins reported to date. We observe no significant evidence that these residues exhibit any anomalous sequence conservation. We do observe, however, a significant bias in the existing kinetic data: the mean sequence conservation of the residues that have been the subject of kinetic characterization is greater than the mean sequence conservation of all residues in 13 of 14 proteins studied. This systematic experimental bias gives rise to the previous observation that the median conservation of residues reported to participate in the folding nucleus is greater than the median conservation of all of the residues in a protein. When this bias is corrected (by comparing, for example, the conservation of residues known to participate in the folding nucleus with that of other, kinetically characterized residues) the previously reported preferential conservation is effectively eliminated. In contrast to well-established theoretical expectations, both poorly and highly conserved residues are apparently equally likely to participate in the protein-folding nucleus.
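For readers unfamiliar with the Phi-value analysis that defines the folding nuclei discussed above, the following minimal Python sketch shows the usual calculation: a mutation's effect on the folding activation free energy, estimated from folding rate constants, divided by its effect on equilibrium stability. The function names, the 298.15 K default and the numbers in the example are illustrative assumptions, not data from the study.

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal mol^-1 K^-1


def ddg_activation(kf_wt: float, kf_mut: float, temp_k: float = 298.15) -> float:
    """Change in folding activation free energy (kcal/mol) caused by a mutation,
    estimated from wild-type and mutant folding rate constants."""
    return R_KCAL * temp_k * math.log(kf_wt / kf_mut)


def phi_value(kf_wt: float, kf_mut: float, ddg_eq: float, temp_k: float = 298.15) -> float:
    """Phi = ddG(denatured -> transition state) / ddG(denatured -> native).

    ddg_eq: the mutation's destabilization of the native state (kcal/mol,
    positive = destabilizing). Phi near 1 suggests the residue's native
    interactions are already formed in the transition state (folding nucleus);
    Phi near 0 suggests they are not.
    """
    if abs(ddg_eq) < 1e-9:
        raise ValueError("Phi is undefined when the mutation does not change stability")
    return ddg_activation(kf_wt, kf_mut, temp_k) / ddg_eq


# Hypothetical mutant: folds 10x slower and is destabilized by 2.0 kcal/mol.
print(round(phi_value(100.0, 10.0, ddg_eq=2.0), 2))  # prints 0.68
```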

2.
Global and co-translational protein folding may both occur in vivo, and understanding the relationship between these folding mechanisms is pivotal to our understanding of protein-structure formation. In this study, over 1.5 million hydrophobic-polar sequences were classified according to their ability to attain a unique, but not necessarily minimal-energy, conformation through co-translational folding. The sequence and structure properties of the resulting sets were then compared to identify signatures of co-translational folding. The strongest signature of co-translational folding is a reduced number of possible favorable contacts in the amino terminus. There is no evidence of fewer contacts, more local contacts, or less compact structures. Co-translational folding produces a more compact amino-terminal than carboxy-terminal region and an amino-terminally biased set of core residues. These signatures are also observed in real proteins, most strongly in the alpha/beta structural class (SCOP), where 71% of proteins have an amino-terminal set of core residues. The prominence of co-translational features in experimentally determined protein structures suggests that the importance of co-translational folding is currently underestimated.

3.
Selection of representative protein data sets. Cited 37 times (17 self-citations, 20 by others).
The Protein Data Bank currently contains about 600 data sets of three-dimensional protein coordinates determined by X-ray crystallography or NMR. There is considerable redundancy in the database, as many protein pairs are identical or very similar in sequence. However, statistical analyses of protein sequence-structure relations require nonredundant data. We have developed two algorithms to extract from the database representative sets of protein chains with maximum coverage and minimum redundancy. The first algorithm focuses on optimizing a particular property of the selected proteins and works by successive selection of proteins from an ordered list and exclusion of all neighbors of each selected protein. The other algorithm aims at maximizing the size of the selected set and works by successive thinning out of clusters of similar proteins. Both algorithms are generally applicable to other databases in which criteria of similarity can be defined and relate to problems in graph theory. The largest nonredundant set extracted from the current release of the Protein Data Bank has 155 protein chains. In this set, no two proteins have sequence similarity higher than a certain cutoff (30% identical residues for aligned subsequences longer than 80 residues), yet all structurally unique protein families are represented. Periodically updated lists of representative data sets are available by electronic mail from the file server "netserv@embl-heidelberg.de." The selection may be useful in statistical approaches to protein folding as well as in the analysis and documentation of the known spectrum of three-dimensional protein structures.
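The first selection algorithm described above (walk an ordered list, keep each protein, and exclude all of its neighbors) can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the authors' code: the similarity test is passed in as a function, and in the example a single fixed identity threshold stands in for the paper's length-dependent cutoff. The chain IDs and identity values are invented.

```python
from typing import Callable, Hashable, Iterable, List


def select_representatives(
    ordered: Iterable[Hashable],
    is_similar: Callable[[Hashable, Hashable], bool],
) -> List[Hashable]:
    """Greedy selection: take proteins in order of preference; each selected
    protein excludes all of its neighbors (proteins judged too similar)."""
    items = list(ordered)
    selected: List[Hashable] = []
    excluded = set()
    for i, p in enumerate(items):
        if p in excluded:
            continue
        selected.append(p)
        # Exclude every later protein that is a neighbor of the one just chosen.
        for q in items[i + 1:]:
            if q not in excluded and is_similar(p, q):
                excluded.add(q)
    return selected


# Hypothetical pairwise identities; unlisted pairs are treated as dissimilar.
IDENTITY = {frozenset({"1abc", "2xyz"}): 0.85, frozenset({"2xyz", "3def"}): 0.40}


def too_similar(a: str, b: str) -> bool:
    # Assumption: a flat 30% identity cutoff, ignoring alignment length.
    return IDENTITY.get(frozenset({a, b}), 0.0) > 0.30


print(select_representatives(["1abc", "2xyz", "3def"], too_similar))  # prints ['1abc', '3def']
```

The order of the input list encodes the property being optimized (for example, crystallographic resolution), so changing the order can change which chain represents each cluster.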

4.
Structural features of protein folding nuclei. Cited once (0 self-citations, 1 by others).
A crucial event in protein folding is the formation of a folding nucleus. We demonstrate a considerable coincidence between the location of folding nuclei and the location of so-called "root structural motifs", which have unique overall folds and handedness. In proteins with a single root structural motif, involvement in the formation of the folding nucleus is on average significantly higher for amino acid residues within the root structural motif than for residues in other parts of the protein. Statistical tests showed that this difference is reliable. Thus, a structural feature that corresponds to the protein folding nucleus has now been identified.

5.
An important puzzle in structural biology is how proteins are able to fold so quickly into their unique native structures. There is much evidence that protein folding is hierarchic. In that case, folding routes are not linear but have a tree structure. Trees are commonly used to represent the grammatical structure of natural-language sentences, and chart parsing algorithms efficiently search the space of all possible trees for a given input string. Here we show that one such method, the CKY algorithm, can be useful both for providing novel insight into the physical protein folding process and for computational protein structure prediction. As a proof of concept, we apply this algorithm to the HP lattice model of proteins. Our algorithm identifies all direct folding route trees to the native state and allows us to construct a simple model of the folding process. Despite its simplicity, our model accounts for the observation that folding rates depend only on the topology of the native state and not on sequence composition.
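The HP lattice model used as the test bed above scores a conformation by counting hydrophobic contacts. The sketch below evaluates that energy for a chain on a 2D square lattice, assuming the common convention of -1 for each pair of H residues that are lattice neighbors but not chain neighbors. It illustrates the model only, not the CKY chart-parsing procedure, and the function name is hypothetical.

```python
from typing import Sequence, Tuple

Coord = Tuple[int, int]


def hp_energy(sequence: str, coords: Sequence[Coord]) -> int:
    """Energy of an HP conformation on the 2D square lattice.

    sequence: string over {'H', 'P'}; coords: lattice site of each residue.
    Assumed convention: each non-bonded H-H lattice contact contributes -1.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive residues must occupy adjacent sites")

    index = {c: i for i, c in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            j = index.get(nb)
            # j > i + 1 skips bonded neighbors and counts each contact once.
            if j is not None and j > i + 1 and sequence[j] == "H":
                energy -= 1
    return energy


# A 4-residue chain folded into a square: residues 0 and 3 touch.
print(hp_energy("HPPH", [(0, 0), (1, 0), (1, 1), (0, 1)]))  # prints -1
```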

6.
Phylogenetic profiling of amino acid substitution patterns in proteins has led many to conclude that most structural information is carried by interior core residues that are solvent inaccessible. This conclusion is based on the observation that buried residues generally tolerate only conservative sequence changes, while surface residues allow more diverse chemical substitutions. This notion is now changing as it has become apparent that both core and surface residues play important roles in protein folding and stability. Unfortunately, identifying specific mutations that will enhance stability remains a challenging problem. Here we discuss two mutations that emerged from an in vitro selection experiment designed to improve the folding stability of a non-biological ATP binding protein. These mutations alter two solvent-accessible residues and dramatically enhance the expression, solubility, thermal stability, and ligand binding affinity of the protein. The significance of both mutations was investigated individually and together, and the X-ray crystal structures of the parent sequence and the double mutant protein were solved to resolutions of 2.8 and 1.65 Å, respectively. Comparative structural analysis of the evolved protein and proteins found in nature reveals that our non-biological protein evolved certain structural features shared by many thermophilic proteins. This experimental result suggests that protein fold optimization by in vitro selection offers a viable approach to generating stable variants of many naturally occurring proteins whose structures and functions are otherwise difficult to study.

7.
The impact on protein evolution of the physical laws that govern folding remains obscure. Here, by analyzing in silico-evolved sequences subjected to evolutionary pressure for fast folding, it is shown that: First, a subset of residues in the thermodynamic folding nucleus is mainly responsible for modulating the protein folding rate. Second and most important, the protein topology itself is of paramount importance in determining the location of these residues in the structure. Further stabilization of the interactions in this nucleus leads to fast folding sequences. Third, these nucleation points restrict the sequence space available to the protein during evolution. Correlated mutations between positions around these hot spots arise in a statistically significant manner, and most involve contacting residues. When a similar analysis is carried out on real proteins, qualitatively similar results are obtained.

8.
Dong Q, Wang X, Lin L. Proteins 2008, 72(1):353-366.
In recent years, protein structure prediction using local structure information has made great progress. In this study, a novel and effective method is developed to predict the local structures and folding fragments of proteins. First, proteins with known structures are split into fragments. Second, these fragments, represented by dihedral angles, are clustered to produce building blocks (BBs). Third, an efficient machine learning method is used to predict the local structures of proteins from sequence profiles. Finally, a bi-gram model, trained by an iterative algorithm, is introduced to model the interactions between these BBs. For each test protein, a building-block lattice is constructed that contains all the folding fragments of the protein, and the local structures and optimal fragments are then obtained by dynamic programming. The experiment is performed on a subset of the PDB with sequence identity below 25%. The results show that the method outperforms one that uses only sequence information. When multiple paths are returned, the average classification accuracy of local structures is 72.27% and the average prediction accuracy of local structures is 67.72%, a significant improvement over previous studies. The method predicts not only the local structures but also the folding fragments of proteins. This work is useful for ab initio protein structure prediction and, especially, for understanding the protein folding process.
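The final step above, choosing an optimal path of building blocks using per-position predictions plus a bi-gram model of adjacent blocks, is a Viterbi-style dynamic program. The following is a minimal sketch assuming both scores are already available as log-probabilities. The block labels, scores and function name are hypothetical, and the actual method's fragment lengths, lattice layout and scoring may differ.

```python
from typing import Dict, List, Sequence, Tuple


def best_block_path(
    emission: Sequence[Dict[str, float]],
    bigram: Dict[Tuple[str, str], float],
) -> Tuple[List[str], float]:
    """Viterbi search for the highest-scoring sequence of building blocks.

    emission[i][b]: log-score that position i adopts block b (assumed to come
    from a profile-based classifier). bigram[(a, b)]: log-score that block b
    follows block a; pairs missing from the table are treated as disallowed.
    """
    if not emission:
        return [], 0.0
    neg_inf = float("-inf")
    score = dict(emission[0])  # best score of a path ending in each block
    back: List[Dict[str, str]] = []  # back-pointers for positions 1..n-1
    for probs in emission[1:]:
        new_score: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for b, e in probs.items():
            best_prev, best = None, neg_inf
            for a, s in score.items():
                cand = s + bigram.get((a, b), neg_inf)
                if cand > best:
                    best_prev, best = a, cand
            if best_prev is not None:
                new_score[b] = best + e
                pointers[b] = best_prev
        score = new_score
        back.append(pointers)
    if not score:
        raise ValueError("no path is allowed by the bi-gram table")
    last = max(score, key=score.get)
    path = [last]
    for pointers in reversed(back):  # walk back-pointers to position 0
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]
```

Returning several paths, as the abstract describes, would require keeping the top-k scores per block instead of a single best one.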

9.
TOUCHSTONEX, a new method for folding proteins that uses a small number of long-range contact restraints derived from NMR NOE (nuclear Overhauser enhancement) data, is described. The method employs a new lattice-based, reduced model of proteins that explicitly represents C(alpha), C(beta), and the sidechain centers of mass. The force field consists of knowledge-based terms that produce protein-like behavior, including various short-range interactions, hydrogen bonding, and one-body, pairwise, and multibody long-range interactions. Contact restraints were incorporated into the force field as an NOE-specific pairwise potential. We evaluated the algorithm on a set of 125 proteins of various secondary structure types and lengths up to 174 residues. Using N/8 simulated long-range sidechain contact restraints, where N is the number of residues, 108 proteins were folded to a C(alpha) root-mean-square deviation (RMSD) from native below 6.5 Å. The average RMSD of the lowest-RMSD structures for all 125 proteins (folded and unfolded) was 4.4 Å. The algorithm was also applied to limited experimental NOE data for three proteins. Using very few experimental sidechain contact restraints and a small number of sidechain-main chain and main chain-main chain contact restraints, we folded all three proteins to low-to-medium resolution structures. The algorithm can be applied to NMR structure determination or to other experimental methods that provide tertiary restraint information, especially in the early stage of structure determination, when only limited data are available.

10.
In the family of acyl-coenzyme A binding proteins, a subset of 26 sequence sites are identical in all eukaryotes and conserved throughout evolution of the eukaryotic kingdoms. In the context of the bovine protein, the importance of these 26 sequence positions for structure, function, stability, and folding has been analyzed using single-site mutations. A total of 28 mutant proteins were analyzed which covered 17 conserved sequence positions and three nonconserved positions. As a first step, the influence of the mutations on the protein folding reaction has been probed, revealing a folding nucleus of eight hydrophobic residues formed between the N- and C-terminal helices [Kragelund, B. B., et al. (1999) Nat. Struct. Biol. (In press)]. To fully analyze the role of the conserved residues, the function and the stability have been measured for the same set of mutant proteins. Effects on function were measured by the extent of binding of the ligand dodecanoyl-CoA using isothermal titration calorimetry, and effects on protein stability were measured with chemical denaturation followed by intrinsic tryptophan and tyrosine fluorescence. The sequence sites that have been conserved for direct functional purposes have been identified. These are Phe5, Tyr28, Tyr31, Lys32, Lys54, and Tyr73. Binding site residues are mainly polar or charged residues, and together, four of these contribute approximately 8 kcal mol-1 of the total free energy of binding of 11 kcal mol-1. The sequence sites conserved for stability of the structure have likewise been identified and are Phe5, Ala9, Val12, Leu15, Leu25, Tyr28, Lys32, Gln33, Tyr73, Val77, and Leu80. Essentially, all of the conserved residues that maintain the stability are hydrophobic residues at the interface of the helices. Only one conserved polar residue, Gln33, is involved in stability. 
The results indicate that conservation of residues in homologous proteins may result from a summed optimization of an effective folding reaction, a stable native protein, and a fully active binding site. This is important in protein design strategies, where optimization of one of these parameters, typically function or stability, may influence any of the others markedly.

11.
It has been shown for 20 proteins that amino acid residues in the experimentally determined protein folding nucleus are often located within theoretically predicted amyloidogenic fragments. For 18 proteins, Φ-values, which indicate the extent of a residue's involvement in the folding nucleus, are on average higher for residues within amyloidogenic regions. Amyloidogenic fragments were predicted for the 20 proteins by two methods, chosen from four by comparing their predictions with amyloidogenic regions known from experimental data. Because theoretical folding nuclei are determined from a protein's three-dimensional structure, whereas amyloidogenic regions are determined from its primary structure, the observed regularity makes it possible to predict folding nucleation sites from the amino acid sequence.

12.
Torshin IY, Harrison RW. Proteins 2001, 43(4):353-364.
Electrostatic interactions are important for protein folding. At low resolution, the electrostatic field of a whole molecule can be described in terms of charge center(s). To study electrostatic effects, the centers of positive and negative charge were calculated for 20 small proteins of known structure for which hydrogen exchange cores had been determined experimentally. Two observations seem important. First, in all 20 proteins studied, 30-100% of the residues forming the hydrogen exchange core(s) were clustered around the charge centers; moreover, in each protein more than half of the core sequences are located near the charge centers. Second, the overall architecture of the all-alpha proteins in the set appears to be stabilized by interactions among residues surrounding the charge centers. In most of the alpha-beta proteins, either or both centers are located near a pair of consecutive strands, and this is even more characteristic of alpha/beta and all-beta structures. Consecutive strands are very probable sites of early folding events. These two points lead to the conclusion that charge centers, defined solely from the structure of the folded protein, may indicate the location of a protein's hydrogen exchange/folding core. In addition, almost all the proteins contain well-conserved continuous hydrophobic sequences of three or more residues located near the charge centers. These hydrophobic sequences may be primary nucleation sites for protein folding. The results suggest the following order of events in folding: local hydrophobic nucleation, electrostatic collapse of the core, global hydrophobic collapse, and slow annealing to the native state. This analysis emphasizes the importance of treating electrostatics in protein-folding simulations.
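A charge center, as used above, can be illustrated as the centroid of the charge-carrying atoms of one sign. The sketch below assumes unit charges on a hypothetical table of side-chain atoms (Lys NZ, Arg CZ, Asp CG, Glu CD) and ignores histidine and the chain termini; the study's exact choice of charged atoms and weighting is not given in the abstract and may differ. Atoms are passed as hypothetical (residue name, atom name, coordinates) records.

```python
from typing import Optional, Sequence, Tuple

Vec = Tuple[float, float, float]

# Assumption: one unit charge per listed atom at neutral pH; His and termini ignored.
CHARGED_ATOMS = {
    ("LYS", "NZ"): +1,
    ("ARG", "CZ"): +1,
    ("ASP", "CG"): -1,
    ("GLU", "CD"): -1,
}


def charge_center(atoms: Sequence[Tuple[str, str, Vec]], sign: int) -> Optional[Vec]:
    """Centroid of all charge-carrying atoms with the given sign (+1 or -1).

    With unit charges, the charge-weighted center reduces to a plain centroid.
    Returns None if the protein has no charged atoms of that sign.
    """
    pts = [xyz for res, name, xyz in atoms if CHARGED_ATOMS.get((res, name)) == sign]
    if not pts:
        return None
    n = len(pts)
    return (
        sum(p[0] for p in pts) / n,
        sum(p[1] for p in pts) / n,
        sum(p[2] for p in pts) / n,
    )


atoms = [
    ("LYS", "NZ", (0.0, 0.0, 0.0)),
    ("ARG", "CZ", (2.0, 0.0, 0.0)),
    ("GLU", "CD", (1.0, 1.0, 1.0)),
    ("ALA", "CA", (5.0, 5.0, 5.0)),  # uncharged, ignored
]
print(charge_center(atoms, +1))  # prints (1.0, 0.0, 0.0)
```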

13.
Proteins are the working molecules of the cell, and evolution is the hallmark of life, so it is important to understand how protein folding and evolution influence each other. Several studies correlating experimental measurements of residue participation in the folding nucleus with sequence conservation have reached different conclusions. These studies assess sequence conservation at folding-nucleus sites using entropy or relative entropy derived from multiple sequence alignments. Here we report an analysis of folding-nucleus conservation using an evolutionary model as an alternative to entropy-based approaches. We employ a continuous-time Markov model of codon substitution to distinguish mutations fixed by selection from mutations fixed by chance. This model takes into account codon frequency bias, the bias favoring transitions over transversions, and explicit phylogenetic information. We measure selection pressure at each residue site using ω, the ratio of non-synonymous to synonymous substitution rates (dN/dS). The ω values are estimated with PAML, a maximum-likelihood method. Our results show little correlation between the extent of kinetic participation in the folding nucleus, as measured by experimental phi-values, and selection pressure, as measured by ω. In addition, two randomization tests failed to show that folding-nucleus residues are significantly more conserved than the protein as a whole or than the median ω of all residues in the protein. These results suggest that, at the level of codon substitution, there is no indication that folding-nucleus residues are significantly more conserved than other residues. We further reconstruct candidate ancestral residues of the folding nucleus and suggest possible in vitro mutation studies for testing the folding behavior of the ancestral folding nucleus.
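The randomization tests mentioned above can be illustrated with a permutation test: draw random residue subsets of the same size as the folding nucleus and count how often their median ω is as low as the nucleus median. This minimal sketch assumes per-site ω values have already been estimated (PAML is not called here). The function name and example values are hypothetical, and the paper's exact test statistics may differ.

```python
import random
import statistics
from typing import Sequence


def nucleus_permutation_p(
    omega: Sequence[float],
    nucleus_idx: Sequence[int],
    n_perm: int = 10_000,
    seed: int = 0,
) -> float:
    """One-sided permutation p-value for 'nucleus sites are more conserved'.

    omega: per-site dN/dS estimates for the whole protein (lower = stronger
    purifying selection). nucleus_idx: positions of folding-nucleus residues.
    Returns the fraction of random same-size site subsets whose median omega is
    at most the observed nucleus median, with the usual +1 correction.
    """
    k = len(nucleus_idx)
    if not 0 < k <= len(omega):
        raise ValueError("nucleus must be a non-empty subset of the sites")
    rng = random.Random(seed)  # seeded for reproducibility
    observed = statistics.median(omega[i] for i in nucleus_idx)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(range(len(omega)), k)
        if statistics.median(omega[i] for i in sample) <= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

A small p-value would indicate that the nucleus is under stronger purifying selection than random sets of sites; the abstract reports that no such signal was found.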

14.
Predicting protein folding rates from amino acid sequence is an important challenge in computational and molecular biology. Over the past few years, many methods have been developed to capture the correlation between folding rates and protein structures and sequences. In this paper, we present an effective method, a combined neural network and genetic algorithm approach, to predict protein folding rates from amino acid sequences alone, without any explicit structural information. The originality of this paper is that, for the first time, it addresses the effect of sequence order. The proposed method gives a good correlation between predicted and experimental folding rates: evaluated with a leave-one-out jackknife test, the correlation coefficient is 0.80 and the standard error is 2.65 for 93 proteins, the largest such dataset studied to date. The comparative results show that this correlation is better than that of most other methods and suggest that sequence-order information contributes importantly to determining protein folding rates.
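The leave-one-out jackknife used to evaluate the predictor works independently of the model itself, so it can be sketched with any regressor. Below, an ordinary least-squares line on a single hypothetical sequence-derived feature stands in for the paper's neural network and genetic algorithm. Each protein is predicted by a model fitted to all the others, and the Pearson correlation between the held-out predictions and the experimental values is then computed.

```python
import math
from typing import List, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares slope and intercept (stand-in model)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("feature has zero variance")
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return slope, my - slope * mx


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / math.sqrt(va * vb)


def jackknife_predictions(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Predict each y[i] with a model trained on every other protein."""
    preds = []
    for i in range(len(x)):
        xs = [v for j, v in enumerate(x) if j != i]
        ys = [v for j, v in enumerate(y) if j != i]
        slope, intercept = fit_line(xs, ys)
        preds.append(slope * x[i] + intercept)
    return preds
```

Because each prediction comes from a model that never saw that protein, the resulting correlation is a less optimistic estimate than one computed on the training fit.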

15.
The linear sequence of amino acids contains all the necessary information for a protein to fold into its unique three-dimensional structure. Native protein sequences are known to accomplish this by promoting the formation of stable, kinetically accessible structures. Here we describe a Pro residue in the center of the third transmembrane helix of the cystic fibrosis transmembrane conductance regulator that promotes folding by a distinct mechanism: disfavoring the formation of a misfolded structure. The generality of this mechanism is supported by genome-wide transmembrane sequence analyses. Furthermore, the results provide an explanation for the increased frequency of Pro residues in transmembrane alpha-helices. Incorporation by nature of such 'negative folding determinants', aimed at preventing the formation of off-pathway structures, represents an additional mechanism by which folding information is encoded within the evolved sequences of proteins.

16.
Regions of secondary structure are predicted, without using information about the conformation of the protein itself, and compared with crystallographic assignments for seven proteins of recently published sequence and conformation (Table 1). As shown in Table 3, the prediction of helices is good (78.7% for %cor.ass.3), except for proteins having large antiparallel pleated sheets, and the prediction of β-structure is fairly good (51.2% for %cor.ass.3), except for helix-rich proteins.
The prediction of secondary structure from sequence, together with a survey of all protein structures analysed so far by X-ray crystallography, suggests that nucleation starts in almost all cases from medium-range interactions between regions having helical potential (α-candidates) and β-structural potential (β-candidates), which are very close to each other but separated by at least three hydrophilic or neutral residues in four consecutive residues on the polypeptide chain. The predictability of loops or turns is enhanced from 64.4% to 71.3% (%cor.ass.2) by taking these contiguous α-β interactions into account. Such a medium-range interaction is called here a probable nucleus. Large proteins such as carboxypeptidase Aα contain many nuclei, while small proteins like the trypsin inhibitor contain at least one. Moreover, such an interaction could be a transitional state towards a helix-rich protein, as well as towards a helix-deficient protein having a large antiparallel pleated-sheet β-structure.
The analysis of probable nuclei with regard to their mutual spatial proximity strongly suggests that the topological pathway of the polypeptide chain in three-dimensional space might be determined by long-range interactions between an α-candidate and a β-candidate. An empirical rule is observed: almost all parallel pleated sheets are accompanied by helices in their neighbourhood.
A body of chemical evidence, such as complementation experiments and combinations of disulphide bonds, also seems to be explained by the proposed mechanism of protein folding.

17.
Short motifs are known to play diverse roles in proteins, such as mediating interactions with other molecules, binding to membranes, or conducting a specific biological function. Standard approaches currently used to detect short motifs in proteins search for enriched amino acid motifs, considering mostly sequence information. Here, we present a new approach to search for common motifs (protein signatures) that share both physicochemical and structural properties, looking at different features simultaneously. Our method takes an amino acid sequence as input and translates it into a new alphabet that reflects its intrinsic structural and chemical properties. Using the MEME search algorithm, we identified protein signatures within subsets of proteins that share common sequence and structural information. We demonstrate that we can detect enriched structural motifs, such as the amphipathic helix, in large datasets of linear sequences, and can predict common structural properties (such as disorder, surface accessibility, or secondary structure) of known functional motifs. Finally, we applied the method to the yeast protein interactome and identified novel putative interacting motifs. We propose that our approach can be applied to de novo protein function prediction given either sequence or structural information. Proteins 2013; © 2012 Wiley Periodicals, Inc.
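The first step of the approach above, re-encoding an amino acid sequence in an alphabet that reflects physicochemical properties, can be sketched as a lookup table followed by a simple k-mer count. The three-letter alphabet here (h = hydrophobic, p = polar/neutral, c = charged) is a hypothetical stand-in, since the paper's alphabet also encodes structural properties and is not given in the abstract. The k-mer count is only a crude stand-in for a motif search such as MEME.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical reduced alphabet: h = hydrophobic, p = polar/neutral, c = charged.
REDUCED: Dict[str, str] = {
    **dict.fromkeys("AVLIMFWC", "h"),
    **dict.fromkeys("GSTYNQPH", "p"),
    **dict.fromkeys("DEKR", "c"),
}


def translate(seq: str) -> str:
    """Map each residue to its reduced-alphabet symbol ('x' for unknown)."""
    return "".join(REDUCED.get(aa, "x") for aa in seq.upper())


def top_kmers(seqs: List[str], k: int, n: int = 5) -> List[Tuple[str, int]]:
    """Most frequent k-mers across the translated sequences."""
    counts = Counter()
    for s in seqs:
        t = translate(s)
        counts.update(t[i:i + k] for i in range(len(t) - k + 1))
    return counts.most_common(n)


print(translate("MKDL"))  # prints hcch
```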

18.
We identified key residues from the structural alignment of families of protein domains from SCOP, which we represented in the form of sparse protein signatures. A signature-generating algorithm (SigGen) was developed and used to automatically identify key residues based on several structural and sequence-based criteria. The capacity of the signatures to detect related sequences in the SWISSPROT database was assessed by receiver operating characteristic (ROC) analysis and jack-knife testing. Test signatures for families from each of the main SCOP classes are described in relation to the quality of the structural alignments, the SigGen parameters used, and their diagnostic performance. We show that automatically generated signatures are potently diagnostic for their family (ROC50 scores typically >0.8), consistently outperform random signatures, and can identify sequence relationships in the "twilight zone" of protein sequence similarity (<40%). Signatures based on 15%-30% of alignment positions occurred most frequently among the best-performing signatures. When alignment quality is poor, sparser signatures perform better, whereas signatures generated from higher-quality alignments of fewer structures require more positions to be diagnostic. Our validation of signatures from the Globin family shows that when sequences from the structural alignment are removed and new signatures generated, the omitted sequences are still detected. The positions highlighted by the signature often correspond (alignment specificity >0.7) to the key positions in the original (non-jack-knifed) alignment. We discuss potential applications of sparse signatures in sequence annotation and homology modeling.
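The ROC50 scores quoted above can be computed from a ranked hit list. Under the commonly used ROC_n definition, you go through the first n false positives in rank order, count the true positives ranked above each one, sum those counts, and divide by n times the total number of true positives. A minimal sketch follows. The function name is hypothetical, and raising an error when fewer than n false positives are ranked is an assumption of this sketch, since conventions for short lists vary.

```python
from typing import Sequence


def roc_n(ranked_labels: Sequence[bool], n: int = 50) -> float:
    """ROC_n score of a ranked hit list (True = true family member).

    Sums, over the first n false positives, the number of true positives
    ranked above each one, then divides by n * (total true positives).
    1.0 means every true positive outranks the first n false positives.
    """
    total_tp = sum(1 for lab in ranked_labels if lab)
    if total_tp == 0:
        raise ValueError("no true positives in the list")
    tp_seen = fp_seen = area = 0
    for lab in ranked_labels:
        if lab:
            tp_seen += 1
        else:
            fp_seen += 1
            area += tp_seen
            if fp_seen == n:
                break
    if fp_seen < n:
        # Assumption: require a ranking deep enough to contain n false positives.
        raise ValueError(f"need at least {n} false positives in the ranking")
    return area / (n * total_tp)


print(roc_n([True] * 3 + [False] * 50))  # prints 1.0
```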

19.
Contact order is believed to be an important factor in understanding protein folding mechanisms. In our earlier work, we showed that long-range interactions play a vital role in protein folding. In this work, we analyzed the contribution of long-range contacts to the folding rates of two-state proteins. We found that residues that are close in space and separated by at least 10 to 15 residues in sequence are important determinants of folding rates, suggesting the presence of a folding nucleus at intervals of approximately 25 residues. A novel parameter, "long-range order", is proposed to predict protein folding rates; it correlates with the folding rates of two-state proteins as well as contact order does. Further, we examined the minimum residue separation that defines long-range contacts for different structural classes, and observed an excellent correlation between long-range order and folding rate for all classes of globular proteins. We suggest that in mixed-class proteins, a larger number of residues can serve as folding nuclei than in all-alpha and all-beta proteins. A simple statistical method based on long-range order has been developed to predict the folding rates of two-state proteins; its agreement with experimental results is better than or comparable to that of other methods in the literature.
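The long-range order parameter proposed above can be written as the number of long-range contacts per residue. The sketch below makes two assumptions. First, a contact is a pair of C-alpha atoms within 8 Å. Second, "long-range" means a sequence separation greater than 12 residues. The abstract only states that the relevant separation is 10 to 15 residues, so both cutoffs are parameters here, and the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def long_range_order(
    ca_coords: Sequence[Vec],
    min_separation: int = 12,  # assumption: |i - j| must exceed this
    cutoff: float = 8.0,  # assumption: C-alpha contact distance in angstroms
) -> float:
    """Long-range order: number of long-range contacts divided by chain length N.

    A pair (i, j) counts when |i - j| > min_separation and the two C-alpha
    atoms lie within `cutoff` angstroms of each other.
    """
    n = len(ca_coords)
    if n == 0:
        raise ValueError("empty chain")
    contacts = 0
    for i in range(n):
        for j in range(i + min_separation + 1, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                contacts += 1
    return contacts / n


# A fully extended chain (3.8 angstrom spacing) has no long-range contacts.
print(long_range_order([(3.8 * i, 0.0, 0.0) for i in range(20)]))  # prints 0.0
```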

20.
A distance constraint approach is applied to two-dimensional models of proteins in order to visualize the nature of protein folding and to examine the relative roles of different ranges of interaction. Three different native structures (I, II, and III) are considered; they have two different kinds of residues, viz., hydrophobic and hydrophilic, and different sequences of these residues. We examine how the distance constraint approach functions in the prediction of protein folding when we know the sequence of the residues, the (fixed) bond lengths, the mean distances between residues i and i + 2, and i and i + 3, and the mean distances for hydrophobic–hydrophobic, hydrophobic–hydrophilic, and hydrophilic–hydrophilic contacts between residues i and i + j, where j ≥ 4. This approach involves optimization of an object function with respect to 98 variables and is not free of the multiple-minimum problem. The optimization is always terminated if the chain is entangled and/or the segments (residues) are packed too compactly to move. In order to escape from such situations and to take the excluded-volume effect into account, a Monte Carlo method is used after the optimization is trapped in local minima. Success in the prediction of folding is found to depend on the starting conformations and on the native conformations. Fair success is obtained in predicting the helix-like structure in protein I and the overall structure of protein III, but not the β-like structures of proteins I and II. Insofar as the prediction of the structure of protein III is reasonable, it appears that some sequences of residues produce greater constraints on their conformations than others, if one considers only the hydrophobic and hydrophilic nature of the residues. 
These results imply that, in the folding of real proteins in three dimensions, the competition of hydrophobic (and hydrophilic) residues for inside (outside) positions in the molecule probably constitutes a necessary but not a sufficient condition to form and stabilize the native structure. The failure to predict the structure of protein II, and part of that of protein I, suggests that there are two types of long-range interactions. One (which we considered here) is nonspecific (i.e., defined only in terms of contacts between residues of the same or different polarity) and acts at any stage of protein folding; the other (which we did not consider here) is a specific interaction between residues in pairs and contributes only when the residues in the specific pair take on the native conformation. Presumably, incorporating such specific long-range interactions, together with the nonspecific ones, is necessary for successful prediction of protein folding using the distance constraint approach.
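The distance constraint approach above minimizes an object function that penalizes deviations of inter-residue distances from their expected mean values, and falls back on Monte Carlo moves when the minimizer is trapped. A minimal 2D sketch follows, assuming a plain sum of squared deviations over the constrained pairs together with the standard Metropolis acceptance rule. The abstract does not give the actual object function, its weights or the move set, so the function names and the form of the penalty are assumptions.

```python
import math
import random
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]


def constraint_objective(
    coords: Sequence[Point],
    target: Dict[Tuple[int, int], float],
) -> float:
    """Assumed object function: sum of squared deviations between actual and
    target distances over the constrained residue pairs.

    target maps (i, j) to a desired mean distance, e.g. fixed bond lengths for
    (i, i+1), mean i/i+2 and i/i+3 distances, and polarity-dependent
    (H-H, H-P, P-P) mean distances for pairs with j - i >= 4.
    """
    return sum(
        (math.dist(coords[i], coords[j]) - d0) ** 2 for (i, j), d0 in target.items()
    )


def metropolis_accept(delta: float, temperature: float, rng: random.Random) -> bool:
    """Standard Metropolis rule: always accept moves that lower the objective,
    accept uphill moves with probability exp(-delta / temperature)."""
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / temperature)
```

In a full search, a trapped conformation would be perturbed, the change in `constraint_objective` computed, and the move kept or rejected with `metropolis_accept`.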


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP registration No. 09084417