Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
Normal mode analyses of homologous proteins at the family and superfamily level show that slow dynamics are similar and are preserved through evolution. This study investigates how the slow dynamics of proteins are affected by variation in the protein architecture and fold. For this purpose, we have used computer-generated protein models based on idealized protein structures with varying folds. These are shown to be protein-like in their behavior, and they are used to investigate the influence of architecture and fold on the slow dynamics. We compared the dynamics of models having different folds but similar architecture and found the architecture to be the dominant factor for the slow dynamics.

2.
Many seemingly unrelated protein families share common folds. Theoretical models based on structure designability have suggested that a few folds should be very common while many others have low probability. In agreement with the predictions of these models, we show that the distribution of observed protein families over different folds can be modeled with a highly stretched exponential. Our results suggest that there are approximately 4,000 possible folds, some so unlikely that only approximately 2,000 folds exist among naturally occurring proteins. Due to the large number of extremely rare folds, constructing a comprehensive database of all existent folds would be difficult. Constructing a database of the most likely folds representing the vast majority of protein families would be considerably easier.

3.

Background  

Accurate and automatic gene finding and structural prediction is a common problem in bioinformatics, and applications need to be capable of handling non-canonical splice sites, micro-exons and partial gene structure predictions that span across several genomic clones.

4.
The use of classical molecular dynamics simulations, performed in explicit water, for the refinement of structural models of proteins generated ab initio or based on homology has been investigated. The study involved a test set of 15 proteins that were previously used by Baker and coworkers to assess the efficiency of the ROSETTA method for ab initio protein structure prediction. For each protein, four models generated using the ROSETTA procedure were simulated for periods of between 5 and 400 ns in explicit solvent, under identical conditions. In addition, the experimentally determined structure and the experimentally derived structure in which the side chains of all residues had been deleted and then regenerated using the WHATIF program were simulated and used as controls. A significant improvement in the deviation of the model structures from the experimentally determined structures was observed in several cases. In addition, it was found that in certain cases in which the experimental structure deviated rapidly from the initial structure in the simulations, indicating internal strain, the structures were more stable after regenerating the side-chain positions. Overall, the results indicate that molecular dynamics simulations on a time scale of tens to hundreds of nanoseconds are useful for the refinement of homology or ab initio models of small to medium-size proteins.

5.
One of the main barriers to accurate computational protein structure prediction is searching the vast space of protein conformations. Distance restraints or inter‐residue contacts have been used to reduce this search space, easing the discovery of the correct folded state. It has been suggested that about 1 contact for every 12 residues may be sufficient to predict structure at fold‐level accuracy. Here, we use coarse‐grained structure‐based models in conjunction with molecular dynamics simulations to examine this empirical prediction. We generate sparse contact maps for 15 proteins of varying sequence lengths and topologies and find that, given perfect secondary‐structural information, a small fraction of the native contact map (5%‐10%) suffices to fold proteins to their correct native states. We also find that different sparse maps are not equivalent, and we make several observations about the type of maps that are successful at such structure prediction. Long‐range contacts are found to encode more information than shorter‐range ones, especially for α and αβ‐proteins. However, this distinction is reduced for β‐proteins. Choosing contacts that are a consensus from successful maps gives predictive sparse maps, as does choosing contacts that are well spread out over the protein structure. Additionally, the folding of proteins can also be used to choose predictive sparse maps. Overall, we conclude that structure‐based models can be used to understand the efficacy of structure‐prediction restraints and could, in future, be tuned to include specific force‐field interactions, secondary structure errors and noise in the sparse maps.
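The idea of a sparse contact map in item 5 can be illustrated with a small sketch. It is not the authors' procedure: the 8 Å cutoff, the minimum sequence separation of 3, the random selection of contacts, and the toy ideal-helix coordinates are all assumptions made here for illustration, and every function name is hypothetical.

```python
import math
import random

def helix_ca_coords(n):
    """Toy C-alpha trace of an ideal alpha-helix
    (radius 2.3 A, rise 1.5 A and 100 degrees per residue)."""
    step = math.radians(100.0)
    return [(2.3 * math.cos(i * step), 2.3 * math.sin(i * step), 1.5 * i)
            for i in range(n)]

def native_contacts(coords, cutoff=8.0, min_separation=3):
    """Residue pairs (i, j) closer than `cutoff` angstroms and at least
    `min_separation` apart in sequence (both values are assumptions)."""
    pairs = []
    for i in range(len(coords)):
        for j in range(i + min_separation, len(coords)):
            if math.dist(coords[i], coords[j]) < cutoff:
                pairs.append((i, j))
    return pairs

def sparse_contact_map(contacts, fraction, rng):
    """Keep a random `fraction` of the native contacts (at least one)."""
    k = max(1, round(len(contacts) * fraction))
    return sorted(rng.sample(contacts, k))

coords = helix_ca_coords(24)
contacts = native_contacts(coords)
sparse = sparse_contact_map(contacts, 0.10, random.Random(1))
print(len(contacts), "native contacts;", len(sparse), "kept in the sparse map")
```

Random selection is only the simplest choice; the abstract notes that sparse maps are not equivalent and that long-range or well-spread contacts tend to be more informative, so a selection rule weighted by sequence separation would be a natural variation.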

6.
In order to understand the evolution of enzyme reactions and to gain an overview of biological catalysis, we have combined sequence and structural data to generate phylogenetic trees in an analysis of 276 structurally defined enzyme superfamilies, and used these to study how enzyme functions have evolved. We describe in detail the analysis of two superfamilies to illustrate different paradigms of enzyme evolution. Gathering together data from all the superfamilies supports and develops the observation that they have all evolved to act on a diverse set of substrates, whilst the evolution of new chemistry is much less common. Despite that, by bringing together so much data, we can provide a comprehensive overview of the most common and rare types of changes in function. Our analysis demonstrates, on a larger scale than previously studied, that modifications in overall chemistry still occur, with all possible changes at the primary level of the Enzyme Commission (E.C.) classification observed to a greater or lesser extent. The phylogenetic trees map out the evolutionary route taken within a superfamily, as well as all the possible changes within a superfamily. This has been used to generate a matrix of observed exchanges from one enzyme function to another, revealing the scale and nature of enzyme evolution and that some types of exchanges between and within E.C. classes are more prevalent than others. Surprisingly, a large proportion (71%) of all known enzyme functions are performed by this relatively small set of 276 superfamilies. This reinforces the hypothesis that relatively few ancient enzymatic domain superfamilies were progenitors for most of the chemistry required for life.

7.
A computer program (ORB) has been developed to predict 1H, 13C and 15N NMR chemical shifts of previously unassigned proteins. The program makes use of the information contained in a chemical shift database of previously assigned proteins supplemented by a statistically derived averaged chemical shift database in which the shifts are categorized according to their residue, atom and secondary structure type [Wishart et al. (1991) J. Mol. Biol., 222, 311–333]. The prediction process starts with a multiple alignment of all previously assigned proteins with the unassigned query protein. ORB uses the sequence and secondary structure alignment program XALIGN for this task [Wishart et al. (1994) CABIOS, 10, 121–132; 687–688]. The prediction algorithm in ORB is based on a scoring of the known shifts for each sequence. The scores depend on global sequence similarity, local sequence similarity, structural similarity and residue similarity and determine how much weight one particular shift is given in the prediction process. In situations where no applicable previously assigned chemical shifts are available, the shifts derived from the averaged database are used. In addition to supplying the user with predicted chemical shifts, ORB calculates a confidence value for every prediction. These confidence values enable the user to judge which predictions are the most accurate and they are particularly useful when ORB is incorporated into a complete autoassignment package. The usefulness of ORB was tested on three medium-sized proteins: an interleukin-8 analog, a troponin C synthetic peptide heterodimer and cardiac troponin C. Excellent results are obtained if ORB is able to use the chemical shifts of at least one highly homologous sequence. ORB performs well as long as the sequence identity between proteins with known chemical shifts and the new sequence is not less than 30%.

8.
To sample conformational space adequately, methods for protein structure prediction make necessary simplifications that also prevent them from being as accurate as desired. Thus, the idea of feeding them, hierarchically, into a more accurate method that samples less effectively was introduced a decade ago but has not met with more than limited success in a few isolated instances. Ideally, the final stages should be able to identify the native state, show a good correlation with native similarity in order to add value to the selection process, and refine the structures even further. In this work, we explore the possibility of using state-of-the-art explicit solvent molecular dynamics and implicit solvent free energy calculations to accomplish all three of those objectives on 12 small, single-domain proteins, four each of alpha, beta and mixed topologies. We find that this approach is very successful in ranking the native and also enhances the structure selection of predictions generated from the Rosetta method.

9.
Bartlett GJ, Taylor WR. Proteins 2008, 71(2): 950-959
Distinguishing native from non-native folds remains a challenging problem for protein structure prediction. We describe a method, SCA-distance scoring, based on results from statistical coupling analysis which discriminates between native and non-native folds produced by a de novo protein structure prediction method for four out of five test proteins. The method is particularly good at discriminating non-native folds which are close in RMSD to the true fold but contain a change in an internal structural element. SCA-distance scoring is a useful addition to the tools available for distinguishing native from non-native folds in protein structure prediction.

10.
Herein we present a computational technique for generating helix-membrane protein folds matching a predefined set of distance constraints, such as those obtained from NMR NOE, chemical cross-linking, dipolar EPR, and FRET experiments. The purpose of the technique is to provide initial structures for local conformational searches based on either energetic considerations or ad hoc scoring criteria. In order to properly screen the conformational space, the technique generates an exhaustive list of conformations within a specified root-mean-square deviation (RMSD) where the helices are positioned in order to match the provided distances. Our results indicate that the number of structures decreases exponentially as the number of distances increases, and increases exponentially as the errors associated with the distances increase. We also found the number of solutions to be smaller when all the distances share one helix in common, compared to the case where the distances connect helices in a daisy-chain manner. We found that for 7 helices, at least 15 distances with errors up to 8 Å are needed to produce a number of solutions that is not too large to be processed by local search refinement procedures. Finally, without energetic considerations, our enumeration technique retrieved the transmembrane domains of Bacteriorhodopsin (PDB entry 1c3w), Halorhodopsin (1e12), Rhodopsin (1f88), Aquaporin-1 (1fqy), Glycerol uptake facilitator protein (1fx8), Sensory Rhodopsin (1jgj), and a subunit of Fumarate reductase flavoprotein (1qlaC) with Cα-level RMSDs of 3.0 Å, 2.3 Å, 3.2 Å, 4.6 Å, 6.0 Å, 3.7 Å, and 4.4 Å, respectively.
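For readers unfamiliar with the RMSD figures quoted in item 10, the quantity can be sketched as follows. This is a minimal illustration that assumes the two coordinate sets are already superimposed and matched atom by atom; published Cα RMSDs are normally computed after an optimal rigid-body superposition, which is not shown here. The function name and the toy coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of
    (x, y, z) points, assumed to be pre-aligned and in matching order."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

# Toy example: the second trace is the first one shifted by 3 A along x.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
shifted = [(x + 3.0, y, z) for x, y, z in reference]
print(rmsd(reference, shifted))
```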

11.
De novo prediction of protein structures, the prediction of structures from amino acid sequences which are not similar to those of hitherto resolved structures, has been one of the major challenges in molecular biophysics. In this paper, we develop a new method of de novo prediction, which combines the fragment assembly method and the simulation of the physical folding process: structures which have consistently assembled fragments are dynamically searched by Langevin molecular dynamics of conformational change. The benchmarking test shows that the prediction is improved when the candidate structures are cross-checked by an empirically derived score function.

12.
Using the data on proteins encoded in complete genomes, combined with a rigorous theory of the sampling process, we estimate the total number of protein folds and families, as well as the number of folds and families in each genome. The total number of folds in globular, water-soluble proteins is estimated at about 1000, with structural information currently available for about one-third of that number. The sequenced genomes of unicellular organisms encode from approximately 25%, for the minimal genomes of the Mycoplasmas, to 70-80% for larger genomes, such as Escherichia coli and yeast, of the total number of folds. The number of protein families with significant sequence conservation was estimated to be between 4000 and 7000, with structures available for about 20% of these.

13.
Leonov H, Mitchell JS, Arkin IT. Proteins 2003, 51(3): 352-359
The estimation of the number of protein folds in nature is a matter of considerable interest. In this study, a Monte Carlo method employing the broken stick model is used to assign a given number of proteins into a given number of folds. Subsequently, random, integer, non-repeating numbers are generated in order to simulate the process of fold discovery. With this conceptual framework at hand, the effects of two factors upon the fold identification process were investigated: (1) the nature of fold distributions and (2) preferential sampling bias of previously identified folds. Depending on the type of distribution, dividing 100,000 proteins into 1,000 folds resulted in 10-30% of the folds having 10 proteins or fewer per fold, approximately 10% of the folds having 10-20 proteins per fold, 31-45% having 20-100 proteins per fold, and >30% of the folds having more than 100 proteins per fold. After randomly sampling one tenth of the proteins, 68-96% of the folds were identified. These percentages depend both on fold distribution and biased/non-biased sampling. Only upon increasing the sampling bias for previously identified folds to 1,000 did the model result in a reduction of the number of proteins identified by an order of magnitude (approximately 9%). Thus, assuming the structures of one tenth of the population of proteins in nature have been solved, the results of the Monte Carlo simulation are more consistent with recent lower estimates of the number of folds,
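The simulation outlined in item 13 can be sketched in a few lines. This is only one reading of the abstract, not the authors' code: it uses a single variant of the broken stick model (cutting at distinct random integer positions), unbiased sampling without replacement, and hypothetical function names. The preferential sampling bias studied in the paper is deliberately left out.

```python
import random

def broken_stick_sizes(n_proteins, n_folds, rng):
    """Split n_proteins into n_folds non-empty groups by breaking a stick
    of length n_proteins at n_folds - 1 distinct random integer points."""
    cuts = sorted(rng.sample(range(1, n_proteins), n_folds - 1))
    edges = [0] + cuts + [n_proteins]
    return [right - left for left, right in zip(edges, edges[1:])]

def fraction_of_folds_found(sizes, sample_fraction, rng):
    """Sample proteins uniformly without replacement (no bias, an
    assumption) and return the fraction of folds hit at least once."""
    labels = [fold for fold, size in enumerate(sizes) for _ in range(size)]
    k = int(len(labels) * sample_fraction)
    seen = set(rng.sample(labels, k))
    return len(seen) / len(sizes)

rng = random.Random(0)
sizes = broken_stick_sizes(100_000, 1_000, rng)
found = fraction_of_folds_found(sizes, 0.1, rng)
print(f"folds identified after sampling one tenth of the proteins: {found:.1%}")
```

Because each fold receives at least one protein and sampling is uniform, large folds are discovered almost immediately while rare folds dominate what remains unseen, which is the effect the abstract quantifies.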

14.
15.
A novel method for the refinement of misfolded protein structures is proposed in which the properties of the solvent environment are oscillated in order to mimic some aspects of the role molecular chaperones play in protein folding in vivo. Specifically, the hydrophobicity of the solvent is cycled by repetitively altering the partial charges on solvent molecules (water) during a molecular dynamics simulation. During periods when the hydrophobicity of the solvent is increased, intramolecular hydrogen bonding and secondary structure formation are promoted. During periods of increased solvent polarity, poorly packed regions of secondary structures are destabilized, promoting structural rearrangement. By cycling between these two extremes, the aim is to minimize the formation of long-lived intermediates. The approach has been applied to the refinement of structural models of three proteins generated by using the ROSETTA procedure for ab initio structure prediction. A significant improvement in the deviation of the model structures from the corresponding experimental structures was observed. Although preliminary, the results indicate that computationally mimicking some functions of molecular chaperones in molecular dynamics simulations can promote the correct formation of secondary structure and thus be of general use in protein folding simulations and in the refinement of structural models of small- to medium-size proteins.

16.
MOTIVATION: Structural alignments of superfamily members often exhibit insertions and deletions of secondary structure elements (SSEs), yet conserved subsets of SSEs appear to be important for maintaining the fold and facilitating common functionalities. RESULTS: A database of aligned SSEs was constructed from the structure-based alignments of protein superfamily members in the CAMPASS database. SSEs were classified into several types on the basis of their length and solvent accessibility and counts were made for the replacements of SSEs in different types at structurally aligned positions. The results, summarized as log-odds substitution matrices, can be used for two types of comparisons: (1) structure against structure, both with secondary structure assignments; and (2) structure against sequence with predicted secondary structures. The conservation of SSEs at each alignment position was defined as the deviation of observed SSE frequencies from the uniform distribution. This offers a useful resource to define and examine the core of superfamily folds. Even when the structure of only a single member of a superfamily is known, the extended method can be used to predict the conservation of SSEs. Such information will be useful when modelling the structure of other members of a superfamily or identifying structurally and functionally important positions in the fold.
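To make the phrase "log-odds substitution matrices" in item 16 concrete, the sketch below turns replacement counts into log-odds scores. The abstract does not give the exact formula, so a BLOSUM-style definition is assumed here: the observed frequency of an aligned pair divided by the frequency expected if the two types were paired at random. The SSE type labels and the counts are invented for illustration.

```python
import math
from collections import Counter

def log_odds_scores(pair_counts):
    """Log2 odds for unordered type pairs.

    pair_counts maps (type_a, type_b) to the number of aligned positions
    showing that pair. Each unordered pair must appear once, and pairs
    with zero counts must be left out (the log of zero is undefined)."""
    total = sum(pair_counts.values())
    occurrences = Counter()
    for (a, b), n in pair_counts.items():
        occurrences[a] += n
        occurrences[b] += n
    scores = {}
    for (a, b), n in pair_counts.items():
        p_a = occurrences[a] / (2 * total)
        p_b = occurrences[b] / (2 * total)
        expected = p_a * p_b if a == b else 2 * p_a * p_b
        scores[(a, b)] = math.log2((n / total) / expected)
    return scores

# Hypothetical counts for two made-up SSE types.
counts = {
    ("H-long", "H-long"): 60,
    ("H-short", "H-short"): 30,
    ("H-long", "H-short"): 10,
}
scores = log_odds_scores(counts)
for pair, score in scores.items():
    print(pair, round(score, 2))
```

Positive scores mark replacements seen more often than chance would predict, negative scores mark replacements that are avoided.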

17.
Over the next few years, various genome projects will sequence many new genes and yield many new gene products. Many of these products will have no known function and little, if any, sequence homology to existing proteins. There is reason to believe that a rapid determination of a protein fold, even at low resolution, can aid in the identification of function and expedite the determination of structure at higher resolution. Recently devised NMR methods of measuring residual dipolar couplings provide one route to the determination of a fold. They do this by allowing the alignment of previously identified secondary structural elements with respect to each other. When combined with constraints involving loops connecting elements or other short-range experimental distance information, a fold is produced. We illustrate this approach to protein fold determination on 15N-labeled Escherichia coli acyl carrier protein using a limited set of 15N-1H and 1H-1H dipolar couplings. We also illustrate an approach using a more extended set of heteronuclear couplings on a related protein, 13C,15N-labeled NodF protein from Rhizobium leguminosarum.

18.
Ishida T, Nakamura S, Shimizu K. Proteins 2006, 64(4): 940-947
We developed a novel knowledge-based residue environment potential for assessing the quality of protein structures in protein structure prediction. The potential uses the contact number of residues in a protein structure and the absolute contact number of residues predicted from its amino acid sequence using a new prediction method based on a support vector regression (SVR). The contact number of an amino acid residue in a protein structure is defined by the number of residues around a given residue. First, the contact number of each residue is predicted using SVR from an amino acid sequence of a target protein. Then, the potential of the protein structure is calculated from the probability distribution of the native contact numbers corresponding to the predicted ones. The performance of this potential is compared with other score functions using decoy structures to identify both native structure from other structures and near-native structures from nonnative structures. This potential improves not only the ability to identify native structures from other structures but also the ability to discriminate near-native structures from nonnative structures.
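The contact number defined in item 18 (the number of residues around a given residue) could be computed along these lines. The abstract does not state the distance threshold or which atom represents a residue, so the radius and the one-point-per-residue representation below are assumptions, and the function name is hypothetical.

```python
import math

def contact_numbers(coords, radius):
    """For each residue, count the other residues whose representative
    point lies within `radius` (the threshold is an assumed parameter)."""
    return [
        sum(1 for j, other in enumerate(coords)
            if j != i and math.dist(point, other) < radius)
        for i, point in enumerate(coords)
    ]

# Toy chain: five points spaced 4 units apart along the x axis.
toy_chain = [(4.0 * i, 0.0, 0.0) for i in range(5)]
numbers = contact_numbers(toy_chain, radius=10.0)
print(numbers)  # → [2, 3, 4, 3, 2]
```

The end points see fewer neighbours than the central one, mirroring how buried residues in a real structure have higher contact numbers than exposed ones.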

19.
Liu X, Fan K, Wang W. Proteins 2004, 54(3): 491-499
Currently, of the 10^6 known protein sequences, only about 10^4 structures have been solved. Based on homologies and similarities, proteins are grouped into different families in which each has a structural prototype, namely, the fold, and some share the same folds. However, the total number of folds and families, and furthermore, the distribution of folds over families in nature, are still an enigma. Here, we report a study on the distribution of folds over families and the total number of folds in nature, using a maximum probability principle and the moment method of estimation. A quadratic relation between the numbers of families and folds is found for the number of families in an interval from 6000 to 30,000. For example, about 2700 folds for 23,100 families are obtained, among them about 33 superfolds, including more than 100 families each, and the largest superfold comprises about 800 families. Our results suggest that although the majority of folds have only a single family per fold, a considerably larger number of folds include many more families each than in the database, and the distribution of folds over families in nature differs markedly from the sampled distribution. The long tail of fold distribution is first estimated in this article. The results fit the data for different versions of the structural classification of proteins (SCOP) excellently, and the goodness-of-fit tests strongly support the results. In addition, the method of directly "enlarging" the sample to the population may be useful in inferring distributions of species in different fields.

20.
Circular dichroism (CD) is an excellent tool for rapid determination of the secondary structure and folding properties of proteins that have been obtained using recombinant techniques or purified from tissues. The most widely used applications of protein CD are to determine whether an expressed, purified protein is folded, or if a mutation affects its conformation or stability. In addition, it can be used to study protein interactions. This protocol details the basic steps of obtaining and interpreting CD data, and methods for analyzing spectra to estimate the secondary structural composition of proteins. CD has the advantage that measurements may be made on multiple samples containing ≤20 μg of proteins in physiological buffers in a few hours. However, it does not give the residue-specific information that can be obtained by X-ray crystallography or NMR.
