Similar articles
Found 20 similar articles (search time: 15 ms)
1.

Background  

Accurate and automatic gene finding and structural prediction is a common problem in bioinformatics, and applications need to be capable of handling non-canonical splice sites, micro-exons and partial gene structure predictions that span across several genomic clones.

2.
In order to understand the evolution of enzyme reactions and to gain an overview of biological catalysis we have combined sequence and structural data to generate phylogenetic trees in an analysis of 276 structurally defined enzyme superfamilies, and used these to study how enzyme functions have evolved. We describe in detail the analysis of two superfamilies to illustrate different paradigms of enzyme evolution. Gathering together data from all the superfamilies supports and develops the observation that they have all evolved to act on a diverse set of substrates, whilst the evolution of new chemistry is much less common. Despite that, by bringing together so much data, we can provide a comprehensive overview of the most common and rare types of changes in function. Our analysis demonstrates, on a larger scale than previously studied, that modifications in overall chemistry still occur, with all possible changes at the primary level of the Enzyme Commission (E.C.) classification observed to a greater or lesser extent. The phylogenetic trees map out the evolutionary route taken within a superfamily, as well as all the possible changes within a superfamily. This has been used to generate a matrix of observed exchanges from one enzyme function to another, revealing the scale and nature of enzyme evolution and that some types of exchanges between and within E.C. classes are more prevalent than others. Surprisingly, a large proportion (71%) of all known enzyme functions are performed by this relatively small set of 276 superfamilies. This reinforces the hypothesis that relatively few ancient enzymatic domain superfamilies were progenitors for most of the chemistry required for life.

3.
A computer program (ORB) has been developed to predict 1H, 13C and 15N NMR chemical shifts of previously unassigned proteins. The program makes use of the information contained in a chemical shift database of previously assigned proteins supplemented by a statistically derived averaged chemical shift database in which the shifts are categorized according to their residue, atom and secondary structure type [Wishart et al. (1991) J. Mol. Biol., 222, 311–333]. The prediction process starts with a multiple alignment of all previously assigned proteins with the unassigned query protein. ORB uses the sequence and secondary structure alignment program XALIGN for this task [Wishart et al. (1994) CABIOS, 10, 121–132; 687–688]. The prediction algorithm in ORB is based on a scoring of the known shifts for each sequence. The scores depend on global sequence similarity, local sequence similarity, structural similarity and residue similarity and determine how much weight one particular shift is given in the prediction process. In situations where no applicable previously assigned chemical shifts are available, the shifts derived from the averaged database are used. In addition to supplying the user with predicted chemical shifts, ORB calculates a confidence value for every prediction. These confidence values enable the user to judge which predictions are the most accurate and they are particularly useful when ORB is incorporated into a complete autoassignment package. The usefulness of ORB was tested on three medium-sized proteins: an interleukin-8 analog, a troponin C synthetic peptide heterodimer and cardiac troponin C. Excellent results are obtained if ORB is able to use the chemical shifts of at least one highly homologous sequence. ORB performs well as long as the sequence identity between proteins with known chemical shifts and the new sequence is not less than 30%.
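
The abstract says that per-sequence scores decide how much weight each known shift receives, and that the averaged database is used when no assigned shift applies. It does not give the formula, so the sketch below assumes a plain score-weighted mean. The function name `predict_shift`, its arguments and the example numbers are hypothetical and are not part of ORB.

```python
def predict_shift(scored_shifts, fallback_shift):
    """Combine known chemical shifts into one prediction (illustrative only).

    scored_shifts: list of (shift_ppm, score) pairs taken from aligned,
        previously assigned proteins; a higher score means more weight.
    fallback_shift: value from an averaged database, used when no
        positively scored shift is available.
    """
    usable = [(shift, score) for shift, score in scored_shifts if score > 0]
    total_score = sum(score for _, score in usable)
    if total_score == 0:
        # No applicable assigned shift: fall back to the averaged value.
        return fallback_shift
    # Assumed weighting scheme: score-weighted arithmetic mean.
    return sum(shift * score for shift, score in usable) / total_score


# Two homologues report 8.0 ppm (score 3.0) and 8.4 ppm (score 1.0).
weighted = predict_shift([(8.0, 3.0), (8.4, 1.0)], fallback_shift=8.25)
no_homologue = predict_shift([], fallback_shift=8.25)
```

In this sketch the higher-scoring homologue pulls the prediction toward its own value, and an empty list of scored shifts returns the fallback unchanged.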

4.
In order to sample conformational space adequately, methods for protein structure prediction make necessary simplifications that also prevent them from being as accurate as desired. Thus, the idea of feeding them, hierarchically, into a more accurate method that samples less effectively was introduced a decade ago but has not met with more than limited success in a few isolated instances. Ideally, the final stages should be able to identify the native state, show a good correlation with native similarity in order to add value to the selection process, and refine the structures even further. In this work, we explore the possibility of using state-of-the-art explicit solvent molecular dynamics and implicit solvent free energy calculations to accomplish all three of those objectives on 12 small, single-domain proteins, four each of alpha, beta and mixed topologies. We find that this approach is very successful in ranking the native and also enhances the structure selection of predictions generated from the Rosetta method.

5.
De novo prediction of protein structures, the prediction of structures from amino acid sequences which are not similar to those of hitherto resolved structures, has been one of the major challenges in molecular biophysics. In this paper, we develop a new method of de novo prediction, which combines the fragment assembly method and the simulation of the physical folding process: structures which have consistently assembled fragments are dynamically searched by Langevin molecular dynamics of conformational change. The benchmarking test shows that the prediction is improved when the candidate structures are cross-checked by an empirically derived score function.

6.
Using the data on proteins encoded in complete genomes, combined with a rigorous theory of the sampling process, we estimate the total number of protein folds and families, as well as the number of folds and families in each genome. The total number of folds in globular, water-soluble proteins is estimated at about 1000, with structural information currently available for about one-third of the number. The sequenced genomes of unicellular organisms encode from approximately 25%, for the minimal genomes of the Mycoplasmas, to 70-80% for larger genomes, such as Escherichia coli and yeast, of the total number of folds. The number of protein families with significant sequence conservation was estimated to be between 4000 and 7000, with structures available for about 20% of these.

7.
MOTIVATION: Structural alignments of superfamily members often exhibit insertions and deletions of secondary structure elements (SSEs), yet conserved subsets of SSEs appear to be important for maintaining the fold and facilitating common functionalities. RESULTS: A database of aligned SSEs was constructed from the structure-based alignments of protein superfamily members in the CAMPASS database. SSEs were classified into several types on the basis of their length and solvent accessibility and counts were made for the replacements of SSEs in different types at structurally aligned positions. The results, summarized as log-odds substitution matrices, can be used for two types of comparisons: (1) structure against structure, both with secondary structure assignments; and (2) structure against sequence with predicted secondary structures. The conservation of SSEs at each alignment position was defined as the deviation of observed SSE frequencies from the uniform distribution. This offers a useful resource to define and examine the core of superfamily folds. Even when the structure of only a single member of a superfamily is known, the extended method can be used to predict the conservation of SSEs. Such information will be useful when modelling the structure of other members of a superfamily or identifying structurally and functionally important positions in the fold.
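
The abstract summarizes replacement counts of SSE types as log-odds substitution matrices but does not state the exact formula. The sketch below assumes the standard form, log2(observed pair frequency / product of the two type frequencies). The SSE type labels (`H_long`, `E_short`) and the toy counts are invented for illustration and do not come from CAMPASS.

```python
import math
from collections import Counter


def log_odds_matrix(aligned_pairs):
    """Build symmetric log-odds scores from aligned SSE type pairs.

    aligned_pairs: iterable of (type_a, type_b) observed at structurally
        aligned positions. Returns a dict mapping (type_a, type_b) to
        log2(observed / expected); only observed pairs get a score.
    """
    pair_counts = Counter()
    for a, b in aligned_pairs:
        # Count each pair in both orders so the matrix is symmetric.
        pair_counts[(a, b)] += 1
        pair_counts[(b, a)] += 1
    total = sum(pair_counts.values())

    # Marginal frequency of each SSE type across all counted pairs.
    type_counts = Counter()
    for (a, _), n in pair_counts.items():
        type_counts[a] += n

    scores = {}
    for (a, b), n in pair_counts.items():
        observed = n / total
        expected = (type_counts[a] / total) * (type_counts[b] / total)
        scores[(a, b)] = math.log2(observed / expected)
    return scores


# Toy data: long helices and short strands mostly align with their own type.
pairs = (
    [("H_long", "H_long")] * 8
    + [("E_short", "E_short")] * 6
    + [("H_long", "E_short")] * 2
)
scores = log_odds_matrix(pairs)
```

With these counts, same-type replacements score above zero and the helix-to-strand replacement scores below zero, which is the usual reading of a log-odds matrix: positive means seen more often than chance, negative means less often.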

8.
Over the next few years, various genome projects will sequence many new genes and yield many new gene products. Many of these products will have no known function and little, if any, sequence homology to existing proteins. There is reason to believe that a rapid determination of a protein fold, even at low resolution, can aid in the identification of function and expedite the determination of structure at higher resolution. Recently devised NMR methods of measuring residual dipolar couplings provide one route to the determination of a fold. They do this by allowing the alignment of previously identified secondary structural elements with respect to each other. When combined with constraints involving loops connecting elements or other short-range experimental distance information, a fold is produced. We illustrate this approach to protein fold determination on 15N-labeled Escherichia coli acyl carrier protein using a limited set of 15N-1H and 1H-1H dipolar couplings. We also illustrate an approach using a more extended set of heteronuclear couplings on a related protein, 13C, 15N-labeled NodF protein from Rhizobium leguminosarum.

9.
Circular dichroism (CD) is an excellent tool for rapid determination of the secondary structure and folding properties of proteins that have been obtained using recombinant techniques or purified from tissues. The most widely used applications of protein CD are to determine whether an expressed, purified protein is folded, or if a mutation affects its conformation or stability. In addition, it can be used to study protein interactions. This protocol details the basic steps of obtaining and interpreting CD data, and methods for analyzing spectra to estimate the secondary structural composition of proteins. CD has the advantage that measurements may be made on multiple samples containing ≤20 μg of proteins in physiological buffers in a few hours. However, it does not give the residue-specific information that can be obtained by X-ray crystallography or NMR.

10.
Much structural information is encoded in the internal distances; a distance matrix-based approach can be used to predict protein structure and dynamics, and for structural refinement. Our approach is based on the square distance matrix $\mathbf{D} = [r_{ij}^{2}]$ containing all square distances between residues in proteins. This distance matrix contains more information than the contact matrix $\mathbf{C}$, which has elements of either 0 or 1 depending on whether the distance $r_{ij}$ is greater or less than a cutoff value $r_{\mathrm{cutoff}}$. We have performed spectral decomposition of the distance matrices, $\mathbf{D} = \sum_{k} \lambda_{k} \mathbf{v}_{k} \mathbf{v}_{k}^{T}$, in terms of eigenvalues and the corresponding eigenvectors and found that it contains at most five nonzero terms. A dominant eigenvector is proportional to $r^{2}$, the square distance of points from the center of mass, with the next three being the principal components of the system of points. By predicting $r^{2}$ from the sequence we can approximate a distance matrix of a protein with an expected RMSD value of about 7.3 Å, and by combining it with the prediction of the first principal component we can improve this approximation to 4.0 Å. We can also explain the role of hydrophobic interactions for the protein structure, because r is highly correlated with the hydrophobic profile of the sequence.
Moreover, r is highly correlated with several sequence profiles which are useful in protein structure prediction, such as contact number, the residue-wise contact order (RWCO) or mean square fluctuations (i.e. crystallographic temperature factors). We have also shown that the next three components are related to spatial directionality of the secondary structure elements, and they may be also predicted from the sequence, improving overall structure prediction. We have also shown that the large number of available HIV-1 protease structures provides a remarkable sampling of conformations, which can be viewed as direct structural information about the dynamics. After structure matching, we apply principal component analysis (PCA) to obtain the important apparent motions for both bound and unbound structures. There are significant similarities between the first few key motions and the first few low-frequency normal modes calculated from a static representative structure with an elastic network model (ENM) that is based on the contact matrix $\mathbf{C}$ (related to $\mathbf{D}$), strongly suggesting that the variations among the observed structures and the corresponding conformational changes are facilitated by the low-frequency, global motions intrinsic to the structure. Similarities are also found when the approach is applied to an NMR ensemble, as well as to atomic molecular dynamics (MD) trajectories. Thus, a sufficiently large number of experimental structures can directly provide important information about protein dynamics, but ENM can also provide a similar sampling of conformations. Finally, we use distance constraints from databases of known protein structures for structure refinement. We use the distributions of distances of various types in known protein structures to obtain the most probable ranges or the mean-force potentials for the distances.
We then impose these constraints on structures to be refined or include the mean-force potentials directly in the energy minimization so that more plausible structural models can be built. This approach has been successfully used by us in 2006 in the CASPR structure refinement.
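
The claim that the spectral decomposition has at most five nonzero terms follows from writing the square distance matrix as $\mathbf{D} = \mathbf{s}\mathbf{1}^{T} + \mathbf{1}\mathbf{s}^{T} - 2\mathbf{X}\mathbf{X}^{T}$, where $\mathbf{X}$ holds the 3D coordinates and $s_i = |\mathbf{x}_i|^2$, so the rank is at most 1 + 1 + 3. The sketch below checks this numerically with exact rational arithmetic. It is an illustration on random integer points, not the authors' code, and the helper names are hypothetical.

```python
import random
from fractions import Fraction


def squared_distance_matrix(points):
    """Return D with D[i][j] = |p_i - p_j|^2 for a list of 3D points."""
    return [
        [sum((a - b) ** 2 for a, b in zip(p, q)) for q in points]
        for p in points
    ]


def matrix_rank(matrix):
    """Exact rank via Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    rank = 0
    for col in range(len(rows[0])):
        # Find a row at or below the current rank with a nonzero entry.
        pivot = next(
            (r for r in range(rank, len(rows)) if rows[r][col] != 0), None
        )
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        # Eliminate this column from all rows below the pivot row.
        for r in range(rank + 1, len(rows)):
            factor = rows[r][col] / rows[rank][col]
            if factor != 0:
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[rank])]
        rank += 1
        if rank == len(rows):
            break
    return rank


random.seed(0)
# Twelve stand-in "residue" positions with integer coordinates.
points = [tuple(random.randint(-20, 20) for _ in range(3)) for _ in range(12)]
D = squared_distance_matrix(points)
rank_D = matrix_rank(D)
print(rank_D)  # at most 5 for any set of points in three dimensions
```

Exact fractions are used instead of floating-point eigenvalues so that "nonzero" needs no tolerance. A real analysis would more likely use a numerical eigensolver on actual residue coordinates.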

11.
Confinement effects can lead to drastic changes in the structural and dynamical properties of water molecules. In this work, we have performed classical molecular dynamics simulations of endohedral fullerenes of type (H2O)n@Cm (n = 1, 12, 21, 62, 108 and m = 60, 180, 240, 500 and 720) to explore the effects of spherical confinement on water properties. It is shown that these confined water molecules can form distinct solvation patterns depending upon the available space inside the fullerene cavity. For the systems with smaller diameter, a cage-like structure is predominant, whereas a bulk-like structure is observed for larger fullerenes. The orientational relaxation of these confined water molecules showed slower relaxation as the cavity diameter increases except for the (H2O)21@C240. In this case, a stable cage-like structure hinders the overall dynamics of the trapped water molecules. Finally, we have calculated the hydrogen bond lifetimes from the hydrogen bond time correlation functions and compared them with that of bulk water.

12.

Background  

Classification of newly resolved protein structures is important in understanding their architectural, evolutionary and functional relatedness to known protein structures. Among various efforts to improve the database of Structural Classification of Proteins (SCOP), automation has received particular attention. Herein, we predict the deepest SCOP structural level that an unclassified protein shares with classified proteins with an equal number of secondary structure elements (SSEs).

13.
Residual dipolar couplings provide significant structural information for proteins in the solution state, which makes them attractive for the rapid determination of protein structures. While dipolar couplings contain inherent structural ambiguities, these can be reduced via an overlap similarity measure that insists that protein fragments assigned to overlapping regions of the sequence must have self-consistent structures. This allows us to determine a backbone fold (including the correct C–C bond orientations) using only residual dipolar coupling data from one ordering medium. The resulting backbone structures are of sufficient quality to allow for modeling of sidechain rotamer states using a rotamer prediction algorithm and a force field employing the Surface Generalized Born continuum solvation model. We demonstrate the applicability of the method using experimental data for ubiquitin. These results illustrate the synergies that are possible between protein structural database and molecular modeling methods and NMR spectroscopy, and we expect that the further development of these methods will lead to the extraction of high resolution structural information from minimal NMR data.

14.
MOTIVATION: Knots in polypeptide chains have been found in very few proteins, and consequently should be generally avoided in protein structure prediction methods. Most effective structure prediction methods do not model the protein folding process itself, but rather seek only to correctly obtain the final native state. Consequently, the mechanisms that prevent knots from occurring in native proteins are not relevant to the modeling process, and as a result, knots can occur with significantly higher frequency in protein models. Here we describe Knotfind, a simple algorithm for knot detection that is fast enough for structure prediction, where tens or hundreds of thousands of conformations may be sampled during the course of a prediction. We have used this algorithm to characterize knots in large populations of model structures generated for targets in CASP 5 and CASP 6 using the Rosetta homology-based modeling method. RESULTS: Analysis of CASP5 models suggested several possible avenues for introduction of knots into these models, and these insights were applied to structure prediction in CASP 6, resulting in a significant decrease in the proportion of knotted models generated. Additionally, using the knot detection algorithm on structures in the Protein Data Bank, a previously unreported deep trefoil knot was found in acetylornithine transcarbamylase. AVAILABILITY: The Knotfind algorithm is available in the Rosetta structure prediction program at http://www.rosettacommons.org.

15.
The rapid increase in sequence data in combination with a greater understanding of the forces regulating protein structure has been the impetus for an upsurge in the development of theoretical prediction methods. These methods have afforded protein chemists the ability to identify and quantify the various secondary structures along the protein chain. Concurrently, various physico-chemical techniques have been developed such as nuclear Overhauser enhancement n.m.r. and laser Raman spectroscopy. In addition, traditional methods such as infrared and circular dichroism spectroscopy have been refined. Although both predictive and physico-chemical techniques are limited in the types of secondary structure they are capable of determining, they have provided valuable information with regard to protein folding and topology in the absence of X-ray data, and have formed the basis for the development of improved methods for secondary structure determination. This paper reviews some of the predictive and physico-chemical methods presently used to determine protein secondary structure.

16.

Background  

Inference of remote homology between proteins is very challenging and remains a prerogative of an expert. Thus a significant drawback to the use of evolutionary-based protein structure classifications is the difficulty in assigning new proteins to unique positions in the classification scheme with automatic methods. To address this issue, we have developed an algorithm to map protein domains to an existing structural classification scheme and have applied it to the SCOP database.

17.
The local environment of an amino acid in a folded protein determines the acceptability of mutations at that position. In order to characterize and quantify these structural constraints, we have made a comparative analysis of families of homologous proteins. Residues in each structure are classified according to amino acid type, secondary structure, accessibility of the side chain, and existence of hydrogen bonds from the side chains. Analysis of the pattern of observed substitutions as a function of local environment shows that there are distinct patterns, especially for buried polar residues. The substitution data tables are available on diskette with Protein Science. Given the fold of a protein, one is able to predict sequences compatible with the fold (profiles or templates) and potentially to discriminate between a correctly folded and misfolded protein. Conversely, analysis of residue variation across a family of aligned sequences in terms of substitution profiles can allow prediction of secondary structure or tertiary environment.

18.
After decades of slow progress, the pace of research on membrane protein structures is beginning to quicken thanks to various improvements in technology, including protein engineering and microfocus X-ray diffraction. Here we review these developments and, where possible, highlight generic new approaches to solving membrane protein structures based on recent technological advances. Rational approaches to overcoming the bottlenecks in the field are urgently required as membrane proteins, which typically comprise ~30% of the proteomes of organisms, are dramatically under-represented in the structural database of the Protein Data Bank.

19.
Protein-biomineral interactions are paramount to materials production in biology, including the mineral phase of hard tissue. Unfortunately, the structure of biomineral-associated proteins cannot be determined by X-ray crystallography or solution nuclear magnetic resonance (NMR). Here we report a method for determining the structure of biomineral-associated proteins. The method combines solid-state NMR (ssNMR) and ssNMR-biased computational structure prediction. In addition, the algorithm is able to identify lattice geometries most compatible with ssNMR constraints, representing a quantitative, novel method for investigating crystal-face binding specificity. We use this method to determine most of the structure of human salivary statherin interacting with the mineral phase of tooth enamel. Computation and experiment converge on an ensemble of related structures and identify preferential binding at three crystal surfaces. The work represents a significant advance toward determining structure of biomineral-adsorbed protein using experimentally biased structure prediction. This method is generally applicable to proteins that can be chemically synthesized.

20.
MOTIVATION: Since the newly developed Grid platform is considered a powerful tool for sharing resources in the Internet environment, it is of interest to demonstrate an efficient methodology to process massive biological data in Grid environments at a low cost. This paper presents an efficient and economical method based on a Grid platform to predict secondary structures of all proteins in a given organism, which normally requires a long computation time through sequential execution, by means of processing a large amount of protein sequence data simultaneously. From the prediction results, a genome scale protein fold space can be pursued. RESULTS: Using the improved Grid platform, the secondary structure prediction on genomic scale and protein topology derived from the new scoring scheme for four different model proteomes was presented. This protein fold space was compared with structures from the Protein Data Bank database, and it showed a similarly aligned distribution. Therefore, the fold space approach based on this new scoring scheme could be a guideline for predicting a folding family in a given organism.
