Similar Documents
20 similar documents found.
1.
Abeln S, Deane CM. Proteins, 2005, 60(4): 690-700.
We review fold usage on completed genomes to explore protein structure evolution. The patterns of presence or absence of folds on genomes give us insights into the relationships between folds, the age of different folds, and how we have arrived at the set of folds we see today. We examine the relationships between different measures that describe protein fold usage, such as the number of copies of a fold per genome, the number of families per fold, and the number of genomes a fold occurs on. We obtained these measures of fold usage by searching for structural domains on 157 completed genome sequences from all three kingdoms of life. Comparing these measures, we found that bacteria have relatively more distinct folds on their genomes than archaea, and that eukaryotes have many more copies of a fold on their genomes. If we separate out the different fold classes, the alpha/beta class has relatively fewer distinct folds on large genomes, more copies of a fold in bacteria, and more folds occurring in all three kingdoms simultaneously. These results possibly indicate that most alpha/beta folds originated earlier than other folds. The expected power-law distribution is observed for copies of a fold per genome, and we found a similar distribution for the number of families per fold. However, a more complicated distribution appears for fold occurrence across genomes, which strongly depends on fold class and kingdom. We also show that there is no clear relationship between the three measures of fold usage: a fold that occurs on many genomes does not necessarily have many copies on each genome, and, similarly, folds with many copies do not necessarily have many families, or vice versa.
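
The three fold-usage measures above can be computed from a table of domain assignments. The sketch below is illustrative only: it assumes a hypothetical input of (genome, fold, family) tuples, one per structural domain copy found on a genome, and the genome, fold, and family labels are placeholders rather than data from the study. The domain-search step itself is not reproduced.

```python
from collections import defaultdict


def fold_usage(assignments):
    """Compute three fold-usage measures from domain assignments.

    assignments: iterable of (genome, fold, family) tuples, one tuple per
    domain copy found on a genome (a hypothetical input format).
    Returns (copies per (genome, fold), families per fold, genomes per fold).
    """
    copies = defaultdict(int)      # (genome, fold) -> copies of the fold on that genome
    families = defaultdict(set)    # fold -> distinct families observed in the fold
    genomes = defaultdict(set)     # fold -> genomes on which the fold occurs
    for genome, fold, family in assignments:
        copies[(genome, fold)] += 1
        families[fold].add(family)
        genomes[fold].add(genome)
    families_per_fold = {fold: len(fams) for fold, fams in families.items()}
    genomes_per_fold = {fold: len(gens) for fold, gens in genomes.items()}
    return dict(copies), families_per_fold, genomes_per_fold


# Placeholder assignments, not data from the study.
hits = [
    ("genome_A", "fold_1", "fam_a"),
    ("genome_A", "fold_1", "fam_b"),
    ("genome_B", "fold_1", "fam_a"),
    ("genome_B", "fold_2", "fam_c"),
]
copies, families_per_fold, genomes_per_fold = fold_usage(hits)
print(copies[("genome_A", "fold_1")], families_per_fold["fold_1"], genomes_per_fold["fold_2"])
# prints: 2 2 1
```

Keeping the three measures as separate dictionaries makes it easy to test, fold by fold, whether they are related at all, which is the question the abstract answers in the negative.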

2.
Proteins form arguably the most significant link between genotype and phenotype. Understanding the relationship between protein sequence and structure, and applying this knowledge to predict function, is difficult. One way to investigate these relationships is to consider the space of protein folds and how one might move from fold to fold through similarity or potential evolutionary relationships. The many individual characterisations of fold space presented in the literature can tell us much about how well the current Protein Data Bank represents protein fold space, how convergence and divergence may affect protein evolution, how proteins affect the systems of which they are part, and how proteins themselves function. A synthesis of these different approaches and viewpoints seems the most likely way to further our knowledge of protein structure evolution and thus facilitate improved protein structure design and prediction.

3.
In the fold recognition approach to structure prediction, a sequence is tested for compatibility with an already known fold. For membrane proteins, however, few folds have been determined experimentally. Here, the feasibility of computing the vast majority of likely membrane protein folds is tested. The results indicate that conformation space can be effectively sampled for small numbers of helices. The vast majority of potential monomeric membrane protein structures can be represented by about 30 folds for three helices, but this number increases exponentially to about 1,500,000 folds for seven helices. The generated folds could serve as templates for fold recognition or as starting points for conformational searches that are well distributed throughout conformation space.

4.
Many protein pairs that share the same fold have no detectable sequence similarity, providing a valuable source of information for studying sequence-structure relationships. In this study, we use a stringent data set of structurally similar, sequence-dissimilar protein pairs to characterize residues that may play a role in determining protein structure and/or function. For each protein in the database, we identify amino acid positions that show residue conservation within both close and distant family members. These positions are termed "persistently conserved". We then determine the "mutually" persistently conserved (MPC) positions: those structurally aligned positions in a protein pair that are persistently conserved in both pair mates. Because of their intra- and interfamily conservation, these positions are good candidates for determining protein fold and function. We find that 45% of the persistently conserved positions are mutually conserved. A significant fraction of them are located at positions critical for secondary structure determination; they are mostly buried, and many of them form spatial clusters within their protein structures. A substitution matrix based on the subset of MPC positions shows two distinct characteristics: (i) it differs from other available matrices, even those derived from structural alignments; (ii) its relative entropy is high, emphasizing the special residue restrictions imposed on these positions. Such a substitution matrix should be valuable for protein design experiments.
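
The relative entropy of a substitution matrix measures how far the observed pair frequencies depart from what independent residue choices would give: H = Σ_ab q_ab log2(q_ab / (p_a p_b)), where q_ab is the frequency of aligned pair (a, b) and p_a is the background frequency of residue a. The sketch below is a minimal version of this standard, BLOSUM-style definition, assuming the matrix is estimated by counting structurally aligned residue pairs (for example, at MPC positions); the residue pairs in the usage lines are illustrative, not the study's data.

```python
import math
from collections import Counter


def relative_entropy(aligned_pairs):
    """Relative entropy (bits) of a substitution matrix estimated from aligned residue pairs.

    aligned_pairs: iterable of (residue_a, residue_b) tuples, e.g. residues
    observed at the same structurally aligned position in two proteins.
    """
    counts = Counter()
    for a, b in aligned_pairs:
        counts[(a, b)] += 1   # count both orders so the matrix is symmetric
        counts[(b, a)] += 1
    total = sum(counts.values())
    q = {pair: n / total for pair, n in counts.items()}   # pair frequencies
    p = Counter()
    for (a, _), freq in q.items():
        p[a] += freq          # background frequency of residue a (row sum)
    return sum(freq * math.log2(freq / (p[a] * p[b])) for (a, b), freq in q.items())


# Strictly conserved pairs carry information; random pairing carries none.
print(relative_entropy([("L", "L"), ("V", "V")]))                           # 1.0
print(relative_entropy([("L", "V"), ("V", "L"), ("L", "L"), ("V", "V")]))   # 0.0
```

A high value, as reported for the MPC-based matrix, means the substitutions allowed at those positions are far more restricted than background residue frequencies alone would predict.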

5.
6.
Liu X, Fan K, Wang W. Proteins, 2004, 54(3): 491-499.
Currently, of the 10^6 known protein sequences, only about 10^4 structures have been solved. Based on homologies and similarities, proteins are grouped into families, each of which has a structural prototype, namely the fold; some families share the same fold. However, the total number of folds and families, and furthermore the distribution of folds over families in nature, remain an enigma. Here, we report a study of the distribution of folds over families and the total number of folds in nature, using a maximum probability principle and the moment method of estimation. A quadratic relation between the numbers of families and folds is found for numbers of families in the interval from 6000 to 30,000. For example, about 2700 folds are obtained for 23,100 families, among them about 33 superfolds that each include more than 100 families; the largest superfold comprises about 800 families. Our results suggest that although the majority of folds have only a single family per fold, a considerably larger number of folds include many more families each than in the database, and the distribution of folds over families in nature differs markedly from the sampled distribution. This article provides the first estimate of the long tail of the fold distribution. The results fit the data for different versions of the Structural Classification of Proteins (SCOP) very well, and goodness-of-fit tests strongly support them. In addition, the method of directly "enlarging" the sample to the population may be useful for inferring distributions of species in different fields.

7.
An alternative core packing group, involving a set of five positions, has been introduced into human acidic FGF-1. This alternative group was designed to constrain the primary structure within the core region to the same threefold symmetry present in the tertiary structure of the protein fold (the beta-trefoil superfold). The alternative core is essentially indistinguishable from the wild-type (WT) core with regard to structure, stability, and folding kinetics. The results show that the beta-trefoil superfold is compatible with a threefold-symmetric constraint on the core region, as might be the case if the superfold arose from gene duplication/fusion events. Furthermore, this new core arrangement can form the basis of a structural "building block" that can greatly simplify the de novo design of beta-trefoil proteins through symmetric structural complementarity. Remaining asymmetry within the core appears to be related to asymmetry in the tertiary structure associated with the receptor- and heparin-binding functionality of the growth factor.

8.
In previous studies designed to increase the primary structure symmetry within the hydrophobic core of human acidic fibroblast growth factor (FGF-1), a combination of five mutations was accommodated, resulting in structure, stability, and folding kinetic properties similar to wild-type (despite the symmetric constraint upon the set of core residues). A sixth mutation in the core, involving a highly conserved Met residue at position 67, appeared intolerant to substitution. Structural analysis suggested that the local packing environment of position 67 involved two regions of apparent insertions that distorted the tertiary structure symmetry inherent in the beta-trefoil architecture. It was postulated that a symmetric constraint upon the primary structure within the core could only be achieved after these insertions had been deleted (concomitantly increasing the tertiary structure symmetry). Deleting these insertions is now shown to permit mutation of position 67, thereby increasing the primary structure symmetry within the core. Furthermore, despite the imposed symmetric constraint upon both the primary and tertiary structure, the resulting mutant form of FGF-1 is substantially more stable. The apparent inserted regions are shown to be associated with heparin-binding functionality; however, despite a marked reduction in heparin-binding affinity, the mutant form of FGF-1 is, surprisingly, approximately 70 times more potent in 3T3 fibroblast mitogenic assays. The results support the hypothesis that primary structure symmetry within a symmetric protein superfold represents a possible solution to achieving a foldable polypeptide, rather than a constraint upon it.

9.
The globin family of protein structures was the first for which it was recognized that tertiary structure can be highly conserved even when primary sequences have diverged to a virtually undetectable level of similarity. This principle of structural inertia in molecular evolution is now evident for many other protein families. We have performed a systematic comparison of the sequences and structures of 6 representative hemoglobin subunits as diverse in origin as plants, clams, and humans. Our analysis is based on a 97-residue helical core common to all 6 structures. Amino acid sequence identities range from 12.4% to 42.3% in pairwise comparisons, and, despite these variations, the maximal RMS deviation in alpha-carbon positions is 3.02 Å. Overall, sequence similarity and structural deviation are significantly anticorrelated, with a correlation coefficient of -0.71, but for a set of structures having under 20% pairwise identity this anticorrelation falls to -0.38, which emphasizes the weak connection between a specific sequence and the tertiary fold. There is substantial structural variability outside the helical core, and the functional characteristics of these globins also differ appreciably. Nevertheless, despite the variations in detail that the sequence dissimilarities and functional differences imply, the core structures of these globins remain remarkably preserved.
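
The correlation coefficients quoted above are, presumably, Pearson coefficients between pairwise sequence identity and RMS deviation, computed once over all pairs and once over the low-identity subset. The sketch below assumes exactly that; the input format of (percent identity, RMSD in Å) tuples and any values passed to it are hypothetical, not the globin data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def r_below_identity(pairs, max_identity):
    """Correlation restricted to structure pairs below a pairwise identity cutoff.

    pairs: list of (percent_identity, rmsd_angstrom) tuples, one per structure pair
    (a hypothetical input format).
    """
    subset = [(ident, rmsd) for ident, rmsd in pairs if ident < max_identity]
    return pearson_r([ident for ident, _ in subset], [rmsd for _, rmsd in subset])
```

A negative r means higher identity goes with lower RMSD; recomputing it with `r_below_identity(pairs, 20.0)` mirrors the abstract's comparison of the full set with the under-20%-identity subset.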

10.
Signature sequences are contiguous patterns of amino acids 10-50 residues long that are associated with a particular structure or function in proteins. These may be of three types (by our nomenclature): superfamily signatures, remnant homologies, and motifs. We have performed a systematic search through a database of protein sequences to automatically and preferentially find remnant homologies and motifs. This was accomplished in three steps: (1) we generated a nonredundant sequence database; (2) we used BLAST3 (Altschul and Lipman, Proc. Natl. Acad. Sci. U.S.A. 87:5509-5513, 1990) to generate local pairwise and triplet sequence alignments for every protein in the database against every other; (3) we selected "interesting" alignments and grouped them into clusters. We find that most of the clusters contain segments from proteins that share a common structure or function. Many of them correspond to signatures previously noted in the literature. We discuss three previously recognized motifs in detail (FAD/NAD-binding, ATP/GTP-binding, and cytochrome b5-like domains) to demonstrate how the alignments generated by our procedure are consistent with previous work and make structural and functional sense. We also discuss two signatures (for N-acetyltransferases and glycerol-phosphate binding) that, to our knowledge, have not been previously recognized.
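
The abstract does not say how the selected alignments were grouped into clusters in step (3). One simple possibility is single-linkage clustering: any two segments joined by an "interesting" alignment end up in the same cluster. The union-find sketch below illustrates that assumption only; the segment identifiers (protein name plus residue range) are hypothetical.

```python
def cluster_alignments(alignments):
    """Group sequence segments into clusters by single linkage (union-find).

    alignments: iterable of (segment_a, segment_b) pairs, each pair being an
    alignment judged "interesting"; segments are any hashable IDs.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in alignments:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b        # merge the two clusters

    clusters = {}
    for segment in parent:
        clusters.setdefault(find(segment), set()).add(segment)
    return sorted(clusters.values(), key=sorted)


# Hypothetical segment IDs: the first two alignments chain three segments together.
print(cluster_alignments([
    ("p1:10-40", "p2:5-35"),
    ("p2:5-35", "p3:100-130"),
    ("p4:1-30", "p5:2-31"),
]))
```

Single linkage lets triplet or chained pairwise alignments pull distantly related segments into one group; the trade-off is that a single spurious alignment can merge two unrelated clusters, which is why the "interesting" filter in step (3) matters.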

11.
The interface of protein structural biology, protein biophysics, molecular evolution, and molecular population genetics forms the foundation for a mechanistic understanding of many aspects of protein biochemistry. Current efforts in interdisciplinary protein modeling are in their infancy, and the state of the art of such models is described. Beyond the relationship between amino acid substitution and static protein structure, protein function, and corresponding organismal fitness, other considerations are also discussed. More complex mutational processes, such as insertions and deletions, domain rearrangements, and even circular permutations, should be evaluated. The role of intrinsically disordered proteins is still controversial but may be increasingly important to consider. Protein geometry and protein dynamics, as a deviation from static considerations of protein structure, are also important. Protein expression level is known to be a major determinant of evolutionary rate, and several considerations, including selection at the mRNA level and the role of interaction specificity, are discussed. Lastly, the relationship between modeling and the needed high-throughput experimental data, as well as experimental examination of protein evolution using ancestral sequence resurrection and in vitro biochemistry, is presented, with the aim of ultimately generating better models for biological inference and prediction.

12.
Miller J, Zeng C, Wingreen NS, Tang C. Proteins, 2002, 47(4): 506-512.
Despite the variety of protein sizes, shapes, and backbone configurations found in nature, the design of novel protein folds remains an open problem. Within simple lattice models, it has been shown that not all structures are equally suitable for design. Rather, certain structures are distinguished by unusually high designability: the number of amino acid sequences for which they represent the unique lowest-energy state. Sequences associated with such structures possess both robustness to mutation and thermodynamic stability. Here we report that highly designable backbone conformations also emerge in a realistic off-lattice model. The highly designable conformations of a chain of 23 amino acids are identified and found to be remarkably insensitive to model parameters. Although some of these conformations correspond closely to known natural protein folds, such as the zinc finger and the helix-turn-helix motifs, others do not resemble known folds and may be candidates for novel fold design.
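
Designability, as defined above, can be computed by brute force in a toy setting. The sketch below is closer to the simple lattice models the abstract mentions than to the off-lattice model it reports: it assumes each candidate structure is given as a hypothetical set of residue contacts, sequences use a two-letter H/P alphabet, and the energy of a sequence on a structure is minus the number of H-H contacts. A structure is credited only when it is the unique lowest-energy state.

```python
from itertools import product


def designability(structures, length):
    """Count, for each structure, the H/P sequences whose unique ground state it is.

    structures: dict mapping a structure name to a set of (i, j) residue contacts
    (a toy contact-map representation, assumed for illustration).
    Energy of a sequence on a structure = -(number of H-H contacts).
    """
    counts = {name: 0 for name in structures}
    for seq in product("HP", repeat=length):       # enumerate all 2**length sequences
        energies = {
            name: -sum(1 for i, j in contacts if seq[i] == "H" and seq[j] == "H")
            for name, contacts in structures.items()
        }
        best = min(energies.values())
        winners = [name for name, e in energies.items() if e == best]
        if len(winners) == 1:                      # degenerate ground states do not count
            counts[winners[0]] += 1
    return counts


print(designability({"A": {(0, 3)}, "B": {(1, 2)}}, 4))   # {'A': 3, 'B': 3}
```

Exhaustive enumeration only works for very short chains; realistic studies such as the one summarized here have to sample sequence and conformation space instead.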

13.
We investigate, as a case study, the performance of combinatorial pattern discovery in detecting remote sequence similarities for a pair of distantly related families, in terms of both biological accuracy and computational efficiency. The two families are the cupredoxins and the multicopper oxidases, both of which contain blue copper-binding domains. These families present a challenging case because of low sequence similarity, different local structure, and variable sequence conservation at their copper-binding active sites. In this study, we investigate a new approach, based on combinatorial pattern discovery, for automatically identifying weak sequence similarities. We compare its performance with a traditional HMM-based scheme and obtain estimates of sensitivity and specificity for the two approaches. Our analysis suggests that pattern discovery methods can be substantially more sensitive in detecting remote protein relationships while at the same time guaranteeing high specificity.
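
Sensitivity and specificity follow their standard definitions: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The sketch below computes both from sets of sequence identifiers. The benchmark protocol used in the study is not described here, so the set-based interface and the example IDs are assumptions for illustration.

```python
def sensitivity_specificity(predicted, related, tested):
    """Return (sensitivity, specificity) for a detection method.

    predicted: set of IDs the method flagged as related to the query family
    related:   set of IDs that truly belong to the family
    tested:    set of all IDs that were searched
    """
    tp = len(predicted & related)        # related and detected
    fn = len(related - predicted)        # related but missed
    unrelated = tested - related
    fp = len(predicted & unrelated)      # flagged but unrelated
    tn = len(unrelated - predicted)      # correctly ignored
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical benchmark: 10 sequences, 4 truly related, method flags 4 of which 3 are right.
print(sensitivity_specificity({1, 2, 3, 5}, {1, 2, 3, 4}, set(range(1, 11))))
```

Reporting both numbers together matters for remote-homology detection, since a method can trivially raise sensitivity by flagging more sequences at the cost of specificity.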

14.
It is an open question whether nature has utilized all possible protein folds. For a simple protein architecture, the helical repeats, we report a method to address this question based on a mapping between the set of repetitive curves and a space of parameters specifying the curve. Exploring the parameter space for a particular architecture enables a systematic exploration of the fold space for that protein architecture. In a planar subspace of the parameter space of helical repeats, we have identified points corresponding both to naturally occurring folds and to potential folds not observed so far.

15.
A substantial fraction of protein sequences derived from genomic analyses is currently classified as 'hypothetical proteins of unknown function'. In part, this reflects the limitations of methods for comparing sequences with very low identity. We evaluated the effectiveness of a Psi-BLAST search strategy for identifying proteins of similar fold at low sequence identity. Psi-BLAST searches for structurally characterized low-sequence-identity matches were carried out on a set of over 300 proteins of known structure. Searches were conducted in NCBI's non-redundant database and were limited to three rounds. Some 614 potential homologs with 25% or lower sequence identity to 166 members of the search set were obtained. Disregarding the expect value, level of sequence identity, and span of alignment, the fold of the target and that of the potential homolog corresponded in more than 95% of the Psi-BLAST matches. Restrictions on expect value or span of alignment improved the false positive rate at the expense of eliminating many true homologs. Approximately three-quarters of the putative homologs obtained by three rounds of Psi-BLAST revealed no significant sequence similarity to the target protein upon direct sequence comparison by BLAST, and therefore could not be found by a conventional search. Although three rounds of Psi-BLAST identified many more homologs than a standard BLAST search, most homologs remained undetected. It appears that more than 80% of all homologs of a target protein may be characterized by a lack of significant sequence similarity. We suggest that conservative use of Psi-BLAST has the potential to propose experimentally testable functions for the majority of proteins currently annotated as 'hypothetical proteins of unknown function'.

16.
High divergence in protein sequences makes the detection of distant protein relationships through homology-based approaches challenging. Grouping protein sequences into families, through similarities in either sequence or 3-D structure, facilitates the recognition of protein relationships. In addition, strategically designed protein-like sequences have been shown to bridge distant structural domain families by serving as artificial linkers. In this study, we augmented a search database of known protein domain families with such designed sequences, with the intention of providing functional clues for domain families of unknown structure. When assessed using representative query sequences from each family, we obtain a success rate of 94% for protein domain families of known structure. Further, we demonstrate that the augmented search space enabled fold recognition for 582 families with no structural information available a priori. Additionally, we were able to provide reliable functional relationships for 610 orphan families. We discuss the application of our method to predicting functional roles through selected examples: DUF4922, DUF5131, and DUF5085. Our approach also detects new associations between families that were previously not known to be related, as demonstrated by new sub-groups of the RNA polymerase domain among three distinct RNA viruses. Taken together, search databases augmented with designed sequences enable the detection of meaningful relationships between distant protein families. In turn, they enable fold recognition and offer reliable pointers to potential functional sites that may be probed further through direct mutagenesis studies.

17.
Classification is central to many studies of protein structure, function, and evolution. This article presents a strategy for classifying protein three-dimensional structures. Methods for, and issues related to, secondary structure, domain, and class assignment are discussed, in addition to methods for comparing protein three-dimensional structures. Strategies for assigning protein domains to particular folds and homologous superfamilies are then described in the context of the currently available classification schemes. Two examples (adenylate cyclase/DNA polymerase and glycogen phosphorylase/β-glucosyltransferase) are presented to illustrate problems associated with protein classification.

18.
Analysis of the results of the recent protein structure prediction experiment for our method shows that we achieved a high level of success. Of the 18 available prediction targets of known structure, the assessors identified 11 chains that either entirely match a previously known fold or partially match a substantial region of a known fold. Of these 11 chains, we made predictions for 9 and correctly assigned the folds in 5 cases. We also identified a further 2 chains that partially match known folds, and both of these were correctly predicted. The success rate for our method under blind testing is therefore 7 out of 11 chains. A further 2 folds could easily have been recognized but were missed because of either overzealous filtering of potential matches or simple human error on our part. One of the two targets for which we did not submit a prediction, prosubtilisin, would not have been recognized by our usual criteria, but even in this case a correct prediction could possibly have been made by considering a combination of pairwise energy and solvation energy Z-scores. Inspection of the threading alignments for the (αβ)8 barrels provides clues as to how fold recognition by threading works, in that these folds are recognized by parts rather than as a whole. The prospects for developing sequence threading technology further are discussed. © 1995 Wiley-Liss, Inc.
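
In threading, a Z-score expresses how far a sequence's energy on one fold lies below its energies on the other folds in the library: Z = (E - mean(E_background)) / sd(E_background), so more negative is better when lower energy is more favourable. The sketch below computes such Z-scores for the pairwise and solvation terms. How the two were combined for prosubtilisin is not stated in the abstract; simple summation is an assumption here, and the energy values in the usage lines are made up.

```python
import statistics


def z_score(energy, background):
    """Z-score of a threading energy against energies of the same sequence on other folds.

    Lower (more negative) energy is assumed to be more favourable, so a strongly
    negative Z-score marks a fold that stands out from the library.
    """
    mu = statistics.mean(background)
    sd = statistics.stdev(background)   # sample standard deviation
    return (energy - mu) / sd


def combined_z(pair_energy, pair_background, solv_energy, solv_background):
    # Assumption: combine the two terms by simply summing their Z-scores.
    return z_score(pair_energy, pair_background) + z_score(solv_energy, solv_background)


# Made-up energies: the pairwise term stands out strongly, the solvation term mildly.
print(z_score(-10, [0, 2, 4]))                      # -6.0
print(combined_z(-10, [0, 2, 4], 0, [0, 2, 4]))     # -7.0
```

Summing Z-scores treats the two terms as equally informative; a weighted sum would be the obvious variant if one term proved more reliable.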

19.
Based on a study involving structural comparisons of proteins sharing 25% or less sequence identity, three rounds of Psi-BLAST appear capable of identifying remote evolutionary homologs with greater than 95% confidence, provided that more than 50% of the query sequence can be aligned with the target sequence. Since it seems that more than 80% of all homologous protein pairs may be characterized by a lack of significant sequence similarity, the experimental biologist often receives little guidance from conventional homology searches based on pairwise sequence comparisons. The ability to disregard levels of sequence identity and expect value in Psi-BLAST, provided at least 50% of the query sequence has been aligned, allows new hypotheses to be generated by considering matches that are conventionally disregarded. In one example, we suggest a possible evolutionary link between the cupredoxin and immunoglobulin fold families. In a second, a thermostable hypothetical protein of unknown function may be a circularly permuted homolog of phosphotriesterase, an enzyme capable of detoxifying organophosphate nerve agents. In a third example, the amino acid sequence of another hypothetical protein of unknown function reveals the ATP-binding site, metal-binding site, and catalytic side chain consistent with kinase activity of unknown specificity. This approach significantly expands the utility of existing sequence data for defining the primary structure degeneracy of binding sites for substrates, cofactors, and other proteins.
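
The acceptance rule described above (keep a Psi-BLAST match if more than 50% of the query is aligned, regardless of E-value or percent identity) amounts to a simple post-filter on the hit list. The sketch below assumes hits have already been parsed into a hypothetical `Hit` record with query start and end coordinates, taken as 1-based and inclusive as in BLAST tabular output; it does not run or parse Psi-BLAST itself, and the target names are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """A parsed search hit (hypothetical record, not a BLAST library type)."""
    target: str
    query_start: int   # 1-based, inclusive
    query_end: int     # 1-based, inclusive
    evalue: float


def query_coverage(hit, query_length):
    """Fraction of the query sequence covered by the alignment."""
    return (hit.query_end - hit.query_start + 1) / query_length


def candidate_homologs(hits, query_length, min_coverage=0.5):
    """Keep hits covering more than min_coverage of the query; E-value is deliberately ignored."""
    return [h.target for h in hits if query_coverage(h, query_length) > min_coverage]


hits = [
    Hit("target_a", 1, 150, 5.0),      # 75% coverage, poor E-value: kept
    Hit("target_b", 10, 100, 1e-30),   # 45.5% coverage, excellent E-value: dropped
    Hit("target_c", 50, 150, 0.9),     # 50.5% coverage: kept
]
print(candidate_homologs(hits, query_length=200))   # ['target_a', 'target_c']
```

Filtering on coverage rather than E-value is what lets conventionally disregarded matches, such as the cupredoxin/immunoglobulin link above, surface as testable hypotheses.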

20.

Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417