Similar articles
1.
Many protein classification systems capture homologous relationships by grouping domains into families and superfamilies on the basis of sequence similarity. Superfamilies with similar 3D structures are further grouped into folds. In the absence of discernible sequence similarity, these structural similarities were long thought to have originated independently, by convergent evolution. However, the growth of databases and advances in sequence comparison methods have led to the discovery of many distant evolutionary relationships that transcend the boundaries of superfamilies and folds. To investigate the contributions of convergent versus divergent evolution to the origin of protein folds, we clustered representative domains of known structure by their sequence similarity, treating them as point masses in a virtual 2D space that attract or repel each other depending on their pairwise sequence similarities. As expected, families in the same superfamily form tight clusters. But often, superfamilies of the same fold are linked with each other, suggesting that the entire fold evolved from an ancient prototype. Strikingly, some links connect superfamilies with different folds. They arise from modular peptide fragments of between 20 and 40 residues that co-occur in the connected folds in disparate structural contexts. These may be descendants of an ancestral pool of peptide modules that evolved as cofactors in the RNA world and from which the first folded proteins arose by amplification and recombination. Our galaxy of folds summarizes, in a single image, most known and many yet undescribed homologous relationships between protein superfamilies, providing new insights into the evolution of protein domains.
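The "point masses that attract or repel" idea can be illustrated with a minimal force-directed layout, in the spirit of sequence-clustering tools such as CLANS. This is a sketch under stated assumptions, not the authors' implementation: the function name `force_layout`, the spring-like attraction proportional to similarity, the 1/d² repulsion between every pair, the step size, the displacement cap, and the deterministic circular starting layout are all illustrative choices.

```python
import math

def force_layout(sim, steps=1000, dt=0.05, repulsion=0.05, max_move=0.1):
    """Place n items in 2D so that similar items end up close together.

    sim: symmetric n x n list of pairwise similarities in [0, 1]
         (hypothetical input, e.g. derived from sequence-comparison scores).
    Returns a list of (x, y) coordinates.
    """
    n = len(sim)
    # Deterministic start (an assumption): items evenly spaced on a circle.
    pos = [[0.5 * math.cos(2 * math.pi * i / n),
            0.5 * math.sin(2 * math.pi * i / n)] for i in range(n)]
    for _ in range(steps):
        moves = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                dx = pos[j][0] - pos[i][0]
                dy = pos[j][1] - pos[i][1]
                d = max(math.hypot(dx, dy), 1e-3)  # avoid division by zero
                # Positive f pulls i toward j: a spring whose strength is the
                # similarity, minus a 1/d^2 repulsion acting between all pairs.
                f = sim[i][j] * d - repulsion / d ** 2
                moves[i][0] += dt * f * dx / d
                moves[i][1] += dt * f * dy / d
        for i in range(n):
            mx, my = moves[i]
            step = math.hypot(mx, my)
            if step > max_move:  # cap each displacement for numerical stability
                mx, my = mx * max_move / step, my * max_move / step
            pos[i][0] += mx
            pos[i][1] += my
    return [tuple(p) for p in pos]
```

With two pairs of mutually similar items (similarity matrix `[[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]`), each pair settles at the distance where attraction balances repulsion (about 0.05^(1/3) ≈ 0.37 here), while the unrelated pairs keep drifting apart, so clusters emerge without any explicit clustering step.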

2.
Most homologous pairs of proteins have no significant sequence similarity to each other and are not identified by direct sequence comparison or profile-based strategies. However, multiple sequence alignments of low-similarity homologues typically reveal a limited number of positions that are well conserved despite diversity of function. Conservation at most of these positions can be inferred to reflect the contribution of these amino acids to the folding and stability of the protein. As such, these amino acids and their relative positions may define a structural signature. We demonstrate that extracting this fold template provides a basis for searching the sequence database for patterns consistent with the fold, enabling identification of homologs that are not recognized by global sequence analysis. The fold template method was developed to address the need for a tool that could comprehensively search the midnight and twilight zones of protein sequence similarity without relying on global statistical significance. Manual implementations of the fold template method were performed on three folds: immunoglobulin, c-lectin and TIM barrel. Following proof of concept of the template method, an automated version of the approach was developed and used to build fold templates for 10 of the more populated folds in the SCOP database. The method produced three-dimensional structural motifs, or signatures, that returned a diverse collection of proteins while maintaining a low false-positive rate. Although the results of the manual fold template method were more comprehensive than those of the automated method, the diversity of the automated method's results surpassed that of current methods that rely on statistical significance to infer evolutionary relationships among divergent proteins.
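The extraction step ("positions that are well conserved" turned into "patterns consistent with the fold") can be sketched in a simplified, sequence-only form. This is not the published fold template method, which derives three-dimensional structural signatures; the residue classes in `CLASSES`, the conservation `threshold`, the gap `slack`, and the function names are all hypothetical choices made for illustration.

```python
import re

# Hypothetical residue classes; real fold templates may group residues differently.
CLASSES = {"G": "G", "P": "P", "hydrophobic": "AVILMFWYC"}

def extract_template(msa, threshold=0.9):
    """Return (column, class_name) for alignment columns in which at least
    `threshold` of the non-gap residues fall into a single residue class."""
    template = []
    for col in range(len(msa[0])):
        residues = [s[col] for s in msa if s[col] != "-"]
        if not residues:
            continue
        for name, members in CLASSES.items():
            frac = sum(r in members for r in residues) / len(residues)
            if frac >= threshold:
                template.append((col, name))
                break  # first matching class wins
    return template

def template_to_regex(template, slack=1):
    """Turn conserved columns into a regex; each gap between consecutive
    conserved positions may vary by +/- `slack` residues."""
    parts = []
    for k, (col, name) in enumerate(template):
        if k > 0:
            gap = col - template[k - 1][0] - 1
            lo, hi = max(0, gap - slack), gap + slack
            if hi > 0:
                parts.append(f".{{{lo},{hi}}}")
        parts.append(f"[{CLASSES[name]}]")
    return "".join(parts)
```

For the toy alignment `["AGVLKD", "SGILRE", "TGVFKD"]`, columns 1 (G) and 2 to 3 (hydrophobic) are conserved, and the resulting pattern can be scanned against unaligned sequences with `re.search`. Allowing gap slack is what lets such a pattern tolerate the insertions and deletions that separate distant homologs.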

3.
We suspect that there is a level of granularity of protein structure intermediate between the classical levels of “architecture” and “topology,” as reflected in such phenomena as extensive three-dimensional structural similarity above the level of (super)folds. Here, we examine this notion of architectural identity despite topological variability, starting with a concept that we call the “Urfold.” We believe that this model could offer a new conceptual approach to protein structural analysis and classification: indeed, the Urfold concept may help reconcile various phenomena that have been recognized or debated for years, such as the precise meaning of “significant” structural overlap and the degree of continuity of fold space. More broadly, the role of structural similarity in sequence-structure-function evolution has been studied via many models over the years; by addressing a conceptual gap that we believe exists between the architecture and topology levels of structural classification schemes, the Urfold may eventually help synthesize these models into a generalized, consistent framework. We begin by qualitatively introducing the concept.

4.
Alexander V. Efimov. Proteins 2017, 85(10):1925-1930
In this study, structural motifs that can be represented as combinations of small motifs, such as β-hairpins, S- and Z-like β-sheets, βαβ-units, and the П-like module, are described and analyzed. The П-module consists of connected elements of the β-strand-loop-β-strand type arranged in space so that its overall fold resembles a clip or the Greek letter П. In proteins, the П-module itself and the structural motifs containing it exhibit unique overall folds and have specific sequence patterns of key hydrophobic, hydrophilic, and glycine residues. Taken together, these features lead us to conclude that these structural motifs can fold independently of the rest of the molecule and can act as nuclei and/or “ready-made” building blocks in protein folding.

5.
We give an overview of the emerging field of structural genomics, describing how genomes can be compared in terms of protein structure. As the number of genes in a genome and the total number of protein folds are both quite limited, these comparisons take the form of surveys of a finite parts list, similar in some respects to demographic censuses. Fold surveys have many similarities with other whole-genome characterizations, e.g., analyses of motifs or pathways. However, structure has a number of aspects that make it particularly suitable for comparing genomes, namely the way it allows for the precise definition of a basic protein module and the fact that it has a better-defined relationship to sequence similarity than does protein function. An essential requirement for a structure survey is a library of folds, which groups the known structures into 'fold families.' This library can be built up automatically using a structure comparison program, and we describe why objective statistical measures are important for assessing similarities within the library and between the library and genome sequences. After building the library, one can use it to count the number of folds in genomes, expressing the results in the form of Venn diagrams and 'top-10' statistics for shared and common folds. Depending on the counting methodology employed, these statistics can reflect different aspects of the genome, such as the amount of internal duplication or gene expression. Previous analyses have shown that the common folds shared between very different microorganisms, i.e., in different kingdoms, have a remarkably similar structure, being composed of repeated strand-helix-strand super-secondary structure units. A major difficulty with this sort of 'fold-counting' is that only a small subset of the structures in a complete genome is currently known, and this subset is prone to sampling bias.
One way of overcoming biases is through structure prediction, which can be applied uniformly and comprehensively to a whole genome. Various investigators have, in fact, already applied many of the existing techniques for predicting secondary structure and transmembrane (TM) helices to the recently sequenced genomes. The results have been consistent: microbial genomes have similar fractions of strands and helices even though they have significantly different amino acid compositions. The fraction of membrane proteins with a given number of TM helices falls off rapidly with more TM elements, approximately according to a Zipf law. This latter finding indicates that there is no preference for the highly studied 7-TM proteins in microbial genomes. Continuously updated tables and further information pertinent to this review are available over the web at http://bioinfo.mbb.yale.edu/genome.
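To make "approximately according to a Zipf law" concrete: if N(k) proteins have k TM helices, a Zipf-like fall-off means N(k) ≈ C·k^(-a), so the exponent a can be estimated as minus the slope of a straight-line fit of log N(k) against log k. The sketch below does that fit with ordinary least squares; the function name and the synthetic counts used to check it are illustrative and are not data from the review.

```python
import math

def fit_zipf_exponent(counts):
    """Estimate a in counts[k-1] ~ C * k**(-a), where counts[k-1] is the number
    of proteins with k TM helices (k = 1, 2, ...). Uses a least-squares line
    through (log k, log count); zero counts are skipped."""
    pts = [(math.log(k), math.log(c))
           for k, c in enumerate(counts, start=1) if c > 0]
    if len(pts) < 2:
        raise ValueError("need at least two nonzero counts")
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    return -sxy / sxx  # the fitted slope is -a
```

For synthetic counts generated exactly as 1000/k² (k = 1 to 7), the function recovers an exponent of 2. Fitting in log-log space is the simplest choice; with real, noisy counts a maximum-likelihood fit would usually be preferred.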

6.
Protein fold classification often assumes that similarity in primary, secondary, or tertiary structure signifies a common evolutionary origin. However, when similarity is not obvious, it is sometimes difficult to conclude that particular proteins are completely unrelated. Clearly, a set of organizing principles that is independent of traditional classification could be valuable in linking different structural motifs and identifying common ancestry among seemingly disparate folds. Here, a four-dimensional ensemble-based energetic space spanned by a diverse set of proteins was defined, and its characteristics were contrasted with those of Cartesian coordinate space. Eigenvector decomposition of this energetic space revealed the dominant physical processes contributing to the more or less stable regions of a protein. Unexpectedly, those processes were identical for proteins with different secondary structure content and also among different amino acid types. These results have two implications. First, they indicate that excited conformational states comprising the protein native-state ensemble, largely invisible upon inspection of the high-resolution structure, are the major determinant of the energetic space. Second, they suggest that folds dissimilar in sequence or structure could nonetheless be energetically similar if their respective excited conformational states are considered; one example was observed in the N-terminal region of the Arc repressor switch mutant. Taken together, these results provide a surface area-based framework for understanding folds in energetic terms, a framework that may eventually yield a means of identifying common ancestry among structurally dissimilar proteins.

7.
We have developed a method of searching for similar spatial arrangements of atoms around a given chemical moiety in proteins that bind a common ligand. The first step in this method is to consider a set of atoms that closely surround a given chemical moiety. Then, to compare the spatial arrangements of such surrounding atoms in different proteins, they are translated and rotated so that the chemical moieties are superposed on each other. Spatial arrangements of surrounding atoms in a pair of proteins are judged to be similar when many corresponding atoms occupy similar spatial positions. Because the method focuses on the arrangements of surrounding atoms, it can detect structural similarities of binding sites in proteins that are dissimilar in their amino acid sequences or in their chain folds. We have applied this method to identify modes of nucleotide base recognition by proteins. An all-against-all comparison of the arrangements of atoms surrounding adenine moieties revealed an unexpected structural similarity at the adenine-binding sites of two protein kinases, cAMP-dependent protein kinase (cAPK) and casein kinase-1 (CK1), and D-Ala:D-Ala ligase (DD-ligase), despite a lack of similarity in their chain folds. The similar local structure consists of a four-residue segment and three sequentially separated residues. In particular, the four-residue segments of these enzymes were found to have nearly identical backbone conformations, and these backbone parts are involved in the recognition of adenine. This common local structure was also found in substrate-free three-dimensional structures of other protein kinases and of other proteins that are similar to DD-ligase in chain fold. As proteins with different folds were found to share a common local structure, they appear to constitute a remarkable example of convergent evolution toward the same recognition mechanism. Received: 9 December 1996 / Accepted: 7 February 1997
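The similarity criterion above (many corresponding atoms occupying similar positions once the common moiety is superposed) can be sketched as a simple count. This is an illustrative sketch, not the authors' scoring scheme: it assumes the superposition has already been done so both atom sets share one coordinate frame, treats two atoms as corresponding only if they have the same element and lie within a hypothetical 1.0 Å cutoff, and pairs atoms greedily by nearest distance.

```python
import math

def count_corresponding_atoms(site_a, site_b, cutoff=1.0):
    """Count corresponding atoms between two already-superposed binding sites.

    site_a, site_b: lists of (element, x, y, z) for the atoms surrounding the
    shared ligand moiety, expressed in a common frame (assumed precomputed).
    Each atom of A is paired with the nearest unused atom of B that has the
    same element and lies within `cutoff` angstroms (illustrative rule).
    """
    used = set()
    matches = 0
    for elem_a, *xyz_a in site_a:
        best, best_d = None, cutoff
        for j, (elem_b, *xyz_b) in enumerate(site_b):
            if j in used or elem_b != elem_a:
                continue
            d = math.dist(xyz_a, xyz_b)
            if d <= best_d:
                best, best_d = j, d
        if best is not None:
            used.add(best)  # each atom of B may correspond to only one atom of A
            matches += 1
    return matches
```

Because only atom positions around the moiety enter the count, two sites can score highly even when the surrounding chain folds are unrelated, which is what allows this kind of comparison to detect the convergent adenine-binding sites described above.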

8.
The 'immunoglobulin-like' fold is one of the most common structural motifs observed in proteins. This topology is found in more than 80 superfamilies of proteins, including Cu,Zn-superoxide dismutase (SOD) and cupredoxin. Evolutionary relationships among them have not been identified but may exist. The challenge therefore remains to resolve whether the wide distribution of the fold is accounted for by divergent evolution of function or by convergent evolution of structure following multiple independent origins of function. Since the early studies that revealed the conformational similarity of immunoglobulins and other proteins, the number of primary structures available for comparison has dramatically increased, and new computational approaches for the analysis of sequences have been developed. It now appears that a hypothesis of a common evolutionary origin for cupredoxins, Cu,Zn-SOD, and immunoglobulins may be credible. The distinction between protein homology and protein analogy is fundamental, and the immunoglobulin-like fold may represent a robust system within which to re-examine the issue of protein homology versus analogy.

9.
Disulfide-rich domains are small protein domains whose global folds are stabilized primarily by the formation of disulfide bonds and, to a much lesser extent, by secondary structure and hydrophobic interactions. Disulfide-rich domains perform a wide variety of roles, functioning as growth factors, toxins, enzyme inhibitors, hormones, pheromones, allergens, etc. These domains are commonly found both as independent (single-domain) proteins and as domains within larger polypeptides. Here, we present a comprehensive structural classification of approximately 3000 small, disulfide-rich protein domains. We find that these domains can be arranged into 41 fold groups on the basis of structural similarity. Our fold groups, which describe broader structural relationships than existing groupings of these domains, bring together representatives with previously unacknowledged similarities; 18 of the 41 fold groups include domains from several SCOP folds. Within the fold groups, the domains are assembled into families of homologs. We define 98 families of disulfide-rich domains, some of which include newly detected homologs, particularly among knottin-like domains. On the basis of this classification, we have examined cases of convergent and divergent evolution of functions performed by disulfide-rich proteins. Disulfide bonding patterns in these domains are also evaluated: reducible disulfide bonding patterns are much less frequent, and symmetric patterns more common, than expected from random considerations. Examples of variations in disulfide bonding patterns found within families and fold groups are discussed.

10.
A new method to analyze the similarity between multiply aligned protein motifs (blocks) was developed. It identifies sets of consistently aligned blocks. These are found to be protein regions of similar function and structure that appear in different contexts. For example, the Rossmann fold ligand-binding region is found to be similar to TIM barrel and methylase regions, various protein families are predicted to have a TIM-barrel fold, and the structural relation between the ClpP protease and crotonase folds is identified from their sequences. Besides identifying local structure features, sequence similarity across short regions (fewer than 20 amino acids) also predicts structural similarity of whole domains (folds) a few hundred amino acid residues long. Most of these relations could not be identified by other advanced sequence-to-sequence or sequence-to-multiple-alignment comparisons. We describe the method (termed CYRCA), present examples of our findings, and discuss their implications.
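The abstract does not state CYRCA's exact consistency criterion, so the following is only an illustration of what "consistently aligned" can mean for three blocks A, B, and C: following the A→B and B→C alignments around the cycle should land on the same position as the direct A→C alignment. The dictionary representation of an alignment (position in one block mapped to position in the other) and the function name are assumptions.

```python
def cycle_consistency(ab, bc, ac):
    """Fraction of A positions whose alignment agrees around the cycle A->B->C
    versus the direct alignment A->C.

    ab, bc, ac: dicts mapping aligned positions (hypothetical representation),
    e.g. ab[i] = j means position i of block A is aligned to position j of B.
    Only positions present in all three alignments are checked.
    """
    checked = consistent = 0
    for i, j in ab.items():
        if j in bc and i in ac:
            checked += 1
            if bc[j] == ac[i]:
                consistent += 1
    return consistent / checked if checked else 0.0
```

For example, if two of three shared positions agree around the cycle, the score is 2/3; a set of blocks could then be called consistent when every triplet scores above some chosen threshold.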

11.
To understand the molecular basis of the catalytic mechanism of glycosyltransferases (GTFs), extensive structural information is required. Here, fold recognition methods were employed to assign 3D protein shapes (folds) to the currently known GTF sequences available in public databases such as GenBank and Swiss-Prot. First, GTF sequences were retrieved and classified into clusters based on sequence similarity only. Intracluster sequence similarity was chosen to be high enough to ensure that the same fold is found within a given cluster. Then, a representative sequence from each cluster was selected to compose a subset of GTF sequences. The members of this reduced set were processed by three different fold recognition methods: 3D-PSSM, FUGUE, and GeneFold. Finally, the results from the different fold recognition methods were analyzed and compared with those of sequence-similarity search methods (i.e., BLAST and PSI-BLAST). The folds of about 70% of all currently known GTF sequences can be confidently assigned by fold recognition methods, a higher rate than fold identification based on sequence comparison alone (48% for BLAST and 64% for PSI-BLAST). The identified folds were submitted to 3D clustering, and we found that most GTF sequences adopt the typical GTF A or GTF B folds. We found no evidence that new GTF folds (i.e., folds other than GTF A and B) exist. Based on cases where fold identification was not possible, we suggest several sequences as the most promising targets for a structural genomics initiative focused on the GTF protein family.

12.
Garces RG, Wu N, Gillon W, Pai EF. The EMBO Journal 2004, 23(8):1688-1698
The cyanobacterial clock proteins KaiA and KaiB have been proposed as regulators of the circadian rhythm in cyanobacteria. Mutations in both proteins have been reported to alter or abolish circadian rhythmicity. Here, we present molecular models of both KaiA and KaiB from the cyanobacterium Anabaena sp. PCC 7120 deduced by crystal structure analysis, and we discuss how clock-changing or clock-abolishing mutations may cause the resulting circadian phenotypes. The overall fold of the KaiA monomer is that of a four-helix bundle. KaiB, on the other hand, adopts an alpha-beta meander motif. Both proteins purify and crystallize as dimers. While the folds of the two proteins are clearly different, their size and some surface features of the physiologically relevant dimers are very similar. Notably, the functionally relevant residues Arg 69 of KaiA and Arg 23 of KaiB align well in space. These apparent structural similarities suggest that KaiA and KaiB may compete for a potential common binding site on KaiC.

13.
Comparing two remotely similar structures is a difficult problem: more often than not, the resulting structure alignments show ambiguities, and a unique answer usually does not even exist. In addition, alignments in general have limited information content because every aligned residue is considered equally important. To partly address these issues, one can take the perspective of a whole group of similar structures and then evaluate common structural features. Here, we describe a consistency approach that, although it does not actually perform a multiple structure alignment, produces the information one would conceivably want from such an experiment: the key structural features of the group (e.g., a fold), which in this case are projected onto either a pair of proteins or a single protein. Both representations are useful for a number of applications, ranging from the detection of (partially) wrong structure alignments to protein structure classification and fold recognition. To demonstrate some of these applications, the procedure was applied to 195 SCOP folds containing a total of 1802 domains sharing very low sequence similarity.

14.
We have determined consensus protein-fold classifications on the basis of three classification methods, SCOP, CATH, and Dali. These classifications make use of different methods of defining and categorizing protein folds that lead to different views of protein-fold space. Pairwise comparisons of domains on the basis of their fold classifications show that much of the disagreement between the classification systems is due to differing domain definitions rather than assigning the same domain to different folds. However, there are significant differences in the fold assignments between the three systems. These remaining differences can be explained primarily in terms of the breadth of the fold classifications. Many structures may be defined as having one fold in one system, whereas far fewer are defined as having the analogous fold in another system. By comparing these folds for a nonredundant set of proteins, the consensus method breaks up broad fold classifications and combines restrictive fold classifications into metafolds, creating, in effect, an averaged view of fold space. This averaged view requires that the structural similarities between proteins having the same metafold be recognized by multiple classification systems. Thus, the consensus map is useful for researchers looking for fold similarities that are relatively independent of the method used to compare proteins. The 30 most populated metafolds, representing the folds of about half of a nonredundant subset of the PDB, are presented here. The full list of metafolds is presented on the Web.
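The abstract does not spell out the consensus rule, so the following is a minimal sketch under one plausible assumption: two domains are linked when at least `min_agree` of the classification systems place both in the same fold, and metafolds are the connected components of these links, found with a union-find structure. The input format, the threshold of two systems, and the function name are illustrative.

```python
from itertools import combinations

def metafolds(assignments, min_agree=2):
    """Group domains into consensus 'metafolds'.

    assignments: {system_name: {domain_id: fold_label}} (hypothetical format).
    Two domains are linked if at least `min_agree` systems assign both to the
    same fold; metafolds are connected components of these links.
    """
    domains = sorted(set().union(*(a.keys() for a in assignments.values())))
    parent = {d: d for d in domains}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for d1, d2 in combinations(domains, 2):
        agree = sum(1 for a in assignments.values()
                    if d1 in a and d2 in a and a[d1] == a[d2])
        if agree >= min_agree:
            parent[find(d1)] = find(d2)

    groups = {}
    for d in domains:
        groups.setdefault(find(d), []).append(d)
    return sorted(groups.values())
```

Requiring agreement from several systems splits a fold that only one system defines broadly, while merging domains that two or more systems group together, which mirrors the "break up broad, combine restrictive" behaviour described above.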

15.
BACKGROUND: Structures that have diverged from a common ancestor often retain functional and sequence similarity, although the latter may be much reduced. Even so, the overall fold of the structure is generally highly conserved. Recently, however, several proteins have been identified that have different functions but have converged to a similar fold. These proteins also have low sequence identities. RESULTS: By comparing the complete structure databank against itself, using sequence and structure alignment techniques, we have been able to identify six new examples of structurally related folds that have no apparent sequence or functional similarity. These related proteins include a family of crambin-like folds and a family of ferredoxin II folds. We found that all of these similarities involve small proteins and occur as motifs within the core of a larger protein. CONCLUSION: The low sequence similarity and the lack of any obvious functional relationship between proteins with similar structures suggest that the proteins arose from independent ancestors. The similarities may therefore be of interest for understanding the various stereochemical and physical criteria that operate to generate a favourable fold.

16.
Here we present evidence that domains in soluble proteins containing either the GXXXG or GXXXA motif are stabilized by the interaction of a beta-strand with the following alpha-helix. As an example, we characterized a beta-strand-helix interaction from the FAD- or NAD(P)-binding Rossmann fold. The Rossmann fold is one of the three most highly represented folds in the Protein Data Bank (PDB). A subset of the proteins that adopt the Rossmann fold also bind nucleotide cofactors such as FAD and NAD(P) and function as oxidoreductases. These Rossmann folds can often be identified by the short amino acid sequence motif GX(1-2)GXXG. Here, we present evidence that, in addition to this sequence motif, Rossmann folds that bind FAD and NAD(P) also typically contain either GXXXG or GXXXA motifs, where the first glycyl residue of these motifs and the third glycyl residue of the GX(1-2)GXXG motif are the same residue. These two motifs appear to stabilize the Rossmann fold: the first glycyl residue of either the GXXXG or GXXXA motif contacts the carbonyl oxygen atom of the first glycyl residue of the GX(1-2)GXXG motif, consistent with the formation of a Cα-H···O hydrogen bond. In addition, both the glycyl and alanyl residues of the GXXXG or GXXXA motifs form van der Waals interactions with either a valine or an isoleucine residue located seven or eight residues before the first glycine of the GXXXG or GXXXA motif. Therefore, we combine the GX(1-2)GXXG and GXXXG/A motifs into an extended motif, V/IXGX(1-2)GXXGXXXG/A, that is more strongly indicative than previously described motifs of Rossmann folds that bind FAD or NAD(P). The V/IXGX(1-2)GXXGXXXG/A motif can be used to search genomic sequence data and to annotate proteins containing the motif, including proteins of previously unknown function, as oxidoreductases.
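The extended motif V/IXGX(1-2)GXXGXXXG/A translates directly into a regular expression, which is one straightforward way to run the kind of sequence search the abstract proposes. This is a minimal sketch, not the authors' search pipeline; the pattern name, the function name, and the test sequence are made up for illustration, and a hit only flags a candidate FAD/NAD(P)-binding Rossmann fold rather than proving one.

```python
import re

# V/IXGX(1-2)GXXGXXXG/A as a regex (X = any residue):
#   [VI] . G .{1,2} G . . G . . . [GA]
# The G that closes GXXG is also the first G of the GXXXG/A motif, so the
# leading V/I sits 7 or 8 residues before it, as described above.
# The zero-width lookahead (?=...) lets overlapping candidate sites be reported.
ROSSMANN_FAD_NADP = re.compile(r"(?=([VI].G.{1,2}G..G...[GA]))")

def find_rossmann_motifs(seq):
    """Return (0-based start, matched segment) for each motif hit in a protein sequence."""
    return [(m.start(), m.group(1)) for m in ROSSMANN_FAD_NADP.finditer(seq.upper())]

print(find_rossmann_motifs("KIAVIGAGPAGLSAA"))  # [(3, 'VIGAGPAGLSAA')]
```

Using a lookahead rather than a plain pattern matters when two candidate sites overlap in one sequence: a plain `finditer` would consume the first hit and could skip a second site that starts inside it.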

17.
Ochagavía ME, Wodak S. Proteins 2004, 55(2):436-454
MALECON is a progressive combinatorial procedure for multiple alignments of protein structures. It searches a library of pairwise alignments for all three-protein alignments in which a specified number of residues is consistently aligned. These alignments are progressively expanded to include additional proteins and more spatially equivalent residues, subject to certain criteria. This expansion involves superimposing the aligned proteins by their hitherto equivalent residues and searching for additional Cα atoms that lie close in space. The performance of MALECON is illustrated and compared with several existing multiple structure alignment methods, using as test cases the globin homologous superfamily and the OB and jelly-roll folds. MALECON gives better definitions of the common structural features in the structurally more diverse proteins of the OB and jelly-roll folds, but yields results comparable to the other methods for the more similar globins. When no consistent multiple alignment can be derived for all members of a protein group, the procedure can still automatically generate consistent alignments and common core definitions for subgroups of the members. This is illustrated for proteins of the OB fold and SH3 domains, which are believed to share common structural features; this capability should be useful for homology modeling and investigations of protein evolution.
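The expansion step described above (after superposition on the current equivalent residues, look for further Cα atoms lying close in space) can be sketched for a pair of proteins. This is a simplified illustration, not MALECON itself: it assumes the superposition has already been computed (for example with a Kabsch fit, not shown), and the 3.0 Å cutoff and the mutual-nearest-neighbour rule are assumed criteria.

```python
import math

def extend_equivalences(ca_a, ca_b, equiv, cutoff=3.0):
    """Add residue pairs whose C-alpha atoms lie close in space.

    ca_a, ca_b: lists of (x, y, z) C-alpha coordinates of two proteins,
    already superimposed on their current equivalent residues (assumed).
    equiv: set of (i, j) residue pairs already considered equivalent.
    A free pair (i, j) is added when i and j are mutual nearest neighbours
    among unused residues and lie within `cutoff` angstroms (assumed rule).
    """
    used_a = {i for i, _ in equiv}
    used_b = {j for _, j in equiv}
    free_a = [i for i in range(len(ca_a)) if i not in used_a]
    free_b = [j for j in range(len(ca_b)) if j not in used_b]

    def nearest(point, coords, candidates):
        return min(candidates, key=lambda k: math.dist(point, coords[k]), default=None)

    extended = set(equiv)
    for i in free_a:
        j = nearest(ca_a[i], ca_b, free_b)
        if j is None or math.dist(ca_a[i], ca_b[j]) > cutoff:
            continue
        # Mutual-nearest check keeps each residue in at most one new pair.
        if nearest(ca_b[j], ca_a, free_a) == i:
            extended.add((i, j))
    return extended
```

In a progressive scheme, the extended set would be used to re-superimpose the structures and the search repeated until no new pairs are added; that outer loop is omitted here.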

18.
In the fold recognition approach to structure prediction, a sequence is tested for compatibility with an already known fold. For membrane proteins, however, few folds have been determined experimentally. Here, the feasibility of computing the vast majority of likely membrane protein folds is tested. The results indicate that conformation space can be effectively sampled for small numbers of helices. The vast majority of potential monomeric membrane protein structures can be represented by about 30 folds for three helices, but the number increases exponentially to about 1,500,000 folds for seven helices. The generated folds could serve as templates for fold recognition or as starting points for conformational searches that are well distributed throughout conformation space.

19.
The general similarity in the forces governing protein folding and protein-protein associations has led us to examine the similarity in architectural motifs between interfaces and monomers. We have carried out extensive, all-against-all structural comparisons between the single-chain protein structural dataset and the interface dataset, derived both from all protein-protein complexes in the structural database and from interfaces generated via an automated crystal symmetry operation. We show that despite the absence of chain connections, the global features of the architectural motifs present in monomers recur in the interfaces, a reflection of the limited set of folding patterns. However, although similarity has been observed, the details of the architectural motifs vary. In particular, the extent of the similarity correlates with how the interface has been formed. Interfaces derived from two-state model complexes, where the chains fold cooperatively, display considerable similarity to architectures in protein cores, as judged by the quality of their geometric superposition. On the other hand, the three-state model interfaces, representing binding of already folded molecules, manifest larger variability and resemble the monomer architecture only in general outline. The origin of the difference between the monomers and the three-state model interfaces can be understood in terms of the different nature of the folding and the binding involved. Whereas in the former all degrees of freedom are available to the backbone to maximize favorable interactions, in rigid-body, three-state model binding only six degrees of freedom are allowed. Hence, residue or atom pairwise potentials derived from protein-protein associations are expected to be less accurate, substantially increasing the number of computationally acceptable alternate binding modes (Finkelstein et al., 1995).

20.
The crystal structure of a complex of methyl-alpha-D-mannoside with banana lectin from Musa paradisiaca reveals two primary binding sites in the lectin, unlike other lectins with the beta-prism I fold, which essentially consists of three Greek key motifs. It has been suggested that the fold evolved through successive gene duplication and fusion of an ancestral Greek key motif. In other lectins, all from dicots, the primary binding site exists on one of the three motifs in the three-fold symmetric molecule. Banana is a monocot, and its three motifs have not diverged enough to obliterate the sequence similarity among them. Two of its Greek key motifs carry one primary binding site each, and a common secondary binding site exists on the third Greek key. Modelling shows that both primary sites can support 1-2, 1-3, and 1-6 linked mannosides, with the second residue interacting in each case primarily with the secondary binding site. Modelling also readily leads to a bound branched mannopentose with the nonreducing ends of the two branches anchored at the two primary binding sites, providing a structural explanation for the lectin's specificity for branched alpha-mannans. A comparison of the dimeric banana lectin with other beta-prism I fold lectins provides interesting insights into the variability of their quaternary structure.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号