Similar articles
20 similar articles found (search time: 62 ms)
1.
2.
The main mechanisms shaping the modular evolution of proteins are gene duplication, fusion and fission, recombination, and loss of fragments. While a large body of research has focused on duplications and fusions, in this study we concentrated on how domains are lost. We investigated motif databases and introduced a measure of protein similarity that is based on domain arrangements. Proteins are represented as strings of domains, and comparison was based on the classic dynamic alignment scheme. We found that domain losses and duplications were more frequent at the ends of proteins. We showed that losses can be explained by the introduction of start and stop codons which render the terminal domains nonfunctional, such that further shortening, until the whole domain is lost, is not evolutionarily selected against. We demonstrated that domains which also occur as single-domain proteins are less likely to be lost at the N terminus and in the middle than at the C terminus. We conclude that fission/fusion events with single-domain proteins occur mostly at the C terminus. We found that domain substitutions are rare, in particular in the middle of proteins. We also showed that many cases of substitutions or losses result from erroneous annotations, but we were also able to find courses of evolutionary events where domains vanish over time. This is explained by a case study on the bacterial formate dehydrogenases.
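The comparison of proteins as strings of domains can be illustrated with a minimal sketch. It assumes a plain global (Needleman-Wunsch style) alignment with unit scores; the function name `align_domains`, the scoring values, and the domain labels are hypothetical and are not taken from the study.

```python
from typing import List, Optional, Sequence, Tuple

Pair = Tuple[Optional[str], Optional[str]]


def align_domains(a: Sequence[str], b: Sequence[str],
                  match: int = 1, mismatch: int = -1,
                  gap: int = -1) -> Tuple[int, List[Pair]]:
    """Globally align two domain strings; each domain is one 'letter'.

    The scores are illustrative assumptions, not the study's parameters.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                pairs.append((a[i - 1], b[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((a[i - 1], None))   # domain present only in a
            i -= 1
        else:
            pairs.append((None, b[j - 1]))   # domain present only in b
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


# Hypothetical architectures: the second protein lacks the C-terminal domain.
total, alignment = align_domains(["A", "B", "C"], ["A", "B"])
```

A pair whose second element is `None` marks a domain that has no counterpart in the other protein; its position in the alignment shows whether the difference sits at a terminus or in the middle.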

3.
Many proteins consist of several structural domains. These multi-domain proteins have likely been generated by selective genome growth dynamics during evolution to perform new functions as well as to create structures that fold on a biologically feasible time scale. Domain units frequently evolved through a variety of genetic shuffling mechanisms. Here we examine the protein domain statistics of more than 1000 organisms including eukaryotic, archaeal and bacterial species. The analysis extends earlier findings on asymmetric statistical laws for the proteome to a wider variety of species. While proteins are composed of a wide range of domains, displaying a power-law decay, the computation of domain families for each protein reveals an exponential distribution, characterizing a protein universe composed of a thin number of unique families. Structural studies in proteomics have shown that domain repeats, or internal duplicated domains, represent a small but significant fraction of the genome. In spite of its importance, this observation has been largely overlooked until recently. We model the evolutionary dynamics of the proteome and demonstrate that these distinct distributions are in fact rooted in an internal duplication mechanism. This process generates the contemporary protein structural domain universe, determines its reduced thickness, and tames its growth. These findings have important implications, ranging from protein interaction network modeling to evolutionary studies based on fundamental mechanisms governing genome expansion.

4.
During evolution, genes can produce more complex proteins by gene fusion or less complex proteins by gene fission. Considering proteins from 131 completely sequenced genomes from all three kingdoms of life, we identified 2869 groups of multi-domain proteins that occur as a single protein in certain organisms and as two or more smaller proteins with equivalent domain architectures in other organisms. We found that fusion events are approximately four times more common than fission events, and we established that, in most cases, any particular fusion or fission event only occurred once during the course of evolution.
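The idea of a composite protein whose architecture matches two smaller proteins elsewhere can be sketched as follows. This is a simplified illustration, not the procedure used in the study: it handles only splits into exactly two parts, and the function name, protein names, and domain labels are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def find_split_candidates(composite: Sequence[str],
                          proteins: Dict[str, Sequence[str]]
                          ) -> List[Tuple[str, str]]:
    """Return (left, right) protein-name pairs from another organism whose
    domain architectures concatenate to the composite architecture."""
    hits: List[Tuple[str, str]] = []
    for k in range(1, len(composite)):          # every internal split point
        left, right = tuple(composite[:k]), tuple(composite[k:])
        lefts = [n for n, arch in proteins.items() if tuple(arch) == left]
        rights = [n for n, arch in proteins.items() if tuple(arch) == right]
        hits.extend((l, r) for l in lefts for r in rights if l != r)
    return hits


# Hypothetical data: one three-domain protein versus another organism's proteins.
other_organism = {"p1": ["A"], "p2": ["B", "C"], "p3": ["C"]}
candidates = find_split_candidates(["A", "B", "C"], other_organism)
```

Deciding whether a candidate reflects a fusion or a fission would additionally require a phylogeny, which this sketch does not model.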

5.
Superfamily classifications are based variably on similarity of sequences, global folds, local structures, or functions. We have examined the possibility of defining superfamilies purely from the viewpoint of the global fold/function relationship. For this purpose, we first classified protein domains according to the beta-sheet topology. We then introduced the concept of kinship relations among the classified beta-sheet topologies by assuming that the major elementary event leading to creation of a new beta-sheet topology is either an addition or deletion of one beta-strand at the edge of an existing beta-sheet during molecular evolution. Based on this kinship relation, a network of protein domains was constructed so that the distance between a pair of domains represents the number of evolutionary events that lead from one domain to the other. We then mapped on it all known domains with a specific core chemical function (here taken, as an example, that involving ATP or its analogs). Careful analyses revealed that the domains are found distributed on the network as >20 mutually disjoint clusters. The proteins in each cluster are defined to form a fold-based superfamily. The results indicate that >20 ATP-binding protein superfamilies have been invented independently in the process of molecular evolution, and the conservative evolutionary diffusion of global folds and functions is the origin of the relationship between them.

6.
Most proteins comprise one or several domains. New domain architectures can be created by combining previously existing domains. The elementary events that create new domain architectures may be categorized into three classes, namely domain insertion or deletion (indel), exchange and repetition. Using 'DomainTeam', a tool dedicated to the search for microsyntenies of domains, we quantified the relative contribution of these events. This tool allowed us to collect homologous bacterial genes encoding proteins that have obviously evolved by modular assembly of domains. We show that indels are the most frequent elementary events and that they occur in most cases at either the N- or C-terminus of the proteins. As revealed by the genomic neighbourhood/context of the corresponding genes, we show that a substantial number of these terminal indels are the consequence of gene fusions/fissions. We provide evidence showing that the contribution of gene fusion/fission to the evolution of multi-domain bacterial proteins is lower-bounded by 27% and upper-bounded by 64%. We conclude that gene fusion/fission is a major contributor to the evolution of multi-domain bacterial proteins.

7.
Particularly in higher eukaryotes, some protein domains are found in tandem repeats, performing broad functions often related to cellular organization. For instance, the eukaryotic protein filamin interacts with many proteins and is crucial for the cytoskeleton. The functional properties of long repeat domains are governed by the specific properties of each individual domain as well as by the repeat copy number. To provide better understanding of the evolutionary and functional history of repeating domains, we investigated the mode of evolution of the filamin domain in some detail. Among the domains that are common in long repeat proteins, sushi and spectrin domains evolve primarily through cassette tandem duplications while scavenger and immunoglobulin repeats appear to evolve through clustered tandem duplications. Additionally, immunoglobulin and filamin repeats exhibit a unique pattern where every other domain shows high sequence similarity. This pattern may be the result of tandem duplications, may serve to avert aggregation between adjacent domains, or may be the result of functional constraints. In filamin, our studies confirm the presence of interspersed integrin-binding domains in vertebrates, while invertebrates exhibit more varied patterns, including more clustered integrin-binding domains. The most notable case is leech filamin, which contains a 20-repeat expansion and exhibits unique dimerization topology. Clearly, invertebrate filamins are varied and contain examples of similar adjacent integrin-binding domains. Given that invertebrate integrin shows more similarity to the weaker filamin binder, integrin β3, it is possible that the distance between integrin-binding domains is not as crucial for invertebrate filamins as for vertebrates.

8.
Many proteins, especially in eukaryotes, contain tandem repeats of several domains from the same family. These repeats have a variety of binding properties and are involved in protein–protein interactions as well as binding to other ligands such as DNA and RNA. The rapid expansion of protein domain repeats is assumed to have evolved through internal tandem duplications. However, the exact mechanisms behind these tandem duplications are not well understood. Here, we have studied the evolution, function, protein structure, gene structure, and phylogenetic distribution of domain repeats. For this purpose we have assigned Pfam-A domain families to 24 proteomes with more sensitive domain assignments in the repeat regions. These assignments confirmed previous findings that eukaryotes, and in particular vertebrates, contain a much higher fraction of proteins with repeats compared with prokaryotes. The internal sequence similarity in each protein revealed that the domain repeats are often expanded through duplications of several domains at a time, while the duplication of one domain is less common. Many of the repeats appear to have been duplicated in the middle of the repeat region. This is in strong contrast to the evolution of other proteins that mainly works through additions of single domains at either terminus. Further, we found that some domain families show distinct duplication patterns, e.g., nebulin domains have mainly been expanded with a unit of seven domains at a time, while duplications of other domain families involve varying numbers of domains. Finally, no common mechanism for the expansion of all repeats could be detected. We found that the duplication patterns show no dependence on the size of the domains. Further, repeat expansion in some families can possibly be explained by shuffling of exons. However, exon shuffling could not have created all repeats.

9.
10.
Domains are the evolutionary units that comprise proteins, and most proteins are built from more than one domain. Domains can be shuffled by recombination to create proteins with new arrangements of domains. Using structural domain assignments, we examined the combinations of domains in the proteins of 131 completely sequenced organisms. We found two-domain and three-domain combinations that recur in different protein contexts with different partner domains. The domains within these combinations have a particular functional and spatial relationship. These units are larger than individual domains and we term them "supra-domains". Amongst the supra-domains, we identified some 1400 (1203 two-domain and 166 three-domain) combinations that are statistically significantly over-represented relative to the occurrence and versatility of the individual component domains. Over one-third of all structurally assigned multi-domain proteins contain these over-represented supra-domains. This means that investigation of the structural and functional relationships of the domains forming these popular combinations would be particularly useful for an understanding of multi-domain protein function and evolution as well as for genome annotation. These and other supra-domains were analysed for their versatility, duplication, their distribution across the three kingdoms of life and their functional classes. By examining the three-dimensional structures of several examples of supra-domains in different biological processes, we identify two basic types of spatial relationships between the component domains: the combined function of the two domains is such that either the geometry of the two domains is crucial and there is a tight constraint on the interface, or the precise orientation of the domains is less important and they are spatially separate. Frequently, the role of the supra-domain becomes clear only once the three-dimensional structure is known. Since this is the case for only a quarter of the supra-domains, we provide a list of the most important unknown supra-domains as potential targets for structural genomics projects.
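The notion of a domain combination that recurs with different partner domains can be illustrated with a small counting sketch. It covers only adjacent two-domain combinations and does not reproduce the statistical over-representation test, whose details are not given here; the function name, the `min_contexts` threshold, and the domain labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Sequence, Set, Tuple

Context = Tuple[Optional[str], Optional[str]]


def recurring_pairs(architectures: Iterable[Sequence[str]],
                    min_contexts: int = 2) -> Dict[Tuple[str, str], int]:
    """Count, for every adjacent domain pair, the number of distinct
    flanking contexts (left neighbour, right neighbour) it occurs in."""
    contexts: Dict[Tuple[str, str], Set[Context]] = defaultdict(set)
    for arch in architectures:
        for i in range(len(arch) - 1):
            pair = (arch[i], arch[i + 1])
            left = arch[i - 1] if i > 0 else None
            right = arch[i + 2] if i + 2 < len(arch) else None
            contexts[pair].add((left, right))
    return {pair: len(ctx) for pair, ctx in contexts.items()
            if len(ctx) >= min_contexts}


# Hypothetical architectures: the pair (A, B) appears with two different partners.
example = [["A", "B", "C"], ["D", "A", "B"], ["A", "B", "C"], ["E", "F"]]
recurring = recurring_pairs(example)
```

Pairs returned by this sketch are only candidates; whether they are over-represented relative to the abundance of their component domains is a separate statistical question.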

11.
It has been observed that the evolutionary distances of interacting proteins often display a higher level of similarity than those of noninteracting proteins. This finding indicates that interacting proteins are subject to common evolutionary constraints and constitutes the basis of a method to predict protein interactions known as mirrortree. It has been difficult, however, to identify the direct cause of the observed similarities between evolutionary trees. One possible explanation is the existence of compensatory mutations between partners' binding sites to maintain proper binding. This explanation, though, has been recently challenged, and it has been suggested that the signal of correlated evolution uncovered by the mirrortree method is unrelated to any correlated evolution between binding sites. We examine the contribution of binding sites to the correlation between evolutionary trees of interacting domains. We show that binding neighborhoods of interacting proteins have, on average, higher coevolutionary signal compared with the regions outside binding sites; however, when the binding neighborhood is removed, the remaining domain sequence still contains some coevolutionary signal. In conclusion, the correlation between evolutionary trees of interacting domains cannot exclusively be attributed to the correlated evolution of the binding sites or to common evolutionary pressure exerted on the whole protein domain sequence, each of which contributes to the signal measured by the mirrortree approach.
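The core quantity behind the mirrortree approach, a correlation between the evolutionary distances of two protein families, can be sketched as below. The sketch assumes both distance matrices are symmetric and ordered by the same species, and uses a plain Pearson correlation over the upper triangles; the function names and the toy matrices are hypothetical.

```python
from math import sqrt
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def upper_triangle(matrix: Matrix) -> List[float]:
    """Flatten the strictly upper triangle of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def mirrortree_score(dist_a: Matrix, dist_b: Matrix) -> float:
    """Pearson correlation between the pairwise distances of two families.

    Assumes rows and columns of both matrices refer to the same species
    in the same order.
    """
    x, y = upper_triangle(dist_a), upper_triangle(dist_b)
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("matrices must share the same size (at least 3 species)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var_x = sum((xi - mx) ** 2 for xi in x)
    var_y = sum((yi - my) ** 2 for yi in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("distances must not be constant")
    return cov / sqrt(var_x * var_y)


# Toy matrices for three species; family B's distances are twice family A's.
family_a = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
family_b = [[0, 2, 4], [2, 0, 6], [4, 6, 0]]
r = mirrortree_score(family_a, family_b)
```

Restricting the sequences to the binding neighbourhood, or removing it, before computing the distance matrices would give the kind of comparison the abstract describes.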

12.
MOTIVATION: Multi-domain proteins have evolved by insertions or deletions of distinct protein domains. Tracing the history of a certain domain combination can be important for functional annotation of multi-domain proteins, and for understanding the function of individual domains. In order to analyze the evolutionary history of the domains in modular proteins it is desirable to inspect a phylogenetic tree based on sequence divergence with the modular architecture of the sequences superimposed on the tree. RESULT: A Java applet, NIFAS, that integrates graphical domain schematics for each sequence in an evolutionary tree was developed. NIFAS retrieves domain information from the Pfam database and uses CLUSTAL W to calculate a tree for a given Pfam domain. The tree can be displayed with symbolic bootstrap values, and to allow the user to focus on a part of the tree, the layout can be altered by swapping nodes, changing the outgroup, and showing/collapsing subtrees. NIFAS is integrated with the Pfam database and is accessible over the internet (http://www.cgr.ki.se/Pfam). As an example, we use NIFAS to analyze the evolution of domains in Protein Kinases C.

13.
Eukaryotic genomes encode a considerably higher fraction of multi-domain proteins than their prokaryotic counterparts. It has been postulated that efficient co-translational and sequential domain folding has facilitated the explosive evolution of multi-domain proteins in eukaryotes by the recombination of pre-existent domains. Here, we tested whether eukaryotes and bacteria differ generally in the folding efficiency of multi-domain proteins generated by domain recombination. To this end, we compared the folding behavior of a series of recombinant proteins comprised of green fluorescent protein (GFP) fused to four different robustly folding proteins through six different linkers upon expression in Escherichia coli and the yeast Saccharomyces cerevisiae. We found that, unlike yeast, bacteria are remarkably inefficient at folding these fusion proteins, even at comparable levels of expression. In vitro and in vivo folding experiments demonstrate that the GFP domain imposes significant constraints on de novo folding of its fusion partners in bacteria, consistent with a largely post-translational folding mechanism. This behavior may result from an interference of GFP with adjacent domains during folding due to the particular topology of the beta-barrel GFP structure. By following the accumulation of enzymatic activity, we found that the rate of appearance of correctly folded fusion protein per ribosome is indeed considerably higher in yeast than in bacteria.

14.
The function of most proteins is accomplished through the interplay of two or more protein domains and fine-tuned by natural evolution. In contrast, artificial enzymes have often been engineered from a single domain scaffold and frequently have lower catalytic activity than natural enzymes. We previously generated an artificial enzyme that catalyzed an RNA ligation by >2 million-fold but was likely limited in its activity by low substrate affinity. Inspired by nature's concept of domain fusion, we fused the artificial enzyme to a series of protein domains known to bind nucleic acids with the goal of improving its catalytic activity. The effect of the fused domains on catalytic activity varied greatly, yielding severalfold increases but also reductions caused by domains that previously enhanced nucleic acid binding in other protein engineering projects. The combination of the two better performing binding domains improved the activity of the parental ligase by more than an order of magnitude. These results demonstrate for the first time that nature's successful evolutionary mechanism of domain fusion can also improve an unevolved primordial-like protein whose structure and function had just been created in the test tube. The generation of multi-domain proteins might therefore be an ancient evolutionary process.

15.
Understanding relationships between sequence, structure, and evolution is important for functional characterization of proteins. Here, we define a novel DOM-fold as a consensus structure of the domains in DmpA (L-aminopeptidase D-Ala-esterase/amidase), OAT (ornithine acetyltransferase), and MocoBD (molybdenum cofactor-binding domain), and discuss possible evolutionary scenarios of its origin. As shown by a comprehensive structure similarity search, the DOM-fold, distinguished by a two-layered beta/alpha architecture of a particular topology with unusual crossing loops, is unique to those three protein families. DmpA and OAT are evolutionarily related as indicated by their sequence, structural, and functional similarities. Structural similarity between the DmpA/OAT superfamily and the MocoBD domains has not been reported before. Contrary to previous reports, we conclude that functional similarities between DmpA/OAT proteins and N-terminal nucleophile (Ntn) hydrolases are convergent and are unlikely to be inherited from a common ancestor.

16.
Comparative studies of the proteomes from different organisms have provided valuable information about protein domain distribution in the kingdoms of life. Earlier studies have been limited by the fact that only about 50% of the proteomes could be matched to a domain. Here, we have extended these studies by including less well-defined domain definitions, Pfam-B and clustered domains, MAS, in addition to Pfam-A and SCOP domains. It was found that a significant fraction of these domain families are homologous to Pfam-A or SCOP domains. Further, we show that all regions that do not match a Pfam-A or SCOP domain contain a significantly higher fraction of disordered structure. These unstructured regions may be contained within orphan domains or function as linkers between structured domains. Using several different definitions we have re-estimated the number of multi-domain proteins in different organisms and found that several methods all predict that eukaryotes have approximately 65% multi-domain proteins, while prokaryotes have approximately 40% multi-domain proteins. However, these numbers are strongly dependent on the exact choice of cut-off for domains in unassigned regions. In conclusion, all eukaryotes have similar fractions of multi-domain proteins and disorder, whereas a high fraction of repeating domains is found only in multicellular eukaryotes. This implies a role for repeats in cell-cell contacts while the other two features are important for intracellular functions.

17.
18.
Using structural similarity clustering of protein domains (the protein domain universe graph, PDUG) and a hierarchical functional annotation (gene ontology, GO) as two evolutionary lenses, we find that each structural cluster (domain fold) exhibits a distribution of functions that is unique to it. These functional distributions are functional fingerprints that are specific to characteristic structural clusters and vary from cluster to cluster. Furthermore, as the structural similarity threshold for domain clustering in the PDUG is relaxed, we observe an influx of earlier-diverged domains into clusters. These domains join clusters without destroying the functional fingerprint. These results can be understood in light of a divergent evolution scenario that posits correlated divergence of structural and functional traits in protein domains from one or few progenitors.

19.
ABSTRACT: BACKGROUND: Proteins convey the majority of biochemical and cellular activities in organisms. Over the course of evolution, proteins undergo normal sequence mutations as well as large scale mutations involving domain duplication and/or domain shuffling. These events result in the generation of new proteins and protein families. Processes that affect proteome evolution drive species diversity and adaptation. Herein, change over the course of metazoan evolution, as defined by birth/death and duplication/deletion events within protein families and domains, was examined using the proteomes of nine metazoan and two outgroup species. RESULTS: In studying members of the three major metazoan groups, the vertebrates, arthropods, and nematodes, we found that the number of protein families increased at the majority of lineages over the course of metazoan evolution where the magnitude of these increases was greatest at the lineages leading to mammals. In contrast, the number of protein domains decreased at most lineages and at all terminal lineages. This resulted in a weak correlation between protein family birth and domain birth; however, the correlation between domain birth and domain member duplication was quite strong. These data suggest that domain birth and protein family birth occur via different mechanisms, and that domain shuffling plays a role in the formation of protein families. The ratio of protein family birth to protein domain birth (domain shuffling index) suggests that shuffling had a more demonstrable effect on protein families in nematodes and arthropods than in vertebrates. Through the contrast of high and low domain shuffling indices at the lineages of Trichinella spiralis and Gallus gallus, we propose a link between protein redundancy and evolutionary changes controlled by domain shuffling; however, the speed of adaptation among the different lineages was relatively invariant. Evaluating the functions of protein families that appeared or disappeared at the last common ancestors (LCAs) of the three metazoan clades supports a correlation with organism adaptation. Furthermore, bursts of new protein families and domains in the LCAs of metazoans and vertebrates are consistent with whole genome duplications. CONCLUSION: Metazoan speciation and adaptation were explored by birth/death and duplication/deletion events among protein families and domains. Our results provide insights into protein evolution and its bearing on metazoan evolution.
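The domain shuffling index defined above, the ratio of protein family births to protein domain births on a lineage, is simple arithmetic. The sketch below uses made-up counts purely for illustration; the function name and the numbers are hypothetical and do not come from the study.

```python
def domain_shuffling_index(family_births: int, domain_births: int) -> float:
    """Ratio of protein family births to protein domain births on a lineage."""
    if domain_births <= 0:
        raise ValueError("domain_births must be positive")
    return family_births / domain_births


# Made-up counts for two hypothetical lineages.
lineage_x = domain_shuffling_index(family_births=30, domain_births=10)
lineage_y = domain_shuffling_index(family_births=12, domain_births=8)
```

A higher value means more new families arose per new domain, which is the pattern the abstract attributes to rearrangement of existing domains.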

20.
Annotations of genes and their products are largely guided by inferring homology. Sequence similarity is the primary measure used for annotation; however, domain content and order have been given less importance, despite the fact that domain insertions, deletions, and positional changes can bring about functional variety. Several recently developed methods quantify domain architecture similarity, but they depend on sequence alignments and focus only on homologous proteins. We present an alignment-free domain architecture-similarity search (ADASS) algorithm that identifies proteins that share very poor sequence similarity yet have similar domain architectures. We introduce a “singlet matching-triplet comparison” method in ADASS, wherein a triplet of domains is compared with other triplets in a pair-wise comparison of two domain architectures. Different events in the triplet comparison are scored as per a scoring scheme, and an average pairwise distance score (Domain Architecture Distance score, DAD score) is calculated between protein domain architectures. We use domain architectures of a selected domain, termed the centric domain, and cluster them based on the DAD score. The algorithm has high Positive Prediction Value (PPV) with respect to the clustering of the sequences of selected domain architectures. A comparison of domain architecture-based dendrograms using the ADASS method and an existing method revealed that ADASS can classify proteins depending on the extent of domain architecture level similarity. ADASS is more relevant in cases of proteins with tiny domains that contribute little to the overall sequence similarity but significantly to the overall function.
