Similar literature
 Found 20 similar articles; search time: 15 ms
1.
The RNA world hypothesis, that RNA genomes and catalysts preceded DNA genomes and genetically encoded protein catalysts, has been central to models for the early evolution of life on Earth. A key part of such models is continuity between the earliest stages in the evolution of life and the RNA repertoires of extant lineages. Some assessments seem consistent with a diverse RNA world, yet direct continuity between modern RNAs and an RNA world has not been demonstrated for the majority of RNA families, and, anecdotally, many RNA functions appear restricted in their distribution. Despite much discussion of the possible antiquity of RNA families, no systematic analyses of RNA family distribution have been performed. To chart the broad evolutionary history of known RNA families, we performed comparative genomic analysis of over 3 million RNA annotations spanning 1446 families from the Rfam 10 database. We report that 99% of known RNA families are restricted to a single domain of life, revealing discrete repertoires for each domain. For the 1% of RNA families/clans present in more than one domain, over half show evidence of horizontal gene transfer (HGT), and the rest show a vertical trace, indicating the presence of a complex protein synthesis machinery in the Last Universal Common Ancestor (LUCA) and consistent with the evolutionary history of the most ancient protein-coding genes. However, with limited interdomain transfer and few RNA families exhibiting demonstrable antiquity as predicted under RNA world continuity, our results indicate that the majority of modern cellular RNA repertoires have primarily evolved in a domain-specific manner.
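The headline figure above (the share of families restricted to a single domain of life) can be illustrated with a minimal sketch. The family names and domain assignments below are hypothetical placeholders, not Rfam data, and the function is only one plausible way to tally such a fraction.

```python
def fraction_domain_restricted(family_domains):
    """Return the fraction of families observed in exactly one domain of life.

    family_domains maps a family identifier to the set of domains
    (e.g. "Bacteria", "Archaea", "Eukaryota") in which it is annotated.
    """
    if not family_domains:
        raise ValueError("no families supplied")
    restricted = sum(1 for domains in family_domains.values() if len(domains) == 1)
    return restricted / len(family_domains)


# Hypothetical toy input, for illustration only.
toy_families = {
    "family_a": {"Bacteria"},
    "family_b": {"Eukaryota"},
    "family_c": {"Bacteria", "Archaea", "Eukaryota"},
    "family_d": {"Archaea"},
}
print(fraction_domain_restricted(toy_families))  # 3 of 4 families are single-domain
```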

2.
New directions in biology are being driven by the complete sequencing of genomes, which has given us the protein repertoires of diverse organisms from all kingdoms of life. In tandem with this accumulation of sequence data, worldwide structural genomics initiatives, advanced by the development of improved technologies in X-ray crystallography and NMR, are expanding our knowledge of structural families and increasing our fold libraries. Methods for detecting remote sequence similarities have also been made more sensitive and this means that we can map domains from these structural families onto genome sequences to understand how these families are distributed throughout the genomes and reveal how they might influence the functional repertoires and biological complexities of the organisms. We have used robust protocols to assign sequences from completed genomes to domain structures in the CATH database, allowing up to 60% of domain sequences in these genomes, depending on the organism, to be assigned to a domain family of known structure. Analysis of the distribution of these families throughout bacterial genomes identified more than 300 universal families, some of which had expanded significantly in proportion to genome size. These highly expanded families are primarily involved in metabolism and regulation and appear to make major contributions to the functional repertoire and complexity of bacterial organisms. When comparisons are made across all kingdoms of life, we find a smaller set of universal domain families (approx. 140), of which families involved in protein biosynthesis are the largest conserved component. Analysis of the behaviour of other families reveals that some (e.g. those involved in metabolism, regulation) have remained highly innovative during evolution, making it harder to trace their evolutionary ancestry. 
Structural analyses of metabolic families provide some insights into the mechanisms of functional innovation, which include changes in domain partnerships and significant structural embellishments leading to modulation of active sites and protein interactions.

3.
Ticks evolved various mechanisms to modulate their host's hemostatic and immune defenses. Differences in the anti-hemostatic repertoires suggest that hard and soft ticks evolved anti-hemostatic mechanisms independently, but raise questions about the conservation of salivary gland proteins in the ancestral tick lineage. To address this issue, the sialome (salivary gland secretory proteome) from the soft tick, Argas monolakensis, was determined by proteomic analysis and cDNA library construction of salivary glands from fed and unfed adult female ticks. The sialome is composed of approximately 130 secretory proteins, of which the most abundant protein folds are the lipocalin, BTSP, BPTI and metalloprotease families, which also comprise the most abundant proteins found in the salivary glands. Comparative analysis indicates that the major protein families are conserved in hard and soft ticks. Phylogenetic analysis shows, however, that most gene duplications are lineage specific, indicating that the protein families analyzed possibly evolved most of their functions after divergence of the two major tick families. In conclusion, the ancestral tick may have possessed a simple (few members for each family), but diverse (many different protein families) salivary gland protein domain repertoire.

4.
Several approaches, some of which are described in this issue, have been proposed to assemble a complete protein interaction map. These are often based on high-throughput methods that explore the ability of each gene product to bind any other element of the proteome of the organism. Here we propose that a large number of interactions can be inferred by revealing the rules underlying recognition specificity of a small number (a few hundred) of families of protein recognition modules. This can be achieved through the construction and characterization of domain repertoires. A domain repertoire is assembled in a combinatorial fashion by allowing each amino acid position in the binding site of a given protein recognition domain to vary to include all the residues allowed at that position in the domain family. The repertoire is then searched by phage display techniques with any target of interest, and from the primary structure of the binding site of the selected domains one derives rules that are used to infer the formation of complexes between natural proteins in the cell.
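The combinatorial assembly described above can be sketched as a Cartesian product over the residues allowed at each binding-site position. This is only an in-silico illustration under stated assumptions: the positions and residue sets are hypothetical, and the real repertoire is built experimentally and screened by phage display rather than enumerated in code.

```python
from itertools import product


def build_repertoire(allowed_residues):
    """Enumerate every binding-site variant.

    allowed_residues is a sequence with one entry per binding-site position;
    each entry is a string of the one-letter residues allowed at that position.
    """
    return ["".join(combo) for combo in product(*allowed_residues)]


# Hypothetical binding site with three variable positions.
allowed = ["AG", "FWY", "K"]
repertoire = build_repertoire(allowed)
print(len(repertoire))  # 2 * 3 * 1 = 6 variants
print(repertoire)
```

The repertoire size is the product of the per-position residue counts, so it grows quickly as more positions are allowed to vary.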

5.
The 106 small molecule metabolic (SMM) pathways in Escherichia coli are formed by the protein products of 581 genes. We can define 722 domains, nearly all of which are homologous to proteins of known structure, that form all or part of 510 of these proteins. This information allows us to answer general questions on the structural anatomy of the SMM pathway proteins and to trace family relationships and recruitment events within and across pathways. Half the gene products contain a single domain and half are formed by combinations of between two and six domains. The 722 domains belong to one of 213 families that have between one and 51 members. Family members usually conserve their catalytic or cofactor binding properties; substrate recognition is rarely conserved. Of the 213 families, members of only a quarter occur in isolation, i.e. they form single-domain proteins. Most members of the other families combine with domains from just one or two other families and a few more versatile families can combine with several different partners. Excluding isoenzymes, more than twice as many homologues are distributed across pathways as within pathways. However, serial recruitment, with two consecutive enzymes both being recruited to another pathway, is rare and recruitment of three consecutive enzymes is not observed. Only eight of the 106 pathways have a high number of homologues. Homology between consecutive pairs of enzymes with conservation of the main substrate-binding site but change in catalytic mechanism (which would support a simple model of retrograde pathway evolution) occurs only six times in the whole set of enzymes. Most of the domains that form SMM pathways have homologues in non-SMM pathways. Taken together, these results imply a pervasive "mosaic" model for the formation of protein repertoires and pathways.

6.

Background  

It is well known that different species have different protein domain repertoires, and indeed that some protein domains are kingdom specific. This information has not yet been incorporated into statistical methods for finding domains in sequences of amino acids.

7.
ABSTRACT: BACKGROUND: Proteins convey the majority of biochemical and cellular activities in organisms. Over the course of evolution, proteins undergo normal sequence mutations as well as large scale mutations involving domain duplication and/or domain shuffling. These events result in the generation of new proteins and protein families. Processes that affect proteome evolution drive species diversity and adaptation. Herein, change over the course of metazoan evolution, as defined by birth/death and duplication/deletion events within protein families and domains, was examined using the proteomes of 9 metazoan and two outgroup species. RESULTS: In studying members of the three major metazoan groups, the vertebrates, arthropods, and nematodes, we found that the number of protein families increased at the majority of lineages over the course of metazoan evolution where the magnitude of these increases was greatest at the lineages leading to mammals. In contrast, the number of protein domains decreased at most lineages and at all terminal lineages. This resulted in a weak correlation between protein family birth and domain birth; however, the correlation between domain birth and domain member duplication was quite strong. These data suggest that domain birth and protein family birth occur via different mechanisms, and that domain shuffling plays a role in the formation of protein families. The ratio of protein family birth to protein domain birth (domain shuffling index) suggests that shuffling had a more demonstrable effect on protein families in nematodes and arthropods than in vertebrates. Through the contrast of high and low domain shuffling indices at the lineages of Trichinella spiralis and Gallus gallus, we propose a link between protein redundancy and evolutionary changes controlled by domain shuffling; however, the speed of adaptation among the different lineages was relatively invariant. 
Evaluating the functions of protein families that appeared or disappeared at the last common ancestors (LCAs) of the three metazoan clades supports a correlation with organism adaptation. Furthermore, bursts of new protein families and domains in the LCAs of metazoans and vertebrates are consistent with whole genome duplications. CONCLUSION: Metazoan speciation and adaptation were explored by birth/death and duplication/deletion events among protein families and domains. Our results provide insights into protein evolution and its bearing on metazoan evolution.
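The domain shuffling index in this abstract is defined as the ratio of protein family birth to protein domain birth. A minimal sketch of that ratio follows; the birth counts are hypothetical and how the authors counted births per lineage is not stated here, so the inputs are assumptions.

```python
def domain_shuffling_index(family_births, domain_births):
    """Ratio of protein family births to protein domain births on one lineage."""
    if domain_births <= 0:
        raise ValueError("domain_births must be positive")
    return family_births / domain_births


# Hypothetical counts for two lineages, for illustration only.
print(domain_shuffling_index(30, 10))  # many family births per new domain
print(domain_shuffling_index(12, 10))  # fewer family births per new domain
```

Under this definition a higher value means more new families arose per new domain, which the abstract reads as a stronger contribution from shuffling of existing domains.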

8.
Interactive Tree Of Life (iTOL) is a web-based tool for the display, manipulation and annotation of phylogenetic trees. Trees can be interactively pruned and re-rooted. Various types of data such as genome sizes or protein domain repertoires can be mapped onto the tree. Export to several bitmap and vector graphics formats is supported. AVAILABILITY: iTOL is available at http://itol.embl.de

9.
In the postgenomic era, one of the most interesting and important challenges is to understand protein interactions on a large scale. The physical interactions between protein domains are fundamental to the workings of a cell: in multi-domain polypeptide chains, in multi-subunit proteins and in transient complexes between proteins that also exist independently. To study the large-scale patterns and evolution of interactions between protein domains, we view interactions between protein domains in terms of the interactions between structural families of evolutionarily related domains. This allows us to classify 8151 interactions between individual domains in the Protein Data Bank and the yeast Saccharomyces cerevisiae in terms of 664 types of interactions between protein families. At least 51 interactions do not occur in the Protein Data Bank and can only be derived from the yeast data. The map of interactions between protein families has the form of a scale-free network, meaning that most protein families only interact with one or two other families, while a few families are extremely versatile in their interactions and are connected to many families. We observe that almost half of all known families engage in interactions with domains from their own family. We also see that the repertoires of interactions of domains within and between polypeptide chains overlap mostly for two specific types of protein families: enzymes and same-family interactions. This suggests that different types of protein interaction repertoires exist for structural, functional and regulatory reasons. Copyright 2001 Academic Press.
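The scale-free description above (most families have one or two partners, a few have many) is a statement about the degree distribution of the family interaction map. A minimal sketch of how such a distribution could be tallied follows; the family names and interaction pairs are hypothetical, and counting a same-family interaction as one partner is an assumption of this sketch.

```python
from collections import Counter


def family_degrees(interactions):
    """Map each family to its number of distinct interaction partners.

    interactions is an iterable of (family_a, family_b) pairs; a pair with
    identical members is a same-family interaction and counts as one partner.
    """
    neighbours = {}
    for a, b in interactions:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    return {family: len(partners) for family, partners in neighbours.items()}


def degree_distribution(degrees):
    """Count how many families have each degree."""
    return Counter(degrees.values())


# Hypothetical interaction types, for illustration only.
toy_interactions = [
    ("hub", "fam1"),
    ("hub", "fam2"),
    ("hub", "fam3"),
    ("hub", "hub"),
]
degrees = family_degrees(toy_interactions)
print(degree_distribution(degrees))
```

In a scale-free map the counts fall off roughly as a power law with increasing degree, so a log-log plot of this distribution would be close to a straight line.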

10.
High divergence in protein sequences makes the detection of distant protein relationships through homology-based approaches challenging. Grouping protein sequences into families, through similarities in either sequence or 3-D structure, facilitates improved recognition of protein relationships. In addition, strategically designed protein-like sequences have been shown to bridge distant structural domain families by serving as artificial linkers. In this study, we have augmented a search database of known protein domain families with such designed sequences, with the intention of providing functional clues to domain families of unknown structure. When assessed using representative query sequences from each family, we obtain a success rate of 94% in protein domain families of known structure. Further, we demonstrate that the augmented search space enabled fold recognition for 582 families with no structural information available a priori. Additionally, we were able to provide reliable functional relationships for 610 orphan families. We discuss the application of our method in predicting functional roles through select examples for DUF4922, DUF5131, and DUF5085. Our approach also detects new associations between families that were previously not known to be related, as demonstrated through new sub-groups of the RNA polymerase domain among three distinct RNA viruses. Taken together, designed sequences-augmented search databases direct the detection of meaningful relationships between distant protein families. In turn, they enable fold recognition and offer reliable pointers to potential functional sites that may be probed further through direct mutagenesis studies.

11.
Immunoglobulin superfamily proteins in Caenorhabditis elegans   Total citations: 2, self-citations: 0, citations by others: 2

12.
Protein kinases that phosphorylate Ser/Thr/Tyr residues in cellular proteins exert tight control over the biological functions of those proteins. They constitute the largest protein family in most eukaryotic species. Protein kinases, classified on the basis of sequence similarity in their catalytic domains, cluster into subfamilies that share gross functional properties. Many protein kinases are associated with or covalently tethered to domains that serve as adapter or regulatory modules, aiding substrate recruitment and specificity, and that also serve as scaffolds. Hence the modular organisation of protein kinases serves as a guide to their functional and molecular properties. Analysis of the genomic repertoires of protein kinases in eukaryotes has revealed a wide spectrum of domain organisation across various subfamilies of kinases. The occurrence of organism-specific novel domain combinations suggests functional diversity achieved by protein kinases in order to regulate a variety of biological processes. In addition, the domain architecture of protein kinases has revealed the existence of hybrid protein kinase subfamilies and their emerging roles in the signaling of eukaryotic organisms. In this review we discuss the repertoire of non-kinase domains tethered to multi-domain kinases in the metazoans. Similarities and differences in the domain architectures of protein kinases in these organisms indicate conserved and unique features that are critical to functional specialization.

13.
Domains are considered the basic units of protein folding, evolution, and function. Decomposing each protein into modular domains is thus a basic prerequisite for accurate functional classification of biological molecules. Here, we present ADDA, an automatic algorithm for domain decomposition and clustering of all protein domain families. We use alignments derived from an all-on-all sequence comparison to define domains within protein sequences based on a global maximum likelihood model. In all, 90% of domain boundaries are predicted within 10% of domain size when compared with the manual domain definitions given in the SCOP database. A representative database of 249,264 protein sequences was decomposed into 450,462 domains. These domains were clustered on the basis of sequence similarities into 33,879 domain families containing at least two members with less than 40% sequence identity. Validation against family definitions in the manually curated databases SCOP and PFAM indicates almost perfect unification of various large domain families while contamination by unrelated sequences remains at a low level. The global survey of protein-domain space by ADDA confirms that most large and universal domain families are already described in PFAM and/or SMART. However, a survey of the complete set of mobile modules leads to the identification of 1479 new interesting domain families which shuffle around in multi-domain proteins. The data are publicly available at ftp://ftp.ebi.ac.uk/pub/contrib/heger/adda.
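The accuracy criterion quoted above (a boundary counts as correct when it falls within 10% of the domain size of the reference boundary) can be sketched as follows. This is an assumed reading of the criterion, not ADDA's evaluation code; the residue positions and domain lengths are hypothetical.

```python
def boundary_within_tolerance(predicted, reference, domain_length, tolerance=0.10):
    """True if the predicted boundary lies within tolerance * domain_length
    residues of the reference boundary."""
    return abs(predicted - reference) <= tolerance * domain_length


def fraction_correct(cases, tolerance=0.10):
    """Fraction of (predicted, reference, domain_length) cases judged correct."""
    if not cases:
        raise ValueError("no cases supplied")
    hits = sum(boundary_within_tolerance(p, r, n, tolerance) for p, r, n in cases)
    return hits / len(cases)


# Hypothetical boundaries (residue positions) and domain lengths.
toy_cases = [
    (105, 100, 120),  # off by 5, allowed 12
    (150, 100, 120),  # off by 50, allowed 12
    (58, 60, 80),     # off by 2, allowed 8
]
print(fraction_correct(toy_cases))
```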

14.
Lee D, Grant A, Marsden RL, Orengo C. Proteins, 2005, 59(3):603-615
Using a new protocol, PFscape, we undertake a systematic identification of protein families and domain architectures in 120 complete genomes. PFscape clusters sequences into protein families using a Markov clustering algorithm (Enright et al., Nucleic Acids Res 2002;30:1575-1584) followed by complete linkage clustering according to sequence identity. Within each protein family, domains are recognized using a library of hidden Markov models comprising CATH structural and Pfam functional domains. Domain architectures are then determined using DomainFinder (Pearl et al., Protein Sci 2002;11:233-244) and the protein family and domain architecture data are amalgamated in the Gene3D database (Buchan et al., Genome Res 2002;12:503-514). Using Gene3D, we have investigated protein sequence space, the extent of structural annotation, and the distribution of different domain architectures in completed genomes from all kingdoms of life. As with earlier studies by other researchers, the distribution of domain families shows power-law behavior such that the largest 2,000 domain families can be mapped to approximately 70% of nonsingleton genome sequences; the remaining sequences are assigned to much smaller families. While approximately 50% of domain annotations within a genome are assigned to 219 universal domain families, a much smaller proportion (< 10%) of protein sequences are assigned to universal protein families. This supports the mosaic theory of evolution whereby domain duplication followed by domain shuffling gives rise to novel domain architectures that can expand the protein functional repertoire of an organism. Functional data (e.g. COG/KEGG/GO) integrated within Gene3D result in a comprehensive resource that is currently being used in structural genomics initiatives and can be accessed via http://www.biochem.ucl.ac.uk/bsm/cath/Gene3D/.
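The second clustering stage named above, complete linkage clustering by sequence identity, can be sketched in plain Python. This is a generic textbook version under stated assumptions, not the PFscape implementation: the sequence names, identity values, and threshold are hypothetical, and the Markov clustering stage is not shown.

```python
def complete_linkage(items, identity, threshold):
    """Greedy agglomerative clustering with complete linkage.

    Two clusters may merge only if every cross-cluster pair has
    identity >= threshold; the pair of clusters with the highest
    minimum identity is merged first.
    """
    clusters = [[item] for item in items]
    while True:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: score a cluster pair by its weakest link.
                link = min(identity(a, b) for a in clusters[i] for b in clusters[j])
                if link >= threshold and (best is None or link > best[0]):
                    best = (link, i, j)
        if best is None:
            return clusters
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]


# Hypothetical pairwise identities (fractions), for illustration only.
toy_identity = {
    frozenset(("s1", "s2")): 0.9,
    frozenset(("s1", "s3")): 0.4,
    frozenset(("s2", "s3")): 0.8,
}
lookup = lambda a, b: toy_identity[frozenset((a, b))]
print(complete_linkage(["s1", "s2", "s3"], lookup, threshold=0.7))
```

In the toy data s3 is similar to s2 but not to s1, so complete linkage keeps it out of the s1/s2 cluster, whereas single linkage would have merged all three.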

15.
Evolutionary origins of genomic repertoires in bacteria   Total citations: 7, self-citations: 0, citations by others: 7
Explaining the diversity of gene repertoires has been a major problem in modern evolutionary biology. In eukaryotes, this diversity is believed to result mainly from gene duplication and loss, but in prokaryotes, lateral gene transfer (LGT) can also contribute substantially to genome contents. To determine the histories of gene inventories, we conducted an exhaustive analysis of gene phylogenies for all gene families in a widely sampled group, the γ-Proteobacteria. We show that, although these bacterial genomes display striking differences in gene repertoires, most gene families having representatives in several species have congruent histories. Other than the few vast multigene families, gene duplication has contributed relatively little to the contents of these genomes; instead, LGT, over time, provides most of the diversity in genomic repertoires. Most such acquired genes are lost, but the majority of those that persist in genomes are transmitted strictly vertically. Although our analyses are limited to the γ-Proteobacteria, these results resolve a long-standing paradox, i.e., the ability to make robust phylogenetic inferences in light of substantial LGT.

16.
POTRA (for polypeptide-transport-associated domain) is a novel domain identified in proteins of the ShlB, Toc75, D15 and FtsQ/DivIB families. In most cases, the POTRA domain is associated with a beta-barrel outer membrane domain and its function has been experimentally related to polypeptide transport in Toc75 (Tic-Toc protein import system in chloroplast) and ShlB families. In addition to potential key roles in protein transport across the outer membrane and in bacterial septation, the POTRA domain has attractive features for vaccine development in diseases such as cholera, meningitis, gonorrhoea and syphilis.

17.
During the past few years, substantial progress has been accomplished in the elucidation of the structural diversity of the lectin repertoires of invertebrates, protochordates and ectothermic vertebrates, providing particularly valuable information on those groups that constitute the invertebrate/vertebrate 'boundary'. Although representatives of lectin families typical of mammals, such as C-type lectins, galectins and pentraxins, have been described in these taxa, the detailed study of selected model species has yielded either novel variants of the structures described for the mammalian lectin representatives or novel lectin families with unique sequence motifs, multidomain arrangements and a new structural fold. Along with the high structural diversity of the lectin repertoires in these taxa, a wide spectrum of biological roles is starting to emerge, underscoring the value of invertebrate and lower vertebrate models for gaining insight into structural, functional and evolutionary aspects of lectins.

18.
The VASP-Spred-Sprouty domain puzzle   Total citations: 3, self-citations: 0, citations by others: 3
Sprouty-related proteins with an EVH1 domain (Spreds) belong to a new protein family harboring a conserved N-terminal EVH1 domain, which is related to the VASP (vasodilator-stimulated phosphoprotein) EVH1 domain (Enabled/VASP homology 1 domain), and a C-terminal Sprouty-related domain, typical for Sprouty proteins. Spreds were, like Sproutys, initially discovered as inhibitors of the Ras/MAPK pathway, and the SPR (Sprouty-related) domains of both protein families seem to be very important for many protein interactions and cellular processes. VASP was initially characterized as a proline-rich substrate of protein kinases A and G in human platelets and later shown to be a scaffold protein, regulating both signal transduction pathways and the actin filament system. The VASP-EVH1 domain is known to bind specifically to an FP(4) binding motif, which is, for example, present in the focal adhesion proteins vinculin and zyxin. In this review we give a structural and functional overview of these three protein families and ask whether nature plays a modular protein domain puzzle with stable exchangeable elements or whether these closely related domains have various functions when pasted into a different protein context.

19.

Background  

Proteins are comprised of one or several building blocks, known as domains. Such domains can be classified into families according to their evolutionary origin. Whereas sequencing technologies have advanced immensely in recent years, there are no matching computational methodologies for large-scale determination of protein domains and their boundaries. We provide and rigorously evaluate a novel set of domain families that is automatically generated from sequence data. Our domain family identification process, called EVEREST (EVolutionary Ensembles of REcurrent SegmenTs), begins by constructing a library of protein segments that emerge in an all vs. all pairwise sequence comparison. It then proceeds to cluster these segments into putative domain families. The selection of the best putative families is done using machine learning techniques. A statistical model is then created for each of the chosen families. This procedure is then iterated: the aforementioned statistical models are used to scan all protein sequences, to recreate a library of segments and to cluster them again.

20.
Many proteins, especially in eukaryotes, contain tandem repeats of several domains from the same family. These repeats have a variety of binding properties and are involved in protein–protein interactions as well as binding to other ligands such as DNA and RNA. The rapid expansion of protein domain repeats is assumed to have evolved through internal tandem duplications. However, the exact mechanisms behind these tandem duplications are not well-understood. Here, we have studied the evolution, function, protein structure, gene structure, and phylogenetic distribution of domain repeats. For this purpose we have assigned Pfam-A domain families to 24 proteomes with more sensitive domain assignments in the repeat regions. These assignments confirmed previous findings that eukaryotes, and in particular vertebrates, contain a much higher fraction of proteins with repeats compared with prokaryotes. The internal sequence similarity in each protein revealed that the domain repeats are often expanded through duplications of several domains at a time, while the duplication of one domain is less common. Many of the repeats appear to have been duplicated in the middle of the repeat region. This is in strong contrast to the evolution of other proteins that mainly works through additions of single domains at either terminus. Further, we found that some domain families show distinct duplication patterns, e.g., nebulin domains have mainly been expanded with a unit of seven domains at a time, while duplications of other domain families involve varying numbers of domains. Finally, no common mechanism for the expansion of all repeats could be detected. We found that the duplication patterns show no dependence on the size of the domains. Further, repeat expansion in some families can possibly be explained by shuffling of exons. However, exon shuffling could not have created all repeats.
