Similar articles
1.
Most eukaryotic proteins consist of multiple domains created through gene fusions or internal duplications. The most frequent change of a domain architecture (DA) is insertion or deletion of a domain at the N or C terminus. Still, the mechanisms underlying the evolution of multidomain proteins are not very well studied. Here, we have studied the evolution of multidomain architectures (MDA), guided by evolutionary information in the form of a phylogenetic tree. Our results show that Pfam domain families and MDAs have been created with comparable rates (0.1-1 per million years (My)). The major changes in DA evolution have occurred in the process of multicellularization and within the metazoan lineage. In contrast, creation of domains seems to have been frequent already in the early evolution. Furthermore, most of the architectures have been created from older domains or architectures, whereas novel domains are mainly found in single-domain proteins. However, a particular group of exon-bordering domains may have contributed to the rapid evolution of novel multidomain proteins in metazoan organisms. Finally, MDAs have evolved predominantly through insertions of domains, whereas domain deletions are less common. In conclusion, the rate of creation of multidomain proteins has accelerated in the metazoan lineage, which may partly be explained by the frequent insertion of exon-bordering domains into new architectures. However, our results indicate that other factors have contributed as well.

2.
Kim H  Sung S  Klein R 《Genetica》2007,131(1):59-68
In order to examine the evolution of lineage-specific genes, we analyzed intron phase distributions and exon-bordering domains in primate- and rodent-specific genes. We found that the expansion of symmetric exon-bordering domains could not explain the evolution of lineage-specific genes. Rather, internal intron loss of a domain can partially explain the excess of class 1–1 intron phases in the lineage-specific genes. We suggest that the event that led to an excess of symmetric exons in lineage-specific genes had little bearing on shaping the phenotypes specific to the individual lineage. Instead, Kruppel-associated box (KRAB) proteins associated with the zinc finger C2H2 (zf-C2H2) type are likely to be responsible for the lineage-specific function.

3.
Diversity and evolution of the thyroglobulin type-1 domain superfamily
Multidomain proteins are gaining increasing consideration for their puzzling, flexible utilization in nature. The presence of the characteristic thyroglobulin type-1 (Tg1) domain as a protein module in a variety of multicellular organisms suggests pivotal roles for this building block. To gain insight into the evolution of Tg1 domains, we performed searches of protein, expressed sequence tag, and genome databases. Tg1 domains were found to be Metazoa specific, and we retrieved a total of 170 Tg1 domain-containing protein sequences. Their architectures revealed a wide taxonomic distribution of proteins containing Tg1 domains followed or preceded by secreted protein, acidic, rich in cysteines (SPARC)-type extracellular calcium-binding domains. Other proteins contained lineage-specific domain combinations of peptidase inhibitory modules or domains with different biological functions. Phylogenetic analysis showed that Tg1 domains are highly conserved within protein structures, whereas insertion into novel proteins is followed by rapid diversification. Seven different basic types of protein architecture containing the Tg1 domain were identified in vertebrates. We examined the evolution of these protein groups by combining Tg1 domain phylogeny with additional analyses based on other characteristic domains. Testicans and secreted modular calcium binding protein (SMOCs) evolved from invertebrate homologs by introduction of vertebrate-specific domains, nidogen evolved by insertion of a Tg1 domain into a preexisting architecture, and the remaining four have unique architectures. Thyroglobulin, Trops, and the major histocompatibility complex class II-associated invariant chain are vertebrate specific, while an insulin-like growth factor-binding protein and nidogen were also identified in urochordates. Among vertebrates, we observed differences in protein repertoires, which result from gene duplication and domain duplication. 
Members of five groups have been characterized at the molecular level. All exhibit subtle differences in their specificities and function either as peptidase inhibitors (thyropins), substrates, or both. As far as the sequence is concerned, only a few conserved residues were identified. In combination with structural data, our analysis shows that the Tg1 domain fold is highly adaptive and comprises a relatively well-conserved core surrounded by highly variable loops that account for its multipurpose function in the animal kingdom.

4.

Background

Conserved domains are recognized as the building blocks of eukaryotic proteins. Domains showing a tendency to occur in diverse combinations ("promiscuous" domains) are involved in versatile architectures in proteins with different functions. Current models, based on global-level analyses of domain combinations in multiple genomes, have suggested that the propensity of some domains to associate with other domains in high-level architectures increases with organismal complexity. Alternative models using domain-based phylogenetic trees propose that domains have become promiscuous independently in different lineages through convergent evolution and are, thus, random with no functional or structural preferences. Here we test whether complex protein architectures have occurred by accretion from simpler systems and whether the appearance of multidomain combinations parallels organismal complexity. As a model, we analyze the modular evolution of the PWWP domain and ask whether its appearance in combinations with other domains into multidomain architectures is linked with the occurrence of more complex life-forms. Whether high-level combinations of domains are conserved and transmitted as stable units (cassettes) through evolution is examined in the genomes of plant or metazoan species selected for their established position in the evolution of the respective lineages.

Results

Using the domain-tree approach, we analyze the evolutionary origins and distribution patterns of the promiscuous PWWP domain to understand the principles of its modular evolution and its existence in combination with other domains in higher-level protein architectures. We found that as a single module the PWWP domain occurs only in proteins with a limited, mainly species-specific distribution. Earlier, it was suggested that domain promiscuity is a fast-changing (volatile) feature shaped by natural selection and that only a few domains retain their promiscuity status throughout evolution. In contrast, our data show that most of the multidomain PWWP combinations in extant multicellular organisms (humans or land plants) are present in their unicellular ancestral relatives, suggesting they have been transmitted through evolution as conserved linear arrangements ("cassettes"). Among the most interesting biologically relevant results is the finding that the genes of the two plant Trithorax family subgroups (ATX1/2 and ATX3/4/5) have different phylogenetic origins. The two subgroups occur together in the earliest land plants Physcomitrella patens and Selaginella moellendorffii.

Conclusion

Gain/loss of a single PWWP domain is observed throughout evolution, reflecting dynamic lineage- or species-specific events. In contrast, higher-level protein architectures involving the PWWP domain have survived as stable arrangements driven by evolutionary descent. The association of PWWP domains with the DNA methyltransferases in O. tauri and in the metazoan lineage seems to have occurred independently, consistent with convergent evolution. Our results do not support models wherein more complex protein architectures involving the PWWP domain occur with the appearance of more evolutionarily advanced life forms.

5.
Huang QS  Xie XL  Liang G  Gong F  Wang Y  Wei XQ  Wang Q  Ji ZL  Chen QX 《Glycobiology》2012,22(1):23-34
The glycoside hydrolase 18 (GH18) family of chitinases is a multigene family that plays roles in processes such as ecdysis, embryonic development and allergic inflammation. Efforts are still needed to reveal their functional diversification in an evolutionary and systematic manner. We collected 85 GH18 genes from eukaryotic representatives. The domain architectures of GH18 proteins were analyzed and several conserved patterns were identified. It was observed that some GH18 members (11 proteins) in Ecdysozoa or fungi possess repeats of catalytic domains and/or chitin-binding domains (ChtBs). The domain repeats are likely to meet requirements for higher efficiency of chitin degradation in chitin-containing species. In contrast, all vertebrate GH18 proteins contain no more than one catalytic domain or ChtB. The results from homology analysis, domain architectures, exon arrangements and syntenic loci supported two evolutionary paths for the GH18 family. One path experienced gene expansion and contraction several times during evolution, covering most GH18 members except CHID1 (stabilin-1 interacting partner) and its homologs. Proteins in this path underwent frequent domain gain and loss, as well as domain recombination, which could achieve versatility in function. The other path is comparatively conserved. The CHID1 gene evolved without gene duplication except in Danio rerio. Domain architectures of CHID1 orthologs are all identical. The diverse phylogeny of the GH18 family in arthropods is also presented.

6.
7.
Phosphatidylinositol phosphates (PIPs, phosphoinositides) are localized to the membranes of all cellular compartments and play pivotal roles in multiple cellular events. To fulfill their functions, PIPs that are located in specific organelles or membrane domains bind to and recruit various proteins in a spatiotemporally specific manner via protein domains that selectively bind to either a single PIP or an array of PIPs. In Entamoeba histolytica, the human intestinal protozoan parasite, PIPs and PIP-binding proteins have been shown to be involved in virulence-associated mechanisms such as cell motility, vesicular traffic, trogocytosis and phagocytosis. In silico search of the domains and the signatures implicated in PIP binding in the E. histolytica proteome allows identification of dozens of potential PIP-binding proteins. However, such analysis is often misleading unless the protein domain used as query is cautiously selected and the binding specificity of the proteins is experimentally validated. This is because not all of the domains initially presumed to bind PIPs in other systems are capable of PIP binding; some are instead involved in other biological roles. In this review, we carried out an in silico survey of proteins that have PIP-binding domains in the E. histolytica genome by utilizing only validated PIP-binding domains that had been experimentally proven to be faithful PIP-binding bioprobes. Our survey has identified that FYVE (Fab1, YOTB1, Vac1, EEA1) and PH (pleckstrin homology) domain-containing proteins are the most expanded families in E. histolytica. A few FYVE domain-containing proteins (EhFP4 and 10) and phox homology (PX) domain-containing proteins (EhSNX1 and 2) were previously studied in depth in E. histolytica. Furthermore, most of the identified PH domain-containing proteins are annotated as protein kinases and possess protein kinase domains.
Overall, PIP-binding domain-containing proteins that can be identified by in silico survey of the genome using the domains from well characterized bioprobes are limited in E. histolytica. However, their domain architectures are often unique, suggesting unique evolution of PIP-binding domain-containing proteins in this organism.

8.

Background

Chromosome conformation capture studies suggest that eukaryotic genomes are organized into structures called topologically associating domains. The borders of these domains are highly enriched for architectural proteins with characterized roles in insulator function. However, a majority of architectural protein binding sites localize within topological domains, suggesting sites associated with domain borders represent a functionally different subclass of these regulatory elements. How topologically associating domains are established and what differentiates border-associated from non-border architectural protein binding sites remain unanswered questions.

Results

By mapping the genome-wide target sites for several Drosophila architectural proteins, including previously uncharacterized profiles for TFIIIC and SMC-containing condensin complexes, we uncover an extensive pattern of colocalization in which architectural proteins establish dense clusters at the borders of topological domains. Reporter-based enhancer-blocking insulator activity as well as endogenous domain border strength scale with the occupancy level of architectural protein binding sites, suggesting co-binding by architectural proteins underlies the functional potential of these loci. Analyses in mouse and human stem cells suggest that clustering of architectural proteins is a general feature of genome organization, and conserved architectural protein binding sites may underlie the tissue-invariant nature of topologically associating domains observed in mammals.

Conclusions

We identify a spectrum of architectural protein occupancy that scales with the topological structure of chromosomes and the regulatory potential of these elements. Whereas high occupancy architectural protein binding sites associate with robust partitioning of topologically associating domains and robust insulator function, low occupancy sites appear reserved for gene-specific regulation within topological domains.

9.
Database searches of the Caenorhabditis elegans and human genomic DNA sequences revealed genes encoding ribonuclease H1 (RNase H1) and RNase H2 in each genome. The human genome contains a single copy of each gene, whereas C. elegans has four genes encoding RNase H1-related proteins and one gene for RNase H2. By analyzing the mRNAs produced from the C. elegans genes, examining the amino acid sequence of the predicted protein, and expressing the proteins in Escherichia coli, we have identified two active RNase H1-like proteins. One is similar to other eukaryotic RNases H1, whereas the second RNase H (rnh-1.1) is unique. The rnh-1.0 gene is transcribed as a dicistronic message with three dsRNA-binding domains; the mature mRNA is trans-spliced with the SL2 splice leader and contains only one dsRNA-binding domain. Formation of RNase H1 is further regulated by differential cis-splicing events. A single rnh-2 gene, encoding a protein similar to several other eukaryotic RNase H2Ls, also has been examined. The diversity and enzymatic properties of RNase H homologues are other examples of expansion of protein families in C. elegans. The presence of two RNases H1 in C. elegans suggests that two enzymes are required in this rather simple organism to perform the functions that are accomplished by a single enzyme in more complex organisms. Phylogenetic analysis indicates that the active C. elegans RNases H1 are distantly related to one another and that the C. elegans RNase H1 is more closely related to the human RNase H1. The database searches also suggest that RNase H domains of LTR-retrotransposons in C. elegans are quite unrelated to cellular RNases H1, but numerous RNase H domains of human endogenous retroviruses are more closely related to cellular RNases H.

10.
BAR domains are found in proteins that bind and remodel membranes and participate in cytoskeletal and nuclear processes. Here, we report the crystal structure of the BAR domain from the human Bin1 protein at 2.0 Å resolution. Both the quaternary and tertiary architectures of the homodimeric Bin1BAR domain are built upon "knobs-into-holes" packing of side chains, like those found in conventional left-handed coiled-coils, and this packing governs the curvature of a putative membrane-engaging concave face. Our calculations indicate that the Bin1BAR domain contains two potential sites for protein-protein interactions on the convex face of the dimer. Comparative analysis of structural features reveals that at least three architectural subtypes of the BAR domain are encoded in the human genome, represented by the Arfaptin, Bin1/Amphiphysin, and IRSp53 BAR domains. We discuss how these principal groups may differ in their potential to form regulatory heterotypic interactions.

11.
Ubiquitin-associated (UBA) domains are found in a large number of proteins with diverse functions involved in ubiquitination, DNA repair, and signaling pathways. Recent studies have shown that several UBA domain proteins interact with ubiquitin (Ub), specifically p62, the phosphotyrosine-independent ligand of the SH2 domain of p56(lck); HHR23A, a human nucleotide excision repair protein; and DDI1, another damage-inducible protein. NMR chemical shift mapping reveals that Ub binds specifically but weakly to a conserved hydrophobic epitope on HHR23A UBA(1) and UBA(2) and that the UBA domains bind on the hydrophobic patch on the surface of the five-stranded beta-sheet of Ub. Models of the UBA(1)-Ub and UBA(2)-Ub complexes obtained from de novo docking reveal different orientations of the UBA domains on the Ub surface compared with those obtained by homology modeling with the related CUE domains, which also bind Ub. Our results suggest that UBA domains may interact with Ub as well as other proteins in more than one way while utilizing the same binding surface.

12.
MAPL (mitochondria-associated protein ligase, also called MULAN/GIDE/MUL1) is a multifunctional mitochondrial outer membrane protein found in human cells that contains a unique BAM (beside a membrane) domain and a C-terminal RING-finger domain. MAPL has been implicated in several processes that occur in animal cells such as NF-κB activation, innate immunity and antiviral signaling, suppression of PINK1/parkin defects, mitophagy in skeletal muscle, and caspase-dependent apoptosis. Previous studies demonstrated that the BAM domain is present in diverse organisms in which most of these processes do not occur, including plants, archaea, and bacteria. Thus the conserved function of MAPL and its BAM domain remains an open question. In order to gain insight into its conserved function, we investigated the evolutionary origins of MAPL by searching for homologues in predicted proteomes of diverse eukaryotes. We show that MAPL proteins with a conserved BAM-RING architecture are present in most animals, protists closely related to animals, a single species of fungus, and several multicellular plants and related green algae. Phylogenetic analysis demonstrated that eukaryotic MAPL proteins originate from a common ancestor and not from independent horizontal gene transfers from bacteria. We also determined that two independent duplications of MAPL occurred, one at the base of multicellular plants and another at the base of vertebrates. Although no other eukaryote genome examined contained a verifiable MAPL orthologue, BAM domain-containing proteins were identified in the protists Bigelowiella natans and Ectocarpus siliculosus. Phylogenetic analyses demonstrated that these proteins are more closely related to prokaryotic BAM proteins and therefore likely arose from independent horizontal gene transfers from bacteria. We conclude that MAPL proteins with BAM-RING architectures have been present in the holozoan and viridiplantae lineages since their very beginnings.
Our work paves the way for future studies into MAPL function in alternative model organisms like Capsaspora owczarzaki and Chlamydomonas reinhardtii that will help to answer the question of MAPL’s ancestral function in ways that cannot be answered by studying animal cells alone.

13.
14.
15.

Background  

The kelch motif is an ancient and evolutionarily widespread sequence motif of 44–56 amino acids in length. It occurs as five to seven repeats that form a β-propeller tertiary structure. Over 28 kelch-repeat proteins have been sequenced and functionally characterised from diverse organisms spanning from viruses, plants and fungi to mammals, and it is evident from expressed sequence tag, domain and genome databases that many additional hypothetical proteins contain kelch-repeats. In general, kelch-repeat β-propellers are involved in protein-protein interactions; however, the modest sequence identity between kelch motifs, the diversity of domain architectures, and the partial information on this protein family in any single species all present difficulties to developing a coherent view of the kelch-repeat domain and the kelch-repeat protein superfamily. To understand the complexity of this superfamily of proteins, we have analysed by bioinformatics the complement of kelch-repeat proteins encoded in the human genome and have made comparisons to the kelch-repeat proteins encoded in other sequenced genomes.

16.
Annotations of genes and their products are largely guided by inferring homology. Sequence similarity is the primary measure used for annotation; domain content and order have been given less importance, despite the fact that domain insertions, deletions and positional changes can introduce functional variety. Several recently developed methods quantify domain architecture similarity, but they depend on alignments of the sequences and focus only on homologous proteins. We present an alignment-free domain architecture-similarity search (ADASS) algorithm that identifies proteins that share very poor sequence similarity yet have similar domain architectures. We introduce a "singlet matching-triplet comparison" method in ADASS, wherein each triplet of domains is compared with other triplets in a pairwise comparison of two domain architectures. Different events in the triplet comparison are scored according to a scoring scheme, and an average pairwise distance score (Domain Architecture Distance score, DAD score) is calculated between protein domain architectures. We use the domain architectures of a selected domain, termed the centric domain, and cluster them based on the DAD score. The algorithm has a high Positive Prediction Value (PPV) with respect to the clustering of the sequences of selected domain architectures. A comparison of domain architecture-based dendrograms built using the ADASS method and an existing method revealed that ADASS can classify proteins according to the extent of domain architecture-level similarity. ADASS is more relevant in cases of proteins with tiny domains that contribute little to the overall sequence similarity but contribute significantly to the overall function.
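To make the idea of an alignment-free, triplet-based comparison concrete, the following is a minimal sketch and not the ADASS scoring scheme, which the abstract does not specify. It assumes an architecture is an ordered list of domain names, pads it with hypothetical terminal markers `^` and `$`, and uses one minus the Jaccard index over the sets of consecutive domain triplets as a stand-in distance. The function names `domain_triplets` and `triplet_distance` are illustrative only.

```python
from typing import List, Set, Tuple

Triplet = Tuple[str, ...]


def domain_triplets(architecture: List[str]) -> Set[Triplet]:
    """Return the set of consecutive domain triplets of an architecture.

    The architecture is padded with '^' (N terminus) and '$' (C terminus),
    an assumption made here so that terminal domains also form triplets.
    """
    padded = ["^"] + list(architecture) + ["$"]
    return {tuple(padded[i:i + 3]) for i in range(len(padded) - 2)}


def triplet_distance(a: List[str], b: List[str]) -> float:
    """Illustrative distance: 1 - Jaccard index of the two triplet sets.

    This is a simple placeholder for a scored triplet comparison; it is
    0.0 for identical architectures and 1.0 when no triplet is shared.
    """
    ta, tb = domain_triplets(a), domain_triplets(b)
    union = ta | tb
    if not union:  # both architectures are empty
        return 0.0
    return 1.0 - len(ta & tb) / len(union)


if __name__ == "__main__":
    full = ["SH3", "SH2", "Kinase"]
    truncated = ["SH2", "Kinase"]
    print(triplet_distance(full, full))
    print(triplet_distance(full, truncated))
```

Because only domain order is used, two proteins with negligible sequence similarity but the same domain order get a distance of zero, which is the property the abstract emphasizes; a real implementation would replace the Jaccard step with event-specific scores.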

17.
There is a limited repertoire of domain families in nature that are duplicated and combined in different ways to form the set of proteins in a genome. Most proteins in both prokaryote and eukaryote genomes consist of two or more domains, and we show that the family size distribution of multi-domain protein families follows a power law like that of individual families. Most domain pairs occur in four to six different domain architectures: in isolation and in combinations with different partners. We showed previously that within the set of all pairwise domain combinations, most small and medium-sized families are observed in combination with one or two other families, while a few large families are very versatile and combine with many different partners. Though this may appear to be a stochastic pattern, in which large families have more combination partners by virtue of their size, we establish here that all the domain families with more than three members in genomes are duplicated more frequently than would be expected by chance considering their number of neighbouring domains. This duplication of domain pairs is statistically significant for between one and three quarters of all families with seven or more members. For the majority of pairwise domain combinations, there is no known three-dimensional structure of the two domains together, and we term these novel combinations. Novel domain combinations are interesting and important targets for structural elucidation, as the geometry and interaction between the domains will help understand the function and evolution of multi-domain proteins. Of particular interest are those combinations that occur in the largest number of multi-domain proteins, and several of these frequent novel combinations contain DNA-binding domains.
Abbreviations: SCOP: Structural Classification of Proteins database; PDB: Protein DataBank; HMM: hidden Markov model

18.
With the preponderance of multidomain proteins in eukaryotic genomes, it is essential to recognize the constituent domains and their functions. Often function involves communication across the domain interfaces, and knowledge of the interacting sites is essential to our understanding of the structure–function relationship. Using evolutionary information extracted from homologous domains in at least two diverse domain architectures (single and multidomain), we predict the interface residues corresponding to domains from the two-domain proteins. We also use information from the three-dimensional structures of individual domains of two-domain proteins to train a naïve Bayes classifier to predict the interfacial residues. Our predictions are highly accurate (~85%) and specific (~95%) to the domain–domain interfaces. This method is specific to multidomain proteins that contain domains found in more than one protein architectural context. Using predicted residues to constrain domain–domain interaction, rigid-body docking was able to provide us with accurate full-length protein structures with correct orientation of domains. We believe that these results can be of considerable interest toward rational protein and interaction design, apart from providing us with valuable information on the nature of interactions. Proteins 2014; 82:1219–1234. © 2013 Wiley Periodicals, Inc.

19.
Helminths secrete a plethora of proteins involved in parasitism-related processes such as tissue penetration, migration, feeding and immunoregulation. Astacins, a family of zinc metalloproteases belonging to the peptidase family M12, are one of the most abundantly represented protein families in the secretomes of helminths. Despite their involvement in virulence, very few studies have addressed the role of this loosely defined protein group in parasitic helminths. Herein, we have analysed the predicted proteomes from 154 helminth species and confirmed the expansion of the astacin family in several nematode taxa. The astacin domain is associated with up to 110 other domains in 145 unique domain architectures, where CUB and ShK constitute the principal and nearly independent bi-domain frameworks. The presence of co-existing domains suggests promiscuous functions adaptable to several roles. These activities could be related either to substrate specificity or to higher-order functions, such as anti-angiogenesis and immunomodulation, where the astacin domain would play an accessory role. Furthermore, some phylogenetically restricted mutations in the astacin domain affected residues located at the active cleft and binding sub-pockets, suggesting adaptation to different substrate specificities. Altogether, these findings suggest the astacin domain is a highly adaptable module that fulfils multiple proteolytic needs of the parasitic lifestyle. This study contributes to the understanding of helminth-secreted astacins and, ultimately, provides the foundation to guide future investigations about the role of this diverse family of proteins in host–parasite interactions.

20.
Domains are basic evolutionary units of proteins and most proteins have more than one domain. Advances in domain modeling and collection are making it possible to annotate a large fraction of known protein sequences by a linear ordering of their domains, yielding their architecture. Protein domain architectures link evolutionarily related proteins and underscore their shared functions. Here, we attempt to better understand this association by identifying the evolutionary pathways by which extant architectures may have evolved. We propose a model of evolution in which architectures arise through rearrangements of inferred precursor architectures and acquisition of new domains. These pathways are ranked using a parsimony principle, whereby scenarios requiring the fewest independent recombination events, namely fission and fusion operations, are assumed to be more likely. Using a data set of domain architectures present in 159 proteomes that represent all three major branches of the tree of life allows us to estimate the history of over 85% of all architectures in the sequence database. We find that the distribution of rearrangement classes is robust with respect to alternative parsimony rules for inferring the presence of precursor architectures in ancestral species. Analyzing the most parsimonious pathways, we find 87% of architectures to gain complexity over time through simple changes, among which fusion events account for 5.6 times as many architectures as fission. Our results may be used to compute domain architecture similarities, for example, based on the number of historical recombination events separating them. Domain architecture "neighbors" identified in this way may lead to new insights about the evolution of protein function.
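The parsimony ranking described above can be illustrated with a deliberately reduced sketch. It is not the authors' model: it assumes that only fusion is allowed (no fission and no acquisition of new domains), that a target architecture must be an exact concatenation of whole precursor architectures, and that the most parsimonious scenario is the one using the fewest precursor pieces. The function name `min_fusions` and the example domain names are hypothetical.

```python
from typing import Iterable, Optional, Sequence


def min_fusions(target: Sequence[str],
                precursors: Iterable[Sequence[str]]) -> Optional[int]:
    """Fewest fusion events needed to build `target` from whole precursors.

    Dynamic programming over prefixes: pieces[i] is the smallest number of
    precursor architectures whose concatenation equals target[:i].
    Returns None when the target cannot be assembled by fusion alone.
    """
    goal = tuple(target)
    pool = {tuple(p) for p in precursors if len(p) > 0}
    n = len(goal)
    unreachable = n + 1  # more pieces than any valid tiling can use
    pieces = [unreachable] * (n + 1)
    pieces[0] = 0
    for i in range(n):
        if pieces[i] == unreachable:
            continue
        for p in pool:
            j = i + len(p)
            if j <= n and goal[i:j] == p:
                pieces[j] = min(pieces[j], pieces[i] + 1)
    if pieces[n] == unreachable:
        return None
    # k pieces joined end to end require k - 1 fusion events.
    return max(pieces[n] - 1, 0)


if __name__ == "__main__":
    ancestors = [["A", "B"], ["C"], ["A"], ["B"]]
    print(min_fusions(["A", "B", "C"], ancestors))
```

In this toy setting, an architecture that is already present among the precursors costs zero events, and a scenario that fuses one two-domain precursor with one single-domain precursor is preferred over fusing three single-domain precursors.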
