Similar literature
Found 20 similar documents; search took 15 ms
1.
Using structural similarity clustering of protein domains: protein domain universe graph (PDUG), and a hierarchical functional annotation: gene ontology (GO) as two evolutionary lenses, we find that each structural cluster (domain fold) exhibits a distribution of functions that is unique to it. These functional distributions are functional fingerprints that are specific to characteristic structural clusters and vary from cluster to cluster. Furthermore, as the structural similarity threshold for domain clustering in the PDUG is relaxed, we observe an influx of earlier-diverged domains into clusters. These domains join clusters without destroying the functional fingerprint. These results can be understood in light of a divergent evolution scenario that posits correlated divergence of structural and functional traits in protein domains from one or a few progenitors.

2.
Recent work has shown that the network of structural similarity between protein domains exhibits a power-law distribution of edges per node. The scale-free nature of this graph, termed the protein domain universe graph or PDUG, may be reproduced via a divergent model of structural evolution. The performance of this model, however, does not preclude the existence of a successful convergent model. To further resolve the issue of protein structural evolution, we explore the predictions of both convergent and divergent models directly. We show that when nodes from the PDUG are partitioned into subgraphs on the basis of their occurrence in the proteomes of particular organisms, these subgraphs exhibit a scale-free nature as well. We explore a simple convergent model of structural evolution and find that the implications of this model are inconsistent with features of these organismal subgraphs. Importantly, we find that biased convergent models are inconsistent with our data. We find that when speciation mechanisms are added to a simple divergent model, subgraphs similar to the organismal subgraphs are produced, demonstrating that dynamic models can easily explain the distributions of structural similarity that exist within proteomes. We show that speciation events must be included in a divergent model of structural evolution to account for the non-random overlap of structural proteomes. These findings have implications for the long-standing debate over convergent and divergent models of protein structural evolution, and for the study of the evolution of organisms as a whole.

3.
Many proteins consist of several structural domains. These multi-domain proteins have likely been generated by selective genome growth dynamics during evolution to perform new functions as well as to create structures that fold on a biologically feasible time scale. Domain units frequently evolved through a variety of genetic shuffling mechanisms. Here we examine the protein domain statistics of more than 1000 organisms including eukaryotic, archaeal and bacterial species. The analysis extends earlier findings on asymmetric statistical laws for the proteome to a wider variety of species. While proteins are composed of a wide range of domains, displaying a power-law decay, the computation of domain families for each protein reveals an exponential distribution, characterizing a protein universe composed of a thin number of unique families. Structural studies in proteomics have shown that domain repeats, or internal duplicated domains, represent a small but significant fraction of the genome. In spite of its importance, this observation has been largely overlooked until recently. We model the evolutionary dynamics of the proteome and demonstrate that these distinct distributions are in fact rooted in an internal duplication mechanism. This process generates the contemporary protein structural domain universe, determines its reduced thickness, and tames its growth. These findings have important implications, ranging from protein interaction network modeling to evolutionary studies based on fundamental mechanisms governing genome expansion.

4.
In the field of evolutionary structural genomics, methods are needed to evaluate why genomes evolved to contain the fold distributions that are observed. To study the effects of population dynamics in the evolved genomes, we need fast and accurate evolutionary models that can analyze the effects of selection, drift and fixation of a protein sequence in a population and that are grounded in physical parameters governing the folding and binding properties of the sequence. In this study, various knowledge-based, force field, and statistical methods for protein folding have been evaluated with four different folds: SH2 domains, SH3 domains, Globin-like, and Flavodoxin-like, to evaluate the speed and accuracy of the energy functions. Similarly, knowledge-based and force field methods have been used to predict ligand binding specificity in the SH2 domain. To illustrate the applicability of these methods, the dynamics of evolution of new binding capabilities by an SH2 domain is demonstrated.

5.
Protein evolution within a structural space   Cited by: 2 (self-citations: 1, citations by others: 1)
Understanding of the evolutionary origins of protein structures represents a key component of the understanding of molecular evolution as a whole. Here we seek to elucidate how the features of an underlying protein structural “space” might impact protein structural evolution. We approach this question using lattice polymers as a completely characterized model of this space. We develop a measure of structural comparison of lattice structures that is analogous to the one used to understand structural similarities between real proteins. We use this measure of structural relatedness to create a graph of lattice structures and compare this graph (in which nodes are lattice structures and edges are defined using structural similarity) to the graph obtained for real protein structures. We find that the graph obtained from all compact lattice structures exhibits a distribution of structural neighbors per node consistent with a random graph. We also find that subgraphs of 3500 nodes chosen either at random or according to physical constraints represent random graphs. We develop a divergent evolution model based on the lattice space which produces graphs that, within certain parameter regimes, recapitulate the scale-free behavior observed in similar graphs of real protein structures.

6.
It has recently been discovered that many biological systems, when represented as graphs, exhibit a scale-free topology. One such system is the set of structural relationships among protein domains. The scale-free nature of this and other systems has previously been explained using network growth models that, although motivated by biological processes, do not explicitly consider the underlying physics or biology. In this work we explore a sequence-based model for the evolution of protein structures and demonstrate that this model is able to recapitulate the scale-free nature observed in graphs of real protein structures. We find that this model also reproduces other statistical features of the protein domain graph. This represents, to our knowledge, the first such microscopic, physics-based evolutionary model for a scale-free network of biological importance and as such has strong implications for our understanding of the evolution of protein structures and of other biological networks.

7.
Gendoo DM, Harrison PM. PLoS ONE, 2011, 6(11): e27342
The HET-s prion-forming domain from the filamentous fungus Podospora anserina is gaining considerable interest since it yielded the first well-defined atomic structure of a functional amyloid fibril. This structure has been identified as a left-handed beta solenoid with a triangular hydrophobic core. To delineate the origins of the HET-s prion-forming protein and to discover other amyloid-forming proteins, we searched for all homologs of the HET-s protein in a database of protein domains and fungal genomes, using a combined application of HMM, psi-blast and pGenThreader techniques, and performed a comparative evolutionary analysis of the N-terminal alpha-helical domain and the C-terminal prion-forming domain of HET-s. By assessing the tandem evolution of both domains, we observed that the prion-forming domain is restricted to Sordariomycetes, with a marginal additional sequence homolog in Arthroderma otae as a likely case of horizontal transfer. This suggests innovation and rapid evolution of the solenoid fold in the Sordariomycetes clade. In contrast, the N-terminal domain evolves at a slower rate (in Sordariomycetes) and spans many diverse clades of fungi. We performed a full three-dimensional protein threading analysis on all identified HET-s homologs against the HET-s solenoid fold, and present detailed structural annotations for identified structural homologs to the prion-forming domain. An analysis of the physicochemical characteristics in our set of structural models indicates that the HET-s solenoid shape can be readily adopted in these homologs, but that they are all less optimized for fibril formation than the P. anserina HET-s sequence itself, due chiefly to the presence of fewer asparagine ladders and salt bridges. Our combined structural and evolutionary analysis suggests that the HET-s shape has "limited scope" for amyloidosis across the wider protein universe, compared to the 'generic' left-handed beta helix. 
We discuss the implications of our findings for the future identification of amyloid-forming proteins sharing the solenoid fold.

8.
Convergent evolution of domain architectures (is rare)   Cited by: 4 (self-citations: 0, citations by others: 4)
MOTIVATION: In this paper, we shall examine the evolution of domain architectures across 62 genomes of known phylogeny including all kingdoms of life. We look in particular at the possibility of convergent evolution, with a view to determining the extent to which the architectures observed in the genomes are due to functional necessity or evolutionary descent. We used domains of known structure, because from this and other information we know their evolutionary relationships. We use a range of methods including phylogenetic grouping, sequence similarity/alignment, mutation rates and comparative genomics to approach this difficult problem from several angles. RESULTS: Although we do not claim an exhaustive analysis, we conclude that between 0.4 and 4% of sequences are involved in convergent evolution of domain architectures, and expect the actual number to be close to the lower bound. We also made two incidental observations, albeit on a small sample: the events leading to convergent evolution appear to be random with no functional or structural preferences, and changes in the number of tandem repeat domains occur more readily than changes which alter the domain composition. CONCLUSION: The principal conclusion is that the observed domain architectures of the sequences in the genomes are driven by evolutionary descent rather than functional necessity. CONTACT: gough@supfam.org.

9.
10.

Background

Conserved domains are recognized as the building blocks of eukaryotic proteins. Domains showing a tendency to occur in diverse combinations (“promiscuous” domains) are involved in versatile architectures in proteins with different functions. Current models, based on global-level analyses of domain combinations in multiple genomes, have suggested that the propensity of some domains to associate with other domains in high-level architectures increases with organismal complexity. Alternative models using domain-based phylogenetic trees propose that domains have become promiscuous independently in different lineages through convergent evolution and are, thus, random with no functional or structural preferences. Here we test whether complex protein architectures have occurred by accretion from simpler systems and whether the appearance of multidomain combinations parallels organismal complexity. As a model, we analyze the modular evolution of the PWWP domain and ask whether its appearance in combinations with other domains into multidomain architectures is linked with the occurrence of more complex life-forms. Whether high-level combinations of domains are conserved and transmitted as stable units (cassettes) through evolution is examined in the genomes of plant or metazoan species selected for their established position in the evolution of the respective lineages.

Results

Using the domain-tree approach, we analyze the evolutionary origins and distribution patterns of the promiscuous PWWP domain to understand the principles of its modular evolution and its existence in combination with other domains in higher-level protein architectures. We found that as a single module the PWWP domain occurs only in proteins with a limited, mainly species-specific distribution. Earlier, it was suggested that domain promiscuity is a fast-changing (volatile) feature shaped by natural selection and that only a few domains retain their promiscuity status throughout evolution. In contrast, our data show that most of the multidomain PWWP combinations in extant multicellular organisms (humans or land plants) are present in their unicellular ancestral relatives, suggesting they have been transmitted through evolution as conserved linear arrangements (“cassettes”). Among the most interesting biologically relevant results is the finding that the genes of the two plant Trithorax family subgroups (ATX1/2 and ATX3/4/5) have different phylogenetic origins. The two subgroups occur together in the earliest land plants Physcomitrella patens and Selaginella moellendorffii.

Conclusion

Gain/loss of a single PWWP domain is observed throughout evolution, reflecting dynamic lineage- or species-specific events. In contrast, higher-level protein architectures involving the PWWP domain have survived as stable arrangements driven by evolutionary descent. The association of PWWP domains with the DNA methyltransferases in O. tauri and in the metazoan lineage seems to have occurred independently, consistent with convergent evolution. Our results do not support models wherein more complex protein architectures involving the PWWP domain occur with the appearance of more evolutionarily advanced life forms.

11.
Most eukaryotic proteins are composed of two or more domains. These assemble in a modular manner to create new proteins usually by the acquisition of one or more domains to an existing protein. Promiscuous domains which are found embedded in a variety of proteins and co-exist with many other domains are of particular interest and were shown to have roles in signaling pathways and mediating network communication. The evolution of domain promiscuity is still an open problem, mostly due to the lack of sequenced ancestral genomes. Here we use inferred domain architectures of ancestral genomes to trace the evolution of domain promiscuity in eukaryotic genomes. We find an increase in average promiscuity along many branches of the eukaryotic tree. Moreover, domain promiscuity can proceed at almost a steady rate over long evolutionary time or exhibit lineage-specific acceleration. We also observe that many signaling and regulatory domains gained domain promiscuity around the Bilateria divergence. In addition we show that those domains that played a role in the creation of two body axes and existed before the divergence of the bilaterians from fungi/metazoan achieve a boost in their promiscuities during the bilaterian evolution.

12.
MOTIVATION: The structural interaction of proteins and their domains in networks is one of the most basic molecular mechanisms for biological cells. Topological analysis of such networks can provide an understanding of and solutions for predicting properties of proteins and their evolution in terms of domains. A single paradigm for the analysis of interactions at different layers, such as domain and protein layers, is needed. RESULTS: Applying a colored vertex graph model, we integrated two basic interaction layers under a unified model: (1) structural domains and (2) their protein/complex networks. We identified four basic and distinct elements in the model that explain protein interactions at the domain level. We searched for motifs in the networks to detect their topological characteristics using a pruning strategy and a hash table for rapid detection. We obtained the following results: first, compared with a random distribution, a substantial part of the protein interactions could be explained by domain-level structural interaction information. Second, there were distinct kinds of protein interaction patterns classified by specific and distinguishable numbers of domains. The intermolecular domain interaction was the most dominant protein interaction pattern. Third, despite the coverage of the protein interaction information differing among species, the similarity of their networks indicated shared architectures of protein interaction network in living organisms. Remarkably, there were only a few basic architectures in the model (>10 for a 4-node network topology), and we propose that most biological combinations of domains into proteins and complexes can be explained by a small number of key topological motifs. CONTACT: doheon@kaist.ac.kr.

13.
New directions in biology are being driven by the complete sequencing of genomes, which has given us the protein repertoires of diverse organisms from all kingdoms of life. In tandem with this accumulation of sequence data, worldwide structural genomics initiatives, advanced by the development of improved technologies in X-ray crystallography and NMR, are expanding our knowledge of structural families and increasing our fold libraries. Methods for detecting remote sequence similarities have also been made more sensitive and this means that we can map domains from these structural families onto genome sequences to understand how these families are distributed throughout the genomes and reveal how they might influence the functional repertoires and biological complexities of the organisms. We have used robust protocols to assign sequences from completed genomes to domain structures in the CATH database, allowing up to 60% of domain sequences in these genomes, depending on the organism, to be assigned to a domain family of known structure. Analysis of the distribution of these families throughout bacterial genomes identified more than 300 universal families, some of which had expanded significantly in proportion to genome size. These highly expanded families are primarily involved in metabolism and regulation and appear to make major contributions to the functional repertoire and complexity of bacterial organisms. When comparisons are made across all kingdoms of life, we find a smaller set of universal domain families (approx. 140), of which families involved in protein biosynthesis are the largest conserved component. Analysis of the behaviour of other families reveals that some (e.g. those involved in metabolism, regulation) have remained highly innovative during evolution, making it harder to trace their evolutionary ancestry. 
Structural analyses of metabolic families provide some insights into the mechanisms of functional innovation, which include changes in domain partnerships and significant structural embellishments leading to modulation of active sites and protein interactions.

14.
Multidomain proteins form in evolution through the concatenation of domains, but structural domains may comprise multiple segments of the chain. In this work, we demonstrate that new multidomain architectures can evolve by an apparent three-dimensional swap of segments between structurally similar domains within a single-chain monomer. By a comprehensive structural search of the current Protein Data Bank (PDB), we identified 32 well-defined segment-swapped proteins (SSPs) belonging to 18 structural families. Nearly 13% of all multidomain proteins in the PDB may have a segment-swapped evolutionary precursor as estimated by more permissive searching criteria. The formation of SSPs can be explained by two principal evolutionary mechanisms: (i) domain swapping and fusion (DSF) and (ii) circular permutation (CP). By large-scale comparative analyses using structural alignment and hidden Markov model methods, it was found that the majority of SSPs have evolved via the DSF mechanism, and a much smaller fraction, via CP. Functional analyses further revealed that segment swapping, which results in two linkers connecting the domains, may impart directed flexibility to multidomain proteins and contributes to the development of new functions. Thus, inter-domain segment swapping represents a novel general mechanism by which new protein folds and multidomain architectures arise in evolution, and SSPs have structural and functional properties that make them worth defining as a separate group.

15.
For the past four decades the compositional organization of the mammalian genome posed a formidable challenge to molecular evolutionists attempting to explain it from an evolutionary perspective. Unfortunately, most of the explanations adhered to the “isochore theory,” which has long been rebutted. Recently, an alternative compositional domain model was proposed depicting the human and cow genomes as composed mostly of short compositionally homogeneous and nonhomogeneous domains and a few long ones. We test the validity of this model through a rigorous sequence-based analysis of eleven completely sequenced mammalian and avian genomes. Seven attributes of compositional domains are used in the analyses: (1) the number of compositional domains, (2) compositional domain-length distribution, (3) density of compositional domains, (4) genome coverage by the different domain types, (5) degree of fit to a power-law distribution, (6) compositional domain GC content, and (7) the joint distribution of GC content and length of the different domain types. We discuss the evolution of these attributes in light of two competing phylogenetic hypotheses that differ from each other in the validity of clade Euarchontoglires. If valid, the murid genome compositional organization would be a derived state and exhibit a high similarity to that of other mammals. If invalid, the murid genome compositional organization would be closer to an ancestral state. We demonstrate that the compositional organization of the murid genome differs from those of primates and laurasiatherians, a phenomenon previously termed the “murid shift,” and in many ways resembles the genome of opossum. We find no support for the “isochore theory.” Instead, our findings depict the mammalian genome as a tapestry of mostly short homogeneous and nonhomogeneous domains and a few long ones, thus providing strong evidence in favor of the compositional domain model, and seem to invalidate clade Euarchontoglires.

16.
The linear sequence of genomes exists within the three-dimensional space of the cell nucleus. The spatial arrangement of genes and chromosomes within the interphase nucleus is nonrandom and gives rise to specific patterns. While recent work has begun to describe some of the positioning patterns of chromosomes and gene loci, the structural constraints that are responsible for nonrandom positioning and the relevance of spatial genome organization for genome expression are unclear. Here we discuss potential functional consequences of spatial genome organization and we speculate on the possible molecular mechanisms of how genomes are organized within the space of the mammalian cell nucleus.

17.
Protein domains represent the basic evolutionary units that form proteins. Domain duplication and shuffling by recombination are probably the most important forces driving protein evolution and hence the complexity of the proteome. While the duplication of whole genes as well as domain-encoding exons increases the abundance of domains in the proteome, domain shuffling increases versatility, i.e. the number of distinct contexts in which a domain can occur. Here, we describe a comprehensive, genome-wide analysis of the relationship between these two processes. We observe a strong and robust correlation between domain versatility and abundance: domains that occur more often also have many different combination partners. This supports the view that domain recombination occurs in a random way. However, we do not observe all the different combinations that are expected from a simple random recombination scenario, and this is due to frequent duplication of specific domain combinations. When we simulate the evolution of the protein repertoire considering stochastic recombination of domains followed by extensive duplication of the combinations, we approximate the observed data well. Our analyses are consistent with a stochastic process that governs domain recombination and thus protein divergence with respect to domains within a polypeptide chain. At the same time, they support a scenario in which domain combinations are formed only once during the evolution of the protein repertoire, and are then duplicated to various extents. The extent of duplication of different combinations varies widely and, in nature, will depend on selection for the domain combination based on its function. Some of the pair-wise domain combinations that are highly duplicated also recur frequently with other partner domains, and thus represent evolutionary units larger than single protein domains, which we term "supra-domains".

18.
Chloroplast transit peptides: structure, function and evolution   Cited by: 21 (self-citations: 0, citations by others: 21)
It is thought that two to three thousand different proteins are targeted to the chloroplast, and the 'transit peptides' that act as chloroplast targeting sequences are probably the largest class of targeting sequences in plants. At a primary structural level, transit peptide sequences are highly divergent in length, composition and organization. An emerging concept suggests that transit peptides contain multiple domains that provide either distinct or overlapping functions. These functions include direct interaction with envelope lipids, chloroplast receptors and the stromal processing peptidase. The genomic organization of transit peptides suggests that these domains might have originated from distinct exons, which were shuffled and streamlined throughout evolution to yield a modern, multifunctional transit peptide. Although still poorly characterized, this evolutionary process could yield transit peptides with different domain organizations. The plasticity of transit peptide design is consistent with the diverse biological functions of chloroplast proteins.

19.
MOTIVATION: In our previous studies, we developed discrete-space birth, death and innovation models (BDIMs) of genome evolution. These models explain the origin of the characteristic Pareto distribution of paralogous gene family sizes in genomes, and model parameters that provide for the evolution of these distributions within a realistic time frame have been identified. However, extracting the temporal dynamics of genome evolution from discrete-space BDIM was not technically feasible. We were interested in obtaining dynamic portraits of the genome evolution process by developing a diffusion approximation of BDIM. RESULTS: The diffusion version of BDIM belongs to a class of continuous-state models whose dynamics is described by the Fokker-Planck equation and the stationary solution could be any specified Pareto function. The diffusion models have time-dependent solutions of a special kind, namely, generalized self-similar solutions, which describe the transition from one stationary distribution of the system to another; this provides for the possibility of examining the temporal dynamics of genome evolution. Analysis of the generalized self-similar solutions of the diffusion BDIM reveals a biphasic curve of genome growth in which the initial, relatively short, self-accelerating phase is followed by a prolonged phase of slow deceleration. This evolutionary dynamics was observed both when genome growth started from zero and proceeded via innovation (a potential model of primordial evolution), and when evolution proceeded from one stationary state to another. In biological terms, this regime of evolution can be tentatively interpreted as a punctuated-equilibrium-like phenomenon whereby evolutionary transitions are accompanied by rapid gene amplification and innovation, followed by slow relaxation to a new stationary state.

20.
Ponting CP, Dickens NJ. Genome Biology, 2001, 2(7): comment2006.1-comment2006.6
The evolutionary history of eukaryotic proteins involves rapid sequence divergence, addition and deletion of domains, and fusion and fission of genes. Although the protein repertoires of distantly related species differ greatly, their domain repertoires do not. To account for the great diversity of domain contexts and an unexpected paucity of ortholog conservation, we must categorize the coding regions of completely sequenced genomes into domain families, as well as protein families.
