Similar articles
 20 similar articles found (search time: 15 ms)
1.
Studies of microbial eukaryotes have been pivotal in the discovery of biological phenomena, including RNA editing, self-splicing RNA, and telomere addition. Here we extend this list by demonstrating that genome architecture, namely the extensive processing of somatic (macronuclear) genomes in some ciliate lineages, is associated with elevated rates of protein evolution. Using newly developed likelihood-based procedures for studying molecular evolution, we investigate 6 genes to compare 1) ciliate protein evolution to that of 3 other clades of eukaryotes (plants, animals, and fungi) and 2) protein evolution in ciliates with extensively processed macronuclear genomes to that of other ciliate lineages. In 5 of the 6 genes, ciliates are estimated to have a higher ratio of nonsynonymous/synonymous substitution rates, consistent with an increase in the rate of protein diversification in ciliates relative to other eukaryotes. Even more striking, there is a significant effect of genome architecture within ciliates, as the most divergent proteins are consistently found in those lineages with the most highly processed macronuclear genomes. We propose a model whereby genome architecture (specifically chromosomal processing, amitosis within macronuclei, and epigenetics) allows ciliates to explore protein space in a novel manner. Further, we predict that examination of diverse eukaryotes will reveal additional evidence of the impact of genome architecture on molecular evolution.

2.
Domains are basic evolutionary units of proteins and most proteins have more than one domain. Advances in domain modeling and collection are making it possible to annotate a large fraction of known protein sequences by a linear ordering of their domains, yielding their architecture. Protein domain architectures link evolutionarily related proteins and underscore their shared functions. Here, we attempt to better understand this association by identifying the evolutionary pathways by which extant architectures may have evolved. We propose a model of evolution in which architectures arise through rearrangements of inferred precursor architectures and acquisition of new domains. These pathways are ranked using a parsimony principle, whereby scenarios requiring the fewest independent recombination events, namely fission and fusion operations, are assumed to be more likely. Using a data set of domain architectures present in 159 proteomes that represent all three major branches of the tree of life allows us to estimate the history of over 85% of all architectures in the sequence database. We find that the distribution of rearrangement classes is robust with respect to alternative parsimony rules for inferring the presence of precursor architectures in ancestral species. Analyzing the most parsimonious pathways, we find 87% of architectures to gain complexity over time through simple changes, among which fusion events account for 5.6 times as many architectures as fission. Our results may be used to compute domain architecture similarities, for example, based on the number of historical recombination events separating them. Domain architecture "neighbors" identified in this way may lead to new insights about the evolution of protein function.
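The fusion/fission idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' procedure: it only checks whether a target architecture is explained by a single fusion of two known precursors or a single fission of a longer one. The function name `single_event_origin`, the tuple representation, and the example domain names are assumptions made for illustration.

```python
from typing import Iterable, Optional, Tuple

Arch = Tuple[str, ...]  # domain names in linear N-to-C order (assumed representation)


def single_event_origin(target: Arch, precursors: Iterable[Arch]) -> Optional[str]:
    """Return 'fusion', 'fission', or None for a one-event explanation of target.

    Hypothetical helper: a real analysis would also consider species trees
    and multi-step pathways, which this sketch ignores.
    """
    if not target:
        return None
    pool = set(precursors) - {target}
    # Fusion: target is the concatenation of two precursor architectures.
    for cut in range(1, len(target)):
        if target[:cut] in pool and target[cut:] in pool:
            return "fusion"
    # Fission: target is a proper N- or C-terminal piece of a longer precursor.
    n = len(target)
    for p in pool:
        if len(p) > n and (p[:n] == target or p[-n:] == target):
            return "fission"
    return None


# Illustrative precursor set (made-up example, not data from the study).
known = {("SH3",), ("SH2",), ("SH2", "Kinase", "PH")}
print(single_event_origin(("SH3", "SH2"), known))     # fusion
print(single_event_origin(("SH2", "Kinase"), known))  # fission
```

Under a parsimony ranking, an architecture explained by one such event would be preferred over one that needs several.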

3.
Most eukaryotic proteins are multi-domain proteins that are created from fusions of genes, deletions and internal repetitions. An investigation of such evolutionary events requires a method to find the domain architecture from which each protein originates. Therefore, we defined a novel measure, domain distance, which is calculated as the number of domains that differ between two domain architectures. Using this measure, the evolutionary events that distinguish a protein from its closest ancestor have been studied, and it was found that indels are more common than internal repetition and that the exchange of a domain is rare. Indels and repetitions are common at both the N- and C-termini, while they are rare between domains. The evolution of the majority of multi-domain proteins can be explained by the stepwise insertion of single domains, with the exception of repeats, for which several domains are sometimes duplicated in tandem. We show that domain distances agree with sequence similarity and semantic similarity based on gene ontology annotations. In addition, we demonstrate the use of the domain distance measure to build evolutionary trees. Finally, the evolution of multi-domain proteins is exemplified by a closer study of the evolution of two protein families, non-receptor tyrosine kinases and RhoGEFs.
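One possible reading of the domain distance described above is a multiset difference between two architectures. The sketch below follows that reading; it ignores domain order, and it counts an exchange as two differences, which may not match the paper's exact definition. The function name and the example architectures are assumptions.

```python
from collections import Counter
from typing import Sequence


def domain_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of domain copies present in one architecture but not the other.

    Assumption: order is ignored and repeats are counted per copy.
    """
    ca, cb = Counter(a), Counter(b)
    # Counter subtraction keeps only positive counts, so the two one-sided
    # differences together give the symmetric multiset difference.
    return sum(((ca - cb) + (cb - ca)).values())


# Made-up example architectures.
print(domain_distance(["SH3", "SH2", "Pkinase"], ["SH2", "Pkinase"]))  # 1
print(domain_distance(["PH", "RhoGEF"], ["RhoGEF", "PH", "PH"]))       # 1
```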

4.
Protein domain architectures (PDAs), in which single domains are linked to form multiple-domain proteins, are a major molecular form used by evolution for the diversification of protein functions. However, the design principles of PDAs remain largely uninvestigated. In this study, we constructed networks to connect domain architectures that had grown out from the same single domain for every single domain in the Pfam-A database and found that there are three main distinctive types of these networks, which suggests that evolution can exploit PDAs in three different ways. Further analysis showed that these three different types of PDA networks are each adopted by different types of protein domains, although many networks exhibit the characteristics of more than one of the three types. Our results shed light on nature's blueprint for protein architecture and provide a framework for understanding architectural design from a network perspective.

5.
6.
A consensus approach for the assignment of structural domains in proteins is presented. The approach combines a number of previously published algorithms, and takes advantage of the elevated accuracy obtained when assignments from the individual algorithms are in agreement. The consensus approach is tested on a data set of 55 protein chains, for which domain assignments from four automated methods were known, and for which crystallographers' assignments had been reported in the literature. Accuracy was found to increase in this test from 72% using individual algorithms to 100% when all four methods were in agreement. However, a consensus prediction using all four methods was only possible for 52% of the dataset. The consensus approach [using three publicly available domain assignment algorithms (PUU, DETECTIVE, DOMAK)] was then used to make domain assignments for a data set of 787 protein chains from the Protein Data Bank. Analysis of the assignments showed 55.7% of assignments could be made automatically, and of these, 13.5% were multi-domain proteins. Of the remaining 44.3% that could not be assigned by the consensus procedure, 90.4% had their domain boundaries assigned correctly by at least one of the algorithms. Once identified, these domains were analyzed for trends in their size and secondary structure class. In addition, the discontinuity of each domain along the protein chain was considered.
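The consensus rule described above (accept an assignment only when the individual methods agree) can be sketched as follows. This is an illustration under stated assumptions, not the published procedure: each method's output is reduced to a list of boundary residue positions for continuous domains, agreement means the same number of boundaries with each boundary within a tolerance `tol`, and the boundary values in the example are invented.

```python
from typing import Dict, List, Optional


def consensus_boundaries(assignments: Dict[str, List[int]], tol: int = 10) -> Optional[List[int]]:
    """Return averaged boundaries if all methods agree, else None.

    An empty list means a single-domain assignment. `tol` (in residues)
    is an assumed agreement threshold.
    """
    sorted_cuts = [sorted(cuts) for cuts in assignments.values()]
    # All methods must report the same number of domain boundaries.
    if not sorted_cuts or len({len(c) for c in sorted_cuts}) != 1:
        return None
    consensus = []
    for column in zip(*sorted_cuts):
        # Corresponding boundaries must lie within the tolerance window.
        if max(column) - min(column) > tol:
            return None
        consensus.append(round(sum(column) / len(column)))
    return consensus


# Method names come from the abstract; the positions are made up.
print(consensus_boundaries({"PUU": [120], "DETECTIVE": [124], "DOMAK": [118]}))
print(consensus_boundaries({"PUU": [120], "DETECTIVE": [], "DOMAK": [118]}))
```

Chains for which the function returns `None` correspond to the cases the abstract says could not be assigned automatically.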

7.
In animals, the innate immune system is the first line of defense against invading microorganisms, and the pattern-recognition receptors (PRRs) are the key components of this system, detecting microbial invasion and initiating innate immune defenses. Two families of PRRs, the intracellular NOD-like receptors (NLRs) and the transmembrane Toll-like receptors (TLRs), are of particular interest because of their roles in a number of diseases. Understanding the evolutionary history of these families and their pattern of evolutionary changes may lead to new insights into the functioning of this critical system. We found that the evolution of both NLR and TLR families included massive species-specific expansions and domain shuffling in various lineages, which resulted in the same domain architectures evolving independently within different lineages in a process that fits the definition of parallel evolution. This observation illustrates both the dynamics of the innate immune system and the effects of “combinatorially constrained” evolution, where existence of the limited numbers of functionally relevant domains constrains the choices of domain architectures for new members in the family, resulting in the emergence of independently evolved proteins with identical domain architectures, often mistaken for orthologs.

8.
The interface of protein structural biology, protein biophysics, molecular evolution, and molecular population genetics forms the foundations for a mechanistic understanding of many aspects of protein biochemistry. Current efforts in interdisciplinary protein modeling are in their infancy and the state of the art of such models is described. Beyond the relationship between amino acid substitution and static protein structure, protein function, and corresponding organismal fitness, other considerations are also discussed. More complex mutational processes such as insertion and deletion and domain rearrangements and even circular permutations should be evaluated. The role of intrinsically disordered proteins is still controversial, but may be increasingly important to consider. Protein geometry and protein dynamics as a deviation from static considerations of protein structure are also important. Protein expression level is known to be a major determinant of evolutionary rate and several considerations including selection at the mRNA level and the role of interaction specificity are discussed. Lastly, the relationship between modeling and needed high-throughput experimental data as well as experimental examination of protein evolution using ancestral sequence resurrection and in vitro biochemistry are presented, towards an aim of ultimately generating better models for biological inference and prediction.

9.
Lee D, Grant A, Marsden RL, Orengo C. Proteins 2005;59(3):603-615
Using a new protocol, PFscape, we undertake a systematic identification of protein families and domain architectures in 120 complete genomes. PFscape clusters sequences into protein families using a Markov clustering algorithm (Enright et al., Nucleic Acids Res 2002;30:1575-1584) followed by complete linkage clustering according to sequence identity. Within each protein family, domains are recognized using a library of hidden Markov models comprising CATH structural and Pfam functional domains. Domain architectures are then determined using DomainFinder (Pearl et al., Protein Sci 2002;11:233-244) and the protein family and domain architecture data are amalgamated in the Gene3D database (Buchan et al., Genome Res 2002;12:503-514). Using Gene3D, we have investigated protein sequence space, the extent of structural annotation, and the distribution of different domain architectures in completed genomes from all kingdoms of life. As with earlier studies by other researchers, the distribution of domain families shows power-law behavior such that the largest 2,000 domain families can be mapped to approximately 70% of nonsingleton genome sequences; the remaining sequences are assigned to much smaller families. While approximately 50% of domain annotations within a genome are assigned to 219 universal domain families, a much smaller proportion (< 10%) of protein sequences are assigned to universal protein families. This supports the mosaic theory of evolution whereby domain duplication followed by domain shuffling gives rise to novel domain architectures that can expand the protein functional repertoire of an organism. Functional data (e.g. COG/KEGG/GO) integrated within Gene3D result in a comprehensive resource that is currently being used in structural genomics initiatives and can be accessed via http://www.biochem.ucl.ac.uk/bsm/cath/Gene3D/.

10.
Sequence annotation is fundamental for studying the evolution of protein families, particularly when working with nonmodel species. Given the rapid, ever-increasing number of species receiving high-quality genome sequencing, accurate domain modeling that is representative of species diversity is crucial for understanding protein family sequence evolution and their inferred function(s). Here, we describe a bioinformatic tool called Taxon-Informed Adjustment of Markov Model Attributes (TIAMMAt) which revises domain profile hidden Markov models (HMMs) by incorporating homologous domain sequences from underrepresented and nonmodel species. Using innate immunity pathways as a case study, we show that revising profile HMM parameters to directly account for variation in homologs among underrepresented species provides valuable insight into the evolution of protein families. Following adjustment by TIAMMAt, domain profile HMMs exhibit changes in their per-site amino acid state emission probabilities and insertion/deletion probabilities while maintaining the overall structure of the consensus sequence. Our results show that domain revision can heavily impact evolutionary interpretations for some families (i.e., NLR’s NACHT domain), whereas impact on other domains (e.g., rel homology domain and interferon regulatory factor domains) is minimal due to high levels of sequence conservation across the sampled phylogenetic depth (i.e., Metazoa). Importantly, TIAMMAt revises target domain models to reflect homologous sequence variation using the taxonomic distribution under consideration by the user. TIAMMAt’s flexibility to revise any subset of the Pfam database using a user-defined taxonomic pool will make it a valuable tool for future protein evolution studies, particularly when incorporating (or focusing on) nonmodel species.

11.
The extensive family of plant terpene synthases (TPSs) generally has a bi-domain structure, yet phylogenetic analyses consistently indicate that these synthases have evolved from larger diterpene synthases. In particular, duplication of the diterpene synthase genes required for gibberellin phytohormone biosynthesis provided an early predecessor, whose loss of an approximately 220 amino acid 'internal sequence element' (now recognized as the γ domain) gave rise to the precursor of the modern mono- and sesqui-TPSs found in all higher plants. Intriguingly, TPSs are conserved by taxonomic relationships rather than function. This relationship demonstrates that such functional radiation has occurred both repeatedly and relatively recently, yet phylogenetic analyses assume that the 'internal/γ' domain loss represents a single evolutionary event. Here we provide evidence that such a loss was not a singular event, but rather has occurred multiple times. Specifically, we provide an example of a bi-domain diterpene synthase from Salvia miltiorrhiza, along with a sesquiterpene synthase from Triticum aestivum (wheat) that is not only closely related to diterpene synthases, but retains the ent-kaurene synthase activity relevant to the ancestral gibberellin metabolic function. Indeed, while the wheat sesquiterpene synthase clearly no longer contains the 'internal/γ' domain, it is closely related to rice diterpene synthase genes that retain the ancestral tri-domain structure. Thus, these findings provide examples of key evolutionary intermediates that underlie the bi-domain structure observed in the expansive plant TPS gene family, as well as indicating that 'internal/γ' domain loss has occurred independently multiple times, highlighting the complex evolutionary history of this important enzymatic family.

12.
Investigating the relative importance of protein stability, function, and folding kinetics in driving protein evolution has long been hindered by the fact that we can only compare modern natural proteins, the products of the very process we seek to understand, to each other, with no external references or baselines. Through a large-scale all-atom simulation of protein evolution, we have created a large diverse alignment of SH3 domain sequences which have been selected only for native state stability, with no other influencing factors. Although the average pairwise identity between computationally evolved and natural sequences is only 17%, the residue frequency distributions of the computationally evolved sequences are similar to natural SH3 sequences at 86% of the positions in the domain, suggesting that optimization for the native state structure has dominated the evolution of natural SH3 domains. Additionally, the positions which play a consistent role in the transition state of three well-characterized SH3 domains (by phi-value analysis) are structurally optimized for the native state, and vice versa. Indeed, we see a specific and significant correlation between sequence optimization for native state stability and conservation of transition state structure.

13.
14.
Domains are the evolutionary units that comprise proteins, and most proteins are built from more than one domain. Domains can be shuffled by recombination to create proteins with new arrangements of domains. Using structural domain assignments, we examined the combinations of domains in the proteins of 131 completely sequenced organisms. We found two-domain and three-domain combinations that recur in different protein contexts with different partner domains. The domains within these combinations have a particular functional and spatial relationship. These units are larger than individual domains and we term them "supra-domains". Amongst the supra-domains, we identified some 1400 (1203 two-domain and 166 three-domain) combinations that are statistically significantly over-represented relative to the occurrence and versatility of the individual component domains. Over one-third of all structurally assigned multi-domain proteins contain these over-represented supra-domains. This means that investigation of the structural and functional relationships of the domains forming these popular combinations would be particularly useful for an understanding of multi-domain protein function and evolution as well as for genome annotation. These and other supra-domains were analysed for their versatility, duplication, their distribution across the three kingdoms of life and their functional classes. By examining the three-dimensional structures of several examples of supra-domains in different biological processes, we identify two basic types of spatial relationships between the component domains: the combined function of the two domains is such that either the geometry of the two domains is crucial and there is a tight constraint on the interface, or the precise orientation of the domains is less important and they are spatially separate. Frequently, the role of the supra-domain becomes clear only once the three-dimensional structure is known. 
Since this is the case for only a quarter of the supra-domains, we provide a list of the most important unknown supra-domains as potential targets for structural genomics projects.
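The notion of a domain combination that recurs "in different protein contexts with different partner domains" can be illustrated with a small counting sketch. It only tallies the distinct flanking contexts of adjacent two-domain combinations; it does not perform the statistical over-representation test described in the abstract. The function name, the `min_contexts` threshold, and the toy architectures are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple


def recurring_pairs(proteins: Iterable[Sequence[str]], min_contexts: int = 2) -> Dict[Tuple[str, str], int]:
    """Count distinct (left, right) flanking contexts for each adjacent domain pair.

    A missing neighbour (protein terminus) is recorded as None, so a pair that
    makes up a whole protein counts as its own context (an assumed convention).
    """
    contexts = defaultdict(set)
    for arch in proteins:
        for i in range(len(arch) - 1):
            pair = (arch[i], arch[i + 1])
            left = arch[i - 1] if i > 0 else None
            right = arch[i + 2] if i + 2 < len(arch) else None
            contexts[pair].add((left, right))
    return {pair: len(ctx) for pair, ctx in contexts.items() if len(ctx) >= min_contexts}


# Toy architectures with placeholder domain names.
toy = [("A", "B", "C"), ("D", "A", "B"), ("A", "B"), ("C", "D")]
print(recurring_pairs(toy))  # {('A', 'B'): 3}
```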

15.
The mitochondrion is an essential cellular compartment in eukaryotes. The mitochondrial proteins Tom20 and Tom22 are receptors that ensure recognition and binding of proteins imported for mitochondrial biogenesis. Comparison of the sequences of the Tom20 and Tom22 subunits in the yeasts Saccharomyces cerevisiae and Saccharomyces castellii shows a rare case of domain stealing: in Saccharomyces castellii, Tom22 has lost an acidic domain and Tom20 has gained one. This example of domain stealing is a snapshot of evolution in action and provides excellent evidence that Tom20 and Tom22 are subunits of a single, composite receptor that binds precursor proteins for import into mitochondria.

16.
We have investigated the mechanism and the evolutionary pathway of protein dimerization through analysis of experimental structures of dimers. We propose that the evolution of dimers may have multiple pathways, including (1) formation of a functional dimer directly without going through an ancestor monomer, (2) formation of a stable monomer as an intermediate followed by mutations of its surface residues, and (3), a domain swapping mechanism, replacing one segment in a monomer by an equivalent segment from an identical chain in the dimer. Some of the dimers which are governed by a domain swapping mechanism may have evolved at an earlier stage of evolution via the second mechanism. Here, we follow the theory that the kinetic pathway reflects the evolutionary pathway. We analyze the structure-kinetics-evolution relationship for a collection of symmetric homodimers classified into three groups: (1) 14 dimers, which were referred to as domain swapping dimers in the literature; (2) nine 2-state dimers, which have no measurable intermediates in equilibrium denaturation; and (3), eight 3-state dimers, which have stable intermediates in equilibrium denaturation. The analysis consists of the following stages: (i) The dimer is divided into two structural units, which have twofold symmetry. Each unit contains a contiguous segment from one polypeptide chain of the dimer, and its complementary contiguous segment from the other chain. (ii) The division is repeated progressively, with different combinations of the two segments in each unit. (iii) The coefficient of compactness is calculated for the units in all divisions. The coefficients obtained for different cuttings of a dimer form a compactness profile. The profile probes the structural organization of the two chains in a dimer and the stability of the monomeric state. We describe the features of the compactness profiles in each of the three dimer groups. 
The profiles identify the swapping segments in domain swapping dimers, and can usually predict whether a dimer has domain swapping. The kinetics of dimerization indicates that some dimers which have been assigned in the literature as domain swapping cases dimerize through 2-state kinetics, rather than through swapping segments of preformed monomers. The compactness profiles indicate a wide spectrum in the kinetics of dimerization: dimers having no intermediate stable monomers; dimers having an intermediate with a stable monomer structure; and dimers having an intermediate with a stable structure in part of the monomer. These correspond to the multiple evolutionary pathways for dimer formation. The evolutionary mechanisms proposed here for dimers are applicable to other oligomers as well.

17.
Most studies of behaviour examine traits whose proximate causes include sensory input and neural decision-making, but conflict and collaboration in biological systems began long before brains or sensory systems evolved. Many behaviours result from non-neural mechanisms such as direct physical contact between recognition proteins or modifications of development that coincide with altered behaviour. These simple molecular mechanisms form the basis of important biological functions and can enact organismal interactions that are as subtle, strategic and interesting as any. The genetic changes that underlie divergent molecular behaviours are often targets of selection, indicating that their functional variation has important fitness consequences. These behaviours evolve by discrete units of quantifiable phenotypic effect (amino acid and regulatory mutations, often by successive mutations of the same gene), so the role of selection in shaping evolutionary change can be evaluated on the scale at which heritable phenotypic variation originates. We describe experimental strategies for finding genes that underlie biochemical and developmental alterations of behaviour, survey the existing literature highlighting cases where the simplicity of molecular behaviours has allowed insight into the evolutionary process, and discuss the utility of a genetic knowledge of the sources and spectrum of phenotypic variation for a deeper understanding of how genetic and phenotypic architectures evolve.

18.
Patrick Slama. Proteins 2018;86(1):3-12
Residues at different positions of a multiple sequence alignment sometimes evolve together, due to a correlated structural or functional stress at these positions. Co‐evolution has thus been evidenced computationally in multiple proteins or protein domains. Here, we wish to study whether an evolutionary stress is exerted on a sequence alignment across protein domains, i.e., on longer sequence separations than within a single protein domain. JmjC‐containing lysine demethylases were chosen for analysis, as a follow‐up to previous studies; these proteins are important multidomain epigenetic regulators. In these proteins, the JmjC domain is responsible for the demethylase activity, and surrounding domains interact with histones, DNA or partner proteins. This family of enzymes was analyzed at the sequence level, in order to determine whether the sequence of JmjC‐domains was affected by the presence of a neighboring JmjN domain or PHD finger in the protein. Multiple positions within JmjC sequences were shown to have their residue distributions significantly altered by the presence of the second domain. Structural considerations confirmed the relevance of the analysis for JmjN‐JmjC proteins, while among PHD‐JmjC proteins, the length of the linker region could be correlated to the residues observed at the most affected positions. The correlation of domain architecture with residue types at certain positions, as well as that of overall architecture with protein function, is discussed. The present results thus evidence the existence of an across‐domain evolutionary stress in JmjC‐containing demethylases, and provide further insights into the overall domain architecture of JmjC domain‐containing proteins.

19.
We have developed a statistical method named MAP (mutagenesis assistant program) to equip protein engineers with a tool to develop promising directed evolution strategies by comparing 19 mutagenesis methods. Instead of conventional transition/transversion bias indicators as benchmarks for comparison, we propose to use three indicators based on the subset of amino acid substitutions generated on the protein level: (1) protein structure indicator; (2) amino acid diversity indicator with a codon diversity coefficient; and (3) chemical diversity indicator. A MAP analysis for a single nucleotide substitution was performed for four genes: (1) heme domain of cytochrome P450 BM-3 from Bacillus megaterium (EC 1.14.14.1); (2) glucose oxidase from Aspergillus niger (EC 1.1.3.4); (3) arylesterase from Pseudomonas fluorescens (EC 3.1.1.2); and (4) alcohol dehydrogenase from Saccharomyces cerevisiae (EC 1.1.1.1). Based on the MAP analysis of these four genes, 19 mutagenesis methods have been evaluated and criteria for an ideal mutagenesis method have been proposed. The statistical analysis showed that existing gene mutagenesis methods are limited and highly biased. An average amino acid substitution per residue of only 3.15-7.4 can be achieved with current random mutagenesis methods. For the four investigated gene sequences, an average fraction of amino acid substitutions of 0.5-7% results in stop codons and 4.5-23.9% in glycine or proline residues. An average fraction of 16.2-44.2% of the amino acid substitutions are preserved, and 45.6% (epPCR method) are chemically different. The diversity remains low even when applying a non-biased method: an average of seven amino acid substitutions per residue, 2.9-4.7% stop codons, 11.1-16% glycine/proline residues, 21-25.8% preserved amino acids, and 55.5% are amino acids with chemically different side-chains. 
Statistical information for each mutagenesis method can further be used to investigate the mutational spectra in protein regions regarded as important for the property of interest.
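The kind of single-nucleotide-substitution statistics reported above can be approximated with a short sketch. It is not the MAP program: it assumes an unbiased method in which every one of the nine single-base changes per codon is equally likely, uses the standard genetic code, and reports only four of the categories mentioned in the abstract (stop codons, preserved amino acids, glycine/proline, other). The function name and category labels are assumptions.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... (TCAG order).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def substitution_spectrum(cds: str) -> dict:
    """Fractions of single-nucleotide substitutions falling in each category.

    Assumes an in-frame DNA sequence over the alphabet A/C/G/T; trailing
    bases that do not fill a codon are ignored.
    """
    counts = {"stop": 0, "preserved": 0, "gly_pro": 0, "other": 0}
    for start in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[start:start + 3].upper()
        original = CODON_TABLE[codon]
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue  # not a substitution
                mutant = CODON_TABLE[codon[:pos] + base + codon[pos + 1:]]
                if mutant == "*":
                    counts["stop"] += 1
                elif mutant == original:
                    counts["preserved"] += 1
                elif mutant in "GP":
                    counts["gly_pro"] += 1
                else:
                    counts["other"] += 1
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else counts


# Tryptophan codon: 2 of its 9 single-base neighbours are stop codons.
print(substitution_spectrum("TGG"))
```

Replacing the uniform weighting with method-specific substitution biases would be the natural next step for comparing real mutagenesis methods.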

20.
The structure of many proteins consists of a combination of discrete modules that have been shuffled during evolution. Such modules can frequently be recognized from the analysis of homology. Here we present a systematic analysis of the modular organization of all sequenced proteins. To achieve this we have developed an automatic method to identify protein domains from sequence comparisons. Homologous domains can then be clustered into consistent families. The method was applied to all 21,098 nonfragment protein sequences in SWISS-PROT 21.0, which was automatically reorganized into a comprehensive protein domain database, ProDom. We have constructed multiple sequence alignments for each domain family in ProDom, from which consensus sequences were generated. These nonredundant domain consensuses are useful for fast homology searches. Domain organization in ProDom is exemplified for proteins of the phosphoenolpyruvate:sugar phosphotransferase system (PEP:PTS) and for bacterial 2-component regulators. We provide 2 examples of previously unrecognized domain arrangements discovered with the help of ProDom.
