Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
Internal protein dynamics is essential for biological function. During evolution, protein divergence is functionally constrained: properties more relevant for function vary more slowly than less important properties. Thus, if protein dynamics is relevant for function, it should be evolutionarily conserved. In contrast with the well-studied evolution of protein structure, the evolutionary divergence of protein dynamics has not been addressed systematically before, apart from a few case studies. X-Ray diffraction analysis gives information not only on protein structure but also on B-factors, which characterize the flexibility that results from protein dynamics. Here we study the evolutionary divergence of protein backbone dynamics by comparing the Cα flexibility (B-factor) profiles for a large dataset of homologous proteins classified into families and superfamilies. We show that Cα flexibility profiles diverge slowly, so that they are conserved at family and superfamily levels, even for pairs of proteins with nonsignificant sequence similarity. We also analyze and discuss the correlations among the divergences of flexibility, sequence, and structure. Electronic Supplementary Material Electronic Supplementary material is available for this article at and accessible for authorised users. [Reviewing Editor: Dr. David Pollock]
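As a rough illustration of what comparing two Cα B-factor profiles can look like, the sketch below z-score normalizes each profile and takes the Pearson correlation over aligned positions. The abstract does not state which normalization or similarity measure the authors used, so both choices, and the function names `normalize_profile` and `profile_similarity`, are assumptions made for this example only.

```python
from statistics import mean, pstdev


def normalize_profile(b_factors):
    """Convert raw B-factors to z-scores (assumed normalization)."""
    mu = mean(b_factors)
    sigma = pstdev(b_factors)  # population standard deviation
    if sigma == 0:
        raise ValueError("profile has no variation")
    return [(b - mu) / sigma for b in b_factors]


def profile_similarity(profile_a, profile_b):
    """Pearson correlation between two profiles of equal length.

    The profiles are assumed to be already restricted to structurally
    aligned residue pairs, one value per aligned position.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same aligned positions")
    za = normalize_profile(profile_a)
    zb = normalize_profile(profile_b)
    # The mean product of population z-scores equals Pearson's r.
    return sum(x * y for x, y in zip(za, zb)) / len(za)
```

Normalizing first removes differences in overall B-factor scale between crystal structures, so only the shape of the flexibility profile along the chain is compared.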

2.
Twenty-seven protein sequence elements, six to nine amino acids long, were extracted from 15 phylogenetically diverse complete prokaryotic proteomes. The elements are present in all of these proteomes, with at least one copy each (omnipresent elements), and have presumably been conserved since the last universal common ancestor (LUCA). All these omnipresent elements are identified in crystallized protein structures as parts of highly conserved closed loops, 25–30 residues long, thus representing the closed-loop modules discovered in 2000 by Berezovsky et al. The omnipresent peptides make up seven distinct groups, of which the largest groups, Aleph and Beth, contain 18 and four elements, respectively, which are related but different, while five other groups are represented by only one element each. The LUCA modules appear with one or several copies per protein molecule in a variety of combinations depending on the functional identity of the corresponding protein. The functional involvement of individual LUCA modules is outlined on the basis of known protein annotations. Analyses of all the related sequences in a large, formatted protein sequence space suggest that many, if not all, of the 27 omnipresent elements have a common sequence origin. This sequence space network analysis may lead to elucidation of the earliest stages of protein evolution.

3.
Members of the protein phosphatase 2C (PP2C) superfamily are Mg2+/Mn2+-dependent serine/threonine phosphatases, which are essential for regulation of cell cycle and stress signaling pathways in cells. In this study, a comprehensive genomic analysis of all available metazoan PP2C sequences was conducted. The phylogeny of PP2C was reconstructed, revealing the existence of 15 vertebrate families which arose following a series of gene duplication events. Relative dating of these duplications showed that they occurred in two active periods: before the divergence of bilaterians and before vertebrate diversification. PP2C families which duplicated during the first period take part in different signaling pathways, whereas PP2C families which diverged in the second period display tissue expression differences yet participate in similar signaling pathways. These differences were found to involve variation of expression in tissues which show higher complexity in vertebrates, such as skeletal muscle and the nervous system. Further analysis was performed with the aim of identifying the functional domains of PP2C. The conservation pattern across the entire PP2C superfamily revealed an extensive domain of more than 50 amino acids which is highly conserved throughout all PP2C members. Several insertion or deletion events were found which may have led to the specialization of each PP2C family. Electronic Supplementary Material Electronic Supplementary material is available for this article at and accessible for authorised users. [Reviewing Editor: Dr. Hector Musto]

4.
The recent growth in structural data, and ensuing analyses, have revealed the structural and functional versatility of protein families. With respect to enzymes, local active-site mutations, variations in surface loops and recruitment of additional domains accommodate the diverse substrate specificities and catalytic activities observed within several superfamilies. Conversely, some functions have more than one structural solution, having evolved independently several times during evolution. Combined with the existence of multi-functional genes, which have arisen by gene recruitment, these phenomena must be considered in the process of genome annotation.

5.
Many protein classification systems capture homologous relationships by grouping domains into families and superfamilies on the basis of sequence similarity. Superfamilies with similar 3D structures are further grouped into folds. In the absence of discernable sequence similarity, these structural similarities were long thought to have originated independently, by convergent evolution. However, the growth of databases and advances in sequence comparison methods have led to the discovery of many distant evolutionary relationships that transcend the boundaries of superfamilies and folds. To investigate the contributions of convergent versus divergent evolution in the origin of protein folds, we clustered representative domains of known structure by their sequence similarity, treating them as point masses in a virtual 2D space which attract or repel each other depending on their pairwise sequence similarities. As expected, families in the same superfamily form tight clusters. But often, superfamilies of the same fold are linked with each other, suggesting that the entire fold evolved from an ancient prototype. Strikingly, some links connect superfamilies with different folds. They arise from modular peptide fragments of between 20 and 40 residues that co‐occur in the connected folds in disparate structural contexts. These may be descendants of an ancestral pool of peptide modules that evolved as cofactors in the RNA world and from which the first folded proteins arose by amplification and recombination. Our galaxy of folds summarizes, in a single image, most known and many yet undescribed homologous relationships between protein superfamilies, providing new insights into the evolution of protein domains.

6.
The origins of modern proteomes (total citations: 1; self-citations: 0; citations by others: 1)
Kurland CG, Canbäck B, Berg OG. Biochimie, 2007, 89(12): 1454-1463
Distributions of phylogenetically related protein domains (fold superfamilies), or FSFs, among the three Superkingdoms (trichotomy) are assessed. Very nearly 900 of the 1200 FSFs of the trichotomy are shared by two or three Superkingdoms. Parsimony analysis of FSF distributions suggests that the FSF complement of the last common ancestor to the trichotomy was more like that of modern eukaryotes than that of archaea and bacteria. Studies of length distributions among members of orthologous families of proteins present in all three Superkingdoms reveal that such lengths are significantly longer among eukaryotes than among bacteria and archaea. The data also reveal that protein lengths of eukaryotes are more broadly distributed than they are within archaeal and bacterial members of the same orthologous families. Accordingly, selective pressure for a minimal size is significantly greater for orthologous protein lengths in archaea and bacteria than in eukaryotes. Alignments of orthologous proteins of archaea, bacteria and eukaryotes are characterized by greater sequence variation at their N-terminal and C-terminal domains, than in their central cores. Length variations tend to be localized in the terminal sequences; the conserved sequences of orthologous families are localized in a central core. These data are consistent with the interpretation that the genomes of the last common ancestor (LUCA) encoded a cohort of FSFs not very different from that of modern eukaryotes. Divergence of bacterial and archaeal genomes from that common ancestor may have been accompanied by more intensive reductive evolution of proteomes than that expressed in eukaryotes. Dollo's Law suggests that the evolution of novel FSFs is a very slow process, while laboratory experiments suggest that novel protein genesis from preexisting FSFs can be relatively rapid.
Reassortment of FSFs to create novel proteins may have been mediated by genetic recombination before the advent of more efficient splicing mechanisms.

7.
In addition to its value in the study of molecular evolution, ancestral sequence reconstruction (ASR) has emerged as a useful methodology for engineering proteins with enhanced properties. Proteins generated by ASR often exhibit unique or improved activity, stability, and/or promiscuity, all of which are properties that are valued by protein engineers. Comparison between extant proteins and evolutionary intermediates generated by ASR also allows protein engineers to identify substitutions that have contributed to functional innovation or diversification within protein families. As ASR becomes more widely adopted as a protein engineering approach, it is important to understand the applications, limitations, and recent developments of this technique. This review highlights recent exemplifications of ASR, as well as technical aspects of the reconstruction process that are relevant to protein engineering.

8.
Summary: We have implemented a routine procedure for screening protein sequences for evidence of intragenic duplications. We tested 163 protein sequences representing 116 superfamilies of unrelated proteins. Twenty superfamilies contain proteins with internal gene duplications. The intragenic duplications detected can be divided into two major types. (1) One or more duplications of all or part of a gene produce a protein with two or several detectable regions of sequence homology. Sequences from 18 superfamilies contained this type of duplication. (2) Repeated reduplication of a small DNA segment can produce a protein that is repetitive over most of its length. Three superfamilies contain such repetitive sequences. We also investigated the limits of detection of ancient duplications using sequences derived by random mutation of a model sequence consisting of ten 10-residue repeats. The original repetitive nature of the sequence was usually detected after 250 point mutations even though the ancestral segment could not be accurately reconstructed.
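The mutation experiment described in this abstract can be sketched in a few lines. The code below builds a model sequence of ten 10-residue repeats, applies random point mutations, and scores how much periodicity survives. The scoring function `periodic_identity` (fraction of residues identical to the residue one period downstream) is an assumption chosen for illustration, not the detection statistic used in the paper, and all function names are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def make_repeat_sequence(unit_len=10, copies=10, rng=None):
    """Build a model sequence made of identical tandem repeats."""
    rng = rng or random.Random(0)
    unit = [rng.choice(AMINO_ACIDS) for _ in range(unit_len)]
    return unit * copies


def mutate(seq, n_mutations, rng=None):
    """Return a copy of seq after n_mutations random point substitutions."""
    rng = rng or random.Random(0)
    seq = list(seq)
    for _ in range(n_mutations):
        pos = rng.randrange(len(seq))
        # The replacement is drawn uniformly, so it may equal the old residue.
        seq[pos] = rng.choice(AMINO_ACIDS)
    return seq


def periodic_identity(seq, period):
    """Fraction of positions whose residue matches the one `period` ahead."""
    pairs = len(seq) - period
    matches = sum(1 for i in range(pairs) if seq[i] == seq[i + period])
    return matches / pairs
```

A possible use is to compare `periodic_identity(mutated, 10)` against the same score at other periods; a value at period 10 that stands out from the rest would indicate that the repeat signal is still detectable under this simple measure.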

9.
Molecular dynamics (MD) simulations on a bacterial cytochrome c were performed to investigate the lifetime and fluctuations of backbone hydrogen bonds and to correlate these data with protection factors for hydrogen exchange measured by NMR spectroscopy (Bartalesi et al. in Biochemistry, 42:10923–10930, 2003). The MD simulations provide a consistent pattern in that long lifetimes of hydrogen bonds go along with small amplitude fluctuations. In agreement with experiments, differences in stability were found with a rather flexible N-terminal segment as compared with a more rigid C-terminal part. Protection factors of backbone hydrogen exchange correlate strongly with the number of contacts but also with hydrogen-bond occupancy, hydrogen-bond survival times, as well as the inverse of fluctuations of backbone atoms and hydrogen-bond lengths derived from MD simulation data. We observed a conformational transition in the C-terminal loop, and significant motion in the N-terminal loop, which can be interpreted as being the structural units involved in the onset of the protein unfolding process in agreement with experimental evidence on mitochondrial cytochrome c. Electronic Supplementary Material Supplementary material is available for this article at and is accessible for authorized users. Gernot Kieseritzky and Giulia Morra both contributed equally to this work.

10.
Aminoacyl-tRNA synthetases (aaRS) consist of several families of functionally conserved proteins essential for translation and protein synthesis. Like nearly all components of the translation machinery, most aaRS families are universally distributed across cellular life, being inherited from the time of the Last Universal Common Ancestor (LUCA). However, unlike the rest of the translation machinery, aaRS have undergone numerous ancient horizontal gene transfers, with several independent events detected between domains, and some possibly involving lineages diverging before the time of LUCA. These transfers reveal the complexity of molecular evolution at this early time, and the chimeric nature of genomes within cells that gave rise to the major domains. Additionally, given the role of these protein families in defining the amino acids used for protein synthesis, sequence reconstruction of their pre-LUCA ancestors can reveal the evolutionary processes at work in the origin of the genetic code. In particular, sequence reconstructions of the paralog ancestors of isoleucyl- and valyl- RS provide strong empirical evidence that at least for this divergence, the genetic code did not co-evolve with the aaRSs; rather, both amino acids were already part of the genetic code before their cognate aaRSs diverged from their common ancestor. The implications of this observation for the early evolution of RNA-directed protein biosynthesis are discussed.

11.
12.
The Universal Ancestor and the Ancestor of Bacteria Were Hyperthermophiles (total citations: 4; self-citations: 0; citations by others: 4)
The definition of the node of the last universal common ancestor (LUCA) is justified in a topology of the unrooted universal tree. This definition allows previous analyses based on paralogous proteins to be extended to orthologous ones. In particular, the use of a thermophily index (based on the amino acids' propensity to enter the [hyper] thermophile proteins more frequently) and its correlation with the optimal growth temperature of the various organisms allow inferences to be made on the habitat in which the LUCA lived. The reconstruction of ancestral sequences by means of the maximum likelihood method and their attribution to the set of mesophilic or hyperthermophilic sequences have led to the following conclusions: the LUCA was a hyperthermophile organism, as were the ancestors of the Archaea and Bacteria domains, while the ancestor of the Eukarya domain was a mesophile. These conclusions are independent of the presence of hyperthermophile bacteria in the sample of sequences used in the analysis and are therefore independent of whether or not these are the first lines of divergence in the Bacteria domain, as observed in the topology of the universal tree of ribosomal RNA. These conclusions are thus more easily understood under the hypothesis that the origin of life took place at a high temperature.

13.
Selection pressures on proteins are usually measured by comparing homologous nucleotide sequences (Zuckerkandl and Pauling 1965). Recently we introduced a novel method, termed volatility, to estimate selection pressures on proteins on the basis of their synonymous codon usage (Plotkin and Dushoff 2003; Plotkin et al. 2004). Here we provide a theoretical foundation for this approach. Under the Fisher-Wright model, we derive the expected frequencies of synonymous codons as a function of the strength of selection on amino acids, the mutation rate, and the effective population size. We analyze the conditions under which we can expect to draw inferences from biased codon usage, and we estimate the time scales required to establish and maintain such a signal. We find that synonymous codon usage can reliably distinguish between negative selection and neutrality only for organisms, such as some microbes, that experience large effective population sizes or periods of elevated mutation rates. The power of volatility to detect positive selection is also modest—requiring approximately 100 selected sites—but it depends less strongly on population size. We show that phenomena such as transient hyper-mutators can improve the power of volatility to detect selection, even when the neutral site heterozygosity is low. We also discuss several confounding factors, neglected by the Fisher-Wright model, that may limit the applicability of volatility in practice. Electronic Supplementary Material Electronic Supplementary material is available for this article at and accessible for authorised users. [Reviewing Editor: Dr. Lauren Meyers]

14.
The co-chaperone Hop [heat shock protein (HSP) organising protein] is known to bind both Hsp70 and Hsp90. Hop comprises three repeats of a tetratricopeptide repeat (TPR) domain, each consisting of three TPR motifs. The first and last TPR domains are followed by a domain containing several dipeptide (DP) repeats called the DP domain. These analyses suggest that the hop genes result from successive recombination events of an ancestral TPR–DP module. From a hydrophobic cluster analysis of homologous Hop protein sequences derived from gene families, we can postulate that shifts in the open reading frames are at the origin of the present sequences. Moreover, these shifts can be related to the presence or absence of biological function. We propose to extend the family of Hop co-chaperones into the kingdom of bacteria, as several structurally related genes have been identified by hydrophobic cluster analysis. We also provide evidence of common structural characteristics between hop and hip genes, suggesting a shared precursor of ancestral TPR–DP domains. Electronic supplementary material  The online version of this article (doi:) contains supplementary material, which is available to authorized users.

15.
Several lines of evidence such as the basal location of thermophilic lineages in large-scale phylogenetic trees and the ancestral sequence reconstruction of single enzymes or large protein concatenations support the conclusion that the ancestors of the bacterial and archaeal domains were thermophilic organisms which were adapted to hot environments during the early stages of the Earth. A parsimonious reasoning would therefore suggest that the last universal common ancestor (LUCA) was also thermophilic. Various authors have used branch-wise non-homogeneous evolutionary models that better capture the variation of molecular compositions among lineages to accurately reconstruct the ancestral G + C contents of ribosomal RNAs and the ancestral amino acid composition of highly conserved proteins. They confirmed the thermophilic nature of the ancestors of Bacteria and Archaea but concluded that LUCA, their last common ancestor, was a mesophilic organism having a moderate optimal growth temperature. In this letter, we investigate the unknown nature of the phylogenetic signal that informs ancestral sequence reconstruction to support this non-parsimonious scenario. We find that rate variation across sites of molecular sequences provides information at different time scales by recording the oldest adaptation to temperature in slow-evolving regions and subsequent adaptations in fast-evolving ones.

16.
Domains are the evolutionary units that comprise proteins, and most proteins are built from more than one domain. Domains can be shuffled by recombination to create proteins with new arrangements of domains. Using structural domain assignments, we examined the combinations of domains in the proteins of 131 completely sequenced organisms. We found two-domain and three-domain combinations that recur in different protein contexts with different partner domains. The domains within these combinations have a particular functional and spatial relationship. These units are larger than individual domains and we term them "supra-domains". Amongst the supra-domains, we identified some 1400 (1203 two-domain and 166 three-domain) combinations that are statistically significantly over-represented relative to the occurrence and versatility of the individual component domains. Over one-third of all structurally assigned multi-domain proteins contain these over-represented supra-domains. This means that investigation of the structural and functional relationships of the domains forming these popular combinations would be particularly useful for an understanding of multi-domain protein function and evolution as well as for genome annotation. These and other supra-domains were analysed for their versatility, duplication, their distribution across the three kingdoms of life and their functional classes. By examining the three-dimensional structures of several examples of supra-domains in different biological processes, we identify two basic types of spatial relationships between the component domains: the combined function of the two domains is such that either the geometry of the two domains is crucial and there is a tight constraint on the interface, or the precise orientation of the domains is less important and they are spatially separate. Frequently, the role of the supra-domain becomes clear only once the three-dimensional structure is known. 
Since this is the case for only a quarter of the supra-domains, we provide a list of the most important unknown supra-domains as potential targets for structural genomics projects.

17.

Background

Throughout evolution, mutations in particular regions of some protein structures have resulted in extra covalent bonds that increase the overall robustness of the fold: disulfide bonds. The two strategically placed cysteines can also have a more direct role in protein function, either by assisting thiol or disulfide exchange, or through allosteric effects. In this work, we verified how the structural similarities between disulfides can reflect functional and evolutionary relationships between different proteins. We analyzed the conformational patterns of the disulfide bonds in a set of disulfide-rich proteins that included twelve SCOP superfamilies: thioredoxin-like and eleven superfamilies containing small disulfide-rich proteins (SDP).

Results

The twenty conformations considered in the present study were characterized by both structural and energetic parameters. The corresponding frequencies present diverse patterns for the different superfamilies. The least-strained conformations are more abundant for the SDP superfamilies, while the “catalytic” +/−RHook is dominant for the thioredoxin-like superfamily. The “allosteric” -RHStaple is moderately abundant for BBI, Crisp and Thioredoxin-like superfamilies and less frequent for the remaining superfamilies. Using a hierarchical clustering analysis we found that the twelve superfamilies were grouped in biologically significant clusters.

Conclusions

In this work, we carried out an extensive statistical analysis of the conformational motifs for the disulfide bonds present in a set of disulfide-rich proteins. We show that the conformational patterns observed in disulfide bonds are sufficient to group proteins that share both functional and structural patterns and can therefore be used as a criterion for protein classification.

18.
In this study, we used a computational approach to investigate the early evolutionary history of a system of proteins that, together, embed and translocate other proteins across cell membranes. Cell membranes comprise the basis for cellularity, which is an ancient, fundamental organizing principle shared by all organisms and a key innovation in the evolution of life on Earth. Two related requirements for cellularity are that organisms are able to both embed proteins into membranes and translocate proteins across membranes. One system that accomplishes these tasks is the signal recognition particle (SRP) system, in which the core protein components are the paralogs, FtsY and Ffh. Complementary to the SRP system is the Sec translocation channel, in which the primary channel-forming protein is SecY. We performed phylogenetic analyses that strongly supported prior inferences that FtsY, Ffh, and SecY were all present by the time of the last universal common ancestor of life, the LUCA, and that the ancestor of FtsY and Ffh existed before the LUCA. Further, we combined ancestral sequence reconstruction and protein structure and function prediction to show that the LUCA had an SRP system and Sec translocation channel that were similar to those of extant organisms. We also show that the ancestor of Ffh and FtsY that predated the LUCA was more similar to FtsY than Ffh but could still have comprised a rudimentary protein translocation system on its own. Duplication of the ancestor of FtsY and Ffh facilitated the specialization of FtsY as a membrane bound receptor and Ffh as a cytoplasmic protein that could bind nascent proteins with specific membrane-targeting signal sequences. Finally, we analyzed amino acid frequencies in our ancestral sequence reconstructions to infer that the ancestral Ffh/FtsY protein likely arose prior to or just after the completion of the canonical genetic code. 
Taken together, our results offer a window into the very early evolutionary history of cellularity.

19.
The theory that Shigella is derived from multiple independent origins of Escherichia coli (Pupo et al. 2000) has been challenged by recent findings that the virulence plasmids (VPs) and the chromosomes share a similar evolutionary history (Escobar-Paramo et al. 2003), which suggests that an ancestral VP entered an E. coli strain only once, which gave rise to Shigella spp. In an attempt to resolve these conflicting theories, we constructed three phylogenetic trees in this study: a robust chromosomal tree using 23 housekeeping genes from 46 strains of Shigella and enteroinvasive E. coli (EIEC), a chromosomal tree using 4 housekeeping genes from 19 EcoR strains and 46 Shigella/EIEC strains, and a VP tree using 5 genes outside of the VP cell-entry region from 38 Shigella/EIEC strains. Both chromosomal trees group Shigella into three main clusters and five outliers, and strongly suggest that Shigella has multiple origins within E. coli. Most strikingly, the VP tree shows that the VPs from two main Shigella clusters, C1 and C2, are more closely related, which contradicts the chromosomal trees that place C2 and C3 next to each other but C1 at a distance. Additionally, we have identified a complete tra operon of the F-plasmid in the genome sequence of an EIEC strain and found that two other EIEC strains are also likely to possess a complete tra operon. All lines of evidence support an alternative multiorigin theory that transferable diverse ancestral VPs entered diverse origins of E. coli multiple times during a prolonged period of time, resulting in Shigella species with diverse genomes but similar pathogenic properties. Electronic Supplementary Material Electronic Supplementary material is available for this article at and accessible for authorised users. [Reviewing Editor: Dr. Martin Kreitman] Jian Yang and Huan Nie contributed equally to this work.

20.

Background

As tertiary structure is currently available only for a fraction of known protein families, it is important to assess what parts of sequence space have been structurally characterized. We consider protein domains whose structure can be predicted by sequence similarity to proteins with solved structure and address the following questions. Do these domains represent an unbiased random sample of all sequence families? Do targets solved by structural genomic initiatives (SGI) provide such a sample? What are approximate total numbers of structure-based superfamilies and folds among soluble globular domains?

Results

To make these assessments, we combine two approaches: (i) sequence analysis and homology-based structure prediction for proteins from complete genomes; and (ii) monitoring dynamics of the assigned structure set in time, with the accumulation of experimentally solved structures. In the Clusters of Orthologous Groups (COG) database, we map the growing population of structurally characterized domain families onto the network of sequence-based connections between domains. This mapping reveals a systematic bias suggesting that target families for structure determination tend to be located in highly populated areas of sequence space. In contrast, the subset of domains whose structure is initially inferred by SGI is similar to a random sample from the whole population. To accommodate for the observed bias, we propose a new non-parametric approach to the estimation of the total numbers of structural superfamilies and folds, which does not rely on a specific model of the sampling process. Based on dynamics of robust distribution-based parameters in the growing set of structure predictions, we estimate the total numbers of superfamilies and folds among soluble globular proteins in the COG database.

Conclusion

The set of currently solved protein structures allows for structure prediction in approximately a third of sequence-based domain families. The choice of targets for structure determination is biased towards domains with many sequence-based homologs. The growing SGI output in the future should further contribute to the reduction of this bias. The total numbers of structural superfamilies and folds in the COG database are estimated as ~4000 and ~1700, respectively. These numbers are respectively four and three times higher than the numbers of superfamilies and folds that can currently be assigned to COG proteins.


Copyright©北京勤云科技发展有限公司  京ICP备09084417号