Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
2.
Protein structures are much more conserved than sequences during evolution. Based on this observation, we investigate the consequences of structural conservation on protein evolution. We study seven of the most studied protein folds, determining that an extended neutral network in sequence space is associated with each of them. Within our model, neutral evolution leads to a non-Poissonian substitution process, due to the broad distribution of connectivities in neutral networks. The observation that the substitution process has non-Poissonian statistics has been used to argue against the original Kimura neutral theory, while our model shows that this is a generic property of neutral evolution with structural conservation. Our model also predicts that the substitution rate can strongly fluctuate from one branch to another of the evolutionary tree. The average sequence similarity within a neutral network is close to the threshold of randomness, as observed for families of sequences sharing the same fold. Nevertheless, some positions are more difficult to mutate than others. We compare such structurally conserved positions to positions conserved in protein evolution, suggesting that our model can be a valuable tool to distinguish structural from functional conservation in databases of protein families. These results indicate that a synergy between database analysis and structurally based computational studies can increase our understanding of protein evolution.
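A common way to check whether a substitution process is Poissonian is the index of dispersion, the variance of substitution counts across lineages divided by their mean, which equals 1 for a Poisson process. The sketch below is only an illustration of that statistic, not the model used in the abstract; the function name and the branch counts are hypothetical.

```python
from statistics import mean, pvariance


def dispersion_index(counts):
    """Variance-to-mean ratio of substitution counts across lineages.

    A Poisson process gives a ratio close to 1; values well above 1
    indicate an overdispersed (non-Poissonian) substitution process.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("mean substitution count is zero")
    return pvariance(counts) / m


# Hypothetical substitution counts on branches of equal duration.
even_counts = [9, 10, 11, 10, 10]
bursty_counts = [1, 2, 30, 3, 14]

print(dispersion_index(even_counts))    # well below 1
print(dispersion_index(bursty_counts))  # well above 1
```

Both example lists have the same mean, so the difference in the ratio comes entirely from how unevenly the substitutions are spread over branches.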

3.
We have characterized the relationship between accurate phylogenetic reconstruction and sequence similarity, testing whether high levels of sequence similarity can consistently produce accurate evolutionary trees. We generated protein families with known phylogenies using a modified version of the PAML/EVOLVER program that produces insertions and deletions as well as substitutions. Protein families were evolved over a range of 100-400 point accepted mutations; at these distances 63% of the families shared significant sequence similarity. Protein families were evolved using balanced and unbalanced trees, with ancient or recent radiations. In families sharing statistically significant similarity, about 60% of multiple sequence alignments were 95% identical to true alignments. To compare recovered topologies with true topologies, we used a score that reflects the fraction of clades that were correctly clustered. As expected, the accuracy of the phylogenies was greatest in the least divergent families. About 88% of phylogenies clustered over 80% of clades in families that shared significant sequence similarity, using Bayesian, parsimony, distance, and maximum likelihood methods. However, for protein families with short ancient branches (ancient radiation), only 30% of the most divergent (but statistically significant) families produced accurate phylogenies, and only about 70% of the second most highly conserved families, with median expectation values better than 10^-60, produced accurate trees. These values represent upper bounds on expected tree accuracy for sequences with a simple divergence history; proteins from 700 Giardia families, with a similar range of sequence similarities but considerably more gaps, produced much less accurate trees.
For our simulated insertions and deletions, correct multiple sequence alignments did not perform much better than those produced by T-COFFEE, and including sequences with expressed sequence tag-like sequencing errors did not significantly decrease phylogenetic accuracy. In general, although less-divergent sequence families produce more accurate trees, the likelihood of estimating an accurate tree is most dependent on whether radiation in the family was ancient or recent. Accuracy can be improved by combining genes from the same organism when creating species trees or by selecting protein families with the best bootstrap values in comprehensive studies.
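A score based on "the fraction of clades that were correctly clustered" can be sketched as follows. This is an assumed, simplified reading of that idea, not the authors' exact scoring procedure: trees are written as nested tuples of leaf names, each internal node defines a clade (the set of leaves below it), and the score is the share of true clades that also appear in the inferred tree. The function names and example trees are hypothetical.

```python
def clades(tree):
    """Return the non-trivial clades of a nested-tuple tree as frozensets of leaf names."""
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    root = leaves(tree)
    found.discard(root)  # the root clade holds every leaf and carries no information
    return found


def clade_recovery(true_tree, inferred_tree):
    """Fraction of clades in the true tree that are also present in the inferred tree."""
    true_clades = clades(true_tree)
    if not true_clades:
        raise ValueError("true tree has no informative clades")
    return len(true_clades & clades(inferred_tree)) / len(true_clades)


true_tree = ((("A", "B"), "C"), ("D", "E"))
inferred_tree = ((("A", "C"), "B"), ("D", "E"))

# The inferred tree recovers {A, B, C} and {D, E} but misses {A, B}.
print(clade_recovery(true_tree, inferred_tree))
```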

4.
Most eubacteria, and all eukaryotes examined thus far, encode homologs of the DNA mismatch repair protein MutS. Although eubacteria encode only one or two MutS-like proteins, eukaryotes encode at least six distinct MutS homolog (MSH) proteins, corresponding to conserved (orthologous) gene families. This suggests evolution of individual gene family lines of descent by several duplication/specialization events. Using quantitative phylogenetic analyses (RASA, or relative apparent synapomorphy analysis), we demonstrate that comparison of complete MutS protein sequences, rather than highly conserved C-terminal domains only, maximizes information about evolutionary relationships. We identify a novel, highly conserved middle domain, and clearly delineate an N-terminal domain, previously implicated in mismatch recognition, that shows family-specific patterns of aromatic and charged amino acids. Our final analysis, in contrast to previous analyses of MutS-like sequences, yields a stable phylogenetic tree consistent with the known biochemical functions of MutS/MSH proteins, and assigns all known eukaryotic MSH proteins to a monophyletic group whose branches correspond to the respective specialized gene families. The rooted phylogenetic tree suggests their derivation from a mitochondrial MSH1-like protein, itself the descendant of the MutS of a symbiont in a primitive eukaryotic precursor.

5.
Hu Z  Ma B  Wolfson H  Nussinov R 《Proteins》2000,39(4):331-342
A number of studies have addressed the question of which are the critical residues at protein-binding sites. These studies examined either a single or a few protein-protein interfaces. The most extensive study to date has been an analysis of alanine-scanning mutagenesis. However, although the total number of mutations was large, the number of protein interfaces was small, with some of the interfaces closely related. Here we show that although overall binding sites are hydrophobic, they are studded with specific, conserved polar residues at specific locations, possibly serving as energy "hot spots." Our results confirm and generalize the alanine-scanning data analysis, despite its limited size. Previously Trp, Arg, and Tyr were shown to constitute energetic hot spots. These were rationalized by their polar interactions and by their surrounding rings of hydrophobic residues. However, there was no compelling reason as to why specifically these residues were conserved. Here we show that other polar residues are similarly conserved. These conserved residues have been detected consistently in all interface families that we have examined. Our results are based on an extensive examination of residues which are in contact across protein interfaces. We utilize all clustered interface families with at least five members and with sequence similarity between the members in the range of 20-90%. There are 11 such clustered interface families, comprising a total of 97 crystal structures. Our three-dimensional superpositioning analysis of the occurrences of matched residues in each of the families identifies conserved residues at spatially similar environments. Additionally, in enzyme inhibitors, we observe that residues are more conserved at the interfaces than at other locations. 
On the other hand, antibody-protein interfaces have similar surface conservation as compared to their corresponding linear sequence alignment, consistent with the suggestion that evolution has optimized protein interfaces for function.

6.
Dong JH  Wen JF  Tian HF 《Gene》2007,396(1):116-124
Ras superfamily proteins are key regulators in a wide variety of cellular processes. Previously, they were considered to be specific to eukaryotes, and the MglA proteins, a clearly distinct group of prokaryotic proteins, were recognized as their only prokaryotic analogs or even ancestors. Here, taking advantage of the recent accumulation of prokaryotic genomic databases, we have investigated the existence and taxonomic distribution of Ras superfamily protein homologs in a much wider prokaryotic range, and analyzed their phylogenetic correlation with their eukaryotic analogs. Thirteen unambiguous prokaryotic homologs, which possess the GDP/GTP-binding domain with all five characteristic motifs of their eukaryotic analogs, were identified in 12 eubacteria and one archaebacterium. In some other archaebacteria, including four methanogenic archaebacteria and three Thermoplasmales, homologs were also found, but with GDP/GTP-binding domains that do not contain all five characteristic motifs. Many more MglA orthologs were identified than in previous studies, mainly in delta-proteobacteria, and all were shown to have common unique features distinct from the Ras superfamily proteins. Our phylogenetic analysis indicated that the eukaryotic Rab, Ran, Ras, and Rho families have the closest phylogenetic correlation with the 13 unambiguous prokaryotic homologs, whereas the other three eukaryotic protein families (SRbeta, Sar1, and Arf) branch separately from them, but have a relatively close relationship with the methanogenic archaebacterial homologs and MglA. Although homologs were identified in a relative minority of prokaryotes with genomic databases, their presence in a relatively wide variety of lineages, their unique sequence characters distinct from those of eukaryotic analogs, and the topology of our phylogenetic tree together do not support their origin from eukaryotes as a result of lateral gene transfer.
Therefore, we argue that Ras superfamily proteins might have already emerged at least in some prokaryotic lineages, and that the seven eukaryotic protein families of the Ras superfamily may have two independent prokaryotic origins, probably reflecting the 'fusion' evolutionary history of the eukaryotic cell.

7.
Kim WK  Ison JC 《Proteins》2005,61(4):1075-1088
Considering the limited success of the most sophisticated docking methods available and the amount of computation required for systematic docking, cataloging all the known interfaces may be an alternative basis for the prediction of protein tertiary and quaternary structures. We classify domain interfaces according to the geometry of domain-domain association. By applying a simple and efficient method called "interface tag clustering," more than 4,000 distinct types of domain interfaces are collected from the Protein Quaternary Structure Server and the Protein Data Bank. Given a pair of interacting domains, we define a "face" as the set of interacting residues in each single domain and the pair of interacting faces as an "interface." We investigate how the geometry of interfaces relates to a network of interacting protein families, such as how many different binding orientations are possible between two families or whether a family uses distinct surfaces or the same surface when the family has diverse interaction partners from various families. We show there are, on average, 1.2-1.9 different types of interfaces between interacting domains and a significant number of family pairs associate in multiple orientations. In general, a family tends to use distinct faces for each partner when the family has diverse interaction partners. Each face is highly specific to its interaction partner and the binding orientation. The relative positions of interface residues are generally well conserved within the same type of interface even between remote homologs. The classification result is available at http://www.biotec.tu-dresden.de/~wkim/supplement.

8.
tRNAs are among the most ancient, highly conserved sequences on earth, but are often thought to be poor phylogenetic markers because they are short, often subject to horizontal gene transfer, and easily change specificity. Here we use an algorithm now commonly used in microbial ecology, UniFrac, to cluster 175 genomes spanning all three domains of life based on the phylogenetic relationships among their complete tRNA pools. We find that the overall pattern of similarities and differences in the tRNA pools recaptures universal phylogeny to a remarkable extent, and that the resulting tree is similar to the distribution of bootstrapped rRNA trees from the same genomes. In contrast, the tRNAs of identical specificity or of individual isoacceptors generally produced trees of lower quality. However, some tRNA isoacceptors were very good predictors of the overall pattern of organismal evolution. These results show that UniFrac can extract meaningful biological patterns even from phylogenies with a high level of statistical inaccuracy and horizontal gene transfer, and that, overall, the pattern of tRNA evolution tracks universal phylogeny and provides a background against which we can test hypotheses about the evolution of individual isoacceptors.
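For orientation, the unweighted UniFrac distance between two samples is the fraction of total branch length in a shared tree that leads to tips from only one of the two samples. The sketch below illustrates that definition under simplifying assumptions: the tree is given as a flat list of branches, each with a length and the set of tips below it, and each "sample" (here standing in for a genome's tRNA pool) is a set of tip names. The data layout, function name, and toy tree are hypothetical and are not taken from the UniFrac software.

```python
def unweighted_unifrac(branches, sample_a, sample_b):
    """Fraction of branch length leading to tips found in only one sample.

    branches: iterable of (length, set_of_descendant_tips) pairs.
    sample_a, sample_b: sets of tip names present in each sample.
    """
    unique = shared = 0.0
    for length, tips in branches:
        in_a = bool(tips & sample_a)
        in_b = bool(tips & sample_b)
        if in_a and in_b:
            shared += length
        elif in_a or in_b:
            unique += length
    total = unique + shared
    if total == 0:
        raise ValueError("no branch leads to a tip in either sample")
    return unique / total


# Toy tree ((t1:1, t2:1):2, (t3:1, t4:1):2), flattened to its branches.
branches = [
    (1.0, {"t1"}), (1.0, {"t2"}), (2.0, {"t1", "t2"}),
    (1.0, {"t3"}), (1.0, {"t4"}), (2.0, {"t3", "t4"}),
]

print(unweighted_unifrac(branches, {"t1", "t3"}, {"t2", "t4"}))  # interleaved samples
print(unweighted_unifrac(branches, {"t1", "t2"}, {"t3", "t4"}))  # fully separated samples
```

A matrix of such pairwise distances between genomes is what a clustering step would then operate on.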

9.
Protein interfaces are thought to be distinguishable from the rest of the protein surface by their greater degree of residue conservation. We test the validity of this approach on an expanded set of 64 protein-protein interfaces using conservation scores derived from two multiple sequence alignment types, one of close homologs/orthologs and one of diverse homologs/paralogs. Overall, we find that the interface is slightly more conserved than the rest of the protein surface when using either alignment type, with alignments of diverse homologs showing marginally better discrimination. However, using a novel surface-patch definition, we find that the interface is rarely significantly more conserved than other surface patches when using either alignment type. When an interface is among the most conserved surface patches, it tends to be part of an enzyme active site. The most conserved surface patch overlaps with 39% (± 28%) and 36% (± 28%) of the actual interface for diverse and close homologs, respectively. Contrary to results obtained from smaller data sets, this work indicates that residue conservation is rarely sufficient for complete and accurate prediction of protein interfaces. Finally, we find that obligate interfaces differ from transient interfaces in that the former have significantly fewer alignment gaps at the interface than the rest of the protein surface, as well as having buried interface residues that are more conserved than partially buried interface residues.
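The abstract does not say which conservation score was used. One common choice, shown here purely as an assumed illustration, is the Shannon entropy of each alignment column: 0 bits for a fully conserved column, larger values for more variable ones. Gaps are simply skipped in this sketch, and the toy alignment and function names are hypothetical.

```python
from collections import Counter
from math import log2


def column_entropy(column):
    """Shannon entropy (bits) of the residues in one alignment column, ignoring gaps."""
    residues = [r for r in column if r != "-"]
    if not residues:
        raise ValueError("column contains only gaps")
    n = len(residues)
    return -sum((c / n) * log2(c / n) for c in Counter(residues).values())


def conservation_scores(alignment):
    """Per-column entropies for a list of equal-length aligned sequences."""
    return [column_entropy(col) for col in zip(*alignment)]


# Toy alignment of four sequences; "-" marks a gap.
alignment = ["ACDW", "ACEW", "AC-W", "GCKW"]
scores = conservation_scores(alignment)
print(scores)  # lower entropy means a more conserved column
```

Interface and non-interface surface positions could then be compared by averaging these per-column values over each residue set.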

10.
Glycosylation is an important aspect of epigenetic regulation. Glycosyltransferase is a key enzyme in the biosynthesis of glycans, which glycosylates more than half of all proteins in eukaryotes and is involved in a wide range of biological processes. It has been suggested previously that homooligomerization in glycosyltransferases and other proteins might be crucial for their function. In this study, we explore functional homooligomeric states of glycosyltransferases in various organisms, trace their evolution, and perform comparative analyses to find structural features that can mediate or disrupt the formation of different homooligomers. First, we make a structure-based classification of the diverse superfamily of glycosyltransferases and confirm that the majority of the structures are indeed clustered into the GT-A or GT-B folds. We find that homooligomeric glycosyltransferases appear to be as ancient as monomeric glycosyltransferases and go back in evolution to the last universal common ancestor (LUCA). Moreover, we show that interface residues have a significant bias to be gapped out or unaligned in the monomers, implying that they might represent features crucial for oligomer formation. Structural analysis of these features reveals that the majority of them represent loops, terminal regions, and helices, indicating that these secondary-structure elements mediate the formation of glycosyltransferase homooligomers and directly contribute to the specific binding. We also observe relatively short protein regions that disrupt the homodimer interactions, although such cases are rare. These results suggest that relatively small structural changes in the nonconserved regions may contribute to the formation of different functional oligomeric states and might be important in regulation of enzyme activity through homooligomerization.

11.
Cadherins are cell surface adhesion proteins important for tissue development and integrity. Type I and type II, or classical, cadherins form adhesive dimers via an interface formed through the exchange, or “swapping”, of the N-terminal β-strands from their membrane-distal EC1 domains. Here, we ask which sequence and structural features in EC1 domains are responsible for β-strand swapping and whether members of other cadherin families form similar strand-swapped binding interfaces. We created a comprehensive database of multiple alignments of each type of cadherin domain. We used the known three-dimensional structures of classical cadherins to identify conserved positions in multiple sequence alignments that appear to be crucial determinants of the cadherin domain structure. We identified features that are unique to EC1 domains. On the basis of our analysis, we conclude that all cadherin domains have very similar overall folds but, with the exception of classical and desmosomal cadherin EC1 domains, most of them do not appear to bind through a strand-swapping mechanism. Thus, non-classical cadherins that function in adhesion are likely to use different protein-protein interaction interfaces. Our results have implications for the evolution of molecular mechanisms of cadherin-mediated adhesion in vertebrates.

12.
Signal recognition particle (SRP) is a cytoplasmic ribonucleoprotein that targets a subset of nascent presecretory proteins to the endoplasmic reticulum membrane. We have considered the SRP cycle from the perspective of molecular evolution, using recently determined sequences of genes or cDNAs encoding homologs of SRP (7SL) RNA, the Srp54 protein (Srp54p), and the alpha subunit of the SRP receptor (SR alpha) from a broad spectrum of organisms, together with the remaining five polypeptides of mammalian SRP. Our analysis provides insight into the significance of structural variation in SRP RNA and identifies novel conserved motifs in protein components of this pathway. The lack of congruence between an established phylogenetic tree and size variation in 7SL homologs implies the occurrence of several independent events that eliminated more than half the sequence content of this RNA during bacterial evolution. The apparently non-essential structures are domain I, a tRNA-like element that is constant in archaea, varies in size among eucaryotes, and is generally missing in bacteria, and domain III, a tightly base-paired hairpin that is present in all eucaryotic and archaeal SRP RNAs but is invariably absent in bacteria. Based on both structural and functional considerations, we propose that the conserved core of SRP consists minimally of the 54 kDa signal sequence-binding protein complexed with the loosely base-paired domain IV helix of SRP RNA, and is also likely to contain a homolog of the Srp68 protein. Comparative sequence analysis of the methionine-rich M domains from a diverse array of Srp54p homologs reveals an extended region of amino acid identity that resembles a recently identified RNA recognition motif.
Multiple sequence alignment of the G domains of Srp54p and SR alpha homologs indicates that these two polypeptides exhibit significant similarity even outside the four GTPase consensus motifs, including a block of nine contiguous amino acids in a location analogous to the binding site of the guanine nucleotide dissociation stimulator (GDS) for E. coli EF-Tu. The conservation of this sequence, in combination with the results of earlier genetic and biochemical studies of the SRP cycle, leads us to hypothesize that a component of the Srp68/72p heterodimer serves as the GDS for both Srp54p and SR alpha. Using an iterative alignment procedure, we demonstrate similarity between Srp68p and sequence motifs conserved among GDS proteins for small Ras-related GTPases. The conservation of SRP cycle components in organisms from all three major branches of the phylogenetic tree suggests that this pathway for protein export is of ancient evolutionary origin.

13.
One of the causes of genome size expansion is considered to be amplification of retrotransposons. We determined nucleotide sequences of 24 PCR products for each of six retrotransposons in Brassica rapa and Brassica oleracea. Phylogenetic trees of these sequences showed species-specific clades. We also sequenced STF7a homologs and Tto1 homologs, 24 PCR products each, in nine diploids and three allopolyploids, and constructed phylogenetic trees. In these phylogenetic trees, species-specific clades of diploid species were also formed, but retrotransposons of allopolyploids were clustered into the clades of their original genomes, indicating that these two retrotransposons amplified after speciation of the nine diploids. Genetic variation in these retrotransposons may have arisen before emergence of allopolyploid species. There was a positive correlation between the genome size and the average number of substitutions of STF7a and Tto1 homologs in at least seven diploids. The implications of these results in the genome evolution of Brassicaceae are herein discussed.

14.
The Bowman-Birk family (BBI) of proteinase inhibitors is probably the most studied family of plant inhibitors. We describe the primary structure and the gene expression profile of 14 putative BBIs from the sugarcane expressed sequence tag database and show how we used these newly discovered sequences together with 87 previously described BBI sequences from the GenBank database to construct phylogenetic trees for the BBI family. Phylogenetic analysis revealed that BBI-type inhibitors from monocotyledonous and dicotyledonous plants could be clearly separated into different groups, while the overall topology of the BBI tree suggests a different pattern of evolution for BBI families in flowering plants. We also found that BBI proteinase inhibitors from dicotyledonous plants were well conserved, accumulating only slight differences during their evolution. In addition, we found that BBIs from monocotyledonous plants were highly variable, indicating an interesting process of evolution based on internal gene duplications and mutation events.

15.
Large double-stranded DNA viruses, including poxviruses and mimiviruses, encode enzymes to catalyze the formation of disulfide bonds in viral proteins produced in the cell cytosol, an atypical location for oxidative protein folding. These viral disulfide catalysts belong to a family of sulfhydryl oxidases that are dimers of a small five-helix fold containing a Cys-X-X-Cys motif juxtaposed to a flavin adenine dinucleotide cofactor. We report that the sulfhydryl oxidase pB119L from African swine fever virus (ASFV) uses a surface for self-assembly that differs from the one observed in homologs from mammals, plants, and fungi. Within a protein family, different packing interfaces for the same oligomerization state are extremely rare. We find that the alternate dimerization mode seen in ASFV pB119L is not characteristic of all viral sulfhydryl oxidases, as the flavin-binding domain from a mimivirus sulfhydryl oxidase assumes the same dimer structure as the known eukaryotic enzymes. ASFV pB119L demonstrates the potential of large double-stranded DNA viruses, which have faster mutation rates than their hosts and the tendency to incorporate host genes, to pioneer new protein folds and self-assembly modes.

16.
Protein interactions are fundamental to the functioning of cells, and high throughput experimental and computational strategies are sought to map interactions. Predicting interaction specificity, such as matching members of a ligand family to specific members of a receptor family, is largely an unsolved problem. Here we show that by using evolutionary relationships within such families, it is possible to predict their physical interaction specificities. We introduce the computational method of matrix alignment for finding the optimal alignment between protein family similarity matrices. A second method, 3D embedding, allows visualization of interacting partners via spatial representation of the protein families. These methods essentially align phylogenetic trees of interacting protein families to define specific interaction partners. Prediction accuracy depends strongly on phylogenetic tree complexity, as measured with information theoretic methods. These results, along with simulations of protein evolution, suggest a model for the evolution of interacting protein families in which interaction partners are duplicated in coupled processes. Using these methods, it is possible to successfully find protein interaction specificities, as demonstrated for >18 protein families.
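The idea of aligning two family similarity matrices can be illustrated with a brute-force search: try every pairing of ligands with receptors and keep the one under which the two matrices agree best. This is only a toy sketch of the concept, assuming equal-sized families and a squared-difference cost; it is not the authors' algorithm, it scales factorially with family size, and the matrices and function name are hypothetical.

```python
from itertools import permutations


def align_matrices(sim_a, sim_b):
    """Find the pairing of family A members with family B members that best
    matches the two within-family similarity matrices.

    Returns (perm, cost), where perm[i] is the index in family B paired with
    member i of family A and cost is the summed squared difference over pairs.
    """
    n = len(sim_a)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        cost = sum(
            (sim_a[i][j] - sim_b[perm[i]][perm[j]]) ** 2
            for i in range(n)
            for j in range(i + 1, n)
        )
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm, best_cost


# Hypothetical similarity matrices for three ligands and three receptors.
ligands = [
    [1.0, 0.9, 0.2],
    [0.9, 1.0, 0.3],
    [0.2, 0.3, 1.0],
]
receptors = [
    [1.0, 0.3, 0.9],
    [0.3, 1.0, 0.2],
    [0.9, 0.2, 1.0],
]

perm, cost = align_matrices(ligands, receptors)
print(perm, cost)  # ligand i is paired with receptor perm[i]
```

For realistic family sizes an exhaustive search is infeasible, so a heuristic or optimization-based search over pairings would be needed instead.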

17.
Nucleotide binding site (NBS)–leucine-rich repeat (LRR) genes belong to the largest class of disease-resistance gene super groups in plants, and their intra- or interspecies nucleotide variations have been studied extensively to understand their evolution and function. However, little is known about the evolutionary patterns of their copy numbers in related species. Here, 129, 245, 239 and 508 NBSs were identified in maize, sorghum, brachypodium and rice, respectively, suggesting considerable variations of these genes. Based on phylogenetic relationships from a total of 496 ancestral branches of grass NBS families, three gene number variation patterns were categorized: conserved, sharing two or more species, and species-specific. Notably, the species-specific NBS branches are dominant (71.6%), while there is only a small percentage (3.83%) of conserved families. In contrast, the conserved families are dominant in 51 randomly selected house-keeping genes (96.1%). The opposite patterns between NBS and the other gene groups suggest that natural selection is responsible for the drastic number variation of NBS genes. The rapid expansion and/or contraction may be a fundamentally important strategy for a species to adapt to the quickly changing species-specific pathogen spectrum. In addition, the small proportion of conserved NBSs suggests that the loss of NBSs may be a general tendency in grass species.

18.
19.
As a result of recent genome sequencing projects as well as detailed biochemical, molecular genetic and physiological experimentation on representative transport proteins, we have come to realize that all organisms possess an extensive but limited array of transport protein types that allow the uptake of nutrients and excretion of toxic substances. These proteins fall into phylogenetic families that presumably reflect their evolutionary histories. Some of these families are restricted to a single phylogenetic group of organisms and may have arisen recently in evolutionary time while others are found ubiquitously and may be ancient. In this study we conduct systematic phylogenetic analyses of 26 families of transport systems that either had not been characterized previously or were in need of updating. Among the families analyzed are some that are bacterial-specific, others that are eukaryotic-specific, and others that are ubiquitous. They can function by either a channel-type or a carrier-type mechanism, and in the latter case, they are frequently energized by coupling solute transport to the flux of an ion down its electrochemical gradient. We tabulate the currently sequenced members of the 26 families analyzed, describe the properties of these families, and present partial multiple alignments, signature sequences and phylogenetic trees for them all.

20.
Small molecules that bind at protein-protein interfaces may either block or stabilize protein-protein interactions in cells. Thus, some of these binding interfaces may turn into prospective targets for drug design. Here, we collected 175 pairs of protein-protein (PP) complexes and protein-ligand (PL) complexes with known three-dimensional structures for which (1) one protein from the PP complex shares at least 40% sequence identity with the protein from the PL complex, and (2) the interface regions of these proteins overlap at least partially with each other. We found that those interface residues that may bind the other protein as well as the small molecule are, on average, evolutionarily more conserved, have a higher tendency to be located in pockets, and expose a smaller fraction of their surface area to the solvent than the remaining protein-protein interface region. Based on these findings we derived a statistical classifier that predicts patches at binding interfaces that have a higher tendency to bind small molecules. We applied this new prediction method to more than 10 000 interfaces from the Protein Data Bank. For several complexes related to apoptosis the predicted binding patches were in direct contact with co-crystallized small molecules.
