Similar literature
 20 similar articles found
1.
2.
The most probable horizontal gene transfer events in the evolution of Archaea are reconstructed based on the comparison of phylogenetic trees of housekeeping orthologous protein families with consensus phylogenies of Archaea. The existence of these phenomena suggests that the common ancestor of Archaea was of methanogenic and hyperthermophilic nature and dwelt in communities with a high level of ecological integration.

3.
Protein interactions are fundamental to the functioning of cells, and high throughput experimental and computational strategies are sought to map interactions. Predicting interaction specificity, such as matching members of a ligand family to specific members of a receptor family, is largely an unsolved problem. Here we show that by using evolutionary relationships within such families, it is possible to predict their physical interaction specificities. We introduce the computational method of matrix alignment for finding the optimal alignment between protein family similarity matrices. A second method, 3D embedding, allows visualization of interacting partners via spatial representation of the protein families. These methods essentially align phylogenetic trees of interacting protein families to define specific interaction partners. Prediction accuracy depends strongly on phylogenetic tree complexity, as measured with information theoretic methods. These results, along with simulations of protein evolution, suggest a model for the evolution of interacting protein families in which interaction partners are duplicated in coupled processes. Using these methods, it is possible to successfully find protein interaction specificities, as demonstrated for >18 protein families.
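
The idea of aligning two protein family similarity matrices can be illustrated with a small sketch. This is not the authors' algorithm, whose details the abstract does not give; it is a brute-force search, assuming two square similarity matrices of equal size and a sum-of-squared-differences cost. The function name `align_matrices` and the cost function are hypothetical choices made for illustration.

```python
from itertools import permutations


def align_matrices(sim_a, sim_b):
    """Find the pairing of family A members with family B members that
    makes the two similarity matrices agree best.

    Returns (perm, cost), where perm[i] is the index in family B paired
    with member i of family A. Cost is the sum of squared differences over
    all unordered pairs (an assumed, illustrative objective).
    """
    n = len(sim_a)
    if len(sim_b) != n or any(len(row) != n for row in sim_a + sim_b):
        raise ValueError("both matrices must be square and of equal size")

    best_perm, best_cost = None, float("inf")
    # Exhaustive search: n! candidates, so only feasible for small families.
    for perm in permutations(range(n)):
        cost = sum(
            (sim_a[i][j] - sim_b[perm[i]][perm[j]]) ** 2
            for i in range(n)
            for j in range(i + 1, n)
        )
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm, best_cost
```

Because the search visits every permutation, it is only practical for families of roughly ten members or fewer; larger families would need a heuristic or an optimization method, which this sketch does not attempt.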

4.
We have characterized the relationship between accurate phylogenetic reconstruction and sequence similarity, testing whether high levels of sequence similarity can consistently produce accurate evolutionary trees. We generated protein families with known phylogenies using a modified version of the PAML/EVOLVER program that produces insertions and deletions as well as substitutions. Protein families were evolved over a range of 100-400 point accepted mutations; at these distances 63% of the families shared significant sequence similarity. Protein families were evolved using balanced and unbalanced trees, with ancient or recent radiations. In families sharing statistically significant similarity, about 60% of multiple sequence alignments were 95% identical to true alignments. To compare recovered topologies with true topologies, we used a score that reflects the fraction of clades that were correctly clustered. As expected, the accuracy of the phylogenies was greatest in the least divergent families. About 88% of phylogenies clustered over 80% of clades in families that shared significant sequence similarity, using Bayesian, parsimony, distance, and maximum likelihood methods. However, for protein families with short ancient branches (ancient radiation), only 30% of the most divergent (but statistically significant) families produced accurate phylogenies, and only about 70% of the second most highly conserved families, with median expectation values better than 10^-60, produced accurate trees. These values represent upper bounds on expected tree accuracy for sequences with a simple divergence history; proteins from 700 Giardia families, with a similar range of sequence similarities but considerably more gaps, produced much less accurate trees.
For our simulated insertions and deletions, correct multiple sequence alignments did not perform much better than those produced by T-COFFEE, and including sequences with expressed sequence tag-like sequencing errors did not significantly decrease phylogenetic accuracy. In general, although less-divergent sequence families produce more accurate trees, the likelihood of estimating an accurate tree is most dependent on whether radiation in the family was ancient or recent. Accuracy can be improved by combining genes from the same organism when creating species trees or by selecting protein families with the best bootstrap values in comprehensive studies.
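
A score based on the fraction of correctly clustered clades can be sketched as follows. The abstract does not define the exact score used, so this is one plausible reading: the share of non-trivial clades in the true tree that also appear in the recovered tree. Trees are assumed to be rooted nested tuples of leaf names, and the function names are hypothetical.

```python
def clades(tree):
    """Return the set of non-trivial clades of a rooted nested-tuple tree.

    Each clade is a frozenset of leaf names. Single leaves and the clade
    containing every leaf are excluded because any tree recovers them.
    """
    found = set()

    def walk(node):
        if isinstance(node, str):          # a leaf
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    all_leaves = walk(tree)
    found.discard(all_leaves)              # drop the root clade
    return found


def clade_recovery(true_tree, inferred_tree):
    """Fraction of true non-trivial clades present in the inferred tree."""
    true_clades = clades(true_tree)
    if not true_clades:
        return 1.0                         # nothing to recover
    return len(true_clades & clades(inferred_tree)) / len(true_clades)
```

For example, comparing the true tree `((("A", "B"), "C"), ("D", "E"))` with the recovered tree `((("A", "C"), "B"), ("D", "E"))` gives a score of 2/3, since the clade {A, B} is lost while {A, B, C} and {D, E} are recovered.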

5.
The proliferation of gene data from multiple loci of large multigene families has been greatly facilitated by considerable recent advances in sequence generation. The evolution of such gene families, which often undergo complex histories and different rates of change, combined with increases in sequence data, poses complex problems for traditional phylogenetic analyses, and in particular, those that aim to successfully recover species relationships from gene trees. Here, we implement gene tree parsimony analyses on multicopy gene family data sets of snake venom proteins for two separate groups of taxa, incorporating Bayesian posterior distributions as a rigorous strategy to account for the uncertainty present in gene trees. Gene tree parsimony largely failed to infer species trees congruent with each other or with species phylogenies derived from mitochondrial and single-copy nuclear sequences. Analysis of four toxin gene families from a large expressed sequence tag data set from the viper genus Echis failed to produce a consistent topology, and reanalysis of a previously published gene tree parsimony data set, from the family Elapidae, suggested that species tree topologies were predominantly unsupported. We suggest that gene tree parsimony failure in the family Elapidae is likely the result of unequal and/or incomplete sampling of paralogous genes and demonstrate that multiple parallel gene losses are likely responsible for the significant species tree conflict observed in the genus Echis. These results highlight the potential for gene tree parsimony analyses to be undermined by rapidly evolving multilocus gene families under strong natural selection.

6.
Z Yi, LA Katz, W Song. PLoS ONE 2012, 7(7): e40635
The current understanding of ciliate phylogeny is mainly based on analyses of a single gene, the small subunit ribosomal RNA (SSU-rDNA). However, phylogenetic trees based on a single gene sequence are not reliable estimators of species trees, and SSU-rDNA genealogies are not useful for resolution of some branches within Ciliophora. Since congruence between multiple loci is the best tool to determine evolutionary history, we assessed the usefulness of the alpha-tubulin gene, a protein-coding gene that is frequently sequenced, for ciliate phylogeny. Here, we generate alpha-tubulin gene sequences of 12 genera and 30 species within the order Euplotida, one of the most frequently encountered ciliate clades with numerous apparently cosmopolitan species, as well as four genera within its putative sister order Discocephalida. Analyses of the resulting data reveal that: 1) the alpha-tubulin gene is a suitable phylogenetic marker for euplotids at the family level, since both nucleotide and amino acid phylogenies recover all monophyletic euplotid families as defined by both morphological criteria and SSU-rDNA trees; however, the alpha-tubulin gene is not a good marker for defining species, order and subclass; 2) for seven out of nine euplotid species for which paralogs are detected, gene duplication appears recent as paralogs are monophyletic; 3) the order Euplotida is non-monophyletic, and the family Uronychiidae, with sequences from four genera, is non-monophyletic; and 4) there is more genetic diversity within the family Euplotidae than is evident from dargyrome (geometrical pattern of dorsal "silverline system" in ciliates) patterns, habit and SSU-rDNA phylogeny, which indicates the urgent need for taxonomic revision in this area.

7.
Exon-intron structure and evolution of the Lipocalin gene family   Cited by: 6 (self-citations: 0; citations by others: 6)
The Lipocalins are an ancient protein family whose expression is currently confirmed in bacteria, protoctists, plants, arthropods, and chordates. The evolution of this protein family has been assessed previously using amino acid sequence phylogenies. In this report we use an independent set of characters derived from the gene structure (exon-intron arrangement) to infer a new lipocalin phylogeny. We also present the novel gene structure of three insect lipocalins. The position and phase of introns are well preserved among lipocalin clades when mapped onto a protein sequence alignment, suggesting the homologous nature of these introns. Because of this homology, we use the intron position and phase of 23 lipocalin genes to reconstruct a phylogeny by maximum parsimony and distance methods. These phylogenies are very similar to the phylogenies derived from protein sequence. This result is confirmed by congruence analysis, and a consensus tree shows the commonalities between the two source trees. Interestingly, the intron arrangement phylogeny shows that metazoan lipocalins have more introns than other eukaryotic lipocalins, and that intron gains have occurred in the C-termini of chordate lipocalins. We also analyze the relationship of intron arrangement and protein tertiary structure, as well as the relationship of lipocalins with members of the proposed structural superfamily of calycins. Our congruence analysis validates the gene structure data as a source of phylogenetic information and helps to further refine our hypothesis on the evolutionary history of lipocalins.

8.
Uncovering the correct phylogeny of closely related species requires analysis of multiple gene genealogies or, alternatively, genealogies inferred from the multiple alleles found at highly polymorphic loci, such as microsatellites. However, a concern in using microsatellites is that constraints on allele sizes may occur, resulting in homoplasious distributions of alleles, leading to incorrect phylogenies. Seven microsatellites from the pathogenic fungus Coccidioides immitis were sequenced for 20 clinical isolates chosen to represent the known genetic diversity of the pathogen. An organismal phylogeny for C. immitis was inferred from microsatellite-flanking sequence polymorphisms and other restriction fragment length polymorphism-containing loci. Two microsatellite genetic distances were then used to determine phylogenies for C. immitis, and the trees found by these three methods were compared. Congruence between the organismal and microsatellite phylogenies occurred when microsatellite distances were based on simple allele frequency data. However, complex mutation events at some loci made distances based on stepwise mutation models unreliable. Estimates of times of divergence for the two species of C. immitis based on microsatellites were significantly lower than those calculated from flanking sequence, most likely due to constraints on microsatellite allele sizes. Flanking-sequence insertions/deletions significantly decreased the accuracy of genealogical information inferred from microsatellite loci and caused interspecific length homoplasies at one of the seven loci. Our analysis shows that microsatellites are useful phylogenetic markers, although care should be taken to choose loci with appropriate flanking sequences when they are intended for use in evolutionary studies.

9.
GeneTRACE-reconstruction of gene content of ancestral species   Cited by: 4 (self-citations: 0; citations by others: 4)
While current computational methods allow the reconstruction of individual ancestral protein sequences, reconstruction of complete gene content of ancestral species is not yet an established task. In this paper, we describe GENETRACE, an efficient linear-time algorithm that allows the reconstruction of evolutionary history of individual protein families as well as the complete gene content of ancestral species. The performance of the method was validated with a simulated evolution program called SimulEv. Our results indicate that given a set of correct phylogenetic profiles and a correct species tree, ancestral gene content can be reconstructed with sensitivity and selectivity of more than 90%. SimulEv simulations were also used to evaluate performance of the reconstruction of gene content-based phylogenetic trees, suggesting that these trees may be accurate at the terminal branches but suffer from long branch attraction near the root of the tree.

10.
11.
Gene duplications have been common throughout vertebrate evolution, introducing paralogy and so complicating phylogenetic inference from nuclear genes. Reconciled trees are one method capable of dealing with paralogy, using the relationship between a gene phylogeny and the phylogeny of the organisms containing those genes to identify gene duplication events. This allows us to infer phylogenies from gene families containing both orthologous and paralogous copies. Vertebrate phylogeny is well understood from morphological and palaeontological data, but studies using mitochondrial sequence data have failed to reproduce this classical view. Reconciled tree analysis of a database of 118 vertebrate gene families supports a largely classical vertebrate phylogeny.
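
How a reconciled tree identifies duplication events can be sketched with the standard least-common-ancestor mapping: each gene-tree node is mapped to the smallest species-tree clade containing all species below it, and a node is counted as a duplication when it maps to the same clade as one of its children. This is a minimal sketch under assumptions, not the software used in the study. Trees are assumed to be rooted nested tuples, and the names `count_duplications` and `gene_to_species` are hypothetical.

```python
def leaf_sets(tree):
    """List every clade of a nested-tuple tree as a frozenset of leaf names,
    including single leaves and the root."""
    out = []

    def walk(node):
        if isinstance(node, str):
            clade = frozenset([node])
        else:
            clade = frozenset().union(*(walk(child) for child in node))
        out.append(clade)
        return clade

    walk(tree)
    return out


def count_duplications(gene_tree, species_tree, gene_to_species):
    """Count gene-tree nodes inferred to be duplications by LCA mapping."""
    species_clades = leaf_sets(species_tree)

    def lca(species):
        # Smallest species-tree clade that contains every given species.
        return min((c for c in species_clades if species <= c), key=len)

    duplications = 0

    def walk(node):
        nonlocal duplications
        if isinstance(node, str):
            return lca(frozenset([gene_to_species[node]]))
        child_maps = [walk(child) for child in node]
        here = lca(frozenset().union(*child_maps))
        if here in child_maps:             # same mapping as a child: duplication
            duplications += 1
        return here

    walk(gene_tree)
    return duplications
```

With the species tree `(("human", "mouse"), "fish")` and a gene tree `((("h1", "m1"), ("h2", "m2")), "f1")`, where `h1` and `h2` are human genes and `m1` and `m2` are mouse genes, the node joining the two human-mouse pairs maps to the same clade as both of its children, so one duplication is inferred.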

12.
Over the past 10 years, much research has been dedicated to the understanding of protein interactions. Large-scale experiments to elucidate the global structure of protein interaction networks have been complemented by detailed studies of protein interaction interfaces. Understanding the evolution of interfaces allows one to identify convergently evolved interfaces which are evolutionarily unrelated but share a few key residues and hence have common binding partners. Understanding interaction interfaces and their evolution is an important basis for pharmaceutical applications in drug discovery. Here, we review the algorithms and databases on 3D protein interactions and discuss in detail applications in interface evolution, drug discovery, and interface prediction.

13.
14.
The evolutionary history of quorum-sensing systems in bacteria   Cited by: 3 (self-citations: 0; citations by others: 3)
Communication among bacterial cells through quorum-sensing (QS) systems is used to regulate ecologically and medically important traits, including virulence to hosts. QS is widespread in bacteria; it has been demonstrated experimentally in diverse phylogenetic groups, and homologs to the implicated genes have been discovered in a large proportion of sequenced bacterial genomes. The widespread distribution of the underlying gene families (LuxI/R and LuxS) raises the questions of how often QS genes have been transferred among bacterial lineages and the extent to which genes in the same QS system exchange partners or coevolve. Phylogenetic analyses of the relevant gene families show that the genes annotated as LuxI/R inducer and receptor elements comprise two families with virtually no homology between them and with one family restricted to the gamma-Proteobacteria and the other more widely distributed. Within bacterial phyla, trees for the LuxS and the two LuxI/R families show broad agreement with the ribosomal RNA tree, suggesting that these systems have been continually present during the evolution of groups such as the Proteobacteria and the Firmicutes. However, lateral transfer can be inferred for some genes (e.g., from Firmicutes to some distantly related lineages for LuxS). In general, the inducer/receptor elements in the LuxI/R systems have evolved together with little exchange of partners, although loss or replacement of partners has occurred in several lineages of gamma-Proteobacteria, the group for which sampling is most intensive in current databases. For instance, in Pseudomonas aeruginosa, a transferred QS system has been incorporated into the pathway of a native one. Gene phylogenies for the main LuxI/R family in Pseudomonas species imply a complex history of lateral transfer, ancestral duplication, and gene loss within the genus.

15.
Pybus OG, Rambaut A, Harvey PH. Genetics 2000, 155(3): 1429-1437
We describe a unified set of methods for the inference of demographic history using genealogies reconstructed from gene sequence data. We introduce the skyline plot, a graphical, nonparametric estimate of demographic history. We discuss both maximum-likelihood parameter estimation and demographic hypothesis testing. Simulations are carried out to investigate the statistical properties of maximum-likelihood estimates of demographic parameters. The simulations reveal that (i) the performance of exponential growth model estimates is determined by a simple function of the true parameter values and (ii) under some conditions, estimates from reconstructed trees perform as well as estimates from perfect trees. We apply our methods to HIV-1 sequence data and find strong evidence that subtypes A and B have different demographic histories. We also provide the first (albeit tentative) genetic evidence for a recent decrease in the growth rate of subtype B.

16.
Comparative analyses of eukaryotic genomes are providing insights into the mode and tempo of domain family evolution. Gene duplication, the source of family expansion, far exceeds the rate of emergence of domains from non-coding sequence, and the rate of recruitment of domains into novel architectures. Domain families that appear to be restricted to certain lineages are likely to be the result of gene duplication, coupled with rapid sequence diversification. If such families are evidence of past adaptation, then their functions must relate to the underlying mechanism of selection: competition among organisms.

17.
MOTIVATION: Uncovering the protein-protein interaction network is a fundamental step in the quest to understand the molecular machinery of a cell. This motivates the search for efficient computational methods for predicting such interactions. Among the available predictors are those that are based on the co-evolution hypothesis "evolutionary trees of protein families (that are known to interact) are expected to have similar topologies". Many of these methods are limited by the fact that they can handle only a small number of protein sequences. Also, details on evolutionary tree topology are missing as they use similarity matrices in lieu of the trees. RESULTS: We introduce MORPH, a new algorithm for predicting protein interaction partners between members of two protein families that are known to interact. Our approach can also be seen as a new method for searching the best superposition of the corresponding evolutionary trees based on tree automorphism group. We discuss relevant facts related to the predictability of protein-protein interaction based on their co-evolution. When compared with related computational approaches, our method reduces the search space by approximately 3 × 10^5-fold and at the same time increases the accuracy of predicting correct binding partners.

18.
MOTIVATION: Most molecular phylogenies are based on sequence alignments. Consequently, they fail to account for modes of sequence evolution that involve frequent insertions or deletions. Here we present a method for generating accurate gene and species phylogenies from whole genome sequence that makes use of short character string matches not placed within explicit alignments. In this work, the singular value decomposition of a sparse tetrapeptide frequency matrix is used to represent the proteins of organisms uniquely and precisely as vectors in a high-dimensional space. Vectors of this kind can be used to calculate pairwise distance values based on the angle separating the vectors, and the resulting distance values can be used to generate phylogenetic trees. Protein trees so derived can be examined directly for homologous sequences. Alternatively, vectors defining each of the proteins within an organism can be summed to provide a vector representation of the organism, which is then used to generate species trees. RESULTS: Using a large mitochondrial genome dataset, we have produced species trees that are largely in agreement with previously published trees based on the analysis of identical datasets using different methods. These trees also agree well with currently accepted phylogenetic theory. In principle, our method could be used to compare much larger bacterial or nuclear genomes in full molecular detail, ultimately allowing accurate gene and species relationships to be derived from a comprehensive comparison of complete genomes. In contrast to phylogenetic methods based on alignments, sequences that evolve by relative insertion or deletion would tend to remain recognizably similar.
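
The alignment-free distance described above can be illustrated with a simplified sketch. It builds raw tetrapeptide count vectors and measures the angle between them, and it sums protein vectors to represent an organism. It deliberately omits the singular value decomposition step that the abstract describes, so it shows only the vector and angle idea, not the published method. All function names are hypothetical.

```python
import math
from collections import Counter


def tetrapeptide_counts(sequence, k=4):
    """Count overlapping k-residue substrings (tetrapeptides when k = 4)."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def organism_vector(proteins, k=4):
    """Sum the count vectors of all proteins from one organism."""
    total = Counter()
    for protein in proteins:
        total.update(tetrapeptide_counts(protein, k))
    return total


def angular_distance(u, v):
    """Angle in radians between two sparse count vectors."""
    dot = sum(count * v.get(key, 0) for key, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("sequences shorter than k have no count vector")
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)
```

A matrix of pairwise angular distances computed this way could then be passed to any distance-based tree-building method, such as neighbor joining. Sequences that share no tetrapeptide are at the maximum distance of π/2.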

19.
20.
Lineage sorting and introgression can lead to incongruence among gene phylogenies, complicating the inference of species trees for large groups of taxa that have recently and rapidly radiated. In addition, it can be difficult to determine which of these processes is responsible for this incongruence. We explore these issues with the radiation of New Zealand alpine cicadas of the genus Maoricicada Dugdale. Gene trees were estimated from four putative independent loci: mitochondrial DNA (2274 nucleotides), elongation factor 1-alpha (1275 nucleotides), period (1709 nucleotides), and calmodulin (678 nucleotides). We reconstructed phylogenies using maximum likelihood and Bayesian methods from 44 individuals representing the 19 species and subspecies of Maoricicada and two outgroups. Species-level relationships were reconstructed using a novel extension of gene tree parsimony, whereby gene trees were weighted by their Bayesian posterior probabilities. The inferred gene trees show marked incongruence in the placement of some taxa, especially the enigmatic forest and scrub dwelling species, M. iolanthe. Using the species tree estimated by gene tree parsimony, we simulated coalescent gene trees in order to test the null hypothesis that the nonrandom placement of M. iolanthe among gene trees has arisen by chance. Under the assumptions of constant population size, known generation time, and panmixia, we were able to reject this null hypothesis. Furthermore, because the two alternative placements of M. iolanthe are in each case with species that share a similar song structure, we conclude that it is more likely that an ancient introgression event rather than lineage sorting has caused this incongruence.
