Similar literature
Found 20 similar articles (search time: 31 ms)
1.
Species tree inference from gene family trees is becoming increasingly popular because it can account for discordance between the species tree and the corresponding gene family trees. In particular, methods that can account for multiple-copy gene families exhibit potential to leverage paralogy as informative signal. At present, there does not exist any widely adopted inference method for this purpose. Here, we present SpeciesRax, the first maximum likelihood method that can infer a rooted species tree from a set of gene family trees and can account for gene duplication, loss, and transfer events. By explicitly modeling events by which gene trees can depart from the species tree, SpeciesRax leverages the phylogenetic rooting signal in gene trees. SpeciesRax infers species tree branch lengths in units of expected substitutions per site and branch support values via paralogy-aware quartets extracted from the gene family trees. Using both empirical and simulated data sets we show that SpeciesRax is at least as accurate as the best competing methods while being one order of magnitude faster on large data sets at the same time. We used SpeciesRax to infer a biologically plausible rooted phylogeny of the vertebrates comprising 188 species from 31,612 gene families in 1 h using 40 cores. SpeciesRax is available under GNU GPL at https://github.com/BenoitMorel/GeneRax and on BioConda.

2.
Inferring phylogenetic trees for individual homologous gene families is difficult because alignments are often too short, and thus contain insufficient signal, while substitution models inevitably fail to capture the complexity of the evolutionary processes. To overcome these challenges, species-tree-aware methods also leverage information from a putative species tree. However, only few methods are available that implement a full likelihood framework or account for horizontal gene transfers. Furthermore, these methods often require expensive data preprocessing (e.g., computing bootstrap trees) and rely on approximations and heuristics that limit the degree of tree space exploration. Here, we present GeneRax, the first maximum likelihood species-tree-aware phylogenetic inference software. It simultaneously accounts for substitutions at the sequence level as well as gene level events, such as duplication, transfer, and loss relying on established maximum likelihood optimization algorithms. GeneRax can infer rooted phylogenetic trees for multiple gene families, directly from the per-gene sequence alignments and a rooted, yet undated, species tree. We show that compared with competing tools, on simulated data GeneRax infers trees that are the closest to the true tree in 90% of the simulations in terms of relative Robinson–Foulds distance. On empirical data sets, GeneRax is the fastest among all tested methods when starting from aligned sequences, and it infers trees with the highest likelihood score, based on our model. GeneRax completed tree inferences and reconciliations for 1,099 Cyanobacteria families in 8 min on 512 CPU cores. Thus, its parallelization scheme enables large-scale analyses. GeneRax is available under GNU GPL at https://github.com/BenoitMorel/GeneRax (last accessed June 17, 2020).
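
To illustrate the accuracy metric named in this abstract: the relative Robinson–Foulds distance between two trees can be sketched as the number of bipartitions present in only one of the trees, divided by the total number of non-trivial bipartitions in both. The sketch below is not GeneRax code. The nested-tuple tree encoding and the function names `leaf_set`, `splits`, and `relative_rf` are assumptions made for this example; it treats trees as unrooted and assumes both trees have the same leaf set.

```python
def leaf_set(tree):
    """Return the set of leaf names below a node (a leaf is a plain string)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Collect the non-trivial bipartitions of a tree given as nested tuples.

    Each bipartition is stored as the side that does NOT contain a fixed
    reference leaf, so the same split always gets the same representation.
    """
    all_leaves = leaf_set(tree)
    ref = min(all_leaves)  # arbitrary but deterministic reference leaf
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        other = all_leaves - clade
        # Skip trivial splits (one side has fewer than two leaves).
        if len(clade) >= 2 and len(other) >= 2:
            found.add(other if ref in clade else clade)
        return clade

    visit(tree)
    return found


def relative_rf(tree_a, tree_b):
    """Relative Robinson-Foulds distance in [0, 1] for trees on the same leaves."""
    s_a, s_b = splits(tree_a), splits(tree_b)
    total = len(s_a) + len(s_b)
    return len(s_a ^ s_b) / total if total else 0.0


# Toy example (hypothetical trees, not data from the paper).
true_tree = ((("A", "B"), "C"), ("D", "E"))
inferred = ((("A", "C"), "B"), ("D", "E"))
print(relative_rf(true_tree, true_tree))  # 0.0
print(relative_rf(true_tree, inferred))   # 0.5
```

A value of 0 means the two topologies agree on every internal split; 1 means they share none.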

3.
PhyloGenie: automated phylome generation and analysis (cited 12 times in total: 1 self-citation, 11 by others)
Phylogenetic reconstruction is the method of choice to determine the homologous relationships between sequences. Difficulties in producing high-quality alignments, which are the basis of good trees, and in automating the analysis of trees have unfortunately limited the use of phylogenetic reconstruction methods to individual genes or gene families. Due to the large number of sequences involved, phylogenetic analyses of proteomes preclude manual steps and therefore require a high degree of automation in sequence selection, alignment, phylogenetic inference and analysis of the resulting set of trees. We present a set of programs that automates the steps from seed sequence to phylogeny and a utility to extract all phylogenies that match specific topological constraints from a database of trees. Two example applications that show the type of questions that can be answered by phylome analysis are provided. The generation and analysis of the Thermoplasma acidophilum phylome with regard to lateral gene transfer between Thermoplasmata and Sulfolobus showed best BLAST hits to be far less reliable indicators of lateral transfer than the corresponding protein phylogenies. The generation and analysis of the Danio rerio phylome provided more than twice as many proteins as described previously, supporting the hypothesis of an additional round of genome duplication in the actinopterygian lineage.

4.
Previous analysis of the gene encoding phosphoglucose isomerase (Pgi) suggests that this gene may have been transferred between a eukaryote and a bacterium. However, excluding the alternative hypothesis of ancient gene duplication has proven difficult because of both insufficient sampling of taxa and an earlier misidentification of a bacterial Pgi sequence. This paper presents a phylogenetic analysis of published complete Pgi sequences together with analysis of new partial Pgi sequences from six species of bacteria. The data identify a group of bacterial Pgi sequences, including sequences from Escherichia coli and Haemophilus influenzae, which are more closely related to eukaryotic Pgi sequences than to other bacterial sequences. The topologies of gene trees constructed using several different methods are all consistent with the hypothesis of lateral gene transfer and not ancient gene duplication. Furthermore, an estimate of a molecular clock for Pgi dates the divergence of the E. coli and H. influenzae sequences from the animal sequences to between 470 and 650 million years ago, well after other estimates of the divergence between eukaryotes and bacteria. This study provides the most convincing evidence to date of the transkingdom transfer of a nuclear gene.

5.
Phylogenetic networks represent the evolution of organisms that have undergone reticulate events, such as recombination, hybrid speciation or lateral gene transfer. An important way to interpret a phylogenetic network is in terms of the trees it displays, which represent all the possible histories of the characters carried by the organisms in the network. Interestingly, however, different networks may display exactly the same set of trees, an observation that poses a problem for network reconstruction: from the perspective of many inference methods such networks are indistinguishable. This is true for all methods that evaluate a phylogenetic network solely on the basis of how well the displayed trees fit the available data, including all methods based on input data consisting of clades, triples, quartets, or trees with any number of taxa, and also sequence-based approaches such as popular formalisations of maximum parsimony and maximum likelihood for networks. This identifiability problem is partially solved by accounting for branch lengths, although this merely reduces the frequency of the problem. Here we propose that network inference methods should only attempt to reconstruct what they can uniquely identify. To this end, we introduce a novel definition of what constitutes a uniquely reconstructible network. For any given set of indistinguishable networks, we define a canonical network that, under mild assumptions, is unique and thus representative of the entire set. Given data that underwent reticulate evolution, only the canonical form of the underlying phylogenetic network can be uniquely reconstructed. While on the methodological side this will imply a drastic reduction of the solution space in network inference, for the study of reticulate evolution this is a fundamental limitation that will require an important change of perspective when interpreting phylogenetic networks.

6.
While there is compelling evidence for the impact of endosymbiotic gene transfer (EGT; transfer from either mitochondrion or chloroplast to the nucleus) on genome evolution in eukaryotes, the role of interdomain transfer from bacteria and/or archaea (i.e. prokaryotes) is less clear. Lateral gene transfers (LGTs) have been argued to be potential sources of phylogenetic information, particularly for reconstructing deep nodes that are difficult to recover with traditional phylogenetic methods. We sought to identify interdomain LGTs by using a phylogenomic pipeline that generated 13 465 single gene trees and included up to 487 eukaryotes, 303 bacteria and 118 archaea. Our goals include searching for LGTs that unite major eukaryotic clades, and describing the relative contributions of LGT and EGT across the eukaryotic tree of life. Given the difficulties in interpreting single gene trees that aim to capture the approximately 1.8 billion years of eukaryotic evolution, we focus on presence–absence data to identify interdomain transfer events. Specifically, we identify 1138 genes found only in prokaryotes and representatives of three or fewer major clades of eukaryotes (e.g. Amoebozoa, Archaeplastida, Excavata, Opisthokonta, SAR and orphan lineages). The majority of these genes have phylogenetic patterns that are consistent with recent interdomain LGTs and, with the notable exception of EGTs involving photosynthetic eukaryotes, we detect few ancient interdomain LGTs. These analyses suggest that LGTs have probably occurred throughout the history of eukaryotes, but that ancient events are not maintained unless they are associated with endosymbiotic gene transfer among photosynthetic lineages.
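
The presence–absence criterion described in this abstract can be sketched as a simple filter. This is a hypothetical illustration, not the authors' pipeline: the input layout (gene name mapped to the set of major groups in which the gene was detected), the group labels, and the function name `candidate_interdomain_genes` are assumptions. Only the threshold of three or fewer major eukaryotic clades comes from the abstract; requiring at least one eukaryotic clade is an added assumption, since a gene absent from all eukaryotes cannot indicate an interdomain transfer into them.

```python
# Group labels treated as prokaryotic (assumed labels for this sketch).
PROKARYOTE_GROUPS = {"Bacteria", "Archaea"}


def candidate_interdomain_genes(presence, max_euk_clades=3):
    """Return genes present in prokaryotes and in 1..max_euk_clades eukaryotic clades.

    presence: dict mapping gene name -> set of group labels where it occurs.
    """
    hits = []
    for gene, groups in presence.items():
        prok = groups & PROKARYOTE_GROUPS   # prokaryotic groups carrying the gene
        euk = groups - PROKARYOTE_GROUPS    # eukaryotic major clades carrying it
        if prok and 1 <= len(euk) <= max_euk_clades:
            hits.append(gene)
    return sorted(hits)


# Toy presence-absence table (made-up genes, real clade names from the abstract).
toy = {
    "geneA": {"Bacteria", "Opisthokonta"},                       # candidate
    "geneB": {"Bacteria", "Archaea"},                            # no eukaryotes
    "geneC": {"Amoebozoa", "SAR"},                               # no prokaryotes
    "geneD": {"Archaea", "Amoebozoa", "Archaeplastida",
              "Excavata", "SAR"},                                # too widespread
}
print(candidate_interdomain_genes(toy))  # ['geneA']
```

A narrow eukaryotic distribution combined with prokaryotic presence is only a screening signal; the abstract pairs it with phylogenetic patterns before calling a transfer recent.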

7.
Members of the Deinococcaceae (e.g., Thermus, Meiothermus, Deinococcus) contain A/V-ATPases typically found in Archaea or Eukaryotes which were probably acquired by horizontal gene transfer. Two methods were used to quantify the extent to which archaeal or eukaryotic genes have been acquired by this lineage. Screening of a Meiothermus ruber library with probes made against Thermoplasma acidophilum DNA yielded a number of clones which hybridized more strongly than background. One of these contained the prolyl tRNA synthetase (RS) gene. Phylogenetic analysis shows the M. ruber and D. radiodurans prolyl RS to be more closely related to archaeal and eukaryal forms of this gene than to the typical bacterial type. Using a bioinformatics approach, putative open reading frames (ORFs) from the prerelease version of the D. radiodurans genome were screened for genes more closely related to archaeal or eukaryotic genes. Putative ORFs were searched against representative genomes from each of the three domains using automated BLAST. ORFs showing the highest matches against archaeal and eukaryotic genes were collected and ranked. Among the top-ranked hits were the A/V-ATPase catalytic and noncatalytic subunits and the prolyl RS genes. Using phylogenetic methods, ORFs were analyzed and trees assessed for evidence of horizontal gene transfer. Of the 45 genes examined, 20 showed topologies in which D. radiodurans homologues clearly group with eukaryotic or archaeal homologues, and 17 additional trees were found to show probable evidence of horizontal gene transfer. Compared to the total number of ORFs in the genome, those that can be identified as having been acquired from Archaea or Eukaryotes are relatively few (approximately 1%), suggesting that interdomain transfer is rare.
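
The screening step described in this abstract (search each ORF against genomes from the three domains, then collect and rank ORFs that match archaeal or eukaryotic genes best) might look roughly like the sketch below. The abstract does not state the scoring rule, so everything here is an assumption: the input layout (best similarity score per domain for each ORF, as might be parsed from BLAST output), the ranking by score margin, the function name `rank_candidate_orfs`, and the ORF identifiers and scores in the toy data.

```python
def rank_candidate_orfs(best_scores):
    """Rank ORFs whose best archaeal/eukaryotic hit outscores the best bacterial hit.

    best_scores: dict mapping ORF id -> dict of domain name -> best score.
    Returns a list of (orf, margin) pairs sorted by decreasing margin, where
    margin = best non-bacterial score minus best bacterial score (assumed rule).
    """
    ranked = []
    for orf, scores in best_scores.items():
        bacterial = scores.get("Bacteria", 0.0)
        foreign = max(scores.get("Archaea", 0.0), scores.get("Eukarya", 0.0))
        if foreign > bacterial:
            ranked.append((orf, foreign - bacterial))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Toy input with made-up ORF identifiers and scores.
toy_scores = {
    "orf_001": {"Bacteria": 120.0, "Archaea": 310.0, "Eukarya": 280.0},
    "orf_002": {"Bacteria": 400.0, "Archaea": 90.0},
    "orf_003": {"Bacteria": 50.0, "Eukarya": 75.0},
}
for orf, margin in rank_candidate_orfs(toy_scores):
    print(orf, margin)
```

As the abstract notes, top-ranked similarity hits are only candidates; the study then assessed them with phylogenetic trees before inferring horizontal transfer.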

8.
Although the role of lateral gene transfer is well recognized in the evolution of bacteria, it is generally assumed that it has had less influence among eukaryotes. To explore this hypothesis, we compare the dynamics of genome evolution in two groups of organisms: cyanobacteria and fungi. Ancestral genomes are inferred in both clades using two types of methods: first, Count, a gene-tree-unaware method that models gene duplications, gains and losses to explain the observed numbers of genes present in a genome; second, ALE, a more recent gene-tree-aware method that reconciles gene trees with a species tree using a model of gene duplication, loss and transfer. We compare their merits and their ability to quantify the role of transfers, and assess the impact of taxonomic sampling on their inferences. We present what we believe is compelling evidence that gene transfer plays a significant role in the evolution of fungi.

9.
Detection of lateral gene transfer among microbial genomes (cited 17 times in total: 0 self-citations, 17 by others)
An increasingly comprehensive assessment is being developed of the extent and potential significance of lateral gene transfer among microbial genomes. Genomic sequences can be identified as being of putatively lateral origin by their unexpected phyletic distribution, atypical sequence composition, differential presence or absence in closely related genomes, or incongruent phylogenetic trees. These complementary approaches sometimes yield inconsistent results. Not only more data but also quantitative models and simulations are needed urgently.

10.
Aim

Geographic affinities were analysed for nodule bacteria (Bradyrhizobium sp. Jordan) associated with two legume trees indigenous to the Philippines: Pterocarpus indicus (Papilionoideae) and Wallaceodendron celebicum (Mimosoideae).

Location

Nodule bacteria from Luzon, the Philippines, were compared with reference strains from Central America, eastern North America, Japan, Korea, China and Australia.

Methods

Two PCR assays targeting length polymorphisms in the rRNA region were carried out on 96 Philippine bacterial isolates. A 496‐bp portion of the 23S rRNA gene was sequenced in 14 representative isolates. Eight strains were analysed in greater depth by sequencing portions of four other genes (16S rRNA [1410 bp], dnaK [603 bp], nifD [822 bp], recA [512 bp]), and phylogenetic trees were constructed by maximum parsimony, neighbour joining and maximum likelihood methods.

Results

Most of the Philippine Bradyrhizobium strains showed greater similarity to reference strains from Central America than to strains from other source regions included in the analysis. However, phylogenetic trees for the five genes had significantly conflicting topologies, suggesting that lateral gene transfer events had altered genealogical relationships at different loci. In particular, two Philippine strains resembled Bradyrhizobium strains from Central America or China for 16S rRNA, dnaK and recA sequences, but had nifD sequences that clustered with Australian strains (with bootstrap support values of 90–96%).

Main conclusions

The Philippines have been colonized by Bradyrhizobium strains from multiple source regions. Subsequent lateral gene transfer has resulted in the evolution of Bradyrhizobium strains that combine DNA segments of different geographic origin.

11.
Rare evolutionary events, such as lateral gene transfers and gene fusions, may be useful to pinpoint, and correlate the timing of, key branches across the tree of life. For example, the shared possession of a transferred gene indicates a phylogenetic relationship among organismal lineages by virtue of their shared common ancestral recipient. Here, we present phylogenetic analyses of prolyl-tRNA and alanyl-tRNA synthetase genes that indicate lateral gene transfer events to an ancestor of the diplomonads and parabasalids from lineages more closely related to the newly discovered archaeal hyperthermophile Nanoarchaeum equitans (Nanoarchaeota) than to Crenarchaeota or Euryarchaeota. The support for this scenario is strong from all applied phylogenetic methods for the alanyl-tRNA sequences, whereas the phylogenetic analyses of the prolyl-tRNA sequences show some disagreements between methods, indicating that the donor lineage cannot be identified with a high degree of certainty. However, in both trees, the diplomonads and parabasalids branch together within the Archaea, strongly suggesting that these two groups of unicellular eukaryotes, often regarded as the two earliest independent offshoots of the eukaryotic lineage, share a common ancestor to the exclusion of the eukaryotic root. Unfortunately, the phylogenetic analyses of these two aminoacyl-tRNA synthetase genes are inconclusive regarding the position of the diplomonad/parabasalid group within the eukaryotes. Our results also show that the lineage leading to Nanoarchaeota branched off from Euryarchaeota and Crenarchaeota before the divergence of diplomonads and parabasalids, that this unexplored archaeal diversity, currently only represented by the hyperthermophilic organism Nanoarchaeum equitans, may include members living in close proximity to mesophilic eukaryotes, and that the presence of split genes in the Nanoarchaeum genome is a derived feature.

12.

Background

The influence of lateral gene transfer on gene origins and biology in eukaryotes is poorly understood compared with those of prokaryotes. A number of independent investigations focusing on specific genes, individual genomes, or specific functional categories from various eukaryotes have indicated that lateral gene transfer does indeed affect eukaryotic genomes. However, the lack of common methodology and criteria in these studies makes it difficult to assess the general importance and influence of lateral gene transfer on eukaryotic genome evolution.

Results

We used a phylogenomic approach to systematically investigate lateral gene transfer affecting the proteomes of thirteen, mainly parasitic, microbial eukaryotes, representing four of the six eukaryotic super-groups. All of the genomes investigated have been significantly affected by prokaryote-to-eukaryote lateral gene transfers, dramatically affecting the enzymes of core pathways, particularly amino acid and sugar metabolism, but also providing new genes of potential adaptive significance in the life of parasites. A broad range of prokaryotic donors is involved in such transfers, but there is clear and significant enrichment for bacterial groups that share the same habitats, including the human microbiota, as the parasites investigated.

Conclusions

Our data show that ecology and lifestyle strongly influence gene origins and opportunities for gene transfer and reveal that, although the outlines of the core eukaryotic metabolism are conserved among lineages, the genes making up those pathways can have very different origins in different eukaryotes. Thus, from the perspective of the effects of lateral gene transfer on individual gene ancestries in different lineages, eukaryotic metabolism appears to be chimeric.

13.
Adenovirus-mediated gene transfer is a promising method for studies of vascular biology and potentially for gene therapy. Intravascular approaches for gene transfer to blood vessels in vivo generally require interruption of blood flow and have several limitations. We have used two alternative approaches for gene transfer to blood vessels in vivo using perivascular application of vectors. First, replication-deficient adenovirus expressing nuclear-targeted bacterial β-galactosidase was injected into cerebrospinal fluid via the cisterna magna of rats. Leptomeningeal cells over the major arteries were efficiently transfected, and adventitial cells of large vessels and smooth muscle cells of small vessels were occasionally stained. When viral suspension was injected with the rat in a lateral position, the reporter gene was expressed extensively on the ipsilateral surface of the brain. Thus, adenovirus injected into cerebrospinal fluid provides gene transfer in vivo to cerebral blood vessels and, with greater efficiency, to perivascular tissue. Furthermore, positioning of the head may target specific regions of the brain. Second, vascular gene delivery was accomplished by perivascular injection of virus in peripheral vessels. Injection of the adenoviral vector within the periarterial sheath of monkeys resulted in gene transfer to the vessel wall that was substantial in magnitude although limited to cells in the adventitia. Approximately 20% of adventitial cells expressed the transgene, with no gene transfer to cells in the intima or media. These methods may provide alternative approaches for gene transfer to blood vessels, and may be useful for studies of vascular biology and perhaps vascular gene therapy.

14.
Placement of the mitochondrial branch on the tree of life has been problematic. Sparse sampling, the uncertainty of how lateral gene transfer might overwrite phylogenetic signals, and the uncertainty of phylogenetic inference have all contributed to the issue. Here we address this issue using a supertree approach and completed genomic sequences. We first determine that a sensible alpha-proteobacterial phylogenetic tree exists and that it can confidently be inferred using orthologous genes. We show that congruence across these orthologous gene trees is significantly better than might be expected by random chance. There is some evidence of horizontal gene transfer within the alpha-proteobacteria, but it appears to be restricted to a minority of genes (approximately 23%), most of which (approximately 74%) can be categorized as operational. This means that placement of the mitochondrion should not be excessively hampered by interspecies gene transfer. We then show that there is a consistently strong signal for placement of the mitochondrion on this tree and that this placement is relatively insensitive to methodological approach or data set. A concatenated alignment was created consisting of 15 mitochondrion-encoded proteins that are unlikely to have undergone any lateral gene transfer in the timeline under consideration. This alignment infers that the sister group of the mitochondria, for the taxa that have been sampled, is the order Rickettsiales.

15.
Phylogenetic inference from genome-wide data (phylogenomics) has revolutionized the study of evolution because it enables accounting for discordance among evolutionary histories across the genome. To this end, summary methods have been developed to allow accurate and scalable inference of species trees from gene trees. However, most of these methods, including the widely used ASTRAL, can only handle single-copy gene trees and do not attempt to model gene duplication and gene loss. As a result, most phylogenomic studies have focused on single-copy genes and have discarded large parts of the data. Here, we first propose a measure of quartet similarity between single-copy and multicopy trees that accounts for orthology and paralogy. We then introduce a method called ASTRAL-Pro (ASTRAL for PaRalogs and Orthologs) to find the species tree that optimizes our quartet similarity measure using dynamic programming. By studying its performance on an extensive collection of simulated data sets and on real data sets, we show that ASTRAL-Pro is more accurate than alternative methods.

16.
Theories of macroevolution rarely have been extended to include microbes; however, because microbes represent the most ancient and diverse assemblage of organismal diversity, such oversight limits our understanding of evolutionary history. Our analysis of phylogenetic trees for microbes suggests that macroevolution may differ between prokaryotes and both micro- and macroeukaryotes (mainly plants and animals). Phylogenetic trees inferred for prokaryotes and some microbial eukaryotes conformed to expectations assuming a constant rate of cladogenesis over time and among lineages; nevertheless, microbial eukaryote trees exhibited more variation in rates of cladogenesis than prokaryote trees. We hypothesize that the contrast of macroevolutionary dynamics between prokaryotes and many eukaryotes is due, at least in part, to differences in the prevalence of lateral gene transfer (LGT) between the two groups. Inheritance is predominantly, if not wholly, vertical within eukaryotes, a feature that allows for the emergence and maintenance of heritable variation among lineages. By contrast, frequent LGT in prokaryotes may ameliorate heritable variation in rate of cladogenesis resulting from the emergence of key innovations; thus, the inferred difference in macroevolution might reflect exclusivity of key innovations in eukaryotes and their promiscuous nature in prokaryotes.

17.
Evolutionary processes such as hybridisation, lateral gene transfer, and recombination are all key factors in shaping the structure of genes and genomes. However, since such processes are not always best represented by trees, there is now considerable interest in using more general networks instead. For example, in recent studies it has been shown that networks can be used to provide lower bounds on the number of recombination events and also for the number of lateral gene transfers that took place in the evolutionary history of a set of molecular sequences. In this paper we describe the theoretical performance of some related bounds that result when merging pairs of trees into networks.

18.
19.
All characters and trait systems in an organism share a common evolutionary history that can be estimated using phylogenetic methods. However, differential rates of change and the evolutionary mechanisms driving those rates result in pervasive phylogenetic conflict. These drivers need to be uncovered because mismatches between evolutionary processes and phylogenetic models can lead to high confidence in incorrect hypotheses. Incongruence between phylogenies derived from morphological versus molecular analyses, and between trees based on different subsets of molecular sequences has become pervasive as datasets have expanded rapidly in both characters and species. For more than a decade, evolutionary relationships among members of the New World bat family Phyllostomidae inferred from morphological and molecular data have been in conflict. Here, we develop and apply methods to minimize systematic biases, uncover the biological mechanisms underlying phylogenetic conflict, and outline data requirements for future phylogenomic and morphological data collection. We introduce new morphological data for phyllostomids and outgroups and expand previous molecular analyses to eliminate methodological sources of phylogenetic conflict such as taxonomic sampling, sparse character sampling, or use of different algorithms to estimate the phylogeny. We also evaluate the impact of biological sources of conflict: saturation in morphological changes and molecular substitutions, and other processes that result in incongruent trees, including convergent morphological and molecular evolution. Methodological sources of incongruence play some role in generating phylogenetic conflict, and are relatively easy to eliminate by matching taxa, collecting more characters, and applying the same algorithms to optimize phylogeny. 
The evolutionary patterns uncovered are consistent with multiple biological sources of conflict, including saturation in morphological and molecular changes, adaptive morphological convergence among nectar‐feeding lineages, and incongruent gene trees. Applying methods to account for nucleotide sequence saturation reduces, but does not completely eliminate, phylogenetic conflict. We ruled out paralogy, lateral gene transfer, and poor taxon sampling and outgroup choices among the processes leading to incongruent gene trees in phyllostomid bats. Uncovering and countering the possible effects of introgression and lineage sorting of ancestral polymorphism on gene trees will require great leaps in genomic and allelic sequencing in this species‐rich mammalian family. We also found evidence for adaptive molecular evolution leading to convergence in mitochondrial proteins among nectar‐feeding lineages. In conclusion, the biological processes that generate phylogenetic conflict are ubiquitous, and overcoming incongruence requires better models and more data than have been collected even in well‐studied organisms such as phyllostomid bats.

20.
The kinesin superfamily across eukaryotes was used to examine how incorporation of gap characters scored from conserved regions shared by all members of a gene family and incorporation of amino acid and gap characters scored from lineage‐specific regions affect gene‐tree inference of the gene family as a whole. We addressed these two questions in the context of two different densities of sequence sampling, four alignment programs, and two methods of tree construction. Taken together, our findings suggest the following. First, gap characters should be incorporated into gene‐tree inference, even for divergent sequences. Second, gene regions that are not conserved among all or most sequences sampled should not be automatically discarded without evaluation of potential phylogenetic signal that may be contained in gap and/or sequence characters. Third, among the four alignment programs evaluated using their default alignment parameters, Clustal may be expected to output alignments that result in the greatest gene‐tree resolution and support. Yet, this high resolution and support should be regarded as optimistic, rather than conservative, estimates. Fourth, this same conclusion regarding resolution and support holds for Bayesian gene‐tree analyses relative to parsimony‐jackknife gene‐tree analyses. We suggest that a more conservative approach, such as aligning the sequences using DIALIGN‐T or MAFFT, analyzing the appropriate characters using parsimony, and assessing branch support using the jackknife, is more appropriate for inferring gene trees of divergent gene families. © The Willi Hennig Society 2007.
