
1.

Assessing the performance of single-copy genes for recovering robust phylogenies

Aguileta G, Marthey S, Chiapello H, Lebrun MH, Rodolphe F, Fournier E, Gendrault-Jacquemard A, Giraud T. Systematic biology, 2008, 57(4):613-627

Phylogenies involving nonmodel species are based on a few genes, mostly chosen following historical or practical criteria. Because gene trees are sometimes incongruent with species trees, the resulting phylogenies may not accurately reflect the evolutionary relationships among species. The increase in availability of genome sequences now provides large numbers of genes that could be used for building phylogenies. However, for practical reasons only a few genes can be sequenced for a wide range of species. Here we asked whether we can identify a few genes, among the single-copy genes common to most fungal genomes, that are sufficient for recovering accurate and well-supported phylogenies. Fungi represent a model group for phylogenomics because many complete fungal genomes are available. An automated procedure was developed to extract single-copy orthologous genes from complete fungal genomes using a Markov Clustering Algorithm (Tribe-MCL). Using 21 complete, publicly available fungal genomes with reliable protein predictions, 246 single-copy orthologous gene clusters were identified. We inferred the maximum likelihood trees using the individual orthologous sequences and constructed a reference tree from concatenated protein alignments. The topologies of the individual gene trees were compared to that of the reference tree using three different methods. The performance of individual genes in recovering the reference tree was highly variable. Gene size and the number of variable sites were highly correlated and significantly affected the performance of the genes, but the average substitution rate did not. Two genes recovered exactly the same topology as the reference tree, and when concatenated provided high bootstrap values. The genes typically used for fungal phylogenies did not perform well, which suggests that current fungal phylogenies based on these genes may not accurately reflect the evolutionary relationships among species. 
Analyses on subsets of species showed that the phylogenetic performance did not seem to depend strongly on the sample. We expect that the best-performing genes identified here will be very useful for phylogenetic studies of fungi, at least at a large taxonomic scale. Furthermore, we compare the method developed here for finding genes for building robust phylogenies with previous ones and we advocate that our method could be applied to other groups of organisms when more complete genomes are available.
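Tribe-MCL groups proteins by running the Markov Clustering Algorithm (MCL) on a similarity graph. Below is a minimal sketch of the MCL core only, assuming a small symmetric 0/1 adjacency matrix in place of real sequence-similarity scores. It adds self-loops, makes the matrix column-stochastic, then alternates expansion (matrix squaring) and inflation (element-wise powering plus renormalization) until the matrix stops changing. The helper names and the inflation value of 2.0 are illustrative choices, not the authors' pipeline, and the single-copy filtering step is not shown.

```python
def normalize_columns(m):
    """Rescale each column of a square matrix (list of lists) to sum to 1."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain square matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl_clusters(adjacency, inflation=2.0, max_iter=100, eps=1e-9):
    """Markov Clustering of a small symmetric similarity matrix.

    Returns clusters as a sorted list of tuples of node indices.
    """
    n = len(adjacency)
    # Add self-loops, then make every column a probability distribution.
    m = normalize_columns([[float(adjacency[i][j]) + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: spread flow along paths of length 2
        # inflation: powering favours strong flows, then renormalize columns
        new = normalize_columns([[x ** inflation for x in row] for row in expanded])
        converged = all(abs(new[i][j] - m[i][j]) < eps
                        for i in range(n) for j in range(n))
        m = new
        if converged:
            break
    # Rows that keep non-zero entries are attractors; their non-zero columns form a cluster.
    clusters = set()
    for row in m:
        members = tuple(j for j, x in enumerate(row) if x > eps)
        if members:
            clusters.add(members)
    return sorted(clusters)
```

With weighted similarities in `adjacency`, raising `inflation` generally yields more, smaller clusters.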

2.

Estimating species phylogeny from gene-tree probabilities despite incomplete lineage sorting: an example from Melanoplus grasshoppers (Total citations: 2; self-citations: 0; citations by others: 2)

Carstens BC, Knowles LL. Systematic biology, 2007, 56(3):400-411

Estimating phylogenetic relationships among closely related species can be extremely difficult when there is incongruence among gene trees and between the gene trees and the species tree. Here we show that incorporating a model of the stochastic loss of gene lineages by genetic drift into the phylogenetic estimation procedure can provide a robust estimate of species relationships, despite widespread incomplete sorting of ancestral polymorphism. This approach is applied to a group of montane Melanoplus grasshoppers for which genealogical discordance among loci and incomplete lineage sorting obscures any obvious phylogenetic relationships among species. Unlike traditional treatments where gene trees estimated using standard phylogenetic methods are implicitly equated with the species tree, with the coalescent-based approach the species tree is modeled probabilistically from the estimated gene trees. The estimated species phylogeny (the ESP) is calculated for the grasshoppers from multiple gene trees reconstructed for nuclear loci and a mitochondrial gene. This empirical application is coupled with a simulation study to explore the performance of the coalescent-based approach. Specifically, we test the accuracy of the ESP given the data based on analyses of simulated data matching the multilocus data collected in Melanoplus (i.e., data were simulated for each locus with the same number of base pairs and locus-specific mutational models). The results of the study show that ESPs can be computed using the coalescent-based approach long before reciprocal monophyly has been achieved, and that these statistical estimates are accurate. This contrasts with analyses of the empirical data collected in Melanoplus and simulated data based on concatenation of multiple loci, for which the incomplete lineage sorting of recently diverged species posed significant problems. 
The strengths and potential challenges associated with incorporating an explicit model of gene-lineage coalescence into the phylogenetic procedure to obtain an ESP, as illustrated by application to Melanoplus, versus concatenation and consensus approaches are discussed. This study represents a fundamental shift in how species relationships are estimated: the relationship between the gene trees and the species phylogeny is modeled probabilistically rather than equating gene trees with a species tree.
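Coalescent-based approaches like this one depend on the probability of a gene tree given a candidate species tree. For three species the multispecies coalescent gives a classic closed form. If the species tree is ((X,Y),Z) with internal branch length t in coalescent units, the concordant gene tree has probability 1 - (2/3)e^(-t) and each of the two discordant trees has (1/3)e^(-t). The sketch covers only this three-taxon case and scores candidate species trees against hypothetical counts of estimated gene-tree topologies. It is not the authors' software, which handles more taxa.

```python
import math

# Rooted three-taxon gene tree topologies, named by the pair that coalesces first.
TOPOLOGIES = ("AB", "AC", "BC")


def gene_tree_probabilities(species_pair, t):
    """Probability of each gene tree topology given species tree ((X,Y),Z).

    species_pair: the sister pair in the species tree, e.g. "AB".
    t: internal branch length in coalescent units.
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    miss = math.exp(-t)  # chance the sister lineages fail to coalesce on the internal branch
    return {topo: (1 - 2 * miss / 3) if topo == species_pair else miss / 3
            for topo in TOPOLOGIES}


def log_likelihood(counts, species_pair, t):
    """Log-likelihood of a species tree given counts of estimated gene tree topologies."""
    probs = gene_tree_probabilities(species_pair, t)
    return sum(n * math.log(probs[topo]) for topo, n in counts.items())
```

For example, with counts `{"AB": 60, "AC": 22, "BC": 18}` and t = 1.0, the species tree ((A,B),C) has the highest log-likelihood, even though 40% of the gene trees disagree with it.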

3.

A method for molecular phylogeny construction by direct use of nucleotide sequence data

Yoshio Tateno. Journal of molecular evolution, 1990, 30(1):85-93

Summary. A method for molecular phylogeny construction is newly developed. The method, called the stepwise ancestral sequence method, estimates molecular phylogenetic trees and ancestral sequences simultaneously on the basis of parsimony and sequence homology. For simplicity the emphasis is placed more on parsimony than on sequence homology in the present study, though both are certainly important. Because parsimony alone will sometimes generate multiple candidate trees, the method retains not one but five candidates from which one can then single out the final tree taking other criteria into account. The properties and performance of the method are then examined by simulating an evolving gene along a model phylogenetic tree. The estimated trees are found to lie in a narrow range of the parsimony criteria used in the present study. Thus, other criteria such as biological evidence and likelihood are necessary to single out the correct tree among them, with biological evidence taking precedence over any other criterion. The computer simulation also reveals that the method satisfactorily estimates both tree topology and ancestral sequences, at least for the evolutionary model used in the present study.
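The stepwise ancestral sequence method itself is not reproduced here. As a reference point for the parsimony criterion it relies on, the sketch below implements Fitch's classic small-parsimony algorithm instead. It scores a fixed rooted binary tree site by site and returns the candidate state set at the root. The nested-tuple tree format and the toy sequences are assumptions for illustration.

```python
def fitch(tree, states):
    """Fitch small-parsimony pass for one site on a rooted binary tree.

    tree: nested 2-tuples of leaf names; states: leaf name -> character state.
    Returns (candidate state set at this node, minimum number of changes below it).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: take the union and charge one change.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, sequences):
    """Sum of Fitch costs over all aligned sites; sequences: leaf name -> aligned string."""
    length = len(next(iter(sequences.values())))
    return sum(fitch(tree, {name: seq[i] for name, seq in sequences.items()})[1]
               for i in range(length))
```

Comparing `parsimony_score` across candidate trees and keeping the lowest-scoring ones mirrors the idea of retaining several near-optimal candidates before applying other criteria.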

4.

Impact of deep coalescence on the reliability of species tree inference from different types of DNA markers in mammals

Sánchez-Gracia A, Castresana J. PloS one, 2012, 7(1):e30239

An important challenge for phylogenetic studies of closely related species is the existence of deep coalescence and gene tree heterogeneity. However, their effects can vary between species and they are often neglected in phylogenetic analyses. In addition, a practical problem in the reconstruction of shallow phylogenies is to determine the most efficient set of DNA markers for a reliable estimation. To address these questions, we conducted a multilocus simulation study using empirical values of nucleotide diversity and substitution rates obtained from a wide range of mammals and evaluated the performance of both gene tree and species tree approaches to recover the known speciation times and topological relationships. We first show that deep coalescence can be a serious problem, more than usually assumed, for the estimation of speciation times in mammals using traditional gene trees. Furthermore, we tested the performance of different sets of DNA markers in the determination of species trees using a coalescent approach. Although the best estimates of speciation times were obtained, as expected, with the use of an increasing number of nuclear loci, our results show that similar estimations can be obtained with a much lower number of genes and the incorporation of a mitochondrial marker, with its high information content. Thus, the use of the combined information of both nuclear and mitochondrial markers in a species tree framework is the most efficient option to estimate recent speciation times and, consequently, the underlying species tree.
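A back-of-the-envelope calculation shows why traditional gene trees overestimate speciation times. Under a standard neutral model (assumed here, not taken from the paper), two gene copies sampled from sister species cannot coalesce before the split. They then take on average 2Ne generations to coalesce in the ancestral population. The expected per-site divergence is therefore 2μτ + θ_anc, with θ_anc = 4Neμ. Dividing that divergence by 2μ, as a gene-tree-only dating implicitly does, overshoots τ by 2Ne generations. All parameter values in the usage note are hypothetical.

```python
def expected_divergence(tau_generations, mu, theta_ancestral):
    """Expected per-site divergence between one gene copy from each of two sister species.

    tau_generations: speciation time in generations
    mu: mutation rate per site per generation
    theta_ancestral: 4 * Ne * mu for the ancestral population
    """
    return 2 * mu * tau_generations + theta_ancestral


def naive_split_time(divergence, mu):
    """Split time implied by treating gene divergence as species divergence."""
    return divergence / (2 * mu)
```

With μ = 1e-8, an ancestral Ne of 100,000 (θ = 0.004) and τ = 100,000 generations, the expected divergence is 0.006. The naive estimate is then 300,000 generations, three times the true value, with the excess being exactly 2Ne.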

5.

Reconciling gene and genome duplication events: using multiple nuclear gene families to infer the phylogeny of the aquatic plant family Pontederiaceae

Ness RW, Graham SW, Barrett SC. Molecular biology and evolution, 2011, 28(11):3009-3018

Most plant phylogenetic inference has used DNA sequence data from the plastid genome. This genome represents a single genealogical sample with no recombination among genes, potentially limiting the resolution of evolutionary relationships in some contexts. In contrast, nuclear DNA is inherently more difficult to employ for phylogeny reconstruction because major mutational events in the genome, including polyploidization, gene duplication, and gene extinction can result in homologous gene copies that are difficult to identify as orthologs or paralogs. Gene tree parsimony (GTP) can be used to infer the rooted species tree by fitting gene genealogies to species trees while simultaneously minimizing the estimated number of duplications needed to reconcile conflicts among them. Here, we use GTP for five nuclear gene families and a previously published plastid data set to reconstruct the phylogenetic backbone of the aquatic plant family Pontederiaceae. Plastid-based phylogenetic studies strongly supported extensive paraphyly of Eichhornia (one of the four major genera) but also depicted considerable ambiguity concerning the true root placement for the family. Our results indicate that species trees inferred from the nuclear genes (alone and in combination with the plastid data) are highly congruent with gene trees inferred from plastid data alone. Consideration of optimal and suboptimal gene tree reconciliations places the root of the family at (or near) a branch leading to the rare and locally restricted E. meyeri. We also explore methods to incorporate uncertainty in individual gene trees during reconciliation by considering their individual bootstrap profiles and relate inferred excesses of gene duplication events on individual branches to whole-genome duplication events inferred for the same branches. Our study improves understanding of the phylogenetic history of Pontederiaceae and also demonstrates the utility of GTP for phylogenetic analysis.
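Gene tree parsimony scores a candidate species tree by reconciling each gene tree with it. One standard way to count duplications is LCA mapping. Each gene-tree node is mapped to the smallest species-tree clade containing all of its species, and a node counts as a duplication when it maps to the same clade as one of its children. The minimal sketch below assumes rooted binary trees written as nested tuples, gene-tree leaves labelled by the species they come from, and a toy list of candidate species trees. It counts duplications only, ignoring losses and the bootstrap-profile extension described in the abstract.

```python
def leaf_set(tree):
    """Leaf labels under a node of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(c) for c in tree))


def species_clusters(species_tree):
    """Leaf sets of every node (leaves included) of the species tree."""
    clusters = [leaf_set(species_tree)]
    if not isinstance(species_tree, str):
        for child in species_tree:
            clusters.extend(species_clusters(child))
    return clusters


def lca_cluster(species, clusters):
    """Smallest species-tree clade containing all given species (the LCA mapping)."""
    return min((c for c in clusters if species <= c), key=len)


def count_duplications(gene_tree, species_tree):
    """Number of duplication nodes implied by reconciling gene_tree with species_tree."""
    clusters = species_clusters(species_tree)

    def walk(g):
        # Returns (clade this gene node maps to, duplications in the subtree rooted at g).
        if isinstance(g, str):
            return lca_cluster(frozenset([g]), clusters), 0
        (m1, d1), (m2, d2) = walk(g[0]), walk(g[1])
        m = lca_cluster(m1 | m2, clusters)
        is_dup = m == m1 or m == m2
        return m, d1 + d2 + int(is_dup)

    return walk(gene_tree)[1]


def best_species_tree(gene_trees, candidates):
    """Candidate species tree needing the fewest duplications summed over gene trees."""
    return min(candidates,
               key=lambda s: sum(count_duplications(g, s) for g in gene_trees))
```

A real GTP search explores tree space heuristically rather than scoring a short fixed list of candidates.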

6.

Multilocus phylogenetics of a rapid radiation in the genus Thomomys (Rodentia: Geomyidae)

Belfiore NM, Liu L, Moritz C. Systematic biology, 2008, 57(2):294-310

Species complexes undergoing rapid radiation present a challenge in molecular systematics because of the possibility that ancestral polymorphism is retained in component gene trees. Coalescent theory has demonstrated that gene trees often fail to match lineage trees when taxon divergence times are less than the ancestral effective population sizes. Suggestions to increase the number of loci and the number of individuals per taxon have been proposed; however, phylogenetic methods to adequately analyze these data in a coalescent framework are scarce. We compare two approaches to estimating lineage (species) trees using multiple individuals and multiple loci: the commonly used partitioned Bayesian analysis of concatenated sequences and a modification of a newly developed hierarchical Bayesian method (BEST) that simultaneously estimates gene trees and species trees from multilocus data. We test these approaches on a phylogeny of rapidly radiating species wherein divergence times are likely to be smaller than effective population sizes, and incomplete lineage sorting is known, in the rodent genus, Thomomys. We use seven independent noncoding nuclear sequence loci (total approximately 4300 bp) and between 1 and 12 individuals per taxon to construct a phylogenetic hypothesis for eight Thomomys species. The majority-rule consensus tree from the partitioned concatenated analysis included 14 strongly supported bipartitions, corroborating monophyletic species status of five of the eight named species. The BEST tree strongly supported only the split between the two subgenera and showed very low support for any other clade. Comparison of both lineage trees to individual gene trees revealed that the concatenation method appears to ignore conflicting signals among gene trees, whereas the BEST tree considers conflicting signals and downweights support for those nodes. 
Bayes factor analysis of posterior tree distributions from both analyses strongly favors the model underlying the BEST analysis. This comparison underscores the risks of overreliance on results from concatenation, and ignoring the properties of coalescence, especially in cases of recent, rapid radiations.
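Both analyses summarize a posterior sample of trees. The proportion of sampled trees containing a clade is that clade's posterior support, and clades present in more than half of the sample make up the majority-rule consensus. The minimal sketch below computes these frequencies for rooted trees written as nested tuples (an assumed representation). It does not assemble the consensus tree itself and does not handle unrooted bipartitions.

```python
from collections import Counter


def clades(tree):
    """Non-trivial clades (frozensets of leaf names) of a rooted nested-tuple tree."""
    found = set()

    def walk(t):
        if isinstance(t, str):
            return frozenset([t])
        s = frozenset().union(*(walk(c) for c in t))
        found.add(s)
        return s

    all_leaves = walk(tree)
    found.discard(all_leaves)  # the root clade is present in every tree
    return found


def clade_frequencies(trees):
    """Fraction of trees in the sample that contain each clade."""
    counts = Counter()
    for t in trees:
        counts.update(clades(t))
    return {c: k / len(trees) for c, k in counts.items()}


def majority_rule_clades(trees):
    """Clades found in more than half of the sampled trees."""
    return {c: f for c, f in clade_frequencies(trees).items() if f > 0.5}
```

A high frequency means the clade is consistently sampled under the model. As the abstract notes, it can be high under concatenation even when gene trees conflict.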

7.

Species trees from gene trees: reconstructing Bayesian posterior distributions of a species phylogeny using estimated gene tree distributions (Total citations: 5; self-citations: 0; citations by others: 5)

Liu L, Pearl DK. Systematic biology, 2007, 56(3):504-514

The desire to infer the evolutionary history of a group of species should be more viable now that a considerable amount of multilocus molecular data is available. However, the current molecular phylogenetic paradigm still reconstructs gene trees to represent the species tree. Further, commonly used methods of combining data, such as the concatenation method, are known to be inconsistent in some circumstances. In this paper, we propose a Bayesian hierarchical model to estimate the phylogeny of a group of species using multiple estimated gene tree distributions, such as those that arise in a Bayesian analysis of DNA sequence data. Our model employs substitution models used in traditional phylogenetics but also uses coalescent theory to explain genealogical signals from species trees to gene trees and from gene trees to sequence data, thereby forming a complete stochastic model to estimate gene trees, species trees, ancestral population sizes, and species divergence times simultaneously. Our model is founded on the assumption that gene trees, even of unlinked loci, are correlated due to being derived from a single species tree and therefore should be estimated jointly. We apply the method to two multilocus data sets of DNA sequences. The estimates of the species tree topology and divergence times appear to be robust to the prior of the population size, whereas the estimates of effective population sizes are sensitive to the prior used in the analysis. These analyses also suggest that the model is superior to the concatenation method in fitting these data sets and thus provides a more realistic assessment of the variability in the distribution of the species tree that may have produced the molecular information at hand. Future improvements of our model and algorithm should include consideration of other factors that can cause discordance of gene trees and species trees, such as horizontal transfer or gene duplication.

8.

The accuracy of species tree estimation under simulation: a comparison of methods

Leaché AD, Rannala B. Systematic biology, 2011, 60(2):126-137

Numerous simulation studies have investigated the accuracy of phylogenetic inference of gene trees under maximum parsimony, maximum likelihood, and Bayesian techniques. The relative accuracy of species tree inference methods under simulation has received less study. The number of analytical techniques available for inferring species trees is increasing rapidly, and in this paper, we compare the performance of several species tree inference techniques at estimating recent species divergences using computer simulation. Simulating gene trees within species trees of different shapes and with varying tree lengths (T) and population sizes (θ), and evolving sequences on those gene trees, allows us to determine how phylogenetic accuracy changes in relation to different levels of deep coalescence and phylogenetic signal. When the probability of discordance between the gene trees and the species tree is high (i.e., T is small and/or θ is large), Bayesian species tree inference using the multispecies coalescent (BEST) outperforms other methods. The performance of all methods improves as the total length of the species tree is increased, which reflects the combined benefits of decreasing the probability of discordance between species trees and gene trees and gaining more accurate estimates for gene trees. Decreasing the probability of deep coalescences by reducing θ also leads to accuracy gains for most methods. Increasing the number of loci from 10 to 100 improves accuracy under difficult demographic scenarios (i.e., coalescent units ≤ 4Ne), but 10 loci are adequate for estimating the correct species tree in cases where deep coalescence is limited or absent. In general, the correlation between the phylogenetic accuracy and the posterior probability values obtained from BEST is high, although posterior probabilities are overestimated when the prior distribution for θ is misspecified.
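The simplest version of "simulating gene trees within species trees" uses one lineage from each of A, B and C, with species tree ((A,B),C). Lineages a and b first coalesce at rate 1 per coalescent unit along the internal branch of length t. If they have not coalesced by the root, the three lineages enter one population where, by memorylessness, each pair is equally likely to coalesce first. The sketch below uses that process; the function names, seed and parameter values are illustrative. One common parameterization, assumed here, measures divergence time τ and θ = 4Neμ both in expected mutations per site. Under it the branch length in coalescent units is 2τ/θ, so a small T or a large θ both shorten the branch and raise discordance.

```python
import math
import random


def coalescent_units(tau, theta):
    """Branch length in coalescent units from tau and theta, both in mutations per site."""
    return 2 * tau / theta


def simulate_gene_trees(t, n_loci, seed=0):
    """Count gene tree topologies for species tree ((A,B),C) with internal branch t."""
    rng = random.Random(seed)
    counts = {"AB": 0, "AC": 0, "BC": 0}
    for _ in range(n_loci):
        if rng.expovariate(1.0) < t:
            counts["AB"] += 1  # a and b coalesce before the root population
        else:
            # Three lineages in the root population: each pair equally likely first.
            counts[rng.choice(("AB", "AC", "BC"))] += 1
    return counts


def expected_concordance(t):
    """Closed-form probability that the gene tree matches the species tree."""
    return 1 - 2 * math.exp(-t) / 3
```

Comparing `simulate_gene_trees(t, n)["AB"] / n` with `expected_concordance(t)` for a few values of t gives a quick check that the simulator reproduces the expected level of deep coalescence.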

9.

A polynomial time algorithm for calculating the probability of a ranked gene tree given a species tree

Stadler T, Degnan JH. Algorithms for molecular biology: AMB, 2012, 7(1):7

ABSTRACT: BACKGROUND: The ancestries of genes form gene trees which do not necessarily have the same topology as the species tree due to incomplete lineage sorting. Available algorithms determining the probability of a gene tree given a species tree require exponential computational runtime. RESULTS: In this paper, we provide a polynomial time algorithm to calculate the probability of a ranked gene tree topology for a given species tree, where a ranked tree topology is a tree topology with the internal vertices being ordered. The probability of a gene tree topology can thus be calculated in polynomial time if the number of orderings of the internal vertices is a polynomial number. However, the complexity of calculating the probability of a gene tree topology with an exponential number of rankings for a given species tree remains unknown. CONCLUSIONS: Polynomial algorithms for calculating ranked gene tree probabilities may become useful in developing methodology to infer species trees based on a collection of gene trees, leading to a more accurate reconstruction of ancestral species relationships.
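The number of rankings of a topology determines whether summing ranked probabilities stays polynomial. For a rooted binary topology with n - 1 internal nodes, the count is (n - 1)! divided by the product, over internal nodes v, of the number of internal nodes in the subtree rooted at v. The sketch below computes this count for trees written as nested tuples (an assumed representation). It shows that a caterpillar tree has a single ranking, while balanced trees have many.

```python
from math import factorial


def count_rankings(tree):
    """Number of orderings of internal nodes consistent with ancestry (rooted binary tree)."""
    product = 1

    def internal_count(t):
        # Returns the number of internal nodes in the subtree rooted at t.
        nonlocal product
        if isinstance(t, str):
            return 0
        k = 1 + internal_count(t[0]) + internal_count(t[1])
        product *= k
        return k

    n_internal = internal_count(tree)
    return factorial(n_internal) // product
```

The fully balanced eight-taxon tree already has 80 rankings, and the count grows faster than any polynomial for larger balanced trees.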

10.

The Use (and Misuse) of Phylogenetic Trees in Comparative Behavioral Analyses

Luca Pozzi, Christina M. Bergey, Andrew S. Burrell. International journal of primatology, 2014, 35(1):32-54

Phylogenetic comparative methods play a critical role in our understanding of the adaptive origin of primate behaviors. To incorporate evolutionary history directly into comparative behavioral research, behavioral ecologists rely on strong, well-resolved phylogenetic trees. Phylogenies provide the framework on which behaviors can be compared and homologies can be distinguished from similarities due to convergent or parallel evolution. Phylogenetic reconstructions are also of critical importance when inferring the ancestral state of behavioral patterns and when suggesting the evolutionary changes that behavior has undergone. Improvements in genome sequencing technologies have increased the amount of data available to researchers. Recently, several primate phylogenetic studies have used multiple loci to produce robust phylogenetic trees that include hundreds of primate species. These trees are now commonly used in comparative analyses and there is a perception that we have a complete picture of the primate tree. But how confident can we be in those phylogenies? And how reliable are comparative analyses based on such trees? Herein, we argue that even recent molecular phylogenies should be treated cautiously because they rely on many assumptions and have many shortcomings. Most phylogenetic studies do not model gene tree diversity and can produce misleading results, such as strong support for an incorrect species tree, especially in the case of rapid and recent radiations. We discuss implications that incorrect phylogenies can have for reconstructing the evolution of primate behaviors and we urge primatologists to be aware of the current limitations of phylogenetic reconstructions when applying phylogenetic comparative methods.

11.

Bayes estimation of species divergence times and ancestral population sizes using DNA sequences from multiple loci (Total citations: 11; self-citations: 0; citations by others: 11)

Rannala B, Yang Z. Genetics, 2003, 164(4):1645-1656

The effective population sizes of ancestral as well as modern species are important parameters in models of population genetics and human evolution. The commonly used method for estimating ancestral population sizes, based on counting mismatches between the species tree and the inferred gene trees, is highly biased as it ignores uncertainties in gene tree reconstruction. In this article, we develop a Bayes method for simultaneous estimation of the species divergence times and current and ancestral population sizes. The method uses DNA sequence data from multiple loci and extracts information about conflicts among gene tree topologies and coalescent times to estimate ancestral population sizes. The topology of the species tree is assumed known. A Markov chain Monte Carlo algorithm is implemented to integrate over uncertain gene trees and branch lengths (or coalescence times) at each locus as well as species divergence times. The method can handle any species tree and allows different numbers of sequences at different loci. We apply the method to published noncoding DNA sequences from the human and the great apes. There are strong correlations between posterior estimates of speciation times and ancestral population sizes. With the use of an informative prior for the human-chimpanzee divergence date, the population size of the common ancestor of the two species is estimated to be approximately 20,000, with a 95% credibility interval (8000, 40,000). Our estimates, however, are affected by model assumptions as well as data quality. We suggest that reliable estimates have yet to await more data and more realistic models.

12.

Phylogenetic trees based on gene content (Total citations: 2; self-citations: 0; citations by others: 2)

Huson DH, Steel M. Bioinformatics (Oxford, England), 2004, 20(13):2044-2049

Comparing gene content between species can be a useful approach for reconstructing phylogenetic trees. In this paper, we derive a maximum-likelihood estimation of evolutionary distance between species under a simple model of gene genesis and gene loss. Using simulated data on a biological tree with 107 taxa (and on a number of randomly generated trees), we compare the accuracy of tree reconstruction using this ML distance measure to an earlier ad hoc distance. We then compare these distance-based approaches to a character-based tree reconstruction method (Dollo parsimony) which seems well suited to the analysis of gene content data. To simplify simulations, we give a formal proof of the well-known 'fact' that the Dollo parsimony score is independent of the choice of root. Our results show a consistent trend, with the character-based method and ML distance measure outperforming the earlier ad hoc distance method. AVAILABILITY: http://www.ab.informatik.uni-tuebingen.de/software/genecontent/welcome_en.html
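Under Dollo parsimony each gene is gained exactly once, and only losses vary between trees. The minimal sketch below places the single gain at the lowest common ancestor of the taxa that carry the gene. It then charges one loss on the edge above every maximal subtree, within that ancestor, whose taxa all lack the gene. It assumes rooted binary trees as nested tuples and counts losses only. The gene names and presence sets in the tests are hypothetical, and the paper's ML distance is not implemented.

```python
def leaf_set(t):
    """Leaf labels under a node of a nested-tuple tree."""
    if isinstance(t, str):
        return frozenset([t])
    return leaf_set(t[0]) | leaf_set(t[1])


def dollo_losses(tree, present):
    """Minimum number of losses for one gene present in the given taxa."""
    present = frozenset(present)
    if not present:
        return 0
    node = tree
    # Walk down to the LCA of the taxa carrying the gene: the single gain happens there.
    while not isinstance(node, str):
        inside = [c for c in node if present <= leaf_set(c)]
        if not inside:
            break
        node = inside[0]

    def losses(t):
        if not (leaf_set(t) & present):
            return 1  # one loss on the edge above a maximal gene-free subtree
        if isinstance(t, str):
            return 0
        return losses(t[0]) + losses(t[1])

    return losses(node)


def dollo_score(tree, gene_content):
    """Total losses over all genes; gene_content maps gene -> set of taxa carrying it."""
    return sum(dollo_losses(tree, taxa) for taxa in gene_content.values())
```

Choosing, among candidate topologies, the one with the lowest `dollo_score` gives the character-based alternative to the distance methods compared in the paper.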

13.

Accurate Phylogenetic Tree Reconstruction from Quartets: A Heuristic Approach

Rezwana Reaz, Md. Shamsuzzoha Bayzid, M. Sohel Rahman. PloS one, 2014, 9(8)

Supertree methods construct trees on a set of taxa (species) combining many smaller trees on the overlapping subsets of the entire set of taxa. A ‘quartet’ is an unrooted tree over four taxa, hence the quartet-based supertree methods combine many 4-taxon unrooted trees into a single and coherent tree over the complete set of taxa. Quartet-based phylogeny reconstruction methods have been receiving considerable attention in recent years. An accurate and efficient quartet-based method might be competitive with the current best phylogenetic tree reconstruction methods (such as maximum likelihood or Bayesian MCMC analyses), without being as computationally intensive. In this paper, we present a novel and highly accurate quartet-based phylogenetic tree reconstruction method. We performed an extensive experimental study to evaluate the accuracy and scalability of our approach on both simulated and biological datasets.
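Every binary tree on n taxa displays exactly one resolved quartet ab|cd for each set of four taxa. A common objective for combining quartets, assumed here rather than taken from the paper's heuristic, is to find the tree that satisfies as many input quartets as possible. The minimal sketch below lists the quartets a tree displays and scores a tree against a quartet set. It writes trees as rooted nested tuples and ignores the root position, since any clade, or its complement, that meets a four-taxon set in exactly two taxa separates that quartet.

```python
from itertools import combinations


def clades_and_taxa(tree):
    """Internal clades (frozensets) of a nested-tuple tree, plus the full taxon set."""
    found = set()

    def walk(t):
        if isinstance(t, str):
            return frozenset([t])
        s = frozenset().union(*(walk(c) for c in t))
        found.add(s)
        return s

    taxa = walk(tree)
    found.discard(taxa)
    return found, taxa


def induced_quartets(tree):
    """Resolved quartets {{a,b},{c,d}} displayed by the tree."""
    clades, taxa = clades_and_taxa(tree)
    quartets = set()
    for four in combinations(sorted(taxa), 4):
        four = frozenset(four)
        for c in clades:
            side = c & four
            if len(side) == 2:  # this split separates the quartet into two pairs
                quartets.add(frozenset([side, four - side]))
                break
    return quartets


def quartet_consistency(tree, quartets):
    """Fraction of input quartets displayed by the tree."""
    shown = induced_quartets(tree)
    return sum(q in shown for q in quartets) / len(quartets)
```

Enumerating all four-taxon subsets is O(n^4), which is one reason practical quartet methods rely on heuristics.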

14.

Compositional Bias May Affect Both DNA-Based and Protein-Based Phylogenetic Reconstructions

Foster PG, Hickey DA. Journal of molecular evolution, 1999, 48(3):284-290

It is now well-established that compositional bias in DNA sequences can adversely affect phylogenetic analysis based on those sequences. Phylogenetic analyses based on protein sequences are generally considered to be more reliable than those derived from the corresponding DNA sequences because it is believed that the use of encoded protein sequences circumvents the problems caused by nucleotide compositional biases in the DNA sequences. There exists, however, a correlation between AT/GC bias at the nucleotide level and content of AT- and GC-rich codons and their corresponding amino acids. Consequently, protein sequences can also be affected secondarily by nucleotide compositional bias. Here, we report that DNA bias not only may affect phylogenetic analysis based on DNA sequences, but also drives a protein bias which may affect analyses based on protein sequences. We present a striking example where common phylogenetic tools fail to recover the correct tree from complete animal mitochondrial protein-coding sequences. The data set is very extensive, containing several thousand sites per sequence, and the incorrect phylogenetic trees are statistically very well supported. Additionally, neither the use of the LogDet/paralinear transform nor removal of positions in the protein alignment with AT- or GC-rich codons allowed recovery of the correct tree. Two taxa with a large compositional bias continually group together in these analyses, despite a lack of close biological relatedness. We conclude that even protein-based phylogenetic trees may be misleading, and we advise caution in phylogenetic reconstruction using protein sequences, especially those that are compositionally biased. Received: 19 February 1998 / Accepted: 28 August 1998
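A quick way to screen taxa for the kind of bias described above is to compare nucleotide GC content with the balance of amino acids encoded by AT-rich versus GC-rich codons. The sketch below assumes a commonly used grouping: F, Y, M, I, N, K for amino acids with AT-rich codons and G, A, R, P for those with GC-rich codons. The function names are illustrative, and the sketch is a descriptive screen, not a test of compositional homogeneity.

```python
AT_RICH_CODON_AMINO_ACIDS = set("FYMINK")  # Phe, Tyr, Met, Ile, Asn, Lys
GC_RICH_CODON_AMINO_ACIDS = set("GARP")    # Gly, Ala, Arg, Pro


def gc_content(dna):
    """Fraction of G+C among unambiguous A/C/G/T bases."""
    bases = [b for b in dna.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("no unambiguous bases")
    return sum(b in "GC" for b in bases) / len(bases)


def fymink_garp_ratio(protein):
    """Ratio of amino acids with AT-rich codons to those with GC-rich codons."""
    protein = protein.upper()
    at = sum(aa in AT_RICH_CODON_AMINO_ACIDS for aa in protein)
    gc = sum(aa in GC_RICH_CODON_AMINO_ACIDS for aa in protein)
    return at / gc if gc else float("inf")
```

Taxa with low GC content and a high ratio, or the reverse, are candidates for the artefactual grouping reported in the abstract.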

15.

Disk-covering, a fast-converging method for phylogenetic tree reconstruction. (Total citations: 2; self-citations: 0; citations by others: 2)

D H Huson, S M Nettles, T J Warnow. Journal of computational biology, 1999, 6(3-4):369-386

The evolutionary history of a set of species is represented by a phylogenetic tree, which is a rooted, leaf-labeled tree, where internal nodes represent ancestral species and the leaves represent modern day species. Accurate (or even boundedly inaccurate) topology reconstructions of large and divergent trees from realistic length sequences have long been considered one of the major challenges in systematic biology. In this paper, we present a simple method, the Disk-Covering Method (DCM), which boosts the performance of base phylogenetic methods under various Markov models of evolution. We analyze the performance of DCM-boosted distance methods under the Jukes-Cantor Markov model of biomolecular sequence evolution, and prove that for almost all trees, polylogarithmic length sequences suffice for complete accuracy with high probability, while polynomial length sequences always suffice. We also provide an experimental study based upon simulating sequence evolution on model trees. This study confirms substantial reductions in error rates at realistic sequence lengths.
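The distance methods that DCM boosts typically start from pairwise distances corrected under a substitution model. Under Jukes-Cantor, the observed proportion of differing sites p becomes d = -(3/4) ln(1 - 4p/3) expected substitutions per site. The correction is undefined once p reaches 3/4, which is one reason very divergent pairs are hard to handle. The sketch below shows only this distance step, not the disk-covering decomposition. Skipping gap and ambiguity characters is an assumption of the sketch.

```python
import math


def p_distance(seq1, seq2):
    """Proportion of differing sites, skipping sites where either base is not A/C/G/T."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor_distance(seq1, seq2):
    """Expected substitutions per site under the Jukes-Cantor model."""
    p = p_distance(seq1, seq2)
    if p >= 0.75:
        return math.inf  # saturated: the correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

A matrix of these distances is the input that a base method such as neighbor joining would receive for each subproblem.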

16.

A support vector machine based test for incongruence between sets of trees in tree space

DC Haws, P Huggins, EM O'Neill, DW Weisrock, R Yoshida. BMC bioinformatics, 2012, 13(1):210

ABSTRACT: BACKGROUND: The increased use of multi-locus data sets for phylogenetic reconstruction has increased the need to determine whether a set of gene trees significantly deviate from the phylogenetic patterns of other genes. Such unusual gene trees may have been influenced by other evolutionary processes such as selection, gene duplication, or horizontal gene transfer. RESULTS: Motivated by this problem we propose a nonparametric goodness-of-fit test for two empirical distributions of gene trees, and we developed the software GeneOut to estimate a p-value for the test. Our approach maps trees into a multi-dimensional vector space and then applies support vector machines (SVMs) to measure the separation between two sets of pre-defined trees. We use a permutation test to assess the significance of the SVM separation. To demonstrate the performance of GeneOut, we applied it to the comparison of gene trees simulated within different species trees across a range of species tree depths. Applied directly to sets of simulated gene trees with large sample sizes, GeneOut was able to detect very small differences between two sets of gene trees generated under different species trees. Our statistical test can also include tree reconstruction into its test framework through a variety of phylogenetic optimality criteria. When applied to DNA sequence data simulated from different sets of gene trees, results in the form of receiver operating characteristic (ROC) curves indicated that GeneOut performed well in the detection of differences between sets of trees with different distributions in a multi-dimensional space. Furthermore, it controlled false positive and false negative rates very well, indicating a high degree of accuracy.
CONCLUSIONS: The non-parametric nature of our statistical test provides fast and efficient analyses, and makes it an applicable test for any scenario where evolutionary or other factors can lead to trees with different multi-dimensional distributions. The software GeneOut is freely available under the GNU public license.
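The logic of the test is to turn each tree into a vector, measure how well the two sets separate, and calibrate that measure by permuting set labels. The sketch below keeps that logic but swaps the SVM for a much simpler separation statistic: the squared distance between the mean clade-indicator vectors of the two sets. It is not GeneOut. The rooted nested-tuple representation, the choice of clade indicators as coordinates and the default of 999 permutations are assumptions of the sketch.

```python
import random


def clades(tree):
    """All clades (frozensets of leaf names) of a rooted nested-tuple tree."""
    found = set()

    def walk(t):
        if isinstance(t, str):
            return frozenset([t])
        s = frozenset().union(*(walk(c) for c in t))
        found.add(s)
        return s

    walk(tree)
    return found


def centroid_distance(a, b):
    """Squared Euclidean distance between the mean vectors of two groups."""
    mean_a = [sum(col) / len(a) for col in zip(*a)]
    mean_b = [sum(col) / len(b) for col in zip(*b)]
    return sum((x - y) ** 2 for x, y in zip(mean_a, mean_b))


def permutation_test(trees_a, trees_b, n_perm=999, seed=0):
    """Return (observed statistic, permutation p-value) for two lists of trees."""
    tree_clades = [clades(t) for t in trees_a + trees_b]
    index = sorted(set().union(*tree_clades), key=sorted)
    # Each tree becomes a 0/1 vector recording which clades it contains.
    vecs = [[1.0 if c in tc else 0.0 for c in index] for tc in tree_clades]
    na = len(trees_a)
    observed = centroid_distance(vecs[:na], vecs[na:])
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(vecs)  # reassign trees to the two groups at random
        if centroid_distance(vecs[:na], vecs[na:]) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Replacing `centroid_distance` with a classifier-based margin would bring the sketch closer to the SVM approach without changing the permutation logic.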

17.

Inferring phylogenetic networks by the maximum parsimony criterion: a case study

Jin G, Nakhleh L, Snir S, Tuller T. Molecular biology and evolution, 2007, 24(1):324-337

Horizontal gene transfer (HGT) may result in genes whose evolutionary histories disagree with each other, as well as with the species tree. In this case, reconciling the species and gene trees results in a network of relationships, known as the "phylogenetic network" of the set of species. A phylogenetic network that incorporates HGT consists of an underlying species tree that captures vertical inheritance and a set of edges which model the "horizontal" transfer of genetic material. In a series of papers, Nakhleh and colleagues have recently formulated a maximum parsimony (MP) criterion for phylogenetic networks, provided an array of computationally efficient algorithms and heuristics for computing it, and demonstrated its plausibility on simulated data. In this article, we study the performance and robustness of this criterion on biological data. Our findings indicate that MP is very promising when its application is extended to the domain of phylogenetic network reconstruction and HGT detection. In all cases we investigated, the MP criterion detected the correct number of HGT events required to map the evolutionary history of a gene data set onto the species phylogeny. Furthermore, our results indicate that the criterion is robust with respect to both incomplete taxon sampling and the use of different site substitution matrices. Finally, our results show that the MP criterion is very promising in detecting HGT in chimeric genes, whose evolutionary histories are a mix of vertical and horizontal evolution. Besides the performance analysis of MP, our findings offer new insights into the evolution of 4 biological data sets and new possible explanations of HGT scenarios in their evolutionary history.

18.

Evolution of the RNA polymerase B' subunit gene (rpoB') in Halobacteriales: a complementary molecular marker to the SSU rRNA gene

Walsh DA Bapteste E Kamekura M Doolittle WF 《Molecular biology and evolution》2004,21(12):2340-2351

Many prokaryotes have multiple ribosomal RNA operons. Generally, sequence differences between small subunit (SSU) rRNA genes are minor (<1%) and cause little concern for phylogenetic inference or environmental diversity studies. For Halobacteriales, an order of extremely halophilic, aerobic Archaea, within-genome SSU rRNA sequence divergence can exceed 5%, rendering phylogenetic assignment problematic. The RNA polymerase B' subunit gene (rpoB') is a single-copy conserved gene that may be an appropriate alternative phylogenetic marker for Halobacteriales. We sequenced a fragment of the rpoB' gene from 21 species, encompassing 15 genera of Halobacteriales. To examine the utility of rpoB' as a phylogenetic marker in Halobacteriales, we investigated three properties of rpoB' trees: the variation in resolution between trees inferred from the rpoB' DNA and RpoB' protein alignments, the degree of mutational saturation between taxa, and congruence with the SSU rRNA tree. The rpoB' DNA and protein trees were for the most part congruent and consistently recovered two well-supported monophyletic groups, the clade I and clade II haloarchaea, within a collection of less well resolved Halobacteriales lineages. A comparison of observed versus inferred numbers of substitutions revealed mutational saturation in the rpoB' DNA data set, particularly between more distant species. Thus, the RpoB' protein sequence may be more reliable than the rpoB' DNA sequence for inferring Halobacteriales phylogeny. AU tests of tree selection indicated that the trees inferred from rpoB' DNA and protein alignments were significantly incongruent with the SSU rRNA tree. We discuss possible explanations for this incongruence, including tree reconstruction artifact, differential paralog sampling, and lateral gene transfer. This is the first study of Halobacteriales evolution based on a marker other than the SSU rRNA gene.
In addition, we present a valuable phylogenetic framework encompassing a broad diversity of Halobacteriales, in which novel sequences can be inserted for evolutionary, ecological, or taxonomic investigations.
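Plotting observed differences against inferred substitutions is a standard saturation check: as divergence grows, the observed proportion of differing sites levels off while model-based estimates keep rising. The abstract does not say which model or tree-based distances the authors used, so the sketch below assumes Jukes-Cantor (JC69) corrected pairwise DNA distances as the "inferred" quantity; all function names are hypothetical.

```python
import math
from itertools import combinations


def p_distance(seq1, seq2):
    """Observed proportion of differing sites, skipping positions with a gap."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    return sum(a != b for a, b in pairs) / len(pairs)


def jc69_distance(p):
    """Jukes-Cantor (1969) estimate of substitutions per site from a p-distance."""
    if p >= 0.75:
        return math.inf  # at or beyond the JC69 saturation limit for DNA
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def saturation_table(alignment):
    """(taxon1, taxon2, observed p, model-corrected d) for every pair of taxa."""
    rows = []
    for (n1, s1), (n2, s2) in combinations(alignment.items(), 2):
        p = p_distance(s1, s2)
        rows.append((n1, n2, p, jc69_distance(p)))
    return rows
```

Pairs whose points flatten out (large d, with p creeping toward 0.75) are the saturated ones; for a protein alignment one would swap in an amino-acid correction instead of JC69.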

19.

Rapid maximum likelihood ancestral state reconstruction of continuous characters: A rerooting‐free algorithm


Eric W. Goolsby 《Ecology and evolution》2017,7(8):2791-2797

Ancestral state reconstruction is a method used to study the evolutionary trajectories of quantitative characters on phylogenies. Although efficient methods for univariate ancestral state reconstruction under a Brownian motion model have been described for at least 25 years, to date no generalization has been described to allow more complex evolutionary models, such as multivariate trait evolution, non‐Brownian models, missing data, and within‐species variation. Furthermore, even for simple univariate Brownian motion models, most phylogenetic comparative R packages compute ancestral states via inefficient tree rerooting and full tree traversals at each tree node, making ancestral state reconstruction extremely time‐consuming for large phylogenies. Here, a computationally efficient method for fast maximum likelihood ancestral state reconstruction of continuous characters is described. The algorithm has linear complexity relative to the number of species and outperforms the fastest existing R implementations by several orders of magnitude. The described algorithm is capable of performing ancestral state reconstruction on a 1,000,000‐species phylogeny in fewer than 2 s using a standard laptop, whereas the next fastest R implementation would take several days to complete. The method is generalizable to more complex evolutionary models, such as phylogenetic regression, within‐species variation, non‐Brownian evolutionary models, and multivariate trait evolution. Because this method enables fast repeated computations on phylogenies of virtually any size, implementation of the described algorithm can drastically alleviate the computational burden of many otherwise prohibitively time‐consuming tasks requiring reconstruction of ancestral states, such as phylogenetic imputation of missing data, bootstrapping procedures, Expectation‐Maximization algorithms, and Bayesian estimation. 
The described ancestral state reconstruction algorithm is implemented in the Rphylopars functions anc.recon and phylopars.
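The gain over rerooting comes from replacing one full traversal per node with just two traversals of the whole tree. The sketch below shows that idea for the simplest case only: one continuous trait under Brownian motion, positive branch lengths, and at least two children per internal node. A tips-to-root pass builds each node's estimate from its own subtree, and a root-to-tips pass adds the information from the rest of the tree, so each node's final value matches what rerooting at that node and pruning would give. It is not the Rphylopars code; `bm_ancestral_states` and its arguments are hypothetical, and the multivariate, non-Brownian, missing-data and within-species extensions are omitted.

```python
def bm_ancestral_states(children, tips, root):
    """Two-pass ML ancestral states for one trait under Brownian motion.

    children: internal node -> list of (child, branch_length), lengths > 0.
    tips:     tip name -> observed trait value.
    Returns:  internal node -> estimated ancestral state.
    """
    # Iterative DFS lists parents before children; reversed, it is a postorder.
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(child for child, _ in children.get(node, []))

    # Pass 1 (tips -> root): precision-weighted estimate from each subtree.
    down_mean, down_var = {}, {}
    for node in reversed(order):
        if node in tips:
            down_mean[node], down_var[node] = tips[node], 0.0
            continue
        w_sum = sum(1.0 / (down_var[c] + t) for c, t in children[node])
        wm_sum = sum(down_mean[c] / (down_var[c] + t) for c, t in children[node])
        down_mean[node], down_var[node] = wm_sum / w_sum, 1.0 / w_sum

    # Pass 2 (root -> tips): combine with information from outside each subtree.
    estimates = {root: down_mean[root]}
    up = {}  # node -> (mean, variance) implied at that node by the rest of the tree
    for node in order:
        if node in tips:
            continue
        contrib = {c: (1.0 / (down_var[c] + t), down_mean[c]) for c, t in children[node]}
        w_all = sum(w for w, _ in contrib.values())
        wm_all = sum(w * m for w, m in contrib.values())
        if node in up:  # message arriving from the node's parent side
            m_up, v_up = up[node]
            w_all += 1.0 / v_up
            wm_all += m_up / v_up
        for child, t in children[node]:
            w_c, m_c = contrib[child]
            w_rest, wm_rest = w_all - w_c, wm_all - w_c * m_c  # drop child's own subtree
            up[child] = (wm_rest / w_rest, 1.0 / w_rest + t)
            if child not in tips:
                m_up, v_up = up[child]
                w_down = 1.0 / down_var[child]
                estimates[child] = (w_down * down_mean[child] + m_up / v_up) / (w_down + 1.0 / v_up)
    return estimates


tree = {"R": [("X", 1.0), ("C", 2.0)], "X": [("A", 1.0), ("B", 1.0)]}
print(bm_ancestral_states(tree, {"A": 0.0, "B": 2.0, "C": 4.0}, "R"))
```

For the tree ((A:1,B:1)X:1,C:2)R with tip values 0, 2 and 4, the estimates are 16/7 at R and 10/7 at X, which is also the solution of minimizing the Brownian-motion sum of squared changes divided by branch lengths.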

20.

Rooted phylogeny of the three superkingdoms

Ajith Harish Anders Tunlid Charles G. Kurland 《Biochimie》2013

The traditional bacterial rooting of the three superkingdoms in sequence-based gene trees is inconsistent with new phylogenetic reconstructions based on the genome content of compact protein domains. We find that protein domains at the level of the SCOP superfamily (SF), scored across sequenced genomes, yield fully resolved rooted trees under maximum parsimony. Such genome content trees identify archaea and bacteria (akaryotes) as sister clades that diverge from an akaryote common ancestor, LACA. Several eukaryote sister clades diverge from a eukaryote common ancestor, LECA. LACA and LECA descend in parallel from the most recent universal common ancestor (MRUCA), which is not a bacterium. Rather, MRUCA possesses 75% of the unique SFs encoded by extant genomes of the three superkingdoms, each encoding a proteome that partially overlaps all others. This alone implies that the common ancestor of the superkingdoms was very complex. Such ancestral complexity is confirmed by phylogenetic reconstructions. In addition, the divergence of proteomes from the complex ancestor in each superkingdom is both reductive in numbers of unique SFs and cumulative in the abundance of surviving SFs. These data suggest that the common ancestor was not the first cell lineage and that modern global phylogeny is the crown of a “recently” re-rooted tree. We suggest that a bottlenecked survivor of an environmental collapse, which preceded the flourishing of the modern crown, seeded the current phylogenetic tree.