Similar Documents
20 similar documents found (search time: 31 ms)
1.
Parsimony, likelihood, and simplicity (total citations: 2; self-citations: 1; citations by others: 1)
The latest charge against parsimony in phylogenetic inference is that it involves estimating too many parameters. The charge is derived from the fact that, when each character is allowed a branch length vector of its own (instead of the homogeneous branch lengths assumed in current likelihood models), the results for likelihood and parsimony are identical. Parsimony, however, can also be derived from simpler models, involving fewer parameters. Therefore, parsimony provides (as many authors had argued before) the simplest explanation of the data, or the most realistic, depending on one's views. If (as argued by likelihoodists) phylogenetic inference is to use the simplest model that provides sufficient explanation of the data, the starting point of phylogenetic analyses should be parsimony, not maximum likelihood. If the addition of new parameters (which increase the likelihood) to a parsimony estimation is seen as desirable, this may lead to a preference for results based on current likelihood models. If the addition of parameters is continued, however, the results will eventually come back to the same place where they had started, since allowing each character a branch length of its own also produces parsimony. Parsimony can be justified by very different types of models, either very complex or very simple. This suggests that parsimony does have a unique place among methods of phylogenetic estimation.
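Because the argument turns on what the parsimony criterion actually counts, it may help to see it concretely. The sketch below is a minimal, illustrative implementation of Fitch's small-parsimony algorithm for unordered characters on a fixed rooted binary tree; the nested-tuple tree encoding and the function names `fitch_score` and `tree_length` are assumptions for illustration, not part of the cited work.

```python
from typing import Dict, Set, Tuple, Union

# A tree is either a leaf name (str) or a pair (left_subtree, right_subtree).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_score(tree: Tree, states: Dict[str, str]) -> int:
    """Minimum number of state changes for one character on a rooted binary tree.

    Fitch's bottom-up pass: an internal node takes the intersection of its
    children's state sets when it is non-empty (no change needed); otherwise it
    takes their union, and that union costs one change.
    """

    def visit(node: Tree) -> Tuple[Set[str], int]:
        if isinstance(node, str):  # leaf: its observed state, zero cost
            return {states[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        common = left_set & right_set
        if common:
            return common, left_cost + right_cost
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


def tree_length(tree: Tree, matrix: Dict[str, str]) -> int:
    """Parsimony length of a tree: Fitch scores summed over all characters (columns)."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch_score(tree, {taxon: seq[i] for taxon, seq in matrix.items()})
        for i in range(n_chars)
    )


toy_matrix = {"A": "AAG", "B": "AAT", "C": "CAG", "D": "CCT"}  # hypothetical data
print(tree_length((("A", "B"), ("C", "D")), toy_matrix))  # -> 4
```

The three columns cost 1, 1 and 2 changes on the tree ((A,B),(C,D)), giving a length of 4; a parsimony search would compare such lengths across candidate topologies.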

2.
3.
Awareness of the complex structure and evolutionary dynamics of noncoding DNA has improved both noncoding sequence alignment and the use of microstructural changes as characters in phylogenetic analysis. The next step is to consider improvements in the use and selection of phylogenetic models for noncoding sequence data. Models of character evolution are central to phylogeny estimation, but the use of an inadequate model can mislead topology selection and branch length estimations. This is particularly likely when sequence divergence is either limited (nearly invariable, as in population-level or species-level studies) or extreme (nearly saturated, as in deep-level studies that focus on conserved secondary structures). Noncoding data sets are often at these extremes, and they can be particularly awkward for model definition and model selection. This paper introduces the goals of model use in phylogenetics and identifies ten issues that arise from the application of models to noncoding sequence data. It is concluded that most of these issues derive from small data set sizes, very low or very high sequence variability, limitations of current phylogenetic models, and possibly character definition and nonindependence. Recommendations are made that should help to improve alignment, character quality, model selection, and phylogeny estimation based on noncoding sequence data.
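The contrast between nearly invariable and nearly saturated data can be made concrete with the simplest substitution model. A minimal sketch, assuming the Jukes-Cantor (JC69) model purely for illustration (it is not taken from the paper): the corrected distance d = -(3/4) ln(1 - 4p/3) grows without bound as the observed proportion of differing sites p approaches 0.75, so near-saturated data say little about branch lengths. The helper names are hypothetical.

```python
import math
from typing import Optional


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jc69_distance(p: float) -> Optional[float]:
    """Jukes-Cantor corrected distance; None once p >= 0.75, where it is undefined."""
    if p >= 0.75:
        return None
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# The correction barely matters for small p but explodes near saturation.
for p in (0.01, 0.30, 0.60, 0.70, 0.74, 0.75):
    d = jc69_distance(p)
    shown = "undefined" if d is None else f"{d:.3f}"
    print(f"p = {p:.2f}  d = {shown}")
```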

4.
The problem of rooting rapid radiations (total citations: 3; self-citations: 0; citations by others: 3)
There are many examples of groups (such as birds, bees, mammals, multicellular animals, and flowering plants) that have undergone a rapid radiation. In such cases, where there is a combination of short internal and long external branches, correctly estimating and rooting phylogenetic trees is known to be a difficult problem. In this simulation study, we tested the performance of different phylogenetic methods at estimating a tree that models a rapid radiation. We found that maximum likelihood, corrected and uncorrected neighbor-joining, and corrected and uncorrected parsimony all suffer from biases toward specific tree topologies. In addition, we found that using a single-taxon outgroup to root a tree frequently disrupts an otherwise correct ingroup phylogeny. Moreover, for uncorrected parsimony, we found cases where several individual trees (in which the outgroup was placed incorrectly) were selected more frequently than the correct tree. Even for parameter settings where the correct tree was selected most frequently when using extremely long sequences, for sequences of up to 60,000 nucleotides the incorrectly rooted trees were each selected more frequently than the correct tree. For all the cases tested here, tree estimation using a two-taxon outgroup was more accurate than when using a single-taxon outgroup. However, the ingroup was most accurately recovered when no outgroup was used.
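Neighbor-joining, one of the methods compared above, builds a tree by repeatedly joining the pair of taxa that minimizes the Q-criterion Q(i, j) = (n - 2) d(i, j) - Σₖ d(i, k) - Σₖ d(j, k). A minimal sketch of that pair-selection step for a symmetric distance matrix; the function name and the toy matrix are illustrative assumptions, and a full implementation would go on to compute branch lengths, merge the chosen pair into a new node, and iterate.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def nj_pick_pair(d: Sequence[Sequence[float]]) -> Tuple[int, int]:
    """Return the pair (i, j), i < j, minimizing the neighbor-joining Q-criterion."""
    n = len(d)
    if n < 3:
        raise ValueError("need at least three taxa")
    row_sums: List[float] = [sum(row) for row in d]
    best_pair, best_q = (0, 1), float("inf")
    for i, j in combinations(range(n), 2):
        q = (n - 2) * d[i][j] - row_sums[i] - row_sums[j]
        if q < best_q:  # ties keep the first pair found
            best_pair, best_q = (i, j), q
    return best_pair


# Five-taxon toy distance matrix (hypothetical values).
D = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(nj_pick_pair(D))  # -> (0, 1)
```

Note that the pair joined first (taxa 0 and 1, Q = -50) is not the closest pair by raw distance (taxa 3 and 4, d = 3); the row-sum terms correct for unequal rates, which is also why long external branches can still pull the method toward particular topologies.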

5.
Estimating phylogenetic relationships among closely related species can be extremely difficult when there is incongruence among gene trees and between the gene trees and the species tree. Here we show that incorporating a model of the stochastic loss of gene lineages by genetic drift into the phylogenetic estimation procedure can provide a robust estimate of species relationships, despite widespread incomplete sorting of ancestral polymorphism. This approach is applied to a group of montane Melanoplus grasshoppers for which genealogical discordance among loci and incomplete lineage sorting obscures any obvious phylogenetic relationships among species. Unlike traditional treatments where gene trees estimated using standard phylogenetic methods are implicitly equated with the species tree, with the coalescent-based approach the species tree is modeled probabilistically from the estimated gene trees. The estimated species phylogeny (the ESP) is calculated for the grasshoppers from multiple gene trees reconstructed for nuclear loci and a mitochondrial gene. This empirical application is coupled with a simulation study to explore the performance of the coalescent-based approach. Specifically, we test the accuracy of the ESP given the data based on analyses of simulated data matching the multilocus data collected in Melanoplus (i.e., data were simulated for each locus with the same number of base pairs and locus-specific mutational models). The results of the study show that ESPs can be computed using the coalescent-based approach long before reciprocal monophyly has been achieved, and that these statistical estimates are accurate. This contrasts with analyses of the empirical data collected in Melanoplus and simulated data based on concatenation of multiple loci, for which the incomplete lineage sorting of recently diverged species posed significant problems. 
We discuss the strengths and potential challenges of incorporating an explicit model of gene-lineage coalescence into the phylogenetic procedure to obtain an ESP, as illustrated by the application to Melanoplus, relative to concatenation and consensus approaches. This study represents a fundamental shift in how species relationships are estimated: the relationship between the gene trees and the species phylogeny is modeled probabilistically rather than by equating gene trees with a species tree.
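The probabilistic link between gene trees and the species tree can be illustrated with three species. Under the standard multispecies coalescent (a textbook result used here as an illustration, not the paper's full multilocus model), if the species tree is ((A,B),C) and its internal branch has length T in coalescent units, the lineages from A and B fail to coalesce within that branch with probability e^(-T); the three rooted gene trees are then equally likely, so the gene tree matches the species tree with probability 1 - (2/3)e^(-T). The sketch checks this formula against a simple simulation; the function names are hypothetical.

```python
import math
import random


def p_concordant(t_internal: float) -> float:
    """P(gene tree matches species tree ((A,B),C)); branch length in coalescent units."""
    return 1.0 - (2.0 / 3.0) * math.exp(-t_internal)


def simulate_concordance(t_internal: float, n_rep: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the same probability under the coalescent."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_rep):
        # Two lineages (A, B) coalesce at rate 1 per coalescent time unit.
        if rng.expovariate(1.0) < t_internal:
            hits += 1  # coalesced inside the (A,B) ancestral branch
        elif rng.randrange(3) == 0:
            hits += 1  # three lineages remain; the first pair to merge is (A,B) w.p. 1/3
    return hits / n_rep


for t in (0.1, 0.5, 1.0, 3.0):
    print(t, round(p_concordant(t), 3), round(simulate_concordance(t), 3))
```

For short internal branches the matching gene tree is barely more likely than either alternative, which is the incomplete-lineage-sorting regime in which treating a single gene tree as the species tree is unreliable.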

6.
Models of character evolution underpin all phylogeny estimations, thus model adequacy remains a crucial issue for phylogenetics and its many applications. Although progress has been made in selecting appropriate models for phylogeny estimation, there is still concern about their purpose and proper use. How do we interpret models in a phylogenetic context? What are their effects on phylogeny estimation? How can we improve confidence in the models that we choose? That the phylogenetics community is asking such questions denotes an important stage in the use of explicit models. Here, we examine these and other common questions and draw conclusions about how the community is using and choosing models, and where this process will take us next.
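One common way to choose among candidate models is an information criterion that trades the maximized log-likelihood against the number of free parameters. A minimal sketch of AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L; the log-likelihoods below are hypothetical placeholders rather than results from any study, parameter counts cover substitution parameters only (estimated branch lengths would add to k in practice), and using the number of sites as n for BIC is itself a convention, not a settled choice.

```python
import math
from typing import Dict, Tuple


def aic(log_l: float, k: int) -> float:
    """Akaike information criterion (lower is better)."""
    return 2 * k - 2 * log_l


def bic(log_l: float, k: int, n: int) -> float:
    """Bayesian information criterion with sample size n (lower is better)."""
    return k * math.log(n) - 2 * log_l


# Hypothetical fits: model -> (maximized log-likelihood, free substitution parameters).
FITS: Dict[str, Tuple[float, int]] = {
    "JC69": (-5200.0, 0),
    "HKY85": (-5120.0, 4),  # kappa + 3 free base frequencies
    "GTR": (-5115.0, 8),    # 5 free exchangeabilities + 3 free base frequencies
}
N_SITES = 1000  # hypothetical alignment length used as the BIC sample size

best_aic = min(FITS, key=lambda m: aic(*FITS[m]))
best_bic = min(FITS, key=lambda m: bic(*FITS[m], N_SITES))
print("AIC prefers", best_aic, "| BIC prefers", best_bic)
```

With these made-up numbers AIC favors GTR while BIC, which penalizes extra parameters more heavily at n = 1000, favors HKY85. Neither criterion checks whether the chosen model is adequate in an absolute sense, which is one of the questions the abstract raises.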

7.
Many research groups are estimating trees containing anywhere from a few thousand to hundreds of thousands of species, toward the eventual goal of estimating a Tree of Life, containing perhaps as many as several million leaves. These phylogenetic estimations present enormous computational challenges, and current computational methods are likely to fail to run even on data sets at the low end of this range. One approach to estimating a large species tree is to apply phylogenetic estimation methods (such as maximum likelihood) to a supermatrix produced by concatenating multiple sequence alignments for a collection of markers; however, the most accurate of these phylogenetic estimation methods are extremely computationally intensive for data sets with more than a few thousand sequences. Supertree methods, which assemble phylogenetic trees from a collection of trees on subsets of the taxa, are important tools for phylogeny estimation where phylogenetic analyses based upon maximum likelihood (ML) are infeasible. In this paper, we introduce SuperFine, a meta-method that uses a novel two-step procedure to improve the accuracy and scalability of supertree methods. Our study, using both simulated and empirical data, shows that SuperFine-boosted supertree methods produce more accurate trees than standard supertree methods, and run quickly on very large data sets with thousands of sequences. Furthermore, SuperFine-boosted matrix representation with parsimony (MRP, the best-known supertree method) approaches the accuracy of ML methods on supermatrix data sets under realistic conditions.
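Matrix representation with parsimony (MRP), mentioned above, turns each source tree into binary characters: every nontrivial clade of a source tree becomes one column, coded 1 for taxa inside the clade, 0 for that tree's other taxa, and ? for taxa absent from that tree. A minimal sketch of this encoding step for rooted source trees written as nested tuples (an assumption for illustration; SuperFine itself and the subsequent parsimony search are not shown, and the function names are hypothetical):

```python
from typing import Dict, FrozenSet, List, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]


def leaves(t: Tree) -> FrozenSet[str]:
    """Set of leaf names below a node."""
    if isinstance(t, str):
        return frozenset([t])
    return frozenset().union(*(leaves(child) for child in t))


def clades(t: Tree) -> List[FrozenSet[str]]:
    """Leaf sets of all internal nodes except the root."""
    found: List[FrozenSet[str]] = []

    def walk(node: Tree, is_root: bool) -> None:
        if isinstance(node, str):
            return
        if not is_root:
            found.append(leaves(node))
        for child in node:
            walk(child, False)

    walk(t, True)
    return found


def mrp_matrix(trees: List[Tree]) -> Dict[str, str]:
    """Encode rooted source trees as an MRP matrix: taxon -> string of 0/1/? characters."""
    all_taxa = sorted(frozenset().union(*(leaves(t) for t in trees)))
    rows: Dict[str, List[str]] = {taxon: [] for taxon in all_taxa}
    for t in trees:
        present = leaves(t)
        for clade in clades(t):
            if len(clade) < 2 or len(clade) == len(present):
                continue  # trivial clades carry no grouping information
            for taxon in all_taxa:
                if taxon not in present:
                    rows[taxon].append("?")
                else:
                    rows[taxon].append("1" if taxon in clade else "0")
    return {taxon: "".join(chars) for taxon, chars in rows.items()}


print(mrp_matrix([(("A", "B"), "C"), (("B", "C"), "D")]))
```

Here the two source trees overlap only in B and C, so each contributes one column with a ? for the taxon it lacks; a parsimony search on the resulting matrix yields the supertree.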

8.
Bayesian inference is becoming a common statistical approach to phylogenetic estimation because, among other reasons, it allows for rapid analysis of large data sets with complex evolutionary models. Conveniently, Bayesian phylogenetic methods use currently available stochastic models of sequence evolution. However, as with other model-based approaches, the results of Bayesian inference are conditional on the assumed model of evolution: inadequate models (models that poorly fit the data) may result in erroneous inferences. In this article, I present a Bayesian phylogenetic method that evaluates the adequacy of evolutionary models using posterior predictive distributions. By evaluating a model's posterior predictive performance, an adequate model can be selected for a Bayesian phylogenetic study. Although I present a single test statistic that assesses the overall (global) performance of a phylogenetic model, a variety of test statistics can be tailored to evaluate specific features (local performance) of evolutionary models to identify sources of failure. The method presented here, unlike the likelihood-ratio test and parametric bootstrap, accounts for uncertainty in the phylogeny and model parameters.
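A posterior predictive check compares a test statistic computed on the observed alignment with its distribution over data sets simulated from posterior draws of the tree and model parameters. A minimal sketch of the two pieces that do not need a sequence simulator, assuming the multinomial log-likelihood of site patterns, T = Σᵢ nᵢ ln(nᵢ/N), as the global statistic (a common choice for this purpose; the function names are hypothetical and the simulation step is left to the caller):

```python
import math
from collections import Counter
from typing import Sequence


def multinomial_stat(alignment: Sequence[str]) -> float:
    """Multinomial log-likelihood of the site patterns: sum of n_i * ln(n_i / N)."""
    n_sites = len(alignment[0])
    patterns = Counter("".join(seq[i] for seq in alignment) for i in range(n_sites))
    return sum(n * math.log(n / n_sites) for n in patterns.values())


def posterior_predictive_p(observed: float, simulated: Sequence[float]) -> float:
    """Upper-tail posterior predictive p-value: share of replicates >= observed."""
    if not simulated:
        raise ValueError("need at least one simulated replicate")
    return sum(s >= observed for s in simulated) / len(simulated)


observed_t = multinomial_stat(["ACGTA", "ACGTT", "ACCTA"])  # hypothetical toy alignment
print(round(observed_t, 3))
```

In a full analysis each simulated value would come from an alignment of the same size generated on a tree and parameter set drawn from the posterior; p-values near 0 or 1 suggest the model fails to reproduce the observed distribution of site patterns.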

9.
The phylogenetic position of Aulotandra (Zingiberaceae). Nord. J. Bot. 23: 725–734. Copenhagen. ISSN 0107-055X.
Molecular data for 41 representatives of Zingiberaceae are analysed, focusing on the phylogenetic position of Aulotandra and its relationship to Siphonochilus. Sequence divergences show that accessions of Aulotandra from Madagascar are closest to those of African Siphonochilus in both the ITS and trnL-F data sets, indicating a close relationship. Together these genera form a highly supported monophyletic clade. This African-Madagascan lineage is sister to the rest of the family, with its African, Asian and South American members, showing that Aulotandra does not belong in the tribe Alpinieae, where it has traditionally been placed, but in the subfamily Siphonochiloideae with the genus Siphonochilus.

10.
Covarion processes allow changes in evolutionary rates at sites along the branches of a phylogenetic tree. Covarion-like evolution is increasingly recognized as an important mode of protein evolution. Several recent reports suggest that maximum likelihood estimation employing covarion models may support different optimal topologies than estimation using standard rates-across-sites (RAS) models. However, it remains to be demonstrated that ignoring covarion evolution will generally result in topological misestimation. In this study, we performed analytical and theoretical studies of limiting distances under the covarion model and four-taxon tree simulations to investigate the extent to which the covarion process affects phylogenetic estimation. In particular, we assessed the ability of an RAS model-based maximum likelihood method to recover the phylogenies when the sequence data were simulated under covarion processes. We find that, when ignored, covarion processes can induce systematic errors in phylogeny reconstruction. Surprisingly, when sequences are evolved under a covarion process but an RAS model is used for estimation, we find that a long-branch repel bias occurs.
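A covarion process can be written as a Markov chain on pairs (rate class, nucleotide): while a site is "on" it substitutes under an ordinary model, while "off" it is invariable, and it switches between the two classes at rates s_on_off and s_off_on. A minimal sketch of that construction in the spirit of the two-class Tuffley-Steel covarion model, assuming Jukes-Cantor substitutions in the "on" class and hypothetical switching rates (none of these choices come from the paper); a small scaling-and-squaring matrix exponential keeps the example dependent on NumPy only.

```python
import numpy as np


def covarion_q(s_on_off: float, s_off_on: float, mu: float = 1.0) -> np.ndarray:
    """8x8 generator; states 0-3 are (on, A/C/G/T), states 4-7 are (off, A/C/G/T)."""
    eye = np.eye(4)
    q_jc = mu / 3.0 * (np.ones((4, 4)) - 4.0 * eye)  # Jukes-Cantor, rows sum to zero
    top = np.hstack([q_jc - s_on_off * eye, s_on_off * eye])  # substitute or switch off
    bottom = np.hstack([s_off_on * eye, -s_off_on * eye])     # frozen until switched on
    return np.vstack([top, bottom])


def expm(a: np.ndarray, terms: int = 20) -> np.ndarray:
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    norm = np.linalg.norm(a, ord=np.inf)
    squarings = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    scaled = a / 2.0 ** squarings
    result = np.eye(a.shape[0])
    term = np.eye(a.shape[0])
    for i in range(1, terms + 1):
        term = term @ scaled / i
        result = result + term
    for _ in range(squarings):
        result = result @ result
    return result


def nucleotide_transition(t: float, s_on_off: float, s_off_on: float) -> np.ndarray:
    """Observable 4x4 nucleotide transition matrix, starting from the stationary on/off mix."""
    p = expm(covarion_q(s_on_off, s_off_on) * t)
    pi_on = s_off_on / (s_on_off + s_off_on)
    start = (pi_on, 1.0 - pi_on)
    out = np.zeros((4, 4))
    for c0 in range(2):
        for c1 in range(2):  # sum over the unobserved end class
            out += start[c0] * p[4 * c0:4 * c0 + 4, 4 * c1:4 * c1 + 4]
    return out


print(nucleotide_transition(0.5, s_on_off=0.3, s_off_on=0.7).round(3))
```

Simulating alignments from such a process and then fitting an RAS model, which assumes each site keeps its rate for the whole tree, is the kind of model mismatch the study examines.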

11.

Background  

Likelihood-based phylogenetic inference is generally considered to be the most reliable classification method for unknown sequences. However, traditional likelihood-based phylogenetic methods cannot be applied to large volumes of short reads from next-generation sequencing due to computational complexity issues and lack of phylogenetic signal. "Phylogenetic placement," where a reference tree is fixed and the unknown query sequences are placed onto the tree via a reference alignment, is a way to bring the inferential power offered by likelihood-based approaches to large data sets.
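In likelihood-based placement, each query is evaluated at candidate positions on the fixed reference tree, and the per-edge likelihoods are normalized into weights (often called likelihood weight ratios) that express placement uncertainty. A minimal sketch of that normalization, assuming the per-edge log-likelihoods have already been computed by a placement tool; the edge labels and numbers are hypothetical, and real tools also optimize the attachment point and pendant branch length on each edge.

```python
import math
from typing import Dict


def likelihood_weight_ratios(edge_log_l: Dict[str, float]) -> Dict[str, float]:
    """Normalize per-edge log-likelihoods into weights summing to 1 (log-sum-exp style)."""
    top = max(edge_log_l.values())  # subtract the maximum to avoid underflow
    scaled = {edge: math.exp(ll - top) for edge, ll in edge_log_l.items()}
    total = sum(scaled.values())
    return {edge: w / total for edge, w in scaled.items()}


# Hypothetical log-likelihoods of one short read placed on three reference edges.
weights = likelihood_weight_ratios({"edge_12": -1040.2, "edge_13": -1041.9, "edge_27": -1050.0})
best_edge = max(weights, key=weights.get)
print(best_edge, round(weights[best_edge], 3))
```

Subtracting the maximum before exponentiating matters here because raw log-likelihoods around -1000 would underflow to zero if exponentiated directly.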

12.
This article is concerned with the Bayesian estimation of stochastic rate constants in the context of dynamic models of intracellular processes. The underlying discrete stochastic kinetic model is replaced by a diffusion approximation (or stochastic differential equation approach) where a white noise term models stochastic behavior and the model is identified using equispaced time-course data. The estimation framework involves the introduction of m − 1 latent data points between every pair of observations. MCMC methods are then used to sample the posterior distribution of the latent process and the model parameters. The methodology is applied to the estimation of parameters in a prokaryotic autoregulatory gene network.
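Placing m − 1 latent points between observations lets the intractable transition density of the diffusion be approximated by a product of short Gaussian Euler-Maruyama steps. A minimal sketch of that likelihood for a hypothetical one-species immigration-death system, dX = (k1 - k2 X) dt + sqrt(k1 + k2 X) dW, standing in for the paper's gene network; the MCMC sampler that updates the latent points and rate constants is not shown, and all names and settings are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence


def drift(x: float, k1: float, k2: float) -> float:
    """Mean rate of change: immigration at rate k1, death at rate k2 * x."""
    return k1 - k2 * x


def diffusion_var(x: float, k1: float, k2: float) -> float:
    """Variance rate of the chemical Langevin approximation (floored to stay positive)."""
    return max(k1 + k2 * x, 1e-12)


def euler_loglik(path: Sequence[float], dt: float, k1: float, k2: float) -> float:
    """Gaussian Euler-Maruyama log-likelihood of a fine path (observed + latent points)."""
    ll = 0.0
    for x0, x1 in zip(path[:-1], path[1:]):
        mean = x0 + drift(x0, k1, k2) * dt
        var = diffusion_var(x0, k1, k2) * dt
        ll -= 0.5 * (math.log(2.0 * math.pi * var) + (x1 - mean) ** 2 / var)
    return ll


def simulate_path(x0: float, n_obs: int, obs_interval: float, m: int,
                  k1: float, k2: float, seed: int = 0) -> List[float]:
    """Euler-Maruyama path with m steps per observation interval (m - 1 latent points)."""
    rng = random.Random(seed)
    dt = obs_interval / m
    path = [x0]
    for _ in range(n_obs * m):
        x = path[-1]
        noise = math.sqrt(diffusion_var(x, k1, k2) * dt) * rng.gauss(0.0, 1.0)
        path.append(max(x + drift(x, k1, k2) * dt + noise, 0.0))  # crude non-negativity
    return path


M = 5  # hypothetical: 5 Euler steps per interval, i.e. 4 latent points between observations
fine = simulate_path(x0=10.0, n_obs=10, obs_interval=1.0, m=M, k1=10.0, k2=1.0)
observed = fine[::M]  # the points an experimenter would actually record
print(len(observed), round(euler_loglik(fine, 1.0 / M, k1=10.0, k2=1.0), 2))
```

Increasing m makes the Gaussian approximation more accurate but adds more latent variables for the sampler to update, which is the trade-off behind the choice of m.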

13.
The statistical estimation of phylogenies is always associated with uncertainty, and accommodating this uncertainty is an important component of modern phylogenetic comparative analysis. The birth–death polytomy resolver is a method of accounting for phylogenetic uncertainty that places missing (unsampled) taxa onto phylogenetic trees, using taxonomic information alone. Recent studies of birds and mammals have used this approach to generate pseudoposterior distributions of phylogenetic trees that are complete at the species level, even in the absence of genetic data for many species. Many researchers have used these distributions of phylogenies for downstream evolutionary analyses that involve inferences on phenotypic evolution, geography, and community assembly. I demonstrate that the use of phylogenies constructed in this fashion is inappropriate for many questions involving traits. Because species are placed on trees at random with respect to trait values, the birth–death polytomy resolver breaks down natural patterns of trait phylogenetic structure. Inferences based on these trees are predictably and often drastically biased in a direction that depends on the underlying (true) pattern of phylogenetic structure in traits. I illustrate the severity of the phenomenon for both continuous and discrete traits using examples from a global bird phylogeny.

14.
Most phylogenetic tree estimation methods assume that there is a single set of hierarchical relationships among sequences in a data set for all sites along an alignment. Mosaic sequences produced by past recombination events will violate this assumption and may produce misleading results in a phylogenetic analysis due to the imposition of a single tree along the entire alignment. Therefore, the detection of past recombination is an important first step in an analysis. A Bayesian model for the changes in topology caused by recombination events is described here. This model relaxes the assumption of one topology for all sites in an alignment and uses the theory of hidden Markov models to facilitate calculations, the hidden states being the underlying topologies at each site in the data set. Changes in topology along the multiple sequence alignment are estimated by means of the maximum a posteriori (MAP) estimate. The performance of the MAP estimate is assessed by application of the model to data sets of four sequences, both simulated and real.
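With topologies as hidden states, the per-site emission is the likelihood of that alignment column under each candidate tree, and a change of hidden state marks a putative recombination breakpoint. A minimal sketch of a most-probable topology path computed with the Viterbi algorithm, assuming per-site log-likelihoods are supplied by a phylogenetic likelihood routine and a single switching probability `rho` (both hypothetical inputs; the paper's Bayesian treatment of the model parameters is not reproduced):

```python
import math
from typing import List, Sequence


def viterbi_topologies(site_loglik: Sequence[Sequence[float]], rho: float) -> List[int]:
    """Most probable topology index at each site.

    site_loglik[s][k] is log P(column s | topology k). The chain keeps the same
    topology with probability 1 - rho and moves to each other one with rho / (K - 1).
    """
    n_top = len(site_loglik[0])
    log_stay = math.log(1.0 - rho)
    log_move = math.log(rho / (n_top - 1))
    score = [-math.log(n_top) + e for e in site_loglik[0]]  # uniform initial distribution
    back: List[List[int]] = []
    for emissions in site_loglik[1:]:
        new_score, pointers = [], []
        for k in range(n_top):
            cands = [score[j] + (log_stay if j == k else log_move) for j in range(n_top)]
            j_best = max(range(n_top), key=cands.__getitem__)
            new_score.append(cands[j_best] + emissions[k])
            pointers.append(j_best)
        score = new_score
        back.append(pointers)
    state = max(range(n_top), key=score.__getitem__)
    path = [state]
    for pointers in reversed(back):  # trace the best predecessors back to site 0
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical log-likelihoods: topology 0 fits the first three sites, topology 2 the rest.
toy = [[-1.0, -5.0, -5.0]] * 3 + [[-5.0, -5.0, -1.0]] * 3
print(viterbi_topologies(toy, rho=0.01))  # -> [0, 0, 0, 2, 2, 2]
```

The switching probability acts as a penalty: a breakpoint is inferred only when the gain in per-site fit outweighs the cost log(rho / (K - 1)) of changing topology.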

15.
Conventional phylogenetic tree estimation methods assume that all sites in a DNA multiple alignment have the same evolutionary history. This assumption is violated in data sets from certain bacteria and viruses due to recombination, a process that leads to the creation of mosaic sequences from different strains and, if undetected, causes systematic errors in phylogenetic tree estimation. In the current work, a hidden Markov model (HMM) is employed to detect recombination events in multiple alignments of DNA sequences. The emission probabilities in a given state are determined by the branching order (topology) and the branch lengths of the respective phylogenetic tree, while the transition probabilities depend on the global recombination probability. The present study improves on an earlier heuristic parameter optimization scheme and shows how the branch lengths and the recombination probability can be optimized in a maximum likelihood sense by applying the expectation maximization (EM) algorithm. The novel algorithm is tested on a synthetic benchmark problem and is found to clearly outperform the earlier heuristic approach. The paper concludes with an application of this scheme to a DNA sequence alignment of the argF gene from four Neisseria strains, where a likely recombination event is clearly detected.
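The EM approach alternates between computing posterior state probabilities with the forward-backward algorithm and re-estimating parameters from the resulting expected counts. A minimal sketch of the recombination-probability update only, assuming (as a simplification) that per-site emission probabilities for each topology are already computed and held fixed, and that the chain keeps its topology with probability 1 - nu and moves to each of the other K - 1 with probability nu / (K - 1); the branch-length updates the paper also performs are omitted, and all names and data are hypothetical.

```python
from typing import Tuple

import numpy as np


def transition_matrix(nu: float, k: int) -> np.ndarray:
    """Stay with probability 1 - nu; move to each other topology with nu / (k - 1)."""
    a = np.full((k, k), nu / (k - 1))
    np.fill_diagonal(a, 1.0 - nu)
    return a


def forward_backward(emit: np.ndarray, nu: float) -> Tuple[np.ndarray, np.ndarray, float]:
    """Scaled forward-backward; emit[s, j] = P(column s | topology j), not logs.

    Returns per-site posteriors, summed expected transition counts, and log-likelihood.
    """
    n, k = emit.shape
    a = transition_matrix(nu, k)
    alpha = np.zeros((n, k))
    scale = np.zeros(n)
    alpha[0] = emit[0] / k  # uniform initial distribution over topologies
    for s in range(n):
        if s > 0:
            alpha[s] = (alpha[s - 1] @ a) * emit[s]
        scale[s] = alpha[s].sum()
        alpha[s] /= scale[s]
    beta = np.ones((n, k))
    for s in range(n - 2, -1, -1):
        beta[s] = a @ (emit[s + 1] * beta[s + 1]) / scale[s + 1]
    gamma = alpha * beta
    xi_sum = np.zeros((k, k))
    for s in range(n - 1):
        xi_sum += np.outer(alpha[s], emit[s + 1] * beta[s + 1]) * a / scale[s + 1]
    return gamma, xi_sum, float(np.log(scale).sum())


def em_recombination_prob(emit: np.ndarray, nu: float = 0.1, iters: int = 50) -> float:
    """EM updates of nu with emissions fixed: expected switches / expected transitions."""
    for _ in range(iters):
        _, xi_sum, _ = forward_backward(emit, nu)
        nu = float((xi_sum.sum() - np.trace(xi_sum)) / xi_sum.sum())
    return nu


# Hypothetical emissions: topology 0 fits sites 0-49 better, topology 1 fits sites 50-99.
emit = np.vstack([np.tile([0.9, 0.1], (50, 1)), np.tile([0.1, 0.9], (50, 1))])
nu_hat = em_recombination_prob(emit)
gamma, _, loglik = forward_backward(emit, nu_hat)
print(round(nu_hat, 4), gamma[45:55, 1].round(2))
```

Because the single parameter nu is shared by all transitions, its M-step has a closed form: the expected number of topology switches divided by the number of adjacent site pairs.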

16.

Background  

Accurately modeling the sequence substitution process is required for the correct estimation of evolutionary parameters, be they phylogenetic relationships, substitution rates, or ancestral states; it is also crucial for simulating realistic data sets. Such simulations are needed to estimate the null distribution of complex statistics, an approach referred to as parametric bootstrapping, and are also used to test the quality of phylogenetic reconstruction programs. It has often been observed that homologous sequences can vary widely in their nucleotide or amino-acid compositions, revealing that the process of sequence evolution has changed substantially among lineages; such data may therefore be best approached with non-homogeneous models. Several programs implementing such models have been developed, but they are limited in their possibilities: only a few particular models are available for likelihood optimization, and data sets cannot easily be generated using the resulting estimated parameters.
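A non-homogeneous model lets the equilibrium composition differ between lineages, which a single stationary model cannot capture. A minimal sketch of simulating such data, assuming the F81 model only because its transition probabilities have a closed form, P_ij(t) = e^(-βt) δ_ij + (1 - e^(-βt)) π_j with β = 1 / (1 - Σ π_j²) so that t is in expected substitutions per site; one root sequence is evolved down two branches with hypothetical GC-rich and AT-rich base frequencies.

```python
import math
import random
from typing import Dict, List

BASES = "ACGT"


def f81_row(base: str, t: float, pi: Dict[str, float]) -> List[float]:
    """F81 transition probabilities from `base` over branch length t."""
    beta = 1.0 / (1.0 - sum(p * p for p in pi.values()))  # one substitution per unit t
    stay = math.exp(-beta * t)
    return [stay * (b == base) + (1.0 - stay) * pi[b] for b in BASES]


def evolve(seq: str, t: float, pi: Dict[str, float], rng: random.Random) -> str:
    """Evolve each site independently along one branch."""
    return "".join(rng.choices(BASES, weights=f81_row(base, t, pi))[0] for base in seq)


def gc_content(seq: str) -> float:
    return sum(base in "GC" for base in seq) / len(seq)


rng = random.Random(42)
root = "".join(rng.choice(BASES) for _ in range(10_000))  # uniform composition at the root
gc_rich = {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15}  # hypothetical lineage 1
at_rich = {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35}  # hypothetical lineage 2
left = evolve(root, 1.0, gc_rich, rng)
right = evolve(root, 1.0, at_rich, rng)
print(round(gc_content(root), 3), round(gc_content(left), 3), round(gc_content(right), 3))
```

The two descendant sequences drift toward different compositions from a common ancestor, the kind of pattern that, when fitted with a single homogeneous model, can group taxa by composition rather than ancestry.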

17.
In theory, codon models that account for the dependence of nucleotide substitutions between codon positions as well as differences between synonymous and non-synonymous changes best describe the sequence evolution in protein-coding genes. However, in practice we know little about the degree to which the assumptions of codon model-based estimates are violated, and how significant the resulting artifacts may be. In nucleotide-based phylogenies from first and second codon positions in a concatenated plastid gene data set, two distantly related taxa (dinoflagellate and haptophyte plastids) were robustly grouped together. This artifactual grouping is attributed to parallel heterogeneity in leucine (Leu) and serine (Ser) codon usage in the data set. Here, using this data set, we demonstrate that codon-based phylogenetic estimations are seriously biased, robustly uniting the dinoflagellate and haptophyte plastids into a monophyletic clade, when the model assumption of homogeneity of codon composition is violated. Our results suggest that similar phylogenetic artifacts may occur via codon usage heterogeneity in any amino acid in codon model-based estimations. We advise that homogeneity in codon usage across taxa in a data set be confirmed before codon model-based phylogenetic estimation is attempted.
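One rough way to act on the closing recommendation is a contingency-table test of codon usage across taxa. A minimal sketch computing a Pearson chi-square statistic for the six leucine codons, with hypothetical taxon names and counts; the statistic would then be compared with a chi-square distribution on the returned degrees of freedom (for example with a statistics library). Such a test treats codons as independent draws and ignores relatedness among taxa, so it is a screening aid rather than a definitive check.

```python
from typing import Dict, Tuple


def chi_square_homogeneity(counts: Dict[str, Dict[str, int]]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for a taxon x codon table."""
    taxa = list(counts)
    codons = sorted({c for row in counts.values() for c in row})
    row_tot = {t: sum(counts[t].get(c, 0) for c in codons) for t in taxa}
    col_tot = {c: sum(counts[t].get(c, 0) for t in taxa) for c in codons}
    grand = sum(row_tot.values())
    stat = 0.0
    for t in taxa:
        for c in codons:
            expected = row_tot[t] * col_tot[c] / grand
            if expected > 0:  # a codon absent everywhere contributes nothing
                stat += (counts[t].get(c, 0) - expected) ** 2 / expected
    dof = (len(taxa) - 1) * (len(codons) - 1)
    return stat, dof


# Hypothetical leucine codon counts for three taxa.
leu_counts = {
    "taxon_1": {"TTA": 30, "TTG": 12, "CTT": 10, "CTC": 4, "CTA": 6, "CTG": 3},
    "taxon_2": {"TTA": 28, "TTG": 14, "CTT": 9, "CTC": 5, "CTA": 7, "CTG": 2},
    "taxon_3": {"TTA": 5, "TTG": 6, "CTT": 12, "CTC": 15, "CTA": 4, "CTG": 25},
}
stat, dof = chi_square_homogeneity(leu_counts)
print(f"chi-square = {stat:.1f} on {dof} degrees of freedom")
```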

18.
The multispecies coalescent (MSC) model accommodates both species divergences and within-species coalescence and provides a natural framework for phylogenetic analysis of genomic data when the gene trees vary across the genome. The MSC model implemented in the program bpp assumes a molecular clock and the Jukes–Cantor model, and is suitable for analyzing genomic data from closely related species. Here we extend our implementation to more general substitution models and relaxed clocks to allow the rate to vary among species. The MSC-with-relaxed-clock model allows the estimation of species divergence times and ancestral population sizes using genomic sequences sampled from contemporary species when the strict clock assumption is violated, and provides a simulation framework for evaluating species tree estimation methods. We conducted simulations and analyzed two real datasets to evaluate the utility of the new models. We confirm that the clock-JC model is adequate for inference of shallow trees with closely related species, but it is important to account for clock violation for distant species. Our simulation suggests that there is valuable phylogenetic information in the gene-tree branch lengths even if the molecular clock assumption is seriously violated, and the relaxed-clock models implemented in bpp are able to extract such information. Our Markov chain Monte Carlo algorithms suffer from mixing problems when used for species tree estimation under the relaxed clock, and we discuss possible improvements. We conclude that the new models are currently most effective for estimating population parameters such as species divergence times when the species tree is fixed.
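A relaxed clock lets each branch have its own rate, so that a branch length in expected substitutions is rate × time. A minimal sketch of one common form, independent log-normal rates with mean 1 (achieved by setting the log-scale mean to -σ²/2); this illustrates the general idea with hypothetical branch durations and σ and is not bpp's implementation or its exact parameterization.

```python
import random
from typing import Dict, Hashable, Iterable


def independent_lognormal_rates(branches: Iterable[Hashable], sigma: float,
                                seed: int = 0) -> Dict[Hashable, float]:
    """Draw one rate per branch from a log-normal distribution with mean 1."""
    rng = random.Random(seed)
    mu = -0.5 * sigma ** 2  # E[rate] = exp(mu + sigma**2 / 2) = 1
    return {b: rng.lognormvariate(mu, sigma) for b in branches}


# Hypothetical branch durations on the rooted tree ((A,B),C), in units where the mean rate is 1.
durations = {"A": 0.02, "B": 0.02, "AB": 0.03, "C": 0.05}
rates = independent_lognormal_rates(durations, sigma=0.5)
lengths = {b: durations[b] * rates[b] for b in durations}  # substitutions per site
print({b: round(x, 4) for b, x in lengths.items()})
```

Under a strict clock the root-to-tip lengths would all equal 0.05; with rates drawn this way they differ, which is the signal a relaxed-clock model must separate from divergence times.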

19.
Liatrinae is a small subtribe of Eupatorieae that occurs in North America with a center of generic-level diversity in the southeastern United States. Molecular phylogenetic data were sought to assess whether two monotypic genera, Garberia and Hartwrightia, are accurately placed in the subtribe, and to resolve questions of the generic-level classification of Carphephorus. Phylogenetic analyses of nuclear ITS/ETS and plastid DNA data indicated that Garberia is the basalmost diverging lineage, and that Hartwrightia is phylogenetically embedded in the subtribe. There was significant incongruence between the ITS/ETS and plastid DNA datasets in the placement of Hartwrightia and another monotypic genus, Litrisa, suggesting that both are of hybrid origin. The results also showed that Carphephorus s.l. is not monophyletic, and even after removal of the two species of Trilisa, it is still paraphyletic to Liatris. The apparent hybrid origin of Hartwrightia, which is morphologically transgressive relative to its inferred parental lineages, suggests that reticulation between phylogenetically distinct lineages may be a recurrent problem for phylogenetic estimation in Asteraceae.

20.
Despite the introduction of likelihood-based methods for estimating phylogenetic trees from phenotypic data, parsimony remains the most widely used optimality criterion for building trees from discrete morphological data. However, it has been known for decades that there are regions of solution space in which parsimony is a poor estimator of tree topology. Numerous software implementations of likelihood-based models for the estimation of phylogeny from discrete morphological data exist, especially for the Mk model of discrete character evolution. Here we explore the efficacy of Bayesian estimation of phylogeny, using the Mk model, under conditions that are commonly encountered in paleontological studies. Using simulated data, we describe the relative performances of parsimony and the Mk model under a range of realistic conditions that include common scenarios of missing data and rate heterogeneity.
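Under the Mk model all k states are interchangeable, so transition probabilities have a closed form: with rates scaled to one expected change per unit branch length, P(same state) = 1/k + ((k - 1)/k) e^(-kt/(k-1)) and P(a specific other state) = 1/k - (1/k) e^(-kt/(k-1)). A minimal sketch of the likelihood of one morphological character on a fixed tree via Felsenstein's pruning algorithm, treating missing entries as compatible with every state; the tree, branch lengths, and states are hypothetical, and the ascertainment correction often applied to variable-only morphological data (as in Mkv) is omitted.

```python
import math
from typing import Dict, List, Optional, Tuple, Union

# A tree node is a leaf name or a list of (child, branch_length) pairs.
Node = Union[str, List[Tuple["Node", float]]]


def mk_prob(same: bool, t: float, k: int) -> float:
    """Mk transition probability with rates scaled to 1 expected change per unit time."""
    decay = math.exp(-k * t / (k - 1))
    return 1.0 / k + (k - 1) / k * decay if same else 1.0 / k - decay / k


def partials(node: Node, states: Dict[str, Optional[int]], k: int) -> List[float]:
    """Likelihood of the data below `node`, conditional on each state at `node`."""
    if isinstance(node, str):
        observed = states.get(node)
        if observed is None:  # missing entry ("?"): every state is compatible
            return [1.0] * k
        return [1.0 if s == observed else 0.0 for s in range(k)]
    result = [1.0] * k
    for child, t in node:
        child_l = partials(child, states, k)
        for s in range(k):
            result[s] *= sum(mk_prob(s == c, t, k) * child_l[c] for c in range(k))
    return result


def mk_likelihood(tree: Node, states: Dict[str, Optional[int]], k: int) -> float:
    """Single-character likelihood; the Mk stationary root distribution is uniform."""
    return sum(partials(tree, states, k)) / k


# Hypothetical tree ((A:0.1, B:0.2):0.05, C:0.3) and one binary character.
tree: Node = [([("A", 0.1), ("B", 0.2)], 0.05), ("C", 0.3)]
print(mk_likelihood(tree, {"A": 0, "B": 0, "C": 1}, k=2))
```

For k = 4 these formulas reduce to the Jukes-Cantor model for nucleotides.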


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417