Similar literature
 Found 20 similar articles (search time: 609 ms)
1.
Network models for sequence evolution   Total citations: 4 (self-citations: 0, citations by others: 4)
We introduce a general class of models for sequence evolution that includes network phylogenies. Networks, a generalization of strictly tree-like phylogenies, are proposed to model situations where multiple lineages contribute to the observed sequences. An algorithm to compute the probability distribution of binary character-state configurations is presented and statistical inference for this model is developed in a likelihood framework. A stepwise procedure based on likelihood ratios is used to explore the space of models. Starting with a star phylogeny, new splits (nontrivial bipartitions of the sequence set) are successively added to the model until no significant change in the likelihood is observed. A novel feature of our approach is that the new splits are not necessarily constrained to be consistent with a treelike mode of evolution. The fraction of invariable sites is estimated by maximum likelihood simultaneously with other model parameters and is essential to obtain a good fit to the data. The effect of finite sequence length on the inference methods is discussed. Finally, we provide an illustrative example using aligned VP1 genes from the foot and mouth disease viruses (FMDV). The different serotypes of the FMDV exhibit a range of treelike and network evolutionary relationships. Correspondence to: A. von Haeseler

2.
Speciation and extinction probabilities can be estimated from molecular phylogenies of extant species that are complete at the species level. Because only a fraction of published phylogenies is complete at the species level, methods have been developed to estimate speciation and extinction probabilities also from incomplete phylogenies. However, due to different estimation techniques, estimates from complete and incomplete phylogenies are difficult to compare statistically. Here I show with some examples how existing likelihood functions can be used to obtain Bayesian estimates of speciation and extinction probabilities, and how this approach is applied to both complete and incomplete phylogenies.

3.
The branching times of molecular phylogenies allow us to infer speciation and extinction dynamics even when fossils are absent. Troublingly, phylogenetic approaches usually return estimates of zero extinction, conflicting with fossil evidence. Phylogenies and fossils do agree, however, that there are often limits to diversity. Here, we present a general approach to evaluate the likelihood of a phylogeny under a model that accommodates diversity-dependence and extinction. We find, by likelihood maximization, that extinction is estimated most precisely if the rate of increase in the number of lineages in the phylogeny saturates towards the present or first decreases and then increases. We demonstrate the utility and limits of our approach by applying it to the phylogenies for two cases where a fossil record exists (Cetacea and Cenozoic macroperforate planktonic foraminifera) and to three radiations lacking fossil evidence (Dendroica, Plethodon and Heliconius). We propose that the diversity-dependence model with extinction be used as the standard model for macro-evolutionary dynamics because of its biological realism and flexibility.

4.
5.
We introduce a mechanism for analytically deriving upper bounds on the maximum likelihood for genetic sequence data on sets of phylogenies. A simple 'partition' bound is introduced for general models. Tighter bounds are developed for the simplest model of evolution, the two-state symmetric model of nucleotide substitution under the molecular clock. This follows earlier theoretical work which has been restricted to this model by analytic complexity. A weakness of current numerical computation is that reported 'maximum likelihood' results cannot be guaranteed, either for a specified tree (because of the possibility of multiple maxima) or over the full tree space (as the computation is intractable for large sets of trees). The bounds we develop here can be used to conclusively eliminate large proportions of tree space in the search for the maximum likelihood tree. This is vital in the development of a branch and bound search strategy for identifying the maximum likelihood tree. We report the results from a simulation study of approximately $10^6$ data sets generated on clock-like trees of five leaves. In each trial a likelihood value of one specific instance of a parameterised tree is compared to the bound determined for each of the 105 possible rooted binary trees. The proportion of trees that are eliminated from the search for the maximum likelihood tree ranged from 92% to almost 98%, indicating a computational speed-up factor of between 12 and 44.
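The tree count and speed-up figures quoted in entry 5 can be checked with a short calculation. The sketch below assumes the standard (2n − 3)!! formula for the number of labelled rooted binary trees, and assumes that search cost scales with the number of trees still to be evaluated. The function names are illustrative and do not come from the paper.

```python
def num_rooted_binary_trees(n_leaves: int) -> int:
    """Number of labelled rooted binary trees on n_leaves tips: (2n - 3)!!."""
    if n_leaves < 2:
        raise ValueError("need at least two leaves")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 3 (empty product for n = 2).
    for odd in range(3, 2 * n_leaves - 2, 2):
        count *= odd
    return count


def speedup_factor(fraction_eliminated: float) -> float:
    """Speed-up if a bound discards this fraction of trees before evaluation.

    Assumption: cost is proportional to the number of trees that remain.
    """
    if not 0.0 <= fraction_eliminated < 1.0:
        raise ValueError("fraction must lie in [0, 1)")
    return 1.0 / (1.0 - fraction_eliminated)


if __name__ == "__main__":
    print(num_rooted_binary_trees(5))      # 105
    print(round(speedup_factor(0.92), 1))  # 12.5
```

Under these assumptions, eliminating 92% of trees gives a factor of about 12.5, in line with the lower figure of 12, and a factor of 44 corresponds to eliminating roughly 97.7% of trees, which matches "almost 98%".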

6.
Restriction sites data can be analyzed by maximum likelihood to obtain estimates of phylogenies. The likelihood methods of Smouse and Li, who were able to compute likelihoods for up to four species under a simplified model of base change, can be extended numerically to deal with any number of species. The computational methods for doing so are outlined. The resulting algorithms are slow but take multiple gains and losses of restriction sites fully into account, unlike parsimony methods. They allow for the failure to observe potential sites that are absent from all species. Analysis of the five-species hominoid data of Ferris and coworkers confirms the pattern found by Smouse and Li with four species: a chimpanzee-gorilla clade is favored, but not statistically significantly over other tree topologies. A large data set produced by computer simulation has also been analyzed to confirm that the method works properly. The methods used here do not allow for different rates of transitions and transversions. They can be extended to do so, but only at a cost of considerably slower computations. The present method is available in a computer program.

7.
Evolutionary biologists have adopted simple likelihood models for purposes of estimating ancestral states and evaluating character independence on specified phylogenies; however, for purposes of estimating phylogenies by using discrete morphological data, maximum parsimony remains the only option. This paper explores the possibility of using standard, well-behaved Markov models for estimating morphological phylogenies (including branch lengths) under the likelihood criterion. An important modification of standard Markov models involves making the likelihood conditional on characters being variable, because constant characters are absent in morphological data sets. Without this modification, branch lengths are often overestimated, resulting in potentially serious biases in tree topology selection. Several new avenues of research are opened by an explicitly model-based approach to phylogenetic analysis of discrete morphological data, including combined-data likelihood analyses (morphology + sequence data), likelihood ratio tests, and Bayesian analyses.
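The conditioning described in entry 7 can be written compactly. As a sketch, with notation assumed here rather than taken from the abstract: let $P(x_i \mid \tau, \theta)$ be the ordinary Markov-model probability of character pattern $x_i$ on tree $\tau$ with parameters $\theta$, and let $P(\mathcal{C} \mid \tau, \theta)$ be the total probability of all patterns that are constant across taxa. The likelihood of a character, given that it is variable, is then

```latex
\[
  L_i^{\mathrm{var}}
  \;=\;
  P(x_i \mid \tau, \theta,\ x_i \text{ variable})
  \;=\;
  \frac{P(x_i \mid \tau, \theta)}{1 - P(\mathcal{C} \mid \tau, \theta)} .
\]
```

The denominator renormalizes over the patterns that could actually have been recorded. Without it, the absence of constant characters is read as evidence for long branches, which is consistent with the overestimation the abstract reports.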

8.
Speciation is not instantaneous but takes time. The protracted birth–death diversification model incorporates this fact and predicts the often observed slowdown of lineage accumulation toward the present. The mathematical complexity of the protracted speciation model barred estimation of its parameters until recently, when a method to compute the likelihood of phylogenetic branching times under this model was outlined (Lambert et al. 2014). Here, we implement this method and study using simulated phylogenies of extant species how well we can estimate the model parameters (rate of initiation of speciation, rate of extinction of incipient and good species, and rate of completion of speciation) as well as the duration of speciation, which is a combination of the aforementioned parameters. We illustrate our approach by applying it to a primate phylogeny. The simulations show that phylogenies often do not contain enough information to provide unbiased estimates of the speciation‐initiation rate and the extinction rate, but the duration of speciation can be estimated without much bias. The estimate of the duration of speciation for the primate clade is consistent with literature estimates. We conclude that phylogenies combined with the protracted speciation model provide a promising way to estimate the duration of speciation.

9.
Which conditions favour the evolution of hermaphroditism or separate sexes? One classical hypothesis states that an organism’s mode of locomotion (if any) when searching for a mate should influence breeding system evolution. We used published phylogenies to reconstruct evolutionary changes in adult mate‐search efficiency and breeding systems among multicellular organisms. Employing maximum‐likelihood analyses, we found that changes in adult mate‐search efficiency are significantly correlated with changes in breeding system, and this result is robust to uncertainties in the phylogenies. These data provide the first statistical support, across a broad range of taxa, for the hypothesis that breeding systems and mate‐search efficiency did not evolve independently. We discuss our results in context with other causal factors, such as inbreeding avoidance and sexual specialization, likely to affect breeding system evolution.

10.
Incomplete lineage sorting can cause incongruence between the phylogenetic history of genes (the gene tree) and that of the species (the species tree), which can complicate the inference of phylogenies. In this article, I present a new coalescent-based algorithm for species tree inference with maximum likelihood. I first describe an improved method for computing the probability of a gene tree topology given a species tree, which is much faster than an existing algorithm by Degnan and Salter (2005). Based on this method, I develop a practical algorithm that takes a set of gene tree topologies and infers species trees with maximum likelihood. This algorithm searches for the best species tree by starting from initial species trees and performing heuristic search to obtain better trees with higher likelihood. This algorithm, called STELLS (which stands for Species Tree InfErence with Likelihood for Lineage Sorting), has been implemented in a program that is downloadable from the author's web page. The simulation results show that the STELLS algorithm is more accurate than an existing maximum likelihood method for many datasets, especially when there is noise in gene trees. I also show that the STELLS algorithm is efficient and can be applied to real biological datasets.
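To make "the probability of a gene tree topology given a species tree" in entry 10 concrete, the smallest case can be worked by hand. The sketch below is not the STELLS algorithm. It uses the standard three-taxon coalescent result for a species tree ((A,B),C) with one sampled lineage per species, and assumes the internal branch length is measured in coalescent units. The function name and the dictionary keys are illustrative.

```python
import math


def three_taxon_gene_tree_probs(internal_branch: float) -> dict:
    """Gene tree topology probabilities for species tree ((A,B),C).

    internal_branch: length of the branch ancestral to A and B,
    in coalescent units (an assumption of this sketch).
    """
    if internal_branch < 0:
        raise ValueError("branch length must be non-negative")
    # Probability that the A and B lineages fail to coalesce on the internal branch.
    p_no_coalescence = math.exp(-internal_branch)
    # If they fail, all three lineages enter the root population and each
    # of the three possible pairs is equally likely to coalesce first.
    discordant = p_no_coalescence / 3.0
    return {
        "((A,B),C)": 1.0 - 2.0 * discordant,  # matches the species tree
        "((A,C),B)": discordant,
        "((B,C),A)": discordant,
    }


if __name__ == "__main__":
    for t in (0.0, 0.5, 2.0):
        print(t, three_taxon_gene_tree_probs(t))
```

With a very short internal branch the three topologies become nearly equiprobable, which is the incongruence that incomplete lineage sorting produces. A likelihood for a set of gene trees can then be formed as a product of such probabilities, one per observed topology.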

11.
The enormous diversity of Arthropoda has complicated attempts by systematists to deduce the history of this group in terms of phylogenetic relationships and phenotypic change. Traditional hypotheses regarding the relationships of the major arthropod groups (Chelicerata, Myriapoda, Crustacea, and Hexapoda) focus on suites of morphological characters, whereas phylogenomics relies on large amounts of molecular sequence data to infer evolutionary relationships. The present discussion is based on expressed sequence tags (ESTs) that provide large numbers of short molecular sequences and so provide an abundant source of sequence data for phylogenetic inference. This study presents well-supported phylogenies of diverse arthropod and metazoan outgroup taxa obtained from publicly-available databases. An in-house bioinformatics pipeline has been used to compile and align conserved orthologs from each taxon for maximum likelihood inferences. This approach resolves many currently accepted hypotheses regarding internal relationships between the major groups of Arthropoda, including monophyletic Hexapoda, Tetraconata (Crustacea + Hexapoda), Myriapoda, and Chelicerata sensu lato (Pycnogonida + Euchelicerata). "Crustacea" is a paraphyletic group with some taxa more closely related to the monophyletic Hexapoda. These results support studies that have utilized more restricted EST data for phylogenetic inference, yet they differ in important regards from recently published phylogenies employing nuclear protein-coding sequences. The present results do not, however, depart from other phylogenies that resolve Branchiopoda as the crustacean sister group of Hexapoda. Like other molecular phylogenies, EST-derived phylogenies alone are unable to resolve morphological convergences or evolved reversals and thus omit what may be crucial events in the history of life. 
For example, molecular data are unable to resolve whether a Hexapod-Branchiopod sister relationship infers a branchiopod-like ancestry of the Hexapoda, or whether this assemblage originates from a malacostracan-like ancestor, with the morphologically simpler Branchiopoda being highly derived. Whereas this study supports many internal arthropod relationships obtained by other sources of molecular data, other approaches are required to resolve such evolutionary scenarios. The approach presented here turns out to be essential: integrating results of molecular phylogenetics and neural cladistics to infer that Branchiopoda evolved simplification from a more elaborate ancestor. Whereas the phenomenon of evolved simplification may be widespread, it is largely invisible to molecular techniques unless these are performed in conjunction with morphology-based strategies.

12.
Whether there are ecological limits to species diversification is a hotly debated topic. Molecular phylogenies show slowdowns in lineage accumulation, suggesting that speciation rates decline with increasing diversity. A maximum‐likelihood (ML) method to detect diversity‐dependent (DD) diversification from phylogenetic branching times exists, but it assumes that diversity‐dependence is a global phenomenon and therefore ignores that the underlying species interactions are mostly local, and not all species in the phylogeny co‐occur locally. Here, we explore whether this ML method based on the nonspatial diversity‐dependence model can detect local diversity‐dependence, by applying it to phylogenies, simulated with a spatial stochastic model of local DD speciation, extinction, and dispersal between two local communities. We find that type I errors (falsely detecting diversity‐dependence) are low, and the power to detect diversity‐dependence is high when dispersal rates are not too low. Interestingly, when dispersal is high the power to detect diversity‐dependence is even higher than in the nonspatial model. Moreover, estimates of intrinsic speciation rate, extinction rate, and ecological limit strongly depend on dispersal rate. We conclude that the nonspatial DD approach can be used to detect diversity‐dependence in clades of species that live in not too disconnected areas, but parameter estimates must be interpreted cautiously.

13.
Allozyme data are widely used to infer the phylogenies of populations and closely-related species. Numerous parsimony, distance, and likelihood methods have been proposed for phylogenetic analysis of these data; the relative merits of these methods have been debated vigorously, but their accuracy has not been well explored. In this study, I compare the performance of 13 phylogenetic methods (six parsimony, six distance, and continuous maximum likelihood) by applying a congruence approach to eight allozyme data sets from the literature. Clades are identified that are supported by multiple data sets other than allozymes (e.g. morphology, DNA sequences), and the ability of different methods to recover these 'known' clades is compared. The results suggest that (1) distance and likelihood methods generally outperform parsimony methods, (2) methods that utilize frequency data tend to perform well, and (3) continuous maximum likelihood is among the most accurate methods, and appears to be robust to violations of its assumptions. These results are in agreement with those from recent simulation studies, and help provide a basis for empirical workers to choose among the many methods available for analysing allozyme characters.

14.
The estimation of diversification rates using phylogenetic data has attracted a lot of attention in the past decade. In this context, the analysis of incomplete phylogenies (e.g. phylogenies resolved at the family level but unresolved at the species level) has remained difficult. I present here a likelihood-based method to combine partly resolved phylogenies with taxonomic (species-richness) data to estimate speciation and extinction rates. This method is based on fitting a birth-and-death model to both phylogenetic and taxonomic data. Some examples of the method are presented with data on birds and on mammals. The method is compared with existing approaches that deal with incomplete phylogenies. Some applications and generalizations of the approach introduced in this paper are further discussed.

15.
The phylogenetic relationships of oceanic dolphins (family Delphinidae) remain unclear. Several works using mitochondrial and/or nuclear DNA on different genera and species have been published, though no consensus exists regarding even the subfamilies that make up the family. Here, a new phylogeny for the family Delphinidae, including 36 different complete mitochondrial genomes (plus two outgroups), was constructed under Bayesian and maximum likelihood approaches. Results indicate identical tree topology in both cases, with almost all nodes fully supported independently of the reconstruction approach. This topology is different from those previously published and proposes new phylogenetic relationships among subfamilies, genera and species of the family. These findings are critically important for the study of oceanic dolphin taxonomy, ecology, evolution and conservation, and highlight the importance of revisiting and resolving uncertain phylogenies.

16.
Model‐based approaches (e.g. maximum likelihood, Bayesian inference) are widely used with molecular data, where they might be more appropriate than maximum parsimony for estimating phylogenies under various models of molecular evolution. Recently, there has been an increase in the application of model‐based approaches with morphological (mainly fossil) data; however, there is some doubt as to the effectiveness of the model of morphological evolution. The input parameters (prior probabilities) for the model are unclear, particularly when concerned with unobserved character states. Despite this, some systematists are suggesting superiority of these model‐based methods over maximum parsimony based on, for example, increased resolution or, in the current study, the preferred phylogenetic placement of an iconic taxon. Here, we revisit a recently published analysis implying such superiority and document the discrepancies between parsimony‐based and model‐based approaches to phylogeny estimation. We find that although some taxa are shifted back to their “traditional” phylogenetic placement, other clades are disturbed. The model‐based phylogenies are better resolved; however, due to the lack of an appropriate model of morphological evolution, the increase in resolving power is probably not meaningful. Similarly, some of the preferred phylogenetic positions of taxa, particularly of labile taxa such as Archaeopteryx, are based solely on analyses employing maximum parsimony as the optimality criterion. Poor resolution and labile taxa indicate a need for further examination of the morphology and not a change in method.

17.
In this paper, I develop efficient tools to simulate trees with a fixed number of extant species. The tools are provided in my open source R-package TreeSim available on CRAN. The new model presented here is a constant rate birth-death process with mass extinction and/or rate shift events at arbitrarily fixed times 1) before the present or 2) after the origin. The simulation approach for case (2) can also be used to simulate under more general models with fixed events after the origin. I use the developed simulation tools for showing that a mass extinction event cannot be distinguished from a model with constant speciation and extinction rates interrupted by a phase of stasis based on trees consisting of only extant species. However, once we distinguish between mass extinction and period of stasis based on paleontological data, fast simulations of trees with a fixed number of species allow inference of speciation and extinction rates using approximate Bayesian computation and allow for robustness analysis once maximum likelihood parameter estimations are available.

18.
A central task in the study of molecular evolution is the reconstruction of a phylogenetic tree from sequences of current-day taxa. The most established approach to tree reconstruction is maximum likelihood (ML) analysis. Unfortunately, searching for the maximum likelihood phylogenetic tree is computationally prohibitive for large data sets. In this paper, we describe a new algorithm that uses Structural Expectation Maximization (EM) for learning maximum likelihood phylogenetic trees. This algorithm is similar to the standard EM method for edge-length estimation, except that during iterations of the Structural EM algorithm the topology is improved as well as the edge length. Our algorithm performs iterations of two steps. In the E-step, we use the current tree topology and edge lengths to compute expected sufficient statistics, which summarize the data. In the M-Step, we search for a topology that maximizes the likelihood with respect to these expected sufficient statistics. We show that searching for better topologies inside the M-step can be done efficiently, as opposed to standard methods for topology search. We prove that each iteration of this procedure increases the likelihood of the topology, and thus the procedure must converge. This convergence point, however, can be a suboptimal one. To escape from such "local optima," we further enhance our basic EM procedure by incorporating moves in the flavor of simulated annealing. We evaluate these new algorithms on both synthetic and real sequence data and show that for protein sequences even our basic algorithm finds more plausible trees than existing methods for searching maximum likelihood phylogenies. Furthermore, our algorithms are dramatically faster than such methods, enabling, for the first time, phylogenetic analysis of large protein data sets in the maximum likelihood framework.

19.
We would like to use maximum likelihood to estimate parameters such as the effective population size $N_e$ or, if we do not know mutation rates, the product $4N_e\mu$ of mutation rate per site and effective population size. To compute the likelihood for a sample of unrecombined nucleotide sequences taken from a random-mating population it is necessary to sum over all genealogies that could have led to the sequences, computing for each one the probability that it would have yielded the sequences, and weighting each one by its prior probability. The genealogies vary in tree topology and in branch lengths. Although the likelihood and the prior are straightforward to compute, the summation over all genealogies seems at first sight hopelessly difficult. This paper reports that it is possible to carry out a Monte Carlo integration to evaluate the likelihoods approximately. The method uses bootstrap sampling of sites to create data sets for each of which a maximum likelihood tree is estimated. The resulting trees are assumed to be sampled from a distribution whose height is proportional to the likelihood surface for the full data. That it will be so is dependent on a theorem which is not proven, but seems likely to be true if the sequences are not short. One can use the resulting estimated likelihood curve to make a maximum likelihood estimate of the parameter of interest, $N_e$ or $4N_e\mu$. The method requires at least 100 times the computational effort required for estimation of a phylogeny by maximum likelihood, but is practical on today's workstations. The method does not at present have any way of dealing with recombination.
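The "bootstrap sampling of sites" step in entry 19 can be illustrated in isolation. The sketch below covers only the resampling of alignment columns with replacement. It does not estimate trees or evaluate likelihoods, and the function name, the toy alignment and the use of a seeded generator are assumptions made for the example.

```python
import random


def bootstrap_sites(alignment, rng):
    """Return one bootstrap replicate of an alignment.

    alignment: list of equal-length sequence strings (rows are taxa).
    rng: a random.Random instance, so that replicates are reproducible.
    Columns (sites) are drawn with replacement; rows stay aligned.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    n_sites = len(alignment[0])
    if any(len(seq) != n_sites for seq in alignment):
        raise ValueError("sequences must have equal length")
    # Draw n_sites column indices with replacement.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(seq[c] for c in columns) for seq in alignment]


if __name__ == "__main__":
    toy_alignment = ["ACGTAC", "ACGAAC", "TCGAAT"]  # hypothetical data
    rng = random.Random(42)
    for _ in range(3):
        print(bootstrap_sites(toy_alignment, rng))
```

In the procedure the abstract describes, each replicate would then be passed to a maximum likelihood tree search, and the resulting collection of trees would serve as the sample over which the Monte Carlo integration is carried out.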

20.
Likelihood methods for detecting temporal shifts in diversification rates   Total citations: 8 (self-citations: 0, citations by others: 8)
Maximum likelihood is a potentially powerful approach for investigating the tempo of diversification using molecular phylogenetic data. Likelihood methods distinguish between rate-constant and rate-variable models of diversification by fitting birth-death models to phylogenetic data. Because model selection in this context is a test of the null hypothesis that diversification rates have been constant over time, strategies for selecting best-fit models must minimize Type I error rates while retaining power to detect rate variation when it is present. Here I examine model selection, parameter estimation, and power to reject the null hypothesis using likelihood models based on the birth-death process. The Akaike information criterion (AIC) has often been used to select among diversification models; however, I find that selecting models based on the lowest AIC score leads to a dramatic inflation of the Type I error rate. When appropriately corrected to reduce Type I error rates, the birth-death likelihood approach performs as well as or better than the widely used gamma statistic, at least when diversification rates have shifted abruptly over time. Analyses of datasets simulated under a range of rate-variable diversification scenarios indicate that the birth-death likelihood method has much greater power to detect variation in diversification rates when extinction is present. Furthermore, this method appears to be the only approach available that can distinguish between a temporal increase in diversification rates and a rate-constant model with nonzero extinction. I illustrate use of the method by analyzing a published phylogeny for Australian agamid lizards.
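The AIC comparison discussed in entry 20 rests on a one-line formula, AIC = 2k − 2 ln L, where k is the number of free parameters and ln L the maximized log-likelihood. The sketch below shows that formula together with one possible correction for Type I error: comparing the observed AIC difference with differences obtained from trees simulated under the rate-constant null. That calibration step is an assumption of this example, since the abstract does not say how the correction is made, and all function names and numbers are hypothetical.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L."""
    return 2.0 * n_params - 2.0 * log_likelihood


def delta_aic(loglik_constant: float, k_constant: int,
              loglik_variable: float, k_variable: int) -> float:
    """AIC(rate-constant) minus AIC(rate-variable).

    Positive values mean the rate-variable model has the lower AIC.
    """
    return aic(loglik_constant, k_constant) - aic(loglik_variable, k_variable)


def empirical_p_value(observed_delta: float, null_deltas) -> float:
    """Share of null-simulated differences at least as large as the observed one.

    null_deltas would come from fitting both models to trees simulated
    under the rate-constant model (not done here). The +1 terms keep the
    estimate away from zero.
    """
    null_deltas = list(null_deltas)
    exceed = sum(1 for d in null_deltas if d >= observed_delta)
    return (1 + exceed) / (1 + len(null_deltas))


if __name__ == "__main__":
    # Hypothetical log-likelihoods for a one-parameter and a two-parameter model.
    observed = delta_aic(-10.0, 1, -7.0, 2)
    print(observed)
    print(empirical_p_value(observed, [0.0, 1.0, 2.0, 6.0]))
```

Accepting the rate-variable model whenever the difference is positive is the rule the abstract warns against. Requiring the difference to be extreme relative to the null simulations is one way to keep the Type I error rate near a chosen level.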
