Similar Articles
20 similar articles found; search time: 203 ms
1.
The rate at which a given site in a gene sequence alignment evolves over time may vary. This phenomenon, known as heterotachy, can bias or distort phylogenetic trees inferred from models of sequence evolution that assume rates of evolution are constant. Here, we describe a phylogenetic mixture model designed to accommodate heterotachy. The method sums the likelihood of the data at each site over more than one set of branch lengths on the same tree topology. A branch-length set that is best for one site may differ from the branch-length set that is best for some other site, thereby allowing different sites to have different rates of change throughout the tree. Because rate variation may not be present in all branches, we use a reversible-jump Markov chain Monte Carlo algorithm to identify those branches in which reliable amounts of heterotachy occur. We implement the method in combination with our 'pattern-heterogeneity' mixture model, applying it to simulated data and five published datasets. We find that complex evolutionary signals of heterotachy are routinely present over and above variation in the rate or pattern of evolution across sites, that the reversible-jump method requires far fewer parameters than conventional mixture models to describe it, and that it serves to identify the regions of the tree in which heterotachy is most pronounced. The reversible-jump procedure also removes the need for a posteriori tests of 'significance' such as the Akaike or Bayesian information criterion tests, or Bayes factors. Heterotachy has important consequences for the correct reconstruction of phylogenies as well as for tests of hypotheses that rely on accurate branch-length information. These include molecular clocks, analyses of tempo and mode of evolution, comparative studies and ancestral state reconstruction. The model is available from the authors' website, and can be used for the analysis of both nucleotide and morphological data.
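The core idea of summing each site's likelihood over several branch-length sets on one fixed topology can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a star tree, the Jukes-Cantor (JC69) substitution model, fixed mixture weights and hypothetical function names, and it omits the reversible-jump search over which branches carry heterotachy.

```python
import math
from typing import Sequence

BASES = "ACGT"

def jc69_prob(a: str, b: str, t: float) -> float:
    """JC69 probability that base a becomes base b along a branch of length t
    (t in expected substitutions per site)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e

def star_site_likelihood(site: str, lengths: Sequence[float]) -> float:
    """Likelihood of one alignment column on a star tree: sum over the unobserved
    root state (uniform base frequencies) of the product of branch probabilities.
    site[i] is the state at leaf i, whose branch has length lengths[i]."""
    total = 0.0
    for root in BASES:
        p = 0.25
        for leaf_state, t in zip(site, lengths):
            p *= jc69_prob(root, leaf_state, t)
        total += p
    return total

def mixture_site_likelihood(site: str, length_sets, weights) -> float:
    """Same topology, several branch-length sets (hypothetical helper):
    the site likelihood is the weighted sum over the sets."""
    return sum(w * star_site_likelihood(site, ls) for ls, w in zip(length_sets, weights))

def mixture_log_likelihood(columns, length_sets, weights) -> float:
    """Log-likelihood of an alignment given as a list of columns."""
    return sum(math.log(mixture_site_likelihood(c, length_sets, weights)) for c in columns)
```

For example, `mixture_log_likelihood(["AAA", "AGA"], [(0.1, 0.1, 0.1), (0.1, 0.8, 0.1)], [0.6, 0.4])` scores two columns under two hypothetical branch-length sets, the second with a fast middle branch; a site that fits one set poorly can still be explained well by the other.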

2.
Heterotachy occurs when the relative evolutionary rates among sites are not the same across lineages. Sequence alignments are likely to exhibit heterotachy with varying severity because the intensity of purifying selection and adaptive forces at a given amino acid or DNA sequence position is unlikely to be the same in different species. In a recent study, the influence of heterotachy on the performance of different phylogenetic methods was examined using computer simulation for a four-species phylogeny. Maximum parsimony (MP) was reported to generally outperform maximum likelihood (ML). However, our comparisons of MP and ML methods using the methods and evaluation criteria employed in that study, but considering the possible range of proportions of sites involved in heterotachy, contradict their findings and indicate that, in fact, ML is significantly superior to MP even under heterotachy.

3.
We conducted a simulation study of the phylogenetic methods UPGMA, neighbor joining, maximum parsimony, and maximum likelihood for a five-taxon tree under a molecular clock. The parameter space included a small region where maximum parsimony is inconsistent, so we tested inconsistency correction for parsimony and distance correction for neighbor joining. As expected, corrected parsimony was consistent. For these data, maximum likelihood with the clock assumption outperformed each of the other methods tested. The distance-based methods performed marginally better than did maximum parsimony and maximum likelihood without the clock assumption. Data correction was generally detrimental to accuracy, especially for short sequence lengths. We identified another region of the parameter space where, although consistent for a given method, some incorrect trees were each selected with up to twice the frequency of the correct (generating) tree for sequences of bounded length. These incorrect trees are those where the outgroup has been incorrectly placed. In addition to this problem, the placement of the outgroup sequence can have a confounding effect on the ingroup tree, whereby the ingroup is correct when using the ingroup sequences alone, but with the inclusion of the outgroup the ingroup tree becomes incorrect.
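UPGMA, the simplest of the methods compared, can be sketched in a few lines. The sketch below (hypothetical function name, naive implementation, ties broken arbitrarily) repeatedly joins the closest pair of clusters and averages distances weighted by cluster size. Each new node is placed at half the distance between the clusters it joins, so the output tree is ultrametric, matching the molecular-clock setting of these simulations.

```python
from typing import List

def upgma(names: List[str], dist: List[List[float]]) -> str:
    """UPGMA clustering of a symmetric distance matrix; returns a Newick
    string with ultrametric branch lengths."""
    # cluster id -> (Newick label, number of leaves, node height)
    clusters = {i: (names[i], 1, 0.0) for i in range(len(names))}
    # distances keyed by (smaller id, larger id)
    d = {(i, j): dist[i][j] for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        (i, j), dij = min(d.items(), key=lambda kv: kv[1])  # closest pair
        (li, ni, hi), (lj, nj, hj) = clusters.pop(i), clusters.pop(j)
        h = dij / 2.0  # height of the new node
        label = f"({li}:{h - hi:g},{lj}:{h - hj:g})"
        # size-weighted average distance from the merged cluster to every other one
        new = {k: (ni * d[min(i, k), max(i, k)] + nj * d[min(j, k), max(j, k)]) / (ni + nj)
               for k in clusters}
        d = {key: v for key, v in d.items() if i not in key and j not in key}
        for k, v in new.items():
            d[k, next_id] = v  # existing ids are always smaller than next_id
        clusters[next_id] = (label, ni + nj, h)
        next_id += 1
    (label, _, _), = clusters.values()
    return label + ";"
```

For a four-taxon matrix with A-B = 2, C-D = 4 and all other pairs = 6, `upgma(["A", "B", "C", "D"], ...)` returns `((A:1,B:1):2,(C:2,D:2):1);`.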

4.
Despite advances in our understanding of molecular evolution, current phylogenetic methods account for only a fraction of its complexity. We are chiefly constrained by our incomplete knowledge of molecular evolutionary processes and the limits of computational power. These limitations lead either to biologically simplistic models that capture little of the complexity involved or to overfitted models that add little resolution to the problem. Such oversimplified models may lead us to assign high confidence to an incorrect tree (inconsistency). Rate-across-site (RAS) models are commonly used in phylogenetic studies. They account for heterogeneity in evolutionary rates among sites but not for changes in within-site rates across lineages (heterotachy). If heterotachy is common, using RAS models may lead to systematic errors in tree inference. In this work we use maximum likelihood to show how tree inference can be misled when the assumption of constant within-site rates across lineages is violated. Using a simulation study, we explore the ways in which gamma stationary models can lead to a wrong topology or to deceptive bootstrap support values when within-site rates change across lineages. More precisely, we show that different degrees of heterotachy mislead phylogenetic inference when the model assumed is stationary. Finally, we propose a geometry-based approach to visualize and to test for the possible existence of bias due to heterotachy.

5.
Likelihood and Inconsistency   (total citations: 2; self-citations: 1; citations by others: 2)
Likelihood advocates often say that parsimony can be inconsistent but maximum likelihood cannot. This difference and conclusions drawn from it have provided the main reasons advanced by likelihoodists against the use of parsimony. Recent statistical research, however, shows that maximum likelihood estimation of phylogenetic trees can become inconsistent in all but the simplest cases, so that under realistic conditions the consistency of maximum likelihood cannot be assured. If likelihoodists wish to dispose of parsimony, they will have to find another argument.

6.
Kluge's (2001, Syst. Biol. 50:322-330) continued arguments that phylogenetic methods based on the statistical principle of likelihood are incompatible with the philosophy of science described by Karl Popper are based on false premises related to Kluge's misrepresentations of Popper's philosophy. Contrary to Kluge's conjectures, likelihood methods are not inherently verificationist; they do not treat every instance of a hypothesis as confirmation of that hypothesis. The historical nature of phylogeny does not preclude phylogenetic hypotheses from being evaluated using the probability of evidence. The low absolute probabilities of hypotheses are irrelevant to the correct interpretation of Popper's concept termed degree of corroboration, which is defined entirely in terms of relative probabilities. Popper did not advocate minimizing background knowledge; in any case, the background knowledge of both parsimony and likelihood methods consists of the general assumption of descent with modification and additional assumptions that are deterministic, concerning which tree is considered most highly corroborated. Although parsimony methods do not assume (in the sense of entailing) that homoplasy is rare, they do assume (in the sense of requiring to obtain a correct phylogenetic inference) certain things about patterns of homoplasy. Both parsimony and likelihood methods assume (in the sense of implying by the manner in which they operate) various things about evolutionary processes, although violation of those assumptions does not always cause the methods to yield incorrect phylogenetic inferences. Test severity is increased by sampling additional relevant characters rather than by character reanalysis, although either interpretation is compatible with the use of phylogenetic likelihood methods. 
Neither parsimony nor likelihood methods assess test severity (critical evidence) when used to identify a most highly corroborated tree(s) based on a single method or model and a single body of data; however, both classes of methods can be used to perform severe tests. The assumption of descent with modification is insufficient background knowledge to justify cladistic parsimony as a method for assessing degree of corroboration. Invoking equivalency between parsimony methods and likelihood models that assume no common mechanism emphasizes the necessity of additional assumptions, at least some of which are probabilistic in nature. Incongruent characters do not qualify as falsifiers of phylogenetic hypotheses except under extremely unrealistic evolutionary models; therefore, justifications of parsimony methods as falsificationist based on the idea that they minimize the ad hoc dismissal of falsifiers are questionable. Probabilistic concepts such as degree of corroboration and likelihood provide a more appropriate framework for understanding how phylogenetics conforms with Popper's philosophy of science. Likelihood ratio tests do not assume what is at issue but instead are methods for testing hypotheses according to an accepted standard of statistical significance and for incorporating considerations about test severity. These tests are fundamentally similar to Popper's degree of corroboration in being based on the relationship between the probability of the evidence e in the presence versus absence of the hypothesis h, i.e., between p(e|hb) and p(e|b), where b is the background knowledge. 
Both parsimony and likelihood methods are inductive in that their inferences (particular trees) contain more information than (and therefore do not follow necessarily from) the observations upon which they are based; however, both are deductive in that their conclusions (tree lengths and likelihoods) follow necessarily from their premises (particular trees, observed character state distributions, and evolutionary models). For these and other reasons, phylogenetic likelihood methods are highly compatible with Karl Popper's philosophy of science and offer several advantages over parsimony methods in this context.
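For reference, Popper's degree of corroboration is commonly written in the notation used above as shown below. Since p(eh|b) = p(e|hb) p(h|b), the denominator equals p(e|hb)[1 - p(h|b)] + p(e|b), which is never negative, so the sign of C is set by the contrast between p(e|hb) and p(e|b) that the abstract emphasizes.

```latex
C(h, e, b) = \frac{p(e \mid hb) - p(e \mid b)}{p(e \mid hb) - p(eh \mid b) + p(e \mid b)}
```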

7.
The nature of heterotachy at the center of recent controversy over the relative performance of tree-building methods is different from the form of heterotachy that has been inferred in empirical studies. The latter have suggested that proportions of variable sites (p(var)) vary among orthologues and among paralogues. However, the strength of this inference, describing what may be one of the most important evolutionary properties of sequence data, has remained weak. Consequently, other models of sequence evolution have been proposed to explain some long-branch attraction (LBA) problems that could be attributed to differences in p(var). For an empirical case with plastid and eubacterial RNA polymerase sequences, we confirm using capture-recapture estimates and simulations that p(var) can differ among orthologues in anciently diverged evolutionary lineages. We find that parsimony and a least squares distance method that implements an overly simple model of sequence evolution are susceptible to LBA induced by this form of heterotachy. Although homogeneous maximum likelihood inference was found to be robust to model misspecification in our specific example, we caution against assuming that it will always be so.

8.
Phylogenetic analyses of DNA sequences were conducted to evaluate four alternative hypotheses of phrynosomatine sand lizard relationships. Sequences comprising 2871 aligned base pair positions representing the regions spanning ND1-COI and cyt b-tRNA(Thr) of the mitochondrial genome from all recognized sand lizard species were analyzed using unpartitioned parsimony and likelihood methods, likelihood methods with assumed partitions, Bayesian methods with assumed partitions, and Bayesian mixture models. The topology (Uma, (Callisaurus, (Cophosaurus, Holbrookia))) and thus monophyly of the "earless" taxa, Cophosaurus and Holbrookia, is supported by all analyses. Previously proposed topologies in which Uma and Callisaurus are sister taxa and those in which Holbrookia is the sister group to all other sand lizard taxa are rejected using both parsimony and likelihood-based significance tests with the combined, unpartitioned data set. Bayesian hypothesis tests also reject those topologies using six assumed partitioning strategies, and the two partitioning strategies presumably associated with the most powerful tests also reject a third previously proposed topology, in which Callisaurus and Cophosaurus are sister taxa. For both maximum likelihood and Bayesian methods with assumed partitions, those partitions defined by codon position and tRNA stem and nonstems explained the data better than other strategies examined. Bayes factor estimates comparing results of assumed partitions versus mixture models suggest that mixture models perform better than assumed partitions when the latter were not based on functional characteristics of the data, such as codon position and tRNA stem and nonstems. However, assumed partitions performed better than mixture models when functional differences were incorporated.
We reiterate the importance of accounting for heterogeneous evolutionary processes in the analysis of complex data sets and emphasize the importance of implementing mixed model likelihood methods.

9.
Parsimony, likelihood, and simplicity   (total citations: 2; self-citations: 1; citations by others: 1)
The latest charge against parsimony in phylogenetic inference is that it involves estimating too many parameters. The charge is derived from the fact that, when each character is allowed a branch length vector of its own (instead of the homogeneous branch lengths assumed in current likelihood models), the results for likelihood and parsimony are identical. Parsimony, however, can also be derived from simpler models, involving fewer parameters. Therefore, parsimony provides (as many authors had argued before) the simplest explanation of the data, or the most realistic, depending on one's views. If (as argued by likelihoodists) phylogenetic inference is to use the simplest model that provides sufficient explanation of the data, the starting point of phylogenetic analyses should be parsimony, not maximum likelihood. If the addition of new parameters (which increase the likelihood) to a parsimony estimation is seen as desirable, this may lead to a preference for results based on current likelihood models. If the addition of parameters is continued, however, the results will eventually come back to the same place where they had started, since allowing each character a branch length of its own also produces parsimony. Parsimony can be justified by very different types of models—either very complex or very simple. This suggests that parsimony does have a unique place among methods of phylogenetic estimation.

10.
The present paper is mainly concerned with homology assessment through phylogenetic analyses. It raises a fundamental question: What are the epistemological differences between modern parsimony and model-based analyses in relation to homology assessment and phylogenetic inference? Although these methods usually achieve concordant topological results, they may generate discordant inferences of character evolution from the same datasets. This indicates that method selection has serious implications for evolutionary scenarios and classificatory arrangements. Although parsimony and model-based approaches both use the Hennigian concepts of monophyly and synapomorphy, they employ different epistemological ways of dealing with the monophyly/synapomorphy relationship. Regardless of their differences, these analyses should take into account all relevant evidence in support of the phylogenetic inferences. A focus on morphological homologues means that they must be included in data matrices, evaluated as part of the phylogenetic analysis, and not ignored in the calculation of tree length(s) (parsimony), likelihood values (maximum likelihood), and posterior probabilities (Bayes).

11.
Phylogenetic analysis of large datasets using complex nucleotide substitution models under a maximum likelihood framework can be computationally infeasible, especially when attempting to infer confidence values by way of nonparametric bootstrapping. Recent developments in phylogenetics suggest the computational burden can be reduced by using Bayesian methods of phylogenetic inference. However, few empirical phylogenetic studies exist that explore the efficiency of Bayesian analysis of large datasets. To this end, we conducted an extensive phylogenetic analysis of the wide-ranging and geographically variable Eastern Fence Lizard (Sceloporus undulatus). Maximum parsimony, maximum likelihood, and Bayesian phylogenetic analyses were performed on a combined mitochondrial DNA dataset (12S and 16S rRNA, ND1 protein-coding gene, and associated tRNA; 3,688 bp total) for 56 populations of S. undulatus (78 total terminals including other S. undulatus group species and outgroups). Maximum parsimony analysis resulted in numerous equally parsimonious trees (82,646 from equally weighted parsimony and 335 from weighted parsimony). The majority rule consensus tree derived from the Bayesian analysis was topologically identical to the single best phylogeny inferred from the maximum likelihood analysis, but required approximately 80% less computational time. The mtDNA data provide strong support for the monophyly of the S. undulatus group and the paraphyly of "S. undulatus" with respect to S. belli, S. cautus, and S. woodi. Parallel evolution of ecomorphs within "S. undulatus" has masked the actual number of species within this group. This evidence, along with convincing patterns of phylogeographic differentiation, suggests "S. undulatus" represents at least four lineages that should be recognized as evolutionary species.

12.
We develop a new approach to estimate a matrix of pairwise evolutionary distances from a codon-based alignment based on a codon evolutionary model. The method first computes a standard distance matrix for each of the three codon positions. Then these three distance matrices are weighted according to an estimate of the global evolutionary rate of each codon position and averaged into a unique distance matrix. Using a large set of both real and simulated codon-based alignments of nucleotide sequences, we show that this approach leads to distance matrices that have a significantly better treelikeness compared to those obtained by standard nucleotide evolutionary distances. We also propose an alternative weighting to eliminate the part of the noise often associated with some codon positions, particularly the third position, which is known to induce a fast evolutionary rate. Simulation results show that fast distance-based tree reconstruction algorithms on distance matrices based on this codon position weighting can lead to phylogenetic trees that are at least as accurate as, if not more accurate than, those inferred by maximum likelihood. Finally, a well-known multigene dataset composed of eight yeast species and 106 codon-based alignments is reanalyzed and shows that our codon evolutionary distances allow building a phylogenetic tree which is similar to those obtained by non-distance-based methods (e.g., maximum parsimony and maximum likelihood) and also significantly improved compared to standard nucleotide evolutionary distance estimates.
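The general scheme (one distance matrix per codon position, then a weighted average) can be sketched as below. This is a minimal sketch under stated assumptions: it uses uncorrected p-distances and caller-supplied weights, and it does not reproduce the authors' rate-based weighting or their substitution-model corrections. Function names and example weights are hypothetical.

```python
from typing import List, Sequence

def p_distance(a: str, b: str) -> float:
    """Uncorrected distance: proportion of aligned sites at which a and b differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def codon_position_matrices(seqs: Sequence[str]) -> List[List[List[float]]]:
    """Return three p-distance matrices, one per codon position (1st, 2nd, 3rd)."""
    if any(len(s) != len(seqs[0]) or len(s) % 3 for s in seqs):
        raise ValueError("sequences must be aligned and a whole number of codons long")
    n = len(seqs)
    mats = []
    for pos in range(3):
        sub = [s[pos::3] for s in seqs]  # every third site, starting at this codon position
        mats.append([[p_distance(sub[i], sub[j]) for j in range(n)] for i in range(n)])
    return mats

def weighted_codon_distance(seqs: Sequence[str], weights: Sequence[float]) -> List[List[float]]:
    """Weighted average of the three per-position matrices (weights are hypothetical inputs)."""
    mats = codon_position_matrices(seqs)
    total = float(sum(weights))
    n = len(seqs)
    return [[sum(w * m[i][j] for w, m in zip(weights, mats)) / total
             for j in range(n)] for i in range(n)]
```

Setting the weights to `(1, 1, 0)` discards third positions entirely, a crude stand-in for the noise-reducing weighting mentioned in the abstract; a rate-based scheme would instead derive the weights from per-position rate estimates.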

13.
Comparative restriction site mapping of the chloroplast genome was performed to examine phylogenetic relationships among 27 species representing 16 genera of the Berberidaceae and two outgroups. Chloroplast genomes of the species included in this study showed no major structural rearrangements (i.e., they are collinear to tobacco cpDNA) except for the extension of the inverted repeat in species of Berberis and Mahonia. Excluding several regions that exhibited severe length variation, a total of 501 phylogenetically informative sites was mapped for ten restriction enzymes. The strict consensus tree of 14 equally parsimonious trees indicated that some berberidaceous genera (Berberis, Mahonia, Diphylleia) are not monophyletic. To explore phylogenetic utility of different parsimony methods phylogenetic trees were generated using Wagner, Dollo, and weighted parsimony for a reduced data set that included 18 species. One of the most significant results was the recognition of the four chromosomal groups, which were strongly supported regardless of the parsimony method used. The most notable difference among the trees produced by the three parsimony methods was the relationships among the four chromosomal groups. The cpDNA trees also strongly supported a close relationship of several generic pairs (e.g., Berberis-Mahonia, Epimedium-Vancouveria, etc.). Maximum likelihood values were computed for the four different tree topologies of the chromosomal groups, two Wagner, one Dollo, and one weighted topology. The results indicate that the weighted tree has the highest likelihood value. The lowest likelihood value was obtained for the Dollo tree, which had the highest bootstrap and decay values. Separate analyses using only the Inverted Repeat (IR) region resulted in a tree that is identical to the weighted tree. 
Poor resolution and/or support for the relationships among the four chromosomal lineages of the Berberidaceae indicate that they may have radiated from an ancestral stock in a relatively short evolutionary time.

14.
The increase in the number of large data sets and the complexity of current probabilistic sequence evolution models necessitates fast and reliable phylogeny reconstruction methods. We describe a new approach, based on the maximum-likelihood principle, which clearly satisfies these requirements. The core of this method is a simple hill-climbing algorithm that adjusts tree topology and branch lengths simultaneously. This algorithm starts from an initial tree built by a fast distance-based method and modifies this tree to improve its likelihood at each iteration. Due to this simultaneous adjustment of the topology and branch lengths, only a few iterations are sufficient to reach an optimum. We used extensive and realistic computer simulations to show that the topological accuracy of this new method is at least as high as that of the existing maximum-likelihood programs and much higher than the performance of distance-based and parsimony approaches. The reduction of computing time is dramatic in comparison with other maximum-likelihood packages, while the likelihood maximization ability tends to be higher. For example, only 12 min were required on a standard personal computer to analyze a data set consisting of 500 rbcL sequences with 1,428 base pairs from plant plastids, thus reaching a speed of the same order as some popular distance-based and parsimony algorithms. This new method is implemented in the PHYML program, which is freely available on our web page: http://www.lirmm.fr/w3ifa/MAAS/.

15.
Quantifying branch support using the bootstrap and/or jackknife is generally considered to be an essential component of rigorous parsimony and maximum likelihood phylogenetic analyses. Previous authors have described how application of the frequency-within-replicates approach to treating multiple equally optimal trees found in a given bootstrap pseudoreplicate can provide apparent support for otherwise unsupported clades. We demonstrate how a similar problem may occur when a non-representative subset of equally optimal trees are held per pseudoreplicate, which we term the undersampling-within-replicates artifact. We illustrate the frequency-within-replicates and undersampling-within-replicates bootstrap and jackknife artifacts using both contrived and empirical examples, demonstrate that the artifacts can occur in both parsimony and likelihood analyses, and show that the artifacts occur in outputs from multiple different phylogenetic-inference programs. Based on our results, we make the following five recommendations, which are particularly relevant to supermatrix analyses, but apply to all phylogenetic analyses. First, when two or more optimal trees are found in a given pseudoreplicate they should be summarized using the strict-consensus rather than frequency-within-replicates approach. Second, jackknife resampling should be used rather than bootstrap resampling. Third, multiple tree searches while holding multiple trees per search should be conducted in each pseudoreplicate rather than conducting only a single search and holding only a single tree. Fourth, branches with a minimum possible optimized length of zero should be collapsed within each tree search rather than collapsing branches only if their maximum possible optimized length is zero. Fifth, resampling values should be mapped onto the strict consensus of all optimal trees found rather than simply presenting the ≥ 50% bootstrap or jackknife tree or mapping the resampling values onto a single optimal tree.
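The difference between the frequency-within-replicates approach and the recommended strict-consensus approach can be illustrated with trees represented as sets of clades. This is a hypothetical sketch, not code from any of the programs tested: tree searching and resampling are omitted, and each pseudoreplicate is given simply as its (non-empty) list of equally optimal trees.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set

Clade = FrozenSet[str]
Tree = Set[Clade]  # a tree represented by the set of clades it contains

def strict_consensus(trees: List[Tree]) -> Tree:
    """Clades present in every one of the equally optimal trees."""
    result = set(trees[0])
    for t in trees[1:]:
        result &= t
    return result

def support_strict(replicates: List[List[Tree]]) -> Dict[Clade, float]:
    """Recommended approach: each pseudoreplicate contributes its strict consensus."""
    counts = Counter()
    for trees in replicates:
        counts.update(strict_consensus(trees))
    return {c: k / len(replicates) for c, k in counts.items()}

def support_frequency_within(replicates: List[List[Tree]]) -> Dict[Clade, float]:
    """Frequency-within-replicates: each of the n optimal trees in a replicate counts 1/n."""
    counts = Counter()
    for trees in replicates:
        for t in trees:
            for c in t:
                counts[c] += 1.0 / len(trees)
    return {c: k / len(replicates) for c, k in counts.items()}
```

If every pseudoreplicate yields two equally optimal trees, one containing clade AB and the other clade AC, the frequency-within-replicates approach assigns each clade 50% support although no replicate resolves either, whereas the strict-consensus approach assigns neither any support.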

16.
A comparative study of the accuracy of three different approaches to phylogenetic estimation was made on simulated data with differing rates of change in different lineages. The three approaches were maximum likelihood, maximum parsimony, and phenetic clustering. The data were generated by simulating genetic drift with different population sizes over a simple four-species tree topology. Although the accuracy of all three approaches was found to be dependent on the number of loci (characters), maximum likelihood was found to perform considerably and consistently better than maximum parsimony or phenetic clustering.

17.
Given a collection of discrete characters (e.g., aligned DNA sites, gene adjacencies), a common measure of distance between taxa is the proportion of characters for which taxa have different character states. Tree reconstruction based on these (uncorrected) distances can be statistically inconsistent and can lead to trees different from those obtained using character-based methods such as maximum likelihood or maximum parsimony. However, in these cases the distance data often reveal their unreliability by some deviation from additivity, as indicated by conflicting support for more than one tree. We describe two results that show how uncorrected (and miscorrected) distance data can be simultaneously perfectly additive and misleading. First, multistate character data can be perfectly compatible with, and define, one tree, while the uncorrected distances derived from these characters are perfectly treelike (and obey a molecular clock) only for a completely different tree. Second, under a Markov model of character evolution a similar phenomenon can occur; not only is there statistical inconsistency using uncorrected distances, but there is no evidence of this inconsistency because the distances look perfectly treelike (this does not occur in the classic two-parameter Felsenstein zone). We characterize precisely when uncorrected distances are additive on the true (and on a false) tree for four taxa. We also extend this result to a more general setting that applies to distances corrected according to an incorrect model.
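Uncorrected distances and the additivity check discussed above can be sketched for four taxa using the four-point condition: distances fit a tree exactly when the two largest of the three pairwise sums are equal, and the smallest sum identifies the split. As the abstract stresses, passing this check shows only that the distances fit some tree, not that it is the true one. Function names and the tolerance are illustrative assumptions.

```python
from typing import Dict, FrozenSet, Tuple

Split = FrozenSet[FrozenSet[str]]

def p_distance(a: str, b: str) -> float:
    """Proportion of characters at which two taxa have different states."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def uncorrected_matrix(seqs: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Uncorrected distances for every ordered pair of taxa."""
    return {(x, y): p_distance(seqs[x], seqs[y]) for x in seqs for y in seqs}

def quartet_sums(d, w, x, y, z) -> Dict[Split, float]:
    """Map each of the three possible splits of {w, x, y, z} to its four-point sum."""
    return {
        frozenset({frozenset({w, x}), frozenset({y, z})}): d[w, x] + d[y, z],
        frozenset({frozenset({w, y}), frozenset({x, z})}): d[w, y] + d[x, z],
        frozenset({frozenset({w, z}), frozenset({x, y})}): d[w, z] + d[x, y],
    }

def check_quartet(d, quartet, tol: float = 1e-9) -> Tuple[bool, Split]:
    """Return (is the quartet additive?, split with the smallest sum)."""
    sums = quartet_sums(d, *quartet)
    ordered = sorted(sums.values())
    additive = abs(ordered[2] - ordered[1]) <= tol
    return additive, min(sums, key=sums.get)
```

For sequences in which every character changes exactly once on the tree ((A,B),(C,D)), the distances are additive and the smallest sum picks the split AB|CD.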

18.
A group II intron containing the matK gene, which encodes a splicing-associated maturase, was found in the trnK (lysine tRNA) exon in the chloroplast genome of the six extant genera of green algae in the family Characeae, which among green algae are the sister group to embryophytes (land plants). The characean trnK intron (~2.5 kilobases [kb]) and matK ORF (~1.5 kb) are comparable in size to the intron and ORF of land plants, in which they are similarly found inserted in the trnK exon. Domain X, a sequence of conserved amino acid residues within matK, occurs in the Characeae. Phylogenetic analysis using maximum likelihood (GTR + I + gamma likelihood model) and parsimony (branch and bound search) yielded one tree with high bootstrap support for all branches. The matK tree was congruent with the rbcL tree for the same taxa. The number and proportion of informative sites was higher in matK (501, 31% of matK sequence) compared to rbcL (122, 10%). Characeae branch lengths were on average more than five times longer for matK compared to rbcL and provided better resolution within the Characeae. These findings along with recent genomic analyses demonstrate that the intron and matK invaded the chloroplast genome of green algae prior to the evolution of land plants.
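Counts such as the 501 informative matK sites rest on the usual definition of a parsimony-informative site: at least two character states each occur in at least two taxa. A minimal sketch, assuming equal-length aligned sequences and treating `-`, `?` and `N` as missing data (a simplifying choice; the authors' handling of gaps is not stated):

```python
from collections import Counter
from typing import List, Tuple

def is_parsimony_informative(column: str, missing: str = "-?N") -> bool:
    """True if at least two different states each occur in at least two taxa."""
    counts = Counter(c for c in column if c not in missing)
    return sum(1 for k in counts.values() if k >= 2) >= 2

def informative_sites(alignment: List[str]) -> Tuple[int, float]:
    """Number and proportion of parsimony-informative columns in an alignment."""
    length = len(alignment[0])
    n = sum(is_parsimony_informative("".join(seq[i] for seq in alignment))
            for i in range(length))
    return n, n / length
```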

19.
Tuffley and Steel (Bull. Math. Biol. 59:581–607, 1997) proved that maximum likelihood and maximum parsimony methods in phylogenetics are equivalent for sequences of characters under a simple symmetric model of substitution with no common mechanism. This result has been widely cited ever since. We show that small changes to the model assumptions suffice to make the two methods inequivalent. In particular, we analyze the case of bounded substitution probabilities as well as the molecular clock assumption. We show that in these cases, even under no common mechanism, maximum parsimony and maximum likelihood might make conflicting choices. We also show that if there is an upper bound on the substitution probabilities which is ‘sufficiently small’, every maximum likelihood tree is also a maximum parsimony tree (but not vice versa).

20.
A method for molecular phylogeny construction is newly developed. The method, called the stepwise ancestral sequence method, estimates molecular phylogenetic trees and ancestral sequences simultaneously on the basis of parsimony and sequence homology. For simplicity the emphasis is placed more on parsimony than on sequence homology in the present study, though both are certainly important. Because parsimony alone will sometimes generate plural candidate trees, the method retains not one but five candidates from which one can then single out the final tree taking other criteria into account. The properties and performance of the method are then examined by simulating an evolving gene along a model phylogenetic tree. The estimated trees are found to lie in a narrow range of the parsimony criteria used in the present study. Thus, other criteria such as biological evidence and likelihood are necessary to single out the correct tree among them, with biological evidence taking precedence over any other criterion. The computer simulation also reveals that the method satisfactorily estimates both tree topology and ancestral sequences, at least for the evolutionary model used in the present study.
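The stepwise ancestral sequence method itself is not reproduced here. As a point of comparison, the sketch below implements Fitch's standard small-parsimony algorithm, which counts the minimum number of changes on a fixed rooted binary tree and yields preliminary candidate state sets at internal nodes during its bottom-up pass. The nested-tuple tree encoding and function names are assumptions for illustration, and the top-down pass that selects one ancestral state per node is omitted.

```python
from typing import Dict, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]  # a leaf name or a (left, right) pair

def fitch_site(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Bottom-up Fitch pass for one site: candidate state set at this node and
    the minimum number of changes within its subtree."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:  # children can agree: no extra change needed here
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1

def parsimony_score(tree: Tree, seqs: Dict[str, str]) -> int:
    """Total Fitch parsimony length of an alignment (dict of leaf name -> sequence)."""
    length = len(next(iter(seqs.values())))
    return sum(fitch_site(tree, {name: s[i] for name, s in seqs.items()})[1]
               for i in range(length))
```

For the tree `(("A", "B"), ("C", "D"))` and sequences A = `AA`, B = `AC`, C = `GC`, D = `GC`, each site requires one change, giving a parsimony length of 2.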

