首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
In this paper we investigate mathematical questions concerning the reliability (reconstruction accuracy) of Fitch's maximum parsimony algorithm for reconstructing the ancestral state given a phylogenetic tree and a character. In particular, we consider the question whether the maximum parsimony method applied to a subset of taxa can reconstruct the ancestral state of the root more accurately than when applied to all taxa, and we give an example showing that this indeed is possible. A surprising feature of our example is that ignoring a taxon closer to the root improves the reliability of the method. On the other hand, in the case of the two-state symmetric substitution model, we answer affirmatively a conjecture of Li, Steel and Zhang which states that under a molecular clock the probability that the state at a single taxon is a correct guess of the ancestral state is a lower bound on the reconstruction accuracy of Fitch's method applied to all taxa.  相似文献   

2.
ANOTHER MONOPHYLY INDEX: REVISITING THE JACKKNIFE   总被引:1,自引:0,他引:1  
Abstract — Randomization routines have quickly gained wide usage in phylogenetic systematies. Introduced a decade ago, the jackknife has rarely been applied in cladistic methodology. This data resampling technique was re-investigated here as a means to discover the effect that taxon removal may have on the stability of the results obtained from parsimony analyses. This study shows that the removal of even a single taxon in an analysis can cause a solution of relatively few multiple equally parsimonious trees in an inclusive matrix to result in hundreds of equally parsimonious trees with the single removal of a taxon. On the other hand, removal of other taxa can stabilize the results to fewer trees. An index of clade stability, the Jackknife Monophyly Index (JMI) is developed which, like the bootstrap, applies a value to each clade according to its frequency of occurrence in jackknife pseudoreplicates. Unlike the bootstrap and earlier application of the jackknife, alternative suboptimal hypotheses are not forwarded by this method. Only those clades in the most parsimonious tree(s) are given JMI values. The behaviour of this index is investigated both in relation to a hypothetical and a real data set, as well as how it performs in comparison to the bootstrap. The JMI is found to not be influenced by uninformative characters or relative synapomorphy number, unlike the bootstrap.  相似文献   

3.
Many molecular phylogenies show longer root-to-tip path lengths in species-rich groups, encouraging hypotheses linking cladogenesis with accelerated molecular evolution. However, the pattern can also be caused by an artifact called the node density effect (NDE): this effect occurs when the method used to reconstruct a tree underestimates multiple hits that would have been revealed by extra nodes, leading to longer root-to-tip path lengths in clades with more terminal taxa. Here we use a twofold approach to demonstrate that maximum likelihood and Bayesian methods also suffer from the NDE known to affect parsimony. First, simulations deliberately mismatching the simulation and reconstruction models show that the greater the model disparity, the greater the gap between actual and reconstructed tree lengths, and the greater the NDE. Second, taxon sampling manipulation with empirical data shows that NDE can still be present when using optimized models: across 12 datasets, 70 out of 109 sister path comparisons showed significant evidence of NDE. Unless the model fairly accurately reconstructs the real tree length-and given the complexity of real sequence evolution this may be uncommon -- it will consistently produce a node density artifact. At commonly encountered divergence levels, a 10% underestimation of tree length results in > or = 80% of simulated phylogenies showing a positive NDE. Bayesian trees have a slight but consistently stronger effect. This pervasive methodological artifact increases apparent rate heterogeneity, and can compromise investigations of factors influencing molecular evolutionary rate that use path lengths in topologically asymmetric trees.  相似文献   

4.
A stepwise algorithm for finding minimum evolution trees   总被引:7,自引:6,他引:1  
A stepwise algorithm for reconstructing minimum evolution (ME) trees from evolutionary distance data is proposed. In each step, a taxon that potentially has a neighbor (another taxon connected to it with a single interior node) is first chosen and then its true neighbor searched iteratively. For m taxa, at most (m-1)!/2 trees are examined and the tree with the minimum sum of branch lengths (S) is chosen as the final tree. This algorithm provides simple strategies for restricting the tree space searched and allows us to implement efficient ways of dynamically computing the ordinary least squares estimates of S for the topologies examined. Using computer simulation, we found that the efficiency of the ME method in recovering the correct tree is similar to that of the neighbor-joining method (Saitou and Nei 1987). A more exhaustive search is unlikely to improve the efficiency of the ME method in finding the correct tree because the correct tree is almost always included in the tree space searched with this stepwise algorithm. The new algorithm finds trees for which S values may not be significantly different from that of the ME tree if the correct tree contains very small interior branches or if the pairwise distance estimates have large sampling errors. These topologies form a set of plausible alternatives to the ME tree and can be compared with each other using statistical tests based on the minimum evolution principle. The new algorithm makes it possible to use the ME method for large data sets.   相似文献   

5.

Background

Supertree methods combine trees on subsets of the full taxon set together to produce a tree on the entire set of taxa. Of the many supertree methods, the most popular is MRP (Matrix Representation with Parsimony), a method that operates by first encoding the input set of source trees by a large matrix (the "MRP matrix") over {0,1, ?}, and then running maximum parsimony heuristics on the MRP matrix. Experimental studies evaluating MRP in comparison to other supertree methods have established that for large datasets, MRP generally produces trees of equal or greater accuracy than other methods, and can run on larger datasets. A recent development in supertree methods is SuperFine+MRP, a method that combines MRP with a divide-and-conquer approach, and produces more accurate trees in less time than MRP. In this paper we consider a new approach for supertree estimation, called MRL (Matrix Representation with Likelihood). MRL begins with the same MRP matrix, but then analyzes the MRP matrix using heuristics (such as RAxML) for 2-state Maximum Likelihood.

Results

We compared MRP and SuperFine+MRP with MRL and SuperFine+MRL on simulated and biological datasets. We examined the MRP and MRL scores of each method on a wide range of datasets, as well as the resulting topological accuracy of the trees. Our experimental results show that MRL, coupled with a very good ML heuristic such as RAxML, produced more accurate trees than MRP, and MRL scores were more strongly correlated with topological accuracy than MRP scores.

Conclusions

SuperFine+MRP, when based upon a good MP heuristic, such as TNT, produces among the best scores for both MRP and MRL, and is generally faster and more topologically accurate than other supertree methods we tested.  相似文献   

6.
Random trees and random characters can be used in null models for testing phylogenetic hypothesis. We consider three interpretations of random trees: first, that trees are selected from the set of all possible trees with equal probability; second, that trees are formed by random speciation or coalescence (equivalent); and third, that trees are formed by a series of random partitions of the taxa. We consider two interpretations of random characters: first, that the number of taxa with each state is held constant, but the states are randomly reshuffled among the taxa; and second, that the probability each taxon is assigned a particular state is constant from one taxon to the next. Under null models representing various combinations of randomizations of trees and characters, exact recursion equations are given to calculate the probability distribution of the number of character state changes required by a phylogenetic tree. Possible applications of these probability distributions are discussed. They can be used, for example, to test for a panmictic population structure within a species or to test phylogenetic inertia in a character's evolution. Whether and how a null model incorporates tree randomness makes little difference to the probability distribution in many but not all circumstances. The null model's sense of character randomness appears more critical. The difficult issue of choosing a null model is discussed.  相似文献   

7.
Parsimony methods infer phylogenetic trees by minimizing number of character changes required to explain observed character states. From the perspective of applicability of parsimony methods, it is important to assess whether the characters used to infer phylogeny are likely to provide a correct tree. We introduce a graph theoretical characterization that helps to assess whether given set of characters is appropriate to use with parsimony methods. Given a set of characters and a set of taxa, we construct a network called character overlap graph. We show that the character overlap graph for characters that are appropriate to use in parsimony methods is characterized by significant under-representation of subnetworks known as holes, and provide a validation for this observation. This characterization explains success in constructing evolutionary trees using parsimony method for some characters (e.g., protein domains) and lack of such success for other characters (e.g., introns). In the latter case, the understanding of obstacles to applying parsimony methods in a direct way has lead us to a new approach for detecting inconsistent and/or noisy data. Namely, we introduce the concept of stable characters which is similar but less restrictive than the well known concept of pairwise compatible characters. Application of this approach to introns produces the evolutionary tree consistent with the Coelomata hypothesis.  相似文献   

8.
Taxon sampling and seed plant phylogeny   总被引:2,自引:0,他引:2  
We investigated the effects of taxon sampling on phylogenetic inference by exchanging terminals in two sizes of rbcL matrices for seed plants, applying parsimony and bayesian analyses to ten 38‐taxon matrices and ten 80‐taxon matrices. In comparing tree topologies we concentrated on the position of the Gnetales, an important group whose placement has long been disputed. With either method, trees obtained from different taxon samples could be mutually contradictory and even disagree on groups that seemed strongly supported. Adding terminals improved the consistency of results for unweighted parsimony, but not for parsimony with third positions excluded and not for bayesian analysis, particularly when the general time‐reversible model was employed. This suggests that attempting to resolve deep relationships using only a few taxa can lead to spurious conclusions, groupings unlikely to be repeatable with different taxon samplings or larger data sets. The effect of taxon sampling has not generally been recognized, and phylogenetic studies of seed plants have often been based on few taxa. Such insufficient sampling may help explain the variety of phylogenetic hypotheses for seed plants proposed in recent years. We recommend that restricted data sets such as single‐gene subsets of multigene studies should be reanalyzed with alternative selections of terminals to assess topological consistency.  相似文献   

9.

Motivation

Species tree estimation from gene trees can be complicated by gene duplication and loss, and “gene tree parsimony” (GTP) is one approach for estimating species trees from multiple gene trees. In its standard formulation, the objective is to find a species tree that minimizes the total number of gene duplications and losses with respect to the input set of gene trees. Although much is known about GTP, little is known about how to treat inputs containing some incomplete gene trees (i.e., gene trees lacking one or more of the species).

Results

We present new theory for GTP considering whether the incompleteness is due to gene birth and death (i.e., true biological loss) or taxon sampling, and present dynamic programming algorithms that can be used for an exact but exponential time solution for small numbers of taxa, or as a heuristic for larger numbers of taxa. We also prove that the “standard” calculations for duplications and losses exactly solve GTP when incompleteness results from taxon sampling, although they can be incorrect when incompleteness results from true biological loss. The software for the DP algorithm is freely available as open source code at https://github.com/smirarab/DynaDup.
  相似文献   

10.
Increased taxon sampling greatly reduces phylogenetic error   总被引:1,自引:0,他引:1  
Several authors have argued recently that extensive taxon sampling has a positive and important effect on the accuracy of phylogenetic estimates. However, other authors have argued that there is little benefit of extensive taxon sampling, and so phylogenetic problems can or should be reduced to a few exemplar taxa as a means of reducing the computational complexity of the phylogenetic analysis. In this paper we examined five aspects of study design that may have led to these different perspectives. First, we considered the measurement of phylogenetic error across a wide range of taxon sample sizes, and conclude that the expected error based on randomly selecting trees (which varies by taxon sample size) must be considered in evaluating error in studies of the effects of taxon sampling. Second, we addressed the scope of the phylogenetic problems defined by different samples of taxa, and argue that phylogenetic scope needs to be considered in evaluating the importance of taxon-sampling strategies. Third, we examined the claim that fast and simple tree searches are as effective as more thorough searches at finding near-optimal trees that minimize error. We show that a more complete search of tree space reduces phylogenetic error, especially as the taxon sample size increases. Fourth, we examined the effects of simple versus complex simulation models on taxonomic sampling studies. Although benefits of taxon sampling are apparent for all models, data generated under more complex models of evolution produce higher overall levels of error and show greater positive effects of increased taxon sampling. Fifth, we asked if different phylogenetic optimality criteria show different effects of taxon sampling. Although we found strong differences in effectiveness of different optimality criteria as a function of taxon sample size, increased taxon sampling improved the results from all the common optimality criteria. Nonetheless, the method that showed the lowest overall performance (minimum evolution) also showed the least improvement from increased taxon sampling. Taking each of these results into account re-enforces the conclusion that increased sampling of taxa is one of the most important ways to increase overall phylogenetic accuracy.  相似文献   

11.
A molecular phylogeny of annelids   总被引:6,自引:0,他引:6  
We present parsimony analyses of annelids based on the largest taxon sample and most extensive molecular data set yet assembled, with two nuclear ribosomal genes (18S rDNA and the D1 region of 28S rDNA), one nuclear protein coding‐gene (Histone H3) and one mitochondrial ribosomal gene (16S rDNA) from 217 terminal taxa. Of these, 267 sequences are newly sequenced, and the remaining were obtained from GenBank. The included taxa are based on the criteria that the taxon must have 18S rDNA or at least two other loci. Our analyses show that 68% of annelid family ranked taxa represented by more than one taxon in our study are supported by a jackknife value > 50%. In spite of the size of our data set, the phylogenetic signal in the deepest part of the tree remains weak and the majority of the currently recognized major polychaete clades (except Amphinomida and Aphroditiformia) could not be recovered. Terbelliformia is monophyletic (with the exclusion of Pectinariidae, for which only 18S data were available), whereas members of taxa such as Phyllodocida, Cirratuliformia, Sabellida and Scolecida are scattered over the trees. Clitellata is monophyletic, although Dinophilidae should possibly be included, and Clitellata has a sister group within the polychaetes. One major problem is the current lack of knowledge on the closest relatives to annelids and the position of the annelid root. We suggest that the poor resolution in the basal parts of the trees presented here may be due to lack of signal connected to incomplete data sets both in terms of terminal and gene sampling, rapid radiation events and/or uneven evolutionary rates and long‐branch attraction. © The Willi Hennig Society 2006.  相似文献   

12.
The Parsimony Ratchet, a New Method for Rapid Parsimony Analysis   总被引:26,自引:2,他引:26  
The Parsimony Ratchet 1 1 This method, the Parsimony Ratchet, was originally presented at the Numerical Cladistics Symposium at the American Museum of Natural History, New York, in May 1998 (see Horovitz, 1999) and at the Meeting of the Willi Hennig Society (Hennig XVII) in September 1998 in São Paulo, Brazil.
is presented as a new method for analysis of large data sets. The method can be easily implemented with existing phylogenetic software by generating batch command files. Such an approach has been implemented in the programs DADA (Nixon, 1998) and Winclada (Nixon, 1999). The Parsimony Ratchet has also been implemented in the most recent versions of NONA (Goloboff, 1998). These implementations of the ratchet use the following steps: (1) Generate a starting tree (e.g., a “Wagner” tree followed by some level of branch swapping or not). (2) Randomly select a subset of characters, each of which is given additional weight (e.g., add 1 to the weight of each selected character). (3) Perform branch swapping (e.g., “branch-breaking” or TBR) on the current tree using the reweighted matrix, keeping only one (or few) tree. (4) Set all weights for the characters to the “original” weights (typically, equal weights). (5) Perform branch swapping (e.g., branch-breaking or TBR) on the current tree (from step 3) holding one (or few) tree. (6) Return to step 2. Steps 2–6 are considered to be one iteration, and typically, 50–200 or more iterations are performed. The number of characters to be sampled for reweighting in step 2 is determined by the user; I have found that between 5 and 25% of the characters provide good results in most cases. The performance of the ratchet for large data sets is outstanding, and the results of analyses of the 500 taxon seed plant rbcL data set (Chase et al., 1993) are presented here. A separate analysis of a three-gene data set for 567 taxa will be presented elsewhere (Soltis et al., in preparation) demonstrating the same extraordinary power. With the 500-taxon data set, shortest trees are typically found within 22 h (four runs of 200 iterations) on a 200-MHz Pentium Pro. These analyses indicate efficiency increases of 20×–80× over “traditional methods” such as varying taxon order randomly and holding few trees, followed by more complete analyses of the best trees found, and thousands of times faster than nonstrategic searches with PAUP. Because the ratchet samples many tree islands with fewer trees from each island, it provides much more accurate estimates of the “true” consensus than collecting many trees from few islands. With the ratchet, Goloboff's NONA, and existing computer hardware, data sets that were previously intractable or required months or years of analysis with PAUP* can now be adequately analyzed in a few hours or days.  相似文献   

13.
Comparisons are made of the accuracy of the restricted maximum-likelihood, Wagner parsimony, and UPGMA (unweighted pair-group method using arithmetic averages) clustering methods to estimate phylogenetic trees. Data matrices were generated by constructing simulated stochastic evolution in a multidimensional gene-frequency space using a simple genetic-drift model (Brownian-motion, random-walk) with constant rates of divergence in all lineages. Ten differentphylogenetic tree topologies of 20 operational taxonomic units (OTU's), representing a range of tree shapes, were used. Felsenstein's restricted maximum-likelihood method, Wagner parsimony, and UPGMA clustering were used to construct trees from the resulting data matrices. The computations for the restricted maximum-likelihood method were performed on a Cray-1 supercomputer since the required calculations (especially when optimized for the vector hardware) are performed substantially faster than on more conventional computing systems. The overall level of accuracy of tree reconstruction depends on the topology of the true phylogenetic tree. The UPGMA clustering method, especially when genetic-distance coefficients are used, gives the most accurate estimates of the true phylogeny (for our model with constant evolutionary rates). For large numbers of loci, all methods give similar results, but trends in the results imply that the restricted maximum-likelihood method would produce the most accurate trees if sample sizes were large enough.  相似文献   

14.
The proliferation of gene data from multiple loci of large multigene families has been greatly facilitated by considerable recent advances in sequence generation. The evolution of such gene families, which often undergo complex histories and different rates of change, combined with increases in sequence data, pose complex problems for traditional phylogenetic analyses, and in particular, those that aim to successfully recover species relationships from gene trees. Here, we implement gene tree parsimony analyses on multicopy gene family data sets of snake venom proteins for two separate groups of taxa, incorporating Bayesian posterior distributions as a rigorous strategy to account for the uncertainty present in gene trees. Gene tree parsimony largely failed to infer species trees congruent with each other or with species phylogenies derived from mitochondrial and single-copy nuclear sequences. Analysis of four toxin gene families from a large expressed sequence tag data set from the viper genus Echis failed to produce a consistent topology, and reanalysis of a previously published gene tree parsimony data set, from the family Elapidae, suggested that species tree topologies were predominantly unsupported. We suggest that gene tree parsimony failure in the family Elapidae is likely the result of unequal and/or incomplete sampling of paralogous genes and demonstrate that multiple parallel gene losses are likely responsible for the significant species tree conflict observed in the genus Echis. These results highlight the potential for gene tree parsimony analyses to be undermined by rapidly evolving multilocus gene families under strong natural selection.  相似文献   

15.
The method of evolutionary parsimony--or operator invariants--is a technique of nucleic acid sequence analysis related to parsimony analysis and explicitly designed for determining evolutionary relationships among four distantly related taxa. The method is independent of substitution rates because it is derived from consideration of the group properties of substitution operators rather than from an analysis of the probabilities of substitution in branches of a tree. In both parsimony and evolutionary parsimony, three patterns of nucleotide substitution are associated one-to-one with the three topologically linked trees for four taxa. In evolutionary parsimony, the three quantities are operator invariants. These invariants are the remnants of substitutions that have occurred in the interior branch of the tree and are analogous to the substitutions assigned to the central branch by parsimony. The two invariants associated with the incorrect trees must equal zero (statistically), whereas only the correct tree can have a nonzero invariant. The chi 2-test is used to ascertain the nonzero invariant and the statistically favored tree. Examples, obtained using data calculated with evolutionary rates and branchings designed to camouflage the true tree, show that the method accurately predicts the tree, even when substitution rates differ greatly in neighboring peripheral branches (conditions under which parsimony will consistently fail). As the number of substitutions in peripheral branches becomes fewer, the parsimony and the evolutionary-parsimony solutions converge. The method is robust and easy to use.   相似文献   

16.
Although long-branch attraction (LBA) is frequently cited as the cause of anomalous phylogenetic groupings, few examples of LBA involving real sequence data are known. We have found several cases of probable LBA by analyzing subsamples from an alignment of 18S rDNA sequences for 133 metazoans. In one example, maximum parsimony analysis of sequences from two rotifers, a ctenophore, and a polychaete annelid resulted in strong support for a tree grouping two "long-branch taxa" (a rotifer and the ctenophore). Maximum-likelihood analysis of the same sequences yielded strong support for a more biologically reasonable "rotifer monophyly" tree. Attempts to break up long branches for problematic subsamples through increased taxon sampling reduced, but did not eliminate, LBA problems. Exhaustive analyses of all quartets for a subset of 50 sequences were performed in order to compare the performance of maximum likelihood, equal-weights parsimony, and two additional variants of parsimony; these methods do differ substantially in their rates of failure to recover trees consistent with well established, but highly unresolved phylogenies. Power analyses using simulations suggest that some incorrect inferences by maximum parsimony are due to statistical inconsistency and that when estimates of central branch lengths for certain quartets are very low, maximum-likelihood analyses have difficulty recovering accepted phylogenies even with large amounts of data. These examples demonstrate that LBA problems can occur in real data sets, and they provide an opportunity to investigate causes of incorrect inferences.  相似文献   

17.
It is well known among phylogeneticists that adding an extra taxon (e.g. species) to a data set can alter the structure of the optimal phylogenetic tree in surprising ways. However, little is known about this “rogue taxon” effect. In this paper we characterize the behavior of balanced minimum evolution (BME) phylogenetics on data sets of this type using tools from polyhedral geometry. First we show that for any distance matrix there exist distances to a “rogue taxon” such that the BME-optimal tree for the data set with the new taxon does not contain any nontrivial splits (bipartitions) of the optimal tree for the original data. Second, we prove a theorem which restricts the topology of BME-optimal trees for data sets of this type, thus showing that a rogue taxon cannot have an arbitrary effect on the optimal tree. Third, we computationally construct polyhedral cones that give complete answers for BME rogue taxon behavior when our original data fits a tree on four, five, and six taxa. We use these cones to derive sufficient conditions for rogue taxon behavior for four taxa, and to understand the frequency of the rogue taxon effect via simulation.  相似文献   

18.
Synecological analyses are usually based on typological, phenetic and cladistic methods. The disadvantages of these techniques are shown. The application of the Wagner parsimony method to synecology is considered. All the methods need some prerequisites, viz. definitions of localities and characters (the most simple one being the presence/absence of taxa); the choice of taxonomic level of taxa; their autochthony. The application of Wagner parsimony needs a new terminology. The congruence of any environmental condition, including freshwater monitoring indices, can be tested on parsimonious trees. The Wagner parsimony method not only provides various indices (tree length, CI, HI, RC, RI) which allow the comparison of trees but also minimal trees which are direct tools in synecology.  相似文献   

19.

Background  

Supertree methods comprise one approach to reconstructing large molecular phylogenies given multi-marker datasets: trees are estimated on each marker and then combined into a tree (the "supertree") on the entire set of taxa. Supertrees can be constructed using various algorithmic techniques, with the most common being matrix representation with parsimony (MRP). When the data allow, the competing approach is a combined analysis (also known as a "supermatrix" or "total evidence" approach) whereby the different sequence data matrices for each of the different subsets of taxa are concatenated into a single supermatrix, and a tree is estimated on that supermatrix.  相似文献   

20.
The effect of taxonomic sampling on phylogenetic accuracy under parsimony is examined by simulating nucleotide sequence evolution. Random error is minimized by using very large numbers of simulated characters. This allows estimation of the consistency behavior of parsimony, even for trees with up to 100 taxa. Data were simulated on 8 distinct 100-taxon model trees and analyzed as stratified subsets containing either 25 or 50 taxa, in addition to the full 100-taxon data set. Overall accuracy decreased in a majority of cases when taxa were added. However, the magnitude of change in the cases in which accuracy increased was larger than the magnitude of change in the cases in which accuracy decreased, so, on average, overall accuracy increased as more taxa were included. A stratified sampling scheme was used to assess accuracy for an initial subsample of 25 taxa. The 25-taxon analyses were compared to 50- and 100-taxon analyses that were pruned to include only the original 25 taxa. On average, accuracy for the 25 taxa was improved by taxon addition, but there was considerable variation in the degree of improvement among the model trees and across different rates of substitution.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号