Similar literature
1.
ANOTHER MONOPHYLY INDEX: REVISITING THE JACKKNIFE   Total citations: 1, self-citations: 0, citations by others: 1
Abstract: Randomization routines have quickly gained wide usage in phylogenetic systematics. Introduced a decade ago, the jackknife has rarely been applied in cladistic methodology. This data resampling technique was re-investigated here as a means to discover the effect that taxon removal may have on the stability of the results obtained from parsimony analyses. This study shows that removing even a single taxon can turn a solution of relatively few equally parsimonious trees for the inclusive matrix into hundreds of equally parsimonious trees. On the other hand, removal of other taxa can stabilize the results to fewer trees. An index of clade stability, the Jackknife Monophyly Index (JMI), is developed which, like the bootstrap, applies a value to each clade according to its frequency of occurrence in jackknife pseudoreplicates. Unlike the bootstrap and earlier applications of the jackknife, alternative suboptimal hypotheses are not forwarded by this method. Only those clades in the most parsimonious tree(s) are given JMI values. The behaviour of this index is investigated in relation to both a hypothetical and a real data set, and its performance is compared with that of the bootstrap. The JMI is found not to be influenced by uninformative characters or relative synapomorphy number, unlike the bootstrap.
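
The idea of scoring each clade by how often it survives taxon removal can be sketched as follows. This is only an illustration under stated assumptions, not the published JMI formula: the function name `jackknife_clade_frequency`, the input layout (one set of recovered clades per removed taxon), and the rule of skipping replicates that reduce a clade to a single terminal are all hypothetical choices made for the example.

```python
from typing import Dict, FrozenSet, Iterable, Mapping, Set

Clade = FrozenSet[str]


def jackknife_clade_frequency(
    reference_clades: Iterable[Clade],
    replicate_clades: Mapping[str, Set[Clade]],
) -> Dict[Clade, float]:
    """Fraction of taxon-removal replicates that recover each reference clade.

    reference_clades: clades of the most parsimonious tree(s) for the full matrix.
    replicate_clades: maps each removed taxon to the clades found when the
        analysis is repeated without that taxon (hypothetical input layout).
    """
    scores: Dict[Clade, float] = {}
    for clade in reference_clades:
        hits = 0
        counted = 0
        for removed, clades in replicate_clades.items():
            # Compare against the clade with the removed taxon pruned out.
            pruned = clade - {removed}
            if len(pruned) < 2:
                # A single remaining terminal cannot be tested for monophyly.
                continue
            counted += 1
            if pruned in clades:
                hits += 1
        scores[clade] = hits / counted if counted else float("nan")
    return scores


if __name__ == "__main__":
    ab = frozenset({"A", "B"})
    abc = frozenset({"A", "B", "C"})
    replicates = {
        "A": {frozenset({"B", "C"})},
        "B": {frozenset({"A", "C"})},
        "C": {ab},
        "D": {ab, abc},
        "E": {abc},  # removing E collapses the A+B clade
    }
    for clade, value in jackknife_clade_frequency([ab, abc], replicates).items():
        print(sorted(clade), round(value, 3))
```

Only clades from the reference tree(s) are scored, which mirrors the abstract's point that suboptimal alternative groupings receive no value.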

2.

Background  

For parsimony analyses, the most common way to estimate confidence is by resampling plans (nonparametric bootstrap, jackknife), and Bremer support (decay indices). The recent literature reveals that parameter settings that are quite commonly employed are not those that are recommended by theoretical considerations and by previous empirical studies. The optimal search strategy to be applied during resampling was previously addressed solely via standard search strategies available in PAUP*. The question of a compromise between search extensiveness and improved support accuracy for Bremer support received even less attention. A set of experiments was conducted on different datasets to find an empirical cut-off point at which increased search extensiveness does not significantly change Bremer support and jackknife or bootstrap proportions any more.

3.
Sequences of the small subunit (SSU) ribosomal RNA are considered useful for reconstructing the tree of life because this molecule is found in all organisms and is large enough not to have become saturated with multiple mutations. However, these data sets are large, difficult to align, and have extreme biases in base composition, which make their phylogenetic signal ambiguous. Large ambiguous data sets may have many most-parsimonious trees, and finding them all may be impossible using conventional phylogenetic methods. To examine the reliability of the number and relationships of eukaryotic kingdoms proposed by previous analyses of the SSU, we calculated trees from aligned sequences from eukaryotes in the Ribosomal Database Project using parsimony jackknifing, which uses a resampling procedure to rapidly search large data sets for the branches that are strongly supported and eliminates poorly supported groups. Two separate analyses were carried out: an analysis in which all bases were equally weighted, and one in which transversions only were used. The parsimony jackknife procedure was able to efficiently find trees in which most major groups of eukaryotes were supported and in which some evolutionary hypotheses proposed by previous workers were tested. The relationships of these major groups to each other were largely unresolved, indicating that the SSU data, as represented in this database, is insufficient for answering questions about these deep branches. Interestingly, the analysis of transitions differs from the results of the entire data set, primarily being less resolved. This indicates that transversional mutations are important contributors to the resolved structure of the tree.

4.
The clade size effect refers to a bias that causes middle‐sized clades to be less supported than small or large‐sized clades. This bias is present in resampling measures of support calculated under maximum likelihood and maximum parsimony and in Bayesian posterior probabilities. Previous analyses indicated that the clade size effect is worst in maximum parsimony, followed by maximum likelihood, while Bayesian inference is the least affected. Homoplasy was interpreted as the main cause of the effect. In this study, we explored the presence of the clade size effect in alternative measures of branch support under maximum parsimony: Bremer support and symmetric resampling, expressed as absolute frequencies and frequency differences. Analyses were performed using 50 molecular and morphological matrices. Symmetric resampling showed the same tendency that bootstrap and jackknife did for maximum parsimony and maximum likelihood. Few matrices showed a significant bias using Bremer support, presenting a better performance than resampling measures of support and comparable to Bayesian posterior probabilities. Our results indicate that the problem is not maximum parsimony, but resampling measures of support. We corroborated the role of homoplasy as a possible cause of the clade size effect, increasing the number of random trees during the resampling, which together with the higher chances that medium‐sized clades have of being contradicted generates the bias during the perturbation of the original matrix, making it stronger in resampling measures of support.

5.
The kinesin superfamily across eukaryotes was used to examine how incorporation of gap characters scored from conserved regions shared by all members of a gene family and incorporation of amino acid and gap characters scored from lineage‐specific regions affect gene‐tree inference of the gene family as a whole. We addressed these two questions in the context of two different densities of sequence sampling, four alignment programs, and two methods of tree construction. Taken together, our findings suggest the following. First, gap characters should be incorporated into gene‐tree inference, even for divergent sequences. Second, gene regions that are not conserved among all or most sequences sampled should not be automatically discarded without evaluation of potential phylogenetic signal that may be contained in gap and/or sequence characters. Third, among the four alignment programs evaluated using their default alignment parameters, Clustal may be expected to output alignments that result in the greatest gene‐tree resolution and support. Yet, this high resolution and support should be regarded as optimistic, rather than conservative, estimates. Fourth, this same conclusion regarding resolution and support holds for Bayesian gene‐tree analyses relative to parsimony‐jackknife gene‐tree analyses. We suggest that a more conservative approach, such as aligning the sequences using DIALIGN‐T or MAFFT, analyzing the appropriate characters using parsimony, and assessing branch support using the jackknife, is more appropriate for inferring gene trees of divergent gene families. © The Willi Hennig Society 2007.

6.
In phylogenetic trees the addition and removal of taxa has large effects on tree topology, hence measures of branch support and tree stability should account for taxonomic composition. Currently no comprehensive system of composition-dependent parameters exists in any cladistic or phenetic strategy. We introduce several values and indices based on a modification of the original jackknife resampling. Their advantage is a complete evaluation and optimization of taxon composition in phylogenetic data. While related to the Jackknife Monophyly Index (JMI), our system of support measures expands beyond parsimony analyses, and includes indices estimating support for the entire phylogenetic tree based on individual branch supports.

7.
Non-random distributions of missing data are a general problem for likelihood-based statistical analyses, including those in a phylogenetic context. Extensive non-randomly distributed missing data are particularly problematic in supermatrix analyses that include many terminals and/or loci. It has been widely reported that missing data can lead to loss of resolution, but only very rarely create misleading or otherwise unsupported results in a parsimony context. Yet this does not hold for all parametric-based analyses because of their assumption of homogeneity across characters and lineages, which can lead to both long-branch attraction and long-branch repulsion. Contrived examples were used to demonstrate that non-random distributions of missing data, even without rate heterogeneity among characters and a well fitting model, can provide misleading likelihood-based topologies and branch-support values that are radically unstable based on slight modifications to character sampling. The same can occur despite complete absence of parsimony-informative characters. Otherwise unsupported resolution and high branch support for these clades were found to occur frequently in 22 empirical examples derived from a published supermatrix. Partitioning characters based on the distribution of missing data helped to decrease, but did not eliminate, these artifacts. These artifacts were exacerbated by low quality tree searches, particularly when holding only a single optimal tree that must be fully resolved.

8.
The bootstrap is an important tool for estimating the confidence interval of monophyletic groups within phylogenies. Although bootstrap analyses are used in most evolutionary studies, there is no clear consensus as how best to interpret bootstrap probability values. To study further the bootstrap method, nine small subunit ribosomal DNA (SSU rDNA) data sets were submitted to bootstrapped maximum parsimony (MP) analyses using unweighted and weighted sequence positions. Analyses of the lengths (i.e., parsimony steps) of the bootstrap trees show that the shape and mean of the bootstrap tree distribution may provide important insights into the evolutionary signal within the sequence data. With complex phylogenies containing nodes defined by short internal branches (multifurcations), the mean of the bootstrap tree distribution may differ by 2 standard deviations from the length of the best tree found from the original data set. Weighting sequence positions significantly increases the bootstrap values at internal nodes. There may, however, be strong bootstrap support for conflicting species groupings among different data sets. This phenomenon appears to result from a correlation between the topology of the tree used to create the weights and the topology of the bootstrap consensus tree inferred from the MP analysis of these weighted data. The analyses also show that characteristics of the bootstrap tree distribution (e.g., skewness) may be used to choose between alternative weighting schemes for phylogenetic analyses.

9.
Contemporary phylogenomic studies frequently incorporate two-step coalescent analyses wherein the first step is to infer individual-gene trees, generally using maximum-likelihood implemented in the popular programs PhyML or RAxML. Four concerns with this approach are that these programs only present a single fully resolved gene tree to the user despite potential for ambiguous support, insufficient phylogenetic signal to fully resolve each gene tree, inexact computer arithmetic affecting the reported likelihood of gene trees, and an exclusive focus on the most likely tree while ignoring trees that are only slightly suboptimal or within the error tolerance. Taken together, these four concerns are sufficient for RAxML and PhyML users to be suspicious of the resulting (perhaps over-resolved) gene-tree topologies and (perhaps unjustifiably high) bootstrap support for individual clades. In this study, we sought to determine how frequently these concerns apply in practice to contemporary phylogenomic studies that use RAxML for gene-tree inference. We did so by re-analyzing 100 genes from each of ten studies that, taken together, are representative of many empirical phylogenomic studies. Our seven findings are as follows. First, the few search replicates that are frequently applied in phylogenomic studies are generally insufficient to find the optimal gene-tree topology. Second, there is often more topological variation among slightly suboptimal gene trees relative to the best-reported tree than can be safely ignored. Third, the Shimodaira–Hasegawa-like approximate likelihood ratio test is highly effective at identifying dubiously supported clades and outperforms the alternative approaches of relying on bootstrap support or collapsing minimum-length branches. Fourth, the bootstrap can, but rarely does, indicate high support for clades that are not supported amongst slightly suboptimal trees.
Fifth, increasing the accuracy by which RAxML optimizes model-parameter values generally has a nominal effect on selection of optimal trees. Sixth, tree searches using the GTRCAT model were generally less effective at finding optimal known trees than those using the GTRGAMMA model. Seventh, choice of gene-tree sampling strategy can affect inferred coalescent branch lengths, species-tree topology and branch support.

10.
Phylogenetic relationships among embryophytes (tracheophytes, mosses, liverworts, and hornworts) were examined using 21 newly generated mitochondrial small-subunit (19S) rDNA sequences. The "core" 19S rDNA contained more phylogenetically informative sites and lower homoplasy than either nuclear 18S or plastid 16S rDNA. Results of phylogenetic analyses using parsimony (MP) and likelihood (ML) were generally congruent. Using MP, two trees were obtained that resolved either liverworts or hornworts as the basal land plant clade. The optimal ML tree showed hornworts as basal. That topology was not statistically different from the two MP trees, thus both appear to be equally viable evolutionary hypotheses. High bootstrap support was obtained for the majority of higher level embryophyte clades named in a recent morphologically based classification, e.g., Tracheophyta, Euphyllophytina, Lycophytina, and Spermatophytata. Strong support was also obtained for the following monophyletic groups: hornworts, liverworts, mosses, lycopsids, leptosporangiate and eusporangiate ferns, gymnosperms and angiosperms. This molecular analysis supported a sister relationship between Equisetum and leptosporangiate ferns and a monophyletic gymnosperms sister to angiosperms. The topologies of deeper clades were affected by taxon inclusion (particularly hornworts) as demonstrated by jackknife analyses. This study represents the first use of mitochondrial 19S rDNA for phylogenetic purposes and it appears well-suited for examining intermediate to deep evolutionary relationships among embryophytes.

11.
The success of resampling approaches to branch support depends on the effectiveness of the underlying tree searches. Two primary factors are identified as key: the depth of tree search and the number of trees saved per resampling replicate. Two datasets were explored for a range of search parameters using jackknifing. Greater depth of tree search tends to increase support values because shorter trees conflict less with each other, while increasing numbers of trees saved tends to reduce support values because of conflict that reduces structure in the replicate consensus. Although a relatively small amount of branch swapping will achieve near‐accurate values for a majority of clades, some clades do not yield accurate values until more extensive searches are performed. This means that in order to maximize the accuracy of resampling analyses, one should employ as extensive a search strategy as possible, and save as many trees per replicate as possible. Strict consensus summary of resampling replicates is preferable to frequency‐within‐replicates summary because it is a more conservative approach to the reporting of replicate results. Jackknife analysis is preferable to bootstrap because of its closer relationship to the original data. © The Willi Hennig Society 2010.

12.
Even when the maximum likelihood (ML) tree is a better estimate of the true phylogenetic tree than those produced by other methods, the result of a poor ML search may be no better than that of a more thorough search under some faster criterion. The ability to find the globally optimal ML tree is therefore important. Here, I compare a range of heuristic search strategies (and their associated computer programs) in terms of their success at locating the ML tree for 20 empirical data sets with 14 to 158 sequences and 411 to 120,762 aligned nucleotides. Three distinct topics are discussed: the success of the search strategies in relation to certain features of the data, the generation of starting trees for the search, and the exploration of multiple islands of trees. As a starting tree, there was little difference among the neighbor-joining tree based on absolute differences (including the BioNJ tree), the stepwise-addition parsimony tree (with or without nearest-neighbor-interchange (NNI) branch swapping), and the stepwise-addition ML tree. The latter produced the best ML score on average but was orders of magnitude slower than the alternatives. The BioNJ tree was second best on average. As search strategies, star decomposition and quartet puzzling were the slowest and produced the worst ML scores. The DPRml, IQPNNI, MultiPhyl, PhyML, PhyNav, and TreeFinder programs with default options produced qualitatively similar results, each locating a single tree that tended to be in an NNI suboptimum (rather than the global optimum) when the data set had low phylogenetic information. For such data sets, there were multiple tree islands with very similar ML scores. The likelihood surface only became relatively simple for data sets that contained approximately 500 aligned nucleotides for 50 sequences and 3,000 nucleotides for 100 sequences. The RAxML and GARLI programs allowed multiple islands to be explored easily, but both programs also tended to find NNI suboptima. 
A newly developed version of the likelihood ratchet using PAUP* successfully found the peaks of multiple islands, but its speed needs to be improved.

13.
A new consensus method for summarizing competing phylogenetic hypotheses, weighted compromise, is described. The method corrects for a bias inherent in majority‐rule consensus/compromise trees when the source trees exhibit non‐independence due to ambiguity in terminal clades. Suggestions are given for its employment in parsimony analyses and tree resampling strategies such as bootstrapping and jackknifing. An R function is described that can be used with the programming language R to produce the consensus.

14.
Phylogenetic relationships of mushrooms and their relatives within the order Agaricales were addressed by using nuclear large subunit ribosomal DNA sequences. Approximately 900 bases of the 5' end of the nucleus-encoded large subunit RNA gene were sequenced for 154 selected taxa representing most families within the Agaricales. Several phylogenetic methods were used, including weighted and equally weighted parsimony (MP), maximum likelihood (ML), and distance methods (NJ). The starting tree for branch swapping in the ML analyses was the tree with the highest ML score among previously produced MP and NJ trees. A high degree of consensus was observed between phylogenetic estimates obtained through MP and ML. NJ trees differed according to the distance model that was used; however, all NJ trees still supported most of the same terminal groupings as the MP and ML trees did. NJ trees were always significantly suboptimal when evaluated against the best MP and ML trees, by both parsimony and likelihood tests. Our analyses suggest that weighted MP and ML provide the best estimates of Agaricales phylogeny. Similar support was observed between bootstrapping and jackknifing methods for evaluation of tree robustness. Phylogenetic analyses revealed many groups of agaricoid fungi that are supported by moderate to high bootstrap or jackknife values or are consistent with morphology-based classification schemes. Analyses also support separate placement of the boletes and russules, which are basal to the main core group of gilled mushrooms (the Agaricineae of Singer). 
Examples of monophyletic groups include the families Amanitaceae, Coprinaceae (excluding Coprinus comatus and subfamily Panaeolideae), Agaricaceae (excluding the Cystodermateae), and Strophariaceae pro parte (Stropharia, Pholiota, and Hypholoma); the mycorrhizal species of Tricholoma (including Leucopaxillus, also mycorrhizal); Mycena and Resinomycena; Termitomyces, Podabrella, and Lyophyllum; and Pleurotus with Hohenbuehelia. Several groups revealed by these data to be nonmonophyletic include the families Tricholomataceae, Cortinariaceae, and Hygrophoraceae and the genera Clitocybe, Omphalina, and Marasmius. This study provides a framework for future systematics studies in the Agaricales and suggestions for analyzing large molecular data sets.

15.
Most plant phylogenetic inference has used DNA sequence data from the plastid genome. This genome represents a single genealogical sample with no recombination among genes, potentially limiting the resolution of evolutionary relationships in some contexts. In contrast, nuclear DNA is inherently more difficult to employ for phylogeny reconstruction because major mutational events in the genome, including polyploidization, gene duplication, and gene extinction can result in homologous gene copies that are difficult to identify as orthologs or paralogs. Gene tree parsimony (GTP) can be used to infer the rooted species tree by fitting gene genealogies to species trees while simultaneously minimizing the estimated number of duplications needed to reconcile conflicts among them. Here, we use GTP for five nuclear gene families and a previously published plastid data set to reconstruct the phylogenetic backbone of the aquatic plant family Pontederiaceae. Plastid-based phylogenetic studies strongly supported extensive paraphyly of Eichhornia (one of the four major genera) but also depicted considerable ambiguity concerning the true root placement for the family. Our results indicate that species trees inferred from the nuclear genes (alone and in combination with the plastid data) are highly congruent with gene trees inferred from plastid data alone. Consideration of optimal and suboptimal gene tree reconciliations place the root of the family at (or near) a branch leading to the rare and locally restricted E. meyeri. We also explore methods to incorporate uncertainty in individual gene trees during reconciliation by considering their individual bootstrap profiles and relate inferred excesses of gene duplication events on individual branches to whole-genome duplication events inferred for the same branches. Our study improves understanding of the phylogenetic history of Pontederiaceae and also demonstrates the utility of GTP for phylogenetic analysis.

16.
Several large phylogenomic analyses have recently cast doubt on long‐held beliefs about early metazoan phylogenetic patterns. Those data sets, and the relative bootstrap support for various controversial clades, are reanalysed in the context of parsimony, yielding results that are at considerable odds with the original likelihood or Bayesian findings. Discrepancies are considered in light of the tendency of RAxML to overestimate support values by virtue (sic) of its lazy search algorithm and its autocorrelated pseudoreplication as well as the extraordinary ability for Bayesian analyses to be led astray by missing data. In addition to standard nonparametric bootstrapping as a measure of support, a new strategy involving resampling loci as units, partition bootstrap support, is introduced as a more defensible alternative to resampling nonindependent sites. © The Willi Hennig Society 2009.

17.
Despite the growing popularity of supertree construction for combining phylogenetic information to produce more inclusive phylogenies, large-scale performance testing of this method has not been done. Through simulation, we tested the accuracy of the most widely used supertree method, matrix representation with parsimony analysis (MRP), with respect to a (maximum parsimony) total evidence solution and a known model tree. When source trees overlap completely, MRP provided a reasonable approximation of the total evidence tree; agreement was usually > 85%. Performance improved slightly when using smaller, more numerous, or more congruent source trees, and especially when elements were weighted in proportion to the bootstrap frequencies of the nodes they represented on each source tree ("weighted MRP"). Although total evidence always estimated the model tree slightly better than nonweighted MRP methods, weighted MRP in turn usually out-performed total evidence slightly. When source studies were even moderately nonoverlapping (i.e., sharing only three-quarters of the taxa), the high proportion of missing data caused a loss in resolution that severely degraded the performance for all methods, including total evidence. In such cases, even combining more trees, which had positive effects elsewhere, did not improve accuracy. Instead, "seeding" the supertree or total evidence analyses with a single largely complete study improved performance substantially. This finding could be an important strategy for any studies that seek to combine phylogenetic information. Overall, our results suggest that MRP supertree construction provides a reasonable approximation of a total evidence solution and that weighted MRP should be used whenever possible.

18.
We compared general behaviour trends of resampling methods (bootstrap, bootstrap with Poisson distribution, jackknife, and jackknife with symmetric resampling) and different ways to summarize the results for resampling (absolute frequency, F, and frequency difference, GC') for real data sets under variable resampling strengths in three weighting schemes. We propose an equivalence between bootstrap and jackknife in order to make bootstrap variable across different resampling strengths. Specifically, for each method we evaluated the number of spurious groups (groups not present in the strict consensus of the unaltered data set), of real groups, and of inconsistencies in ranking of groups under variable resampling strengths. We found that GC' always generated more spurious groups and recovered more groups than F. Bootstrap methods generated more spurious groups than jackknife methods; and jackknife is the method that recovered more real groups. We consistently obtained a higher proportion of spurious groups for GC' than for F; and for bootstrap than for jackknife. Finally, we evaluated the ranking of groups under variable resampling strengths qualitatively in the trajectories of "support" against resampling strength, and quantitatively with Kendall coefficient values. We found fewer ranking inconsistencies for GC' than for F, and for bootstrap than for jackknife.
© The Willi Hennig Society 2009.
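
The two summaries compared in this abstract, absolute frequency (F) and frequency difference (GC), can be sketched for a single group as below. The sketch assumes the usual reading of GC as the frequency of the group minus the frequency of its most frequent contradictory group; the function names and the representation of each replicate as a set of rooted clades are hypothetical choices for the example, not taken from the abstract.

```python
from collections import Counter
from typing import FrozenSet, List, Set, Tuple

Clade = FrozenSet[str]


def conflicts(a: Clade, b: Clade) -> bool:
    """Two rooted clades conflict when they overlap but neither contains the other."""
    return bool(a & b) and not (a <= b or b <= a)


def support_values(replicates: List[Set[Clade]], group: Clade) -> Tuple[float, float]:
    """Return (F, GC) for one group across resampling replicates.

    F  = proportion of replicates containing the group.
    GC = F minus the proportion of the most frequent contradictory group
         (assumed definition; it can be negative).
    """
    n = len(replicates)
    # Each replicate is a set, so a clade is counted at most once per replicate.
    counts = Counter(clade for clades in replicates for clade in clades)
    rival = max(
        (count for clade, count in counts.items() if conflicts(group, clade)),
        default=0,
    )
    return counts[group] / n, (counts[group] - rival) / n


if __name__ == "__main__":
    ab = frozenset({"A", "B"})
    bc = frozenset({"B", "C"})
    reps = [{ab}, {ab}, {ab}, {bc}]
    print(support_values(reps, ab))
    print(support_values(reps, bc))
```

Because GC subtracts the strongest rival, a group found in a minority of replicates can still be reported with a positive value as long as no contradictory group is more frequent, which is one way GC can retain groups that F would discard.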

19.
PARSIMONY JACKKNIFING OUTPERFORMS NEIGHBOR-JOINING   Total citations: 25, self-citations: 1, citations by others: 25
Abstract: Because they are designed to produce just one tree, neighbor-joining programs can obscure ambiguities in data. Ambiguities can be uncovered by resampling, but existing neighbor-joining programs may give misleading bootstrap frequencies because they do not suppress zero-length branches and/or are sensitive to the order of terminals in the data. A new procedure, parsimony jackknifing, overcomes these problems while running hundreds of times faster than existing programs for neighbor-joining bootstrapping. For analysis of large matrices, parsimony jackknifing is hundreds of thousands of times faster than extensive branch-swapping, yet is better able to screen out poorly-supported groups.
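
The character-resampling step that distinguishes jackknifing from bootstrapping can be sketched as follows. The abstract does not state a deletion probability; the default of e^-1 (about 0.37) used here is an assumption drawn from common descriptions of parsimony jackknifing, and the function names and the dictionary-of-strings matrix layout are hypothetical. The tree search that would follow each resampled matrix is deliberately left out.

```python
import math
import random
from typing import Dict, List


def jackknife_columns(
    n_chars: int, rng: random.Random, p_delete: float = math.exp(-1)
) -> List[int]:
    """Keep each character independently; drop it with probability p_delete.

    The default p_delete = 1/e is an assumption, not a value from the abstract.
    """
    return [i for i in range(n_chars) if rng.random() >= p_delete]


def bootstrap_columns(n_chars: int, rng: random.Random) -> List[int]:
    """Sample n_chars characters with replacement (standard nonparametric bootstrap)."""
    return [rng.randrange(n_chars) for _ in range(n_chars)]


def resample_matrix(matrix: Dict[str, str], columns: List[int]) -> Dict[str, str]:
    """Build a pseudoreplicate matrix from the chosen character columns."""
    return {taxon: "".join(seq[i] for i in columns) for taxon, seq in matrix.items()}


if __name__ == "__main__":
    rng = random.Random(0)
    matrix = {"taxon1": "ACGTACGT", "taxon2": "ACGTTCGA", "taxon3": "TCGAACGT"}
    kept = jackknife_columns(8, rng)
    print(kept, resample_matrix(matrix, kept))
    drawn = bootstrap_columns(8, rng)
    print(drawn, resample_matrix(matrix, drawn))
```

A jackknife pseudoreplicate never contains the same character twice, whereas a bootstrap pseudoreplicate usually does; both would then be passed to whatever tree search the analysis uses.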

20.
There has been a sort of cottage industry in the development of randomization routines in systematics beginning with the bootstrap and the jackknife and, in a sense, culminating with various Monte Carlo routines that have been used to assess the performance of phylogenetic methods in limiting circumstances. These methods can be segregated into three basic areas of interest: measures of support such as bootstrap, jackknife, Permutation Tail Probability, T‐PTP, and MoJo; measures of how well independent data are correlated in a phylogenetic framework like PCP for coevolution and Manhattan Stratigraphic Measure (MSM) for stratigraphy; and simulation‐based Monte‐Carlo methods for ascertaining relative performance of optimality criteria or coding methods. Although one approach to assessing cospeciation questions has been the randomization of, for example, hosts and parasite trees, it is well established that in questions that are of a correlative type, the associations themselves are what should be permuted. This has been applied to Brooks' parsimony analysis previously and here to the recent reconciled tree approach to these questions. Although it is debatable whether the extrinsic temporal position of a fossil can stand as refutation of intrinsic morphological character‐based cladograms, one can, nonetheless, determine the strength and significance of fit of stratigraphic data to a cladogram. The only method available in this regard that has been shown to not be biased by tree shape is the MSM and modifications of that. Another similar approach that is new is applied to evaluating the historical informativeness of base composition biases. Incongruence length difference tests too are essentially correlative in nature and comparing the behavior of “perceived” partitions to randomly determined partitions of the same size has become the standard for interpreting the relative conflict between differently acquired data.
Unlike the foregoing, which make full use of the observed structure of the data, Monte Carlo methods require the input of parameters or of models and in that sense the results tend to be lacking in verisimilitude. Nonetheless, these kinds of questions seem to have been those most widely promulgated in our field. The well‐established theoretical proposition that parsimony has problems with adjacent long‐branches was of course illustrated through such methods, much to the concern and angst of systematists. That likelihood later was shown to perform worse than parsimony when those long branches might repel each other has generated less concern and angst. But then many such circumstances can be divined, like the “short‐branch‐mess” problem wherein likelihood has difficulty placing just a single long branch. Overall, then, in the interpretation of these or any other Monte Carlo issues it will be important to critically examine the structure of the modeled process and the scope of inferences that can be drawn therefrom. Modeling situations that are bound to yield results favorable to only one approach (such as unrealistic even splitting of ancestral populations at unrealistically predictable times in examination of the coding of polymorphic data) should be viewed with great caution. More to the point, since history is singular and not repeatable, the utility of statistical approaches may itself be dubious except in very special circumstances: most of the requirements for stochasticity and independence can never be met.
