Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
Quantifying branch support using the bootstrap and/or jackknife is generally considered to be an essential component of rigorous parsimony and maximum likelihood phylogenetic analyses. Previous authors have described how application of the frequency-within-replicates approach to treating multiple equally optimal trees found in a given bootstrap pseudoreplicate can provide apparent support for otherwise unsupported clades. We demonstrate how a similar problem may occur when a non-representative subset of equally optimal trees is held per pseudoreplicate, which we term the undersampling-within-replicates artifact. We illustrate the frequency-within-replicates and undersampling-within-replicates bootstrap and jackknife artifacts using both contrived and empirical examples, demonstrate that the artifacts can occur in both parsimony and likelihood analyses, and show that the artifacts occur in outputs from multiple different phylogenetic-inference programs. Based on our results, we make the following five recommendations, which are particularly relevant to supermatrix analyses, but apply to all phylogenetic analyses. First, when two or more optimal trees are found in a given pseudoreplicate they should be summarized using the strict-consensus rather than frequency-within-replicates approach. Second, jackknife resampling should be used rather than bootstrap resampling. Third, multiple tree searches while holding multiple trees per search should be conducted in each pseudoreplicate rather than conducting only a single search and holding only a single tree. Fourth, branches with a minimum possible optimized length of zero should be collapsed within each tree search rather than collapsing branches only if their maximum possible optimized length is zero. Fifth, resampling values should be mapped onto the strict consensus of all optimal trees found rather than simply presenting the ≥ 50% bootstrap or jackknife tree or mapping the resampling values onto a single optimal tree.
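
The first recommendation in this abstract can be illustrated with a small sketch that contrasts the two ways of summarizing a pseudoreplicate that yields several equally optimal trees. The data representation (each tree as a set of clades, each clade a frozenset of taxon names) and the function names `strict_consensus_support` and `frequency_within_replicates_support` are assumptions made for this example only; they do not come from the paper or from any phylogenetics program.

```python
from collections import Counter

def strict_consensus_support(replicates):
    """Support when each replicate is summarized by its strict consensus.

    `replicates` is a list of pseudoreplicates; each pseudoreplicate is a
    list of equally optimal trees; each tree is a set of clades (frozensets
    of taxon names). A clade scores in a replicate only if every optimal
    tree found in that replicate contains it.
    """
    counts = Counter()
    for trees in replicates:
        shared = set.intersection(*(set(tree) for tree in trees))
        counts.update(shared)
    return {clade: n / len(replicates) for clade, n in counts.items()}

def frequency_within_replicates_support(replicates):
    """Support when each of k optimal trees in a replicate contributes 1/k."""
    counts = Counter()
    for trees in replicates:
        weight = 1.0 / len(trees)
        for tree in trees:
            for clade in tree:
                counts[clade] += weight
    return {clade: n / len(replicates) for clade, n in counts.items()}

# Hypothetical data: in every replicate, clade CD appears in only one of
# the two equally optimal trees, so the replicate does not actually resolve it.
AB, CD = frozenset("AB"), frozenset("CD")
replicates = [[{AB, CD}, {AB}] for _ in range(4)]

strict = strict_consensus_support(replicates)
fwr = frequency_within_replicates_support(replicates)
print(strict.get(CD, 0.0), fwr[CD])  # prints "0.0 0.5"
```

In this toy case the frequency-within-replicates summary reports 50% support for a clade that no replicate resolves unambiguously, while the strict-consensus summary reports none.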

2.
We compared general behaviour trends of resampling methods (bootstrap, bootstrap with Poisson distribution, jackknife, and jackknife with symmetric resampling) and different ways to summarize the results for resampling (absolute frequency, F, and frequency difference, GC') for real data sets under variable resampling strengths in three weighting schemes. We propose an equivalence between bootstrap and jackknife in order to make bootstrap variable across different resampling strengths. Specifically, for each method we evaluated the number of spurious groups (groups not present in the strict consensus of the unaltered data set), of real groups, and of inconsistencies in ranking of groups under variable resampling strengths. We found that GC' always generated more spurious groups and recovered more groups than F. Bootstrap methods generated more spurious groups than jackknife methods; and jackknife is the method that recovered more real groups. We consistently obtained a higher proportion of spurious groups for GC' than for F; and for bootstrap than for jackknife. Finally, we evaluated the ranking of groups under variable resampling strengths qualitatively in the trajectories of "support" against resampling strength, and quantitatively with Kendall coefficient values. We found fewer ranking inconsistencies for GC' than for F, and for bootstrap than for jackknife.
© The Willi Hennig Society 2009.
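
The two summaries compared in this abstract can be sketched as follows. The sketch assumes that the frequency difference (GC) of a group is its absolute frequency minus the frequency of its most frequent contradictory group, which is how the measure is commonly defined; the abstract itself does not spell this out. The function names and the clade-set representation are hypothetical.

```python
def conflicts(a, b):
    """Two clades (sets of taxa) conflict if they overlap but neither contains the other."""
    return bool(a & b) and not (a <= b or b <= a)

def absolute_frequency(group, replicate_trees):
    """F: fraction of replicate trees (sets of clades) that contain the group."""
    return sum(group in tree for tree in replicate_trees) / len(replicate_trees)

def frequency_difference(group, replicate_trees):
    """GC (assumed definition): F of the group minus F of its most frequent rival."""
    rivals = {c for tree in replicate_trees for c in tree if conflicts(group, c)}
    best_rival = max(
        (absolute_frequency(r, replicate_trees) for r in rivals), default=0.0
    )
    return absolute_frequency(group, replicate_trees) - best_rival

# Hypothetical replicate results: AB in 4 of 8 trees, AC in 2, BC in 2.
AB, AC, BC = frozenset("AB"), frozenset("AC"), frozenset("BC")
trees = [{AB}] * 4 + [{AC}] * 2 + [{BC}] * 2

print(absolute_frequency(AB, trees))    # prints 0.5
print(frequency_difference(AB, trees))  # prints 0.25
```

A group found less often than one of its rivals gets a negative value under this definition, which is one way the two summaries can rank groups differently.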

3.

Background  

In recent years, gene order data has attracted increasing attention from both biologists and computer scientists as a new type of data for phylogenetic analysis. If gene orders are viewed as one character with a large number of states, traditional bootstrap procedures cannot be applied. Researchers began to use a jackknife resampling method to assess the quality of gene order phylogenies.
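
One plausible reading of jackknifing gene order data is to delete a random subset of genes from every genome and re-estimate the tree from the reduced orders. The sketch below shows only that resampling step. The signed-integer encoding of genes, the `keep_fraction` parameter, and the function name are assumptions for illustration; the abstract does not specify the procedure.

```python
import random

def jackknife_gene_orders(genomes, keep_fraction, rng):
    """Keep a random subset of genes and drop the rest from every genome.

    `genomes` maps a genome name to its gene order, a list of signed
    integers (the sign encodes strand). Relative order and sign of the
    retained genes are preserved.
    """
    genes = sorted({abs(g) for order in genomes.values() for g in order})
    n_keep = max(1, round(keep_fraction * len(genes)))
    kept = set(rng.sample(genes, n_keep))
    return {
        name: [g for g in order if abs(g) in kept]
        for name, order in genomes.items()
    }

# Hypothetical genomes sharing six genes; the second one carries an inversion.
genomes = {
    "g1": [1, 2, 3, 4, 5, 6],
    "g2": [1, -3, -2, 4, 6, 5],
}
sub = jackknife_gene_orders(genomes, 0.5, random.Random(0))
print(sub)
```

Each pseudoreplicate produced this way would be passed to the same tree-building method as the full data, and a clade's support would be the fraction of pseudoreplicates in which it is recovered.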

4.
The clade size effect refers to a bias that causes middle‐sized clades to be less supported than small or large‐sized clades. This bias is present in resampling measures of support calculated under maximum likelihood and maximum parsimony and in Bayesian posterior probabilities. Previous analyses indicated that the clade size effect is worst in maximum parsimony, followed by maximum likelihood, while Bayesian inference is the least affected. Homoplasy was interpreted as the main cause of the effect. In this study, we explored the presence of the clade size effect in alternative measures of branch support under maximum parsimony: Bremer support and symmetric resampling, expressed as absolute frequencies and frequency differences. Analyses were performed using 50 molecular and morphological matrices. Symmetric resampling showed the same tendency that bootstrap and jackknife did for maximum parsimony and maximum likelihood. Few matrices showed a significant bias using Bremer support, presenting a better performance than resampling measures of support and comparable to Bayesian posterior probabilities. Our results indicate that the problem is not maximum parsimony, but resampling measures of support. We corroborated the role of homoplasy as a possible cause of the clade size effect, increasing the number of random trees during the resampling, which together with the higher chances that medium‐sized clades have of being contradicted generates the bias during the perturbation of the original matrix, making it stronger in resampling measures of support.

5.
In addition to hypothesis optimality, the evaluation of clade (group, edge, split, node) support is an important aspect of phylogenetic analysis. Here we clarify the logical relationship between support and optimality and formulate adequacy conditions for support measures. Support, S, and optimality, O, are both empirical knowledge claims about the strength of hypotheses, h1, h2, …hn, in relation to evidence, e, given background knowledge, b. Whereas optimality refers to the absolute strength of hypotheses, support refers to the relative strength of hypotheses. Consequently, support and optimality are logically related such that they vary in direct proportion to each other, S(h | e,b) ∝ O(h | e,b). Furthermore, in order for a support measure to be objective it must quantify support as a function of explanatory power. For example, Goodman–Bremer support and ratio of explanatory power (REP) support satisfy the adequacy requirement S(h | e,b) ∝ O(h | e,b) and calculate support as a function of explanatory power. As such, these are adequate measures of objective support. The equivalent measures for statistical optimality criteria are the likelihood ratio (or log‐likelihood difference) and likelihood difference support measures for maximum likelihood and the posterior probability ratio and posterior probability difference support measures for Bayesian inference. These statistical support measures satisfy the adequacy requirement S(h | e,b) ∝ O(h | e,b) and to that extent are internally consistent; however, they do not quantify support as a function of explanatory power and therefore are not measures of objective support. 
Neither the relative fit difference (RFD; relative GB support) nor any of the parsimony (bootstrap and jackknife character resampling) or statistical [bootstrap character resampling, Markov chain Monte Carlo (MCMC) clade frequencies] support measures based on clade frequencies satisfy the adequacy condition S(h | e,b) ∝ O(h | e,b) or calculate support as a function of explanatory power. As such, they are not adequate support measures. © The Willi Hennig Society 2008.

6.

Background  

Many analyses of microarray association studies involve permutation, bootstrap resampling and cross-validation, which are ideally formulated as embarrassingly parallel computing problems. Given that these analyses are computationally intensive, scalable approaches that can take advantage of multi-core processor systems need to be developed.

7.

Background  

Hierarchical clustering is a widely applied tool in the analysis of microarray gene expression data. The assessment of cluster stability is a major challenge in clustering procedures. Statistical methods are required to distinguish between real and random clusters. Several methods for assessing cluster stability have been published, including resampling methods such as the bootstrap.

8.
Grant and Kluge (2003) associated resampling measures of group support with the aim of evaluating statistical stability, confidence, or the probability of recovering a true phylogenetic group. This interpretation is not necessary to methods such as jackknifing or bootstrapping, which are better interpreted as measures of support from the current dataset. Grant and Kluge only accepted the absolute Bremer value as a measure of group support, and considered resampling methods as irrelevant to phylogenetic inference. It is shown that under simple circumstances resampling indices better reflect the degree of support than Bremer values. Grant and Kluge associated the resampling methods (and the use of measures of group support in general) with what they call a “verificationist agenda”, where strongly supported groups are first detected, and then protected against additional testing. They propose that identifying weakly supported groups, and then concentrating additional tests on them, will better serve science. Both programs are actually equivalent, and inert as to the selection of methods to estimate group support. The ranking of groups under a range of resampling strengths is proposed as an additional criterion to evaluate resampling methods. A reexamination of the slope of symmetric resampling frequency as a function of resampling strength suggests that slopes can also be problematic as a measure of group support. © The Willi Hennig Society 2005.

9.

Background  

Non-parametric bootstrapping is a widely-used statistical procedure for assessing confidence of model parameters based on the empirical distribution of the observed data [1] and, as such, it has become a common method for assessing tree confidence in phylogenetics [2]. Traditional non-parametric bootstrapping does not weigh each tree inferred from resampled (i.e., pseudo-replicated) sequences. Hence, the quality of these trees is not taken into account when computing bootstrap scores associated with the clades of the original phylogeny. As a consequence, traditionally, the trees with different bootstrap support or those providing a different fit to the corresponding pseudo-replicated sequences (the fit quality can be expressed through the LS, ML or parsimony score) contribute in the same way to the computation of the bootstrap support of the original phylogeny.

10.
In this study, we used an empirical example based on 100 mitochondrial genomes from higher teleost fishes to compare the accuracy of parsimony-based jackknife values with Bayesian support values. Phylogenetic analyses of 366 partitions, using differential taxon and character sampling from the entire data matrix of 100 taxa and 7,990 characters, were performed for both phylogenetic methods. The tree topology and branch-support values from each partition were compared with the tree inferred from all taxa and characters. Using this approach, we quantified the accuracy of the branch-support values assigned by the jackknife and Bayesian methods, with respect to each of 15 basal clades. In comparing the jackknife and Bayesian methods, we found that (1) both measures of support differ significantly from an ideal support index; (2) the jackknife underestimated support values; (3) the Bayesian method consistently overestimated support; (4) the magnitude by which Bayesian values overestimate support exceeds the magnitude by which the jackknife underestimates support; and (5) both methods performed poorly when taxon sampling was increased and character sampling was not increased. These results indicate that (1) the higher Bayesian support values are inappropriate (in magnitude), and (2) Bayesian support values should not be interpreted as probabilities that clades are correctly resolved. We advocate the continued use of the relatively conservative bootstrap and jackknife approaches to estimating branch support rather than the more extreme overestimates provided by the Markov Chain Monte Carlo-based Bayesian methods.

11.

Background

Higher-level relationships within the Lepidoptera, and particularly within the species-rich subclade Ditrysia, are generally not well understood, although recent studies have yielded progress. We present the most comprehensive molecular analysis of lepidopteran phylogeny to date, focusing on relationships among superfamilies.

Methodology / Principal Findings

483 taxa spanning 115 of 124 families were sampled for 19 protein-coding nuclear genes, from which maximum likelihood tree estimates and bootstrap percentages were obtained using GARLI. Assessment of heuristic search effectiveness showed that better trees and higher bootstrap percentages probably remain to be discovered even after 1000 or more search replicates, but further search proved impractical even with grid computing. Other analyses explored the effects of sampling nonsynonymous change only versus partitioned and unpartitioned total nucleotide change; deletion of rogue taxa; and compositional heterogeneity. Relationships among the non-ditrysian lineages previously inferred from morphology were largely confirmed, plus some new ones, with strong support. Robust support was also found for divergences among non-apoditrysian lineages of Ditrysia, but only rarely so within Apoditrysia. Paraphyly for Tineoidea is strongly supported by analysis of nonsynonymous-only signal; conflicting, strong support for tineoid monophyly when synonymous signal was added back is shown to result from compositional heterogeneity.

Conclusions / Significance

Support for among-superfamily relationships outside the Apoditrysia is now generally strong. Comparable support is mostly lacking within Apoditrysia, but dramatically increased bootstrap percentages for some nodes after rogue taxon removal, and concordance with other evidence, strongly suggest that our picture of apoditrysian phylogeny is approximately correct. This study highlights the challenge of finding optimal topologies when analyzing hundreds of taxa. It also shows that some nodes get strong support only when analysis is restricted to nonsynonymous change, while total change is necessary for strong support of others. Thus, multiple types of analyses will be necessary to fully resolve lepidopteran phylogeny.

12.
The phylogenetic relationships of 22 species of Coelopidae are reconstructed based on a data matrix consisting of morphological and DNA sequence characters (16S rDNA, EF-1alpha). Optimal gap and transversion costs are determined via a sensitivity analysis and both equal weighting and a transversion cost of 2 are found to perform best based on taxonomic congruence, character incongruence, and tree support. The preferred phylogenetic hypothesis is fully resolved and well-supported by jackknife, bootstrap, and Bremer support values, but it is in conflict with the cladogram based on morphological characters alone. Most notably, the Coelopidae and the genus Coelopa are not monophyletic. However, partitioned Bremer support and an analysis of node stability under different gap and transversion costs reveal that the critical clades rendering these taxa non-monophyletic are poorly supported. Furthermore, the monophyly of Coelopidae and Coelopa is not rejected in analyses using 16S rDNA that was manually aligned. The resolution of the tree based on this reduced data set is, however, lower than for the tree based on the full data set. Partitioned Bremer support values reveal that 16S rDNA characters provide the largest amount of tree support, but the support values are heavily dependent on analysis conditions. Problems with direct comparison of branch support values for trees derived using fixed alignments with those obtained under optimization alignment are discussed. Biogeographic history and available behavioral and genetic data are also discussed in light of this first cladogram for Coelopidae based on a quantitative phylogenetic analysis.

13.
ANOTHER MONOPHYLY INDEX: REVISITING THE JACKKNIFE   Cited by: 1 (self-citations: 0, citations by others: 1)
Abstract: Randomization routines have quickly gained wide usage in phylogenetic systematics. Introduced a decade ago, the jackknife has rarely been applied in cladistic methodology. This data resampling technique was re-investigated here as a means to discover the effect that taxon removal may have on the stability of the results obtained from parsimony analyses. This study shows that the removal of even a single taxon in an analysis can cause a solution of relatively few multiple equally parsimonious trees in an inclusive matrix to result in hundreds of equally parsimonious trees with the single removal of a taxon. On the other hand, removal of other taxa can stabilize the results to fewer trees. An index of clade stability, the Jackknife Monophyly Index (JMI) is developed which, like the bootstrap, applies a value to each clade according to its frequency of occurrence in jackknife pseudoreplicates. Unlike the bootstrap and earlier application of the jackknife, alternative suboptimal hypotheses are not forwarded by this method. Only those clades in the most parsimonious tree(s) are given JMI values. The behaviour of this index is investigated both in relation to a hypothetical and a real data set, as well as how it performs in comparison to the bootstrap. The JMI is found to not be influenced by uninformative characters or relative synapomorphy number, unlike the bootstrap.

14.
Several large phylogenomic analyses have recently cast doubt on long‐held beliefs about early metazoan phylogenetic patterns. Those data sets, and the relative bootstrap support for various controversial clades, are reanalysed in the context of parsimony, yielding results that are at considerable odds with the original likelihood or Bayesian findings. Discrepancies are considered in light of the tendency of RAxML to overestimate support values by virtue (sic) of its lazy search algorithm and its autocorrelated pseudoreplication as well as the extraordinary ability for Bayesian analyses to be led astray by missing data. In addition to standard nonparametric bootstrapping as a measure of support, a new strategy involving resampling loci as units, partition bootstrap support, is introduced as a more defensible alternative to resampling nonindependent sites. © The Willi Hennig Society 2009.
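
The idea of resampling loci as units, rather than individual sites, can be sketched as below. This is a minimal illustration of the resampling step only, under the assumption that each locus is stored as a mapping from taxon name to aligned sequence; the function name `partition_bootstrap` and the data layout are hypothetical and are not taken from the paper.

```python
import random

def partition_bootstrap(loci, rng):
    """Draw whole loci with replacement and concatenate them per taxon.

    `loci` maps a locus name to {taxon: aligned sequence}. The number of
    loci drawn equals the number of loci in the original data, so a locus
    may appear several times or not at all in the pseudoreplicate.
    """
    names = list(loci)
    drawn = [rng.choice(names) for _ in names]
    taxa = list(next(iter(loci.values())))
    matrix = {
        taxon: "".join(loci[name][taxon] for name in drawn) for taxon in taxa
    }
    return drawn, matrix

# Hypothetical three-locus data set for two taxa.
loci = {
    "locus1": {"A": "ACGT", "B": "ACGA"},
    "locus2": {"A": "GG", "B": "GC"},
    "locus3": {"A": "TTT", "B": "TTA"},
}
drawn, matrix = partition_bootstrap(loci, random.Random(1))
print(drawn, matrix)
```

Because sites within a locus stay together, the pseudoreplicate keeps whatever dependence exists among sites of the same locus, which is the motivation the abstract gives for the approach.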

15.

Background

Using gene order as a phylogenetic character has the potential to resolve previously unresolved species relationships. This character was used to resolve the evolutionary history within the genus Prochlorococcus, a group of marine cyanobacteria.

Methodology/Principal Findings

Orthologous gene sets and their genomic positions were identified from 12 species of Prochlorococcus and 1 outgroup species of Synechococcus. From this data, inversion and breakpoint distance-based phylogenetic trees were computed by GRAPPA and FastME. Statistical support of the resulting topology was obtained by application of a 50% jackknife resampling technique. The result was consistent and congruent with nucleotide sequence-based and gene-content based trees. Also, a previously unresolved clade was resolved, that of MIT9211 and SS120.

Conclusions/Significance

This is the first study to use gene order data to resolve a bacterial phylogeny at the genus level. It suggests that the technique is useful in resolving the Tree of Life.

16.
A covariance estimator for GEE with improved small-sample properties   Cited by: 2 (self-citations: 0, citations by others: 2)
Mancl LA, DeRouen TA. Biometrics 2001, 57(1): 126-134
In this paper, we propose an alternative covariance estimator to the robust covariance estimator of generalized estimating equations (GEE). Hypothesis tests using the robust covariance estimator can have inflated size when the number of independent clusters is small. Resampling methods, such as the jackknife and bootstrap, have been suggested for covariance estimation when the number of clusters is small. A drawback of the resampling methods when the response is binary is that the methods can break down when the number of subjects is small due to zero or near-zero cell counts caused by resampling. We propose a bias-corrected covariance estimator that avoids this problem. In a small simulation study, we compare the bias-corrected covariance estimator to the robust and jackknife covariance estimators for binary responses for situations involving 10-40 subjects with equal and unequal cluster sizes of 16-64 observations. The bias-corrected covariance estimator gave tests with sizes close to the nominal level even when the number of subjects was 10 and cluster sizes were unequal, whereas the robust and jackknife covariance estimators gave tests with sizes that could be 2-3 times the nominal level. The methods are illustrated using data from a randomized clinical trial on treatment for bone loss in subjects with periodontal disease.

17.
An analysis of the relationship between the number of loci utilized in an electrophoretic study of genetic relationships and the statistical support for the topology of UPGMA trees is reported for two published data sets. These are Highton and Larson (Syst. Zool. 28: 579-599, 1979), an analysis of the relationships of 28 species of plethodonine salamanders, and Hedges (Syst. Zool. 35: 1-21, 1986), a similar study of 30 taxa of Holarctic hylid frogs. As the number of loci increases, the statistical support for the topology at each node in UPGMA trees was determined by both the bootstrap and jackknife methods. The results show that the bootstrap and jackknife probabilities supporting the topology at some nodes of UPGMA trees increase as the number of loci utilized in a study is increased, as expected for nodes that have groupings that reflect phylogenetic relationships. The pattern of increase varies and is especially rapid in the case of groups with no close relatives. At nodes that likely do not represent correct phylogenetic relationships, the bootstrap probabilities do not increase and often decline with the addition of more loci.

18.

Background  

We have developed a new haplotyping program based on the combination of an iterative multiallelic EM algorithm (IEM), bootstrap resampling and a pseudo Gibbs sampler. The use of the IEM-bootstrap procedure considerably reduces the space of possible haplotype configurations to be explored, greatly reducing computation time, while the adaptation of the Gibbs sampler with a recombination model on this restricted space maintains high accuracy. On large SNP datasets (>30 SNPs), we used a segmented approach based on a specific partition-ligation strategy. We compared this software, Ishape (Iterative Segmented HAPlotyping by Em), with reference programs such as Phase, Fastphase, and PL-EM. Analogously with Phase, there are 2 versions of Ishape: Ishape1 which uses a simple coalescence model for the pseudo Gibbs sampler step, and Ishape2 which uses a recombination model instead.

19.
We report and analyze nucleotide sequence variation in the first exon (1158 bp) of the nuclear gene encoding the Interphotoreceptor Retinoid Binding Protein (IRBP) among 21 species representing all 15 currently recognized genera of living didelphids. Six previously published IRBP sequences representing five nondidelphimorph marsupial orders were also analyzed to test didelphid monophyly, and 12 published sequences representing ten placental orders were used as outgroups. No gaps (indels) are necessary to align didelphid sequences, but one short region (35 bp) is alignment-ambiguous among nondidelphids. Uncorrected pairwise sequence divergence ranges from 0.7 to 5.7% among nonconspecific didelphids, from 9.2 to 15.3% between didelphids and nondidelphid marsupials, and from 24.9 to 32.1% between marsupials and placentals. Neither transitions nor transversions exhibit saturation for any codon position at any level of taxonomic comparison. Parsimony analyses of these data provide strong support (bootstrap values >95%, Bremer values 7) for the monophyly of (1) Didelphidae ("caluromyines" + Didelphinae); (2) a group containing Caluromys and Caluromysiops; (3) Didelphinae; (4) a group of large opossums that includes Metachirus; (5) a group containing the remaining large opossums (with 2N = 22 chromosomes); (6) a group containing Marmosa and Micoureus; (7) a group containing Thylamys, Lestodelphys, and Gracilinanus; and (8) a group containing the last three genera plus a monophyletic Marmosops. In addition, we found moderate support (bootstrap values >80%, Bremer values 2) for the monophyly of Thylamys + Lestodelphys and for a sister-group relationship between Monodelphis and Marmosa + Micoureus. Sensitivity analysis suggests that all of these clades, together with their associated levels of bootstrap and Bremer support, are robust to alternative hypotheses of positional homology within the ambiguously alignable region. 
Although some of the relationships supported by IRBP are not consistent with the results of published morphological analyses, our reassessment of the morphological data suggests that many conflicts are more apparent than real.
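
The uncorrected pairwise sequence divergence reported in this abstract is, in its usual form, the percentage of aligned sites at which two sequences differ. The sketch below computes it for two aligned strings. Skipping sites where either sequence has a gap or an ambiguity code is an assumption of this example (the abstract notes that didelphid sequences align without gaps), and the function name is hypothetical.

```python
def uncorrected_divergence(seq_a, seq_b):
    """Percentage of compared sites that differ between two aligned sequences.

    Only sites where both sequences carry an unambiguous base (A, C, G, T)
    are compared; this treatment of gaps and ambiguity codes is an
    assumption of the sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            differing += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return 100.0 * differing / compared

print(uncorrected_divergence("ACGTACGTAC", "ACGTACGTAT"))  # prints 10.0
```

No substitution model is applied, so multiple changes at the same site are counted at most once; that is what "uncorrected" refers to.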

20.
The success of resampling approaches to branch support depends on the effectiveness of the underlying tree searches. Two primary factors are identified as key: the depth of tree search and the number of trees saved per resampling replicate. Two datasets were explored for a range of search parameters using jackknifing. Greater depth of tree search tends to increase support values because shorter trees conflict less with each other, while increasing numbers of trees saved tends to reduce support values because of conflict that reduces structure in the replicate consensus. Although a relatively small amount of branch swapping will achieve near‐accurate values for a majority of clades, some clades do not yield accurate values until more extensive searches are performed. This means that in order to maximize the accuracy of resampling analyses, one should employ as extensive a search strategy as possible, and save as many trees per replicate as possible. Strict consensus summary of resampling replicates is preferable to frequency‐within‐replicates summary because it is a more conservative approach to the reporting of replicate results. Jackknife analysis is preferable to bootstrap because of its closer relationship to the original data. © The Willi Hennig Society 2010.
