Similar literature
 20 similar articles found; search took 15 ms
1.
Quantifying branch support using the bootstrap and/or jackknife is generally considered to be an essential component of rigorous parsimony and maximum likelihood phylogenetic analyses. Previous authors have described how application of the frequency-within-replicates approach to treating multiple equally optimal trees found in a given bootstrap pseudoreplicate can provide apparent support for otherwise unsupported clades. We demonstrate how a similar problem may occur when a non-representative subset of equally optimal trees is held per pseudoreplicate, which we term the undersampling-within-replicates artifact. We illustrate the frequency-within-replicates and undersampling-within-replicates bootstrap and jackknife artifacts using both contrived and empirical examples, demonstrate that the artifacts can occur in both parsimony and likelihood analyses, and show that the artifacts occur in outputs from multiple different phylogenetic-inference programs. Based on our results, we make the following five recommendations, which are particularly relevant to supermatrix analyses, but apply to all phylogenetic analyses. First, when two or more optimal trees are found in a given pseudoreplicate they should be summarized using the strict-consensus rather than frequency-within-replicates approach. Second, jackknife resampling should be used rather than bootstrap resampling. Third, multiple tree searches while holding multiple trees per search should be conducted in each pseudoreplicate rather than conducting only a single search and holding only a single tree. Fourth, branches with a minimum possible optimized length of zero should be collapsed within each tree search rather than collapsing branches only if their maximum possible optimized length is zero. Fifth, resampling values should be mapped onto the strict consensus of all optimal trees found rather than simply presenting the ≥ 50% bootstrap or jackknife tree or mapping the resampling values onto a single optimal tree.
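
The two ways of summarizing equally optimal trees within a pseudoreplicate can be contrasted with a minimal Python sketch. It is not taken from the paper: trees are reduced to sets of clades, and the function names and the three-taxon toy data are assumptions made purely for illustration.

```python
from typing import FrozenSet, List, Set

Clade = FrozenSet[str]
Tree = Set[Clade]  # a tree reduced to the set of clades it contains


def strict_consensus_support(replicates: List[List[Tree]], clade: Clade) -> float:
    """Fraction of pseudoreplicates in which every optimal tree contains the clade."""
    hits = sum(1 for trees in replicates if all(clade in tree for tree in trees))
    return hits / len(replicates)


def frequency_within_replicates_support(replicates: List[List[Tree]], clade: Clade) -> float:
    """Mean, over pseudoreplicates, of the fraction of optimal trees containing the clade."""
    fractions = [sum(clade in tree for tree in trees) / len(trees) for trees in replicates]
    return sum(fractions) / len(fractions)


# Hypothetical toy data: each pseudoreplicate returns three equally optimal
# trees, one for each possible pairing of taxa A, B and C.
ab, ac, bc = frozenset("AB"), frozenset("AC"), frozenset("BC")
unresolved = [{ab}, {ac}, {bc}]
replicates = [unresolved, unresolved]

strict = strict_consensus_support(replicates, ab)
fwr = frequency_within_replicates_support(replicates, ab)
```

With these inputs the clade A+B receives no support under the strict-consensus summary but about one third under frequency-within-replicates, even though no pseudoreplicate resolves it unambiguously.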

2.
An expanded plastid DNA phylogeny for Orchidaceae was generated from sequences of rbcL and matK for representatives of all five subfamilies. The data were analyzed using equally weighted parsimony, and branch support was assessed with jackknifing. The analysis supports recognition of five subfamilies with the following relationships: (Apostasioideae (Vanilloideae (Cypripedioideae (Orchidoideae (Epidendroideae))))). Support for many tribal-level groups within Epidendroideae is evident, but relationships among these groups remain uncertain, probably due to a rapid radiation in the subfamily that resulted in short branches along the spine of the tree. A series of experiments examined jackknife parameters and strategies to determine a reasonable balance between computational effort and results. We found that support values plateau rapidly with increased search effort. Tree bisection-reconnection swapping in a single search replicate per jackknife replicate and saving only two trees resulted in values that were close to those obtained in the most extensive searches. Although this approach uses considerably more computational effort than less extensive (or no) swapping, the results were also distinctly better. The effect of saving a maximal number of trees in each jackknife replicate can also be pronounced and is important for representing support accurately.

3.
Contemporary phylogenomic studies frequently incorporate two-step coalescent analyses wherein the first step is to infer individual-gene trees, generally using maximum likelihood as implemented in the popular programs PhyML or RAxML. Four concerns with this approach are that these programs only present a single fully resolved gene tree to the user despite potential for ambiguous support, insufficient phylogenetic signal to fully resolve each gene tree, inexact computer arithmetic affecting the reported likelihood of gene trees, and an exclusive focus on the most likely tree while ignoring trees that are only slightly suboptimal or within the error tolerance. Taken together, these four concerns are sufficient for RAxML and PhyML users to be suspicious of the resulting (perhaps over-resolved) gene-tree topologies and (perhaps unjustifiably high) bootstrap support for individual clades. In this study, we sought to determine how frequently these concerns apply in practice to contemporary phylogenomic studies that use RAxML for gene-tree inference. We did so by re-analyzing 100 genes from each of ten studies that, taken together, are representative of many empirical phylogenomic studies. Our seven findings are as follows. First, the few search replicates that are frequently applied in phylogenomic studies are generally insufficient to find the optimal gene-tree topology. Second, there is often more topological variation among slightly suboptimal gene trees relative to the best-reported tree than can be safely ignored. Third, the Shimodaira–Hasegawa-like approximate likelihood ratio test is highly effective at identifying dubiously supported clades and outperforms the alternative approaches of relying on bootstrap support or collapsing minimum-length branches. Fourth, the bootstrap can, but rarely does, indicate high support for clades that are not supported amongst slightly suboptimal trees. Fifth, increasing the accuracy by which RAxML optimizes model-parameter values generally has a nominal effect on selection of optimal trees. Sixth, tree searches using the GTRCAT model were generally less effective at finding optimal known trees than those using the GTRGAMMA model. Seventh, choice of gene-tree sampling strategy can affect inferred coalescent branch lengths, species-tree topology and branch support.

4.
A new consensus method for summarizing competing phylogenetic hypotheses, weighted compromise, is described. The method corrects for a bias inherent in majority‐rule consensus/compromise trees when the source trees exhibit non‐independence due to ambiguity in terminal clades. Suggestions are given for its employment in parsimony analyses and tree resampling strategies such as bootstrapping and jackknifing. An R function is described that can be used with the programming language R to produce the consensus.

5.
Abstract— Protein variation among 37 species of carcharhiniform sharks was examined at 17 presumed loci. Evolutionary trees were inferred from these data using both cladistic character and a distance Wagner analysis. Initial cladistic character analysis resulted in more than 30 000 equally parsimonious tree arrangements. Randomization tests designed to evaluate the phylogenetic information content of the data suggest the data are highly significantly different from random in spite of the large number of parsimonious trees produced. Different starting seed trees were found to influence the kind of tree topologies discovered by the heuristic branch swapping algorithm used. The trees generated during the early phases of branch swapping on a single seed tree were found to be topologically similar to those generated throughout the course of branch swapping. Successive weighting increased the frequency and the consistency with which certain clades were found during the course of branch swapping, causing the semi-strict consensus to be more resolved. Successive weighting also appeared resilient to the bias associated with the choice of initial seed tree causing analyses seeded with different trees to converge on identical final character weights and the same semi-strict consensus tree.
The summary cladistic character analysis and the distance Wagner analysis both support the monophyly of two major clades, the genus Rhizoprionodon and the genus Sphyrna. The distance Wagner analysis also supports the monophyly of the genus Carcharhinus. However, the cladistic analysis suggests that Carcharhinus is a paraphyletic group that includes the blue shark Prionace glauca.

6.
The clade size effect refers to a bias that causes middle‐sized clades to be less supported than small or large‐sized clades. This bias is present in resampling measures of support calculated under maximum likelihood and maximum parsimony and in Bayesian posterior probabilities. Previous analyses indicated that the clade size effect is worst in maximum parsimony, followed by maximum likelihood, while Bayesian inference is the least affected. Homoplasy was interpreted as the main cause of the effect. In this study, we explored the presence of the clade size effect in alternative measures of branch support under maximum parsimony: Bremer support and symmetric resampling, expressed as absolute frequencies and frequency differences. Analyses were performed using 50 molecular and morphological matrices. Symmetric resampling showed the same tendency that bootstrap and jackknife did for maximum parsimony and maximum likelihood. Few matrices showed a significant bias using Bremer support, presenting a better performance than resampling measures of support and comparable to Bayesian posterior probabilities. Our results indicate that the problem is not maximum parsimony, but resampling measures of support. We corroborated the role of homoplasy as a possible cause of the clade size effect, increasing the number of random trees during the resampling, which together with the higher chances that medium‐sized clades have of being contradicted generates the bias during the perturbation of the original matrix, making it stronger in resampling measures of support.

7.
Incongruence among trees reconstructed with different data may stem from historical (gene tree‐species tree conflict) or process (character change biases) phenomena. Regardless of the source, incongruent data, as determined with “global” measures of homoplasy, have often been excluded from parsimony analysis of the combined data. Recent studies suggest that these homoplasy measures do not predict the contribution of each character to overall tree structure. Branch support measures identify, on a character to node basis, sources of support and conflict resulting from a simultaneous analysis of the data. We implement these branch support measures to identify sources of character conflict in a clade of water striders consisting of Gerris Fabricius, Aquarius Schellenberg, and Limnoporus Stål species. Separate analyses of morphology, mitochondrial cytochrome oxidase I (COI), large mitochondrial ribosomal subunit (16SrRNA), and elongation factor‐1α (EF‐1α) data resulted in cladograms that varied in resolution and topological concordance. Simultaneous analysis of the data resulted in two trees that were unresolved for one node in a strict consensus. The topology agreed with current classification except for the placements of Aquarius chilensis and the Aquarius remigis species group closer to Gerris than to congeneric species. Branch support measures indicated that support derived from each data set varied among nodes, but COI had an overall negative effect on branch support. However, Spearman rank correlation of partitioned branch support values indicated no negative associations of branch support between any data sets and a positive association between EF‐1α and 16SrRNA. Thus incongruence among data sets was not drastic and the gene‐tree versus species tree phenomenon was not implicated. Biases in character change were a more likely reason for incongruence, although saturation curves and incongruence length difference for COI indicated little potential for homoplasy. 
However, a posteriori inspection of COI nucleotide change with reference to the simultaneous analysis tree revealed AT and codon biases. These biases were not associated with branch support measures. Therefore, it is difficult to predict incongruence or identify its cause. Exclusion of data is ill advised because every character is potentially parsimony informative.

8.
Phylogenetic trees from multiple genes can be obtained in two fundamentally different ways. In one, gene sequences are concatenated into a super-gene alignment, which is then analyzed to generate the species tree. In the other, phylogenies are inferred separately from each gene, and a consensus of these gene phylogenies is used to represent the species tree. Here, we have compared these two approaches by means of computer simulation, using 448 parameter sets, including evolutionary rate, sequence length, base composition, and transition/transversion rate bias. In these simulations, we emphasized a worst-case scenario analysis in which 100 replicate datasets for each evolutionary parameter set (gene) were generated, and the replicate dataset that produced a tree topology showing the largest number of phylogenetic errors was selected to represent that parameter set. Both randomly selected and worst-case replicates were utilized to compare the consensus and concatenation approaches primarily using the neighbor-joining (NJ) method. We find that the concatenation approach yields more accurate trees, even when the sequences concatenated have evolved with very different substitution patterns and no attempts are made to accommodate these differences while inferring phylogenies. These results appear to hold true for parsimony and likelihood methods as well. The concatenation approach shows >95% accuracy with only 10 genes. However, this gain in accuracy is sometimes accompanied by reinforcement of certain systematic biases, resulting in spuriously high bootstrap support for incorrect partitions, whether we employ site, gene, or a combined bootstrap resampling approach. Therefore, it will be prudent to report the number of individual genes supporting an inferred clade in the concatenated sequence tree, in addition to the bootstrap support.

9.
In recent years, the emphasis of theoretical work on phylogenetic inference has shifted from the development of new tree inference methods to the development of methods to measure the statistical support for the topologies. This paper reviews 3 approaches to assign support values to branches in trees obtained in the analysis of molecular sequences: the bootstrap, the Bayesian posterior probabilities for clades, and the interior branch tests. In some circumstances, these methods give different answers. This should not be surprising, because their assumptions are different. Thus the interior branch tests assume that a given topology is true and only consider whether a particular branch length is longer than zero. If a tree is incorrect, a wrong branch (a low bootstrap or Bayesian support may be an indication) may have a non-zero length. If the substitution model is oversimplified, the length of a branch may be overestimated, and the Bayesian support for the branch may be inflated. The bootstrap, on the other hand, approximates the variance of the data under the real model of sequence evolution, because it involves direct resampling from these data. Thus the discrepancy between the Bayesian support and the bootstrap support may signal model inaccuracy. In practical application, use of all 3 methods is recommended, and if discrepancies are observed, then a careful analysis of their potential origins should be made.

10.
A numerical cladistic analysis, based on 23 terminal groups and 63 morphological characters, was done to infer phylogenetic relationships within the Eurasian catfish family Siluridae. Nine hundred and forty-five equally most parsimonious trees (134 steps, consistency index 0.634) were found that differ in their resolutions of four polychotomies. Strict consensus of these trees includes ten internal nodes, does not support monophyly of Silurus, Ompok and Kryptopterus, as usually defined, and offers ambiguous support for monophyly of Wallago. Silurus and Kryptopterus are each composed of two non-sister group clades, and Ompok is composed of at least two such clades. Heuristic searches constrained by monophyly of Silurus, Ompok or Kryptopterus yielded trees five or six steps longer than the shortest trees free of constraints. The strict consensus also infers a basal dichotomy that separates the Siluridae into a temperate Eurasian clade with about 20 nominal species and a subtropical/tropical south and southeast Asian clade with about 75 nominal species. The distributions of these clades overlap in a relatively narrow region of east Asia. A heuristic search for trees 1 step longer than the shortest trees yielded 253890 trees. A strict consensus of these trees also infers a basal dichotomy between the above-mentioned clades. This analysis revealed four additional putative synapomorphies of the Siluridae, pending further resolution of the family's outgroup relationships.

11.
The statistical properties of sample estimation and bootstrap estimation of phylogenetic variability from a sample of nucleotide sequences are studied by using model trees of three taxa with an outgroup and by assuming a constant rate of nucleotide substitution. The maximum-parsimony method of tree reconstruction is used. An analytic formula is derived for estimating the sequence length that is required if P, the probability of obtaining the true tree from the sampled sequences, is to be equal to or higher than a given value. Bootstrap estimation is formulated as a two-step sampling procedure: (1) sampling of sequences from the evolutionary process and (2) resampling of the original sequence sample. The probability that a bootstrap resampling of an original sequence sample will support the true tree is found to depend on the model tree, the sequence length, and the probability that a randomly chosen nucleotide site is an informative site. When a trifurcating tree is used as the model tree, the probability that one of the three bifurcating trees will appear in ≥ 95% of the bootstrap replicates is < 5%, even if the number of bootstrap replicates is only 50; therefore, the probability of accepting an erroneous tree as the true tree is < 5% if that tree appears in ≥ 95% of the bootstrap replicates and if more than 50 bootstrap replications are conducted. However, if a particular bifurcating tree is observed in, say, < 75% of the bootstrap replicates, then it cannot be claimed to be better than the trifurcating tree even if ≥ 1,000 bootstrap replications are conducted. When a bifurcating tree is used as the model tree, the bootstrap approach tends to overestimate P when the sequences are very short, but it tends to underestimate that probability when the sequences are long. Moreover, simulation results show that, if a tree is accepted as the true tree only if it has appeared in ≥ 95% of the bootstrap replicates, then the probability of failing to accept any bifurcating tree can be as large as 58% even when P = 95%, i.e., even when 95% of the samples from the evolutionary process will support the true tree. Thus, if the rate-constancy assumption holds, bootstrapping is a conservative approach for estimating the reliability of an inferred phylogeny for four taxa.

12.
Matrix representation with parsimony (MRP) supertree construction has been criticized because the supertree may specify clades that are contradicted by every source tree contributing to it. Such unsupported clades may also occur using other supertree methods; however, their incidence is largely unknown. In this study, I investigated the frequency of unsupported clades in both simulated and empirical MRP supertrees. Here, I propose a new index, QS, to quantify the qualitative support for a supertree and its clades among the set of source trees. Results show that unsupported clades are very rare in MRP supertrees, occurring most often when there are few source trees that all possess the same set of taxa. However, even under these conditions the frequency of unsupported clades was <0.2%. Unsupported clades were absent from both the Carnivora and Lagomorpha supertrees, reflecting the use of large numbers of source trees for both. The proposed QS indices are correlated broadly with another measure of quantitative clade support (bootstrap frequencies, as derived from resampling of the MRP matrix) but appear to be more sensitive. More importantly, they sample at the level of the source trees and thus, unlike the bootstrap, are suitable for summarizing the support of MRP supertree clades.

13.
Abstract Phylogenetic relationships of 25 genera of Holarctic Teleiodini (Gelechiidae) are postulated based on morphology and molecular characters, including CO‐I, CO‐II, and 28S genes. The phylogenetic analysis of the morphology matrix yielded four equally most‐parsimonious trees (length 330 steps, CI = 0.36, RI = 0.55) and a strict consensus tree (length 335 steps, CI = 0.36, RI = 0.54) with one polytomy and one trichotomy. The phylogenetic analysis of the combined morphology and CO‐I + CO‐II + 28S matrices yielded two equally most‐parsimonious trees (length 1184 steps, CI = 0.50, RI = 0.42) and a strict consensus tree (length 1187 steps, CI = 0.50, RI = 0.42) that reinforced results from the morphological analysis and resolved the one polytomy present in the morphology consensus tree. Teleiodini are defined as a monophyletic clade with a Bremer support value greater than 5 in the consensus tree based on morphological and molecular data. Twenty‐three clades of genera are defined with Bremer support values provided. An analysis of larval host‐plant preferences based on the consensus tree for combined data indicates derivation of feeding on woody hosts from genera feeding on herbaceous hosts and a single origin of feeding on coniferous hosts. An area cladogram indicates five independent origins of Nearctic genera from Holarctic ancestors and one origin from a Palearctic genus.

14.
BRANCH SUPPORT AND TREE STABILITY   Total citations: 38, self-citations: 1, citations by others: 37
Abstract— Branch support is quantified as the extra length needed to lose a branch in the consensus of near-most-parsimonious trees. This approach is based solely on the original data, as opposed to the data perturbation used in the bootstrap procedure. If trees have been generated by Farris's successive approximations approach to character weighting, branch support should be examined in terms of weighted extra length needed to lose a branch. The sum of all branch support values over the tree divided by the length of the most parsimonious tree[s] provides a new index, the total support index. This index is a measure of tree stability in terms of supported resolutions, which is of prime importance in cladistic analysis.
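
The total support index defined in this abstract is a simple ratio, which a short Python sketch can make concrete. The function name and the numeric values below are hypothetical and serve only to illustrate the arithmetic.

```python
from typing import Sequence


def total_support_index(branch_supports: Sequence[float], tree_length: float) -> float:
    """Sum of branch support values divided by the length of the most parsimonious tree."""
    if tree_length <= 0:
        raise ValueError("tree_length must be positive")
    return sum(branch_supports) / tree_length


# Hypothetical example: four internal branches with support values of
# 3, 1, 5 and 2 extra steps on a most parsimonious tree of 110 steps.
ti = total_support_index([3, 1, 5, 2], 110)
```

Here the branch support values sum to 11 steps, so the index is 11/110.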

15.
In phylogenetic trees the addition and removal of taxa has large effects on tree topology, hence measures of branch support and tree stability should account for taxonomic composition. Currently no comprehensive system of composition-dependent parameters exists in any cladistic or phenetic strategy. We introduce several values and indices based on a modification of the original jackknife resampling. Their advantage is a complete evaluation and optimization of taxon composition in phylogenetic data. While related to the Jackknife Monophyly Index (JMI), our system of support measures expands beyond parsimony analyses, and includes indices estimating support for the entire phylogenetic tree based on individual branch supports.

16.
The future of phylogeny reconstruction   Total citations: 1, self-citations: 0, citations by others: 1
A new approach to phylogenetic analysis, parsimony jackknifing, uses simple parsimony calculations combined with resampling of characters to arrive at a tree comprising well-supported groups. This is usually much the same as the consensus of most-parsimonious trees found from extensive multiple-tree calculations, but the new method is thousands of times faster, allowing analysis of much larger data matrices, and also provides information on the strength of support for different groups. Jackknife frequencies provide a more reliable assessment of support than do alternative methods, notably "confidence probability" (CP) and T-PTP testing.
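
The character-resampling step that jackknifing relies on can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not give a deletion probability, so `delete_prob` is left as a free parameter, and the function names and the tiny matrix are hypothetical.

```python
import random
from typing import Dict, List


def jackknife_columns(n_chars: int, delete_prob: float, rng: random.Random) -> List[int]:
    """Indices of characters retained when each is deleted independently with delete_prob."""
    return [i for i in range(n_chars) if rng.random() >= delete_prob]


def resample_matrix(matrix: Dict[str, str], keep: List[int]) -> Dict[str, str]:
    """Build a pseudoreplicate matrix containing only the retained character columns."""
    return {taxon: "".join(seq[i] for i in keep) for taxon, seq in matrix.items()}


# Hypothetical aligned matrix: three taxa, six characters each.
matrix = {"taxon1": "ACGTAC", "taxon2": "ACGTTC", "taxon3": "AAGTTC"}
rng = random.Random(42)  # fixed seed so the sketch is reproducible
keep = jackknife_columns(6, delete_prob=0.3, rng=rng)
pseudoreplicate = resample_matrix(matrix, keep)
```

Each pseudoreplicate would then be passed to a tree search, and group frequencies tallied across pseudoreplicates; that part is omitted here.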

17.
Recent phylogenetic analyses of a large dataset for mammalian families (169 taxa, 26 loci) portray contrasting results. Supermatrix (concatenation) methods support a generally robust tree with only a few inconsistently resolved polytomies, whereas MP‐EST coalescence analysis of the same dataset yields a weakly supported tree that conflicts with many traditionally recognized clades. Here, we evaluate this discrepancy via improved coalescence analyses with reference to the rich history of phylogenetic studies on mammals. This integration clearly demonstrates that both supermatrix and coalescence analyses of just 26 loci yield a congruent, well‐supported phylogenetic hypothesis for Mammalia. Discrepancies between published studies are explained by implementation of overly simple DNA substitution models, inadequate tree‐search routines and limitations of the MP‐EST method. We develop a simple measure, partitioned coalescence support (PCS), which summarizes the distribution of support and conflict among gene trees for a given clade. Extremely high PCS scores for outlier gene trees at two nodes in the mammalian tree indicate a troubling bias in the MP‐EST method. We conclude that in this age of phylogenomics, a solid understanding of systematics fundamentals, choice of valid methodology and a broad knowledge of a clade's taxonomic history are still required to yield coherent phylogenetic inferences.

18.
Partitioned Bremer support (PBS) is a valuable means of assessing congruence in combined data sets, but some aspects require clarification. When more than one equally parsimonious tree is found during the constrained search for trees lacking the node of interest, averaging PBS for each data set across these trees can conceal conflict, and PBS should ideally be examined for each constrained tree. Similarly, when multiple most parsimonious trees (MPTs) are generated during analysis of the combined data, PBS is usually calculated on the consensus tree. However, extra information can be obtained if PBS is calculated on each of the MPTs or even suboptimal trees.

19.
Abstract — The relationships of the clicking Elateroidea beetles were studied with the help of parsimony analysis using Hennig86. The character matrix included 70 characters and 27 taxa. The results demonstrate the monophyly of the group Throscidae sensu Crowson, contrary to views presented in other papers. Methods for solving this problem were sought. When several minimum length solutions were obtained, successive weighting and a search for a strict consensus tree identical with one of the original trees appeared to be acceptable ways for trying to identify the preferred solution. When conflicting trees from separate data sets were compared, a combined global analysis turned out to be impossible to perform because the data sets used different terminal taxa. In this case, the incongruence and total support tests provided by Farris' programs RNA and KON proved indispensable. The conflict found between the results obtained here and those presented by other workers using a large suite of larval characters was shown to be caused by an incongruent data matrix used in the latter study—the larval data set resulted in a polyphyletic ingroup and suggests relationships quite different from adult data alone. Directed large scale homoplasy due to repeated re-invasion of two major habitats by separate clades may be the factor causing difficulties in coding the larval characters.

20.
Nonparametric bootstrapping methods may be useful for assessing confidence in a supertree inference. We examined the performance of two supertree bootstrapping methods on four published data sets that each include sequence data from more than 100 genes. In "input tree bootstrapping," input gene trees are sampled with replacement and then combined in replicate supertree analyses; in "stratified bootstrapping," trees from each gene's separate (conventional) bootstrap tree set are sampled randomly with replacement and then combined. Generally, support values from both supertree bootstrap methods were similar or slightly lower than corresponding bootstrap values from a total evidence, or supermatrix, analysis. Yet, supertree bootstrap support also exceeded supermatrix bootstrap support for a number of clades. There was little overall difference in support scores between the input tree and stratified bootstrapping methods. Results from supertree bootstrapping methods, when compared to results from corresponding supermatrix bootstrapping, may provide insights into patterns of variation among genes in genome-scale data sets.
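
The two sampling schemes named in this abstract differ only in what is drawn for each replicate, which a minimal Python sketch can show. It reflects one reading of the abstract rather than the authors' implementation: trees are stand-in strings, the function names are hypothetical, and stratified bootstrapping is assumed to draw one tree per gene from that gene's own bootstrap set.

```python
import random
from typing import List


def input_tree_replicate(gene_trees: List[str], rng: random.Random) -> List[str]:
    """Sample the input gene trees with replacement, keeping the number of trees fixed."""
    return [rng.choice(gene_trees) for _ in gene_trees]


def stratified_replicate(bootstrap_sets: List[List[str]], rng: random.Random) -> List[str]:
    """Draw one tree at random from each gene's own set of bootstrap trees."""
    return [rng.choice(trees) for trees in bootstrap_sets]


# Hypothetical inputs: three genes, each with a best tree and two bootstrap trees.
gene_trees = ["g1_best", "g2_best", "g3_best"]
bootstrap_sets = [["g1_b1", "g1_b2"], ["g2_b1", "g2_b2"], ["g3_b1", "g3_b2"]]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
rep_input = input_tree_replicate(gene_trees, rng)
rep_strat = stratified_replicate(bootstrap_sets, rng)
```

Each replicate list would then be combined by a supertree method, and clade frequencies across replicates taken as support values; those steps are outside this sketch.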
