首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
For the last 2 decades, supertree reconstruction has been an active field of research and has seen the development of a large number of major algorithms. Because of the growing popularity of the supertree methods, it has become necessary to evaluate the performance of these algorithms to determine which are the best options (especially with regard to the supermatrix approach that is widely used). In this study, seven of the most commonly used supertree methods are investigated by using a large empirical data set (in terms of number of taxa and molecular markers) from the worldwide flowering plant family Sapindaceae. Supertree methods were evaluated using several criteria: similarity of the supertrees with the input trees, similarity between the supertrees and the total evidence tree, level of resolution of the supertree and computational time required by the algorithm. Additional analyses were also conducted on a reduced data set to test if the performance levels were affected by the heuristic searches rather than the algorithms themselves. Based on our results, two main groups of supertree methods were identified: on one hand, the matrix representation with parsimony (MRP), MinFlip, and MinCut methods performed well according to our criteria, whereas the average consensus, split fit, and most similar supertree methods showed a poorer performance or at least did not behave the same way as the total evidence tree. Results for the super distance matrix, that is, the most recent approach tested here, were promising with at least one derived method performing as well as MRP, MinFlip, and MinCut. The output of each method was only slightly improved when applied to the reduced data set, suggesting a correct behavior of the heuristic searches and a relatively low sensitivity of the algorithms to data set sizes and missing data. Results also showed that the MRP analyses could reach a high level of quality even when using a simple heuristic search strategy, with the exception of MRP with Purvis coding scheme and reversible parsimony. The future of supertrees lies in the implementation of a standardized heuristic search for all methods and the increase in computing power to handle large data sets. The latter would prove to be particularly useful for promising approaches such as the maximum quartet fit method that yet requires substantial computing power.  相似文献   

2.

Background  

Supertree methods synthesize collections of small phylogenetic trees with incomplete taxon overlap into comprehensive trees, or supertrees, that include all taxa found in the input trees. Supertree methods based on the well established Robinson-Foulds (RF) distance have the potential to build supertrees that retain much information from the input trees. Specifically, the RF supertree problem seeks a binary supertree that minimizes the sum of the RF distances from the supertree to the input trees. Thus, an RF supertree is a supertree that is consistent with the largest number of clusters (or clades) from the input trees.  相似文献   

3.
Matrix representation with parsimony (MRP) supertree construction has been criticized because the supertree may specify clades that are contradicted by every source tree contributing to it. Such unsupported clades may also occur using other supertree methods; however, their incidence is largely unknown. In this study, I investigated the frequency of unsupported clades in both simulated and empirical MRP supertrees. Here, I propose a new index, QS, to quantify the qualitative support for a supertree and its clades among the set of source trees. Results show that unsupported clades are very rare in MRP supertrees, occurring most often when there are few source trees that all possess the same set of taxa. However, even under these conditions the frequency of unsupported clades was <0.2%. Unsupported clades were absent from both the Carnivora and Lagomorpha supertrees, reflecting the use of large numbers of source trees for both. The proposed QS indices are correlated broadly with another measure of quantitative clade support (bootstrap frequencies, as derived from resampling of the MRP matrix) but appear to be more sensitive. More importantly, they sample at the level of the source trees and thus, unlike the bootstrap, are suitable for summarizing the support of MRP supertree clades.  相似文献   

4.

Background

Supertree methods combine trees on subsets of the full taxon set together to produce a tree on the entire set of taxa. Of the many supertree methods, the most popular is MRP (Matrix Representation with Parsimony), a method that operates by first encoding the input set of source trees by a large matrix (the "MRP matrix") over {0,1, ?}, and then running maximum parsimony heuristics on the MRP matrix. Experimental studies evaluating MRP in comparison to other supertree methods have established that for large datasets, MRP generally produces trees of equal or greater accuracy than other methods, and can run on larger datasets. A recent development in supertree methods is SuperFine+MRP, a method that combines MRP with a divide-and-conquer approach, and produces more accurate trees in less time than MRP. In this paper we consider a new approach for supertree estimation, called MRL (Matrix Representation with Likelihood). MRL begins with the same MRP matrix, but then analyzes the MRP matrix using heuristics (such as RAxML) for 2-state Maximum Likelihood.

Results

We compared MRP and SuperFine+MRP with MRL and SuperFine+MRL on simulated and biological datasets. We examined the MRP and MRL scores of each method on a wide range of datasets, as well as the resulting topological accuracy of the trees. Our experimental results show that MRL, coupled with a very good ML heuristic such as RAxML, produced more accurate trees than MRP, and MRL scores were more strongly correlated with topological accuracy than MRP scores.

Conclusions

SuperFine+MRP, when based upon a good MP heuristic, such as TNT, produces among the best scores for both MRP and MRL, and is generally faster and more topologically accurate than other supertree methods we tested.  相似文献   

5.
Large and comprehensive phylogenetic trees are desirable for studying macroevolutionary processes and for classification purposes. Such trees can be obtained in two different ways. Either the widest possible range of taxa can be sampled and used in a phylogenetic analysis to produce a "big tree," or preexisting topologies can be used to create a supertree. Although large multigene analyses are often favored, combinable data are not always available, and supertrees offer a suitable solution. The most commonly used method of supertree reconstruction, matrix representation with parsimony (MRP), is presented here. We used a combined data set for the Poaceae to (1) assess the differences between an approach that uses combined data and one that uses different MRP modifications based on the character partitions and (2) investigate the advantages and disadvantages of these modifications. Baum and Ragan and Purvis modifications gave similar results. Incorporating bootstrap support associated with pre-existing topologies improved Baum and Ragan modification and its similarity with a combined analysis. Finally, we used the supertree reconstruction approach on 55 published phylogenies to build one of most comprehensive phylogenetic trees published for the grass family including 403 taxa and discuss its strengths and weaknesses in relation to other published hypotheses.  相似文献   

6.
New examples are presented, showing that supertree methods such as matrix representation with parsimony, minimum flip trees, and compatibility analysis of the matrix representing the input trees, produce supertrees that cannot be interpreted as displaying the groups present in the majority of the input trees. These methods may produce a supertree displaying some groups present in the minority of the trees, and contradicted by the majority. Of the three methods, compatibility analysis is the least used, but it seems to be the one that differs the least from majority rule consensus. The three methods are similar in that they choose the supertree(s) that best fit the set of input trees (quantified as some measure of the fit to the matrix representation of the input trees); in the case of complete trees, it is argued that, for a supertree method to be equivalent to majority rule or frequency difference consensus, two necessary (but not sufficient) conditions must be met. First, the measure of fit between a supertree and an input tree must be symmetrical. Second, the fit for a character representing a group must be measured as absolute: either it fits or it does not fit. In the restricted case of complete and equally resolved input trees, compatibility analysis (unlike MRP and minimum flipping) fulfils these two conditions: it is symmetrical (i.e., as long as the trees have the same taxon sets and are equally resolved, the number of characters in the matrix representation of tree A that require homoplasy in tree B is always the same as the number of characters in the matrix representation of tree B that require homoplasy in tree A) and it measures fit as all‐or‐none. In the case of just two complete and equally resolved input trees, the two conditions (symmetry and absolute fit) are necessary and sufficient, which explains why the compatibility analysis of such trees behaves as majority consensus. With more than two such trees, these conditions are still necessary but no longer sufficient for the equivalence; in such cases, the compatibility supertree may differ significantly from the majority rule consensus, even when these conditions apply (as shown by example). MRP and minimum flipping are asymmetric and measure various degrees of fit for each character, which explains why they often behave very differently from majority rule procedures, and why they are very likely to have groups contradicted by each of the input trees, or groups supported by a minority of the input trees. © The Willi Hennig Society 2005.  相似文献   

7.
Maximum likelihood supertrees   总被引:2,自引:0,他引:2  
  相似文献   

8.
Many research groups are estimating trees containing anywhere from a few thousands to hundreds of thousands of species, toward the eventual goal of the estimation of a Tree of Life, containing perhaps as many as several million leaves. These phylogenetic estimations present enormous computational challenges, and current computational methods are likely to fail to run even on data sets in the low end of this range. One approach to estimate a large species tree is to use phylogenetic estimation methods (such as maximum likelihood) on a supermatrix produced by concatenating multiple sequence alignments for a collection of markers; however, the most accurate of these phylogenetic estimation methods are extremely computationally intensive for data sets with more than a few thousand sequences. Supertree methods, which assemble phylogenetic trees from a collection of trees on subsets of the taxa, are important tools for phylogeny estimation where phylogenetic analyses based upon maximum likelihood (ML) are infeasible. In this paper, we introduce SuperFine, a meta-method that utilizes a novel two-step procedure in order to improve the accuracy and scalability of supertree methods. Our study, using both simulated and empirical data, shows that SuperFine-boosted supertree methods produce more accurate trees than standard supertree methods, and run quickly on very large data sets with thousands of sequences. Furthermore, SuperFine-boosted matrix representation with parsimony (MRP, the most well-known supertree method) approaches the accuracy of ML methods on supermatrix data sets under realistic conditions.  相似文献   

9.
Accurate phylogenetic reconstruction methods are currently limited to a maximum of few dozens of taxa. Supertree methods construct a large tree over a large set of taxa, from a set of small trees over overlapping subsets of the complete taxa set. Hence, in order to construct the tree of life over a million and a half different species, the use of a supertree method over the product of accurate methods, is inevitable. Perhaps the simplest version of this task that is still widely applicable, yet quite challenging, is quartet-based reconstruction. This problem lies at the root of many tree reconstruction methods and theoretical as well as experimental results have been reported. Nevertheless, dealing with false, conflicting quartet trees remains problematic. In this paper, we describe an algorithm for constructing a tree from a set of input quartet trees even with a significant fraction of errors. We show empirically that conflicts in the inputs are handled satisfactorily and that it significantly outperforms and outraces the Matrix Representation with Parsimony (MRP) methods that have previously been most successful in dealing with supertrees. Our algorithm is based on a divide and conquer algorithm where our divide step uses a semidefinite programming (SDP) formulation of MaxCut. We remark that this builds on previous work of ours for piecing together trees from rooted triplet trees. The recursion for unrooted quartets, however, is more complicated in that even with completely consistent set of quartet trees the problem is NP-hard, as opposed to the problem for triples where there is a linear time algorithm. This complexity leads to several issues and some solutions of possible independent interest.  相似文献   

10.
Semi-strict supertrees   总被引:3,自引:1,他引:2  
A method to calculate semi‐strict supertrees is proposed. The semi‐strict supertrees are calculated by creating the matrix that represents all the groups in the source trees (as done in already existing techniques), and then finding the trees determined by the ultra‐clique. The ultra‐clique is defined as the set of characters where each possible subset is compatible with each possible subset from the entire matrix. Finding the ultra‐clique is computationally complex (since in most cases many of the characters have missing entries), but a heuristic method yields reliable results. When the trees have no conflict, or when there are only two trees, the method produces the exact result for any ordering of the input trees and any ordering of the groups within them; when there are more than two trees and they have conflict, a single ordering or sequence can create some spurious groups, but doing multiple sequences eliminates the spurious groups. The method uses only state set operations, and is thus easily implemented in computer programs. Unlike any existing type of supertree, semi‐strict supertrees display all the groups, and only those groups, that are implied by at least some combination of the input trees and contradicted by none. The idea that supertrees should take into account the number of occurences of a given group, so as to retain some groups even in the case of conflict, is discussed; it is argued that a conceptual equivalent of the majority rule consensus is not possible when the sets of taxa differ among trees. Also, when pruning taxa from a set of trees, the supertree can display groups that contradict the consensus for the entire trees, suggesting that supertrees for matrices with very dissimilar sets of taxa should be interpreted with caution. If (for any valid reason) the data cannot be combined in a single matrix, it is advisable that the taxon sets in the matrices be as similar as possible.  相似文献   

11.
Using a simple example and simulations, we explore the impact of input tree shape upon a broad range of supertree methods. We find that input tree shape can affect how conflict is resolved by several supertree methods and that input tree shape effects may be substantial. Standard and irreversible matrix representation with parsimony (MRP), MinFlip, duplication-only Gene Tree Parsimony (GTP), and an implementation of the average consensus method have a tendency to resolve conflict in favor of relationships in unbalanced trees. Purvis MRP and the average dendrogram method appear to have an opposite tendency. Biases with respect to tree shape are correlated with objective functions that are based upon unusual asymmetric tree-to-tree distance or fit measures. Split, quartet, and triplet fit, most similar supertree, and MinCut methods (provided the latter are interpreted as Adams consensus-like supertrees) are not revealed to have any bias with respect to tree shape by our example, but whether this holds more generally is an open problem. Future development and evaluation of supertree methods should consider explicitly the undesirable biases and other properties that we highlight. In the meantime, use of a single, arbitrarily chosen supertree method is discouraged. Use of multiple methods and/or weighting schemes may allow practical assessment of the extent to which inferences from real data depend upon methodological biases with respect to input tree shape or size.  相似文献   

12.
A Robinson-Foulds (RF) supertree for a collection of input trees is a tree containing all the species in the input trees that is at minimum total RF distance to the input trees. Thus, an RF supertree is consistent with the maximum number of splits in the input trees. Constructing RF supertrees for rooted and unrooted data is NP-hard. Nevertheless, effective local search heuristics have been developed for the restricted case where the input trees and the supertree are rooted. We describe new heuristics, based on the Edge Contract and Refine (ECR) operation, that remove this restriction, thereby expanding the utility of RF supertrees. Our experimental results on simulated and empirical data sets show that our unrooted local search algorithms yield better supertrees than those obtained from MRP and rooted RF heuristics in terms of total RF distance to the input trees and, for simulated data, in terms of RF distance to the true tree.  相似文献   

13.
Despite the growing popularity of supertree construction for combining phylogenetic information to produce more inclusive phylogenies, large-scale performance testing of this method has not been done. Through simulation, we tested the accuracy of the most widely used supertree method, matrix representation with parsimony analysis (MRP), with respect to a (maximum parsimony) total evidence solution and a known model tree. When source trees overlap completely, MRP provided a reasonable approximation of the total evidence tree; agreement was usually > 85%. Performance improved slightly when using smaller, more numerous, or more congruent source trees, and especially when elements were weighted in proportion to the bootstrap frequencies of the nodes they represented on each source tree ("weighted MRP"). Although total evidence always estimated the model tree slightly better than nonweighted MRP methods, weighted MRP in turn usually out-performed total evidence slightly. When source studies were even moderately nonoverlapping (i.e., sharing only three-quarters of the taxa), the high proportion of missing data caused a loss in resolution that severely degraded the performance for all methods, including total evidence. In such cases, even combining more trees, which had positive effects elsewhere, did not improve accuracy. Instead, "seeding" the supertree or total evidence analyses with a single largely complete study improved performance substantially. This finding could be an important strategy for any studies that seek to combine phylogenetic information. Overall, our results suggest that MRP supertree construction provides a reasonable approximation of a total evidence solution and that weighted MRP should be used whenever possible.  相似文献   

14.
Compatibility of phylogenetic trees is the most important concept underlying widely-used methods for assessing the agreement of different phylogenetic trees with overlapping taxa and combining them into common supertrees to reveal the tree of life. The notion of ancestral compatibility of phylogenetic trees with nested taxa was recently introduced. In this paper we analyze in detail the meaning of this compatibility from the points of view of the local structure of the trees, of the existence of embeddings into a common supertree, and of the joint properties of their cluster representations. Our analysis leads to a very simple polynomial-time algorithm for testing this compatibility, which we have implemented and is freely available for download from the BioPerl collection of Perl modules for computational biology.  相似文献   

15.
The estimation of ever larger phylogenies requires consideration of alternative inference strategies, including divide-and-conquer approaches that decompose the global inference problem to a set of smaller, more manageable component problems. A prominent locus of research in this area is the development of supertree methods, which estimate a composite tree by combining a set of partially overlapping component topologies. Although promising, the use of component tree topologies as the primary data dissociates supertrees from complexities within the underling character data and complicates the evaluation of phylogenetic uncertainty. We address these issues by exploring three approaches that variously incorporate nonparametric bootstrapping into a common supertree estimation algorithm (matrix representation with parsimony, although any algorithm might be used), including bootstrap-weighting, source-tree bootstrapping, and hierarchical bootstrapping. We illustrate these procedures by means of hypothetical and empirical examples. Our preliminary experiments suggest that these methods have the potential to improve the correspondence of supertree estimates to those derived from simultaneous analysis of the combined data and to allow uncertainty in supertree topologies to be quantified. The ability to increase the transparency of supertrees to the underlying character data has several practical implications and sheds new light on an old debate. These methods have been implemented in the freely available program, tREeBOOT.  相似文献   

16.
This paper examines a recent proposal to calculate supertrees by minimizing the sum of subtree prune‐and‐regraft distances to the input trees. The supertrees thus calculated may display groups present in a minority of the input trees but contradicted by the majority, or groups that are not supported by any input tree or combination of input trees. The proponents of the method themselves stated that these are serious problems of “matrix representation with parsimony”, but they can in fact occur in their own method. The majority rule supertrees, being explicitly clade‐based, cannot have these problems, and seem much more suited to retrieving common clades from a set of trees with different taxon sets. However, it is dubious that so‐called majority rule supertrees can always be interpreted as displaying those clades present (or compatible with) with a majority of the trees. The majority rule consensus is always a median tree, in terms of the Robinson–Foulds distances (i.e. it minimizes the sum of Robinson–Foulds distances to the input trees). In contrast, majority rule supertrees may not be median—different, contradictory trees may minimize Robinson–Foulds distances, while their strict consensus does not. If being “majority” results from being median in Robinson–Foulds distances, this means that in the supertree setting a “majority” is ambiguously defined, sometimes achievable only by mutually contradictory trees.  相似文献   

17.
Inferring species phylogenies is an important part of understanding molecular evolution. Even so, it is well known that an accurate phylogenetic tree reconstruction for a single gene does not always necessarily correspond to the species phylogeny. One commonly accepted strategy to cope with this problem is to sequence many genes; the way in which to analyze the resulting collection of genes is somewhat more contentious. Supermatrix and supertree methods can be used, although these can suppress conflicts arising from true differences in the gene trees caused by processes such as lineage sorting, horizontal gene transfer, or gene duplication and loss. In 2004, Huson et al. (IEEE/ACM Trans. Comput. Biol. Bioinformatics 1:151-158) presented the Z-closure method that can circumvent this problem by generating a supernetwork as opposed to a supertree. Here we present an alternative way for generating supernetworks called Q-imputation. In particular, we describe a method that uses quartet information to add missing taxa into gene trees. The resulting trees are subsequently used to generate consensus networks, networks that generalize strict and majority-rule consensus trees. Through simulations and application to real data sets, we compare Q-imputation to the matrix representation with parsimony (MRP) supertree method and Z-closure, and demonstrate that it provides a useful complementary tool.  相似文献   

18.
Given a collection of rooted phylogenetic trees with overlapping sets of leaves, a compatible supertree $S$ is a single tree whose set of leaves is the union of the input sets of leaves and such that $S$ agrees with each input tree when restricted to the leaves of the input tree. Typically with trees from real data, no compatible supertree exists, and various methods may be utilized to reconcile the incompatibilities in the input trees. This paper focuses on a measure of robustness of a supertree method called its ``radius" $R$. The larger the value of $R$ is, the further the data set can be from a natural correct tree $T$ and yet the method will still output $T$. It is shown that the maximal possible radius for a method is $R = 1/2$. Many familiar methods, both for supertrees and consensus trees, are shown to have $R = 0$, indicating that they need not output a tree $T$ that would seem to be the natural correct answer. A polynomial-time method Normalized Triplet Supertree (NTS) with the maximal possible $R = 1/2$ is defined. A geometric interpretion is given, and NTS is shown to solve an optimization problem. Additional properties of NTS are described.  相似文献   

19.
Phylogenetic relationships among advanced snakes (Acrochordus + Colubroidea = Caenophidia) and the position of the genus Acrochordus relative to colubroid taxa are contentious. These concerns were investigated by phylogenetic analysis of fragments from four mitochondrial genes representing 62 caenophidian genera and 5 noncaenophidian taxa. Four methods of phylogeny reconstruction were applied: matrix representation with parsimony (MRP) supertree consensus, maximum parsimony, maximum likelihood, and Bayesian analysis. Because of incomplete sampling, extensive missing data were inherent in this study. Analyses of individual genes retrieved roughly the same clades, but branching order varied greatly between gene trees, and nodal support was poor. Trees generated from combined data sets using maximum parsimony, maximum likelihood, and Bayesian analysis had medium to low nodal support but were largely congruent with each other and with MRP supertrees. Conclusions about caenophidian relationships were based on these combined analyses. The Xenoderminae, Viperidae, Pareatinae, Psammophiinae, Pseudoxyrophiinae, Homalopsinae, Natricinae, Xenodontinae, and Colubrinae (redefined) emerged as monophyletic, whereas Lamprophiinae, Atractaspididae, and Elapidae were not in one or more topologies. A clade comprising Acrochordus and Xenoderminae branched closest to the root, and when Acrochordus was assessed in relation to a colubroid subsample and all five noncaenophidians, it remained associated with the Colubroidea. Thus, Acrochordus + Xenoderminae appears to be the sister group to the Colubroidea, and Xenoderminae should be excluded from Colubroidea. Within Colubroidea, Viperidae was the most basal clade. Other relationships appearing in all final topologies were (1) a clade comprising Psammophiinae, Lamprophiinae, Atractaspididae, Pseudoxyrophiinae, and Elapidae, within which the latter four taxa formed a subclade, and (2) a clade comprising Colubrinae, Natricinae, and Xenodontinae, within which the latter two taxa formed a subclade. Pareatinae and Homalopsinae were the most unstable clades.  相似文献   

20.

Background  

Supertree methods comprise one approach to reconstructing large molecular phylogenies given multi-marker datasets: trees are estimated on each marker and then combined into a tree (the "supertree") on the entire set of taxa. Supertrees can be constructed using various algorithmic techniques, with the most common being matrix representation with parsimony (MRP). When the data allow, the competing approach is a combined analysis (also known as a "supermatrix" or "total evidence" approach) whereby the different sequence data matrices for each of the different subsets of taxa are concatenated into a single supermatrix, and a tree is estimated on that supermatrix.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号