Similar articles
20 similar articles found (search time: 31 ms)
1.
Phylogenetic trees from multiple genes can be obtained in two fundamentally different ways. In one, gene sequences are concatenated into a super-gene alignment, which is then analyzed to generate the species tree. In the other, phylogenies are inferred separately from each gene, and a consensus of these gene phylogenies is used to represent the species tree. Here, we have compared these two approaches by means of computer simulation, using 448 parameter sets, including evolutionary rate, sequence length, base composition, and transition/transversion rate bias. In these simulations, we emphasized a worst-case scenario analysis in which 100 replicate datasets for each evolutionary parameter set (gene) were generated, and the replicate dataset that produced a tree topology showing the largest number of phylogenetic errors was selected to represent that parameter set. Both randomly selected and worst-case replicates were utilized to compare the consensus and concatenation approaches primarily using the neighbor-joining (NJ) method. We find that the concatenation approach yields more accurate trees, even when the sequences concatenated have evolved with very different substitution patterns and no attempts are made to accommodate these differences while inferring phylogenies. These results appear to hold true for parsimony and likelihood methods as well. The concatenation approach shows >95% accuracy with only 10 genes. However, this gain in accuracy is sometimes accompanied by reinforcement of certain systematic biases, resulting in spuriously high bootstrap support for incorrect partitions, whether we employ site, gene, or a combined bootstrap resampling approach. Therefore, it will be prudent to report the number of individual genes supporting an inferred clade in the concatenated sequence tree, in addition to the bootstrap support.
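
The concatenation step described in this abstract can be illustrated with a minimal Python sketch. The function name `concatenate_alignments` and the toy sequences are hypothetical and do not come from the study; the sketch also assumes that every gene is already aligned and covers the same set of taxa.

```python
def concatenate_alignments(gene_alignments):
    """Join per-gene alignments (dicts of taxon -> aligned sequence) end to end."""
    taxa = sorted(gene_alignments[0])
    for aln in gene_alignments:
        # Assumption of this sketch: identical taxon sets across genes.
        if sorted(aln) != taxa:
            raise ValueError("all genes must cover the same taxa")
        # Every sequence within one gene alignment must have the same length.
        if len({len(seq) for seq in aln.values()}) != 1:
            raise ValueError("sequences within a gene must have equal length")
    return {t: "".join(aln[t] for aln in gene_alignments) for t in taxa}


# Hypothetical toy data: two short "genes" for three taxa.
gene1 = {"A": "ACGT", "B": "ACGA", "C": "ATGA"}
gene2 = {"A": "GG", "B": "GC", "C": "CC"}
supergene = concatenate_alignments([gene1, gene2])
```

The resulting super-gene alignment would then be passed to a single tree search, whereas the consensus approach would infer one tree per gene and summarize those trees afterwards.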

2.
We examine the impact of likelihood surface characteristics on phylogenetic inference. Amino acid data sets simulated from topologies with branch length features chosen to represent varying degrees of difficulty for likelihood maximization are analyzed. We present situations where the tree found to achieve the global maximum in likelihood is often not equal to the true tree. We use the program covSEARCH to demonstrate how the use of adaptively sized pools of candidate trees that are updated using confidence tests results in solution sets that are highly likely to contain the true tree. This approach requires more computation than traditional maximum likelihood methods, hence covSEARCH is best suited to small to medium-sized alignments or large alignments with some constrained nodes. The majority rule consensus tree computed from the confidence sets also proves to be different from the generating topology. Although low phylogenetic signal in the input alignment can result in large confidence sets of trees, some biological information can still be obtained based on nodes that exhibit high support within the confidence set. Two real data examples are analyzed: mammal mitochondrial proteins and a small tubulin alignment. We conclude that the technique of confidence set optimization can significantly improve the robustness of phylogenetic inference at a reasonable computational cost. Additionally, when either very short internal branches or very long terminal branches are present, confident resolution of specific bipartitions or subtrees, rather than whole-tree phylogenies, may be the most realistic goal for phylogenetic methods. [Reviewing Editor: Dr. Nicolas Galtier]

3.
Phylogenetic relationships between species of Allium section Cepa and A. roylei (section Rhizirideum) have been inferred from nuclear DNA variation (RAPDs; nDNA dataset) and from morphological, pollen epidermis texture, chromosomal and chemical variation (supranuclear dataset). These sets were complemented with data, taken from the literature, on cpDNA variation and crossability. The trees produced with the supranuclear, nDNA and cpDNA datasets were compared by using the topology of the most parsimonious tree of one dataset as the constraint for the construction of a most parsimonious tree of another dataset. The accuracy of the trees was evaluated by calculating several Consistency and Incongruence Indices. The constrained tree of supranuclear-nDNA datasets showed the highest index values. The tree topologies of the supranuclear and cpDNA datasets were the least similar. The cpDNA tree and crossability dendrograms were identical. The most important difference between the nDNA-supranuclear trees and the cpDNA-crossability trees pertains to the position of Allium roylei, which is much closer to the clade A. cepa/A. vavilovii in the cpDNA tree than in the nDNA tree. This difference is considered to be the result of chloroplast capture from one species to another after an introgression event. A shorter distance between species inferred from a cpDNA tree than from an nDNA or comparable tree might be indicative of the level of crossability.

4.
The importance of fossils to phylogenetic reconstruction is well established. However, analyses of fossil data sets are confounded by problems related to the less complete nature of the specimens. Taxa that are incompletely known are problematic because of the uncertainty of their placement within a tree, leading to a proliferation of most-parsimonious solutions and "wild card" behavior. Problematic taxa are commonly deleted based on a priori criteria of completeness. Paradoxically, a taxon's problematic behavior is tree dependent, and levels of completeness are not directly associated with problematic behavior. Exclusion of taxa on the basis of completeness eliminates real character conflict and, by not allowing incomplete taxa to determine tree topology, diminishes the phylogenetic hypothesis. Here, the phylogenetic trunk approach is proposed to allow optimization of taxonomic inclusion and tree stability. The use of this method in an analysis of the Paleozoic Lepospondyli finds a single most-parsimonious tree, or trunk, after the removal of one taxon identified as being problematic. Moreover, the 38 trees found at one additional step from this primary trunk were reduced to 2 by removal of one additional taxon. These trunks are compared with the trees that were found by excluding taxa with various degrees of completeness, and the effects of incomplete taxa are explored with regard to use of the trunk. Correlated characters associated with limblessness are discussed regarding the assumption of character independence; however, inclusion of intermediate taxa is found to be the single best method for breaking down long branches.

5.
Semi-strict supertrees (total citations: 3; self-citations: 1; citations by others: 2)
A method to calculate semi‐strict supertrees is proposed. The semi‐strict supertrees are calculated by creating the matrix that represents all the groups in the source trees (as done in already existing techniques), and then finding the trees determined by the ultra‐clique. The ultra‐clique is defined as the set of characters where each possible subset is compatible with each possible subset from the entire matrix. Finding the ultra‐clique is computationally complex (since in most cases many of the characters have missing entries), but a heuristic method yields reliable results. When the trees have no conflict, or when there are only two trees, the method produces the exact result for any ordering of the input trees and any ordering of the groups within them; when there are more than two trees and they have conflict, a single ordering or sequence can create some spurious groups, but doing multiple sequences eliminates the spurious groups. The method uses only state set operations, and is thus easily implemented in computer programs. Unlike any existing type of supertree, semi‐strict supertrees display all the groups, and only those groups, that are implied by at least some combination of the input trees and contradicted by none. The idea that supertrees should take into account the number of occurrences of a given group, so as to retain some groups even in the case of conflict, is discussed; it is argued that a conceptual equivalent of the majority rule consensus is not possible when the sets of taxa differ among trees. Also, when pruning taxa from a set of trees, the supertree can display groups that contradict the consensus for the entire trees, suggesting that supertrees for matrices with very dissimilar sets of taxa should be interpreted with caution. If (for any valid reason) the data cannot be combined in a single matrix, it is advisable that the taxon sets in the matrices be as similar as possible.
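
The "present in some tree, contradicted by none" criterion can be illustrated for the much simpler special case in which all input trees share the same taxon set, where it reduces to a semi-strict consensus. The Python sketch below covers only that special case and is not the ultra-clique heuristic from the abstract. The function names and the toy groups are hypothetical, and each tree is assumed to be given directly as a set of groups (frozensets of taxon names).

```python
def compatible(g, h):
    """Two groups can coexist on one rooted tree if they are disjoint or nested."""
    return g.isdisjoint(h) or g <= h or h <= g


def semi_strict_groups(trees):
    """Keep every group found in some tree and contradicted by no group in any tree."""
    all_groups = set().union(*trees)
    return {g for g in all_groups if all(compatible(g, h) for h in all_groups)}


# Hypothetical input trees on taxa a-e, each given as its set of groups.
trees = [
    {frozenset("ab")},
    {frozenset("abc")},
    {frozenset("de"), frozenset("ac")},
]
kept = semi_strict_groups(trees)
```

In this toy input the groups `ab` and `ac` contradict each other and are both dropped, while `abc` and `de` are each present in only one tree but are contradicted by none and are therefore retained.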

6.
The beetle suborder Adephaga has been the subject of many phylogenetic reconstructions utilizing a variety of data sources and inference methods. However, no strong consensus has yet emerged on the relationships among major adephagan lineages. Ultraconserved elements (UCEs) have proved useful for inferring difficult or unresolved phylogenies at varying timescales in vertebrates, arachnids and Hymenoptera. Recently, a UCE bait set was developed for Coleoptera using polyphagan genomes and a member of the order Strepsiptera as an outgroup. Here, we examine the utility of UCEs for reconstructing the phylogeny of adephagan families, in the first in vitro application of a UCE bait set in Coleoptera. Our final dataset included 305 UCE loci for 18 representatives of all adephagan families except Aspidytidae, and two polyphagan outgroups, with a total concatenated length of 83 547 bp. We inferred trees using maximum likelihood analyses of the concatenated UCE alignment and coalescent species tree methods (ASTRAL-II, ASTRID, SVDquartets). Although the coalescent species tree methods had poor resolution and weak support, concatenated analyses produced well‐resolved, highly supported trees. Hydradephaga was recovered as paraphyletic, with Gyrinidae sister to Geadephaga and all other adephagans. Haliplidae was recovered as sister to Dytiscoidea, with Hygrobiidae and Amphizoidae successive sisters to Dytiscidae. Finally, Noteridae was recovered as monophyletic and sister to Meruidae. Given the success of UCE data for resolving phylogenetic relationships within Adephaga, we suggest the potential for further resolution of relationships within Adephaga using UCEs with improved taxon sampling, and by developing Adephaga‐specific probes.

7.
Partitioned Bremer support (PBS) is a valuable means of assessing congruence in combined data sets, but some aspects require clarification. When more than one equally parsimonious tree is found during the constrained search for trees lacking the node of interest, averaging PBS for each data set across these trees can conceal conflict, and PBS should ideally be examined for each constrained tree. Similarly, when multiple most parsimonious trees (MPTs) are generated during analysis of the combined data, PBS is usually calculated on the consensus tree. However, extra information can be obtained if PBS is calculated on each of the MPTs or even suboptimal trees.
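
A small worked example may help show why averaging can conceal conflict. The Python sketch below uses the usual definition of PBS for one data partition: its length on a constrained tree lacking the node, minus its length on the most parsimonious tree. All tree lengths are invented for illustration, and the function name `partitioned_bremer` is hypothetical.

```python
def partitioned_bremer(mpt_lengths, constrained_lengths):
    """PBS per partition: steps on the constrained tree minus steps on the MPT."""
    return {p: constrained_lengths[p] - mpt_lengths[p] for p in mpt_lengths}


# Hypothetical per-partition lengths on the MPT of the combined data.
mpt = {"geneA": 100, "geneB": 80}

# Two equally parsimonious constrained trees (both 182 steps in total).
constrained_trees = [
    {"geneA": 103, "geneB": 79},
    {"geneA": 99, "geneB": 83},
]

per_tree = [partitioned_bremer(mpt, c) for c in constrained_trees]
averaged = {p: sum(t[p] for t in per_tree) / len(per_tree) for p in mpt}
```

On the first constrained tree geneA supports the node (+3) and geneB conflicts with it (-1); on the second the roles are reversed. The averaged values are +1 for both partitions, which hides the conflict that is visible when each constrained tree is examined separately.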

8.
Evolutionary trees were constructed, by distance methods, from an alignment of 225 complete large subunit (LSU) rRNA sequences, representing Eucarya, Archaea, Bacteria, plastids, and mitochondria. A comparison was made with trees based on sets of small subunit (SSU) rRNA sequences. Trees constructed on the set of 172 species and organelles for which the sequences of both molecules are known had a very similar topology, at least with respect to the divergence order of large taxa such as the eukaryotic kingdoms and the bacterial divisions. However, since there are more than ten times as many SSU as LSU rRNA sequences, it is possible to select many SSU rRNA sequence sets of equivalent size but different species composition. The topologies of these trees showed considerable differences according to the particular species set selected. The effect of the dataset and of different distance correction methods on tree topology was tested for both LSU and SSU rRNA by repetitive random sampling of a single species from each large taxon. The impact of the species set on the topology of the resulting consensus trees is much lower using LSU than using SSU rRNA. This might imply that LSU rRNA is a better molecule for studying wide-range relationships. The mitochondria behave clearly as a monophyletic group, clustering with the Proteobacteria. Gram-positive bacteria appear as two distinct groups, which are found clustered together in very few cases. Archaea behave as if monophyletic in most cases, but with a low confidence. Abbreviations: LSU rRNA, large subunit ribosomal RNA; SSU rRNA, small subunit ribosomal RNA; JC, Jukes and Cantor; JN, Jin and Nei. Correspondence to: R. De Wachter
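
The abstract does not give formulas for its distance corrections. As an illustration of what such a correction does, the Python sketch below applies the standard Jukes and Cantor formula, d = -(3/4) ln(1 - 4p/3), to the observed proportion p of differing sites. The function names and the toy sequences are hypothetical.

```python
import math


def p_distance(seq1, seq2):
    """Observed proportion of differing sites between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance; undefined once p reaches 0.75."""
    if p >= 0.75:
        raise ValueError("p too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


# Hypothetical aligned fragments differing at 2 of 10 sites.
p = p_distance("ACGTACGTAC", "ACGTTCGTAA")
d = jukes_cantor(p)
```

The corrected distance is always at least as large as the observed proportion, because the correction accounts for multiple substitutions at the same site.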

9.
Given a collection of rooted phylogenetic trees with overlapping sets of leaves, a compatible supertree $S$ is a single tree whose set of leaves is the union of the input sets of leaves and such that $S$ agrees with each input tree when restricted to the leaves of the input tree. Typically with trees from real data, no compatible supertree exists, and various methods may be utilized to reconcile the incompatibilities in the input trees. This paper focuses on a measure of robustness of a supertree method called its ``radius'' $R$. The larger the value of $R$ is, the further the data set can be from a natural correct tree $T$ and yet the method will still output $T$. It is shown that the maximal possible radius for a method is $R = 1/2$. Many familiar methods, both for supertrees and consensus trees, are shown to have $R = 0$, indicating that they need not output a tree $T$ that would seem to be the natural correct answer. A polynomial-time method Normalized Triplet Supertree (NTS) with the maximal possible $R = 1/2$ is defined. A geometric interpretation is given, and NTS is shown to solve an optimization problem. Additional properties of NTS are described.
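
The definition of a compatible supertree can be checked mechanically on small examples. The Python sketch below encodes rooted trees as nested tuples of leaf names (an assumption of this example) and tests whether a candidate supertree, restricted to the leaves of an input tree, has exactly the same clusters as that input tree. It does not implement NTS or the radius computation, and the function names are hypothetical.

```python
def clusters(tree):
    """Return (leaf set, clusters with at least 2 leaves) of a nested-tuple rooted tree."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves, found = frozenset(), set()
    for child in tree:
        child_leaves, child_clusters = clusters(child)
        leaves |= child_leaves
        found |= child_clusters
    found.add(leaves)
    return leaves, found


def agrees(supertree, input_tree):
    """True if the supertree restricted to the input tree's leaves equals the input tree."""
    leaves, wanted = clusters(input_tree)
    _, super_clusters = clusters(supertree)
    restricted = {c & leaves for c in super_clusters if len(c & leaves) >= 2}
    return restricted == wanted


# Hypothetical example: S is a compatible supertree for t1 and t2, but not for t3.
S = ((("a", "b"), "d"), "c")
t1 = (("a", "b"), "c")
t2 = (("b", "d"), "c")
t3 = (("a", "c"), "b")
```

Here `agrees(S, t1)` and `agrees(S, t2)` both hold, while `t3` groups `a` with `c` and so contradicts `S`.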

10.
Despite numerous large-scale phylogenomic studies, certain parts of the mammalian tree are extraordinarily difficult to resolve. We used the coding regions from 19 completely sequenced genomes to study the relationships within the super-clade Euarchontoglires (Primates, Rodentia, Lagomorpha, Dermoptera and Scandentia) because the placement of Scandentia within this clade is controversial. The difficulty in resolving this issue is due to the short time spans between the early divergences of Euarchontoglires, which may cause incongruent gene trees. The conflict in the data can be depicted by network analyses and the contentious relationships are best reconstructed by coalescent-based analyses. This method is expected to be superior to analyses of concatenated data in reconstructing a species tree from numerous gene trees. The total concatenated dataset used to study the relationships in this group comprises 5,875 protein-coding genes (9,799,170 nucleotides) from all orders except Dermoptera (flying lemurs). Reconstruction of the species tree from 1,006 gene trees using coalescent models placed Scandentia as sister group to the primates, which is in agreement with maximum likelihood analyses of concatenated nucleotide sequence data. Additionally, both analytical approaches favoured the Tarsier to be sister taxon to Anthropoidea, thus belonging to the Haplorrhine clade. When divergence times are short such as in radiations over periods of a few million years, even genome scale analyses struggle to resolve phylogenetic relationships. On these short branches processes such as incomplete lineage sorting and possibly hybridization occur and make it preferable to base phylogenomic analyses on coalescent methods.

11.
The success of resampling approaches to branch support depends on the effectiveness of the underlying tree searches. Two primary factors are identified as key: the depth of tree search and the number of trees saved per resampling replicate. Two datasets were explored for a range of search parameters using jackknifing. Greater depth of tree search tends to increase support values because shorter trees conflict less with each other, while increasing numbers of trees saved tends to reduce support values because of conflict that reduces structure in the replicate consensus. Although a relatively small amount of branch swapping will achieve near‐accurate values for a majority of clades, some clades do not yield accurate values until more extensive searches are performed. This means that in order to maximize the accuracy of resampling analyses, one should employ as extensive a search strategy as possible, and save as many trees per replicate as possible. Strict consensus summary of resampling replicates is preferable to frequency‐within‐replicates summary because it is a more conservative approach to the reporting of replicate results. Jackknife analysis is preferable to bootstrap because of its closer relationship to the original data. © The Willi Hennig Society 2010.
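
For readers unfamiliar with jackknifing, the resampling step itself can be sketched in a few lines of Python. The sketch deletes each character (alignment column) independently with a fixed probability to build one replicate matrix. The deletion probability of 0.37, the function name, and the toy matrix are assumptions of this example and are not taken from the abstract, which concerns the tree searches run on each replicate rather than the resampling itself.

```python
import random


def jackknife_columns(matrix, delete_prob=0.37, rng=None):
    """Build one jackknife replicate by deleting each column independently."""
    rng = rng or random.Random()
    n_chars = len(next(iter(matrix.values())))
    # Decide once per column so that all taxa keep the same characters.
    kept = [i for i in range(n_chars) if rng.random() >= delete_prob]
    return {taxon: "".join(row[i] for i in kept) for taxon, row in matrix.items()}


# Hypothetical character matrix: three taxa, eight characters.
matrix = {"A": "ACGTACGT", "B": "ACGAACGA", "C": "ATGAATGA"}
replicate = jackknife_columns(matrix, rng=random.Random(1))
```

Each replicate would then be analyzed with its own tree search, and support for a clade summarized across replicates. Unlike the bootstrap, a jackknife replicate never duplicates characters, so it contains only a subset of the original data.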

12.

Background  

The availability of many gene alignments with overlapping taxon sets raises the question of which strategy is the best to infer species phylogenies from multiple gene information. Methods and programs abound that use the gene alignment in different ways to reconstruct the species tree. In particular, different methods combine the original data at different points along the way from the underlying sequences to the final tree. Accordingly, they are classified into superalignment, supertree and medium-level approaches. Here, we present a simulation study to compare different methods from each of these three approaches.

13.

Background  

Supertree methods combine phylogenies with overlapping sets of taxa into a larger one. Topological conflicts frequently arise among source trees for methodological or biological reasons, such as long branch attraction, lateral gene transfers, gene duplication/loss or deep gene coalescence. When topological conflicts occur among source trees, liberal methods infer supertrees containing the most frequent alternative, while veto methods infer supertrees not contradicting any source tree, i.e. discard all conflicting resolutions. When the source trees host a significant number of topological conflicts or have a small taxon overlap, supertree methods of both kinds can propose poorly resolved, hence uninformative, supertrees.

14.
Highly accurate estimation of phylogenetic trees for large data sets is difficult, in part because multiple sequence alignments must be accurate for phylogeny estimation methods to be accurate. Coestimation of alignments and trees has been attempted but currently only SATé estimates reasonably accurate trees and alignments for large data sets in practical time frames (Liu K., Raghavan S., Nelesen S., Linder C.R., Warnow T. 2009b. Rapid and accurate large-scale coestimation of sequence alignments and phylogenetic trees. Science. 324:1561-1564). Here, we present a modification to the original SATé algorithm that improves upon SATé (which we now call SATé-I) in terms of speed and of phylogenetic and alignment accuracy. SATé-II uses a different divide-and-conquer strategy than SATé-I and so produces smaller more closely related subsets than SATé-I; as a result, SATé-II produces more accurate alignments and trees, can analyze larger data sets, and runs more efficiently than SATé-I. Generally, SATé is a metamethod that takes an existing multiple sequence alignment method as an input parameter and boosts the quality of that alignment method. SATé-II-boosted alignment methods are significantly more accurate than their unboosted versions, and trees based upon these improved alignments are more accurate than trees based upon the original alignments. Because SATé-I used maximum likelihood (ML) methods that treat gaps as missing data to estimate trees and because we found a correlation between the quality of tree/alignment pairs and ML scores, we explored the degree to which SATé's performance depends on using ML with gaps treated as missing data to determine the best tree/alignment pair. We present two lines of evidence that using ML with gaps treated as missing data to optimize the alignment and tree produces very poor results. 
First, we show that the optimization problem where a set of unaligned DNA sequences is given and the output is the tree and alignment of those sequences that maximize likelihood under the Jukes-Cantor model is uninformative in the worst possible sense. For all inputs, all trees optimize the likelihood score. Second, we show that a greedy heuristic that uses GTR+Gamma ML to optimize the alignment and the tree can produce very poor alignments and trees. Therefore, the excellent performance of SATé-II and SATé-I is not because ML is used as an optimization criterion for choosing the best tree/alignment pair but rather due to the particular divide-and-conquer realignment techniques employed.

15.
Mitochondrial genomes provide a valuable dataset for phylogenetic studies, in particular of metazoan phylogeny because of the extensive taxon sample that is available. Beyond the traditional sequence-based analysis it is possible to extract phylogenetic information from the gene order. Here we present a novel approach utilizing these data based on cyclic list alignments of the gene orders. A progressive alignment approach is used to combine pairwise list alignments into a multiple alignment of gene orders. Parsimony methods are used to reconstruct phylogenetic trees, ancestral gene orders, and consensus patterns in a straightforward approach. We apply this method to study the phylogeny of protostomes based exclusively on mitochondrial genome arrangements. We, furthermore, demonstrate that our approach is also applicable to the much larger genomes of chloroplasts.

16.
We present a simple and effective method for combining distance matrices from multiple genes on identical taxon sets to obtain a single representative distance matrix from which to derive a combined-gene phylogenetic tree. The method applies singular value decomposition (SVD) to extract the greatest common signal present in the distances obtained from each gene. The first right eigenvector of the SVD, which corresponds to a weighted average of the distance matrices of all genes, can thus be used to derive a representative tree from multiple genes. We apply our method to three well known data sets and estimate the uncertainty using bootstrap methods. Our results show that this method works well for these three data sets and that the uncertainty in these estimates is small. A simulation study is conducted to compare the performance of our method with several other distance based approaches (namely SDM, SDM* and ACS97), and we find the performances of all these approaches are comparable in the consensus setting. The computational complexity of our method is similar to that of SDM. Besides constructing a representative tree from multiple genes, we also demonstrate how the subsequent eigenvalues and eigenvectors may be used to identify if there are conflicting signals in the data and which genes might be influential or outliers for the estimated combined-gene tree.
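
The core idea can be sketched under some assumptions about details the abstract leaves open. In the Python sketch below, each gene's distance matrix is flattened into a row of pairwise distances, and the first right singular vector of the resulting genes-by-pairs matrix is found by power iteration. The flattening order, the function names, the toy distances, and the use of power iteration (rather than a library routine such as `numpy.linalg.svd`) are choices made for this example only; any rescaling of the vector back to distance units is left out.

```python
def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square distance matrix into a list."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def first_right_singular_vector(rows, iterations=200):
    """Power iteration on A^T A, where A has one row of distances per gene.

    Assumes at least one distance is non-zero.
    """
    n_cols = len(rows[0])
    v = [1.0] * n_cols
    for _ in range(iterations):
        u = [sum(r[j] * v[j] for j in range(n_cols)) for r in rows]  # u = A v
        w = [sum(r[j] * u[i] for i, r in enumerate(rows)) for j in range(n_cols)]  # A^T u
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


# Hypothetical distances for three taxa from two genes; gene 2 evolves twice as fast.
gene1 = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
gene2 = [[0, 2, 4], [2, 0, 6], [4, 6, 0]]
rows = [upper_triangle(gene1), upper_triangle(gene2)]
v = first_right_singular_vector(rows)
```

Because each update computes `A^T u`, the resulting vector is a weighted combination of the per-gene distance rows, which matches the abstract's description of a weighted average. In this toy case both genes carry the same relative distances, so the vector is proportional to (1, 2, 3).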

17.
Species tree methods have provided improvements for estimating species relationships and the timing of diversification in recent radiations by allowing for gene tree discordance. Although gene tree discordance is often observed, most discordance is attributed to incomplete lineage sorting rather than other biological phenomena, and the causes of discordance are rarely investigated. We use species trees from multi-locus data to estimate the species relationships, evolutionary history and timing of diversification among Australian Gehyra—a group renowned for taxonomic uncertainty and showing a large degree of gene tree discordance. We find support for a recent Asian origin and two major clades: a tropically adapted clade and an arid adapted clade, with some exceptions, but no support for allopatric speciation driven by chromosomal rearrangement in the group. Bayesian concordance analysis revealed high gene tree discordance and comparisons of Robinson–Foulds distances showed that discordance between gene trees was significantly higher than that generated by topological uncertainty within each gene. Analysis of gene tree discordance and incomplete taxon sampling revealed that gene tree discordance was high whether terminal taxon or gene sampling was maximized, indicating discordance is due to biological processes, which may be important in contributing to gene tree discordance in many recently diversified organisms.
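
The Robinson–Foulds distance used in such comparisons counts the groupings found in one tree but not the other. The Python sketch below computes a rooted variant based on clades, which is a simplification: the abstract does not say which variant was used, and the distance is often defined on bipartitions of unrooted trees. The nested-tuple tree encoding, the function names, and the toy trees are assumptions of this example.

```python
def clades(tree):
    """Return (leaf set, set of clades) for a rooted tree given as nested tuples."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves, found = frozenset(), set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found |= child_clades
    found.add(leaves)
    return leaves, found


def rf_distance(tree1, tree2):
    """Number of clades present in exactly one of the two trees."""
    leaves1, clades1 = clades(tree1)
    leaves2, clades2 = clades(tree2)
    if leaves1 != leaves2:
        raise ValueError("trees must have the same leaf set")
    return len(clades1 ^ clades2)


# Hypothetical gene trees that disagree on whether b or c is closest to a.
gene_tree_1 = ((("a", "b"), "c"), "d")
gene_tree_2 = ((("a", "c"), "b"), "d")
distance = rf_distance(gene_tree_1, gene_tree_2)
```

Comparing distances between trees from different genes with distances among alternative trees for the same gene is the kind of contrast the abstract describes.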

18.
MOTIVATION: Phylogenetic analyses often produce thousands of candidate trees. Biologists resolve the conflict by computing the consensus of these trees. Single-tree consensus as postprocessing methods can be unsatisfactory due to their inherent limitations. RESULTS: In this paper we present an alternative approach by using clustering algorithms on the set of candidate trees. We propose bicriterion problems, in particular using the concept of information loss, and new consensus trees called characteristic trees that minimize the information loss. Our empirical study using four biological datasets shows that our approach provides a significant improvement in the information content, while adding only a small amount of complexity. Furthermore, the consensus trees we obtain for each of our large clusters are more resolved than the single-tree consensus trees. We also provide some initial progress on theoretical questions that arise in this context.

19.
With approximately 3000 marine species, Tunicata represents the most disparate subtaxon of Chordata. Molecular phylogenetic studies support Tunicata as sister taxon to Craniota, rendering it pivotal to understanding craniate evolution. Although successively more molecular data have become available to resolve internal tunicate phylogenetic relationships, phenotypic data have not been utilized consistently. Herein these shortcomings are addressed by cladistically analyzing 117 phenotypic characters for 49 tunicate species comprising all higher tunicate taxa, and five craniate and cephalochordate outgroup species. In addition, a combined analysis of the phenotypic characters with 18S rDNA-sequence data is performed in 32 OTUs. The analysis of the combined data is congruent with published molecular analyses. Successively up-weighting phenotypic characters indicates that phenotypic data contribute disproportionally more to the resulting phylogenetic hypothesis. The strict consensus tree from the analysis of the phenotypic characters as well as the single most parsimonious tree found in the analysis of the combined dataset recover monophyletic Appendicularia as sister taxon to the remaining tunicate taxa. Thus, both datasets support the hypothesis that the last common ancestor of Tunicata was free-living and that ascidian sessility is a derived trait within Tunicata. “Thaliacea” is found to be paraphyletic with Pyrosomatida as sister taxon to monophyletic Ascidiacea and the relationship between Doliolida and Salpida is unresolved in the analysis of morphological characters; however, the analysis of the combined data reconstructs Thaliacea as monophyletic nested within paraphyletic “Ascidiacea”. Therefore, both datasets differ in the interpretation of the evolution of the complex holoplanktonic life history of thaliacean taxa. 
According to the phenotypic data, this evolution occurred in the plankton, whereas from the combined dataset a secondary transition into the plankton from a sessile ascidian is inferred. Besides these major differences, both analyses are in accord on many phylogenetic groupings, although both phylogenetic reconstructions invoke a high degree of homoplasy. In conclusion, this study represents the first serious attempt to utilize the potential phylogenetic information present in phenotypic characters to elucidate the inter-relationships of this diverse marine taxon in a consistent cladistic framework.

20.
New examples are presented, showing that supertree methods such as matrix representation with parsimony, minimum flip trees, and compatibility analysis of the matrix representing the input trees, produce supertrees that cannot be interpreted as displaying the groups present in the majority of the input trees. These methods may produce a supertree displaying some groups present in the minority of the trees, and contradicted by the majority. Of the three methods, compatibility analysis is the least used, but it seems to be the one that differs the least from majority rule consensus. The three methods are similar in that they choose the supertree(s) that best fit the set of input trees (quantified as some measure of the fit to the matrix representation of the input trees); in the case of complete trees, it is argued that, for a supertree method to be equivalent to majority rule or frequency difference consensus, two necessary (but not sufficient) conditions must be met. First, the measure of fit between a supertree and an input tree must be symmetrical. Second, the fit for a character representing a group must be measured as absolute: either it fits or it does not fit. In the restricted case of complete and equally resolved input trees, compatibility analysis (unlike MRP and minimum flipping) fulfils these two conditions: it is symmetrical (i.e., as long as the trees have the same taxon sets and are equally resolved, the number of characters in the matrix representation of tree A that require homoplasy in tree B is always the same as the number of characters in the matrix representation of tree B that require homoplasy in tree A) and it measures fit as all‐or‐none. In the case of just two complete and equally resolved input trees, the two conditions (symmetry and absolute fit) are necessary and sufficient, which explains why the compatibility analysis of such trees behaves as majority consensus. 
With more than two such trees, these conditions are still necessary but no longer sufficient for the equivalence; in such cases, the compatibility supertree may differ significantly from the majority rule consensus, even when these conditions apply (as shown by example). MRP and minimum flipping are asymmetric and measure various degrees of fit for each character, which explains why they often behave very differently from majority rule procedures, and why they are very likely to have groups contradicted by each of the input trees, or groups supported by a minority of the input trees. © The Willi Hennig Society 2005.
