Similar Articles
1.
Although 11 studies have addressed the systematics of the four families and 281 fish species of the ecomorphologically diverse Anostomoidea, none has proposed a global hypothesis of relationships. We synthesized these studies to yield a supermatrix with 463 morphological characters among 174 ingroup species, and inferred phylogeny with parsimony and Bayesian optimization. We evaluated the applicability of the supermatrix approach to morphological datasets, tested its sensitivity to missing data, determined the impact of homoplastic characters on phylogenetic resolution, and determined the distribution of homologies and homoplasies on the topology. Despite more than 60% missing data, analyses supported the monophyly of all families, and phylogenetic structure degraded only with inclusion of species with high percentages of missing data and in analyses limited to homoplasies. The latter differs only modestly from the full matrix, indicating phylogenetic signal in homoplastic characters. Character distributions differ across the phylogeny, with a greater prevalence of homologies at deeper nodes and homoplasies nearer the tips than expected by chance. This may suggest early diversification into distinct bauplans with subsequent diversification of faster evolving character systems. The morphological supermatrix approach is powerful and allows integration of classical data with modern methods to examine the evolution of multiple character systems.
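As a rough illustration of how separately published matrices are concatenated into a supermatrix, and of how the share of missing data can be measured, here is a minimal Python sketch. It assumes each source matrix is a dict mapping taxon names to equal-length character strings and that `?` marks unscored cells; the function names are hypothetical. Real morphological supermatrices also require reconciling character definitions and state coding across studies, which is not shown.

```python
from typing import Dict, List

MISSING = "?"  # assumed code for cells not scored in a source matrix

def build_supermatrix(matrices: List[Dict[str, str]]) -> Dict[str, str]:
    """Concatenate source matrices (taxon -> character string) into one supermatrix.

    A taxon absent from a source matrix receives MISSING for every character
    of that matrix, so all rows end up the same length.
    """
    taxa = sorted({t for m in matrices for t in m})
    rows = {t: [] for t in taxa}
    for m in matrices:
        width = len(next(iter(m.values())))  # assumes equal-length rows per source
        for t in taxa:
            rows[t].append(m.get(t, MISSING * width))
    return {t: "".join(parts) for t, parts in rows.items()}

def missing_fraction(supermatrix: Dict[str, str]) -> Dict[str, float]:
    """Proportion of each taxon's cells coded as MISSING."""
    return {t: row.count(MISSING) / len(row) for t, row in supermatrix.items()}

def overall_missing(supermatrix: Dict[str, str]) -> float:
    """Proportion of all cells in the supermatrix coded as MISSING."""
    cells = sum(len(r) for r in supermatrix.values())
    return sum(r.count(MISSING) for r in supermatrix.values()) / cells

if __name__ == "__main__":
    # toy matrices, not data from the study
    sm = build_supermatrix([{"A": "0101", "B": "0111", "C": "1100"},
                            {"A": "10", "D": "01"}])
    print(sm)  # {'A': '010110', 'B': '0111??', 'C': '1100??', 'D': '????01'}
```

Per-taxon fractions like these are one way to identify the "species with high percentages of missing data" whose inclusion the abstract reports as degrading resolution.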

2.
Many research groups are estimating trees containing anywhere from a few thousands to hundreds of thousands of species, toward the eventual goal of the estimation of a Tree of Life, containing perhaps as many as several million leaves. These phylogenetic estimations present enormous computational challenges, and current computational methods are likely to fail to run even on data sets in the low end of this range. One approach to estimate a large species tree is to use phylogenetic estimation methods (such as maximum likelihood) on a supermatrix produced by concatenating multiple sequence alignments for a collection of markers; however, the most accurate of these phylogenetic estimation methods are extremely computationally intensive for data sets with more than a few thousand sequences. Supertree methods, which assemble phylogenetic trees from a collection of trees on subsets of the taxa, are important tools for phylogeny estimation where phylogenetic analyses based upon maximum likelihood (ML) are infeasible. In this paper, we introduce SuperFine, a meta-method that utilizes a novel two-step procedure to improve the accuracy and scalability of supertree methods. Our study, using both simulated and empirical data, shows that SuperFine-boosted supertree methods produce more accurate trees than standard supertree methods, and run quickly on very large data sets with thousands of sequences. Furthermore, SuperFine-boosted matrix representation with parsimony (MRP, the best-known supertree method) approaches the accuracy of ML methods on supermatrix data sets under realistic conditions.

3.
It has been proposed that supertree approaches should be applied to large multilocus datasets to achieve computational tractability. Large datasets such as those derived from phylogenomics studies can be broken into many locus‐specific tree searches and the resulting trees can be stitched together via a supertree method. Using simulated data, workers have reported that they can rapidly construct a supertree that is comparable to the results of heuristic tree search on the entire dataset. To test this assertion with organismal data, we compare tree length under the parsimony criterion and computational time for 20 multilocus datasets using supertree (SuperFine and SuperTriplets) and supermatrix (heuristic search in TNT) approaches. Tree length and computational times were compared among methods using the Wilcoxon matched‐pairs signed rank test. Supermatrix searches produced significantly shorter trees than either supertree approach (SuperFine or SuperTriplets; P < 0.0002 in both cases). Moreover, the processing time of supermatrix search was significantly lower than SuperFine+locus‐specific search (P < 0.01) but roughly equivalent to that of SuperTriplets+locus‐specific search (P > 0.4, not significant). In conclusion, we show by using real rather than simulated data that there is no basis, either in time tractability or in tree length, for use of supertrees over heuristic tree search using a supermatrix for phylogenomics.
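The Wilcoxon matched-pairs signed-rank test named above compares paired values, such as tree lengths from two approaches on the same data sets. The sketch below computes an exact two-sided p-value by enumerating the null distribution of W+ (the sum of ranks of positive differences). It assumes there are no zero differences and no tied absolute differences, a simplification that integer tree lengths will often violate; in that case a library routine such as `scipy.stats.wilcoxon` is the safer choice. The function name is hypothetical.

```python
from typing import List, Tuple

def wilcoxon_signed_rank_exact(x: List[float], y: List[float]) -> Tuple[int, float]:
    """Exact two-sided Wilcoxon matched-pairs signed-rank test.

    Simplifying assumption: no zero differences and no tied |differences|,
    so the ranks are exactly 1..n. Returns (W+, two-sided p-value).
    """
    d = [a - b for a, b in zip(x, y)]
    abs_d = [abs(v) for v in d]
    if 0 in abs_d or len(set(abs_d)) != len(abs_d):
        raise ValueError("zeros or ties in |differences| are not handled here")
    n = len(d)
    # rank the absolute differences from smallest (1) to largest (n)
    ranks = [0] * n
    for r, i in enumerate(sorted(range(n), key=lambda k: abs_d[k]), start=1):
        ranks[i] = r
    w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    # null distribution: counts[s] = number of the 2**n sign patterns with W+ == s
    max_sum = n * (n + 1) // 2
    counts = [1] + [0] * max_sum
    for r in range(1, n + 1):
        for s in range(max_sum, r - 1, -1):
            counts[s] += counts[s - r]
    total = 2 ** n
    lower = sum(counts[: w_plus + 1]) / total  # P(W+ <= observed)
    upper = sum(counts[w_plus:]) / total       # P(W+ >= observed)
    return w_plus, min(1.0, 2 * min(lower, upper))

if __name__ == "__main__":
    # toy paired values, not tree lengths from the study
    print(wilcoxon_signed_rank_exact([10, 20, 30, 40, 50], [11, 22, 33, 44, 55]))
    # (0, 0.0625)
```

With only five pairs the smallest attainable two-sided p-value is 0.0625, which is one reason the test needs a reasonable number of data sets (20 in the study) to reach small P values.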

4.
Nonparametric bootstrapping methods may be useful for assessing confidence in a supertree inference. We examined the performance of two supertree bootstrapping methods on four published data sets that each include sequence data from more than 100 genes. In "input tree bootstrapping," input gene trees are sampled with replacement and then combined in replicate supertree analyses; in "stratified bootstrapping," trees from each gene's separate (conventional) bootstrap tree set are sampled randomly with replacement and then combined. Generally, support values from both supertree bootstrap methods were similar to or slightly lower than corresponding bootstrap values from a total evidence, or supermatrix, analysis. Yet, supertree bootstrap support also exceeded supermatrix bootstrap support for a number of clades. There was little overall difference in support scores between the input tree and stratified bootstrapping methods. Results from supertree bootstrapping methods, when compared to results from corresponding supermatrix bootstrapping, may provide insights into patterns of variation among genes in genome-scale data sets.
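The two resampling schemes described above differ only in what is drawn for each replicate: whole gene trees (input tree bootstrapping) or one tree per gene from that gene's own bootstrap set (stratified bootstrapping, as read from the description). A minimal Python sketch, assuming trees are reduced to sets of clades and using a hypothetical majority-clade rule as a stand-in for a real supertree method such as MRP; all names are illustrative.

```python
import random
from collections import Counter
from typing import Callable, Dict, FrozenSet, List, Sequence

Clade = FrozenSet[str]
Tree = FrozenSet[Clade]  # a tree reduced to its set of non-trivial clades
Combiner = Callable[[List[Tree]], Tree]

def majority_clades(trees: List[Tree]) -> Tree:
    """Hypothetical stand-in for a supertree method: keep clades found in more
    than half of the input trees (only sensible when all trees share one taxon set)."""
    counts = Counter(c for t in trees for c in t)
    return frozenset(c for c, k in counts.items() if k > len(trees) / 2)

def input_tree_bootstrap(gene_trees: Sequence[Tree], combine: Combiner,
                         reps: int, rng: random.Random) -> Dict[Clade, float]:
    """Resample whole gene trees with replacement; combine each replicate."""
    hits: Counter = Counter()
    for _ in range(reps):
        sample = [rng.choice(gene_trees) for _ in range(len(gene_trees))]
        hits.update(combine(sample))
    return {c: k / reps for c, k in hits.items()}

def stratified_bootstrap(boot_sets: Sequence[Sequence[Tree]], combine: Combiner,
                         reps: int, rng: random.Random) -> Dict[Clade, float]:
    """Draw one tree per gene from that gene's own bootstrap tree set; combine."""
    hits: Counter = Counter()
    for _ in range(reps):
        sample = [rng.choice(trees) for trees in boot_sets]
        hits.update(combine(sample))
    return {c: k / reps for c, k in hits.items()}
```

Passing the combiner as a parameter keeps the resampling logic separate from the supertree method, so the same loop could wrap any method that maps a list of trees to a tree.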

5.
Next-generation sequencing and phylogenomics hold great promise for elucidating complex relationships among large plant families. Here, we performed targeted capture of low copy sequences followed by next-generation sequencing on the Illumina platform in the large and diverse angiosperm family Compositae (Asteraceae). The family is monophyletic, based on morphology and molecular data, yet many areas of the phylogeny have unresolved polytomies and interpreting phylogenetic patterns has been historically difficult. In order to outline a method and provide a framework for future phylogenetic studies in the Compositae, we sequenced 23 taxa from across the family in which the relationships were well established as well as a member of the sister family Calyceraceae. We generated nuclear data from 795 loci and assembled chloroplast genomes from off-target capture reads, enabling the comparison of nuclear and chloroplast genomes for phylogenetic analyses. We also analyzed multi-copy nuclear genes in our data set using a clustering method during orthology detection, and we applied a network approach to these clusters, analyzing all related locus copies. Using these data, we produced hypotheses of phylogenetic relationships employing both a conservative approach (restricted to loci with only one copy per targeted locus) and a multigene approach (including all copies per targeted locus). The methods and bioinformatics workflow presented here provide a solid foundation for future work aimed at understanding gene family evolution in the Compositae as well as providing a model for phylogenomic analyses in other plant mega-families.

6.
The macroevolutionary transition of whales (cetaceans) from a terrestrial quadruped to an obligate aquatic form involved major changes in sensory abilities. Compared to terrestrial mammals, the olfactory system of baleen whales is dramatically reduced, and in toothed whales is completely absent. We sampled the olfactory receptor (OR) subgenomes of eight cetacean species from four families. A multigene tree of 115 newly characterized OR sequences from these eight species and published data for Bos taurus revealed a diverse array of class II OR paralogues in Cetacea. Evolution of the OR gene superfamily in toothed whales (Odontoceti) featured a multitude of independent pseudogenization events, supporting anatomical evidence that odontocetes have lost their olfactory sense. We explored the phylogenetic utility of OR pseudogenes in Cetacea, concentrating on delphinids (oceanic dolphins), the product of a rapid evolutionary radiation that has been difficult to resolve in previous studies of mitochondrial DNA sequences. Phylogenetic analyses of OR pseudogenes using both gene-tree reconciliation and supermatrix methods yielded fully resolved, consistently supported relationships among members of four delphinid subfamilies. Alternative minimizations of gene duplications, gene duplications plus gene losses, deep coalescence events, and nucleotide substitutions plus indels returned highly congruent phylogenetic hypotheses. Novel DNA sequence data for six single-copy nuclear loci and three mitochondrial genes (> 5000 aligned nucleotides) provided an independent test of the OR trees. Nucleotide substitutions and indels in OR pseudogenes showed a very low degree of homoplasy in comparison to mitochondrial DNA and, on average, provided more variation than single-copy nuclear DNA. 
Our results suggest that phylogenetic analysis of the large OR superfamily will be effective for resolving relationships within Cetacea whether supermatrix or gene-tree reconciliation procedures are used.

7.
Given that most species that have ever existed on Earth are extinct, no evolutionary history can ever be complete without the inclusion of fossil taxa. Bovids (antelopes and relatives) are one of the most diverse clades of large mammals alive today, with over a hundred living species and hundreds of documented fossil species. With the advent of molecular phylogenetics, major advances have been made in the phylogeny of this clade; however, there has been little attempt to integrate the fossil record into the developing phylogenetic picture. We here describe a new large fossil caprin species from ca. 1.9-Ma deposits from the Middle Awash, Ethiopia. To place the new species phylogenetically, we perform a Bayesian analysis of a combined molecular (cytochrome b) and morphological (osteological) character supermatrix. We include all living species of Caprini, the new fossil species, a fossil takin from the Pliocene of Ethiopia (Budorcas churcheri), and the insular subfossil Myotragus balearicus. The combined analysis demonstrates successful incorporation of both living and fossil species within a single phylogeny based on both molecular and morphological evidence. Analysis of the combined supermatrix produces better resolution than either the molecular or the morphological data set considered alone. Parsimony and Bayesian analyses of the data set are also compared and shown to produce similar results. The combined phylogenetic analysis indicates that the new fossil species is nested within Capra, making it one of the earliest representatives of this clade, with implications for molecular clock calibration. Geographical optimization indicates at least four independent dispersals into Africa by caprins since the Pliocene.

8.
Ren F, Tanaka H, Yang Z. Gene, 2009, 441(1-2): 119-125.
Supermatrix and supertree methods are two strategies advocated for phylogenetic analysis of sequence data from multiple gene loci, especially when some species are missing at some loci. The supermatrix method concatenates sequences from multiple genes into a data supermatrix for phylogenetic analysis, and ignores differences in evolutionary dynamics among the genes. The supertree method analyzes each gene separately and assembles the subtrees estimated from individual genes into a supertree for all species. Most algorithms suggested for supertree construction lack statistical justifications and ignore uncertainties in the subtrees. Instead of the supermatrix or supertree approach, we advocate the use of a likelihood function to combine data from multiple genes while accommodating their differences in the evolutionary process. This combines the strengths of the supermatrix and supertree methods while avoiding their drawbacks. We conduct computer simulation to evaluate the performance of the supermatrix, supertree, and maximum likelihood methods applied to two phylogenetic problems: molecular-clock dating of species divergences and reconstruction of species phylogenies. The results confirm the theoretical superiority of the likelihood method. Supertree or separate analyses of data of multiple genes may be useful in revealing the characteristics of the evolutionary process of multiple gene loci, and the information may be used to formulate realistic models for combined analysis of all genes by likelihood.
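One standard way to write such a combined likelihood is as a sum of per-gene log-likelihoods that share the species tree but keep gene-specific parameters. The expression below is a generic partitioned-likelihood sketch, not necessarily the exact model of Ren et al.: τ is the shared species tree (topology and, for dating, divergence times), θ_k collects the parameters of gene k (for example its rate and substitution-model parameters), and D_k is the alignment for gene k, containing only the species sampled at that locus.

```latex
\ell(\tau, \theta_1, \ldots, \theta_K)
  = \sum_{k=1}^{K} \log L_k\!\left(\tau, \theta_k \mid D_k\right)
```

Under this framing, plain concatenation corresponds to imposing the constraint θ_1 = θ_2 = … = θ_K, which is the "ignores differences in evolutionary dynamics among the genes" assumption described above.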

9.

Background  

Supertree methods comprise one approach to reconstructing large molecular phylogenies given multi-marker datasets: trees are estimated on each marker and then combined into a tree (the "supertree") on the entire set of taxa. Supertrees can be constructed using various algorithmic techniques, with the most common being matrix representation with parsimony (MRP). When the data allow, the competing approach is a combined analysis (also known as a "supermatrix" or "total evidence" approach) whereby the different sequence data matrices for each of the different subsets of taxa are concatenated into a single supermatrix, and a tree is estimated on that supermatrix.
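The "matrix representation" step of MRP recodes each input tree as binary characters, which a parsimony program then analyzes. A minimal Python sketch of the common Baum/Ragan coding, assuming each input tree is given as its leaf set plus its non-trivial clades; the function name is hypothetical and the parsimony search itself is not shown.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

InputTree = Tuple[Set[str], List[FrozenSet[str]]]  # (leaf set, non-trivial clades)

def mrp_matrix(trees: List[InputTree]) -> Dict[str, str]:
    """Baum/Ragan coding: one binary character per clade of each input tree.

    1 = taxon inside the clade, 0 = taxon in that tree but outside the clade,
    ? = taxon absent from that input tree.
    """
    all_taxa = sorted(set().union(*(leaves for leaves, _ in trees)))
    rows: Dict[str, List[str]] = {t: [] for t in all_taxa}
    for leaves, clades in trees:
        for clade in clades:
            for t in all_taxa:
                if t not in leaves:
                    rows[t].append("?")
                elif t in clade:
                    rows[t].append("1")
                else:
                    rows[t].append("0")
    return {t: "".join(cs) for t, cs in rows.items()}

if __name__ == "__main__":
    t1 = ({"A", "B", "C", "D"}, [frozenset({"A", "B"})])
    t2 = ({"C", "D", "E"}, [frozenset({"D", "E"})])
    print(mrp_matrix([t1, t2]))
    # {'A': '1?', 'B': '1?', 'C': '00', 'D': '01', 'E': '?1'}
```

The `?` entries are how MRP handles input trees on different taxon subsets; for rooted input trees, implementations often also add an all-zero hypothetical outgroup row.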

10.
Supermatrix and supertree are two methods for constructing a phylogenetic tree from multiple data sets. However, these methods are not a panacea, as conflicting signals between data sets can lead to misinterpretation of the evolutionary history of taxa. In particular, the supermatrix approach is expected to be misleading if the species-tree signal is not dominant after the data sets are combined. Moreover, most current supertree methods suffer from two limitations: (i) they ignore or misinterpret secondary (non-dominant) phylogenetic signals of the different data sets; and (ii) the logical basis of node robustness measures is unclear. To overcome these limitations, we propose a new approach, called SuperTRI, which is based on the branch support analyses of the independent data sets, and where the reliability of the nodes is assessed using three measures: the supertree bootstrap percentage and, calculated from the separate analyses, the mean branch support (mean bootstrap percentage or mean posterior probability) and the reproducibility index. The SuperTRI approach is tested on a data matrix including seven genes for 82 taxa of the family Bovidae (Mammalia, Ruminantia), and the results are compared to those found with the supermatrix approach. The phylogenetic analyses of the supermatrix and independent data sets were done using four methods of tree reconstruction: Bayesian inference, maximum likelihood, and unweighted and weighted maximum parsimony. The results indicate, first, that the SuperTRI approach is less sensitive to the choice among the four phylogenetic methods; second, that it allows a more accurate interpretation of the relationships among taxa; and third, that interesting conclusions on introgression and radiation can be drawn from comparisons between SuperTRI and supermatrix analyses. To cite this article: A. Ropiquet et al., C. R. Biologies 332 (2009).
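The two per-node measures computed from the separate analyses can be sketched as follows, under stated assumptions: each independent data set supplies a map from clade to support (bootstrap percentage, or posterior probability scaled to 0-100), a clade not recovered by a data set contributes 0 to the mean, and a clade counts as "recovered" for the reproducibility index when its support exceeds 50 (a hypothetical cut-off, not necessarily the one used in SuperTRI). The supertree bootstrap percentage, which requires an MRP analysis of bootstrap trees, is not shown.

```python
from typing import Dict, FrozenSet, List, Tuple

Clade = FrozenSet[str]

def supertri_node_summary(per_dataset: List[Dict[Clade, float]], clade: Clade,
                          recovered_above: float = 50.0) -> Tuple[float, float]:
    """Mean branch support and a reproducibility index for one clade.

    per_dataset: one dict per independent data set, mapping clade -> support
    on a 0-100 scale; a clade missing from a dict counts as support 0.
    recovered_above: assumed cut-off for counting the clade as recovered.
    """
    values = [d.get(clade, 0.0) for d in per_dataset]
    mean_support = sum(values) / len(values)
    reproducibility = sum(1 for v in values if v > recovered_above) / len(values)
    return mean_support, reproducibility
```

Averaging with zeros for absent clades is what lets secondary signals show up: a clade strongly supported by only a few genes gets a modest mean and a low reproducibility index rather than vanishing entirely.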

11.
For the predominantly southern hemisphere plant group Styphelioideae (Ericaceae), published sequence datasets of five markers are now available for all except one of the 38 recognised genera. However, several markers are highly incomplete, so missing data are problematic for producing a genus-level phylogeny. We explore the relative utility of supertree and supermatrix approaches for addressing this challenge, and examine the effects of missing data on tree topology and resolution. Although the supertree approach returned a more conservative hypothesis, overall, the supermatrix and supertree analyses concurred in the topologies they returned. Using multiple genes and a dataset of variably complete taxa, we found improved support for the monophyly and position of the tribes and for genus-level relationships. However, there was mixed support for the Richeeae tribe appearing one node basal to the Cosmelieae tribe or vice versa. This will probably be resolved only through further sequencing. Our study supports previous findings that the amount of data is more critical than the completeness of the dataset in estimating well-resolved trees. Our results suggest that a "serendipitous" scaffolding approach that includes a mixture of well and poorly sequenced taxa can lead to robust phylogenetic hypotheses.

12.
Recent phylogenetic analyses of a large dataset for mammalian families (169 taxa, 26 loci) portray contrasting results. Supermatrix (concatenation) methods support a generally robust tree with only a few inconsistently resolved polytomies, whereas MP‐EST coalescence analysis of the same dataset yields a weakly supported tree that conflicts with many traditionally recognized clades. Here, we evaluate this discrepancy via improved coalescence analyses with reference to the rich history of phylogenetic studies on mammals. This integration clearly demonstrates that both supermatrix and coalescence analyses of just 26 loci yield a congruent, well‐supported phylogenetic hypothesis for Mammalia. Discrepancies between published studies are explained by implementation of overly simple DNA substitution models, inadequate tree‐search routines and limitations of the MP‐EST method. We develop a simple measure, partitioned coalescence support (PCS), which summarizes the distribution of support and conflict among gene trees for a given clade. Extremely high PCS scores for outlier gene trees at two nodes in the mammalian tree indicate a troubling bias in the MP‐EST method. We conclude that in this age of phylogenomics, a solid understanding of systematics fundamentals, choice of valid methodology and a broad knowledge of a clade's taxonomic history are still required to yield coherent phylogenetic inferences.

13.
In just the past 20 years systematics has progressed from the sequencing of individual genes for a few taxa to routine sequencing of complete plastid and even nuclear genomes. Recent technological advances have made it possible to compile very large data sets, the analyses of which have in turn provided unprecedented insights into phylogeny and evolution. Indeed, this narrow window of a few decades will likely be viewed as a golden era in systematics. Relationships have been resolved at all taxonomic levels across all groups of photosynthetic life. In the angiosperms, problematic deep-level relationships have either been largely resolved, or will be resolved within the next several years. The same large data sets have also provided new insights into the many rapid radiations that have characterized angiosperm evolution. For example, all of the major lineages of angiosperms likely arose within a narrow window of just a few million years. At the population level, the ease of DNA sequencing has given new life to phylogeographic studies, and microsatellite analyses have become more commonplace, with a concomitant impact on conservation and population biology. With the wealth of sequence data soon to be available, we are on the cusp of assembling the first semi-comprehensive tree of life for many of the 15,000 genera of flowering plants and indeed for much of green life. Accompanying these opportunities are also enormous new computational/informatic challenges including the management and phylogenetic analysis of such large, sometimes fragmentary data sets, and visualization of trees with thousands of terminals.

14.
The millions of herbarium specimens in collections around the world provide historical resources for phylogenomics and evolutionary studies. Many rare and endangered species exist only as historical specimens. Here, we report a case study of the monotypic Pseudobartsia yunnanensis D. Y. Hong (=Pseudobartsia glandulosa [Bentham] W. B. Yu & D. Z. Li: Orobanchaceae), known from a single Chinese collection made in 1940. We obtained genomic data of Pseudobartsia glandulosa using high-throughput short-read sequencing, and then assembled a complete chloroplast genome and the nuclear ribosomal DNA region. We found that the three newly assembled plastid DNA regions (atpB-rbcL, rpl16, and trnS-G) and the nuclear ribosomal internal transcribed spacer (nrITS) of Pseudobartsia glandulosa were more than 99.98% similar to published sequences obtained by targeted sequencing. Phylogenies of Orobanchaceae based on 30 plastomes (including 10 new plastomes), inferred with both supermatrix and multispecies coalescent approaches following a novel plastid phylogenomic workflow, recovered seven recognized tribes and two unranked groups, both of which were proposed as new tribes, namely Brandisieae and Pterygielleae. Within Pterygielleae, all analyses strongly supported Xizangia D. Y. Hong as the first diverging genus, with Pseudobartsia D. Y. Hong as sister to Pterygiella Oliver + Phtheirospermum Bunge (excluding Phtheirospermum japonicum [Thunberg] Kanitz); this supports reinstatement of Pseudobartsia and Xizangia. Although elements of Buchnereae-Cymbarieae-Orobancheae and Brandisieae-Pterygielleae-Rhinantheae showed incongruence among gene trees, the topology of the supermatrix tree was congruent with the majority of gene trees and functional-group trees. Therefore, most plastid genes evolve as a linkage group, allowing the supermatrix tree approach to yield internally consistent phylogenies for Orobanchaceae.

15.
Non-random distributions of missing data are a general problem for likelihood-based statistical analyses, including those in a phylogenetic context. Extensive non-randomly distributed missing data are particularly problematic in supermatrix analyses that include many terminals and/or loci. It has been widely reported that missing data can lead to loss of resolution, but only very rarely create misleading or otherwise unsupported results in a parsimony context. Yet this does not hold for all parametric analyses, because of their assumption of homogeneity across characters and lineages, which can lead to both long-branch attraction and long-branch repulsion. Contrived examples were used to demonstrate that non-random distributions of missing data, even without rate heterogeneity among characters and with a well-fitting model, can produce misleading likelihood-based topologies and branch-support values that are radically unstable under slight modifications to character sampling. The same can occur even in the complete absence of parsimony-informative characters. Otherwise unsupported resolution and high branch support for these clades were found to occur frequently in 22 empirical examples derived from a published supermatrix. Partitioning characters based on the distribution of missing data helped to decrease, but did not eliminate, these artifacts. These artifacts were exacerbated by low-quality tree searches, particularly when holding only a single optimal tree that must be fully resolved.
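One simple way to partition characters "based on the distribution of missing data" is to group columns that share exactly the same set of missing taxa, so each group could then receive its own model parameters. This is a hypothetical grouping rule for illustration, not necessarily the scheme used in the study; it assumes the matrix is a dict of equal-length strings with `?` for missing cells.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List

def partition_by_missing_pattern(matrix: Dict[str, str],
                                 missing: str = "?") -> Dict[FrozenSet[str], List[int]]:
    """Group character (column) indices by the set of taxa missing in that column."""
    taxa = sorted(matrix)
    n_chars = len(matrix[taxa[0]])
    groups: Dict[FrozenSet[str], List[int]] = defaultdict(list)
    for j in range(n_chars):
        absent = frozenset(t for t in taxa if matrix[t][j] == missing)
        groups[absent].append(j)
    return dict(groups)

if __name__ == "__main__":
    # toy supermatrix: two source blocks with different taxon coverage
    toy = {"A": "010110", "B": "0111??", "C": "1100??", "D": "????01"}
    for absent, cols in partition_by_missing_pattern(toy).items():
        print(sorted(absent), cols)
```

Grouping by exact pattern can produce many tiny partitions in a large supermatrix; merging patterns that differ by only a few taxa would be a natural refinement.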

16.
17.
For the last 2 decades, supertree reconstruction has been an active field of research and has seen the development of a large number of major algorithms. Because of the growing popularity of supertree methods, it has become necessary to evaluate the performance of these algorithms to determine which are the best options (especially relative to the widely used supermatrix approach). In this study, seven of the most commonly used supertree methods are investigated by using a large empirical data set (in terms of number of taxa and molecular markers) from the worldwide flowering plant family Sapindaceae. Supertree methods were evaluated using several criteria: similarity of the supertrees with the input trees, similarity between the supertrees and the total evidence tree, level of resolution of the supertree and computational time required by the algorithm. Additional analyses were also conducted on a reduced data set to test whether the performance levels were affected by the heuristic searches rather than by the algorithms themselves. Based on our results, two main groups of supertree methods were identified: on one hand, the matrix representation with parsimony (MRP), MinFlip, and MinCut methods performed well according to our criteria, whereas the average consensus, split fit, and most similar supertree methods showed a poorer performance or at least did not behave the same way as the total evidence tree. Results for the super distance matrix, that is, the most recent approach tested here, were promising, with at least one derived method performing as well as MRP, MinFlip, and MinCut. The output of each method was only slightly improved when applied to the reduced data set, suggesting that the heuristic searches behaved correctly and that the algorithms have a relatively low sensitivity to data set size and missing data.
Results also showed that the MRP analyses could reach a high level of quality even when using a simple heuristic search strategy, with the exception of MRP with the Purvis coding scheme and reversible parsimony. The future of supertrees lies in the implementation of a standardized heuristic search for all methods and the increase in computing power to handle large data sets. The latter would prove particularly useful for promising approaches such as the maximum quartet fit method, which still requires substantial computing power.
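Two of the evaluation criteria above, similarity to the total evidence tree and level of resolution, can be illustrated with clade-set comparisons. The sketch below assumes rooted trees on the same taxa, each reduced to its set of non-trivial clades, and uses the Robinson-Foulds (symmetric difference) distance as the similarity measure; this is an assumed choice, and the study may have used other measures. Function names are hypothetical.

```python
from typing import FrozenSet, Set

Clade = FrozenSet[str]

def robinson_foulds(a: Set[Clade], b: Set[Clade]) -> int:
    """Number of clades found in exactly one of the two trees (rooted RF distance)."""
    return len(a ^ b)

def normalized_rf(a: Set[Clade], b: Set[Clade]) -> float:
    """RF distance scaled to [0, 1] by the total number of clades in both trees."""
    total = len(a) + len(b)
    return robinson_foulds(a, b) / total if total else 0.0

def resolution(clades: Set[Clade], n_taxa: int) -> float:
    """Fraction of the n_taxa - 2 non-trivial clades a fully resolved rooted tree has."""
    return len(clades) / (n_taxa - 2)

if __name__ == "__main__":
    ab, abc = frozenset("AB"), frozenset("ABC")
    total_evidence = {ab, abc}  # ((A,B),C),D
    supertree = {ab}            # ((A,B),C,D): one polytomy
    print(robinson_foulds(total_evidence, supertree), resolution(supertree, 4))
    # 1 0.5
```

Reporting distance and resolution together matters because a poorly resolved supertree can look "close" to the reference simply by containing few clades.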

18.
The inference of population divergence times and branching patterns is of fundamental importance in many population genetic analyses. Many methods have been developed for estimating population divergence times, and recently, particular attention has been paid to genome-wide single-nucleotide polymorphism (SNP) data. However, most SNP data have been affected by an ascertainment bias caused by the SNP selection and discovery protocols. Here, we present a modification of an existing maximum likelihood method that will allow approximately unbiased inferences when ascertainment is based on a set of outgroup populations. We also present a method for estimating trees from the asymmetric dissimilarity measures arising from pairwise divergence time estimation in population genetics. We evaluate the methods by simulations and by applying them to a large SNP data set of seven East Asian populations.

19.
Quantifying branch support using the bootstrap and/or jackknife is generally considered to be an essential component of rigorous parsimony and maximum likelihood phylogenetic analyses. Previous authors have described how application of the frequency-within-replicates approach to treating multiple equally optimal trees found in a given bootstrap pseudoreplicate can provide apparent support for otherwise unsupported clades. We demonstrate how a similar problem may occur when a non-representative subset of equally optimal trees is held per pseudoreplicate, which we term the undersampling-within-replicates artifact. We illustrate the frequency-within-replicates and undersampling-within-replicates bootstrap and jackknife artifacts using both contrived and empirical examples, demonstrate that the artifacts can occur in both parsimony and likelihood analyses, and show that the artifacts occur in outputs from multiple different phylogenetic-inference programs. Based on our results, we make the following five recommendations, which are particularly relevant to supermatrix analyses, but apply to all phylogenetic analyses. First, when two or more optimal trees are found in a given pseudoreplicate they should be summarized using the strict-consensus rather than frequency-within-replicates approach. Second, jackknife resampling should be used rather than bootstrap resampling. Third, multiple tree searches while holding multiple trees per search should be conducted in each pseudoreplicate rather than conducting only a single search and holding only a single tree. Fourth, branches with a minimum possible optimized length of zero should be collapsed within each tree search rather than collapsing branches only if their maximum possible optimized length is zero. Fifth, resampling values should be mapped onto the strict consensus of all optimal trees found rather than simply presenting the ≥ 50% bootstrap or jackknife tree or mapping the resampling values onto a single optimal tree.
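The first recommendation above can be made concrete by contrasting how a clade is credited when a pseudoreplicate yields several equally optimal trees. A minimal Python sketch, assuming each tree is reduced to a set of clades; the function names and the `mode` parameter are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set

Clade = FrozenSet[str]

def strict_consensus(trees: List[Set[Clade]]) -> Set[Clade]:
    """Clades present in every one of the equally optimal trees."""
    common = set(trees[0])
    for t in trees[1:]:
        common &= t
    return common

def resampling_support(replicates: List[List[Set[Clade]]],
                       mode: str = "strict") -> Dict[Clade, float]:
    """Clade support across pseudoreplicates.

    replicates: for each pseudoreplicate, all equally optimal trees found.
    mode="strict": credit a clade only if it is in that replicate's strict
    consensus; mode="frequency": frequency-within-replicates, giving each
    optimal tree fractional credit (the artifact-prone approach).
    """
    if mode not in ("strict", "frequency"):
        raise ValueError(mode)
    credit: Counter = Counter()
    for optimal_trees in replicates:
        if mode == "strict":
            for c in strict_consensus(optimal_trees):
                credit[c] += 1
        else:
            for t in optimal_trees:
                for c in t:
                    credit[c] += 1 / len(optimal_trees)
    return {c: k / len(replicates) for c, k in credit.items()}
```

For a single replicate whose two optimal trees contain the conflicting clades (A,B) and (A,C), the frequency mode reports 50% support for each, while the strict mode reports none, which is the "apparent support for otherwise unsupported clades" the abstract describes.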

20.
For more than 10 years, systematists have been debating the superiority of character or taxonomic congruence in phylogenetic analysis. In this paper, we demonstrate that the competing approaches can converge to the same solution when a consensus method that accounts for branch lengths is selected. Thus, we propose to use both methods in combination, as a way to corroborate the results of combined and separate analyses. This so-called "global congruence" approach is tested with a wide variety of examples sampled from the literature, and the results are compared with those obtained by standard consensus methods. Our analyses show that when the total evidence and consensus trees differ topologically, collapsing nodes with low bootstrap support usually improves "global congruence".
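Why collapsing weakly supported nodes can remove apparent incongruence is easy to see with clade compatibility: two clades can coexist in one tree only if they are disjoint or nested. The sketch below counts the clades of one tree contradicted by another before and after collapsing nodes below a support threshold. The conflict count is a hypothetical proxy chosen for illustration, not the authors' "global congruence" measure, and the 50% threshold is an assumption.

```python
from typing import Dict, FrozenSet, Set

Clade = FrozenSet[str]

def collapse_weak(support: Dict[Clade, float], threshold: float = 50.0) -> Set[Clade]:
    """Keep only clades whose bootstrap support reaches the threshold."""
    return {c for c, s in support.items() if s >= threshold}

def compatible(x: Clade, y: Clade) -> bool:
    """Two clades can coexist in one tree iff they are disjoint or nested."""
    return not (x & y) or x <= y or y <= x

def conflicting(a: Set[Clade], b: Set[Clade]) -> Set[Clade]:
    """Clades of tree a contradicted by at least one clade of tree b."""
    return {x for x in a if any(not compatible(x, y) for y in b)}

if __name__ == "__main__":
    ab, ac, abc = frozenset("AB"), frozenset("AC"), frozenset("ABC")
    total_evidence = {ab: 95.0, abc: 40.0}  # toy support values
    consensus = {ac: 35.0, abc: 80.0}
    print(len(conflicting(set(total_evidence), set(consensus))))              # 1
    print(len(conflicting(collapse_weak(total_evidence), collapse_weak(consensus))))  # 0
```

In this toy case the only conflict involves a clade with 35% support; once it is collapsed, the remaining well-supported clades are nested and the two trees no longer contradict each other.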
