Similar literature
20 similar articles found (search time: 31 ms)
1.
Incomplete lineage sorting can cause incongruence between the phylogenetic history of genes (the gene tree) and that of the species (the species tree), which can complicate the inference of phylogenies. In this article, I present a new coalescent-based algorithm for species tree inference with maximum likelihood. I first describe an improved method for computing the probability of a gene tree topology given a species tree, which is much faster than an existing algorithm by Degnan and Salter (2005). Based on this method, I develop a practical algorithm that takes a set of gene tree topologies and infers species trees with maximum likelihood. This algorithm searches for the best species tree by starting from initial species trees and performing a heuristic search to obtain better trees with higher likelihood. This algorithm, called STELLS (which stands for Species Tree InfErence with Likelihood for Lineage Sorting), has been implemented in a program that is downloadable from the author's web page. The simulation results show that the STELLS algorithm is more accurate than an existing maximum likelihood method for many datasets, especially when there is noise in gene trees. I also show that the STELLS algorithm is efficient and can be applied to real biological datasets.
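
To make the idea of "the probability of a gene tree topology given a species tree" concrete, the three-taxon case has a well-known closed form under the multispecies coalescent. For a species tree ((A,B),C) whose internal branch has length t in coalescent units, the matching gene tree topology has probability 1 - (2/3)e^(-t) and each of the two discordant topologies has probability (1/3)e^(-t). The sketch below only illustrates that special case. It is not the STELLS algorithm, and the function name is a hypothetical one chosen for this example.

```python
import math


def three_taxon_gene_tree_probs(t):
    """Gene tree topology probabilities for the species tree ((A,B),C).

    t is the internal branch length in coalescent units (t >= 0).
    Hypothetical helper for illustration only; not part of STELLS.
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    # With probability exp(-t) the A and B lineages fail to coalesce in the
    # internal branch; the three topologies are then equally likely.
    discordant = math.exp(-t) / 3.0
    return {
        "((A,B),C)": 1.0 - 2.0 * discordant,
        "((A,C),B)": discordant,
        "((B,C),A)": discordant,
    }


probs = three_taxon_gene_tree_probs(0.5)
```

As t shrinks toward zero the three topologies approach equal probability, which is why short internal branches tend to show the most gene tree discordance.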

2.
RAxML-VI-HPC (randomized axelerated maximum likelihood for high performance computing) is a sequential and parallel program for inference of large phylogenies with maximum likelihood (ML). Low-level technical optimizations, a modification of the search algorithm, and the use of the GTR+CAT approximation as replacement for GTR+Gamma yield a program that is between 2.7 and 52 times faster than the previous version of RAxML. A large-scale performance comparison with GARLI, PHYML, IQPNNI and MrBayes on real data containing 1000 to 6722 taxa shows that RAxML requires at least 5.6 times less main memory and yields better trees in similar times than the best competing program (GARLI) on datasets up to 2500 taxa. On datasets with ≥4000 taxa it also runs 2-3 times faster than GARLI. RAxML has been parallelized with MPI to conduct parallel multiple bootstraps and inferences on distinct starting trees. The program has been used to compute ML trees on two of the largest alignments to date containing 25,057 (1463 bp) and 2182 (51,089 bp) taxa, respectively. AVAILABILITY: icwww.epfl.ch/~stamatak

3.
TreeGraph assists in producing complex ready‐to‐publish figures of phylogenetic trees. The TGF format used by the program automates formatting of several different statistical support value types (confidence estimates) per tree node. Moreover, internal text and graphical labels are automatically arranged at the nodes as are annotations for clades or groups of terminals. TreeGraph imports NEXUS trees and related file formats. Beyond common tree edit operations, simultaneous pruning of subtrees (simplification of the tree to higher order clades) and saving of subtrees is possible. TreeGraph exports to the standard vector graphics formats Scalable Vector Graphics and PostScript.

4.
The Bayesian method of phylogenetic inference often produces high posterior probabilities (PPs) for trees or clades, even when the trees are clearly incorrect. The problem appears to be mainly due to large sizes of molecular datasets and to the large-sample properties of Bayesian model selection and its sensitivity to the prior when several of the models under comparison are nearly equally correct (or nearly equally wrong) and are of the same dimension. A previous suggestion to alleviate the problem is to let the internal branch lengths in the tree become increasingly small in the prior with the increase in the data size so that the bifurcating trees are increasingly star-like. In particular, if the internal branch lengths are assigned the exponential prior, the prior mean μ0 should approach zero faster than 1/√n but more slowly than 1/n, where n is the sequence length. This paper examines the usefulness of this data size-dependent prior using a dataset of the mitochondrial protein-coding genes from the baleen whales, with the prior mean fixed at μ0 = 0.1n^(-2/3). In this dataset, phylogeny reconstruction is sensitive to the assumed evolutionary model, species sampling and the type of data (DNA or protein sequences), but Bayesian inference using the default prior attaches high PPs for conflicting phylogenetic relationships. The data size-dependent prior alleviates the problem to some extent, giving weaker support for unstable relationships. This prior may be useful in reducing apparent conflicts in the results of Bayesian analysis or in making the method less sensitive to model violations.
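
As a quick check that the prior mean used in this abstract satisfies the stated rate condition (approaching zero faster than 1/√n but more slowly than 1/n), the two limits can be written out directly. This is only a worked restatement of the condition as read from the abstract, not a derivation taken from the paper.

```latex
\[
\mu_0 = 0.1\,n^{-2/3}
\quad\Longrightarrow\quad
\sqrt{n}\,\mu_0 = 0.1\,n^{-1/6} \to 0,
\qquad
n\,\mu_0 = 0.1\,n^{1/3} \to \infty
\quad (n \to \infty).
\]
```

The first limit shows that the prior mean shrinks faster than 1/√n, and the second shows that it shrinks more slowly than 1/n.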

5.
Genome-scale data have greatly facilitated the resolution of recalcitrant nodes that Sanger-based datasets have been unable to resolve. However, phylogenomic studies continue to use traditional methods such as bootstrapping to estimate branch support; and high bootstrap values are still interpreted as providing strong support for the correct topology. Furthermore, relatively little attention has been given to assessing discordances between gene and species trees, and the underlying processes that produce phylogenetic conflict. We generated novel genomic datasets to characterize and determine the causes of discordance in Old World treefrogs (Family: Rhacophoridae), a group that is fraught with conflicting and poorly supported topologies among major clades. Additionally, a suite of data filtering strategies and analytical methods were applied to assess their impact on phylogenetic inference. We showed that incomplete lineage sorting was detected at all nodes that exhibited high levels of discordance. Those nodes were also associated with extremely short internal branches. We also clearly demonstrate that bootstrap values do not reflect uncertainty or confidence for the correct topology and, hence, should not be used as a measure of branch support in phylogenomic datasets. Overall, we showed that phylogenetic discordances in Old World treefrogs resulted from incomplete lineage sorting and that species tree inference can be improved using a multi-faceted, total-evidence approach, which uses the largest amount of data and considers results from different analytical methods and datasets.

6.
Comparison of the performance and accuracy of different inference methods, such as maximum likelihood (ML) and Bayesian inference, is difficult because the inference methods are implemented in different programs, often written by different authors. Both methods were implemented in the program MIGRATE, which estimates population genetic parameters, such as population sizes and migration rates, using coalescence theory. Both inference methods use the same Markov chain Monte Carlo algorithm and differ from each other in only two aspects: parameter proposal distribution and maximization of the likelihood function. Using simulated datasets, the Bayesian method generally fares better than the ML approach in accuracy and coverage, although for some values the two approaches are equal in performance. MOTIVATION: The Markov chain Monte Carlo-based ML framework can fail on sparse data and can deliver non-conservative support intervals. A Bayesian framework with appropriate prior distribution is able to remedy some of these problems. RESULTS: The program MIGRATE was extended to allow not only for ML estimation of population genetics parameters but also for using a Bayesian framework. Comparisons between the Bayesian approach and the ML approach are facilitated because both modes estimate the same parameters under the same population model and assumptions.

7.
The root lesion nematodes of the genus Pratylenchus Filipjev, 1936 are migratory endoparasites of plant roots, considered among the most widespread and important nematode parasites in a variety of crops. We obtained gene sequences from the D2 and D3 expansion segments of partial 28S rRNA and from 18S rRNA from 31 populations belonging to 11 valid and two unidentified species of root lesion nematodes and five outgroup taxa. These datasets were analyzed using maximum parsimony and Bayesian inference. The alignments were generated using the secondary structure models for these molecules and analyzed with Bayesian inference under the standard models and the complex model, considering helices under the doublet model and loops and bulges under the general time reversible model. The phylogenetic informativeness of morphological characters is tested by reconstruction of their histories on rRNA based trees using parallel parsimony and Bayesian approaches. Phylogenetic and sequence analyses of the 28S D2–D3 dataset with 145 accessions for 28 species and 18S dataset with 68 accessions for 15 species confirmed among large numbers of geographically diverse isolates that most classical morphospecies are monophyletic. Phylogenetic analyses revealed at least six distinct major clades of examined Pratylenchus species and these clades are generally congruent with those defined by characters derived from lip patterns, numbers of lip annules, and spermatheca shape. Morphological results suggest the need for sophisticated character discovery and analysis for morphology based phylogenetics in nematodes.

8.
MOTIVATION: The computation of large phylogenetic trees with statistical models such as maximum likelihood or Bayesian inference is computationally extremely intensive. It has repeatedly been demonstrated that these models are able to recover the true tree or a tree which is topologically closer to the true tree more frequently than less elaborate methods such as parsimony or neighbor joining. Due to the combinatorial and computational complexity, the size of trees which can be computed on a biologist's PC workstation within reasonable time is limited to trees containing approximately 100 taxa. RESULTS: In this paper we present the latest release of our program RAxML-III for rapid maximum likelihood-based inference of large evolutionary trees, which allows for computation of 1,000-taxon trees in less than 24 hours on a single PC processor. We compare RAxML-III to the currently fastest implementations for maximum likelihood and Bayesian inference: PHYML and MrBayes. Whereas RAxML-III performs worse than PHYML and MrBayes on synthetic data, it clearly outperforms both programs on all real data alignments used in terms of speed and final likelihood values. AVAILABILITY AND SUPPLEMENTARY INFORMATION: RAxML-III including all alignments and final trees mentioned in this paper is freely available as open source code at http://wwwbode.cs.tum/~stamatak CONTACT: stamatak@cs.tum.edu.

9.
AUTOINFER is a computer program for biogeographical inference based on nested clade analysis. To reduce the obscurity caused by manual inference, we defined geographically concordant clades and intermediate geographical areas between two clades. The program will perform most of the inferences automatically with a minimum of input from the user. We believe that AUTOINFER will save much time for the user compared with using the inference key by hand and, furthermore, will reduce the errors of inference resulting from different criteria in deduction.

10.
This study reports maximum parsimony and Bayesian phylogenetic analyses of selected Old World Astragalus using two chloroplast fragments including trnL-F and ndhF and the nuclear ribosomal internal transcribed spacer (nrDNA ITS). A total of 52 taxa including 34 euploid Old World and New World Astragalus, one aneuploid species from the Neo-Astragalus clade as a representative and 14 other Astragalean taxa, plus Cheseneya astragalina and two species of Caragana as outgroups were analyzed for both trnL-F and nrDNA ITS regions. ndhF was analyzed in 30 taxa and the same number for the combination of these three datasets were examined. In general, the trnL-F dataset and the ndhF and nrDNA ITS datasets generated more or less the same clades within Astragalus. However, in the trnL-F and ndhF phylogenies, Astragalus species are not gathered in a single clade, the so-called Astragalus s.s., as indicated by the nrDNA ITS tree. Visual inspection of these three phylogenies revealed that they were inconsistent regarding the position and relationships of Astragalus hemsleyi, A. ophiocarpus, A. annularis–A. epiglottis / Astragalus pelecinus, A. echinatus and A. arizonicus. Incongruence length difference test suggested that the trnL-F, ndhF and nrDNA ITS datasets were incongruent. In spite of this, phylogenetic analyses of the combined datasets as one unit or as three partitions generated trees that were topologically similar as a mix of the cpDNA and the nrDNA ITS trees. However, the combined dataset provided more resolved and statistically supported clades. The recently described A. memoriosus appeared closely related to A. stocksii (both from sect. Caraganella) based on both trnL-F and nrDNA ITS sequences.

11.
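Analysis of phylogenetic relationships in Paeonia sect. Moutan based on morphological evidence   Total citations: 1, self-citations: 4, citations by others: 1
A systematic analysis based on morphological evidence was carried out on 40 populations of Paeonia L. sect. Moutan DC. (all wild species), with the aim of establishing the phylogenetic relationships among the species of the section. Using the program PAUP (4.0), distance trees (UPGMA, NJ) and maximum parsimony (MP) trees were constructed for all studied taxa on the basis of 25 morphological characters. The topologies of the resulting trees were largely consistent; differences occurred only between the distance trees and the parsimony trees, in the group formed by five species that are closely related in both morphology and cytology (P. suffruticosa, P. jishanensis, P. qiui, P. rockii and P. o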

12.
Abstract. The phylogeny of Iberian Aphodiini species was reconstructed based on morphology. Wing venation, mouthparts, male and female genitalia, and external morphology provided ninety-four characters scored for ninety-three Aphodiini species. Phylogenetic analyses were based on maximum parsimony and Bayesian inference criteria. Maximum parsimony consensus trees recovered Acrossus species as a sister group of the remaining Aphodiini, followed by two other branches, one including Neagolius, Plagiogonus, Ahermodontus and Ammoecius species, and the other including Oxyomus, Nimbus, Heptaulacus and Euheptaulacus species. The remaining studied taxa clustered in an unresolved group. Bayesian inference trees recovered Acrossus as the sister group of the remaining Iberian Aphodiini, followed by Colobopterus erraticus and the rest of the Iberian Aphodiini, but this latter branch was unresolved. The general lack of statistical support for the inferred phylogenetic relationships at terminal nodes using both maximum parsimony and Bayesian inference suggests that variation in morphological characters useful for phylogenetic inference in the present study is small, perhaps as a consequence of a radiation event occurring at the origin of the tribe. A probable evolutionary pattern for Aphodiini is proposed which infers six groups, namely Acrossian, Ammoecian, Oxyomian, Aphodian s.str., Colobopteran and Aphodian s.l. clades.

13.
Quantification of the success of phylogenetic inference in simulations   Total citations: 1, self-citations: 0, citations by others: 1
For phylogenetic simulation studies, the accuracy of topological reconstruction obtained from different data matrices or different methods of phylogenetic inference generally needs to be quantified. Two components of performance within this context are: (1) how the inferred tree topology matches or conflicts with the correct tree topology, and (2) the branch support assigned to both correctly and incorrectly resolved clades. We present a method (averaged overall success of resolution) that incorporates both of these components. Branch support is incorporated in the averaged overall success of resolution by linearly scaling the observed support relative to that conferred by uncontradicted synapomorphies. We believe that this method represents an improvement relative to the commonly used approaches of quantifying the percentage of clades that are correctly resolved in the inferred trees or presenting the Robinson–Foulds distance between the inferred trees and the correct tree. In contrast to Bremer support, the averaged overall success of resolution may be applied equally well to distance, likelihood and parsimony analyses. © The Willi Hennig Society 2006.
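
The two baseline measures this abstract compares against, the percentage of correctly resolved clades and the Robinson–Foulds distance, can be sketched in a few lines. The code below is a minimal illustration that assumes rooted trees written as nested Python tuples and uses the rooted (clade-based) form of the Robinson–Foulds distance. All function names are hypothetical, and it does not implement the averaged overall success of resolution, whose details are not given in the abstract.

```python
def clades(tree):
    """Return the non-trivial clades of a rooted tree as frozensets of leaves.

    The tree is a nested tuple such as (("A", "B"), ("C", "D")).
    Single leaves and the root clade are excluded.
    """
    found = set()

    def leaves(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    all_leaves = leaves(tree)
    found.discard(all_leaves)  # the root clade is shared by every tree
    return found


def rf_distance(inferred, correct):
    """Rooted Robinson-Foulds distance: clades present in only one tree."""
    return len(clades(inferred) ^ clades(correct))


def fraction_correct(inferred, correct):
    """Fraction of the correct tree's clades recovered in the inferred tree."""
    true_clades = clades(correct)
    if not true_clades:
        return 1.0
    return len(clades(inferred) & true_clades) / len(true_clades)


correct = ((("A", "B"), "C"), ("D", "E"))
inferred = ((("A", "C"), "B"), ("D", "E"))
distance = rf_distance(inferred, correct)       # one clade missed, one wrong
recovered = fraction_correct(inferred, correct)
```

Neither measure uses branch support, which is the gap the abstract's method is meant to fill.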

14.
Phylogenetic networks represent the evolution of organisms that have undergone reticulate events, such as recombination, hybrid speciation or lateral gene transfer. An important way to interpret a phylogenetic network is in terms of the trees it displays, which represent all the possible histories of the characters carried by the organisms in the network. Interestingly, however, different networks may display exactly the same set of trees, an observation that poses a problem for network reconstruction: from the perspective of many inference methods such networks are indistinguishable. This is true for all methods that evaluate a phylogenetic network solely on the basis of how well the displayed trees fit the available data, including all methods based on input data consisting of clades, triples, quartets, or trees with any number of taxa, and also sequence-based approaches such as popular formalisations of maximum parsimony and maximum likelihood for networks. This identifiability problem is partially solved by accounting for branch lengths, although this merely reduces the frequency of the problem. Here we propose that network inference methods should only attempt to reconstruct what they can uniquely identify. To this end, we introduce a novel definition of what constitutes a uniquely reconstructible network. For any given set of indistinguishable networks, we define a canonical network that, under mild assumptions, is unique and thus representative of the entire set. Given data that underwent reticulate evolution, only the canonical form of the underlying phylogenetic network can be uniquely reconstructed. While on the methodological side this will imply a drastic reduction of the solution space in network inference, for the study of reticulate evolution this is a fundamental limitation that will require an important change of perspective when interpreting phylogenetic networks.

15.
Matrix representation with parsimony (MRP) supertree construction has been criticized because the supertree may specify clades that are contradicted by every source tree contributing to it. Such unsupported clades may also occur using other supertree methods; however, their incidence is largely unknown. In this study, I investigated the frequency of unsupported clades in both simulated and empirical MRP supertrees. Here, I propose a new index, QS, to quantify the qualitative support for a supertree and its clades among the set of source trees. Results show that unsupported clades are very rare in MRP supertrees, occurring most often when there are few source trees that all possess the same set of taxa. However, even under these conditions the frequency of unsupported clades was <0.2%. Unsupported clades were absent from both the Carnivora and Lagomorpha supertrees, reflecting the use of large numbers of source trees for both. The proposed QS indices are correlated broadly with another measure of quantitative clade support (bootstrap frequencies, as derived from resampling of the MRP matrix) but appear to be more sensitive. More importantly, they sample at the level of the source trees and thus, unlike the bootstrap, are suitable for summarizing the support of MRP supertree clades.
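
For readers unfamiliar with the MRP matrix mentioned here, the standard coding turns each clade of each source tree into one binary character: taxa inside the clade are scored 1, other taxa in that source tree are scored 0, and taxa missing from that source tree are scored "?". The sketch below illustrates that coding for rooted source trees written as nested tuples. The function names are hypothetical, the all-zero outgroup row that is usually added is omitted, and the QS index itself is not implemented because the abstract does not define it.

```python
def leaf_set(node):
    """Return the set of leaf names below a node of a nested-tuple tree."""
    if not isinstance(node, tuple):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))


def internal_clades(tree):
    """List the clades of all internal nodes except the root, in preorder."""
    out = []

    def walk(node, is_root):
        if isinstance(node, tuple):
            if not is_root:
                out.append(leaf_set(node))
            for child in node:
                walk(child, False)

    walk(tree, True)
    return out


def mrp_matrix(source_trees):
    """Build an MRP character matrix as a dict: taxon -> string of 0/1/?."""
    taxa = sorted(set().union(*(leaf_set(tree) for tree in source_trees)))
    rows = {taxon: "" for taxon in taxa}
    for tree in source_trees:
        present = leaf_set(tree)
        for clade in internal_clades(tree):
            for taxon in taxa:
                if taxon not in present:
                    rows[taxon] += "?"   # taxon not sampled in this source tree
                elif taxon in clade:
                    rows[taxon] += "1"   # taxon belongs to the clade
                else:
                    rows[taxon] += "0"   # taxon sampled but outside the clade
    return rows


matrix = mrp_matrix([(("A", "B"), "C"), (("B", "C"), "D")])
```

The resulting matrix would then be analyzed with parsimony to obtain the supertree, a step this sketch leaves out.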

16.
Phylogenetic analysis of large datasets using complex nucleotide substitution models under a maximum likelihood framework can be computationally infeasible, especially when attempting to infer confidence values by way of nonparametric bootstrapping. Recent developments in phylogenetics suggest the computational burden can be reduced by using Bayesian methods of phylogenetic inference. However, few empirical phylogenetic studies exist that explore the efficiency of Bayesian analysis of large datasets. To this end, we conducted an extensive phylogenetic analysis of the wide-ranging and geographically variable Eastern Fence Lizard (Sceloporus undulatus). Maximum parsimony, maximum likelihood, and Bayesian phylogenetic analyses were performed on a combined mitochondrial DNA dataset (12S and 16S rRNA, ND1 protein-coding gene, and associated tRNA; 3,688 bp total) for 56 populations of S. undulatus (78 total terminals including other S. undulatus group species and outgroups). Maximum parsimony analysis resulted in numerous equally parsimonious trees (82,646 from equally weighted parsimony and 335 from weighted parsimony). The majority rule consensus tree derived from the Bayesian analysis was topologically identical to the single best phylogeny inferred from the maximum likelihood analysis, but required approximately 80% less computational time. The mtDNA data provide strong support for the monophyly of the S. undulatus group and the paraphyly of "S. undulatus" with respect to S. belli, S. cautus, and S. woodi. Parallel evolution of ecomorphs within "S. undulatus" has masked the actual number of species within this group. This evidence, along with convincing patterns of phylogeographic differentiation suggests "S. undulatus" represents at least four lineages that should be recognized as evolutionary species.

17.
The clade size effect refers to a bias that causes middle‐sized clades to be less supported than small or large‐sized clades. This bias is present in resampling measures of support calculated under maximum likelihood and maximum parsimony and in Bayesian posterior probabilities. Previous analyses indicated that the clade size effect is worst in maximum parsimony, followed by maximum likelihood, while Bayesian inference is the least affected. Homoplasy was interpreted as the main cause of the effect. In this study, we explored the presence of the clade size effect in alternative measures of branch support under maximum parsimony: Bremer support and symmetric resampling, expressed as absolute frequencies and frequency differences. Analyses were performed using 50 molecular and morphological matrices. Symmetric resampling showed the same tendency that bootstrap and jackknife did for maximum parsimony and maximum likelihood. Few matrices showed a significant bias using Bremer support, presenting a better performance than resampling measures of support and comparable to Bayesian posterior probabilities. Our results indicate that the problem is not maximum parsimony, but resampling measures of support. We corroborated the role of homoplasy as a possible cause of the clade size effect, increasing the number of random trees during the resampling, which together with the higher chances that medium‐sized clades have of being contradicted generates the bias during the perturbation of the original matrix, making it stronger in resampling measures of support.

18.
In the 'total evidence' approach to phylogenetics, the reliability of a clade is implicitly measured by its degree of support, often embodied in a robustness index such as a bootstrap proportion. In the taxonomic congruence approach, the measurement of reliability has been implemented by various consensus or supertree methods, but was seldom explicitly discussed as such. We explore a reliability index for clades using their repetition across independent data sets. All possible combinations of the elementary data sets are used to compose the sets of independent data sets, across which the repetitions are counted. The more a clade occurs across such independent combinations, the higher its index. However, if other repeated clades occur that are incompatible with that clade, its index is decreased to take into account the uncertainty resulting from conflicting hypotheses. Results can be summarized through a greedy consensus tree in which clades appear according to their repetition indices. This index is tested on a 73 acanthomorph taxa data set composed of five independent molecular markers and multiple combinations of them. On this particular application, we confirm that reliability as defined here and robustness (estimated by bootstrap proportions obtained from a 'total evidence' approach) should be clearly distinguished.

19.
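Building on previous phylogenetic studies of Orobanchaceae, gene sequence data for the genus Cistanche were added, and maximum parsimony, maximum likelihood and Bayesian inference were used to investigate the systematic position of Cistanche within Orobanchaceae and the relationships among the genera of the family. Phylogenetic trees of Orobanchaceae were built from rps16 gene sequences and from combined rps16 + ITS sequences. The results show that Cistanche, Orobanche and Boschniakia cluster in the same clade, with Cistanche and Orobanche being most closely related; the holoparasitic, hemiparasitic and non-parasitic taxa of Orobanchaceae fall into three different clades.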

20.
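Liu Tao, Li Xiaoxian. Guihaia (《广西植物》), 2010, 30(6): 796-804
Maximum likelihood (ML), Bayesian inference (BI), neighbor joining (NJ) and hierarchical likelihood ratio tests (hLRTs) were applied to the molecular systematics of Alismatales. The rbcL gene sequences used represent 46 genera in 14 families of Alismatales, plus 6 related genera as outgroups. The hierarchical likelihood ratio tests indicated that GTR+I+G is the best-fitting model of DNA evolution for the Alismatales rbcL sequences. The phylogenetic trees built with ML, BI and NJ had similar topologies with no significant differences, but the Bayesian tree had higher support values. Alismatales is a monophyletic group consisting of two major lineages, and its deep-level pattern comprises five major clades. Based on the molecular phylogenetic trees, the phylogenetic relationships among the families of Alismatales, and those of Hydrocharitaceae + Najadaceae, Alismataceae + Butomaceae + Limnocharitaceae, and the "Cymodoeaceae complex", are discussed. The results also suggest that further evidence may be needed to clarify the phylogenetic relationships of Alismatales.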
