Similar articles
Found 20 similar articles (search time: 31 ms)
1.
Many phylogenetic methods produce large collections of trees as opposed to a single tree, which allows the exploration of support for various evolutionary hypotheses. However, to be useful, the information contained in large collections of trees should be summarized; frequently this is achieved by constructing a consensus tree. Consensus trees display only those signals that are present in a large proportion of the trees. However, by their very nature consensus trees require that any conflicts between the trees are necessarily disregarded. We present a method that extends the notion of consensus trees to allow the visualization of conflicting hypotheses in a consensus network. We demonstrate the utility of this method in highlighting differences amongst maximum likelihood bootstrap values and Bayesian posterior probabilities in the placental mammal phylogeny, and also in comparing the phylogenetic signal contained in amino acid versus nucleotide characters for hexapod monophyly.
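The contrast between a consensus tree and a consensus network can be sketched by thresholding split frequencies. This is a minimal illustration under stated assumptions, not the authors' implementation: trees are assumed to be nested tuples of taxon names, and the function names (`splits`, `split_support`, `select`, `compatible`), the example trees, and the 0.3 threshold are all hypothetical choices made for this example.

```python
from collections import Counter
from itertools import combinations


def splits(tree, reference):
    """Return the non-trivial splits of a tree given as nested tuples.

    Each split is stored as the side that does NOT contain `reference`,
    so the same split always gets the same representation.
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        taxa = frozenset().union(*(leaves(child) for child in node))
        found.add(taxa)
        return taxa

    all_taxa = leaves(tree)
    result = set()
    for side in found:
        if reference in side:
            side = all_taxa - side
        # Keep only splits with at least two taxa on each side.
        if 1 < len(side) < len(all_taxa) - 1:
            result.add(side)
    return result


def split_support(trees, reference):
    """Fraction of trees in which each split occurs."""
    counts = Counter(s for t in trees for s in splits(t, reference))
    return {s: n / len(trees) for s, n in counts.items()}


def select(support, threshold):
    """Splits whose support is strictly greater than `threshold`."""
    return {s for s, value in support.items() if value > threshold}


def compatible(a, b):
    """Two reference-free sides are compatible if disjoint or nested."""
    return a.isdisjoint(b) or a <= b or b <= a


# Hypothetical collection: three copies of one topology, two of another.
t1 = (("A", "B"), ("C", ("D", "E")))
t2 = (("A", "C"), ("B", ("D", "E")))
trees = [t1, t1, t1, t2, t2]

support = split_support(trees, reference="A")
majority = select(support, 0.5)   # fits on a single consensus tree
network = select(support, 0.3)    # may contain conflicting splits
conflicts = [(a, b) for a, b in combinations(network, 2) if not compatible(a, b)]
```

With a threshold above 0.5 the retained splits are always pairwise compatible and fit on a single tree; lowering the threshold can admit conflicting splits, which is the information a network is needed to display.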

2.
The increasing availability of large genomic data sets as well as the advent of Bayesian phylogenetics facilitates the investigation of phylogenetic incongruence, which can result in the impossibility of representing phylogenetic relationships using a single tree. While sometimes considered as a nuisance, phylogenetic incongruence can also reflect meaningful biological processes as well as relevant statistical uncertainty, both of which can yield valuable insights in evolutionary studies. We introduce a new tool for investigating phylogenetic incongruence through the exploration of phylogenetic tree landscapes. Our approach, implemented in the R package treespace, combines tree metrics and multivariate analysis to provide low‐dimensional representations of the topological variability in a set of trees, which can be used for identifying clusters of similar trees and group‐specific consensus phylogenies. treespace also provides a user‐friendly web interface for interactive data analysis and is integrated alongside existing standards for phylogenetics. It fills a gap in the current phylogenetics toolbox in R and will facilitate the investigation of phylogenetic results.

3.
The generalized Gibbs sampler (GGS) is a recently developed Markov chain Monte Carlo (MCMC) technique that enables Gibbs-like sampling of state spaces that lack a convenient representation in terms of a fixed coordinate system. This paper describes a new sampler, called the tree sampler, which uses the GGS to sample from a state space consisting of phylogenetic trees. The tree sampler is useful for a wide range of phylogenetic applications, including Bayesian, maximum likelihood, and maximum parsimony methods. A fast new algorithm to search for a maximum parsimony phylogeny is presented, using the tree sampler in the context of simulated annealing. The mathematics underlying the algorithm is explained and its time complexity is analyzed. The method is tested on two large data sets consisting of 123 sequences and 500 sequences, respectively. The new algorithm is shown to compare very favorably in terms of speed and accuracy to the program DNAPARS from the PHYLIP package.

4.
It has long been recognized that phylogenetic trees are more unbalanced than those generated by a Yule process. Recently, the degree of this imbalance has been quantified using the large set of phylogenetic trees available in the TreeBASE data set. In this article, a more precise analysis of imbalance is undertaken. Trees simulated under a range of models are compared with trees from TreeBASE and two smaller data sets. Several simple models can match the amount of imbalance measured in real data. Most of them also match the variance of imbalance among empirical trees to a remarkable degree. Statistics are developed to measure balance and to distinguish between trees with the same overall imbalance. The match between models and data for these statistics is investigated. In particular, age-dependent (Bellman-Harris) branching processes are studied in detail. It remains difficult to separate the process of macroevolution from biases introduced by sampling. The lessons for phylogenetic analysis are clearer. In particular, the use of the usual proportional to distinguishable arrangements (uniform) prior on tree topologies in Bayesian phylogenetic analysis is not recommended.

5.
Here I advocate the utility of Bayesian concordance analysis as a mechanism for exploring the magnitude and source of phylogenetic signal in concatenated mitogenomic phylogenetic studies. While typically applied to the study of independently evolving gene trees, Bayesian concordance analysis can also be applied to linked, but individually analyzed, gene regions using a prior probability that reflects the expectation of similar phylogenetic reconstructions. For true branches in the mitogenomic tree, concordance factors should represent the number of gene regions that contain phylogenetic signal for a particular clade. As a demonstration of the application of Bayesian concordance analysis to empirical data, I analyzed two different salamander (Hynobiidae and Plethodontidae) mitogenomic data sets using a gene-based partitioning strategy. The results revealed many strongly supported clades in the concatenated trees that have high concordance factors, permitting the inference that these are robustly resolved through phylogenetic signal distributed across the mitogenome. In contrast, a number of strongly supported clades in the concatenated tree received low concordance factors, indicating that their reconstruction is either driven primarily by phylogenetic signal in a small number of gene regions, or that they are inconsistent reconstructions influenced by properties of the data that can produce inaccurate trees (e.g., compositional bias, selection, etc.). Exploration of the Bayesian joint posterior distribution of trees highlighted partitions that contribute phylogenetic information to similar clade reconstructions. This approach was particularly insightful in the hynobiid data, where different combinations of genes were identified that support alternative tree reconstructions. 
Concatenated analysis of these different subsets of genes highlighted through Bayesian concordance analysis produced strongly supported and contrasting trees, demonstrating the potential for inconsistency in concatenated mitogenomic phylogenetics. The overall results presented here suggest that Bayesian concordance analysis can serve as an effective exploration of the influence of different gene regions in mitogenomic (and other organellar genomic) phylogenetic studies.

6.
Examination of trees for the presence of particular nodes is a fundamental aspect of systematics, and is the basis of phylogenetic sensitivity analysis, but becomes unwieldy when performed manually for complex nodes or over large numbers of trees. The program Cladescan is presented here as a stand-alone application to facilitate the detection of nodes in such situations. Cladescan includes features useful for phylogenetic sensitivity analysis, such as automatic generation of "Navajo rug" sensitivity plots. In addition, researchers may find it useful for general comparisons among large data sets.
© The Willi Hennig Society 2009.

7.
Increasingly, large data sets pose a challenge for computationally intensive phylogenetic methods such as Bayesian Markov chain Monte Carlo (MCMC). Here, we investigate the performance of common MCMC proposal distributions in terms of median and variance of run time to convergence on 11 data sets. We introduce two new Metropolized Gibbs Samplers for moving through "tree space." MCMC simulation using these new proposals shows faster average run time and dramatically improved predictability in performance, with a 20-fold reduction in the variance of the time to estimate the posterior distribution to a given accuracy. We also introduce conditional clade probabilities and demonstrate that they provide a superior means of approximating tree topology posterior probabilities from samples recorded during MCMC.
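The idea of approximating topology probabilities from conditional clade frequencies can be sketched as follows. The abstract does not give the estimator, so this is an assumption-laden illustration rather than the authors' method: it assumes rooted binary trees written as nested 2-tuples, and estimates a tree's probability as the product, over its internal nodes, of how often each clade was split that way among the sampled trees containing the clade. All names (`leaf_set`, `clade_splits`, `fit_ccp`, `tree_probability`) and the toy sample are hypothetical.

```python
from collections import Counter


def leaf_set(node):
    """Set of taxon names below a node of a nested-tuple tree."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))


def clade_splits(tree):
    """Yield (clade, split) for every internal node of a rooted binary tree."""
    if isinstance(tree, str):
        return
    left, right = tree
    yield leaf_set(tree), frozenset([leaf_set(left), leaf_set(right)])
    yield from clade_splits(left)
    yield from clade_splits(right)


def fit_ccp(sample):
    """Count clades and the ways each clade is split across a tree sample."""
    clade_counts, split_counts = Counter(), Counter()
    for tree in sample:
        for clade, parts in clade_splits(tree):
            clade_counts[clade] += 1
            split_counts[(clade, parts)] += 1
    return clade_counts, split_counts


def tree_probability(tree, clade_counts, split_counts):
    """Product of conditional split frequencies; 0.0 if a clade was never sampled."""
    prob = 1.0
    for clade, parts in clade_splits(tree):
        if clade_counts[clade] == 0:
            return 0.0
        prob *= split_counts[(clade, parts)] / clade_counts[clade]
    return prob


# Hypothetical MCMC sample of two topologies.
sample = [
    ((("A", "B"), "C"), ("D", "E")),
    ((("A", ("B", "C")), "D"), "E"),
]
clade_counts, split_counts = fit_ccp(sample)

# This topology was never sampled, but every one of its clade splits was.
unsampled = (("A", ("B", "C")), ("D", "E"))
estimate = tree_probability(unsampled, clade_counts, split_counts)
```

The point of the sketch is that a topology absent from the sample can still receive a non-zero estimate when all of its clade splits were observed, which raw sample frequencies cannot provide.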

8.
We examine the impact of likelihood surface characteristics on phylogenetic inference. Amino acid data sets simulated from topologies with branch length features chosen to represent varying degrees of difficulty for likelihood maximization are analyzed. We present situations where the tree found to achieve the global maximum in likelihood is often not equal to the true tree. We use the program covSEARCH to demonstrate how the use of adaptively sized pools of candidate trees that are updated using confidence tests results in solution sets that are highly likely to contain the true tree. This approach requires more computation than traditional maximum likelihood methods, hence covSEARCH is best suited to small to medium-sized alignments or large alignments with some constrained nodes. The majority rule consensus tree computed from the confidence sets also proves to be different from the generating topology. Although low phylogenetic signal in the input alignment can result in large confidence sets of trees, some biological information can still be obtained based on nodes that exhibit high support within the confidence set. Two real data examples are analyzed: mammal mitochondrial proteins and a small tubulin alignment. We conclude that the technique of confidence set optimization can significantly improve the robustness of phylogenetic inference at a reasonable computational cost. Additionally, when either very short internal branches or very long terminal branches are present, confident resolution of specific bipartitions or subtrees, rather than whole-tree phylogenies, may be the most realistic goal for phylogenetic methods. [Reviewing Editor: Dr. Nicolas Galtier]

9.
The fact that different phylogenomic data sets can lead to highly supported but inconsistent results suggests that conflict among gene trees in real data sets could be severe. We provide here a detailed exploration of gene tree space to investigate the relationships in Hymenoptera based on data obtained by Johnson et al. (Current Biology, 2013, 23, 2058), in which ants and Apoidea (bees and spheciform wasps) were recovered as sister groups, contradicting previous studies. We found high levels of topological variation among gene trees, several of them disagreeing with previously published hypotheses. To profile the dynamics of emerging support versus conflicting signal in combined analysis of data, we employed a novel method based on the incremental addition of randomized data to coalescence‐based phylogenetic inference. Although the monophyly of Aculeata and of Formicidae were consistently recovered using as little as 6.5% of the 308 available markers, signal for the Formicidae + Apoidea clade prevailed only after more than 50% of the loci were sampled. Still, non‐negligible support for alternative hypotheses remained until all genes were added to the analysis. Our results suggest that phylogenetic conflict is rather pervasive and not scattered as noise across individual gene trees because alternative topologies were recovered not from a specific subset, but from several random combinations of loci. Thus, even though phylogenetic signal recovered from full gene data sets was already dominant in much smaller ensembles, large amounts of data may be indeed necessary to overcome phylogenetic conflict.

10.
Simultaneous molecular dating of population and species divergences is essential in many biological investigations, including phylogeography, phylodynamics and species delimitation studies. In these investigations, multiple sequence alignments consist of both intra‐ and interspecies samples (mixed samples). As a result, the phylogenetic trees contain interspecies, interpopulation and within‐population divergences. Bayesian relaxed clock methods are often employed in these analyses, but they assume the same tree prior for both inter‐ and intraspecies branching processes and require specification of a clock model for branch rates (independent vs. autocorrelated rates models). We evaluated the impact of a single tree prior on Bayesian divergence time estimates by analysing computer‐simulated data sets. We also examined the effect of the assumption of independence of evolutionary rate variation among branches when the branch rates are autocorrelated. The Bayesian approach with coalescent tree priors generally produced excellent molecular dates and highest posterior densities with high coverage probabilities. We also evaluated the performance of a non‐Bayesian method, RelTime, which does not require the specification of a tree prior or a clock model. RelTime's performance was similar to that of the Bayesian approach, suggesting that it is also suitable for analysing data sets containing both population and species variation when its computational efficiency is needed.

11.
The general problem of representing collections of trees as a single graph has led to many tree summary techniques. Many consensus approaches take sets of trees (either inferred as separate gene trees or gleaned from the posterior of a Bayesian analysis) and produce a single “best” tree. In scenarios where horizontal gene transfer or hybridization is suspected, networks may be preferred, which allow for nodes to have two parents, representing the fusion of lineages. One such construct is the cluster union network (CUN), which is constructed using the union of all clusters in the input trees. The CUN has a number of mathematically desirable properties, but can also present edges not observed in the input trees. In this paper we define a new network construction, the edge union network (EUN), which displays edges if and only if they are contained in the input trees. We also demonstrate that this object can be constructed with polynomial time complexity given arbitrary phylogenetic input trees, and so can be used in conjunction with network analysis techniques for further phylogenetic hypothesis testing.

12.
We modified the phylogenetic program MrBayes 3.1.2 to incorporate the compound Dirichlet priors for branch lengths proposed recently by Rannala, Zhu, and Yang (2012. Tail paradox, partial identifiability and influential priors in Bayesian branch length inference. Mol. Biol. Evol. 29:325-335.) as a solution to the problem of branch-length overestimation in Bayesian phylogenetic inference. The compound Dirichlet prior specifies a fairly diffuse prior on the tree length (the sum of branch lengths) and uses a Dirichlet distribution to partition the tree length into branch lengths. Six problematic data sets originally analyzed by Brown, Hedtke, Lemmon, and Lemmon (2010. When trees grow too long: investigating the causes of highly inaccurate Bayesian branch-length estimates. Syst. Biol. 59:145-161) are reanalyzed using the modified version of MrBayes to investigate properties of Bayesian branch-length estimation using the new priors. While the default exponential priors for branch lengths produced extremely long trees, the compound Dirichlet priors produced posterior estimates that are much closer to the maximum likelihood estimates. Furthermore, the posterior tree lengths were quite robust to changes in the parameter values in the compound Dirichlet priors, for example, when the prior mean of tree length changed over several orders of magnitude. Our results suggest that the compound Dirichlet priors may be useful for correcting branch-length overestimation in phylogenetic analyses of empirical data sets.
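The two-stage structure of such a prior (first a tree length, then a Dirichlet partition of it) can be sketched by sampling from it. This is a simplified illustration, not the MrBayes implementation: it assumes a gamma prior on tree length and a symmetric Dirichlet over all branches, neither of which is stated in the abstract, and the function name `sample_branch_lengths` and every default parameter value are hypothetical.

```python
import random


def sample_branch_lengths(n_taxa, shape=1.0, scale=10.0, concentration=1.0, rng=random):
    """Draw one set of branch lengths for an unrooted binary tree.

    Assumptions (not from the source): tree length ~ Gamma(shape, scale),
    branch proportions ~ symmetric Dirichlet(concentration).
    """
    n_branches = 2 * n_taxa - 3  # branches in an unrooted binary tree
    tree_length = rng.gammavariate(shape, scale)  # prior on the sum of branch lengths

    # A Dirichlet draw is a vector of independent gamma variates divided by their sum.
    weights = [rng.gammavariate(concentration, 1.0) for _ in range(n_branches)]
    total = sum(weights)
    branch_lengths = [tree_length * w / total for w in weights]
    return tree_length, branch_lengths


# Example draw with a seeded generator for reproducibility.
tree_length, branch_lengths = sample_branch_lengths(10, rng=random.Random(1))
```

Because the branch lengths are proportions of a single sampled tree length, the prior on their sum does not grow with the number of branches, unlike independent exponential priors on each branch.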

13.
14.
Concerns have been raised that posterior probabilities on phylogenetic trees can be unreliable when the true tree is unresolved or has very short internal branches, because existing methods for Bayesian phylogenetic analysis do not explicitly evaluate unresolved trees. Two recent papers have proposed that evaluating only resolved trees results in a "star tree paradox": when the true tree is unresolved or close to it, posterior probabilities were predicted to become increasingly unpredictable as sequence length grows, resulting in inflated confidence in one resolved tree or another and an increasing risk of false-positive inferences. Here we show that this is not the case; existing Bayesian methods do not lead to an inflation of statistical confidence, provided the evolutionary model is correct and uninformative priors are assumed. Posterior probabilities do not become increasingly unpredictable with increasing sequence length, and they exhibit conservative type I error rates, leading to a low rate of false-positive inferences. With infinite data, posterior probabilities give equal support for all resolved trees, and the rate of false inferences falls to zero. We conclude that there is no star tree paradox caused by not sampling unresolved trees.

15.
16.
ABSTRACT: BACKGROUND: The increased use of multi-locus data sets for phylogenetic reconstruction has increased the need to determine whether a set of gene trees significantly deviates from the phylogenetic patterns of other genes. Such unusual gene trees may have been influenced by other evolutionary processes such as selection, gene duplication, or horizontal gene transfer. RESULTS: Motivated by this problem we propose a nonparametric goodness-of-fit test for two empirical distributions of gene trees, and we developed the software GeneOut to estimate a p-value for the test. Our approach maps trees into a multi-dimensional vector space and then applies support vector machines (SVMs) to measure the separation between two sets of pre-defined trees. We use a permutation test to assess the significance of the SVM separation. To demonstrate the performance of GeneOut, we applied it to the comparison of gene trees simulated within different species trees across a range of species tree depths. Applied directly to sets of simulated gene trees with large sample sizes, GeneOut was able to detect very small differences between two sets of gene trees generated under different species trees. Our statistical test can also include tree reconstruction into its test framework through a variety of phylogenetic optimality criteria. When applied to DNA sequence data simulated from different sets of gene trees, results in the form of receiver operating characteristic (ROC) curves indicated that GeneOut performed well in the detection of differences between sets of trees with different distributions in a multi-dimensional space. Furthermore, it controlled false positive and false negative rates very well, indicating a high degree of accuracy.
CONCLUSIONS: The non-parametric nature of our statistical test provides fast and efficient analyses, and makes it an applicable test for any scenario where evolutionary or other factors can lead to trees with different multi-dimensional distributions. The software GeneOut is freely available under the GNU public license.
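The permutation step of such a test can be sketched independently of the classifier. This is not GeneOut: in place of the SVM separation described above, the sketch substitutes a much simpler statistic (the Euclidean distance between group centroids), and it assumes each tree has already been mapped to a numeric vector. The function names, the toy vectors, and the default of 999 permutations are hypothetical.

```python
import math
import random


def centroid(points):
    """Coordinate-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def separation(group_a, group_b):
    """Stand-in separation statistic: distance between group centroids."""
    return math.dist(centroid(group_a), centroid(group_b))


def permutation_p_value(group_a, group_b, n_permutations=999, seed=0):
    """Estimate how often random relabelling separates the groups at least as well."""
    rng = random.Random(seed)
    observed = separation(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random reassignment of group labels
        if separation(pooled[:len(group_a)], pooled[len(group_a):]) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical tree vectors: two tight clusters far apart.
set_a = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05)]
set_b = [(x + 5.0, y + 5.0) for x, y in set_a]
p_value = permutation_p_value(set_a, set_b)
```

Swapping `separation` for a classifier-based margin would bring the sketch closer to the approach in the abstract; the permutation logic itself would stay the same.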

17.
Summary: NMR as well as X-ray crystallography are used to determine the three-dimensional structures of macromolecules at atomic resolution. Structure calculation generates coordinates that are compatible with NMR data from randomly generated initial structures. We analyzed the trajectory taken by structures during NMR structure calculation in conformational space, assuming that the distance between two structures in conformational space is the root-mean-square deviation between the two structures. The coordinates of a structure in conformational space were obtained by applying the metric multidimensional scaling method. As an example, we used a 22-residue peptide, μ-Conotoxin GIIIA, and a simulated annealing protocol of XPLOR. We found that the three-dimensional solution of the multidimensional scaling analysis is sufficient to describe the overall configuration of the trajectories in conformational space. By comparing the trajectories of the entire calculation with those of the converged calculation, random sampling of conformational space is readily discernible. Trajectory analysis can also be used for optimization of protocols of NMR structure calculation, by examining individual trajectories. Abbreviations: MD, molecular dynamics; MDS, multidimensional scaling; rmsd, root-mean-square deviation; armsd, angular rmsd; R, multiple correlation coefficient; YASAP, yet another simulated annealing protocol; PCA, principal component analysis.

18.
Knowles LL, Klimov PB. Parasitology, 2011, 138(13): 1750-1759
With the increased availability of multilocus sequence data, the lack of concordance of gene trees estimated for independent loci has focused attention on both the biological processes producing the discord and the methodologies used to estimate phylogenetic relationships. What has emerged is a suite of new analytical tools for phylogenetic inference--species tree approaches. In contrast to traditional phylogenetic methods that are stymied by the idiosyncrasies of gene trees, approaches for estimating species trees explicitly take into account the cause of discord among loci and, in the process, provide a direct estimate of phylogenetic history (i.e. the history of species divergence, not divergence of specific loci). We illustrate the utility of species tree estimates with an analysis of a diverse group of feather mites, the pinnatus species group (genus Proctophyllodes). Discord among four sequenced nuclear loci is consistent with theoretical expectations, given the short time separating speciation events (as evidenced by short internodes relative to terminal branch lengths in the trees). Nevertheless, many of the relationships are well resolved in a Bayesian estimate of the species tree; the analysis also highlights ambiguous aspects of the phylogeny that require additional loci. The broad utility of species tree approaches is discussed, and specifically, their application to groups with high speciation rates--a history of diversification with particular prevalence in host/parasite systems where species interactions can drive rapid diversification.

19.
The Cavender-Felsenstein edge-length invariants for binary characters on 4-trees provide the starting point for the development of "customized" invariants for evaluating and comparing phylogenetic hypotheses. The binary character invariants may be generalized to k-valued characters without losing the quadratic nature of the invariants as functions of the theoretical frequencies f(UVXY) of observable character configurations (U at organism 1, V at 2, etc.). The key to the approach is that certain sets of these configurations constitute events which are probabilistically independent from other such sets, under the symmetric Markov change models studied. By introducing more complex sets of configurations, we find the quadratic invariants for 5-trees in the binary model and for individual edges in 6-trees or, indeed, in any size tree. The same technique allows us to formulate invariants for entire trees, but these are cubic functions for 6-trees and are higher-degree polynomials for larger trees. With k-valued characters and, especially, with large trees, the types of configuration sets (events) used in the simpler examples are too rare (i.e., their predicted frequencies are too low) to be useful, and the construction of meaningful pairs of independent events becomes an important and nontrivial task in designing invariants suited to testing specific hypotheses. In a very natural way, this approach fits in with well-known statistical methodology for contingency tables. We explore use of events such as "only transitions occur for character i (i.e., position i in a nucleic acid sequence) in subtree a" in analyzing a set of data on ribosomal RNA in the context of the controversy over the origins of archaebacteria, eubacteria, and eukaryotes.

20.
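Liu Tao (刘涛), Li Xiaoxian (李晓贤). Guihaia (《广西植物》), 2010, 30(6): 796-804
Maximum likelihood (ML), Bayesian inference (BI), neighbor-joining (NJ), and hierarchical likelihood ratio tests (hLRTs) were used to study the molecular systematics of Alismatales. The rbcL gene sequences used represent 46 genera in 14 families of Alismatales, together with 6 related genera as outgroups. The hierarchical likelihood ratio tests indicate that GTR+I+G is the best-fitting model of DNA evolution for the Alismatales rbcL sequences. The phylogenetic trees built by ML, BI, and NJ have similar topologies with no significant differences, although the Bayesian tree has higher support values. Alismatales is a monophyletic group composed of two major lineages, with a deeper structure of five major clades. Based on the molecular phylogenetic trees, the relationships among the families of Alismatales, within Hydrocharitaceae + Najadaceae, within Alismataceae + Butomaceae + Limnocharitaceae, and within the "Cymodoceaceae complex" are discussed. The results also suggest that more evidence may be needed to further clarify the phylogenetic relationships of Alismatales.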
