Similar articles
 20 similar articles found.
1.
The shape of evolution: systematic tree topology   Cited by: 2 (self-citations: 0, other citations: 2)
Three hypotheses that predict probabilities associated with various tree shapes, or topologies, are compared with observed topology frequencies for a large number of 4-, 5-, 6- and 7-member trees. The combined data on these n-member trees demonstrate that both the equiprobable and proportional-to-distinguishable-types hypotheses poorly predict tree topologies, while all observed topology frequencies are similar to predictions of a simple Markovian dichotomous branching hypothesis. Differences in topology frequencies between phenetic and non-phenetic trees are observed, but their statistical significance is uncertain. Relative frequencies of highly asymmetrical topologies are larger, and those of symmetrical topologies are smaller, in phenetic than in non-phenetic trees. The fact that a simple Markovian branching process, which assumes that each species has an equal probability of speciating in each time period, can predict tree topologies offers promise. Refinement of Markovian branching hypotheses to include the possibility of multiple furcations, differential speciation and extinction rates for different groups of organisms as well as for a single group through geological time, hybrid speciation, introgression, and lineage fusion will be necessary to produce realistic models of lineage diversification.
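To make the Markovian dichotomous branching hypothesis concrete, the sketch below computes the probability of an unlabeled rooted tree shape when every lineage is equally likely to split (the equal-rates or Yule model). It relies on the standard property that, under this model, the root divides n leaves into an ordered pair of sizes (i, n - i) with probability 1/(n - 1). The nested-tuple tree encoding and all function names are assumptions made for illustration; they do not come from the article.

```python
from fractions import Fraction

LEAF = "x"  # hypothetical leaf marker; internal nodes are 2-tuples (left, right)


def size(tree):
    """Number of leaves below this node."""
    return 1 if tree == LEAF else size(tree[0]) + size(tree[1])


def canon(tree):
    """Canonical string of a shape, so mirror images compare equal."""
    if tree == LEAF:
        return "x"
    a, b = sorted((canon(tree[0]), canon(tree[1])))
    return "(" + a + b + ")"


def yule_probability(tree):
    """Probability of an unlabeled rooted shape under equal-rates branching."""
    if tree == LEAF:
        return Fraction(1)
    left, right = tree
    n = size(tree)
    # One ordered arrangement (left, right) has probability 1/(n-1) times
    # the probabilities of the two subtree shapes.
    p = Fraction(1, n - 1) * yule_probability(left) * yule_probability(right)
    # If the two subtrees differ in shape, the mirrored arrangement is a
    # second, equally likely way of obtaining the same unlabeled shape.
    return p if canon(left) == canon(right) else 2 * p


cherry = (LEAF, LEAF)
balanced4 = (cherry, cherry)
caterpillar4 = (LEAF, (LEAF, cherry))

print(yule_probability(balanced4))     # 1/3
print(yule_probability(caterpillar4))  # 2/3
```

For four-member trees this gives 1/3 for the symmetrical shape and 2/3 for the fully asymmetrical one; observed topology frequencies can then be compared against such values.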

2.
The maximum-likelihood (ML) solution to a simple phylogenetic estimation problem is obtained analytically. The problem is estimation of the rooted tree for three species using binary characters with a symmetrical rate of substitution under the molecular clock. ML estimates of branch lengths and log-likelihood scores are obtained analytically for each of the three rooted binary trees. Estimation of the tree topology is equivalent to partitioning the sample space (space of possible data outcomes) into subspaces, within each of which one of the three binary trees is the ML tree. Distance-based least squares and parsimony-like methods produce essentially the same estimate of the tree topology, although differences exist among methods even under this simple model. This seems to be the simplest case, but it has many of the conceptual and statistical complexities involved in phylogeny estimation. The solution to this real phylogeny estimation problem will be useful for studying the problem of significance evaluation.

3.
The statistical properties of three molecular tree construction methods (the unweighted pair-group arithmetic average clustering (UPG), Farris, and modified Farris methods) are examined under the neutral mutation model of evolution. The methods are compared for accuracy in construction of the topology and estimation of the branch lengths, using statistics of these two aspects. The distribution of the statistic concerning topological construction is shown to be as important as its mean and variance for the comparison. Of the three methods, the UPG method constructs the tree topology with the least variation. The modified Farris method, however, gives the best performance when the two aspects are considered simultaneously. It is also shown that a topology based on two genes is much more accurate than that based on one gene. There is a tendency to accept published molecular trees, but uncritical acceptance may lead one to spurious conclusions. It should always be kept in mind that a tree is a statistical result that is affected strongly by the stochastic error of nucleotide substitution and the error intrinsic to the tree construction method itself.
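As a reminder of how unweighted pair-group average clustering builds a topology, here is a minimal sketch of the textbook algorithm: repeatedly merge the two closest clusters and set the new cluster's distance to every other cluster to the size-weighted average of the merged clusters' distances. It returns only the topology as nested tuples and does not estimate branch lengths. The function name, input layout and toy distances are assumptions for illustration, not taken from the article.

```python
def upgma(names, dist):
    """Cluster taxa by average linkage; dist is a symmetric matrix (list of lists)."""
    n = len(names)
    clusters = {i: (names[i], 1) for i in range(n)}  # id -> (subtree, leaf count)
    d = {(i, j): float(dist[i][j]) for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        a, b = min(d, key=d.get)  # closest pair of clusters (a < b by construction)
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        del d[(a, b)]
        for c in clusters:
            d_ac = d.pop((min(a, c), max(a, c)))
            d_bc = d.pop((min(b, c), max(b, c)))
            # Size-weighted mean keeps every original taxon equally weighted.
            d[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    (tree, _), = clusters.values()
    return tree


# Toy distances, invented for illustration.
names = ["A", "B", "C", "D"]
dist = [
    [0, 2, 10, 10],
    [2, 0, 10, 10],
    [10, 10, 0, 4],
    [10, 10, 4, 0],
]
print(upgma(names, dist))  # (('A', 'B'), ('C', 'D'))
```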

4.
Selection is one of the factors that most influence the shape of genealogical trees. Here we report results of simulations of the infinite-sites version of Moran's model of population genetics, aiming to quantify how the presence of selection affects the branching pattern (topology) of binary genealogical trees. In particular, we consider a scenario of purifying or negative selection in which all mutations are deleterious and each new mutation reduces the fitness of the individual by the same fraction. Analysis of five statistical measures of tree balance or symmetry borrowed from taxonomy indicates that the genealogical trees of samples of populations in which selection is acting are on average more asymmetric than neutral trees and that this effect is enhanced by increasing the sample size. However, a quantitative evaluation of the power of these balance measures to detect a tree topology significantly distinct from the neutral one indicates that they are not useful as tests of neutrality of mutations.
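The abstract does not name its five balance measures. As one widely used example of such a measure, the sketch below computes the Colless index, which sums over all internal nodes the absolute difference between the numbers of leaves on the two sides; larger values indicate a more asymmetric tree. Whether this index is among the five used in the study is an assumption, as are the tree encoding and function names.

```python
LEAF = "x"  # hypothetical leaf marker; internal nodes are 2-tuples (left, right)


def _leaves_and_colless(tree):
    """Return (leaf count, Colless index) for the subtree rooted here."""
    if tree == LEAF:
        return 1, 0
    n_left, c_left = _leaves_and_colless(tree[0])
    n_right, c_right = _leaves_and_colless(tree[1])
    return n_left + n_right, c_left + c_right + abs(n_left - n_right)


def colless_index(tree):
    """Sum of |left leaves - right leaves| over all internal nodes."""
    return _leaves_and_colless(tree)[1]


cherry = (LEAF, LEAF)
print(colless_index((cherry, cherry)))        # 0 (perfectly balanced)
print(colless_index((LEAF, (LEAF, cherry))))  # 3 (fully asymmetric, 4 leaves)
```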

5.
Development of methods for estimating species trees from multilocus data is a current challenge in evolutionary biology. We propose a method for estimating the species tree topology and branch lengths using approximate Bayesian computation (ABC). The method takes as data a sample of observed rooted gene tree topologies, and then iterates through the following sequence of steps: First, a randomly selected species tree is used to compute the distribution of rooted gene tree topologies. This distribution is then compared to the observed gene topology frequencies, and if the fit between the observed and the predicted distributions is close enough, the proposed species tree is retained. Repeating this many times leads to a collection of retained species trees that are then used to form the estimate of the overall species tree. We test the performance of the method, which we call ST-ABC, using both simulated and empirical data. The simulation study examines both symmetric and asymmetric species trees over a range of branch lengths and sample sizes. The results from the simulation study show that the model performs very well, giving accurate estimates for both the topology and the branch lengths across the conditions studied, and that a sample size of 25 loci appears to be adequate for the method. Further, we apply the method to two empirical cases: a 4-taxon data set for primates and a 7-taxon data set for yeast. In both cases, we find that estimates obtained with ST-ABC agree with previous studies. The method provides efficient estimation of the species tree, and does not require sequence data, but rather the observed distribution of rooted gene topologies without branch lengths. Therefore, this method is a useful alternative to other currently available methods for species tree estimation.
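The propose, compare and retain loop described above can be sketched for the smallest case of three taxa. The sketch is not the ST-ABC implementation: it assumes a uniform prior on the internal branch length, an L1 distance between distributions, and a fixed tolerance, all chosen for illustration. The gene tree probabilities use the standard three-taxon coalescent result: with internal branch length t in coalescent units, the gene tree matching the species tree has probability 1 - (2/3)e^(-t) and each of the other two has probability (1/3)e^(-t).

```python
import math
import random

TOPOLOGIES = ("((A,B),C)", "((A,C),B)", "((B,C),A)")


def gene_tree_probs(species_topology, t):
    """Rooted gene tree topology distribution for a 3-taxon species tree."""
    discordant = math.exp(-t) / 3.0
    return {g: (1.0 - 2.0 * discordant if g == species_topology else discordant)
            for g in TOPOLOGIES}


def abc_rejection(observed, n_proposals=20000, eps=0.05, max_t=5.0, rng=None):
    """Keep proposed (topology, t) pairs whose predicted distribution is close
    to the observed frequencies. Prior, distance and eps are assumptions."""
    rng = rng or random.Random(0)
    accepted = []
    for _ in range(n_proposals):
        topology = rng.choice(TOPOLOGIES)   # propose a species tree topology
        t = rng.uniform(0.0, max_t)         # propose an internal branch length
        predicted = gene_tree_probs(topology, t)
        distance = sum(abs(predicted[g] - observed[g]) for g in TOPOLOGIES)
        if distance < eps:
            accepted.append((topology, t))
    return accepted


# Pretend the observed frequencies came from ((A,B),C) with t = 1.0.
observed = gene_tree_probs("((A,B),C)", 1.0)
accepted = abc_rejection(observed)
mean_t = sum(t for _, t in accepted) / len(accepted)
print({topology for topology, _ in accepted}, round(mean_t, 2))
```

The retained proposals play the role of the collection of retained species trees: the most frequent topology and the mean retained branch length form the estimate.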

6.
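Construction of compatible biomass models for stands of different origin   Cited by: 4 (self-citations: 0, other citations: 4)
Various methods have been used to build compatible biomass models, but compatible biomass models for stands of different origin have rarely been reported. To address this, above-ground biomass data from 150 Masson pine (Pinus massoniana) trees in southern China were used as an example, and the proportional adjustment method and the nonlinear simultaneous equations method were applied to build general models, for stands of different origin, in which total above-ground biomass is compatible with the component biomasses of stem wood, stem bark, branches and foliage. Depending on the allocation hierarchy, each method was applied under two schemes: direct control by the total, and joint control level by level. Different variables were selected from five stand variables (diameter, tree height, ground diameter, height to crown base and crown width) to build one-, two- and three-variable biomass models, and weighted least-squares regression was used to remove the heteroscedasticity present in the biomass models. The results were as follows. Both the proportional adjustment method and the nonlinear simultaneous equations method effectively ensure that the component biomasses sum to total biomass, and the prediction accuracy of the models meets requirements. Overall, the nonlinear simultaneous equations method is more accurate than the proportional adjustment method, and within both methods direct control by the total predicts better than level-by-level joint control. Using each component biomass model itself as the weight function effectively removes heteroscedasticity. For each component, the three-variable model has the highest prediction accuracy, followed by the two-variable model, with the one-variable model lowest, although these differences are not large. In conclusion, to balance prediction accuracy against survey cost, it is recommended that biomass for stands of different origin be modelled with the nonlinear simultaneous equations method under direct control by the total, using diameter and tree height as covariates.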
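The idea behind proportional adjustment under direct control by the total can be sketched in a few lines: independently predicted component biomasses are rescaled by a common factor so that they add up to the independently predicted total. The function name and the numbers below are invented for illustration and are not taken from the study.

```python
def proportional_adjustment(total, components):
    """Rescale component predictions so that they sum exactly to the total."""
    raw_sum = sum(components.values())
    if raw_sum <= 0:
        raise ValueError("component predictions must have a positive sum")
    scale = total / raw_sum
    return {name: value * scale for name, value in components.items()}


# Hypothetical predictions (kg) from separately fitted models.
predicted_total = 100.0
predicted_components = {
    "stem_wood": 60.0,
    "stem_bark": 10.0,
    "branches": 20.0,
    "foliage": 15.0,
}  # sums to 105, which is incompatible with the predicted total of 100

adjusted = proportional_adjustment(predicted_total, predicted_components)
print(adjusted, sum(adjusted.values()))
```

Each component keeps its share of the raw component sum, and only the overall scale changes.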

7.
Quartet-mapping, a generalization of the likelihood-mapping procedure.   Cited by: 5 (self-citations: 0, other citations: 5)
Likelihood-mapping (LM) was suggested as a method of displaying the phylogenetic content of an alignment. However, statistical properties of the method have not been studied. Here we analyze the special case of a four-species tree generated under a range of evolution models and compare the results with those of a natural extension of the likelihood-mapping approach, geometry-mapping (GM), which is based on the method of statistical geometry in sequence space. The methods are compared in their abilities to indicate the correct topology. The performance of both methods in detecting the star topology is especially explored. Our results show that LM tends to reject a star tree more often than GM. When assumptions about the evolutionary model of the maximum-likelihood reconstruction are not matched by the true process of evolution, then LM shows a tendency to favor one tree, whereas GM correctly detects the star tree except for very short outer branch lengths with a statistical significance of >0.95 for all models. LM, on the other hand, reconstructs the correct bifurcating tree with a probability of >0.95 for most branch length combinations even under models with varying substitution rates. The parameter domain for which GM recovers the true tree is much smaller. When the exterior branch lengths are larger than an (analytically derived) threshold value depending on the tree shape (rather than the evolutionary model), GM reconstructs a star tree rather than the true tree. We suggest a combined approach of LM and GM for the evaluation of starlike trees. This approach offers the possibility of testing for significant positive interior branch lengths without extensive statistical and computational efforts.

8.
The maximum likelihood (ML) method for constructing phylogenetic trees (both rooted and unrooted trees) from DNA sequence data was studied. Although there is some theoretical problem in the comparison of ML values conditional on each topology, it is possible to make a heuristic argument to justify the method. Based on this argument, a new algorithm for estimating the ML tree is presented. It is shown that under the assumption of a constant rate of evolution, the ML method and UPGMA always give the same rooted tree for the case of three operational taxonomic units (OTUs). This also seems to hold approximately for the case with four OTUs. When we consider unrooted trees with the assumption of a varying rate of nucleotide substitution, the efficiency of the ML method in obtaining the correct tree is similar to those of the maximum parsimony method and distance methods. The ML method was applied to Brown et al.'s data, and the tree topology obtained was the same as that found by the maximum parsimony method, but it was different from those obtained by distance methods.

9.
Quantification of the success of phylogenetic inference in simulations   Cited by: 1 (self-citations: 0, other citations: 1)
For phylogenetic simulation studies, the accuracy of topological reconstruction obtained from different data matrices or different methods of phylogenetic inference generally needs to be quantified. Two components of performance within this context are: (1) how the inferred tree topology matches or conflicts with the correct tree topology, and (2) the branch support assigned to both correctly and incorrectly resolved clades. We present a method (averaged overall success of resolution) that incorporates both of these components. Branch support is incorporated in the averaged overall success of resolution by linearly scaling the observed support relative to that conferred by uncontradicted synapomorphies. We believe that this method represents an improvement relative to the commonly used approaches of quantifying the percentage of clades that are correctly resolved in the inferred trees or presenting the Robinson–Foulds distance between the inferred trees and the correct tree. In contrast to Bremer support, the averaged overall success of resolution may be applied equally well to distance, likelihood and parsimony analyses. © The Willi Hennig Society 2006.
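For reference, the Robinson–Foulds distance mentioned above counts the groupings found in one tree but not in the other. The sketch below implements the rooted (clade-based) variant on nested tuples of leaf names; the unrooted variant compares bipartitions instead. It does not implement the averaged overall success of resolution, and the function names and tree encoding are assumptions for illustration.

```python
def clades(tree):
    """Return (leaf set, set of non-trivial clades) for a nested-tuple rooted tree."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves, found = frozenset(), set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found |= child_clades
        if len(child_leaves) > 1:
            found.add(child_leaves)  # record the clade below this child
    return leaves, found


def rooted_rf(tree1, tree2):
    """Number of clades present in exactly one of the two trees."""
    leaves1, clades1 = clades(tree1)
    leaves2, clades2 = clades(tree2)
    if leaves1 != leaves2:
        raise ValueError("trees must share the same leaf set")
    return len(clades1 ^ clades2)


correct = (("A", "B"), ("C", "D"))
inferred = ((("A", "B"), "C"), "D")
print(rooted_rf(correct, inferred))                      # 2
print(rooted_rf(correct, (("A", "C"), ("B", "D"))))      # 4
```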

10.
The fixation of CO2 into living matter sustains all life on Earth, and embeds the biosphere within geochemistry. The six known chemical pathways used by extant organisms for this function are recognized to have overlaps, but their evolution is incompletely understood. Here we reconstruct the complete early evolutionary history of biological carbon-fixation, relating all modern pathways to a single ancestral form. We find that innovations in carbon-fixation were the foundation for most major early divergences in the tree of life. These findings are based on a novel method that fully integrates metabolic and phylogenetic constraints. Comparing gene-profiles across the metabolic cores of deep-branching organisms and requiring that they are capable of synthesizing all their biomass components leads to the surprising conclusion that the most common form for deep-branching autotrophic carbon-fixation combines two disconnected sub-networks, each supplying carbon to distinct biomass components. One of these is a linear folate-based pathway of CO2 reduction previously only recognized as a fixation route in the complete Wood-Ljungdahl pathway, but which more generally may exclude the final step of synthesizing acetyl-CoA. Using metabolic constraints we then reconstruct a “phylometabolic” tree with a high degree of parsimony that traces the evolution of complete carbon-fixation pathways, and has a clear structure down to the root. This tree requires few instances of lateral gene transfer or convergence, and instead suggests a simple evolutionary dynamic in which all divergences have primary environmental causes. Energy optimization and oxygen toxicity are the two strongest forces of selection. The root of this tree combines the reductive citric acid cycle and the Wood-Ljungdahl pathway into a single connected network. This linked network lacks the selective optimization of modern fixation pathways but its redundancy leads to a more robust topology, making it more plausible than any modern pathway as a primitive universal ancestral form.

11.
I show that three parametric-bootstrap (PB) applications that have been proposed for phylogenetic analysis can be misleading as currently implemented. First, I show that simulating a topology estimated from preliminary data in order to determine the sequence length that should allow the best tree obtained from more extensive data to be correct with a desired probability delivers an accurate estimate of this length only in topological situations in which most preliminary trees are expected to be both correct and statistically significant, i.e. when no further analysis would be needed. Otherwise, one obtains strong underestimates of the length or similarly biased values for incorrect trees. Second, I show that PB-based topology tests that use as null hypothesis the most likely tree congruent with a pre-specified topological relationship alternative to the unconstrained most likely tree, and simulate this tree for P value estimation, produce excessive type I error (from 50% to 600% and higher) when they are applied to null data generated by star-shaped or dichotomous four-taxon topologies. Simulating the most likely star topology for P value estimation results instead in correct type-I-error production even when the null data are generated by a dichotomous topology. This is a strong indication that the star topology is the correct default null hypothesis for phylogenies. Third, I show that PB-estimated confidence intervals (CIs) for the length of a tree branch are generally accurate, although in some situations they can be strongly over- or under-estimated relative to the “true” CI. Attempts to identify a biased CI through a further round of simulations were unsuccessful. Tracing the origin and propagation of parameter estimate error through the CI estimation exercise showed that the sparseness of site-patterns that are crucial to the estimation of pivotal parameters can allow homoplasy to bias these estimates and ultimately the PB-based CI estimation. In conclusion, I stress that statistical techniques that simulate models estimated from limited data need to be carefully calibrated, and I defend the point that pattern-sparseness assessment will be the next frontier in the statistical analysis of phylogenies, an effort that will require taking advantage of the merits of black-box maximum-likelihood approaches and of insights from intuitive, site-pattern-oriented approaches like parsimony.

12.
A package of programs (run by a management program called TREECON) was developed for the construction and drawing of evolutionary trees. The program MATRIX calculates dissimilarity values and can perform bootstrap analysis on nucleic acid sequences. TREE implements different evolutionary tree constructing methods based on distance matrices. Because some of these methods produce unrooted evolutionary trees, a program ROOT places a root on the tree. Finally, the program DRAW draws the evolutionary tree, changes its size or topology, and produces drawings suitable for publication. Whereas MATRIX is suited only for nucleic acids, the modules TREE, ROOT and DRAW are applicable to any kind of dissimilarity matrix. The programs run on IBM-compatible microcomputers using the DOS operating system.

13.
Distance-based methods are popular for reconstructing evolutionary trees of protein sequences, mainly because of their speed and generality. A number of variants of the classical neighbor-joining (NJ) algorithm have been proposed, as well as a number of methods to estimate protein distances. We here present a large-scale assessment of performance in reconstructing the correct tree topology for the most popular algorithms. The programs BIONJ, FastME, Weighbor, and standard NJ were run using 12 distance estimators, producing 48 tree-building/distance estimation method combinations. These were evaluated on a test set based on real trees taken from 100 Pfam families. Each tree was used to generate multiple sequence alignments with the ROSE program using three evolutionary models. The accuracy of each method was analyzed as a function of both sequence divergence and location in the tree. We found that BIONJ produced the overall best results, although the average accuracy differed little between the tree-building methods (normally less than 1%). A noticeable trend was that FastME performed worse than the rest on long branches. Weighbor was several orders of magnitude slower than the other programs. Larger differences were observed when using different distance estimators. Protein-adapted Jukes-Cantor and Kimura distance correction produced clearly poorer results than the other methods, even worse than uncorrected distances. We also assessed the recently developed Scoredist measure, which performed as well as more complex methods.
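For readers unfamiliar with distance correction, the sketch below shows the standard Jukes-Cantor formula generalized to k equally frequent states: with b = (k - 1)/k and p the observed fraction of differing sites, the corrected distance is d = -b ln(1 - p/b). Setting k = 20 gives the usual protein adaptation and k = 4 the DNA form. That the study used exactly this parameterization is an assumption; the function name is illustrative.

```python
import math


def jukes_cantor_distance(p, n_states=20):
    """Corrected distance from the observed fraction p of differing sites."""
    b = (n_states - 1) / n_states
    if not 0 <= p < b:
        raise ValueError("p must lie in [0, (k-1)/k) for the correction to exist")
    return -b * math.log(1 - p / b)


# The correction grows faster than p because multiple hits are accounted for.
for p in (0.1, 0.3, 0.6):
    print(p, round(jukes_cantor_distance(p), 4))
```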

14.
perm is a permutation program designed to detect statistical connections between grouping structures and grouping factors or correlates. Groups may be of various kinds, such as herds, flocks, schools and mating couples, provided they make up meaningful social units. Relatedness, population membership and genotypic contents are among several aggregating variables which may be processed. Typically, perm takes in a collection of grouped data and outputs a P value. The latter is computed on the basis of random membership among groups (H0). All files, including input, output and program, are of Excel type (.xls). perm can be downloaded free of charge at: http://www.bio.ulaval.ca/louisbernatchez/downloads.htm .
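The general logic of a permutation test under random group membership can be sketched as follows. This is not the perm program, which is distributed as Excel files; the test statistic (within-group sum of squares of a single numeric trait), the data and all names are assumptions chosen for illustration.

```python
import random


def within_group_ss(values, groups):
    """Sum of squared deviations of each value from its group mean."""
    sums, counts = {}, {}
    for v, g in zip(values, groups):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    means = {g: sums[g] / counts[g] for g in sums}
    return sum((v - means[g]) ** 2 for v, g in zip(values, groups))


def permutation_p_value(values, groups, n_perm=2000, seed=0):
    """P value for 'groups are more homogeneous than random membership (H0)'."""
    rng = random.Random(seed)
    observed = within_group_ss(values, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # random membership, group sizes preserved
        if within_group_ss(values, shuffled) <= observed + 1e-12:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Invented data: three groups of three individuals with clearly distinct trait values.
groups = [0, 0, 0, 1, 1, 1, 2, 2, 2]
values = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8, 9.0, 9.1, 8.9]
p_value = permutation_p_value(values, groups)
print(p_value)
```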

15.
The problem of reconstructing the duplication history of a set of tandemly repeated sequences was first introduced by Fitch (1977). Many recent studies deal with this problem, showing the validity of the unequal recombination model proposed by Fitch, describing numerous inference algorithms, and exploring the combinatorial properties of these new mathematical objects, which are duplication trees. In this paper, we deal with the topological rearrangement of these trees. Classical rearrangements used in phylogeny (NNI, SPR, TBR, ...) cannot be applied directly on duplication trees. We show that restricting the neighborhood defined by the SPR (Subtree Pruning and Regrafting) rearrangement to valid duplication trees allows exploring the whole duplication tree space. We use these restricted rearrangements in a local search method which improves an initial tree via successive rearrangements. This method is applied to the optimization of parsimony and minimum evolution criteria. We show through simulations that this method improves all existing programs for both reconstructing the topology of the true tree and recovering its duplication events. We apply this approach to tandemly repeated human Zinc finger genes and observe that a much better duplication tree is obtained by our method than using any other program.

16.
MOTIVATION: Uncovering the protein-protein interaction network is a fundamental step in the quest to understand the molecular machinery of a cell. This motivates the search for efficient computational methods for predicting such interactions. Among the available predictors are those that are based on the co-evolution hypothesis "evolutionary trees of protein families (that are known to interact) are expected to have similar topologies". Many of these methods are limited by the fact that they can handle only a small number of protein sequences. Also, details on evolutionary tree topology are missing as they use similarity matrices in lieu of the trees. RESULTS: We introduce MORPH, a new algorithm for predicting protein interaction partners between members of two protein families that are known to interact. Our approach can also be seen as a new method for searching for the best superposition of the corresponding evolutionary trees based on the tree automorphism group. We discuss relevant facts related to the predictability of protein-protein interaction based on their co-evolution. When compared with related computational approaches, our method reduces the search space by approximately 3 × 10^5-fold and at the same time increases the accuracy of predicting correct binding partners.

17.
Phylogenetic mixtures model the inhomogeneous molecular evolution commonly observed in data. The performance of phylogenetic reconstruction methods where the underlying data are generated by a mixture model has stimulated considerable recent debate. Much of the controversy stems from simulations of mixture model data on a given tree topology for which reconstruction algorithms output a tree of a different topology; these findings were held up to show the shortcomings of particular tree reconstruction methods. In so doing, the underlying assumption was that mixture model data on one topology can be distinguished from data evolved on an unmixed tree of another topology given enough data and the "correct" method. Here we show that this assumption can be false. For biologists, our results imply that, for example, the combined data from two genes whose phylogenetic trees differ only in terms of branch lengths can perfectly fit a tree of a different topology.

18.
Several methods have been designed to infer species trees from gene trees while taking into account gene tree/species tree discordance. Although some of these methods provide consistent species tree topology estimates under a standard model, most either do not estimate branch lengths or are computationally slow. An exception, the GLASS method of Mossel and Roch, is consistent for the species tree topology, estimates branch lengths, and is computationally fast. However, GLASS systematically overestimates divergence times, leading to biased estimates of species tree branch lengths. By assuming a multispecies coalescent model in which multiple lineages are sampled from each of two taxa at L independent loci, we derive the distribution of the waiting time until the first interspecific coalescence occurs between the two taxa, considering all loci and measuring from the divergence time. We then use the mean of this distribution to derive a correction to the GLASS estimator of pairwise divergence times. We show that our improved estimator, which we call iGLASS, consistently estimates the divergence time between a pair of taxa as the number of loci approaches infinity, and that it is an unbiased estimator of divergence times when one lineage is sampled per taxon. We also show that many commonly used clustering methods can be combined with the iGLASS estimator of pairwise divergence times to produce a consistent estimator of the species tree topology. Through simulations, we show that iGLASS can greatly reduce the bias and mean squared error in obtaining estimates of divergence times in a species tree.
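The source of the upward bias, and the idea of subtracting the mean waiting time, can be illustrated in the simplest setting of one lineage sampled per taxon. The sketch assumes times are measured in coalescent units, so that the waiting time after the divergence is exponential with rate 1 at each locus and the minimum over L loci has mean 1/L. It is a simplified illustration of the idea, not the iGLASS formula for general sample sizes; all names and parameter values are hypothetical.

```python
import random


def simulate_coalescence_times(tau, n_loci, rng):
    """Interspecific coalescence time at each locus: divergence time plus an
    exponential waiting time (one lineage per taxon, coalescent units)."""
    return [tau + rng.expovariate(1.0) for _ in range(n_loci)]


def glass_estimate(times):
    """Minimum interspecific coalescence time over loci; never below tau."""
    return min(times)


def corrected_estimate(times):
    """Subtract the expected minimum waiting time, 1/L under the assumptions above."""
    return min(times) - 1.0 / len(times)


rng = random.Random(1)
tau, n_loci, n_reps = 1.0, 10, 5000
replicates = [simulate_coalescence_times(tau, n_loci, rng) for _ in range(n_reps)]
mean_glass = sum(glass_estimate(r) for r in replicates) / n_reps
mean_corrected = sum(corrected_estimate(r) for r in replicates) / n_reps
print(round(mean_glass, 3), round(mean_corrected, 3))
```

Averaged over replicates, the uncorrected minimum sits above the true divergence time by about 1/L, while the corrected value is centred on it.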

19.
Yu Y, Degnan JH, Nakhleh L. PLoS Genetics 2012, 8(4): e1002660
Gene tree topologies have proven a powerful data source for various tasks, including species tree inference and species delimitation. Consequently, methods for computing probabilities of gene trees within species trees have been developed and widely used in probabilistic inference frameworks. All these methods assume an underlying multispecies coalescent model. However, when reticulate evolutionary events such as hybridization occur, these methods are inadequate, as they do not account for such events. Methods that account for both hybridization and deep coalescence in computing the probability of a gene tree topology currently exist for very limited cases. However, no such methods exist for general cases, owing primarily to the fact that it is currently unknown how to compute the probability of a gene tree topology within the branches of a phylogenetic network. Here we present a novel method for computing the probability of gene tree topologies on phylogenetic networks and demonstrate its application to the inference of hybridization in the presence of incomplete lineage sorting. We reanalyze a Saccharomyces species data set for which multiple analyses had converged on a species tree candidate. Using our method, though, we show that an evolutionary hypothesis involving hybridization in this group has better support than one of strict divergence. A similar reanalysis on a group of three Drosophila species shows that the data is consistent with hybridization. Further, using extensive simulation studies, we demonstrate the power of gene tree topologies at obtaining accurate estimates of branch lengths and hybridization probabilities of a given phylogenetic network. Finally, we discuss identifiability issues with detecting hybridization, particularly in cases that involve extinction or incomplete sampling of taxa.

20.
Gene trees are evolutionary trees representing the ancestry of genes sampled from multiple populations. Species trees represent populations of individuals, each with many genes, splitting into new populations or species. The coalescent process, which models ancestry of gene copies within populations, is often used to model the probability distribution of gene trees given a fixed species tree. This multispecies coalescent model provides a framework for phylogeneticists to infer species trees from gene trees using maximum likelihood or Bayesian approaches. Because the coalescent models a branching process over time, all trees are typically assumed to be rooted in this setting. Often, however, gene trees inferred by traditional phylogenetic methods are unrooted. We investigate probabilities of unrooted gene trees under the multispecies coalescent model. We show that when there are four species with one gene sampled per species, the distribution of unrooted gene tree topologies identifies the unrooted species tree topology and some, but not all, information in the species tree edges (branch lengths). The location of the root on the species tree is not identifiable in this situation. However, for 5 or more species with one gene sampled per species, we show that the distribution of unrooted gene tree topologies identifies the rooted species tree topology and all its internal branch lengths. The length of any pendant branch leading to a leaf of the species tree is also identifiable for any species from which more than one gene is sampled.
