Similar articles
20 similar articles found (search time: 0 ms)
1.
Many phylogenetic algorithms search the space of possible trees using topological rearrangements and some optimality criterion. FastME is one such approach; it uses the balanced minimum evolution (BME) principle, which computer simulations have shown to be highly accurate. FastME includes two variants: balanced subtree prune and regraft (BSPR) and balanced nearest neighbor interchange (BNNI). These algorithms take as input a distance matrix and a putative phylogenetic tree. The tree is modified using SPR or NNI operations, respectively, to reduce its BME length relative to the distance matrix, until a tree with (locally) shortest BME length is found. Following computer simulations, it has been conjectured that BSPR and BNNI are consistent, i.e. for an input distance that is a tree-metric, they converge to the corresponding tree. We prove that the BSPR algorithm is consistent. Moreover, we show that even if the input contains small errors relative to a tree-metric, the BSPR algorithm still returns the corresponding tree. Whether BNNI is consistent remains open.
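The BME length that BSPR and BNNI try to reduce can be computed with Pauplin's formula, l(T) = Σ_{i<j} 2^(1-τ_ij) d_ij, where τ_ij is the number of edges between leaves i and j. For a tree-metric input, this length equals the total branch length of the true tree. The minimal Python sketch below assumes the unrooted tree is given as an adjacency dict and the distances as an upper-triangular nested dict; both representations are illustrative choices, not FastME's internals, and no SPR/NNI search is shown.

```python
from collections import deque
from itertools import combinations


def edge_counts_from(adj, source):
    """Breadth-first search: number of edges from `source` to every node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def bme_length(adj, leaves, d):
    """BME length via Pauplin's formula: sum over leaf pairs of 2**(1 - tau) * d."""
    total = 0.0
    for i, j in combinations(leaves, 2):
        tau = edge_counts_from(adj, i)[j]  # topological distance between i and j
        total += 2.0 ** (1 - tau) * d[i][j]
    return total


# Tree-metric distances from the tree ((a,b),(c,d)) with branch lengths
# a-u=1, b-u=2, u-v=3, c-v=4, d-v=5 (total branch length 15).
d = {
    "a": {"b": 3, "c": 8, "d": 9},
    "b": {"c": 9, "d": 10},
    "c": {"d": 9},
}
leaves = ["a", "b", "c", "d"]
true_tree = {"a": ["u"], "b": ["u"], "u": ["a", "b", "v"],
             "c": ["v"], "d": ["v"], "v": ["c", "d", "u"]}
other_tree = {"a": ["u"], "c": ["u"], "u": ["a", "c", "v"],
              "b": ["v"], "d": ["v"], "v": ["b", "d", "u"]}
print(bme_length(true_tree, leaves, d))   # 15.0
print(bme_length(other_tree, leaves, d))  # 16.5
```

The true topology has the shorter BME length, which is the property a consistent rearrangement search relies on.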

2.
Likelihood-based phylogenetic inference posits a probabilistic model of character state change along branches of a phylogenetic tree. These models typically assume statistical independence of sites in the sequence alignment. This is a restrictive assumption that facilitates computational tractability, but ignores how epistasis, the effect of genetic background on mutational effects, influences the evolution of functional sequences. We consider the effect of using a misspecified site-independent model on the accuracy of Bayesian phylogenetic inference in the setting of pairwise-site epistasis. Previous work has shown that as alignment length increases, tree reconstruction accuracy also increases. Here, we present a simulation study demonstrating that accuracy increases with alignment size even if the additional sites are epistatically coupled. We introduce an alignment-based test statistic that is a diagnostic for pairwise epistasis and can be used in posterior predictive checks.
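The abstract does not define its test statistic. As a hypothetical stand-in, the sketch below uses the mean mutual information between pairs of alignment columns, which is zero when columns vary independently, and shows how a posterior predictive p-value compares the observed alignment with replicates. Simulating the replicates under the fitted site-independent model is assumed to happen elsewhere and is not shown.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(col_x, col_y):
    """Empirical mutual information (in nats) between two alignment columns."""
    n = len(col_x)
    px, py = Counter(col_x), Counter(col_y)
    mi = 0.0
    for (a, b), count in Counter(zip(col_x, col_y)).items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def mean_pairwise_mi(alignment):
    """Hypothetical statistic: mean MI over all column pairs (needs >= 2 columns)."""
    columns = list(zip(*alignment))
    pairs = list(combinations(range(len(columns)), 2))
    return sum(mutual_information(columns[i], columns[j]) for i, j in pairs) / len(pairs)


def posterior_predictive_p(observed, replicates):
    """Fraction of replicate alignments whose statistic is >= the observed value."""
    t_obs = mean_pairwise_mi(observed)
    return sum(mean_pairwise_mi(rep) >= t_obs for rep in replicates) / len(replicates)


coupled = ["AA", "AA", "CC", "CC"]      # the two columns always co-vary
independent = ["AA", "AC", "CA", "CC"]  # no association between the columns
print(posterior_predictive_p(coupled, [independent, independent]))  # 0.0
```

A small p-value signals that the observed coupling between sites is larger than the site-independent model tends to produce.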

3.
We examine the impact of likelihood surface characteristics on phylogenetic inference. Amino acid data sets simulated from topologies with branch length features chosen to represent varying degrees of difficulty for likelihood maximization are analyzed. We present situations where the tree found to achieve the global maximum in likelihood is often not equal to the true tree. We use the program covSEARCH to demonstrate how the use of adaptively sized pools of candidate trees that are updated using confidence tests results in solution sets that are highly likely to contain the true tree. This approach requires more computation than traditional maximum likelihood methods, hence covSEARCH is best suited to small to medium-sized alignments or large alignments with some constrained nodes. The majority rule consensus tree computed from the confidence sets also proves to be different from the generating topology. Although low phylogenetic signal in the input alignment can result in large confidence sets of trees, some biological information can still be obtained based on nodes that exhibit high support within the confidence set. Two real data examples are analyzed: mammal mitochondrial proteins and a small tubulin alignment. We conclude that the technique of confidence set optimization can significantly improve the robustness of phylogenetic inference at a reasonable computational cost. Additionally, when either very short internal branches or very long terminal branches are present, confident resolution of specific bipartitions or subtrees, rather than whole-tree phylogenies, may be the most realistic goal for phylogenetic methods. [Reviewing Editor: Dr. Nicolas Galtier]
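Support for individual bipartitions within a confidence set, and the majority-rule consensus built from them, can be sketched as split counting. This minimal Python version assumes each tree in the set is already encoded as a list of its nontrivial bipartitions (one side given as a set of leaf names); it is an illustration of the general technique, not covSEARCH's code.

```python
from collections import Counter


def normalize_split(side, leaves, reference):
    """Represent a bipartition by the side that excludes a fixed reference leaf."""
    side = frozenset(side)
    return frozenset(leaves) - side if reference in side else side


def majority_rule_splits(trees, leaves):
    """Splits present in more than half of the trees, mapped to their support."""
    reference = sorted(leaves)[0]
    counts = Counter()
    for splits in trees:
        counts.update({normalize_split(s, leaves, reference) for s in splits})
    n = len(trees)
    return {split: count / n for split, count in counts.items() if count > n / 2}


leaves = ["A", "B", "C", "D", "E"]
confidence_set = [
    [{"A", "B"}, {"D", "E"}],  # ((A,B),C,(D,E))
    [{"A", "B"}, {"C", "D"}],  # ((A,B),E,(C,D))
    [{"A", "C"}, {"D", "E"}],  # ((A,C),B,(D,E))
]
# Expected: the splits AB|CDE and ABC|DE, each with support 0.67
for split, support in majority_rule_splits(confidence_set, leaves).items():
    print(sorted(split), round(support, 2))
```

Splits kept by a strict majority are always mutually compatible, so they define a (possibly unresolved) consensus tree; the support values play the role of the "high support within the confidence set" used to extract partial biological information.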

4.
5.
The distribution of a phenotype on a phylogenetic tree is often a quantity of interest. Many phenotypes have imperfect heritability, so that a measurement of the phenotype for an individual can be thought of as a single realization from the phenotype distribution of that individual. If all individuals in a phylogeny had the same phenotype distribution, measured phenotypes would be randomly distributed on the tree leaves. This is, however, often not the case, implying that the phenotype distribution evolves over time. Here we propose a new model based on this principle of evolving phenotype distribution on the branches of a phylogeny, which is different from ancestral state reconstruction where the phenotype itself is assumed to evolve. We develop an efficient Bayesian inference method to estimate the parameters of our model and to test the evidence for changes in the phenotype distribution. We use multiple simulated data sets to show that our algorithm has good sensitivity and specificity properties. Since our method identifies branches on the tree on which the phenotype distribution has changed, it is able to break down a tree into components for which this distribution is unique and constant. We present two applications of our method, one investigating the association between HIV genetic variation and human leukocyte antigen and the other studying host range distribution in a lineage of Salmonella enterica, and we discuss many other potential applications.

6.
Probabilism and Phylogenetic Inference (cited 5 times: 2 self-citations, 3 by others)
The maximum likelihood approach to phylogenetics rests on frequency probability theory. This stands in stark contrast to the logical probability of corroboration-based cladistic parsimony. History is particular and cannot be described in terms of universal statements about abstract generalities, the task of the historical sciences being one of explanation, not prediction. Thus, frequency probability methods of estimation are inappropriate for making historical inferences. Maximum likelihood estimation procedures are deconstructed from numerous perspectives in spite of their supposed impressive technicalities. Charges of parsimony's inconsistency are rendered moot, because its justification lies elsewhere, yet maximum likelihood is still subject to Wald's dilemma if realism is of any interest. Although all epistemologies make assumptions, the models employed by maximum likelihood are problematic and deterministic, as opposed to the unproblematic background knowledge characteristic of cladistics. Apart from issues of logical and sampling dependencies, the requirements of frequency probability theory are non-trivial and the maximum likelihood estimation of phylogeny can neither escape, nor satisfy the tenets of calculus independence (e.g. i.i.d.) inherent in the multiplicative relations of the method. If phylogeneticists are to maintain a rational foundation for their epistemology, neo-justificationist appeals to some metaphysical truth must be abandoned in favour of the realism of sophisticated falsification.

7.
8.
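This paper proposes a method for calculating the absolute evolutionary distance and evolutionary rate of proteins. The method builds a molecular evolutionary tree from the sequences of existing homologous proteins, infers the common ancestral sequence at each node of the tree, and then, from the percentage of amino acid differences between a given member and the common ancestral sequence at a given node, calculates the specific evolutionary distance and evolutionary rate of that protein sequence. A comparison of our algorithm with the simulation-based statistical method of Dayhoff et al. shows that our algorithm is correct within a certain range. Using the calculation of the evolutionary rate of mammalian erythropoietin as an example, we discuss applications of the algorithm in molecular evolution research.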
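The abstract does not state how the percentage of amino acid differences is converted into a distance. Purely as an assumption for illustration, the sketch below uses the Poisson correction d = -ln(1 - p), a standard correction that differs from Dayhoff's PAM-based approach, and divides by the time elapsed since the ancestral node to obtain a rate. The sequences are made up, and the ancestral sequence is assumed to have been reconstructed already.

```python
import math


def p_distance(seq, ancestor):
    """Proportion of aligned, ungapped sites at which the two sequences differ."""
    pairs = [(a, b) for a, b in zip(seq, ancestor) if a != "-" and b != "-"]
    return sum(a != b for a, b in pairs) / len(pairs)


def poisson_distance(p):
    """Poisson-corrected substitutions per site: d = -ln(1 - p)."""
    if p >= 1.0:
        raise ValueError("sequences too divergent for the Poisson correction")
    return -math.log(1.0 - p)


def rate_per_site(seq, ancestor, elapsed_time):
    """Hypothetical rate along one lineage: corrected distance / time since the node."""
    return poisson_distance(p_distance(seq, ancestor)) / elapsed_time


member = "MKVLAAGIVA"    # made-up extant sequence
ancestor = "MKVLSAGIVG"  # made-up reconstructed ancestor (2 of 10 sites differ)
print(p_distance(member, ancestor))  # 0.2
print(rate_per_site(member, ancestor, 2.0))
```

Because the distance is measured from one member to its ancestor rather than between two extant sequences, no factor of 2 appears in the rate.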

9.
We develop a new approach to estimate a matrix of pairwise evolutionary distances from a codon-based alignment based on a codon evolutionary model. The method first computes a standard distance matrix for each of the three codon positions. Then these three distance matrices are weighted according to an estimate of the global evolutionary rate of each codon position and averaged into a unique distance matrix. Using a large set of both real and simulated codon-based alignments of nucleotide sequences, we show that this approach leads to distance matrices that have a significantly better treelikeness compared to those obtained by standard nucleotide evolutionary distances. We also propose an alternative weighting to eliminate the part of the noise often associated with some codon positions, particularly the third position, which is known to evolve rapidly. Simulation results show that fast distance-based tree reconstruction algorithms on distance matrices based on this codon position weighting can lead to phylogenetic trees that are at least as accurate as, if not more accurate than, those inferred by maximum likelihood. Finally, a well-known multigene dataset composed of eight yeast species and 106 codon-based alignments is reanalyzed and shows that our codon evolutionary distances allow building a phylogenetic tree which is similar to those obtained by non-distance-based methods (e.g., maximum parsimony and maximum likelihood) and also significantly improved compared to standard nucleotide evolutionary distance estimates.
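The averaging step described above can be sketched as a weighted sum of the three per-position matrices. The abstract does not give the weighting formula, so in this minimal sketch the weights are simply passed in and normalized; setting the third-position weight to zero is only a crude stand-in for down-weighting the noisy third position, not the authors' scheme.

```python
def weighted_codon_distance(d1, d2, d3, weights):
    """Combine per-codon-position distance matrices into one matrix.

    d1, d2, d3: square matrices (lists of lists) for codon positions 1, 2, 3.
    weights: three non-negative numbers, normalized here to sum to 1.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    w1, w2, w3 = (w / total for w in weights)
    n = len(d1)
    return [[w1 * d1[i][j] + w2 * d2[i][j] + w3 * d3[i][j] for j in range(n)]
            for i in range(n)]


pos1 = [[0.0, 0.10], [0.10, 0.0]]
pos2 = [[0.0, 0.05], [0.05, 0.0]]
pos3 = [[0.0, 0.60], [0.60, 0.0]]  # fast-evolving third position
print(weighted_codon_distance(pos1, pos2, pos3, [1, 1, 1]))
print(weighted_codon_distance(pos1, pos2, pos3, [1, 1, 0]))  # third position ignored
```

The combined matrix can then be passed to any fast distance-based method such as neighbor joining.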

10.
Green euglenophytes are a group of eukaryotes of ancient origin. To understand the evolution of the group, it is useful to know which characteristics are primitive. Here, a phylogenetic tree of green euglenophytes was constructed from the 18S rRNA gene, and ancestral states were reconstructed for eight morphological characters. This research clarifies the phylogenetic relationships of green euglenophytes and provides a basis for the study of the origin of these plants. The phylogenetic tree, constructed by Bayesian inference, revealed that Eutreptia and Eutreptiella were sister groups; Lepocinclis, Phacus, and Discoplastis were close relatives; Euglena, Cryptoglena, Monomorphina, and Colacium were closely related, as were Trachelomonas and Strombomonas; and Euglena was not monophyletic. Ancestral reconstruction based on morphological characters revealed seven primitive character states: a ductile surface; spiral striations; a slightly narrowing or sharply elongated cauda; absence of a lorica; lamellar, shield-shaped, or large discoid chloroplasts; a pyrenoid with a sheath; and many small paramylon grains. However, the ancestral state of flagellum length could not be inferred. Euglena and Euglenaria, which both possessed all of the ancestral character states, might represent the most ancient lineages of green euglenophytes.
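The abstract does not say which ancestral-state reconstruction method was used. Purely to illustrate what reconstructing an ancestral character state involves, the sketch below applies Fitch parsimony (a swapped-in, familiar technique) to a tiny rooted binary tree. Taxon names and states are made up; the tree is encoded as nested tuples with leaf names as strings.

```python
def fitch_up(tree, states):
    """Bottom-up (first) pass of Fitch parsimony for one unordered character.

    tree: a leaf name (str) or a (left, right) tuple of subtrees.
    states: dict mapping leaf name -> observed character state.
    Returns (candidate state set at this node, number of changes below it).
    At the root, the candidate set is the set of most-parsimonious root states.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_up(tree[0], states)
    right_set, right_cost = fitch_up(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


tree = (("t1", "t2"), ("t3", "t4"))
lorica = {"t1": "absent", "t2": "absent", "t3": "present", "t4": "absent"}
print(fitch_up(tree, lorica))  # ({'absent'}, 1)
```

Here "absent" is inferred as the root state with a single change, the kind of result summarized above as "absence of a lorica" being primitive.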

11.
Kirk Fitzhugh. Acta Biotheoretica, 2021, 69(4): 799-819
Three competing 'methods' have been endorsed for inferring phylogenetic hypotheses: parsimony, likelihood, and Bayesianism. The latter two have been claimed...

12.
Wen-Hsiung Li. Genetics, 1986, 113(1): 187-213
Mathematical formulas are developed for the evolutionary change of restriction cleavage sites in a DNA sequence, allowing unequal rates between transitional and transversional types of nucleotide substitution. Formulas are also developed for the probability of having a particular pattern of site changes among evolutionary lineages, such as parallel gains or losses of sites, and for inferring the presence or absence of a restriction site in an ancestral sequence from data on the present-day sequences. The unordered compatibility method is proposed for inferring the phylogenetic relationships among relatively closely related organisms, treating restriction sites as cladistic characters. Formulas are derived for the probability (P+) of obtaining the correct network for a given number (N) of informative sites for the cases of four and five species. These formulas are applied to evaluate the performance of the method and to estimate the N value required for P+ to be 95% or larger. The method performs well when the branches between ancestral nodes and the branches leading to the two most recent species are more or less equal in length, but performs poorly when the latter two branches are considerably longer than the former.
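For two-state characters such as restriction-site presence/absence, unordered compatibility reduces to the four-gamete test: two characters can both fit one unrooted tree without homoplasy unless all four state combinations occur. Pairwise compatibility of binary characters also guarantees joint compatibility, so a compatibility analysis looks for the largest mutually compatible set. The brute-force sketch below illustrates that idea on a toy data set; it is not Li's exact procedure and is exponential in the number of characters.

```python
from itertools import combinations


def compatible(char_a, char_b):
    """Four-gamete test: binary characters are compatible unless all four
    state combinations (0,0), (0,1), (1,0), (1,1) occur across species."""
    return len(set(zip(char_a, char_b))) < 4


def largest_compatible_set(characters):
    """Indices of a largest set of pairwise-compatible characters (brute force)."""
    n = len(characters)
    for size in range(n, 0, -1):
        for subset in combinations(range(n), size):
            if all(compatible(characters[i], characters[j])
                   for i, j in combinations(subset, 2)):
                return list(subset)
    return []


# Each character lists site presence (1) or absence (0) in species s1..s4.
c0 = (1, 1, 0, 0)  # groups s1 with s2
c1 = (1, 1, 0, 0)  # groups s1 with s2
c2 = (1, 0, 1, 0)  # groups s1 with s3: conflicts with c0 and c1
c3 = (1, 0, 0, 0)  # present in s1 only: uninformative, compatible with all
print(largest_compatible_set([c0, c1, c2, c3]))  # [0, 1, 3]
```

The retained characters support the network that joins s1 with s2.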

13.
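Based on chloroplast 16S rRNA gene sequences, this study constructed a phylogenetic tree of green euglenophytes and performed ancestral state reconstruction for eight morphological characters, with the aim of clarifying the phylogenetic relationships of green euglenophytes and providing a theoretical basis for research on the origin of this group. The results showed the following. (1) The phylogenetic tree built by Bayesian inference showed that Eutreptia and Eutreptiella are sister groups; Phacus, Lepocinclis, and Discoplastis are closely related; Trachelomonas and Strombomonas are closely related; and Euglena, Cryptoglena, Colacium, and Monomorphina are closely related, indicating that Euglena is not a monophyletic group. (2) Ancestral reconstruction based on morphological characters indicated seven relatively primitive characters of green euglenophytes: a soft, easily deformed pellicle; spiral striations; a cell posterior that tapers gradually or ends in a sharp caudal spine; no lorica; lamellar, shield-shaped, or large discoid chloroplasts; pyrenoids without a sheath; and paramylon grains that are small and variable in number. The ancestral state of flagellum length could not be inferred. (3) Taken together, the reconstructions for the eight characters show that Euglena and Euglenaria possess all of the primitive characters and may be the ancestors of the earliest green euglenophytes.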

14.
The multispecies coalescent (MSC) model accommodates both species divergences and within-species coalescence and provides a natural framework for phylogenetic analysis of genomic data when the gene trees vary across the genome. The MSC model implemented in the program bpp assumes a molecular clock and the Jukes–Cantor model, and is suitable for analyzing genomic data from closely related species. Here we extend our implementation to more general substitution models and relaxed clocks to allow the rate to vary among species. The MSC-with-relaxed-clock model allows the estimation of species divergence times and ancestral population sizes using genomic sequences sampled from contemporary species when the strict clock assumption is violated, and provides a simulation framework for evaluating species tree estimation methods. We conducted simulations and analyzed two real datasets to evaluate the utility of the new models. We confirm that the clock-JC model is adequate for inference of shallow trees with closely related species, but it is important to account for clock violation for distant species. Our simulation suggests that there is valuable phylogenetic information in the gene-tree branch lengths even if the molecular clock assumption is seriously violated, and the relaxed-clock models implemented in bpp are able to extract such information. Our Markov chain Monte Carlo algorithms suffer from mixing problems when used for species tree estimation under the relaxed clock and we discuss possible improvements. We conclude that the new models are currently most effective for estimating population parameters such as species divergence times when the species tree is fixed.
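For reference, the Jukes–Cantor (JC69) substitution model mentioned above gives the familiar pairwise distance d = -(3/4) ln(1 - (4/3) p), where p is the proportion of differing sites. The short sketch below computes it; it illustrates the substitution model only and is not bpp code (bpp evaluates likelihoods on gene trees rather than pairwise distances).

```python
import math


def jc69_distance(seq1, seq2):
    """Jukes-Cantor (JC69) distance: d = -(3/4) ln(1 - (4/3) p),
    where p is the proportion of differing sites (gapped sites skipped)."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p >= 0.75:
        raise ValueError("p >= 0.75: the JC69 distance is undefined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


print(jc69_distance("ACGTACGT", "ACGTACGT"))  # 0.0 for identical sequences
print(jc69_distance("ACGT", "ACGA"))          # p = 0.25, d is about 0.304
```

For small p the correction is minor (d is close to p), whereas for divergent sequences it grows quickly, so the choice of substitution model matters more for distant species.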

15.
Accurate genome-wide identification of orthologs is a central problem in comparative genomics, a fact reflected by the numerous orthology identification projects developed in recent years. However, only a few reports have compared their accuracy, and indeed, several recent efforts have not yet been systematically evaluated. Furthermore, orthology is typically only assessed in terms of function conservation, despite the phylogeny-based original definition of Fitch. We collected and mapped the results of nine leading orthology projects and methods (COG, KOG, Inparanoid, OrthoMCL, Ensembl Compara, Homologene, RoundUp, EggNOG, and OMA) and two standard methods (bidirectional best-hit and reciprocal smallest distance). We systematically compared their predictions with respect to both phylogeny and function, using six different tests. This required the mapping of millions of sequences, the handling of hundreds of millions of predicted pairs of orthologs, and the computation of tens of thousands of trees. In phylogenetic analysis or in functional analysis where high specificity is required, we find that OMA and Homologene perform best. At lower functional specificity but higher coverage level, OrthoMCL outperforms Ensembl Compara, and to a lesser extent Inparanoid. Lastly, the large coverage of the recent EggNOG can be of interest to build broad functional grouping, but the method is not specific enough for phylogenetic or detailed function analyses. In terms of general methodology, we observe that the more sophisticated tree reconstruction/reconciliation approach of Ensembl Compara was at times outperformed by pairwise comparison approaches, even in phylogenetic tests. Furthermore, we show that standard bidirectional best-hit often outperforms projects with more complex algorithms. First, the present study provides guidance for the broad community of orthology data users as to which database best suits their needs. 
Second, it introduces new methodology to verify orthology. And third, it sets performance standards for current and future approaches.
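The bidirectional best-hit baseline mentioned above can be sketched in a few lines. This minimal version assumes all-against-all similarity scores (for example, BLAST bit scores) between two genomes have already been computed and stored as nested dicts; the gene names and scores are made up.

```python
def best_hits(scores):
    """scores: query gene -> {target gene: similarity score}.
    Returns query -> highest-scoring target (ties go to the alphabetically first)."""
    best = {}
    for query, hits in scores.items():
        if hits:
            best[query] = max(sorted(hits), key=lambda target: hits[target])
    return best


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit in genome B and a is b's best hit in A."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


a_vs_b = {"a1": {"b1": 100, "b2": 40}, "a2": {"b1": 90, "b2": 80}}
b_vs_a = {"b1": {"a1": 100, "a2": 90}, "b2": {"a1": 40, "a2": 80}}
print(reciprocal_best_hits(a_vs_b, b_vs_a))  # [('a1', 'b1')]
```

Note that a2 and b2 are not paired even though b2's best hit is a2, because a2's own best hit is b1; this strictness tends to favor specificity over coverage.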

16.
17.
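Application of 16S rDNA-based phylogenetic analysis to the study of microbial evolutionary relationships (cited 2 times: 0 self-citations, 2 by others)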
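Phylogenetic tree construction is an important technique in modern life-science research: it is used to analyze the relationships between unknown strains and other strains, and it provides an important basis for further understanding the evolutionary relationships among organisms. This paper gives a detailed introduction to phylogenetic tree construction and describes its specific applications in research on microbial evolution.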

18.
The evolutionary rate at an amino acid site is indicative of how conserved this site is and, in turn, allows evaluating the importance of this site in maintaining the structure/function of the protein. When evolutionary rates are estimated, one must reconstruct the phylogenetic tree describing the evolutionary relationship among the sequences under study. However, if the inferred phylogenetic tree is incorrect, it can lead to erroneous site-specific rate estimates. Here we describe a novel Bayesian method that uses Markov chain Monte Carlo methodology to integrate over the space of all possible trees and model parameters. By doing so, the method considers alternative evolutionary scenarios weighted by their posterior probabilities. We show that this comprehensive evolutionary approach is superior to methods that are based on only a single tree. We illustrate the potential of our algorithm by analyzing the conservation pattern of the potassium channel protein family. Itay Mayrose and Amir Mitchell contributed equally. Reviewing Editor: Dr. Nicolas Galtier
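Integrating over trees with MCMC means that each posterior sample (a tree plus model parameters) contributes its own per-site rate estimates, and averaging over the retained samples automatically weights each scenario by its posterior probability. The sketch below shows only that averaging step; it assumes the per-sample rate vectors have already been computed, and the numbers are made up.

```python
import statistics


def posterior_site_rates(samples, burn_in=0):
    """Posterior mean rate for every site.

    samples: list of per-site rate vectors, one per MCMC sample, each computed
    under that sample's tree and model parameters.
    burn_in: number of initial samples to discard.
    """
    kept = samples[burn_in:]
    if not kept:
        raise ValueError("no samples left after burn-in")
    n_sites = len(kept[0])
    return [statistics.fmean([sample[i] for sample in kept]) for i in range(n_sites)]


mcmc_rates = [[1.0, 0.2], [3.0, 0.4], [2.0, 0.6]]  # 3 samples x 2 sites
print(posterior_site_rates(mcmc_rates, burn_in=1))  # approximately [2.5, 0.5]
```

Trees visited more often by the chain contribute more samples, which is how a single-tree estimate is replaced by a posterior-weighted one.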

19.
20.
Ultraconserved elements (UCEs) are stretches of hundreds of nucleotides with highly conserved cores flanked by variable regions. Although the selective forces responsible for the preservation of UCEs are unknown, they are nonetheless believed to contain phylogenetically meaningful information from deep to shallow divergence events. Phylogenetic applications of UCEs assume the same degree of rate heterogeneity applies across the entire locus, including variable flanking regions. We present a Wright–Fisher model of selection on nucleotides (SelON) which includes the effects of mutation, drift, and spatially varying, stabilizing selection for an optimal nucleotide sequence. The SelON model assumes the strength of stabilizing selection follows a position-dependent Gaussian function whose exact shape can vary between UCEs. We evaluate SelON by comparing its performance to a simpler and spatially invariant GTR+Γ model using an empirical data set of 400 vertebrate UCEs used to determine the phylogenetic position of turtles. We observe much improvement in model fit of SelON over the GTR+Γ model, and support for turtles as sister to lepidosaurs. Overall, the UCE-specific parameters SelON estimates provide a compact way of quantifying the strength and variation in selection within and across UCEs. SelON can also be extended to include more realistic mapping functions between sequence and stabilizing selection as well as allow for greater levels of rate heterogeneity. By more explicitly modeling the nature of selection on UCEs, SelON and similar approaches can be used to better understand the biological mechanisms responsible for their preservation across highly divergent taxa and long evolutionary time scales.
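A position-dependent Gaussian profile of stabilizing-selection strength can be sketched as below. The parameter names peak, center, and width are hypothetical, chosen here for illustration; the abstract does not give SelON's exact parameterization or how selection strength enters the substitution probabilities, so only the shape of the profile is shown.

```python
import math


def selection_strength(position, peak, center, width):
    """Hypothetical Gaussian profile of stabilizing-selection strength along a UCE:
    strongest at `center` (the conserved core), decaying toward the flanks."""
    return peak * math.exp(-((position - center) ** 2) / (2.0 * width ** 2))


def selection_profile(length, peak, center, width):
    """Selection strength at each position 0 .. length-1 of a locus."""
    return [selection_strength(x, peak, center, width) for x in range(length)]


profile = selection_profile(11, peak=10.0, center=5, width=2.0)
print([round(s, 2) for s in profile])  # symmetric, maximal (10.0) at position 5
```

Letting center and width vary between loci is one simple way the shape of the selection profile could differ from one UCE to the next.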


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP registration No. 09084417