Similar articles
1.
2.
MOTIVATION: The desire to compare molecular phylogenies has stimulated the design of numerous tests. Most of these tests are formulated in a frequentist framework, and it is not known how they compare with Bayes procedures. I propose here two new Bayes tests that either compare pairs of trees (Bayes hypothesis test, BHT), or test each tree against an average of the trees included in the analysis (Bayes significance test, BST). RESULTS: The algorithm, based on a standard Metropolis-Hastings sampler, integrates nuisance parameters out and estimates the probability of the data under each topology. These quantities are used to estimate Bayes factors for composite vs. composite hypotheses. Based on two data sets, the BHT and BST are shown to construct confidence sets similar to those of the bootstrap and the Shimodaira-Hasegawa test, respectively. This suggests that the known difference among previous tests is mainly due to the null hypothesis considered.
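The sketch below shows how the two comparisons reduce to ratios of marginal likelihoods. It is a minimal illustration, not the author's implementation, and it assumes the log marginal likelihood of each topology has already been estimated (for example by the MCMC step described above). All function names are hypothetical.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_bayes_factor(log_ml_a, log_ml_b):
    """Log Bayes factor of topology A over topology B (a BHT-style pairwise comparison)."""
    return log_ml_a - log_ml_b

def log_bf_against_average(log_mls):
    """Log Bayes factor of each topology against the equally weighted average
    of all topologies in the analysis (a BST-style comparison)."""
    log_avg = logsumexp(log_mls) - math.log(len(log_mls))
    return [lm - log_avg for lm in log_mls]

def posterior_probs(log_mls):
    """Posterior probabilities of the topologies under a uniform prior over them."""
    total = logsumexp(log_mls)
    return [math.exp(lm - total) for lm in log_mls]
```

For example, log marginal likelihoods of -1000 and -1002 give a log Bayes factor of 2 in favour of the first tree. Working in log space avoids underflow, because marginal likelihoods of real alignments are far too small to exponentiate directly.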

3.
The recent development of Bayesian phylogenetic inference using Markov chain Monte Carlo (MCMC) techniques has facilitated the exploration of parameter-rich evolutionary models. At the same time, stochastic models have become more realistic (and complex) and have been extended to new types of data, such as morphology. Based on this foundation, we developed a Bayesian MCMC approach to the analysis of combined data sets and explored its utility in inferring relationships among gall wasps based on data from morphology and four genes (nuclear and mitochondrial, ribosomal and protein coding). Examined models range in complexity from those recognizing only a morphological and a molecular partition to those having complex substitution models with independent parameters for each gene. Bayesian MCMC analysis deals efficiently with complex models: convergence occurs faster and more predictably for complex models, mixing is adequate for all parameters even under very complex models, and the parameter update cycle is virtually unaffected by model partitioning across sites. Morphology contributed only 5% of the characters in the data set but nevertheless influenced the combined-data tree, supporting the utility of morphological data in multigene analyses. We used Bayesian criteria (Bayes factors) to show that process heterogeneity across data partitions is a significant model component, although not as important as among-site rate variation. More complex evolutionary models are associated with more topological uncertainty and less conflict between morphology and molecules. Bayes factors sometimes favor simpler models over considerably more parameter-rich models, but the best model overall is also the most complex and Bayes factors do not support exclusion of apparently weak parameters from this model. 
Thus, Bayes factors appear to be useful for selecting among complex models, but it is still unclear whether their use strikes a reasonable balance between model complexity and error in parameter estimates.
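As a toy illustration of Bayesian MCMC sampling, the sketch below runs a random-walk Metropolis-Hastings sampler for a single parameter. It is not the combined-data analysis described above. It assumes:

- the data are two aligned sequences with `n_sites` sites, of which `n_diff` differ;
- substitution follows JC69;
- the distance has an exponential prior with mean 0.2.

All of these choices, and the function names, are hypothetical.

```python
import math
import random

def log_likelihood(d, n_sites, n_diff):
    """JC69 log-likelihood of distance d for a pair of sequences (constants dropped)."""
    if d <= 0:
        return -math.inf
    p = 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))  # probability a site differs
    return n_diff * math.log(p) + (n_sites - n_diff) * math.log(1.0 - p)

def log_prior(d, mean=0.2):
    """Exponential prior on the distance (an assumed choice)."""
    return -d / mean - math.log(mean)

def metropolis_hastings(n_sites, n_diff, n_iter=20000, step=0.05, seed=1):
    rng = random.Random(seed)
    d = 0.1  # arbitrary starting value
    log_post = log_likelihood(d, n_sites, n_diff) + log_prior(d)
    samples = []
    for _ in range(n_iter):
        # Reflected normal random walk: the proposal is symmetric,
        # so the Hastings ratio reduces to the ratio of posteriors.
        proposal = abs(d + rng.gauss(0.0, step))
        log_post_new = log_likelihood(proposal, n_sites, n_diff) + log_prior(proposal)
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < log_post_new - log_post:
            d, log_post = proposal, log_post_new
        samples.append(d)
    return samples
```

With 500 sites and 100 differences, the post-burn-in mean of the samples should sit near the maximum-likelihood JC69 distance of about 0.23. The prior pulls it only slightly.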

4.
Rates of trait evolution are known to vary across phylogenies; however, standard evolutionary models assume a homogeneous process of trait change. These simple methods are widely applied in small‐scale phylogenetic studies, whereas models of rate heterogeneity are not, so the prevalence and patterns of potential rate variation in groups up to hundreds of species remain unclear. The extent to which trait evolution is modelled accurately on a given phylogeny is also largely unknown because studies typically lack absolute model fit tests. We investigated these issues by applying both rate‐static and variable‐rates methods on (i) body mass data for 88 avian clades of 10–318 species, and (ii) data simulated under a range of rate‐heterogeneity scenarios. Our results show that rate heterogeneity is present across small‐scaled avian clades, and consequently applying only standard single‐process models prompts inaccurate inferences about the generating evolutionary process. Specifically, these approaches underestimate rate variation, and systematically mislabel temporal trends in trait evolution. Conversely, variable‐rates approaches have superior relative fit (they are the best model) and absolute fit (they describe the data well). We show that rate changes such as single internal branch variations, rate decreases and early bursts are hard to detect, even by variable‐rates models. We also use recently developed absolute adequacy tests to highlight misleading conclusions based on relative fit alone (e.g. a consistent preference for constrained evolution when isolated terminal branch rate increases are present). This work highlights the potential for robust inferences about trait evolution when fitting flexible models in conjunction with tests for absolute model fit.

5.
Liu Chaoyang, Zhuang Wenying. Mycosystema, 2013, 32(3): 563-573
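When rRNA genes are used for phylogenetic analysis, differences in evolutionary rate among sites may be an important source of systematic error. Using 52 fungi as study taxa, we constructed partitioning strategies based on rRNA secondary-structure features and examined how different partitioning strategies affect Bayesian analysis. The results show that the best-fitting nucleotide substitution model for each structural partition, and its parameters, depend closely on the partition type. Compared with the conventional Bayesian approach, a partitioning strategy using structural loops had no significant effect on the results, whereas strategies that introduced stem (arm) elements yielded higher marginal likelihoods and support values. Moreover, simply increasing the number of subpartitions without regard to structural features also increased Bayes factor values, but did not improve the ability to resolve relationships. This indicates that a sound partitioning strategy should be based on biological function (or secondary-structure features) rather than on purely mathematical considerations.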

6.
Traditionally, phylogenetic analyses over many genes combine data into a contiguous block. Under this concatenated model, all genes are assumed to evolve at the same rate. However, it is clear that genes evolve at very different rates and that accounting for this rate heterogeneity is important if we are to accurately infer phylogenies from heterogeneous multigene data sets. There remain open questions regarding how best to incorporate gene rate parameters into phylogenetic models and which properties of real data correlate with improved fit over the concatenated model. In this study, two methods of accounting for gene rate heterogeneity are compared: the n-parameter method, which allows for each of the n gene partitions to have a gene rate parameter, and the alpha-parameter method, which fits a distribution to the gene rates. Results demonstrate that the n-parameter method is both computationally faster and in general provides a better fit over the concatenated model than the alpha-parameter method. Furthermore, improved model fit over the concatenated model is highly correlated with the presence of a gene with a slow relative rate of evolution.
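The bookkeeping behind an n-parameter scheme can be sketched as follows: each gene partition carries its own rate multiplier that scales a shared set of branch lengths. This is an illustrative sketch, not the study's code. It assumes the common convention that the multipliers are rescaled to a site-weighted mean of 1, so that branch lengths stay in expected substitutions per site. Under that constraint, only n - 1 of the n rates are free. Function names are hypothetical.

```python
def normalize_gene_rates(raw_rates, gene_lengths):
    """Rescale per-gene rate multipliers so their site-weighted mean is 1."""
    if len(raw_rates) != len(gene_lengths):
        raise ValueError("one rate per gene is required")
    total_sites = sum(gene_lengths)
    weighted_mean = sum(r * n for r, n in zip(raw_rates, gene_lengths)) / total_sites
    return [r / weighted_mean for r in raw_rates]

def gene_branch_lengths(tree_lengths, gene_rates):
    """Branch lengths seen by each gene: the shared tree scaled by that gene's rate."""
    return [[rate * b for b in tree_lengths] for rate in gene_rates]
```

For example, raw rates of 1, 2 and 4 for genes of 300, 300 and 400 sites normalize to 0.4, 0.8 and 1.6. The concatenated model is the special case in which every multiplier equals 1.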

7.
Phylogenetic analyses of DNA sequences were conducted to evaluate four alternative hypotheses of phrynosomatine sand lizard relationships. Sequences comprising 2871 aligned base pair positions representing the regions spanning ND1-COI and cyt b-tRNA(Thr) of the mitochondrial genome from all recognized sand lizard species were analyzed using unpartitioned parsimony and likelihood methods, likelihood methods with assumed partitions, Bayesian methods with assumed partitions, and Bayesian mixture models. The topology (Uma, (Callisaurus, (Cophosaurus, Holbrookia))) and thus monophyly of the "earless" taxa, Cophosaurus and Holbrookia, is supported by all analyses. Previously proposed topologies in which Uma and Callisaurus are sister taxa and those in which Holbrookia is the sister group to all other sand lizard taxa are rejected using both parsimony and likelihood-based significance tests with the combined, unpartitioned data set. Bayesian hypothesis tests also reject those topologies using six assumed partitioning strategies, and the two partitioning strategies presumably associated with the most powerful tests also reject a third previously proposed topology, in which Callisaurus and Cophosaurus are sister taxa. For both maximum likelihood and Bayesian methods with assumed partitions, those partitions defined by codon position and tRNA stem and nonstems explained the data better than other strategies examined. Bayes factor estimates comparing results of assumed partitions versus mixture models suggest that mixture models perform better than assumed partitions when the latter were not based on functional characteristics of the data, such as codon position and tRNA stem and nonstems. However, assumed partitions performed better than mixture models when functional differences were incorporated.
We reiterate the importance of accounting for heterogeneous evolutionary processes in the analysis of complex data sets and emphasize the importance of implementing mixed model likelihood methods.

8.
Probabilistic tests of topology offer a powerful means of evaluating competing phylogenetic hypotheses. The performance of the nonparametric Shimodaira-Hasegawa (SH) test, the parametric Swofford-Olsen-Waddell-Hillis (SOWH) test, and Bayesian posterior probabilities were explored for five data sets for which all the phylogenetic relationships are known with a very high degree of certainty. These results are consistent with previous simulation studies that have indicated a tendency for the SOWH test to be prone to generating Type 1 errors because of model misspecification coupled with branch length heterogeneity. These results also suggest that the SOWH test may accord overconfidence in the true topology when the null hypothesis is in fact correct. In contrast, the SH test was observed to be much more conservative, even under high substitution rates and branch length heterogeneity. For some of those data sets where the SOWH test proved misleading, the Bayesian posterior probabilities were also misleading. The results of all tests were strongly influenced by the exact substitution model assumptions. Simple models, especially those that assume rate homogeneity among sites, had a higher Type 1 error rate and were more likely to generate misleading posterior probabilities. For some of these data sets, the commonly used substitution models appear to be inadequate for estimating appropriate levels of uncertainty with the SOWH test and Bayesian methods. Reasons for the differences in statistical power between the two maximum likelihood tests are discussed and are contrasted with the Bayesian approach.

9.
The molecular clock, i.e., constancy of the rate of evolution over time, is commonly assumed in estimating divergence dates. However, this assumption is often violated and has drastic effects on date estimation. Recently, a number of attempts have been made to relax the clock assumption. One approach is to use maximum likelihood, which assigns rates to branches and allows the estimation of both rates and times. An alternative is the Bayes approach, which models the change of the rate over time. A number of models of rate change have been proposed. We have extended and evaluated models of rate evolution, i.e., the lognormal and its recent variant, along with the gamma, the exponential, and the Ornstein-Uhlenbeck processes. These models were first applied to a small hominoid data set, where an empirical Bayes approach was used to estimate the hyperparameters that measure the amount of rate variation. Estimation of divergence times was sensitive to these hyperparameters, especially when the assumed model is close to the clock assumption. The rate and date estimates varied little from model to model, although the posterior Bayes factor indicated the Ornstein-Uhlenbeck process outperformed the other models. To demonstrate the importance of allowing for rate change across lineages, this general approach was used to analyze a larger data set consisting of the 18S ribosomal RNA gene of 39 metazoan species. We obtained date estimates consistent with paleontological records, the deepest split within the group being about 560 million years ago. Estimates of the rates were in accordance with the Cambrian explosion hypothesis and suggested some more recent lineage-specific bursts of evolution.
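The sketch below shows one way an autocorrelated lognormal model of rate change can be simulated, with each branch's rate drawn around its parent's rate. It is an illustration, not the parameterization used in this study. It assumes a mean-preserving form: the log of the child rate is normal with variance sigma2 * t, shifted so that the expected child rate equals the parent rate. Here `sigma2` plays the role of a hyperparameter controlling the amount of rate variation. Names are hypothetical.

```python
import math
import random

def child_rate(parent_rate, sigma2, t, rng):
    """Draw a descendant branch rate under an autocorrelated lognormal model.

    log(child) ~ Normal(log(parent) - sigma2 * t / 2, sigma2 * t),
    which keeps E[child] equal to parent_rate.
    """
    var = sigma2 * t
    mu = math.log(parent_rate) - var / 2.0
    return math.exp(rng.gauss(mu, math.sqrt(var)))

def simulate_path(root_rate, sigma2, branch_times, seed=0):
    """Rates along one root-to-tip path; each branch inherits from its parent."""
    rng = random.Random(seed)
    rates = [root_rate]
    for t in branch_times:
        rates.append(child_rate(rates[-1], sigma2, t, rng))
    return rates
```

As `sigma2` approaches 0, every branch inherits its parent's rate exactly and the model collapses to a strict clock. Divergence-time estimates are therefore most sensitive to this hyperparameter near that limit.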

10.
11.
Many molecular phylogenies show longer root-to-tip path lengths in species-rich groups, encouraging hypotheses linking cladogenesis with accelerated molecular evolution. However, the pattern can also be caused by an artifact called the node density effect (NDE): this effect occurs when the method used to reconstruct a tree underestimates multiple hits that would have been revealed by extra nodes, leading to longer root-to-tip path lengths in clades with more terminal taxa. Here we use a twofold approach to demonstrate that maximum likelihood and Bayesian methods also suffer from the NDE known to affect parsimony. First, simulations deliberately mismatching the simulation and reconstruction models show that the greater the model disparity, the greater the gap between actual and reconstructed tree lengths, and the greater the NDE. Second, taxon sampling manipulation with empirical data shows that NDE can still be present when using optimized models: across 12 datasets, 70 out of 109 sister path comparisons showed significant evidence of NDE. Unless the model reconstructs the real tree length fairly accurately (and, given the complexity of real sequence evolution, this may be uncommon), it will consistently produce a node density artifact. At commonly encountered divergence levels, a 10% underestimation of tree length results in ≥80% of simulated phylogenies showing a positive NDE. Bayesian trees show a slightly but consistently stronger effect. This pervasive methodological artifact increases apparent rate heterogeneity and can compromise investigations of factors influencing molecular evolutionary rate that use path lengths in topologically asymmetric trees.
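The root cause named above is the underestimation of multiple hits. Its simplest illustration is the gap between the raw proportion of differing sites and a model-corrected distance. The sketch below uses the Jukes-Cantor (JC69) correction as an assumed example. It demonstrates only the multiple-hit underestimation; it is not a simulation of the NDE itself. Function names are hypothetical.

```python
import math

def p_distance(seq_a, seq_b):
    """Proportion of differing sites (no correction for multiple hits)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    diffs = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return diffs / len(seq_a)

def jc69_distance(p):
    """JC69 distance: corrects the observed p for unseen multiple substitutions."""
    if p >= 0.75:
        raise ValueError("JC69 distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def expected_p(d):
    """Expected proportion of differing sites after d substitutions per site under JC69."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))
```

After 0.5 substitutions per site, only about 36.5% of sites are expected to differ. A method that simply counted differences would thus miss roughly a quarter of the true path length. Adding intervening nodes partly recovers such hidden changes, which is why path length then grows with taxon density.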

12.
Insertions and deletions (indels) are common molecular evolutionary events. However, probabilistic models for indel evolution are under-developed due to their computational complexity. Here, we introduce several improvements to indel modeling: 1) While previous models for indel evolution assumed that the rates and length distributions of insertions and deletions are equal, here we propose a richer model that explicitly distinguishes between the two; 2) we introduce numerous summary statistics that allow approximate Bayesian computation-based parameter estimation; 3) we develop a method to correct for biases introduced by alignment programs, when inferring indel parameters from empirical data sets; and 4) using a model-selection scheme, we test whether the richer model better fits biological data compared with the simpler model. Our analyses suggest that both our inference scheme and the model-selection procedure achieve high accuracy on simulated data. We further demonstrate that our proposed richer model better fits a large number of empirical data sets and that, for the majority of these data sets, the deletion rate is higher than the insertion rate.
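The general idea of approximate Bayesian computation (ABC) can be shown with a toy rejection sampler in which insertion and deletion rates are estimated separately. This is a hypothetical sketch, not the authors' inference scheme. Its assumptions:

- insertions and deletions arise as independent Poisson processes over a fixed time;
- the only summary statistics are the two event counts, rather than statistics computed from alignments;
- each rate has a uniform prior.

Names and default values are illustrative.

```python
import random

def poisson_count(rate, duration, rng):
    """Number of events of a Poisson process with the given rate over `duration`."""
    if rate <= 0:
        return 0
    count, t = 0, 0.0
    while True:
        t += rng.expovariate(rate)  # exponential waiting time to the next event
        if t > duration:
            return count
        count += 1

def abc_rejection(obs_ins, obs_del, duration=1.0, n_draws=20000,
                  prior_max=30.0, tolerance=1, seed=0):
    """Rejection ABC for separate insertion and deletion rates."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        # Independent priors let the two rates differ (the "richer" model).
        ins_rate = rng.uniform(0.0, prior_max)
        del_rate = rng.uniform(0.0, prior_max)
        sim_ins = poisson_count(ins_rate, duration, rng)
        sim_del = poisson_count(del_rate, duration, rng)
        # Keep parameter draws whose simulated summaries are close to the observed ones.
        if abs(sim_ins - obs_ins) <= tolerance and abs(sim_del - obs_del) <= tolerance:
            accepted.append((ins_rate, del_rate))
    return accepted
```

The accepted pairs approximate the posterior of the two rates. With 5 observed insertions and 15 deletions, the accepted deletion rates are centred well above the insertion rates. The simpler model would instead force a single shared rate.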

13.
Recent multi-gene phylogenetic analyses of plastid-encoded genes have recovered a robust monophyly of chlorophyll-c containing plastids (Chl-c plastids) in cryptophytes, haptophytes, photosynthetic stramenopiles, and dinoflagellates. However, all the plastid multi-gene phylogenies published to date utilized the "linked" model, which ignores the heterogeneity of sequence evolution across genes in alignments. Both empirical and simulation studies show that, compared to the linked model, the "unlinked" model, which accounts for gene-specific evolution, can greatly improve multi-gene estimations. Here we newly sequenced 46 genes of Chl-c plastids, and examined the Chl-c plastid evolution by multi-gene analyses under the unlinked model. Unexpectedly, Chl-c plastid monophyly received only low to medium support in our analyses based on multi-gene data sets including up to 4829 alignment positions. Although we systematically surveyed and excluded the genes that could mislead estimation, the (inconclusive) support for Chl-c plastid monophyly was not significantly altered. We conclude that estimates from the current plastid-encoded gene data are insufficient to resolve Chl-c plastid evolution with confidence, and are highly sensitive to the genes included in the analyses and to the tree-reconstruction methods applied. Thus, future analyses of larger multi-gene data sets, preferably under the unlinked model, are required to conclusively understand Chl-c plastid evolution.

14.
The rate at which a given site in a gene sequence alignment evolves over time may vary. This phenomenon, known as heterotachy, can bias or distort phylogenetic trees inferred from models of sequence evolution that assume rates of evolution are constant. Here, we describe a phylogenetic mixture model designed to accommodate heterotachy. The method sums the likelihood of the data at each site over more than one set of branch lengths on the same tree topology. A branch-length set that is best for one site may differ from the branch-length set that is best for some other site, thereby allowing different sites to have different rates of change throughout the tree. Because rate variation may not be present in all branches, we use a reversible-jump Markov chain Monte Carlo algorithm to identify those branches in which reliable amounts of heterotachy occur. We implement the method in combination with our 'pattern-heterogeneity' mixture model, applying it to simulated data and five published datasets. We find that complex evolutionary signals of heterotachy are routinely present over and above variation in the rate or pattern of evolution across sites, that the reversible-jump method requires far fewer parameters than conventional mixture models to describe it, and that it serves to identify the regions of the tree in which heterotachy is most pronounced. The reversible-jump procedure also removes the need for a posteriori tests of 'significance' such as the Akaike or Bayesian information criterion tests, or Bayes factors. Heterotachy has important consequences for the correct reconstruction of phylogenies as well as for tests of hypotheses that rely on accurate branch-length information. These include molecular clocks, analyses of tempo and mode of evolution, comparative studies and ancestral state reconstruction. The model is available from the authors' website, and can be used for the analysis of both nucleotide and morphological data.
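The core likelihood calculation described above can be sketched as follows: each site's likelihood is a weighted sum over several branch-length sets on one topology. This minimal sketch makes several simplifying assumptions:

- it uses JC69;
- it uses only two sequences, so each branch-length set reduces to a single total length `t`;
- it omits the reversible-jump and pattern-heterogeneity parts.

Function names are hypothetical.

```python
import math

def jc69_site_likelihood(same, t):
    """Likelihood of one site for two sequences separated by total branch length t."""
    decay = math.exp(-4.0 * t / 3.0)
    # Transition probability to the identical base, or to one specific different base.
    p_transition = 0.25 + 0.75 * decay if same else 0.25 - 0.25 * decay
    return 0.25 * p_transition  # 0.25 = stationary frequency of the first base

def mixture_log_likelihood(site_patterns, branch_length_sets, weights):
    """Sum each site's likelihood over several branch-length sets on one topology."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    total = 0.0
    for same in site_patterns:  # True if the two bases at this site are identical
        site_lik = sum(w * jc69_site_likelihood(same, t)
                       for w, t in zip(weights, branch_length_sets))
        total += math.log(site_lik)
    return total
```

With a single branch-length set (weight 1), this reduces to the ordinary homogeneous likelihood. Adding sets lets slowly and quickly evolving sites each be explained by different branch lengths. The sum is taken inside the logarithm per site, not across whole-alignment likelihoods.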

15.
Rates of phenotypic evolution derive from numerous interrelated processes acting at varying spatial and temporal scales and frequently differ substantially among lineages. Although current models employed in reconstructing ancestral character states permit independent rates for distinct types of transition (forward and reverse transitions and transitions between different states), these rates are typically assumed to be identical for all branches in a phylogeny. In this paper, I present a general model of character evolution enabling rate heterogeneity among branches. This model is employed in assessing the extent to which the assumption of uniform transition rates affects reconstructions of ancestral limb morphology in the scincid lizard clade Lerista and, accordingly, the potential for rate variability to mislead inferences of evolutionary patterns. Permitting rate variation among branches significantly improves model fit for both the manus and the pes. A constrained model in which the rate of digit acquisition is assumed to be effectively zero is strongly supported in each case; when compared with a model assuming unconstrained transition rates, this model provides a substantially better fit for the manus and a nearly identical fit for the pes. Ancestral states reconstructed assuming the constrained model imply patterns of limb evolution differing significantly from those implied by reconstructions for uniform-rate models, particularly for the pes; whereas ancestral states for the uniform-rate models consistently entail the reacquisition of pedal digits, those for the model incorporating among-lineage rate heterogeneity imply repeated, unreversed digit loss. These results indicate that the assumption of identical transition rates for all branches in a phylogeny may be inappropriate in modeling the evolution of phenotypic traits and emphasize the need for careful evaluation of phylogenetic tests of Dollo's law.
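For a binary character, the transition probabilities that such a model multiplies along each branch have a closed form. The sketch below computes them and scales the shared rates by a per-branch multiplier to allow among-branch heterogeneity. It is a simplified, assumed coding (0 = digit absent, 1 = digit present) for illustration only; the paper's model handles multistate digit counts. Names are hypothetical.

```python
import math

def two_state_probs(q01, q10, t):
    """Transition probability matrix P(t) for a two-state continuous-time Markov chain.

    q01 is the gain rate (0 -> 1) and q10 the loss rate (1 -> 0).
    """
    total = q01 + q10
    if total == 0:
        return [[1.0, 0.0], [0.0, 1.0]]
    decay = math.exp(-total * t)
    p00 = q10 / total + (q01 / total) * decay
    p11 = q01 / total + (q10 / total) * decay
    return [[p00, 1.0 - p00], [1.0 - p11, p11]]

def branch_probs(q01, q10, branch_lengths, branch_multipliers):
    """Per-branch P(t) when each branch scales the shared rates by its own multiplier."""
    return [two_state_probs(m * q01, m * q10, t)
            for t, m in zip(branch_lengths, branch_multipliers)]
```

Setting the gain rate `q01` to 0 gives a Dollo-like constraint. The probability of regaining a lost digit is then exactly zero on every branch, while loss (`1 - exp(-q10 * t)`) can still vary in rate among branches.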

16.
Analysis of sequence data using time‐reversible substitution models and maximum likelihood (ML) algorithms is currently the most popular method to infer phylogenies, despite the fact that results often contradict each other. Searching for sources of error, we focus on a hitherto neglected feature of these methods: character polarity is usually thought to be irrelevant in ML analyses. Mechanisms that lead to wrong tree topologies were analysed at the level of split‐supporting site patterns. In simulations, plesiomorphic site patterns can be identified by comparison with known root sequences. These patterns cause some surprising effects: using data sets generated with simulations of sequence evolution along a variety of topologies and inferring trees using the same (correct) model, we show for cases of branch‐length heterogeneity that (i) as already known, ML analyses can fail to recover the correct tree even when the correct substitution model is used, but also that (ii) plesiomorphic character states cause substantial mistakes and therefore character polarity is relevant, and (iii) accumulating chance similarities on long branches are far less misleading than plesiomorphic states accumulating on shorter branches. The artefacts occur when branch lengths are heterogeneous. The systematic errors disappear for the most part when the sites with symplesiomorphies supporting false clades are deleted from the data set. We conclude that many of the phylogenies published during the past decades may be false due to the neglected effects of symplesiomorphies.

17.
Likelihood-ratio statistics are proposed to test for heterogeneity in nucleotide substitution rate among regions of a DNA sequence. The tests examine three-sequence phylogenies, and two specific tests are proposed: a test to detect rate heterogeneity among genic regions within a sequence, over all evolutionary lineages; and a test to detect rate heterogeneity among regions in a specific evolutionary lineage. Simulations examine the ability of tests to detect a single region that varies in nucleotide substitution rate relative to the remainder of the sequence. A 50-bp region with a fivefold substitution-rate increase can be detected ≥90% of the time when it is found in all three lineages of the phylogeny, and with approximately 70% power when it is found in only one evolutionary lineage. Simulation also examines the effect of transition- and transversion-rate differences. The tests are applied to published DNA sequences. While the tests are powerful, significant results can be difficult to interpret biologically.
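A likelihood-ratio test compares the maximized log-likelihoods of nested models. The sketch below covers only the final step and assumes both fits are already done. It also assumes the alternative adds exactly one free parameter (for example, a single rate multiplier for one region), so the statistic is referred to a chi-square distribution with 1 degree of freedom. For that case the upper tail is `erfc(sqrt(x / 2))`. Function names are hypothetical.

```python
import math

def lrt_statistic(log_lik_null, log_lik_alt):
    """Likelihood-ratio statistic 2 * (lnL_alt - lnL_null) for nested models."""
    return 2.0 * (log_lik_alt - log_lik_null)

def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square distribution with 1 degree of freedom."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))
```

For example, log-likelihoods of -2500.0 (homogeneous rate) and -2496.0 (one region with its own rate) give a statistic of 8.0 and a p-value of about 0.0047. With more extra parameters, the degrees of freedom and the tail function change accordingly.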

18.
Codon-based substitution models have been widely used to identify amino acid sites under positive selection in comparative analysis of protein-coding DNA sequences. The nonsynonymous-synonymous substitution rate ratio (dN/dS, denoted ω) is used as a measure of selective pressure at the protein level, with ω > 1 indicating positive selection. Statistical distributions are used to model the variation in ω among sites, allowing a subset of sites to have ω > 1 while the rest of the sequence may be under purifying selection with ω < 1. An empirical Bayes (EB) approach is then used to calculate posterior probabilities that a site comes from the site class with ω > 1. Current implementations, however, use the naive EB (NEB) approach and fail to account for sampling errors in maximum likelihood estimates of model parameters, such as the proportions and ω ratios for the site classes. In small data sets lacking information, this approach may lead to unreliable posterior probability calculations. In this paper, we develop a Bayes empirical Bayes (BEB) approach to the problem, which assigns a prior to the model parameters and integrates over their uncertainties. We compare the new and old methods on real and simulated data sets. The results suggest that in small data sets the new BEB method does not generate false positives as did the old NEB approach, while in large data sets it retains the good power of the NEB approach for inferring positively selected sites.
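The difference between the two approaches can be sketched with two site classes. NEB plugs a point estimate of the class proportion into Bayes' rule. BEB instead averages the site posteriors over a prior on that proportion, weighting each value by how well it explains the whole alignment. This is a deliberately simplified sketch, not the published method:

- each site's likelihood under each class is treated as known and fixed;
- only the proportion `p_pos` is integrated, over a uniform grid, whereas the real BEB also integrates over ω values.

Names are hypothetical.

```python
import math

def neb_posterior(site_liks, p_pos):
    """Naive empirical Bayes: P(omega > 1 class | site) at a fixed proportion p_pos.

    site_liks: list of (lik_under_purifying_class, lik_under_positive_class) per site.
    """
    out = []
    for f0, f1 in site_liks:
        num = p_pos * f1
        out.append(num / (num + (1.0 - p_pos) * f0))
    return out

def beb_posterior(site_liks, grid_size=99):
    """Simplified Bayes empirical Bayes: average NEB posteriors over a uniform
    prior on p_pos, weighting each grid value by its posterior probability."""
    grid = [(i + 0.5) / grid_size for i in range(grid_size)]  # midpoints avoid 0 and 1
    # Log-likelihood of the whole alignment at each grid value of p_pos.
    log_w = [sum(math.log(p * f1 + (1.0 - p) * f0) for f0, f1 in site_liks)
             for p in grid]
    m = max(log_w)
    weights = [math.exp(lw - m) for lw in log_w]
    total = sum(weights)
    result = [0.0] * len(site_liks)
    for p, w in zip(grid, weights):
        for h, post in enumerate(neb_posterior(site_liks, p)):
            result[h] += (w / total) * post
    return result
```

In a small data set the maximum-likelihood proportion can be poorly determined, and NEB inherits that error directly. The averaging in BEB spreads the uncertainty across plausible values instead.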

19.
Kingsolver et al.'s review of phenotypic selection gradients from natural populations provided a glimpse of the form and strength of selection in nature and how selection on different organisms and traits varies. Because this review's underlying database could be a key tool for answering fundamental questions concerning natural selection, it has spawned discussion of potential biases inherent in the review process. Here, we explicitly test for two commonly discussed sources of bias: sampling error and publication bias. We model the relationship between variance among selection gradients and sample size that sampling error produces by subsampling large empirical data sets containing measurements of traits and fitness. We find that this relationship was not mimicked by the review data set and therefore conclude that sampling error does not bias estimations of the average strength of selection. Using graphical tests, we find evidence for bias against publishing weak estimates of selection only among very small studies (N<38). However, this evidence is counteracted by excess weak estimates in larger studies. Thus, estimates of average strength of selection from the review are less biased than is often assumed. Devising and conducting straightforward tests for different biases allows concern to be focused on the most troublesome factors.

20.
The evolution of genomic base composition in bacteria
Abstract. Guanine plus cytosine (GC) content ranges broadly among bacterial genomes. In this study, we explore the use of a Brownian-motion model for the evolution of GC content over time. This model assumes that GC content varies over time in a continuous and homogeneous manner. Using this model and a maximum-likelihood approach, we analyzed the evolution of GC content across several bacterial phylogenies. Using three independent tests, we found that the observed divergence in GC content was consistent with a homogeneous Brownian-motion model. For example, similar rates of GC content evolution were inferred in several different bacterial subclades, indicating that there is relatively little rate heterogeneity in GC content evolution over broad evolutionary time scales. We thus argue that the homogeneous Brownian-motion model provides a good working model for GC content evolution. We then use this model to determine the overall rate of GC content evolution among eubacteria. We also determine the time frame over which GC content remains similar in related taxa, using a flexible definition for "similarity" in GC content so that, depending on the context, more or less stringent criteria may be applied. Our results have implications for models of sequence evolution, including those used for phylogenetic reconstruction and for inferring unusual changes in GC content.
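Under homogeneous Brownian motion, the GC difference between two lineages that split `t` time units ago is normally distributed with mean 0 and variance `2 * sigma2 * t`. This gives both an estimate of the rate and the probability that related taxa remain "similar". The sketch below is a simplification, not the study's full-phylogeny likelihood. It assumes independent sister pairs with known divergence times, and it lets the caller choose the similarity threshold `delta`. Names and units are hypothetical.

```python
import math

def bm_rate_from_pairs(differences, divergence_times):
    """ML estimate of the Brownian-motion rate sigma^2 from independent sister pairs.

    Each GC difference d_i ~ Normal(0, 2 * sigma^2 * t_i), so the MLE is
    the mean of d_i^2 / (2 * t_i).
    """
    if len(differences) != len(divergence_times) or not differences:
        raise ValueError("need matching, non-empty lists")
    return sum(d * d / (2.0 * t)
               for d, t in zip(differences, divergence_times)) / len(differences)

def prob_similar(sigma2, t, delta):
    """Probability that two lineages split t ago differ in GC content by less than delta."""
    sd = math.sqrt(2.0 * sigma2 * t)
    return math.erf(delta / (sd * math.sqrt(2.0)))
```

Because the variance grows linearly with time, the chance of staying within a fixed `delta` declines steadily with divergence time. A stricter or looser `delta` shifts the time frame over which taxa count as similar.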


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP registration No. 09084417