Similar literature
1.
Recent developments in marginal likelihood estimation for model selection in the field of Bayesian phylogenetics and molecular evolution have emphasized the poor performance of the harmonic mean estimator (HME). Although these studies have shown the merits of new approaches applied to standard normally distributed examples and small real-world data sets, not much is currently known concerning the performance and computational issues of these methods when fitting complex evolutionary and population genetic models to empirical real-world data sets. Further, these approaches have not yet seen widespread application in the field due to the lack of implementations of these computationally demanding techniques in commonly used phylogenetic packages. Here we investigate the performance of some of these new marginal likelihood estimators, specifically path sampling (PS) and stepping-stone (SS) sampling, for comparing models of demographic change and relaxed molecular clocks, using synthetic data and real-world examples for which unexpected inferences were made using the HME. Given the drastically increased computational demands of PS and SS sampling, we also investigate a posterior simulation-based analogue of Akaike's information criterion (AIC) through Markov chain Monte Carlo (MCMC), referred to as AICM, a model comparison approach that shares with the HME the appealing feature of having a low computational overhead over the original MCMC analysis. We confirm that the HME systematically overestimates the marginal likelihood and fails to yield reliable model classification, and show that the AICM performs better and may be a useful initial evaluation of model choice but that it is also, to a lesser degree, unreliable. We show that PS and SS sampling substantially outperform these estimators and adjust the conclusions made concerning previous analyses for the three real-world data sets that we reanalyzed. The methods used in this article are now available in BEAST, a powerful user-friendly software package to perform Bayesian evolutionary analyses.
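
As an illustration of why the AICM adds so little overhead to an MCMC run, the sketch below computes it from a list of posterior log-likelihood samples. It assumes the usual moment-based form, AICM = 2·s² − 2·mean, where the mean and the sample variance s² are taken over the sampled log-likelihoods; the function name and the sample values are hypothetical and are not taken from the abstract or from BEAST.

```python
import statistics

def aicm(log_likelihoods):
    """Posterior simulation-based AIC analogue (assumed form: 2*s^2 - 2*mean).

    log_likelihoods: log-likelihood values recorded at each retained
    MCMC sample (after burn-in). Lower AICM indicates a preferred model.
    """
    mean_ll = statistics.mean(log_likelihoods)
    var_ll = statistics.variance(log_likelihoods)  # sample variance (n - 1)
    return 2.0 * var_ll - 2.0 * mean_ll

# Hypothetical traces for two competing models.
model_a = [-10.0, -12.0, -11.0, -13.0]
model_b = [-14.0, -15.0, -14.5, -15.5]
preferred = "A" if aicm(model_a) < aicm(model_b) else "B"
```

Because only the log-likelihood trace is needed, the calculation can be run on the output of an analysis that has already finished, which is the low-overhead property the abstract refers to.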

2.
Xie W, Lewis PO, Fan Y, Kuo L, Chen MH. Systematic Biology, 2011, 60(2): 150-160
The marginal likelihood is commonly used for comparing different evolutionary models in Bayesian phylogenetics and is the central quantity used in computing Bayes Factors for comparing model fit. A popular method for estimating marginal likelihoods, the harmonic mean (HM) method, can be easily computed from the output of a Markov chain Monte Carlo analysis but often greatly overestimates the marginal likelihood. The thermodynamic integration (TI) method is much more accurate than the HM method but requires more computation. In this paper, we introduce a new method, steppingstone sampling (SS), which uses importance sampling to estimate each ratio in a series (the "stepping stones") bridging the posterior and prior distributions. We compare the performance of the SS approach to the TI and HM methods in simulation and using real data. We conclude that the greatly increased accuracy of the SS and TI methods argues for their use instead of the HM method, despite the extra computation needed.
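
The contrast between the HM and SS estimators can be sketched on a toy problem where the exact answer is known. The example below is not a phylogenetic model: it assumes data y_i ~ N(mu, 1) with prior mu ~ N(0, tau2), chosen only because each power posterior is then Gaussian and can be sampled exactly. The function names, the number of stepping stones and the power schedule beta_k = (k/K)^(1/0.3) are assumptions made for illustration.

```python
import numpy as np

def log_lik(mu, y):
    """Log-likelihood of y_i ~ N(mu, 1), evaluated for every value in mu."""
    mu = np.asarray(mu)[:, None]
    sq = ((y[None, :] - mu) ** 2).sum(axis=1)
    return -0.5 * y.size * np.log(2 * np.pi) - 0.5 * sq

def sample_power_posterior(beta, y, tau2, size, rng):
    """Exact draws from prior * likelihood**beta (Gaussian for this toy model)."""
    prec = 1.0 / tau2 + beta * y.size
    mean = beta * y.sum() / prec
    return rng.normal(mean, np.sqrt(1.0 / prec), size=size)

def log_mean_exp(x):
    """Numerically stable log of the mean of exp(x)."""
    m = x.max()
    return m + np.log(np.mean(np.exp(x - m)))

def stepping_stone(y, tau2, betas, size, rng):
    """Sum of log importance-sampling ratios between successive powers."""
    total = 0.0
    for b0, b1 in zip(betas[:-1], betas[1:]):
        mu = sample_power_posterior(b0, y, tau2, size, rng)
        total += log_mean_exp((b1 - b0) * log_lik(mu, y))
    return total

def harmonic_mean(y, tau2, size, rng):
    """Log of the harmonic mean of likelihoods over posterior draws."""
    mu = sample_power_posterior(1.0, y, tau2, size, rng)
    return -log_mean_exp(-log_lik(mu, y))

def exact_log_marginal(y, tau2):
    """Closed-form log marginal likelihood for the toy model."""
    n = y.size
    quad = (y ** 2).sum() - tau2 * y.sum() ** 2 / (1 + n * tau2)
    return -0.5 * n * np.log(2 * np.pi) - 0.5 * np.log(1 + n * tau2) - 0.5 * quad

rng = np.random.default_rng(0)
y = rng.normal(1.0, 1.0, size=50)          # synthetic data
betas = (np.arange(33) / 32) ** (1 / 0.3)  # powers from 0 (prior) to 1 (posterior)
ss = stepping_stone(y, 100.0, betas, 2000, rng)
hm = harmonic_mean(y, 100.0, 2000, rng)
exact = exact_log_marginal(y, 100.0)
```

With a diffuse prior such as tau2 = 100, the HM value is expected to sit above the exact log marginal likelihood, while the SS value should track it closely. The price is one set of samples per stepping stone instead of a single posterior sample.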

3.
In order to have confidence in model-based phylogenetic analysis, the model of nucleotide substitution adopted must be selected in a statistically rigorous manner. Several model-selection methods are applicable to maximum likelihood (ML) analysis, including the hierarchical likelihood-ratio test (hLRT), Akaike information criterion (AIC), Bayesian information criterion (BIC), and decision theory (DT), but their performance relative to empirical data has not been investigated thoroughly. In this study, we use 250 phylogenetic data sets obtained from TreeBASE to examine the effects that choice in model selection has on ML estimation of phylogeny, with an emphasis on optimal topology, bootstrap support, and hypothesis testing. We show that the use of different methods leads to the selection of two or more models for approximately 80% of the data sets and that the AIC typically selects more complex models than alternative approaches. Although ML estimation with different best-fit models results in incongruent tree topologies approximately 50% of the time, these differences are primarily attributable to alternative resolutions of poorly supported nodes. Furthermore, topologies and bootstrap values estimated with ML using alternative statistically supported models are more similar to each other than to topologies and bootstrap values estimated with ML under the Kimura two-parameter (K2P) model or maximum parsimony (MP). In addition, Swofford-Olsen-Waddell-Hillis (SOWH) tests indicate that ML trees estimated with alternative best-fit models are usually not significantly different from each other when evaluated with the same model. However, ML trees estimated with statistically supported models are often significantly suboptimal to ML trees made with the K2P model when both are evaluated with K2P, indicating that not all models perform in an equivalent manner. 
Nevertheless, the use of alternative statistically supported models generally does not affect tests of monophyletic relationships under either the Shimodaira-Hasegawa (S-H) or SOWH methods. Our results suggest that although choice in model selection has a strong impact on optimal tree topology, it rarely affects evolutionary inferences drawn from the data because differences are mainly confined to poorly supported nodes. Moreover, since ML with alternative best-fit models tends to produce more similar estimates of phylogeny than ML under the K2P model or MP, the use of any statistically based model-selection method is vastly preferable to forgoing the model-selection process altogether.

4.
The longitudinal spread of temperate organisms into refugial populations in Southern Europe is generally assumed to predate the last interglacial. However, few studies have attempted to quantify this process in nonmodel organisms using explicit models and multilocus data. We used sequence data for 20 intron-spanning loci (12 kb per individual) to resolve the history of refugial populations of a widespread western Palaearctic oak gall parasitoid Cecidostiba fungosa (Pteromalidae). Using maximum likelihood and Bayesian methods we assess alternative population tree topologies and estimate divergence times and ancestral population sizes under a model of divergence between three refugia (Middle East, Balkans and Iberia). Both methods support an “Out of the East” history for C. fungosa, matching the pattern previously inferred for their gallwasp hosts. However, coalescent-based estimates of the ages of population divides are much more recent (coinciding with the Eemian interglacial) than nodal ages of single gene trees for C. fungosa and other species. We also find that increasing the sample size from one haploid sequence per refugial population to three only marginally improves parameter estimates. Our results suggest that there is significant information in the minimal samples currently analyzable with maximum likelihood methods, and that similar methods could be applied to multiple species to test alternative models of assemblage evolution.

5.
Aim: Although parameter estimates are not as affected by spatial autocorrelation as Type I errors, the change from classical null hypothesis significance testing to model selection under an information theoretic approach does not completely avoid problems caused by spatial autocorrelation. Here we briefly review the model selection approach based on the Akaike information criterion (AIC) and present a new routine for Spatial Analysis in Macroecology (SAM) software that helps establish minimum adequate models in the presence of spatial autocorrelation.
Innovation: We illustrate how a model selection approach based on the AIC can be used in geographical data by modelling patterns of mammal species in South America represented in a grid system (n = 383) with 2° of resolution, as a function of five environmental explanatory variables, performing an exhaustive search of minimum adequate models considering three regression methods: non-spatial ordinary least squares (OLS), spatial eigenvector mapping and the autoregressive (lagged-response) model. The models selected by spatial methods included a smaller number of explanatory variables than the one selected by OLS, and minimum adequate models contain different explanatory variables, although model averaging revealed a similar rank of explanatory variables.
Main conclusions: We stress that the AIC is sensitive to the presence of spatial autocorrelation, generating unstable and overfitted minimum adequate models to describe macroecological data based on non-spatial OLS regression. Alternative regression techniques provided different minimum adequate models and have different uncertainty levels. Despite this, the averaged model based on Akaike weights generates consistent and robust results across different methods and may be the best approach for understanding macroecological patterns.
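
The Akaike weights used for model averaging follow directly from the AIC values of the candidate models. The sketch below uses the standard definitions, AIC = 2k − 2·lnL and w_i proportional to exp(−Δ_i/2), with Δ_i the difference from the lowest AIC. The function names and the numeric inputs are hypothetical and unrelated to the SAM routine described in the abstract.

```python
import math

def aic(log_lik, k):
    """Akaike information criterion for a model with k estimated parameters."""
    return 2.0 * k - 2.0 * log_lik

def akaike_weights(aic_values):
    """Relative support for each model, normalised to sum to 1."""
    best = min(aic_values)
    rel = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(rel)
    return [r / total for r in rel]

# Hypothetical log-likelihoods and parameter counts for three candidate models.
candidates = [(-50.0, 3), (-49.5, 4), (-49.4, 6)]
weights = akaike_weights([aic(ll, k) for ll, k in candidates])
```

An averaged coefficient is then the weight-weighted sum of that coefficient across the candidate models, counting it as zero in models that exclude the variable.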

6.
Understanding the mechanisms underlying the observed dynamics of complex biological systems requires the statistical assessment and comparison of multiple alternative models. Although this has traditionally been done using maximum likelihood-based methods such as Akaike's Information Criterion (AIC), Bayesian methods have gained in popularity because they provide more informative output in the form of posterior probability distributions. However, comparison between multiple models in a Bayesian framework is made difficult by the computational cost of numerical integration over large parameter spaces. A new, efficient method for the computation of posterior probabilities has recently been proposed and applied to complex problems from the physical sciences. Here we demonstrate how nested sampling can be used for inference and model comparison in biological sciences. We present a reanalysis of data from experimental infection of mice with Salmonella enterica showing the distribution of bacteria in liver cells. In addition to confirming the main finding of the original analysis, which relied on AIC, our approach provides: (a) integration across the parameter space, (b) estimation of the posterior parameter distributions (with visualisations of parameter correlations), and (c) estimation of the posterior predictive distributions for goodness-of-fit assessments of the models. The goodness-of-fit results suggest that alternative mechanistic models and a relaxation of the quasi-stationary assumption should be considered.

7.
Background and Aims: Zanthoxylum is the only pantropical genus within Rutaceae, with a few species native to temperate eastern Asia and North America. Efforts using Sanger sequencing failed to resolve the backbone phylogeny of Zanthoxylum. In this study, we employed target-enrichment high-throughput sequencing to improve resolution. Gene trees were examined for concordance and sectional classifications of Zanthoxylum were evaluated. Off-target reads were investigated to identify putative single-copy markers for bait refinement, and low-copy markers for evidence of putative hybridization events.
Methods: A custom bait set targeting 354 genes, with a median of 321 bp, was designed for Zanthoxylum and applied to 44 Zanthoxylum species and one Tetradium species as the outgroup. Illumina reads were processed via the HybPhyloMaker pipeline. Phylogenetic inferences were conducted using coalescent and maximum likelihood methods based on concatenated datasets. Concordance was assessed using quartet sampling. Additional phylogenetic analyses were performed on putative single and low-copy genes extracted from off-target reads.
Key Results: Four major clades are supported within Zanthoxylum: the African clade, the Z. asiaticum clade, the Asian–Pacific–Australian clade and the American–eastern Asian clade. While overall support has improved, regions of conflict are similar to those previously observed. Gene tree discordances indicate a hybridization event in the ancestor of the Hawaiian lineage, and incomplete lineage sorting in the American backbone. Off-target putative single-copy genes largely confirm on-target results, and putative low-copy genes provide additional evidence for hybridization in the Hawaiian lineage. Only two of the five sections of Zanthoxylum are resolved as monophyletic.
Conclusions: Target enrichment is suitable for assessing phylogenetic relationships in Zanthoxylum. Our phylogenetic analyses reveal that current sectional classifications need revision. Quartet tree concordance indicates several instances of reticulate evolution. Off-target reads prove useful for identifying additional phylogenetically informative regions for bait refinement or gene tree based approaches.

8.
An excess of nonsynonymous substitutions over synonymous ones is an important indicator of positive selection at the molecular level. A lineage that underwent Darwinian selection may have a nonsynonymous/synonymous rate ratio (dN/dS) that is different from those of other lineages or greater than one. In this paper, several codon-based likelihood models that allow for variable dN/dS ratios among lineages were developed. They were then used to construct likelihood ratio tests to examine whether the dN/dS ratio is variable among evolutionary lineages, whether the ratio for a few lineages of interest is different from the background ratio for other lineages in the phylogeny, and whether the dN/dS ratio for the lineages of interest is greater than one. The tests were applied to the lysozyme genes of 24 primate species. The dN/dS ratios were found to differ significantly among lineages, indicating that the evolution of primate lysozymes is episodic, which is incompatible with the neutral theory. Maximum-likelihood estimates of parameters suggested that about nine nonsynonymous and zero synonymous nucleotide substitutions occurred in the lineage leading to hominoids, and the dN/dS ratio for that lineage is significantly greater than one. The corresponding estimates for the lineage ancestral to colobine monkeys were nine and one, and the dN/dS ratio for the lineage is not significantly greater than one, although it is significantly higher than the background ratio. The likelihood analysis thus confirmed most, but not all, conclusions Messier and Stewart reached using reconstructed ancestral sequences to estimate synonymous and nonsynonymous rates for different lineages.
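
The likelihood ratio tests mentioned above compare nested models through twice the difference in their maximised log-likelihoods. The sketch below assumes the simplest case, a one-ratio model against a model with one extra dN/dS parameter for the lineages of interest, so that the statistic is compared with a chi-square distribution with one degree of freedom. The function names and log-likelihood values are hypothetical and do not come from the lysozyme analysis.

```python
import math

def lrt_statistic(lnl_null, lnl_alt):
    """Likelihood ratio statistic 2 * (lnL_alt - lnL_null) for nested models."""
    return 2.0 * (lnl_alt - lnl_null)

def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))

def lrt_pvalue_df1(lnl_null, lnl_alt):
    """P-value of the test when the models differ by exactly one parameter."""
    return chi2_sf_df1(lrt_statistic(lnl_null, lnl_alt))

# Hypothetical maximised log-likelihoods: one-ratio (null) vs two-ratio model.
p_value = lrt_pvalue_df1(-1043.8, -1041.7)
reject_null = p_value < 0.05
```

Models that differ by more than one parameter need the chi-square tail for the matching degrees of freedom, which this single-parameter sketch does not cover.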

9.
10.
Liang Hua, Wu Hulin, Zou Guohua. Biometrika, 2008, 95(3): 773-778
The conventional model selection criterion, the Akaike information criterion, AIC, has been applied to choose candidate models in mixed-effects models by the consideration of marginal likelihood. Vaida & Blanchard (2005) demonstrated that such a marginal AIC and its small sample correction are inappropriate when the research focus is on clusters. Correspondingly, these authors suggested the use of conditional AIC. Their conditional AIC is derived under the assumption that the variance-covariance matrix or scaled variance-covariance matrix of random effects is known. This note provides a general conditional AIC but without these strong assumptions. Simulation studies show that the proposed method is promising.

11.
Aim: Spatial autocorrelation is a frequent phenomenon in ecological data and can affect estimates of model coefficients and inference from statistical models. Here, we test the performance of three different simultaneous autoregressive (SAR) model types (spatial error = SARerr, lagged = SARlag and mixed = SARmix) and common ordinary least squares (OLS) regression when accounting for spatial autocorrelation in species distribution data using four artificial data sets with known (but different) spatial autocorrelation structures.
Methods: We evaluate the performance of SAR models by examining spatial patterns in model residuals (with correlograms and residual maps), by comparing model parameter estimates with true values, and by assessing their type I error control with calibration curves. We calculate a total of 3240 SAR models and illustrate how the best models [in terms of minimum residual spatial autocorrelation (minRSA), maximum model fit (R²), or Akaike information criterion (AIC)] can be identified using model selection procedures.
Results: Our study shows that the performance of SAR models depends on model specification (i.e. model type, neighbourhood distance, coding styles of spatial weights matrices) and on the kind of spatial autocorrelation present. SAR model parameter estimates might not be more precise than those from OLS regressions in all cases. SARerr models were the most reliable SAR models and performed well in all cases (independent of the kind of spatial autocorrelation induced and whether models were selected by minRSA, R² or AIC), whereas OLS, SARlag and SARmix models showed weak type I error control and/or unpredictable biases in parameter estimates.
Main conclusions: SARerr models are recommended for use when dealing with spatially autocorrelated species distribution data. SARlag and SARmix might not always give better estimates of model coefficients than OLS, and can thus generate bias. Other spatial modelling techniques should be assessed comprehensively to test their predictive performance and accuracy for biogeographical and macroecological research.

12.
Understanding how species traits evolved over time is central to comprehending the assembly rules that govern the phylogenetic structure of communities. The measurement of phylogenetic signal (PS) in ecologically relevant traits is a first step to understand phylogenetically structured community patterns. The different methods available to estimate PS make it difficult to choose which is most appropriate. Furthermore, alternative phylogenetic tree hypotheses, node resolution and clade age estimates might influence PS measurements. In this study, we evaluated to what extent these parameters affect different methods of PS analysis, and discuss advantages and disadvantages when selecting which method to use. We measured fruit/seed traits and flowering/fruiting phenology of endozoochoric species occurring in Southern Brazilian Araucaria forests and evaluated their PS using Mantel regressions, phylogenetic eigenvector regressions (PVR) and K statistic. Mantel regressions always gave less significant results compared to PVR and K statistic in all combinations of phylogenetic trees constructed. Moreover, a better phylogenetic resolution affected PS, independently of the method used to estimate it. Morphological seed traits tended to show higher PS than diaspore traits, while PS in flowering/fruiting phenology depended mostly on the method used to estimate it. This study demonstrates that different PS estimates are obtained depending on the chosen method and the phylogenetic tree resolution. This finding has implications for inferences on phylogenetic niche conservatism or ecological processes determining phylogenetic community structure.

13.
Model complexity in ecological niche modelling has been recently considered as an important issue that might affect model performance. New methodological developments have implemented the Akaike information criterion (AIC) to capture model complexity in the Maxent algorithm model. AIC is calculated based on the number of parameters and likelihoods of continuous raw outputs. ENMeval R package allows users to perform a species-specific tuning of Maxent settings running models with different combinations of regularization multiplier and feature classes and finally, all these models are compared using AIC corrected for small sample size. This approach is focused to find the “best” model parametrization and it is thought to maximize the model complexity and therefore, its predictability. We found that most niche modelling studies examined by us (68%) tend to consider AIC as a criterion of predictive accuracy in geographical distribution. In other words, AIC is used as a criterion to choose those models with the highest capacity to discriminate between presences and absences. However, the link between AIC and geographical predictive accuracy has not been tested so far. Here, we evaluated this relationship using a set of simulated (virtual) species. We created a set of nine virtual species with different ecological and geographical traits (e.g., niche position, niche breadth, range size) and generated different sets of true presences and absences data across geography. We built a set of models using Maxent algorithm with different regularization values and features schemes and calculated AIC values for each model. For each model, we obtained binary predictions using different threshold criteria and validated using independent presence and absences data. We correlated AIC values against standard validation metrics (e.g., Kappa, TSS) and the number of pixels correctly predicted as presences and absences. 
We did not find a correlation between AIC values and predictive accuracy from validation metrics. In general, those models with the lowest AIC values tend to generate geographical predictions with high commission and omission errors. The results were consistent across all species simulated. Finally, we suggest that AIC should not be used if users are interested in prediction more than explanation in ecological niche modelling.

14.
Most of the previous studies on vermicomposting have been conducted as small-scale (SS) lab trials using small quantities of waste mixtures. Efforts were made in this study to stabilize sewage sludge amended with sugarcane trash using a pilot-scale (PS) vermicomposting operation. Results of PS vermireactors were compared with SS trials in terms of quality of ready vermicompost and earthworm production rates. The results suggest a clear-cut difference between SS and PS in terms of waste mineralization rate and earthworm production. The waste mineralization rate in PS was significantly lower than in SS (P < 0.05). Total N and available P were higher in the end product from SS, while exchangeable cations (Ca2+ and K+) showed the reverse behavior during the process of waste stabilization. There was a significant difference between PS and SS in the metal remediation rate in end materials. The growth and reproduction pattern of Eisenia fetida was completely different in PS as compared to lab trials, i.e. SS. Probably, the distinct earthworm stocking density and microclimate conditions in SS and PS were responsible for the observed differences in waste mineralization rate and earthworm growth. This study suggests that results of SS laboratory trials may differ from those of PS field operations due to the distinct behavior of earthworms in field conditions. It is concluded that SS laboratory trials should be tested in the field at large scale in order to measure the feasibility of the technology for large-scale waste decomposition operations in open conditions.

15.

Background

Adaptive radiation in Mediterranean plants is poorly understood. The white-flowered Cistus lineage consists of 12 species primarily distributed in Mediterranean habitats and is herein subject to analysis.

Methodology/Principal Findings

We conducted a “total evidence” analysis combining nuclear (ncpGS, ITS) and plastid (trnL-trnF, trnK-matK, trnS-trnG, rbcL) DNA sequences and using MP and BI to test the hypothesis of radiation as suggested by previous phylogenetic results. One of the five well-supported lineages of the Cistus-Halimium complex, the white-flowered Cistus lineage, comprises the higher number of species (12) and is monophyletic. Molecular dating estimates a Mid Pleistocene (1.04±0.25 Ma) diversification of the white-flowered lineage into two groups (C. clusii and C. salviifolius lineages), which display asymmetric characteristics: number of species (2 vs. 10), leaf morphologies (linear vs. linear to ovate), floral characteristics (small, three-sepalled vs. small to large, three- or five-sepalled flowers) and ecological attributes (low-land vs. low-land to mountain environments). A positive phenotype-environment correlation has been detected by historical reconstructions of morphological traits (leaf shape, leaf labdanum content and leaf pubescence). Ecological evidence indicates that modifications of leaf shape and size, coupled with differences in labdanum secretion and pubescence density, appear to be related to success of new species in different Mediterranean habitats.

Conclusions/Significance

The observation that radiation in the Cistus salviifolius lineage has been accompanied by the emergence of divergent leaf traits (such as shape, pubescence and labdanum secretion) in different environments suggests that radiation in the group has been adaptive. Here we argue that the diverse ecological conditions of Mediterranean habitats played a key role in directing the evolution of alternative leaf strategies in this plant group. Key innovation of morphological characteristics is supported by our dated phylogeny, in which a Mediterranean climate establishment (2.8 Ma) predated the adaptive radiation of the white-flowered Cistus.

16.
MOTIVATION: There are often many alternative models of a biochemical system. Distinguishing models and finding the most suitable ones is an important challenge in Systems Biology, as such model ranking, by experimental evidence, will help to judge the support of the working hypotheses forming each model. Bayes factors are employed as a measure of evidential preference for one model over another. Marginal likelihood is a key component of Bayes factors; however, computing the marginal likelihood is a difficult problem, as it involves integration of nonlinear functions in multidimensional space. There are a number of methods available to compute the marginal likelihood approximately. A detailed investigation of such methods is required to find ones that perform appropriately for biochemical modelling. RESULTS: We assess four methods for estimation of the marginal likelihoods required for computing Bayes factors. The Prior Arithmetic Mean estimator, the Posterior Harmonic Mean estimator, the Annealed Importance Sampling and the Annealing-Melting Integration methods are investigated and compared on a typical case study in Systems Biology. This allows us to understand the stability of the analysis results and make reliable judgements in uncertain context. We investigate the variance of Bayes factor estimates, and highlight the stability of the Annealed Importance Sampling and the Annealing-Melting Integration methods for the purposes of comparing nonlinear models. AVAILABILITY: Models used in this study are available in SBML format as the supplementary material to this article.

17.
18.
DNA sequence comparisons of two mitochondrial DNA genes were used to infer phylogenetic relationships among four species of mullids. Approximately 238 bp of the mitochondrial 16S ribosomal RNA (rRNA) and 261 bp of the cytochrome b (cytb) genes were sequenced from representatives of three mullid genera (Mullus, Upeneus, Pseudopeneus), present in the Mediterranean Sea. Trees were constructed using three methods: maximum likelihood (ML), neighbor joining (NJ) and parsimony (MP). The results of the analyses of these data together with published data of the same mtDNA segments of two other perciform species (Sparus aurata, Perca fluviatilis), support the previous taxonomic classification of the three genera examined, as well as the classification of the two red mullet species in the same genus.

19.

Background

The complex history of Southeast Asian islands has long been of interest to biogeographers. Dispersal and vicariance events in the Pleistocene have received the most attention, though recent studies suggest a potentially more ancient history to components of the terrestrial fauna. Among this fauna is the enigmatic archaeobatrachian frog genus Barbourula, which only occurs on the islands of Borneo and Palawan. We utilize this lineage to gain unique insight into the temporal history of lineage diversification in Southeast Asian islands.

Methodology/Principal Findings

Using mitochondrial and nuclear genetic data, multiple fossil calibration points, and likelihood and Bayesian methods, we estimate phylogenetic relationships and divergence times for Barbourula. We determine the sensitivity of focal divergence times to specific calibration points by jackknife approach in which each calibration point is excluded from analysis. We find that relevant divergence time estimates are robust to the exclusion of specific calibration points. Barbourula is recovered as a monophyletic lineage nested within a monophyletic Costata. Barbourula diverged from its sister taxon Bombina in the Paleogene and the two species of Barbourula diverged in the Late Miocene.

Conclusions/Significance

The divergences within Barbourula and between it and Bombina are surprisingly old and represent the oldest estimates for a cladogenetic event resulting in living taxa endemic to Southeast Asian islands. Moreover, these divergence time estimates are consistent with a new biogeographic scenario: the Palawan Ark Hypothesis. We suggest that components of Palawan's terrestrial fauna might have “rafted” on emergent portions of the North Palawan Block during its migration from the Asian mainland to its present-day position near Borneo. Further, dispersal from Palawan to Borneo (rather than Borneo to Palawan) may explain the current day disjunct distribution of this ancient lineage.

20.
Species tolerances are frequently used in multi-metric ecological quality indices, and typically have the strongest responses to disturbances. Usually the tolerances of many species are based on expert judgment, with little support from empirical ecological or physiological data. This is particularly true for fish of Mediterranean-type rivers, in which there are many basin-endemic taxa with little information on basic life history traits. In addition, the apparent tolerance of native Mediterranean freshwater fish species to naturally harsh environments and their short-term resilience may mask responses to man-made pressures. Consequently, we evaluated different statistical techniques and procedures for quantifying Mediterranean lotic fish tolerances and compared expert judgment of species tolerances with empirically determined tolerance values. We used eight alternative approaches to compute fish tolerance values for the Mediterranean basins of SW Europe. Three types of approaches were used: (1) those based on the concept of niche breadth along an environment/pressure gradient (five models); (2) those based on deviations from expected values at disturbed sites as predicted by statistical models describing relationships between species and environmental variables (generalized linear modelling (GLM) and generalized additive modelling (GAM), two models); and (3) one model based on the relatively independent contributions of pressure variables to the data variation explained by statistical models. Tolerance estimates based on the used/available pressure gradient and the average general pressure value had the highest mean correlations with the expert judgment classification (mean r = 0.4) and with the other approaches (mean r of 0.48 and 0.46, respectively). The high degree of uncertainty in tolerance estimates should be accounted for when applying them in ecological assessments. 
The results also highlight the need for better designed research to separate effects of natural and disturbance gradients on species occurrences and densities.
