Similar literature
 20 similar articles found; search took 31 ms
1.
Maximum likelihood and Bayesian approaches are presented for analyzing hierarchical statistical models of natural selection operating on DNA polymorphism within a panmictic population. For analyzing Bayesian models, we present Markov chain Monte Carlo (MCMC) methods for sampling from the joint posterior distribution of parameters. For frequentist analysis, an Expectation-Maximization (EM) algorithm is presented for finding the maximum likelihood estimate of the genome-wide mean and variance in selection intensity among classes of mutations. The framework presented here provides an ideal setting for modeling mutations dispersed through the genome and, in particular, for the analysis of how natural selection operates on different classes of single nucleotide polymorphisms (SNPs).

2.
Recent developments in marginal likelihood estimation for model selection in the field of Bayesian phylogenetics and molecular evolution have emphasized the poor performance of the harmonic mean estimator (HME). Although these studies have shown the merits of new approaches applied to standard normally distributed examples and small real-world data sets, not much is currently known concerning the performance and computational issues of these methods when fitting complex evolutionary and population genetic models to empirical real-world data sets. Further, these approaches have not yet seen widespread application in the field due to the lack of implementations of these computationally demanding techniques in commonly used phylogenetic packages. We here investigate the performance of some of these new marginal likelihood estimators, specifically, path sampling (PS) and stepping-stone (SS) sampling for comparing models of demographic change and relaxed molecular clocks, using synthetic data and real-world examples for which unexpected inferences were made using the HME. Given the drastically increased computational demands of PS and SS sampling, we also investigate a posterior simulation-based analogue of Akaike's information criterion (AIC) through Markov chain Monte Carlo (MCMC), a model comparison approach that shares with the HME the appealing feature of having a low computational overhead over the original MCMC analysis. We confirm that the HME systematically overestimates the marginal likelihood and fails to yield reliable model classification and show that the AICM performs better and may be a useful initial evaluation of model choice but that it is also, to a lesser degree, unreliable. We show that PS and SS sampling substantially outperform these estimators and adjust the conclusions made concerning previous analyses for the three real-world data sets that we reanalyzed. 
The methods used in this article are now available in BEAST, a powerful user-friendly software package to perform Bayesian evolutionary analyses.

3.
The molecular clock provides a powerful way to estimate species divergence times. If information on some species divergence times is available from the fossil or geological record, it can be used to calibrate a phylogeny and estimate divergence times for all nodes in the tree. The Bayesian method provides a natural framework to incorporate different sources of information concerning divergence times, such as information in the fossil and molecular data. Current models of sequence evolution are intractable in a Bayesian setting, and Markov chain Monte Carlo (MCMC) is used to generate the posterior distribution of divergence times and evolutionary rates. This method is computationally expensive, as it involves the repeated calculation of the likelihood function. Here, we explore the use of Taylor expansion to approximate the likelihood during MCMC iteration. The approximation is much faster than conventional likelihood calculation. However, the approximation is expected to be poor when the proposed parameters are far from the likelihood peak. We explore the use of parameter transforms (square root, logarithm, and arcsine) to improve the approximation to the likelihood curve. We found that the new methods, particularly the arcsine-based transform, provided very good approximations under relaxed clock models and also under the global clock model when the global clock is not seriously violated. The approximation is poorer for analysis under the global clock when the global clock is seriously wrong and should thus not be used. The results suggest that the approximate method may be useful for Bayesian dating analysis using large data sets.

4.
In this paper, we study Bayesian analysis of nonlinear hierarchical mixture models with a finite but unknown number of components. Our approach is based on Markov chain Monte Carlo (MCMC) methods. One of the applications of our method is directed to the clustering problem in gene expression analysis. From a mathematical and statistical point of view, we discuss the following topics: theoretical and practical convergence problems of the MCMC method; determination of the number of components in the mixture; and computational problems associated with likelihood calculations. In the existing literature, these problems have mainly been addressed in the linear case. One of the main contributions of this paper is developing a method for the nonlinear case. Our approach is based on a combination of methods including Gibbs sampling, random permutation sampling, birth-death MCMC, and Kullback-Leibler distance.

5.
Bayesian spatial modeling of haplotype associations   Cited by: 9 (self-citations: 0; citations by others: 9)
We review methods for relating the risk of disease to a collection of single nucleotide polymorphisms (SNPs) within a small region. Association studies using case-control designs with unrelated individuals could be used either to test for a direct effect of a candidate gene and characterize the responsible variant(s), or to fine map an unknown gene by exploiting the pattern of linkage disequilibrium (LD). We consider a flexible class of logistic penetrance models based on haplotypes and compare them with an alternative formulation based on unphased multilocus genotypes. The likelihood for haplotype-based models requires summation over all possible haplotype assignments consistent with the observed genotype data, and can be fitted using either Expectation-Maximization (E-M) or Markov chain Monte Carlo (MCMC) methods. Subtleties involving ascertainment correction for case-control studies are discussed. There has been great interest in methods for LD mapping based on the coalescent or ancestral recombination graphs as well as methods based on haplotype sharing, both of which we review briefly. Because of their computational complexity, we propose some alternative empirical modeling approaches using techniques borrowed from the Bayesian spatial statistics literature. Here, space is interpreted in terms of a distance metric describing the similarity of any pair of haplotypes to each other, and hence their presumed common ancestry. Specifically, we discuss the conditional autoregressive model and two spatial clustering models: Potts and Voronoi. We conclude with a discussion of the implications of these methods for modeling cryptic relatedness, haplotype blocks, and haplotype tagging SNPs, and suggest a Bayesian framework for the HapMap project.

6.
Zhu X, Zhang S, Tang H, Cooper R. Human Genetics, 2006, 120(3): 431-445
Several disease-mapping methods have been proposed recently, which use the information generated by recent admixture of populations from historically distinct geographic origins. These methods include both classic likelihood and Bayesian approaches. In this study we directly maximize the likelihood function from the hidden Markov model for admixture mapping using the EM algorithm, allowing for uncertainty in model parameters, such as the allele frequencies in the parental populations. We determined the robustness of the proposed method by examining the ancestral allele frequency estimate and individual marker-location specific ancestry when the data were generated by different population admixture models and no learning sample was used. The proposed method outperforms a widely used Bayesian MCMC strategy for data generated from various population admixture models. The multipoint information content for ancestry was derived based on the map provided by Smith et al. (2004) and the associated statistical power was calculated. We examined the distribution of admixture LD across the genome for both real and simulated data and established a threshold for genome-wide significance applicable to admixture mapping studies. The software ADMIXPROGRAM for performing admixture mapping is available from the authors.

7.
Approximate Bayesian computation in population genetics   Cited by: 23 (self-citations: 0; citations by others: 23)
Beaumont MA, Zhang W, Balding DJ. Genetics, 2002, 162(4): 2025-2035
We propose a new method for approximate Bayesian statistical inference on the basis of summary statistics. The method is suited to complex problems that arise in population genetics, extending ideas developed in this setting by earlier authors. Properties of the posterior distribution of a parameter, such as its mean or density curve, are approximated without explicit likelihood calculations. This is achieved by fitting a local-linear regression of simulated parameter values on simulated summary statistics, and then substituting the observed summary statistics into the regression equation. The method combines many of the advantages of Bayesian statistical inference with the computational efficiency of methods based on summary statistics. A key advantage of the method is that the nuisance parameters are automatically integrated out in the simulation step, so that the large numbers of nuisance parameters that arise in population genetics problems can be handled without difficulty. Simulation results indicate computational and statistical efficiency that compares favorably with those of alternative methods previously proposed in the literature. We also compare the relative efficiency of inferences obtained using methods based on summary statistics with those obtained directly from the data using MCMC.
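
The regression step described in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it handles one parameter and one summary statistic, uses an Epanechnikov kernel whose bandwidth is the largest accepted distance, and runs on a toy Gaussian model. The names `abc_local_linear`, `prior_draw`, and `simulate_summary` are hypothetical.

```python
import random


def abc_local_linear(s_obs, prior_draw, simulate_summary, n_sims, accept_frac, rng):
    """Regression-adjusted ABC for one parameter and one summary statistic.

    Returns (posterior_mean_estimate, adjusted_draws, weights).
    """
    # 1. Simulate (theta, s) pairs from the prior and the model.
    sims = []
    for _ in range(n_sims):
        theta = prior_draw(rng)
        s = simulate_summary(theta, rng)
        sims.append((abs(s - s_obs), theta, s - s_obs))

    # 2. Keep the fraction of simulations closest to the observed summary.
    sims.sort(key=lambda t: t[0])
    kept = sims[: max(3, int(accept_frac * n_sims))]
    delta = kept[-1][0]  # bandwidth (assumption: largest accepted distance)
    if delta == 0.0:
        raise ValueError("all accepted summaries equal s_obs; nothing to regress on")

    # 3. Epanechnikov weights: largest at s_obs, zero at distance delta.
    w = [1.0 - (d / delta) ** 2 for d, _, _ in kept]
    sw = sum(w)
    x_bar = sum(wi * x for wi, (_, _, x) in zip(w, kept)) / sw
    y_bar = sum(wi * th for wi, (_, th, _) in zip(w, kept)) / sw

    # 4. Weighted least squares of theta on (s - s_obs).
    sxx = sum(wi * (x - x_bar) ** 2 for wi, (_, _, x) in zip(w, kept))
    sxy = sum(wi * (x - x_bar) * (th - y_bar) for wi, (_, th, x) in zip(w, kept))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # fitted value at s = s_obs

    # 5. Substitute the observed summary: shift each accepted draw along the line.
    adjusted = [th - slope * x for _, th, x in kept]
    return intercept, adjusted, w


# Toy model (assumption): theta ~ N(0, 10^2), summary s | theta ~ N(theta, 0.3^2).
post_mean, draws, weights = abc_local_linear(
    s_obs=2.0,
    prior_draw=lambda r: r.gauss(0.0, 10.0),
    simulate_summary=lambda theta, r: r.gauss(theta, 0.3),
    n_sims=20000,
    accept_frac=0.05,
    rng=random.Random(1),
)
```

In this toy model the exact posterior mean is very close to 2.0, so `post_mean` should land near that value; the weighted `draws` approximate the posterior without any likelihood evaluation.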

8.
Methods for Bayesian inference of phylogeny using DNA sequences based on Markov chain Monte Carlo (MCMC) techniques allow the incorporation of arbitrarily complex models of the DNA substitution process, and other aspects of evolution. This has increased the realism of models, potentially improving the accuracy of the methods, and is largely responsible for their recent popularity. Another consequence of the increased complexity of models in Bayesian phylogenetics is that these models have, in several cases, become overparameterized. In such cases, some parameters of the model are not identifiable; different combinations of nonidentifiable parameters lead to the same likelihood, making it impossible to decide among the potential parameter values based on the data. Overparameterized models can also slow the rate of convergence of MCMC algorithms due to large negative correlations among parameters in the posterior probability distribution. Functions of parameters can sometimes be found, in overparameterized models, that are identifiable, and inferences based on these functions are legitimate. Examples are presented of overparameterized models that have been proposed in the context of several Bayesian methods for inferring the relative ages of nodes in a phylogeny when the substitution rate evolves over time.

9.
Integrative analyses based on statistically relevant associations between genomics and a wealth of intermediary phenotypes (such as imaging) provide vital insights into their clinical relevance in terms of the disease mechanisms. Estimates for uncertainty in the resulting integrative models are however unreliable unless inference accounts for the selection of these associations with accuracy. In this paper, we develop selection-aware Bayesian methods, which (1) counteract the impact of model selection bias through a “selection-aware posterior” in a flexible class of integrative Bayesian models post a selection of promising variables via ℓ1-regularized algorithms; (2) strike an inevitable trade-off between the quality of model selection and inferential power when the same data set is used for both selection and uncertainty estimation. Central to our methodological development, a carefully constructed conditional likelihood function deployed with a reparameterization mapping provides tractable updates when gradient-based Markov chain Monte Carlo (MCMC) sampling is used for estimating uncertainties from the selection-aware posterior. Applying our methods to a radiogenomic analysis, we successfully recover several important gene pathways and estimate uncertainties for their associations with patient survival times.

10.
Bayesian phylogenetic methods are generating noticeable enthusiasm in the field of molecular systematics. Many phylogenetic models are often at stake, and different approaches are used to compare them within a Bayesian framework. The Bayes factor, defined as the ratio of the marginal likelihoods of two competing models, plays a key role in Bayesian model selection. We focus on an alternative estimator of the marginal likelihood whose computation is still a challenging problem. Several computational solutions have been proposed, none of which can be considered to outperform the others simultaneously in terms of simplicity of implementation, computational burden and precision of the estimates. Practitioners and researchers, often led by available software, have so far favored the simplicity of the harmonic mean (HM) estimator. However, it is known that the resulting estimates of the Bayesian evidence in favor of one model are biased and often inaccurate, up to having an infinite variance so that the reliability of the corresponding conclusions is doubtful. We consider possible improvements of the generalized harmonic mean (GHM) idea that recycle Markov Chain Monte Carlo (MCMC) simulations from the posterior, share the computational simplicity of the original HM estimator, but, unlike it, overcome the infinite variance issue. We show reliability and comparative performance of the improved harmonic mean estimators comparing them to approximation techniques relying on improved variants of the thermodynamic integration.
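
To make the quantities in this abstract concrete, here is a small sketch of the plain harmonic mean estimator and the log Bayes factor. It covers only the original HM estimator, not the improved GHM variants the abstract proposes, and the function names are hypothetical. It assumes the input is a list of log-likelihood values evaluated at posterior MCMC draws.

```python
import math


def log_marginal_likelihood_hm(log_likelihoods):
    """Harmonic mean estimate of log p(D | M) from posterior log-likelihood values.

    The estimator is 1 / mean(1 / L_i); it is computed on the log scale with
    the log-sum-exp trick to avoid overflow.
    """
    n = len(log_likelihoods)
    neg = [-ll for ll in log_likelihoods]
    m = max(neg)
    log_mean_inverse = m + math.log(sum(math.exp(v - m) for v in neg)) - math.log(n)
    return -log_mean_inverse


def log_bayes_factor(log_ml_model_1, log_ml_model_2):
    """Log of the ratio of two marginal likelihoods."""
    return log_ml_model_1 - log_ml_model_2


# One draw with a very low likelihood dominates the estimate, which is the
# practical face of the instability the abstract describes.
stable = log_marginal_likelihood_hm([-1.0, -1.0, -1.0, -1.0])
dominated = log_marginal_likelihood_hm([-1.0, -1.0, -1.0, -50.0])
```

Here `stable` equals -1, while `dominated` is pulled almost all the way to the single outlying value, so a handful of rare draws can decide a model comparison.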

11.
Li Z, Sillanpää MJ. Genetics, 2012, 190(1): 231-249
Bayesian hierarchical shrinkage methods have been widely used for quantitative trait locus mapping. From the computational perspective, the application of the Markov chain Monte Carlo (MCMC) method is not optimal for high-dimensional problems such as the ones arising in epistatic analysis. Maximum a posteriori (MAP) estimation can be a faster alternative, but it usually produces only point estimates without providing any measures of uncertainty (i.e., interval estimates). The variational Bayes method, stemming from the mean field theory in theoretical physics, is regarded as a compromise between MAP and MCMC estimation, which can be efficiently computed and produces the uncertainty measures of the estimates. Furthermore, variational Bayes methods can be regarded as the extension of traditional expectation-maximization (EM) algorithms and can be applied to a broader class of Bayesian models. Thus, the use of variational Bayes algorithms based on three hierarchical shrinkage models including Bayesian adaptive shrinkage, Bayesian LASSO, and extended Bayesian LASSO is proposed here. These methods performed generally well and were found to be highly competitive with their MCMC counterparts in our example analyses. The use of posterior credible intervals and permutation tests is considered for decision making between quantitative trait loci (QTL) and non-QTL. The performance of the presented models is also compared with R/qtlbim and R/BhGLM packages, using a previously studied simulated public epistatic data set.

12.
We introduce a new statistical computing method, called data cloning, to calculate maximum likelihood estimates and their standard errors for complex ecological models. Although the method uses the Bayesian framework and exploits the computational simplicity of the Markov chain Monte Carlo (MCMC) algorithms, it provides valid frequentist inferences such as the maximum likelihood estimates and their standard errors. The inferences are completely invariant to the choice of the prior distributions and therefore avoid the inherent subjectivity of the Bayesian approach. The data cloning method is easily implemented using standard MCMC software. Data cloning is particularly useful for analysing ecological situations in which hierarchical statistical models, such as state-space models and mixed effects models, are appropriate. We illustrate the method by fitting two nonlinear population dynamics models to data in the presence of process and observation noise.
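
The idea behind data cloning can be seen in a toy case where the posterior is available in closed form, so no MCMC is needed. The sketch below is an illustration under assumptions, not the authors' procedure: it uses a binomial likelihood with a conjugate Beta prior, and the function names and default prior are hypothetical. Repeating the data `clones` times makes the posterior mean approach the maximum likelihood estimate, and `clones` times the posterior variance approach the variance of that estimate.

```python
def cloned_beta_posterior(successes, trials, clones, prior_a=2.0, prior_b=2.0):
    """Posterior mean and variance when the binomial data set is repeated `clones` times."""
    a = prior_a + clones * successes
    b = prior_b + clones * (trials - successes)
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1.0))
    return mean, var


def data_cloning_estimate(successes, trials, clones, prior_a=2.0, prior_b=2.0):
    """Approximate MLE and its standard error from the cloned posterior."""
    mean, var = cloned_beta_posterior(successes, trials, clones, prior_a, prior_b)
    return mean, (clones * var) ** 0.5


# 7 successes in 20 trials: the MLE is 7/20 with standard error sqrt(p(1-p)/n).
uncloned = data_cloning_estimate(7, 20, clones=1, prior_a=50.0, prior_b=1.0)
cloned = data_cloning_estimate(7, 20, clones=10000, prior_a=50.0, prior_b=1.0)
```

With a deliberately strong prior, `uncloned` sits far from 0.35, while `cloned` is essentially the maximum likelihood answer, which illustrates the invariance to the prior that the abstract claims.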

13.
Bayesian flux balance analysis applied to a skeletal muscle metabolic model   Cited by: 1 (self-citations: 0; citations by others: 1)
In this article, the steady state condition for the multi-compartment models for cellular metabolism is considered. The problem is to estimate the reaction and transport fluxes, as well as the concentrations in venous blood when the stoichiometry and bound constraints for the fluxes and the concentrations are given. The problem has been addressed previously by a number of authors, and optimization-based approaches as well as extreme pathway analysis have been proposed. These approaches are briefly discussed here. The main emphasis of this work is a Bayesian statistical approach to the flux balance analysis (FBA). We show how the bound constraints and optimality conditions such as maximizing the oxidative phosphorylation flux can be incorporated into the model in the Bayesian framework by proper construction of the prior densities. We propose an effective Markov chain Monte Carlo (MCMC) scheme to explore the posterior densities, and compare the results with those obtained via the previously studied linear programming (LP) approach. The proposed methodology, which is applied here to a two-compartment model for skeletal muscle metabolism, can be extended to more complex models.

14.
The fate of scientific hypotheses often relies on the ability of a computational model to explain the data, quantified in modern statistical approaches by the likelihood function. The log-likelihood is the key element for parameter estimation and model evaluation. However, the log-likelihood of complex models in fields such as computational biology and neuroscience is often intractable to compute analytically or numerically. In those cases, researchers can often only estimate the log-likelihood by comparing observed data with synthetic observations generated by model simulations. Standard techniques to approximate the likelihood via simulation either use summary statistics of the data or are at risk of producing substantial biases in the estimate. Here, we explore another method, inverse binomial sampling (IBS), which can estimate the log-likelihood of an entire data set efficiently and without bias. For each observation, IBS draws samples from the simulator model until one matches the observation. The log-likelihood estimate is then a function of the number of samples drawn. The variance of this estimator is uniformly bounded, achieves the minimum variance for an unbiased estimator, and we can compute calibrated estimates of the variance. We provide theoretical arguments in favor of IBS and an empirical assessment of the method for maximum-likelihood estimation with simulation-based models. As case studies, we take three model-fitting problems of increasing complexity from computational and cognitive neuroscience. In all problems, IBS generally produces lower error in the estimated parameters and maximum log-likelihood values than alternative sampling methods with the same average number of samples. Our results demonstrate the potential of IBS as a practical, robust, and easy-to-implement method for log-likelihood evaluation when exact techniques are not available.
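
The sampling loop described in this abstract is short enough to sketch. The abstract does not state the formula, so the code assumes the standard IBS estimator: if the first matching sample is the K-th draw, the contribution to the log-likelihood is minus the sum of 1/k for k from 1 to K-1. The simulator interface is simplified and hypothetical: `simulate(rng)` returns one synthetic observation and ignores trial-specific inputs.

```python
import math
import random


def ibs_log_likelihood(observations, simulate, rng, max_draws=1_000_000):
    """Inverse binomial sampling estimate of the log-likelihood of a data set."""
    total = 0.0
    for obs in observations:
        k = 1  # index of the current draw for this observation
        while simulate(rng) != obs:
            total -= 1.0 / k  # a miss on draw k contributes -1/k
            k += 1
            if k > max_draws:
                raise RuntimeError("no matching sample within max_draws")
    return total


# Toy check (assumption): a coin-flip simulator with P(obs = 1) = 0.5, so the
# true log-likelihood of a single observation equal to 1 is log(0.5).
rng = random.Random(0)
estimates = [
    ibs_log_likelihood([1], lambda r: int(r.random() < 0.5), rng)
    for _ in range(20000)
]
average_estimate = sum(estimates) / len(estimates)
true_value = math.log(0.5)
```

Each single estimate is noisy, but `average_estimate` should sit close to `true_value`, which is what unbiasedness means in practice.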

15.
Wu CH, Drummond AJ. Genetics, 2011, 188(1): 151-164
We provide a framework for Bayesian coalescent inference from microsatellite data that enables inference of population history parameters averaged over microsatellite mutation models. To achieve this we first implemented a rich family of microsatellite mutation models and related components in the software package BEAST. BEAST is a powerful tool that performs Bayesian MCMC analysis on molecular data to make coalescent and evolutionary inferences. Our implementation permits the application of existing nonparametric methods to microsatellite data. The implemented microsatellite models are based on the replication slippage mechanism and focus on three properties of microsatellite mutation: length dependency of mutation rate, mutational bias toward expansion or contraction, and number of repeat units changed in a single mutation event. We develop a new model that facilitates microsatellite model averaging and Bayesian model selection by transdimensional MCMC. With Bayesian model averaging, the posterior distributions of population history parameters are integrated across a set of microsatellite models and thus account for model uncertainty. Simulated data are used to evaluate our method in terms of accuracy and precision of estimation and also identification of the true mutation model. Finally we apply our method to a red colobus monkey data set as an example.

16.
The objective of this study was to obtain a quantitative assessment of the monophyly of morning glory taxa, specifically the genus Ipomoea and the tribe Argyreieae. Previous systematic studies of morning glories intimated the paraphyly of Ipomoea by suggesting that the genera within the tribe Argyreieae are derived from within Ipomoea; however, no quantitative estimates of statistical support were developed to address these questions. We applied a Bayesian analysis to provide quantitative estimates of monophyly in an investigation of morning glory relationships using DNA sequence data. We also explored various approaches for examining convergence of the Markov chain Monte Carlo (MCMC) simulation of the Bayesian analysis by running 18 separate analyses varying in length. We found convergence of the important components of the phylogenetic model (the tree with the maximum posterior probability, branch lengths, the parameter values from the DNA substitution model, and the posterior probabilities for clade support) for these data after one million generations of the MCMC simulations. In the process, we identified a run where the parameter values obtained were often outside the range of values obtained from the other runs, suggesting an aberrant result. In addition, we compared the Bayesian method of phylogenetic analysis to maximum likelihood and maximum parsimony. The results from the Bayesian analysis and the maximum likelihood analysis were similar for topology, branch lengths, and parameters of the DNA substitution model. Topologies also were similar in the comparison between the Bayesian analysis and maximum parsimony, although the posterior probabilities and the bootstrap proportions exhibited some striking differences. 
In a Bayesian analysis of three data sets (ITS sequences, waxy sequences, and ITS + waxy sequences) no support for the monophyly of the genus Ipomoea, or for the tribe Argyreieae, was observed, with the estimate of the probability of the monophyly of these taxa being less than 3.4 × 10⁻⁷.

17.
Ion channels are characterized by inherently stochastic behavior which can be represented by continuous-time Markov models (CTMM). Although methods for collecting data from single ion channels are available, translating a time series of open and closed channels to a CTMM remains a challenge. Bayesian statistics combined with Markov chain Monte Carlo (MCMC) sampling provide means for estimating the rate constants of a CTMM directly from single channel data. In this article, different approaches for the MCMC sampling of Markov models are combined. This method, new to our knowledge, detects overparameterizations and gives more accurate results than existing MCMC methods. Its performance is similar to that of QuB-MIL, which indicates that it also compares well with maximum likelihood estimators. Data collected from an inositol trisphosphate receptor are used to demonstrate how the best model for a given data set can be found in practice.

18.
Improved efficiency of Markov chain Monte Carlo facilitates all aspects of statistical analysis with Bayesian hierarchical models. Identifying strategies to improve MCMC performance is becoming increasingly crucial as the complexity of models, and the run times to fit them, increases. We evaluate different strategies for improving MCMC efficiency using the open‐source software NIMBLE (R package nimble) using common ecological models of species occurrence and abundance as examples. We ask how MCMC efficiency depends on model formulation, model size, data, and sampling strategy. For multiseason and/or multispecies occupancy models and for N‐mixture models, we compare the efficiency of sampling discrete latent states vs. integrating over them, including more vs. fewer hierarchical model components, and univariate vs. block‐sampling methods. We include the common MCMC tool JAGS in comparisons. For simple models, there is little practical difference between computational approaches. As model complexity increases, there are strong interactions between model formulation and sampling strategy on MCMC efficiency. There is no one‐size‐fits‐all best strategy, but rather problem‐specific best strategies related to model structure and type. In all but the simplest cases, NIMBLE's default or customized performance achieves much higher efficiency than JAGS. In the two most complex examples, NIMBLE was 10–12 times more efficient than JAGS. We find NIMBLE is a valuable tool for many ecologists utilizing Bayesian inference, particularly for complex models where JAGS is prohibitively slow. Our results highlight the need for more guidelines and customizable approaches to fit hierarchical models to ensure practitioners can make the most of occupancy and other hierarchical models. 
By implementing model‐generic MCMC procedures in open‐source software, including the NIMBLE extensions for integrating over latent states (implemented in the R package nimbleEcology), we have made progress toward this aim.

19.
Bayesian inference is becoming a common statistical approach to phylogenetic estimation because, among other reasons, it allows for rapid analysis of large data sets with complex evolutionary models. Conveniently, Bayesian phylogenetic methods use currently available stochastic models of sequence evolution. However, as with other model-based approaches, the results of Bayesian inference are conditional on the assumed model of evolution: inadequate models (models that poorly fit the data) may result in erroneous inferences. In this article, I present a Bayesian phylogenetic method that evaluates the adequacy of evolutionary models using posterior predictive distributions. By evaluating a model's posterior predictive performance, an adequate model can be selected for a Bayesian phylogenetic study. Although I present a single test statistic that assesses the overall (global) performance of a phylogenetic model, a variety of test statistics can be tailored to evaluate specific features (local performance) of evolutionary models to identify sources of failure. The method presented here, unlike the likelihood-ratio test and parametric bootstrap, accounts for uncertainty in the phylogeny and model parameters.

20.
Targeted maximum likelihood estimation is a versatile tool for estimating parameters in semiparametric and nonparametric models. We work through an example applying targeted maximum likelihood methodology to estimate the parameter of a marginal structural model. In the case we consider, we show how this can be easily done by clever use of standard statistical software. We point out differences between targeted maximum likelihood estimation and other approaches (including estimating function based methods). The application we consider is to estimate the effect of adherence to antiretroviral medications on virologic failure in HIV positive individuals.
