Similar Articles (20 results)
1.
Bayesian inference in ecology
Bayesian inference is an important statistical tool that is increasingly being used by ecologists. In a Bayesian analysis, information available before a study is conducted is summarized in a quantitative model or hypothesis: the prior probability distribution. Bayes’ Theorem uses the prior probability distribution and the likelihood of the data to generate a posterior probability distribution. Posterior probability distributions are an epistemological alternative to P‐values and provide a direct measure of the degree of belief that can be placed on models, hypotheses, or parameter estimates. Moreover, Bayesian information‐theoretic methods provide robust measures of the probability of alternative models, and multiple models can be averaged into a single model that reflects uncertainty in model construction and selection. These methods are demonstrated through a simple worked example. Ecologists are using Bayesian inference in studies that range from predicting single‐species population dynamics to understanding ecosystem processes. Not all ecologists, however, appreciate the philosophical underpinnings of Bayesian inference. In particular, Bayesians and frequentists differ in their definition of probability and in their treatment of model parameters as random variables or estimates of true values. These assumptions must be addressed explicitly before deciding whether or not to use Bayesian methods to analyse ecological data.
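The prior-to-posterior update described in this abstract can be illustrated with a minimal conjugate sketch (an illustration, not the paper's own worked example): a Beta prior on a detection probability updated by binomial survey data.

```python
# Beta(a, b) prior on an occupancy probability p, updated with
# k detections in n independent surveys (binomial likelihood).
# The Beta prior is conjugate, so the posterior is Beta(a+k, b+n-k).
def beta_binomial_update(a, b, k, n):
    return a + k, b + (n - k)

# Vague prior Beta(1, 1); 7 detections in 10 surveys (made-up numbers).
a_post, b_post = beta_binomial_update(1, 1, 7, 10)
post_mean = a_post / (a_post + b_post)  # posterior mean of p
print(a_post, b_post, post_mean)  # Beta(8, 4), mean 8/12 ≈ 0.667
```

The posterior mean 8/12 sits between the prior mean (1/2) and the sample proportion (7/10), which is exactly the "degree of belief" shift the abstract describes.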

2.
Parameter inference and model selection are very important for mathematical modeling in systems biology. Bayesian statistics can be used to conduct both parameter inference and model selection. In particular, the approximate Bayesian computation framework is often used for parameter inference and model selection in systems biology. However, Monte Carlo methods need to be used to compute Bayesian posterior distributions. In addition, the posterior distributions of parameters are sometimes almost uniform or very similar to their prior distributions. In such cases, it is difficult to choose one specific parameter value with high credibility as the representative value of the distribution. To overcome these problems, we introduced one of the population Monte Carlo algorithms, population annealing. Although population annealing is usually used in statistical mechanics, we showed that it can be used to compute Bayesian posterior distributions in the approximate Bayesian computation framework. To deal with the unidentifiability of representative parameter values, we proposed running the simulations with the parameter ensemble sampled from the posterior distribution, named the “posterior parameter ensemble”. We showed that population annealing is an efficient and convenient algorithm for generating a posterior parameter ensemble. We also showed that simulations with the posterior parameter ensemble can not only reproduce the data used for parameter inference but also capture and predict data that were not used for parameter inference. Lastly, we introduced the marginal likelihood in the approximate Bayesian computation framework for Bayesian model selection. We showed that population annealing enables us to compute the marginal likelihood in this framework and to conduct model selection based on the Bayes factor.
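The paper's algorithm is population annealing; as a simpler illustration of the approximate Bayesian computation idea it builds on, here is a basic ABC rejection sampler on a hypothetical toy model (uniform prior, Gaussian simulator), not the authors' method:

```python
import random

# ABC rejection sampling: draw theta from the prior, simulate data,
# and keep theta only if the simulated summary statistic is within a
# tolerance eps of the observed one. Kept draws approximate the posterior.
def abc_rejection(observed_mean, n_obs, eps, n_draws, rng):
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(0.0, 10.0)          # prior: Uniform(0, 10)
        sim = [rng.gauss(theta, 1.0) for _ in range(n_obs)]
        if abs(sum(sim) / n_obs - observed_mean) < eps:
            accepted.append(theta)
    return accepted  # a "posterior parameter ensemble"

rng = random.Random(0)
ensemble = abc_rejection(observed_mean=4.0, n_obs=20, eps=0.3,
                         n_draws=5000, rng=rng)
print(len(ensemble), sum(ensemble) / len(ensemble))
```

The accepted draws concentrate near the value that generated the data; running downstream simulations over this whole ensemble, rather than a single point estimate, is the "posterior parameter ensemble" idea the abstract advocates.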

3.
Using a four-taxon example under a simple model of evolution, we show that the methods of maximum likelihood and maximum posterior probability (which is a Bayesian method of inference) may not arrive at the same optimal tree topology. Some patterns that are separately uninformative under the maximum likelihood method are separately informative under the Bayesian method. We also show that this difference has an impact on the bootstrap frequencies and the posterior probabilities of topologies, which therefore are not necessarily approximately equal. Efron et al. (Proc. Natl. Acad. Sci. USA 93:13429-13434, 1996) stated that bootstrap frequencies can, under certain circumstances, be interpreted as posterior probabilities. This is true only if one includes a non-informative prior distribution of the possible data patterns, and most often the prior distributions are instead specified in terms of topology and branch lengths. [Bayesian inference; maximum likelihood method; phylogeny; support.]

4.
Fair-balance paradox, star-tree paradox, and Bayesian phylogenetics
The star-tree paradox refers to the conjecture that the posterior probabilities for the three unrooted trees for four species (or the three rooted trees for three species if the molecular clock is assumed) do not approach 1/3 when the data are generated using the star tree and when the amount of data approaches infinity. It reflects the more general phenomenon of high and presumably spurious posterior probabilities for trees or clades produced by the Bayesian method of phylogenetic reconstruction, and it is perceived to be a manifestation of the deeper problem of the extreme sensitivity of Bayesian model selection to the prior on parameters. Analysis of the star-tree paradox has been hampered by the intractability of the integrals involved. In this article, I use Laplacian expansion to approximate the posterior probabilities for the three rooted trees for three species using binary characters evolving at a constant rate. The approximation enables calculation of posterior tree probabilities for arbitrarily large data sets. Both theoretical analysis of the analogous fair-coin and fair-balance problems and computer simulation for the tree problem confirmed the existence of the star-tree paradox. When the data size n → ∞, the posterior tree probabilities do not converge to 1/3 each, but they vary among data sets according to a statistical distribution. This distribution is characterized. Two strategies for resolving the star-tree paradox are explored: (1) a nonzero prior probability for the degenerate star tree and (2) an increasingly informative prior forcing the internal branch length toward zero. Both appear to be effective in resolving the paradox, but the latter is simpler to implement. The posterior tree probabilities are found to be very sensitive to the prior.

5.
A compound Poisson process for relaxing the molecular clock
Huelsenbeck JP, Larget B, Swofford D. Genetics 2000, 154(4):1879-1892
The molecular clock hypothesis remains an important conceptual and analytical tool in evolutionary biology despite the repeated observation that the clock hypothesis does not perfectly explain observed DNA sequence variation. We introduce a parametric model that relaxes the molecular clock by allowing rates to vary across lineages according to a compound Poisson process. Events of substitution rate change are placed onto a phylogenetic tree according to a Poisson process. When an event of substitution rate change occurs, the current rate of substitution is modified by a gamma-distributed random variable. Parameters of the model can be estimated using Bayesian inference. We use Markov chain Monte Carlo integration to evaluate the posterior probability distribution because the posterior probability involves high dimensional integrals and summations. Specifically, we use the Metropolis-Hastings-Green algorithm with 11 different move types to evaluate the posterior distribution. We demonstrate the method by analyzing a complete mtDNA sequence data set from 23 mammals. The model presented here has several potential advantages over other models that have been proposed to relax the clock because it is parametric and does not assume that rates change only at speciation events. This model should prove useful for estimating divergence times when substitution rates vary across lineages.
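The rate-relaxation mechanism can be sketched for a single branch (a simplified simulation with illustrative parameter values, not the paper's full tree-wide model): rate-change events fall on the branch as a Poisson process, and each event multiplies the current rate by a gamma-distributed factor.

```python
import random

# Simulate substitution-rate evolution along one branch under a
# compound Poisson process: events occur at rate lam per unit branch
# length, and each event multiplies the current rate by a gamma
# random variable with mean 1 (so rates do not drift on average).
def relax_rate_on_branch(rate0, branch_length, lam, shape, rng):
    rate = rate0
    t = rng.expovariate(lam)             # waiting time to first event
    while t < branch_length:
        # Gamma(shape, scale=1/shape) has mean shape * (1/shape) = 1.
        rate *= rng.gammavariate(shape, 1.0 / shape)
        t += rng.expovariate(lam)        # waiting time to next event
    return rate

rng = random.Random(42)
final = relax_rate_on_branch(rate0=1.0, branch_length=2.0,
                             lam=1.5, shape=4.0, rng=rng)
print(final)  # rate at the end of the branch, after all events
```

With a very small event rate `lam` the branch sees no events and the clock is recovered exactly, which is the sense in which this model nests the strict clock.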

6.
Bayesian phylogenetic methods require the selection of prior probability distributions for all parameters of the model of evolution. These distributions allow one to incorporate prior information into a Bayesian analysis, but even in the absence of meaningful prior information, a prior distribution must be chosen. In such situations, researchers typically seek to choose a prior that will have little effect on the posterior estimates produced by an analysis, allowing the data to dominate. Sometimes a prior that is uniform (assigning equal prior probability density to all points within some range) is chosen for this purpose. In reality, the appropriate prior depends on the parameterization chosen for the model of evolution, a choice that is largely arbitrary. There is an extensive Bayesian literature on appropriate prior choice, and it has long been appreciated that there are parameterizations for which uniform priors can have a strong influence on posterior estimates. We here discuss the relationship between model parameterization and prior specification, using the general time-reversible model of nucleotide evolution as an example. We present Bayesian analyses of 10 simulated data sets obtained using a variety of prior distributions and parameterizations of the general time-reversible model. Uniform priors can produce biased parameter estimates under realistic conditions, and a variety of alternative priors avoid this bias.

7.
In Bayesian phylogenetics, confidence in evolutionary relationships is expressed as posterior probability: the probability that a tree or clade is true given the data, evolutionary model, and prior assumptions about model parameters. Model parameters, such as branch lengths, are never known in advance; Bayesian methods incorporate this uncertainty by integrating over a range of plausible values given an assumed prior probability distribution for each parameter. Little is known about the effects of integrating over branch length uncertainty on posterior probabilities when different priors are assumed. Here, we show that integrating over uncertainty using a wide range of typical prior assumptions strongly affects posterior probabilities, causing them to deviate from those that would be inferred if branch lengths were known in advance; only when there is no uncertainty to integrate over does the average posterior probability of a group of trees accurately predict the proportion of correct trees in the group. The pattern of branch lengths on the true tree determines whether integrating over uncertainty pushes posterior probabilities upward or downward. The magnitude of the effect depends on the specific prior distributions used and the length of the sequences analyzed. Under realistic conditions, however, even extraordinarily long sequences are not enough to prevent frequent inference of incorrect clades with strong support. We found that across a range of conditions, diffuse priors (either flat or exponential distributions with moderate to large means) provide more reliable inferences than small-mean exponential priors. An empirical Bayes approach that fixes branch lengths at their maximum likelihood estimates yields posterior probabilities that more closely match those that would be inferred if the true branch lengths were known in advance and reduces the rate of strongly supported false inferences compared with fully Bayesian integration.

8.
While Bayesian analysis has become common in phylogenetics, the effects of topological prior probabilities on tree inference have not been investigated. In Bayesian analyses, the prior probability of topologies is almost always considered equal for all possible trees, and clade support is calculated from the majority rule consensus of the approximated posterior distribution of topologies. These uniform priors on tree topologies imply non-uniform prior probabilities of clades, which are dependent on the number of taxa in a clade as well as the number of taxa in the analysis. As such, uniform topological priors do not model ignorance with respect to clades. Here, we demonstrate that Bayesian clade support, bootstrap support, and jackknife support from 17 empirical studies are significantly and positively correlated with non-uniform clade priors resulting from uniform topological priors. Further, we demonstrate that this effect disappears for bootstrap and jackknife when data sets are free from character conflict, but remains pronounced for Bayesian clade supports, regardless of tree shape. Finally, we propose the use of a Bayes factor to account for the fact that uniform topological priors do not model ignorance with respect to clade probability.

9.
Disease incidence or mortality data are typically available as rates or counts for specified regions, collected over time. We propose Bayesian nonparametric spatial modeling approaches to analyze such data. We develop a hierarchical specification using spatial random effects modeled with a Dirichlet process prior. The Dirichlet process is centered around a multivariate normal distribution. This latter distribution arises from a log-Gaussian process model that provides a latent incidence rate surface, followed by block averaging to the areal units determined by the regions in the study. With regard to the resulting posterior predictive inference, the modeling approach is shown to be equivalent to an approach based on block averaging of a spatial Dirichlet process to obtain a prior probability model for the finite dimensional distribution of the spatial random effects. We introduce a dynamic formulation for the spatial random effects to extend the model to spatio-temporal settings. Posterior inference is implemented through Gibbs sampling. We illustrate the methodology with simulated data as well as with a data set on lung cancer incidences for all 88 counties in the state of Ohio over an observation period of 21 years.

10.
The restricted mean survival time (RMST) evaluates the expectation of survival time truncated by a prespecified time point, because the mean survival time in the presence of censoring is typically not estimable. The frequentist inference procedure for RMST has been widely advocated for comparison of two survival curves, while research from the Bayesian perspective is rather limited. For the RMST of both right- and interval-censored data, we propose Bayesian nonparametric estimation and inference procedures. By assigning a mixture of Dirichlet processes (MDP) prior to the distribution function, we can estimate the posterior distribution of RMST. We also explore another Bayesian nonparametric approach using the Dirichlet process mixture model and make comparisons with the frequentist nonparametric method. Simulation studies demonstrate that the Bayesian nonparametric RMST under diffuse MDP priors leads to robust estimation and under informative priors it can incorporate prior knowledge into the nonparametric estimator. Analysis of real trial examples demonstrates the flexibility and interpretability of the Bayesian nonparametric RMST for both right- and interval-censored data.
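The quantity itself is simple to compute once a survival curve is in hand: RMST is the area under S(t) from 0 to the truncation time τ. A minimal sketch (frequentist and step-function based, not the paper's Bayesian MDP estimator):

```python
# RMST is the area under the survival curve S(t) from 0 to a
# prespecified truncation time tau. For a step-function estimate of
# S(t), such as Kaplan-Meier, the integral is a sum of rectangles.
def rmst(times, surv, tau):
    """times: sorted drop times; surv: value of S(t) just after each."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in zip(times, surv):
        if t >= tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    area += prev_s * (tau - prev_t)  # final partial rectangle up to tau
    return area

# Toy curve: S drops to 0.8 at t=1, 0.5 at t=2, 0.2 at t=4.
print(rmst([1, 2, 4], [0.8, 0.5, 0.2], tau=3))  # 1.0 + 0.8 + 0.5 = 2.3
```

A Bayesian version would apply this same area computation to each posterior draw of the survival curve, yielding a posterior distribution for RMST rather than a point estimate.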

11.
Bobb JF, Dominici F, Peng RD. Biometrics 2011, 67(4):1605-1616
Estimating the risks heat waves pose to human health is a critical part of assessing the future impact of climate change. In this article, we propose a flexible class of time series models to estimate the relative risk of mortality associated with heat waves and conduct Bayesian model averaging (BMA) to account for the multiplicity of potential models. Applying these methods to data from 105 U.S. cities for the period 1987-2005, we identify those cities having a high posterior probability of increased mortality risk during heat waves, examine the heterogeneity of the posterior distributions of mortality risk across cities, assess sensitivity of the results to the selection of prior distributions, and compare our BMA results to a model selection approach. Our results show that no single model best predicts risk across the majority of cities, and that for some cities heat-wave risk estimation is sensitive to model choice. Although model averaging leads to posterior distributions with increased variance as compared to statistical inference conditional on a model obtained through model selection, we find that the posterior mean of heat wave mortality risk is robust to accounting for model uncertainty over a broad class of models.

12.
The liver is the principal site of insulin degradation, and assessing its ability to extract insulin is important to understand several pathological states. Noninvasive quantification of hepatic extraction (HE) in an individual requires comparing the profiles of insulin secretion (ISR) and posthepatic insulin delivery rate (IDR). To do this, we propose here the combined use of the classical C-peptide minimal model with a new minimal model of insulin delivery and kinetics. The models were identified on insulin-modified intravenous glucose tolerance test (IM-IVGTT) data of 20 healthy subjects. C-peptide kinetics were fixed to standard population values, whereas insulin kinetics were assessed in each individual, along with IDR parameters, thanks to the presence of insulin decay data observed after exogenous insulin administration. From the two models, profiles of ISR and IDR were predicted, and ISR and IDR indexes of beta-cell responsivity to glucose in the basal state, as well as during first- and second-phase secretion, were estimated. The HE profile, obtained by comparing ISR and IDR profiles, showed a rapid suppression immediately after the glucose administration. HE indexes, obtained by comparing ISR and IDR indexes, indicated that the liver is able to extract 70 ± 9% of insulin passing through it in the basal state and 54 ± 14% during IM-IVGTT. In conclusion, insulin secretion, kinetics, and hepatic extraction can be reliably assessed during an IM-IVGTT by using insulin and C-peptide minimal models.

13.
Yi N, Shriner D, Banerjee S, Mehta T, Pomp D, Yandell BS. Genetics 2007, 176(3):1865-1877
We extend our Bayesian model selection framework for mapping epistatic QTL in experimental crosses to include environmental effects and gene-environment interactions. We propose a new, fast Markov chain Monte Carlo algorithm to explore the posterior distribution of unknowns. In addition, we take advantage of any prior knowledge about genetic architecture to increase posterior probability on more probable models. These enhancements have significant computational advantages in models with many effects. We illustrate the proposed method by detecting new epistatic and gene-sex interactions for obesity-related traits in two real data sets of mice. Our method has been implemented in the freely available package R/qtlbim (http://www.qtlbim.org) to facilitate the general usage of the Bayesian methodology for genomewide interacting QTL analysis.

14.
We consider a set of sample counts obtained by sampling arbitrary fractions of a finite volume containing a homogeneously dispersed population of identical objects. We report a Bayesian derivation of the posterior probability distribution of the population size using a binomial likelihood and non-conjugate, discrete uniform priors under sampling with or without replacement. Our derivation yields a computationally feasible formula that can prove useful in a variety of statistical problems involving absolute quantification under uncertainty. We implemented our algorithm in the R package dupiR and compared it with a previously proposed Bayesian method based on a Gamma prior. As a showcase, we demonstrate that our inference framework can be used to estimate bacterial survival curves from measurements characterized by extremely low or zero counts and rather high sampling fractions. All in all, we provide a versatile, general purpose algorithm to infer population sizes from count data, which can find application in a broad spectrum of biological and physical problems.
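The core computation described here can be sketched directly (a simplified illustration of the binomial-likelihood, discrete-uniform-prior setup, not the dupiR package's formula): given a count k observed when a fraction f of the volume is sampled, the unnormalized posterior over the population size N is the binomial likelihood on a grid of candidate sizes.

```python
from math import comb

# Posterior over a finite population size N given an observed count k
# when a fraction f of the volume is sampled: binomial likelihood
# C(N, k) f^k (1-f)^(N-k), discrete uniform prior on N in 0..n_max.
def population_posterior(k, f, n_max):
    weights = [comb(n, k) * f**k * (1 - f) ** (n - k) if n >= k else 0.0
               for n in range(n_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]

# Illustrative numbers: 3 objects counted in a 10% sample.
post = population_posterior(k=3, f=0.1, n_max=200)
mode = max(range(len(post)), key=post.__getitem__)
print(mode)  # posterior mode near the intuitive estimate k/f = 30
```

Because the support is a finite grid, credible intervals and moments follow from simple sums over `post`, which is what makes this approach workable even for counts of zero.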

15.
The Bayesian method of phylogenetic inference often produces high posterior probabilities (PPs) for trees or clades, even when the trees are clearly incorrect. The problem appears to be mainly due to large sizes of molecular datasets and to the large-sample properties of Bayesian model selection and its sensitivity to the prior when several of the models under comparison are nearly equally correct (or nearly equally wrong) and are of the same dimension. A previous suggestion to alleviate the problem is to let the internal branch lengths in the tree become increasingly small in the prior with the increase in the data size so that the bifurcating trees are increasingly star-like. In particular, if the internal branch lengths are assigned the exponential prior, the prior mean μ0 should approach zero faster than 1/√n but more slowly than 1/n, where n is the sequence length. This paper examines the usefulness of this data size-dependent prior using a dataset of the mitochondrial protein-coding genes from the baleen whales, with the prior mean fixed at μ0 = 0.1n^(−2/3). In this dataset, phylogeny reconstruction is sensitive to the assumed evolutionary model, species sampling and the type of data (DNA or protein sequences), but Bayesian inference using the default prior attaches high PPs for conflicting phylogenetic relationships. The data size-dependent prior alleviates the problem to some extent, giving weaker support for unstable relationships. This prior may be useful in reducing apparent conflicts in the results of Bayesian analysis or in making the method less sensitive to model violations.

16.
BACKGROUND: Inference about regulatory networks from high-throughput genomics data is of great interest in systems biology. We present a Bayesian approach to infer gene regulatory networks from time series expression data by integrating various types of biological knowledge. RESULTS: We formulate network construction as a series of variable selection problems and use linear regression to model the data. Our method summarizes external biological knowledge with an informative prior probability distribution over candidate regression models. We extend the Bayesian model averaging (BMA) variable selection method to select regulators in the regression framework. CONCLUSIONS: We demonstrate our method on simulated data and a set of time-series microarray experiments measuring the effect of a drug perturbation on gene expression levels, and show that it outperforms leading regression-based methods in the literature.

17.
The recent controversy over the increased risk of venous thrombosis with third generation oral contraceptives illustrates the public policy dilemma that can be created by relying on conventional statistical tests and estimates: case-control studies showed a significant increase in risk and forced a decision either to warn or not to warn. Conventional statistical tests are an improper basis for such decisions because they dichotomise results according to whether they are or are not significant and do not allow decision makers to take explicit account of additional evidence (for example, of biological plausibility or of biases in the studies). A Bayesian approach overcomes both these problems. A Bayesian analysis starts with a "prior" probability distribution for the value of interest (for example, a true relative risk), based on previous knowledge, and adds the new evidence (via a model) to produce a "posterior" probability distribution. Because different experts will have different prior beliefs, sensitivity analyses are important to assess the effects on the posterior distributions of these differences. Sensitivity analyses should also examine the effects of different assumptions about biases and about the model which links the data with the value of interest. One advantage of this method is that it allows such assumptions to be handled openly and explicitly. Data presented as a series of posterior probability distributions would be a much better guide to policy, reflecting the reality that degrees of belief are often continuous, not dichotomous, and often vary from one person to another in the face of inconclusive evidence.
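The prior-plus-evidence update described here has a simple conjugate form when both prior belief and study evidence are summarized as normal distributions on the log relative risk (a sketch with illustrative numbers, not the article's analysis):

```python
import math

# Conjugate normal update on a log relative risk: a Normal prior
# (encoding previous knowledge) combined with a Normal likelihood
# (summarizing a new study) gives a Normal posterior whose mean is a
# precision-weighted average of prior and data.
def normal_update(prior_mean, prior_sd, data_mean, data_sd):
    w0, w1 = 1 / prior_sd**2, 1 / data_sd**2
    post_var = 1 / (w0 + w1)
    post_mean = post_var * (w0 * prior_mean + w1 * data_mean)
    return post_mean, math.sqrt(post_var)

# Sceptical prior centred on no effect (log RR = 0, sd 0.3); a study
# estimating log RR = log(2) with standard error 0.25 (made-up values).
m, s = normal_update(0.0, 0.3, math.log(2.0), 0.25)
print(math.exp(m), s)  # posterior RR shrunk below 2, toward the prior
```

Repeating the update under different priors (sceptical, enthusiastic, vague) is exactly the sensitivity analysis the article recommends, since each prior yields its own posterior distribution.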

18.
Ball RD. Genetics 2001, 159(3):1351-1364
We describe an approximate method for the analysis of quantitative trait loci (QTL) based on model selection from multiple regression models with trait values regressed on marker genotypes, using a modification of the easily calculated Bayesian information criterion to estimate the posterior probability of models with various subsets of markers as variables. The BIC-delta criterion, with the parameter delta increasing the penalty for additional variables in a model, is further modified to incorporate prior information, and missing values are handled by multiple imputation. Marginal probabilities for model sizes are calculated, and the posterior probability of nonzero model size is interpreted as the posterior probability of existence of a QTL linked to one or more markers. The method is demonstrated on analysis of associations between wood density and markers on two linkage groups in Pinus radiata. Selection bias, which is the bias that results from using the same data to both select the variables in a model and estimate the coefficients, is shown to be a problem for commonly used non-Bayesian methods for QTL mapping, which do not average over alternative possible models that are consistent with the data.
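The BIC-based approximation to posterior model probabilities can be sketched as follows (a generic illustration with made-up log-likelihoods, not the paper's BIC-delta with prior information and imputation): each model's BIC converts to a weight exp(−BIC/2), and normalizing the weights gives approximate posterior probabilities.

```python
import math

# Approximate posterior model probabilities from BIC values:
# P(M_i | data) ≈ exp(-BIC_i / 2) / sum_j exp(-BIC_j / 2).
# A delta > 1 inflates the per-parameter penalty, in the spirit of
# the BIC-delta criterion.
def model_probs(loglik, n_params, n_obs, delta=1.0):
    bics = [-2 * ll + delta * k * math.log(n_obs)
            for ll, k in zip(loglik, n_params)]
    b0 = min(bics)  # subtract the minimum for numerical stability
    w = [math.exp(-(b - b0) / 2) for b in bics]
    total = sum(w)
    return [x / total for x in w]

# Three candidate marker subsets (illustrative log-likelihoods).
probs = model_probs(loglik=[-120.0, -115.0, -114.5],
                    n_params=[1, 2, 3], n_obs=100, delta=1.0)
print(probs)  # each extra parameter costs log(n) in the penalty
```

Averaging QTL-effect estimates over these model weights, rather than reporting the single best model, is what protects against the selection bias discussed at the end of the abstract.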

19.
Stochastic search variable selection (SSVS) is a Bayesian variable selection method that employs covariate‐specific discrete indicator variables to select which covariates (e.g., molecular markers) are included in or excluded from the model. We present a new variant of SSVS where, instead of discrete indicator variables, we use continuous‐scale weighting variables (which also take values between zero and one) to select covariates into the model. The improved model performance is shown and compared to standard SSVS using simulated and real quantitative trait locus mapping datasets. Decisions about phenotype‐genotype associations in our SSVS variant are based on the median of the posterior distribution or on Bayes factors. We also show that by using continuous‐scale weighting variables it is possible to improve the mixing properties of Markov chain Monte Carlo sampling substantially compared to standard SSVS. The separation of association signals and nonsignals (control of the noise level) also seems to be more efficient than in standard SSVS. Thus, the novel method provides an efficient new framework for SSVS analysis that additionally yields a full posterior distribution for the pseudo‐indicators, providing more information that may help in decision making.

20.
Evolutionary biology is a study of life's history on Earth. In researching this history, biologists are often interested in attempting to reconstruct phenotypes for the long extinct ancestors of living species. Various methods have been developed to do this on a phylogeny from the data for extant taxa. In the present article, I introduce a new approach for ancestral character estimation for discretely valued traits. This approach is based on the threshold model from evolutionary quantitative genetics. Under the threshold model, the value exhibited by an individual or species for a discrete character is determined by an underlying, unobserved continuous trait called “liability.” In this new method for ancestral state reconstruction, I use Bayesian Markov chain Monte Carlo (MCMC) to sample the liabilities of ancestral and tip species, and the relative positions of two or more thresholds, from their joint posterior probability distribution. Using data simulated under the model, I find that the method has very good performance in ancestral character estimation. Use of the threshold model for ancestral state reconstruction relies on a priori specification of the order of the discrete character states along the liability axis. I test the use of a Bayesian MCMC approach based on an information-theoretic criterion to choose among different hypothesized orderings for the discrete character. Finally, I apply the method to the evolution of feeding mode in centrarchid fishes.
