Related Articles
1.
We modified the phylogenetic program MrBayes 3.1.2 to incorporate the compound Dirichlet priors for branch lengths proposed recently by Rannala, Zhu, and Yang (2012. Tail paradox, partial identifiability and influential priors in Bayesian branch length inference. Mol. Biol. Evol. 29:325-335) as a solution to the problem of branch-length overestimation in Bayesian phylogenetic inference. The compound Dirichlet prior specifies a fairly diffuse prior on the tree length (the sum of branch lengths) and uses a Dirichlet distribution to partition the tree length into branch lengths. Six problematic data sets originally analyzed by Brown, Hedtke, Lemmon, and Lemmon (2010. When trees grow too long: investigating the causes of highly inaccurate Bayesian branch-length estimates. Syst. Biol. 59:145-161) are reanalyzed using the modified version of MrBayes to investigate properties of Bayesian branch-length estimation using the new priors. While the default exponential priors for branch lengths produced extremely long trees, the compound Dirichlet priors produced posterior estimates that are much closer to the maximum likelihood estimates. Furthermore, the posterior tree lengths were quite robust to changes in the parameter values in the compound Dirichlet priors, for example, when the prior mean of tree length changed over several orders of magnitude. Our results suggest that the compound Dirichlet priors may be useful for correcting branch-length overestimation in phylogenetic analyses of empirical data sets.
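A minimal numpy sketch of how a compound Dirichlet prior generates branch lengths, as described above: draw a fairly diffuse tree length from a gamma distribution, then partition it with a Dirichlet draw. The parameter values (alpha_T, beta_T, alpha) and the 10-taxon example are illustrative assumptions, not the settings used in the modified MrBayes.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_branch_lengths(n_branches, alpha_T=1.0, beta_T=0.1, alpha=1.0, rng=rng):
    """Compound Dirichlet prior: tree length T ~ Gamma(alpha_T, rate=beta_T) (diffuse),
    branch-length proportions ~ Dirichlet(alpha, ..., alpha); branch lengths = T * proportions."""
    T = rng.gamma(shape=alpha_T, scale=1.0 / beta_T)     # diffuse prior on the tree length
    props = rng.dirichlet(np.full(n_branches, alpha))    # partition of the tree length
    return T * props

b = sample_branch_lengths(n_branches=17)   # an unrooted binary tree with 10 taxa has 17 branches
print(b.sum(), b[:5])                      # b.sum() is the sampled tree length
```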

2.
Bayesian Inference in Semiparametric Mixed Models for Longitudinal Data
Summary. We consider Bayesian inference in semiparametric mixed models (SPMMs) for longitudinal data. SPMMs are a class of models that use a nonparametric function to model a time effect, a parametric function to model other covariate effects, and parametric or nonparametric random effects to account for the within-subject correlation. We model the nonparametric function using a Bayesian formulation of a cubic smoothing spline, and the random effect distribution using a normal distribution and alternatively a nonparametric Dirichlet process (DP) prior. When the random effect distribution is assumed to be normal, we propose a uniform shrinkage prior (USP) for the variance components and the smoothing parameter. When the random effect distribution is modeled nonparametrically, we use a DP prior with a normal base measure and propose a USP for the hyperparameters of the DP base measure. We argue that the commonly assumed DP prior implies a nonzero mean of the random effect distribution, even when a base measure with mean zero is specified. This implies weak identifiability for the fixed effects, and can therefore lead to biased estimators and poor inference for the regression coefficients and the spline estimator of the nonparametric function. We propose an adjustment using a postprocessing technique. We show that under mild conditions the posterior is proper under the proposed USP, a flat prior for the fixed effect parameters, and an improper prior for the residual variance. We illustrate the proposed approach using a longitudinal hormone dataset, and carry out extensive simulation studies to compare its finite sample performance with existing methods.
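A small stick-breaking sketch, assuming a zero-mean normal base measure, that illustrates the point made above: a realized draw from a Dirichlet process generally has a nonzero mean even when the base measure is centered at zero. The recentering at the end is only one plausible form of postprocessing; the authors' exact adjustment may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_stick_breaking(alpha, base_draw, n_atoms=200, rng=rng):
    """Truncated stick-breaking draw from DP(alpha, G0)."""
    v = rng.beta(1.0, alpha, size=n_atoms)
    w = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    w = w / w.sum()                          # renormalize the truncated weights
    atoms = base_draw(n_atoms)
    return w, atoms

# zero-mean normal base measure G0 = N(0, 1)
w, atoms = dp_stick_breaking(alpha=1.0, base_draw=lambda n: rng.normal(0.0, 1.0, size=n))
mean_G = np.sum(w * atoms)                   # mean of the realized random distribution
print(mean_G)                                # generally != 0, although G0 has mean 0

# illustrative postprocessing: recenter the random effects (the mean would be absorbed
# into the fixed-effect intercept)
b = rng.choice(atoms, size=50, p=w)          # random effects for 50 subjects
b_centered = b - b.mean()
```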

3.
Rosner GL. Biometrics 2005, 61(1):239-245
This article presents an aid for monitoring clinical trials with failure-time endpoints based on Bayesian nonparametric analysis of the data. If one assumes a Dirichlet process prior for the survival distribution, the posterior distribution in the presence of censoring is a mixture of Dirichlet processes. Using Gibbs sampling, one can generate random samples from the posterior distribution. With samples from the posterior distributions of treatment-specific survival curves, one can evaluate the current evidence in favor of stopping or continuing the trial based on summary statistics of these survival curves. Because the method is nonparametric, it can easily be used, for example, in situations where hazards cross or are suspected to cross, and where relevant clinical decisions might be based on estimating when the integral between the curves is expected to become positive, favoring the new but more toxic therapy. An example based on an actual trial illustrates the method.
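Given posterior draws of the two treatment-specific survival curves on a common time grid, one summary in the spirit described above is the posterior distribution of the integral between the curves up to a horizon t. The sketch below uses made-up exponential survival draws as stand-ins; in the actual method the curves would come from the Gibbs sampler over the mixture-of-Dirichlet-processes posterior.

```python
import numpy as np

rng = np.random.default_rng(2)

# hypothetical posterior draws of two survival curves on a common grid (months)
times = np.linspace(0.0, 24.0, 49)
haz_std = 0.08 * rng.gamma(20.0, 1.0 / 20.0, size=1000)   # made-up hazard draws, standard arm
haz_new = 0.06 * rng.gamma(20.0, 1.0 / 20.0, size=1000)   # made-up hazard draws, new arm
S_std = np.exp(-np.outer(haz_std, times))
S_new = np.exp(-np.outer(haz_new, times))

# integral between the curves on [0, t], one value per posterior draw
dt = np.diff(times)
area = ((S_new - S_std)[:, :-1] * dt).sum(axis=1)

print("posterior mean of the integral:", area.mean().round(2))
print("posterior P(integral > 0):     ", (area > 0).mean().round(3))
```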

4.
A Bayesian nonparametric form of regression based on Dirichlet process priors is adapted to the analysis of quantitative traits possibly affected by cryptic forms of gene action, and to the context of SNP-assisted genomic selection, where the main objective is to predict a genomic signal on phenotype. The procedure clusters unknown genotypes into groups with distinct genetic values, but in a setting in which the number of clusters is unknown a priori, so that standard methods for finite mixture analysis do not work. The central assumption is that genetic effects follow an unknown distribution with some “baseline” family, which is a normal process in the cases considered here. A Bayesian analysis based on the Gibbs sampler produces estimates of the number of clusters, posterior means of genetic effects, a measure of credibility in the baseline distribution, as well as estimates of parameters of the latter. The procedure is illustrated with a simulation representing two populations. In the first one, there are 3 unknown QTL, with additive, dominance and epistatic effects; in the second, there are 10 QTL with additive, dominance and additive × additive epistatic effects. In the two populations, baseline parameters are inferred correctly. The Dirichlet process model infers the number of unique genetic values correctly in the first population, but underestimates it in the second one; here, the true number of clusters is over 900, and the model gives a posterior mean estimate of about 140, probably because more replication of genotypes is needed for correct inference. The impact on inferences of the prior distribution of a key parameter (M), and of the extent of replication, was examined via an analysis of mean body weight in 192 paternal half-sib families of broiler chickens, where each sire was genotyped for nearly 7,000 SNPs. In this small sample, it was found that inference about the number of clusters was affected by the prior distribution of M. For a set of combinations of parameters of a given prior distribution, the effects of the prior dissipated when the number of replicate samples per genotype was increased. Thus, the Dirichlet process model seems to be useful for gauging the number of QTLs affecting the trait: if the number of clusters inferred is small, probably just a few QTLs code for the trait. If the number of clusters inferred is large, this may imply that standard parametric models based on the baseline distribution may suffice. However, priors may be influential, especially if sample size is not large and if only a few genotypic configurations have replicate phenotypes in the sample.

5.
In the context of right-censored and interval-censored data, we develop asymptotic formulas to compute pseudo-observations for the survival function and the restricted mean survival time (RMST). These formulas are based on the original estimators and do not involve computation of the jackknife estimators. For right-censored data, von Mises expansions of the Kaplan–Meier estimator are used to derive the pseudo-observations. For interval-censored data, a general class of parametric models for the survival function is studied. An asymptotic representation of the pseudo-observations is derived involving the Hessian matrix and the score vector. Theoretical results that justify the use of pseudo-observations in regression are also derived. The formula is illustrated on the piecewise-constant-hazard model for the RMST. The proposed approximations are extremely accurate, even for small sample sizes, as illustrated by Monte Carlo simulations and real data. We also study the gain in terms of computation time, as compared to the original jackknife method, which can be substantial for a large dataset.
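For reference, the jackknife definition that the asymptotic formulas above are designed to avoid can be written in a few lines. A numpy sketch of RMST pseudo-observations based on the Kaplan–Meier estimator, assuming distinct observation times for simplicity; the leave-one-out refits in the loop are exactly the computations the paper replaces with a von Mises expansion.

```python
import numpy as np

def km_rmst(time, event, tau):
    """Restricted mean survival time: area under the Kaplan-Meier curve on [0, tau]."""
    order = np.argsort(time)
    t, d = time[order], event[order]
    at_risk = np.arange(len(t), 0, -1)
    surv = np.cumprod(1.0 - d / at_risk)                 # KM survival just after each time
    grid = np.clip(np.concatenate(([0.0], t, [tau])), 0.0, tau)
    s_vals = np.concatenate(([1.0], surv, [surv[-1]]))
    return np.sum(np.diff(grid) * s_vals[:-1])           # integrate the step function

def rmst_pseudo_obs(time, event, tau):
    """Jackknife pseudo-observations: theta_i = n*theta_hat - (n-1)*theta_hat(-i)."""
    n = len(time)
    full = km_rmst(time, event, tau)
    loo = np.array([km_rmst(np.delete(time, i), np.delete(event, i), tau) for i in range(n)])
    return n * full - (n - 1) * loo
```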

6.
Disease incidence or mortality data are typically available as rates or counts for specified regions, collected over time. We propose Bayesian nonparametric spatial modeling approaches to analyze such data. We develop a hierarchical specification using spatial random effects modeled with a Dirichlet process prior. The Dirichlet process is centered around a multivariate normal distribution. This latter distribution arises from a log-Gaussian process model that provides a latent incidence rate surface, followed by block averaging to the areal units determined by the regions in the study. With regard to the resulting posterior predictive inference, the modeling approach is shown to be equivalent to an approach based on block averaging of a spatial Dirichlet process to obtain a prior probability model for the finite dimensional distribution of the spatial random effects. We introduce a dynamic formulation for the spatial random effects to extend the model to spatio-temporal settings. Posterior inference is implemented through Gibbs sampling. We illustrate the methodology with simulated data as well as with a data set on lung cancer incidences for all 88 counties in the state of Ohio over an observation period of 21 years.

7.
Summary. A variety of flexible approaches have been proposed for functional data analysis, allowing both the mean curve and the distribution about the mean to be unknown. Such methods are most useful when there is limited prior information. Motivated by applications to modeling of temperature curves in the menstrual cycle, this article proposes a flexible approach for incorporating prior information in semiparametric Bayesian analyses of hierarchical functional data. The proposed approach is based on specifying the distribution of functions as a mixture of a parametric hierarchical model and a nonparametric contamination. The parametric component is chosen based on prior knowledge, while the contamination is characterized as a functional Dirichlet process. In the motivating application, the contamination component allows unanticipated curve shapes in unhealthy menstrual cycles. Methods are developed for posterior computation, and the approach is applied to data from a European fecundability study.

8.
9.

Background

In quantitative trait mapping and genomic prediction, Bayesian variable selection methods have gained popularity in conjunction with the increase in marker data and computational resources. Whereas shrinkage-inducing methods are common tools in genomic prediction, rigorous decision making in mapping studies using such models is not well established, and the robustness of posterior results is sensitive to misspecified prior assumptions, because biological prior evidence is typically weak.

Methods

Here, we evaluate the impact of prior specifications in a shrinkage-based Bayesian variable selection method, presented in a previous study, that applies a mixture of uniform priors to genetic marker effects. Unlike most other shrinkage approaches, the mixture of uniform priors provides a coherent framework for inference based on Bayes factors. To evaluate the robustness of genetic association under varying prior specifications, Bayes factors are compared as signals of positive marker association, whereas genomic estimated breeding values are considered for genomic selection. The impact of any specific prior specification is reduced by calculating combined estimates from multiple specifications. A Gibbs sampler is used to perform Markov chain Monte Carlo (MCMC) estimation, and a generalized expectation-maximization algorithm is used as a faster alternative for maximum a posteriori point estimation. The performance of the method is evaluated using two publicly available data examples: the simulated QTLMAS XII data set and a real data set from a population of pigs.

Results

Combined estimates of Bayes factors were very successful in identifying quantitative trait loci, and the ranking of Bayes factors was fairly stable among markers with positive signals of association under varying prior assumptions, although their magnitudes varied considerably. Genomic estimated breeding values obtained with the mixture of uniform priors compared well with other approaches for both data sets, and the loss of accuracy with the generalized expectation-maximization algorithm, relative to MCMC, was small.

Conclusions

Since no error-free method for specifying priors is available for complex biological phenomena, exploring a wide variety of prior specifications and combining the results offers a practical solution to this problem. For this purpose, the mixture-of-uniform-priors approach is especially suitable, because it comprises a wide and flexible family of distributions and the computationally intensive estimation can be carried out in a reasonable amount of time.
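The summary above does not spell out the combination rule, so the sketch below is only one plausible way to pool evidence across prior settings: average the log Bayes factors marker by marker and rank on the pooled value. The Bayes factor values are invented for illustration; real values would come from the MCMC runs under each prior specification.

```python
import numpy as np

# hypothetical Bayes factors for 5 markers under 4 prior specifications
# (rows: prior settings, columns: markers)
bf = np.array([
    [12.0, 1.1, 55.0, 0.8, 3.0],
    [30.0, 0.9, 80.0, 1.2, 2.5],
    [ 8.0, 1.3, 40.0, 0.7, 4.1],
    [20.0, 1.0, 65.0, 0.9, 3.6],
])

combined = np.exp(np.log(bf).mean(axis=0))   # geometric mean across prior settings
ranking = np.argsort(-combined)              # markers ordered by pooled evidence
print(combined.round(2), ranking)
```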

10.
Nathan P. Lemoine. Oikos 2019, 128(7):912-928
Throughout the last two decades, Bayesian statistical methods have proliferated throughout ecology and evolution. Numerous earlier works have established both philosophical and computational guidelines for implementing Bayesian methods. However, protocols for incorporating prior information, the defining characteristic of Bayesian philosophy, are nearly nonexistent in the ecological literature. Here, I hope to encourage the use of weakly informative priors in ecology and evolution by providing a ‘consumer's guide’ to weakly informative priors. The first section outlines three reasons why ecologists should abandon noninformative priors: 1) common flat priors are not always noninformative, 2) noninformative priors provide the same result as simpler frequentist methods, and 3) noninformative priors suffer from the same high type I and type M error rates as frequentist methods. The second section provides a guide for implementing informative priors, wherein I detail convenient ‘reference’ prior distributions for common statistical models (e.g. regression, ANOVA, hierarchical models). I then use simulations to visually demonstrate how informative priors influence posterior parameter estimates. With the guidelines provided here, I hope to encourage the use of weakly informative priors for Bayesian analyses in ecology. Ecologists can and should debate the appropriate form of prior information, but should consider weakly informative priors as the new ‘default’ prior for any Bayesian model.
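A small conjugate-normal sketch of the flat-versus-weakly-informative contrast described above: with a standardized predictor, a Normal(0, 1) prior on the slope shrinks a noisy estimate toward zero, whereas a flat prior reproduces the least-squares estimate. The prior scale of 1, the known residual variance, and the simulated data are illustrative assumptions, not prescriptions from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)

# small, noisy data set with a standardized predictor
n = 15
x = rng.normal(size=n)
x = (x - x.mean()) / x.std()
y = 0.3 * x + rng.normal(scale=1.0, size=n)
sigma2 = 1.0                                  # residual variance, assumed known for the sketch

# flat prior on the slope: posterior mean equals the least-squares estimate
beta_flat = (x @ y) / (x @ x)

# weakly informative Normal(0, tau2) prior: conjugate posterior mean (ridge-like shrinkage)
tau2 = 1.0
beta_weak = (x @ y / sigma2) / (x @ x / sigma2 + 1.0 / tau2)

print(f"flat prior: {beta_flat:.3f}   weakly informative prior: {beta_weak:.3f}")
```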

11.
Variable Selection for Semiparametric Mixed Models in Longitudinal Studies
Summary. We propose a double-penalized likelihood approach for simultaneous model selection and estimation in semiparametric mixed models for longitudinal data. Two types of penalties are jointly imposed on the ordinary log-likelihood: a roughness penalty on the nonparametric baseline function and a nonconcave shrinkage penalty on the linear coefficients to achieve model sparsity. Compared to existing estimating-equation-based approaches, our procedure provides valid inference for data that are missing at random, and will be more efficient if the specified model is correct. Another advantage of the new procedure is its easy computation for both regression components and variance parameters. We show that the double-penalized problem can be conveniently reformulated into a linear mixed model framework, so that existing software can be directly used to implement our method. For the purpose of model inference, we derive both frequentist and Bayesian variance estimates for the estimated parametric and nonparametric components. Simulation is used to evaluate and compare the performance of our method with existing ones. We then apply the new method to a real data set from a lactation study.

12.
The t-year mean survival or restricted mean survival time (RMST) has been used as an appealing summary of the survival distribution within a time window [0, t]. RMST is the patient's life expectancy until time t and can be estimated nonparametrically by the area under the Kaplan-Meier curve up to t. In a comparative study, the difference or ratio of two RMSTs has been utilized to quantify the between-group difference as a clinically interpretable alternative summary to the hazard ratio. The choice of the time window [0, t] may be prespecified at the design stage of the study based on clinical considerations. On the other hand, after the survival data have been collected, the choice of time point t could be data-dependent. The standard inferential procedures for the corresponding RMST, which is also data-dependent, ignore this subtle yet important issue. In this paper, we clarify how to make inference about a random “parameter.” Moreover, we demonstrate that under a rather mild condition on the censoring distribution, one can make inference about the RMST up to t, where t is less than or even equal to the largest follow-up time (either observed or censored) in the study. This finding reduces the subjectivity of the choice of t empirically. The proposal is illustrated with the survival data from a primary biliary cirrhosis study, and its finite sample properties are investigated via an extensive simulation study.
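A numpy sketch of the two-arm summary described above: each arm's RMST is the area under its Kaplan–Meier curve up to a common t, and the difference is reported. Here t is chosen data-dependently as the smaller of the two largest follow-up times; the arm labels and survival data are made up, and ties are ignored for simplicity.

```python
import numpy as np

def km_rmst(time, event, tau):
    """Area under the Kaplan-Meier curve on [0, tau] (distinct times assumed)."""
    order = np.argsort(time)
    t, d = time[order], event[order]
    surv = np.cumprod(1.0 - d / np.arange(len(t), 0, -1))
    grid = np.clip(np.concatenate(([0.0], t, [tau])), 0.0, tau)
    s = np.concatenate(([1.0], surv, [surv[-1]]))
    return np.sum(np.diff(grid) * s[:-1])

rng = np.random.default_rng(4)
t0, t1 = rng.exponential(10.0, 60), rng.exponential(14.0, 60)    # made-up event times
c0, c1 = rng.uniform(0.0, 20.0, 60), rng.uniform(0.0, 20.0, 60)  # made-up censoring times
time0, event0 = np.minimum(t0, c0), (t0 <= c0).astype(float)
time1, event1 = np.minimum(t1, c1), (t1 <= c1).astype(float)

tau = min(time0.max(), time1.max())        # data-dependent time window [0, tau]
print("RMST difference:", km_rmst(time1, event1, tau) - km_rmst(time0, event0, tau))
```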

13.
Analysis of doubly-censored survival data, with application to AIDS
This paper proposes nonparametric and weakly structured parametric methods for analyzing survival data in which both the time origin and the failure event can be right- or interval-censored. Such data arise in clinical investigations of the human immunodeficiency virus (HIV) when the infection and clinical status of patients are observed only at several time points. The proposed methods generalize the self-consistency algorithm proposed by Turnbull (1976, Journal of the Royal Statistical Society, Series B 38, 290-295) for singly-censored univariate data, and are illustrated with the results from a study of hemophiliacs who were infected with HIV by contaminated blood factor.
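The Turnbull self-consistency idea that the paper generalizes can be sketched for ordinary interval-censored data: place mass on candidate support points and iterate, redistributing each observation's mass over the points its interval contains. The sketch below uses a fixed grid of support points rather than Turnbull's equivalence classes, and it is the basic singly-censored algorithm only, not the doubly-censored extension proposed in the paper; the example intervals are made up.

```python
import numpy as np

def turnbull(left, right, support, n_iter=500, tol=1e-8):
    """Self-consistency (EM) iteration for interval-censored data (L_i, R_i]."""
    # alpha[i, j] = 1 if support point j can be the failure time of observation i
    alpha = (support[None, :] > left[:, None]) & (support[None, :] <= right[:, None])
    p = np.full(len(support), 1.0 / len(support))        # initial mass function
    for _ in range(n_iter):
        num = alpha * p                                  # mass compatible with each interval
        cond = num / num.sum(axis=1, keepdims=True)      # redistribute within each interval
        p_new = cond.mean(axis=0)                        # self-consistency update
        if np.max(np.abs(p_new - p)) < tol:
            return p_new
        p = p_new
    return p                                             # estimated mass at each support point

# tiny made-up example: intervals (L, R] in which the event is known to have occurred
left = np.array([0.0, 2.0, 1.0, 0.0, 3.0])
right = np.array([2.0, 5.0, 4.0, 3.0, np.inf])           # np.inf = right-censored
support = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
print(turnbull(left, right, support).round(3))
```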

14.
Our nervous system continuously combines new information from our senses with information it has acquired throughout life. Numerous studies have found that human subjects manage this by integrating their observations with their previous experience (priors) in a way that is close to the statistical optimum. However, little is known about the way the nervous system acquires or learns priors. Here we present results from experiments where the underlying distribution of target locations in an estimation task was switched, manipulating the prior subjects should use. Our experimental design allowed us to measure a subject's evolving prior while they learned. We confirm that through extensive practice subjects learn the correct prior for the task. We found that subjects can rapidly learn the mean of a new prior while the variance is learned more slowly and with a variable learning rate. In addition, we found that a Bayesian inference model could predict the time course of the observed learning while offering an intuitive explanation for the findings. The evidence suggests the nervous system continuously updates its priors to enable efficient behavior.
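A minimal Gaussian sketch of the kind of computation implied above: on each trial the estimate is a precision-weighted average of the noisy observation and the currently learned prior, and the prior itself is updated trial by trial. The learning rates, noise levels, and faster-mean-than-variance updating are invented for illustration and are not the authors' fitted model.

```python
import numpy as np

rng = np.random.default_rng(5)

true_prior_mean, true_prior_sd = 10.0, 2.0   # distribution of target locations after the switch
obs_sd = 1.5                                 # sensory noise

# subject's current belief about the prior, updated trial by trial
prior_mean, prior_sd = 0.0, 4.0
lr_mean, lr_var = 0.2, 0.02                  # illustrative learning rates (mean learned faster)

for trial in range(200):
    target = rng.normal(true_prior_mean, true_prior_sd)
    obs = rng.normal(target, obs_sd)
    # Bayes-optimal estimate: precision-weighted average of observation and learned prior
    w = prior_sd**2 / (prior_sd**2 + obs_sd**2)
    estimate = w * obs + (1 - w) * prior_mean
    # slow update of the learned prior toward the observed targets
    prior_mean += lr_mean * (obs - prior_mean)
    prior_sd = np.sqrt(prior_sd**2 + lr_var * ((obs - prior_mean)**2 - prior_sd**2))

print(round(prior_mean, 2), round(prior_sd, 2))
```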

15.
Pennell ML, Dunson DB. Biometrics 2008, 64(2):413-423
Summary. In certain biomedical studies, one may anticipate changes in the shape of a response distribution across the levels of an ordinal predictor. For instance, in toxicology studies, skewness and modality might change as dose increases. To address this issue, we propose a Bayesian nonparametric method for testing for distribution changes across an ordinal predictor. Using a dynamic mixture of Dirichlet processes, we allow the response distribution to change flexibly at each level of the predictor. In addition, by assigning mixture priors to the hyperparameters, we can obtain posterior probabilities of no effect of the predictor and identify the lowest dose level for which there is an appreciable change in distribution. The method also provides a natural framework for performing tests across multiple outcomes. We apply our method to data from a genotoxicity experiment.

16.
Follmann DA, Albert PS. Biometrics 1999, 55(2):603-607
A Bayesian approach to monitoring event rates with censored data is proposed. A Dirichlet prior for discrete time event probabilities is blended with discrete survival times to provide a posterior distribution that is a mixture of Dirichlets. Approximation of the posterior distribution via data augmentation is discussed. Practical issues involved in implementing the procedure are discussed and illustrated with a simulation of the single-arm Cord Blood Transplantation Study, in which 6-month survival is monitored.
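A small Gibbs-style sketch in the spirit of the approach described: discrete-time event probabilities receive a Dirichlet prior, each censored subject's unknown event cell is imputed given survival past the censoring interval (data augmentation), and the completed data give a conjugate Dirichlet update. The prior weights, interval counts, and data are made up and the sketch is not the authors' exact implementation.

```python
import numpy as np

rng = np.random.default_rng(6)

K = 6                                                   # discrete time intervals (e.g., months)
prior = np.full(K + 1, 0.5)                             # Dirichlet prior; cell K = "beyond K"
event_interval = np.array([1, 2, 2, 4, 6, 3, 5])        # made-up observed event intervals (1..K)
censor_interval = np.array([2, 3, 5, 5, 6])             # made-up censoring intervals

obs_counts = np.bincount(event_interval - 1, minlength=K + 1).astype(float)
p = rng.dirichlet(prior + obs_counts)                   # initial draw ignoring censored subjects

draws = []
for it in range(2000):
    counts = obs_counts.copy()
    # data augmentation: impute the cell of each censored subject, given survival past c
    for c in censor_interval:
        cells = np.arange(c, K + 1)
        counts[rng.choice(cells, p=p[cells] / p[cells].sum())] += 1.0
    p = rng.dirichlet(prior + counts)                   # conjugate Dirichlet update
    draws.append(p)

post_mean = np.mean(draws[500:], axis=0)                # posterior mean cell probabilities
print("posterior mean P(event in interval 1..K, beyond K):", post_mean.round(3))
```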

17.
Accurate estimates of population parameters are vital for estimating extinction risk. Such parameters, however, are typically not available for threatened populations. We used a recently developed software tool based on Markov chain Monte Carlo methods for carrying out Bayesian inference (the BUGS package) to estimate four demographic parameters (the intrinsic growth rate, the strength of density dependence, and the demographic and environmental variances) for three species of small temperate passerines, using two time-series data sets from a dipper and a song sparrow population and previously obtained frequentist estimates of the same parameters in the great tit. By simultaneously modeling variation in these demographic parameters across species and using the resulting distributions as priors in the estimation for individual species, we improve the estimates for each individual species. This framework also allows us to make probabilistic statements about plausible parameter values for small temperate passerine birds in general, which is often critically needed in the management of species for which little or no data are available. We also discuss how our work relates to recently developed theory on dynamic stochastic population models, and finally note some important differences between frequentist and Bayesian methods.

18.
The increasing ability to extract and sequence DNA from noncontemporaneous tissue offers biologists the opportunity to analyse ancient DNA (aDNA) together with modern DNA (mDNA) to address the taxonomy of extinct species, evolutionary origins, historical phylogeography and biogeography. Perhaps more exciting are recent developments in coalescence-based Bayesian inference that offer the potential to use temporal information from aDNA and mDNA for the estimation of substitution rates and divergence dates as an alternative to fossil and geological calibration. This comes at a time of growing interest in the possibility of time dependency for molecular rate estimates. In this study, we provide a critical assessment of Bayesian Markov chain Monte Carlo (MCMC) analysis for the estimation of substitution rate using simulated samples of aDNA and mDNA. We conclude that the current models and priors employed in Bayesian MCMC analysis of heterochronous mtDNA are susceptible to an upward bias in the estimation of substitution rates because of model misspecification when the data come from populations with less than simple demographic histories, including sudden short-lived population bottlenecks or pronounced population structure. However, when model misspecification is only mild, the 95% highest posterior density intervals provide adequate frequentist coverage of the true rates.

19.
Kim YJ. Biometrics 2006, 62(2):458-464
In doubly censored failure time data, the survival time of interest is defined as the elapsed time between an initial event and a subsequent event, and the occurrences of both events cannot be observed exactly. Instead, only right- or interval-censored observations on the occurrence times are available. For the analysis of such data, a number of methods have been proposed under the assumption that the survival time of interest is independent of the occurrence time of the initial event. This article investigates a different situation where the independence may not be true, with the focus on regression analysis of doubly censored data. Cox frailty models are applied to describe the effects of covariates, and an EM algorithm is developed for estimation. Simulation studies are performed to investigate finite sample properties of the proposed method, and an illustrative example from an acquired immune deficiency syndrome (AIDS) cohort study is provided.

20.
Recent studies have observed that Bayesian analyses of sequence data sets using the program MrBayes sometimes generate extremely large branch lengths, with posterior credibility intervals for the tree length (sum of branch lengths) excluding the maximum likelihood estimates. Suggested explanations for this phenomenon include the existence of multiple local peaks in the posterior, lack of convergence of the chain in the tail of the posterior, mixing problems, and misspecified priors on branch lengths. Here, we analyze the behavior of Bayesian Markov chain Monte Carlo algorithms when the chain is in the tail of the posterior distribution and note that all these phenomena can occur. In Bayesian phylogenetics, the likelihood function approaches a constant instead of zero when the branch lengths increase to infinity. The flat tail of the likelihood can cause poor mixing and undue influence of the prior. We suggest that the main cause of the extreme branch length estimates produced in many Bayesian analyses is the poor choice of a default prior on branch lengths in current Bayesian phylogenetic programs. The default prior in MrBayes assigns independent and identical distributions to branch lengths, imposing strong (and unreasonable) assumptions about the tree length. The problem is exacerbated by the strong correlation between the branch lengths and parameters in models of variable rates among sites or among site partitions. To resolve the problem, we suggest two multivariate priors for the branch lengths (called compound Dirichlet priors) that are fairly diffuse and demonstrate their utility in the special case of branch length estimation on a star phylogeny. Our analysis highlights the need for careful thought in the specification of high-dimensional priors in Bayesian analyses.
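The implied tree-length prior under the default independent and identical branch-length priors can be written down directly; the sketch below uses our own notation, with the gamma form standing in for one of the two compound Dirichlet priors mentioned above.

```latex
% Implied prior on the tree length under i.i.d. exponential branch-length priors
% on the n branch lengths b_1, \dots, b_n:
b_i \stackrel{\text{iid}}{\sim} \operatorname{Exp}(\lambda), \qquad
T = \sum_{i=1}^{n} b_i \sim \operatorname{Gamma}(n, \lambda), \qquad
\operatorname{E}(T) = \frac{n}{\lambda}, \qquad
\operatorname{CV}(T) = \frac{1}{\sqrt{n}} .

% The induced tree-length prior thus concentrates around n/\lambda as the number of
% branches grows, a strong and unintended assumption.  A compound Dirichlet prior
% instead specifies the tree length and its partition into branch lengths separately:
T \sim \operatorname{Gamma}(\alpha_T, \beta_T), \qquad
\left(\tfrac{b_1}{T}, \dots, \tfrac{b_n}{T}\right) \sim
\operatorname{Dirichlet}(\alpha, \dots, \alpha) .
```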
