20 similar articles found (search time: 15 ms)
1.
On priors providing frequentist validity for Bayesian inference (total citations: 4; self-citations: 0; citations by others: 4)
2.
Recent studies have observed that Bayesian analyses of sequence data sets using the program MrBayes sometimes generate extremely large branch lengths, with posterior credibility intervals for the tree length (sum of branch lengths) excluding the maximum likelihood estimates. Suggested explanations for this phenomenon include the existence of multiple local peaks in the posterior, lack of convergence of the chain in the tail of the posterior, mixing problems, and misspecified priors on branch lengths. Here, we analyze the behavior of Bayesian Markov chain Monte Carlo algorithms when the chain is in the tail of the posterior distribution and note that all these phenomena can occur. In Bayesian phylogenetics, the likelihood function approaches a constant instead of zero when the branch lengths increase to infinity. The flat tail of the likelihood can cause poor mixing and undue influence of the prior. We suggest that the main cause of the extreme branch length estimates produced in many Bayesian analyses is the poor choice of a default prior on branch lengths in current Bayesian phylogenetic programs. The default prior in MrBayes assigns independent and identical distributions to branch lengths, imposing strong (and unreasonable) assumptions about the tree length. The problem is exacerbated by the strong correlation between the branch lengths and parameters in models of variable rates among sites or among site partitions. To resolve the problem, we suggest two multivariate priors for the branch lengths (called compound Dirichlet priors) that are fairly diffuse and demonstrate their utility in the special case of branch length estimation on a star phylogeny. Our analysis highlights the need for careful thought in the specification of high-dimensional priors in Bayesian analyses.
3.
4.
The problem of inferring haplotypes from genotypes of single nucleotide polymorphisms (SNPs) is essential for the understanding of genetic variation within and among populations, with important applications to the genetic analysis of disease propensities and other complex traits. The problem can be formulated as a mixture model, where the mixture components correspond to the pool of haplotypes in the population. The size of this pool is unknown; indeed, knowing the size of the pool would correspond to knowing something significant about the genome and its history. Thus methods for fitting the genotype mixture must crucially address the problem of estimating a mixture with an unknown number of mixture components. In this paper we present a Bayesian approach to this problem based on a nonparametric prior known as the Dirichlet process. The model also incorporates a likelihood that captures statistical errors in the haplotype/genotype relationship, trading off these errors against the size of the pool of haplotypes. We describe an algorithm based on Markov chain Monte Carlo for posterior inference in our model. The overall result is a flexible Bayesian method, referred to as DP-Haplotyper, that is reminiscent of parsimony methods in its preference for small haplotype pools. We further generalize the model to treat pedigree relationships (e.g., trios) between the population's genotypes. We apply DP-Haplotyper to the analysis of both simulated and real genotype data, and compare it to extant methods.
5.
6.
7.
Bayesian inference for small-sample capture-recapture data (total citations: 1; self-citations: 0; citations by others: 1)
Chavez-Demoulin V 《Biometrics》1999,55(3):727-731
We consider data on the survival of a population of Cephalorhynchus hectori, Hector's dolphins, in a marine area of New Zealand. To estimate survival probabilities of animal populations, a multiple capture-recapture sampling scheme can be used. In this paper, we propose a practical methodology to derive approximations to posterior distributions based on Laplace methods. We show how to calculate Bayes estimates and credible intervals in this setting.
8.
9.
10.
We propose a class of longitudinal data models with random effects that generalizes currently used models in two important ways. First, the random-effects model is a flexible mixture of multivariate normals, accommodating population heterogeneity, outliers, and nonlinearity in the regression on subject-specific covariates. Second, the model includes a hierarchical extension to allow for meta-analysis over related studies. The random-effects distributions are decomposed into one part that is common across all related studies (common measure), and one part that is specific to each study and that captures the variability intrinsic between patients within the same study. Both the common measure and the study-specific measures are parameterized as mixture-of-normals models. We carry out inference using reversible jump posterior simulation to allow a random number of terms in the mixtures. The sampler takes advantage of the small number of entertained models. The motivating application is the analysis of two studies carried out by the Cancer and Leukemia Group B (CALGB). In both studies, we record for each patient white blood cell counts (WBC) over time to characterize the toxic effects of treatment. The WBCs are modeled through a nonlinear hierarchical model that gathers the information from both studies.
11.
Greenland S 《Biometrics》2003,59(1):92-99
Conjugate priors for Bayesian analyses of relative risks can be quite restrictive, because their shape depends on their location. By introducing a separate location parameter, however, these priors generalize to allow modeling of a broad range of prior opinions, while still preserving the computational simplicity of conjugate analyses. The present article illustrates the resulting generalized conjugate analyses using examples from case-control studies of the association of residential wire codes and magnetic fields with childhood leukemia.
12.
Alex Diana, Emily Beth Dennis, Eleni Matechou, Byron John Treharne Morgan 《Biometrics》2023,79(3):2503-2515
In recent years, the study of species' occurrence has benefited from the increased availability of large-scale citizen-science data. While abundance data from standardized monitoring schemes are biased toward well-studied taxa and locations, opportunistic data are available for many taxonomic groups, from a large number of locations and across long timescales. Hence, these data provide opportunities to measure species' changes in occurrence, particularly through the use of occupancy models, which account for imperfect detection. These opportunistic datasets can be substantially large, numbering hundreds of thousands of sites, and hence present a challenge from a computational perspective, especially within a Bayesian framework. In this paper, we develop a unifying framework for Bayesian inference in occupancy models that account for both spatial and temporal autocorrelation. We make use of the Pólya-Gamma scheme, which allows for fast inference, and incorporate spatio-temporal random effects using Gaussian processes (GPs), for which we consider two efficient approximations: subset of regressors and nearest neighbor GPs. We apply our model to data on two UK butterfly species, one common and widespread and one rare, using records from the Butterflies for the New Millennium database, producing occupancy indices spanning 45 years. Our framework can be applied to a wide range of taxa, providing measures of variation in species' occurrence, which are used to assess biodiversity change.
13.
The problem of simultaneous covariate selection and parameter inference for spatial regression models is considered. Previous research has shown that failure to take spatial correlation into account can influence the outcome of standard model selection methods. A Markov chain Monte Carlo (MCMC) method is investigated for the calculation of parameter estimates and posterior model probabilities for spatial regression models. The method can accommodate normal and non-normal response data and a large number of covariates. Thus the method is very flexible and can be used to fit spatial linear models, spatial linear mixed models, and spatial generalized linear mixed models (GLMMs). The Bayesian MCMC method also allows a priori unequal weighting of covariates, which is not possible with many model selection methods such as Akaike's information criterion (AIC). The proposed method is demonstrated on two data sets. The first is the whiptail lizard data set, which has been previously analyzed by other researchers investigating model selection methods. Our results confirmed the previous analysis suggesting that sandy soil and ant abundance were strongly associated with lizard abundance. The second data set concerned pollution-tolerant fish abundance in relation to several environmental factors. Results indicate that abundance is positively related to Strahler stream order and a habitat quality index. Abundance is negatively related to percent watershed disturbance.
14.
15.
Bayesian inference for a bivariate binomial distribution (total citations: 1; self-citations: 0; citations by others: 1)
16.
Systems neuroscience traditionally conceptualizes a population of spiking neurons as merely encoding the value of a stimulus. Yet, psychophysics has revealed that people take into account stimulus uncertainty when performing sensory or motor computations and do so in a nearly Bayes-optimal way. This suggests that neural populations do not encode just a single value but an entire probability distribution over the stimulus. Several such probabilistic codes have been proposed, including one that utilizes the structure of neural variability to enable simple neural implementations of probabilistic computations such as optimal cue integration. This approach provides a quantitative link between Bayes-optimal behaviors and specific neural operations. It allows for novel ways to evaluate probabilistic codes and for predictions for physiological population recordings.
17.
Bognar MA 《Biometrical journal. Biometrische Zeitschrift》2006,48(2):205-219
The K function is a summary of spatial dependence in spatial point processes. In practice one observes a realization of the spatial point process, called a spatial point pattern. Although the K function of a spatial point process is typically unknown, several estimators of the process K function have been put forth. These estimators, however, are based upon empirical averages; the complicated distributional properties of the estimators unfortunately complicate interval estimation. In this paper, we propose a Bayesian inferential framework, allowing inference for the K function of the spatial point process (including interval estimation). Of particular interest is the unique use of the posterior predictive distribution to (efficiently) enable such inferences. To demonstrate our technique, the well-known Swedish pine sapling data (Strand, 1972) are analyzed, including a discussion on evaluating model fit.
18.
Bayesian inference has emerged as a general framework that captures how organisms make decisions under uncertainty. Recent experimental findings reveal disparate mechanisms for how the brain generates behaviors predicted by normative Bayesian theories. Here, we identify two broad classes of neural implementations for Bayesian inference: a modular class, where each probabilistic component of Bayesian computation is independently encoded, and a transform class, where uncertain measurements are converted to Bayesian estimates through latent processes. Many recent experimental neuroscience findings studying probabilistic inference broadly fall into these classes. We identify potential avenues for synthesis across these two classes and the disparities that, at present, cannot be reconciled. We conclude that to distinguish among implementation hypotheses for Bayesian inference, we require greater engagement among theoretical and experimental neuroscientists in an effort that spans different scales of analysis, circuits, tasks, and species.
19.
Russell Schwartz, Bjarni V Halldórsson, Vineet Bafna, Andrew G Clark, Sorin Istrail 《Journal of computational biology》2003,10(1):13-19
In this report, we examine the validity of the haplotype block concept by comparing block decompositions derived from public data sets by variants of several leading methods of block detection. We first develop a statistical method for assessing the concordance of two block decompositions. We then assess the robustness of inferred haplotype blocks to the specific detection method chosen, to arbitrary choices made in the block-detection algorithms, and to the sample analyzed. Although the block decompositions show levels of concordance that are very unlikely by chance, the absolute magnitude of the concordance may be low enough to limit the utility of the inference. For purposes of SNP selection, it seems likely that methods that do not arbitrarily impose block boundaries among correlated SNPs might perform better than block-based methods.
20.
Ronquist F 《Trends in ecology & evolution》2004,19(9):475-481
Much recent progress in evolutionary biology is based on the inference of ancestral states and past transformations in important traits on phylogenetic trees. These exercises often assume that the tree is known without error and that ancestral states and character change can be mapped onto it exactly. In reality, there is often considerable uncertainty about both the tree and the character mapping. Recently introduced Bayesian statistical methods enable the study of character evolution while simultaneously accounting for both phylogenetic and mapping uncertainty, adding much-needed credibility to the reconstruction of evolutionary history.