Similar articles
 Found 20 similar articles; search took 234 ms
1.
Jingru Zhang  Wei Lin 《Biometrics》2019,75(4):1098-1108
Clustered multinomial data are prevalent in a variety of applications such as microbiome studies, where metagenomic sequencing data are summarized as multinomial counts for a large number of bacterial taxa per subject. Count normalization with ad hoc zero adjustment tends to result in poor estimates of abundances for taxa with zero or small counts. To account for heterogeneity and overdispersion in such data, we suggest using the logistic normal multinomial (LNM) model with an arbitrary correlation structure to simultaneously estimate the taxa compositions by borrowing information across subjects. We overcome the computational difficulties in high dimensions by developing a stochastic approximation EM algorithm with Hamiltonian Monte Carlo sampling for scalable parameter estimation in the LNM model. The ill‐conditioning problem due to unstructured covariance is further mitigated by a covariance‐regularized estimator with a condition number constraint. The advantages of the proposed methods are illustrated through simulations and an application to human gut microbiome data.

2.
Abstract

In the present paper, computational efficiency of the hybrid Monte Carlo (HMC) method applied to the multicanonical ensemble is studied; the HMC is an equation of motion guided Monte Carlo method. As in the standard HMC for the canonical ensemble, the multicanonical HMC calculations with high acceptance ratio show better efficiency; about 60% acceptance yields the best performance for the system examined.

3.
A Model for Analysis of Population Structure   Total citations: 5 (self-citations: 3, citations by others: 2)
Arguments have been presented for the appropriateness of a multinomial Dirichlet distribution for describing single-locus genotypic frequencies in a subdivided population. This distribution is defined as a function of allele frequency, the average (over the entire population) inbreeding coefficient and the correlation between genotypes within a subdivision. Alternative parameterizations and their genetic interpretations are given. We then show how information from a sample drawn from this subdivided population, in the absence of pedigrees, can be combined with the multinomial Dirichlet model to form a likelihood function. This likelihood function is then used as the basis for estimation and testing hypotheses concerning the genetic parameters of the model. Comparisons of this approach to the alternative procedure of Cockerham (1969, 1973) are made using human data obtained from Tecumseh, Michigan and Monte Carlo simulations. Finally, implications of these results to statistical inference and to mutation rates are presented.

4.
Holmes I  Harris K  Quince C 《PloS one》2012,7(2):e30126
We introduce Dirichlet multinomial mixtures (DMM) for the probabilistic modelling of microbial metagenomics data. This data can be represented as a frequency matrix giving the number of times each taxon is observed in each sample. The samples have different sizes, and the matrix is sparse, as communities are diverse and skewed to rare taxa. Most methods used previously to classify or cluster samples have ignored these features. We describe each community by a vector of taxa probabilities. These vectors are generated from one of a finite number of Dirichlet mixture components each with different hyperparameters. Observed samples are generated through multinomial sampling. The mixture components cluster communities into distinct 'metacommunities', and, hence, determine envirotypes or enterotypes, groups of communities with a similar composition. The model can also deduce the impact of a treatment and be used for classification. We wrote software for the fitting of DMM models using the 'evidence framework' (http://code.google.com/p/microbedmm/). This includes the Laplace approximation of the model evidence. We applied the DMM model to human gut microbe genera frequencies from Obese and Lean twins. From the model evidence four clusters fit this data best. Two clusters were dominated by Bacteroides and were homogenous; two had a more variable community composition. We could not find a significant impact of body mass on community structure. However, Obese twins were more likely to derive from the high variance clusters. We propose that obesity is not associated with a distinct microbiota but increases the chance that an individual derives from a disturbed enterotype. This is an example of the 'Anna Karenina principle (AKP)' applied to microbial communities: disturbed states having many more configurations than undisturbed. We verify this by showing that in a study of inflammatory bowel disease (IBD) phenotypes, ileal Crohn's disease (ICD) is associated with a more variable community.

5.
Zhu J  Eickhoff JC  Yan P 《Biometrics》2005,61(3):674-683
Observations of multiple-response variables across space and over time occur often in environmental and ecological studies. Compared to purely spatial models for a single response variable in the exponential family of distributions, fewer statistical tools are available for multiple-response variables that are not necessarily Gaussian. An exception is a common-factor model developed for multivariate spatial data by Wang and Wall (2003, Biostatistics 4, 569-582). The purpose of this article is to extend this multivariate space-only model and develop a flexible class of generalized linear latent variable models for multivariate spatial-temporal data. For statistical inference, maximum likelihood estimates and their standard deviations are obtained using a Monte Carlo EM algorithm. We also use a novel way to automatically adjust the Monte Carlo sample size, which facilitates the convergence of the Monte Carlo EM algorithm. The methodology is illustrated by an ecological study of red pine trees in response to bark beetle challenges in a forest stand of Wisconsin.

6.
Monte Carlo methods have received much attention in the recent literature of phylogeny analysis. However, the conventional Markov chain Monte Carlo algorithms, such as the Metropolis–Hastings algorithm, tend to get trapped in a local mode in simulating from the posterior distribution of phylogenetic trees, rendering the inference ineffective. In this paper, we apply an advanced Monte Carlo algorithm, the stochastic approximation Monte Carlo (SAMC) algorithm, to Bayesian phylogeny analysis. Our method is compared with two popular Bayesian phylogeny software packages, BAMBE and MrBayes, on simulated and real datasets. The numerical results indicate that our method outperforms BAMBE and MrBayes. Among the three methods, SAMC produces the consensus trees which have the highest similarity to the true trees, and the model parameter estimates which have the smallest mean square errors, but costs the least CPU time.

7.
In quantitative genetics, Markov chain Monte Carlo (MCMC) methods are indispensable for statistical inference in non-standard models like generalized linear models with genetic random effects or models with genetically structured variance heterogeneity. A particular challenge for MCMC applications in quantitative genetics is to obtain efficient updates of the high-dimensional vectors of genetic random effects and the associated covariance parameters. We discuss various strategies to approach this problem including reparameterization, Langevin-Hastings updates, and updates based on normal approximations. The methods are compared in applications to Bayesian inference for three data sets using a model with genetically structured variance heterogeneity.

8.
For applying Poisson sequential sampling in safety monitoring concerning new medical conditions (as well as in general control of safety) the one-sample Poisson sequential probability ratio test (SPRT) is modified. The modification results in an adaptive procedure with respect to the smallest detectable relative risk at any time point in the course of the monitoring process. Although this quality would imply maximal safety, the result must be seen as a modification towards maximal safety since the Wald constants of the Poisson SPRT, basic for the derivation of the proposed procedure, are approximate only. The procedure may detect high relative risks even by one observed event right after start of monitoring. The alternative hypothesis must not be specified and the chosen error probabilities α and β completely specify the sampling scheme which can be used with any value of the null hypothesis. The power function and run lengths are established by Monte Carlo runs for an exemplifying choice of α = 10% and β = 5%. Three specified SPRTs are evaluated, by Monte Carlo runs also, to compare their results with those of the modified procedure.

9.
The Monte Carlo technique is considered the gold standard when it comes to patient-specific dosimetry. Any newly developed Monte Carlo simulation framework, however, has to be carefully calibrated and validated prior to its use. For many researchers this is tedious work. We propose a two-step validation procedure for our newly built Monte Carlo framework and provide all input data to make it feasible for future related application by the wider community. The validation was at first performed by benchmarking against simulation data available in literature. The American Association of Physicists in Medicine (AAPM) report of task group 195 (case 2) was considered most appropriate for our application. Secondly, the framework was calibrated and validated against experimental measurements for trunk X-ray imaging protocols using a water phantom. The dose results obtained from all simulations and measurements were compared. Our Monte Carlo framework proved to agree with literature data, by showing a maximal difference below 4% to the AAPM report. The mean difference with the water phantom measurements was around 7%. The statistical uncertainty for clinical applications of the dosimetry model is expected to be within 10%. This makes it reliable for clinical dose calculations in general radiology. Input data and the described procedure allow for the validation of other Monte Carlo frameworks.

10.
Abstract

The principal purpose of this paper is to demonstrate the use of the Inverse Monte Carlo technique for calculating pair interaction energies in monoatomic liquids from a given equilibrium property. This method is based on the mathematical relation between transition probability and pair potential given by the fundamental equation of the “importance sampling” Monte Carlo method. In order to have well defined conditions for the test of the Inverse Monte Carlo method, a Metropolis Monte Carlo simulation of a Lennard-Jones liquid is carried out to give the equilibrium pair correlation function determined by the assumed potential. Because an equilibrium configuration is prerequisite for an Inverse Monte Carlo simulation, a model system is generated reproducing the pair correlation function, which has been calculated by the Metropolis Monte Carlo simulation and therefore representing the system in thermal equilibrium. This configuration is used to simulate virtual atom displacements. The resulting changes in atom distribution for each single simulation step are inserted in a set of non-linear equations defining the transition probability for the virtual change of configuration. The solution of the set of equations for pair interaction energies yields the Lennard-Jones potential by which the equilibrium configuration has been determined.

11.
Spatial weed count data are modeled and predicted using a generalized linear mixed model combined with a Bayesian approach and Markov chain Monte Carlo. Informative priors for a data set with sparse sampling are elicited using a previously collected data set with extensive sampling. Furthermore, we demonstrate that so-called Langevin-Hastings updates are useful for efficient simulation of the posterior distributions, and we discuss computational issues concerning prediction.

12.
MacNab YC 《Biometrics》2003,59(2):305-315
We present Bayesian hierarchical spatial models for spatially correlated small-area health service outcome and utilization rates, with a particular emphasis on the estimation of both measured and unmeasured or unknown covariate effects. This Bayesian hierarchical model framework enables simultaneous modeling of fixed covariate effects and random residual effects. The random effects are modeled via Bayesian prior specifications reflecting spatial heterogeneity globally and relative homogeneity among neighboring areas. The model inference is implemented using Markov chain Monte Carlo methods. Specifically, a hybrid Markov chain Monte Carlo algorithm (Neal, 1995, Bayesian Learning for Neural Networks; Gustafson, MacNab, and Wen, 2003, Statistics and Computing, to appear) is used for posterior sampling of the random effects. To illustrate relevant problems, methods, and techniques, we present an analysis of regional variation in intraventricular hemorrhage incidence rates among neonatal intensive care unit patients across Canada.

13.
Leonov H  Mitchell JS  Arkin IT 《Proteins》2003,51(3):352-359
The estimation of the number of protein folds in nature is a matter of considerable interest. In this study, a Monte Carlo method employing the broken stick model is used to assign a given number of proteins into a given number of folds. Subsequently, random, integer, non-repeating numbers are generated in order to simulate the process of fold discovery. With this conceptual framework at hand, the effects of two factors upon the fold identification process were investigated: (1) the nature of folds distributions and (2) preferential sampling bias of previously identified folds. Depending on the type of distribution, dividing 100,000 proteins into 1,000 folds resulted in 10-30% of the folds having 10 proteins or less per fold, approximately 10% of the folds having 10-20 proteins per fold, 31-45% having 20-100 proteins per fold, and >30% of the folds having more than 100 proteins per fold. After randomly sampling one tenth of the proteins, 68-96% of the folds were identified. These percentages depend both on folds distribution and biased/non-biased sampling. Only upon increasing the sampling bias for previously identified folds to 1,000, did the model result in a reduction of the number of proteins identified by an order of magnitude (approximately 9%). Thus, assuming the structures of one tenth of the population of proteins in nature have been solved, the results of the Monte Carlo simulation are more consistent with recent lower estimates of the number of folds,

14.
Very long model chains may be produced in a highly efficient manner using dynamic Monte Carlo methods. As any dynamic Monte Carlo procedure transforms one chain into another one, some starting configuration is necessary. This might be an unbiased self-avoiding walk (SAW) obtained by any static method, or an arbitrary configuration, e.g. a rodlike chain, equilibrated by a sufficiently large number of relaxations, the corresponding chains not being used for data sampling. An alternative method is to start with a non-reversal random walk (NRRW) and to apply a dynamic Monte Carlo procedure under the constraint that the new chain must have a smaller (or at least an equal) number of double occupancies than the old one. The properties of those chains that are free of overlaps for the first time (FSAWs) are strongly dependent on the relaxation mechanism chosen. Whereas FSAWs obtained by local motions are very similar to the (initial) NRRWs on a macroscopic scale, pivot algorithms and reptation yield configurations with properties comparable to unbiased self-avoiding chains. When reptation is used and the relaxation is continued until each bond of the initial NRRW is replaced by a new bond (if the chain is self-avoiding earlier) no further equilibration is necessary prior to data sampling.

15.
We introduce the Bayesian skyline plot, a new method for estimating past population dynamics through time from a sample of molecular sequences without dependence on a prespecified parametric model of demographic history. We describe a Markov chain Monte Carlo sampling procedure that efficiently samples a variant of the generalized skyline plot, given sequence data, and combines these plots to generate a posterior distribution of effective population size through time. We apply the Bayesian skyline plot to simulated data sets and show that it correctly reconstructs demographic history under canonical scenarios. Finally, we compare the Bayesian skyline plot model to previous coalescent approaches by analyzing two real data sets (hepatitis C virus in Egypt and mitochondrial DNA of Beringian bison) that have been previously investigated using alternative coalescent methods. In the bison analysis, we detect a severe but previously unrecognized bottleneck, estimated to have occurred 10,000 radiocarbon years ago, which coincides with both the earliest undisputed record of large numbers of humans in Alaska and the megafaunal extinctions in North America at the beginning of the Holocene.

16.
Bayesian inference is a powerful statistical paradigm that has gained popularity in many fields of science, but adoption has been somewhat slower in biophysics. Here, I provide an accessible tutorial on the use of Bayesian methods by focusing on example applications that will be familiar to biophysicists. I first discuss the goals of Bayesian inference and show simple examples of posterior inference using conjugate priors. I then describe Markov chain Monte Carlo sampling and, in particular, discuss Gibbs sampling and Metropolis random walk algorithms with reference to detailed examples. These Bayesian methods (with the aid of Markov chain Monte Carlo sampling) provide a generalizable way of rigorously addressing parameter inference and identifiability for arbitrarily complicated models.

17.
Kenneth Lange 《Genetica》1995,96(1-2):107-117
The Dirichlet distribution provides a convenient conjugate prior for Bayesian analyses involving multinomial proportions. In particular, allele frequency estimation can be carried out with a Dirichlet prior. If data from several distinct populations are available, then the parameters characterizing the Dirichlet prior can be estimated by maximum likelihood and then used for allele frequency estimation in each of the separate populations. This empirical Bayes procedure tends to moderate extreme multinomial estimates based on sample proportions. The Dirichlet distribution can also be employed to model the contributions from different ancestral populations in computing forensic match probabilities. If the ancestral populations are in genetic equilibrium, then the product rule for computing match probabilities is valid conditional on the ancestral contributions to a typical person of the reference population. This fact facilitates computation of match probabilities and tight upper bounds to match probabilities. Editor's comments: The author continues the formal Bayesian analysis introduced by Gjertson & Morris in this volume. He invokes Dirichlet distributions, and so brings rigor to the discussion of the effects of population structure on match probabilities. The increased computational burden this approach entails should not be regarded as a hindrance.

18.
Sample data from a number of sub-populations are often investigated in order to integrate the findings of different research studies on a particular area. In the case of compositional samples, like the allele frequencies collected at a single locus in different surveys, the data are independent multinomial vectors. Each multinomial distribution depends on a specific probability vector, that is, the unknown relative composition of the sub-population. A Bayesian hierarchical approach is proposed here to model the variability of the sub-composition vectors around a common mean with possibly different scales. The common mean can be seen as the relative composition of the aggregated population. The scale parameters are well known in biology as Wright's inbreeding coefficients. The method presented here extends some previous work by assuming less prior knowledge on the subject and fewer constraints on the model. A relatively simple Monte Carlo algorithm is described to perform joint inferences on general and local compositions and inbreeding coefficients. The method is applied to two case studies. The first one is based on DNA samples from ten Italian regions at the loci TH01 and FES, obtained from a database currently used for forensic identification, in which inbreeding assessments can be crucial. The second application is based on a set of colour-blind sample rates in North-East Indian populations collected by Choudhury (1994). The author found some controversial results from the classical test for comparing proportions. A clearer picture, instead, is obtained by the current Bayesian approach.

19.
Abstract

The Detailed Balance Energy-scaled Displacement Monte Carlo method that stems from the previously published Energy Scaled Displacement Monte Carlo method is presented. The results of tests performed on a dense Lennard-Jones liquid and on two particles in one dimension are reported.

20.
Stepwise regression is often used in ecology to identify critical factors. From a large number of possible predictors, the procedure selects the subset generating the highest coefficient of determination, R². This work presents a method for testing the significance of this coefficient. Monte Carlo simulations are used to calculate the statistical distribution of R² under the null hypothesis that the response variable is independent of the predictors. The method is illustrated by an application to a previously published analysis of the Canadian lynx population cycle where more than 75% of the variance could be explained by four meteorological factors.
