Similar literature
1.
Kitada S, Hayashi T, Kishino H. Genetics 2000, 156(4):2063-2079
We developed an empirical Bayes procedure to estimate genetic distances between populations using allele frequencies. This procedure makes it possible to describe the skewness of the genetic distance while taking full account of the uncertainty of the sample allele frequencies. Dirichlet priors of the allele frequencies are specified, and the posterior distributions of the various composite parameters are obtained by Monte Carlo simulation. To avoid overdependence on subjective priors, we adopt a hierarchical model and estimate hyperparameters by maximizing the joint marginal-likelihood function. Taking advantage of the empirical Bayesian procedure, we extend the method to estimate the effective population size using temporal changes in allele frequencies. The method is applied to data sets on red sea bream, herring, northern pike, and ayu broodstock. It is shown that overdispersion overestimates the genetic distance and underestimates the effective population size, if it is not taken into account during the analysis. The joint marginal-likelihood function also estimates the rate of gene flow into island populations.
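
As a rough illustration of the Monte Carlo step described in this abstract, the sketch below draws allele-frequency vectors from Dirichlet posteriors for two populations and summarizes the resulting distribution of a distance between them. The squared Euclidean distance, the fixed prior values, the allele counts, and the function names are all assumptions made for the example; the abstract's method estimates the hyperparameters by maximizing a marginal likelihood, which is not attempted here.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one vector from Dirichlet(alpha) by normalizing gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def posterior_distance_samples(counts_a, counts_b, prior, n_draws=2000, seed=0):
    """Monte Carlo draws of a distance between two populations.

    Each population's allele frequencies get a Dirichlet posterior
    (prior pseudo-counts plus observed counts). The distance used here,
    squared Euclidean, is an illustrative choice only.
    """
    rng = random.Random(seed)
    post_a = [c + p for c, p in zip(counts_a, prior)]
    post_b = [c + p for c, p in zip(counts_b, prior)]
    samples = []
    for _ in range(n_draws):
        p = sample_dirichlet(post_a, rng)
        q = sample_dirichlet(post_b, rng)
        samples.append(sum((x - y) ** 2 for x, y in zip(p, q)))
    return samples

# Hypothetical allele counts at one locus with three alleles.
samples = sorted(posterior_distance_samples([30, 15, 5], [10, 25, 15],
                                            prior=[1.0, 1.0, 1.0]))
mean = sum(samples) / len(samples)
median = samples[len(samples) // 2]
print(f"posterior mean {mean:.4f}, median {median:.4f}")
```

Comparing the mean with the median of the draws gives a quick sense of the skewness that the abstract says the procedure is meant to describe.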

2.
Rosner GL. Biometrics 2005, 61(1):239-245
This article presents an aid for monitoring clinical trials with failure-time endpoints based on the Bayesian nonparametric analyses of the data. The posterior distribution is a mixture of Dirichlet processes in the presence of censoring if one assumes a Dirichlet process prior for the survival distribution. Using Gibbs sampling, one can generate random samples from the posterior distribution. With samples from the posterior distributions of treatment-specific survival curves, one can evaluate the current evidence in favor of stopping or continuing the trial based on summary statistics of these survival curves. Because the method is nonparametric, it can easily be used, for example, in situations where hazards cross or are suspected to cross and where relevant clinical decisions might be based on estimating when the integral between the curves might be expected to become positive and in favor of the new but toxic therapy. An example based on an actual trial illustrates the method.

3.
Kenneth Lange. Genetica 1995, 96(1-2):107-117
The Dirichlet distribution provides a convenient conjugate prior for Bayesian analyses involving multinomial proportions. In particular, allele frequency estimation can be carried out with a Dirichlet prior. If data from several distinct populations are available, then the parameters characterizing the Dirichlet prior can be estimated by maximum likelihood and then used for allele frequency estimation in each of the separate populations. This empirical Bayes procedure tends to moderate extreme multinomial estimates based on sample proportions. The Dirichlet distribution can also be employed to model the contributions from different ancestral populations in computing forensic match probabilities. If the ancestral populations are in genetic equilibrium, then the product rule for computing match probabilities is valid conditional on the ancestral contributions to a typical person of the reference population. This fact facilitates computation of match probabilities and tight upper bounds to match probabilities.
Editor's comments: The author continues the formal Bayesian analysis introduced by Gjertson & Morris in this volume. He invokes Dirichlet distributions, and so brings rigor to the discussion of the effects of population structure on match probabilities. The increased computational burden this approach entails should not be regarded as a hindrance.
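
The conjugacy mentioned in this abstract can be illustrated with a few lines of code. With a Dirichlet(alpha) prior and multinomial counts n, the posterior is Dirichlet(alpha + n), so the posterior mean of each proportion is (n_k + alpha_k) / (N + sum(alpha)). The sketch below uses made-up counts and a fixed, assumed prior (the abstract estimates the prior parameters by maximum likelihood across populations, which is not shown); the function name is hypothetical.

```python
def dirichlet_posterior_mean(counts, alpha):
    """Posterior mean of multinomial proportions under a Dirichlet(alpha) prior."""
    total = sum(counts) + sum(alpha)
    return [(n + a) / total for n, a in zip(counts, alpha)]

# Hypothetical sample of 10 alleles in which the third allele was never seen.
counts = [9, 1, 0]
alpha = [2.0, 2.0, 2.0]  # assumed prior pseudo-counts

sample_props = [n / sum(counts) for n in counts]   # [0.9, 0.1, 0.0]
shrunk = dirichlet_posterior_mean(counts, alpha)   # [0.6875, 0.1875, 0.125]
print(sample_props, shrunk)
```

The extreme sample estimates (0.9 and 0.0) are pulled toward the prior mean of 1/3, which is the moderating effect the abstract describes.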

4.
Accurately estimating probabilities from observations is important for probabilistic-based approaches to problems in computational biology. In this paper we present a biologically-motivated method for estimating probability distributions over discrete alphabets from observations using a mixture model of common ancestors. The method is an extension of substitution matrix-based probability estimation methods. In contrast to previous such methods, our method has a simple Bayesian interpretation and has the advantage over Dirichlet mixtures that it is both effective and simple to compute for large alphabets. The method is applied to estimate amino acid probabilities based on observed counts in an alignment and is shown to perform comparably to previous methods. The method is also applied to estimate probability distributions over protein families and improves protein classification accuracy.

6.
We present a Bayesian method for characterizing the mating system of populations reproducing through a mixture of self-fertilization and random outcrossing. Our method uses patterns of genetic variation across the genome as a basis for inference about reproduction under pure hermaphroditism, gynodioecy, and a model developed to describe the self-fertilizing killifish Kryptolebias marmoratus. We extend the standard coalescence model to accommodate these mating systems, accounting explicitly for multilocus identity disequilibrium, inbreeding depression, and variation in fertility among mating types. We incorporate the Ewens sampling formula (ESF) under the infinite-alleles model of mutation to obtain a novel expression for the likelihood of mating system parameters. Our Markov chain Monte Carlo (MCMC) algorithm assigns locus-specific mutation rates, drawn from a common mutation rate distribution that is itself estimated from the data using a Dirichlet process prior model. Our sampler is designed to accommodate additional information, including observations pertaining to the sex ratio, the intensity of inbreeding depression, and other aspects of reproduction. It can provide joint posterior distributions for the population-wide proportion of uniparental individuals, locus-specific mutation rates, and the number of generations since the most recent outcrossing event for each sampled individual. Further, estimation of all basic parameters of a given model permits estimation of functions of those parameters, including the proportion of the gene pool contributed by each sex and relative effective numbers.
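
For readers unfamiliar with the Ewens sampling formula named in this abstract, the sketch below evaluates its standard form: for a sample of n genes in which a_j allele types each appear j times, the probability is n! / (theta (theta+1) ... (theta+n-1)) times the product over j of theta^a_j / (j^a_j a_j!). This is only the textbook ESF under the infinite-alleles model, not the mating-system likelihood the abstract derives from it; the function name and example values are assumptions.

```python
import math
from collections import Counter

def ewens_log_prob(allele_counts, theta):
    """Log probability of an allele-count configuration under the ESF.

    allele_counts lists how many copies of each distinct allele were
    observed, e.g. [3, 1, 1] for five genes carrying three alleles.
    """
    n = sum(allele_counts)
    spectrum = Counter(allele_counts)  # a_j: number of alleles seen j times
    # log of the rising factorial theta (theta + 1) ... (theta + n - 1)
    log_rising = sum(math.log(theta + i) for i in range(n))
    log_p = math.lgamma(n + 1) - log_rising
    for j, a_j in spectrum.items():
        log_p += a_j * (math.log(theta) - math.log(j)) - math.lgamma(a_j + 1)
    return log_p

# All configurations of a sample of size 3, with an assumed theta of 2.0.
configs = [[3], [2, 1], [1, 1, 1]]
probs = [math.exp(ewens_log_prob(c, 2.0)) for c in configs]
print(probs, sum(probs))
```

Because the three configurations exhaust the partitions of 3, their probabilities sum to one, which is a convenient sanity check on the implementation.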

7.
The restricted mean survival time (RMST) evaluates the expectation of survival time truncated by a prespecified time point, because the mean survival time in the presence of censoring is typically not estimable. The frequentist inference procedure for RMST has been widely advocated for comparison of two survival curves, while research from the Bayesian perspective is rather limited. For the RMST of both right- and interval-censored data, we propose Bayesian nonparametric estimation and inference procedures. By assigning a mixture of Dirichlet processes (MDP) prior to the distribution function, we can estimate the posterior distribution of RMST. We also explore another Bayesian nonparametric approach using the Dirichlet process mixture model and make comparisons with the frequentist nonparametric method. Simulation studies demonstrate that the Bayesian nonparametric RMST under diffuse MDP priors leads to robust estimation and under informative priors it can incorporate prior knowledge into the nonparametric estimator. Analysis of real trial examples demonstrates the flexibility and interpretability of the Bayesian nonparametric RMST for both right- and interval-censored data.
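
To make the quantity concrete: the RMST up to a time point tau is the area under the survival curve from 0 to tau. The sketch below computes it from a Kaplan-Meier curve for right-censored data, which corresponds to the frequentist nonparametric comparator mentioned in the abstract, not to the Bayesian MDP procedure it proposes. Function names and the toy data are assumptions.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (events: 1 = failure, 0 = censored)."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all observations tied at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve

def rmst(times, events, tau):
    """Area under the Kaplan-Meier step function from 0 to tau."""
    area = 0.0
    prev_t, prev_s = 0.0, 1.0
    for t, s in kaplan_meier(times, events):
        if t >= tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    area += prev_s * (tau - prev_t)
    return area

# Toy data: failures at times 1 and 3, one subject censored at time 2.
print(rmst([1, 2, 3], [1, 0, 1], tau=3))
```

With no censoring and tau beyond the last failure, the function reduces to the ordinary sample mean of the failure times.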

8.
Finite mixtures of Gaussian distributions are known to provide an accurate approximation to any unknown density. Motivated by DNA repair studies in which data are collected for samples of cells from different individuals, we propose a class of hierarchically weighted finite mixture models. The modeling framework incorporates a collection of k Gaussian basis distributions, with the individual-specific response densities expressed as mixtures of these bases. To allow heterogeneity among individuals and predictor effects, we model the mixture weights, while treating the basis distributions as unknown but common to all distributions. This results in a flexible hierarchical model for samples of distributions. We consider analysis of variance-type structures and a parsimonious latent factor representation, which leads to simplified inferences on non-Gaussian covariance structures. Methods for posterior computation are developed, and the model is used to select genetic predictors of baseline DNA damage, susceptibility to induced damage, and rate of repair.

9.
Brown ER, Ibrahim JG. Biometrics 2003, 59(2):221-228
This article proposes a new semiparametric Bayesian hierarchical model for the joint modeling of longitudinal and survival data. We relax the distributional assumptions for the longitudinal model using Dirichlet process priors on the parameters defining the longitudinal model. The resulting posterior distribution of the longitudinal parameters is free of parametric constraints, resulting in more robust estimates. This type of approach is becoming increasingly essential in many applications, such as HIV and cancer vaccine trials, where patients' responses are highly diverse and may not be easily modeled with known distributions. An example will be presented from a clinical trial of a cancer vaccine where the survival outcome is time to recurrence of a tumor. Immunologic measures believed to be predictive of tumor recurrence were taken repeatedly during follow-up. We will present an analysis of this data using our new semiparametric Bayesian hierarchical joint modeling methodology to determine the association of these longitudinal immunologic measures with time to tumor recurrence.

10.
In the case of the mixed linear model the random effects are usually assumed to be normally distributed in both the Bayesian and classical frameworks. In this paper, the Dirichlet process prior was used to provide nonparametric Bayesian estimates for correlated random effects. This goal was achieved by providing a Gibbs sampler algorithm that allows these correlated random effects to have a nonparametric prior distribution. A sampling based method is illustrated. This method, which is employed by transforming the genetic covariance matrix to an identity matrix so that the random effects are uncorrelated, is an extension of the theory and the results of previous researchers. Also, by using Gibbs sampling and data augmentation, a simulation procedure was derived for estimating the precision parameter M associated with the Dirichlet process prior. All needed conditional posterior distributions are given. To illustrate the application, data from the Elsenburg Dormer sheep stud were analysed. A total of 3325 weaning weight records from the progeny of 101 sires were used.

11.
Studies of latent traits often collect data for multiple items measuring different aspects of the trait. For such data, it is common to consider models in which the different items are manifestations of a normal latent variable, which depends on covariates through a linear regression model. This article proposes a flexible Bayesian alternative in which the unknown latent variable density can change dynamically in location and shape across levels of a predictor. Scale mixtures of underlying normals are used in order to model flexibly the measurement errors and allow mixed categorical and continuous scales. A dynamic mixture of Dirichlet processes is used to characterize the latent response distributions. Posterior computation proceeds via a Markov chain Monte Carlo algorithm, with predictive densities used as a basis for inferences and evaluation of model fit. The methods are illustrated using data from a study of DNA damage in response to oxidative stress.

12.
A Bayesian nonparametric form of regression based on Dirichlet process priors is adapted to the analysis of quantitative traits possibly affected by cryptic forms of gene action, and to the context of SNP-assisted genomic selection, where the main objective is to predict a genomic signal on phenotype. The procedure clusters unknown genotypes into groups with distinct genetic values, but in a setting in which the number of clusters is unknown a priori, so that standard methods for finite mixture analysis do not work. The central assumption is that genetic effects follow an unknown distribution with some “baseline” family, which is a normal process in the cases considered here. A Bayesian analysis based on the Gibbs sampler produces estimates of the number of clusters, posterior means of genetic effects, a measure of credibility in the baseline distribution, as well as estimates of parameters of the latter. The procedure is illustrated with a simulation representing two populations. In the first one, there are 3 unknown QTL, with additive, dominance and epistatic effects; in the second, there are 10 QTL with additive, dominance and additive × additive epistatic effects. In the two populations, baseline parameters are inferred correctly. The Dirichlet process model infers the number of unique genetic values correctly in the first population, but it produces an understatement in the second one; here, the true number of clusters is over 900, and the model gives a posterior mean estimate of about 140, probably because more replication of genotypes is needed for correct inference. The impact on inferences of the prior distribution of a key parameter (M), and of the extent of replication, was examined via an analysis of mean body weight in 192 paternal half-sib families of broiler chickens, where each sire was genotyped for nearly 7,000 SNPs. In this small sample, it was found that inference about the number of clusters was affected by the prior distribution of M. 
For a set of combinations of parameters of a given prior distribution, the effects of the prior dissipated when the number of replicate samples per genotype was increased. Thus, the Dirichlet process model seems to be useful for gauging the number of QTLs affecting the trait: if the number of clusters inferred is small, probably just a few QTLs code for the trait. If the number of clusters inferred is large, this may imply that standard parametric models based on the baseline distribution may suffice. However, priors may be influential, especially if sample size is not large and if only a few genotypic configurations have replicate phenotypes in the sample.
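
The influence of the parameter M on the number of clusters can be seen from the prior alone. Under a Dirichlet process with precision M, the prior expected number of clusters among n observations is the sum over i = 0, ..., n-1 of M / (M + i), and clusterings can be simulated with the Chinese restaurant process. The sketch below shows only this prior behaviour under assumed values of n and M; it is not the Gibbs sampler of the abstract, and the function names are hypothetical.

```python
import random

def expected_clusters(n, m):
    """Prior expected number of clusters among n items under DP precision m."""
    return sum(m / (m + i) for i in range(n))

def crp_cluster_sizes(n, m, seed=0):
    """Simulate cluster sizes for n items from a Chinese restaurant process."""
    rng = random.Random(seed)
    sizes = []
    for i in range(n):
        # Item i opens a new cluster with probability m / (m + i) ...
        if rng.random() < m / (m + i):
            sizes.append(1)
        else:
            # ... otherwise it joins an existing cluster, chosen with
            # probability proportional to that cluster's current size.
            k = rng.choices(range(len(sizes)), weights=sizes)[0]
            sizes[k] += 1
    return sizes

# 192 matches the number of half-sib families in the abstract; the M values are assumed.
for m in (0.5, 2.0, 10.0):
    sizes = crp_cluster_sizes(192, m)
    print(m, round(expected_clusters(192, m), 2), len(sizes))
```

Larger values of M favour more clusters a priori, which is one way to see why inference on the number of clusters can be sensitive to the prior on M when replication is limited.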

13.
Holmes I, Harris K, Quince C. PLoS One 2012, 7(2):e30126
We introduce Dirichlet multinomial mixtures (DMM) for the probabilistic modelling of microbial metagenomics data. This data can be represented as a frequency matrix giving the number of times each taxa is observed in each sample. The samples have different size, and the matrix is sparse, as communities are diverse and skewed to rare taxa. Most methods used previously to classify or cluster samples have ignored these features. We describe each community by a vector of taxa probabilities. These vectors are generated from one of a finite number of Dirichlet mixture components each with different hyperparameters. Observed samples are generated through multinomial sampling. The mixture components cluster communities into distinct 'metacommunities', and, hence, determine envirotypes or enterotypes, groups of communities with a similar composition. The model can also deduce the impact of a treatment and be used for classification. We wrote software for the fitting of DMM models using the 'evidence framework' (http://code.google.com/p/microbedmm/). This includes the Laplace approximation of the model evidence. We applied the DMM model to human gut microbe genera frequencies from Obese and Lean twins. From the model evidence four clusters fit this data best. Two clusters were dominated by Bacteroides and were homogenous; two had a more variable community composition. We could not find a significant impact of body mass on community structure. However, Obese twins were more likely to derive from the high variance clusters. We propose that obesity is not associated with a distinct microbiota but increases the chance that an individual derives from a disturbed enterotype. This is an example of the 'Anna Karenina principle (AKP)' applied to microbial communities: disturbed states having many more configurations than undisturbed. 
We verify this by showing that in a study of inflammatory bowel disease (IBD) phenotypes, ileal Crohn's disease (ICD) is associated with a more variable community.
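
The generative model in this abstract (taxa probabilities drawn from a Dirichlet, counts drawn by multinomial sampling) has a closed-form marginal for one sample, the Dirichlet-multinomial distribution. The sketch below evaluates its log probability and compares one sample under two components. The component hyperparameters, the counts, and the function name are invented for illustration; fitting the mixture with the evidence framework, as the authors' software does, is not shown.

```python
import math

def dirichlet_multinomial_log_pmf(counts, alpha):
    """Log probability of a count vector with taxa probabilities integrated out."""
    n = sum(counts)
    a = sum(alpha)
    log_p = math.lgamma(n + 1) + math.lgamma(a) - math.lgamma(n + a)
    for x, al in zip(counts, alpha):
        log_p += math.lgamma(x + al) - math.lgamma(al) - math.lgamma(x + 1)
    return log_p

# Two hypothetical mixture components over three taxa:
# a concentrated (low-variance) one and a diffuse (high-variance) one.
tight = [50.0, 5.0, 5.0]
diffuse = [0.5, 0.5, 0.5]

sample = [2, 40, 1]  # made-up taxa counts for one community
ll_tight = dirichlet_multinomial_log_pmf(sample, tight)
ll_diffuse = dirichlet_multinomial_log_pmf(sample, diffuse)
print(ll_tight, ll_diffuse)
```

A sample dominated by a taxon that the concentrated component considers rare is far better explained by the diffuse component, which mirrors the abstract's distinction between homogeneous and more variable clusters.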

14.
For a model of molecular evolution to be useful for phylogenetic inference, the topology of evolutionary trees must be identifiable. That is, from a joint distribution the model predicts, it must be possible to recover the tree parameter. We establish tree identifiability for a number of phylogenetic models, including a covarion model and a variety of mixture models with a limited number of classes. The proof is based on the introduction of a more general model, allowing more states at internal nodes of the tree than at leaves, and the study of the algebraic variety formed by the joint distributions to which it gives rise. Tree identifiability is first established for this general model through the use of certain phylogenetic invariants.

15.
This article discusses specific assumptions necessary for permutation multiple tests to control the Familywise Error Rate (FWER). At issue is that, in comparing parameters of the marginal distributions of two sets of multivariate observations, validity of permutation testing is affected by all the parameters in the joint distributions of the observations. We show the surprising fact that, in the case of a linear model with i.i.d. errors such as in the analysis of Quantitative Trait Loci (QTL), this issue has no impact on control of FWER, if the test statistic is of a particular form. On the other hand, in the analysis of gene expression levels or multiple safety endpoints, unless some assumption connecting the marginal distributions of the observations to their joint distributions is made, permutation multiple tests may not control FWER.

16.
Yi N, George V, Allison DB. Genetics 2003, 164(3):1129-1138
In this article, we utilize stochastic search variable selection methodology to develop a Bayesian method for identifying multiple quantitative trait loci (QTL) for complex traits in experimental designs. The proposed procedure entails embedding multiple regression in a hierarchical normal mixture model, where latent indicators for all markers are used to identify the multiple markers. The markers with significant effects can be identified as those with higher posterior probability included in the model. A simple and easy-to-use Gibbs sampler is employed to generate samples from the joint posterior distribution of all unknowns including the latent indicators, genetic effects for all markers, and other model parameters. The proposed method was evaluated using simulated data and illustrated using a real data set. The results demonstrate that the proposed method works well under typical situations of most QTL studies in terms of number of markers and marker density.

17.
Poly-γ-benzyl-L-glutamate prepared by polymerization of γ-benzyl-L-glutamate NCA in dimethylformamide (DMF) with the use of diisopropylamine as the initiator was precipitated from the polymerization mixture under different conditions. A portion of the almost completely polymerized solution was treated with an excess of isopropylamine and then precipitated into diethyl ether (sample A). The remaining portion of the polymerization mixture was concentrated in a rotating evaporator, stored at room temperature for a few days, and then diluted with DMF and precipitated into diethyl ether (sample B). The molecular weight distributions of the two polymer samples were determined by the chromatographic procedure of Baker and Williams. The molecular weight of sample B is roughly three times that of sample A. However, both samples have the “most probable” distribution of molecular weight. The results are interpreted according to Bamford's polymerization mechanism.

18.
Modeling of developmental toxicity studies often requires simple parametric analyses of the dose-response relationship between exposure and probability of a birth defect but poses challenges because of nonstandard distributions of birth defects for a fixed level of exposure. This article is motivated by two such experiments in which the distribution of the outcome variable is challenging to both the standard logistic model with binomial response and its parametric multistage elaborations. We approach our analysis using a Bayesian semiparametric model that we tailored specifically to developmental toxicology studies. It combines parametric dose-response relationships with a flexible nonparametric specification of the distribution of the response, obtained via a product of Dirichlet process mixtures approach (PDPM). Our formulation achieves three goals: (1) the distribution of the response is modeled in a general way, (2) the degree to which the distribution of the response adapts nonparametrically to the observations is driven by the data, and (3) the marginal posterior distribution of the parameters of interest is available in closed form. The logistic regression model, as well as many of its extensions such as the beta-binomial model and finite mixture models, are special cases. In the context of the two motivating examples and a simulated example, we provide model comparisons, illustrate overdispersion diagnostics that can assist model specification, show how to derive posterior distributions of the effective dose parameters and predictive distributions of response, and discuss the sensitivity of the results to the choice of the prior distribution.

19.
Naskar M, Das K, Ibrahim JG. Biometrics 2005, 61(3):729-737
A very general class of multivariate life distributions is considered for analyzing failure time clustered data that are subject to censoring and multiple modes of failure. Conditional on cluster-specific quantities, the joint distribution of the failure time and event indicator can be expressed as a mixture of the distribution of time to failure due to a certain type (or specific cause), and the failure type distribution. We assume here the marginal probabilities of various failure types are logistic functions of some covariates. The cluster-specific quantities are subject to some unknown distribution that causes frailty. The unknown frailty distribution is modeled nonparametrically using a Dirichlet process. In such a semiparametric setup, a hybrid method of estimation is proposed based on the i.i.d. Weighted Chinese Restaurant algorithm that helps us generate observations from the predictive distribution of the frailty. The Monte Carlo ECM algorithm plays a vital role for obtaining the estimates of the parameters that assess the extent of the effects of the causal factors for failures of a certain type. A simulation study is conducted to study the consistency of our methodology. The proposed methodology is used to analyze a real data set on HIV infection of a cohort of female prostitutes in Senegal.

20.
Human microbiome research characterizes the microbial content of samples from human habitats to learn how interactions between bacteria and their host might impact human health. In this work a novel parametric statistical inference method based on object-oriented data analysis (OODA) for analyzing HMP data is proposed. OODA is an emerging area of statistical inference where the goal is to apply statistical methods to objects such as functions, images, and graphs or trees. The data objects that pertain to this work are taxonomic trees of bacteria built from analysis of 16S rRNA gene sequences (e.g. using RDP); there is one such object for each biological sample analyzed. Our goal is to model and formally compare a set of trees. The contribution of our work is threefold: first, a weighted tree structure to analyze RDP data is introduced; second, using a probability measure to model a set of taxonomic trees, we introduce an approximate MLE procedure for estimating model parameters and we derive LRT statistics for comparing the distributions of two metagenomic populations; and third the Jumpstart HMP data is analyzed using the proposed model providing novel insights and future directions of analysis.
