Similar Articles
20 similar articles found.
1.
Kinney SK, Dunson DB. Biometrics 2007, 63(3):690-698
We address the problem of selecting which variables should be included in the fixed and random components of logistic mixed effects models for correlated data. A fully Bayesian variable selection approach is implemented using a stochastic search Gibbs sampler to estimate the exact model-averaged posterior distribution. This approach automatically identifies subsets of predictors having nonzero fixed effect coefficients or nonzero random effects variance, while allowing uncertainty in the model selection process. Default priors are proposed for the variance components, and an efficient parameter-expansion Gibbs sampler is developed for posterior computation. The approach is illustrated using simulated data and an epidemiologic example.
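To make the stochastic search concrete, here is a minimal sketch of spike-and-slab Gibbs variable selection for a plain linear model, not the authors' logistic mixed-model sampler; the hyperparameters (tau2, pi_incl, a0, b0) and all data are illustrative assumptions. Each coefficient is a point mass at zero with probability 1 - pi_incl and N(0, tau2) otherwise, and each inclusion indicator is drawn from its exact conditional odds with the coefficient integrated out.

```python
import numpy as np

rng = np.random.default_rng(1)

def ssvs_gibbs(X, y, n_iter=2000, tau2=1.0, pi_incl=0.5, a0=2.0, b0=2.0):
    """Spike-and-slab Gibbs sampler for a linear model y = X beta + e.

    Each beta_j is 0 with prior probability 1 - pi_incl (spike) and
    N(0, tau2) otherwise (slab); sigma2 has an inverse-gamma(a0, b0) prior.
    Returns the posterior inclusion probability of each column of X.
    """
    n, p = X.shape
    beta = np.zeros(p)
    gamma = np.zeros(p, dtype=bool)
    sigma2 = 1.0
    s = (X ** 2).sum(axis=0)                      # x_j' x_j per predictor
    incl = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]  # residual excluding x_j
            xr = X[:, j] @ r
            v = 1.0 / (s[j] / sigma2 + 1.0 / tau2)   # conditional post. variance
            m = v * xr / sigma2                      # conditional post. mean
            # Log Bayes factor for inclusion, with beta_j integrated out.
            log_bf = 0.5 * np.log(v / tau2) + 0.5 * m ** 2 / v
            odds = np.clip(np.log(pi_incl / (1 - pi_incl)) + log_bf, -35, 35)
            gamma[j] = rng.random() < 1.0 / (1.0 + np.exp(-odds))
            beta[j] = rng.normal(m, np.sqrt(v)) if gamma[j] else 0.0
        resid = y - X @ beta
        sigma2 = 1.0 / rng.gamma(a0 + 0.5 * n, 1.0 / (b0 + 0.5 * resid @ resid))
        incl += gamma
    return incl / n_iter

# Toy check: only the first two of five predictors matter.
n, p = 200, 5
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n)
print(ssvs_gibbs(X, y))  # inclusion probabilities near 1 for columns 0 and 1
```

Averaging the sampled indicators over iterations yields model-averaged posterior inclusion probabilities, the same kind of summary the sampler in the abstract produces for fixed effects.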

2.
Biomedical studies often collect multivariate event time data from multiple clusters (either subjects or groups), within each of which event times for individuals are correlated and the correlation may vary across classes. In such survival analyses, heterogeneity among clusters for shared and class-specific effects can be accommodated by incorporating parametric frailty terms into the model. In this article, we propose a Bayesian approach that relaxes the parametric distribution assumption for shared and class-specific frailties by using a Dirichlet process prior, while also allowing for uncertainty in the heterogeneity of different classes. Selection among multiple cluster-specific frailties relies on variable selection-type mixture priors, which apply mixtures of point masses at zero and inverse gamma distributions to the variances of the log frailties. This selection allows frailties with zero variance to effectively drop out of the model. A reparameterization of the log-frailty terms is performed to reduce the potential bias of the fixed effects due to variation of the random distribution and dependence among the parameters, yielding easier interpretation and faster Markov chain Monte Carlo convergence. Simulated data examples and an application to a lung cancer clinical trial are used for illustration.
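Schematically, the selection-type mixture prior on the variance of the k-th log frailty described above can be written as follows, where the mixing weight \pi_0 and the shape/scale a, b are illustrative hyperparameters rather than the paper's exact choices:

\[
\sigma_k^2 \;\sim\; \pi_0\,\delta_0 \;+\; (1 - \pi_0)\,\operatorname{IG}(a, b),
\]

with \delta_0 a point mass at zero and \operatorname{IG} an inverse gamma distribution; a posterior draw of \sigma_k^2 = 0 removes the k-th frailty term, which is how frailties with zero variance effectively drop out of the model.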

3.
Summary. In this rejoinder, we discuss the impact of misspecifying the random effects distribution on inferences obtained from generalized linear mixed models (GLMMs). Special attention is paid to the power of the tests for the fixed-effect parameters. To study this misspecification, researchers often use simulation designs in which several choices for the true underlying random-effects distribution are considered, while the assumed distribution is kept fixed. Neuhaus, McCulloch, and Boylan (2010, Biometrics 00, 000–000) argue that a logically correct approach should instead vary the assumed, fitted distribution while holding the true one fixed. We argue that both simulation designs can bring valuable insights into the impact of the misspecification. Furthermore, using both designs, we illustrate that the power associated with the tests for the fixed-effect parameters in GLMMs may be affected by misspecifying the random-effects distribution.

4.
We explore a Bayesian approach to selection of variables that represent fixed and random effects in modeling of longitudinal binary outcomes with missing data caused by dropouts. We show via analytic results for a simple example that nonignorable missing data lead to biased parameter estimates. This bias results in selection of wrong effects asymptotically, which we can confirm via simulations for more complex settings. By jointly modeling the longitudinal binary data with the dropout process that possibly leads to nonignorable missing data, we are able to correct the bias in estimation and selection. Mixture priors with a point mass at zero are used to facilitate variable selection. We illustrate the proposed approach using a clinical trial for acute ischemic stroke.

5.
Cai B, Dunson DB. Biometrics 2006, 62(2):446-457
The generalized linear mixed model (GLMM), which extends the generalized linear model (GLM) to incorporate random effects characterizing heterogeneity among subjects, is widely used in analyzing correlated and longitudinal data. Although there is often interest in identifying the subset of predictors that have random effects, random effects selection can be challenging, particularly when outcome distributions are nonnormal. This article proposes a fully Bayesian approach to the problem of simultaneous selection of fixed and random effects in GLMMs. Integrating out the random effects induces a covariance structure on the multivariate outcome data, and an important problem that we also consider is that of covariance selection. Our approach relies on variable selection-type mixture priors for the components in a special Cholesky decomposition of the random effects covariance. A stochastic search MCMC algorithm is developed, which relies on Gibbs sampling, with Taylor series expansions used to approximate intractable integrals. Simulated data examples are presented for different exponential family distributions, and the approach is applied to discrete survival data from a time-to-pregnancy study.

6.
In longitudinal studies, measurements of the same individuals are taken repeatedly through time. Often, the primary goal is to characterize the change in response over time and the factors that influence change. Factors can affect not only the location but also, more generally, the shape of the distribution of the response over time. To make inference about the shape of a population distribution, the widely popular mixed-effects regression, for example, would be inadequate if the distribution is not approximately Gaussian. We propose a novel linear model for quantile regression (QR) that includes random effects in order to account for the dependence between serial observations on the same subject. The notion of QR is synonymous with robust analysis of the conditional distribution of the response variable. We present a likelihood-based approach to the estimation of the regression quantiles that uses the asymmetric Laplace density. In a simulation study, the proposed method had an advantage in terms of mean squared error of the QR estimator when compared with the approach that considers penalized fixed effects. Following our strategy, a nearly optimal degree of shrinkage of the individual effects is automatically selected by the data and their likelihood. Our model also appears to be a robust alternative to mean regression with random effects when the location parameter of the conditional distribution of the response is of interest. We apply our model to a real data set of self-reported labor pain measurements taken on women repeatedly over time, whose distribution is characterized by skewness; the significance of the parameters is evaluated by the likelihood ratio statistic.
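The connection between the asymmetric Laplace density and quantile regression is that maximizing the asymmetric Laplace likelihood with fixed scale is equivalent to minimizing the check (pinball) loss. A minimal fixed-effects-only sketch, leaving out the paper's random effects and with all simulated quantities hypothetical:

```python
import numpy as np
from scipy.optimize import minimize

def check_loss(u, tau):
    # Pinball/check loss; minimizing it maximizes an asymmetric-Laplace
    # likelihood with fixed scale: rho_tau(u) = u * (tau - 1{u < 0}).
    return np.sum(u * (tau - (u < 0)))

def fit_quantile(X, y, tau):
    """Estimate the tau-th conditional quantile of y given X."""
    obj = lambda b: check_loss(y - X @ b, tau)
    return minimize(obj, np.zeros(X.shape[1]), method="Nelder-Mead").x

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = 1.0 + 2.0 * X[:, 1] + rng.standard_t(df=3, size=n)  # heavy-tailed noise
print(fit_quantile(X, y, tau=0.5))  # median regression: roughly [1.0, 2.0]
```

The paper's contribution is to put subject-level random effects inside this likelihood so that serial dependence within subjects is accounted for.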

7.
Linear mixed models are frequently used to obtain model-based estimators in small area estimation (SAE) problems. Such models, however, are not suitable when the target variable exhibits a point mass at zero, a highly skewed distribution of the nonzero values, and a strong spatial structure. In this paper, an SAE approach for dealing with such variables is suggested. We propose a two-part random effects SAE model that includes a correlation structure on the area random effects appearing in the two parts and incorporates a bivariate smooth function of the geographical coordinates of units. To account for the skewness of the distribution of the positive values of the response variable, a Gamma model is adopted. A hierarchical Bayesian approach is used to fit the model, obtain small area estimates, and evaluate their precision. The study is motivated by a real SAE problem: estimating per-farm average grape wine production in Tuscany at the subregional level using Farm Structure Survey data. Results from this real data application, and from a model-based simulation experiment, show a satisfactory performance of the suggested SAE approach.
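A stripped-down, non-spatial, non-Bayesian sketch of the two-part idea, a logistic model for the zero/positive indicator combined with a Gamma GLM (log link) for the positive amounts; all simulated quantities below are hypothetical:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 1000
x = rng.normal(size=n)
X = sm.add_constant(x)

# Simulate a semicontinuous response: point mass at zero plus skewed positives.
p_pos = 1.0 / (1.0 + np.exp(-(0.5 + x)))
positive = rng.random(n) < p_pos
mu = np.exp(1.0 + 0.5 * x)
y = np.where(positive, rng.gamma(shape=2.0, scale=mu / 2.0), 0.0)

# Part 1: logistic regression for P(Y > 0).
part1 = sm.GLM(positive.astype(float), X, family=sm.families.Binomial()).fit()
# Part 2: Gamma GLM with log link for E[Y | Y > 0].
part2 = sm.GLM(y[positive], X[positive],
               family=sm.families.Gamma(link=sm.families.links.Log())).fit()
# Overall mean combines the two parts: E[Y] = P(Y > 0) * E[Y | Y > 0].
print((part1.predict(X) * part2.predict(X))[:5])
```

The model in the paper additionally ties the two parts together with correlated area-level random effects and a bivariate spatial smooth, fitted hierarchically.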

8.
Keightley PD, Halligan DL. Genetics 2011, 188(4):931-940
Sequencing errors and random sampling of nucleotide types among sequencing reads at heterozygous sites present challenges for accurate, unbiased inference of single-nucleotide polymorphism genotypes from high-throughput sequence data. Here, we develop a maximum-likelihood approach to estimate the frequency distribution of the number of alleles in a sample of individuals (the site frequency spectrum), using high-throughput sequence data. Our method assumes binomial sampling of nucleotide types in heterozygotes and random sequencing error. By simulations, we show that close to unbiased estimates of the site frequency spectrum can be obtained if the error rate per base read does not exceed the population nucleotide diversity. We also show that these estimates are reasonably robust if errors are nonrandom. We then apply the method to infer site frequency spectra for zerofold degenerate, fourfold degenerate, and intronic sites of protein-coding genes using the low coverage human sequence data produced by the 1000 Genomes Project phase-one pilot. By fitting a model to the inferred site frequency spectra that estimates parameters of the distribution of fitness effects of new mutations, we find evidence for significant natural selection operating on fourfold sites. We also find that a model with variable effects of mutations at synonymous sites fits the data significantly better than a model with equal mutational effects. Under the variable effects model, we infer that 11% of synonymous mutations are subject to strong purifying selection.
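The binomial read-sampling and error model at the heart of the likelihood can be sketched as below; the site frequency spectrum is then estimated by combining such genotype likelihoods with the genotype frequencies implied by each candidate spectrum and maximizing over spectra. The read counts and error rate are hypothetical:

```python
import numpy as np
from scipy.stats import binom

def genotype_likelihoods(k_alt, n_reads, err):
    """P(k_alt alternate-allele reads out of n_reads | genotype).

    Genotypes: hom-ref, het, hom-alt. A heterozygote emits the alternate
    allele with probability 1/2 (symmetric errors leave this unchanged),
    while homozygotes show it only through sequencing error at rate err.
    """
    p_alt = np.array([err, 0.5, 1.0 - err])  # per-read alt probability
    return binom.pmf(k_alt, n_reads, p_alt)

# 3 alternate reads out of 8 at a 1% error rate: the het genotype dominates.
print(genotype_likelihoods(3, 8, 0.01))
```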

9.
In nature, selection varies across time in most environments, but we lack an understanding of how specific ecological changes drive this variation. Ecological factors can alter phenotypic selection coefficients through changes in trait distributions or individual mean fitness, even when the trait-absolute fitness relationship remains constant. We apply and extend a regression-based approach in a population of Soay sheep (Ovis aries) and suggest metrics of environment-selection relationships that can be compared across studies. We then introduce a novel method that constructs an environmentally structured fitness function. This allows calculation of full (as in existing approaches) and partial (acting separately through the absolute fitness function slope, mean fitness, and phenotype distribution) sensitivities of selection to an ecological variable. Both approaches show positive overall effects of density on viability selection of lamb mass. However, the second approach demonstrates that this relationship is largely driven by effects of density on mean fitness, rather than on the trait-fitness relationship slope. If such mechanisms of environmental dependence of selection are common, this could have important implications regarding the frequency of fluctuating selection, and how previous selection inferences relate to longer term evolutionary dynamics.

10.
We study a linear mixed effects model for longitudinal data in which the response variable and the covariates with fixed effects are subject to measurement error. We propose a method of moments estimation that does not require any assumption on the functional forms of the distributions of random effects and other random errors in the model. For a classical measurement error model we apply the instrumental variable approach to ensure identifiability of the parameters. Our methodology, without instrumental variables, can be applied to Berkson measurement errors. Using simulation studies, we investigate the finite sample performance of the estimators and show the impact of measurement error in the covariates and the response on the estimation procedure. The results show that our method performs quite satisfactorily, especially for the fixed effects with measurement error (even under misspecification of the measurement error model). This method is applied to a real data example from a large birth and child cohort study.
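A toy cross-sectional illustration of the instrumental-variable idea (not the paper's longitudinal moment estimator): classical measurement error attenuates the naive OLS slope, while the IV moment ratio recovers the true coefficient. All simulated quantities are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 2000
z = rng.normal(size=n)                  # instrument
x_true = 0.8 * z + rng.normal(size=n)   # true covariate, driven by z
x_obs = x_true + rng.normal(size=n)     # observed with classical error
y = 1.0 + 2.0 * x_true + rng.normal(size=n)

# Naive OLS slope is attenuated by the measurement error.
ols = np.cov(x_obs, y)[0, 1] / np.var(x_obs)
# IV (method-of-moments) slope: cov(z, y) / cov(z, x_obs).
iv = np.cov(z, y)[0, 1] / np.cov(z, x_obs)[0, 1]
print(ols, iv)  # roughly 1.2 (biased) versus 2.0 (consistent)
```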

11.
We propose a state space model for analyzing equally or unequally spaced longitudinal count data with serial correlation. With a log link function, the mean of the Poisson response variable is a nonlinear function of the fixed and random effects. The random effects are assumed to be generated from a Gaussian first order autoregression (AR(1)). In this case, the mean of the observations has a log normal distribution. We use a combination of linear and nonlinear methods to take advantage of the Gaussian process embedded in a nonlinear function. The state space model uses a modified Kalman filter recursion to estimate the mean and variance of the AR(1) random error given the previous observations. The marginal likelihood is approximated by numerically integrating out the AR(1) random error. Simulation studies with different sets of parameters show that the state space model performs well. The model is applied to Epileptic Seizure data and Primary Care Visits Data. Missing and unequally spaced observations are handled naturally with this model.
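The Gaussian core of such a state space model is the standard Kalman recursion for an AR(1) state observed with noise; the paper modifies this recursion and embeds it in a Poisson log-link likelihood, which the sketch below omits. Parameter values are hypothetical:

```python
import numpy as np

def kalman_ar1(y, phi, q, r, m0=0.0, p0=1.0):
    """Kalman filter for y_t = a_t + v_t with a_t = phi * a_{t-1} + w_t,
    Var(w_t) = q, Var(v_t) = r. Returns filtered means and variances."""
    m, P = m0, p0
    means, variances = [], []
    for yt in y:
        m_pred = phi * m                  # predict the AR(1) state
        P_pred = phi ** 2 * P + q
        K = P_pred / (P_pred + r)         # Kalman gain
        m = m_pred + K * (yt - m_pred)    # update with the new observation
        P = (1 - K) * P_pred
        means.append(m)
        variances.append(P)
    return np.array(means), np.array(variances)

rng = np.random.default_rng(3)
T, phi = 200, 0.8
a = np.zeros(T)
for t in range(1, T):
    a[t] = phi * a[t - 1] + rng.normal(scale=0.5)
y = a + rng.normal(scale=1.0, size=T)
m, P = kalman_ar1(y, phi=0.8, q=0.25, r=1.0)
print(np.corrcoef(m, a)[0, 1])  # filtered mean tracks the latent AR(1)
```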

12.
Bayesian Inference in Semiparametric Mixed Models for Longitudinal Data
Summary. We consider Bayesian inference in semiparametric mixed models (SPMMs) for longitudinal data. SPMMs are a class of models that use a nonparametric function to model a time effect, a parametric function to model other covariate effects, and parametric or nonparametric random effects to account for the within-subject correlation. We model the nonparametric function using a Bayesian formulation of a cubic smoothing spline, and the random effect distribution using a normal distribution and, alternatively, a nonparametric Dirichlet process (DP) prior. When the random effect distribution is assumed to be normal, we propose a uniform shrinkage prior (USP) for the variance components and the smoothing parameter. When the random effect distribution is modeled nonparametrically, we use a DP prior with a normal base measure and propose a USP for the hyperparameters of the DP base measure. We argue that the commonly assumed DP prior implies a nonzero mean of the random effect distribution, even when a base measure with mean zero is specified. This implies weak identifiability for the fixed effects, and can therefore lead to biased estimators and poor inference for the regression coefficients and the spline estimator of the nonparametric function. We propose an adjustment using a postprocessing technique. We show that under mild conditions the posterior is proper under the proposed USP, a flat prior for the fixed effect parameters, and an improper prior for the residual variance. We illustrate the proposed approach using a longitudinal hormone dataset, and carry out extensive simulation studies to compare its finite sample performance with existing methods.
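For orientation, in the simplest balanced two-level normal model with known unit-level variance \sigma^2, one common way to construct a uniform shrinkage prior is to place a uniform distribution on the shrinkage weight; this schematic form is our gloss, not necessarily the exact prior proposed in the paper:

\[
B(\tau^2) = \frac{\sigma^2}{\sigma^2 + \tau^2} \sim \operatorname{Uniform}(0, 1)
\quad\Longleftrightarrow\quad
p(\tau^2) = \frac{\sigma^2}{(\sigma^2 + \tau^2)^2},
\]

so the prior is flat on how strongly the random effects are shrunk rather than on the variance component itself.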

13.
Summary. It is of great practical interest to simultaneously identify the important predictors that correspond to both the fixed and random effects components in a linear mixed-effects (LME) model. Typical approaches perform selection separately on each of the fixed and random effect components. However, changing the structure of one set of effects can lead to different choices of variables for the other set of effects. We propose simultaneous selection of the fixed and random factors in an LME model using a modified Cholesky decomposition. Our method is based on a penalized joint log likelihood with an adaptive penalty for the selection and estimation of both the fixed and random effects. It performs model selection by allowing fixed effects or standard deviations of random effects to be exactly zero. A constrained expectation-maximization algorithm is then used to obtain the final estimates. It is further shown that the proposed penalized estimator enjoys the oracle property, in that, asymptotically, it performs as well as if the true model were known beforehand. We demonstrate the performance of our method based on a simulation study and a real data example.
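Schematically, the penalized criterion has the shape below, where the d_k >= 0 are random-effect standard deviations from the modified Cholesky factor; the weights w_j and v_k are adaptive (built from initial estimates, e.g. w_j = 1/|initial beta_j|), and this display is our gloss rather than the paper's exact formula:

\[
\ell_p(\beta, d) \;=\; \ell(\beta, d) \;-\; \lambda_1 \sum_j w_j\,|\beta_j| \;-\; \lambda_2 \sum_k v_k\, d_k .
\]

Because the penalty is applied jointly, an estimate of d_k = 0 removes the k-th random effect at the same time as zero fixed-effect coefficients are removed, rather than in two separate passes.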

14.
We propose criteria for variable selection in the mean model and for the selection of a working correlation structure in longitudinal data with dropout missingness using weighted generalized estimating equations. The proposed criteria are based on a weighted quasi-likelihood function and a penalty term. Our simulation results show that the proposed criteria frequently select the correct model among candidate mean models. The proposed criteria also have good performance in selecting the working correlation structure for binary and normal outcomes. We illustrate our approaches using two empirical examples. In the first example, we use data from a randomized double-blind study to test the cancer-preventing effects of beta carotene. In the second example, we use longitudinal CD4 count data from a randomized double-blind study.
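Criteria of this kind generically take the form "lack of fit plus complexity penalty"; a schematic version (our gloss, not the paper's exact formula) is

\[
\mathrm{Criterion} \;=\; -2\, Q_w(\hat\beta) \;+\; \text{penalty},
\]

where Q_w is the weighted quasi-likelihood evaluated at the fitted mean parameters, the weights are inverse probabilities of remaining in the study under the dropout model, and the penalty grows with model complexity; the candidate mean model or working correlation structure minimizing the criterion is selected.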

15.
The widely used "Maxent" software for modeling species distributions from presence-only data (Phillips et al., Ecological Modelling, 190, 2006, 231) tends to produce models with high predictive performance but low ecological interpretability, and the implications of Maxent's statistical approach to variable transformation, model fitting, and model selection remain underappreciated. In particular, Maxent's approach to model selection through lasso regularization has been shown to give less parsimonious distribution models—that is, models which are more complex but not necessarily predictively better—than subset selection. In this paper, we introduce the MIAmaxent R package, which provides a statistical approach to modeling species distributions similar to Maxent's, but with subset selection instead of lasso regularization. The simpler models typically produced by subset selection are ecologically more interpretable, and making distribution models more grounded in ecological theory is a fundamental motivation for using MIAmaxent. To that end, the package executes variable transformation based on expected occurrence–environment relationships and contains tools for exploring data and interrogating models in light of knowledge of the modeled system. Additionally, MIAmaxent implements two different kinds of model fitting: maximum entropy fitting for presence-only data and logistic regression (GLM) for presence–absence data. Unlike Maxent, MIAmaxent decouples variable transformation, model fitting, and model selection, which facilitates methodological comparisons and gives the modeler greater flexibility when choosing a statistical approach to a given distribution modeling problem.

16.
If the number of treatments in a network meta-analysis is large, it may be possible and useful to model the main effect of treatment as random, that is to say, as random realizations from a normal distribution of possible treatment effects. This then constitutes a third sort of random effect that may be considered in connection with such analyses. The first and most common sort models the treatment-by-trial interaction as random, and the second, rather rarer, models the main effects of trial as random and thus permits the recovery of intertrial information. Taking the example of a network meta-analysis of 44 similar treatments in 10 trials, we illustrate how a hierarchical approach to modeling a random main effect of treatment can be used to produce shrunken (toward the overall mean) estimates of effects for individual treatments. As a related problem, we also consider the issue of using a random-effect model for the within-trial variances from trial to trial. We provide a number of possible graphical representations of the results and discuss the advantages and disadvantages of such an approach.
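A two-stage toy version of the shrinkage described here, assuming the between-treatment variance tau2 is already known (the paper instead fits a full hierarchical model); all numbers are hypothetical:

```python
import numpy as np

def shrink_to_overall_mean(est, se, tau2):
    """Shrink per-treatment estimates toward their precision-weighted
    overall mean, with between-treatment variance tau2."""
    mu = np.average(est, weights=1.0 / (se ** 2 + tau2))
    w = tau2 / (tau2 + se ** 2)  # weight kept on each treatment's own estimate
    return w * est + (1 - w) * mu

est = np.array([0.10, 0.40, -0.20, 0.25])  # per-treatment effect estimates
se = np.array([0.15, 0.30, 0.20, 0.10])    # their standard errors
print(shrink_to_overall_mean(est, se, tau2=0.02))
```

Imprecisely estimated treatments (large se) get pulled hardest toward the overall mean, which is exactly the behavior a random main effect of treatment induces.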

17.
Statistical models support medical research by facilitating individualized outcome prognostication conditional on independent variables or by estimating effects of risk factors adjusted for covariates. The theory of statistical models is well established if the set of independent variables to consider is fixed and small, so that we can assume effect estimates are unbiased and the usual methods for confidence interval estimation are valid. In routine work, however, it is not known a priori which covariates should be included in a model, and we are often confronted with 10-30 candidate variables, a number frequently too large to be considered in a statistical model. We provide an overview of the available variable selection methods that are based on significance or information criteria, penalized likelihood, the change-in-estimate criterion, background knowledge, or combinations thereof. These methods were usually developed in the context of the linear regression model and later transferred to generalized linear models and models for censored survival data. Variable selection, in particular if used in explanatory modeling where effect estimates are of central interest, can compromise the stability of a final model, the unbiasedness of regression coefficients, and the validity of p-values or confidence intervals. Therefore, we give pragmatic recommendations for the practicing statistician on applying variable selection methods in general (low-dimensional) modeling problems and on performing stability investigations and inference. We also propose some quantities, based on resampling the entire variable selection process, that software packages offering automated variable selection algorithms should routinely report.
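The recommendation to resample the entire selection process can be sketched as follows: bootstrap the data, rerun an automated selector (here, backward elimination on AIC, one of the information-criterion methods mentioned above), and report per-variable inclusion frequencies. Variable names and sizes are hypothetical:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 300
X = pd.DataFrame(rng.normal(size=(n, 6)), columns=[f"x{i}" for i in range(6)])
y = (1.5 * X["x0"] - 1.0 * X["x1"] + rng.normal(size=n)).to_numpy()

def backward_aic(X, y):
    """Backward elimination on AIC; returns the names of kept predictors."""
    kept = list(X.columns)
    while len(kept) > 1:
        best_aic = sm.OLS(y, sm.add_constant(X[kept])).fit().aic
        drop = None
        for v in kept:
            trial = [u for u in kept if u != v]
            aic = sm.OLS(y, sm.add_constant(X[trial])).fit().aic
            if aic < best_aic:
                best_aic, drop = aic, v
        if drop is None:
            break
        kept.remove(drop)
    return kept

# Bootstrap the whole selection process and report inclusion frequencies.
B = 50
freq = pd.Series(0.0, index=X.columns)
for _ in range(B):
    idx = rng.integers(0, n, size=n)
    Xb = X.iloc[idx].reset_index(drop=True)
    for v in backward_aic(Xb, y[idx]):
        freq[v] += 1
print(freq / B)  # x0 and x1 should appear in nearly every resample
```

Inclusion frequencies near 1 flag variables whose selection is stable; frequencies near 0.5 reveal instability that reporting a single selected model would hide.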

18.
Mixed models are now well-established methods in ecology and evolution because they allow accounting for and quantifying within- and between-individual variation. However, the required normal distribution of the random effects can often be violated by the presence of clusters among subjects, which leads to multi-modal distributions. In such cases, using what is known as mixture regression models might offer a more appropriate approach. These models are widely used in psychology, sociology, and medicine to describe the diversity of trajectories occurring within a population over time (e.g. psychological development, growth). In ecology and evolution, however, these models are seldom used even though understanding changes in individual trajectories is an active area of research in life-history studies. Our aim is to demonstrate the value of using mixture models to describe variation in individual life-history tactics within a population, and hence to promote the use of these models by ecologists and evolutionary ecologists. We first ran a set of simulations to determine whether and when a mixture model allows teasing apart latent clustering, and to contrast the precision and accuracy of estimates obtained from mixture models versus mixed models under a wide range of ecological contexts. We then used empirical data from long-term studies of large mammals to illustrate the potential of using mixture models for assessing within-population variation in life-history tactics. Mixture models performed well in most cases, except for variables following a Bernoulli distribution and when sample size was small. The four selection criteria we evaluated [Akaike information criterion (AIC), Bayesian information criterion (BIC), and two bootstrap methods] performed similarly well, selecting the right number of clusters in most ecological situations. We then showed that the normality of random effects implicitly assumed by evolutionary ecologists when using mixed models was often violated in life-history data. Mixed models were quite robust to this violation in the sense that fixed effects were unbiased at the population level. However, fixed effects at the cluster level and random effects were better estimated using mixture models. Our empirical analyses demonstrated that using mixture models facilitates the identification of the diversity of growth and reproductive tactics occurring within a population. Therefore, using this modelling framework allows testing for the presence of clusters and, when clusters occur, provides reliable estimates of fixed and random effects for each cluster of the population. In the presence or expectation of clusters, using mixture models offers a suitable extension of mixed models, particularly when evolutionary ecologists aim at identifying how ecological and evolutionary processes change within a population. Mixture regression models therefore provide a valuable addition to the statistical toolbox of evolutionary ecologists. As these models are complex and have their own limitations, we provide recommendations to guide future users.
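A compact EM sketch of a two-component mixture of linear regressions, the simplest "mixture regression" in the sense used here, without the per-individual random effects the authors consider; all simulated quantities are hypothetical:

```python
import numpy as np

def em_mixreg(X, y, K=2, n_iter=200, seed=0):
    """EM for a K-component mixture of linear regressions with a shared
    noise variance: y | cluster k ~ N(X @ beta_k, sigma2)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = rng.normal(size=(K, p))
    sigma2, pi = np.var(y), np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibility of each cluster for each observation.
        resid = y[:, None] - X @ beta.T                 # n x K residuals
        logw = np.log(pi) - 0.5 * resid ** 2 / sigma2
        logw -= logw.max(axis=1, keepdims=True)
        w = np.exp(logw)
        w /= w.sum(axis=1, keepdims=True)
        # M-step: weighted least squares per cluster.
        for k in range(K):
            XtW = X.T * w[:, k]
            beta[k] = np.linalg.solve(XtW @ X, XtW @ y)
        pi = w.mean(axis=0)
        resid = y[:, None] - X @ beta.T
        sigma2 = np.sum(w * resid ** 2) / n
    return beta, pi, w

# Two latent tactics: steep versus shallow trajectories.
rng = np.random.default_rng(5)
n = 400
x = rng.uniform(0, 1, size=n)
X = np.column_stack([np.ones(n), x])
z = rng.random(n) < 0.5
y = np.where(z, 1.0 + 3.0 * x, 2.0 - 1.0 * x) + rng.normal(scale=0.3, size=n)
beta, pi, w = em_mixreg(X, y, K=2)
print(beta)  # about [[1, 3], [2, -1]], up to label switching
```

The assignment probabilities w play the role of the latent "tactic" membership, and criteria such as AIC or BIC can then compare fits with different K.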

19.
Darwinian evolution consists of the gradual transformation of heritable traits due to natural selection and the input of random variation by mutation. Here, we use a quantitative genetics approach to investigate the coevolution of multiple quantitative traits under selection, mutation, and limited dispersal. We track the dynamics of trait means and of variance-covariances between traits that experience frequency-dependent selection. Assuming a multivariate-normal trait distribution, we recover classical dynamics of quantitative genetics, as well as stability and evolutionary branching conditions of invasion analyses, except that due to limited dispersal, selection depends on indirect fitness effects and relatedness. In particular, correlational selection that associates different traits within individuals depends on the fitness effects of such associations between individuals. We find that these kin selection effects can be as relevant as pleiotropy for the evolution of correlation between traits. We illustrate this with an example of the coevolution of two social traits whose association within individuals is costly but synergistically beneficial between individuals. As dispersal becomes limited and relatedness increases, associations between traits between individuals become increasingly targeted by correlational selection. Consequently, the trait distribution goes from being bimodal with a negative correlation under panmixia to unimodal with a positive correlation under limited dispersal.
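The "classical dynamics of quantitative genetics" recovered here are, in the multivariate-normal case, of the Lande form (our shorthand for orientation):

\[
\Delta\bar{\mathbf{z}} \;=\; \mathbf{G}\,\boldsymbol{\beta},
\]

where \mathbf{G} is the additive genetic variance-covariance matrix and \boldsymbol{\beta} the selection gradient; the twist described in the abstract is that under limited dispersal \boldsymbol{\beta} incorporates indirect (kin-selection) fitness effects weighted by relatedness.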

20.
Comparisons of the strength and form of phenotypic selection among groups provide a powerful approach for testing adaptive hypotheses. A central and largely unaddressed issue is how fitness and phenotypes are standardized in such studies; standardization across or within groups can qualitatively change conclusions whenever mean fitness differs between groups. We briefly reviewed recent relevant literature and found that selection studies vary widely in their scale of standardization, but few investigators motivated their rationale for the chosen standardization approach. Here, we propose that the scale at which fitness is relativized should reflect whether selection is likely to be hard or soft, that is, the scale at which populations (or hypothetical populations, in the case of a contrived experiment) are regulated. We argue that many comparative studies of selection are implicitly or explicitly focused on soft selection (i.e., frequency- and density-dependent selection). In such studies, relative fitness should preferably be calculated using within-group means, although this approach is taken only occasionally. Related difficulties arise for the standardization of phenotypes: the appropriate scale at which standardization should take place depends on whether groups are considered fixed or random. We emphasize that the scale of standardization is a critical decision in empirical studies of selection that should always warrant explicit justification.
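A toy numerical illustration of the central point, that within-group versus global relativization of fitness gives different selection differentials whenever group mean fitness differs; all numbers are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 500
# Same trait-fitness relationship in both groups, but mean fitness differs.
z_a, z_b = rng.normal(size=n), rng.normal(size=n)  # trait values
W_a = rng.poisson(np.exp(1.2 + 0.3 * z_a))         # high-fitness group
W_b = rng.poisson(np.exp(0.2 + 0.3 * z_b))         # low-fitness group

z = np.concatenate([z_a, z_b])
W = np.concatenate([W_a, W_b]).astype(float)

# Soft selection: relative fitness computed within each group.
w_within = np.concatenate([W_a / W_a.mean(), W_b / W_b.mean()])
# Hard selection: relative fitness computed against the global mean.
w_global = W / W.mean()

# Selection differential = cov(relative fitness, trait); the two
# standardizations disagree because group mean fitness differs.
print(np.cov(w_within, z)[0, 1], np.cov(w_global, z)[0, 1])
```

Under soft selection (within-group regulation) the first number is the relevant differential; the global standardization would overweight the high-fitness group.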
