Similar articles
Found 20 similar articles (search time: 625 ms)
1.
We analyze a real data set pertaining to reindeer fecal pellet‐group counts obtained from a survey conducted in a forest area in northern Sweden. In the data set, over 70% of counts are zeros, and there is high spatial correlation. We use conditionally autoregressive random effects for modeling of spatial correlation in a Poisson generalized linear mixed model (GLMM), quasi‐Poisson hierarchical generalized linear model (HGLM), zero‐inflated Poisson (ZIP), and hurdle models. The quasi‐Poisson HGLM allows for both under‐ and overdispersion with excessive zeros, while the ZIP and hurdle models allow only for overdispersion. In analyzing the real data set, we see that the quasi‐Poisson HGLMs can perform better than the other commonly used models, for example, ordinary Poisson HGLMs, spatial ZIP, and spatial hurdle models, and that the underdispersed Poisson HGLMs with spatial correlation fit the reindeer data best. We develop R code for fitting these models using a unified algorithm for the HGLMs. Spatial count response with an extremely high proportion of zeros, and underdispersion can be successfully modeled using the quasi‐Poisson HGLM with spatial random effects.
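To make the two zero-handling models named in this abstract concrete, the following is a minimal sketch of the textbook zero-inflated Poisson and hurdle Poisson probability mass functions. It omits the spatial random effects and is not the authors' R code; the function and parameter names (`zip_pmf`, `hurdle_pmf`, `pi`, `p0`) are assumptions introduced here for illustration.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Ordinary Poisson probability of observing count k with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zip_pmf(k: int, lam: float, pi: float) -> float:
    """Zero-inflated Poisson: a structural zero with probability pi,
    otherwise a draw from Poisson(lam), which may itself be zero."""
    base = (1.0 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base


def hurdle_pmf(k: int, lam: float, p0: float) -> float:
    """Hurdle Poisson: zero with probability p0; positive counts follow
    a zero-truncated Poisson(lam) scaled by (1 - p0)."""
    if k == 0:
        return p0
    return (1.0 - p0) * poisson_pmf(k, lam) / (1.0 - math.exp(-lam))
```

In the ZIP model zeros come from two sources (structural and sampling), whereas the hurdle model treats every zero as arising from a single process; both leave the positive counts Poisson-shaped.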

2.
Cai B, Dunson DB. Biometrics, 2006, 62(2): 446-457
The generalized linear mixed model (GLMM), which extends the generalized linear model (GLM) to incorporate random effects characterizing heterogeneity among subjects, is widely used in analyzing correlated and longitudinal data. Although there is often interest in identifying the subset of predictors that have random effects, random effects selection can be challenging, particularly when outcome distributions are nonnormal. This article proposes a fully Bayesian approach to the problem of simultaneous selection of fixed and random effects in GLMMs. Integrating out the random effects induces a covariance structure on the multivariate outcome data, and an important problem that we also consider is that of covariance selection. Our approach relies on variable selection-type mixture priors for the components in a special Cholesky decomposition of the random effects covariance. A stochastic search MCMC algorithm is developed, which relies on Gibbs sampling, with Taylor series expansions used to approximate intractable integrals. Simulated data examples are presented for different exponential family distributions, and the approach is applied to discrete survival data from a time-to-pregnancy study.

3.
Yau KK. Biometrics, 2001, 57(1): 96-102
A method for modeling survival data with multilevel clustering is described. The Cox partial likelihood is incorporated into the generalized linear mixed model (GLMM) methodology. Parameter estimation is achieved by maximizing a log likelihood analogous to the likelihood associated with the best linear unbiased prediction (BLUP) at the initial step of estimation and is extended to obtain residual maximum likelihood (REML) estimators of the variance component. Estimating equations for a three-level hierarchical survival model are developed in detail, and such a model is applied to analyze a set of chronic granulomatous disease (CGD) data on recurrent infections as an illustration with both hospital and patient effects being considered as random. Only the latter gives a significant contribution. A simulation study is carried out to evaluate the performance of the REML estimators. Further extension of the estimation procedure to models with an arbitrary number of levels is also discussed.

4.
Data from a litter-matched tumorigenesis experiment are analysed using a generalised linear mixed model (GLMM) approach to the analysis of clustered survival data in which there is a dependence of failure time observations within the same litter. Maximum likelihood (ML) and residual maximum likelihood (REML) estimates of risk variable parameters, variance component parameters and the prediction of random effects are given. The estimate of the treatment effect parameter (carcinogen effect) agrees well with previous analyses in the literature, although the dependence structure within a litter is modelled in different ways. The variance component estimation provides the estimated dispersion of the random effects. The prediction of random effects is useful, for instance, in identifying high-risk litters and individuals. The present analysis illustrates its wider application to detecting increased risk of occurrence of disease in particular families of a study population.

5.
Che X, Xu S. Heredity, 2012, 109(1): 41-49
Many biological traits are discretely distributed in phenotype but continuously distributed in genetics because they are controlled by multiple genes and environmental variants. Due to the quantitative nature of the genetic background, these multiple genes are called quantitative trait loci (QTL). When the QTL effects are treated as random, they can be estimated in a single generalized linear mixed model (GLMM), even if the number of QTL may be larger than the sample size. The GLMM in its original form cannot be applied to QTL mapping for discrete traits if there are missing genotypes. We examined two alternative missing genotype-handling methods: the expectation method and the overdispersion method. Simulation studies show that the two methods are efficient for multiple QTL mapping (MQM) under the GLMM framework. The overdispersion method showed slight advantages over the expectation method in terms of smaller mean-squared errors of the estimated QTL effects. The two methods of GLMM were applied to MQM for the female fertility trait of wheat. Multiple QTL were detected to control the variation of the number of seeded spikelets.

6.
We propose a general Bayesian approach to heteroskedastic error modeling for generalized linear mixed models (GLMM) in which linked functions of conditional means and residual variances are specified as separate linear combinations of fixed and random effects. We focus on the linear mixed model (LMM) analysis of birth weight (BW) and the cumulative probit mixed model (CPMM) analysis of calving ease (CE). The deviance information criterion (DIC) was demonstrated to be useful in correctly choosing between homoskedastic and heteroskedastic error GLMM for both traits when data were generated according to a mixed model specification for both location parameters and residual variances. Heteroskedastic error LMM and CPMM were fitted, respectively, to BW and CE data on 8847 Italian Piemontese first parity dams in which residual variances were modeled as functions of fixed calf sex and random herd effects. The posterior mean residual variance for male calves was over 40% greater than that for female calves for both traits. Also, the posterior means of the standard deviation of the herd-specific variance ratios (relative to a unitary baseline) were estimated to be 0.60 ± 0.09 for BW and 0.74 ± 0.14 for CE. For both traits, the heteroskedastic error LMM and CPMM were chosen over their homoskedastic error counterparts based on DIC values.

7.
A general statistical framework is proposed for comparing linear models of spatial process and pattern. A spatial linear model for nested analysis of variance can be based on either fixed effects or random effects. Greig-Smith (1952) originally used a fixed effects model, but there are also examples of random effects models in the soil science literature. Assuming intrinsic stationarity for a linear model, the expectations of a spatial nested ANOVA and two term local variance (TTLV, Hill 1973) are functions of the variogram, and several examples are given. Paired quadrat variance (PQV, Ludwig & Goodall 1978) is a variogram estimator which can be used to approximate TTLV, and we provide an example from ecological data. Both nested ANOVA and TTLV can be seen as weighted lag-1 variogram estimators that are functions of support, rather than distance. We show that there are two unbiased estimators for the variogram under aggregation, and computer simulation shows that the estimator with smaller variance depends on the process autocorrelation.
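The abstract describes paired quadrat variance as a variogram estimator. As a rough illustration, the sketch below computes the classical method-of-moments semivariogram for a one-dimensional transect of equally spaced quadrat counts, which averages squared differences between quadrats a fixed number of steps apart. Whether this matches the exact PQV definition used in the paper is an assumption, and the function name `semivariogram` is hypothetical.

```python
from typing import Sequence


def semivariogram(x: Sequence[float], lag: int) -> float:
    """Method-of-moments semivariogram estimate at an integer lag for
    equally spaced observations along a transect: half the mean squared
    difference between all pairs of quadrats that are `lag` steps apart."""
    n = len(x)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(x)")
    squared_diffs = [(x[i] - x[i + lag]) ** 2 for i in range(n - lag)]
    return sum(squared_diffs) / (2 * len(squared_diffs))
```

For a strictly alternating pattern such as `[1, 3, 1, 3]`, the estimate is large at lag 1 and zero at lag 2, which is how a periodic spatial pattern shows up in this kind of estimator.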

8.
Although most statistical methods for the analysis of longitudinal data have focused on retrospective models of association, new advances in mobile health data have presented opportunities for predicting future health status by leveraging an individual's behavioral history alongside data from similar patients. Methods that incorporate both individual-level and sample-level effects are critical to using these data to their full predictive capacity. Neural networks are powerful tools for prediction, but many assume input observations are independent even when they are clustered or correlated in some way, such as in longitudinal data. Generalized linear mixed models (GLMM) provide a flexible framework for modeling longitudinal data but have poor predictive power, particularly when the data are highly nonlinear. We propose a generalized neural network mixed model that replaces the linear fixed effect in a GLMM with the output of a feed-forward neural network. The model simultaneously accounts for the correlation structure and complex nonlinear relationship between input variables and outcomes, and it utilizes the predictive power of neural networks. We apply this approach to predict depression and anxiety levels of schizophrenic patients using longitudinal data collected from passive smartphone sensor data.

9.
Auxiliary covariate data are often collected in biomedical studies when the primary exposure variable is only assessed on a subset of the study subjects. In this study, we investigate a semiparametric‐estimated likelihood estimation for the generalized linear mixed models (GLMM) in the presence of a continuous auxiliary variable. We use a kernel smoother to handle continuous auxiliary data. The method can be used to deal with missing or mismeasured covariate data problems in a variety of applications when an auxiliary variable is available and cluster sizes are not too small. Simulation study results show that the proposed method performs better than that which ignores the random effects in GLMM and that which only uses data in the validation data set. We illustrate the proposed method with a real data set from a recent environmental epidemiology study on the maternal serum 1,1‐dichloro‐2,2‐bis(p‐chlorophenyl) ethylene level in relationship to preterm births.

10.
Generalized linear model analyses of repeated measurements typically rely on simplifying mathematical models of the error covariance structure for testing the significance of differences in patterns of change across time. The robustness of the tests of significance depends not only on the degree of agreement between the specified mathematical model and the actual population data structure, but also on the precision and robustness of the computational criteria for fitting the specified covariance structure to the data. Generalized estimating equation (GEE) solutions utilizing the robust empirical sandwich estimator for modeling of the error structure were compared with general linear mixed model (GLMM) solutions that utilized the commonly employed restricted maximum likelihood (REML) procedure. Under the conditions considered, the GEE and GLMM procedures were identical in assuming that the data are normally distributed and that the variance‐covariance structure of the data is the one specified by the user. The question addressed in this article concerns relative sensitivity of tests of significance for treatment effects to varying degrees of misspecification of the error covariance structure model when fitted by the alternative procedures. Simulated data that were subjected to Monte Carlo evaluation of actual Type I error and power of tests of the equal slopes hypothesis conformed to assumptions of ordinary linear model ANOVA for repeated measures except for autoregressive covariance structures and missing data due to dropouts. The actual within‐groups correlation structures of the simulated repeated measurements ranged from AR(1) to compound symmetry in graded steps, whereas the GEE and GLMM formulations restricted the respective error structure models to be either AR(1), compound symmetry (CS), or unstructured (UN).
The GEE‐based tests utilizing empirical sandwich estimator criteria were documented to be relatively insensitive to misspecification of the covariance structure models, whereas GLMM tests which relied on restricted maximum likelihood (REML) were highly sensitive to relatively modest misspecification of the error correlation structure even though normality, variance homogeneity, and linearity were not an issue in the simulated data. Goodness‐of‐fit statistics were of little utility in identifying cases in which relatively minor misspecification of the GLMM error structure model resulted in inadequate alpha protection for tests of the equal slopes hypothesis. Both GEE and GLMM formulations that relied on unstructured (UN) error model specification produced nonconservative results regardless of the actual correlation structure of the repeated measurements. A random coefficients model produced robust tests with competitive power across all conditions examined. (© 2004 WILEY‐VCH Verlag GmbH & Co. KGaA, Weinheim)
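For readers unfamiliar with the two structured correlation models named in this abstract, here is a minimal sketch of the standard AR(1) and compound symmetry (CS) correlation matrices for n repeated measurements. The function names and the common parameter `rho` are assumptions made for illustration; the abstract does not say how the intermediate "graded steps" between the two structures were constructed, so none are shown.

```python
from typing import List


def ar1_corr(n: int, rho: float) -> List[List[float]]:
    """AR(1): the correlation between occasions i and j decays
    geometrically with their separation, rho ** |i - j|."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]


def cs_corr(n: int, rho: float) -> List[List[float]]:
    """Compound symmetry: every pair of distinct occasions shares the
    same correlation rho, regardless of separation."""
    return [[1.0 if i == j else rho for j in range(n)] for i in range(n)]
```

With `rho = 0.5` and three occasions, the two matrices agree for adjacent occasions and differ only at lag 2 (0.25 under AR(1), 0.5 under CS), which gives a sense of how modest the difference between candidate structures can be.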

11.
We consider an extension of linear mixed models by assuming a multivariate skew t distribution for the random effects and a multivariate t distribution for the error terms. The proposed model provides flexibility in capturing the effects of skewness and heavy tails simultaneously among continuous longitudinal data. We present an efficient alternating expectation‐conditional maximization (AECM) algorithm for the computation of maximum likelihood estimates of parameters on the basis of two convenient hierarchical formulations. The techniques for the prediction of random effects and intermittent missing values under this model are also investigated. Our methodologies are illustrated through an application to schizophrenia data.

12.
The identification of core habitat areas and resulting prediction maps are vital tools for land managers. Often, agencies have large datasets from multiple studies over time that could be combined for a more informed and complete picture of a species. Colorado Parks and Wildlife has a large database for greater sage-grouse (Centrocercus urophasianus) including 11 radio-telemetry studies completed over 12 years (1997–2008) across northwestern Colorado. We divided the 49,470-km² study area into 1-km² grids with the number of sage-grouse locations in each grid cell that contained at least 1 location counted as the response variable. We used a generalized linear mixed model (GLMM) using land cover variables as fixed effects and individual birds and populations as random effects to predict greater sage-grouse location counts during breeding, summer, and winter seasons. The mixed effects model enabled us to model correlations that may exist in grouped data (e.g., correlations among individuals and populations). We found only individual groupings accounted for variation in the summer and breeding seasons, but not the winter season. The breeding and summer seasonal models predicted sage-grouse presence in the currently delineated populations for Colorado, but we found little evidence supporting a winter season model. According to our models, about 50% of the study area in Colorado is considered highly or moderately suitable habitat in both the breeding and summer seasons. As oil and gas development and other landscape changes occur in this portion of Colorado, knowledge of where management actions can be accomplished or possible restoration can occur becomes more critical. These seasonal models provide data-driven distribution maps that managers and biologists can use for identification and exploration when investigating greater sage-grouse issues across the Colorado range.
Using historic data for future decisions on species management while accounting for issues found from combining datasets allows land managers the flexibility to use all information available. © 2013 The Wildlife Society.

13.
This paper extends the multilevel survival model by allowing for a cured fraction in the model. Random effects induced by the multilevel clustering structure are specified in the linear predictors in both hazard function and cured probability parts. Adopting the generalized linear mixed model (GLMM) approach to formulate the problem, parameter estimation is achieved by maximizing a best linear unbiased prediction (BLUP) type log‐likelihood at the initial step of estimation, and is then extended to obtain residual maximum likelihood (REML) estimators of the variance component. The proposed multilevel mixture cure model is applied to analyze the (i) child survival study data with multilevel clustering and (ii) chronic granulomatous disease (CGD) data on recurrent infections as illustrations. A simulation study is carried out to evaluate the performance of the REML estimators and assess the accuracy of the standard error estimates.

14.
Summary. In this rejoinder, we discuss the impact of misspecifying the random effects distribution on inferences obtained from generalized linear mixed models (GLMMs). Special attention is paid to the power of the tests for the fixed‐effect parameters. To study this misspecification, researchers often use simulation designs in which several choices for the true underlying random‐effects distribution are considered, while the assumed distribution is kept fixed. Neuhaus, McCulloch, and Boylan (2010, Biometrics 00, 000–000) argue that a logically correct approach should consist of varying the assumed, fitted distribution, while holding the true fixed. We argue that both simulation designs can bring valuable insights into the impact of the misspecification. Furthermore, using both designs, we illustrate that the power associated with the tests for the fixed‐effect parameters in GLMM may be affected by misspecifying the random‐effects distribution.

15.
Zhang P, Song PX, Qu A, Greene T. Biometrics, 2008, 64(1): 29-38
Summary. This article presents a new class of nonnormal linear mixed models that provide an efficient estimation of subject-specific disease progression in the analysis of longitudinal data from the Modification of Diet in Renal Disease (MDRD) trial. This new analysis addresses the previously reported finding that the distribution of the random effect characterizing disease progression is negatively skewed. We assume a log-gamma distribution for the random effects and provide the maximum likelihood inference for the proposed nonnormal linear mixed model. We derive the predictive distribution of patient-specific disease progression rates, which demonstrates rather different individual progression profiles from those obtained from the normal linear mixed model analysis. To validate the adequacy of the log-gamma assumption versus the usual normality assumption for the random effects, we propose a lack-of-fit test that clearly indicates a better fit for the log-gamma modeling in the analysis of the MDRD data. The full maximum likelihood inference is also advantageous in dealing with the missing at random (MAR) type of dropouts encountered in the MDRD data.

16.

Background

When administering vancomycin hydrochloride (VCM), the initial dose is adjusted to ensure that the steady-state trough value (Css-trough) remains within the effective concentration range. However, the Css-trough (population mean method predicted value [PMMPV]) calculated using the population mean method (PMM) often deviates from the effective concentration range. In this study, we used the generalized linear mixed model (GLMM) for initial dose planning to create a model that accurately predicts Css-trough, and subsequently assessed its prediction accuracy.

Methods

The study included 46 subjects whose trough values were measured after receiving VCM. We calculated the Css-trough (Bayesian estimate predicted value [BEPV]) from the Bayesian estimates of trough values. Using the patients’ medical data, we created models that predict the BEPV and selected the model with minimum information criterion (GLMM best model). We then calculated the Css-trough (GLMMPV) from the GLMM best model and compared the BEPV correlation with GLMMPV and with PMMPV.

Results

The GLMM best model was {[0.977 + (males: 0.029 or females: -0.081)] × PMMPV + 0.101 × BUN/adjusted SCr – 12.899 × SCr adjusted amount}. The coefficients of determination for BEPV/GLMMPV and BEPV/PMMPV were 0.623 and 0.513, respectively.

Conclusion

We demonstrated that the GLMM best model was more accurate in predicting the Css-trough than the PMM.
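The fitted model reported under Results can be written out as a small function, shown here only to make the arithmetic of the published coefficients explicit. This is a sketch under stated assumptions: the abstract does not give units or define how "BUN/adjusted SCr" and "SCr adjusted amount" are computed, so both are treated as precomputed inputs, and all function and parameter names are hypothetical.

```python
def glmm_css_trough(
    pmmpv: float,
    is_male: bool,
    bun_to_adjusted_scr: float,
    scr_adjusted_amount: float,
) -> float:
    """Evaluate the GLMM best model quoted in the abstract.

    pmmpv: Css-trough predicted by the population mean method.
    bun_to_adjusted_scr: the 'BUN/adjusted SCr' covariate, assumed to be
        supplied already as a single ratio.
    scr_adjusted_amount: the 'SCr adjusted amount' covariate, assumed to be
        supplied as defined by the original study.
    """
    sex_term = 0.029 if is_male else -0.081  # sex-specific slope adjustment
    return (
        (0.977 + sex_term) * pmmpv
        + 0.101 * bun_to_adjusted_scr
        - 12.899 * scr_adjusted_amount
    )
```

With the two renal covariates set to zero, the model reduces to a sex-specific rescaling of PMMPV: a slope of 1.006 for males and 0.896 for females.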

17.
Missing outcomes or irregularly timed multivariate longitudinal data frequently occur in clinical trials or biomedical studies. The multivariate t linear mixed model (MtLMM) has been shown to be a robust approach to modeling multioutcome continuous repeated measures in the presence of outliers or heavy‐tailed noises. This paper presents a framework for fitting the MtLMM with an arbitrary missing data pattern embodied within multiple outcome variables recorded at irregular occasions. To address the serial correlation among the within‐subject errors, a damped exponential correlation structure is considered in the model. Under the missing at random mechanism, an efficient alternating expectation‐conditional maximization (AECM) algorithm is used to carry out estimation of parameters and imputation of missing values. The techniques for the estimation of random effects and the prediction of future responses are also investigated. Applications to an HIV‐AIDS study and a pregnancy study involving analysis of multivariate longitudinal data with missing outcomes as well as a simulation study have highlighted the superiority of MtLMMs on the provision of more adequate estimation, imputation and prediction performances.

18.
Roy J, Lin X. Biometrics, 2005, 61(3): 837-846
We consider estimation in generalized linear mixed models (GLMM) for longitudinal data with informative dropouts. At the time a unit drops out, time-varying covariates are often unobserved in addition to the missing outcome. However, existing informative dropout models typically require covariates to be completely observed. This assumption is not realistic in the presence of time-varying covariates. In this article, we first study the asymptotic bias that would result from applying existing methods, where missing time-varying covariates are handled using naive approaches, which include: (1) using only baseline values; (2) carrying forward the last observation; and (3) assuming the missing data are ignorable. Our asymptotic bias analysis shows that these naive approaches yield inconsistent estimators of model parameters. We next propose a selection/transition model that allows covariates to be missing in addition to the outcome variable at the time of dropout. The EM algorithm is used for inference in the proposed model. Data from a longitudinal study of human immunodeficiency virus (HIV)-infected women are used to illustrate the methodology.

19.
Generalized Spatial Dirichlet Process Models   Cited by: 1 (self-citations: 0, citations by others: 1)
Many models for the study of point-referenced data explicitly introduce spatial random effects to capture residual spatial association. These spatial effects are customarily modelled as a zero-mean stationary Gaussian process. The spatial Dirichlet process introduced by Gelfand et al. (2005) produces a random spatial process which is neither Gaussian nor stationary. Rather, it varies about a process that is assumed to be stationary and Gaussian. The spatial Dirichlet process arises as a probability-weighted collection of random surfaces. This can be limiting for modelling and inferential purposes since it insists that a process realization must be one of these surfaces. We introduce a random distribution for the spatial effects that allows different surface selection at different sites. Moreover, we can specify the model so that the marginal distribution of the effect at each site still comes from a Dirichlet process. The development is offered constructively, providing a multivariate extension of the stick-breaking representation of the weights. We then introduce mixing using this generalized spatial Dirichlet process. We illustrate with a simulated dataset of independent replications and note that we can embed the generalized process within a dynamic model specification to eliminate the independence assumption.
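The stick-breaking representation mentioned in this abstract can be illustrated with the standard construction for an ordinary Dirichlet process, truncated to a finite number of atoms. This sketch is not the multivariate extension the authors develop; the function name, the truncation level `n_atoms`, and the concentration parameter name `alpha` are assumptions made here for illustration.

```python
import random
from typing import List, Tuple


def stick_breaking_weights(
    alpha: float, n_atoms: int, rng: random.Random
) -> Tuple[List[float], float]:
    """Truncated stick-breaking weights for a Dirichlet process with
    concentration alpha. Each step breaks off a Beta(1, alpha) fraction
    of the stick that remains. Returns the weights and the leftover
    stick length not yet assigned to any atom."""
    weights: List[float] = []
    remaining = 1.0
    for _ in range(n_atoms):
        fraction = rng.betavariate(1.0, alpha)  # share of the remaining stick
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    return weights, remaining
```

A call such as `stick_breaking_weights(1.0, 50, random.Random(0))` yields weights that, together with the leftover length, sum to one; in the spatial Dirichlet process of the abstract, each weight would attach to one random surface.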

20.
Summary. It is of great practical interest to simultaneously identify the important predictors that correspond to both the fixed and random effects components in a linear mixed‐effects (LME) model. Typical approaches perform selection separately on each of the fixed and random effect components. However, changing the structure of one set of effects can lead to different choices of variables for the other set of effects. We propose simultaneous selection of the fixed and random factors in an LME model using a modified Cholesky decomposition. Our method is based on a penalized joint log likelihood with an adaptive penalty for the selection and estimation of both the fixed and random effects. It performs model selection by allowing fixed effects or standard deviations of random effects to be exactly zero. A constrained expectation–maximization algorithm is then used to obtain the final estimates. It is further shown that the proposed penalized estimator enjoys the Oracle property, in that, asymptotically it performs as well as if the true model was known beforehand. We demonstrate the performance of our method based on a simulation study and a real data example.

