Similar literature
1.
Spatial models for disease mapping should ideally account for covariates measured at both the individual and area levels. The newly available “indiCAR” model fits the popular conditional autoregressive (CAR) model by accommodating both individual and group level covariates while adjusting for spatial correlation in the disease rates. This algorithm has been shown to be effective but assumes log‐linear associations between individual level covariates and the outcome. In many studies, the relationship between individual level covariates and the outcome may be non‐log‐linear, and methods to track such nonlinearity between individual level covariates and the outcome in spatial regression modeling are not well developed. In this paper, we propose a new algorithm, smooth‐indiCAR, to fit an extension to the popular conditional autoregressive model that can accommodate both linear and nonlinear individual level covariate effects while adjusting for group level covariates and spatial correlation in the disease rates. In this formulation, the effect of a continuous individual level covariate is accommodated via penalized splines. We describe a two‐step estimation procedure to obtain reliable estimates of individual and group level covariate effects, in which the individual and group level covariate effects are estimated separately. This distributed computing framework enhances its application in the Big Data domain with a large number of individual/group level covariates. We evaluate the performance of smooth‐indiCAR through simulation. Our results indicate that the smooth‐indiCAR method provides reliable estimates of all regression and random effect parameters. We illustrate our proposed methodology with an analysis of data on neutropenia admissions in New South Wales (NSW), Australia.
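As a sketch only, a penalized-spline term inside a log-linear CAR model can be written as below. The notation ($\mu_{ij}$, $\mathbf{z}_j$, $\phi_j$, knots $\kappa_k$) and the truncated-linear basis are assumptions made for illustration; the abstract does not state which basis or penalty smooth‐indiCAR uses.

```latex
% Assumed notation: individual i in area j, continuous individual covariate x_{ij},
% area-level covariates z_j, spatial CAR random effect \phi_j.
\log \mu_{ij} = \beta_0 + f(x_{ij}) + \mathbf{z}_j^{\top}\boldsymbol{\gamma} + \phi_j ,
\qquad
f(x) = \beta_1 x + \sum_{k=1}^{K} u_k \,(x - \kappa_k)_{+} ,
% Smoothness is controlled by penalizing the knot coefficients:
\qquad
\text{penalty} = \lambda \sum_{k=1}^{K} u_k^{2} , \quad \lambda \ge 0 .
```

With $\lambda \to \infty$ the knot coefficients shrink to zero and $f$ reduces to the log-linear term $\beta_1 x$, which is the special case assumed by the original indiCAR model.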

2.
Multiple imputation (MI) is increasingly popular for handling multivariate missing data. Two general approaches are available in standard computer packages: MI based on the posterior distribution of incomplete variables under a multivariate (joint) model, and fully conditional specification (FCS), which imputes missing values using univariate conditional distributions for each incomplete variable given all the others, cycling iteratively through the univariate imputation models. In the context of longitudinal or clustered data, it is not clear whether these approaches result in consistent estimates of regression coefficient and variance component parameters when the analysis model of interest is a linear mixed effects model (LMM) that includes both random intercepts and slopes and either the covariates alone or both the covariates and the outcome contain missing information. In the current paper, we compared the performance of seven different MI methods for handling missing values in longitudinal and clustered data in the context of fitting LMMs with both random intercepts and slopes. We study the theoretical compatibility between specific imputation models fitted under each of these approaches and the LMM, and also conduct simulation studies in both the longitudinal and clustered data settings. Simulations were motivated by analyses of the association between body mass index (BMI) and quality of life (QoL) in the Longitudinal Study of Australian Children (LSAC). Our findings showed that the relative performance of MI methods varies according to whether the incomplete covariate has fixed or random effects and whether there is missingness in the outcome variable. We showed via simulation that compatible imputation and analysis models resulted in consistent estimation of both regression parameters and variance components. We illustrate our findings with the analysis of LSAC data.

3.
Kinney SK  Dunson DB 《Biometrics》2007,63(3):690-698
We address the problem of selecting which variables should be included in the fixed and random components of logistic mixed effects models for correlated data. A fully Bayesian variable selection is implemented using a stochastic search Gibbs sampler to estimate the exact model-averaged posterior distribution. This approach automatically identifies subsets of predictors having nonzero fixed effect coefficients or nonzero random effects variance, while allowing uncertainty in the model selection process. Default priors are proposed for the variance components and an efficient parameter expansion Gibbs sampler is developed for posterior computation. The approach is illustrated using simulated data and an epidemiologic example.

4.
In this paper, we consider selection based on the best predictor of animal additive genetic values in Gaussian linear mixed models, threshold models, Poisson mixed models, and log-normal frailty models for survival data (including models with time-dependent covariates with associated fixed or random effects). In the different models, expressions are given (when these can be found; otherwise unbiased estimates are given) for prediction error variance, accuracy of selection and expected response to selection on the additive genetic scale and on the observed scale. The expressions given for non-Gaussian traits are generalisations of the well-known formulas for Gaussian traits and reflect, for Poisson mixed models and frailty models for survival data, the hierarchical structure of the models. In general, the ratio of the additive genetic variance to the total variance in the Gaussian part of the model (heritability on the normally distributed level of the model) or a generalised version of heritability plays a central role in these formulas.

5.
Summary. For longitudinal data, mixed models include random subject effects to indicate how subjects influence their responses over repeated assessments. The error variance and the variance of the random effects are usually considered to be homogeneous. These variance terms characterize the within-subjects (i.e., error variance) and between-subjects (i.e., random-effects variance) variation in the data. In studies using ecological momentary assessment (EMA), up to 30 or 40 observations are often obtained for each subject, and interest frequently centers around changes in the variances, both within and between subjects. In this article, we focus on an adolescent smoking study using EMA where interest is in characterizing changes in mood variation. We describe how covariates can influence the mood variances, and also extend the standard mixed model by adding a subject-level random effect to the within-subject variance specification. This permits subjects to have influence on the mean, or location, and variability, or (square of the) scale, of their mood responses. Additionally, we allow the location and scale random effects to be correlated. These mixed-effects location scale models have useful applications in many research areas where interest centers on the joint modeling of the mean and variance structure.
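One way to write a model of this kind is sketched below. All symbols are assumed notation introduced for illustration (the abstract gives no formulas): $\upsilon_i$ is the location random effect, $\omega_i$ the scale random effect, and log links keep both variances positive.

```latex
% Assumed notation: subject i, occasion j.
y_{ij} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + \upsilon_i + \varepsilon_{ij},
\qquad \varepsilon_{ij} \sim N\!\left(0, \sigma^2_{\varepsilon_{ij}}\right),
\\[4pt]
% Between-subject variance depends on covariates u_{ij}:
\log \sigma^2_{\upsilon_{ij}} = \mathbf{u}_{ij}^{\top}\boldsymbol{\alpha},
\\[4pt]
% Within-subject variance depends on covariates w_{ij} and a subject-level scale effect:
\log \sigma^2_{\varepsilon_{ij}} = \mathbf{w}_{ij}^{\top}\boldsymbol{\tau} + \omega_i,
\\[4pt]
% Location and scale random effects may be correlated:
\operatorname{Cov}(\upsilon_i, \omega_i) = \sigma_{\upsilon\omega}.
```

Setting $\boldsymbol{\alpha}$ and $\boldsymbol{\tau}$ to intercepts only and $\omega_i = 0$ recovers the standard mixed model with homogeneous variances described at the start of the abstract.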

6.
Bayesian Inference in Semiparametric Mixed Models for Longitudinal Data
Summary. We consider Bayesian inference in semiparametric mixed models (SPMMs) for longitudinal data. SPMMs are a class of models that use a nonparametric function to model a time effect, a parametric function to model other covariate effects, and parametric or nonparametric random effects to account for the within-subject correlation. We model the nonparametric function using a Bayesian formulation of a cubic smoothing spline, and the random effect distribution using a normal distribution and alternatively a nonparametric Dirichlet process (DP) prior. When the random effect distribution is assumed to be normal, we propose a uniform shrinkage prior (USP) for the variance components and the smoothing parameter. When the random effect distribution is modeled nonparametrically, we use a DP prior with a normal base measure and propose a USP for the hyperparameters of the DP base measure. We argue that the commonly assumed DP prior implies a nonzero mean of the random effect distribution, even when a base measure with mean zero is specified. This implies weak identifiability for the fixed effects, and can therefore lead to biased estimators and poor inference for the regression coefficients and the spline estimator of the nonparametric function. We propose an adjustment using a postprocessing technique. We show that under mild conditions the posterior is proper under the proposed USP, a flat prior for the fixed effect parameters, and an improper prior for the residual variance. We illustrate the proposed approach using a longitudinal hormone dataset, and carry out extensive simulation studies to compare its finite sample performance with existing methods.

7.
The objective of this study was to compare models for appropriate genetic parameter estimation for milk yield (305-day) in crossbred Holsteins in the tropics, where only records from crossbred cows were available. Eleven models with different effects of contemporary group (CG) at calving (herd-year-season or herd-year-month as fixed, and herd-year-month as random), age at calving (as linear or quadratic covariates, age-class, and age-class × lactation), and dominance were considered. On-farm records from small herds (n < 50) were included or excluded to validate the parameter estimates. Average Information Restricted Maximum Likelihood (AIREML) and Best Linear Unbiased Prediction (BLUP) were used to estimate variance components and breeding values. R-squared (R²) and the standard error of heritability (h²) were used to determine the appropriate model. The estimates of heritability from most models ranged from 0.18 to 0.22. CG formation of herd-year-month as a random effect slightly lowered the additive genetic variance but considerably decreased the permanent environmental variance. The model with age-class × lactation gave better R² than other age adjustments. The models including records from smallholders gave similar estimates of heritability and a lower standard error than the models excluding them. The estimate of dominance variance as a proportion of total variance was close to zero. The low ratio of dominance to additive genetic variance suggested that the inclusion of dominance effects in the model was unjustified. In conclusion, the model including the effects of herd-year-month, age-class × lactation, as well as additive genetic, permanent environmental and residual effects, was the most appropriate for genetic evaluation in crossbred Holsteins, where records from smallholders could be included.

8.
A general statistical framework is proposed for comparing linear models of spatial process and pattern. A spatial linear model for nested analysis of variance can be based on either fixed effects or random effects. Greig-Smith (1952) originally used a fixed effects model, but there are also examples of random effects models in the soil science literature. Assuming intrinsic stationarity for a linear model, the expectations of a spatial nested ANOVA and two term local variance (TTLV, Hill 1973) are functions of the variogram, and several examples are given. Paired quadrat variance (PQV, Ludwig & Goodall 1978) is a variogram estimator which can be used to approximate TTLV, and we provide an example from ecological data. Both nested ANOVA and TTLV can be seen as weighted lag-1 variogram estimators that are functions of support, rather than distance. We show that there are two unbiased estimators for the variogram under aggregation, and computer simulation shows that the estimator with smaller variance depends on the process autocorrelation.
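For readers unfamiliar with variogram estimation, the following is a minimal sketch of the classical (Matheron) semivariogram estimator at an integer lag, assuming a one-dimensional transect of equally spaced quadrats. The function name and the example counts are hypothetical, and the sketch is not taken from the paper.

```python
def empirical_semivariogram(values, lag):
    """Classical semivariogram estimate at an integer lag.

    Assumes `values` are measurements on equally spaced quadrats along a
    transect; returns half the mean squared difference between quadrats
    that are `lag` positions apart.
    """
    if lag < 1 or lag >= len(values):
        raise ValueError("lag must satisfy 1 <= lag < len(values)")
    diffs = [values[i + lag] - values[i] for i in range(len(values) - lag)]
    return sum(d * d for d in diffs) / (2 * len(diffs))


# Hypothetical quadrat counts, for illustration only.
counts = [1, 3, 2, 5]
gamma_1 = empirical_semivariogram(counts, 1)  # pairs one quadrat apart
gamma_2 = empirical_semivariogram(counts, 2)  # pairs two quadrats apart
```

Plotting such estimates against the lag gives the empirical variogram; an increasing curve suggests that nearby quadrats are more alike than distant ones.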

9.
Kneib T  Fahrmeir L 《Biometrics》2006,62(1):109-118
Motivated by a space-time study on forest health with damage state of trees as the response, we propose a general class of structured additive regression models for categorical responses, allowing for a flexible semiparametric predictor. Nonlinear effects of continuous covariates, time trends, and interactions between continuous covariates are modeled by penalized splines. Spatial effects can be estimated based on Markov random fields, Gaussian random fields, or two-dimensional penalized splines. We present our approach from a Bayesian perspective, with inference based on a categorical linear mixed model representation. The resulting empirical Bayes method is closely related to penalized likelihood estimation in a frequentist setting. Variance components, corresponding to inverse smoothing parameters, are estimated using (approximate) restricted maximum likelihood. In simulation studies we investigate the performance of different choices for the spatial effect, compare the empirical Bayes approach to competing methodology, and study the bias of mixed model estimates. As an application we analyze data from the forest health survey.

10.
Microarrays provide a valuable tool for the quantification of gene expression. Usually, however, the number of replicates is limited, leading to unsatisfactory variance estimates in a gene‐wise mixed model analysis. As thousands of genes are available, it is desirable to combine information across genes. When more than two tissue types or treatments are to be compared it might be advisable to consider the array effect as random. Then information between arrays may be recovered, which can increase accuracy in estimation. We propose a method of variance component estimation across genes for a linear mixed model with two random effects. The method may be extended to models with more than two random effects. We assume that the variance components follow a log‐normal distribution. Assuming that the sums of squares from the gene‐wise analysis, given the true variance components, follow a scaled χ²‐distribution, we adopt an empirical Bayes approach. The variance components are estimated by the expectation of their posterior distribution. The new method is evaluated in a simulation study. Differentially expressed genes are more likely to be detected by tests based on these variance estimates than by tests based on gene‐wise variance estimates. This effect is most visible in studies with small array numbers. In an analysis of a real data set on maize endosperm, the method is shown to work well. (© 2008 WILEY‐VCH Verlag GmbH & Co. KGaA, Weinheim)

11.
Longitudinal data usually consist of a number of short time series. A group of subjects or groups of subjects are followed over time and observations are often taken at unequally spaced time points, and may be at different times for different subjects. When the errors and random effects are Gaussian, the likelihood of these unbalanced linear mixed models can be directly calculated, and nonlinear optimization used to obtain maximum likelihood estimates of the fixed regression coefficients and parameters in the variance components. For binary longitudinal data, a two state, non-homogeneous continuous time Markov process approach is used to model serial correlation within subjects. Formulating the model as a continuous time Markov process allows the observations to be equally or unequally spaced. Fixed and time varying covariates can be included in the model, and the continuous time model allows the estimation of the odds ratio for an exposure variable based on the steady state distribution. Exact likelihoods can be calculated. The initial probability distribution on the first observation on each subject is estimated using logistic regression that can involve covariates, and this estimation is embedded in the overall estimation. These models are applied to an intervention study designed to reduce children's sun exposure.
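To illustrate why a continuous time formulation copes with unequal spacing, here is a minimal sketch of the transition probabilities of a two-state Markov process. It assumes a time-homogeneous simplification with constant rates, whereas the abstract describes a non-homogeneous process with covariates; the function names and rate values are hypothetical.

```python
import math


def two_state_transition_matrix(rate_01, rate_10, t):
    """Transition probabilities over an interval of length t.

    Assumes constant rates: rate_01 for moving 0 -> 1 and rate_10 for
    moving 1 -> 0. Entry [a][b] is P(state b at time t | state a at time 0).
    """
    total = rate_01 + rate_10
    pi_1 = rate_01 / total  # steady-state probability of state 1
    pi_0 = rate_10 / total  # steady-state probability of state 0
    decay = math.exp(-total * t)
    return [
        [pi_0 + pi_1 * decay, pi_1 * (1.0 - decay)],
        [pi_0 * (1.0 - decay), pi_1 + pi_0 * decay],
    ]


def steady_state_odds(rate_01, rate_10):
    """Odds of being in state 1 under the steady-state distribution."""
    return rate_01 / rate_10


# Hypothetical rates; the gap t can differ for every pair of observations.
short_gap = two_state_transition_matrix(0.2, 0.4, t=0.5)
long_gap = two_state_transition_matrix(0.2, 0.4, t=50.0)
odds = steady_state_odds(0.2, 0.4)
```

Because the interval length enters only through `t`, each pair of consecutive observations contributes its own transition probability to the likelihood. Under this simplification, an exposure odds ratio would be the ratio of steady-state odds computed with and without the exposure.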

12.
Understanding causes of nest loss is critical for the management of endangered bird populations. Available methods for estimating nest loss probabilities to competing sources do not allow for random effects and covariation among sources, and there are few data simulation methods or goodness‐of‐fit (GOF) tests for such models. We developed a Bayesian multinomial extension of the widely used logistic exposure (LE) nest survival model which can incorporate multiple random effects and fixed‐effect covariates for each nest loss category. We investigated the performance of this model and the accompanying GOF test by analysing simulated nest fate datasets with and without age‐biased discovery probability, and by comparing the estimates with those of traditional fixed‐effects estimators. We then exemplify the use of the multinomial LE model and GOF test by analysing Piping Plover Charadrius melodus nest fate data (n = 443) to explore the effects of wire cages (exclosures) constructed around nests, which are used to protect nests from predation but can lead to increased nest abandonment rates. Mean parameter estimates of the random‐effects multinomial LE model were all within 1 sd of the true values used to simulate the datasets. Age‐biased discovery probability did not result in biased parameter estimates. Traditional fixed‐effects models provided estimates with a high bias of up to 43% with a mean of 71% smaller standard deviations. The GOF test identified models that were a poor fit to the simulated data. For the Piping Plover dataset, the fixed‐effects model was less well‐supported than the random‐effects model and underestimated the risk of exclosure use by 16%. The random‐effects model estimated a range of 1–6% probability of abandonment for nests not protected by exclosures across sites and 5–41% probability of abandonment for nests with exclosures, suggesting that the magnitude of exclosure‐related abandonment is site‐specific. 
Our results demonstrate that unmodelled heterogeneity can result in biased estimates potentially leading to incorrect management recommendations. The Bayesian multinomial LE model offers a flexible method of incorporating random effects into an analysis of nest failure and is robust to age‐biased nest discovery probability. This model can be generalized to other staggered‐entry, time‐to‐hazard situations.

13.
Generalized linear models are a widely used method to obtain parametric estimates for the mean function. They have been further extended to allow the relationship between the mean function and the covariates to be more flexible via generalized additive models. However, the fixed variance structure can in many cases be too restrictive. The extended quasilikelihood (EQL) framework allows for estimation of both the mean and the dispersion/variance as functions of covariates. As for other maximum likelihood methods though, EQL estimates are not resistant to outliers: we need methods to obtain robust estimates for both the mean and the dispersion function. In this article, we obtain functional estimates for the mean and the dispersion that are both robust and smooth. The performance of the proposed method is illustrated via a simulation study and some real data examples.

14.
The mixed-model factorial analysis of variance has been used in many recent studies in evolutionary quantitative genetics. Two competing formulations of the mixed-model ANOVA are commonly used, the “Scheffe” model and the “SAS” model; these models differ in both their assumptions and in the way in which variance components due to the main effect of random factors are defined. The biological meanings of the two variance component definitions have often been unappreciated, however. A full understanding of these meanings leads to the conclusion that the mixed-model ANOVA could have been used to much greater effect by many recent authors. The variance component due to the random main effect under the two-way SAS model is the covariance in true means associated with a level of the random factor (e.g., families) across levels of the fixed factor (e.g., environments). Therefore the SAS model has a natural application for estimating the genetic correlation between a character expressed in different environments and testing whether it differs from zero. The variance component due to the random main effect under the two-way Scheffe model is the variance in marginal means (i.e., means over levels of the fixed factor) among levels of the random factor. Therefore the Scheffe model has a natural application for estimating genetic variances and heritabilities in populations using a defined mixture of environments. Procedures and assumptions necessary for these applications of the models are discussed. While exact significance tests under the SAS model require balanced data and the assumptions that family effects are normally distributed with equal variances in the different environments, the model can be useful even when these conditions are not met (e.g., for providing an unbiased estimate of the across-environment genetic covariance). 
Contrary to statements in a recent paper, exact significance tests regarding the variance in marginal means as well as unbiased estimates can be readily obtained from unbalanced designs with no restrictive assumptions about the distributions or variance-covariance structure of family effects.

15.
On flexible finite polygenic models for multiple-trait evaluation
Bink MC 《Genetical research》2002,80(3):245-256
Finite polygenic models (FPM) might be an alternative to the infinitesimal model (TIM) for the genetic evaluation of pedigreed multiple-generation populations for multiple quantitative traits. I present a general flexible Bayesian method that includes the number of genes in the FPM as an additional random variable. Markov-chain Monte Carlo techniques such as Gibbs sampling and the reversible jump sampler are used for implementation. Sampling of genotypes of all genes in the FPM is done via the use of segregation indicators. A broad range of FPM models, some combined with TIM, are empirically tested for the estimation of variance components and the number of genes in the FPM. Four simulation scenarios were studied, including genetic models with 5 or 50 additive independent diallelic genes affecting the traits, and random selection or selection on one of the traits was performed. The results in this study were based on ten replicates per simulation scenario. In the case of random selection, uniform priors on additive gene effects led to posterior mean estimates of genetic variance that were positively correlated with the number of genes fitted in the FPM. In the case of trait selection, assuming normal priors on gene effects also led to genetic variance estimates for the selected trait that were negatively correlated with the number of genes in the FPM. This negative correlation was not observed for the unselected trait. Treating the number of genes in the FPM as random revealed a positive correlation between prior and posterior mean estimates of this number, but the prior hardly affected the posterior estimates of genetic variance. Posterior inferences about the number of genes should be considered to be indicative where trait selection seems to improve the power of distinguishing between TIM and FPM. 
Based on the results of this study, I suggest not replacing TIM by the FPM, but combining TIM and FPM with the number of genes treated as random, to facilitate a highly flexible and thereby robust method for variance component estimation in pedigreed populations. Further study is required to explore the full potential of these models under different genetic model assumptions.

16.
A retrospective likelihood-based approach was proposed to test and estimate the effect of haplotype on disease risk using unphased genotype data with adjustment for environmental covariates. The proposed method was also extended to handle data in which the haplotype and environmental covariates are not independent. Likelihood ratio tests were constructed to test the effects of haplotype and gene-environment interaction. The model parameters, such as the haplotype effect size, were estimated using an Expectation Conditional-Maximization (ECM) algorithm developed by Meng and Rubin (1993). Model-based variance estimates were derived using the observed information matrix. Simulation studies were conducted for three different genetic effect models, including dominant effect, recessive effect, and additive effect. The results showed that the proposed method generated unbiased parameter estimates, proper type I error, and true beta coverage probabilities. The model performed well with small or large sample sizes, as well as short or long haplotypes.

17.
Random regression models are widely used in the field of animal breeding for the genetic evaluation of daily milk yields from different test days. These models are capable of handling different environmental effects on the respective test day, and they describe the characteristics of the course of the lactation period by using suitable covariates with fixed and random regression coefficients. As the numerically expensive estimation of parameters is already part of advanced computer software, modifications of random regression models will considerably grow in importance for statistical evaluations of nutrition and behaviour experiments with animals. Random regression models belong to the large class of linear mixed models. Thus, when choosing a model, or more precisely, when selecting a suitable covariance structure of the random effects, the information criteria of Akaike and Schwarz can be used. In this study, the fitting of random regression models for a statistical analysis of a feeding experiment with dairy cows is illustrated using the SAS program package. For each of the feeding groups, lactation curves modelled by covariates with fixed regression coefficients are estimated simultaneously. With the help of the fixed regression coefficients, differences between the groups are estimated and then tested for significance. The covariance structure of the random and subject-specific effects and the serial correlation matrix are selected by using information criteria and by estimating correlations between repeated measurements. For the verification of the selected model and the alternative models, mean values and standard deviations estimated with ordinary least squares residuals are used.
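The information criteria of Akaike (AIC) and Schwarz (BIC) mentioned above can be computed from a fitted model's maximized log-likelihood. The sketch below is in Python rather than SAS, and the candidate covariance structures, log-likelihoods, parameter counts and sample size are hypothetical values chosen only to show the comparison.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: smaller is better."""
    return -2.0 * log_likelihood + 2.0 * n_params


def bic(log_likelihood, n_params, n_obs):
    """Schwarz (Bayesian) information criterion: smaller is better."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


# Hypothetical fits: name -> (maximized log-likelihood, covariance parameters).
candidates = {
    "compound symmetry": (-512.4, 2),
    "unstructured": (-505.1, 10),
}
n_obs = 120  # hypothetical number of observations

aic_values = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
bic_values = {name: bic(ll, k, n_obs) for name, (ll, k) in candidates.items()}
best_by_aic = min(aic_values, key=aic_values.get)
best_by_bic = min(bic_values, key=bic_values.get)
```

In this made-up example the richer structure fits better but not by enough to offset its extra parameters. Software packages differ in which parameters and which sample size they count, particularly under restricted maximum likelihood, so values should only be compared within one package and one estimation method.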

18.
Shared random effects joint models are becoming increasingly popular for investigating the relationship between longitudinal and time‐to‐event data. Although appealing, such complex models are computationally intensive, and quick, approximate methods may provide a reasonable alternative. In this paper, we first compare the shared random effects model with two approximate approaches: a naïve proportional hazards model with time‐dependent covariate and a two‐stage joint model, which uses plug‐in estimates of the fitted values from a longitudinal analysis as covariates in a survival model. We show that the approximate approaches should be avoided since they can severely underestimate any association between the current underlying longitudinal value and the event hazard. We present classical and Bayesian implementations of the shared random effects model and highlight the advantages of the latter for making predictions. We then apply the models described to a study of abdominal aortic aneurysms (AAA) to investigate the association between AAA diameter and the hazard of AAA rupture. Out‐of‐sample predictions of future AAA growth and hazard of rupture are derived from Bayesian posterior predictive distributions, which are easily calculated within an MCMC framework. Finally, using a multivariate survival sub‐model we show that underlying diameter rather than the rate of growth is the most important predictor of AAA rupture.

19.
The purpose of this paper is to present a procedure for obtaining approximate maximum likelihood estimates for compound binary response models. The extra binomial variation is incorporated into the model by adding random effects to the fixed effects on the probit (or logit) scale. Numerical integration techniques are used to arrive at a solution of the likelihood equations. The paper also presents an illustrative numerical example based on a large toxicological data set. The computations are carried out within the GLIM statistical package.

