首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Association Models for Clustered Data with Binary and Continuous Responses   总被引:1,自引:0,他引:1  
Summary .  We consider analysis of clustered data with mixed bivariate responses, i.e., where each member of the cluster has a binary and a continuous outcome. We propose a new bivariate random effects model that induces associations among the binary outcomes within a cluster, among the continuous outcomes within a cluster, between a binary outcome and a continuous outcome from different subjects within a cluster, as well as the direct association between the binary and continuous outcomes within the same subject. For the ease of interpretations of the regression effects, the marginal model of the binary response probability integrated over the random effects preserves the logistic form and the marginal expectation of the continuous response preserves the linear form. We implement maximum likelihood estimation of our model parameters using standard software such as PROC NLMIXED of SAS . Our simulation study demonstrates the robustness of our method with respect to the misspecification of the regression model as well as the random effects model. We illustrate our methodology by analyzing a developmental toxicity study of ethylene glycol in mice.  相似文献   

2.
Deletion diagnostics are introduced for the regression analysis of clustered binary outcomes estimated with alternating logistic regressions, an implementation of generalized estimating equations (GEE) that estimates regression coefficients in a marginal mean model and in a model for the intracluster association given by the log odds ratio. The diagnostics are developed within an estimating equations framework that recasts the estimating functions for association parameters based upon conditional residuals into equivalent functions based upon marginal residuals. Extensions of earlier work on GEE diagnostics follow directly, including computational formulae for one‐step deletion diagnostics that measure the influence of a cluster of observations on the estimated regression parameters and on the overall marginal mean or association model fit. The diagnostic formulae are evaluated with simulations studies and with an application concerning an assessment of factors associated with health maintenance visits in primary care medical practices. The application and the simulations demonstrate that the proposed cluster‐deletion diagnostics for alternating logistic regressions are good approximations of their exact fully iterated counterparts.  相似文献   

3.
Summary The generalized estimating equation (GEE) has been a popular tool for marginal regression analysis with longitudinal data, and its extension, the weighted GEE approach, can further accommodate data that are missing at random (MAR). Model selection methodologies for GEE, however, have not been systematically developed to allow for missing data. We propose the missing longitudinal information criterion (MLIC) for selection of the mean model, and the MLIC for correlation (MLICC) for selection of the correlation structure in GEE when the outcome data are subject to dropout/monotone missingness and are MAR. Our simulation results reveal that the MLIC and MLICC are effective for variable selection in the mean model and selecting the correlation structure, respectively. We also demonstrate the remarkable drawbacks of naively treating incomplete data as if they were complete and applying the existing GEE model selection method. The utility of proposed method is further illustrated by two real applications involving missing longitudinal outcome data.  相似文献   

4.
Multiple imputation (MI) is increasingly popular for handling multivariate missing data. Two general approaches are available in standard computer packages: MI based on the posterior distribution of incomplete variables under a multivariate (joint) model, and fully conditional specification (FCS), which imputes missing values using univariate conditional distributions for each incomplete variable given all the others, cycling iteratively through the univariate imputation models. In the context of longitudinal or clustered data, it is not clear whether these approaches result in consistent estimates of regression coefficient and variance component parameters when the analysis model of interest is a linear mixed effects model (LMM) that includes both random intercepts and slopes with either covariates or both covariates and outcome contain missing information. In the current paper, we compared the performance of seven different MI methods for handling missing values in longitudinal and clustered data in the context of fitting LMMs with both random intercepts and slopes. We study the theoretical compatibility between specific imputation models fitted under each of these approaches and the LMM, and also conduct simulation studies in both the longitudinal and clustered data settings. Simulations were motivated by analyses of the association between body mass index (BMI) and quality of life (QoL) in the Longitudinal Study of Australian Children (LSAC). Our findings showed that the relative performance of MI methods vary according to whether the incomplete covariate has fixed or random effects and whether there is missingnesss in the outcome variable. We showed that compatible imputation and analysis models resulted in consistent estimation of both regression parameters and variance components via simulation. We illustrate our findings with the analysis of LSAC data.  相似文献   

5.
Summary In some biomedical studies involving clustered binary responses (say, disease status), the cluster sizes can vary because some components of the cluster can be absent. When both the presence of a cluster component as well as the binary disease status of a present component are treated as responses of interest, we propose a novel two‐stage random effects logistic regression framework. For the ease of interpretation of regression effects, both the marginal probability of presence/absence of a component as well as the conditional probability of disease status of a present component, preserve the approximate logistic regression forms. We present a maximum likelihood method of estimation implementable using standard statistical software. We compare our models and the physical interpretation of regression effects with competing methods from literature. We also present a simulation study to assess the robustness of our procedure to wrong specification of the random effects distribution and to compare finite‐sample performances of estimates with existing methods. The methodology is illustrated via analyzing a study of the periodontal health status in a diabetic Gullah population.  相似文献   

6.
GEE with Gaussian estimation of the correlations when data are incomplete   总被引:4,自引:0,他引:4  
This paper considers a modification of generalized estimating equations (GEE) for handling missing binary response data. The proposed method uses Gaussian estimation of the correlation parameters, i.e., the estimating function that yields an estimate of the correlation parameters is obtained from the multivariate normal likelihood. The proposed method yields consistent estimates of the regression parameters when data are missing completely at random (MCAR). However, when data are missing at random (MAR), consistency may not hold. In a simulation study with repeated binary outcomes that are missing at random, the magnitude of the potential bias that can arise is examined. The results of the simulation study indicate that, when the working correlation matrix is correctly specified, the bias is almost negligible for the modified GEE. In the simulation study, the proposed modification of GEE is also compared to the standard GEE, multiple imputation, and weighted estimating equations approaches. Finally, the proposed method is illustrated using data from a longitudinal clinical trial comparing two therapeutic treatments, zidovudine (AZT) and didanosine (ddI), in patients with HIV.  相似文献   

7.
The intraclass correlation is commonly used with clustered data. It is often estimated based on fitting a model to hierarchical data and it leads, in turn, to several concepts such as reliability, heritability, inter‐rater agreement, etc. For data where linear models can be used, such measures can be defined as ratios of variance components. Matters are more difficult for non‐Gaussian outcomes. The focus here is on count and time‐to‐event outcomes where so‐called combined models are used, extending generalized linear mixed models, to describe the data. These models combine normal and gamma random effects to allow for both correlation due to data hierarchies as well as for overdispersion. Furthermore, because the models admit closed‐form expressions for the means, variances, higher moments, and even the joint marginal distribution, it is demonstrated that closed forms of intraclass correlations exist. The proposed methodology is illustrated using data from agricultural and livestock studies.  相似文献   

8.
9.
Within behavioural research, non‐normally distributed data with a complicated structure are common. For instance, data can represent repeated observations of quantities on the same individual. The regression analysis of such data is complicated both by the interdependency of the observations (response variables) and by their non‐normal distribution. Over the last decade, such data have been more and more frequently analysed using generalized mixed‐effect models. Some researchers invoke the heavy machinery of mixed‐effect modelling to obtain the desired population‐level (marginal) inference, which can be achieved by using simpler tools—namely by marginal models. This paper highlights marginal modelling (using generalized estimating equations [GEE]) as an alternative method. In various situations, GEE can be based on fewer assumptions and directly generate estimates (population‐level parameters) which are of immediate interest to the behavioural researcher (such as population means). Using four examples from behavioural research, we demonstrate the use, advantages, and limits of the GEE approach as implemented within the functions of the ‘geepack’ package in R.  相似文献   

10.
Approaches based on linear mixed models (LMMs) have recently gained popularity for modelling population substructure and relatedness in genome-wide association studies. In the last few years, a bewildering variety of different LMM methods/software packages have been developed, but it is not always clear how (or indeed whether) any newly-proposed method differs from previously-proposed implementations. Here we compare the performance of several LMM approaches (and software implementations, including EMMAX, GenABEL, FaST-LMM, Mendel, GEMMA and MMM) via their application to a genome-wide association study of visceral leishmaniasis in 348 Brazilian families comprising 3626 individuals (1972 genotyped). The implementations differ in precise details of methodology implemented and through various user-chosen options such as the method and number of SNPs used to estimate the kinship (relatedness) matrix. We investigate sensitivity to these choices and the success (or otherwise) of the approaches in controlling the overall genome-wide error-rate for both real and simulated phenotypes. We compare the LMM results to those obtained using traditional family-based association tests (based on transmission of alleles within pedigrees) and to alternative approaches implemented in the software packages MQLS, ROADTRIPS and MASTOR. We find strong concordance between the results from different LMM approaches, and all are successful in controlling the genome-wide error rate (except for some approaches when applied naively to longitudinal data with many repeated measures). We also find high correlation between LMMs and alternative approaches (apart from transmission-based approaches when applied to SNPs with small or non-existent effects). We conclude that LMM approaches perform well in comparison to competing approaches. Given their strong concordance, in most applications, the choice of precise LMM implementation cannot be based on power/type I error considerations but must instead be based on considerations such as speed and ease-of-use.  相似文献   

11.
Ranked set sampling (RSS) is a sampling procedure that can be considerably more efficient than simple random sampling (SRS). When the variable of interest is binary, ranking of the sample observations can be implemented using the estimated probabilities of success obtained from a logistic regression model developed for the binary variable. The main objective of this study is to use substantial data sets to investigate the application of RSS to estimation of a proportion for a population that is different from the one that provides the logistic regression. Our results indicate that precision in estimation of a population proportion is improved through the use of logistic regression to carry out the RSS ranking and, hence, the sample size required to achieve a desired precision is reduced. Further, the choice and the distribution of covariates in the logistic regression model are not overly crucial for the performance of a balanced RSS procedure.  相似文献   

12.
Generalized linear models (GLMs) are increasingly being used in daily data analysis. However, model checking for GLMs with correlated discrete response data remains difficult. In this paper, through a case study on marginal logistic regression using a real data set, we illustrate the flexibility and effectiveness of using conditional moment tests (CMTs), along with other graphical methods, to do model checking for generalized estimation equation (GEE) analyses. Although CMTs provide an array of powerful diagnostic tests for model checking, they were originally proposed in the econometrics literature and, to our knowledge, have never been applied to GEE analyses. CMTs cover many existing tests, including the (generalized) score test for an omitted covariate, as special cases. In summary, we believe that CMTs provide a class of useful model checking tools.  相似文献   

13.
Population structure and kinship are widespread confounding factors in genome-wide association studies (GWAS). It has been standard practice to include principal components of the genotypes in a regression model in order to account for population structure. More recently, the linear mixed model (LMM) has emerged as a powerful method for simultaneously accounting for population structure and kinship. The statistical theory underlying the differences in empirical performance between modeling principal components as fixed versus random effects has not been thoroughly examined. We undertake an analysis to formalize the relationship between these widely used methods and elucidate the statistical properties of each. Moreover, we introduce a new statistic, effective degrees of freedom, that serves as a metric of model complexity and a novel low rank linear mixed model (LRLMM) to learn the dimensionality of the correction for population structure and kinship, and we assess its performance through simulations. A comparison of the results of LRLMM and a standard LMM analysis applied to GWAS data from the Multi-Ethnic Study of Atherosclerosis (MESA) illustrates how our theoretical results translate into empirical properties of the mixed model. Finally, the analysis demonstrates the ability of the LRLMM to substantially boost the strength of an association for HDL cholesterol in Europeans.  相似文献   

14.
Although multicenter data are common, many prediction model studies ignore this during model development. The objective of this study is to evaluate the predictive performance of regression methods for developing clinical risk prediction models using multicenter data, and provide guidelines for practice. We compared the predictive performance of standard logistic regression, generalized estimating equations, random intercept logistic regression, and fixed effects logistic regression. First, we presented a case study on the diagnosis of ovarian cancer. Subsequently, a simulation study investigated the performance of the different models as a function of the amount of clustering, development sample size, distribution of center-specific intercepts, the presence of a center-predictor interaction, and the presence of a dependency between center effects and predictors. The results showed that when sample sizes were sufficiently large, conditional models yielded calibrated predictions, whereas marginal models yielded miscalibrated predictions. Small sample sizes led to overfitting and unreliable predictions. This miscalibration was worse with more heavily clustered data. Calibration of random intercept logistic regression was better than that of standard logistic regression even when center-specific intercepts were not normally distributed, a center-predictor interaction was present, center effects and predictors were dependent, or when the model was applied in a new center. Therefore, to make reliable predictions in a specific center, we recommend random intercept logistic regression.  相似文献   

15.
Repeatability (more precisely the common measure of repeatability, the intra‐class correlation coefficient, ICC) is an important index for quantifying the accuracy of measurements and the constancy of phenotypes. It is the proportion of phenotypic variation that can be attributed to between‐subject (or between‐group) variation. As a consequence, the non‐repeatable fraction of phenotypic variation is the sum of measurement error and phenotypic flexibility. There are several ways to estimate repeatability for Gaussian data, but there are no formal agreements on how repeatability should be calculated for non‐Gaussian data (e.g. binary, proportion and count data). In addition to point estimates, appropriate uncertainty estimates (standard errors and confidence intervals) and statistical significance for repeatability estimates are required regardless of the types of data. We review the methods for calculating repeatability and the associated statistics for Gaussian and non‐Gaussian data. For Gaussian data, we present three common approaches for estimating repeatability: correlation‐based, analysis of variance (ANOVA)‐based and linear mixed‐effects model (LMM)‐based methods, while for non‐Gaussian data, we focus on generalised linear mixed‐effects models (GLMM) that allow the estimation of repeatability on the original and on the underlying latent scale. We also address a number of methods for calculating standard errors, confidence intervals and statistical significance; the most accurate and recommended methods are parametric bootstrapping, randomisation tests and Bayesian approaches. We advocate the use of LMM‐ and GLMM‐based approaches mainly because of the ease with which confounding variables can be controlled for. Furthermore, we compare two types of repeatability (ordinary repeatability and extrapolated repeatability) in relation to narrow‐sense heritability. This review serves as a collection of guidelines and recommendations for biologists to calculate repeatability and heritability from both Gaussian and non‐Gaussian data.  相似文献   

16.
Longitudinal studies are often applied in biomedical research and clinical trials to evaluate the treatment effect. The association pattern within the subject must be considered in both sample size calculation and the analysis. One of the most important approaches to analyze such a study is the generalized estimating equation (GEE) proposed by Liang and Zeger, in which “working correlation structure” is introduced and the association pattern within the subject depends on a vector of association parameters denoted by ρ. The explicit sample size formulas for two‐group comparison in linear and logistic regression models are obtained based on the GEE method by Liu and Liang. For cluster randomized trials (CRTs), researchers proposed the optimal sample sizes at both the cluster and individual level as a function of sampling costs and the intracluster correlation coefficient (ICC). In these approaches, the optimal sample sizes depend strongly on the ICC. However, the ICC is usually unknown for CRTs and multicenter trials. To overcome this shortcoming, Van Breukelen et al. consider a range of possible ICC values identified from literature reviews and present Maximin designs (MMDs) based on relative efficiency (RE) and efficiency under budget and cost constraints. In this paper, the optimal sample size and number of repeated measurements using GEE models with an exchangeable working correlation matrix is proposed under the considerations of fixed budget, where “optimal” refers to maximum power for a given sampling budget. The equations of sample size and number of repeated measurements for a known parameter value ρ are derived and a straightforward algorithm for unknown ρ is developed. Applications in practice are discussed. We also discuss the existence of the optimal design when an AR(1) working correlation matrix is assumed. Our proposed method can be extended under the scenarios when the true and working correlation matrix are different.  相似文献   

17.
A population average regression model is proposed to assess the marginal effects of covariates on the cumulative incidence function when there is dependence across individuals within a cluster in the competing risks setting. This method extends the Fine-Gray proportional hazards model for the subdistribution to situations, where individuals within a cluster may be correlated due to unobserved shared factors. Estimators of the regression parameters in the marginal model are developed under an independence working assumption where the correlation across individuals within a cluster is completely unspecified. The estimators are consistent and asymptotically normal, and variance estimation may be achieved without specifying the form of the dependence across individuals. A simulation study evidences that the inferential procedures perform well with realistic sample sizes. The practical utility of the methods is illustrated with data from the European Bone Marrow Transplant Registry.  相似文献   

18.
Wang Z  Louis TA 《Biometrics》2004,60(4):884-891
Marginal models and conditional mixed-effects models are commonly used for clustered binary data. However, regression parameters and predictions in nonlinear mixed-effects models usually do not have a direct marginal interpretation, because the conditional functional form does not carry over to the margin. Because both marginal and conditional inferences are of interest, a unified approach is attractive. To this end, we investigate a parameterization of generalized linear mixed models with a structured random-intercept distribution that matches the conditional and marginal shapes. We model the marginal mean of response distribution and select the distribution of the random intercept to produce the match and also to model covariate-dependent random effects. We discuss the relation between this approach and some existing models and compare the approaches on two datasets.  相似文献   

19.

Background

Discovering reliable protein biomarkers is one of the most important issues in biomedical research. The ELISA is a traditional technique for accurate quantitation of well-known proteins. Recently, the multiple reaction-monitoring (MRM) mass spectrometry has been proposed for quantifying newly discovered protein and has become a popular alternative to ELISA. For the MRM data analysis, linear mixed modeling (LMM) has been used to analyze MRM data. MSstats is one of the most widely used tools for MRM data analysis that is based on the LMMs. However, LMMs often provide various significance results, depending on model specification. Sometimes it would be difficult to specify a correct LMM method for the analysis of MRM data. Here, we propose a new logistic regression-based method for Significance Analysis of Multiple Reaction Monitoring (LR-SAM).

Results

Through simulation studies, we demonstrate that LMM methods may not preserve type I error, thus yielding high false- positive errors, depending on how random effects are specified. Our simulation study also shows that the LR-SAM approach performs similarly well as LMM approaches, in most cases. However, LR-SAM performs better than the LMMs, particularly when the effects sizes of peptides from the same protein are heterogeneous. Our proposed method was applied to MRM data for identification of proteins associated with clinical responses of treatment of 115 hepatocellular carcinoma (HCC) patients with the tyrosine kinase inhibitor sorafenib. Of 124 candidate proteins, LMM approaches provided 6 results varying in significance, while LR-SAM, by contrast, yielded 18 significant results that were quite reproducibly consistent.

Conclusion

As exemplified by an application to HCC data set, LR-SAM more effectively identified proteins associated with clinical responses of treatment than LMM did.
  相似文献   

20.
Models for longitudinal data are employed in a wide range of behavioral, biomedical, psychosocial, and health‐care‐related research. One popular model for continuous response is the linear mixed‐effects model (LMM). Although simulations by recent studies show that LMM provides reliable estimates under departures from the normality assumption for complete data, the invariable occurrence of missing data in practical studies renders such robustness results less useful when applied to real study data. In this paper, we show by simulated studies that in the presence of missing data estimates of the fixed effect of LMM are biased under departures from normality. We discuss two robust alternatives, the weighted generalized estimating equations (WGEE) and the augmented WGEE (AWGEE), and compare their performances with LMM using real as well as simulated data. Our simulation results show that both WGEE and AWGEE provide valid inference for skewed non‐normal data when missing data follows the missing at random, the most popular missing data mechanism for real study data.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号