Similar literature
 20 similar articles found (search time: 31 ms)
1.
Association Models for Clustered Data with Binary and Continuous Responses   Total citations: 1 (self-citations: 0, citations by others: 1)
Summary. We consider analysis of clustered data with mixed bivariate responses, i.e., where each member of the cluster has a binary and a continuous outcome. We propose a new bivariate random effects model that induces associations among the binary outcomes within a cluster, among the continuous outcomes within a cluster, between a binary outcome and a continuous outcome from different subjects within a cluster, as well as the direct association between the binary and continuous outcomes within the same subject. For ease of interpretation of the regression effects, the marginal model of the binary response probability integrated over the random effects preserves the logistic form and the marginal expectation of the continuous response preserves the linear form. We implement maximum likelihood estimation of our model parameters using standard software such as PROC NLMIXED in SAS. Our simulation study demonstrates the robustness of our method with respect to the misspecification of the regression model as well as the random effects model. We illustrate our methodology by analyzing a developmental toxicity study of ethylene glycol in mice.

2.
An interpretation for the ROC curve and inference using GLM procedures   Total citations: 7 (self-citations: 0, citations by others: 7)
Pepe MS 《Biometrics》2000,56(2):352-359
The accuracy of a medical diagnostic test is often summarized in a receiver operating characteristic (ROC) curve. This paper puts forth an interpretation for each point on the ROC curve as being a conditional probability of a test result from a random diseased subject exceeding that from a random nondiseased subject. This interpretation gives rise to new methods for making inference about ROC curves. It is shown that inference can be achieved with binary regression techniques applied to indicator variables constructed from pairs of test results, one component of the pair being from a diseased subject and the other from a nondiseased subject. Within the generalized linear model (GLM) binary regression framework, ROC curves can be estimated, and we highlight a new semiparametric estimator. Covariate effects can also be evaluated with the GLM models. The methodology is applied to a pancreatic cancer dataset where we use the regression framework to compare two different serum biomarkers. Asymptotic distribution theory is developed to facilitate inference and to provide insight into factors influencing variability of estimated model parameters.
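
The pairwise-indicator idea in this abstract can be illustrated with a minimal sketch. It is not the paper's estimator: it only builds the indicators 1[Y_D >= Y_N] for every (diseased, nondiseased) pair and averages them, assuming a test is called positive when the result is at or above a threshold. The function names and the toy data are hypothetical.

```python
def pair_indicators(diseased, nondiseased):
    """One row per nondiseased result; each entry is 1 if a diseased result is >= it."""
    return [[1 if yd >= yn else 0 for yd in diseased] for yn in nondiseased]


def roc_points(diseased, nondiseased):
    """Empirical (FPR, TPR) points, using each nondiseased result as a threshold.

    For a fixed nondiseased result yn, the mean of its indicator row is the
    fraction of diseased subjects at or above yn, i.e. the TPR at that threshold.
    """
    points = []
    for yn, row in zip(nondiseased, pair_indicators(diseased, nondiseased)):
        fpr = sum(1 for other in nondiseased if other >= yn) / len(nondiseased)
        tpr = sum(row) / len(row)
        points.append((fpr, tpr))
    return sorted(points)


def mean_pair_indicator(diseased, nondiseased):
    """Average of all pair indicators; with no ties this equals the empirical AUC."""
    rows = pair_indicators(diseased, nondiseased)
    return sum(sum(row) for row in rows) / (len(diseased) * len(nondiseased))


if __name__ == "__main__":
    d = [3, 5, 7, 9]   # hypothetical test results, diseased subjects
    n = [1, 2, 4, 6]   # hypothetical test results, nondiseased subjects
    print(roc_points(d, n))
    print(mean_pair_indicator(d, n))
```

In a regression setting these indicators would become the binary responses of a GLM; the sketch stops at the descriptive step.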

3.
A logistic regression with random effects model is commonly applied to analyze clustered binary data, and every cluster is assumed to have a different proportion of success. However, it could be of interest to obtain the proportion of success over clusters (i.e. the marginal proportion of success). Furthermore, the degree of correlation among data of the same cluster (intraclass correlation) is also a relevant concept to assess, but when using logistic regression with random effects it is not possible to get an analytical expression of the estimators for marginal proportion and intraclass correlation. In our paper, we assess and compare approaches using different kinds of approximations: based on the logistic‐normal mixed effects model (LN), linear mixed model (LMM), and generalized estimating equations (GEE). The comparisons are completed by using two real data examples and a simulation study. The results show that the performance of the approaches strongly depends on the magnitude of the marginal proportion, the intraclass correlation, and the sample size. In general, the reliability of the approaches worsens with low marginal proportion and large intraclass correlation. The LMM and GEE approaches emerge as reliable when the sample size is large.
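
Because the marginal proportion has no closed form under a logistic-normal model, it is usually obtained numerically. The sketch below is one simple way to do that, not any of the approximations compared in the paper: it assumes a random-intercept model with intercept `beta0` and a normal random effect with standard deviation `sigma` (both hypothetical names), and integrates the cluster-specific probability over the random effect with a trapezoidal rule.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def marginal_proportion(beta0, sigma, half_width=8.0, steps=4000):
    """Approximate E[expit(beta0 + sigma * Z)] for Z ~ N(0, 1).

    Trapezoidal rule on [-half_width, half_width]; the normal density is
    negligible outside that range for the default half_width.
    """
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        z = -half_width + k * h
        end_weight = 0.5 if k in (0, steps) else 1.0
        density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += end_weight * expit(beta0 + sigma * z) * density
    return total * h


if __name__ == "__main__":
    # The cluster-specific proportion at the mean random effect is expit(-2);
    # the marginal proportion is pulled toward 0.5 as sigma grows.
    print(expit(-2.0), marginal_proportion(-2.0, 2.0))
```

With `sigma = 0` the function returns `expit(beta0)`, which is a quick sanity check on the integration.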

4.
In biomedical research, the logistic regression model is the most commonly used method for predicting the probability of a binary outcome. While many clinical researchers have expressed an enthusiasm for regression trees, this method may have limited accuracy for predicting health outcomes. We aimed to evaluate the improvement that is achieved by using ensemble‐based methods, including bootstrap aggregation (bagging) of regression trees, random forests, and boosted regression trees. We analyzed 30‐day mortality in two large cohorts of patients hospitalized with either acute myocardial infarction (N = 16,230) or congestive heart failure (N = 15,848) in two distinct eras (1999–2001 and 2004–2005). We found that both the in‐sample and out‐of‐sample prediction of ensemble methods offered substantial improvement in predicting cardiovascular mortality compared to conventional regression trees. However, conventional logistic regression models that incorporated restricted cubic smoothing splines had even better performance. We conclude that ensemble methods from the data mining and machine learning literature increase the predictive performance of regression trees, but may not lead to clear advantages over conventional logistic regression models for predicting short‐term mortality in population‐based samples of subjects with cardiovascular disease.

5.
J M Neuhaus  N P Jewell 《Biometrics》1990,46(4):977-990
Recently a great deal of attention has been given to binary regression models for clustered or correlated observations. The data of interest are of the form of a binary dependent or response variable, together with independent variables X1, ..., Xk, where sets of observations are grouped together into clusters. A number of models and methods of analysis have been suggested to study such data. Many of these are extensions in some way of the familiar logistic regression model for binary data that are not grouped (i.e., each cluster is of size 1). In general, the analyses of these clustered data models proceed by assuming that the observed clusters are a simple random sample of clusters selected from a population of clusters. In this paper, we consider the application of these procedures to the case where the clusters are selected randomly in a manner that depends on the pattern of responses in the cluster. For example, we show that ignoring the retrospective nature of the sample design, by fitting standard logistic regression models for clustered binary data, may result in misleading estimates of the effects of covariates and the precision of estimated regression coefficients.

6.
Rieger RH  Weinberg CR 《Biometrics》2002,58(2):332-341
Conditional logistic regression (CLR) is useful for analyzing clustered binary outcome data when interest lies in estimating a cluster-specific exposure parameter while treating the dependency arising from random cluster effects as a nuisance. CLR aggregates unmeasured cluster-specific factors into a cluster-specific baseline risk and is invalid in the presence of unmodeled heterogeneous covariate effects or within-cluster dependency. We propose an alternative, resampling-based method for analyzing clustered binary outcome data, within-cluster paired resampling (WCPR), which allows for within-cluster dependency not solely due to baseline heterogeneity. For example, dependency may be in part caused by heterogeneity in response to an exposure across clusters due to unmeasured cofactors. When both CLR and WCPR are valid, our simulations suggest that the two methods perform comparably. When CLR is invalid, WCPR continues to have good operating characteristics. For illustration, we apply both WCPR and CLR to a periodontal data set where there is heterogeneity in response to exposure across clusters.

7.
Large-scale surveys, such as national forest inventories and vegetation monitoring programs, usually have complex sampling designs that include geographical stratification and units organized in clusters. When models are developed using data from such programs, a key question is whether or not to utilize design information when analyzing the relationship between a response variable and a set of covariates. Standard statistical regression methods often fail to account for complex sampling designs, which may lead to severely biased estimators of model coefficients. Furthermore, ignoring that data are spatially correlated within clusters may underestimate the standard errors of regression coefficient estimates, with a risk for drawing wrong conclusions. We first review general approaches that account for complex sampling designs, e.g. methods using probability weighting, and stress the need to explore the effects of the sampling design when applying logistic regression models. We then use Monte Carlo simulation to compare the performance of the standard logistic regression model with two approaches to model correlated binary responses, i.e. cluster-specific and population-averaged logistic regression models. As an example, we analyze the occurrence of epiphytic hair lichens in the genus Bryoria; an indicator of forest ecosystem integrity. Based on data from the National Forest Inventory (NFI) for the period 1993–2014 we generated a data set on hair lichen occurrence on >100,000 Picea abies trees distributed throughout Sweden. The NFI data included ten covariates representing forest structure and climate variables potentially affecting lichen occurrence. Our analyses show the importance of taking complex sampling designs and correlated binary responses into account in logistic regression modeling to avoid the risk of obtaining notably biased parameter estimators and standard errors, and erroneous interpretations about factors affecting e.g. hair lichen occurrence. We recommend comparisons of unweighted and weighted logistic regression analyses as an essential step in development of models based on data from large-scale surveys.

8.
In many studies, the association between longitudinal measurements of a continuous response and a binary outcome is of interest. A convenient framework for this type of problem is the joint model, which is formulated to investigate the association between a binary outcome and features of longitudinal measurements through a common set of latent random effects. The joint model, which is the focus of this article, is a logistic regression model with covariates defined as the individual‐specific random effects in a non‐linear mixed‐effects model (NLMEM) for the longitudinal measurements. We discuss different estimation procedures, which include two‐stage, best linear unbiased predictors, and various numerical integration techniques. The proposed methods are illustrated using a real data set where the objective is to study the association between longitudinal hormone levels and the pregnancy outcome in a group of young women. The numerical performance of the estimating methods is also evaluated by means of simulation.

9.
Binary regression models for spatial data are commonly used in disciplines such as epidemiology and ecology. Many spatially referenced binary data sets suffer from location error, which occurs when the recorded location of an observation differs from its true location. When location error occurs, values of the covariates associated with the true spatial locations of the observations cannot be obtained. We show how a change of support (COS) can be applied to regression models for binary data to provide coefficient estimates when the true values of the covariates are unavailable, but the unknown locations of the observations are contained within nonoverlapping arbitrarily shaped polygons. The COS accommodates spatial and nonspatial covariates and preserves the convenient interpretation of methods such as logistic and probit regression. Using a simulation experiment, we compare binary regression models with a COS to naive approaches that ignore location error. We illustrate the flexibility of the COS by modeling individual-level disease risk in a population using a binary data set where the locations of the observations are unknown but contained within administrative units. Our simulation experiment and data illustration corroborate that conventional regression models for binary data that ignore location error are unreliable, but that the COS can be used to eliminate bias while preserving model choice.

10.
SUMMARY: We consider two-armed clinical trials in which the response and/or the covariates are observed on either a binary, ordinal, or continuous scale. A new general nonparametric (NP) approach for covariate adjustment is presented using the notion of a relative effect to describe treatment effects. The relative effect is defined by the probability of observing a higher response in the experimental than in the control arm. The notion is invariant under monotone transformations of the data and is therefore especially suitable for ordinal data. For a normally distributed or binary response the relative effect is the transformed effect size or the difference of response probability, respectively. An unbiased and consistent NP estimator for the relative effect is presented. Further, we suggest an NP procedure for correcting the relative effect for covariate imbalance and random covariate imbalance, yielding a consistent estimator for the adjusted relative effect. Asymptotic theory has been developed to derive test statistics and confidence intervals. The test statistic is based on the joint behavior of the estimated relative effect for the response and the covariates. It is shown that the test statistic can be used to evaluate the treatment effect in the presence of (random) covariate imbalance. Approximations for small sample sizes are considered as well. The sampling behavior of the estimator of the adjusted relative effect is examined. We also compare the probability of a type I error and the power of our approach to standard covariate adjustment methods by means of a simulation study. Finally, our approach is illustrated on three studies involving ordinal responses and covariates.
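
The relative effect described here can be estimated, without any covariate adjustment, by comparing every control response with every experimental response. The sketch below does only that. Counting ties as one half is an assumption (a common convention for ordinal data) that the abstract does not state, and the function name and data are hypothetical.

```python
import math


def relative_effect(control, experimental):
    """Estimate P(control < experimental) + 0.5 * P(control == experimental).

    Every control response is compared with every experimental response;
    tied pairs contribute one half (an assumed convention).
    """
    score = 0.0
    for x in control:
        for y in experimental:
            if y > x:
                score += 1.0
            elif y == x:
                score += 0.5
    return score / (len(control) * len(experimental))


if __name__ == "__main__":
    control = [1, 2, 3]        # hypothetical ordinal scores, control arm
    experimental = [2, 3, 4]   # hypothetical ordinal scores, experimental arm
    print(relative_effect(control, experimental))
    # A strictly increasing transformation leaves the estimate unchanged,
    # because only the ordering of the responses is used.
    print(relative_effect([math.exp(v) for v in control],
                          [math.exp(v) for v in experimental]))
```

A value of 0.5 corresponds to no tendency toward higher responses in either arm.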

11.
Shih JH  Albert PS 《Biometrics》1999,55(4):1232-1235
We propose a methodology for modeling correlated binary data measured with diagnostic error. A shared random effect is used to induce correlations in repeated true latent binary outcomes and in observed responses and to link the probability of a true positive outcome with the probability of having a diagnosis error. We evaluate the performance of our proposed approach through simulations and compare it with an ad hoc approach. The methodology is illustrated with data from a study that assessed the probability of corneal arcus in patients with familial hypercholesterolemia.

12.
Cook RJ  Brumback BB  Wigg MB  Ryan LM 《Biometrics》2001,57(3):671-680
We describe a method for assessing dose-response effects from a series of case-control and cohort studies in which the exposure information is interval censored. The interval censoring of the exposure variable is dealt with through the use of retrospective models in which the exposure is treated as a multinomial response and disease status as a binary covariate. Polychotomous logistic regression models are adopted in which the dose-response relationship between exposure and disease may be modeled in a discrete or continuous fashion. Partial conditioning is possible to eliminate some of the nuisance parameters. The methods are applied to the motivating study of the relationship between chorionic villus sampling and the occurrence of terminal transverse limb reduction.

13.
Although multicenter data are common, many prediction model studies ignore this during model development. The objective of this study is to evaluate the predictive performance of regression methods for developing clinical risk prediction models using multicenter data, and provide guidelines for practice. We compared the predictive performance of standard logistic regression, generalized estimating equations, random intercept logistic regression, and fixed effects logistic regression. First, we presented a case study on the diagnosis of ovarian cancer. Subsequently, a simulation study investigated the performance of the different models as a function of the amount of clustering, development sample size, distribution of center-specific intercepts, the presence of a center-predictor interaction, and the presence of a dependency between center effects and predictors. The results showed that when sample sizes were sufficiently large, conditional models yielded calibrated predictions, whereas marginal models yielded miscalibrated predictions. Small sample sizes led to overfitting and unreliable predictions. This miscalibration was worse with more heavily clustered data. Calibration of random intercept logistic regression was better than that of standard logistic regression even when center-specific intercepts were not normally distributed, a center-predictor interaction was present, center effects and predictors were dependent, or when the model was applied in a new center. Therefore, to make reliable predictions in a specific center, we recommend random intercept logistic regression.

14.
Logistic probability models (models linear in the log odds of the outcome event) have found extensive application in modelling of unordered categorical responses. This paper illustrates some extensions of logistic models to the modelling of probabilities of ordinal responses. The extensions arise naturally from discrete probability models for the conditional distribution of the ordinal response, as well as from linear modelling of the log odds of response. Methods of estimation and examination of fit developed for the binary logistic model extend in a straightforward manner to the ordinal models. The models and methods are illustrated in an analysis of the dependence of chronic obstructive respiratory disease prevalence on smoking and age.

15.
Huang Y  Leroux B 《Biometrics》2011,67(3):843-851
Summary. Williamson, Datta, and Satten's (2003, Biometrics 59, 36–42) cluster‐weighted generalized estimating equations (CWGEEs) are effective in adjusting for bias due to informative cluster sizes for cluster‐level covariates. We show that CWGEE may not perform well, however, for covariates that can take different values within a cluster if the numbers of observations at each covariate level are informative. On the other hand, inverse probability of treatment weighting accounts for informative treatment propensity but not for informative cluster size. Motivated by evaluating the effect of a binary exposure in the presence of such types of informativeness, we propose several weighted GEE estimators, with weights related to the size of a cluster as well as the distribution of the binary exposure within the cluster. Choice of the weights depends on the population of interest and the nature of the exposure. Through simulation studies, we demonstrate the superior performance of the new estimators compared to existing estimators such as from GEE, CWGEE, and inverse probability of treatment‐weighted GEE. We demonstrate the use of our method using an example examining covariate effects on the risk of dental caries among small children.

16.
The augmentation of categorical outcomes with underlying Gaussian variables in bivariate generalized mixed effects models has facilitated the joint modeling of continuous and binary response variables. These models typically assume that random effects and residual effects (co)variances are homogeneous across all clusters and subjects, respectively. Motivated by conflicting evidence about the association between performance outcomes in dairy production systems, we consider the situation where these (co)variance parameters may themselves be functions of systematic and/or random effects. We present a hierarchical Bayesian extension of bivariate generalized linear models whereby functions of the (co)variance matrices are specified as linear combinations of fixed and random effects following a square‐root‐free Cholesky reparameterization that ensures necessary positive semidefinite constraints. We test the proposed model by simulation and apply it to the analysis of a dairy cattle data set in which the random herd‐level and residual cow‐level effects (co)variances between a continuous production trait and binary reproduction trait are modeled as functions of fixed management effects and random cluster effects.

17.
Interpreting parameters in the logistic regression model with random effects   Total citations: 11 (self-citations: 0, citations by others: 11)
Logistic regression with random effects is used to study the relationship between explanatory variables and a binary outcome in cases with nonindependent outcomes. In this paper, we examine in detail the interpretation of both fixed effects and random effects parameters. As heterogeneity measures, the random effects parameters included in the model are not easily interpreted. We discuss different alternative measures of heterogeneity and suggest using a median odds ratio measure that is a function of the original random effects parameters. The measure allows a simple interpretation, in terms of well-known odds ratios, that greatly facilitates communication between the data analyst and the subject-matter researcher. Three examples from different subject areas, mainly taken from our own experience, serve to motivate and illustrate different aspects of parameter interpretation in these models.
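
The abstract does not spell out the median odds ratio (MOR). For a random-intercept logistic model with a normally distributed intercept of variance sigma^2, the form usually quoted is MOR = exp(sqrt(2 * sigma^2) * Phi^{-1}(0.75)), where Phi^{-1} is the standard normal quantile function. The sketch below computes that quantity under those assumptions; the function and parameter names are hypothetical.

```python
import math
from statistics import NormalDist


def median_odds_ratio(random_intercept_variance):
    """Median odds ratio for a normal random intercept with the given variance.

    The difference between two independent cluster intercepts is
    N(0, 2 * variance); the MOR is exp of the median of its absolute value.
    """
    if random_intercept_variance < 0:
        raise ValueError("variance must be non-negative")
    upper_quartile = NormalDist().inv_cdf(0.75)  # about 0.6745
    return math.exp(math.sqrt(2.0 * random_intercept_variance) * upper_quartile)


if __name__ == "__main__":
    for variance in (0.0, 0.5, 1.0):
        print(variance, median_odds_ratio(variance))
```

A variance of zero gives MOR = 1 (no between-cluster heterogeneity), and the MOR grows with the variance, so it can be read on the same scale as the odds ratios of the fixed effects.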

18.
Summary. Latent class analysis (LCA) and latent class regression (LCR) are widely used for modeling multivariate categorical outcomes in social science and biomedical studies. Standard analyses assume data of different respondents to be mutually independent, excluding application of the methods to familial and other designs in which participants are clustered. In this article, we consider multilevel latent class models, in which subpopulation mixing probabilities are treated as random effects that vary among clusters according to a common Dirichlet distribution. We apply the expectation‐maximization (EM) algorithm for model fitting by maximum likelihood (ML). This approach works well, but is computationally intensive when either the number of classes or the cluster size is large. We propose a maximum pairwise likelihood (MPL) approach via a modified EM algorithm for this case. We also show that a simple latent class analysis, combined with robust standard errors, provides another consistent, robust, but less‐efficient inferential procedure. Simulation studies suggest that the three methods work well in finite samples, and that the MPL estimates often enjoy precision comparable to that of the ML estimates. We apply our methods to the analysis of comorbid symptoms in the obsessive compulsive disorder study. Our models' random effects structure has a more straightforward interpretation than those of competing methods, and thus should usefully augment tools available for LCA of multilevel data.

19.
In this paper, we investigate K‐group comparisons on survival endpoints for observational studies. In clinical databases for observational studies, treatments for patients are chosen with probabilities that vary with their baseline characteristics. This often results in noncomparable treatment groups because of imbalance in baseline characteristics of patients among treatment groups. In order to overcome this issue, we conduct propensity analysis and match the subjects with similar propensity scores across treatment groups or compare weighted group means (or weighted survival curves for censored outcome variables) using inverse probability weighting (IPW). To this end, multinomial logistic regression has been a popular propensity analysis method to estimate the weights. We propose using a decision tree method as an alternative propensity analysis because of its simplicity and robustness. We also propose IPW rank statistics, called the Dunnett‐type test and the ANOVA‐type test, to compare 3 or more treatment groups on survival endpoints. Using simulations, we evaluate the finite sample performance of the weighted rank statistics combined with these propensity analysis methods. We demonstrate these methods with a real data example. The IPW method also allows unbiased estimation of population parameters of each treatment group. In this paper, we limit our discussions to survival outcomes, but all the methods can be easily modified for any type of outcome, such as binary or continuous variables.
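
The weighted group means mentioned in this abstract can be sketched as follows. This is a minimal illustration for uncensored outcomes only, not the paper's rank statistics: it assumes each subject's propensity (the probability of receiving the treatment actually received) has already been estimated by some model, and all names and numbers are hypothetical.

```python
from collections import defaultdict


def ipw_group_means(groups, outcomes, propensities):
    """Inverse-probability-weighted mean outcome for each treatment group.

    Each subject is weighted by 1 / propensity, where propensity is the
    estimated probability of the treatment that subject actually received.
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for group, outcome, propensity in zip(groups, outcomes, propensities):
        if not 0.0 < propensity <= 1.0:
            raise ValueError("propensities must lie in (0, 1]")
        weight = 1.0 / propensity
        weighted_sum[group] += weight * outcome
        weight_total[group] += weight
    return {group: weighted_sum[group] / weight_total[group] for group in weighted_sum}


if __name__ == "__main__":
    groups = ["A", "A", "B", "B"]            # hypothetical treatment labels
    outcomes = [1.0, 3.0, 2.0, 6.0]          # hypothetical uncensored outcomes
    propensities = [0.5, 0.25, 0.5, 0.5]     # hypothetical estimated propensities
    print(ipw_group_means(groups, outcomes, propensities))
```

Subjects who were unlikely to receive their treatment get larger weights, which is what rebalances the baseline characteristics across groups.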

20.
Logistic regression is often used to help make medical decisions with binary outcomes. Here we evaluate the use of several methods for selection of variables in logistic regression. We use a large dataset to predict the diagnosis of myocardial infarction in patients reporting to an emergency room with chest pain. Our results indicate that some of the examined methods are well suited for variable selection in logistic regression and that our model, and our myocardial infarction risk calculator, can be an additional tool to aid physicians in myocardial infarction diagnosis.
