Similar articles
Found 20 similar articles (search time: 31 ms)
1.
We have developed a new general approach for handling misclassification in discrete covariates or responses in regression models. The simulation and extrapolation (SIMEX) method, which was originally designed for handling additive covariate measurement error, is applied to the case of misclassification. The statistical model for characterizing misclassification is given by the transition matrix Pi from the true to the observed variable. We exploit the relationship between the size of misclassification and bias in estimating the parameters of interest. Assuming that Pi is known or can be estimated from validation data, we simulate data with higher misclassification and extrapolate back to the case of no misclassification. We show that our method is quite general and applicable to models with misclassified response and/or misclassified discrete regressors. In the case of a binary response with misclassification, we compare our method to the approach of Neuhaus, and to the matrix method of Morrissey and Spiegelman in the case of a misclassified binary regressor. We apply our method to a study on caries with a misclassified longitudinal response.
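The simulate-then-extrapolate idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single misclassified binary regressor in a linear model, a known transition matrix with entries `pi[i, j] = P(observed = i | true = j)`, positive real eigenvalues of `pi`, and a quadratic extrapolation function; all function names and the simulated data are hypothetical.

```python
import numpy as np

def matrix_power_real(pi, lam):
    """Fractional power pi**lam via eigendecomposition.
    Assumes pi is diagonalizable with positive real eigenvalues."""
    vals, vecs = np.linalg.eig(pi)
    powered = np.diag(vals.astype(complex) ** lam)
    return np.real(vecs @ powered @ np.linalg.inv(vecs))

def misclassify(x, pi, rng):
    """Draw a new binary variable with P(new = i | x = j) = pi[i, j]."""
    p_one = pi[1, x]  # probability of recording 1, given each current value
    return (rng.random(x.shape) < p_one).astype(int)

def naive_slope(x, y):
    """Ordinary least squares slope of y on x, ignoring misclassification."""
    return np.polyfit(x, y, 1)[0]

def mcsimex_slope(x_obs, y, pi, lambdas=(0.5, 1.0, 1.5, 2.0), n_rep=50, seed=0):
    """Return (extrapolated slope, naive slope).
    For each lam, x_obs is misclassified further with pi**lam, so the total
    misclassification is pi**(1 + lam); lam = -1 corresponds to no error."""
    rng = np.random.default_rng(seed)
    lam_grid = [0.0]
    estimates = [naive_slope(x_obs, y)]
    for lam in lambdas:
        pi_lam = matrix_power_real(pi, lam)
        sims = [naive_slope(misclassify(x_obs, pi_lam, rng), y)
                for _ in range(n_rep)]
        lam_grid.append(lam)
        estimates.append(np.mean(sims))
    # Quadratic extrapolant (an assumption), evaluated at lam = -1.
    coef = np.polyfit(lam_grid, estimates, 2)
    return np.polyval(coef, -1.0), estimates[0]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x_true = rng.integers(0, 2, 20000)
    y = 1.0 + 2.0 * x_true + rng.normal(size=x_true.size)
    pi = np.array([[0.9, 0.1], [0.1, 0.9]])  # hypothetical error rates
    x_obs = misclassify(x_true, pi, rng)
    print(mcsimex_slope(x_obs, y, pi))
```

Because the quadratic extrapolant is only an approximation to the true bias curve, the extrapolated slope should be read as a bias-reduced estimate rather than an exactly unbiased one.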

2.
We consider a Bayesian analysis for modeling a binary response that is subject to misclassification. Additionally, an explanatory variable is assumed to be unobservable, but measurements are available on its surrogate. A binary regression model is developed to incorporate the measurement error in the covariate as well as the misclassification in the response. Unlike existing methods, no model parameters need be assumed known. Markov chain Monte Carlo methods are utilized to perform the necessary computations. The methods developed are illustrated using atomic bomb survival data. A simulation experiment explores advantages of the approach.

3.
Holcroft CA, Spiegelman D. Biometrics 1999, 55(4): 1193-1201
We compared several validation study designs for estimating the odds ratio of disease with misclassified exposure. We assumed that the outcome and misclassified binary covariate are available and that the error-free binary covariate is measured in a subsample, the validation sample. We considered designs in which the total size of the validation sample is fixed and the probability of selection into the validation sample may depend on outcome and misclassified covariate values. Design comparisons were conducted for rare and common disease scenarios, where the optimal design is the one that minimizes the variance of the maximum likelihood estimator of the true log odds ratio relating the outcome to the exposure of interest. Misclassification rates were assumed to be independent of the outcome. We used a sensitivity analysis to assess the effect of misspecifying the misclassification rates. Under the scenarios considered, our results suggested that a balanced design, which allocates equal numbers of validation subjects into each of the four outcome/mismeasured covariate categories, is preferable for its simplicity and good performance. A user-friendly Fortran program is available from the second author, which calculates the optimal sampling fractions for all designs considered and the efficiencies of these designs relative to the optimal hybrid design for any scenario of interest.

4.
Multiple imputation (MI) is used to handle missing at random (MAR) data. Despite warnings from statisticians, continuous variables are often recoded into binary variables. With MI it is important that the imputation and analysis models are compatible; variables should be imputed in the same form they appear in the analysis model. With an encoded binary variable more accurate imputations may be obtained by imputing the underlying continuous variable. We conducted a simulation study to explore how best to impute a binary variable that was created from an underlying continuous variable. We generated a completely observed continuous outcome associated with an incomplete binary covariate that is a categorized version of an underlying continuous covariate, and an auxiliary variable associated with the underlying continuous covariate. We simulated data with several sample sizes, and set 25% and 50% of data in the covariate to MAR dependent on the outcome and the auxiliary variable. We compared the performance of five different imputation methods: (a) imputation of the binary variable using logistic regression; (b) imputation of the continuous variable using linear regression, then categorizing into the binary variable; (c, d) imputation of both the continuous and binary variables using fully conditional specification (FCS) and multivariate normal imputation; (e) substantive-model compatible (SMC) FCS. Bias and standard errors were large when the continuous variable only was imputed. The other methods performed adequately. Imputation of both the binary and continuous variables using FCS often encountered mathematical difficulties. We recommend the SMC-FCS method as it performed best in our simulation studies.

5.
Zucker DM, Spiegelman D. Biometrics 2004, 60(2): 324-334
We consider the Cox proportional hazards model with discrete-valued covariates subject to misclassification. We present a simple estimator of the regression parameter vector for this model. The estimator is based on a weighted least squares analysis of weighted-averaged transformed Kaplan-Meier curves for the different possible configurations of the observed covariate vector. Optimal weighting of the transformed Kaplan-Meier curves is described. The method is designed for the case in which the misclassification rates are known or are estimated from an external validation study. A hybrid estimator for situations with an internal validation study is also described. When there is no misclassification, the regression coefficient vector is small in magnitude, and the censoring distribution does not depend on the covariates, our estimator has the same asymptotic covariance matrix as the Cox partial likelihood estimator. We present results of a finite-sample simulation study under Weibull survival in the setting of a single binary covariate with known misclassification rates. In this simulation study, our estimator performed as well as or, in a few cases, better than the full Weibull maximum likelihood estimator. We illustrate the method on data from a study of the relationship between trans-unsaturated dietary fat consumption and cardiovascular disease incidence.

6.
Gustafson P, Le Nhu D. Biometrics 2002, 58(4): 878-887
It is well known that imprecision in the measurement of predictor variables typically leads to bias in estimated regression coefficients. We compare the bias induced by measurement error in a continuous predictor with that induced by misclassification of a binary predictor in the contexts of linear and logistic regression. To make the comparison fair, we consider misclassification probabilities for a binary predictor that correspond to dichotomizing an imprecise continuous predictor in lieu of its precise counterpart. On this basis, nondifferential binary misclassification is seen to yield more bias than nondifferential continuous measurement error. However, it is known that differential misclassification results if a binary predictor is actually formed by dichotomizing a continuous predictor subject to nondifferential measurement error. When the postulated model linking the response and precise continuous predictor is correct, this differential misclassification is found to yield less bias than continuous measurement error, in contrast with nondifferential misclassification, i.e., dichotomization reduces the bias due to mismeasurement. This finding, however, is sensitive to the form of the underlying relationship between the response and the continuous predictor. In particular, we give a scenario where dichotomization involves a trade-off between model fit and misclassification bias. We also examine how the bias depends on the choice of threshold in the dichotomization process and on the correlation between the imprecise predictor and a second precise predictor.

7.
Within the pattern-mixture modeling framework for informative dropout, conditional linear models (CLMs) are a useful approach to deal with dropout that can occur at any point in continuous time (not just at observation times). However, in contrast with selection models, inferences about marginal covariate effects in CLMs are not readily available if nonidentity links are used in the mean structures. In this article, we propose a CLM for long series of longitudinal binary data with marginal covariate effects directly specified. The association between the binary responses and the dropout time is taken into account by modeling the conditional mean of the binary response as well as the dependence between the binary responses given the dropout time. Specifically, parameters in both the conditional mean and dependence models are assumed to be linear or quadratic functions of the dropout time; and the continuous dropout time distribution is left completely unspecified. Inference is fully Bayesian. We illustrate the proposed model using data from a longitudinal study of depression in HIV-infected women, where the strategy of sensitivity analysis based on the extrapolation method is also demonstrated.

8.
This paper develops a model for repeated binary regression when a covariate is measured with error. The model allows for estimating the effect of the true value of the covariate on a repeated binary response. The choice of a probit link for the effect of the error-free covariate, coupled with normal measurement error for the error-free covariate, results in a probit model after integrating over the measurement error distribution. We propose a two-stage estimation procedure where, in the first stage, a linear mixed model is used to fit the repeated covariate. In the second stage, a model for the correlated binary responses conditional on the linear mixed model estimates is fit to the repeated binary data using generalized estimating equations. The approach is demonstrated using nutrient safety data from the Diet Intervention of School Age Children (DISC) study.

9.
Frangakis CE, Baker SG. Biometrics 2001, 57(3): 899-908
For studies with treatment noncompliance, analyses have been developed recently to better estimate treatment efficacy. However, the advantage and cost of measuring compliance data have implications on the study design that have not been as systematically explored. In order to better estimate treatment efficacy at lower cost, we propose a new class of compliance subsampling (CSS) designs where, after subjects are assigned treatment, compliance behavior is measured for only subgroups of subjects. The sizes of the subsamples are allowed to relate to the treatment assignment, the assignment probability, the total sample size, the anticipated distributions of outcome and compliance, and the cost parameters of the study. The CSS design methods relate to prior work (i) on two-phase designs in which a covariate is subsampled and (ii) on causal inference because the subsampled postrandomization compliance behavior is not the true covariate of interest. For each CSS design, we develop efficient estimation of treatment efficacy under binary outcome and all-or-none observed compliance. Then we derive a minimal cost CSS design that achieves a required precision for estimating treatment efficacy. We compare the properties of the CSS design to those of conventional protocols in a study of patient choices for medical care at the end of life.

10.
We consider longitudinal studies in which the outcome observed over time is binary and the covariates of interest are categorical. With no missing responses or covariates, one specifies a multinomial model for the responses given the covariates and uses maximum likelihood to estimate the parameters. Unfortunately, incomplete data in the responses and covariates are a common occurrence in longitudinal studies. Here we assume the missing data are missing at random (Rubin, 1976, Biometrika 63, 581-592). Since all of the missing data (responses and covariates) are categorical, a useful technique for obtaining maximum likelihood parameter estimates is the EM algorithm by the method of weights proposed in Ibrahim (1990, Journal of the American Statistical Association 85, 765-769). In using the EM algorithm with missing responses and covariates, one specifies the joint distribution of the responses and covariates. Here we consider the parameters of the covariate distribution as a nuisance. In data sets where the percentage of missing data is high, the estimates of the nuisance parameters can lead to highly unstable estimates of the parameters of interest. We propose a conditional model for the covariate distribution that has several modeling advantages for the EM algorithm and provides a reduction in the number of nuisance parameters, thus providing more stable estimates in finite samples.

11.
Recurrent events data are common in experimental and observational studies. It is often of interest to estimate the effect of an intervention on the incidence rate of the recurrent events. The incidence rate difference is a useful measure of intervention effect. A weighted least squares estimator of the incidence rate difference for recurrent events was recently proposed for an additive rate model in which both the baseline incidence rate and the covariate effects were constant over time. In this article, we relax this model assumption and examine the properties of the estimator under additive and multiplicative rate models in which the baseline incidence rate and covariate effects may vary over time. We show analytically and numerically that the estimator gives an appropriate summary measure of the time‐varying covariate effects. In particular, when the underlying covariate effects are additive and time‐varying, the estimator consistently estimates the weighted average of the covariate effects over time. When the underlying covariate effects are multiplicative and time‐varying, and if there is only one binary covariate indicating the intervention status, the estimator consistently estimates the weighted average of the underlying incidence rate difference between the intervention and control groups over time. We illustrate the method with data from a randomized vaccine trial.

12.
Statistical procedures and methodology for assessment of interventions or treatments based on medical data often involve complexities due to incompleteness of the available data as a result of dropout or the inability to complete follow-up until the endpoint of interest. In this article we propose a nonparametric regression model based on censored data for investigating the simultaneous effects of two or more factors. Specifically, we will assess the effect of a treatment (dose) and a covariate (e.g., age categories) on the mean survival time of subjects assigned to combinations of the levels of these factors. The proposed method allows for varying levels of censorship in the outcome among different groups of subjects at different levels of the independent variables (factors). We derive the asymptotic distribution of the estimators of the parameters in our model, which then allows for statistical inference. Finally, through a simulation study we assess the effect of the censoring rates on the standard error of these types of estimators. (© 2004 WILEY‐VCH Verlag GmbH & Co. KGaA, Weinheim)

13.
Spatial models for disease mapping should ideally account for covariates measured both at individual and area levels. The newly available “indiCAR” model fits the popular conditional autoregressive (CAR) model by accommodating both individual and group level covariates while adjusting for spatial correlation in the disease rates. This algorithm has been shown to be effective but assumes log‐linear associations between individual level covariates and outcome. In many studies, the relationship between individual level covariates and the outcome may be non‐log‐linear, and methods to track such nonlinearity between individual level covariate and outcome in spatial regression modeling are not well developed. In this paper, we propose a new algorithm, smooth‐indiCAR, to fit an extension to the popular conditional autoregressive model that can accommodate both linear and nonlinear individual level covariate effects while adjusting for group level covariates and spatial correlation in the disease rates. In this formulation, the effect of a continuous individual level covariate is accommodated via penalized splines. We describe a two‐step estimation procedure to obtain reliable estimates of individual and group level covariate effects where both individual and group level covariate effects are estimated separately. This distributed computing framework enhances its application in the Big Data domain with a large number of individual/group level covariates. We evaluate the performance of smooth‐indiCAR through simulation. Our results indicate that the smooth‐indiCAR method provides reliable estimates of all regression and random effect parameters. We illustrate our proposed methodology with an analysis of data on neutropenia admissions in New South Wales (NSW), Australia.

14.
Tian L, Lagakos S. Biometrics 2006, 62(3): 821-828
We develop methods for assessing the association between a binary time-dependent covariate process and a failure time endpoint when the former is observed only at a single time point and the latter is right censored, and when the observations are subject to truncation and competing causes of failure. Using a proportional hazards model for the effect of the covariate process on the failure time of interest, we develop an approach utilizing the EM algorithm and profile likelihood for estimating the relative risk parameter and cause-specific hazards for failure. The methods are extended to account for other covariates that can influence the time-dependent covariate process and cause-specific risks of failure. We illustrate the methods with data from a recent study on the association between loss of hepatitis B e antigen and the development of hepatocellular carcinoma in a population of chronic carriers of hepatitis B.

15.
Summary: We introduce a correction for covariate measurement error in nonparametric regression applied to longitudinal binary data arising from a study on human sleep. The data have been surveyed to investigate the association of some hormonal levels and the probability of being asleep. The hormonal effect is modeled flexibly while we account for the error‐prone measurement of its concentration in the blood and the longitudinal character of the data. We present a fully Bayesian treatment utilizing Markov chain Monte Carlo inference techniques, and also introduce block updating to improve sampling and computational performance in the binary case. Our model is partly inspired by the relevance vector machine with radial basis functions, where usually very few basis functions are automatically selected for fitting the data. In the proposed approach, we implement such data‐driven complexity regulation by adopting the idea of Bayesian model averaging. Besides the general theory and the detailed sampling scheme, we also provide a simulation study for the Gaussian and the binary cases by comparing our method to the naive analysis ignoring measurement error. The results demonstrate a clear gain when using the proposed correction method, particularly for the Gaussian case with medium and large measurement error variances, even if the covariate model is misspecified.

16.
Misclassification in binary outcomes can severely bias effect estimates of regression models when the models are naively applied to error‐prone data. Here, we discuss response misclassification in studies on the special class of bilateral diseases. Such diseases can affect neither, one, or both entities of a paired organ, for example, the eyes or ears. If measurements are available on both organ entities, disease occurrence in a person is often defined as disease occurrence in at least one entity. In this setting, there are two reasons for response misclassification: (a) ignorance of missing disease assessment in one of the two entities and (b) error‐prone disease assessment in the single entities. We investigate the consequences of ignoring both types of response misclassification and present an approach to adjust the bias from misclassification by optimizing an adequate likelihood function. The inherent modelling assumptions and problems in case of entity‐specific misclassification are discussed. This work was motivated by studies on age‐related macular degeneration (AMD), a disease that can occur separately in each eye of a person. We illustrate and discuss the proposed analysis approach based on real‐world data of a study on AMD and simulated data.

17.
Though a variety of reasons are often articulated for adjusting analyses for covariates, these reasons often fall into one of two general objectives, specifically to increase precision or to decrease bias. In practice, one does not generally choose between these objectives, because the methods that address one tend to address the other, as well. Because of this, no distinction is made in the methods used to correct for a baseline imbalance with respect to a prognostic covariate versus to ensure a fair comparison across treatment groups by making the comparisons within the levels of a prognostic covariate. Yet the literal translation of these two uses of covariate adjustment will lead to two distinct adjustment methods. We illustrate this divergence in the simplest case of a single binary covariate, a binary outcome, and two treatments, and we note that it is possible to combine the two approaches to derive yet a third approach. Each of these approaches is nonparametric and exact, and so it is the precise reason for adjusting that should dictate which would be used in any given situation.

18.
Summary: We consider two-armed clinical trials in which the response and/or the covariates are observed on either a binary, ordinal, or continuous scale. A new general nonparametric (NP) approach for covariate adjustment is presented using the notion of a relative effect to describe treatment effects. The relative effect is defined by the probability of observing a higher response in the experimental than in the control arm. The notion is invariant under monotone transformations of the data and is therefore especially suitable for ordinal data. For a normal or binary distributed response the relative effect is the transformed effect size or the difference of response probability, respectively. An unbiased and consistent NP estimator for the relative effect is presented. Further, we suggest a NP procedure for correcting the relative effect for covariate imbalance and random covariate imbalance, yielding a consistent estimator for the adjusted relative effect. Asymptotic theory has been developed to derive test statistics and confidence intervals. The test statistic is based on the joint behavior of the estimated relative effect for the response and the covariates. It is shown that the test statistic can be used to evaluate the treatment effect in the presence of (random) covariate imbalance. Approximations for small sample sizes are considered as well. The sampling behavior of the estimator of the adjusted relative effect is examined. We also compare the probability of a type I error and the power of our approach to standard covariate adjustment methods by means of a simulation study. Finally, our approach is illustrated on three studies involving ordinal responses and covariates.
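The relative effect described in this abstract can be sketched in its simplest, unadjusted form. The snippet below is an illustration only, not the covariate-adjusted estimator the abstract proposes. It assumes the usual convention of counting ties with weight 1/2, which the abstract does not spell out; the function name and the sample data are hypothetical.

```python
import numpy as np

def relative_effect(control, experimental):
    """Unadjusted estimate of P(C < E) + 0.5 * P(C = E), averaged over
    all (control, experimental) pairs. Ties count one half (assumed)."""
    c = np.asarray(control)[:, None]       # column vector of control responses
    e = np.asarray(experimental)[None, :]  # row vector of experimental responses
    return float(np.mean((c < e) + 0.5 * (c == e)))

if __name__ == "__main__":
    control = [1, 2, 3]        # hypothetical ordinal scores
    experimental = [2, 3, 4]
    p = relative_effect(control, experimental)
    # Only the ordering of the values matters, so a monotone transformation
    # such as exp() leaves the estimate unchanged.
    p_transformed = relative_effect(np.exp(control), np.exp(experimental))
    print(p, p_transformed)
```

A value of 0.5 indicates no tendency toward higher responses in either arm; values above 0.5 favor the experimental arm.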

19.
Misclassification of exposure variables is a common problem in epidemiologic studies. This paper compares the matrix method (Barron, 1977, Biometrics 33, 414-418; Greenland, 1988a, Statistics in Medicine 7, 745-757) and the inverse matrix method (Marshall, 1990, Journal of Clinical Epidemiology 43, 941-947) to the maximum likelihood estimator (MLE) that corrects the odds ratio for bias due to a misclassified binary covariate. Under the assumption of differential misclassification, the inverse matrix method is always more efficient than the matrix method; however, the efficiency depends strongly on the values of the sensitivity, specificity, baseline probability of exposure, the odds ratio, case-control ratio, and validation sampling fraction. In a study on sudden infant death syndrome (SIDS), an estimate of the asymptotic relative efficiency (ARE) of the inverse matrix estimate was 0.99, while the matrix method's ARE was 0.19. Under nondifferential misclassification, neither the matrix nor the inverse matrix estimator is uniformly more efficient than the other; the efficiencies again depend on the underlying parameters. In the SIDS data, the MLE was more efficient than the matrix method (ARE = 0.39). In a study investigating the effect of vitamin A intake on the incidence of breast cancer, the MLE was more efficient than the matrix method (ARE = 0.75).
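For readers unfamiliar with the matrix method mentioned above, the following sketch shows one common textbook form of it for a case-control 2x2 table. It assumes nondifferential misclassification with sensitivity and specificity treated as known constants; in the paper's setting these would be estimated from validation data, and the efficiency comparisons are not reproduced here. Function names and counts are hypothetical.

```python
def corrected_exposed_count(observed_exposed, total, sensitivity, specificity):
    """Invert E[observed] = se * true + (1 - sp) * (total - true) for 'true'."""
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("requires sensitivity + specificity > 1")
    return (observed_exposed - (1.0 - specificity) * total) / denom

def matrix_method_odds_ratio(cases_exposed, n_cases, controls_exposed,
                             n_controls, sensitivity, specificity):
    """Odds ratio computed from the misclassification-corrected counts."""
    a = corrected_exposed_count(cases_exposed, n_cases, sensitivity, specificity)
    b = corrected_exposed_count(controls_exposed, n_controls,
                                sensitivity, specificity)
    return (a * (n_controls - b)) / (b * (n_cases - a))

if __name__ == "__main__":
    # Hypothetical observed table: 480 of 1000 cases and 340 of 1000 controls
    # classified as exposed, with sensitivity 0.9 and specificity 0.8.
    naive_or = (480 * (1000 - 340)) / (340 * (1000 - 480))
    corrected_or = matrix_method_odds_ratio(480, 1000, 340, 1000, 0.9, 0.8)
    print(naive_or, corrected_or)
```

With small samples the corrected counts can fall outside the range from 0 to the group total, in which case this simple inversion is not usable as written.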

20.
The MESA (Multi-Ethnic Study of Atherosclerosis) is an ongoing study of the prevalence, risk factors, and progression of subclinical cardiovascular disease in a multi-ethnic cohort. It provides a valuable opportunity to examine the development and progression of CAC (coronary artery calcium), which is an important risk factor for the development of coronary heart disease. In MESA, about half of the CAC scores are zero and the rest are continuously distributed. Such data have been referred to as “zero-inflated data” and may be described using two-part models. Existing two-part model studies have limitations in that they usually consider parametric models only, make the assumption of known forms of the covariate effects, and focus only on the estimation property of the models. In this article, we investigate statistical modeling of CAC in MESA. Building on existing studies, we focus on two-part models. We investigate both parametric and semiparametric, and both proportional and nonproportional models. For various models, we study their estimation as well as prediction properties. We show that, to fully describe the relationship between covariates and CAC development, the semiparametric model with nonproportional covariate effects is needed. In contrast, for the purpose of prediction, the parametric model with proportional covariate effects is sufficient. This study provides a statistical basis for describing the behaviors of CAC and insights into its biological mechanisms.
