Similar Articles
20 similar articles found (search time: 31 ms)
1.
We investigate methods for regression analysis when covariates are measured with errors. In a subset of the whole cohort, a surrogate variable is available for the true unobserved exposure variable. The surrogate variable satisfies the classical measurement error model, but it may not have repeated measurements. In addition to the surrogate variables that are available among the subjects in the calibration sample, we assume that there is an instrumental variable (IV) that is available for all study subjects. An IV is correlated with the unobserved true exposure variable and hence can be useful in the estimation of the regression coefficients. We propose a robust best linear estimator that uses all the available data, which is the most efficient among a class of consistent estimators. The proposed estimator is shown to be consistent and asymptotically normal under very weak distributional assumptions. For Poisson or linear regression, the proposed estimator is consistent even if the measurement error from the surrogate or IV is heteroscedastic. Finite-sample performance of the proposed estimator is examined and compared with other estimators via intensive simulation studies. The proposed method and other methods are applied to a bladder cancer case-control study.

2.
We propose a semiparametric Bayesian method for handling measurement error in nutritional epidemiological data. Our goal is to estimate nonparametrically the form of association between a disease and exposure variable while the true values of the exposure are never observed. Motivated by nutritional epidemiological data, we consider the setting where a surrogate covariate is recorded in the primary data, and a calibration data set contains information on the surrogate variable and repeated measurements of an unbiased instrumental variable of the true exposure. We develop a flexible Bayesian method where not only is the relationship between the disease and exposure variable treated semiparametrically, but also the relationship between the surrogate and the true exposure is modeled semiparametrically. The two nonparametric functions are modeled simultaneously via B-splines. In addition, we model the distribution of the exposure variable as a Dirichlet process mixture of normal distributions, thus making its modeling essentially nonparametric and placing this work into the context of functional measurement error modeling. We apply our method to the NIH-AARP Diet and Health Study and examine its performance in a simulation study.

3.
We introduce a new method, moment reconstruction, of correcting for measurement error in covariates in regression models. The central idea is similar to regression calibration in that the values of the covariates that are measured with error are replaced by "adjusted" values. In regression calibration the adjusted value is the expectation of the true value conditional on the measured value. In moment reconstruction the adjusted value is the variance-preserving empirical Bayes estimate of the true value conditional on the outcome variable. The adjusted values thereby have the same first two moments and the same covariance with the outcome variable as the unobserved "true" covariate values. We show that moment reconstruction is equivalent to regression calibration in the case of linear regression, but leads to different results for logistic regression. For case-control studies with logistic regression and covariates that are normally distributed within cases and controls, we show that the resulting estimates of the regression coefficients are consistent. In simulations we demonstrate that for logistic regression, moment reconstruction carries less bias than regression calibration, and for case-control studies is superior in mean-square error to the standard regression calibration approach. Finally, we give an example of the use of moment reconstruction in linear discriminant analysis and a nonstandard problem where we wish to adjust a classification tree for measurement error in the explanatory variables.
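As a concrete illustration of the adjustment described in this abstract, here is a minimal pure-Python sketch of moment reconstruction for a scalar covariate and a binary outcome. It assumes the measurement-error standard deviation is known; the simulated group means, sample sizes, and seed are made up for the example and are not from the paper.

```python
import math
import random
import statistics

random.seed(1)
SIGMA_U = 0.5  # measurement-error SD, assumed known for this sketch

# Simulated case-control data: true exposure X is normal within each
# outcome group, and the observed surrogate is W = X + U (classical error).
pairs = []
for y in (0, 1):
    for _ in range(5000):
        x = random.gauss(0.5 * y, 1.0)
        pairs.append((y, x + random.gauss(0.0, SIGMA_U)))

def moment_reconstruct(pairs, sigma_u):
    """Variance-preserving adjustment of W given the outcome Y (scalar case):
    X_mr = E(W|Y) + G(Y) * (W - E(W|Y)), with the shrinkage G(Y) chosen so
    the adjusted values have conditional variance Var(W|Y) - sigma_u**2,
    matching the first two conditional moments of the unobserved X."""
    groups = {}
    for y, w in pairs:
        groups.setdefault(y, []).append(w)
    shrink = {}
    for y, ws in groups.items():
        m = statistics.fmean(ws)
        v = statistics.variance(ws)
        shrink[y] = (m, math.sqrt(max(v - sigma_u ** 2, 0.0) / v))
    return [(y, shrink[y][0] + shrink[y][1] * (w - shrink[y][0]))
            for y, w in pairs]

adjusted = moment_reconstruct(pairs, SIGMA_U)
```

By construction the adjusted values keep each group's mean and shed exactly the error variance, which is what makes the subsequent regression on them consistent in the settings the abstract describes.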

4.
Schafer DW. Biometrics 2001, 57(1): 53-61.
This paper presents an EM algorithm for semiparametric likelihood analysis of linear, generalized linear, and nonlinear regression models with measurement errors in explanatory variables. A structural model is used in which probability distributions are specified for (a) the response and (b) the measurement error. A distribution is also assumed for the true explanatory variable but is left unspecified and is estimated by nonparametric maximum likelihood. For various types of extra information about the measurement error distribution, the proposed algorithm makes use of available routines that would be appropriate for likelihood analysis of (a) and (b) if the true x were available. Simulations suggest that the semiparametric maximum likelihood estimator retains a high degree of efficiency relative to the structural maximum likelihood estimator based on correct distributional assumptions and can outperform maximum likelihood based on an incorrect distributional assumption. The approach is illustrated on three examples with a variety of structures and types of extra information about the measurement error distribution.

5.
Wang CY, Wang N, Wang S. Biometrics 2000, 56(2): 487-495.
We consider regression analysis when covariate variables are the underlying regression coefficients of another linear mixed model. A naive approach is to use each subject's repeated measurements, which are assumed to follow a linear mixed model, and obtain subject-specific estimated coefficients to replace the covariate variables. However, directly replacing the unobserved covariates in the primary regression by these estimated coefficients may result in a significantly biased estimator. The aforementioned problem can be evaluated as a generalization of the classical additive error model where repeated measures are considered as replicates. To correct for these biases, we investigate a pseudo-expected estimating equation (EEE) estimator, a regression calibration (RC) estimator, and a refined version of the RC estimator. For linear regression, the first two estimators are identical under certain conditions. However, when the primary regression model is a nonlinear model, the RC estimator is usually biased. We thus consider a refined regression calibration estimator whose performance is close to that of the pseudo-EEE estimator but does not require numerical integration. The RC estimator is also extended to the proportional hazards regression model. In addition to the distribution theory, we evaluate the methods through simulation studies. The methods are applied to analyze a real dataset from a child growth study.

6.
Sugar EA, Wang CY, Prentice RL. Biometrics 2007, 63(1): 143-151.
Regression calibration, refined regression calibration, and conditional scores estimation procedures are extended to a measurement model that is motivated by nutritional and physical activity epidemiology. Biomarker data, available on a small subset of a study cohort for reasons of cost, are assumed to adhere to a classical measurement error model, while corresponding self-report nutrient consumption or activity-related energy expenditure data are available for the entire cohort. The self-report assessment measurement model includes a person-specific random effect, the mean and variance of which may depend on individual characteristics such as body mass index or ethnicity. Logistic regression is used to relate the disease odds ratio to the actual, but unmeasured, dietary or physical activity exposure. Simulation studies are presented to evaluate and contrast the three estimation procedures, and to provide insight into preferred biomarker subsample size under selected cohort study configurations.

7.
Venkatraman ES, Begg CB. Biometrics 1999, 55(4): 1171-1176.
A nonparametric test is derived for comparing treatments with respect to the final endpoint in clinical trials in which the final endpoint has been observed for a random subset of patients, but results are available for a surrogate endpoint for a larger sample of patients. The test is an adaptation of the Wilcoxon-Mann-Whitney two-sample test, with an adjustment that involves a comparison of the ranks of the surrogate endpoints between patients with and without final endpoints. The validity of the test depends on the assumption that the patients with final endpoints represent a random sample of the patients registered in the study. This assumption is viable in trials in which the final endpoint is evaluated at a "landmark" timepoint in the patients' natural history. A small sample simulation study demonstrates that the test has a size that is close to the nominal value for all configurations evaluated. When compared with the conventional test based only on the final endpoints, the new test delivers substantial increases in power only when the surrogate endpoint is highly correlated with the true endpoint. Our research indicates that, in the absence of modeling assumptions, auxiliary information derived from surrogate endpoints can provide significant additional information only under special circumstances.
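The adaptation described above builds on the standard Wilcoxon-Mann-Whitney statistic. As background, a minimal sketch of the unadjusted two-sample statistic (counting pairs, with the usual half-credit for ties) is below; the toy outcome values are invented for illustration.

```python
def mann_whitney_u(xs, ys):
    """Mann-Whitney U statistic for samples xs and ys: the number of
    (x, y) pairs with x > y, counting ties as one half."""
    u = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

# Under no treatment difference, E[U] = len(xs) * len(ys) / 2.
treated = [5.1, 6.3, 7.0, 4.8]
control = [4.2, 5.0, 3.9]
print(mann_whitney_u(treated, control))  # prints 11.0
```

The paper's test then adjusts this statistic using the ranks of the surrogate endpoints among patients with and without observed final endpoints; that adjustment is not shown here.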

8.
Larsen K. Biometrics 2004, 60(1): 85-92.
Multiple categorical variables are commonly used in medical and epidemiological research to measure specific aspects of human health and functioning. To analyze such data, models have been developed considering these categorical variables as imperfect indicators of an individual's "true" status of health or functioning. In this article, the latent class regression model is used to model the relationship between covariates, a latent class variable (the unobserved status of health or functioning), and the observed indicators (e.g., variables from a questionnaire). The Cox model is extended to encompass a latent class variable as predictor of time-to-event, while using information about latent class membership available from multiple categorical indicators. The expectation-maximization (EM) algorithm is employed to obtain maximum likelihood estimates, and standard errors are calculated based on the profile likelihood, treating the nonparametric baseline hazard as a nuisance parameter. A sampling-based method for model checking is proposed. It allows for graphical investigation of the assumption of proportional hazards across latent classes. It may also be used for checking other model assumptions, such as no additional effect of the observed indicators given latent class. The usefulness of the model framework and the proposed techniques are illustrated in an analysis of data from the Women's Health and Aging Study concerning the effect of severe mobility disability on time-to-death for elderly women.

9.
An important indicator of long-term recovery after valve replacement surgery is the postoperative valve gradient. This information is available only for patients who received catheterization or echocardiography postoperatively. It is plausible that sicker patients are more inclined to undergo these postoperative procedures and that their valve gradients tend to be higher. In this situation, ignoring the missing values and using the sample mean of the available information as an estimate for the whole study population leads to overestimation. A regression estimator is a reasonable choice for eliminating this bias if independent (explanatory) variables closely associated with both residual valve gradient and the nonresponse mechanism can be identified. Using a series of patients receiving St. Jude Medical prosthetic valves, we found that valve area index can be used as an independent variable in the regression estimator. Two departures from the standard assumptions of linear regression, a heteroscedastic trend in the error term and outliers, were found in the data set. Iteratively reweighted least squares (IRLS) was adopted to handle the heteroscedasticity. An influence function approach was used to evaluate the sensitivity of the regression estimator to outliers. Under an equal response rate mechanism, IRLS not only solves the problem of heteroscedasticity but is also less sensitive to outliers.
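The IRLS idea used in this abstract can be sketched in a few lines. This is an illustrative pure-Python version for a simple linear model whose error variance is assumed proportional to x squared, not the authors' implementation; the data, variance model, and seed are made up.

```python
import random
import statistics

random.seed(7)

def wls(xs, ys, ws):
    """Closed-form weighted least squares for y = a + b*x."""
    sw = sum(ws)
    xm = sum(w * x for w, x in zip(ws, xs)) / sw
    ym = sum(w * y for w, y in zip(ws, ys)) / sw
    b = (sum(w * (x - xm) * (y - ym) for w, x, y in zip(ws, xs, ys))
         / sum(w * (x - xm) ** 2 for w, x in zip(ws, xs)))
    return ym - b * xm, b

# Heteroscedastic data: error SD grows linearly with x (variance ~ x**2).
xs = [0.5 + 0.01 * i for i in range(500)]
ys = [2.0 + 3.0 * x + random.gauss(0, 0.8 * x) for x in xs]

a, b = wls(xs, ys, [1.0] * len(xs))  # start from the unweighted OLS fit
for _ in range(5):                   # IRLS: refit variance model, reweight
    res = [y - (a + b * x) for x, y in zip(xs, ys)]
    gamma2 = statistics.fmean(r * r / (x * x) for r, x in zip(res, xs))
    a, b = wls(xs, ys, [1.0 / (gamma2 * x * x) for x in xs])
print(round(a, 2), round(b, 2))
```

Each pass re-estimates the variance function from the current residuals and downweights the noisier observations, which is the essence of handling heteroscedasticity by IRLS.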

10.
If a dependent variable in a regression analysis is exceptionally expensive or hard to obtain, the overall sample size used to fit the model may be limited. To avoid this, one may use a cheaper or more easily collected "surrogate" variable to supplement the expensive variable. The regression analysis will be enhanced to the degree that the surrogate is associated with the costly dependent variable. We develop a Bayesian approach incorporating surrogate variables in regression based on a two-stage experiment. Illustrative examples are given, along with comparisons to an existing frequentist method. (© 2004 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)

11.
Assessment of the misclassification error rate is of high practical relevance in many biomedical applications. As it is a complex problem, theoretical results on estimator performance are few. Most findings originate from Monte Carlo simulations, which take place in the "normal setting": the covariables of two groups have a multivariate normal distribution; the groups differ in location but have the same covariance matrix; and the linear discriminant function (LDF) is used for prediction. We perform a new simulation to compare existing nonparametric estimators in a more complex situation. The underlying distribution is based on a logistic model with six binary as well as continuous covariables. To study estimator performance for varying true error rates, three prediction rules, including nonparametric classification trees and parametric logistic regression, and sample sizes ranging from 100 to 1,000 are considered. In contrast to most published papers, we turn our attention to estimator performance based on simple, even inappropriate, prediction rules and relatively large training sets. For the most part, results are in agreement with the usual findings. The most striking behavior was seen in applying (simple) classification trees for prediction: since the apparent error rate Êrr.app is biased, linear combinations incorporating Êrr.app underestimate the true error rate even for large sample sizes. The .632+ estimator, which was designed to correct for the overoptimism of Efron's .632 estimator for nonparametric prediction rules, performs best of all such linear combinations. The bootstrap estimator Êrr.B0 and the cross-validation estimator Êrr.cv, which do not depend on Êrr.app, seem to track the true error rate. Although the disadvantages of both estimators (pessimism of Êrr.B0 and high variability of Êrr.cv) shrink with increased sample sizes, they are still visible.
We conclude that for the choice of a particular estimator, the asymptotic behavior of the apparent error rate is important. For the assessment of estimator performance, the variance of the true error rate is crucial, and in general the stability of the prediction procedure is essential for the application of estimators based on resampling methods. (© 2004 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)
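For intuition, here is a pure-Python sketch of three of the quantities discussed in this abstract: the apparent error, the leave-one-out bootstrap estimator Êrr.B0, and Efron's plain .632 combination (not the .632+ variant). The nearest-class-mean rule and the simulated two-group data are stand-ins chosen to keep the example self-contained, not the study's prediction rules.

```python
import random
import statistics

random.seed(3)

def fit(train):
    """Nearest-class-mean rule: remember each class's mean feature."""
    return {cls: statistics.fmean(x for x, y in train if y == cls)
            for cls in (0, 1)}

def err(rule, data):
    """Fraction misclassified by assigning each point to its nearest mean."""
    wrong = sum(1 for x, y in data
                if y != min(rule, key=lambda c: abs(x - rule[c])))
    return wrong / len(data)

# Two groups of 100, features N(0,1) and N(1,1).
data = [(random.gauss(y, 1.0), y) for y in (0, 1) for _ in range(100)]
n = len(data)

err_app = err(fit(data), data)  # apparent (resubstitution) error rate

# Leave-one-out bootstrap Err.B0: evaluate each bootstrap fit only on
# the observations left out of that bootstrap sample.
B, outs = 100, []
for _ in range(B):
    idx = [random.randrange(n) for _ in range(n)]
    used = set(idx)
    boot = [data[i] for i in idx]
    held = [data[i] for i in range(n) if i not in used]
    if held:
        outs.append(err(fit(boot), held))
err_b0 = statistics.fmean(outs)

err_632 = 0.368 * err_app + 0.632 * err_b0  # Efron's .632 estimator
print(round(err_app, 3), round(err_b0, 3), round(err_632, 3))
```

The .632 combination shown here is exactly the kind of linear combination of Êrr.app and Êrr.B0 the abstract warns about when the apparent error is badly biased, as with unpruned classification trees.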

12.
Shaw PA, Prentice RL. Biometrics 2012, 68(2): 397-407.
Uncertainty concerning the measurement error properties of self-reported diet has important implications for the reliability of nutritional epidemiology reports. Biomarkers based on the urinary recovery of expended nutrients can provide an objective measure of short-term nutrient consumption for certain nutrients and, when applied to a subset of a study cohort, can be used to calibrate corresponding self-report nutrient consumption assessments. A nonstandard measurement error model that makes provision for systematic error and subject-specific error, along with the usual independent random error, is needed for the self-report data. Three estimation procedures for hazard ratio (Cox model) parameters are extended for application to this more complex measurement error structure. These procedures are risk set regression calibration, conditional score, and nonparametric corrected score. An estimator for the cumulative baseline hazard function is also provided. The performance of each method is assessed in a simulation study. The methods are then applied to an example from the Women's Health Initiative Dietary Modification Trial.

13.
Song X, Wang CY. Biometrics 2008, 64(2): 557-566.
We study joint modeling of survival and longitudinal data. There are two regression models of interest. The primary model is for survival outcomes, which are assumed to follow a time-varying coefficient proportional hazards model. The second model is for longitudinal data, which are assumed to follow a random effects model. Based on the trajectory of a subject's longitudinal data, some covariates in the survival model are functions of the unobserved random effects. Estimated random effects are generally different from the unobserved random effects and hence this leads to covariate measurement error. To deal with covariate measurement error, we propose a local corrected score estimator and a local conditional score estimator. Both approaches are semiparametric methods in the sense that there is no distributional assumption needed for the underlying true covariates. The estimators are shown to be consistent and asymptotically normal. However, simulation studies indicate that the conditional score estimator outperforms the corrected score estimator for finite samples, especially in the case of relatively large measurement error. The approaches are demonstrated by an application to data from an HIV clinical trial.

14.
Li L, Shao J, Palta M. Biometrics 2005, 61(3): 824-830.
Covariate measurement error in regression is typically assumed to act in an additive or multiplicative manner on the true covariate value. However, such an assumption does not hold for the measurement error of sleep-disordered breathing (SDB) in the Wisconsin Sleep Cohort Study (WSCS). The true covariate is the severity of SDB, and the observed surrogate is the number of breathing pauses per unit time of sleep, which has a nonnegative semicontinuous distribution with a point mass at zero. We propose a latent variable measurement error model for the error structure in this situation and implement it in a linear mixed model. The estimation procedure is similar to regression calibration but involves a distributional assumption for the latent variable. Modeling and model-fitting strategies are explored and illustrated through an example from the WSCS.

15.
Exposure measurement error can result in a biased estimate of the association between an exposure and outcome. When the exposure-outcome relationship is linear on the appropriate scale (e.g. linear, logistic) and the measurement error is classical, that is the result of random noise, the result is attenuation of the effect. When the relationship is non-linear, measurement error distorts the true shape of the association. Regression calibration is a commonly used method for correcting for measurement error, in which each individual's unknown true exposure in the outcome regression model is replaced by its expectation conditional on the error-prone measure and any fully measured covariates. Regression calibration is simple to execute when the exposure is untransformed in the linear predictor of the outcome regression model, but less straightforward when non-linear transformations of the exposure are used. We describe a method for applying regression calibration in models in which a non-linear association is modelled by transforming the exposure using a fractional polynomial model. It is shown that taking a Bayesian estimation approach is advantageous. By use of Markov chain Monte Carlo algorithms, one can sample from the distribution of the true exposure for each individual. Transformations of the sampled values can then be performed directly and used to find the expectation of the transformed exposure required for regression calibration. A simulation study shows that the proposed approach performs well. We apply the method to investigate the relationship between usual alcohol intake and subsequent all-cause mortality using an error model that adjusts for the episodic nature of alcohol consumption.
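Plain regression calibration with an untransformed exposure, the simple case this abstract generalizes, can be sketched as follows. The variance components are treated as known purely for illustration (in practice they are estimated from replicate or calibration data), and the Bayesian fractional-polynomial machinery of the paper is not shown.

```python
import random

random.seed(11)

def slope(xs, ys):
    """Ordinary least-squares slope for y regressed on x."""
    xm = sum(xs) / len(xs)
    ym = sum(ys) / len(ys)
    return (sum((x - xm) * (y - ym) for x, y in zip(xs, ys))
            / sum((x - xm) ** 2 for x in xs))

n, beta = 20000, 2.0
sigma_x2, sigma_u2 = 1.0, 0.5  # exposure and error variances, assumed known
x = [random.gauss(0, sigma_x2 ** 0.5) for _ in range(n)]
w = [xi + random.gauss(0, sigma_u2 ** 0.5) for xi in x]  # classical error
y = [beta * xi + random.gauss(0, 1) for xi in x]

naive = slope(w, y)  # attenuated toward zero by the reliability ratio

lam = sigma_x2 / (sigma_x2 + sigma_u2)        # reliability ratio
wm = sum(w) / n
x_hat = [wm + lam * (wi - wm) for wi in w]    # E[X | W]: the calibrated value
calibrated = slope(x_hat, y)
print(round(naive, 2), round(calibrated, 2))
```

The naive slope is attenuated by the factor lam, and regressing on the calibrated values E[X | W] undoes exactly that attenuation, which is the linear-case result the abstract takes as its starting point.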

16.
Informative drop-out arises in longitudinal studies when the subject's follow-up time depends on the unobserved values of the response variable. We specify a semiparametric linear regression model for the repeatedly measured response variable and an accelerated failure time model for the time to informative drop-out. The error terms from the two models are assumed to have a common, but completely arbitrary joint distribution. Using a rank-based estimator for the accelerated failure time model and an artificial censoring device, we construct an asymptotically unbiased estimating function for the linear regression model. The resultant estimator is shown to be consistent and asymptotically normal. A resampling scheme is developed to estimate the limiting covariance matrix. Extensive simulation studies demonstrate that the proposed methods are suitable for practical use. Illustrations with data taken from two AIDS clinical trials are provided.

17.
We consider the proportional hazards model in which the covariates include the discretized categories of a continuous time-dependent exposure variable measured with error. Naively ignoring the measurement error in the analysis may cause biased estimation and erroneous inference. Although various approaches have been proposed to deal with measurement error when the hazard depends linearly on the time-dependent variable, it has not yet been investigated how to correct when the hazard depends on the discretized categories of the time-dependent variable. To fill this gap in the literature, we propose a smoothed corrected score approach based on approximation of the discretized categories after smoothing the indicator function. The consistency and asymptotic normality of the proposed estimator are established. The observation times of the time-dependent variable are allowed to be informative. For comparison, we also extend to this setting two approximate approaches, the regression calibration and the risk-set regression calibration. The methods are assessed by simulation studies and by application to data from an HIV clinical trial.

18.
Many estimators of the average effect of a treatment on an outcome require estimation of the propensity score, the outcome regression, or both. It is often beneficial to utilize flexible techniques, such as semiparametric regression or machine learning, to estimate these quantities. However, optimal estimation of these regressions does not necessarily lead to optimal estimation of the average treatment effect, particularly in settings with strong instrumental variables. A recent proposal addressed these issues via the outcome-adaptive lasso, a penalized regression technique for estimating the propensity score that seeks to minimize the impact of instrumental variables on treatment effect estimators. However, a notable limitation of this approach is that its application is restricted to parametric models. We propose a more flexible alternative that we call the outcome highly adaptive lasso. We discuss the large sample theory for this estimator and propose closed-form confidence intervals based on the proposed estimator. We show via simulation that our method offers benefits over several popular approaches.

19.
Estimating the effects of haplotypes on the age of onset of a disease is an important step toward the discovery of genes that influence complex human diseases. A haplotype is a specific sequence of nucleotides on the same chromosome of an individual and can only be measured indirectly through the genotype. We consider cohort studies which collect genotype data on a subset of cohort members through case-cohort or nested case-control sampling. We formulate the effects of haplotypes and possibly time-varying environmental variables on the age of onset through a broad class of semiparametric regression models. We construct appropriate nonparametric likelihoods, which involve both finite- and infinite-dimensional parameters. The corresponding nonparametric maximum likelihood estimators are shown to be consistent, asymptotically normal, and asymptotically efficient. Consistent variance-covariance estimators are provided, and efficient and reliable numerical algorithms are developed. Simulation studies demonstrate that the asymptotic approximations are accurate in practical settings and that case-cohort and nested case-control designs are highly cost-effective. An application to a major cardiovascular study is provided.

20.
Follmann D, Nason M. Biometrics 2011, 67(3): 1127-1134.
Quantal bioassay experiments relate the amount or potency of some compound (for example, a poison, antibody, or drug) to a binary outcome such as death or infection in animals. For infectious diseases, probit regression is commonly used for inference, and a key measure of potency is the IDP, the amount that results in P% of the animals being infected. In some experiments, a validation set may be used, where both direct and proxy measures of the dose are available on a subset of animals and the proxy is available on all. The proxy variable can be viewed as a messy reflection of the direct variable, leading to an errors-in-variables problem. We develop a model for the validation set and use a constrained seemingly unrelated regression (SUR) model to obtain the distribution of the direct measure conditional on the proxy. We use the conditional distribution to derive a pseudo-likelihood based on probit regression and use the parametric bootstrap for statistical inference. We re-evaluate an old experiment in 21 monkeys in which neutralizing antibodies (nAbs) to HIV were measured with an old (proxy) assay in all monkeys and with a new (direct) assay in a validation set of 11 that had sufficient stored plasma. Using our methods, we obtain an estimate of the ID1 for the new assay, an important target for HIV vaccine candidates. In simulations, we compare the pseudo-likelihood estimates with regression calibration and a full joint likelihood approach.
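The IDP quantity can be read off a fitted probit model directly. A minimal sketch under assumed conventions follows: the coefficients are made up, the dose is taken on a log10 scale, and the normal quantile is obtained by bisecting the CDF so only the standard library is needed.

```python
import math

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def phi_inv(p, lo=-10.0, hi=10.0):
    """Invert the standard normal CDF by bisection on [lo, hi]."""
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def id_p(alpha, beta, p):
    """Dose (original scale) infecting a fraction p of animals under the
    probit model P(infect) = Phi(alpha + beta * log10(dose))."""
    return 10.0 ** ((phi_inv(p) - alpha) / beta)

# Illustrative (made-up) probit coefficients:
alpha, beta = -1.0, 2.0
print(round(id_p(alpha, beta, 0.5), 3))  # ID50 for these coefficients
```

With these invented coefficients, phi_inv(0.5) is 0, so the ID50 is 10 ** 0.5; the ID1 targeted in the abstract would use p = 0.01 in the same formula.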

