Similar Articles
1.
Summary. In this article, we study the estimation of the mean response and regression coefficients in semiparametric regression problems when the response variable is subject to nonrandom missingness. The missingness may be independent of the response conditional on high-dimensional auxiliary information. In that case, the parametric approach may misspecify the relationship between covariates and response, and the nonparametric approach is infeasible because of the curse of dimensionality. To overcome this, we study a model-based approach that condenses the auxiliary information and then estimates the parameters of interest nonparametrically on the condensed covariate space. Our estimators possess the double robustness property: they are consistent whenever either the model for the response given the auxiliary covariates or the model for the missingness given the auxiliary covariates is correct. We conduct a number of simulations to compare the numerical performance of our estimators with other estimators in the current missing data literature, including the propensity score approach and the inverse probability weighted estimating equation. A real data set is used to illustrate our approach.
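The sketch below shows the generic augmented inverse-probability-weighted (AIPW) estimator of a mean response, which is the standard way double robustness is obtained. It is not the authors' condensed-covariate estimator. It assumes that fitted response probabilities `pi_hat` and outcome-model predictions `m_hat` are already available, and the function name `aipw_mean` is hypothetical.

```python
import numpy as np

def aipw_mean(y, observed, pi_hat, m_hat):
    """Augmented inverse-probability-weighted (doubly robust) estimate of E[Y].

    y        : responses; entries where observed == 0 are ignored (may be NaN)
    observed : 1 if the response is observed, 0 if it is missing
    pi_hat   : fitted P(observed = 1 | auxiliary covariates)
    m_hat    : fitted E[Y | auxiliary covariates] from an outcome model
    """
    r = np.asarray(observed, dtype=float)
    # replace missing responses by 0; they are multiplied by r = 0 anyway
    y_filled = np.where(r == 1, np.asarray(y, dtype=float), 0.0)
    pi = np.asarray(pi_hat, dtype=float)
    m = np.asarray(m_hat, dtype=float)
    # IPW term plus augmentation term built from the outcome model
    return float(np.mean(r * y_filled / pi - (r - pi) / pi * m))
```

If `m_hat` equals the true conditional mean, the augmentation term absorbs errors in `pi_hat`. If `pi_hat` is correct, the augmentation term has mean zero whatever `m_hat` is. This is the double robustness property described above.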

2.
Summary. We consider semiparametric transition measurement error models for longitudinal data, in which one of the covariates in the transition model is measured with error and no distributional assumption is made for the underlying unobserved covariate. An estimating equation approach based on the pseudo conditional score method is proposed. We show that the resulting estimators of the regression coefficients are consistent and asymptotically normal. We also discuss the issue of efficiency loss. Simulation studies are conducted to examine the finite-sample performance of our estimators. Data from the longitudinal AIDS Costs and Services Utilization Survey are analyzed for illustration.

3.
Pan W, Lin X, Zeng D. Biometrics, 2006, 62(2): 402-412
We propose a new class of models, transition measurement error models, to study the effects of covariates and past responses on the current response in longitudinal studies when one of the covariates is measured with error. We show that the response variable conditional on the error-prone covariate follows a complex transition mixed effects model. The naive model obtained by ignoring the measurement error correctly specifies the transition part of the model, but it misspecifies the covariate effect structure and ignores the random effects. We next study the asymptotic bias of the naive estimator obtained by ignoring the measurement error, for both continuous and discrete outcomes. We show that the naive estimator of the regression coefficient of the error-prone covariate is attenuated, while the naive estimators of the regression coefficients of the past responses are generally inflated. We then develop a structural modeling approach for parameter estimation using maximum likelihood. Full maximum likelihood estimation requires multidimensional integration, so we develop an EM algorithm to calculate the maximum likelihood estimators, using Monte Carlo simulations to evaluate the conditional expectations in the E-step. We evaluate the performance of the proposed method through a simulation study and apply it to a longitudinal social support study of elderly women with heart disease. An additional simulation study shows that the Bayesian information criterion (BIC) performs well in choosing the correct transition orders of the models.
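The attenuation result can be seen in its simplest form. The sketch below assumes a hypothetical cross-sectional linear model with classical additive measurement error and no lagged-response terms. It therefore shows only the attenuation of the error-prone covariate's coefficient, not the inflation of the past-response coefficients. Under these assumptions the naive slope converges to beta * var_x / (var_x + var_u). The function name and default settings are illustrative.

```python
import numpy as np

def naive_slope(n=200_000, beta=1.0, var_x=1.0, var_u=0.5, seed=0):
    """Simulate Y = beta*X + e, then regress Y on the error-prone W = X + U."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, np.sqrt(var_x), n)
    w = x + rng.normal(0.0, np.sqrt(var_u), n)  # observed surrogate for x
    y = beta * x + rng.normal(0.0, 1.0, n)
    # OLS slope of y on w: cov(w, y) / var(w)
    return np.cov(w, y)[0, 1] / np.var(w, ddof=1)

reliability = 1.0 / (1.0 + 0.5)  # var_x / (var_x + var_u)
print(naive_slope(), reliability)  # both approximately 0.667
```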

4.
Sensitivity and specificity are common measures of the accuracy of a diagnostic test. The usual estimators of these quantities are unbiased if data on the diagnostic test result and the true disease status are obtained from all subjects in an appropriately selected sample. In some studies, the true disease status is verified only for a subset of subjects, possibly depending on the result of the diagnostic test and other characteristics of the subjects. Estimators of sensitivity and specificity based on this subset are typically biased; this is known as verification bias. Methods have been proposed to correct verification bias under the assumption that the missing data on disease status are missing at random (MAR). That is, the probability of missingness depends on the true (missing) disease status only through the test result and observed covariate information. When some of the covariates are continuous, or the number of covariates is relatively large, the existing methods require parametric models for the probability of disease or the probability of verification (given the test result and covariates). They are therefore subject to model misspecification. We propose a new method for correcting verification bias based on the propensity score, defined as the predicted probability of verification given the test result and observed covariates. The propensity score is estimated separately for subjects with positive and negative test results. The new method classifies the verified sample into several subsamples with homogeneous propensity scores and thereby allows correction for verification bias. Simulation studies demonstrate that the new estimators are more robust to model misspecification than existing methods. They still perform well when the models for the probability of disease and the probability of verification are correctly specified.
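One way to read the stratification step is sketched below, under stated assumptions. The propensity scores are already estimated, separately by test result, and the strata are quantile groups within each test-result group. The disease rate among verified subjects in a stratum is applied to every subject in that stratum. The function name and the default of five strata are illustrative choices, not the authors' specification.

```python
import numpy as np

def stratified_sensitivity(test, verified, disease, pscore, n_strata=5):
    """Verification-bias-corrected sensitivity P(T = 1 | D = 1) under MAR.

    test     : 0/1 diagnostic test result for every subject
    verified : 0/1 indicator that true disease status was ascertained
    disease  : 0/1 disease status (only read where verified == 1)
    pscore   : estimated P(verified = 1 | test result, covariates)
    Assumes both test-result groups are non-empty.
    """
    test, verified = np.asarray(test), np.asarray(verified)
    disease = np.asarray(disease, dtype=float)
    pscore = np.asarray(pscore, dtype=float)
    expected_cases = {}
    for t in (0, 1):
        group = test == t
        ps, v, d = pscore[group], verified[group], disease[group]
        # quantile-based strata of the propensity score within this test group
        edges = np.quantile(ps, np.linspace(0.0, 1.0, n_strata + 1))
        stratum = np.clip(np.searchsorted(edges, ps, side="right") - 1, 0, n_strata - 1)
        total = 0.0
        for s in np.unique(stratum):
            members = stratum == s
            checked = members & (v == 1)
            if checked.any():  # strata without verified subjects are skipped
                # disease rate among the verified, scaled up to the whole stratum
                total += members.sum() * d[checked].mean()
        expected_cases[t] = total
    return expected_cases[1] / (expected_cases[0] + expected_cases[1])
```

Specificity can be handled the same way by counting non-diseased subjects (1 - disease) in place of cases.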

5.
A method for reducing bias in observational studies proposed by Rosenbaum and Rubin (1983, 1984) is discussed with a view to applications in studies designed to compare two treatments. The data are stratified on a function of the covariates called the propensity score, which is the conditional probability of receiving a specific treatment given a set of observed covariates. Some insight is given into how this kind of stratification works in theory. Within strata, the treatment groups are comparable with respect to the distribution of the covariates incorporated into the score, so a corresponding stratified analysis can be considered. The method differs from other strategies in that the subclasses are not intended to comprise patients with similar prognosis. In practice, grouped estimated scores are used. Problems concerning the interpretation of the proposed stratified approach are illustrated by an application in oncology. The results are compared with those from an analysis based on a standard regression model.
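The sketch below shows subclassification on the propensity score. It assumes the score has already been estimated and uses quintiles as strata, a common choice assumed here rather than taken from the abstract. Strata that lack either treatment group are dropped, which is one practical complication of the approach. The function name is hypothetical.

```python
import numpy as np

def subclassification_effect(y, treated, pscore, n_strata=5):
    """Propensity-score subclassification estimate of a treatment effect.

    Subjects are grouped into quantile strata of the estimated score; the
    within-stratum treated-minus-control difference in mean outcome is
    averaged with weights proportional to stratum size.
    """
    y = np.asarray(y, dtype=float)
    treated = np.asarray(treated).astype(bool)
    pscore = np.asarray(pscore, dtype=float)
    edges = np.quantile(pscore, np.linspace(0.0, 1.0, n_strata + 1))
    stratum = np.clip(np.searchsorted(edges, pscore, side="right") - 1, 0, n_strata - 1)
    effect, weight = 0.0, 0
    for s in np.unique(stratum):
        m = stratum == s
        if treated[m].any() and (~treated[m]).any():  # both groups needed
            diff = y[m & treated].mean() - y[m & ~treated].mean()
            effect += m.sum() * diff
            weight += m.sum()
    return effect / weight
```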

6.
Horton NJ, Laird NM. Biometrics, 2001, 57(1): 34-42
This article presents a new method for maximum likelihood estimation of logistic regression models with incomplete covariate data when auxiliary information is available. This auxiliary information is extraneous to the regression model of interest but predictive of the covariate with missing data. Ibrahim (1990, Journal of the American Statistical Association 85, 765-769) provides a general method for estimating generalized linear regression models with missing covariates using the EM algorithm. The method is easily implemented when no auxiliary data are present. Vach (1997, Statistics in Medicine 16, 57-72) describes how the method can be extended when the outcome and auxiliary data are conditionally independent given the covariates in the model. The proposed method allows the incorporation of auxiliary data without making the conditional independence assumption. We suggest tests of conditional independence and compare the performance of several estimators in an example concerning mental health service utilization in children. Using an artificial dataset, we also compare the performance of several estimators when auxiliary data are available.

7.
Datta S, Sundaram R. Biometrics, 2006, 62(3): 829-837
Multistage models are used to describe individuals (or experimental units) moving through a succession of "stages" corresponding to distinct states (e.g., healthy, diseased, diseased with complications, dead). The resulting data can be considered a form of multivariate survival data containing information about the transition times and the stages occupied. Traditional survival analysis is the simplest example of a multistage model: individuals begin in an initial stage (say, alive) and move irreversibly to a second stage (death). In this article, we consider general multistage models with a directed tree structure (progressive models), in which individuals traverse the stages in a possibly non-Markovian manner. We construct nonparametric estimators of stage occupation probabilities and marginal cumulative transition hazards. Empirical calculation of these quantities is not possible because complete data are unavailable. We consider current status information, which represents a more severe form of censoring than the commonly used right censoring. The asymptotic validity of our estimators can be justified using consistency results for nonparametric regression estimators. We study the finite-sample behavior of our estimators by simulation and show that estimators based on these limited data compare well with those based on complete data. We also apply our method to a real-life data set from a cardiovascular disease study in Taiwan.
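The simplest special case mentioned above, a two-stage alive-to-dead model, already shows how current status data can be handled. For that case, a classical estimator of F(t) = P(T <= t) is the isotonic regression of the failure indicators ordered by monitoring time, computed by the pool-adjacent-violators algorithm. This estimator is not the authors' multistage estimator. The sketch assumes distinct monitoring times, and the function names are hypothetical.

```python
def pava(values, weights=None):
    """Pool-adjacent-violators: least-squares nondecreasing fit to `values`."""
    if weights is None:
        weights = [1.0] * len(values)
    blocks = []  # each block: [mean, total weight, number of points]
    for v, w in zip(values, weights):
        blocks.append([float(v), float(w), 1])
        # merge backwards while the monotonicity constraint is violated
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit

def current_status_cdf(monitor_times, failed_by_monitor):
    """Estimate F(t) = P(T <= t) when each subject is checked once.

    failed_by_monitor[i] is 1 if subject i had already failed at its check.
    Returns the sorted monitoring times and the fitted F at those times.
    """
    order = sorted(range(len(monitor_times)), key=lambda i: monitor_times[i])
    fit = pava([failed_by_monitor[i] for i in order])
    return [monitor_times[i] for i in order], fit
```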

8.
Robins JM, Mark SD, Newey WK. Biometrics, 1992, 48(2): 479-495
To estimate the causal effects of one or more exposures or treatments on an outcome of interest, one has to account for the effect of "confounding factors," which both covary with the exposures or treatments and are independent predictors of the outcome. In this paper we present regression methods which, in contrast to standard methods, adjust for the confounding effect of multiple continuous or discrete covariates by modelling the conditional expectation of the exposures or treatments given the confounders. In the special case of a univariate dichotomous exposure or treatment, this conditional expectation is identical to what Rosenbaum and Rubin have called the propensity score. They have also proposed methods to estimate causal effects by modelling the propensity score. Our methods generalize those of Rosenbaum and Rubin in several ways. First, our approach straightforwardly allows for multivariate exposures or treatments, each of which may be continuous, ordinal, or discrete. Second, even in the case of a single dichotomous exposure, our approach does not require subclassification or matching on the propensity score. It therefore avoids the potential for "residual confounding," i.e., bias due to incomplete matching. Third, our approach allows a rather general formalization of the idea that it is better to use the "estimated propensity score" than the true propensity score, even when the true score is known. The additional power of our approach derives from the assumption that the causal effects of the exposures or treatments can be described by the parametric component of a semiparametric regression model. To illustrate our methods, we reanalyze the effect of current cigarette smoking on forced expiratory volume in one second in a cohort of 2,713 adult white males. We compare the results with those obtained using standard methods.
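The key idea is to model E[A | L] instead of the outcome's dependence on the confounders. It can be illustrated in the partially linear model Y = beta * A + h(L) + error. The sketch solves the estimating equation sum_i (A_i - E[A | L_i]) (Y_i - beta * A_i) = 0, whose estimating function has mean zero at the true beta for any unknown h, provided E[A | L] is correct. The true propensity is plugged in here for brevity; in practice it would be estimated. The simulated data and the function name are hypothetical.

```python
import numpy as np

def exposure_residual_estimate(y, a, a_mean_given_l):
    """Solve sum_i (A_i - E[A | L_i]) * (Y_i - beta * A_i) = 0 for beta."""
    y, a, e = (np.asarray(v, dtype=float) for v in (y, a, a_mean_given_l))
    resid = a - e  # exposure minus its modelled expectation given confounders
    return float(np.sum(resid * y) / np.sum(resid * a))

rng = np.random.default_rng(1)
n = 100_000
confounder = rng.normal(size=n)
p = 1.0 / (1.0 + np.exp(-confounder))  # true E[A | L], the propensity score
a = rng.binomial(1, p)
# h(L) = sin(3L) + L^2 is never modelled by the estimator
y = 1.5 * a + np.sin(3 * confounder) + confounder**2 + rng.normal(size=n)
print(exposure_residual_estimate(y, a, p))  # close to 1.5
```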

9.
Little RJ, Long Q, Lin X. Biometrics, 2009, 65(2): 640-649
Summary. We consider the analysis of clinical trials that involve randomization to an active treatment (T = 1) or a control treatment (T = 0) when the active treatment is subject to all-or-nothing compliance. We compare three approaches to estimating treatment efficacy in this situation: as-treated analysis, per-protocol analysis, and instrumental variable (IV) estimation, in which the treatment effect is estimated using the randomization indicator as an IV. Both model-based and method-of-moments-based IV estimators are considered. We assess the assumptions underlying these estimators, compare the standard errors and mean squared errors of the estimates, and examine the design implications of the three methods. Extensions of the methods to include observed covariates are then discussed, emphasizing the role of compliance propensity methods and the contrasting role of covariates in these extensions. The methods are illustrated on data from the Women Take Pride study, an assessment of behavioral treatments for women with heart disease.
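The method-of-moments IV estimator mentioned above can be written in a few lines. This sketch assumes all-or-nothing compliance and that control-arm patients cannot obtain the active treatment. Under those assumptions it estimates the treatment effect among compliers. The function name is hypothetical.

```python
import numpy as np

def iv_cace(y, z, received):
    """Method-of-moments IV (Wald) estimate of the complier average causal effect.

    y        : outcomes
    z        : randomization indicator (1 = assigned to active treatment)
    received : 1 if the active treatment was actually taken
    """
    y, z, d = (np.asarray(v, dtype=float) for v in (y, z, received))
    itt = y[z == 1].mean() - y[z == 0].mean()     # intention-to-treat effect
    uptake = d[z == 1].mean() - d[z == 0].mean()  # equals compliance rate if controls cannot access T
    return itt / uptake
```

With half of the assigned arm complying, an intention-to-treat difference of 2 becomes a complier effect of 4. This illustrates how the IV estimate rescales the diluted intention-to-treat contrast.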

10.
Roy J, Lin X. Biometrics, 2005, 61(3): 837-846
We consider estimation in generalized linear mixed models (GLMM) for longitudinal data with informative dropouts. At the time a unit drops out, time-varying covariates are often unobserved in addition to the missing outcome. However, existing informative dropout models typically require covariates to be completely observed. This assumption is not realistic in the presence of time-varying covariates. In this article, we first study the asymptotic bias that would result from applying existing methods, where missing time-varying covariates are handled using naive approaches, which include: (1) using only baseline values; (2) carrying forward the last observation; and (3) assuming the missing data are ignorable. Our asymptotic bias analysis shows that these naive approaches yield inconsistent estimators of model parameters. We next propose a selection/transition model that allows covariates to be missing in addition to the outcome variable at the time of dropout. The EM algorithm is used for inference in the proposed model. Data from a longitudinal study of human immunodeficiency virus (HIV)-infected women are used to illustrate the methodology.
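The first two naive approaches can be made concrete. The hypothetical helpers below show what they do to one subject's time-varying covariate history, with `None` marking visits where the covariate is missing. The abstract's point is that fitting a model to histories filled in this way yields inconsistent estimators.

```python
def locf(values):
    """Last observation carried forward; None marks a missing visit."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)  # remains None until the first observed value
    return filled

def baseline_only(values):
    """Use the baseline (first-visit) value at every visit."""
    return [values[0]] * len(values)

history = [3.1, None, 2.8, None, None]
print(locf(history))           # [3.1, 3.1, 2.8, 2.8, 2.8]
print(baseline_only(history))  # [3.1, 3.1, 3.1, 3.1, 3.1]
```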

11.
Variable Selection for Semiparametric Mixed Models in Longitudinal Studies
Summary. We propose a double-penalized likelihood approach for simultaneous model selection and estimation in semiparametric mixed models for longitudinal data. Two types of penalties are jointly imposed on the ordinary log-likelihood: a roughness penalty on the nonparametric baseline function and a nonconcave shrinkage penalty on the linear coefficients to achieve model sparsity. Compared with existing approaches based on estimating equations, our procedure provides valid inference for data missing at random and is more efficient if the specified model is correct. Another advantage of the new procedure is its easy computation for both the regression components and the variance parameters. We show that the double-penalized problem can be conveniently reformulated in a linear mixed model framework, so that existing software can be used directly to implement our method. For model inference, we derive both frequentist and Bayesian variance estimates for the estimated parametric and nonparametric components. Simulation is used to evaluate the performance of our method and compare it with existing methods. We then apply the new method to a real data set from a lactation study.
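The sketch below illustrates the two penalties. It assumes the SCAD penalty of Fan and Li, with the conventional a = 3.7, as the nonconcave shrinkage penalty. It approximates the roughness penalty, the integral of the squared second derivative, on an equally spaced grid. Both are assumptions for illustration, and `double_penalized_objective` is a hypothetical schematic rather than the authors' criterion. The sketch does not fit the mixed model.

```python
import numpy as np

def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty: linear near 0, a quadratic blend, then constant."""
    t = np.abs(np.asarray(theta, dtype=float))
    middle = -(t**2 - 2 * a * lam * t + lam**2) / (2 * (a - 1))
    return np.where(t <= lam, lam * t,
                    np.where(t <= a * lam, middle, (a + 1) * lam**2 / 2))

def roughness_penalty(f_values, h):
    """Approximate the integral of f''(t)^2 from values on a grid with spacing h."""
    second_diff = np.diff(np.asarray(f_values, dtype=float), n=2) / h**2
    return float(np.sum(second_diff**2) * h)

def double_penalized_objective(loglik, f_values, h, beta, smooth_lam, shrink_lam):
    """Schematic criterion to maximize: log-likelihood minus both penalties."""
    return (loglik
            - smooth_lam * roughness_penalty(f_values, h)
            - float(np.sum(scad_penalty(beta, shrink_lam))))
```

Unlike the lasso, SCAD stops growing once |theta| > a * lam, so large coefficients are not shrunk further. A linear baseline function incurs no roughness penalty.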

12.
Propensity score matching (PSM) and propensity score weighting (PSW) are popular tools for estimating causal effects in observational studies. We address two open issues: how to estimate propensity scores and how to assess covariate balance. Using simulations, we compare the performance of PSM and PSW based on logistic regression and machine learning algorithms (CART, bagging, boosting, random forests, neural networks, naive Bayes). Additionally, we consider several measures of covariate balance and assess their ability to predict the bias of PSM and PSW estimators. These are the Absolute Standardized Average Mean (ASAM) with and without interactions, measures based on quantile-quantile plots, the ratio between the variances of the propensity scores, and the area under the curve (AUC). We also investigate the importance of tuning machine learning parameters in the context of propensity score methods. Two simulation designs are employed. In the first, the data-generating processes are inspired by birth register data used to assess the effect of labor induction on the occurrence of caesarean section. The second uses more general generating mechanisms. Overall, random forests performed best among the techniques considered, especially in PSW. Logistic regression and neural networks also performed excellently, similarly to random forests. As for covariate balance, the simplest and most commonly used metric, the ASAM, showed a strong correlation with the bias of the causal effect estimators. Our findings suggest that researchers should aim for an ASAM lower than 10% for as many variables as possible. In the empirical study, we found that labor induction had a small and not statistically significant impact on caesarean section.
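The sketch below shows the ASAM balance check. It assumes the common convention of standardizing each covariate's mean difference by the pooled standard deviation sqrt((s_t^2 + s_c^2) / 2) of the unweighted groups; other denominators are also used. Optional weights allow balance to be checked after propensity score weighting. The function names are hypothetical.

```python
import numpy as np

def abs_std_diffs(x, treated, weights=None):
    """Absolute standardized mean difference for each covariate column of x (n, p)."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(treated).astype(bool)
    w = np.ones(len(t)) if weights is None else np.asarray(weights, dtype=float)
    mean_t = np.average(x[t], axis=0, weights=w[t])
    mean_c = np.average(x[~t], axis=0, weights=w[~t])
    # pooled SD of the unweighted groups (one common convention)
    pooled_sd = np.sqrt((x[t].var(axis=0, ddof=1) + x[~t].var(axis=0, ddof=1)) / 2)
    return np.abs(mean_t - mean_c) / pooled_sd

def asam(x, treated, weights=None):
    """ASAM: the average of the per-covariate absolute standardized differences."""
    return float(np.mean(abs_std_diffs(x, treated, weights)))
```

On this scale, the 10% target corresponds to per-covariate values from `abs_std_diffs` below 0.10.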

13.
Several methods exist for investigating the relationship between phenological records and weather data. These can be broadly classified into models that attempt to incorporate information about underlying biological processes, such as those based on the concept of thermal time, and linear regression methods. The latter are less driven by the biology but have the advantages of ease of use and flexibility. Regression can be used where there is no obvious mechanistic model, or to suggest the form of a mechanistic or empirical model where there are several to choose from. Stepwise regression is commonly used in phenology. However, it requires aggregation of the weather records, which results in a loss of information. Penalised signal regression (PSR) was recently introduced to overcome this weakness. Here, we introduce to the phenology context a further method called fusion, which is a sparse version of PSR. In this paper, we compare the performance of these three regression methods based on simulations from two types of mechanistic models, the spring warming and sequential models. Given a suitable choice of temperature days as regression covariates, PSR and fusion performed better than stepwise regression for the spring warming model, and PSR performed best for the sequential model. However, when a large number of redundant temperature days were included as covariates, the performance of PSR deteriorated, whilst fusion was quite robust to this change. For this reason, it is best to use the PSR and fusion methods in tandem and to vary the number of covariates included.
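Penalised signal regression treats the daily temperatures as one ordered signal and encourages neighbouring days to have similar coefficients. The sketch below assumes a plain difference penalty on the coefficients themselves. Published PSR typically expands the coefficient curve in a B-spline basis first. Fusion would replace the squared penalty with an L1 penalty on differences, which needs an iterative solver and is not shown. The function name is hypothetical.

```python
import numpy as np

def penalized_signal_regression(X, y, lam, order=2):
    """Ridge-type fit with a difference penalty on adjacent coefficients.

    X : (n, p) matrix, column j = temperature on day j (a signal covariate)
    Minimizes ||y - X b||^2 + lam * ||D b||^2, where D takes `order`-th
    differences, so neighbouring days receive similar coefficients.
    """
    X = np.asarray(X, dtype=float)
    p = X.shape[1]
    D = np.diff(np.eye(p), n=order, axis=0)  # (p - order, p) difference matrix
    return np.linalg.solve(X.T @ X + lam * D.T @ D, X.T @ np.asarray(y, dtype=float))
```

With order=2, coefficients that change linearly across days are not penalized. A noise-free linear coefficient profile is therefore recovered for any lam.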

14.
The receiver operating characteristic (ROC) curve is often used to assess the usefulness of a diagnostic test. We present a new method to estimate the parameters of a popular semi-parametric ROC model, called the binormal model. Our method is based on minimization of the functional distance between two estimators of an unknown transformation postulated by the model, and has a simple, closed-form solution. We study the asymptotics of our estimators, show via simulation that they compare favorably with existing estimators, and illustrate how covariates may be incorporated into the norm minimization framework.

15.
Summary. Clinicians are often interested in the effect of covariates on survival probabilities at prespecified study times. Because different factors can be associated with the risk of short- and long-term failure, a flexible modeling strategy is pursued. Given a set of candidate working models, an objective methodology is proposed that aims to construct consistent and asymptotically normal estimators of the regression coefficients and average prediction error for each working model. These estimators are free from the nuisance censoring variable. The methodology requires the conditional distribution of censoring given covariates to be modeled. The model selection strategy compares models based on estimates of average prediction error. It uses step-up or step-down multiple hypothesis testing procedures that control either the proportion of false positives or the generalized familywise error rate. The problem can in fact be cast as a missing data problem, in which augmented inverse probability weighted complete-case estimators of regression coefficients and prediction error can be used (Tsiatis, 2006, Semiparametric Theory and Missing Data). A simulation study and an analysis of a recent AIDS trial are provided.

16.
Owing to increasing discoveries of biomarkers and the observed diversity among patients, there is growing interest in personalized medicine for increasing the well-being of patients (ethics) and extending human life. These biomarkers and the observed heterogeneity among patients are useful covariates that can help achieve the ethical goals of clinical trials and improve the efficiency of statistical inference. The covariate-adjusted response-adaptive (CARA) design was developed to use the information in such covariates during randomization. Its aims are to maximize the well-being of participating patients and to increase the efficiency of statistical inference at the end of a clinical trial. In this paper, we establish conditions for the consistency and asymptotic normality of maximum likelihood (ML) estimators of generalized linear models (GLM) for a general class of adaptive designs. We prove that the ML estimators are consistent and asymptotically follow a multivariate Gaussian distribution. We examine the efficiency of the estimators and the performance of response-adaptive (RA), CARA, and completely randomized (CR) designs in terms of patient well-being under a logit model with categorical covariates. We report results from our simulation studies and from an application to data from a clinical trial on stroke prevention in atrial fibrillation (SPAF). They show that RA designs lead to ethically desirable outcomes and higher statistical efficiency than CARA designs if an ideal model contains no treatment-by-covariate interaction. However, CARA designs were more ethical than RA designs when there was a significant interaction.

17.
Song X, Wang CY. Biometrics, 2008, 64(2): 557-566
Summary. We study joint modeling of survival and longitudinal data. There are two regression models of interest. The primary model is for survival outcomes, which are assumed to follow a time-varying coefficient proportional hazards model. The second model is for longitudinal data, which are assumed to follow a random effects model. Based on the trajectory of a subject's longitudinal data, some covariates in the survival model are functions of the unobserved random effects. Estimated random effects generally differ from the unobserved random effects, which leads to covariate measurement error. To deal with covariate measurement error, we propose a local corrected score estimator and a local conditional score estimator. Both approaches are semiparametric methods in the sense that no distributional assumption is needed for the underlying true covariates. The estimators are shown to be consistent and asymptotically normal. However, simulation studies indicate that the conditional score estimator outperforms the corrected score estimator in finite samples, especially when the measurement error is relatively large. The approaches are demonstrated by an application to data from an HIV clinical trial.

18.
We propose an extension to the estimating equations in generalized linear models to estimate parameters in the link function and variance structure simultaneously with regression coefficients. Rather than focusing on the regression coefficients, the purpose of these models is inference about the mean of the outcome as a function of a set of covariates, and various functionals of the mean function used to measure the effects of the covariates. A commonly used functional in econometrics, referred to as the marginal effect, is the partial derivative of the mean function with respect to any covariate, averaged over the empirical distribution of covariates in the model. We define an analogous parameter for discrete covariates. The proposed estimation method not only helps to identify an appropriate link function and to suggest an underlying distribution for a specific application but also serves as a robust estimator when no specific distribution for the outcome measure can be identified. Using Monte Carlo simulations, we show that the resulting parameter estimators are consistent. The method is illustrated with an analysis of inpatient expenditure data from a study of hospitalists.
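The sketch below computes the average marginal effect of a continuous covariate and its discrete analogue. It assumes a fixed logit link, whereas the abstract's method also estimates the link. The coefficients are taken as given, and the function names are hypothetical.

```python
import numpy as np

def _expit(eta):
    return 1.0 / (1.0 + np.exp(-eta))

def average_marginal_effect(X, beta, j):
    """Sample average of d mu / d x_j for mu = expit(X @ beta) (logit link)."""
    X, beta = np.asarray(X, dtype=float), np.asarray(beta, dtype=float)
    mu = _expit(X @ beta)
    return float(np.mean(beta[j] * mu * (1.0 - mu)))  # d mu / d x_j = beta_j mu (1 - mu)

def average_discrete_effect(X, beta, j):
    """Discrete analogue for a 0/1 covariate: mean of mu(x_j = 1) - mu(x_j = 0)."""
    beta = np.asarray(beta, dtype=float)
    X1 = np.array(X, dtype=float)  # copy, so the caller's data are untouched
    X0 = X1.copy()
    X1[:, j], X0[:, j] = 1.0, 0.0
    return float(np.mean(_expit(X1 @ beta) - _expit(X0 @ beta)))
```

Averaging over the rows of X is what "averaged over the empirical distribution of covariates" amounts to in practice.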

19.
Chen B, Zhou XH. Biometrics, 2011, 67(3): 830-842
Longitudinal studies often feature incomplete response and covariate data. Likelihood-based methods such as the expectation-maximization algorithm give consistent estimators of model parameters when data are missing at random (MAR), provided that the response model and the missing covariate model are correctly specified. They do not require the missing data mechanism to be specified. An alternative is the weighted estimating equation, which gives consistent estimators if the missing data and response models are correctly specified. It does not require the distribution of the covariates with missing values to be specified. In this article, we develop a doubly robust estimation method for longitudinal data with missing responses and missing covariates when data are MAR. This method is appealing because it provides consistent estimators if either the missing data model or the missing covariate model is correctly specified. Simulation studies demonstrate that this method performs well in a variety of situations.

20.
Errors-in-variables models in high-dimensional settings pose two challenges in application. First, the number of observed covariates is larger than the sample size, while only a small number of covariates are true predictors under an assumption of model sparsity. Second, the presence of measurement error can result in severely biased parameter estimates, and also affects the ability of penalized methods such as the lasso to recover the true sparsity pattern. A new estimation procedure called SIMulation-SELection-EXtrapolation (SIMSELEX) is proposed. This procedure makes double use of lasso methodology. First, the lasso is used to estimate sparse solutions in the simulation step, after which a group lasso is implemented to do variable selection. The SIMSELEX estimator is shown to perform well in variable selection, and has significantly lower estimation error than naive estimators that ignore measurement error. SIMSELEX can be applied in a variety of errors-in-variables settings, including linear models, generalized linear models, and Cox survival models. It is furthermore shown in the Supporting Information how SIMSELEX can be applied to spline-based regression models. A simulation study is conducted to compare the SIMSELEX estimators to existing methods in the linear and logistic model settings, and to evaluate performance compared to naive methods in the Cox and spline models. Finally, the method is used to analyze a microarray dataset that contains gene expression measurements of favorable histology Wilms tumors.
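The simulation and extrapolation steps of SIMEX can be shown without the lasso parts. The sketch below assumes a simple linear regression with a known normal measurement-error variance sigma_u^2 and a quadratic extrapolant. The quadratic is a common but approximate choice that usually removes only part of the bias. The selection step of SIMSELEX is not included, and the function name and simulated data are hypothetical.

```python
import numpy as np

def simex_slope(w, y, sigma_u, lambdas=(0.5, 1.0, 1.5, 2.0), n_sim=200, seed=0):
    """SIMEX estimate of a simple-regression slope when w = x + u, u ~ N(0, sigma_u^2).

    Simulation step: add extra noise with variance lam * sigma_u^2 and refit.
    Extrapolation step: fit a quadratic in lam and evaluate it at lam = -1.
    """
    rng = np.random.default_rng(seed)
    w, y = np.asarray(w, dtype=float), np.asarray(y, dtype=float)

    def slope(v):
        return np.cov(v, y)[0, 1] / np.var(v, ddof=1)

    grid, estimates = [0.0], [slope(w)]  # lam = 0 is the naive estimate
    for lam in lambdas:
        sims = [slope(w + rng.normal(0.0, np.sqrt(lam) * sigma_u, w.size))
                for _ in range(n_sim)]
        grid.append(lam)
        estimates.append(np.mean(sims))
    coef = np.polyfit(grid, estimates, deg=2)  # quadratic extrapolant
    return float(np.polyval(coef, -1.0))

rng = np.random.default_rng(42)
x = rng.normal(size=20_000)
w = x + rng.normal(0.0, np.sqrt(0.5), x.size)  # measurement error variance 0.5
y = 1.0 * x + rng.normal(size=x.size)          # true slope 1.0
naive = np.cov(w, y)[0, 1] / np.var(w, ddof=1)
corrected = simex_slope(w, y, sigma_u=np.sqrt(0.5))
```

Here the naive slope is attenuated toward about 0.67. The quadratic SIMEX estimate moves much of the way back toward the true value of 1.0.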

