Similar Articles
20 similar articles found (search time: 31 ms)
1.
Doubly robust estimation in missing data and causal inference models (total citations: 3; self-citations: 0; citations by others: 3)
Bang H, Robins JM. Biometrics, 2005, 61(4): 962-973
The goal of this article is to construct doubly robust (DR) estimators in ignorable missing data and causal inference models. In a missing data model, an estimator is DR if it remains consistent when either (but not necessarily both) a model for the missingness mechanism or a model for the distribution of the complete data is correctly specified. Because with observational data one can never be sure that either a missingness model or a complete data model is correct, perhaps the best that can be hoped for is to find a DR estimator. DR estimators, in contrast to standard likelihood-based or (nonaugmented) inverse probability-weighted estimators, give the analyst two chances, instead of only one, to make a valid inference. In a causal inference model, an estimator is DR if it remains consistent when either a model for the treatment assignment mechanism or a model for the distribution of the counterfactual data is correctly specified. Because with observational data one can never be sure that a model for the treatment assignment mechanism or a model for the counterfactual data is correct, inference based on DR estimators should improve upon previous approaches. Indeed, we present the results of simulation studies which demonstrate that the finite sample performance of DR estimators is as impressive as theory would predict. The proposed method is applied to a cardiovascular clinical trial.
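
The "two chances" property in the missing-data case can be sketched with the augmented inverse-probability-weighted (AIPW) estimator of a population mean, one standard doubly robust form (not necessarily the construction used in the article). This is a minimal sketch under stated assumptions: the missingness probabilities pi and the outcome-model predictions m are taken as already fitted by some other means, and the function name aipw_mean is hypothetical.

```python
def aipw_mean(y, observed, pi, m):
    """AIPW (doubly robust) estimate of E[Y] when some outcomes are missing.

    y        : outcome values; entries where observed[i] is False are never read
    observed : True if y[i] was observed (R_i = 1)
    pi       : assumed fitted probability that y[i] is observed, P(R = 1 | X_i)
    m        : assumed fitted outcome-model prediction, E[Y | X_i]
    """
    total = 0.0
    for yi, ri, pii, mi in zip(y, observed, pi, m):
        if ri:
            # observed: model prediction plus inverse-probability-weighted residual
            total += mi + (yi - mi) / pii
        else:
            # missing: rely on the outcome-model prediction alone
            total += mi
    return total / len(y)
```

If the outcome model is exact (m equals each unit's true outcome), every residual term vanishes and the estimate equals the full-data mean whatever pi is; if instead pi is correct, the weighted residuals correct a misspecified m on average.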

2.
Chen B, Zhou XH. Biometrics, 2011, 67(3): 830-842
Longitudinal studies often feature incomplete response and covariate data. Likelihood-based methods such as the expectation-maximization algorithm give consistent estimators for model parameters when data are missing at random (MAR) provided that the response model and the missing covariate model are correctly specified; however, we do not need to specify the missing data mechanism. An alternative method is the weighted estimating equation, which gives consistent estimators if the missing data and response models are correctly specified; however, we do not need to specify the distribution of the covariates that have missing values. In this article, we develop a doubly robust estimation method for longitudinal data with missing response and missing covariate when data are MAR. This method is appealing in that it can provide consistent estimators if either the missing data model or the missing covariate model is correctly specified. Simulation studies demonstrate that this method performs well in a variety of situations.
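
The weighted estimating equation idea can be illustrated, for a single covariate, as inverse-probability-weighted least squares on the complete records: each complete case is weighted by 1/pi, where pi is an assumed, already-estimated probability of the record being complete. This is a hedged sketch of the general weighting approach only, not Chen and Zhou's doubly robust method; the name ipw_slope_intercept is hypothetical.

```python
def ipw_slope_intercept(x, y, observed, pi):
    """Fit y = a + b*x by solving the inverse-probability-weighted normal equations.

    Only complete records (observed[i] True) enter, each with weight 1 / pi[i].
    """
    pts = [(xi, yi, 1.0 / p) for xi, yi, r, p in zip(x, y, observed, pi) if r]
    sw = sum(w for _, _, w in pts)
    xbar = sum(w * xi for xi, _, w in pts) / sw
    ybar = sum(w * yi for _, yi, w in pts) / sw
    sxy = sum(w * (xi - xbar) * (yi - ybar) for xi, yi, w in pts)
    sxx = sum(w * (xi - xbar) ** 2 for xi, _, w in pts)
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b
```

When the complete records lie exactly on a line, any positive weights recover that line; with noisy data the weights shift the fit toward under-observed regions.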

3.
4.
In medical research, receiver operating characteristic (ROC) curves can be used to evaluate the performance of biomarkers for diagnosing diseases or predicting the risk of developing a disease in the future. The area under the ROC curve (ROC AUC), as a summary measure of ROC curves, is widely utilized, especially when comparing multiple ROC curves. In observational studies, the estimation of the AUC is often complicated by the presence of missing biomarker values, which means that the existing estimators of the AUC are potentially biased. In this article, we develop robust statistical methods for estimating the ROC AUC, and the proposed methods use information from auxiliary variables that are potentially predictive of the missingness of the biomarkers or the missing biomarker values. We are particularly interested in auxiliary variables that are predictive of the missing biomarker values. In the case of missing at random (MAR), that is, missingness of biomarker values only depends on the observed data, our estimators have the attractive feature of being consistent if one correctly specifies, conditional on auxiliary variables and disease status, either the model for the probabilities of being missing or the model for the biomarker values. In the case of missing not at random (MNAR), that is, missingness may depend on the unobserved biomarker values, we propose a sensitivity analysis to assess the impact of MNAR on the estimation of the ROC AUC. The asymptotic properties of the proposed estimators are studied and their finite-sample behaviors are evaluated in simulation studies. The methods are further illustrated using data from a study of maternal depression during pregnancy.
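
The missingness-model half of this approach can be sketched as an inverse-probability-weighted Mann-Whitney AUC: each case-control pair of observed biomarker values is weighted by the product of the two inverse observation probabilities. This is an illustration under assumptions (pi is taken as already estimated from auxiliary variables and disease status; ipw_auc is a hypothetical name), not the article's doubly robust estimator, which also uses a model for the biomarker values.

```python
def ipw_auc(values, diseased, verified, pi):
    """Inverse-probability-weighted AUC with missing biomarker values.

    values   : biomarker values (ignored where verified[i] is False)
    diseased : disease status (True = case, False = control)
    verified : True if the biomarker value is observed
    pi       : assumed probability that the value is observed
    """
    cases = [(v, 1.0 / p) for v, d, r, p in zip(values, diseased, verified, pi) if r and d]
    controls = [(v, 1.0 / p) for v, d, r, p in zip(values, diseased, verified, pi) if r and not d]
    num = den = 0.0
    for vc, wc in cases:
        for vn, wn in controls:
            w = wc * wn
            if vc > vn:      # concordant pair counts 1
                num += w
            elif vc == vn:   # tie counts 1/2
                num += 0.5 * w
            den += w
    return num / den
```

With all values observed and pi equal to 1, this reduces to the ordinary empirical AUC.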

5.
Datta S, Sundaram R. Biometrics, 2006, 62(3): 829-837
Multistage models are used to describe individuals (or experimental units) moving through a succession of "stages" corresponding to distinct states (e.g., healthy, diseased, diseased with complications, dead). The resulting data can be considered a form of multivariate survival data containing information about the transition times and the stages occupied. Traditional survival analysis is the simplest example of a multistage model, where individuals begin in an initial stage (say, alive) and move irreversibly to a second stage (death). In this article, we consider general multistage models with a directed tree structure (progressive models) in which individuals move through the stages in a possibly non-Markovian manner. We construct nonparametric estimators of stage occupation probabilities and marginal cumulative transition hazards. Empirical calculations of these quantities are not possible due to the lack of complete data. We consider current status information, which represents a more severe form of censoring than the commonly used right censoring. Asymptotic validity of our estimators can be justified using consistency results for nonparametric regression estimators. The finite-sample behavior of our estimators is studied by simulation, in which we show that our estimators based on these limited data compare well with those based on complete data. We also apply our method to a real-life data set arising from a cardiovascular disease study in Taiwan.
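
For the simplest two-stage case (alive to dead), a classical nonparametric estimate from current status data is the isotonic regression of the "already transitioned" indicators on the inspection times, computed with the pool-adjacent-violators algorithm. The sketch below shows only that special case, assuming distinct inspection times; it is not the article's multistage estimator, and current_status_npmle is a hypothetical name.

```python
def current_status_npmle(inspection_times, has_transitioned):
    """Estimate F(t) = P(T <= t) from current status data by pool-adjacent-violators.

    inspection_times : the single observation time C_i of each unit (assumed distinct)
    has_transitioned : True if the unit had already left its initial stage at C_i
    Returns (time, F_hat) pairs at the sorted inspection times.
    """
    data = sorted(zip(inspection_times, has_transitioned))
    blocks = []  # each block holds [sum of indicators, number of units]
    for _, delta in data:
        blocks.append([float(delta), 1])
        # merge backwards while the block means decrease (monotonicity violated)
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return [(t, f) for (t, _), f in zip(data, fitted)]
```

The pooling step enforces that the estimated probability of having left the initial stage never decreases with time, which holds in any progressive model.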

6.
Imputation, weighting, direct likelihood, and direct Bayesian inference (Rubin, 1976) are important approaches for missing data regression. Many useful semiparametric estimators have been developed for regression analysis of data with missing covariates or outcomes. It has been established that some semiparametric estimators are asymptotically equivalent, but it has not been shown that many are numerically the same. We applied some existing methods to a bladder cancer case-control study and noted that they were the same numerically when the observed covariates and outcomes are categorical. To understand the analytical background of this finding, we further show that when observed covariates and outcomes are categorical, some estimators are not only asymptotically equivalent but also actually numerically identical. That is, although their estimating equations are different, they lead numerically to exactly the same root. This includes a simple weighted estimator, an augmented weighted estimator, and a mean-score estimator. The numerical equivalence may elucidate the relationship between imputing scores and weighted estimation procedures.
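
The simplest instance of this numerical equivalence can be checked directly: estimating the mean of an outcome Y that is missing at random given one categorical covariate X. Assuming the observation probability is estimated by the observed fraction within each X category, the simple weighted estimator and the mean-score (cell-mean imputation) estimator both equal (1/n) times the sum over categories of n_x times the observed mean in that category. This toy setting and the function names are illustrative assumptions, not the regression estimators studied in the article.

```python
from collections import defaultdict


def ipw_mean_cellwise(x, y, observed):
    """Weighted mean of Y with P(observed | X) estimated as the observed fraction in each X cell."""
    n_cell, n_obs = defaultdict(int), defaultdict(int)
    for xi, ri in zip(x, observed):
        n_cell[xi] += 1
        n_obs[xi] += ri
    total = sum(yi * n_cell[xi] / n_obs[xi] for xi, yi, ri in zip(x, y, observed) if ri)
    return total / len(x)


def mean_score_mean(x, y, observed):
    """Mean-score estimate: replace each missing Y by the observed mean of its X cell."""
    sums, counts = defaultdict(float), defaultdict(int)
    for xi, yi, ri in zip(x, y, observed):
        if ri:
            sums[xi] += yi
            counts[xi] += 1
    imputed = [yi if ri else sums[xi] / counts[xi] for xi, yi, ri in zip(x, y, observed)]
    return sum(imputed) / len(imputed)
```

Both functions assume every X category has at least one observed Y; otherwise neither estimator is defined.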

7.
We used survey data collected from a large plot (20 ha) of sub-tropical forest in the Dinghushan Nature Reserve, Guangdong Province, southern China, in 2005 to test the comparative performance of nine species-richness estimators (number of observed species, three species-individual curve models, five nonparametric estimators). As the true species richness, we used the 210 free-standing shrub and tree species of >1 cm diameter at breast height recorded during the survey. This true species richness was then used to calculate performance measures of bias, accuracy, and precision for each estimator, whereby we distinguished performance for low, medium, and high sampling intensity. Unsurprisingly, all estimators performed better than the number of observed species in terms of bias and accuracy. Surprisingly, however, two curve models (logistic and logarithm) outperformed all other estimators in terms of bias, accuracy, and precision, which is in contrast to most previous studies, in which nonparametric methods usually outperform curve models. Intriguingly, relative estimator performance changed between low, medium, and high sampling intensity, sometimes dramatically, reinforcing the assertion that the influence of sampling intensity on estimator performance is an important aspect to investigate and to consider when choosing estimators for ecological surveys. Because these results are based on only one dataset, they should be treated with caution, both because (1) the generality of these results needs to be confirmed with simulated datasets and (2) more work is needed to establish which “true” species richness each of the tested estimators extrapolates to, in both the statistical and the practical sense. Nevertheless, the two curve estimators, namely the logistic and logarithm models, should be considered in future studies of comparative performance of species-richness estimators because of their outstanding performance in this study.
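
As an example of the kind of nonparametric estimator compared here, the first-order jackknife adds to the observed richness a correction based on "uniques", species recorded in exactly one sampling unit. The abstract does not name its five nonparametric estimators, so this is an assumed, representative member of the family, not necessarily one of those tested; jackknife1_richness is a hypothetical name.

```python
def jackknife1_richness(samples):
    """First-order jackknife species-richness estimate from incidence data.

    samples : list of sets, each holding the species recorded in one sampling unit
    S_jack1 = S_obs + Q1 * (m - 1) / m, where Q1 counts species found in exactly
    one unit and m is the number of units.
    """
    m = len(samples)
    occurrences = {}
    for unit in samples:
        for sp in unit:
            occurrences[sp] = occurrences.get(sp, 0) + 1
    s_obs = len(occurrences)
    q1 = sum(1 for c in occurrences.values() if c == 1)
    return s_obs + q1 * (m - 1) / m
```

Usage: four units containing {a, b}, {a, c}, {a, d}, {b} give S_obs = 4 and Q1 = 2 (c and d), hence an estimate of 4 + 2 * 3/4 = 5.5 species.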

8.
Missing data, measurement error, and misclassification are three important problems in many research fields, such as epidemiological studies. It is well known that missing data and measurement error in covariates may lead to biased estimation. Misclassification may be considered a special type of measurement error for categorical data. Nevertheless, we treat misclassification as a different problem from measurement error because the statistical models for them are different. Indeed, methods for these three problems have generally been proposed separately in the literature, because their statistical models are very different. The problem is more challenging in a longitudinal study with nonignorable missing data. In this article, we consider estimation in generalized linear models under these three incomplete data models. We propose a general approach based on expected estimating equations (EEEs) to solve these three incomplete data problems in a unified fashion. This EEE approach can be easily implemented and its asymptotic covariance can be obtained by sandwich estimation. Extensive simulation studies are performed under various incomplete data settings. The proposed method is applied to a longitudinal study of oral bone density in relation to body bone density.
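
The EEE approach itself is not reproduced here, but the sandwich idea it relies on can be shown in its simplest setting: the ordinary least-squares slope with a heteroscedasticity-robust (HC0) standard error. The "bread" is the slope's sensitivity term (the centered sum of squares of x) and the "meat" is the sum of squared per-observation score contributions. This is an assumed illustrative setting; ols_slope_sandwich is a hypothetical name.

```python
import math


def ols_slope_sandwich(x, y):
    """OLS slope for y = a + b*x with its HC0 sandwich standard error."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)  # bread
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    # meat: squared score contribution (x_i - xbar) * e_i of each observation
    meat = sum(((xi - xbar) * ei) ** 2 for xi, ei in zip(x, resid))
    return b, math.sqrt(meat / sxx ** 2)
```

Unlike the model-based standard error, this variance estimate does not assume a constant residual variance, which is why sandwich forms are popular for estimating-equation methods.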

9.
Sensitivity and specificity are common measures used to evaluate the performance of a diagnostic test. A diagnostic test is often administered at a subunit level, e.g., at the level of the vessel, ear, or eye of a patient, so that the treatment can be targeted at the specific subunit. Therefore, it is essential to evaluate the diagnostic test at the subunit level. Often, patients with more negative subunit test results are less likely to receive the gold standard test than patients with more positive subunit test results. To account for this type of missing data and for the correlation between subunit test results, we propose a weighted generalized estimating equations (WGEE) approach to evaluate subunit sensitivities and specificities. A simulation study was conducted to evaluate the performance of the WGEE estimators and the weighted least squares (WLS) estimators (Barnhart and Kosinski, 2003) under a missing at random assumption. The results suggested that the WGEE estimator is consistent under various scenarios of missing data percentage and sample size, while the WLS approach could yield biased estimators due to a misspecified missing data mechanism. We illustrate the methodology with a cardiology example.
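
The weighting part of this idea can be sketched as verification-weighted sensitivity and specificity: each subunit that received the gold standard is weighted by the inverse of its assumed verification probability. This sketch gives point estimates only and ignores the within-patient correlation between subunits that the WGEE approach is designed to address; weighted_sens_spec is a hypothetical name.

```python
def weighted_sens_spec(test_pos, disease, verified, pi):
    """Inverse-probability-of-verification weighted sensitivity and specificity.

    test_pos : subunit test result (True = positive)
    disease  : gold-standard result, read only where verified[i] is True
    verified : True if the subunit received the gold-standard test
    pi       : assumed probability of verification given the observed data
    """
    tp = fn = tn = fp = 0.0
    for t, d, r, p in zip(test_pos, disease, verified, pi):
        if not r:
            continue  # unverified subunits contribute only through the weights
        w = 1.0 / p
        if d and t:
            tp += w
        elif d:
            fn += w
        elif t:
            fp += w
        else:
            tn += w
    return tp / (tp + fn), tn / (tn + fp)
```

Upweighting verified test-negative subunits, which are verified less often, is what removes the bias toward overstated sensitivity.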

10.
Analysis of time-to-event data in clinical and epidemiological studies often encounters missing covariate values, and the missing at random assumption is commonly adopted; it assumes that missingness depends only on the observed data, including the observed outcome, which is the minimum of the survival and censoring times. However, it is conceivable that in certain settings, missingness of covariate values is related to the survival time but not to the censoring time. This is especially so when covariate missingness is related to an unmeasured variable affected by the patient's illness and prognostic factors at baseline. If this is the case, then the covariate missingness is not at random, because the survival time is censored, and this creates a challenge in data analysis. In this article, we propose an approach to deal with such survival-time-dependent covariate missingness based on the well-known Cox proportional hazards model. Our method is based on inverse propensity weighting, with the propensity estimated by nonparametric kernel regression. Our estimators are consistent and asymptotically normal, and their finite-sample performance is examined through simulation. An application to a real-data example is included for illustration.
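
Estimating a propensity by nonparametric kernel regression can be sketched with a Nadaraya-Watson estimator: the probability that a covariate is observed, at a given value of the conditioning variable, is a kernel-weighted average of the observation indicators. This minimal sketch assumes a fully observed conditioning variable and a user-chosen Gaussian bandwidth; handling the fact that survival time is itself censored, the article's key difficulty, is not attempted. nw_propensity is a hypothetical name.

```python
import math


def nw_propensity(times, observed, bandwidth):
    """Nadaraya-Watson estimate of P(covariate observed | time) at each subject's time.

    times     : conditioning variable for each subject (assumed fully observed)
    observed  : 1 if the subject's covariate is observed, else 0
    bandwidth : Gaussian kernel bandwidth (a tuning choice assumed here)
    """
    def kernel(u):
        return math.exp(-0.5 * u * u)

    probs = []
    for t0 in times:
        w = [kernel((t - t0) / bandwidth) for t in times]
        probs.append(sum(wi * ri for wi, ri in zip(w, observed)) / sum(w))
    return probs
```

Complete cases would then enter a weighted Cox partial likelihood with weights 1/p; a small bandwidth tracks local missingness patterns closely but makes the weights noisier.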

11.
Datta S, Satten GA. Biometrics, 2002, 58(4): 792-802
We propose nonparametric estimators of the stage occupation probabilities and transition hazards for a multistage system that is not necessarily Markovian, using data that are subject to dependent right censoring. We assume that the hazard of being censored at a given instant depends on a possibly time-dependent covariate process as opposed to assuming a fixed censoring hazard (independent censoring). The estimator of the integrated transition hazard matrix has a Nelson-Aalen form where each of the counting processes counting the number of transitions between states and the risk sets for leaving each stage have an IPCW (inverse probability of censoring weighted) form. We estimate these weights using Aalen's linear hazard model. Finally, the stage occupation probabilities are obtained from the estimated integrated transition hazard matrix via product integration. Consistency of these estimators under the general paradigm of non-Markov models is established and asymptotic variance formulas are provided. Simulation results show satisfactory performance of these estimators. An analysis of data on graft-versus-host disease for bone marrow transplant patients is used as an illustration.
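
For a single transition out of one stage, the weighted Nelson-Aalen increment and the product-integration step can be sketched as follows: at each event time, the weighted number of transitions is divided by the weighted risk set, and survival in the stage is the product of (1 - increment). This is a simplified sketch: each subject's weight is assumed constant over time and supplied by the caller, whereas the article's IPCW weights vary with a time-dependent covariate process and are estimated with Aalen's linear hazard model. weighted_nelson_aalen is a hypothetical name.

```python
def weighted_nelson_aalen(times, events, weights):
    """Weighted Nelson-Aalen cumulative hazard and product-integral survival.

    times   : exit or censoring time for each subject
    events  : 1 if the transition was observed at that time, 0 if censored
    weights : per-subject weights, e.g. assumed IPCW weights 1 / P(uncensored)
    Returns a list of (event_time, cumulative_hazard, survival).
    """
    out = []
    cum_haz = 0.0
    surv = 1.0
    for s in sorted({t for t, e in zip(times, events) if e}):
        d = sum(w for t, e, w in zip(times, events, weights) if t == s and e)
        at_risk = sum(w for t, w in zip(times, weights) if t >= s)
        d_a = d / at_risk
        cum_haz += d_a
        surv *= 1.0 - d_a  # product integration over the hazard increments
        out.append((s, cum_haz, surv))
    return out
```

With all weights equal to 1, this reduces to the ordinary Nelson-Aalen estimator and the Kaplan-Meier product limit.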

12.
Estimating the richness of species with variable mobility (total citations: 4; self-citations: 0; citations by others: 4)
Ulrich Brose, Neo D. Martinez. Oikos, 2004, 105(2): 292-300
The vast majority of species are animals that, unlike most plants and fungi, are variably and often highly mobile. While species' mobility affects species' probabilities of being sampled, effects of movement on the estimation of species richness have yet to be systematically investigated. Information-rich abundance-based estimators may be able to address variably mobile species but the accuracy of these estimators has also yet to be investigated. Here, we address both issues by variably sampling simulated landscapes with up to 250 species and evaluating the performance of ten non-parametric estimators and one species accumulation curve. Our results show that some abundance-based estimators are as accurate as better known and tested incidence-based estimators. Increased movement heterogeneity between the species reduced estimator performance by reducing the sample coverage, which systematically determined which estimator was most accurate. Based on these findings, we present the first decision framework for choosing the most accurate of many available abundance-based species-richness estimators. These decisions, based on data coverage, can significantly improve investigators' ability to estimate faunal species richness.
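
Two quantities central to this abstract, an abundance-based richness estimator and sample coverage, can be sketched from singleton and doubleton counts. Below, Chao1 uses one common bias-corrected form and coverage uses the Good-Turing estimate 1 - F1/N. These are assumed standard definitions chosen for illustration; the abstract does not list which of its ten estimators or which coverage formula were used, and the function names are hypothetical.

```python
def chao1(abundances):
    """Bias-corrected Chao1 species-richness estimate from abundance counts.

    S_chao1 = S_obs + (N - 1) / N * F1 * (F1 - 1) / (2 * (F2 + 1)),
    with F1 singletons, F2 doubletons and N individuals in total.
    """
    counts = [a for a in abundances if a > 0]
    n_ind = sum(counts)
    s_obs = len(counts)
    f1 = sum(1 for a in counts if a == 1)
    f2 = sum(1 for a in counts if a == 2)
    return s_obs + (n_ind - 1) / n_ind * f1 * (f1 - 1) / (2 * (f2 + 1))


def sample_coverage(abundances):
    """Good-Turing sample coverage estimate, 1 - F1 / N."""
    counts = [a for a in abundances if a > 0]
    return 1.0 - sum(1 for a in counts if a == 1) / sum(counts)
```

Usage: abundances [1, 1, 1, 2, 5] give S_obs = 5, F1 = 3, F2 = 1 and N = 10, so Chao1 is 5 + 0.9 * 1.5 = 6.35 and coverage is 0.7; many singletons mean low coverage and a larger correction.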

13.
Predictive species distribution modelling (SDM) has become an essential tool in biodiversity conservation and management. The choice of grain size (resolution) of environmental layers used in modelling is one important factor that may affect predictions. We applied 10 distinct modelling techniques to presence-only data for 50 species in five different regions, to test whether: (1) a 10-fold coarsening of resolution affects predictive performance of SDMs, and (2) any observed effects are dependent on the type of region, modelling technique, or species considered. Results show that a 10-fold change in grain size does not severely affect predictions from species distribution models. The overall trend is towards degradation of model performance, but improvement can also be observed. Changing grain size does not equally affect models across regions, techniques, and species types. The strongest effect is on regions and species types, with tree species in the data sets (regions) with highest locational accuracy being most affected. Changing grain size had little influence on the ranking of techniques: boosted regression trees remain best at both resolutions. The number of occurrences used for model training had an important effect, with larger sample sizes resulting in better models, which tended to be more sensitive to grain. The effect of grain change was only noticeable for models reaching sufficient performance and/or with initial data that have an intrinsic error smaller than the coarser grain size.
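
Coarsening an environmental layer by a factor of 10, as in this experiment, is commonly done by aggregating square blocks of fine cells. A minimal sketch using block means follows; block averaging is an assumed aggregation rule (the abstract does not state the resampling method used), it suits continuous layers only, and coarsen_grid is a hypothetical name.

```python
def coarsen_grid(grid, factor):
    """Coarsen a 2-D raster (list of rows) by averaging factor x factor blocks.

    Assumes both grid dimensions are exact multiples of factor; for example,
    factor=10 turns 100 m cells into 1 km cells.
    """
    nrows, ncols = len(grid), len(grid[0])
    if nrows % factor or ncols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    coarse = []
    for i in range(0, nrows, factor):
        row = []
        for j in range(0, ncols, factor):
            block = [grid[r][c] for r in range(i, i + factor) for c in range(j, j + factor)]
            row.append(sum(block) / len(block))
        coarse.append(row)
    return coarse
```

Categorical layers such as land cover would instead need a majority (modal) rule per block.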

14.
Seaman SR, White IR, Copas AJ, Li L. Biometrics, 2012, 68(1): 129-137
Two approaches commonly used to deal with missing data are multiple imputation (MI) and inverse-probability weighting (IPW). IPW is also used to adjust for unequal sampling fractions. MI is generally more efficient than IPW but more complex. Whereas IPW requires only a model for the probability that an individual has complete data (a univariate outcome), MI needs a model for the joint distribution of the missing data (a multivariate outcome) given the observed data. Inadequacies in either model may lead to important bias if large amounts of data are missing. A third approach combines MI and IPW to give a doubly robust estimator. A fourth approach (IPW/MI) combines MI and IPW but, unlike doubly robust methods, imputes only isolated missing values and uses weights to account for remaining larger blocks of unimputed missing data, such as would arise, e.g., in a cohort study subject to sample attrition, and/or unequal sampling fractions. In this article, we examine the performance, in terms of bias and efficiency, of IPW/MI relative to MI and IPW alone and investigate whether Rubin's rules variance estimator is valid for IPW/MI. We prove that Rubin's rules variance estimator is valid for IPW/MI for linear regression with an imputed outcome, we present simulations supporting the use of this variance estimator in more general settings, and we demonstrate that IPW/MI can have advantages over alternatives. IPW/MI is applied to data from the National Child Development Study.
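
Rubin's rules, whose variance estimator is examined here, pool m completed-data analyses of a scalar parameter. The pooled estimate is the mean of the m estimates, and the total variance is T = W + (1 + 1/m) B, where W is the mean within-imputation variance and B is the between-imputation sample variance. The sketch below shows only the pooling step; in IPW/MI the per-imputation estimates would come from weighted analyses, which are assumed to be done elsewhere. rubins_rules is a hypothetical name.

```python
import statistics


def rubins_rules(estimates, variances):
    """Pool m completed-data estimates of a scalar with Rubin's rules.

    estimates : point estimate from each imputed data set
    variances : squared standard error from each imputed data set
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    q_bar = statistics.mean(estimates)
    w = statistics.mean(variances)            # within-imputation variance
    b = statistics.variance(estimates)        # between-imputation variance (divisor m - 1)
    return q_bar, w + (1 + 1 / m) * b
```

Usage: estimates [1.0, 2.0, 3.0] with variances of 0.5 each give a pooled estimate of 2.0 and total variance 0.5 + (4/3) * 1 = 1.8333.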

15.
We propose a method to estimate the regression coefficients in a competing risks model where the cause-specific hazard for the cause of interest is related to covariates through a proportional hazards relationship and when cause of failure is missing for some individuals. We use multiple imputation procedures to impute missing cause of failure, where the probability that a missing cause is the cause of interest may depend on auxiliary covariates, and combine the maximum partial likelihood estimators computed from several imputed data sets into an estimator that is consistent and asymptotically normal. A consistent estimator for the asymptotic variance is also derived. Simulation results suggest the relevance of the theory in finite samples. Results are also illustrated with data from a breast cancer study.
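
The imputation step can be sketched as drawing each missing cause-of-failure indicator from a Bernoulli distribution with an assumed, already-estimated probability that the failure is from the cause of interest, repeated to create several completed data sets. This simplified sketch does not draw the probabilities themselves from their sampling distribution, as a fully proper multiple imputation would, and it leaves the Cox partial-likelihood fits and the combining step to the caller; impute_causes is a hypothetical name.

```python
import random


def impute_causes(causes, p_interest, n_imputations, seed=0):
    """Create multiply imputed versions of a cause-of-failure indicator.

    causes     : 1 = cause of interest, 0 = other cause, None = cause missing
    p_interest : assumed estimate of P(cause of interest | auxiliary covariates)
                 for each subject, used only where the cause is missing
    Returns a list of n_imputations completed lists.
    """
    rng = random.Random(seed)
    completed = []
    for _ in range(n_imputations):
        completed.append([c if c is not None else int(rng.random() < p)
                          for c, p in zip(causes, p_interest)])
    return completed
```

Observed causes are copied unchanged into every completed data set; only the missing entries vary across imputations, and that variation feeds the between-imputation part of the variance.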

16.
In this article, we propose a two-stage approach to modeling multilevel clustered non-Gaussian data with sufficiently large numbers of continuous measures per cluster. Such data are common in biological and medical studies utilizing monitoring or image-processing equipment. We consider a general class of hierarchical models that generalizes the model in the global two-stage (GTS) method for nonlinear mixed effects models by using any square-root-n-consistent and asymptotically normal estimators from stage 1 as pseudodata in the stage 2 model, and by extending the stage 2 model to accommodate random effects from multiple levels of clustering. The second-stage model is a standard linear mixed effects model with normal random effects, but the cluster-specific distributions, conditional on random effects, can be non-Gaussian. This methodology provides a flexible framework for modeling not only a location parameter but also other characteristics of conditional distributions that may be of specific interest. For estimation of the population parameters, we propose a conditional restricted maximum likelihood (CREML) approach and establish the asymptotic properties of the CREML estimators. The proposed general approach is illustrated using quartiles as cluster-specific parameters estimated in the first stage, and applied to the data example from a collagen fibril development study. We demonstrate using simulations that in samples with small numbers of independent clusters, the CREML estimators may perform better than conditional maximum likelihood estimators, which are a direct extension of the estimators from the GTS method.
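
Stage 1 of the quartile illustration can be sketched as computing, for every cluster, its sample quartiles, which then serve as pseudodata for the stage 2 linear mixed model. The sketch covers stage 1 only (the CREML fit is not shown) and uses the default "exclusive" interpolation of Python's statistics.quantiles, an assumed choice among several quantile definitions; stage1_quartiles is a hypothetical name.

```python
import statistics


def stage1_quartiles(clusters):
    """Stage 1 of a two-stage analysis: per-cluster quartiles as pseudodata.

    clusters : dict mapping a cluster id to its list of continuous measures
    Returns a dict mapping each cluster id to its (Q1, median, Q3).
    """
    return {cid: tuple(statistics.quantiles(values, n=4))
            for cid, values in clusters.items()}
```

Usage: a cluster with measures 1 through 7 yields (2.0, 4.0, 6.0); with many measures per cluster, the choice of interpolation rule matters little.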

17.
The present paper discusses models of Configural Frequency Analysis (CFA). For most models of CFA maximum likelihood estimators are given. For all of these models least squares estimators are also given. These estimators are equivalent to each other if quasiparametric conditions prevail. Using the second approach, the general linear model can be used to systematize CFA models. Numerical examples are given, using both artificial and psychiatric data.
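
The basic step of first-order CFA can be sketched for a two-way table: expected cell frequencies come from the independence (main-effects) base model, whose maximum likelihood estimates are row total times column total divided by N, and each cell gets a z-type statistic (O - E) / sqrt(E). Large positive values suggest a "type" and large negative values an "antitype". The z approximation is one of several cell tests used in CFA, and in practice significance levels are usually adjusted for multiple cells; cfa_two_way is a hypothetical name.

```python
import math


def cfa_two_way(table):
    """First-order CFA on a two-way table of observed frequencies.

    Returns one tuple per cell: ((row, col), observed, expected, z).
    """
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    result = []
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n  # independence base model
            result.append(((i, j), obs, exp, (obs - exp) / math.sqrt(exp)))
    return result
```

Usage: in the table [[30, 10], [10, 30]] every expected frequency is 20, so the diagonal cells have z of about +2.24 (candidate types) and the off-diagonal cells about -2.24 (candidate antitypes).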

18.
This article considers three nonparametric estimators of the joint distribution function for a survival time and a continuous mark variable when the survival time is interval censored and the mark variable may be missing for interval-censored observations. Finite and large sample properties are described for the nonparametric maximum likelihood estimator (NPMLE) as well as estimators based on midpoint imputation (MIDMLE) and coarsening the mark variable (CMLE). The estimators are compared using data from a simulation study and a recent phase III HIV vaccine efficacy trial where the survival time is the time from enrollment to infection and the mark variable is the genetic distance from the infecting HIV sequence to the HIV sequence in the vaccine. Theoretical and empirical evidence is presented indicating that the NPMLE and MIDMLE are inconsistent. Conversely, the CMLE is shown to be consistent in general and thus is preferred.

19.
20.
It is not uncommon to encounter a randomized clinical trial (RCT) in which there are confounders that need to be controlled for and patients who do not comply with their assigned treatments. In this paper, we concentrate on interval estimation of the proportion ratio (PR) of the probabilities of response between two treatments in a stratified noncompliance RCT. We have developed and considered five asymptotic interval estimators for the PR: the interval estimator using the weighted least squares (WLS) estimator, the interval estimator using the Mantel-Haenszel type of weight, the interval estimator derived from Fieller's Theorem with the corresponding WLS optimal weight, the interval estimator derived from Fieller's Theorem with the randomization-based optimal weight, and the interval estimator based on a stratified two-sample proportion test with the optimal weight suggested elsewhere. To evaluate and compare the finite sample performance of these estimators, we apply Monte Carlo simulation to calculate the coverage probability and average length in a variety of situations. We discuss the limitations and usefulness of each of these interval estimators, and include a general guideline about which estimators may be used in various situations.
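
The Mantel-Haenszel type of weighting mentioned above can be illustrated with the ordinary Mantel-Haenszel estimator of a common proportion (risk) ratio across strata. This sketch compares response proportions by assigned group only and does not adjust for noncompliance, which the article's estimators are built to handle; mh_risk_ratio is a hypothetical name.

```python
def mh_risk_ratio(strata):
    """Mantel-Haenszel estimate of a common proportion ratio across strata.

    strata : list of (a, n1, c, n0) per stratum, where a of n1 subjects respond
             in group 1 and c of n0 subjects respond in group 0.
    RR_MH = sum(a * n0 / N) / sum(c * n1 / N), with N = n1 + n0 in each stratum.
    """
    num = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata)
    den = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata)
    return num / den
```

Usage: strata (20, 50, 10, 50) and (6, 30, 3, 30) each have a proportion ratio of 2, and the pooled estimate is 13 / 6.5 = 2.0.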

