Similar Documents
20 similar documents retrieved.
1.
In cluster randomized trials (CRTs), identifiable clusters rather than individuals are randomized to study groups. Resulting data often consist of a small number of clusters with correlated observations within a treatment group. Missing data often present a problem in the analysis of such trials, and multiple imputation (MI) has been used to create complete data sets, enabling subsequent analysis with well-established analysis methods for CRTs. We discuss strategies for accounting for clustering when multiply imputing a missing continuous outcome, focusing on estimation of the variance of group means as used in an adjusted t-test or ANOVA. These analysis procedures are congenial to (can be derived from) a mixed effects imputation model; however, this imputation procedure is not yet available in commercial statistical software. An alternative approach that is readily available and has been used in recent studies is to include fixed effects for cluster, but the impact of using this convenient method has not been studied. We show that under this imputation model the MI variance estimator is positively biased and that smaller intraclass correlations (ICCs) lead to larger overestimation of the MI variance. Analytical expressions for the bias of the variance estimator are derived in the case of data missing completely at random, and cases in which data are missing at random are illustrated through simulation. Finally, various imputation methods are applied to data from the Detroit Middle School Asthma Project, a recent school-based CRT, and differences in inference are compared.
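The MI variance estimator discussed above comes from pooling the completed-data analyses with Rubin's rules. A minimal sketch of that pooling step, using hypothetical group-mean estimates rather than data from the article:

```python
def rubin_pool(estimates, variances):
    """Combine m completed-data estimates via Rubin's rules.

    Returns the pooled estimate and its total variance
    T = W + (1 + 1/m) * B, where W is the mean within-imputation
    variance and B is the between-imputation variance.
    """
    m = len(estimates)
    q_bar = sum(estimates) / m                    # pooled point estimate
    w_bar = sum(variances) / m                    # within-imputation variance W
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)  # between-imputation B
    total = w_bar + (1 + 1 / m) * b
    return q_bar, total

# Hypothetical group-mean estimates and variances from m = 5 imputed data sets
est = [10.2, 9.8, 10.5, 10.0, 10.0]
var = [0.40, 0.42, 0.38, 0.41, 0.39]
q, t = rubin_pool(est, var)   # q = 10.1, t = 0.40 + 1.2 * 0.07 = 0.484
```

The bias discussed in the abstract enters through the within- and between-imputation components: an imputation model with fixed cluster effects changes both W and B relative to a congenial mixed-effects model.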

2.
We focus on the problem of generalizing a causal effect estimated on a randomized controlled trial (RCT) to a target population described by a set of covariates from observational data. Available methods such as inverse propensity sampling weighting are not designed to handle missing values, which are however common in both data sources. In addition to coupling the assumptions for causal effect identifiability and for the missing values mechanism, and to defining appropriate estimation strategies, one difficulty is accounting for the specific structure of the data, with two sources and with treatment and outcome available only in the RCT. We propose three multiple imputation strategies to handle missing values when generalizing treatment effects, each handling the multisource structure of the problem differently (separate imputation, joint imputation with fixed effect, joint imputation ignoring source information). As an alternative to multiple imputation, we also propose a direct estimation approach that treats incomplete covariates as semidiscrete variables. The multiple imputation strategies and the latter alternative rely on different sets of assumptions concerning the impact of missing values on identifiability. We discuss these assumptions and assess the methods through an extensive simulation study. This work is motivated by the analysis of a large registry of over 20,000 major trauma patients and an RCT studying the effect of tranexamic acid administration on mortality in major trauma patients admitted to intensive care units. The analysis illustrates how the handling of missing values can affect the conclusion about the effect generalized from the RCT to the target population.

3.
Longitudinal data often encounter missingness with monotone and/or intermittent missing patterns. Multiple imputation (MI) has been popularly employed for analysis of missing longitudinal data. In particular, the MI‐GEE method has been proposed for inference of generalized estimating equations (GEE) when missing data are imputed via MI. However, little is known about how to perform model selection with multiply imputed longitudinal data. In this work, we extend the existing GEE model selection criteria, including the “quasi‐likelihood under the independence model criterion” (QIC) and the “missing longitudinal information criterion” (MLIC), to accommodate multiply imputed datasets for selection of the MI‐GEE mean model. According to real data analyses from a schizophrenia study and an AIDS study, as well as simulations under nonmonotone missingness with a moderate proportion of missing observations, we conclude that: (i) more than a few imputed datasets are required for stable and reliable model selection in MI‐GEE analysis; (ii) the MI‐based GEE model selection methods with a suitable number of imputations generally perform well, while the naive application of existing model selection methods by simply ignoring missing observations may lead to very poor performance; (iii) the model selection criteria based on improper (frequentist) multiple imputation generally perform better than their analogues based on proper (Bayesian) multiple imputation.

4.
In this article, we address a missing data problem that occurs in transplant survival studies. Recipients of organ transplants are followed up from transplantation and their survival times recorded, together with various explanatory variables. Due to differences in data collection procedures in different centers or over time, a particular explanatory variable (or set of variables) may only be recorded for certain recipients, which results in this variable being missing for a substantial number of records in the data. The variable may also turn out to be an important predictor of survival and so it is important to handle this missing-by-design problem appropriately. Consensus in the literature is to handle this problem with complete case analysis, as the missing data are assumed to arise under an appropriate missing at random mechanism that gives consistent estimates here. Specifically, the missing values can reasonably be assumed not to be related to the survival time. In this article, we investigate the potential for multiple imputation to handle this problem in a relevant study on survival after kidney transplantation, and show that it comprehensively outperforms complete case analysis on a range of measures. This is a particularly important finding in the medical context as imputing large amounts of missing data is often viewed with scepticism.

5.
It is not uncommon for biological anthropologists to analyze incomplete bioarcheological or forensic skeleton specimens. As many quantitative multivariate analyses cannot handle incomplete data, missing data imputation or estimation is a common preprocessing practice for such data. Using William W. Howells' Craniometric Data Set and the Goldman Osteometric Data Set, we evaluated the performance of multiple popular statistical methods for imputing missing metric measurements. Results indicated that multiple imputation methods outperformed single imputation methods, such as Bayesian principal component analysis (BPCA). Multiple imputation with Bayesian linear regression implemented in the R package norm2, the Expectation–Maximization (EM) with Bootstrapping algorithm implemented in Amelia, and the Predictive Mean Matching (PMM) method and several of its derivative linear regression models implemented in mice all perform well in terms of accuracy, robustness, and speed. Based on the findings of this study, we suggest a practical procedure for choosing appropriate imputation methods.

6.
Multiple imputation (MI) is increasingly popular for handling multivariate missing data. Two general approaches are available in standard computer packages: MI based on the posterior distribution of incomplete variables under a multivariate (joint) model, and fully conditional specification (FCS), which imputes missing values using univariate conditional distributions for each incomplete variable given all the others, cycling iteratively through the univariate imputation models. In the context of longitudinal or clustered data, it is not clear whether these approaches result in consistent estimates of regression coefficient and variance component parameters when the analysis model of interest is a linear mixed effects model (LMM) that includes both random intercepts and slopes, and either the covariates alone or both the covariates and the outcome contain missing information. In the current paper, we compared the performance of seven different MI methods for handling missing values in longitudinal and clustered data in the context of fitting LMMs with both random intercepts and slopes. We study the theoretical compatibility between specific imputation models fitted under each of these approaches and the LMM, and also conduct simulation studies in both the longitudinal and clustered data settings. Simulations were motivated by analyses of the association between body mass index (BMI) and quality of life (QoL) in the Longitudinal Study of Australian Children (LSAC). Our findings showed that the relative performance of MI methods varies according to whether the incomplete covariate has fixed or random effects and whether there is missingness in the outcome variable. We showed that compatible imputation and analysis models resulted in consistent estimation of both regression parameters and variance components via simulation. We illustrate our findings with the analysis of LSAC data.
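The FCS approach described above can be sketched as a chained-equations loop. This is a deliberately minimal illustration, with a single stochastic linear imputation model per column and a hypothetical two-column dataset, not any of the seven methods compared in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def fcs_impute(X, n_iter=10):
    """Minimal fully-conditional-specification sketch: cycle through the
    incomplete columns, regress each on all the others by least squares,
    and replace its missing entries with predictions plus residual noise.
    This produces a single stochastic imputation; for MI, repeat with
    different random seeds and pool with Rubin's rules."""
    X = X.copy()
    miss = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    for j in range(X.shape[1]):            # start from column means
        X[miss[:, j], j] = col_means[j]
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            if not miss[:, j].any():
                continue
            obs = ~miss[:, j]
            others = np.delete(X, j, axis=1)
            A = np.column_stack([np.ones(len(X)), others])
            beta, *_ = np.linalg.lstsq(A[obs], X[obs, j], rcond=None)
            resid = X[obs, j] - A[obs] @ beta
            sigma = resid.std(ddof=A.shape[1]) if obs.sum() > A.shape[1] else 0.0
            X[miss[:, j], j] = A[miss[:, j]] @ beta + rng.normal(0, sigma, miss[:, j].sum())
    return X

# toy data: two correlated columns, with 20% of the second column missing
n = 200
x1 = rng.normal(size=n)
x2 = 2 * x1 + rng.normal(scale=0.5, size=n)
X = np.column_stack([x1, x2])
X[rng.random(n) < 0.2, 1] = np.nan
Xc = fcs_impute(X)
```

Joint-model MI instead draws all incomplete variables from one multivariate posterior; the paper's question is which of these specifications is compatible with an LMM analysis model.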

7.
Yang X, Belin TR, Boscardin WJ. Biometrics 2005, 61(2):498-506
Across multiply imputed data sets, variable selection methods such as stepwise regression and other criterion-based strategies that include or exclude particular variables typically result in models with different selected predictors, thus presenting a problem for combining the results from separate complete-data analyses. Here, drawing on a Bayesian framework, we propose two alternative strategies to address the problem of choosing among linear regression models when there are missing covariates. One approach, which we call "impute, then select" (ITS), involves initially performing multiple imputation and then applying Bayesian variable selection to the multiply imputed data sets. A second strategy is to conduct Bayesian variable selection and missing data imputation simultaneously within one Gibbs sampling process, which we call "simultaneously impute and select" (SIAS). The methods are implemented and evaluated using the Bayesian procedure known as stochastic search variable selection for multivariate normal data sets, but both strategies offer general frameworks within which different Bayesian variable selection algorithms could be used for other types of data sets. A study of mental health services utilization among children in foster care programs is used to illustrate the techniques. Simulation studies show that both ITS and SIAS outperform complete-case analysis with stepwise variable selection and that SIAS slightly outperforms ITS.

8.
Multiple imputation (MI) has emerged in the last two decades as a frequently used approach in dealing with incomplete data. Gaussian and log‐linear imputation models are fairly straightforward to implement for continuous and discrete data, respectively. However, in missing data settings that include a mix of continuous and discrete variables, the lack of flexible models for the joint distribution of different types of variables can make the specification of the imputation model a daunting task. The widespread availability of software packages that are capable of carrying out MI under the assumption of joint multivariate normality allows applied researchers to address this complication pragmatically by treating the discrete variables as continuous for imputation purposes and subsequently rounding the imputed values to the nearest observed category. In this article, we compare several rounding rules for binary variables based on simulated longitudinal data sets that have been used to illustrate other missing‐data techniques. Using a combination of conditional and marginal data generation mechanisms and imputation models, we study the statistical properties of multiple‐imputation‐based estimates for various population quantities under different rounding rules from bias and coverage standpoints. We conclude that a good rule should be driven by borrowing information from other variables in the system rather than relying on the marginal characteristics and should be relatively insensitive to imputation model specifications that may potentially be incompatible with the observed data. We also urge researchers to consider the applied context and specific nature of the problem, to avoid uncritical and possibly inappropriate use of rounding in imputation models.
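A toy illustration of why the choice of rounding rule matters (hypothetical data, not the simulations used in the article): naive rounding of normally imputed values at 0.5 distorts the marginal proportion of an uncommon binary variable, whereas a threshold matched to the observed marginal, one of the simpler rules in this literature, preserves it. The article's point goes further: the better rules also borrow information from other variables rather than relying on the marginal alone.

```python
import numpy as np

rng = np.random.default_rng(1)

# Binary variable with true prevalence 0.2, "imputed" under a normal model
p_true = 0.2
y_obs = rng.binomial(1, p_true, size=5000)     # observed part of the variable
mu, sd = y_obs.mean(), y_obs.std(ddof=1)       # normal-model fit to observed data
y_imp = rng.normal(mu, sd, size=5000)          # continuous imputed values

naive = (y_imp >= 0.5).astype(int)             # simple rounding at 0.5
thresh = np.quantile(y_imp, 1 - mu)            # cutoff matching observed marginal
marginal = (y_imp >= thresh).astype(int)       # marginal-matching rounding
```

Here the naive rule inflates the imputed prevalence above 0.2, while the marginal-matching rule reproduces the observed proportion almost exactly.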

9.
DNA methylation is a widely studied epigenetic mechanism and alterations in methylation patterns may be involved in the development of common diseases. Unlike inherited changes in genetic sequence, variation in site-specific methylation varies by tissue, developmental stage, and disease status, and may be impacted by aging and exposure to environmental factors, such as diet or smoking. These non-genetic factors are typically included in epigenome-wide association studies (EWAS) because they may be confounding factors to the association between methylation and disease. However, missing values in these variables can lead to reduced sample size and decrease the statistical power of EWAS. We propose a site selection and multiple imputation (MI) method to impute missing covariate values and to perform association tests in EWAS. Then, we compare this method to an alternative projection-based method. Through simulations, we show that the MI-based method is slightly conservative, but provides consistent estimates for effect size. We also illustrate these methods with data from the Atherosclerosis Risk in Communities (ARIC) study to carry out an EWAS between methylation levels and smoking status, in which missing cell type compositions and white blood cell counts are imputed.

10.
Food frequency questionnaires (FFQs) are commonly used to assess dietary intake in epidemiologic research. To evaluate the FFQ reliability, the commonly used approach is to estimate the correlation coefficient between the data given in FFQ and those in food records (for example, 4-day food records [4DFR]) for nutrients of interest. However, in a dietary intervention study, a criterion for eligibility may be to select participants who have baseline FFQ-measured dietary intake of percent energy from fat above a prespecified quantity. Other instruments, such as the 4DFR, may subsequently be administered only to eligible participants. Under these circumstances, analysis without adjusting for the restricted population will usually lead to biased estimation of correlation coefficients and other parameters of interest. In this paper, we apply likelihood-based and multiple imputation (MI) methods to accommodate such incomplete data obtained as a result of the study design. A simulation study is conducted to examine finite sample performance of various estimators. We note that both the MI estimate and the maximum likelihood (ML) estimate based on a bivariate-normal model are not sensitive to departures from this normality assumption. This led us to investigate robustness properties of the ML estimator analytically. We present some data analyses from a dietary assessment study from the Women's Health Initiative to illustrate the methods.

11.
Analysts often estimate treatment effects in observational studies using propensity score matching techniques. When there are missing covariate values, analysts can multiply impute the missing data to create m completed data sets. Analysts can then estimate propensity scores on each of the completed data sets and use these to estimate treatment effects. However, there has been relatively little attention to developing imputation models that deal with the additional problem of missing treatment indicators, perhaps due to the consequences of generating implausible imputations. Yet simply ignoring the missing treatment values, akin to a complete case analysis, could also lead to problems when estimating treatment effects. We propose a latent class model to multiply impute missing treatment indicators. We illustrate its performance through simulations and with data taken from a study on determinants of children's cognitive development. This approach obtains treatment effect estimates closer to the true treatment effect than conventional imputation procedures or a complete case analysis.

12.
Exposure to air pollution is associated with increased morbidity and mortality. Recent technological advancements permit the collection of time-resolved personal exposure data. Such data are often incomplete with missing observations and exposures below the limit of detection, which limit their use in health effects studies. In this paper, we develop an infinite hidden Markov model for multiple asynchronous multivariate time series with missing data. Our model is designed to include covariates that can inform transitions among hidden states. We implement beam sampling, a combination of slice sampling and dynamic programming, to sample the hidden states, and a Bayesian multiple imputation algorithm to impute missing data. In simulation studies, our model excels in estimating hidden states and state-specific means and imputing observations that are missing at random or below the limit of detection. We validate our imputation approach on data from the Fort Collins Commuter Study. We show that the estimated hidden states improve imputations for data that are missing at random compared to existing approaches. In a case study of the Fort Collins Commuter Study, we describe the inferential gains obtained from our model including improved imputation of missing data and the ability to identify shared patterns in activity and exposure among repeated sampling days for individuals and among distinct individuals.

13.
The log response ratio, lnRR, is the most frequently used effect size statistic for meta-analysis in ecology. However, often missing standard deviations (SDs) prevent estimation of the sampling variance of lnRR. We propose new methods to deal with missing SDs via a weighted average coefficient of variation (CV) estimated from studies in the dataset that do report SDs. Across a suite of simulated conditions, we find that using the average CV to estimate sampling variances for all observations, regardless of missingness, performs with minimal bias. Surprisingly, even with missing SDs, this simple method outperforms the conventional approach (basing each effect size on its individual study-specific CV) with complete data. This is because the conventional method ultimately yields less precise estimates of the sampling variances than using the pooled CV from multiple studies. Our approach is broadly applicable and can be implemented in all meta-analyses of lnRR, regardless of ‘missingness’.
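A sketch of the pooled-CV idea for lnRR. All numbers are hypothetical; the first-order variance formula var(lnRR) ≈ CV_t²/n_t + CV_c²/n_c is standard, though the authors' estimator may include higher-order terms and a different CV weighting than the sample-size weighting assumed here:

```python
import math

def lnrr_and_var(mean_t, mean_c, n_t, n_c, cv2_t, cv2_c):
    """Log response ratio and its first-order sampling variance from
    squared coefficients of variation. With missing SDs, cv2_t / cv2_c
    can be a weighted-average CV^2 pooled across the studies that do
    report SDs, applied to every observation."""
    lnrr = math.log(mean_t / mean_c)
    var = cv2_t / n_t + cv2_c / n_c
    return lnrr, var

# pooled CV^2 from studies that report SDs (hypothetical (CV, n) pairs),
# weighted here by sample size
cvs = [(0.30, 20), (0.25, 35), (0.40, 15)]
cv2_pool = sum(n * cv**2 for cv, n in cvs) / sum(n for _, n in cvs)

es, v = lnrr_and_var(mean_t=12.0, mean_c=10.0, n_t=25, n_c=25,
                     cv2_t=cv2_pool, cv2_c=cv2_pool)
```

Using the same pooled CV² for every effect size, including those whose SDs were reported, is the simple rule that the simulations above find performs with minimal bias.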

14.
Functional trait databases are powerful tools in ecology, though most of them contain large amounts of missing values. The goal of this study was to test the effect of imputation methods on the evaluation of trait values at species level and on the subsequent calculation of functional diversity indices at community level using functional trait databases. Two simple imputation methods (average and median), two methods based on ecological hypotheses, and one multiple imputation method were tested using a large plant trait database, together with the influence of the percentage of missing data and differences between functional traits. At community level, the complete‐case approach and three functional diversity indices calculated from grassland plant communities were included. At the species level, one of the methods based on ecological hypotheses was, for all traits, more accurate than imputation with average or median values, but the multiple imputation method was superior for most of the traits. The method based on functional proximity between species was the best method for traits with an unbalanced distribution, while the method based on the existence of relationships between traits was the best for traits with a balanced distribution. The ranking of the grassland communities for their functional diversity indices was not robust with the complete‐case approach, even for low percentages of missing data. With the imputation methods based on ecological hypotheses, functional diversity indices could be computed with a maximum of 30% of missing data without affecting the ranking between grassland communities. The multiple imputation method performed well, but not better than single imputation based on ecological hypotheses and adapted to the distribution of the trait values, for the functional identity and range of the communities.
Ecological studies using functional trait databases have to deal with missing data using imputation methods corresponding to their specific needs and making the most of the information available in the databases. Within this framework, this study indicates the possibilities and limits of single imputation methods based on ecological hypotheses and concludes that they could be useful when studying the ranking of communities for their functional diversity indices.

15.
Wang T, Wu L. Biometrics 2011, 67(4):1452-1460
Multivariate one-sided hypothesis testing problems arise frequently in practice. Various tests have been developed. In practice, there are often missing values in multivariate data. In this case, standard testing procedures based on complete data may not be applicable or may perform poorly if the missing data are discarded. In this article, we propose several multiple imputation methods for multivariate one-sided testing problems with missing data. Some theoretical results are presented. The proposed methods are evaluated using simulations. A real data example is presented to illustrate the methods.

16.
Yuan Y, Little RJ. Biometrics 2009, 65(2):487-496
Consider a meta-analysis of studies with varying proportions of patient-level missing data, and assume that each primary study has made certain missing data adjustments so that the reported estimates of treatment effect size and variance are valid. These estimates of treatment effects can be combined across studies by standard meta-analytic methods, employing a random-effects model to account for heterogeneity across studies. However, we note that a meta-analysis based on the standard random-effects model will lead to biased estimates when the attrition rates of primary studies depend on the size of the underlying study-level treatment effect. Perhaps ignorable within each study, these types of missing data are in fact not ignorable in a meta-analysis. We propose three methods to correct the bias resulting from such missing data in a meta-analysis: reweighting the DerSimonian–Laird estimate by the completion rate; incorporating the completion rate into a Bayesian random-effects model; and inference based on a Bayesian shared-parameter model that includes the completion rate. We illustrate these methods through a meta-analysis of 16 published randomized trials that examined combined pharmacotherapy and psychological treatment for depression.
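For reference, a sketch of the standard DerSimonian–Laird random-effects estimate that the proposed corrections build on, with hypothetical study-level effects and variances. The completion-rate reweighting mentioned in the docstring is a schematic reading of the first correction, not the authors' exact estimator:

```python
def dersimonian_laird(effects, variances):
    """Standard DerSimonian-Laird random-effects meta-analysis.
    Returns (pooled effect, tau^2). A completion-rate correction in the
    spirit of the first proposed method could multiply each initial
    weight w_i by that study's completion rate (a sketch of the idea)."""
    w = [1 / v for v in variances]
    y_bar = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - y_bar) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    k = len(effects)
    denom = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / denom)        # method-of-moments tau^2
    w_star = [1 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    return pooled, tau2

# hypothetical effects and within-study variances from four trials
eff = [0.80, 0.10, 0.60, -0.20]
var = [0.04, 0.05, 0.03, 0.06]
theta, tau2 = dersimonian_laird(eff, var)
```

When attrition depends on the study-level effect, the weights 1/(v + tau²) are computed from a selected set of effects, which is the source of the bias the three corrections target.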

17.
Data with missing covariate values but fully observed binary outcomes are an important subset of the missing data challenge. Common approaches are complete case analysis (CCA) and multiple imputation (MI). While CCA relies on missing completely at random (MCAR), MI usually relies on a missing at random (MAR) assumption to produce unbiased results. For MI involving logistic regression models, it is also important to consider several missing not at random (MNAR) conditions under which CCA is asymptotically unbiased and, as we show, MI is also valid in some cases. We use a data application and simulation study to compare the performance of several machine learning and parametric MI methods under a fully conditional specification framework (MI-FCS). Our simulation includes five scenarios involving MCAR, MAR, and MNAR under predictable and nonpredictable conditions, where “predictable” indicates missingness is not associated with the outcome. We build on previous results in the literature to show MI and CCA can both produce unbiased results under more conditions than some analysts may realize. When both approaches were valid, we found that MI-FCS was at least as good as CCA in terms of estimated bias and coverage, and was superior when missingness involved a categorical covariate. We also demonstrate how MNAR sensitivity analysis can build confidence that unbiased results were obtained, including under MNAR-predictable, when CCA and MI are both valid. Since the missingness mechanism cannot be identified from observed data, investigators should compare results from MI and CCA when both are plausibly valid, followed by MNAR sensitivity analysis.

18.
GEE with Gaussian estimation of the correlations when data are incomplete
This paper considers a modification of generalized estimating equations (GEE) for handling missing binary response data. The proposed method uses Gaussian estimation of the correlation parameters, i.e., the estimating function that yields an estimate of the correlation parameters is obtained from the multivariate normal likelihood. The proposed method yields consistent estimates of the regression parameters when data are missing completely at random (MCAR). However, when data are missing at random (MAR), consistency may not hold. In a simulation study with repeated binary outcomes that are missing at random, the magnitude of the potential bias that can arise is examined. The results of the simulation study indicate that, when the working correlation matrix is correctly specified, the bias is almost negligible for the modified GEE. In the simulation study, the proposed modification of GEE is also compared to the standard GEE, multiple imputation, and weighted estimating equations approaches. Finally, the proposed method is illustrated using data from a longitudinal clinical trial comparing two therapeutic treatments, zidovudine (AZT) and didanosine (ddI), in patients with HIV.

19.
In problems with missing or latent data, a standard approach is to first impute the unobserved data, then perform all statistical analyses on the completed dataset (the observed data together with the imputed unobserved data) using standard procedures for complete-data inference. Here, we extend this approach to model checking by demonstrating the advantages of the use of completed-data model diagnostics on imputed completed datasets. The approach is set in the theoretical framework of Bayesian posterior predictive checks (but, as with missing-data imputation, our methods of missing-data model checking can also be interpreted as "predictive inference" in a non-Bayesian context). We consider graphical diagnostics within this framework. Advantages of the completed-data approach include: (1) one can often check model fit in terms of quantities that are of key substantive interest in a natural way, which is not always possible using observed data alone; (2) in problems with missing data, checks may be devised that do not require modeling the missingness or inclusion mechanism, which is useful for the analysis of ignorable but unknown data collection mechanisms, such as are often assumed in the analysis of sample surveys and observational studies; (3) in many problems with latent data, it is possible to check qualitative features of the model (for example, independence of two variables) that can be naturally formalized with the help of the latent data. We illustrate with several applied examples.

20.
Taylor L, Zhou XH. Biometrics 2009, 65(1):88-95
Randomized clinical trials are a powerful tool for investigating causal treatment effects, but in human trials there are oftentimes problems of noncompliance which standard analyses, such as the intention-to-treat or as-treated analysis, either ignore or incorporate in such a way that the resulting estimand is no longer a causal effect. One alternative to these analyses is the complier average causal effect (CACE), which estimates the average causal treatment effect among the subpopulation that would comply under any treatment assigned. We focus on the setting of a randomized clinical trial with crossover treatment noncompliance (e.g., control subjects could receive the intervention and intervention subjects could receive the control) and outcome nonresponse. In this article, we develop estimators for the CACE using multiple imputation methods, which have been successfully applied to a wide variety of missing data problems but have not yet been applied to the potential outcomes setting of causal inference. Using simulated data, we investigate the finite sample properties of these estimators as well as of competing procedures in a simple setting. Finally, we illustrate our methods using a real randomized encouragement design study on the effectiveness of the influenza vaccine.

