1.

Doubly robust estimates for binary longitudinal data analysis with missing response and missing covariates

Chen B, Zhou XH 《Biometrics》2011,67(3):830-842

Longitudinal studies often feature incomplete response and covariate data. Likelihood-based methods such as the expectation-maximization algorithm give consistent estimators for model parameters when data are missing at random (MAR) provided that the response model and the missing covariate model are correctly specified; however, we do not need to specify the missing data mechanism. An alternative method is the weighted estimating equation, which gives consistent estimators if the missing data and response models are correctly specified; however, we do not need to specify the distribution of the covariates that have missing values. In this article, we develop a doubly robust estimation method for longitudinal data with missing response and missing covariate when data are MAR. This method is appealing in that it can provide consistent estimators if either the missing data model or the missing covariate model is correctly specified. Simulation studies demonstrate that this method performs well in a variety of situations.

2.

Marginal analysis of incomplete longitudinal binary data: a cautionary note on LOCF imputation

Cook RJ, Zeng L, Yi GY 《Biometrics》2004,60(3):820-828

In recent years there has been considerable research devoted to the development of methods for the analysis of incomplete data in longitudinal studies. Despite these advances, the methods used in practice have changed relatively little, particularly in the reporting of pharmaceutical trials. In this setting, perhaps the most widely adopted strategy for dealing with incomplete longitudinal data is imputation by the "last observation carried forward" (LOCF) approach, in which values for missing responses are imputed using observations from the most recently completed assessment. We examine the asymptotic and empirical bias, the empirical type I error rate, and the empirical coverage probability associated with estimators and tests of treatment effect based on the LOCF imputation strategy. We consider a setting involving longitudinal binary data with longitudinal analyses based on generalized estimating equations, and an analysis based simply on the response at the end of the scheduled follow-up. We find that for both of these approaches, imputation by LOCF can lead to substantial biases in estimators of treatment effects, the type I error rates of associated tests can be greatly inflated, and the coverage probability can be far from the nominal level. Alternative analyses based on all available data lead to estimators with comparatively small bias, and inverse probability weighted analyses yield consistent estimators subject to correct specification of the missing data process. We illustrate the differences between various methods of dealing with drop-outs using data from a study of smoking behavior.
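
The LOCF rule described in this abstract can be illustrated with a minimal sketch. The function name `locf` and the use of `None` to mark a missed assessment are assumptions made for illustration; they do not come from the article, and the sketch shows only the imputation step, not the bias analysis.

```python
from typing import List, Optional


def locf(responses: List[Optional[int]]) -> List[Optional[int]]:
    """Impute each missing response (None) with the most recent observed value.

    Values missing before the first observed response stay None, because
    there is nothing to carry forward yet.
    """
    filled: List[Optional[int]] = []
    last_seen: Optional[int] = None
    for value in responses:
        if value is not None:
            last_seen = value  # remember the latest completed assessment
        filled.append(last_seen)
    return filled


# Hypothetical binary responses for one subject who drops out after visit 2.
subject = [1, 1, None, None]
print(locf(subject))
```

Because the last observed value is simply repeated, any change the subject would have shown after dropout is ignored, which is one intuitive way to see why the abstract reports biased treatment-effect estimates under this strategy.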

3.

Strategies for analysing missing item response data with an application to lung cancer

Sheng X, Carrière KC 《Biometrical journal. Biometrische Zeitschrift》2005,47(5):605-615

Missing data problems persist in many scientific investigations. Although various strategies for analyzing missing data have been proposed, they are mainly limited to data on continuous measurements. In this paper, we focus on implementing some of the available strategies to analyze item response data. In particular, we investigate the effects of popular missing data methods on various missing data mechanisms. We examine large sample behaviors of estimators in a simulation study that evaluates and compares their performance. We use data from a quality of life study with lung cancer patients to illustrate the utility of these methods.

4.

Weighted regression analysis to correct for informative monitoring times and confounders in longitudinal studies

Janie Coulombe, Erica E. M. Moodie, Robert W. Platt 《Biometrics》2021,77(1):162-174

We address estimation of the marginal effect of a time‐varying binary treatment on a continuous longitudinal outcome in the context of observational studies using electronic health records, when the relationship of interest is confounded, mediated, and further distorted by an informative visit process. We allow the longitudinal outcome to be recorded only sporadically and assume that its monitoring timing is informed by patients' characteristics. We propose two novel estimators based on linear models for the mean outcome that incorporate an adjustment for confounding and informative monitoring process through generalized inverse probability of treatment weights and a proportional intensity model, respectively. We allow for a flexible modeling of the intercept function as a function of time. Our estimators have closed‐form solutions, and their asymptotic distributions can be derived. Extensive simulation studies show that both estimators outperform standard methods such as the ordinary least squares estimator or estimators that only account for informative monitoring or confounders. We illustrate our methods using data from the Add Health study, assessing the effect of depressive mood on weight in adolescents.

5.

Adjustment for Missingness Using Auxiliary Information in Semiparametric Regression

Donglin Zeng, Qingxia Chen 《Biometrics》2010,66(1):115-122

Summary. In this article, we study the estimation of mean response and regression coefficient in semiparametric regression problems when response variable is subject to nonrandom missingness. When the missingness is independent of the response conditional on high-dimensional auxiliary information, the parametric approach may misspecify the relationship between covariates and response while the nonparametric approach is infeasible because of the curse of dimensionality. To overcome this, we study a model-based approach to condense the auxiliary information and estimate the parameters of interest nonparametrically on the condensed covariate space. Our estimators possess the double robustness property, i.e., they are consistent whenever the model for the response given auxiliary covariates or the model for the missingness given auxiliary covariate is correct. We conduct a number of simulations to compare the numerical performance between our estimators and other existing estimators in the current missing data literature, including the propensity score approach and the inverse probability weighted estimating equation. A set of real data is used to illustrate our approach.
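
The double robustness property mentioned in this abstract can be illustrated with the textbook augmented inverse probability weighted (AIPW) estimator of a mean response. This is a generic sketch, not the authors' condensed-covariate estimator; the function name `aipw_mean` and its arguments are hypothetical, and the propensities and outcome predictions are assumed to come from models fitted elsewhere.

```python
from typing import List, Optional


def aipw_mean(
    y: List[Optional[float]],
    observed: List[int],
    propensity: List[float],
    outcome_pred: List[float],
) -> float:
    """Augmented IPW estimate of the mean response.

    y            -- responses, None where missing
    observed     -- 1 if the response was observed, 0 otherwise
    propensity   -- modelled probability that the response is observed
    outcome_pred -- modelled mean response given the auxiliary covariates
    """
    total = 0.0
    for yi, ri, pi, mi in zip(y, observed, propensity, outcome_pred):
        # Inverse-probability-weighted contribution of an observed response.
        weighted = yi / pi if ri == 1 and yi is not None else 0.0
        # Augmentation term: corrects the weighting using the outcome model.
        total += weighted - (ri - pi) / pi * mi
    return total / len(y)


# Hypothetical toy data: one observed response and one missing response.
estimate = aipw_mean(
    y=[4.0, None],
    observed=[1, 0],
    propensity=[0.5, 0.5],
    outcome_pred=[4.0, 4.0],
)
print(estimate)
```

The estimate stays consistent if either the propensity model or the outcome model is correct: with a correct outcome model the augmentation term repairs wrong weights on average, and with correct weights the augmentation term averages to zero.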

6.

A simulation-based marginal method for longitudinal data with dropout and mismeasured covariates

Yi GY 《Biostatistics (Oxford, England)》2008,9(3):501-512

Longitudinal data often contain missing observations and error-prone covariates. Extensive attention has been directed to analysis methods to adjust for the bias induced by missing observations. There is relatively little work on investigating the effects of covariate measurement error on estimation of the response parameters, especially on simultaneously accounting for the biases induced by both missing values and mismeasured covariates. It is not clear what the impact of ignoring measurement error is when analyzing longitudinal data with both missing observations and error-prone covariates. In this article, we study the effects of covariate measurement error on estimation of the response parameters for longitudinal studies. We develop an inference method that adjusts for the biases induced by measurement error as well as by missingness. The proposed method does not require the full specification of the distribution of the response vector but only requires modeling its mean and variance structures. Furthermore, the proposed method employs the so-called functional modeling strategy to handle the covariate process, with the distribution of covariates left unspecified. These features, plus the simplicity of implementation, make the proposed method very attractive. In this paper, we establish the asymptotic properties for the resulting estimators. With the proposed method, we conduct sensitivity analyses on a cohort data set arising from the Framingham Heart Study. Simulation studies are carried out to evaluate the impact of ignoring covariate measurement error and to assess the performance of the proposed method.

7.

Missing covariates in longitudinal data with informative dropouts: bias analysis and inference

Roy J, Lin X 《Biometrics》2005,61(3):837-846

We consider estimation in generalized linear mixed models (GLMM) for longitudinal data with informative dropouts. At the time a unit drops out, time-varying covariates are often unobserved in addition to the missing outcome. However, existing informative dropout models typically require covariates to be completely observed. This assumption is not realistic in the presence of time-varying covariates. In this article, we first study the asymptotic bias that would result from applying existing methods, where missing time-varying covariates are handled using naive approaches, which include: (1) using only baseline values; (2) carrying forward the last observation; and (3) assuming the missing data are ignorable. Our asymptotic bias analysis shows that these naive approaches yield inconsistent estimators of model parameters. We next propose a selection/transition model that allows covariates to be missing in addition to the outcome variable at the time of dropout. The EM algorithm is used for inference in the proposed model. Data from a longitudinal study of human immunodeficiency virus (HIV)-infected women are used to illustrate the methodology.

8.

Robust Estimation of Area Under ROC Curve Using Auxiliary Variables in the Presence of Missing Biomarker Values

Qi Long, Xiaoxi Zhang, Brent A. Johnson 《Biometrics》2011,67(2):559-567

Summary. In medical research, the receiver operating characteristic (ROC) curves can be used to evaluate the performance of biomarkers for diagnosing diseases or predicting the risk of developing a disease in the future. The area under the ROC curve (ROC AUC), as a summary measure of ROC curves, is widely utilized, especially when comparing multiple ROC curves. In observational studies, the estimation of the AUC is often complicated by the presence of missing biomarker values, which means that the existing estimators of the AUC are potentially biased. In this article, we develop robust statistical methods for estimating the ROC AUC and the proposed methods use information from auxiliary variables that are potentially predictive of the missingness of the biomarkers or the missing biomarker values. We are particularly interested in auxiliary variables that are predictive of the missing biomarker values. In the case of missing at random (MAR), that is, missingness of biomarker values only depends on the observed data, our estimators have the attractive feature of being consistent if one correctly specifies, conditional on auxiliary variables and disease status, either the model for the probabilities of being missing or the model for the biomarker values. In the case of missing not at random (MNAR), that is, missingness may depend on the unobserved biomarker values, we propose a sensitivity analysis to assess the impact of MNAR on the estimation of the ROC AUC. The asymptotic properties of the proposed estimators are studied and their finite‐sample behaviors are evaluated in simulation studies. The methods are further illustrated using data from a study of maternal depression during pregnancy.

9.

Multiple-Imputation-Based Residuals and Diagnostic Plots for Joint Models of Longitudinal and Survival Outcomes

Dimitris Rizopoulos, Geert Verbeke, Geert Molenberghs 《Biometrics》2010,66(1):20-29

Summary. The majority of the statistical literature for the joint modeling of longitudinal and time-to-event data has focused on the development of models that aim at capturing specific aspects of the motivating case studies. However, little attention has been given to the development of diagnostic and model-assessment tools. The main difficulty in using standard model diagnostics in joint models is the nonrandom dropout in the longitudinal outcome caused by the occurrence of events. In particular, the reference distribution of statistics, such as the residuals, in missing data settings is not directly available and complex calculations are required to derive it. In this article, we propose a multiple-imputation-based approach for creating multiple versions of the completed data set under the assumed joint model. Residuals and diagnostic plots for the complete data model can then be calculated based on these imputed data sets. Our proposals are exemplified using two real data sets.

10.

Identifiability and estimation of causal effects in randomized trials with noncompliance and completely nonignorable missing data

Chen H, Geng Z, Zhou XH 《Biometrics》2009,65(3):675-682

Summary. In this article, we first study parameter identifiability in randomized clinical trials with noncompliance and missing outcomes. We show that under certain conditions the parameters of interest are identifiable even under different types of completely nonignorable missing data: that is, the missing mechanism depends on the outcome. We then derive their maximum likelihood and moment estimators and evaluate their finite-sample properties in simulation studies in terms of bias, efficiency, and robustness. Our sensitivity analysis shows that the assumed nonignorable missing-data model has an important impact on the estimated complier average causal effect (CACE) parameter. Our new method provides some new and useful alternative nonignorable missing-data models over the existing latent ignorable model, which guarantees parameter identifiability, for estimating the CACE in a randomized clinical trial with noncompliance and missing data.

11.

Testing for temporal variation in diversification rates when sampling is incomplete and nonrandom

Brock CD, Harmon LJ, Alfaro ME 《Systematic biology》2011,60(4):410-419

A common pattern found in phylogeny-based empirical studies of diversification is a decrease in the rate of lineage accumulation toward the present. This early-burst pattern of cladogenesis is often interpreted as a signal of adaptive radiation or density-dependent processes of diversification. However, incomplete taxonomic sampling is also known to artifactually produce patterns of rapid initial diversification. The Monte Carlo constant rates (MCCR) test, based upon Pybus and Harvey's gamma (γ)-statistic, is commonly used to accommodate incomplete sampling, but this test assumes that missing taxa have been randomly pruned from the phylogeny. Here we use simulations to show that preferentially sampling disparate lineages within a clade can produce severely inflated type-I error rates of the MCCR test, especially when taxon sampling drops below 75%. We first propose two corrections for the standard MCCR test, the proportionally deeper splits that assumes missing taxa are more likely to be recently diverged, and the deepest splits only MCCR that assumes that all missing taxa are the youngest lineages in the clade, and assess their statistical properties. We then extend these two tests into a generalized form that allows the degree of nonrandom sampling (NRS) to be controlled by a scaling parameter, α. This generalized test is then applied to two recent studies. This new test allows systematists to account for nonrandom taxonomic sampling when assessing temporal patterns of lineage diversification in empirical trees. Given the dramatic effect NRS can have on the behavior of the MCCR test, we argue that evaluating the sensitivity of this test to NRS should become the norm when investigating patterns of cladogenesis in incompletely sampled phylogenies.

12.

Latent pattern mixture models for informative intermittent missing data in longitudinal studies

Lin H, McCulloch CE, Rosenheck RA 《Biometrics》2004,60(2):295-305

A frequently encountered problem in longitudinal studies is data that are missing due to missed visits or dropouts. In the statistical literature, interest has primarily focused on monotone missing data (dropout) with much less work on intermittent missing data in which a subject may return after one or more missed visits. Intermittent missing data have broader applicability that can include the frequent situation in which subjects do not have common sets of visit times or they visit at nonprescheduled times. In this article, we propose a latent pattern mixture model (LPMM), where the mixture patterns are formed from latent classes that link the longitudinal response and the missingness process. This allows us to handle arbitrary patterns of missing data embodied by subjects' visit process, and avoids the need to specify the mixture patterns a priori. One assumption of our model is that the missingness process is assumed to be conditionally independent of the longitudinal outcomes given the latent classes. We propose a noniterative approach to assess this key assumption. The LPMM is illustrated with a data set from a health service research study in which homeless people with mental illness were randomized to three different service packages and measures of homelessness were recorded at multiple time points. Our model suggests the presence of four latent classes linking subject visit patterns to homeless outcomes.

13.

On the Impact of Parametric Assumptions and Robust Alternatives for Longitudinal Data Analysis

Naiji Lu, Wan Tang, Hua He, Qin Yu, Paul Crits‐Christoph, Hui Zhang, Xin Tu 《Biometrical journal. Biometrische Zeitschrift》2009,51(4):627-643

Models for longitudinal data are employed in a wide range of behavioral, biomedical, psychosocial, and health‐care‐related research. One popular model for continuous response is the linear mixed‐effects model (LMM). Although simulations by recent studies show that LMM provides reliable estimates under departures from the normality assumption for complete data, the invariable occurrence of missing data in practical studies renders such robustness results less useful when applied to real study data. In this paper, we show by simulated studies that in the presence of missing data estimates of the fixed effect of LMM are biased under departures from normality. We discuss two robust alternatives, the weighted generalized estimating equations (WGEE) and the augmented WGEE (AWGEE), and compare their performances with LMM using real as well as simulated data. Our simulation results show that both WGEE and AWGEE provide valid inference for skewed non‐normal data when the missing data follow the missing at random mechanism, the most popular missing data mechanism for real study data.

14.

Numerical equivalence of imputing scores and weighted estimators in regression analysis with missing covariates

Wang CY, Lee SM, Chao EC 《Biostatistics (Oxford, England)》2007,8(2):468-473

Imputation, weighting, direct likelihood, and direct Bayesian inference (Rubin, 1976) are important approaches for missing data regression. Many useful semiparametric estimators have been developed for regression analysis of data with missing covariates or outcomes. It has been established that some semiparametric estimators are asymptotically equivalent, but it has not been shown that many are numerically the same. We applied some existing methods to a bladder cancer case-control study and noted that they were the same numerically when the observed covariates and outcomes are categorical. To understand the analytical background of this finding, we further show that when observed covariates and outcomes are categorical, some estimators are not only asymptotically equivalent but also actually numerically identical. That is, although their estimating equations are different, they lead numerically to exactly the same root. This includes a simple weighted estimator, an augmented weighted estimator, and a mean-score estimator. The numerical equivalence may elucidate the relationship between imputing scores and weighted estimation procedures.

15.

A comparison of single‐sample estimators of effective population sizes from genetic marker data

Jinliang Wang 《Molecular ecology》2016,25(19):4692-4711

In molecular ecology and conservation genetics studies, the important parameter of effective population size (N_e) is increasingly estimated from a single sample of individuals taken at random from a population and genotyped at a number of marker loci. Several estimators are developed, based on the information of linkage disequilibrium (LD), heterozygote excess (HE), molecular coancestry (MC) and sibship frequency (SF) in marker data. The most popular is the LD estimator, because it is more accurate than HE and MC estimators and is simpler to calculate than SF estimator. However, little is known about the accuracy of LD estimator relative to that of SF and about the robustness of all single‐sample estimators when some simplifying assumptions (e.g. random mating, no linkage, no genotyping errors) are violated. This study fills the gaps and uses extensive simulations to compare the biases and accuracies of the four estimators for different population properties (e.g. bottlenecks, nonrandom mating, haplodiploid), marker properties (e.g. linkage, polymorphisms) and sample properties (e.g. numbers of individuals and markers) and to compare the robustness of the four estimators when marker data are imperfect (with allelic dropouts). Extensive simulations show that SF estimator is more accurate, has a much wider application scope (e.g. suitable to nonrandom mating such as selfing, haplodiploid species, dominant markers) and is more robust (e.g. to the presence of linkage and genotyping errors of markers) than the other estimators. An empirical data set from a Yellowstone grizzly bear population was analysed to demonstrate the use of the SF estimator in practice.

16.

Utilization Distribution Estimation Using Weighted Kernel Density Estimators

JOHN FIEBERG 《The Journal of wildlife management》2007,71(5):1669-1675

17.

Improved Doubly Robust Estimation When Data Are Monotonely Coarsened, with Application to Longitudinal Studies with Dropout

Anastasios A. Tsiatis, Marie Davidian, Weihua Cao 《Biometrics》2011,67(2):536-545

Summary. A routine challenge is that of making inference on parameters in a statistical model of interest from longitudinal data subject to dropout, which are a special case of the more general setting of monotonely coarsened data. Considerable recent attention has focused on doubly robust (DR) estimators, which in this context involve positing models for both the missingness (more generally, coarsening) mechanism and aspects of the distribution of the full data, that have the appealing property of yielding consistent inferences if only one of these models is correctly specified. DR estimators have been criticized for potentially disastrous performance when both of these models are even only mildly misspecified. We propose a DR estimator applicable in general monotone coarsening problems that achieves comparable or improved performance relative to existing DR methods, which we demonstrate via simulation studies and by application to data from an AIDS clinical trial.

18.

Simple and efficient analysis of disease association with missing genotype data

Lin DY, Hu Y, Huang BE 《American journal of human genetics》2008,82(2):444-452

Missing genotype data arise in association studies when the single-nucleotide polymorphisms (SNPs) on the genotyping platform are not assayed successfully, when the SNPs of interest are not on the platform, or when total sequence variation is determined only on a small fraction of individuals. We present a simple and flexible likelihood framework to study SNP-disease associations with such missing genotype data. Our likelihood makes full use of all available data in case-control studies and reference panels (e.g., the HapMap), and it properly accounts for the biased nature of the case-control sampling as well as the uncertainty in inferring unknown variants. The corresponding maximum-likelihood estimators for genetic effects and gene-environment interactions are unbiased and statistically efficient. We developed fast and stable numerical algorithms to calculate the maximum-likelihood estimators and their variances, and we implemented these algorithms in a freely available computer program. Simulation studies demonstrated that the new approach is more powerful than existing methods while providing accurate control of the type I error. An application to a case-control study on rheumatoid arthritis revealed several loci that deserve further investigations.

19.

A robust method using propensity score stratification for correcting verification bias for binary tests

He H, McDermott MP 《Biostatistics (Oxford, England)》2012,13(1):32-47

Sensitivity and specificity are common measures of the accuracy of a diagnostic test. The usual estimators of these quantities are unbiased if data on the diagnostic test result and the true disease status are obtained from all subjects in an appropriately selected sample. In some studies, verification of the true disease status is performed only for a subset of subjects, possibly depending on the result of the diagnostic test and other characteristics of the subjects. Estimators of sensitivity and specificity based on this subset of subjects are typically biased; this is known as verification bias. Methods have been proposed to correct verification bias under the assumption that the missing data on disease status are missing at random (MAR), that is, the probability of missingness depends on the true (missing) disease status only through the test result and observed covariate information. When some of the covariates are continuous, or the number of covariates is relatively large, the existing methods require parametric models for the probability of disease or the probability of verification (given the test result and covariates), and hence are subject to model misspecification. We propose a new method for correcting verification bias based on the propensity score, defined as the predicted probability of verification given the test result and observed covariates. This is estimated separately for those with positive and negative test results. The new method classifies the verified sample into several subsamples that have homogeneous propensity scores and allows correction for verification bias. Simulation studies demonstrate that the new estimators are more robust to model misspecification than existing methods, but still perform well when the models for the probability of disease and probability of verification are correctly specified.
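
To make the idea of correcting verification bias under MAR concrete, here is a minimal sketch of a generic inverse probability weighting correction. It is not the propensity score stratification method proposed in the article; the function name `ipw_sens_spec`, the record layout, and the toy verification probabilities are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# One record per subject: (test result, disease status or None, verified?, P(verified)).
Record = Tuple[int, Optional[int], bool, float]


def ipw_sens_spec(records: List[Record]) -> Tuple[float, float]:
    """Estimate sensitivity and specificity from verified subjects only,
    weighting each one by the inverse of its verification probability."""
    sens_num = sens_den = spec_num = spec_den = 0.0
    for test, disease, verified, p_verify in records:
        if not verified:
            continue  # disease status unknown, so this subject adds no weight
        weight = 1.0 / p_verify
        if disease == 1:
            sens_den += weight
            if test == 1:
                sens_num += weight
        else:
            spec_den += weight
            if test == 0:
                spec_num += weight
    return sens_num / sens_den, spec_num / spec_den


# Hypothetical data: test-positive subjects are always verified, while
# test-negative subjects are verified with probability 0.5.
data: List[Record] = [
    (1, 1, True, 1.0),
    (0, 1, True, 0.5),
    (1, 0, True, 1.0),
    (0, 0, True, 0.5),
    (0, None, False, 0.5),
    (0, None, False, 0.5),
]
sensitivity, specificity = ipw_sens_spec(data)
print(sensitivity, specificity)
```

Each verified test-negative subject stands in for the unverified subjects with the same verification probability, which is how the weighting offsets the over-representation of test-positive subjects in the verified sample.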

20.

Estimation in Semiparametric Transition Measurement Error Models for Longitudinal Data

Wenqin Pan, Donglin Zeng, Xihong Lin 《Biometrics》2009,65(3):728-736

Summary. We consider semiparametric transition measurement error models for longitudinal data, where one of the covariates is measured with error in transition models, and no distributional assumption is made for the underlying unobserved covariate. An estimating equation approach based on the pseudo conditional score method is proposed. We show the resulting estimators of the regression coefficients are consistent and asymptotically normal. We also discuss the issue of efficiency loss. Simulation studies are conducted to examine the finite-sample performance of our estimators. The longitudinal AIDS Costs and Services Utilization Survey data are analyzed for illustration.