20 similar articles found; search time 0 ms.
1.
Summary. The present article deals with informative missing (IM) exposure data in matched case–control studies. When the missingness mechanism depends on the unobserved exposure values, modeling the missing data mechanism is unavoidable. We therefore propose a full likelihood-based approach for handling IM data. The approach posits a model for the selection probability and a parametric model for the partially missing exposure variable in the control population, together with a disease risk model. We develop an EM algorithm to estimate the model parameters. Three special cases are discussed in detail: (a) a binary exposure variable, (b) a normally distributed exposure variable, and (c) a lognormally distributed exposure variable. The method is illustrated by analyzing real matched case–control data with a missing exposure variable. The performance of the proposed method is evaluated through simulation studies, and its robustness to violations of different types of model assumptions is also examined.
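A minimal simulation shows why informative missingness cannot be ignored. It assumes a hypothetical lognormal exposure distribution among controls and a logistic selection model in which larger exposures are more likely to be missing. Under these assumptions, the mean of the observed exposures no longer estimates the population mean. The sketch does not implement the authors' EM algorithm, and all parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000

# Hypothetical lognormal exposure among controls (illustrative parameters)
exposure = rng.lognormal(mean=0.0, sigma=0.75, size=n)

# Informative missingness: P(missing) increases with the (unobserved) exposure value
p_missing = 1.0 / (1.0 + np.exp(-2.0 * (np.log(exposure) - 0.5)))
observed = rng.uniform(size=n) > p_missing

full_mean = exposure.mean()
observed_mean = exposure[observed].mean()
print(f"mean of all exposures:      {full_mean:.3f}")
print(f"mean of observed exposures: {observed_mean:.3f}")  # biased downward
```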
2.
One of the objectives in the Northern Manhattan Stroke Study is to investigate the impact of stroke subtype on the functional status 2 years after the first ischemic stroke. A challenge in this analysis is that the functional status at 2 years after stroke is not completely observed. In this paper, we propose a method to handle nonignorably missing binary functional status when the baseline value and the covariates are completely observed. The proposed method consists of fitting four separate binary regression models: for the baseline outcome, the outcome 2 years after the stroke, the product of the previous two, and finally, the missingness indicator. We then conduct a sensitivity analysis by varying the assumptions about the third and the fourth binary regression models. Our method belongs to an imputation paradigm and can be an alternative to the weighting method of Rotnitzky and Robins (1997, Statistics in Medicine 16, 81-102). A jackknife variance estimate is proposed for the variance of the resulting estimate. The proposed analysis can be implemented using statistical software such as SAS.
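The delete-one jackknife variance estimate can be sketched generically. Recompute the estimator with each subject left out, then rescale the spread of those replicates. In the paper's setting, each replicate would rerun the whole imputation analysis. Here the estimator is a hypothetical stand-in (the sample mean), for which the jackknife reproduces the usual s²/n exactly.

```python
import numpy as np

def jackknife_variance(data, estimator):
    """Delete-one jackknife variance estimate for estimator(data).

    data: array whose first axis indexes independent subjects.
    estimator: callable mapping such an array to a scalar estimate.
    """
    data = np.asarray(data)
    n = data.shape[0]
    # Replicates: the estimate recomputed with subject i removed
    reps = np.array([estimator(np.delete(data, i, axis=0)) for i in range(n)])
    # Jackknife formula: (n - 1) / n times the sum of squared deviations of the replicates
    return (n - 1) / n * np.sum((reps - reps.mean()) ** 2)

x = np.array([2.0, 4.0, 4.0, 5.0, 7.0, 9.0])
print(jackknife_variance(x, np.mean))  # jackknife variance of the sample mean
print(np.var(x, ddof=1) / len(x))      # textbook s^2 / n, identical for the mean
```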
3.
We consider matched case-control familial studies which match a group of patients, called "case probands," with a group of disease-free subjects, called "control probands," using a set of family-level matching variables. Family members of each proband are then recruited into the study. Of interest here is the familial aggregation of the response variable and the effects of subject-specific covariates on the response. We propose an estimating equation approach to jointly estimate the main effects and intrafamilial correlations for matched family studies with a continuous outcome. Only knowledge of the first two joint moments of the response variable is required. The induced estimators for the main effects and intrafamilial correlations are consistent and asymptotically normally distributed. We apply the proposed method to sleep apnea data. A simulation study demonstrates the usefulness of our approach.
4.
Genetic epidemiologists routinely assess disease susceptibility in relation to haplotypes, that is, combinations of alleles on a single chromosome. We study statistical methods for inferring haplotype-related disease risk using single nucleotide polymorphism (SNP) genotype data from matched case-control studies, where controls are individually matched to cases on some selected factors. Assuming a logistic regression model for haplotype-disease association, we propose two conditional likelihood approaches that address the issue that haplotypes cannot be inferred with certainty from SNP genotype data (phase ambiguity). One approach is based on the likelihood of disease status conditioned on the total number of cases, genotypes, and other covariates within each matching stratum, and the other is based on the joint likelihood of disease status and genotypes conditioned only on the total number of cases and other covariates. The joint-likelihood approach is generally more efficient, particularly for assessing haplotype-environment interactions. Simulation studies demonstrated that the first approach was more robust to model assumptions on the diplotype distribution conditioned on environmental risk variables and matching factors in the control population. We applied the two methods to analyze a matched case-control study of prostate cancer.
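The simplest case is 1:1 matching with a single covariate whose value is known for every subject, which ignores the phase ambiguity that is the focus of the paper. There, the conditional likelihood given one case per stratum reduces to a logistic function of the case–control difference, so the stratum effects drop out. A minimal Newton–Raphson sketch, with a hypothetical function name:

```python
import numpy as np

def fit_matched_pairs_clogit(x_case, x_control, max_iter=50, tol=1e-10):
    """Conditional MLE of a scalar log odds ratio from 1:1 matched pairs.

    Given exactly one case per pair, pair i contributes
        exp(b * x_case_i) / (exp(b * x_case_i) + exp(b * x_control_i))
        = 1 / (1 + exp(-b * d_i)),  with d_i = x_case_i - x_control_i,
    so the pair-specific intercepts cancel.
    """
    d = np.asarray(x_case, dtype=float) - np.asarray(x_control, dtype=float)
    if not np.any(d != 0):
        raise ValueError("all pairs are concordant; b is not estimable")
    b = 0.0
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-b * d))
        score = np.sum(d * (1.0 - p))          # first derivative of the log-likelihood
        info = np.sum(d ** 2 * p * (1.0 - p))  # minus the second derivative
        step = score / info
        b += step
        if abs(step) < tol:
            break
    return b

# Two pairs with an exposed case and one pair with an exposed control
print(fit_matched_pairs_clogit([1, 1, 0], [0, 0, 1]))  # log(2), about 0.693
```

Pairs with identical covariate values (d = 0) contribute nothing to the score or the information. For a binary exposure, the estimate equals the log of the ratio of the two discordant-pair counts, here log(2/1).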
5.
6.
Using unphased genotype data, we studied statistical inference for the association between a disease and a haplotype in matched case-control studies. Statistical inference for haplotype data is complicated by the ambiguity of genotype phase. An estimating equation-based method is developed for estimating odds ratios and testing disease-haplotype association. The method can potentially also be applied to testing haplotype-environment interaction. Simulation studies show that the proposed method performs well. Its performance in the presence of departures from Hardy-Weinberg equilibrium is also studied.
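Phase ambiguity can be made concrete by listing the haplotype pairs (diplotypes) that are consistent with an unphased genotype. The sketch below assumes biallelic SNPs coded as the number of copies (0, 1, 2) of one allele. Under that coding, a genotype with h ≥ 1 heterozygous sites is compatible with 2^(h−1) distinct diplotypes. Haplotype-based methods typically weight these, for example by estimated haplotype frequencies. The function name is hypothetical, and only the enumeration step is shown.

```python
from itertools import product

def compatible_diplotypes(genotype):
    """List unordered haplotype pairs consistent with an unphased SNP genotype.

    genotype: sequence of allele counts (0, 1 or 2 copies of allele '1') per SNP.
    Returns a sorted list of (h1, h2) tuples of 0/1 haplotypes with h1 <= h2.
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    pairs = set()
    # Each heterozygous site can carry allele 1 on either chromosome
    for phase in product((0, 1), repeat=len(het_sites)):
        h1 = [g // 2 for g in genotype]  # homozygous sites: 0 -> 0, 2 -> 1
        h2 = list(h1)
        for site, allele in zip(het_sites, phase):
            h1[site], h2[site] = allele, 1 - allele
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)

print(compatible_diplotypes((1, 1)))  # [((0, 0), (1, 1)), ((0, 1), (1, 0))]
```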
7.
Longitudinal clinical trials often collect long sequences of binary data. Our application is a recent clinical trial in opiate addicts that examined the effect of a new treatment on repeated binary urine tests to assess opiate use over an extended follow-up. The dataset had two sources of missingness: dropout and intermittent missing observations. The primary endpoint of the study was comparing the marginal probability of a positive urine test over follow-up across treatment arms. We present a latent autoregressive model for longitudinal binary data subject to informative missingness. In this model, a Gaussian autoregressive process is shared between the binary response and missing-data processes, thereby inducing informative missingness. Our approach extends the work of others who have developed models that link the various processes through a shared random effect but do not allow for autocorrelation. We discuss parameter estimation using Monte Carlo EM and demonstrate through simulations that incorporating within-subject autocorrelation through a latent autoregressive process can be very important when longitudinal binary data are subject to informative missingness. We illustrate our new methodology using the opiate clinical trial data.
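A minimal data-generating sketch of the shared latent autoregressive idea follows. It assumes probit links, a stationary Gaussian AR(1) process, and intermittent missingness only (no dropout). All parameter values and the function name are hypothetical, and Monte Carlo EM estimation is not shown. The same latent process drives both the response and whether it is observed. As a result, the naive proportion of positive tests among observed visits understates the marginal probability in this setup.

```python
import numpy as np

def simulate_shared_ar(n_subjects, n_times, rho=0.8, sigma=1.0,
                       beta0=-0.5, gamma0=1.0, gamma1=1.0, seed=0):
    """Simulate binary responses whose intermittent missingness shares a latent AR(1).

    Latent process: b[:, t] = rho * b[:, t-1] + innovation, stationary variance sigma**2.
    Response (probit):    y = 1{beta0 + b + e > 0},                   e ~ N(0, 1)
    Observation (probit): observed = 1{gamma0 - gamma1 * b + u > 0},  u ~ N(0, 1)
    With gamma1 > 0, visits with a high latent level (likely positive) are missed more often.
    """
    rng = np.random.default_rng(seed)
    b = np.empty((n_subjects, n_times))
    b[:, 0] = rng.normal(0.0, sigma, n_subjects)
    innovation_sd = sigma * np.sqrt(1.0 - rho ** 2)
    for t in range(1, n_times):
        b[:, t] = rho * b[:, t - 1] + rng.normal(0.0, innovation_sd, n_subjects)
    y = (beta0 + b + rng.normal(size=b.shape) > 0).astype(float)
    observed = gamma0 - gamma1 * b + rng.normal(size=b.shape) > 0
    y_obs = np.where(observed, y, np.nan)  # NaN marks a missed urine test
    return y, y_obs

y, y_obs = simulate_shared_ar(2000, 20)
print("marginal P(positive), all visits:  ", y.mean())
print("naive P(positive), observed visits:", np.nanmean(y_obs))
```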
8.
We introduce a method of parameter estimation for a random effects cure rate model. We also propose a methodology that allows us to account for nonignorable missing covariates in this class of models. The proposed method corrects for possible bias introduced by complete case analysis when missing data are not missing completely at random and is motivated by data from a pair of melanoma studies conducted by the Eastern Cooperative Oncology Group in which clustering by cohort or time of study entry was suspected. In addition, these models allow estimation of cure rates, which is desirable when we do not wish to assume that all subjects remain at risk of death or relapse from disease after sufficient follow-up. We develop an EM algorithm for the model and provide an efficient Gibbs sampling scheme for carrying out the E-step of the algorithm.
9.
10.
Summary. Selection models and pattern-mixture models are often used to deal with nonignorable dropout in longitudinal studies. These two classes of models are based on different factorizations of the joint distribution of the outcome process and the dropout process. We consider a new class of models, called mixed-effect hybrid models (MEHMs), where the joint distribution of the outcome process and dropout process is factorized into the marginal distribution of random effects, the dropout process conditional on random effects, and the outcome process conditional on dropout patterns and random effects. MEHMs combine features of selection models and pattern-mixture models: they directly model the missingness process as in selection models, and enjoy the computational simplicity of pattern-mixture models. The MEHM provides a generalization of shared-parameter models (SPMs) by relaxing the conditional independence assumption between the measurement process and the dropout process given random effects. Because SPMs are nested within MEHMs, likelihood ratio tests can be constructed to evaluate the conditional independence assumption of SPMs. We use data from a pediatric AIDS clinical trial to illustrate the models.
11.
Summary. Longitudinal studies often generate incomplete response patterns according to a missing not at random mechanism. Shared parameter models provide an appealing framework for the joint modelling of the measurement and missingness processes, especially in the nonmonotone missingness case. They assume a set of random effects to induce the interdependence. Parametric assumptions are typically made for the random effects distribution, and violating them leads to model misspecification that can affect the parameter estimates and standard errors. In this article we avoid any parametric assumption for the random effects distribution and leave it completely unspecified. The model is then estimated by semiparametric maximum likelihood. Our proposal is illustrated on a randomized longitudinal study of patients with rheumatoid arthritis exhibiting nonmonotone missingness.
12.
This article presents a likelihood-based method for handling nonignorable dropout in longitudinal studies with binary responses. The methodology developed is appropriate when the target of inference is the marginal distribution of the response at each occasion and its dependence on covariates. A "hybrid" model is formulated, which is designed to retain advantageous features of the selection and pattern-mixture model approaches. This formulation accommodates a variety of assumed forms of nonignorable dropout, while maintaining transparency of the constraints required for identifying the overall model. Once appropriate identifying constraints have been imposed, likelihood-based estimation is conducted via the EM algorithm. The article concludes by applying the approach to data from a randomized clinical trial comparing two doses of a contraceptive.
13.
Ralph C. Ward, Robert Neal Axon, Mulugeta Gebregziabher. Biometrical Journal. Biometrische Zeitschrift, 2020, 62(4): 1025-1037
Data with missing covariate values but fully observed binary outcomes are an important subset of the missing data challenge. Common approaches are complete case analysis (CCA) and multiple imputation (MI). While CCA relies on missing completely at random (MCAR), MI usually relies on a missing at random (MAR) assumption to produce unbiased results. For MI involving logistic regression models, it is also important to consider several missing not at random (MNAR) conditions under which CCA is asymptotically unbiased and, as we show, MI is also valid in some cases. We use a data application and simulation study to compare the performance of several machine learning and parametric MI methods under a fully conditional specification framework (MI-FCS). Our simulation includes five scenarios involving MCAR, MAR, and MNAR under predictable and nonpredictable conditions, where “predictable” indicates missingness is not associated with the outcome. We build on previous results in the literature to show MI and CCA can both produce unbiased results under more conditions than some analysts may realize. When both approaches were valid, we found that MI-FCS was at least as good as CCA in terms of estimated bias and coverage, and was superior when missingness involved a categorical covariate. We also demonstrate how MNAR sensitivity analysis can build confidence that unbiased results were obtained, including under MNAR-predictable, when CCA and MI are both valid. Since the missingness mechanism cannot be identified from observed data, investigators should compare results from MI and CCA when both are plausibly valid, followed by MNAR sensitivity analysis.
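Complete case analysis can stay unbiased for a logistic regression even under MNAR. One such condition is that missingness in a covariate depends on that covariate (here, its own unobserved value) but not on the outcome given the covariates. Then P(Y | X, observed) = P(Y | X), so the complete cases follow the same logistic model. The simulation below sketches this case under assumed parameter values. It fits the model with a small hand-written Newton–Raphson routine rather than any particular package, and it does not reproduce the paper's MI-FCS comparison.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression by Newton-Raphson (X includes an intercept column)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                             # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])      # observed information matrix
        beta += np.linalg.solve(hess, grad)
    return beta

rng = np.random.default_rng(1)
n = 100_000
x = rng.normal(size=n)
true_beta = np.array([-1.0, 0.8])                        # assumed intercept and slope
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(true_beta[0] + true_beta[1] * x))))

# MNAR in x: large values of x are more likely to be missing; missingness ignores y
p_missing = 1.0 / (1.0 + np.exp(-(x - 0.5)))
complete = rng.uniform(size=n) > p_missing

X_cc = np.column_stack([np.ones(complete.sum()), x[complete]])
beta_cc = fit_logistic(X_cc, y[complete])
print("complete-case estimates:", beta_cc, "truth:", true_beta)
```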
14.
Longitudinal studies frequently incur outcome-related nonresponse. In this article, we discuss a likelihood-based method for analyzing repeated binary responses when the mechanism leading to missing response data depends on unobserved responses. We describe a pattern-mixture model for the joint distribution of the vector of binary responses and the indicators of nonresponse patterns. Specifically, we propose an extension of the multivariate logistic model to handle nonignorable nonresponse. This method yields estimates of the mean parameters under a variety of assumptions regarding the distribution of the unobserved responses. Because these models make unverifiable identifying assumptions, we recommend conducting sensitivity analyses that provide a range of inferences, each of which is valid under different assumptions about nonresponse. The methodology is illustrated using data from a longitudinal study of obesity in children.
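The pattern-mixture sensitivity analysis can be illustrated at a single occasion without covariates. The sketch assumes a hypothetical sensitivity parameter delta: the log odds ratio of a positive response for nonrespondents relative to respondents. The observed data cannot identify delta, and each value yields a different estimate of the marginal probability. This is a simplified analogue, not the paper's multivariate logistic extension, and the inputs are illustrative.

```python
import numpy as np

def pattern_mixture_prob(p_obs, frac_observed, delta):
    """Marginal P(Y = 1) = P(R = 1) P(Y = 1 | R = 1) + P(R = 0) P(Y = 1 | R = 0).

    p_obs: P(Y = 1 | responded), estimable from the data.
    frac_observed: P(responded), estimable from the data.
    delta: unidentified log odds ratio for nonrespondents vs respondents;
           delta = 0 assumes nonrespondents resemble respondents.
    """
    logit_obs = np.log(p_obs / (1.0 - p_obs))
    p_missing = 1.0 / (1.0 + np.exp(-(logit_obs + delta)))
    return frac_observed * p_obs + (1.0 - frac_observed) * p_missing

# Sensitivity analysis over a grid of assumed delta values
for delta in np.linspace(-1.0, 1.0, 5):
    print(f"delta = {delta:+.1f}: P(Y = 1) = {pattern_mixture_prob(0.30, 0.75, delta):.3f}")
```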
15.
Summary. With advances in modern medicine and clinical diagnosis, case–control data with finer subtype characterization of cases are often available. In matched case–control studies, missingness in exposure values often leads to deletion of the entire stratum and thus entails a significant loss of information. When subtypes of cases are treated as categorical outcomes, the data are further stratified, and deleting observations becomes even more costly in terms of the precision of the category‐specific odds‐ratio parameters, especially under the multinomial logit model. The stereotype regression model for categorical responses lies between the proportional odds model and the multinomial (baseline-category) logit model. Its use has been limited because the structure of the model raises inferential challenges of nonidentifiability and nonlinearity in the parameters. We illustrate how to handle missing data in matched case–control studies with finer disease subclassification within the cases under a stereotype regression model. We present both a Monte Carlo-based fully Bayesian approach and an expectation/conditional maximization algorithm for estimating the model parameters in the presence of a completely general missingness mechanism. We illustrate our methods using data from an ongoing matched case–control study of colorectal cancer. Simulation results are presented under various missing data mechanisms and departures from modeling assumptions.
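The stereotype model's structure can be written down directly. In the multinomial (baseline-category) logit model, each category k has its own coefficient vector β_k. The stereotype model constrains these to φ_k β, with category 0 as baseline (α_0 = φ_0 = 0) and, commonly, φ_K = 1 for identifiability. The product φ_k β is what makes the model nonlinear in its parameters. This is a minimal sketch with hypothetical parameter values; matching strata and missing data are not handled here.

```python
import numpy as np

def stereotype_probs(x, alpha, phi, beta):
    """Category probabilities P(Y = k | x), k = 0..K, under the stereotype model.

    log[P(Y = k | x) / P(Y = 0 | x)] = alpha[k] + phi[k] * (beta @ x),
    with alpha[0] = phi[0] = 0 for the baseline category.
    """
    linear = np.asarray(beta, dtype=float) @ np.asarray(x, dtype=float)
    eta = np.asarray(alpha, dtype=float) + np.asarray(phi, dtype=float) * linear
    eta -= eta.max()  # subtract the maximum for numerical stability
    w = np.exp(eta)
    return w / w.sum()

# Hypothetical: controls (k = 0) plus three case subtypes, two covariates
alpha = [0.0, -1.0, -1.5, -2.0]
phi = [0.0, 0.4, 0.7, 1.0]  # phi[K] = 1 fixes the scale of beta
beta = [0.5, -0.3]
probs = stereotype_probs([1.0, 2.0], alpha, phi, beta)
print(probs, probs.sum())
```

A single β shared across categories, scaled by φ_k, needs fewer parameters than the multinomial logit's separate β_k. This parsimony is the sense in which the model sits between the two more familiar alternatives.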
16.
Ma S. Biometrical Journal. Biometrische Zeitschrift, 2006, 48(1): 83-92
In large cohort studies, it is common that a subset of the regressors may be missing for some study subjects by design or happenstance. In this article, we apply multiple data augmentation techniques to semiparametric models for epidemiologic data when a subset of the regressors is missing for some subjects. We assume that the data are missing at random in the sense of Rubin (2004) and that the missingness probabilities depend jointly on the observable subset of regressors, on a set of observable extraneous variables, and on the outcome. Computational algorithms for the Poor Man's and the Asymptotic Normal data augmentations are investigated. Simulation studies show that the data augmentation approach generates satisfactory estimates and is computationally affordable. Under certain simulation scenarios, the proposed approach can achieve asymptotic efficiency similar to that of the maximum likelihood approach. We apply the proposed technique to the Multi-Ethnic Study of Atherosclerosis (MESA) data and the South Wales Nickel Worker Study data.
17.
Stratified Cox regression models with a large number of strata and small stratum sizes are useful in many settings, including matched case-control family studies. In the presence of measurement error in covariates and a large number of strata, we show that extensions of existing methods fail either to reduce the bias or to correct the bias under nonsymmetric distributions of the true covariate or the error term. We propose a nonparametric correction method for the estimation of regression coefficients and show that the estimators are asymptotically consistent for the true parameters. Small-sample properties are evaluated in a simulation study. The method is illustrated with an analysis of Framingham data.
18.
Imputation, weighting, direct likelihood, and direct Bayesian inference (Rubin, 1976) are important approaches for missing data regression. Many useful semiparametric estimators have been developed for regression analysis of data with missing covariates or outcomes. It has been established that some semiparametric estimators are asymptotically equivalent, but it has not been shown that many are numerically the same. We applied some existing methods to a bladder cancer case-control study and noted that they were the same numerically when the observed covariates and outcomes are categorical. To understand the analytical background of this finding, we further show that when observed covariates and outcomes are categorical, some estimators are not only asymptotically equivalent but also actually numerically identical. That is, although their estimating equations are different, they lead numerically to exactly the same root. This includes a simple weighted estimator, an augmented weighted estimator, and a mean-score estimator. The numerical equivalence may elucidate the relationship between imputing scores and weighted estimation procedures.
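The simplest instance of this numerical identity can be checked directly: estimating the mean of an outcome that is missing at random given a categorical covariate. Selection probabilities are estimated by cell proportions. The inverse-probability-weighted estimator and the estimator that imputes each missing outcome by its cell's observed mean are then algebraically the same number, not merely asymptotically equivalent. In this setting the imputation estimator is also the root of the mean-score equation. The regression estimators studied in the paper are more general; the simulated data and function names below are hypothetical.

```python
import numpy as np

def ipw_mean(y, observed, x):
    """Inverse-probability-weighted mean; P(observed | x) estimated by cell proportions."""
    pi_hat = {c: observed[x == c].mean() for c in np.unique(x)}
    weights = np.array([1.0 / pi_hat[c] for c in x])
    y_filled = np.where(observed, y, 0.0)  # missing values receive weight 0 anyway
    return np.sum(observed * weights * y_filled) / len(y)

def cell_mean_imputation(y, observed, x):
    """Mean after imputing each missing y with the observed mean of its x cell."""
    y_imp = np.where(observed, y, 0.0)
    for c in np.unique(x):
        cell = x == c
        y_imp[cell & ~observed] = y[cell & observed].mean()
    return y_imp.mean()

rng = np.random.default_rng(7)
n = 500
x = rng.integers(0, 3, size=n)                                 # 3-level categorical covariate
y = rng.normal(loc=x.astype(float), scale=1.0)
observed = rng.uniform(size=n) < np.array([0.9, 0.6, 0.4])[x]  # MAR given x
y = np.where(observed, y, np.nan)                              # hide the missing outcomes
print(ipw_mean(y, observed, x), cell_mean_imputation(y, observed, x))
```

Both estimators reduce to the covariate-weighted average of the observed cell means, so they agree up to floating-point rounding. Each cell must contain at least one observed outcome for either estimator to be defined.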
19.
20.
Summary. Time-varying individual covariates are problematic in experiments with marked animals because the covariate can typically be observed only when an animal is captured. We examine three methods to incorporate time-varying individual covariates of the survival probabilities into the analysis of data from mark‐recapture‐recovery experiments: deterministic imputation; a Bayesian imputation approach based on modeling the joint distribution of the covariate and the capture history; and a conditional approach considering only the events for which the associated covariate data are completely observed (the trinomial model). After describing the three methods, we compare results from their application to the analysis of the effect of body mass on the survival of Soay sheep (Ovis aries) on the Isle of Hirta, Scotland. Simulations based on these results are then used to make further comparisons. We conclude that the trinomial model and the Bayesian imputation method each perform best in different situations. If the capture and recovery probabilities are all high, the trinomial model produces precise, unbiased estimators that do not depend on any assumptions regarding the distribution of the covariate. In contrast, the Bayesian imputation method performs substantially better when capture and recovery probabilities are low, provided that the specified model of the covariate is a good approximation to the true data-generating mechanism.