首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Summary.   The present article deals with informative missing (IM) exposure data in matched case–control studies. When the missingness mechanism depends on the unobserved exposure values, modeling the missing data mechanism is inevitable. Therefore, a full likelihood-based approach for handling IM data has been proposed by positing a model for selection probability, and a parametric model for the partially missing exposure variable among the control population along with a disease risk model. We develop an EM algorithm to estimate the model parameters. Three special cases: (a) binary exposure variable, (b) normally distributed exposure variable, and (c) lognormally distributed exposure variable are discussed in detail. The method is illustrated by analyzing a real matched case–control data with missing exposure variable. The performance of the proposed method is evaluated through simulation studies, and the robustness of the proposed method for violation of different types of model assumptions has been considered.  相似文献   

2.
We consider matched case-control familial studies which match a group of patients, called "case probands," with a group of disease-free subjects, called "control probands," using a set of family-level matching variables. Family members of each proband are then recruited into the study. Of interest here is the familial aggregation of the response variable and the effects of subject-specific covariates on the response. We propose an estimating equation approach to jointly estimate the main effects and intrafamilial correlations for matched family studies with a continuous outcome. Only knowledge of the first two joint moments of the response variable is required. The induced estimators for the main effects and intrafamilial correlations are consistent and asymptotically normally distributed. We apply the proposed method to sleep apnea data. A simulation study demonstrates the usefulness of our approach.  相似文献   

3.
The paper considers the problem of determining the number of matched sets in 1 : M matched case-control studies with a categorical exposure having k + 1 categories, k > or = 1. The basic interest lies in constructing a test statistic to test whether the exposure is associated with the disease. Estimates of the k odds ratios for 1 : M matched case-control studies with dichotomous exposure and for 1 : 1 matched case-control studies with exposure at several levels are presented in Breslow and Day (1980), but results holding in full generality were not available so far. We propose a score test for testing the hypothesis of no association between disease and the polychotomous exposure. We exploit the power function of this test statistic to calculate the required number of matched sets to detect specific departures from the null hypothesis of no association. We also consider the situation when there is a natural ordering among the levels of the exposure variable. For ordinal exposure variables, we propose a test for detecting trend in disease risk with increasing levels of the exposure variable. Our methods are illustrated with two datasets, one is a real dataset on colorectal cancer in rats and the other a simulated dataset for studying disease-gene association.  相似文献   

4.
D C Thomas  M Blettner  N E Day 《Biometrics》1992,48(3):781-794
A method is proposed for analysis of nested case-control studies that combines the matched comparison of covariate values between cases and controls and a comparison of the observed numbers of cases in the nesting cohort with expected numbers based on external rates and average relative risks estimated from the controls. The former comparison is based on the conditional likelihood for matched case-control studies and the latter on the unconditional likelihood for Poisson regression. It is shown that the two likelihoods are orthogonal and that their product is an estimator of the full survival likelihood that would have been obtained on the total cohort, had complete covariate data been available. Parameter estimation and significance tests follow in the usual way by maximizing this product likelihood. The method is illustrated using data on leukemia following irradiation for cervical cancer. In this study, the original cohort study showed a clear excess of leukemia in the first 15 years after exposure, but it was not feasible to obtain dose estimates on the entire cohort. However, the subsequent nested case-control study failed to demonstrate significant differences between alternative dose-response relations and effects of time-related modifiers. The combined analysis allows much clearer discrimination between alternative dose-time-response models.  相似文献   

5.
We investigate methods for regression analysis when covariates are measured with errors. In a subset of the whole cohort, a surrogate variable is available for the true unobserved exposure variable. The surrogate variable satisfies the classical measurement error model, but it may not have repeated measurements. In addition to the surrogate variables that are available among the subjects in the calibration sample, we assume that there is an instrumental variable (IV) that is available for all study subjects. An IV is correlated with the unobserved true exposure variable and hence can be useful in the estimation of the regression coefficients. We propose a robust best linear estimator that uses all the available data, which is the most efficient among a class of consistent estimators. The proposed estimator is shown to be consistent and asymptotically normal under very weak distributional assumptions. For Poisson or linear regression, the proposed estimator is consistent even if the measurement error from the surrogate or IV is heteroscedastic. Finite-sample performance of the proposed estimator is examined and compared with other estimators via intensive simulation studies. The proposed method and other methods are applied to a bladder cancer case-control study.  相似文献   

6.
We present a Bayesian approach to analyze matched "case-control" data with multiple disease states. The probability of disease development is described by a multinomial logistic regression model. The exposure distribution depends on the disease state and could vary across strata. In such a model, the number of stratum effect parameters grows in direct proportion to the sample size leading to inconsistent MLEs for the parameters of interest even when one uses a retrospective conditional likelihood. We adopt a semiparametric Bayesian framework instead, assuming a Dirichlet process prior with a mixing normal distribution on the distribution of the stratum effects. We also account for possible missingness in the exposure variable in our model. The actual estimation is carried out through a Markov chain Monte Carlo numerical integration scheme. The proposed methodology is illustrated through simulation and an example of a matched study on low birth weight of newborns (Hosmer, D. A. and Lemeshow, S., 2000, Applied Logistic Regression) with two possible disease groups matched with a control group.  相似文献   

7.
Lu SE  Wang MC 《Biometrics》2002,58(4):764-772
Cohort case-control design is an efficient and economical design to study risk factors for disease incidence or mortality in a large cohort. In the last few decades, a variety of cohort case-control designs have been developed and theoretically justified. These designs have been exclusively applied to the analysis of univariate failure-time data. In this work, a cohort case-control design adapted to multivariate failure-time data is developed. A risk set sampling method is proposed to sample controls from nonfailures in a large cohort for each case matched by failure time. This method leads to a pseudolikelihood approach for the estimation of regression parameters in the marginal proportional hazards model (Cox, 1972, Journal of the Royal Statistical Society, Series B 34, 187-220), where the correlation structure between individuals within a cluster is left unspecified. The performance of the proposed estimator is demonstrated by simulation studies. A bootstrap method is proposed for inferential purposes. This methodology is illustrated by a data example from a child vitamin A supplementation trial in Nepal (Nepal Nutrition Intervention Project-Sarlahi, or NNIPS).  相似文献   

8.
Recently, there have been many case-control studies proposed to test for association between haplotypes and disease, which require the Hardy-Weinberg equilibrium (HWE) assumption of haplotype frequencies. As such, haplotype inference of unphased genotypes and development of haplotype-based HWE tests are crucial prior to fine mapping. The goodness-of-fit test is a frequently-used method to test for HWE for multiple tightly-linked loci. However, its degrees of freedom dramatically increase with the increase of the number of loci, which may lack the test power. Therefore, in this paper, to improve the test power for haplotype-based HWE, we first write out two likelihood functions of the observed data based on the Niu''s model (NM) and inbreeding model (IM), respectively, which can cause the departure from HWE. Then, we use two expectation-maximization algorithms and one expectation-conditional-maximization algorithm to estimate the model parameters under the HWE, IM and NM models, respectively. Finally, we propose the likelihood ratio tests LRT and LRT for haplotype-based HWE under the NM and IM models, respectively. We simulate the HWE, Niu''s, inbreeding and population stratification models to assess the validity and compare the performance of these two LRT tests. The simulation results show that both of the tests control the type I error rates well in testing for haplotype-based HWE. If the NM model is true, then LRT is more powerful. While, if the true model is the IM model, then LRT has better performance in power. Under the population stratification model, LRT is still more powerful. To this end, LRT is generally recommended. Application of the proposed methods to a rheumatoid arthritis data set further illustrates their utility for real data analysis.  相似文献   

9.
Summary In individually matched case–control studies, when some covariates are incomplete, an analysis based on the complete data may result in a large loss of information both in the missing and completely observed variables. This usually results in a bias and loss of efficiency. In this article, we propose a new method for handling the problem of missing covariate data based on a missing‐data‐induced intensity approach when the missingness mechanism does not depend on case–control status and show that this leads to a generalization of the missing indicator method. We derive the asymptotic properties of the estimates from the proposed method and, using an extensive simulation study, assess the finite sample performance in terms of bias, efficiency, and 95% confidence coverage under several missing data scenarios. We also make comparisons with complete‐case analysis (CCA) and some missing data methods that have been proposed previously. Our results indicate that, under the assumption of predictable missingness, the suggested method provides valid estimation of parameters, is more efficient than CCA, and is competitive with other, more complex methods of analysis. A case–control study of multiple myeloma risk and a polymorphism in the receptor Inter‐Leukin‐6 (IL‐6‐α) is used to illustrate our findings.  相似文献   

10.
Binary logistic model has been found useful for estimating odds ratio in case of dichotomous exposure variable under matched case-control retrospective design. We describe the use of polytomous logistic model for estimating odds ratios when the exposure of prime interests, relative to disease incidence, has more than two levels. An illustrative example is presented and discussed.  相似文献   

11.
Summary With advances in modern medicine and clinical diagnosis, case–control data with characterization of finer subtypes of cases are often available. In matched case–control studies, missingness in exposure values often leads to deletion of entire stratum, and thus entails a significant loss in information. When subtypes of cases are treated as categorical outcomes, the data are further stratified and deletion of observations becomes even more expensive in terms of precision of the category‐specific odds‐ratio parameters, especially using the multinomial logit model. The stereotype regression model for categorical responses lies intermediate between the proportional odds and the multinomial or baseline category logit model. The use of this class of models has been limited as the structure of the model implies certain inferential challenges with nonidentifiability and nonlinearity in the parameters. We illustrate how to handle missing data in matched case–control studies with finer disease subclassification within the cases under a stereotype regression model. We present both Monte Carlo based full Bayesian approach and expectation/conditional maximization algorithm for the estimation of model parameters in the presence of a completely general missingness mechanism. We illustrate our methods by using data from an ongoing matched case–control study of colorectal cancer. Simulation results are presented under various missing data mechanisms and departures from modeling assumptions.  相似文献   

12.
A mediation model explores the direct and indirect effects between an independent variable and a dependent variable by including other variables (or mediators). Mediation analysis has recently been used to dissect the direct and indirect effects of genetic variants on complex diseases using case-control studies. However, bias could arise in the estimations of the genetic variant-mediator association because the presence or absence of the mediator in the study samples is not sampled following the principles of case-control study design. In this case, the mediation analysis using data from case-control studies might lead to biased estimates of coefficients and indirect effects. In this article, we investigated a multiple-mediation model involving a three-path mediating effect through two mediators using case-control study data. We propose an approach to correct bias in coefficients and provide accurate estimates of the specific indirect effects. Our approach can also be used when the original case-control study is frequency matched on one of the mediators. We employed bootstrapping to assess the significance of indirect effects. We conducted simulation studies to investigate the performance of the proposed approach, and showed that it provides more accurate estimates of the indirect effects as well as the percent mediated than standard regressions. We then applied this approach to study the mediating effects of both smoking and chronic obstructive pulmonary disease (COPD) on the association between the CHRNA5-A3 gene locus and lung cancer risk using data from a lung cancer case-control study. The results showed that the genetic variant influences lung cancer risk indirectly through all three different pathways. The percent of genetic association mediated was 18.3% through smoking alone, 30.2% through COPD alone, and 20.6% through the path including both smoking and COPD, and the total genetic variant-lung cancer association explained by the two mediators was 69.1%.  相似文献   

13.
Lee SM  Gee MJ  Hsieh SH 《Biometrics》2011,67(3):788-798
Summary We consider the estimation problem of a proportional odds model with missing covariates. Based on the validation and nonvalidation data sets, we propose a joint conditional method that is an extension of Wang et al. (2002, Statistica Sinica 12, 555–574). The proposed method is semiparametric since it requires neither an additional model for the missingness mechanism, nor the specification of the conditional distribution of missing covariates given observed variables. Under the assumption that the observed covariates and the surrogate variable are categorical, we derived the large sample property. The simulation studies show that in various situations, the joint conditional method is more efficient than the conditional estimation method and weighted method. We also use a real data set that came from a survey of cable TV satisfaction to illustrate the approaches.  相似文献   

14.
Summary The generalized estimating equation (GEE) has been a popular tool for marginal regression analysis with longitudinal data, and its extension, the weighted GEE approach, can further accommodate data that are missing at random (MAR). Model selection methodologies for GEE, however, have not been systematically developed to allow for missing data. We propose the missing longitudinal information criterion (MLIC) for selection of the mean model, and the MLIC for correlation (MLICC) for selection of the correlation structure in GEE when the outcome data are subject to dropout/monotone missingness and are MAR. Our simulation results reveal that the MLIC and MLICC are effective for variable selection in the mean model and selecting the correlation structure, respectively. We also demonstrate the remarkable drawbacks of naively treating incomplete data as if they were complete and applying the existing GEE model selection method. The utility of proposed method is further illustrated by two real applications involving missing longitudinal outcome data.  相似文献   

15.
Auxiliary covariate data are often collected in biomedical studies when the primary exposure variable is only assessed on a subset of the study subjects. In this study, we investigate a semiparametric‐estimated likelihood estimation for the generalized linear mixed models (GLMM) in the presence of a continuous auxiliary variable. We use a kernel smoother to handle continuous auxiliary data. The method can be used to deal with missing or mismeasured covariate data problems in a variety of applications when an auxiliary variable is available and cluster sizes are not too small. Simulation study results show that the proposed method performs better than that which ignores the random effects in GLMM and that which only uses data in the validation data set. We illustrate the proposed method with a real data set from a recent environmental epidemiology study on the maternal serum 1,1‐dichloro‐2,2‐bis(p‐chlorophenyl) ethylene level in relationship to preterm births.  相似文献   

16.
Patient-reported outcomes (PRO) have gained importance in clinical and epidemiological research and aim at assessing quality of life, anxiety or fatigue for instance. Item Response Theory (IRT) models are increasingly used to validate and analyse PRO. Such models relate observed variables to a latent variable (unobservable variable) which is commonly assumed to be normally distributed. A priori sample size determination is important to obtain adequately powered studies to determine clinically important changes in PRO. In previous developments, the Raschpower method has been proposed for the determination of the power of the test of group effect for the comparison of PRO in cross-sectional studies with an IRT model, the Rasch model. The objective of this work was to evaluate the robustness of this method (which assumes a normal distribution for the latent variable) to violations of distributional assumption. The statistical power of the test of group effect was estimated by the empirical rejection rate in data sets simulated using a non-normally distributed latent variable. It was compared to the power obtained with the Raschpower method. In both cases, the data were analyzed using a latent regression Rasch model including a binary covariate for group effect. For all situations, both methods gave comparable results whatever the deviations from the model assumptions. Given the results, the Raschpower method seems to be robust to the non-normality of the latent trait for determining the power of the test of group effect.  相似文献   

17.
A variables selection method for case‐control studies is proposed that uses an adaptive weighting scheme along with a permutation method to determine if a variable is useful in differentiating the cases from the controls. This adaptive method is used to select exposure variables for the analysis of data from a bladder cancer case‐control study. An extensive simulation study shows that the adaptive method is nearly as effective at finding those variables that are related to case‐control status when normally distributed variables are used. The simulation also shows that the proposed variable selection procedure is much more effective than the stepwise discriminant analysis method when the variables are not normally distributed. (© 2004 WILEY‐VCH Verlag GmbH & Co. KGaA, Weinheim)  相似文献   

18.
The outcome-dependent sampling (ODS) design, which allows observation of exposure variable to depend on the outcome, has been shown to be cost efficient. In this article, we propose a new statistical inference method, an estimated penalized likelihood method, for a partial linear model in the setting of a 2-stage ODS with a continuous outcome. We develop the asymptotic properties and conduct simulation studies to demonstrate the performance of the proposed estimator. A real environmental study data set is used to illustrate the proposed method.  相似文献   

19.
The problem of exact conditional inference for discrete multivariate case-control data has two forms. The first is grouped case-control data, where Monte Carlo computations can be done using the importance sampling method of Booth and Butler (1999, Biometrika86, 321-332), or a proposed alternative sequential importance sampling method. The second form is matched case-control data. For this analysis we propose a new exact sampling method based on the conditional-Poisson distribution for conditional testing with one binary and one integral ordered covariate. This method makes computations on data sets with large numbers of matched sets fast and accurate. We provide detailed derivation of the constraints and conditional distributions for conditional inference on grouped and matched data. The methods are illustrated on several new and old data sets.  相似文献   

20.
Stubbendick AL  Ibrahim JG 《Biometrics》2003,59(4):1140-1150
This article analyzes quality of life (QOL) data from an Eastern Cooperative Oncology Group (ECOG) melanoma trial that compared treatment with ganglioside vaccination to treatment with high-dose interferon. The analysis of this data set is challenging due to several difficulties, namely, nonignorable missing longitudinal responses and baseline covariates. Hence, we propose a selection model for estimating parameters in the normal random effects model with nonignorable missing responses and covariates. Parameters are estimated via maximum likelihood using the Gibbs sampler and a Monte Carlo expectation maximization (EM) algorithm. Standard errors are calculated using the bootstrap. The method allows for nonmonotone patterns of missing data in both the response variable and the covariates. We model the missing data mechanism and the missing covariate distribution via a sequence of one-dimensional conditional distributions, allowing the missing covariates to be either categorical or continuous, as well as time-varying. We apply the proposed approach to the ECOG quality-of-life data and conduct a small simulation study evaluating the performance of the maximum likelihood estimates. Our results indicate that a patient treated with the vaccine has a higher QOL score on average at a given time point than a patient treated with high-dose interferon.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号