Similar literature
 Found 20 similar articles (search time: 31 ms)
1.
Wang CY, Huang WT. Biometrics, 2000, 56(1): 98-105
We consider estimation in logistic regression where some covariate variables may be missing at random. Satten and Kupper (1993, Journal of the American Statistical Association 88, 200-208) proposed estimating odds ratio parameters using methods based on the probability of exposure. By approximating a partial likelihood, we extend their idea and propose a method that estimates the cumulant-generating function of the missing covariate given observed covariates and surrogates in the controls. Our proposed method first estimates some lower order cumulants of the conditional distribution of the unobserved data and then solves a resulting estimating equation for the logistic regression parameter. A simple version of the proposed method is to replace a missing covariate by the sum of its conditional mean and conditional variance given observed data in the controls. We note that one important property of the proposed method is that, when the validation is only on controls, a class of inverse selection probability weighted semiparametric estimators cannot be applied because selection probabilities on cases are zero. The proposed estimator performs well unless the relative risk parameters are large, even though it is technically inconsistent. Small-sample simulations are conducted. We illustrate the method by an example of real data analysis.
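
As a reading aid for the abstract above, the following is the standard series expansion of a cumulant-generating function, written for a missing covariate X given observed data Z. The symbols K, t, \mu, \sigma^2 and \kappa_j are notation introduced here for illustration and are not taken from the paper; how the paper uses the expansion in its estimating equation is not stated in the abstract and is not reconstructed here.

```latex
% Cumulant-generating function of X given Z (illustrative notation)
K_{X \mid Z}(t) \;=\; \log \mathrm{E}\!\left[ e^{tX} \mid Z \right]
  \;=\; \sum_{j \ge 1} \kappa_j(Z)\, \frac{t^j}{j!}
  \;=\; \mu(Z)\, t \;+\; \tfrac{1}{2}\, \sigma^2(Z)\, t^2 \;+\; \cdots
% with \kappa_1(Z) = \mu(Z) = E[X | Z] and \kappa_2(Z) = \sigma^2(Z) = Var(X | Z)
```

Truncating after the second term leaves only the conditional mean and the conditional variance, which is presumably why the "simple version" mentioned in the abstract involves exactly those two quantities.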

2.
Jing Qin, Yu Shen. Biometrics, 2010, 66(2): 382-392
Summary. Length-biased time-to-event data are commonly encountered in applications ranging from epidemiological cohort studies or cancer prevention trials to studies of labor economy. A longstanding statistical problem is how to assess the association of risk factors with survival in the target population given the observed length-biased data. In this article, we demonstrate how to estimate these effects under the semiparametric Cox proportional hazards model. The structure of the Cox model is changed under length-biased sampling in general. Although the existing partial likelihood approach for left-truncated data can be used to estimate covariate effects, it may not be efficient for analyzing length-biased data. We propose two estimating equation approaches for estimating the covariate coefficients under the Cox model. We use the modern stochastic process and martingale theory to develop the asymptotic properties of the estimators. We evaluate the empirical performance and efficiency of the two methods through extensive simulation studies. We use data from a dementia study to illustrate the proposed methodology, and demonstrate the computational algorithms for point estimates, which can be directly linked to the existing functions in S-PLUS or R.

3.
Tian L, Lagakos S. Biometrics, 2006, 62(3): 821-828
We develop methods for assessing the association between a binary time-dependent covariate process and a failure time endpoint when the former is observed only at a single time point and the latter is right censored, and when the observations are subject to truncation and competing causes of failure. Using a proportional hazards model for the effect of the covariate process on the failure time of interest, we develop an approach using the EM algorithm and profile likelihood for estimating the relative risk parameter and cause-specific hazards for failure. The methods are extended to account for other covariates that can influence the time-dependent covariate process and cause-specific risks of failure. We illustrate the methods with data from a recent study on the association between loss of hepatitis B e antigen and the development of hepatocellular carcinoma in a population of chronic carriers of hepatitis B.

4.
Summary. Omission of relevant covariates can lead to bias when estimating treatment or exposure effects from survival data in both randomized controlled trials and observational studies. This paper presents a general approach to assessing bias when covariates are omitted from the Cox model. The proposed method is applicable to both randomized and non-randomized studies. We distinguish between the effects of three possible sources of bias: omission of a balanced covariate, data censoring and unmeasured confounding. Asymptotic formulae for determining the bias are derived from the large sample properties of the maximum likelihood estimator. A simulation study is used to demonstrate the validity of the bias formulae and to characterize the influence of the different sources of bias. It is shown that the bias converges to fixed limits as the effect of the omitted covariate increases, irrespective of the degree of confounding. The bias formulae are used as the basis for developing a new method of sensitivity analysis to assess the impact of omitted covariates on estimates of treatment or exposure effects. In simulation studies, the proposed method gave unbiased treatment estimates and confidence intervals with good coverage when the true sensitivity parameters were known. We describe application of the method to a randomized controlled trial and a non-randomized study.

5.
This paper develops a model for repeated binary regression when a covariate is measured with error. The model allows for estimating the effect of the true value of the covariate on a repeated binary response. The choice of a probit link for the effect of the error-free covariate, coupled with normal measurement error for the error-free covariate, results in a probit model after integrating over the measurement error distribution. We propose a two-stage estimation procedure where, in the first stage, a linear mixed model is used to fit the repeated covariate. In the second stage, a model for the correlated binary responses conditional on the linear mixed model estimates is fit to the repeated binary data using generalized estimating equations. The approach is demonstrated using nutrient safety data from the Diet Intervention of School Age Children (DISC) study.
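
The claim that a probit link combined with normal error again gives a probit model rests on a standard Gaussian identity, sketched below. The symbols \alpha, \beta, m and s^2 are illustrative notation introduced here, not the paper's; in particular, treating the true covariate X as normal with mean m and variance s^2 given the observed data is an assumption made for the sketch.

```latex
% Assume P(Y = 1 | X) = \Phi(\alpha + \beta X) and X | data ~ N(m, s^2).
% Let Z ~ N(0, 1) be independent of X, so that \Phi(\alpha + \beta X) = P(Z \le \alpha + \beta X | X).
\mathrm{P}(Y = 1 \mid \text{data})
  = \mathrm{E}\!\left[ \Phi(\alpha + \beta X) \right]
  = \mathrm{P}\!\left( Z - \beta X \le \alpha \right)
  = \Phi\!\left( \frac{\alpha + \beta m}{\sqrt{1 + \beta^2 s^2}} \right)
% because Z - \beta X ~ N(-\beta m, 1 + \beta^2 s^2).
```

The marginal model is therefore still probit in m, with intercept and slope both shrunk by the factor 1 / \sqrt{1 + \beta^2 s^2}.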

6.
Right-truncated data arise when observations are ascertained retrospectively, and only subjects who experience the event of interest by the time of sampling are selected. Such a selection scheme, without adjustment, leads to biased estimation of covariate effects in the Cox proportional hazards model. The existing methods for fitting the Cox model to right-truncated data, which are based on the maximization of the likelihood or solving estimating equations with respect to both the baseline hazard function and the covariate effects, are numerically challenging. We consider two alternative simple methods based on inverse probability weighting (IPW) estimating equations, which allow consistent estimation of covariate effects under a positivity assumption and avoid estimation of baseline hazards. We discuss problems of identifiability and consistency that arise when positivity does not hold and show that although the partial tests for null effects based on these IPW methods can be used in some settings even in the absence of positivity, they are not valid in general. We propose adjusted estimating equations that incorporate the probability of observation when it is known from external sources, which results in consistent estimation. We compare the methods in simulations and apply them to the analyses of human immunodeficiency virus latency.
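
To make the idea of an IPW estimating equation concrete, here is a minimal Python sketch for the simplest possible target, a population mean, assuming each subject's probability of being observed is known. It is a generic illustration only and not the Cox-model estimator of the abstract; the function name `ipw_mean` and its arguments are hypothetical.

```python
def ipw_mean(values, obs_probs):
    """Solve the IPW estimating equation sum_i (y_i - mu) / pi_i = 0 for mu.

    values    -- outcomes y_i of the subjects that were actually observed
    obs_probs -- assumed-known probabilities pi_i that each subject is observed
    """
    if len(values) != len(obs_probs):
        raise ValueError("values and obs_probs must have the same length")
    # Positivity: every observed subject needs a strictly positive
    # observation probability, otherwise the weight 1 / pi_i is undefined.
    if any(p <= 0.0 or p > 1.0 for p in obs_probs):
        raise ValueError("observation probabilities must lie in (0, 1]")
    weights = [1.0 / p for p in obs_probs]
    # The root of the estimating equation is the weighted average of y_i.
    return sum(w * y for w, y in zip(weights, values)) / sum(weights)
```

With equal observation probabilities the result reduces to the ordinary sample mean; subjects who were less likely to be observed receive proportionally larger weights.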

7.
Dai JY, LeBlanc M, Kooperberg C. Biometrics, 2009, 65(1): 178-187
Summary. Recent results for case–control sampling suggest that when the covariate distribution is constrained by gene-environment independence, semiparametric estimation exploiting such independence yields a great deal of efficiency gain. We consider the efficient estimation of the treatment–biomarker interaction in two-phase sampling nested within randomized clinical trials, incorporating the independence between a randomized treatment and the baseline markers. We develop a Newton–Raphson algorithm based on the profile likelihood to compute the semiparametric maximum likelihood estimate (SPMLE). Our algorithm accommodates both continuous phase-one outcomes and continuous phase-two biomarkers. The profile information matrix is computed explicitly via numerical differentiation. In certain situations where computing the SPMLE is slow, we propose a maximum estimated likelihood estimator (MELE), which is also capable of incorporating the covariate independence. This estimated likelihood approach uses a one-step empirical covariate distribution and is thus straightforward to maximize. It offers a closed-form variance estimate with limited increase in variance relative to the fully efficient SPMLE. Our results suggest that exploiting the covariate independence in two-phase sampling increases the efficiency substantially, particularly for estimating treatment–biomarker interactions.

8.
Incomplete covariate data are a common occurrence in studies in which the outcome is survival time. Further, studies in the health sciences often give rise to correlated, possibly censored, survival data. With no missing covariate data, if the marginal distributions of the correlated survival times follow a given parametric model, then the estimates using the maximum likelihood estimating equations, naively treating the correlated survival times as independent, give consistent estimates of the relative risk parameters (Lipsitz et al., 1994, 50, 842-846). Now, suppose that some observations within a cluster have some missing covariates. We show in this paper that if one naively treats observations within a cluster as independent, one can still use the maximum likelihood estimating equations to obtain consistent estimates of the relative risk parameters. This method requires the estimation of the parameters of the distribution of the covariates. We present results from a clinical trial (Lipsitz and Ibrahim, 1996b, 2, 5-14) with five covariates, four of which have some missing values. In the trial, the clusters are the hospitals in which the patients were treated.

9.
Chen Q, Ibrahim JG. Biometrics, 2006, 62(1): 177-184
We consider a class of semiparametric models for the covariate distribution and missing data mechanism for missing covariate and/or response data for general classes of regression models including generalized linear models and generalized linear mixed models. Ignorable and nonignorable missing covariate and/or response data are considered. The proposed semiparametric model can be viewed as a sensitivity analysis for model misspecification of the missing covariate distribution and/or missing data mechanism. The semiparametric model consists of a generalized additive model (GAM) for the covariate distribution and/or missing data mechanism. Penalized regression splines are used to express the GAMs as a generalized linear mixed effects model, in which the variance of the corresponding random effects provides an intuitive index for choosing between the semiparametric and parametric model. Maximum likelihood estimates are then obtained via the EM algorithm. Simulations are given to demonstrate the methodology, and a real data set from a melanoma cancer clinical trial is analyzed using the proposed methods.

10.
This paper deals with testing the functional form of the covariate effects in a Cox proportional hazards model with random effects. We assume that the responses are clustered and incomplete due to right censoring. The estimation of the model under the null (parametric covariate effect) and the alternative (nonparametric effect) is performed using the full marginal likelihood. Under the alternative, the nonparametric covariate effects are estimated using orthogonal expansions. The test statistic is the likelihood ratio statistic, and its distribution is approximated using a bootstrap method. The performance of the proposed testing procedure is studied through simulations. The method is also applied to two real data sets, one from biomedical research and one from veterinary medicine.

11.
Summary. Combining data collected from different sources can potentially enhance statistical efficiency in estimating effects of environmental or genetic factors or gene–environment interactions. However, combining data across studies becomes complicated when data are collected under different study designs, such as family-based and unrelated individual-based case–control designs. In this article, we describe likelihood-based approaches that permit the joint estimation of covariate effects on disease risk under study designs that include cases, relatives of cases, and unrelated individuals. Our methods accommodate familial residual correlation and a variety of ascertainment schemes. Extensive simulation experiments demonstrate that the proposed methods for estimation and inference perform well in realistic settings. Efficiencies of different designs are contrasted in the simulation. We applied the methods to data from the Colorectal Cancer Family Registry.

12.
Summary. Naive use of misclassified covariates leads to inconsistent estimators of covariate effects in regression models. A variety of methods have been proposed to address this problem including likelihood, pseudo-likelihood, estimating equation methods, and Bayesian methods, with all of these methods typically requiring either internal or external validation samples or replication studies. We consider a problem arising from a series of orthopedic studies in which interest lies in examining the effect of a short-term serological response and other covariates on the risk of developing a longer term thrombotic condition called deep vein thrombosis. The serological response is an indicator of whether the patient developed antibodies following exposure to an antithrombotic drug, but the seroconversion status of patients is only available at the time of a blood sample taken upon the discharge from hospital. The seroconversion time is therefore subject to a current status observation scheme, or Case I interval censoring, and subjects tested before seroconversion are misclassified as nonseroconverters. We develop a likelihood-based approach for fitting regression models that accounts for misclassification of the seroconversion status due to early testing using parametric and nonparametric estimates of the seroconversion time distribution. The method is shown to reduce the bias resulting from naive analyses in simulation studies and an application to the data from the orthopedic studies provides further illustration.

13.
Horton NJ, Laird NM. Biometrics, 2001, 57(1): 34-42
This article presents a new method for maximum likelihood estimation of logistic regression models with incomplete covariate data where auxiliary information is available. This auxiliary information is extraneous to the regression model of interest but predictive of the covariate with missing data. Ibrahim (1990, Journal of the American Statistical Association 85, 765-769) provides a general method for estimating generalized linear regression models with missing covariates using the EM algorithm that is easily implemented when there is no auxiliary data. Vach (1997, Statistics in Medicine 16, 57-72) describes how the method can be extended when the outcome and auxiliary data are conditionally independent given the covariates in the model. The new method allows the incorporation of auxiliary data without making the conditional independence assumption. We suggest tests of conditional independence and compare the performance of several estimators in an example concerning mental health service utilization in children. Using an artificial dataset, we compare the performance of several estimators when auxiliary data are available.

14.
Stubbendick AL, Ibrahim JG. Biometrics, 2003, 59(4): 1140-1150
This article analyzes quality of life (QOL) data from an Eastern Cooperative Oncology Group (ECOG) melanoma trial that compared treatment with ganglioside vaccination to treatment with high-dose interferon. The analysis of this data set is challenging due to several difficulties, namely, nonignorable missing longitudinal responses and baseline covariates. Hence, we propose a selection model for estimating parameters in the normal random effects model with nonignorable missing responses and covariates. Parameters are estimated via maximum likelihood using the Gibbs sampler and a Monte Carlo expectation maximization (EM) algorithm. Standard errors are calculated using the bootstrap. The method allows for nonmonotone patterns of missing data in both the response variable and the covariates. We model the missing data mechanism and the missing covariate distribution via a sequence of one-dimensional conditional distributions, allowing the missing covariates to be either categorical or continuous, as well as time-varying. We apply the proposed approach to the ECOG quality-of-life data and conduct a small simulation study evaluating the performance of the maximum likelihood estimates. Our results indicate that a patient treated with the vaccine has a higher QOL score on average at a given time point than a patient treated with high-dose interferon.

15.
The conventional line transect approach of estimating effective search width from the perpendicular distance distribution is inappropriate in certain types of surveys, e.g., when an unknown fraction of the animals on the track line is detected, the animals can be observed only at discrete points in time, there are errors in positional measurements, and covariate heterogeneity exists in detectability. For such situations a hazard probability framework for independent observer surveys is developed. The likelihood of the data, including observed positions of both initial and subsequent observations of animals, is established under the assumption of no measurement errors. To account for measurement errors and possibly other complexities, this likelihood is modified by a function estimated from extensive simulations. This general method of simulated likelihood is explained and the methodology applied to data from a double-platform survey of minke whales in the northeastern Atlantic in 1995.

16.
Two-stage designs have long been recognized as a cost-effective way of conducting biomedical studies. In many trials, auxiliary covariate information may also be available, and it is of interest to exploit these auxiliary data to improve the efficiency of inferences. In this paper, we propose a 2-stage design with continuous outcome where the second-stage data are sampled with an "outcome-auxiliary-dependent sampling" (OADS) scheme. We propose an estimator which is the maximizer of an estimated likelihood function. We show that the proposed estimator is consistent and asymptotically normally distributed. The simulation study indicates that greater study efficiency gains can be achieved under the proposed 2-stage OADS design by utilizing the auxiliary covariate information when compared with other alternative sampling schemes. We illustrate the proposed method by analyzing a data set from an environmental epidemiologic study.

17.
Holcroft CA, Spiegelman D. Biometrics, 1999, 55(4): 1193-1201
We compared several validation study designs for estimating the odds ratio of disease with misclassified exposure. We assumed that the outcome and misclassified binary covariate are available and that the error-free binary covariate is measured in a subsample, the validation sample. We considered designs in which the total size of the validation sample is fixed and the probability of selection into the validation sample may depend on outcome and misclassified covariate values. Design comparisons were conducted for rare and common disease scenarios, where the optimal design is the one that minimizes the variance of the maximum likelihood estimator of the true log odds ratio relating the outcome to the exposure of interest. Misclassification rates were assumed to be independent of the outcome. We used a sensitivity analysis to assess the effect of misspecifying the misclassification rates. Under the scenarios considered, our results suggested that a balanced design, which allocates equal numbers of validation subjects into each of the four outcome/mismeasured covariate categories, is preferable for its simplicity and good performance. A user-friendly Fortran program is available from the second author, which calculates the optimal sampling fractions for all designs considered and the efficiencies of these designs relative to the optimal hybrid design for any scenario of interest.

18.
Yan W, Hu Y, Geng Z. Biometrics, 2012, 68(1): 121-128
We discuss identifiability and estimation of causal effects of a treatment in subgroups defined by a covariate that is sometimes missing due to death, which is different from a problem with outcomes censored by death. Frangakis et al. (2007, Biometrics 63, 641-662) proposed an approach for estimating the causal effects under a strong monotonicity (SM) assumption. In this article, we focus on identifiability of the joint distribution of the covariate, treatment and potential outcomes, show sufficient conditions for identifiability, and relax the SM assumption to monotonicity (M) and no-interaction (NI) assumptions. We derive expectation-maximization algorithms for finding the maximum likelihood estimates of parameters of the joint distribution under different assumptions. Further, we remove the M and NI assumptions, and prove that signs of the causal effects of a treatment in the subgroups are identifiable, which means that their bounds do not cover zero. We perform simulations and a sensitivity analysis to evaluate our approaches. Finally, we apply the approaches to the National Study on the Costs and Outcomes of Trauma Centers data, which are also analyzed by Frangakis et al. (2007) and Xie and Murphy (2007, Biometrics 63, 655-658).

19.
D C Thomas, M Blettner, N E Day. Biometrics, 1992, 48(3): 781-794
A method is proposed for analysis of nested case-control studies that combines the matched comparison of covariate values between cases and controls and a comparison of the observed numbers of cases in the nesting cohort with expected numbers based on external rates and average relative risks estimated from the controls. The former comparison is based on the conditional likelihood for matched case-control studies and the latter on the unconditional likelihood for Poisson regression. It is shown that the two likelihoods are orthogonal and that their product is an estimator of the full survival likelihood that would have been obtained on the total cohort, had complete covariate data been available. Parameter estimation and significance tests follow in the usual way by maximizing this product likelihood. The method is illustrated using data on leukemia following irradiation for cervical cancer. In this study, the original cohort study showed a clear excess of leukemia in the first 15 years after exposure, but it was not feasible to obtain dose estimates on the entire cohort. However, the subsequent nested case-control study failed to demonstrate significant differences between alternative dose-response relations and effects of time-related modifiers. The combined analysis allows much clearer discrimination between alternative dose-time-response models.

20.
K Y Liang. Biometrics, 1987, 43(2): 289-299
A class of estimating functions is proposed for the estimation of multivariate relative risk in stratified case-control studies. It reduces to the well-known Mantel-Haenszel estimator when there is a single binary risk factor. Large-sample properties of the solutions to the proposed estimating equations are established for two distinct situations. Efficiency calculations suggest that the proposed estimators are nearly fully efficient relative to the conditional maximum likelihood estimator for the parameters considered. Application of the proposed method to family data and longitudinal data, where the conditional likelihood approach fails, is discussed. Two examples from case-control studies and one example from a study on familial aggregation are presented.
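
For readers unfamiliar with the special case mentioned above, the following Python sketch computes the classical Mantel-Haenszel pooled odds ratio for a single binary risk factor across strata. It illustrates only that special case, not the multivariate estimating functions of the paper; the function name and the (a, b, c, d) cell layout are conventions chosen here.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio over 2x2 tables.

    Each stratum is a tuple (a, b, c, d) with
      a = exposed cases,    b = unexposed cases,
      c = exposed controls, d = unexposed controls.
    The estimate is sum(a*d/n) / sum(b*c/n), where n = a + b + c + d.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum carries no information
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0.0:
        raise ZeroDivisionError("no stratum contributes discordant cells b*c")
    return numerator / denominator
```

For a single stratum this reduces to the usual cross-product ratio a*d / (b*c); with several strata each table is weighted by 1/n, which keeps sparse strata from dominating the estimate.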

