Similar articles
 Found 20 similar articles (search time: 281 ms)
1.
Wang CY, Huang WT. Biometrics 2000, 56(1): 98-105
We consider estimation in logistic regression where some covariate variables may be missing at random. Satten and Kupper (1993, Journal of the American Statistical Association 88, 200-208) proposed estimating odds ratio parameters using methods based on the probability of exposure. By approximating a partial likelihood, we extend their idea and propose a method that estimates the cumulant-generating function of the missing covariate given observed covariates and surrogates in the controls. Our proposed method first estimates some lower order cumulants of the conditional distribution of the unobserved data and then solves a resulting estimating equation for the logistic regression parameter. A simple version of the proposed method is to replace a missing covariate by the summation of its conditional mean and conditional variance given observed data in the controls. We note that one important property of the proposed method is that, when the validation is only on controls, a class of inverse selection probability weighted semiparametric estimators cannot be applied because selection probabilities on cases are zero. The proposed estimator performs well unless the relative risk parameters are large, even though it is technically inconsistent. Small-sample simulations are conducted. We illustrate the method by an example of real data analysis.

2.
Some covariance models for longitudinal count data with overdispersion   Cited by: 9 (self-citations: 0, citations by others: 9)
P F Thall, S C Vail. Biometrics 1990, 46(3): 657-671
A family of covariance models for longitudinal counts with predictive covariates is presented. These models account for overdispersion, heteroscedasticity, and dependence among repeated observations. The approach is a quasi-likelihood regression similar to the formulation given by Liang and Zeger (1986, Biometrika 73, 13-22). Generalized estimating equations for both the covariate parameters and the variance-covariance parameters are presented. Large-sample properties of the parameter estimates are derived. The proposed methods are illustrated by an analysis of epileptic seizure count data arising from a study of progabide as an adjuvant therapy for partial seizures.

3.
Summary: We discuss design and analysis of longitudinal studies after case–control sampling, wherein interest is in the relationship between a longitudinal binary response that is related to the sampling (case–control) variable, and a set of covariates. We propose a semiparametric modeling framework based on a marginal longitudinal binary response model and an ancillary model for subjects' case–control status. In this approach, the analyst must posit the population prevalence of being a case, which is then used to compute an offset term in the ancillary model. Parameter estimates from this model are used to compute offsets for the longitudinal response model. Examining the impact of population prevalence and ancillary model misspecification, we show that time‐invariant covariate parameter estimates, other than the intercept, are reasonably robust, but intercept and time‐varying covariate parameter estimates can be sensitive to such misspecification. We study design and analysis issues impacting study efficiency, namely: choice of sampling variable and the strength of its relationship to the response, sample stratification, choice of working covariance weighting, and degree of flexibility of the ancillary model. The research is motivated by a longitudinal study following case–control sampling of the time course of attention deficit hyperactivity disorder (ADHD) symptoms.

4.
In the linear model with right-censored responses and many potential explanatory variables, regression parameter estimates may be unstable or, when the covariates outnumber the uncensored observations, not estimable. We propose an iterative algorithm for partial least squares, based on the Buckley-James estimating equation, to estimate the covariate effect and predict the response for a future subject with a given set of covariates. We use a leave-two-out cross-validation method for empirically selecting the number of components in the partial least-squares fit that approximately minimizes the error in estimating the covariate effect of a future observation. Simulation studies compare the methods discussed here with other dimension reduction techniques. Data from the AIDS Clinical Trials Group protocol 333 are used to motivate the methodology.

5.
Wang YG, Zhao Y. Biometrics 2007, 63(3): 681-689
We consider the analysis of longitudinal data when the covariance function is modeled by additional parameters to the mean parameters. In general, inconsistent estimators of the covariance (variance/correlation) parameters will be produced when the "working" correlation matrix is misspecified, which may result in great loss of efficiency of the mean parameter estimators (although consistency is preserved). We consider using different "working" correlation models for the variance and the mean parameters. In particular, we find that an independence working model should be used for estimating the variance parameters to ensure their consistency in case the correlation structure is misspecified. The designated "working" correlation matrices should be used for estimating the mean and the correlation parameters to attain high efficiency for estimating the mean parameters. Simulation studies indicate that the proposed algorithm performs very well. We also applied different estimation procedures to a data set from a clinical trial for illustration.

6.
We propose an extension to the estimating equations in generalized linear models to estimate parameters in the link function and variance structure simultaneously with regression coefficients. Rather than focusing on the regression coefficients, the purpose of these models is inference about the mean of the outcome as a function of a set of covariates, and various functionals of the mean function used to measure the effects of the covariates. A commonly used functional in econometrics, referred to as the marginal effect, is the partial derivative of the mean function with respect to any covariate, averaged over the empirical distribution of covariates in the model. We define an analogous parameter for discrete covariates. The proposed estimation method not only helps to identify an appropriate link function and to suggest an underlying distribution for a specific application but also serves as a robust estimator when no specific distribution for the outcome measure can be identified. Using Monte Carlo simulations, we show that the resulting parameter estimators are consistent. The method is illustrated with an analysis of inpatient expenditure data from a study of hospitalists.
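As a rough illustration of the marginal effect described in this abstract, the sketch below averages the partial derivative of a mean function over the observed covariate rows. The logistic mean function, the coefficient values, and the function names are assumptions chosen for illustration; the abstract does not specify them, and the paper's own estimator also estimates link and variance parameters, which this sketch does not attempt.

```python
import math


def logistic_mean(x, beta):
    """Hypothetical mean function: E[Y | x] = 1 / (1 + exp(-beta . x))."""
    eta = sum(b * xi for b, xi in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-eta))


def average_marginal_effect(rows, beta, j, eps=1e-6):
    """Average over the sample of d E[Y | x] / d x_j.

    The derivative is approximated by a central finite difference so the
    same code works for any smooth mean function, not only the logistic one.
    """
    total = 0.0
    for x in rows:
        up = list(x)
        down = list(x)
        up[j] += eps
        down[j] -= eps
        total += (logistic_mean(up, beta) - logistic_mean(down, beta)) / (2.0 * eps)
    return total / len(rows)


# Illustrative data: an intercept column followed by two covariates.
rows = [[1.0, 0.5, 0.0], [1.0, -1.2, 1.0], [1.0, 2.0, 1.0]]
beta = [0.2, 0.8, -0.5]
print(average_marginal_effect(rows, beta, j=1))
```

For the logistic mean function the derivative has the closed form beta_j * p * (1 - p), which gives a convenient check on the finite-difference result.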

7.
Incomplete covariate data are a common occurrence in studies in which the outcome is survival time. Further, studies in the health sciences often give rise to correlated, possibly censored, survival data. With no missing covariate data, if the marginal distributions of the correlated survival times follow a given parametric model, then the maximum likelihood estimating equations, naively treating the correlated survival times as independent, give consistent estimates of the relative risk parameters (Lipsitz et al., 1994, 50, 842-846). Now, suppose that some observations within a cluster have some missing covariates. We show in this paper that if one naively treats observations within a cluster as independent, one can still use the maximum likelihood estimating equations to obtain consistent estimates of the relative risk parameters. This method requires the estimation of the parameters of the distribution of the covariates. We present results from a clinical trial (Lipsitz and Ibrahim, 1996b, 2, 5-14) with five covariates, four of which have some missing values. In the trial, the clusters are the hospitals in which the patients were treated.

8.
Wang YG. Biometrics 1999, 55(3): 984-989
Troxel, Lipsitz, and Brennan (1997, Biometrics 53, 857-869) considered parameter estimation from survey data with nonignorable nonresponse and proposed weighted estimating equations to remove the biases in the complete-case analysis that ignores missing observations. This paper suggests two alternative modifications for unbiased estimation of regression parameters when a binary outcome is potentially observed at successive time points. The weighting approach of Robins, Rotnitzky, and Zhao (1995, Journal of the American Statistical Association 90, 106-121) is also modified to obtain unbiased estimating functions. The suggested estimating functions are unbiased only when the missingness probability is correctly specified, and misspecification of the missingness model will result in biases in the estimates. Simulation studies are carried out to assess the performance of different methods when the covariate is binary or normal. For the simulation models used, the efficiency of the two new methods relative to the weighting methods is about 3.0 for the slope parameter and about 2.0 for the intercept parameter when the covariate is continuous and the missingness probability is correctly specified. All methods produce substantial biases in the estimates when the missingness model is misspecified or underspecified. Analysis of data from a medical survey illustrates the use and possible differences of these estimating functions.

9.
X Liu, K Y Liang. Biometrics 1992, 48(2): 645-654
Ignoring measurement error may cause bias in the estimation of regression parameters. When the true covariates are unobservable, multiple imprecise measurements can be used in the analysis to correct for the associated bias. We suggest a simple estimating procedure that gives consistent estimates of regression parameters by using the repeated measurements with error. The relative Pitman efficiency of our estimator based on models with and without measurement error has been found to be a simple function of the number of replicates and the ratio of intra- to inter-variance of the true covariate. The procedure thus provides a guide for deciding the number of repeated measurements in the design stage. An example from a survey study is presented.
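For intuition about why the number of replicates and the intra- to inter-variance ratio matter, the sketch below computes the classical attenuation factor for a simple linear regression on the mean of k replicates with additive, independent measurement error. This is a textbook quantity used here as an illustration under those assumptions; it is not the estimator or the Pitman efficiency formula from the paper, and the function name is hypothetical.

```python
def attenuation_factor(k, variance_ratio):
    """Factor by which the naive regression slope is shrunk toward zero.

    k              -- number of replicate measurements averaged per subject
    variance_ratio -- measurement-error (intra-subject) variance divided by
                      the variance of the true covariate (inter-subject)

    Under the classical additive error model the naive slope converges to
    beta * 1 / (1 + variance_ratio / k).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if variance_ratio < 0:
        raise ValueError("variance_ratio must be non-negative")
    return 1.0 / (1.0 + variance_ratio / k)


# More replicates bring the factor closer to 1 (less bias in the naive slope).
for k in (1, 2, 4, 8):
    print(k, attenuation_factor(k, variance_ratio=1.0))
```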

10.
Efficiency of regression estimates for clustered data   Cited by: 1 (self-citations: 0, citations by others: 1)
Mancl LA, Leroux BG. Biometrics 1996, 52(2): 500-511
Statistical methods for clustered data, such as generalized estimating equations (GEE) and generalized least squares (GLS), require selecting a correlation or covariance structure to specify the dependence between observations within a cluster. Valid regression estimates can be obtained that do not depend on correct specification of the true correlation, but inappropriate specifications can result in a loss of efficiency. We derive general expressions for the asymptotic relative efficiency of GEE and GLS estimators under nested correlation structures. Efficiency is shown to depend on the covariate distribution, the cluster sizes, the response variable correlation, and the regression parameters. The results demonstrate that efficiency is quite sensitive to the between- and within-cluster variation of the covariates, and provide useful characterizations of models for which upper and lower efficiency bounds are attained. Efficiency losses for simple working correlation matrices, such as independence, can be large even for small to moderate correlations and cluster sizes.

11.
Yip PS, Lin HZ, Xi L. Biometrics 2005, 61(4): 1085-1092
A semiparametric estimation procedure is proposed to model capture-recapture data with the aim of estimating the population size for a closed population. Individuals' covariates are possibly time dependent and missing at noncaptured times and may be measured with error. A set of estimating equations (EEs) based on covariate process and capture-recapture data is constructed to estimate the relevant parameters and the population size. These EEs can be solved by an algorithm similar to an EM algorithm. Simulation results show that the proposed procedures work better than the naive estimate. In some cases they are even better than "ideal" estimates, for which the true values of covariates are available for all captured subjects over the entire experimental period. We apply the method to a capture-recapture experiment on the bird species Prinia flaviventris in Hong Kong.

12.
13.
Censored quantile regression models, which offer great flexibility in assessing covariate effects on event times, have attracted considerable research interest. In this study, we consider flexible estimation and inference procedures for competing risks quantile regression, which not only provides meaningful interpretations by using cumulative incidence quantiles but also extends the conventional accelerated failure time model by relaxing some of the stringent model assumptions, such as global linearity and unconditional independence. Current methods for censored quantile regression often involve the minimization of an L1‐type convex function or the solution of nonsmoothed estimating equations. This approach could lead to multiple roots in practical settings, particularly with multiple covariates. Moreover, variance estimation involves an unknown error distribution and most methods rely on computationally intensive resampling techniques such as bootstrapping. We extend the induced smoothing procedure for censored quantile regression to the competing risks setting. The proposed procedure permits the fast and accurate computation of quantile regression parameter estimates and standard variances by using conventional numerical methods such as the Newton–Raphson algorithm. Numerical studies show that the proposed estimators perform well and the resulting inference is reliable in practical settings. The method is finally applied to data from a soft tissue sarcoma study.

14.
We present a parametric family of regression models for interval-censored event-time (survival) data that accommodates both fixed (e.g. baseline) and time-dependent covariates. The model employs a three-parameter family of survival distributions that includes the Weibull, negative binomial, and log-logistic distributions as special cases, and can be applied to data with left, right, interval, or non-censored event times. Standard methods, such as Newton-Raphson, can be employed to estimate the model and the resulting estimates have an asymptotically normal distribution about the true values with a covariance matrix that is consistently estimated by the information function. The deviance function is described to assess model fit and a robust sandwich estimate of the covariance may also be employed to provide asymptotically robust inferences when the model assumptions do not apply. Spline functions may also be employed to allow for non-linear covariates. The model is applied to data from a long-term study of type 1 diabetes to describe the effects of longitudinal measures of glycemia (HbA1c) over time (the time-dependent covariate) on the risk of progression of diabetic retinopathy (eye disease), an interval-censored event-time outcome.

15.
We consider longitudinal studies in which the outcome observed over time is binary and the covariates of interest are categorical. With no missing responses or covariates, one specifies a multinomial model for the responses given the covariates and uses maximum likelihood to estimate the parameters. Unfortunately, incomplete data in the responses and covariates are a common occurrence in longitudinal studies. Here we assume the missing data are missing at random (Rubin, 1976, Biometrika 63, 581-592). Since all of the missing data (responses and covariates) are categorical, a useful technique for obtaining maximum likelihood parameter estimates is the EM algorithm by the method of weights proposed in Ibrahim (1990, Journal of the American Statistical Association 85, 765-769). In using the EM algorithm with missing responses and covariates, one specifies the joint distribution of the responses and covariates. Here we consider the parameters of the covariate distribution as a nuisance. In data sets where the percentage of missing data is high, the estimates of the nuisance parameters can lead to highly unstable estimates of the parameters of interest. We propose a conditional model for the covariate distribution that has several modeling advantages for the EM algorithm and provides a reduction in the number of nuisance parameters, thus providing more stable estimates in finite samples.

16.
Chen B, Zhou XH. Biometrics 2011, 67(3): 830-842
Longitudinal studies often feature incomplete response and covariate data. Likelihood-based methods such as the expectation-maximization algorithm give consistent estimators for model parameters when data are missing at random (MAR) provided that the response model and the missing covariate model are correctly specified; however, we do not need to specify the missing data mechanism. An alternative method is the weighted estimating equation, which gives consistent estimators if the missing data and response models are correctly specified; however, we do not need to specify the distribution of the covariates that have missing values. In this article, we develop a doubly robust estimation method for longitudinal data with missing response and missing covariate when data are MAR. This method is appealing in that it can provide consistent estimators if either the missing data model or the missing covariate model is correctly specified. Simulation studies demonstrate that this method performs well in a variety of situations.

17.
Shrinkage Estimators for Covariance Matrices   Cited by: 1 (self-citations: 0, citations by others: 1)
Estimation of covariance matrices in small samples has been studied by many authors. Standard estimators, like the unstructured maximum likelihood estimator (ML) or restricted maximum likelihood (REML) estimator, can be very unstable with the smallest estimated eigenvalues being too small and the largest too big. A standard approach to more stably estimating the matrix in small samples is to compute the ML or REML estimator under some simple structure that involves estimation of fewer parameters, such as compound symmetry or independence. However, these estimators will not be consistent unless the hypothesized structure is correct. If interest focuses on estimation of regression coefficients with correlated (or longitudinal) data, a sandwich estimator of the covariance matrix may be used to provide standard errors for the estimated coefficients that are robust in the sense that they remain consistent under misspecification of the covariance structure. With large matrices, however, the inefficiency of the sandwich estimator becomes worrisome. We consider here two general shrinkage approaches to estimating the covariance matrix and regression coefficients. The first involves shrinking the eigenvalues of the unstructured ML or REML estimator. The second involves shrinking an unstructured estimator toward a structured estimator. For both cases, the data determine the amount of shrinkage. These estimators are consistent and give consistent and asymptotically efficient estimates for regression coefficients. Simulations show the improved operating characteristics of the shrinkage estimators of the covariance matrix and the regression coefficients in finite samples. The final estimator chosen includes a combination of both shrinkage approaches, i.e., shrinking the eigenvalues and then shrinking toward structure.
We illustrate our approach on a sleep EEG study that requires estimation of a 24 x 24 covariance matrix and for which inferences on mean parameters critically depend on the covariance estimator chosen. We recommend making inference using a particular shrinkage estimator that provides a reasonable compromise between structured and unstructured estimators.
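The second approach in this abstract, shrinking an unstructured estimator toward a structured one, can be sketched as a convex combination of the two matrices. In the sketch below the structured target is the independence (diagonal) structure and the shrinkage weight `lam` is supplied by the caller; both choices, and the function names, are assumptions made for illustration. The paper lets the data determine the amount of shrinkage, which is not reproduced here.

```python
def ml_covariance(rows):
    """Unstructured ML covariance estimate (divides by n, not n - 1)."""
    n = len(rows)
    p = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / n
         for j in range(p)]
        for i in range(p)
    ]


def shrink_toward_diagonal(s, lam):
    """Return (1 - lam) * S + lam * diag(S).

    lam = 0 keeps the unstructured estimate; lam = 1 gives the
    independence structure. Variances on the diagonal are unchanged.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    p = len(s)
    return [
        [s[i][j] if i == j else (1.0 - lam) * s[i][j] for j in range(p)]
        for i in range(p)
    ]


# Tiny illustrative data set: two subjects, two measurements each.
s = ml_covariance([[1.0, 2.0], [3.0, 6.0]])
print(shrink_toward_diagonal(s, lam=0.25))
```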

18.
Summary: In this article, we study the estimation of mean response and regression coefficient in semiparametric regression problems when the response variable is subject to nonrandom missingness. When the missingness is independent of the response conditional on high-dimensional auxiliary information, the parametric approach may misspecify the relationship between covariates and response while the nonparametric approach is infeasible because of the curse of dimensionality. To overcome this, we study a model-based approach to condense the auxiliary information and estimate the parameters of interest nonparametrically on the condensed covariate space. Our estimators possess the double robustness property, i.e., they are consistent whenever the model for the response given auxiliary covariates or the model for the missingness given auxiliary covariates is correct. We conduct a number of simulations to compare the numerical performance between our estimators and other existing estimators in the current missing data literature, including the propensity score approach and the inverse probability weighted estimating equation. A set of real data is used to illustrate our approach.
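The double robustness property mentioned in this abstract can be seen in the standard augmented inverse probability weighted (AIPW) estimator of a mean response. The sketch below is that textbook estimator, not the condensed-covariate method of the paper; the function name is hypothetical, and the fitted missingness probabilities `pi` and outcome predictions `m` are assumed to come from models fitted elsewhere.

```python
def aipw_mean(y, observed, pi, m):
    """Augmented IPW estimate of E[Y] with missing responses.

    y[i]        -- response (ignored, may be None, when not observed)
    observed[i] -- True if y[i] was observed
    pi[i]       -- modelled probability that y[i] is observed
    m[i]        -- modelled E[Y | covariates] for subject i

    Each term is R*Y/pi - (R - pi)/pi * m, so the estimate stays consistent
    if either the pi model or the m model is correct.
    """
    total = 0.0
    for yi, ri, pii, mi in zip(y, observed, pi, m):
        r = 1.0 if ri else 0.0
        ipw_term = yi / pii if ri else 0.0
        total += ipw_term - (r - pii) / pii * mi
    return total / len(y)


# Outcome model exactly right, missingness probabilities arbitrary:
# the estimate equals the mean of the outcome predictions.
print(aipw_mean([1.0, 2.0, None, 4.0],
                [True, True, False, True],
                [0.5, 0.9, 0.2, 0.7],
                [1.0, 2.0, 3.0, 4.0]))
```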

19.
Several methods have been proposed to estimate the variance in disease liability explained by large sets of genetic markers. However, current methods do not scale up well to large sample sizes. Linear mixed models require solving high-dimensional matrix equations, and methods that use polygenic scores are very computationally intensive. Here we propose a fast analytic method that uses polygenic scores, based on the formula for the non-centrality parameter of the association test of the score. We estimate model parameters from the results of multiple polygenic score tests based on markers with p values in different intervals. We estimate parameters by maximum likelihood and use profile likelihood to compute confidence intervals. We compare various options for constructing polygenic scores, based on nested or disjoint intervals of p values, weighted or unweighted effect sizes, and different numbers of intervals, in estimating the variance explained by a set of markers, the proportion of markers with effects, and the genetic covariance between a pair of traits. Our method provides nearly unbiased estimates and confidence intervals with good coverage, although estimation of the variance is less reliable when jointly estimated with the covariance. We find that disjoint p value intervals perform better than nested intervals, but the weighting did not affect our results. A particular advantage of our method is that it can be applied to summary statistics from single markers, and so can be quickly applied to large consortium datasets. Our method, named AVENGEME (Additive Variance Explained and Number of Genetic Effects Method of Estimation), is implemented in R software.

20.
In clinical trials of chronic diseases such as acquired immunodeficiency syndrome, cancer, or cardiovascular diseases, the concept of quality-adjusted lifetime (QAL) has received more and more attention. In this paper, we consider the problem of how the covariates affect the mean QAL when the data are subject to right censoring. We allow a very general form for the mean model as a function of covariates. Using the idea of inverse probability weighting, we first construct a simple weighted estimating equation for the parameters in our mean model. We then find the form of the most efficient estimating equation, which yields the most efficient estimator for the regression parameters. Since the most efficient estimator depends on the distribution of the health history processes, and thus cannot be estimated nonparametrically, we consider different approaches for improving the efficiency of the simple weighted estimating equation using observed data. The applicability of these methods is demonstrated by both simulation experiments and a data example from a breast cancer clinical trial study.
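The inverse probability weighting idea in this abstract can be sketched in its simplest form, an intercept-only mean model: each uncensored subject is weighted by the inverse of the estimated probability of remaining uncensored at that subject's observation time, and the weighted estimating equation sum(w_i * (value_i - mu)) = 0 is solved for mu. The function name is hypothetical, and the censoring survival probabilities are assumed to be supplied ready-made (for example from a Kaplan-Meier fit to the censoring times). The quality-adjusted lifetime setting needs additional care that this sketch omits.

```python
def ipcw_mean(values, uncensored, censor_surv):
    """Solve sum(w_i * (value_i - mu)) = 0 with w_i = delta_i / K_i.

    values[i]      -- outcome for subject i (used only if uncensored)
    uncensored[i]  -- True if subject i's outcome was fully observed
    censor_surv[i] -- estimated probability of being uncensored at
                      subject i's observation time (K_i > 0)
    """
    num = 0.0
    den = 0.0
    for v, d, k in zip(values, uncensored, censor_surv):
        if not d:
            continue  # censored subjects receive weight zero
        if k <= 0.0:
            raise ValueError("censoring survival probability must be positive")
        w = 1.0 / k
        num += w * v
        den += w
    if den == 0.0:
        raise ValueError("no uncensored observations")
    return num / den


# The third subject had only a 50% chance of being uncensored, so it
# counts twice as much as the first subject.
print(ipcw_mean([2.0, 4.0, 6.0], [True, False, True], [1.0, 0.8, 0.5]))
```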
