Similar Literature

20 similar records found.
1.
In clinical and epidemiological studies, information on the primary outcome of interest, that is, the disease status, is usually collected at a limited number of follow-up visits. The disease status can often only be retrieved retrospectively in individuals who are alive at follow-up, but will be missing for those who died before. Right-censoring the death cases at the last visit (ad-hoc analysis) yields biased hazard ratio estimates of a potential risk factor, and the bias can be substantial and occur in either direction. In this work, we investigate three different approaches that use the same likelihood contributions derived from an illness-death multistate model in order to more adequately estimate the hazard ratio by including the death cases into the analysis: a parametric approach, a penalized likelihood approach, and an imputation-based approach. We investigate to what extent these approaches allow for an unbiased regression analysis by evaluating their performance in simulation studies and on a real data example. In doing so, we use the full cohort with complete illness-death data as reference and artificially induce missing information due to death by setting discrete follow-up visits. Compared to an ad-hoc analysis, all considered approaches provide less biased or even unbiased results, depending on the situation studied. In the real data example, the parametric approach is seen to be too restrictive, whereas the imputation-based approach could almost reconstruct the original event history information.
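For orientation, the illness-death model used by all three approaches can be written as a three-state process; the rendering below is a standard textbook form in our own notation, not a formula quoted from the paper.

$$0\ (\text{healthy}) \xrightarrow{\;\lambda_{01}(t)\;} 1\ (\text{ill}) \xrightarrow{\;\lambda_{12}(t)\;} 2\ (\text{dead}), \qquad 0 \xrightarrow{\;\lambda_{02}(t)\;} 2.$$

The missing-data problem arises because a subject seen healthy at one visit and dead before the next may have reached state 2 directly or via an unobserved illness, so each death case's likelihood contribution must sum over both paths.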

2.
We investigate the use of follow-up samples of individuals to estimate survival curves from studies that are subject to right censoring from two sources: (i) early termination of the study, namely, administrative censoring, or (ii) censoring due to lost data prior to administrative censoring, so-called dropout. We assume that, for the full cohort of individuals, administrative censoring times are independent of the subjects' inherent characteristics, including survival time. To address the loss to censoring due to dropout, which we allow to be possibly selective, we consider an intensive second phase of the study where a representative sample of the originally lost subjects is subsequently followed and their data recorded. As with double-sampling designs in survey methodology, the objective is to provide data on a representative subset of the dropouts. Despite assumed full response from the follow-up sample, we show that, in general in our setting, administrative censoring times are not independent of survival times within the two subgroups, nondropouts and sampled dropouts. As a result, the stratified Kaplan-Meier estimator is not appropriate for the cohort survival curve. Moreover, using the concept of potential outcomes, as opposed to observed outcomes, and thereby explicitly formulating the problem as a missing data problem, reveals and addresses these complications. We present an estimation method based on the likelihood of an easily observed subset of the data and study its properties analytically for large samples. We evaluate our method in a realistic situation by simulating data that match published margins on survival and dropout from an actual hip-replacement study. Limitations and extensions of our design and analytic method are discussed.

3.
In many observational studies, individuals are measured repeatedly over time, although not necessarily at a set of prespecified occasions. Instead, individuals may be measured at irregular intervals, with those having a history of poorer health outcomes being measured with somewhat greater frequency and regularity; i.e., those individuals with poorer health outcomes may have more frequent follow-up measurements and the intervals between their repeated measurements may be shorter. In this article, we consider estimation of regression parameters in models for longitudinal data where the follow-up times are not fixed by design but can depend on previous outcomes. In particular, we focus on general linear models for longitudinal data where the repeated measures are assumed to have a multivariate Gaussian distribution. We consider assumptions regarding the follow-up time process that result in the likelihood function separating into two components: one for the follow-up time process, the other for the outcome process. The practical implication of this separation is that the former process can be ignored when making likelihood-based inferences about the latter; i.e., maximum likelihood (ML) estimation of the regression parameters relating the mean of the longitudinal outcomes to covariates does not require that a model for the distribution of follow-up times be specified. As a result, standard statistical software, e.g., SAS PROC MIXED (Littell et al., 1996, SAS System for Mixed Models), can be used to analyze the data. However, we also demonstrate that misspecification of the model for the covariance among the repeated measures will, in general, result in regression parameter estimates that are biased. Furthermore, results of a simulation study indicate that the potential bias due to misspecification of the covariance can be quite considerable in this setting. Finally, we illustrate these results using data from a longitudinal observational study (Lipshultz et al., 1995, New England Journal of Medicine 332, 1738-1743) that explored the cardiotoxic effects of doxorubicin chemotherapy for the treatment of acute lymphoblastic leukemia in children.
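The separation described above can be sketched in symbols (our notation, under the stated assumptions on the follow-up time process): with $y$ the repeated outcomes, $t$ the follow-up times, $\theta$ the outcome-model parameters, and $\phi$ the follow-up-process parameters,

$$L(\theta, \phi;\, y, t) \;=\; L_1(\phi;\, t) \times L_2(\theta;\, y).$$

Because $\theta$ enters only $L_2$, maximizing the full likelihood over $\theta$ reduces to maximizing $L_2$ alone; this is why standard mixed-model software can be used without specifying a model for the visit times, provided the covariance model for the repeated measures is correctly specified.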

4.
In many observational studies, individuals are measured repeatedly over time, although not necessarily at a set of pre-specified occasions. Instead, individuals may be measured at irregular intervals, with those having a history of poorer health outcomes being measured with somewhat greater frequency and regularity. In this paper, we consider likelihood-based estimation of the regression parameters in marginal models for longitudinal binary data when the follow-up times are not fixed by design, but can depend on previous outcomes. In particular, we consider assumptions regarding the follow-up time process that result in the likelihood function separating into two components: one for the follow-up time process, the other for the outcome measurement process. The practical implication of this separation is that the follow-up time process can be ignored when making likelihood-based inferences about the marginal regression model parameters. That is, maximum likelihood (ML) estimation of the regression parameters relating the probability of success at a given time to covariates does not require that a model for the distribution of follow-up times be specified. However, to obtain consistent parameter estimates, the multinomial distribution for the vector of repeated binary outcomes must be correctly specified. In general, ML estimation requires specification of all higher-order moments and the likelihood for a marginal model can be intractable except in cases where the number of repeated measurements is relatively small. To circumvent these difficulties, we propose a pseudolikelihood for estimation of the marginal model parameters. The pseudolikelihood uses a linear approximation for the conditional distribution of the response at any occasion, given the history of previous responses. The appeal of this approximation is that the conditional distributions are functions of the first two moments of the binary responses only. When the follow-up times depend only on the previous outcome, the pseudolikelihood requires correct specification of the conditional distribution of the current outcome given the outcome at the previous occasion only. Results from a simulation study and a study of asymptotic bias are presented. Finally, we illustrate the main results using data from a longitudinal observational study that explored the cardiotoxic effects of doxorubicin chemotherapy for the treatment of acute lymphoblastic leukemia in children.
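One plausible rendering of the linear approximation mentioned above (our notation; the paper's exact construction may differ) is the best linear predictor of the current binary response given the previous one, which depends only on the first two moments:

$$E\bigl(Y_{ij} \mid Y_{i,j-1}\bigr) \;\approx\; \mu_{ij} + \frac{\operatorname{Cov}(Y_{ij}, Y_{i,j-1})}{\operatorname{Var}(Y_{i,j-1})}\,\bigl(Y_{i,j-1} - \mu_{i,j-1}\bigr).$$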

5.
We develop nonparametric maximum likelihood estimation for the parameters of an irreversible three-state Markov chain (states 0, 1, and 2) from observations with interval-censored times of the 0 → 1, 0 → 2, and 1 → 2 transitions. The distinguishing aspect of the data is that, in addition to all transition times being interval censored, the times of two events (the 0 → 1 and 1 → 2 transitions) can be censored into the same interval. This development was motivated by a common data structure in oral health research, here specifically illustrated by data from a prospective cohort study on the longevity of dental veneers. Using the self-consistency algorithm, we obtain the maximum likelihood estimators of the cumulative incidences of the times to events 1 and 2 and of the intensity of the 1 → 2 transition. This work generalizes previous results on estimation in an "illness-death" model from interval-censored observations.

6.
Chan KC, Wang MC. Biometrics 2012, 68(2): 521-531.
A prevalent sample consists of individuals who have experienced disease incidence but not the failure event at the sampling time. We discuss methods for estimating the distribution function of a random vector defined at baseline for an incident disease population when data are collected by prevalent sampling. A prevalent sampling design is often more focused and economical than an incident study design for studying the survival distribution of a diseased population, but prevalent samples are biased by design. Subjects with longer survival times are more likely to be included in a prevalent cohort, and other baseline variables of interest that are correlated with survival time are also subject to sampling bias induced by the prevalent sampling scheme. Without recognition of the bias, applying the empirical distribution function to estimate the population distribution of baseline variables can lead to serious bias. In this article, nonparametric and semiparametric methods are developed for distribution estimation of baseline variables using prevalent data.

7.
Jing Qin, Yu Shen. Biometrics 2010, 66(2): 382-392.
Length-biased time-to-event data are commonly encountered in applications ranging from epidemiological cohort studies or cancer prevention trials to studies of labor economics. A longstanding statistical problem is how to assess the association of risk factors with survival in the target population given the observed length-biased data. In this article, we demonstrate how to estimate these effects under the semiparametric Cox proportional hazards model. The structure of the Cox model is changed under length-biased sampling in general. Although the existing partial likelihood approach for left-truncated data can be used to estimate covariate effects, it may not be efficient for analyzing length-biased data. We propose two estimating equation approaches for estimating the covariate coefficients under the Cox model. We use modern stochastic process and martingale theory to develop the asymptotic properties of the estimators. We evaluate the empirical performance and efficiency of the two methods through extensive simulation studies. We use data from a dementia study to illustrate the proposed methodology, and demonstrate the computational algorithms for point estimates, which can be directly linked to existing functions in S-PLUS or R.
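A standard fact behind this discussion (our notation, not a formula quoted from the article): length-biased sampling size-weights the underlying failure-time density $f$ with mean $\mu$, so the density of an observed survival time is

$$f_{LB}(t) \;=\; \frac{t\, f(t)}{\mu}, \qquad \mu = \int_0^{\infty} u\, f(u)\, du.$$

This weighting is what changes the structure of the Cox model under length-biased sampling and motivates estimating equations built for the biased-sampling scheme.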

8.
Motivated by the spatial modeling of aberrant crypt foci (ACF) in colon carcinogenesis, we consider binary data with probabilities modeled as the sum of a nonparametric mean plus a latent Gaussian spatial process that accounts for short-range dependencies. The mean is modeled in a general way using regression splines. The mean function can be viewed as a fixed effect and is estimated with a penalty for regularization. With the latent process viewed as another random effect, the model becomes a generalized linear mixed model. In our motivating data set and other applications, the sample size is too large to easily accommodate maximum likelihood or restricted maximum likelihood (REML) estimation, so pairwise likelihood, a special case of composite likelihood, is used instead. We develop an asymptotic theory for models that are sufficiently general to be used in a wide variety of applications, including, but not limited to, the problem that motivated this work. The splines have penalty parameters that must converge to zero asymptotically: we derive theory for this along with a data-driven method for selecting the penalty parameter, a method that is shown in simulations to improve greatly upon standard devices, such as likelihood cross-validation. Finally, we apply the methods to the ACF data from our experiment. We discover an unexpected location for peak formation of ACF.

9.
Interval-censored recurrent event data arise when the event of interest is not readily observed but the cumulative event count can be recorded at periodic assessment times. In some settings, chronic disease processes may resolve, and individuals will cease to be at risk of events at the time of disease resolution. We develop an expectation-maximization algorithm for fitting a dynamic mover-stayer model to interval-censored recurrent event data under a Markov model with a piecewise-constant baseline rate function given a latent process. The model is motivated by settings in which the event times and the resolution time of the disease process are unobserved. The likelihood and algorithm are shown to yield estimators with small empirical bias in simulation studies. Data are analyzed on the cumulative number of damaged joints in patients with psoriatic arthritis where individuals experience disease remission.

10.
A convenient measure of fecundability is time (number of menstrual cycles) required to achieve pregnancy. Couples attempting pregnancy are heterogeneous in their per-cycle probability of success. If success probabilities vary among couples according to a beta distribution, then cycles to pregnancy will have a beta-geometric distribution. Under this model, the inverse of the cycle-specific conception rate is a linear function of time. Data on cycles to pregnancy can be used to estimate the beta parameters by maximum likelihood in a straightforward manner with a package such as GLIM. The likelihood ratio test can thus be employed in studies of exposures that may impair fecundability. Covariates are incorporated in a natural way. The model is illustrated by applying it to data on cycles to pregnancy in smokers and nonsmokers, with adjustment for covariates. For a cross-sectional study, when length-biased sampling is taken into account, the pre-interview attempt time is shown to follow a beta-geometric distribution, so that the same methods of analysis can be applied even though all of the available data are right-censored. For a cohort followed prospectively, there will be some couples enrolled whose fecundability is effectively 0, and for such applications, the beta could be considered to be contaminated by a distribution degenerate at 0. The mixing parameter (proportion sterile) can be estimated by application of the expectation-maximization (EM) algorithm. This, too, can be carried out using GLIM.
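As a concrete illustration, the sketch below simulates couples with beta-distributed per-cycle success probabilities and fits the resulting beta-geometric model by maximum likelihood in Python. This is our own rendering, not the GLIM code described in the abstract; it covers only the uncensored, covariate-free case, and the names (alpha, beta, neg_loglik) are ours.

```python
# Hedged sketch: ML fit of the beta-geometric model for cycles to pregnancy.
import numpy as np
from scipy.special import betaln
from scipy.optimize import minimize

def neg_loglik(params, t):
    """Beta-geometric: P(T = t) = B(a + 1, b + t - 1) / B(a, b), t = 1, 2, ..."""
    a, b = np.exp(params)              # log-parameterization keeps a, b > 0
    return -np.sum(betaln(a + 1.0, b + t - 1.0) - betaln(a, b))

rng = np.random.default_rng(0)
p = rng.beta(2.0, 5.0, size=500)       # heterogeneous per-cycle success probabilities
t = rng.geometric(p)                   # cycles to pregnancy, support 1, 2, ...

fit = minimize(neg_loglik, x0=np.log([1.0, 1.0]), args=(t,), method="Nelder-Mead")
a_hat, b_hat = np.exp(fit.x)
print(f"alpha = {a_hat:.2f}, beta = {b_hat:.2f}")   # compare with (2, 5)
# Discrete hazard: h(t) = a / (a + b + t - 1), so 1/h(t) is linear in t,
# which is the linearity of the inverse conception rate noted in the abstract.
```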

11.
Ning J, Qin J, Shen Y. Biometrics 2011, 67(4): 1369-1378.
We present a natural generalization of the Buckley-James-type estimator for traditional survival data to right-censored length-biased data under the accelerated failure time (AFT) model. Length-biased data are often encountered in prevalent cohort studies and cancer screening trials. Informative right censoring induced by length-biased sampling creates additional challenges in modeling the effects of risk factors on the unbiased failure times for the target population. In this article, we evaluate covariate effects on the failure times of the target population under the AFT model given the observed length-biased data. We construct a Buckley-James-type estimating equation, develop an iterative computing algorithm, and establish the asymptotic properties of the estimators. We assess the finite-sample properties of the proposed estimators against the estimators obtained from the existing methods. Data from a prevalent cohort study of patients with dementia are used to illustrate the proposed methodology.

12.
Geriatric dentistry researchers are building a basic knowledge base pertaining to the oral health status of older adults. Important findings on the prevalence of disease that run counter to "conventional wisdom" surrounding the oral health of older adults are that edentulism is decreasing, that both coronal and root caries are prevalent, that serious periodontal disease is not as prevalent as thought, that chronological age is not strongly associated with disease in older adults, and that oral lesions, especially those related to dentures, are prevalent. An important finding has been that the majority of disease seems to occur in a minority of the population. While the prevalence of oral diseases has been shown to be associated with water fluoridation, systemic diseases, use of medications, social and behavioral factors, and a variety of other oral conditions, there is only preliminary information on the incidence of disease and actual risk factors. The data available on the incidence of disease come from a study of community-dwelling older adults in Iowa, and these data generally confirm the prevalence results. Coronal and root caries are active in this older population, with caries being the best predictor of tooth loss. Furthermore, most disease occurs in a high-risk group. Multivariate models predicting people at highest risk for root and coronal caries implicate stress and anxiety, use of tobacco, and recent onset of illness as risk factors. In addition, preliminary results indicate that some dental conditions may be predictive of general health status.

13.
14.
A class of discrete-time models of infectious disease spread, referred to as individual-level models (ILMs), is typically fitted in a Bayesian Markov chain Monte Carlo (MCMC) framework. These models quantify probabilistic outcomes regarding the risk of infection of susceptible individuals due to various susceptibility and transmissibility factors, including their spatial distance from infectious individuals. The infectious pressure exerted on susceptible individuals by infected individuals is intrinsic to these ILMs. Unfortunately, quantifying this infectious pressure for data sets containing many individuals can be computationally burdensome, leading to a time-consuming likelihood calculation and, thus, computationally prohibitive MCMC-based analysis. This problem worsens when using data augmentation to allow for uncertainty in infection times. In this paper, we develop sampling methods that can be used to calculate a fast, approximate likelihood when fitting such disease models. A simple random sampling approach is initially considered, followed by various spatially stratified schemes. We test and compare the performance of our methods with both simulated data and data from the 2001 foot-and-mouth disease (FMD) epidemic in the U.K. Our results indicate that substantial computational savings can be obtained—albeit, of course, with some information loss—suggesting that such techniques may be of use in the analysis of very large epidemic data sets.
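A minimal sketch of the sampling idea in Python (entirely our own illustration: the power-law kernel, the parameter names, and the N/m rescaling are assumptions, not the authors' code):

```python
# Hedged sketch: approximating infectious pressure by sampling infectives.
import numpy as np

def pressure_full(sus_xy, inf_xy, beta=1.0, alpha=2.0):
    """Exact pressure: power-law distance kernel summed over all infectives."""
    d = np.linalg.norm(sus_xy[:, None, :] - inf_xy[None, :, :], axis=2)
    return beta * np.sum(d ** (-alpha), axis=1)

def pressure_sampled(sus_xy, inf_xy, m, rng, beta=1.0, alpha=2.0):
    """Approximate pressure: kernel summed over m sampled infectives, rescaled."""
    sub = inf_xy[rng.choice(len(inf_xy), size=m, replace=False)]
    d = np.linalg.norm(sus_xy[:, None, :] - sub[None, :, :], axis=2)
    return beta * (len(inf_xy) / m) * np.sum(d ** (-alpha), axis=1)

rng = np.random.default_rng(1)
susceptibles = rng.uniform(0.0, 10.0, size=(200, 2))   # spatial locations
infectives = rng.uniform(0.0, 10.0, size=(2000, 2))
exact = pressure_full(susceptibles, infectives)
approx = pressure_sampled(susceptibles, infectives, m=200, rng=rng)
print(np.corrcoef(exact, approx)[0, 1])   # near 1, at a tenth of the kernel cost
# In ILMs of this type the per-step infection probability is of the form
# 1 - exp(-pressure), so the sampled pressure feeds the approximate likelihood.
```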

15.
Independent replication of linkage in previously studied pedigrees is desirable when genetic heterogeneity is suspected or when the illness is very rare. When the likelihood of the new data in this type of replication study is computed as conditional on the previously reported linkage results, it can be considered independent. We describe a simulation method using the SLINK program in which the initial data are fixed and newly genotyped individuals are simulated under theta = .01 and theta = .50. These give appropriate lod score criteria for rejection and acceptance of linkage in the follow-up study, which take into account the original marker genotypes in the data. An estimate of the power to detect linkage in the follow-up data is also generated.

16.
By monitoring the in vivo incorporation of low concentrations of radiolabeled adenine into acid-soluble compounds, we observed the unusual accumulation of two nucleosides in Saccharomyces cerevisiae that were previously considered products of nucleotide degradation. Under the culture conditions used in the present study, radiolabeled adenosine was the major acid-soluble intracellular derivative, and radiolabeled inosine was initially detected as the second most prevalent derivative in a mutant lacking adenine aminohydrolase. The use of yeast mutants defective in the conversion of adenine to hypoxanthine or to AMP renders very unlikely the possibility that the presence of adenosine and inosine is attributable to nucleotide degradation. These data can be explained by postulating the existence of two enzyme activities not previously reported in S. cerevisiae. The first of these activities transfers ribose to the purine ring and may be attributable to purine nucleoside phosphorylase (EC 2.4.2.1) or adenosine phosphorylase (EC 2.4.2.-). The second enzyme converts adenosine to inosine and in all likelihood is adenosine aminohydrolase (EC 3.5.4.4).

17.
Empirical estimates of the incubation period of influenza A (H1N1-2009) have been limited. We estimated the incubation period among confirmed imported cases who traveled to Japan from Hawaii during the early phase of the 2009 pandemic (n=72). We addressed censoring and employed an infection-age structured argument to explicitly model the daily frequency of illness onset after departure. We assumed uniform and exponential distributions for the frequency of exposure in Hawaii, and the hazard rate of infection for the latter assumption was retrieved, in Hawaii, from local outbreak data. The maximum likelihood estimates of the median incubation period range from 1.43 to 1.64 days according to different modeling assumptions, consistent with a published estimate based on a New York school outbreak. The likelihood values of the different modeling assumptions do not differ greatly from each other, although models with the exponential assumption yield slightly shorter incubation periods than those with the uniform exposure assumption. Differences between our proposed approach and a published method for doubly interval-censored analysis highlight the importance of accounting for the dependence of the frequency of exposure on the survival function of incubating individuals among imported cases. A truncation of the density function of the incubation period due to an absence of illness onset during the exposure period also needs to be considered. When the data generating process is similar to that among imported cases, and when the incubation period is close to or shorter than the length of exposure, accounting for these aspects is critical for long exposure times.
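The likelihood construction can be sketched as follows (our notation; the published analysis additionally handles the truncation and dependence issues mentioned above): with $g$ the assumed uniform or exponential density of the infection time $e$ over a stay of length $D_i$ in Hawaii, and $f_\theta$ the incubation-period density, a case with onset at time $t_i > D_i$ on the same clock contributes approximately

$$L_i(\theta) \;\propto\; \int_0^{D_i} g(e)\, f_\theta(t_i - e)\, de.$$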

18.
Mixture cure models have been utilized to analyze survival data with a possible cure fraction. This paper considers the inclusion of frailty in the mixture cure model to model recurrent event data with a cure fraction. An attractive feature of the proposed model is the allowance for heterogeneity in risk among those individuals experiencing the event of interest, in addition to the incorporation of a cured component. Maximum likelihood estimates can be obtained using the expectation-maximization (EM) algorithm, and standard errors are calculated by the bootstrap method. The model is applied to hospital readmission data among colorectal cancer patients.
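For context, the standard mixture cure decomposition that this paper extends (a textbook form in our notation; the frailty $u$ marks where the proposed heterogeneity enters):

$$S_{\mathrm{pop}}(t) \;=\; \pi + (1 - \pi)\, E_u\bigl[S_u(t \mid u)\bigr],$$

where $\pi$ is the cured fraction and $S_u(\cdot \mid u)$ is the survival function of uncured individuals given frailty $u$, which captures heterogeneity in risk among those who experience the event.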

19.
This article develops semiparametric approaches for the estimation of propensity scores and causal survival functions from prevalent survival data. The analytical problem arises when prevalent sampling is adopted for collecting failure times and, as a result, the covariates are incompletely observed due to their association with the failure time. The proposed procedure for estimating propensity scores shares interesting features with the likelihood formulation in case-control studies, but in our case it requires additional consideration of the intercept term. The result shows that the corrected propensity scores in the logistic regression setting can be obtained through the standard estimation procedure with specific adjustments to the intercept term. For causal estimation, two different types of missing sources are encountered in our model: one can be explained by the potential outcome framework; the other is caused by the prevalent sampling scheme. Statistical analysis without adjusting for bias from both sources of missingness will lead to biased results in causal inference. The proposed methods were partly motivated by and applied to the Surveillance, Epidemiology, and End Results (SEER)-Medicare linked data for women diagnosed with breast cancer.

20.
Demographic studies focusing on age-specific mortality rates are becoming increasingly common throughout the fields of life-history evolution, ecology and biogerontology. Well-defined statistical techniques for quantifying patterns of mortality within a cohort and identifying differences in age-specific mortality among cohorts are needed. Here I discuss using maximum likelihood (ML) statistical methods to estimate the parameters of mathematical models, which are used to describe the change in mortality with age. ML provides a convenient and powerful framework for choosing an adequate mortality model, estimating model parameters and testing hypotheses about differences in parameters among experimental or ecological treatments. Simulations suggest that experiments designed to estimate age-specific mortality should involve at least 100-500 individuals per cohort per treatment. Significant bias in the estimation of model parameters is introduced when the mortality model is misspecified and samples are too small to detect the true mortality pattern. Furthermore, the lack of simple and efficient procedures for comparing different mortality models has forced the use of the Gompertz model, which specifies an exponentially increasing mortality with age, and which may not apply to the majority of experimental systems.
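As a small worked example of the ML framework advocated here, the Python sketch below simulates ages at death from a Gompertz model, whose hazard is $\mu(x) = a e^{bx}$, and recovers the parameters by maximum likelihood. It is our own illustration with assumed parameter values, not code or results from the paper.

```python
# Hedged sketch: ML estimation of Gompertz mortality parameters.
import numpy as np
from scipy.optimize import minimize

def neg_loglik(params, x):
    """Gompertz: mu(x) = a*exp(b*x), S(x) = exp(-(a/b) * (exp(b*x) - 1))."""
    a, b = np.exp(params)                   # log-parameterization keeps a, b > 0
    return -np.sum(np.log(a) + b * x - (a / b) * np.expm1(b * x))

rng = np.random.default_rng(2)
a_true, b_true = 0.01, 0.08
u = rng.uniform(size=500)                   # a cohort of 500, per the advice above
x = np.log1p(-(b_true / a_true) * np.log(u)) / b_true   # inverse-CDF Gompertz draws

fit = minimize(neg_loglik, x0=np.log([0.02, 0.05]), args=(x,), method="Nelder-Mead")
print(np.exp(fit.x))                        # compare with (a_true, b_true)
```

Competing mortality models (e.g., a logistic hazard allowing late-life deceleration) can be fitted to the same data and compared via their maximized log-likelihoods, which is the model-choice step the abstract argues is too often skipped in favor of a default Gompertz fit.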
