Similar Articles
20 similar articles found (search time: 31 ms)
1.
There is a great deal of recent interest in modeling right-censored clustered survival time data with a possible fraction of cured subjects who are nonsusceptible to the event of interest, using marginal mixture cure models. In this paper, we consider a semiparametric marginal mixture cure model for such data and propose to extend an existing generalized estimating equation approach with a new unbiased estimating equation for the regression parameters in the latency part of the model. The large sample properties of the regression effect estimators in both the incidence and latency parts are established. The finite sample properties of the estimators are studied in simulation studies. The proposed method is illustrated with bone marrow transplantation data and tonsil cancer data.

2.
We study bias-reduced estimators of exponentially transformed parameters in generalized linear models (GLMs) and show how they can be used to obtain bias-reduced conditional (or unconditional) odds ratios in matched case-control studies. Two options are considered and compared: the explicit approach and the implicit approach. The implicit approach is based on the modified score function, where bias-reduced estimates are obtained by solving the modified score equations iteratively. The explicit approach is shown to be a one-step approximation of this iterative procedure. To apply these approaches to the conditional analysis of matched case-control studies, with potentially unmatched confounding and with several exposures, we exploit the relation between the conditional likelihood and both the likelihood of the unconditional logit binomial GLM for matched pairs and the Cox partial likelihood for matched sets with appropriately set-up data. The properties of the estimators are evaluated in a large Monte Carlo simulation study, and an illustration with a real dataset is given. Researchers reporting results on the exponentiated scale should use bias-reduced estimators, since otherwise the effects can be under- or overestimated; the magnitude of the bias is especially large in studies with smaller sample sizes.
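For ordinary logistic regression, the modified-score (implicit) approach described above is Firth's bias reduction. The following is a minimal illustrative sketch, not the authors' conditional-likelihood implementation: it solves the modified score equations by Newton iteration, with arbitrary iteration limits and a toy dataset.

```python
import numpy as np

def firth_logistic(X, y, n_iter=50, tol=1e-8):
    """Bias-reduced (Firth) logistic regression via the modified score.

    Solves U*(beta) = X'(y - pi + h(1/2 - pi)) = 0, where h holds the
    diagonal (leverages) of the weighted hat matrix
    W^{1/2} X (X'WX)^{-1} X' W^{1/2}.
    """
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        pi = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = pi * (1.0 - pi)
        XtWX_inv = np.linalg.inv(X.T @ (X * w[:, None]))
        # leverages h_i = w_i * x_i' (X'WX)^{-1} x_i
        h = w * np.einsum("ij,jk,ik->i", X, XtWX_inv, X)
        score = X.T @ (y - pi + h * (0.5 - pi))
        step = XtWX_inv @ score
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

Unlike the ordinary MLE, which diverges under complete separation, the modified-score estimate remains finite, which is one reason bias-reduced estimates are attractive in small samples.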

3.
Meta-analysis has recently been widely utilized to combine information across multiple studies to evaluate a common effect. Integrating data from similar studies is particularly useful in genomic studies, where the individual study sample sizes are not large relative to the number of parameters of interest. In this article, we are interested in developing robust prognostic rules for the prediction of t-year survival based on multiple studies. We propose to construct a composite score for prediction by fitting a stratified semiparametric transformation model that allows the studies to have related but not identical outcomes. To evaluate the accuracy of the resulting score, we provide point and interval estimators for commonly used accuracy measures, including the time-specific receiver operating characteristic curves and the positive and negative predictive values. We apply the proposed procedures to develop prognostic rules for the 5-year survival of breast cancer patients based on five breast cancer genomic studies.

4.
The nested case-control (NCC) design is a popular sampling method in large epidemiological studies, owing to its cost-effectiveness for investigating the temporal relationship of diseases with environmental exposures or biological precursors. Thomas' maximum partial likelihood estimator is commonly used to estimate the regression parameters in Cox's model for NCC data. In this article, we consider a situation in which failure/censoring information and some crude covariates are available for the entire cohort in addition to the NCC data, and we propose an improved estimator that is asymptotically more efficient than Thomas' estimator. We adopt a projection approach that, heretofore, has only been employed in situations of random validation sampling, and show that it can be well adapted to NCC designs, where the sampling scheme is a dynamic process and is not independent across controls. Under certain conditions, consistency and asymptotic normality of the proposed estimator are established, and a consistent variance estimator is also developed. Furthermore, a simplified approximate estimator is proposed when the disease is rare. Extensive simulations are conducted to evaluate the finite sample performance of our proposed estimators and to compare their efficiency with Thomas' estimator and other competing estimators. Moreover, sensitivity analyses are conducted to examine the behavior of the proposed estimator when model assumptions are violated, and we find that the biases are reasonably small in realistic situations. We further demonstrate the proposed method with data from studies on Wilms' tumor.

5.
Weibin Zhong & Guoqing Diao, Biometrics 2023, 79(3):1959-1971
Two-phase studies such as case-cohort and nested case-control studies are widely used cost-effective sampling strategies. In the first phase, the observed failure/censoring time and inexpensive exposures are collected. In the second phase, a subgroup of subjects is selected for measurement of expensive exposures based on the information from the first phase. One challenging issue is how to utilize all the available information to conduct efficient regression analyses of the two-phase study data. This paper proposes joint semiparametric modeling of the survival outcome and the expensive exposures. Specifically, we assume a class of semiparametric transformation models for the survival outcome and a semiparametric density ratio model for the expensive exposures. The class of semiparametric transformation models includes the proportional hazards model and the proportional odds model as special cases. The density ratio model is flexible for modeling multivariate mixed-type data. We develop efficient likelihood-based estimation and inference procedures and establish the large sample properties of the nonparametric maximum likelihood estimators. Extensive numerical studies reveal that the proposed methods perform well under practical settings. The proposed methods also appear to be reasonably robust under various model mis-specifications. An application to the National Wilms Tumor Study is provided.

6.
Cong XJ, Yin G & Shen Y, Biometrics 2007, 63(3):663-672
We consider modeling correlated survival data when cluster sizes may be informative for the outcome of interest, based on a within-cluster resampling (WCR) approach and a weighted score function (WSF) method. We derive the large sample properties of the WCR estimators under the Cox proportional hazards model. We establish consistency and asymptotic normality of the regression coefficient estimators, and the weak convergence of the estimated baseline cumulative hazard function. The WSF method incorporates the inverse of cluster sizes as weights in the score function. We conduct simulation studies to assess and compare the finite-sample behaviors of the estimators, and we apply the proposed methods to a dental study as an illustration.
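The within-cluster resampling idea can be sketched in a few lines. This is an illustrative toy, not the authors' Cox-model procedure: the `estimator` argument here is a placeholder (e.g. a simple event proportion) standing in for a full regression fit.

```python
import numpy as np

def wcr_estimate(cluster_ids, values, estimator, n_resamples=500, seed=0):
    """Within-cluster resampling: repeatedly draw one unit per cluster,
    apply `estimator` to the resampled data, and average the results.

    Because every cluster contributes exactly one unit per resample,
    informative cluster sizes cannot distort the estimate."""
    rng = np.random.default_rng(seed)
    clusters = {}
    for cid, v in zip(cluster_ids, values):
        clusters.setdefault(cid, []).append(v)
    members = [np.asarray(v, dtype=float) for v in clusters.values()]
    estimates = []
    for _ in range(n_resamples):
        sample = np.array([m[rng.integers(len(m))] for m in members])
        estimates.append(estimator(sample))
    return float(np.mean(estimates))
```

For example, with clusters {A: [1, 1, 1], B: [0], C: [1]} and the mean as estimator, the naive pooled mean is 0.8, while the WCR estimate is 2/3: the large cluster A no longer dominates.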

7.
In this article, we provide a method of estimating the treatment effect in adaptive designs for censored survival data, with or without adjusting for risk factors other than the treatment indicator. Within the semiparametric Cox proportional hazards model, we propose a bias-adjusted estimator for the treatment coefficient and its asymptotic confidence interval at the end of the trial. The method for obtaining the asymptotic confidence interval and point estimator is based on a general distributional property of the final test statistic derived from the weighted linear rank statistics at the interim analyses, with or without considering nuisance covariates. The computation of the estimates is straightforward. Extensive simulation studies show that the asymptotic confidence intervals have coverage close to the nominal probability, and the proposed point estimators are nearly unbiased at practical sample sizes.

8.
Wahed AS & Tsiatis AA, Biometrics 2004, 60(1):124-133
Two-stage designs, in which patients are initially randomized to an induction therapy and then, depending on their response and consent, randomized to a maintenance therapy, are common in cancer and other clinical trials. The goal is to compare different combinations of primary and maintenance therapies and find the combination that is most beneficial. In practice, the analysis is usually conducted in two separate stages, which does not directly address the major objective of finding the best combination. Recently, Lunceford, Davidian, and Tsiatis (2002, Biometrics 58, 48-57) introduced ad hoc estimators for the survival distribution and mean restricted survival time under different treatment policies. These estimators are consistent but not efficient, and they do not incorporate information from auxiliary covariates. In this article we derive estimators that are easy to compute and more efficient than the previous estimators. We also show how to improve efficiency further by taking into account additional information from auxiliary variables. Large sample properties of these estimators are derived, and comparisons with other estimators are made using simulation. We apply our estimators to a leukemia clinical trial dataset that motivated this study.

9.
Chen J & Chatterjee N, Biometrics 2006, 62(1):28-35
Genetic epidemiologic studies often collect genotype data at multiple loci within a genomic region of interest from a sample of unrelated individuals. One popular method for analyzing such data is to assess whether haplotypes, i.e., the arrangements of alleles along individual chromosomes, are associated with the disease phenotype. For many study subjects, however, the exact haplotype configuration on the pair of homologous chromosomes cannot be derived with certainty from the available locus-specific genotype data (phase ambiguity). In this article, we consider estimating haplotype-specific association parameters in the Cox proportional hazards model, using genotype, environmental exposure, and disease endpoint data collected from cohort or nested case-control studies. We study alternative Expectation-Maximization algorithms for estimating haplotype frequencies from cohort and nested case-control studies. Based on a hazard function of the disease derived from the observed genotype data, we then propose a semiparametric method for joint estimation of relative-risk parameters and the cumulative baseline hazard function. The method is greatly simplified under a rare disease assumption, for which an asymptotic variance estimator is also proposed. The performance of the proposed estimators is assessed via simulation studies. An application of the proposed method is presented, using data from the Alpha-Tocopherol, Beta-Carotene Cancer Prevention Study.
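The phase-ambiguity problem and the EM machinery it calls for can be illustrated in the simplest two-locus biallelic case, where only the double heterozygote is ambiguous. This gene-counting sketch is a toy version of the general idea, not the authors' cohort or case-control estimator.

```python
from itertools import product

HAPS = list(product([0, 1], repeat=2))  # (allele at locus 1, allele at locus 2)

def compatible_pairs(geno):
    """Unordered haplotype pairs consistent with a two-locus genotype,
    where geno = (copies of allele '1' at locus 1, at locus 2)."""
    pairs = []
    for i, h1 in enumerate(HAPS):
        for h2 in HAPS[i:]:
            if (h1[0] + h2[0], h1[1] + h2[1]) == geno:
                pairs.append((h1, h2))
    return pairs

def em_haplotype_freqs(genotypes, n_iter=200):
    """Gene-counting EM for two-locus haplotype frequencies from
    unphased genotypes; only the double heterozygote (1, 1) has more
    than one compatible phase."""
    freqs = {h: 0.25 for h in HAPS}
    for _ in range(n_iter):
        counts = {h: 0.0 for h in HAPS}
        for geno in genotypes:
            pairs = compatible_pairs(geno)
            # E-step: posterior probability of each phase under current freqs
            weights = []
            for h1, h2 in pairs:
                w = freqs[h1] * freqs[h2]
                if h1 != h2:
                    w *= 2.0  # heterozygous pair can arise two ways
                weights.append(w)
            total = sum(weights)
            if total == 0:
                continue
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: renormalise expected haplotype counts
        n_hap = 2.0 * len(genotypes)
        freqs = {h: counts[h] / n_hap for h in HAPS}
    return freqs
```

With unambiguous genotypes the EM reduces to direct counting; the ambiguous (1, 1) genotype is split across its two phases in proportion to the current haplotype frequencies.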

10.
The assumption of independent sample units is potentially violated in survival analyses where siblings comprise a high proportion of the sample. Violation of the independence assumption causes sample data to be overdispersed relative to a binomial model, which leads to underestimates of sampling variances. A variance inflation factor, c, is therefore required to obtain appropriate variance estimates. We evaluated overdispersion in fetal and neonatal mule deer (Odocoileus hemionus) datasets in which more than half of the sample units consisted of siblings. We developed a likelihood function for estimating fetal survival when the fates of some fetuses are unknown, and we used several variations of the binomial model to estimate neonatal survival. We compared theoretical variance estimates obtained from these analyses with empirical variance estimates obtained from data-bootstrap analyses to estimate the overdispersion parameter, c. Our estimates of c for fetal survival ranged from 0.678 to 1.118, which indicates little to no evidence of overdispersion. For neonatal survival, 3 different models indicated that ĉ ranged from 1.1 to 1.4 and averaged 1.24-1.26, providing evidence of limited overdispersion (i.e., limited sibling dependence). Our results indicate that the fates of sibling mule deer fetuses and neonates may often be independent even though they share the same dam. Predation tends to act independently on sibling neonates because of dam-neonate behavioral adaptations. The effect of maternal characteristics on sibling fate dependence is less straightforward and may vary by circumstance. We recommend that future neonatal survival studies incorporate additional sampling intensity to accommodate modest overdispersion (i.e., ĉ = 1.25), which would facilitate a corresponding ĉ adjustment in a model selection analysis using quasi-likelihood without a reduction in power. Our computational approach could be used to evaluate sample unit dependence in other studies where the fates of individually marked siblings are monitored.
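The ĉ diagnostic compares an empirical variance with the variance a binomial model implies. A minimal sketch of that comparison, assuming a cluster (litter) bootstrap and a plain survival proportion as the statistic; this is not the authors' likelihood-based analysis:

```python
import numpy as np

def c_hat(cluster_ids, survived, n_boot=2000, seed=0):
    """Variance inflation factor c for a binomial survival rate:
    ratio of a cluster-bootstrap variance (resampling litters, not
    individuals) to the theoretical binomial variance p(1-p)/n."""
    rng = np.random.default_rng(seed)
    clusters = {}
    for cid, s in zip(cluster_ids, survived):
        clusters.setdefault(cid, []).append(s)
    groups = [np.asarray(g, dtype=float) for g in clusters.values()]
    k = len(groups)
    boot_rates = []
    for _ in range(n_boot):
        idx = rng.integers(k, size=k)            # resample whole clusters
        draw = np.concatenate([groups[i] for i in idx])
        boot_rates.append(draw.mean())
    pooled = np.concatenate(groups)
    p_hat, n = pooled.mean(), len(pooled)
    binom_var = p_hat * (1.0 - p_hat) / n
    return float(np.var(boot_rates, ddof=1) / binom_var)
```

With independent singletons the ratio sits near 1; with sibling pairs whose fates are perfectly correlated it sits near 2, matching the intuition that each pair carries only one independent fate.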

11.
Zhao and Tsiatis (1997) consider the problem of estimating the distribution of the quality-adjusted lifetime when the chronological survival time is subject to right censoring. The quality-adjusted lifetime is typically defined as a weighted sum of the times spent in certain states up until death or some other failure time. They propose an estimator and establish the relevant asymptotics under the assumption of independent censoring. In this paper we extend the data structure with a covariate process observed until the end of follow-up and identify the optimal estimation problem. Because of the curse of dimensionality, no globally efficient nonparametric estimator with good practical performance at moderate sample sizes exists. Given a correctly specified model for the hazard of censoring conditional on the observed quality-of-life and covariate processes, we propose a closed-form one-step estimator of the distribution of the quality-adjusted lifetime whose asymptotic variance attains the efficiency bound if we can correctly specify a lower-dimensional working model for the conditional distribution of quality-adjusted lifetime given the observed quality-of-life and covariate processes. The estimator remains consistent and asymptotically normal even if this latter submodel is misspecified. The practical performance of the estimators is illustrated with a simulation study. We also extend our proposed one-step estimator to the case where treatment assignment is confounded by observed risk factors, so that this estimator can be used to test a treatment effect in an observational study.

12.
In the context of right-censored and interval-censored data, we develop asymptotic formulas to compute pseudo-observations for the survival function and the restricted mean survival time (RMST). These formulas are based on the original estimators and do not involve computation of the jackknife estimators. For right-censored data, von Mises expansions of the Kaplan-Meier estimator are used to derive the pseudo-observations. For interval-censored data, a general class of parametric models for the survival function is studied, and an asymptotic representation of the pseudo-observations is derived involving the Hessian matrix and the score vector. Theoretical results justifying the use of pseudo-observations in regression are also derived. The formula is illustrated on the piecewise-constant-hazard model for the RMST. The proposed approximations are extremely accurate, even for small sample sizes, as illustrated by Monte Carlo simulations and real data. We also study the gain in computation time compared with the original jackknife method, which can be substantial for a large dataset.
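For intuition, the jackknife pseudo-observations that such closed-form approximations replace can be computed directly. A small sketch with a bare-bones Kaplan-Meier estimator (assuming the usual events-before-censorings convention at ties), not the paper's asymptotic formulas:

```python
import numpy as np

def km_survival(times, events, t):
    """Kaplan-Meier estimate of S(t); at tied times, events are
    processed before censorings."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    order = np.lexsort((1 - events, times))  # by time, events first
    times, events = times[order], events[order]
    n = len(times)
    s = 1.0
    for i in range(n):
        if times[i] > t:
            break
        if events[i]:
            s *= 1.0 - 1.0 / (n - i)         # n - i subjects still at risk
    return s

def km_pseudo_obs(times, events, t):
    """Jackknife pseudo-observations PO_i = n*S(t) - (n-1)*S_{-i}(t)."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    n = len(times)
    full = km_survival(times, events, t)
    idx = np.arange(n)
    return np.array([n * full
                     - (n - 1) * km_survival(times[idx != i],
                                             events[idx != i], t)
                     for i in range(n)])
```

A standard sanity check: with no censoring, the pseudo-observations reduce exactly to the indicators 1{T_i > t}, which is what makes them usable as responses in a regression.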

13.
Multivariate survival data arise from case-control family studies in which the ages at disease onset for family members may be correlated. In this paper, we consider a multivariate survival model with the marginal hazard function following the proportional hazards model. We use a frailty-based approach in the spirit of Glidden and Self (1999) to account for the correlation of ages at onset among family members. Specifically, we first estimate the baseline hazard function nonparametrically by the innovation theorem, and then obtain maximum pseudolikelihood estimators for the regression and correlation parameters by plugging in the baseline hazard function estimator. We establish a connection with a previously proposed generalized estimating equation-based approach. Simulation studies and an analysis of case-control family data on breast cancer illustrate the methodology's practical utility.

14.
Jinliang Wang, Molecular Ecology 2016, 25(19):4692-4711
In molecular ecology and conservation genetics studies, the important parameter of effective population size (Ne) is increasingly estimated from a single sample of individuals taken at random from a population and genotyped at a number of marker loci. Several estimators have been developed, based on the information of linkage disequilibrium (LD), heterozygote excess (HE), molecular coancestry (MC) and sibship frequency (SF) in marker data. The most popular is the LD estimator, because it is more accurate than the HE and MC estimators and simpler to calculate than the SF estimator. However, little is known about the accuracy of the LD estimator relative to that of SF, or about the robustness of all single-sample estimators when some simplifying assumptions (e.g. random mating, no linkage, no genotyping errors) are violated. This study fills these gaps, using extensive simulations to compare the biases and accuracies of the four estimators for different population properties (e.g. bottlenecks, nonrandom mating, haplodiploidy), marker properties (e.g. linkage, polymorphism) and sample properties (e.g. numbers of individuals and markers), and to compare the robustness of the four estimators when marker data are imperfect (with allelic dropouts). The simulations show that the SF estimator is more accurate, has a much wider application scope (e.g. it is suitable for nonrandom mating such as selfing, for haplodiploid species and for dominant markers) and is more robust (e.g. to the presence of linkage and to genotyping errors) than the other estimators. An empirical dataset from a Yellowstone grizzly bear population was analysed to demonstrate the use of the SF estimator in practice.

15.
In clinical settings, the necessity of treatment is often measured in terms of the patient's prognosis in the absence of treatment. Along these lines, it is often of interest to compare subgroups of patients (e.g., based on underlying diagnosis) with respect to pre-treatment survival. Such comparisons may be complicated by at least two important issues. First, mortality contrasts by subgroup may differ over follow-up time, as opposed to being constant, and may follow a form that is difficult to model parametrically. Moreover, in settings where the proportional hazards assumption fails, investigators tend to be more interested in cumulative (as opposed to instantaneous) effects on mortality. Second, pre-treatment death is censored by the receipt of treatment, and in settings where treatment assignment depends on time-dependent factors that also affect mortality, such censoring is likely to be informative. We propose semiparametric methods for contrasting subgroup-specific cumulative mortality in the presence of dependent censoring. The proposed estimators are based on the cumulative hazard function, with pre-treatment mortality assumed to follow a stratified Cox model. No functional form is assumed for the nature of the non-proportionality. Asymptotic properties of the proposed estimators are derived, and simulation studies show that the proposed methods are applicable at practical sample sizes. The methods are then applied to contrast pre-transplant mortality for acute versus chronic End-Stage Liver Disease patients.

16.
Effects of sample size on the performance of species distribution models
A wide range of modelling algorithms is used by ecologists, conservation practitioners, and others to predict species ranges from point locality data. Unfortunately, the amount of data available is limited for many taxa and regions, making it essential to quantify the sensitivity of these algorithms to sample size. This is the first study to address this need by rigorously evaluating a broad suite of algorithms with independent presence-absence data from multiple species and regions. We evaluated predictions from 12 algorithms for 46 species (from six different regions of the world) at three sample sizes (100, 30, and 10 records). We used data from natural history collections to run the models, and evaluated the quality of model predictions with the area under the receiver operating characteristic curve (AUC). With decreasing sample size, model accuracy decreased and variability increased across species and between models. Novel modelling methods that incorporate both interactions between predictor variables and complex response shapes (i.e. GBM, MARS-INT, BRUTO) performed better than most methods at large sample sizes but not at the smallest sample sizes. Other algorithms were much less sensitive to sample size, including an algorithm based on maximum entropy (MAXENT) that had among the best predictive power across all sample sizes. Relative to other algorithms, a distance metric algorithm (DOMAIN) and a genetic algorithm (OM-GARP) had intermediate performance at the largest sample size and among the best performance at the lowest sample size. No algorithm predicted consistently well with small sample size (n < 30), which should encourage highly conservative use of predictions based on small sample sizes and restrict their use to exploratory modelling.
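The AUC used for model evaluation above is equivalent to the Mann-Whitney probability that a randomly chosen presence outscores a randomly chosen absence. A minimal sketch of that computation (the study's models and data are not reproduced here):

```python
import numpy as np

def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney statistic:
    the probability that a random positive (presence) outscores a
    random negative (absence), with ties counted as 1/2."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))
```

A perfect ranking gives 1.0, a perfectly inverted ranking gives 0.0, and an uninformative model sits at 0.5.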

17.
In the context of time-to-event analysis, a primary objective is to model the risk of experiencing a particular event in relation to a set of observed predictors. The Concordance Index (C-Index) is a statistic frequently used in practice to assess how well such models discriminate between various risk levels in a population. However, the properties of conventional C-Index estimators when applied to left-truncated time-to-event data have not been well studied, despite the fact that left-truncation is commonly encountered in observational studies. We show that the limiting values of the conventional C-Index estimators depend on the underlying distribution of truncation times, which is similar to the situation with right-censoring discussed in Uno et al. (2011) [On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Statistics in Medicine 30(10), 1105-1117]. We develop a new C-Index estimator based on inverse probability weighting (IPW) that corrects for this limitation, and we generalize this estimator to settings with left-truncated and right-censored data. The proposed IPW estimators are highly robust to the underlying truncation distribution and often outperform the conventional methods in terms of bias, mean squared error, and coverage probability. We apply these estimators to evaluate a predictive survival model for mortality among patients with end-stage renal disease.
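The conventional C-Index and the shape of an IPW correction can be sketched as follows. This is an illustrative pairwise implementation, not the paper's estimator: here the per-subject weights are simply passed in, whereas the proposed method estimates them from the truncation (and censoring) distribution.

```python
import numpy as np

def c_index_ipw(times, events, risk_scores, weights=None):
    """Weighted concordance index. A pair (i, j) with times[i] < times[j]
    and events[i] == 1 is comparable; it is concordant when the model
    assigns subject i the higher risk score. `weights` are per-subject
    inverse-probability weights; each comparable pair contributes
    w_i * w_j. With weights of 1 this is the conventional estimator."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    risk = np.asarray(risk_scores, dtype=float)
    w = np.ones_like(times) if weights is None else np.asarray(weights, float)
    num = den = 0.0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue                      # censored subjects cannot anchor a pair
        for j in range(n):
            if times[i] < times[j]:
                pw = w[i] * w[j]
                den += pw
                if risk[i] > risk[j]:
                    num += pw
                elif risk[i] == risk[j]:
                    num += 0.5 * pw
    return num / den
```

The O(n^2) double loop is fine for illustration; production code would vectorize or use balanced-tree algorithms.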

18.
Liu M, Lu W & Shao Y, Biometrics 2006, 62(4):1053-1061
Interval mapping using normal mixture models has been an important tool for analyzing quantitative traits in experimental organisms. When the primary phenotype is time-to-event, it is natural to use survival models such as Cox's proportional hazards model instead of normal mixtures to model the phenotype distribution. An extra challenge in modeling time-to-event data is that the underlying population may consist of susceptible and nonsusceptible subjects. In this article, we propose a semiparametric proportional hazards mixture cure model that allows for missing covariates. We discuss applications to quantitative trait loci (QTL) mapping when the primary trait is time-to-event from a population of mixed susceptibility. This model can be used to characterize QTL effects on both susceptibility and the time-to-event distribution, and to estimate QTL location. The model can naturally incorporate covariate effects of other risk factors. Maximum likelihood estimates for the parameters in the model, as well as their corresponding variance estimates, can be obtained numerically using an EM-type algorithm. The proposed methods are assessed by simulations under practical settings and illustrated using a real dataset containing survival times of mice after infection with Listeria monocytogenes. An extension to multiple intervals is also discussed.

19.
We expand a coalescent-based method that uses serially sampled genetic data from a subdivided population to incorporate changes in the number of demes and in patterns of colonization. Often, when estimating population parameters or other parameters of interest from genetic data, the demographic structure and parameters are not constant over evolutionary time. In this paper, we develop a Bayesian Markov chain Monte Carlo method that allows for step changes in mutation, migration, and population sizes, as well as changing numbers of demes, where the times of these changes are also estimated. We show that, in parameter ranges of interest, reliable estimates can often be obtained, including the historical times of parameter changes. However, posterior densities of migration rates can be quite diffuse and the estimators somewhat biased, as reported by other authors.

20.
The efficiency of four nonparametric species richness estimators — first-order Jackknife, second-order Jackknife, Chao2 and Bootstrap — was tested using simulated quadrat sampling of two field data sets (a sandy 'Dune' and adjacent 'Swale') in high diversity shrublands (kwongan) in south-western Australia. The data sets each comprised > 100 perennial plant species and > 10 000 individuals, and the explicit (x-y co-ordinate) location of every individual. We applied two simulated sampling strategies to these data sets, based on sampling quadrats of unit sizes 1/400th and 1/100th of total plot area. For each site and sampling strategy we obtained 250 independent sample curves, of 250 quadrats each, and compared the estimators' performance using three indices of bias and precision: MRE (mean relative error), MSRE (mean squared relative error) and OVER (percentage overestimation). The analysis presented here is unique in providing sample estimates derived from a complete, field-based population census for a high diversity plant community. In general, the true reference value was approached faster, for a comparable area sampled, with the smaller quadrat size and with the Swale field data set, which was characterized by smaller plant size and higher plant density. Nevertheless, at least 15-30% of the total area needed to be sampled before reasonable estimates of St (total species richness) were obtained. In most field surveys, typically less than 1% of the total study domain is likely to be sampled, and at this sampling intensity underestimation is a problem. Results showed that the second-order Jackknife approached the actual value of St more quickly than the other estimators. All four estimators were better than Sobs (the observed number of species). However, the behaviour of the tested estimators was not as good as expected, and even with large sample size (number of quadrats sampled) all of them failed to provide reliable estimates. First- and second-order Jackknives were positively biased, whereas Chao2 and Bootstrap were negatively biased. The observed limitations in the estimators' performance suggest that there is still scope for new tools to be developed by statisticians to assist in the estimation of species richness from sample data, especially in communities with high species richness.
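The four estimators compared above have standard closed forms based on the numbers of "unique" species (found in exactly one quadrat, Q1) and "duplicate" species (exactly two quadrats, Q2). A sketch of these textbook formulas, applied to a 0/1 incidence matrix; the study's simulation machinery is not reproduced:

```python
import numpy as np

def richness_estimators(incidence):
    """Nonparametric species-richness estimators from an incidence
    matrix (rows = quadrats, columns = species, entries 0/1)."""
    inc = np.asarray(incidence, dtype=int)
    m = inc.shape[0]                      # number of quadrats
    counts = inc.sum(axis=0)              # quadrats occupied, per species
    counts = counts[counts > 0]
    s_obs = len(counts)
    q1 = int((counts == 1).sum())         # uniques
    q2 = int((counts == 2).sum())         # duplicates
    jack1 = s_obs + q1 * (m - 1) / m
    jack2 = (s_obs + q1 * (2 * m - 3) / m
             - q2 * (m - 2) ** 2 / (m * (m - 1)))
    chao2 = s_obs + (q1 ** 2 / (2 * q2) if q2 > 0
                     else q1 * (q1 - 1) / 2)   # fallback when Q2 = 0
    boot = s_obs + ((1.0 - counts / m) ** m).sum()
    return {"S_obs": s_obs, "Jack1": jack1, "Jack2": jack2,
            "Chao2": chao2, "Bootstrap": boot}
```

All four add a positive correction to S_obs, driven mainly by the rare (Q1, Q2) species, which is why their biases hinge on how many species remain locally rare in the community sampled.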


Copyright©北京勤云科技发展有限公司  京ICP备09084417号