Similar Literature
20 similar documents found (search time: 15 ms)
1.
Weibin Zhong, Guoqing Diao. Biometrics 2023, 79(3): 1959-1971
Two-phase studies such as case-cohort and nested case-control studies are widely used cost-effective sampling strategies. In the first phase, the observed failure/censoring time and inexpensive exposures are collected. In the second phase, a subgroup of subjects is selected for measurements of expensive exposures based on the information from the first phase. One challenging issue is how to utilize all the available information to conduct efficient regression analyses of the two-phase study data. This paper proposes a joint semiparametric modeling of the survival outcome and the expensive exposures. Specifically, we assume a class of semiparametric transformation models and a semiparametric density ratio model for the survival outcome and the expensive exposures, respectively. The class of semiparametric transformation models includes the proportional hazards model and the proportional odds model as special cases. The density ratio model is flexible in modeling multivariate mixed-type data. We develop efficient likelihood-based estimation and inference procedures and establish the large sample properties of the nonparametric maximum likelihood estimators. Extensive numerical studies reveal that the proposed methods perform well under practical settings. The proposed methods also appear to be reasonably robust under various model misspecifications. An application to the National Wilms Tumor Study is provided.
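For concreteness, such a transformation class is often written with the logarithmic transformation family; in one standard parameterization (our notation, not necessarily the authors'),

```latex
\Lambda(t \mid Z) = G\!\left\{\int_0^t \exp\{\beta^\top Z(s)\}\, d\Lambda(s)\right\},
\qquad
G(x) =
\begin{cases}
\log(1 + r x)/r, & r > 0,\\
x, & r = 0,
\end{cases}
```

so that r = 0 recovers the proportional hazards model and r = 1 the proportional odds model.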

2.
Estimating the effects of haplotypes on the age of onset of a disease is an important step toward the discovery of genes that influence complex human diseases. A haplotype is a specific sequence of nucleotides on the same chromosome of an individual and can only be measured indirectly through the genotype. We consider cohort studies which collect genotype data on a subset of cohort members through case-cohort or nested case-control sampling. We formulate the effects of haplotypes and possibly time-varying environmental variables on the age of onset through a broad class of semiparametric regression models. We construct appropriate nonparametric likelihoods, which involve both finite- and infinite-dimensional parameters. The corresponding nonparametric maximum likelihood estimators are shown to be consistent, asymptotically normal, and asymptotically efficient. Consistent variance-covariance estimators are provided, and efficient and reliable numerical algorithms are developed. Simulation studies demonstrate that the asymptotic approximations are accurate in practical settings and that case-cohort and nested case-control designs are highly cost-effective. An application to a major cardiovascular study is provided.

3.
Chen J, Chatterjee N. Biometrics 2006, 62(1): 28-35
Genetic epidemiologic studies often collect genotype data at multiple loci within a genomic region of interest from a sample of unrelated individuals. One popular method for analyzing such data is to assess whether haplotypes, i.e., the arrangements of alleles along individual chromosomes, are associated with the disease phenotype or not. For many study subjects, however, the exact haplotype configuration on the pair of homologous chromosomes cannot be derived with certainty from the available locus-specific genotype data (phase ambiguity). In this article, we consider estimating haplotype-specific association parameters in the Cox proportional hazards model, using genotype, environmental exposure, and the disease endpoint data collected from cohort or nested case-control studies. We study alternative Expectation-Maximization algorithms for estimating haplotype frequencies from cohort and nested case-control studies. Based on a hazard function of the disease derived from the observed genotype data, we then propose a semiparametric method for joint estimation of relative-risk parameters and the cumulative baseline hazard function. The method is greatly simplified under a rare disease assumption, for which an asymptotic variance estimator is also proposed. The performance of the proposed estimators is assessed via simulation studies. An application of the proposed method is presented, using data from the Alpha-Tocopherol, Beta-Carotene Cancer Prevention Study.
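The core EM iteration for haplotype frequencies from unphased genotypes is standard; below is a minimal sketch of the plain Hardy-Weinberg version (it omits the cohort/nested case-control corrections that are the paper's contribution; function names and the 0/1 allele coding are ours):

```python
import numpy as np
from itertools import product
from collections import defaultdict

def compatible_pairs(geno):
    """All phase resolutions (hap1, hap2) of an unphased multilocus genotype.

    geno: tuple of per-locus allele pairs, e.g. ((0, 0), (0, 1)) is
    homozygous at locus 1 and heterozygous at locus 2."""
    het = [i for i, (a, b) in enumerate(geno) if a != b]
    pairs = set()
    # Fix the phase at the first heterozygous locus so each unordered
    # resolution is generated exactly once.
    for bits in product([0, 1], repeat=max(len(het) - 1, 0)):
        assign = dict(zip(het[1:], bits))
        if het:
            assign[het[0]] = 0
        h1, h2 = [], []
        for i, (a, b) in enumerate(geno):
            if a == b:
                h1.append(a); h2.append(a)
            else:
                lo, hi = sorted((a, b))
                h1.append(lo if assign[i] == 0 else hi)
                h2.append(hi if assign[i] == 0 else lo)
        pairs.add((tuple(h1), tuple(h2)))
    return list(pairs)

def em_haplotype_freqs(genotypes, n_iter=200):
    """EM estimate of haplotype frequencies under Hardy-Weinberg equilibrium."""
    resolutions = [compatible_pairs(g) for g in genotypes]
    haps = {h for res in resolutions for pair in res for h in pair}
    f = dict.fromkeys(haps, 1.0 / len(haps))
    for _ in range(n_iter):
        counts = defaultdict(float)
        for res in resolutions:
            # E-step: posterior of each resolution, P(h1, h2) ∝ c * f[h1] * f[h2],
            # with c = 2 for heterozygous pairs and c = 1 for homozygous pairs.
            w = np.array([(1.0 if h1 == h2 else 2.0) * f[h1] * f[h2]
                          for h1, h2 in res])
            w /= w.sum()
            for (h1, h2), wi in zip(res, w):
                counts[h1] += wi
                counts[h2] += wi
        # M-step: expected haplotype counts over the 2N sampled chromosomes.
        f = {h: counts[h] / (2.0 * len(genotypes)) for h in haps}
    return f

# Example: three individuals typed at two biallelic loci.
freqs = em_haplotype_freqs([((0, 0), (0, 1)), ((0, 1), (0, 1)), ((1, 1), (1, 1))])
```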

4.
Sequential designs for phase I clinical trials which incorporate maximum likelihood estimates (MLE) as data accrue are inherently problematic because of limited data for estimation early on. We address this problem for small phase I clinical trials with ordinal responses. In particular, we explore the problem of the nonexistence of the MLE of the logistic parameters under a proportional odds model with one predictor. We incorporate the probability of an undetermined MLE as a restriction, as well as ethical considerations, into a proposed sequential optimal approach, which consists of a start-up design, a follow-on design, and a sequential dose-finding design. Comparisons with nonparametric sequential designs are also performed based on simulation studies with parameters drawn from a real data set.
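For reference, the working model is the proportional odds model for an ordinal response Y with a single dose predictor x; in one common parameterization (ours, not necessarily the authors'),

```latex
\operatorname{logit} P(Y \le j \mid x) = \alpha_j + \beta x, \qquad j = 1, \dots, J - 1,
```

with ordered intercepts \alpha_1 \le \dots \le \alpha_{J-1}. With the very few observations available early in a trial, complete separation of responses across dose levels can make the MLE of (\alpha, \beta) nonexistent (divergent), which is the problem the proposed restriction targets.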

5.
Cohort studies provide information on relative hazards and pure risks of disease. For rare outcomes, large cohorts are needed to have sufficient numbers of events, making it costly to obtain covariate information on all cohort members. We focus on nested case-control designs that are used to estimate relative hazard in the Cox regression model. In 1997, Langholz and Borgan showed that pure risk can also be estimated from nested case-control data. However, these approaches do not take advantage of some covariates that may be available on all cohort members. Researchers have used weight calibration to increase the efficiency of relative hazard estimates from case-cohort studies and nested case-control studies. Our objective is to extend weight calibration approaches to nested case-control designs to improve precision of estimates of relative hazards and pure risks. We show that calibrating sample weights additionally against follow-up times multiplied by relative hazards during the risk projection period improves estimates of pure risk. Efficiency improvements for relative hazards for variables that are available on the entire cohort also contribute to improved efficiency for pure risks. We develop explicit variance formulas for the weight-calibrated estimates. Simulations show how much precision is improved by calibration and confirm the validity of inference based on asymptotic normality. Examples are provided using data from the American Association of Retired Persons Diet and Health Cohort Study.
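The generic linear (chi-square distance) weight-calibration step has a closed form; here is a minimal sketch, assuming calibration to known full-cohort totals of auxiliary variables (the paper's specific auxiliaries, such as follow-up time multiplied by estimated relative hazard, would form columns of X; the function name is ours):

```python
import numpy as np

def calibrate_weights(d, X, totals):
    """Linearly calibrate design weights d so the weighted sample totals of
    the auxiliaries X match the known full-cohort totals.

    d:      (n,) design weights for the phase-two (case-control) sample
    X:      (n, p) auxiliary variables, known for every cohort member
    totals: (p,) full-cohort column totals of X
    Returns w with w @ X == totals (up to numerical error), where
    w_i = d_i * (1 + x_i' lambda) for the solved multiplier lambda."""
    Xd = X * d[:, None]                              # weight each row by d_i
    lam = np.linalg.solve(X.T @ Xd, totals - d @ X)  # normal equations for lambda
    return d * (1.0 + X @ lam)
```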

6.
Huang JZ, Liu L. Biometrics 2006, 62(3): 793-802
The Cox proportional hazards model usually assumes an exponential form for the dependence of the hazard function on covariate variables. However, in practice this assumption may be violated and other relative risk forms may be more appropriate. In this article, we consider the proportional hazards model with an unknown relative risk form. Issues in model interpretation are addressed. We propose a method to estimate the relative risk form and the regression parameters simultaneously by first approximating the logarithm of the relative risk form by a spline, and then employing the maximum partial likelihood estimation. An iterative alternating optimization procedure is developed for efficient implementation. Statistical inference of the regression coefficients and of the relative risk form based on parametric asymptotic theory is discussed. The proposed methods are illustrated using simulation and an application to the Veterans' Administration lung cancer data.
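One natural reading of this setup (a single-index form; the abstract does not spell out the structure, so treat this as our notation rather than the authors' model) is

```latex
\lambda(t \mid Z) = \lambda_0(t)\, r(Z), \qquad \log r(Z) \approx \sum_{k=1}^{K} \gamma_k B_k\!\big(\beta^\top Z\big),
```

with \{B_k\} a B-spline basis; (\beta, \gamma) then jointly maximize the usual Cox partial likelihood, which matches the alternating optimization described.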

7.
Methods for the analysis of unmatched case-control data based on a finite population sampling model are developed. Under this model, and the prospective logistic model for disease probabilities, a likelihood for case-control data that accommodates very general sampling of controls is derived. This likelihood has the form of a weighted conditional logistic likelihood. The flexibility of the methods is illustrated by providing a number of control sampling designs and a general scheme for their analyses. These include frequency matching, counter-matching, case-base, randomized recruitment, and quota sampling. A study of risk factors for childhood asthma illustrates an application of the counter-matching design. Some asymptotic efficiency results are presented and computational methods discussed. Further, it is shown that a 'marginal' likelihood provides a link to unconditional logistic methods. The methods are examined in a simulation study that compares frequency and counter-matching using conditional and unconditional logistic analyses; the results indicate that the conditional logistic likelihood has superior efficiency. Extensions that accommodate sampling of cases and multistage designs are presented. Finally, we compare the analysis methods presented here to other approaches, compare counter-matching and two-stage designs, and suggest areas for further research.
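The weighted conditional logistic likelihood referred to has the generic form (our notation)

```latex
L(\beta) = \prod_{i} \frac{\exp\{\beta^\top x_{i0} + \log w_{i0}\}}{\sum_{j \in \tilde{R}_i} \exp\{\beta^\top x_{ij} + \log w_{ij}\}},
```

where i indexes matched sets, x_{i0} is the case's covariate vector, \tilde{R}_i is the sampled set, and the weights w reflect the control-sampling design (e.g., inverse inclusion probabilities under counter-matching).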

8.
Yin G. Biometrics 2005, 61(2): 552-558
Due to natural or artificial clustering, multivariate survival data often arise in biomedical studies, for example, a dental study involving multiple teeth from each subject. A certain proportion of subjects in the population who are not expected to experience the event of interest are considered to be "cured" or insusceptible. To model correlated or clustered failure time data incorporating a surviving fraction, we propose two forms of cure rate frailty models. One model naturally introduces frailty based on biological considerations while the other is motivated from the Cox proportional hazards frailty model. We formulate the likelihood functions based on piecewise constant hazards and derive the full conditional distributions for Gibbs sampling in the Bayesian paradigm. As opposed to the Cox frailty model, the proposed methods demonstrate great potential in modeling multivariate survival data with a cure fraction. We illustrate the cure rate frailty models with a root canal therapy data set.
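Although the abstract does not give the models, the standard promotion-time (bounded cumulative hazard) construction that motivates such biologically based cure models is

```latex
S_{\text{pop}}(t) = \exp\{-\theta F(t)\}, \qquad \lim_{t \to \infty} S_{\text{pop}}(t) = e^{-\theta},
```

so the cure fraction is e^{-\theta}; clustering can then be induced by giving \theta a shared frailty, e.g. \theta_k = \theta \xi_k with \xi_k gamma-distributed. The paper's two models may differ in detail.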

9.
Proportional hazards model with covariates subject to measurement error
T Nakamura. Biometrics 1992, 48(3): 829-838
When covariates of a proportional hazards model are subject to measurement error, the maximum likelihood estimates of regression coefficients based on the partial likelihood are asymptotically biased. Prentice (1982, Biometrika 69, 331-342) presents an example of such bias and suggests a modified partial likelihood. This paper applies the corrected score function method (Nakamura, 1990, Biometrika 77, 127-137) to the proportional hazards model when measurement errors are additive and normally distributed. The result allows a simple correction to the ordinary partial likelihood that yields asymptotically unbiased estimates; the validity of the correction is confirmed via a limited simulation study.
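The moment identity underlying such corrections: if W = X + U with U \sim N(0, \Sigma) independent of X, then

```latex
E\!\left[e^{\beta^\top W} \,\middle|\, X\right] = e^{\beta^\top X}\, e^{\beta^\top \Sigma \beta / 2},
```

so e^{\beta^\top W - \beta^\top \Sigma \beta / 2} is conditionally unbiased for the true relative risk e^{\beta^\top X}; this identity is the building block of the corrected score approach to the partial likelihood.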

10.
Advances in human genetics have led to epidemiological investigations not only of the effects of genes alone but also of gene-environment (G-E) interaction. A widely accepted design strategy in the study of how G-E relate to disease risks is the population-based case-control study (PBCCS). For simple random samples, semiparametric methods for testing G-E have been developed by Chatterjee and Carroll in 2005. The use of complex sampling in PBCCS that involve differential probabilities of sample selection of cases and controls and possibly cluster sampling is becoming more common. Two complexities, weighting for selection probabilities and intracluster correlation of observations, are induced by the complex sampling. We develop pseudo-semiparametric maximum likelihood estimators (pseudo-SPMLE) that apply to PBCCS with complex sampling. We study the finite sample performance of the pseudo-SPMLE using simulations and illustrate the pseudo-SPMLE with a US case-control study of kidney cancer.

11.
Multivariate survival data arise from case-control family studies in which the ages at disease onset for family members may be correlated. In this paper, we consider a multivariate survival model with the marginal hazard function following the proportional hazards model. We use a frailty-based approach in the spirit of Glidden and Self (1999) to account for the correlation of ages at onset among family members. Specifically, we first estimate the baseline hazard function nonparametrically by the innovation theorem, and then obtain maximum pseudolikelihood estimators for the regression and correlation parameters by plugging in the baseline hazard function estimator. We establish a connection with a previously proposed generalized estimating equation-based approach. Simulation studies and an analysis of case-control family data of breast cancer illustrate the methodology's practical utility.

12.
Lu SE, Wang MC. Biometrics 2002, 58(4): 764-772
The cohort case-control design is an efficient and economical way to study risk factors for disease incidence or mortality in a large cohort. In the last few decades, a variety of cohort case-control designs have been developed and theoretically justified. These designs have been exclusively applied to the analysis of univariate failure-time data. In this work, a cohort case-control design adapted to multivariate failure-time data is developed. A risk set sampling method is proposed to sample controls from nonfailures in a large cohort for each case matched by failure time. This method leads to a pseudolikelihood approach for the estimation of regression parameters in the marginal proportional hazards model (Cox, 1972, Journal of the Royal Statistical Society, Series B 34, 187-220), where the correlation structure between individuals within a cluster is left unspecified. The performance of the proposed estimator is demonstrated by simulation studies. A bootstrap method is proposed for inferential purposes. This methodology is illustrated by a data example from a child vitamin A supplementation trial in Nepal (Nepal Nutrition Intervention Project-Sarlahi, or NNIPS).
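The sampling step itself is simple; here is a minimal sketch for the univariate case (the paper's multivariate setting additionally tracks cluster membership, which this ignores; names are ours):

```python
import numpy as np

def risk_set_sample(times, events, m=1, rng=None):
    """Time-matched risk-set sampling: m controls per case, drawn from
    subjects still under observation at the case's failure time.

    times:  (n,) observed failure/censoring times
    events: (n,) 1 = failure, 0 = censored
    Returns a list of (case_index, control_indices) matched sets."""
    rng = np.random.default_rng() if rng is None else rng
    idx = np.arange(len(times))
    sets = []
    for i in idx[events == 1]:
        # Classical risk set, excluding the index case; restricting it to
        # eventual nonfailures gives the variant described in the abstract.
        at_risk = idx[(times >= times[i]) & (idx != i)]
        if len(at_risk) >= m:
            sets.append((i, rng.choice(at_risk, size=m, replace=False)))
    return sets
```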

13.
D C Thomas, M Blettner, N E Day. Biometrics 1992, 48(3): 781-794
A method is proposed for analysis of nested case-control studies that combines the matched comparison of covariate values between cases and controls and a comparison of the observed numbers of cases in the nesting cohort with expected numbers based on external rates and average relative risks estimated from the controls. The former comparison is based on the conditional likelihood for matched case-control studies and the latter on the unconditional likelihood for Poisson regression. It is shown that the two likelihoods are orthogonal and that their product is an estimator of the full survival likelihood that would have been obtained on the total cohort, had complete covariate data been available. Parameter estimation and significance tests follow in the usual way by maximizing this product likelihood. The method is illustrated using data on leukemia following irradiation for cervical cancer. In this study, the original cohort study showed a clear excess of leukemia in the first 15 years after exposure, but it was not feasible to obtain dose estimates on the entire cohort. However, the subsequent nested case-control study failed to demonstrate significant differences between alternative dose-response relations and effects of time-related modifiers. The combined analysis allows much clearer discrimination between alternative dose-time-response models.
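Schematically (our notation), the combined likelihood is the product

```latex
L(\beta) = L_{\text{cond}}(\beta) \times L_{\text{Poisson}}(\beta),
```

where L_cond is the conditional likelihood from the matched case-control comparisons and L_Poisson compares observed case counts in the cohort with expected counts from external rates scaled by the average relative risk; by the orthogonality noted above, estimation and testing proceed by maximizing this product.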

14.
Many different methods for evaluating diagnostic test results in the absence of a gold standard have been proposed. In this paper, we discuss how one common method, a maximum likelihood estimate for a latent class model found via the Expectation-Maximization (EM) algorithm, can be applied to longitudinal data where test sensitivity changes over time. We also propose two simplified and nonparametric methods which use data-based indicator variables for disease status and compare their accuracy to the maximum likelihood estimation (MLE) results. We find that with high-specificity tests, the performance of the simpler approximations may be just as high as that of the MLE.
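For reference, here is a minimal sketch of the static EM fit that the longitudinal method generalizes: two latent classes and conditionally independent tests with time-constant sensitivity (the paper's extension lets sensitivity change over time; names and initialization are ours):

```python
import numpy as np

def latent_class_em(Y, n_iter=500, seed=0):
    """EM for a two-class latent class model with J conditionally
    independent binary diagnostic tests and no gold standard.

    Y: (n, J) array of 0/1 test results.
    Returns (prevalence, sensitivities, specificities)."""
    rng = np.random.default_rng(seed)
    n, J = Y.shape
    pi = 0.5
    sens = rng.uniform(0.6, 0.9, J)
    spec = rng.uniform(0.6, 0.9, J)
    for _ in range(n_iter):
        # E-step: posterior probability that each subject is truly diseased.
        log_p1 = np.log(pi) + (Y * np.log(sens) + (1 - Y) * np.log(1 - sens)).sum(1)
        log_p0 = np.log(1 - pi) + ((1 - Y) * np.log(spec) + Y * np.log(1 - spec)).sum(1)
        m = np.maximum(log_p1, log_p0)   # stabilize before exponentiating
        post = np.exp(log_p1 - m) / (np.exp(log_p1 - m) + np.exp(log_p0 - m))
        # M-step: posterior-weighted proportions.
        pi = post.mean()
        sens = (post[:, None] * Y).sum(0) / post.sum()
        spec = ((1 - post)[:, None] * (1 - Y)).sum(0) / (1 - post).sum()
    return pi, sens, spec
```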

15.
Wen CC, Lin CT. Biometrics 2011, 67(3): 760-769
Statistical inference based on right-censored data for the proportional hazards (PH) model with missing covariates has received considerable attention, but interval-censored or current status data with missing covariates have not yet been investigated. Our study is partly motivated by the analysis of fracture data from the 2005 National Health Interview Survey Original Database in Taiwan, where the occurrence of fractures was interval censored and the covariate osteoporosis was not reported for all residents. We assume that the data are realized from a PH model. A semiparametric maximum likelihood estimate implemented by a hybrid algorithm is proposed to analyze current status data with missing covariates. A comparison of the performance of our method with full-cohort analysis, complete-case analysis, and surrogate analysis is made via simulation with moderate sample sizes. The fracture data are then analyzed.

16.
A nonproportional hazards Weibull accelerated failure time regression model
K M Anderson. Biometrics 1991, 47(1): 281-288
We present a study of risk factors measured in men before age 50 and subsequent incidence of heart disease over 32 years of follow-up. The data are from the Framingham Heart Study. The standard accelerated failure time model assumes the logarithm of time until an event has a constant dispersion parameter and a location parameter that is a linear function of covariates. Parameters are estimated by maximum likelihood. We reject a standard Weibull model for these data in favor of a model with the dispersion parameter depending on the location parameter. This model suggests that the cumulative hazard ratio for two individuals shrinks towards unity over the follow-up period. Thus, not only the standard Weibull, but also the semiparametric proportional hazards (Cox) model is inadequate for these data. The model improvement appears particularly valuable when estimating the difference in predicted outcome probabilities for two individuals.
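In our notation, the standard Weibull accelerated failure time model and the extension described are

```latex
\log T = \mu + \sigma W, \qquad \mu = \beta^\top x, \qquad W \sim \text{standard extreme value},
```

with constant \sigma in the standard model (which is then also a proportional hazards model) versus \sigma = \sigma(\mu) in the fitted extension; the latter breaks proportionality, letting cumulative hazard ratios shrink toward unity over follow-up.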

17.
Standard analyses of data from case-control studies that are nested in a large cohort ignore information available for cohort members not sampled for the sub-study. This paper reviews several methods designed to increase estimation efficiency by using more of the data, treating the case-control sample as a two- or three-phase stratified sample. When applied to a study of coronary heart disease among women in the hormone trials of the Women's Health Initiative, modest but increasing gains in precision of regression coefficients were observed depending on the amount of cohort information used in the analysis. The gains were particularly evident for pseudo- or maximum likelihood estimates whose validity depends on the assumed model being correct. Larger standard errors were obtained for coefficients estimated by inverse probability weighted methods that are more robust to model misspecification. Such misspecification may have been responsible for an important difference in one key regression coefficient estimated using the weighted compared with the more efficient methods.

18.
Inverse sampling is considered to be a more appropriate sampling scheme than the usual binomial sampling scheme when subjects arrive sequentially, when the underlying response of interest is acute, and when maximum likelihood estimators of some epidemiologic indices are undefined. In this article, we study various statistics for testing non-unity rate ratios in case-control studies under inverse sampling. These include the Wald, unconditional score, likelihood ratio, and conditional score statistics. Three methods (the asymptotic, conditional exact, and Mid-P methods) are adopted for P-value calculation. We evaluate the performance of different combinations of test statistics and P-value calculation methods in terms of their empirical sizes and powers via Monte Carlo simulation. In general, the asymptotic score and conditional score tests are preferable because their actual type I error rates are well controlled around the pre-chosen nominal level and their powers are comparatively the largest. The exact version of the Wald test is recommended if one wants to control the actual type I error rate at or below the pre-chosen nominal level. If larger power is desired and fluctuation of the size around the pre-chosen nominal level is allowed, then the Mid-P version of the Wald test is a desirable alternative. We illustrate the methodologies with a real example from a heart disease study.

19.
Misclassification of exposure variables is a common problem in epidemiologic studies. This paper compares the matrix method (Barron, 1977, Biometrics 33, 414-418; Greenland, 1988a, Statistics in Medicine 7, 745-757) and the inverse matrix method (Marshall, 1990, Journal of Clinical Epidemiology 43, 941-947) to the maximum likelihood estimator (MLE) that corrects the odds ratio for bias due to a misclassified binary covariate. Under the assumption of differential misclassification, the inverse matrix method is always more efficient than the matrix method; however, the efficiency depends strongly on the values of the sensitivity, specificity, baseline probability of exposure, the odds ratio, case-control ratio, and validation sampling fraction. In a study on sudden infant death syndrome (SIDS), an estimate of the asymptotic relative efficiency (ARE) of the inverse matrix estimate was 0.99, while the matrix method's ARE was 0.19. Under nondifferential misclassification, neither the matrix nor the inverse matrix estimator is uniformly more efficient than the other; the efficiencies again depend on the underlying parameters. In the SIDS data, the MLE was more efficient than the matrix method (ARE = 0.39). In a study investigating the effect of vitamin A intake on the incidence of breast cancer, the MLE was more efficient than the matrix method (ARE = 0.75).
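The matrix-method correction itself is a small inversion of the misclassification matrix; here is a minimal sketch (it treats sensitivity and specificity as known, whereas the methods compared above estimate them from validation data; the function name and all numbers are hypothetical):

```python
def corrected_odds_ratio(a, b, c, d, sens, spec):
    """Matrix-method correction of a 2x2 case-control table for a
    misclassified binary exposure.

    a, b: observed exposed / unexposed case counts
    c, d: observed exposed / unexposed control counts
    sens, spec: (cases, controls) pairs of sensitivity and specificity,
    allowing differential misclassification by disease status."""
    def correct(n_exp, n_unexp, se, sp):
        n = n_exp + n_unexp
        p_true = (n_exp / n + sp - 1.0) / (se + sp - 1.0)  # invert the 2x2
        return n * p_true, n * (1.0 - p_true)              # misclassification matrix
    A, B = correct(a, b, sens[0], spec[0])
    C, D = correct(c, d, sens[1], spec[1])
    return (A * D) / (B * C)

# Hypothetical counts and operating characteristics:
print(corrected_odds_ratio(120, 380, 80, 420, sens=(0.85, 0.80), spec=(0.95, 0.95)))
```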

20.
Errors in the estimation of exposures or doses are a major source of uncertainty in epidemiological studies of cancer among nuclear workers. This paper presents a Monte Carlo maximum likelihood method that can be used for estimating a confidence interval that reflects both statistical sampling error and uncertainty in the measurement of exposures. The method is illustrated by application to an analysis of all cancer (excluding leukemia) mortality in a study of nuclear workers at the Oak Ridge National Laboratory (ORNL). Monte Carlo methods were used to generate 10,000 data sets with a simulated corrected dose estimate for each member of the cohort based on the estimated distribution of errors in doses. A Cox proportional hazards model was applied to each of these simulated data sets. A partial likelihood, averaged over all of the simulations, was generated; the central risk estimate and confidence interval were estimated from this partial likelihood. The conventional unsimulated analysis of the ORNL study yielded an excess relative risk (ERR) of 5.38 per Sv (90% confidence interval 0.54-12.58). The Monte Carlo maximum likelihood method yielded a slightly lower ERR (4.82 per Sv) and wider confidence interval (0.41-13.31).
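The averaging step is the heart of the method; here is a minimal sketch under a log-linear dose effect (the reported results use an excess relative risk, i.e. linear, form, so the log-linear choice is our simplification; names are ours):

```python
import numpy as np
from scipy.special import logsumexp

def mc_avg_partial_loglik(times, events, dose_sims, betas):
    """Log of the Monte Carlo-averaged Cox partial likelihood over
    simulated dose realizations, evaluated on a grid of coefficients.

    times:     (n,) follow-up times
    events:    (n,) event indicators (1 = cancer death)
    dose_sims: (S, n) simulated error-corrected dose vectors
    betas:     (B,) grid of log-linear dose coefficients"""
    order = np.argsort(times)
    e = events[order]
    fail = np.flatnonzero(e == 1)
    ll = np.empty((len(dose_sims), len(betas)))
    for s, z in enumerate(dose_sims[:, order]):
        for b, beta in enumerate(betas):
            eta = beta * z
            # log of the risk-set sums: with times sorted ascending, the risk
            # set at the i-th time is subjects i, i+1, ..., n-1 (ties ignored).
            log_risk = np.logaddexp.accumulate(eta[::-1])[::-1]
            ll[s, b] = np.sum(eta[fail] - log_risk[fail])
    # Average the likelihoods (not the log-likelihoods) across simulations.
    return logsumexp(ll, axis=0) - np.log(len(dose_sims))
```

The central estimate is the maximizer of the returned curve over the beta grid, and a likelihood-based confidence interval can be read off the same curve.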
