Similar Articles
 20 similar articles found (search time: 31 ms)
1.
In case-control studies where the outcome is common but the exposure is rare, inverse sampling may be used to reduce the total number of subjects required to find a fixed number of exposed cases and controls. The sampling distribution is negative binomial rather than binomial. Logistic regression with adjustment for covariates may be implemented in the statistical package GLIM through appropriate use of macros. An example is given.
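The inverse-sampling scheme above can be illustrated with a minimal simulation sketch (assuming numpy; the exposure prevalence and target count are invented for illustration): screening continues until a fixed number of exposed subjects is found, so the total number screened is negative binomial with mean m/p.

```python
import numpy as np

rng = np.random.default_rng(0)
p_exposed = 0.05   # assumed rare-exposure prevalence (illustrative)
m = 10             # fixed number of exposed subjects required

def subjects_until_m_exposed(rng, p, m):
    """Screen subjects one at a time until m exposed are found;
    return the total number screened (inverse sampling)."""
    count = exposed = 0
    while exposed < m:
        count += 1
        exposed += rng.random() < p
    return count

draws = np.array([subjects_until_m_exposed(rng, p_exposed, m)
                  for _ in range(20000)])

# Under inverse sampling, the total screened follows a negative
# binomial distribution with mean m / p.
print(draws.mean())   # should be close to 10 / 0.05 = 200
```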

2.
We consider matched case-control familial studies which match a group of patients, called "case probands," with a group of disease-free subjects, called "control probands," using a set of family-level matching variables. Family members of each proband are then recruited into the study. Of interest here is the familial aggregation of the response variable and the effects of subject-specific covariates on the response. We propose an estimating equation approach to jointly estimate the main effects and intrafamilial correlations for matched family studies with a continuous outcome. Only knowledge of the first two joint moments of the response variable is required. The induced estimators for the main effects and intrafamilial correlations are consistent and asymptotically normally distributed. We apply the proposed method to sleep apnea data. A simulation study demonstrates the usefulness of our approach.

3.
Although case-control association studies have been widely used, they are insufficient for many complex diseases, such as Alzheimer's disease and breast cancer, since these diseases may have multiple subtypes with distinct morphologies and clinical implications. Many multigroup studies, such as the Alzheimer's Disease Neuroimaging Initiative (ADNI), have been undertaken by recruiting subjects based on their multiclass primary disease status, while extensive secondary outcomes have been collected. The aim of this paper is to develop a general regression framework for the analysis of secondary phenotypes collected in multigroup association studies. Our regression framework is built on a conditional model for the secondary outcome given the multigroup status and covariates, together with its relationship to the population-level regression of interest: the secondary outcome given the covariates. We then develop generalized estimating equations to estimate the parameters of interest. We use both simulations and a large-scale imaging genetic data analysis from the ADNI to evaluate the effect of the multigroup sampling scheme on standard genome-wide association analyses based on linear regression methods, while comparing it with our statistical methods that appropriately adjust for the multigroup sampling scheme. Data used in preparation of this article were obtained from the ADNI database.

4.
Case-control designs are widely used in rare disease studies. In a typical case-control study, data are collected from a sample of all available subjects who have experienced a disease (cases) and a sub-sample of subjects who have not experienced the disease (controls) in a study cohort. Cases are oversampled in case-control studies. Logistic regression is a common tool to estimate the relative risks of the disease with respect to a set of covariates. Very often in such a study, information on age at onset of the disease for all cases and age at survey for controls is known. Standard logistic regression analysis using age as a covariate is based on a dichotomous outcome and does not efficiently use such age-at-onset (time-to-event) information. We propose to analyze age-at-onset data using a modified case-cohort method by treating the control group as an approximation of a subcohort, assuming rare events. We investigate the asymptotic bias of this approximation and show that the asymptotic bias of the proposed estimator is small when the disease rate is low. We evaluate the finite sample performance of the proposed method through a simulation study and illustrate the method using a breast cancer case-control data set.

5.
Two-phase designs can reduce the cost of epidemiological studies by limiting the ascertainment of expensive covariates and/or exposures to an efficiently selected subset (phase-II) of a larger (phase-I) study. Efficient analysis of the resulting data set combining disparate information from phase-I and phase-II, however, can be complex. Most of the existing methods, including the semiparametric maximum-likelihood estimator, require the information in phase-I to be summarized into a fixed number of strata. In this paper, we describe a novel method for the analysis of two-phase studies where information from phase-I is summarized by parameters associated with a reduced logistic regression model of the disease outcome on available covariates. We then set up estimating equations for parameters associated with the desired extended logistic regression model, based on information on the reduced model parameters from phase-I and complete data available at phase-II after accounting for the nonrandom sampling design. We use the generalized method of moments to solve the overidentified estimating equations and develop the resulting asymptotic theory for the proposed estimator. Simulation studies show that the use of reduced parametric models, as opposed to summarizing data into strata, can lead to more efficient utilization of phase-I data. An application of the proposed method is illustrated using the data from the U.S. National Wilms Tumor Study.

6.
Primary analysis of case-control studies focuses on the relationship between disease (D) and a set of covariates of interest (Y,X). A secondary application of the case-control study, often invoked in modern genetic epidemiologic association studies, is to investigate the interrelationship between the covariates themselves. The task is complicated due to the case-control sampling, and to avoid the biased sampling that arises from the design, it is typical to use the control data only. In this paper, we develop penalized regression spline methodology that uses all the data, and improves precision of estimation compared to using only the controls. A simulation study and an empirical example are used to illustrate the methodology.
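The penalized regression spline idea can be sketched in a few lines (an illustrative stand-in, not the authors' estimator; assumes numpy, with simulated data, a truncated-line basis, and a ridge penalty on the knot coefficients):

```python
import numpy as np

rng = np.random.default_rng(1)
# Simulated data: smooth curve plus noise (illustrative only)
x = np.sort(rng.uniform(0, 1, 200))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 200)

# Truncated-line basis: [1, x, (x - k)_+ for interior knots k]
knots = np.linspace(0.1, 0.9, 9)
X = np.column_stack([np.ones_like(x), x] +
                    [np.maximum(x - k, 0) for k in knots])

# Ridge penalty applied to the knot coefficients only
lam = 0.1
D = np.diag([0.0, 0.0] + [1.0] * len(knots))
beta = np.linalg.solve(X.T @ X + lam * D, X.T @ y)
fit = X @ beta

rmse = np.sqrt(np.mean((fit - np.sin(2 * np.pi * x)) ** 2))
print(rmse)   # fitted curve stays close to the true function
```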

7.
Outcome-dependent sampling (ODS) schemes can be a cost-effective way to enhance study efficiency. The case-control design has been widely used in epidemiologic studies. However, when the outcome is measured on a continuous scale, dichotomizing the outcome could lead to a loss of efficiency. Recent epidemiologic studies have used ODS sampling schemes where, in addition to an overall random sample, there are also a number of supplemental samples that are collected based on a continuous outcome variable. We consider a semiparametric empirical likelihood inference procedure in which the underlying distribution of covariates is treated as a nuisance parameter and is left unspecified. The proposed estimator has asymptotic normality properties. The likelihood ratio statistic using the semiparametric empirical likelihood function has Wilks-type properties in that, under the null, it follows a chi-square distribution asymptotically and is independent of the nuisance parameters. Our simulation results indicate that, for data obtained using an ODS design, the semiparametric empirical likelihood estimator is more efficient than conditional likelihood and probability-weighted pseudolikelihood estimators and that ODS designs (along with the proposed estimator) can produce more efficient estimates than simple random sample designs of the same size. We apply the proposed method to analyze a data set from the Collaborative Perinatal Project (CPP), an ongoing environmental epidemiologic study, to assess the relationship between maternal polychlorinated biphenyl (PCB) level and children's IQ test performance.

8.
Chen J, Rodriguez C. Biometrics 2007, 63(4):1099-1107
Genetic epidemiologists routinely assess disease susceptibility in relation to haplotypes, that is, combinations of alleles on a single chromosome. We study statistical methods for inferring haplotype-related disease risk using single nucleotide polymorphism (SNP) genotype data from matched case-control studies, where controls are individually matched to cases on some selected factors. Assuming a logistic regression model for haplotype-disease association, we propose two conditional likelihood approaches that address the issue that haplotypes cannot be inferred with certainty from SNP genotype data (phase ambiguity). One approach is based on the likelihood of disease status conditioned on the total number of cases, genotypes, and other covariates within each matching stratum, and the other is based on the joint likelihood of disease status and genotypes conditioned only on the total number of cases and other covariates. The joint-likelihood approach is generally more efficient, particularly for assessing haplotype-environment interactions. Simulation studies demonstrated that the first approach was more robust to model assumptions on the diplotype distribution conditioned on environmental risk variables and matching factors in the control population. We applied the two methods to analyze a matched case-control study of prostate cancer.

9.
Optimal multivariate matching before randomization
Although blocking or pairing before randomization is a basic principle of experimental design, the principle is almost invariably applied to at most one or two blocking variables. Here, we discuss the use of optimal multivariate matching prior to randomization to improve covariate balance for many variables at the same time, presenting an algorithm and a case-study of its performance. The method is useful when all subjects, or large groups of subjects, are randomized at the same time. Optimal matching divides a single group of 2n subjects into n pairs to minimize covariate differences within pairs (the so-called nonbipartite matching problem); one subject in each pair is then picked at random for treatment, the other being assigned to control. Using the baseline covariate data for 132 patients from an actual, unmatched, randomized experiment, we construct 66 pairs matching on 14 covariates. We then create 10,000 unmatched and 10,000 matched randomized experiments by repeatedly randomizing the 132 patients, and compare the covariate balance with and without matching. By every measure, every one of the 14 covariates was substantially better balanced when randomization was performed within matched pairs. Even after covariance adjustment for chance imbalances in the 14 covariates, matched randomizations provided more accurate estimates than unmatched randomizations, the increase in accuracy being equivalent to, on average, a 7% increase in sample size. In randomization tests of no treatment effect, matched randomizations using the signed rank test had substantially higher power than unmatched randomizations using the rank sum test, even when only 2 of 14 covariates were relevant to a simulated response. Unmatched randomizations experienced rare disasters which were consistently avoided by matched randomizations.
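A rough sketch of the match-then-randomize idea (assuming numpy; a greedy nearest-neighbor pairing stands in for optimal nonbipartite matching, which would require a dedicated solver, and the data are simulated):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 60, 5
X = rng.normal(size=(n, p))   # simulated baseline covariates

def greedy_pairs(X):
    """Greedy nearest-neighbor pairing: a cheap stand-in for
    optimal nonbipartite matching."""
    left = list(range(len(X)))
    pairs = []
    while left:
        i = left.pop(0)
        d = [np.sum((X[i] - X[j]) ** 2) for j in left]
        j = left.pop(int(np.argmin(d)))
        pairs.append((i, j))
    return pairs

pairs = greedy_pairs(X)

def imbalance(treat):
    """Mean absolute difference in covariate means, treated vs control."""
    return np.abs(X[treat == 1].mean(0) - X[treat == 0].mean(0)).mean()

def matched_rand(rng):
    treat = np.zeros(n, int)
    for i, j in pairs:                     # coin flip within each pair
        a, _ = (i, j) if rng.random() < 0.5 else (j, i)
        treat[a] = 1
    return imbalance(treat)

def complete_rand(rng):
    treat = np.zeros(n, int)
    treat[rng.choice(n, n // 2, replace=False)] = 1
    return imbalance(treat)

m = np.mean([matched_rand(rng) for _ in range(2000)])
c = np.mean([complete_rand(rng) for _ in range(2000)])
print(m, c)   # matched randomization shows smaller mean imbalance
```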

10.
Janes H, Pepe MS. Biometrics 2008, 64(1):1-9
In case-control studies evaluating the classification accuracy of a marker, controls are often matched to cases with respect to factors associated with the marker and disease status. In contrast with matching in epidemiologic etiology studies, matching in the classification setting has not been rigorously studied. In this article, we consider the implications of matching in terms of the choice of statistical analysis, efficiency, and assessment of the incremental value of the marker over the matching covariates. We find that adjustment for the matching covariates is essential, as unadjusted summaries of classification accuracy can be biased. In many settings, matching is the most efficient covariate-dependent sampling scheme, and we provide an expression for the optimal matching ratio. However, we also show that matching greatly complicates estimation of the incremental value of the marker. We recommend that matching be carefully considered in the context of these findings.

11.
Weibin Zhong, Guoqing Diao. Biometrics 2023, 79(3):1959-1971
Two-phase studies such as case-cohort and nested case-control studies are widely used cost-effective sampling strategies. In the first phase, the observed failure/censoring time and inexpensive exposures are collected. In the second phase, a subgroup of subjects is selected for measurements of expensive exposures based on the information from the first phase. One challenging issue is how to utilize all the available information to conduct efficient regression analyses of the two-phase study data. This paper proposes a joint semiparametric modeling of the survival outcome and the expensive exposures. Specifically, we assume a class of semiparametric transformation models and a semiparametric density ratio model for the survival outcome and the expensive exposures, respectively. The class of semiparametric transformation models includes the proportional hazards model and the proportional odds model as special cases. The density ratio model is flexible in modeling multivariate mixed-type data. We develop efficient likelihood-based estimation and inference procedures and establish the large sample properties of the nonparametric maximum likelihood estimators. Extensive numerical studies reveal that the proposed methods perform well under practical settings. The proposed methods also appear to be reasonably robust under various model misspecifications. An application to the National Wilms Tumor Study is provided.

12.
Guolo A. Biometrics 2008, 64(4):1207-1214
We investigate the use of prospective likelihood methods to analyze retrospective case-control data where some of the covariates are measured with error. We show that prospective methods can be applied and the case-control sampling scheme can be ignored if one adequately models the distribution of the error-prone covariates in the case-control sampling scheme. Indeed, subject to this, the prospective likelihood methods result in consistent estimates, and standard errors derived from the information matrix are asymptotically correct. However, the distribution of such covariates is not the same in the population and under case-control sampling, dictating the need to model the distribution flexibly. In this article, we illustrate the general principle by modeling the distribution of the continuous error-prone covariates using the skew-normal distribution. The performance of the method is evaluated through simulation studies, which show satisfactory results in terms of bias and coverage. Finally, the method is applied to the analysis of two data sets which refer, respectively, to a cholesterol study and a study on breast cancer.

13.
Lu B. Biometrics 2005, 61(3):721-728
In observational studies with a time-dependent treatment and time-dependent covariates, it is desirable to balance the distribution of the covariates at every time point. A time-dependent propensity score based on the Cox proportional hazards model is proposed and used in risk set matching. Matching on this propensity score is shown to achieve a balanced distribution of the covariates in both treated and control groups. Optimal matching with various designs is conducted and compared in a study of a surgical treatment, cystoscopy and hydrodistention, given in response to a chronic bladder disease, interstitial cystitis. Simulation studies also suggest that the statistical analysis after matching outperforms the analysis without matching in terms of both point and interval estimation.

14.
Motivated by the absolute risk predictions required in medical decision making and patient counseling, we propose an approach for the combined analysis of case-control and prospective studies of disease risk factors. The approach is hierarchical to account for parameter heterogeneity among studies and among sampling units of the same study. It is based on modeling the retrospective distribution of the covariates given the disease outcome, a strategy that greatly simplifies both the combination of prospective and retrospective studies and the computation of Bayesian predictions in the hierarchical case-control context. Retrospective modeling differentiates our approach from most current strategies for inference on risk factors, which are based on the assumption of a specific prospective model. To ensure modeling flexibility, we propose using a mixture model for the retrospective distributions of the covariates. This leads to a general nonlinear regression family for the implied prospective likelihood. After introducing and motivating our proposal, we present simple results that highlight its relationship with existing approaches, develop Markov chain Monte Carlo methods for inference and prediction, and present an illustration using ovarian cancer data.

15.
The design and analysis of case-control studies with biased sampling
A design is proposed for case-control studies in which selection of subjects for full variable ascertainment is based jointly on disease status and on easily obtained "screening" variables that may be related to the disease. Recruitment of subjects follows an independent Bernoulli sampling scheme, with recruitment probabilities set by the investigator in advance. In particular, the sampling can be set up to achieve, on average, frequency matching, provided prior estimates of the disease rates or odds ratios associated with screening variables such as age and sex are available. Alternatively, for example when studying a rare exposure, one can enrich the sample with certain categories of subject. Following such a design, there are two valid approaches to logistic regression analysis, both of which allow for efficient estimation of effects associated with the screening variables that were allowed to bias the recruitment. The statistical properties of the estimators are compared, both for large samples, based on asymptotics, and for small samples, based on simulations.
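The Bernoulli recruitment scheme can be sketched as follows (assuming numpy; the disease rates and the single screening stratum variable are invented): recruiting every case and recruiting each control with probability p/(1-p), where p is the stratum's disease rate, yields frequency matching on average.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 200000

# Source population: a screening variable (e.g. older vs younger)
# that shifts disease risk (assumed rates, illustrative only)
older = rng.random(N) < 0.5
p_dis = np.where(older, 0.04, 0.01)
disease = rng.random(N) < p_dis

# Independent Bernoulli recruitment: all cases are recruited;
# controls with stratum-specific probability p / (1 - p), chosen so
# expected control counts match expected case counts per stratum.
p_rec = np.where(disease, 1.0,
                 np.where(older, 0.04 / 0.96, 0.01 / 0.99))
recruited = rng.random(N) < p_rec

for s in (True, False):
    sel = recruited & (older == s)
    print(s, np.sum(sel & disease), np.sum(sel & ~disease))
    # case and control counts are close within each stratum
```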

16.
This paper addresses optimal design and efficiency of two-phase (2P) case-control studies in which the first phase uses an error-prone exposure measure, Z, while the second phase measures true, dichotomous exposure, X, in a subset of subjects. Optimal design of a separate second phase, to be added to a preexisting study, is also investigated. Differential misclassification is assumed throughout. Results are also applicable to 2P cohort studies with error-prone and error-free measures of disease status but error-free exposure measures. While software based on the mean score method of Reilly and Pepe (1995, Biometrika 82, 299-314) can find optimal designs given pilot data, the lack of simple formulae makes it difficult to generalize about efficiency compared to one-phase (1P) studies based on X alone. Here, formulae for the optimal ratios of cases to controls and first- to second-phase sizes, and the optimal second-phase stratified sampling fractions, given a fixed budget, are given. The maximum efficiency of 2P designs compared to a 1P design is deduced and is shown to be bounded from above by a function of the sensitivities and specificities of Z. The efficiency of "balanced" separate second-phase designs (Breslow and Cain, 1988, Biometrika 75, 11-20), in which equal numbers of subjects are chosen from each first-phase stratum, compared to optimal design is deduced, enabling situations where balanced designs are nearly optimal to be identified.

17.
We introduce a new method, moment reconstruction, of correcting for measurement error in covariates in regression models. The central idea is similar to regression calibration in that the values of the covariates that are measured with error are replaced by "adjusted" values. In regression calibration the adjusted value is the expectation of the true value conditional on the measured value. In moment reconstruction the adjusted value is the variance-preserving empirical Bayes estimate of the true value conditional on the outcome variable. The adjusted values thereby have the same first two moments and the same covariance with the outcome variable as the unobserved "true" covariate values. We show that moment reconstruction is equivalent to regression calibration in the case of linear regression, but leads to different results for logistic regression. For case-control studies with logistic regression and covariates that are normally distributed within cases and controls, we show that the resulting estimates of the regression coefficients are consistent. In simulations we demonstrate that for logistic regression, moment reconstruction carries less bias than regression calibration, and for case-control studies is superior in mean-square error to the standard regression calibration approach. Finally, we give an example of the use of moment reconstruction in linear discriminant analysis and a nonstandard problem where we wish to adjust a classification tree for measurement error in the explanatory variables.
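A minimal sketch of the variance-preserving adjustment for a scalar covariate with classical measurement error of known variance (assuming numpy; the moment-matching form x_mr = E[W|Y] + G(Y)(W - E[W|Y]) with G(Y)^2 = Var(X|Y)/Var(W|Y) is reconstructed from the description above, not quoted from the paper):

```python
import numpy as np

rng = np.random.default_rng(5)
n, sigma_u = 50000, 0.5
y = rng.integers(0, 2, n)              # binary outcome (case/control)
x = rng.normal(1.0 * y, 1.0)           # true covariate depends on outcome
w = x + rng.normal(0, sigma_u, n)      # classical measurement error

x_mr = np.empty(n)
for g in (0, 1):
    idx = y == g
    mw, vw = w[idx].mean(), w[idx].var()
    vx = vw - sigma_u ** 2             # Var(X|Y) under classical error
    G = np.sqrt(vx / vw)               # variance-preserving shrinkage
    # E[X|Y] = E[W|Y] under classical (mean-zero) error
    x_mr[idx] = mw + G * (w[idx] - mw)

# The reconstructed values match the first two moments of the
# unobserved true covariate within each outcome group.
for g in (0, 1):
    idx = y == g
    print(g, x[idx].mean(), x_mr[idx].mean(), x[idx].var(), x_mr[idx].var())
```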

18.
Nested case-control sampling is designed to reduce the costs of large cohort studies. It is important to estimate the parameters of interest as efficiently as possible. We present a new maximum likelihood estimator (MLE) for nested case-control sampling in the context of Cox's proportional hazards model. The MLE is computed by the EM-algorithm, which is easy to implement in the proportional hazards setting. Standard errors are estimated by a numerical profile likelihood approach based on EM aided differentiation. The work was motivated by a nested case-control study that hypothesized that insulin-like growth factor I was associated with ischemic heart disease. The study was based on a population of 3784 Danes and 231 cases of ischemic heart disease where controls were matched on age and gender. We illustrate the use of the MLE for these data and show how the maximum likelihood framework can be used to obtain information additional to the relative risk estimates of covariates.

19.
We propose a semiparametric mean residual life mixture cure model for right-censored survival data with a cured fraction. The model employs the proportional mean residual life model to describe the effects of covariates on the mean residual time of uncured subjects and the logistic regression model to describe the effects of covariates on the cure rate. We develop estimating equations to estimate the proposed cure model for right-censored data with and without length-biased sampling; the latter is often encountered in prevalent cohort studies. In particular, we propose two estimating equations to estimate the effects of covariates on the cure rate and a method to combine them to improve the estimation efficiency. The consistency and asymptotic normality of the proposed estimates are established. The finite sample performance of the estimates is confirmed with simulations. The proposed estimation methods are applied to a clinical trial study on melanoma and a prevalent cohort study on early-onset type 2 diabetes mellitus.

20.
Wang J, Shete S. PLoS ONE 2011, 6(11):e27642
In case-control genetic association studies, cases are subjects with the disease and controls are subjects without the disease. At the time of case-control data collection, information about secondary phenotypes is also collected. In addition to studies of primary diseases, there has been some interest in studying genetic variants associated with secondary phenotypes. In genetic association studies, the deviation from Hardy-Weinberg proportion (HWP) of each genetic marker is assessed as an initial quality check to identify questionable genotypes. Generally, HWP tests are performed based on the controls for the primary disease or secondary phenotype. However, when the disease or phenotype of interest is common, the controls do not represent the general population. Therefore, using only controls for testing HWP can result in a highly inflated type I error rate for the disease- and/or phenotype-associated variants. Recently, two approaches, the likelihood ratio test (LRT) approach and the mixture HWP (mHWP) exact test, were proposed for testing HWP in samples from case-control studies. Here, we show that these two approaches result in inflated type I error rates and could lead to the removal from further analysis of potential causal genetic variants associated with the primary disease and/or secondary phenotype when the study of primary disease is frequency-matched on the secondary phenotype. Therefore, we propose alternative approaches, which extend the LRT and mHWP approaches, for assessing HWP that account for frequency matching. The goal was to maintain more (possibly causative) single-nucleotide polymorphisms in the sample for further analysis. Our simulation results showed that both extended approaches could control type I error probabilities. We also applied the proposed approaches to test HWP for SNPs from a genome-wide association study of lung cancer that was frequency-matched on smoking status and found that the proposed approaches can keep more genetic variants for association studies.
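The basic HWP check that the approaches above build on can be sketched as a one-degree-of-freedom Pearson chi-square on genotype counts (assuming numpy; the counts are illustrative, and the extended frequency-matching-aware tests of the paper are not attempted here):

```python
import numpy as np

def hwp_chisq(n_aa, n_ab, n_bb):
    """Pearson chi-square statistic (1 df) for Hardy-Weinberg
    proportions, computed from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)          # allele-A frequency
    exp = np.array([n * p ** 2, 2 * n * p * (1 - p), n * (1 - p) ** 2])
    obs = np.array([n_aa, n_ab, n_bb], float)
    return np.sum((obs - exp) ** 2 / exp)

# Genotype counts consistent with HWP (p = 0.5): statistic near zero
print(hwp_chisq(250, 500, 250))
# Marked heterozygote deficit: large statistic, HWP rejected at 3.84
print(hwp_chisq(400, 200, 400))
```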


Copyright © Beijing Qinyun Technology Development Co., Ltd. (京ICP备09084417号)