Similar articles
 20 similar articles found
1.
On the design of synthetic case-control studies   Total citations: 6 (self-citations: 0; citations by others: 6)
R L Prentice 《Biometrics》1986,42(2):301-310
A design is proposed for "case-control within cohort" studies. In this design, controls are sampled without replacement from failure-free members of the cohort at each distinct failure time. Upon selection, a subject ceases to be eligible for control selection at later failure times. Also, if a subject failing at time t had been selected as a control at t' less than t, then the matched controls at t are selected to have also been at risk at t'. In these circumstances correlation exists between score statistic contributions at t and t'. An estimator is developed for this correlation. A small simulation study compares the design just described to other possible synthetic case-control designs.
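
The control-selection rule in this abstract can be illustrated with a minimal Python sketch. It is not Prentice's own implementation: it assumes every subject enters follow-up at time 0 (so anyone at risk at t was also at risk at any earlier t'), and the function name, arguments, and data layout are hypothetical.

```python
import random

def sample_synthetic_controls(follow_up, failed, m, seed=0):
    """Pick up to m controls per case at each distinct failure time.

    follow_up[i] is the time subject i leaves observation and failed[i]
    says whether that exit was a failure. Both names are illustrative.
    """
    rng = random.Random(seed)
    used = set()      # subjects already chosen as controls
    matched = {}      # case index -> list of control indices
    failure_times = sorted({t for t, f in zip(follow_up, failed) if f})
    for t in failure_times:
        cases = [i for i, (u, f) in enumerate(zip(follow_up, failed)) if f and u == t]
        for case in cases:
            # Failure-free at t: still under observation and not failing at t,
            # and not already used as a control.
            eligible = [
                i for i, (u, f) in enumerate(zip(follow_up, failed))
                if (u > t or (u == t and not f)) and i not in used
            ]
            chosen = rng.sample(eligible, min(m, len(eligible)))
            matched[case] = chosen
            used.update(chosen)   # selected controls lose eligibility later
    return matched
```

A subject who fails later can still be drawn as a control at an earlier failure time; under this sketch that subject is then excluded from later control selection but still appears as a case.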

2.
Lu SE  Wang MC 《Biometrics》2002,58(4):764-772
Cohort case-control design is an efficient and economical design to study risk factors for disease incidence or mortality in a large cohort. In the last few decades, a variety of cohort case-control designs have been developed and theoretically justified. These designs have been exclusively applied to the analysis of univariate failure-time data. In this work, a cohort case-control design adapted to multivariate failure-time data is developed. A risk set sampling method is proposed to sample controls from nonfailures in a large cohort for each case matched by failure time. This method leads to a pseudolikelihood approach for the estimation of regression parameters in the marginal proportional hazards model (Cox, 1972, Journal of the Royal Statistical Society, Series B 34, 187-220), where the correlation structure between individuals within a cluster is left unspecified. The performance of the proposed estimator is demonstrated by simulation studies. A bootstrap method is proposed for inferential purposes. This methodology is illustrated by a data example from a child vitamin A supplementation trial in Nepal (Nepal Nutrition Intervention Project-Sarlahi, or NNIPS).

3.
Efficiency of cohort sampling designs: some surprising results.   Total citations: 3 (self-citations: 0; citations by others: 3)
B Langholz  D C Thomas 《Biometrics》1991,47(4):1563-1571
Cohort sampling designs are proposed which one would intuitively expect to be more efficient than nested case-control sampling. Two of these designs start with a nested case-control sample and distribute controls to sampled risk sets other than those for which they were picked. The third design has the goal of maximizing the number of distinct persons in a nested case-control sample. Simulation results show surprisingly little gain, and more often a loss in efficiency of these new designs relative to nested case-control sampling. This is due to the sampling-induced covariance between score terms. We conclude that the often stated intuition that nested case-control sampling does not make good use of sampled individuals' covariate histories is false.

4.
Garner C 《Human heredity》2006,61(1):22-26
BACKGROUND: The optimal control sample would be ethnically-matched and at minimal risk of developing the disease. Alternatively, one could collect random individuals from the population or select individuals to reduce the number of at-risk individuals in the sample. The effect of randomly selected individuals in a control sample on the statistical power and the odds ratio estimate was investigated. METHODS: Case and control genotype distributions were simulated using standard genetic models with an additional term representing the proportion of unidentified cases in the control sample. Power and odds ratio were calculated from the genotype distributions generated under different sampling scenarios using established methods. RESULTS: Random sampling of controls resulted in a loss in power and a reduction in the odds ratio estimate to a degree that is determined by the proportion of random sampling and the prevalence of the disease. Random sampling resulted in a 19% loss in power for a disease having prevalence of 0.20, compared to a control sample that contained no at-risk individuals. Having random controls results in a decrease in the odds ratio estimate. CONCLUSIONS: Investigators planning case-control genetic association studies should be aware of the statistical costs of different ascertainment approaches.
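
The direction of the odds ratio effect reported in this abstract can be seen in a small Python sketch. It is a deliberately simplified illustration, not the genetic model used in the paper: it assumes a single binary risk genotype and treats the control sample as a mixture of true noncases and unidentified cases. The function names and the numbers in the usage note are hypothetical.

```python
def odds(p):
    """Odds corresponding to a probability p (0 < p < 1)."""
    return p / (1.0 - p)

def observed_odds_ratio(p_case, p_noncase, contamination):
    """Odds ratio when a fraction of the 'controls' are unidentified cases.

    p_case, p_noncase: risk-genotype frequency among cases and true noncases.
    contamination: assumed proportion of unidentified cases among controls.
    """
    # The control sample is a mixture of true noncases and hidden cases.
    p_control = (1.0 - contamination) * p_noncase + contamination * p_case
    return odds(p_case) / odds(p_control)
```

For example, with genotype frequencies of 0.3 in cases and 0.2 in noncases, `observed_odds_ratio(0.3, 0.2, 0.0)` exceeds `observed_odds_ratio(0.3, 0.2, 0.2)`, so the estimate moves toward 1 as contamination grows.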

5.
McNamee R 《Biometrics》2004,60(3):783-792
Two-phase designs for estimation of prevalence, where the first-phase classification is fallible and the second is accurate but relatively expensive, are not necessarily justified on efficiency grounds. However, they might be advantageous for dual-purpose studies, for example where prevalence estimation is followed by a clinical trial or case-control study, if they can identify cases of disease for the second study in a cost-effective way. Alternatively, they may be justified on ethical grounds if they can identify more, previously undetected but treatable cases of disease, than a simple random sample design. An approach to sampling is proposed, which formally combines the goals of efficient prevalence estimation and case detection by setting different notional study costs for investigating cases and noncases. Two variants of the method are compared with an "ethical" two-phase scheme proposed by Shrout and Newman (1989, Biometrics 45, 549-555), and with the most efficient scheme for prevalence estimation alone, in terms of the standard error of the prevalence estimate, the expected number of cases, and the fraction of cases among second-phase subjects, given a fixed budget. One variant yields the highest fraction and expected number of cases but also the largest standard errors. The other yields a higher fraction than Shrout and Newman's scheme and a similar number of cases but appears to do so more efficiently.

6.
Cohort and nested case-control (NCC) designs are frequently used in pharmacoepidemiology to assess the associations of drug exposure that can vary over time with the risk of an adverse event. Although it is typically expected that estimates from NCC analyses are similar to those from the full cohort analysis, with moderate loss of precision, only a few studies have actually compared their respective performance for estimating the effects of time-varying exposures (TVE). We used simulations to compare the properties of the resulting estimators of these designs for both time-invariant exposure and TVE. We varied exposure prevalence, proportion of subjects experiencing the event, hazard ratio, and control-to-case ratio and considered matching on confounders. Using both designs, we also estimated the real-world associations of time-invariant ever use of menopausal hormone therapy (MHT) at baseline and updated, time-varying MHT use with breast cancer incidence. In all simulated scenarios, the cohort-based estimates had small relative bias and greater precision than the NCC design. NCC estimates displayed bias toward the null that decreased with a greater number of controls per case. This bias markedly increased with a higher proportion of events. Bias was seen with Breslow's and Efron's approximations for handling tied event times but was greatly reduced with the exact method or when NCC analyses were matched on confounders. When analyzing the MHT-breast cancer association, differences between the two designs were consistent with simulated data. Once ties were taken correctly into account, NCC estimates were very similar to those of the full cohort analysis.

7.
Linkage mapping of complex diseases is often followed by association studies between phenotypes and marker genotypes through use of case-control or family-based designs. Given fixed genotyping resources, it is important to know which study designs are the most efficient. To address this problem, we extended the likelihood-based method of Li et al., which assesses whether there is linkage disequilibrium between a disease locus and a SNP, to accommodate sibships of arbitrary size and disease-phenotype configuration. A key advantage of our method is the ability to combine data from different family structures. We consider scenarios for which genotypes are available for unrelated cases, affected sib pairs (ASPs), or only one sibling per ASP. We construct designs that use cases only and others that use unaffected siblings or unrelated unaffected individuals as controls. Different combinations of cases and controls result in seven study designs. We compare the efficiency of these designs when the number of individuals to be genotyped is fixed. Our results suggest that (1) when the disease is influenced by a single gene, the one sibling per ASP-control design is the most efficient, followed by the ASP-control design, and familial cases contribute more association information than singleton cases; (2) when the disease is influenced by multiple genes, familial cases provide more association information than singleton cases, unless the effect of the locus being tested is much smaller than at least one other untested disease locus; and (3) the case-control design can be useful for detecting genes with small effect in the presence of genes with much larger effect. Our findings will be helpful for researchers designing and analyzing complex disease-association studies and will facilitate genotyping resource allocation.

8.
Weinberg CR 《Genomics》2009,93(1):10-12
Most diseases are complex in that they are caused by the joint action of multiple factors, both genetic and environmental. Over the past few decades, the mathematical convenience of logistic regression has served to enshrine the multiplicative model, to the point where many epidemiologists believe that departure from additivity on a log scale implies that two factors interact in causing disease. Other terminology in epidemiology, where students are told that inequality of relative risks across levels of a second factor should be seen as "effect modification," reinforces an uncritical acceptance of multiplicative joint effect as the biologically meaningful no-interaction null. Our first task, when studying joint effects, is to understand the limitations of our definitions for "interaction," and recognize that what statisticians mean and what biologists might want to mean by interaction may not coincide. Joint effects are notoriously hard to identify and characterize, even when asking a simple and unsatisfying question, like whether two effects are log-additive. The rule of thumb for such efforts is that a factor-of-four sample size is needed, compared with that needed to demonstrate main effects of either genes or exposures. So strategies have been devised that focus on the most informative individuals, either through risk-based sampling for a cohort, or case-control sampling, extreme phenotype sampling, pooling, two-stage sampling, exposed-only, or case-only designs. These designs gain efficiency, but at a cost of flexibility in models for joint effects. A relatively new approach avoids population controls by genotyping case-parent triads. Because it requires parents, the method works best for diseases with onset early in life. With this design, the role of autosomal genetic variants is assessed by in effect treating the nontransmitted parental alleles as controls for affected offspring. 
Despite advantages for looking at genetic effects, the triad design faces limitations when examining joint effects of genetic and environmental factors. Because population-based controls are not included, main effects for exposures cannot be estimated, and consequently one only has access to inference related to a multiplicative null. We have proposed a hybrid approach that offers the best features of both case-parent and case-control designs. Through genotyping of parents of population-based controls and assuming Mendelian transmission, power is markedly enhanced. One can also estimate main effects for exposures and now flexibly assess models for joint effects.

9.

Background

A diverse range of study designs (e.g. case-control or cohort) is used in the evaluation of adverse effects. We aimed to ascertain whether the risk estimates from meta-analyses of case-control studies differ from those of other study designs.

Methods

Searches were carried out in 10 databases in addition to reference checking, contacting experts, and handsearching key journals and conference proceedings. Studies were included where a pooled relative measure of an adverse effect (odds ratio or risk ratio) from case-control studies could be directly compared with the pooled estimate for the same adverse effect arising from other types of observational studies.

Results

We included 82 meta-analyses. Pooled estimates of harm from the different study designs had 95% confidence intervals that overlapped in 78/82 instances (95%). Of the 23 cases of discrepant findings (significant harm identified in meta-analysis of one type of study design, but not with the other study design), 16 (70%) stemmed from significantly elevated pooled estimates from case-control studies. There was associated evidence of funnel plot asymmetry consistent with higher risk estimates from case-control studies. On average, cohort or cross-sectional studies yielded pooled odds ratios 0.94 (95% CI 0.88–1.00) times lower than that from case-control studies.

Interpretation

Empirical evidence from this overview indicates that meta-analyses of case-control studies tend to give slightly higher estimates of harm than meta-analyses of other observational studies. However, it is impossible to rule out potential confounding from differences in drug dose, duration, and populations when comparing study designs.

10.
Estimating the effects of haplotypes on the age of onset of a disease is an important step toward the discovery of genes that influence complex human diseases. A haplotype is a specific sequence of nucleotides on the same chromosome of an individual and can only be measured indirectly through the genotype. We consider cohort studies which collect genotype data on a subset of cohort members through case-cohort or nested case-control sampling. We formulate the effects of haplotypes and possibly time-varying environmental variables on the age of onset through a broad class of semiparametric regression models. We construct appropriate nonparametric likelihoods, which involve both finite- and infinite-dimensional parameters. The corresponding nonparametric maximum likelihood estimators are shown to be consistent, asymptotically normal, and asymptotically efficient. Consistent variance-covariance estimators are provided, and efficient and reliable numerical algorithms are developed. Simulation studies demonstrate that the asymptotic approximations are accurate in practical settings and that case-cohort and nested case-control designs are highly cost-effective. An application to a major cardiovascular study is provided.

11.
In genome-wide association studies (GWAS), studying multiple diseases with shared controls is one of the case–control study designs. If data obtained from these studies are appropriately analyzed, this design can have several advantages, such as improving statistical power in detecting associations and reducing the time and cost of the data collection process. In this paper, we propose a study design for GWAS which involves multiple diseases but no controls. We also propose a corresponding statistical data analysis strategy for GWAS with multiple diseases but no controls. Through a simulation study, we show that the statistical association test with the proposed study design is more powerful than the test with a single disease sharing common controls, and it has comparable power to the overall test based on the whole dataset including the controls. We also apply the proposed method to a real GWAS dataset to illustrate the methodologies and the advantages of the proposed design. Some possible limitations of this study design and testing method and their solutions are also discussed. Our findings indicate that the proposed study design and statistical analysis strategy could be more efficient than the usual case–control GWAS as well as those with shared controls.

12.
S Wacholder  M Gail  D Pee 《Biometrics》1991,47(1):63-76
We develop approximate methods to compare the efficiencies and to compute the power of alternative potential designs for sampling from a cohort before beginning to collect exposure data. Our methods require only that the cohort be assembled, meaning that the numbers of individuals N_kj at risk at pairs of event times t_k and t_j ≥ t_k are available. To compute N_kj, one needs to know the entry, follow-up, censoring, and event history, but not the exposure, for each individual. Our methods apply to any "unbiased control sampling design," in which cases are compared to a random sample of noncases at risk at the time of an event. We apply our methods to approximate the efficiencies of the nested case-control design, the case-cohort design, and an augmented case-cohort design, compared to the full cohort design, in an assembled cohort of 17,633 members of an insurance cooperative who were followed for mortality from prostatic cancer. The assumptions underlying the approximation are that exposure is unrelated both to the hazard of an event and to the hazard for censoring. The approximations performed well in simulations when both assumptions held and when the exposure was moderately related to censoring.
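
The at-risk counts for pairs of event times that this abstract relies on can be computed from entry and exit times alone. The following Python sketch shows one way to do so, assuming a subject is at risk at time t when entry < t ≤ exit; that convention, the function name, and the argument names are assumptions for illustration, not taken from the paper.

```python
def at_risk_pair_counts(entry, exit_, event_times):
    """Count subjects at risk at both members of each ordered pair of event times.

    entry[i], exit_[i]: start and end of follow-up for subject i.
    Returns a dict keyed by (t_k, t_j) with t_j >= t_k.
    """
    times = sorted(set(event_times))
    counts = {}
    for k, t_k in enumerate(times):
        for t_j in times[k:]:
            # At risk at both times: entered before t_k and still followed at t_j.
            counts[(t_k, t_j)] = sum(
                1 for a, b in zip(entry, exit_) if a < t_k and b >= t_j
            )
    return counts
```

No exposure data enter the calculation, which matches the abstract's point that the counts are available once the cohort is assembled.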

13.
Case-control designs are widely used in rare disease studies. In a typical case-control study, data are collected from a sample of all available subjects who have experienced a disease (cases) and a sub-sample of subjects who have not experienced the disease (controls) in a study cohort. Cases are oversampled in case-control studies. Logistic regression is a common tool to estimate the relative risks of the disease with respect to a set of covariates. Very often in such a study, the ages at onset of the disease for all cases and the ages at survey of controls are known. Standard logistic regression analysis using age as a covariate is based on a dichotomous outcome and does not efficiently use such age-at-onset (time-to-event) information. We propose to analyze age-at-onset data using a modified case-cohort method by treating the control group as an approximation of a subcohort assuming rare events. We investigate the asymptotic bias of this approximation and show that the asymptotic bias of the proposed estimator is small when the disease rate is low. We evaluate the finite sample performance of the proposed method through a simulation study and illustrate the method using a breast cancer case-control data set.

14.
The paper proposes an approach to causal mediation analysis in nested case-control study designs, often incorporated with countermatching schemes using conditional likelihood, and we compare the method's performance to that of mediation analysis using the Cox model for the full cohort with a continuous or dichotomous mediator. Simulation studies are conducted to assess our proposed method and investigate the efficiency relative to the cohort. We illustrate the method using actual data from two studies of potential mediation of radiation risk conducted within the Adult Health Study cohort of atomic-bomb survivors. The performance becomes comparable to that based on the full cohort, illustrating the potential for valid mediation analysis based on the reduced data obtained through the nested case-control design.

15.
J M Robins  M H Gail  J H Lubin 《Biometrics》1986,42(2):293-299
The authors consider several aspects of the design and analysis of synthetic case-control studies of cohort data under a proportional hazards model. First, in highly stratified data, consistent estimates of the relative risk are shown to result only if controls are sampled randomly with replacement from the entire risk set or without replacement from the noncases. Second, if previous controls are excluded from consideration as future controls but are included as cases if they fail, then inconsistent estimates of the relative risk can occur if "time" in the proportional hazards model represents an individual's chronological age and age at entry into follow-up is variable. On the other hand, if "time" represents time since the beginning of follow-up, estimates of the relative risk will be consistent, but the usual variance estimator will be inconsistent.

16.
In this paper we propose a method to be used in the planning stage of a case-control study. An allocation rule for controls in multicenter case-control studies is proposed which would assure a simple, efficient and unbiased estimation of the odds ratio in the pooled data. It is shown that the efficiency of the design increases with increasing correlation between study center and risk factor. Sources of bias and their implications for relative risk estimation are discussed. The method is demonstrated with data from a case-control study.

17.
Zhao Y  Wang S 《Human heredity》2009,67(1):46-56
Study cost remains the major limiting factor for genome-wide association studies due to the necessity of genotyping a large number of SNPs for a large number of subjects. Both DNA pooling strategies and two-stage designs have been proposed to reduce genotyping costs. In this study, we propose a cost-effective, two-stage approach with a DNA pooling strategy. During stage I, all markers are evaluated on a subset of individuals using DNA pooling. The most promising set of markers is then evaluated with individual genotyping for all individuals during stage II. The goal is to determine the optimal parameters (π^p_sample, the proportion of samples used during stage I with DNA pooling; and π^p_marker, the proportion of markers evaluated during stage II with individual genotyping) that minimize the cost of a two-stage DNA pooling design while maintaining a desired overall significance level and achieving a level of power similar to that of a one-stage individual genotyping design. We considered the effects of three factors on optimal two-stage DNA pooling designs. Our results suggest that, under most scenarios considered, the optimal two-stage DNA pooling design may be much more cost-effective than the optimal two-stage individual genotyping design, which uses individual genotyping during both stages.

18.
In biomedical cohort studies for assessing the association between an outcome variable and a set of covariates, some covariates can usually be measured only on a subgroup of study subjects. An important design question is which subjects to select into the subgroup to increase statistical efficiency. When the outcome is binary, one may adopt a case-control sampling design or a balanced case-control design where cases and controls are further matched on a small number of complete discrete covariates. While the latter achieves success in estimating odds ratio (OR) parameters for the matching covariates, similar two-phase design options have not been explored for the remaining covariates, especially the incompletely collected ones. This is of great importance in studies where the covariates of interest cannot be completely collected. To this end, assuming that an external model is available to relate the outcome and complete covariates, we propose a novel sampling scheme that oversamples cases and controls with worse goodness-of-fit based on the external model and further matches them on complete covariates similarly to the balanced design. We develop a pseudolikelihood method for estimating OR parameters. Through simulation studies and explorations in a real cohort study, we find that our design generally leads to reduced asymptotic variances of the OR estimates and the reduction for the matching covariates is comparable to that of the balanced design.

19.
Outcome-dependent sampling (ODS) schemes can be a cost effective way to enhance study efficiency. The case-control design has been widely used in epidemiologic studies. However, when the outcome is measured on a continuous scale, dichotomizing the outcome could lead to a loss of efficiency. Recent epidemiologic studies have used ODS sampling schemes where, in addition to an overall random sample, there are also a number of supplemental samples that are collected based on a continuous outcome variable. We consider a semiparametric empirical likelihood inference procedure in which the underlying distribution of covariates is treated as a nuisance parameter and is left unspecified. The proposed estimator has asymptotic normality properties. The likelihood ratio statistic using the semiparametric empirical likelihood function has Wilks-type properties in that, under the null, it follows a chi-square distribution asymptotically and is independent of the nuisance parameters. Our simulation results indicate that, for data obtained using an ODS design, the semiparametric empirical likelihood estimator is more efficient than conditional likelihood and probability weighted pseudolikelihood estimators and that ODS designs (along with the proposed estimator) can produce more efficient estimates than simple random sample designs of the same size. We apply the proposed method to analyze a data set from the Collaborative Perinatal Project (CPP), an ongoing environmental epidemiologic study, to assess the relationship between maternal polychlorinated biphenyl (PCB) level and children's IQ test performance.

20.
Case-cohort designs and analysis for clustered failure time data   Total citations: 1 (self-citations: 0; citations by others: 1)
Lu SE  Shih JH 《Biometrics》2006,62(4):1138-1148
Case-cohort design is an efficient and economical design to study risk factors for infrequent disease in a large cohort. It involves the collection of covariate data from all failures ascertained throughout the entire cohort, and from the members of a random subcohort selected at the onset of follow-up. In the literature, the case-cohort design has been extensively studied, but was exclusively considered for univariate failure time data. In this article, we propose case-cohort designs adapted to multivariate failure time data. An estimation procedure with the independence working model approach is used to estimate the regression parameters in the marginal proportional hazards model, where the correlation structure between individuals within a cluster is left unspecified. Statistical properties of the proposed estimators are developed. The performance of the proposed estimators and comparisons of statistical efficiencies are investigated with simulation studies. A data example from the Translating Research into Action for Diabetes (TRIAD) study is used to illustrate the proposed methodology.
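
The basic case-cohort selection described at the start of this abstract (all failures plus a random subcohort drawn at the onset of follow-up) can be sketched in Python as follows. This covers only the univariate selection step, not the clustered-data extension the article proposes, and the function name, arguments, and the use of a fixed sampling fraction are assumptions for illustration.

```python
import random

def case_cohort_sample(failed, subcohort_fraction, seed=0):
    """Return (subcohort, selected) index sets for a simple case-cohort design.

    failed[i] is True if subject i is ascertained as a failure during follow-up.
    The subcohort is drawn at random without looking at failure status.
    """
    rng = random.Random(seed)
    n = len(failed)
    size = round(subcohort_fraction * n)
    subcohort = set(rng.sample(range(n), size))
    cases = {i for i, f in enumerate(failed) if f}
    # Covariates are collected on the subcohort plus every failure.
    return subcohort, subcohort | cases
```

Because the subcohort is chosen independently of outcome, some failures may fall inside it, so the number of subjects needing covariate data lies between the subcohort size and the subcohort size plus the number of failures.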
