Similar Articles
20 similar articles found (search time: 31 ms)
1.
ABSTRACT: BACKGROUND: For gene expression or gene association studies with a large number of hypotheses, the number of measurements per marker in a conventional single-stage design is often low due to limited resources. Two-stage designs have been proposed where promising hypotheses are identified in a first stage and further investigated in the second stage with larger sample sizes. For two types of two-stage designs proposed in the literature, we derive multiple testing procedures controlling the False Discovery Rate (FDR) and demonstrate FDR control by simulation: designs where a fixed number of top-ranked hypotheses are selected, and designs where the selection in the interim analysis is based on an FDR threshold. In contrast to earlier approaches, which use only the second-stage data in the hypothesis tests (pilot approach), the proposed testing procedures are based on the pooled data from both stages (integrated approach). RESULTS: For both selection rules the multiple testing procedures control the FDR in the considered simulation scenarios. This holds for the case of independent observations across hypotheses as well as for certain correlation structures. Additionally, we show that in scenarios with small effect sizes the testing procedures based on the pooled data from both stages can give a considerable improvement in power compared to tests based on the second-stage data only. CONCLUSION: The proposed hypothesis tests provide a tool for FDR control for the considered two-stage designs. Comparing the integrated approaches for both selection rules with the corresponding pilot approaches showed an advantage of the integrated approach in many simulation scenarios.
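The integrated approach can be sketched in miniature: combine stage-wise evidence per hypothesis, then apply an FDR rule to the combined p-values or statistics. The inverse-normal combination and the Benjamini-Hochberg step-up rule below are standard building blocks, not the authors' exact procedure; function names are ours.

```python
import numpy as np

def combined_z(z1, z2, n1, n2):
    # Inverse-normal combination of stage-wise z-statistics, weighted by
    # the square roots of the stage sample sizes. This pools evidence
    # from both stages instead of using stage 2 alone (pilot approach).
    w1 = (n1 / (n1 + n2)) ** 0.5
    w2 = (n2 / (n1 + n2)) ** 0.5
    return w1 * z1 + w2 * z2

def benjamini_hochberg(pvals, q=0.05):
    # Standard Benjamini-Hochberg step-up rule; returns a boolean
    # rejection indicator per hypothesis at FDR level q.
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresh = q * np.arange(1, m + 1) / m
    below = p[order] <= thresh
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = int(np.nonzero(below)[0].max())
        reject[order[:k + 1]] = True
    return reject
```

Note that naively applying such a rule to hypotheses that survived an interim selection is exactly what requires the care the abstract describes: the selection step changes the null distribution.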

2.
Zhao Y, Wang S. Human Heredity 2009, 67(1):46-56
Study cost remains the major limiting factor for genome-wide association studies due to the necessity of genotyping a large number of SNPs for a large number of subjects. Both DNA pooling strategies and two-stage designs have been proposed to reduce genotyping costs. In this study, we propose a cost-effective, two-stage approach with a DNA pooling strategy. During stage I, all markers are evaluated on a subset of individuals using DNA pooling. The most promising set of markers is then evaluated with individual genotyping for all individuals during stage II. The goal is to determine the optimal parameters (π_p^sample, the proportion of samples used during stage I with DNA pooling; and π_p^marker, the proportion of markers evaluated during stage II with individual genotyping) that minimize the cost of a two-stage DNA pooling design while maintaining a desired overall significance level and achieving a level of power similar to that of a one-stage individual genotyping design. We considered the effects of three factors on optimal two-stage DNA pooling designs. Our results suggest that, under most scenarios considered, the optimal two-stage DNA pooling design may be much more cost-effective than the optimal two-stage individual genotyping design, which uses individual genotyping during both stages.
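The cost side of this optimization is simple to write down. The sketch below uses the two design fractions from the abstract (π_p^sample, π_p^marker); the per-measurement costs, grids, and the omission of the power and significance constraints are our illustrative assumptions.

```python
def two_stage_cost(pi_sample, pi_marker, n, m, c_pool=0.1, c_geno=1.0):
    # Stylized total cost of a two-stage DNA pooling design:
    # stage I measures all m markers on pi_sample * n individuals via
    # pooling (cheap, c_pool per measurement); stage II individually
    # genotypes all n individuals at the pi_marker * m retained markers
    # (c_geno per genotype). Cost units and the c_pool/c_geno ratio
    # are illustrative, not taken from the paper.
    stage1 = c_pool * (pi_sample * n) * m
    stage2 = c_geno * n * (pi_marker * m)
    return stage1 + stage2

def cheapest_design(n, m,
                    sample_grid=(0.1, 0.2, 0.3, 0.4, 0.5),
                    marker_grid=(0.001, 0.005, 0.01, 0.05)):
    # Naive grid search over the two design fractions. A real optimization
    # would also enforce the overall significance level and the power
    # constraint, which is what makes the problem nontrivial.
    return min((two_stage_cost(ps, pm, n, m), ps, pm)
               for ps in sample_grid for pm in marker_grid)
```

Without the power constraint the search trivially picks the smallest fractions; the constraint is what forces the trade-off the paper studies.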

3.
Two-stage designs for experiments with a large number of hypotheses
MOTIVATION: When a large number of hypotheses are investigated, the false discovery rate (FDR) is commonly applied in gene expression analysis or gene association studies. Conventional single-stage designs may lack power due to low sample sizes for the individual hypotheses. We propose two-stage designs where the first stage is used to screen the 'promising' hypotheses, which are further investigated at the second stage with an increased sample size. A multiple test procedure based on sequential individual P-values is proposed to control the FDR for the case of independent normal distributions with known variance. RESULTS: The power of optimal two-stage designs is considerably larger than the power of the corresponding single-stage design with equal costs. Extensions to the case of unknown variances and correlated test statistics are investigated by simulations. Moreover, it is shown that the simple multiple test procedure using first-stage data for screening purposes and deriving the test decisions only from second-stage data is a very powerful option.

4.
There is increasing interest in the use of two-stage case-control studies to reduce genotyping costs in the search for genes underlying common disorders. Instead of analyzing the data from the second stage separately, a more powerful test can be performed by combining the data from both stages. However, standard tests cannot be used, because only the markers that are significant in the first stage are selected for the second stage, and the test statistics at both stages are dependent because they partly involve the same data. Theoretical approximations are not available for commonly used test statistics, and in this specific context simulations can be problematic because of the computational burden. We therefore derived a cost-effective (that is, accurate but fast in terms of central processing unit (CPU) time) approximation for the distribution of Pearson's statistic on 2 × m contingency tables in a two-stage design with combined data. We included this approximation in an iterative method for designing optimal two-stage studies. Simulations supported the accuracy of our approximation. Numerical results confirmed that the use of two-stage designs reduces the genotyping burden substantially. Compared to not combining data, combining the data decreases the required sample sizes on average by 15% and the genotyping burden by 5%.
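The building block whose two-stage combined-data distribution the paper approximates is the ordinary Pearson statistic on a 2 × m table; computing that single-stage statistic is straightforward (the approximation of its post-selection distribution is the paper's contribution and is not reproduced here):

```python
import numpy as np

def pearson_2xm(table):
    # Pearson's chi-square statistic for a 2 x m contingency table
    # (row 0: cases, row 1: controls; columns: m marker categories).
    t = np.asarray(table, dtype=float)
    expected = t.sum(axis=1, keepdims=True) * t.sum(axis=0, keepdims=True) / t.sum()
    return float(((t - expected) ** 2 / expected).sum())
```

In the one-stage setting this statistic is referred to a chi-square distribution with m - 1 degrees of freedom; after first-stage selection that reference distribution no longer applies, which is the problem the abstract addresses.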

5.
Genomewide association studies (GWAS) are being conducted to unravel the genetic etiology of complex diseases, in which complex epistasis may play an important role. A one-stage method, in which interactions are tested using all samples at once, may be computationally problematic, may have low power as the number of markers tested increases, and may not be cost-efficient. A common two-stage method that uses all samples in both stages may be a reasonable and powerful approach for detecting interacting genes. In this study, we introduce an alternative two-stage method, in which some promising markers are selected using a proportion of the samples in the first stage and interactions are then tested using the remaining samples in the second stage. We call this the mixed two-stage method. We then investigate the power of both the one-stage method and the mixed two-stage method to detect interacting disease loci for a range of two-locus epistatic models in a case-control study design. Our results suggest that the mixed two-stage method may be more powerful than the one-stage method if we choose about 30% of the samples for single-locus tests in the first stage and identify at most 1% of the markers for follow-up interaction tests. In addition, we compare both two-stage methods and find that the mixed two-stage method loses power relative to the common two-stage method because it uses only part of the samples in each stage.

6.
Two-stage designs in case-control association analysis
Zuo Y, Zou G, Zhao H. Genetics 2006, 173(3):1747-1760
DNA pooling is a cost-effective approach for collecting information on marker allele frequency in genetic studies. It is often suggested as a screening tool to identify a subset of candidate markers from a very large number of markers to be followed up by more accurate and informative individual genotyping. In this article, we investigate several statistical properties and design issues related to this two-stage design, including the selection of the candidate markers for second-stage analysis, the statistical power of this design, and the probability that truly disease-associated markers are ranked among the top after second-stage analysis. We have derived analytical results on the proportion of markers to be selected for second-stage analysis. For example, to detect disease-associated markers with an allele frequency difference of 0.05 between the cases and controls through an initial sample of 1000 cases and 1000 controls, our results suggest that when the measurement errors are small (0.005), approximately 3% of the markers should be selected. For the statistical power to identify disease-associated markers, we find that the measurement errors associated with DNA pooling have little effect on power. This is in contrast to the one-stage pooling scheme, where measurement errors may have a large effect on statistical power. As for the probability that the disease-associated markers are ranked among the top in the second stage, we show that there is a high probability that at least one disease-associated marker is ranked among the top when the allele frequency differences between the cases and controls are no smaller than 0.05 for reasonably large sample sizes, even though the errors associated with DNA pooling in the first stage are not small. Therefore, the two-stage design with DNA pooling as a screening tool offers an efficient strategy in genomewide association studies, even when the measurement errors associated with DNA pooling are nonnegligible.
For any disease model, we find that all the statistical results essentially depend on the population allele frequency and the allele frequency differences between the cases and controls at the disease-associated markers. The general conclusions hold whether the second stage uses an entirely independent sample or includes both the samples used in the first stage and an independent set of samples.
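The role of pooling measurement error can be illustrated with a stylized power calculation for a two-sided z-test of an allele frequency difference. The variance model (sampling variance p(1-p)/(2n) per group plus an additive error variance per pooled estimate) and the default error level mirror the abstract's 0.005 figure but are otherwise our assumption.

```python
from math import erf, sqrt

def norm_cdf(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def pooling_power(p_case, p_control, n, eps=0.005, z_alpha=1.96):
    # Approximate power of a two-sided z-test for an allele frequency
    # difference estimated from DNA pools of n cases and n controls.
    # Each pooled frequency estimate carries sampling variance
    # p(1-p)/(2n) plus measurement-error variance eps**2 (a stylized
    # model, not the paper's exact formulation).
    delta = p_case - p_control
    se = sqrt(p_case * (1 - p_case) / (2 * n)
              + p_control * (1 - p_control) / (2 * n)
              + 2 * eps ** 2)
    shift = delta / se
    return (1 - norm_cdf(z_alpha - shift)) + norm_cdf(-z_alpha - shift)
```

With small eps the error term is dominated by the sampling variance, which is one way to see why modest pooling errors cost little power at the screening stage.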

7.
Flexible manufacturing systems (FMSs) for two-stage production may possess a variety of operating flexibilities in the form of tooling capabilities for the machines and alternative routings for each operation. In this paper, we compare the throughput performance of several flexible flow shop and job shop designs. We consider two-stage assembly flow shops with m parallel machines in stage 1 and a single assembly facility in stage 2. Every upstream operation can be processed by any one of the machines in stage 1 prior to the assembly stage. We also study a similar design where every stage 1 operation is processed by a predetermined machine. For both designs, we present heuristic algorithms with good worst-case error bounds and show that the average performance of these algorithms is near optimal. The algorithms presented are used to compare the performance of the two designs with each other and other related flexible flow shop designs. It is shown, both analytically and experimentally, that the mode of flexibility possessed by a design has implications on the throughput performance of the production system.

8.
In oncology, single‐arm two‐stage designs with binary endpoint are widely applied in phase II for the development of cytotoxic cancer therapies. Simon's optimal design with prefixed sample sizes in both stages minimizes the expected sample size under the null hypothesis and is one of the most popular designs. The search algorithms that are currently used to identify phase II designs showing prespecified characteristics are computationally intensive. For this reason, most authors impose restrictions on their search procedure. However, it remains unclear to what extent this approach influences the optimality of the resulting designs. This article describes an extension to fixed sample size phase II designs by allowing the sample size of stage two to depend on the number of responses observed in the first stage. Furthermore, we present a more efficient numerical algorithm that allows for an exhaustive search of designs. Comparisons between designs presented in the literature and the proposed optimal adaptive designs show that while the improvements are generally moderate, notable reductions in the average sample size can be achieved for specific parameter constellations when applying the new method and search strategy.
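The quantities an exhaustive design search must evaluate for every candidate (r1/n1, r/n) are straightforward binomial computations. A minimal sketch (our own implementation of the standard Simon two-stage rules, not the article's adaptive extension):

```python
from math import comb

def binom_pmf(k, n, p):
    # Binomial probability mass function.
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def simon_characteristics(n1, r1, n, r, p):
    # Operating characteristics of a single-arm two-stage design with a
    # binary endpoint: stop for futility after stage 1 if at most r1
    # responses among n1 patients; otherwise enroll n - n1 more patients
    # and declare the treatment promising if total responses exceed r.
    # Returns (probability of rejecting H0, expected sample size) at
    # true response rate p.
    pet = sum(binom_pmf(k, n1, p) for k in range(r1 + 1))  # early stop
    reject = 0.0
    for k1 in range(r1 + 1, n1 + 1):
        tail = sum(binom_pmf(k2, n - n1, p)
                   for k2 in range(max(0, r + 1 - k1), n - n1 + 1))
        reject += binom_pmf(k1, n1, p) * tail
    return reject, n1 + (1 - pet) * (n - n1)
```

A design search evaluates these at the null rate p0 (type I error, expected sample size to minimize) and the alternative p1 (power), over all feasible (n1, r1, n, r).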

9.
Large-scale whole genome association studies are increasingly common, due in large part to recent advances in genotyping technology. With this change in paradigm for genetic studies of complex diseases, it is vital to develop valid, powerful, and efficient statistical tools and approaches to evaluate such data. Despite a dramatic drop in genotyping costs, it is still expensive to genotype thousands of individuals for hundreds of thousands of single nucleotide polymorphisms (SNPs) for large-scale whole genome association studies. A multi-stage (or two-stage) design has been a promising alternative: in the first stage, only a fraction of samples are genotyped and tested using a dense set of SNPs, and only a small subset of markers that show moderate associations with the disease will be genotyped in later stages. Multi-stage designs have also been used in candidate gene association studies, usually in regions that have shown strong signals by linkage studies. To decide which set of SNPs to genotype in the next stage, a common practice is to utilize a simple test (such as a chi-square test for case-control data) and a liberal significance level without corrections for multiple testing, to ensure that no true signals are filtered out. In this paper, I have developed a novel SNP selection procedure within the framework of multi-stage designs. Based on data from stage 1, the method explicitly explores correlations (linkage disequilibrium) among SNPs and their possible interactions in determining the disease phenotype. Compared with a regular multi-stage design, the approach can select a much reduced set of SNPs with high discriminative power for later stages. Therefore, not only does it reduce the genotyping cost in later stages, it also increases the statistical power by reducing the number of tests. Combined analysis is proposed to further improve power, and the theoretical significance level of the combined statistic is derived.
Extensive simulations have been performed, and results have shown that the procedure can reduce the number of SNPs required in later stages, with improved power to detect associations. The procedure has also been applied to a real data set from a genome-wide association study of sporadic amyotrophic lateral sclerosis (ALS), and an interesting set of candidate SNPs has been identified.

10.
Flexible design for following up positive findings
As more population-based studies suggest associations between genetic variants and disease risk, there is a need to improve the design of follow-up studies (stage II) in independent samples to confirm evidence of association observed at the initial stage (stage I). We propose to use flexible designs developed for randomized clinical trials in the calculation of sample size for follow-up studies. We apply a bootstrap procedure to correct the effect of regression to the mean, also called "winner's curse," resulting from choosing to follow up the markers with the strongest associations. We show how the results from stage I can improve sample size calculations for stage II adaptively. Despite the adaptive use of stage I data, the proposed method maintains the nominal global type I error for final analyses on the basis of either pure replication with the stage II data only or a joint analysis using information from both stages. Simulation studies show that sample-size calculations accounting for the impact of regression to the mean with the bootstrap procedure are more appropriate than is the conventional method. We also find that, in the context of flexible design, the joint analysis is generally more powerful than the replication analysis.
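The winner's curse the bootstrap procedure corrects for is easy to see by simulation: among many markers with the same true effect, the largest observed estimate is biased upward, so a stage II sized from it is underpowered. All numbers below are illustrative assumptions, not values from the study.

```python
import random

def winners_curse_excess(true_effect=0.2, se=0.1, n_markers=100,
                         n_rep=2000, seed=1):
    # Monte Carlo illustration of regression to the mean ("winner's
    # curse"): draw n_markers effect estimates around a common true
    # effect, keep the largest, and average its excess over the truth.
    # A positive result means the selected estimate is inflated.
    rng = random.Random(seed)
    excess = 0.0
    for _ in range(n_rep):
        best = max(rng.gauss(true_effect, se) for _ in range(n_markers))
        excess += best - true_effect
    return excess / n_rep
```

With a single marker there is no selection and hence no bias, which is the baseline the bootstrap correction aims to restore for the selected markers.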

11.
It is well known that optimal designs are strongly model dependent. In this article, we apply the Lagrange multiplier approach to the optimal design problem, using a recently proposed model for carryover effects. Generally, crossover designs are not recommended when carryover effects are present and when the primary goal is to obtain an unbiased estimate of the treatment effect. In some cases, baseline measurements are believed to improve design efficiency. This article examines the impact of baselines on optimal designs using two different assumptions about carryover effects during baseline periods and employing a nontraditional crossover design model. As anticipated, baseline observations improve design efficiency considerably for two-period designs, which use the data in the first period only to obtain unbiased estimates of treatment effects, while the improvement is rather modest for three- or four-period designs. Further, we find little additional benefit in measuring baselines at each treatment period as compared to measuring baselines only in the first period. Although our study of baselines did not change the results on optimal designs reported in the literature, the problem of strong model dependency is generally recognized. The advantage of using multiperiod designs is rather evident, as we found that extending two-period designs to three- or four-period designs significantly reduced variability in estimating the direct treatment effect contrast.

12.
Genome-wide association (GWA) studies are a powerful approach for identifying novel genetic risk factors associated with human disease. A GWA study typically requires the inclusion of thousands of samples to have sufficient statistical power to detect single nucleotide polymorphisms that are associated with only modest increases in risk of disease, given the heavy burden of the multiple-test correction that is necessary to maintain valid statistical tests. Low statistical power and the high financial cost of performing a GWA study remain prohibitive for many scientific investigators eager to perform such a study using their own samples. A number of remedies have been suggested to increase statistical power and decrease cost, including the utilization of free publicly available genotype data and multi-stage genotyping designs. Herein, we compare the statistical power and relative costs of alternative association study designs that use cases and screened controls to study designs that are based only on, or additionally include, free public control genotype data. We describe a novel replication-based two-stage study design, which uses free public control genotype data in the first stage and follow-up genotype data on case-matched controls in the second stage, and which preserves many of the advantages inherent in using only an epidemiologically matched set of controls. Specifically, we show that our proposed two-stage design can substantially increase statistical power and decrease the cost of performing a GWA study while controlling the type-I error rate, which can be inflated when using public controls due to differences in ancestry and batch genotype effects.

13.
Kang G, Lin D, Hakonarson H, Chen J. Human Heredity 2012, 73(3):139-147
Next-generation sequencing technology provides an unprecedented opportunity to identify rare susceptibility variants. It is not yet financially feasible to perform whole-genome sequencing on a large number of subjects, and a two-stage design has been advocated as a practical option. In stage I, variants are discovered by sequencing the whole genomes of a small number of carefully selected individuals. In stage II, the discovered variants are genotyped in a large number of individuals to assess associations. Individuals with extreme phenotypes are typically selected in stage I. Using simulated data for unrelated individuals, we explore two important aspects of this two-stage design: the efficiency of discovering common and rare single-nucleotide polymorphisms (SNPs) in stage I, and the impact of incomplete SNP discovery in stage I on the power of testing associations in stage II. We applied a sum test and a sum of squared score test for gene-based association analyses to evaluate the power of the two-stage design. We obtained the following results from extensive simulation studies and analysis of the GAW17 dataset. When individuals with trait values more extreme than the 99.7th to 99th quantile were included in stage I, the two-stage design could achieve the same power as, or even higher than, the one-stage design if the rare causal variants had large effect sizes. In such a design, fewer than half of the total SNPs, including more than half of the causal SNPs, were discovered; these included nearly all SNPs with minor allele frequencies (MAFs) ≥5%, more than half of the SNPs with MAFs between 1% and 5%, and fewer than half of the SNPs with MAFs <1%. Although a one-stage design may be preferable for identifying multiple rare variants having small to moderate effect sizes, our observations support using the two-stage design as a cost-effective option for next-generation sequencing studies.

14.
Posch M, Bauer P. Biometrics 2000, 56(4):1170-1176
This article deals with sample size reassessment for adaptive two-stage designs based on conditional power arguments utilizing the variability observed at the first stage. Fisher's product test for the p-values from the disjoint samples at the two stages is considered in detail for the comparison of the means of two normal populations. We show that stopping rules allowing for the early acceptance of the null hypothesis that are optimal with respect to the average sample size may lead to a severe decrease of the overall power if the sample size is a priori underestimated. This problem can be overcome by choosing designs with low probabilities of early acceptance or by midtrial adaptations of the early acceptance boundary using the variability observed in the first stage. This modified procedure is negligibly anticonservative and preserves the power.
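Fisher's product test for two independent stage-wise p-values rejects when p1 · p2 falls below a critical value c_α solving c(1 − ln c) = α; since the left side is increasing in c on (0, 1), the critical value can be found by bisection. A minimal sketch (our own code, standard result):

```python
from math import log

def product_critical(alpha, tol=1e-12):
    # Critical value c for Fisher's product test of two independent
    # p-values: reject H0 when p1 * p2 <= c, where c solves
    # c * (1 - ln c) = alpha. The left side is increasing in c on (0, 1)
    # (its derivative is -ln c > 0), so bisection converges.
    lo, hi = 1e-16, alpha
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid * (1 - log(mid)) > alpha:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def fisher_product_reject(p1, p2, alpha=0.05):
    # Combined two-stage decision rule at overall level alpha.
    return p1 * p2 <= product_critical(alpha)
```

For α = 0.05 the critical product is about 0.0087, so two moderately small stage-wise p-values (e.g. 0.04 and 0.1) can jointly reject even though neither stage is conclusive alone.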

15.
Zheng G, Song K, Elston RC. Human Heredity 2007, 63(3-4):175-186
We study a two-stage analysis of genetic association for case-control studies. In the first stage, we compare Hardy-Weinberg disequilibrium coefficients between cases and controls and, in the second stage, we apply the Cochran-Armitage trend test. The two analyses are statistically independent when Hardy-Weinberg equilibrium holds in the population, so all the samples are used in both stages. The significance level in the first stage is adaptively determined based on its conditional power. Given the level in the first stage, the level for the second-stage analysis is determined with the overall Type I error being asymptotically controlled. For finite sample sizes, a parametric bootstrap method is used to control the overall Type I error rate. This two-stage analysis is often more powerful than the Cochran-Armitage trend test alone for a large association study. The new approach is applied to SNPs from a real study.
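The second-stage component, the Cochran-Armitage trend test on genotype counts, can be sketched directly (the first-stage Hardy-Weinberg disequilibrium comparison and the adaptive level-splitting are not shown):

```python
def trend_test_z(cases, controls, scores=(0, 1, 2)):
    # Cochran-Armitage trend test z-statistic for genotype counts
    # ordered by number of risk alleles (default scores 0, 1, 2).
    # cases / controls: per-genotype counts in the two groups.
    r1, r2 = sum(cases), sum(controls)
    n = r1 + r2
    cols = [a + b for a, b in zip(cases, controls)]
    t = list(scores)
    stat = sum(ti * (a * r2 - b * r1)
               for ti, a, b in zip(t, cases, controls))
    var = (r1 * r2 / n) * (
        sum(ti ** 2 * ci * (n - ci) for ti, ci in zip(t, cols))
        - 2 * sum(t[i] * t[j] * cols[i] * cols[j]
                  for i in range(len(t)) for j in range(i + 1, len(t))))
    return stat / var ** 0.5 if var > 0 else 0.0
```

Under the null the statistic is approximately standard normal; with scores (0, 1, 2) it targets an additive (per-allele) risk trend.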

16.
Continuous fermentation trials on the bioconversions of pregnadiene to pregnatriene by Septomyxa affinis and progesterone to 11α-hydroxyprogesterone by Rhizopus nigricans were conducted successfully in an eight-stage pilot plant reactor. The first stage was used as the mycelial growth stage while the steroid solutions were added continuously to stage 2, thus using the remaining stages as conversion vessels. Recoveries of 50 to 60% oxidized steroid (based on total steroid supplied) were obtained in both cases at a contact time of 5 hr between mycelium and steroid. Longer contact times resulted in a gradual net loss of steroid. It was concluded that two-stage reactors (one growth stage and one conversion stage) were adequate for efficient continuous operation of such processes. The reaction volumes of both stages have to be kept in proper balance to ensure optimal holdup times for both the cell growth and conversion steps.

17.
18.
We propose a Bayesian two-stage biomarker-based adaptive randomization (AR) design for the development of targeted agents. The design has three main goals: (1) to test the treatment efficacy, (2) to identify prognostic and predictive markers for the targeted agents, and (3) to provide better treatment for patients enrolled in the trial. To treat patients better, both stages are guided by the Bayesian AR based on the individual patient’s biomarker profiles. The AR in the first stage is based on a known marker. A Go/No-Go decision can be made in the first stage by testing the overall treatment effects. If a Go decision is made at the end of the first stage, a two-step Bayesian lasso strategy will be implemented to select additional prognostic or predictive biomarkers to refine the AR in the second stage. We use simulations to demonstrate the good operating characteristics of the design, including the control of per-comparison type I and type II errors, high probability in selecting important markers, and treating more patients with more effective treatments. Bayesian adaptive designs allow for continuous learning. The designs are particularly suitable for the development of multiple targeted agents in the quest of personalized medicine. By estimating treatment effects and identifying relevant biomarkers, the information acquired from the interim data can be used to guide the choice of treatment for each individual patient enrolled in the trial in real time to achieve a better outcome. The design is being implemented in the BATTLE-2 trial in lung cancer at the MD Anderson Cancer Center.

19.
We develop expressions for the power to detect associations between parental genotypes and offspring phenotypes for quantitative traits. Three different “indirect” experimental designs are considered: full-sib, half-sib, and full-sib–half-sib families. We compare the power of these designs to detect genotype–phenotype associations relative to the common, “direct,” approach of genotyping and phenotyping the same individuals. When heritability is low, the indirect designs can outperform the direct method. However, the extra power comes at a cost due to an increased phenotyping effort. By developing expressions for optimal experimental designs given the cost of phenotyping relative to genotyping, we show how the extra costs associated with phenotyping a large number of individuals will influence experimental design decisions. Our results suggest that indirect association studies can be a powerful means of detecting allelic associations in outbred populations of species for which genotyping and phenotyping the same individuals is impractical and for life history and behavioral traits that are heavily influenced by environmental variance and therefore best measured on groups of individuals. Indirect association studies are likely to be favored only on purely economical grounds, however, when phenotyping is substantially less expensive than genotyping. A web-based application implementing our expressions has been developed to aid in the design of indirect association studies.

20.
We consider two-stage sampling designs, including so-called nested case-control studies, where one takes a random sample from a target population and completes measurements on each subject in the first stage. The second stage involves drawing a subsample from the original sample and collecting additional data on the subsample. This data structure can be viewed as a missing data structure on the full-data structure collected in the second stage of the study. Methods for analyzing two-stage designs include parametric maximum likelihood estimation and estimating equation methodology. We propose an inverse probability of censoring weighted targeted maximum likelihood estimator (IPCW-TMLE) for two-stage sampling designs and present simulation studies featuring this estimator.
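The inverse-probability-weighting idea underlying the IPCW-TMLE can be shown in its simplest form: a Horvitz-Thompson style estimate of a population mean when the variable is measured only on the second-stage subsample. The targeted (TMLE) update step is not sketched here; only the weighting component is.

```python
def ipcw_mean(values, selected, probs):
    # Inverse-probability-weighted estimate of a population mean when
    # the variable of interest is measured only on the second-stage
    # subsample: each measured subject is up-weighted by 1/P(selected).
    # values[i] is used only when selected[i] is True; probs[i] is the
    # known second-stage inclusion probability from the sampling design.
    n = len(selected)
    return sum(v / p for v, s, p in zip(values, selected, probs) if s) / n
```

When everyone is selected with probability one this reduces to the ordinary sample mean; unequal inclusion probabilities (as in nested case-control sampling) are corrected by the weights.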


Copyright © Beijing Qinyun Technology Development Co., Ltd. (京ICP备09084417号)