Similar literature
20 similar articles found.
1.
Two-stage designs in case-control association analysis   Cited by: 1 (self-citations: 0, citations by others: 1)
Zuo Y, Zou G, Zhao H. Genetics, 2006, 173(3): 1747-1760
DNA pooling is a cost-effective approach for collecting information on marker allele frequency in genetic studies. It is often suggested as a screening tool to identify a subset of candidate markers from a very large number of markers to be followed up by more accurate and informative individual genotyping. In this article, we investigate several statistical properties and design issues related to this two-stage design, including the selection of the candidate markers for second-stage analysis, statistical power of this design, and the probability that truly disease-associated markers are ranked among the top after second-stage analysis. We have derived analytical results on the proportion of markers to be selected for second-stage analysis. For example, to detect disease-associated markers with an allele frequency difference of 0.05 between the cases and controls through an initial sample of 1000 cases and 1000 controls, our results suggest that when the measurement errors are small (0.005), approximately 3% of the markers should be selected. For the statistical power to identify disease-associated markers, we find that the measurement errors associated with DNA pooling have little effect on its power. This is in contrast to the one-stage pooling scheme where measurement errors may have a large effect on statistical power. As for the probability that the disease-associated markers are ranked among the top in the second stage, we show that there is a high probability that at least one disease-associated marker is ranked among the top when the allele frequency differences between the cases and controls are at least 0.05 for reasonably large sample sizes, even though the errors associated with DNA pooling in the first stage are not small. Therefore, the two-stage design with DNA pooling as a screening tool offers an efficient strategy in genome-wide association studies, even when the measurement errors associated with DNA pooling are nonnegligible.
For any disease model, we find that all the statistical results essentially depend on the population allele frequency and the allele frequency differences between the cases and controls at the disease-associated markers. The general conclusions hold whether the second stage uses an entirely independent sample or includes both the samples used in the first stage and an independent set of samples.

2.
This study represents the first attempt at an empirical evaluation of the DNA pooling methodology by comparing it to individual genotyping and interval mapping to detect QTL in a dairy half-sib design. The findings indicated that the use of peak heights from the pool electropherograms without correction for stutter (shadow) product and preferential amplification performed as well as corrected estimates of frequencies. However, errors were found to decrease the power of the experiment at every stage of the pooling and analysis. The main sources of errors include technical errors from DNA quantification, pool construction, inconsistent differential amplification, and from the prevalence of sire alleles in the dams. Additionally, interval mapping using individual genotyping gains information from phenotypic differences between individuals in the same pool and from neighbouring markers, which is lost in a DNA pooling design. These errors cause some differences between the markers detected as significant by pooling and those found significant by interval mapping based on individual selective genotyping. Therefore, it is recommended that pooled genotyping only be used as part of an initial screen with significant results to be confirmed by individual genotyping. Strategies for improving the efficiency of the DNA pooling design are also presented.

3.
Chi XF, Lou XY, Yang MC, Shu QY. Genetica, 2009, 135(3): 267-281
We present a cost-effective DNA pooling strategy for fine mapping of a single Mendelian gene in controlled crosses. The theoretical argument suggests that it is potentially possible for a single-stage pooling approach to reduce the overall experimental expense considerably by balancing costs for genotyping and sample collection. Further, the genotyping burden can be reduced through multi-stage pooling. Numerical results are provided for practical guidelines. For example, the genotyping effort can be reduced to only a small fraction of that needed for individual genotyping at a small loss of estimation accuracy or at a cost of increasing sample sizes slightly when recombination rates are 0.5% or less. An optimal two-stage pooling scheme can reduce the amount of genotyping to 19.5%, 14.5% and 6.4% of individual genotyping efforts for identifying a gene within 1, 0.5, and 0.1 cM, respectively. Finally, we use a genetic data set for mapping the rice xl(t) gene to demonstrate the feasibility and efficiency of the DNA pooling strategy. Taken together, the results demonstrate that this DNA pooling strategy can greatly reduce the genotyping burden and the overall cost in fine mapping experiments.

4.
DNA Pooling: a tool for large-scale association studies   Cited by: 1 (self-citations: 0, citations by others: 1)
DNA pooling is a practical way to reduce the cost of large-scale association studies to identify susceptibility loci for common diseases. Pooling allows allele frequencies in groups of individuals to be measured using far fewer PCR reactions and genotyping assays than are used when genotyping individuals. Here, we discuss recent developments in quantitative genotyping assays and in the design and analysis of pooling studies. Sophisticated pooling designs are being developed that can take account of hidden population stratification, confounders and inter-loci interactions, and that allow the analysis of haplotypes.

5.
One of the key steps in positional cloning and marker-aided selection is to identify marker(s) tightly linked to the target gene (i.e., fine mapping). Selective genotyping such as selective recombinant genotyping (SRG) is commonly used in fine mapping to save costs. To further decrease genotyping effort and rapidly screen for tightly linked markers, we propose here a combined DNA pooling and SRG strategy. A two-stage pooled genotyping can be used for identifying recombinants between a pair of flanking markers more efficiently, and a joint use of bulked DNA analysis and two-stage pooling can also save cost for genotyping recombinants. The combined DNA pooling and SRG strategy can further be extended to fine mapping for polygenic traits. The numerical results based on hypothetical scenarios and an illustrative application to fine mapping of a mutant gene, called xl(t), in rice suggest that the proposed strategy can remarkably reduce the amount of genotyping compared with the conventional SRG.

6.
Gene-disease association studies based on case-control designs may often be used to identify candidate polymorphisms (markers) conferring disease risk. If a large number of markers are studied, genotyping all markers on all samples is inefficient in resource utilization. Here, we propose an alternative two-stage method to identify disease-susceptibility markers. In the first stage all markers are evaluated on a fraction of the available subjects. The most promising markers are then evaluated on the remaining individuals in Stage 2. This approach can be cost effective since markers unlikely to be associated with the disease can be eliminated in the first stage. Using simulations we show that, when the markers are independent and when they are correlated, the two-stage approach provides a substantial reduction in the total number of marker evaluations for a minimal loss of power. The power of the two-stage approach is evaluated when a single marker is associated with the disease, and in the presence of multiple disease-susceptibility markers. As a general guideline, the simulations over a wide range of parametric configurations indicate that evaluating all the markers on 50% of the individuals in Stage 1 and evaluating the most promising 10% of the markers on the remaining individuals in Stage 2 provides near-optimal power while resulting in a 45% decrease in the total number of marker evaluations.
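The 45% figure in the guideline above can be checked with simple counting. The sketch below is only an illustration: the function name and the cohort and marker counts are hypothetical, and it counts marker evaluations only, saying nothing about power.

```python
def two_stage_evaluations(n_individuals, n_markers, stage1_fraction, marker_fraction):
    """Return (two_stage_total, one_stage_total) numbers of marker evaluations."""
    # Stage 1: every marker is evaluated on a fraction of the individuals.
    stage1 = stage1_fraction * n_individuals * n_markers
    # Stage 2: only the most promising markers are evaluated on the remaining individuals.
    stage2 = (1 - stage1_fraction) * n_individuals * marker_fraction * n_markers
    return stage1 + stage2, n_individuals * n_markers


# Hypothetical study size; the relative reduction does not depend on it.
two_stage, one_stage = two_stage_evaluations(1000, 10000, 0.5, 0.1)
reduction = 1 - two_stage / one_stage
print(f"{reduction:.0%}")  # 45%
```

With 50% of individuals in Stage 1 and 10% of markers carried forward, the two-stage total is 0.5 + 0.5 × 0.1 = 0.55 of the one-stage total, which matches the 45% decrease stated in the abstract.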

7.
Using striped bass (Morone saxatilis) and six multiplexed microsatellite markers, we evaluated procedures for estimating allele frequencies by pooling DNA from multiple individuals, a method suggested as cost-effective relative to individual genotyping. Using moment-based estimators, we estimated allele frequencies in experimental DNA pools and found that the three primary laboratory steps, DNA quantitation and pooling, PCR amplification, and electrophoresis, accounted for 23, 48, and 29%, respectively, of the technical variance of estimates in pools containing DNA from 2-24 individuals. Exact allele-frequency estimates could be made for pools of sizes 2-8, depending on the locus, by using an integer-valued estimator. Larger pools of size 12 and 24 tended to yield biased estimates; however, replicates of these estimates detected allele frequency differences among pools with different allelic compositions. We also derive an unbiased estimator of Hardy-Weinberg disequilibrium coefficients that uses multiple DNA pools and analyze the cost-efficiency of DNA pooling. DNA pooling yields the most potential cost savings when a large number of loci are employed using a large number of individuals, a situation becoming increasingly common as microsatellite loci are developed in increasing numbers of taxa.

8.
A. Darvasi, M. Soller. Genetics, 1994, 138(4): 1365-1373
Selective genotyping is a method to reduce costs in marker-quantitative trait locus (QTL) linkage determination by genotyping only those individuals with extreme, and hence most informative, quantitative trait values. The DNA pooling strategy (termed "selective DNA pooling") takes this one step further by pooling DNA from the selected individuals at each of the two phenotypic extremes, and basing the test for linkage on marker allele frequencies as estimated from the pooled samples only. This can reduce genotyping costs of marker-QTL linkage determination by up to two orders of magnitude. Theoretical analysis of selective DNA pooling shows that for experiments involving backcross, F2 and half-sib designs, the power of selective DNA pooling for detecting genes with large effect can be the same as that obtained by individual selective genotyping. Power for detecting genes with small effect, however, was found to decrease strongly with increase in the technical error of estimating allele frequencies in the pooled samples. The effect of technical error, however, can be markedly reduced by replication of technical procedures. It is also shown that a proportion selected of 0.1 at each tail will be appropriate for a wide range of experimental conditions.
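The selection step described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names and toy data are hypothetical, and the pool allele frequency is computed from known genotypes, whereas a real pooled estimate comes from a laboratory measurement subject to technical error.

```python
def select_tails(phenotypes, proportion=0.1):
    """Return (low_ids, high_ids): indices of the individuals in each phenotypic tail."""
    n_tail = max(1, int(round(proportion * len(phenotypes))))
    # Rank individuals from lowest to highest trait value.
    ranked = sorted(range(len(phenotypes)), key=lambda i: phenotypes[i])
    return ranked[:n_tail], ranked[-n_tail:]


def pool_allele_frequency(allele_counts, ids):
    """Idealized frequency of allele A in a pool; counts are 0, 1 or 2 copies per individual."""
    return sum(allele_counts[i] for i in ids) / (2 * len(ids))


# Toy data (assumed): 20 individuals, allele A carried only by the upper half.
phenotypes = list(range(20))
allele_counts = [0] * 10 + [2] * 10
low, high = select_tails(phenotypes, proportion=0.1)
difference = pool_allele_frequency(allele_counts, high) - pool_allele_frequency(allele_counts, low)
print(low, high, difference)  # [0, 1] [18, 19] 1.0
```

The test for linkage would then be based on the difference in estimated allele frequency between the high and low pools; the abstract does not specify the test statistic, so none is shown here.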

9.
Genome-wide genotyping of a cohort using pools rather than individual samples has long been proposed as a cost-saving alternative for performing genome-wide association (GWA) studies. However, successful disease gene mapping using pooled genotyping has thus far been limited to detecting common variants with large effect sizes, which tend not to exist for many complex common diseases or traits. Therefore, for DNA pooling to be a viable strategy for conducting GWA studies, it is important to determine whether commonly used genome-wide SNP array platforms such as the Affymetrix 6.0 array can reliably detect common variants of small effect sizes using pooled DNA. Taking obesity and age at menarche as examples of human complex traits, we assessed the feasibility of genome-wide genotyping of pooled DNA as a single-stage design for phenotype association. By individually genotyping the top associations identified by pooling, we obtained a 14- to 16-fold enrichment of SNPs nominally associated with the phenotype, but we likely missed the top true associations. In addition, we assessed whether genotyping pooled DNA can serve as an inexpensive screen as the second stage of a multi-stage design with a large number of samples by comparing the most cost-effective 3-stage designs with 80% power to detect common variants with genotypic relative risk of 1.1, with and without pooling. Given the current state of the specific technology we employed and the associated genotyping costs, we showed through simulation that a design involving pooling would be 1.07 times more expensive than a design without pooling. Thus, while a significant amount of information exists within the data from pooled DNA, our analysis does not support genotyping pooled DNA as a means to efficiently identify common variants contributing small effects to phenotypes of interest.
While our conclusions were based on the specific technology and study design we employed, the approach presented here will be useful for evaluating the utility of other or future genome-wide genotyping platforms in pooled DNA studies.

10.
Case-control association studies often suffer from population stratification bias. A previous triple combination strategy of stratum matching, genomic controlling, and multiple DNA pooling can correct the bias and save genotyping cost. However, the method requires researchers to prepare a multitude of DNA pools, more than 30 case-control pooling sets in total (polyset). In this paper, the authors propose a permutation test for oligoset DNA pooling studies. Monte-Carlo simulations show that the proposed test has a type I error rate under control and a power comparable to that of individual genotyping. For a researcher on a tight budget, oligoset DNA pooling is a viable option.

11.
A strategy of DNA pooling aimed at identifying markers linked to quantitative trait loci (QTLs), ‘Sequential Bulked Typing’ (SBT), is presented. The method proposed consists in pooling DNA from consecutive pairs of individuals ranked phenotypically, i.e., pools are formed with individuals ranked (1st, 2nd), (3rd, 4th), …, ((N-1)th, Nth). The N/2 pools are subsequently amplified using the polymerase chain reaction (PCR). If the whole population is typed, the number of PCRs per marker is halved with respect to individual typing (IT). But if this strategy is combined with selective genotyping of extreme individuals, savings can be further increased. Two extreme cases are considered: in the first one (SBT0), it is assumed that only presence or absence of a given allele can be ascertained in a pool; in the second one (SBT1), it is further assumed that differences between allele band intensities can be distinguished. The theory to estimate by maximum likelihood the QTL effect and its position with respect to flanking markers is presented. The behaviour of IT and SBT was studied using stochastic computer simulation in backcross and F2 populations. Three percentages of subpattern distinction (0, 50 and 100%), two population sizes (n=1200 and 600) and two QTL effects (a=0.1 and 0.25 standard deviations) were considered. SBT1 had the same power as individual genotyping at half the genotyping costs in all situations studied. Accuracy of QTL location is not increased with a dense number of markers, as opposed to individual typing. As a result, DNA pooling is not useful for accurate location of the QTL but rather to pick up genome regions containing QTLs of at least moderate effect. The theory developed provides the general theoretical framework to deal with any DNA pooling strategy aimed at detecting QTLs. Received: 15 September 1997 / Accepted: 6 October 1997
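The pool construction described above, pairing individuals that are adjacent in phenotypic rank, can be sketched as follows. The function name and trait values are hypothetical, and ranking from highest to lowest is an assumption, since the abstract does not state the direction.

```python
def sequential_pairs(phenotypes):
    """Pool consecutive pairs of individuals after ranking them by phenotype."""
    if len(phenotypes) % 2:
        raise ValueError("an even number of individuals is assumed")
    # Rank from highest to lowest trait value (direction is an assumption).
    ranked = sorted(range(len(phenotypes)), key=lambda i: phenotypes[i], reverse=True)
    # Pools are (1st, 2nd), (3rd, 4th), ..., ((N-1)th, Nth) in rank order.
    return [(ranked[k], ranked[k + 1]) for k in range(0, len(ranked), 2)]


# Six hypothetical trait values; each pool lists the indices of its two members.
pools = sequential_pairs([3.1, 0.4, 2.2, 5.0, 1.7, 4.8])
print(pools)  # [(3, 5), (0, 2), (4, 1)]
```

Six individuals give three pools, so one PCR per pool halves the number of reactions per marker relative to individual typing, as the abstract states.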

12.
"Selective DNA pooling" accomplishes quantitative trait locus (QTL) mapping through densitometric estimates of marker allele frequencies in pooled DNA samples of phenotypically extreme individuals. With poly(TG) microsatellites, such estimates are confounded by "shadow" ("stutter") bands. A correction procedure was developed on the basis of an observed linear regression between shadow band intensity and allele TG repeat number. Using this procedure, a selective DNA pooling study with respect to milk protein percentage was implemented in Israel-Holstein dairy cattle. Pools were prepared from milk samples of high and low daughters of each of seven sires and genotyped with respect to 11 markers. Highly significant associations with milk protein percentage were found for 5 of the markers; 4 of these markers confirmed previous reports. Selective DNA pooling accessed 80.6 and 48.3%, respectively, of the information that would have been available through individual selective genotyping or total population genotyping. In effect, the statistical power of 45,600 individual genotypings was obtained from 328 pool genotypings. This methodology can make genome-wide mapping of QTL accessible to moderately sized breeding organizations.

13.
There is an increasing interest in the use of two-stage case-control studies to reduce genotyping costs in the search for genes underlying common disorders. Instead of analyzing the data from the second stage separately, a more powerful test can be performed by combining the data from both stages. However, standard tests cannot be used because only the markers that are significant in the first stage are selected for the second stage and the test statistics at both stages are dependent because they partly involve the same data. Theoretical approximations are not available for commonly used test statistics and in this specific context simulations can be problematic because of the computational burden. We therefore derived a cost-effective, that is, accurate but fast in terms of central processing unit (CPU) time, approximation for the distribution of Pearson's statistic on 2 × m contingency tables in a two-stage design with combined data. We included this approximation in an iterative method for designing optimal two-stage studies. Simulations supported the accuracy of our approximation. Numerical results confirmed that the use of two-stage designs reduces the genotyping burden substantially. Compared to not combining data, combining the data decreases the required sample sizes on average by 15% and the genotyping burden by 5%.
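For reference, Pearson's statistic on a 2 × m contingency table can be computed as below. This is the standard single-stage statistic only; it does not implement the two-stage approximation to its distribution that the abstract describes. The function name and the example counts are hypothetical.

```python
def pearson_chi_square(table):
    """Pearson's statistic for a contingency table given as a list of rows.

    Assumes every row total and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical 2 x 3 table: cases and controls by genotype class.
table = [[30, 50, 20],
         [20, 50, 30]]
stat = pearson_chi_square(table)
print(stat)  # 4.0
```

In a single-stage analysis this value would be referred to a chi-square distribution with (2 - 1)(m - 1) degrees of freedom, here 2. As the abstract explains, that reference distribution is not valid when markers are selected on first-stage results and the stages share data.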

14.
MOTIVATION: Two-stage pilot and integrated designs are powerful tools for investigating large numbers of hypotheses. Asymptotically, optimal two-stage designs controlling the familywise error or false discovery rate are considered when costs and effect sizes per measurement differ between stages and total costs are constrained. RESULTS: Depending on the cost and effect size ratios between the measurements, it is generally more powerful to apply two-stage procedures using one measurement method at both stages. For the practically relevant case that the same method is applied at both stages but designing the second-stage measurements raises extra costs, two-stage designs are more powerful than the single-stage design even for large cost ratios. The power of the optimal pilot and integrated two-stage designs is generally similar; however, the integrated approach is less sensitive even to severe design misspecifications in the planning phase. AVAILABILITY: R-programs (R, 2005) to calculate asymptotically optimal designs are available on: http://statistics.msi.meduniwien.ac.at/index.php?page=ao2stage

15.
We develop expressions for the power to detect associations between parental genotypes and offspring phenotypes for quantitative traits. Three different “indirect” experimental designs are considered: full-sib, half-sib, and full-sib–half-sib families. We compare the power of these designs to detect genotype–phenotype associations relative to the common, “direct,” approach of genotyping and phenotyping the same individuals. When heritability is low, the indirect designs can outperform the direct method. However, the extra power comes at a cost due to an increased phenotyping effort. By developing expressions for optimal experimental designs given the cost of phenotyping relative to genotyping, we show how the extra costs associated with phenotyping a large number of individuals will influence experimental design decisions. Our results suggest that indirect association studies can be a powerful means of detecting allelic associations in outbred populations of species for which genotyping and phenotyping the same individuals is impractical and for life history and behavioral traits that are heavily influenced by environmental variance and therefore best measured on groups of individuals. Indirect association studies are likely to be favored on purely economic grounds, however, only when phenotyping is substantially less expensive than genotyping. A web-based application implementing our expressions has been developed to aid in the design of indirect association studies.

16.
Investigation on QTL-marker linkage usually requires a great number of observed recombinations, inferred from combined analysis of phenotypes and genotypes. To avoid costly individual genotyping, inferences on QTL position and effects can instead make use of marker allele frequencies. DNA pooling of selected samples makes allele frequency estimation feasible for studies involving large sample sizes. Linkage studies in outbred populations have traditionally exploited half-sib family designs; within the animal production context, half-sibships provide large families that are highly suitable for DNA pooling. Estimators for QTL position and effect have been proposed that make use of information from flanking markers. We present formulas derived by the delta method for the asymptotic variance of these estimators.

17.
As an approach to combining the phase II dose finding trial and phase III pivotal trials, we propose a two-stage adaptive design that selects the best among several treatments in the first stage and tests significance of the selected treatment in the second stage. The approach controls the type I error defined as the probability of selecting a treatment and claiming its significance when the selected treatment is no different from placebo, as considered in Bischoff and Miller (2005). Our approach uses the conditional error function and allows determining the conditional type I error function for the second stage based on information observed at the first stage in a similar way to that for an ordinary adaptive design without treatment selection. We examine properties such as expected sample size and stage-2 power of this design with a given type I error and a maximum stage-2 sample size under different hypothesis configurations. We also propose a method to find the optimal conditional error function of a simple parametric form to improve the performance of the design and have derived optimal designs under some hypothesis configurations. Application of this approach is illustrated by a hypothetical example.

18.
Association studies using genome scans to identify quantitative trait loci for multifactorial disorders, with anything approaching reasonable power, have been compromised by the need for a very dense array of genetic markers and large numbers of affected individuals. These requirements impose enormous burdens on the genotyping capacity for most laboratories. DNA pooling has been proposed as a possible approach to reduce genotyping costs and effort. We report on the application of the SNaPIT™ technology to evaluate allele frequencies in pooled DNA samples and conclude that it offers a cost-effective, efficient and accurate estimator and provides several advantages over competing technologies in this regard.

19.
Association mapping studies aim to determine the genetic basis of a trait. A common experimental design uses a sample of unrelated individuals classified into 2 groups, for example cases and controls. If the trait has a complex genetic basis, consisting of many quantitative trait loci (QTLs), each group needs to be large. Each group must be genotyped at marker loci covering the region of interest; for dense coverage of a large candidate region, or a whole-genome scan, the number of markers will be very large. The total amount of genotyping required for such a study is formidable. DNA pooling, a technique that saves laboratory effort, could reduce the amount of genotyping required, but the data generated are less informative and require novel methods for efficient analysis. In this paper, a Bayesian statistical analysis of the classic model of McPeek and Strahs is proposed. In contrast to previous work on this model, I assume that data are collected using DNA pooling, so individual genotypes are not directly observed, and also account for experimental errors. A complete analysis can be performed using analytical integration, a propagation algorithm for a hidden Markov model, and quadrature. The method developed here is both statistically and computationally efficient. It allows simultaneous detection and mapping of a QTL, in a large-scale association mapping study, using data from pooled DNA. The method is shown to perform well on data sets simulated under a realistic coalescent-with-recombination model, and is shown to outperform classical single-point methods. The method is illustrated on data consisting of 27 markers in an 880-kb region around the CYP2D6 gene.

20.
The study of gene functions requires a high-quality DNA library, which is obtained through a large amount of testing and screening. Pooling design is a very helpful tool for reducing the number of tests for DNA library screening. In this paper, we present new one- and two-stage pooling designs, together with new probabilistic pooling designs. The approach in this paper works for both error-free and error-tolerant scenarios.
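To illustrate how pooling reduces the number of tests in a screen, the sketch below uses the classical Dorfman-style two-stage scheme: test each pool once, then test the members of positive pools individually. It is not one of the new designs the abstract presents, it assumes error-free tests, and the function name and the 20-clone example are hypothetical.

```python
def two_stage_screen(is_positive, pool_size):
    """Return (positives, n_tests) for a Dorfman-style two-stage screen.

    is_positive[i] is the true status of clone i; tests are assumed error-free.
    """
    n_tests = 0
    positives = []
    for start in range(0, len(is_positive), pool_size):
        pool = range(start, min(start + pool_size, len(is_positive)))
        n_tests += 1  # stage 1: a single test on the pooled sample
        if any(is_positive[i] for i in pool):
            # Stage 2: test each member of a positive pool individually.
            for i in pool:
                n_tests += 1
                if is_positive[i]:
                    positives.append(i)
    return positives, n_tests


# Hypothetical library of 20 clones with a single positive clone.
clones = [False] * 20
clones[7] = True
found, tests = two_stage_screen(clones, pool_size=5)
print(found, tests)  # [7] 9
```

Here 9 tests replace the 20 needed to test every clone individually. The saving shrinks as positives become more common, which is one reason the choice of pool size and design matters.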
