Similar Articles
20 similar articles retrieved.
1.
Two-stage designs for gene-disease association studies
The goal of this article is to describe a two-stage design that maximizes the power to detect gene-disease associations when the principal design constraint is the total cost, represented by the total number of gene evaluations rather than the total number of individuals. In the first stage, all genes of interest are evaluated on a subset of individuals. The most promising genes are then evaluated on additional subjects in the second stage. This eliminates the waste of resources on genes unlikely, based on the first-stage results, to be associated with disease. We consider the case where the genes are correlated and the case where the genes are independent. Simulation results show that, as a general guideline when the genes are independent or the correlation is small, spending 75% of the resources in stage 1 to screen all the markers and evaluating the most promising 10% of the markers with the remaining resources provides near-optimal power for a broad range of parametric configurations. This translates to screening all the markers on approximately one quarter of the required sample size in stage 1.
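As a rough illustration of the kind of trade-off the article studies, the sketch below simulates a two-stage screen under a fixed budget of genotype evaluations: 75% of the budget screens all markers on a first subset of subjects, and the top 10% of markers are re-tested, pooling both stages' data, with the rest of the budget. The marker count, effect sizes, per-marker z-test, and Bonferroni threshold are illustrative assumptions, not the article's model.

# Hypothetical sketch of a two-stage marker screen under a fixed budget of
# genotype evaluations (markers x subjects per group). Values are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

M = 1000             # markers screened in stage 1
n_true = 10          # markers truly associated (assumed)
p0, p1 = 0.30, 0.36  # control / case allele frequencies at true markers (assumed)
budget = 500 * M     # total genotype evaluations available (per group, for simplicity)
f_stage1 = 0.75      # fraction of budget spent in stage 1
f_markers = 0.10     # fraction of markers carried to stage 2
alpha = 0.05

n1 = int(f_stage1 * budget / M)                       # subjects per group, stage 1
n2 = int((1 - f_stage1) * budget / (f_markers * M))   # extra subjects per group, stage 2

def z_stat(k_a, n_a, k_b, n_b):
    """Two-sample z statistic for an allele-frequency difference."""
    pa, pb = k_a / n_a, k_b / n_b
    pbar = (k_a + k_b) / (n_a + n_b)
    se = np.sqrt(pbar * (1 - pbar) * (1 / n_a + 1 / n_b))
    return (pa - pb) / se

truth = np.zeros(M, bool)
truth[:n_true] = True
p_case = np.where(truth, p1, p0)

# Stage 1: screen every marker on n1 cases and n1 controls.
k_case1 = rng.binomial(n1, p_case)
k_ctrl1 = rng.binomial(n1, p0, size=M)
z1 = z_stat(k_case1, n1, k_ctrl1, n1)

# Carry the top 10% of markers (largest |z|) into stage 2.
keep = np.argsort(-np.abs(z1))[: int(f_markers * M)]

# Stage 2: genotype n2 additional subjects per group, test on the pooled data.
k_case2 = rng.binomial(n2, p_case[keep])
k_ctrl2 = rng.binomial(n2, p0, size=keep.size)
z12 = z_stat(k_case1[keep] + k_case2, n1 + n2, k_ctrl1[keep] + k_ctrl2, n1 + n2)
sig = 2 * stats.norm.sf(np.abs(z12)) < alpha / M   # Bonferroni over all M tests

print(f"stage-1 n per group: {n1}, stage-2 extra n per group: {n2}")
print(f"true markers detected: {np.sum(sig & truth[keep])} / {n_true}")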

2.
Zhao Y, Wang S. Human Heredity 2009, 67(1): 46-56.
Study cost remains the major limiting factor for genome-wide association studies because a large number of SNPs must be genotyped for a large number of subjects. Both DNA pooling strategies and two-stage designs have been proposed to reduce genotyping costs. In this study, we propose a cost-effective two-stage approach with a DNA pooling strategy. During stage I, all markers are evaluated on a subset of individuals using DNA pooling. The most promising set of markers is then evaluated with individual genotyping for all individuals during stage II. The goal is to determine the optimal parameters (π_p^sample, the proportion of samples used during stage I with DNA pooling, and π_p^marker, the proportion of markers evaluated during stage II with individual genotyping) that minimize the cost of a two-stage DNA pooling design while maintaining a desired overall significance level and achieving a level of power similar to that of a one-stage individual genotyping design. We considered the effects of three factors on optimal two-stage DNA pooling designs. Our results suggest that, under most scenarios considered, the optimal two-stage DNA pooling design may be much more cost-effective than the optimal two-stage individual genotyping design, which uses individual genotyping during both stages.
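To make the cost trade-off concrete, here is a toy cost model in which stage I pays per pool per marker and stage II pays per individual genotype at the retained markers. The unit costs, pool size, and parameter grid are assumptions for illustration only, not the cost function optimized by the authors.

# Toy cost model for a two-stage design with DNA pooling in stage I and
# individual genotyping in stage II. Unit costs and pool size are assumptions.
import numpy as np

N = 2000          # total subjects (cases + controls)
M = 100_000       # markers
pool_size = 50    # individuals per DNA pool (assumed)
c_pool = 1.0      # cost of assaying one pool at one marker (assumed unit)
c_ind = 1.0       # cost of one individual genotype (assumed unit)

def two_stage_cost(pi_sample, pi_marker):
    """Total cost: pooled assays on all M markers for a sample fraction,
    then individual genotyping of everyone at the retained markers."""
    n_pools = np.ceil(pi_sample * N / pool_size)
    stage1 = c_pool * n_pools * M
    stage2 = c_ind * N * pi_marker * M
    return stage1 + stage2

one_stage = c_ind * N * M   # individual genotyping of everyone at every marker
for pi_s in (0.25, 0.5, 1.0):
    for pi_m in (0.01, 0.05, 0.10):
        c = two_stage_cost(pi_s, pi_m)
        print(f"pi_sample={pi_s:.2f} pi_marker={pi_m:.2f} "
              f"cost={c:,.0f} ({c / one_stage:.1%} of one-stage cost)")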

3.
The affected-pedigree-member (APM) method of linkage analysis is a nonparametric statistic that tests for nonrandom cosegregation of disease and marker loci. The APM statistic is based on the observation that if a marker locus is near a disease-susceptibility locus, then affected individuals within a family should be more similar at the marker locus than is expected by chance. The APM statistic measures marker similarity in terms of identity by state (IBS) of marker alleles; that is, two alleles are IBS if they are the same, regardless of their ancestral origin. Since the APM statistic measures increased marker similarity, it makes no assumptions about how the disease is inherited; this can be an advantage when dealing with complex diseases for which the mode of inheritance is difficult to determine. We investigate here the power of the APM statistic to detect linkage in the context of a genomewide search. In such a search, the APM statistic is evaluated at a grid of markers, and regions with high APM statistics are then investigated more thoroughly by typing additional markers. Using simulated data, we investigate various search strategies and recommend an optimal strategy that maximizes the power to detect linkage while minimizing the false-positive rate and the number of markers. We determine an optimal series of three increasing cut-points and an independent criterion for significance.
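A minimal sketch of the identity-by-state idea underlying the APM statistic: count alleles shared by state between pairs of affected relatives at one marker and average over pairs. The genotypes are hypothetical, and the allele-frequency weighting used in the actual APM statistic is omitted.

# Minimal IBS sharing count among affected individuals at one marker.
# This only illustrates the identity-by-state idea; the APM statistic itself
# additionally weights sharing by a function of the marker allele frequencies.
from itertools import combinations

def ibs(g1, g2):
    """Number of alleles shared by state between two genotypes (0, 1, or 2)."""
    remaining = list(g2)
    shared = 0
    for allele in g1:
        if allele in remaining:
            remaining.remove(allele)   # each allele may be matched only once
            shared += 1
    return shared

# Marker genotypes of affected members of one family (hypothetical data).
affected = {"aunt": ("A1", "A2"), "niece": ("A1", "A3"), "nephew": ("A1", "A1")}

pairs = list(combinations(affected, 2))
scores = [ibs(affected[i], affected[j]) for i, j in pairs]
for (i, j), s in zip(pairs, scores):
    print(f"{i}-{j}: {s} allele(s) IBS")
print("mean pairwise IBS sharing:", sum(scores) / len(scores))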

4.
Genomewide association studies (GWAS) are being conducted to unravel the genetic etiology of complex diseases, in which complex epistasis may play an important role. A one-stage method, in which interactions are tested using all samples at once, may be computationally problematic, may lose power as the number of markers tested increases, and may not be cost-efficient. A common two-stage method, which uses all samples in both stages, may be a reasonable and powerful approach for detecting interacting genes. In this study, we introduce an alternative two-stage method, in which promising markers are selected using a proportion of the samples in the first stage and interactions are then tested using the remaining samples in the second stage; we call this the mixed two-stage method. We then investigate the power of the one-stage method and the mixed two-stage method to detect interacting disease loci for a range of two-locus epistatic models in a case-control study design. Our results suggest that the mixed two-stage method may be more powerful than the one-stage method if about 30% of the samples are used for single-locus tests in the first stage and no more than 1% of the markers are carried forward to the interaction tests. In addition, we compare the two two-stage methods and find that the mixed method loses power relative to the common two-stage method because it uses only part of the sample in each stage.

5.
Two-stage designs in case-control association analysis
Zuo Y, Zou G, Zhao H. Genetics 2006, 173(3): 1747-1760.
DNA pooling is a cost-effective approach for collecting information on marker allele frequency in genetic studies. It is often suggested as a screening tool to identify a subset of candidate markers, from a very large number of markers, to be followed up by more accurate and informative individual genotyping. In this article, we investigate several statistical properties and design issues related to this two-stage design, including the selection of the candidate markers for second-stage analysis, the statistical power of this design, and the probability that truly disease-associated markers are ranked among the top after second-stage analysis. We have derived analytical results on the proportion of markers to be selected for second-stage analysis. For example, to detect disease-associated markers with an allele frequency difference of 0.05 between the cases and controls through an initial sample of 1000 cases and 1000 controls, our results suggest that when the measurement errors are small (0.005), approximately 3% of the markers should be selected. For the statistical power to identify disease-associated markers, we find that the measurement errors associated with DNA pooling have little effect on power. This is in contrast to the one-stage pooling scheme, where measurement errors may have a large effect on statistical power. As for the probability that the disease-associated markers are ranked among the top in the second stage, we show that there is a high probability that at least one disease-associated marker is ranked among the top when the allele frequency differences between the cases and controls are not less than 0.05 for reasonably large sample sizes, even when the errors associated with DNA pooling in the first stage are not small. Therefore, the two-stage design with DNA pooling as a screening tool offers an efficient strategy in genomewide association studies, even when the measurement errors associated with DNA pooling are nonnegligible. For any disease model, we find that all the statistical results essentially depend on the population allele frequency and the allele frequency differences between the cases and controls at the disease-associated markers. The general conclusions hold whether the second stage uses an entirely independent sample or includes both the samples used in the first stage and an independent set of samples.
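The stage-1 screening step can be sketched as follows: estimate case and control allele frequencies from pooled DNA with additive measurement error, rank markers by the absolute frequency difference, and carry the top few percent to individual genotyping. The error model, allele-frequency distribution, and selection fraction are illustrative choices that echo, but do not reproduce, the numbers above.

# Sketch of stage-1 selection with DNA pooling: rank markers by the estimated
# case-control allele-frequency difference from pooled measurements and keep
# the top fraction. The normal measurement-error model is an assumption.
import numpy as np

rng = np.random.default_rng(7)

M, n_true = 10_000, 20
n_case = n_ctrl = 1000
p0 = rng.uniform(0.2, 0.8, size=M)       # population allele frequencies
p1 = p0.copy()
p1[:n_true] += 0.05                      # true markers: frequency difference of 0.05
meas_err = 0.005                         # SD of pooling measurement error
keep_frac = 0.03                         # proportion selected for stage 2

# Pooled frequency estimate = sampling variation + pooling measurement error.
est_case = rng.binomial(2 * n_case, p1) / (2 * n_case) + rng.normal(0, meas_err, M)
est_ctrl = rng.binomial(2 * n_ctrl, p0) / (2 * n_ctrl) + rng.normal(0, meas_err, M)

diff = np.abs(est_case - est_ctrl)
selected = np.argsort(-diff)[: int(keep_frac * M)]
hits = np.sum(selected < n_true)
print(f"selected {selected.size} markers; {hits}/{n_true} true markers retained")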

6.
Multipoint linkage analysis is commonly used to evaluate linkage of a disease to multiple markers in a small region. Multipoint analysis is particularly powerful when the IBD relations of family members at the trait locus are ambiguous. The increased power arises because, unlike single-marker analyses, multipoint analysis uses haplotype information from several markers to infer the IBD relations. We wish to temper this advantage with a cautionary note: multipoint analysis is sensitive to power loss due to misspecification of intermarker distances. Such misspecification is especially problematic when dealing with closely spaced markers. We present computer simulations comparing the power of single-point and multipoint analyses, both when IBD relations are ambiguous and when the intermarker distances are misspecified. We conclude that when evaluating markers in a small region to confirm or refute previous findings, a situation in which p values of modest statistical significance are important, single-marker analyses may provide more reliable measures of the strength of support for linkage than multipoint statistics.

7.
Complex traits, by definition, are the phenotypic outcome of multiple interacting genes. The traditional analysis of association studies of complex traits tests one locus at a time, but a better approach is to analyze all markers simultaneously. We previously proposed a two-stage approach that first selects the influential markers and then models the main and interaction effects of these markers. Here we introduce alternative approaches to marker selection and discuss issues regarding analytical tools for disease gene mapping, marker selection, and statistical modeling.

8.
Before new markers are thoroughly characterized, they are usually screened for high polymorphism on the basis of a small panel of individuals. Four commonly used screening strategies are compared in terms of their power to correctly classify a marker as having heterozygosity of 70% or higher. A small number of typed individuals (say, 10) is shown to provide good discrimination between low- and high-heterozygosity markers when the markers have a small number of alleles. Characterizing markers in more detail requires larger sample sizes (at least 80-100 individuals) if there is to be a high probability of detecting most or all alleles. For linkage analyses involving highly polymorphic markers, the practice of arbitrarily assuming equal gene frequencies can cause serious problems. In the presence of untyped individuals, when gene frequencies are unequal but are assumed equal in the analysis, recombination-fraction estimates tend to be badly biased, leading to strong false-positive evidence for linkage.
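A quick way to see why a panel of about ten typed individuals can separate low- from high-heterozygosity markers is to simulate small panels for markers with known allele frequencies and check how often observed heterozygosity clears a 70% cut-off. The allele-frequency configurations and the simple cut-off rule are assumptions for illustration.

# Sketch: how often does a panel of 10 typed individuals classify a marker as
# "highly polymorphic" (observed heterozygosity >= 0.7)? Allele-frequency
# configurations and the cut-off rule are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(3)

def simulate(freqs, n_panel=10, reps=10_000, cutoff=0.7):
    freqs = np.asarray(freqs)
    expected_het = 1 - np.sum(freqs ** 2)
    hits = 0
    for _ in range(reps):
        genos = rng.choice(len(freqs), size=(n_panel, 2), p=freqs)
        obs_het = np.mean(genos[:, 0] != genos[:, 1])
        hits += obs_het >= cutoff
    return expected_het, hits / reps

for label, f in [("4 equifrequent alleles", [0.25] * 4),
                 ("2 equifrequent alleles", [0.5, 0.5]),
                 ("6 alleles, one common", [0.5, 0.1, 0.1, 0.1, 0.1, 0.1])]:
    h, p = simulate(f)
    print(f"{label}: expected het = {h:.2f}, P(classified as >= 0.7) = {p:.2f}")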

9.
The design and feasibility of whole-genome association studies are critically dependent on the extent of linkage disequilibrium (LD) between markers. Although there has been extensive theoretical discussion of this, few empirical data exist. The authors have determined the extent of LD among 38 biallelic markers with minor allele frequencies >0.1, since these are most comparable to the common disease-susceptibility polymorphisms that association studies aim to detect. The markers come from three chromosomal regions (1,335 kb on chromosome 13q12-13, 380 kb on chromosome 19q13.2, and 120 kb on chromosome 22q13.3) that have been extensively mapped. These markers were examined in approximately 1,600 individuals from four populations, all of European origin but with different demographic histories: Afrikaners, Ashkenazim, Finns, and East Anglian British. There are few differences, either in allele frequencies or in LD, among the populations studied. A similar inverse relationship was found between LD and distance in each genomic region and in each population. Mean D' is 0.68 for marker pairs <5 kb apart and 0.24 for pairs separated by 10-20 kb, a level of LD not different from that seen in unlinked marker pairs separated by >500 kb. However, only 50% of marker pairs at distances <5 kb display sufficient LD (delta > 0.3) to be useful in association studies. The results of the present study, if representative of the whole genome, suggest that a whole-genome scan searching for common disease-susceptibility alleles would require markers spaced 5 kb or less apart.
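For reference, the pairwise measures D and D' can be computed directly from two-locus haplotype and allele frequencies, as in the sketch below; the example frequencies are made up and unrelated to the study's data.

# Compute the pairwise LD measures D and D' from two-locus haplotype
# frequencies of biallelic markers. The example frequencies are hypothetical.
def d_prime(p_ab, p_a, p_b):
    """p_ab: frequency of the A-B haplotype; p_a, p_b: allele frequencies."""
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return d, d / d_max if d_max > 0 else 0.0

# Hypothetical example: alleles A and B each at frequency 0.3,
# haplotype A-B observed at frequency 0.2.
d, dp = d_prime(p_ab=0.20, p_a=0.30, p_b=0.30)
print(f"D = {d:.3f}, D' = {dp:.3f}")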

10.
Association studies in consanguineous populations.
To study the genetic determinants of multifactorial diseases in large panmictic populations, one strategy is to look for an association with markers closely linked to candidate genes. A distribution of marker genotypes that differs between patients and controls may indicate that the candidate gene is involved in the disease. In panmictic populations, the power to detect the role of a candidate gene depends on the gametic disequilibrium with the marker locus. In consanguineous populations, we show that it depends on the inbreeding coefficient F as well. Inbreeding increases the power to detect the role of a recessive or quasi-recessive disease-susceptibility factor, and the gain in power is greater for small values of the gametic disequilibrium. Moreover, even in the absence of gametic disequilibrium, the presence of inbreeding may allow the role of a recessive factor to be detected. Ignoring inbreeding when it exists may lead to falsely rejecting a recessive model if the mode of inheritance is inferred from the distribution of genotypes among patients.
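The mechanism can be seen directly from the genotype frequencies: with inbreeding coefficient F, the frequency of the risk homozygote rises from q^2 to q^2 + Fq(1 - q), which sharpens the case-control genotype contrast under a recessive model. The sketch below uses assumed penetrances and an assumed allele frequency purely for illustration.

# Genotype frequencies at a biallelic candidate locus under inbreeding
# coefficient F, and the expected frequency of the risk homozygote among
# cases for a recessive susceptibility allele. Penetrances are assumptions.
def genotype_freqs(q, F):
    """Frequencies of (aa, Aa, AA) for risk allele a at frequency q."""
    p = 1 - q
    return (q * q + F * p * q, 2 * p * q * (1 - F), p * p + F * p * q)

def freq_aa_in_cases(q, F, f_aa=0.50, f_other=0.05):
    """P(aa | affected) for a recessive model with assumed penetrances."""
    aa, het, AA = genotype_freqs(q, F)
    prevalence = f_aa * aa + f_other * (het + AA)
    return f_aa * aa / prevalence

q = 0.10
for F in (0.0, 0.03, 0.06):
    aa, het, AA = genotype_freqs(q, F)
    print(f"F={F:.2f}: P(aa)={aa:.4f}, P(aa | case)={freq_aa_in_cases(q, F):.3f}")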

11.
Many studies have shown that segregating quantitative trait loci (QTL) can be detected via linkage to genetic markers. The power to detect a QTL effect on the trait mean, for a given number of individuals genotyped for the marker, is increased by selectively genotyping individuals with extreme values of the quantitative trait. Computer simulations were employed to study the effect of various sampling strategies on the statistical power to detect QTL variance effects. If only individuals with extreme phenotypes for the quantitative trait are selected for genotyping, the power to detect a variance effect is lower than with random sampling. If 0.2 of the total number of individuals genotyped are selected from the center of the distribution, the power to detect a variance effect equals that obtained with random selection. Power to detect a variance effect was maximized when 0.2 to 0.5 of the individuals selected for genotyping were drawn from the tails of the distribution and the remainder from the center.
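The comparison of sampling strategies can be reproduced in miniature: simulate a marker whose genotype affects the phenotypic variance but not the mean, genotype a fixed number of individuals drawn either from the tails only or from the tails plus the centre, and test for variance heterogeneity. The variance ratio, sample sizes, and the use of Levene's test are illustrative choices, not the simulation design of the study.

# Toy comparison of sampling strategies for detecting a QTL variance effect:
# genotype the same number of individuals drawn from the phenotype tails only,
# or from the tails plus the centre of the distribution, then test for
# variance heterogeneity between genotype classes with Levene's test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)

def power(frac_centre, n_pop=2000, n_geno=400, sd_ratio=1.4, reps=500, alpha=0.05):
    hits = 0
    for _ in range(reps):
        geno = rng.binomial(1, 0.5, n_pop)                    # two marker classes
        y = rng.normal(0, np.where(geno == 1, sd_ratio, 1.0)) # variance effect only
        order = np.argsort(y)
        n_centre = int(frac_centre * n_geno)
        n_tail = (n_geno - n_centre) // 2
        centre_start = (n_pop - n_centre) // 2
        chosen = np.r_[order[:n_tail], order[-n_tail:],
                       order[centre_start:centre_start + n_centre]]
        groups = [y[chosen][geno[chosen] == g] for g in (0, 1)]
        hits += stats.levene(*groups).pvalue < alpha
    return hits / reps

for frac in (0.0, 0.2, 0.5):
    print(f"fraction of genotyped individuals from the centre = {frac:.1f}: "
          f"power ~= {power(frac):.2f}")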

12.
Technological developments allow increasing numbers of markers to be deployed in case-control studies searching for genetic factors that influence disease susceptibility. However, with vast numbers of markers, true 'hits' may become lost in a sea of false positives. This problem may be particularly acute for infectious diseases, where the control group may contain unexposed individuals with susceptible genotypes. To explore this effect, we used a series of stochastic simulations to model a scenario based loosely on bovine tuberculosis. We find that a candidate gene approach tends to have greater statistical power than studies that use large numbers of single nucleotide polymorphisms (SNPs) in genome-wide association tests, almost regardless of the number of SNPs deployed. Both approaches struggle to detect genetic effects that are weak, or situations in which an appreciable proportion of individuals are unexposed to the disease, when modest sample sizes (250 each of cases and controls) are used, but these issues are largely mitigated if sample sizes can be increased to 2000 or more of each class. We conclude that the power of any genotype-phenotype association test will be improved if the sampling strategy takes account of exposure heterogeneity, although this is not necessarily easy to do.
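A small simulation makes the dilution mechanism explicit: disease can occur only in exposed individuals, so unexposed carriers of the susceptible genotype accumulate in the control group. The allele frequency, penetrances given exposure, sample sizes, and significance threshold below are arbitrary illustrative values, not those of the study.

# Sketch: power of a simple allele-count chi-square test at a susceptibility
# SNP when part of the population was never exposed to the pathogen, so that
# unexposed susceptible individuals end up among the controls.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
RISK_GIVEN_EXPOSURE = np.array([0.05, 0.20, 0.50])   # by risk-allele count (assumed)

def power(unexposed_frac, n, q=0.3, pop=100_000, reps=200, alpha=1e-4):
    hits = 0
    for _ in range(reps):
        alleles = rng.binomial(2, q, pop)
        exposed = rng.random(pop) < 1 - unexposed_frac
        diseased = exposed & (rng.random(pop) < RISK_GIVEN_EXPOSURE[alleles])
        cases = rng.choice(np.flatnonzero(diseased), n, replace=False)
        ctrls = rng.choice(np.flatnonzero(~diseased), n, replace=False)
        a, b = alleles[cases].sum(), alleles[ctrls].sum()
        p = stats.chi2_contingency([[a, 2 * n - a], [b, 2 * n - b]])[1]
        hits += p < alpha
    return hits / reps

for frac in (0.0, 0.5):
    for n in (250, 2000):
        print(f"unexposed fraction={frac:.1f}, n={n}: power ~= {power(frac, n):.2f}")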

13.
Studies of hybridization and introgression, and in particular the identification of admixed individuals in natural populations, benefit from the use of diagnostic genetic markers that reliably differentiate pure species from each other and from their hybrid forms. Such diagnostic markers are often infrequent in the genomes of closely related species, and genomewide data facilitate their discovery. We used whole-genome data from Illumina HiSeq 2000 sequencing of two recently diverged (600,000 years) and hybridizing avian sister species, the Saltmarsh (Ammodramus caudacutus) and Nelson's (A. nelsoni) Sparrow, to develop a suite of diagnostic markers for high-resolution identification of pure and admixed individuals. We compared the microsatellite repeat regions identified in the genomes of the two species and selected a subset of 37 loci that differed between the species in repeat number. We screened these loci on 12 pure individuals of each species and report on the 34 that amplified successfully. From these, we developed a panel of the 12 most diagnostic loci, which we evaluated on 96 individuals, including individuals from allopatric populations and sympatric individuals from the hybrid zone. Using simulations, we evaluated the power of the marker panel for accurate assignment of individuals to their appropriate pure species and hybrid genotypic classes (F1, F2, and backcrosses). The markers proved highly informative for species discrimination and had high accuracy for classifying admixed individuals into their genotypic classes. These markers will aid future investigations of introgressive hybridization in this system and support conservation efforts aimed at monitoring and preserving the pure species. Our approach is transferable to other study systems consisting of closely related and incipient species.

14.
Mapping multiple Quantitative Trait Loci by Bayesian classification
Zhang M, Montooth KL, Wells MT, Clark AG, Zhang D. Genetics 2005, 169(4): 2305-2318.
We developed a classification approach to multiple quantitative trait loci (QTL) mapping built on a Bayesian framework that incorporates the important prior information that most genotyped markers are either not cotransmitted with a QTL or have negligible QTL effects. The genetic effect of each marker is modeled using a three-component mixture prior, with one class for markers having negligible effects and separate classes for markers having positive or negative effects on the trait. The posterior probability of a marker's classification provides a natural statistic for evaluating the credibility of identified QTL. This approach performs well, especially with a large number of markers but a relatively small sample size. A heat map visualization of the results is proposed to allow investigators to be more or less conservative when identifying QTL. We validated the method using a well-characterized data set for barley heading values from the North American Barley Genome Mapping Project. Application of the method to a new data set revealed sex-specific QTL underlying differences in glucose-6-phosphate dehydrogenase enzyme activity between two Drosophila species. A simulation study demonstrated the power of this approach across levels of trait heritability and when marker data were sparse.
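A toy version of the classification idea, for a single marker: assign the marker effect a three-component mixture prior (negligible, positive, negative) and compute the posterior probability of each class from an estimated effect and its standard error. The component means, variances, and prior weights are assumed for illustration and are not the hierarchical prior used in the paper.

# Toy three-component mixture classification of one marker effect.
import numpy as np
from scipy import stats

prior_w = np.array([0.90, 0.05, 0.05])      # negligible, positive, negative
comp_mean = np.array([0.0, 0.5, -0.5])
comp_sd = np.array([0.01, 0.2, 0.2])        # "negligible" = very tight around zero

def class_posterior(beta_hat, se):
    """Posterior class probabilities for one estimated marker effect."""
    marginal_sd = np.sqrt(se ** 2 + comp_sd ** 2)
    lik = stats.norm.pdf(beta_hat, loc=comp_mean, scale=marginal_sd)
    post = prior_w * lik
    return post / post.sum()

for beta_hat in (0.02, 0.35, -0.60):
    p = class_posterior(beta_hat, se=0.15)
    print(f"beta_hat={beta_hat:+.2f}: P(negligible)={p[0]:.2f}, "
          f"P(positive)={p[1]:.2f}, P(negative)={p[2]:.2f}")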

15.
Information on statistical power is critical when planning investigations and evaluating empirical data, but actual power estimates are rarely presented in population genetic studies. We used computer simulations to assess power when testing for genetic differentiation at multiple loci by combining test statistics or P values obtained from four different statistical approaches: Pearson's chi-square, the log-likelihood ratio G-test, Fisher's exact test, and an F_ST-based permutation test. Factors considered in the comparisons include the number of samples, their size, and the number and type of genetic marker loci. It is shown that power for detecting divergence may be substantial for frequently used sample sizes and sets of markers, even at quite low levels of differentiation. The choice of statistical method may be critical, though. For multi-allelic loci such as microsatellites, combining exact P values using Fisher's method is robust and generally provides high resolving power. In contrast, for loci with few alleles (e.g. allozymes and single nucleotide polymorphisms) and when making pairwise sample comparisons, this approach may yield remarkably low power; in such situations chi-square typically represents a better alternative. The G-test without Williams's correction frequently tends to produce an unduly high proportion of false significances, and results from this test should be interpreted with great care. Our results are not confined to population genetic analyses but apply to contingency testing in general.
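For reference, Fisher's method combines L independent per-locus P values by referring -2 Σ ln(P) to a chi-square distribution with 2L degrees of freedom; the per-locus P values in the sketch are made up, and scipy exposes the same computation through scipy.stats.combine_pvalues.

# Fisher's method for combining independent per-locus P values into one test
# of overall differentiation. The example P values are hypothetical.
import numpy as np
from scipy import stats

p_per_locus = np.array([0.04, 0.20, 0.60, 0.015, 0.33, 0.08])

# Manual computation: -2 * sum(ln p) ~ chi-square with 2L degrees of freedom.
chi2 = -2 * np.sum(np.log(p_per_locus))
p_combined = stats.chi2.sf(chi2, df=2 * p_per_locus.size)
print(f"Fisher chi-square = {chi2:.2f}, combined P = {p_combined:.4f}")

# Equivalent call using scipy's built-in implementation.
res = stats.combine_pvalues(p_per_locus, method="fisher")
print(f"scipy combine_pvalues: statistic = {res[0]:.2f}, P = {res[1]:.4f}")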

16.
Pedigree and marker data from a multiple-generation pig selection experiment were analysed to screen for loci affecting quantitative traits (QTL). Pigs from a base population were selected either for low backfat thickness at fixed live weight (L-line) or for high live weight at fixed age (F-line). Selection was based on single-trait own performance, and DNA was available on selected individuals only. Genotypes for three marker loci with known positions on chromosome 4 were available. The transmission/disequilibrium test (TDT) was originally described in human genetics to test for linkage between a genetic marker and a disease-susceptibility locus in the presence of association. Here, we adapt the TDT to test for linkage between a marker and a QTL favoured by selection, and for linkage disequilibrium between them in the base population. The a priori unknown distribution of the test statistic under the null hypothesis of no linkage was obtained via Monte Carlo simulation. Significant TDT statistics were found for the markers AFABP and SW818 in the F-line, indicating the presence of a closely linked QTL affecting growth performance. In the L-line, none of the markers studied showed significance. This study emphasizes the potential of the TDT as a quick and simple approach to screen for QTL in situations where marker genotypes are available on selected individuals. The results suggest that QTL previously identified in crosses of genetically diverse breeds may also segregate in commercial selection lines.
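For reference, the classical biallelic TDT referred to above reduces to a McNemar-type comparison of transmissions and non-transmissions of one allele from heterozygous parents; the counts below are hypothetical, and the adaptation in this study replaces the asymptotic chi-square reference with a Monte Carlo null distribution.

# Classical TDT statistic for a biallelic marker: among heterozygous parents,
# compare transmissions (b) and non-transmissions (c) of allele M1 to affected
# offspring; chi2 = (b - c)^2 / (b + c) with 1 df. Counts are hypothetical.
from scipy import stats

b, c = 62, 38
tdt = (b - c) ** 2 / (b + c)
p_value = stats.chi2.sf(tdt, df=1)
print(f"TDT chi-square = {tdt:.2f}, P = {p_value:.4f}")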

17.
Kim S, Zhang K, Sun F. BMC Genetics 2003, 4(Suppl 1): S9.
Complex diseases are generally caused by intricate interactions of multiple genes and environmental factors. Most available linkage and association methods are designed to identify individual susceptibility genes under a simple disease model, blind to possible gene-gene and gene-environment interactions. We used a set association method, which uses single-nucleotide polymorphism markers, to locate genetic variation responsible for complex diseases in which multiple genes are involved. Here we extended the set association method from bi-allelic to multiallelic markers. In addition, we studied the type I error rates and power for both approaches using simulations based on the coalescent process. Both the bi-allelic set association (BSA) and multiallelic set association (MSA) tests have correct type I error rates. In addition, BSA and MSA can be more powerful than individual marker analysis when multiple genes are involved in a complex disease. We applied the MSA approach to the simulated data sets from Genetic Analysis Workshop 13, using high cholesterol level as the disease phenotype. MSA failed to detect markers in significant linkage disequilibrium with the genes responsible for cholesterol level; this is due to the wide spacing between the markers and the lack of association between the marker loci and the simulated phenotype.

18.
In hybrid studies, the potential for error is high when classifying the genealogical origins of individuals (e.g., parental, F1, F2) based on their genotypic arrays. For codominant markers, previous researchers have considered the probability of misclassification by genotypic inspection and proposed alternative maximum-likelihood approaches to estimating genealogical class frequencies. Recently developed dominant marker systems may significantly increase the number of diagnostic loci available for hybrid studies. I examine probabilities of classification error as a function of the number of dominant loci. As in earlier studies, I assume that only parental and first- and second-generation hybrid crosses between two taxa potentially exist. Thirteen loci with dominant expression from each parental taxon (i.e., 26 total loci) are needed to reduce classification error below 5% for F2 individuals, compared with 13 codominant loci for the same error rate. Using similar numbers of loci from both taxa most efficiently increases the power to characterize all genealogical classes. In contrast, classification of backcrosses to one parental taxon depends wholly on loci from the other taxon. Use of dominant diagnostic markers may increase the power and expand the use of maximum-likelihood methods for evaluating hybrid mixtures.

19.
BACKGROUND: In gene expression or gene association studies with a large number of hypotheses, the number of measurements per marker in a conventional single-stage design is often low due to limited resources. Two-stage designs have been proposed in which promising hypotheses are identified in a first stage and further investigated in the second stage with larger sample sizes. For two types of two-stage designs proposed in the literature (designs in which a fixed number of top-ranked hypotheses are selected, and designs in which the interim selection is based on an FDR threshold), we derive multiple testing procedures that control the False Discovery Rate (FDR) and demonstrate FDR control by simulation. In contrast to earlier approaches, which use only the second-stage data in the hypothesis tests (the pilot approach), the proposed testing procedures are based on the pooled data from both stages (the integrated approach). RESULTS: For both selection rules, the multiple testing procedures control the FDR in the simulation scenarios considered. This holds for the case of independent observations across hypotheses as well as for certain correlation structures. Additionally, we show that in scenarios with small effect sizes the testing procedures based on the pooled data from both stages can give a considerable improvement in power compared to tests based on the second-stage data only. CONCLUSION: The proposed hypothesis tests provide a tool for FDR control in the considered two-stage designs. Comparing the integrated approaches for both selection rules with the corresponding pilot approaches showed an advantage of the integrated approach in many simulation scenarios.
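The pooling idea can be sketched by combining stage-wise z statistics with weights proportional to the square roots of the stage sample sizes and then applying a Benjamini-Hochberg-type step to the selected hypotheses. This illustrates only the integrated test statistic; the FDR-controlling procedures derived in the paper account for the interim selection and are not reproduced here. All parameter values are assumptions.

# Simplified sketch of the "integrated" idea: pool stage-1 and stage-2 data by
# combining stage-wise z statistics, then apply a BH-type step to the selected
# hypotheses. Not the exact procedure derived in the paper.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

m, m_true = 5000, 50
n1, n2 = 20, 80                      # per-hypothesis sample sizes by stage
n_select = 100                       # fixed number of top-ranked hypotheses carried forward
effect = np.zeros(m)
effect[:m_true] = 0.8
fdr_level = 0.05

# Stage-wise z statistics for a one-sample mean test (illustrative model).
z1 = rng.normal(effect * np.sqrt(n1), 1.0)
selected = np.argsort(-z1)[:n_select]
z2 = rng.normal(effect[selected] * np.sqrt(n2), 1.0)

# Integrated statistic: weight stages by the square roots of their sample sizes.
w1, w2 = np.sqrt(n1 / (n1 + n2)), np.sqrt(n2 / (n1 + n2))
z_pooled = w1 * z1[selected] + w2 * z2
p_pooled = stats.norm.sf(z_pooled)               # one-sided P values

# BH-type step: compare ordered P values to i * q / m (a conservative choice here).
order = np.argsort(p_pooled)
thresh = fdr_level * np.arange(1, n_select + 1) / m
passed = p_pooled[order] <= thresh
k = np.max(np.flatnonzero(passed)) + 1 if passed.any() else 0
rejected = selected[order[:k]]
print(f"rejected {rejected.size} hypotheses, "
      f"{np.sum(rejected < m_true)} of them true effects")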

20.
Liu W, Zhao W, Chase GA. Human Heredity 2006, 61(1): 31-44.
OBJECTIVE: Single nucleotide polymorphisms (SNPs) serve as effective markers for localizing disease susceptibility genes, but current genotyping technologies are inadequate for genotyping all available SNP markers in a typical linkage/association study. Much attention has recently been paid to methods for selecting a minimal informative subset of SNPs for identifying haplotypes, but there has been little investigation of the effect of missing or erroneous genotypes on the performance of these SNP selection algorithms and on subsequent association tests using the selected tagging SNPs. The purpose of this study is to explore the effect of missing genotypes and genotyping error on tagging SNP selection and on subsequent single-marker and haplotype association tests using the selected tagging SNPs. METHODS: Through two sets of simulations, we evaluated the performance of three tagging SNP selection programs in the presence of missing or erroneous genotypes: Clayton's diversity-based program htstep, Carlson's linkage disequilibrium (LD) based program ldSelect, and Stram's coefficient-of-determination based program tagsnp.exe. RESULTS: When randomly selected known loci were relabeled as 'missing', the average number of tagging SNPs selected by all three algorithms changed very little, and the power of subsequent single-marker and haplotype association tests using the selected tagging SNPs remained close to the power of these tests in the absence of missing genotypes. When random genotyping errors were introduced, the average number of tagging SNPs selected by all three algorithms increased. In data sets simulated according to the haplotype frequencies in the CYP19 region, Stram's program showed a larger increase than Carlson's and Clayton's programs; in data sets simulated under the coalescent model, Carlson's program showed the largest increase and Clayton's program the smallest. In both sets of simulations, in the presence of genotyping errors the power of the haplotype tests from all three programs decreased quickly, whereas there was little reduction in the power of the single-marker tests. CONCLUSIONS: Missing genotypes do not appear to have much impact on tagging SNP selection or on subsequent single-marker and haplotype association tests. In contrast, genotyping errors can severely affect tagging SNP selection and haplotype tests, but not single-marker tests.
