Similar articles
20 similar articles found (search time: 31 ms)
1.
In genome-wide genetic studies with a large number of markers, balancing the type I error rate and power is a challenging issue. Recently proposed false discovery rate (FDR) approaches are promising solutions to this problem. Using the 100 simulated datasets of a genome-wide marker map spaced about 3 cM and phenotypes from the Genetic Analysis Workshop 14, we studied the type I error rate and power of Storey's FDR approach, and compared it to the traditional Bonferroni procedure. We confirmed that Storey's FDR approach provided strong control of the FDR. We found that Storey's FDR approach provided only weak control of the family-wise error rate (FWER). For these simulated datasets, Storey's FDR approach had only slightly higher power than the Bonferroni procedure. In conclusion, Storey's FDR approach is more powerful than the Bonferroni procedure if strong control of FDR or weak control of FWER is desired. Storey's FDR approach has little power advantage over the Bonferroni procedure if there is low linkage disequilibrium among the markers. Further evaluation of the type I error rate and power of the FDR approaches for higher linkage disequilibrium and for haplotype analyses is warranted.

2.
MOTIVATION: Multiple hypothesis testing is a common problem in genome research, particularly in microarray experiments and genomewide association studies. Failure to account for the effects of multiple comparisons would result in an abundance of false positive results. The Bonferroni correction and Holm's step-down procedure are overly conservative, whereas the permutation test is time-consuming and is restricted to simple problems. RESULTS: We developed an efficient Monte Carlo approach to approximating the joint distribution of the test statistics along the genome. We then used the Monte Carlo distribution to evaluate the commonly used criteria for error control, such as familywise error rates and positive false discovery rates. This approach is applicable to any data structures and test statistics. Applications to simulated and real data demonstrate that the proposed approach provides accurate error control, and can be substantially more powerful than the Bonferroni and Holm methods, especially when the test statistics are highly correlated.

3.
Sha Q, Zhang X, Zhu X, Zhang S. Human Heredity, 2006, 62(2): 55-63
Admixture mapping, using unrelated individuals from the admixture populations that result from recent mating between members of each parental population, is an efficient approach to localize disease-causing variants that differ in frequency between two or more historically separated populations. Recently, several methods have been proposed to test linkage between a susceptibility gene and a disease locus by using admixture-generated linkage disequilibrium (LD) for each of the genotyped markers. In a genome scan, admixture mapping usually tests 2,000 to 3,000 markers across the genome. Currently, either a very conservative Sidak (or Bonferroni) correction or a very time-consuming simulation-based method is used to correct for the multiple tests and evaluate the overall p value. In this report, we propose a computationally efficient analytical approach for correction of the multiple tests and for calculating the overall p value for an admixture genome scan. Except for the Sidak (or Bonferroni) correction, our proposed method is the first analytical approach for correction of the multiple tests and for calculating the overall p value for a genome scan. Our simulation studies show that the proposed method gives correct overall type I error rates for genome scans in all cases, and is much more computationally efficient than simulation-based methods.
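The two baseline corrections named in this abstract can be sketched as follows. This is a minimal illustration and not the analytical method the authors propose: the function names and the example numbers are hypothetical, and both formulas ignore the correlation between neighbouring markers, which is why they tend to be conservative in a genome scan.

```python
import math

def bonferroni_overall_p(p_min, n_tests):
    """Bonferroni overall p value: smallest pointwise p times the number of tests, capped at 1."""
    return min(1.0, p_min * n_tests)

def sidak_overall_p(p_min, n_tests):
    """Sidak overall p value, 1 - (1 - p_min) ** n_tests, assuming independent tests.

    Written with log1p/expm1 so that very small p values do not lose precision.
    Requires 0 <= p_min < 1.
    """
    return -math.expm1(n_tests * math.log1p(-p_min))

# Hypothetical scan: 2,000 markers, smallest pointwise p value of 1e-5.
n_markers = 2000
smallest_p = 1e-5
print("Bonferroni:", bonferroni_overall_p(smallest_p, n_markers))
print("Sidak:     ", sidak_overall_p(smallest_p, n_markers))
```

The Sidak value is never larger than the Bonferroni value, but for small p values the two are nearly identical, so both share the same conservatism when markers are in LD.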

4.
Although permutation testing has been the gold standard for assessing significance levels in studies using multiple markers, it is time-consuming. A Bonferroni correction to the nominal p-value that uses the underlying pair-wise linkage disequilibrium (LD) structure among the markers to determine the number of effectively independent tests has recently been proposed. We propose using the number of independent LD blocks plus the number of independent single-nucleotide polymorphisms for correction. Using the Collaborative Study on the Genetics of Alcoholism LD data for chromosome 21, we simulated 1,000 replicates of parent-child trio data under the null hypothesis with two levels of LD: moderate and high. Assuming haplotype blocks were independent, we calculated the number of independent statistical tests using 3 haplotype blocking algorithms. We then compared the type I error rates using a principal components-based method, the three blocking methods, a traditional Bonferroni correction, and the unadjusted p-values obtained from FBAT. Under high LD conditions, the PC method and one of the blocking methods were slightly conservative, whereas the 2 other blocking methods exceeded the target type I error rate. Under conditions of moderate LD, we show that the blocking algorithm corrections are closest to the desired type I error, although still slightly conservative, with the principal components-based method being almost as conservative as the traditional Bonferroni correction.

5.
Zou F, Fine JP, Hu J, Lin DY. Genetics, 2004, 168(4): 2307-2316
Assessing genome-wide statistical significance is an important and difficult problem in multipoint linkage analysis. Due to multiple tests on the same genome, the usual pointwise significance level based on the chi-square approximation is inappropriate. Permutation is widely used to determine genome-wide significance. Theoretical approximations are available for simple experimental crosses. In this article, we propose a resampling procedure to assess the significance of genome-wide QTL mapping for experimental crosses. The proposed method is computationally much less intensive than the permutation procedure (by a factor on the order of 10^2 or more) and is applicable to complex breeding designs and sophisticated genetic models that cannot be handled by the permutation and theoretical methods. The usefulness of the proposed method is demonstrated through simulation studies and an application to a Drosophila backcross.

6.
Controlling for the multiplicity effect is an essential part of determining statistical significance in large-scale single-locus association genome scans on Single Nucleotide Polymorphisms (SNPs). Bonferroni adjustment is a commonly used approach due to its simplicity, but is conservative and has low power for large-scale tests. The permutation test, which is a powerful and popular tool, is computationally expensive and may mislead in the presence of family structure. We propose a computationally efficient and powerful multiple testing correction approach for Linkage Disequilibrium (LD) based Quantitative Trait Loci (QTL) mapping on the basis of graphical weighted-Bonferroni methods. The proposed multiplicity adjustment method synthesizes weighted Bonferroni-based closed testing procedures into a powerful and versatile graphical approach. By tailoring different priorities for the two hypothesis tests involved in LD based QTL mapping, we are able to increase power and maintain computational efficiency and conceptual simplicity. The proposed approach enables strong control of the familywise error rate (FWER). The performance of the proposed approach as compared to the standard Bonferroni correction is illustrated by simulation and real data. We observe a consistent and moderate increase in power under all simulated circumstances, among different sample sizes, heritabilities, and number of SNPs. We also applied the proposed method to a real outbred mouse HDL cholesterol QTL mapping project where we detected the significant QTLs that were highlighted in the literature, while still ensuring strong control of the FWER.

7.
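Progress in genome-wide association studies of complex diseases: genetic statistical analysis   (cited 7 times in total; self-citations: 0; citations by others: 7)
Yan Weili (严卫丽). Yi Chuan (《遗传》), 2008, 30(5): 543-549
In 2005, Science published the first genome-wide association study (GWAS) of human age-related macular degeneration. GWAS of a series of complex diseases, including obesity, type 2 diabetes, coronary heart disease, and Alzheimer's disease, were reported in succession thereafter, and this period has been called the first wave of human GWAS. This article introduces the methods, software, and worked applications of statistical analysis for GWAS; compares P-value adjustment methods for multiple testing in association analysis, including the Bonferroni correction, the step-down Bonferroni correction, simulation-based methods, and methods that control the false discovery rate; and discusses how population stratification can affect association results and why, along with research progress on, and applications of, methods for controlling population stratification in GWAS. During the first wave of GWAS, classical statistical genetics methods identified many gene-phenotype associations and were able to account for them, including many previously unknown genes and chromosomal regions in the genome. However, further progress in GWAS requires elucidating the roles that gene-gene interactions within the genome, and interactions between complex gene-gene networks and environmental factors, play in the development of complex diseases. Existing statistical methods cannot meet this need, so developing more advanced statistical methods is imperative. Finally, the article lists websites for GWAS statistical analysis software.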

8.
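An effective test for mapping genes underlying complex diseases   (cited 1 time in total; self-citations: 0; citations by others: 1)
Linkage disequilibrium (LD) is used to map genes for some complex diseases, and many LD mapping methods have been developed in recent years. Apart from the TDT, most LD mapping methods must assume the absence of population admixture; admixture can increase the probability of a type I error in disease gene mapping and produce invalid results. The method described here uses LD to test for linkage (in the presence of LD) and association (in the presence of linkage) between a marker locus and a disease susceptibility locus (DSL). The analysis uses unrelated individuals whose parental genotypes are known and who have at least one heterozygous parent. The random sample is partitioned into classes by genotype, and powerful statistical methods are then applied to the data from the different classes, both separately and jointly. This LD mapping method applies to both affected and unaffected individuals, and it effectively removes the effect of population admixture when mapping with samples classified by parental genotype. Analytical and simulation results also indicate that the method resolves the admixture problem in testing for linkage and association between a marker locus and a DSL. Compared with the TDT, however, when the tested locus is the DSL, this method [text garbled in source: "丙能"] use the corrected data effectively and fully; when the tested locus is not the DSL, this method and the TDT can complement each other to detect a linked DSL more effectively.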

9.
Haplotypes--that is, linear arrangements of alleles on the same chromosome that were inherited as a unit--are expected to carry important information in the context of association fine mapping of complex diseases. In consideration of a set of tightly linked markers, there is an enormous number of different marker combinations that can be analyzed. Therefore, a severe multiple-testing problem is introduced. One method to deal with this problem is Bonferroni correction by the number of combinations that are considered. Bonferroni correction is appropriate for independent tests but will result in a loss of power in the presence of linkage disequilibrium in the region. A second method is to perform simulations. It is unfortunate that most methods of haplotype analysis already require simulations to obtain an uncorrected P value for a specific marker combination. Thus, it seems that nested simulations are necessary to obtain P values that are corrected for multiple testing, which, apparently, limits the applicability of this approach because of computer running-time restrictions. Here, an algorithm is described that avoids such nested simulations. We check the validity of our approach under two disease models for haplotype analysis of family data. The true type I error rate of our algorithm corresponds to the nominal significance level. Furthermore, we observe a strong gain in power with our method to obtain the global P value, compared with the Bonferroni procedure to calculate the global P value. The method described here has been implemented in the latest update of our program FAMHAP.

10.
Jie Liu, Guoxian Yu, Yazhou Ren, Maozu Guo, Jun Wang. Genomics, 2019, 111(5): 1176-1182
Single nucleotide polymorphism (SNP) interactions can explain the missing heritability of common complex diseases. Many interaction detection methods have been proposed in genome-wide association studies, and they can be divided into two types: population-based and family-based. Compared with population-based methods, family-based methods are robust to population stratification. Several family-based methods have been proposed, among which Multifactor Dimensionality Reduction (MDR)-based methods are popular and powerful. However, current MDR-based methods suffer from a heavy computational burden. Furthermore, they do not allow for main effect adjustment. In this work we develop a two-stage model-based MDR approach (TrioMDR) to detect multi-locus interaction in trio families (i.e., two parents and one affected child). TrioMDR combines the MDR framework with logistic regression models to check interactions, so TrioMDR can adjust for main effects. In addition, unlike the time-consuming permutation procedures used in traditional MDR-based methods, TrioMDR uses a simple semi-parametric P-value correction procedure to control the type I error rate; this procedure needs only a few permutations to assess the significance of a multi-locus model and significantly speeds up TrioMDR. We performed extensive experiments on simulated data to compare the type I error and power of TrioMDR under different scenarios. The results demonstrate that TrioMDR is fast and more powerful in general than some recently proposed methods for interaction detection in trios. The R code for TrioMDR is available at: https://github.com/TrioMDR/TrioMDR.

11.
Determination of the relevance of both demanding classical epidemiologic criteria for control selection and robust handling of population stratification (PS) represents a major challenge in the design and analysis of genome-wide association studies (GWAS). Empirical data from two GWAS in European Americans of the Cancer Genetic Markers of Susceptibility (CGEMS) project were used to evaluate the impact of PS in studies with different control selection strategies. In each of the two original case-control studies nested in corresponding prospective cohorts, a minor confounding effect due to PS (inflation factor lambda of 1.025 and 1.005) was observed. In contrast, when the control groups were exchanged to mimic a cost-effective but theoretically less desirable control selection strategy, the confounding effects were larger (lambda of 1.090 and 1.062). A panel of 12,898 autosomal SNPs common to both the Illumina and Affymetrix commercial platforms and with low local background linkage disequilibrium (pair-wise r^2 < 0.004) was selected to infer population substructure with principal component analysis. A novel permutation procedure was developed for the correction of PS that identified a smaller set of principal components and achieved a better control of type I error (to lambda of 1.032 and 1.006, respectively) than currently used methods. The overlap between sets of SNPs in the bottom 5% of p-values based on the new test and the test without PS correction was about 80%, with the majority of discordant SNPs having both ranks close to the threshold. Thus, for the CGEMS GWAS of prostate and breast cancer conducted in European Americans, PS does not appear to be a major problem in well-designed studies. A study using suboptimal controls can have acceptable type I error when an effective strategy for the correction of PS is employed.
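The abstract reports inflation factors (lambda) without saying how they were computed. A minimal sketch, assuming the usual genomic-control definition (the median of the observed 1-df chi-square association statistics divided by the median expected under the null), might look like this; the function name and the input values are hypothetical.

```python
import statistics

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364

def inflation_factor(chi2_stats):
    """Genomic-control lambda: observed median statistic over the null median.

    Values near 1.0 suggest little inflation; larger values suggest
    confounding such as population stratification.
    """
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN

# Hypothetical 1-df chi-square statistics from a handful of SNP tests.
observed = [0.02, 0.11, 0.30, 0.47, 0.52, 0.95, 1.80, 3.10, 6.40]
print(f"lambda = {inflation_factor(observed):.3f}")
```

In practice the statistic would be taken over all tested SNPs rather than a short list, so that the median is stable.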

12.
OBJECTIVE: The potential value of haplotypes has attracted widespread interest in the mapping of complex traits. Haplotype sharing methods take the linkage disequilibrium information between multiple markers into account, and may have good power to detect predisposing genes. We present a new approach based on Mantel statistics for spacetime clustering, which is developed in order to improve the power of haplotype sharing analysis for gene mapping in complex disease. METHODS: The new statistic correlates genetic similarity and phenotypic similarity across pairs of haplotypes for case-only and case-control studies. The genetic similarity is measured as the shared length between haplotypes around a putative disease locus. The phenotypic similarity is measured as the mean-corrected cross-product based on the respective phenotypes. We analyzed two tests for statistical significance with respect to type I error: (1) assuming asymptotic normality, and (2) using a Monte Carlo permutation procedure. The results were compared to the chi^2 test for association based on 3-marker haplotypes. RESULTS: The results of the type I error rates for the Mantel statistics using the permutational procedure yielded pointwise valid tests. The approach based on the assumption of asymptotic normality was seriously liberal. CONCLUSION: Power comparisons showed that the Mantel statistics were better than or equal to the chi^2 test for all simulated disease models.

13.
Because of the need for fine mapping of disease loci and the availability of dense single-nucleotide-polymorphism markers, many forms of association tests have been developed. Most of them are applicable only to triads, whereas some are amenable to nuclear families (sibships). Although there are a number of methods that can deal with extended families (e.g., the pedigree disequilibrium test [PDT]), most of them cannot accommodate incomplete data. Furthermore, despite a large body of literature on association mapping, only a very limited number of publications are applicable to X-chromosomal markers. In this report, we first extend the PDT to markers on the X chromosome for testing linkage disequilibrium in the presence of linkage. This method is applicable to any pedigree structure and is termed "X-chromosomal pedigree disequilibrium test" (XPDT). We then further extend the XPDT to accommodate pedigrees with missing genotypes in some of the individuals, especially founders. Monte Carlo (MC) samples of the missing genotypes are generated and used to calculate the XMCPDT (X-chromosomal MC PDT) statistic, which is defined as the conditional expectation of the XPDT statistic given the incomplete (observed) data. This MC version of the XPDT remains a valid test for association under linkage with the assumption that the pedigrees and their associated affection patterns are drawn randomly from a population of pedigrees with at least one affected offspring. This set of methods was compared with existing approaches through simulation, and substantial power gains were observed in all settings considered, with type I error rates closely tracking their nominal values.

14.
MOTIVATION: Although population-based association mapping may be subject to the bias caused by population stratification, alternative methods that are robust to population stratification such as family-based linkage analysis have lower mapping resolution. Recently, various statistical methods robust to population stratification were proposed for association studies, using unrelated individuals to identify associations between candidate genes and traits of interest. The association between a candidate gene and a quantitative trait is often evaluated via a regression model with inferred population structure variables as covariates, where the residual distribution is customarily assumed to be from a symmetric and unimodal parametric family, such as a Gaussian, although this may be inappropriate for the analysis of many real-life datasets. RESULTS: In this article, we proposed a new structured association (SA) test. Our method corrects for continuous population stratification by first deriving population structure and kinship matrices through a set of random genetic markers and then modeling the relationship between trait values, genotypic scores at a candidate marker and genetic background variables through a semiparametric model, where the error distribution is modeled as a mixture of Polya trees centered around a normal family of distributions. We compared our model to the existing SA tests in terms of model fit, type I error rate, power, precision and accuracy by application to a real dataset as well as simulated datasets.

15.
An Arabidopsis example of association mapping in structured samples   (cited 6 times in total; self-citations: 0; citations by others: 6)
A potentially serious disadvantage of association mapping is the fact that marker-trait associations may arise from confounding population structure as well as from linkage to causative polymorphisms. Using genome-wide marker data, we have previously demonstrated that the problem can be severe in a global sample of 95 Arabidopsis thaliana accessions, and that established methods for controlling for population structure are generally insufficient. Here, we use the same sample together with a number of flowering-related phenotypes and data-perturbation simulations to evaluate a wider range of methods for controlling for population structure. We find that, in terms of reducing the false-positive rate while maintaining statistical power, a recently introduced mixed-model approach that takes genome-wide differences in relatedness into account via estimated pairwise kinship coefficients generally performs best. By combining the association results with results from linkage mapping in F2 crosses, we identify one previously known true positive and several promising new associations, but also demonstrate the existence of both false positives and false negatives. Our results illustrate the potential of genome-wide association scans as a tool for dissecting the genetics of natural variation, while at the same time highlighting the pitfalls. The importance of study design is clear; our study is severely under-powered both in terms of sample size and marker density. Our results also provide a striking demonstration of confounding by population structure. While statistical methods can be used to ameliorate this problem, they cannot always be effective and are certainly not a substitute for independent evidence, such as that obtained via crosses or transgenic experiments. Ultimately, association mapping is a powerful tool for identifying a list of candidates that is short enough to permit further genetic study.

16.
Implementing false discovery rate control: increasing your power   (cited 23 times in total; self-citations: 0; citations by others: 23)
Popular procedures to control the chance of making type I errors when multiple statistical tests are performed come at a high cost: a reduction in power. As the number of tests increases, power for an individual test may become unacceptably low. This is a consequence of minimizing the chance of making even a single type I error, which is the aim of, for instance, the Bonferroni and sequential Bonferroni procedures. An alternative approach, control of the false discovery rate (FDR), has recently been advocated for ecological studies. This approach aims at controlling the proportion of significant results that are in fact type I errors. Keeping the proportion of type I errors low among all significant results is a sensible, powerful, and easy-to-interpret way of addressing the multiple testing issue. To encourage practical use of the approach, in this note we illustrate how the proposed procedure works, we compare it to more traditional methods that control the familywise error rate, and we discuss some recent useful developments in FDR control.
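The contrast drawn in this abstract can be illustrated with a short sketch. The abstract does not name a specific FDR procedure, so the code below assumes the Benjamini-Hochberg step-up rule as a representative one; the function name and the p-values are made up for illustration.

```python
def benjamini_hochberg_reject(pvals, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Step-up rule: sort the p-values, find the largest rank k such that
    p_(k) <= (k / m) * alpha, and reject the k smallest p-values.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank / m * alpha:
            k_max = rank
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject

# Made-up p-values from ten tests.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
alpha = 0.05

# Bonferroni compares every p-value with alpha / m.
bonferroni_hits = sum(p <= alpha / len(pvals) for p in pvals)
fdr_hits = sum(benjamini_hochberg_reject(pvals, alpha))
print("Bonferroni rejections:", bonferroni_hits)
print("FDR (step-up) rejections:", fdr_hits)
```

With these made-up p-values, the Bonferroni threshold of 0.005 rejects one hypothesis, while the step-up rule rejects two, which is the kind of power gain the abstract describes.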

17.
Using the Genetic Analysis Workshop 13 simulated data set, we compared the technique of importance sampling to several other methods designed to adjust p-values for multiple testing: the Bonferroni correction, the method proposed by Feingold et al., and naïve Monte Carlo simulation. We performed affected sib-pair linkage analysis for each of the 100 replicates for each of five binary traits and adjusted the derived p-values using each of the correction methods. The type I error rates for each correction method and the ability of each of the methods to detect loci known to influence trait values were compared. All of the methods considered were conservative with respect to type I error, especially the Bonferroni method. The ability of these methods to detect trait loci was also low. However, this may be partially due to a limitation inherent in our binary trait definitions.

18.
Zhao Y, Yu H, Zhu Y, Ter-Minassian M, Peng Z, Shen H, Diao N, Chen F. PLoS ONE, 2012, 7(2): e31134
Family based association study (FBAS) has the advantages of controlling for population stratification and testing for linkage and association simultaneously. We propose a retrospective multilevel model (rMLM) approach to analyze sibship data by using genotypic information as the dependent variable. Simulated data sets were generated using the simulation of linkage and association (SIMLA) program. We compared rMLM to sib transmission/disequilibrium test (S-TDT), sibling disequilibrium test (SDT), conditional logistic regression (CLR) and generalized estimation equations (GEE) on the measures of power, type I error, estimation bias and standard error. The results indicated that rMLM was a valid test of association in the presence of linkage using sibship data. The advantages of rMLM became more evident when the data contained concordant sibships. Compared to GEE, rMLM underestimated the odds ratio (OR) less. Our results support the application of rMLM to detect gene-disease associations using sibship data. However, users should be aware that the type I error rate may increase when there is association without linkage between the disease locus and the genotyped marker.

19.
The analysis of multiple endpoints in clinical trials   (cited 11 times in total; self-citations: 0; citations by others: 11)
Treatment comparisons in randomized clinical trials usually involve several endpoints such that conventional significance testing can seriously inflate the overall Type I error rate. One option is to select a single primary endpoint for formal statistical inference, but this is not always feasible. Another approach is to apply Bonferroni correction (i.e., multiply each P-value by the total number of endpoints). Its conservatism for correlated endpoints is examined for multivariate normal data. A third approach is to derive an appropriate global test statistic and this paper explores one such test applicable to any set of asymptotically normal test statistics. Quantitative, binary, and survival endpoints are all considered within this general framework. Two examples are presented and the relative merits of the proposed strategies are discussed.

20.
A threshold of 3.3 for a genome-wide maximum LOD score (MAXLOD) has been demonstrated in human linkage studies as corresponding to a type I error rate of 5%. Generalization of this work to other species assumes the presence of an infinitely dense marker map. While this assumption is increasingly realistic for the human genome, it may be unrealistic for the dog genome. In this study we establish the analytic and empirical thresholds for MAXLOD in canine linkage studies corresponding to type I error rates of 5% and 1% for autosomal traits. Empirical thresholds are computed via simulation assuming a 10 cM map with no fine mapping performed. Pedigree structures for simulations were drawn from two canine disease studies. Five thousand replicates of genome-wide null genotype data were simulated and analyzed for each disease. We determined that MAXLOD thresholds of 3.2 and 2.7 correspond to analytic and empirical type I error rates of 5%, respectively. In all cases, the MAXLOD thresholds from simulations were at least 0.5 LOD units below the corresponding analytic thresholds. We therefore recommend that a threshold of 3.2 be used for canine linkage studies when fine mapping is performed, and that researchers perform their own simulation studies to assess genome-wide empirical significance levels when no fine mapping is performed.
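To give a feel for how strict these LOD thresholds are at a single locus, the sketch below converts a LOD score into a pointwise p-value. It assumes the standard large-sample relation in which 2 ln(10) x LOD follows a chi-square distribution with 1 degree of freedom and the test is one-sided; the function name is hypothetical, and the result is a pointwise value only, not the genome-wide type I error rate that the study estimates analytically and by simulation.

```python
import math

def lod_to_pointwise_p(lod):
    """One-sided pointwise p-value for a LOD score (assumes chi-square, 1 df)."""
    chi2 = 2.0 * math.log(10.0) * lod
    # For 1 df, P(chi-square > x) = erfc(sqrt(x / 2)); halve it for a one-sided test.
    return 0.5 * math.erfc(math.sqrt(chi2 / 2.0))

# Thresholds mentioned in the abstract.
for lod in (2.7, 3.2, 3.3):
    print(f"LOD {lod:.1f} -> pointwise p of about {lod_to_pointwise_p(lod):.1e}")
```

The pointwise p-values are far below 0.05 because the genome-wide threshold must absorb the many correlated tests performed along the genome.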
