Similar articles
 20 similar articles found
1.
Zang Y, Zhang H, Yang Y, Zheng G 《Human heredity》2007,63(3-4):187-195
The population-based case-control design is a powerful approach for detecting susceptibility markers of a complex disease. However, this approach may lead to spurious association when there is population substructure: population stratification (PS) or cryptic relatedness (CR). Two simple approaches to correct for the population substructure are genomic control (GC) and delta centralization (DC). GC uses the variance inflation factor to correct for the variance distortion of a test statistic, and the DC centralizes the non-central chi-square distribution of the test statistic. Both GC and DC have been studied for case-control association studies mainly under a specific genetic model (e.g. recessive, additive or dominant), under which an optimal trend test is available. The genetic model is usually unknown for many complex diseases. In this situation, we study the performance of three robust tests based on the GC and DC corrections in the presence of the population substructure. Our results show that, when the genetic model is unknown, the DC- (or GC-) corrected maximum and Pearson's association test are robust and have good control of Type I error and high power relative to the optimal trend tests in the presence of PS (or CR).

2.
We propose in this paper a unified approach for testing the association between rare variants and phenotypes in sequencing association studies. This approach maximizes power by adaptively using the data to optimally combine the burden test and the nonburden sequence kernel association test (SKAT). Burden tests are more powerful when most variants in a region are causal and the effects are in the same direction, whereas SKAT is more powerful when a large fraction of the variants in a region are noncausal or the effects of causal variants are in different directions. The proposed unified test maintains the power in both scenarios. We show that the unified test corresponds to the optimal test in an extended family of SKAT tests, which we refer to as SKAT-O. The second goal of this paper is to develop a small-sample adjustment procedure for the proposed methods for the correction of conservative type I error rates of SKAT family tests when the trait of interest is dichotomous and the sample size is small. Both small-sample-adjusted SKAT and the optimal unified test (SKAT-O) are computationally efficient and can easily be applied to genome-wide sequencing association studies. We evaluate the finite sample performance of the proposed methods using extensive simulation studies and illustrate their application using the acute-lung-injury exome-sequencing data of the National Heart, Lung, and Blood Institute Exome Sequencing Project.

3.
Determination of the relevance of both demanding classical epidemiologic criteria for control selection and robust handling of population stratification (PS) represents a major challenge in the design and analysis of genome-wide association studies (GWAS). Empirical data from two GWAS in European Americans of the Cancer Genetic Markers of Susceptibility (CGEMS) project were used to evaluate the impact of PS in studies with different control selection strategies. In each of the two original case-control studies nested in corresponding prospective cohorts, a minor confounding effect due to PS (inflation factor lambda of 1.025 and 1.005) was observed. In contrast, when the control groups were exchanged to mimic a cost-effective but theoretically less desirable control selection strategy, the confounding effects were larger (lambda of 1.090 and 1.062). A panel of 12,898 autosomal SNPs common to both the Illumina and Affymetrix commercial platforms and with low local background linkage disequilibrium (pair-wise r² < 0.004) was selected to infer population substructure with principal component analysis. A novel permutation procedure was developed for the correction of PS that identified a smaller set of principal components and achieved a better control of type I error (to lambda of 1.032 and 1.006, respectively) than currently used methods. The overlap between sets of SNPs in the bottom 5% of p-values based on the new test and the test without PS correction was about 80%, with the majority of discordant SNPs having both ranks close to the threshold. Thus, for the CGEMS GWAS of prostate and breast cancer conducted in European Americans, PS does not appear to be a major problem in well-designed studies. A study using suboptimal controls can have acceptable type I error when an effective strategy for the correction of PS is employed.

4.
Population stratification remains an important issue in case-control studies of disease-marker association, even within populations considered to be genetically homogeneous. Campbell et al. (Nature Genetics 2005;37:868-872) illustrated this by showing that stratification induced a spurious association between the lactase gene (LCT) and tall/short status in a European American sample. Furthermore, existing approaches for controlling stratification by use of substructure-informative loci (e.g., genomic control, structured association, and principal components) could not resolve this confounding. To address this problem, we propose a simple two-step procedure. In the first step, we model the odds of disease, given data on substructure-informative loci (excluding the test locus). For each participant, we use this model to calculate a stratification score, which is that participant's estimated odds of disease calculated using his or her substructure-informative-loci data in the disease-odds model. In the second step, we assign subjects to strata defined by stratification score and then test for association between the disease and the test locus within these strata. The resulting association test is valid even in the presence of population stratification. Our approach is computationally simple and less model dependent than are existing approaches for controlling stratification. To illustrate these properties, we apply our approach to the data from Campbell et al. and find no association between the LCT locus and tall/short status. Using simulated data, we show that our approach yields a more appropriate correction for stratification than does principal components or genomic control.

5.
By testing DNA pools rather than single samples the number of tests for a case-control association study can be decreased to only two for each marker: one on the patient and one on the control pool. A fundamental requirement is that each pool represents the frequency of the markers in the corresponding population beyond the influence of experimental errors. Consequently the latter must be carefully determined. To this aim, we prepared pools of different size (49-402 individuals) with accurately quantified DNAs, estimated the allelic frequencies in the pools of two SNPs by primer extension genotyping followed by DHPLC analysis and compared them with the real frequencies determined in the single samples. Our data show that (1) the method is highly reproducible: the standard deviation of repeated determinations was ±0.014; (2) the experimental error (i.e., the discrepancy between the estimated and real frequencies) was ±0.013 (95% C.I.: 0.0098-0.0165). The magnitude of this error was not correlated to the pool size or to the type of SNP. The effect of the observed experimental error on the power of the association test was evaluated. We conclude that this method constitutes an efficient tool for high-throughput association screenings provided that the experimental error is low. We therefore recommend that before a pool is used for extensive association studies, its quality, i.e., the experimental error, is verified by determining the difference between estimated and real frequencies for at least one marker.

6.
OBJECTIVE: Case-control association studies in mixed populations can result in spurious disease-marker associations if subpopulation disease prevalence and marker frequencies both differ. Genomic control (GC) uses neutral loci to correct for spurious association (due to population stratification), but how well this works remains undetermined. METHODS: We simulated and mixed populations with different disease and marker frequencies but without marker-disease association. We generated case-control datasets, calculated the χ² for disease association with each marker, and applied two GC procedures, dividing by the mean χ² or by the median χ²/0.456. RESULTS: Corrections became conservative (false positive rate [FPR] <5%) with increasing subpopulation prevalence and marker differences. The mean correction resulted in FPRs close to 5% at average subpopulation allele frequency differences <0.26, but inclusion of just a few markers with large frequency differences resulted in conservative FPRs. FPRs from the median correction were mostly conservative but became anticonservative when a few markers with large frequency differences were included. CONCLUSION: GC can either lead to a notable loss of power to detect a true association (conservative) in many circumstances or fail to eliminate the spurious associations (anticonservative). The mean correction factor is useful in certain situations to correct population stratification, but it is difficult to know when those situations exist.
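
The two GC procedures named in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' simulation code: the function name `genomic_control` and the example statistics are hypothetical, and 0.456 is simply the constant quoted in the abstract for the median-based correction.

```python
from statistics import mean, median

def genomic_control(chi2_null, chi2_test, method="median"):
    """Divide test statistics by an inflation factor estimated from null markers.

    chi2_null: 1-df chi-square statistics at presumed neutral (null) loci.
    chi2_test: 1-df chi-square statistics to be corrected.
    """
    if method == "median":
        # Median correction: median of null statistics over the constant 0.456.
        lam = median(chi2_null) / 0.456
    elif method == "mean":
        # Mean correction: a 1-df chi-square has mean 1 under the null.
        lam = mean(chi2_null)
    else:
        raise ValueError("method must be 'median' or 'mean'")
    return lam, [x / lam for x in chi2_test]

# Hypothetical statistics, for illustration only.
null_stats = [0.2, 0.5, 0.912, 1.4, 3.0]
lam_median, corrected_median = genomic_control(null_stats, [9.12], method="median")
lam_mean, corrected_mean = genomic_control(null_stats, [9.12], method="mean")
```

Because the median is insensitive to a few extreme null markers while the mean is not, the two factors can differ noticeably on the same data, which is the contrast the abstract examines.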

7.
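Advances in genome-wide association studies of complex diseases: genetic statistical analysis   Cited by: 7 (self-citations: 0; citations by others: 7)
Yan Weili 《遗传》 (Yi Chuan) 2008,30(5):543-549
In 2005, Science published the first genome-wide association study of human age-related macular degeneration. Since then, genome-wide association studies of a series of complex diseases, including obesity, type 2 diabetes, coronary heart disease, and Alzheimer's disease, have been reported in succession; this period has been called the first wave of human genome-wide association studies. This article introduces the methods, software, and application examples of statistical analysis in genome-wide association studies; compares methods for adjusting P values for multiple testing in association analysis, including Bonferroni correction, step-down Bonferroni correction, simulation-based methods, and methods that control the false discovery rate; and discusses the possible impact of population stratification on association results and the underlying principles, as well as research progress on, and application examples of, methods for controlling population stratification in genome-wide association studies. In the first wave of genome-wide association studies, classical genetic statistical methods identified many gene-phenotype associations and were able to explain them, including many previously unknown genes and chromosomal regions. However, the continued development of genome-wide association studies requires further elucidation of the roles that gene-gene interactions within the genome, and interactions between complex gene-gene networks and environmental factors, play in the development of complex diseases. Existing statistical methods will certainly not meet this need, and the development of more advanced statistical methods is imperative. Finally, the article provides website information for statistical analysis software for genome-wide association studies.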
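
Three of the P-value adjustments compared in this review can be sketched as follows. The code assumes that "step-down Bonferroni" refers to the procedure commonly known as Holm's method and that the FDR-controlling method is Benjamini-Hochberg; both readings are assumptions, and the function names and P values are hypothetical.

```python
def bonferroni(p_values):
    """Multiply each P value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def holm(p_values):
    """Step-down Bonferroni: the k-th smallest P value is scaled by (m - k + 1)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Enforce monotonicity: adjusted values never decrease with rank.
        running_max = max(running_max, (m - rank) * p_values[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted

def benjamini_hochberg(p_values):
    """FDR adjustment: the k-th smallest P value is scaled by m / k."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k  # rank of this P value in ascending order
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical P values, for illustration only.
raw = [0.01, 0.04, 0.03, 0.5]
adj_bonferroni = bonferroni(raw)
adj_holm = holm(raw)
adj_bh = benjamini_hochberg(raw)
```

On any input the adjusted values satisfy Bonferroni ≥ Holm ≥ Benjamini-Hochberg element-wise, which reflects the ordering from most to least conservative.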

8.
Murphy A, Weiss ST, Lange C 《PLoS genetics》2008,4(9):e1000197
For genome-wide association studies in family-based designs, we propose a powerful two-stage testing strategy that can be applied in situations in which parent-offspring trio data are available and all offspring are affected with the trait or disease under study. In the first step of the testing strategy, we construct estimators of genetic effect size in the completely ascertained sample of affected offspring and their parents that are statistically independent of the family-based association/transmission disequilibrium tests (FBATs/TDTs) that are calculated in the second step of the testing strategy. For each marker, the genetic effect is estimated (without requiring an estimate of the SNP allele frequency) and the conditional power of the corresponding FBAT/TDT is computed. Based on the power estimates, a weighted Bonferroni procedure assigns an individually adjusted significance level to each SNP. In the second stage, the SNPs are tested with the FBAT/TDT statistic at the individually adjusted significance levels. Using simulation studies for scenarios with up to 1,000,000 SNPs, varying allele frequencies and genetic effect sizes, the power of the strategy is compared with standard methodology (e.g., FBATs/TDTs with Bonferroni correction). In all considered situations, the proposed testing strategy demonstrates substantial power increases over the standard approach, even when the true genetic model is unknown and must be selected based on the conditional power estimates. The practical relevance of our methodology is illustrated by an application to a genome-wide association study for childhood asthma, in which we detect two markers meeting genome-wide significance that would not have been detected using standard methodology.
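
The idea of a weighted Bonferroni procedure can be illustrated with a short sketch. The weighting used here, significance levels proportional to each SNP's conditional power estimate, is an assumption chosen for simplicity and is not necessarily the scheme used in the paper; the function names and numbers are hypothetical. The only property relied on is that the per-SNP levels sum to the overall α, and that the weights come from information independent of the second-stage test statistics, as the abstract requires.

```python
def weighted_bonferroni_levels(power_estimates, alpha=0.05):
    """Split an overall significance level across SNPs in proportion to weights.

    The per-SNP levels sum to alpha, so the family-wise error rate stays
    controlled as long as the weights are independent of the test statistics.
    """
    total = sum(power_estimates)
    if total <= 0:
        raise ValueError("at least one power estimate must be positive")
    return [alpha * p / total for p in power_estimates]

def reject(p_values, levels):
    """Test each SNP at its individually adjusted significance level."""
    return [p <= a for p, a in zip(p_values, levels)]

# Hypothetical first-stage power estimates and second-stage P values.
powers = [0.8, 0.1, 0.1]
levels = weighted_bonferroni_levels(powers, alpha=0.05)
decisions = reject([0.03, 0.03, 0.001], levels)
```

In this toy example the first SNP is rejected at P = 0.03 because its high power estimate earns it most of the α budget, whereas an unweighted Bonferroni threshold of 0.05/3 would not reject it.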

9.
In this paper, different strategies to test for association in samples with related individuals designed for linkage studies are compared. Because no independent controls are available, a family-based association test and case-control tests corrected for the presence of related individuals in which unaffected relatives are used as controls were tested. When unrelated controls are available, additional strategies including selection of a single case per family considering either all families or a subset of linked families, are also considered. Analyses are performed on the simulated dataset, blind to the answers. The case-control test corrected for the presence of related individuals is the most powerful strategy to detect three loci associated with the disease under study. Using a correction factor for the case-control test performed conditional on the marker information rather than unconditional does not impact the power significantly.

10.
The paper considers the problem of determining the number of matched sets in 1 : M matched case-control studies with a categorical exposure having k + 1 categories, k ≥ 1. The basic interest lies in constructing a test statistic to test whether the exposure is associated with the disease. Estimates of the k odds ratios for 1 : M matched case-control studies with dichotomous exposure and for 1 : 1 matched case-control studies with exposure at several levels are presented in Breslow and Day (1980), but results holding in full generality were not available so far. We propose a score test for testing the hypothesis of no association between disease and the polychotomous exposure. We exploit the power function of this test statistic to calculate the required number of matched sets to detect specific departures from the null hypothesis of no association. We also consider the situation when there is a natural ordering among the levels of the exposure variable. For ordinal exposure variables, we propose a test for detecting trend in disease risk with increasing levels of the exposure variable. Our methods are illustrated with two datasets, one is a real dataset on colorectal cancer in rats and the other a simulated dataset for studying disease-gene association.

11.
The power of genomic control   Cited by: 16 (self-citations: 0; citations by others: 16)
Although association analysis is a useful tool for uncovering the genetic underpinnings of complex traits, its utility is diminished by population substructure, which can produce spurious association between phenotype and genotype within population-based samples. Because family-based designs are robust against substructure, they have risen to the fore of association analysis. Yet, if population substructure could be ignored, this robustness can come at the price of power. Unfortunately it is rarely evident when population substructure can be ignored. Devlin and Roeder recently have proposed a method, termed "genomic control" (GC), which has the robustness of family-based designs even though it uses population-based data. GC uses the genome itself to determine appropriate corrections for population-based association tests. Using the GC method, we contrast the power of two study designs, family trios (i.e., father, mother, and affected progeny) versus case-control. For analysis of trios, we use the TDT test. When population substructure is absent, we find GC is always more powerful than TDT; furthermore, contrary to previous results, we show that as a disease becomes more prevalent the discrepancy in power becomes more extreme. When population substructure is present, however, the results are more complex: TDT is more powerful when population substructure is substantial, and GC is more powerful otherwise. We also explore general issues of power and implementation of GC within the case-control setting and find that, economically, GC is at least comparable to and often less expensive than family-based methods. Therefore, GC methods should prove a useful complement to family-based methods for the genetic analysis of complex traits.

12.
In population-based case-control association studies, the regular χ² test is often used to investigate association between a candidate locus and disease. However, it is well known that this test may be biased in the presence of population stratification and/or genotyping error. Unlike some other biases, this bias will not go away with increasing sample size. On the contrary, the false-positive rate will be much larger when the sample size is increased. The usual family-based designs are robust against population stratification, but they are sensitive to genotype error. In this article, we propose a novel method of simultaneously correcting for the bias arising from population stratification and/or for the genotyping error in case-control studies. The appropriate corrections depend on sample odds ratios of the standard 2x3 tables of genotype by case and control from null loci. Therefore, the test is simple to apply. The corrected test is robust against misspecification of the genetic model. If the null hypothesis of no association is rejected, the corrections can be further used to estimate the effect of the genetic factor. We considered a simulation study to investigate the performance of the new method, using parameter values similar to those found in real-data examples. The results show that the corrected test approximately maintains the expected type I error rate under various simulation conditions. It also improves the power of the association test in the presence of population stratification and/or genotyping error. The discrepancy in power between the tests with correction and those without correction tends to be more extreme as the magnitude of the bias becomes larger. Therefore, the bias-correction method proposed in this article should be useful for the genetic analysis of complex traits.

13.
The joint analysis of multiple traits has recently become popular since it can increase statistical power to detect genetic variants and there is increasing evidence showing that pleiotropy is a widespread phenomenon in complex diseases. Currently, most existing methods use all of the traits for testing the association between multiple traits and a single variant. However, those methods for association studies may lose power in the presence of a large number of noise traits. In this paper, we propose an “optimal” maximum heritability test (MHT-O) to test the association between multiple traits and a single variant. MHT-O includes a procedure of deleting traits that have weak or no association with the variant. Using extensive simulation studies, we compare the performance of MHT-O with MHT, Trait-based Association Test uses Extended Simes procedure (TATES), SUM_SCORE and MANOVA. Our results show that, in all of the simulation scenarios, MHT-O is either the most powerful test or comparable to the most powerful test among the five tests we compared.

14.
Wang J, Shete S 《PloS one》2011,6(11):e27642
In case-control genetic association studies, cases are subjects with the disease and controls are subjects without the disease. At the time of case-control data collection, information about secondary phenotypes is also collected. In addition to studies of primary diseases, there has been some interest in studying genetic variants associated with secondary phenotypes. In genetic association studies, the deviation from Hardy-Weinberg proportion (HWP) of each genetic marker is assessed as an initial quality check to identify questionable genotypes. Generally, HWP tests are performed based on the controls for the primary disease or secondary phenotype. However, when the disease or phenotype of interest is common, the controls do not represent the general population. Therefore, using only controls for testing HWP can result in a highly inflated type I error rate for the disease- and/or phenotype-associated variants. Recently, two approaches, the likelihood ratio test (LRT) approach and the mixture HWP (mHWP) exact test were proposed for testing HWP in samples from case-control studies. Here, we show that these two approaches result in inflated type I error rates and could lead to the removal from further analysis of potential causal genetic variants associated with the primary disease and/or secondary phenotype when the study of primary disease is frequency-matched on the secondary phenotype. Therefore, we proposed alternative approaches, which extend the LRT and mHWP approaches, for assessing HWP that account for frequency matching. The goal was to maintain more (possible causative) single-nucleotide polymorphisms in the sample for further analysis. Our simulation results showed that both extended approaches could control type I error probabilities. We also applied the proposed approaches to test HWP for SNPs from a genome-wide association study of lung cancer that was frequency-matched on smoking status and found that the proposed approaches can keep more genetic variants for association studies.
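
For orientation, the basic HWP quality check mentioned in this abstract is often carried out as a 1-df chi-square goodness-of-fit test on genotype counts in a single group such as controls. The sketch below shows only that textbook test; it does not implement the LRT or mHWP approaches or their frequency-matched extensions. The function name and the genotype counts are hypothetical.

```python
import math

def hwp_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test (1 df) for Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # estimated frequency of allele A
    q = 1.0 - p
    if p <= 0.0 or q <= 0.0:
        raise ValueError("marker is monomorphic; HWP cannot be tested")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical genotype counts, for illustration only.
chi2_ok, p_ok = hwp_chi_square(25, 50, 25)     # counts exactly in HWP
chi2_bad, p_bad = hwp_chi_square(40, 20, 40)   # strong heterozygote deficit
```

A marker like the second one would typically be flagged and removed, which is why a test that is miscalibrated in case-control samples can discard genuinely associated variants.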

15.
We show the negative link between genome size and metabolic intensity in tetrapods, using the heart index (relative heart mass) as a unified indicator of metabolic intensity in poikilothermal and homeothermal animals. We found two separate regression lines of heart index on genome size for reptiles-birds and amphibians-mammals (the slope of regression is steeper in reptiles-birds). We also show a negative correlation between GC content and nucleosome formation potential in vertebrate DNA, and, consistent with this relationship, a positive correlation between genome GC content and nuclear size (independent of genome size). It is known that there are two separate regression lines of genome GC content on genome size for reptiles-birds and amphibians-mammals: reptiles-birds have the relatively higher GC content (for their genome sizes) compared to amphibians-mammals. Our results suggest uniting all these data into one concept. The slope of negative regression between GC content and nucleosome formation potential is steeper in exons than in non-coding DNA (where nucleosome formation potential is generally higher), which indicates a special role of non-coding DNA for orderly chromatin organization. The chromatin condensation and nuclear size are supposed to be key parameters that accommodate the effects of both genome size and GC content and connect them with metabolic intensity. Our data suggest that the reptilian-birds clade evolved special relationships among these parameters, whereas mammals preserved the amphibian-like relationships. Surprisingly, mammals, although acquiring a more complex general organization, seem to retain certain genome-related properties that are similar to amphibians. At the same time, the slope of regression between nucleosome formation potential and GC content is steeper in poikilothermal than in homeothermal genomes, which suggests that mammals and birds acquired certain common features of genomic organization.

16.
Genome-wide association studies have reported a promising association of rs4072037 with gastric cancer (GC). This variant was associated with altered physiological function of MUC1, possibly by modulating promoter activity and alternative splicing of MUC1. However, the association results were inconclusive and the effect of this variant had not been well estimated. A meta-analysis that systematically reviews the relevant reports may help address these concerns. Association studies involving the MUC1 rs4072037 polymorphism and GC risk were identified up to June 30, 2012. The odds ratio (OR) and 95% confidence interval (CI) under an additive model were estimated or extracted from each study. The pooled effect size was quantitatively synthesized using meta-analysis. Heterogeneity between studies was measured by the Q test and the I² statistic, and publication bias was evaluated by a funnel plot and Egger's test. A total of 10 independent case–control studies including 6,580 GC cases and 10,324 controls were included in this meta-analysis. Eight of the ten studies were of Asian ethnicity and two of European ethnicity. The G allele of MUC1 rs4072037 was significantly associated with a decreased risk of GC (OR = 0.72, 95% CI 0.68–0.77; P = 7.82 × 10⁻²⁵), as compared with the A allele. Stratification by ethnicity, tumor localization or type showed similar results. These findings represent important evidence for association of the MUC1 rs4072037 variant with GC risk, and also provide a relatively reliable estimate of effect size. MUC1 is a strong candidate as a susceptibility gene of GC.
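
The pooling and heterogeneity statistics named in this abstract (pooled OR, Cochran's Q, I²) can be sketched as below. The abstract does not say whether a fixed-effect or random-effects model was used, so the inverse-variance fixed-effect model here is an assumption; the function name and the study-level numbers are hypothetical and are not data from the meta-analysis.

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal

def fixed_effect_meta(odds_ratios, ci_lowers, ci_uppers):
    """Pool per-study odds ratios by inverse-variance weighting on the log scale."""
    log_or = [math.log(o) for o in odds_ratios]
    # Recover each standard error from the width of the 95% CI on the log scale.
    se = [(math.log(u) - math.log(l)) / (2 * Z_975)
          for l, u in zip(ci_lowers, ci_uppers)]
    weights = [1.0 / s ** 2 for s in se]
    total_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_or)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_or))
    df = len(log_or) - 1
    # I^2: share of total variation attributed to between-study heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {
        "or": math.exp(pooled),
        "ci": (math.exp(pooled - Z_975 * pooled_se),
               math.exp(pooled + Z_975 * pooled_se)),
        "q": q,
        "df": df,
        "i2": i2,
    }

# Hypothetical study-level estimates, for illustration only.
result = fixed_effect_meta([0.70, 0.75, 0.68], [0.60, 0.62, 0.55], [0.82, 0.91, 0.84])
```

Q is referred to a chi-square distribution with `df` degrees of freedom; when I² is large, a random-effects model is the usual alternative to the fixed-effect pooling sketched here.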

17.
In modern whole-genome scans, the use of stringent thresholds to control the genome-wide testing error distorts the estimation process, producing estimated effect sizes that may be on average far greater in magnitude than the true effect sizes. We introduce a method, based on the estimate of genetic effect and its standard error as reported by standard statistical software, to correct for this bias in case-control association studies. Our approach is widely applicable, is far easier to implement than competing approaches, and may often be applied to published studies without access to the original data. We evaluate the performance of our approach via extensive simulations for a range of genetic models, minor allele frequencies, and genetic effect sizes. Compared to the naive estimation procedure, our approach reduces the bias and the mean squared error, especially for modest effect sizes. We also develop a principled method to construct confidence intervals for the genetic effect that acknowledges the conditioning on statistical significance. Our approach is described in the specific context of odds ratios and logistic modeling but is more widely applicable. Application to recently published data sets demonstrates the relevance of our approach to modern genome scans.

18.
Zhang F, Wang Y, Deng HW 《PloS one》2008,3(10):e3392
Population stratification can cause spurious associations in population-based association studies. Several statistical methods have been proposed to reduce the impact of population stratification on population-based association studies. We simulated a set of stratified populations based on the real haplotype data from the HapMap ENCODE project, and compared the relative power, type I error rates, accuracy and positive prediction value of four prevailing population-based association study methods: traditional case-control tests, structured association (SA), genomic control (GC) and principal components analysis (PCA) under various population stratification levels. Additionally, we evaluated the effects of sample sizes and frequencies of disease susceptible allele on the performance of the four analytical methods in the presence of population stratification. We found that the performance of PCA was very stable under various scenarios. Our comparison results suggest that SA and PCA have comparable performance, if sufficient ancestral informative markers are used in SA analysis. GC appeared to be strongly conservative in significantly stratified populations. It may be better to apply GC in the stratified populations with low stratification level. Our study intends to provide a practical guideline for researchers to select proper study methods and make appropriate inference of the results in population-based association studies.

19.
Mendelian randomization (MR) analysis uses genotypes as instruments to estimate the causal effect of an exposure in the presence of unobserved confounders. The existing MR methods focus on the data generated from prospective cohort studies. We develop a procedure for studying binary outcomes under a case-control design. The proposed procedure is built upon two working models commonly used for MR analyses and adopts a quasi-empirical likelihood framework to address the ascertainment bias from case-control sampling. We derive various approaches for estimating the causal effect and hypothesis testing under the empirical likelihood framework. We conduct extensive simulation studies to evaluate the proposed methods. We find that the proposed empirical likelihood estimate is less biased than the existing estimates. Among all the approaches considered, the Lagrange multiplier (LM) test has the highest power, and the confidence intervals derived from the LM test have the most accurate coverage. We illustrate the use of our method in MR analysis of prostate cancer case-control data with vitamin D level as exposure and three single nucleotide polymorphisms as instruments.

20.
Genome-wide association studies (GWAS) comprise a powerful tool for mapping genes of complex traits. However, an inflation of the test statistic can occur because of population substructure or cryptic relatedness, which could cause spurious associations. If information on a large number of genetic markers is available, adjusting the analysis results by using the method of genomic control (GC) is possible. GC was originally proposed to correct the Cochran-Armitage additive trend test. For non-additive models, correction has been shown to depend on allele frequencies. Therefore, usage of GC is limited to situations where allele frequencies of null markers and candidate markers are matched. In this work, we extended the capabilities of the GC method for non-additive models, which allows us to use null markers with arbitrary allele frequencies for GC. Analytical expressions for the inflation of a test statistic describing its dependency on allele frequency and several population parameters were obtained for recessive, dominant, and over-dominant models of inheritance. We proposed a method to estimate these required population parameters. Furthermore, we suggested a GC method based on approximation of the correction coefficient by a polynomial of allele frequency and described procedures to correct the genotypic (two degrees of freedom) test for cases when the model of inheritance is unknown. Statistical properties of the described methods were investigated using simulated and real data. We demonstrated that all considered methods were effective in controlling type 1 error in the presence of genetic substructure. The proposed GC methods can be applied to statistical tests for GWAS with various models of inheritance. All methods developed and tested in this work were implemented using R language as a part of the GenABEL package.
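
The Cochran-Armitage trend test that GC was originally designed to correct can be written compactly for a 2x3 genotype table, and switching the genotype scores gives the dominant, recessive, and over-dominant versions discussed in this abstract. The sketch below is a plain Python illustration with hypothetical names and counts; it does not use GenABEL and does not reproduce the paper's allele-frequency-dependent correction.

```python
def armitage_trend_chi2(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend statistic (1 df) for a 2x3 genotype table.

    cases, controls: genotype counts ordered by number of risk alleles (0, 1, 2).
    scores: genotype scores that encode the assumed model of inheritance.
    """
    totals = [r + s for r, s in zip(cases, controls)]
    n = sum(totals)
    n_cases = sum(cases)
    n_controls = n - n_cases
    sx_r = sum(x * r for x, r in zip(scores, cases))
    sx_n = sum(x * t for x, t in zip(scores, totals))
    sxx_n = sum(x * x * t for x, t in zip(scores, totals))
    denom = n_cases * n_controls * (n * sxx_n - sx_n ** 2)
    if denom == 0:
        raise ValueError("statistic undefined: no variation in scores or in status")
    return n * (n * sx_r - n_cases * sx_n) ** 2 / denom

# Genotype scores for the models of inheritance named in the abstract.
SCORES = {
    "additive": (0, 1, 2),
    "dominant": (0, 1, 1),
    "recessive": (0, 0, 1),
    "over-dominant": (0, 1, 0),
}

# Hypothetical genotype counts, for illustration only.
cases = (30, 50, 20)
controls = (40, 45, 15)
stats = {model: armitage_trend_chi2(cases, controls, s) for model, s in SCORES.items()}
```

Each statistic would then be divided by an inflation factor estimated from null markers; the abstract's point is that for the non-additive scores this factor depends on allele frequency, so a single constant is not generally adequate.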
