共查询到20条相似文献,搜索用时 0 毫秒
1.
Simultaneously correcting for population stratification and for genotyping error in case-control association studies
下载免费PDF全文

In population-based case-control association studies, the regular chi (2) test is often used to investigate association between a candidate locus and disease. However, it is well known that this test may be biased in the presence of population stratification and/or genotyping error. Unlike some other biases, this bias will not go away with increasing sample size. On the contrary, the false-positive rate will be much larger when the sample size is increased. The usual family-based designs are robust against population stratification, but they are sensitive to genotype error. In this article, we propose a novel method of simultaneously correcting for the bias arising from population stratification and/or for the genotyping error in case-control studies. The appropriate corrections depend on sample odds ratios of the standard 2x3 tables of genotype by case and control from null loci. Therefore, the test is simple to apply. The corrected test is robust against misspecification of the genetic model. If the null hypothesis of no association is rejected, the corrections can be further used to estimate the effect of the genetic factor. We considered a simulation study to investigate the performance of the new method, using parameter values similar to those found in real-data examples. The results show that the corrected test approximately maintains the expected type I error rate under various simulation conditions. It also improves the power of the association test in the presence of population stratification and/or genotyping error. The discrepancy in power between the tests with correction and those without correction tends to be more extreme as the magnitude of the bias becomes larger. Therefore, the bias-correction method proposed in this article should be useful for the genetic analysis of complex traits. 相似文献
2.
A unified association analysis approach for family and unrelated samples correcting for stratification 总被引:1,自引:0,他引:1
下载免费PDF全文

There are two common designs for association mapping of complex diseases: case-control and family-based designs. A case-control sample is more powerful to detect genetic effects than a family-based sample that contains the same numbers of affected and unaffected persons, although additional markers may be required to control for spurious association. When family and unrelated samples are available, statistical analyses are often performed in the family and unrelated samples separately, conditioning on parental information for the former, thus resulting in reduced power. In this report, we propose a unified approach that can incorporate both family and case-control samples and, provided the additional markers are available, at the same time corrects for population stratification. We apply the principal components of a marker matrix to adjust for the effect of population stratification. This unified approach makes it unnecessary to perform a conditional analysis of the family data and is more powerful than the separate analyses of unrelated and family samples, or a meta-analysis performed by combining the results of the usual separate analyses. This property is demonstrated in both a variety of simulation models and empirical data. The proposed approach can be equally applied to the analysis of both qualitative and quantitative traits. 相似文献
3.
S Lee MJ Emond MJ Bamshad KC Barnes MJ Rieder DA Nickerson;NHLBI GO Exome Sequencing Project—ESP Lung Project Team DC Christiani MM Wurfel X Lin 《American journal of human genetics》2012,91(2):224-237
We propose in this paper a unified approach for testing the association between rare variants and phenotypes in sequencing association studies. This approach maximizes power by adaptively using the data to optimally combine the burden test and the nonburden sequence kernel association test (SKAT). Burden tests are more powerful when most variants in a region are causal and the effects are in the same direction, whereas SKAT is more powerful when a large fraction of the variants in a region are noncausal or the effects of causal variants are in different directions. The proposed unified test maintains the power in both scenarios. We show that the unified test corresponds to the optimal test in an extended family of SKAT tests, which we refer to as SKAT-O. The second goal of this paper is to develop a small-sample adjustment procedure for the proposed methods for the correction of conservative type I error rates of SKAT family tests when the trait of interest is dichotomous and the sample size is small. Both small-sample-adjusted SKAT and the optimal unified test (SKAT-O) are computationally efficient and can easily be applied to genome-wide sequencing association studies. We evaluate the finite sample performance of the proposed methods using extensive simulation studies and illustrate their application using the acute-lung-injury exome-sequencing data of the National Heart, Lung, and Blood Institute Exome Sequencing Project. 相似文献
4.
Population stratification remains an important issue in case-control studies of disease-marker association, even within populations considered to be genetically homogeneous. Campbell et al. (Nature Genetics 2005;37:868-872) illustrated this by showing that stratification induced a spurious association between the lactase gene (LCT) and tall/short status in a European American sample. Furthermore, existing approaches for controlling stratification by use of substructure-informative loci (e.g., genomic control, structured association, and principal components) could not resolve this confounding. To address this problem, we propose a simple two-step procedure. In the first step, we model the odds of disease, given data on substructure-informative loci (excluding the test locus). For each participant, we use this model to calculate a stratification score, which is that participant's estimated odds of disease calculated using his or her substructure-informative-loci data in the disease-odds model. In the second step, we assign subjects to strata defined by stratification score and then test for association between the disease and the test locus within these strata. The resulting association test is valid even in the presence of population stratification. Our approach is computationally simple and less model dependent than are existing approaches for controlling stratification. To illustrate these properties, we apply our approach to the data from Campbell et al. and find no association between the LCT locus and tall/short status. Using simulated data, we show that our approach yields a more appropriate correction for stratification than does principal components or genomic control. 相似文献
5.
Population stratification can cause spurious associations in population-based association studies. Several statistical methods have been proposed to reduce the impact of population stratification on population-based association studies. We simulated a set of stratified populations based on the real haplotype data from the HapMap ENCODE project, and compared the relative power, type I error rates, accuracy and positive prediction value of four prevailing population-based association study methods: traditional case-control tests, structured association (SA), genomic control (GC) and principal components analysis (PCA) under various population stratification levels. Additionally, we evaluated the effects of sample sizes and frequencies of disease susceptible allele on the performance of the four analytical methods in the presence of population stratification. We found that the performance of PCA was very stable under various scenarios. Our comparison results suggest that SA and PCA have comparable performance, if sufficient ancestral informative markers are used in SA analysis. GC appeared to be strongly conservative in significantly stratified populations. It may be better to apply GC in the stratified populations with low stratification level. Our study intends to provide a practical guideline for researchers to select proper study methods and make appropriate inference of the results in population-based association studies. 相似文献
6.
A randomization test for controlling population stratification in whole-genome association studies
下载免费PDF全文

Kimmel G Jordan MI Halperin E Shamir R Karp RM 《American journal of human genetics》2007,81(5):895-905
Population stratification can be a serious obstacle in the analysis of genomewide association studies. We propose a method for evaluating the significance of association scores in whole-genome cohorts with stratification. Our approach is a randomization test akin to a standard permutation test. It conditions on the genotype matrix and thus takes into account not only the population structure but also the complex linkage disequilibrium structure of the genome. As we show in simulation experiments, our method achieves higher power and significantly better control over false-positive rates than do existing methods. In addition, it can be easily applied to whole-genome association studies. 相似文献
7.
Testing for population subdivision and association in four case-control studies 总被引:13,自引:0,他引:13
下载免费PDF全文

Population structure has been presumed to cause many of the unreplicated disease-marker associations reported in the literature, yet few actual case-control studies have been evaluated for the presence of structure. Here, we examine four moderate case-control samples, comprising 3,472 individuals, to determine if detectable population subdivision is present. The four population samples include: 500 U.S. whites and 236 African Americans with hypertension; and 500 U.S. whites and 500 Polish whites with type 2 diabetes, all with matched control subjects. Both diabetes populations were typed for the PPARg Pro12Ala polymorphism, to replicate this well-supported association (Altshuler et al. 2000). In each of the four samples, we tested for structure, using the sum of the case-control allele frequency chi(2) statistics for 9 STR and 35 SNP markers (Pritchard and Rosenberg 1999). We found weak evidence for population structure in the African American sample only, but further refinement of the sample, to include only individuals with U.S.-born parents and grandparents, eliminated the stratification. Our examples provide insight into the factors affecting the replication of association studies and suggest that carefully matched, moderate-sized case-control samples in cosmopolitan U.S. and European populations are unlikely to contain levels of structure that would result in significantly inflated numbers of false-positive associations. We explore the role that extreme differences in power among studies, due to sample size and risk-allele frequency differences, may play in the replication problem. 相似文献
8.
Background
The vast majority of genetic risk factors for complex diseases have, taken individually, a small effect on the end phenotype. Population-based association studies therefore need very large sample sizes to detect significant differences between affected and non-affected individuals. Including thousands of affected individuals in a study requires recruitment in numerous centers, possibly from different geographic regions. Unfortunately such a recruitment strategy is likely to complicate the study design and to generate concerns regarding population stratification.Methodology/Principal Findings
We analyzed 9,751 individuals representing three main ethnic groups - Europeans, Arabs and South Asians - that had been enrolled from 154 centers involving 52 countries for a global case/control study of acute myocardial infarction. All individuals were genotyped at 103 candidate genes using 1,536 SNPs selected with a tagging strategy that captures most of the genetic diversity in different populations. We show that relying solely on self-reported ethnicity is not sufficient to exclude population stratification and we present additional methods to identify and correct for stratification.Conclusions/Significance
Our results highlight the importance of carefully addressing population stratification and of carefully “cleaning” the sample prior to analyses to obtain stronger signals of association and to avoid spurious results. 相似文献9.
10.
11.
In many case-control genetic association studies, a set of correlated secondary phenotypes that may share common genetic factors with disease status are collected. Examination of these secondary phenotypes can yield valuable insights about the disease etiology and supplement the main studies. However, due to unequal sampling probabilities between cases and controls, standard regression analysis that assesses the effect of SNPs (single nucleotide polymorphisms) on secondary phenotypes using cases only, controls only, or combined samples of cases and controls can yield inflated type I error rates when the test SNP is associated with the disease. To solve this issue, we propose a Gaussian copula-based approach that efficiently models the dependence between disease status and secondary phenotypes. Through simulations, we show that our method yields correct type I error rates for the analysis of secondary phenotypes under a wide range of situations. To illustrate the effectiveness of our method in the analysis of real data, we applied our method to a genome-wide association study on high-density lipoprotein cholesterol (HDL-C), where "cases" are defined as individuals with extremely high HDL-C level and "controls" are defined as those with low HDL-C level. We treated 4 quantitative traits with varying degrees of correlation with HDL-C as secondary phenotypes and tested for association with SNPs in LIPG, a gene that is well known to be associated with HDL-C. We show that when the correlation between the primary and secondary phenotypes is >0.2, the P values from case-control combined unadjusted analysis are much more significant than methods that aim to correct for ascertainment bias. Our results suggest that to avoid false-positive associations, it is important to appropriately model secondary phenotypes in case-control genetic association studies. 相似文献
12.
Association studies are traditionally performed in the case-control framework. As a first step in the analysis process, comparing allele frequencies using the Pearson's chi-square statistic is often invoked. However such an approach assumes the independence of alleles under the hypothesis of no association, which may not always be the case. Consequently this method introduces a bias that deviates the expected type I error-rate. In this article we first propose an unbiased and exact test as an alternative to the biased allelic test. Available data require to perform thousands of such tests so we focused on its fast execution. Since the biased allelic test is still widely used in the community, we illustrate its pitfalls in the context of genome-wide association studies and particularly in the case of low-level tests. Finally, we compare the unbiased and exact test with the Cochran-Armitage test for trend and show it perfoms similarly in terms of power. The fast, unbiased and exact allelic test code is available in R, C++ and Perl at: http://stat.genopole.cnrs.fr/software/fueatest. 相似文献
13.
In genetic association studies, due to the varying underlying genetic models, no single statistical test can be the most powerful test under all situations. Current studies show that if the underlying genetic models are known, trend-based tests, which outperform the classical Pearson χ2 test, can be constructed. However, when the underlying genetic models are unknown, the χ2 test is usually more robust than trend-based tests. In this paper, we propose a new association test based on a generalized genetic model, namely the generalized order-restricted relative risks model. Through a Monte Carlo simulation study, we show that the proposed association test is generally more powerful than the χ2 test, and more robust than those trend-based tests. The proposed methodologies are also illustrated by some real SNP datasets. 相似文献
14.
We introduce a new powerful nonparametric testing strategy for family-based association studies in which multiple quantitative traits are recorded and the phenotype with the strongest genetic component is not known prior to the analysis. In the first stage, using a population-based test based on the generalized estimating equation approach, we test all recorded phenotypes for association with the marker locus without biasing the nominal significance level of the later family-based analysis. In the second stage the phenotype with the smallest p value is selected and tested by a family-based association test for association with the marker locus. This strategy is robust against population admixture and stratification and does not require any adjustment for multiple testing. We demonstrate the advantages of this testing strategy over standard methodology in a simulation study. The practical importance of our testing strategy is illustrated by applications to the Childhood Asthma Management Program asthma data sets. 相似文献
15.
There is growing interest in genomewide association analysis using single-nucleotide polymorphisms (SNPs), because traditional linkage studies are not as powerful in identifying genes for common, complex diseases. Tests for linkage disequilibrium have been developed for binary and quantitative traits. However, since many human conditions and diseases are measured in an ordinal scale, methods need to be developed to investigate the association of genes and ordinal traits. Thus, in the current report we propose and derive a score test statistic that identifies genes that are associated with ordinal traits when gametic disequilibrium between a marker and trait loci exists. Through simulation, the performance of this new test is examined for both ordinal traits and quantitative traits. The proposed statistic not only accommodates and is more powerful for ordinal traits, but also has similar power to that of existing tests when the trait is quantitative. Therefore, our proposed statistic has the potential to serve as a unified approach to identifying genes that are associated with any trait, regardless of how the trait is measured. We further demonstrated the advantage of our test by revealing a significant association (P = 0.00067) between alcohol dependence and a SNP in the growth-associated protein 43. 相似文献
16.
17.
Population stratification is a form of confounding by ethnicity that may cause bias to effect estimates and inflate test statistics in genetic association studies. Unlinked genetic markers have been used to adjust for test statistics, but their use in correcting biased effect estimates has not been addressed. We evaluated the potential of bias correction that could be achieved by a single null marker (M) in studies involving one candidate gene (G). When the distribution of M varied greatly across ethnicities, controlling for M in a logistic regression model substantially reduced biases on odds ratio estimates. When M had same distributions as G across ethnicities, biases were further reduced or eliminated by subtracting the regression coefficient of M from the coefficient of G in the model, which was fitted either with or without a multiplicative interaction term between M and G. Correction of bias due to population stratification depended specifically on the distributions of G and M, the difference between baseline disease risks across ethnicities, and whether G had an effect on disease risk or not. Our results suggested that marker choice and the specific treatment of that marker in analysis greatly influenced bias correction. 相似文献
18.
There has been considerable debate in the literature concerning bias in case-control association mapping studies due to population stratification. In this paper, we perform a theoretical analysis of the effects of population stratification by measuring the inflation in the test's type I error (or false-positive rate). Using a model of stratified sampling, we derive an exact expression for the type I error as a function of population parameters and sample size. We give necessary and sufficient conditions for the bias to vanish when there is no statistical association between disease and marker genotype in each of the subpopulations making up the total population. We also investigate the variation of bias with increasing subpopulations and show, both theoretically and by using simulations, that the bias can sometimes be quite substantial even with a very large number of subpopulations. In a companion simulation-based paper (Heiman et al., Part I, this issue), we have focused on the CRR (confounding risk ratio) and its relationship to the type I error in the case of two subpopulations, and have also quantified the magnitude of the type I error that can occur with relatively low CRR values. 相似文献
19.
Use of unlinked genetic markers to detect population stratification in association studies. 总被引:38,自引:0,他引:38
下载免费PDF全文

We examine the issue of population stratification in association-mapping studies. In case-control studies of association, population subdivision or recent admixture of populations can lead to spurious associations between a phenotype and unlinked candidate loci. Using a model of sampling from a structured population, we show that if population stratification exists, it can be detected by use of unlinked marker loci. We show that the case-control-study design, using unrelated control individuals, is a valid approach for association mapping, provided that marker loci unlinked to the candidate locus are included in the study, to test for stratification. We suggest guidelines as to the number of unlinked marker loci to use. 相似文献
20.
Tsai HJ Kho JY Shaikh N Choudhry S Naqvi M Navarro D Matallana H Castro R Lilly CM Watson HG Meade K Lenoir M Thyne S Ziv E Burchard EG 《Human genetics》2006,118(5):626-639
Case-control genetic association studies in admixed populations are known to be susceptible to genetic confounding due to population stratification. The transmission/disequilibrium test (TDT) approach can avoid this problem. However, the TDT is expensive and impractical for late-onset diseases. Case-control study designs, in which, cases and controls are matched by admixture, can be an appealing and a suitable alternative for genetic association studies in admixed populations. In this study, we applied this matching strategy when recruiting our African American participants in the Study of African American, Asthma, Genes and Environments. Group admixture in this cohort consists of 83% African ancestry and 17% European ancestry, which was consistent with reports from other studies. By carrying out several complementary analyses, our results show that there is a substructure in the cohort, but that the admixture distributions are almost identical in cases and controls, and also in cases only. We performed association tests for asthma-related traits with ancestry, and only found that FEV(1), a measure for baseline pulmonary function, was associated with ancestry after adjusting for socio-economic and environmental risk factors (P=0.01). We did not observe an excess of type I error rate in our association tests for ancestry informative markers and asthma-related phenotypes when ancestry was not adjusted in the analyses. Furthermore, using the association tests between genetic variants in a known asthma candidate gene, beta(2) adrenergic receptor (beta(2)AR) and DeltaFEF(25-75), an asthma-related phenotype, as an example, we demonstrated population stratification was not a confounder in our genetic association. Our present work demonstrates that admixture-matched case-control strategies can efficiently control population stratification confounding in admixed populations. 相似文献