Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
In population-based case-control association studies, the regular chi-square (χ²) test is often used to investigate association between a candidate locus and disease. However, it is well known that this test may be biased in the presence of population stratification and/or genotyping error. Unlike some other biases, this bias will not go away with increasing sample size. On the contrary, the false-positive rate will be much larger when the sample size is increased. The usual family-based designs are robust against population stratification, but they are sensitive to genotyping error. In this article, we propose a novel method of simultaneously correcting for the bias arising from population stratification and/or for the genotyping error in case-control studies. The appropriate corrections depend on sample odds ratios of the standard 2×3 tables of genotype by case and control from null loci. Therefore, the test is simple to apply. The corrected test is robust against misspecification of the genetic model. If the null hypothesis of no association is rejected, the corrections can be further used to estimate the effect of the genetic factor. We conducted a simulation study to investigate the performance of the new method, using parameter values similar to those found in real-data examples. The results show that the corrected test approximately maintains the expected type I error rate under various simulation conditions. It also improves the power of the association test in the presence of population stratification and/or genotyping error. The discrepancy in power between the tests with correction and those without correction tends to be more extreme as the magnitude of the bias becomes larger. Therefore, the bias-correction method proposed in this article should be useful for the genetic analysis of complex traits.

2.
The HapMap project has given case-control association studies a unique opportunity to uncover the genetic basis of complex diseases. However, persistent issues in such studies remain the proper quantification of, testing for, and correction for population stratification (PS). In this paper, we present the first unified paradigm that addresses all three fundamental issues within one statistical framework. Our unified approach makes use of an omnibus quantity (delta), which can be estimated in a case-control study from suitable null loci. We show how this estimated value can be used to quantify PS, to statistically test for PS, and to correct for PS, all in the context of case-control studies. Moreover, we provide guidelines for interpreting values of delta in association studies (e.g., at alpha = 0.05, a delta of size 0.416 is small, a delta of size 0.653 is medium, and a delta of size 1.115 is large). A novel feature of our testing procedure is its ability to test for either strictly any PS or only 'practically important' PS. We also performed simulations to compare our correction procedure with Genomic Control (GC). Our results show that, unlike GC, it maintains good Type I error rates and power across all levels of PS.

3.
When two or more populations have been separated by geographic or cultural boundaries for many generations, drift, spontaneous mutations, differential selection pressures and other factors may lead to allele frequency differences among populations. If these 'parental' populations subsequently come together and begin inter-mating, disequilibrium among linked markers may span a greater genetic distance than it typically does among populations under panmixia [see glossary]. This extended disequilibrium can make association studies highly effective and more economical than disequilibrium mapping in panmictic populations, since fewer marker loci are needed to detect regions of the genome that harbor phenotype-influencing loci. However, under some circumstances, this process of intermating (as well as other processes) can produce disequilibrium between pairs of unlinked loci and thus create the possibility of confounding or spurious associations due to this population stratification. Accordingly, researchers are advised to employ valid statistical tests for linkage disequilibrium mapping that allow genetic association studies to control for such confounding. Many recent papers have addressed this need. We provide a comprehensive review of advances made in recent years in correcting for population stratification and then evaluate and synthesize these methods based on statistical principles such as (1) randomization, (2) conditioning on sufficient statistics, and (3) identifying whether the method is based on testing the genotype-phenotype covariance (conditional upon familial information) and/or testing departures of the marginal distribution from the expected genotypic frequencies.

4.
There are two common designs for association mapping of complex diseases: case-control and family-based designs. A case-control sample is more powerful to detect genetic effects than a family-based sample that contains the same numbers of affected and unaffected persons, although additional markers may be required to control for spurious association. When family and unrelated samples are available, statistical analyses are often performed in the family and unrelated samples separately, conditioning on parental information for the former, thus resulting in reduced power. In this report, we propose a unified approach that can incorporate both family and case-control samples and, provided the additional markers are available, at the same time corrects for population stratification. We apply the principal components of a marker matrix to adjust for the effect of population stratification. This unified approach makes it unnecessary to perform a conditional analysis of the family data and is more powerful than the separate analyses of unrelated and family samples, or a meta-analysis performed by combining the results of the usual separate analyses. This property is demonstrated in both a variety of simulation models and empirical data. The proposed approach can be equally applied to the analysis of both qualitative and quantitative traits.

5.
Population stratification can be a serious obstacle in the analysis of genomewide association studies. We propose a method for evaluating the significance of association scores in whole-genome cohorts with stratification. Our approach is a randomization test akin to a standard permutation test. It conditions on the genotype matrix and thus takes into account not only the population structure but also the complex linkage disequilibrium structure of the genome. As we show in simulation experiments, our method achieves higher power and significantly better control over false-positive rates than do existing methods. In addition, it can be easily applied to whole-genome association studies.

6.
Purcell S  Sham P 《Human heredity》2004,58(2):93-107
OBJECTIVE: To examine the properties of the structured association approach for the detection and correction of population stratification. METHOD: A method is developed, within a latent class analysis framework, similar to the methods proposed by Satten et al. (2001) and Pritchard et al. (2000). A series of simulations illustrate the relative impact of number and type of loci, sample size and population structure. RESULTS: The ability to detect stratification and assign individuals to population strata is determined for a number of different scenarios. CONCLUSION: The results underline the importance of careful marker selection.

7.

Background

The vast majority of genetic risk factors for complex diseases have, taken individually, a small effect on the end phenotype. Population-based association studies therefore need very large sample sizes to detect significant differences between affected and non-affected individuals. Including thousands of affected individuals in a study requires recruitment in numerous centers, possibly from different geographic regions. Unfortunately such a recruitment strategy is likely to complicate the study design and to generate concerns regarding population stratification.

Methodology/Principal Findings

We analyzed 9,751 individuals representing three main ethnic groups - Europeans, Arabs and South Asians - that had been enrolled from 154 centers involving 52 countries for a global case/control study of acute myocardial infarction. All individuals were genotyped at 103 candidate genes using 1,536 SNPs selected with a tagging strategy that captures most of the genetic diversity in different populations. We show that relying solely on self-reported ethnicity is not sufficient to exclude population stratification and we present additional methods to identify and correct for stratification.

Conclusions/Significance

Our results highlight the importance of carefully addressing population stratification and of carefully “cleaning” the sample prior to analyses to obtain stronger signals of association and to avoid spurious results.

8.
Population stratification may confound the results of genetic association studies among unrelated individuals from admixed populations. Several methods have been proposed to estimate ancestry in admixed populations and have been used to adjust for population stratification in genetic association tests. We evaluate the performance of three different methods (maximum likelihood estimation, ADMIXMAP and Structure) using various simulated data sets and real data from Latino subjects participating in a genetic study of asthma. All three methods give ancestry estimates of similar accuracy and control the type I error rate to a similar degree. The most important factor in determining the accuracy of the ancestry estimate and in minimizing the type I error rate is the number of markers used to estimate ancestry. We demonstrate that approximately 100 ancestry informative markers (AIMs) are required to obtain ancestry estimates whose correlation with the true individual ancestral proportions exceeds 0.9. In addition, after accounting for the ancestry information in association tests, the type I error rate is controlled at the 5% level when 100 markers are used to estimate ancestry. However, since the effect of admixture on the type I error rate worsens with sample size, the accuracy of ancestry estimates also needs to increase to make the appropriate correction. Using data from the Latino subjects, we also apply these methods to an association study between body mass index and 44 AIMs. These simulations are meant to provide some practical guidelines for investigators conducting association studies in admixed populations.

9.
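Detection and correction of population stratification in association studies of human complex diseases   Total citations: 2, self-citations: 1, citations by others: 2
Case-control studies are an important genetic-epidemiological method for identifying susceptibility loci of polygenic diseases, and population stratification is one of the main causes of bias, or even spurious association, in the results of case-control association studies. This article describes the methods and principles for detecting and correcting population stratification, including the transmission/disequilibrium test (TDT), which is based on nuclear families, and genomic control (GC) and structured association (SA), which are based on unlinked genomic markers, and compares these methods.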
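
Genomic control (GC), mentioned in the abstract above, is commonly described as rescaling an association statistic by an inflation factor estimated from null markers. The following is a minimal sketch of that idea, assuming 1-degree-of-freedom chi-square statistics; the function name and the example values are hypothetical and do not come from the article.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom (about 0.455).
CHI2_1DF_MEDIAN = 0.4549


def genomic_control(candidate_chi2, null_chi2):
    """Return (inflation factor, corrected candidate statistic).

    null_chi2 holds 1-df chi-square statistics computed at null
    (unlinked) markers. The factor is floored at 1 so that the
    correction never inflates the candidate statistic.
    """
    lam = max(1.0, median(null_chi2) / CHI2_1DF_MEDIAN)
    return lam, candidate_chi2 / lam


# Made-up null statistics, for illustration only.
null_stats = [0.2, 0.5, 0.9, 1.3, 2.0]
lam, corrected = genomic_control(10.0, null_stats)
print(round(lam, 3), round(corrected, 3))
```

The corrected statistic is then compared with the usual 1-df chi-square critical value. Using the median rather than the mean is a design choice that keeps the estimate robust to a few truly associated markers among the presumed null set.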

10.
Background: It is vital that unbiased estimates of relative survival are obtained and reported by cancer registries. A single figure of relative survival is often required to make reporting simpler. This can be obtained by pooling all ages or, more commonly, by using age-standardisation. The various methods for providing a single figure estimate of relative survival can give very different estimates. Methods: The problem is illustrated through an example using Finnish thyroid cancer data. The differences are further explored through a simulation study that investigates the effect of age on the estimates of relative survival. Results: The example highlights that in practice the all-age estimates from the various methods can be substantially different (up to 6 percentage units at 15 years of follow-up). The simulation study confirms that the various methods give differing all-age estimates of relative survival. Performing age-standardisation makes the methods more comparable and results in better estimation of the true net survival. Conclusions: The all-age estimates of relative survival rarely give an appropriate estimate of net survival. We feel that modelling or stratifying by age when calculating relative survival is vitally important as the lack of homogeneity in the cohort of patients leads to potentially biased estimates. We feel that the methods using modelling provide a greater flexibility than life-table based approaches. The flexible parametric approach does not require an arbitrary splitting of the time-scale, which makes it more computationally efficient. It also has the advantage of easily being extended to incorporate time-dependent effects.
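
Age-standardisation, as referred to in the abstract above, is conventionally a weighted average of age-specific estimates. The sketch below illustrates only that arithmetic; the age groups, survival values and weights are hypothetical and are not taken from the Finnish data or from any published standard.

```python
def age_standardised(survival_by_age, weights):
    """Weighted average of age-specific relative survival estimates.

    Weights are normalised, so they may be given as proportions or
    as counts from a standard population.
    """
    if len(survival_by_age) != len(weights):
        raise ValueError("one weight is needed per age group")
    total = sum(weights)
    return sum(s * w for s, w in zip(survival_by_age, weights)) / total


# Hypothetical values for three age groups (young, middle, old).
estimates = [0.90, 0.80, 0.60]
standard_weights = [0.5, 0.3, 0.2]
print(round(age_standardised(estimates, standard_weights), 2))  # 0.81
```

Because the weights come from a fixed standard population rather than from the cohort itself, two cohorts with different age structures become comparable.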

11.
曹宗富  马传香  王雷  蔡斌 《遗传》2010,32(9):921-928
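In genome-wide association studies of complex diseases, population stratification increases the false-positive rate of the results, so it is necessary to consider population genetic structure and to control for stratification. However, the effectiveness of randomly selected SNPs in population stratification studies remains to be explored further. Using Affymetrix SNP 6.0 array genotype data from unrelated individuals in the HapMap Phase 2 populations, this study randomly and evenly selected different numbers of SNPs across the whole genome, and also screened ancestry informative markers (AIMs) using the f value and Fisher's exact test. Data from unrelated individuals in HapMap Phase 3 were then used to evaluate, by F-statistics and STRUCTURE analysis, how well the different selected SNP sets distinguish populations. The study found that SNPs distributed randomly and evenly across the whole genome can be used to identify genetic structure within populations. It further suggests that, in genome-wide association studies, when no AIMs are available for a specific population, more than 3,000 evenly distributed SNPs randomly selected across the genome can be used to control for population stratification.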

12.
A tutorial on statistical methods for population association studies   Total citations: 14, self-citations: 0, citations by others: 14
Although genetic association studies have been with us for many years, even for the simplest analyses there is little consensus on the most appropriate statistical procedures. Here I give an overview of statistical approaches to population association studies, including preliminary analyses (Hardy-Weinberg equilibrium testing, inference of phase and missing data, and SNP tagging), and single-SNP and multipoint tests for association. My goal is to outline the key methods with a brief discussion of problems (population structure and multiple testing), avenues for solutions and some ongoing developments.
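
Of the preliminary analyses listed above, Hardy-Weinberg equilibrium testing is the easiest to illustrate. The sketch below shows the textbook Pearson chi-square version for a biallelic marker; it is not claimed to be the procedure the tutorial recommends, and the function name and counts are hypothetical.

```python
def hwe_chi2(n_aa, n_ab, n_bb):
    """Pearson chi-square statistic (1 df) for Hardy-Weinberg equilibrium.

    Takes the observed genotype counts at a biallelic marker.
    Both alleles must be present, otherwise an expected count is zero.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


print(hwe_chi2(25, 50, 25))  # counts in exact HWE proportions
print(hwe_chi2(50, 0, 50))   # no heterozygotes: strong departure
```

A statistic above 3.84 corresponds to p < 0.05 with one degree of freedom. For small samples or rare alleles an exact test is generally preferred to this approximation.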

13.
A novel approach for association testing in the presence of population stratification has been introduced by Pritchard et al. (2000a) and Pritchard et al. (2000b). The structured association approach is a two-tiered procedure that first estimates the population structure and then tests the null hypothesis H0: 'no association within subpopulations' in the second step. A power comparison of the stratified test for association (STRAT) (Pritchard et al., 2000b) and the Transmission-Disequilibrium-Test (TDT) (Spielman and Ewens, 1993a) in a simulation framework showed superiority of STRAT if allele frequencies or associations between allele and disease differ strongly in subpopulations. In more homogeneous situations, the TDT had greater power than STRAT. However, the TDT, which is based on family trios, needs 50% more genotyping than STRAT, which uses population controls. The Sib-Transmission-Disequilibrium-Test (S-TDT) needs the same amount of genotyping as STRAT, since it relies in its minimal configuration on pairs of siblings. This raises the question of how the S-TDT (Spielman and Ewens, 1998a) performs compared to the population-based methods STRAT and Genomic Controls (GC). In this paper, we present a simulation study accounting for two different models of population stratification in different settings of allele frequencies and under different risk models. The results showed that under a discrete as well as under an admixed population model, STRAT strongly outperformed the S-TDT and the GC when different alleles were associated in different subpopulations. In contrast, the S-TDT had greater power than STRAT when the same allele was associated in both subpopulations. Here, the GC was sometimes even more powerful than the S-TDT, depending on the population model and the allele frequency differences. A general recommendation for the use of one of the tests can therefore not be given.

14.
Population stratification is a form of confounding by ethnicity that may bias effect estimates and inflate test statistics in genetic association studies. Unlinked genetic markers have been used to adjust test statistics, but their use in correcting biased effect estimates has not been addressed. We evaluated the potential of bias correction that could be achieved by a single null marker (M) in studies involving one candidate gene (G). When the distribution of M varied greatly across ethnicities, controlling for M in a logistic regression model substantially reduced biases in odds ratio estimates. When M had the same distribution as G across ethnicities, biases were further reduced or eliminated by subtracting the regression coefficient of M from the coefficient of G in the model, which was fitted either with or without a multiplicative interaction term between M and G. Correction of bias due to population stratification depended specifically on the distributions of G and M, the difference between baseline disease risks across ethnicities, and whether G had an effect on disease risk or not. Our results suggested that marker choice and the specific treatment of that marker in analysis greatly influenced bias correction.

15.
We examine the issue of population stratification in association-mapping studies. In case-control studies of association, population subdivision or recent admixture of populations can lead to spurious associations between a phenotype and unlinked candidate loci. Using a model of sampling from a structured population, we show that if population stratification exists, it can be detected by use of unlinked marker loci. We show that the case-control-study design, using unrelated control individuals, is a valid approach for association mapping, provided that marker loci unlinked to the candidate locus are included in the study, to test for stratification. We suggest guidelines as to the number of unlinked marker loci to use.
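
The abstract above does not spell out the test statistic. One common way to test for stratification with unlinked markers is to sum per-locus case-control chi-square statistics, which is what the following sketch does. It assumes biallelic markers, independent loci, and allele counts treated as independent observations; the function names and counts are hypothetical.

```python
def allelic_chi2(case_a, case_b, ctrl_a, ctrl_b):
    """Pearson chi-square (1 df) for a 2x2 table of allele counts."""
    n = case_a + case_b + ctrl_a + ctrl_b
    numerator = n * (case_a * ctrl_b - case_b * ctrl_a) ** 2
    denominator = ((case_a + case_b) * (ctrl_a + ctrl_b)
                   * (case_a + ctrl_a) * (case_b + ctrl_b))
    return numerator / denominator


def stratification_statistic(loci):
    """Sum the per-locus statistics over unlinked marker loci.

    Returns (statistic, degrees of freedom). With no stratification
    and independent loci, the sum is approximately chi-square
    distributed with one degree of freedom per locus.
    """
    stats = [allelic_chi2(*counts) for counts in loci]
    return sum(stats), len(stats)


# Hypothetical allele counts: (case A, case B, control A, control B).
markers = [(50, 50, 50, 50), (60, 40, 40, 60)]
print(stratification_statistic(markers))  # (8.0, 2)
```

A sum that is large relative to its degrees of freedom suggests that cases and controls differ in genetic background, since unlinked markers should not be associated with disease otherwise.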

16.
Summary: The objective of this study was to compare several selection procedures with respect to expected genetic gain in the population hybrid across a range of initial allelic frequencies, degrees of dominance, and environmental variances. The methods compared were intrapopulation recurrent selection using full-sib or S1 families, full-sib and two half-sib reciprocal recurrent selection procedures, and convergent improvement applied to populations. Comparisons were made by calculating expected allelic frequency changes for each method. The optimal selection method for a given set of allelic frequencies and degree of dominance depended little on the environmental variance. Partly because of its short cycle, full-sib intrapopulation selection was the most effective method for the majority of allelic frequency combinations when the degree of dominance was small and an off-season nursery could be used to make recombinations. With larger values for the degree of dominance, S1 and reciprocal full-sib methods became optimal, the former method especially when favorable alleles had a high frequency and the latter when populations were highly divergent. When off-season nursery use was restricted to making self-pollinations or was absent, S1 selection was optimal for the majority of allelic frequency combinations. Convergent improvement was superior only for extremely divergent allelic frequencies and then only when the degree of dominance was less than 0.10. Half-sib reciprocal methods were never optimal, although the gain for the standard half-sib reciprocal procedure differed little from that of full-sib reciprocal selection when the degree of dominance was 0.75.

17.

Background

Thirty-two common variants associated with body mass index (BMI) have been identified in genome-wide association studies, explaining ∼1.45% of BMI variation in general population cohorts. We performed a genome-wide association study in a sample of young adults enriched for extremely overweight individuals. We aimed to identify new loci associated with BMI and to ascertain whether using an extreme sampling design would identify the variants known to be associated with BMI in general populations.

Methodology/Principal Findings

From two large Danish cohorts we selected all extremely overweight young men and women (n = 2,633), and equal numbers of population-based controls (n = 2,740, drawn randomly from the same populations as the extremes, representing ∼212,000 individuals). We followed up novel (at the time of the study) association signals (p<0.001) from the discovery cohort in a genome-wide study of 5,846 Europeans, before attempting to replicate the most strongly associated 28 SNPs in an independent sample of Danish individuals (n = 20,917) and a population-based cohort of 15-year-old British adolescents (n = 2,418). Our discovery analysis identified SNPs at three loci known to be associated with BMI with genome-wide confidence (P<5×10⁻⁸; FTO, MC4R and FAIM2). We also found strong evidence of association at the known TMEM18, GNPDA2, SEC16B, TFAP2B, SH2B1 and KCTD15 loci (p<0.001), and nominal association (p<0.05) at a further 8 loci known to be associated with BMI. However, meta-analyses of our discovery and replication cohorts identified no novel associations.

Significance

Our results indicate that the detectable genetic variation associated with extreme overweight is very similar to that previously found for general BMI. This suggests that population-based study designs with enriched sampling of individuals with the extreme phenotype may be an efficient method for identifying common variants that influence quantitative traits and a valid alternative to genotyping all individuals in large population-based studies, which may require tens of thousands of subjects to achieve similar power.

18.

Background

In canine genetics, the impact of population structure on whole genome association studies is typically addressed by sampling approximately equal numbers of cases and controls from dogs of a single breed, usually from the same country or geographic area. However one way to increase the power of genetic studies is to sample individuals of the same breed but from different geographic areas, with the expectation that independent meiotic events will have shortened the presumed ancestral haplotype around the mutation differently. Little is known, however, about genetic variation among dogs of the same breed collected from different geographic regions.

Methodology/Principal Findings

In this report, we address the magnitude and impact of genetic diversity among common breeds sampled in the U.S. and Europe. The breeds selected, including the Rottweiler, Bernese mountain dog, flat-coated retriever, and golden retriever, share susceptibility to a class of soft tissue cancers typified by malignant histiocytosis in the Bernese mountain dog. We genotyped 722 SNPs at four unlinked loci (between 95 and 271 per locus) on canine chromosome 1 (CFA1). We showed that each population is characterized by distinct genetic diversity that can be correlated with breed history. When the breed studied has a reduced intra-breed diversity, the combination of dogs from international locations does not increase the rate of false positives and potentially increases the power of association studies. However, over-sampling cases from one geographic location is more likely to lead to false positive results in breeds with significant genetic diversity.

Conclusions

These data provide new guidelines for association studies using purebred dogs that take into account population structure.

19.
We have developed a robust microarray genotyping chip that will help advance studies in genetic epidemiology. In population-based genetic association studies of complex disease, there could be hidden genetic substructure in the study populations, resulting in false-positive associations. Such population stratification may confound efforts to identify true associations between genotype/haplotype and phenotype. Methods relying on genotyping additional null single nucleotide polymorphism (SNP) markers have been proposed, such as genomic control (GC) and structured association (SA), to correct association tests for population stratification. If there is an association of a disease with null SNPs, this suggests that there is a population subset with different genetic background plus different disease susceptibility. Genotyping over 100 null SNPs in the large numbers of patient and control DNA samples that are required in genetic association studies can be prohibitively expensive. We have therefore developed and tested a resequencing chip based on arrayed primer extension (APEX) from over 2000 DNA probe features that facilitate multiple interrogations of each SNP, providing a powerful, accurate, and economical means to simultaneously determine the genotypes at 110 null SNP loci in any individual. Based on 1141 known genotypes from other research groups, our GC SNP chip has an accuracy of 98.5%, including non-calls.

20.
Sensitivity and specificity are common measures of the accuracy of a diagnostic test. The usual estimators of these quantities are unbiased if data on the diagnostic test result and the true disease status are obtained from all subjects in an appropriately selected sample. In some studies, verification of the true disease status is performed only for a subset of subjects, possibly depending on the result of the diagnostic test and other characteristics of the subjects. Estimators of sensitivity and specificity based on this subset of subjects are typically biased; this is known as verification bias. Methods have been proposed to correct verification bias under the assumption that the missing data on disease status are missing at random (MAR), that is, the probability of missingness depends on the true (missing) disease status only through the test result and observed covariate information. When some of the covariates are continuous, or the number of covariates is relatively large, the existing methods require parametric models for the probability of disease or the probability of verification (given the test result and covariates), and hence are subject to model misspecification. We propose a new method for correcting verification bias based on the propensity score, defined as the predicted probability of verification given the test result and observed covariates. This is estimated separately for those with positive and negative test results. The new method classifies the verified sample into several subsamples that have homogeneous propensity scores and allows correction for verification bias. Simulation studies demonstrate that the new estimators are more robust to model misspecification than existing methods, but still perform well when the models for the probability of disease and probability of verification are correctly specified.
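
To make the MAR-based correction concrete, the sketch below implements the classical Begg and Greenes style estimator for the simplest case with no covariates. This is one of the existing approaches the abstract alludes to, not the propensity-score method it proposes; the function name and all counts are hypothetical.

```python
def corrected_accuracy(n_pos, n_neg, ver_pos_d, ver_pos_nd,
                       ver_neg_d, ver_neg_nd):
    """Verification-bias-corrected (sensitivity, specificity).

    n_pos, n_neg: all subjects with a positive / negative test result.
    ver_*: verified subjects only, split by test result and by
    diseased (d) or not diseased (nd).
    Assumes verification depends on the test result alone (MAR).
    """
    # Disease probability given the test result, from verified subjects.
    p_d_pos = ver_pos_d / (ver_pos_d + ver_pos_nd)
    p_d_neg = ver_neg_d / (ver_neg_d + ver_neg_nd)
    # Expected diseased / non-diseased counts in the full sample.
    d_pos, d_neg = n_pos * p_d_pos, n_neg * p_d_neg
    nd_pos, nd_neg = n_pos - d_pos, n_neg - d_neg
    return d_pos / (d_pos + d_neg), nd_neg / (nd_neg + nd_pos)


# Hypothetical study: 100 test positives (80 verified) and
# 400 test negatives (40 verified).
sens, spec = corrected_accuracy(100, 400, 60, 20, 4, 36)
print(round(sens, 3), round(spec, 3))
```

In this made-up example the naive estimates from verified subjects alone would be 60/64 for sensitivity and 36/56 for specificity, which shows the direction of the bias when test positives are verified more often than test negatives.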
