Similar articles
 Found 20 similar articles (search time: 812 ms)
1.
Although variations in allele frequencies at common SNPs have been extensively studied in different populations, little is known about the stratification of rare variants and its impact on association tests. In this paper, we used Affymetrix 500K genotype data from the WTCCC to investigate whether variants in three different frequency categories (below 1%, between 1 and 5%, above 5%) show different stratification patterns in the UK population. We found that these patterns are indeed different. The top principal component extracted from the rare variant category shows poor correlations with any principal component or combination of principal components from the low frequency or common variant categories. These results could suggest that a suitable solution to avoid false positive association due to population stratification would involve adjusting for the respective PCs when testing for variants in different allele frequency categories. However, we found this was not the case on either type 2 diabetes data or simulated data. Indeed, adjusting rare variant association tests on PCs derived from rare variants does no better to correct for population stratification than adjusting on PCs derived from more common variants. Mixed models perform slightly better for low frequency variants than PC based adjustments but less well for the rarest variants. These results call for new methodological developments specifically devoted to addressing rare variant stratification issues in association tests.
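
The three frequency categories named in this abstract can be illustrated with a minimal sketch. The function names and the treatment of values exactly at 1% and 5% are assumptions made for illustration; the abstract does not say how boundary values were assigned.

```python
from typing import Sequence


def minor_allele_frequency(genotypes: Sequence[int]) -> float:
    """Return the minor allele frequency of one biallelic variant.

    `genotypes` holds the alternate-allele count (0, 1, or 2) per individual.
    """
    if not genotypes:
        raise ValueError("need at least one genotype")
    alt_freq = sum(genotypes) / (2 * len(genotypes))
    # The minor allele is whichever of the two alleles is less frequent.
    return min(alt_freq, 1.0 - alt_freq)


def frequency_category(maf: float) -> str:
    """Assign a variant to one of the three categories from the abstract.

    Assumption: values exactly at 1% or 5% go to the low-frequency bin.
    """
    if maf < 0.01:
        return "rare"
    if maf <= 0.05:
        return "low_frequency"
    return "common"
```

Principal components would then be computed separately from the variants in each bin, which is the comparison the abstract reports.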

2.
Studying genomic patterns of human population structure provides important insights into human evolutionary history and the relationship among populations, and it has significant practical implications for disease-gene mapping. Here we describe a principal component (PC)-based approach to studying intracontinental population structure in humans, identify the underlying markers mediating the observed patterns of fine-scale population structure, and infer the predominating evolutionary forces shaping local population structure. We applied this methodology to a data set of 650K SNPs genotyped in 944 unrelated individuals from 52 populations and demonstrate that, although typical PC analyses focus on the top axes of variation, substantial information about population structure is contained in lower-ranked PCs. We identified 18 significant PCs, some of which distinguish individual populations. In addition to visually representing sample clusters in PC biplots, we estimated the set of all SNPs significantly correlated with each of the most informative axes of variation. These polymorphisms, unlike ancestry-informative markers (AIMs), constitute a much larger set of loci that drive genomic signatures of population structure. The genome-wide distribution of these significantly correlated markers can largely be accounted for by the stochastic effects of genetic drift, although significant clustering does occur in genomic regions that have been previously implicated as targets of recent adaptive evolution.

3.
Bouaziz M, Ambroise C, Guedj M. PLoS ONE, 2011, 6(12): e28845
Genome-Wide Association Studies are powerful tools to detect genetic variants associated with diseases. Their results have, however, been questioned, in part because of the bias induced by population stratification. This is a consequence of systematic differences in allele frequencies due to the difference in sample ancestries that can lead to either false positive or false negative findings. Many strategies are available to account for stratification but their performances differ, for instance according to the type of population structure, the disease susceptibility locus minor allele frequency, the degree of sampling imbalance, or the sample size. We focus on the type of population structure and propose a comparison of the most commonly used methods to deal with stratification, namely Genomic Control, Principal Component based methods such as those implemented in Eigenstrat, adjusted Regressions and Meta-Analyses strategies. Our assessment of the methods is based on a large simulation study, involving several scenarios corresponding to many types of population structures. We focused on both false positive rate and power to determine which methods perform the best. Our analysis showed that if there is no population structure, none of the tests led to a bias or decreased the power except for the Meta-Analyses. When the population is stratified, adjusted Logistic Regressions and Eigenstrat are the best solutions to account for stratification even though only the Logistic Regressions are able to consistently maintain correct false positive rates. This study provides more details about these methods. Their advantages and limitations in different stratification scenarios are highlighted in order to propose practical guidelines to account for population stratification in Genome-Wide Association Studies.

4.
Zhang F, Wang Y, Deng HW. PLoS ONE, 2008, 3(10): e3392
Population stratification can cause spurious associations in population-based association studies. Several statistical methods have been proposed to reduce the impact of population stratification on population-based association studies. We simulated a set of stratified populations based on the real haplotype data from the HapMap ENCODE project, and compared the relative power, type I error rates, accuracy and positive predictive value of four prevailing population-based association study methods: traditional case-control tests, structured association (SA), genomic control (GC) and principal components analysis (PCA) under various population stratification levels. Additionally, we evaluated the effects of sample sizes and frequencies of the disease susceptibility allele on the performance of the four analytical methods in the presence of population stratification. We found that the performance of PCA was very stable under various scenarios. Our comparison results suggest that SA and PCA have comparable performance, if sufficient ancestral informative markers are used in SA analysis. GC appeared to be strongly conservative in significantly stratified populations. It may be better to apply GC in stratified populations with a low stratification level. Our study intends to provide a practical guideline for researchers to select proper study methods and make appropriate inference of the results in population-based association studies.
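
As background for the genomic control (GC) method compared here, the following is a minimal sketch of the commonly described GC adjustment: the inflation factor is the median of the observed 1-df chi-square statistics divided by the theoretical median (about 0.4549), and each statistic is divided by that factor. The function names are hypothetical, and flooring the factor at 1 is a common convention assumed here, not something the abstract states.

```python
from statistics import median
from typing import List, Sequence

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation_factor(chi2_stats: Sequence[float]) -> float:
    """Estimate lambda as observed median / expected median under the null."""
    if not chi2_stats:
        raise ValueError("need at least one test statistic")
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def genomic_control_adjust(chi2_stats: Sequence[float]) -> List[float]:
    """Divide every statistic by lambda (assumed floor of 1.0)."""
    lam = max(1.0, genomic_inflation_factor(chi2_stats))
    return [stat / lam for stat in chi2_stats]
```

Because one factor rescales every marker equally, a large lambda shrinks all statistics, which is one way to read the abstract's remark that GC is strongly conservative under heavy stratification.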

5.
Principal components analysis of population admixture (total citations: 1; self-citations: 0; citations by others: 1)
Ma J, Amos CI. PLoS ONE, 2012, 7(7): e40115
With the availability of high-density genotype information, principal components analysis (PCA) is now routinely used to detect and quantify the genetic structure of populations in both population genetics and genetic epidemiology. An important issue is how to make appropriate and correct inferences about population relationships from the results of PCA, especially when admixed individuals are included in the analysis. We extend our recently developed theoretical formulation of PCA to allow for admixed populations. Because the sampled individuals are treated as features, our generalized formulation of PCA directly relates the pattern of the scatter plot of the top eigenvectors to the admixture proportions and parameters reflecting the population relationships, and thus can provide valuable guidance on how to properly interpret the results of PCA in practice. Using our formulation, we theoretically justify the diagnostic of two-way admixture. More importantly, our theoretical investigations based on the proposed formulation yield a diagnostic of multi-way admixture. For instance, we found that admixed individuals with three parental populations are distributed inside the triangle formed by their parental populations and divide the triangle into three smaller triangles whose areas have the same proportions in the big triangle as the corresponding admixture proportions. We tested and illustrated these findings using simulated data and data from HapMap III and the Human Genome Diversity Project.
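
The triangle-area relationship described in this abstract can be illustrated with a small geometric sketch. It assumes that each parental population is summarized by a single point in the plane of the top two eigenvectors, and that the proportion for a given parent corresponds to the sub-triangle opposite that parent (the usual barycentric convention). Both points, and all names below, are assumptions made for illustration rather than details given in the abstract.

```python
from typing import Tuple

Point = Tuple[float, float]


def triangle_area(p: Point, q: Point, r: Point) -> float:
    """Unsigned area of the triangle p-q-r from the 2D cross product."""
    cross = (q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])
    return abs(cross) / 2.0


def admixture_proportions(
    point: Point, parent_a: Point, parent_b: Point, parent_c: Point
) -> Tuple[float, float, float]:
    """Area ratios of the three sub-triangles formed by `point`.

    Only meaningful when `point` lies inside the parental triangle,
    because unsigned areas are used.
    """
    total = triangle_area(parent_a, parent_b, parent_c)
    if total == 0.0:
        raise ValueError("parental points are collinear")
    # The sub-triangle opposite a parent gives that parent's share.
    share_a = triangle_area(point, parent_b, parent_c) / total
    share_b = triangle_area(point, parent_a, parent_c) / total
    share_c = triangle_area(point, parent_a, parent_b) / total
    return share_a, share_b, share_c
```

For a point inside the triangle, the three shares sum to 1, so they can be read directly as proportions.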

6.
The scatter plot is a well known and easily applicable graphical tool to explore relationships between two quantitative variables. For the exploration of relations between multiple variables, generalisations of the scatter plot are useful. We present an overview of multivariate scatter plots focussing on the following situations. Firstly, we look at a scatter plot for portraying relations between quantitative variables within one data matrix. Secondly, we discuss a similar plot for the case of qualitative variables. Thirdly, we describe scatter plots for the relationships between two sets of variables where we focus on correlations. Finally, we treat plots of the relationships between multiple response and predictor variables, focussing on the matrix of regression coefficients. We will present both known and new results, where an important original contribution concerns a procedure for the inclusion of scales for the variables in multivariate scatter plots. We provide software for drawing such scales. We illustrate the construction and interpretation of the plots by means of examples on data collected in a genomic research program on taste in tomato.

7.
Allele substitution effects at quantitative trait loci (QTL) are part of the basis of quantitative genetics theory and applications such as association analysis and genomic prediction. In the presence of nonadditive functional gene action, substitution effects are not constant across populations. We develop an original approach to model the difference in substitution effects across populations as a first order Taylor series expansion from a “focal” population. This expansion involves the difference in allele frequencies and second-order statistical effects (additive by additive and dominance). The change in allele frequencies is a function of relationships (or genetic distances) across populations. As a result, it is possible to estimate the correlation of substitution effects across two populations using three elements: magnitudes of additive, dominance, and additive by additive variances; relationships (Nei’s minimum distances or Fst indexes); and assumed heterozygosities. Similarly, the theory applies as well to distinct generations in a population, in which case the distance across generations is a function of increase of inbreeding. Simulation results confirmed our derivations. Slight biases were observed, depending on the nonadditive mechanism and the reference allele. Our derivations are useful to understand and forecast the possibility of prediction across populations and the similarity of GWAS effects.

8.
In the United States, asthma prevalence and mortality are the highest among Puerto Ricans and the lowest among Mexicans. Case-control association studies are a powerful strategy for identifying genes of modest effect in complex diseases. However, studies of complex disorders in admixed populations such as Latinos may be confounded by population stratification. We used ancestry informative markers (AIMs) to identify and correct for population stratification among Mexican and Puerto Rican subjects participating in case-control studies of asthma. Three hundred and sixty-two subjects with asthma (Mexican: 181, Puerto Rican: 181) and 359 ethnically matched controls (Mexican: 181, Puerto Rican: 178) were genotyped for 44 AIMs. We observed a greater than expected degree of association between pairs of AIMs on different chromosomes in Mexicans (P < 0.00001) and Puerto Ricans (P < 0.00002) providing evidence for population substructure and/or recent admixture. To assess the effect of population stratification on association studies of asthma, we measured differences in genetic background of cases and controls by comparing allele frequencies of the 44 AIMs. Among Puerto Ricans but not in Mexicans, we observed a significant overall difference in allele frequencies between cases and controls (P = 0.0002); of 44 AIMs tested, 8 (18%) were significantly associated with asthma. However, after adjustment for individual ancestry, only two of these markers remained significantly associated with the disease. Our findings suggest that empirical assessment of the effects of stratification is critical to appropriately interpret the results of case-control studies in admixed populations.

9.
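Principal component and cluster analysis and comprehensive evaluation of yield and quality traits in garlic germplasm (total citations: 3; self-citations: 0; citations by others: 3)
Using 40 garlic cultivars as test materials, and following the trait-selection principles of numerical taxonomy, agronomic trait indices were collected during the garlic growing season and after harvest. Principal components were estimated for 16 agronomic traits and 4 quality indices of the 40 cultivars, and a two-dimensional scatter plot and a hierarchical cluster analysis were produced on the basis of the first 3 principal components and the genetic similarity coefficients, respectively. The first 7 principal components of the 40 garlic cultivars had a cumulative contribution rate of 85%. Based on the principal component performance of the cultivar traits, 10 garlic cultivars with superior traits were selected. In the dendrogram, at a genetic similarity level of 0.14, the 40 accessions could be divided into 4 groups: group I, consisting of 5 accessions; group II, comprising 28 accessions; group III, consisting of 4 accessions including Gailiang garlic (改良蒜); and group IV, consisting of 3 accessions including Sulian garlic (苏联蒜). The genetic similarity coefficients of all accessions ranged from 0.07 to 0.64, clearly revealing the genetic relationships among the cultivar groups.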

10.
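Multivariate analysis and evaluation of agronomic traits and quality characteristics in wheat (total citations: 16; self-citations: 0; citations by others: 16)
Principal components were estimated for 11 agronomic traits and 10 quality parameters of 96 wheat cultivars (lines), and two-dimensional ordination analysis and cluster analysis were carried out on the basis of the principal components and Euclidean distances, respectively. The first 4 principal components of the agronomic traits captured 85.3450% of the information in the original data; the first 4 principal components of the quality characteristics represented 89.1483% of the information in the original data. A two-dimensional ordination plot was drawn from the principal component scores of the 96 materials: 27 wheat cultivars (lines) were short-stalked with relatively large grains and flag leaves, good yield potential, and superior overall agronomic traits; 32 wheat cultivars (lines) had relatively high iron and zinc contents, good processing quality, and superior overall quality characteristics. In the hierarchical clustering dendrograms, agronomic traits and quality characteristics were each clustered into 5 groups. Materials with better overall agronomic traits were concentrated mainly in groups III and IV; materials with better overall quality characteristics were concentrated mainly in groups I and II. Comprehensive analysis showed that the wheat cultivars (lines) combining good yield potential with relatively high grain iron and zinc contents and good quality characteristics were Taishan 9818, Xinong 822, Lunxuan 719, Yang-31, Xi'an 837, and Zhongyu 9383. Combining cluster analysis with two-dimensional ordination analysis allows a sound comprehensive evaluation of wheat trait composition and the identification of high-quality, high-yielding wheat cultivars (lines) with superior overall traits, providing elite germplasm resources for wheat genetics and breeding and a reference for rational parent selection.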

11.
Population stratification is a potential problem for genome-wide association studies (GWAS), confounding results and causing spurious associations. Hence, understanding how allele frequencies vary across geographic regions or among subpopulations is an important prelude to analyzing GWAS data. Using over 350,000 genome-wide autosomal SNPs in over 6000 Han Chinese samples from ten provinces of China, our study revealed a one-dimensional “north-south” population structure and a close correlation between geography and the genetic structure of the Han Chinese. The north-south population structure is consistent with the historical migration pattern of the Han Chinese population. Metropolitan cities in China were, however, more diffused “outliers,” probably because of the impact of modern migration of peoples. At a very local scale within the Guangdong province, we observed evidence of population structure among dialect groups, probably on account of endogamy within these dialects. Via simulation, we show that empirical levels of population structure observed across modern China can cause spurious associations in GWAS if not properly handled. In the Han Chinese, geographic matching is a good proxy for genetic matching, particularly in validation and candidate-gene studies in which population stratification cannot be directly accessed and accounted for because of the lack of genome-wide data, with the exception of the metropolitan cities, where geographical location is no longer a good indicator of ancestral origin. Our findings are important for designing GWAS in the Chinese population, an activity that is expected to intensify greatly in the near future.

12.

Background

Here we present convergent methodologies using theoretical calculations, empirical assessment on in-house and publicly available datasets as well as in silico simulations, that validate a panel of SNPs for a variety of necessary tasks in human genetics disease research before resources are committed to larger-scale genotyping studies on those samples. While large-scale well-funded human genetic studies routinely have up to a million SNP genotypes, samples in a human genetics laboratory that are not yet part of such studies may be productively utilized in pilot projects or as part of targeted follow-up work though such smaller scale applications require at least some genome-wide genotype data for quality control purposes such as DNA “barcoding” to detect swaps or contamination issues, determining familial relationships between samples and correcting biases due to population effects such as population stratification in pilot studies.

Principal Findings

Empirical performance in classification of relative types for any two given DNA samples (e.g., full siblings, parental, etc) indicated that for outbred populations the panel performs sufficiently to classify relationship in extended families and therefore also for smaller structures such as trios and for twin zygosity testing. Additionally, familial relationships do not significantly diminish the (mean match) probability of sharing SNP genotypes in pedigrees, further indicating the uniqueness of the “barcode.” Simulation using these SNPs for an African American case-control disease association study demonstrated that population stratification, even in complex admixed samples, can be adequately corrected under a range of disease models using the SNP panel.

Conclusion

The panel has been validated for use in a variety of human disease genetics research tasks including sample barcoding, relationship verification, population substructure detection and statistical correction. Given the ease of genotyping our specific assay contained herein, this panel represents a useful and economical panel for human geneticists.

13.
This study addresses the issue of appropriate allelic frequency estimates in epidemiological studies. Reasons for imprecise estimates of allele frequency may be population stratification, and lack of power of many published studies to define true allele frequencies in the general population. As an example of the lack of power of epidemiological studies, we plot the frequency of GSTM1 deletion versus sample size for the 79 studies from the GSEC pooled analysis. The estimate of allele frequency derived from small groups of controls deviates more from the true frequency than the estimate derived from larger studies. We discuss the possible consequences of not properly defining allele frequencies in the population. This may reflect on the conduct of association studies, on assessment of the effects of multigenic mechanisms, and on the determination of genetic diversity.
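
The observation that small control groups give estimates further from the true frequency follows from ordinary sampling error. A minimal sketch, assuming each control contributes one independent observation and using the standard binomial formula (the function name and the example numbers are illustrative, not taken from the study):

```python
import math


def frequency_standard_error(p: float, n: int) -> float:
    """Standard error of a frequency estimate p from n independent observations."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    return math.sqrt(p * (1.0 - p) / n)


# Illustrative comparison: the same underlying frequency, two sample sizes.
small_study = frequency_standard_error(0.5, 100)
large_study = frequency_standard_error(0.5, 10000)
```

Under this formula the standard error shrinks with the square root of the sample size, so a hundredfold larger control group narrows the uncertainty tenfold.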

14.
Despite the popularity of discriminant analysis of principal components (DAPC) for studying population structure, there has been little discussion of best practice for this method. In this work, I provide guidelines for standardizing the application of DAPC to genotype data sets. An often overlooked fact is that DAPC generates a model describing genetic differences among a set of populations defined by a researcher. Appropriate parameterization of this model is critical for obtaining biologically meaningful results. I show that the number of leading PC axes used as predictors of among-population differences, paxes, should not exceed the k−1 biologically informative PC axes that are expected for k effective populations in a genotype data set. This k−1 criterion for paxes specification is more appropriate compared to the widely used proportional variance criterion, which often results in a choice of paxes ≫ k−1. DAPC parameterized with no more than the leading k−1 PC axes: (i) is more parsimonious; (ii) captures maximal among-population variation on biologically relevant predictors; (iii) is less sensitive to unintended interpretations of population structure; and (iv) is more generally applicable to independent sample sets. Assessing model fit should be routine practice and aids interpretation of population structure. It is imperative that researchers articulate their study goals, that is, testing a priori expectations vs. studying de novo inferred populations, because this has implications on how their DAPC results should be interpreted. The discussion and practical recommendations in this work provide the molecular ecology community with a roadmap for using DAPC in population genetic investigations.

15.
Population genetic signatures of local adaptation are frequently investigated by identifying loci with allele frequencies that exhibit high correlation with ecological variables. One difficulty with this approach is that ecological associations might be confounded by geographic variation at selectively neutral loci. Here, we consider populations that underwent spatial expansion from their original range, and for which geographical variation of adaptive allele frequency coincides with habitat gradients. Using range expansion simulations, we asked whether our ability to detect genomic regions involved in adaptation could be impacted by the orientation of the ecological gradients. For three ecological association methods tested, we found, counter-intuitively, fewer false-positive associations when ecological gradients aligned along the main axis of expansion than when they aligned along any other direction. This result has important consequences for the analysis of genomic data under non-equilibrium population genetic models. Alignment of gradients with expansion axes is likely to be common in scenarios in which expanding species track their ecological niche during climate change while adapting to changing environments at their rear edge.

16.
Evolution of Haplotypes at the DRD2 Locus (total citations: 4; self-citations: 0; citations by others: 4)
We present here the first evolutionary perspective on haplotypes at DRD2, the locus for the dopamine D2 receptor. The dopamine D2 receptor plays a critical role in the functioning of many neural circuits in the human brain. If functionally relevant variation at the DRD2 locus exists, understanding the evolution of haplotypes on the basis of polymorphic sites encompassing the gene should provide a powerful framework for identifying that variation. Three DRD2 polymorphisms (TaqI “A” and “B” RFLPs and the (CA)n short tandem repeat polymorphism) encompassing the coding sequences have been studied in 15 populations; these markers are polymorphic in all the populations studied, and they display strong and significant linkage disequilibria with each other. The common haplotypes for the two TaqI RFLPs are separately derived from the ancestral haplotype but predate the spread of modern humans around the world. The knowledge of how the various haplotypes have evolved, the allele frequencies of the haplotypes in human populations, and the physical relationships of the polymorphisms to each other and to the functional parts of the gene should now allow proper design and interpretation of association studies.

17.
Ewens WJ, Li M, Spielman RS. PLoS Genetics, 2008, 4(9): e1000180
Quantitative trait transmission/disequilibrium tests (quantitative TDTs) are commonly used in family-based genetic association studies of quantitative traits. Despite the availability of various quantitative TDTs, some users are not aware of the properties of these tests and the relationships between them. This review aims at outlining the broad features of the various quantitative TDT procedures carried out in the frequently used QTDT and FBAT packages. Specifically, we discuss the “Rabinowitz” and the “Monks-Kaplan” procedures, as well as the various “Abecasis” and “Allison” regression-based procedures. We focus on the models assumed in these tests and the relationships between them. Moreover, we discuss what hypotheses are tested by the various quantitative TDTs, what testing procedures are best suited to various forms of data, and whether the regression-based tests overcome population stratification problems. Finally, we comment on power considerations in the choice of the test to be used. We hope this brief review will shed light on the similarities and differences of the various quantitative TDTs.

18.
Population stratification remains an important issue in case-control studies of disease-marker association, even within populations considered to be genetically homogeneous. Campbell et al. (Nature Genetics 2005;37:868-872) illustrated this by showing that stratification induced a spurious association between the lactase gene (LCT) and tall/short status in a European American sample. Furthermore, existing approaches for controlling stratification by use of substructure-informative loci (e.g., genomic control, structured association, and principal components) could not resolve this confounding. To address this problem, we propose a simple two-step procedure. In the first step, we model the odds of disease, given data on substructure-informative loci (excluding the test locus). For each participant, we use this model to calculate a stratification score, which is that participant's estimated odds of disease calculated using his or her substructure-informative-loci data in the disease-odds model. In the second step, we assign subjects to strata defined by stratification score and then test for association between the disease and the test locus within these strata. The resulting association test is valid even in the presence of population stratification. Our approach is computationally simple and less model dependent than are existing approaches for controlling stratification. To illustrate these properties, we apply our approach to the data from Campbell et al. and find no association between the LCT locus and tall/short status. Using simulated data, we show that our approach yields a more appropriate correction for stratification than does principal components or genomic control.
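
The second step of the two-step procedure (assign subjects to strata by stratification score, then test within strata) can be sketched as follows. The abstract does not name the stratified test or the number of strata, so the Cochran-Mantel-Haenszel statistic and the default of five equal-sized strata used here are assumptions, as are all function names. The stratification scores themselves are taken as given, since the disease-odds model from the first step is not specified in the abstract.

```python
from typing import List, Sequence, Tuple

# One 2x2 table per stratum:
# (cases with allele, cases without, controls with allele, controls without)
Table = Tuple[int, int, int, int]


def assign_strata(scores: Sequence[float], n_strata: int = 5) -> List[int]:
    """Rank subjects by score and cut the ranking into equal-sized strata."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    strata = [0] * len(scores)
    for rank, idx in enumerate(order):
        strata[idx] = rank * n_strata // len(scores)
    return strata


def cmh_statistic(tables: Sequence[Table]) -> float:
    """Cochran-Mantel-Haenszel chi-square (1 df, no continuity correction)."""
    diff_sum = 0.0
    var_sum = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small carries no information
        expected = (a + b) * (a + c) / n
        variance = (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
        diff_sum += a - expected
        var_sum += variance
    if var_sum == 0.0:
        raise ValueError("no informative strata")
    return diff_sum ** 2 / var_sum
```

In use, one 2x2 table of disease status against test-locus allele would be built per stratum and passed to `cmh_statistic`, with the result compared against a chi-square distribution with 1 degree of freedom.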

19.
Copy number variants (CNVs) contribute to human genetic and phenotypic diversity. However, the distribution of larger CNVs in the general population remains largely unexplored. We identify large variants in ~2500 individuals by using Illumina SNP data, with an emphasis on “hotspots” prone to recurrent mutations. We find variants larger than 500 kb in 5%–10% of individuals and variants greater than 1 Mb in 1%–2%. In contrast to previous studies, we find limited evidence for stratification of CNVs in geographically distinct human populations. Importantly, our sample size permits a robust distinction between truly rare and polymorphic but low-frequency copy number variation. We find that a significant fraction of individual CNVs larger than 100 kb are rare and that both gene density and size are strongly anticorrelated with allele frequency. Thus, although large CNVs commonly exist in normal individuals, which suggests that size alone cannot be used as a predictor of pathogenicity, such variation is generally deleterious. Considering these observations, we combine our data with published CNVs from more than 12,000 individuals contrasting control and neurological disease collections. This analysis identifies known disease loci and highlights additional CNVs (e.g., 3q29, 16p12, and 15q25.2) for further investigation. This study provides one of the first analyses of large, rare (0.1%–1%) CNVs in the general population, with insights relevant to future analyses of genetic disease.

20.