Similar Articles
1.
To date, most genome-wide association studies (GWAS) and studies of fine-scale population structure have been conducted primarily on Europeans. Han Chinese, the largest ethnic group in the world and about 20% of the global human population, are largely underrepresented in such studies. A well-recognized challenge is that population structure can cause spurious associations in GWAS. In this study, we examined population substructure in a diverse set of over 1,700 Han Chinese samples collected from 26 regions across China, each genotyped at ∼160K single-nucleotide polymorphisms (SNPs). Our results showed that the Han Chinese population is intricately substructured, with the main observed clusters corresponding roughly to northern, central, and southern Han. Simulated case-control studies further showed that genetic differentiation among these clusters, although very small (FST = 0.0002 to 0.0009), is sufficient to inflate the false-positive rate even at moderate sample sizes. The two SNPs with the greatest frequency differences between the northern and southern Han clusters (FST > 0.06) were found in the FADS2 gene, which is associated with the fatty acid composition of phospholipids, and in the HLA complex P5 gene (HCP5), which is associated with HIV infection, psoriasis, and psoriatic arthritis. Ingenuity Pathway Analysis (IPA) showed that most of the genes differentiated among clusters are involved in cardiac arteriopathy (p < 10⁻¹⁰¹). These signals of significant differences among Han Chinese subpopulations should be interpreted with caution if they are also detected in association studies, especially when sample sources are diverse.
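
The FST values quoted above measure how far allele frequencies diverge between clusters. As a rough illustration, the sketch below computes a ratio-of-averages FST for two equally weighted populations from per-SNP allele frequencies, using the heterozygosity form (H_T − H_S)/H_T. This simplified estimator is an assumption made for illustration; the study may have used a different one (for example Weir and Cockerham's), and the name `fst_two_pops` is hypothetical.

```python
from typing import Sequence


def fst_two_pops(p1: Sequence[float], p2: Sequence[float]) -> float:
    """Ratio-of-averages F_ST for two equally weighted populations.

    p1, p2: frequencies of the same allele at the same biallelic SNPs.
    F_ST = sum(H_T - H_S) / sum(H_T), summed over SNPs.
    """
    if len(p1) != len(p2) or len(p1) == 0:
        raise ValueError("need two non-empty frequency lists of equal length")
    num = 0.0  # accumulates H_T - H_S
    den = 0.0  # accumulates H_T
    for a, b in zip(p1, p2):
        pbar = (a + b) / 2.0
        h_t = 2.0 * pbar * (1.0 - pbar)      # heterozygosity of the pooled population
        h_s = a * (1.0 - a) + b * (1.0 - b)  # mean of 2p(1-p) over the two populations
        num += h_t - h_s
        den += h_t
    return num / den if den > 0 else 0.0


# Example: one SNP at 40% in one cluster and 60% in the other.
print(round(fst_two_pops([0.4], [0.6]), 6))  # prints 0.04
```

Summing numerators and denominators across SNPs before dividing, rather than averaging per-SNP ratios, keeps low-diversity SNPs from dominating the genome-wide value.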

2.
Jiang N  Wang M  Jia T  Wang L  Leach L  Hackett C  Marshall D  Luo Z 《PloS one》2011,6(8):e23192

Background

It is well established that the theoretical core of the rapidly expanding genome-wide association study (GWAS) approach is statistical inference of linkage disequilibrium (LD) between a tested genetic marker and a putative locus affecting a disease trait. However, LD analysis is vulnerable to several confounding factors, of which population stratification is the most prominent. Whilst many methods have been proposed to correct for its influence, either by predicting structure parameters or by correcting the inflation of the test statistic caused by stratification, these may not be feasible or may introduce further statistical problems in practical implementation.

Methodology

We propose here a novel statistical method to control spurious LD arising from population structure in GWAS by incorporating a control marker into the significance test for the genetic association of a polymorphic marker with phenotypic variation in a complex trait. The method avoids the need for structure prediction, which may be infeasible or inadequate in practice, and properly accounts for the varying effect of population stratification on different regions of the genome under study. The utility and statistical properties of the new method were tested through an intensive computer simulation study and an association-based genome-wide mapping of expression quantitative trait loci in genetically divergent human populations.

Results/Conclusions

The analyses show that, compared with two other widely implemented methods in the GWAS literature, the new method confers improved statistical power for detecting genuine genetic association in subpopulations and effective control of spurious associations stemming from population structure.
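
The abstract contrasts the new method with approaches that correct the inflation of the test statistic caused by stratification. The best-known of these is genomic control, sketched below under the usual assumption of 1-df association chi-square statistics: the inflation factor λ is the median observed statistic divided by the median of a χ² distribution with 1 degree of freedom (about 0.4549), and every statistic is divided by λ when λ > 1. This illustrates the comparison class only; it is not the control-marker method proposed in the paper, and `genomic_control` is a hypothetical name.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control(chi2_stats):
    """Genomic-control inflation factor and deflated 1-df chi-square statistics."""
    lam = statistics.median(chi2_stats) / CHI2_1DF_MEDIAN
    # By convention only inflation (lambda > 1) is corrected.
    scale = max(lam, 1.0)
    return lam, [x / scale for x in chi2_stats]


# Hypothetical statistics whose median is twice the null median.
lam, adjusted = genomic_control([0.2, 0.9098728, 3.0, 0.5, 1.5])
print(round(lam, 3), [round(x, 3) for x in adjusted])
```

Because λ is a single genome-wide constant, genomic control cannot capture region-specific effects of stratification, which the proposed method is designed to handle.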

3.
Cui B  Zhu X  Xu M  Guo T  Zhu D  Chen G  Li X  Xu L  Bi Y  Chen Y  Xu Y  Li X  Wang W  Wang H  Huang W  Ning G 《PloS one》2011,6(7):e22353

Background

Genome-wide association studies (GWAS) have identified more than 30 loci associated with type 2 diabetes (T2D) in Caucasians. However, genomic understanding of T2D in Asians, especially Han Chinese, is still limited.

Methods and Principal Findings

A two-stage GWAS was performed in Han Chinese from Mainland China. The discovery stage included 793 T2D cases and 806 healthy controls genotyped using Illumina Human 660- and 610-Quad BeadChips, and the replication stage included two independent case-control populations (a total of 4,445 T2D cases and 4,458 controls) genotyped using TaqMan assays. We validated the associations of KCNQ1 (rs163182, p = 2.085×10⁻¹⁷, OR 1.28) and C2CD4A/B (rs1370176, p = 3.677×10⁻⁴, OR 1.124; rs1436953, p = 7.753×10⁻⁶, OR 1.141; rs7172432, p = 4.001×10⁻⁵, OR 1.134) in Han Chinese.

Conclusions and Significance

Our study represents the first GWAS of T2D with both discovery and replication sample sets recruited from Han Chinese men and women residing in Mainland China. We confirmed the associations of KCNQ1 and C2CD4A/B with T2D, the latter being examined in Han Chinese for the first time. Arguably, eight more independent loci were also replicated in our GWAS.

4.
Although genome-wide association studies (GWASs) have discovered numerous novel genetic variants associated with many complex traits and diseases, those variants typically explain only a small fraction of phenotypic variance. Other factors that account for phenotypic variance include environmental factors and gene-by-environment interactions (GEIs). Recently, several studies have conducted genome-wide gene-by-environment association analyses and demonstrated important roles of GEIs in complex traits. One of the main challenges in these association studies is to control for the effects of population structure, which may cause spurious associations. Many studies have analyzed how population structure influences statistics for genetic variants, and several statistical approaches have been developed to correct for it. However, the impact of population structure on GEI statistics in GWASs has not been extensively studied, nor have methods been designed to correct for population structure in GEI statistics. In this paper, we show both analytically and empirically that population structure may cause spurious GEIs, using both simulation and two GWAS datasets to support our finding. We propose a statistical approach based on mixed models to account for population structure in GEI statistics. We find that our approach effectively controls for population structure in statistics for GEIs as well as for genetic variants.
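
The GEI statistics discussed above come from regressing the phenotype on genotype, environment and their product. The sketch below fits that model by ordinary least squares with optional ancestry covariates (for example, top genotype principal components), which is a simple fixed-effect adjustment. It is a stand-in for illustration, not the mixed-model approach the authors propose, and the function name `gei_ols` is hypothetical.

```python
import numpy as np


def gei_ols(y, g, e, covariates=None):
    """OLS fit of y ~ 1 + g + e + g*e (+ covariates).

    g: genotype (0/1/2), e: environmental exposure, covariates: optional
    (n, k) array such as genotype principal components.
    Returns coefficients; index 3 is the gene-by-environment interaction.
    """
    y = np.asarray(y, dtype=float)
    g = np.asarray(g, dtype=float)
    e = np.asarray(e, dtype=float)
    cols = [np.ones_like(g), g, e, g * e]
    if covariates is not None:
        cov = np.asarray(covariates, dtype=float).reshape(len(g), -1)
        cols.extend(cov.T)  # one design-matrix column per covariate
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


rng = np.random.default_rng(0)
g = rng.integers(0, 3, size=200)
e = rng.normal(size=200)
y = 1 + 0.5 * g + 0.3 * e + 0.2 * g * e  # noise-free toy phenotype
print(np.round(gei_ols(y, g, e), 3))
```

In a real analysis the interaction coefficient would be tested against its standard error; adding covariates that track ancestry is one way to keep allele-frequency and exposure differences between subpopulations from masquerading as an interaction.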

5.
Unaccounted-for population stratification can lead to spurious associations in genome-wide association studies (GWAS), and several methods have been proposed to deal with this problem. An alternative line of research uses whole-genome random regression (WGRR) models that fit all markers simultaneously. Important objectives in WGRR studies are to estimate the proportion of variance accounted for by the markers and the effects of individual markers, and to predict genetic values for complex traits and genetic risk of diseases. Proposals to account for stratification in this context are unsatisfactory. Here we address this problem and describe a reparameterization of a WGRR model, based on an eigenvalue decomposition, for simultaneous inference of parameters and unobserved population structure. This allows estimation of genomic parameters with and without inclusion of marker-derived eigenvectors that account for stratification. The method is illustrated with grain yield in wheat typed for 1,279 genetic markers, and with height, HDL cholesterol and systolic blood pressure from the British 1958 cohort study typed for 1 million SNPs. Both data sets show signs of population structure, but with different consequences for inference. The method is compared with an advocated approach that includes eigenvectors as fixed-effect covariates in a WGRR model. We show that this approach, used in the context of WGRR models, is ill posed, and we illustrate the advantages of the proposed model. In summary, our method permits a unified approach to the study of population structure and inference of parameters, is computationally efficient, and is easy to implement.
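
Marker-derived eigenvectors of the kind described above are typically obtained by eigen-decomposing a genomic relationship matrix built from standardized genotypes. The sketch below shows that step with NumPy, assuming 0/1/2 allele-count coding and sample allele frequencies for standardization. It is a generic illustration, not the paper's reparameterized WGRR model, and `grm_eigen` is a hypothetical name.

```python
import numpy as np


def grm_eigen(genotypes, n_components=2):
    """Top eigenpairs of a genomic relationship matrix from 0/1/2 genotypes.

    genotypes: (n_individuals, n_markers) allele counts.
    Returns (eigenvalues, eigenvectors), largest eigenvalue first.
    """
    G = np.asarray(genotypes, dtype=float)
    p = G.mean(axis=0) / 2.0              # sample allele frequency per marker
    keep = (p > 0) & (p < 1)              # monomorphic markers carry no information
    Z = (G[:, keep] - 2.0 * p[keep]) / np.sqrt(2.0 * p[keep] * (1.0 - p[keep]))
    K = Z @ Z.T / Z.shape[1]              # n x n relationship matrix
    vals, vecs = np.linalg.eigh(K)        # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1][:n_components]
    return vals[order], vecs[:, order]


# Toy data: 20 + 20 individuals from two populations with very different frequencies.
rng = np.random.default_rng(1)
pop_a = rng.binomial(2, 0.1, size=(20, 500))
pop_b = rng.binomial(2, 0.9, size=(20, 500))
vals, vecs = grm_eigen(np.vstack([pop_a, pop_b]))
print(np.sign(vecs[:, 0]))  # the leading eigenvector separates the two groups
```

The leading eigenvectors returned here are what the compared approach enters as fixed-effect covariates; according to the abstract, the proposed model instead builds the eigenvalue decomposition into the reparameterized random-regression model itself.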

6.
Zhou L  Ding H  Zhang X  He M  Huang S  Xu Y  Shi Y  Cui G  Cheng L  Wang QK  Hu FB  Wang D  Wu T 《PloS one》2011,6(11):e27481

Background

Recent genome-wide association studies (GWAS) have mapped several novel loci influencing blood lipid levels in Caucasians. We sought to explore whether the genetic variants at these newly identified lipid-associated loci were associated with susceptibility to coronary heart disease (CHD) in a Chinese Han population.

Methodology/Principal Findings

We conducted a two-stage case-control study in a Chinese Han population. The first stage, consisting of 1,376 CHD cases and 1,376 controls frequency-matched on sex and age, examined 5 novel lipid-associated single-nucleotide polymorphisms (SNPs), identified by GWAS in Caucasians, in relation to CHD risk in Chinese. We then validated significant SNPs in the second stage, consisting of 1,269 cases and 2,745 controls. We also tested associations between SNPs within the five novel loci and blood lipid levels in 4,121 controls. We identified two novel SNPs (rs599839 in CELSR2-PSRC1-SORT1 and rs16996148 in NCAN-CILP2) that were significantly associated with reduced CHD risk in Chinese, with odds ratios (95% confidence intervals) under the dominant model of 0.76 (0.61-0.90; P = 0.001) and 0.67 (0.57-0.77; P = 3.4×10⁻⁸), respectively. Multiple linear regression analyses under the dominant model showed that rs599839 was significantly associated with decreased LDL levels (P = 0.022) and rs16996148 with increased LDL and HDL levels (P = 2.9×10⁻⁴ and 0.001, respectively).

Conclusions/Significance

We identified two novel SNPs (rs599839 and rs16996148) at newly identified lipid-associated loci that were significantly associated with CHD susceptibility in a Chinese Han population.
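
The odds ratios above are reported under a dominant model, in which carriers of one or two copies of the tested allele are compared with non-carriers. The sketch below computes an unadjusted odds ratio from genotype counts, with a Woolf (log-scale) 95% confidence interval and a Wald p-value. The study's estimates may come from adjusted regression models, the counts in the example are hypothetical, and `dominant_or` is an illustrative name.

```python
import math


def dominant_or(case_counts, control_counts, z=1.959964):
    """Odds ratio, Woolf 95% CI and Wald p-value under a dominant model.

    case_counts / control_counts: (n_AA, n_Aa, n_aa) genotype counts, where 'a'
    is the tested allele; carriers are Aa + aa.
    """
    a = case_counts[1] + case_counts[2]        # case carriers
    b = case_counts[0]                         # case non-carriers
    c = control_counts[1] + control_counts[2]  # control carriers
    d = control_counts[0]                      # control non-carriers
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell; apply a continuity correction first")
    odds_ratio = a * d / (b * c)
    log_or = math.log(odds_ratio)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # Woolf standard error of log(OR)
    ci = (math.exp(log_or - z * se), math.exp(log_or + z * se))
    p = math.erfc(abs(log_or / se) / math.sqrt(2))  # two-sided normal p-value
    return odds_ratio, ci, p


print(dominant_or((40, 40, 20), (60, 30, 10)))
```

With these hypothetical counts there are 60 carriers and 40 non-carriers among cases versus 40 and 60 among controls, so OR = (60 × 60)/(40 × 40) = 2.25.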

7.
Advances in next-generation sequencing technology have enabled systematic exploration of the contribution of rare variation to Mendelian and complex diseases. Although it is well known that population stratification can generate spurious associations with common alleles, its impact on rare variant association methods remains poorly understood. Here, we performed exhaustive coalescent simulations with demographic parameters calibrated from exome sequence data to evaluate the performance of nine rare variant association methods in the presence of fine-scale population structure. We find that all methods have an inflated spurious association rate for parameter values that are consistent with levels of differentiation typical of European populations. For example, at a nominal significance level of 5%, some test statistics have a spurious association rate as high as 40%. Finally, we empirically assess the impact of population stratification in a large data set of 4,298 European American exomes. Our results have important implications for the design, analysis, and interpretation of rare variant genome-wide association studies.
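
The inflated spurious-association rate described above can be reproduced in a much simpler setting. The sketch below simulates a SNP with no effect on disease in a mixture of two subpopulations with different allele frequencies, samples cases and controls with different subpopulation proportions, and counts how often a 1-df allelic chi-square test is significant at α = 0.05. It is a common-variant toy model with arbitrary, assumed parameter values, not the coalescent simulations or rare-variant tests used in the study; the function names are hypothetical.

```python
import math
import random


def allelic_chi2_p(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """p-value of a 1-df allelic chi-square test on a 2x2 allele-count table."""
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Upper tail of chi-square(1): P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def false_positive_rate(case_frac_pop1, ctrl_frac_pop1, p_pop1=0.3, p_pop2=0.5,
                        n=500, reps=200, alpha=0.05, seed=1):
    """Share of replicates in which a null SNP reaches p < alpha.

    Cases and controls (n each) are drawn from a two-subpopulation mixture in
    the given proportions; the SNP's allele frequency differs only by subpopulation.
    """
    rng = random.Random(seed)

    def count_alt(frac_pop1):
        total = 0
        for _ in range(n):
            p = p_pop1 if rng.random() < frac_pop1 else p_pop2
            total += (rng.random() < p) + (rng.random() < p)  # two alleles per person
        return total

    hits = 0
    for _ in range(reps):
        case_alt = count_alt(case_frac_pop1)
        ctrl_alt = count_alt(ctrl_frac_pop1)
        if allelic_chi2_p(case_alt, 2 * n - case_alt, ctrl_alt, 2 * n - ctrl_alt) < alpha:
            hits += 1
    return hits / reps


print(false_positive_rate(0.5, 0.5))  # balanced sampling
print(false_positive_rate(0.7, 0.3))  # cases enriched for subpopulation 1
```

With equal subpopulation proportions in cases and controls the rate should stay near the nominal 5%; with unequal sampling (70% versus 30% from the first subpopulation) most replicates become false positives even though the SNP has no effect.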

8.
9.

Background

The vast majority of genetic risk factors for complex diseases have, taken individually, a small effect on the end phenotype. Population-based association studies therefore need very large sample sizes to detect significant differences between affected and non-affected individuals. Including thousands of affected individuals in a study requires recruitment in numerous centers, possibly from different geographic regions. Unfortunately such a recruitment strategy is likely to complicate the study design and to generate concerns regarding population stratification.

Methodology/Principal Findings

We analyzed 9,751 individuals representing three main ethnic groups (Europeans, Arabs and South Asians) who had been enrolled from 154 centers in 52 countries for a global case/control study of acute myocardial infarction. All individuals were genotyped for 103 candidate genes using 1,536 SNPs selected with a tagging strategy that captures most of the genetic diversity in different populations. We show that relying solely on self-reported ethnicity is not sufficient to exclude population stratification, and we present additional methods to identify and correct for stratification.

Conclusions/Significance

Our results highlight the importance of carefully addressing population stratification and of carefully “cleaning” the sample prior to analyses to obtain stronger signals of association and to avoid spurious results.

10.
Genome-wide association study (GWAS) has become an established general approach for studying traits of agricultural importance in higher plants, especially crops. Here, we present a GWAS of 32 morphologic and 10 agronomic traits in a collection of 615 barley cultivars genotyped with genome-wide polymorphisms from a recently developed barley oligonucleotide pool assay. A strong population structure effect, related to mixed sampling based on seasonal growth habit and ear row number, is present in this barley collection. A comparison of seven statistical approaches for a genome-wide scan of significant associations, with or without correction for confounding by population structure, revealed that a mixed linear model outperforms genomic control, structured association, stepwise regression control and principal components adjustment in reducing false positive rates while maintaining statistical power. The present study reports significant associations for sixteen morphologic and nine agronomic traits and demonstrates the power and feasibility of applying GWAS to explore complex traits in highly structured plant samples.

11.
Kawasaki disease (KD) is an acute systemic vasculitis syndrome that primarily affects infants and young children. Its etiology is unknown; however, epidemiological findings suggest that genetic predisposition underlies disease susceptibility. Taiwan has the third-highest incidence of KD in the world, after Japan and Korea. To investigate novel mechanisms that might predispose individuals to KD, we conducted a genome-wide association study (GWAS) in 250 KD patients and 446 controls in a Han Chinese population residing in Taiwan, and further validated our findings in an independent Han Chinese cohort of 208 cases and 366 controls. The most strongly associated single-nucleotide polymorphisms (SNPs) detected in the joint analysis corresponded to three novel loci. Among these KD-associated SNPs, three were close to the COPB2 (coatomer protein complex beta-2 subunit) gene: rs1873668 (p = 9.52×10⁻⁵), rs4243399 (p = 9.93×10⁻⁵), and rs16849083 (p = 9.93×10⁻⁵). We also identified a SNP in the intronic region of the ERAP1 (endoplasmic reticulum aminopeptidase 1) gene (rs149481, best p = 4.61×10⁻⁵). Six SNPs (rs17113284, rs8005468, rs10129255, rs2007467, rs10150241, and rs12590667), clustered in a region containing immunoglobulin heavy chain variable region genes and with best p-values between 8.93×10⁻⁶ and 2.08×10⁻⁵, were also identified. This is the first KD GWAS performed in a Han Chinese population. The novel KD candidates we identified have been implicated in T cell receptor signaling, regulation of proinflammatory cytokines, and antibody-mediated immune responses. These findings may lead to a better understanding of the underlying molecular pathogenesis of KD.

12.
A genome-wide study has shown an association between SNPs located on 17q21 and asthma. Such associations have been identified in several populations, but little is known about the Han Chinese population. We conducted a case-control study in a Han Chinese population to investigate the relationship between SNPs located on 17q21 and asthma; 241 asthmatic patients and 212 healthy controls were recruited from the outpatient clinics of the Nanfang Hospital, Guangdong Province, southern China. We genotyped six SNPs (rs8067378, rs8069176, rs2305480, rs4795400, rs12603332, and rs11650680) located on 17q21 with the Sequenom MassARRAY iPLEX platform. For two of these six loci (rs2305480 and rs8067378), there was evidence of association with asthma, and there was a weak association of asthma with rs8069176. We confirm that genetic variants on 17q21 are associated with asthma in the Han Chinese population.

13.
Li XM  Lin HX 《Chinese Bulletin of Botany》2016,51(4):411-415
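In recent years, genome-wide association studies (GWAS) have been widely used to dissect the genetic basis of natural variation in organisms. However, because of the method's limited mapping resolution, in rice (Oryza sativa) genetics it cannot yet replace traditional map-based cloning for isolating genes that regulate complex traits. Recently, Chinese scientists have made new breakthroughs in using GWAS and other large-scale data to clone QTLs controlling complex traits such as grain length and grain weight in rice.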

14.
Show-jumping is an economically important breeding goal in Hanoverian warmblood horses. The aim of this study was to conduct a genome-wide association study (GWAS) for quantitative trait loci (QTL) for show-jumping in Hanoverian warmblood horses, using the Illumina equine SNP50 BeadChip. For our analyses, we genotyped 115 stallions of the National State Stud of Lower Saxony. The show-jumping talent of a horse includes style and ability in free-jumping. To control for spurious associations arising from population stratification, two different mixed linear animal model (MLM) approaches were employed in addition to linear models with fixed effects only, and adaptive permutations were used to correct for multiple testing. Population stratification was best explained by the MLM that considered the proportions of Hanoverian, Thoroughbred, Trakehner and Holsteiner genes and the marker identity-by-state relationship matrix. We identified six QTL for show-jumping on horse chromosomes (ECA) 1, 8, 9 and 26 (−log₁₀ P > 5) and further putative QTL with −log₁₀ P-values of 3 to 5 on ECA1, 3, 11, 17 and 21. Within the six QTL regions, we identified human performance-related genes including PAPSS2 on ECA1, MYL2 on ECA8, TRHR on ECA9 and GABPA on ECA26, and within the putative QTL regions NRAP on ECA1 and TBX4 on ECA11. The results of our GWAS suggest that genes involved in muscle structure, development and metabolism are crucial for elite show-jumping performance. Further studies are required to validate these QTL in larger data sets and in other horse populations.
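
Adaptive permutation, used above for multiple-testing correction, saves computation by stopping early for markers that are clearly not significant. The sketch below applies a simplified stopping rule (stop once a fixed number of permuted statistics reach the observed one) to the absolute genotype-phenotype correlation. Real tools such as PLINK use a more elaborate stopping rule, and the function names here are hypothetical.

```python
import random


def abs_corr(x, y):
    """Absolute Pearson correlation; 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def adaptive_perm_p(genotype, phenotype, max_perm=10000, min_hits=20, seed=0):
    """Permutation p-value with a simple early-stopping rule.

    Stops as soon as `min_hits` permuted statistics reach the observed one.
    Returns (p_estimate, permutations_used).
    """
    if max_perm < 1:
        raise ValueError("max_perm must be at least 1")
    rng = random.Random(seed)
    observed = abs_corr(genotype, phenotype)
    shuffled = list(phenotype)
    hits = 0
    for used in range(1, max_perm + 1):
        rng.shuffle(shuffled)  # break any genotype-phenotype link
        if abs_corr(genotype, shuffled) >= observed:
            hits += 1
            if hits >= min_hits:
                break
    return (hits + 1) / (used + 1), used


genotype = [0, 1, 2] * 10
print(adaptive_perm_p(genotype, [float(v) for v in genotype], max_perm=2000))
print(adaptive_perm_p(genotype, [1, 0, 1] * 10, max_perm=2000, min_hits=10))
```

A marker unrelated to the trait stops after about `min_hits` permutations, whereas a strongly associated one uses the full budget, so its p-value estimate is bounded below by 1/(max_perm + 1).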

15.
This study analyzes population structure and linkage disequilibrium (LD) among 187 commonly used Chinese maize inbred lines, representing the genetic diversity among public, commercial and historically important lines for corn breeding. Seventy SSR loci, evenly distributed over the 10 chromosomes, were assayed for polymorphism. The 290 alleles identified were used to estimate population structure and analyze genome-wide LD. The population of lines was highly structured, showing 6 subpopulations: BSSS (American BSSS, including Reid), PA (group A germplasm derived from modern U.S. hybrids in China), PB (group B germplasm derived from modern U.S. hybrids in China), Lan (Lancaster Surecrop), LRC (lines derived from Lvda Red Cob, a Chinese landrace) and SPT (lines derived from Si-ping-tou, a Chinese landrace). Forty lines that previously had an unknown or miscellaneous origin and pedigree record were assigned to the appropriate group. Relationship estimates based on SSR marker data were quantified in a Q matrix, and this information will inform breeders’ decisions regarding crosses. Extensive inter- and intra-chromosomal LD was detected among the 70 microsatellite loci in the investigated maize lines (2,109 locus pairs in LD with D′ > 0.1, 93 of them at P < 0.01). This suggests that rapidly evolving microsatellites may track recent population structure. The interlocus LD decay observed across this diverse maize germplasm indicates that association studies of QTLs and/or candidate genes may avoid nonfunctional and spurious associations, since most LD blocks were broken up across diverse germplasm. The defined population structure and the LD analysis provide the basis for future association mapping.
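
The LD summary above counts locus pairs with D′ > 0.1. For two biallelic loci, D′ is the disequilibrium D = p_AB − p_A·p_B scaled by its maximum possible magnitude given the allele frequencies; for multiallelic loci such as SSRs, Hedrick's D′ averages the pairwise |D′_ij| weighted by the product of the two allele frequencies. The sketch below implements both from haplotype frequencies, which are assumed to be known here (in practice they are estimated, for example by EM from unphased genotypes); the function names are hypothetical.

```python
def d_prime(p_ab, p_a, p_b):
    """|D'| for two biallelic loci from haplotype frequency p_ab and allele frequencies."""
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return 0.0 if d_max == 0 else abs(d) / d_max


def multiallelic_d_prime(hap):
    """Hedrick's multiallelic D' from a haplotype-frequency table.

    hap[i][j] is the frequency of haplotype (allele i at locus 1, allele j at
    locus 2); entries are assumed to sum to 1.
    """
    p = [sum(row) for row in hap]            # locus-1 allele frequencies
    q = [sum(col) for col in zip(*hap)]      # locus-2 allele frequencies
    return sum(p[i] * q[j] * d_prime(hap[i][j], p[i], q[j])
               for i in range(len(p)) for j in range(len(q)))


print(multiallelic_d_prime([[0.5, 0.0], [0.0, 0.5]]))  # complete LD
```

A table of perfectly associated alleles gives 1.0, and a table equal to the product of its margins (linkage equilibrium) gives 0 up to rounding.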

16.
Association studies in populations that are genetically heterogeneous can yield large numbers of spurious associations if population subgroups are unequally represented among cases and controls. This problem is particularly acute for studies involving pooled genotyping of very large numbers of single-nucleotide-polymorphism (SNP) markers, because most methods for analysis of association in structured populations require individual genotyping data. In this study, we present several strategies for matching case and control pools to have similar genetic compositions, based on ancestry information inferred from genotype data for approximately 300 SNPs tiled on an oligonucleotide-based genotyping array. We also discuss methods for measuring the impact of population stratification on an association study. Results for an admixed population and a phenotype strongly confounded with ancestry show that these simple matching strategies can effectively mitigate the impact of population stratification.
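
One simple way to give case and control pools similar genetic composition, in the spirit of the matching strategies described above, is to pair each case with the control whose inferred ancestry is closest. The sketch below does greedy nearest-neighbor matching without replacement on a single ancestry score (for instance an estimated admixture proportion). It is an assumed illustration, not one of the paper's specific strategies, and `greedy_ancestry_match` is a hypothetical name.

```python
def greedy_ancestry_match(case_anc, control_anc):
    """Pick one control per case, nearest in ancestry score, without replacement.

    case_anc, control_anc: lists of ancestry scores (e.g. an admixture proportion).
    Returns indices into control_anc, in case order.
    """
    if len(control_anc) < len(case_anc):
        raise ValueError("need at least as many controls as cases")
    available = set(range(len(control_anc)))
    chosen = []
    for a in case_anc:
        # Nearest unused control; ties broken by the lower index for determinism.
        best = min(available, key=lambda j: (abs(control_anc[j] - a), j))
        available.remove(best)
        chosen.append(best)
    return chosen


print(greedy_ancestry_match([0.9, 0.1], [0.5, 0.12, 0.88, 0.3]))
```

Here the cases (0.9 and 0.1) are matched to controls 2 and 1, so the matched control pool has the same mean ancestry (0.5) as the case pool. Greedy matching is order-dependent; an optimal assignment method can do better when suitable controls are scarce.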

17.
Shi Y  Qu J  Zhang D  Zhao P  Zhang Q  Tam PO  Sun L  Zuo X  Zhou X  Xiao X  Hu J  Li Y  Cai L  Liu X  Lu F  Liao S  Chen B  He F  Gong B  Lin H  Ma S  Cheng J  Zhang J  Chen Y  Zhao F  Yang X  Chen Y  Yang C  Lam DS  Li X  Shi F  Wu Z  Lin Y  Yang J  Li S  Ren Y  Xue A  Fan Y  Li D  Pang CP  Zhang X  Yang Z 《American journal of human genetics》2011,(6):438-813
High myopia, which is extremely prevalent in the Chinese population, is one of the leading causes of blindness in the world. Genetic factors play a critical role in the development of the condition. To identify genetic variants associated with high myopia in the Han Chinese, we conducted a genome-wide association study (GWAS) of 493,947 SNPs in 1,088 individuals (419 cases and 669 controls) from a Han Chinese cohort and followed up signals associated at p < 1.0 × 10⁻⁴ in three independent cohorts (combined, 2,803 cases and 5,642 controls). We identified a significant association between high myopia and a variant at 13q12.12 (rs9318086; combined p = 1.91 × 10⁻¹⁶; heterozygous odds ratio = 1.32; homozygous odds ratio = 1.64). Furthermore, five additional SNPs (rs9510902, rs3794338, rs1886970, rs7325450, and rs7331047) in the same linkage disequilibrium (LD) block as rs9318086 were also significantly associated with high myopia in the Han Chinese population, with p values ranging from 5.46 × 10⁻¹¹ to 6.16 × 10⁻¹⁶. This associated locus contains three genes: MIPEP, C1QTNF9B-AS1, and C1QTNF9B. MIPEP and C1QTNF9B were found to be expressed in the retina and retinal pigment epithelium (RPE) and, given the evidence that retinal signaling controls eye growth, are more likely than C1QTNF9B-AS1 to be associated with high myopia. Our results suggest that the variants at 13q12.12 are associated with high myopia.

18.
The Han Chinese are the world's largest ethnic group and reside across China. Shaanxi province in northern China has been a pastoral–agricultural transition zone sensitive to climate change since Neolithic times, which makes it a vital place for studying population dynamics. However, Shaanxi Han are underrepresented in genetic studies owing to the lack of high-density sampling and genome-wide data. Here, we genotyped 700,000 single nucleotide polymorphisms (SNPs) in 200 Han individuals from nine populations in Shaanxi and compared them with available modern and ancient Eurasian individuals. We revealed a north–south genetic cline in Han Chinese, with Shaanxi Han located at the northern end of the cline. We detected western Eurasian-related admixture in Shaanxi populations, especially in Guanzhong and Shanbei Han Chinese, at proportions of 2%–4.6%. Shaanxi Han were suggested to derive a large part of their ancestry (39%–69%) from a lineage that also contributed substantially to ancient and present-day Tibetans (85%) as well as southern Han, supporting a common northern China origin of modern Sino-Tibetan-speaking populations and the southwestward expansion of millet farmers from the middle-upper Yellow River Basin to the Tibetan Plateau and to southern China. The rest of the ancestry of Shaanxi Han came from a lineage closely related to ancient and present-day Austronesian- and Tai-Kadai-speaking populations in southern China and Southeast Asia. We also observed genetic substructure within Shaanxi Han, with north–south-related ancestry corresponding well to latitude. Maternal mitochondrial DNA and paternal Y-chromosome lineages further demonstrated this admixture pattern of Han Chinese in Shaanxi province.

19.
Cao ZF  Ma CX  Wang L  Cai B 《Hereditas (Beijing)》2010,32(9):921-928
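In genome-wide association studies of complex diseases, population stratification increases the false-positive rate, so it is necessary to take population genetic structure into account and control for stratification. However, the effectiveness of randomly selected SNPs for studying population stratification remains to be explored. Using Affymetrix SNP 6.0 genotype data from unrelated individuals in the HapMap Phase 2 populations, this study randomly selected different numbers of SNPs evenly distributed across the genome, and also screened ancestry informative markers (AIMs) using f-values and Fisher's exact test. Data from unrelated individuals in HapMap Phase 3 were then used to evaluate how well the different SNP sets distinguished populations, using two methods: F-statistics and STRUCTURE analysis. The study found that SNPs randomly and evenly distributed across the genome can be used to identify genetic structure within populations. It further suggests that in genome-wide association studies, when no AIMs are available for a specific population, more than 3,000 randomly selected, evenly distributed genome-wide SNPs can be used to control for population stratification.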
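
The study above screens AIMs with Fisher's exact test, which in this setting compares the allele counts of a SNP between two reference populations in a 2×2 table. The sketch below computes the two-sided p-value with only the Python standard library, by summing the hypergeometric probabilities of all tables with the same margins that are no more likely than the observed one. How the study combined this with its f-value filter is not described, so that part is not modelled; `fisher_exact_two_sided` is a hypothetical name.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x, margins fixed.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Sum tables no more likely than the observed one (small tolerance for rounding).
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Hypothetical allele counts (alt, ref) for one SNP in two reference populations.
print(fisher_exact_two_sided(3, 1, 1, 3))
```

For the table [[3, 1], [1, 3]] the result is 34/70 ≈ 0.486. In an AIM screen, SNPs would be ranked by this p-value (or by the allele-frequency difference) and the top-ranked markers retained.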

20.
Genome-wide association studies (GWAS) have successfully identified susceptibility loci through marginal association analysis of SNPs. Valuable insight into the genetic variation underlying complex diseases will likely be gained by considering functionally related sets of genes simultaneously. One approach is to extend gene set enrichment analysis methods, which originated in gene expression studies, to account for the distinctive features of GWAS data. These features include the large number of SNPs per gene, the modest and sparse SNP associations, and the additional information provided by linkage disequilibrium (LD) patterns within genes. We propose a “gene set ridge regression in association studies (GRASS)” algorithm. GRASS summarizes the genetic structure of each gene as eigenSNPs and uses a novel form of regularized regression, termed group ridge regression, to select representative eigenSNPs for each gene and assess their joint association with disease risk. Compared with existing methods, the proposed algorithm greatly reduces the high dimensionality of GWAS data while still accounting for multiple hits and/or LD within the same gene. We show by simulation that this algorithm performs well when the number of predictors is large compared with the sample size. We applied the GRASS algorithm to a genome-wide association study of colon cancer and identified nicotinate and nicotinamide metabolism and transforming growth factor beta signaling as the two most significantly enriched pathways. Elucidating the role of variation in these pathways may enhance our understanding of colon cancer etiology.
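
Two ingredients of GRASS described above are summarizing each gene's SNPs as eigenSNPs and shrinking their effects with ridge-type regularization. The sketch below shows both in simplified form: eigenSNPs are taken as the principal components explaining a chosen share of a gene's genotype variance, and a plain closed-form ridge regression on a quantitative toy outcome stands in for the paper's group ridge regression, which is not reproduced here. The 80% variance threshold and the function names are assumptions.

```python
import numpy as np


def eigensnps(gene_genotypes, var_explained=0.8):
    """Principal components ('eigenSNPs') summarizing one gene's SNPs.

    Keeps the fewest components whose cumulative share of genotype variance
    reaches var_explained. Assumes at least one polymorphic SNP.
    """
    X = np.asarray(gene_genotypes, dtype=float)
    X = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    share = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = int(np.searchsorted(share, var_explained)) + 1
    return U[:, :k] * s[:k]  # component scores, one column per eigenSNP


def ridge(X, y, lam):
    """Closed-form ridge regression on centered data: (X'X + lam*I)^-1 X'y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)


rng = np.random.default_rng(2)
gene = rng.binomial(2, 0.3, size=(100, 6))          # toy genotypes for one gene
scores = eigensnps(gene)
outcome = scores[:, 0] + rng.normal(size=100)       # toy quantitative outcome
print(scores.shape, ridge(scores, outcome, lam=10.0))
```

With `lam=0` the ridge solution reduces to ordinary least squares on centered data; increasing `lam` shrinks the coefficient vector toward zero, which keeps the fit stable when predictors are numerous relative to samples.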
