Similar literature
 20 similar articles found (search time: 31 ms)
1.
Common variants explain little of the variance of most common diseases, prompting large-scale sequencing studies to understand the contribution of rare variants to these diseases. Imputation of rare variants from genome-wide genotypic arrays offers a cost-efficient strategy to achieve the sample sizes required for adequate statistical power. To estimate the performance of imputation of rare variants, we imputed 153 individuals, each of whom was genotyped on 3 different genotype arrays including 317k, 610k and 1 million single nucleotide polymorphisms (SNPs), to two different reference panels: HapMap2 and the 1000 Genomes pilot March 2010 release (1KGpilot), by using IMPUTE version 2. We found that more than 94% and 84% of all SNPs yield acceptable accuracy (info > 0.4) in HapMap2 and 1KGpilot-based imputation, respectively. For rare variants (minor allele frequency (MAF) < 5%), the proportion of well-imputed SNPs increased as the MAF increased from 0.3% to 5% across all 3 genome-wide association study (GWAS) datasets. The proportion of well-imputed SNPs was 69%, 60% and 49% for SNPs with a MAF from 0.3% to 5% for 1M, 610k and 317k, respectively. None of the very rare variants (MAF < 0.3%) were well imputed. We conclude that the imputation accuracy of rare variants increases with higher density of genome-wide genotyping arrays when the size of the reference panel is small. Variants with lower MAF are more difficult to impute. These findings have important implications in the design and replication of large-scale sequencing studies.

2.
Genome-wide association studies (GWAS) conducted using commercial single nucleotide polymorphism (SNP) arrays have proven to be a powerful tool for the detection of common disease susceptibility variants. However, their utility for the detection of lower frequency variants is yet to be practically investigated. Here we describe the application of a rare variant collapsing method to a large genome-wide SNP dataset, the Wellcome Trust Case Control Consortium rheumatoid arthritis (RA) GWAS. We partitioned the data into gene-centric bins and collapsed genotypes of low frequency variants (defined here as MAF ≤0.05) into a single count coupled with univariate analysis. We then prioritised gene regions for further investigation in an independent cohort of 3,355 cases and 2,427 controls based on rare variant signal p value and prior evidence to support involvement in RA. A total of 14,536 gene bins were investigated in the primary analysis and signals mapping to the TNFAIP3 and chr17q24 loci were selected for further investigation. We detected replicating association to low frequency variants in the TNFAIP3 gene (combined p = 6.6 × 10⁻⁶). Even though rare variants are not well-represented and can be difficult to genotype in GWAS, our study supports the application of low frequency variant collapsing methods to genome-wide SNP datasets as a means of exploiting data that are routinely ignored.
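
The collapsing step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the choice of a carrier indicator as the collapsed value, and the Pearson chi-square test standing in for the "univariate analysis" are all assumptions made for illustration.

```python
from math import erfc, sqrt


def collapse_carriers(genotypes, mafs, maf_threshold=0.05):
    """Collapse the low frequency variants of one gene bin into a carrier flag.

    genotypes: one list per individual of minor-allele counts (0, 1 or 2),
               with one entry per variant in the bin.
    mafs:      minor allele frequency of each variant in the bin.
    Returns 1 for individuals carrying a minor allele at any variant with
    MAF <= maf_threshold, else 0 (the carrier coding is an assumption).
    """
    rare = [j for j, maf in enumerate(mafs) if maf <= maf_threshold]
    return [int(any(g[j] > 0 for j in rare)) for g in genotypes]


def carrier_chi2(carriers, is_case):
    """Pearson chi-square test (1 df) of carrier status against case status."""
    a = sum(1 for c, k in zip(carriers, is_case) if c and k)          # carrier cases
    b = sum(1 for c, k in zip(carriers, is_case) if c and not k)      # carrier controls
    c0 = sum(1 for c, k in zip(carriers, is_case) if not c and k)     # non-carrier cases
    d = sum(1 for c, k in zip(carriers, is_case) if not c and not k)  # non-carrier controls
    n = a + b + c0 + d
    denom = (a + b) * (c0 + d) * (a + c0) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # degenerate table: no evidence either way
    chi2 = n * (a * d - b * c0) ** 2 / denom
    p_value = erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value
```

In a genome-wide setting, a function like `collapse_carriers` would be applied once per gene bin, and the resulting p values ranked across bins.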

3.
Genotype imputation is now routinely applied in genome-wide association studies (GWAS) and meta-analyses. However, most imputations have been run using HapMap samples as the reference, and imputation of low frequency and rare variants (minor allele frequency (MAF) < 5%) has not been systematically assessed. With the emergence of next-generation sequencing, large reference panels (such as the 1000 Genomes panel) are available to facilitate imputation of these variants. Therefore, in order to estimate the performance of low frequency and rare variant imputation, we imputed 153 individuals, each of whom had 3 different genotype array datasets including 317k, 610k and 1 million SNPs, to three different reference panels: the 1000 Genomes pilot March 2010 release (1KGpilot), the 1000 Genomes interim August 2010 release (1KGinterim), and the 1000 Genomes phase1 November 2010 and May 2011 release (1KGphase1), by using IMPUTE version 2. The differences between these three releases of the 1000 Genomes data are the sample size, ancestry diversity, number of variants and their frequency spectrum. We found that both reference panel and GWAS chip density affect the imputation of low frequency and rare variants. 1KGphase1 outperformed the other 2 panels, with a higher concordance rate, a higher proportion of well-imputed variants (info>0.4) and a higher mean info score in each MAF bin. Similarly, the 1M chip array outperformed 610K and 317K. However, for very rare variants (MAF≤0.3%), only 0–1% of the variants were well imputed. We conclude that the imputation of low frequency and rare variants improves with larger reference panels and higher density of genome-wide genotyping arrays. Yet, despite a large reference panel size and dense genotyping density, very rare variants remain difficult to impute.
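
The per-bin summary reported here (proportion of variants with info > 0.4 in each MAF bin) can be sketched as follows. This is an illustrative sketch only: the function name, the input layout of `(maf, info)` pairs, and the bin edges in the usage note are assumptions, not details taken from the study.

```python
def well_imputed_by_maf_bin(variants, bin_edges, info_threshold=0.4):
    """Proportion of well-imputed variants in each MAF bin.

    variants:  iterable of (maf, info) pairs, one per imputed variant.
    bin_edges: ascending MAF edges; bin i covers (bin_edges[i], bin_edges[i + 1]].
    Returns a list of (lower, upper, n_variants, proportion), where
    proportion is None for a bin that contains no variants.
    """
    n_bins = len(bin_edges) - 1
    totals = [0] * n_bins
    well = [0] * n_bins
    for maf, info in variants:
        for i in range(n_bins):
            if bin_edges[i] < maf <= bin_edges[i + 1]:
                totals[i] += 1
                if info > info_threshold:
                    well[i] += 1
                break  # each variant belongs to at most one bin
    return [
        (bin_edges[i], bin_edges[i + 1], totals[i],
         well[i] / totals[i] if totals[i] else None)
        for i in range(n_bins)
    ]
```

For example, edges of `[0.0, 0.003, 0.05, 0.5]` would separate very rare (MAF ≤ 0.3%), low frequency and rare (up to 5%), and common variants.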

4.
DNA variants, such as single nucleotide polymorphisms (SNPs) and copy number variants (CNVs), are unevenly distributed across the human genome. Currently, dbSNP contains more than 6 million human SNPs, and whole-genome genotyping arrays can assay more than 4 million of them simultaneously. In our study, we first asked whether published genome-wide association study (GWAS) assays cover all regions of the genome well. Using dbSNP build 135 data, we identified 50 genomic regions longer than 100 Kb that do not contain any common SNPs, i.e., those with minor allele frequency (MAF) ≥1%. Secondly, because conserved regions are generally of functional importance, we tested genes in those large genomic regions without common SNPs. We found 97 genes in these regions, and they were enriched for reproductive function. In addition, we further filtered out regions with CNVs listed in the Database of Genomic Variants (DGV), segmental duplications from the Human Genome Project and common variants identified by personal genome sequencing (UCSC). No region survived this filtering. Our analysis suggests that, while there may not be many large genomic regions free of common variants, there are still some “holes” in the current human genomic map for common SNPs. Because GWAS have focused only on common SNPs, interpretation of GWAS results should take this limitation into account. In particular, two recent GWAS of fertility may be incomplete due to the map deficit. Additional SNP discovery efforts should pay close attention to these regions.
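
The search for long regions without common SNPs can be sketched as a scan over sorted SNP positions. The sketch below is a simplified illustration under stated assumptions: the function name and input layout are hypothetical, it handles a single chromosome, and it only reports gaps between consecutive common SNPs, ignoring chromosome ends and the later CNV and segmental duplication filters.

```python
def common_snp_deserts(snps, min_length=100_000, maf_threshold=0.01):
    """Find gaps longer than min_length between consecutive common SNPs.

    snps: list of (position, maf) pairs on a single chromosome.
    A SNP counts as common when maf >= maf_threshold.
    Returns a list of (start, end) positions of the flanking common SNPs.
    """
    positions = sorted(pos for pos, maf in snps if maf >= maf_threshold)
    return [
        (left, right)
        for left, right in zip(positions, positions[1:])
        if right - left > min_length
    ]
```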

5.
Alcohol dependence (AD) is a heritable substance addiction with adverse physical and psychological consequences, representing a major health and economic burden on societies worldwide. Genes thus far implicated via linkage, candidate gene and genome-wide association studies (GWAS) account for only a small fraction of its overall risk, with effects varying across ethnic groups. Here we investigate the genetic architecture of alcoholism and report on the extent to which common, genome-wide SNPs collectively account for risk of AD in two US populations, African-Americans (AAs) and European-Americans (EAs). Analyzing GWAS data for two independent case–control sample sets, we compute polymarker scores that are significantly associated with alcoholism (P = 1.64 × 10⁻³ and 2.08 × 10⁻⁴ for EAs and AAs, respectively), reflecting the small individual effects of thousands of variants derived from patterns of allelic architecture that are population specific. Simulations show that disease models based on rare and uncommon causal variants (MAF < 0.05) best fit the observed distribution of polymarker signals. When scoring bins were annotated for gene location and examined for constituent biological networks, gene enrichment was observed for several cellular processes and functions in both EA and AA populations, transcending their underlying allelic differences. Our results reveal key insights into the complex etiology of AD, raising the possibility of an important role for rare and uncommon variants, and identify polygenic mechanisms that encompass a spectrum of disease liability, with some, such as chloride transporters and glycine metabolism genes, displaying subtle, modifying effects that are likely to escape detection in most GWAS designs.
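
A polymarker score of the kind mentioned here is commonly computed as a weighted sum of risk-allele counts. The sketch below assumes that form; the function name and the use of per-SNP weights (for instance, log odds ratios estimated in a separate discovery sample) are illustrative assumptions and are not taken from the study.

```python
def polymarker_score(allele_counts, weights):
    """Weighted sum of allele counts for one individual.

    allele_counts: number of copies (0, 1 or 2) of the scored allele per SNP.
    weights:       per-SNP weights in the same order (assumed here to be
                   effect estimates such as log odds ratios).
    """
    if len(allele_counts) != len(weights):
        raise ValueError("allele_counts and weights must have the same length")
    return sum(w * g for g, w in zip(allele_counts, weights))
```

Scores computed this way for each individual in a target sample can then be tested for association with case status.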

6.
Coeliac disease (CeD) is a highly heritable common autoimmune disease involving chronic small intestinal inflammation in response to dietary wheat. The human leukocyte antigen (HLA) region, and 40 newer regions identified by genome wide association studies (GWAS) and dense fine mapping, account for ∼40% of the disease heritability. We hypothesized that in pedigrees with multiple individuals with CeD, rare [minor allele frequency (MAF) <0.5%] mutations of larger effect size (odds ratios of ∼2–5) might exist. We sequenced the exomes of 75 coeliac individuals of European ancestry from 55 multiply affected families. We selected interesting variants and genes for further follow up using a combination of: an assessment of shared variants between related subjects, a model-free linkage test, and gene burden tests for multiple, potentially causal, variants. We next performed highly multiplexed amplicon resequencing of all RefSeq exons from 24 candidate genes selected on the basis of the exome sequencing data in 2,248 unrelated coeliac cases and 2,230 controls. 1,335 variants with a 99.9% genotyping call rate were observed in 4,478 samples, of which 939 were present in coding regions of 24 genes (Ti/Tv 2.99). 91.7% of coding variants were rare (MAF <0.5%) and 60% were novel. Gene burden tests performed on rare functional variants identified no significant associations (p<1×10⁻³) in the resequenced candidate genes. Our strategy of sequencing multiply affected families with deep follow up of candidate genes has not identified any new CeD risk mutations.

7.
In genome-wide association studies, single nucleotide polymorphisms located in five novel loci were associated with Paget's disease of bone (PDB). We aimed to identify rare genetic variants of candidate genes located in these loci and to search for genetic association with PDB in the French-Canadian population. Exons, promoters and exon–intron junctions from patients with familial PDB and healthy individuals were sequenced in candidate genes located within novel loci associated with PDB in our population. A rare variant was defined as one with a minor allele frequency <0.05 or absent from dbSNP (NCBI). We sequenced seven genes in the 1p13 locus, three genes in 7q33, three genes in 8q22, and five genes in the 15q24 locus. We identified 126 rare variants in at least one patient with PDB, of which 55 were located in the 1p13 locus, 32 in 7q33, 10 in 8q22 and 29 in the 15q24 locus. We located 71 of these 126 rare variants in an intron, 30 in an exon and 9 in an untranslated region. 60% of these variants were located in functionally relevant gene regions. Among the 12 missense rare variants in PDB, two (rs62620995 in TM7SF4; rs62641691 in CD276) were predicted to be damaging by in silico analysis tools. Rs62620995, which altered a conserved amino acid (p.Leu397Phe) in the TM7SF4 gene, encoding the DC-STAMP protein involved in osteoclastogenesis through the RANK signaling pathway, was found to have a marginal association with PDB (p = 0.09). Rs35500845, located in the CTHRC1 gene, which encodes a regulator of collagen matrix deposition, was also associated with PDB in the French-Canadian population (p = 0.046).

8.
Hepatitis B virus (HBV) infection affects more than 2 billion people throughout the world. Among them, more than 240 million have chronic infection. Every year, 0.5–1.2 million people die of chronic hepatitis B virus infection (CHBVI), and approximately 60% of liver cancers are related to CHBVI and subsequent liver cirrhosis (LC). These HBVI-related diseases impose a considerable economic burden as well as morbidity on patients, families, and society. Family and twin studies have indicated that the host genetic constitution greatly influences the clinical outcomes of HBV infection. During the past several years, genome-wide association studies (GWAS) have identified susceptibility variants for various HBVI-related diseases. Of these variants, SNPs rs3077 and rs9277535 in HLA-DP on chromosome 6 show the strongest evidence for association with CHBVI and with viral clearance. However, whether there exists an association between HLA-DP variants and the progression of CHBVI remains to be determined. Thus, further study should focus not only on identifying more variants in HLA-DP that are associated with various HBVI-related diseases but also on characterizing any newly discovered functional variants at the molecular level. Further, given the complexity of CHBVI and its progression, gene–gene and gene–environment interactions should also be taken into consideration. Moreover, because both smoking and alcohol affect HBV infection and progression, it is important to understand how these factors interact with genetics to influence HBV-related diseases.

9.
3% of the population develops saccular intracranial aneurysms (sIAs), a complex trait, with a sporadic and a familial form. Subarachnoid hemorrhage from sIA (sIA-SAH) is a devastating form of stroke. Certain rare genetic variants are enriched in the Finns, a population isolate with a small founder population and bottleneck events. As the sIA-SAH incidence in Finland is >2× increased, such variants may associate with sIA in the Finnish population. We tested 9.4 million variants for association in 760 Finnish sIA patients (enriched for familial sIA), and in 2,513 matched controls with case-control status and with the number of sIAs. The most promising loci (p<5E-6) were replicated in 858 Finnish sIA patients and 4,048 controls. The frequencies and effect sizes of the replicated variants were compared to a continental European population using 717 Dutch cases and 3,004 controls. We discovered four new high-risk loci with low frequency lead variants. Three were associated with the case-control status: 2q23.3 (MAF 2.1%, OR 1.89, p 1.42×10⁻⁹); 5q31.3 (MAF 2.7%, OR 1.66, p 3.17×10⁻⁸); 6q24.2 (MAF 2.6%, OR 1.87, p 1.87×10⁻¹¹) and one with the number of sIAs: 7p22.1 (MAF 3.3%, RR 1.59, p 6.08×10⁻⁹). Two of the associations (5q31.3, 6q24.2) replicated in the Dutch sample. The 7p22.1 locus was strongly differentiated; the lead variant was more frequent in Finland (4.6%) than in the Netherlands (0.3%). Additionally, we replicated a previously inconclusive locus on 2q33.1 in all samples tested (OR 1.27, p 1.87×10⁻¹²). The five loci explain 2.1% of the sIA heritability in Finland, and may relate to, but not explain, the increased incidence of sIA-SAH in Finland. This study illustrates the utility of population isolates, familial enrichment, dense genotype imputation and alternate phenotyping in the search for variants associated with complex diseases.

10.
Empirical evidence suggests that both common and rare variants contribute to complex disease etiology. Although the effects of common variants have been thoroughly assessed in recent genome-wide association studies (GWAS), our knowledge of the impact of rare variants on complex diseases remains limited. A number of methods have been proposed to test for rare variant association in sequencing-based studies, a study design that is becoming popular but is still not economically feasible. In contrast, few (if any) methods exist to detect rare variants in GWAS data, the data we have collected on thousands of individuals. Here we propose two methods, a weighted haplotype-based approach and an imputation-based approach, to test for the effect of rare variants with GWAS data. Both methods can incorporate external sequencing data when available. We evaluated our methods and compared them with methods proposed in the sequencing setting through extensive simulations. Our methods clearly show enhanced statistical power over existing methods for a wide range of population-attributable risk, percentage of disease-contributing rare variants, and proportion of rare alleles working in different directions. We also applied our methods to the IFIH1 region for the type 1 diabetes GWAS data collected by the Wellcome Trust Case-Control Consortium. Our methods yield p values on the order of 10⁻³, whereas the most significant p value from the existing methods is greater than 0.17. We thus demonstrate that the evaluation of rare variants with GWAS data is possible, particularly when public sequencing data are incorporated.

11.

Background

An understanding of linkage disequilibrium (LD) structures in the human genome underpins much of medical genetics and provides a basis for disease gene mapping and investigating biological mechanisms such as recombination and selection. Whole genome sequencing (WGS) provides the opportunity to determine LD structures at maximal resolution.

Results

We compare LD maps constructed from WGS data with LD maps produced from the array-based HapMap dataset, for representative European and African populations. WGS provides up to 5.7-fold greater SNP density than array-based data and achieves much greater resolution of LD structure, allowing for identification of up to 2.8-fold more regions of intense recombination. The absence of ascertainment bias in variant genotyping improves the population representativeness of the WGS maps, and highlights the extent of uncaptured variation using array genotyping methodologies. The complete capture of LD patterns using WGS allows for higher genome-wide association study (GWAS) power compared to array-based GWAS, with WGS also allowing for the analysis of rare variation. The impact of marker ascertainment issues in arrays has been greatest for Sub-Saharan African populations where larger sample sizes and substantially higher marker densities are required to fully resolve the LD structure.

Conclusions

WGS provides the best possible resource for LD mapping due to the maximal marker density and lack of ascertainment bias. WGS LD maps provide a rich resource for medical and population genetics studies. The increasing availability of WGS data for large populations will allow for improved research utilising LD, such as GWAS and recombination biology studies.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1854-0) contains supplementary material, which is available to authorized users.

12.
Genotype imputation has the potential to assess human genetic variation at a lower cost than assaying the variants using laboratory techniques. The performance of imputation for rare variants has not been comprehensively studied. We utilized 8865 human samples with high depth resequencing data for the exons and flanking regions of 202 genes and Genome-Wide Association Study (GWAS) data to characterize the performance of genotype imputation for rare variants. We evaluated reference sets ranging from 100 to 3713 subjects for imputing into samples typed for the Affymetrix (500K and 6.0) and Illumina 550K GWAS panels. The proportion of variants that could be well imputed (true r2>0.7) with a reference panel of 3713 individuals was: 31% (Illumina 550K) or 25% (Affymetrix 500K) with MAF (Minor Allele Frequency) less than or equal to 0.001, 48% or 35% with 0.0010.05. The performance for common SNPs (MAF>0.05) within exons and flanking regions is comparable to imputation of more uniformly distributed SNPs. The performance for rare SNPs (0.01…

13.
14.
The BACH2 gene regulates B cell differentiation and function and has been reported to be a shared susceptibility gene for several autoimmune diseases. Our previous genome-wide association study (GWAS) indicated that several single nucleotide polymorphisms (SNPs) in the BACH2 gene are associated with Graves’ disease (GD) in the Chinese Han population; however, the association did not achieve genome-wide significance levels. Recently, this association of BACH2 with GD was confirmed in Caucasians in the UK population, but fine mapping in this region has not yet been reported. Here, we provide a refined analysis of a 331-kb region in the BACH2 gene, which harbors 359 SNPs, using GWAS data from 1,442 GD patients and 1,468 controls. The SNPs rs2474619 and rs9344996 were implicated as independent variants associated with GD by forward and two-locus logistic regression analysis. We genotyped eight out of 10 tagSNPs with P < 1 × 10⁻³ in 3,508 GD patients and 3,209 controls; the results also showed that rs2474619 was independently associated with GD in the combined population from GWAS and the second stage (P = 1.81 × 10⁻⁵). The SNPs rs2474619 and rs9344996 were further genotyped in the third stage cohorts, and rs2474619 showed evidence of association with GD at genome-wide significance levels in the combined population (P = 3.28 × 10⁻⁸, odds ratio = 1.13). The association of rs9344996 with GD can be explained by its linkage to rs2474619 in the combined population. Our study clearly demonstrated that BACH2 is a susceptibility gene for GD in the Chinese Han population and further supported rs2474619, in intron 2 of BACH2, as the best association signal with GD. However, the mechanism by which BACH2 confers increased risk of GD requires further study.

15.
Whole-exome or gene targeted resequencing in hundreds to thousands of individuals has shown that the majority of genetic variants are at low frequency in human populations. Rare variants are enriched for functional mutations and are expected to explain an important fraction of the genetic etiology of human disease, therefore having a potential medical interest. In this work, we analyze the whole-exome sequences of French-Canadian individuals, a founder population with a unique demographic history that includes an original population bottleneck less than 20 generations ago, followed by a demographic explosion, and the whole exomes of French individuals sampled from France. We show that in less than 20 generations of genetic isolation from the French population, the genetic pool of French-Canadians shows reduced levels of diversity, higher homozygosity, and an excess of rare variants with low variant sharing with Europeans. Furthermore, the French-Canadian population contains a larger proportion of putatively damaging functional variants, which could partially explain the increased incidence of genetic disease in the province. Our results highlight the impact of population demography on genetic fitness and the contribution of rare variants to the human genetic variation landscape, emphasizing the need for deep cataloguing of genetic variants by resequencing worldwide human populations in order to truly assess disease risk.

16.
Over the past decades, genome-wide association studies (GWAS) have led to a dramatic expansion of genetic variants implicated in human traits and diseases. These advances are expected to result in new drug targets, but the identification of causal genes and the cell biology underlying human diseases from GWAS remains challenging. Here, we review protein interaction network-based methods to analyse GWAS data. These approaches can rank candidate drug targets at GWAS-associated loci or among interactors of disease genes without direct genetic support. These methods identify the cell biology affected in common across diseases, offering opportunities for drug repurposing, and can be combined with expression data to identify focal tissues and cell types. Going forward, we expect that these methods will further improve from advances in the characterisation of context specific interaction networks and the joint analysis of rare and common genetic signals.

17.
18.
19.
Genome-wide association studies (GWAS) have identified hundreds of associated loci across many common diseases. Most risk variants identified by GWAS will merely be tags for as-yet-unknown causal variants. It is therefore possible that identification of the causal variant, by fine mapping, will identify alleles with larger effects on genetic risk than those currently estimated from GWAS replication studies. We show that under plausible assumptions, whilst the majority of the per-allele relative risks (RR) estimated from GWAS data will be close to the true risk at the causal variant, some could be considerable underestimates. For example, for an estimated RR in the range 1.2-1.3, there is approximately a 38% chance that it exceeds 1.4 and a 10% chance that it is over 2. We show how these probabilities can vary depending on the true effects associated with low-frequency variants and on the minor allele frequency (MAF) of the most associated SNP. We investigate the consequences of the underestimation of effect sizes for predictions of an individual's disease risk and interpret our results for the design of fine mapping experiments. Although these effects mean that the amount of heritability explained by known GWAS loci is expected to be larger than current projections, this increase is likely to explain a relatively small amount of the so-called "missing" heritability.

20.
Entropion is a known congenital disorder in sheep presumed to be heritable, but no causative genetic variant has been reported. Affected lambs show a variable inward rolling of the lower eyelids leading to blindness in severe cases. In Switzerland, the Swiss White Alpine (SWA) breed showed a significantly higher prevalence of entropion than other breeds. A GWAS using 150 SWA sheep (90 affected lambs and 60 controls), based on 600k SNP data, revealed a genome-wide significant signal on chromosome 15. The 0.2 Mb associated region contains functional candidate genes, SMTNL1 and CTNND1. Pathogenic variants in human CTNND1 cause blepharocheilodontic syndrome 2, a rare disorder including eyelid anomalies, and SMTNL1 regulates contraction and relaxation of skeletal and smooth muscle. WGS of a single entropion-affected lamb revealed two private missense variants in SMTNL1 and CTNND1. Subsequent genotyping of both variants in 231 phenotyped SWA sheep was performed. The SMTNL1 variant p.(Asp452Asn) affects an evolutionarily conserved residue within an important domain and represents a rare allele, which occurred also in controls. The p.(Glu943Lys) variant in CTNND1 represents a common variant unlikely to cause entropion as the mutant allele occurred more frequently in non-affected sheep. Therefore, we propose that these protein-changing variants are unlikely to explain the phenotype. Additionally, WGS of three further discordant pairs of full siblings was carried out but revealed no obvious causative variant. Finally, we conclude that entropion represents a more complex disease caused by different non-coding regulatory variants.
