Similar literature
Found 20 similar articles (search time: 15 ms)
1.
We describe three statistical results that we have found to be useful in case-control genetic association testing. All three involve combining the discovery of novel genetic variants, usually by sequencing, with genotyping methods that recognize previously discovered variants. We first consider expanding the list of known variants by concentrating variant-discovery in cases. Although the naive inclusion of cases-only sequencing data would create a bias, we show that some sequencing data may be retained, even if controls are not sequenced. Furthermore, for alleles of intermediate frequency, cases-only sequencing with bias-correction entails little if any loss of power, compared to dividing the same sequencing effort among cases and controls. Secondly, we investigate more strongly focused variant discovery to obtain a greater enrichment for disease-related variants. We show how case status, family history, and marker sharing enrich the discovery set by increments that are multiplicative with penetrance, enabling the preferential discovery of high-penetrance variants. A third result applies when sequencing is the primary means of counting alleles in both cases and controls, but a supplementary pooled genotyping sample is used to identify the variants that are very rare. We show that this raises no validity issues, and we evaluate a less expensive and more adaptive approach to judging rarity, based on group-specific variants. We demonstrate the important and unusual caveat that this method requires equal sample sizes for validity. These three results can be used to more efficiently detect the association of rare genetic variants with disease.

2.
The extent to which variants in the protein-coding sequence of genes contribute to risk of rheumatoid arthritis (RA) is unknown. In this study, we addressed this issue by deep exon sequencing and large-scale genotyping of 25 biological candidate genes located within RA risk loci discovered by genome-wide association studies (GWASs). First, we assessed the contribution of rare coding variants in the 25 genes to the risk of RA in a pooled sequencing study of 500 RA cases and 650 controls of European ancestry. We observed an accumulation of rare nonsynonymous variants exclusive to RA cases in IL2RA and IL2RB (burden test: p = 0.007 and p = 0.018, respectively). Next, we assessed the aggregate contribution of low-frequency and common coding variants to the risk of RA by dense genotyping of the 25 gene loci in 10,609 RA cases and 35,605 controls. We observed a strong enrichment of coding variants with a nominal signal of association with RA (p < 0.05) after adjusting for the best signal of association at the loci (p_enrichment = 6.4 × 10^-4). For one locus containing CD2, we found that a missense variant, rs699738 (c.798C>A [p.His266Gln]), and a noncoding variant, rs624988, reside on distinct haplotypes and independently contribute to the risk of RA (p = 4.6 × 10^-6). Overall, our results indicate that variants (distributed across the allele-frequency spectrum) within the protein-coding portion of a subset of biological candidate genes identified by GWASs contribute to the risk of RA. Further, we have demonstrated that very large sample sizes will be required for comprehensively identifying the independent alleles contributing to the missing heritability of RA.

3.
Genome-wide association studies (GWAS) have been successful in identifying common genetic variation reproducibly associated with disease. However, most associated variants confer very small risk, and after meta-analysis of large cohorts a large fraction of expected heritability still remains unexplained. A possible explanation is that rare variants currently undetected by GWAS with SNP arrays could contribute a large fraction of risk when present in cases. This concept has spurred great interest in exploring the role of rare variants in disease. As the cost of sequencing continues to plummet, it is becoming feasible to directly sequence case-control samples for testing disease association including rare variants. We have developed a test statistic that allows for association testing among cases and controls using data directly from sequencing reads. In addition, our method allows for random errors in reads. We determine the probability of a true genotype call based on the observed base pair reads using the expectation-maximization algorithm. We apply the SumStat procedure to obtain a single statistic for a group of multiple rare variant loci. We document the validity of our method through simulations. Our results suggest that our statistic maintains the correct type I error rate, even in the presence of differential misclassification for sequence reads, and that it has good power under a number of scenarios. Finally, our SumStat results show power at least as good as the maximum single locus results.
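To make the genotype-probability step in the abstract above concrete, here is a minimal sketch of how a posterior over genotypes could be computed from read counts when reads carry random errors. It is not the authors' method: the binomial read model, the Hardy-Weinberg prior, and the names `genotype_posterior`, `alt_freq`, and `error_rate` are all assumptions made for illustration. In an EM scheme, a calculation like this would play the role of the E-step, with the allele frequency re-estimated from the posteriors in the M-step.

```python
from math import comb


def genotype_posterior(alt_reads, total_reads, alt_freq, error_rate=0.01):
    """Posterior P(genotype | reads) for 0, 1, or 2 copies of the alternate allele.

    Assumptions (illustrative only): reads are independent, each read is
    miscalled with probability `error_rate`, and genotype priors follow
    Hardy-Weinberg proportions for the given alternate-allele frequency.
    """
    priors = [(1 - alt_freq) ** 2, 2 * alt_freq * (1 - alt_freq), alt_freq ** 2]
    weights = []
    for copies, prior in enumerate(priors):
        # Probability that a single read shows the alternate allele
        p_alt = (copies / 2) * (1 - error_rate) + (1 - copies / 2) * error_rate
        likelihood = (
            comb(total_reads, alt_reads)
            * p_alt ** alt_reads
            * (1 - p_alt) ** (total_reads - alt_reads)
        )
        weights.append(prior * likelihood)
    total = sum(weights)
    return [w / total for w in weights]


# Five alternate reads out of ten at a rare site: the heterozygous call dominates
posterior = genotype_posterior(alt_reads=5, total_reads=10, alt_freq=0.01)
print([round(p, 6) for p in posterior])
```

With few reads or a higher error rate the posterior becomes less decisive, which is why a test built on read-level probabilities rather than hard genotype calls can be attractive.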

4.
The contribution of rare coding sequence variants to genetic susceptibility in complex disorders is an important but unresolved question. Most studies thus far have investigated a limited number of genes from regions which contain common disease associated variants. Here we investigate this in inflammatory bowel disease by sequencing the exons and proximal promoters of 531 genes selected from both genome-wide association studies and pathway analysis in pooled DNA panels from 474 cases of Crohn’s disease and 480 controls. 80 variants with evidence of association in the sequencing experiment or with potential functional significance were selected for follow up genotyping in 6,507 IBD cases and 3,064 population controls. The top 5 disease associated variants were genotyped in an extension panel of 3,662 IBD cases and 3,639 controls, and tested for association in a combined analysis of 10,147 IBD cases and 7,008 controls. A rare coding variant p.G454C in the BTNL2 gene within the major histocompatibility complex was significantly associated with increased risk for IBD (p = 9.65 × 10^-10, OR = 2.3 [95% CI = 1.75–3.04]), but was independent of the known common associated CD and UC variants at this locus. Rare (<1%) and low frequency (1–5%) variants in 3 additional genes showed suggestive association (p < 0.005) with either an increased risk (ARIH2 c.338-6C>T) or decreased risk (IL12B p.V298F, and NICN p.H191R) of IBD. These results provide additional insights into the involvement of the inhibition of T cell activation in the development of both sub-phenotypes of inflammatory bowel disease. We suggest that although rare coding variants may make a modest overall contribution to complex disease susceptibility, they can inform our understanding of the molecular pathways that contribute to pathogenesis.

5.
PLoS ONE, 2014, 9(8)
Asthma is a complex genetic disease caused by a combination of genetic and environmental risk factors. We sought to test classes of genetic variants largely missed by genome-wide association studies (GWAS), including copy number variants (CNVs) and low-frequency variants, by performing whole-genome sequencing (WGS) on 16 individuals from asthma-enriched and asthma-depleted families. The samples were obtained from an extended 13-generation Hutterite pedigree with reduced genetic heterogeneity due to a small founding gene pool and reduced environmental heterogeneity as a result of a communal lifestyle. We sequenced each individual to an average depth of 13-fold, generated a comprehensive catalog of genetic variants, and tested the most severe mutations for association with asthma. We identified and validated 1960 CNVs, 19 nonsense or splice-site single nucleotide variants (SNVs), and 18 insertions or deletions that were out of frame. As follow-up, we performed targeted sequencing of 16 genes in 837 cases and 540 controls of Puerto Rican ancestry and found that controls carry a significantly higher burden of mutations in IL27RA (2.0% of controls; 0.23% of cases; nominal p = 0.004; Bonferroni p = 0.21). We also genotyped 593 CNVs in 1199 Hutterite individuals. We identified a nominally significant association (p = 0.03; odds ratio (OR) = 3.13) between a 6 kbp deletion in an intron of NEDD4L and increased risk of asthma. We genotyped this deletion in an additional 4787 non-Hutterite individuals (nominal p = 0.056; OR = 1.69). NEDD4L is expressed in bronchial epithelial cells, and conditional knockout of this gene in the lung in mice leads to severe inflammation and mucus accumulation. Our study represents one of the early instances of applying WGS to complex disease with a large environmental component and demonstrates how WGS can identify risk variants, including CNVs and low-frequency variants, largely untested in GWAS.

6.
Next-generation sequencing and genome-wide association studies represent powerful tools to identify genetic variants that confer disease risk within populations. On their own, however, they cannot provide insight into how these variants contribute to individual risk for diseases that exhibit complex inheritance, or alternatively confer health in a given individual. Even in the case of well-characterized variants that confer a significant disease risk, more healthy individuals carry the variant, with no apparent ill effect, than those who manifest disease. Access to low-cost genome sequence data promises to provide an unprecedentedly detailed view of the nature of the hereditary component of complex diseases, but requires the large-scale comparison of sequence data from individuals with and without disease to deliver a clinical calibration. The provision of informatics support remains problematic as there are currently no means to interpret the data generated. Here, we initiate this process, a prerequisite for such a study, by narrowing the focus from an entire genome to that of a single biological system. To this end, we examine the ‘Hemostaseome,’ and more specifically focus on DNA sequence changes pertaining to those human genes known to impact upon hemostasis and thrombosis that can be analyzed coordinately, and on an individual basis, to interrogate how specific combinations of variants act to confer disease predisposition. As a first step, we delineate known members of the Hemostaseome and explore the nature of the genetic variants that may cause disease in individuals whose hemostatic balance has become shifted toward either a prothrombotic or anticoagulant phenotype.

7.
Alzheimer disease (AD) is the most common cause of dementia. As with many complex diseases, the identified variants do not explain the total expected genetic risk that is based on heritability estimates for AD. Isolated founder populations, such as the Amish, are advantageous for genetic studies as they overcome heterogeneity limitations associated with complex population studies. We determined that Amish AD cases harbored a significantly higher burden of the known risk alleles compared to Amish cognitively normal controls, but a significantly lower burden when compared to cases from a dataset of unrelated individuals. Whole-exome sequencing of a selected subset of the overall study population was used as a screening tool to identify variants located in the regions of the genome that are most likely to contribute risk. By then genotyping the top candidate variants from the known AD genes and from linkage regions implicated in previous studies in the full dataset, new associations could be confirmed. The most significant result (p = 0.0012) was for rs73938538, a synonymous variant in LAMA1 within the previously identified linkage peak on chromosome 18. However, this association is specific to the Amish and did not generalize when tested in a dataset of unrelated individuals. These results suggest that additional risk variation in the Amish remains to be identified and likely resides outside of the classical protein coding gene regions.

8.
Children affected by Specific Language Impairment (SLI) fail to acquire age appropriate language skills despite adequate intelligence and opportunity. SLI is highly heritable, but the understanding of underlying genetic mechanisms has proved challenging. In this study, we use molecular genetic techniques to investigate an admixed isolated founder population from the Robinson Crusoe Island (Chile), who are affected by a high incidence of SLI, increasing the power to discover contributory genetic factors. We utilize exome sequencing in selected individuals from this population to identify eight coding variants that are of putative significance. We then apply association analyses across the wider population to highlight a single rare coding variant (rs144169475, Minor Allele Frequency of 4.1% in admixed South American populations) in the NFXL1 gene that confers a nonsynonymous change (N150K) and is significantly associated with language impairment in the Robinson Crusoe population (p = 2.04 × 10^-4, 8 variants tested). Subsequent sequencing of NFXL1 in 117 UK SLI cases identified four individuals with heterozygous variants predicted to be of functional consequence. We conclude that coding variants within NFXL1 confer an increased risk of SLI within a complex genetic model.

9.
With the recent development of whole-exome sequencing enrichment designs for the dog, a novel tool for disease-association studies became available. The aim of disease-association studies is to identify one or a very limited number of putative causal variants or genes from the large pool of genetic variation. To maximize the efficiency of these studies and to provide some directions of what to expect, we evaluated the effect on variant reduction for various combinations of cases and controls for both dominant and recessive types of inheritance assuming variable degrees of penetrance and detectance. In this study, variant data of 14 dogs (13 Labrador Retrievers and one Dogue de Bordeaux), obtained by whole-exome sequencing, were analyzed. In the filtering process, we found that unrelated dogs from the same breed share up to 70% of their variants, which is likely a consequence of the breeding history of the dog. For the designs tested with unrelated dogs, combining two cases and two controls gave the best result. These results were improved further by adding closely related dogs. Reduced penetrance and/or detectance has a drastic effect on the efficiency and is likely to have a profound effect on the sample size needed to elucidate the causal variant. Overall, we demonstrated that sequencing a small number of dogs results in a marked reduction of variants that is likely sufficient to pinpoint causal variants or genes.

10.
State-of-the-art next-generation-sequencing technologies can facilitate in-depth explorations of the human genome by investigating both common and rare variants. For the identification of genetic factors that are associated with disease risk or other complex phenotypes, methods have been proposed for jointly analyzing variants in a set (e.g., all coding SNPs in a gene). Variants in a properly defined set could be associated with risk or phenotype in a concerted fashion, and by accumulating information from them, one can improve power to detect genetic risk factors. Many set-based methods in the literature are based on statistics that can be written as the summation of variant statistics. Here, we propose taking the summation of the exponential of variant statistics as the set summary for association testing. From both Bayesian and frequentist perspectives, we provide theoretical justification for taking the sum of the exponential of variant statistics because it is particularly powerful for sparse alternatives (that is, compared with the large number of variants being tested in a set, only relatively few variants are associated with disease risk), a distinctive feature of genetic data. We applied the exponential combination gene-based test to a sequencing study in anticancer pharmacogenomics and uncovered mechanistic insights into genes and pathways related to chemotherapeutic susceptibility for an important class of oncologic drugs.
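The contrast drawn in the abstract above, between summing variant statistics and summing their exponentials, can be illustrated with a small sketch. The exact statistic and scaling used in the paper are not given in the abstract, so the `scale` parameter, the function names, and the toy per-variant values below are assumptions. The sketch only compares the two set summaries; a real test would still need a null distribution (for example, from permutations) to turn either summary into a p-value.

```python
from math import exp


def sum_statistic(variant_stats):
    """Plain summation of per-variant statistics."""
    return sum(variant_stats)


def exponential_combination(variant_stats, scale=1.0):
    """Sum of exponentials of per-variant statistics (scale is a hypothetical knob)."""
    return sum(exp(scale * s) for s in variant_stats)


# Toy per-variant statistics (made up): one strong signal among nine near-null
# variants versus ten uniformly weak signals.
sparse = [0.1] * 9 + [9.0]
dense = [1.0] * 10

print(sum_statistic(sparse), sum_statistic(dense))
print(exponential_combination(sparse), exponential_combination(dense))
```

The plain sum rates the two sets almost equally, while the exponential summary is dominated by the single large statistic in the sparse set, which is the intuition behind its advantage when only a few variants in a set matter.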

11.
Anterior chamber depth (ACD) is a key anatomical risk factor for primary angle closure glaucoma (PACG). We conducted a genome-wide association study (GWAS) on ACD to discover novel genes for PACG on a total of 5,308 population-based individuals of Asian descent. Genome-wide significant association was observed at a sequence variant within ABCC5 (rs1401999; per-allele effect size = −0.045 mm, P = 8.17 × 10^-9). This locus was associated with an increase in risk of PACG in a separate case-control study of 4,276 PACG cases and 18,801 controls (per-allele OR = 1.13 [95% CI: 1.06–1.22], P = 0.00046). The association was strengthened when a sub-group of controls with open angles were included in the analysis (per-allele OR = 1.30, P = 7.45 × 10^-9; 3,458 cases vs. 3,831 controls). Our findings suggest that the increase in PACG risk could in part be mediated by genetic sequence variants influencing anterior chamber dimensions.

12.
Two coding variants in the APOL1 gene (G1 and G2) explain most of the high rate of kidney disease in African Americans. APOL1-associated kidney disease risk inheritance follows an autosomal recessive pattern: the relative risk of kidney disease associated with inheritance of two high-risk variants is 7–30 fold, depending on the specific kidney phenotype. We wished to determine if the variability in phenotype might in part reflect structural differences in the APOL1 gene. We analyzed sequence coverage from 1000 Genomes Project Phase 3 samples as well as exome sequencing data from African American kidney disease cases for copy number variation. Eight samples sequenced in the 1000 Genomes Project showed increased coverage over a ~100 kb region that includes APOL2, APOL1 and part of MYH9, suggesting the presence of APOL1 copy number greater than 2. We reasoned that such duplications should be enriched in apparent G1 heterozygotes with kidney disease. Using a PCR-based assay, we observed the presence of this duplication in additional samples from apparent G0G1 or G0G2 individuals. The frequency of this APOL1 duplication was compared among cases (n = 123) and controls (n = 255) with apparent G0G1 heterozygosity. The presence of APOL1 duplication was observed in 4.06% of cases and 0.78% of controls, preliminary evidence that this APOL1 duplication may alter susceptibility to kidney disease (p = 0.03). Taqman-based copy number assays confirmed the presence of 3 APOL1 copies in individuals positive for this specific duplication by PCR assay, but also identified a small number of individuals with additional APOL1 copies of presumably different structure. These observations motivate further studies to better assess the contribution of APOL1 copy number to kidney disease risk and to APOL1 function. Investigators and clinicians genotyping APOL1 should also consider whether the particular genotyping platform used is subject to technical errors when more than two copies of APOL1 are present.
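The case-control comparison in the abstract above can be checked with a few lines of arithmetic. The carrier counts are not stated in the abstract; 5 of 123 cases and 2 of 255 controls are inferred here from the reported 4.06% and 0.78% and should be treated as an assumption. The abstract also does not say which test produced p = 0.03, so the one-sided Fisher exact test below (function name `fisher_one_sided` is illustrative) is only one plausible choice.

```python
from math import comb


def fisher_one_sided(case_carriers, n_cases, control_carriers, n_controls):
    """One-sided Fisher exact p-value: P(at least this many carriers among cases).

    Uses the hypergeometric distribution with all margins held fixed.
    """
    total = n_cases + n_controls
    carriers = case_carriers + control_carriers
    denominator = comb(total, carriers)
    upper = min(carriers, n_cases)
    numerator = sum(
        comb(n_cases, k) * comb(n_controls, carriers - k)
        for k in range(case_carriers, upper + 1)
    )
    return numerator / denominator


# Counts inferred from the reported percentages (assumption, not stated in the abstract)
print(100 * 5 / 123, 100 * 2 / 255)  # carrier frequencies in percent
print(fisher_one_sided(5, 123, 2, 255))
```

Under these assumed counts the exact test gives a p-value of similar magnitude to the reported one, though not identical to it, which is consistent with the authors describing the finding as preliminary.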

13.
There is strong evidence that rare variants are involved in complex disease etiology. The first step in implicating rare variants in disease etiology is their identification through sequencing in both randomly ascertained samples (e.g., the 1000 Genomes Project) and samples ascertained according to disease status. We investigated to what extent rare variants will be observed across the genome and in candidate genes in randomly ascertained samples, the magnitude of variant enrichment in diseased individuals, and biases that can occur due to how variants are discovered. Although sequencing cases can enrich for causal variants, when a gene or genes are not involved in disease etiology, limiting variant discovery to cases can lead to association studies with dramatically inflated false positive rates.

14.
Rapid advances in sequencing technologies set the stage for the large-scale medical sequencing efforts to be performed in the near future, with the goal of assessing the importance of rare variants in complex diseases. The discovery of new disease susceptibility genes requires powerful statistical methods for rare variant analysis. The low frequency and the expected large number of such variants pose great difficulties for the analysis of these data. We propose here a robust and powerful testing strategy to study the role rare variants may play in affecting susceptibility to complex traits. The strategy is based on assessing whether rare variants in a genetic region collectively occur at significantly higher frequencies in cases compared with controls (or vice versa). A main feature of the proposed methodology is that, although it is an overall test assessing a possibly large number of rare variants simultaneously, the disease variants can be both protective and risk variants, with moderate decreases in statistical power when both types of variants are present. Using simulations, we show that this approach can be powerful under complex and general disease models, as well as in larger genetic regions where the proportion of disease susceptibility variants may be small. Comparisons with previously published tests on simulated data show that the proposed approach can have better power than the existing methods. An application to a recently published study on Type-1 Diabetes finds rare variants in gene IFIH1 to be protective against Type-1 Diabetes.

15.
Previous studies have reported the association between multiple genetic variants in the enamel-formation genes and the risk of dental caries with inconsistent results. We performed a systematic literature search of the PubMed, Cochrane Library, HuGE and Google Scholar databases for studies published before March 21, 2020 and conducted meta-, gene-based and gene-cluster analysis on the association between genetic variants in the enamel-formation genes and the risk of dental caries. We identified 21 relevant publications including a total of 24 studies for analysis. The genetic variant rs17878486 in AMELX was significantly associated with dental caries risk (OR = 1.40, 95% CI: 1.02–1.93, P = 0.037). We found no significant association between the risk of dental caries and rs12640848 in ENAM (OR = 1.15, 95% CI: 0.88–1.52, P = 0.310), rs1784418 in MMP20 (OR = 1.07, 95% CI: 0.76–1.49, P = 0.702) or rs3796704 in ENAM (OR = 1.06, 95% CI: 0.96–1.17, P = 0.228). Gene-based analysis indicated that multiple genetic variants in AMELX showed joint association with the risk of dental caries (6 variants; P < 10^-5), as did genetic variants in MMP13 (3 variants; P = 0.004), MMP2 (3 variants; P < 10^-5), MMP20 (2 variants; P < 10^-5) and MMP3 (2 variants; P < 10^-5). The gene-cluster analysis indicated a significant association between the genetic variants in this enamel-formation gene cluster and the risk of dental caries (P < 10^-5). The present meta-analysis revealed that genetic variant rs17878486 in AMELX was associated with dental caries, and multiple genetic variants in the enamel-formation genes jointly contributed to the risk of dental caries, supporting the role of genetic variants in the enamel-formation genes in the etiology of dental caries.
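The odds ratios, confidence intervals, and p-values reported in the abstract above are linked by a standard relationship, which gives a quick consistency check. The sketch below assumes a Wald-type interval that is symmetric on the log-odds scale and a two-sided normal test; the abstract does not say how the interval was built, and the function name `p_from_ratio_ci` is illustrative.

```python
from math import erfc, log, sqrt


def p_from_ratio_ci(estimate, lower, upper, z_crit=1.959964):
    """Recover the z-score and two-sided p-value from a ratio and its 95% CI.

    Assumes the interval is estimate * exp(+/- z_crit * SE) on the log scale.
    """
    standard_error = (log(upper) - log(lower)) / (2 * z_crit)
    z = log(estimate) / standard_error
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail of the standard normal
    return z, p_value


# rs17878486 in AMELX: OR = 1.40, 95% CI 1.02-1.93, reported P = 0.037
z, p = p_from_ratio_ci(1.40, 1.02, 1.93)
print(round(z, 2), round(p, 3))
```

The recovered p-value lands close to the reported 0.037; small differences are expected because the published odds ratio and interval bounds are rounded to two decimals.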

16.
The KCC2 cotransporter establishes the low neuronal Cl− levels required for GABA(A) and glycine (Gly) receptor-mediated inhibition, and KCC2 deficiency in model organisms results in network hyperexcitability. However, no mutations in KCC2 have been documented in human disease. Here, we report two non-synonymous functional variants in human KCC2, R952H and R1049C, exhibiting clear statistical association with idiopathic generalized epilepsy (IGE). These variants reside in conserved residues in the KCC2 cytoplasmic C-terminus, exhibit significantly impaired Cl− extrusion capacities resulting in less hyperpolarized Gly equilibrium potentials (E_Gly), and impair KCC2 stimulatory phosphorylation at serine 940, a key regulatory site. These data describe a novel KCC2 variant significantly associated with a human disease and suggest genetically encoded impairment of KCC2 functional regulation may be a risk factor for the development of human IGE.

17.
In the past few years, case-control studies of common diseases have shifted their focus from single genes to whole exomes. New sequencing technologies now routinely detect hundreds of thousands of sequence variants in a single study, many of which are rare or even novel. The limitation of classical single-marker association analysis for rare variants has been a challenge in such studies. A new generation of statistical methods for case-control association studies has been developed to meet this challenge. A common approach to association analysis of rare variants is to use burden-style collapsing methods that combine rare variant data within individuals across or within genes. Here, we propose a new hybrid likelihood model that combines a burden test with a test of the position distribution of variants. In extensive simulations and on empirical data from the Dallas Heart Study, the new model demonstrates consistently good power, in particular when applied to a gene set (e.g., multiple candidate genes with shared biological function or pathway), when rare variants cluster in key functional regions of a gene, and when protective variants are present. When applied to data from an ongoing sequencing study of bipolar disorder (191 cases, 107 controls), the model identifies seven gene sets that are nominally significant at the 0.05 level, of which one MAPK signaling pathway (KEGG) reaches trend-level significance after correcting for multiple testing.
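The burden-style collapsing idea mentioned in the abstract above can be sketched in a few lines. This is a generic illustration, not the hybrid likelihood model the authors propose: each individual is collapsed to a carrier flag (any rare allele in the region), and the difference in carrier rates between cases and controls is assessed by permuting the flags. The function names, the permutation count, and the toy genotypes are assumptions.

```python
import random


def collapse(genotypes):
    """Return 1 for each individual carrying any rare allele in the region, else 0."""
    return [1 if any(count > 0 for count in row) else 0 for row in genotypes]


def burden_permutation_test(case_genotypes, control_genotypes, n_perm=2000, seed=0):
    """Permutation p-value for a difference in collapsed carrier rates."""
    rng = random.Random(seed)
    flags = collapse(case_genotypes) + collapse(control_genotypes)
    n_cases = len(case_genotypes)
    n_controls = len(control_genotypes)

    def rate_difference(values):
        case_rate = sum(values[:n_cases]) / n_cases
        control_rate = sum(values[n_cases:]) / n_controls
        return abs(case_rate - control_rate)

    observed = rate_difference(flags)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(flags)  # break any link between carrier status and disease status
        if rate_difference(flags) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero
    return (hits + 1) / (n_perm + 1)


# Toy data: rows are individuals, columns are rare-variant sites (allele counts)
cases = [[1, 0, 0]] * 20 + [[0, 0, 0]] * 30
controls = [[0, 0, 0]] * 50
print(burden_permutation_test(cases, controls))
```

A plain collapsing statistic like this loses power when risk and protective variants sit in the same region, because their effects cancel after collapsing; that weakness is part of the motivation for combined approaches such as the one described in the abstract.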

18.
Recent developments in sequencing technologies have made it possible to uncover both rare and common genetic variants. Genome-wide association studies (GWASs) can test for the effect of common variants, whereas sequence-based association studies can evaluate the cumulative effect of both rare and common variants on disease risk. Many groupwise association tests, including burden tests and variance-component tests, have been proposed for this purpose. Although such tests do not exclude common variants from their evaluation, they focus mostly on testing the effect of rare variants by upweighting rare-variant effects and downweighting common-variant effects and can therefore lose substantial power when both rare and common genetic variants in a region influence trait susceptibility. There is increasing evidence that the allelic spectrum of risk variants at a given locus might include novel, rare, low-frequency, and common genetic variants. Here, we introduce several sequence kernel association tests to evaluate the cumulative effect of rare and common variants. The proposed tests are computationally efficient and are applicable to both binary and continuous traits. Furthermore, they can readily combine GWAS and whole-exome-sequencing data on the same individuals, when available, and are also applicable to deep-resequencing data of GWAS loci. We evaluate these tests on data simulated under comprehensive scenarios and show that compared with the most commonly used tests, including the burden and variance-component tests, they can achieve substantial increases in power. We next show applications to sequencing studies for Crohn disease and autism spectrum disorders. The proposed tests have been incorporated into the software package SKAT.

19.
Ion channel mutations are an important cause of rare Mendelian disorders affecting brain, heart, and other tissues. We performed parallel exome sequencing of 237 channel genes in a well-characterized human sample, comparing variant profiles of unaffected individuals to those with the most common neuronal excitability disorder, sporadic idiopathic epilepsy. Rare missense variation in known Mendelian disease genes is prevalent in both groups at similar complexity, revealing that even deleterious ion channel mutations confer uncertain risk to an individual depending on the other variants with which they are combined. Our findings indicate that variant discovery via large scale sequencing efforts is only a first step in illuminating the complex allelic architecture underlying personal disease risk. We propose that in silico modeling of channel variation in realistic cell and network models will be crucial to future strategies assessing mutation profile pathogenicity and drug response in individuals with a broad spectrum of excitability disorders.

20.
Asthma is a complex disease involving genetic and environmental aetiology. The tumour necrosis factor-alpha (TNF-alpha) and angiotensin-converting enzyme (ACE) genes have been implicated in asthma pathogenesis. This study investigated the association of a G-308A variant of TNF-alpha and an insertion/deletion (I/D) variant of ACE with a self-reported history of childhood asthma, in two population groups. At Northwick Park Hospital, London, 1,811 pregnant women attending for antenatal care were recruited. Participants with a self-reported history of childhood asthma, determined by a researcher-administered questionnaire, and controls with no personal or family history of asthma, of UK/Irish (cases n = 20; controls n = 416) and South Asian (cases n = 6; controls n = 275) origin were used in this study. Participants were genotyped for the TNF-alpha-308 and ACE I/D variants by a PCR-RFLP and PCR approach. The TNF-alpha-308 allele 2 (-308A) was significantly associated with self-reported childhood asthma in the UK/Irish (odds ratio (OR): 2.6; 95% confidence interval (CI): 1.1-6.2; P = 0.03) but not in the South Asian population. The ACE DD genotype was not associated with childhood asthma in either population group. Gametic phase disequilibrium between the TNF-alpha-308 and ACE I/D variants was significantly different from zero in UK/Irish cases (delta = 0.09; P = 0.034). The TNF-alpha-308 allele 2 or a linked major histocompatibility complex (MHC) variant may be a genetic risk factor for childhood asthma in the UK/Irish sample.
