Similar articles
1.
Primary open angle glaucoma (POAG) is a complex disease and is one of the major leading causes of blindness worldwide. Genome-wide association studies have successfully identified several common variants associated with glaucoma; however, most of these variants only explain a small proportion of the genetic risk. Apart from the standard approach to identify main effects of variants across the genome, it is believed that gene-gene interactions can help elucidate part of the missing heritability by allowing for the test of interactions between genetic variants to mimic the complex nature of biology. To explain the etiology of glaucoma, we first performed a genome-wide association study (GWAS) on glaucoma case-control samples obtained from electronic medical records (EMR) to establish the utility of EMR data in detecting non-spurious and relevant associations; this analysis was aimed at confirming already known associations with glaucoma and validating the EMR derived glaucoma phenotype. Our findings from GWAS suggest consistent evidence of several known associations in POAG. We then performed an interaction analysis for variants found to be marginally associated with glaucoma (SNPs with main effect p-value <0.01) and observed interesting findings in the electronic MEdical Records and GEnomics Network (eMERGE) network dataset. Genes from the top epistatic interactions from eMERGE data (Likelihood Ratio Test i.e. LRT p-value <1e-05) were then tested for replication in the NEIGHBOR consortium dataset. To replicate our findings, we performed a gene-based SNP-SNP interaction analysis in NEIGHBOR and observed significant gene-gene interactions (p-value <0.001) among the top 17 gene-gene models identified in the discovery phase. 
Variants from the gene-gene interaction analysis that we found to be associated with POAG explain 3.5% of additional genetic variance in the eMERGE dataset above what is explained by SNPs in genes replicated from previous GWAS (which explained only 2.1% of the variance in the eMERGE dataset); in the NEIGHBOR dataset, adding replicated SNPs from the gene-gene interaction analysis explains 3.4% of total variance, whereas GWAS SNPs alone explain only 2.8%. Exploring gene-gene interactions may provide additional insights into many complex traits when explored in properly designed and powered association studies.
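The abstract above reports likelihood ratio test (LRT) p-values for interaction models. As a minimal sketch, assuming the full model adds exactly one interaction term to the reduced model (so the test has 1 degree of freedom), the p-value can be obtained from the two log-likelihoods as follows. The function name and the example log-likelihoods are hypothetical and are not taken from the study.

```python
import math


def lrt_pvalue_1df(loglik_reduced, loglik_full):
    """Likelihood ratio test for nested models differing by one parameter.

    Returns (statistic, p_value). The 1-df chi-square tail probability is
    computed with the identity P(chi2_1 > x) = erfc(sqrt(x / 2)).
    """
    stat = 2.0 * (loglik_full - loglik_reduced)
    if stat < 0:
        raise ValueError("full model must fit at least as well as the reduced model")
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Hypothetical log-likelihoods: main effects only vs. main effects + SNP x SNP term.
stat, p = lrt_pvalue_1df(-100.0, -98.0)
print(f"LRT statistic = {stat:.2f}, p = {p:.4f}")
```

A model with several interaction terms would need a chi-square tail with more degrees of freedom, which this sketch does not cover.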

2.
Several lines of evidence suggest that genome-wide association studies (GWAS) have the potential to explain more of the “missing heritability” of common complex phenotypes. However, reliable methods to identify a larger proportion of single nucleotide polymorphisms (SNPs) that impact disease risk are currently lacking. Here, we use a genetic pleiotropy-informed conditional false discovery rate (FDR) method on GWAS summary statistics data to identify new loci associated with schizophrenia (SCZ) and bipolar disorder (BD), two highly heritable disorders with significant missing heritability. Epidemiological and clinical evidence suggest similar disease characteristics and overlapping genes between SCZ and BD. We computed conditional Q–Q curves of data from the Psychiatric Genome Consortium (SCZ: n = 9,379 cases and n = 7,736 controls; BD: n = 6,990 cases and n = 4,820 controls) to show enrichment of SNPs associated with SCZ as a function of association with BD and vice versa, with a corresponding reduction in FDR. Applying the conditional FDR method, we identified 58 loci associated with SCZ and 35 loci associated with BD below the conditional FDR level of 0.05. Of these, 14 loci were associated with both SCZ and BD (conjunction FDR). Together, these findings show the feasibility of genetic pleiotropy-informed methods to improve gene discovery in SCZ and BD and indicate overlapping genetic mechanisms between these two disorders.
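To make the idea of a conditional FDR concrete, here is a minimal sketch of one common empirical estimator: the primary-trait p-value divided by the empirical conditional distribution function, evaluated among SNPs whose conditional-trait p-value is at least as small. This is an illustration under that assumption, not necessarily the exact estimator used in the study, and the p-values are made up.

```python
def conditional_fdr(p_primary, p_conditional):
    """Empirical conditional FDR per SNP (illustrative estimator).

    For SNP i with p-values (p_i, q_i):
        cFDR_i = p_i / F(p_i | q_i)
    where F(p_i | q_i) is the fraction of SNPs with primary p <= p_i among
    the SNPs whose conditional p-value is <= q_i. Values are capped at 1.
    """
    pairs = list(zip(p_primary, p_conditional))
    result = []
    for p_i, q_i in pairs:
        # SNP i is always in its own subset, so the subset is never empty.
        subset = [p for p, q in pairs if q <= q_i]
        frac = sum(1 for p in subset if p <= p_i) / len(subset)
        result.append(min(1.0, p_i / frac))
    return result


# Hypothetical p-values for four SNPs in two traits.
scz = [0.001, 0.5, 0.04, 0.9]
bd = [0.01, 0.8, 0.02, 0.6]
print(conditional_fdr(scz, bd))
```

The quadratic loop is fine for a toy example; genome-wide data would call for sorting or binning.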

3.
Genome-wide association studies (GWAS) are widely used to search for genetic loci that underlie human disease. Another goal is to predict disease risk for different individuals given their genetic sequence. Such predictions could either be used as a “black box” in order to promote changes in lifestyle and screening for early diagnosis, or as a model that can be studied to better understand the mechanism of the disease. Current methods for risk prediction typically rank single nucleotide polymorphisms (SNPs) by the p-value of their association with the disease, and use the top-associated SNPs as input to a classification algorithm. However, the predictive power of such methods is relatively poor. To improve the predictive power, we devised BootRank, which uses bootstrapping in order to obtain a robust prioritization of SNPs for use in predictive models. We show that BootRank improves the ability to predict disease risk of unseen individuals in the Wellcome Trust Case Control Consortium (WTCCC) data and results in a more robust set of SNPs and a larger number of enriched pathways being associated with the different diseases. Finally, we show that combining BootRank with seven different classification algorithms improves performance compared to previous studies that used the WTCCC data. Notably, diseases for which BootRank results in the largest improvements were recently shown to have more heritability than previously thought, likely due to contributions from variants with low minor allele frequency (MAF), suggesting that BootRank can be beneficial in cases where SNPs affecting the disease are poorly tagged or have low MAF. Overall, our results show that improving disease risk prediction from genotypic information may be a tangible goal, with potential implications for personalized disease screening and treatment.
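The general idea of ranking SNPs by bootstrapping can be sketched as follows. This is not the BootRank algorithm itself, whose details the abstract does not give: the association score (absolute difference in mean allele count between cases and controls) and the rank-averaging step are stand-in assumptions chosen to keep the example self-contained.

```python
import random


def assoc_score(genotypes, labels, j):
    """Stand-in association score for SNP j: |mean(cases) - mean(controls)|."""
    cases = [g[j] for g, y in zip(genotypes, labels) if y == 1]
    ctrls = [g[j] for g, y in zip(genotypes, labels) if y == 0]
    if not cases or not ctrls:
        return 0.0  # degenerate resample with only one class
    return abs(sum(cases) / len(cases) - sum(ctrls) / len(ctrls))


def bootstrap_rank(genotypes, labels, n_boot=200, seed=0):
    """Return SNP indices ordered by their mean rank over bootstrap resamples."""
    rng = random.Random(seed)
    n, m = len(genotypes), len(genotypes[0])
    rank_sum = [0.0] * m
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample individuals
        g = [genotypes[i] for i in idx]
        y = [labels[i] for i in idx]
        scores = [assoc_score(g, y, j) for j in range(m)]
        order = sorted(range(m), key=lambda j: -scores[j])  # best score first
        for rank, j in enumerate(order):
            rank_sum[j] += rank
    return sorted(range(m), key=lambda j: rank_sum[j] / n_boot)


# Toy data: SNP 0 separates cases from controls; SNPs 1 and 2 carry no signal.
genotypes = [[2, 1, 1]] * 10 + [[0, 1, 1]] * 10
labels = [1] * 10 + [0] * 10
print(bootstrap_rank(genotypes, labels))
```

Averaging ranks across resamples favors SNPs whose signal is stable rather than driven by a few individuals, which is the motivation the abstract describes.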

4.
Extensive genetic studies have identified a large number of causal genetic variations in many human phenotypes; however, these could not completely explain heritability in complex diseases. Some researchers have proposed that the “missing heritability” may be attributable to gene–gene and gene–environment interactions. Because there are billions of potential interaction combinations, a single study often lacks the statistical power to detect these interactions. Meta-analysis is a common method of increasing detection power; however, accessing individual data could be difficult. This study presents a simple method that employs aggregated summary values from a “case” group to detect these interactions, based on rare disease and independence assumptions. However, these assumptions, particularly the rare disease assumption, may be violated in real situations; therefore, this study further investigated the robustness of our proposed method when these assumptions are violated. In conclusion, we observed that the rare disease assumption is relatively nonessential, whereas the independence assumption is an essential component. Because single nucleotide polymorphisms (SNPs) are often unrelated to environmental factors and SNPs on other chromosomes, researchers should use this method to investigate gene–gene and gene–environment interactions when they are unable to obtain detailed individual patient data.
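A minimal sketch of the classical case-only estimator shows how aggregated counts from cases alone can carry interaction information. Assuming the two factors are independent in the source population and the disease is rare, the odds ratio between the factors among cases approximates the multiplicative interaction odds ratio. Whether the study uses exactly this estimator is not stated in the abstract; the counts below are hypothetical.

```python
import math


def case_only_interaction(n_both, n_g_only, n_e_only, n_neither):
    """Case-only interaction odds ratio with a Wald 95% confidence interval.

    Arguments are counts among cases only:
      n_both    - carriers of factor G who are also exposed to factor E
      n_g_only  - carriers of G, unexposed to E
      n_e_only  - non-carriers of G, exposed to E
      n_neither - non-carriers, unexposed
    """
    odds_ratio = (n_both * n_neither) / (n_g_only * n_e_only)
    se = math.sqrt(1 / n_both + 1 / n_g_only + 1 / n_e_only + 1 / n_neither)
    log_or = math.log(odds_ratio)
    ci = (math.exp(log_or - 1.96 * se), math.exp(log_or + 1.96 * se))
    return odds_ratio, ci


# Hypothetical aggregated counts from 400 cases.
or_, (lo, hi) = case_only_interaction(40, 60, 60, 240)
print(f"interaction OR = {or_:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

If the two factors are correlated in the population, the case-only odds ratio reflects that correlation as well, which is why the independence assumption matters so much here.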

5.
Late-onset Alzheimer's disease (LOAD) is a multifactorial disorder with over twenty loci associated with disease risk. Given the number of genome-wide significant variants that fall outside of coding regions, it is possible that some of these variants alter some function of gene expression rather than tagging coding variants that alter protein structure and/or function. RegulomeDB is a database that annotates regulatory functions of genetic variants. In this study, we utilized RegulomeDB to investigate potential regulatory functions of lead single nucleotide polymorphisms (SNPs) identified in five genome-wide association studies (GWAS) of risk and age-at-onset (AAO) of LOAD, as well as SNPs in LD (r² ≥ 0.80) with the lead GWAS SNPs. Of a total of 614 SNPs examined, 394 returned RegulomeDB scores of 1–6. Of those 394 variants, 34 showed strong evidence of regulatory function (RegulomeDB score <3), and only 3 of them were genome-wide significant SNPs (ZCWPW1/rs1476679, CLU/rs1532278 and ABCA7/rs3764650). This study further supports the assumption that some of the non-coding GWAS SNPs are true associations rather than tagged associations and demonstrates the application of RegulomeDB to GWAS data.

6.
Genome-wide association study (GWAS) data on a disease are increasingly available from multiple related populations. In this scenario, meta-analyses can improve power to detect homogeneous genetic associations, but if there exist ancestry-specific effects, via interactions on genetic background or with a causal effect that co-varies with genetic background, then these will typically be obscured. To address this issue, we have developed a robust statistical method for detecting susceptibility gene-ancestry interactions in multi-cohort GWAS based on closely-related populations. We use the leading principal components of the empirical genotype matrix to cluster individuals into “ancestry groups” and then look for evidence of heterogeneous genetic associations with disease or other trait across these clusters. Robustness is improved when there are multiple cohorts, as the signal from true gene-ancestry interactions can then be distinguished from gene-collection artefacts by comparing the observed interaction effect sizes in collection groups relative to ancestry groups. When applied to colorectal cancer, we identified a missense polymorphism in the iron-absorption gene CYBRD1 that was associated with disease in individuals of English, but not Scottish, ancestry. The association replicated in two additional, independently-collected data sets. Our method can be used to detect associations between genetic variants and disease that have been obscured by population genetic heterogeneity. It can be readily extended to the identification of genetic interactions on other covariates such as measured environmental exposures. We envisage our methodology being of particular interest to researchers with existing GWAS data, as ancestry groups can be easily defined and thus tested for interactions.

7.

Objective

We examined whether a panel of SNPs, systematically selected from genome-wide association studies (GWAS), could improve risk prediction of coronary heart disease (CHD), over-and-above conventional risk factors. These SNPs have already demonstrated reproducible associations with CHD; here we examined their use in long-term risk prediction.

Study Design and Setting

SNPs identified from meta-analyses of GWAS of CHD were tested in 840 men and women aged 55–75 from the Edinburgh Artery Study, a prospective, population-based study with 15 years of follow-up. Cox proportional hazards models were used to evaluate the addition of SNPs to conventional risk factors in prediction of CHD risk. CHD was classified as myocardial infarction (MI), coronary intervention (angioplasty, or coronary artery bypass surgery), angina and/or unspecified ischaemic heart disease as a cause of death; additional analyses were limited to MI or coronary intervention. Model performance was assessed by changes in discrimination and net reclassification improvement (NRI).

Results

There were significant improvements with addition of 27 SNPs to conventional risk factors for prediction of CHD (NRI of 54%, P<0.001; C-index 0.671 to 0.740, P = 0.001), as well as MI or coronary intervention, (NRI of 44%, P<0.001; C-index 0.717 to 0.750, P = 0.256). ROC curves showed that addition of SNPs better improved discrimination when the sensitivity of conventional risk factors was low for prediction of MI or coronary intervention.

Conclusion

There was significant improvement in risk prediction of CHD over 15 years when SNPs identified from GWAS were added to conventional risk factors. This effect may be particularly useful for identifying individuals with a low prognostic index who are in fact at higher risk of disease than indicated by conventional risk factors alone.
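The net reclassification improvement (NRI) reported above can be illustrated with a minimal sketch of the category-free variant, in which any increase in predicted risk counts as a move "up". This is an assumption made for illustration: the abstract does not say whether a categorical or category-free NRI was used, and the risks below are invented.

```python
def net_reclassification_improvement(old_risk, new_risk, event):
    """Category-free NRI comparing two sets of predicted risks.

    NRI = [P(up | event) - P(down | event)]
        + [P(down | non-event) - P(up | non-event)]
    where "up"/"down" mean the new model predicts a higher/lower risk.
    `event` holds 1 for individuals who developed disease, 0 otherwise.
    """
    up_e = down_e = up_n = down_n = 0
    for old, new, e in zip(old_risk, new_risk, event):
        if new > old:
            if e:
                up_e += 1
            else:
                up_n += 1
        elif new < old:
            if e:
                down_e += 1
            else:
                down_n += 1
    n_events = sum(event)
    n_nonevents = len(event) - n_events
    if n_events == 0 or n_nonevents == 0:
        raise ValueError("need at least one event and one non-event")
    return (up_e - down_e) / n_events + (down_n - up_n) / n_nonevents


# Hypothetical risks from a conventional model and a model with SNPs added.
old = [0.2, 0.2, 0.2, 0.2]
new = [0.3, 0.3, 0.1, 0.3]
event = [1, 1, 0, 0]
print(net_reclassification_improvement(old, new, event))
```

In this toy case both events move up, while one non-event moves down and one moves up, so the event component contributes 1 and the non-event component contributes 0.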

8.
Lifestyle and genetic factors play a large role in the development of Type 2 Diabetes (T2D). Despite the important role of genetic factors, genetic information is not incorporated into the clinical assessment of T2D risk. We assessed and compared Whole Genome Regression methods to predict the T2D status of 5,245 subjects from the Framingham Heart Study. For evaluating each method we constructed the following set of regression models: A clinical baseline model (CBM) which included non-genetic covariates only. CBM was extended by adding the first two marker-derived principal components and 65 SNPs identified by a recent GWAS consortium for T2D (M-65SNPs). Subsequently, it was further extended by adding 249,798 genome-wide SNPs from a high-density array. The Bayesian models used to incorporate genome-wide marker information as predictors were: Bayes A, Bayes Cπ, Bayesian LASSO (BL), and the Genomic Best Linear Unbiased Prediction (G-BLUP). Results included estimates of the genetic variance and heritability, genetic scores for T2D, and predictive ability evaluated in a 10-fold cross-validation. The predictive AUC estimates for CBM and M-65SNPs were: 0.668 and 0.684, respectively. We found evidence of contribution of genetic effects in T2D, as reflected in the genomic heritability estimates (0.492±0.066). The highest predictive AUC among the genome-wide marker Bayesian models was 0.681 for the Bayesian LASSO. Overall, the improvement in predictive ability was moderate and did not differ greatly among models that included genetic information. Approximately 58% of the total number of genetic variants was found to contribute to the overall genetic variation, indicating a complex genetic architecture for T2D. 
Our results suggest that the Bayes Cπ and the G-BLUP models with a large set of genome-wide markers could be used for predicting risk of T2D, as an alternative to using high-density arrays when selected markers from large consortiums for a given complex trait or disease are unavailable.
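The predictive AUC values quoted above are areas under the ROC curve evaluated in 10-fold cross-validation. As a minimal sketch, AUC can be computed directly as the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half. The fold-splitting helper is a hypothetical convenience and does not reproduce the study's partitioning.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise case/control comparison."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n, k=10):
    """Split indices 0..n-1 into k interleaved test folds (no shuffling)."""
    return [list(range(i, n, k)) for i in range(k)]


# Hypothetical predicted scores for two cases and two controls.
print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 3 of 4 pairs ranked correctly
print(k_fold_indices(7, k=3))
```

An AUC of 0.5 corresponds to random ranking, which puts the reported values of roughly 0.67 to 0.68 in context.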

9.
Genome-wide association studies (GWAS) have generated sufficient data to assess the role of selection in shaping allelic diversity of disease-associated SNPs. Negative selection against disease risk variants is expected to reduce their frequencies, making them overrepresented in the group of minor (<50%) alleles. Indeed, we found that the overall proportion of risk alleles was higher among alleles with frequency <50% (minor alleles) compared to that in the group of major alleles. We hypothesized that negative selection may have different effects on environment (or lifestyle)-dependent versus environment (or lifestyle)-independent diseases. We used an environment/lifestyle index (ELI) to assess the influence of environmental/lifestyle factors on disease etiology. ELI was defined as the number of publications mentioning “environment” or “lifestyle” AND disease per 1,000 disease-mentioning publications. We found that the frequency distributions of the risk alleles for the diseases with strong environmental/lifestyle components follow the distribution expected under a selectively neutral model, while frequency distributions of the risk alleles for the diseases with weak environmental/lifestyle influences are shifted to lower values, indicating effects of negative selection. We hypothesized that previously selectively neutral variants become risk alleles when the environment changes. The hypothesis of ancestrally neutral, currently disadvantageous risk-associated alleles predicts that the distribution of risk alleles for the environment/lifestyle dependent diseases will follow a neutral model since natural selection has not had enough time to influence allele frequencies. The results of our analysis suggest that prediction of SNP functionality based on the level of evolutionary conservation may not be useful for SNPs associated with environment/lifestyle dependent diseases.
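The ELI definition above is a simple rate per 1,000 publications. A minimal sketch, with hypothetical publication counts and a hypothetical function name:

```python
def environment_lifestyle_index(n_disease_and_env, n_disease):
    """ELI: publications mentioning the disease together with "environment"
    or "lifestyle", per 1,000 publications mentioning the disease."""
    if n_disease <= 0:
        raise ValueError("need at least one disease-mentioning publication")
    return 1000.0 * n_disease_and_env / n_disease


# Hypothetical counts: 150 of 3,000 disease publications mention environment/lifestyle.
print(environment_lifestyle_index(150, 3000))
```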

10.
Genome wide association studies (GWAS) have established association of ARID5B and IKZF1 variants with childhood acute lymphoblastic leukemia (ALL). Epidemiological studies suggest that environmental factors alone appear to make a relatively minor contribution to disease risk. The polygenic nature of childhood ALL predisposition together with the timing of environmental triggers may hold vital clues for disease etiology. This study presents results from an Australian GWAS of childhood ALL cases (n = 358) and population controls (n = 1192). Furthermore, we utilised family trio (n = 204) genotypes to extend our investigation to gene-environment interaction of significant loci with parental exposures before conception, and child’s sex and age. Thirteen SNPs achieved genome wide significance in the population based case/control analysis; ten annotated to ARID5B and three to IKZF1. The most significant SNPs in these regions were ARID5B rs4245595 (OR 1.63, CI 1.38–1.93, P = 2.13×10⁻⁹), and IKZF1 rs1110701 (OR 1.69, CI 1.42–2.02, P = 7.26×10⁻⁹). There was evidence of gene-environment interaction for risk genotype at IKZF1, whereby an apparently stronger genetic effect was observed if the mother took folic acid or if the father did not smoke prior to pregnancy (respective interaction P-values: 0.04, 0.05). There were no interactions of risk genotypes with age or sex (P-values >0.2). Our results provide evidence that interaction of genetic variants and environmental exposures may further alter the risk of childhood ALL; however, investigation in a larger population is required. If interaction of folic acid supplementation and IKZF1 variants holds, it may be useful to quantify folate levels prior to initiating use of folic acid supplements.

12.
Congenital heart disease (CHD) is the most common form of congenital human birth anomalies and a leading cause of perinatal and infant mortality. Some studies including our published genome-wide association study (GWAS) of CHD have indicated that genetic variants may contribute to the risk of CHD. Recently, Cordell et al. published a GWAS of multiple CHD phenotypes in European Caucasians and identified 3 susceptibility loci (rs870142, rs16835979 and rs6824295) for ostium secundum atrial septal defect (ASD) at chromosome 4p16. However, whether these loci at 4p16 confer predisposition to CHD in the Chinese population is unclear. In the current study, we first analyzed the associations between these 3 single nucleotide polymorphisms (SNPs) at 4p16 and CHD risk by using our existing genome-wide scan data and found that all 3 SNPs showed significant associations with ASD in the same direction as that observed in Cordell’s study, but not with other subtypes: ventricular septal defect (VSD) and ASD combined with VSD. As these 3 SNPs were in high linkage disequilibrium (LD) in the Chinese population, we selected the SNP with the lowest P value in our GWAS scan (rs16835979) to perform a replication study with an additional 1,709 CHD cases with multiple phenotypes and 1,962 controls. The significant association was also observed only within the ASD subgroup, which was heterogeneous from other disease groups. In combined GWAS and replication samples, the minor allele of rs16835979 remained significantly associated with the risk of ASD (OR = 1.22, 95% CI = 1.08–1.38, P = 0.001). Our findings suggest that susceptibility loci of ASD identified from Cordell’s European GWAS are generalizable to the Chinese population, and such investigation may provide new insights into the roles of genetic variants in the etiology of different CHD phenotypes.

13.
We developed a new statistical framework to find genetic variants associated with extreme longevity. The method, informed GWAS (iGWAS), takes advantage of knowledge from large studies of age-related disease in order to narrow the search for SNPs associated with longevity. To gain support for our approach, we first show there is an overlap between loci involved in disease and loci associated with extreme longevity. These results indicate that several disease variants may be depleted in centenarians versus the general population. Next, we used iGWAS to harness information from 14 meta-analyses of disease and trait GWAS to identify longevity loci in two studies of long-lived humans. In a standard GWAS analysis, only one locus in these studies is significant (APOE/TOMM40) when controlling the false discovery rate (FDR) at 10%. With iGWAS, we identify eight genetic loci that associate significantly with exceptional human longevity at FDR < 10%. We followed up the eight lead SNPs in independent cohorts, and found replication evidence for four loci and suggestive evidence for one more with exceptional longevity. The loci that replicated (FDR < 5%) included APOE/TOMM40 (associated with Alzheimer’s disease), CDKN2B/ANRIL (implicated in the regulation of cellular senescence), ABO (tags the O blood group), and SH2B3/ATXN2 (a signaling gene that extends lifespan in Drosophila and a gene involved in neurological disease). Our results implicate new loci in longevity and reveal a genetic overlap between longevity and age-related diseases and traits, including coronary artery disease and Alzheimer’s disease. iGWAS provides a new analytical strategy for uncovering SNPs that influence extreme longevity, and can be applied more broadly to boost power in other studies of complex phenotypes.

14.
《PloS one》2012,7(12)
Genome-wide association studies (GWAS) have successfully identified a number of single-nucleotide polymorphisms (SNPs) associated with colorectal cancer (CRC) risk. However, these susceptibility loci known today explain only a small fraction of the genetic risk. Gene-gene interaction (GxG) is considered to be one source of the missing heritability. To address this, we performed a genome-wide search for pair-wise GxG associated with CRC risk using 8,380 cases and 10,558 controls in the discovery phase and 2,527 cases and 2,658 controls in the replication phase. We developed a simple, but powerful method for testing interaction, which we term the Average Risk Due to Interaction (ARDI). With this method, we conducted a genome-wide search to identify SNPs showing evidence for GxG with previously identified CRC susceptibility loci from 14 independent regions. We also conducted a genome-wide search for GxG using the marginal association screening and examining interaction among SNPs that pass the screening threshold (p < 10⁻⁴). For the known locus rs10795668 (10p14), we found an interacting SNP rs367615 (5q21) with replication p = 0.01 and combined p = 4.19×10⁻⁸. Among the top marginal SNPs after LD pruning (n = 163), we identified an interaction between rs1571218 (20p12.3) and rs10879357 (12q21.1) (nominal combined p = 2.51×10⁻⁶; Bonferroni adjusted p = 0.03). Our study represents the first comprehensive search for GxG in CRC, and our results may provide new insight into the genetic etiology of CRC.

15.
Although great progress in genome-wide association studies (GWAS) has been made, the significant SNP associations identified by GWAS account for only a few percent of the genetic variance, leading many to question where and how we can find the missing heritability. There is increasing interest in genome-wide interaction analysis as a possible source of finding heritability unexplained by current GWAS. However, the existing statistics for testing interaction have low power for genome-wide interaction analysis. To meet challenges raised by genome-wide interaction analysis, we have developed a novel statistic for testing interaction between two loci (either linked or unlinked). The null distribution and the type I error rates of the new statistic for testing interaction are validated using simulations. Extensive power studies show that the developed statistic has much higher power to detect interaction than classical logistic regression. The results identified 44 and 211 pairs of SNPs showing significant evidence of interactions with FDR<0.001 and 0.001<FDR<0.003, respectively, which were seen in two independent studies of psoriasis. These included five interacting pairs of SNPs in genes LST1/NCR3, CXCR5/BCL9L, and GLS2, some of which were located in the target sites of miR-324-3p, miR-433, and miR-382, as well as 15 pairs of interacting SNPs that had nonsynonymous substitutions. Our results demonstrated that genome-wide interaction analysis is a valuable tool for finding remaining missing heritability unexplained by the current GWAS, and the developed novel statistic is able to search for significant interactions between SNPs across the genome. Real data analysis showed that the results of genome-wide interaction analysis can be replicated in two independent studies.

16.
Genome-wide association studies (GWAS) for Autism Spectrum Disorder (ASD) have thus far met with limited success in the identification of common risk variants, consistent with the notion that variants with small individual effects cannot be detected individually in single SNP analysis. To further capture disease risk gene information from ASD association studies, we applied a network-based strategy to the Autism Genome Project (AGP) and the Autism Genetics Resource Exchange GWAS datasets, combining family-based association data with Human Protein-Protein interaction (PPI) data. Our analysis showed that autism-associated proteins at higher than conventional levels of significance (P<0.1) directly interact more than random expectation and are involved in a limited number of interconnected biological processes, indicating that they are functionally related. The functionally coherent networks generated by this approach contain ASD-relevant disease biology, as demonstrated by an improved positive predictive value and sensitivity in retrieving known ASD candidate genes relative to the top associated genes from either GWAS, as well as a higher gene overlap between the two ASD datasets. Analysis of the intersection between the networks obtained from the two ASD GWAS and six unrelated disease datasets identified fourteen genes exclusively present in the ASD networks. These are mostly novel genes involved in abnormal nervous system phenotypes in animal models, and in fundamental biological processes previously implicated in ASD, such as axon guidance, cell adhesion or cytoskeleton organization. Overall, our results highlighted novel susceptibility genes previously hidden within GWAS statistical “noise” that warrant further analysis for causal variants.

17.
High-density genetic markers are the prerequisite for understanding linkage disequilibrium (LD) and genome-wide association studies (GWASs) of complex traits in crops. To evaluate the LD pattern in oilseed rape, we sequenced a previous association panel containing 189 B. napus inbred lines using double-digested restriction-site associated DNA (ddRAD) and genotyped 19,327 RAD tags. A total of 15,921 RAD tags were assigned to a published genetic linkage map and the majority (71.1%) of these tags were uniquely mapped to the draft reference genome “Darmor-bzh.” The distance of LD decay was 1,214 kb across the genome at the background level (r² = 0.26), with the distances of LD decay being 405 kb and 2,111 kb in the A and C subgenomes, respectively. A total of 361 haplotype blocks with length > 100 kb were identified in the entire genome. The association panel could be classified into two groups, P1 and P2, which are essentially consistent with the geographical origins of varieties. A large number of group-specific haplotypes were identified, reflecting that varieties in the P1 and P2 groups experienced distinct selection in breeding programs to adapt to their different growth habitats. GWAS repeatedly detected two loci significantly associated with oil content of seeds based on the developed SNPs, suggesting that the high-density SNPs were useful for understanding the genetic determinants of complex traits in GWAS.
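The LD measure r² used above can be illustrated with a minimal sketch that computes it from phased two-locus haplotypes, using D = p_AB − p_A·p_B and r² = D² / (p_A(1 − p_A)·p_B(1 − p_B)). The haplotype lists are toy data; real pipelines typically estimate r² from unphased genotypes, which this sketch does not attempt.

```python
def ld_r2(haplotypes):
    """Squared correlation (r^2) between two biallelic loci.

    `haplotypes` is a list of (a, b) pairs, each allele coded 0 or 1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    return d * d / denom


# Complete LD: the two loci always carry matching alleles.
print(ld_r2([(1, 1), (1, 1), (0, 0), (0, 0)]))
# Linkage equilibrium: all four haplotypes are equally frequent.
print(ld_r2([(1, 1), (1, 0), (0, 1), (0, 0)]))
```

LD decay distance, as reported in the abstract, is then the physical distance at which the average pairwise r² falls to a chosen background level.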

18.
The range of possible gene interactions in a multilocus model of a complex inherited disease is studied by exploring genotype-specific risks subject to the constraint that the allele frequencies and marginal risks are known. We quantify the effect of gene interactions by defining the interaction ratio, $C_R = K_R / K_R^{\mathrm{mult}}$, where $K_R$ is the recurrence risk to relatives with relationship $R$ for the true model and $K_R^{\mathrm{mult}}$ is the recurrence risk to relatives for a multiplicative model with the same marginal risks. We use a Markov chain Monte Carlo (MCMC) procedure to sample from the space of possible models. We find that the average of $C_R$ increases with the number of loci for both low frequency (p = 0.03) and higher frequency (p = 0.25) causative alleles. Furthermore, the probability that $C_R > 1$ is nearly 1. Similar results are obtained when more weight is given to risk models that are closer to the comparable multiplicative model. These results imply that, in general, gene interactions will result in greater heritability of a complex inherited disease than is expected on the basis of a multiplicative model of interactions and hence may provide a partial explanation for the problem of missing heritability of complex diseases.

Although many genome-wide association studies (GWAS) have been performed and have found hundreds of SNPs associated with higher risk of complex inherited diseases, those SNPs so far account for only a small fraction of the inherited risk of those diseases (Altshuler et al. 2008). Several not mutually exclusive explanations have been proposed for the “missing heritability,” i.e., the heritability that is not yet accounted for by SNPs found in GWAS (Manolio et al. 2009): (i) common alleles of small effect that have not been found because GWAS done so far have been underpowered, (ii) low-frequency alleles of moderate effect that are difficult to find using HapMap SNPs, (iii) rare copy-number variants that are not in strong linkage disequilibrium (LD) with HapMap SNPs, (iv) inherited epigenetic factors that are not in strong LD with HapMap SNPs, and (v) interactions among causative alleles that conceal their true contribution to heritability. In this article we investigate the last possibility and determine the extent to which interactions may account for missing heritability.

Our analysis is in the same spirit as that of Culverhouse et al. (2002). We assume that the risk of being affected by a complex disease is determined by an individual's genotype at two or more loci and that the frequencies of causative alleles and the average risks for each one-locus genotype (the marginal risks) are known. Culverhouse et al. (2002) assumed the marginal risks were the same for all genotypes and all loci. In that case, causative alleles have odds ratios of 1; they contribute to risk only through their interactions. Culverhouse et al. found the risk function that maximized the heritability and showed that the maximum possible heritability attributable to interactions increased with the number of loci. They concluded that it is quite possible that interactions among loci that have no main effect could contribute substantially to the heritability of a complex disease and indeed could account for “virtually all the variation in affection status for diseases with any prevalence” (Culverhouse et al. 2002, p. 468).

We generalize the analysis of Culverhouse et al. in three ways. First, we allow causative alleles to have odds ratios >1. Second, we explore the entire space of models instead of focusing only on the risk model that maximizes heritability. Third, we examine how the importance of gene interactions depends on the “distance” between a risk model and a comparable multiplicative model. We show that gene interactions can substantially increase the heritability of risk as measured by recurrence risk, $K_R$, and that the effect increases with the number of loci carrying causative alleles. Furthermore, we show that these results are true even if more weight is given to models that are closer to a comparable multiplicative model.

Geometrically, the space of feasible genotype-specific risks subject to the aforementioned constraints (i.e., that the allele frequencies and marginal risks are known) corresponds to a high-dimensional convex polytope, and the computational problem of interest involves integrating a quadratic function over the polytope. The dimension of the polytope grows exponentially with the number of loci, and, therefore, analytic computation is intractable for more than two loci. Hence, we devise a Monte Carlo approach to tackle the problem. Note that, because of high dimensionality, rejection algorithms are not appropriate for this kind of problem. We instead employ a Markov chain Monte Carlo (MCMC) algorithm based on a random walk that always stays inside the polytope. We present empirical results for up to five loci and obtain a closed-form formula for the minimum of $K_R$ over the polytope; the latter result applies to an arbitrary number of loci. Interestingly, the minimum of $K_R$ decreases as the number $L$ of loci increases, but the average of $K_R$ over the polytope increases with $L$.

19.
Assumptions are made about the genetic model of single nucleotide polymorphisms (SNPs) when choosing a traditional genetic encoding: additive, dominant, and recessive. Furthermore, SNPs across the genome are unlikely to demonstrate identical genetic models. However, running SNP-SNP interaction analyses with every combination of encodings raises the multiple testing burden. Here, we present a novel and flexible encoding for genetic interactions, the elastic data-driven genetic encoding (EDGE), in which SNPs are assigned a heterozygous value based on the genetic model they demonstrate in a dataset prior to interaction testing. We assessed the power of EDGE to detect genetic interactions using 29 combinations of simulated genetic models and found it outperformed the traditional encoding methods across 10%, 30%, and 50% minor allele frequencies (MAFs). Further, EDGE maintained a low false-positive rate, while additive and dominant encodings demonstrated inflation. We evaluated EDGE and the traditional encodings with genetic data from the Electronic Medical Records and Genomics (eMERGE) Network for five phenotypes: age-related macular degeneration (AMD), age-related cataract, glaucoma, type 2 diabetes (T2D), and resistant hypertension. A multi-encoding genome-wide association study (GWAS) for each phenotype was performed using the traditional encodings, and the top results of the multi-encoding GWAS were considered for SNP-SNP interaction using the traditional encodings and EDGE. EDGE identified a novel SNP-SNP interaction for age-related cataract that no other method identified: rs7787286 (MAF: 0.041; intergenic region of chromosome 7)–rs4695885 (MAF: 0.34; intergenic region of chromosome 4) with a Bonferroni LRT p of 0.018. A SNP-SNP interaction was found in data from the UK Biobank within 25 kb of these SNPs using the recessive encoding: rs60374751 (MAF: 0.030) and rs6843594 (MAF: 0.34) (Bonferroni LRT p: 0.026). 
We recommend using EDGE to flexibly detect interactions between SNPs exhibiting diverse action.
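One way to read "assigned a heterozygous value based on the genetic model they demonstrate" is sketched below. It assumes the heterozygous value is the ratio of the heterozygous effect to the homozygous-alternate effect from a codominant regression fit, with reference homozygotes coded 0 and alternate homozygotes coded 1. Both the function names and the example coefficients are hypothetical, and the abstract itself does not spell out this formula.

```python
def edge_alpha(beta_het, beta_hom_alt):
    """Heterozygous encoding value: ratio of heterozygous to homozygous-alternate effect."""
    if beta_hom_alt == 0:
        raise ValueError("homozygous-alternate effect must be non-zero")
    return beta_het / beta_hom_alt


def encode_genotypes(genotypes, alpha):
    """Map alternate-allele counts (0, 1, 2) to the encoding (0, alpha, 1)."""
    mapping = {0: 0.0, 1: alpha, 2: 1.0}
    return [mapping[g] for g in genotypes]


# Hypothetical effects from a codominant fit; alpha = 0.5 matches an additive SNP.
alpha = edge_alpha(0.2, 0.4)
print(alpha, encode_genotypes([0, 1, 2, 1], alpha))
```

Under this reading, a value near 0 behaves like a recessive encoding, near 0.5 like additive, and near 1 like dominant, so a single data-driven number replaces the choice among the three traditional encodings.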
