Similar literature
1.
Gene-gene interactions may play an important role in the genetics of a complex disease. Detection and characterization of gene-gene interactions is a challenging issue that has stimulated the development of various statistical methods to address it. In this study, we introduce a method to measure gene interactions using entropy-based statistics from a contingency table of trait and genotype combinations. We also developed an exploration procedure by using graphs. We propose a standardized relative information gain (RIG) measure to evaluate the interactions between single nucleotide polymorphism (SNP) combinations. To identify the kth-order interactions, contingency tables of trait and genotype combinations of k SNPs are constructed, with which RIGs are calculated. The RIGs are standardized using the mean and standard deviation from the permuted datasets. SNP combinations yielding high standardized RIG are chosen for gene-gene interactions. Detection of high-order interactions and comparison of interaction strengths between different orders are made possible by using standardized RIG. We have applied the proposed standardized entropy-based method to two types of data sets from a simulation study and a real genetic association study. We have compared our method and the multifactor dimensionality reduction (MDR) method through power analysis of eight different genetic models with varying penetrance rates, number of SNPs, and sample sizes. Our method shows successful identification of genetic associations and gene-gene interactions both in simulation and real genetic data. Simulation results suggest that the proposed entropy-based method is better able to detect high-order interactions and is superior to the MDR method in most cases. The proposed method is well suited for detecting interactions without main effects as well as for models including main effects.
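
The standardization step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the exact RIG formula used here (information gain divided by the trait entropy), the function names, and the default number of permutations are all assumptions made for the example.

```python
import math
import random
from collections import Counter
from statistics import mean, stdev


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of hashable labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def relative_information_gain(trait, genotypes):
    """Assumed definition: (H(trait) - H(trait | genotype combo)) / H(trait).

    genotypes holds one tuple of k SNP genotypes per individual, so each
    distinct tuple is one column of the trait-by-genotype contingency table.
    """
    h_trait = entropy(trait)
    if h_trait == 0:
        return 0.0
    n = len(trait)
    groups = {}
    for t, g in zip(trait, genotypes):
        groups.setdefault(g, []).append(t)
    h_cond = sum(len(ts) / n * entropy(ts) for ts in groups.values())
    return (h_trait - h_cond) / h_trait


def standardized_rig(trait, genotypes, n_perm=200, seed=0):
    """Z-score of the observed RIG against RIGs from trait-permuted data."""
    rng = random.Random(seed)
    observed = relative_information_gain(trait, genotypes)
    shuffled = list(trait)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any trait-genotype association
        null.append(relative_information_gain(shuffled, genotypes))
    sd = stdev(null)
    return (observed - mean(null)) / sd if sd > 0 else 0.0
```

On a pure XOR pattern (the trait is 1 only when exactly one of two SNPs carries the variant), the pair has a RIG of 1 while each SNP alone has a RIG of 0, which is the "interaction without main effects" situation the abstract mentions.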

2.
Complex interactions among genes and environmental factors are known to play a role in common human disease aetiology. There is a growing body of evidence to suggest that complex interactions are 'the norm' and, rather than amounting to a small perturbation to classical Mendelian genetics, interactions may be the predominant effect. Traditional statistical methods are not well suited for detecting such interactions, especially when the data are high dimensional (many attributes or independent variables) or when interactions occur between more than two polymorphisms. In this review, we discuss machine-learning models and algorithms for identifying and characterising susceptibility genes in common, complex, multifactorial human diseases. We focus on the following machine-learning methods that have been used to detect gene-gene interactions: neural networks, cellular automata, random forests, and multifactor dimensionality reduction. We conclude with some ideas about how these methods and others can be integrated into a comprehensive and flexible framework for data mining and knowledge discovery in human genetics.  相似文献   

3.
4.
Recently, more and more evidence suggests that rare variants with much lower minor allele frequencies play significant roles in disease etiology. Advances in next-generation sequencing technologies will lead to many more rare-variant association studies. Several statistical methods have been proposed to assess the effect of rare variants by aggregating information from multiple loci across a genetic region and testing the association between the phenotype and aggregated genotype. One limitation of existing methods is that they only look into the marginal effects of rare variants but do not systematically take into account effects due to interactions among rare variants and between rare variants and environmental factors. In this article, we propose the summation of partition approach (SPA), a robust model-free method that is designed specifically for detecting both marginal effects and effects due to gene-gene (G×G) and gene-environmental (G×E) interactions for rare-variant association studies. SPA has three advantages. First, it accounts for the interaction information and gains considerable power in the presence of unknown and complicated G×G or G×E interactions. Secondly, it does not sacrifice the marginal detection power; in the situation when rare variants only have marginal effects it is comparable with the most competitive method in current literature. Thirdly, it is easy to extend and can incorporate more complex interactions; other practitioners and scientists can readily tailor the procedure to fit their own studies. Our simulation studies show that SPA is considerably more powerful than many existing methods in the presence of G×G and G×E interactions.

5.
In genome-wide association data analysis, two genes in any pathway, two SNPs in two linked gene regions, or two SNPs in two linked exons within one gene are often correlated with each other. We therefore proposed the concept of gene-gene co-association, which refers to effects due not only to the traditional interaction under near-independence but also to the correlation between two genes. Furthermore, we constructed a novel statistic for detecting gene-gene co-association based on Partial Least Squares Path Modeling (PLSPM). Through simulation, the relationship between traditional interaction and co-association was highlighted under three different types of co-association. Both simulation and real data analysis demonstrated that the proposed PLSPM-based statistic performs better than the single-SNP-based logistic model, the PCA-based logistic model, and other gene-based methods.

6.
Many popular methods for exploring gene-gene interactions, including the case-only approach, rely on the key assumption that physically distant loci are in linkage equilibrium in the underlying population. These methods utilize the presence of correlation between unlinked loci in a disease-enriched sample as evidence of interactions among the loci in the etiology of the disease. We use data from the CGEMS case-control genome-wide association study of breast cancer to demonstrate empirically that the case-only and related methods have the potential to create large-scale false positives because of the presence of population stratification (PS) that creates long-range linkage disequilibrium in the genome. We show that the bias can be removed by considering parametric and nonparametric methods that assume gene-gene independence between unlinked loci, not in the entire population, but only conditional on population substructure that can be uncovered based on the principal components of a suitably large panel of PS markers. Applications in the CGEMS study as well as simulated data show that the proposed methods are robust to the presence of population stratification and are yet much more powerful, relative to standard logistic regression methods that are also commonly used as robust alternatives to the case-only type methods.  相似文献   

7.
Extensive genetic studies have identified a large number of causal genetic variations in many human phenotypes; however, these could not completely explain heritability in complex diseases. Some researchers have proposed that the “missing heritability” may be attributable to gene–gene and gene–environment interactions. Because there are billions of potential interaction combinations, the statistical power of a single study is often insufficient to detect these interactions. Meta-analysis is a common method of increasing detection power; however, accessing individual data could be difficult. This study presents a simple method that employs aggregated summary values from a “case” group to detect these interactions, based on rare-disease and independence assumptions. However, these assumptions, particularly the rare-disease assumption, may be violated in real situations; therefore, this study further investigated the robustness of our proposed method when the assumptions are violated. In conclusion, we observed that the rare-disease assumption is relatively nonessential, whereas the independence assumption is an essential component. Because single nucleotide polymorphisms (SNPs) are often unrelated to environmental factors and SNPs on other chromosomes, researchers should use this method to investigate gene–gene and gene–environment interactions when they are unable to obtain detailed individual patient data.

8.

Background

Epistasis, i.e., the interaction of alleles at different loci, is thought to play a central role in the formation and progression of complex diseases. The complexity of disease expression should arise from a complex network of epistatic interactions involving multiple genes.

Methodology

We develop a general model for testing high-order epistatic interactions for a complex disease in a case-control study. We incorporate the quantitative genetic theory of high-order epistasis into the setting of cases and controls sampled from a natural population. The new model allows the identification and testing of epistasis and its various genetic components.

Conclusions

Simulation studies were used to examine the power and false positive rates of the model under different sampling strategies. The model was used to detect epistasis in a case-control study of inflammatory bowel disease, in which five SNPs at a candidate gene were typed, leading to the identification of a significant three-locus epistasis.

9.
GWAS have emerged as popular tools for identifying genetic variants that are associated with disease risk. Standard analysis of a case-control GWAS involves assessing the association between each individual genotyped SNP and disease risk. However, this approach suffers from limited reproducibility and difficulties in detecting multi-SNP and epistatic effects. As an alternative analytical strategy, we propose grouping SNPs together into SNP sets on the basis of proximity to genomic features such as genes or haplotype blocks, then testing the joint effect of each SNP set. Testing of each SNP set proceeds via the logistic kernel-machine-based test, which is based on a statistical framework that allows for flexible modeling of epistatic and nonlinear SNP effects. This flexibility and the ability to naturally adjust for covariate effects are important features of our test that make it appealing in comparison to individual SNP tests and existing multimarker tests. Using simulated data based on the International HapMap Project, we show that SNP-set testing can have improved power over standard individual-SNP analysis under a wide range of settings. In particular, we find that our approach has higher power than individual-SNP analysis when the median correlation between the disease-susceptibility variant and the genotyped SNPs is moderate to high. When the correlation is low, both individual-SNP analysis and the SNP-set analysis tend to have low power. We apply SNP-set analysis to analyze the Cancer Genetic Markers of Susceptibility (CGEMS) breast cancer GWAS discovery-phase data.  相似文献   

10.
We present an extension of the two-class multifactor dimensionality reduction (MDR) algorithm that enables detection and characterization of epistatic SNP-SNP interactions in the context of a quantitative trait. The proposed Quantitative MDR (QMDR) method handles continuous data by modifying MDR’s constructive induction algorithm to use a T-test. QMDR replaces the balanced accuracy metric with a T-test statistic as the score to determine the best interaction model. We used a simulation to identify the empirical distribution of QMDR’s testing score. We then applied QMDR to genetic data from the ongoing prospective Prevention of Renal and Vascular End-Stage Disease (PREVEND) study.
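
One plausible reading of the scoring idea in this abstract is sketched below. The abstract does not spell out the details, so several choices here are assumptions: each multi-locus genotype cell is labelled "high" or "low" by comparing its mean trait value with the overall mean, the two pooled groups are compared with a Welch t statistic, and the function name `qmdr_score` is hypothetical. Cross-validation and the empirical null distribution mentioned in the abstract are omitted.

```python
import math
from statistics import mean, variance


def qmdr_score(values, genotypes):
    """T-statistic score for one SNP combination (illustrative sketch).

    values: quantitative trait, one number per individual.
    genotypes: one tuple of SNP genotypes per individual.
    """
    overall = mean(values)
    cells = {}
    for v, g in zip(values, genotypes):
        cells.setdefault(g, []).append(v)

    # Constructive induction: pool genotype cells into two groups.
    high, low = [], []
    for vs in cells.values():
        (high if mean(vs) > overall else low).extend(vs)

    # A t statistic needs at least two observations in each group.
    if len(high) < 2 or len(low) < 2:
        return 0.0

    # Welch t statistic (unequal variances); an assumption of this sketch.
    se = math.sqrt(variance(high) / len(high) + variance(low) / len(low))
    return (mean(high) - mean(low)) / se if se > 0 else 0.0
```

In a model search, this score would be computed for every candidate SNP combination and the combination with the largest value kept, taking the place that balanced accuracy has in two-class MDR.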

11.
As sessile organisms, plants must cope with multiple and combined variations of signals in their environment. However, very few reports have studied the genome-wide effects of systematic signal combinations on gene expression. Here, we evaluate a high level of signal integration, by modeling genome-wide expression patterns under a factorial combination of carbon (C), light (L), and nitrogen (N) as binary factors in two organs (O), roots and leaves. Signal management is different between C, N, and L and in shoots and roots. For example, L is the major factor controlling gene expression in leaves. However, in roots there is no obvious prominent signal, and signal interaction is stronger. The major signal interaction events detected genome wide in Arabidopsis roots are deciphered and summarized in a comprehensive conceptual model. Surprisingly, global analysis of gene expression in response to C, N, L, and O revealed that the number of genes controlled by a signal is proportional to the magnitude of the gene expression changes elicited by the signal. These results uncovered a strong constraining structure in plant cell signaling pathways, which prompted us to propose the existence of a “code” of signal integration.  相似文献   

12.
PLoS ONE, 2012, 7(12)
Genome-wide association studies (GWAS) have successfully identified a number of single-nucleotide polymorphisms (SNPs) associated with colorectal cancer (CRC) risk. However, these susceptibility loci known today explain only a small fraction of the genetic risk. Gene-gene interaction (GxG) is considered to be one source of the missing heritability. To address this, we performed a genome-wide search for pair-wise GxG associated with CRC risk using 8,380 cases and 10,558 controls in the discovery phase and 2,527 cases and 2,658 controls in the replication phase. We developed a simple, but powerful method for testing interaction, which we term the Average Risk Due to Interaction (ARDI). With this method, we conducted a genome-wide search to identify SNPs showing evidence for GxG with previously identified CRC susceptibility loci from 14 independent regions. We also conducted a genome-wide search for GxG using the marginal association screening and examining interaction among SNPs that pass the screening threshold (p < 10⁻⁴). For the known locus rs10795668 (10p14), we found an interacting SNP rs367615 (5q21) with replication p = 0.01 and combined p = 4.19×10⁻⁸. Among the top marginal SNPs after LD pruning (n = 163), we identified an interaction between rs1571218 (20p12.3) and rs10879357 (12q21.1) (nominal combined p = 2.51×10⁻⁶; Bonferroni adjusted p = 0.03). Our study represents the first comprehensive search for GxG in CRC, and our results may provide new insight into the genetic etiology of CRC.

13.
Tuberculosis (TB) is the second leading cause of mortality from infectious disease worldwide. One of the factors involved in developing disease is the genetics of the host, yet the field of TB susceptibility genetics has not yielded the answers that were expected. A commonly posited explanation for the missing heritability of complex disease is gene-gene interactions, also referred to as epistasis. In this study we investigate the role of gene-gene interactions in genetic susceptibility to TB using a cohort recruited from a high TB incidence community from Cape Town, South Africa. Our discovery data set incorporates genotypes from a large number of candidate gene studies as well as genome-wide data. After limiting our search space to pairs of putative TB susceptibility genes, as well as pairs of genes that have been curated in online databases as potential interactors, we use statistical modelling to identify pairs of interacting SNPs. We attempt to validate the top models identified in our discovery data set using an independent genome-wide TB case-control data set from The Gambia. A number of models were successfully validated, indicating that interplay between the NRG1 - NRG3, GRIK1 - GRIK3 and IL23R - ATG4C gene pairs may modify susceptibility to TB. Gene pairs involved in the NF-κB pathway were also identified in the discovery data set (SFTPD - NOD2, ISG15 - TLR8 and NLRC5 - IL12RB1), but could not be tested in the Gambian study group due to lack of overlapping data.

14.
Genome-wide association studies (GWAS) have successfully discovered hundreds of associations between genetic variants and complex traits. Most GWAS have focused on the identification of single variants. It has been shown that most of the variants that were discovered by GWAS could only partially explain disease heritability. The explanation for this missing heritability is generally believed to be gene-gene (GG) or gene-environment (GE) interactions and other structural variants. Generalized multifactor dimensionality reduction (GMDR) has been proven to be reasonably powerful in detecting GG and GE interactions; however, its performance has been found to decline when outlying quantitative traits are present. This paper proposes a robust GMDR estimation method (based on the L-estimator and M-estimator estimation methods) in an attempt to reduce the effects caused by outlying traits. A comparison of robust GMDR with the original MDR based on simulation studies showed the former method to outperform the latter. The performance of robust GMDR is illustrated through a real GWA example consisting of 8,577 samples from the Korean population using the Homeostasis Model Assessment of Insulin Resistance (HOMA-IR) level as a phenotype. Robust GMDR identified the KCNH1 gene to have strong interaction effects with other genes on the function of insulin secretion.  相似文献   

15.
The elusive but ubiquitous multifactor interactions represent a stumbling block that urgently needs to be removed in searching for determinants involved in human complex diseases. Dimensionality reduction approaches are a promising tool for this task. Many complex diseases exhibit composite syndromes that must be measured as a cluster of clinical traits with varying correlations and/or are inherently longitudinal in nature (changing over time and measured dynamically at multiple time points). A multivariate approach for detecting interactions is thus greatly needed for handling multifaceted phenotypes and longitudinal data, as well as for improving statistical power under multiple significance testing via a two-stage testing procedure that involves a multivariate analysis for grouped phenotypes followed by univariate analysis for the phenotypes in the significant group(s). In this article, we propose a multivariate extension of generalized multifactor dimensionality reduction (GMDR) based on multivariate generalized linear, multivariate quasi-likelihood and generalized estimating equations models. Simulations and real data analysis for the cohort from the Study of Addiction: Genetics and Environment are performed to investigate the properties and performance of the proposed method, as compared with the univariate method. The results suggest that the proposed multivariate GMDR substantially boosts statistical power.

16.
17.
Detecting gene-gene interaction in complex diseases has become an important priority for common disease genetics, but most current approaches to detecting interaction start with disease-marker associations. These approaches are based on population allele frequency correlations, not genetic inheritance, and therefore cannot exploit the rich information about inheritance contained within families. They are also hampered by issues of rigorous phenotype definition, multiple test correction, and allelic and locus heterogeneity. We recently developed, tested, and published a powerful gene-gene interaction detection strategy based on conditioning family data on a known disease-causing allele or a disease-associated marker allele [4]. We successfully applied the method to disease data and used computer simulation to exhaustively test the method for some epistatic models. We knew that the statistic we developed to indicate interaction was less reliable when applied to more-complex interaction models. Here, we improve the statistic and expand the testing procedure. We computer-simulated multipoint linkage data for a disease caused by two interacting loci. We examined epistatic as well as additive models and compared them with heterogeneity models. In all our models, the at-risk genotypes are “major” in the sense that among affected individuals, a substantial proportion has a disease-related genotype. One of the loci (A) has a known disease-related allele (as would have been determined from a previous analysis). We removed (pruned) family members who did not carry this allele; the resultant dataset is referred to as “stratified.” This elimination step has the effect of raising the “penetrance” and detectability at the second locus (B). We used the lod scores for the stratified and unstratified data sets to calculate a statistic that either indicated the presence of interaction or indicated that no interaction was detectable.
We show that the new method is robust and reliable for a wide range of parameters. Our statistic performs well both with the epistatic models (false negative rates, i.e., failing to detect interaction, ranging from 0 to 2.5%) and with the heterogeneity models (false positive rates, i.e., falsely detecting interaction, ≤1%). It works well with the additive model except when allele frequencies at the two loci differ widely. We explore those features of the additive model that make detecting interaction more difficult. All testing of this method suggests that it provides a reliable approach to detecting gene-gene interaction.

18.

Background

Evidence has accumulated that multiple genetic and environmental factors play important roles in determining susceptibility to type 2 diabetes (T2D). Although variants from candidate genes have become prime targets for genetic analysis, few studies have considered their interplay. Our goal was to evaluate interactions among SNPs within genes frequently identified as associated with T2D.

Methods/Principal Findings

Logistic regression was used to study interactions among 4 SNPs, one each from HNF4A[rs1884613], TCF7L2[rs12255372], WFS1[rs10010131], and KCNJ11[rs5219] in a case-control Ashkenazi sample of 974 diabetic subjects and 896 controls. Nonparametric multifactor dimensionality reduction (MDR) and generalized MDR (GMDR) were used to confirm findings from the logistic regression analysis. HNF4A and WFS1 SNPs were associated with T2D in logistic regression analyses [P<0.0001, P<0.0002, respectively]. Interactions between these SNPs were also strong using parametric or nonparametric methods: the unadjusted odds of being affected with T2D were 3 times greater in subjects with the HNF4A and WFS1 risk alleles than in those without either (95% CI = [1.7–5.3]; P≤0.0001). Although the univariate association between the TCF7L2 SNP and T2D was relatively modest [P = 0.02], when paired with the HNF4A SNP, the OR for subjects with risk alleles in both SNPs was 2.4 [95% CI = 1.7–3.4; P≤0.0001]. The KCNJ11 variant reached significance only when paired with either the HNF4A or WFS1 SNPs: unadjusted ORs were 2.0 [95% CI = 1.4–2.8; P≤0.0001] and 2.3 [95% CI = 1.2–4.4; P≤0.0001], respectively. MDR and GMDR results were consistent with the parametric findings.

Conclusions

These results provide evidence of strong independent associations between T2D and SNPs in HNF4A and WFS1 and their interaction in our Ashkenazi sample. We also observed an interaction in the nonparametric analysis between the HNF4A and KCNJ11 SNPs (P≤0.001), demonstrating that an independently non-significant variant may interact with another variant resulting in an increased disease risk.

19.

Background and Objective

Previous investigations of glioma risk in women have focused on oral contraceptive (OC) use, hormone replacement therapy (HRT), and reproductive factors. However, the results of published studies were inconclusive and inconsistent. Thus, a meta-analysis based on published case-control studies was performed to assess the role of exogenous and endogenous hormonal factors in glioma risk.

Methods

The PubMed and EMBASE databases were searched without any restrictions on language or publication year. Reference lists from retrieved articles were also reviewed. We included case-control studies reporting relative risks (RRs) with corresponding 95% confidence intervals (CIs) (or data to calculate them) between oral contraceptive (OC) and hormone replacement therapy (HRT) use, reproductive factors and glioma. Random-effects models were used to calculate the summary risk estimates.
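
The abstract does not say which random-effects estimator was used. As an illustration only, the sketch below uses the DerSimonian-Laird method, a common choice, to pool log relative risks; the function names and the helper that recovers a standard error from a reported 95% CI are assumptions of this example.

```python
import math


def se_from_ci(lower, upper, z=1.96):
    """Standard error of log(RR) recovered from a reported 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def dersimonian_laird(log_rr, se):
    """Pool per-study log relative risks with a random-effects model.

    Returns (pooled log RR, its standard error, between-study variance tau^2).
    """
    k = len(log_rr)
    w = [1 / s ** 2 for s in se]  # fixed-effect (inverse-variance) weights
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, log_rr)) / sw

    # Cochran's Q and the method-of-moments estimate of tau^2.
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_rr))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each study's variance.
    w_star = [1 / (s ** 2 + tau2) for s in se]
    pooled = sum(wi * y for wi, y in zip(w_star, log_rr)) / sum(w_star)
    return pooled, math.sqrt(1 / sum(w_star)), tau2
```

The summary RR and its 95% CI would then be obtained as `exp(pooled)` and `exp(pooled ± 1.96 * se)`. When the studies agree closely, tau^2 is estimated as zero and the result coincides with the fixed-effect estimate.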

Results

Finally, 11 eligible studies with 4,860 cases and 14,740 controls were identified. A lower risk of glioma was observed among women who were ever users of exogenous hormones (OC: RR = 0.707, 95% CI = 0.604–0.828; HRT: RR = 0.683, 95% CI = 0.577–0.808) compared with never users. An increased glioma risk was associated with older age at menarche (RR = 1.401, 95% CI = 1.052–1.865). No association was observed for menopause status, parous status, age at menopause, or age at first birth and glioma risk.

Conclusion

The results of our study support the hypothesis that female sex hormones play a role in the development of glioma in women. Additional studies are warranted to validate the conclusion from this meta-analysis and clarify the underlying mechanisms.

20.
For modelling dose-response relationships in case-control studies, the multiplicative logistic regression model, which assumes the relative risk to be an exponential function of the dose, is widely known. If the relative risk is assumed to be a linear function of the dose, several authors (see e.g. BERRY (1980)) have proposed an additive (linear) model. This model fits the data better if such a linear relation holds. Confidence limits for the relative risk derived from the information matrix, however, appear to be rather inaccurate. Therefore, use of the ‘standard’ logistic model in two different ways was studied: extension with a quadratic term or a logarithmic transformation of the dose. By applying the methods both to an empirical data set and in a simulation experiment, it is shown that applying an appropriate (often logarithmic) transformation of the dose and then fitting the ‘standard’ logistic model is a useful approach if a linear dose-response relationship holds.
