Similar literature (20 results)
1.
Despite the growing consensus on the importance of testing gene-gene interactions in genetic studies of complex diseases, the effect of gene-gene interactions has often been defined as a deviation from genetic additive effects, which is essentially treated as a residual term in genetic analysis and leads to low power in detecting the presence of interacting effects. To what extent the definition of gene-gene interaction at the population level reflects the genes' biochemical or physiological interaction remains a mystery. In this article, we introduce a novel definition and a new measure of gene-gene interaction between two unlinked loci (or genes). We developed a general theory for studying linkage disequilibrium (LD) patterns in a disease population under two-locus disease models. The properties of using the LD measure in a disease population as a function of the measure of gene-gene interaction between two unlinked loci were also investigated. We examined how interaction between two loci creates LD in a disease population and showed that the mathematical formulation of the new definition for gene-gene interaction between two loci was similar to that of the LD between two loci. This finding motivated us to develop an LD-based statistic to detect gene-gene interaction between two unlinked loci. The null distribution and type I error rates of the LD-based statistic for testing gene-gene interaction were validated using extensive simulation studies. We found that the new test statistic was more powerful than the traditional logistic regression under three two-locus disease models and demonstrated that the power of the test statistic depends on the measure of gene-gene interaction. We also investigated the impact of using tagging SNPs for testing interaction on the power to detect interaction between two unlinked loci. Finally, to evaluate the performance of our new method, we applied the LD-based statistic to two published data sets. Our results showed that the P values of the LD-based statistic were smaller than those obtained by other approaches, including logistic regression models.
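
The idea of measuring LD between two unlinked loci within a disease population can be illustrated with the classical disequilibrium coefficient D = P(AB) - P(A)P(B). The sketch below is not the authors' statistic, which the abstract does not specify; it assumes haplotype counts among cases are available, that both loci are polymorphic, and it uses the textbook chi-square n·r² (1 df). The function name `ld_in_cases` and the counts are hypothetical.

```python
def ld_in_cases(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r2, chi2) from two-locus haplotype counts observed in cases.

    D    = P(AB) - P(A) * P(B)            (classical LD coefficient)
    r2   = D^2 / (pA * (1-pA) * pB * (1-pB))
    chi2 = n * r2                         (textbook 1-df test of D = 0)
    """
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    D = p_AB - p_A * p_B
    r2 = D * D / denom
    return D, r2, n * r2


# Hypothetical counts: an excess of AB and ab haplotypes among cases.
D, r2, chi2 = ld_in_cases(60, 40, 40, 60)
print(D, r2, chi2)
```

If the two loci are in linkage equilibrium in the general population, a D that is clearly nonzero among cases is the kind of signal an LD-based interaction test looks for.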

2.
Complex disease by definition results from the interplay of genetic and environmental factors. However, it is currently unclear how gene-environment interaction can best be used to locate complex disease susceptibility loci, particularly in the context of studies where between 1,000 and 1,000,000 markers are scanned for association with disease. We present a joint test of marginal association and gene-environment interaction for case-control data. We compare the power and sample size requirements of this joint test to other analyses: the marginal test of genetic association, the standard test for gene-environment interaction based on logistic regression, and the case-only test for interaction that exploits gene-environment independence. Although for many penetrance models the joint test of genetic marginal effect and interaction is not the most powerful, it is nearly optimal across all penetrance models we considered. In particular, it generally has better power than the marginal test when the genetic effect is restricted to exposed subjects and much better power than the tests of gene-environment interaction when the genetic effect is not restricted to a particular exposure level. This makes the joint test an attractive tool for large-scale association scans where the true gene-environment interaction model is unknown.

3.
ABSTRACT: BACKGROUND: Gene-environment interactions play an important role in the etiological pathway of complex diseases. An appropriate statistical method for handling a wide variety of complex situations involving interactions between variables is still lacking, especially when continuous variables are involved. The aim of this paper is to explore the ability of neural networks to model different structures of gene-environment interactions. A simulation study is set up to compare neural networks with standard logistic regression models. Eight different structures of gene-environment interactions are investigated. These structures are characterized by penetrance functions that are based on sigmoid functions or on combinations of linear and non-linear effects of a continuous environmental factor and a genetic factor with main effect or with a masking effect only. RESULTS: In our simulation study, neural networks are more successful in modeling gene-environment interactions than logistic regression models. This outperformance is especially pronounced when modeling sigmoid penetrance functions, when distinguishing between linear and nonlinear components, and when modeling masking effects of the genetic factor. CONCLUSION: Our study shows that neural networks are a promising approach for analyzing gene-environment interactions. In particular, when no prior knowledge of the correct nature of the relationship between co-variables and response variable is available, neural networks provide a valuable alternative to regression methods that are limited to the analysis of linearly separable data.

4.
5.
Summary. Case-parent trio studies concerned with children affected by a disease and their parents aim to detect single nucleotide polymorphisms (SNPs) showing a preferential transmission of alleles from the parents to their affected offspring. A popular statistical test for detecting such SNPs associated with disease in this study design is the genotypic transmission/disequilibrium test (gTDT) based on a conditional logistic regression model, which usually needs to be fitted by an iterative procedure. In this article, we derive exact closed-form solutions for the parameter estimates of the conditional logistic regression models when testing for an additive, a dominant, or a recessive effect of a SNP, and show that such analytic parameter estimates also exist when considering gene-environment interactions with binary environmental variables. Because the genetic model underlying the association between a SNP and a disease is typically unknown, it might further be beneficial to use the maximum over the gTDT statistics for the possible effects of a SNP as the test statistic. We therefore propose a procedure enabling a fast computation of the test statistic and the permutation-based p-value of this MAX gTDT. All these methods are applied to whole-genome scans of the case-parent trios from the International Cleft Consortium. These applications show that our procedures dramatically reduce the required computing time compared to the conventional iterative methods, allowing, for example, the analysis of hundreds of thousands of SNPs in a few minutes instead of several hours.

6.
Many environmental risk factors for common, complex human diseases have been revealed by epidemiologic studies, but how genotypes at specific loci modulate individual responses to environmental risk factors is largely unknown. Gene-environment interactions will be missed in genome-wide association studies and could account for some of the 'missing heritability' for these diseases. In this review, we focus on asthma as a model disease for studying gene-environment interactions because of relatively large numbers of candidate gene-environment interactions with asthma risk in the literature. Identifying these interactions using genome-wide approaches poses formidable methodological problems, and elucidating molecular mechanisms for these interactions has been challenging. We suggest that studying gene-environment interactions in animal models, although more tractable, might not be sufficient to shed light on the genetic architecture of human diseases. Lastly, we propose avenues for future studies to find gene-environment interactions.

7.
Summary. Standard prospective logistic regression analysis of case–control data often leads to very imprecise estimates of gene-environment interactions due to small numbers of cases or controls in cells of crossing genotype and exposure. In contrast, under the assumption of gene-environment independence, modern "retrospective" methods, including the "case-only" approach, can estimate the interaction parameters much more precisely, but they can be seriously biased when the underlying assumption of gene-environment independence is violated. In this article, we propose a novel empirical Bayes-type shrinkage estimator to analyze case–control data that can relax the gene-environment independence assumption in a data-adaptive fashion. In the special case involving a binary gene and a binary exposure, the method leads to an estimator of the interaction log odds ratio parameter in a simple closed form that corresponds to a weighted average of the standard case-only and case–control estimators. We also describe a general approach for deriving the new shrinkage estimator and its variance within the retrospective maximum-likelihood framework developed by Chatterjee and Carroll (2005, Biometrika 92, 399–418). Both simulated and real data examples suggest that the proposed estimator strikes a balance between bias and efficiency depending on the true nature of the gene-environment association and the sample size for a given study.
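
For a binary gene and a binary exposure, the two estimators being averaged can be written down directly from the 2x2 gene-by-exposure tables in cases and in controls. The sketch below shows only that arithmetic: the case-only estimate is the log odds ratio of G and E among cases, and the case-control estimate subtracts the same quantity among controls. The weight `w` is a hypothetical placeholder supplied by the caller; the data-adaptive weight derived in the paper is not given in the abstract and is not reproduced here. All counts are made up.

```python
import math


def log_or(n11, n10, n01, n00):
    """Log odds ratio of a 2x2 table; cells are (G=1,E=1), (G=1,E=0),
    (G=0,E=1), (G=0,E=0). All cells must be positive."""
    return math.log((n11 * n00) / (n10 * n01))


def interaction_estimates(cases, controls, w):
    """Return (case_only, case_control, combined) interaction log odds ratios.

    w is an assumed, caller-chosen weight in [0, 1] on the case-only estimate.
    """
    case_only = log_or(*cases)
    case_control = case_only - log_or(*controls)
    combined = w * case_only + (1.0 - w) * case_control
    return case_only, case_control, combined


# Hypothetical tables; G and E are exactly independent among these controls.
cases = (40, 60, 30, 70)
controls = (25, 75, 25, 75)
print(interaction_estimates(cases, controls, w=0.5))
```

When G and E are independent among controls, as in this toy table, the two estimates coincide; the case-only estimate becomes biased as the control log odds ratio moves away from zero, which is the situation a shrinkage weight is meant to handle.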

8.
Abstract. The use of Generalized Linear Models (GLM) in vegetation analysis has been advocated to accommodate complex species response curves. This paper investigates the potential advantages of using classification and regression trees (CART), a recursive partitioning method that is free of distributional assumptions. We used multiple logistic regression (a form of GLM) and CART to predict the distribution of three major oak species in California. We compared two types of model: polynomial logistic regression models optimized to account for non‐linearity and factor interactions, and simple CART‐models. Each type of model was developed using learning data sets of 2085 and 410 sample cases, and assessed on test sets containing 2016 and 3691 cases respectively. The responses of the three species to environmental gradients were varied and often non‐homogeneous or context dependent. We tested the methods for predictive accuracy: CART‐models performed significantly better than our polynomial logistic regression models in four of the six cases considered, and as well in the two remaining cases. CART also showed a superior ability to detect factor interactions. Insight gained from CART‐models then helped develop improved parametric models. Although the probabilistic form of logistic regression results is more adapted to test theories about species responses to environmental gradients, we found that CART‐models are intuitive, easy to develop and interpret, and constitute a valuable tool for modeling species distributions.

9.
Understanding how interactions between genes and environment contribute to the development of arthritis is a central issue in understanding the etiology of rheumatoid arthritis (RA), as well as for eventual subsequent efforts to prevent the disease. In this paper, we review current published data on genes and environment in RA as well as in certain induced animal models of disease, mainly those in which adjuvants only or adjuvants plus organ-specific autoantigens are used to induce arthritis. We refer to some new data on environmental and genetic factors of importance for RA generated from a large case-control study in Sweden (1200 patients, 1200 matched controls). We found an increased risk of seropositive but not of seronegative RA in smokers, and there are indications that this effect may be due to a gene-environment interaction involving MHC class II genes. We also found an increased risk of RA in individuals heavily exposed to mineral oils. This was of particular interest because mineral oils are strong inducers of arthritis in certain rodent strains and because polymorphisms in human genetic regions syntenic with genes predisposing for oil-induced arthritis in rats have now been shown to associate with RA in humans. Taken together, our data support the notion that concepts and data on gene-environment interactions in arthritis can now be taken from induced animal models of arthritis to generate new etiological hypotheses for RA.

10.
Resistant hypertension, a complex multifactorial hypertensive disease, is triggered by genetic and environmental factors and involves multiple physiological pathways. Single genetic variants may not reveal significant associations with resistant hypertension because their effects may be dependent on gene-gene or gene-environment interactions. We examined the interaction of angiotensin I-converting enzyme (ACE), angiotensinogen (AGT), and endothelial nitric oxide synthase (NOS3) polymorphisms with environmental factors (gender, age, body mass index, glycemia, total cholesterol, low-density lipoprotein cholesterol, high-density lipoprotein cholesterol, triglycerides, estimated glomerular filtration rate, and urinary sodium excretion) in 70 resistant hypertensive patients, 80 well-controlled hypertensive patients, and 70 normotensive controls. All subjects were genotyped for ACE insertion/deletion (rs1799752), AGT M235T (rs699), and NOS3 Glu298Asp (rs1799983). Multifactorial associations were tested using two statistical methods: the traditional parametric method (adjusted logistic regression analysis) and gene-gene and gene-environment interactions evaluated by multifactor dimensionality reduction analyses. While adjusted logistic regression found no significant association between the studied polymorphisms and controlled or resistant hypertension, the multifactor dimensionality reduction analyses showed that carriers of the AGT 235T allele were at increased risk for resistant hypertension, especially if they were older than 50 years. The AGT 235T allele constituted an independent risk factor for resistant hypertension.

11.
The complexity of ecosystems is staggering, with hundreds or thousands of species interacting in a number of ways from competition and predation to facilitation and mutualism. Understanding the networks that form the systems is of growing importance, e.g. to understand how species will respond to climate change, or to predict potential knock-on effects of a biological control agent. In recent years, a variety of summary statistics for characterising the global and local properties of such networks have been derived, which provide a measure for gauging the accuracy of a mathematical model for network formation processes. However, the critical underlying assumption is that the true network is known. This is not a straightforward task to accomplish, and typically requires minute observations and detailed field work. More importantly, knowledge about species interactions is restricted to specific kinds of interactions. For instance, while the interactions between pollinators and their host plants are amenable to direct observation, other types of species interactions, like those mentioned above, are not, and might not even be clearly defined from the outset. To discover information about complex ecological systems efficiently, new tools for inferring the structure of networks from field data are needed. In the present study, we investigate the viability of various statistical and machine learning methods recently applied in molecular systems biology: graphical Gaussian models, L1-regularised regression with least absolute shrinkage and selection operator (LASSO), sparse Bayesian regression and Bayesian networks. We have assessed the performance of these methods on data simulated from food webs of known structure, where we combined a niche model with a stochastic population model in a 2-dimensional lattice. 
We assessed the network reconstruction accuracy in terms of the area under the receiver operating characteristic (ROC) curve, which was typically in the range between 0.75 and 0.9, corresponding to the recovery of about 60% of the true species interactions at a false prediction rate of 5%. We also applied the models to presence/absence data for 39 European warblers, and found that the inferred species interactions showed a weak yet significant correlation with phylogenetic similarity scores, which tended to weakly increase when including bio-climate covariates and allowing for spatial autocorrelation. Our findings demonstrate that relevant patterns in ecological networks can be identified from large-scale spatial data sets with machine learning methods, and that these methods have the potential to contribute novel important tools for gaining deeper insight into the structure and stability of ecosystems.
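
The area under the ROC curve used here as the accuracy measure can be computed without drawing the curve: it equals the probability that a randomly chosen true interaction receives a higher score than a randomly chosen non-interaction, with ties counted as one half. The sketch below assumes each candidate species pair has a score from some reconstruction method and a 0/1 label from the known food web; the function name and the toy values are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    scores: predicted edge scores, higher means more likely an interaction.
    labels: 1 for a true interaction, 0 for a non-interaction.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy example: four candidate edges, two of them real.
print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))
```

The pairwise loop is quadratic in the number of candidate edges, which is fine for a sketch; a rank-based version would be preferable for large networks.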

12.
Discovering the regulatory relationships of protein interactions is crucial for mechanistic research on signaling networks. Bioinformatics methods can be used to accelerate the discovery of regulatory relationships between protein interactions and to distinguish activation relations from inhibition relations. In this paper, we describe a novel method to predict the regulation relations of protein interactions in the signaling network. We detected 4,417 domain pairs that were significantly enriched in the activation or inhibition dataset. Three machine learning methods, logistic regression, support vector machines (SVMs), and naïve Bayes, were explored in the classifier models. The prediction power of the three models was evaluated by 5-fold cross-validation and on an independent test dataset. The area under the receiver operating characteristic curve for the logistic regression, SVM, and naïve Bayes models was 0.946, 0.905 and 0.809, respectively. Finally, the logistic regression classifier was applied to the human proteome-wide interaction dataset, and 2,591 interactions were predicted with their regulation relations, with 2,048 in activation and 543 in inhibition. This domain-based model can be used to identify the regulation relations between protein interactions and, furthermore, to reconstruct signaling pathways.

13.

Background  

The potential public health benefits of targeting environmental interventions by genotype depend on the environmental and genetic contributions to the variance of common diseases, and the magnitude of any gene-environment interaction. In the absence of prior knowledge of all risk factors, twin, family and environmental data may help to define the potential limits of these benefits in a given population. However, a general methodology to analyze twin data is required because of the potential importance of gene-gene interactions (epistasis), gene-environment interactions, and conditions that break the 'equal environments' assumption for monozygotic and dizygotic twins.

14.
Communication and information are central concepts in evolutionary biology. In fact, it is hard to find an area of biology where these concepts are not used. However, quantifying the information transferred in biological interactions has been difficult. How much information is transferred when the first spring rainfall hits a dormant seed, or when a chick begs for food from its parent? One measure that is commonly used in such cases is fitness value: by how much, on average, an individual's fitness would increase if it behaved optimally with the new information, compared to its average fitness without the information. Another measure, often used to describe neural responses to sensory stimuli, is the mutual information – a measure of reduction in uncertainty, as introduced by Shannon in communication theory. However, mutual information has generally not been considered to be an appropriate measure for describing developmental or behavioral responses at the organismal level, because it is blind to function; it does not distinguish between relevant and irrelevant information. In this paper we show that there is in fact a surprisingly tight connection between these two measures in the important context of evolution in an uncertain environment. In this case, a useful measure of fitness benefit is the increase in the long‐term growth rate, or the fold increase in number of surviving lineages. We show that in many cases the fitness value of a developmental cue, when measured this way, is exactly equal to the reduction in uncertainty about the environment, as described by the mutual information.
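
The claimed equality between the fitness value of a cue and mutual information can be checked numerically in one simple setting. The sketch below is an illustration under strong assumptions that are not taken from the abstract: each phenotype survives only in its matching environment (a diagonal fitness matrix, as in Kelly's betting model), and the population hedges in proportion to P(environment) without the cue and P(environment | cue) with it. The function names, the joint distribution, and the fitness values are hypothetical.

```python
import math


def mutual_information(joint):
    """I(environment; cue) in bits; joint[i][j] = P(environment=i, cue=j)."""
    p_env = [sum(row) for row in joint]
    p_cue = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:
                mi += p * math.log2(p / (p_env[i] * p_cue[j]))
    return mi


def growth_rate_gain(joint, fitness):
    """Increase in expected log2 growth rate from using the cue, under the
    assumed diagonal fitness matrix and proportional bet-hedging."""
    p_env = [sum(row) for row in joint]
    p_cue = [sum(col) for col in zip(*joint)]
    # Without the cue: a fraction P(env=i) of offspring adopts phenotype i.
    without = sum(pe * math.log2(pe * f)
                  for pe, f in zip(p_env, fitness) if pe > 0)
    # With the cue: a fraction P(env=i | cue=j) adopts phenotype i.
    with_cue = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:
                with_cue += p * math.log2((p / p_cue[j]) * fitness[i])
    return with_cue - without


joint = [[0.4, 0.1], [0.1, 0.4]]   # cue is right 80% of the time
fitness = [2.0, 3.0]               # offspring per survivor in each environment
print(mutual_information(joint), growth_rate_gain(joint, fitness))
```

In this setting the fitness values cancel out of the difference, so the gain depends only on the joint distribution of environment and cue.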

15.
Mukherjee B, Zhang L, Ghosh M, Sinha S. Biometrics, 2007, 63(3): 834-844
In case-control studies of gene-environment association with disease, when genetic and environmental exposures can be assumed to be independent in the underlying population, one may exploit the independence in order to derive more efficient estimation techniques than the traditional logistic regression analysis (Chatterjee and Carroll, 2005, Biometrika 92, 399-418). However, covariates that stratify the population, such as age, ethnicity and the like, could potentially lead to nonindependence. In this article, we provide a novel semiparametric Bayesian approach to model stratification effects under the assumption of gene-environment independence in the control population. We illustrate the methods by applying them to data from a population-based case-control study on ovarian cancer conducted in Israel. A simulation study is conducted to compare our method with other popular choices. The results indicate that the semiparametric Bayesian model allows incorporation of key scientific evidence in the form of a prior and offers a flexible, robust alternative when standard parametric model assumptions do not hold.

16.
The relative positions of branching events in a phylogeny contain information about evolutionary and population dynamic processes. We provide new summary statistics of branching event times and describe how these statistics can be used to infer rates of species diversification from interspecies trees or rates of population growth from intraspecies trees. We also introduce a phylogenetic method for estimating the level of taxon sampling in a clade. Different evolutionary models and different sampling regimes can produce similar patterns of branching events, so it is important to consider explicitly the model assumptions involved when making evolutionary inferences. Results of an analysis of the phylogeny of the mosquito-borne flaviviruses suggest that there could be several thousand currently unidentified viruses in this clade.

17.
Evolutionary dynamics, epistatic interactions, and biological information   Cited by: 1 (self-citations: 0, citations by others: 1)
We investigate a definition of biological information that connects population genetics with the tools of information theory by focusing on the distribution of genotypes found in a population. Previous research has treated loci as non-interacting by making specific approximations in the calculation of information-theoretic quantities. We expand earlier mathematical forms to include epistasis, or interactions between mutations at all pairs of loci. Application of our improved measure of biological information to evolution on two-locus, two-allele fitness landscapes demonstrates that mutual information between loci reflects epistatic interaction of mutations. Finally, we consider four-locus, two-allele fitness landscapes with modular structure. As modular interactions are inherently epistatic, we demonstrate that our refined approximation provides insight into the underlying structure of these non-trivial fitness landscapes.

18.
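Gene regulatory network models provide a new research framework and platform for gaining a deeper understanding of the nature of life. As one type of gene regulatory network model, the mutual information relevance network model uses entropy and mutual information to describe the associations between genes. This paper describes a method for measuring gene expression similarity with mutual information, proposes a Bootstrap-based mutual information estimation algorithm, and proposes an improvement strategy for the bias that arises. Experimental results show that the improved mutual information estimation method can effectively increase the accuracy of gene expression similarity estimates.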
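
A generic bootstrap around a plug-in mutual information estimate gives a feel for the approach. The sketch below is not the algorithm or the bias correction from the paper, neither of which is specified in the abstract; it assumes the two expression profiles have already been discretized into levels, resamples sample indices with replacement, and reports the mean and standard deviation of the plug-in estimate across resamples. The function names, the seed, and the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def plugin_mi(xs, ys):
    """Plug-in mutual information (bits) between two discretized profiles."""
    n = len(xs)
    c_xy = Counter(zip(xs, ys))
    c_x = Counter(xs)
    c_y = Counter(ys)
    return sum((c / n) * math.log2(c * n / (c_x[x] * c_y[y]))
               for (x, y), c in c_xy.items())


def bootstrap_mi(xs, ys, n_boot=200, seed=0):
    """Mean and standard deviation of plug-in MI over bootstrap resamples."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(xs)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        estimates.append(plugin_mi([xs[i] for i in idx],
                                   [ys[i] for i in idx]))
    mean = sum(estimates) / n_boot
    var = sum((e - mean) ** 2 for e in estimates) / (n_boot - 1)
    return mean, math.sqrt(var)


# Toy profiles: gene y copies gene x exactly (two expression levels).
xs = [0, 1] * 100
ys = list(xs)
print(plugin_mi(xs, ys), bootstrap_mi(xs, ys, n_boot=50, seed=1))
```

Comparing the bootstrap mean with the plug-in value on the full sample is one simple way to see the kind of deviation that a correction strategy would target.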

19.
Regional association analysis is one of the most powerful tools for gene mapping because, instead of analyzing individual variants, it simultaneously considers all variants in the region. Recent development of models for regional association analysis involves the functional data analysis approach. In the framework of this approach, the genotypes of variants within a region, as well as their effects, are described by continuous functions. Such an approach allows us to use information about both linkage and linkage disequilibrium and to reduce the influence of noise and/or observation errors. Here we define a functional linear mixed model to test association on independent and structured samples. We demonstrate how to test fixed and random effects of a set of genetic variants in the region on a quantitative trait. Estimation of the statistical properties of the new methods shows that type I errors are in accordance with declared values and that power is high, especially for models with fixed effects of genotypes. We expect that the new functional linear regression models will facilitate identification of rare genetic variants controlling complex human and animal traits. The new methods are implemented in the software FREGAT, which is available for free download at http://mga.bionet.nsc.ru/soft/FREGAT/.

20.
How do the behavioural interactions between individuals in an ecological system produce the global population dynamics of that system? We present a stochastic individual-based model of the reproductive cycle of the mite Varroa jacobsoni, a parasite of honeybees. The model has the interesting property that its population-level behaviour is approximated extremely accurately by the exponential logistic equation or Ricker map. We demonstrate how this approximation is obtained mathematically and how the parameters of the exponential logistic equation can be written in terms of the parameters of the individual-based model. Our procedure demonstrates, in at least one case, how study of animal ecology at an individual level can be used to derive global models which predict population change over time.
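
For readers unfamiliar with it, the Ricker map in its common parameterization is N(t+1) = N(t)·exp(r·(1 - N(t)/K)). The sketch below simply iterates that map; the growth rate `r`, carrying capacity `K`, and starting population are hypothetical values, and the mapping from the individual-based mite model to these parameters, which the paper derives, is not reproduced here.

```python
import math


def ricker(n0, r, K, steps):
    """Iterate the Ricker map N(t+1) = N(t) * exp(r * (1 - N(t) / K)).

    Returns the trajectory as a list of length steps + 1, starting at n0.
    """
    traj = [n0]
    for _ in range(steps):
        n = traj[-1]
        traj.append(n * math.exp(r * (1.0 - n / K)))
    return traj


# Hypothetical parameters: slow growth settles monotonically at K.
trajectory = ricker(n0=10.0, r=0.5, K=100.0, steps=50)
print(trajectory[:5], trajectory[-1])
```

With small `r` the population approaches `K` smoothly; larger values of `r` produce the oscillations and chaos for which the map is known.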
