On the Classification of Epistatic Interactions |
| |
Authors: | Hong Gao Julie M. Granka Marcus W. Feldman |
| |
Affiliation: | *Department of Genetics and †Department of Biology, Stanford University School of Medicine, Stanford, California 94305 |
| |
Abstract: | Modern genomewide association studies are characterized by the problem of “missing heritability.” Epistasis, or genetic interaction, has been suggested as a possible explanation for the relatively small contribution of single significant associations to the fraction of variance explained. Of particular concern to investigators of genetic interactions is how to best represent and define epistasis. Previous studies have found that the use of different quantitative definitions for genetic interaction can lead to different conclusions when constructing genetic interaction networks and when addressing evolutionary questions. We suggest that instead, multiple representations of epistasis, or epistatic “subtypes,” may be valid within a given system. Selecting among these epistatic subtypes may provide additional insight into the biological and functional relationships among pairs of genes. In this study, we propose maximum-likelihood and model selection methods in a hypothesis-testing framework to choose epistatic subtypes that best represent functional relationships for pairs of genes on the basis of fitness data from both single and double mutants in haploid systems. We gauge the performance of our method with extensive simulations under various interaction scenarios. Our approach performs reasonably well in detecting the most likely epistatic subtype for pairs of genes, as well as in reducing bias when estimating the epistatic parameter (ɛ). We apply our approach to two available data sets from yeast (Saccharomyces cerevisiae) and demonstrate through overlap of our identified epistatic pairs with experimentally verified interactions and functional links that our results are likely of biological significance in understanding interaction mechanisms. 
We anticipate that our method will improve detection of epistatic interactions and will help to unravel the mysteries of complex biological systems.

Understanding the nature of genetic interactions is crucial to obtaining a more complete picture of complex biological systems and their evolution. The discovery of genetic interactions has been the goal of many researchers studying a number of model systems, including but not limited to Saccharomyces cerevisiae, Caenorhabditis elegans, and Escherichia coli (You and Yin 2002; Burch et al. 2003; Burch and Chao 2004; Tong et al. 2004; Drees et al. 2005; Sanjuán et al. 2005; Segre et al. 2005; Pan et al. 2006; Zhong and Sternberg 2006; Jasnos and Korona 2007; St. Onge et al. 2007; Decourty et al. 2008). Recently, high-throughput experimental approaches, such as epistatic mini-array profiles (E-MAPs) and genetic interaction analysis technology for E. coli (GIANT-coli), have enabled the study of epistasis on a large scale (Schuldiner et al. 2005, 2006; Collins et al. 2006, 2007; Typas et al. 2008). However, it remains unclear whether the computational and statistical methods currently in use to identify these interactions are indeed the most appropriate.

The study of genetic interaction, or “epistasis,” has had a long and somewhat convoluted history. Bateson (1909) first used the term epistasis to describe the ability of a gene at one locus to “mask” the mutational influence of a gene at another locus (Cordell 2002). The term “epistacy” was later coined by Fisher (1918) to denote the statistical deviation of multilocus genotype values from an additive linear model for the value of a phenotype (Phillips 1998, 2008).

These origins are the basis for the two main current interpretations of epistasis. 
The first, as introduced by Bateson (1909), is the “biological,” “physiological,” or “compositional” form of epistasis, concerned with the influence of an individual's genetic background on an allele's effect on phenotype (Cheverud and Routman 1995; Phillips 1998, 2008; Cordell 2002; Moore and Williams 2005). The second interpretation, attributed to Fisher, is “statistical” epistasis, which in its linear regression framework places the phenomenon of epistasis in the context of a population (Wagner et al. 1998; Wade et al. 2001; Wilke and Adami 2001; Moore and Williams 2005; Phillips 2008). Each of these approaches is equally valid in studying genetic interactions; however, confusion still exists about how to best reconcile the methods and results of the two (Phillips 1998, 2008; Cordell 2002; Moore and Williams 2005; Liberman and Feldman 2006; Aylor and Zeng 2008).

Aside from the distinction between the statistical and the physiological definitions of epistasis, inconsistencies exist when studying solely physiological epistasis. For categorical traits, physiological epistasis is clear as a “masking” effect. When noncategorical or numerical traits are measured, epistasis is defined as the deviation of the phenotype of the multiple mutant from that expected under independence of the underlying genes.

The “expectation” of the phenotype under independence, that is, in the absence of epistasis, is not defined consistently between studies. For clarity, consider epistasis between pairs of genes and, without loss of generality, consider fitness as the phenotype. The first commonly used definition of independence, originating from additivity, defines the effect of two independent mutations to be equal to the sum of the individual mutational effects. A second, motivated by the use of fitness as a phenotype, defines the effect of the two mutations as the product of the individual effects (Elena and Lenski 1997; Desai et al. 2007; Phillips 2008). 
A third definition of independence has been referred to as “minimum,” where alleles at two loci are independent if the double mutant has the same fitness as the less-fit single mutant. Mani et al. (2008) claim that this has been used when identifying pairwise epistasis by searching for synthetic lethal double mutants (Tong et al. 2001, 2004; Pan et al. 2004, 2006; Davierwala et al. 2005). A fourth is the “Log” definition presented by Mani et al. (2008) and Sanjuán and Elena (2006). The less-frequently used “scaled ɛ” (Segre et al. 2005) measure of epistasis takes the multiplicative definition of independence with a scaling factor.

These different definitions of independence are partly due to distinct measurement “scales.” For some traits, a multiplicative definition of independence may be necessary to identify epistasis between two genes, whereas for other traits, additivity may be appropriate (Falconer and Mackay 1995; Wade et al. 2001; Mani et al. 2008; Phillips 2008). An interaction found under one independence definition may not necessarily be found under another, leading to different biological conclusions (Mani et al. 2008).

Mani et al. (2008) suggest that there may be an “ideal” definition of independence for all gene pairs for identifying functional relationships. However, it is plausible that different representations of independence for two genes may reflect different biological properties of the relationship (Kupper and Hogan 1978; Rothman et al. 1980). “Two categories of general interest [the additive and multiplicative definitions, respectively] are those in which etiologic factors act interchangeably in the same step in a multistep process, or alternatively act at different steps in the process” (Rothman et al. 1980, p. 468). In some cases, the discovery of epistasis may merely be an artifact of using an incorrect null model (Kupper and Hogan 1978). 
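
As a minimal sketch of the additive, multiplicative, and minimum definitions of independence described above, the following Python computes the expected double-mutant fitness under each definition and the resulting deviation ɛ (observed minus expected). It assumes that fitnesses are scaled so that the wild type has fitness 1, which makes the additive expectation w_x + w_y − 1. The function names and example fitness values are hypothetical and are not taken from the original study; the Log and scaled ɛ definitions are omitted.

```python
def expected_fitness(w_x, w_y, definition):
    """Expected double-mutant fitness under independence.

    Assumption: fitnesses are relative to a wild type with fitness 1.
    """
    if definition == "additive":
        # Fitness defects (1 - w) add: 1 - [(1 - w_x) + (1 - w_y)]
        return w_x + w_y - 1.0
    if definition == "multiplicative":
        # Relative fitnesses multiply
        return w_x * w_y
    if definition == "minimum":
        # Double mutant is as fit as the less-fit single mutant
        return min(w_x, w_y)
    raise ValueError(f"unknown definition: {definition}")


def epsilon(w_xy, w_x, w_y, definition):
    """Deviation of the observed double-mutant fitness from expectation."""
    return w_xy - expected_fitness(w_x, w_y, definition)


# Hypothetical fitness values, for illustration only
w_x, w_y, w_xy = 0.9, 0.8, 0.70
for name in ("additive", "multiplicative", "minimum"):
    print(name, round(epsilon(w_xy, w_x, w_y, name), 3))
```

With these illustrative values, the same double mutant shows essentially no epistasis under the additive definition but negative epistasis under the multiplicative and minimum definitions, which is the kind of disagreement between definitions that the text describes.
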
It may be necessary to represent “independence” differently, resulting in different statistical measures of interactions, for different pairs of genes depending on their functions.

Previous studies have suggested that different pairs of loci may have different modes of interaction and have attempted to subclassify genetic interactions into regulatory hierarchies and mutually exclusive “interaction subtypes” to elucidate underlying biological properties (Avery and Wasserman 1992; Drees et al. 2005; St. Onge et al. 2007). We suggest that epistatic relationships can be divided into several subtypes, or forms, corresponding to the aforementioned definitions of independence. As a particular gene pair may deviate from independence according to several criteria, we do not claim that these subtypes are necessarily mutually exclusive. We attempt to select the most likely epistatic subtype that is the best statistical representation of the relationship between two genes. To further subclassify interactions, epistasis among deleterious mutations can take one of two commonly used forms: positive (equivalently alleviating, antagonistic, or buffering) epistasis, where the phenotype of the double mutant is less severe than expected under independence, and negative (equivalently aggravating, synergistic, or synthetic), where the phenotype is more severe than expected (Segre et al. 2005; Collins et al. 2006; Desai et al. 2007; Mani et al. 2008).

Another objective of such distinctions is to reduce the bias of the estimator of the epistatic parameter (ɛ), which measures the extent and direction of epistasis for a given gene pair. Mani et al. (2008), assuming that the overall distribution of ɛ should be centered around 0, find that inaccurately choosing a definition of independence can result in increased bias when estimating ɛ. 
For example, using the minimum definition results in the most severe bias when single mutants have moderate fitness effects, and the additive definition results in the largest positive bias when at least one gene has an extreme fitness defect (Mani et al. 2008). Therefore, it is important to select an optimal estimator for ɛ for each pair of genes from among the subtypes of epistatic interactions.

Epistasis may be important to consider in genomic association studies, as a gene with a weak main effect may be identified only through its interaction with another gene or other genes (Frankel and Schork 1996; Culverhouse et al. 2002; Moore 2003; Cordell 2009; Moore and Williams 2009). Epistasis has also been studied extensively in the context of the evolution of sex and recombination. The mutational deterministic hypothesis proposes that the evolution of sex and recombination would be favored by negative epistatic interactions (Feldman et al. 1980; Kondrashov 1994); many other studies have also examined the importance of the form of epistasis (Elena and Lenski 1997; Otto and Feldman 1997; Burch and Chao 2004; Keightley and Otto 2006; Desai et al. 2007; MacCarthy and Bergman 2007). Indeed, according to Mani et al. (2008, p. 3466), “the choice of definition [of epistasis] alters conclusions relevant to the adaptive value of sex and recombination.”

Given fitness data from single and double mutants in haploid organisms, we implement a likelihood method to determine the subtype that is the best statistical representation of the epistatic interaction for pairs of genes. We use maximum-likelihood estimation and the Bayesian information criterion (BIC) (Schwarz 1978) with a likelihood-ratio test to select the most appropriate null or epistatic model for each putative interaction. We conduct extensive simulations to gauge the performance of our method and demonstrate that it performs reasonably well under various interaction scenarios. 
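
The BIC-based selection step mentioned above can be illustrated with a short, hedged Python sketch. It uses only the standard form of Schwarz's criterion, BIC = k ln(n) − 2 ln(L), where L is the maximized likelihood, k the number of free parameters, and n the number of observations; the lower score is preferred. The likelihood models themselves are not specified in this section, so the log-likelihoods, parameter counts, and model names below are hypothetical placeholders rather than the authors' values.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Schwarz's criterion: k * ln(n) - 2 * ln(L_hat); lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def select_model(candidates, n_obs):
    """Return the name of the lowest-BIC model and all scores.

    candidates maps a model name to a tuple of
    (maximized log-likelihood, number of free parameters).
    """
    scores = {name: bic(ll, k, n_obs) for name, (ll, k) in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical fitted models for one gene pair (values are placeholders)
candidates = {
    "multiplicative null": (-12.0, 2),
    "multiplicative epistatic": (-8.0, 3),
}
best, scores = select_model(candidates, n_obs=10)
print(best, scores)
```

In this toy comparison the epistatic model's gain in log-likelihood outweighs the penalty for its extra parameter, so it is selected; with a smaller gain the null model would be retained.
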
We apply our method to two data sets with fitness measurements obtained from yeast (Jasnos and Korona 2007; St. Onge et al. 2007), whose authors assume only multiplicative epistasis for all interactions. By examining functional links and experimentally validated interactions among epistatic pairs, we demonstrate that our results are biologically meaningful. Studying a random selection of genes, we find that minimum epistasis is more prevalent than both additive and multiplicative epistasis and that the mean of ɛ is not significantly different from zero (whereas Jasnos and Korona 2007 conclude that epistasis is predominantly positive). For genes in a particular pathway, we advise selecting among fewer epistatic subtypes. We believe that our method of epistatic subtype classification will aid in understanding genetic interactions and their properties.

St. Onge et al. (2007) examined 26 nonessential genes known to confer resistance to methyl methanesulfonate (MMS), constructed 323 double-deletion strains (all but two of the total possible pairs), and assumed the multiplicative form of epistasis for all interactions (see Methods: Analysis of experimental data). Following these authors, we focus on single- and double-mutant fitnesses measured in the presence of MMS. (For results in the absence of MMS, see File S1 and File S1_2.)

With the resampling method described in Analysis of experimental data and File S1, 222 gene pairs pass the cutoff of having epistasis inferred in at least 900 of 1000 replicates. This does not include 5 synthetic lethal gene pairs. Hypothesis testing and a multiple-testing procedure (for 222 simultaneous hypotheses) are necessary to determine the final epistatic pairs.

To select one among the three multiple-testing procedures (Bonferroni, FDR, and pFDR), we follow St. Onge et al. (2007) and examine gene pairs that share specific functional links (see Analysis of experimental data). 
The Bonferroni method is likely too conservative, yielding only 25 significantly epistatic pairs with only one functional link among them; conversely, the pFDR procedure appears to be too lenient, rejecting independence for all 222 pairs. Therefore, we use the FDR procedure (although the number of functional links is not significant) and detect 193 epistatic pairs, of which 5 (2.6%) are synthetic lethals, 19 (9.8%) have additive epistasis, 33 (17.1%) have multiplicative epistasis, and 136 (70.5%) have minimum epistasis (Table 2). We find 29 gene pairs with positive (alleviating) epistasis and 159 gene pairs with negative (aggravating) epistasis.

TABLE 2
Summary of gene pairs with the indicated epistatic subtypes, inferred using the FDR procedure with the BIC method that considers all three epistatic subtypes and their corresponding null models

Epistatic subtype | Statistic | Study S | Study J |
---|---|---|---|
All | No. of pairs | 193 (100%) | 352 (100%) |
All | Mean ɛ | −0.060 | −0.001 |
All | Median ɛ | −0.096 | −0.059 |
Additive | No. of pairs | 19 (9.8%) | 35 (9.9%) |
Additive | Mean ɛ | 0.115* | 0.193*** |
Additive | Median ɛ | 0.131 | 0.188 |
Multiplicative | No. of pairs | 33 (17.1%) | 63 (17.9%) |
Multiplicative | Mean ɛ | 0.048 | 0.017 |
Multiplicative | Median ɛ | −0.166 | −0.115 |
Minimum | No. of pairs | 136 (70.5%) | 254 (72.2%) |
Minimum | Mean ɛ | −0.111*** | −0.032** |
Minimum | Median ɛ | −0.091 | −0.065 |

Numbers are the counts of each type, and percentages are given of the total number of epistatic pairs. The mean and median of the epistatic parameter (ɛ) are given for each subtype, with “*” indicating that the mean of ɛ is significantly different from 0 (*, P-value ≤0.05; **, P-value ≤0.01; ***, P-value ≤0.001). Study S refers to the St. Onge et al. (2007) data set, and study J refers to the Jasnos and Korona (2007) data set. (For study S, five of the epistatic pairs are synthetic lethals and are not shown; as a result, percentages do not sum to 100%.)

To further validate the use of our method and the FDR procedure, we assess by Fisher's exact test the significance of an enrichment of both Biological Process and all GO Slim term links among epistatic pairs, neither of which is significant (Gene Ontology Consortium 2000; www.yeastgenome.org). Although some of the previously unidentified interactions that we identify could be false positives, many are likely to be new discoveries.

TABLE 3
Comparison of validation measures for each data set for different variations of the FDR and BIC procedures, considering only a subset of epistatic subtypes with their corresponding null models: all epistatic subtypes (A, P, and M); only the additive and multiplicative subtypes (A and P); and only the additive (A), only the multiplicative (P), or only the minimum (M) subtype (see text for details)

Data set | Validation measure (total in data set) | A, P, M | A, P | A | P | M |
---|---|---|---|---|---|---|
Study J | No. found (636) | 352 | 273 | 263 | 231 | 329 |
Study J | Functional links (25) | 19 (0.0255)* | 13 (0.2320) | 11 (0.4689) | 10 (0.4227) | 15 (0.2619) |
Study J | GO Slim terms (Biological Process) (115) | 69 (0.1573) | 50 (0.4874) | 55 (0.0736) | 44 (0.3534) | 68 (0.04902)* |
Study J | GO Slim terms (all) (369) | 224 (0.0009)* | 172 (0.01654)* | 160 (0.1297) | 146 (0.0273)* | 213 (0.0003)* |
Study J | Experimentally identified (3) | 3 | 2 | 1 | 2 | 3 |
Study S | No. found (323) | 193 | 192 | 247 | 171 | 243 |
Study S | Functional links (36) | 21 (0.6450) | 29 (0.0041)* | 34 (0.0031)* | 29 (0.0003)* | 24 (0.9256) |
Study S | GO Slim terms (Biological Process) (283) | 174 (0.0657) | 174 (0.03656)* | 223 (0.0010)* | 153 (0.1825) | 213 (0.5534) |
Study S | GO Slim terms (all) (307) | 185 (0.2866) | 182 (0.6926) | 237 (0.1472) | 162 (0.6997) | 231 (0.5908) |
Study S | Experimentally identified (29) | 17 | 22 | 24 | 23 | 21 |

Column headings give the subtypes considered in the BIC procedure. Numbers in parentheses in the table body indicate P-values by Fisher's exact test. “*” indicates significance. Study J refers to the Jasnos and Korona (2007) data set, and study S refers to the St. Onge et al. (2007) data set measured in the presence of MMS. Numbers in parentheses in the row labels indicate the total number of tested pairs and the total number of each type of link found in each complete data set.

The epistatic subtypes we consider are not necessarily mutually exclusive. To more fully assess the assumptions of our method, we also consider several of the possible subsets of the epistatic subtypes (and their corresponding null models) in our procedure. As the minimum epistatic subtype was the most frequently selected in this data set, we first exclude the minimum null model and the minimum epistatic model from our procedure (i.e., we select from among four rather than six models for a pair). However, the number of epistatic pairs with functional links is significant only when the minimum epistatic subtype is not included (also see Table S4 and Table S5). 
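
To make the enrichment test behind the P-values in Table 3 concrete, here is a minimal Python sketch of a one-sided Fisher's exact test, computed as an upper hypergeometric tail. It asks how likely it is to see at least the observed number of functionally linked pairs among the pairs called epistatic, if epistatic pairs were drawn at random from all tested pairs. The function and argument names are hypothetical, and the use of a one-sided test is an assumption, since this section does not state which alternative the authors used. The example counts are those listed for study J in Table 3 (636 tested pairs, 25 functional links, 352 epistatic pairs, 19 linked epistatic pairs), for which the table reports P = 0.0255.

```python
from math import comb


def enrichment_p_value(n_total, n_linked, n_selected, n_linked_selected):
    """One-sided Fisher's exact test for enrichment.

    Returns P(X >= n_linked_selected), where X is hypergeometric:
    n_selected pairs are drawn without replacement from n_total pairs,
    of which n_linked carry a functional link.
    """
    upper = min(n_linked, n_selected)
    tail = sum(
        comb(n_linked, k) * comb(n_total - n_linked, n_selected - k)
        for k in range(n_linked_selected, upper + 1)
    )
    # Exact integer arithmetic until the final division
    return tail / comb(n_total, n_selected)


# Counts from the study J rows of Table 3 (all three subtypes considered)
p = enrichment_p_value(n_total=636, n_linked=25, n_selected=352,
                       n_linked_selected=19)
print(p)
```

The sums are kept as exact integers until the final division, which avoids overflow or rounding problems with the very large binomial coefficients involved.
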
It is not immediately clear which epistatic subtypes are the most appropriate for these data, although including the minimum subtype may not be appropriate (Mani et al. 2008) (see Discussion).

Although it may be best to consider fewer epistatic subtypes for this specific data set, we report our results including all three epistatic subtypes and their corresponding null models […] although we identify 105 epistatic pairs not identified by the original authors (Figure S4, Table S4). St. Onge et al. (2007) find that epistatic pairs with a functional link have a positively shifted distribution of epistasis. We find no such shift in epistasis values. We also demonstrate [described in Application to simulated data: Bias and variance of the epistatic parameter (ɛ)] that our method seems to reduce bias of the epistatic parameter (ɛ). When considering only a subset of the epistatic subtypes, however, we find the mean of ɛ to be positive and significantly different from zero (results not shown). See File S1, Figure S6, and Figure S7 for additional discussion of the epistatic pairs we identify.

The Jasnos and Korona (2007) data set includes 758 yeast gene deletions known to cause growth defects and reports fitnesses of only a sparse subset of all possible gene pairs [≈0.2% of the possible pairwise genotypes, or 639 pairs]. Because the authors do not identify epistatic pairs in a hypothesis-testing framework, we cannot explicitly compare our conclusions with theirs.

To validate our method, we examine gene pairs that have specific functional links (see Methods: Analysis of experimental data). When defining a functional link using GO terms (Gene Ontology Consortium 2000) with <30 genes associated with them, only 1 of 639 tested gene pairs has a functional link. Raising the threshold of associated genes to 50 and 100, the number of tested pairs with functional links rises only to 3 and 9, respectively. 
Because of the large number of random genes and the sparse number of gene pairs in this data set, we follow Tong et al. (2004) and select GO terms associated with ≤200 genes. Twenty-five of 639 tested pairs then have a functional link.

Only the FDR multiple-testing procedure results in a significant enrichment of functional links among epistatic pairs. With the FDR procedure we find 352 significant epistatic pairs, of which 35 (9.9%) have additive epistasis, 63 (17.9%) have multiplicative epistasis, and 254 (72.2%) have minimum epistasis (Table 2). These proportions of inferred subtypes suggest that the authors' original restriction to multiplicative epistasis may be inappropriate. We find 141 gene pairs with positive epistasis and 211 gene pairs with negative epistasis.

We do not find a significant number of epistatic pairs with shared GO Slim Biological Process terms (see Analysis of experimental data), but do when considering all shared GO Slim terms. […] data set, we also consider some of the possible subsets of the three epistatic subtypes (and their corresponding null models) in our model selection procedure. In contrast to the St. Onge et al. (2007) data set, using all three epistatic subtypes results in a significant number of epistatic pairs with functional links; this measure is not significant when using any of the other subsets of the subtypes. This suggests that our proposed method with three epistatic subtypes may indeed be the most appropriate for data sets with randomly selected genes.

We examined the distribution of the estimated values of the epistatic parameter (ɛ) for all pairs with significant epistasis. Jasnos and Korona (2007), in assuming only multiplicative epistasis, conclude that epistasis is predominantly positive. 
However, we find that the estimated mean of epistasis is not significantly different from zero (two-sided t-test, P-value = 0.9578).

FIGURE: Distribution of the epistasis values (ɛ) for significant epistatic pairs in the Jasnos and Korona (2007) data set, determined using the FDR procedure and the BIC method including all three epistatic subtypes and their corresponding null models. Mean of ɛ is −0.0009, with a standard deviation of 0.3177; median value is −0.0587. A similar plot is shown in Figure 3 of Jasnos and Korona (2007). |
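
The reported t-test can be checked approximately from the summary statistics given in the figure caption. The Python sketch below computes a one-sample t statistic for the null hypothesis that the mean of ɛ is zero, then approximates the two-sided P-value with the standard normal distribution, which is a close stand-in for Student's t at this sample size. It assumes that the sample consists of the 352 significant epistatic pairs reported for this data set; the function names are hypothetical, and the rounded mean and standard deviation mean the result can only be expected to agree with the reported P-value of 0.9578 to about three decimal places.

```python
import math


def one_sample_t(mean, sd, n, mu0=0.0):
    """t statistic for H0: population mean equals mu0."""
    return (mean - mu0) / (sd / math.sqrt(n))


def two_sided_p_normal_approx(t):
    """Two-sided P-value using the standard normal in place of Student's t.

    Assumption: n is large enough that the two distributions nearly coincide.
    """
    return math.erfc(abs(t) / math.sqrt(2.0))


# Summary statistics from the figure caption; n = 352 is assumed to be
# the number of significant epistatic pairs in the data set
t = one_sample_t(mean=-0.0009, sd=0.3177, n=352)
p = two_sided_p_normal_approx(t)
print(t, p)
```

A t statistic this close to zero corresponds to a P-value near 1, consistent with the conclusion that the mean of ɛ is not distinguishable from zero.
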
| |