On the Classification of Epistatic Interactions

Authors:

Hong Gao, Julie M. Granka, Marcus W. Feldman

Affiliation:

*Department of Genetics and †Department of Biology, Stanford University School of Medicine, Stanford, California 94305

Abstract:

Modern genomewide association studies are characterized by the problem of “missing heritability.” Epistasis, or genetic interaction, has been suggested as a possible explanation for the relatively small contribution of single significant associations to the fraction of variance explained. Of particular concern to investigators of genetic interactions is how to best represent and define epistasis. Previous studies have found that the use of different quantitative definitions for genetic interaction can lead to different conclusions when constructing genetic interaction networks and when addressing evolutionary questions. We suggest that instead, multiple representations of epistasis, or epistatic “subtypes,” may be valid within a given system. Selecting among these epistatic subtypes may provide additional insight into the biological and functional relationships among pairs of genes. In this study, we propose maximum-likelihood and model selection methods in a hypothesis-testing framework to choose epistatic subtypes that best represent functional relationships for pairs of genes on the basis of fitness data from both single and double mutants in haploid systems. We gauge the performance of our method with extensive simulations under various interaction scenarios. Our approach performs reasonably well in detecting the most likely epistatic subtype for pairs of genes, as well as in reducing bias when estimating the epistatic parameter (ɛ). We apply our approach to two available data sets from yeast (Saccharomyces cerevisiae) and demonstrate through overlap of our identified epistatic pairs with experimentally verified interactions and functional links that our results are likely of biological significance in understanding interaction mechanisms. 
We anticipate that our method will improve detection of epistatic interactions and will help to unravel the mysteries of complex biological systems.

Understanding the nature of genetic interactions is crucial to obtaining a more complete picture of complex biological systems and their evolution. The discovery of genetic interactions has been the goal of many researchers studying a number of model systems, including but not limited to Saccharomyces cerevisiae, Caenorhabditis elegans, and Escherichia coli. Recently, high-throughput experimental approaches, such as epistatic mini-array profiles (E-MAPs) and genetic interaction analysis technology for E. coli (GIANT-coli), have enabled the study of epistasis on a large scale. However, it remains unclear whether the computational and statistical methods currently in use to identify these interactions are indeed the most appropriate.

The study of genetic interaction, or “epistasis,” has had a long and somewhat convoluted history. Bateson (1909) first used the term epistasis to describe the ability of a gene at one locus to “mask” the mutational influence of a gene at another locus. The term “epistacy” was later coined by Fisher (1918) to denote the statistical deviation of multilocus genotype values from an additive linear model for the value of a phenotype.

These origins are the basis for the two main current interpretations of epistasis. The first, as introduced by Bateson (1909), is the “biological,” “physiological,” or “compositional” form of epistasis, concerned with the influence of an individual's genetic background on an allele's effect on phenotype. The second interpretation, attributed to Fisher, is “statistical” epistasis, which in its linear regression framework places the phenomenon of epistasis in the context of a population (Wade et al. 2001).
Each of these approaches is equally valid in studying genetic interactions; however, confusion still exists about how to best reconcile the methods and results of the two.

Aside from the distinction between the statistical and the physiological definitions of epistasis, inconsistencies exist when studying solely physiological epistasis. For categorical traits, physiological epistasis is clear as a “masking” effect. When noncategorical or numerical traits are measured, epistasis is defined as the deviation of the phenotype of the multiple mutant from that expected under independence of the underlying genes.

The “expectation” of the phenotype under independence, that is, in the absence of epistasis, is not defined consistently between studies. For clarity, consider epistasis between pairs of genes and, without loss of generality, consider fitness as the phenotype. The first commonly used definition of independence, originating from additivity, defines the effect of two independent mutations to be equal to the sum of the individual mutational effects. A second, motivated by the use of fitness as a phenotype, defines the effect of the two mutations as the product of the individual effects. A third definition of independence has been referred to as “minimum,” where alleles at two loci are independent if the double mutant has the same fitness as the less-fit single mutant. It has been claimed that this definition has been used when identifying pairwise epistasis by searching for synthetic lethal double mutants. A fourth is the “Log” definition.
The less frequently used “scaled ɛ” measure of epistasis takes the multiplicative definition of independence with a scaling factor.

These different definitions of independence are partly due to distinct measurement “scales.” For some traits, a multiplicative definition of independence may be necessary to identify epistasis between two genes, whereas for other traits, additivity may be appropriate (Falconer and Mackay 1995; Wade et al. 2001). An interaction found under one independence definition may not necessarily be found under another, leading to different biological conclusions. It has been suggested that there may be an “ideal” definition of independence for all gene pairs for identifying functional relationships. However, it is plausible that different representations of independence for two genes may reflect different biological properties of the relationship. “Two categories of general interest [the additive and multiplicative definitions, respectively] are those in which etiologic factors act interchangeably in the same step in a multistep process, or alternatively act at different steps in the process” (p. 468). In some cases, the discovery of epistasis may merely be an artifact of using an incorrect null model. It may be necessary to represent “independence” differently, resulting in different statistical measures of interactions, for different pairs of genes depending on their functions.

Previous studies have suggested that different pairs of loci may have different modes of interaction and have attempted to subclassify genetic interactions into regulatory hierarchies and mutually exclusive “interaction subtypes” to elucidate underlying biological properties. We suggest that epistatic relationships can be divided into several subtypes, or forms, corresponding to the aforementioned definitions of independence.
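The definitions of independence described above can be made concrete with a small sketch. It assumes that fitness is scaled so that the wild type equals 1, that the additive definition sums fitness defects (1 − w), and that ɛ is the observed double-mutant fitness minus its expectation under independence. The function names and the example fitness values are hypothetical, and the “Log” and scaled-ɛ definitions are left out because their formulas are not given here.

```python
def expected_fitness(w_x, w_y, definition):
    """Expected double-mutant fitness under independence (wild type = 1).

    Assumed forms (illustrative, not taken verbatim from the study):
      additive:       fitness defects add, so w_x + w_y - 1
      multiplicative: fitnesses multiply, so w_x * w_y
      minimum:        fitness of the less-fit single mutant
    """
    if definition == "additive":
        return w_x + w_y - 1.0
    if definition == "multiplicative":
        return w_x * w_y
    if definition == "minimum":
        return min(w_x, w_y)
    raise ValueError(f"unknown definition: {definition}")


def epsilon(w_xy, w_x, w_y, definition):
    """Deviation of the observed double mutant from the chosen expectation."""
    return w_xy - expected_fitness(w_x, w_y, definition)


# Hypothetical fitness values for one gene pair.
w_x, w_y, w_xy = 0.9, 0.8, 0.75
for name in ("additive", "multiplicative", "minimum"):
    print(f"{name}: {epsilon(w_xy, w_x, w_y, name):+.2f}")
```

With these made-up values the same gene pair shows positive ɛ under the additive and multiplicative definitions and negative ɛ under the minimum definition, which illustrates how the choice of null model can change the sign of the inferred interaction.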
As a particular gene pair may deviate from independence according to several criteria, we do not claim that these subtypes are necessarily mutually exclusive. We attempt to select the most likely epistatic subtype that is the best statistical representation of the relationship between two genes. To further subclassify interactions, epistasis among deleterious mutations can take one of two commonly used forms: positive (equivalently alleviating, antagonistic, or buffering) epistasis, where the phenotype of the double mutant is less severe than expected under independence, and negative (equivalently aggravating, synergistic, or synthetic) epistasis, where the phenotype is more severe than expected.

Another objective of such distinctions is to reduce the bias of the estimator of the epistatic parameter (ɛ), which measures the extent and direction of epistasis for a given gene pair. Under the assumption that the overall distribution of ɛ should be centered around 0, inaccurately choosing a definition of independence has been found to result in increased bias when estimating ɛ. For example, using the minimum definition results in the most severe bias when single mutants have moderate fitness effects, and the additive definition results in the largest positive bias when at least one gene has an extreme fitness defect. Therefore, it is important to select an optimal estimator for ɛ for each pair of genes from among the subtypes of epistatic interactions.

Epistasis may be important to consider in genomic association studies, as a gene with a weak main effect may be identified only through its interaction with another gene or other genes. Epistasis has also been studied extensively in the context of the evolution of sex and recombination. The mutational deterministic hypothesis proposes that the evolution of sex and recombination would be favored by negative epistatic interactions; many other studies have also examined the importance of the form of epistasis.
Indeed, “the choice of definition [of epistasis] alters conclusions relevant to the adaptive value of sex and recombination” (p. 3466).

Given fitness data from single and double mutants in haploid organisms, we implement a likelihood method to determine the subtype that is the best statistical representation of the epistatic interaction for pairs of genes. We use maximum-likelihood estimation and the Bayesian information criterion (BIC) (Schwarz 1978) with a likelihood-ratio test to select the most appropriate null or epistatic model for each putative interaction. We conduct extensive simulations to gauge the performance of our method and demonstrate that it performs reasonably well under various interaction scenarios. We apply our method to two data sets with fitness measurements obtained from yeast (St. Onge et al. 2007; Jasnos and Korona 2007), whose authors assume only multiplicative epistasis for all interactions. By examining functional links and experimentally validated interactions among epistatic pairs, we demonstrate that our results are biologically meaningful. Studying a random selection of genes, we find that minimum epistasis is more prevalent than both additive and multiplicative epistasis and that the mean of the overall distribution of ɛ is not significantly different from zero (as previously suggested). For genes in a particular pathway, we advise selecting among fewer epistatic subtypes. We believe that our method of epistatic subtype classification will aid in understanding genetic interactions and their properties.
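The model-selection idea in the preceding paragraph can be illustrated with a deliberately simplified sketch. It compares a single null model (ɛ = 0) with its epistatic counterpart (ɛ free) for one gene pair. The assumptions are ours rather than the study's: replicate double-mutant fitnesses have Gaussian errors, the expected fitness under independence is treated as a known constant, and the likelihood-ratio statistic is referred to a chi-square distribution with 1 d.f. All function names and data values are hypothetical.

```python
import math


def gaussian_max_loglik(residuals):
    """Gaussian log-likelihood evaluated at the ML variance of the residuals."""
    n = len(residuals)
    sigma2 = sum(r * r for r in residuals) / n  # assumes residuals are not all zero
    return -0.5 * n * (math.log(2.0 * math.pi * sigma2) + 1.0)


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln(L); smaller is better."""
    return n_params * math.log(n_obs) - 2.0 * loglik


def compare_null_and_epistatic(w_xy_replicates, expected):
    """Fit the null (mean = expected) and epistatic (mean = expected + eps) models."""
    n = len(w_xy_replicates)
    eps_hat = sum(w_xy_replicates) / n - expected  # ML estimate of epsilon
    ll_null = gaussian_max_loglik([w - expected for w in w_xy_replicates])
    ll_epi = gaussian_max_loglik([w - expected - eps_hat for w in w_xy_replicates])
    lrt = max(2.0 * (ll_epi - ll_null), 0.0)
    # Survival function of a chi-square variable with 1 d.f.: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(lrt / 2.0))
    return {
        "epsilon_hat": eps_hat,
        "bic_null": bic(ll_null, 1, n),       # free parameter: variance
        "bic_epistatic": bic(ll_epi, 2, n),   # free parameters: epsilon, variance
        "lrt": lrt,
        "p_value": p_value,
    }


# Hypothetical replicates; expectation 0.72 = 0.9 * 0.8 (multiplicative null).
result = compare_null_and_epistatic([0.60, 0.62, 0.58, 0.61, 0.59], 0.72)
print(result)
```

In this toy case the epistatic model has the lower BIC and the likelihood-ratio test rejects independence. Choosing among the additive, multiplicative, and minimum subtypes would require fitting the corresponding null and epistatic models jointly to the single- and double-mutant data, which this sketch does not attempt.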

St. Onge et al. (2007) data set:

St. Onge et al. (2007) examined 26 nonessential genes known to confer resistance to MMS, constructed 323 double-deletion strains (all but two of the total possible pairs), and assumed the multiplicative form of epistasis for all interactions (see Methods: Analysis of experimental data). Following these authors, we focus on single- and double-mutant fitnesses measured in the presence of MMS. (For results in the absence of MMS, see File S1 and File S1_2.)

Using the resampling method described in Analysis of experimental data and File S1, 222 gene pairs pass the cutoff of having epistasis inferred in at least 900 of 1000 replicates. This does not include 5 synthetic lethal gene pairs. Hypothesis testing and a multiple-testing procedure (for 222 simultaneous hypotheses) are necessary to determine the final epistatic pairs.

To select one among the three multiple-testing procedures, we examine gene pairs that share specific functional links (see Analysis of experimental data). The Bonferroni method is likely too conservative, yielding only 25 significantly epistatic pairs with only one functional link among them; in contrast, the pFDR procedure appears to be too lenient, rejecting independence for all 222 pairs. Therefore, we use the FDR procedure (although the number of functional links is not significant) and detect 193 epistatic pairs, of which 5 (2.6%) are synthetic lethals, 19 (9.8%) have additive epistasis, 33 (17.1%) have multiplicative epistasis, and 136 (70.5%) have minimum epistasis (File S1_1). We find 29 gene pairs with positive (alleviating) epistasis and 159 gene pairs with negative (aggravating) epistasis.
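The contrast between a conservative and a more permissive multiple-testing correction can be sketched as follows. This is a minimal illustration that assumes the FDR procedure is the Benjamini-Hochberg step-up procedure, which the text above does not state; the pFDR procedure is not shown, and the P-values are made up.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject hypothesis i when p_i <= alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Step-up FDR control: reject the k smallest P-values, where k is the
    largest rank with p_(k) <= (k / m) * alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            k_max = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            rejected[i] = True
    return rejected


# Hypothetical P-values for ten gene pairs.
p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
print(sum(bonferroni(p_vals)), "rejections with Bonferroni")
print(sum(benjamini_hochberg(p_vals)), "rejections with Benjamini-Hochberg")
```

Because the step-up threshold grows with rank, the FDR procedure rejects at least as many hypotheses as Bonferroni at the same nominal level, in line with the ordering of the procedures described above.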

TABLE 2

Summary of gene pairs with the indicated epistatic subtypes, inferred using the FDR procedure with the BIC method that considers all three epistatic subtypes and their corresponding null models

Epistatic subtype	Study S	Study J
All	193 (100%)	352 (100%)
  Mean ɛ	−0.060	−0.001
  Median ɛ	−0.096	−0.059
Additive	19 (9.8%)	35 (9.9%)
  Mean ɛ	0.115*	0.193***
  Median ɛ	0.131	0.188
Multiplicative	33 (17.1%)	63 (17.9%)
  Mean ɛ	0.048	0.017
  Median ɛ	−0.166	−0.115
Minimum	136 (70.5%)	254 (72.2%)
  Mean ɛ	−0.111***	−0.032**
  Median ɛ	−0.091	−0.065

Numbers are the counts of each type, and percentages are given of the total number of epistatic pairs. The mean and median of the epistatic parameter (ɛ) are given for each subtype, with “*” indicating that the mean of ɛ is significantly different from 0 (*, P-value ≤0.05; **, P-value ≤0.01; ***, P-value ≤0.001). Study S refers to the St. Onge et al. (2007) data set, and study J refers to the Jasnos and Korona (2007) data set. (For study S, five of the epistatic pairs are synthetic lethals and are not shown; as a result, percentages do not sum to 100%.)

To further validate the use of our method and the FDR procedure, we assess by Fisher's exact test the significance of an enrichment of both Biological Process and all GO Slim term links among epistatic pairs, neither of which is significant (www.yeastgenome.org; Stark et al. 2006; Table S4). Although some of the previously unidentified interactions that we identify could be false positives, many are likely to be new discoveries.
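An enrichment test of this kind can be sketched with a hypergeometric tail probability. The sketch assumes a one-sided Fisher's exact test (the text does not say whether the reported P-values are one- or two-sided), and the function name is hypothetical. The example uses the counts reported for study J in Table 3: 636 tested pairs, 25 of which have a functional link, and 352 epistatic pairs, 19 of which have a functional link.

```python
from math import comb


def fisher_enrichment_p(n_total, n_linked, n_selected, n_selected_linked):
    """One-sided Fisher's exact test for enrichment.

    Returns P(X >= n_selected_linked), where X is the number of linked pairs
    among n_selected pairs drawn without replacement from n_total pairs,
    n_linked of which are linked (hypergeometric distribution).
    """
    upper = min(n_linked, n_selected)
    denom = comb(n_total, n_selected)
    tail = sum(
        comb(n_linked, k) * comb(n_total - n_linked, n_selected - k)
        for k in range(n_selected_linked, upper + 1)
    )
    return tail / denom


# Counts taken from the "Functional links" row for study J in Table 3.
p_study_j = fisher_enrichment_p(636, 25, 352, 19)
print(f"one-sided P = {p_study_j:.4f}")
```

Exact integer arithmetic with `math.comb` avoids rounding problems in the binomial coefficients; the final division is the only floating-point step.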

TABLE 3

Comparison of validation measures for each data set for different variations of the FDR and BIC procedures, considering only a subset of epistatic subtypes with their corresponding null models: all epistatic subtypes (A, P, and M); only the additive and multiplicative subtypes (A and P); and only the additive (A), only the multiplicative (P), or only the minimum (M) subtype (see text for details)

	Subtypes considered in BIC procedure
	A, P, M	A, P	A	P	M
Study J
No. found (636)	352	273	263	231	329
Functional links (25)	19 (0.0255)*	13 (0.2320)	11 (0.4689)	10 (0.4227)	15 (0.2619)
GO Slim terms (Biological Process) (115)	69 (0.1573)	50 (0.4874)	55 (0.0736)	44 (0.3534)	68 (0.04902)*
GO Slim terms (all) (369)	224 (0.0009)*	172 (0.01654)*	160 (0.1297)	146 (0.0273)*	213 (0.0003)*
Experimentally identified (3)	3	2	1	2	3
Study S
No. found (323)	193	192	247	171	243
Functional links (36)	21 (0.6450)	29 (0.0041)*	34 (0.0031)*	29 (0.0003)*	24 (0.9256)
GO Slim terms (Biological Process) (283)	174 (0.0657)	174 (0.03656)*	223 (0.0010)*	153 (0.1825)	213 (0.5534)
GO Slim terms (all) (307)	185 (0.2866)	182 (0.6926)	237 (0.1472)	162 (0.6997)	231 (0.5908)
Experimentally identified (29)	17	22	24	23	21

Numbers in parentheses after counts indicate P-values by Fisher's exact test. “*” indicates significance. Study J refers to the Jasnos and Korona (2007) data set, and study S refers to the St. Onge et al. (2007) data set measured in the presence of MMS. Numbers in parentheses in the row labels indicate the total number of tested pairs and the total number of each type of link found in each complete data set.

The epistatic subtypes we consider are not necessarily mutually exclusive. To more fully assess the assumptions of our method, we also consider several of the possible subsets of the epistatic subtypes (and their corresponding null models) in our procedure. As the minimum epistatic subtype was the most frequently selected in this data set, we first do not include the minimum null model or the minimum epistatic model in our procedure (i.e., we select from among four rather than six models for a pair; Table S4). However, there are a significant number of epistatic pairs with functional links only when the minimum epistatic subtype is not included (also see Table S4 and Table S5). It is not immediately clear which epistatic subtypes are the most appropriate for these data, although including the minimum subtype may not be appropriate (see discussion).

Although it may be best to consider fewer epistatic subtypes for this specific data set, we report our results including all three epistatic subtypes and their corresponding null models. Relative to St. Onge et al. (2007), we identify 105 epistatic pairs not identified by the original authors (Figure S4, Table S4). The original authors find that epistatic pairs with a functional link have a positively shifted distribution of epistasis. We find no such shift in epistasis values (Figure S5). We also demonstrate [described in application to simulated data: Bias and variance of the epistatic parameter (ɛ)] that our method seems to reduce bias of the epistatic parameter (ɛ) (Table S3).
When considering only a subset of the epistatic subtypes, however, we find the mean of ɛ to be positive and significantly different from zero (results not shown). See File S1, Figure S6, and Figure S7 for additional discussion of the epistatic pairs we identify.

Jasnos and Korona (2007) data set:

The Jasnos and Korona (2007) data set includes 758 yeast gene deletions known to cause growth defects and reports fitnesses of only a sparse subset of all possible gene pairs (≈0.2% of the possible pairwise genotypes, or 639 pairs). Because the authors do not identify epistatic pairs in a hypothesis-testing framework, we cannot explicitly compare our conclusions with theirs.

To validate our method, we examine gene pairs that have specific functional links (see Methods: Analysis of experimental data). When defining a functional link using GO terms with <30 genes associated with them, only 1 of 639 tested gene pairs has a functional link. Raising the threshold of associated genes to 50 and 100, the number of tested pairs with functional links rises only to 3 and 9, respectively. Because of the large number of random genes and the sparse number of gene pairs in this data set, we select GO terms that have ≤200 genes associated with them. Twenty-five of 639 tested pairs then have a functional link.

Only the FDR multiple-testing procedure results in a significant enrichment of functional links among epistatic pairs (File S1). With the FDR procedure we find 352 significant epistatic pairs, of which 35 (9.9%) have additive epistasis, 63 (17.9%) have multiplicative epistasis, and 254 (72.2%) have minimum epistasis (File S1_3). These proportions of inferred subtypes suggest that the authors' original restriction to multiplicative epistasis may be inappropriate. We find 141 gene pairs with positive epistasis and 211 gene pairs with negative epistasis.

We do not find a significant number of epistatic pairs with shared GO Slim Biological Process terms (see Analysis of experimental data), but do when considering all shared GO Slim terms (Table 3). As with the St. Onge et al. (2007) data set, we also consider some of the possible subsets of the three epistatic subtypes (and their corresponding null models) in our model selection procedure (Table S5).
In contrast to the St. Onge et al. (2007) data set, using all three epistatic subtypes results in a significant number of epistatic pairs with functional links; this measure is not significant when using any of the other subsets of the subtypes. This suggests that our proposed method with three epistatic subtypes may indeed be the most appropriate for data sets with randomly selected genes.

We examined the distribution of the estimated values of the epistatic parameter (ɛ) for all pairs with significant epistasis. Jasnos and Korona (2007), in assuming only multiplicative epistasis, conclude that epistasis is predominantly positive. However, we find that the estimated mean of epistasis is not significantly different from zero (two-sided t-test, P-value = 0.9578; Figure 1 and File S1).

Figure 1. Distribution of the epistasis values (ɛ) for significant epistatic pairs in the Jasnos and Korona (2007) data set, determined using the FDR procedure and the BIC method including all three epistatic subtypes and their corresponding null models. Mean of ɛ is −0.0009, with a standard deviation of 0.3177; median value is −0.0587. A similar plot is shown in Figure 3 of Jasnos and Korona (2007).
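The test of whether the mean of ɛ differs from zero can be approximately checked from the summary statistics given in the Figure 1 caption. The sketch below assumes that the reported standard deviation is the sample standard deviation over the 352 significant pairs and approximates the t distribution (351 d.f.) by the standard normal; the function names are hypothetical.

```python
import math


def one_sample_t(mean, sd, n, mu0=0.0):
    """t statistic for H0: population mean equals mu0."""
    return (mean - mu0) / (sd / math.sqrt(n))


def two_sided_p_normal(t_stat):
    """Two-sided P-value under a standard normal approximation to t."""
    return math.erfc(abs(t_stat) / math.sqrt(2.0))


# Summary statistics from the Figure 1 caption; n = 352 epistatic pairs.
t_stat = one_sample_t(mean=-0.0009, sd=0.3177, n=352)
p_approx = two_sided_p_normal(t_stat)
print(f"t = {t_stat:.4f}, approximate two-sided P = {p_approx:.4f}")
```

The approximate P-value is close to the reported 0.9578; small differences are expected because the mean is rounded in the caption and the normal distribution is used in place of the t distribution.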
