Genome-wide association studies (GWAS) identify susceptibility loci for complex traits, but do not identify particular genes of interest. Integration of functional and network information may help in overcoming this limitation and identifying new susceptibility loci. Using GWAS and comorbidity data, we present a network-based approach to predict candidate genes for lipid and lipoprotein traits. We apply a prediction pipeline incorporating interactome, co-expression, and comorbidity data to Global Lipids Genetics Consortium (GLGC) GWAS for four traits of interest, identifying phenotypically coherent modules. These modules provide insights regarding gene involvement in complex phenotypes with multiple susceptibility alleles and low effect sizes. To experimentally test our predictions, we selected four candidate genes and genotyped representative SNPs in the Malmö Diet and Cancer Cardiovascular Cohort. We found significant associations with LDL-C and total-cholesterol levels for a synonymous SNP (rs234706) in the cystathionine beta-synthase (CBS) gene (p = 1 × 10⁻⁵ and adjusted p = 0.013, respectively). Further, liver samples taken from 206 patients revealed that patients with the minor allele of rs234706 had significant dysregulation of CBS (p = 0.04). Despite the known biological role of CBS in lipid metabolism, SNPs within the locus have not yet been identified in GWAS of lipoprotein traits. Thus, the GWAS-based Comorbidity Module (GCM) approach identifies candidate genes missed by GWAS, serving as a broadly applicable tool for the investigation of other complex disease phenotypes.

Genome-wide association studies (GWAS)
meta-analyses have pinpointed a number of new gene regions contributing to multifactorial diseases. GWAS typically find limited numbers of loci that contribute modestly to complex phenotypes (1), and GLGC meta-analysis of GWAS data has reached the limit of what can be expected (2) without the use of alternative strategies. Given that susceptibility loci for complex traits are unlikely to be randomly distributed in the genome (3), we might expect that the genes associated with a disease will be more likely to be present within the same pathways or functional groupings. In published cases, pathway-based GWAS analysis provides an alternative approach to the dissection of complex disease traits (4, 5). In addition, nominal GWAS p values superimposed upon the human molecular network have been used to identify genes associated with multiple sclerosis (6), and the disease association protein–protein link evaluator (DAPPLE) has been used to find significant interactions among proteins encoded by genes in loci associated with other particular diseases (7). Other approaches incorporate heterogeneous molecular data such as linkage studies, cross-species conservation measures, gene expression data, and protein–protein interactions to better understand GWAS results (8, 9). Integrating molecular network information, pathway analyses, and GWAS data thus holds promise for identifying new susceptibility loci and improving the identification of relevant candidate genes.

If a gene is involved in a specific functional process or disease, its molecular network neighbors might also be suspected to have some role (3). In line with this “local” hypothesis, proteins involved in the same disease show a high propensity to interact (10) or cluster together (11) with each other. Interactions between variations in multiple genes, each with strong or modest effects, perturbing the same pathways or modules, may govern complex traits (3, 6). The molecular triangulation (MT) algorithm can be applied to rank seed genes according to their common disease-associated neighbors, assigning closer and more connected neighbors higher values (12). Interactions between modestly associated MT genes may be indicative of coherent disease pathways or of genes conferring susceptibility to disease in a coordinated manner. The jActiveModule method (13) combines seed gene scores with biologically relevant interactions to identify network modules where perturbations causative of disease are more likely to reside. Lastly, although not yet implemented at the module level, phenotypic coherence between interacting pairs of genes has been quantified using the combination of molecular-level gene-to-disease relationships and Medicare comorbidity data (14, 15).

We believe that applying the above strategies to GWAS-significant SNPs and to variants representing potential candidate genes can reveal more about the missing heritability of complex phenotypes. The most important risk factors for coronary artery disease (CAD) include serum concentrations of total cholesterol (TC), low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), and triglycerides (TG). We present a GWAS-based meta-analysis Comorbid Module (GCM) approach that uses significant (p < 5 × 10⁻⁸) GWAS signals for these four traits in the context of molecular networks to prioritize modules of disease-associated candidate genes. We evaluate our approach experimentally through allelic association and genotyping within the Malmö Diet and Cancer Cardiovascular Cohort (MDC-CC) for SNPs representing top candidate genes.
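
As an illustration only, the two ideas described above (selecting seed genes from genome-wide significant signals, then favoring network neighbors that are closer and more connected to those seeds) can be sketched in Python. This is not the published MT algorithm (12) or the jActiveModule method (13). The 1/distance score, the distance cutoff, the function names, and the toy gene labels (S1, X, and so on) are all assumptions made for the example; only the p < 5 × 10⁻⁸ threshold comes from the text.

```python
from collections import deque

GENOME_WIDE_P = 5e-8  # significance threshold quoted in the text


def seed_genes(gwas_hits, threshold=GENOME_WIDE_P):
    """Return genes having at least one signal below the threshold.

    gwas_hits is an iterable of (gene, p_value) pairs (hypothetical format).
    """
    return {gene for gene, p in gwas_hits if p < threshold}


def build_network(edges):
    """Build an undirected adjacency map from (gene_a, gene_b) pairs."""
    network = {}
    for a, b in edges:
        network.setdefault(a, set()).add(b)
        network.setdefault(b, set()).add(a)
    return network


def distances_from(network, source):
    """Breadth-first shortest-path lengths from source (unweighted graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in network.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def rank_candidates(network, seeds, max_dist=2):
    """Rank non-seed genes by proximity to seeds.

    Assumed scoring rule: every seed within max_dist steps contributes
    1/distance, so closer and more connected genes score higher.
    """
    scores = {}
    for seed in seeds:
        for gene, d in distances_from(network, seed).items():
            if gene not in seeds and 0 < d <= max_dist:
                scores[gene] = scores.get(gene, 0.0) + 1.0 / d
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Toy data with placeholder labels, not real genes or real p values.
    hits = [("S1", 1e-9), ("S2", 3e-10), ("S3", 4e-8), ("N1", 1e-4)]
    edges = [("S1", "X"), ("S2", "X"), ("S3", "X"),
             ("S1", "Y"), ("Y", "Z"), ("S3", "W")]
    seeds = seed_genes(hits)
    for gene, score in rank_candidates(build_network(edges), seeds):
        print(gene, score)
```

In this toy network, the gene linked directly to all three seeds ranks first, which mirrors the intuition behind the "local" hypothesis. A real analysis would replace the 1/distance rule with the published scoring scheme and add a significance assessment.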