Similar articles
 20 similar articles found
1.

Background

Genome-wide association studies (GWAS) have become a common approach to identifying single nucleotide polymorphisms (SNPs) associated with complex diseases. Because complex diseases are caused by the joint effects of multiple genes, while the effect of any individual gene or SNP is modest, a method that considers the joint effects of multiple SNPs can be more powerful than testing individual SNPs. Multi-SNP analysis tests association for a SNP set, usually defined from biological knowledge such as a gene or pathway, in which only a portion of the SNPs may affect the disease. A challenge for multi-SNP analysis is therefore how to effectively select a subset of SNPs with promising association signals from the SNP set.

Results

We developed the Optimal P-value Threshold Pedigree Disequilibrium Test (OPTPDT). The OPTPDT uses general nuclear families. A variable p-value threshold algorithm is used to determine an optimal p-value threshold for selecting a subset of SNPs. A permutation procedure is used to assess the significance of the test. We used simulations to verify that the OPTPDT has correct type I error rates. Our power studies showed that the OPTPDT can be more powerful than the set-based test in PLINK, the multi-SNP FBAT test, and the p-value based test GATES. We applied the OPTPDT to a family-based autism GWAS dataset for gene-based association analysis and identified MACROD2-AS1 with genome-wide significance (p-value = 2.5 × 10⁻⁶).
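The variable p-value threshold idea with permutation-based significance can be sketched as follows. This is a minimal illustration, not the OPTPDT itself: the set statistic (mean of −log10 p over the selected SNPs), the function names, and the inputs are all assumptions made for the example, whereas the published method builds its statistic from the pedigree disequilibrium test in nuclear families.

```python
import math


def set_statistic(pvals, threshold):
    """Mean of -log10(p) over the SNPs whose p-value passes the threshold."""
    selected = [-math.log10(p) for p in pvals if p <= threshold]
    return sum(selected) / len(selected) if selected else 0.0


def variable_threshold_test(observed, permuted, thresholds):
    """Return (overall p-value, best threshold) for one SNP set.

    observed   -- per-SNP p-values from the real data
    permuted   -- list of per-SNP p-value lists, one per permuted data set
    thresholds -- candidate p-value thresholds to try
    (All three inputs are hypothetical; any per-SNP test could produce them.)
    """
    datasets = [observed] + list(permuted)  # index 0 is the real data
    n = len(datasets)
    stats = [[set_statistic(d, t) for t in thresholds] for d in datasets]

    def empirical_p(i, k):
        # Fraction of data sets whose statistic is at least as large as data set i's.
        return sum(1 for j in range(n) if stats[j][k] >= stats[i][k]) / n

    # Smallest per-threshold p-value for every data set (real and permuted).
    min_p = [min(empirical_p(i, k) for k in range(len(thresholds))) for i in range(n)]
    best_k = min(range(len(thresholds)), key=lambda k: empirical_p(0, k))
    # Second level: compare the real minimum p-value with the permuted minima,
    # which accounts for having searched over several thresholds.
    overall = sum(1 for j in range(n) if min_p[j] <= min_p[0]) / n
    return overall, thresholds[best_k]
```

Reusing the same permutations at both levels keeps the cost at one set of permuted data sets while still correcting for the threshold search.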

Conclusions

Our simulation results suggested that the OPTPDT is a valid and powerful test. The OPTPDT will be helpful for gene-based or pathway association analysis. The method is ideal for the secondary analysis of existing GWAS datasets, which may identify a set of SNPs with joint effects on the disease.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1620-3) contains supplementary material, which is available to authorized users.

2.
Li C  Han J  Shang D  Li J  Wang Y  Wang Y  Zhang Y  Yao Q  Zhang C  Li K  Li X 《Gene》2012,503(1):101-109
Most methods for genome-wide association studies (GWAS) focus on discovering a single genetic variant, but the pathogenesis of complex diseases is thought to arise from the joint effect of multiple genetic variants. Information about pathway structure, such as the interactions and distances between gene products within pathways, can help us learn more about the functions and joint effect of genes associated with disease risk. We developed a novel sub-pathway based approach to study the joint effect of multiple genetic variants that are modestly associated with disease. The approach prioritized sub-pathways based on the significance values of single nucleotide polymorphisms (SNPs) and the interactions and distances between gene products within pathways. We applied the method to seven complex diseases. The results showed that our method can efficiently identify statistically significant sub-pathways associated with the pathogenesis of complex diseases. The approach identified sub-pathways that may inform the interpretation of GWAS data.

3.
Torkamani A  Topol EJ  Schork NJ 《Genomics》2008,92(5):265-272
Recent genome-wide association studies (GWAS) have identified DNA sequence variations that exhibit unequivocal statistical associations with many common chronic diseases. However, the vast majority of these studies identified variations that explain only a very small fraction of disease burden in the population at large, suggesting that other factors, such as multiple rare or low-penetrance variations and interacting environmental factors, are major contributors to disease susceptibility. Identifying multiple low-penetrance variations (or "polygenes") contributing to disease susceptibility will be difficult. We present a pathway analysis approach to characterizing the likely polygenic basis of seven common diseases using the Wellcome Trust Case Control Consortium (WTCCC) GWAS results. We identify numerous pathways implicated in disease predisposition that would not have been revealed using standard single-locus GWAS statistical analysis criteria. Many of these pathways have long been assumed to contain polymorphic genes that lead to disease predisposition. Additionally, we analyze the genetic relationships between the seven diseases and, based upon similarities in the associated genes and pathways affected in each, propose a new way of categorizing the diseases.

4.
5.
A Nazarian  H Sichtig  A Riva 《PloS one》2012,7(9):e44162
Complex disorders are a class of diseases whose phenotypic variance is caused by the interplay of multiple genetic and environmental factors. Analyzing the complexity underlying the genetic architecture of such traits may help develop more efficient diagnostic tests and therapeutic protocols. Despite continuous advances in revealing the genetic basis of many complex diseases using genome-wide association studies (GWAS), a major proportion of their genetic variance has remained unexplained, in part because GWAS are unable to reliably detect small individual risk contributions and to capture the underlying genetic heterogeneity. In this paper we describe a hypothesis-based method to analyze the association between multiple genetic factors and a complex phenotype. Starting from sets of markers selected based on preexisting biomedical knowledge, our method generates multi-marker models relevant to the biological process underlying a complex trait for which genotype data are available. We tested the applicability of our method using the WTCCC case-control dataset. Analyzing a number of biological pathways, the method was able to identify several immune system related multi-SNP models significantly associated with Rheumatoid Arthritis (RA) and Crohn's disease (CD). RA-associated multi-SNP models were also replicated in an independent case-control dataset. The method we present provides a framework for capturing joint contributions of genetic factors to complex traits. In contrast to hypothesis-free approaches, its results can be given a direct biological interpretation. The replicated multi-SNP models generated by our analysis may serve as a predictor to estimate the risk of RA development in individuals of Caucasian ancestry.

6.
Standard approaches to data analysis in genome-wide association studies (GWAS) ignore any potential functional relationships between gene variants. In contrast, gene pathways analysis uses prior information on functional structure within the genome to identify pathways associated with a trait of interest. In a second step, important single nucleotide polymorphisms (SNPs) or genes may be identified within associated pathways. The pathways approach is motivated by the fact that genes do not act alone, but instead have effects that are likely to be mediated through their interaction in gene pathways. Where this is the case, pathways approaches may reveal aspects of a trait's genetic architecture that would otherwise be missed when considering SNPs in isolation. Most pathways methods begin by testing SNPs one at a time, and so fail to capitalise on the potential advantages inherent in a multi-SNP, joint modelling approach. Here, we describe a dual-level, sparse regression model for the simultaneous identification of pathways and genes associated with a quantitative trait. Our method takes account of various factors specific to the joint modelling of pathways with genome-wide data, including widespread correlation between genetic predictors, and the fact that variants may overlap multiple pathways. We use a resampling strategy that exploits finite sample variability to provide robust rankings for pathways and genes. We test our method through simulation, and use it to perform pathways-driven gene selection in a search for pathways and genes associated with variation in serum high-density lipoprotein cholesterol levels in two separate GWAS cohorts of Asian adults. By comparing results from both cohorts we identify a number of candidate pathways including those associated with cardiomyopathy, and T cell receptor and PPAR signalling.
Highlighted genes include those associated with the L-type calcium channel, adenylate cyclase, integrin, laminin, MAPK signalling and immune function.

7.
The aim of this study was to identify candidate causal single nucleotide polymorphisms (SNPs) and candidate causal mechanisms of psoriasis and Behçet's disease (BD) and to generate an SNP → gene → pathway hypothesis. A psoriasis genome-wide association study (GWAS) dataset that included 436,192 SNPs in 1,409 psoriasis cases and 1,436 controls of European descent and a BD GWAS dataset that contained 310,324 SNPs in 1,215 BD cases and 1,278 controls were used in this study. Identify candidate causal SNPs and pathways (ICSNPathway) analysis was applied to the GWAS datasets. ICSNPathway analysis identified 15 candidate causal SNPs and 28 candidate causal pathways. The top five candidate causal SNPs were rs1063478 (P = 1.45E−10), rs8084 (P = 2.20E−08), rs7192 (P = 5.18E−08), rs20541 (P = 5.30E−06), and rs1130838 (P = 5.65E−06), which, with the exception of rs20541 [interleukin (IL)-13], are at human leukocyte antigen (HLA) loci. These candidate causal SNPs and pathways provided ten hypothetical biological mechanisms. The most strongly associated pathway concerned HLA. When HLA loci were excluded, ICSNPathway analysis provided one hypothetical biological mechanism: rs20541 (non_synonymous_coding) → IL-13 → dendritic cell involvement in the regulation of Th1 and Th2 development, and the GATA3 pathway. ICSNPathway analysis identified four candidate causal SNPs, eleven candidate causal pathways, and three hypothetical biological mechanisms. One of them was as follows: rs2072895 (non_synonymous_coding & splice-site) and rs2735059 (non_synonymous_coding) → HLA-F → type I diabetes mellitus, antigen processing and presentation, and autoimmune thyroid disease. The application of ICSNPathway analysis to the GWAS datasets of psoriasis and BD resulted in the identification of candidate causal SNPs and candidate pathways that might contribute to psoriasis susceptibility.

8.
The aim of this study was to identify the candidate causal single nucleotide polymorphisms (SNPs) and candidate causal mechanisms that contribute to bone mineral density (BMD) and to generate a SNP to gene to pathway hypothesis using an analytical pathway-based approach. We used hip BMD GWAS data of the genotypes of 301,019 SNPs in 5,715 Europeans. ICSNPathway (identify candidate causal SNPs and pathways) analysis was applied to the BMD GWAS dataset. The first stage involved the pre-selection of candidate causal SNPs by linkage disequilibrium analysis and the functional SNP annotation of the most significant SNPs found. The second stage involved the annotation of biological mechanisms for the pre-selected candidate causal SNPs using improved gene set enrichment analysis. ICSNPathway analysis identified seven candidate SNPs, eight candidate pathways, and seven hypothetical biological mechanisms. The eight pathways were as follows: gamma-hexachlorocyclohexane degradation (nominal p-value < 0.001, false discovery rate (FDR) < 0.001), regulation of the smoothened signaling pathway (nominal p-value < 0.001, FDR = 0.016), TACI and BCMA stimulation of B cell immune response (nominal p-value < 0.001, FDR = 0.021), endonuclease activity (nominal p-value = 0.001, FDR = 0.026), regulation of defense response to virus (nominal p-value = 0.001, FDR = 0.028), serine_type_endopeptidase_inhibitor_activity (nominal p-value = 0.001, FDR = 0.044), endoribonuclease activity (nominal p-value = 0.002, FDR = 0.045), and myeloid leukocyte differentiation (nominal p-value = 0.001, FDR = 0.050). The most significant causal pathway was gamma-hexachlorocyclohexane degradation. CYP3A5, PON2, PON3, CMBL, PON1, ALPL, CYP3A43, CYP3A7, ACP6, ACPP, and ALPI (p < 0.05) are involved in the pathway of gamma-hexachlorocyclohexane degradation. Further examination of the gene contents revealed that DBR1, DICER1, EXO1, FEN1, POP1, POP4, RPP30, and RPP38 were involved in 2 of the 8 pathways (p < 0.05).
By applying ICSNPathway analysis to BMD GWAS data, we identified seven candidate SNPs and eight pathways, including gamma-hexachlorocyclohexane degradation, which may contribute to low BMD.

9.
Chung RH  Chen YE 《PloS one》2012,7(5):e36662
Pathway analysis provides a powerful approach for identifying the joint effect of genes grouped into biologically-based pathways on disease. Pathway analysis is also an attractive approach for a secondary analysis of genome-wide association study (GWAS) data that may still yield new results from these valuable datasets. Most current pathway analysis methods focus on testing the cumulative main effects of genes in a pathway. However, for complex diseases, gene-gene interactions are expected to play a critical role in disease etiology. We extended a random forest-based method for pathway analysis by incorporating a two-stage design. We used simulations to verify that the proposed method has the correct type I error rates. We also used simulations to show that the method is more powerful than the original random forest-based pathway approach and the set-based test implemented in PLINK in the presence of gene-gene interactions. Finally, we applied the method to a breast cancer GWAS dataset and a lung cancer GWAS dataset and identified interesting pathways that have implications for breast and lung cancers.

10.
Luo Xuhong  Liu Zhifang  Dong Changzheng 《遗传》(Hereditas (Beijing)) 2013,35(9):1065-1071
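Genome-wide association studies (GWAS) have been widely applied in medical genetics research in China and abroad, but the rich information in GWAS data about the mechanisms of polygenic complex-trait diseases has not yet been mined in depth. In recent years, researchers have analyzed GWAS data with bioinformatic and biostatistical tools such as biological network analysis and biological pathway analysis to explore potential disease mechanisms. Biological network and pathway analyses are mainly carried out with the gene as the unit, so the genetic association results of all or some of the individual single nucleotide polymorphisms (SNPs) in a gene must be combined before the analysis; this step is gene-level association analysis. Gene-level association analysis must take into account several factors, including the genetic association of individual SNPs, the number of SNPs in the gene, and the linkage disequilibrium structure among SNPs. It is therefore complex and challenging both in its genetic concepts and in its statistical methods. This article reviews the research progress, principles, and applications of gene-level association analysis.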

11.
Genome-wide association studies (GWAS) with hundreds of thousands of single nucleotide polymorphisms (SNPs) are popular strategies to reveal the genetic basis of human complex diseases. Despite many successes of GWAS, it is well recognized that new analytical approaches have to be integrated to achieve their full potential. Starting with a list of SNPs found to be associated with disease in GWAS, here we propose a novel methodology to devise functionally important KEGG pathways through the identification of genes within these pathways, where these genes are obtained from SNP analysis. Our methodology is based on functionalization of important SNPs to identify affected genes and disease related pathways. We have tested our methodology on the WTCCC Rheumatoid Arthritis (RA) dataset and identified: i) previously known RA related KEGG pathways (e.g., Toll-like receptor signaling, Jak-STAT signaling, Antigen processing, Leukocyte transendothelial migration and MAPK signaling pathways); ii) additional KEGG pathways (e.g., Pathways in cancer, Neurotrophin signaling, Chemokine signaling pathways) as associated with RA. Furthermore, these newly found pathways included genes which are targets of RA-specific drugs. Even though GWAS analysis identifies 14 out of 83 of those drug target genes, the newly found functionally important KEGG pathways led to the discovery of 25 out of 83 genes known to be used as drug targets for the treatment of RA. Among the previously known pathways, we identified additional genes associated with RA (e.g. Antigen processing and presentation, Tight junction). Importantly, within these pathways, the associations between some of these additionally found genes, such as HLA-C, HLA-G, PRKCQ, PRKCZ, TAP1, TAP2, and RA were verified by either the OMIM database or by literature retrieved from the NCBI PubMed module.
With whole-genome sequencing on the horizon, we show that the full potential of GWAS can be achieved by integrating pathway and network-oriented analysis and prior knowledge from functional properties of a SNP.

12.
Liu LY  Schaub MA  Sirota M  Butte AJ 《Human genetics》2012,131(3):353-364
Men and women differ in susceptibility to many diseases and in responses to treatment. Recent advances in genome-wide association studies (GWAS) provide a wealth of data for associating genetic profiles with disease risk; however, in general, these data have not been systematically probed for sex differences in gene-disease associations. Incorporating sex into the analysis of GWAS results can elucidate new relationships between single nucleotide polymorphisms (SNPs) and human disease. In this study, we performed a sex-differentiated analysis on significant SNPs from GWAS data of the seven common diseases studied by the Wellcome Trust Case Control Consortium. We employed and compared three methods: logistic regression, Woolf’s test of heterogeneity, and a novel statistical metric that we developed called permutation method to assess sex effects (PMASE). After correction for false discovery, PMASE finds SNPs that are significantly associated with disease in only one sex. These sexually dimorphic SNP-disease associations occur in Coronary Artery Disease and Crohn’s Disease. GWAS analyses that fail to consider sex-specific effects may miss discovering sexual dimorphism in SNP-disease associations that give new insights into differences in disease mechanism between men and women.
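Of the three methods named above, Woolf's test of heterogeneity is simple enough to sketch. The version below assumes one 2x2 allele-by-status table per sex, with hypothetical counts supplied by the caller; it is a textbook form of the test, not the authors' code, and it does not cover PMASE.

```python
import math


def woolf_heterogeneity(tables):
    """Woolf's test for heterogeneity of odds ratios across two strata.

    Each table is (a, b, c, d): exposed cases, exposed controls,
    unexposed cases, unexposed controls. All counts must be positive.
    Returns (chi-square statistic, p-value) with 1 degree of freedom.
    """
    if len(tables) != 2:
        raise ValueError("this sketch handles exactly two strata (e.g. men, women)")
    log_ors, weights = [], []
    for a, b, c, d in tables:
        log_ors.append(math.log((a * d) / (b * c)))
        weights.append(1.0 / (1 / a + 1 / b + 1 / c + 1 / d))  # inverse variance
    pooled = sum(w * l for w, l in zip(weights, log_ors)) / sum(weights)
    stat = sum(w * (l - pooled) ** 2 for w, l in zip(weights, log_ors))
    p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-square(1) upper tail
    return stat, p_value
```

A small p-value suggests that the odds ratio differs between the two strata, which is the pattern a sex-differentiated analysis looks for.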

13.
Six genome-wide association studies (GWAS) and several gene expression studies have been conducted for Parkinson's disease (PD). However, only variants in MAPT and SNCA have been consistently replicated. To improve the utility of these approaches, we applied pathway analyses integrating both GWAS and gene expression. The top 5000 SNPs (p<0.01) from a joint analysis of three existing PD GWAS were identified and each assigned to a gene. For gene expression, rather than the traditional comparison of one anatomical region between sets of patients and controls, we identified differentially expressed genes between adjacent Braak regions in each individual and adjusted using average control expression profiles. Over-represented pathways were calculated using a hypergeometric statistical comparison. An integrated, systems meta-analysis of the over-represented pathways combined the expression and GWAS results using Fisher's combined probability test. Four of the top seven pathways from each approach were identical. The top three pathways in the meta-analysis, with their corrected p-values, were axonal guidance (p = 2.8E-07), focal adhesion (p = 7.7E-06) and calcium signaling (p = 2.9E-05). These results suggest that a systems biology (pathway) approach will provide additional insight into the genetic etiology of PD and that these pathways have both biological and statistical support for being important in PD.
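The two statistical steps named in this abstract, a hypergeometric over-representation test and Fisher's combined probability test, can be sketched with the Python standard library. The function names and argument layout are assumptions made for illustration; the abstract does not describe the authors' implementation or the multiple-testing correction they applied.

```python
import math


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) when `draws` genes are sampled without replacement from
    `population` genes, of which `successes` belong to the pathway."""
    total = math.comb(population, draws)
    upper = min(successes, draws)
    return sum(
        math.comb(successes, i) * math.comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    ) / total


def fisher_combined(pvalues):
    """Fisher's method: X = -2 * sum(ln p) follows chi-square with 2k df
    when the k p-values are independent."""
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # X / 2
    # Closed-form chi-square upper tail for even degrees of freedom (2k).
    term, tail = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        tail += term
    return math.exp(-half_x) * tail
```

For a given pathway, `hypergeom_upper_tail` would be run once on the GWAS gene list and once on the expression gene list, and the two resulting p-values passed to `fisher_combined`.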

14.

Background

A genome-wide association study (GWAS) typically involves examining representative SNPs in individuals from some population. A GWAS data set can concern a million SNPs and may soon concern billions. Researchers investigate the association of each SNP individually with a disease, and it is becoming increasingly commonplace to also analyze multi-SNP associations. Techniques for handling so many hypotheses include the Bonferroni correction and recently developed Bayesian methods. These methods can encounter problems. Most importantly, they are not applicable to a complex multi-locus hypothesis which has several competing hypotheses rather than only a null hypothesis. A method that computes the posterior probability of complex hypotheses is a pressing need.

Methodology/Findings

We introduce the Bayesian network posterior probability (BNPP) method which addresses the difficulties. The method represents the relationship between a disease and SNPs using a directed acyclic graph (DAG) model, and computes the likelihood of such models using a Bayesian network scoring criterion. The posterior probability of a hypothesis is computed based on the likelihoods of all competing hypotheses. The BNPP can not only be used to evaluate a hypothesis that has previously been discovered or suspected, but also to discover new disease loci associations. The results of experiments using simulated and real data sets are presented. Our results concerning simulated data sets indicate that the BNPP exhibits both better evaluation and discovery performance than does a p-value based method. For the real data sets, previous findings in the literature are confirmed and additional findings are found.
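The step of turning the likelihoods of competing hypotheses into posterior probabilities can be sketched as below. Only the normalization is shown: the log-likelihood values are assumed to come from some scoring function (in the BNPP, a Bayesian network scoring criterion that is not reproduced here), and the uniform default prior is an assumption of this example.

```python
import math


def posterior_probabilities(log_likelihoods, priors=None):
    """Posterior P(H | data) for each competing hypothesis H.

    log_likelihoods -- dict mapping hypothesis name to ln P(data | H)
    priors          -- dict of prior probabilities; uniform if omitted
    """
    names = list(log_likelihoods)
    if priors is None:
        priors = {h: 1.0 / len(names) for h in names}
    log_joint = {h: log_likelihoods[h] + math.log(priors[h]) for h in names}
    # Subtract the maximum before exponentiating to avoid underflow.
    peak = max(log_joint.values())
    unnorm = {h: math.exp(v - peak) for h, v in log_joint.items()}
    total = sum(unnorm.values())
    return {h: u / total for h, u in unnorm.items()}
```

Because every posterior is divided by the sum over all hypotheses, the result depends on which competing hypotheses are included, not only on the null.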

Conclusions/Significance

We conclude that the BNPP resolves a pressing problem by providing a way to compute the posterior probability of complex multi-locus hypotheses. A researcher can use the BNPP to determine the expected utility of investigating a hypothesis further. Furthermore, we conclude that the BNPP is a promising method for discovering disease loci associations.

15.
16.
The widely used pathway-based approach for interpreting Genome Wide Association Studies (GWAS) assumes that, since function is executed through the interactions of multiple genes, different perturbations of the same pathway would result in a similar phenotype. This assumption, however, has not been systematically assessed on a large scale. To determine whether SNPs associated with a given complex phenotype affect the same pathways more than expected by chance, we analyzed 368 phenotypes that were studied in >5000 GWAS. We found 216 significant phenotype-pathway associations between 70 of the phenotypes we analyzed and known pathways. We also report 391 strong phenotype-phenotype associations between phenotypes that are affected by the same pathways. While some of these associations confirm previously reported connections, others are new and could shed light on the molecular basis of these diseases. Our findings confirm that phenotype-associated SNPs cluster into pathways much more than expected by chance. However, this is true for <20% (70/368) of the phenotypes. Different types of phenotypes show markedly different tendencies: virtually all autoimmune phenotypes show strong clustering of SNPs into pathways, while most cancers and metabolic conditions, and all electrophysiological phenotypes, could not be significantly associated with any pathway despite being significantly associated with a large number of SNPs. While this may be due to missing data, it may also suggest that these phenotypes could result only from perturbations of specific genes and not from other perturbations of the same pathway. Further analysis of pathway-associated versus gene-associated phenotypes is therefore needed to understand disease etiology and to promote better drug target selection.

17.
Genome-wide association studies (GWAS) have been successful in identifying single nucleotide polymorphisms (SNPs) associated with many traits and diseases. However, at existing sample sizes, these variants explain only part of the estimated heritability. Leverage of GWAS results from related phenotypes may improve detection without the need for larger datasets. The Bayesian conditional false discovery rate (cFDR) constitutes an upper bound on the expected false discovery rate (FDR) across a set of SNPs whose p values for two diseases are both less than two disease-specific thresholds. Calculation of the cFDR requires only summary statistics and has several advantages over traditional GWAS analysis. However, existing methods require distinct control samples between studies. Here, we extend the technique to allow for some or all controls to be shared, increasing applicability. Several different SNP sets can be defined with the same cFDR value, and we show that the expected FDR across the union of these sets may exceed the expected FDR in any single set. We describe a procedure to establish an upper bound for the expected FDR among the union of such sets of SNPs. We apply our technique to pairwise analysis of p values from ten autoimmune diseases with variable sharing of controls, enabling discovery of 59 SNP-disease associations which do not reach GWAS significance after genomic control in individual datasets. Most of the SNPs we highlight have previously been confirmed using replication studies or larger GWAS, a useful validation of our technique; we report eight SNP-disease associations across five diseases not previously declared. Our technique extends and strengthens the previous algorithm, and establishes robust limits on the expected FDR. This approach can improve SNP detection in GWAS, and give insight into shared aetiology between phenotypically related conditions.
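A common empirical estimator of the cFDR from summary statistics divides a SNP's p-value for the principal disease by the conditional empirical distribution of those p-values among SNPs that pass the threshold for the second disease. The sketch below shows that basic estimator only, under the assumption of two aligned p-value lists; it does not include the shared-control adjustment or the bound on the FDR across unions of SNP sets that this abstract describes.

```python
def conditional_fdr(p_principal, p_conditional, i):
    """Empirical cFDR for SNP i.

    p_principal   -- p-values for the disease of interest, one per SNP
    p_conditional -- p-values for the related disease, same SNP order
    i             -- index of the SNP to evaluate
    """
    pi, pj = p_principal[i], p_conditional[i]
    # SNPs at least as significant as SNP i for the conditioning disease.
    in_cond = [k for k, q in enumerate(p_conditional) if q <= pj]
    # Of those, SNPs at least as significant as SNP i for the principal disease.
    in_both = [k for k in in_cond if p_principal[k] <= pi]
    # SNP i itself is always in `in_both`, so the denominator is never zero.
    return min(1.0, pi * len(in_cond) / len(in_both))
```

The estimate is never smaller than the SNP's own p-value; the gain comes from comparing it with an FDR cutoff rather than with a genome-wide p-value threshold.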

18.
Single nucleotide polymorphisms (SNPs) associated with average daily gain (ADG) and dry matter intake (DMI), two major components of feed efficiency in cattle, were identified in a genome-wide association study (GWAS). Uni- and multi-SNP models were used to describe feed efficiency in a training data set and the results were confirmed in a validation data set. Results from the univariate and bivariate analyses of ADG and DMI, adjusted by the feedlot beef steer maintenance requirements, were compared. The bivariate uni-SNP analysis identified (P-value <0.0001) 11 SNPs, whereas the univariate analyses of ADG and DMI identified 8 and 9 SNPs, respectively. Among the six SNPs confirmed in the validation data set, five SNPs were mapped to KDELC2, PHOX2A, and TMEM40. Findings from the uni-SNP models were used to develop highly accurate predictive multi-SNP models in the training data set. Despite the substantially smaller size of the validation data set, the training multi-SNP models had only slightly lower predictive ability when applied to the validation data set. Six Gene Ontology molecular functions related to ion transport activity were enriched (P-value <0.001) among the genes associated with the detected SNPs. The findings from this study demonstrate the complementary value of the uni- and multi-SNP models, and univariate and bivariate GWAS analyses. The identified SNPs can be used for genome-enabled improvement of feed efficiency in feedlot beef cattle, and can aid in the design of empirical studies to further confirm the associations.

19.
The aim of this study was to explore candidate single nucleotide polymorphisms (SNPs) and candidate mechanisms of systemic lupus erythematosus (SLE) and rheumatoid arthritis (RA). Two SLE genome-wide association study (GWAS) datasets were included in this study. Meta-analysis was conducted using 737,984 SNPs in 1,527 SLE cases and 3,421 controls of European ancestry, and 4,429 SNPs that met a threshold of p < 0.01 in a Korean RA GWAS dataset were used. ICSNPathway (identify candidate causal SNPs and pathways) analysis was applied to the meta-analysis results of the SLE GWAS datasets, and a RA GWAS dataset. The most significant result of SLE GWAS meta-analysis concerned rs2051549 in the human leukocyte antigen (HLA) region (p = 3.36E−22). In the non-HLA region, meta-analysis identified 6 SNPs associated with SLE with genome-wide significance (STAT4, TNPO3, BLK, FAM167A, and IRF5). ICSNPathway identified five candidate causal SNPs and 13 candidate causal pathways. This pathway-based analysis provides three hypotheses of the biological mechanism involved. First, rs8084 and rs7192 → HLA-DRA → bystander B cell activation. Second, rs1800629 → TNF → cytokine network. Third, rs1150752 and rs185819 → TNXB → collagen metabolic process. ICSNPathway analysis identified three candidate causal non-HLA SNPs and four candidate causal pathways involving the PADI4, MTR, PADI2, and TPH2 genes of RA. We identified five candidate SNPs and thirteen pathways, involving bystander B cell activation, cytokine network, and collagen metabolic processing, which may contribute to SLE susceptibility, and we revealed candidate causal non-HLA SNPs, genes, and pathways of RA.

20.