Similar literature
 Found 20 similar articles (search time: 15 ms)
1.
2.
We used established databases in standard ways to systematically characterize gene ontologies, pathways and functional linkages in the large set of genes now associated with autism spectrum disorders (ASDs). These conditions are particularly challenging: they lack clear pathognomonic biological markers and involve great heterogeneity across multiple levels (genes, systemic biological and brain characteristics, and nuances of behavioral manifestations), and yet everyone with this diagnosis meets the same defining behavioral criteria. Using the human gene list from the Simons Foundation Autism Research Initiative (SFARI), we performed gene set enrichment analysis with the Kyoto Encyclopedia of Genes and Genomes (KEGG) Pathway Database, and then derived a pathway network from pathway-pathway functional interactions, again in reference to KEGG. By identifying the GO (Gene Ontology) groups in which SFARI genes were enriched, mapping the coherence between pathways and GO groups, and ranking the relative strengths of representation of pathway network components, we 1) identified 10 disease-associated and 30 function-associated pathways, 2) revealed the calcium signaling pathway and neuroactive ligand-receptor interaction as the most enriched, statistically significant pathways from the enrichment analysis, 3) showed the calcium signaling pathway and the MAPK signaling pathway to be interactive hubs with other pathways and also to be involved with pervasively present biological processes, 4) found convergent indications that the process “calcium-PKC (protein kinase C)-Ras-Raf-MAPK/ERK” is likely a major contributor to ASD pathophysiology, and 5) noted that perturbations associated with KEGG’s category of environmental information processing were common. 
These findings support the idea that ASD-associated genes may contribute not only to core features of ASD themselves but also to vulnerability to other chronic and systemic problems, potentially including cancer, metabolic conditions and heart diseases. ASDs may thus arise, or emerge, from underlying vulnerabilities related to pleiotropic genes associated with pervasively important molecular mechanisms, vulnerability to environmental input and multiple systemic co-morbidities.

3.
The use of high-throughput techniques to generate large volumes of protein-protein interaction (PPI) data has increased the need for methods that systematically and automatically suggest functional relationships among proteins. In a yeast PPI network, previous work has shown that the local connection topology, particularly for two proteins sharing an unusually large number of neighbors, can predict functional association. In this study we improved the prediction scheme by developing a new algorithm and applied it to a human PPI network to make a genome-wide functional inference. We used the new algorithm to measure and reduce the influence of hub proteins on detecting function-associated protein pairs. We used the annotations of the Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) as benchmarks to compare and evaluate the functional relevance. The application of our algorithms to human PPI data yielded 4,233 significant functional associations among 1,754 proteins. Further functional comparisons between them allowed us to assign 466 KEGG pathway annotations to 274 proteins and 123 GO annotations to 114 proteins, with estimated false discovery rates of <21% for KEGG and <30% for GO. We clustered 1,729 proteins by their functional associations and made functional inferences from detailed analysis of one subcluster highly enriched in the TGF-β signaling pathway (P < 10^-50). Analysis of another four subclusters also suggested potential new players in six signaling pathways worthy of further experimental investigation. Our study gives clear insight into the common neighbor-based prediction scheme and provides a reliable method for large-scale functional annotation in this post-genomic era.
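
The common-neighbor idea in this abstract can be illustrated with a generic hypergeometric tail test: given the degrees of two proteins, how surprising is the number of neighbors they share? The sketch below is only that generic baseline, not the study's new algorithm (which additionally corrects for hub proteins and is not described in the abstract). The function names and the toy network are hypothetical.

```python
from math import comb


def shared_neighbor_pvalue(n_other, deg_a, deg_b, shared):
    """P(X >= shared) when deg_b neighbors are drawn at random from n_other
    proteins, deg_a of which are neighbors of protein a (hypergeometric tail)."""
    total = comb(n_other, deg_b)
    tail = sum(
        comb(deg_a, k) * comb(n_other - deg_a, deg_b - k)
        for k in range(shared, min(deg_a, deg_b) + 1)
    )
    return tail / total


def build_adjacency(edges):
    """Turn an undirected edge list into a dict: protein -> set of neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def pair_pvalue(adj, a, b):
    """Score a protein pair by the significance of its shared neighbors."""
    n_other = len(adj) - 2          # every protein except a and b
    na = adj[a] - {b}               # ignore a direct a-b edge, if any
    nb = adj[b] - {a}
    return shared_neighbor_pvalue(n_other, len(na), len(nb), len(na & nb))


# Toy network (hypothetical): A and B share all three of their neighbors.
toy_edges = [
    ("A", "C"), ("A", "D"), ("A", "E"),
    ("B", "C"), ("B", "D"), ("B", "E"),
    ("F", "G"), ("G", "H"), ("H", "I"), ("I", "J"), ("J", "K"), ("K", "L"),
]
adj = build_adjacency(toy_edges)
print(pair_pvalue(adj, "A", "B"))   # 1/120, roughly 0.0083
```

A raw tail probability like this tends to favor pairs attached to high-degree hubs, which is the bias the abstract says its algorithm was designed to measure and reduce.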

4.
5.
6.
7.
8.
Predicting gene functions by integrating large-scale biological data remains a challenge for systems biology. Here we present a resource for Drosophila melanogaster gene function predictions. We trained function-specific classifiers to optimize the influence of different biological datasets for each functional category. Our model predicted GO terms and KEGG pathway memberships for Drosophila melanogaster genes with high accuracy, as affirmed by cross-validation, supporting literature evidence, and large-scale RNAi screens. The resulting resource of prioritized associations between Drosophila genes and their potential functions offers a guide for experimental investigations.

9.
Advanced statistical methods used to analyze high-throughput data such as gene-expression assays result in long lists of “significant genes.” One way to gain insight into the significance of altered expression levels is to determine whether Gene Ontology (GO) terms associated with a particular biological process, molecular function, or cellular component are over- or under-represented in the set of genes deemed significant. This process, referred to as enrichment analysis, profiles a gene set and is widely used to make sense of the results of high-throughput experiments. The canonical example of enrichment analysis is when the output dataset is a list of genes differentially expressed in some condition. To determine the biological relevance of a lengthy gene list, the usual solution is to perform enrichment analysis with the GO. We can aggregate the annotating GO concepts for each gene in this list, and arrive at a profile of the biological processes or mechanisms affected by the condition under study. While GO has been the principal target for enrichment analysis, the methods of enrichment analysis are generalizable. We can conduct the same sort of profiling along other ontologies of interest. Just as scientists can ask “Which biological process is over-represented in my set of interesting genes or proteins?” we can also ask “Which disease (or class of diseases) is over-represented in my set of interesting genes or proteins?”. For example, by annotating known protein mutations with disease terms from the ontologies in BioPortal, Mort et al. recently identified a class of diseases, blood coagulation disorders, that was associated with a 14-fold depletion in substitutions at O-linked glycosylation sites. With the availability of tools for automatic annotation of datasets with terms from disease ontologies, there is no reason to restrict enrichment analyses to the GO. 
In this chapter, we will discuss methods to perform enrichment analysis using any ontology available in the biomedical domain. We will review the general methodology of enrichment analysis, the associated challenges, and discuss the novel translational analyses enabled by the existence of public, national computational infrastructure and by the use of disease ontologies in such analyses.

What to Learn in This Chapter

  • Review the commonly used approach of Gene Ontology based enrichment analysis
  • Understand the pitfalls associated with current approaches
  • Understand the national infrastructure available for using alternative ontologies for enrichment analysis
  • Learn about a generalized enrichment analysis workflow and its application using disease ontologies
This article is part of the “Translational Bioinformatics” collection for PLOS Computational Biology.
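
The generalized workflow described in this abstract (annotate each gene with terms from any ontology, then ask which terms are over- or under-represented in the study set) can be sketched as follows. This is a minimal illustration under stated assumptions: a plain hypergeometric model, no multiple-testing correction, and no propagation of annotations up the ontology hierarchy. The function names, gene identifiers and term labels are hypothetical.

```python
from math import comb


def hypergeom_pmf(i, N, K, n):
    """P(exactly i annotated genes in a sample of n, from N genes with K annotated)."""
    return comb(K, i) * comb(N - K, n - i) / comb(N, n)


def term_profile(study, annotations):
    """Profile a study set against any term vocabulary.

    annotations maps gene -> set of terms; its keys define the background.
    Returns term -> counts, fold change, and both one-sided tail probabilities.
    """
    background = set(annotations)
    study = set(study) & background
    N, n = len(background), len(study)
    if n == 0:
        raise ValueError("study set has no genes in the background")
    profile = {}
    for term in set().union(*annotations.values()):
        K = sum(term in annotations[g] for g in background)
        k = sum(term in annotations[g] for g in study)
        p_over = sum(hypergeom_pmf(i, N, K, n) for i in range(k, min(K, n) + 1))
        p_under = sum(hypergeom_pmf(i, N, K, n) for i in range(0, k + 1))
        profile[term] = {
            "k": k,
            "K": K,
            "fold": (k / n) / (K / N),   # observed share / background share
            "p_over": p_over,            # enrichment (over-representation)
            "p_under": p_under,          # depletion (under-representation)
        }
    return profile


# Toy data (hypothetical): 10 genes, two disease terms.
annotations = {f"g{i}": {"disease_A"} for i in range(3)}
annotations.update({f"g{i}": {"disease_B"} for i in range(3, 10)})
profile = term_profile(["g0", "g1", "g2"], annotations)
for term in sorted(profile):
    print(term, profile[term])
```

Reporting both tails matters for questions like the depletion example in the abstract, where the interesting signal is that a term appears less often than expected.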

10.
11.
A large volume of honey bee (Apis mellifera) tag-seq data was obtained to identify differential gene expression via Solexa/Illumina Digital Gene Expression tag profiling (DGE) based on next-generation sequencing. In total, 4,286,250 (foragers) and 3,422,327 (nurses) clean tags were sequenced; 24,568 (foragers) and 13,134 (nurses) distinct clean tags could not be matched to the reference database; and 7,508 and 6,875 mapped genes were detected in foragers and nurses, respectively. A total of 7,045 genes were found to be differentially expressed between foragers and nurses. Of those genes, 1,621 had significantly different expression, that is, they showed an expression ratio (foragers/nurses) of more than 2 and an FDR (False Discovery Rate) of less than 0.001. We identified 101 genes that were uniquely expressed in foragers and 9 genes that were expressed only in nurses. We performed Gene Ontology (GO) category and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis, and found 415 genes with annotation terms linked to the GO cellular component category. A total of 200 components of KEGG pathways were obtained, including 21 signaling pathways. The PPAR signaling pathway was the most highly enriched, with the lowest Q-value.
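
The selection rule quoted in this abstract (expression ratio above 2 and FDR below 0.001) can be sketched as below. The abstract does not say how counts were normalized or how the FDR was computed, so two assumptions are made here: tag counts are scaled to tags per million using the clean-tag totals reported above, and the FDR is obtained with the Benjamini-Hochberg procedure. Gene names, counts and p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


def tags_per_million(count, total_clean_tags):
    """Scale a raw tag count by library size (assumed normalization)."""
    return count * 1_000_000 / total_clean_tags


def select_genes(rows, total_foragers, total_nurses, min_ratio=2.0, max_fdr=0.001):
    """rows: (gene, forager_count, nurse_count, p_value) with non-zero counts."""
    qvalues = benjamini_hochberg([p for _, _, _, p in rows])
    selected = []
    for (gene, f_count, n_count, _), q in zip(rows, qvalues):
        ratio = (tags_per_million(f_count, total_foragers)
                 / tags_per_million(n_count, total_nurses))
        if ratio > min_ratio and q < max_fdr:
            selected.append((gene, ratio, q))
    return selected


# Library sizes are the clean-tag totals from the abstract; rows are made up.
TOTAL_FORAGERS, TOTAL_NURSES = 4_286_250, 3_422_327
rows = [
    ("geneA", 400, 100, 1e-8),
    ("geneB", 150, 100, 1e-6),   # significant, but ratio is below 2
    ("geneC", 500, 100, 1e-2),   # large ratio, but FDR is too high
]
selected = select_genes(rows, TOTAL_FORAGERS, TOTAL_NURSES)
print([gene for gene, _, _ in selected])
```

Genes with a zero count in one caste (the "uniquely expressed" genes in the abstract) have no finite ratio and would need separate handling.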

12.
The structure of protein-protein interaction (PPI) networks has already been successfully used as a source of new biological information. Even though cardiovascular diseases (CVDs) are a major global cause of death, many CVD genes still await discovery. We explore ways to utilize the structure of the human PPI network to find important genes for CVDs that should be targeted by drugs. The hope is to use the properties of such important genes to predict new ones, which would in turn improve the choice of therapy. We propose a methodology that examines the PPI network wiring around genes involved in CVDs. We use the methodology to identify a subset of CVD-related genes that are statistically significantly enriched in drug targets and “driver genes.” We seek such genes, since driver genes have been proposed to drive onset and progression of a disease. Our identified subset of CVD genes has a large overlap with the Core Diseasome, which has been postulated to be the key to disease formation and hence should be the primary object of therapeutic intervention. This indicates that our methodology identifies “key” genes responsible for CVDs. Thus, we use it to predict new CVD genes, and we validate over 70% of our predictions in the literature. Finally, we show that our predicted genes are functionally similar to currently known CVD drug targets, which confirms the potential utility of our methodology for improving therapy for CVDs.

13.
Gene-set enrichment analysis based on single nucleotide polymorphisms, or GSEA-SNP, is a tool to identify candidate genes based on enrichment analysis of sets of genes rather than single SNP associations. The objective of this study was to identify modest-effect genes associated with Mycobacterium avium subsp. paratuberculosis (Map) tissue infection or fecal shedding using GSEA-SNP applied to KEGG pathways or Gene Ontology (GO) gene sets. The Illumina Bovine SNP50 BeadChip was used to genotype 209 Holstein cows for the GSEA-SNP analyses. For each of 13,744 annotated genes genome-wide located within 50 kb of a Bovine SNP50 SNP, the single SNP with the highest Cochran-Armitage Max statistic was used as a proxy statistic for that gene’s strength of affiliation with Map. Gene-set enrichment was tested using a weighted Kolmogorov-Smirnov-like running sum statistic with data permutation to adjust for multiple testing. For tissue infection and fecal shedding, no gene sets in KEGG pathways or in GO sets for molecular function or cellular component were enriched for signal. The GO biological process gene set for positive regulation of cell motion (GO:0051272, q = 0.039, 5/11 genes contributing to the core enrichment) was enriched for Map tissue infection, while no GO biological process gene sets were enriched for fecal shedding. GSEA-SNP complements traditional SNP association approaches to identify genes of modest effects as well as genes with larger effects, as demonstrated by the identification of one locus that we previously found to be associated with Map tissue infection using a SNP-by-SNP genome-wide association study.
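
Two steps named in this abstract lend themselves to a short sketch: collapsing SNP statistics to one proxy per gene (the highest-scoring SNP), and the weighted Kolmogorov-Smirnov-like running sum. The code below follows the standard GSEA form of that statistic; the exact weighting used in the study and its permutation scheme are not given in the abstract, so the weight parameter and all identifiers are assumptions, and the permutation step is omitted.

```python
def gene_proxies(snp_stats, snp_to_gene):
    """Keep, for each gene, the highest statistic among its mapped SNPs."""
    best = {}
    for snp, stat in snp_stats.items():
        gene = snp_to_gene.get(snp)
        if gene is not None and stat > best.get(gene, float("-inf")):
            best[gene] = stat
    return best


def enrichment_score(ranked, gene_set, weight=1.0):
    """Weighted KS-like running sum over genes ranked by descending statistic.

    ranked is a list of (gene, statistic). Hits step up in proportion to
    |statistic| ** weight, misses step down uniformly; the score is the
    running-sum value with the largest absolute deviation from zero.
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n_miss = len(ranked) - sum(hits)
    hit_norm = sum(abs(s) ** weight for (_, s), h in zip(ranked, hits) if h)
    if n_miss == 0 or hit_norm == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    running, best = 0.0, 0.0
    for (_, stat), is_hit in zip(ranked, hits):
        if is_hit:
            running += abs(stat) ** weight / hit_norm
        else:
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Toy data (hypothetical SNP and gene identifiers).
snp_stats = {"s1": 1.0, "s2": 4.0, "s3": 3.0, "s4": 2.0, "s5": 0.5}
snp_to_gene = {"s1": "g1", "s2": "g1", "s3": "g2", "s4": "g3", "s5": "g4"}
proxies = gene_proxies(snp_stats, snp_to_gene)
ranked = sorted(proxies.items(), key=lambda kv: kv[1], reverse=True)
print(enrichment_score(ranked, {"g1", "g2"}))
```

In a full analysis the observed score would be compared with scores from permuted data to obtain significance and the multiple-testing adjustment the abstract mentions.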

14.
In the present study, we systematically investigated population differentiation of drug-related (DR) genes in order to identify common genetic features underlying population-specific responses to drugs. To do so, we used the International HapMap project release 27 Data and Pharmacogenomics Knowledge Base (PharmGKB) database. First, we compared four measures for assessing population differentiation: the chi-square test, the analysis of variance (ANOVA) F-test, Fst, and Nearest Shrunken Centroid Method (NSCM). Fst showed high sensitivity with stable specificity among varying sample sizes; thus, we selected Fst for determining population differentiation. Second, we divided DR genes from PharmGKB into two groups based on the degree of population differentiation as assessed by Fst: genes with a high level of differentiation (HD gene group) and genes with a low level of differentiation (LD gene group). Last, we conducted a gene ontology (GO) analysis and pathway analysis. Using all genes in the human genome as the background, the GO analysis and pathway analysis of the HD genes identified terms related to cell communication. “Cell communication” and “cell-cell signaling” had the lowest Benjamini-Hochberg’s q-values (0.0002 and 0.0006, respectively), and “drug binding” was highly enriched (16.51) despite its relatively high q-value (0.0142). Among the 17 genes related to cell communication identified in the HD gene group, five genes (STX4, PPARD, DCK, GRIK4, and DRD3) contained single nucleotide polymorphisms with Fst values greater than 0.5. Specifically, the Fst values for rs10871454, rs6922548, rs3775289, rs1954787, and rs167771 were 0.682, 0.620, 0.573, 0.531, and 0.510, respectively. In the analysis using DR genes as the background, the HD gene group contained six significant terms. Five were related to reproduction, and one was “Wnt signaling pathway,” which has been implicated in cancer. 
Our analysis suggests that the HD gene group from PharmGKB is associated with cell communication and drug binding.
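
Fst, the differentiation measure selected in this abstract, can be illustrated for a single biallelic SNP with the textbook heterozygosity-based definition, Fst = (H_T - H_S) / H_T. This is a sketch under simplifying assumptions (equal population weights, no sample-size correction); the abstract does not state which Fst estimator the study used. The allele frequencies and SNP labels below are hypothetical, and the 0.5 cutoff mirrors the threshold mentioned in the abstract.

```python
def fst(allele_freqs):
    """Fst for one biallelic SNP from per-population allele frequencies.

    H_T is the expected heterozygosity of the pooled population,
    H_S the mean expected heterozygosity within populations.
    """
    p_bar = sum(allele_freqs) / len(allele_freqs)
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        return 0.0  # allele fixed everywhere: no differentiation to measure
    h_within = sum(2 * p * (1 - p) for p in allele_freqs) / len(allele_freqs)
    return (h_total - h_within) / h_total


def highly_differentiated(snp_freqs, threshold=0.5):
    """Return SNPs whose Fst exceeds the threshold, with their Fst values."""
    scored = {snp: fst(freqs) for snp, freqs in snp_freqs.items()}
    return {snp: value for snp, value in scored.items() if value > threshold}


# Hypothetical allele frequencies in three populations.
snp_freqs = {
    "snp_low": [0.48, 0.50, 0.52],
    "snp_high": [0.05, 0.10, 0.95],
}
print(highly_differentiated(snp_freqs))
```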

15.
16.
Algorithms for active module identification (AMI) are central to analysis of omics data. Such algorithms receive a gene network and nodes' activity scores as input and report subnetworks that show significant over-representation of accrued activity signal (“active modules”), thus representing biological processes that presumably play key roles in the analyzed conditions. Here, we systematically evaluated six popular AMI methods on gene expression and GWAS data. We observed that GO terms enriched in modules detected on the real data were often also enriched on modules found on randomly permuted data. This indicated that AMI methods frequently report modules that are not specific to the biological context measured by the analyzed omics dataset. To tackle this bias, we designed a permutation-based method that empirically evaluates GO terms reported by AMI methods. We used the method to fashion five novel AMI performance criteria. Last, we developed DOMINO, a novel AMI algorithm, that outperformed the other six algorithms in extensive testing on GE and GWAS data. Software is available at https://github.com/Shamir-Lab.
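
The permutation-based check described in this abstract can be illustrated with a generic empirical p-value: a GO term's enrichment score on the real data is compared with the scores the same term obtains when the pipeline is rerun on permuted data. The sketch below is not the paper's procedure or DOMINO itself; the scoring scale, the cutoff and every name and number are hypothetical.

```python
def empirical_pvalue(real_score, permuted_scores):
    """Fraction of permuted runs scoring at least as high as the real data.

    The +1 terms keep the estimate away from zero when no permutation
    reaches the real score.
    """
    exceed = sum(1 for s in permuted_scores if s >= real_score)
    return (1 + exceed) / (1 + len(permuted_scores))


def context_specific_terms(real_scores, permuted_scores, alpha=0.05):
    """Keep terms whose real enrichment is rarely matched on permuted data.

    real_scores: term -> score on real data (e.g. -log10 enrichment p-value).
    permuted_scores: term -> list of scores, one per permuted dataset.
    """
    kept = {}
    for term, score in real_scores.items():
        p = empirical_pvalue(score, permuted_scores.get(term, []))
        if p <= alpha:
            kept[term] = p
    return kept


# Hypothetical scores: term_x is enriched even on permuted data, term_y is not.
real_scores = {"term_x": 6.0, "term_y": 6.0}
permuted_scores = {
    "term_x": [5.5, 6.2, 7.0, 6.1] * 10,   # 40 permutations, often >= 6.0
    "term_y": [0.3, 1.1, 0.8, 0.2] * 10,   # 40 permutations, never >= 6.0
}
print(context_specific_terms(real_scores, permuted_scores))
```

With 40 permutations the smallest attainable p-value is 1/41, so the number of permutations bounds how strict the cutoff can be.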

17.
Yang JO, Charny P, Lee B, Kim S, Bhak J, Woo HG. Bioinformation, 2007, 2(5): 194-196
GS2PATH is a Web-based pipeline tool to permit functional enrichment of a given gene set from prior knowledge databases, including gene ontology (GO) database and biological pathway databases. The tool also provides an estimation of gene set enrichment, in GO terms, from the databases of the KEGG and BioCarta pathways, which may allow users to compute and compare functional over-representations. This is especially useful in the perspective of biological pathways such as metabolic, signal transduction, genetic information processing, environmental information processing, cellular process, disease, and drug development. It provides relevant images of biochemical pathways with highlighting of the gene set by customized colors, which can directly assist in the visualization of functional alteration.

Availability


18.
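To explore the gene expression characteristics and patterns underlying yak adaptation to high-altitude hypoxic environments, transcriptome sequencing was performed on lung tissue from healthy male Maiwa yaks aged 2.5 to 3 years that had been raised for 4 months at high altitude (3,560 m) or low altitude (478 m). Sequencing was carried out on the Illumina high-throughput sequencing platform (HiSeq 2500/4000), and qRT-PCR was used to validate the expression levels of differentially expressed genes. On average, about 576 million clean reads per sequenced sample were obtained from the lung transcriptomes of the high-altitude group and about 610 million from the low-altitude group, with more than 91.74% and 91.28% of reads, respectively, mapped to the reference genome; 2,047 novel transcripts were found in total. There were 199 differentially expressed genes between the lung tissues of the low-altitude and high-altitude groups, of which 89 were upregulated and 110 were downregulated. These differentially expressed genes were enriched in 297 GO terms and 146 KEGG pathways, including 62 GO terms and 35 metabolic pathways related to hypoxia adaptation. Among the hypoxia-adaptation-related GO terms, the most represented in the biological process, cellular component, and molecular function categories were cell adhesion, protein complex, and calcium ion binding, respectively. Among the hypoxia-adaptation-related KEGG pathways, the most represented was the tumor necrosis factor (TNF) signaling pathway, followed by the hypoxia-inducible factor 1 (HIF-1) signaling pathway. qRT-PCR validation showed that the expression changes of the class II human leukocyte antigen alpha chain genes (HLA-DOA, HLA-DRA), complement factor (C2), and mannan-binding lectin-associated serine protease 1 (MASP1) were consistent with the transcriptome sequencing results. This study provides a valuable entry point for a global and in-depth understanding of how transcript expression in yak lung tissue responds to high-altitude hypoxia.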

19.
20.