Similar literature
 Found 20 similar documents (search time: 15 ms)
1.
A new method based on a mathematically natural local search framework for max cut is developed to uncover functionally coherent modules and BPM motifs in high-throughput genetic interaction data. Unlike previous methods, which also consider physical protein-protein interaction data, our method utilizes genetic interaction data only; this becomes increasingly important as high-throughput genetic interaction data becomes available in settings where less is known about physical interactions. We compare the modules and BPMs obtained with those of previous methods and across different datasets. Despite needing no physical interaction information, the BPMs produced by our method are competitive with those of previous methods. Biological findings include a suggested global role for the prefoldin complex and a SWR subcomplex in pathway buffering in the budding yeast interactome.

2.
Advanced statistical methods used to analyze high-throughput data such as gene-expression assays result in long lists of “significant genes.” One way to gain insight into the significance of altered expression levels is to determine whether Gene Ontology (GO) terms associated with a particular biological process, molecular function, or cellular component are over- or under-represented in the set of genes deemed significant. This process, referred to as enrichment analysis, profiles a gene set and is widely used to make sense of the results of high-throughput experiments. The canonical example of enrichment analysis is when the output dataset is a list of genes differentially expressed in some condition. To determine the biological relevance of a lengthy gene list, the usual solution is to perform enrichment analysis with the GO. We can aggregate the annotating GO concepts for each gene in this list, and arrive at a profile of the biological processes or mechanisms affected by the condition under study. While GO has been the principal target for enrichment analysis, the methods of enrichment analysis are generalizable. We can conduct the same sort of profiling along other ontologies of interest. Just as scientists can ask “Which biological process is over-represented in my set of interesting genes or proteins?” we can also ask “Which disease (or class of diseases) is over-represented in my set of interesting genes or proteins?”. For example, by annotating known protein mutations with disease terms from the ontologies in BioPortal, Mort et al. recently identified a class of diseases (blood coagulation disorders) that were associated with a 14-fold depletion in substitutions at O-linked glycosylation sites. With the availability of tools for automatic annotation of datasets with terms from disease ontologies, there is no reason to restrict enrichment analyses to the GO.
In this chapter, we will discuss methods to perform enrichment analysis using any ontology available in the biomedical domain. We will review the general methodology of enrichment analysis, the associated challenges, and discuss the novel translational analyses enabled by the existence of public, national computational infrastructure and by the use of disease ontologies in such analyses.

What to Learn in This Chapter

  • Review the commonly used approach of Gene Ontology based enrichment analysis
  • Understand the pitfalls associated with current approaches
  • Understand the national infrastructure available for using alternative ontologies for enrichment analysis
  • Learn about a generalized enrichment analysis workflow and its application using disease ontologies
This article is part of the “Translational Bioinformatics” collection for PLOS Computational Biology.

3.
4.
We used established databases in standard ways to systematically characterize gene ontologies, pathways and functional linkages in the large set of genes now associated with autism spectrum disorders (ASDs). These conditions are particularly challenging: they lack clear pathognomonic biological markers and involve great heterogeneity across multiple levels (genes, systemic biological and brain characteristics, and nuances of behavioral manifestations), and yet everyone with this diagnosis meets the same defining behavioral criteria. Using the human gene list from the Simons Foundation Autism Research Initiative (SFARI), we performed gene set enrichment analysis with the Kyoto Encyclopedia of Genes and Genomes (KEGG) Pathway Database, and then derived a pathway network from pathway-pathway functional interactions, again in reference to KEGG. Through identifying the GO (Gene Ontology) groups in which SFARI genes were enriched, mapping the coherence between pathways and GO groups, and ranking the relative strengths of representation of pathway network components, we 1) identified 10 disease-associated and 30 function-associated pathways, 2) revealed the calcium signaling pathway and neuroactive ligand-receptor interaction as the most enriched, statistically significant pathways from the enrichment analysis, 3) showed the calcium signaling pathway and the MAPK signaling pathway to be interactive hubs with other pathways and also to be involved with pervasively present biological processes, 4) found convergent indications that the process “calcium-PKC (protein kinase C)-Ras-Raf-MAPK/ERK” is likely a major contributor to ASD pathophysiology, and 5) noted that perturbations associated with KEGG’s category of environmental information processing were common.
These findings support the idea that ASD-associated genes may contribute not only to core features of ASD themselves but also to vulnerability to other chronic and systemic problems, potentially including cancer, metabolic conditions and heart diseases. ASDs may thus arise, or emerge, from underlying vulnerabilities related to pleiotropic genes associated with pervasively important molecular mechanisms, vulnerability to environmental input and multiple systemic co-morbidities.

5.

Background

Commonalities between large sets of genes obtained from high-throughput experiments are often identified by searching for enrichments of genes with the same Gene Ontology (GO) annotations. The GO analysis tools used for these enrichment analyses assume that GO terms are independent and that the semantic distances between all parent–child terms are identical, which is not true in a biological sense. In addition, these tools output lists of GO terms that are often redundant or too specific, which are difficult to interpret in the context of the biological question investigated by the user. Therefore, there is a demand for a robust and reliable method for gene categorization and enrichment analysis.

Results

We have developed Categorizer, a tool that classifies genes into user-defined groups (categories) and calculates p-values for the enrichment of the categories. Categorizer identifies the biologically best-fit category for each gene by taking advantage of a specialized semantic similarity measure for GO terms. We demonstrate that Categorizer provides improved categorization and enrichment results of genetic modifiers of Huntington’s disease compared to a classical GO Slim-based approach or categorizations using other semantic similarity measures.

Conclusion

Categorizer enables more accurate categorizations of genes than currently available methods. This new tool will help experimental and computational biologists analyze genomic and proteomic data according to their specific needs in a more reliable manner.

6.
The Gene Ontology (GO) provides biologists with a controlled terminology that describes how genes are associated with functions and how functional terms are related to one another. These term-term relationships encode how scientists conceive the organization of biological functions, and they take the form of a directed acyclic graph (DAG). Here, we propose that the network structure of gene-term annotations made using GO can be employed to establish an alternative approach for grouping functional terms that captures intrinsic functional relationships that are not evident in the hierarchical structure established in the GO DAG. Instead of relying on an externally defined organization for biological functions, our approach connects biological functions together if they are performed by the same genes, as indicated in a compendium of gene annotation data from numerous different sources. We show that grouping terms by this alternate scheme provides a new framework with which to describe and predict the functions of experimentally identified sets of genes.

7.
A proteome-wide mapping of interactions between hepatitis C virus (HCV) and human proteins was performed to provide a comprehensive view of the cellular infection. A total of 314 protein–protein interactions between HCV and human proteins were identified by yeast two-hybrid and 170 by literature mining. Integration of this data set into a reconstructed human interactome showed that cellular proteins interacting with HCV are enriched in highly central and interconnected proteins. A global analysis on the basis of functional annotation highlighted the enrichment of cellular pathways targeted by HCV. A network of proteins associated with frequent clinical disorders of chronically infected patients was constructed by connecting the insulin, Jak/STAT and TGFβ pathways with cellular proteins targeted by HCV. The CORE protein appeared as a major perturbator of this network. Focal adhesion was identified as a new function affected by HCV, mainly by the NS3 and NS5A proteins.

8.
9.
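[Objective] To use RNA-Seq to analyze changes in gene expression in human macrophages before and after vaccinia virus (VACV) infection, and to explore the mechanisms of interaction between vaccinia virus and host cells. [Methods] Human macrophages were infected with vaccinia virus; RNA-Seq was used to compare differentially expressed genes between the infected and control groups, followed by KEGG, GO and STRING network analyses. [Results] Compared with the control group, 4796 significantly differentially expressed genes were identified in the infected group...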

10.
Spinal cord injury (SCI) is associated with complex pathophysiological processes that follow the primary traumatic event and determine the extent of secondary damage and functional recovery. Numerous reports have used global and hypothesis-driven approaches to identify protein changes that contribute to the overall pathology of SCI in an effort to identify potential therapeutic interventions. In this study, we use a semi-automatic annotation approach to detect terms referring to genes or proteins dysregulated in the SCI literature and develop a curated SCI interactome. Network analysis of the SCI interactome revealed the presence of a rich-club organization corresponding to a “powerhouse” of highly interacting hub proteins. Study of the modular organization of the network has shown that rich-club proteins cluster into modules that are specifically enriched for biological processes that fall under the categories of cell death, inflammation, injury recognition and systems development. Pathway analysis of the interactome and the rich-club revealed high similarity, indicating the role of the rich-club proteins as hubs of the most prominent pathways in disease pathophysiology and illustrating the centrality of pro- and anti-survival signal competition in the pathology of SCI. In addition, evaluation of centrality measures of single nodes within the rich-club has revealed that neuronal growth factor (NGF), caspase 3, and H-Ras are the most central nodes and potentially interesting targets for therapy. Our integrative approach uncovers the molecular architecture of the SCI interactome and provides an essential resource for evaluating significant therapeutic candidates.

11.
12.
Familial hypercholesterolemia (FH) is a monogenic lipid disorder which promotes atherosclerosis and cardiovascular diseases. Owing to the lack of sufficient published information, this study aims to identify potential genetic biomarkers for FH by studying the global gene expression profile of blood cells. The microarray expression data of FH patients and controls were analyzed by different computational biology methods, including differential expression analysis, protein network mapping, hub gene identification, functional enrichment of biological pathways, and immune cell restriction analysis. Our results showed the dysregulated expression of 115 genes connected to lipid homeostasis, immune responses, cell adhesion molecules, canonical Wnt signaling, and mucin type O-glycan biosynthesis pathways in FH patients. The findings from expanded protein interaction network construction with known FH genes and subsequent Gene Ontology (GO) annotations also supported the above findings, in addition to identifying the involvement of dysregulated thyroid hormone and ErbB signaling pathways in FH patients. Genes such as CSNK1A1, JAK3, PLCG2, RALA, and ZEB2 were found to be enriched under all GO annotation categories. The subsequent phenotype ontology results revealed JAK3, PLCG2, and ZEB2 as key hub genes contributing to the inflammation underlying cardiovascular and immune response related phenotypes. Immune cell restriction findings show that these three genes are highly expressed by T-follicular helper CD4+ T cells, naïve B cells, and monocytes, respectively. These findings not only provide a theoretical basis for understanding the role of immune dysregulation underlying atherosclerosis among FH patients but may also pave the way to developing genomic medicine for cardiovascular diseases.

13.
Plant protein-protein interaction networks have not been identified by large-scale experiments. In order to better understand the protein interactions in rice, the Predicted Rice Interactome Network (PRIN; http://bis.zju.edu.cn/prin/) presented 76,585 predicted interactions involving 5,049 rice proteins. After mapping genomic features of rice (GO annotation, subcellular localization prediction, and gene expression), we found that a well-annotated and biologically significant network is rich enough to capture many significant functional linkages within higher-order biological systems, such as pathways and biological processes. Furthermore, we took MADS-box domain-containing proteins and circadian rhythm signaling pathways as examples to demonstrate that functional protein complexes and biological pathways could be effectively expanded in our predicted network. The expanded molecular network in PRIN has considerably improved the capability of these analyses to integrate existing knowledge and provide novel insights into the function and coordination of genes and gene networks.

14.
An integral part of functional genomics studies is to assess the enrichment of specific biological terms in lists of genes found to be playing an important role in biological phenomena. Contrasting the observed frequency of annotated terms with those of the background is at the core of overrepresentation analysis (ORA). Gene Ontology (GO) is a means to consistently classify and annotate gene products and has become a mainstay in ORA. Alternatively, Medical Subject Headings (MeSH) offers a comprehensive life science vocabulary including additional categories that are not covered by GO. Although MeSH is applied predominantly in human and model organism research, its full potential in livestock genetics is yet to be explored. In this study, MeSH ORA was evaluated to discern biological properties of identified genes and contrast them with the results obtained from GO enrichment analysis. Three published datasets were employed for this purpose, representing a gene expression study in dairy cattle, the use of SNPs for genome-wide prediction in swine and the identification of genomic regions targeted by selection in horses. We found that several overrepresented MeSH annotations linked to these gene sets share similar concepts with those of GO terms. Moreover, MeSH yielded unique annotations, which are not directly provided by GO terms, suggesting that MeSH has the potential to refine and enrich the representation of biological knowledge. We demonstrated that MeSH can be regarded as another choice of annotation to draw biological inferences from genes identified via experimental analyses. When used in combination with GO terms, our results indicate that MeSH can enhance our functional interpretations for specific biological conditions or the genetic basis of complex traits in livestock species.
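The comparison of observed term frequency against a background that underlies ORA is commonly carried out as a one-sided hypergeometric test. The abstract does not name the test the authors used, so the following is only a minimal sketch under that assumption; the function name `ora_p_value` and the toy counts are hypothetical.

```python
from math import comb


def ora_p_value(k, n, K, N):
    """Upper-tail hypergeometric p-value, P(X >= k).

    N: genes in the background, K: background genes annotated with the term,
    n: genes in the list of interest, k: genes in that list carrying the term.
    """
    upper = min(n, K)
    # Sum the probability of seeing k, k+1, ..., upper annotated genes.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Toy numbers (made up): background of 10 genes, 4 annotated with the term,
# a gene list of 3 of which 2 carry the term.
p = ora_p_value(k=2, n=3, K=4, N=10)
```

In practice one such p-value is computed per term (GO or MeSH) and the set of p-values is then corrected for multiple testing.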

15.
Information Flow Analysis of Interactome Networks   (citations: 1 total, 0 self, 1 by others)
Recent studies of cellular networks have revealed modular organizations of genes and proteins. For example, in interactome networks, a module refers to a group of interacting proteins that form molecular complexes and/or biochemical pathways and together mediate a biological process. However, it is still poorly understood how biological information is transmitted between different modules. We have developed information flow analysis, a new computational approach that identifies proteins central to the transmission of biological information throughout the network. In the information flow analysis, we represent an interactome network as an electrical circuit, where interactions are modeled as resistors and proteins as interconnecting junctions. Construing the propagation of biological signals as flow of electrical current, our method calculates an information flow score for every protein. Unlike previous metrics of network centrality such as degree or betweenness that only consider topological features, our approach incorporates confidence scores of protein–protein interactions and automatically considers all possible paths in a network when evaluating the importance of each protein. We apply our method to the interactome networks of Saccharomyces cerevisiae and Caenorhabditis elegans. We find that the likelihood of observing lethality and pleiotropy when a protein is eliminated is positively correlated with the protein's information flow score. Even among proteins of low degree or low betweenness, high information flow scores serve as a strong predictor of loss-of-function lethality or pleiotropy. The correlation between information flow scores and phenotypes supports our hypothesis that the proteins of high information flow reside in central positions in interactome networks. We also show that the ranks of information flow scores are more consistent than those of betweenness when a large amount of noisy data is added to an interactome.
Finally, we combine gene expression data with interaction data in C. elegans and construct an interactome network for muscle-specific genes. We find that genes that rank high in terms of information flow in the muscle interactome network but not in the entire network tend to play important roles in muscle function. This framework for studying tissue-specific networks by the information flow model can be applied to other tissues and other organisms as well.
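The circuit analogy can be illustrated with a small sketch: treat each interaction as a resistor whose conductance equals its confidence score, inject one unit of current at a source protein, withdraw it at a sink, solve for the node potentials, and add up the current passing through every protein over all source/sink pairs. This is an assumed reading of the abstract, not the authors' exact scoring formula; the function names, the conductance-equals-confidence choice, and the toy network are hypothetical, and the network is assumed to be connected.

```python
from itertools import combinations


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def information_flow(nodes, edges):
    """edges maps (u, v) to a confidence score, used here as conductance."""
    score = {v: 0.0 for v in nodes}
    for s, t in combinations(nodes, 2):
        keep = [v for v in nodes if v != t]          # sink t is grounded (0 V)
        pos = {v: i for i, v in enumerate(keep)}
        lap = [[0.0] * len(keep) for _ in keep]      # reduced graph Laplacian
        for (u, v), g in edges.items():
            for a, b in ((u, v), (v, u)):
                if a != t:
                    lap[pos[a]][pos[a]] += g
                    if b != t:
                        lap[pos[a]][pos[b]] -= g
        rhs = [0.0] * len(keep)
        rhs[pos[s]] = 1.0                            # unit current enters at s
        volts = solve(lap, rhs)
        potential = {v: volts[pos[v]] for v in keep}
        potential[t] = 0.0
        through = {v: 0.0 for v in nodes}
        for (u, v), g in edges.items():
            current = abs(g * (potential[u] - potential[v]))  # Ohm's law
            through[u] += current
            through[v] += current
        for v in nodes:
            # Source and sink carry the full unit; elsewhere inflow equals
            # outflow, so half the summed edge currents passes through.
            score[v] += 1.0 if v in (s, t) else through[v] / 2.0
    return score


# Toy chain A - B - C with equal confidence on both interactions.
flow = information_flow(["A", "B", "C"], {("A", "B"): 1.0, ("B", "C"): 1.0})
```

On the toy chain the middle protein receives the highest score because every path between the two ends passes through it.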

16.
A large volume of honey bee (Apis mellifera) tag-seq data was obtained to identify differential gene expression via Solexa/Illumina Digital Gene Expression tag profiling (DGE) based on next generation sequencing. In total, 4,286,250 (foragers) and 3,422,327 (nurses) clean tags were sequenced; 24,568 (foragers) and 13,134 (nurses) distinct clean tags could not be matched to the reference database; and 7508 and 6875 mapped genes were detected in foragers and nurses, respectively. 7045 genes were found to be differentially expressed between foragers and nurses. Of those genes, 1621 had significantly different expression, that is, they showed an expression ratio (foragers/nurses) of more than 2 and an FDR (False Discovery Rate) of less than 0.001. We identified 101 genes that were uniquely expressed in foragers, and 9 genes that were only expressed in nurses. We performed Gene Ontology (GO) category and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis, and found 415 genes with annotation terms linked to the GO cellular component category. 200 components of KEGG pathways were obtained, including 21 signaling pathways. The PPAR signaling pathway was the most highly enriched, with the lowest Q-value.
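The significance criterion quoted in this abstract (expression ratio above 2 and FDR below 0.001) amounts to a simple filter. The abstract does not say how the FDR was computed, so the sketch below assumes Benjamini-Hochberg adjustment of per-gene p-values; the function names, gene names, ratios and p-values are all made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_genes(genes, ratios, pvalues, min_ratio=2.0, max_fdr=0.001):
    """Keep genes whose ratio exceeds min_ratio and whose FDR is below max_fdr."""
    fdr = benjamini_hochberg(pvalues)
    return [g for g, r, q in zip(genes, ratios, fdr) if r > min_ratio and q < max_fdr]


# Hypothetical foragers/nurses ratios and raw p-values.
genes = ["geneA", "geneB", "geneC", "geneD"]
ratios = [3.5, 1.2, 8.0, 2.4]
pvalues = [1e-6, 1e-7, 2e-5, 0.03]
selected = significant_genes(genes, ratios, pvalues)
```

The filter follows the wording of the abstract literally, so it keeps only genes that are higher in foragers; a symmetric criterion on the absolute log2 ratio would be a different, also assumed, design.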

17.
Euphorbiaceae is a family of flowering plants of tropical and sub-tropical regions that is rich in secondary metabolites of economic importance. To understand and assess the genetic makeup of its members, this study was undertaken to characterize and compare SSR markers from publicly available ESTs and GSSs of nine selected species of the family. Mining of SSRs was performed with MISA and primer design with Primer3, while functional annotation, gene ontology (GO) and enrichment analysis were performed with Blast2GO. A total of 12,878 SSRs were detected from 101,701 EST sequences. SSR density ranged from 1 SSR/3.22 kb to 1 SSR/15.65 kb. A total of 1873 primer pairs were designed for the annotated SSR-Contigs. About 77.07% of SSR–ESTs could be assigned a significant match in the protein database. 3037 unique SSR–FDM were assigned, and IPR003657 (WRKY Domain) was found to be the most dominant FDM among the members. The 1810 unique GO terms obtained were further subjected to enrichment analysis, yielding 513 statistically significant GO terms mapped to the SSR-containing ESTs. The most frequent enriched GO terms were GO:0003824 for molecular function, GO:0006350 for biological process and GO:0005886 for cellular component, justifying the richness of defensive secondary metabolites and phytomedicine within the family. The results from this study provide tangible insight into the genetic make-up and distribution of SSRs. Functional annotation pointed to many genes of unknown function, which may be considered novel genes or genes responsible for stress-specific secondary metabolites. Further studies are required to understand the stress-specific genes accountable for leveraging the synthesis of secondary metabolites.
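SSR mining of the kind described here boils down to scanning sequences for short motifs repeated in tandem, and the reported density is total sequence length divided by the number of SSRs found. The sketch below is not MISA and does not reproduce its options; it is a regular-expression approximation in which the minimum repeat counts in `MIN_REPEATS`, the function names and the example sequence are all assumptions.

```python
import re

# Assumed minimum number of repeats per motif length (1 to 6 bp).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit."""
    n = len(motif)
    return not any(
        n % k == 0 and motif[:k] * (n // k) == motif for k in range(1, n)
    )


def find_ssrs(seq):
    """Return (start, motif, repeat_count) for each perfect SSR in seq."""
    seq = seq.upper()
    hits = []
    for unit, min_rep in MIN_REPEATS.items():
        # A motif of `unit` bases followed by at least min_rep - 1 copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):  # e.g. skip "AA", already counted as "A"
                hits.append((m.start(), motif, len(m.group(0)) // unit))
    return sorted(hits)


def kb_per_ssr(n_ssrs, total_bp):
    """Density expressed as kilobases of sequence per SSR."""
    return total_bp / 1000 / n_ssrs


# Made-up sequence with one dinucleotide and one mononucleotide repeat.
example = "TTC" + "AG" * 7 + "CCT" + "A" * 12 + "GTC"
ssrs = find_ssrs(example)
```

A real pipeline would also handle compound and imperfect repeats, which this sketch ignores.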

18.
19.
20.
The aim of this study was to explore the dysregulated expression of the immune system in pancreatic cancer and to clarify the pathogenesis of pancreatic cancer. The dataset GSE15471 was downloaded from the GEO database, and Student’s t test was used to screen for differentially expressed genes (DEGs) between the pancreatic cancer group and the normal control group. Functional annotation from the Kyoto Encyclopedia of Genes and Genomes (KEGG) was employed to explore the biological functions in which the significant DEGs are involved. We obtained 988 significant DEGs, including 832 up-regulated genes and 156 down-regulated genes, giving a ratio of up-regulated to down-regulated genes of 5.3. In total, 13 biological pathways were significantly enriched with DEGs according to KEGG pathway enrichment analysis. Finally, we constructed an overall network of the immune system in pancreatic cancer from this pathway information. Our study reveals that the dysregulated pathways in pancreatic cancer are associated with the immune system. In addition, we identify some important molecular biomarkers of pancreatic cancer, including CXCR4 and CD4. Dysfunctional pathways and important molecular biomarkers of pancreatic cancer will provide useful information for potential treatment of pancreatic cancer.
