Similar literature
 Found 20 similar articles (search time: 15 ms)
1.
ABSTRACT: BACKGROUND: Statistical analyses of whole genome expression data require functional information about genes in order to yield meaningful biological conclusions. The Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) are common sources of functionally grouped gene sets. For bacteria, the SEED and MicrobesOnline provide alternative, complementary sources of gene sets. To date, no comprehensive evaluation of the data obtained from these resources has been performed. RESULTS: We define a series of gene set consistency metrics directly related to the most common classes of statistical analyses for gene expression data, and then perform a comprehensive analysis of 3581 Affymetrix gene expression arrays across 17 diverse bacteria. We find that gene sets obtained from GO and KEGG demonstrate lower consistency than those obtained from the SEED and MicrobesOnline, regardless of gene set size. CONCLUSIONS: Despite the widespread use of GO and KEGG gene sets in bacterial gene expression data analysis, the SEED and MicrobesOnline provide more consistent sets for a wide variety of statistical analyses of such data. Increased use of the SEED and MicrobesOnline gene sets in the analysis of bacterial gene expression data may improve statistical power and utility of expression data.

2.
3.
Pathway analysis, also known as gene-set enrichment analysis, is a multilocus analytic strategy that integrates a priori biological knowledge into the statistical analysis of high-throughput genetics data. Originally developed for the studies of gene expression data, it has become a powerful analytic procedure for in-depth mining of genome-wide genetic variation data. Astonishing discoveries were made in the past years, uncovering genes and biological mechanisms underlying common and complex disorders. However, as massive amounts of diverse functional genomics data accrue, there is a pressing need for newer generations of pathway analysis methods that can utilize multiple layers of high-throughput genomics data. In this review, we provide an intellectual foundation of this powerful analytic strategy, as well as an update of the state-of-the-art in recent method developments. The goal of this review is threefold: (1) introduce the motivation and basic steps of pathway analysis for genome-wide genetic variation data; (2) review the merits and the shortcomings of classic and newly emerging integrative pathway analysis tools; and (3) discuss remaining challenges and future directions for further method developments.

4.
Schizophrenia is a complex genetic disorder. Gene set-based analytic (GSA) methods have been widely applied for exploratory analyses of large, high-throughput datasets, but less commonly employed for biological hypothesis testing. Our primary hypothesis is that variation in ion channel genes contributes to the genetic susceptibility to schizophrenia. We applied Exploratory Visual Analysis (EVA), one GSA application, to analyze European-American (EA) and African-American (AA) schizophrenia genome-wide association study datasets for statistical enrichment of ion channel gene sets, comparing GSA results derived under three SNP-to-gene mapping strategies: (1) GENIC; (2) 500-Kb; (3) 2.5-Mb and three complementary SNP-to-gene statistical reduction methods: (1) minimum p value (pMIN); (2) a novel method, proportion of SNPs per Gene with p values below a pre-defined α-threshold (PROP); and (3) the truncated product method (TPM). In the EA analyses, ion channel gene set(s) were enriched under all mapping and statistical approaches. In the AA analysis, ion channel gene set(s) were significantly enriched under pMIN for all mapping strategies and under PROP for broader mapping strategies. Less extensive enrichment in the AA sample may reflect true ethnic differences in susceptibility, sampling or case ascertainment differences, or higher dimensionality relative to sample size of the AA data. More consistent findings under broader mapping strategies may reflect enhanced power due to increased SNP inclusion, enhanced capture of effects over extended haplotypes or significant contributions from regulatory regions. While extensive pMIN findings may reflect gene size bias, the extent and significance of PROP and TPM findings suggest that common variation at ion channel genes may capture some of the heritability of schizophrenia.
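The three SNP-to-gene reduction statistics named in this abstract (pMIN, PROP, TPM) can be illustrated with a minimal sketch. This is not the EVA implementation: the function names, the default thresholds `alpha` and `tau`, and the example p values are all assumptions made for illustration, and the significance of each statistic would still have to be assessed separately (for TPM, against a reference distribution).

```python
import math

def p_min(p_values):
    """pMIN: the smallest SNP p value mapped to the gene."""
    return min(p_values)

def prop(p_values, alpha=0.05):
    """PROP: proportion of the gene's SNPs with p below alpha (assumed default)."""
    return sum(p < alpha for p in p_values) / len(p_values)

def tpm_statistic(p_values, tau=0.05):
    """TPM statistic: product of the p values that are <= tau.

    Returns 1.0 when no p value passes the truncation threshold.
    """
    return math.prod(p for p in p_values if p <= tau)

# Hypothetical p values for the SNPs mapped to one gene.
snp_p = [0.001, 0.02, 0.2, 0.6]
summary = (p_min(snp_p), prop(snp_p), tpm_statistic(snp_p))
```

Because pMIN takes a minimum over all mapped SNPs, it tends to decrease as more SNPs are mapped to a gene, which is one way to read the gene size bias mentioned in the abstract.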

5.
In DNA microarray studies, gene-set analysis (GSA) has become the focus of gene expression data analysis. GSA utilizes the gene expression profiles of functionally related gene sets in Gene Ontology (GO) categories or a priori defined biological classes to assess the significance of gene sets associated with clinical outcomes or phenotypes. Many statistical approaches have been proposed to determine whether such functionally related gene sets express differentially (enrichment and/or deletion) in variations of phenotypes. However, little attention has been given to the discriminatory power of gene sets and classification of patients.

6.
An integral part of functional genomics studies is to assess the enrichment of specific biological terms in lists of genes found to be playing an important role in biological phenomena. Contrasting the observed frequency of annotated terms with those of the background is at the core of overrepresentation analysis (ORA). Gene Ontology (GO) is a means to consistently classify and annotate gene products and has become a mainstay in ORA. Alternatively, Medical Subject Headings (MeSH) offers a comprehensive life science vocabulary including additional categories that are not covered by GO. Although MeSH is applied predominantly in human and model organism research, its full potential in livestock genetics is yet to be explored. In this study, MeSH ORA was evaluated to discern biological properties of identified genes and contrast them with the results obtained from GO enrichment analysis. Three published datasets were employed for this purpose, representing a gene expression study in dairy cattle, the use of SNPs for genome‐wide prediction in swine and the identification of genomic regions targeted by selection in horses. We found that several overrepresented MeSH annotations linked to these gene sets share similar concepts with those of GO terms. Moreover, MeSH yielded unique annotations, which are not directly provided by GO terms, suggesting that MeSH has the potential to refine and enrich the representation of biological knowledge. We demonstrated that MeSH can be regarded as another choice of annotation to draw biological inferences from genes identified via experimental analyses. When used in combination with GO terms, our results indicate that MeSH can enhance our functional interpretations for specific biological conditions or the genetic basis of complex traits in livestock species.
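The core ORA comparison described above, observed term frequency in a gene list against the background, is commonly computed as a one-sided hypergeometric (Fisher-type) test. The sketch below assumes that test; the abstract does not state which test the authors used, and the function name and parameters are hypothetical.

```python
from math import comb

def ora_p_value(n_background, n_annotated, n_selected, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_background: genes in the background set
    n_annotated:  background genes annotated with the term
    n_selected:   genes in the list of interest
    n_overlap:    selected genes annotated with the term
    """
    upper = min(n_annotated, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_annotated, k) * comb(n_background - n_annotated, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical counts: 40 of 200 selected genes carry a term that
# annotates 500 of 10000 background genes.
p = ora_p_value(10000, 500, 200, 40)
```

In practice one such test is run per term (GO or MeSH), so the resulting p values would normally be adjusted for multiple testing before interpretation.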

7.
REVIGO summarizes and visualizes long lists of gene ontology terms
Outcomes of high-throughput biological experiments are typically interpreted by statistical testing for enriched gene functional categories defined by the Gene Ontology (GO). The resulting lists of GO terms may be large and highly redundant, and thus difficult to interpret. REVIGO is a Web server that summarizes long, unintelligible lists of GO terms by finding a representative subset of the terms using a simple clustering algorithm that relies on semantic similarity measures. Furthermore, REVIGO visualizes this non-redundant GO term set in multiple ways to assist in interpretation: multidimensional scaling and graph-based visualizations accurately render the subdivisions and the semantic relationships in the data, while treemaps and tag clouds are also offered as alternative views. REVIGO is freely available at http://revigo.irb.hr/.

8.

Background

Commonalities between large sets of genes obtained from high-throughput experiments are often identified by searching for enrichments of genes with the same Gene Ontology (GO) annotations. The GO analysis tools used for these enrichment analyses assume that GO terms are independent and the semantic distances between all parent–child terms are identical, which is not true in a biological sense. In addition, these tools output lists of often redundant or too specific GO terms, which are difficult to interpret in the context of the biological question investigated by the user. Therefore, there is a demand for a robust and reliable method for gene categorization and enrichment analysis.

Results

We have developed Categorizer, a tool that classifies genes into user-defined groups (categories) and calculates p-values for the enrichment of the categories. Categorizer identifies the biologically best-fit category for each gene by taking advantage of a specialized semantic similarity measure for GO terms. We demonstrate that Categorizer provides improved categorization and enrichment results of genetic modifiers of Huntington’s disease compared to a classical GO Slim-based approach or categorizations using other semantic similarity measures.

Conclusion

Categorizer enables more accurate categorizations of genes than currently available methods. This new tool will help experimental and computational biologists analyze genomic and proteomic data according to their specific needs in a more reliable manner.

9.
SNP-based gene-set enrichment analysis from single nucleotide polymorphisms, or GSEA-SNP, is a tool to identify candidate genes based on enrichment analysis of sets of genes rather than single SNP associations. The objective of this study was to identify modest-effect genes associated with Mycobacterium avium subsp. paratuberculosis (Map) tissue infection or fecal shedding using GSEA-SNP applied to KEGG pathways or Gene Ontology (GO) gene sets. The Illumina Bovine SNP50 BeadChip was used to genotype 209 Holstein cows for the GSEA-SNP analyses. For each of 13,744 annotated genes genome-wide located within 50 kb of a Bovine SNP50 SNP, the single SNP with the highest Cochran-Armitage Max statistic was used as a proxy statistic for that gene’s strength of affiliation with Map. Gene-set enrichment was tested using a weighted Kolmogorov-Smirnov-like running sum statistic with data permutation to adjust for multiple testing. For tissue infection and fecal shedding, no gene sets in KEGG pathways or in GO sets for molecular function or cellular component were enriched for signal. The GO biological process gene set for positive regulation of cell motion (GO:0051272, q = 0.039, 5/11 genes contributing to the core enrichment) was enriched for Map tissue infection, while no GO biological process gene sets were enriched for fecal shedding. GSEA-SNP complements traditional SNP association approaches to identify genes of modest effects as well as genes with larger effects as demonstrated by the identification of one locus that we previously found to be associated with Map tissue infection using a SNP-by-SNP genome-wide association study.
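The weighted Kolmogorov-Smirnov-like running sum statistic mentioned above can be sketched as follows. This follows the usual GSEA formulation (step up by a score-weighted amount at each gene in the set, step down by a constant at each gene outside it) and is an assumption about the details, not the authors' code; the function name, the `weight` exponent, and the toy data are hypothetical. The permutation step used to assess significance is omitted.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Signed maximum deviation from zero of the weighted running sum.

    ranked_genes: genes sorted by decreasing association statistic
    scores:       dict mapping gene -> association statistic
    gene_set:     set of genes forming the pathway or GO term
    """
    hits = [g in gene_set for g in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    hit_weights = [
        abs(scores[g]) ** weight if h else 0.0
        for g, h in zip(ranked_genes, hits)
    ]
    total = sum(hit_weights)
    if total == 0 or n_miss == 0:
        raise ValueError("need weighted hits and at least one gene outside the set")
    running, best = 0.0, 0.0
    for w, h in zip(hit_weights, hits):
        # Step up at set members, step down at non-members.
        running += w / total if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

# Toy example with hypothetical per-gene proxy statistics.
genes = ["a", "b", "c", "d"]
stats = {"a": 4.0, "b": 3.0, "c": 2.0, "d": 1.0}
es_top = enrichment_score(genes, stats, {"a", "b"})
es_bottom = enrichment_score(genes, stats, {"c", "d"})
```

A set concentrated at the top of the ranking yields a score near +1 and a set concentrated at the bottom a score near -1.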

10.
11.
The interpretation of data-driven experiments in genomics often involves a search for biological categories that are enriched for the responder genes identified by the experiments. However, knowledge bases such as the Gene Ontology (GO) contain hundreds or thousands of categories with very high overlap between categories. Thus, enrichment analysis performed on one category at a time frequently returns large numbers of correlated categories, leaving the choice of the most relevant ones to the user's interpretation. Here we present model-based gene set analysis (MGSA) that analyzes all categories at once by embedding them in a Bayesian network, in which gene response is modeled as a function of the activation of biological categories. Probabilistic inference is used to identify the active categories. The Bayesian modeling approach naturally takes category overlap into account and avoids the need for multiple testing corrections met in single-category enrichment analysis. On simulated data, MGSA identifies active categories with up to 95% precision at a recall of 20% for moderate settings of noise, leading to a 10-fold precision improvement over single-category statistical enrichment analysis. Application to a gene expression data set in yeast demonstrates that the method provides high-level, summarized views of core biological processes and correctly eliminates confounding associations.

12.
An understanding of heart development is critical in any systems biology approach to cardiovascular disease. The interpretation of data generated from high-throughput technologies (such as microarray and proteomics) is also essential to this approach. However, characterizing the role of genes in the processes underlying heart development and cardiovascular disease involves the non-trivial task of data analysis and integration of previous knowledge. The Gene Ontology (GO) Consortium provides structured controlled biological vocabularies that are used to summarize previous functional knowledge for gene products across all species. One aspect of GO describes biological processes, such as development and signaling. In order to support high-throughput cardiovascular research, we have initiated an effort to fully describe heart development in GO, expanding the number of GO terms describing heart development from 12 to over 280. This new ontology describes heart morphogenesis, the differentiation of specific cardiac cell types, and the involvement of signaling pathways in heart development. This work also aligns GO with the current views of the heart development research community and its representation in the literature. This extension of GO allows gene product annotators to comprehensively capture the genetic program leading to the developmental progression of the heart. This will enable users to integrate heart development data across species, resulting in the comprehensive retrieval of information about this subject. The revised GO structure, combined with gene product annotations, should improve the interpretation of data from high-throughput methods in a variety of cardiovascular research areas, including heart development, congenital cardiac disease, and cardiac stem cell research. Additionally, we invite the heart development community to contribute to the expansion of this important dataset for the benefit of future research in this area.

13.
Genomics, 2023, 115(1): 110528
Functional enrichment analysis is a cornerstone in bioinformatics as it makes it possible to identify functional information by using a gene list as source. Different tools are available to compare gene ontology (GO) terms, based on a directed acyclic graph structure or content-based algorithms which are time-consuming and require a priori information of GO terms. Nevertheless, quantitative procedures to compare GO terms among gene lists and species are not available. Here we present a computational procedure, implemented in R, to infer functional information derived from comparative strategies. GOCompare provides a framework for functional comparative genomics starting from comparable lists of GO terms. The program uses functional enrichment analysis (FEA) results and implements graph theory to identify statistically relevant GO terms for both GO categories and analyzed species. Thus, GOCompare allows finding new functional information complementing current FEA approaches and extending their use to a comparative perspective. To test our approach, GO terms were obtained for a list of aluminum tolerance-associated genes in Oryza sativa subsp. japonica and their orthologues in Arabidopsis thaliana. GOCompare was able to detect functional similarities for reactive oxygen species and ion binding capabilities which are common in plants as molecular mechanisms to tolerate aluminum toxicity. Consequently, the R package exhibited a good performance when implemented in complex datasets, making it possible to establish hypotheses that might explain a biological process from a functional perspective, and narrowing down the possible landscapes to design wet lab experiments.

14.

Background  

Large-scale statistical analyses have become hallmarks of post-genomic era biological research due to advances in high-throughput assays and the integration of large biological databases. One accompanying issue is the simultaneous estimation of p-values for a large number of hypothesis tests. In many applications, a parametric assumption in the null distribution such as normality may be unreasonable, and resampling-based p-values are the preferred procedure for establishing statistical significance. Using resampling-based procedures for multiple testing is computationally intensive and typically requires large numbers of resamples.
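A resampling-based p-value of the kind discussed above can be sketched with a plain two-sample permutation test. This is a generic illustration, not the method proposed in the article: the choice of test statistic (absolute difference in means), the function name, and the example data are assumptions.

```python
import random

def permutation_p_value(group_a, group_b, n_resamples=10_000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)

    def mean_diff(a, b):
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = mean_diff(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_resamples):
        # Reassign group labels at random and recompute the statistic.
        rng.shuffle(pooled)
        if mean_diff(pooled[:n_a], pooled[n_a:]) >= observed:
            extreme += 1
    # The +1 terms count the observed labelling, so p is never exactly 0.
    return (extreme + 1) / (n_resamples + 1)

# Hypothetical expression values for one gene in two conditions.
p = permutation_p_value([0, 1, 2, 3, 4], [100, 101, 102, 103, 104], n_resamples=2000)
```

The smallest attainable value is 1 / (n_resamples + 1), so reaching the very small thresholds imposed by multiple-testing corrections requires a correspondingly large number of resamples per test, which is the computational burden the abstract refers to.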

15.

Background  

Several tools have been developed to explore and search Gene Ontology (GO) databases allowing efficient GO enrichment analysis and GO tree visualization. Nevertheless, identification of highly specific GO-terms in complex data sets is relatively complicated and the display of GO term assignments and GO enrichment analysis by simple tables or pie charts is not optimal. Valuable information such as the hierarchical position of a single GO term within the GO tree (topological ordering), or enrichment within a complex set of biological experiments is not displayed. Pie charts based on GO tree levels are, themselves, one-dimensional graphs, which cannot properly or efficiently represent the hierarchical specificity for the biological system being studied.

16.
GOAT     
Understanding the composition of gene lists that result from high-throughput experiments requires elaborate processing of gene annotation lists. In this article we present GOAT (Gene Ontology Analysis Tool), a tool based on the statistical software 'R' for analysing Gene Ontology™ (GO) term enrichment in gene lists. Given a gene list, GOAT calculates the enrichment and statistical significance of every GO term and generates graphical presentations of significantly enriched terms. GOAT works for any organism with a genome-scale GO annotation and allows easy updates of ontologies and annotations. AVAILABILITY: GOAT is freely available from http://dictygenome.org/software/GOAT/ CONTACT: Gad Shaulsky (gadi@bcm.tmc.edu).

17.
Ellagic acid (EA) is a natural polyphenolic compound. Recent studies have shown that EA has potential anticancer properties against gastric cancer (GC). This study aims to reveal the potential targets and mechanisms of EA against GC. This study adopted methods of bioinformatics analysis and network pharmacology, including the weighted gene co-expression network analysis (WGCNA), construction of protein–protein interaction (PPI) network, receiver operating characteristic (ROC) and Kaplan–Meier (KM) survival curve analysis, Gene Ontology (GO) function and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis, molecular docking and molecular dynamics simulations (MDS). A total of 540 EA targets were obtained. Through WGCNA, we obtained a total of 2914 GC clinical module genes, combined with the disease database for screening, a total of 606 GC-related targets and 79 intersection targets of EA and GC were obtained by constructing Venn diagram. PPI network was constructed to identify 14 core candidate targets; TP53, JUN, CASP3, HSP90AA1, VEGFA, HRAS, CDH1, MAPK3, CDKN1A, SRC, CYCS, BCL2L1 and CDK4 were identified as the key targets of EA regulation of GC by ROC and KM curve analysis. The enrichment analysis of GO and KEGG pathways of key targets was performed, and they were mainly enriched in p53 signalling pathway, PI3K-Akt signalling pathway. The results of molecular docking and MDS showed that EA could effectively bind to 13 key targets to form stable protein–ligand complexes. This study revealed the key targets and molecular mechanisms of EA against GC and provided a theoretical basis for further study of the pharmacological mechanism of EA against GC.

18.
Research in human-associated microbiomes often involves the analysis of taxonomic count tables generated via high-throughput sequencing. It is difficult to apply statistical tools as the data is high-dimensional, sparse, and compositional. An approachable way to alleviate high-dimensionality and sparsity is to aggregate variables into pre-defined sets. Set-based analysis is ubiquitous in the genomics literature and has demonstrable impact on improving interpretability and power of downstream analysis. Unfortunately, there is a lack of sophisticated set-based analysis methods specific to microbiome taxonomic data, where current practice often employs abundance summation as a technique for aggregation. This approach prevents comparison across sets of different sizes, does not preserve inter-sample distances, and amplifies protocol bias. Here, we attempt to fill this gap with a new single-sample taxon enrichment method that uses a novel log-ratio formulation based on the competitive null hypothesis commonly used in the enrichment analysis literature. Our approach, titled competitive balances for taxonomic enrichment analysis (CBEA), generates sample-specific enrichment scores as the scaled log-ratio of the subcomposition defined by taxa within a set and the subcomposition defined by its complement. We provide sample-level significance testing by estimating an empirical null distribution of our test statistic with valid p-values. Herein, we demonstrate, using both real data applications and simulations, that CBEA controls for type I error, even under high sparsity and high inter-taxa correlation scenarios. Additionally, CBEA provides informative scores that can be inputs to downstream analyses such as prediction tasks.
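One way to read the "scaled log-ratio" of a taxon set against its complement is as a compositional balance: the log-ratio of the two geometric means, multiplied by a factor that depends on the set sizes. The sketch below assumes that reading; the abstract gives no formula, so the geometric-mean summary, the scaling factor, the pseudocount used to handle zeros, and all names are assumptions rather than the CBEA implementation. Estimation of the empirical null distribution is omitted.

```python
import math

def balance_score(counts, taxa_in_set, pseudocount=1.0):
    """Scaled log-ratio of geometric means: taxa in the set vs. the rest.

    counts:      dict mapping taxon -> count for a single sample
    taxa_in_set: set of taxa defining the set of interest
    pseudocount: added to every count to avoid log(0) (an assumption)
    """
    inside = [math.log(c + pseudocount) for t, c in counts.items() if t in taxa_in_set]
    outside = [math.log(c + pseudocount) for t, c in counts.items() if t not in taxa_in_set]
    if not inside or not outside:
        raise ValueError("both the set and its complement must be non-empty")
    n_in, n_out = len(inside), len(outside)
    scale = math.sqrt(n_in * n_out / (n_in + n_out))
    # The mean of logs is the log of the geometric mean.
    return scale * (sum(inside) / n_in - sum(outside) / n_out)

# Hypothetical single-sample counts for four taxa.
sample = {"taxon_a": 10, "taxon_b": 10, "taxon_c": 1, "taxon_d": 1}
score = balance_score(sample, {"taxon_a", "taxon_b"}, pseudocount=0.0)
```

Unlike abundance summation, a ratio of geometric means does not grow merely because a set contains more taxa, which matches the abstract's concern about comparing sets of different sizes.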

19.
20.
Advanced statistical methods used to analyze high-throughput data such as gene-expression assays result in long lists of “significant genes.” One way to gain insight into the significance of altered expression levels is to determine whether Gene Ontology (GO) terms associated with a particular biological process, molecular function, or cellular component are over- or under-represented in the set of genes deemed significant. This process, referred to as enrichment analysis, profiles a gene-set, and is widely used to make sense of the results of high-throughput experiments. The canonical example of enrichment analysis is when the output dataset is a list of genes differentially expressed in some condition. To determine the biological relevance of a lengthy gene list, the usual solution is to perform enrichment analysis with the GO. We can aggregate the annotating GO concepts for each gene in this list, and arrive at a profile of the biological processes or mechanisms affected by the condition under study. While GO has been the principal target for enrichment analysis, the methods of enrichment analysis are generalizable. We can conduct the same sort of profiling along other ontologies of interest. Just as scientists can ask “Which biological process is over-represented in my set of interesting genes or proteins?” we can also ask “Which disease (or class of diseases) is over-represented in my set of interesting genes or proteins?” For example, by annotating known protein mutations with disease terms from the ontologies in BioPortal, Mort et al. recently identified a class of diseases (blood coagulation disorders) that were associated with a 14-fold depletion in substitutions at O-linked glycosylation sites. With the availability of tools for automatic annotation of datasets with terms from disease ontologies, there is no reason to restrict enrichment analyses to the GO.
In this chapter, we will discuss methods to perform enrichment analysis using any ontology available in the biomedical domain. We will review the general methodology of enrichment analysis, the associated challenges, and discuss the novel translational analyses enabled by the existence of public, national computational infrastructure and by the use of disease ontologies in such analyses.

What to Learn in This Chapter

  • Review the commonly used approach of Gene Ontology based enrichment analysis
  • Understand the pitfalls associated with current approaches
  • Understand the national infrastructure available for using alternative ontologies for enrichment analysis
  • Learn about a generalized enrichment analysis workflow and its application using disease ontologies
This article is part of the “Translational Bioinformatics” collection for PLOS Computational Biology.
