Similar literature
1.
GOAT     
Understanding the composition of gene lists that result from high-throughput experiments requires elaborate processing of gene annotation lists. In this article we present GOAT (Gene Ontology Analysis Tool), a tool based on the statistical software 'R' for analysing Gene Ontology™ (GO) term enrichment in gene lists. Given a gene list, GOAT calculates the enrichment and statistical significance of every GO term and generates graphical presentations of significantly enriched terms. GOAT works for any organism with a genome-scale GO annotation and allows easy updates of ontologies and annotations. AVAILABILITY: GOAT is freely available from http://dictygenome.org/software/GOAT/ CONTACT: Gad Shaulsky (gadi@bcm.tmc.edu).

2.
An integral part of functional genomics studies is to assess the enrichment of specific biological terms in lists of genes found to be playing an important role in biological phenomena. Contrasting the observed frequency of annotated terms with those of the background is at the core of overrepresentation analysis (ORA). Gene Ontology (GO) is a means to consistently classify and annotate gene products and has become a mainstay in ORA. Alternatively, Medical Subject Headings (MeSH) offers a comprehensive life science vocabulary including additional categories that are not covered by GO. Although MeSH is applied predominantly in human and model organism research, its full potential in livestock genetics is yet to be explored. In this study, MeSH ORA was evaluated to discern biological properties of identified genes and contrast them with the results obtained from GO enrichment analysis. Three published datasets were employed for this purpose, representing a gene expression study in dairy cattle, the use of SNPs for genome‐wide prediction in swine and the identification of genomic regions targeted by selection in horses. We found that several overrepresented MeSH annotations linked to these gene sets share similar concepts with those of GO terms. Moreover, MeSH yielded unique annotations, which are not directly provided by GO terms, suggesting that MeSH has the potential to refine and enrich the representation of biological knowledge. We demonstrated that MeSH can be regarded as another choice of annotation to draw biological inferences from genes identified via experimental analyses. When used in combination with GO terms, our results indicate that MeSH can enhance our functional interpretations for specific biological conditions or the genetic basis of complex traits in livestock species.
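
The core comparison behind overrepresentation analysis, observed term frequency in a gene list versus the background, is commonly carried out with a one-sided hypergeometric test. The sketch below is illustrative only and is not taken from the study above: the function names, the gene identifiers and the term label are all hypothetical, and the abstract does not state which test the authors used.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    total = comb(N, n)
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def over_representation(study, background, term_to_genes):
    """Return {term: p-value} for every term, GO or MeSH alike.

    study and background are sets of gene identifiers; term_to_genes maps
    each annotation term to the set of genes annotated with it.
    """
    background = set(background)
    study = set(study) & background
    p_values = {}
    for term, genes in term_to_genes.items():
        annotated = set(genes) & background
        k = len(study & annotated)
        p_values[term] = hypergeom_upper_tail(
            k, len(annotated), len(study), len(background)
        )
    return p_values


# Toy data (hypothetical): 20 background genes, 5 of them annotated.
toy_background = {f"g{i}" for i in range(1, 21)}
toy_terms = {"term_A": {"g1", "g2", "g3", "g4", "g5"}}
toy_study = {"g1", "g2", "g10", "g11"}
toy_p = over_representation(toy_study, toy_background, toy_terms)
```

Because the function only needs a term-to-genes mapping, the same code applies to GO and MeSH annotations. In practice the resulting p-values would also need a multiple-testing correction, which this sketch leaves out.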

3.
With numerous whole genomes now in hand, and experimental data about genes and biological pathways on the increase, a systems approach to biological research is becoming essential. Ontologies provide a formal representation of knowledge that is amenable to computational as well as human analysis, an obvious underpinning of systems biology. Mapping function to gene products in the genome consists of two, somewhat intertwined enterprises: ontology building and ontology annotation. Ontology building is the formal representation of a domain of knowledge; ontology annotation is association of specific genomic regions (which we refer to simply as 'genes', including genes and their regulatory elements and products such as proteins and functional RNAs) to parts of the ontology. We consider two complementary representations of gene function: the Gene Ontology (GO) and pathway ontologies. GO represents function from the gene's eye view, in relation to a large and growing context of biological knowledge at all levels. Pathway ontologies represent function from the point of view of biochemical reactions and interactions, which are ordered into networks and causal cascades. The more mature GO provides an example of ontology annotation: how conclusions from the scientific literature and from evolutionary relationships are converted into formal statements about gene function. Annotations are made using a variety of different types of evidence, which can be used to estimate the relative reliability of different annotations.

4.
OntoBlast allows one to find information about potential functions of proteins by presenting a weighted list of ontology entries associated with similar sequences from completely sequenced genomes identified in a BLAST search. It combines, in a single analysis step, the search for sequence similarities in several species with the association of information stored in ontologies. From each identified ontology term a list of genes, which share the functional annotation, can be retrieved. The OntoBlast function is an integral part of the 'Ontologies TO GenomeMatrix' tool which provides an alternative entry point from ontology terms to the Genome-Matrix database. OntoBlast's web interface is accessible on the 'Ontologies TO GenomeMatrix Gate' page at http://functionalgenomics.de/ontogate/.

5.
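As more and more genomes are sequenced, large amounts of sequence information become available, and using this information to predict the functions of unknown genes is an important problem. BLAST is the basic tool for predicting the function of new genes, but the raw BLAST search results alone do not provide the associated Gene Ontology (GO) annotation. At present, to obtain GO annotations for a new gene, a user must first run a BLAST search and then use the search results to look up the related GO annotations on the GO website. This wastes a great deal of time, especially when the BLAST output is large. We therefore developed the GoBlast software in Perl, based on the GO classification system, integrating BLAST result information and using BioPerl modules. With GoBlast, researchers need only a single analysis run to obtain both the BLAST search results and the GO annotations for a new gene, which effectively improves the reliability of gene function annotation and accelerates functional genomics research. GoBlast uses a B/S (Browser/Server) architecture: any user with a web browser can access the GoBlast system over the Internet at http://bioq.org/goblast.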
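
The step of joining BLAST hits to GO annotations in a single pass can be pictured with the short sketch below. It is not GoBlast's code (GoBlast is written in Perl with BioPerl). It assumes BLAST's 12-column tabular output, in which the query ID, subject ID and E-value are columns 1, 2 and 11. The function name, the E-value cut-off, the sequence IDs and the GO-style labels are hypothetical.

```python
def go_terms_for_hits(blast_tabular, subject_to_go, max_evalue=1e-5):
    """Map each query to the GO terms of its sufficiently similar subjects.

    blast_tabular: text in 12-column tab-separated BLAST format (assumed).
    subject_to_go: dict from subject sequence ID to a set of GO term IDs.
    """
    query_to_go = {}
    for line in blast_tabular.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue <= max_evalue:
            terms = subject_to_go.get(subject, ())
            query_to_go.setdefault(query, set()).update(terms)
    return query_to_go


# Toy input (hypothetical IDs): two hits for one query, one too weak to count.
toy_blast = (
    "q1\tsubjA\t98.0\t200\t4\t0\t1\t200\t1\t200\t1e-30\t350\n"
    "q1\tsubjB\t35.0\t80\t52\t3\t1\t80\t10\t90\t0.5\t30\n"
)
toy_annotations = {"subjA": {"GO:term_x", "GO:term_y"}, "subjB": {"GO:term_z"}}
toy_result = go_terms_for_hits(toy_blast, toy_annotations)
```

Filtering on E-value before transferring annotations is a design choice of this sketch; the abstract does not say how GoBlast weights or filters hits.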

6.
Next‐generation technologies generate an overwhelming amount of gene sequence data. Efficient annotation tools are required to make these data amenable to functional genomics analyses. The Mercator pipeline automatically assigns functional terms to protein or nucleotide sequences. It uses the MapMan ‘BIN’ ontology, which is tailored for functional annotation of plant ‘omics’ data. The classification procedure performs parallel sequence searches against reference databases, compiles the results and computes the most likely MapMan BINs for each query. In the current version, the pipeline relies on manually curated reference classifications originating from the three reference organisms (Arabidopsis, Chlamydomonas, rice), various other plant species that have a reviewed SwissProt annotation, and more than 2000 protein domain and family profiles at InterPro, CDD and KOG. Functional annotations predicted by Mercator achieve accuracies above 90% when benchmarked against manual annotation. In addition to mapping files for direct use in the visualization software MapMan, Mercator provides graphical overview charts, detailed annotation information in a convenient web browser interface and a MapMan‐to‐GO translation table to export results as GO terms. Mercator is available free of charge via http://mapman.gabipd.org/web/guest/app/Mercator .

7.
A system for "intelligent" semantic integration and querying of federated databases is being implemented by using three main components: a component which enables SQL access to integrated databases by database federation (MARGBench), an ontology based semantic metadatabase (SEMEDA) and an ontology based query interface (SEMEDA-query). In this publication we explain and demonstrate the principles, architecture and the use of SEMEDA. Since SEMEDA is implemented as a 3-tiered web application, database providers can enter all relevant semantic and technical information about their databases by themselves via a web browser. SEMEDA's collaborative ontology editing feature is not restricted to database integration, and might also be useful for ongoing ontology developments, such as the "Gene Ontology" [2]. SEMEDA can be found at http://www-bm.cs.uni-magdeburg.de/semeda/. We explain how this ontologically structured information can be used for semantic database integration. In addition, requirements to ontologies for molecular biological database integration are discussed and relevant existing ontologies are evaluated. We further discuss how ontologies and structured knowledge sources can be used in SEMEDA and whether they can be merged, supplemented or updated to meet the requirements for semantic database integration.

8.
MOTIVATION: There has been an explosion of interest in the role of mitochondria in programmed cell death and other fundamental pathological processes underlying the development of human diseases. Nevertheless, the inventory of mitochondrial proteins encoded in the nuclear genome remains incomplete, providing an impediment to mitochondrial research at the interface with systems biology. We created the MiGenes database to further define the scope of the mitochondrial proteome in humans and model organisms including mice, rats, flies and worms as well as budding and fission yeasts. MiGenes is intended to stimulate mitochondrial research using model organisms. SUMMARY: MiGenes is a large-scale relational database that is automatically updated to keep pace with advances in mitochondrial proteomics and is curated to assure that the designation of proteins as mitochondrial reflects gene ontology (GO) annotations supported by high-quality evidence codes. A set of postulates is proposed to help define which proteins are authentic components of mitochondria. MiGenes incorporates >1160 new GO annotations to human, mouse and rat protein records, 370 of which represent the first GO annotation reflecting a mitochondrial localization. MiGenes employs a flexible search interface that permits batchwise accession number searches to support high-throughput proteomic studies. A web interface is provided to permit members of the mitochondrial research community to suggest modifications in protein annotations or mitochondrial status.

9.
The Gene Ontology (GO) project provides a controlled vocabulary to facilitate high-quality functional gene annotation for all species. Genes in biological databases are linked to GO terms, allowing biologists to ask questions about gene function in a manner independent of species. This tutorial provides an introduction for biologists to the GO resources and covers three of the most common methods of querying GO: by individual gene, by gene function and by using a list of genes. [For the sake of brevity, the term 'gene' is used throughout this paper to refer to genes and their products (proteins and RNAs). GO annotations are always based on the characteristics of gene products, even though it may be the gene that is cited in the annotation.]

10.

Background

Biomedical ontologies are increasingly instrumental in the advancement of biological research primarily through their use to efficiently consolidate large amounts of data into structured, accessible sets. However, ontology development and usage can be hampered by the segregation of knowledge by domain that occurs due to independent development and use of the ontologies. The ability to infer data associated with one ontology to data associated with another ontology would prove useful in expanding information content and scope. We here focus on relating two ontologies: the Gene Ontology (GO), which encodes canonical gene function, and the Mammalian Phenotype Ontology (MP), which describes non-canonical phenotypes, using statistical methods to suggest GO functional annotations from existing MP phenotype annotations. This work is in contrast to previous studies that have focused on inferring gene function from phenotype primarily through lexical or semantic similarity measures.

Results

We have designed and tested a set of algorithms that represents a novel methodology to define rules for predicting gene function by examining the emergent structure and relationships between the gene functions and phenotypes rather than inspecting the terms semantically. The algorithms inspect relationships among multiple phenotype terms to deduce if there are cases where they all arise from a single gene function. We apply this methodology to data about genes in the laboratory mouse that are formally represented in the Mouse Genome Informatics (MGI) resource. From the data, 7444 rule instances were generated from five generalized rules, resulting in 4818 unique GO functional predictions for 1796 genes.

Conclusions

We show that our method is capable of inferring high-quality functional annotations from curated phenotype data. As well as creating inferred annotations, our method has the potential to allow for the elucidation of unforeseen, biologically significant associations between gene function and phenotypes that would be overlooked by a semantics-based approach. Future work will include the implementation of the described algorithms for a variety of other model organism databases, taking full advantage of the abundance of available high quality curated data.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0405-z) contains supplementary material, which is available to authorized users.

11.
Microarray technology has become widely employed by biological researchers to identify genes associated with conditions such as diseases and drugs. To date, many methods have been developed to analyze data covering a large number of genes, but they focus only on statistical significance and cannot interpret the data in terms of biological concepts. Gene Ontology (GO) is used to interpret such data biologically; however, it is restricted to specific ontologies, namely biological process, molecular function, and cellular component. Here, we attempted to apply MeSH (Medical Subject Headings) to interpret groups of genes from a biological viewpoint. To assign MeSH terms to genes in this study, contexts associated with genes were retrieved from the full set of MEDLINE data using machine learning, and MeSH terms were then extracted from the retrieved articles. Using the developed method, we implemented a software tool called BioCompass. It generates high-scoring lists and hierarchical lists of disease MeSH terms associated with groups of genes, making use of the MeSH and GO trees, and draws a wiring diagram that links genes through associations extracted from articles. Researchers can easily retrieve genes and keywords of interest, such as diseases and drugs, associated with groups of genes. Using the retrieved MeSH terms in conjunction with OMIM, we could obtain more disease information associated with a target gene. BioCompass helps researchers to interpret groups of genes, such as those from microarray data, from a biological viewpoint.

12.
The Gene Ontology Categorizer, developed jointly by the Los Alamos National Laboratory and Procter & Gamble Corp., provides a capability for the categorization task in the Gene Ontology (GO): given a list of genes of interest, what are the best nodes of the GO to summarize or categorize that list? The motivating question is from a drug discovery process, where after some gene expression analysis experiment, we wish to understand the overall effect of some cell treatment or condition by identifying 'where' in the GO the differentially expressed genes fall: 'clustered' together in one place? in two places? uniformly spread throughout the GO? 'high', or 'low'? In order to address this need, we view bio-ontologies more as combinatorially structured databases than facilities for logical inference, and draw on the discrete mathematics of finite partially ordered sets (posets) to develop data representation and algorithms appropriate for the GO. In doing so, we have laid the foundations for a general set of methods to address not just the categorization task, but also other tasks (e.g. distances in ontologies and ontology merger and exchange) in both the GO and other bio-ontologies (such as the Enzyme Commission database or the Medical Subject Headings) cast as hierarchically structured taxonomic knowledge systems.

13.
This research analyzes some aspects of the relationship between gene expression, gene function, and gene annotation. Many recent studies are implicitly based on the assumption that gene products that are biologically and functionally related would maintain this similarity both in their expression profiles as well as in their gene ontology (GO) annotation. We analyze how accurate this assumption proves to be using real publicly available data. We also aim to validate a measure of semantic similarity for GO annotation. We use the Pearson correlation coefficient and its absolute value as a measure of similarity between expression profiles of gene products. We explore a number of semantic similarity measures (Resnik, Jiang, and Lin) and compute the similarity between gene products annotated using the GO. Finally, we compute correlation coefficients to compare gene expression similarity against GO semantic similarity. Our results suggest that the Resnik similarity measure outperforms the others and seems better suited for use in gene ontology. We also deduce that there seems to be correlation between semantic similarity in the GO annotation and gene expression for the three GO ontologies. We show that this correlation is negligible up to a certain semantic similarity value; then, for higher similarity values, the relationship trend becomes almost linear. These results can be used to augment the knowledge provided by clustering algorithms and in the development of bioinformatic tools for finding and characterizing gene products.
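
The two kinds of similarity compared in this abstract can be sketched on a toy ontology as follows. Resnik similarity is the information content (IC = -log p) of the most informative common ancestor of two terms, Lin similarity rescales it by the terms' own IC, and expression similarity is the Pearson correlation. The term names, probabilities and helper functions are hypothetical, and the sketch compares single terms only, not whole gene products as in the study.

```python
import math


def ancestors(term, parents):
    """Return the term together with all of its ancestors in the DAG."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def resnik(t1, t2, parents, ic):
    """IC of the most informative common ancestor (0.0 if there is none)."""
    common = ancestors(t1, parents) & ancestors(t2, parents)
    return max((ic[t] for t in common), default=0.0)


def lin(t1, t2, parents, ic):
    """Lin similarity: 2 * IC(common ancestor) / (IC(t1) + IC(t2))."""
    denom = ic[t1] + ic[t2]
    return 2.0 * resnik(t1, t2, parents, ic) / denom if denom else 0.0


def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy ontology (hypothetical): root -> a -> {b, c}, with annotation
# probabilities p(term) that include annotations to descendants.
toy_parents = {"a": ["root"], "b": ["a"], "c": ["a"]}
toy_prob = {"root": 1.0, "a": 0.5, "b": 0.25, "c": 0.25}
toy_ic = {t: -math.log(p) for t, p in toy_prob.items()}
```

Here the most informative common ancestor of `b` and `c` is `a`, so their Resnik similarity equals the information content of `a`.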

14.
Advanced statistical methods used to analyze high-throughput data such as gene-expression assays result in long lists of “significant genes.” One way to gain insight into the significance of altered expression levels is to determine whether Gene Ontology (GO) terms associated with a particular biological process, molecular function, or cellular component are over- or under-represented in the set of genes deemed significant. This process, referred to as enrichment analysis, profiles a gene-set, and is widely used to make sense of the results of high-throughput experiments. The canonical example of enrichment analysis is when the output dataset is a list of genes differentially expressed in some condition. To determine the biological relevance of a lengthy gene list, the usual solution is to perform enrichment analysis with the GO. We can aggregate the annotating GO concepts for each gene in this list, and arrive at a profile of the biological processes or mechanisms affected by the condition under study. While GO has been the principal target for enrichment analysis, the methods of enrichment analysis are generalizable. We can conduct the same sort of profiling along other ontologies of interest. Just as scientists can ask “Which biological process is over-represented in my set of interesting genes or proteins?” we can also ask “Which disease (or class of diseases) is over-represented in my set of interesting genes or proteins?” For example, by annotating known protein mutations with disease terms from the ontologies in BioPortal, Mort et al. recently identified a class of diseases—blood coagulation disorders—that were associated with a 14-fold depletion in substitutions at O-linked glycosylation sites. With the availability of tools for automatic annotation of datasets with terms from disease ontologies, there is no reason to restrict enrichment analyses to the GO.
In this chapter, we will discuss methods to perform enrichment analysis using any ontology available in the biomedical domain. We will review the general methodology of enrichment analysis, the associated challenges, and discuss the novel translational analyses enabled by the existence of public, national computational infrastructure and by the use of disease ontologies in such analyses.

What to Learn in This Chapter

  • Review the commonly used approach of Gene Ontology based enrichment analysis
  • Understand the pitfalls associated with current approaches
  • Understand the national infrastructure available for using alternative ontologies for enrichment analysis
  • Learn about a generalized enrichment analysis workflow and its application using disease ontologies
This article is part of the “Translational Bioinformatics” collection for PLOS Computational Biology.

15.
SUMMARY: Modern experimental techniques, such as DNA microarrays, usually produce a long list of genes that are potentially interesting in the analyzed process. In order to gain biological understanding from this type of data, it is necessary to analyze the functional annotations of all genes in this list. The Gene-Ontology (GO) database provides a useful tool to annotate and analyze the functions of a large number of genes. Here, we introduce a tool that utilizes this information to obtain an understanding of which annotations are typical for the analyzed list of genes. This program automatically obtains the GO annotations from a database and generates statistics of which annotations are overrepresented in the analyzed list of genes. This results in a list of GO terms sorted by their specificity. AVAILABILITY: Our program GOstat is accessible via the Internet at http://gostat.wehi.edu.au

16.
SUMMARY: "Database Referencing of Array Genes ONline" or "DRAGON" is a web-accessible database that aids in the analysis of differential gene expression data as a biological annotation tool. Users of DRAGON can submit data sets containing large lists of genes and then choose particular characteristics that DRAGON supplies for all genes on the list rapidly and simultaneously. AVAILABILITY: The DRAGON database is available for queries on the DRAGON web site www.kennedykrieger.org/pevsnerlab/dragon.htm. CONTACT: pevsner@kennedykrieger.org or cbouton@jhmi.edu

17.
SUMMARY: The NetAffx Gene Ontology (GO) Mining Tool is a web-based, interactive tool that permits traversal of the GO graph in the context of microarray data. It accepts a list of Affymetrix probe sets and renders a GO graph as a heat map colored according to significance measurements. The rendered graph is interactive, with nodes linked to public web sites and to lists of the relevant probe sets. The GO Mining Tool provides visualization combining biological annotation with expression data, encompassing thousands of genes in one interactive view. AVAILABILITY: GO Mining Tool is freely available at http://www.affymetrix.com/analysis/query/go_analysis.affx

18.
Additional gene ontology structure for improved biological reasoning   Total citations: 5, self-citations: 0, citations by others: 5
MOTIVATION: The Gene Ontology (GO) is a widely used terminology for gene product characterization in, for example, interpretation of biology underlying microarray experiments. The current GO defines term relationships within each of the independent subontologies: molecular function, biological process and cellular component. However, it is evident that there also exist biological relationships between terms of different subontologies. Our aim was to connect the three subontologies to enable GO to cover more biological knowledge, enable a more consistent use of GO and provide new opportunities for biological reasoning. RESULTS: We propose a new structure, the Second Gene Ontology Layer, capturing biological relations not directly reflected in the present ontology structure. Given molecular functions, these paths identify biological processes where the molecular functions are involved and cellular components where they are active. The current Second Layer contains 6271 validated paths, covering 54% of the molecular functions of GO and can be used to render existing gene annotation sets more complete and consistent. Applying Second Layer paths to a set of 4223 human genes increased biological process annotations by 24% compared to publicly available annotations and reproduced 30% of them. AVAILABILITY: The Second GO is publicly available through the GO Annotation Toolbox (GOAT.no): http://www.goat.no.
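
One way to picture how cross-subontology paths could extend an annotation set is sketched below. It is not the Second Layer implementation: the data structures, the function name and the term labels are hypothetical. It separates the process annotations a path adds from those it merely reproduces, mirroring the two percentages reported in the abstract.

```python
def apply_paths(gene_to_terms, function_to_processes):
    """Follow function-to-process paths for every annotated gene.

    gene_to_terms: dict gene -> set of existing annotation terms.
    function_to_processes: dict molecular-function term -> set of
        biological-process terms reachable through a validated path.
    Returns (added, reproduced), each a dict gene -> set of process terms.
    """
    added, reproduced = {}, {}
    for gene, terms in gene_to_terms.items():
        implied = set()
        for term in terms:
            implied |= function_to_processes.get(term, set())
        new_terms = implied - terms
        known_terms = implied & terms
        if new_terms:
            added[gene] = new_terms
        if known_terms:
            reproduced[gene] = known_terms
    return added, reproduced


# Toy data (hypothetical labels, not real GO identifiers).
toy_paths = {"MF:kinase_activity": {"BP:phosphorylation"}}
toy_genes = {
    "geneA": {"MF:kinase_activity"},
    "geneB": {"MF:kinase_activity", "BP:phosphorylation"},
    "geneC": {"MF:transporter_activity"},
}
toy_added, toy_reproduced = apply_paths(toy_genes, toy_paths)
```

In this toy set, `geneA` gains a process annotation, `geneB` already carries the one its path implies, and `geneC` has no applicable path.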
