Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
Learnability-based further prediction of gene functions in Gene Ontology   Total citations: 9, self-citations: 0, citations by others: 9
Tu K, Yu H, Guo Z, Li X. Genomics, 2004, 84(6): 922-928
Currently the functional annotations of many genes are not specific enough, limiting their further application in biology and medicine. It is necessary to push the gene functional annotations deeper in Gene Ontology (GO), that is, to further predict more specific GO terms for genes that are already annotated. A framework of learnability-based further prediction of gene functions in GO is proposed in this paper. Local classifiers are constructed in local classification spaces rooted at qualified parent nodes in GO, and their classification performance is evaluated with the averaged Tanimoto index (ATI). Classification spaces with higher ATIs are selected, and genes annotated only to the parent classes are predicted to child classes. Through learnability-based further prediction, the functional annotations of annotated genes are made more specific. Experiments on the fibroblast serum response dataset reported further functional predictions for several human genes and also gave interesting clues to the varied learnability between classes of different GO ontologies, different levels, and different numbers of child classes.
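The abstract above evaluates classifiers with an averaged Tanimoto index but does not give the formula. The sketch below assumes the common set-based definition (size of the intersection divided by size of the union), averaged over genes; the function names `tanimoto` and `averaged_tanimoto` are hypothetical and the paper's exact definition may differ.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) index between two sets of class labels."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumption: two empty label sets are treated as identical.
        return 1.0
    return len(a & b) / len(union)


def averaged_tanimoto(true_sets, predicted_sets):
    """Mean Tanimoto index over genes (one true/predicted label set per gene)."""
    if len(true_sets) != len(predicted_sets):
        raise ValueError("true_sets and predicted_sets must have the same length")
    if not true_sets:
        raise ValueError("at least one gene is required")
    scores = [tanimoto(t, p) for t, p in zip(true_sets, predicted_sets)]
    return sum(scores) / len(scores)
```

Under this assumed definition, a classification space whose predicted child classes match the known ones exactly scores 1.0, and one with no overlap scores 0.0.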

2.
MOTIVATION: Despite advances in the gene annotation process, the functions of a large portion of gene products remain insufficiently characterized. In addition, the in silico prediction of novel Gene Ontology (GO) annotations for partially characterized gene functions or processes is highly dependent on reverse genetic or functional genomic approaches. To our knowledge, no prediction method has been demonstrated to be highly accurate for sparsely annotated GO terms (those associated to fewer than 10 genes). RESULTS: We propose a novel approach, information theory-based semantic similarity (ITSS), to automatically predict molecular functions of genes based on existing GO annotations. Using a 10-fold cross-validation, we demonstrate that the ITSS algorithm obtains prediction accuracies (precision 97%, recall 77%) comparable to other machine learning algorithms when compared in similar conditions over densely annotated portions of the GO datasets. This method is able to generate highly accurate predictions in sparsely annotated portions of GO, where previous algorithms have failed. As a result, our technique generates an order of magnitude more functional predictions than previous methods. A 10-fold cross validation demonstrated a precision of 90% at a recall of 36% for the algorithm over sparsely annotated networks of the recent GO annotations (about 1400 GO terms and 11,000 genes in Homo sapiens). To our knowledge, this article presents the first historical rollback validation for the predicted GO annotations, which may represent more realistic conditions than more widely used cross-validation approaches. By manually assessing a random sample of 100 predictions conducted in a historical rollback evaluation, we estimate that a minimum precision of 51% (95% confidence interval: 43-58%) can be achieved for the human GO Annotation file dated 2003. AVAILABILITY: The program is available on request. 
The 97,732 positive predictions of novel gene annotations from the 2005 GO Annotation dataset and other supplementary information are available at http://phenos.bsd.uchicago.edu/ITSS/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
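The abstract above does not spell out how ITSS computes similarity. As a generic illustration of information-theoretic semantic similarity over GO terms, the sketch below uses the well-known information content measure (negative log of a term's annotation frequency) and a Resnik-style similarity (the highest information content among shared ancestors). This is not the ITSS algorithm itself; `information_content`, `resnik_similarity`, and the input layout are hypothetical.

```python
import math


def information_content(term_counts, total):
    """IC(t) = -log(count(t) / total) for every term with a non-zero count.

    term_counts maps each term to the number of genes annotated to it or to
    any of its descendants; total is the number of annotated genes overall.
    """
    return {t: -math.log(c / total) for t, c in term_counts.items() if c > 0}


def resnik_similarity(ancestors_a, ancestors_b, ic):
    """Highest IC among ancestors shared by two terms (0.0 if none are shared).

    Each ancestor set is assumed to include the term itself.
    """
    common = set(ancestors_a) & set(ancestors_b)
    return max((ic.get(t, 0.0) for t in common), default=0.0)
```

Rarely used terms carry more information, so two terms whose most specific shared ancestor is sparsely annotated score as more similar than two terms that only share the root.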

3.
4.
An integral part of functional genomics studies is to assess the enrichment of specific biological terms in lists of genes found to be playing an important role in biological phenomena. Contrasting the observed frequency of annotated terms with those of the background is at the core of overrepresentation analysis (ORA). Gene Ontology (GO) is a means to consistently classify and annotate gene products and has become a mainstay in ORA. Alternatively, Medical Subject Headings (MeSH) offers a comprehensive life science vocabulary including additional categories that are not covered by GO. Although MeSH is applied predominantly in human and model organism research, its full potential in livestock genetics is yet to be explored. In this study, MeSH ORA was evaluated to discern biological properties of identified genes and contrast them with the results obtained from GO enrichment analysis. Three published datasets were employed for this purpose, representing a gene expression study in dairy cattle, the use of SNPs for genome-wide prediction in swine and the identification of genomic regions targeted by selection in horses. We found that several overrepresented MeSH annotations linked to these gene sets share similar concepts with those of GO terms. Moreover, MeSH yielded unique annotations, which are not directly provided by GO terms, suggesting that MeSH has the potential to refine and enrich the representation of biological knowledge. We demonstrated that MeSH can be regarded as another choice of annotation to draw biological inferences from genes identified via experimental analyses. When used in combination with GO terms, our results indicate that MeSH can enhance our functional interpretations for specific biological conditions or the genetic basis of complex traits in livestock species.
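Overrepresentation analysis, as described above, contrasts how often a term appears in a gene list with how often it appears in the background. One common way to do this is a one-sided hypergeometric test; the sketch below assumes that test and uses hypothetical parameter names. The study's actual statistical procedure is not stated in the abstract and may differ.

```python
from math import comb


def ora_p_value(population, annotated, sample, hits):
    """Upper-tail hypergeometric p-value P(X >= hits).

    population: number of background genes
    annotated:  background genes carrying the term
    sample:     size of the gene list under study
    hits:       genes in the list carrying the term
    """
    if not 0 <= hits <= sample <= population or not 0 <= annotated <= population:
        raise ValueError("inconsistent counts")
    total = comb(population, sample)
    upper = min(annotated, sample)
    # math.comb returns 0 when k > n, so impossible draws contribute nothing.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total
```

A small p-value suggests the term occurs in the gene list more often than random sampling from the background would explain; in practice the p-values for many terms would also need a multiple-testing correction.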

5.
Experimentally characterized enhancer regions have previously been shown to display specific patterns of enrichment for several different histone modifications. We modelled these enhancer chromatin profiles in the human genome and used them to guide the search for novel enhancers derived from transposable element (TE) sequences. To do this, a computational approach was taken to analyze the genome-wide histone modification landscape characterized by the ENCODE project in two human hematopoietic cell types, GM12878 and K562. We predicted the locations of 2,107 and 1,448 TE-derived enhancers in the GM12878 and K562 cell lines respectively. A vast majority of these putative enhancers are unique to each cell line; only 3.5% of the TE-derived enhancers are shared between the two. We evaluated the functional effect of TE-derived enhancers by associating them with the cell-type specific expression of nearby genes, and found that the number of TE-derived enhancers is strongly positively correlated with the expression of nearby genes in each cell line. Furthermore, genes that are differentially expressed between the two cell lines also possess a divergent number of TE-derived enhancers in their vicinity. As such, genes that are up-regulated in the GM12878 cell line and down-regulated in K562 have significantly more TE-derived enhancers in their vicinity in the GM12878 cell line and vice versa. These data indicate that human TE-derived sequences are likely to be involved in regulating cell-type specific gene expression on a broad scale and suggest that the enhancer activity of TE-derived sequences is mediated by epigenetic regulatory mechanisms.

6.
7.
The Gene Ontology (GO) project provides a controlled vocabulary to facilitate high-quality functional gene annotation for all species. Genes in biological databases are linked to GO terms, allowing biologists to ask questions about gene function in a manner independent of species. This tutorial provides an introduction for biologists to the GO resources and covers three of the most common methods of querying GO: by individual gene, by gene function and by using a list of genes. [For the sake of brevity, the term 'gene' is used throughout this paper to refer to genes and their products (proteins and RNAs). GO annotations are always based on the characteristics of gene products, even though it may be the gene that is cited in the annotation.]

8.
The Gene Ontology (GO) is a collaborative effort that provides structured vocabularies for annotating the molecular function, biological role, and cellular location of gene products in a highly systematic way and in a species-neutral manner with the aim of unifying the representation of gene function across different organisms. Each contributing member of the GO Consortium independently associates GO terms to gene products from the organism(s) they are annotating. Here we introduce the Reference Genome project, which brings together those independent efforts into a unified framework based on the evolutionary relationships between genes in these different organisms. The Reference Genome project has two primary goals: to increase the depth and breadth of annotations for genes in each of the organisms in the project, and to create data sets and tools that enable other genome annotation efforts to infer GO annotations for homologous genes in their organisms. In addition, the project has several important incidental benefits, such as increasing annotation consistency across genome databases, and providing important improvements to the GO's logical structure and biological content.

9.
Gene Ontology (GO) uses structured vocabularies (or terms) to describe the molecular functions, biological roles, and cellular locations of gene products in a hierarchical ontology. GO annotations associate genes with GO terms and indicate that the given gene products carry out the biological functions described by the relevant terms. However, predicting correct GO annotations for genes from the massive set of terms defined by GO is a difficult challenge. To address this challenge, we introduce a Gene Ontology Hierarchy Preserving Hashing (HPHash) based semantic method for gene function prediction. HPHash first measures the taxonomic similarity between GO terms. It then uses a hierarchy preserving hashing technique to keep the hierarchical order between GO terms, and to optimize a series of hashing functions to encode massive GO terms via compact binary codes. After that, HPHash utilizes these hashing functions to project the gene-term association matrix into a low-dimensional one and performs semantic similarity based gene function prediction in the low-dimensional space. Experimental results on three model species (Homo sapiens, Mus musculus and Rattus norvegicus) for interspecies gene function prediction show that HPHash performs better than other related approaches and that it is robust to the number of hash functions. In addition, we also take HPHash as a plugin for BLAST based gene function prediction. From the experimental results, HPHash again significantly improves the prediction performance. The code of HPHash is available at: http://mlda.swu.edu.cn/codes.php?name=HPHash.

10.
GoSurfer   Total citations: 2, self-citations: 0, citations by others: 2
The analysis of complex patterns of gene regulation is central to understanding the biology of cells, tissues and organisms. Patterns of gene regulation pertaining to specific biological processes can be revealed by a variety of experimental strategies, particularly microarrays and other highly parallel methods, which generate large datasets linking many genes. Although methods for detecting gene expression have improved substantially in recent years, understanding the physiological implications of complex patterns in gene expression data is a major challenge. This article presents GoSurfer, an easy-to-use graphical exploration tool with built-in statistical features that allow a rapid assessment of the biological functions represented in large gene sets. GoSurfer takes one or two list(s) of gene identifiers (Affymetrix probe set ID) as input and retrieves all the Gene Ontology (GO) terms associated with the input genes. GoSurfer visualises these GO terms in a hierarchical tree format. With GoSurfer, users can perform statistical tests to search for the GO terms that are enriched in the annotations of the input genes. These GO terms can be highlighted on the GO tree. Users can manipulate the GO tree in various ways and interactively query the genes associated with any GO term. The user-generated graphics can be saved as graphics files, and all the GO information related to the input genes can be exported as text files. AVAILABILITY: GoSurfer is a Windows-based program freely available for noncommercial use and can be downloaded at http://www.gosurfer.org. Datasets used to construct the trees shown in the figures in this article are available at http://www.gosurfer.org/download/GoSurfer.zip.

11.
FungiFun assigns functional annotations to fungal genes or proteins and performs gene set enrichment analysis. Based on three different classification methods (FunCat, GO and KEGG), FungiFun categorizes genes and proteins for several fungal species on different levels of annotation detail. It is web-based and accessible to users without any programming skills. FungiFun is the first tool offering gene set enrichment analysis including the FunCat categorization. Two biological datasets for Aspergillus fumigatus and Candida albicans were analyzed using FungiFun, providing an overview of the usage and functions of the tool. FungiFun is freely accessible at https://www.omnifung.hki-jena.de/FungiFun/.

12.
Additional gene ontology structure for improved biological reasoning   Total citations: 5, self-citations: 0, citations by others: 5
MOTIVATION: The Gene Ontology (GO) is a widely used terminology for gene product characterization in, for example, interpretation of biology underlying microarray experiments. The current GO defines term relationships within each of the independent subontologies: molecular function, biological process and cellular component. However, it is evident that there also exist biological relationships between terms of different subontologies. Our aim was to connect the three subontologies to enable GO to cover more biological knowledge, enable a more consistent use of GO and provide new opportunities for biological reasoning. RESULTS: We propose a new structure, the Second Gene Ontology Layer, capturing biological relations not directly reflected in the present ontology structure. Given molecular functions, these paths identify biological processes where the molecular functions are involved and cellular components where they are active. The current Second Layer contains 6271 validated paths, covering 54% of the molecular functions of GO and can be used to render existing gene annotation sets more complete and consistent. Applying Second Layer paths to a set of 4223 human genes increased biological process annotations by 24% compared to publicly available annotations and reproduced 30% of them. AVAILABILITY: The Second GO is publicly available through the GO Annotation Toolbox (GOAT.no): http://www.goat.no.

13.
SUMMARY: Modern experimental techniques, such as DNA microarrays, usually produce a long list of genes that are potentially interesting in the analyzed process. In order to gain biological understanding from this type of data, it is necessary to analyze the functional annotations of all genes in this list. The Gene-Ontology (GO) database provides a useful tool to annotate and analyze the functions of a large number of genes. Here, we introduce a tool that utilizes this information to obtain an understanding of which annotations are typical for the analyzed list of genes. This program automatically obtains the GO annotations from a database and generates statistics of which annotations are overrepresented in the analyzed list of genes. This results in a list of GO terms sorted by their specificity. AVAILABILITY: Our program GOstat is accessible via the Internet at http://gostat.wehi.edu.au

14.
15.
16.
The Gene Ontology (GO) provides biologists with a controlled terminology that describes how genes are associated with functions and how functional terms are related to one another. These term-term relationships encode how scientists conceive the organization of biological functions, and they take the form of a directed acyclic graph (DAG). Here, we propose that the network structure of gene-term annotations made using GO can be employed to establish an alternative approach for grouping functional terms that captures intrinsic functional relationships that are not evident in the hierarchical structure established in the GO DAG. Instead of relying on an externally defined organization for biological functions, our approach connects biological functions together if they are performed by the same genes, as indicated in a compendium of gene annotation data from numerous different sources. We show that grouping terms by this alternate scheme provides a new framework with which to describe and predict the functions of experimentally identified sets of genes.
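The idea above of connecting functional terms when the same genes perform them can be illustrated with a simple co-annotation count. The sketch below is only a minimal illustration under that reading; the authors' actual network construction and grouping procedure are not described in the abstract, and `term_cooccurrence` with its input layout is hypothetical.

```python
from collections import Counter
from itertools import combinations


def term_cooccurrence(annotations):
    """Count, for every pair of terms, how many genes are annotated to both.

    annotations maps a gene identifier to an iterable of term identifiers.
    Pairs are returned as sorted tuples so (a, b) and (b, a) share one key.
    """
    counts = Counter()
    for terms in annotations.values():
        for pair in combinations(sorted(set(terms)), 2):
            counts[pair] += 1
    return counts
```

The resulting counts can serve as edge weights in a term-term network, on which terms could then be grouped with any community detection or clustering method.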

17.
We have developed methods and tools based on the Gene Ontology (GO) resource allowing the identification of statistically over- or under-represented terms in a gene dataset; the clustering of functionally related genes within a set; and the retrieval of genes sharing annotations with a query gene. GO annotations can also be constrained to a slim hierarchy or a given level of the ontology. The source code is available upon request and is distributed under the GPL license.

18.
Apolipoprotein CIII (ApoCIII) has been shown to be associated with the inflammatory response, but the mechanism of its inflammatory effects remains unclear. Because vascular endothelial cells (VECs) play a key role in the development of inflammation, the present study was performed to investigate inflammatory mechanisms induced by ApoCIII in VECs. In this study, we screened differentially expressed genes (DEGs) using RNA-sequencing. The results identified 390 up-regulated genes and 257 down-regulated genes. We performed GO functional classification and KEGG pathway analysis for DEGs. Analysis of sequencing data showed that 21 genes were related to the MAPK pathway. Finally, we investigated whether ApoCIII regulates the expression of pro-inflammatory cytokines via MAPK signaling pathway. The results showed that ApoCIII increased the expression levels of IL-6, TNF-α, VCAM-1 and ICAM-1 in VECs. ApoCIII activated the phosphorylation of ERK1/2 and p38 MAPK. An inhibitor of ERK1/2 and p38 MAPK decreased the protein levels of IL-6 and TNF-α. Our findings demonstrate that ApoCIII induces pro-inflammatory cytokine production in VECs via activation of ERK1/2 and p38 MAPK phosphorylation.

19.
20.
The goal of the Gene Ontology (GO) project is to provide a uniform way to describe the functions of gene products from organisms across all kingdoms of life and thereby enable analysis of genomic data. Protein annotations are either based on experiments or predicted from protein sequences. Since most sequences have not been experimentally characterized, most available annotations need to be based on predictions. To make as accurate inferences as possible, the GO Consortium's Reference Genome Project is using an explicit evolutionary framework to infer annotations of proteins from a broad set of genomes from experimental annotations in a semi-automated manner. Most components in the pipeline, such as selection of sequences, building multiple sequence alignments and phylogenetic trees, retrieving experimental annotations and depositing inferred annotations, are fully automated. However, the most crucial step in our pipeline relies on software-assisted curation by an expert biologist. This curation tool, Phylogenetic Annotation and INference Tool (PAINT) helps curators to infer annotations among members of a protein family. PAINT allows curators to make precise assertions as to when functions were gained and lost during evolution and record the evidence (e.g. experimentally supported GO annotations and phylogenetic information including orthology) for those assertions. In this article, we describe how we use PAINT to infer protein function in a phylogenetic context with emphasis on its strengths, limitations and guidelines. We also discuss specific examples showing how PAINT annotations compare with those generated by other highly used homology-based methods.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  ICP license: 京ICP备09084417号