Similar literature
1.
Learnability-based further prediction of gene functions in Gene Ontology   Cited by: 9 (self-citations: 0, citations by others: 9)
Tu K  Yu H  Guo Z  Li X 《Genomics》2004,84(6):922-928
Currently the functional annotations of many genes are not specific enough, limiting their further application in biology and medicine. It is necessary to push gene functional annotations deeper in Gene Ontology (GO), that is, to assign more specific GO terms to genes that are already annotated. A framework of learnability-based further prediction of gene functions in GO is proposed in this paper. Local classifiers are constructed in local classification spaces rooted at qualified parent nodes in GO, and their classification performances are evaluated with the averaged Tanimoto index (ATI). Classification spaces with higher ATIs are selected, and genes annotated only to the parent classes are predicted to child classes. Through learnability-based further prediction, the functional annotations of annotated genes are made more specific. Experiments on the fibroblast serum response dataset yielded further functional predictions for several human genes and also gave interesting clues to the varied learnability between classes of different GO ontologies, different levels, and different numbers of child classes.
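
The abstract above does not spell out how the averaged Tanimoto index is computed. A minimal sketch, assuming the index is taken per gene between the predicted and the true set of child classes and then averaged over genes, might look like this. The function names and the toy label sets are illustrative only and are not taken from the paper.

```python
def tanimoto(predicted: set, actual: set) -> float:
    """Tanimoto (Jaccard) index |A ∩ B| / |A ∪ B| between two label sets."""
    union = predicted | actual
    if not union:
        # Assumption: two empty label sets count as a perfect match.
        return 1.0
    return len(predicted & actual) / len(union)


def averaged_tanimoto(predictions, truths) -> float:
    """Mean Tanimoto index over paired predicted/true label sets."""
    scores = [tanimoto(p, t) for p, t in zip(predictions, truths)]
    return sum(scores) / len(scores)


# Toy example with hypothetical child-class labels for three genes.
predictions = [{"child_A"}, {"child_A", "child_B"}, {"child_C"}]
truths = [{"child_A"}, {"child_B"}, {"child_D"}]
ati = averaged_tanimoto(predictions, truths)  # per-gene scores 1.0, 0.5, 0.0
```

Under this reading, a classification space would be kept for further prediction only if its ATI exceeds some chosen threshold; the threshold itself is not given in the abstract.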

2.
MOTIVATION: Despite advances in the gene annotation process, the functions of a large portion of gene products remain insufficiently characterized. In addition, the in silico prediction of novel Gene Ontology (GO) annotations for partially characterized gene functions or processes is highly dependent on reverse genetic or functional genomic approaches. To our knowledge, no prediction method has been demonstrated to be highly accurate for sparsely annotated GO terms (those associated to fewer than 10 genes). RESULTS: We propose a novel approach, information theory-based semantic similarity (ITSS), to automatically predict molecular functions of genes based on existing GO annotations. Using a 10-fold cross-validation, we demonstrate that the ITSS algorithm obtains prediction accuracies (precision 97%, recall 77%) comparable to other machine learning algorithms when compared in similar conditions over densely annotated portions of the GO datasets. This method is able to generate highly accurate predictions in sparsely annotated portions of GO, where previous algorithms have failed. As a result, our technique generates an order of magnitude more functional predictions than previous methods. A 10-fold cross validation demonstrated a precision of 90% at a recall of 36% for the algorithm over sparsely annotated networks of the recent GO annotations (about 1400 GO terms and 11,000 genes in Homo sapiens). To our knowledge, this article presents the first historical rollback validation for the predicted GO annotations, which may represent more realistic conditions than more widely used cross-validation approaches. By manually assessing a random sample of 100 predictions conducted in a historical rollback evaluation, we estimate that a minimum precision of 51% (95% confidence interval: 43-58%) can be achieved for the human GO Annotation file dated 2003. AVAILABILITY: The program is available on request. 
The 97,732 positive predictions of novel gene annotations from the 2005 GO Annotation dataset and other supplementary information are available at http://phenos.bsd.uchicago.edu/ITSS/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

3.
X Chen  R Yang  J Xu  H Ma  S Chen  X Bian  L Liu 《Gene》2012,509(1):131-135
Methods for computing similarities among genes have attracted increasing attention for their applications in gene clustering, gene expression data analysis, and protein interaction prediction and evaluation. To address the need for automatically computing functional similarities of genes, an important class of methods that computes functional similarities by comparing Gene Ontology (GO) annotations of genes has been developed. However, all of the currently available methods have some drawbacks; for example, they either ignore the specificity of the GO terms or do not consider the information contained within the GO structure. As a result, the existing methods perform poorly when the genes are annotated with 'shallow annotations'. Here, we propose a new method to compute functional similarities among genes based on their GO annotations and compare it with the widely used G-SESAME method. The results show that the new method reliably distinguishes functional similarities among genes and that it is especially sensitive to genes with 'shallow annotations'. Moreover, our method has high correlations with sequence and EC similarities.

4.
Gene Ontology (GO) uses structured vocabularies (or terms) to describe the molecular functions, biological roles, and cellular locations of gene products in a hierarchical ontology. GO annotations associate genes with GO terms and indicate that the given gene products carry out the biological functions described by the relevant terms. However, predicting correct GO annotations for genes from the massive set of terms defined by GO is a difficult challenge. To address this challenge, we introduce a Gene Ontology Hierarchy Preserving Hashing (HPHash) based semantic method for gene function prediction. HPHash first measures the taxonomic similarity between GO terms. It then uses a hierarchy preserving hashing technique to keep the hierarchical order between GO terms, and optimizes a series of hashing functions to encode massive GO terms via compact binary codes. After that, HPHash uses these hashing functions to project the gene-term association matrix into a low-dimensional one and performs semantic similarity based gene function prediction in the low-dimensional space. Experimental results on three model species (Homo sapiens, Mus musculus and Rattus norvegicus) for interspecies gene function prediction show that HPHash performs better than other related approaches and that it is robust to the number of hash functions. In addition, we use HPHash as a plugin for BLAST based gene function prediction. In these experiments, HPHash again significantly improves the prediction performance. The code of HPHash is available at: http://mlda.swu.edu.cn/codes.php?name=HPHash.

5.
6.
The Gene Ontology (GO) is a collaborative effort that provides structured vocabularies for annotating the molecular function, biological role, and cellular location of gene products in a highly systematic way and in a species-neutral manner with the aim of unifying the representation of gene function across different organisms. Each contributing member of the GO Consortium independently associates GO terms to gene products from the organism(s) they are annotating. Here we introduce the Reference Genome project, which brings together those independent efforts into a unified framework based on the evolutionary relationships between genes in these different organisms. The Reference Genome project has two primary goals: to increase the depth and breadth of annotations for genes in each of the organisms in the project, and to create data sets and tools that enable other genome annotation efforts to infer GO annotations for homologous genes in their organisms. In addition, the project has several important incidental benefits, such as increasing annotation consistency across genome databases, and providing important improvements to the GO's logical structure and biological content.

7.
Gene Ontology annotation quality analysis in model eukaryotes   Cited by: 1 (self-citations: 0, citations by others: 1)
Functional analysis using the Gene Ontology (GO) is crucial for array analysis, but it is often difficult for researchers to assess the amount and quality of GO annotations associated with different sets of gene products. In many cases the source of the GO annotations and the date the GO annotations were last updated are not apparent, further complicating a researcher's ability to assess the quality of the GO data provided. Moreover, GO biocurators need to ensure that the GO quality is maintained and optimal for the functional processes that are most relevant for their research community. We report the GO Annotation Quality (GAQ) score, a quantitative measure of GO quality that includes breadth of GO annotation, the level of detail of annotation and the type of evidence used to make the annotation. As a case study, we apply the GAQ scoring method to a set of diverse eukaryotes and demonstrate how the GAQ score can be used to track changes in GO annotations over time and to assess the quality of GO annotations available for specific biological processes. The GAQ score also allows researchers to quantitatively assess the functional data available for their experimental systems (arrays or databases).

8.
Gu J  Li S 《Molecular bioSystems》2012,8(8):2041-2049
Vascular endothelial cells (VECs), which form the inner surface of blood vessels, play essential roles in many physiological and pathological processes. VECs are exposed to various micro-environmental stimuli delivered by the circulatory system. Systematically deciphering the gene functions and signaling circuits in VECs that respond to these complex micro-environmental stimuli is one of the fundamental tasks in vascular biology. Currently, several databases aim to annotate gene functions and signaling circuits genome-wide, but most of them give limited consideration to cell-type specific information. Moreover, current annotations only provide the core genes involved in different signaling circuits and lack annotations of the peripheral signaling molecules or the signaling cross-talk. To quickly construct the genome-wide gene functional and signaling map in VECs, we developed a Network-based annotating system (Nanno) by integrating cell-type specific gene expression profiles, genome-wide protein-protein interaction (PPI) networks, Gene Ontology (GO) annotations and microRNA (miRNA) target gene information. Using this system, we successfully re-annotated the genes involved in several essential cellular functions and also identified the signaling circuits under different stimuli in VECs in a cell-type specific manner. Many important genes that are not included in GO annotations can be recovered by Nanno. Several canonical signaling pathways and miRNAs are predicted to be involved in the inflammatory and angiogenic signaling in VECs. The annotations suggest that there may be cross-talk in cell cycle regulation between the two conditions.

9.
The Gene Ontology (GO) project provides a controlled vocabulary to facilitate high-quality functional gene annotation for all species. Genes in biological databases are linked to GO terms, allowing biologists to ask questions about gene function in a manner independent of species. This tutorial provides an introduction for biologists to the GO resources and covers three of the most common methods of querying GO: by individual gene, by gene function and by using a list of genes. [For the sake of brevity, the term 'gene' is used throughout this paper to refer to genes and their products (proteins and RNAs). GO annotations are always based on the characteristics of gene products, even though it may be the gene that is cited in the annotation.]

10.
An integral part of functional genomics studies is to assess the enrichment of specific biological terms in lists of genes found to be playing an important role in biological phenomena. Contrasting the observed frequency of annotated terms with those of the background is at the core of overrepresentation analysis (ORA). Gene Ontology (GO) is a means to consistently classify and annotate gene products and has become a mainstay in ORA. Alternatively, Medical Subject Headings (MeSH) offers a comprehensive life science vocabulary including additional categories that are not covered by GO. Although MeSH is applied predominantly in human and model organism research, its full potential in livestock genetics is yet to be explored. In this study, MeSH ORA was evaluated to discern biological properties of identified genes and contrast them with the results obtained from GO enrichment analysis. Three published datasets were employed for this purpose, representing a gene expression study in dairy cattle, the use of SNPs for genome-wide prediction in swine and the identification of genomic regions targeted by selection in horses. We found that several overrepresented MeSH annotations linked to these gene sets share similar concepts with those of GO terms. Moreover, MeSH yielded unique annotations, which are not directly provided by GO terms, suggesting that MeSH has the potential to refine and enrich the representation of biological knowledge. We demonstrated that MeSH can be regarded as another choice of annotation to draw biological inferences from genes identified via experimental analyses. When used in combination with GO terms, our results indicate that MeSH can enhance our functional interpretations for specific biological conditions or the genetic basis of complex traits in livestock species.
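
The comparison of observed and background term frequencies at the core of ORA is commonly carried out with a one-sided hypergeometric test. The abstract does not state which test the study used, so the following is only a sketch of that common choice, with illustrative parameter names and made-up counts.

```python
from math import comb


def hypergeom_upper_tail(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) for X ~ Hypergeometric.

    N: genes in the background
    K: background genes annotated with the term
    n: genes in the study list
    k: study-list genes annotated with the term
    """
    total = comb(N, n)
    upper = min(K, n)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Made-up counts: 10 background genes, 3 carry the term,
# 4 genes in the study list, 2 of which carry the term.
p_value = hypergeom_upper_tail(N=10, K=3, n=4, k=2)  # 70/210 = 1/3
```

In practice one such p-value is computed per GO or MeSH term, and the resulting set is corrected for multiple testing before terms are reported as overrepresented.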

11.

Background  

The Gene Ontology (GO) is a well-known controlled vocabulary describing the biological process, molecular function and cellular component aspects of gene annotation. It has become a widely used knowledge source in bioinformatics for annotating genes and measuring their semantic similarity. These measures generally involve the GO graph structure, the information content of GO aspects, or a combination of both. However, only a few of the semantic similarity measures described so far can handle GO annotations differently according to their origin (i.e. their evidence codes).
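
To make the combination of graph structure and information content concrete, here is a minimal sketch of one well-known measure of this kind, Resnik similarity, on a toy term graph. It is not the method of the paper summarized above, and it ignores evidence codes entirely. The term names, gene names and annotation table are all hypothetical.

```python
import math

# Toy ontology: each term maps to its direct parents (hypothetical names).
PARENTS = {"root": [], "a": ["root"], "b": ["root"], "a1": ["a"], "a2": ["a"]}

# Toy direct annotations: gene -> set of terms.
ANNOTATIONS = {"g1": {"a1"}, "g2": {"a2"}, "g3": {"b"}, "g4": {"a1"}}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = -log(fraction of genes annotated to t or any descendant)."""
    counts = {term: 0 for term in PARENTS}
    for terms in ANNOTATIONS.values():
        covered = set().union(*(ancestors(t) for t in terms))
        for term in covered:
            counts[term] += 1
    total = len(ANNOTATIONS)
    return {t: -math.log(c / total) for t, c in counts.items() if c > 0}


def resnik(term1, term2, ic):
    """IC of the most informative common ancestor of the two terms."""
    common = ancestors(term1) & ancestors(term2)
    return max(ic.get(t, 0.0) for t in common)


ic = information_content()
sibling_similarity = resnik("a1", "a2", ic)  # shared parent "a": log(4/3)
distant_similarity = resnik("a1", "b", ic)   # only "root" is shared: 0
```

A measure that distinguishes evidence codes would additionally weight or filter the entries of the annotation table by their origin before the counts are taken; how to do so is exactly the question the abstract raises.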

12.

Background  

The search for enriched features has become widely used to characterize a set of genes or proteins. A key aspect of this technique is its ability to identify correlations amongst heterogeneous data such as Gene Ontology annotations, gene expression data and genome location of genes. Despite the rapid growth of available data, very little has been proposed in terms of formalization and optimization. Additionally, current methods mainly ignore the structure of the data, which causes redundancy in the results. For example, when searching for enrichment in GO terms, genes can be annotated with multiple GO terms, and each annotation should be propagated to the more general terms in the Gene Ontology. Consequently, the gene sets often overlap partially or totally, and this causes the reported enriched GO terms to be both numerous and redundant, overwhelming the researcher with non-pertinent information. This situation is not unique: it arises whenever some hierarchical clustering is performed (e.g. based on the gene expression profiles), the extreme case being when genes that are neighbors on the chromosomes are considered.
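
The propagation step described above, and the overlap between gene sets that it produces, can be sketched as follows. The three-term ontology and the gene names are hypothetical and serve only to show the effect.

```python
def propagate(direct, parents):
    """Extend each gene's direct annotations with all ancestor terms."""
    closed = {}
    for gene, terms in direct.items():
        seen = set()
        stack = list(terms)
        while stack:
            term = stack.pop()
            if term not in seen:
                seen.add(term)
                stack.extend(parents.get(term, ()))
        closed[gene] = seen
    return closed


def genes_by_term(closed):
    """Invert gene -> terms into term -> genes."""
    index = {}
    for gene, terms in closed.items():
        for term in terms:
            index.setdefault(term, set()).add(gene)
    return index


# Hypothetical chain: specific -> intermediate -> general.
parents = {"specific": ["intermediate"], "intermediate": ["general"], "general": []}
direct = {"g1": {"specific"}, "g2": {"specific"}, "g3": {"intermediate"}}

term_genes = genes_by_term(propagate(direct, parents))
# term_genes["specific"] is contained in term_genes["intermediate"],
# which is in turn identical to term_genes["general"] here.
```

Because every gene set of a child term is contained in the gene set of its parent, an enrichment signal at a specific term tends to reappear at its ancestors, which is the redundancy the abstract refers to.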

13.
GoSurfer   Cited by: 2 (self-citations: 0, citations by others: 2)
The analysis of complex patterns of gene regulation is central to understanding the biology of cells, tissues and organisms. Patterns of gene regulation pertaining to specific biological processes can be revealed by a variety of experimental strategies, particularly microarrays and other highly parallel methods, which generate large datasets linking many genes. Although methods for detecting gene expression have improved substantially in recent years, understanding the physiological implications of complex patterns in gene expression data is a major challenge. This article presents GoSurfer, an easy-to-use graphical exploration tool with built-in statistical features that allow a rapid assessment of the biological functions represented in large gene sets. GoSurfer takes one or two list(s) of gene identifiers (Affymetrix probe set ID) as input and retrieves all the Gene Ontology (GO) terms associated with the input genes. GoSurfer visualises these GO terms in a hierarchical tree format. With GoSurfer, users can perform statistical tests to search for the GO terms that are enriched in the annotations of the input genes. These GO terms can be highlighted on the GO tree. Users can manipulate the GO tree in various ways and interactively query the genes associated with any GO term. The user-generated graphics can be saved as graphics files, and all the GO information related to the input genes can be exported as text files. AVAILABILITY: GoSurfer is a Windows-based program freely available for noncommercial use and can be downloaded at http://www.gosurfer.org. Datasets used to construct the trees shown in the figures in this article are available at http://www.gosurfer.org/download/GoSurfer.zip.

14.
A recent paper (Nehrt et al., PLoS Comput. Biol. 7:e1002073, 2011) has proposed a metric for the "functional similarity" between two genes that uses only the Gene Ontology (GO) annotations directly derived from published experimental results. Applying this metric, the authors concluded that paralogous genes within the mouse genome or the human genome are more functionally similar on average than orthologous genes between these genomes, an unexpected result with broad implications if true. We suggest, based on both theoretical and empirical considerations, that this proposed metric should not be interpreted as a functional similarity, and therefore cannot be used to support any conclusions about the "ortholog conjecture" (or, more properly, the "ortholog functional conservation hypothesis"). First, we reexamine the case studies presented by Nehrt et al. as examples of orthologs with divergent functions, and come to a very different conclusion: they actually exemplify how GO annotations for orthologous genes provide complementary information about conserved biological functions. We then show that there is a global ascertainment bias in the experiment-based GO annotations for human and mouse genes: particular types of experiments tend to be performed in different model organisms. We conclude that the reported statistical differences in annotations between pairs of orthologous genes do not reflect differences in biological function, but rather complementarity in experimental approaches. 
Our results underscore two general considerations for researchers proposing novel types of analysis based on the GO: 1) that GO annotations are often incomplete, potentially in a biased manner, and subject to an "open world assumption" (absence of an annotation does not imply absence of a function), and 2) that conclusions drawn from a novel, large-scale GO analysis should whenever possible be supported by careful, in-depth examination of examples, to help ensure the conclusions have a justifiable biological basis.

15.
SUMMARY: Modern experimental techniques, such as DNA microarrays, usually produce a long list of genes that are potentially interesting in the analyzed process. In order to gain biological understanding from this type of data, it is necessary to analyze the functional annotations of all genes in this list. The Gene-Ontology (GO) database provides a useful tool to annotate and analyze the functions of a large number of genes. Here, we introduce a tool that utilizes this information to obtain an understanding of which annotations are typical for the analyzed list of genes. This program automatically obtains the GO annotations from a database and generates statistics of which annotations are overrepresented in the analyzed list of genes. This results in a list of GO terms sorted by their specificity. AVAILABILITY: Our program GOstat is accessible via the Internet at http://gostat.wehi.edu.au

16.

Background

Commonalities between large sets of genes obtained from high-throughput experiments are often identified by searching for enrichments of genes with the same Gene Ontology (GO) annotations. The GO analysis tools used for these enrichment analyses assume that GO terms are independent and that the semantic distances between all parent–child terms are identical, which is not true in a biological sense. In addition, these tools output lists of GO terms that are often redundant or too specific, which are difficult to interpret in the context of the biological question investigated by the user. Therefore, there is a demand for a robust and reliable method for gene categorization and enrichment analysis.

Results

We have developed Categorizer, a tool that classifies genes into user-defined groups (categories) and calculates p-values for the enrichment of the categories. Categorizer identifies the biologically best-fit category for each gene by taking advantage of a specialized semantic similarity measure for GO terms. We demonstrate that Categorizer provides improved categorization and enrichment results of genetic modifiers of Huntington’s disease compared to a classical GO Slim-based approach or categorizations using other semantic similarity measures.

Conclusion

Categorizer enables more accurate categorizations of genes than currently available methods. This new tool will help experimental and computational biologists analyze genomic and proteomic data according to their specific needs in a more reliable manner.

17.
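A new method for computing the semantic similarity of gene annotations   Cited by: 1 (self-citations: 0, citations by others: 1)
The Gene Ontology (GO) database provides unified annotations for genes, effectively resolving inconsistencies in how different databases describe the same gene. However, how to compare the functional similarity of genes on the basis of their annotations remains an open problem. This paper proposes a new method for computing the semantic similarity of gene annotations. The method is essentially based on the biological characteristics of genes. Its distinguishing feature is that the semantic similarity of a node is independent of the set the node belongs to and depends only on the node's position in the GO graph, so computed similarities can be reused. It takes into account both the depth of the GO nodes to which genes are mapped and the influence of all paths between two GO nodes on their semantic similarity. Experiments on the isoleucine degradation pathway and the glutamate biosynthesis pathway in yeast show that the algorithm computes the semantic similarity of gene annotations accurately.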

18.
Gene Ontology (GO) vocabularies are an established standard for linking functional information to genes and gene products (www.geneontology.org/). A recent collaboration between University College London and the European Bioinformatics Institute is providing GO annotation to human cardiovascular-associated genes (http://www.ucl.ac.uk/medicine/cardiovascular-genetics/geneontology.html). This report outlines the aims of this collaboration and summarizes how the cardiovascular community can help improve the quality and quantity of GO annotations. This new initiative is funded by the British Heart Foundation and fully supported by the GO Consortium.

19.
Linkage studies of complex traits frequently yield multiple linkage regions covering hundreds of genes. Testing each candidate gene from every region is prohibitively expensive and computational methods that simplify this process would benefit genetic research. We present a new method based on commonality of functional annotation (CFA) that aids dissection of complex traits for which multiple causal genes act in a single pathway or process. CFA works by testing individual Gene Ontology (GO) terms for enrichment among candidate gene pools, performs multiple hypothesis testing adjustment using an estimate of independent tests based on correlation of GO terms, and then scores and ranks genes annotated with significantly-enriched terms based on the number of quantitative trait loci regions in which genes bearing those annotations appear. We evaluate CFA using simulated linkage data and show that CFA has good power despite being conservative. We apply CFA to published linkage studies investigating age-of-onset of Alzheimer's disease and body mass index and obtain previously known and new candidate genes. CFA provides a new tool for studies in which causal genes are expected to participate in a common pathway or process and can easily be extended to utilize annotation schemes in addition to the GO.
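
The final scoring step of CFA is described only in outline. The sketch below assumes that the set of significantly enriched terms has already been determined, counts for each such term the number of linkage regions that contain at least one gene bearing it, and gives each gene the highest count among its enriched terms. Taking the maximum is an assumption made here for illustration and may differ from the published method. All region, gene and term names are hypothetical.

```python
def region_support(regions, annotations, enriched_terms):
    """For each enriched term, count regions with at least one annotated gene."""
    support = {}
    for term in enriched_terms:
        support[term] = sum(
            any(term in annotations.get(gene, ()) for gene in genes)
            for genes in regions.values()
        )
    return support


def score_genes(regions, annotations, enriched_terms):
    """Rank genes by the best region support among their enriched terms."""
    support = region_support(regions, annotations, enriched_terms)
    scores = {}
    for genes in regions.values():
        for gene in genes:
            hits = [support[t] for t in annotations.get(gene, ()) if t in support]
            if hits:
                scores[gene] = max(hits)  # assumption: best term decides
    # Highest score first; ties broken alphabetically for reproducibility.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical linkage regions, annotations and enriched terms.
regions = {"r1": ["g1", "g2"], "r2": ["g3"], "r3": ["g4"]}
annotations = {"g1": {"T1"}, "g2": {"T2"}, "g3": {"T1"}, "g4": {"T1", "T2"}}
enriched_terms = {"T1", "T2"}

ranking = score_genes(regions, annotations, enriched_terms)
```

Here term T1 is supported by all three regions and T2 by two, so genes carrying T1 rank ahead of the gene that carries only T2.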

20.