20 similar documents found; search took 15 ms
1.
Characterising gene function for the ever-increasing number and diversity of species with annotated genomes relies almost entirely on computational prediction methods. These programs are also numerous and diverse, each with different strengths and weaknesses, as revealed by community benchmarking efforts. Meta-predictors that assess consensus and conflict among individual algorithms should deliver enhanced functional annotations. To exploit the benefits of meta-approaches, we developed CrowdGO, an open-source, consensus-based Gene Ontology (GO) term meta-predictor that employs machine learning models with GO term semantic similarities and information contents. By re-evaluating each gene-term annotation, CrowdGO produces a consensus dataset of high-scoring confident annotations and low-scoring rejected annotations. Applying CrowdGO to the results of one deep learning-based, one sequence similarity-based and two protein domain-based methods delivers consensus annotations with improved precision and recall. Furthermore, by standard evaluation measures, CrowdGO's performance matches that of the community's best-performing individual methods. CrowdGO therefore offers a model-informed approach to leverage the strengths of individual predictors and produce comprehensive and accurate gene functional annotations.
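CrowdGO itself trains machine learning models on GO semantic-similarity and information-content features, and that model is not reproduced here. As a minimal sketch of what re-scoring gene-term annotations into confident and rejected sets means, the Python below swaps in a much simpler technique, a weighted vote across predictors. The predictor names, weights, scores and the 0.5 threshold are all hypothetical.

```python
from collections import defaultdict


def consensus(predictions, weights, threshold=0.5):
    """Simplified weighted-vote consensus (NOT CrowdGO's machine learning model).

    predictions: {predictor_name: {(gene, go_term): score in [0, 1]}}
    weights: {predictor_name: non-negative weight}
    Returns (accepted, rejected), each mapping (gene, go_term) -> consensus score.
    """
    total_weight = sum(weights.values())
    combined = defaultdict(float)
    for predictor, annotations in predictions.items():
        for pair, score in annotations.items():
            # A predictor that does not report a pair implicitly contributes 0.
            combined[pair] += weights[predictor] * score

    accepted, rejected = {}, {}
    for pair, weighted_sum in combined.items():
        score = weighted_sum / total_weight
        (accepted if score >= threshold else rejected)[pair] = score
    return accepted, rejected


# Hypothetical predictor outputs for one gene.
preds = {
    "deep_learning": {("geneA", "GO:0005515"): 0.9, ("geneA", "GO:0016301"): 0.4},
    "sequence_similarity": {("geneA", "GO:0005515"): 0.8},
    "domain_based": {("geneA", "GO:0016301"): 0.3},
}
weights = {"deep_learning": 1.0, "sequence_similarity": 1.0, "domain_based": 1.0}
accepted, rejected = consensus(preds, weights)
# GO:0005515 scores (0.9 + 0.8) / 3 ≈ 0.57 and is accepted;
# GO:0016301 scores (0.4 + 0.3) / 3 ≈ 0.23 and is rejected.
```

A learned model, as in CrowdGO, replaces the fixed weights and threshold with parameters fitted to reference annotations.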
2.
Functional analysis using the Gene Ontology (GO) is crucial for array analysis, but it is often difficult for researchers to assess the amount and quality of GO annotations associated with different sets of gene products. In many cases, the source of the GO annotations and the date they were last updated are not apparent, further complicating researchers’ ability to assess the quality of the GO data provided. Moreover, GO biocurators need to ensure that GO quality is maintained and optimal for the functional processes most relevant to their research community. We report the GO Annotation Quality (GAQ) score, a quantitative measure of GO quality that incorporates the breadth of GO annotation, the level of detail of annotation and the type of evidence used to make the annotation. As a case study, we apply the GAQ scoring method to a set of diverse eukaryotes and demonstrate how the GAQ score can be used to track changes in GO annotations over time and to assess the quality of GO annotations available for specific biological processes. The GAQ score also allows researchers to quantitatively assess the functional data available for their experimental systems (arrays or databases).
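The abstract names three ingredients of the GAQ score (breadth, level of detail and evidence type) without giving its formula. Assuming, for illustration only, that they combine as a sum over a gene product's annotations of GO term depth multiplied by an evidence-code weight, a sketch looks like the Python below. The weight table, function name and toy depths are hypothetical and should be checked against the GAQ paper before any real use.

```python
# Hypothetical evidence weights: experimental codes rank highest,
# inferred-from-electronic-annotation (IEA) lowest, no data (ND) zero.
HYPOTHETICAL_EVIDENCE_WEIGHTS = {
    "EXP": 5, "IDA": 5, "IMP": 5, "IGI": 5, "IPI": 5,
    "ISS": 3, "IEA": 1, "ND": 0,
}


def gaq_like_score(annotations, term_depth,
                   evidence_weights=HYPOTHETICAL_EVIDENCE_WEIGHTS):
    """Score one gene product under the assumed depth x evidence formula.

    annotations: iterable of (go_term, evidence_code) pairs; more pairs = more breadth.
    term_depth: {go_term: depth below the ontology root}; deeper = more detail.
    Evidence codes missing from the weight table contribute 0.
    """
    return sum(term_depth[term] * evidence_weights.get(code, 0)
               for term, code in annotations)


# Toy example with hypothetical term names and depths.
depths = {"shallow_term": 2, "deep_term": 5}
score = gaq_like_score([("shallow_term", "IEA"), ("deep_term", "IDA")], depths)
# score == 2 * 1 + 5 * 5 == 27
```

Under this assumption, adding annotations, annotating to deeper terms and replacing electronic evidence with experimental evidence all raise the score, matching the three properties the abstract lists.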
3.
Background
Conserved protein sequence motifs are short stretches of amino acid sequence patterns that potentially encode the function of proteins. Several sequence pattern searching algorithms and programs exist for identifying candidate protein motifs at the whole-genome level. However, a much-needed and important task is to determine the functions of the newly identified protein motifs. The Gene Ontology (GO) project is an endeavor to annotate the function of genes or protein sequences with terms from a dynamic, controlled vocabulary, and these annotations serve well as a knowledge base.
4.
GoFigure: automated Gene Ontology annotation (total citations: 4; self-citations: 0; citations by others: 4)
SUMMARY: We have developed a web tool to predict Gene Ontology (GO) terms. The tool accepts an input DNA or protein sequence, and uses BLAST to identify homologous sequences in GO annotated databases. A graph is returned to the user via email. AVAILABILITY: The tool is freely available at: http://udgenome.ags.udel.edu/frm_go.html/
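GoFigure's summary describes a homology-transfer pipeline: BLAST finds annotated homologs, and their GO terms are returned as a graph. The Python sketch below covers only the post-BLAST step and does not run BLAST. It assumes the hits are already parsed into (subject, e-value) pairs; the 1e-10 cutoff, the function name and the toy ontology are hypothetical, not GoFigure's actual settings.

```python
def go_subgraph_from_hits(hits, subject_go, parents, max_evalue=1e-10):
    """Transfer GO terms from significant homologs and build their ancestor graph.

    hits: list of (subject_id, evalue) pairs, e.g. parsed from BLAST tabular output.
    subject_go: {subject_id: set of GO terms annotated to that database entry}.
    parents: {go_term: set of parent terms} describing the GO DAG.
    Returns (transferred_terms, edges), where edges are (child, parent) pairs.
    """
    transferred = set()
    for subject, evalue in hits:
        if evalue <= max_evalue:  # keep only hits at or below the hypothetical cutoff
            transferred |= subject_go.get(subject, set())

    # Walk up the DAG from every transferred term, visiting each term once.
    edges, seen, stack = set(), set(transferred), list(transferred)
    while stack:
        term = stack.pop()
        for parent in parents.get(term, ()):
            edges.add((term, parent))
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return transferred, edges


# Toy ontology and hits (hypothetical identifiers).
parents = {
    "kinase": {"catalytic"},
    "catalytic": {"molecular_function"},
    "binding": {"molecular_function"},
}
subject_go = {"P1": {"kinase"}, "P2": {"binding"}}
hits = [("P1", 1e-30), ("P2", 0.5)]
terms, edges = go_subgraph_from_hits(hits, subject_go, parents)
# terms == {"kinase"}
# edges == {("kinase", "catalytic"), ("catalytic", "molecular_function")}
```

Propagating to ancestors means the returned graph always connects each transferred term back to the ontology root, which is what makes it drawable as a single graph.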
5.
Investigating semantic similarity measures across the Gene Ontology: the relationship between sequence and annotation (total citations: 12; self-citations: 0; citations by others: 12)
MOTIVATION: Many bioinformatics data resources not only hold data in the form of sequences, but also as annotation. In the majority of cases, annotation is written as scientific natural language: this is suitable for humans, but not particularly useful for machine processing. Ontologies offer a mechanism by which knowledge can be represented in a form capable of such processing. In this paper we investigate the use of ontological annotation to measure the similarities in knowledge content or 'semantic similarity' between entries in a data resource. These allow a bioinformatician to perform a similarity measure over annotation in an analogous manner to those performed over sequences. A measure of semantic similarity for the knowledge component of bioinformatics resources should afford a biologist a new tool in their repertoire of analyses. RESULTS: We present the results from experiments that investigate the validity of using semantic similarity by comparison with sequence similarity. We show a simple extension that enables a semantic search of the knowledge held within sequence databases. AVAILABILITY: Software available from http://www.russet.org.uk.
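Information-content measures are a standard way to turn GO annotations into a similarity score. The Python sketch below implements one of them, Resnik's measure: a term's information content is IC(t) = -log p(t), where p(t) is the fraction of annotations made to t or any of its descendants, and the similarity of two terms is the largest IC among their common ancestors. The toy ontology and annotation counts are hypothetical, the code assumes a single-rooted DAG, and it is not the software distributed by the authors.

```python
import math


def ancestors(term, parents):
    """Return the term itself plus all of its ancestors in the DAG."""
    result, stack = {term}, [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def information_content(annotation_counts, parents):
    """annotation_counts: {term: number of direct annotations}.

    Each annotation also counts toward every ancestor of its term, so with a
    single root the root's total equals the sum of all direct annotations.
    """
    totals = {}
    for term, count in annotation_counts.items():
        for anc in ancestors(term, parents):
            totals[anc] = totals.get(anc, 0) + count
    root_total = sum(annotation_counts.values())
    # IC = -log p = log(root_total / count); the root gets IC 0.
    return {t: math.log(root_total / c) for t, c in totals.items()}


def resnik(t1, t2, parents, ic):
    """Largest IC among the common ancestors of t1 and t2.

    Terms with no annotations at or below them have no defined IC and are skipped.
    """
    common = ancestors(t1, parents) & ancestors(t2, parents)
    return max((ic[t] for t in common if t in ic), default=0.0)


# Hypothetical toy ontology and annotation counts.
parents = {"kinase": {"catalytic"}, "hydrolase": {"catalytic"},
           "catalytic": {"mf_root"}, "binding": {"mf_root"}}
counts = {"kinase": 2, "hydrolase": 2, "binding": 4}
ic = information_content(counts, parents)
# resnik("kinase", "hydrolase", parents, ic) == log(8 / 4), via "catalytic"
# resnik("kinase", "binding", parents, ic) == 0.0, they share only the root
```

Comparing such annotation-based scores against sequence similarity scores for the same pairs of entries is the kind of experiment the abstract describes.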
6.
7.
8.
Background
Protein complexes are particularly important in many biological processes, and various computational approaches have been developed to identify complexes from protein-protein interaction (PPI) networks. However, the high false-positive rate of PPI data makes complex identification challenging.
Results
A protein semantic similarity measure, based on the ontology structure of Gene Ontology (GO) terms and on GO annotations, is proposed in this study to estimate the reliability of interactions in PPI networks. Interaction pairs with low GO semantic similarity are removed from the network as unreliable. A cluster-expanding algorithm is then used to detect complexes with a core-attachment structure in the filtered network. Our method is applied to three different yeast PPI networks, and its effectiveness is examined on two benchmark complex datasets. Experimental results show that our method performs better than other state-of-the-art approaches on most evaluation metrics.
Conclusions
The method detects protein complexes in large-scale PPI networks by filtering interactions on GO semantic similarity. Removing interactions with low GO similarity significantly improves the performance of complex identification. The expanding strategy is also effective at identifying the attachment proteins of complexes.
9.
Background
The annotations of Affymetrix DNA microarray probe sets with Gene Ontology terms are carefully selected for correctness. This results in very accurate but incomplete annotations, which is not always desirable for evaluating microarray experiments.
10.
Background
The completion of various genome sequencing projects has resulted in the accumulation of a massive amount of gene sequence information. This calls for large-scale computational methods for predicting protein localization from sequence. A protein's localization can provide valuable information about its molecular function, as well as the biological pathways in which it participates. Predicting the localization of a protein at the subnuclear level is a challenging task. In our previous work, we proposed an SVM-based system that uses protein sequence information for this prediction task. In this work, we assess protein similarity with Gene Ontology (GO) and then improve the performance of the system by adding a nearest-neighbor classifier module that uses a similarity measure derived from the GO annotation terms of protein sequences.
11.
Popescu M, Keller JM, Mitchell JA. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2006, 3(3): 263-274
One of the most important objects in bioinformatics is a gene product (protein or RNA). For many gene products, functional information is summarized in a set of Gene Ontology (GO) annotations. For these genes, it is reasonable to include similarity measures based on the terms found in the GO or other taxonomies. In this paper, we introduce several novel measures for computing the similarity of two gene products annotated with GO terms. The fuzzy measure similarity (FMS) has the advantage that it takes into consideration the context of both complete sets of annotation terms when computing the similarity between two gene products. When the two gene products are not annotated by common taxonomy terms, we propose a method that avoids a zero similarity result. To account for variations in annotation reliability, we propose a similarity measure based on the Choquet integral. These similarity measures provide extra tools for the biologist in search of functional information for gene products. Initial testing on a group of 194 sequences representing three protein families shows a higher correlation of the FMS and Choquet similarities with BLAST sequence similarities than traditional similarity measures such as pairwise average or pairwise maximum.
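The FMS and Choquet-integral measures are not reproduced here. For comparison, the Python sketch below implements the two traditional baselines the abstract mentions, pairwise average and pairwise maximum, which combine a term-level similarity over all pairs of annotation terms from two gene products. The term similarity function `toy_sim` is a hypothetical placeholder; any GO term measure could be plugged in instead.

```python
from itertools import product
from statistics import mean


def _pair_scores(terms1, terms2, term_sim):
    """Term similarity for every (term from gene 1, term from gene 2) pair."""
    if not terms1 or not terms2:
        raise ValueError("both gene products need at least one GO term")
    return [term_sim(a, b) for a, b in product(terms1, terms2)]


def pairwise_average(terms1, terms2, term_sim):
    """Mean term similarity over all cross pairs."""
    return mean(_pair_scores(terms1, terms2, term_sim))


def pairwise_maximum(terms1, terms2, term_sim):
    """Best term similarity over all cross pairs."""
    return max(_pair_scores(terms1, terms2, term_sim))


# Hypothetical term similarity: 1.0 for identical terms, 0.2 otherwise.
def toy_sim(a, b):
    return 1.0 if a == b else 0.2


gene1, gene2 = {"x", "y"}, {"x", "z"}
avg = pairwise_average(gene1, gene2, toy_sim)   # (1.0 + 0.2 + 0.2 + 0.2) / 4
best = pairwise_maximum(gene1, gene2, toy_sim)  # 1.0
```

The average dilutes a single strong match with every weak pair, while the maximum ignores all but one pair; neither looks at each annotation set as a whole, which is the gap the FMS is described as addressing.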
12.
13.
Background
The availability of various high-throughput experimental and computational methods allows biologists to rapidly infer functional relationships between genes. It is often necessary to evaluate these predictions computationally, a task that requires a reference database for functional relatedness. One such reference is the Gene Ontology (GO). A number of groups have suggested that the semantic similarity of the GO annotations of genes can serve as a proxy for functional relatedness. Here we evaluate a simple measure of semantic similarity, term overlap (TO).
14.
15.
Andreas Schlicker, Francisco S Domingues, Jörg Rahnenführer, Thomas Lengauer. BMC Bioinformatics, 2006, 7(1): 302-16
Background
Gene Ontology (GO) is a standard vocabulary of functional terms and allows for coherent annotation of gene products. These annotations provide a basis for new methods that compare gene products regarding their molecular function and biological role.
16.
SUMMARY: Analysis of microarray data most often produces lists of genes with similar expression patterns, which are then subdivided into functional categories for biological interpretation. Such functional categorization is most commonly accomplished using Gene Ontology (GO) categories. Although several programs identify and analyze functional categories for human, mouse and yeast genes, none of them accepts Arabidopsis thaliana data. To address this need of the A. thaliana community, we have developed a program that retrieves GO annotations for A. thaliana genes and performs functional category analysis for lists of genes selected by the user. AVAILABILITY: http://www.personal.psu.edu/nhs109/Clench
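The summary does not state which statistic Clench uses for functional category analysis. A common choice for this kind of analysis is a one-sided hypergeometric test, sketched below in Python under that assumption. Gene identifiers and category names are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeometric_pvalue(k, n, K, N):
    """P(X >= k): chance of seeing at least k category genes when n genes are
    drawn without replacement from N genes, K of which belong to the category."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def category_enrichment(selected, background, categories):
    """selected, background: sets of gene IDs (selected is a subset of background).
    categories: {category: set of gene IDs}.
    Returns {category: (k, p_value)} for categories with at least one selected gene."""
    N, n = len(background), len(selected)
    results = {}
    for name, members in categories.items():
        members = members & background  # ignore genes outside the background
        k = len(members & selected)
        if k:
            results[name] = (k, hypergeometric_pvalue(k, n, len(members), N))
    return results


# Hypothetical background of 10 genes and a user-selected list of 3.
background = {f"g{i}" for i in range(10)}
selected = {"g0", "g1", "g2"}
categories = {"cat_A": {"g0", "g1", "g2"}, "cat_B": {"g2", "g5", "g6", "g7"}}
results = category_enrichment(selected, background, categories)
# cat_A: all 3 selected genes fall in a 3-gene category, p = 1 / comb(10, 3) = 1/120
```

In practice, the p-values from many categories would then be adjusted for multiple testing before being reported.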
17.
Gene array analysis of osteoblast differentiation (total citations: 4; self-citations: 0; citations by others: 4)
18.
19.
20.
Saccharomyces Genome Database (SGD) provides secondary gene annotation using the Gene Ontology (GO) (total citations: 20; self-citations: 2; citations by others: 20)
Selina S. Dwight, Midori A. Harris, Kara Dolinski, Catherine A. Ball, Gail Binkley, Karen R. Christie, Dianna G. Fisk, Laurie Issel-Tarver, Mark Schroeder, Gavin Sherlock, Anand Sethuraman, Shuai Weng, David Botstein, J. Michael Cherry. Nucleic Acids Research, 2002, 30(1): 69-72