1.

CrowdGO: Machine learning and semantic similarity guided consensus Gene Ontology annotation

Maarten J. M. F. Reijnders Robert M. Waterhouse 《PLoS computational biology》2022,18(5)

Characterising gene function for the ever-increasing number and diversity of species with annotated genomes relies almost entirely on computational prediction methods. These software tools are also numerous and diverse, each with different strengths and weaknesses as revealed through community benchmarking efforts. Meta-predictors that assess consensus and conflict from individual algorithms should deliver enhanced functional annotations. To exploit the benefits of meta-approaches, we developed CrowdGO, an open-source consensus-based Gene Ontology (GO) term meta-predictor that employs machine learning models with GO term semantic similarities and information contents. By re-evaluating each gene-term annotation, a consensus dataset is produced with high-scoring confident annotations and low-scoring rejected annotations. Applying CrowdGO to results from a deep learning-based, a sequence similarity-based, and two protein domain-based methods delivers consensus annotations with improved precision and recall. Furthermore, using standard evaluation measures, CrowdGO performance matches that of the community’s best performing individual methods. CrowdGO therefore offers a model-informed approach to leverage strengths of individual predictors and produce comprehensive and accurate gene functional annotations.

2.

Gene Ontology annotation quality analysis in model eukaryotes (citations: 1 total, 0 self, 1 by others)

Buza TJ McCarthy FM Wang N Bridges SM Burgess SC 《Nucleic acids research》2008,36(2):e12

Functional analysis using the Gene Ontology (GO) is crucial for array analysis, but it is often difficult for researchers to assess the amount and quality of GO annotations associated with different sets of gene products. In many cases the source of the GO annotations and the date the GO annotations were last updated are not apparent, further complicating a researcher’s ability to assess the quality of the GO data provided. Moreover, GO biocurators need to ensure that the GO quality is maintained and optimal for the functional processes that are most relevant for their research community. We report the GO Annotation Quality (GAQ) score, a quantitative measure of GO quality that includes breadth of GO annotation, the level of detail of annotation and the type of evidence used to make the annotation. As a case study, we apply the GAQ scoring method to a set of diverse eukaryotes and demonstrate how the GAQ score can be used to track changes in GO annotations over time and to assess the quality of GO annotations available for specific biological processes. The GAQ score also allows researchers to quantitatively assess the functional data available for their experimental systems (arrays or databases).
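
The abstract names three ingredients of the GAQ score (breadth, level of detail, evidence type) but does not give the formula. The sketch below is only one plausible way to combine them, assuming that detail is approximated by the depth of a GO term in the ontology and that each evidence code carries a numeric weight. The names `EVIDENCE_WEIGHT`, `quality_score` and `mean_quality_score`, and the weight values themselves, are hypothetical and are not taken from the paper.

```python
from typing import Iterable, List, Tuple

# Hypothetical evidence weights: a higher value stands for stronger evidence.
# These numbers are illustrative only and are not the paper's ranking.
EVIDENCE_WEIGHT = {"EXP": 3, "ISS": 2, "IEA": 1}

# One annotation = (depth of the GO term in the ontology, evidence code).
Annotation = Tuple[int, str]


def quality_score(annotations: Iterable[Annotation]) -> int:
    """Sum depth * evidence weight over the annotations of one gene product.

    More annotations raise the score (breadth), deeper terms raise it
    (detail), and stronger evidence codes raise it (evidence type).
    Unknown evidence codes contribute nothing.
    """
    return sum(depth * EVIDENCE_WEIGHT.get(code, 0) for depth, code in annotations)


def mean_quality_score(per_product: List[List[Annotation]]) -> float:
    """Average the per-product score over a set of gene products."""
    if not per_product:
        return 0.0
    return sum(quality_score(a) for a in per_product) / len(per_product)
```

Averaging over gene products is one way to compare annotation sets of different sizes, for example two releases of the same annotation file.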

3.

Automatic annotation of protein motif function with Gene Ontology terms

Xinghua Lu Chengxiang Zhai Vanathi Gopalakrishnan Bruce G Buchanan 《BMC bioinformatics》2004,5(1):122

Background

Conserved protein sequence motifs are short stretches of amino acid sequence patterns that potentially encode the function of proteins. Several sequence pattern searching algorithms and programs exist for identifying candidate protein motifs at the whole genome level. However, a much needed and important task is to determine the functions of the newly identified protein motifs. The Gene Ontology (GO) project is an endeavor to annotate the function of genes or protein sequences with terms from a dynamic, controlled vocabulary and these annotations serve well as a knowledge base.

4.

GoFigure: automated Gene Ontology annotation (citations: 4 total, 0 self, 4 by others)

Khan S Situ G Decker K Schmidt CJ 《Bioinformatics (Oxford, England)》2003,19(18):2484-2485

SUMMARY: We have developed a web tool to predict Gene Ontology (GO) terms. The tool accepts an input DNA or protein sequence, and uses BLAST to identify homologous sequences in GO annotated databases. A graph is returned to the user via email. AVAILABILITY: The tool is freely available at: http://udgenome.ags.udel.edu/frm_go.html/

5.

Investigating semantic similarity measures across the Gene Ontology: the relationship between sequence and annotation (citations: 12 total, 0 self, 12 by others)

Lord PW Stevens RD Brass A Goble CA 《Bioinformatics (Oxford, England)》2003,19(10):1275-1283

MOTIVATION: Many bioinformatics data resources not only hold data in the form of sequences, but also as annotation. In the majority of cases, annotation is written as scientific natural language: this is suitable for humans, but not particularly useful for machine processing. Ontologies offer a mechanism by which knowledge can be represented in a form capable of such processing. In this paper we investigate the use of ontological annotation to measure the similarities in knowledge content or 'semantic similarity' between entries in a data resource. These allow a bioinformatician to perform a similarity measure over annotation in an analogous manner to those performed over sequences. A measure of semantic similarity for the knowledge component of bioinformatics resources should afford a biologist a new tool in their repertoire of analyses. RESULTS: We present the results from experiments that investigate the validity of using semantic similarity by comparison with sequence similarity. We show a simple extension that enables a semantic search of the knowledge held within sequence databases. AVAILABILITY: Software available from http://www.russet.org.uk.
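
The abstract does not say which semantic similarity measure is used. As an illustration of the general idea, the sketch below implements one widely used information-content formulation (Resnik-style): a term's information content is the log of how rarely it is annotated, and two terms are as similar as their most informative shared ancestor. The toy ontology, the term IDs and the annotation counts are all hypothetical.

```python
import math
from typing import Dict, Set

# Toy ontology (hypothetical IDs): term -> set of direct parents.
PARENTS: Dict[str, Set[str]] = {
    "GO:A": set(),
    "GO:B": {"GO:A"},
    "GO:C": {"GO:A"},
    "GO:D": {"GO:B"},
    "GO:E": {"GO:B"},
}

# Hypothetical number of gene products annotated directly to each term.
DIRECT_COUNTS: Dict[str, int] = {"GO:A": 0, "GO:B": 2, "GO:C": 4, "GO:D": 1, "GO:E": 1}


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return the term itself plus every term reachable via parent links."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def information_content(
    direct_counts: Dict[str, int], parents: Dict[str, Set[str]]
) -> Dict[str, float]:
    """IC(t) = log(total / count(t)), where count(t) includes all descendants."""
    totals = {term: 0 for term in parents}
    for term, n in direct_counts.items():
        for anc in ancestors(term, parents):
            totals[anc] += n
    grand_total = sum(direct_counts.values())
    # Terms that are never used have undefined IC and are left out.
    return {t: math.log(grand_total / c) for t, c in totals.items() if c > 0}


def resnik_similarity(
    t1: str, t2: str, parents: Dict[str, Set[str]], ic: Dict[str, float]
) -> float:
    """Information content of the most informative common ancestor."""
    common = ancestors(t1, parents) & ancestors(t2, parents)
    return max((ic[t] for t in common if t in ic), default=0.0)


IC = information_content(DIRECT_COUNTS, PARENTS)
```

In this toy data, `GO:D` and `GO:E` share the fairly specific parent `GO:B` and so score higher than `GO:D` and `GO:C`, which only share the root.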

6.

Applying the Gene Ontology in microbial annotation

Michelle G. Giglio Candace W. Collmer Jane Lomax Amelia Ireland 《Trends in microbiology》2009,17(7):262-268

7.

Automated Gene Ontology annotation for anonymous sequence data (citations: 9 total, 1 self, 9 by others)

Hennig S Groth D Lehrach H 《Nucleic acids research》2003,31(13):3712-3715

8.

Filtering Gene Ontology semantic similarity for identifying protein complexes in large protein interaction networks

Wang J Xie D Lin H Yang Z Zhang Y 《Proteome science》2012,10(Z1):S18

Background

Protein complexes play a particularly important role in many biological processes, and various computational approaches have been developed to identify complexes from protein-protein interaction (PPI) networks. However, the high false-positive rate of PPIs makes this identification challenging.

Results

A protein semantic similarity measure is proposed in this study, based on the ontology structure of Gene Ontology (GO) terms and GO annotations, to estimate the reliability of interactions in PPI networks. Interaction pairs with low GO semantic similarity are removed from the network as unreliable interactions. Then, a cluster-expanding algorithm is used to detect complexes with core-attachment structure on the filtered network. Our method is applied to three different yeast PPI networks. The effectiveness of our method is examined on two benchmark complex datasets. Experimental results show that our method performed better than other state-of-the-art approaches in most evaluation metrics.

Conclusions

The method detects protein complexes from large-scale PPI networks by filtering on GO semantic similarity. Removing interactions with low GO similarity significantly improves the performance of complex identification. The expanding strategy is also effective for identifying attachment proteins of complexes.
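
The filtering step described in this abstract can be sketched as follows. The paper's own similarity measure is not specified here, so the example substitutes a plain Jaccard index over each protein's GO term set as a stand-in. The function names, the protein and term IDs, and the threshold value are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Stand-in similarity: shared GO terms divided by all GO terms."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def filter_interactions(
    edges: List[Edge], annotations: Dict[str, Set[str]], threshold: float
) -> List[Edge]:
    """Keep only interactions whose two proteins are similar enough in GO terms."""
    kept = []
    for u, v in edges:
        sim = jaccard(annotations.get(u, set()), annotations.get(v, set()))
        if sim >= threshold:
            kept.append((u, v))
    return kept


# Hypothetical toy network and annotations.
ANNOTATIONS = {
    "P1": {"GO:1", "GO:2"},
    "P2": {"GO:1", "GO:2", "GO:3"},
    "P3": {"GO:9"},
}
EDGES = [("P1", "P2"), ("P1", "P3"), ("P2", "P3")]
FILTERED = filter_interactions(EDGES, ANNOTATIONS, threshold=0.5)
```

A complex-detection algorithm would then run on `FILTERED` instead of `EDGES`; that clustering step is outside the scope of this sketch.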

9.

Amplification of the Gene Ontology annotation of Affymetrix probe sets

Enrique M Muro Carolina Perez-Iratxeta Miguel A Andrade-Navarro 《BMC bioinformatics》2006,7(1):159-6

Background

The annotations of Affymetrix DNA microarray probe sets with Gene Ontology terms are carefully selected for correctness. This results in very accurate but incomplete annotations, which is not always desirable for microarray experiment evaluation.

10.

Assessing protein similarity with Gene Ontology and its use in subnuclear localization prediction

Zhengdeng Lei Yang Dai 《BMC bioinformatics》2006,7(1):491

Background

The accomplishment of the various genome sequencing projects resulted in the accumulation of a massive amount of gene sequence information. This calls for a large-scale computational method for predicting protein localization from sequence. The localization of a protein can provide valuable information about its molecular function, as well as the biological pathway in which it participates. The prediction of localization of a protein at the subnuclear level is a challenging task. In our previous work we proposed an SVM-based system using protein sequence information for this prediction task. In this work, we assess protein similarity with Gene Ontology (GO) and then improve the performance of the system by adding a nearest neighbor classifier module that uses a similarity measure derived from the GO annotation terms for protein sequences.
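
A nearest neighbor classifier driven by a GO-based similarity can be sketched as below. The abstract does not define the similarity measure, so this example again assumes a simple Jaccard index over GO term sets, and a majority vote among the `k` most similar training proteins. The function names, term IDs and compartment labels are hypothetical.

```python
from collections import Counter
from typing import List, Set, Tuple

Example = Tuple[Set[str], str]  # (GO terms of a protein, localization label)


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Assumed similarity: shared GO terms divided by all GO terms."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def knn_predict(query_terms: Set[str], training: List[Example], k: int = 3) -> str:
    """Predict the label held by most of the k most similar training proteins."""
    ranked = sorted(
        training, key=lambda item: jaccard(query_terms, item[0]), reverse=True
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training data.
TRAINING: List[Example] = [
    ({"GO:1", "GO:2"}, "nucleolus"),
    ({"GO:1", "GO:3"}, "nucleolus"),
    ({"GO:7", "GO:8"}, "nuclear speckle"),
    ({"GO:8", "GO:9"}, "nuclear speckle"),
]
```

For example, `knn_predict({"GO:1", "GO:2", "GO:3"}, TRAINING)` selects the two nucleolus examples as the closest neighbors and returns that label.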

11.

Fuzzy measures on the Gene Ontology for gene product similarity

Popescu M Keller JM Mitchell JA 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2006,3(3):263-274

One of the most important objects in bioinformatics is a gene product (protein or RNA). For many gene products, functional information is summarized in a set of Gene Ontology (GO) annotations. For these genes, it is reasonable to include similarity measures based on the terms found in the GO or other taxonomy. In this paper, we introduce several novel measures for computing the similarity of two gene products annotated with GO terms. The fuzzy measure similarity (FMS) has the advantage that it takes into consideration the context of both complete sets of annotation terms when computing the similarity between two gene products. When the two gene products are not annotated by common taxonomy terms, we propose a method that avoids a zero similarity result. To account for the variations in the annotation reliability, we propose a similarity measure based on the Choquet integral. These similarity measures provide extra tools for the biologist in search of functional information for gene products. The initial testing on a group of 194 sequences representing three protein families shows a higher correlation of the FMS and Choquet similarities to the BLAST sequence similarities than the traditional similarity measures such as pairwise average or pairwise maximum.
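
For orientation, the two traditional baselines named at the end of this abstract, pairwise average and pairwise maximum, can be sketched as follows. This is not the FMS or Choquet measure. It assumes some term-to-term similarity function is already available; the lookup table `TOY_SIM` and the term IDs are hypothetical.

```python
from itertools import product
from typing import Callable, Iterable, List

TermSim = Callable[[str, str], float]


def pairwise_scores(
    terms_a: Iterable[str], terms_b: Iterable[str], term_sim: TermSim
) -> List[float]:
    """Similarity of every term of one gene product against every term of the other."""
    return [term_sim(a, b) for a, b in product(list(terms_a), list(terms_b))]


def pairwise_average(terms_a, terms_b, term_sim: TermSim) -> float:
    """Mean over all term pairs; 0.0 if either product has no terms."""
    scores = pairwise_scores(terms_a, terms_b, term_sim)
    return sum(scores) / len(scores) if scores else 0.0


def pairwise_maximum(terms_a, terms_b, term_sim: TermSim) -> float:
    """Best-matching term pair; 0.0 if either product has no terms."""
    return max(pairwise_scores(terms_a, terms_b, term_sim), default=0.0)


# Hypothetical term-to-term similarities for a toy example.
TOY_SIM = {
    frozenset({"GO:1", "GO:2"}): 0.5,
    frozenset({"GO:2", "GO:3"}): 0.25,
}


def toy_term_sim(a: str, b: str) -> float:
    """Identical terms score 1.0; unlisted pairs score 0.0."""
    if a == b:
        return 1.0
    return TOY_SIM.get(frozenset({a, b}), 0.0)
```

The average is pulled down by every unrelated term pair, while the maximum looks only at the single best pair and ignores the rest of both annotation sets. The abstract contrasts this with FMS, which considers both complete sets.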

12.

A relation based measure of semantic similarity for Gene Ontology annotations

Brendan Sheehan Aaron Quigley Benoit Gaudin Simon Dobson 《BMC bioinformatics》2008,9(1):468

13.

Gene Ontology term overlap as a measure of gene functional similarity

Meeta Mistry Paul Pavlidis 《BMC bioinformatics》2008,9(1):327

Background

The availability of various high-throughput experimental and computational methods allows biologists to rapidly infer functional relationships between genes. It is often necessary to evaluate these predictions computationally, a task that requires a reference database for functional relatedness. One such reference is the Gene Ontology (GO). A number of groups have suggested that the semantic similarity of the GO annotations of genes can serve as a proxy for functional relatedness. Here we evaluate a simple measure of semantic similarity, term overlap (TO).
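
The abstract does not define term overlap, so the sketch below rests on an assumption: TO is taken to be the number of GO terms two genes share once each gene's annotations have been extended with all ancestor terms, and a normalized variant divides by the size of the smaller extended set. The toy ontology and term IDs are hypothetical.

```python
from typing import Dict, Iterable, Set

# Toy ontology (hypothetical IDs): term -> set of direct parents.
PARENTS: Dict[str, Set[str]] = {
    "GO:A": set(),
    "GO:B": {"GO:A"},
    "GO:C": {"GO:A"},
    "GO:D": {"GO:B"},
    "GO:E": {"GO:B", "GO:C"},
}


def with_ancestors(terms: Iterable[str], parents: Dict[str, Set[str]]) -> Set[str]:
    """Extend a set of annotated terms with every ancestor term."""
    closed: Set[str] = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in closed:
            closed.add(term)
            stack.extend(parents.get(term, ()))
    return closed


def term_overlap(a: Iterable[str], b: Iterable[str], parents: Dict[str, Set[str]]) -> int:
    """Number of terms shared by the two ancestor-extended annotation sets."""
    return len(with_ancestors(a, parents) & with_ancestors(b, parents))


def normalized_term_overlap(
    a: Iterable[str], b: Iterable[str], parents: Dict[str, Set[str]]
) -> float:
    """Term overlap divided by the size of the smaller extended set."""
    closed_a = with_ancestors(a, parents)
    closed_b = with_ancestors(b, parents)
    smaller = min(len(closed_a), len(closed_b))
    return len(closed_a & closed_b) / smaller if smaller else 0.0
```

Under these assumptions, a gene annotated with `GO:D` and a gene annotated with `GO:E` share `GO:B` and `GO:A`, giving a term overlap of 2.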

14.

Gene Ontology and the annotation of pathogen genomes: the case of Candida albicans

Martha B. Arnaud Maria C. Costanzo Prachi Shah Marek S. Skrzypek Gavin Sherlock 《Trends in microbiology》2009,17(7):295-303

15.

A new measure for functional similarity of gene products based on Gene Ontology

Andreas Schlicker Francisco S Domingues Jörg Rahnenführer Thomas Lengauer 《BMC bioinformatics》2006,7(1):302-16

Background

Gene Ontology (GO) is a standard vocabulary of functional terms and allows for coherent annotation of gene products. These annotations provide a basis for new methods that compare gene products regarding their molecular function and biological role.

16.

CLENCH: a program for calculating Cluster ENriCHment using the Gene Ontology (citations: 1 total, 0 self, 1 by others)

Shah NH Fedoroff NV 《Bioinformatics (Oxford, England)》2004,20(7):1196-1197

SUMMARY: Analysis of microarray data most often produces lists of genes with similar expression patterns, which are then subdivided into functional categories for biological interpretation. Such functional categorization is most commonly accomplished using Gene Ontology (GO) categories. Although there are several programs that identify and analyze functional categories for human, mouse and yeast genes, none of them accept Arabidopsis thaliana data. In order to address this need for the A. thaliana community, we have developed a program that retrieves GO annotations for A. thaliana genes and performs functional category analysis for lists of genes selected by the user. AVAILABILITY: http://www.personal.psu.edu/nhs109/Clench

17.

Gene array analysis of osteoblast differentiation. (citations: 4 total, 0 self, 4 by others)

G R Beck B Zerler E Moran 《Cell growth & differentiation》2001,12(2):61-83

18.

Prediction of human protein function according to Gene Ontology categories (citations: 12 total, 0 self, 12 by others)

Jensen LJ Gupta R Staerfeldt HH Brunak S 《Bioinformatics (Oxford, England)》2003,19(5):635-642

19.

An annotation infrastructure for the analysis and interpretation of Affymetrix exon array data

Okoniewski MJ Yates T Dibben S Miller CJ 《Genome biology》2007,8(5):R79

20.

Saccharomyces Genome Database (SGD) provides secondary gene annotation using the Gene Ontology (GO) (citations: 20 total, 2 self, 20 by others)

Selina S. Dwight Midori A. Harris Kara Dolinski Catherine A. Ball Gail Binkley Karen R. Christie Dianna G. Fisk Laurie Issel-Tarver Mark Schroeder Gavin Sherlock Anand Sethuraman Shuai Weng David Botstein J. Michael Cherry 《Nucleic acids research》2002,30(1):69-72
