
1.

On gene ontology and function annotation

Pal D 《Bioinformation》2006,1(3):97-98

The effort of function annotation does not merely involve associating a gene with some structured vocabulary that describes action. Rather, the details of the actions, the components of the actions, and the larger context of the actions are important issues of direct relevance, because they help us understand the biological system to which the gene/protein belongs. Currently the Gene Ontology (GO) Consortium offers the most comprehensive sets of relationships to describe gene/protein activity. However, its choice to segregate gene ontology into the subdomains of molecular function, biological process and cellular component is creating significant limitations in terms of future scope of use. If we are to understand biology in its total complexity, comprehensive ontologies in larger biological domains are essential. A vigorous discussion on this topic is necessary for the larger benefit of the biological community. I highlight this point because larger-bio-domain ontologies cannot be created simply by integrating subdomain ontologies. Relationships in larger-bio-domain ontologies are more complex due to the larger size of the system and are therefore more labor intensive to create. The current limitations of GO will be a handicap in the derivation of more complex relationships from high-throughput biology data.

2.

A dictionary-based approach for gene annotation.

L Pachter S Batzoglou V I Spitkovsky E Banks E S Lander D J Kleitman B Berger 《Journal of computational biology》1999,6(3-4):419-430

This paper describes a fast and fully automated dictionary-based approach to gene annotation and exon prediction. Two dictionaries are constructed, one from the nonredundant protein OWL database and the other from the dbEST database. These dictionaries are used to obtain O(1)-time lookups of tuples in the dictionaries (4-tuples for the OWL database and 11-tuples for the dbEST database). These tuples can be used to rapidly find the longest matches at every position in an input sequence to the database sequences. Such matches provide very useful information pertaining to locating common segments between exons, alternative splice sites, and frequency data of long tuples for statistical purposes. These dictionaries also provide the basis for both homology determination and statistical approaches to exon prediction.
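
The tuple-lookup idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy sequences and the choice of a hash map are assumptions made for illustration. A dictionary of k-tuples gives expected constant-time seed lookups, and each seed is then extended to the longest exact match starting at that query position. Matches shorter than k are not reported.

```python
from collections import defaultdict


def build_dictionary(sequences, k):
    """Map every k-tuple to the (sequence index, offset) pairs where it occurs."""
    index = defaultdict(list)
    for seq_id, seq in enumerate(sequences):
        for offset in range(len(seq) - k + 1):
            index[seq[offset:offset + k]].append((seq_id, offset))
    return index


def longest_matches(query, sequences, index, k):
    """For each query position, return the length of the longest exact match
    starting there against any database sequence (0 if no k-tuple seed exists)."""
    best = [0] * len(query)
    for pos in range(len(query) - k + 1):
        # Expected O(1) lookup of the k-tuple that starts at this position.
        for seq_id, offset in index.get(query[pos:pos + k], ()):
            target = sequences[seq_id]
            length = k
            # Extend the seed to the right while both sequences agree.
            while (pos + length < len(query)
                   and offset + length < len(target)
                   and query[pos + length] == target[offset + length]):
                length += 1
            best[pos] = max(best[pos], length)
    return best


# Hypothetical toy database and query.
database = ["ACGTACGT", "TTACGG"]
kmer_index = build_dictionary(database, 4)
print(longest_matches("GACGTAC", database, kmer_index, 4))
```

With k set to 4 or 11, the same structure would mirror the tuple sizes quoted in the abstract; the extension step is the simplest possible one and ignores mismatches and gaps.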

3.

miRDB: a microRNA target prediction and functional annotation database with a wiki interface (total citations: 1; self-citations: 0; citations by others: 1)

Wang X 《RNA (New York, N.Y.)》2008,14(6):1012-1017

MicroRNAs (miRNAs) are short noncoding RNAs that are involved in the regulation of thousands of gene targets. Recent studies indicate that miRNAs are likely to be master regulators of many important biological processes. Due to their functional importance, miRNAs are under intense study at present, and many studies have been published in recent years on miRNA functional characterization. The rapid accumulation of miRNA knowledge makes it challenging to properly organize and present miRNA function data. Although several miRNA functional databases have been developed recently, this remains a major bioinformatics challenge to the miRNA research community. Here, we describe a new online database system, miRDB, on miRNA target prediction and functional annotation. A flexible web search interface was developed for the retrieval of target prediction results, which were generated with a new bioinformatics algorithm we developed recently. Unlike most other miRNA databases, miRNA functional annotations in miRDB are presented with a primary focus on mature miRNAs, which are the functional carriers of miRNA-mediated gene expression regulation. In addition, a wiki editing interface was established to allow anyone with Internet access to make contributions on miRNA functional annotation. This is a new attempt to develop an interactive community-annotated miRNA functional catalog. All data stored in miRDB are freely accessible at http://mirdb.org.

4.

A combined approach for genome wide protein function annotation/prediction

Alfredo Benso Stefano Di Carlo Hafeez ur Rehman Gianfranco Politano Alessandro Savino Prashanth Suravajhala 《Proteome science》2013,11(Z1):S1

Background

Today large scale genome sequencing technologies are uncovering an increasing amount of new genes and proteins, which remain uncharacterized. Experimental procedures for protein function prediction are low throughput by nature and thus can't be used to keep up with the rate at which new proteins are discovered. On the other hand, proteins are the prominent stakeholders in almost all biological processes, and therefore the need to precisely know their functions for a better understanding of the underlying biological mechanism is inevitable. The challenge of annotating uncharacterized proteins in functional genomics and biology in general motivates the use of computational techniques well orchestrated to accurately predict their functions.

Methods

We propose a computational flow for the functional annotation of a protein that assigns the most probable functions to the protein by aggregating heterogeneous information. The information considered includes protein motifs, protein sequence similarity, and protein homology data gathered from interacting proteins, combined with data from highly similar non-interacting proteins (hereinafter called Similactors). Moreover, to increase the predictive power of our model, we also compute and integrate term-specific relationships among functional terms based on Gene Ontology (GO).

Results

We tested our method on Saccharomyces cerevisiae and Homo sapiens proteins. The aggregation of different structural and functional evidence with GO relationships outperforms the other methods reported in the literature in terms of precision and accuracy of prediction. The precision and accuracy of prediction are 100% for more than half of the input set for both species; overall, we obtained 85.38% precision and 81.95% accuracy for Homo sapiens and 79.73% precision and 80.06% accuracy for Saccharomyces cerevisiae proteins.
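
The abstract does not state how precision and accuracy were computed. As a rough sketch, assuming the standard definitions over a fixed set of candidate terms (precision = TP / (TP + FP), accuracy = (TP + TN) / all candidates), the two figures for a single protein could be derived as follows. The function name and the term labels are hypothetical.

```python
def precision_and_accuracy(predicted, actual, universe):
    """Score one protein's predicted terms against its known terms.

    Assumes `predicted` and `actual` are subsets of `universe`, the set of
    all candidate terms considered for this protein.
    """
    true_pos = len(predicted & actual)
    false_pos = len(predicted - actual)
    true_neg = len(universe - predicted - actual)
    # Precision is undefined with no predictions; report 0.0 in that case.
    precision = true_pos / (true_pos + false_pos) if predicted else 0.0
    accuracy = (true_pos + true_neg) / len(universe)
    return precision, accuracy


# Hypothetical candidate terms and annotations for one protein.
candidates = {"term_a", "term_b", "term_c", "term_d", "term_e"}
precision, accuracy = precision_and_accuracy(
    predicted={"term_a", "term_b", "term_c"},
    actual={"term_a", "term_b", "term_d"},
    universe=candidates,
)
print(f"precision={precision:.2f} accuracy={accuracy:.2f}")
```

Under these definitions a protein scores 100% on both measures only when its predicted set equals its known set, which is one way to read the abstract's statement about more than half of the input set.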


5.

A bag-of-words approach for Drosophila gene expression pattern annotation

Shuiwang Ji Ying-Xin Li Zhi-Hua Zhou Sudhir Kumar Jieping Ye 《BMC bioinformatics》2009,10(1):119

Background

Drosophila gene expression pattern images document the spatiotemporal dynamics of gene expression during embryogenesis. A comparative analysis of these images could provide a fundamentally important way for studying the regulatory networks governing development. To facilitate pattern comparison and searching, groups of images in the Berkeley Drosophila Genome Project (BDGP) high-throughput study were annotated with a variable number of anatomical terms manually using a controlled vocabulary. Considering that the number of available images is rapidly increasing, it is imperative to design computational methods to automate this task.

6.

Community-based gene structure annotation

Schlueter SD Wilkerson MD Huala E Rhee SY Brendel V 《Trends in plant science》2005,10(1):9-14

Uncertainty and inconsistency of gene structure annotation remain limitations on research in the genome era, frustrating both biologists and bioinformaticians, who have to sort out annotation errors for their genes of interest or to generate trustworthy datasets for algorithmic development. It is unrealistic to hope for better software solutions in the near future that would solve all the problems. The issue is all the more urgent with more species being sequenced and analyzed by comparative genomics - erroneous annotations could easily propagate, whereas correct annotations in one species will greatly facilitate annotation of novel genomes. We propose a dynamic, economically feasible solution to the annotation predicament: broad-based, web-technology-enabled community annotation, a prototype of which is now in use for Arabidopsis.

7.

Complexity of automated gene annotation

Nikoloski Z Grimbs S Klie S Selbig J 《Bio Systems》2011,104(1):1-8

Describing the determinants of robustness of biological systems has become one of the central questions in systems biology. Despite the increasing research efforts, it has proven difficult to arrive at a unifying definition for this important concept. We argue that this is due to the multifaceted nature of the concept of robustness and the possibility to formally capture it at different levels of systemic formalisms (e.g., topology and dynamic behavior). Here we provide a comprehensive review of the existing definitions of robustness pertaining to metabolic networks. As kinetic approaches have been excellently reviewed elsewhere, we focus on definitions of robustness proposed within graph-theoretic and constraint-based formalisms.

8.

Introduction: Validation methods for function genome annotation

Stodolsky M 《BMC genomics》2011,12(Z1):I1


9.

Calling on a million minds for community annotation in WikiProteins (total citations: 1; self-citations: 0; citations by others: 1)

Mons B Ashburner M Chichester C van Mulligen E Weeber M den Dunnen J van Ommen GJ Musen M Cockerill M Hermjakob H Mons A Packer A Pacheco R Lewis S Berkeley A Melton W Barris N Wales J Meijssen G Moeller E Roes PJ Borner K Bairoch A 《Genome biology》2008,9(5):R89-15

WikiProteins enables community annotation in a Wiki-based system. Extracts of major data sources have been fused into an editable environment that links out to the original sources. Data from community edits create automatic copies of the original data. Semantic technology captures concepts co-occurring in one sentence and thus potential factual statements. In addition, indirect associations between concepts have been calculated. We call on a 'million minds' to annotate a 'million concepts' and to collect facts from the literature with the reward of collaborative knowledge discovery. The system is available for beta testing at http://www.wikiprofessional.org.

10.

Information theory applied to the sparse gene ontology annotation network to predict novel gene function

Tao Y Sam L Li J Friedman C Lussier YA 《Bioinformatics (Oxford, England)》2007,23(13):i529-i538

MOTIVATION: Despite advances in the gene annotation process, the functions of a large portion of gene products remain insufficiently characterized. In addition, the in silico prediction of novel Gene Ontology (GO) annotations for partially characterized gene functions or processes is highly dependent on reverse genetic or functional genomic approaches. To our knowledge, no prediction method has been demonstrated to be highly accurate for sparsely annotated GO terms (those associated to fewer than 10 genes). RESULTS: We propose a novel approach, information theory-based semantic similarity (ITSS), to automatically predict molecular functions of genes based on existing GO annotations. Using a 10-fold cross-validation, we demonstrate that the ITSS algorithm obtains prediction accuracies (precision 97%, recall 77%) comparable to other machine learning algorithms when compared in similar conditions over densely annotated portions of the GO datasets. This method is able to generate highly accurate predictions in sparsely annotated portions of GO, where previous algorithms have failed. As a result, our technique generates an order of magnitude more functional predictions than previous methods. A 10-fold cross validation demonstrated a precision of 90% at a recall of 36% for the algorithm over sparsely annotated networks of the recent GO annotations (about 1400 GO terms and 11,000 genes in Homo sapiens). To our knowledge, this article presents the first historical rollback validation for the predicted GO annotations, which may represent more realistic conditions than more widely used cross-validation approaches. By manually assessing a random sample of 100 predictions conducted in a historical rollback evaluation, we estimate that a minimum precision of 51% (95% confidence interval: 43-58%) can be achieved for the human GO Annotation file dated 2003. AVAILABILITY: The program is available on request. 
The 97,732 positive predictions of novel gene annotations from the 2005 GO Annotation dataset and other supplementary information are available at http://phenos.bsd.uchicago.edu/ITSS/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
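
The abstract does not spell out how ITSS scores term similarity. To illustrate the general idea of information-theoretic semantic similarity over an annotation network, the sketch below uses a Resnik-style measure as a stand-in, not the ITSS algorithm itself: a term's information content is IC(t) = -log p(t), where p(t) is the share of genes annotated to t or one of its descendants, and two terms are as similar as their most informative common ancestor. The toy ontology, gene names and function names are all hypothetical.

```python
import math

# Hypothetical toy ontology: term -> set of parent terms.
PARENTS = {
    "molecular_function": set(),
    "binding": {"molecular_function"},
    "catalytic": {"molecular_function"},
    "dna_binding": {"binding"},
    "rna_binding": {"binding"},
}

# Hypothetical direct annotations: gene -> set of terms.
ANNOTATIONS = {
    "g1": {"dna_binding"},
    "g2": {"rna_binding"},
    "g3": {"catalytic"},
    "g4": {"dna_binding"},
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """IC(t) = -log p(t); assumes at least one gene is annotated under t."""
    hits = sum(
        1 for terms in ANNOTATIONS.values()
        if any(term in ancestors(t) for t in terms)
    )
    return -math.log(hits / len(ANNOTATIONS))


def resnik_similarity(term_a, term_b):
    """Information content of the most informative common ancestor."""
    common = ancestors(term_a) & ancestors(term_b)
    return max(information_content(t) for t in common)


print(resnik_similarity("dna_binding", "rna_binding"))
print(resnik_similarity("dna_binding", "catalytic"))
```

Rarely used terms carry high information content, which is one reason measures of this family remain informative for sparsely annotated terms, the case the abstract emphasizes.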

11.

MGOS: development of a community annotation database for Magnaporthe oryzae

Kour A Greer K Valent B Orbach MJ Soderlund C 《Molecular plant-microbe interactions : MPMI》2012,25(3):271-278


12.

ASAP, a systematic annotation package for community analysis of genomes (total citations: 10; self-citations: 0; citations by others: 10)

Glasner JD Liss P Plunkett G Darling A Prasad T Rusch M Byrnes A Gilson M Biehl B Blattner FR Perna NT 《Nucleic acids research》2003,31(1):147-151


13.

The use of multiple hierarchically independent gene ontology terms in gene function prediction and genome annotation

Kourmpetis YA van der Burgt A Bink MC Ter Braak CJ van Ham RC 《In silico biology》2007,7(6):575-582


14.

Paradigm shifts in the approaches for gene annotation

Thanaraj TA Robinson A Muilu J Riethoven JJ 《Briefings in bioinformatics》2000,1(4):324-329


15.

Enhanced function annotations for Drosophila serine proteases: a case study for systematic annotation of multi-member gene families (total citations: 1; self-citations: 0; citations by others: 1)

Shah PK Tripathi LP Jensen LJ Gahnim M Mason C Furlong EE Rodrigues V White KP Bork P Sowdhamini R 《Gene》2008,407(1-2):199-215

Systematically annotating function of enzymes that belong to large protein families encoded in a single eukaryotic genome is a very challenging task. We carried out such an exercise to annotate function for serine-protease family of the trypsin fold in Drosophila melanogaster, with an emphasis on annotating serine-protease homologues (SPHs) that may have lost their catalytic function. Our approach involves data mining and data integration to provide function annotations for 190 Drosophila gene products containing serine-protease-like domains, of which 35 are SPHs. This was accomplished by analysis of structure-function relationships, gene-expression profiles, large-scale protein-protein interaction data, literature mining and bioinformatic tools. We introduce functional residue clustering (FRC), a method that performs hierarchical clustering of sequences using properties of functionally important residues and utilizes correlation co-efficient as a quantitative similarity measure to transfer in vivo substrate specificities to proteases. We show that the efficiency of transfer of substrate-specificity information using this method is generally high. FRC was also applied on Drosophila proteases to assign putative competitive inhibitor relationships (CIRs). Microarray gene-expression data were utilized to uncover a large-scale and dual involvement of proteases in development and in immune response. We found specific recruitment of SPHs and proteases with CLIP domains in immune response, suggesting evolution of a new function for SPHs. We also suggest existence of separate downstream protease cascades for immune response against bacterial/fungal infections and parasite/parasitoid infections. We verify quality of our annotations using information from RNAi screens and other evidence types. 
Utilization of such multi-fold approaches results in a 10-fold increase of function annotation for Drosophila serine proteases and demonstrates value in increasing annotations in multiple genomes.

16.

DIG--a system for gene annotation and functional discovery

Delong M Yao G Wang Q Dobra A Black EP Chang JT Bild A West M Nevins JR Dressman H 《Bioinformatics (Oxford, England)》2005,21(13):2957-2959

SUMMARY: We describe a database and information discovery system named DIG (Duke Integrated Genomics) designed to facilitate the process of gene annotation and the discovery of functional context. The DIG system collects and organizes gene annotation and functional information, and includes tools that support an understanding of genes in a functional context by providing a framework for integrating and visualizing gene expression, protein interaction and literature-based interaction networks.

17.

Mass spectrometry-based prokaryote gene annotation

Ishino Y Okada H Ikeuchi M Taniguchi H 《Proteomics》2007,7(22):4053-4065

MS combined with database searching has become the preferred method for identifying proteins present in cell or tissue samples. The technique enables us to execute large-scale proteome analyses of species whose genomes have already been sequenced. Searching mass spectrometric data against protein databases composed of annotated genes has been widely conducted. However, there are some issues with this technique: wrong annotations in protein databases cause deterioration in the accuracy of protein identification, and only proteins that have already been annotated can be identified. We propose a new framework that can detect correct ORFs by integrating MS/MS proteomic data mapping and a knowledge-based system regarding the translation initiation sites. This technique can provide correction of predicted coding sequences, together with the possibility of identifying novel genes. We have developed a computational system that first conducts probabilistic peptide matching against all possible translational frames using MS/MS data, then searches for discriminative DNA patterns around the detected peptides, and lastly integrates the facts using empirical knowledge stored in knowledge bases to obtain correct ORFs. We used the photosynthetic bacterium Synechocystis sp. PCC6803 as a sample prokaryote, resulting in the finding of 14 N-terminus annotation errors and several new candidate genes.
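
The first step described above, matching peptides against all possible translational frames, can be sketched in simplified form. The sketch below substitutes exact string matching for the probabilistic MS/MS scoring the authors describe, assumes the standard genetic code and upper-case A/C/G/T input, and uses hypothetical function names and a made-up toy sequence.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from the start of dna; '*' marks a stop."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def six_frame_hits(dna, peptide):
    """Return (strand, frame, amino-acid offset) for every exact peptide match."""
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            protein = translate(seq[frame:])
            start = protein.find(peptide)
            while start != -1:
                hits.append((strand, frame, start))
                start = protein.find(peptide, start + 1)
    return hits


# Toy sequence: ATG GCT AAA (M A K) starts two bases into the forward strand.
print(six_frame_hits("CCATGGCTAAATGA", "MAK"))
```

Because the search ignores existing gene models, a peptide that maps upstream of an annotated start codon is the kind of evidence that could flag an N-terminus annotation error of the sort reported in the abstract.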

18.

Evolution of gene regulation of pluripotency - the case for wiki tracks at genome browsers

Georg Fuellen Stephan Struckmann 《Biology direct》2010,5(1):67


19.

Automated discovery of 3D motifs for protein function annotation (total citations: 2; self-citations: 0; citations by others: 2)

Polacco BJ Babbitt PC 《Bioinformatics (Oxford, England)》2006,22(6):723-730

MOTIVATION: Function inference from structure is facilitated by the use of patterns of residues (3D motifs), normally identified by expert knowledge, that correlate with function. As an alternative to often limited expert knowledge, we use machine-learning techniques to identify patterns of 3-10 residues that maximize function prediction. This approach allows us to test the assumption that residues that provide function are the most informative for predicting function. RESULTS: We apply our method, GASPS, to the haloacid dehalogenase, enolase, amidohydrolase and crotonase superfamilies and to the serine proteases. The motifs found by GASPS are as good at function prediction as 3D motifs based on expert knowledge. The GASPS motifs with the greatest ability to predict protein function consist mainly of known functional residues. However, several residues with no known functional role are equally predictive. For four groups, we show that the predictive power of our 3D motifs is comparable with or better than approaches that use the entire fold (Combinatorial-Extension) or sequence profiles (PSI-BLAST). AVAILABILITY: Source code is freely available for academic use by contacting the authors. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

20.

Protein function annotation by homology-based inference

Yaniv Loewenstein Domenico Raimondo Oliver C Redfern James Watson Dmitrij Frishman Michal Linial Christine Orengo Janet Thornton Anna Tramontano 《Genome biology》2009,10(2):207-8

With many genomes now sequenced, computational annotation methods to characterize genes and proteins from their sequence are increasingly important. The BioSapiens Network has developed tools to address all stages of this process, and here we review progress in the automated prediction of protein function based on protein sequence and structure.