Found 20 similar articles (search time: 31 ms)
1.
King OD Lee JC Dudley AM Janse DM Church GM Roth FP 《Bioinformatics (Oxford, England)》2003,19(Z1):i183-i189
MOTIVATION: Predicting the outcome of specific experiments (such as the growth of a particular mutant strain in a particular medium) has the potential to allow researchers to devote resources to experiments with higher expected numbers of 'hits'. RESULTS: We use decision trees to predict phenotypes associated with Saccharomyces cerevisiae genes on the basis of Gene Ontology (GO) functional annotations from the Saccharomyces Genome Database (SGD) and other phenotypic annotations from the Yeast Phenotype Catalog at the Munich Information Center for Protein Sequences (MIPS). We assess the methodology in three ways: (1) we use cross-validation on the phenotypic annotations listed in MIPS, and show ROC curves indicating the tradeoff between true-positive rate and false-positive rate; (2) we do a literature search for 100 of the predicted gene-phenotype associations that are not listed in MIPS, and find evidence for 43 of them; (3) we use deletion strains to experimentally assess 61 predicted gene-phenotype associations not listed in MIPS; significantly more of these deletion strains show abnormal growth than would be expected by chance.
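The ROC evaluation mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names `roc_points` and `auc` and the toy scores are assumptions made for illustration. It sweeps a decision threshold over classifier scores and records the false-positive rate and true-positive rate at each step.

```python
def roc_points(scores, labels):
    """Return (false-positive rate, true-positive rate) points for a ROC curve.

    scores: classifier confidence per example (higher = more likely positive).
    labels: 1 for a true gene-phenotype association, 0 otherwise.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")

    # Rank examples from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Examples with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy example (hypothetical scores, not data from the paper).
toy_points = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
print(auc(toy_points))  # prints 0.75
```

In a cross-validation setting, the scores would come from held-out folds so that each point on the curve reflects predictions on annotations the model did not see during training.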
2.
MOTIVATION: Probabilistic graphical models have been developed in the past for the task of protein classification. In many cases, classifications obtained from the Gene Ontology have been used to validate these models. In this work we directly incorporate the structure of the Gene Ontology into the graphical representation for protein classification. We present a method in which each protein is represented by a replicate of the Gene Ontology structure, effectively modeling each protein in its own 'annotation space'. Proteins are also connected to one another according to different measures of functional similarity, after which belief propagation is run to make predictions at all ontology terms. RESULTS: The proposed method was evaluated on a set of 4879 proteins from the Saccharomyces Genome Database whose interactions were also recorded in the GRID project. Results indicate that direct utilization of the Gene Ontology improves predictive ability, outperforming traditional models that do not take advantage of dependencies among functional terms. Average increase in accuracy (precision) of positive and negative term predictions of 27.8% (2.0%) over three different similarity measures and three subontologies was observed. AVAILABILITY: C/C++/Perl implementation is available from authors upon request.
3.
4.
The Alliance of Genome Resources (the Alliance) is a combined effort of 7 knowledgebase projects: Saccharomyces Genome Database, WormBase, FlyBase, Mouse Genome Database, the Zebrafish Information Network, Rat Genome Database, and the Gene Ontology Resource. The Alliance seeks to provide several benefits: better service to the various communities served by these projects; a harmonized view of data for all biomedical researchers, bioinformaticians, clinicians, and students; and a more sustainable infrastructure. The Alliance has harmonized cross-organism data to provide useful comparative views of gene function, gene expression, and human disease relevance. The basis of the comparative views is shared calls of orthology relationships and the use of common ontologies. The key types of data are alleles and variants, gene function based on gene ontology annotations, phenotypes, association to human disease, gene expression, protein–protein and genetic interactions, and participation in pathways. The information is presented on uniform gene pages that allow facile summarization of information about each gene in each of the 7 organisms covered (budding yeast, roundworm Caenorhabditis elegans, fruit fly, house mouse, zebrafish, brown rat, and human). The harmonized knowledge is freely available on the alliancegenome.org portal, as downloadable files, and by APIs. We expect other existing and emerging knowledge bases to join in the effort to provide the union of useful data and features that each knowledge base currently provides.
5.
King NL Deutsch EW Ranish JA Nesvizhskii AI Eddes JS Mallick P Eng J Desiere F Flory M Martin DB Kim B Lee H Raught B Aebersold R 《Genome biology》2006,7(11):R106-15
We present the Saccharomyces cerevisiae PeptideAtlas composed from 47 diverse experiments and 4.9 million tandem mass spectra. The observed peptides align to 61% of Saccharomyces Genome Database (SGD) open reading frames (ORFs), 49% of the uncharacterized SGD ORFs, 54% of S. cerevisiae ORFs with a Gene Ontology annotation of 'molecular function unknown', and 76% of ORFs with Gene names. We highlight the use of this resource for data mining, construction of high quality lists for targeted proteomics, validation of proteins, and software development.
6.
7.
8.
Martin G Reese Barry Moore Colin Batchelor Fidel Salas Fiona Cunningham Gabor T Marth Lincoln Stein Paul Flicek Mark Yandell Karen Eilbeck 《Genome biology》2010,11(8):R88
Here we describe the Genome Variation Format (GVF) and the 10Gen dataset. GVF, an extension of Generic Feature Format version 3 (GFF3), is a simple tab-delimited format for DNA variant files, which uses Sequence Ontology to describe genome variation data. The 10Gen dataset, ten human genomes in GVF format, is freely available for community analysis from the Sequence Ontology website and from an Amazon elastic block storage (EBS) snapshot for use in Amazon's EC2 cloud computing environment.
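Because GVF is described here as a tab-delimited extension of GFF3, a record can be read with a few lines of code. The sketch below is illustrative only: `parse_gvf_line` is a hypothetical helper, and the sample record is made up in the general shape of a GVF line (nine tab-separated GFF3 columns, with `key=value` attributes separated by semicolons), not taken from the 10Gen dataset.

```python
from urllib.parse import unquote

# The first eight GFF3 columns; the ninth holds the attributes.
GFF3_COLUMNS = ("seqid", "source", "type", "start", "end",
                "score", "strand", "phase")


def parse_gvf_line(line):
    """Parse one GVF/GFF3 feature line into a dict.

    Returns None for blank lines, comments, and '##' pragma lines.
    """
    if not line.strip() or line.startswith("#"):
        return None

    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")

    record = dict(zip(GFF3_COLUMNS, fields[:8]))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])

    # Attributes are 'key=value' pairs separated by ';'.
    # Multiple values for one key are comma-separated and percent-encoded.
    attributes = {}
    for pair in fields[8].split(";"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        attributes[key] = [unquote(v) for v in value.split(",")]
    record["attributes"] = attributes
    return record


# Illustrative record (assumed values): a heterozygous SNV whose 'type'
# column holds a Sequence Ontology term.
sample = "\t".join([
    "chr16", "samtools", "SNV", "49291141", "49291141", ".", "+", ".",
    "ID=ID_1;Variant_seq=A,G;Reference_seq=G",
])
snv = parse_gvf_line(sample)
print(snv["type"], snv["start"], snv["attributes"]["Variant_seq"])
```

Keeping attribute values as lists avoids special-casing keys that can carry several comma-separated values, such as the variant alleles in this example.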
9.
Genome-wide analysis of human disease alleles reveals that their locations are correlated in paralogous proteins
Yandell M Moore B Salas F Mungall C MacBride A White C Reese MG 《PLoS computational biology》2008,4(11):e1000218
The millions of mutations and polymorphisms that occur in human populations are potential predictors of disease, of our reactions to drugs, of predisposition to microbial infections, and of age-related conditions such as impaired brain and cardiovascular functions. However, predicting the phenotypic consequences and eventual clinical significance of a sequence variant is not an easy task. Computational approaches have found perturbation of conserved amino acids to be a useful criterion for identifying variants likely to have phenotypic consequences. To our knowledge, however, no study to date has explored the potential of variants that occur at homologous positions within paralogous human proteins as a means of identifying polymorphisms with likely phenotypic consequences. In order to investigate the potential of this approach, we have assembled a unique collection of known disease-causing variants from OMIM and the Human Genome Mutation Database (HGMD) and used them to identify and characterize pairs of sequence variants that occur at homologous positions within paralogous human proteins. Our analyses demonstrate that the locations of variants are correlated in paralogous proteins. Moreover, if one member of a variant-pair is disease-causing, its partner is likely to be disease-causing as well. Thus, information about variant-pairs can be used to identify potentially disease-causing variants, extend existing procedures for polymorphism prioritization, and provide a suite of candidates for further diagnostic and therapeutic purposes.
10.
11.
Mahima Vedi Harika S Nalabolu Chien-Wei Lin Matthew J Hoffman Jennifer R Smith Kent Brodie Jeffrey L De Pons Wendy M Demos Adam C Gibson G Thomas Hayman Morgan L Hill Mary L Kaldunski Logan Lamers Stanley J F Laulederkind Ketaki Thorat Jyothi Thota Monika Tutaj Marek A Tutaj Shur-Jen Wang Stacy Zacher Melinda R Dwinell Anne E Kwitek 《Genetics》2022,220(4)
Biological interpretation of a large amount of gene or protein data is complex. Ontology analysis tools are imperative in finding functional similarities through overrepresentation or enrichment of terms associated with the input gene or protein lists. However, most tools are limited by their ability to do ontology-specific and species-limited analyses. Furthermore, some enrichment tools are not updated frequently with recent information from databases, thus giving users inaccurate, outdated or uninformative data. Here, we present MOET or the Multi-Ontology Enrichment Tool (v.1 released in April 2019 and v.2 released in May 2021), an ontology analysis tool leveraging data that the Rat Genome Database (RGD) integrated from in-house expert curation and external databases including the National Center for Biotechnology Information (NCBI), Mouse Genome Informatics (MGI), The Kyoto Encyclopedia of Genes and Genomes (KEGG), The Gene Ontology Resource, UniProt-GOA, and others. Given a gene or protein list, MOET analysis identifies significantly overrepresented ontology terms using a hypergeometric test and provides nominal and Bonferroni corrected P-values and odds ratios for the overrepresented terms. The results are shown as a downloadable list of terms with and without Bonferroni correction, and a graph of the P-values and number of annotated genes for each term in the list. MOET can be accessed freely from https://rgd.mcw.edu/rgdweb/enrichment/start.html.
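The hypergeometric over-representation test with Bonferroni correction named in this abstract can be sketched as follows. This is a generic illustration, not MOET's implementation: the function names, the toy annotations, and the choice to use the number of candidate terms as the Bonferroni factor are all assumptions, and odds ratios are not computed here.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n genes from N, of which K carry the term."""
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, upper + 1)) / total


def enrich(gene_list, annotations, background):
    """Rank ontology terms by over-representation in gene_list.

    annotations: dict mapping term -> set of annotated genes (toy input).
    background:  set of all genes considered.
    Returns (term, overlap, nominal_p, bonferroni_p) tuples, best first.
    """
    genes = set(gene_list) & background
    N, n = len(background), len(genes)
    n_tests = len(annotations)  # assumed Bonferroni factor: one test per term

    results = []
    for term, members in annotations.items():
        members = members & background
        k = len(genes & members)
        p = hypergeom_sf(k, N, len(members), n)
        results.append((term, k, p, min(1.0, p * n_tests)))
    return sorted(results, key=lambda row: row[2])


# Toy data (hypothetical gene and term names).
background = {f"g{i}" for i in range(1, 11)}
annotations = {
    "term_A": {"g1", "g2", "g3"},
    "term_B": {"g4", "g5", "g6", "g7", "g8"},
}
for row in enrich(["g1", "g2", "g3", "g4"], annotations, background):
    print(row)
```

With this toy input, all three `term_A` genes fall in the four-gene list, giving a nominal P-value of 7/210 (about 0.033) and a Bonferroni-corrected value of about 0.067, while `term_B` is not enriched.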
12.
Birgit H M Meldal Carles Pons Livia Perfetto Noemi Del-Toro Edith Wong Patrick Aloy Henning Hermjakob Sandra Orchard Pablo Porras 《Nucleic acids research》2021,49(6):3156
The EMBL-EBI Complex Portal is a knowledgebase of macromolecular complexes providing persistent stable identifiers. Entries are linked to literature evidence and provide details of complex membership, function, structure and complex-specific Gene Ontology annotations. Data are freely available and downloadable in HUPO-PSI community standards and missing entries can be requested for curation. In collaboration with Saccharomyces Genome Database and UniProt, the yeast complexome, a compendium of all known heteromeric assemblies from the model organism Saccharomyces cerevisiae, was curated. This expansion of knowledge and scope has led to a 50% increase in curated complexes compared to the previously published dataset, CYC2008. The yeast complexome is used as a reference resource for the analysis of complexes from large-scale experiments. Our analysis showed that genes coding for proteins in complexes tend to have more genetic interactions, are co-expressed with more genes, are more multifunctional, localize more often in the nucleus, and are more often involved in nucleic acid-related metabolic processes and processes where large machineries are the predominant functional drivers. A comparison to genetic interactions showed that about 40% of expanded co-complex pairs also have genetic interactions, suggesting strong functional links between complex members.
13.
Hunter L 《Genome biology》2002,3(6):interactions1002.1-interactions1002.2
A response to "Life sentences: Ontology recapitulates philology" by Sydney Brenner, Genome Biology 2002, 3:comment1006.1-1006.2.
14.
15.
Karolchik D Baertsch R Diekhans M Furey TS Hinrichs A Lu YT Roskin KM Schwartz M Sugnet CW Thomas DJ Weber RJ Haussler D Kent WJ;University of California Santa Cruz 《Nucleic acids research》2003,31(1):51-54
The University of California Santa Cruz (UCSC) Genome Browser Database is an up-to-date source for genome sequence data integrated with a large collection of related annotations. The database is optimized to support fast interactive performance with the web-based UCSC Genome Browser, a tool built on top of the database for rapid visualization and querying of the data at many levels. The annotations for a given genome are displayed in the browser as a series of tracks aligned with the genomic sequence. Sequence data and annotations may also be viewed in a text-based tabular format or downloaded as tab-delimited flat files. The Genome Browser Database, browsing tools and downloadable data files can all be found on the UCSC Genome Bioinformatics website (http://genome.ucsc.edu), which also contains links to documentation and related technical information.
16.
The Human Gene Mutation Database (HGMD) represents a comprehensive core collection of data on published germline mutations in nuclear genes underlying human inherited disease. By September 1997, the database contained nearly 12 000 different lesions in a total of 636 different genes, with new entries currently accumulating at a rate of over 2000 per annum. Although originally established for the scientific study of mutational mechanisms in human genes, HGMD has acquired a much broader utility to researchers, physicians and genetic counsellors, so it was made publicly available at http://uwcm.ac.uk/uwcm/mg/hgmd0.html in April 1996. Mutation data in HGMD are accessible on the basis of every gene being allocated one web page per mutation type, if data of that type are present. Meaningful integration with phenotypic, structural and mapping information has been accomplished through bi-directional links between HGMD and both the Genome Database (GDB) and Online Mendelian Inheritance in Man (OMIM), Baltimore, USA. Hypertext links have also been established to Medline abstracts through Entrez, and to a collection of 458 reference cDNA sequences also used for data checking. Being both comprehensive and fully integrated into the existing bioinformatics structures relevant to human genetics, HGMD has established itself as the central core database of inherited human gene mutations.
17.
Jaiswal P Ware D Ni J Chang K Zhao W Schmidt S Pan X Clark K Teytelman L Cartinhour S Stein L McCouch S 《Comparative and Functional Genomics》2002,3(2):132-136
Gramene (http://www.gramene.org/) is a comparative genome database for cereal crops and a community resource for rice. We are populating and curating Gramene with annotated rice (Oryza sativa) genomic sequence data and associated biological information including molecular markers, mutants, phenotypes, polymorphisms and Quantitative Trait Loci (QTL). In order to support queries across various data sets as well as across external databases, Gramene will employ three related controlled vocabularies. The specific goal of Gramene is, first, to provide a Trait Ontology (TO) that can be used across the cereal crops to facilitate phenotypic comparisons both within and between the genera. Second, a vocabulary for plant anatomy terms, the Plant Ontology (PO), will facilitate the curation of morphological and anatomical feature information with respect to expression, localization of genes and gene products and the affected plant parts in a phenotype. The TO and PO are both in the early stages of development in collaboration with the International Rice Research Institute, TAIR and MaizeDB as part of the Plant Ontology Consortium. Finally, as part of another consortium comprising macromolecular databases from other model organisms, the Gene Ontology Consortium, we are annotating the confirmed and predicted protein entries from rice using both electronic and manual curation.
18.
Background
Biomedical ontologies are increasingly instrumental in the advancement of biological research primarily through their use to efficiently consolidate large amounts of data into structured, accessible sets. However, ontology development and usage can be hampered by the segregation of knowledge by domain that occurs due to independent development and use of the ontologies. The ability to infer data associated with one ontology to data associated with another ontology would prove useful in expanding information content and scope. We here focus on relating two ontologies: the Gene Ontology (GO), which encodes canonical gene function, and the Mammalian Phenotype Ontology (MP), which describes non-canonical phenotypes, using statistical methods to suggest GO functional annotations from existing MP phenotype annotations. This work is in contrast to previous studies that have focused on inferring gene function from phenotype primarily through lexical or semantic similarity measures.
Results
We have designed and tested a set of algorithms that represents a novel methodology to define rules for predicting gene function by examining the emergent structure and relationships between the gene functions and phenotypes rather than inspecting the terms semantically. The algorithms inspect relationships among multiple phenotype terms to deduce if there are cases where they all arise from a single gene function. We apply this methodology to data about genes in the laboratory mouse that are formally represented in the Mouse Genome Informatics (MGI) resource. From the data, 7444 rule instances were generated from five generalized rules, resulting in 4818 unique GO functional predictions for 1796 genes.
Conclusions
We show that our method is capable of inferring high-quality functional annotations from curated phenotype data. As well as creating inferred annotations, our method has the potential to allow for the elucidation of unforeseen, biologically significant associations between gene function and phenotypes that would be overlooked by a semantics-based approach. Future work will include the implementation of the described algorithms for a variety of other model organism databases, taking full advantage of the abundance of available high quality curated data.
Electronic supplementary material
The online version of this article (doi:10.1186/s12859-014-0405-z) contains supplementary material, which is available to authorized users.
19.
The Human Phenotype Ontology: a tool for annotating and analyzing human hereditary disease (total citations: 2; self-citations: 0; citations by others: 2)
Robinson PN Köhler S Bauer S Seelow D Horn D Mundlos S 《American journal of human genetics》2008,83(5):610-615
There are many thousands of hereditary diseases in humans, each of which has a specific combination of phenotypic features, but computational analysis of phenotypic data has been hampered by lack of adequate computational data structures. Therefore, we have developed a Human Phenotype Ontology (HPO) with over 8000 terms representing individual phenotypic anomalies and have annotated all clinical entries in Online Mendelian Inheritance in Man with the terms of the HPO. We show that the HPO is able to capture phenotypic similarities between diseases in a useful and highly significant fashion.
20.
The value of the Genome Database (GDB) for the human genome research community has been greatly increased since the release of version 6.0 last year. Thanks to the introduction of significant technical improvements, GDB has seen dramatic growth in the type and volume of information stored in the database. This article summarizes the types of data that are now available in the Genome Database, demonstrates how the database is interconnected with other biomedical resources on the World Wide Web, discusses how researchers can contribute new or updated information to the database, and describes our current efforts as well as planned improvements for the future.