Similar literature
20 similar articles found (search time: 31 ms)
1.
With numerous whole genomes now in hand, and experimental data about genes and biological pathways on the increase, a systems approach to biological research is becoming essential. Ontologies provide a formal representation of knowledge that is amenable to computational as well as human analysis, an obvious underpinning of systems biology. Mapping function to gene products in the genome consists of two somewhat intertwined enterprises: ontology building and ontology annotation. Ontology building is the formal representation of a domain of knowledge; ontology annotation is the association of specific genomic regions (which we refer to simply as 'genes', including genes and their regulatory elements and products such as proteins and functional RNAs) with parts of the ontology. We consider two complementary representations of gene function: the Gene Ontology (GO) and pathway ontologies. GO represents function from the gene's eye view, in relation to a large and growing context of biological knowledge at all levels. Pathway ontologies represent function from the point of view of biochemical reactions and interactions, which are ordered into networks and causal cascades. The more mature GO provides an example of ontology annotation: how conclusions from the scientific literature and from evolutionary relationships are converted into formal statements about gene function. Annotations are made using a variety of different types of evidence, which can be used to estimate the relative reliability of different annotations.

2.
3.
Additional gene ontology structure for improved biological reasoning (total citations: 5; self-citations: 0; citations by others: 5)
MOTIVATION: The Gene Ontology (GO) is a widely used terminology for gene product characterization in, for example, interpretation of biology underlying microarray experiments. The current GO defines term relationships within each of the independent subontologies: molecular function, biological process and cellular component. However, it is evident that there also exist biological relationships between terms of different subontologies. Our aim was to connect the three subontologies to enable GO to cover more biological knowledge, enable a more consistent use of GO and provide new opportunities for biological reasoning. RESULTS: We propose a new structure, the Second Gene Ontology Layer, capturing biological relations not directly reflected in the present ontology structure. Given molecular functions, these paths identify biological processes where the molecular functions are involved and cellular components where they are active. The current Second Layer contains 6271 validated paths, covering 54% of the molecular functions of GO, and can be used to render existing gene annotation sets more complete and consistent. Applying Second Layer paths to a set of 4223 human genes increased biological process annotations by 24% compared to publicly available annotations and reproduced 30% of them. AVAILABILITY: The Second GO is publicly available through the GO Annotation Toolbox (GOAT.no): http://www.goat.no.
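The idea of applying cross-subontology paths to an existing annotation set can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `apply_second_layer`, the dictionary layout, and the term identifiers below are hypothetical placeholders, and the sketch assumes a path is simply a mapping from one molecular function term to the biological process terms it implies.

```python
from typing import Dict, Set


def apply_second_layer(
    gene_functions: Dict[str, Set[str]],
    paths: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Infer biological process terms for each gene from its molecular functions.

    gene_functions: gene -> set of molecular function terms (hypothetical IDs).
    paths: molecular function term -> set of implied biological process terms.
    """
    inferred: Dict[str, Set[str]] = {}
    for gene, functions in gene_functions.items():
        processes: Set[str] = set()
        for mf_term in functions:
            # A function with no known path contributes nothing.
            processes |= paths.get(mf_term, set())
        inferred[gene] = processes
    return inferred


# Toy data; these identifiers are illustrative, not real GO accessions.
paths = {"MF:kinase_activity": {"BP:phosphorylation"}}
gene_functions = {
    "geneA": {"MF:kinase_activity"},
    "geneB": {"MF:uncharacterized"},
}
inferred = apply_second_layer(gene_functions, paths)
```

Comparing `inferred` with curated process annotations for the same genes would give the kind of "added" versus "reproduced" counts the abstract reports, under the assumptions stated above.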

4.
5.

Background  

Semantic similarity measures are useful to assess the physiological relevance of protein-protein interactions (PPIs). They quantify similarity between proteins based on their function using annotation systems like the Gene Ontology (GO). Proteins that interact in the cell are likely to be in similar locations or involved in similar biological processes compared to proteins that do not interact. Thus, the more semantically similar the gene function annotations are among the interacting proteins, the more likely the interaction is physiologically relevant. However, most semantic similarity measures used for PPI confidence assessment do not consider the unequal depth of term hierarchies in different classes of the cellular location, molecular function, and biological process ontologies of GO and thus may over- or under-estimate similarity.

6.
A central problem in current biology is elucidating the molecular networks that drive developmental change and physiological function. Such knowledge is needed partly to understand these networks, partly to be able to manipulate them, and partly to understand and help treat those human congenital abnormalities that arise as a result of mutation. Thus far, bioinformatics technology has been of limited use in this enterprise, mainly because its core focus has been on sequence technology and data archiving. For bioinformatics to be of use in this next tier of investigations, genetic and protein data need to be both archived and searchable by tissue since this is the level at which these networks operate. The resulting databases in turn require ontologies of developmental anatomy that can provide the formal infrastructure for handling gene expression, microarray and other tissue-based data. Here, the progress in making such ontologies, particularly for the developing mouse, is reported and the uses to which they are and will be put, together with the resources and tools currently available for investigating molecular networks and the genetic basis of congenital abnormalities, are considered.

7.
Currently, literature is integrated in systems biology studies in three ways. First, hand-curated pathways have been sufficient for assembling models in numerous studies. Second, literature is frequently accessed in a derived form, such as the concepts represented by the Medical Subject Headings (MeSH) and Gene Ontologies (GO), or functional relationships captured in protein-protein interaction (PPI) databases; both of these are convenient, consistent reductions of more complex concepts expressed as free text in the literature. Moreover, their contents are easily integrated into computational processes required for dealing with large data sets. Last, mining text directly for specific types of information is on the rise as text analytics methods become more accurate and accessible. These uses of literature, specifically manual curation, derived concepts captured in ontologies and databases, and indirect and direct application of text mining, will be discussed as they pertain to systems biology.

8.
We wished to quantify the state-of-the-art of our understanding of clusters in microarray data. To do this we systematically compared the clusters produced on sets of microarray data using a representative set of clustering algorithms (hierarchical, k-means, and a modified version of QT_CLUST) with the annotation schemes MIPS, GeneOntology and GenProtEC. We assumed that if a cluster reflected known biology its members would share related ontological annotations. This assumption is the basis of "guilt-by-association" and is commonly used to assign the putative function of proteins. To statistically measure the relationship between cluster and annotation we developed a new predictive discriminatory measure. We found that the clusters found in microarray data do not in general agree with functional annotation classes. Although many statistically significant relationships can be found, the majority of clusters are not related to known biology (as described in annotation ontologies). This implies that use of guilt-by-association is not supported by annotation ontologies. Depending on the estimate of the amount of noise in the data, our results suggest that bioinformatics has only codified a small proportion of the biological knowledge required to understand microarray data.

9.
Learnability-based further prediction of gene functions in Gene Ontology (total citations: 9; self-citations: 0; citations by others: 9)
Tu K, Yu H, Guo Z, Li X. Genomics, 2004, 84(6): 922-928
Currently the functional annotations of many genes are not specific enough, limiting their further application in biology and medicine. It is necessary to push the gene functional annotations deeper in Gene Ontology (GO), or to predict further annotated genes with more specific GO terms. A framework of learnability-based further prediction of gene functions in GO is proposed in this paper. Local classifiers are constructed in local classification spaces rooted at qualified parent nodes in GO, and their classification performances are evaluated with the averaged Tanimoto index (ATI). Classification spaces with higher ATIs are selected, and genes annotated only to the parent classes are predicted to child classes. Through learnability-based further prediction, the functional annotations of annotated genes are made more specific. Experiments on the fibroblast serum response dataset reported further functional predictions for several human genes and also gave interesting clues to the varied learnability between classes of different GO ontologies, different levels, and different numbers of child classes.
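The abstract does not define the averaged Tanimoto index, so the following Python sketch rests on an assumption: that ATI is the mean, over genes, of the Tanimoto (Jaccard) coefficient between the set of predicted child classes and the set of true child classes. The function names are hypothetical, and the original paper may use a different formulation.

```python
from typing import Iterable, Sequence, Set


def tanimoto(predicted: Iterable[str], actual: Iterable[str]) -> float:
    """Tanimoto coefficient of two label sets: |A & B| / |A | B|."""
    a, b = set(predicted), set(actual)
    union = a | b
    if not union:
        # Two empty sets are treated as identical (a convention chosen here).
        return 1.0
    return len(a & b) / len(union)


def averaged_tanimoto(
    predicted: Sequence[Set[str]], actual: Sequence[Set[str]]
) -> float:
    """Mean per-gene Tanimoto coefficient (assumed meaning of ATI)."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("at least one gene is required")
    scores = [tanimoto(p, t) for p, t in zip(predicted, actual)]
    return sum(scores) / len(scores)


# Toy example: the first gene is predicted perfectly, the second half-right.
ati = averaged_tanimoto([{"c1"}, {"c1", "c2"}], [{"c1"}, {"c2"}])
```

Under this reading, a classification space would be kept only when its cross-validated `ati` exceeds some chosen threshold; the threshold itself is not given in the abstract.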

10.

Background

Biomedical ontologies are increasingly instrumental in the advancement of biological research primarily through their use to efficiently consolidate large amounts of data into structured, accessible sets. However, ontology development and usage can be hampered by the segregation of knowledge by domain that occurs due to independent development and use of the ontologies. The ability to infer data associated with one ontology to data associated with another ontology would prove useful in expanding information content and scope. We here focus on relating two ontologies: the Gene Ontology (GO), which encodes canonical gene function, and the Mammalian Phenotype Ontology (MP), which describes non-canonical phenotypes, using statistical methods to suggest GO functional annotations from existing MP phenotype annotations. This work is in contrast to previous studies that have focused on inferring gene function from phenotype primarily through lexical or semantic similarity measures.

Results

We have designed and tested a set of algorithms that represents a novel methodology to define rules for predicting gene function by examining the emergent structure and relationships between the gene functions and phenotypes rather than inspecting the terms semantically. The algorithms inspect relationships among multiple phenotype terms to deduce if there are cases where they all arise from a single gene function. We apply this methodology to data about genes in the laboratory mouse that are formally represented in the Mouse Genome Informatics (MGI) resource. From the data, 7444 rule instances were generated from five generalized rules, resulting in 4818 unique GO functional predictions for 1796 genes.

Conclusions

We show that our method is capable of inferring high-quality functional annotations from curated phenotype data. As well as creating inferred annotations, our method has the potential to allow for the elucidation of unforeseen, biologically significant associations between gene function and phenotypes that would be overlooked by a semantics-based approach. Future work will include the implementation of the described algorithms for a variety of other model organism databases, taking full advantage of the abundance of available high quality curated data.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0405-z) contains supplementary material, which is available to authorized users.

11.
Advanced statistical methods used to analyze high-throughput data such as gene-expression assays result in long lists of "significant genes." One way to gain insight into the significance of altered expression levels is to determine whether Gene Ontology (GO) terms associated with a particular biological process, molecular function, or cellular component are over- or under-represented in the set of genes deemed significant. This process, referred to as enrichment analysis, profiles a gene set and is widely used to make sense of the results of high-throughput experiments. The canonical example of enrichment analysis is when the output dataset is a list of genes differentially expressed in some condition. To determine the biological relevance of a lengthy gene list, the usual solution is to perform enrichment analysis with the GO. We can aggregate the annotating GO concepts for each gene in this list, and arrive at a profile of the biological processes or mechanisms affected by the condition under study. While GO has been the principal target for enrichment analysis, the methods of enrichment analysis are generalizable. We can conduct the same sort of profiling along other ontologies of interest. Just as scientists can ask "Which biological process is over-represented in my set of interesting genes or proteins?" we can also ask "Which disease (or class of diseases) is over-represented in my set of interesting genes or proteins?" For example, by annotating known protein mutations with disease terms from the ontologies in BioPortal, Mort et al. recently identified a class of diseases (blood coagulation disorders) that were associated with a 14-fold depletion in substitutions at O-linked glycosylation sites. With the availability of tools for automatic annotation of datasets with terms from disease ontologies, there is no reason to restrict enrichment analyses to the GO.
In this chapter, we will discuss methods to perform enrichment analysis using any ontology available in the biomedical domain. We will review the general methodology of enrichment analysis, the associated challenges, and discuss the novel translational analyses enabled by the existence of public, national computational infrastructure and by the use of disease ontologies in such analyses.

What to Learn in This Chapter

  • Review the commonly used approach of Gene Ontology based enrichment analysis
  • Understand the pitfalls associated with current approaches
  • Understand the national infrastructure available for using alternative ontologies for enrichment analysis
  • Learn about a generalized enrichment analysis workflow and its application using disease ontologies
This article is part of the “Translational Bioinformatics” collection for PLOS Computational Biology.
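The over-representation question described in this abstract is commonly answered with a one-sided hypergeometric test, and the same calculation works for terms from any ontology, GO or disease. The Python sketch below is an illustration under that assumption, not the chapter's own workflow; the function names and the toy gene and term identifiers are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb
from typing import Dict, Set


def hypergeom_enrichment_p(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) when drawing n genes from N, of which K carry the term."""
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= min(K, n)):
        raise ValueError("inconsistent counts")
    upper = min(K, n)
    # math.comb returns 0 when the second argument exceeds the first,
    # so impossible configurations contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def enrich(
    term_to_genes: Dict[str, Set[str]], study: Set[str], population: Set[str]
) -> Dict[str, float]:
    """Over-representation p-value for each ontology term in the study set."""
    study = study & population  # ignore study genes outside the background
    results: Dict[str, float] = {}
    for term, genes in term_to_genes.items():
        annotated = genes & population
        overlap = len(annotated & study)
        results[term] = hypergeom_enrichment_p(
            len(population), len(annotated), len(study), overlap
        )
    return results


# Toy data: four background genes, two of them in the study set.
population = {"g1", "g2", "g3", "g4"}
study = {"g1", "g2"}
term_to_genes = {"TERM:x": {"g1", "g2"}, "TERM:y": {"g3"}}
p_values = enrich(term_to_genes, study, population)
```

In practice the resulting p-values would be corrected for the number of terms tested, and the choice of background population strongly affects the result, which is one of the pitfalls the chapter lists.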

12.
MOTIVATION: Natural language processing (NLP) techniques are increasingly being used in biology to automate the capture of new biological discoveries in text, which are being reported at a rapid rate. Yet, information represented in NLP data structures is classically very different from information organized with ontologies as found in model organisms or genetic databases. To facilitate the computational reuse and integration of information buried in unstructured text with that of genetic databases, we propose and evaluate a translational schema that represents a comprehensive set of phenotypic and genetic entities, as well as their closely related biomedical entities and relations as expressed in natural language. In addition, the schema connects different scales of biological information, and provides mappings from the textual information to existing ontologies, which are essential in biology for integration, organization, dissemination and knowledge management of heterogeneous phenotypic information. A common comprehensive representation for otherwise heterogeneous phenotypic and genetic datasets, such as the one proposed, is critical for advancing systems biology because it enables acquisition and reuse of unprecedented volumes of diverse types of knowledge and information from text. RESULTS: A novel representational schema, PGschema, was developed that enables translation of phenotypic, genetic and their closely related information found in textual narratives to a well-defined data structure comprising phenotypic and genetic concepts from established ontologies along with modifiers and relationships. Evaluation for coverage of a selected set of entities showed that 90% of the information could be represented (95% confidence interval: 86-93%; n = 268). Moreover, PGschema can be expressed automatically in an XML format using natural language techniques to process the text. 
To our knowledge, we are providing the first evaluation of a translational schema for NLP that contains declarative knowledge about genes and their associated biomedical data (e.g. phenotypes). AVAILABILITY: http://zellig.cpmc.columbia.edu/PGschema

13.
We analyze human-specific KEGG pathways to understand the functional role of intrinsic disorder in proteins. Pathways provide a comprehensive picture of biological processes and allow better understanding of a protein's function within the specific context of its surroundings. Our study pinpoints a few specific pathways significantly enriched in disorder-containing proteins and identifies the role of these proteins within the framework of pathway relationships. Three major categories of relations are shown to be significantly enriched in disordered proteins: gene expression, protein binding and, to a lesser degree, protein phosphorylation. Finally, we find that relations involving protein activation and, to some extent, inhibition are characterized by low disorder content.

14.
Integrating 'omic' information: a bridge between genomics and systems biology (total citations: 17; self-citations: 0; citations by others: 17)
The availability of genome sequences for several organisms, including humans, and the resulting first-approximation lists of genes, have allowed a transition from molecular biology to 'modular biology'. In modular biology, biological processes of interest, or modules, are studied as complex systems of functionally interacting macromolecules. Functional genomic and proteomic ('omic') approaches can be helpful to accelerate the identification of the genes and gene products involved in particular modules, and to describe the functional relationships between them. However, the data emerging from individual omic approaches should be viewed with caution because of the occurrence of false-negative and false-positive results and because single annotations are not sufficient for an understanding of gene function. To increase the reliability of gene function annotation, multiple independent datasets need to be integrated. Here, we review the recent development of strategies for such integration and we argue that these will be important for a systems approach to modular biology.

15.
Systems biology in drug discovery (total citations: 15; self-citations: 0; citations by others: 15)
The hope of the rapid translation of 'genes to drugs' has foundered on the reality that disease biology is complex, and that drug development must be driven by insights into biological responses. Systems biology aims to describe and to understand the operation of complex biological systems and ultimately to develop predictive models of human disease. Although meaningful molecular level models of human cell and tissue function are a distant goal, systems biology efforts are already influencing drug discovery. Large-scale gene, protein and metabolite measurements ('omics') dramatically accelerate hypothesis generation and testing in disease models. Computer simulations integrating knowledge of organ and system-level responses help prioritize targets and design clinical trials. Automation of complex primary human cell-based assay systems designed to capture emergent properties can now integrate a broad range of disease-relevant human biology into the drug discovery process, informing target and compound validation, lead optimization, and clinical indication selection. These systems biology approaches promise to improve decision making in pharmaceutical development.

16.
MOTIVATION: An important contribution to the Gene Ontology (GO) project is to develop tools that facilitate the creation, maintenance and use of ontologies. Several tools have been created for communicating and using the GO project. However, a limitation of most of these tools is that they lack a comprehensive search facility. We developed a web application, GOfetcher, with a very comprehensive search facility for the GO project and a variety of output formats for the results. GOfetcher has three different levels for searching the GO: 'Quick Search', 'Advanced Search' and 'Upload Files'. The application includes a unique search option which generates gene information given a nucleotide or protein accession number, which can then be used in generating GO information. The output data in GOfetcher can be saved in several different formats, including spreadsheet, comma-separated values and the extensible markup language (XML) format. The database is available at http://mcbc.usm.edu/gofetcher/.

17.
Developmental biology, like many other areas of biology, has undergone a dramatic shift in the perspective from which developmental processes are viewed. Instead of focusing on the actions of a handful of genes or functional RNAs, we now consider the interactions of large functional gene networks and study how these complex systems orchestrate the unfolding of an organism, from gametes to adult. Developmental biologists are beginning to realize that understanding ontogeny on this scale requires the utilization of computational methods to capture, store and represent the knowledge we have about the underlying processes. Here we review the use of the Gene Ontology (GO) to study developmental biology. We describe the organization and structure of the GO and illustrate some of the ways we use it to capture the current understanding of many common developmental processes. We also discuss ways in which gene product annotations using the GO have been used to ask and answer developmental questions in a variety of model developmental systems. We provide suggestions as to how the GO might be used in more powerful ways to address questions about development. Our goal is to provide developmental biologists with enough background about the GO that they can begin to think about how they might use the ontology efficiently and in the most powerful ways possible. Mol. Reprod. Dev. 77: 314–329, 2010. © 2009 Wiley‐Liss, Inc.

18.
An enormous amount of information and materials in the field of biology has been accumulating, such as nucleotide and amino acid sequences, gene and protein functions, mutants and their phenotypes, and literature references, produced by the rapid development in this field. Effective use of the information may strongly promote biological studies, and may lead to many important findings. It is, however, time-consuming and laborious for individual researchers to collect information from individual original sites and to rearrange it for their own purpose. The concept of ontology has been introduced in biology to support and encourage researchers to share and reuse information among biological databases. An ontology has a glossary, called a dynamic controlled vocabulary, in which relationships between terms are defined. Since each term is strictly defined and identified with an ID number, a set of data represented in a biological ontology is easily accessible to automated information processing, even if the data sets span several databases and/or different organisms. In this mini-review, we introduce activities in Gramene and Oryzabase, which provide biological ontologies for Oryza sativa (rice).

19.
Comprehensive knowledge of protein subcellular localization is an important step toward understanding protein function. Recent advances in systems biology have allowed us to develop more accurate methods for characterizing proteins at the level of subcellular localization. In this study, an analysis method was developed to characterize the topological and biological properties of cytoplasmic proteins, inner membrane proteins, outer membrane proteins and periplasmic proteins in Escherichia coli (E. coli). Statistically significant differences were found in all topological and biological properties among proteins in different subcellular localizations. In addition, the compositions of the 20 amino acids were compared across the four protein categories, and significant differences were found for all 20. These findings may be helpful for understanding the relationship between protein subcellular localization and biological function.

20.
The young investigator award from the Protein Society was a special honor for me because, at its essence, the goal of my laboratory is to define what obscure proteins do. Years ago, I stumbled into mitochondria as a venue for this work, and these organelles continue to define the biological theme of my laboratory. Our approaches are fairly broad, reflecting my own somewhat unorthodox training among diverse scientific fields spanning organic synthesis, chemical biology, mechanistic biochemistry, signal transduction, and systems biology. Yet, whatever the theme or the discipline, we aim to understand how proteins work, especially those that hide in the dark corners of mitochondria. Below, I recount my own path into this arena of protein science, and describe how my experiences along the way have shaped our current multi‐disciplinary efforts to define the inner workings of this complex biological system.
