Similar Articles
1.
MOTIVATION: The availability of complete genome sequences shifts scientific attention toward identifying the functions of genomes on a large scale. In the near future, data on cellular processes at the molecular level will accumulate at an accelerating rate as a result of proteomics studies. It is therefore essential to develop tools for storing, integrating, accessing, and analyzing these data effectively. RESULTS: We define an ontology for a comprehensive representation of cellular events. The ontology enables the integration of fragmented or incomplete pathway information. It also supports manipulation and incorporation of the stored data, as well as multiple levels of abstraction. Based on this ontology, we present the architecture of an integrated environment named Patika (Pathway Analysis Tool for Integration and Knowledge Acquisition). Patika is composed of a scalable, object-oriented, server-side database and client-side editors, providing an integrated, multi-user environment for visualizing and manipulating networks of cellular events. The tool features automated pathway layout, functional computation support, advanced querying, and a user-friendly graphical interface. We expect Patika to be a valuable tool for rapid knowledge acquisition, interpretation of large-scale microarray data, disease gene identification, and drug development. AVAILABILITY: A prototype of Patika is available upon request from the authors.
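The integration of fragmented or incomplete pathway information described above can be illustrated with a minimal sketch. It assumes fragments share entity identifiers and are stored as (source, target, interaction type) triples. Patika itself uses an object-oriented database, so the edge format, IDs and the `merge_fragments` name are hypothetical and only show the idea of uniting overlapping fragments into one network.

```python
from collections import defaultdict

# Hypothetical edge format: (source_id, target_id, interaction_type).
# Entity IDs are assumed to be shared across fragments; this is not Patika's schema.
Edge = tuple[str, str, str]


def merge_fragments(*fragments: set[Edge]) -> dict[str, set[tuple[str, str]]]:
    """Union pathway fragments into one adjacency map: source -> {(target, type)}."""
    graph: dict[str, set[tuple[str, str]]] = defaultdict(set)
    for fragment in fragments:
        for source, target, kind in fragment:
            # Identical edges reported by several fragments collapse into one entry.
            graph[source].add((target, kind))
            # Register sink nodes too, so entities with no outgoing edges are kept.
            graph.setdefault(target, set())
    return dict(graph)


# Two incomplete fragments (illustrative IDs) that overlap on the edge B -> C.
lab_a = {("A", "B", "binds"), ("B", "C", "activates")}
lab_b = {("B", "C", "activates"), ("C", "D", "binds")}
merged = merge_fragments(lab_a, lab_b)
print(sorted(merged))  # ['A', 'B', 'C', 'D']
```

A set per node keeps merging idempotent. Re-importing the same fragment does not duplicate edges.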

2.
To understand complex signaling pathways and networks, it is necessary to develop a formal, structured representation of the available information in a format suitable for analysis by software tools. Current biological knowledge about cell signaling is complex and incomplete. Such a representation must therefore describe cellular pathways at different levels of detail: one level abstract enough to convey the essential signaling flow while hiding its details, and another detailed enough to explain the underlying mechanisms behind that flow. We have defined a formal ontology for cell-signaling events that allows us to describe these cellular pathways at various levels of abstraction. Using this formal representation, the ROSPath (reactive oxygen species-mediated signaling pathway) database system has been implemented and made available on the web (rospath.ewha.ac.kr). ROSPath is a database system for reactive oxygen species (ROS)-mediated cell signaling pathways and signaling processes in molecular detail, which facilitates a comprehensive understanding of the regulatory mechanisms in signaling pathways. ROSPath includes growth factor-, stress-, and cytokine-induced signaling pathways containing about 500 unique proteins (mostly mammalian) and their related protein states, protein complexes, protein complex states, signaling interactions, signaling steps, and pathways. It is a web-based structured repository of information on the signaling pathways of interest and provides a means for managing data produced by large-scale and high-throughput techniques such as proteomics. Software tools are also provided for querying, displaying, and analyzing pathways, furnishing an integrated web environment for visualizing and manipulating ROS-mediated cell-signaling events.

3.
An initiative to increase biopharmaceutical research productivity by capturing, sharing and computationally integrating proprietary scientific discoveries with public knowledge is described. This initiative involves both organisational process change and multiple interoperating software systems. The software components rely on mutually supporting integration techniques. These include a richly structured ontology, statistical analysis of experimental data against stored conclusions, natural language processing of public literature, secure document repositories with lightweight metadata, web services integration, enterprise web portals and relational databases. This approach has already begun to increase scientific productivity in our enterprise by creating an organisational memory (OM) of internal research findings, accessible on the web. Bringing these components together has also made it possible to construct a very large and expanding repository of biological pathway information linked to this repository of findings, which is extremely useful in the analysis of DNA microarray data. This repository, in turn, enables our research paradigm to shift towards more comprehensive, systems-based understandings of drug action.

4.
To support the analysis of biochemical pathways, a database system is needed that is based on a model representing the characteristics of the domain. This domain has proven difficult to model using conventional data modelling techniques. We are building an ontology for biochemical pathways, which acts as the basis for generating a database on the same domain, allowing the definition of complex queries and complex data representations. The ontology is used as a modelling and analysis tool that allows the expression of complex semantics based on a first-order logic representation language. The induction capabilities of the system can help scientists formulate and test research hypotheses that are difficult to express with standard relational database mechanisms. An ontology representing the shared formalisation of knowledge in a scientific domain can also be used as a data integration tool, clarifying the mapping of concepts for the developers of different databases. In this paper, we describe the general structure of our system, concentrating on the ontology-based database as its key component.
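A recursive question such as "which metabolites can be derived from X through any number of reactions" is the kind of query the passage above calls hard to express with standard relational mechanisms, that is, SQL without recursive extensions such as `WITH RECURSIVE`. Below is a minimal Python sketch of naive bottom-up evaluation of two first-order rules until a fixpoint is reached. It is not the authors' representation language. The `derivable` name and the glycolysis facts are illustrative assumptions.

```python
Pair = tuple[str, str]


def derivable(converts: set[Pair]) -> set[Pair]:
    """Naive bottom-up evaluation of the recursive rules
         reaches(X, Y) :- converts(X, Y).
         reaches(X, Z) :- reaches(X, Y), converts(Y, Z).
    iterating until no new facts are derived (a fixpoint)."""
    reaches = set(converts)
    while True:
        new = {(x, z) for (x, y) in reaches for (y2, z) in converts if y == y2} - reaches
        if not new:
            return reaches
        reaches |= new


# First steps of glycolysis as illustrative facts.
converts = {
    ("glucose", "glucose-6-phosphate"),
    ("glucose-6-phosphate", "fructose-6-phosphate"),
    ("fructose-6-phosphate", "fructose-1,6-bisphosphate"),
}
closure = derivable(converts)
print(("glucose", "fructose-1,6-bisphosphate") in closure)  # True
```

The loop always terminates because only finitely many pairs can be built from a finite set of facts.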

5.
Discovery and integration of data are important in many ecological studies, especially those that concern broad-scale ecological questions. Data discovery and integration are often difficult and time-consuming tasks for researchers. This is due in part to the use of informal, ambiguous, and sometimes inconsistent terms for describing data content. Ontologies offer a solution to this problem by providing consistent definitions of ecological concepts that in turn can be used to annotate, relate, and search for data sets. However, unlike in molecular biology or biomedicine, few ontology development efforts exist within ecology. Ontology development often requires considerable expertise in ontology languages and development tools, which is often a barrier to ontology creation in ecology. In this paper, we describe an approach for ontology creation that allows ecologists to use common spreadsheet tools to describe different aspects of an ontology. We present conventions for creating, relating, and constraining concepts through spreadsheets, and provide software tools for converting these ontologies into equivalent OWL-DL representations. We also consider inverse translations, i.e., converting ontologies represented in OWL-DL into our spreadsheet format. Our approach allows large lists of terms to be easily related and organized into concept hierarchies, and generally provides a more intuitive and natural interface for ontology development by ecologists.
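The spreadsheet-to-OWL conversion described above can be sketched under a deliberately simple, hypothetical convention. A sheet has two columns, `concept` and `parent`, and each row becomes an OWL class plus an optional `rdfs:subClassOf` axiom written in Turtle. The paper's actual spreadsheet conventions are richer (they also cover relations and constraints). The column names, the `http://example.org/eco#` namespace and the function names are assumptions for illustration.

```python
import csv
import io

PREFIXES = (
    "@prefix : <http://example.org/eco#> .\n"
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
)


def local_name(label: str) -> str:
    """Turn a spreadsheet label into a Turtle local name (spaces -> underscores)."""
    return label.strip().replace(" ", "_")


def sheet_to_turtle(csv_text: str) -> str:
    """Convert rows of a hypothetical `concept,parent` sheet into OWL classes."""
    lines = [PREFIXES]
    for row in csv.DictReader(io.StringIO(csv_text)):
        concept = local_name(row["concept"])
        lines.append(f":{concept} a owl:Class .")
        parent = (row.get("parent") or "").strip()
        if parent:  # an empty parent cell marks a root concept
            lines.append(f":{concept} rdfs:subClassOf :{local_name(parent)} .")
    return "\n".join(lines)


sheet = "concept,parent\nOrganism,\nPlant,Organism\nWoody plant,Plant\n"
print(sheet_to_turtle(sheet))
```

Labels containing punctuation other than spaces would need further escaping before they are valid Turtle local names. This sketch does not handle that.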

6.
An architecture for biological information extraction and representation (total citations: 1; self-citations: 0; citations by others: 1)
MOTIVATION: Technological advances in biomedical research are generating a plethora of heterogeneous data at a high rate. There is a critical need for extraction, integration and management tools for information discovery and synthesis from these heterogeneous data. RESULTS: In this paper, we present a general architecture, called ALFA, for information extraction and representation from diverse biological data. The ALFA architecture consists of: (i) a networked, hierarchical, hyper-graph object model for representing information from heterogeneous data sources in a standardized, structured format; and (ii) a suite of integrated, interactive software tools for information extraction and representation from diverse biological data sources. As part of our research efforts to explore this space, we have prototyped the ALFA object model and a set of interactive software tools for searching, filtering, and extracting information from scientific text. In particular, we describe BioFerret, a meta-search tool for searching and filtering relevant information from the web, and ALFA Text Viewer, an interactive tool for user-guided extraction, disambiguation, and representation of information from scientific text. We further demonstrate the potential of our tools for integrating the extracted information with experimental data and diagrammatic biological models via the common underlying ALFA representation. CONTACT: aditya_vailaya@agilent.com.

7.
Semantic technology plays a key role in various domains, from conversation understanding to algorithm analysis. As the most efficient semantic tool, an ontology can represent, process, and manage widespread knowledge. Nowadays, many researchers use ontologies to collect and organize the semantic information of data in order to maximize research productivity. In this paper, we first describe our work on developing a remote sensing data ontology, with a primary focus on semantic fusion-driven research for big data. Our ontology is made up of 1,264 concepts and 2,030 semantic relationships. However, the growth of big data is straining the capacities of current semantic fusion and reasoning practices. Considering the massive parallelism of DNA strands, we propose a novel DNA-based semantic fusion model. In this model, a parallel strategy is developed to encode the semantic information of a large volume of remote sensing data in DNA. The semantic information is read in a parallel, bit-wise manner, and each individual bit is converted to a base. This saves a considerable amount of conversion time. For 4.34 GB of source data files, the cluster-based multi-process program reduces the conversion time from 81,536 seconds to 4,937 seconds. Moreover, the result file recording the DNA sequences is 54.51 GB for the parallel C program, compared with 57.89 GB for the sequential Perl program. This shows that our parallel method can also reduce the DNA synthesis cost. In addition, data types are encoded in our model, which is a basis for building a type system in our future DNA computer. Finally, we describe theoretically an algorithm for DNA-based semantic fusion. This algorithm enables knowledge from disparate remote sensing data sources to be integrated into a consistent, accurate, and complete representation. This process depends solely on ligation reactions and screening operations rather than on the ontology.
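The bit-wise, chunked conversion described above can be sketched as follows. For scale, the reported times correspond to a speedup of 81,536 / 4,937, or about 16.5×. The output size shrinks from 57.89 GB to 54.51 GB, roughly 5.8%. The abstract does not give the bit-to-base table, so the mapping 0 → A, 1 → C below is purely hypothetical, as are the function names and chunk size. Threads are used only to keep the example self-contained. For CPU-bound encoding in CPython, separate processes (as in the paper's cluster-based multi-process program) or a `ProcessPoolExecutor` would be needed for a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical one-bit-per-base mapping; the paper's actual bit-to-base table is not given.
BIT_TO_BASE = str.maketrans("01", "AC")


def encode_chunk(chunk: bytes) -> str:
    """Convert every bit of `chunk` (most significant bit first) into one base."""
    bits = "".join(format(byte, "08b") for byte in chunk)
    return bits.translate(BIT_TO_BASE)


def encode_parallel(data: bytes, chunk_size: int = 1 << 20, workers: int = 4) -> str:
    """Split `data` into chunks, encode them concurrently, and rejoin in order."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, so the output is deterministic.
        return "".join(pool.map(encode_chunk, chunks))


print(encode_parallel(b"AB", chunk_size=1))  # ACAAAAACACAAAACA
```

Because chunks are independent and results are rejoined in input order, the parallel output is identical to encoding the whole input sequentially.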

8.
9.
Ecological Complexity, 2008, 5(3): 272-279
As ecological data increase in breadth, depth, and complexity, the discipline of ecology is increasingly influenced by information science. This influence provides many opportunities for ecologists. It also necessitates a change in how we manage and share data and, perhaps more fundamentally, in how we define concepts in ecology. Specifically, automated data integration, an information technology process, depends entirely on consistent concept definitions. A tool commonly used in computer science and engineering to specify meaning is the ontology, which is novel to ecology and offers it significant potential. An ontology is a formal representation of knowledge in which concepts are described by their meaning and their relationships to each other. Ontologies are a tool that can be used to ‘explicitly specify a concept’ (Gruber, 1993), and this approach is uncommon in ecology. In this paper, we develop an ontology for the concept of ‘landscape’ that captures the most general definitions and usages of this term. We selected the concept of landscape because investigators often use it in very different ways, which generates linguistic uncertainty. We provide a graph-theoretic (i.e., visual) model that describes the set of structuring rules we used to define the relationships between ‘landscape’ and appropriately related terms. Based on these rules, a landscape necessarily contains a spatial component (i.e., area) and structure and function (i.e., ecosystems), and is scale independent. This approach provides a set of necessary conditions for landscape studies that reduces linguistic uncertainty and facilitates the interoperability of data. In particular, it promotes data linkages and quantitative synthesis by the automatic data synthesis programs that are likely to become an important part of ecology in the future.
Simply put, we use an ontology, a technique novel to ecology but not to other disciplines, to define ‘landscape,’ thereby clearly delineating one subset of its potential general usage. As such, this ontology can serve as both a checklist for landscape studies and a blueprint for additional ecological ontologies.
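Since the ontology is described as usable as a checklist, its necessary conditions can be sketched as a simple validation function. A landscape needs a spatial component, ecosystem structure and ecosystem function, and it is scale independent. Here "scale independent" is read as "the definition must not be tied to one fixed scale". That reading, the `LandscapeUsage` record and its field names are assumptions for illustration, not the authors' graph-theoretic model.

```python
from dataclasses import dataclass


@dataclass
class LandscapeUsage:
    """Hypothetical record of how one study uses the term 'landscape'."""
    has_spatial_area: bool
    describes_structure: bool
    describes_function: bool
    tied_to_fixed_scale: bool  # e.g. 'a landscape is always kilometres wide'


def missing_conditions(usage: LandscapeUsage) -> list[str]:
    """Return the necessary conditions this usage fails to meet (empty list = passes)."""
    problems = []
    if not usage.has_spatial_area:
        problems.append("no spatial component (area)")
    if not usage.describes_structure:
        problems.append("no ecosystem structure")
    if not usage.describes_function:
        problems.append("no ecosystem function")
    if usage.tied_to_fixed_scale:
        problems.append("not scale independent")
    return problems


print(missing_conditions(LandscapeUsage(True, False, True, True)))
# ['no ecosystem structure', 'not scale independent']
```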

10.
MOTIVATION: A Robot Scientist is a physically implemented robotic system that can automatically carry out cycles of scientific experimentation. We are commissioning a new Robot Scientist designed to investigate gene function in S. cerevisiae. This Robot Scientist will be capable of initiating >1,000 experiments and making >200,000 observations a day. Robot Scientists provide a unique test bed for the development of methodologies for the curation and annotation of scientific experiments. Because the experiments are conceived and executed automatically by computer, it is possible to completely capture and digitally curate all aspects of the scientific process. This new ability brings with it significant technical challenges. To meet these, we apply an ontology-driven approach to the representation of all the Robot Scientist's data and metadata. RESULTS: We demonstrate the utility of developing an ontology for our new Robot Scientist. This ontology is based on a general ontology of experiments. It aids the curation and annotation of the experimental data and metadata and of the equipment metadata, and it supports the design of database systems to hold the data and metadata. AVAILABILITY: EXPO in XML and OWL formats is at: http://sourceforge.net/projects/expo/. All materials about the Robot Scientist project are available at: http://www.aber.ac.uk/compsci/Research/bio/robotsci/.

11.
The systematic assignment of gene function to a sequenced genome is one of the outstanding challenges in the post-genomic era. Large-scale systematic mutagenesis screens are important tools for reaching this goal. Here we describe GSD, a software package that allows storage and integration of data from genetic screens. GSD was initially developed for a large-scale F3 mutagenesis screen for developmental mutants of medaka (Oryzias latipes). The version presented here supports a wide range of different screens (mutagenesis, RNAi, morpholinos, transgenesis and others) using different organisms. Data are stored in a relational database and can be made accessible through web interfaces. Researchers can enter data describing their screened embryos: they can track statistics, submit images and describe the resulting phenotypes using a phenotype classification ontology. We developed a fish phenotype classification ontology of medaka and zebrafish for this software package and made it available to the public. In addition, a list of genetic lines resulting from each screen can be generated. These lines (mutant alleles, transgenic lines) can be described and categorized in the same ways as the screened individuals. Raw data from the screen can be integrated to describe these lines. A query module that searches this list can be used to publish the screen results on the Internet. A test version is available at and the software can be downloaded from this site.
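A minimal sketch shows how screened embryos and their ontology-based phenotype annotations might sit in a relational database. It uses Python's built-in `sqlite3` module. GSD's real schema is not described in the abstract, so every table, column, term ID and label below is hypothetical. The query counts distinct annotated embryos per phenotype term within one screen, the kind of per-screen statistic the abstract mentions.

```python
import sqlite3

# Hypothetical, heavily simplified schema; GSD's real tables are not described here.
SCHEMA = """
CREATE TABLE screen (id INTEGER PRIMARY KEY, kind TEXT, organism TEXT);
CREATE TABLE embryo (id INTEGER PRIMARY KEY,
                     screen_id INTEGER REFERENCES screen(id));
CREATE TABLE phenotype_term (id TEXT PRIMARY KEY, label TEXT);
CREATE TABLE annotation (embryo_id INTEGER REFERENCES embryo(id),
                         term_id TEXT REFERENCES phenotype_term(id));
"""


def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO screen VALUES (1, 'mutagenesis', 'Oryzias latipes')")
    conn.executemany("INSERT INTO embryo VALUES (?, 1)", [(1,), (2,), (3,)])
    conn.executemany("INSERT INTO phenotype_term VALUES (?, ?)",
                     [("T1", "small eyes"), ("T2", "no heartbeat")])  # made-up terms
    conn.executemany("INSERT INTO annotation VALUES (?, ?)",
                     [(1, "T1"), (2, "T1"), (3, "T2")])
    return conn


def phenotype_counts(conn: sqlite3.Connection, screen_id: int) -> list[tuple[str, int]]:
    """Count distinct embryos per phenotype term within one screen."""
    return conn.execute(
        """SELECT t.label, COUNT(DISTINCT a.embryo_id)
           FROM annotation a
           JOIN embryo e ON e.id = a.embryo_id
           JOIN phenotype_term t ON t.id = a.term_id
           WHERE e.screen_id = ?
           GROUP BY t.label
           ORDER BY t.label""",
        (screen_id,),
    ).fetchall()


print(phenotype_counts(build_demo_db(), 1))  # [('no heartbeat', 1), ('small eyes', 2)]
```

Keeping phenotype terms in their own table, rather than as free text on each embryo, is what lets annotations reuse a shared classification ontology.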

12.
13.

Background  

Graph-based pathway ontologies and databases are widely used to represent data about cellular processes. This representation makes it possible to integrate cellular networks programmatically and to investigate them with the well-understood concepts of graph theory in order to predict their structural and dynamic properties. Hierarchically structured, or compound, graphs extend this representation: a member of a biological network may recursively contain a sub-network of a logically related group of biological objects. Such graphs provide many additional benefits for the analysis of biological pathways, including reduced complexity through decomposition into distinct components or modules. In this regard, it is essential to query such large, integrated compound networks effectively, extracting the sub-networks of interest with the help of efficient algorithms and software tools.
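One common kind of sub-network query is "everything within k interaction steps of these seed molecules". On a compound graph it can be sketched as a depth-bounded breadth-first search followed by a pass that keeps every enclosing compound node (for example a complex or compartment), so the result is still a valid nested graph. The abstract does not specify the algorithms involved. The `parent` map, the undirected treatment of interactions and the `neighborhood` name are assumptions.

```python
from collections import deque

Edge = tuple[str, str]


def neighborhood(edges: list[Edge], parent: dict[str, str],
                 seeds: list[str], k: int) -> tuple[set[str], set[Edge]]:
    """Return nodes within k interaction steps of `seeds`, plus every compound
    node that encloses them, and the interactions among the kept nodes."""
    adj: dict[str, set[str]] = {}
    for u, v in edges:  # interactions are treated as undirected here
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    dist = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:  # breadth-first search bounded by depth k
        node = queue.popleft()
        if dist[node] == k:
            continue
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)

    kept = set(dist)
    for node in dist:  # add all ancestors so the nesting stays valid
        p = parent.get(node)
        while p is not None:
            kept.add(p)
            p = parent.get(p)
    return kept, {(u, v) for u, v in edges if u in kept and v in kept}


# Illustrative network: C and D sit inside compound X, which sits inside Y.
edges = [("A", "B"), ("B", "C"), ("C", "D")]
parent = {"C": "X", "D": "X", "X": "Y"}
nodes, sub = neighborhood(edges, parent, ["A"], 2)
print(sorted(nodes))  # ['A', 'B', 'C', 'X', 'Y']
```

In this sketch, reaching a compound node does not automatically pull in its members. Whether it should is a design choice a real query tool would need to expose.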

14.
MetaCyc (http://metacyc.org) contains experimentally determined biochemical pathways to be used as a reference database for metabolism. In conjunction with the Pathway Tools software, MetaCyc can be used to computationally predict the metabolic pathway complement of an annotated genome. To increase the breadth of pathways and enzymes, more than 60 plant-specific pathways have been added or updated in MetaCyc recently. In contrast to MetaCyc, which contains metabolic data for a wide range of organisms, AraCyc is a species-specific database containing only enzymes and pathways found in the model plant Arabidopsis (Arabidopsis thaliana). AraCyc (http://arabidopsis.org/tools/aracyc/) was the first computationally predicted plant metabolism database derived from MetaCyc. Since its initial computational build, AraCyc has been under continued curation to enhance data quality and to increase breadth of pathway coverage. Twenty-eight pathways have been manually curated from the literature recently. Pathway predictions in AraCyc have also been recently updated with the latest functional annotations of Arabidopsis genes that use controlled vocabulary and literature evidence. AraCyc currently features 1,418 unique genes mapped onto 204 pathways with 1,156 literature citations. The Omics Viewer, a user data visualization and analysis tool, allows a list of genes, enzymes, or metabolites with experimental values to be painted on a diagram of the full pathway map of AraCyc. Other recent enhancements to both MetaCyc and AraCyc include implementation of an evidence ontology, which has been used to provide information on data quality, expansion of the secondary metabolism node of the pathway ontology to accommodate curation of secondary metabolic pathways, and enhancement of the cellular component ontology for storing and displaying enzyme and pathway locations within subcellular compartments.

15.
Immunofluorescent staining is a widespread tool in basic science to understand organ morphology and (patho-) physiology. The analysis of imaging data is often performed manually, limiting throughput and introducing human bias. Quantitative analysis is particularly challenging for organs with complex structure such as the kidney. In this study we present an approach for automatic quantification of fluorescent markers and histochemical stainings in whole organ sections using open source software. We validate our novel method in multiple typical challenges of basic kidney research and demonstrate its general relevance and applicability to other complex solid organs for a variety of different markers and stainings. Our newly developed software tool “AQUISTO”, applied as a standard in primary data analysis, facilitates efficient large scale evaluation of cellular populations in various types of histological samples. Thereby it contributes to the characterization and understanding of (patho-) physiological processes.

16.
The Protein Ontology (PRO) provides terms for and supports annotation of species-specific protein complexes in an ontology framework that relates them both to their components and to species-independent families of complexes. Comprehensive curation of experimentally known forms and annotations thereof is expected to expose discrepancies, differences, and gaps in our knowledge. We have annotated the early events of innate immune signaling mediated by Toll-Like Receptor 3 and 4 complexes in human, mouse, and chicken. The resulting ontology and annotation data set has allowed us to identify species-specific gaps in experimental data and possible functional differences between species, and to employ inferred structural and functional relationships to suggest plausible resolutions of these discrepancies and gaps.

17.
We describe a semantic data validation tool that observes incoming real-time sensor data and reasons over it against a set of rules specific to the scientific domain to which the data belong. When a data anomaly or unexpected event is detected, our software can produce a variety of outcomes. These range from simple flagging of data points, to data augmentation, to validation of proposed hypotheses that could explain the phenomenon. Hosted on the Jena Semantic Web Framework, the tool is completely domain-agnostic and is made domain-aware by reference to an ontology and a Knowledge Base (KB) that together describe the key resources of the system being observed. The KB comprises several parts:
- ontologies for the sensor packages and for the domain;
- historical data from the network;
- concepts designed to guide discovery of internet resources that are unavailable in the local KB but relevant to reasoning about the anomaly;
- a set of rules representing domain expert knowledge, covering both constraints on data from different kinds of instruments and the relation between types of ecosystem events and properties of the ecosystem.

We describe an instance of such a system that includes a sensor ontology, some rules describing coastal storm events and their consequences, and how we relate local data to external resources. We describe in some detail how a specific actual event, an unusually high chlorophyll reading, can be deduced by machine reasoning to be consistent with being caused by benthic diatom resuspension, by an algal bloom, or both.
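The flag-then-explain behaviour described above can be sketched in a few lines of Python. An out-of-range reading is flagged, and every hypothesis whose rule is satisfied by the surrounding context is reported, so more than one explanation can hold at once. The actual tool runs rules on the Jena framework in Java. The threshold, context flags and rule bodies below are illustrative assumptions, not the authors' rule base.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Hypothesis:
    name: str
    holds: Callable[[dict], bool]  # is the hypothesis consistent with the context?


# Illustrative rules only; the paper's rule base is not reproduced here.
HYPOTHESES = [
    Hypothesis("benthic diatom resuspension",
               lambda o: o.get("storm_recent", False) and o.get("turbidity_high", False)),
    Hypothesis("algal bloom", lambda o: o.get("nutrients_high", False)),
]


def validate(obs: dict, max_plausible: float,
             hypotheses: list[Hypothesis]) -> tuple[str, list[str]]:
    """Flag a chlorophyll reading above `max_plausible` (a hypothetical threshold)
    and list every hypothesis whose rule is satisfied by the observations."""
    if obs["chlorophyll"] <= max_plausible:
        return "ok", []
    return "anomaly", [h.name for h in hypotheses if h.holds(obs)]


reading = {"chlorophyll": 50.0, "storm_recent": True,
           "turbidity_high": True, "nutrients_high": True}
print(validate(reading, max_plausible=30.0, hypotheses=HYPOTHESES))
# ('anomaly', ['benthic diatom resuspension', 'algal bloom'])
```

Returning every satisfied hypothesis, rather than the first match, mirrors the abstract's conclusion that the reading may be explained by resuspension, a bloom, or both.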

18.
19.
MOTIVATION: BioPAX is a standard language for representing and exchanging models of biological processes at the molecular and cellular levels. It is widely used by different pathway databases and genomics data analysis software. Currently, the primary source of BioPAX data is direct exports from curated pathway databases. It is still uncommon for wet-lab biologists to share and exchange pathway knowledge using BioPAX. Instead, pathways are usually represented as informal diagrams in the literature. To encourage formal representation of pathways, we describe a software package that allows users to create pathway diagrams in CellDesigner, a user-friendly graphical pathway-editing tool, and to save the pathway data in BioPAX Level 3 format. AVAILABILITY: The plug-in is freely available and can be downloaded at ftp://ftp.pantherdb.org/CellDesigner/plugins/BioPAX/ CONTACT: huaiyumi@usc.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

20.

Background  

Signal transduction events often involve transient, yet specific, interactions between structurally conserved protein domains and polypeptide sequences in target proteins. The identification and validation of these associating domains is crucial to understand signal transduction pathways that modulate different cellular or developmental processes. Bioinformatics strategies to extract and integrate information from diverse sources have been shown to facilitate the experimental design to understand complex biological events. These methods, primarily based on information from high-throughput experiments, have also led to the identification of new connections thus providing hypothetical models for cellular events. Such models, in turn, provide a framework for directing experimental efforts for validating the predicted molecular rationale for complex cellular processes. In this context, it is envisaged that the rational design of peptides for protein-peptide binding studies could substantially facilitate the experimental strategies to evaluate a predicted interaction. This rational design procedure involves the integration of protein-protein interaction data, gene ontology, physico-chemical calculations, domain-domain interaction data and information on functional sites or critical residues.


Copyright © Beijing Qinyun Technology Development Co., Ltd. | Beijing ICP Registration No. 09084417