Similar Literature
20 similar records found (search time: 31 ms)
1.
An effective strategy for managing protein databases is to provide mechanisms that transform raw data into consistent, accurate and reliable information. Such mechanisms greatly reduce operational inefficiencies and improve one's ability to handle scientific objectives and interpret research results. To achieve this challenging goal for the STING project, we introduce Sting_RDB, a relational database of structural parameters for protein analysis with support for data warehousing and data mining. In this article, we highlight the main features of Sting_RDB and show how a user can explore it for efficient and biologically relevant queries. Considering its importance for molecular biologists, effort has been made to advance Sting_RDB toward data quality assessment. To the best of our knowledge, Sting_RDB is one of the most comprehensive data repositories for protein analysis, now also capable of providing its users with a data quality indicator. This paper differs from our previous study in many aspects. First, we introduce Sting_RDB, a relational database with mechanisms for efficient and relevant queries using SQL. Sting_RDB evolved from the earlier, text (flat file)-based database, in which data consistency and integrity were not guaranteed. Second, we provide support for data warehousing and mining. Third, a data quality indicator was introduced. Finally, and probably most importantly, complex queries that could not be posed on a text-based database are now easily implemented. Further details are accessible at the Sting_RDB demo web page: http://www.cbi.cnptia.embrapa.br/StingRDB.
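The kind of relational query the abstract contrasts with flat-file lookups can be sketched with SQLite; the table and column names below are invented for illustration and are not Sting_RDB's actual schema:

```python
import sqlite3

# Hypothetical residue-parameter table, standing in for a Sting_RDB-style schema.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE residue_params (
    pdb_id TEXT, chain TEXT, res_num INTEGER,
    res_name TEXT, accessibility REAL)""")
con.executemany(
    "INSERT INTO residue_params VALUES (?, ?, ?, ?, ?)",
    [("1ABC", "A", 10, "TRP", 5.2),
     ("1ABC", "A", 11, "GLY", 88.0),
     ("2XYZ", "B", 42, "TRP", 3.1)])

# Cross-structure query that a flat file cannot answer directly:
# find buried tryptophans (low solvent accessibility) in every deposited entry.
rows = con.execute(
    "SELECT pdb_id, chain, res_num FROM residue_params "
    "WHERE res_name = 'TRP' AND accessibility < 10 "
    "ORDER BY pdb_id").fetchall()
print(rows)  # -> [('1ABC', 'A', 10), ('2XYZ', 'B', 42)]
```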

2.
Current research on gene regulatory mechanisms is increasingly dependent on the availability of high-quality information from manually curated databases. Biocurators undertake the task of extracting knowledge claims from scholarly publications, organizing these claims in a meaningful format and making them computable. In doing so, they enhance the value of existing scientific knowledge by making it accessible to the users of their databases. In this capacity, biocurators are well positioned to identify and weed out information that is of insufficient quality. The criteria that define information quality are typically outlined in curation guidelines developed by biocurators. These guidelines are prudently developed to reflect the needs of the user community the database caters to. They describe the standard of evidence that this community recognizes as sufficient justification for trustworthy data, and they determine the process by which data should be organized and maintained to be valuable to users. Following these guidelines, biocurators assess the quality, reliability and validity of the information they encounter. In this article we explore to what extent different use cases agree with the inclusion criteria, implemented by the database, that define positive and negative data. What are the drawbacks for users whose queries would be well served by results that fall just short of a database's criteria? Finally, how can databases (and biocurators) accommodate the needs of these more exploratory use cases?

3.
Over recent years, there has been a growing interest in extracting information automatically or semi-automatically from the scientific literature. This paper describes a novel ontology-based interactive information extraction (OBIIE) framework and a specific OBIIE system. We describe how this system enables life scientists to make ad hoc queries similar to using a standard search engine, but where the results are obtained in a database format similar to a pre-programmed information extraction engine. We present a case study in which the system was evaluated for extracting co-factors from EMBASE and MEDLINE.

4.

In systems biology, the study of a complex and multicomponent system, such as morphogenesis, comprises accumulation of data on morphogenetic processes in databases, classification and logical analysis of this information, and computer simulation of the processes in question using the data accumulated and the results of their analysis. This paper describes realization of the first steps in a systems study of morphogenesis (annotating research papers, compiling information in a database, data systematization, and their logical analysis) by the example of Arabidopsis thaliana, a model object in plant molecular biology. The database AGNS (Arabidopsis GeneNet Supplementary; http://wwwmgs.bionet.nsc.ru/agns) contains the experimentally confirmed information from published papers on specific features of gene expression and phenotypes of wild-type, mutant, and transgenic A. thaliana plants. AGNS queries and logical data analysis with the aid of specially developed software make it possible to model various morphogenetic processes from gene expression to functioning of gene networks and their contribution to the development of certain traits.

5.
MOTIVATION: The information model chosen to store biological data affects the types of queries possible, database performance, and difficulty in updating that information model. Genetic sequence data for pharmacogenetics studies can be complex, and the best information model to use may change over time. As experimental and analytical methods change, and as biological knowledge advances, the data storage requirements and types of queries needed may also change. RESULTS: We developed a model for genetic sequence and polymorphism data, and used XML Schema to specify the elements and attributes required for this model. We implemented this model as an ontology in a frame-based representation and as a relational model in a database system. We collected genetic data from two pharmacogenetics resequencing studies, and formulated queries useful for analysing these data. We compared the ontology and relational models in terms of query complexity, performance, and difficulty in changing the information model. Our results demonstrate benefits of evolving the schema for storing pharmacogenetics data: ontologies perform well in early design stages as the information model changes rapidly and simplify query formulation, while relational models offer improved query speed once the information model and types of queries needed stabilize.
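The schema-evolution trade-off described above can be illustrated with a toy sketch; the field names (gene, position, allele, population) are invented, not the paper's actual model:

```python
import sqlite3

# Frame/ontology-style storage: each record is a set of slots, so a new
# attribute ("population") appears simply by adding it to a record.
frames = [
    {"gene": "CYP2D6", "position": 100, "allele": "T"},
    {"gene": "CYP2D6", "position": 100, "allele": "C", "population": "EUR"},
]
with_pop = [f for f in frames if "population" in f]

# Relational storage: the same change requires an explicit ALTER TABLE
# before any row can carry the new attribute, but queries are then fast
# and uniformly typed.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE snp (gene TEXT, position INTEGER, allele TEXT)")
con.execute("ALTER TABLE snp ADD COLUMN population TEXT")
con.execute("INSERT INTO snp VALUES ('CYP2D6', 100, 'C', 'EUR')")
n = con.execute(
    "SELECT COUNT(*) FROM snp WHERE population IS NOT NULL").fetchone()[0]
print(len(with_pop), n)  # -> 1 1
```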

6.
7.
Systemic analysis of available large-scale biological/biomedical data is critical for studying biological mechanisms, and developing novel and effective treatment approaches against diseases. However, different layers of the available data are produced using different technologies and scattered across individual computational resources without any explicit connections to each other, which hinders extensive and integrative multi-omics-based analysis. We aimed to address this issue by developing a new data integration/representation methodology and its application by constructing a biological data resource. CROssBAR is a comprehensive system that integrates large-scale biological/biomedical data from various resources and stores them in a NoSQL database. CROssBAR is enriched with the deep-learning-based prediction of relationships between numerous data entries, which is followed by the rigorous analysis of the enriched data to obtain biologically meaningful modules. These complex sets of entities and relationships are displayed to users via easy-to-interpret, interactive knowledge graphs within an open-access service. CROssBAR knowledge graphs incorporate relevant genes and proteins, molecular interactions, pathways, phenotypes, diseases, as well as known/predicted drugs and bioactive compounds, and they are constructed on-the-fly based on simple non-programmatic user queries. These intensely processed heterogeneous networks are expected to aid systems-level research, especially to infer biological mechanisms in relation to genes, proteins, their ligands, and diseases.
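A heterogeneous knowledge graph of the kind CROssBAR builds can be sketched as typed nodes joined by edges and explored by traversal; every node name and edge below is invented for illustration:

```python
from collections import deque

# Toy gene-protein-pathway-drug-disease graph; not CROssBAR's actual data.
edges = {
    "GENE:EGFR": ["PROTEIN:P00533"],
    "PROTEIN:P00533": ["PATHWAY:ErbB signaling", "DRUG:gefitinib"],
    "DRUG:gefitinib": ["DISEASE:lung carcinoma"],
    "PATHWAY:ErbB signaling": [],
    "DISEASE:lung carcinoma": [],
}

def reachable(start):
    """Breadth-first traversal: every entity linked (directly or not) to start."""
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in edges.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# A non-programmatic query like "show everything connected to EGFR" reduces
# to a traversal from that node.
linked = sorted(reachable("GENE:EGFR"))
print(linked)
```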

8.

9.
Development of a clearer understanding of the causes and consequences of environmental change is an important issue globally. The consequent demand for objective, reliable and up-to-date environmental information has led to the establishment of long-term integrated environmental monitoring programmes, including the UK's Environmental Change Network (ECN). Databases form the core information resource for such programmes. The UK Environmental Change Network Data Centre manages data on behalf of ECN (as well as other related UK integrated environmental monitoring networks) and provides a robust and integrated system of information management. This paper describes how data are captured through standardised protocols and data entry systems, as well as through more recent approaches such as wireless sensors. Data are managed centrally through a database and GIS. Quality control is built in at all levels of the system. Data are then made accessible through a variety of data access methods: bespoke web interfaces as well as third-party data portals. This paper describes the informatics approach of the ECN Data Centre, which aims to develop a seamless system of data capture, management and data access interfaces to support research.
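The built-in quality control the abstract mentions typically starts with automated plausibility checks at data entry; the variable names and limits below are invented, not ECN's actual protocol:

```python
# Toy range check of the kind a monitoring data centre might run on ingest.
# Variables and plausible ranges are illustrative assumptions only.
LIMITS = {"air_temp_c": (-30.0, 45.0), "rainfall_mm": (0.0, 300.0)}

def qc_flag(variable, value):
    """Return 'pass' if the value lies within its plausible range, else 'fail'."""
    lo, hi = LIMITS[variable]
    return "pass" if lo <= value <= hi else "fail"

print(qc_flag("air_temp_c", 12.5), qc_flag("rainfall_mm", 999.0))  # -> pass fail
```

Flagged values would then be routed to a curator rather than silently dropped, so the central database records both the measurement and its QC status.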

10.
DExH/D proteins are essential for all aspects of cellular RNA metabolism and processing, in the replication of many viruses and in DNA replication. DExH/D proteins are subject to current biological, biochemical and biophysical research, which provides a continuous wealth of data. The DExH/D protein family database compiles this information and makes it available over the WWW (http://www.columbia.edu/ej67/dbhome.htm). The database can be fully searched by text-based queries, facilitating fast access to specific information about this important class of enzymes.

11.
The increasing use of high-throughput methods for the production of biologically important information, and the increasing diversity of that information, pose considerable bioinformatics challenges. These challenges will be met by implementing electronic data management systems not only to capture the data, but increasingly to provide a platform for data integration and mining as we enter the post-genomic era. We discuss the design and implementation of such a data capture system, 'Mutabase', as a model of how such electronic systems might be designed and implemented. Mutabase was created in support of a large-scale, phenotype-driven mouse mutagenesis program at MRC Mammalian Genetics Unit, Harwell, in collaboration with SmithKline Beecham Pharmaceuticals, Queen Mary and Westfield College, London, and Imperial College of Science, Technology and Medicine, London. The aim of this mutagenesis project is to make a significant contribution to the existing mouse mutant resource, closing the phenotype gap and providing many more models for fundamental research and disease modeling. Mutabase records experimental details at the 'point of generation' and provides a number of dissemination and analysis tools for the experimental data, as well as providing a means of assessing various aspects of progress of the program. Mutabase uses a hypertext-based interface to provide interaction between a number of intranet-based client workstations and a central industrial-strength database. Mutabase utilizes a variety of techniques to implement the user interface, including Perl/CGI, Java Servlets, and an experimental CORBA server. We discuss the relative merits of these methods in the context of the need to provide sound informatics approaches for the support of systematic mutagenesis programs. Received: 16 December 1999 / Accepted: 17 December 1999

12.
PlasmoDB (http://PlasmoDB.org) is the official database of the Plasmodium falciparum genome sequencing consortium. This resource incorporates finished and draft genome sequence data and annotation emerging from Plasmodium sequencing projects. PlasmoDB currently houses information from five parasite species and provides tools for cross-species comparisons. Sequence information is also integrated with other genomic-scale data emerging from the Plasmodium research community, including gene expression analysis from EST, SAGE and microarray projects. The relational schemas used to build PlasmoDB [Genomics Unified Schema (GUS) and RNA Abundance Database (RAD)] employ a highly structured format to accommodate the diverse data types generated by sequence and expression projects. A variety of tools allow researchers to formulate complex, biologically based queries of the database. A version of the database is also available on CD-ROM (Plasmodium GenePlot), facilitating access to the data in situations where Internet access is difficult (e.g. by malaria researchers working in the field). The goal of PlasmoDB is to enhance utilization of the vast quantities of data emerging from genome-scale projects by the global malaria research community.

13.
We present DR-GAS, a unique, consolidated and comprehensive database of genetic association studies of the human DNA repair system. It presents information on repair genes, the assorted mechanisms of DNA repair, linkage disequilibrium, haplotype blocks, nsSNPs, phosphorylation sites, associated diseases, and the pathways involved in repair systems. DNA repair is an intricate process that plays an essential role in maintaining the integrity of the genome by eradicating the damaging effects of internal and external changes to the genome. Hence, it is crucial to understand the whole process of DNA repair in depth: the genes involved, the non-synonymous SNPs that may affect their function, phosphorylated residues and other related genetic parameters. All entries for DNA repair genes, such as proteins, OMIM IDs, literature references and pathways, are cross-referenced to their respective primary databases. DNA repair genes and their associated parameters are represented in tabular or graphical form, through images produced by computational and statistical analyses. We believe the database will assist molecular biologists, biotechnologists, therapeutic developers and the wider scientific community in finding biologically meaningful information and in assessing the contribution of genetic variation in the human DNA repair system to severe diseases. DR-GAS is freely available for academic and research purposes at: http://www.bioinfoindia.org/drgas.

14.
GOBASE: the organelle genome database

15.
Over the last two decades, there has been a huge increase in our understanding of microbial diversity, structure and composition, enabled by high-throughput sequencing technologies. Yet, it is unclear how the number of sequences translates to the number of cells or species within the community. In some cases, additional observational data may be required to ensure relative abundance patterns from sequence reads are biologically meaningful. The goal of DNA-based methods for biodiversity assessments is to obtain robust community abundance data, simultaneously, from environmental samples. In this issue of Molecular Ecology Resources, Pierella Karlusich et al. (2022) describe a new method for quantifying phytoplankton cell abundance. Using Tara Oceans data sets, the authors propose the photosynthetic gene psbO for reporting accurate relative abundance of the entire phytoplankton community from metagenomic data. The authors demonstrate higher correlations with traditional optical methods (including microscopy and flow cytometry) using their new method, improving upon molecular abundance assessments using multicopy marker genes. Furthermore, to facilitate application of their approach, the authors curated a psbO gene database for accessible taxonomic queries. This is an important step towards improving species abundance estimates from molecular data and eventually reporting of absolute species abundance, enhancing our understanding of community dynamics.
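Relative abundance from a single-copy marker gene reduces to read-count fractions; the taxa and counts below are made up for illustration, not Tara Oceans data:

```python
# Toy relative-abundance estimate from reads mapped to a single-copy marker
# gene (in the spirit of the psbO approach); counts are invented.
counts = {"Prochlorococcus": 620, "Synechococcus": 250, "Diatoms": 130}
total = sum(counts.values())
rel = {taxon: n / total for taxon, n in counts.items()}

# Because the marker is single-copy and present in all phytoplankton, read
# fractions map more directly onto cell fractions than multicopy markers
# (e.g. rRNA genes), whose copy number varies between taxa.
print(rel["Prochlorococcus"])  # -> 0.62
```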

16.
17.
18.
The analysis of proteomes of biological organisms represents a major challenge of the post-genome era. Classical proteomics combines two-dimensional electrophoresis (2-DE) and mass spectrometry (MS) for the identification of proteins. Novel technologies such as isotope coded affinity tag (ICAT)-liquid chromatography/mass spectrometry (LC/MS) open new insights into protein alterations. The vast amount and diverse types of proteomic data require adequate web-accessible computational and database technologies for storage, integration, dissemination, analysis and visualization. A proteome database system (http://www.mpiib-berlin.mpg.de/2D-PAGE) for microbial research has been constructed which integrates 2-DE/MS, ICAT-LC/MS and functional classification data of proteins with genomic, metabolic and other biological knowledge sources. The two-dimensional polyacrylamide gel electrophoresis database delivers experimental data on microbial proteins including mass spectra for the validation of protein identification. The ICAT-LC/MS database comprises experimental data for protein alterations of mycobacterial strains BCG vs. H37Rv. By formulating complex queries within a functional protein classification database "FUNC_CLASS" for Mycobacterium tuberculosis and Helicobacter pylori, the researcher can gather precise information on genes, proteins, protein classes and metabolic pathways. The use of the R language in the database architecture allows high-level data analysis and visualization to be performed "on-the-fly". The database system is centrally administrated, and investigators without specific bioinformatic competence in database construction can submit their data. The database system also serves as a template for a prototype of a European Proteome Database of Pathogenic Bacteria. Currently, the database system includes proteome information for six strains of microorganisms.

19.
An object-oriented database system has been developed which is being used to store protein structure data. The database can be queried using the logic programming language Prolog or the query language Daplex. Queries retrieve information by navigating through a network of objects which represent the primary, secondary and tertiary structures of proteins. Routines written in both Prolog and Daplex can integrate complex calculations with the retrieval of data from the database, and can also be stored in the database for sharing among users. Thus object-oriented databases are better suited to prototyping applications and answering complex queries about protein structure than relational databases. This system has been used to find loops of varying length and anchor positions when modelling homologous protein structures.
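The query-by-navigation style described, such as "find loops of a given length", can be mimicked with an object network; the classes and data below are illustrative assumptions, not the system's actual model:

```python
from dataclasses import dataclass, field

# Toy object network for protein structure, loosely mirroring navigation-style
# queries over primary/secondary/tertiary structure objects.
@dataclass
class Residue:
    name: str

@dataclass
class Loop:
    residues: list = field(default_factory=list)

@dataclass
class Protein:
    name: str
    loops: list = field(default_factory=list)

db = [Protein("lysozyme", [Loop([Residue("GLY"), Residue("SER")]),
                           Loop([Residue("ALA")] * 5)]),
      Protein("myoglobin", [Loop([Residue("PRO")] * 4)])]

def loops_of_length(n):
    """Navigate protein -> loop -> residues and keep loops of exactly n residues."""
    return [(p.name, loop) for p in db for loop in p.loops
            if len(loop.residues) == n]

print([name for name, _ in loops_of_length(4)])  # -> ['myoglobin']
```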

20.
Song WM, Di Matteo T, Aste T. PLoS ONE 2012, 7(3): e31929
We introduce a graph-theoretic approach to extract clusters and hierarchies in complex datasets in an unsupervised and deterministic manner, without the use of any prior information. This is achieved by building topologically embedded networks containing the subset of most significant links and analyzing the network structure. For a planar embedding, this method provides both the intra-cluster hierarchy, which describes the way clusters are composed, and the inter-cluster hierarchy, which describes how clusters gather together. We discuss the performance, robustness and reliability of this method by first investigating several artificial datasets, finding that it can significantly outperform other established approaches. Then we show that our method can successfully differentiate meaningful clusters and hierarchies in a variety of real datasets. In particular, we find that the application to gene expression patterns of lymphoma samples uncovers biologically significant groups of genes which play key roles in diagnosis, prognosis and treatment of some of the most relevant human lymphoid malignancies.
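The idea of clustering by retaining only the most significant links can be sketched with a much simpler stand-in: build a minimum spanning tree over pairwise distances and cut its weakest (longest) link. This toy is not the paper's planar-graph method, and the data are invented:

```python
# Simplified link-filtering sketch: Kruskal's MST via union-find, then cut
# the single longest MST edge to split the points into two clusters.
points = {"a": 0.0, "b": 0.2, "c": 0.3, "d": 5.0, "e": 5.1}

parent = {p: p for p in points}
def find(x):
    while parent[x] != x:
        x = parent[x]
    return x

names = sorted(points)
dists = sorted((abs(points[u] - points[v]), u, v)
               for i, u in enumerate(names) for v in names[i + 1:])

mst = []
for d, u, v in dists:            # add shortest links that join two components
    ru, rv = find(u), find(v)
    if ru != rv:
        parent[ru] = rv
        mst.append((d, u, v))

mst.remove(max(mst))             # drop the least significant (longest) link

parent = {p: p for p in points}  # regroup using only the surviving links
for _, u, v in mst:
    parent[find(u)] = find(v)
clusters = {}
for p in points:
    clusters.setdefault(find(p), set()).add(p)

grouped = sorted(map(sorted, clusters.values()))
print(grouped)  # -> [['a', 'b', 'c'], ['d', 'e']]
```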


Copyright © Beijing Qinyun Technology Development Co., Ltd. (京ICP备09084417号)