Similar articles
Found 20 similar articles (search time: 31 ms)
1.
The structure of the Dutch Relational Archaeobotanical Database (RADAR) is presented. RADAR is a rather compact archaeobotanical database that is controlled centrally, but can be distributed to individual scientists. For this reason RADAR contains only the most important archaeobotanical data. For detailed archaeological, botanical and regional palaeoenvironmental information, links can be established with the national archaeological database (ARCHIS), the national botanical database (BBR) and the European Pollen Database (EPD). The software used for manipulating the database is PARADOX, chosen for its highly visible nature, its control facilities for data entry and the ease of importing and exporting data from and to many other programs. The potential of the database is demonstrated with query examples.

2.
3.
4.
MOTIVATION: The PFDB (Protein Family Database) is a new database designed to integrate protein family-related data with relevant functional and genomic data. It currently manages biological data for three projects: the CATH protein domain database (Orengo et al., 1997; Pearl et al., 2001), the VIDA virus domains database (Albà et al., 2001) and the Gene3D database (Buchan et al., 2001). The PFDB has been designed to accommodate protein families identified by a variety of sequence-based or structure-based protocols and provides a generic resource for biological research by enabling mapping between different protein families and diverse biochemical and genetic data, including complete genomes. RESULTS: A characteristic feature of the PFDB is that it has a number of meta-level entities (for example aggregation, collection and inclusion) represented as base tables in the final design. The explicit representation of relationships at the meta-level has a number of advantages, including flexibility, both in terms of the range of queries that can be formulated and the ability to integrate new biological entities within the existing design. A potential drawback of this approach, poor performance caused by the number of joins across meta-level tables, is avoided by implementing the PFDB with materialized views using the mature relational database technology of Oracle 8i. The resultant database is both fast and flexible. This paper presents the principles on which the database has been designed and implemented, and describes the current status of the database and the query facilities supported.

5.
The need to continually change the logical and physical structure of a medical database during the development of a health information system initiated the project of implementing a DATA MANAGER. The concept of the DATA MANAGER covers facilities for the development of the logical data structure model, including documentation of the model and programming support for application programs accessing the health information system (HIS) database. The outstanding facilities of the INTERLISP system have been found to be appropriate for writing the DATA MANAGER. A first data structure model, on which the DATA MANAGER will operate, is roughly outlined.

6.
MOTIVATION: The exponential growth of sequence databases poses a major challenge to bioinformatics tools for querying alignment and annotation databases. There is a pressing need for methods for finding overlapping sequence intervals that are highly scalable to database size, query interval size, result size and construction/updating of the interval database. RESULTS: We have developed a new interval database representation, the Nested Containment List (NCList), whose query time is O(n + log N), where N is the database size and n is the size of the result set. In all cases tested, this query algorithm is 5- to 500-fold faster than the other indexing methods tested in this study, such as MySQL multi-column indexing, MySQL binning and R-Tree indexing. We provide performance comparisons both in simulated datasets and real-world genome alignment databases, across a wide range of database sizes and query interval widths. We also present an in-place NCList construction algorithm that yields database construction times that are approximately 100-fold faster than other methods available. The NCList data structure appears to provide a useful foundation for highly scalable interval database applications. AVAILABILITY: The NCList data structure is part of Pygr, a bioinformatics graph database library, available at http://sourceforge.net/projects/pygr
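
The nested-containment idea in this abstract can be illustrated with a minimal Python sketch. It is not the Pygr implementation: the function names `build_nclist` and `find_overlaps`, the tuple layout, and the half-open interval convention are assumptions made for illustration. Intervals fully contained in an earlier interval are pushed into that interval's sublist, so every list stays sorted by both start and end and can be entered with a binary search.

```python
def build_nclist(intervals):
    """Nest half-open (start, end) intervals into lists of (start, end, children)."""
    top = []
    stack = []  # chain of currently open containers, innermost last
    # Sort by start ascending, end descending, so containers precede their contents.
    for start, end in sorted(intervals, key=lambda iv: (iv[0], -iv[1])):
        node = (start, end, [])
        # Drop containers that end before this interval ends (no containment).
        while stack and stack[-1][1] < end:
            stack.pop()
        (stack[-1][2] if stack else top).append(node)
        stack.append(node)
    return top


def _first_ending_after(nodes, qstart):
    """Binary search: index of the first node whose end is greater than qstart."""
    lo, hi = 0, len(nodes)
    while lo < hi:
        mid = (lo + hi) // 2
        if nodes[mid][1] > qstart:
            hi = mid
        else:
            lo = mid + 1
    return lo


def find_overlaps(nodes, qstart, qend):
    """Return every stored interval that overlaps the half-open query [qstart, qend)."""
    hits = []
    i = _first_ending_after(nodes, qstart)
    while i < len(nodes) and nodes[i][0] < qend:
        start, end, children = nodes[i]
        hits.append((start, end))
        # Only the children of an overlapping interval can overlap the query.
        hits.extend(find_overlaps(children, qstart, qend))
        i += 1
    return hits


example = build_nclist([(0, 10), (2, 5), (3, 4), (6, 9), (12, 20), (15, 18), (25, 30)])
print(find_overlaps(example, 4, 7))  # [(0, 10), (2, 5), (6, 9)]
```

This sketch runs a binary search in every sublist it enters, so it should not be assumed to match the query bound quoted in the abstract.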

7.
We consider the problem of similarity queries in biological network databases. Given a database of networks, a similarity query returns all the database networks whose similarity (i.e. alignment score) to a given query network is at least a specified similarity cutoff value. Alignment of two networks is a very costly operation, which makes exhaustive comparison of all the database networks with a query impractical. To tackle this problem, we develop a novel indexing method, named RINQ (Reference-based Indexing for Biological Network Queries). Our method uses a set of reference networks to eliminate a large portion of the database quickly for each query. A reference network is a small biological network. We precompute and store the alignments of all the references with all the database networks. When our database is queried, we align the query network with all the reference networks. Using these alignments, we calculate a lower bound and an approximate upper bound to the alignment score of each database network with the query network. With the help of upper and lower bounds, we eliminate the majority of the database networks without aligning them to the query network. We also quickly identify a small portion of these as guaranteed to be similar to the query. We perform pairwise alignment only for the remaining networks. We also propose a supervised method to pick references that have a large chance of filtering the unpromising database networks. Extensive experimental evaluation suggests that (i) our method reduced the running time of a single query on a database of around 300 networks from over 2 days to only 8 h; (ii) our method outperformed the state-of-the-art methods Closure Tree and SAGA by a factor of three or more; and (iii) our method successfully identified statistically and biologically significant relationships across networks and organisms.
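
The filter-then-verify pattern described in this abstract can be sketched in Python under a simplifying assumption: instead of alignment scores with an approximate upper bound, the sketch uses a true metric distance, for which the triangle inequality gives exact bounds. Euclidean distance between points stands in for the costly network alignment, and the names `build_reference_index` and `range_query` are hypothetical. It illustrates reference-based pruning in general and is not the RINQ algorithm itself.

```python
import math


def build_reference_index(database, references, dist):
    """Precompute the distance from every database object to every reference."""
    return [[dist(ref, obj) for ref in references] for obj in database]


def range_query(query, radius, database, references, index, dist):
    """Return objects within `radius` of `query` and the number of full comparisons."""
    query_to_ref = [dist(query, ref) for ref in references]
    results = []
    computed = 0
    for obj, obj_to_ref in zip(database, index):
        # Triangle inequality: |d(q,r) - d(r,x)| <= d(q,x) <= d(q,r) + d(r,x).
        lower = max(abs(a - b) for a, b in zip(query_to_ref, obj_to_ref))
        upper = min(a + b for a, b in zip(query_to_ref, obj_to_ref))
        if lower > radius:
            continue  # cannot match: pruned without a full comparison
        if upper <= radius:
            results.append(obj)  # guaranteed match: accepted without a full comparison
            continue
        computed += 1  # undecided: fall back to the expensive comparison
        if dist(query, obj) <= radius:
            results.append(obj)
    return results, computed


points = [(float(x), float(y)) for x in range(5) for y in range(5)]
refs = [(0.0, 0.0), (4.0, 4.0)]
ref_index = build_reference_index(points, refs, math.dist)
matches, full_comparisons = range_query((1.0, 1.0), 1.5, points, refs, ref_index, math.dist)
print(len(matches), "matches using", full_comparisons, "full comparisons")
```

The choice of references determines how many objects fall into the undecided group, which is presumably why the abstract proposes a supervised method for picking them.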

8.

Background  

Motivated by a biomedical database set up by our group, we aimed to develop a generic database front-end with embedded knowledge discovery and analysis features. A major focus was the human-oriented representation of the data and the enabling of a closed circle of data query, exploration, visualization and analysis.

9.
10.
11.
Chen Y, Hanan J. Bio Systems, 2002, 65(2-3): 187-197
Models of plant architecture allow us to explore how genotype-environment interactions affect the development of plant phenotypes. Such models generate masses of data organised in complex hierarchies. This paper presents a generic system for creating and automatically populating a relational database from data generated by the widely used L-system approach to modelling plant morphogenesis. Techniques from compiler technology are applied to generate attributes (new fields) in the database, to simplify query development for the recursively structured branching relationship. Use of biological terminology in an interactive query builder contributes towards making the system biologist-friendly.

12.
The intestinal crypt/villus in situ hybridization database (CVD) query interface is a web-based tool to search for genes with similar relative expression patterns along the crypt/villus axis of the mammalian intestine. The CVD is an online database holding information for relative gene expression patterns in the mammalian intestine and is based on the scoring of in situ hybridization experiments reported in the literature. CVD contains expression data for 88 different genes collected from 156 different in situ hybridization profiles. The web-based query interface allows execution of both single gene queries and pattern searches. The query results provide links to the most relevant public gene databases. AVAILABILITY: http://pc113.imbg.ku.dk/ps/

13.
In this paper, we discuss the properties of biological data and the challenges it poses for data management, and argue that, in order to meet the data management requirements for 'digital biology', careful integration of the existing technologies and the development of new data management techniques for biological data are needed. Based on this premise, we present PathCase: Case Pathways Database System. PathCase is an integrated set of software tools for modelling, storing, analysing, visualizing and querying biological pathways data at different levels of genetic, molecular, biochemical and organismal detail. The novel features of the system include: (i) genomic information integrated with other biological data and presented starting from pathways; (ii) design for biologists who are possibly unfamiliar with genomics, but whose research is essential for annotating gene and genome sequences with biological functions; (iii) database design, implementation and graphical tools which enable users to visualize pathways data in multiple abstraction levels and to pose exploratory queries; (iv) a wide range of different types of queries, including 'path' and 'neighbourhood' queries, and graphical visualization of query outputs; and (v) an implementation that allows for web (XML)-based dissemination of query outputs (i.e. pathways data in BIOPAX format) to researchers in the community, giving them control over the use of pathways data.

14.
With the proliferation of high-throughput technologies, genome-level data analysis has become common in molecular biology. Bioinformaticians are developing extensive resources to annotate and mine biological features from high-throughput data. The underlying database management systems for most bioinformatics software are based on a relational model. Modern non-relational databases offer an alternative that has flexibility, scalability, and a non-rigid design schema. Moreover, with an accelerated development pace, non-relational databases like CouchDB can be ideal tools to construct bioinformatics utilities. We describe CouchDB by presenting three new bioinformatics resources: (a) geneSmash, which collates data from bioinformatics resources and provides automated gene-centric annotations, (b) drugBase, a database of drug-target interactions with a web interface powered by geneSmash, and (c) HapMap-CN, which provides a web interface to query copy number variations from three SNP-chip HapMap datasets. In addition to the web sites, all three systems can be accessed programmatically via web services.

15.
IRBM, 2008, 29(1): 35-43
In this article, we present a Case-based Reasoning system for the retrieval of patient files similar to a case submitted as a query. We focus on patient files made up of several images with contextual information (such as the patient age, sex and medical history). Indeed, medical experts generally need varied sources of information (which might be incomplete) to diagnose a pathology. Consequently, we derive a retrieval framework from decision trees, which are well suited to process heterogeneous and incomplete information. To be integrated in the system, images are indexed by their digital content. The method is evaluated on a classified diabetic retinopathy database. On this database, results are promising: the retrieval sensitivity reaches 79.5% for a window of five cases, which is almost twice as good as the retrieval of single images alone.

16.
We have undertaken a large-scale study of the proteins expressed in the procyclic form of the parasite Trypanosoma brucei, which causes African sleeping sickness, using 2-DE and MS. The complete data set encompasses over 2000 identifications, of which 770 are distinct proteins. We have discovered that multiple protein isoforms appear to be common in T. brucei, as most proteins have been matched to more than one gel spot. We have developed visualisation software to investigate the differences between isoforms, based on the information from the results of database searches with MS data. We are able to highlight instances where PTMs are the most likely cause of variant forms. In other cases, spots that appear reproducibly across replicates contain fragments of proteins, arising either as experimental artefacts or as part of protein degradation. We are also able to classify clusters of gel spots into different groups based on the pattern of peptides that have been matched from MS data. The entire data set is stored within a relational database system that allows complex queries (http://www.gla.ac.uk/functionalgenomics). Using specific proteins as examples, we demonstrate how the visualisation software and the database query facilities can be used.

17.
Recently, a number of collaborative large-scale mouse mutagenesis programs have been launched. These programs aim for a better understanding of the roles of all individual coding genes and the biological systems in which these genes participate. In international efforts to share phenotypic data among facilities/institutes, it is desirable to integrate information obtained from different phenotypic platforms reliably. Since the definitions of specific phenotypes often depend on a tacit understanding of concepts that tends to vary among different facilities, it is necessary to define phenotypes based on the explicit evidence of assay results. We have developed a website termed PhenoSITE (Phenome Semantics Information with Terminology of Experiments: http://www.gsc.riken.jp/Mouse/), in which we are trying to integrate phenotype-related information using an experimental-evidence-based approach. The site's features include (1) a baseline database for our phenotyping platform; (2) an ontology associating international phenotypic definitions with experimental terminologies used in our phenotyping platform; (3) a database for standardized operation procedures of the phenotyping platform; and (4) a database for mouse mutants using data produced from the large-scale mutagenesis program at RIKEN GSC. We have developed two types of integrated viewers to enhance the accessibility of mutant resource information. One viewer depicts a matrix view of the ontology-based classification and chromosomal location of each gene; the other depicts ontology-mediated integration of experimental protocols, baseline data, and mutant information. These approaches rely entirely upon experiment-based evidence, ensuring the reliability of the integrated data from different phenotyping platforms.

18.
MHCBN: a comprehensive database of MHC binding and non-binding peptides (total citations: 6; self-citations: 0; citations by others: 6)
MHCBN is a comprehensive database of Major Histocompatibility Complex (MHC) binding and non-binding peptides compiled from published literature and existing databases. The latest version of the database has 19 777 entries including 17 129 MHC binders and 2648 MHC non-binders for more than 400 MHC molecules. The database has sequence and structure data of (a) source proteins of peptides and (b) MHC molecules. MHCBN has a number of web tools that include: (i) mapping of peptide on query sequence; (ii) search on any field; (iii) creation of data sets; and (iv) online data submission. The database also provides hypertext links to major databases like SWISS-PROT, PDB, IMGT/HLA-DB, GenBank and PUBMED.

19.
Searching a database for a local alignment to a query under a typical scoring scheme, such as PAM120 or BLOSUM62 with affine gap costs, is a computation that has resisted algorithmic improvement due to its basis in dynamic programming and the weak nature of the signals being searched for. In a query preprocessing step, a set of tables can be built that permit one to (a) eliminate a large fraction of the dynamic programming matrix from consideration and (b) compute several steps of the remainder with a single table lookup. While this result is not an asymptotic improvement over the original Smith-Waterman algorithm, its complexity is characterized in terms of some sparse features of the matrix and it yields the fastest software implementation to date for such searches.
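
For orientation, the baseline computation that this abstract sets out to accelerate can be sketched as a plain dynamic program in Python. The sketch below is the standard affine-gap local alignment recurrence (often attributed to Gotoh), not the table-driven method of the paper. The function name and the simple match/mismatch scores, used here in place of PAM120 or BLOSUM62, are assumptions for illustration. A gap of length k is charged `gap_open + (k - 1) * gap_extend`.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap_open=3, gap_extend=1):
    """Best local alignment score of strings a and b with affine gap costs."""
    neg_inf = float("-inf")
    prev_h = [0] * (len(b) + 1)        # best score of an alignment ending at (i-1, j)
    prev_f = [neg_inf] * (len(b) + 1)  # same, but ending with a vertical gap
    best = 0
    for i in range(1, len(a) + 1):
        cur_h = [0] * (len(b) + 1)
        cur_f = [neg_inf] * (len(b) + 1)
        e = neg_inf                    # best score ending with a horizontal gap in row i
        for j in range(1, len(b) + 1):
            # Either extend an open gap or open a new one from the best cell.
            e = max(e - gap_extend, cur_h[j - 1] - gap_open)
            cur_f[j] = max(prev_f[j] - gap_extend, prev_h[j] - gap_open)
            diag = prev_h[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a score never drops below zero.
            cur_h[j] = max(0, diag, e, cur_f[j])
            best = max(best, cur_h[j])
        prev_h, prev_f = cur_h, cur_f
    return best


print(local_alignment_score("ACGTACGT", "ACGTTACGT"))  # 13: eight matches, one gap
```

Every cell of the matrix is visited here, which is the cost that the preprocessing tables described in the abstract are meant to reduce.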

20.
Scan operations in a distributed column-oriented database involve many fragments and trigger many extra, invalid partition queries, which seriously degrades query efficiency, especially for spatial data. To solve this problem, this paper draws on the partitioning strategy of distributed column-oriented databases and proposes a spatial data storage optimization strategy named 'SPPS'. The strategy takes spatial adjacency into account so that adjacent spatial objects are stored in the same data fragment, and it retains the spatial information of each fragment. A spatial query can therefore locate the relevant fragments from that spatial information, and the extra invalid partition scans are reduced. Storage and query efficiency improve as a result. To verify the validity of the 'SPPS' strategy, the paper reports experiments on HBase that record spatial query efficiency with and without 'SPPS'. The experimental results indicate that the 'SPPS' strategy can optimize storage and query efficiency in distributed column-oriented databases.
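
The two ideas named in this abstract, co-locating adjacent objects and keeping spatial information per fragment, can be illustrated with a small in-memory Python sketch. The abstract does not describe how SPPS actually partitions data, so the uniform grid, the bounding-box summary, and the names `partition_by_grid` and `window_query` are all assumptions. No HBase API is used.

```python
from collections import defaultdict


def partition_by_grid(points, cell_size):
    """Group nearby points into fragments and record each fragment's bounding box."""
    cells = defaultdict(list)
    for x, y in points:
        cells[(int(x // cell_size), int(y // cell_size))].append((x, y))
    fragments = []
    for pts in cells.values():
        xs = [p[0] for p in pts]
        ys = [p[1] for p in pts]
        fragments.append({"bbox": (min(xs), min(ys), max(xs), max(ys)), "points": pts})
    return fragments


def window_query(fragments, xmin, ymin, xmax, ymax):
    """Return points inside the window and the number of fragments actually scanned."""
    hits = []
    scanned = 0
    for fragment in fragments:
        fx0, fy0, fx1, fy1 = fragment["bbox"]
        # Skip fragments whose bounding box cannot intersect the query window.
        if fx1 < xmin or fx0 > xmax or fy1 < ymin or fy0 > ymax:
            continue
        scanned += 1
        hits.extend(
            p for p in fragment["points"]
            if xmin <= p[0] <= xmax and ymin <= p[1] <= ymax
        )
    return hits, scanned


grid_points = [(x + 0.5, y + 0.5) for x in range(10) for y in range(10)]
grid_fragments = partition_by_grid(grid_points, 5)
found, scanned = window_query(grid_fragments, 0, 0, 3, 3)
print(len(found), "points found;", scanned, "of", len(grid_fragments), "fragments scanned")
```

In this toy data set the window touches one of four fragments, which is the kind of saving on unnecessary fragment scans that the abstract attributes to SPPS.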

Copyright © Beijing Qinyun Technology Development Co., Ltd. ICP registration no. 京ICP备09084417号