Similar articles
 20 similar articles found (search time: 15 ms)
1.
This study provides an experimental performance evaluation on population-based queries of NoSQL databases storing archetype-based Electronic Health Record (EHR) data. There are few published studies regarding the performance of persistence mechanisms for systems that use multilevel modelling approaches, especially when the focus is on population-based queries. A healthcare dataset with 4.2 million records stored in a relational database (MySQL) was used to generate XML and JSON documents based on the openEHR reference model. Six datasets with different sizes were created from these documents and imported into three single machine XML databases (BaseX, eXistdb and Berkeley DB XML) and into a distributed NoSQL database system based on the MapReduce approach, Couchbase, deployed in different cluster configurations of 1, 2, 4, 8 and 12 machines. Population-based queries were submitted to those databases and to the original relational database. Database size and query response times are presented. The XML databases were considerably slower and required much more space than Couchbase. Overall, Couchbase had better response times than MySQL, especially for larger datasets. However, Couchbase requires indexing for each differently formulated query and the indexing time increases with the size of the datasets. The performances of the clusters with 2, 4, 8 and 12 nodes were not better than the single node cluster in relation to the query response time, but the indexing time was reduced proportionally to the number of nodes. The tested XML databases had acceptable performance for openEHR-based data in some querying use cases and small datasets, but were generally much slower than Couchbase. Couchbase also outperformed the response times of the relational database, but required more disk space and had a much longer indexing time. 
Systems like Couchbase are thus interesting research targets for scalable storage and querying of archetype-based EHR data when population-based use cases are of interest.

2.
Biochemical databases will be best served by the development of new specialized database management systems whose storage managers are based on metric-space indexing techniques, and by the development of database query languages that embody semantics derived from biochemical models of similarity and evolution. Important biochemical data types cannot be effectively mapped to the low-dimensional coordinate systems on which O(log n) indexing methods rely. It is clear from an abundance of bioinformatic discoveries that biochemical data is not random and exhibits interesting structure with respect to clustering. Metric-space indexing exploits a data set's intrinsic clustering to speed the execution of similarity queries, even when the data cannot be mapped to a coordinate system. Database management systems that seamlessly integrate semantically rich query languages with a metric-storage and retrieval mechanism will allow biologists to simply and concisely develop informatic studies that have traditionally been large and labor intensive.
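The idea of answering similarity queries without coordinates can be illustrated with a minimal sketch. It assumes a single pivot and uses Hamming distance on equal-length strings as a stand-in for a biochemical similarity measure; the names `PivotIndex`, `hamming`, and `range_query` are hypothetical and do not come from the abstract. The only property used is the triangle inequality, which lets the index skip candidates without computing their distance to the query.

```python
def hamming(a, b):
    """Hamming distance between equal-length strings (a true metric)."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


class PivotIndex:
    """Single-pivot metric index: stores d(pivot, x) for every item x."""

    def __init__(self, items, pivot, dist):
        self.dist = dist
        self.pivot = pivot
        # Precompute each item's distance to the pivot once, at build time.
        self.entries = [(dist(pivot, x), x) for x in items]
        self.pruned = 0  # candidates skipped by the last query

    def range_query(self, query, radius):
        """Return all items within `radius` of `query`."""
        dq = self.dist(query, self.pivot)
        self.pruned = 0
        hits = []
        for dp, item in self.entries:
            # Triangle inequality: d(query, item) >= |d(query, pivot) - d(pivot, item)|.
            # If that lower bound already exceeds the radius, skip the item.
            if abs(dq - dp) > radius:
                self.pruned += 1
                continue
            if self.dist(query, item) <= radius:
                hits.append(item)
        return hits


sequences = ["ACGT", "ACGA", "TTTT", "ACCA", "GGGG"]
index = PivotIndex(sequences, pivot="ACGT", dist=hamming)
matches = index.range_query("ACGG", radius=1)
```

A practical metric index would typically use several pivots or a tree of them; the single pivot here only shows where the pruning comes from.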

3.
MOTIVATION: The large and growing body of experimental data on biomolecular binding is of enormous value in developing a deeper understanding of molecular biology, in developing new therapeutics, and in various molecular design applications. However, most of these data are found only in the published literature and are therefore difficult to access and use. No existing public database has focused on measured binding affinities and has provided query capabilities that include chemical structure and sequence homology searches. METHODS & RESULTS: We have created Binding DataBase (BindingDB), a public, web-accessible database of measured binding affinities. BindingDB is based upon a relational data specification for describing binding measurements via Isothermal Titration Calorimetry (ITC) and enzyme inhibition. A corresponding XML Document Type Definition (DTD) is used to create and parse intermediate files during the on-line deposition process and will also be used for data interchange, including collection of data from other sources. The on-line query interface, which is constructed with Java Servlet technology, supports standard SQL queries as well as searches for molecules by chemical structure and sequence homology. The on-line deposition interface uses Java Server Pages and JavaBean objects to generate dynamic HTML and to store intermediate results. The resulting data resource provides a range of functionality with brisk response times, and lends itself well to continued development and enhancement.

4.
This paper describes the design and implementation of ADAMIS ('A database for medical information systems'). ADAMIS is a relational database management system for a general hospital environment. Apart from the usual database (DB) facilities of data definition and data manipulation, ADAMIS supports a query language called the 'simplified medical query language' (SMQL) which is completely end-user oriented and highly non-procedural. Other features of ADAMIS include provision of facilities for statistics collection and report generation. ADAMIS also provides adequate security and integrity features and has been designed mainly for use on interactive terminals.

5.
The National Center for Biotechnology Information (NCBI) integrates data from more than 20 biological databases through a flexible search and retrieval system called Entrez. A core Entrez database, Entrez Nucleotide, includes GenBank and is tightly linked to the NCBI Taxonomy database, the Entrez Protein database, and the scientific literature in PubMed. A suite of more specialized databases for genomes, genes, gene families, gene expression, gene variation, and protein domains dovetails with the core databases to make Entrez a powerful system for genomic research. Linked to the full range of Entrez databases is the NCBI Map Viewer, which displays aligned genetic, physical, and sequence maps for eukaryotic genomes including those of many plants. A specialized plant query page allows maps from all plant genomes covered by the Map Viewer to be searched in tandem to produce a display of aligned maps from several species. PlantBLAST searches against the sequences shown in the Map Viewer allow BLAST alignments to be viewed within a genomic context. In addition, precomputed sequence similarities, such as those for proteins offered by BLAST Link, enable fluid navigation from unannotated to annotated sequences, quickening the pace of discovery. NCBI Web pages for plants, such as Plant Genome Central, complete the system by providing centralized access to NCBI's genomic resources as well as links to organism-specific Web pages beyond NCBI.

6.
7.
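Shi Jianping (施建平), Sun Bo (孙波), Yang Linzhang (杨林章). Chinese Journal of Applied Ecology (《应用生态学报》), 2003, 14(11): 1873-1878
In recent years, research on NPK nutrient cycling in farmland ecosystems has accumulated a large amount of data, and there is an urgent need for a data management system that can preserve these data over the long term and support overall decision-making in nutrient cycling research. This paper describes the design of a conceptual model for nutrient cycling data management, explains how the model was constructed, and presents an application example of a database system built on it. The results show that the database system built on the model supports queries by time, location, and topic, can manage multiple types of data such as field observations, thematic maps, and research reports, and allows data to be extracted and analyzed quickly.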

8.
GENEVESTIGATOR. Arabidopsis microarray database and analysis toolbox   Cited by: 26 (self-citations: 0, citations by others: 26)
High-throughput gene expression analysis has become a frequent and powerful research tool in biology. At present, however, few software applications have been developed for biologists to query large microarray gene expression databases using a Web-browser interface. We present GENEVESTIGATOR, a database and Web-browser data mining interface for Affymetrix GeneChip data. Users can query the database to retrieve the expression patterns of individual genes throughout chosen environmental conditions, growth stages, or organs. Conversely, mining tools allow users to identify genes specifically expressed during selected stresses, growth stages, or in particular organs. Using GENEVESTIGATOR, the gene expression profiles of more than 22,000 Arabidopsis genes can be obtained, including those of 10,600 currently uncharacterized genes. The objective of this software application is to direct gene functional discovery and design of new experiments by providing plant biologists with contextual information on the expression of genes. The database and analysis toolbox is available as a community resource at https://www.genevestigator.ethz.ch.

9.
Background: In the field of bioinformatics, interchangeable data formats based on XML are widely used. XML-type data is also at the core of most web services. With the increasing amount of data stored in XML comes the need for storing and accessing the data. In this paper we analyse the suitability of different database systems for storing and querying large datasets in general and Medline in particular. Results: All reviewed database systems perform well when tested with small to medium sized datasets; however, when the full Medline dataset is queried, a large variation in query times is observed. Conclusions: There is not one system that is vastly superior to the others in this comparison and, depending on the database size and the query requirements, different systems are most suitable. The best all-round solution is the Oracle 11g database system using the new binary storage option. Alias-i's Lingpipe is a more lightweight, customizable and sufficiently fast solution. It does, however, require more initial configuration steps. For data with a changing XML structure, Sedna and BaseX as native XML database systems, or MySQL with an XML-type column, are suitable.

10.
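Development of the iFlora beta mobile platform software   Cited by: 1 (self-citations: 0, citations by others: 1)
One aim of the iFlora project is to improve public awareness and understanding of plants, and developing software for mobile device platforms is an important part of the project. Accordingly, based on the overall requirements of iFlora development and building on the Chinese plant species information database, the iFlora beta mobile platform software was designed and developed. Building on the application software design, the software targets the Android and iOS systems and implements basic search, wizard-style query, expert interaction, and personalization functions for the species database, meeting users' needs for rapid retrieval of plant data and for species identification. On this basis, approaches to building a plant retrieval feature library and an image feature library are discussed, laying a foundation for achieving the overall goals of iFlora.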

11.
Knowledge discovery from the exponentially growing body of structurally characterised protein-ligand complexes as a source of information in structure-based drug design is a major challenge in contemporary drug research. Given the need for powerful data retrieval, integration and analysis tools, Relibase was developed as a database system particularly designed to handle protein-ligand related problems and tasks. Here, we describe the design and functionality of the Relibase core database system. Features of Relibase include, e.g., the detailed analysis of superimposed ligand binding sites, ligand similarity and substructure searches, and 3D searches for protein-ligand and protein-protein interaction patterns. The broad range of functions provided in Relibase and its high level of data integration, along with its flexible and intuitive interface, make Relibase an invaluable data mining tool which can significantly enhance the drug development process. An example is discussed that illustrates a 3D query for quaternary ligand nitrogen atoms interacting with aromatic ring systems in proteins, a pattern found in pharmaceutically relevant target proteins such as acetylcholinesterase.

12.
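Li Xisheng (李溪盛), Ma Ying (马莺). Chinese Journal of Bioinformatics (《生物信息学》), 2014, 12(4): 287-291
To share data over the Internet and to develop a computer platform that identifies japonica rice varieties from DNA fingerprint information, a database system for identifying northern japonica rice varieties was designed. Based on the information the database is designed to hold, four tables were defined. Following a functional requirements analysis, four functional modules were designed: user management, northern japonica rice DNA fingerprint data management, japonica rice DNA fingerprint query, and japonica rice variety identification analysis. Interface layouts for the system were also designed, laying the foundation for building the northern japonica rice variety identification database system.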

13.
A distributed computing system is developed to search and analyze genetic databases using parallel computing technologies. Queries are processed by a local network PC cluster. A universal task and data exchange format is developed for effective query processing. A multilevel hierarchic task batching procedure is elaborated to generate multiple subtasks and distribute them over cluster units under dynamic priority levels and with dynamic distribution of replicated source data subbases. Primary source data preparation and generation of annotation word indices are used to significantly reduce query processing time.

14.
MOTIVATION: The information model chosen to store biological data affects the types of queries possible, database performance, and difficulty in updating that information model. Genetic sequence data for pharmacogenetics studies can be complex, and the best information model to use may change over time. As experimental and analytical methods change, and as biological knowledge advances, the data storage requirements and types of queries needed may also change. RESULTS: We developed a model for genetic sequence and polymorphism data, and used XML Schema to specify the elements and attributes required for this model. We implemented this model as an ontology in a frame-based representation and as a relational model in a database system. We collected genetic data from two pharmacogenetics resequencing studies, and formulated queries useful for analysing these data. We compared the ontology and relational models in terms of query complexity, performance, and difficulty in changing the information model. Our results demonstrate benefits of evolving the schema for storing pharmacogenetics data: ontologies perform well in early design stages as the information model changes rapidly and simplify query formulation, while relational models offer improved query speed once the information model and types of queries needed stabilize.

15.
Problem: A series of long-term field experiments is described, with particular reference to monitoring and quality control. This paper addresses problems in data management of particular importance for long-term studies, including data manipulation, archiving, quality assessment, and flexible retrieval for analysis. Method: The problems were addressed using a purpose-built database system, using commercial software and running under Microsoft Windows. Conclusion: The database system brings many advantages compared to available software, including significantly improved quality checking and access. The query system allows for easy access to data sets, thus improving the efficiency of analysis. Quality assessments of the initial dataset demonstrated that the database system can also provide general insight into types and magnitudes of error in data sets. Finally, the system can be generalised to include data from a number of different projects, thus simplifying data manipulation for meta-analysis.

16.
A distributed computing system is developed to search and analyze genetic databases using parallel computing technologies. Queries are processed by a local network PC cluster. A universal task and data exchange format is developed for effective query processing. A multilevel hierarchic task batching procedure is elaborated to generate multiple subtasks and distribute them over cluster units under dynamic priority levels and with dynamic distribution of replicated source data subbases. Primary source data preparation and generation of annotation word indices are used to significantly reduce query processing time.

17.
PeroxiBase: a class III plant peroxidase database   Cited by: 7 (self-citations: 0, citations by others: 7)
Class III plant peroxidases (EC 1.11.1.7), which are encoded by multigenic families in land plants, are involved in several important physiological and developmental processes. Their varied functions are not yet clearly determined, but their characterization will certainly lead to a better understanding of plant growth, differentiation and interaction with the environment, and hence to many exciting applications. Because there is currently no central database for plant peroxidase sequences, and many plant sequences are not deposited in the EMBL/GenBank/DDBJ repository or the UniProt KnowledgeBase, researchers cannot easily access all peroxidase sequences. Furthermore, gene expression data are poorly covered and annotations are inconsistent. In this rapidly moving field, there is a need for continual updating and correction of the peroxidase superfamily in plants. Moreover, consolidating information about peroxidases will allow for comparison of peroxidases between species and thus significantly help in making correlations of function, structure or phylogeny. We report a new database (PeroxiBase), accessible through a web server, with specific tools dedicated to facilitating query, classification and submission of peroxidase sequences. Recent developments in the field of plant peroxidases are also mentioned.

18.
A novel database and a modified alignment program are described that provide a fast and accurate procedure for assigning nucleotide sequences to allele types for multi-locus sequence analysis (MLSA). The database has between 40 and 160 alleles per organism, including Neisseria meningitidis, Streptococcus pneumoniae, Staphylococcus aureus and Haemophilus influenzae. The system directly compares the query nucleotide sequence against all alleles within the database, which reduces the time taken for the analysis of nucleotide sequence data and the assignment of alleles for subsequent sequence analysis.

19.
The automated sequence annotation pipeline (ASAP) is designed to ease routine investigation of new functional annotations on unknown sequences, such as expressed sequence tags (ESTs), through querying of web-accessible resources and maintenance of a local database. The system allows easy use of the output from one search as the input for a new search, as well as the filtering of results. The database is used to store formats and parameters and information for parsing data from web sites. The database permits easy updating of format information should a site modify the format of a query or of a returned web page.

20.
MOTIVATION: The exponential growth of sequence databases poses a major challenge to bioinformatics tools for querying alignment and annotation databases. There is a pressing need for methods for finding overlapping sequence intervals that are highly scalable to database size, query interval size, result size and construction/updating of the interval database. RESULTS: We have developed a new interval database representation, the Nested Containment List (NCList), whose query time is O(n + log N), where N is the database size and n is the size of the result set. In all cases tested, this query algorithm is 5-500-fold faster than other indexing methods tested in this study, such as MySQL multi-column indexing, MySQL binning and R-Tree indexing. We provide performance comparisons both in simulated datasets and real-world genome alignment databases, across a wide range of database sizes and query interval widths. We also present an in-place NCList construction algorithm that yields database construction times that are approximately 100-fold faster than other methods available. The NCList data structure appears to provide a useful foundation for highly scalable interval database applications. AVAILABILITY: NCList data structure is part of Pygr, a bioinformatics graph database library, available at http://sourceforge.net/projects/pygr
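The nested-containment idea can be sketched in a few lines of Python. This is an illustrative in-memory version under stated assumptions, not Pygr's implementation and not the in-place construction algorithm the abstract mentions: intervals are half-open `(start, end)` tuples, construction uses a simple stack, and the class and method names are hypothetical. Any interval contained in another is moved into that interval's sublist, so within each list both starts and ends increase, which allows a binary search followed by a short scan.

```python
class NCList:
    """Minimal Nested Containment List over half-open (start, end) intervals."""

    def __init__(self, intervals):
        # Sort by start ascending, end descending, so that a containing
        # interval always comes before the intervals it contains.
        ordered = sorted(intervals, key=lambda iv: (iv[0], -iv[1]))
        self.root = []   # top-level nodes: (start, end, children)
        stack = []       # chain of currently open containers
        for start, end in ordered:
            # Close containers that end before this interval does.
            while stack and stack[-1][1] < end:
                stack.pop()
            node = (start, end, [])
            siblings = stack[-1][2] if stack else self.root
            siblings.append(node)
            stack.append(node)

    def query(self, qstart, qend):
        """Return all stored intervals overlapping [qstart, qend)."""
        out = []
        self._collect(self.root, qstart, qend, out)
        return out

    def _collect(self, nodes, qstart, qend, out):
        # Ends increase within a sublist: binary search for the first
        # node whose end lies beyond the start of the query.
        lo, hi = 0, len(nodes)
        while lo < hi:
            mid = (lo + hi) // 2
            if nodes[mid][1] > qstart:
                hi = mid
            else:
                lo = mid + 1
        # Starts also increase, so stop at the first node past the query.
        for start, end, children in nodes[lo:]:
            if start >= qend:
                break
            out.append((start, end))
            self._collect(children, qstart, qend, out)


intervals = [(0, 10), (2, 5), (3, 4), (6, 9), (12, 20), (15, 18)]
index = NCList(intervals)
hits = index.query(4, 7)
print(hits)  # [(0, 10), (2, 5), (6, 9)]
```

Each sublist costs one binary search plus a scan over nodes that are all reported as hits, which is where the result-size term in the query time comes from.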
