Similar documents
20 similar documents found (search time: 31 ms)
1.
This study provides an experimental performance evaluation of population-based queries on NoSQL databases storing archetype-based Electronic Health Record (EHR) data. Few published studies address the performance of persistence mechanisms for systems that use multilevel modelling approaches, especially for population-based queries. A healthcare dataset with 4.2 million records stored in a relational database (MySQL) was used to generate XML and JSON documents based on the openEHR reference model. Six datasets of different sizes were created from these documents and imported into three single-machine XML databases (BaseX, eXistdb and Berkeley DB XML) and into Couchbase, a distributed NoSQL database system based on the MapReduce approach, deployed in cluster configurations of 1, 2, 4, 8 and 12 machines. Population-based queries were submitted to these databases and to the original relational database, and database sizes and query response times are reported. The XML databases were considerably slower and required much more space than Couchbase. Overall, Couchbase had better response times than MySQL, especially for larger datasets. However, Couchbase requires an index for each differently formulated query, and indexing time increases with dataset size. Clusters with 2, 4, 8 and 12 nodes did not achieve better query response times than the single-node cluster, but indexing time decreased in proportion to the number of nodes. The tested XML databases had acceptable performance for openEHR-based data in some querying use cases and with small datasets, but were generally much slower than Couchbase. Couchbase also outperformed the relational database in response time, but required more disk space and had a much longer indexing time.
Systems like Couchbase are thus interesting research targets for scalable storage and querying of archetype-based EHR data when population-based use cases are of interest.
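Couchbase's classic views are defined by map and reduce functions (written in JavaScript in Couchbase itself). The Python sketch below only emulates that model in memory to show how a population-based query, here the number of patients aged 50 or over per diagnosis code, fits it. The flat document layout and the field names `ehr_id`, `diagnosis` and `age` are hypothetical simplifications of openEHR compositions, not the study's data or schema.

```python
from collections import defaultdict
from typing import Any, Iterator

# Hypothetical, heavily simplified EHR documents; real openEHR
# compositions are deeply nested archetype-based structures.
documents = [
    {"ehr_id": "p1", "diagnosis": "I10", "age": 64},
    {"ehr_id": "p2", "diagnosis": "E11", "age": 51},
    {"ehr_id": "p3", "diagnosis": "I10", "age": 72},
    {"ehr_id": "p4", "diagnosis": "I10", "age": 38},
]

def map_fn(doc: dict[str, Any]) -> Iterator[tuple[str, int]]:
    """Emit (key, value) pairs, like a view's map function."""
    if doc.get("age", 0) >= 50:  # population filter: patients aged 50 or over
        yield doc["diagnosis"], 1

def reduce_fn(values: list[int]) -> int:
    """Combine all values emitted for one key (like a built-in count/sum reduce)."""
    return sum(values)

def run_view(docs: list[dict[str, Any]]) -> dict[str, int]:
    grouped: dict[str, list[int]] = defaultdict(list)
    for doc in docs:  # "indexing": the map step visits every stored document
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(vals) for key, vals in sorted(grouped.items())}

print(run_view(documents))  # {'E11': 1, 'I10': 2}
```

Because the map step has to revisit every stored document whenever a new view is defined, the sketch also hints at why each differently formulated query needed its own index and why indexing time grew with dataset size.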

2.
The EBI SRS server--recent developments   (Total citations: 4; self-citations: 0; other citations: 4)
MOTIVATION: The current data explosion is intractable without advanced data management systems. The numerous data sets become truly useful when they are interconnected under a uniform interface that represents the domain knowledge. SRS has become an integration system for both data retrieval and data analysis applications. It can search multiple databases by shared attributes and query across databases quickly and efficiently. RESULTS: Here we present recent developments at the EBI SRS server (http://srs.ebi.ac.uk). The EBI SRS server currently contains more than 130 biological databases and integrates more than 10 applications. It is a central resource for molecular biology data as well as a reference server for the latest developments in data integration. One of the latest additions to the EBI SRS server is the InterPro database (Integrated Resource of Protein Domains and Functional Sites). Distributed in XML format, it marked a turning point in low-level XML-SRS integration. We present InterProScan as an example of a data analysis application, describe some advanced features of SRS6, and introduce the SRSQuickSearch JavaScript interfaces to SRS.

3.
The EBI SRS server-new features   (Total citations: 4; self-citations: 0; other citations: 4)
MOTIVATION: Here we report on recent developments at the EBI SRS server (http://srs.ebi.ac.uk). SRS has become an integration system for both data retrieval and sequence analysis applications. The EBI SRS server is a primary gateway to the major molecular biology databases produced and supported at EBI, as well as the European public access point to the MEDLINE database provided by the US National Library of Medicine (NLM). It is a reference server for the latest developments in data and application integration. New additions include: the concept of virtual databases; integration of XML databases such as the Integrated Resource of Protein Domains and Functional Sites (InterPro), Gene Ontology (GO), MEDLINE, metabolic pathways, etc.; user-friendly data representation in 'Nice views'; and SRSQuickSearch bookmarklets. AVAILABILITY: SRS6 is a licensed product of LION Bioscience AG, freely available to academics. The EBI SRS server (http://srs.ebi.ac.uk) is a free central resource for molecular biology data as well as a reference server for the latest developments in data integration.

4.
HCVDB   (Total citations: 2; self-citations: 0; other citations: 2)
To date, more than 30 000 hepatitis C virus (HCV) sequences have been deposited in the generalist databases DNA Data Bank of Japan (DDBJ), EMBL Nucleotide Sequence Database (EMBL) and GenBank. The main difficulties with HCV sequences in these databases are their retrieval, annotation and analyses. To help HCV researchers face the increasing needs of HCV sequence analyses, we developed a specialised database of computer-annotated HCV sequences, called HCVDB. HCVDB is re-built every month from an up-to-date EMBL database by an automated process. HCVDB provides key data about the HCV sequences (e.g. genotype, genomic region, protein names and functions, known 3-dimensional structures) and ensures consistency of the annotations, which enables reliable keyword queries. The database is highly integrated with sequence and structure analysis tools and the SRS (LION bioscience) keywords query system. Thus, any user can extract subsets of sequences matching particular criteria or enter their own sequences and analyse them with various bioinformatics programs available on the same server. AVAILABILITY: HCVDB is available from http://hepatitis.ibcp.fr.

5.
In molecular biology, data and knowledge are growing exponentially. Among other things, this is reflected in the more than 300 molecular databases readily available on the Internet. Using these data requires integration tools capable of complex information fusion. This paper presents a novel concept for user-specific integration of life science data. Our approach is based on a mediator architecture combined with freely adjustable data schemas. The implemented prototype, called BioDataServer, can be accessed on the Internet at http://integration.genophen.de. To make the data sets produced by the integration process convenient to use, an SQL-based query language and an XML data format were developed and implemented.
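The core idea of user-specific integration, mapping each source's native fields onto a schema the user chooses and merging on a shared key, can be sketched in a few lines. This is a simplified illustration and not BioDataServer's design or query language: the two sources, their field names, the accessions and the mapping format are all hypothetical.

```python
from typing import Callable

# Two hypothetical sources describing the same proteins with different field names.
SOURCE_A = [{"acc": "P12345", "desc": "kinase", "org": "human"}]
SOURCE_B = [{"accession": "P12345", "pathway": "MAPK"},
            {"accession": "Q99999", "pathway": "glycolysis"}]
SOURCES = {"A": SOURCE_A, "B": SOURCE_B}

# User-specific target schema: attribute -> list of (source, native field).
# Fields the user does not map (e.g. "org") are simply not integrated.
USER_SCHEMA = {
    "id": [("A", "acc"), ("B", "accession")],
    "description": [("A", "desc")],
    "pathway": [("B", "pathway")],
}

def mediate(schema, sources, predicate: Callable[[dict], bool]) -> list[dict]:
    """Translate every source into the user schema, merge on "id", then filter."""
    merged: dict[str, dict] = {}
    for name, rows in sources.items():
        # native field in this source that plays the role of the join key "id"
        id_field = next(field for src, field in schema["id"] if src == name)
        for row in rows:
            record = merged.setdefault(row[id_field], {"id": row[id_field]})
            for target, mappings in schema.items():
                for src, field in mappings:
                    if target != "id" and src == name and field in row:
                        record[target] = row[field]
    return [record for record in merged.values() if predicate(record)]

print(mediate(USER_SCHEMA, SOURCES, lambda r: r.get("pathway") == "MAPK"))
# [{'id': 'P12345', 'description': 'kinase', 'pathway': 'MAPK'}]
```

Changing `USER_SCHEMA` changes the shape of the integrated records without touching the sources, which is the property a freely adjustable schema is meant to provide.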

6.
pProRep is a web application integrating electrophoretic and mass spectral data from proteome analyses into a relational database. The graphical web-interface allows users to upload, analyse and share experimental proteome data. It offers researchers the possibility to query all previously analysed datasets and can visualize selected features, such as the presence of a certain set of ions in a peptide mass spectrum, on the level of the two-dimensional gel. AVAILABILITY: The pProRep package and instructions for its use can be downloaded from http://www.ptools.ua.ac.be/pProRep. The application requires a web server that runs PHP 5 (http://www.php.net) and MySQL. Some (non-essential) extensions need additional freely available libraries: details are described in the installation instructions.
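pProRep itself is a PHP/MySQL application; the Python sketch below only illustrates the kind of check behind a query such as "which gel spots contain this set of ions". It assumes each spot has a sorted list of peak m/z values and uses a fixed absolute tolerance of 0.02 Da; the spot names, peak values and tolerance are invented for illustration.

```python
from bisect import bisect_left

def has_ions(peaks: list[float], targets: list[float], tol: float = 0.02) -> bool:
    """Return True if every target m/z has a peak within +/- tol (Da).

    `peaks` must be sorted in ascending order.
    """
    for mz in targets:
        i = bisect_left(peaks, mz - tol)  # first peak >= mz - tol
        if i == len(peaks) or peaks[i] > mz + tol:
            return False                  # no peak inside the tolerance window
    return True

# Hypothetical peak lists for two gel spots.
spectra = {
    "spot_17": [842.51, 1045.56, 1179.60, 2211.10],
    "spot_23": [856.52, 1045.57, 1234.70],
}
wanted = [1045.56, 1179.60]
print([spot for spot, peaks in spectra.items() if has_ions(peaks, wanted)])  # ['spot_17']
```

Binary search keeps each lookup logarithmic in the number of peaks, so the same check stays cheap when run over many stored spectra.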

7.
The Plant Gene Index (PlantGI) database is a web-based system with keyword search capabilities that provides gene index information specifically for agricultural plants. The database contains Gene Index information for ten agricultural species, namely rice, Chinese cabbage, wheat, maize, soybean, barley, mushroom, Arabidopsis, hot pepper and tomato. PlantGI differs from other Gene Index databases in being specific to agricultural plant species and thus complements similar existing resources. The database includes options for interactive mining of EST contigs and assembled EST data through user-specific keyword queries. The current version of PlantGI contains more than 34,000 EST contigs in total: rice (8488 records), wheat (8560 records), maize (4570 records), soybean (3726 records), barley (3417 records), Chinese cabbage (3602 records), tomato (1236 records), hot pepper (998 records), mushroom (130 records) and Arabidopsis (8 records). AVAILABILITY: The database is available for free at http://www.niab.go.kr/nabic/.
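The per-species record counts listed above can be checked with a quick sum. They add up to 34,735 contigs across ten species, so the 34,000 figure is a rounded-down total.

```python
# Per-species EST contig counts as listed in the abstract.
contigs = {
    "rice": 8488, "wheat": 8560, "maize": 4570, "soybean": 3726,
    "barley": 3417, "Chinese cabbage": 3602, "tomato": 1236,
    "hot pepper": 998, "mushroom": 130, "Arabidopsis": 8,
}
total = sum(contigs.values())
print(len(contigs), total)  # 10 34735
```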

8.
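This work aimed to build a local SRS service system based on EBI databases, giving biomedical researchers a web service for conveniently and quickly searching commonly used EBI biological databases, and to provide a technical approach that other biomedical research institutions can refer to when building their own SRS systems. The academic edition of SRS 8.1 provided by EBI was installed and debugged under Linux and Tomcat, and modules for automatically downloading and periodically updating the EBI databases were developed in Perl and shell. The local SRS system was installed and tested, the automatic download and update mechanism for the EBI databases was implemented, and the system is currently running normally.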
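The original download and update modules were written in Perl and shell. The Python sketch below shows only one plausible decision step of such a periodic job (for example, one launched by cron): comparing local and remote release dates to decide which databases to fetch again. Tracking release dates this way, the database names and the dates are all assumptions; the actual download and SRS re-indexing steps are left out.

```python
from datetime import date

def plan_updates(local: dict[str, date], remote: dict[str, date]) -> list[str]:
    """Databases to (re)download: newer remote release, or not yet mirrored locally."""
    return sorted(
        name for name, released in remote.items()
        if name not in local or released > local[name]
    )

# Hypothetical release dates; a real job would obtain the remote ones from the
# server's file listing and the local ones from its own bookkeeping (not shown).
local = {"uniprot": date(2024, 1, 10), "embl": date(2024, 2, 1)}
remote = {"uniprot": date(2024, 3, 5), "embl": date(2024, 2, 1),
          "interpro": date(2024, 2, 20)}
print(plan_updates(local, remote))  # ['interpro', 'uniprot']
```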

9.
MOTIVATION: The exponential growth of sequence databases poses a major challenge to bioinformatics tools for querying alignment and annotation databases. There is a pressing need for methods for finding overlapping sequence intervals that are highly scalable to database size, query interval size, result size and construction/updating of the interval database. RESULTS: We have developed a new interval database representation, the Nested Containment List (NCList), whose query time is O(n + log N), where N is the database size and n is the size of the result set. In all cases tested, this query algorithm is 5-500-fold faster than other indexing methods tested in this study, such as MySQL multi-column indexing, MySQL binning and R-Tree indexing. We provide performance comparisons both in simulated datasets and real-world genome alignment databases, across a wide range of database sizes and query interval widths. We also present an in-place NCList construction algorithm that yields database construction times that are approximately 100-fold faster than other methods available. The NCList data structure appears to provide a useful foundation for highly scalable interval database applications. AVAILABILITY: NCList data structure is part of Pygr, a bioinformatics graph database library, available at http://sourceforge.net/projects/pygr
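The NCList idea can be sketched as follows: sort intervals by start ascending and end descending, nest every interval inside the interval that contains it, and keep each list free of containment so that both starts and ends increase along it. A query then binary-searches each visited list for the first interval ending after the query start and scans forward until starts pass the query end. This is a minimal in-memory sketch over half-open intervals `[start, end)`, not Pygr's API and not the paper's in-place construction algorithm.

```python
from bisect import bisect_right

class _Level:
    """One list of an NCList: sibling intervals, none containing another."""
    __slots__ = ("starts", "ends", "children")

    def __init__(self):
        self.starts = []    # non-decreasing
        self.ends = []      # strictly increasing among siblings
        self.children = []  # nested _Level holding the intervals each sibling contains

class NCList:
    """Minimal Nested Containment List over half-open intervals [start, end)."""

    def __init__(self, intervals):
        self.root = _Level()
        stack = []  # (end, child level) for the chain of currently open containers
        # Start ascending, end descending: every container precedes its contents.
        for start, end in sorted(intervals, key=lambda iv: (iv[0], -iv[1])):
            # Starts never decrease, so an open interval contains this one iff its end >= end.
            while stack and end > stack[-1][0]:
                stack.pop()
            parent = stack[-1][1] if stack else self.root
            child = _Level()
            parent.starts.append(start)
            parent.ends.append(end)
            parent.children.append(child)
            stack.append((end, child))

    def overlapping(self, qs, qe):
        """Yield every stored (start, end) with start < qe and end > qs."""
        pending = [self.root]
        while pending:
            level = pending.pop()
            # Ends increase within a level: binary-search the first end > qs...
            i = bisect_right(level.ends, qs)
            # ...then scan forward until starts reach the query end.
            while i < len(level.starts) and level.starts[i] < qe:
                yield level.starts[i], level.ends[i]
                pending.append(level.children[i])  # only overlapping parents can hold hits
                i += 1

ivs = [(0, 100), (10, 20), (15, 18), (30, 40), (90, 120), (200, 210)]
nc = NCList(ivs)
print(sorted(nc.overlapping(16, 35)))  # [(0, 100), (10, 20), (15, 18), (30, 40)]
```

A sublist is visited only when its parent interval already overlaps the query, which is what keeps the work proportional to the result size plus the binary searches.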

10.
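Gene expression microarray databases are online databases that support storage, querying, downloading and analysis, and they have supplied a large amount of data for tumor-related research. Because microarray analysis is still difficult for researchers without a background in bioinformatics or medical informatics, these databases are not yet widely used. This article reviews commonly used gene expression microarray databases in terms of data querying, downloading and analysis, and usage, and summarizes current strategies for applying them, with the aim of helping newcomers to the field learn the basics of these databases and promoting their use in research.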

11.
MedlineR: an open source library in R for Medline literature data mining   (Total citations: 3; self-citations: 0; other citations: 3)
SUMMARY: We describe an open source library written in the R programming language for Medline literature data mining. This MedlineR library includes programs to query Medline through the NCBI PubMed database; to construct the co-occurrence matrix; and to visualize the network topology of query terms. The open source nature of this library allows users to extend it freely in the statistical programming language of R. To demonstrate its utility, we have built an application to analyze term-association by using only 10 lines of code. We provide MedlineR as a library foundation for bioinformaticians and statisticians to build more sophisticated literature data mining applications. AVAILABILITY: The library is available from http://dbsr.duke.edu/pub/MedlineR.
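MedlineR is an R library that obtains its counts by querying PubMed. The Python sketch below is not its API; it illustrates only the co-occurrence matrix step, counting over a small in-memory list of per-abstract term sets that stands in for PubMed results. The terms and documents are invented.

```python
from itertools import combinations

def cooccurrence(docs: list[set[str]], terms: list[str]) -> list[list[int]]:
    """M[i][j] = number of documents mentioning both terms[i] and terms[j];
    the diagonal M[i][i] counts documents mentioning terms[i] at all."""
    index = {t: k for k, t in enumerate(terms)}
    m = [[0] * len(terms) for _ in terms]
    for doc in docs:
        present = sorted(index[t] for t in doc if t in index)
        for i in present:
            m[i][i] += 1
        for i, j in combinations(present, 2):  # each unordered pair once
            m[i][j] += 1
            m[j][i] += 1
    return m

terms = ["p53", "apoptosis", "BRCA1"]
docs = [{"p53", "apoptosis"}, {"p53", "BRCA1"},
        {"p53", "apoptosis", "BRCA1"}, {"apoptosis"}]
print(cooccurrence(docs, terms))  # [[3, 2, 2], [2, 3, 1], [2, 1, 2]]
```

Non-zero off-diagonal cells can then be read as weighted edges of the term network whose topology the library visualizes.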

13.
Storing biological sequence databases in relational form   (Total citations: 2; self-citations: 0; other citations: 2)
SUMMARY: We have created a set of applications using Perl and Java in combination with XML technology to install biological sequence databases into an Oracle RDBMS. An easy-to-use Java interface has been created for querying the database, and other tools have been developed to integrate it with our in-house bioinformatics applications. AVAILABILITY: The database schema, DTD file, and source code are available from the authors via email. CONTACT: guochun_xie@merck.com
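The general approach of loading flat-file sequence entries into relational tables can be sketched with Python's built-in `sqlite3` module, used here as a stand-in for Oracle. The three-table schema, the accessions, descriptions and sequences are hypothetical and are not the authors' schema.

```python
import sqlite3

# Hypothetical minimal schema: one row per entry, its sequence, and its keywords.
SCHEMA = """
CREATE TABLE entry   (accession TEXT PRIMARY KEY, description TEXT, organism TEXT);
CREATE TABLE seq     (accession TEXT PRIMARY KEY REFERENCES entry(accession),
                      residues TEXT);
CREATE TABLE keyword (accession TEXT REFERENCES entry(accession), word TEXT);
CREATE INDEX keyword_word ON keyword(word);
"""

def load(conn, records):
    """records: iterable of (accession, description, organism, residues, keywords)."""
    with conn:  # commit the whole batch as one transaction
        for acc, desc, org, residues, words in records:
            conn.execute("INSERT INTO entry VALUES (?, ?, ?)", (acc, desc, org))
            conn.execute("INSERT INTO seq VALUES (?, ?)", (acc, residues))
            conn.executemany("INSERT INTO keyword VALUES (?, ?)",
                             [(acc, w) for w in words])

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
load(conn, [
    ("X00001", "hypothetical kinase", "Homo sapiens", "MKTAYIAKQR", ["kinase", "ATP-binding"]),
    ("X00002", "hypothetical transporter", "Mus musculus", "MSLLVAG", ["transport"]),
])

QUERY = """
    SELECT e.accession, e.organism, length(s.residues)
    FROM entry e
    JOIN seq s     ON s.accession = e.accession
    JOIN keyword k ON k.accession = e.accession
    WHERE k.word = ?"""
rows = conn.execute(QUERY, ("kinase",)).fetchall()
print(rows)  # [('X00001', 'Homo sapiens', 10)]
```

Keeping keywords in their own indexed table lets keyword lookups use an index instead of scanning the entry text.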

14.
An object-oriented database system has been developed which is being used to store protein structure data. The database can be queried using the logic programming language Prolog or the query language Daplex. Queries retrieve information by navigating through a network of objects which represent the primary, secondary and tertiary structures of proteins. Routines written in both Prolog and Daplex can integrate complex calculations with the retrieval of data from the database, and can also be stored in the database for sharing among users. Thus object-oriented databases are better suited to prototyping applications and answering complex queries about protein structure than relational databases. This system has been used to find loops of varying length and anchor positions when modelling homologous protein structures.

15.
Background: In the field of bioinformatics, interchangeable data formats based on XML are widely used, and XML-type data is also at the core of most web services. With the increasing amount of data stored in XML comes the need to store and access that data efficiently. In this paper we analyse the suitability of different database systems for storing and querying large datasets in general and Medline in particular. Results: All reviewed database systems perform well when tested with small to medium-sized datasets; however, when the full Medline dataset is queried, query times vary widely. Conclusions: No single system is vastly superior to the others in this comparison; depending on the database size and the query requirements, different systems are most suitable. The best all-round solution is the Oracle 11g database system using the new binary storage option. Alias-i's Lingpipe is a more lightweight, customizable and sufficiently fast solution, although it requires more initial configuration steps. For data with a changing XML structure, Sedna and BaseX as native XML database systems, or MySQL with an XML-type column, are suitable.
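None of the compared systems is used here; as a simple baseline, the sketch below streams a MEDLINE-style XML file with the standard library's `xml.etree.ElementTree.iterparse` and returns the PMIDs of citations that carry a given MeSH descriptor. The element names follow the MEDLINE citation XML layout (`MedlineCitation`, `PMID`, `MeshHeadingList/MeshHeading/DescriptorName`), but the two records are invented and heavily trimmed.

```python
import io
import xml.etree.ElementTree as ET

SAMPLE = b"""<MedlineCitationSet>
  <MedlineCitation>
    <PMID>1</PMID>
    <Article><ArticleTitle>Title one</ArticleTitle></Article>
    <MeshHeadingList>
      <MeshHeading><DescriptorName>Databases, Factual</DescriptorName></MeshHeading>
    </MeshHeadingList>
  </MedlineCitation>
  <MedlineCitation>
    <PMID>2</PMID>
    <Article><ArticleTitle>Title two</ArticleTitle></Article>
    <MeshHeadingList>
      <MeshHeading><DescriptorName>Neoplasms</DescriptorName></MeshHeading>
    </MeshHeadingList>
  </MedlineCitation>
</MedlineCitationSet>"""

def pmids_with_mesh(source, heading: str) -> list[str]:
    """Stream citations one at a time so memory stays roughly flat on large files."""
    hits = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "MedlineCitation":
            continue  # wait until a whole citation has been parsed
        names = {d.text for d in elem.iterfind("MeshHeadingList/MeshHeading/DescriptorName")}
        if heading in names:
            hits.append(elem.findtext("PMID"))
        elem.clear()  # release the finished citation's subtree
    return hits

print(pmids_with_mesh(io.BytesIO(SAMPLE), "Neoplasms"))  # ['2']
```

A full scan like this has no index, which is the cost the dedicated database systems in the comparison are meant to avoid on the complete Medline dataset.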

16.
In a distributed column-oriented database, a scan operation may touch many fragments and trigger many unnecessary partition queries, which seriously degrades query efficiency, especially for spatial data. To address this problem, this paper draws on the partitioning strategies of distributed column-oriented databases and proposes a spatial data storage optimization strategy named 'SPPS'. The strategy takes spatial adjacency into account so that adjacent spatial objects are stored in the same data fragment, and it records the spatial extent of each fragment. A spatial query can then locate the relevant fragments from this per-fragment spatial information, which reduces unnecessary partition scans and improves storage and query efficiency. To verify the effectiveness of the 'SPPS' optimization strategy, experiments were carried out on HBase, recording spatial query efficiency with and without 'SPPS'. The results indicate that 'SPPS' can improve storage and query efficiency in distributed column-oriented databases.
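The abstract does not say how SPPS groups adjacent objects, so the Python sketch below swaps in a Z-order (Morton) curve as one common way to keep spatial neighbours together. Points are sorted by Morton key, cut into fixed-size fragments, and each fragment's bounding box is recorded so a window query scans only fragments whose box intersects the window. Everything here is in memory and hypothetical; it does not use HBase or reproduce the paper's method.

```python
def morton(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x and y into a Z-order (Morton) key."""
    key = 0
    for b in range(bits):
        key |= ((x >> b) & 1) << (2 * b) | ((y >> b) & 1) << (2 * b + 1)
    return key

def build_fragments(points, size):
    """Store Z-ordered points in fragments of `size`, keeping each fragment's bounding box."""
    ordered = sorted(points, key=lambda p: morton(*p))
    fragments = []
    for i in range(0, len(ordered), size):
        chunk = ordered[i:i + size]
        xs = [x for x, _ in chunk]
        ys = [y for _, y in chunk]
        fragments.append(((min(xs), min(ys), max(xs), max(ys)), chunk))
    return fragments

def range_query(fragments, x0, y0, x1, y1):
    """Return (points inside the window, number of fragments actually scanned)."""
    hits, scanned = [], 0
    for (fx0, fy0, fx1, fy1), chunk in fragments:
        if fx1 < x0 or fx0 > x1 or fy1 < y0 or fy0 > y1:
            continue  # bounding box misses the window: skip this fragment
        scanned += 1
        hits.extend(p for p in chunk if x0 <= p[0] <= x1 and y0 <= p[1] <= y1)
    return sorted(hits), scanned

grid = [(x, y) for x in range(8) for y in range(8)]
fragments = build_fragments(grid, size=16)
print(range_query(fragments, 1, 1, 2, 2))
# ([(1, 1), (1, 2), (2, 1), (2, 2)], 1)
```

On this 8 x 8 grid each 16-point fragment is one 4 x 4 quadrant, so the small window query scans one fragment instead of four.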

17.
ViroBLAST is a stand-alone BLAST web interface for nucleotide and amino acid sequence similarity searches. It extends BLAST so that queries can be run against multiple sequence databases and user sequence datasets, and it presents results in a user-friendly form that is easy to parse and navigate. ViroBLAST is useful in any research area that requires BLAST functions and is available both online and as a downloadable archive for independent installation. Availability: http://indra.mullins.microbiol.washington.edu/blast/viroblast.php.

18.
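Using the primary databases maintained by NCBI as data sources, we built a secondary database of plant hormone-related nucleic acids and proteins. The secondary database is organized into three parts: genes, proteins and literature. Software was written to collect data from these sources, save it in XML as an intermediate format, and parse and submit it to the secondary database, and several bioinformatics tools were integrated. The system provides data retrieval, statistical analysis, web-based local BLAST homology searching, automatic sequence assembly, and analysis of protein structures and functional sites. This secondary database offers a highly targeted plant hormone data source and supporting bioinformatics tools for research on the molecular mechanisms of plant hormone action.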

19.
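To support the overall experimental research, we constructed the Genomic Polymorphism of Chinese Ethnic Groups (GPCEG) database. It currently includes ethnic group names, general descriptions, physical characteristics, gene polymorphism data, immortalized cell lines, references, and links to related international databases, and a visual browsing and query system has been completed. This lays a foundation for building a national database with Chinese characteristics and can also provide information services to scientists working in related fields.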

20.
SRS (Sequence Retrieval System), an indexing system for flatfile libraries, provides fast access to individual library entries via retrieval by keywords from various data fields. SRS is now also able to build indices using the cross-references that most libraries provide. Fifteen libraries of DNA and protein sequences and structures have been selected. Each of these libraries interacts with at least one other by means of cross-references. Indexing these cross-references allows a complete network of libraries to be built. In the network, an entry from one library can in principle be linked to every other library. If two libraries are not directly cross-referenced, the linkage can be made through a succession of single links between neighbouring, cross-referenced libraries. A new operator has been added to the query language of SRS for convenient specification of links among complete libraries or entry sets generated by previous queries on particular libraries. All the information in the network can now be used to retrieve an entry in a specific library; for example, the full information given in amino acid sequence entries from SwissProt can now be used to retrieve related tertiary structure entries from PDB. Furthermore, a search in a single library can be extended to a search in the complete library network; for example, all entries in all databases pertaining to elastase can be found.
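Linking two libraries that are not directly cross-referenced amounts to finding a chain of neighbouring libraries and following entry-level cross-references along it. The Python sketch below uses breadth-first search for the chain; it does not reproduce SRS's query-language operator. The library names are real, but the entry identifiers and cross-reference tables are invented.

```python
from collections import deque

# Hypothetical entry-level cross-references, each stored in one direction only.
LINKS = {
    ("EMBL", "SWISSPROT"): {"X1": {"P1"}, "X2": {"P2"}},
    ("SWISSPROT", "PDB"): {"P1": {"1ABC", "2DEF"}, "P3": {"3GHI"}},
}

def neighbours(lib):
    """Libraries directly cross-referenced with `lib`, in either direction."""
    for a, b in LINKS:
        if a == lib:
            yield b
        if b == lib:
            yield a

def library_chain(src, dst):
    """Shortest chain of directly cross-referenced libraries (breadth-first search)."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        lib = queue.popleft()
        if lib == dst:
            chain = []
            while lib is not None:
                chain.append(lib)
                lib = prev[lib]
            return chain[::-1]
        for nxt in neighbours(lib):
            if nxt not in prev:
                prev[nxt] = lib
                queue.append(nxt)
    return None

def step(entries, a, b):
    """Follow cross-references from a set of entries in library a into library b."""
    if (a, b) in LINKS:
        table = LINKS[(a, b)]
        return {t for e in entries for t in table.get(e, ())}
    table = LINKS[(b, a)]  # stored the other way round: invert the lookup
    return {s for s, targets in table.items() if targets & entries}

def link(entries, src, dst):
    chain = library_chain(src, dst)
    if chain is None:
        return set()
    for a, b in zip(chain, chain[1:]):
        entries = step(entries, a, b)
    return entries

print(sorted(link({"X1", "X2"}, "EMBL", "PDB")))  # ['1ABC', '2DEF']
```

Links work in both directions, so the same tables also map a PDB entry back to the EMBL entries it is connected to.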


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417