Similar Literature
20 similar articles found.
1.
Dendrochronological data formats in general offer limited space for recording associated metadata. Such information is often recorded separately from the actual time series, and often only on paper. TRiDaBASE has been developed to improve metadata administration. It is a relational Microsoft Access database that allows users to register digital metadata according to TRiDaS, to generate TRiDaS XML for uploading to TRiDaS-based analytical systems and repositories, and to ingest TRiDaS XML created elsewhere for local querying and analyses.

2.
3.
There is a great need for standards in the orthology field. Users must contend with different ortholog data representations from each provider, and the providers themselves must independently gather and parse the input sequence data. These burdensome and redundant procedures make data comparison and integration difficult. We have designed two XML-based formats, SeqXML and OrthoXML, to solve these problems. SeqXML is a lightweight format for sequence records, the input for orthology prediction. It stores the same sequence and metadata as typical FASTA format records, but overcomes common problems such as unstructured metadata in the header and erroneous sequence content. XML provides validation to prevent data integrity problems that are frequent in FASTA files. The range of applications for SeqXML is broad and not limited to ortholog prediction. We provide read/write functions for BioJava, BioPerl, and Biopython. OrthoXML was designed to represent ortholog assignments from any source in a consistent and structured way, yet cater to specific needs such as scoring schemes or meta-information. A unified format is particularly valuable for ortholog consumers that want to integrate data from numerous resources, e.g. for gene annotation projects. Reference proteomes for 61 organisms are already available in SeqXML, and 10 orthology databases have signed on to OrthoXML. Adoption by the entire field would substantially facilitate exchange and quality control of sequence and orthology information.
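As a rough illustration of what a structured sequence record buys over FASTA, the sketch below builds and checks a SeqXML-like record using only Python's standard library. The element and attribute names here (entry, AAseq) are illustrative assumptions and should be checked against the published SeqXML schema.

```python
# Minimal sketch of reading a SeqXML-like record. Element names are
# illustrative guesses at the format's structure, not the published schema.
import xml.etree.ElementTree as ET

RECORD = """<seqXML>
  <entry id="P12345">
    <description>Example protein</description>
    <AAseq>MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ</AAseq>
  </entry>
</seqXML>"""

VALID_AA = set("ACDEFGHIKLMNPQRSTVWY")

root = ET.fromstring(RECORD)
for entry in root.findall("entry"):
    seq = entry.findtext("AAseq", default="")
    # Structured fields replace free-text FASTA headers, and the sequence
    # can be checked for erroneous content before downstream use.
    if not set(seq) <= VALID_AA:
        raise ValueError(f"invalid residues in {entry.get('id')}")
    print(entry.get("id"), len(seq))
```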

4.
5.

Background  

Flow cytometry technology is widely used in both health care and research. The rapid expansion of flow cytometry applications has outpaced the development of data storage and analysis tools. Collaborative efforts being taken to eliminate this gap include building common vocabularies and ontologies, designing generic data models, and defining data exchange formats. The Minimum Information about a Flow Cytometry Experiment (MIFlowCyt) standard was recently adopted by the International Society for Advancement of Cytometry. This standard guides researchers on the information that should be included in peer-reviewed publications, but it is insufficient for data exchange and integration between computational systems. The Functional Genomics Experiment (FuGE) model formalizes common aspects of comprehensive and high-throughput experiments across different biological technologies. We have extended the FuGE object model to accommodate flow cytometry data and metadata.
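FuGE is specified as a UML/XML object model; the sketch below only illustrates the extension idea in Python: a generic experiment class specialized with flow-cytometry-specific fields. The class and field names are assumptions for illustration, not the actual FuGE or MIFlowCyt model.

```python
# Illustrative sketch: a generic protocol class (standing in for FuGE's
# common classes) extended with flow-cytometry-specific fields.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Protocol:                         # generic FuGE-like concept
    name: str
    parameters: dict = field(default_factory=dict)

@dataclass
class FlowCytometryRun(Protocol):       # domain-specific extension
    instrument: str = ""
    fluorochromes: List[str] = field(default_factory=list)
    events_acquired: int = 0

run = FlowCytometryRun(
    name="stain-and-acquire",
    instrument="hypothetical-cytometer",
    fluorochromes=["FITC", "PE"],
    events_acquired=100_000,
)
print(run)
```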

6.
MOTIVATION: A Robot Scientist is a physically implemented robotic system that can automatically carry out cycles of scientific experimentation. We are commissioning a new Robot Scientist designed to investigate gene function in S. cerevisiae. This Robot Scientist will be capable of initiating >1,000 experiments, and making >200,000 observations, a day. Robot Scientists provide a unique test bed for the development of methodologies for the curation and annotation of scientific experiments: because the experiments are conceived and executed automatically by computer, it is possible to completely capture and digitally curate all aspects of the scientific process. This new ability brings with it significant technical challenges. To meet them, we apply an ontology-driven approach to the representation of all the Robot Scientist's data and metadata. RESULTS: We demonstrate the utility of developing an ontology for our new Robot Scientist. This ontology is based on a general ontology of experiments. The ontology aids the curation and annotation of the experimental data, metadata, and equipment metadata, and supports the design of database systems to hold the data and metadata. AVAILABILITY: EXPO in XML and OWL formats is at: http://sourceforge.net/projects/expo/. All materials about the Robot Scientist project are available at: http://www.aber.ac.uk/compsci/Research/bio/robotsci/.

7.
Tree-ring research and collaboration are currently being hampered by the lack of a suitable data-transfer standard for both data and metadata. This paper highlights the issues currently being faced and proposes a solution that, if adopted by the global dendro community, will open up the possibility of exciting new research collaborations. The solution consists of a data model for dendrochronological data and metadata, and an eXtensible Markup Language (XML) schema as a technical vehicle to exchange this data and metadata. The technology and structure of the standard enable future versions to be developed that will satisfy evolving requirements whilst remaining backwards compatible.
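The exchange mechanism the paper proposes rests on schema validation: a data file is checked against the published XML schema before interchange. A minimal sketch of that step in Python is below; the file names are placeholders, and the real TRiDaS schema is published separately. It requires the third-party lxml package.

```python
# Validate an XML data file against an XML Schema Definition with lxml.
# File names are hypothetical placeholders.
from lxml import etree

schema_doc = etree.parse("tridas.xsd")      # hypothetical local schema copy
schema = etree.XMLSchema(schema_doc)

doc = etree.parse("ring-widths.xml")        # hypothetical data file
if schema.validate(doc):
    print("document conforms to the schema")
else:
    for error in schema.error_log:
        print(error.line, error.message)
```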

8.
A preliminary study of biodiversity data integration models
Based on an analysis of the current state of biodiversity research, and with the goal of providing reliable data support for biodiversity conservation policy, this paper reviews the construction of several well-known biodiversity databases in China and abroad. Starting from the needs of researchers in the field, it proposes the design of a multi-level, multi-perspective integrated biodiversity database with a degree of built-in intelligence. The system follows the Dublin Core metadata specification and complies with the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). It is a distributed database platform that integrates text, maps, images, sound, and video, and can be published both online and on physical media such as optical discs. Its online subsystems can exchange data with one another, and with the physical media, through OAI-PMH.
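A rough sketch of the kind of Dublin Core record exchanged over OAI-PMH is shown below, built with Python's standard library. The record content is invented; the dc and oai_dc namespace URIs are the standard ones.

```python
# Build a minimal Dublin Core (oai_dc) record of the kind harvested
# via OAI-PMH. Record values are invented for illustration.
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
ET.register_namespace("dc", DC)
ET.register_namespace("oai_dc", OAI_DC)

record = ET.Element(f"{{{OAI_DC}}}dc")
for term, value in [
    ("title", "Specimen record: Ailuropoda melanoleuca"),
    ("creator", "Hypothetical Biodiversity Survey"),
    ("type", "Image"),
    ("format", "image/jpeg"),
]:
    ET.SubElement(record, f"{{{DC}}}{term}").text = value

print(ET.tostring(record, encoding="unicode"))
```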

9.
Data support knowledge development and theory advances in ecology and evolution. We are increasingly reusing data within our teams and projects and through the global, openly archived datasets of others. Metadata can be challenging to write and interpret, but it is always crucial for reuse. The value of metadata cannot be overstated, even as a relatively independent research object, because it describes the work that has been done in a structured format. We advance a new perspective and classify methods for metadata curation and development with tables. Tables with templates can be effectively used to capture all components of an experiment or project in a single, easy-to-read file familiar to most scientists. If coupled with the R programming language, metadata from tables can then be rapidly and reproducibly converted to publication formats, including extensible markup language (XML) files suitable for data repositories. Tables can also be used to summarize existing metadata and store metadata across many datasets. A case study is provided, and the added benefits of tables for metadata, a priori, are developed to ensure a more streamlined publishing process for many data repositories used in ecology, evolution, and the environmental sciences. In ecology and evolution, researchers are often highly tabular thinkers, shaped by experimental data collection in the lab and/or field, and representing metadata as a table will provide novel research and reuse insights.
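The paper couples metadata tables with R; the sketch below illustrates the same table-to-XML workflow in Python. The column names and output element layout are invented for illustration and are not the Ecological Metadata Language.

```python
# Convert a metadata table (one row per attribute) into a simple XML
# document, mirroring the table-to-repository-format workflow.
import csv, io
import xml.etree.ElementTree as ET

TABLE = """attribute,definition,unit
plot_id,Unique identifier of the field plot,dimensionless
biomass,Dry above-ground biomass,gram
"""

metadata = ET.Element("metadata")
for row in csv.DictReader(io.StringIO(TABLE)):
    attr = ET.SubElement(metadata, "attribute", name=row["attribute"])
    ET.SubElement(attr, "definition").text = row["definition"]
    ET.SubElement(attr, "unit").text = row["unit"]

print(ET.tostring(metadata, encoding="unicode"))
```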

10.
Dynamic publication model for neurophysiology databases.
We have implemented a pair of database projects, one serving cortical electrophysiology and the other invertebrate neurones and recordings. The design for each combines aspects of two proven schemes for information interchange. The journal article metaphor determined the type, scope, organization and quantity of data to comprise each submission. Sequence databases encouraged intuitive tools for data viewing, capture, and direct submission by authors. Neurophysiology required transcending these models with new datatypes. Time-series, histogram and bivariate datatypes, including illustration-like wrappers, were selected by their utility to the community of investigators. As interpretation of neurophysiological recordings depends on context supplied by metadata attributes, searches are via visual interfaces to sets of controlled-vocabulary metadata trees. Neurones, for example, can be specified by metadata describing functional and anatomical characteristics. Permanence is advanced by a data model and data formats largely independent of contemporary technology or implementation, including Java and the XML standard. All user tools, including dynamic data viewers that serve as a virtual oscilloscope, are Java-based, free, multiplatform, and distributed by our application servers to any contemporary networked computer. Copyright is retained by submitters; viewer displays are dynamic and do not violate copyright of related journal figures. Panels of neurophysiologists view and test schemas and tools, enhancing community support.
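The two ideas the abstract combines, a time-series datatype wrapped with metadata and search against controlled-vocabulary metadata trees, are sketched below in Python. All vocabulary terms and field names here are invented for illustration; the actual databases are Java-based.

```python
# A time-series record carrying controlled-vocabulary metadata, and a
# query that matches any record whose term sits below a tree prefix.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class TimeSeries:
    sampling_hz: float
    samples_mv: List[float]
    # metadata as (tree, path) pairs drawn from controlled vocabularies
    metadata: List[Tuple[str, str]]

rec = TimeSeries(
    sampling_hz=10_000.0,
    samples_mv=[0.02, 0.05, -0.01],
    metadata=[
        ("anatomy", "cortex/visual/V1"),
        ("function", "neurone/pyramidal"),
    ],
)

def matches(ts: TimeSeries, tree: str, prefix: str) -> bool:
    """True if any metadata term sits below `prefix` in `tree`."""
    return any(t == tree and path.startswith(prefix)
               for t, path in ts.metadata)

print(matches(rec, "anatomy", "cortex/visual"))   # True
```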

11.
1. Metadata plays an essential role in the long-term preservation, reuse, and interoperability of data. Nevertheless, creating useful metadata can be difficult enough, and weakly enough incentivized, that many datasets are accompanied by little or no metadata. One key challenge is, therefore, how to make metadata creation easier and more valuable. We present a solution that involves creating domain-specific metadata schemes that are as complex as necessary and as simple as possible. These goals are achieved by co-development between a metadata expert and the researchers (i.e., the data creators). The final product is a bespoke metadata scheme into which researchers can enter information (and validate it) via the simplest of interfaces: a web browser application and a spreadsheet.
2. We provide the R package dmdScheme (dmdScheme: An R package for working with domain specific MetaData schemes (Version v0.9.22), 2019) for creating a template domain-specific scheme. We describe how to create a domain-specific scheme from this template, including the iterative co-development process, the simple methods for using the scheme, and the methods for quality assessment, improvement, and validation.
3. The process of developing a metadata scheme following the outlined approach was successful, resulting in a metadata scheme which is used for the data generated in our research group. The validation quickly identifies forgotten metadata, as well as inconsistent metadata, thereby improving the quality of the metadata. Multiple output formats are available, including XML.
4. Making the provision of metadata easier while also ensuring high quality must be a priority for data curation initiatives. We show how both objectives are achieved by close collaboration between metadata experts and researchers to create domain-specific schemes. A near-future priority is to provide methods to interface domain-specific schemes with general metadata schemes, such as the Ecological Metadata Language, to increase interoperability.

The article describes a methodology for developing, entering, and validating domain-specific metadata schemes that is suitable for use by non-specialists. The approach uses an R package as the backend for processing the metadata, spreadsheets for entering it, and a server-based mechanism for distributing and using the developed schemes.
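dmdScheme itself is an R package; the Python sketch below only illustrates the underlying idea: metadata entered as a simple table, validated against a small domain-specific scheme. The scheme fields and rules are invented.

```python
# Validate a metadata record against a toy domain-specific scheme.
SCHEME = {
    "species":     {"required": True},
    "temperature": {"required": True, "type": float},
    "notes":       {"required": False},
}

def validate(record: dict) -> list:
    """Return a list of problems; an empty list means the record is valid."""
    problems = []
    for name, rules in SCHEME.items():
        if rules["required"] and name not in record:
            problems.append(f"missing required field: {name}")
        elif name in record and "type" in rules:
            try:
                rules["type"](record[name])
            except ValueError:
                problems.append(f"{name} is not a {rules['type'].__name__}")
    return problems

print(validate({"species": "Daphnia pulex", "temperature": "20.5"}))  # []
print(validate({"notes": "no species recorded"}))  # two missing fields
```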

12.

Motivation

In mass spectrometry-based proteomics, XML formats such as mzML and mzXML provide an open and standardized way to store and exchange the raw data (spectra and chromatograms) of mass spectrometric experiments. These file formats are being used by a multitude of open-source and cross-platform tools which allow the proteomics community to access algorithms in a vendor-independent fashion and perform transparent and reproducible data analysis. Recent improvements in mass spectrometry instrumentation have increased the data size produced in a single LC-MS/MS measurement and put substantial strain on open-source tools, particularly those that are not equipped to deal with XML data files that reach dozens of gigabytes in size.

Results

Here we present a fast and versatile parsing library for mass spectrometric XML formats available in C++ and Python, based on the mature OpenMS software framework. Our library implements an API for obtaining spectra and chromatograms under memory constraints using random access or sequential access functions, allowing users to process datasets that are much larger than system memory. For fast access to the raw data structures, small XML files can also be completely loaded into memory. In addition, we have improved the parsing speed of the core mzML module by over 4-fold (compared to OpenMS 1.11), making our library suitable for a wide variety of algorithms that need fast access to dozens of gigabytes of raw mass spectrometric data.

Availability

Our C++ and Python implementations are available for the Linux, Mac, and Windows operating systems. All proposed modifications to the OpenMS code have been merged into the OpenMS mainline codebase and are available to the community at https://github.com/OpenMS/OpenMS.
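A short sketch of the two access modes using pyopenms, the Python bindings to OpenMS, is below. The file names are placeholders, and the method names reflect the pyopenms API as I understand it; they should be checked against the installed version.

```python
# In-memory vs. indexed on-disk access to mzML with pyopenms (assumed API).
import pyopenms

# In-memory access: load a (small) mzML file completely.
exp = pyopenms.MSExperiment()
pyopenms.MzMLFile().load("small_run.mzML", exp)     # placeholder file
print("spectra in memory:", exp.getNrSpectra())

# Indexed on-disk access: random access to spectra in files larger
# than system memory.
od = pyopenms.OnDiscMSExperiment()
od.openFile("large_run.mzML")                       # placeholder file
first = od.getSpectrum(0)
mz, intensity = first.get_peaks()
print("first spectrum RT:", first.getRT(), "peaks:", len(mz))
```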

13.
The Human Proteome Organization's Proteomics Standards Initiative (PSI) promotes the development of exchange standards to improve data integration and interoperability. PSI specifies the suitable level of detail required when reporting a proteomics experiment (via the Minimum Information About a Proteomics Experiment), and provides extensible markup language (XML) exchange formats and dedicated controlled vocabularies (CVs) that must be combined to generate a standard-compliant document. The framework presented here tackles the issue of checking that experimental data reported using a specific format, CVs, and public bio-ontologies (e.g. Gene Ontology, NCBI taxonomy) are compliant with the Minimum Information About a Proteomics Experiment recommendations. The semantic validator not only checks the XML syntax but also enforces rules regarding the use of an ontology class or CV terms, by checking that the terms exist in the resource and that they are used in the correct location of a document. Moreover, this framework is extremely fast, even on sizable data files, and flexible, as it can be adapted to any standard by customizing the parameters it requires: an XML Schema Definition, one or more CVs or ontologies, and a mapping file describing in a formal way how the semantic resources and the format are interrelated. As such, the validator provides a general solution to a common problem in data exchange: how to validate the correct usage of a data standard beyond simple XML Schema Definition validation. The framework source code and its various applications can be found at http://psidev.info/validator.
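A toy sketch of semantic validation beyond schema checking is shown below: verify that a CV term exists and is allowed at the document location where it appears. The mapping rules and term entries are invented; the real PSI validator is configured with an XSD, CV files, and a mapping file.

```python
# Toy semantic validation: CV-term existence plus location rules.
CV = {"MS:1000031": "instrument model", "MS:1000041": "charge state"}

MAPPING = {  # document location -> accessions allowed there (invented)
    "/mzML/instrumentConfiguration": {"MS:1000031"},
    "/mzML/precursor":               {"MS:1000041"},
}

def check_term(location: str, accession: str) -> list:
    problems = []
    if accession not in CV:
        problems.append(f"{accession}: not found in the CV")
    elif accession not in MAPPING.get(location, set()):
        problems.append(
            f"{accession} ({CV[accession]}) not allowed at {location}")
    return problems

print(check_term("/mzML/instrumentConfiguration", "MS:1000031"))  # []
print(check_term("/mzML/precursor", "MS:1000031"))  # location violation
```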

14.
15.
Metadata describe the ancillary information needed for data preservation and independent interpretation, comparison across heterogeneous datasets, and quality assessment and quality control (QA/QC). Environmental observations are vastly diverse in type and structure, can be taken across a wide range of spatiotemporal scales in a variety of measurement settings and approaches, and saved in multiple formats. Thus, well-organized, consistent metadata are required to produce usable data products from diverse environmental observations collected across field sites. However, existing metadata reporting protocols do not support the complex data synthesis and model-data integration needs of interdisciplinary earth system research. We developed a metadata reporting framework (FRAMES) to enable management and synthesis of observational data that are essential in advancing a predictive understanding of earth systems. FRAMES utilizes best practices for data and metadata organization enabling consistent data reporting and compatibility with a variety of standardized data protocols. We used an iterative scientist-centered design process to develop FRAMES, resulting in a data reporting format that incorporates existing field practices to maximize data-entry efficiency. Thus, FRAMES has a modular organization that streamlines metadata reporting and can be expanded to incorporate additional data types. With FRAMES's multi-scale measurement position hierarchy, data can be reported at observed spatial resolutions and then easily aggregated and linked across measurement types to support model-data integration. FRAMES is in early use by both data originators (persons generating data) and consumers (persons using data and metadata). In this paper, we describe FRAMES, identify lessons learned, and discuss areas of future development.
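The multi-scale measurement position idea can be sketched briefly: observations are reported at their native position in a site hierarchy and then aggregated upward. The hierarchy levels and values below are invented for illustration and are not FRAMES itself.

```python
# Aggregate observations up a (site, plot, sensor) position hierarchy.
from collections import defaultdict
from statistics import mean

# (site, plot, sensor) -> observations, reported at native resolution
observations = {
    ("siteA", "plot1", "s1"): [4.1, 4.3],
    ("siteA", "plot1", "s2"): [3.9],
    ("siteA", "plot2", "s1"): [5.0],
}

def aggregate(level: int) -> dict:
    """Aggregate observations up to the first `level` position components."""
    grouped = defaultdict(list)
    for position, values in observations.items():
        grouped[position[:level]].extend(values)
    return {pos: mean(vals) for pos, vals in grouped.items()}

print(aggregate(2))  # per-plot means
print(aggregate(1))  # per-site mean
```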

16.
As modern computer systems face the challenge of large data, filesystems have to deal with a large number of files. This amplifies concerns about metadata operations as well as data operations. Most filesystems manage the metadata of files by constructing in-memory data structures, such as the directory entry (dentry) and inode. We found inefficiencies in the management of metadata in existing filesystems, such as the path traversal mechanism. In this article, we optimize the metadata operations by (1) looking up the dentry cache (dcache) hash table in a backward manner. To adopt the backward finding mechanism, we devise rename and permission-granting mechanisms. We also propose (2) compacting the metadata into dentry structures for in-memory space efficiency. We evaluate our optimized metadata management mechanisms with several benchmarks, including a real-world workload. These optimizations significantly reduce dcache lookup latency by up to 40% and improve overall throughput by up to 72% in a real-world benchmark.
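A toy model of the backward-lookup idea is sketched below: instead of walking a path component by component from the root, hash the full path and probe the dcache-like table directly, falling back to forward traversal on a miss. This is a conceptual Python model only, not the kernel data structures or the paper's actual mechanism.

```python
# Conceptual model: full-path probe first, component walk as fallback.
dcache = {}  # full path -> inode number (stand-in for cached dentries)

def forward_lookup(path: str) -> int:
    """Component-by-component traversal (the conventional mechanism)."""
    inode, walked = 0, ""
    for part in path.strip("/").split("/"):
        walked += "/" + part
        inode = hash(walked) & 0xFFFF       # pretend disk/metadata lookup
        dcache[walked] = inode              # populate the cache as we go
    return inode

def backward_lookup(path: str) -> int:
    """Try the full path first: one hash probe on a cache hit."""
    if path in dcache:
        return dcache[path]
    return forward_lookup(path)

print(forward_lookup("/usr/share/doc"))
print(backward_lookup("/usr/share/doc"))    # now a single-probe hit
```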

17.
Background: In the field of bioinformatics, interchangeable data formats based on XML are widely used. XML-type data is also at the core of most web services. With the increasing amount of data stored in XML comes the need for storing and accessing the data. In this paper we analyse the suitability of different database systems for storing and querying large datasets in general and Medline in particular. Results: All reviewed database systems perform well when tested with small to medium sized datasets; however, when the full Medline dataset is queried, a large variation in query times is observed. Conclusions: There is not one system that is vastly superior to the others in this comparison and, depending on the database size and the query requirements, different systems are most suitable. The best all-round solution is the Oracle 11g database system using the new binary storage option. Alias-i's Lingpipe is a more lightweight, customizable and sufficiently fast solution; it does, however, require more initial configuration steps. For data with a changing XML structure, Sedna and BaseX as native XML database systems, or MySQL with an XML-type column, are suitable.
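The kind of query benchmarked here, retrieving fields from Medline-style citation XML, can be sketched with Python's standard library. The record below follows the general shape of Medline citations but is abbreviated and invented.

```python
# Query Medline-style XML for PMIDs and article titles.
import xml.etree.ElementTree as ET

MEDLINE = """<MedlineCitationSet>
  <MedlineCitation>
    <PMID>100001</PMID>
    <Article><ArticleTitle>XML storage benchmarks</ArticleTitle></Article>
  </MedlineCitation>
  <MedlineCitation>
    <PMID>100002</PMID>
    <Article><ArticleTitle>Native XML databases</ArticleTitle></Article>
  </MedlineCitation>
</MedlineCitationSet>"""

root = ET.fromstring(MEDLINE)
# ElementTree supports only a limited XPath subset; the database systems
# compared in the paper support far richer query languages.
for citation in root.findall(".//MedlineCitation"):
    print(citation.findtext("PMID"), "-",
          citation.findtext("Article/ArticleTitle"))
```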

18.
Our team developed a metadata editing and management system employing state-of-the-art XML technologies, initially aimed at the environmental sciences but with the potential to be useful across multiple domains. We chose a modular and distributed design for scalability, flexibility, options for customization, and the possibility of adding more functionality at a later stage. The system consists of a desktop design tool that generates code for the actual online editor, a native XML database, and an online user access management application. The design tool, a Java Swing application that reads an XML schema, provides the designer with options to combine input fields into online forms with user-friendly tags and to determine the flow of input forms. Based on design decisions, the tool generates XForms code for the online metadata editor, which is based on the Orbeon XForms engine. The design tool fulfills two requirements: first, data entry forms based on a schema are customized at design time; second, the tool can generate data entry applications for any valid XML schema without relying on custom information in the schema. A configuration file in the design tool saves custom information generated at design time. Future developments will add functionality to the design tool to integrate help text, tool tips, project-specific keyword lists, and thesaurus services. Cascading style sheets customize the look-and-feel of the finished editor. The editor produces XML files in compliance with the original schema; however, a user may save the input into a native XML database at any time, independent of validity. The system uses the open-source XML database eXist for storage and uses a MySQL relational database and a simple JavaServer Faces user interface for file and access management. We chose three levels to distribute administrative responsibilities and handle the common situation of an information manager entering the bulk of the metadata while leaving specifics to the actual data provider.
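For a rough idea of how a client can store a document in eXist, the sketch below sends an HTTP PUT to a collection path, which eXist's REST interface accepts. The host, collection path, and document are placeholders, and server configuration (authentication, permissions) is not shown; check the eXist documentation for your installation.

```python
# Store an XML document in an eXist collection over its REST interface
# (assumed endpoint layout; placeholders throughout).
import urllib.request

XML = b"<metadata><title>Example dataset</title></metadata>"

url = "http://localhost:8080/exist/rest/db/metadata/example.xml"  # placeholder
req = urllib.request.Request(url, data=XML, method="PUT")
req.add_header("Content-Type", "application/xml")

with urllib.request.urlopen(req) as response:  # assumes the server permits it
    print(response.status)
```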

19.
20.
The Annual Spring workshop of the HUPO-PSI was this year held at the EMBL International Centre for Advanced Training (EICAT) in Heidelberg, Germany. Delegates briefly reviewed the successes of the group to date. These include the widespread implementation of the molecular interaction data exchange formats PSI-MI XML2.5 and MITAB, and also of mzML, the standard format for mass spectrometer output data. These successes have resulted in enhanced accessibility to published data, for example the development of the PSICQUIC common query interface for interaction data and of databases such as PRIDE to act as public repositories for proteomics data, and in increased biosharing through the development of consortia, for example IMEx and ProteomeXchange, which will both share the burden of curating the increasing amounts of data being published and work together to make these data more accessible to the bench scientist. Work then started over the three days of the workshop, with a focus on advancing the draft format for handling quantitative mass spectrometry data (mzQuantML) and further developing TraML, a standardized format for the exchange and transmission of transition lists for SRM experiments.
