首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 234 毫秒
1.
We here present the jmzReader library: a collection of Java application programming interfaces (APIs) to parse the most commonly used peak list and XML-based mass spectrometry (MS) data formats: DTA, MS2, MGF, PKL, mzXML, mzData, and mzML (based on the already existing API jmzML). The library is optimized to be used in conjunction with mzIdentML, the recently released standard data format for reporting protein and peptide identifications, developed by the HUPO proteomics standards initiative (PSI). mzIdentML files do not contain spectra data but contain references to different kinds of external MS data files. As a key functionality, all parsers implement a common interface that supports the various methods used by mzIdentML to reference external spectra. Thus, when developing software for mzIdentML, programmers no longer have to support multiple MS data file formats but only this one interface. The library (which includes a viewer) is open source and, together with detailed documentation, can be downloaded from http://code.google.com/p/jmzreader/.  相似文献   

2.
We here present jmzML, a Java API for the Proteomics Standards Initiative mzML data standard. Based on the Java Architecture for XML Binding and XPath‐based XML indexer random‐access XML parser, jmzML can handle arbitrarily large files in minimal memory, allowing easy and efficient processing of mzML files using the Java programming language. jmzML also automatically resolves internal XML references on‐the‐fly. The library (which includes a viewer) can be downloaded from http://jmzml.googlecode.com .  相似文献   

3.
The mzQuantML standard from the HUPO Proteomics Standards Initiative has recently been released, capturing quantitative data about peptides and proteins, following analysis of MS data. We present a Java application programming interface (API) for mzQuantML called jmzQuantML. The API provides robust bridges between Java classes and elements in mzQuantML files and allows random access to any part of the file. The API provides read and write capabilities, and is designed to be embedded in other software packages, enabling mzQuantML support to be added to proteomics software tools ( http://code.google.com/p/jmzquantml/ ). The mzQuantML standard is designed around a multilevel validation system to ensure that files are structurally and semantically correct for different proteomics quantitative techniques. In this article, we also describe a Java software tool ( http://code.google.com/p/mzquantml‐validator/ ) for validating mzQuantML files, which is a formal part of the data standard.  相似文献   

4.
5.
Many data manipulation processes involve the use of programming libraries. These processes may beneficially be automated due to their repeated use. A convenient type of automation is in the form of workflows that also allow such processes to be shared amongst the community. The Taverna workflow system has been extended to enable it to use and invoke Java classes and methods as tasks within Taverna workflows. These classes and methods are selected for use during workflow construction by a Java Doclet application called the API Consumer. This selection is stored as an XML file which enables Taverna to present the subset of the API for use in the composition of workflows. The ability of Taverna to invoke Java classes and methods is demonstrated by a workflow in which we use libSBML to map gene expression data onto a metabolic pathway represented as a SBML model. AVAILABILITY: Taverna and the API Consumer application can be freely downloaded from http://taverna.sourceforge.net  相似文献   

6.
mzTab is the most recent standard format developed by the Proteomics Standards Initiative. mzTab is a flexible tab‐delimited file that can capture identification and quantification results coming from MS‐based proteomics and metabolomics approaches. We here present an open‐source Java application programming interface for mzTab called jmzTab. The software allows the efficient processing of mzTab files, providing read and write capabilities, and is designed to be embedded in other software packages. The second key feature of the jmzTab model is that it provides a flexible framework to maintain the logical integrity between the metadata and the table‐based sections in the mzTab files. In this article, as two example implementations, we also describe two stand‐alone tools that can be used to validate mzTab files and to convert PRIDE XML files to mzTab. The library is freely available at http://mztab.googlecode.com .  相似文献   

7.
Protein identification using MS is an important technique in proteomics as well as a major generator of proteomics data. We have designed the protein identification data object model (PDOM) and developed a parser based on this model to facilitate the analysis and storage of these data. The parser works with HTML or XML files saved or exported from MASCOT MS/MS ions search in peptide summary report or MASCOT PMF search in protein summary report. The program creates PDOM objects, eliminates redundancy in the input file, and has the capability to output any PDOM object to a relational database. This program facilitates additional analysis of MASCOT search results and aids the storage of protein identification information. The implementation is extensible and can serve as a template to develop parsers for other search engines. The parser can be used as a stand-alone application or can be driven by other Java programs. It is currently being used as the front end for a system that loads HTML and XML result files of MASCOT searches into a relational database. The source code is freely available at http://www.ccbm.jhu.edu and the program uses only free and open-source Java libraries.  相似文献   

8.
MOTIVATION: The National Cancer Institute's Center for Bioinformatics (NCICB) has developed a Java based data management and information system called caCORE. One component of this software suite is the object oriented API (caBIO) used to access the rich biological datasets collected at the NCI. This API can access the data using native Java classes, SOAP requests or HTTP calls. Non-Java based clients wanting to use this API have to use the SOAP or HTTP interfaces with the data being returned from the NCI servers as an XML data stream. Although the XML can be read and manipulated using DOM or SAX parsers, one loses the convenience and usability of an object oriented programming paradigm. caBIONet is a set of .NET wrapper classes (managers, genes, chromosomes, sequences, etc.) capable of serializing the XML data stream into local .NET objects. The software is able to search NCICB databases and provide local objects representing the data that can be manipulated and used by other .NET programs. The software was written in C# and compiled as a .NET DLL.  相似文献   

9.
Our team developed a metadata editing and management system employing state of the art XML technologies initially aimed at the environmental sciences but with the potential to be useful across multiple domains. We chose a modular and distributed design for scalability, flexibility, options for customizations, and the possibility to add more functionality at a later stage. The system consists of a desktop design tool that generates code for the actual online editor, a native XML database, and an online user access management application. A Java Swing application that reads an XML schema, the design tool provides the designer with options to combine input fields into online forms with user-friendly tags and determine the flow of input forms. Based on design decisions, the tool generates XForm code for the online metadata editor which is based on the Orbeon XForms engine. The design tool fulfills two requirements: First data entry forms based on a schema are customized at design time and second the tool can generate data entry applications for any valid XML schema without relying on custom information in the schema. A configuration file in the design tool saves custom information generated at design time. Future developments will add functionality to the design tool to integrate help text, tool tips, project specific keyword lists, and thesaurus services.Cascading style sheets customize the look-and-feel of the finished editor. The editor produces XML files in compliance with the original schema, however, a user may save the input into a native XML database at any time independent of validity. The system uses the open source XML database eXist for storage and uses a MySQL relational database and a simple Java Server Faces user interface for file and access management. We chose three levels to distribute administrative responsibilities and handle the common situation of an information manager entering the bulk of the metadata but leave specifics to the actual data provider.  相似文献   

10.
11.
Pedro is a Java application that dynamically generates data entry forms for data models expressed in XML Schema, producing XML data files that validate against this schema. The software uses an intuitive tree-based navigation system, can supply context-sensitive help to users and features a sophisticated interface for populating data fields with terms from controlled vocabularies. The software also has the ability to import records from tab delimited text files and features various validation routines. AVAILABILITY: The application, source code, example models from several domains and tutorials can be downloaded from http://pedro.man.ac.uk/.  相似文献   

12.
The Human Proteome Organization's Proteomics Standards Initiative (PSI) promotes the development of exchange standards to improve data integration and interoperability. PSI specifies the suitable level of detail required when reporting a proteomics experiment (via the Minimum Information About a Proteomics Experiment), and provides extensible markup language (XML) exchange formats and dedicated controlled vocabularies (CVs) that must be combined to generate a standard compliant document. The framework presented here tackles the issue of checking that experimental data reported using a specific format, CVs and public bio‐ontologies (e.g. Gene Ontology, NCBI taxonomy) are compliant with the Minimum Information About a Proteomics Experiment recommendations. The semantic validator not only checks the XML syntax but it also enforces rules regarding the use of an ontology class or CV terms by checking that the terms exist in the resource and that they are used in the correct location of a document. Moreover, this framework is extremely fast, even on sizable data files, and flexible, as it can be adapted to any standard by customizing the parameters it requires: an XML Schema Definition, one or more CVs or ontologies, and a mapping file describing in a formal way how the semantic resources and the format are interrelated. As such, the validator provides a general solution to the common problem in data exchange: how to validate the correct usage of a data standard beyond simple XML Schema Definition validation. The framework source code and its various applications can be found at http://psidev.info/validator .  相似文献   

13.
Storing biological sequence databases in relational form   总被引:2,自引:0,他引:2  
SUMMARY: We have created a set of applications using Perl and Java in combination with XML technology to install biological sequence databases into an Oracle RDBMS. An easy-to-use interface using Java has been created for database query and other tools developed to integrate with our in-house bioinformatics applications. AVAILIBILITY: The database schema, DTD file, and source codes are available from the authors via email. CONTACT: guochun_ xie@merck. com  相似文献   

14.
  1. Download : Download high-res image (48KB)
  2. Download : Download full-size image
Highlights
  • •Incrementally build mzML 1.1, and mzIdentML 1.2 files in Python over a file stream.
  • •Traverse controlled vocabularies using common mapping patterns.
  • •Generate byte offset index as the document streams.
  • •Manage referential integrity on-the-fly.
  相似文献   

15.
The application of mass spectrometry imaging (MS imaging) is rapidly growing with a constantly increasing number of different instrumental systems and software tools. The data format imzML was developed to allow the flexible and efficient exchange of MS imaging data between different instruments and data analysis software. imzML data is divided in two files which are linked by a universally unique identifier (UUID). Experimental details are stored in an XML file which is based on the HUPO-PSI format mzML. Information is provided in the form of a 'controlled vocabulary' (CV) in order to unequivocally describe the parameters and to avoid redundancy in nomenclature. Mass spectral data are stored in a binary file in order to allow efficient storage. imzML is supported by a growing number of software tools. Users will be no longer limited to proprietary software, but are able to use the processing software best suited for a specific question or application. MS imaging data from different instruments can be converted to imzML and displayed with identical parameters in one software package for easier comparison. All technical details necessary to implement imzML and additional background information is available at www.imzml.org.  相似文献   

16.
SUMMARY: Linkage analysis software requires an input text file that describes the structure of the pedigrees to be analysed. Manual creation of these files is tedious and error-prone, and a graphical input tool is desirable. This is currently only available in commercial packages that include much greater functionality. We have therefore developed Pelican, a lightweight graphical pedigree editor for rapid construction of linkage pedigree files and diagrams. AVAILABILITY: The software runs on any Java-enabled machine (version 1.2 or higher). A Java Web Start launch, class files, a demonstration applet, source code and documentation are freely available at http://www.rfcgr.mrc.ac.uk/Software/PELICAN/  相似文献   

17.
Identification of proteins by MS plays an important role in proteomics. A crucial step concerns the identification of peptides from MS/MS spectra. The X!Tandem Project ( http://www.thegpm.org/tandem ) supplies an open‐source search engine for this purpose. In this study, we present an open‐source Java library called XTandem Parser that parses X!Tandem XML result files into an easily accessible and fully functional object model ( http://xtandem‐parser.googlecode.com ). In addition, a graphical user interface is provided that functions as a usage example and an end‐user visualization tool.  相似文献   

18.

Motivation

In mass spectrometry-based proteomics, XML formats such as mzML and mzXML provide an open and standardized way to store and exchange the raw data (spectra and chromatograms) of mass spectrometric experiments. These file formats are being used by a multitude of open-source and cross-platform tools which allow the proteomics community to access algorithms in a vendor-independent fashion and perform transparent and reproducible data analysis. Recent improvements in mass spectrometry instrumentation have increased the data size produced in a single LC-MS/MS measurement and put substantial strain on open-source tools, particularly those that are not equipped to deal with XML data files that reach dozens of gigabytes in size.

Results

Here we present a fast and versatile parsing library for mass spectrometric XML formats available in C++ and Python, based on the mature OpenMS software framework. Our library implements an API for obtaining spectra and chromatograms under memory constraints using random access or sequential access functions, allowing users to process datasets that are much larger than system memory. For fast access to the raw data structures, small XML files can also be completely loaded into memory. In addition, we have improved the parsing speed of the core mzML module by over 4-fold (compared to OpenMS 1.11), making our library suitable for a wide variety of algorithms that need fast access to dozens of gigabytes of raw mass spectrometric data.

Availability

Our C++ and Python implementations are available for the Linux, Mac, and Windows operating systems. All proposed modifications to the OpenMS code have been merged into the OpenMS mainline codebase and are available to the community at https://github.com/OpenMS/OpenMS.  相似文献   

19.
Liquid chromatography coupled tandem mass spectrometry (LC‐MS/MS) is an important technique for detecting peptides in proteomics studies. Here, we present an open source software tool, termed IPeak, a peptide identification pipeline that is designed to combine the Percolator post‐processing algorithm and multi‐search strategy to enhance the sensitivity of peptide identifications without compromising accuracy. IPeak provides a graphical user interface (GUI) as well as a command‐line interface, which is implemented in JAVA and can work on all three major operating system platforms: Windows, Linux/Unix and OS X. IPeak has been designed to work with the mzIdentML standard from the Proteomics Standards Initiative (PSI) as an input and output, and also been fully integrated into the associated mzidLibrary project, providing access to the overall pipeline, as well as modules for calling Percolator on individual search engine result files. The integration thus enables IPeak (and Percolator) to be used in conjunction with any software packages implementing the mzIdentML data standard. IPeak is freely available and can be downloaded under an Apache 2.0 license at https://code.google.com/p/mzidentml‐lib/ .  相似文献   

20.
The Open Biomedical Ontologies (OBO) format from the GO consortium is a very successful format for biomedical ontologies, including the Gene Ontology. But it lacks formal computational definitions for its constructs and tools, like DL reasoners, to facilitate ontology development/maintenance. We describe the OBO Converter, a Java tool to convert files from OBO format to Web Ontology Language (OWL) (and vice versa) that can also be used as a Protégé Tab plug-in. It uses the OBO to OWL mapping provided by the National Center for Biomedical Ontologies (NCBO) (a joint effort of OBO developers and OWL experts) and offers options to ease the task of saving/reading files in both formats. AVAILABILITY: bioontology.org/tools/oboinowl/obo_converter.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号