Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
Mass spectrometry-based proteomics is increasingly being used in biomedical research. These experiments typically generate a large volume of highly complex data, and the volume and complexity are only increasing with time. There exist many software pipelines for analyzing these data (each typically with its own file formats), and as technology improves, these file formats change and new formats are developed. Files produced from these myriad software programs may accumulate on hard disks or tape drives over time, with older files being rendered progressively more obsolete and unusable with each successive technical advancement and data format change. Although initiatives exist to standardize the file formats used in proteomics, they do not address the core failings of a file-based data management system: (1) files are typically poorly annotated experimentally, (2) files are "organically" distributed across laboratory file systems in an ad hoc manner, (3) file formats become obsolete, and (4) searching the data and comparing and contrasting results across separate experiments is very inefficient (if possible at all). Here we present a relational database architecture and accompanying web application dubbed Mass Spectrometry Data Platform that is designed to address the failings of the file-based mass spectrometry data management approach. The database is designed such that the output of disparate software pipelines may be imported into a core set of unified tables, with these core tables being extended to support data generated by specific pipelines. Because the data are unified, they may be queried, viewed, and compared across multiple experiments using a common web interface. Mass Spectrometry Data Platform is open source and freely available at http://code.google.com/p/msdapl/.
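The "core tables plus pipeline-specific extensions" design described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the table names, column names, and the `pipeline_a`/`pipeline_b` labels are hypothetical and are not the actual Mass Spectrometry Data Platform schema.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical core/extension layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE experiment (
            id INTEGER PRIMARY KEY,
            annotation TEXT NOT NULL
        );
        -- Core table: fields that every pipeline is assumed to report.
        CREATE TABLE search_result (
            id INTEGER PRIMARY KEY,
            experiment_id INTEGER NOT NULL REFERENCES experiment(id),
            pipeline TEXT NOT NULL,
            peptide TEXT NOT NULL,
            charge INTEGER NOT NULL
        );
        -- Extension table: a score that only one (hypothetical) pipeline emits.
        CREATE TABLE pipeline_a_result (
            result_id INTEGER PRIMARY KEY REFERENCES search_result(id),
            xcorr REAL NOT NULL
        );
        """
    )
    conn.executemany(
        "INSERT INTO experiment VALUES (?, ?)",
        [(1, "control"), (2, "treated")],
    )
    conn.executemany(
        "INSERT INTO search_result VALUES (?, ?, ?, ?, ?)",
        [
            (1, 1, "pipeline_a", "PEPTIDEK", 2),
            (2, 2, "pipeline_b", "PEPTIDEK", 2),
            (3, 2, "pipeline_b", "SAMPLER", 3),
        ],
    )
    conn.execute("INSERT INTO pipeline_a_result VALUES (1, 3.2)")
    return conn


def peptides_in_all_experiments(conn: sqlite3.Connection) -> list:
    """Return peptides seen in every experiment, regardless of pipeline."""
    rows = conn.execute(
        """
        SELECT peptide
        FROM search_result
        GROUP BY peptide
        HAVING COUNT(DISTINCT experiment_id) = (SELECT COUNT(*) FROM experiment)
        ORDER BY peptide
        """
    ).fetchall()
    return [row[0] for row in rows]
```

Because both pipelines write into the same core table, the cross-experiment query never needs to know which software produced a row; pipeline-specific scores stay available through a join on the extension table.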

2.
MOTIVATION: Effective use of proteomics data, specifically mass spectrometry data, relies on the ability to read and write the many mass spectrometer file formats. Even with mass spectrometer vendor-specific libraries and vendor-neutral file formats, such as mzXML and mzData, it can be difficult to extract raw data files in a form suitable for batch processing and basic research. Introduced here is the ProteomeCommons.org Input and Output Framework, abbreviated to IO Framework, which is designed to abstractly represent mass spectrometry data. This project is a public, open-source, free-to-use framework that supports most of the mass spectrometry data formats, including current formats, legacy formats and proprietary formats that require a vendor-specific library in order to operate. The IO Framework includes an on-line tool for non-programmers and a set of libraries that developers may use to convert between various proteomics file formats. AVAILABILITY: The current source code and documentation for the ProteomeCommons.org IO Framework are freely available at http://www.proteomecommons.org/current/531/

3.
The application of mass spectrometry imaging (MS imaging) is rapidly growing with a constantly increasing number of different instrumental systems and software tools. The data format imzML was developed to allow the flexible and efficient exchange of MS imaging data between different instruments and data analysis software. imzML data are divided into two files which are linked by a universally unique identifier (UUID). Experimental details are stored in an XML file which is based on the HUPO-PSI format mzML. Information is provided in the form of a 'controlled vocabulary' (CV) in order to unequivocally describe the parameters and to avoid redundancy in nomenclature. Mass spectral data are stored in a binary file in order to allow efficient storage. imzML is supported by a growing number of software tools. Users will no longer be limited to proprietary software, but are able to use the processing software best suited for a specific question or application. MS imaging data from different instruments can be converted to imzML and displayed with identical parameters in one software package for easier comparison. All technical details necessary to implement imzML and additional background information are available at www.imzml.org.
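How the UUID link between the XML file and the binary file might be checked can be sketched as follows. The abstract does not say where the UUID is stored, so two details here are assumptions to be verified against the specification at www.imzml.org: that the XML carries the UUID in a `cvParam` with accession `IMS:1000080`, and that the binary file begins with the same UUID as 16 raw bytes. The function names are hypothetical.

```python
import uuid
import xml.etree.ElementTree as ET

# Assumed controlled-vocabulary accession for the UUID entry (check the spec).
UUID_ACCESSION = "IMS:1000080"


def uuid_from_xml(xml_text: str) -> uuid.UUID:
    """Find the UUID cvParam anywhere in the XML metadata."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Tags may be namespace-qualified, e.g. "{namespace}cvParam".
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name == "cvParam" and elem.get("accession") == UUID_ACCESSION:
            # uuid.UUID accepts values with or without surrounding braces.
            return uuid.UUID(elem.get("value"))
    raise ValueError("no UUID cvParam found in XML metadata")


def uuid_from_binary(data: bytes) -> uuid.UUID:
    """Read the UUID assumed to occupy the first 16 bytes of the binary file."""
    if len(data) < 16:
        raise ValueError("binary data shorter than a 16-byte UUID")
    return uuid.UUID(bytes=data[:16])


def files_match(xml_text: str, binary_data: bytes) -> bool:
    """True when both halves of the dataset carry the same UUID."""
    return uuid_from_xml(xml_text) == uuid_from_binary(binary_data)
```

A reader would typically run such a check before trusting any byte offsets from the XML, since offsets into the wrong binary file would silently yield wrong spectra.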

4.
Here we present the jmzReader library: a collection of Java application programming interfaces (APIs) to parse the most commonly used peak list and XML-based mass spectrometry (MS) data formats: DTA, MS2, MGF, PKL, mzXML, mzData, and mzML (based on the already existing API jmzML). The library is optimized to be used in conjunction with mzIdentML, the recently released standard data format for reporting protein and peptide identifications, developed by the HUPO proteomics standards initiative (PSI). mzIdentML files do not contain spectral data but contain references to different kinds of external MS data files. As a key functionality, all parsers implement a common interface that supports the various methods used by mzIdentML to reference external spectra. Thus, when developing software for mzIdentML, programmers no longer have to support multiple MS data file formats but only this one interface. The library (which includes a viewer) is open source and, together with detailed documentation, can be downloaded from http://code.google.com/p/jmzreader/.

5.
We describe Census, a quantitative software tool compatible with many labeling strategies as well as with label-free analyses, single-stage mass spectrometry (MS1) and tandem mass spectrometry (MS/MS) scans, and high- and low-resolution mass spectrometry data. Census uses robust algorithms to address poor-quality measurements and improve quantitative efficiency, and it can support several input file formats. We tested Census with stable-isotope labeling analyses as well as label-free analyses.

6.

Motivation

In mass spectrometry-based proteomics, XML formats such as mzML and mzXML provide an open and standardized way to store and exchange the raw data (spectra and chromatograms) of mass spectrometric experiments. These file formats are being used by a multitude of open-source and cross-platform tools which allow the proteomics community to access algorithms in a vendor-independent fashion and perform transparent and reproducible data analysis. Recent improvements in mass spectrometry instrumentation have increased the data size produced in a single LC-MS/MS measurement and put substantial strain on open-source tools, particularly those that are not equipped to deal with XML data files that reach dozens of gigabytes in size.

Results

Here we present a fast and versatile parsing library for mass spectrometric XML formats available in C++ and Python, based on the mature OpenMS software framework. Our library implements an API for obtaining spectra and chromatograms under memory constraints using random access or sequential access functions, allowing users to process datasets that are much larger than system memory. For fast access to the raw data structures, small XML files can also be completely loaded into memory. In addition, we have improved the parsing speed of the core mzML module by over 4-fold (compared to OpenMS 1.11), making our library suitable for a wide variety of algorithms that need fast access to dozens of gigabytes of raw mass spectrometric data.

Availability

Our C++ and Python implementations are available for the Linux, Mac, and Windows operating systems. All proposed modifications to the OpenMS code have been merged into the OpenMS mainline codebase and are available to the community at https://github.com/OpenMS/OpenMS.
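The idea of sequential access under memory constraints, as described in the Results above, can be illustrated with a generic streaming parser. This sketch deliberately uses only the Python standard library and is not the OpenMS API; the function name is hypothetical, and the only assumption about the input is that spectra appear as `spectrum` elements with an `id` attribute, as in mzML.

```python
import io
import xml.etree.ElementTree as ET


def iter_spectrum_ids(source):
    """Yield the id of each <spectrum> element while streaming through the file.

    `source` may be a file path or a binary file-like object.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Tags may be namespace-qualified, e.g. "{namespace}spectrum".
        if elem.tag.rsplit("}", 1)[-1] == "spectrum":
            yield elem.get("id")
            # Discard the element's children and attributes once it has been
            # consumed, so peak data from earlier spectra is not retained.
            elem.clear()


if __name__ == "__main__":
    demo = (
        b"<mzML><run><spectrumList>"
        b'<spectrum id="scan=1"><binaryDataArrayList/></spectrum>'
        b'<spectrum id="scan=2"/>'
        b"</spectrumList></run></mzML>"
    )
    for spectrum_id in iter_spectrum_ids(io.BytesIO(demo)):
        print(spectrum_id)
```

Streaming trades random access for a small memory footprint; a library offering both, as the abstract describes, would typically add an index of byte offsets so individual spectra can be fetched without a full scan.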

7.
8.
SUMMARY: A WWW server is described for creating 3D models of canonical or bent DNA starting from sequence data. Predicted DNA trajectory is first computed based on a choice of di- and tri-nucleotide models (M.G. Munteanu et al., Trends Biochem. Sci. 23, 341-347, 1998); an atomic model is then constructed and optionally energy-minimized with constrained molecular dynamics. The data are presented as a standard PDB file, directly viewable on the user's PC using any molecule manipulation program. AVAILABILITY: The model.it server is freely available at http://www.icgeb.trieste.it/dna/ CONTACT: kristian@icgeb.trieste.it; pongor@icgeb.trieste.it SUPPLEMENTARY INFORMATION: a series of help files is available at the above address.

9.
SUMMARY: Accurate and complete mapping of short-read sequencing to a reference genome greatly enhances the discovery of biological results and improves statistical predictions. We recently presented RNA-MATE, a pipeline for the recursive mapping of RNA-Seq datasets. With the rapid increase in genome re-sequencing projects, progression of available mapping software and the evolution of file formats, we now present X-MATE, an updated version of RNA-MATE, capable of mapping both RNA-Seq and DNA datasets and with improved performance, output file formats, configuration files, and flexibility in core mapping software. AVAILABILITY: Executables, source code, junction libraries, test data and results and the user manual are available from http://grimmond.imb.uq.edu.au/X-MATE/.

10.
MOSIX is a cluster management system that supports preemptive process migration. This paper presents the MOSIX Direct File System Access (DFSA), a provision that can improve the performance of cluster file systems by allowing a migrated process to directly access files in its current location. This capability, when combined with an appropriate file system, could substantially increase the I/O performance and reduce the network congestion by migrating an I/O intensive process to a file server rather than the traditional way of bringing the file's data to the process. DFSA is suitable for clusters that manage a pool of shared disks among multiple machines. With DFSA, it is possible to migrate parallel processes from a client node to file servers for parallel access to different files. Any consistent file system can be adjusted to work with DFSA. To test its performance, we developed the MOSIX File-System (MFS) which allows consistent parallel operations on different files. The paper describes DFSA and presents the performance of MFS with and without DFSA.

11.
Strehlow D. BioTechniques 2000, 29(1): 118-121
Software is described that facilitates the analysis of phosphoimages from large array hybridizations. The Macintosh PowerPC-compatible application and its manual are available at no charge from http://people.bu.edu/strehlow. The software is compatible with both custom formats and array filters from three commercial manufacturers. It allows the rapid quantitation of every spot on images of hybridizations to large arrays. The user drags grids of squares over the spots on the image to define the coordinates of each spot, then aligns and edits the position of the grid. The software then corrects the positions as necessary and quantitates up to 27,000 spots per image. It stores the numerical values for each signal in a format called the fingerprint file. Fingerprint files can be directly averaged or compared, allowing the user to find mean values or differences in data from independent hybridization experiments. Data can be recalled from the fingerprint file and can be output in a variety of spreadsheet formats with several options for background correction. Finally, the software offers an output format that allows the convenient visualization of data points using animated, three-dimensional graphs.

12.
Puah WC, Cheok LP, Biro M, Ng WT, Wasser M. BioTechniques 2011, 51(1): 49-50, 52-53
Automated microscopy enables in vivo studies in developmental biology over long periods of time. Time-lapse recordings in three or more dimensions to study the dynamics of developmental processes can produce huge data sets that extend into the terabyte range. However, depending on the available computational resources and software design, downstream processing of very large image data sets can become highly inefficient, if not impossible. To address the lack of available open source and commercial software tools to efficiently reorganize time-lapse data on a desktop computer with limited system resources, we developed TLM-Converter. The software either fragments oversized files or concatenates multiple files representing single time frames and saves the output files in open standard formats. Our application is undemanding on system resources as it does not require the whole data set to be loaded into the system memory. We tested our tool on time-lapse data sets of live Drosophila specimens recorded by laser scanning confocal microscopy. Image data reorganization dramatically enhances the productivity of time-lapse data processing and allows the use of downstream image analysis software that is unable to handle large data sets of ≥2 GB. In addition, saving the outputs in open standard image file formats enables data sharing between independently developed software tools.
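Fragmenting an oversized file without loading it whole into memory can be sketched as below. This is a byte-level illustration only and is not how TLM-Converter is implemented: a real converter would split on image-frame boundaries and write valid image headers for each output file. The function name, the `.partNNNN` naming scheme, and the default buffer size are all hypothetical.

```python
from pathlib import Path


def split_file(src: Path, out_dir: Path, part_size: int,
               buffer_size: int = 1 << 20) -> list:
    """Split `src` into parts of at most `part_size` bytes.

    At most `buffer_size` bytes are held in memory at any time.
    """
    if part_size <= 0 or buffer_size <= 0:
        raise ValueError("part_size and buffer_size must be positive")
    parts = []
    with src.open("rb") as fin:
        index = 0
        while True:
            # Read the first chunk of the next part; stop at end of input
            # so that no empty trailing part is created.
            chunk = fin.read(min(buffer_size, part_size))
            if not chunk:
                break
            part = out_dir / f"{src.name}.part{index:04d}"
            with part.open("wb") as fout:
                written = 0
                while chunk:
                    fout.write(chunk)
                    written += len(chunk)
                    if written >= part_size:
                        break
                    # Never read past the end of the current part.
                    chunk = fin.read(min(buffer_size, part_size - written))
            parts.append(part)
            index += 1
    return parts
```

Concatenation is the mirror image: open one output file and copy each input into it through the same bounded buffer, so neither direction needs memory proportional to the data set.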

13.

Background

Proteomics continues to play a critical role in post-genomic science as continued advances in mass spectrometry and analytical chemistry support the separation and identification of increasing numbers of peptides and proteins from their characteristic mass spectra. In order to facilitate the sharing of this data, various standard formats have been, and continue to be, developed. Still not fully mature however, these are not yet able to cope with the increasing number of quantitative proteomic technologies that are being developed.

Results

We propose an extension to the PRIDE and mzData XML schema to accommodate the concept of multiple samples per experiment, and in addition, capture the intensities of the iTRAQ™ reporter ions in the entry. A simple Java-client has been developed to capture and convert the raw data from common spectral file formats, which also uses a third-party open source tool for the generation of iTRAQ™ reported intensities from Mascot output, into a valid PRIDE XML entry.

Conclusion

We describe an extension to the PRIDE and mzData schemas to enable the capture of quantitative data. Currently this is limited to iTRAQ™ data but is readily extensible for other quantitative proteomic technologies. Furthermore, a software tool has been developed which enables conversion from various mass spectrum file formats and corresponding Mascot peptide identifications to PRIDE formatted XML. The tool represents a simple approach to preparing quantitative and qualitative data for submission to repositories such as PRIDE, which is necessary to facilitate data deposition and sharing in public domain databases. The software is freely available from http://www.mcisb.org/software/PrideWizard.

14.
A depository of low-molecular-weight compounds or metabolites detected in various organisms in a non-targeted manner is indispensable for metabolomics research. Owing to the chemical diversity of these compounds, various mass spectrometry (MS) setups with state-of-the-art technologies have been used. Over the past two decades, we have analyzed various biological samples by using gas chromatography-mass spectrometry, liquid chromatography-mass spectrometry, or capillary electrophoresis-mass spectrometry, and archived the datasets in the depository MassBase (http://webs2.kazusa.or.jp/massbase/). As the format of MS datasets depends on the MS setup used, we converted each raw binary dataset of the mass chromatogram to text file format, and chromatographic peak information was then extracted from the converted file into a text file. In total, the depository comprises 46,493 datasets, of which 38,750 are from plant species and 7,743 are from authentic or mixed chemicals and other sources (microorganisms, animals, and foods), as of August 1, 2020. All files in the depository can be downloaded in bulk from the website. Mass chromatograms of 90 plant species obtained by LC-Fourier transform ion cyclotron resonance MS or Orbitrap MS, which detect the ionized molecules with high accuracy allowing speculation of chemical compositions, were converted to text files by the software PowerGet, and the chemical annotation of each peak was added. The processed datasets were deposited in the annotation database KomicMarket2 (http://webs2.kazusa.or.jp/km2/). The archives provide fundamental resources for comparative metabolomics and functional genomics, which may result in deeper understanding of living organisms.

15.
With high sensitivity and reproducibility, selected reaction monitoring (SRM) has become increasingly popular in proteome research for targeted quantification of low-abundance proteins and post-translational modifications. SRM is also well accepted in other mass spectrometry-based research areas such as lipidomics and metabolomics, which necessitates the development of easy-to-use software for both post-acquisition SRM data analysis and quantification result validation. Here, we introduce a software tool, SRMBuilder, which can automatically parse SRM data in multiple file formats, assign transitions to compounds, match light/heavy transition/compound pairs and provide a user-friendly graphical interface to manually validate the quantification result at the transition/compound/sample level. SRMBuilder will greatly facilitate processing of the post-acquisition data files and validation of quantification results for SRM. The software can be downloaded for free from http://www.proteomics.ac.cn/software/proteomicstools/index.htm as part of the software suite ProteomicsTools.

16.
17.
This is part two of an article that describes the properties of the image data files that are encountered routinely in digital light micrography. In the current part of the article, the differences between saving image data as large intact files and smaller files that have had some information removed, i.e., using lossy compression, are related first. Subsequently, appropriate ways of configuring computers to deal with the large intact image data files are suggested. The structures of the image data files used for recording dynamic sequences and kinematic animations of series of digital light micrographs, i.e., movie formats, are then described. Finally, some information is supplied about choosing file formats for compressing both static and dynamic image data sets.

18.
The growing use of mass spectrometry in the context of biomedical research has been accompanied by an increased demand for distribution of results in a format that facilitates rapid and efficient validation of claims by reviewers and other interested parties. However, the continued evolution of mass spectrometry hardware, sample preparation methods, and peptide identification algorithms complicates standardization and creates hurdles related to compliance with journal submission requirements. Moreover, the recently announced Philadelphia Guidelines (1, 2) suggest that authors provide native mass spectrometry data files in support of their peer-reviewed research articles. These trends highlight the need for data viewers and other tools that work independently of manufacturers' proprietary data systems and seamlessly connect proteomics results with original data files to support user-driven data validation and review. Based upon our recently described API-based framework for mass spectrometry data analysis (3, 4), we created an interactive viewer (mzResults) that is built on established database standards and enables efficient distribution and interrogation of results associated with proteomics experiments, while also providing a convenient mechanism for authors to comply with data submission standards as described in the Philadelphia Guidelines. In addition, the architecture of mzResults supports in-depth queries of the native mass spectrometry files through our multiplierz software environment. We use phosphoproteomics data to illustrate the features and capabilities of mzResults.

20.
NAExplor is a software tool for converting coordinate files between the software packages AMBER, CHARMM, and XPLOR. In addition, it manages the conversion of NMR-derived distance restraint information from the MARDIGRAS program into the appropriate file formats used for input in AMBER, CHARMM, and XPLOR. Analyses of H-H distances in nucleic acid structures and calculations of torsion angles for nucleic acid backbone and riboses are also possible.
