The mzQuantML standard from the HUPO Proteomics Standards Initiative has recently been released to capture quantitative data about peptides and proteins following the analysis of MS data. We present a Java application programming interface (API) for mzQuantML called jmzQuantML. The API provides robust bridges between Java classes and elements in mzQuantML files and allows random access to any part of the file. It provides read and write capabilities and is designed to be embedded in other software packages, enabling mzQuantML support to be added to proteomics software tools (http://code.google.com/p/jmzquantml/). The mzQuantML standard is designed around a multilevel validation system to ensure that files are structurally and semantically correct for different quantitative proteomics techniques. In this article, we also describe a Java software tool (http://code.google.com/p/mzquantml-validator/) for validating mzQuantML files, which is a formal part of the data standard.
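
As a rough illustration of what layered validation can look like, the Python sketch below separates a well-formedness check, a structural check and a semantic check. It is not the mzQuantML validator: the required-element list and the per-technique rule table (including the placeholder accessions `XX:0000001` and `XX:0000002`) are hypothetical, and a real validator would also use the XML schema and the PSI-MS controlled vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical structural rule: elements assumed to be mandatory in every file.
REQUIRED_ELEMENTS = ["AnalysisSummary", "InputFiles"]

# Hypothetical semantic rules: placeholder CV accessions that a file for each
# quantitation technique is assumed to declare (these are not real PSI-MS terms).
TECHNIQUE_RULES = {
    "label_free": {"XX:0000001"},
    "spectral_counting": {"XX:0000002"},
}


def local_name(tag):
    """Strip a '{namespace}' prefix so the checks work with or without namespaces."""
    return tag.rsplit("}", 1)[-1]


def validate(xml_text, technique):
    """Return a list of error messages; an empty list means every level passed."""
    # Level 1: the document must be well-formed XML.
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"level 1 (syntax): not well-formed XML: {exc}"]

    errors = []
    # Level 2: required elements must be present somewhere in the tree.
    present = {local_name(el.tag) for el in root.iter()}
    for name in REQUIRED_ELEMENTS:
        if name not in present:
            errors.append(f"level 2 (structure): missing <{name}>")

    # Level 3: technique-specific controlled-vocabulary terms must be declared.
    accessions = {
        el.get("accession") for el in root.iter() if local_name(el.tag) == "cvParam"
    }
    for acc in sorted(TECHNIQUE_RULES[technique] - accessions):
        errors.append(f"level 3 (semantics): no cvParam {acc} for {technique}")
    return errors
```

Keeping each level as a separate pass means a file that fails level 1 is reported immediately, while structural and semantic problems are collected together.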

We present a Java application programming interface (API), jmzIdentML, for the Human Proteome Organisation (HUPO) Proteomics Standards Initiative (PSI) mzIdentML standard for peptide and protein identification data. The API combines the power of the Java Architecture for XML Binding (JAXB) and an XPath-based random-access indexer to allow fast and efficient mapping of extensible markup language (XML) elements to Java objects. Internal references in mzIdentML files are resolved on demand: the whole file is accessed as a random-access swap file, and only the relevant piece of XML is selected for mapping to its corresponding Java object. The API is highly memory-efficient and can handle files of arbitrary size. It follows the official release of the mzIdentML (version 1.1) specification and is available in the public domain under a permissive licence at http://www.code.google.com/p/jmzidentml/.
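
The on-demand idea can be sketched in Python with a memory-mapped file: one regular-expression pass records the byte range of every element that has an `id`, and an element is parsed only when it is requested. This is a simplified stand-in for an XPath-based indexer, not jmzIdentML's implementation; it assumes unprefixed element names and that elements of the indexed type are not nested inside one another. The class name `LazyIndex` is hypothetical.

```python
import mmap
import re
import xml.etree.ElementTree as ET


class LazyIndex:
    """Byte-offset index of <tag id="..."> elements in a large XML file."""

    def __init__(self, path, tag):
        self._fh = open(path, "rb")
        # The OS pages the file in as needed; it is never read into memory at once.
        self._mm = mmap.mmap(self._fh.fileno(), 0, access=mmap.ACCESS_READ)
        tag_b = tag.encode()
        start_re = re.compile(rb"<" + re.escape(tag_b) + rb"\b[^>]*>")
        id_re = re.compile(rb'\bid="([^"]+)"')
        close_tag = b"</" + tag_b + b">"
        self.offsets = {}
        for m in start_re.finditer(self._mm):
            id_match = id_re.search(m.group(0))
            if id_match is None:
                continue
            if m.group(0).endswith(b"/>"):  # self-closing element
                end = m.end()
            else:  # assumes no nested element with the same tag
                end = self._mm.find(close_tag, m.end()) + len(close_tag)
            self.offsets[id_match.group(1).decode()] = (m.start(), end)

    def get(self, element_id):
        """Parse only the requested element's XML fragment."""
        start, end = self.offsets[element_id]
        return ET.fromstring(self._mm[start:end])

    def close(self):
        self._mm.close()
        self._fh.close()
```

With one index per element type, resolving a reference such as a `peptide_ref` attribute becomes a dictionary lookup plus a single small parse, regardless of file size.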

We here present the jmzReader library: a collection of Java application programming interfaces (APIs) to parse the most commonly used peak list and XML-based mass spectrometry (MS) data formats: DTA, MS2, MGF, PKL, mzXML, mzData, and mzML (based on the existing API jmzML). The library is optimized for use in conjunction with mzIdentML, the recently released standard data format for reporting protein and peptide identifications, developed by the HUPO Proteomics Standards Initiative (PSI). mzIdentML files do not contain spectral data but contain references to different kinds of external MS data files. As a key functionality, all parsers implement a common interface that supports the various methods mzIdentML uses to reference external spectra. Thus, when developing software for mzIdentML, programmers no longer have to support multiple MS data file formats, only this one interface. The library (which includes a viewer) is open source and, together with detailed documentation, can be downloaded from http://code.google.com/p/jmzreader/.
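
A minimal Python sketch of the common-interface idea follows: an abstract `SpectrumSource` exposes spectra by position and by identifier, and each format-specific parser implements it. Only a toy MGF reader is shown; the class and method names are hypothetical rather than jmzReader's API, and the `index=N` handling assumes mzIdentML-style spectrum references of that form.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class Spectrum:
    title: str
    precursor_mz: float
    peaks: list = field(default_factory=list)  # (m/z, intensity) pairs


class SpectrumSource(ABC):
    """Hypothetical common interface shared by all format-specific parsers."""

    @abstractmethod
    def by_index(self, index: int) -> Spectrum: ...

    @abstractmethod
    def by_id(self, spectrum_id: str) -> Spectrum: ...


class MgfSource(SpectrumSource):
    """Toy MGF parser: keeps TITLE, PEPMASS and peak lines; ignores other keys."""

    def __init__(self, text: str):
        self._spectra = []
        current = None
        for raw in text.splitlines():
            line = raw.strip()
            if line == "BEGIN IONS":
                current = Spectrum(title="", precursor_mz=0.0)
            elif line == "END IONS":
                self._spectra.append(current)
                current = None
            elif current is None or not line:
                continue
            elif "=" in line:
                key, value = line.split("=", 1)
                if key == "TITLE":
                    current.title = value
                elif key == "PEPMASS":
                    current.precursor_mz = float(value.split()[0])
            else:
                mz, intensity = line.split()[:2]
                current.peaks.append((float(mz), float(intensity)))
        self._by_title = {s.title: s for s in self._spectra}

    def by_index(self, index):
        return self._spectra[index]

    def by_id(self, spectrum_id):
        return self._by_title[spectrum_id]


def resolve(source: SpectrumSource, spectrum_id: str) -> Spectrum:
    """Dispatch an 'index=N' reference or a plain title to any source."""
    if spectrum_id.startswith("index="):
        return source.by_index(int(spectrum_id.split("=", 1)[1]))
    return source.by_id(spectrum_id)
```

Code that consumes identifications only calls `resolve`, so adding a new file format means writing one more `SpectrumSource` subclass.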

We here present jmzML, a Java API for the Proteomics Standards Initiative mzML data standard. Based on the Java Architecture for XML Binding (JAXB) and an XPath-based, random-access XML indexer, jmzML can handle arbitrarily large files in minimal memory, allowing easy and efficient processing of mzML files using the Java programming language. jmzML also automatically resolves internal XML references on the fly. The library (which includes a viewer) can be downloaded from http://jmzml.googlecode.com.
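
Reference resolution can be illustrated with mzML's `referenceableParamGroup` mechanism, in which a spectrum points to a shared group of parameters through `referenceableParamGroupRef`. The Python sketch below uses streaming `iterparse` rather than a random-access index. This is a different technique, chosen only for brevity. It expands group references into each spectrum's direct `cvParam` names and ignores nested elements. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def iter_spectra(source):
    """Yield (spectrum id, cvParam names) with param-group references expanded.

    Assumes the referenceableParamGroupList precedes the spectra, as in mzML.
    Only direct cvParam children of <spectrum> are collected.
    """
    groups = {}
    for _, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "referenceableParamGroup":
            groups[elem.get("id")] = [
                p.get("name") for p in elem if local_name(p.tag) == "cvParam"
            ]
        elif name == "spectrum":
            params = []
            for child in elem:
                child_name = local_name(child.tag)
                if child_name == "referenceableParamGroupRef":
                    params.extend(groups[child.get("ref")])  # resolved on the fly
                elif child_name == "cvParam":
                    params.append(child.get("name"))
            yield elem.get("id"), params
            elem.clear()  # release the spectrum's children once consumed
```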

mzTab is the most recent standard format developed by the Proteomics Standards Initiative. mzTab is a flexible, tab-delimited file format that can capture identification and quantification results from MS-based proteomics and metabolomics approaches. We here present an open-source Java application programming interface for mzTab called jmzTab. The software allows the efficient processing of mzTab files, providing read and write capabilities, and is designed to be embedded in other software packages. A second key feature of the jmzTab model is that it provides a flexible framework for maintaining the logical integrity between the metadata and the table-based sections of mzTab files. In this article, as example implementations, we also describe two stand-alone tools that can be used to validate mzTab files and to convert PRIDE XML files to mzTab. The library is freely available at http://mztab.googlecode.com.
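
One metadata-to-table integrity rule can be sketched as follows: every `ms_run[n]` referenced in a PSM row's `spectra_ref` column should be declared by an `ms_run[n]-location` line in the metadata (`MTD`) section. The Python sketch assumes mzTab 1.0-style line prefixes (`MTD`, `PSH`, `PSM`), metadata appearing before the tables, and `|`-separated references. It checks only this one rule and is not the jmzTab validator.

```python
import re

MS_RUN_REF = re.compile(r"ms_run\[(\d+)\]")
MS_RUN_DECL = re.compile(r"ms_run\[(\d+)\]-location")


def check_ms_run_refs(lines):
    """Report PSM rows whose spectra_ref points at an undeclared ms_run."""
    declared = set()
    header = None
    problems = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.rstrip("\r\n").split("\t")
        prefix = fields[0]
        if prefix == "MTD" and len(fields) >= 3:
            decl = MS_RUN_DECL.fullmatch(fields[1])
            if decl:
                declared.add(int(decl.group(1)))
        elif prefix == "PSH":
            header = fields  # column names for the following PSM rows
        elif prefix == "PSM" and header is not None:
            row = dict(zip(header, fields))
            for ref in row.get("spectra_ref", "").split("|"):
                for run in MS_RUN_REF.findall(ref):
                    if int(run) not in declared:
                        problems.append(f"line {lineno}: {ref} uses undeclared ms_run[{run}]")
    return problems
```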

Because of the enormous complexity of proteomes, which constitute the entirety of protein species expressed by a given cell or tissue, proteome-wide studies performed in discovery mode are still limited in their ability to reproducibly identify and quantify all proteins present in complex biological samples. The targeted analysis of informative subsets of the proteome has therefore proved beneficial for generating reproducible data sets across multiple samples. Here we review the repertoire of antibody- and mass spectrometry (MS)-based analytical tools currently available for the directed analysis of predefined sets of proteins. The topics emphasized in this review are selected reaction monitoring (SRM) mass spectrometry, emerging tools to control error rates in targeted proteomic experiments, and some representative examples of applications. The ability to generate specific, quantitative assays for large numbers of proteins and post-translational modifications in a cost- and time-efficient manner has the potential to greatly expand the range of targeted proteomic coverage in biological studies. This article is part of a Special Section entitled: Understanding genome regulation and genetic diversity by mass spectrometry.
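
Decoy-based error control is one common way to estimate error rates in targeted experiments. The generic target-decoy sketch below, in Python, turns scored target and decoy peak groups into q-values. It illustrates that general approach under stated assumptions and is not the algorithm of any specific tool covered by the review. It ignores tied scores and any correction for unequal numbers of targets and decoys.

```python
def q_values(scored):
    """scored: iterable of (score, is_decoy); higher scores are better.
    Returns (score, q_value) for the target entries, best score first."""
    ordered = sorted(scored, key=lambda item: item[0], reverse=True)
    targets = decoys = 0
    fdr = []
    for _, is_decoy in ordered:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR if the score threshold were placed at this entry.
        fdr.append(decoys / max(targets, 1))
    # q-value: the smallest FDR at which this entry would still be accepted.
    q = [0.0] * len(fdr)
    running_min = float("inf")
    for i in range(len(fdr) - 1, -1, -1):
        running_min = min(running_min, fdr[i])
        q[i] = running_min
    return [(score, qv) for (score, is_decoy), qv in zip(ordered, q) if not is_decoy]
```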

Many top-down proteomics experiments focus on identifying and localizing PTMs and other potential sources of “mass shift” on a known protein sequence. A simple application to match ion masses and facilitate iterative hypothesis testing of PTM presence and location would assist with the data analysis in these experiments. ProSight Lite is a free software tool for matching a single candidate sequence against a set of mass spectrometric observations. Fixed or variable modifications, including both PTMs and a select number of glycosylations, can be applied to the amino acid sequence. The application reports multiple scores and a list of matching fragments. Fragmentation maps can be exported for publication in either portable network graphics (PNG) or scalable vector graphics (SVG) format. ProSight Lite can be freely downloaded from http://prosightlite.northwestern.edu, installs and updates from the web, and requires Windows 7 or later.
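
The matching step can be made concrete with the Python sketch below. It computes neutral b- and y-ion masses for one candidate sequence with optional per-residue mass shifts and matches them against observed masses within a ppm tolerance. It assumes the observations are deconvolved neutral monoisotopic masses and considers only b/y ions. ProSight Lite supports other ion types and its own scoring. The function names and the 10 ppm default are illustrative.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565


def fragment_masses(sequence, shifts=None):
    """Neutral b/y fragment masses; shifts maps a 0-based residue index to a mass shift (Da)."""
    shifts = shifts or {}
    residues = [RESIDUE[aa] + shifts.get(i, 0.0) for i, aa in enumerate(sequence)]
    ions = {}
    for i in range(1, len(residues)):
        ions[f"b{i}"] = sum(residues[:i])           # N-terminal fragment
        ions[f"y{i}"] = sum(residues[-i:]) + WATER  # C-terminal fragment keeps the water
    return ions


def match(ions, observed, tol_ppm=10.0):
    """Return sorted (ion name, theoretical, observed) tuples within tol_ppm."""
    hits = []
    for name, theo in ions.items():
        for obs in observed:
            if abs(obs - theo) / theo * 1e6 <= tol_ppm:
                hits.append((name, theo, obs))
    return sorted(hits)
```

For example, placing a phosphorylation (+79.96633 Da) on the threonine of `PEPTIDE` (index 3) shifts b4 and every y ion from y4 upward. Testing a different site is just a different `shifts` dictionary, which mirrors the iterative hypothesis testing described above.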

In proteomics, rapid developments in instrumentation have led to the acquisition of increasingly large data sets. In response, ProDaC was founded in 2006 as a Coordination Action project within the 6th European Union Framework Programme to support data sharing and community-wide data collection. The objectives of ProDaC were the development of documentation and storage standards, the set-up of a standardized data submission pipeline and the collection of data. By its conclusion in March 2009, ProDaC had delivered a comprehensive toolbox of standards and computer programs to achieve these goals.

Selected reaction monitoring (SRM) is an accurate quantitative technique traditionally used in small-molecule mass spectrometry (MS). SRM has emerged as an important technique for targeted and hypothesis-driven proteomic research and is becoming the reference method for protein quantification in complex biological samples. Compared with conventional shotgun tandem MS (LC-MS/MS) methods, SRM offers high selectivity, a lower limit of detection and improved reproducibility. Unlike LC-MS/MS, which requires computationally intensive informatic post-analysis, SRM requires pre-acquisition bioinformatic analysis to determine the proteotypic peptides and optimal transitions needed to uniquely identify and accurately quantitate the proteins of interest. A wide array of bioinformatics software tools, both web-based and stand-alone, has been published to help researchers determine optimal peptides and transition sets. Transitions are often selected based on the preferred precursor charge state, peptide molecular weight, hydrophobicity, fragmentation pattern at a given collision energy (CE), and the instrument chosen. Validation of the selected transitions for each peptide is critical, since peptide performance varies with the mass spectrometer used. In this review, we provide an overview of open-source and commercial bioinformatic tools for analyzing LC-MS data acquired by SRM.
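
The arithmetic behind candidate transitions can be sketched in a few lines of Python. The sketch computes the precursor m/z for a chosen charge state, then lists singly charged y-ion m/z values. It also keeps only fragments above the precursor m/z, a commonly used heuristic assumed here for illustration. Real tools also weigh hydrophobicity, collision energy and instrument-specific behaviour, and the function names are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def precursor_mz(peptide, charge):
    """m/z of [M + zH]^z+ for an unmodified peptide."""
    neutral = sum(MONO[aa] for aa in peptide) + WATER
    return (neutral + charge * PROTON) / charge


def y_transitions(peptide, charge=2, min_length=3):
    """Candidate (Q1 m/z, Q3 m/z, label) tuples using singly charged y ions."""
    q1 = precursor_mz(peptide, charge)
    candidates = []
    for n in range(min_length, len(peptide)):
        y_mz = sum(MONO[aa] for aa in peptide[-n:]) + WATER + PROTON
        if y_mz > q1:  # heuristic: fragments above Q1 assumed to be more selective
            candidates.append((q1, y_mz, f"y{n}"))
    return candidates
```

For the doubly charged `PEPTIDE` precursor (m/z ≈ 400.69), this keeps y4, y5 and y6 and drops y3 (m/z ≈ 376.17).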

Identifying reproducible yet relevant protein features in proteomics data is a major challenge. Analysis at the level of protein complexes can help resolve this issue, and we have developed a suite of feature-selection methods collectively referred to as Rank-Based Network Analysis (RBNA). RBNAs differ in their individual statistical test setups but share the use of rank-defined weights among proteins in each sample, a procedure known as gene fuzzy scoring. Currently, no RBNA exists for paired-sample scenarios in which both control and test tissues originate from the same source (e.g. the same patient). Paired tests, when used appropriately, are expected to be more powerful than approaches intended for unpaired samples. We report that the class-paired RBNA, PPFSNET, dominates in both simulated and real data scenarios. Moreover, for the first time, we explicitly incorporate batch-effect resistance as an additional evaluation criterion for feature-selection approaches. Batch effects are class-irrelevant variations arising from different handlers or processing times, and can obfuscate analysis. We demonstrate that PPFSNET and an earlier RBNA, PFSNET, are particularly resistant to batch effects, selecting only features that are strongly correlated with class but not with batch.
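
A much-simplified Python sketch of rank-based weighting followed by a paired comparison is given below. It assumes GFS-style thresholds (weight 1 for the top 5% of proteins in a sample, falling linearly to 0 at 15%). It scores a complex as the sum of its members' weights and applies a paired t-statistic to the per-pair score differences. This is loosely inspired by the RBNA idea described above rather than the published PPFSNET procedure, and all names and thresholds are illustrative.

```python
import math
from statistics import mean, stdev


def gfs_weights(sample, q1=0.05, q2=0.15):
    """sample: protein -> abundance. Rank-based weight in [0, 1] for each protein."""
    ranked = sorted(sample, key=sample.get, reverse=True)
    n = len(ranked)
    weights = {}
    for rank, protein in enumerate(ranked):
        frac = rank / n  # fraction of proteins ranked above this one
        if frac < q1:
            weights[protein] = 1.0
        elif frac < q2:
            weights[protein] = (q2 - frac) / (q2 - q1)  # linear decay between thresholds
        else:
            weights[protein] = 0.0
    return weights


def complex_score(sample, members):
    """Sum of the rank-based weights of a complex's member proteins."""
    weights = gfs_weights(sample)
    return sum(weights.get(p, 0.0) for p in members)


def paired_t(controls, tests, members):
    """Paired t-statistic of complex scores; controls[i] and tests[i] share a source."""
    diffs = [complex_score(t, members) - complex_score(c, members)
             for c, t in zip(controls, tests)]
    sd = stdev(diffs)
    if sd == 0:
        return 0.0 if mean(diffs) == 0 else math.copysign(math.inf, mean(diffs))
    return mean(diffs) / (sd / math.sqrt(len(diffs)))
```

Because the weights depend only on within-sample ranks, a batch-wide scaling of abundances leaves them unchanged, which is one intuition for the batch-effect resistance reported above.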

Inferring which protein species have been detected in bottom-up proteomics experiments has been a challenging problem, for which solutions have been maturing over the past decade. While many inference approaches now function well in isolation, comparing and reconciling the results generated by different tools remains difficult. This presently stands as one of the greatest barriers in collaborative efforts such as the Human Proteome Project and in public repositories such as the PRoteomics IDEntifications (PRIDE) database. Here we present a framework for reporting protein identifications that seeks to improve capabilities for comparing results generated by different inference tools. This framework, associated with the HUPO-Proteomics Standards Initiative (PSI) mzIdentML standard, standardizes the terminology for describing protein identification results while still allowing different methodologies to reach that final state. We propose that developers of software for reporting identification results adopt this terminology in their outputs. While the new terminology does not require any changes to the core mzIdentML model, it represents a significant change in practice; the rules will therefore be released via a new version of the mzIdentML specification (version 1.2), so that consumers of files can determine whether export software has adopted the new guidelines.
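
The kind of relationship such terminology describes can be illustrated with the Python sketch below. It groups proteins whose peptide evidence is identical and flags proteins whose peptides are a strict subset of another group's. It assumes a simple accession-to-peptide mapping and illustrates the grouping concept only. It does not implement the rule set of the mzIdentML 1.2 specification.

```python
def group_proteins(evidence):
    """evidence: protein accession -> set of peptide sequences.

    Returns (groups, subset_of): groups are sorted lists of accessions with
    identical peptide sets that are not contained in any other group;
    subset_of maps each remaining accession to the accessions whose
    evidence strictly contains its own.
    """
    by_peptides = {}
    for accession, peptides in evidence.items():
        by_peptides.setdefault(frozenset(peptides), []).append(accession)

    subset_of = {}
    for peptides, accessions in by_peptides.items():
        for other_peptides, other_accessions in by_peptides.items():
            if peptides < other_peptides:  # strict subset of another group's evidence
                for accession in accessions:
                    subset_of.setdefault(accession, []).extend(sorted(other_accessions))

    groups = sorted(sorted(accs) for accs in by_peptides.values()
                    if not any(a in subset_of for a in accs))
    return groups, subset_of
```

Two tools reporting the same groups and subset relationships can be compared directly, even if they chose different representative proteins.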

The amount of data currently being generated by proteomics laboratories around the world is increasing exponentially, making it ever more critical that scientists are able to exchange, compare and retrieve datasets when re-evaluation of their original conclusions becomes important. Only a fraction of these data is published in the literature, and important information is lost every day as data formats become obsolete. The Human Proteome Organisation Proteomics Standards Initiative (HUPO-PSI) was tasked with creating data standards and interchange formats that allow both the exchange and storage of such data irrespective of the hardware and software from which they were generated. This article provides an update on the work of this group, the creation and implementation of these standards, and the standards-compliant data repositories being established as a result of its efforts.

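Bioinformatics is an important means and tool for using mathematical and informatics methods to elucidate and interpret the biological meaning contained in massive amounts of biological data. With the continuing development and deepening of proteomics research, large amounts of protein sequence, structure, function and interaction data are constantly being generated. Faced with the acquisition, processing and storage of massive proteomic datasets and the mining of the information they contain, bioinformatics has become an indispensable component of proteomics research. Drawing on the development history of proteomics, this article...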

We describe a cell-free approach that employs selected reaction monitoring (SRM) in tandem mass spectrometry to identify and quantitate T-cell epitopes. This approach uses multiple epitope-specific SRM transitions to identify known T-cell epitopes and an absolute quantitation (AQUA) peptide strategy to achieve absolute quantitation. The advantage of a mass spectrometry-based approach over more traditional cell-based assays lies in the robustness and transferability of an SRM method between laboratories and in its ability to detect multiple peptides simultaneously without requiring epitope-specific reagents such as T-cell lines. Thus, the SRM strategy for epitope quantitation will find application in studies of antigen density, the link between epitope abundance and immunogenicity, the dynamic range of epitope presentation and the abundance of T-cell epitopes in disease.
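
The core AQUA calculation is a ratio. The endogenous amount is estimated as the light (endogenous) to heavy (spiked standard) signal ratio multiplied by the known amount of standard added. The Python sketch assumes integrated peak areas for matching light/heavy transition pairs and sums them before taking the ratio. Averaging per-transition ratios is an equally plausible alternative. Names and units are illustrative.

```python
def aqua_amount(light_areas, heavy_areas, spiked_amount):
    """Estimate the endogenous amount (same units as spiked_amount).

    light_areas[i] and heavy_areas[i] are integrated peak areas of the same
    transition for the endogenous peptide and the spiked standard.
    """
    if len(light_areas) != len(heavy_areas) or not light_areas:
        raise ValueError("need matching, non-empty light/heavy transition lists")
    heavy_total = sum(heavy_areas)
    if heavy_total <= 0:
        raise ValueError("heavy standard not detected")
    return sum(light_areas) / heavy_total * spiked_amount
```

For instance, light areas of 200 and 100 against heavy areas of 400 and 200, with 50 fmol of standard spiked in, give an estimate of 25 fmol.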

The Swiss-Prot protein knowledgebase provides manually annotated entries for all species, but concentrates on the annotation of entries from model organisms to ensure the presence of high quality annotation of representative members of all protein families. A specific Plant Protein Annotation Program (PPAP) was started to cope with the increasing amount of data produced by the complete sequencing of plant genomes. Its main goal is the annotation of proteins from the model plant organism Arabidopsis thaliana. In addition to bibliographic references, experimental results, computed features and sometimes even contradictory conclusions, direct links to specialized databases connect amino acid sequences with the current knowledge in plant sciences. As protein families and groups of plant-specific proteins are regularly reviewed to keep up with current scientific findings, we hope that the wealth of information of Arabidopsis origin accumulated in our knowledgebase, and the numerous software tools provided on the Expert Protein Analysis System (ExPASy) web site might help to identify and reveal the function of proteins originating from other plants. Recently, a single, centralized, authoritative resource for protein sequences and functional information, UniProt, was created by joining the information contained in Swiss-Prot, Translation of the EMBL nucleotide sequence (TrEMBL), and the Protein Information Resource-Protein Sequence Database (PIR-PSD). A rising problem is that an increasing number of nucleotide sequences are not being submitted to the public databases, and thus the proteins inferred from such sequences will have difficulties finding their way to the Swiss-Prot or TrEMBL databases.

Collision-activated dissociation (CAD) and electron-transfer dissociation (ETD) each produce spectra containing unique features. Although several database search algorithms (e.g. SEQUEST, MASCOT and the Open Mass Spectrometry Search Algorithm) have been modified to search ETD data, this support consists chiefly of the ability to search for c- and z•-ions; additional ETD-specific features are often unaccounted for and may hinder identification. Removal of these features by spectral processing increased total search sensitivity by ~20% for both the human and yeast data sets; unique peptide identifications increased by ~17% for the yeast data sets and by ~16% for the human data set.
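
The kind of spectral processing described can be sketched as removing peaks near the unreacted precursor and its charge-reduced species. Ignoring the added electron mass, these lie at approximately (precursor m/z × z) / k for k from z down to 1. The Python sketch below does that, with an optional window on the low-mass side for neutral losses. The window widths are hypothetical, and this is not the authors' exact processing.

```python
def clean_etd(peaks, precursor_mz, charge, window=1.0, max_loss=0.0):
    """Drop peaks near the unreacted precursor and its charge-reduced species.

    peaks: list of (mz, intensity). For each charge k = charge..1 the species
    sits at roughly precursor_mz * charge / k (electron mass ignored). A peak is
    removed if it falls within `window` of that m/z, or up to `max_loss` Da
    (max_loss / k in m/z) below it, to cover neutral losses.
    """
    kept = []
    for mz, intensity in peaks:
        near_precursor = False
        for k in range(charge, 0, -1):
            centre = precursor_mz * charge / k
            if centre - max_loss / k - window <= mz <= centre + window:
                near_precursor = True
                break
        if not near_precursor:
            kept.append((mz, intensity))
    return kept
```

For a 3+ precursor at m/z 500, the removed regions centre on m/z 500, 750 and 1500, where the intact and charge-reduced species would appear. These peaks carry no sequence information and can otherwise be matched spuriously by a search engine.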

This paper reports on the 5th joint British Society for Proteome Research (BSPR) and European Bioinformatics Institute (EBI) meeting, which took place at the Wellcome Trust Conference Centre, Cambridge, UK, from 8 to 10 July 2008. As in previous years, the meeting attracted leading experts in the field, who presented the latest cutting-edge research in proteomics. The meeting was entitled “Proteomics: From Technology to New Biology”, reflecting the major transition proteomics has undergone in the past few years. In particular, the use of multiple reaction monitoring (MRM)-based targeted experiments for the absolute quantification and validation of proteins was the hot topic of the meeting. Attended by some 250 delegates, the conference was extremely well organised and provided a great opportunity for discussion and the initiation of new collaborations.

With its predicted proteome of 1550 proteins (data set Etalon), Helicobacter pylori 26695 represents an ideal model system of medium complexity for investigating basic questions in proteomics. We analyzed urea-solubilized proteins by 2-DE/MS (data set 2-DE) and by 1-DE-LC/MS (Supprot); proteins insoluble in 9 M urea but solubilized by SDS (Pellet) and proteins precipitating in the Sephadex layer at the application side of IEF (Sephadex) by 1-DE-LC/MS; and proteins precipitating close to the application side within the IEF gel by LC/MS (Startline). The experimental proteomics data for H. pylori, comprising 567 proteins (protein coverage: 36.6%), were stored in the Proteome Database System for Microbial Research (http://www.mpiib-berlin.mpg.de/2D-PAGE/), which gives access to raw mass spectra (MALDI-TOF/TOF) in T2D format as well as to text files of peak lists. For data mining, the protein mapping and comparison tool PROMPT (http://webclu.bio.wzw.tum.de/prompt/) was used. The percentage of proteins with transmembrane regions, relative to all proteins detected, was 0, 0.2, 0, 0.5, 3.8 and 6.3% for 2-DE, Supprot, Startline, Sephadex, Pellet and Etalon, respectively. 2-DE does not separate membrane proteins because they are insoluble in 9 M urea/70 mM DTT and 2% CHAPS. SDS solubilizes a considerable portion of the urea-insoluble proteins and makes them accessible for separation by SDS-PAGE and LC. The 2-DE/MS analysis of urea-solubilized proteins and the 1-DE-LC/MS analysis of the urea-insoluble protein fraction (Pellet) are complementary procedures in the pursuit of a complete proteome analysis. Access to the PROMPT-generated diagrams in the Proteome Database allows the experimental data to be mined with respect to other functional aspects.
