Similar Literature (20 results)
1.
Protein identification using MS is an important technique in proteomics as well as a major generator of proteomics data. We have designed the protein identification data object model (PDOM) and developed a parser based on this model to facilitate the analysis and storage of these data. The parser works with HTML or XML files saved or exported from MASCOT MS/MS ion searches (peptide summary report) or MASCOT PMF searches (protein summary report). The program creates PDOM objects, eliminates redundancy in the input file, and has the capability to output any PDOM object to a relational database. This program facilitates additional analysis of MASCOT search results and aids the storage of protein identification information. The implementation is extensible and can serve as a template to develop parsers for other search engines. The parser can be used as a stand-alone application or can be driven by other Java programs. It is currently being used as the front end for a system that loads HTML and XML result files of MASCOT searches into a relational database. The source code is freely available at http://www.ccbm.jhu.edu and the program uses only free and open-source Java libraries.
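To illustrate the parse-then-store pattern this abstract describes (without reproducing PDOM itself), here is a minimal Python sketch that reads a simplified XML result export into plain objects, removes redundant protein/peptide pairs, and writes them to a relational table. The element names (`protein`, `peptide`), attributes, and helper names are hypothetical placeholders, not the actual MASCOT export schema or the PDOM API.

```python
import sqlite3
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class PeptideHit:
    protein_accession: str
    sequence: str
    score: float

def parse_result_file(path):
    """Parse a simplified XML export into PeptideHit objects.

    The <protein>/<peptide> element and attribute names used here are
    hypothetical placeholders, not the real MASCOT export schema.
    """
    hits = []
    root = ET.parse(path).getroot()
    for protein in root.iter("protein"):
        accession = protein.get("accession", "")
        for peptide in protein.iter("peptide"):
            hits.append(PeptideHit(
                protein_accession=accession,
                sequence=peptide.get("sequence", ""),
                score=float(peptide.get("score", "0")),
            ))
    # Eliminate redundant (protein, peptide) pairs, keeping the best score.
    best = {}
    for h in hits:
        key = (h.protein_accession, h.sequence)
        if key not in best or h.score > best[key].score:
            best[key] = h
    return list(best.values())

def store_hits(hits, db_path="identifications.db"):
    """Write the parsed objects to a relational table (SQLite for simplicity)."""
    con = sqlite3.connect(db_path)
    con.execute("""CREATE TABLE IF NOT EXISTS peptide_hit
                   (protein TEXT, sequence TEXT, score REAL)""")
    con.executemany("INSERT INTO peptide_hit VALUES (?, ?, ?)",
                    [(h.protein_accession, h.sequence, h.score) for h in hits])
    con.commit()
    con.close()
```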

2.
Tandem mass spectrometry-based proteomics experiments produce large amounts of raw data, and different database search engines are needed to reliably identify all the proteins from this data. Here, we present Compid, an easy-to-use software tool that can be used to integrate and compare protein identification results from two search engines, Mascot and Paragon. Additionally, Compid enables extraction of information from large Mascot result files that cannot be opened via the Web interface and calculation of general statistical information about peptide and protein identifications in a data set. To demonstrate the usefulness of this tool, we used Compid to compare Mascot and Paragon database search results for a mitochondrial proteome sample of human keratinocytes. The reports generated by Compid can be exported and opened as Excel documents or as text files using configurable delimiters, allowing the analysis and further processing of Compid output with a multitude of programs. Compid is freely available and can be downloaded from http://users.utu.fi/lanatr/compid. It is released under an open source license (GPL), enabling modification of the source code. Its modular architecture allows for creation of supplementary software components, e.g. to enable support for additional input formats and report categories.
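A rough sketch of the kind of comparison Compid performs, assuming each engine's identifications have already been reduced to a peptide-to-score mapping; the function name, report columns, and delimiter handling are illustrative, not Compid's actual interface.

```python
import csv

def compare_identifications(mascot_peptides, paragon_peptides, out_path, delimiter="\t"):
    """Compare two peptide identification sets and write a delimited report.

    Both inputs are dicts mapping peptide sequence -> score; the report marks
    each peptide as found by one or both engines.
    """
    all_peptides = sorted(set(mascot_peptides) | set(paragon_peptides))
    with open(out_path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter=delimiter)
        writer.writerow(["peptide", "mascot_score", "paragon_score", "agreement"])
        for pep in all_peptides:
            in_mascot = pep in mascot_peptides
            in_paragon = pep in paragon_peptides
            writer.writerow([
                pep,
                mascot_peptides.get(pep, ""),
                paragon_peptides.get(pep, ""),
                "both" if (in_mascot and in_paragon) else
                ("mascot_only" if in_mascot else "paragon_only"),
            ])
    # Simple overlap statistics for the whole data set.
    shared = len(set(mascot_peptides) & set(paragon_peptides))
    return {"mascot": len(mascot_peptides),
            "paragon": len(paragon_peptides),
            "shared": shared}

# Example usage with toy data:
stats = compare_identifications(
    {"LVNELTEFAK": 58.2, "AEFVEVTK": 41.0},
    {"LVNELTEFAK": 95.0, "YLYEIAR": 88.5},
    "comparison_report.tsv")
print(stats)  # {'mascot': 2, 'paragon': 2, 'shared': 1}
```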

3.
Data-independent acquisition (DIA) is a rapidly developing mass spectrometry acquisition technique in proteomics. By fragmenting all precursor ions within each isolation window without bias and acquiring the resulting MS/MS spectra, it can in principle achieve deep coverage of a protein sample while offering high throughput, high reproducibility, and high sensitivity. Existing DIA acquisition methods fall into three categories: all-window fragmentation methods, sequential isolation-window fragmentation methods, and four-dimensional DIA acquisition (4D-DIA). To address the characteristics of DIA data, the main analysis strategies fall into four categories: spectral library search, direct search against a protein sequence database, identification from pseudo-MS/MS spectra, and de novo sequencing. The resulting peptide identifications require confidence assessment, comprising two steps, rescoring with machine learning methods and estimation of the false discovery rate (FDR) of the reported result set, which together provide quality control of the analysis results. This review surveys DIA acquisition methods, data analysis methods and software, and approaches for assessing the confidence of identifications, and discusses future directions.
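As one concrete example of the confidence-assessment step described above, the following Python sketch implements a basic target-decoy FDR estimate with monotone q-values; it is a generic illustration of the principle, not any specific DIA software's implementation, and the score/decoy inputs are assumed to come from an upstream search.

```python
def target_decoy_qvalues(psms):
    """Estimate q-values for peptide-spectrum matches via the target-decoy approach.

    `psms` is a list of (score, is_decoy) tuples; higher scores are better.
    At each score threshold the FDR is approximated as #decoys / #targets above
    the threshold, and the q-value is the minimal FDR at which a PSM is accepted.
    """
    ranked = sorted(psms, key=lambda x: x[0], reverse=True)
    fdrs, targets, decoys = [], 0, 0
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    # Convert the running FDR into monotone q-values (cumulative minimum from the end).
    qvalues, running_min = [0.0] * len(fdrs), float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qvalues[i] = running_min
    return [(score, is_decoy, q) for (score, is_decoy), q in zip(ranked, qvalues)]

# Toy example: report each PSM with its estimated q-value.
example = [(9.1, False), (8.7, False), (8.2, True), (7.9, False), (7.5, True)]
for score, is_decoy, q in target_decoy_qvalues(example):
    print(score, "decoy" if is_decoy else "target", round(q, 3))
```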

4.
Proteomics research programs typically comprise the identification of protein content of any given cell, their isoforms, splice variants, post-translational modifications, interacting partners and higher-order complexes under different conditions. These studies present significant analytical challenges owing to the high proteome complexity and the low abundance of the corresponding proteins, which often requires highly sensitive and resolving techniques. Mass spectrometry plays an important role in proteomics and has become an indispensable tool for molecular and cellular biology. However, the analysis of mass spectrometry data can be a daunting task in view of the complexity of the information to decipher, the accuracy and dynamic range of quantitative analysis, the availability of appropriate bioinformatics software and the overwhelming size of data files. The past ten years have witnessed significant technological advances in mass spectrometry-based proteomics and synergy with bioinformatics is vital to fulfill the expectations of biological discovery programs. We present here the technological capabilities of mass spectrometry and bioinformatics for mining the cellular proteome in the context of discovery programs aimed at trace-level protein identification and expression from microgram amounts of protein extracts from human tissues.

5.
The field of proteomics is advancing rapidly as a result of powerful new technologies and proteomics experiments yield a vast and increasing amount of information. Data regarding protein occurrence, abundance, identity, sequence, structure, properties, and interactions need to be stored. Currently, a common standard has not yet been established and open access to results is needed for further development of robust analysis algorithms. Databases for proteomics will evolve from pure storage into knowledge resources, providing a repository for information (meta-data) which is mainly not stored in simple flat files. This review will shed light on recent steps towards the generation of a common standard in proteomics data storage and integration, but is not meant to be a comprehensive overview of all available databases and tools in the proteomics community.

6.
Jens Allmer, Amino Acids (2010) 38(4): 1075-1087
Determining the differential expression of proteins under different conditions is of major importance in proteomics. Since mass spectrometry-based proteomics is often used to quantify proteins, several labelling strategies have been developed. While these are generally more precise than label-free quantitation approaches, they require specifically designed experiments and prior knowledge of which peptides are expected to be measured and modified. We recently designed the 2DB database, which aids storage, analysis, and publication of data from mass spectrometric experiments to identify proteins. This database can aid in identifying peptides which can be used for quantitation. Here, an extension to the database application, named MSMAG, is presented that allows for more detailed analysis of the distribution of peptides and their associated proteins over the fractions of an experiment. Furthermore, given several biological samples in the database, label-free quantitation can be performed. Thus, interesting proteins, which may warrant further investigation, can be identified en passant while performing high-throughput proteomics studies.
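A minimal sketch of label-free comparison by spectral counting, one simple way to flag proteins whose abundance differs between two samples; the function name, the pseudo-count, and the 2-fold cutoff are illustrative assumptions, not MSMAG's actual algorithm.

```python
from collections import defaultdict

def label_free_ratios(sample_a, sample_b, pseudo_count=1.0):
    """Rough label-free comparison of two samples by spectral counting.

    Each input is a list of (protein, peptide) identifications; proteins whose
    counts differ strongly between samples are flagged for follow-up. The
    pseudo-count avoids division by zero for proteins seen in only one sample.
    """
    def counts(ids):
        c = defaultdict(int)
        for protein, _peptide in ids:
            c[protein] += 1
        return c

    a, b = counts(sample_a), counts(sample_b)
    ratios = {}
    for protein in set(a) | set(b):
        ratios[protein] = (a[protein] + pseudo_count) / (b[protein] + pseudo_count)
    return ratios

# Toy example: flag proteins with at least a 2-fold difference in either direction.
ratios = label_free_ratios(
    [("P1", "AEFVEVTK"), ("P1", "LVNELTEFAK"), ("P2", "YLYEIAR")],
    [("P1", "AEFVEVTK"), ("P2", "YLYEIAR"), ("P2", "QTALVELVK")])
interesting = {p: r for p, r in ratios.items() if r >= 2 or r <= 0.5}
```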

7.
The discovery of many noncanonical peptides detectable with sensitive mass spectrometry inside, outside, and on cells shepherded the development of novel methods for their identification, often not supported by a systematic benchmarking with other methods. We here propose iBench, a bioinformatic tool that can construct ground truth proteomics datasets and cognate databases, thereby generating a training court wherein methods, search engines, and proteomics strategies can be tested, and their performances estimated by the same tool. iBench can be coupled to the main database search engines, allows the selection of customized features of mass spectrometry spectra and peptides, provides standard benchmarking outputs, and is open source. The proof-of-concept application to tryptic proteome digestions, immunopeptidomes, and synthetic peptide libraries dissected the impact that noncanonical peptides could have on the identification of canonical peptides by Mascot search with rescoring via Percolator (Mascot+Percolator).
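As a hedged illustration of what a ground-truth benchmark can report, the sketch below computes precision, recall, and the observed false discovery proportion for a reported peptide set against a known ground-truth set; the function and its outputs are illustrative, not iBench's actual metrics or interface.

```python
def benchmark(reported, ground_truth):
    """Score a search strategy against a ground-truth peptide set.

    `reported` and `ground_truth` are sets of peptide sequences. Because the
    true sample content is known, the actual false discovery proportion can be
    computed directly instead of being estimated from decoys.
    """
    true_positives = reported & ground_truth
    precision = len(true_positives) / len(reported) if reported else 0.0
    recall = len(true_positives) / len(ground_truth) if ground_truth else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "observed_fdr": 1.0 - precision if reported else 0.0,
    }

# Toy example: two of three reported peptides are truly in the sample.
print(benchmark({"AEFVEVTK", "YLYEIAR", "PEPTIDEK"},
                {"AEFVEVTK", "YLYEIAR", "LVNELTEFAK"}))
```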

8.
We report a hybrid search method combining database and spectral library searches that allows for a straightforward approach to characterizing the error rates from the combined data. Using these methods, we demonstrate significantly increased sensitivity and specificity in matching peptides to tandem mass spectra. The hybrid search method increased the number of spectra that can be assigned to a peptide in a global proteomics study by 57-147% at an estimated false discovery rate of 5%, with clear room for even greater improvements. The approach combines the general utility of using consensus model spectra typical of database search methods with the accuracy of the intensity information contained in spectral libraries. A common scoring metric based on recent developments linking data analysis and statistical thermodynamics is used, which allows the use of a conservative estimate of error rates for the combined data. We applied this approach to proteomics analysis of Synechococcus sp. PCC 7002, a cyanobacterium that is a model organism for studies of photosynthetic carbon fixation and biofuels development. The increased specificity and sensitivity of this approach allowed us to identify many more peptides involved in the processes important for photoautotrophic growth.

9.
The global analysis of proteins is now feasible due to improvements in techniques such as two-dimensional gel electrophoresis (2-DE), mass spectrometry, yeast two-hybrid systems and the development of bioinformatics applications. The experiments form the basis of proteomics, and present significant challenges in data analysis, storage and querying. We argue that a standard format for proteome data is required to enable the storage, exchange and subsequent re-analysis of large datasets. We describe the criteria that must be met for the development of a standard for proteomics. We have developed a model to represent data from 2-DE experiments, including difference gel electrophoresis along with image analysis and statistical analysis across multiple gels. This part of proteomics analysis is not represented in current proposals for proteomics standards. We are working with the Proteomics Standards Initiative to develop a model encompassing biological sample origin, experimental protocols, a number of separation techniques and mass spectrometry. The standard format will facilitate the development of central repositories of data, enabling results to be verified or re-analysed, and the correlation of results produced by different research groups using a variety of laboratory techniques.

10.
The effectiveness of database search algorithms, such as Mascot, Sequest and ProteinPilot, is limited by the quality of the input spectra: spurious peaks in MS/MS spectra can jeopardize the correct identification of peptides or reduce their score significantly. Consequently, an efficient preprocessing of MS/MS spectra can increase the sensitivity of peptide identification at reduced file sizes and run time without compromising its specificity. We investigate the performance of 25 MS/MS preprocessing methods on various data sets and make software for improved preprocessing of mgf/dta files freely available from http://hci.iwr.uni-heidelberg.de/mip/proteomics or http://www.childrenshospital.org/research/steenlab.
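One widely used preprocessing filter, keeping only the N most intense peaks per m/z window, can be sketched as follows; it is a single generic example, not one of the 25 methods evaluated in the study, and the window width and peak count are arbitrary illustrative parameters.

```python
from collections import defaultdict

def filter_top_n_per_window(peaks, window=100.0, top_n=6):
    """Denoise an MS/MS spectrum by keeping the N most intense peaks per m/z window.

    `peaks` is a list of (mz, intensity) pairs. Removing low-intensity noise
    peaks shrinks the peak list passed to the search engine without discarding
    the fragment ions that carry most of the sequence information.
    """
    bins = defaultdict(list)
    for mz, intensity in peaks:
        bins[int(mz // window)].append((mz, intensity))
    kept = []
    for peak_list in bins.values():
        peak_list.sort(key=lambda p: p[1], reverse=True)  # most intense first
        kept.extend(peak_list[:top_n])
    return sorted(kept)  # restore m/z order

# Toy spectrum: keep at most 2 peaks per 100 m/z window.
spectrum = [(114.1, 5.0), (120.3, 1.0), (245.2, 40.0), (258.1, 2.0), (263.0, 3.0)]
clean = filter_top_n_per_window(spectrum, window=100.0, top_n=2)
```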

11.
Current efforts aimed at developing high-throughput proteomics focus on increasing the speed of protein identification. Although improvements in sample separation, enrichment, automated handling, mass spectrometric analysis, as well as data reduction and database interrogation strategies have done much to increase the quality, quantity and efficiency of data collection, significant bottlenecks still exist. Various separation techniques have been coupled with tandem mass spectrometric (MS/MS) approaches to allow a quicker analysis of complex mixtures of proteins, especially where a high number of unambiguous protein identifications are the exception, rather than the rule. MS/MS is required to provide structural/amino acid sequence information on a peptide and thus allow protein identity to be inferred from individual peptides. Currently these spectra need to be manually validated because of (a) the potential for false positive matches (i.e., the protein is not in the database), and (b) observed fragmentation trends that may not be incorporated into current MS/MS search algorithms. This validation represents a significant bottleneck associated with high-throughput proteomic strategies. We have developed CHOMPER, a software program which reduces the time required to both visualize and confirm MS/MS search results and generate post-analysis reports and protein summary tables. CHOMPER extracts the identification information from SEQUEST MS/MS search result files, reproduces both the peptide and protein identification summaries, provides a more interactive visualization of the MS/MS spectra and facilitates the direct submission of manually validated identifications to a database.

12.
The growing use of mass spectrometry in the context of biomedical research has been accompanied by an increased demand for distribution of results in a format that facilitates rapid and efficient validation of claims by reviewers and other interested parties. However, the continued evolution of mass spectrometry hardware, sample preparation methods, and peptide identification algorithms complicates standardization and creates hurdles related to compliance with journal submission requirements. Moreover, the recently announced Philadelphia Guidelines (1, 2) suggest that authors provide native mass spectrometry data files in support of their peer-reviewed research articles. These trends highlight the need for data viewers and other tools that work independently of manufacturers' proprietary data systems and seamlessly connect proteomics results with original data files to support user-driven data validation and review. Based upon our recently described API-based framework for mass spectrometry data analysis (3, 4), we created an interactive viewer (mzResults) that is built on established database standards and enables efficient distribution and interrogation of results associated with proteomics experiments, while also providing a convenient mechanism for authors to comply with data submission standards as described in the Philadelphia Guidelines. In addition, the architecture of mzResults supports in-depth queries of the native mass spectrometry files through our multiplierz software environment. We use phosphoproteomics data to illustrate the features and capabilities of mzResults.

13.
Andromeda: a peptide search engine integrated into the MaxQuant environment
A key step in mass spectrometry (MS)-based proteomics is the identification of peptides in sequence databases by their fragmentation spectra. Here we describe Andromeda, a novel peptide search engine using a probabilistic scoring model. On proteome data, Andromeda performs as well as Mascot, a widely used commercial search engine, as judged by sensitivity and specificity analysis based on target decoy searches. Furthermore, it can handle data with arbitrarily high fragment mass accuracy, is able to assign and score complex patterns of post-translational modifications, such as highly phosphorylated peptides, and accommodates extremely large databases. The algorithms of Andromeda are provided. Andromeda can function independently or as an integrated search engine of the widely used MaxQuant computational proteomics platform and both are freely available at www.maxquant.org. The combination enables analysis of large data sets in a simple analysis workflow on a desktop computer. For searching individual spectra Andromeda is also accessible via a web server. We demonstrate the flexibility of the system by implementing the capability to identify cofragmented peptides, significantly improving the total number of identified peptides.
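To make the idea of a probabilistic fragment-match score concrete, here is a simplified binomial score of the kind commonly used in this setting: given n theoretical fragment ions, k observed matches, and a per-ion chance p of matching at random, the score is -10·log10 of the binomial tail probability. This is a didactic sketch under stated assumptions, not Andromeda's actual scoring function.

```python
import math

def binomial_match_score(n_theoretical, n_matched, p_random_match):
    """Simplified probabilistic PSM score based on a binomial model.

    The score is -10*log10 P(X >= k) for X ~ Binomial(n, p), i.e. the chance
    of matching at least k of n theoretical fragment ions purely at random.
    Higher scores mean the observed matches are less likely to be chance.
    """
    def log_binom(n, k):
        # log of the binomial coefficient via log-gamma, to stay numerically stable
        return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

    tail = 0.0
    for k in range(n_matched, n_theoretical + 1):
        tail += math.exp(log_binom(n_theoretical, k)
                         + k * math.log(p_random_match)
                         + (n_theoretical - k) * math.log(1.0 - p_random_match))
    tail = min(max(tail, 1e-300), 1.0)  # numerical guard against under/overflow
    return -10.0 * math.log10(tail)

# Example: 9 of 18 theoretical b/y ions matched, 4% chance of a random peak match.
print(round(binomial_match_score(18, 9, 0.04), 1))
```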

14.
High-throughput sequencing assays are now routinely used to study different aspects of genome organization. As decreasing costs and widespread availability of sequencing enable more laboratories to use sequencing assays in their research projects, the number of samples and replicates in these experiments can quickly grow to several dozens of samples and thus require standardized annotation, storage and management of preprocessing steps. As a part of the STATegra project, we have developed an Experiment Management System (EMS) for high throughput omics data that supports different types of sequencing-based assays such as RNA-seq, ChIP-seq, Methyl-seq, etc., as well as proteomics and metabolomics data. The STATegra EMS provides metadata annotation of experimental design, samples and processing pipelines, as well as storage of different types of data files, from raw data to ready-to-use measurements. The system has been developed to provide research laboratories with a freely available, integrated platform that offers a simple and effective way for experiment annotation and tracking of analysis procedures.

15.
Confident identification of peptides via tandem mass spectrometry underpins modern high-throughput proteomics. This has motivated considerable recent interest in the postprocessing of search engine results to increase confidence and calculate robust statistical measures, for example through the use of decoy databases to calculate false discovery rates (FDR). FDR-based analyses allow for multiple testing and can assign a single confidence value for both sets and individual peptide spectrum matches (PSMs). We recently developed an algorithm for combining the results from multiple search engines, integrating FDRs for sets of PSMs made by different search engine combinations. Here we describe a web server and a downloadable application that make this routinely available to the proteomics community. The web server offers a range of outputs including informative graphics to assess the confidence of the PSMs and any potential biases. The underlying pipeline also provides a basic protein inference step, integrating PSMs into protein ambiguity groups where peptides can be matched to more than one protein. Importantly, we have also implemented full support for the mzIdentML data standard, recently released by the Proteomics Standards Initiative, providing users with the ability to convert native formats to mzIdentML files, which are available to download.
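The protein inference step mentioned above can be illustrated with a small sketch that groups proteins supported by identical peptide evidence into ambiguity groups; the data structures and function name are assumptions for illustration, not the pipeline's actual implementation.

```python
from collections import defaultdict

def protein_ambiguity_groups(peptide_protein_map):
    """Group proteins that are supported by identical sets of peptides.

    `peptide_protein_map` maps a peptide sequence to the list of proteins it
    could derive from. Proteins sharing exactly the same peptide evidence
    cannot be distinguished and are reported together as one ambiguity group.
    """
    evidence = defaultdict(set)
    for peptide, proteins in peptide_protein_map.items():
        for protein in proteins:
            evidence[protein].add(peptide)

    groups = defaultdict(list)
    for protein, peptides in evidence.items():
        groups[frozenset(peptides)].append(protein)  # same evidence -> same group
    return [sorted(members) for members in groups.values()]

# Two albumin entries share all their peptides, so they form one group.
groups = protein_ambiguity_groups({
    "AEFVEVTK": ["ALBU_HUMAN", "ALBU_BOVIN"],
    "LVNELTEFAK": ["ALBU_HUMAN", "ALBU_BOVIN"],
    "YLYEIAR": ["TRFE_HUMAN"],
})
# e.g. [['ALBU_BOVIN', 'ALBU_HUMAN'], ['TRFE_HUMAN']] (group order may vary)
```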

16.
The objective of proteomics is to get an overview of the proteins expressed at a given point in time in a given tissue and to identify the connection to the biochemical status of that tissue. Therefore sample throughput and analysis time are important issues in proteomics. The concept of proteomics is to narrow down the identity of proteins of interest. However, the overall relation between proteins must also be explained. Classical proteomics consists of separation and characterization, based on two-dimensional electrophoresis, trypsin digestion, mass spectrometry and database searching. Characterization includes labor-intensive work in order to manage, handle and analyze data. The field of classical proteomics should therefore be extended to also include handling of large datasets in an objective way. The separation obtained by two-dimensional electrophoresis and mass spectrometry gives rise to huge amounts of data. We present a multivariate approach to the handling of data in proteomics with the advantage that protein patterns can be spotted at an early stage, so the proteins for sequencing can be selected intelligently. These methods can also be applied to other data-generating protein analysis methods such as mass spectrometry and near infrared spectroscopy, and examples of application to these techniques are also presented. Multivariate data analysis can unravel complicated data structures and may thereby relieve the characterization phase in classical proteomics. Traditional statistical methods are not suitable for analysis of the huge amounts of data, where the number of variables exceeds the number of objects. Multivariate data analysis, on the other hand, may uncover the hidden structures present in these data. This study takes its starting point in the field of classical proteomics and shows how multivariate data analysis can lead to faster ways of finding interesting proteins. Multivariate analysis has shown interesting results as a supplement to classical proteomics and added a new dimension to the field of proteomics.
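A minimal example of the multivariate approach: principal component analysis (via SVD) of a samples-by-proteins abundance matrix, which makes sample patterns visible and points to the proteins driving them. The matrix layout and toy numbers are assumptions for illustration, not data from the study.

```python
import numpy as np

def pca_scores(abundance, n_components=2):
    """Project samples onto the first principal components of a protein table.

    `abundance` is a samples x proteins matrix (e.g., spot volumes from 2-DE
    gels). Patterns among samples become visible in the score plot, and the
    loadings indicate which proteins contribute most to those patterns.
    """
    X = abundance - abundance.mean(axis=0)             # centre each protein column
    u, s, vt = np.linalg.svd(X, full_matrices=False)   # SVD-based PCA
    scores = u[:, :n_components] * s[:n_components]    # sample coordinates
    loadings = vt[:n_components]                       # protein contributions
    explained = (s ** 2) / (s ** 2).sum()              # variance explained per component
    return scores, loadings, explained[:n_components]

# Six gels (rows) x four protein spots (columns), toy numbers:
gels = np.array([[1.0, 0.2, 3.1, 0.9],
                 [1.1, 0.3, 3.0, 1.0],
                 [0.9, 0.1, 3.2, 0.8],
                 [2.5, 1.9, 1.1, 0.9],
                 [2.6, 2.0, 1.0, 1.1],
                 [2.4, 2.1, 1.2, 1.0]])
scores, loadings, explained = pca_scores(gels)
```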

17.
Targeted proteomics allows researchers to study proteins of interest without being drowned in data from other, less interesting proteins or from redundant or uninformative peptides. While the technique is mostly used for smaller, focused studies, there are several reasons to conduct larger targeted experiments. Automated, highly robust software becomes more important in such experiments. In addition, larger experiments are carried out over longer periods of time, requiring strategies to handle the sometimes large shifts in retention time that are often observed. We present a complete proof-of-principle software stack that automates most aspects of selected reaction monitoring (SRM) workflows, a targeted proteomics technology. The software allows experiments to be easily designed and carried out. The steps automated are the generation of assays, generation of mass spectrometry driver files and method files, and the import and analysis of the data. All data are normalized to a common retention time scale, the data are then scored using a novel score model, and the error is subsequently estimated. We also show that selected reaction monitoring can be used for label-free quantification. All data generated are stored in a relational database, and the growing resource further facilitates the design of new experiments. We apply the technology to a large-scale experiment studying how Streptococcus pyogenes remodels its proteome upon stimulation with human plasma.
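The retention-time normalization step can be sketched as a least-squares fit mapping observed retention times of shared reference peptides onto a common scale; the dictionaries, peptides, and function name are illustrative assumptions, not the described software's actual assay or scoring model.

```python
def fit_rt_normalization(reference_rt, observed_rt):
    """Fit observed retention times to a common scale by simple least squares.

    Both arguments map reference peptide -> retention time; the returned
    function converts any observed time into the common scale, so runs
    acquired weeks apart (with drifting gradients) become comparable.
    Assumes at least two shared peptides with distinct retention times.
    """
    shared = sorted(set(reference_rt) & set(observed_rt))
    xs = [observed_rt[p] for p in shared]
    ys = [reference_rt[p] for p in shared]
    n = len(shared)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda rt: slope * rt + intercept

# Toy example: map a new observed time onto the common scale.
to_common = fit_rt_normalization(
    reference_rt={"AEFVEVTK": 20.0, "YLYEIAR": 45.0, "LVNELTEFAK": 70.0},
    observed_rt={"AEFVEVTK": 23.5, "YLYEIAR": 49.0, "LVNELTEFAK": 75.5})
print(round(to_common(60.0), 1))
```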

18.
Clustering millions of tandem mass spectra
Tandem mass spectrometry (MS/MS) experiments often generate redundant data sets containing multiple spectra of the same peptides. Clustering of MS/MS spectra takes advantage of this redundancy by identifying multiple spectra of the same peptide and replacing them with a single representative spectrum. Analyzing only representative spectra results in significant speed-up of MS/MS database searches. We present an efficient clustering approach for analyzing large MS/MS data sets (over 10 million spectra) with a capability to reduce the number of spectra submitted to further analysis by an order of magnitude. The MS/MS database search of clustered spectra results in fewer spurious hits to the database and increases the number of peptide identifications as compared to regular nonclustered searches. Our open source software MS-Clustering is available for download at http://peptide.ucsd.edu or can be run online at http://proteomics.bioprojects.org/MassSpec.
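A toy version of the underlying idea, greedy single-pass clustering of spectra by normalized dot product so that only one representative per cluster is searched, is sketched below; the binning, similarity threshold, and representative choice are simplifications, not MS-Clustering's actual algorithm.

```python
import math

def cosine_similarity(spec_a, spec_b, bin_width=1.0):
    """Normalized dot product between two spectra given as (mz, intensity) lists."""
    def binned(spec):
        vec = {}
        for mz, inten in spec:
            key = round(mz / bin_width)          # coarse m/z binning
            vec[key] = vec.get(key, 0.0) + inten
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        return {k: v / norm for k, v in vec.items()}
    a, b = binned(spec_a), binned(spec_b)
    return sum(a[k] * b.get(k, 0.0) for k in a)

def greedy_cluster(spectra, threshold=0.7):
    """Greedy single-pass clustering of MS/MS spectra.

    Each spectrum joins the first cluster whose representative it resembles
    (similarity >= threshold); otherwise it founds a new cluster. Only the
    representative of each cluster needs to be submitted to the database search.
    """
    clusters = []  # list of (representative_spectrum, member_indices)
    for idx, spectrum in enumerate(spectra):
        for rep, members in clusters:
            if cosine_similarity(spectrum, rep) >= threshold:
                members.append(idx)
                break
        else:
            clusters.append((spectrum, [idx]))
    return clusters
```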

19.
Mass measurement is the main outcome of mass spectrometry-based proteomics, yet the potential of recent advances in accurate mass measurements remains largely unexploited. There is not even a clear definition of mass accuracy in the proteomics literature, and we identify at least three uses of this term: anecdotal mass accuracy, statistical mass accuracy, and the maximum mass deviation (MMD) allowed in a database search. We suggest using the second of these terms as the generic one. To make the best use of the mass precision offered by modern instruments, we propose a series of simple steps involving recalibration of the data on "internal standards" contained in every proteomics data set. Each data set should be accompanied by a plot of mass errors from which the appropriate MMD can be chosen. More advanced uses of high mass accuracy include an MMD that depends on the signal abundance of each peptide. Adapting search engines to high mass accuracy in the MS/MS data is also a high priority. Proper use of high mass accuracy data can make MS-based proteomics one of the most "digital" and accurate post-genomics disciplines.
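The recalibration-on-internal-standards procedure proposed here can be sketched in a few lines: estimate the systematic ppm shift from confidently identified peptides, subtract it, and use the spread of the residual errors to pick a maximum mass deviation (MMD). The constant-offset model and the 2-standard-deviation MMD rule are simplifying assumptions for illustration, not the paper's exact procedure.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Relative mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6

def recalibrate(observed_mzs, standards):
    """Recalibrate measured m/z values using confidently identified peptides.

    `standards` is a list of (observed_mz, theoretical_mz) pairs taken from
    high-scoring identifications in the same data set ("internal standards").
    A constant ppm offset is estimated and removed from every measurement; the
    remaining error distribution then suggests a suitable MMD for the search.
    """
    offsets = [ppm_error(obs, theo) for obs, theo in standards]
    systematic = sum(offsets) / len(offsets)            # mean systematic shift (ppm)
    corrected = [mz * (1 - systematic / 1e6) for mz in observed_mzs]
    residual = [o - systematic for o in offsets]
    mmd_ppm = 2 * (sum(r * r for r in residual) / len(residual)) ** 0.5  # ~2 SD window
    return corrected, systematic, mmd_ppm

# Toy example: three internal standards showing a ~5 ppm systematic shift.
corrected, shift, mmd = recalibrate(
    observed_mzs=[500.2512, 812.4107],
    standards=[(500.2512, 500.2487), (812.4107, 812.4071), (650.3321, 650.3290)])
```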

20.
Recent technological advances have made it possible to identify and quantify thousands of proteins in a single proteomics experiment. As a result of these developments, the analysis of data has become the bottleneck of proteomics experiments. To provide the proteomics community with a user-friendly platform for comprehensive analysis, inspection and visualization of quantitative proteomics data, we developed the Graphical Proteomics Data Explorer (GProX). The program requires no special bioinformatics training, as all functions of GProX are accessible within its user-friendly graphical interface, which will be intuitive to most users. Basic features facilitate the uncomplicated management and organization of large data sets and complex experimental setups as well as the inspection and graphical plotting of quantitative data. These are complemented by readily available high-level analysis options such as database querying, clustering based on abundance ratios, feature enrichment tests (e.g., for GO terms) and pathway analysis tools. A number of plotting options for visualization of quantitative proteomics data are available, and most analysis functions in GProX create customizable high quality graphical displays in both vector and bitmap formats. The generic import requirements allow data originating from essentially all mass spectrometry platforms, quantitation strategies and software to be analyzed in the program. GProX represents a powerful approach to proteomics data analysis, providing proteomics experimenters with a toolbox for bioinformatics analysis of quantitative proteomics data. The program is released as open-source and can be freely downloaded from the project webpage at http://gprox.sourceforge.net.
