Abundance-based Classifier for the Prediction of Mass Spectrometric Peptide Detectability Upon Enrichment (PPA)
Authors:Jan Muntel, Sarah A Boswell, Shaojun Tang, Saima Ahmed, Ilan Wapinski, Greg Foley, Hanno Steen, Michael Springer
Institution:From the ‡Departments of Pathology, Boston Children's Hospital and Harvard Medical School, Boston, MA; §Department of Systems Biology, Harvard Medical School, Boston, MA
Abstract:The function of a large percentage of proteins is modulated by post-translational modifications (PTMs). Currently, mass spectrometry (MS) is the only proteome-wide technology that can identify PTMs. Unfortunately, failure to detect a PTM by MS does not prove that the modification is absent. Peptide detectability varies significantly, making MS potentially blind to a large fraction of peptides. Learning from published algorithms, which generally focus on predicting the most detectable peptides, we developed a tool that incorporates protein abundance into the peptide prediction algorithm with the aim of determining the detectability of every peptide within a protein. We tested our tool, “Peptide Prediction with Abundance” (PPA), on data sets acquired in-house as well as on published data sets from other groups acquired on different instrument platforms. Incorporating protein abundance into the prediction allows us to assess not only the detectability of all peptides but also whether a peptide of interest is likely to become detectable upon enrichment. We validated the ability of our tool to predict changes in protein detectability with a dilution series of 31 purified proteins at several different concentrations. PPA correctly predicted the concentration-dependent peptide detectability in 78% of the cases, demonstrating its utility for predicting the protein enrichment needed to observe a peptide of interest in targeted experiments. This is especially important in the analysis of PTMs. PPA is available as a web-based or executable package that can work with generally applicable defaults or be retrained from a pilot MS data set.

Post-translational modification (PTM) of proteins is a key regulatory mechanism in the vast majority of biological processes. Historically, to follow PTMs, site-specific antibodies had to be generated in a time-consuming and laborious process associated with high failure rates.
Mass spectrometry (MS) holds enormous promise in PTM analysis, as it is currently the only technique able to discover, localize, and quantify modifications proteome-wide (1). Recent advances in instrumentation and method optimization make it possible to detect the complete yeast proteome within one hour (2), an ever-increasing proportion of the human proteome (3-6), and more than 10,000 phosphorylation sites in a single MS experiment (7, 8). As a result, one of the major publicly available databases (www.phosphosite.org (9)) has curated >200,000 phosphorylation sites.

Although the number of proteins and PTMs that can be identified is impressive, many modifications have still not been identified in any MS-based experiment. The identification and quantification of biologically relevant modifications is challenging for three reasons: (1) many proteins of interest are of very low abundance, rendering them difficult to detect and quantify; (2) many modification sites are present at substoichiometric quantities, further reducing their detectability; and (3) as large-scale proteomics is based on the detection of peptides after a proteolytic digest, and the detectability of a peptide is determined by its physicochemical properties (10), many peptides from highly abundant proteins are never detected. This is particularly important, as there is a shift in the use of MS-based proteomics from large-scale, unbiased, discovery-focused experiments toward directed experiments for accurate and precise quantification of biologically relevant PTMs.
Protein and peptide enrichment strategies and/or targeted MS experiments such as selected reaction monitoring (SRM) (11) have increased the number of detectable peptides; however, both of these methods are laborious and often unsuccessful: the peptide carrying the modification of interest is still not observed because it is fundamentally very difficult to detect.

Protein enrichment is the method of choice for most experimentalists, but there is currently no way to determine whether it is likely to succeed before engaging in lengthy biochemical and/or analytical experiments. To gauge the chances of success, we sought to develop an algorithm that can predict both the chance of detecting a particular peptide and, more importantly, how much enrichment it would take to detect a peptide that is not easily detected. Here we present such a tool, which predicts the detectability and estimates an enrichment factor, i.e., the increase in signal over the background that is necessary to actually detect a particular peptide. Our algorithm development was motivated by two premises: (1) In silico methods have been developed that focus on the prediction of easily detectable “proteotypic” peptides (peptides that are likely to provide the best detection sensitivity) with good accuracy (12-15). (2) Comprehensive proteome studies have shown that the number of detected peptides per protein, and thus the sequence coverage, varies with protein abundance (which is the basis for spectral counting-based protein quantification (16, 17)). We find that incorporating protein abundance into a peptide classification tool improves the accuracy of the prediction of peptide detectability, allowing us to predict the detectability of all peptides within a protein as well as the amount of enrichment needed to detect a peptide of interest.

We used a set of 120 purified in vitro expressed proteins as a training set to develop a prediction tool.
We deliver this in the form of a web-based interface that provides information about: (1) the probability of detecting the different tryptic peptides of a protein, and (2) the fold enrichment that would be required to bring a peptide of interest into the detectable range. This tool will help guide researchers in their efforts to monitor particular peptides and their modified cognates by MS, specifically by prioritizing the enrichment of proteins for which a peptide or modification of interest is likely to become detectable.
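
The relationship between the two outputs described above (a detection probability and a required fold enrichment) can be illustrated with a minimal sketch. This is not the PPA algorithm: the logistic form, the function names, the weights, and the 0.5 threshold are all assumptions made for illustration. It assumes a classifier whose score is a sequence-derived term plus a term proportional to log10 protein abundance, and solves for the abundance increase that lifts a peptide to the chosen threshold.

```python
import math


def detection_probability(peptide_score, log10_abundance,
                          w_abundance=1.0, bias=0.0):
    """Hypothetical logistic model of peptide detectability.

    peptide_score   -- sequence-derived score (assumed, higher = easier to detect)
    log10_abundance -- log10 of the protein abundance (arbitrary units)
    w_abundance     -- assumed weight of the abundance term
    """
    z = peptide_score + w_abundance * log10_abundance + bias
    return 1.0 / (1.0 + math.exp(-z))


def fold_enrichment_needed(peptide_score, log10_abundance, threshold=0.5,
                           w_abundance=1.0, bias=0.0):
    """Smallest fold increase in abundance that reaches `threshold`.

    Returns 1.0 when the peptide is already at or above the threshold.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    if w_abundance <= 0.0:
        raise ValueError("w_abundance must be positive")
    # Invert the logistic function to get the score needed at the threshold.
    target_z = math.log(threshold / (1.0 - threshold))
    current_z = peptide_score + w_abundance * log10_abundance + bias
    # The abundance term is in log10 units, so the gap converts to a fold change.
    delta_log10 = (target_z - current_z) / w_abundance
    return max(1.0, 10.0 ** delta_log10)


if __name__ == "__main__":
    score, abundance = -2.0, 0.0  # made-up values for a hard-to-detect peptide
    print(detection_probability(score, abundance))
    print(fold_enrichment_needed(score, abundance))
```

Under these assumptions, a peptide whose score sits two units below the threshold needs its protein enriched by two orders of magnitude, while a peptide already above the threshold needs no enrichment. A model retrained on a pilot data set would supply its own weights in place of the placeholder defaults used here.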