The identification of peptides and proteins from fragmentation mass spectra is a very common approach in the field of proteomics. Contemporary high-throughput peptide identification pipelines can quickly produce large quantities of MS/MS data that contain valuable knowledge about the actual physicochemical processes involved in the peptide fragmentation process, which can be extracted through extensive data mining studies. As these studies attempt to exploit the intensity information contained in the MS/MS spectra, a critical step required for a meaningful comparison of this information between MS/MS spectra is peak intensity normalization. We here describe a procedure for quantifying the efficiency of different published normalization methods in terms of the quartile coefficient of dispersion (qcod) statistic. The quartile coefficient of dispersion is applied to measure the dispersion of the peak intensities between redundant MS/MS spectra, allowing the quantification of the differences in computed peak intensity reproducibility between the different normalization methods. We demonstrate that our results are independent of the data set used in the evaluation procedure, allowing us to provide generic guidance on the choice of normalization method to apply in a certain MS/MS pipeline application.
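As an illustration of the statistic named in this abstract, the following is a minimal sketch of the quartile coefficient of dispersion, (Q3 - Q1) / (Q3 + Q1), applied to one fragment peak across redundant spectra. The base-peak normalization, the function names, and the toy intensities are assumptions made for the example; they are not taken from the paper.

```python
from statistics import quantiles


def qcod(values):
    """Quartile coefficient of dispersion: (Q3 - Q1) / (Q3 + Q1)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    if q1 + q3 == 0:
        raise ValueError("qcod is undefined when Q1 + Q3 is zero")
    return (q3 - q1) / (q3 + q1)


def normalize_to_base_peak(intensities):
    """Hypothetical normalization: divide every peak by the most intense peak."""
    top = max(intensities)
    return [x / top for x in intensities]


def per_peak_qcod(spectra):
    """qcod of each peak position across redundant spectra (peaks aligned by index)."""
    return [qcod(column) for column in zip(*spectra)]


# Toy data (assumed): three redundant spectra of the same peptide,
# differing only by an overall intensity scale.
raw = [[100, 50, 25], [200, 100, 50], [400, 200, 100]]
normalized = [normalize_to_base_peak(s) for s in raw]

print(per_peak_qcod(raw))         # dispersion before normalization
print(per_peak_qcod(normalized))  # dispersion after normalization
```

A lower qcod after normalization indicates that the method makes peak intensities more reproducible between redundant spectra, which is the comparison the abstract describes.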

High‐resolution MS/MS spectra of peptides can be deisotoped to identify monoisotopic masses of peptide fragments. The use of such masses should improve protein identification rates. However, deisotoping is not universally used and its benefits have not been fully explored. Here, MS2‐Deisotoper, a tool for use prior to database search, is used to identify monoisotopic peaks in centroided MS/MS spectra. MS2‐Deisotoper works by comparing the mass and relative intensity of each peptide fragment peak to every other peak of greater mass, and by applying a set of rules concerning mass and intensity differences. After comprehensive parameter optimization, it is shown that MS2‐Deisotoper can improve the number of peptide spectrum matches (PSMs) identified by up to 8.2% and proteins by up to 2.8%. It is effective with SILAC and non‐SILAC MS/MS data. The identification of unique peptide sequences is also improved, increasing the number of human proteoforms by 3.7%. Detailed investigation of results shows that deisotoping increases Mascot ion scores, improves FDR estimation for PSMs, and leads to greater protein sequence coverage. At a peptide level, it is found that the efficacy of deisotoping is affected by peptide mass and charge. MS2‐Deisotoper can be used via a user interface or as a command‐line tool.

To interpret LC-MS/MS data in proteomics, most popular protein identification algorithms primarily use predicted fragment m/z values to assign peptide sequences to fragmentation spectra. The intensity information is often undervalued, because it is not as easy to predict and incorporate into algorithms. Nevertheless, the use of intensity to assist peptide identification is an attractive prospect and can potentially improve the confidence of matches and generate more identifications. On the basis of our previously reported study of fragmentation intensity patterns, we developed a protein identification algorithm, SeQuence IDentification (SQID), that makes use of the coarse intensity from a statistical analysis. The scoring scheme was validated by comparing with Sequest and X!Tandem using three data sets, and the results indicate an improvement in the number of identified peptides, including unique peptides that are not identified by Sequest or X!Tandem. The software and source code are available under the GNU GPL license at http://quiz2.chem.arizona.edu/wysocki/bioinformatics.htm.

A major limitation in identifying peptides from complex mixtures by shotgun proteomics is the ability of search programs to accurately assign peptide sequences using mass spectrometric fragmentation spectra (MS/MS spectra). Manual analysis is used to assess borderline identifications; however, it is error-prone and time-consuming, and criteria for acceptance or rejection are not well defined. Here we report a Manual Analysis Emulator (MAE) program that evaluates results from search programs by implementing two commonly used criteria: 1) consistency of fragment ion intensities with predicted gas phase chemistry and 2) whether a high proportion of the ion intensity (proportion of ion current (PIC)) in the MS/MS spectra can be derived from the peptide sequence. To evaluate chemical plausibility, MAE utilizes similarity (Sim) scoring against theoretical spectra simulated by MassAnalyzer software (Zhang, Z. (2004) Prediction of low-energy collision-induced dissociation spectra of peptides. Anal. Chem. 76, 3908-3922) using known gas phase chemical mechanisms. The results show that Sim scores provide significantly greater discrimination between correct and incorrect search results than achieved by Sequest XCorr scoring or Mascot Mowse scoring, allowing reliable automated validation of borderline cases. To evaluate PIC, MAE simplifies the DTA text files summarizing the MS/MS spectra and applies heuristic rules to classify the fragment ions. MAE output also provides data mining functions, which are illustrated by using PIC to identify spectral chimeras, where two or more peptide ions were sequenced together, as well as cases where fragmentation chemistry is not well predicted.
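The proportion of ion current (PIC) criterion mentioned in this abstract can be sketched as the fraction of total peak intensity explained by fragments predicted from the candidate sequence. The sketch below is a simplified illustration only: the function name, the fixed m/z tolerance, and the toy peak list are assumptions, and MAE's heuristic rules for classifying fragment ions are not reproduced.

```python
def proportion_of_ion_current(peaks, fragment_mzs, tol=0.5):
    """Fraction of total intensity carried by peaks within `tol` of a predicted fragment m/z.

    peaks        -- list of (m/z, intensity) pairs from one MS/MS spectrum
    fragment_mzs -- predicted fragment m/z values for the candidate peptide (assumed given)
    tol          -- matching tolerance in m/z units (hypothetical default)
    """
    total = sum(intensity for _, intensity in peaks)
    if total == 0:
        return 0.0
    explained = sum(
        intensity
        for mz, intensity in peaks
        if any(abs(mz - f) <= tol for f in fragment_mzs)
    )
    return explained / total


# Toy spectrum (assumed values): two peaks match predicted fragments, one does not.
peaks = [(175.1, 30.0), (300.2, 50.0), (450.0, 20.0)]
predicted = [175.119, 300.16]
print(proportion_of_ion_current(peaks, predicted))
```

A low value for an otherwise confident match is the kind of signal the abstract associates with chimeric spectra, where a second co-fragmented peptide contributes unexplained intensity.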

We report an isotope labeling shotgun proteome analysis strategy to validate the spectrum-to-sequence assignments generated by using sequence-database searching for the construction of a more reliable MS/MS spectral library. This strategy is demonstrated in the analysis of the E. coli K12 proteome. In the workflow, E. coli cells were cultured in normal and ¹⁵N-enriched media. The differentially labeled proteins from the cell extracts were subjected to trypsin digestion and two-dimensional liquid chromatography quadrupole time-of-flight tandem mass spectrometry (2D-LC QTOF MS/MS) analysis. The MS/MS spectra of the two samples were individually searched using Mascot against the E. coli proteome database to generate lists of peptide sequence matches. The two data sets were compared by overlaying the spectra of unlabeled and labeled matches of the same peptide sequence for validation. Two cutoff filters, one based on the number of common fragment ions and another one on the similarity of intensity patterns among the common ions, were developed and applied to the overlaid spectral pairs to reject the low quality or incorrectly assigned spectra. By examining 257,907 and 245,156 spectra acquired from the unlabeled and ¹⁵N-labeled samples, respectively, an experimentally validated MS/MS spectral library of tryptic peptides was constructed for E. coli K12 that consisted of 9,302 unique spectra with unique sequence and charge state, representing 7,763 unique peptide sequences. This E. coli spectral library could be readily expanded, and the overall strategy should be applicable to other organisms. Even with this relatively small library, it was shown that more peptides could be identified with higher confidence using the spectral search method than by sequence-database searching.

Tandem mass spectrometry (MS/MS) has emerged as a cornerstone of proteomics owing in part to robust spectral interpretation algorithms. Widely used algorithms do not fully exploit the intensity patterns present in mass spectra. Here, we demonstrate that intensity pattern modeling improves peptide and protein identification from MS/MS spectra. We modeled fragment ion intensities using a machine-learning approach that estimates the likelihood of observed intensities given peptide and fragment attributes. From 1,000,000 spectra, we chose 27,000 with high-quality, nonredundant matches as training data. Using the same 27,000 spectra, intensity was similarly modeled with mismatched peptides. We used these two probabilistic models to compute the relative likelihood of an observed spectrum given that a candidate peptide is matched or mismatched. We used a 'decoy' proteome approach to estimate incorrect match frequency, and demonstrated that an intensity-based method reduces peptide identification error by 50-96% without any loss in sensitivity.

The promise of mass spectrometry as a tool for probing signal-transduction is predicated on reliable identification of post-translational modifications. Phosphorylations are key mediators of cellular signaling, yet are hard to detect, partly because of unusual fragmentation patterns of phosphopeptides. In addition to being accurate, MS/MS identification software must be robust and efficient to deal with increasingly large spectral data sets. Here, we present a new scoring function for the Inspect software for phosphorylated peptide tandem mass spectra for ion-trap instruments, without the need for manual validation. The scoring function was modeled by learning fragmentation patterns from 7677 validated phosphopeptide spectra. We compare our algorithm against SEQUEST and X!Tandem on testing and training data sets. At a 1% false positive rate, Inspect identified the greatest total number of phosphorylated spectra, 13% more than SEQUEST and 39% more than X!Tandem. Spectra identified by Inspect tended to score better in several spectral quality measures. Furthermore, Inspect runs much faster than either SEQUEST or X!Tandem, making desktop phosphoproteomics feasible. Finally, we used our new models to reanalyze a corpus of 423,000 LTQ spectra acquired for a phosphoproteome analysis of Saccharomyces cerevisiae DNA damage and repair pathways and discovered 43% more phosphopeptides than the previous study.

Independent of the approach used, the ability to correctly interpret tandem MS data depends on the quality of the original spectra. Even in the case of the highest quality spectra, the majority of spectral peaks cannot be reliably interpreted. The accuracy of sequencing algorithms can be improved by filtering out such 'noise' peaks. Preprocessing MS/MS spectra to select informative ion peaks increases accuracy and reduces the processing time. Intuitively, the mix of informative versus non-informative peaks has a direct effect on the quality and size of the resulting candidate peptide search space. As the number of selected peaks increases, the corresponding search space increases exponentially. If we select too few peaks, then the ion-ladder interpretation of the spectrum will contain gaps that can only be explained by permutations of combinations of amino acids. This will result in a larger candidate peptide search space and poorer quality candidates. The dependency that peptide sequencing accuracy has on an initial peak selection regime makes this preprocessing step a crucial facet of any approach, whether de novo or not, to MS/MS spectra interpretation. We have developed a novel approach to address this problem. Our approach uses a staged neural network to model ion fragmentation patterns and estimate the posterior probability of each ion type. Our method improves upon other preprocessing techniques and shows a significant reduction in the search space for candidate peptides without sacrificing candidate peptide quality.

An important step in mass spectrometry (MS)-based proteomics is the identification of peptides by their fragment spectra. Regardless of the identification score achieved, almost all tandem-MS (MS/MS) spectra contain remaining peaks that are not assigned by the search engine. These peaks may be explainable by human experts but the scale of modern proteomics experiments makes this impractical. In computer science, Expert Systems are a mature technology to implement a list of rules generated by interviews with practitioners. We here develop such an Expert System, making use of literature knowledge as well as a large body of high mass accuracy and pure fragmentation spectra. Interestingly, we find that even with high mass accuracy data, rule sets can quickly become too complex, leading to over-annotation. Therefore we establish a rigorous false discovery rate, calculated by random insertion of peaks from a large collection of other MS/MS spectra, and use it to develop an optimized knowledge base. This rule set correctly annotates almost all peaks of medium or high abundance. For high resolution HCD data, median intensity coverage of fragment peaks in MS/MS spectra increases from 58% by search engine annotation alone to 86%. The resulting annotation performance surpasses a human expert, especially on complex spectra such as those of larger phosphorylated peptides. Our system is also applicable to high resolution collision-induced dissociation data. It is available both as a part of MaxQuant and via a webserver that only requires an MS/MS spectrum and the corresponding peptide sequence, and which outputs publication quality, annotated MS/MS spectra (www.biochem.mpg.de/mann/tools/). It provides expert knowledge to beginners in the field of MS-based proteomics and helps advanced users to focus on unusual and possibly novel types of fragment ions.
In MS-based proteomics, peptides are matched to peptide sequences in databases using search engines (1–3). Statistical criteria are established for accepted versus rejected peptide spectra matches based on the search engine score, and usually a 99% certainty is required for reported peptides. The search engines typically only take sequence specific backbone fragmentation into account (i.e. a, b, and y ions) and some of their neutral losses. However, tandem mass spectra, especially of larger peptides, can be quite complex and contain a number of medium or even high abundance peptide fragments that are not annotated by the search engine result. This can result in uncertainty for the user, especially if only relatively few peaks are annotated, because it may reflect an incorrect identification. However, the most common cause of unlabeled peaks is that another peptide was present in the precursor selection window and was cofragmented. This has variously been termed “chimeric spectra” (4–6), or the problem of low precursor ion fraction (PIF) (7). Such spectra may still be identifiable with high confidence. The Andromeda search engine in MaxQuant, for instance, attempts to identify a second peptide in such cases (8, 9). However, even “pure” spectra (those with a high PIF) often still contain many unassigned peaks. These can be caused by different fragment types, such as internal ions, single or combined neutral losses as well as immonium and other ion types in the low mass region. A mass spectrometric expert can assign many or all of these peaks, based on expert knowledge of fragmentation and manual calculation of fragment masses, resulting in a higher degree of confidence for the identification. However, there are more and more practitioners of proteomics without in-depth training or experience in annotating MS/MS spectra, and such annotation would in any case be prohibitive for hundreds of thousands of spectra. Furthermore, even human experts may wrongly annotate a given peak, especially with low mass accuracy tandem mass spectra, or fail to consider every possibility that could have resulted in this fragment mass.
Given the desirability of annotating fragment peaks to the highest degree possible, we turned to “Expert Systems,” a well-established technology in computer science. Expert Systems achieved prominence in the 1970s and 1980s and were meant to solve complex problems by reasoning about knowledge (10, 11). Interestingly, one of the first examples was developed by Nobel Prize winner Joshua Lederberg more than 40 years ago, and dealt with the interpretation of mass spectrometric data. The program's name was Heuristic DENDRAL (12), and it was capable of interpreting the mass spectra of aliphatic ethers and their fragments. The hypotheses produced by the program described molecular structures that are plausible explanations of the data. To infer these explanations from the data, the program incorporated a theory of chemical stability that provided limiting constraints as well as heuristic rules.
In general, the aim of an Expert System is to encode knowledge extracted from professionals in the field in question. This then powers a rule-based system that can be applied broadly and in an automated manner. A rule-based Expert System represents the information obtained from human specialists in the form of IF-THEN rules. These are used to perform operations on input data to reach an appropriate conclusion. A generic Expert System is essentially a computer program that provides a framework for performing a large number of inferences in a predictable way, using forward or backward chains, backtracking, and other mechanisms (13). Therefore, in contrast to statistics based learning, the “expert program” does not know what it knows through the raw volume of facts in the computer's memory. Instead, like a human expert, it relies on a reasoning-like process of applying an empirically derived set of rules to the data.
Here we implemented an Expert System for the interpretation of high mass accuracy tandem mass spectrometry data of peptides. It was developed in an iterative manner together with human experts on peptide fragmentation, using the published literature on fragmentation pathways as well as large data sets of higher-energy collisional dissociation (HCD) (14) and collision-induced dissociation (CID) based peptide identifications. Our goal was to achieve an annotation performance similar to or better than that of experienced mass spectrometrists (15), thus making comprehensively annotated peptide spectra available in large scale proteomics.

The dominant ions in MS/MS spectra of peptides, which have been fragmented by low-energy CID, are often b-, y-ions and their derivatives resulting from the cleavage of the peptide bonds. However, MS/MS spectra typically contain many more peaks. These can result not only from isotope variants and multiply charged replicates of the peptide fragmentation products but also from unknown fragmentation pathways, sample-specific or systematic chemical contaminations or from noise generated by the electronic detection system. The presence of this background complicates spectrum interpretation. Besides dramatically prolonged computation time, it can lead to incorrect protein identification, especially in the case of de novo sequencing algorithms. Here, we present an algorithm for detection and transformation of multiply charged peaks into singly charged monoisotopic peaks, removal of heavy isotope replicates, and random noise. A quantitative criterion for the recognition of some noninterpretable spectra has been derived as a byproduct. The approach is based on numerical spectral analysis and signal detection methods. The algorithm has been implemented in a stand-alone computer program called MS Cleaner that can be obtained from the authors upon request.

Hu Y, Li Y, Lam H. Proteomics 2011, 11(24): 4702-4711
Spectral library searching is a promising alternative to sequence database searching in peptide identification from MS/MS spectra. The key advantage of spectral library searching is the utilization of more spectral features to improve score discrimination between good and bad matches, and hence sensitivity. However, the coverage of reference spectral libraries is limited by current experimental and computational methods. We developed a computational approach to expand the coverage of spectral libraries with semi-empirical spectra predicted from perturbing known spectra of similar sequences, such as those with single amino acid substitutions. We hypothesized that peptides of similar sequences should produce similar fragmentation patterns, at least in most cases. Our results confirm our hypothesis and specify when this approach can be applied. In actual spectral searching of real data sets, the sensitivity advantage of spectral library searching over sequence database searching can be mostly retained even when all real spectra are replaced by semi-empirical ones. We demonstrated the applicability of this approach by detecting several known non-synonymous single-nucleotide polymorphisms in three large human data sets by spectral searching.

Time-consuming and experience-dependent manual validations of tandem mass spectra are usually applied to SEQUEST results. This inefficient method has become a significant bottleneck for MS/MS data processing. Here we introduce a program, AMASS (advanced mass spectrum screener), which can filter the tandem mass spectra of SEQUEST results by measuring the match percentage of high-abundance ions and the continuity of matched fragment ions in the b and y series. Compared with the Xcorr and DeltaCn filters, AMASS can increase the number of positives and reduce the number of negatives in 22 datasets generated from 18 known protein mixtures. It effectively removed most noisy spectra, false interpretations, and about half of the poor fragmentation spectra, and AMASS can work synergistically with the Rscore filter. We believe the use of AMASS and Rscore can result in a more accurate identification of peptide MS/MS spectra and reduce the time and energy for manual validation.

Systematic investigation of cellular processes by mass spectrometric detection of peptides obtained from protein digestion or directly from immuno-purification can be a powerful tool when used appropriately. The true sequence of these peptides is defined by the interpretation of spectral data using a variety of available algorithms. However, peptide match algorithm scoring is typically based on some, but not all, of the mechanisms of peptide fragmentation. Although algorithm rules for soft ionization techniques generally fit very well to tryptic peptides, manual validation of spectra is often required for endogenous peptides such as those from MHC class I molecules, where traditional trypsin digest techniques are not used. This study summarizes data mining and manual validation of hundreds of peptide sequences from MHC class I molecules in publicly available data files. We herein describe several important features to improve and quantify manual validation for these endogenous peptides after automated algorithm searching. Important fragmentation patterns are discussed for the studied MHC class I peptides. These findings lead to practical rules that are helpful when performing manual validation. Furthermore, these observations may be useful for improving current peptide search algorithms or developing novel software tools.

An Z, Chen Y, Koomen JM, Merkler DJ. Proteomics 2012, 12(2): 173-182
Amidation is a post-translational modification found at the C-terminus of ~50% of all neuropeptide hormones. Cleavage of the Cα-N bond of a C-terminal glycine yields the α-amidated peptide in a reaction catalyzed by peptidylglycine α-amidating monooxygenase (PAM). The mass of an α-amidated peptide decreases by 58 Da relative to its precursor. The amino acid sequences of an α-amidated peptide and its precursor differ only by the C-terminal glycine, meaning that the peptides exhibit similar RP-HPLC properties and tandem mass spectral (MS/MS) fragmentation patterns. Growth of cultured cells in the presence of a PAM inhibitor ensured the coexistence of α-amidated peptides and their precursors. A strategy was developed for precursor and α-amidated peptide pairing (PAPP): LC-MS/MS data of peptide extracts were scanned for peptide pairs that differed by 58 Da in mass, but had similar RP-HPLC retention times. The resulting peptide pairs were validated by checking for similar fragmentation patterns in their MS/MS data prior to identification by database searching or manual interpretation. This approach significantly reduced the number of spectra requiring interpretation, decreasing the computing time required for database searching and enabling manual interpretation of unidentified spectra. Reported here are the α-amidated peptides identified from AtT-20 cells using the PAPP method.
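The pairing step described in this abstract (peptides that differ by 58 Da in mass but have similar retention times) can be sketched as below. The `Feature` type, the tolerance values, and the toy data are hypothetical choices for the illustration and are not taken from the paper; the later validation by fragmentation-pattern similarity is not shown.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    name: str
    mass: float    # neutral peptide mass in Da
    rt_min: float  # RP-HPLC retention time in minutes


def find_pairs(features, mass_delta=58.0, mass_tol=0.05, rt_tol=1.0):
    """Return (precursor, amidated) name pairs whose masses differ by about
    `mass_delta` and whose retention times differ by at most `rt_tol`.
    All three thresholds are assumed defaults, not published values."""
    ordered = sorted(features, key=lambda f: f.mass)
    pairs = []
    for i, light in enumerate(ordered):
        for heavy in ordered[i + 1:]:
            diff = heavy.mass - light.mass
            if diff > mass_delta + mass_tol:
                break  # masses are sorted, so no later candidate can match
            if abs(diff - mass_delta) <= mass_tol and abs(heavy.rt_min - light.rt_min) <= rt_tol:
                pairs.append((heavy.name, light.name))
    return pairs


# Toy features (assumed values): one genuine pair, one mass match with a distant RT.
features = [
    Feature("A-Gly", 1058.55, 20.1),
    Feature("A-NH2", 1000.545, 20.4),
    Feature("B", 1058.56, 35.0),
    Feature("C", 900.0, 20.0),
]
print(find_pairs(features))
```

Sorting by mass lets the inner loop stop early, so only candidates inside the mass window are compared.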

Manual checking is commonly employed to validate the phosphopeptide identifications from database searching of tandem mass spectra. It is very time-consuming and labor-intensive as the number of phosphopeptide identifications increases greatly. In this study, a simple automatic validation approach was developed for phosphopeptide identification by combining consecutive stage mass spectrometry data and the target-decoy database searching strategy. Only phosphopeptides identified from both MS2 and its corresponding MS3 were accepted for further filtering, which greatly improved the reliability in phosphopeptide identification. Before database searching, the spectra were validated for charge state and neutral loss peak intensity, and then the invalid MS2/MS3 spectra were removed, which greatly reduced the database searching time. It was found that the sensitivity was significantly improved in the MS2/MS3 strategy, as the number of identified phosphopeptides was 2.5 times that obtained by the conventional filter-based MS2 approach. Because of the use of the target-decoy database, the false-discovery rate (FDR) of the identified phosphopeptides could be easily determined, and it was demonstrated that the determined FDR can precisely reflect the actual FDR without any manual validation stage.
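The target-decoy idea referred to in this abstract can be illustrated with a short sketch that estimates the FDR above a score threshold as the ratio of decoy to target matches. This is one common convention, chosen here as an assumption; the abstract does not state which formula the authors used, and the function names and toy scores are hypothetical.

```python
def fdr_at_threshold(psms, threshold):
    """Estimated FDR among matches scoring >= threshold, as decoys / targets.

    psms -- list of (score, is_decoy) pairs from a target-decoy search
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0 if decoys == 0 else float("inf")
    return decoys / targets


def score_cutoff(psms, max_fdr=0.01):
    """Lowest observed score whose estimated FDR is <= max_fdr, or None if none qualifies."""
    best = None
    for threshold in sorted({score for score, _ in psms}, reverse=True):
        if fdr_at_threshold(psms, threshold) <= max_fdr:
            best = threshold
    return best


# Toy search results (assumed values): four target hits and one decoy hit.
psms = [(90, False), (80, False), (70, False), (60, True), (50, False)]
print(fdr_at_threshold(psms, 60))
print(score_cutoff(psms, max_fdr=0.1))
```

Because decoy hits model random matches, the cutoff can be chosen automatically, which is the property the abstract relies on to avoid a manual validation stage.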

The applicability of a trypsin-based monolithic bioreactor coupled on-line with LC/MS/MS for rapid proteolytic digestion and protein identification is described here. Dilute samples are passed through the bioreactor for generation of proteolytic fragments in less than 10 min. After digestion and peptide separation, electrospray ionization tandem mass spectrometry is used to generate a peptide map and to identify proteolytic peptides by correlating their fragmentation spectra with amino acid sequences from a protein database. By digesting picomoles of proteins, sufficient data from ESI and MS/MS were obtained to unambiguously identify proteins alone and in serum samples. This approach was also extended to locate mutation sites in beta-lactoglobulin A and B variants.



A better understanding of the mechanisms involved in gas-phase fragmentation of peptides is essential for the development of more reliable algorithms for high-throughput protein identification using mass spectrometry (MS). Current methodologies depend predominantly on the use of derived m/z values of fragment ions, and the knowledge provided by the intensity information present in MS/MS spectra has not been fully exploited. Indeed, spectrum intensity information is very rarely utilized in the algorithms currently in use for high-throughput protein identification.

High-throughput proteomics experiments typically generate large amounts of peptide fragmentation mass spectra during a single experiment. There is often a substantial amount of redundant fragmentation of the same precursors among these spectra, which is usually considered a nuisance. We here discuss the potential of clustering and merging redundant spectra to turn this redundancy into a useful property of the dataset. To this end, we have created the first general-purpose, freely available open-source software application for clustering and merging MS/MS spectra. The application also introduces a novel approach to calculating the similarity of fragmentation mass spectra that takes into account the increased precision of modern mass spectrometers, and we suggest a simple but effective improvement to single-linkage clustering. The application and the novel algorithms are applied to several real-life proteomic datasets and the results are discussed. An analysis of the influence of the different algorithms available and their parameters is given, as well as a number of important applications of the overall approach.

Efficient peptide sequencing relies on both high quality MS/MS data acquisition and exhaustive knowledge of gas-phase dissociation mechanisms. We report our contribution to the elaboration of more comprehensive fragmentation models required for efficient automated MS/MS spectra interpretation. Following a statistical approach, various peptides (296 sequences of variable compositions and lengths) were prepared and subjected to low-energy collision-induced dissociation (CID) in an electrospray hybrid instrument (ESI-Q-q-Tof type mass spectrometer), a platform that has received relatively limited attention so far. In addition, our studies focused on low molecular weight singly charged peptides, which often fail to be identified by sequencing algorithms. Only half of the studied compounds showed charge-directed dissociations in accordance with the mobile proton model, producing fragment ions directly related to the primary sequence. For the peptides that did not exhibit the expected fragment ion series, alternative dissociation behaviors arising from complex rearrangements were observed.
