Similar articles
20 similar articles found
1.
The sequence tag-based peptide identification methods are a promising alternative to the traditional database search approach. However, a more comprehensive analysis, optimization, and comparison with established methods are necessary before these methods can gain widespread use in the proteomics community. Using the InsPecT open source code base (Tanner et al., Anal. Chem. 2005, 77, 4626-4639), we present an improved sequence tag generation method that directly incorporates multicharged fragment ion peaks present in many tandem mass spectra of higher charge states. We also investigate the performance of sequence tagging under different settings using control data sets generated on five different types of mass spectrometers, as well as using a complex phosphopeptide-enriched sample. We also demonstrate that additional modeling of InsPecT search scores using a semiparametric approach incorporating the accuracy of the precursor ion mass measurement provides additional improvement in the ability to discriminate between correct and incorrect peptide identifications. The overall superior performance of the sequence tag-based peptide identification method is demonstrated by comparison with a commonly used SEQUEST/PeptideProphet approach.
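A minimal Python sketch of the tag-generation idea, using a hypothetical helper `find_tags`: peaks whose mass differences match amino acid residue masses are chained into short tags, and doubly charged fragment peaks are first converted to singly charged equivalents so they can join the same ion ladder. The residue table is a subset, the 0.02 Da tolerance is an arbitrary assumption, and this is not InsPecT's actual implementation.

```python
PROTON = 1.007276  # proton mass, Da
RESIDUE = {  # monoisotopic residue masses (illustrative subset; I/L and K/Q omitted)
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "E": 129.04259, "F": 147.06841, "R": 156.10111,
}

def to_singly_charged(mz, charge):
    """Convert a fragment m/z observed at `charge` to its singly charged m/z."""
    return mz * charge - (charge - 1) * PROTON

def find_tags(peaks, length=3, tol=0.02):
    """Return residue strings of `length` read off gaps between peaks.

    peaks: list of (mz, charge) tuples; multiply charged peaks are normalized first.
    """
    masses = sorted(to_singly_charged(mz, z) for mz, z in peaks)
    # Directed edges i -> j labeled with a residue whose mass equals the gap
    edges = {}
    for i, low in enumerate(masses):
        for j in range(i + 1, len(masses)):
            gap = masses[j] - low
            for aa, mass in RESIDUE.items():
                if abs(gap - mass) <= tol:
                    edges.setdefault(i, []).append((j, aa))
    tags = set()

    def extend(node, seq):
        # Depth-first walk along residue edges until the tag is long enough
        if len(seq) == length:
            tags.add(seq)
            return
        for nxt, aa in edges.get(node, []):
            extend(nxt, seq + aa)

    for start in range(len(masses)):
        extend(start, "")
    return tags
```

With b1, b2 and b4 of a hypothetical peptide GASPK observed singly charged and b3 observed doubly charged, `find_tags` still recovers the tag `ASP`; without the charge conversion the b3 peak would break the ladder. Tags read from a y-ion ladder come out reversed.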

2.
The SwePep database is designed for endogenous peptides and mass spectrometry. It contains information about the peptides such as mass, pI, precursor protein and potential post-translational modifications. Here, we have improved and extended the SwePep database with tandem mass spectra by adding a locally curated version of the Global Proteome Machine Database (GPMDB). In peptidomic experiments, many peptide sequences are represented by multiple tandem mass spectra of differing quality. The new tandem mass spectra database in SwePep enables validation of low-quality spectra using high-quality tandem mass spectra. The validation is performed by comparing the fragmentation patterns of the two spectra using algorithms that calculate the correlation coefficient between the spectra. The present study is the first step in developing a tandem spectrum database for endogenous peptides that can be used for spectrum-to-spectrum identifications instead of peptide identifications using traditional protein sequence database searches.
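The abstract does not say which correlation measure SwePep uses. As an assumption, the sketch below bins both spectra into 1 m/z-wide bins up to m/z 2000 and computes a Pearson correlation coefficient between the binned intensity vectors; the bin width and range are illustrative choices.

```python
import math

def bin_spectrum(peaks, bin_width=1.0, max_mz=2000.0):
    """Sum (mz, intensity) peaks into fixed-width m/z bins."""
    vec = [0.0] * int(max_mz / bin_width)
    for mz, intensity in peaks:
        idx = int(mz / bin_width)
        if 0 <= idx < len(vec):
            vec[idx] += intensity
    return vec

def spectrum_correlation(a, b, **kw):
    """Pearson correlation coefficient between two binned spectra."""
    x, y = bin_spectrum(a, **kw), bin_spectrum(b, **kw)
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    if sx == 0 or sy == 0:
        return 0.0  # an empty or flat spectrum has no defined correlation
    return cov / (sx * sy)
```

A low-quality spectrum that correlates strongly with a curated high-quality spectrum of the same peptide would support its assignment; the acceptance cutoff would have to be chosen empirically.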

3.
Scherl A, Tsai YS, Shaffer SA, Goodlett DR. Proteomics, 2008, 8(14): 2791-2797
Although mass spectrometers are capable of providing high mass accuracy data, assignment of the true monoisotopic precursor ion mass is complicated during data-dependent ion selection for LC-MS/MS analysis of complex mixtures. The complication arises when chromatographic peak widths for a given analyte exceed the time required to acquire a precursor ion mass spectrum. The result is that many measured monoisotopic masses are misassigned because they are calculated from a single mass spectrum with poor ion statistics, based on only a fraction of the total available ions for a given analyte. Such data in turn produce errors in automated database searches, where the precursor m/z value is one search parameter. We propose here a postacquisition approach to correct misassigned monoisotopic m/z values that involves peak detection over the entire elution profile and correction of the precursor ion monoisotopic mass. By using this approach to reprocess shotgun proteomic data, we increased peptide sequence assignments by 10% while reducing the estimated false positive ratio from 1 to 0.2%. We also show that 4% of the salvaged identifications may be accounted for by correction of mixed tandem mass spectra resulting from fragmentation of multiple peptides simultaneously, a situation which we refer to as accidental CID.
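A minimal sketch of why pooling the elution profile helps, under stated assumptions: each survey scan is represented as a hypothetical mapping from isotope offset (relative to the peak picked for MS/MS) to intensity, intensities are summed over all scans of the profile, and the monoisotopic peak is taken as the lowest contiguous isotope whose pooled intensity exceeds 5% of the envelope maximum. The helper name, the input format and the 5% rule are illustrative, not the authors' algorithm.

```python
ISOTOPE_SPACING = 1.003355  # approximate 13C - 12C mass difference, Da

def correct_monoisotopic_mz(recorded_mz, charge, scans, min_fraction=0.05):
    """Re-derive the monoisotopic m/z from an isotope envelope pooled over scans.

    scans: list of dicts {isotope offset relative to recorded peak: intensity}.
    """
    pooled = {}
    for scan in scans:
        for offset, intensity in scan.items():
            pooled[offset] = pooled.get(offset, 0.0) + intensity
    floor = min_fraction * max(pooled.values())
    # Walk down the envelope while the next lower isotope is still present
    offset = 0
    while pooled.get(offset - 1, 0.0) >= floor:
        offset -= 1
    return recorded_mz + offset * ISOTOPE_SPACING / charge
```

In a single early scan the true monoisotopic peak may be missing, so the recorded 13C peak is kept; pooled over the profile, the lower isotope appears and the precursor m/z moves down by one isotope spacing divided by the charge.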

4.
Quantitative proteomics relies on accurate protein identification, which often is carried out by automated searching of a sequence database with tandem mass spectra of peptides. When these spectra contain limited information, automated searches may lead to incorrect peptide identifications. It is therefore necessary to validate the identifications by careful manual inspection of the mass spectra. Not only is this task time-consuming, but the reliability of the validation varies with the experience of the analyst. Here, we report a systematic approach to evaluating peptide identifications made by automated search algorithms. The method is based on the principle that the candidate peptide sequence should adequately explain the observed fragment ions. Also, the mass errors of neighboring fragments should be similar. To evaluate our method, we studied tandem mass spectra obtained from tryptic digests of E. coli and HeLa cells. Candidate peptides were identified with the automated search engine Mascot and subjected to the manual validation method. The method found correct peptide identifications that were given low Mascot scores (e.g., 20-25) and incorrect peptide identifications that were given high Mascot scores (e.g., 40-50). The method comprehensively detected false results from searches designed to produce incorrect identifications. Comparison of the tandem mass spectra of synthetic candidate peptides to the spectra obtained from the complex peptide mixtures confirmed the accuracy of the evaluation method. Thus, the evaluation approach described here could help boost the accuracy of protein identification, increase the number of peptides identified, and provide a step toward developing a more accurate next-generation algorithm for protein identification.
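The two principles above, that a candidate should explain the fragment ions and that neighboring fragments should have similar mass errors, can be sketched as follows. This is an assumption-laden toy: it considers only singly charged b ions, uses a small residue table, and `evaluate_match` with its 0.5 Da tolerance is a hypothetical helper, not the published method.

```python
PROTON = 1.007276  # Da
RESIDUE = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
           "V": 99.06841, "L": 113.08406, "E": 129.04259, "K": 128.09496}

def b_ions(peptide):
    """Singly charged b-ion m/z values b1..b(n-1) for a candidate peptide."""
    ions, total = [], PROTON
    for aa in peptide[:-1]:
        total += RESIDUE[aa]
        ions.append(total)
    return ions

def evaluate_match(peptide, observed_mz, tol=0.5):
    """Return (fraction of b ions explained, largest error jump between matched ions)."""
    theoretical = b_ions(peptide)
    errors = []  # signed mass errors in Da, in ion order
    for theo in theoretical:
        nearest = min(observed_mz, key=lambda mz: abs(mz - theo))
        if abs(nearest - theo) <= tol:
            errors.append(nearest - theo)
    explained = len(errors) / len(theoretical) if theoretical else 0.0
    jump = max((abs(a - b) for a, b in zip(errors, errors[1:])), default=0.0)
    return explained, jump
```

A correct assignment tends to show a high explained fraction and a consistent (possibly systematically offset) error, while a chance match shows errors that scatter from one fragment to the next.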

5.
Spectral libraries have emerged as a viable alternative to protein sequence databases for peptide identification. These libraries contain previously detected peptide sequences and their corresponding tandem mass spectra (MS/MS). Search engines can then identify peptides by comparing experimental MS/MS scans to those in the library. Many of these algorithms employ the dot product score for measuring the quality of a spectrum-spectrum match (SSM). This scoring system does not offer a clear statistical interpretation and ignores fragment ion m/z discrepancies in the scoring. We developed a new spectral library search engine, Pepitome, which employs statistical systems for scoring SSMs. Pepitome outperformed the leading library search tool, SpectraST, when analyzing data sets acquired on three different mass spectrometry platforms. We characterized the reliability of spectral library searches by confirming shotgun proteomics identifications through RNA-Seq data. Applying spectral library and database searches on the same sample revealed their complementary nature. Pepitome identifications enabled the automation of quality analysis and quality control (QA/QC) for shotgun proteomics data acquisition pipelines.
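For reference, a minimal sketch of the baseline normalized dot product that the abstract criticizes, assuming square-root intensity scaling and a 0.5 m/z matching window (both common but assumed choices). Note how any peak within the window contributes fully regardless of its m/z error, which is the limitation the abstract points out; this is not Pepitome's or SpectraST's code.

```python
import math

def normalized_dot_product(query, library, tol=0.5):
    """Cosine-style dot product between peak lists [(mz, intensity), ...].

    Each query peak pairs with at most one unused library peak within `tol`.
    """
    used, dot = set(), 0.0
    for qmz, qint in query:
        best = None
        for idx, (lmz, _) in enumerate(library):
            if idx in used or abs(lmz - qmz) > tol:
                continue
            if best is None or abs(lmz - qmz) < abs(library[best][0] - qmz):
                best = idx
        if best is not None:
            used.add(best)
            dot += math.sqrt(qint) * math.sqrt(library[best][1])
    # Norm of a sqrt-intensity vector is sqrt(sum of raw intensities)
    nq = math.sqrt(sum(i for _, i in query))
    nl = math.sqrt(sum(i for _, i in library))
    return dot / (nq * nl) if nq and nl else 0.0
```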

6.
Researchers have several options when designing proteomics experiments. Primary among these are choices of experimental method, instrumentation and spectral interpretation software. To evaluate these choices on a proteome scale, we compared triplicate measurements of the yeast proteome by liquid chromatography tandem mass spectrometry (LC-MS/MS) using linear ion trap (LTQ) and hybrid quadrupole time-of-flight (QqTOF; QSTAR) mass spectrometers. Acquired MS/MS spectra were interpreted with Mascot and SEQUEST algorithms with and without the requirement that all returned peptides be tryptic. Using a composite target-decoy database strategy, we selected scoring criteria yielding an estimated 1% false positive identifications at maximum sensitivity for all data sets, allowing reasonable comparisons between them. These comparisons indicate that Mascot and SEQUEST yield similar results for LTQ-acquired spectra but less so for QSTAR spectra. Furthermore, low reproducibility between replicate data acquisitions made on one or both instrument platforms can be exploited to increase sensitivity and confidence in large-scale protein identifications.
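Selecting "scoring criteria yielding 1% estimated false positive identifications at maximum sensitivity" can be sketched as a scan down the ranked scores. The sketch assumes one search against a composite (target plus decoy) database and the estimate FDR = 2 × decoys / (targets + decoys) associated with the Elias and Gygi composite strategy; `threshold_at_fdr` is a hypothetical helper.

```python
def composite_fdr(target_count, decoy_count):
    """Estimated FDR for a composite target+decoy search: 2 * D / (T + D)."""
    total = target_count + decoy_count
    return 2.0 * decoy_count / total if total else 0.0

def threshold_at_fdr(psms, max_fdr=0.01):
    """Lowest score cutoff whose estimated FDR is <= max_fdr.

    psms: list of (score, is_decoy). Returns (cutoff, accepted targets),
    or (None, 0) if no cutoff qualifies.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    best = (None, 0)
    targets = decoys = 0
    for i, (score, is_decoy) in enumerate(ranked):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Only evaluate a cutoff once every PSM tied at this score is counted
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        if composite_fdr(targets, decoys) <= max_fdr:
            best = (score, targets)
    return best
```

Keeping the lowest passing cutoff maximizes the number of accepted target identifications, which matches "maximum sensitivity" at a fixed estimated error rate.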

7.
Protein identifications with borderline statistical confidence are typically produced by matching a few marginal-quality MS/MS spectra to database peptide sequences, and they represent a significant bottleneck in the reliable and reproducible characterization of proteomes. Here, we present a method for rapid validation of borderline hits that circumvents the need for often-biased manual inspection of raw MS/MS spectra. The approach takes advantage of the independent interpretation of the corresponding MS/MS spectra by the PepNovo de novo sequencing software, followed by mass spectrometry-driven BLAST (MS BLAST) sequence-similarity database searches that use all candidate peptide sequences, including partially inaccurate, degenerate and redundant ones. In a case study involving the identification of more than 180 Caenorhabditis elegans proteins by nanoLC-MS/MS analysis on a linear ion trap LTQ mass spectrometer, the approach enabled rapid assignment (confirmation or rejection) of more than 70% of Mascot hits of borderline statistical confidence.

8.
Mass spectrometers that provide high mass accuracy such as FT-ICR instruments are increasingly used in proteomic studies. Although the importance of accurately determined molecular masses for the identification of biomolecules is generally accepted, its role in the analysis of shotgun proteomic data has not been thoroughly studied. To gain insight into this role, we used a hybrid linear quadrupole ion trap/FT-ICR (LTQ FT) mass spectrometer for LC-MS/MS analysis of a highly complex peptide mixture derived from a fraction of the yeast proteome. We applied three data-dependent MS/MS acquisition methods. The FT-ICR part of the hybrid mass spectrometer was either not exploited, used only for survey MS scans, or also used for acquiring selected ion monitoring scans to optimize mass accuracy. MS/MS data were assigned with the SEQUEST algorithm, and peptide identifications were validated by estimating the number of incorrect assignments using the composite target/decoy database search strategy. We developed a simple mass calibration strategy exploiting polydimethylcyclosiloxane background ions as calibrant ions. This strategy allowed us to substantially improve mass accuracy without reducing the number of MS/MS spectra acquired in an LC-MS/MS run. The benefits of high mass accuracy were greatest for assigning MS/MS spectra with low signal-to-noise ratios and for assigning phosphopeptides. Confident peptide identification rates from these data sets could be doubled by the use of mass accuracy information. It was also shown that improving mass accuracy at a cost to the MS/MS acquisition rate substantially lowered the sensitivity of LC-MS/MS analyses. The use of FT-ICR selected ion monitoring scans to maximize mass accuracy reduced the number of protein identifications by 40%.
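A minimal sketch of calibrating against a background ion, assuming a single calibrant and a constant relative (ppm) error across the m/z range. The calibrant value below, about m/z 445.1200 for the protonated cyclic hexamer (C2H6SiO)6, is our assumption and is not stated in the abstract; the 5 ppm filter tolerance is likewise illustrative.

```python
LOCK_MASS = 445.120025  # assumed m/z of protonated (C2H6SiO)6 background ion

def ppm_error(observed, theoretical):
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6

def recalibrate(mz_values, calibrant_observed, calibrant_theoretical=LOCK_MASS):
    """Remove a constant ppm offset estimated from one calibrant ion."""
    shift = ppm_error(calibrant_observed, calibrant_theoretical)
    # observed = true * (1 + shift * 1e-6), so divide the offset back out
    return [mz / (1 + shift * 1e-6) for mz in mz_values]

def within_tolerance(observed, theoretical, tol_ppm=5.0):
    """Precursor filter used when matching candidate peptides."""
    return abs(ppm_error(observed, theoretical)) <= tol_ppm
```

Tightening `tol_ppm` after recalibration is what lets mass accuracy discriminate between candidates, which is consistent with the abstract's report that it helps most for low signal-to-noise spectra and phosphopeptides.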

9.
Clustering millions of tandem mass spectra
Tandem mass spectrometry (MS/MS) experiments often generate redundant data sets containing multiple spectra of the same peptides. Clustering of MS/MS spectra takes advantage of this redundancy by identifying multiple spectra of the same peptide and replacing them with a single representative spectrum. Analyzing only representative spectra results in significant speed-up of MS/MS database searches. We present an efficient clustering approach for analyzing large MS/MS data sets (over 10 million spectra) with a capability to reduce the number of spectra submitted to further analysis by an order of magnitude. The MS/MS database search of clustered spectra results in fewer spurious hits to the database and increases the number of peptide identifications as compared to regular nonclustered searches. Our open source software MS-Clustering is available for download at http://peptide.ucsd.edu or can be run online at http://proteomics.bioprojects.org/MassSpec.
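The basic idea of replacing redundant spectra by a representative can be sketched with a simple greedy pass: each spectrum joins the first existing cluster whose representative has a close precursor m/z and a high cosine similarity, otherwise it starts a new cluster. This greedy scheme, the 1 m/z binning, and both thresholds are assumptions for illustration; MS-Clustering's actual algorithm is more elaborate.

```python
import math

def binned_unit_vector(peaks, bin_width=1.0):
    """Sparse {bin: weight} vector of a peak list, scaled to unit length."""
    vec = {}
    for mz, intensity in peaks:
        b = int(mz / bin_width)
        vec[b] = vec.get(b, 0.0) + intensity
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {b: v / norm for b, v in vec.items()} if norm else vec

def cosine(u, v):
    """Dot product of two unit-length sparse vectors."""
    return sum(w * v.get(b, 0.0) for b, w in u.items())

def greedy_cluster(spectra, threshold=0.8, prec_tol=0.05):
    """spectra: list of (precursor_mz, peaks). Returns lists of member indices;
    the first member of each cluster serves as its representative."""
    clusters = []  # entries: (representative vector, precursor m/z, members)
    for i, (prec, peaks) in enumerate(spectra):
        vec = binned_unit_vector(peaks)
        for rep_vec, rep_prec, members in clusters:
            if abs(rep_prec - prec) <= prec_tol and cosine(vec, rep_vec) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((vec, prec, [i]))
    return [members for _, _, members in clusters]
```

Only the representatives (the first index of each returned list) would then be submitted to the database search.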

10.
Reliable statistical validation of peptide and protein identifications is a top priority in large-scale mass spectrometry-based proteomics. PeptideProphet is one of the computational tools commonly used for assessing the statistical confidence in peptide assignments to tandem mass spectra obtained using database search programs such as SEQUEST, MASCOT, or X! TANDEM. We present two flexible methods, the variable component mixture model and the semiparametric mixture model, that remove the restrictive parametric assumptions in the mixture modeling approach of PeptideProphet. Using a control protein mixture data set generated on a linear ion trap Fourier transform (LTQ-FT) mass spectrometer, we demonstrate that both methods improve upon parametric models in terms of the accuracy of probability estimates and the power to detect correct identifications while controlling the false discovery rate to the same degree. The statistical approaches presented here require that the data set contain a sufficient number of decoy (known to be incorrect) peptide identifications, which can be obtained using the target-decoy database search strategy.
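To show the mixture-modeling idea that these methods generalize, here is a deliberately simplified sketch: an EM fit of two Gaussian components to search scores, with the higher-mean component treated as "correct". The Gaussian-Gaussian form is our simplification; it is neither PeptideProphet's parametric model nor the variable-component or semiparametric models of the abstract, and it does not use decoys.

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def fit_two_gaussians(scores, iterations=200):
    """Fit a two-component Gaussian mixture to scores by EM.

    Returns posterior(x) = P(correct | score x), where "correct" is the
    component with the higher fitted mean.
    """
    lo, hi = min(scores), max(scores)
    mu = [lo + 0.25 * (hi - lo), lo + 0.75 * (hi - lo)]
    sd = [max((hi - lo) / 4, 1e-3)] * 2
    w = [0.5, 0.5]

    def resp1(x):
        # E-step quantity: responsibility of component 1 for score x
        p0 = w[0] * normal_pdf(x, mu[0], sd[0])
        p1 = w[1] * normal_pdf(x, mu[1], sd[1])
        return p1 / (p0 + p1) if p0 + p1 > 0 else 0.5

    for _ in range(iterations):
        r1 = [resp1(x) for x in scores]
        # M-step: re-estimate weight, mean and spread of each component
        for k, r in ((0, [1.0 - g for g in r1]), (1, r1)):
            total = max(sum(r), 1e-12)
            w[k] = total / len(scores)
            mu[k] = sum(ri * x for ri, x in zip(r, scores)) / total
            var = sum(ri * (x - mu[k]) ** 2 for ri, x in zip(r, scores)) / total
            sd[k] = max(math.sqrt(var), 1e-3)

    correct_is_1 = mu[1] >= mu[0]
    return lambda x: resp1(x) if correct_is_1 else 1.0 - resp1(x)
```

When real score distributions are skewed or multimodal, a fixed parametric form like this misestimates the posteriors, which is the restriction the variable-component and semiparametric models are designed to remove.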

11.
Tandem mass spectrometry (MS/MS) combined with database searching is currently the most widely used method for high-throughput peptide and protein identification. Many different algorithms, scoring criteria, and statistical models have been used to identify peptides and proteins in complex biological samples, and many studies, including our own, describe the accuracy of these identifications using, at best, generic terms such as "high confidence." False positive identification rates for these criteria can vary substantially with the organisms under study, growth conditions, sequence databases, experimental protocols, and instrumentation; therefore, study-specific methods are needed to estimate the accuracy (false positive rates) of these peptide and protein identifications. We present and evaluate methods for estimating false positive identification rates based on searches of randomized databases (reversed and reshuffled). We examine the use of separate searches of a forward and then a randomized database, and combined searches of a randomized database appended to a forward sequence database. Estimated error rates from randomized database searches are first compared against actual error rates from MS/MS runs of known protein standards. These methods are then applied to biological samples of the model microorganism Shewanella oneidensis strain MR-1. Based on the results obtained in this study, we recommend the use of combined searches of a reshuffled database appended to a forward sequence database as a means of providing quantitative estimates of false positive identification rates of peptides and proteins. This will allow researchers to set criteria and thresholds to achieve a desired error rate and provide the scientific community with direct and quantifiable measures of peptide and protein identification accuracy, as opposed to vague assessments such as "high confidence."
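Building the recommended combined database, a randomized copy appended to the forward sequences, can be sketched as below. The `DECOY_` name prefix and the per-protein seeding are arbitrary assumptions; shuffling preserves each protein's amino acid composition, and reversal additionally preserves its length-dependent structure in mirror image.

```python
import random

def reversed_decoy(sequence):
    """Decoy made by reversing the protein sequence."""
    return sequence[::-1]

def shuffled_decoy(sequence, seed=0):
    """Decoy made by randomly permuting the residues (same composition)."""
    residues = list(sequence)
    random.Random(seed).shuffle(residues)
    return "".join(residues)

def composite_database(proteins, method="shuffle", prefix="DECOY_"):
    """Return forward proteins plus one randomized decoy per protein."""
    db = dict(proteins)
    for i, (name, seq) in enumerate(proteins.items()):
        if method == "shuffle":
            decoy = shuffled_decoy(seq, seed=i)
        else:
            decoy = reversed_decoy(seq)
        db[prefix + name] = decoy
    return db
```

Searching this single database lets forward and decoy sequences compete for each spectrum, so decoy hits above a threshold estimate the false positives among forward hits at that threshold.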

12.
The proteins in blood were all first expressed as mRNAs from genes within cells. There are databases of human proteins that are known to be expressed as mRNA in human cells and tissues. Proteins identified from human blood by the correlation of mass spectra that fail to match human mRNA expression products may not be correct. We compared the proteins identified in human blood by mass spectrometry by 10 different groups by correlation to human and nonhuman nucleic acid sequences. We determined whether the peptides or proteins identified by the different groups mapped to the known human proteins of the Reference Sequence (RefSeq) database. We used Structured Query Language (SQL) database searches of the peptide sequences correlated to tandem mass spectra, and Basic Local Alignment Search Tool (BLAST) analysis of the identified full-length proteins, to control for correlation to the wrong peptide sequence or for the same or a very similar peptide sequence being shared by more than one protein. Mass spectra were correlated against large protein databases that contain many sequences that may not be expressed in human beings, yet the searches returned a very high percentage of peptides or proteins that are known to be found in humans. Only about 5% of proteins mapped to hypothetical sequences, which is in agreement with the reported false-positive rate of the search algorithms and conditions used. The results were highly enriched in secreted and soluble proteins and depleted in insoluble or membrane proteins. Most of the proteins identified were relatively short and showed a size distribution similar to that of the RefSeq database. At least three groups agree on a nonredundant set of 1671 types of proteins, and a nonredundant set of 3151 proteins was identified by at least three peptides.

13.
Tandem mass spectrometry is commonly used to identify peptides, typically by comparing their product ion spectra with those predicted from a protein sequence database and scoring these matches. The most reported quality metric for a set of peptide identifications is the false discovery rate (FDR), the fraction of expected false identifications in the set. This metric has so far only been used for completely sequenced organisms or known protein mixtures. We have investigated whether FDR estimations are also applicable in the case of partially sequenced organisms, where many high-quality spectra fail to identify the correct peptides because the latter are not present in the searched sequence database. Using real data from human plasma and simulated partial sequence databases derived from two complete human sequence databases with different levels of redundancy, we could demonstrate that the mixture model approach in PeptideProphet is robust for partial databases, particularly if used in combination with decoy sequences. We therefore recommend using this method when estimating the FDR and reporting peptide identifications from incompletely sequenced organisms.

14.
Tandem mass spectrometry (MS/MS) is frequently used in the identification of peptides and proteins. Typical proteomic experiments rely on algorithms such as SEQUEST and MASCOT to compare thousands of tandem mass spectra against the theoretical fragment ion spectra of peptides in a database. The probabilities that these spectrum-to-sequence assignments are correct can be determined by statistical software such as PeptideProphet or through estimations based on reverse or decoy databases. However, many of the software applications that assign probabilities for MS/MS spectra to sequence matches were developed using training data sets from 3D ion-trap mass spectrometers. Given the variety of types of mass spectrometers that have become commercially available over the last 5 years, we sought to generate a data set of reference data covering multiple instrumentation platforms to facilitate both the refinement of existing computational approaches and the development of novel software tools. We analyzed the proteolytic peptides in a mixture of tryptic digests of 18 proteins, named the "ISB standard protein mix", using 8 different mass spectrometers. These include linear and 3D ion traps, two quadrupole time-of-flight platforms (qq-TOF), and two MALDI-TOF-TOF platforms. The resulting data set, which has been named the Standard Protein Mix Database, consists of over 1.1 million spectra in 150+ replicate runs on the mass spectrometers. The data were inspected for quality of separation and searched using SEQUEST. All data, including the native raw instrument and mzXML formats and the PeptideProphet validated peptide assignments, are available at http://regis-web.systemsbiology.net/PublicDatasets/.

15.
Robust statistical validation of peptide identifications obtained by tandem mass spectrometry and sequence database searching is an important task in shotgun proteomics. PeptideProphet is a commonly used computational tool that computes confidence measures for peptide identifications. In this paper, we investigate several limitations of the PeptideProphet modeling approach, including the use of fixed coefficients in computing the discriminant search score and selection of the top scoring peptide assignment per spectrum only. To address these limitations, we describe an adaptive method in which a new discriminant function is learned from the data in an iterative fashion. We extend the modeling framework to go beyond the top scoring peptide assignment per spectrum. We also investigate the effect of clustering the spectra according to their spectrum quality score followed by cluster-specific mixture modeling. The analysis is carried out using data acquired from a mixture of purified proteins on four different types of mass spectrometers, as well as using a complex human serum data set. A special emphasis is placed on the analysis of data generated on high mass accuracy instruments.

16.
Mass spectrometers equipped with matrix-assisted laser desorption/ionization (MALDI-MS) require frequent multipoint calibration to obtain good mass accuracy over a wide mass range and across large numbers of samples. In this study, we introduce a new synthetic peptide mass calibration standard termed PAS-cal, tailored for MALDI-MS-based bottom-up proteomics. This standard consists of 30 peptides between 8 and 37 amino acids long, each constructed to contain repetitive sequences of Pro, Ala and Ser as well as one C-terminal arginine residue. MALDI spectra thus cover a mass range between 750 and 3200 m/z in MS mode and between 100 and 3200 m/z in MS/MS mode. Our results show that multipoint calibration of MS spectra using PAS-cal peptides compares well to current commercial reagents for protein identification by peptide mass fingerprinting (PMF). Calibration of tandem mass spectra from LC-MALDI experiments using the longest peptide, PAS-cal37, resulted in smaller fragment ion mass errors, more matching fragment ions and more protein and peptide identifications compared to commercial standards, making the PAS-cal standard generically useful for bottom-up proteomics.
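Multipoint calibration means fitting a mapping from observed to reference m/z across several calibrant peaks. As a simplifying assumption, the sketch fits a straight line by least squares; real TOF calibration is usually done in flight-time space with a quadratic or higher-order term, and the calibrant masses in the usage example are hypothetical, not PAS-cal values.

```python
def fit_linear_calibration(observed, reference):
    """Least-squares fit of reference ~ a * observed + b.

    Needs at least two calibrant peaks with distinct observed m/z.
    Returns a function that maps observed m/z to calibrated m/z.
    """
    n = len(observed)
    mx = sum(observed) / n
    my = sum(reference) / n
    sxx = sum((x - mx) ** 2 for x in observed)
    sxy = sum((x - mx) * (y - my) for x, y in zip(observed, reference))
    a = sxy / sxx
    b = my - a * mx
    return lambda mz: a * mz + b

# Hypothetical calibrants spread across the MS range
reference = [800.0, 1500.0, 2400.0, 3100.0]
observed = [r * 1.0001 + 0.01 for r in reference]  # simulated drift
calibrate = fit_linear_calibration(observed, reference)
```

Spreading calibrants across the whole working range, as PAS-cal does from roughly 750 to 3200 m/z, avoids extrapolating the fitted curve beyond the calibrated region.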

17.
A novel software tool named PTM-Explorer has been applied to LC-MS/MS datasets acquired within the Human Proteome Organisation (HUPO) Brain Proteome Project (BPP). PTM-Explorer enables automatic identification of peptide MS/MS spectra that were not explained in typical sequence database searches. The main focus was the detection of PTMs, but PTM-Explorer also detects unspecific peptide cleavage, mass measurement errors, experimental modifications, amino acid substitutions, transpeptidation products and unknown mass shifts. To avoid a combinatorial explosion, the search is restricted to a set of selected protein sequences, which stem from previous protein identifications using a common sequence database search. Prior to application to the HUPO BPP data, PTM-Explorer was evaluated on thoroughly manually characterized and validated LC-MS/MS data sets from alpha-A-crystallin gel spots obtained from mouse eye lens. Besides various PTMs including phosphorylation, a wealth of experimental modifications and unspecific cleavage products were successfully detected, completing the primary structure information of the measured proteins. Our results indicate that a large number of MS/MS spectra that currently remain unidentified in standard database searches contain valuable information that can only be elucidated using suitable software tools.
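One ingredient of explaining unassigned spectra is attributing an observed precursor mass shift to known modifications. The sketch below, a hypothetical `explain_mass_shift` helper, checks combinations of up to two entries from a small table of monoisotopic modification masses against the mass difference; the table, the 0.01 Da tolerance and the limit of two modifications are assumptions, and PTM-Explorer's own search (including site localization and cleavage variants) is not reproduced here.

```python
from itertools import combinations_with_replacement

KNOWN_SHIFTS = {  # monoisotopic mass differences, Da
    "Acetyl": 42.010565,
    "Carbamidomethyl": 57.021464,
    "Deamidation": 0.984016,
    "Oxidation": 15.994915,
    "Phospho": 79.966331,
}

def explain_mass_shift(observed_mass, theoretical_mass, tol=0.01, max_mods=2):
    """Return combinations of up to max_mods known shifts matching the mass delta."""
    delta = observed_mass - theoretical_mass
    hits = []
    for n in range(1, max_mods + 1):
        for combo in combinations_with_replacement(sorted(KNOWN_SHIFTS), n):
            if abs(sum(KNOWN_SHIFTS[name] for name in combo) - delta) <= tol:
                hits.append(combo)
    return hits
```

An empty result corresponds to the "unknown mass shift" category; restricting candidates to previously identified proteins, as the abstract describes, keeps such enumerations tractable.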

18.
A key problem in computational proteomics is distinguishing between correct and false peptide identifications. We argue that evaluating the error rates of peptide identifications is not unlike computing generating functions in combinatorics. We show that the generating functions and their derivatives (spectral energy and spectral probability) represent new features of tandem mass spectra that, similarly to Delta-scores, significantly improve peptide identifications. Furthermore, the spectral probability provides a rigorous solution to the problem of computing the statistical significance of spectral identifications. The spectral energy/probability approach improves the sensitivity-specificity tradeoff of existing MS/MS search tools, addresses the notoriously difficult problem of "one-hit-wonders" in mass spectrometry, and often eliminates the need for decoy database searches. We therefore argue that the generating function approach has the potential to increase the number of peptide identifications in MS/MS searches.
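A toy version of the generating-function computation, under simplifying assumptions: integer residue masses, equiprobable residues, and a score that counts how many prefix masses of a peptide coincide with spectral peaks. A dynamic program accumulates, for every mass, the probability of prefixes of that mass with each score, so the spectral probability (the total probability of all peptides of the precursor mass scoring at least the threshold) is obtained without enumerating peptides.

```python
def spectral_probability(peptide_mass, peak_masses, aa_masses, threshold):
    """P(random peptide of integer mass `peptide_mass` scores >= threshold).

    aa_masses: {residue: integer mass}; residues are equiprobable.
    Score: number of prefix masses that fall on a peak in `peak_masses`.
    """
    prob = 1.0 / len(aa_masses)
    # table[m] maps score -> total probability of prefixes of mass m with that score
    table = [dict() for _ in range(peptide_mass + 1)]
    table[0][0] = 1.0
    for m in range(1, peptide_mass + 1):
        gain = 1 if m in peak_masses else 0
        for mass in aa_masses.values():
            prev = m - mass
            if prev < 0:
                continue
            for score, p in table[prev].items():
                table[m][score + gain] = table[m].get(score + gain, 0.0) + p * prob
    return sum(p for s, p in table[peptide_mass].items() if s >= threshold)
```

With two hypothetical residues of mass 2 and 3 and a single peak at mass 2, the two peptides of mass 5 each have probability 0.25, and only the one with prefix mass 2 scores 1, so the spectral probability at threshold 1 is 0.25. Because the table covers all peptides of the precursor mass, a small spectral probability is itself a significance estimate, which is why decoy searches can become unnecessary.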

19.
Recent advances in instrument control and enrichment procedures have enabled us to quantify large numbers of phosphoproteins and record site-specific phosphorylation events. An intriguing problem that has arisen with these advances is to accurately validate where phosphorylation events occur, if possible in an automated manner. The problem is difficult because MS/MS spectra of phosphopeptides are generally more complicated than those of unmodified peptides. For large-scale studies, the problem is even more evident because phosphorylation sites are based on single peptide identifications, in contrast to protein identifications, where at least two peptides from the same protein are required for identification. To address this problem, we have developed an integrated strategy that increases the reliability and ease of phosphopeptide validation. We have developed an off-line titanium dioxide (TiO2) selective phosphopeptide enrichment procedure for crude cell lysates. Following enrichment, half of the phosphopeptide-fractionated sample is enzymatically dephosphorylated, after which both samples are subjected to LC-MS/MS. From the resulting MS/MS analyses, the dephosphorylated peptide is used as a reference spectrum against the original phosphopeptide spectrum, in effect generating two peptide spectra for the same amino acid sequence and thereby enhancing the probability of a correct identification. The integrated procedure is summarized as follows: 1) enrichment for phosphopeptides by TiO2 chromatography, 2) dephosphorylation of half the sample, 3) LC-MS/MS-based analysis of phosphopeptides and corresponding dephosphorylated peptides, 4) comparison of peptide elution profiles before and after dephosphorylation to confirm phosphorylation, and 5) comparison of MS/MS spectra before and after dephosphorylation to validate the phosphopeptide and its phosphorylation site.
This phosphopeptide identification represents a major improvement compared with identifications based only on single MS/MS spectra and probability-based database searches. We investigated the applicability of this method to crude cell lysates and demonstrate its application in the large-scale analysis of phosphorylation sites in differentiating mouse myoblast cells.
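Pairing a phosphopeptide with its dephosphorylated counterpart implies a precursor mass difference of a whole number of HPO3 units (79.966331 Da each). A minimal sketch of that consistency check, with a hypothetical helper name and an assumed 10 ppm tolerance; the elution-profile and MS/MS comparisons of steps 4 and 5 are not covered.

```python
HPO3 = 79.966331  # monoisotopic mass added by one phosphorylation, Da

def phosphate_count(phospho_mass, dephospho_mass, tol_ppm=10.0):
    """Number of phosphate groups that explains the mass difference, or None."""
    n = round((phospho_mass - dephospho_mass) / HPO3)
    if n < 1:
        return None
    expected = dephospho_mass + n * HPO3
    if abs(phospho_mass - expected) / expected * 1e6 <= tol_ppm:
        return n
    return None
```

A pair that passes this check, elutes as expected, and shows matching fragment ladders (with the shift confined to ions containing the site) gives two lines of evidence for the same sequence.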

20.
Proteome identification using peptide-centric proteomics techniques is a routinely used analysis technique. One of the most powerful and popular methods for the identification of peptides from MS/MS spectra is protein database matching using search engines. Significance thresholding through false discovery rate (FDR) estimation by target/decoy searches is used to ensure the retention of predominantly confident assignments of MS/MS spectra to peptides. However, shortcomings have become apparent when such decoy searches are used to estimate the FDR. To study these shortcomings, we here introduce a novel kind of decoy database that contains isobaric mutated versions of the peptides that were identified in the original search. Because of the supervised way in which the entrapment sequences are generated, we call this a directed decoy database. Since the peptides in our directed decoy database are specifically designed to look quite similar to the forward identifications, the limitations of the existing search algorithms in making correct calls in such strongly confusing situations can be analyzed. Interestingly, for the vast majority of confidently identified peptides, a directed decoy peptide-to-spectrum match can be found that has a match score better than or equal to the forward match score, highlighting an important issue in the interpretation of peptide identifications in present-day high-throughput proteomics.
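The abstract does not detail how the isobaric mutants are generated. As a stand-in, the sketch below permutes a peptide's internal residues while keeping its first and last residues (for tryptic peptides, the C-terminal K/R) fixed; any permutation preserves residue composition and therefore precursor mass exactly. The helper name and this scheme are assumptions, not the authors' procedure.

```python
import random

def isobaric_decoy(peptide, seed=0, max_tries=100):
    """Permute internal residues, keeping both termini fixed.

    The composition, and hence the peptide mass, is unchanged.
    Returns None when no distinct permutation is found.
    """
    if len(peptide) < 4:
        return None  # fewer than two internal residues cannot be rearranged
    rng = random.Random(seed)
    middle = list(peptide[1:-1])
    for _ in range(max_tries):
        rng.shuffle(middle)
        candidate = peptide[0] + "".join(middle) + peptide[-1]
        if candidate != peptide:
            return candidate
    return None
```

Because such a decoy shares mass, composition and termini with the identified peptide, it differs only in fragment ion positions, which is exactly the situation in which a search engine's score must discriminate on fragment evidence alone.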


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP registration No. 09084417