首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 453 毫秒
1.
Shotgun tandem mass spectrometry-based peptide sequencing using programs such as SEQUEST allows high-throughput identification of peptides, which in turn allows the identification of corresponding proteins. We have applied a machine learning algorithm, called the support vector machine, to discriminate between correctly and incorrectly identified peptides using SEQUEST output. Each peptide was characterized by SEQUEST-calculated features such as delta Cn and Xcorr, measurements such as precursor ion current and mass, and additional calculated parameters such as the fraction of matched MS/MS peaks. The trained SVM classifier performed significantly better than previous cutoff-based methods at separating positive from negative peptides. Positive and negative peptides were more readily distinguished in training set data acquired on a QTOF, compared to an ion trap mass spectrometer. The use of 13 features, including four new parameters, significantly improved the separation between positive and negative peptides. Use of the support vector machine and these additional parameters resulted in a more accurate interpretation of peptide MS/MS spectra and is an important step toward automated interpretation of peptide tandem mass spectrometry data in proteomics.  相似文献   

2.
用于串联质谱鉴定多肽的计量方法   总被引:1,自引:0,他引:1  
目前已有多种对串联质谱与数据库中多肽的理论质谱的一致性进行评估的高通量计量算法用于鸟枪法蛋白质组学 (shotgunproteomics)研究。然而这些方法操作时存在大量错误的多肽鉴定。这里提出一种新的串联质谱识别多肽序列的计量算法。该算法综合考虑了串联质谱中不同离子出现的概率、多肽的酶切位点数、理论离子与实验离子的匹配程度和匹配模式。对大容量的串联质谱数据集的测试表明 ,根据算法开发的软件PepSearch比目前最常用的软件SEQUEST有更好的鉴定准确性。PepSearch可从http : compbio.sibsnet.org projects pepsearch下载。  相似文献   

3.
A concept of unique peptides(CUP)was proposed and implemented to identify whole-cell proteins from tandem mass spectrometry(MS/MS)ion spectra.A unique peptide is defined as a peptide,irrespective of its length,that exists only in one protein of a proteome of interest,despite the fact that this peptide may appear more than once in the same protein.Integrating CUP,a two-step whole-cell protein identification strategy was developed to further increase the confidence of identified proteins.A dataset containing 40,243 MS/MS ion spectra of Saccharomyces cerevisiae and protein identification tools including Mascot and SEQUEST were used to illustrate the proposed concept and strategy.Without implementing CUP,the proteins identified by SEQUEST are 2.26 fold of those identified by Mascot.When CUP was applied,the proteins bearing unique peptides identified by SEQUEST are3.89 fold of those identified by Mascot.By cross-comparing two sets of identified proteins,only 89 common proteins derived from CUP were found.The key discrepancy between identified proteins was resulted from the filtering criteria employed by each protein identification tool.According to the origin of peptides classified by CUP and the commonality of proteins recognized by protein identification tools,all identified proteins were cross-compared,resulting in four groups of proteins possessing different levels of assigned confidence.  相似文献   

4.
Computational analysis of mass spectra remains the bottleneck in many proteomics experiments. SEQUEST was one of the earliest software packages to identify peptides from mass spectra by searching a database of known peptides. Though still popular, SEQUEST performs slowly. Crux and TurboSEQUEST have successfully sped up SEQUEST by adding a precomputed index to the search, but the demand for ever-faster peptide identification software continues to grow. Tide, introduced here, is a software program that implements the SEQUEST algorithm for peptide identification and that achieves a dramatic speedup over Crux and SEQUEST. The optimization strategies detailed here employ a combination of algorithmic and software engineering techniques to achieve speeds up to 170 times faster than a recent version of SEQUEST that uses indexing. For example, on a single Xeon CPU, Tide searches 10,000 spectra against a tryptic database of 27,499 Caenorhabditis elegans proteins at a rate of 1550 spectra per second, which compares favorably with a rate of 8.8 spectra per second for a recent version of SEQUEST with index running on the same hardware.  相似文献   

5.
We report on the effectiveness of CID, HCD, and ETD for LC-FT MS/MS analysis of peptides using a tandem linear ion trap-Orbitrap mass spectrometer. A range of software tools and analysis parameters were employed to explore the use of CID, HCD, and ETD to identify peptides (isolated from human blood plasma) without the use of specific "enzyme rules". In the evaluation of an FDR-controlled SEQUEST scoring method, the use of accurate masses for fragments increased the number of identified peptides (by ~50%) compared to the use of conventional low accuracy fragment mass information, and CID provided the largest contribution to the identified peptide data sets compared to HCD and ETD. The FDR-controlled Mascot scoring method provided significantly fewer peptide identifications than SEQUEST (by 1.3-2.3 fold) and CID, HCD, and ETD provided similar contributions to identified peptides. Evaluation of de novo sequencing and the UStags method for more intense fragment ions revealed that HCD afforded more contiguous residues (e.g., ≥ 7 amino acids) than either CID or ETD. Both the FDR-controlled SEQUEST and Mascot scoring methods provided peptide data sets that were affected by the decoy database used and mass tolerances applied (e.g., identical peptides between data sets could be limited to ~70%), while the UStags method provided the most consistent peptide data sets (>90% overlap). The m/z ranges in which CID, HCD, and ETD contributed the largest number of peptide identifications were substantially overlapping. This work suggests that the three peptide ion fragmentation methods are complementary and that maximizing the number of peptide identifications benefits significantly from a careful match with the informatics tools and methods applied. These results also suggest that the decoy strategy may inaccurately estimate identification FDRs.  相似文献   

6.
Lipid rafts were prepared according to standard protocols from Jurkat T cells stimulated via T cell receptor/CD28 cross-linking and from control (unstimulated) cells. Co-isolating proteins from the control and stimulated cell preparations were labeled with isotopically normal (d0) and heavy (d8) versions of the same isotope-coded affinity tag (ICAT) reagent, respectively. Samples were combined, proteolyzed, and resultant peptides fractionated via cation exchange chromatography. Cysteine-containing (ICAT-labeled) peptides were recovered via the biotin tag component of the ICAT reagents by avidin-affinity chromatography. On-line micro-capillary liquid chromatography tandem mass spectrometry was performed on both avidin-affinity (ICAT-labeled) and flow-through (unlabeled) fractions. Initial peptide sequence identification was by searching recorded tandem mass spectrometry spectra against a human sequence data base using SEQUEST software. New statistical data modeling algorithms were then applied to the SEQUEST search results. These allowed for discrimination between likely "correct" and "incorrect" peptide assignments, and from these the inferred proteins that they collectively represented, by calculating estimated probabilities that each peptide assignment and subsequent protein identification was a member of the "correct" population. For convenience, the resultant lists of peptide sequences assigned and the proteins to which they corresponded were filtered at an arbitrarily set cut-off of 0.5 (i.e. 50% likely to be "correct") and above and compiled into two separate datasets. In total, these data sets contained 7667 individual peptide identifications, which represented 2669 unique peptide sequences, corresponding to 685 proteins and related protein groups.  相似文献   

7.
Tandem mass spectrometry (MS/MS) is frequently used in the identification of peptides and proteins. Typical proteomic experiments rely on algorithms such as SEQUEST and MASCOT to compare thousands of tandem mass spectra against the theoretical fragment ion spectra of peptides in a database. The probabilities that these spectrum-to-sequence assignments are correct can be determined by statistical software such as PeptideProphet or through estimations based on reverse or decoy databases. However, many of the software applications that assign probabilities for MS/MS spectra to sequence matches were developed using training data sets from 3D ion-trap mass spectrometers. Given the variety of types of mass spectrometers that have become commercially available over the last 5 years, we sought to generate a data set of reference data covering multiple instrumentation platforms to facilitate both the refinement of existing computational approaches and the development of novel software tools. We analyzed the proteolytic peptides in a mixture of tryptic digests of 18 proteins, named the "ISB standard protein mix", using 8 different mass spectrometers. These include linear and 3D ion traps, two quadrupole time-of-flight platforms (qq-TOF), and two MALDI-TOF-TOF platforms. The resulting data set, which has been named the Standard Protein Mix Database, consists of over 1.1 million spectra in 150+ replicate runs on the mass spectrometers. The data were inspected for quality of separation and searched using SEQUEST. All data, including the native raw instrument and mzXML formats and the PeptideProphet validated peptide assignments, are available at http://regis-web.systemsbiology.net/PublicDatasets/.  相似文献   

8.
In protein identification through tandem mass spectrometry, it is critical to accurately predict the theoretical spectrum for a peptide sequence. The widely used prediction models, such as SEQUEST and MASCOT, ignore the intensity of the ions with important neutral losses, including water loss and ammonia loss. However, ignoring these neutral losses results in a significant deviation between the predicted theoretical spectrum and its experimental counterpart. Here, based on the "one peak, multiple explanations" observation, we proposed an expectation-maximization (EM) method to automatically learn the probabilities of water loss and ammonia loss for each amino acid. Then we employed these probabilities to design an improved statistical model for theoretical spectrum prediction. We implemented these methods and tested them on practical data. On a training set containing 1803 spectra, the experimental results show a good agreement with some known knowledge about neutral losses, such as the tendency of water loss from Asp, Glu, Ser, and Thr. Furthermore, on a testing set containing 941 spectra, the improved similarity between the experimental and predicted spectra demonstrates that this method can generate more reasonable predictions relative to the model that ignores neutral losses. As an application of the derived probabilities, we implemented a database searching method adopting the improved theoretical spectrum model with neutral loss ions estimated. Experimental results on Keller's data set demonstrate that this method can identify peptides more accurately than SEQUEST. In another application to validate SEQUEST's results, the reported peptide-spectrum pairs are reranked with respect to the similarity between experimental and predicted spectra. Experimental results on both LTQ and QSTAR data sets suggest that this reranking strategy can effectively distinguish the false negative predictions reported by SEQUEST.  相似文献   

9.
In high-throughput proteomics the development of computational methods and novel experimental strategies often rely on each other. In certain areas, mass spectrometry methods for data acquisition are ahead of computational methods to interpret the resulting tandem mass spectra. Particularly, although there are numerous situations in which a mixture tandem mass spectrum can contain fragment ions from two or more peptides, nearly all database search tools still make the assumption that each tandem mass spectrum comes from one peptide. Common examples include mixture spectra from co-eluting peptides in complex samples, spectra generated from data-independent acquisition methods, and spectra from peptides with complex post-translational modifications. We propose a new database search tool (MixDB) that is able to identify mixture tandem mass spectra from more than one peptide. We show that peptides can be reliably identified with up to 95% accuracy from mixture spectra while considering only a 0.01% of all possible peptide pairs (four orders of magnitude speedup). Comparison with current database search methods indicates that our approach has better or comparable sensitivity and precision at identifying single-peptide spectra while simultaneously being able to identify 38% more peptides from mixture spectra at significantly higher precision.  相似文献   

10.
The components of complex peptide mixtures can be separated by liquid chromatography, fragmented by tandem mass spectrometry, and identified by the SEQUEST algorithm. Inferring a mixture's source proteins requires that the identified peptides be reassociated. This process becomes more challenging as the number of peptides increases. DTASelect, a new software package, assembles SEQUEST identifications and highlights the most significant matches. The accompanying Contrast tool compares DTASelect results from multiple experiments. The two programs improve the speed and precision of proteomic data analysis.  相似文献   

11.
Sequence determination of peptides is a crucial step in mass spectrometry–based proteomics. Peptide sequences are determined either by database search or by de novo sequencing using tandem mass spectrometry. Determination of all the theoretical expected peptide fragments and eliminating false discoveries remains a challenge in proteomics. Developing standards for evaluating the performance of mass spectrometers and algorithms used for identification of proteins is important for proteomics studies. The current study is focused on these aspects by using synthetic peptides. A total of 599 peptides were designed from in silico tryptic digest with 1 or 2 missed cleavages from 199 human proteins, and synthetic peptides corresponding to these sequences were obtained. The peptides were mixed together, and analysis was carried out using liquid chromatography–electrospray ionization tandem mass spectrometry on a Q-Exactive HF mass spectrometer. The peptides and proteins were identified with SEQUEST program. The analysis was carried out using the proteomics workflows. A total of 573 peptides representing 196 proteins could be identified, and a spectral library was created for these peptides. Analysis parameters such as “no enzyme selection” gave the maximum number of detected peptides as compared with trypsin in the selection. False discoveries could be identified. This study highlights the limitations of peptide detection and the need for developing powerful algorithms along with tools to evaluate mass spectrometers and algorithms. It also shows the limitations of peptide detection even with high-end mass spectrometers. The mass spectral data are available in ProteomeXchange with accession no. PXD017992.  相似文献   

12.
High throughput identification of peptides in databases from tandem mass spectrometry data is a key technique in modern proteomics. Common approaches to interpret large scale peptide identification results are based on the statistical analysis of average score distributions, which are constructed from the set of best scores produced by large collections of MS/MS spectra by using searching engines such as SEQUEST. Other approaches calculate individual peptide identification probabilities on the basis of theoretical models or from single-spectrum score distributions constructed by the set of scores produced by each MS/MS spectrum. In this work, we study the mathematical properties of average SEQUEST score distributions by introducing the concept of spectrum quality and expressing these average distributions as compositions of single-spectrum distributions. We predict and demonstrate in the practice that average score distributions are dominated by the quality distribution in the spectra collection, except in the low probability region, where it is possible to predict the dependence of average probability on database size. Our analysis leads to a novel indicator, the probability ratio, which takes optimally into account the statistical information provided by the first and second best scores. The probability ratio is a non-parametric and robust indicator that makes spectra classification according to parameters such as charge state unnecessary and allows a peptide identification performance, on the basis of false discovery rates, that is better than that obtained by other empirical statistical approaches. The probability ratio also compares favorably with statistical probability indicators obtained by the construction of single-spectrum SEQUEST score distributions. These results make the robustness, conceptual simplicity, and ease of automation of the probability ratio algorithm a very attractive alternative to determine peptide identification confidences and error rates in high throughput experiments.  相似文献   

13.
We describe an integrated workstation for the automated, high-throughput, and conclusive identification of proteins by reverse-phase chromatography electrospray ionization tandem mass spectrometry. The instrumentation consists of a refrigerated autosampler, a submicrobore reverse-phase liquid chromatograph, and an electrospray triple quadrupole mass spectrometer. For protein identification, enzymatic digests of either homogeneous polypeptides or simple protein mixtures were generated and loaded into the autosampler. Samples were sequentially injected every 32 min. Ions of eluting peptides were automatically selected by the mass spectrometer and subjected to collision-induced dissociation. Following each run, the resulting tandem mass spectra were automatically analyzed by SEQUEST, a program that correlates uninterpreted peptide fragmentation patterns with amino acid sequences contained in databases. Protein identification was established by SEQUEST_SUMMARY a program that combines the SEQUEST scores of peptides originating from the same protein and ranks the cumulative results in a short summary. The workstation's performance was demonstrated by the unattended identification of 90 proteins from the yeast Saccharomyces cerevisiae, which were separated by high-resolution two-dimensional PAGE. The system was found to be very robust and identification was reliably and conclusively established for proteins if quantities exceeding 1-5 pmol were applied to the gel. The level of automation, the throughput, and the reliability of the results suggest that this system will be useful for the many projects that require the characterization of large numbers of proteins.  相似文献   

14.
Researchers have several options when designing proteomics experiments. Primary among these are choices of experimental method, instrumentation and spectral interpretation software. To evaluate these choices on a proteome scale, we compared triplicate measurements of the yeast proteome by liquid chromatography tandem mass spectrometry (LC-MS/MS) using linear ion trap (LTQ) and hybrid quadrupole time-of-flight (QqTOF; QSTAR) mass spectrometers. Acquired MS/MS spectra were interpreted with Mascot and SEQUEST algorithms with and without the requirement that all returned peptides be tryptic. Using a composite target decoy database strategy, we selected scoring criteria yielding 1% estimated false positive identifications at maximum sensitivity for all data sets, allowing reasonable comparisons between them. These comparisons indicate that Mascot and SEQUEST yield similar results for LTQ-acquired spectra but less so for QSTAR spectra. Furthermore, low reproducibility between replicate data acquisitions made on one or both instrument platforms can be exploited to increase sensitivity and confidence in large-scale protein identifications.  相似文献   

15.
MOTIVATION: A powerful proteomics methodology couples high-performance liquid chromatography (HPLC) with tandem mass spectrometry and database-search software, such as SEQUEST. Such a set-up, however, produces a large number of spectra, many of which are of too poor quality to be useful. Hence a filter that eliminates poor spectra before the database search can significantly improve throughput and robustness. Moreover, spectra judged to be of high quality, but that cannot be identified by database search, are prime candidates for still more computationally intensive methods, such as de novo sequencing or wider database searches including post-translational modifications. RESULTS: We report on two different approaches to assessing spectral quality prior to identification: binary classification, which predicts whether or not SEQUEST will be able to make an identification, and statistical regression, which predicts a more universal quality metric involving the number of b- and y-ion peaks. The best of our binary classifiers can eliminate over 75% of the unidentifiable spectra while losing only 10% of the identifiable spectra. Statistical regression can pick out spectra of modified peptides that can be identified by a de novo program but not by SEQUEST. In a section of independent interest, we discuss intensity normalization of mass spectra.  相似文献   

16.
毛细管区带电泳/串联质谱联用法鉴定多肽和蛋白质   总被引:11,自引:3,他引:8  
建立了毛细管区带电泳-串联质谱联用(CZE/MS/MS)对多肽和蛋白质高灵敏度鉴定方法,对Met-脑啡肽和Leu-脑啡肽的混合物进行了分析,用CZE/MS/MS方法验证了各自的序列,同样对细胞色素c的胰蛋白酶酶解产物用CZE/MS/MS方法进行了肽质谱分析,几科所有肽段的序列及其与在分子中的位置都得到了确定,通过SEQUEST软件进行蛋白质序列数据库搜索得到准确的鉴定结果,所消耗的样品量均在低皮可  相似文献   

17.
The high-throughput nature of proteomics mass spectrometry is enabled by a productive combination of data acquisition protocols and the computational tools used to interpret the resulting spectra. One of the key components in mainstream protocols is the generation of tandem mass (MS/MS) spectra by peptide fragmentation using collision induced dissociation, the approach currently used in the large majority of proteomics experiments to routinely identify hundreds to thousands of proteins from single mass spectrometry runs. Complementary to these, alternative peptide fragmentation methods such as electron capture/transfer dissociation and higher-energy collision dissociation have consistently achieved significant improvements in the identification of certain classes of peptides, proteins, and post-translational modifications. Recognizing these advantages, mass spectrometry instruments now conveniently support fine-tuned methods that automatically alternate between peptide fragmentation modes for either different types of peptides or for acquisition of multiple MS/MS spectra from each peptide. But although these developments have the potential to substantially improve peptide identification, their routine application requires corresponding adjustments to the software tools and procedures used for automated downstream processing. This review discusses the computational implications of alternative and alternate modes of MS/MS peptide fragmentation and addresses some practical aspects of using such protocols for identification of peptides and post-translational modifications.  相似文献   

18.
The sequence tag-based peptide identification methods are a promising alternative to the traditional database search approach. However, a more comprehensive analysis, optimization, and comparison with established methods are necessary before these methods can gain widespread use in the proteomics community. Using the InsPecT open source code base ( Tanner et al., Anal. Chem. 2005, 77, 4626- 39 ), we present an improved sequence tag generation method that directly incorporates multicharged fragment ion peaks present in many tandem mass spectra of higher charge states. We also investigate the performance of sequence tagging under different settings using control data sets generated on five different types of mass spectrometers, as well as using a complex phosphopeptide-enriched sample. We also demonstrate that additional modeling of InsPecT search scores using a semiparametric approach incorporating the accuracy of the precursor ion mass measurement provides additional improvement in the ability to discriminate between correct and incorrect peptide identifications. The overall superior performance of the sequence tag-based peptide identification method is demonstrated by comparison with a commonly used SEQUEST/PeptideProphet approach.  相似文献   

19.
The utility and advantages of the recently introduced two-dimensional quadrupole ion trap mass spectrometer in proteomics over the traditional three-dimensional ion trap mass spectrometer have not been systematically characterized. Here we rigorously compared the performance of these two platforms by using over 100,000 tandem mass spectra acquired with identical complex peptide mixtures and acquisition parameters. Specifically we compared four factors that are critical for a successful proteomic study: 1) the number of proteins identified, 2) sequence coverage or the number of peptides identified for every protein, 3) the data base matching SEQUEST X(corr) and S(p) score, and 4) the quality of the fragment ion series of peptides. We found a 4-6-fold increase in the number of peptides and proteins identified on the two-dimensional ion trap mass spectrometer as a direct result of improvement in all the other parameters examined. Interestingly more than 70% of the doubly and triply charged peptides, but not the singly charged peptides, showed better quality of fragmentation spectra on the two-dimensional ion trap. These results highlight specific advantages of the two-dimensional ion trap over the conventional three-dimensional ion traps for protein identification in proteomic experiments.  相似文献   

20.
Time-consuming and experience-dependent manual validations of tandem mass spectra are usually applied to SEQUEST results. This inefficient method has become a significant bottleneck for MS/MS data processing. Here we introduce a program AMASS (advanced mass spectrum screener), which can filter the tandem mass spectra of SEQUEST results by measuring the match percentage of high-abundant ions and the continuity of matched fragment ions in b, y series. Compared with Xcorr and DeltaCn filter, AMASS can increase the number of positives and reduce the number of negatives in 22 datasets generated from 18 known protein mixtures. It effectively removed most noisy spectra, false interpretations, and about half of poor fragmentation spectra, and AMASS can work synergistically with Rscore filter. We believe the use of AMASS and Rscore can result in a more accurate identification of peptide MS/MS spectra and reduce the time and energy for manual validation.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号