Similar documents
20 similar documents found
1.
Trypsin cleaves exclusively C-terminal to arginine and lysine residues   Total citations: 2 | Self-citations: 0 | Citations by others: 2
Almost all large-scale projects in mass spectrometry-based proteomics use trypsin to convert protein mixtures into more readily analyzable peptide populations. When searching peptide fragmentation spectra against sequence databases, potentially matching peptide sequences can be required to conform to tryptic specificity, namely, cleavage exclusively C-terminal to arginine or lysine. In many published reports, however, significant numbers of proteins are identified by non-tryptic peptides. Here we use the sub-parts per million mass accuracy of a new ion trap Fourier transform mass spectrometer to achieve more than a 100-fold increased confidence in peptide identification compared with typical ion trap experiments and show that trypsin cleaves solely C-terminal to arginine and lysine. We find that non-tryptic peptides occur only as the C-terminal peptides of proteins and as breakup products of fully tryptic peptides N-terminal to an internal proline. Simulating lower mass accuracy led to a large number of proteins erroneously identified with non-tryptic peptide hits. Our results indicate that such peptide hits in previous studies should be re-examined and that peptide identification should be based on strict trypsin specificity.
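
Strict tryptic specificity, as described above, can be made concrete with a small in-silico digest. This is a minimal sketch, not the authors' software: it cleaves after every K or R and deliberately applies no "not before proline" exception (an assumption chosen to match the strict rule stated in the abstract). The names `cleavage_sites`, `tryptic_digest` and `is_fully_tryptic` are hypothetical.

```python
def cleavage_sites(protein):
    """Peptide boundaries: 0, every index right after a K/R, and len(protein)."""
    sites = [0]
    for i, residue in enumerate(protein[:-1]):  # a C-terminal K/R adds no new boundary
        if residue in "KR":
            sites.append(i + 1)
    sites.append(len(protein))
    return sites


def tryptic_digest(protein, missed_cleavages=0):
    """Fully tryptic peptides with up to `missed_cleavages` internal sites.

    Assumption: cleavage after every K/R, including before proline.
    """
    sites = cleavage_sites(protein)
    peptides = []
    for i in range(len(sites) - 1):
        for j in range(i + 1, min(i + 2 + missed_cleavages, len(sites))):
            peptides.append(protein[sites[i]:sites[j]])
    return peptides


def is_fully_tryptic(protein, start, end):
    """True if protein[start:end] has a tryptic or protein-terminal boundary on both sides."""
    n_term_ok = start == 0 or protein[start - 1] in "KR"
    c_term_ok = end == len(protein) or protein[end - 1] in "KR"
    return n_term_ok and c_term_ok


print(tryptic_digest("MKWVTFISLLRAEK"))  # ['MK', 'WVTFISLLR', 'AEK']
```

A search engine enforcing strict specificity would only score candidates for which `is_fully_tryptic` holds; the protein C-terminal peptide passes through the `end == len(protein)` branch even if it does not end in K or R.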

2.
Bandeira N. BioTechniques, 2007, 42(6): 687, 689, 691 passim
Significant technological advances have accelerated high-throughput proteomics to the point where millions of tandem mass spectra are generated automatically every day. In such a setup, the desire for greater sequence coverage, combined with standard experimental procedures, commonly yields multiple tandem mass spectra from overlapping peptides: typical observations include peptides differing by one or two terminal amino acids, and spectra from modified and unmodified variants of the same peptide. In a departure from traditional spectrum identification algorithms, which analyze each tandem mass spectrum in isolation, spectral networks define a new computational approach that instead finds and simultaneously interprets sets of spectra from overlapping peptides. In shotgun protein sequencing, spectral networks capitalize on the redundant sequence information in the aligned spectra to deliver the longest and most accurate de novo sequences ever reported for ion trap data. Also, by combining spectra from multiple modified and unmodified variants of the same peptides, spectral networks are able to bypass the dominant guess/confirm approach to the identification of posttranslational modifications and instead discover modifications and highly modified peptides directly from experimental data. Open-source implementations of these algorithms may be downloaded from peptide.ucsd.edu.

3.
Traditionally, peptide identification from fragmentation spectra relies on extracting the maximum amount of information from each spectrum. Using different combinations of small fragment ion masses, we show that identifying a small number of fragment ions in a spectrum is sufficient for peptide identification. We consider y2, y3, b2, and b3 ions and find the b2-y2 combination to be sufficient for many peptides. Adding either the y3 or the b3 ion increases specificity and allows reliable peptide identification in the human proteome. Fragmentation spectra and peptides are represented as n-dimensional vectors whose coordinates are the masses of the fragment ions considered and the peptide mass. The identification score is the Euclidean distance between a spectrum and the matching peptide in this n-dimensional space. We show that this approach, using minimal information, allows precise and fast peptide identification.
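
The vector representation above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: fragments are taken as singly charged, masses are monoisotopic, the vector is (b2, b3, y2, y3, neutral peptide mass), and the helper names `peptide_vector` and `best_match` are hypothetical.

```python
import math

# Monoisotopic residue masses (Da); I and L are isobaric.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def residues(seq):
    """Sum of residue masses for a (sub)sequence."""
    return sum(RESIDUE[aa] for aa in seq)


def peptide_vector(peptide):
    """(b2, b3, y2, y3, neutral peptide mass), assuming singly charged fragments."""
    return (
        residues(peptide[:2]) + PROTON,
        residues(peptide[:3]) + PROTON,
        residues(peptide[-2:]) + WATER + PROTON,
        residues(peptide[-3:]) + WATER + PROTON,
        residues(peptide) + WATER,
    )


def best_match(observed, candidates):
    """Candidate whose vector lies at the smallest Euclidean distance from `observed`."""
    return min(candidates, key=lambda p: math.dist(observed, peptide_vector(p)))


observed = peptide_vector("PEPTIDEK")  # stand-in for masses read from a spectrum
print(best_match(observed, ["SAMPLER", "PEPTIDEK", "GASPVTK"]))
```

Because an unweighted distance is dominated by whichever coordinate differs most, a practical version might divide each coordinate by its mass tolerance; that weighting is our assumption and is not specified in the abstract.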

4.
Peptide identification by tandem mass spectrometry is the dominant proteomics workflow for protein characterization in complex samples. The peptide fragmentation spectra generated by these workflows exhibit characteristic fragmentation patterns that can be used to identify the peptide. In other fields, where the compounds of interest lack the convenient linear structure of peptides, fragmentation spectra are identified by comparing new spectra with libraries of identified spectra, an approach called spectral matching. In contrast to the sequence-based tandem mass spectrometry search engines used for peptides, spectral matching can use the intensities of fragment peaks in library spectra to assess the quality of a match. We evaluate a hidden Markov model approach (HMMatch) to spectral matching, in which many examples of a peptide's fragmentation spectrum are summarized in a generative probabilistic model that captures the consensus and variation of each peak's intensity. We demonstrate that HMMatch has good specificity and superior sensitivity compared to sequence database search engines such as X!Tandem. HMMatch achieves good results from relatively few training spectra, is fast to train, and can evaluate many spectra per second. A statistical significance model permits HMMatch scores to be compared with each other, and with other peptide identification tools, on a unified scale. HMMatch agrees with X!Tandem, Mascot, and NIST's MS Search to about the same degree as those tools agree with one another, suggesting that each tool can assign peptides to spectra that the others miss. Finally, we show that it is possible to extrapolate HMMatch models beyond a single peptide's training spectra to the spectra of related peptides, expanding the application of spectral matching techniques beyond the set of peptides previously observed.

5.
Protein identification has been greatly facilitated by searching peptide product ion spectra against protein sequence databases. This approach relies primarily on the fragment ion mass information contained in an MS/MS spectrum. Unambiguous protein identification from a spectrum with low sequence coverage or poor spectral quality can be a major challenge. We present a two-dimensional (2D) mass spectrometric method in which the numbers of nitrogen atoms in the molecular ion and the fragment ions provide additional discriminating power, greatly improving protein identification and de novo peptide sequencing. The nitrogen number is determined by analyzing the mass difference between corresponding peak pairs in overlaid spectra of ¹⁵N-labeled and unlabeled peptides. These peptides are produced by enzymatic or chemical cleavage of proteins from cells grown in ¹⁵N-enriched and normal media, respectively. We demonstrate that, using 2D information (m/z and its associated nitrogen number), this method can not only confirm protein identifications generated by MS/MS database searching but also identify peptides that database searching alone cannot. We present examples from Escherichia coli K12 extracts that yielded relatively poor MS/MS spectra, presumably from digests of low-abundance proteins, which still gave positive protein identification with this method. Additionally, this 2D MS method can facilitate spectral interpretation for de novo peptide sequencing and the identification of posttranslational or other chemical modifications. We envision that this method will be particularly useful for proteome expression profiling of organelles or cells that can be grown in ¹⁵N-enriched media.
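
The nitrogen number follows from simple arithmetic: each nitrogen atom shifts a fully labeled ion by the ¹⁵N/¹⁴N mass difference (about 0.997035 Da), divided by the charge on the m/z axis. The sketch below assumes complete ¹⁵N incorporation, unmodified peptides and correctly paired monoisotopic peaks; the function names are hypothetical.

```python
# Mass difference between 15N and 14N (Da), about 0.997035.
N15_SHIFT = 15.000109 - 14.003074

# Nitrogen atoms per amino acid residue (backbone N plus side-chain N).
NITROGENS = {aa: 1 for aa in "GASPVTCLIDEMFY"}
NITROGENS.update({"N": 2, "Q": 2, "K": 2, "W": 2, "H": 3, "R": 4})


def nitrogen_count(mz_light, mz_heavy, charge=1):
    """Nitrogen atoms implied by the m/z gap of a 14N/15N peak pair."""
    return round((mz_heavy - mz_light) * charge / N15_SHIFT)


def peptide_nitrogens(peptide):
    """Nitrogen atoms in an unmodified peptide or fragment (termini add none)."""
    return sum(NITROGENS[aa] for aa in peptide)
```

As a usage idea, a candidate list could be filtered with `[p for p in candidates if peptide_nitrogens(p) == n]`, where `n` comes from `nitrogen_count`; the same check applies to any b or y fragment, since a fragment's nitrogen count is the sum over its residues.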

6.
MS2 library spectra are rich in reproducible information about peptide fragmentation patterns compared to theoretical spectra modeled by a sequence search tool. So far, spectrum library searches have mostly been applied to detect peptides as they are present in the library. However, they also allow finding modified variants of the library peptides if the search is done with a large precursor mass window and an adapted Spectrum-Spectrum Match (SSM) scoring algorithm. We perform a thorough evaluation of the use of library spectra, as opposed to theoretical peptide spectra, for the identification of PTMs, analyzing spectra from a well-annotated, modification-rich test data set compiled from public data repositories. These initial studies motivated the development of our modification-tolerant spectrum library search tool QuickMod, designed to identify modified variants of the peptides listed in the spectrum library without requiring any prior input from the user about the modifications present in the sample. We built the search algorithm of QuickMod after carefully testing different SSM similarity scores. The final spectrum scoring scheme uses a support vector machine (SVM) on a selection of scoring features to classify correct and incorrect SSMs. After a list of modified peptides has been identified at a given False Discovery Rate (FDR), the modifications need to be positioned on the peptide sequence. We present a rapid modification site assignment algorithm and evaluate its positioning accuracy. Finally, we demonstrate that QuickMod performs favorably in terms of speed and identification rate when compared to other software solutions for PTM analysis.

7.
MOTIVATION: Tandem mass spectrometry combined with sequence database searching is one of the most powerful tools for protein identification. Because a mass spectrometer generates thousands of spectra per hour, the speed of database searching is critical, especially when searching a large sequence database, when the peptides are generated by an unknown or non-specific enzyme, or when the target peptides carry post-translational modifications (PTMs). In practice, about 70-90% of spectra have no match in the database, and many believe that a significant portion of these come from peptides produced by non-specific digestion with unknown enzymes or from amino acid modifications. In other cases, researchers may deliberately use non-specific enzymes such as pepsin or thermolysin for proteolysis, because not all proteins are amenable to digestion by site-specific enzymes and many peptides produced by such enzymes may fall outside the molecular weight range suitable for mass spectrometry. Interpreting such spectra costs database search engines a great deal of computation time. OVERVIEW: This study was designed to speed up database searching in both cases. Specifically, we combined a suffix tree data structure with a spectrum graph: the suffix tree preprocesses the protein sequence database, while the spectrum graph preprocesses the tandem mass spectrum. We then search the suffix tree against the spectrum graph for candidate peptides. We designed an efficient algorithm that computes, for each spectrum, a matching threshold at a given statistical significance level (e.g., p = 0.01) and uses it to select candidate peptides, which are then ranked with a SEQUEST-like scoring function. The algorithms were implemented and tested on experimental data. For post-translational modifications, we allow an arbitrary number of modifications of any type in a protein.
AVAILABILITY: The executable program and other supplementary materials are available online at: http://hto-c.usc.edu:8000/msms/suffix/.
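
To see why non-specific digestion is expensive, it helps to look at the candidate space directly. The sketch below is a deliberately simpler, swapped-in technique, not the suffix-tree/spectrum-graph method of the paper: it enumerates every substring of one protein whose neutral mass falls within a precursor tolerance, using the fact that residue masses are positive so each scan from a given start can stop once the window is passed. Modifications are ignored, and `nonspecific_candidates` is a hypothetical name.

```python
# Monoisotopic residue masses (Da).
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565


def nonspecific_candidates(protein, precursor_mass, tol=0.02):
    """(start, end, sequence) for every substring within `tol` Da of precursor_mass.

    For a fixed start the mass only grows as the end moves right, so the inner
    loop breaks as soon as the mass exceeds the upper edge of the window.
    """
    hits = []
    for start in range(len(protein)):
        mass = WATER
        for end in range(start, len(protein)):
            mass += RESIDUE[protein[end]]
            if mass > precursor_mass + tol:
                break
            if mass >= precursor_mass - tol:
                hits.append((start, end + 1, protein[start:end + 1]))
    return hits
```

The work grows with protein length times typical peptide length, repeated for every spectrum, which is the cost that preprocessing the database (e.g., with a suffix tree) and the spectrum is meant to cut.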

8.
Quantitative proteomics relies on accurate protein identification, which is often carried out by automated searching of a sequence database with tandem mass spectra of peptides. When these spectra contain limited information, automated searches may lead to incorrect peptide identifications. It is therefore necessary to validate the identifications by careful manual inspection of the mass spectra. Not only is this task time-consuming, but the reliability of the validation varies with the experience of the analyst. Here, we report a systematic approach to evaluating peptide identifications made by automated search algorithms. The method is based on the principle that the candidate peptide sequence should adequately explain the observed fragment ions. Also, the mass errors of neighboring fragments should be similar. To evaluate our method, we studied tandem mass spectra obtained from tryptic digests of E. coli and HeLa cells. Candidate peptides were identified with the automated search engine Mascot and subjected to the manual validation method. The method found correct peptide identifications that were given low Mascot scores (e.g., 20-25) and incorrect peptide identifications that were given high Mascot scores (e.g., 40-50). The method comprehensively detected false results from searches designed to produce incorrect identifications. Comparison of the tandem mass spectra of synthetic candidate peptides to the spectra obtained from the complex peptide mixtures confirmed the accuracy of the evaluation method. Thus, the evaluation approach described here could help boost the accuracy of protein identification, increase the number of peptides identified, and provide a step toward developing a more accurate next-generation algorithm for protein identification.

9.
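A scoring method for peptide identification from tandem mass spectra   Total citations: 1 | Self-citations: 0 | Citations by others: 1
Several high-throughput scoring algorithms that assess how well a tandem mass spectrum agrees with the theoretical spectra of peptides in a database are currently used in shotgun proteomics. However, these methods produce a large number of incorrect peptide identifications. Here we propose a new scoring algorithm for identifying peptide sequences from tandem mass spectra. The algorithm takes into account the probabilities with which different ion types appear in tandem mass spectra, the number of enzymatic cleavage sites in the peptide, and both the degree and the pattern of matching between theoretical and experimental ions. Tests on a large tandem mass spectrometry dataset show that PepSearch, the software developed from this algorithm, identifies peptides more accurately than SEQUEST, currently the most widely used software. PepSearch can be downloaded from http://compbio.sibsnet.org/projects/pepsearch.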

10.
Searching spectral libraries in MS/MS is an important new approach to improving the quality of peptide and protein identification. The idea relies on the observation that ion intensities in an MS/MS spectrum of a given peptide are generally reproducible across experiments, and thus, matching between spectra from an experiment and the spectra of previously identified peptides stored in a spectral library can lead to better peptide identification compared to the traditional database search. However, the use of libraries is greatly limited by their coverage of peptide sequences: even for well‐studied organisms a large fraction of peptides have not been previously identified. To address this issue, we propose to expand spectral libraries by predicting the MS/MS spectra of peptides based on the spectra of peptides with similar sequences. We first demonstrate that the intensity patterns of dominant fragment ions between similar peptides tend to be similar. In accordance with this observation, we develop a neighbor‐based approach that first selects peptides that are likely to have spectra similar to the target peptide and then combines their spectra using a weighted K‐nearest neighbor method to accurately predict fragment ion intensities corresponding to the target peptide. This approach has the potential to predict spectra for every peptide in the proteome. When rigorous quality criteria are applied, we estimate that the method increases the coverage of spectral libraries available from the National Institute of Standards and Technology by 20–60%, although the values vary with peptide length and charge state. We find that the overall best search performance is achieved when spectral libraries are supplemented by the high quality predicted spectra.
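
The weighted K-nearest-neighbor step described above can be illustrated with a toy predictor. Everything specific here is an assumption for illustration: neighbors are restricted to peptides of the same length, similarity is the fraction of identical positions, weights equal that similarity, and spectra are dicts from hypothetical ion labels (such as "y3") to relative intensities. The paper's actual neighbor selection and weighting may differ.

```python
def similarity(a, b):
    """Fraction of identical positions (assumes equal-length peptides)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def predict_spectrum(target, library, k=3):
    """Weighted k-NN prediction of fragment intensities for `target`.

    library: {peptide: {ion_label: relative_intensity}}; missing ions count as 0.
    """
    neighbours = sorted(
        (p for p in library if len(p) == len(target) and p != target),
        key=lambda p: similarity(target, p),
        reverse=True,
    )[:k]
    weights = {p: similarity(target, p) for p in neighbours}
    total = sum(weights.values())
    if total == 0:
        return {}
    labels = {label for p in neighbours for label in library[p]}
    return {
        label: sum(w * library[p].get(label, 0.0) for p, w in weights.items()) / total
        for label in labels
    }


library = {
    "PEPTIDEK": {"y3": 1.0, "b2": 0.2},
    "PEPTLDEK": {"y3": 0.8, "b2": 0.4},
    "GASPVTKR": {"y3": 0.1},
}
print(predict_spectrum("PEPTIDER", library, k=2))
```

With `k=2`, the two closest neighbors (7/8 and 6/8 identical positions) dominate, so the predicted y3 intensity is their similarity-weighted mean, about 0.908.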

11.
A system for creating a library of tandem mass spectra annotated with corresponding peptide sequences was described. This system was based on the annotated spectra currently available in the Global Proteome Machine Database (GPMDB). The library spectra were created by averaging together spectra that were annotated with the same peptide sequence, sequence modifications, and parent ion charge. The library was constructed so that experimental peptide tandem mass spectra could be compared with those in the library, resulting in a peptide sequence identification based on scoring the similarity of the experimental spectrum with the contents of the library. A software implementation that performs this type of library search was constructed and successfully used to obtain sequence identifications. The annotated tandem mass spectrum libraries for the Homo sapiens, Mus musculus, and Saccharomyces cerevisiae proteomes and search software were made available for download and use by other groups.
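
Building a consensus library and searching it can be sketched with binned spectra and a normalized dot product. This is a generic illustration, not the GPMDB implementation: the 1 m/z bin width, simple averaging of unit-normalized replicates, and cosine similarity as the score are all assumptions, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def normalise(bins):
    """Scale a {bin: intensity} map to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in bins.values()))
    return {b: v / norm for b, v in bins.items()} if norm else {}


def bin_spectrum(peaks, bin_width=1.0):
    """Sum (m/z, intensity) peaks into fixed-width m/z bins, then normalise."""
    bins = defaultdict(float)
    for mz, intensity in peaks:
        bins[int(mz // bin_width)] += intensity
    return normalise(bins)


def consensus(spectra):
    """Average binned replicates sharing peptide, modifications and charge."""
    total = defaultdict(float)
    for spectrum in spectra:
        for b, v in spectrum.items():
            total[b] += v / len(spectra)
    return normalise(total)


def dot(a, b):
    """Cosine similarity of two unit-normalised binned spectra."""
    return sum(v * b.get(k, 0.0) for k, v in a.items())


def best_library_match(query, library):
    """Library key whose consensus spectrum is most similar to the query."""
    return max(library, key=lambda key: dot(query, library[key]))


library = {
    ("PEPTIDEK", 2): consensus([bin_spectrum([(100.1, 10), (200.2, 5)]),
                                bin_spectrum([(100.3, 8), (200.1, 6)])]),
    ("SAMPLER", 2): consensus([bin_spectrum([(150.0, 10), (250.0, 5)])]),
}
print(best_library_match(bin_spectrum([(100.2, 9), (200.4, 5)]), library))
```

Keying the library by (peptide, charge) mirrors the grouping described above; modifications could be folded into the peptide string in the same way.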

12.
Shotgun proteomics experiments depend on database search engines to identify peptides from tandem mass spectra. Many of these algorithms score potential identifications by counting the fragment ions matched between each peptide sequence and an observed spectrum. These systems, however, generally do not distinguish between matching an intense peak and matching a minor peak. We have developed a statistical model, based on the multivariate hypergeometric distribution, to score peptide matches. This scorer, part of the "MyriMatch" database search engine, places greater emphasis on matching intense peaks. The probability that the best match for each spectrum has occurred by random chance can be employed to separate correct matches from random ones. We evaluated this software on data sets from three different laboratories employing three different ion trap instruments. Employing a novel system for testing discrimination, we demonstrate that stratifying peaks into multiple intensity classes improves the discrimination of scoring. We compare MyriMatch results to those of Sequest and X!Tandem, revealing that it is capable of higher discrimination than either of these algorithms. When minimal peak filtering is employed, performance plummets for a scoring model that does not stratify matched peaks by intensity. On the other hand, we find that MyriMatch discrimination improves as more peaks are retained in each spectrum. MyriMatch also scales well to tandem mass spectra from high-resolution mass analyzers. These findings may indicate limitations of existing database search scorers that count matched peaks without differentiating them by intensity. This software and its source code are available under the Mozilla Public License at this URL: http://www.mc.vanderbilt.edu/msrc/bioinformatics/.
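
The multivariate hypergeometric idea can be written down directly. In this sketch (a general formulation, not MyriMatch's source), the spectrum is divided into `total_bins` m/z bins, observed peaks occupy bins split into intensity classes, and a peptide predicts `predicted` fragment bins; the probability of a particular split of matches across classes, with the rest landing in empty bins, is a product of binomial coefficients over the total. The class sizes and counts in the tests are invented.

```python
from math import comb, log10


def mvh_probability(total_bins, class_sizes, matched, predicted):
    """Multivariate hypergeometric probability of one match outcome.

    total_bins:  number of m/z bins spanning the spectrum
    class_sizes: occupied bins per intensity class, e.g. [high, medium, low]
    matched:     predicted fragments that land in each class
    predicted:   number of predicted fragment bins for the peptide
    """
    empty_bins = total_bins - sum(class_sizes)
    unmatched = predicted - sum(matched)
    ways = comb(empty_bins, unmatched)
    for size, k in zip(class_sizes, matched):
        ways *= comb(size, k)
    return ways / comb(total_bins, predicted)


def mvh_score(*args):
    """-log10 of the probability: larger means less likely by chance."""
    return -log10(mvh_probability(*args))
```

If the high-intensity class is the smallest (an assumption about how classes are sized), three matches among intense peaks are far less likely by chance than three matches among minor peaks, which is how the model rewards matching intense peaks.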

13.
The utility and advantages of the recently introduced two-dimensional quadrupole ion trap mass spectrometer in proteomics, compared with the traditional three-dimensional ion trap mass spectrometer, have not been systematically characterized. Here we rigorously compared the performance of these two platforms using over 100,000 tandem mass spectra acquired with identical complex peptide mixtures and acquisition parameters. Specifically, we compared four factors that are critical for a successful proteomic study: 1) the number of proteins identified, 2) sequence coverage, or the number of peptides identified per protein, 3) the SEQUEST database-matching scores XCorr and Sp, and 4) the quality of the fragment ion series of peptides. We found a 4- to 6-fold increase in the number of peptides and proteins identified on the two-dimensional ion trap mass spectrometer, a direct result of improvements in all the other parameters examined. Interestingly, more than 70% of the doubly and triply charged peptides, but not the singly charged peptides, showed better-quality fragmentation spectra on the two-dimensional ion trap. These results highlight specific advantages of the two-dimensional ion trap over conventional three-dimensional ion traps for protein identification in proteomic experiments.

14.
High‐resolution MS/MS spectra of peptides can be deisotoped to identify monoisotopic masses of peptide fragments. The use of such masses should improve protein identification rates. However, deisotoping is not universally used and its benefits have not been fully explored. Here, MS2‐Deisotoper, a tool for use prior to database search, is used to identify monoisotopic peaks in centroided MS/MS spectra. MS2‐Deisotoper works by comparing the mass and relative intensity of each peptide fragment peak to every other peak of greater mass, and by applying a set of rules concerning mass and intensity differences. After comprehensive parameter optimization, it is shown that MS2‐Deisotoper can improve the number of peptide spectrum matches (PSMs) identified by up to 8.2% and proteins by up to 2.8%. It is effective with SILAC and non‐SILAC MS/MS data. The identification of unique peptide sequences is also improved, increasing the number of human proteoforms by 3.7%. Detailed investigation of results shows that deisotoping increases Mascot ion scores, improves FDR estimation for PSMs, and leads to greater protein sequence coverage. At a peptide level, it is found that the efficacy of deisotoping is affected by peptide mass and charge. MS2‐Deisotoper can be used via a user interface or as a command‐line tool.
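
A rule-based deisotoper of the kind described above compares each peak with heavier peaks and removes those that look like its isotopes. The sketch below uses a single simplified rule set of our own, not MS2-Deisotoper's: an isotope sits at +1.003355/z (the ¹³C/¹²C spacing) above the previous chain member within a fixed tolerance and is less intense than it. The "less intense" rule is an assumption that holds mainly for smaller fragments, and a genuine fragment that happens to fall at such a spacing would also be removed.

```python
C13_SPACING = 1.003355  # 13C minus 12C mass (Da)


def deisotope(peaks, tol=0.01, charges=(1, 2)):
    """Keep likely monoisotopic peaks from centroided (m/z, intensity) pairs."""
    peaks = sorted(peaks)
    isotope = [False] * len(peaks)
    for i, (mz, intensity) in enumerate(peaks):
        if isotope[i]:
            continue  # already explained as part of a lighter peak's envelope
        for z in charges:
            prev_intensity, expected = intensity, mz + C13_SPACING / z
            for j in range(i + 1, len(peaks)):
                mz_j, int_j = peaks[j]
                if mz_j > expected + tol:
                    break  # no peak where the next isotope should be
                if abs(mz_j - expected) <= tol and int_j < prev_intensity:
                    isotope[j] = True  # extend the isotope chain
                    prev_intensity, expected = int_j, expected + C13_SPACING / z
    return [p for p, iso in zip(peaks, isotope) if not iso]


print(deisotope([(500.0, 100), (501.003355, 50), (502.00671, 10), (600.0, 80)]))
```

A fuller implementation would also convert each kept peak to its singly charged mass once the charge is known; that step is omitted here.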

15.
Generating all plausible de novo interpretations of a peptide tandem mass (MS/MS) spectrum (a Spectral Dictionary) and quickly matching them against the database is a recently emerged alternative approach to peptide identification. However, the size of a Spectral Dictionary grows rapidly with peptide length, making its generation impractical for long peptides. We introduce Gapped Spectral Dictionaries (all plausible de novo interpretations with gaps), which can be easily generated for any peptide length, thus addressing this limitation of the Spectral Dictionary approach. We show that Gapped Spectral Dictionaries are small, opening the possibility of using them to speed up MS/MS searches. Our MS-Gapped-Dictionary algorithm (based on Gapped Spectral Dictionaries) enables proteogenomics applications (such as searches in the six-frame translation of the human genome) that are prohibitively time-consuming with existing approaches. MS-Gapped-Dictionary generates gapped peptides that occupy a niche between accurate but short peptide sequence tags and long but inaccurate full-length peptide reconstructions. We show that, contrary to conventional wisdom, some high-quality spectra do not have good peptide sequence tags, and we introduce gapped tags that have advantages over conventional peptide sequence tags in MS/MS database searches.

16.
Zhang N, Li XJ, Ye M, Pan S, Schwikowski B, Aebersold R. Proteomics, 2005, 5(16): 4096-4106
In MS/MS experiments with automated precursor ion selection, only a fraction of sequencing attempts lead to the successful identification of a peptide. A number of factors may contribute to this, including poor fragmentation of the selected precursor ion, the presence of modified residues in the peptide, mismatches with sequence databases, and, frequently, the concurrent fragmentation of multiple precursors in the same CID attempt. Current database search engines cannot correctly assign the sequences of multiple precursors to such spectra. We have developed a search engine, ProbIDtree, which can identify multiple peptides from a CID spectrum generated by the concurrent fragmentation of multiple precursor ions. This is achieved by iterative database searching, in which each resubmitted spectrum is generated by subtracting the fragment ions assigned to a tentatively matched peptide from the acquired spectrum, and each match is assigned a tentative probability score. Tentatively matched peptides are organized in a tree structure from which their adjusted probability scores are calculated and used to determine the correct identifications. Results using MALDI-TOF-TOF MS/MS data demonstrate that multiple peptides can be identified simultaneously and with high confidence using ProbIDtree.
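
The subtract-and-research loop described above can be sketched greedily. This is a simplification, not ProbIDtree: it follows a single path instead of building a tree of tentative matches with probabilities, it scores candidates by counting matched peaks, and the `candidates` mapping (peptide to theoretical fragment m/z values) is a hypothetical input that a real engine would generate from a database.

```python
def matched_peaks(peaks, fragments, tol):
    """Peaks lying within `tol` of any theoretical fragment m/z."""
    return [p for p in peaks if any(abs(p[0] - f) <= tol for f in fragments)]


def iterative_identify(peaks, candidates, tol=0.5, min_matches=3, max_rounds=3):
    """Greedy sketch: match, subtract the explained peaks, search the residual.

    candidates: {peptide: [theoretical fragment m/z]} (hypothetical input).
    """
    found, residual = [], list(peaks)
    for _ in range(max_rounds):
        remaining = [pep for pep in candidates if pep not in found]
        if not remaining:
            break
        best = max(remaining, key=lambda pep: len(matched_peaks(residual, candidates[pep], tol)))
        hits = matched_peaks(residual, candidates[best], tol)
        if len(hits) < min_matches:
            break  # the residual spectrum no longer supports another peptide
        found.append(best)
        residual = [p for p in residual if p not in hits]
    return found


spectrum = [(100, 1), (200, 1), (300, 1), (150, 1), (250, 1), (350, 1), (999, 1)]
cands = {"A": [100, 200, 300], "B": [150, 250, 350], "C": [100, 999]}
print(iterative_identify(spectrum, cands))
```

Replacing the peak count with a probability score and keeping several tentative matches per round, rather than only the best one, would move this sketch toward the tree structure the abstract describes.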

17.
Pachl F, Fellenberg K, Wagner C, Kuster B. Proteomics, 2012, 12(9): 1328-1332
Isobaric tagging using reagents such as tandem mass tags (TMT) and isobaric tags for relative and absolute quantification (iTRAQ) has become a popular tool for mass spectrometry-based quantitative proteomics. Because the peptide quantification information is collected in tandem mass spectra, the accuracy and precision of this method depend largely on the resolution with which precursor ions can be selected for fragmentation and on the specificity of the generated reporter ions. The latter can become an issue if near-isobaric ion signals are present in such spectra, because they may distort quantification results. We propose a simple remedy for this problem: identifying reporter ions via the accurate mass differences within a single tandem mass spectrum instead of applying fixed mass error tolerances to all tandem mass spectra. Our results show that this leads to unambiguous reporter ion identification and complete removal of interfering signals. This mode of data processing is easily implemented in software and offers advantages for protein quantification based on few peptides.
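
One way to exploit mass differences within a single spectrum is to estimate a per-spectrum calibration offset from the reporter region and then match each reporter with a tight tolerance around the recalibrated masses. This is a sketch of that idea, not the authors' algorithm: the approximate TMT 6-plex m/z values are for illustration only (check them against the reagent vendor's specification), and the tolerances and median-offset rule are assumptions.

```python
import statistics

# Approximate TMT 6-plex reporter ion m/z values (illustrative; verify before use).
TMT6 = {
    "126": 126.127726, "127": 127.124761, "128": 128.134436,
    "129": 129.131471, "130": 130.141145, "131": 131.138180,
}


def reporter_intensities(peaks, expected=TMT6, wide_tol=0.02, tight_tol=0.002):
    """{channel: intensity} from (m/z, intensity) peaks of one spectrum."""
    # 1) Provisional matches: the most intense peak within a wide window.
    provisional = {}
    for name, mz in expected.items():
        near = [p for p in peaks if abs(p[0] - mz) <= wide_tol]
        if near:
            provisional[name] = max(near, key=lambda p: p[1])
    if not provisional:
        return {}
    # 2) Spectrum-specific offset: the median provisional error. The spacing
    #    between reporter ions is fixed, so a shared calibration shift cancels.
    offset = statistics.median(p[0] - expected[n] for n, p in provisional.items())
    # 3) Final assignment with a tight tolerance around the recalibrated masses.
    result = {}
    for name, mz in expected.items():
        close = [p for p in peaks if abs(p[0] - offset - mz) <= tight_tol]
        if close:
            result[name] = min(close, key=lambda p: abs(p[0] - offset - mz))[1]
    return result


shifted = [(126.132726, 100.0), (127.129761, 80.0), (127.135, 20.0), (128.139436, 60.0)]
print(reporter_intensities(shifted))
```

In this example every reporter is shifted by +0.005 m/z: a fixed 0.002 tolerance around the nominal masses would find none of them, while a fixed 0.02 tolerance would also admit the near-isobaric peak at 127.135. After recalibration, only the true reporters pass.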

18.
Phosphorylation is the most studied of all the posttranslational modifications of proteins. Mass spectrometry has emerged as a powerful tool for phosphomapping on proteins and peptides. Collision-induced dissociation (CID) of phosphopeptides leads to the loss of phosphoric or metaphosphoric acid as a neutral molecule, producing an intense neutral-loss product ion in the mass spectrum. Further dissociation of this neutral-loss product ion reveals the peptide sequence. This method, data-dependent constant neutral loss (DDNL) scanning, has been commonly used for mapping phosphopeptides. However, preferential losses of groups other than phosphate are frequently observed during CID of phosphopeptides. Ions resulting from such losses are not selected during DDNL analysis, because the scan is predetermined to look for phosphate loss. In this study, we describe an alternative approach for improved identification of phosphopeptides: sequential abundant ion fragmentation analysis (SAIFA). In this approach, no neutral loss is predetermined; instead, the most abundant peak is fragmented sequentially, irrespective of the moiety lost during CID. In addition to improving phosphomapping, the method increases the sequence coverage of the proteins identified, thereby increasing the confidence of protein identification. To the best of our knowledge, this is the first report to use SAIFA for phosphopeptide identification.
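
The difference between the two acquisition rules can be shown as two small selection functions. These are illustrative sketches of the decision logic, not instrument firmware: the tolerance, the (m/z, intensity) peak format, and the function names are assumptions. DDNL triggers only when a peak appears at the precursor m/z minus the H3PO4 loss (about 97.9769 Da, divided by charge), whereas the SAIFA-style rule simply picks the most abundant product ion.

```python
H3PO4 = 97.976895  # monoisotopic mass of phosphoric acid (Da)


def ddnl_trigger(precursor_mz, charge, peaks, tol=0.5):
    """DDNL: fragment further only if the expected H3PO4-loss peak is present."""
    target = precursor_mz - H3PO4 / charge
    return any(abs(mz - target) <= tol for mz, _ in peaks)


def saifa_next(precursor_mz, peaks, tol=0.5):
    """SAIFA-style choice: the most abundant product ion, whatever was lost."""
    products = [p for p in peaks if abs(p[0] - precursor_mz) > tol]
    return max(products, key=lambda p: p[1])[0] if products else None


# Doubly charged precursor whose dominant product ion is a water loss (about 9.005 m/z).
peaks = [(800.0, 30.0), (790.99, 100.0), (300.0, 20.0)]
print(ddnl_trigger(800.0, 2, peaks), saifa_next(800.0, peaks))
```

Here DDNL does not trigger, because there is no peak near 751.01, while the SAIFA-style rule still selects the water-loss ion at 790.99 for the next fragmentation stage.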

19.
MOTIVATION: The identification of peptides by tandem mass spectrometry (MS/MS) is a central method of proteomics research, but due to the complexity of MS/MS data and the large databases searched, the accuracy of peptide identification algorithms remains limited. To improve the accuracy of identification, we applied a machine-learning approach using a hidden Markov model (HMM) to capture the complex and often subtle links between a peptide sequence and its MS/MS spectrum. MODEL: Our model, HMM_Score, represents ion types as HMM states and calculates the maximum joint probability for a peptide/spectrum pair using emission probabilities from three factors: the amino acids adjacent to each fragmentation site, the mass dependence of ion types, and the intensity dependence of ion types. The Viterbi algorithm is used to calculate the most probable assignment between ion types in a spectrum and a peptide sequence; a correction factor is then added to account for the model's propensity to favor longer peptides. An expectation value is calculated from the model score to assess the significance of each peptide/spectrum match. RESULTS: We trained and tested HMM_Score on three data sets generated by two different mass spectrometer types. For a reference data set recently reported in the literature and validated using seven identification algorithms, HMM_Score produced 43% more positive identifications at a 1% false positive rate than the better of two other commonly used algorithms, Mascot and X!Tandem. HMM_Score is a highly accurate platform for peptide identification that works well for a variety of mass spectrometer and biological sample types. AVAILABILITY: The program is freely available on ProteomeCommons under an open-source license. See http://bioinfo.unc.edu/downloads/ for the download link.

20.
A database of high-mass-accuracy tryptic peptides has been created. The database contains 15,897 unique, annotated MS/MS spectra. Peptides can be searched by mass, number of missed cleavages, and sequence motif. All of the data in the database can be downloaded, and each spectrum can be visualized. An example is presented of how the database can be used to study peptide fragmentation: the fragmentation of different types of peptides with missed cleavages has been studied, and the results can be used to improve the identification of such peptides.
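
The three query criteria mentioned above (mass, missed cleavages, sequence motif) map onto a simple filter. This sketch is not the database's own interface: the records are hypothetical (peptide, mass) pairs, motifs are written as Python regular expressions, and a K or R followed by P is still counted as a missed cleavage, which some tools treat differently.

```python
import re


def missed_cleavages(peptide):
    """Internal K/R residues; the C-terminal residue is the cleavage site itself.

    Assumption: K/R followed by P is still counted as a missed cleavage.
    """
    return sum(1 for aa in peptide[:-1] if aa in "KR")


def query(entries, min_mass, max_mass, max_missed=None, motif=None):
    """Filter (peptide, mass) records by mass range, missed cleavages and regex motif."""
    hits = []
    for peptide, mass in entries:
        if not (min_mass <= mass <= max_mass):
            continue
        if max_missed is not None and missed_cleavages(peptide) > max_missed:
            continue
        if motif is not None and not re.search(motif, peptide):
            continue
        hits.append(peptide)
    return hits


records = [("AKLLR", 600.0), ("PEPTIDEK", 927.4), ("GKKR", 480.0)]  # hypothetical
print(query(records, 400, 1000, max_missed=1))  # ['AKLLR', 'PEPTIDEK']
```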

Copyright © Beijing Qinyun Technology Development Co., Ltd. (Beijing ICP No. 09084417)