Similar literature
1.
Peptide mass fingerprinting (PMF), despite having become complementary to tandem mass spectrometry for protein identification, is still the subject of in-depth study because of its higher sample throughput, higher level of specificity for single peptides and lower level of sensitivity to unexpected post-translational modifications compared with tandem mass spectrometry. In this study, we propose, implement and evaluate a uniform approach using support vector machines to incorporate individual concepts and conclusions for accurate PMF. We focus on the inherent attributes and critical issues of the theoretical spectrum (peptides), the experimental spectrum (peaks) and spectrum (masses) alignment. Eighty-one feature-matching patterns derived from cleavage type, uniqueness and variable masses of theoretical peptides, together with the intensity rank of experimental peaks, were proposed to characterize the matching profile of the peptide mass fingerprinting procedure. We developed a new strategy that redistributes matched peak intensity to handle shared peak intensities, and 440 parameters were generated to digitalize each feature-matching pattern. High performance on an evaluation data set of 137 items was finally achieved by the optimal multi-criteria support vector machines approach, with 491 final features out of a feature vector of 35,640 normalized features, through cross-training and validation on a publicly available "gold standard" peptide mass fingerprinting data set of 1733 items. Compared with the Mascot, MS-Fit, ProFound and Aldente algorithms commonly used for MS-based protein identification, the feature-matching patterns algorithm has a greater ability to clearly separate correct identifications and random matches, with the highest values for sensitivity (82%), precision (97%) and F1-measure (89%) of protein identification. Several conclusions reached via this research make general contributions to MS-based protein identification.
Firstly, inherent attributes showed comparable or even greater robustness than explicit ones. As an inherent attribute of an experimental spectrum, peak intensity should receive considerable attention during protein identification. Secondly, alignment between intense experimental peaks and properly digested, unique or non-modified theoretical peptides is very likely to occur in positive peptide mass fingerprinting. Finally, normalization by several types of harmonic factors, including missed cleavages and mass modification, can make important contributions to the performance of the procedure.

2.
Peptide mass fingerprinting (PMF) is among the principal methods of contemporary proteomic analysis. While PMF is routinely practiced in many laboratories, the complexity of protein tryptic digests is such that PMF based on unrefined mass spectrometric peak lists is often inconclusive. A number of data processing strategies have thus been designed to improve the quality of PMF peak lists, and the development of increasingly elaborate tools for PMF data reduction remains an active area of research. In this report, a novel and direct means of PMF peak list enhancement is suggested. Since the monoisotopic mass of a peptide must fall within a predictable range of residual values, PMF peak lists can in principle be relieved of many non-peptide signals solely on the basis of accurately determined monoisotopic mass. The calculations involved are relatively simple, making implementation of this scheme computationally facile. When this procedure for peak list processing was used, the large number of unassigned masses typical of PMF peak lists was considerably attenuated. As a result, protein identifications could be made with greater confidence and improved discrimination as compared to PMF queries submitted with raw peak lists. Importantly, this scheme for removal of non-peptide masses was found to conserve peptides bearing various post-translational and artificial modifications. All PMF experiments discussed here were performed using Fourier transform ion cyclotron resonance mass spectrometry (FTICR-MS), which provided the high mass resolution and high mass accuracy essential for this application. Previously reported equations relating the nominal peptide mass to the permissible range of fractional peptide masses were slightly modified for this application, and these adjustments have been illustrated in detail. The role of mass accuracy in application of this scheme has also been explored.
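The filtering idea in this abstract can be illustrated with a rough sketch. This is not the authors' set of equations, which the abstract does not reproduce. It assumes the common rule of thumb that an average peptide's monoisotopic mass is about 1.000495 times its nominal mass, and it uses a fixed tolerance `tol`; the scaling factor, the tolerance and the function names are all assumptions made for illustration.

```python
SCALE = 1.000495  # assumed average ratio of monoisotopic to nominal peptide mass


def looks_like_peptide(mass, tol=0.1):
    """Return True if the fractional part of `mass` is plausible for a peptide."""
    nominal = round(mass / SCALE)            # nearest nominal (integer) mass
    expected_excess = nominal * (SCALE - 1)  # mass excess expected for a peptide
    observed_excess = mass - nominal
    return abs(observed_excess - expected_excess) <= tol


def filter_peaks(masses, tol=0.1):
    """Keep only masses whose fractional part falls in the assumed peptide range."""
    return [m for m in masses if looks_like_peptide(m, tol)]


if __name__ == "__main__":
    # 1296.685 has a peptide-like mass excess; 1296.20 does not.
    print(filter_peaks([1296.685, 1296.20]))
```

A fixed tolerance is the simplest choice; in practice the permissible range widens with mass, so a mass-dependent tolerance would likely be more faithful to the scheme described.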

3.
For the identification of peptides with tandem mass spectrometry (MS/MS), many software tools rely on the comparison between an experimental spectrum and a theoretically predicted spectrum. Consequently, the accurate prediction of the theoretical spectrum from a peptide sequence can potentially improve the peptide identification performance and is an important problem for mass spectrometry based proteomics. In this study a new approach, called MS-Simulator, is presented for predicting the y-ion intensities in the spectrum of a given peptide. The new approach focuses on the accurate prediction of the relative intensity ratio between every two adjacent y-ions. The theoretical spectrum can then be derived from these ratios. The prediction of a ratio is a closed-form equation that involves up to five consecutive amino acids near the two y-ions and the two peptide termini. Compared with another existing spectrum prediction tool, MassAnalyzer, the new approach not only simplifies the computation, but also improves the prediction accuracy.
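The step of deriving a spectrum from adjacent-ion ratios can be sketched as a running product. The snippet below does not model how MS-Simulator predicts the ratios; it only shows, under the assumption that the ratios are already available, how relative y-ion intensities follow from them. The function name and the normalization to the most intense ion are illustrative choices.

```python
def intensities_from_ratios(ratios):
    """Turn adjacent-ion ratios into relative intensities.

    ratios[i] is assumed to be I(y_{i+2}) / I(y_{i+1}).
    The result is scaled so that the most intense ion equals 1.0.
    """
    intensities = [1.0]                      # arbitrary starting intensity for y1
    for r in ratios:
        intensities.append(intensities[-1] * r)
    top = max(intensities)
    return [x / top for x in intensities]


if __name__ == "__main__":
    # y2 is twice as intense as y1, and y3 is half as intense as y2.
    print(intensities_from_ratios([2.0, 0.5]))
```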

4.
Ding Q, Xiao L, Xiong S, Jia Y, Que H, Guo Y, Liu S. Proteomics, 2003, 3(7): 1313-1317
Unmatched masses are often observed in the experimental peptide mass spectra when database searching is performed with the ProFound program. Comparison between theoretical and experimental mass spectra of standard proteins shows that contamination accounts for most of the unmatched masses. In this retrospective analysis, the top 100 most probable contaminating masses, as listed in order of their probability, are statistically filtered out from 118 different experimental peptide mass fingerprinting (PMF) maps. Most of the interfering masses originate from trypsin autolysis and human keratins. Subtraction of known contaminants from raw data and using cleaner masses for searching can enhance protein identification by PMF.
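Subtracting known contaminants from a raw peak list can be sketched as a tolerance-based filter. The two masses below are approximate, commonly quoted [M+H]+ values for porcine trypsin autolysis peptides and are included only as placeholders; they are not the contaminant list from this study, and the 50 ppm tolerance and function name are assumptions.

```python
# Illustrative contaminant masses (approximate; verify against your own reference list).
CONTAMINANTS = [842.5094, 2211.1040]


def remove_contaminants(peaks, contaminants=CONTAMINANTS, ppm=50.0):
    """Drop every peak that lies within `ppm` of a known contaminant mass."""
    def is_contaminant(mass):
        return any(abs(mass - c) / c * 1e6 <= ppm for c in contaminants)

    return [m for m in peaks if not is_contaminant(m)]


if __name__ == "__main__":
    raw = [842.51, 1045.56, 2211.10]
    print(remove_contaminants(raw))  # only the non-contaminant mass remains
```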

5.
Tandem mass spectrometry (MS/MS) has emerged as a cornerstone of proteomics owing in part to robust spectral interpretation algorithms. Widely used algorithms do not fully exploit the intensity patterns present in mass spectra. Here, we demonstrate that intensity pattern modeling improves peptide and protein identification from MS/MS spectra. We modeled fragment ion intensities using a machine-learning approach that estimates the likelihood of observed intensities given peptide and fragment attributes. From 1,000,000 spectra, we chose 27,000 with high-quality, nonredundant matches as training data. Using the same 27,000 spectra, intensity was similarly modeled with mismatched peptides. We used these two probabilistic models to compute the relative likelihood of an observed spectrum given that a candidate peptide is matched or mismatched. We used a 'decoy' proteome approach to estimate incorrect match frequency, and demonstrated that an intensity-based method reduces peptide identification error by 50-96% without any loss in sensitivity.

6.
The high-throughput nature of proteomics mass spectrometry is enabled by a productive combination of data acquisition protocols and the computational tools used to interpret the resulting spectra. One of the key components in mainstream protocols is the generation of tandem mass (MS/MS) spectra by peptide fragmentation using collision induced dissociation, the approach currently used in the large majority of proteomics experiments to routinely identify hundreds to thousands of proteins from single mass spectrometry runs. Complementary to these, alternative peptide fragmentation methods such as electron capture/transfer dissociation and higher-energy collision dissociation have consistently achieved significant improvements in the identification of certain classes of peptides, proteins, and post-translational modifications. Recognizing these advantages, mass spectrometry instruments now conveniently support fine-tuned methods that automatically alternate between peptide fragmentation modes for either different types of peptides or for acquisition of multiple MS/MS spectra from each peptide. But although these developments have the potential to substantially improve peptide identification, their routine application requires corresponding adjustments to the software tools and procedures used for automated downstream processing. This review discusses the computational implications of alternative and alternate modes of MS/MS peptide fragmentation and addresses some practical aspects of using such protocols for identification of peptides and post-translational modifications.

7.
Peptide identification by tandem mass spectrometry is the dominant proteomics workflow for protein characterization in complex samples. The peptide fragmentation spectra generated by these workflows exhibit characteristic fragmentation patterns that can be used to identify the peptide. In other fields, where the compounds of interest do not have the convenient linear structure of peptides, fragmentation spectra are identified by comparing new spectra with libraries of identified spectra, an approach called spectral matching. In contrast to sequence-based tandem mass spectrometry search engines used for peptides, spectral matching can make use of the intensities of fragment peaks in library spectra to assess the quality of a match. We evaluate a hidden Markov model approach (HMMatch) to spectral matching, in which many examples of a peptide's fragmentation spectrum are summarized in a generative probabilistic model that captures the consensus and variation of each peak's intensity. We demonstrate that HMMatch has good specificity and superior sensitivity, compared to sequence database search engines such as X!Tandem. HMMatch achieves good results from relatively few training spectra, is fast to train, and can evaluate many spectra per second. A statistical significance model permits HMMatch scores to be compared with each other, and with other peptide identification tools, on a unified scale. HMMatch shows a similar degree of concordance with X!Tandem, Mascot, and NIST's MS Search, as they do with each other, suggesting that each tool can assign peptides to spectra that the others miss. Finally, we show that it is possible to extrapolate HMMatch models beyond a single peptide's training spectra to the spectra of related peptides, expanding the application of spectral matching techniques beyond the set of peptides previously observed.

8.
Despite a recent surge of interest in database-independent peptide identifications, accurate de novo peptide sequencing remains an elusive goal. While the recently introduced spectral network approach resulted in accurate peptide sequencing in low-complexity samples, its success depends on the chance presence of spectra from overlapping peptides. On the other hand, while multistage mass spectrometry (collecting multiple MS3 spectra from each MS2 spectrum) can be applied to all spectra in a complex sample, there are currently no software tools for de novo peptide sequencing by multistage mass spectrometry. We describe a rigorous probabilistic framework for analyzing spectra of overlapping peptides and show how to apply it for multistage mass spectrometry. Our software results in both accurate de novo peptide sequencing from multistage mass spectra (despite the inferior quality of MS3 spectra) and improved interpretation of spectral networks. We further study the problem of de novo peptide sequencing with accurate parent mass (but inaccurate fragment masses), the protocol that may soon become the dominant mode of spectral acquisition. Most existing peptide sequencing algorithms (based on the spectrum graph approach) do not track the accurate parent mass and are thus not equipped for solving this problem. We describe a de novo peptide sequencing algorithm aimed at this experimental protocol and show that it improves the sequencing accuracy on both tandem and multistage mass spectrometry.

9.
Protein identification by matrix-assisted laser desorption/ionization mass-spectrometry peptide mass fingerprinting (MALDI-MS PMF) represents a cornerstone of proteomics. However, it often fails to identify low-molecular-mass proteins, protein fragments, and protein mixtures reliably. To overcome these limitations, PMF can be complemented by tandem mass spectrometry and other search strategies for unambiguous protein identification. The present study explores the advantages of using a MALDI-MS-based approach, designated the minimal protein identifier (MPI) approach, for protein identification. This is illustrated for culture supernatant (CSN) proteins of Mycobacterium tuberculosis H37Rv after separation by two-dimensional gel electrophoresis (2-DE). The MPI approach takes into consideration that proteins yield characteristic peptides upon proteolytic cleavage. In this study, peptide mixtures derived from tryptic protein cleavage were analyzed by MALDI-MS and the resulting spectra were compared with template spectra of previously identified counterparts. The MPI approach allowed protein identification by a few protein-specific signature peptide masses and revealed truncated variants of mycobacterial elongation factor EF-Tu, previously not identified by PMF. Furthermore, the MPI approach can be employed to track proteins in 2-DE gels, as demonstrated for the 14 kDa antigen, the 10 kDa chaperone, and the conserved hypothetical protein Rv0569 of M. tuberculosis H37Rv. Finally, it is shown that the power of the MPI approach strongly depends on distinct factors, most notably on the complexity of the proteome analyzed and the accuracy of the mass spectrometer used for peptide mass determination.

10.
In high-throughput proteomics the development of computational methods and novel experimental strategies often rely on each other. In certain areas, mass spectrometry methods for data acquisition are ahead of computational methods to interpret the resulting tandem mass spectra. In particular, although there are numerous situations in which a mixture tandem mass spectrum can contain fragment ions from two or more peptides, nearly all database search tools still make the assumption that each tandem mass spectrum comes from one peptide. Common examples include mixture spectra from co-eluting peptides in complex samples, spectra generated from data-independent acquisition methods, and spectra from peptides with complex post-translational modifications. We propose a new database search tool (MixDB) that is able to identify mixture tandem mass spectra from more than one peptide. We show that peptides can be reliably identified with up to 95% accuracy from mixture spectra while considering only 0.01% of all possible peptide pairs (four orders of magnitude speedup). Comparison with current database search methods indicates that our approach has better or comparable sensitivity and precision at identifying single-peptide spectra while simultaneously being able to identify 38% more peptides from mixture spectra at significantly higher precision.

11.
The identification of peptides and proteins from fragmentation mass spectra is a very common approach in the field of proteomics. Contemporary high-throughput peptide identification pipelines can quickly produce large quantities of MS/MS data that contain valuable knowledge about the actual physicochemical processes involved in the peptide fragmentation process, which can be extracted through extensive data mining studies. As these studies attempt to exploit the intensity information contained in the MS/MS spectra, a critical step required for a meaningful comparison of this information between MS/MS spectra is peak intensity normalization. We here describe a procedure for quantifying the efficiency of different published normalization methods in terms of the quartile coefficient of dispersion (qcod) statistic. The quartile coefficient of dispersion is applied to measure the dispersion of the peak intensities between redundant MS/MS spectra, allowing the quantification of the differences in computed peak intensity reproducibility between the different normalization methods. We demonstrate that our results are independent of the data set used in the evaluation procedure, allowing us to provide generic guidance on the choice of normalization method to apply in a certain MS/MS pipeline application.
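The quartile coefficient of dispersion is defined as (Q3 - Q1) / (Q3 + Q1), where Q1 and Q3 are the first and third quartiles. The sketch below computes it for the intensities of one fragment peak across redundant spectra. The choice of the "inclusive" quartile method and the function name are assumptions; the abstract does not say which quartile convention the authors used.

```python
from statistics import quantiles


def qcod(values):
    """Quartile coefficient of dispersion: (Q3 - Q1) / (Q3 + Q1).

    Intended for positive values such as normalized peak intensities.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return (q3 - q1) / (q3 + q1)


if __name__ == "__main__":
    # Normalized intensity of the same fragment peak in five redundant spectra.
    print(qcod([1, 2, 3, 4, 5]))
```

A lower value indicates that a normalization method yields more reproducible intensities for the same peak across redundant spectra.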

12.

Background

We developed a new version of the open source software package Peptrix that can now compare large numbers of Orbitrap LC-MS data files. The peptide profiling results for Peptrix on MS1 spectra were compared with those obtained from a small selection of open source and commercial software packages: msInspect, Sieve and Progenesis. The properties compared in these packages were speed, total number of detected masses, redundancy of masses, reproducibility in numbers and CV of intensity, overlap of masses, and differences in peptide peak intensities. Reproducibility measurements were taken for the different MS1 software applications by measuring in triplicate a complex peptide mixture of immunoglobulin on the Orbitrap mass spectrometer. Values of peptide masses detected from the high intensity peaks of the MS1 spectra by peptide profiling were verified with values of the MS2 fragmented and sequenced masses that resulted in protein identifications with a significant score.

Findings

Peptrix finds about the same number of peptide features as the other packages, but peptide masses are in some cases approximately 5 to 10 times less redundantly present in the peptide profile matrix. The Peptrix profile matrix displays the largest overlap when comparing the number of masses in a pair between two software applications. The overlap of peptide masses between software packages for low intensity peaks in the spectra is remarkably low, at about 50% of the detected masses in the individual packages. Peptrix does not differ from the other packages in detecting 96% of the masses that relate to highly abundant sequenced proteins. MS1 peak intensities vary between the applications in a nonlinear way, as they are not processed using the same method.

Conclusions

Peptrix is capable of peptide profiling using Orbitrap files and finding differentially expressed peptides in body fluid and tissue samples. The number of peptide masses detected in Orbitrap files can be increased by using more MS1 peptide profiling applications, including Peptrix, since the comparison of Peptrix with the other applications suggests that all software packages likely have a high false negative rate for low intensity peptide peaks (missing peptides).

13.
The development of high throughput utilities to identify proteins is a major challenge in present research in the field of proteomics. One such utility, the molecular scanner, uses proteins separated by two-dimensional polyacrylamide gel electrophoresis that are digested in the gel and during transfer onto a collecting membrane. After adding a matrix, the membrane is inserted into a matrix-assisted laser desorption/ionization-time of flight mass spectrometer and a peptide mass fingerprint (PMF) is measured for every scanned site. Since the spacing between scanned sites is much smaller than the size of the most abundant protein spots, there is a certain redundancy in the data that was used in an earlier experiment with Escherichia coli [1] to improve mass calibration and PMF identification results. It was observed that the signal intensity of a peptide mass as a function of the position on the membrane showed similar patterns if peptides stemmed from the same protein. Taking account of these similarities a clustering algorithm was used to find lists of experimental masses with similar intensity distributions, which provided clearer identification of the corresponding proteins. Here, these methods are applied to a human plasma scan, where proteins were highly modified and less separated. The presence of very abundant proteins like albumin and immunoglobulins added another difficulty. The calibration of the initial PMFs was not satisfactory and masses had to be recalibrated. After discarding chemical noise, the membrane was partitioned into regions and for each region protein identification was carried out separately. A new scoring method was used, where the PMF score was multiplied by a factor that measures the similarity of matching peptides. This method proved to be more robust than the method developed in [1] if the region where a protein was found had an extended, nonspherical shape and strong overlap with regions of other proteins. 
Many proteins annotated on the SWISS-2D PAGE human plasma master gel could be clearly identified and many interesting properties were observed.

14.
Separation and identification of hydrophobic membrane proteins is a major challenge in proteomics. Identification of such sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE)-separated proteins by peptide mass fingerprinting (PMF) via matrix-assisted laser desorption/ionization-time of flight (MALDI-TOF) is frequently hampered by the insufficient amount of peptides being generated and their low signal intensity. Using the seven helical transmembrane-spanning proton pump bacteriorhodopsin as model protein, we demonstrate here that SDS removal from hydrophobic proteins by ion-pair extraction prior to in-gel tryptic proteolysis leads to a tenfold higher sensitivity in mass spectrometric identification via PMF, with respect to initial protein load on SDS-PAGE. Furthermore, parallel sequencing of the generated peptides by electrospray ionization-mass spectrometry (ESI-MS) and tandem mass spectrometry (MS/MS) was possible without further sample cleanup. We also show identification of other membrane proteins by this protocol, as proof of general applicability.

15.

Background  

Peptide Mass Fingerprinting (PMF) is a widely used mass spectrometry (MS) method of analysis of proteins and peptides. It relies on the comparison between experimentally determined and theoretical mass spectra. The PMF process requires calibration, usually performed with external or internal calibrants of known molecular masses.

16.
Timely classification and identification of bacteria is of vital importance in many areas of public health. Mass spectrometry-based methods provide an attractive alternative to well-established microbiologic procedures. Mass spectrometry methods can be characterized by the relatively high speed of acquiring taxonomically relevant information. Gel-free mass spectrometry proteomics techniques allow for rapid fingerprinting of bacterial proteins using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry or, for high-throughput sequencing of peptides from protease-digested cellular proteins, using mass analysis of fragments from collision-induced dissociation of peptide ions. The latter technique uses database searching of product ion mass spectra. A database contains a comprehensive list of protein sequences translated from protein-encoding open reading frames found in bacterial genomes. The results of such searches allow the assignment of experimental peptide sequences to matching theoretical bacterial proteomes. Phylogenetic profiles of sequenced peptides are then used to create a matrix of sequence-to-bacterium assignments, which are analyzed using numerical taxonomy tools. The results thereof reveal the relatedness between bacteria, and allow the taxonomic position of an investigated strain to be inferred.

18.
High-resolution MS/MS spectra of peptides can be deisotoped to identify monoisotopic masses of peptide fragments. The use of such masses should improve protein identification rates. However, deisotoping is not universally used and its benefits have not been fully explored. Here, MS2-Deisotoper, a tool for use prior to database search, is used to identify monoisotopic peaks in centroided MS/MS spectra. MS2-Deisotoper works by comparing the mass and relative intensity of each peptide fragment peak to every other peak of greater mass, and by applying a set of rules concerning mass and intensity differences. After comprehensive parameter optimization, it is shown that MS2-Deisotoper can improve the number of peptide spectrum matches (PSMs) identified by up to 8.2% and proteins by up to 2.8%. It is effective with SILAC and non-SILAC MS/MS data. The identification of unique peptide sequences is also improved, increasing the number of human proteoforms by 3.7%. Detailed investigation of results shows that deisotoping increases Mascot ion scores, improves FDR estimation for PSMs, and leads to greater protein sequence coverage. At a peptide level, it is found that the efficacy of deisotoping is affected by peptide mass and charge. MS2-Deisotoper can be used via a user interface or as a command-line tool.
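The general idea of comparing each peak with every heavier peak can be sketched as follows. This is a simplified illustration, not the MS2-Deisotoper rule set: it assumes an isotope spacing of about 1.00335 Da divided by the charge, a fixed m/z tolerance, and a single intensity rule (`max_ratio`), all of which are hypothetical parameters chosen for the example.

```python
ISOTOPE_SPACING = 1.00335  # approximate 13C - 12C mass difference in Da


def deisotope(peaks, charges=(1, 2), tol=0.01, max_ratio=1.5):
    """Remove peaks that look like isotopes of a lighter peak.

    peaks: list of (mz, intensity) tuples from a centroided spectrum.
    A heavier peak is flagged when its m/z gap to a lighter peak matches the
    isotope spacing for one of `charges` and its intensity is at most
    `max_ratio` times that of the lighter peak.
    """
    peaks = sorted(peaks)
    is_isotope = [False] * len(peaks)
    for i, (mz_i, int_i) in enumerate(peaks):
        for j in range(i + 1, len(peaks)):
            mz_j, int_j = peaks[j]
            gap = mz_j - mz_i
            if gap > ISOTOPE_SPACING + tol:
                break  # later peaks are too far away to be the next isotope
            for z in charges:
                if abs(gap - ISOTOPE_SPACING / z) <= tol and int_j <= max_ratio * int_i:
                    is_isotope[j] = True
    return [p for p, flagged in zip(peaks, is_isotope) if not flagged]


if __name__ == "__main__":
    spectrum = [(500.000, 100.0), (501.003, 30.0), (502.007, 8.0), (600.000, 50.0)]
    print(deisotope(spectrum))  # the two isotope peaks of 500.000 are dropped
```

Because flagged peaks are still used as reference peaks, a chain of isotopes (second, third and so on) is removed one step at a time.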

19.
For MALDI-TOF mass spectrometry, we show that the intensity of a peptide-ion peak is directly correlated with its sequence, with the residues M, H, P, R, and L having the most substantial effect on ionization. We developed a machine learning approach that exploits this relationship to significantly improve peptide mass fingerprint (PMF) accuracy based on training data sets from both true-positive and false-positive PMF searches. The model's cross-validated accuracy in distinguishing real versus false-positive database search results is 91%, rivaling the accuracy of MS/MS-based protein identification.

20.
We describe an approach to screen large sets of MALDI-MS mass spectra for protein isoforms separated on two-dimensional electrophoresis gels. Mass spectra are matched against each other by utilizing extracted peak mass lists and hierarchical clustering. The output is presented as dendrograms in which protein isoforms cluster together. Clustering could be applied to mass spectra from different sample sets, dates, and instruments, revealed similarities between mass spectra, and was a useful tool to highlight peptide peaks of interest for further investigation. Shared peak masses in a cluster could be identified and were used to create novel peak mass lists suitable for protein identification using peptide mass fingerprinting. Complex mass spectra consisting of more than one protein were deconvoluted using information from other mass spectra in the same cluster. The number of peptide peaks shared between mass spectra in a cluster was typically found to be larger than the number of peaks that matched to calculated peak masses in databases, thus modified peaks are probably among the shared peptides. Clustering increased the number of peaks associated with a given protein.
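Matching peak mass lists against each other requires some pairwise distance before any hierarchical clustering can be run. The sketch below shows one plausible distance based on the fraction of shared peaks; the abstract does not state which measure the authors used, so the formula, the 0.05 Da tolerance and the function names are assumptions.

```python
def shared_peak_count(short, long, tol=0.05):
    """Count peaks in `short` that have a partner in `long` within `tol` Da."""
    return sum(any(abs(x - y) <= tol for y in long) for x in short)


def peak_list_distance(a, b, tol=0.05):
    """Distance in [0, 1]: 0 when every peak of the shorter list is shared."""
    short, long = sorted((a, b), key=len)
    if not short:
        return 1.0  # an empty peak list shares nothing
    return 1.0 - shared_peak_count(short, long, tol) / len(short)


if __name__ == "__main__":
    spectrum_a = [1000.50, 1200.60, 1500.70]
    spectrum_b = [1000.52, 1500.71, 1800.90, 2000.00]
    print(peak_list_distance(spectrum_a, spectrum_b))
```

A matrix of such distances over all spectrum pairs could then be passed to any standard hierarchical clustering routine to produce a dendrogram.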
