Similar Documents
20 similar documents found (search time: 734 ms)
1.
Gay S  Binz PA  Hochstrasser DF  Appel RD 《Proteomics》2002,2(10):1374-1391
Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry has become a valuable tool in proteomics. With the increasing acquisition rate of mass spectrometers, one of the major issues is the development of accurate, efficient and automatic peptide mass fingerprinting (PMF) identification tools. Current tools are mostly based on counting the number of experimental peptide masses that match theoretical masses. Almost all of them use additional criteria such as isoelectric point, molecular weight, PTMs, taxonomy or enzymatic cleavage rules to enhance prediction performance. However, these identification tools seldom use peak intensities as a parameter, because there is currently no model that predicts intensities from the physicochemical properties of peptides. In this work, we used standard data-mining methods, such as classification and regression, to find correlations between peak intensities and the properties of the peptides composing a PMF spectrum. These methods were applied to a dataset comprising a series of PMF experiments involving 157 proteins. We found that C4.5 gave the most informative results for the classification task (predicting the presence or absence of a peptide in a spectrum) and M5' for the regression task (predicting the normalized intensity of a peptide peak). The C4.5 model correctly classified 88% of the theoretical peaks, whereas the peak intensities predicted by M5' had a correlation coefficient of 0.6743 with the experimental peak intensities. These methods yielded decision and model trees that can be used directly for prediction and identification of PMF results. This work lays the foundations of a method for analyzing the factors that influence peak intensity in PMF spectra. A simple extension of this analysis using a larger dataset could improve the accuracy of the results.
Additional peptide characteristics, or even PMF experimental parameters, can also be taken into account in the data-mining process to analyze their influence on peak intensity. Furthermore, this data-mining approach can certainly be extended to tandem mass spectrometry or other mass spectrometry-derived methods.
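C4.5, the classifier that performed best here, grows its decision tree by choosing, at each node, the attribute with the highest gain ratio (information gain divided by split information). A minimal sketch of that split criterion follows; the peptide attribute (C-terminal residue) and the presence/absence labels in the usage note are hypothetical, not taken from the 157-protein dataset.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (bits) of a list of categorical values."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def gain_ratio(values, labels):
    """C4.5-style gain ratio of splitting `labels` on the categorical attribute `values`."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy after the split, weighted by subset size.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information penalises attributes with many distinct values.
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0
```

For example, `gain_ratio(["R", "R", "R", "K", "K", "K"], [1, 1, 1, 0, 0, 1])` evaluates to about 0.459 (1 = peptide observed, 0 = not observed), while a split that separates the classes perfectly scores 1.0.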

2.
Whereas the bearing of mass measurement error on protein identification is sometimes underestimated, uncertainty in observed peptide masses unavoidably translates into ambiguity in subsequent protein identifications. Although ongoing instrumental advances continue to make high-accuracy mass spectrometry (MS) increasingly accessible, many proteomics experiments are still conducted with rather large mass error tolerances. In addition, the ranking schemes of most protein identification algorithms do not meaningfully incorporate mass measurement error. This article provides a critical evaluation of mass error tolerance as it pertains to false positive peptide and protein associations resulting from peptide mass fingerprint (PMF) database searching. High-accuracy, high-resolution PMFs of several model proteins were obtained using matrix-assisted laser desorption/ionization Fourier transform ion cyclotron resonance mass spectrometry (MALDI-FTICR-MS). Varying levels of mass accuracy were simulated by systematically modulating the mass error tolerance of the PMF query and monitoring the effect on figures of merit indicating PMF quality. Importantly, the benefits of decreased mass error tolerance are not reflected in Mowse scores at tolerances in the low parts-per-million range, but they become apparent when additional, often overlooked metrics are considered. Furthermore, the outcomes of these experiments support the concept that false discovery is closely tied to mass measurement error in PMF analysis. Clearly establishing this relation demonstrates the need for mass error-aware protein identification routines and argues for a more prominent contribution of high-accuracy mass measurement to proteomic science.
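The effect of the tolerance setting can be sketched by counting how many observed masses fall within a ±ppm window of some theoretical mass: the looser the window, the more matches (including spurious ones) are accepted. The masses below are made up for illustration, and this simple count is not the Mowse score.

```python
from bisect import bisect_left, bisect_right


def ppm_window(mass, ppm):
    """Absolute half-width (Da) of a +/- ppm tolerance around `mass`."""
    return mass * ppm * 1e-6


def count_matches(observed, theoretical, ppm):
    """Number of observed masses with at least one theoretical mass within +/- ppm."""
    theo = sorted(theoretical)
    hits = 0
    for m in observed:
        tol = ppm_window(m, ppm)
        # Binary search for theoretical masses inside [m - tol, m + tol].
        lo = bisect_left(theo, m - tol)
        hi = bisect_right(theo, m + tol)
        if hi > lo:
            hits += 1
    return hits


theoretical = [1000.000, 1500.000, 2000.000]
observed = [1000.004, 1500.030, 2000.500]
for ppm in (5, 50, 500):
    print(ppm, count_matches(observed, theoretical, ppm))
```

With these toy values the match count rises from 1 at 5 ppm to 2 at 50 ppm and 3 at 500 ppm, which is the kind of tolerance-driven inflation the study examines.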

3.
4.
Peptide mass fingerprinting (PMF) is among the principal methods of contemporary proteomic analysis. While PMF is routinely practiced in many laboratories, the complexity of protein tryptic digests is such that PMF based on unrefined mass spectrometric peak lists is often inconclusive. A number of data processing strategies have thus been designed to improve the quality of PMF peak lists, and the development of increasingly elaborate tools for PMF data reduction remains an active area of research. In this report, a novel and direct means of PMF peak list enhancement is suggested. Since the monoisotopic mass of a peptide must fall within a predictable range of residual values, PMF peak lists can in principle be relieved of many non-peptide signals solely on the basis of accurately determined monoisotopic mass. The calculations involved are relatively simple, making this scheme computationally easy to implement. When this peak list processing procedure was used, the large number of unassigned masses typical of PMF peak lists was considerably reduced. As a result, protein identifications could be made with greater confidence and improved discrimination compared with PMF queries submitted with raw peak lists. Importantly, this scheme for removing non-peptide masses was found to conserve peptides bearing various post-translational and artificial modifications. All PMF experiments discussed here were performed using Fourier transform ion cyclotron resonance mass spectrometry (FTICR-MS), which provided the high mass resolution and high mass accuracy essential for this application. Previously reported equations relating the nominal peptide mass to the permissible range of fractional peptide masses were slightly modified for this application, and these adjustments are illustrated in detail. The role of mass accuracy in applying this scheme has also been explored.
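The filtering idea can be sketched as follows. Peptide monoisotopic masses carry a mass defect that grows roughly linearly with mass (on the order of 0.0005 Da per Da), so a peak whose fractional mass lies far from that trend is unlikely to be a peptide. The slope of 0.00048 and the ±0.15 Da window are assumed values for illustration; they are not the modified equations from the report.

```python
SLOPE = 0.00048    # assumed mass-defect slope (Da per Da), illustrative only
HALF_WIDTH = 0.15  # assumed tolerance window (Da) around the expected defect


def expected_defect(mass, slope=SLOPE):
    """Approximate mass defect expected for a peptide of this monoisotopic mass."""
    return mass * slope


def is_peptide_like(mass, slope=SLOPE, half_width=HALF_WIDTH):
    """True if the peak's fractional mass lies near the expected peptide mass defect."""
    expected = expected_defect(mass, slope)
    # Estimate the nominal mass relative to the expected defect, so that defects
    # above 0.5 Da (larger peptides) are not mistaken for negative defects.
    nominal = round(mass - expected)
    return abs((mass - nominal) - expected) <= half_width


peaks = [1000.48, 1000.95, 1000.05, 2000.96]
kept = [m for m in peaks if is_peptide_like(m)]
print(kept)
```

Under these assumptions, 1000.48 and 2000.96 are kept, while 1000.95 and 1000.05 (fractional masses far from the peptide trend at about 1000 Da) are discarded.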

5.
Protein identification by matrix-assisted laser desorption/ionization mass spectrometry peptide mass fingerprinting (MALDI-MS PMF) is a cornerstone of proteomics. However, it often fails to reliably identify low-molecular-mass proteins, protein fragments, and protein mixtures. To overcome these limitations, PMF can be complemented by tandem mass spectrometry and other search strategies for unambiguous protein identification. The present study explores the advantages of a MALDI-MS-based approach, designated the minimal protein identifier (MPI) approach, for protein identification. This is illustrated for culture supernatant (CSN) proteins of Mycobacterium tuberculosis H37Rv after separation by two-dimensional gel electrophoresis (2-DE). The MPI approach takes into consideration that proteins yield characteristic peptides upon proteolytic cleavage. In this study, peptide mixtures derived from tryptic protein cleavage were analyzed by MALDI-MS, and the resulting spectra were compared with template spectra of previously identified counterparts. The MPI approach allowed protein identification from a few protein-specific signature peptide masses and revealed truncated variants of the mycobacterial elongation factor EF-Tu that had not previously been identified by PMF. Furthermore, the MPI approach can be employed to track proteins in 2-DE gels, as demonstrated for the 14 kDa antigen, the 10 kDa chaperone, and the conserved hypothetical protein Rv0569 of M. tuberculosis H37Rv. Finally, the power of the MPI approach was shown to depend strongly on several factors, most notably the complexity of the proteome analyzed and the accuracy of the mass spectrometer used for peptide mass determination.
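Matching against signature masses can be sketched as a coverage check: for each previously identified protein (template), count how many of its signature peptide masses appear in the new peak list. The template names, masses, 0.1 Da tolerance, and 50% coverage cutoff are hypothetical.

```python
def signature_coverage(peaks, signatures, tol=0.1):
    """Fraction of a template's signature peptide masses found among observed peaks."""
    found = sum(any(abs(p - s) <= tol for p in peaks) for s in signatures)
    return found / len(signatures)


def best_template(peaks, templates, tol=0.1, min_coverage=0.5):
    """Return (name, coverage) of the best-covered template, or (None, coverage)."""
    name, cov = max(
        ((n, signature_coverage(peaks, sig, tol)) for n, sig in templates.items()),
        key=lambda t: t[1],
    )
    return (name, cov) if cov >= min_coverage else (None, cov)


# Hypothetical templates built from earlier identifications.
templates = {
    "protein_A": [1024.5, 1433.7, 2210.1],
    "protein_B": [987.4, 1650.8],
}
peaks = [1024.52, 1433.68, 1800.0]
print(best_template(peaks, templates))
```

Requiring only a few signature masses, rather than a full fingerprint, is what would let such a check flag truncated variants whose remaining peptides are still present.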

6.
Lester PJ  Hubbard SJ 《Proteomics》2002,2(10):1392-1405
Peptide mass fingerprinting (PMF) remains the most amenable technique for protein identification in proteomics, using mass spectrometry as the primary analytical technique coupled with bioinformatics. It relies on the protein's amino acid sequence being present in current databanks. Nevertheless, it is desirable to be able to use the technique for organisms whose genomes are not yet fully sequenced, by applying cross-species protein identification. In this study, we have re-examined the feasibility of such approaches by considering the extent of protein similarity between genome sequences, using a data set of 29 complete bacterial and two eukaryotic genomes. A range of protein and peptide features is considered, including protein isoelectric point, protein mass, and amino acid conservation. The effectiveness of PMF approaches was then tested with a series of computer simulations with varying peptide number and mass accuracy for several cross-species tests. The results show that PMF alone is generally unsuitable for divergent species jumps, or when protein similarity is below 70% identity. Nevertheless, tryptic peptide conservation is considerably enriched above random, and PMF promises to remain useful for cross-species protein identification when combined with data other than peptide masses.
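Peptide conservation across species can be sketched with an in silico tryptic digest (cleavage after K or R, except before P) and the fraction of one sequence's peptides that also occur in its ortholog. The sequences are toy examples, and comparing peptide sequences rather than masses is a simplification of the study's simulations.

```python
import re


def tryptic_peptides(seq):
    """Cleave after K or R unless followed by P (classic trypsin rule), no missed cleavages."""
    # Zero-width split: positions preceded by K/R and not followed by P.
    return [p for p in re.split(r"(?<=[KR])(?!P)", seq) if p]


def shared_peptide_fraction(seq_a, seq_b):
    """Fraction of seq_a's distinct tryptic peptides that also occur in seq_b's digest."""
    a, b = set(tryptic_peptides(seq_a)), set(tryptic_peptides(seq_b))
    return len(a & b) / len(a) if a else 0.0


print(tryptic_peptides("MKRPAGKLR"))
print(shared_peptide_fraction("MKRPAGKLR", "MKRPAGKIR"))
```

A single L-to-I substitution here changes one of three peptides, so the shared fraction drops to about 0.67 even though sequence identity stays high, which illustrates why PMF degrades quickly across divergent species.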

7.
In this study, we present a preprocessing method for quadrupole time-of-flight (Q-TOF) tandem mass spectra to increase the accuracy of database searching for peptide (protein) identification. Based on the natural isotopic information inherent in tandem mass spectra, we construct a decision tree after feature selection to classify noise and ion peaks in tandem spectra. Furthermore, we recognize overlapping peaks to find the monoisotopic masses of ions for the subsequent identification process. The experimental results show that this preprocessing method increases both the search speed and the reliability of peptide identification.
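The isotope-based step can be illustrated with a greedy grouping of peaks into isotope envelopes spaced by about 1.00336/z Da (the 13C-12C mass difference), taking the lowest peak of each envelope as the monoisotopic peak; isolated peaks with no isotopes are candidates for noise. This heuristic is a stand-in for, not a reproduction of, the decision-tree classifier described above, and the charge states and 0.01 Da tolerance are assumptions.

```python
ISOTOPE_SPACING = 1.00336  # approx. 13C - 12C mass difference (Da)


def monoisotopic_peaks(mzs, charges=(1, 2), tol=0.01):
    """Greedy isotope-envelope grouping.

    Returns (monoisotopic m/z, charge, number of peaks in envelope) per envelope.
    Envelopes with a single peak are possible noise.
    """
    mzs = sorted(mzs)
    used = set()
    result = []
    for i, start in enumerate(mzs):
        if i in used:
            continue  # already assigned as a heavier isotope of an earlier peak
        best_z, best_members = 1, [i]
        for z in charges:
            members = [i]
            expected = start + ISOTOPE_SPACING / z
            for j in range(i + 1, len(mzs)):
                if j in used:
                    continue
                if abs(mzs[j] - expected) <= tol:
                    members.append(j)
                    expected += ISOTOPE_SPACING / z
                elif mzs[j] > expected + tol:
                    break  # past the next expected isotope; envelope ends
            if len(members) > len(best_members):
                best_z, best_members = z, members
        used.update(best_members)
        result.append((start, best_z, len(best_members)))
    return result


peaks = [500.0, 501.00336, 502.00672, 650.3, 800.0, 800.50168, 801.00336]
print(monoisotopic_peaks(peaks))
```

On these synthetic peaks the sketch reports a singly charged envelope at 500.0, a doubly charged envelope at 800.0, and an isolated peak at 650.3 that a noise filter might drop.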

8.
Xiong S  Ding Q  Zhao Z  Chen W  Wang G  Liu S 《Proteomics》2003,3(3):265-272
High detection sensitivity and resolution are two critical parameters for recording good peptide mass fingerprints (PMFs) of low-abundance proteins. This paper reports a mass spectrometry (MS) sample preparation technique that can improve sensitivity and resolution. When the steel MS target was coated with a thin layer of pentadecafluorooctamido propyltrimethoxysilane, which repels both polar and nonpolar solvents, the sample droplets transferred onto its surface were significantly smaller. As a result, the analyte of the peptide mixture became more concentrated and homogeneous, which helped to improve sensitivity. The advantages of the modified MS target were documented by improved mass spectra of attomole-level standard peptides and of silver-stained proteins from polyacrylamide gels. The mass signal of angiotensin II at 100 attomoles was difficult to record on the conventional support, whereas it was easily detected on the modified one. The PMF of cytochrome c was also better recorded on the modified support, in terms of both signal-to-noise ratio and the number of detected peptides. When silver-stained proteins from two-dimensional electrophoresis gels were analyzed, the modified support gave more satisfactory peptide mass spectra in most cases. By searching protein databases with the additional mass data from the improved PMFs, several unknown proteins were successfully identified.

9.

Background  

Protein identification based on mass spectrometry (MS) has previously been performed using peptide mass fingerprinting (PMF) or tandem MS (MS/MS) database searching. However, these methods cannot identify proteins that are not already listed in existing databases. Moreover, the alternative approach of de novo sequencing requires costly equipment and the interpretation of complex MS/MS spectra. Thus, there is a need for novel high-throughput protein-identification methods that are independent of existing predefined protein databases.

10.
Motivation: Peptide mass fingerprinting (PMF) is a method for protein identification in which a protein is fragmented by a defined cleavage protocol (usually proteolysis with trypsin), and the masses of these products constitute a 'fingerprint' that can be searched against theoretical fingerprints of all known proteins. In the first stage of PMF, the raw mass spectrometric data are processed to generate a peptide mass list. In the second stage this protein fingerprint is used to search a database of known proteins for the best protein match. Although current software solutions can typically deliver a match in a relatively short time, a system that can find a match in real time could change the way in which PMF is deployed and presented. In a paper published earlier we presented a hardware design of a raw mass spectra processor that, when implemented in Field Programmable Gate Array (FPGA) hardware, achieves almost 170-fold speed gain relative to a conventional software implementation running on a dual processor server. In this article we present a complementary hardware realization of a parallel database search engine that, when running on a Xilinx Virtex 2 FPGA at 100 MHz, delivers 1800-fold speed-up compared with an equivalent C software routine, running on a 3.06 GHz Xeon workstation. The inherent scalability of the design means that processing speed can be multiplied by deploying the design on multiple FPGAs. The database search processor and the mass spectra processor, running on a reconfigurable computing platform, provide a complete real-time PMF protein identification solution.
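The second PMF stage is data-parallel: each database protein can be scored independently, which is what makes it a good fit for parallel hardware. The sketch below expresses that structure in software, scoring proteins by their number of matched masses with a thread pool. The scoring rule, 0.2 Da tolerance, and database are hypothetical, and in CPython the threads illustrate the structure rather than deliver the FPGA's speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def score_protein(peaks, theo_masses, tol=0.2):
    """Number of peak masses matching any theoretical peptide mass within tol (Da)."""
    return sum(any(abs(p - t) <= tol for t in theo_masses) for p in peaks)


def search(peaks, database, tol=0.2, workers=4):
    """Score every protein independently (the data-parallel step) and rank best-first."""
    names = list(database)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda n: score_protein(peaks, database[n], tol), names))
    return sorted(zip(names, scores), key=lambda t: t[1], reverse=True)


# Hypothetical database of theoretical tryptic peptide masses per protein.
database = {
    "prot1": [842.5, 1045.6, 1479.8],
    "prot2": [842.5, 2211.1],
    "prot3": [500.3],
}
peaks = [842.51, 1045.56, 1479.75, 2000.0]
print(search(peaks, database))
```

Because no protein's score depends on another's, the same loop can be split across any number of processing units, which mirrors the scalability argument made for multiple FPGAs.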

11.
Separation of proteins by two-dimensional gel electrophoresis (2-DE) coupled with identification of proteins through peptide mass fingerprinting (PMF) by matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF MS) is a widely used technique for proteomic analysis. This approach relies, however, on the presence of the studied proteins in publicly accessible protein databases or on the availability of an organism's annotated genome sequence. In this work, we investigated the reliability of using raw genome sequences to identify proteins by PMF without additional information such as amino acid sequences. The method is demonstrated for proteomic analysis of Klebsiella pneumoniae grown anaerobically on glycerol. Of 197 spots excised from 2-DE gels and submitted for mass spectrometric analysis, 164 were clearly identified as 122 individual proteins. Of these 164 spots, 95% could be identified using peptide mass fingerprints alone and a strain-specific protein database (ProtKpn) constructed from the raw genome sequences of K. pneumoniae. Cross-species searching of public databases identified 57% of the 66 highly expressed protein spots, compared with 97% using the ProtKpn database. Ten dha regulon-related proteins, which are essential for the initial enzymatic steps of anaerobic glycerol metabolism, were identified using the ProtKpn database, whereas none of them could be identified by cross-species searching. In conclusion, a strain-specific protein database constructed from raw genome sequences makes it possible to reliably identify most of the proteins from 2-DE analysis simply through peptide mass fingerprinting.
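The abstract does not describe how ProtKpn was built; one common way to derive a protein database from raw genome sequence is six-frame translation, keeping stop-to-stop open reading frames above a length cutoff. The sketch assumes the standard genetic code and a hypothetical 30-residue minimum length.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def revcomp(dna):
    """Reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; codons with ambiguous bases become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_orfs(dna, min_len=30):
    """Stop-to-stop open reading frames (protein strings) from all six frames."""
    dna = dna.upper()
    orfs = []
    for strand in (dna, revcomp(dna)):
        for frame in range(3):
            for segment in translate(strand[frame:]).split("*"):
                if len(segment) >= min_len:
                    orfs.append(segment)
    return orfs


print(translate("ATGGCCAAATAA"))
```

Stop-to-stop segments avoid having to predict start codons, at the cost of extra N-terminal residues; the resulting sequences could then be digested in silico to produce a strain-specific PMF database.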

12.
The development of high-throughput utilities to identify proteins is a major challenge in current proteomics research. One such utility, the molecular scanner, uses proteins separated by two-dimensional polyacrylamide gel electrophoresis that are digested in the gel and during transfer onto a collecting membrane. After a matrix is added, the membrane is inserted into a matrix-assisted laser desorption/ionization time-of-flight mass spectrometer, and a peptide mass fingerprint (PMF) is measured for every scanned site. Since the spacing between scanned sites is much smaller than the size of the most abundant protein spots, the data contain a certain redundancy, which was used in an earlier experiment with Escherichia coli [1] to improve mass calibration and PMF identification results. It was observed that the signal intensity of a peptide mass as a function of position on the membrane showed similar patterns if the peptides stemmed from the same protein. Taking these similarities into account, a clustering algorithm was used to find lists of experimental masses with similar intensity distributions, which provided clearer identification of the corresponding proteins. Here, these methods are applied to a human plasma scan, in which proteins were highly modified and less well separated. The presence of very abundant proteins such as albumin and immunoglobulins added another difficulty. The calibration of the initial PMFs was not satisfactory, and masses had to be recalibrated. After chemical noise was discarded, the membrane was partitioned into regions, and protein identification was carried out separately for each region. A new scoring method was used, in which the PMF score was multiplied by a factor that measures the similarity of matching peptides. This method proved more robust than the method developed in [1] when the region where a protein was found had an extended, nonspherical shape and strongly overlapped regions of other proteins.
Many proteins annotated on the SWISS-2D PAGE human plasma master gel could be clearly identified, and many interesting properties were observed.
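The similarity-weighted score can be sketched as the PMF score multiplied by the mean pairwise Pearson correlation of the matched peptides' spatial intensity profiles. The abstract does not define the similarity factor, so the correlation-based factor (clipped at zero) and the function names are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length intensity profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def similarity_weighted_score(pmf_score, profiles):
    """Hypothetical combined score: PMF score x mean pairwise correlation of the
    matched peptides' intensity profiles across scan positions (negatives clipped to 0)."""
    pairs = [(i, j) for i in range(len(profiles)) for j in range(i + 1, len(profiles))]
    if not pairs:
        return pmf_score
    mean_r = sum(pearson(profiles[i], profiles[j]) for i, j in pairs) / len(pairs)
    return pmf_score * max(mean_r, 0.0)


# Two matched peptides whose intensity rises and falls together across five positions.
print(similarity_weighted_score(10.0, [[0, 1, 4, 1, 0], [0, 2, 8, 2, 0]]))
```

Matches whose intensity patterns do not co-vary, as expected when they come from different, overlapping proteins, pull the combined score down.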

13.
Protein identification via peptide mass fingerprinting (PMF) remains a key component of high-throughput proteomics experiments in post-genomic science. Candidate protein identifications are made using bioinformatic tools from peptide peak lists obtained via mass spectrometry (MS). These algorithms rely on several search parameters, including the number of potential uncut peptide bonds matching the primary specificity of the hydrolytic enzyme used in the experiment. Typically, up to one of these "missed cleavages" is considered by the search tools, usually after in silico digestion of the proteome by trypsin. Using two distinct, nonredundant datasets of peptides identified via PMF and tandem MS, a simple predictive method based on information theory is presented that can identify experimentally defined missed cleavages with up to 90% accuracy from amino acid sequence alone. Using this simple protocol, we are able to "mask" candidate protein databases so that confident missed cleavage sites need not be considered for in silico digestion. We show that this leads to improved database searching with two different search engines, using the PMF dataset as a test set. In addition, the improved approach is demonstrated on an independent PMF dataset of known proteins that also has corresponding high-quality tandem MS data, validating the protein identifications. This approach has wider applicability for proteomics database searching, and the program for predicting missed cleavages and masking FASTA-formatted protein sequence databases has been made available via http://ispider.smith.man.ac.uk/MissedCleave.
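Masking can be sketched as an in silico tryptic digest that never cuts at sites flagged as likely missed cleavages, with an optional allowance for further missed cleavages elsewhere. The information-theoretic predictor itself is not reproduced; the masked site positions are simply passed in, and the function names are hypothetical.

```python
def cleavage_sites(seq):
    """Indices i where trypsin cuts between seq[i-1] and seq[i] (after K/R, not before P)."""
    return [i for i in range(1, len(seq)) if seq[i - 1] in "KR" and seq[i] != "P"]


def digest(seq, masked=frozenset(), missed=0):
    """Tryptic peptides that never cut at `masked` sites and allow up to `missed`
    missed cleavages among the remaining sites."""
    cuts = [0] + [i for i in cleavage_sites(seq) if i not in masked] + [len(seq)]
    peptides = []
    for a in range(len(cuts) - 1):
        # Join up to `missed` consecutive fragments starting at cut a.
        for b in range(a + 1, min(a + missed + 2, len(cuts))):
            peptides.append(seq[cuts[a]:cuts[b]])
    return peptides


print(digest("MAKGRDEK"))               # fully cleaved
print(digest("MAKGRDEK", masked={3}))   # site after K3 predicted as missed
print(digest("MAKGRDEK", missed=1))     # conventional one-missed-cleavage search
```

Masking a confidently predicted site yields one specific longer peptide instead of adding every possible missed-cleavage combination, which keeps the search space small.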

14.
Eriksson J  Fenyö D 《Proteomics》2002,2(3):262-270
A rapid and accurate method for testing the significance of protein identities determined by mass spectrometric analysis of protein digests and genome database searching is presented. The method is based on direct computation using a statistical model of the random matching of measured and theoretical proteolytic peptide masses. Protein identification algorithms typically rank the proteins of a genome database according to a score based on the number of matches between the masses obtained by mass spectrometry and the theoretical proteolytic peptide masses of a database protein. Random matching of experimental and theoretical masses can cause false results. A result is significant only if its score deviates significantly from the score expected for a false result. The distribution of the score (number of matches) for random (false) results is computed directly from our model of random matching, which allows significance testing under any experimental and database search constraints. To mimic the protein identification data quality of large-scale proteome projects, low- to high-quality proteolytic peptide mass data were generated in silico and subsequently submitted to a database search program designed to include significance testing based on direct computation. This simulation demonstrates the usefulness of direct significance testing for automatically screening for samples that must be subjected to peptide sequence analysis, e.g., by tandem mass spectrometry, to determine the protein identity.
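The idea of testing a match count against its random distribution can be sketched with a simple binomial model: if each experimental mass matches a given protein by chance with probability p, the number of random matches among n masses is Binomial(n, p), and the p-value of observing k or more is the upper tail. The uniform-mass assumption used to estimate p is a rough simplification, not the authors' model.

```python
from math import comb


def p_random_match(n_theoretical, tol_da, mass_range_da):
    """Rough chance that one experimental mass hits at least one of a protein's
    theoretical masses, assuming masses are uniform and independent over the range."""
    p_single = min(1.0, 2 * tol_da / mass_range_da)
    return 1 - (1 - p_single) ** n_theoretical


def p_value(k, n_experimental, p):
    """P(X >= k) for X ~ Binomial(n_experimental, p): chance of >= k random matches."""
    return sum(
        comb(n_experimental, i) * p**i * (1 - p) ** (n_experimental - i)
        for i in range(k, n_experimental + 1)
    )


# Illustrative numbers: 40 theoretical peptides, +/-0.5 Da, 800-3000 Da window, 30 peaks.
p = p_random_match(40, 0.5, 2200.0)
print(p, p_value(12, 30, p))
```

A result would be reported as significant only when its p-value falls below a chosen threshold; otherwise the sample could be flagged for MS/MS sequencing.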

15.
Despite a recent surge of interest in database-independent peptide identification, accurate de novo peptide sequencing remains an elusive goal. While the recently introduced spectral network approach resulted in accurate peptide sequencing in low-complexity samples, its success depends on the chance presence of spectra from overlapping peptides. On the other hand, while multistage mass spectrometry (collecting multiple MS3 spectra from each MS2 spectrum) can be applied to all spectra in a complex sample, there are currently no software tools for de novo peptide sequencing by multistage mass spectrometry. We describe a rigorous probabilistic framework for analyzing spectra of overlapping peptides and show how to apply it to multistage mass spectrometry. Our software achieves both accurate de novo peptide sequencing from multistage mass spectra (despite the inferior quality of MS3 spectra) and improved interpretation of spectral networks. We further study the problem of de novo peptide sequencing with an accurate parent mass (but inaccurate fragment masses), a protocol that may soon become the dominant mode of spectral acquisition. Most existing peptide sequencing algorithms (based on the spectrum graph approach) do not track the accurate parent mass and are thus not equipped to solve this problem. We describe a de novo peptide sequencing algorithm aimed at this experimental protocol and show that it improves sequencing accuracy for both tandem and multistage mass spectrometry.
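A minimal spectrum-graph sketch of de novo sequencing with an accurate parent mass: nodes are 0, the observed prefix masses, and the parent residue mass (precursor neutral mass minus water); edges join nodes whose difference equals one residue mass, and the longest path that ends exactly at the parent mass is reported. This is a generic illustration, not the authors' probabilistic algorithm. It assumes singly charged b-ions already converted to prefix masses (m/z minus a proton) and does not bridge gaps of two or more residues.

```python
# Monoisotopic residue masses (Da); 'L' stands for L/I, which have identical mass.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "N": 114.04293, "D": 115.02694,
    "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def de_novo(prefix_masses, parent_residue_mass, tol=0.02):
    """Longest residue path from 0 to the parent residue mass through observed
    prefix masses; returns the sequence or None if no path reaches the parent."""
    nodes = sorted({0.0, parent_residue_mass, *prefix_masses})
    best = {0: ""}  # node index -> longest sequence reaching that node
    for j in range(1, len(nodes)):
        for i in range(j):
            if i not in best:
                continue  # node i is unreachable from 0
            diff = nodes[j] - nodes[i]
            for aa, m in RESIDUE.items():
                if abs(diff - m) <= tol:
                    cand = best[i] + aa
                    if j not in best or len(cand) > len(best[j]):
                        best[j] = cand
    # Anchoring the path at the accurate parent mass is the key constraint.
    return best.get(nodes.index(parent_residue_mass))
```

If one prefix peak is missing, this simple version fails to reach the parent mass and returns None; practical tools allow multi-residue gaps at that point.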

16.
Although peptide mass fingerprinting is currently the method of choice for identifying proteins, the number of proteins in databases is constantly increasing; hence, having sequence data for a selected peptide to increase the effectiveness of database searching is increasingly crucial. Until recently, the ability to identify proteins based on peptide sequence was essentially limited to electrospray ionization tandem mass spectrometry (MS) methods. The recent development of new instruments with matrix-assisted laser desorption/ionization (MALDI) sources and true tandem mass spectrometry (MS/MS) capabilities makes it possible to obtain high-quality tandem mass spectra of peptides. In this work, using the new high-resolution tandem time-of-flight MALDI-TOF/TOF mass spectrometer from Applied Biosystems, we describe examples of successful identification and characterization of bovine heart proteins (SWISS-PROT entries: P02192, Q9XSC6, P13620) separated by two-dimensional electrophoresis and blotted onto polyvinylidene difluoride membrane. Tryptic protein digests were analyzed by MALDI-TOF to identify peptide masses that were then used for MS/MS. High-energy MALDI-TOF/TOF collision-induced dissociation spectra were subsequently recorded on selected ions. All data, both MS and MS/MS, were recorded on the same instrument. Tandem mass spectra were submitted to database searching using MS-Tag or were manually sequenced de novo. An interesting modification of a tryptophan residue, a "double oxidation", came to light during these analyses.

17.
Most existing mass spectrometry (MS) analysis programs are automatic and provide limited opportunity for editing during interpretation. Furthermore, they rely entirely on publicly available databases for interpretation. VEMS (Virtual Expert Mass Spectrometrist) is a program for interactive analysis of peptide MS/MS spectra imported in text file format. Peaks are annotated, the monoisotopic peaks are retained, and the b- and y-ion series are identified interactively. The called peptide sequence is searched against a local protein database for sequence identity and peptide mass. The report compares the calculated and experimental mass spectra of the called peptide. The program package includes four accessory programs. VEMStrans creates protein databases in FASTA format from EST or cDNA sequence files. VEMSdata creates a virtual peptide database from FASTA files. VEMSdist displays the distribution of masses up to 5000 Da. VEMSmaldi searches singly charged peptide masses against the local database.
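The b- and y-ion series annotated in such programs follow directly from residue masses: b_i is the sum of the first i residues plus a proton, and y_i is the sum of the last i residues plus water and a proton (singly charged, unmodified peptides). A sketch using standard monoisotopic residue masses; the function name is hypothetical and unrelated to VEMS itself.

```python
PROTON = 1.007276   # Da
WATER = 18.010565   # Da, monoisotopic H2O
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def fragment_ions(peptide):
    """Singly charged b1..b(n-1) and y1..y(n-1) m/z values for an unmodified peptide."""
    masses = [RESIDUE[aa] for aa in peptide]
    b = [sum(masses[:i]) + PROTON for i in range(1, len(masses))]
    y = [sum(masses[-i:]) + WATER + PROTON for i in range(1, len(masses))]
    return b, y


b, y = fragment_ions("PEPTIDE")
print([round(m, 3) for m in b])
print([round(m, 3) for m in y])
```

A useful check when calling a sequence is that complementary ions add up: b_i + y_(n-i) equals the [M+H]+ mass plus one extra proton.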

18.

Background  

Peptide mass fingerprinting (PMF) is a widely used mass spectrometry (MS) method for analyzing proteins and peptides. It relies on comparing experimentally determined and theoretical mass spectra. The PMF process requires calibration, usually performed with external or internal calibrants of known molecular mass.
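Internal calibration can be sketched as a least-squares fit that maps observed calibrant masses onto their known masses, then applies the same correction to every peak. A linear correction and the calibrant values below are assumptions made for brevity; TOF calibration is often modeled with a higher-order function.

```python
def fit_linear(observed, true):
    """Least-squares fit of true ~ a * observed + b from calibrant pairs."""
    n = len(observed)
    mx, my = sum(observed) / n, sum(true) / n
    sxx = sum((x - mx) ** 2 for x in observed)
    sxy = sum((x - mx) * (y - my) for x, y in zip(observed, true))
    a = sxy / sxx
    return a, my - a * mx


def recalibrate(peaks, a, b):
    """Apply the fitted correction to every peak in the list."""
    return [a * m + b for m in peaks]


# Hypothetical calibrants measured with a 100 ppm scale error and a 0.05 Da offset.
true = [1000.0, 2000.0, 3000.0]
obs = [t * 1.0001 + 0.05 for t in true]
a, b = fit_linear(obs, true)
print(recalibrate(obs, a, b))
```

With internal calibrants the fit is made per spectrum, so run-to-run drift is corrected; external calibration applies one fit from a separate standard spot.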

19.
We describe an approach to screen large sets of MALDI-MS mass spectra for protein isoforms separated on two-dimensional electrophoresis gels. Mass spectra are matched against each other using extracted peak mass lists and hierarchical clustering. The output is presented as dendrograms in which protein isoforms cluster together. Clustering could be applied to mass spectra from different sample sets, dates, and instruments; it revealed similarities between mass spectra and was a useful tool for highlighting peptide peaks of interest for further investigation. Shared peak masses in a cluster could be identified and were used to create novel peak mass lists suitable for protein identification by peptide mass fingerprinting. Complex mass spectra consisting of more than one protein were deconvoluted using information from other mass spectra in the same cluster. The number of peptide peaks shared between mass spectra in a cluster was typically larger than the number of peaks that matched calculated peak masses in databases, suggesting that peaks from modified peptides are probably among the shared ones. Clustering increased the number of peaks associated with a given protein.
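The clustering step can be sketched with a tolerance-based shared-peak similarity (matched peaks over the union, a Jaccard-style measure) and single-linkage grouping at a similarity threshold, which gives the same flat clusters as cutting a single-linkage dendrogram at that level. The similarity measure, linkage, threshold, and tolerance are assumptions; the abstract does not specify them.

```python
def count_shared(a, b, tol):
    """One-to-one count of peaks shared by two peak lists within tol (Da)."""
    a, b = sorted(a), sorted(b)
    i = j = shared = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tol:
            shared += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return shared


def similarity(a, b, tol=0.2):
    """Shared peaks divided by the union of both peak lists (0..1)."""
    shared = count_shared(a, b, tol)
    union = len(a) + len(b) - shared
    return shared / union if union else 0.0


def cluster(spectra, threshold=0.5, tol=0.2):
    """Single-linkage clusters: spectra joined (transitively) if similarity >= threshold."""
    parent = list(range(len(spectra)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(len(spectra)):
        for j in range(i + 1, len(spectra)):
            if similarity(spectra[i], spectra[j], tol) >= threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(spectra)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Peaks counted as shared within a cluster are the natural candidates for the merged peak lists described above, including peaks that match no database entry.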

20.
Spectral library searching is an emerging approach for identifying peptides from tandem mass spectra, a critical step in proteomic data analysis. In spectral library searching, a spectral library is first meticulously compiled from a large collection of previously observed peptide MS/MS spectra that have been conclusively assigned to their corresponding amino acid sequences. An unknown spectrum is then identified by comparing it with all candidates in the spectral library to find the most similar match. This review discusses the basic principles of spectral library building and searching, describes the approach's advantages and limitations, and provides a primer for researchers interested in adopting it in their data analysis. It also discusses the future outlook on the evolution and utility of spectral libraries in proteomics.
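The comparison step can be sketched with the cosine (normalized dot product) score between binned spectra, a common similarity measure in spectral library searching. The 1 Da bin width, square-root intensity scaling, and library entries are illustrative assumptions.

```python
import math


def bin_spectrum(peaks, bin_width=1.0):
    """peaks: list of (mz, intensity). Returns {bin index: summed sqrt(intensity)}."""
    binned = {}
    for mz, inten in peaks:
        k = int(mz // bin_width)
        # Square-root scaling damps the dominance of a few very intense peaks.
        binned[k] = binned.get(k, 0.0) + math.sqrt(inten)
    return binned


def cosine(a, b):
    """Cosine similarity between two sparse binned spectra."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def library_search(query, library, bin_width=1.0):
    """Return (entry name, score) of the most similar library spectrum."""
    q = bin_spectrum(query, bin_width)
    scored = [(name, cosine(q, bin_spectrum(spec, bin_width))) for name, spec in library.items()]
    return max(scored, key=lambda t: t[1])


library = {
    "spectrum_A": [(100.1, 4.0), (200.2, 9.0), (300.3, 16.0)],
    "spectrum_B": [(150.0, 4.0), (250.0, 9.0)],
}
query = [(100.2, 4.0), (200.1, 9.0), (300.4, 16.0)]
print(library_search(query, library))
```

Because the query is compared only with previously identified spectra, a high score transfers that library entry's peptide assignment to the unknown spectrum.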


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Registration No. 09084417