首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Methods for treating MS/MS data to achieve accurate peptide identification are currently the subject of much research activity. In this study we describe a new method for filtering MS/MS data and refining precursor masses that provides highly accurate analyses of massive sets of proteomics data. This method, coined "postexperiment monoisotopic mass filtering and refinement" (PE-MMR), consists of several data processing steps: 1) generation of lists of all monoisotopic masses observed in a whole LC/MS experiment, 2) clusterization of monoisotopic masses of a peptide into unique mass classes (UMCs) based on their masses and LC elution times, 3) matching the precursor masses of the MS/MS data to a representative mass of a UMC, and 4) filtration of the MS/MS data based on the presence of corresponding monoisotopic masses and refinement of the precursor ion masses by the UMC mass. PE-MMR increases the throughput of proteomics data analysis, by efficiently removing "garbage" MS/MS data prior to database searching, and improves the mass measurement accuracies (i.e. 0.05 +/- 1.49 ppm for yeast data (from 4.46 +/- 2.81 ppm) and 0.03 +/- 3.41 ppm for glycopeptide data (from 4.8 +/- 7.4 ppm)) for an increased number of identified peptides. In proteomics analyses of glycopeptide-enriched samples, PE-MMR processing greatly reduces the degree of false glycopeptide identification by correctly assigning the monoisotopic masses for the precursor ions prior to database searching. By applying this technique to analyses of proteome samples of varying complexities, we demonstrate herein that PE-MMR is an effective and accurate method for treating massive sets of proteomics data.  相似文献   

2.
Whereas the bearing of mass measurement error on protein identification is sometimes underestimated, uncertainty in observed peptide masses unavoidably translates to ambiguity in subsequent protein identifications. Although ongoing instrumental advances continue to make high accuracy mass spectrometry (MS) increasingly accessible, many proteomics experiments are still conducted with rather large mass error tolerances. In addition, the ranking schemes of most protein identification algorithms do not include a meaningful incorporation of mass measurement error. This article provides a critical evaluation of mass error tolerance as it pertains to false positive peptide and protein associations resulting from peptide mass fingerprint (PMF) database searching. High accuracy, high resolution PMFs of several model proteins were obtained using matrix-assisted laser desorption/ionization Fourier transform ion cyclotron resonance mass spectrometry (MALDI-FTICR-MS). Varying levels of mass accuracy were simulated by systematically modulating the mass error tolerance of the PMF query and monitoring the effect on figures of merit indicating the PMF quality. Importantly, the benefits of decreased mass error tolerance are not manifest in Mowse scores when operating at tolerances in the low parts-per-million range but become apparent with the consideration of additional metrics that are often overlooked. Furthermore, the outcomes of these experiments support the concept that false discovery is closely tied to mass measurement error in PMF analysis. Clear establishment of this relation demonstrates the need for mass error-aware protein identification routines and argues for a more prominent contribution of high accuracy mass measurement to proteomic science.  相似文献   

3.
Gay S  Binz PA  Hochstrasser DF  Appel RD 《Proteomics》2002,2(10):1374-1391
Matrix-assisted laser desorption/ionization-time of flight mass spectrometry has become a valuable tool in proteomics. With the increasing acquisition rate of mass spectrometers, one of the major issues is the development of accurate, efficient and automatic peptide mass fingerprinting (PMF) identification tools. Current tools are mostly based on counting the number of experimental peptide masses matching with theoretical masses. Almost all of them use additional criteria such as isoelectric point, molecular weight, PTMs, taxonomy or enzymatic cleavage rules to enhance prediction performance. However, these identification tools seldom use peak intensities as parameter as there is currently no model predicting the intensities based on the physicochemical properties of peptides. In this work, we used standard datamining methods such as classification and regression methods to find correlations between peak intensities and the properties of the peptides composing a PMF spectrum. These methods were applied on a dataset comprising a series of PMF experiments involving 157 proteins. We found that the C4.5 method gave the more informative results for the classification task (prediction of the presence or absence of a peptide in a spectra) and M5' for the regression methods (prediction of the normalized intensity of a peptide peak). The C4.5 result correctly classified 88% of the theoretical peaks; whereas the M5' peak intensities had a correlation coefficient of 0.6743 with the experimental peak intensities. These methods enabled us to obtain decision and model trees that can be directly used for prediction and identification of PMF results. The work performed permitted to lay the foundations of a method to analyze factors influencing the peak intensity of PMF spectra. A simple extension of this analysis could lead to improve the accuracy of the results by using a larger dataset. Additional peptide characteristics or even PMF experimental parameters can also be taken into account in the datamining process to analyze their influence on the peak intensity. Furthermore, this datamining approach can certainly be extended to the tandem mass spectrometry domain or other mass spectrometry derived methods.  相似文献   

4.
Scherl A  Tsai YS  Shaffer SA  Goodlett DR 《Proteomics》2008,8(14):2791-2797
Although mass spectrometers are capable of providing high mass accuracy data, assignment of true monoisotopic precursor ion mass is complicated during data-dependent ion selection for LC-MS/MS analysis of complex mixtures. The complication arises when chromatographic peak widths for a given analyte exceed the time required to acquire a precursor ion mass spectrum. The result is that many measured monoisotopic masses are misassigned due to calculation from a single mass spectrum with poor ion statistics based on only a fraction of the total available ions for a given analyte. Such data in turn produces errors in automated database searches, where precursor m/z value is one search parameter. We propose here a postacquisition approach to correct misassigned monoisotopic m/z values that involves peak detection over the entire elution profile and correction of the precursor ion monoisotopic mass. As a result of using this approach to reprocess shotgun proteomic data we increased peptide sequence assignments by 10% while reducing the estimated false positive ratio from 1 to 0.2%. We also show that 4% of the salvaged identifications may be accounted for by correction of mixed tandem mass spectra resulting from fragmentation of multiple peptides simultaneously, a situation which we refer to as accidental CID.  相似文献   

5.

Background  

The elemental composition of peptides results in formation of distinct, equidistantly spaced clusters across the mass range. The property of peptide mass clustering is used to calibrate peptide mass lists, to identify and remove non-peptide peaks and for data reduction.  相似文献   

6.
In this study, we present a preprocessing method for quadrupole time-of-flight (Q-TOF) tandem mass spectra to increase the accuracy of database searching for peptide (protein) identification. Based on the natural isotopic information inherent in tandem mass spectra, we construct a decision tree after feature selection to classify the noise and ion peaks in tandem spectra. Furthermore, we recognize overlapping peaks to find the monoisotopic masses of ions for the following identification process. The experimental results show that this preprocessing method increases the search speed and the reliability of peptide identification.  相似文献   

7.
MOTIVATION: Mass Spectrometry (MS)-based protein identification via peptide mass fingerprinting (PMF) is a key component in high-throughput proteome research. While PMF was the first commonly used protein identification method, provided higher throughput than the tandem MS-based method, its accuracy is lower than that of the tandem MS method. Thus, it is desirable to develop PMF-based algorithm with higher protein identification accuracy to facilitate proteome research. RESULTS: We propose a peak bagging method for single MS-based protein identification. It combines results from multiple PMF algorithms, where each PMF algorithm takes a random peak subset as input. Evaluation with a set of real MALDI-TOF MS spectra shows that the new peak bagging method provides consistent improvements over the single PMF algorithm.  相似文献   

8.
Ding Q  Xiao L  Xiong S  Jia Y  Que H  Guo Y  Liu S 《Proteomics》2003,3(7):1313-1317
Unmatched masses are often observed in the experimental peptide mass spectra when database searching is performed with the ProFound program. Comparison between theoretical and experimental mass spectra of standard proteins shows that contamination accounts for most of the unmatched masses. In this retrospective analysis, the top 100 most probable contaminating masses, as listed in order of their probability, are statistically filtered out from 118 different experimental peptide mass fingerprinting (PMF) maps. Most of the interfering masses originate from trypsin autolysis and human keratins. Subtraction of known contaminants from raw data and using cleaner masses for searching can enhance protein identification by PMF.  相似文献   

9.
Motivation: Peptide mass fingerprinting (PMF) is a method for protein identification in which a protein is fragmented by a defined cleavage protocol (usually proteolysis with trypsin), and the masses of these products constitute a 'fingerprint' that can be searched against theoretical fingerprints of all known proteins. In the first stage of PMF, the raw mass spectrometric data are processed to generate a peptide mass list. In the second stage this protein fingerprint is used to search a database of known proteins for the best protein match. Although current software solutions can typically deliver a match in a relatively short time, a system that can find a match in real time could change the way in which PMF is deployed and presented. In a paper published earlier we presented a hardware design of a raw mass spectra processor that, when implemented in Field Programmable Gate Array (FPGA) hardware, achieves almost 170-fold speed gain relative to a conventional software implementation running on a dual processor server. In this article we present a complementary hardware realization of a parallel database search engine that, when running on a Xilinx Virtex 2 FPGA at 100 MHz, delivers 1800-fold speed-up compared with an equivalent C software routine, running on a 3.06 GHz Xeon workstation. The inherent scalability of the design means that processing speed can be multiplied by deploying the design on multiple FPGAs. The database search processor and the mass spectra processor, running on a reconfigurable computing platform, provide a complete real-time PMF protein identification solution.  相似文献   

10.
We describe an approach to screen large sets of MALDI-MS mass spectra for protein isoforms separated on two-dimensional electrophoresis gels. Mass spectra are matched against each other by utilizing extracted peak mass lists and hierarchical clustering. The output is presented as dendrograms in which protein isoforms cluster together. Clustering could be applied to mass spectra from different sample sets, dates, and instruments, revealed similarities between mass spectra, and was a useful tool to highlight peptide peaks of interest for further investigation. Shared peak masses in a cluster could be identified and were used to create novel peak mass lists suitable for protein identification using peptide mass fingerprinting. Complex mass spectra consisting of more than one protein were deconvoluted using information from other mass spectra in the same cluster. The number of peptide peaks shared between mass spectra in a cluster was typically found to be larger than the number of peaks that matched to calculated peak masses in databases, thus modified peaks are probably among the shared peptides. Clustering increased the number of peaks associated with a given protein.  相似文献   

11.
For MALDI-TOF mass spectrometry, we show that the intensity of a peptide-ion peak is directly correlated with its sequence, with the residues M, H, P, R, and L having the most substantial effect on ionization. We developed a machine learning approach that exploits this relationship to significantly improve peptide mass fingerprint (PMF) accuracy based on training data sets from both true-positive and false-positive PMF searches. The model's cross-validated accuracy in distinguishing real versus false-positive database search results is 91%, rivaling the accuracy of MS/MS-based protein identification.  相似文献   

12.
Lester PJ  Hubbard SJ 《Proteomics》2002,2(10):1392-1405
Peptide mass fingerprinting (PMF) remains the most amenable technique for protein identification in proteomics, using mass spectrometry as the primary analytical technique coupled with bioinformatics. This relies on the presence of the amino acid sequence of the protein in the current databanks. Despite this, it is desirable to be able to use the technique for organisms whose genomes are not yet fully sequenced and apply cross-species protein identification. In this study, we have re-examined the feasibility of such approaches by considering the extent of protein similarity between genome sequences using a data set of 29 complete bacterial and two eukaryotic genomes. A range of protein and peptide features are considered, including protein isoelectric focussing point, protein mass, and amino acid conservation. The effectiveness of PMF approaches has then been tested with a series of computer simulations with varying peptide number and mass accuracy for several cross-species tests. The results show that PMF alone is unsuitable in general for divergent species jumps, or when protein similarity is less than 70% identity. Despite this, there exists a considerable enrichment above random of tryptic peptide conservation and PMF promises to remain useful when combined with other data than just peptide masses for cross-species protein identification.  相似文献   

13.
蛋白质组学的基础研究之一是蛋白质鉴定.规模化的蛋白质鉴定通常采用"鸟枪法",即选择一些酶切肽段(母离子)碎裂生成二级谱图,通过二级谱图及其母离子质量鉴定肽段,再推断对应的蛋白质.在鉴定过程中,母离子质量是一个关键参数.母离子是否是肽段的单同位素峰决定了正确肽段是否能进入候选,母离子的质量精度决定了候选肽段的数目.本文从判断单同位素峰和系统误差校准这两个角度研究了母离子的准确检测技术.判断单同位素峰的技术在蛋白质上已有研究,包括电荷判断、单同位素峰判断和重叠同位素峰判断.可以借鉴蛋白质水平的技术研究母离子的单同位素峰判断方法.同时母离子的系统误差校准也有较为成熟的方法.这两个角度的研究有助于提高规模化蛋白质的鉴定率.  相似文献   

14.
High‐resolution MS/MS spectra of peptides can be deisotoped to identify monoisotopic masses of peptide fragments. The use of such masses should improve protein identification rates. However, deisotoping is not universally used and its benefits have not been fully explored. Here, MS2‐Deisotoper, a tool for use prior to database search, is used to identify monoisotopic peaks in centroided MS/MS spectra. MS2‐Deisotoper works by comparing the mass and relative intensity of each peptide fragment peak to every other peak of greater mass, and by applying a set of rules concerning mass and intensity differences. After comprehensive parameter optimization, it is shown that MS2‐Deisotoper can improve the number of peptide spectrum matches (PSMs) identified by up to 8.2% and proteins by up to 2.8%. It is effective with SILAC and non‐SILAC MS/MS data. The identification of unique peptide sequences is also improved, increasing the number of human proteoforms by 3.7%. Detailed investigation of results shows that deisotoping increases Mascot ion scores, improves FDR estimation for PSMs, and leads to greater protein sequence coverage. At a peptide level, it is found that the efficacy of deisotoping is affected by peptide mass and charge. MS2‐Deisotoper can be used via a user interface or as a command‐line tool.  相似文献   

15.
Most existing Mass Spectra (MS) analysis programs are automatic and provide limited opportunity for editing during the interpretation. Furthermore, they rely entirely on publicly available databases for interpretation. VEMS (Virtual Expert Mass Spectrometrist) is a program for interactive analysis of peptide MS/MS spectra imported in text file format. Peaks are annotated, the monoisotopic peaks retained, and the b-and y-ion series identified in an interactive manner. The called peptide sequence is searched against a local protein database for sequence identity and peptide mass. The report compares the calculated and the experimental mass spectrum of the called peptide. The program package includes four accessory programs. VEMStrans creates protein databases in FASTA format from EST or cDNA sequence files. VEMSdata creates a virtual peptide database from FASTA files. VEMSdist displays the distribution of masses up to 5000 Da. VEMSmaldi searches singly charged peptide masses against the local database.  相似文献   

16.
Protein identification via peptide mass fingerprinting (PMF) remains a key component of high-throughput proteomics experiments in post-genomic science. Candidate protein identifications are made using bioinformatic tools from peptide peak lists obtained via mass spectrometry (MS). These algorithms rely on several search parameters, including the number of potential uncut peptide bonds matching the primary specificity of the hydrolytic enzyme used in the experiment. Typically, up to one of these "missed cleavages" are considered by the bioinformatics search tools, usually after digestion of the in silico proteome by trypsin. Using two distinct, nonredundant datasets of peptides identified via PMF and tandem MS, a simple predictive method based on information theory is presented which is able to identify experimentally defined missed cleavages with up to 90% accuracy from amino acid sequence alone. Using this simple protocol, we are able to "mask" candidate protein databases so that confident missed cleavage sites need not be considered for in silico digestion. We show that that this leads to an improvement in database searching, with two different search engines, using the PMF dataset as a test set. In addition, the improved approach is also demonstrated on an independent PMF data set of known proteins that also has corresponding high-quality tandem MS data, validating the protein identifications. This approach has wider applicability for proteomics database searching, and the program for predicting missed cleavages and masking Fasta-formatted protein sequence databases has been made available via http:// ispider.smith.man.ac uk/MissedCleave.  相似文献   

17.
Yuan ZF  Liu C  Wang HP  Sun RX  Fu Y  Zhang JF  Wang LH  Chi H  Li Y  Xiu LY  Wang WP  He SM 《Proteomics》2012,12(2):226-235
Determining the monoisotopic peak of a precursor is a first step in interpreting mass spectra, which is basic but non-trivial. The reason is that in the isolation window of a precursor, other peaks interfere with the determination of the monoisotopic peak, leading to wrong mass-to-charge ratio or charge state. Here we propose a method, named pParse, to export the most probable monoisotopic peaks for precursors, including co-eluted precursors. We use the relationship between the position of the highest peak and the mass of the first peak to detect candidate clusters. Then, we extract three features to sort the candidate clusters: (i) the sum of the intensity, (ii) the similarity of the experimental and the theoretical isotopic distribution, and (iii) the similarity of elution profiles. We showed that the recall of pParse, MaxQuant, and BioWorks was 98-98.8%, 0.5-17%, and 1.8-36.5% at the same precision, respectively. About 50% of tandem mass spectra are triggered by multiple precursors which are difficult to identify. Then we design a new scoring function to identify the co-eluted precursors. About 26% of all identified peptides were exclusively from co-eluted peptides. Therefore, accurately determining monoisotopic peaks, including co-eluted precursors, can greatly increase peptide identification rate.  相似文献   

18.
19.
Peptide mass fingerprinting, regardless of becoming complementary to tandem mass spectrometry for protein identification, is still the subject of in-depth study because of its higher sample throughput, higher level of specificity for single peptides and lower level of sensitivity to unexpected post-translational modifications compared with tandem mass spectrometry. In this study, we propose, implement and evaluate a uniform approach using support vector machines to incorporate individual concepts and conclusions for accurate PMF. We focus on the inherent attributes and critical issues of the theoretical spectrum (peptides), the experimental spectrum (peaks) and spectrum (masses) alignment. Eighty-one feature-matching patterns derived from cleavage type, uniqueness and variable masses of theoretical peptides together with the intensity rank of experimental peaks were proposed to characterize the matching profile of the peptide mass fingerprinting procedure. We developed a new strategy including the participation of matched peak intensity redistribution to handle shared peak intensities and 440 parameters were generated to digitalize each feature-matching pattern. A high performance for an evaluation data set of 137 items was finally achieved by the optimal multi-criteria support vector machines approach, with 491 final features out of a feature vector of 35,640 normalized features through cross training and validating a publicly available "gold standard" peptide mass fingerprinting data set of 1733 items. Compared with the Mascot, MS-Fit, ProFound and Aldente algorithms commonly used for MS-based protein identification, the feature-matching patterns algorithm has a greater ability to clearly separate correct identifications and random matches with the highest values for sensitivity (82%), precision (97%) and F1-measure (89%) of protein identification. Several conclusions reached via this research make general contributions to MS-based protein identification. Firstly, inherent attributes showed comparable or even greater robustness than other explicit. As an inherent attribute of an experimental spectrum, peak intensity should receive considerable attention during protein identification. Secondly, alignment between intense experimental peaks and properly digested, unique or non-modified theoretical peptides is very likely to occur in positive peptide mass fingerprinting. Finally, normalization by several types of harmonic factors, including missed cleavages and mass modification, can make important contributions to the performance of the procedure.  相似文献   

20.
The development of high throughput utilities to identify proteins is a major challenge in present research in the field of proteomics. One such utility, the molecular scanner, uses proteins separated by two-dimensional polyacrylamide gel electrophoresis that are digested in the gel and during transfer onto a collecting membrane. After adding a matrix, the membrane is inserted into a matrix-assisted laser desorption/ionization-time of flight mass spectrometer and a peptide mass fingerprint (PMF) is measured for every scanned site. Since the spacing between scanned sites is much smaller than the size of the most abundant protein spots, there is a certain redundancy in the data that was used in an earlier experiment with Escherichia coli [1] to improve mass calibration and PMF identification results. It was observed that the signal intensity of a peptide mass as a function of the position on the membrane showed similar patterns if peptides stemmed from the same protein. Taking account of these similarities a clustering algorithm was used to find lists of experimental masses with similar intensity distributions, which provided clearer identification of the corresponding proteins. Here, these methods are applied to a human plasma scan, where proteins were highly modified and less separated. The presence of very abundant proteins like albumin and immunoglobulins added another difficulty. The calibration of the initial PMFs was not satisfactory and masses had to be recalibrated. After discarding chemical noise, the membrane was partitioned into regions and for each region protein identification was carried out separately. A new scoring method was used, where the PMF score was multiplied by a factor that measures the similarity of matching peptides. This method proved to be more robust than the method developed in [1] if the region where a protein was found had an extended, nonspherical shape and strong overlap with regions of other proteins. Many proteins annotated on the SWISS-2D PAGE human plasma master gel could be clearly identified and many interesting properties were observed.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号