Similar articles
Found 20 similar articles (search time: 15 ms)
1.
Hernandez P  Gras R  Frey J  Appel RD 《Proteomics》2003,3(6):870-878
In recent years, proteomics research has gained importance due to increasingly powerful techniques in protein purification, mass spectrometry and identification, and due to the development of extensive protein and DNA databases from various organisms. Nevertheless, current identification methods from spectrometric data have difficulties in handling modifications or mutations in the source peptide. Moreover, they have low performance when run on large databases (such as genomic databases), or with low quality data, for example due to bad calibration or low fragmentation of the source peptide. We present a new algorithm dedicated to automated protein identification from tandem mass spectrometry (MS/MS) data by searching a peptide sequence database. Our identification approach shows promising properties for solving the specific difficulties enumerated above. It consists of matching theoretical peptide sequences issued from a database with a structured representation of the source MS/MS spectrum. The representation is similar to the spectrum graphs commonly used by de novo sequencing software. The identification process involves the parsing of the graph in order to emphasize relevant sections for each theoretical sequence, and leads to a list of peptides ranked by a correlation score. The parsing of the graph, which can be a highly combinatorial task, is performed by a bio-inspired algorithm called Ant Colony Optimization algorithm.  相似文献   

2.
MOTIVATION: The identification of peptides by tandem mass spectrometry (MS/MS) is a central method of proteomics research, but due to the complexity of MS/MS data and the large databases searched, the accuracy of peptide identification algorithms remains limited. To improve the accuracy of identification we applied a machine-learning approach using a hidden Markov model (HMM) to capture the complex and often subtle links between a peptide sequence and its MS/MS spectrum. Model: Our model, HMM_Score, represents ion types as HMM states and calculates the maximum joint probability for a peptide/spectrum pair using emission probabilities from three factors: the amino acids adjacent to each fragmentation site, the mass dependence of ion types and the intensity dependence of ion types. The Viterbi algorithm is used to calculate the most probable assignment between ion types in a spectrum and a peptide sequence, then a correction factor is added to account for the propensity of the model to favor longer peptides. An expectation value is calculated based on the model score to assess the significance of each peptide/spectrum match. RESULTS: We trained and tested HMM_Score on three data sets generated by two different mass spectrometer types. For a reference data set recently reported in the literature and validated using seven identification algorithms, HMM_Score produced 43% more positive identification results at a 1% false positive rate than the best of two other commonly used algorithms, Mascot and X!Tandem. HMM_Score is a highly accurate platform for peptide identification that works well for a variety of mass spectrometer and biological sample types. AVAILABILITY: The program is freely available on ProteomeCommons via an OpenSource license. See http://bioinfo.unc.edu/downloads/ for the download link.  相似文献   

3.
Despite a recent surge of interest in database-independent peptide identifications, accurate de novo peptide sequencing remains an elusive goal. While the recently introduced spectral network approach resulted in accurate peptide sequencing in low-complexity samples, its success depends on the chance of presence of spectra from overlapping peptides. On the other hand, while multistage mass spectrometry (collecting multiple MS3 spectra from each MS2 spectrum) can be applied to all spectra in a complex sample, there are currently no software tools for de novo peptide sequencing by multistage mass spectrometry. We describe a rigorous probabilistic framework for analyzing spectra of overlapping peptides and show how to apply it for multistage mass spectrometry. Our software results in both accurate de novo peptide sequencing from multistage mass spectra (despite the inferior quality of MS3 spectra) and improved interpretation of spectral networks. We further study the problem of de novo peptide sequencing with accurate parent mass (but inaccurate fragment masses), the protocol that may soon become the dominant mode of spectral acquisition. Most existing peptide sequencing algorithms (based on the spectrum graph approach) do not track the accurate parent mass and are thus not equipped for solving this problem. We describe a de novo peptide sequencing algorithm aimed at this experimental protocol and show that it improves the sequencing accuracy on both tandem and multistage mass spectrometry.

4.
Gay S  Binz PA  Hochstrasser DF  Appel RD 《Proteomics》2002,2(10):1374-1391
Matrix-assisted laser desorption/ionization-time of flight mass spectrometry has become a valuable tool in proteomics. With the increasing acquisition rate of mass spectrometers, one of the major issues is the development of accurate, efficient and automatic peptide mass fingerprinting (PMF) identification tools. Current tools are mostly based on counting the number of experimental peptide masses matching with theoretical masses. Almost all of them use additional criteria such as isoelectric point, molecular weight, PTMs, taxonomy or enzymatic cleavage rules to enhance prediction performance. However, these identification tools seldom use peak intensities as a parameter, as there is currently no model predicting the intensities based on the physicochemical properties of peptides. In this work, we used standard datamining methods such as classification and regression methods to find correlations between peak intensities and the properties of the peptides composing a PMF spectrum. These methods were applied on a dataset comprising a series of PMF experiments involving 157 proteins. We found that the C4.5 method gave the most informative results for the classification task (prediction of the presence or absence of a peptide in a spectrum) and M5' for the regression task (prediction of the normalized intensity of a peptide peak). C4.5 correctly classified 88% of the theoretical peaks, whereas the peak intensities predicted by M5' had a correlation coefficient of 0.6743 with the experimental peak intensities. These methods enabled us to obtain decision and model trees that can be directly used for prediction and identification of PMF results. This work lays the foundations of a method for analyzing the factors that influence the peak intensity of PMF spectra. A simple extension of this analysis, using a larger dataset, could improve the accuracy of the results. Additional peptide characteristics or even PMF experimental parameters can also be taken into account in the datamining process to analyze their influence on the peak intensity. Furthermore, this datamining approach can certainly be extended to the tandem mass spectrometry domain or other mass spectrometry derived methods.

5.
Nakamura T  Dohmae N  Takio K 《Proteomics》2004,4(9):2558-2566
We describe a new approach for the characterization of a digested protein complex with quantitative aspects. Accurate masses of tryptic peptides in the digested complex were acquired by nano-liquid chromatography Fourier transform-ion cyclotron resonance mass spectrometry (MS). The conditions of the electrospray ion source were alternated to acquire normal and fragment-ion-rich mass spectra concurrently. This alternating-scan method, which includes no tandem mass spectrometry (MS/MS), allowed us to retain the integrity of the mass chromatograms and avoided missing peptides because of switching between MS and MS/MS. Tentative assignments of accurate peptide masses were verified with the concurrently acquired fragment-ion-rich spectra, and the identities of the protein components were established. For each identified protein component, mass chromatograms attributable to the validated accurate peptide masses were extracted, and the peak areas of multiple mass chromatograms were standardized. The standardized peak areas appeared to reasonably reflect the molar ratio of the protein components in standard mixtures. This new approach was successfully applied to the characterization of a cyanobacterial photosystem II complex preparation. A clear difference in the standardized peak areas was observed between the two groups of identified components, namely eight stoichiometric photosystem II proteins and two minor copurified phycobiliproteins.

6.
Peptide mass fingerprinting, regardless of becoming complementary to tandem mass spectrometry for protein identification, is still the subject of in-depth study because of its higher sample throughput, higher level of specificity for single peptides and lower level of sensitivity to unexpected post-translational modifications compared with tandem mass spectrometry. In this study, we propose, implement and evaluate a uniform approach using support vector machines to incorporate individual concepts and conclusions for accurate PMF. We focus on the inherent attributes and critical issues of the theoretical spectrum (peptides), the experimental spectrum (peaks) and spectrum (masses) alignment. Eighty-one feature-matching patterns derived from cleavage type, uniqueness and variable masses of theoretical peptides together with the intensity rank of experimental peaks were proposed to characterize the matching profile of the peptide mass fingerprinting procedure. We developed a new strategy including the participation of matched peak intensity redistribution to handle shared peak intensities and 440 parameters were generated to digitalize each feature-matching pattern. A high performance for an evaluation data set of 137 items was finally achieved by the optimal multi-criteria support vector machines approach, with 491 final features out of a feature vector of 35,640 normalized features through cross training and validating a publicly available "gold standard" peptide mass fingerprinting data set of 1733 items. Compared with the Mascot, MS-Fit, ProFound and Aldente algorithms commonly used for MS-based protein identification, the feature-matching patterns algorithm has a greater ability to clearly separate correct identifications and random matches with the highest values for sensitivity (82%), precision (97%) and F1-measure (89%) of protein identification. Several conclusions reached via this research make general contributions to MS-based protein identification. 
Firstly, inherent attributes showed comparable or even greater robustness than other explicit. As an inherent attribute of an experimental spectrum, peak intensity should receive considerable attention during protein identification. Secondly, alignment between intense experimental peaks and properly digested, unique or non-modified theoretical peptides is very likely to occur in positive peptide mass fingerprinting. Finally, normalization by several types of harmonic factors, including missed cleavages and mass modification, can make important contributions to the performance of the procedure.  相似文献   
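The reported F1-measure can be checked against the reported sensitivity and precision, because F1 is their harmonic mean. The following is a minimal sketch in Python; the function name `f1_score` is an illustrative choice and does not come from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (sensitivity)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract: precision 97%, sensitivity 82%.
print(round(f1_score(0.97, 0.82), 2))
```

With these inputs the value rounds to 0.89, consistent with the 89% stated in the abstract.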

7.
Tandem mass spectrometry (MS/MS) has emerged as a cornerstone of proteomics owing in part to robust spectral interpretation algorithms. Widely used algorithms do not fully exploit the intensity patterns present in mass spectra. Here, we demonstrate that intensity pattern modeling improves peptide and protein identification from MS/MS spectra. We modeled fragment ion intensities using a machine-learning approach that estimates the likelihood of observed intensities given peptide and fragment attributes. From 1,000,000 spectra, we chose 27,000 with high-quality, nonredundant matches as training data. Using the same 27,000 spectra, intensity was similarly modeled with mismatched peptides. We used these two probabilistic models to compute the relative likelihood of an observed spectrum given that a candidate peptide is matched or mismatched. We used a 'decoy' proteome approach to estimate incorrect match frequency, and demonstrated that an intensity-based method reduces peptide identification error by 50-96% without any loss in sensitivity.  相似文献   
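The two ideas in this abstract, a relative likelihood under matched versus mismatched intensity models and a decoy-based error estimate, can be sketched as follows. This is an illustration under simple assumptions and not the authors' implementation: the per-fragment probabilities are taken as given and treated as independent, and the decoy estimate assumes a decoy database of the same size as the target database. All function and parameter names are hypothetical.

```python
import math


def log_likelihood_ratio(p_matched, p_mismatched):
    """Sum per-fragment log ratios; positive totals favor a correct match.

    p_matched[i] and p_mismatched[i] are the probabilities of the observed
    intensity of fragment i under the matched and mismatched models
    (assumed inputs, assumed independent across fragments).
    """
    return sum(math.log(pm / pu) for pm, pu in zip(p_matched, p_mismatched))


def decoy_error_rate(target_scores, decoy_scores, threshold):
    """Estimate the fraction of incorrect target matches at a score threshold.

    Assumes decoy hits above the threshold occur about as often as
    incorrect target hits (equal-sized target and decoy databases).
    """
    n_target = sum(s >= threshold for s in target_scores)
    n_decoy = sum(s >= threshold for s in decoy_scores)
    return n_decoy / n_target if n_target else 0.0
```

Raising the threshold trades sensitivity for a lower estimated error rate, which is how a fixed error level is typically used to compare scoring methods.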

8.
Rescoring of mass spectrometry (MS) search results using spectral predictors can strongly increase peptide spectrum match (PSM) identification rates. This approach is particularly effective when aiming to search MS data against large databases, for example, when dealing with nonspecific cleavage in immunopeptidomics or inflation of the reference database for noncanonical peptide identification. Here, we present inSPIRE (in silico Spectral Predictor Informed REscoring), a flexible and performant open-source rescoring pipeline built on Prosit MS spectral prediction, which is compatible with common database search engines. inSPIRE allows large-scale rescoring with data from multiple MS search files, increases sensitivity to minor differences in amino acid residue position, and can be applied to various MS sample types, including tryptic proteome digestions and immunopeptidomes. inSPIRE boosts PSM identification rates in immunopeptidomics, leading to better performance than the original Prosit rescoring pipeline, as confirmed by benchmarking of inSPIRE performance on ground truth datasets. The integration of various features in the inSPIRE backbone further boosts the PSM identification in immunopeptidomics, with a potential benefit for the identification of noncanonical peptides.  相似文献   

9.
Tandem mass spectrometry fragments a large number of molecules of the same peptide sequence into charged molecules of prefix and suffix peptide subsequences and then measures mass/charge ratios of these ions. The de novo peptide sequencing problem is to reconstruct the peptide sequence from a given tandem mass spectral data of k ions. By implicitly transforming the spectral data into an NC-spectrum graph $G = (V, E)$ where $|V| = 2k + 2$, we can solve this problem in $O(|V||E|)$ time and $O(|V|^2)$ space using dynamic programming. For an ideal noise-free spectrum with only b- and y-ions, we improve the algorithm to $O(|V| + |E|)$ time and $O(|V|)$ space. Our approach can be further used to discover a modified amino acid in $O(|V||E|)$ time. The algorithms have been implemented and tested on experimental data.
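The graph formulation can be illustrated with a much simplified sketch. It is not the NC-spectrum graph algorithm of the abstract: it assumes an ideal, noise-free ladder of prefix (b-ion-like) residue masses that includes 0 and the full peptide mass, a reduced residue alphabet, and a fixed tolerance. Nodes are masses, an edge joins two nodes whose mass difference equals one residue mass, and dynamic programming keeps the path with the most residues. All names are hypothetical.

```python
# Monoisotopic residue masses in Da for a small illustrative alphabet.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "K": 128.09496, "E": 129.04259, "F": 147.06841,
}


def prefix_masses(peptide):
    """Return cumulative residue masses, starting at 0.0 (ideal ladder)."""
    masses, total = [0.0], 0.0
    for aa in peptide:
        total += RESIDUE_MASS[aa]
        masses.append(total)
    return masses


def sequence_from_prefix_masses(masses, tol=0.01):
    """Recover a sequence by dynamic programming over the mass graph."""
    nodes = sorted(masses)
    # best[j] = (residues on the best path to node j, predecessor, residue)
    best = [None] * len(nodes)
    best[0] = (0, None, "")
    for j in range(1, len(nodes)):
        for i in range(j):
            if best[i] is None:
                continue  # node i is not reachable from the start node
            diff = nodes[j] - nodes[i]
            for aa, mass in RESIDUE_MASS.items():
                if abs(diff - mass) <= tol:
                    candidate = (best[i][0] + 1, i, aa)
                    if best[j] is None or candidate[0] > best[j][0]:
                        best[j] = candidate
    if best[-1] is None:
        return None  # no path from mass 0 to the full peptide mass
    residues, j = [], len(nodes) - 1
    while best[j][1] is not None:
        residues.append(best[j][2])
        j = best[j][1]
    return "".join(reversed(residues))


print(sequence_from_prefix_masses(prefix_masses("GAVLK")))
```

This sketch checks every node pair, so it runs in quadratic time in the number of nodes; real spectra additionally require handling of suffix ions, missing peaks and noise, which is what the NC-spectrum graph construction addresses.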

10.
The identification of proteins in proteomics experiments is usually based on mass information derived from tandem mass spectrometry data. To improve the performance of the identification algorithms, additional information available in the fragment peak intensity patterns has been shown to be useful. In this study, we consider the effect of iTRAQ labeling on the fragment peak intensity patterns of singly charged peptides from MALDI tandem MS data. The presence of an iTRAQ-modified basic group on the N-terminus leads to a more pronounced set of b-ion peaks and distinct changes in the abundance of specific peptide types. We performed a simple intensity prediction by using a decision-tree machine learning approach and were able to show that the relative ion abundance in a spectrum can be correctly predicted and distinguished from closely related sequences. This information will be useful for the development of improved method-specific intensity-based protein identification algorithms.  相似文献   

11.
Shotgun tandem mass spectrometry-based peptide sequencing using programs such as SEQUEST allows high-throughput identification of peptides, which in turn allows the identification of corresponding proteins. We have applied a machine learning algorithm, called the support vector machine, to discriminate between correctly and incorrectly identified peptides using SEQUEST output. Each peptide was characterized by SEQUEST-calculated features such as delta Cn and Xcorr, measurements such as precursor ion current and mass, and additional calculated parameters such as the fraction of matched MS/MS peaks. The trained SVM classifier performed significantly better than previous cutoff-based methods at separating positive from negative peptides. Positive and negative peptides were more readily distinguished in training set data acquired on a QTOF, compared to an ion trap mass spectrometer. The use of 13 features, including four new parameters, significantly improved the separation between positive and negative peptides. Use of the support vector machine and these additional parameters resulted in a more accurate interpretation of peptide MS/MS spectra and is an important step toward automated interpretation of peptide tandem mass spectrometry data in proteomics.  相似文献   

12.
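Data-independent acquisition (DIA) is a mass spectrometry acquisition technique that has developed rapidly in proteomics in recent years. It acquires MS/MS spectra by fragmenting, without bias, all precursor ions within each isolation window; in theory this enables deep coverage of protein samples while offering high throughput, high reproducibility and high sensitivity. Existing DIA acquisition methods fall into three categories: full-window fragmentation methods, sequential isolation-window fragmentation methods, and four-dimensional DIA acquisition methods (4D-DIA). Reflecting the different characteristics of DIA data, the main data analysis methods fall into four categories: spectral library search, direct search of protein sequence databases, identification via pseudo-MS/MS spectra, and de novo sequencing. The resulting peptide identifications require confidence assessment, which comprises two steps, re-ranking with machine learning methods and false discovery rate estimation for the reported result set, and thereby provides quality control of the analysis results. This article organizes and reviews DIA acquisition methods, data analysis methods and software, and methods for assessing the confidence of identification results, and discusses future directions.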

13.
The dominant ions in MS/MS spectra of peptides, which have been fragmented by low-energy CID, are often b-, y-ions and their derivatives resulting from the cleavage of the peptide bonds. However, MS/MS spectra typically contain many more peaks. These can result not only from isotope variants and multiply charged replicates of the peptide fragmentation products but also from unknown fragmentation pathways, sample-specific or systematic chemical contaminations or from noise generated by the electronic detection system. The presence of this background complicates spectrum interpretation. Besides dramatically prolonged computation time, it can lead to incorrect protein identification, especially in the case of de novo sequencing algorithms. Here, we present an algorithm for detection and transformation of multiply charged peaks into singly charged monoisotopic peaks, removal of heavy isotope replicates, and random noise. A quantitative criterion for the recognition of some noninterpretable spectra has been derived as a byproduct. The approach is based on numerical spectral analysis and signal detection methods. The algorithm has been implemented in a stand-alone computer program called MS Cleaner that can be obtained from the authors upon request.  相似文献   
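One step named in this abstract, converting a multiply charged peak into its singly charged equivalent, follows from simple arithmetic. The sketch below is an illustration only and is not the MS Cleaner algorithm, which the abstract describes as based on numerical spectral analysis and signal detection. It assumes protonated positive ions, and the function names are hypothetical.

```python
PROTON_MASS = 1.007276      # Da
ISOTOPE_SPACING = 1.003355  # Da, approximate 13C - 12C mass difference


def to_singly_charged(mz: float, charge: int) -> float:
    """Convert an m/z value observed at a given charge to the 1+ m/z.

    For neutral mass M, the observed m/z is (M + z * p) / z, so the singly
    charged value M + p equals z * mz - (z - 1) * p.
    """
    return charge * mz - (charge - 1) * PROTON_MASS


def charge_from_isotope_spacing(mz_first: float, mz_second: float) -> int:
    """Infer charge from the m/z gap between adjacent isotope peaks."""
    return round(ISOTOPE_SPACING / (mz_second - mz_first))


# A doubly charged peak at m/z 500.5 with its next isotope peak at 501.0.
z = charge_from_isotope_spacing(500.5, 501.0)
print(z, to_singly_charged(500.5, z))
```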

14.
High-throughput proteomics is made possible by a combination of modern mass spectrometry instruments capable of generating many millions of tandem mass (MS(2)) spectra on a daily basis and the increasingly sophisticated associated software for their automated identification. Despite the growing accumulation of collections of identified spectra and the regular generation of MS(2) data from related peptides, the mainstream approach for peptide identification is still the nearly two decades old approach of matching one MS(2) spectrum at a time against a database of protein sequences. Moreover, database search tools overwhelmingly continue to require that users guess in advance a small set of 4-6 post-translational modifications that may be present in their data in order to avoid incurring substantial false positive and negative rates. The spectral networks paradigm for analysis of MS(2) spectra differs from the mainstream database search paradigm in three fundamental ways. First, spectral networks are based on matching spectra against other spectra instead of against protein sequences. Second, spectral networks find spectra from related peptides even before considering their possible identifications. Third, spectral networks determine consensus identifications from sets of spectra from related peptides instead of separately attempting to identify one spectrum at a time. Even though spectral networks algorithms are still in their infancy, they have already delivered the longest and most accurate de novo sequences to date, revealed a new route for the discovery of unexpected post-translational modifications and highly-modified peptides, enabled automated sequencing of cyclic non-ribosomal peptides with unknown amino acids and are now defining a novel approach for mapping the entire molecular output of biological systems that is suitable for analysis with tandem mass spectrometry. 
Here we review the current state of spectral networks algorithms and discuss possible future directions for automated interpretation of spectra from any class of molecules.  相似文献   

15.
Protein identification by mass spectrometry is mainly based on MS/MS spectra and the accuracy of molecular mass determination. However, the high complexity and dynamic range of proteomic samples from any species surpass the separation capacity and detection power of the most advanced multidimensional liquid chromatographs and mass spectrometers. Only a tiny portion of signals is selected for MS/MS experiments, and a considerable number of these still do not provide reliable peptide identification. In this article, an in silico analysis of a novel methodology for peptide and protein identification is described. The approach is based on mass accuracy, isoelectric point (pI), retention time (t(R)) and N-terminal amino acid determination as protein identification criteria, independently of high-quality MS/MS spectra. When the methodology was combined with selective isolation methods, the number of unique peptides and identified proteins increased. Finally, to demonstrate the feasibility of the methodology, an OFFGEL-LC-MS/MS experiment was also implemented. We compared the more reliable peptides identified with MS/MS information against peptides identified with three experimental features (pI, t(R), molecular mass). Also, two theoretical assumptions from MS/MS identification (selective isolation of peptides and N-terminal amino acid) were analyzed. Our results show that, using the information provided by these features and selective isolation methods, we could find 93% of the high-confidence proteins identified by MS/MS with a false-positive rate lower than 5%.

16.
Progress is reviewed towards the development of a global strategy that aims to extend the sensitivity, dynamic range, comprehensiveness and throughput of proteomic measurements based upon the use of high performance separations and mass spectrometry. The approach uses high accuracy mass measurements from Fourier transform ion cyclotron resonance mass spectrometry (FTICR) to validate peptide 'accurate mass tags' (AMTs) produced by global protein enzymatic digestions for a specific organism, tissue or cell type from 'potential mass tags' tentatively identified using conventional tandem mass spectrometry (MS/MS). This provides the basis for subsequent measurements without the need for MS/MS. High resolution capillary liquid chromatography separations combined with high sensitivity, and high resolution accurate FTICR measurements are shown to be capable of characterizing peptide mixtures of more than 10^5 components. The strategy has been initially demonstrated using the microorganisms Saccharomyces cerevisiae and Deinococcus radiodurans. Advantages of the approach include the high confidence of protein identification, its broad proteome coverage, high sensitivity, and the capability for stable-isotope labeling methods for precise relative protein abundance measurements.
Abbreviations: LC, liquid chromatography; FTICR, Fourier transform ion cyclotron resonance; AMT, accurate mass tag; PMT, potential mass tag; MMA, mass measurement accuracy; MS, mass spectrometry; MS/MS, tandem mass spectrometry; ppm, parts per million.

17.
At present, mass spectrometry provides a rapid and sensitive means for making conclusive protein identifications from complex mixtures. Sequencing tryptic peptides derived from proteolyzed protein samples, also known as the "Bottom Up" approach, is the mass spectrometric gold standard for identifying unknowns. An alternative technology, "Top Down" characterization, is emerging as a viable option for protein identifications, which involves analyzing the intact unknowns for accurate mass and amino acid sequence tags. In this paper, both characterization methods were employed to more comprehensively differentiate two early-eluting peaks in a process-scale size-exclusion chromatography (SEC) step for a recombinant, immunoglobulin gamma-1 (IgG-1) fusion protein. The contents of each SEC peak were enzymatically digested, and the resulting peptides were mapped using reversed-phase (RP) HPLC-ion trap MS. Many low-level UV signals were observed among the fusion protein-related peptide peaks. These unknowns were collected, concentrated, and analyzed using nanoelectrospray (nanoES) collision-induced dissociation (CID) tandem (MS/MS) mass spectrometry for identification. The peptide sequencing experiments resulted in the identification of twenty host cell-related proteins. Following peptide mapping, the contents of the two SEC peaks were protein mass profiled using on-line RP HPLC coupled to a high-resolution, quadrupole time-of-flight (Qq/TOF) MS. Unknown proteins were also collected, concentrated, and dissociated using nanoES CID MS/MS. Intact protein CID experiments and accurate molecular weight information allowed for the identification of three full length host cell-derived proteins and numerous clips from these and additional proteins. The accurate molecular weight values allowed for the assignment of N- and C-terminal processing, which is difficult to conclusively access from peptide mapping data. 
The peptide-mapping experiments proved to be far more effective for making protein identifications from complex mixtures, whereas the protein mass profiling was useful for assessing modifications and distinguishing protein clips from full length species.  相似文献   

18.
A vented column, capillary liquid chromatography (LC) microelectrospray ionization (ESI) Fourier transform ion cyclotron resonance (FT-ICR (9.4 T)) mass spectrometry (MS) approach to phosphopeptide identification is described. A dual-ESI source capable of rapid (approximately 200 ms) switching between two independently controlled ESI emitters was constructed. The dual-ESI source, combined with external ion accumulation in a linear octopole ion trap, allowed for internal calibration of every mass spectrum during LC. LC ESI FT-ICR positive-ion MS of protein kinase C (PKC) revealed four previously unidentified phosphorylated peptides (one within PKC(alpha), one within PKC(delta), and two within PKC(zeta)). Internal calibration improved the mass accuracy for LC MS spectra from an absolute mean (47 peptide ions) of 11.5 ppm to 1.5 ppm. Five additional (out of eight known) activating sites of PKC phosphorylation, not detected in positive-ion experiments, were observed by subsequent negative-ion direct infusion nanoelectrospray. Extension of the method to enable infrared multiphoton dissociation of all ions in the ICR cell prior to every other mass measurement revealed the diagnostic neutral loss of H3PO4 from phosphorylated peptide ions. The combination of accurate-mass MS and MS/MS offers a powerful new tool for identifying the presence and site(s) of phosphorylation in peptides, without the need for additional wet chemical derivatization.  相似文献   
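The mass accuracy figures quoted in this abstract are relative errors in parts per million. A minimal sketch of that calculation follows; the function names are illustrative, and the "absolute mean" is assumed here to mean the mean of the absolute ppm errors over the measured peptide ions.

```python
def ppm_error(measured_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def mean_absolute_ppm(pairs):
    """Mean of |ppm error| over (measured, theoretical) m/z pairs."""
    errors = [abs(ppm_error(measured, theoretical)) for measured, theoretical in pairs]
    return sum(errors) / len(errors)


# An ion expected at m/z 1000.0 and measured at 1000.0115 is off by about 11.5 ppm.
print(ppm_error(1000.0115, 1000.0))
```

On this scale, the improvement from 11.5 ppm to 1.5 ppm corresponds to an error of roughly 0.0015 m/z units instead of 0.0115 for an ion near m/z 1000.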

19.
MOTIVATION: Peptide identification following tandem mass spectrometry (MS/MS) is usually achieved by searching for the best match between the mass spectrum of an unidentified peptide and model spectra generated from peptides in a sequence database. This methodology will be successful only if the peptide under investigation belongs to an available database. Our objective is to develop and test the performance of a heuristic optimization algorithm capable of dealing with some features commonly found in actual MS/MS spectra that tend to stop simpler deterministic solution approaches. RESULTS: We present the implementation of a Genetic Algorithm (GA) in the reconstruction of amino acid sequences using only spectral features, discuss some of the problems associated with this approach and compare its performance to a de novo sequencing method. The GA can potentially overcome some of the most problematic aspects associated with de novo analysis of real MS/MS data such as missing or unclearly defined peaks and may prove to be a valuable tool in the proteomics field. We assess the performance of our algorithm under conditions of perfect spectral information, in situations where key spectral features are missing, and using real MS/MS spectral data.  相似文献   

20.
We demonstrate an approach for global quantitative analysis of protein mixtures using differential stable isotopic labeling of the enzyme-digested peptides combined with microbore liquid chromatography (LC) matrix-assisted laser desorption ionization (MALDI) mass spectrometry (MS). Microbore LC provides higher sample loading, compared to capillary LC, which facilitates the quantification of low abundance proteins in protein mixtures. In this work, microbore LC is combined with MALDI MS via a heated droplet interface. The compatibilities of two global peptide labeling methods (i.e., esterification to carboxylic groups and dimethylation to amine groups of peptides) with this LC-MALDI technique are evaluated. Using a quadrupole-time-of-flight mass spectrometer, MALDI spectra of the peptides in individual sample spots are obtained to determine the abundance ratio among pairs of differential isotopically labeled peptides. MS/MS spectra are subsequently obtained from the peptide pairs showing significant abundance differences to determine the sequences of selected peptides for protein identification. The peptide sequences determined from MS/MS database search are confirmed by using the overlaid fragment ion spectra generated from a pair of differentially labeled peptides. The effectiveness of this microbore LC-MALDI approach is demonstrated in the quantification and identification of peptides from a mixture of standard proteins as well as E. coli whole cell extract of known relative concentrations. It is shown that this approach provides a facile and economical means of comparing relative protein abundances from two proteome samples.  相似文献   
