20 similar articles found (search time: 15 ms)

Prediction of missed cleavage sites in tryptic peptides aids protein identification in proteomics

Siepen JA Keevil EJ Knight D Hubbard SJ 《Journal of proteome research》2007,6(1):399-408

Protein identification via peptide mass fingerprinting (PMF) remains a key component of high-throughput proteomics experiments in post-genomic science. Candidate protein identifications are made using bioinformatic tools from peptide peak lists obtained via mass spectrometry (MS). These algorithms rely on several search parameters, including the number of potential uncut peptide bonds matching the primary specificity of the hydrolytic enzyme used in the experiment. Typically, up to one of these "missed cleavages" is considered by the bioinformatics search tools, usually after digestion of the in silico proteome by trypsin. Using two distinct, nonredundant datasets of peptides identified via PMF and tandem MS, a simple predictive method based on information theory is presented which is able to identify experimentally defined missed cleavages with up to 90% accuracy from amino acid sequence alone. Using this simple protocol, we are able to "mask" candidate protein databases so that confidently predicted missed cleavage sites need not be considered for in silico digestion. We show that this leads to an improvement in database searching, with two different search engines, using the PMF dataset as a test set. In addition, the improved approach is also demonstrated on an independent PMF data set of known proteins that also has corresponding high-quality tandem MS data, validating the protein identifications. This approach has wider applicability for proteomics database searching, and the program for predicting missed cleavages and masking Fasta-formatted protein sequence databases has been made available via http://ispider.smith.man.ac.uk/MissedCleave.
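To make the "missed cleavage" search parameter concrete, here is a minimal Python sketch of in silico tryptic digestion. It is not the authors' program: the cleavage rule (cut after K or R unless the next residue is P) is the common simplification and is assumed here, and the function names and the `masked` argument are hypothetical. The `masked` set only illustrates the idea of treating predicted missed-cleavage sites as uncuttable.

```python
def tryptic_sites(seq, masked=frozenset()):
    """Cut positions under the assumed rule: after K/R, not before P.

    `masked` (hypothetical) holds cut positions that are treated as
    uncleavable, e.g. sites predicted to be missed cleavages.
    """
    sites = []
    for i, aa in enumerate(seq[:-1]):
        cut = i + 1  # the cut falls between seq[i] and seq[i + 1]
        if aa in "KR" and seq[i + 1] != "P" and cut not in masked:
            sites.append(cut)
    return sites


def digest(seq, max_missed=1, masked=frozenset()):
    """Return all peptides containing at most `max_missed` uncut sites."""
    bounds = [0] + tryptic_sites(seq, masked) + [len(seq)]
    peptides = []
    for i in range(len(bounds) - 1):
        last = min(i + 2 + max_missed, len(bounds))
        for j in range(i + 1, last):
            peptides.append(seq[bounds[i]:bounds[j]])
    return peptides


if __name__ == "__main__":
    print(digest("AKRPGKC", max_missed=0))
    print(digest("AKRPGKC", max_missed=1))
    # Masking the cut after residue 2 removes it from the digestion.
    print(digest("AKRPGKC", max_missed=0, masked={2}))
```

Masking a site shrinks the set of candidate peptides the search engine has to consider, which is the effect the abstract describes.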

Postexperiment monoisotopic mass filtering and refinement (PE-MMR) of tandem mass spectrometric data increases accuracy of peptide identification in LC/MS/MS

Shin B Jung HJ Hyung SW Kim H Lee D Lee C Yu MH Lee SW 《Molecular & cellular proteomics : MCP》2008,7(6):1124-1134

Methods for treating MS/MS data to achieve accurate peptide identification are currently the subject of much research activity. In this study we describe a new method for filtering MS/MS data and refining precursor masses that provides highly accurate analyses of massive sets of proteomics data. This method, coined "postexperiment monoisotopic mass filtering and refinement" (PE-MMR), consists of several data processing steps: 1) generation of lists of all monoisotopic masses observed in a whole LC/MS experiment, 2) clustering of monoisotopic masses of a peptide into unique mass classes (UMCs) based on their masses and LC elution times, 3) matching the precursor masses of the MS/MS data to a representative mass of a UMC, and 4) filtration of the MS/MS data based on the presence of corresponding monoisotopic masses and refinement of the precursor ion masses by the UMC mass. PE-MMR increases the throughput of proteomics data analysis, by efficiently removing "garbage" MS/MS data prior to database searching, and improves the mass measurement accuracies (i.e. 0.05 +/- 1.49 ppm for yeast data (from 4.46 +/- 2.81 ppm) and 0.03 +/- 3.41 ppm for glycopeptide data (from 4.8 +/- 7.4 ppm)) for an increased number of identified peptides. In proteomics analyses of glycopeptide-enriched samples, PE-MMR processing greatly reduces the degree of false glycopeptide identification by correctly assigning the monoisotopic masses for the precursor ions prior to database searching. By applying this technique to analyses of proteome samples of varying complexities, we demonstrate herein that PE-MMR is an effective and accurate method for treating massive sets of proteomics data.
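Steps 3 and 4 of the procedure above (matching a precursor mass to a UMC, then filtering or refining it) can be sketched roughly as follows. This is an illustrative Python sketch, not the PE-MMR implementation: the function names, the tuple layout of a UMC, and the tolerance values are all assumptions.

```python
def ppm_error(observed, reference):
    """Signed mass error of `observed` relative to `reference`, in ppm."""
    return (observed - reference) / reference * 1e6


def refine_precursor(precursor_mass, scan_time, umcs,
                     ppm_tol=10.0, time_tol=30.0):
    """Return the mass of the closest matching UMC, or None.

    `umcs` is a list of (representative_mass, elution_time) tuples.
    The tolerances are hypothetical defaults. None means that no
    monoisotopic mass supports the spectrum, so it would be filtered out.
    """
    best = None
    for mass, elution_time in umcs:
        err = abs(ppm_error(precursor_mass, mass))
        if err <= ppm_tol and abs(scan_time - elution_time) <= time_tol:
            if best is None or err < best[0]:
                best = (err, mass)
    return None if best is None else best[1]


if __name__ == "__main__":
    umcs = [(1000.0, 110.0), (1000.02, 100.0)]
    print(refine_precursor(1000.005, 100.0, umcs))  # refined to a UMC mass
    print(refine_precursor(1500.0, 100.0, umcs))    # no support: None
```

The ppm figures quoted in the abstract are errors of this kind, measured before and after refinement.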

Fast multi-blind modification search through tandem mass spectrometry

Na S Bandeira N Paek E 《Molecular & cellular proteomics : MCP》2012,11(4):M111.010199

With great biological interest in post-translational modifications (PTMs), various approaches have been introduced to identify PTMs using MS/MS. Recent developments for PTM identification have focused on an unrestrictive approach that searches MS/MS spectra for all known and possibly even unknown types of PTMs at once. However, the resulting expanded search space requires much longer search time and also increases the number of false positives (incorrect identifications) and false negatives (missed true identifications), thus creating a bottleneck in high throughput analysis. Here we introduce MODa, a novel "multi-blind" spectral alignment algorithm that allows for fast unrestrictive PTM searches with no limitation on the number of modifications per peptide while featuring over an order of magnitude speedup in relation to existing approaches. We demonstrate the sensitivity of MODa on human shotgun proteomics data where it reveals multiple mutations, a wide range of modifications (including glycosylation), and evidence for several putative novel modifications. Based on the reported findings, we argue that the efficiency and sensitivity of MODa make it the first unrestrictive search tool with the potential to fully replace conventional restrictive identification of proteomics mass spectrometry data.

Research progress in methods for analyzing proteomic mass spectrometry data based on data-independent acquisition

侯鑫行 周丕宇 宫鹏云 付嘉乐 刘超 王海鹏 《Progress in Biochemistry and Biophysics》 (生物化学与生物物理进展) 2022,49(12):2364-2386

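Data-independent acquisition (DIA) is a mass spectrometry acquisition technique that has developed rapidly in proteomics in recent years. By fragmenting all precursor ions within an isolation window without bias to acquire MS/MS spectra, it can in theory achieve deep coverage of protein samples while offering high throughput, high reproducibility, and high sensitivity. Existing DIA acquisition methods fall into three categories: full-window fragmentation methods, sequential isolation-window fragmentation methods, and four-dimensional DIA acquisition methods (4D-DIA). Reflecting the different characteristics of DIA data, the main data analysis methods fall into four categories: spectral library searching, direct searching of protein sequence databases, identification via pseudo-MS/MS spectra, and de novo sequencing. The resulting peptide identifications require confidence assessment, which comprises two steps, re-ranking with machine learning methods and false discovery rate estimation for the reported result set, and thereby provides quality control of the analysis results. This article reviews DIA data acquisition methods, data analysis methods and software, and methods for assessing the confidence of identification results, and discusses future directions.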

A framework for intelligent data acquisition and real-time database searching for shotgun proteomics

Graumann J Scheltema RA Zhang Y Cox J Mann M 《Molecular & cellular proteomics : MCP》2012,11(3):M111.013185

In the analysis of complex peptide mixtures by MS-based proteomics, many more peptides elute at any given time than can be identified and quantified by the mass spectrometer. This makes it desirable to optimally allocate peptide sequencing and narrow mass range quantification events. In computer science, intelligent agents are frequently used to make autonomous decisions in complex environments. Here we develop and describe a framework for intelligent data acquisition and real-time database searching and showcase selected examples. The intelligent agent is implemented in the MaxQuant computational proteomics environment, termed MaxQuant Real-Time. It analyzes data as it is acquired on the mass spectrometer, constructs isotope patterns and SILAC pair information as well as controls MS and tandem MS events based on real-time and prior MS data or external knowledge. Re-implementing a top10 method in the intelligent agent yields similar performance to the data dependent methods running on the mass spectrometer itself. We demonstrate the capabilities of MaxQuant Real-Time by creating a real-time search engine capable of identifying peptides "on-the-fly" within 30 ms, well within the time constraints of a shotgun fragmentation "topN" method. The agent can focus sequencing events onto peptides of specific interest, such as those originating from a specific gene ontology (GO) term, or peptides that are likely modified versions of already identified peptides. Finally, we demonstrate enhanced quantification of SILAC pairs whose ratios were poorly defined in survey spectra. MaxQuant Real-Time is flexible and can be applied to a large number of scenarios that would benefit from intelligent, directed data acquisition. Our framework should be especially useful for new instrument types, such as the quadrupole-Orbitrap, that are currently becoming available.

Expert System for Computer-assisted Annotation of MS/MS Spectra

Nadin Neuhauser Annette Michalski Jürgen Cox Matthias Mann 《Molecular & cellular proteomics : MCP》2012,11(11):1500-1509

An important step in mass spectrometry (MS)-based proteomics is the identification of peptides by their fragment spectra. Regardless of the identification score achieved, almost all tandem-MS (MS/MS) spectra contain remaining peaks that are not assigned by the search engine. These peaks may be explainable by human experts but the scale of modern proteomics experiments makes this impractical. In computer science, Expert Systems are a mature technology to implement a list of rules generated by interviews with practitioners. We here develop such an Expert System, making use of literature knowledge as well as a large body of high mass accuracy and pure fragmentation spectra. Interestingly, we find that even with high mass accuracy data, rule sets can quickly become too complex, leading to over-annotation. Therefore we establish a rigorous false discovery rate, calculated by random insertion of peaks from a large collection of other MS/MS spectra, and use it to develop an optimized knowledge base. This rule set correctly annotates almost all peaks of medium or high abundance. For high resolution HCD data, median intensity coverage of fragment peaks in MS/MS spectra increases from 58% by search engine annotation alone to 86%. The resulting annotation performance surpasses a human expert, especially on complex spectra such as those of larger phosphorylated peptides. Our system is also applicable to high resolution collision-induced dissociation data. It is available both as a part of MaxQuant and via a webserver that only requires an MS/MS spectrum and the corresponding peptide sequence, and which outputs publication quality, annotated MS/MS spectra (www.biochem.mpg.de/mann/tools/). It provides expert knowledge to beginners in the field of MS-based proteomics and helps advanced users to focus on unusual and possibly novel types of fragment ions.

In MS-based proteomics, peptides are matched to peptide sequences in databases using search engines.
Statistical criteria are established for accepted versus rejected peptide spectra matches based on the search engine score, and usually a 99% certainty is required for reported peptides. The search engines typically only take sequence specific backbone fragmentation into account (i.e. a, b, and y ions) and some of their neutral losses. However, tandem mass spectra, especially of larger peptides, can be quite complex and contain a number of medium or even high abundance peptide fragments that are not annotated by the search engine result. This can result in uncertainty for the user, especially if only relatively few peaks are annotated, because it may reflect an incorrect identification. However, the most common cause of unlabeled peaks is that another peptide was present in the precursor selection window and was cofragmented. This has variously been termed “chimeric spectra” or the problem of low precursor ion fraction (PIF). Such spectra may still be identifiable with high confidence. The Andromeda search engine in MaxQuant, for instance, attempts to identify a second peptide in such cases. However, even “pure” spectra (those with a high PIF) often still contain many unassigned peaks. These can be caused by different fragment types, such as internal ions, single or combined neutral losses as well as immonium and other ion types in the low mass region. A mass spectrometric expert can assign many or all of these peaks, based on expert knowledge of fragmentation and manual calculation of fragment masses, resulting in a higher degree of confidence for the identification. However, there are more and more practitioners of proteomics without in-depth training or experience in annotating MS/MS spectra and such annotation would in any case be prohibitive for hundreds of thousands of spectra.
Furthermore, even human experts may wrongly annotate a given peak, especially with low mass accuracy tandem mass spectra, or fail to consider every possibility that could have resulted in this fragment mass.

Given the desirability of annotating fragment peaks to the highest degree possible, we turned to “Expert Systems,” a well-established technology in computer science. Expert Systems achieved prominence in the 1970s and 1980s and were meant to solve complex problems by reasoning about knowledge (10, 11). Interestingly, one of the first examples was developed by Nobel Prize winner Joshua Lederberg more than 40 years ago, and dealt with the interpretation of mass spectrometric data. The program's name was Heuristic DENDRAL (12), and it was capable of interpreting the mass spectra of aliphatic ethers and their fragments. The hypotheses produced by the program described molecular structures that are plausible explanations of the data. To infer these explanations from the data, the program incorporated a theory of chemical stability that provided limiting constraints as well as heuristic rules.

In general, the aim of an Expert System is to encode knowledge extracted from professionals in the field in question. This then powers a rule-based system that can be applied broadly and in an automated manner. A rule-based Expert System represents the information obtained from human specialists in the form of IF-THEN rules. These are used to perform operations on input data to reach appropriate conclusions. A generic Expert System is essentially a computer program that provides a framework for performing a large number of inferences in a predictable way, using forward or backward chains, backtracking, and other mechanisms (13). Therefore, in contrast to statistics based learning, the “expert program” does not know what it knows through the raw volume of facts in the computer's memory.
Instead, like a human expert, it relies on a reasoning-like process of applying an empirically derived set of rules to the data.

Here we implemented an Expert System for the interpretation of high mass accuracy tandem mass spectrometry data of peptides. It was developed in an iterative manner together with human experts on peptide fragmentation, using the published literature on fragmentation pathways as well as large data sets of higher-energy collisional dissociation (HCD) and collision-induced dissociation (CID) based peptide identifications. Our goal was to achieve an annotation performance similar to or better than that of experienced mass spectrometrists, thus making comprehensively annotated peptide spectra available in large scale proteomics.

Halogenated Peptides as Internal Standards (H-PINS): INTRODUCTION OF AN MS-BASED INTERNAL STANDARD SET FOR LIQUID CHROMATOGRAPHY-MASS SPECTROMETRY*

Hamid Mirzaei Mi-Youn Brusniak Lukas N. Mueller Simon Letarte Julian D. Watts Ruedi Aebersold 《Molecular & cellular proteomics : MCP》2009,8(8):1934-1946

As the application of quantitative proteomics in the life sciences has grown in recent years, so has the need for more robust and generally applicable methods for quality control and calibration. The reliability of quantitative proteomics is tightly linked to the reproducibility and stability of the analytical platforms, which are typically multicomponent (e.g. sample preparation, multistep separations, and mass spectrometry) with individual components contributing unequally to the overall system reproducibility. Variations in quantitative accuracy are thus inevitable, and quality control and calibration become essential for the assessment of the quality of the analyses themselves. Toward this end, the use of internal standards can not only assist in the detection and removal of outlier data acquired by an irreproducible system (quality control) but can also be used for detection of changes in instruments for their subsequent performance and calibration. Here we introduce a set of halogenated peptides as internal standards. The peptides are custom designed to have properties suitable for various quality control assessments, data calibration, and normalization processes. The unique isotope distribution of halogenated peptides makes their mass spectral detection easy and unambiguous when spiked into complex peptide mixtures. In addition, they were designed to elute sequentially over an entire aqueous to organic LC gradient and to have m/z values within the commonly scanned mass range (300–1800 Da). In a series of experiments in which these peptides were spiked into an enriched N-glycosite peptide fraction (i.e. from formerly N-glycosylated intact proteins in their deglycosylated form) isolated from human plasma, we show the utility and performance of these halogenated peptides for sample preparation and LC injection quality control as well as for retention time and mass calibration.
Further use of the peptides for signal intensity normalization and retention time synchronization for selected reaction monitoring experiments is also demonstrated.

As proteomics and systems biology converge, the need for the generation of high quality, large scale quantitative proteomics data sets has grown, and so-called label-free quantification has emerged as a very useful platform for their generation. Label-free quantitative experiments are usually designed to detect differentially abundant features in biologically relevant samples by comparing mass versus retention time feature maps generated by LC-MS. Although label-free proteomics experiments are time- and cost-effective, they require high levels of reproducibility at every step of the process. Too much variation resulting from sample preparation, LC performance (e.g. injection, gradient delivery, and flow rate), and MS performance (e.g. ionization efficiency, mass accuracy, and detector performance) could lead to an increase in the false discovery rate of detected peptides. Thus it is crucial to minimize such variation to adequately control the quality of the data. In addition, label-free experiments are often followed by directed MS/MS analyses in which selected peptides are specifically targeted for identification, a procedure that also requires high system reproducibility. The total variation in the acquired data is the result of accumulating variation at each step. This variation, regardless of its source, be it from sample handling, injection irreproducibility, change in analyte volume, matrix and co-eluter interference (both suppression and enhancement), system instability, or finally variations in the ion source performance, can be accounted for if an appropriate internal standard (ISTD) system is used.

A more recent development in the field of quantitative proteomics is multireaction monitoring (MRM) also referred to as selected reaction monitoring (SRM).
This MS-based technology is aimed at fast, sensitive, and reproducible screening of large sets of known targets and is ideal for building biological assays in which the presence and quantity of specific analytes is being determined in multiple samples. Certain inputs, such as transitional values (m/z values for the precursor ion and its fragment ions), collision energies, and chromatographic retention time are required to build a validated S/MRM assay. These values are either extracted from MS/MS data acquired from biological samples with the same type of instrument used for the S/MRM analyses or from a set of peptide standards. To maximize the number of S/MRM measurements in one LC-MS/MS run, the use of elution time constraints has proven to be highly beneficial. ISTDs could therefore play an integral role in building S/MRM assays if used to synchronize input values such as retention times between instruments or to monitor the retention time consistency in sequences of scheduled S/MRM experiments.

ISTDs are usually designed to best fit the analytical system for which they are being used. Because the currency of quantitative proteomics is ionized peptide ions, peptides thus represent the best candidates for ISTDs for proteomics measurements. The use of peptides as ISTDs for proteomics applications, however, is not new. Both natural peptides and heavy isotope-labeled peptides (either chemically synthesized or produced by tryptic digestion of biologically expressed quantification concatamers (QconCATs)) have been used as internal standards by spiking. Peptides from the biological analyte have also been used as pseudo-internal standards for normalization. But a limitation with all these methods that use native and heavy isotope-labeled peptides as ISTDs is signal detection.
The MS-based signal detection for this type of peptide can be challenging when trying to confidently detect their signal in ion chromatograms acquired by mass spectral analysis of biological fluids or other samples of similar complexity where densely packed features cover the entire mass and time range. In addition, there is always a chance that a peptide with the same elemental composition as the internal standard might exist in the analyte and thus completely throw off the calibration curve. The same argument is valid for heavy isotope-labeled peptides because in many quantitative applications the analytical matrix is made of heavy isotope-labeled peptides. Obviously utilization of ISTDs in complex mixtures requires highly confident detection of corresponding signals, and for natural and heavy isotope-labeled peptides MS/MS analysis is the only way to accomplish that. But CID attempts on mass spectral features do not necessarily result in identification. First the MS features from ISTDs have to be picked for CID, and then the fragmentation should result in high quality MS/MS spectra that could be matched to the ISTD sequence with high confidence. This process is not always successful and consequently can result in an incomplete set of ISTD signals. The other limitation of MS/MS-based ISTDs is processing time. All MS/MS data have to be searched and curated before ISTD signals can be used.

On the other hand, if ISTD signals could be easily detected at the MS level, then all the aforementioned limitations are lifted. For such a peptide to be an MS-based ISTD, it should have unusual properties that make it easily detectable in a background of biological peptides.

In this study we introduce the use of a set of halogenated peptides as internal standards (H-PINS) with unique isotopic distributions and mass defect that are easily detectable at the MS level by manual search and automated peak picking algorithms.
The pattern of the isotopic distribution and mass defect are essential for detection of H-PINS at the MS level. Hence these peptides are best suited for high resolution and mass accuracy instruments. These peptides are similar to ordinary peptides in every other respect and can be treated similarly during purification and LC-MS analysis. We go on to illustrate their use for quality control (QC) at various steps of a proteomics experiment including sample preparation, LC-MS, and mass calibration and retention time synchronization between various analytical platforms.
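The reason a halogenated peptide stands out at the MS level can be illustrated with a small binomial calculation. The Python sketch below is not taken from the paper. It assumes chlorine and bromine as example halogens (the text above does not say which are used), takes approximate natural abundances of their heavy isotopes, and ignores the contribution of C, H, N, O, and S isotopes. The function name is hypothetical.

```python
from math import comb

# Approximate natural abundance of the heavy isotope (37Cl, 81Br).
# Assumed example halogens; the light isotopes make up the remainder.
HEAVY_FRACTION = {"Cl": 0.2424, "Br": 0.4931}


def halogen_pattern(element, n_atoms):
    """Relative intensities at M, M+2, M+4, ... for `n_atoms` halogens.

    Each atom is independently heavy with probability p, so the number
    of heavy atoms follows a binomial distribution.
    """
    p = HEAVY_FRACTION[element]
    return [
        comb(n_atoms, k) * p**k * (1 - p) ** (n_atoms - k)
        for k in range(n_atoms + 1)
    ]


if __name__ == "__main__":
    for element in ("Cl", "Br"):
        for n in (1, 2):
            pattern = halogen_pattern(element, n)
            print(element, n, [round(x, 3) for x in pattern])
```

Ordinary peptides have a comparatively weak M+2 peak, so a strong peak two mass units above the monoisotopic peak is the kind of signature an automated peak picker can search for.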

Generalized method for probability-based peptide and protein identification from tandem mass spectrometry data and sequence database searching

Ramos-Fernández A Paradela A Navajas R Albar JP 《Molecular & cellular proteomics : MCP》2008,7(9):1748-1754

Tandem mass spectrometry-based proteomics is currently in great demand of computational methods that facilitate the elimination of likely false positives in peptide and protein identification. In the last few years, a number of new peptide identification programs have been described, but scores or other significance measures reported by these programs cannot always be directly translated into an easy to interpret error rate measurement such as the false discovery rate. In this work we used generalized lambda distributions to model frequency distributions of database search scores computed by MASCOT, X!TANDEM with k-score plug-in, OMSSA, and InsPecT. From these distributions, we could successfully estimate p values and false discovery rates with high accuracy. From the set of peptide assignments reported by any of these engines, we also defined a generic protein scoring scheme that enabled accurate estimation of protein-level p values by simulation of random score distributions that was also found to yield good estimates of protein-level false discovery rate. The performance of these methods was evaluated by searching four freely available data sets ranging from 40,000 to 285,000 MS/MS spectra.

LFQuant: A label‐free fast quantitative analysis tool for high‐resolution LC‐MS/MS proteomics data

Changming Xu Ning Li Hui Liu Jie Ma Yunping Zhu Hongwei Xie 《Proteomics》2012,12(23-24):3475-3484

Database searching based methods for label‐free quantification aim to reconstruct the peptide extracted ion chromatogram based on the identification information, which can limit the search space and thus make the data processing much faster. The random effect of the MS/MS sampling can be remedied by cross‐assignment among different runs. Here, we present a new label‐free fast quantitative analysis tool, LFQuant, for high‐resolution LC‐MS/MS proteomics data based on database searching. It is designed to accept raw data in two common formats (mzXML and Thermo RAW), and database search results from mainstream tools (MASCOT, SEQUEST, and X!Tandem), as input data. LFQuant can handle large‐scale label‐free data with fractionation such as SDS‐PAGE and 2D LC. It is easy to use and provides handy user interfaces for data loading, parameter setting, quantitative analysis, and quantitative data visualization. LFQuant was compared with two common quantification software packages, MaxQuant and IDEAL‐Q, on the replication data set and the UPS1 standard data set. The results show that LFQuant performs better than them in terms of both precision and accuracy, and consumes significantly less processing time. LFQuant is freely available under the GNU General Public License v3.0 at http://sourceforge.net/projects/lfquant/ .

10.

Increased protein identification capabilities through novel tandem MS calibration strategies

Wu S Kaiser NK Meng D Anderson GA Zhang K Bruce JE 《Journal of proteome research》2005,4(4):1434-1441

High mass measurement accuracy is critical for confident protein identification and characterization in proteomics research. Fourier transform ion cyclotron resonance (FTICR) mass spectrometry is a unique technique which can provide unparalleled mass accuracy and resolving power. However, the mass measurement accuracy of FTICR-MS can be affected by space charge effects. Here, we present a novel internal calibrant-free calibration method that corrects for space charge-induced frequency shifts in FTICR fragment spectra called Calibration Optimization on Fragment Ions (COFI). This new strategy utilizes the information from fixed mass differences between two neighboring peptide fragment ions (such as y(1) and y(2)) to correct the frequency shift after data collection. COFI has been successfully applied to LC-FTICR fragmentation data. Mascot MS/MS ion search data demonstrate that most of the fragments from BSA tryptic digested peptides can be identified using a much lower mass tolerance window after applying COFI to LC-FTICR-MS/MS of BSA tryptic digest. Furthermore, COFI has been used for multiplexed LC-CID-FTICR-MS which is an attractive technique because of its increased duty cycle and dynamic range. After the application of COFI to a multiplexed LC-CID-FTICR-MS of BSA tryptic digest, we achieved an average measured mass accuracy of 2.49 ppm for all the identified BSA fragments.

11.

Andromeda: a peptide search engine integrated into the MaxQuant environment

Cox J Neuhauser N Michalski A Scheltema RA Olsen JV Mann M 《Journal of proteome research》2011,10(4):1794-1805

A key step in mass spectrometry (MS)-based proteomics is the identification of peptides in sequence databases by their fragmentation spectra. Here we describe Andromeda, a novel peptide search engine using a probabilistic scoring model. On proteome data, Andromeda performs as well as Mascot, a widely used commercial search engine, as judged by sensitivity and specificity analysis based on target decoy searches. Furthermore, it can handle data with arbitrarily high fragment mass accuracy, is able to assign and score complex patterns of post-translational modifications, such as highly phosphorylated peptides, and accommodates extremely large databases. The algorithms of Andromeda are provided. Andromeda can function independently or as an integrated search engine of the widely used MaxQuant computational proteomics platform and both are freely available at www.maxquant.org. The combination enables analysis of large data sets in a simple analysis workflow on a desktop computer. For searching individual spectra Andromeda is also accessible via a web server. We demonstrate the flexibility of the system by implementing the capability to identify cofragmented peptides, significantly improving the total number of identified peptides.

12.

Galaxy Integrated Omics: Web-based Standards-Compliant Workflows for Proteomics Informed by Transcriptomics*

Jun Fan Shyamasree Saha Gary Barker Kate J. Heesom Fawaz Ghali Andrew R. Jones David A. Matthews Conrad Bessant 《Molecular & cellular proteomics : MCP》2015,14(11):3087-3093


13.

PEPPeR, a platform for experimental proteomic pattern recognition

Jaffe JD Mani DR Leptos KC Church GM Gillette MA Carr SA 《Molecular & cellular proteomics : MCP》2006,5(10):1927-1941

Quantitative proteomics holds considerable promise for elucidation of basic biology and for clinical biomarker discovery. However, it has been difficult to fulfill this promise due to over-reliance on identification-based quantitative methods and problems associated with chromatographic separation reproducibility. Here we describe new algorithms termed "Landmark Matching" and "Peak Matching" that greatly reduce these problems. Landmark Matching performs time base-independent propagation of peptide identities onto accurate mass LC-MS features in a way that leverages historical data derived from disparate data acquisition strategies. Peak Matching builds upon Landmark Matching by recognizing identical molecular species across multiple LC-MS experiments in an identity-independent fashion by clustering. We have bundled these algorithms together with other algorithms, data acquisition strategies, and experimental designs to create a Platform for Experimental Proteomic Pattern Recognition (PEPPeR). These developments enable use of established statistical tools previously limited to microarray analysis for treatment of proteomics data. We demonstrate that the proposed platform can be calibrated across 2.5 orders of magnitude and can perform robust quantification of ratios in both simple and complex mixtures with good precision and error characteristics across multiple sample preparations. We also demonstrate de novo marker discovery based on statistical significance of unidentified accurate mass components that changed between two mixtures. These markers were subsequently identified by accurate mass-driven MS/MS acquisition and demonstrated to be contaminant proteins associated with known proteins whose concentrations were designed to change between the two mixtures. These results have provided a real world validation of the platform for marker discovery.

14.

Statistical validation of peptide identifications in large-scale proteomics using the target-decoy database search strategy and flexible mixture modeling

Choi H Ghosh D Nesvizhskii AI 《Journal of proteome research》2008,7(1):286-292

Reliable statistical validation of peptide and protein identifications is a top priority in large-scale mass spectrometry based proteomics. PeptideProphet is one of the computational tools commonly used for assessing the statistical confidence in peptide assignments to tandem mass spectra obtained using database search programs such as SEQUEST, MASCOT, or X! TANDEM. We present two flexible methods, the variable component mixture model and the semiparametric mixture model, that remove the restrictive parametric assumptions in the mixture modeling approach of PeptideProphet. Using a control protein mixture data set generated on a linear ion trap Fourier transform (LTQ-FT) mass spectrometer, we demonstrate that both methods improve on parametric models in terms of the accuracy of probability estimates and the power to detect correct identifications while controlling the false discovery rate to the same degree. The statistical approaches presented here require that the data set contain a sufficient number of decoy (known to be incorrect) peptide identifications, which can be obtained using the target-decoy database search strategy.
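For orientation, the target-decoy idea mentioned above can be sketched with the simplest counting estimator, which is deliberately not the mixture modeling the abstract describes. The Python sketch below assumes that each peptide-spectrum match carries a score (higher is better) and a flag saying whether it hit a decoy sequence, and it estimates the false discovery rate above a threshold as decoys divided by targets. The function names and data layout are hypothetical.

```python
def fdr_at_threshold(psms, threshold):
    """Estimate FDR among target hits scoring at or above `threshold`.

    `psms` is a list of (score, is_decoy) tuples. The estimate assumes
    that decoy hits occur about as often as incorrect target hits.
    """
    targets = sum(1 for score, decoy in psms if score >= threshold and not decoy)
    decoys = sum(1 for score, decoy in psms if score >= threshold and decoy)
    return decoys / targets if targets else 0.0


def score_cutoff(psms, max_fdr=0.01):
    """Lowest observed score whose estimated FDR does not exceed `max_fdr`."""
    for threshold in sorted({score for score, _ in psms}):
        if fdr_at_threshold(psms, threshold) <= max_fdr:
            return threshold
    return None


if __name__ == "__main__":
    psms = [(10, False), (9, False), (8, True), (7, False), (6, True)]
    print(fdr_at_threshold(psms, 7))
    print(score_cutoff(psms, max_fdr=0.0))
```

A counting estimate like this is coarse when few decoys pass the threshold, which is one reason model-based approaches such as the ones in the abstract need a sufficient number of decoy identifications.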

15.

Index-ion triggered MS2 ion quantification: a novel proteomics approach for reproducible detection and quantification of targeted proteins in complex mixtures

Yan W Luo J Robinson M Eng J Aebersold R Ranish J 《Molecular & cellular proteomics : MCP》2011,10(3):M110.005611

Biomedical research requires protein detection technology that is not only sensitive and quantitative, but that can reproducibly measure any set of proteins in a biological system in a high throughput manner. Here we report the development and application of a targeted proteomics platform termed index-ion triggered MS2 ion quantification (iMSTIQ) that allows reproducible and accurate peptide quantification in complex mixtures. The key feature of iMSTIQ is an approach called index-ion triggered analysis (ITA) that permits the reproducible acquisition of full MS2 spectra of targeted peptides independent of their ion intensities. Accurate quantification is achieved by comparing the relative intensities of multiple pairs of fragment ions derived from isobaric targeted peptides during MS2 analysis. Importantly, the method takes advantage of the favorable performance characteristics of the LTQ-Orbitrap, which include high mass accuracy, resolution, and throughput. As such it provides an attractive targeted proteomics tool to meet the demands of systems biology research and biomarker studies.

16.

Advances in protein identification strategies based on spectral libraries

Yu Derui, Ma Jie, Xie Zengyan, Bai Mingze, Zhu Yunping, Shu Kunxian 《Chinese Journal of Biotechnology》2018,34(4):525-536

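Mass spectrometry-based proteomics is developing rapidly, and protein mass spectrometry data are growing exponentially. Finding identification methods that are fast, accurate, and reproducible is an important task in this field. The spectral library search strategy directly compares experimental spectra with real spectra in a spectral library, making full use of peak abundances, unconventional fragmentation patterns, and other spectral features. This makes searching faster and more accurate, and the strategy has become one of the mainstream identification methods in proteomics. This article introduces spectral library-based strategies for identifying proteomic mass spectrometry data, describes in depth two key steps, namely spectral library construction methods and spectral library search methods, and discusses the progress and challenges of the spectral library strategy.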
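
The comparison of an experimental spectrum against library spectra described in this abstract can be sketched as follows. This is a minimal illustration that assumes a binned cosine (normalized dot product) similarity, which is one common choice and is not stated in the abstract; the function names, bin width, and toy spectra are hypothetical.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins.

    peaks: iterable of (mz, intensity) pairs.
    """
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return binned


def cosine_similarity(query, reference, bin_width=1.0):
    """Cosine similarity between two binned spectra (0.0 = no shared bins)."""
    a = bin_spectrum(query, bin_width)
    b = bin_spectrum(reference, bin_width)
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_library_match(query, library, bin_width=1.0):
    """Return (entry name, score) of the most similar library spectrum."""
    scored = ((name, cosine_similarity(query, peaks, bin_width))
              for name, peaks in library.items())
    return max(scored, key=lambda item: item[1])


# Hypothetical toy spectra: (m/z, intensity) pairs.
QUERY = [(175.1, 50.0), (262.2, 100.0), (375.2, 30.0)]
LIBRARY = {
    "entry_A": [(175.1, 45.0), (262.2, 110.0), (375.2, 25.0)],
    "entry_B": [(147.1, 80.0), (300.2, 100.0)],
}
```

Because the score uses intensities as well as peak positions, it reflects the point made in the abstract that library search exploits peak abundances rather than only theoretical fragment masses.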

17.

Preprocessing Significantly Improves the Peptide/Protein Identification Sensitivity of High-resolution Isobarically Labeled Tandem Mass Spectrometry Data

Quanhu Sheng Rongxia Li Jie Dai Qingrun Li Zhiduan Su Yan Guo Chen Li Yu Shyr Rong Zeng 《Molecular & cellular proteomics : MCP》2015,14(2):405-417

Isobaric labeling techniques coupled with high-resolution mass spectrometry have been widely employed in proteomic workflows requiring relative quantification. For each high-resolution tandem mass spectrum (MS/MS), isobaric labeling techniques can be used not only to quantify the peptide from different samples by reporter ions, but also to identify the peptide it is derived from. Because the ions related to isobaric labeling may act as noise in database searching, the MS/MS spectrum should be preprocessed before peptide or protein identification. In this article, we demonstrate that there are many high-frequency, high-abundance isobaric related ions in the MS/MS spectrum, and that removing isobaric related ions combined with deisotoping and deconvolution in MS/MS preprocessing procedures significantly improves the peptide/protein identification sensitivity. The user-friendly software package TurboRaw2MGF (v2.0) has been implemented for converting raw TIC data files to mascot generic format files and can be downloaded for free from https://github.com/shengqh/RCPA.Tools/releases as part of the software suite ProteomicsTools. The data have been deposited to the ProteomeXchange with identifier PXD000994.

Mass spectrometry-based proteomics has been widely applied to investigate protein mixtures derived from tissue, cell lysates, or body fluids. Liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS) is the most popular strategy for protein/peptide mixture analysis in shotgun proteomics. Large-scale protein/peptide mixtures are separated by liquid chromatography followed by online detection by tandem mass spectrometry. The capabilities of proteomics rely greatly on the performance of the mass spectrometer. With the improvement of MS technology, proteomics has benefited significantly from high resolution and excellent mass accuracy. In recent years, based on the higher efficiency of higher energy collision dissociation (HCD), a new “high–high” strategy (high-resolution MS as well as high-resolution MS/MS) has been applied instead of the “high–low” strategy (high-resolution MS, e.g. in the Orbitrap, and low-resolution MS/MS, e.g. in the ion trap) to obtain high quality MS/MS data as well as full MS data in shotgun proteomics. Both full MS scans and MS/MS scans can be performed, and the whole cycle time of MS detection is compatible with the chromatographic time scale.

High-resolution measurement is one of the most important features in mass spectrometric application. In this high–high strategy, high-resolution and accurate spectra are obtained in MS/MS scans as well as full MS scans, which makes isotopic peaks distinguishable from one another and enables the easy calculation of precise charge states and monoisotopic masses. During an LC-MS/MS experiment, a multiply charged precursor ion (peptide) is usually isolated and fragmented, and fragment ions in multiple charge states are then generated and collected. After full extraction of peak lists from the original tandem mass spectra, the commonly used search engines (e.g. Mascot, Sequest) have no capability to distinguish isotopic peaks and recognize charge states, so every product ion is considered under all charge state hypotheses during the database search for protein identification. These multiple charge states of fragment ions and their isotopic cluster peaks can be incorrectly assigned by the search engine, which can cause false peptide identifications. To overcome this issue, data preprocessing of the high-resolution MS/MS spectra is required before submitting them for identification. There are usually two major preprocessing steps used for high-resolution MS/MS data: deisotoping and deconvolution. Deisotoping of spectra removes all peaks of an isotopic cluster except the monoisotopic peak. Deconvolution of spectra translates multiply charged ions to singly charged ions and also accumulates the intensity of fragment ions by summing the intensities from all of their charge states. After these two data-preprocessing steps, the resulting spectra are simpler and cleaner and allow more precise database searching and accurate bioinformatics analysis.

With the capacity to analyze multiple samples simultaneously, stable isotope labeling approaches have been widely used in quantitative proteomics. Stable isotope labeling approaches are categorized as metabolic labeling (SILAC, stable isotope labeling by amino acids in cell culture) and chemical labeling. The peptides labeled by the SILAC approach are quantified by precursor ions in full MS spectra, whereas peptides that have been isobarically labeled using chemical means are quantified by reporter ions in MS/MS spectra. There are two similar isobaric chemical labeling methods: (1) isobaric tag for relative and absolute quantification (iTRAQ), and (2) tandem mass tag (TMT). These reagents contain an amino-reactive group that specifically reacts with N-terminal amino groups and epsilon-amino groups of lysine residues to label digested peptides in a typical shotgun proteomics experiment. There are four different channels of isobaric tags: TMT two-plex, iTRAQ four-plex, TMT six-plex, and iTRAQ eight-plex. The number before “plex” denotes the number of samples that can be analyzed simultaneously by mass spectrometry. Peptides labeled with different isotopic variants of the tag show identical or similar mass and appear as a single peak in full scans. This single peak may be selected for subsequent MS/MS analysis. In an MS/MS scan, the masses of the reporter ions (114 to 117 for iTRAQ four-plex, 113 to 121 for iTRAQ eight-plex, and 126 to 131 for TMT six-plex upon CID or HCD activation) are associated with the corresponding samples, and the intensities represent the relative abundances of the labeled peptides. Meanwhile, the other ions from the MS/MS spectra can be used for peptide identification. Because of the multiplexing capability, isobaric labeling methods combined with bottom-up proteomics have been widely applied for accurate quantification of proteins on a global scale. Although mostly associated with peptide labeling, these isobaric labeling methods have also been applied at the protein level.

For the proteomic analysis of isobarically labeled peptides/proteins in the “high–high” MS strategy, the common consensus is that accurate reporter ions can contribute to more accurate quantification. However, there is no evidence to show how the ions related to isobaric labeling affect the peptide/protein identification and what preprocessing steps should be taken for high-resolution isobarically labeled MS/MS. To demonstrate the effectiveness and importance of preprocessing, we examined how the combination of preprocessing steps improved peptide/protein identification sensitivity in database searching. Several combinations of data-preprocessing steps were applied for high-throughput data analysis, including deisotoping to keep simple monoisotopic mass peaks, deconvolution of ions with multiple charge states, and preservation of the top 10 peaks in every 100 Dalton mass range. After systematic analysis of high-resolution isobarically labeled spectra, we further processed the spectra and removed interfering ions that were not related to the peptide. Our results suggested that the preprocessing of isobarically labeled high-resolution tandem mass spectra significantly improved the peptide/protein identification sensitivity.
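
Two of the preprocessing steps named in this entry, deconvolution to singly charged ions and keeping the top peaks per 100 Dalton range, can be sketched in Python. This is a hedged illustration and not the TurboRaw2MGF implementation: it assumes the peaks are already deisotoped with a charge assigned, that windows are aligned to multiples of the window width, and that peaks within a small tolerance belong to the same fragment. All function names, the tolerance, and the toy values are hypothetical.

```python
PROTON_MASS = 1.007276  # Da, approximate mass of a proton


def to_singly_charged(mz, charge):
    """Convert the m/z of an ion carrying `charge` protons to its 1+ m/z."""
    return mz * charge - (charge - 1) * PROTON_MASS


def deconvolute(peaks, tol=0.01):
    """Map (mz, intensity, charge) peaks to 1+ m/z and sum matching intensities.

    Peaks whose converted m/z lies within `tol` Da of the previously kept
    peak are merged into it (the tolerance value is an assumption).
    """
    converted = sorted((to_singly_charged(mz, z), inten) for mz, inten, z in peaks)
    merged = []
    for mz1, inten in converted:
        if merged and abs(mz1 - merged[-1][0]) <= tol:
            prev_mz, prev_inten = merged[-1]
            merged[-1] = (prev_mz, prev_inten + inten)
        else:
            merged.append((mz1, inten))
    return merged


def keep_top_peaks(peaks, top_n=10, window=100.0):
    """Keep the `top_n` most intense (mz, intensity) peaks per m/z window."""
    by_window = {}
    for mz, inten in peaks:
        by_window.setdefault(int(mz // window), []).append((mz, inten))
    kept = []
    for group in by_window.values():
        kept.extend(sorted(group, key=lambda p: p[1], reverse=True)[:top_n])
    return sorted(kept)


# Hypothetical toy input: the same fragment observed as 1+ and 2+.
EXAMPLE_PEAKS = [(998.9927, 10.0, 1), (500.0, 5.0, 2)]
```

In the toy input, the 2+ ion at m/z 500.0 converts to roughly 998.9927, so it merges with the 1+ peak and the intensities are summed, which is the accumulation the text describes.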

18.

MassMatrix: A database search program for rapid characterization of proteins and peptides from tandem mass spectrometry data

Hua Xu Michael A. Freitas Dr. 《Proteomics》2009,9(6):1548-1555

MassMatrix is a program that matches tandem mass spectra with theoretical peptide sequences derived from a protein database. The program uses a mass accuracy sensitive probabilistic score model to rank peptide matches. The MS/MS search software was evaluated by use of a high mass accuracy dataset and its results compared with those from MASCOT, SEQUEST, X!Tandem, and OMSSA. For the high mass accuracy data, MassMatrix provided better sensitivity than MASCOT, SEQUEST, X!Tandem, and OMSSA for a given specificity and the percentage of false positives was 2%. More importantly, all manually validated true positives corresponded to a unique peptide/spectrum match. The presence of decoy sequences and additional variable PTMs did not significantly affect the results from the high mass accuracy search. MassMatrix performs well when compared with MASCOT, SEQUEST, X!Tandem, and OMSSA with regard to search time. MassMatrix was also run on a distributed memory cluster and achieved search speeds of ~100,000 spectra per hour when searching against a complete human database with eight variable modifications. The algorithm is available for public searches at http://www.massmatrix.net.

19.

From proteomics data representation to public data flow: a report on the HUPO-PSI workshop September 2011, Geneva, Switzerland

Orchard S Albar JP Deutsch EW Eisenacher M Binz PA Martinez-Bartolomé S Vizcaíno JA Hermjakob H 《Proteomics》2012,12(3):351-355

The plenary session of the Proteomics Standards Initiative (PSI) of the Human Proteome Organization at the Tenth annual HUPO World Congress updated the delegates on the ongoing activities of this group. The Molecular Interactions workgroup described the success of the PSICQUIC web service, which enables users to access multiple interaction resources with a single query. One user instance is the IMEx Consortium, which uses the service to enable users to access a non-redundant set of protein-protein interaction records. The mass spectrometry data formats, mzML for mass spectrometer output files and mzIdentML for the output of search engines, are now successfully established with increasing numbers of implementations. A format for the output of quantitative proteomics data, mzQuantML, and also TraML, for SRM/MRM transition lists, are both currently nearing completion. The corresponding MIAPE documents are being updated in line with advances in the field, as is the shared controlled vocabulary PSI-MS. In addition, the mzTab format was introduced, as a simpler way to report MS proteomics and metabolomics results. Finally, the ProteomeXchange Consortium, which will supply a single entry point for the submission of MS proteomics data to multiple data resources including PRIDE and PeptideAtlas, is currently being established.

20.

Optimization and use of peptide mass measurement accuracy in shotgun proteomics

Haas W Faherty BK Gerber SA Elias JE Beausoleil SA Bakalarski CE Li X Villén J Gygi SP 《Molecular & cellular proteomics : MCP》2006,5(7):1326-1337

Mass spectrometers that provide high mass accuracy such as FT-ICR instruments are increasingly used in proteomic studies. Although the importance of accurately determined molecular masses for the identification of biomolecules is generally accepted, its role in the analysis of shotgun proteomic data has not been thoroughly studied. To gain insight into this role, we used a hybrid linear quadrupole ion trap/FT-ICR (LTQ FT) mass spectrometer for LC-MS/MS analysis of a highly complex peptide mixture derived from a fraction of the yeast proteome. We applied three data-dependent MS/MS acquisition methods. The FT-ICR part of the hybrid mass spectrometer was either not exploited, used only for survey MS scans, or also used for acquiring selected ion monitoring scans to optimize mass accuracy. MS/MS data were assigned with the SEQUEST algorithm, and peptide identifications were validated by estimating the number of incorrect assignments using the composite target/decoy database search strategy. We developed a simple mass calibration strategy exploiting polydimethylcyclosiloxane background ions as calibrant ions. This strategy allowed us to substantially improve mass accuracy without reducing the number of MS/MS spectra acquired in an LC-MS/MS run. The benefits of high mass accuracy were greatest for assigning MS/MS spectra with low signal-to-noise ratios and for assigning phosphopeptides. Confident peptide identification rates from these data sets could be doubled by the use of mass accuracy information. It was also shown that improving mass accuracy at a cost to the MS/MS acquisition rate substantially lowered the sensitivity of LC-MS/MS analyses. The use of FT-ICR selected ion monitoring scans to maximize mass accuracy reduced the number of protein identifications by 40%.
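
Mass accuracy of the kind discussed in this abstract is usually expressed in parts per million (ppm). The Python sketch below shows the ppm calculation and a single-point recalibration against a known background ion. It is an illustration only: the abstract does not describe the calibration procedure in detail, so the multiplicative correction, the function names, and the example values (including the calibrant m/z, a figure commonly quoted for a protonated polysiloxane ion) are assumptions.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error of an observed m/z in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz, theoretical_mz, tol_ppm):
    """True if the absolute mass error does not exceed tol_ppm."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


def recalibrate(mz_values, calibrant_observed, calibrant_theoretical):
    """Correct m/z values using the ppm shift seen on one calibrant ion.

    Assumes the same relative (ppm) error applies across the whole scan,
    which is a simplification.
    """
    shift_ppm = ppm_error(calibrant_observed, calibrant_theoretical)
    return [mz / (1.0 + shift_ppm * 1e-6) for mz in mz_values]


# Hypothetical example: a calibrant ion measured slightly too high.
CALIBRANT_THEORETICAL = 445.120025  # assumed reference m/z for illustration
CALIBRANT_OBSERVED = 445.121360
CORRECTED = recalibrate([CALIBRANT_OBSERVED, 800.4321],
                        CALIBRANT_OBSERVED, CALIBRANT_THEORETICAL)
```

After correction the calibrant itself maps back onto its reference m/z, and every other peak in the scan is shifted by the same relative amount, so a narrower ppm tolerance can be used when matching candidate peptides.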