Similar Articles
1.
Searching spectral libraries in MS/MS is an important new approach to improving the quality of peptide and protein identification. The idea relies on the observation that ion intensities in an MS/MS spectrum of a given peptide are generally reproducible across experiments, and thus, matching between spectra from an experiment and the spectra of previously identified peptides stored in a spectral library can lead to better peptide identification compared to the traditional database search. However, the use of libraries is greatly limited by their coverage of peptide sequences: even for well‐studied organisms a large fraction of peptides have not been previously identified. To address this issue, we propose to expand spectral libraries by predicting the MS/MS spectra of peptides based on the spectra of peptides with similar sequences. We first demonstrate that the intensity patterns of dominant fragment ions between similar peptides tend to be similar. In accordance with this observation, we develop a neighbor‐based approach that first selects peptides that are likely to have spectra similar to the target peptide and then combines their spectra using a weighted K‐nearest neighbor method to accurately predict fragment ion intensities corresponding to the target peptide. This approach has the potential to predict spectra for every peptide in the proteome. When rigorous quality criteria are applied, we estimate that the method increases the coverage of spectral libraries available from the National Institute of Standards and Technology by 20–60%, although the values vary with peptide length and charge state. We find that the overall best search performance is achieved when spectral libraries are supplemented by the high quality predicted spectra.
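The weighted K-nearest-neighbor step can be sketched as follows. This is a minimal illustration under assumptions that are not in the abstract. Spectra are represented as a hypothetical mapping from fragment-ion label to relative intensity. The neighbor weights are taken as given. An ion missing from a neighbor contributes zero intensity. The authors' actual neighbor selection and weighting scheme may differ.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: fragment-ion label (e.g. "y3") -> relative intensity.
Spectrum = Dict[str, float]


def predict_intensities(neighbors: List[Tuple[float, Spectrum]], k: int = 3) -> Spectrum:
    """Weighted K-nearest-neighbour prediction of fragment-ion intensities.

    `neighbors` holds (similarity_weight, spectrum) pairs for library peptides
    with sequences similar to the target. Only the k highest-weighted
    neighbours are combined; an ion absent from a neighbour contributes zero.
    """
    top = sorted(neighbors, key=lambda pair: pair[0], reverse=True)[:k]
    total_weight = sum(weight for weight, _ in top)
    if total_weight == 0:
        return {}
    predicted: Spectrum = {}
    for weight, spectrum in top:
        for ion, intensity in spectrum.items():
            predicted[ion] = predicted.get(ion, 0.0) + weight * intensity
    # Normalise by the summed weights so each value is a weighted mean.
    return {ion: value / total_weight for ion, value in predicted.items()}


neighbors = [
    (0.9, {"y3": 1.0, "b2": 0.2}),
    (0.6, {"y3": 0.5, "b2": 0.4}),
    (0.1, {"y3": 0.0}),
]
print(predict_intensities(neighbors, k=2))
```

With k=2, the third and least similar neighbor is ignored. The predicted y3 intensity is (0.9·1.0 + 0.6·0.5)/1.5 = 0.8.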

2.
Spectral library searching is an emerging approach in peptide identifications from tandem mass spectra, a critical step in proteomic data analysis. In spectral library searching, a spectral library is first meticulously compiled from a large collection of previously observed peptide MS/MS spectra that are conclusively assigned to their corresponding amino acid sequence. An unknown spectrum is then identified by comparing it to all the candidates in the spectral library for the most similar match. This review discusses the basic principles of spectral library building and searching, describes its advantages and limitations, and provides a primer for researchers interested in adopting this new approach in their data analysis. It will also discuss the future outlook on the evolution and utility of spectral libraries in the field of proteomics.
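The core comparison step, "comparing it to all the candidates in the spectral library for the most similar match," can be sketched with a normalized dot product (cosine similarity) over binned spectra. This is a minimal sketch. The 1.0 m/z bin width, the dictionary-based library, and the absence of intensity scaling are all illustrative assumptions. Real tools typically add precursor filtering and intensity transformations.

```python
import math
from typing import Dict, List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def bin_spectrum(peaks: List[Peak], bin_width: float = 1.0) -> Dict[int, float]:
    """Sum peak intensities into fixed-width m/z bins (bin width is an assumption)."""
    binned: Dict[int, float] = {}
    for mz, intensity in peaks:
        index = int(mz // bin_width)
        binned[index] = binned.get(index, 0.0) + intensity
    return binned


def normalized_dot(a: Dict[int, float], b: Dict[int, float]) -> float:
    """Cosine similarity of two binned spectra (0 = no shared bins, 1 = same shape)."""
    numerator = sum(value * b.get(index, 0.0) for index, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return numerator / (norm_a * norm_b)


def search_library(query: List[Peak], library: Dict[str, List[Peak]],
                   bin_width: float = 1.0) -> List[Tuple[float, str]]:
    """Score the query against every library entry; return (score, peptide), best first."""
    query_bins = bin_spectrum(query, bin_width)
    scored = [
        (normalized_dot(query_bins, bin_spectrum(peaks, bin_width)), peptide)
        for peptide, peaks in library.items()
    ]
    scored.sort(reverse=True)
    return scored


library = {"PEPTIDEK": [(100.1, 10.0), (200.2, 5.0)], "OTHERK": [(150.0, 10.0)]}
print(search_library([(100.3, 9.0), (200.4, 5.0)], library))
```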

3.
Wenguang Shao, Kan Zhu, Henry Lam. Proteomics, 2013, 13(22): 3273-3283
Spectral library searching is a maturing approach for peptide identification from MS/MS, offering an alternative to traditional sequence database searching. Spectral library searching relies on direct spectrum‐to‐spectrum matching between the query data and the spectral library, which affords better discrimination of true and false matches, leading to improved sensitivity. However, due to the inherent diversity of the peak location and intensity profiles of real spectra, the resulting similarity score distributions often take on unpredictable shapes. This makes it difficult to model the scores of the false matches accurately, necessitating the use of decoy searching to sample the score distribution of the false matches. Here, we refined the similarity scoring in spectral library searching to enable the validation of spectral search results without the use of decoys. We rank‐transformed the peak intensities to standardize all spectra, making it possible to fit a parametric distribution to the scores of the non-top-scoring spectral matches. The statistical significance of the top‐scoring match can then be estimated rigorously according to Extreme Value Theory. The overall result is a more robust and interpretable measure of the quality of the spectral match, which can be obtained without decoys. We tested this refined similarity scoring function on real datasets and demonstrated its effectiveness. This approach reduces search time, increases sensitivity, and extends spectral library searching to situations where decoy spectra cannot be readily generated, such as searching libraries of unidentified or nonpeptide spectra.
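The two ideas in this abstract are a rank transform of peak intensities and an extreme-value estimate of the top score's significance from the non-top scores. Both can be sketched as follows. Several choices here are assumptions, not the published method: the rank scaling, the use of a normal distribution as the fitted null model, and the extreme-value step written as F(x)^N for the maximum of N independent null scores.

```python
import statistics
from typing import List


def rank_transform(intensities: List[float]) -> List[float]:
    """Replace each peak intensity by its rank scaled to (0, 1]; the strongest peak gets 1.0.

    Ties are broken by input order; this scaling is an assumption for illustration.
    """
    n = len(intensities)
    order = sorted(range(n), key=lambda i: intensities[i])
    ranks = [0.0] * n
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank / n
    return ranks


def top_match_pvalue(scores: List[float]) -> float:
    """Significance of the best similarity score for one query, estimated without decoys.

    A normal distribution (an assumed null model) is fitted to the non-top scores.
    If F is its CDF and N the number of candidates, P(max of N null scores <= x) = F(x)**N.
    Raises statistics.StatisticsError if the non-top scores are all identical.
    """
    if len(scores) < 3:
        raise ValueError("need at least three candidate scores")
    ranked = sorted(scores, reverse=True)
    top, rest = ranked[0], ranked[1:]
    null = statistics.NormalDist.from_samples(rest)
    return 1.0 - null.cdf(top) ** len(scores)
```

A top score that stands far above the fitted null gives a p-value near zero. A top score that is barely above the others gives a large p-value.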

4.
The identification of ubiquitin (Ub) and Ub‐like protein (Ubl) conjugation sites is important for understanding their roles in regulating biological pathways. However, identifying Ub/Ubl conjugation sites unambiguously and sensitively through high‐throughput MS remains challenging. We introduce an improved workflow for identifying Ub/Ubl conjugation sites based on the ChopNSpice and X!Tandem software. ChopNSpice is modified to generate Ub/Ubl conjugation peptides in the form of a cross‐link, and the modified ChopNSpice (MchopNSpice) produces a combinatorial FASTA database. The modified X!Tandem (UblSearch) introduces a new fragmentation model for Ub/Ubl conjugation peptides so that, using the combinatorial FASTA database, MS/MS spectra can be matched unambiguously to either linear peptides or Ub/Ubl conjugation peptides. The new workflow performed better than the original ChopNSpice method in analyzing a Ub and Ubl spectral library and a large‐scale Trypanosoma cruzi small Ub‐related modifier dataset. The proposed workflow is therefore better suited to processing large‐scale MS datasets of Ub/Ubl modification. MchopNSpice and UblSearch are freely available under the GNU General Public License v3.0 at http://sourceforge.net/projects/maublsearch .

5.
A notable inefficiency of shotgun proteomics experiments is the repeated rediscovery of the same identifiable peptides by sequence database searching methods, which are often time-consuming and error-prone. A more precise and efficient method, in which previously observed and identified peptide MS/MS spectra are catalogued and condensed into searchable spectral libraries to allow new identifications by spectral matching, is seen as a promising alternative. To that end, an open-source, functionally complete, high-throughput and readily extensible MS/MS spectral searching tool, SpectraST, was developed. A high-quality spectral library was constructed by combining the high-confidence identifications of millions of spectra taken from various data repositories and searched using four sequence search engines. The resulting library consists of over 30,000 spectra for Saccharomyces cerevisiae. Using this library, SpectraST vastly outperforms the sequence search engine SEQUEST in terms of speed and the ability to discriminate good and bad hits. A unique advantage of SpectraST is its full integration into the popular Trans-Proteomic Pipeline suite of software, which facilitates user adoption and provides important functionalities such as peptide and protein probability assignment, quantification, and data visualization. This method of spectral library searching is especially suited for targeted proteomics applications, offering superior performance to traditional sequence searching.
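Part of the speed advantage of library searching comes from comparing each query only against entries with a compatible precursor mass. A minimal sketch of that candidate-selection step keeps the library sorted by precursor m/z and uses binary search. The 1.5 m/z default tolerance and the `LibraryIndex` class are hypothetical illustrations, not SpectraST's actual data structures or settings.

```python
import bisect
from typing import List, Tuple


class LibraryIndex:
    """Hypothetical index: library entries sorted by precursor m/z for binary search."""

    def __init__(self, entries: List[Tuple[float, str]]):
        self.entries = sorted(entries)
        self.mzs = [mz for mz, _ in self.entries]

    def candidates(self, precursor_mz: float, tolerance: float = 1.5) -> List[str]:
        """Return library peptides whose precursor m/z lies within +/- tolerance."""
        lo = bisect.bisect_left(self.mzs, precursor_mz - tolerance)
        hi = bisect.bisect_right(self.mzs, precursor_mz + tolerance)
        return [peptide for _, peptide in self.entries[lo:hi]]


index = LibraryIndex([(700.0, "C"), (500.0, "A"), (501.0, "B")])
print(index.candidates(500.5, tolerance=1.0))
```

Only the returned candidates would then be scored spectrum-to-spectrum. This keeps the cost per query roughly logarithmic in library size plus the number of candidates in the window.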

6.
We demonstrate a new approach to determining amino acid composition from peptides fragmented by tandem mass spectrometry, using both experimental and simulated data. The approach was developed as a search-space filter in a protein identification pipeline, with the aim of improving performance beyond what immonium ion information alone can provide. Three automated methods have been developed and tested. The first is a simple peak traversal in which all intense ion peaks are treated as either b- or y-ions using a wide mass tolerance. The second uses a much narrower tolerance and does not transform ion peaks to the complementary type. The third, the unique-fragments method, allows the b- or y-ion type to be inferred and corroborated by scanning the other ions present in each peptide spectrum. The combination of these methods is shown to provide a high-accuracy set of amino acid predictions on both experimental and simulated data sets. These high-quality predictions, with an accuracy of over 85%, may be used to identify peptide fragments that are hard to identify using other methods. The data simulation algorithm is also shown a posteriori to be a good model of noiseless tandem mass spectrometric peptide data.
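The peak-traversal idea reads amino acids from mass differences between fragment peaks of the same ion series. A minimal sketch takes all pairs of singly charged peaks and reports residues whose monoisotopic mass matches the gap within a tolerance. The 0.02 Da tolerance and the all-pairs strategy are assumptions. The sketch does not model complementary-ion transformations or the unique-fragments method.

```python
from itertools import combinations
from typing import Dict, List

# Monoisotopic residue masses in Da; leucine and isoleucine are isobaric.
RESIDUE_MASSES: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L/I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def infer_residues(peak_mzs: List[float], tolerance: float = 0.02) -> List[str]:
    """Report residues whose mass matches the gap between any two singly charged peaks."""
    found: List[str] = []
    for low, high in combinations(sorted(peak_mzs), 2):
        gap = high - low
        for residue, mass in RESIDUE_MASSES.items():
            if abs(gap - mass) <= tolerance:
                found.append(residue)
    return found


# b1, b2, b3 of "GAS" plus one unrelated noise peak at m/z 100.0.
print(infer_residues([58.02874, 100.0, 129.06585, 216.09788]))
```

A wider tolerance finds more residues but starts to confuse near-isobaric pairs such as Q and K (about 0.036 Da apart). This is the trade-off between the wide- and narrow-tolerance methods described above.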

7.
In a typical shotgun proteomics experiment, a significant number of high‐quality MS/MS spectra remain “unassigned.” The main focus of this work is to improve our understanding of various sources of unassigned high‐quality spectra. To achieve this, we designed an iterative computational approach for more efficient interrogation of MS/MS data. The method involves multiple stages of database searching with different search parameters, spectral library searching, blind searching for modified peptides, and genomic database searching. The method is applied to a large publicly available shotgun proteomic data set.
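The multi-stage strategy can be sketched as a cascade. Each search stage sees only the spectra that earlier stages left unassigned. The stage names and the callable-per-stage interface below are hypothetical. In practice each stage would wrap a real search engine run with its own parameters.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical interface: a stage maps a spectrum id to a peptide, or None if unassigned.
Stage = Tuple[str, Callable[[str], Optional[str]]]


def cascade_search(spectrum_ids: List[str], stages: List[Stage]
                   ) -> Tuple[Dict[str, Tuple[str, str]], List[str]]:
    """Run stages in order, passing only still-unassigned spectra to the next stage."""
    assignments: Dict[str, Tuple[str, str]] = {}
    remaining = list(spectrum_ids)
    for stage_name, search in stages:
        still_unassigned: List[str] = []
        for spectrum_id in remaining:
            peptide = search(spectrum_id)
            if peptide is None:
                still_unassigned.append(spectrum_id)
            else:
                # Record which stage explained the spectrum.
                assignments[spectrum_id] = (stage_name, peptide)
        remaining = still_unassigned
    return assignments, remaining


stages: List[Stage] = [
    ("database", lambda s: "PEPTIDEK" if s == "s1" else None),
    ("library", lambda s: "LIBPEPR" if s == "s2" else None),
]
print(cascade_search(["s1", "s2", "s3"], stages))
```

Recording the stage that assigned each spectrum is what lets this kind of analysis attribute unassigned spectra to specific causes, such as unexpected modifications or unannotated genomic regions.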

8.
As the speed of mass spectrometers, sophistication of sample fractionation, and complexity of experimental designs increase, the volume of tandem mass spectra requiring reliable automated analysis continues to grow. Software tools that quickly, effectively, and robustly determine the peptide associated with each spectrum with high confidence are sorely needed. Currently available tools that postprocess the output of sequence-database search engines use three techniques to distinguish the correct peptide identifications from the incorrect: statistical significance re-estimation, supervised machine learning scoring and prediction, and combining or merging of search engine results. We present a unifying framework that encompasses each of these techniques in a single model-free machine-learning framework that can be trained in an unsupervised manner. The predictor is trained on the fly for each new set of search results without user intervention, making it robust for different instruments, search engines, and search engine parameters. We demonstrate the performance of the technique using mixtures of known proteins and by using shuffled databases to estimate false discovery rates, from data acquired on three different instruments with two different ionization technologies. We show that this approach outperforms machine-learning techniques applied to a single search engine’s output, and demonstrate that combining search engine results provides additional benefit. We show that the performance of the commercial Mascot tool can be bested by the machine-learning combination of two open-source tools X!Tandem and OMSSA, but that the use of all three search engines boosts performance further still. The Peptide identification Arbiter by Machine Learning (PepArML) unsupervised, model-free, combining framework can be easily extended to support an arbitrary number of additional searches, search engines, or specialized peptide–spectrum match metrics for each spectrum data set. 
PepArML is open-source and is available from .
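The evaluation above relies on shuffled (decoy) databases to estimate false discovery rates. A minimal sketch of the common target-decoy estimate follows: at a score threshold, FDR is approximated as decoy hits divided by target hits. This is a generic illustration, not PepArML's machine-learning procedure. The plain decoys/targets estimator, without correction factors, is an assumption.

```python
from typing import List, Optional, Tuple

PSM = Tuple[float, bool]  # (score, is_decoy)


def estimate_fdr(psms: List[PSM], threshold: float) -> float:
    """Approximate FDR among matches scoring >= threshold as decoys / targets (capped at 1)."""
    targets = sum(1 for score, decoy in psms if score >= threshold and not decoy)
    decoys = sum(1 for score, decoy in psms if score >= threshold and decoy)
    if targets == 0:
        return 1.0 if decoys else 0.0
    return min(1.0, decoys / targets)


def threshold_for_fdr(psms: List[PSM], max_fdr: float = 0.01) -> Optional[float]:
    """Lowest observed score threshold whose estimated FDR is within max_fdr."""
    best: Optional[float] = None
    for score in sorted({s for s, _ in psms}, reverse=True):
        if estimate_fdr(psms, score) <= max_fdr:
            best = score
    return best


psms = [(10, False), (9, False), (8, True), (7, False), (6, False), (5, True)]
print(threshold_for_fdr(psms, max_fdr=0.3))
```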

9.
A novel hierarchical MS2/MS3 database search algorithm has been developed to analyze MS2/MS3 phosphopeptide proteomic data. The algorithm is incorporated in an automated database search program, MassMatrix. It matches experimental MS2 spectra against a supplied protein database to determine candidate peptide matches, and then matches the corresponding experimental MS3 spectra against those candidates. The MS2 and MS3 spectra are thus used in concert to arrive at peptide matches with higher overall confidence, rather than combining MS2 and MS3 data searched separately. Receiver operating characteristic analysis showed that hierarchical MS2/MS3 database searches with MassMatrix had better sensitivity and specificity than the two‐stage MS2/MS3 database searches obtained with MassMatrix, MASCOT, and X!Tandem. More true peptide matches at a given false-positive rate were identified with the new algorithm for data collected on both LCQ and LTQ‐FTICR mass spectrometers. The additional MS3 spectral data also improved overall reliability and the number of true positives (TPs), because the TPs of the MS2/MS3 search results had higher scores than those of the MS2-only search.

10.
In tandem mass spectrometry (MS/MS), several fragmentation techniques are available, including collision‐induced dissociation (CID), higher‐energy collisional dissociation (HCD), electron‐capture dissociation (ECD), and electron transfer dissociation (ETD). When pairs of spectra are used for de novo peptide sequencing, the most popular methods are designed for CID (or HCD) spectra paired with ECD (or ETD) spectra because of their complementarity. Less attention has been paid to CID and HCD spectrum pairs. In this study, a new de novo peptide sequencing method is proposed for such pairs. The method includes a CID/HCD spectrum-merging criterion and a parent mass correction step, along with improvements to our previously proposed algorithm for sequencing merged spectra. Three pairs of spectral datasets were used to compare the performance of the proposed method with existing methods designed for single-spectrum (HCD or CID) sequencing. Experimental results showed that using spectrum pairs in the proposed method significantly increased full‐length peptide sequencing accuracy, with the highest accuracy reaching 81.31%.

11.
We present MassSieve, a Java‐based platform for visualization and parsimony analysis of single and comparative LC‐MS/MS database search engine results. The success of mass spectrometric peptide sequence assignment algorithms has led to the need for a tool to merge and evaluate the increasing data set sizes that result from LC‐MS/MS‐based shotgun proteomic experiments. MassSieve supports reports from multiple search engines with differing search characteristics, which can increase peptide sequence coverage and/or identify conflicting or ambiguous spectral assignments.
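Parsimony analysis asks for a small set of proteins that explains every identified peptide. A common heuristic is greedy set cover: repeatedly pick the protein that explains the most not-yet-explained peptides. The sketch below illustrates that heuristic only. Greedy cover is not guaranteed to be minimal, and MassSieve's actual grouping rules are not described in the abstract.

```python
from typing import Dict, List, Set


def parsimonious_proteins(protein_to_peptides: Dict[str, Set[str]]) -> List[str]:
    """Greedy set cover: pick proteins until every observed peptide is explained."""
    unexplained: Set[str] = set().union(*protein_to_peptides.values())
    chosen: List[str] = []
    while unexplained:
        # Iterate in sorted order so ties resolve deterministically (alphabetically).
        best = max(sorted(protein_to_peptides),
                   key=lambda p: len(protein_to_peptides[p] & unexplained))
        gained = protein_to_peptides[best] & unexplained
        if not gained:
            break
        chosen.append(best)
        unexplained -= gained
    return chosen


mapping = {"P1": {"a", "b", "c"}, "P2": {"b"}, "P3": {"c", "d"}}
print(parsimonious_proteins(mapping))
```

P2 is dropped because its only peptide is already explained by P1. This is the kind of redundancy a parsimony view is meant to expose.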

12.
We report an isotope labeling shotgun proteome analysis strategy to validate the spectrum-to-sequence assignments generated by using sequence-database searching for the construction of a more reliable MS/MS spectral library. This strategy is demonstrated in the analysis of the E. coli K12 proteome. In the workflow, E. coli cells were cultured in normal and ¹⁵N-enriched media. The differentially labeled proteins from the cell extracts were subjected to trypsin digestion and two-dimensional liquid chromatography quadrupole time-of-flight tandem mass spectrometry (2D-LC QTOF MS/MS) analysis. The MS/MS spectra of the two samples were individually searched using Mascot against the E. coli proteome database to generate lists of peptide sequence matches. The two data sets were compared by overlaying the spectra of unlabeled and labeled matches of the same peptide sequence for validation. Two cutoff filters, one based on the number of common fragment ions and another one on the similarity of intensity patterns among the common ions, were developed and applied to the overlaid spectral pairs to reject the low quality or incorrectly assigned spectra. By examining 257,907 and 245,156 spectra acquired from the unlabeled and ¹⁵N-labeled samples, respectively, an experimentally validated MS/MS spectral library of tryptic peptides was constructed for E. coli K12 that consisted of 9,302 unique spectra with unique sequence and charge state, representing 7,763 unique peptide sequences. This E. coli spectral library could be readily expanded, and the overall strategy should be applicable to other organisms. Even with this relatively small library, it was shown that more peptides could be identified with higher confidence using the spectral search method than by sequence-database searching.
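The two cutoff filters can be sketched on annotated spectra. The sketch counts fragment ions shared by the unlabeled and ¹⁵N-labeled spectra, then requires a minimum cosine similarity of their intensities over those common ions. The annotated-ion representation and both thresholds (5 ions, 0.7) are hypothetical. The helper applies the ¹⁵N–¹⁴N mass difference of about 0.997035 Da per nitrogen atom, which is how a labeled fragment's m/z would be located before annotation.

```python
import math
from typing import Dict

N15_SHIFT = 0.997035  # Da, mass difference between 15N and 14N


def labeled_mz(mz: float, nitrogen_count: int, charge: int = 1) -> float:
    """Expected m/z of a fully 15N-labeled ion given its unlabeled m/z and nitrogen count."""
    return mz + nitrogen_count * N15_SHIFT / charge


def validate_pair(unlabeled: Dict[str, float], labeled: Dict[str, float],
                  min_common: int = 5, min_similarity: float = 0.7) -> bool:
    """Accept a spectral pair if enough ions are shared and their intensity patterns agree.

    Inputs map fragment-ion labels (e.g. "y4") to intensities; thresholds are hypothetical.
    """
    common = sorted(set(unlabeled) & set(labeled))
    if len(common) < min_common:
        return False
    a = [unlabeled[ion] for ion in common]
    b = [labeled[ion] for ion in common]
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        return False
    return sum(x * y for x, y in zip(a, b)) / norm >= min_similarity
```

Filtering on both criteria rejects pairs that share ions by chance but disagree in relative intensities. It also rejects pairs with consistent intensities over too few ions to be convincing.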

13.
MHC class I (MHC‐I)‐bound ligands play a pivotal role in CD8 T cell immunity and are hence of major interest for understanding and designing immunotherapies. LC‐MS/MS is one of the most commonly used approaches for detecting MHC ligands. Unfortunately, current algorithms are of limited effectiveness in identifying MHC ligands from LC‐MS/MS data, because the search algorithms were originally developed for proteomics approaches that detect tryptic peptides. Consequently, the analysis often yields inflated false discovery rate (FDR) statistics and fewer peptides passing FDR filters. Andreatta et al. describe a new scoring tool (MS‐rescue) for peptides from MHC‐I immunopeptidome datasets. MS‐rescue incorporates known MHC‐I peptide motifs to rescore peptides from ligandome data. The tool is demonstrated here on peptides assigned from LC‐MS/MS data with PEAKS software but can be applied to data from any search algorithm. This new approach increased the number of peptides identified by up to 20–30% and promises to aid the discovery of novel MHC‐I ligands with immunotherapeutic potential.

14.
In prostate cancer (PCa), prognostic (predictive) factors are particularly important given the marked heterogeneity of this disease at the clinical, morphologic, and biomolecular levels. Blood contains a wealth of previously unstudied biomarkers that could reflect the ongoing physiological state of all tissues. Serum prostate-specific antigen (PSA) measurement is a very good biomarker for PCa, but its misclassification rate is relatively high. Blood proteome mass spectra (MS) are a potential tool for detecting disease; however, identifying a single biomarker from the complex output of MS is often difficult. In this paper, we propose a general strategy, based on computational chemistry techniques, that should improve the predictive power of PSA. Our group adapted the square-spiral graph, previously applied to DNA and/or protein sequences, to represent human serum-plasma proteome MS of healthy and PCa patients. In this work, we calculated different classes of connectivity indices (CIs) and created various models based on the spectral moments. The best QPDR model found showed accuracy values ranging from 71.7% to 97.2% and specificity values ranging from 70.4% to 99.2%. This methodology might be useful for several applications in computational chemistry.

15.
For bottom‐up proteomics, a wide variety of database‐searching algorithms are in use for matching peptide sequences to tandem MS spectra. Likewise, numerous strategies are employed to produce a confident list of peptide identifications from the outputs of the different search algorithms. Here we introduce a grid‐search approach for determining optimal database filtering criteria in shotgun proteomics data analyses that is easily adaptable to any search algorithm. Systematic Trial and Error Parameter Selection (STEPS) uses user‐defined parameter ranges to test a wide array of parameter combinations and arrive at an optimal “parameter set” for data filtering, thus maximizing confident identifications. The benefits of this approach in terms of the number of true‐positive identifications are demonstrated using datasets derived from immunoaffinity‐depleted blood serum and a bacterial cell lysate, two common proteomics sample types.
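The STEPS idea of testing every combination of user-defined filter ranges and keeping the one that maximizes confident identifications can be sketched with `itertools.product`. Several details are hypothetical: the two filter criteria (minimum score, maximum absolute precursor error in ppm), the PSM fields, and the use of a decoy-estimated FDR cap to define "confident." The published method's criteria and objective may differ.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass
class PSM:
    score: float      # search-engine score (higher is better)
    ppm_error: float  # precursor mass error in ppm
    is_decoy: bool    # True if matched to a decoy sequence


def passes(psm: PSM, params: Dict[str, float]) -> bool:
    """Hypothetical filter: minimum score and maximum absolute precursor error."""
    return psm.score >= params["min_score"] and abs(psm.ppm_error) <= params["max_ppm"]


def grid_search(psms: List[PSM], grid: Dict[str, Sequence[float]],
                max_fdr: float = 0.01) -> Tuple[Optional[Dict[str, float]], int]:
    """Try every parameter combination; keep the one retaining the most targets within max_fdr."""
    names = sorted(grid)
    best_params: Optional[Dict[str, float]] = None
    best_targets = -1
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        kept = [psm for psm in psms if passes(psm, params)]
        decoys = sum(psm.is_decoy for psm in kept)
        targets = len(kept) - decoys
        # Skip combinations that keep no targets or exceed the decoy-estimated FDR.
        if targets == 0 or decoys / targets > max_fdr:
            continue
        if targets > best_targets:
            best_params, best_targets = params, targets
    return best_params, max(best_targets, 0)


psms = [PSM(10, 1, False), PSM(8, 4, False), PSM(6, 2, False),
        PSM(7, 9, True), PSM(5, 1, True)]
print(grid_search(psms, {"min_score": [5, 6, 8], "max_ppm": [5, 10]}))
```

An exhaustive grid is simple and easy to adapt to any search engine's output. Its cost, however, grows multiplicatively with the number of parameters and values tested.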

16.
MassMatrix is a program that matches tandem mass spectra with theoretical peptide sequences derived from a protein database. The program uses a mass-accuracy-sensitive probabilistic score model to rank peptide matches. The MS/MS search software was evaluated on a high mass accuracy dataset, and its results were compared with those from MASCOT, SEQUEST, X!Tandem, and OMSSA. For the high mass accuracy data, MassMatrix provided better sensitivity than MASCOT, SEQUEST, X!Tandem, and OMSSA at a given specificity, and the percentage of false positives was 2%. More importantly, all manually validated true positives corresponded to a unique peptide/spectrum match. The presence of decoy sequences and additional variable PTMs did not significantly affect the results of the high mass accuracy search. MassMatrix also compares well with MASCOT, SEQUEST, X!Tandem, and OMSSA in search time. MassMatrix was also run on a distributed-memory cluster and achieved search speeds of ~100 000 spectra per hour when searching against a complete human database with eight variable modifications. The algorithm is available for public searches at http://www.massmatrix.net.

17.
Two combinatorial libraries of 1296 compounds each were synthesized from two sets of carboxylic acid building blocks and two diamino acid scaffolds. The libraries were designed to produce low-molecular-weight compounds in soluble form, to be assayed as potential ligands for peptidergic receptors.

18.
Typically, protein sequences in a collision-induced dissociation (CID) tandem MS (MS2) dataset are detected by mapping identified peptide ions back to protein sequences using a protein database search (PDS) engine. Finding a particular peptide sequence of interest in CID MS2 records very often requires manual evaluation of the spectrum, regardless of whether the peptide-associated MS2 scan is identified by the PDS algorithm. We have developed a compact, cross-platform, database-free command-line utility, pepgrep, which helps to find the MS2 fingerprint of a selected peptide sequence by pattern-matching modelled MS2 data using a Peptide-to-MS2 scoring algorithm. pepgrep can incorporate dozens of mass offsets corresponding to a variety of post-translational modifications (PTMs) into the algorithm. Decoy peptide sequences are used together with the tested peptide sequence to reduce false-positive results. The engine can screen an MS2 data file at a high rate in a cluster computing environment. The matched MS2 spectrum can be displayed using a built-in graphical application programming interface (API) or optionally written to file. Using this algorithm, we found additional peptide sequences in the studied CID spectra that PDS identification had missed. We also found pepgrep especially useful for examining CID spectra of small peptide fractions resulting from, for example, affinity purification techniques; the peptide sequences in such samples are less likely to be positively identified by the routine protein-centric algorithms implemented in PDS. The software is freely available at http://bsproteomics.essex.ac.uk:8080/data/download/pepgrep-1.4.tgz.
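A database-free check of whether a spectrum supports a chosen peptide can be sketched in three steps. First, model the peptide's singly charged b- and y-ions. Second, count how many match observed peaks. Third, compare against a decoy sequence. This is a simple matched-ion count, not pepgrep's Peptide-to-MS2 scoring algorithm. The decoy convention (reverse all residues except the C-terminal one) and the 0.02 Da tolerance are assumptions. PTM mass offsets are omitted.

```python
from typing import Dict, List

PROTON = 1.007276   # Da
WATER = 18.010565   # Da, monoisotopic H2O

# Monoisotopic residue masses in Da.
RESIDUE_MASSES: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def fragment_ions(peptide: str) -> List[float]:
    """Singly charged b- and y-ion m/z values for an unmodified peptide."""
    masses = [RESIDUE_MASSES[aa] for aa in peptide]
    ions: List[float] = []
    for i in range(1, len(masses)):
        ions.append(sum(masses[:i]) + PROTON)          # b_i ion
        ions.append(sum(masses[i:]) + WATER + PROTON)  # complementary y ion
    return ions


def match_count(peptide: str, peaks: List[float], tolerance: float = 0.02) -> int:
    """Number of modelled fragment ions that have an observed peak within tolerance."""
    return sum(1 for ion in fragment_ions(peptide)
               if any(abs(ion - peak) <= tolerance for peak in peaks))


def reverse_decoy(peptide: str) -> str:
    """Decoy sequence: reverse all residues but keep the C-terminal one in place."""
    return peptide[-2::-1] + peptide[-1]


observed = fragment_ions("GASK")  # idealised spectrum containing every b/y ion of GASK
print(match_count("GASK", observed), match_count(reverse_decoy("GASK"), observed))
```

The decoy "SAGK" shares its composition with "GASK", so it still matches the b3 and y1 ions. A target scoring well above its decoy is what makes a match credible.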

19.
Ion suppression effects during electrospray-ionisation mass spectrometry (ESI-MS) caused by different serum sample preparation procedures were investigated. This topic is important for systematic toxicological analysis, for which LC-ESI-MS has been developed with transport-region collision-induced dissociation (ESI-CID) and mass spectral library searching. With continuous postcolumn infusion of two test compounds, codeine and glafenine, the ion suppression effects of extracted biological matrix were investigated after a standard liquid-liquid extraction, a mixed-mode solid-phase extraction (SPE) method, a protein precipitation method, and a combination of precipitation with polymer-based mixed-mode SPE. Extracted ion chromatograms of codeine ([M+H]⁺, m/z 300) and glafenine ([M-H]⁻, m/z 371) were used to monitor ion suppression. Severe ion suppression of codeine in positive ionisation mode and of glafenine in negative ionisation mode was detected in the LC front peak after serum clean-up with SPE (acid/neutral fraction), with protein precipitation, and with protein precipitation combined with SPE. Less ion suppression of codeine in positive mode was found with liquid-liquid extraction of serum samples. No ion suppression was detected in either ionisation mode with the second fraction of the mixed-mode SPE (using RP-C8 and cation-exchange phases). All suppression effects were caused by polar, unretained matrix components that remained after extraction and/or protein precipitation. However, no specific ion suppression was seen anywhere in the gradient after the polar LC front had eluted.
It was demonstrated that ion suppression is not generally present at every retention time when reversed-phase HPLC with rather long gradient programs is used, but it may play an important role in high-throughput LC-MS analysis when the analyte is not separated from the LC front, or in flow injection analysis without chromatographic separation.

20.
The identification and characterization of peptides from MS/MS data is a critical aspect of proteomics. It has been the subject of extensive research in bioinformatics, resulting in a fair number of identification software tools. Most often, only one program with a single, fixed set of parameters is used to identify proteins. As a result, a significant proportion of the experimental spectra are not matched to peptide sequences in the screened database because of inappropriate parameters or scoring schemes. The Swiss protein identification toolbox (swissPIT) project provides the scientific community with an expandable multitool platform for automated in‐depth analysis of MS data that can also handle data from high‐throughput experiments. swissPIT addresses several problems: (A) the lack of standard input and output formats, (B) the creation of analysis workflows, (C) unified result visualization, and (D) a simple user interface. Currently, swissPIT supports four different programs implementing two different search strategies for identifying MS/MS spectra. To handle the computationally intensive needs of each program, swissPIT uses the distributed resources of a Swiss‐wide computer Grid (http://www.swing‐grid.ch).
