Similar literature
1.
A notable inefficiency of shotgun proteomics experiments is the repeated rediscovery of the same identifiable peptides by sequence database searching methods, which are often time-consuming and error-prone. A more precise and efficient method, in which previously observed and identified peptide MS/MS spectra are catalogued and condensed into searchable spectral libraries to allow new identifications by spectral matching, is seen as a promising alternative. To that end, an open-source, functionally complete, high-throughput and readily extensible MS/MS spectral searching tool, SpectraST, was developed. A high-quality spectral library was constructed by combining the high-confidence identifications of millions of spectra taken from various data repositories and searched using four sequence search engines. The resulting library consists of over 30,000 spectra for Saccharomyces cerevisiae. Using this library, SpectraST vastly outperforms the sequence search engine SEQUEST in terms of speed and the ability to discriminate good and bad hits. A unique advantage of SpectraST is its full integration into the popular Trans-Proteomic Pipeline suite of software, which facilitates user adoption and provides important functionalities such as peptide and protein probability assignment, quantification, and data visualization. This method of spectral library searching is especially suited for targeted proteomics applications, offering superior performance to traditional sequence searching.

2.
Spectral libraries have emerged as a viable alternative to protein sequence databases for peptide identification. These libraries contain previously detected peptide sequences and their corresponding tandem mass spectra (MS/MS). Search engines can then identify peptides by comparing experimental MS/MS scans to those in the library. Many of these algorithms employ the dot product score for measuring the quality of a spectrum-spectrum match (SSM). This scoring system does not offer a clear statistical interpretation and ignores fragment ion m/z discrepancies in the scoring. We developed a new spectral library search engine, Pepitome, which employs statistical systems for scoring SSMs. Pepitome outperformed the leading library search tool, SpectraST, when analyzing data sets acquired on three different mass spectrometry platforms. We characterized the reliability of spectral library searches by confirming shotgun proteomics identifications through RNA-Seq data. Applying spectral library and database searches on the same sample revealed their complementary nature. Pepitome identifications enabled the automation of quality analysis and quality control (QA/QC) for shotgun proteomics data acquisition pipelines.
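To make the dot product score mentioned in this abstract concrete, here is a minimal Python sketch of a normalized dot product between two binned spectra. The function names, the 1.0 m/z bin width, and the peak-list layout are assumptions for illustration; this is not the scoring code of SpectraST or Pepitome.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins (bin_width is an assumed value)."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return binned


def normalized_dot_product(query_peaks, library_peaks, bin_width=1.0):
    """Cosine similarity of two binned spectra; returns 0.0 if either is empty."""
    query = bin_spectrum(query_peaks, bin_width)
    library = bin_spectrum(library_peaks, bin_width)
    numerator = sum(value * library[b] for b, value in query.items() if b in library)
    norm = math.sqrt(sum(v * v for v in query.values())) * math.sqrt(
        sum(v * v for v in library.values())
    )
    return numerator / norm if norm > 0 else 0.0


# Two peaks 0.8 m/z apart fall into the same bin, so the score cannot tell them apart.
score = normalized_dot_product([(500.1, 10.0)], [(500.9, 10.0)])
```

The last line illustrates the limitation the abstract points out: once peaks share a bin, the fragment m/z discrepancy no longer affects the score.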

3.
Searching spectral libraries in MS/MS is an important new approach to improving the quality of peptide and protein identification. The idea relies on the observation that ion intensities in an MS/MS spectrum of a given peptide are generally reproducible across experiments, and thus, matching between spectra from an experiment and the spectra of previously identified peptides stored in a spectral library can lead to better peptide identification compared to the traditional database search. However, the use of libraries is greatly limited by their coverage of peptide sequences: even for well-studied organisms a large fraction of peptides have not been previously identified. To address this issue, we propose to expand spectral libraries by predicting the MS/MS spectra of peptides based on the spectra of peptides with similar sequences. We first demonstrate that the intensity patterns of dominant fragment ions between similar peptides tend to be similar. In accordance with this observation, we develop a neighbor-based approach that first selects peptides that are likely to have spectra similar to the target peptide and then combines their spectra using a weighted K-nearest neighbor method to accurately predict fragment ion intensities corresponding to the target peptide. This approach has the potential to predict spectra for every peptide in the proteome. When rigorous quality criteria are applied, we estimate that the method increases the coverage of spectral libraries available from the National Institute of Standards and Technology by 20-60%, although the values vary with peptide length and charge state. We find that the overall best search performance is achieved when spectral libraries are supplemented by the high quality predicted spectra.
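The weighted K-nearest neighbor step described here can be sketched roughly as follows. How neighbors are selected and how similarity is computed are not given in the abstract, so the `(similarity, spectrum)` input layout, the ion labels, and the default `k` are hypothetical.

```python
def predict_intensities(neighbors, k=3):
    """Predict fragment ion intensities for a target peptide.

    neighbors: list of (similarity, {ion_label: intensity}) pairs, an assumed layout.
    Returns the similarity-weighted mean intensity per ion over the k most
    similar neighbors; ions missing from a neighbor count as intensity 0.
    """
    top = sorted(neighbors, key=lambda pair: pair[0], reverse=True)[:k]
    total_weight = sum(similarity for similarity, _ in top)
    if total_weight <= 0:
        return {}
    ions = set().union(*(spectrum.keys() for _, spectrum in top))
    return {
        ion: sum(sim * spectrum.get(ion, 0.0) for sim, spectrum in top) / total_weight
        for ion in ions
    }


# Hypothetical neighbors: the closer peptide dominates the predicted intensities.
predicted = predict_intensities(
    [(0.75, {"y3": 1.0, "b2": 0.4}), (0.25, {"y3": 0.6})], k=2
)
```

Treating a missing ion as zero intensity is a design choice of this sketch; a real predictor might instead restrict the average to neighbors in which the ion is possible.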

4.
Proteomics 2009, 9(6)
In this issue of Proteomics you will find the following highlighted articles:
Keeping up with the lung cancers
You're in good company if you smoke and develop lung cancer. The World Health Organization estimates 1.2 million new cases occur every year. On the other hand, 1.1 million people die from it every year: bummer. One reason for the high death rate is the frequent development of resistance to several of the most commonly used drugs simultaneously. Multiple drug resistance (MDR) is the major cause of chemotherapeutic failure. Keenan et al. explored the proteomic changes associated with MDR failure (adriamycin) in a cultured lung cancer cell line (DLKP) and several subtypes. Adriamycin normally kills by blocking replication at DNA gyrase and by generating reactive oxygen species that lead to apoptosis. Proteomes were examined by 2-D DIGE. Approximately 80 proteins displayed quantitative shifts, 32 showed a correlation with resistance, 24 being linked positively to resistance, 6 correlated negatively. Some known targets did not appear on the 2-D maps consistently. Keenan, J. et al., Proteomics 2009, 9, 1556-1566.
An image of spit
Spitting images have been around for a long time. The phrase is possibly humankind's first recognition of genetically transmitted traits. Proteomic analysis of saliva has only developed recently. The question raised by Walz et al. here is "What is the possible contribution of saliva to the high level of infection by Helicobacter pylori?" H. pylori is known to have extracellular adhesins that bind to a number of salivary proteins. A convenient way to detect targets of adhesins was found to be incubating 1-D and 2-D PAGE Western blots with an overlay of whole H. pylori. Targets detected included mucins, sialic acid-containing glycoproteins, and fucose-containing blood group antigens, and each pair of salivary glands had a different binding pattern. Walz, A. et al., Proteomics 2009, 9, 1582-1592.
Mix'em up, folks
Conventional analytical chemical identifications frequently yield a characteristic spectrum of peaks for particular compounds on particular instruments. Just look up the observed spectrum in the "library" of standard spectra for identification. It is not so simple for proteins. Because of the size of a potential proteomic peptide library and the diversity of instruments used, most often the observed spectrum is compared to a theoretical spectrum for a peptide of interest. Ahrné et al. combine the two for improved performance. First they run the spectrum of interest through an exhaustive proteome search program (Phenyx), then through a sensitive library search (SpectraST) of the highest scoring sequences in the previous Phenyx search plus a number of controls. In the first (relatively simple) test, Phenyx matched 362 spectra, while SpectraST made 639 matches at the same error detection level. In a more complex test, Phenyx generated >1000 hits and SpectraST 1304 hits. Ahrné, E. et al., Proteomics 2009, 9, 1731-1736.

6.
The unambiguous assignment of tandem mass spectra (MS/MS) to peptide sequences remains a key unsolved problem in proteomics. Spectral library search strategies have emerged as a promising alternative for peptide identification, in which MS/MS spectra are directly compared against a reference library of confidently assigned spectra. Two problems relate to library size. First, reference spectral libraries are limited to rediscovery of previously identified peptides and are not applicable to new peptides, because of their incomplete coverage of the human proteome. Second, problems arise when searching a spectral library the size of the entire human proteome. We observed that traditional dot product scoring methods do not scale well with spectral library size, showing reduction in sensitivity when library size is increased. We show that this problem can be addressed by optimizing scoring metrics for spectrum-to-spectrum searches with large spectral libraries. MS/MS spectra for the 1.3 million predicted tryptic peptides in the human proteome are simulated using a kinetic fragmentation model (MassAnalyzer version 2.1) to create a proteome-wide simulated spectral library. Searches of the simulated library increase MS/MS assignments by 24% compared with Mascot, when using probabilistic and rank-based scoring methods. The proteome-wide coverage of the simulated library leads to an 11% increase in unique peptide assignments, compared with parallel searches of a reference spectral library. Further improvement is attained when reference spectra and simulated spectra are combined into a hybrid spectral library, yielding 52% increased MS/MS assignments compared with Mascot searches. Our study demonstrates the advantages of using probabilistic and rank-based scores to improve performance of spectrum-to-spectrum search strategies.

7.
MS2 library spectra are rich in reproducible information about peptide fragmentation patterns compared to theoretical spectra modeled by a sequence search tool. So far, spectrum library searches are mostly applied to detect peptides as they are present in the library. However, they also allow finding modified variants of the library peptides if the search is done with a large precursor mass window and an adapted Spectrum-Spectrum Match (SSM) scoring algorithm. We perform a thorough evaluation on the use of library spectra as opposed to theoretical peptide spectra for the identification of PTMs, analyzing spectra of a well-annotated modification-rich test data set compiled from public data repositories. These initial studies motivate the development of our modification tolerant spectrum library search tool QuickMod, designed to identify modified variants of the peptides listed in the spectrum library without any prior input from the user estimating the modifications present in the sample. We built the search algorithm of QuickMod after carefully testing different SSM similarity scores. The final spectrum scoring scheme uses a support vector machine (SVM) on a selection of scoring features to classify correct and incorrect SSMs. After identification of a list of modified peptides at a given False Discovery Rate (FDR), the modifications need to be positioned on the peptide sequence. We present a rapid modification site assignment algorithm and evaluate its positioning accuracy. Finally, we demonstrate that QuickMod performs favorably in terms of speed and identification rate when compared to other software solutions for PTM analysis.
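A search with a large precursor mass window leaves a mass difference between the query precursor and the matched library peptide, and that difference is what hints at a modification. The Python sketch below shows only this bookkeeping step; it is not QuickMod's algorithm. The function names, the tolerance, and the three-entry lookup table are assumptions, while the listed values are standard monoisotopic mass shifts.

```python
PROTON_MASS = 1.007276  # Da

# Small illustrative lookup of monoisotopic mass shifts (Da); a real tool would use a full catalogue.
KNOWN_SHIFTS = {"Oxidation": 15.994915, "Acetyl": 42.010565, "Phospho": 79.966331}


def neutral_mass(mz, charge):
    """Neutral precursor mass from the observed m/z and charge state."""
    return (mz - PROTON_MASS) * charge


def explain_mass_shift(query_mz, query_charge, library_mass, tolerance=0.01):
    """Return (delta, candidate modification names) for a query/library pair.

    delta is the query neutral mass minus the unmodified library peptide mass;
    candidates are the known shifts within `tolerance` Da of delta.
    """
    delta = neutral_mass(query_mz, query_charge) - library_mass
    candidates = [
        name for name, shift in KNOWN_SHIFTS.items() if abs(delta - shift) <= tolerance
    ]
    return delta, candidates


# Hypothetical doubly charged query that is one phosphate group heavier than the library peptide.
example_mz = (1000.0 + 79.966331) / 2 + PROTON_MASS
delta, names = explain_mass_shift(example_mz, 2, 1000.0)
```

An empty candidate list simply means the shift is not in the lookup table; the delta itself is still reported, in keeping with a search that needs no prior list of expected modifications.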

8.
9.
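With the rapid development of mass spectrometry-based proteomics, protein mass spectrometry data are growing exponentially. Finding identification methods that are fast, accurate, and reproducible is an important task in this field. The spectral library search strategy directly compares experimental spectra with real spectra in a spectral library and makes full use of peak abundances, unconventional fragmentation patterns, and other spectral features, which makes searching faster and more accurate; it has become one of the mainstream identification methods in proteomics. This article introduces spectral library-based strategies for identifying proteomic mass spectrometry data, describes in depth two key steps (spectral library construction and spectral library searching), and discusses the progress and challenges of the spectral library strategy.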

10.
Staphylococcus aureus is an opportunistic human pathogen, which can cause life-threatening disease. Proteome analyses of the bacterium can provide new insights into its pathophysiology and important facets of metabolic adaptation and, thus, aid the recognition of targets for intervention. However, the value of such proteome studies increases with their comprehensiveness. We present an MS-driven, proteome-wide characterization of the strain S. aureus HG001. Combining 144 high precision proteomic data sets, we identified 19 109 peptides from 2088 distinct S. aureus HG001 proteins, which account for 72% of the predicted ORFs. Peptides were further characterized concerning pI, GRAVY, and detectability scores in order to understand the low peptide coverage of 8.7% (19 109 out of 220 245 theoretical peptides). The high quality peptide-centric spectra have been organized into a comprehensive peptide fragmentation library (SpectraST) and used for identification of S. aureus-typic peptides in highly complex host-pathogen interaction experiments, which significantly improved the number of identified S. aureus proteins compared to a MASCOT search. This effort now allows the elucidation of crucial pathophysiological questions in S. aureus-specific host-pathogen interaction studies through comprehensive proteome analysis. The S. aureus-specific spectra resource developed here also represents an important spectral repository for SRM or for data-independent acquisition MS approaches. All MS data have been deposited in the ProteomeXchange with identifier PXD000702 ( http://proteomecentral.proteomexchange.org/dataset/PXD000702 ).

11.
Ahrné E, Ohta Y, Nikitin F, Scherl A, Lisacek F, Müller M. Proteomics 2011, 11(20): 4085-4095
The relevance of libraries of annotated MS/MS spectra is growing with the amount of proteomic data generated in high-throughput experiments. These reference libraries provide a fast and accurate way to identify newly acquired MS/MS spectra. In the context of multiple hypothesis testing, the number of false-positive identifications expected in the final result list is controlled by calculating the false discovery rate (FDR). In a classical sequence search where experimental MS/MS spectra are compared with the theoretical peptide spectra calculated from a sequence database, the FDR is estimated by searching randomized or decoy sequence databases. Despite ongoing discussion on how exactly the FDR has to be calculated, this method is widely accepted in the proteomic community. Recently, similar approaches to control the FDR of spectrum library searches were discussed. We present in this paper a detailed analysis of the similarity between spectra of distinct peptides to set the basis of our own solution for decoy library creation (DeLiberator). It differs from the previously published results in some key points, mainly in implementing new methods that prevent decoy spectra from being too similar to the original library spectra while keeping important features of real MS/MS spectra. Using different proteomic data sets and library creation methods, we evaluate our approach and compare it with alternative methods.
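The target/decoy idea summarized in this abstract can be illustrated with a short Python sketch. As the abstract notes, the exact FDR formula is debated; the decoys-over-targets ratio used here is one common convention, and the function names and input layout are assumptions rather than anything taken from DeLiberator.

```python
def fdr_at_threshold(matches, threshold):
    """Estimate the FDR among matches scoring at or above `threshold`.

    matches: list of (score, is_decoy) pairs from a combined target/decoy search.
    Uses the convention FDR = decoy hits / target hits (an assumption).
    """
    targets = sum(1 for score, is_decoy in matches if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in matches if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0 if decoys == 0 else float("inf")
    return decoys / targets


def lowest_threshold_for_fdr(matches, max_fdr=0.01):
    """Smallest observed score cutoff whose estimated FDR is at most max_fdr, or None."""
    for threshold in sorted({score for score, _ in matches}):
        if fdr_at_threshold(matches, threshold) <= max_fdr:
            return threshold
    return None


# Hypothetical search results: (score, is_decoy)
results = [(0.9, False), (0.8, False), (0.7, True), (0.6, False), (0.5, True)]
cutoff = lowest_threshold_for_fdr(results, max_fdr=0.01)
```

For a spectral library search the decoys would be decoy spectra rather than decoy sequences, which is why the quality of the decoy library matters: decoys that resemble real library spectra too closely, or not closely enough, distort the decoy count this estimate depends on.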

12.
Peptide identification by tandem mass spectrometry is the dominant proteomics workflow for protein characterization in complex samples. The peptide fragmentation spectra generated by these workflows exhibit characteristic fragmentation patterns that can be used to identify the peptide. In other fields, where the compounds of interest do not have the convenient linear structure of peptides, fragmentation spectra are identified by comparing new spectra with libraries of identified spectra, an approach called spectral matching. In contrast to sequence-based tandem mass spectrometry search engines used for peptides, spectral matching can make use of the intensities of fragment peaks in library spectra to assess the quality of a match. We evaluate a hidden Markov model approach (HMMatch) to spectral matching, in which many examples of a peptide's fragmentation spectrum are summarized in a generative probabilistic model that captures the consensus and variation of each peak's intensity. We demonstrate that HMMatch has good specificity and superior sensitivity, compared to sequence database search engines such as X!Tandem. HMMatch achieves good results from relatively few training spectra, is fast to train, and can evaluate many spectra per second. A statistical significance model permits HMMatch scores to be compared with each other, and with other peptide identification tools, on a unified scale. HMMatch shows a similar degree of concordance with X!Tandem, Mascot, and NIST's MS Search, as they do with each other, suggesting that each tool can assign peptides to spectra that the others miss. Finally, we show that it is possible to extrapolate HMMatch models beyond a single peptide's training spectra to the spectra of related peptides, expanding the application of spectral matching techniques beyond the set of peptides previously observed.

13.
The effectiveness of database search algorithms, such as Mascot, Sequest and ProteinPilot, is limited by the quality of the input spectra: spurious peaks in MS/MS spectra can jeopardize the correct identification of peptides or reduce their score significantly. Consequently, an efficient preprocessing of MS/MS spectra can increase the sensitivity of peptide identification at reduced file sizes and run time without compromising its specificity. We investigate the performance of 25 MS/MS preprocessing methods on various data sets and make software for improved preprocessing of mgf/dta-files freely available from http://hci.iwr.uni‐heidelberg.de/mip/proteomics or http://www.childrenshospital.org/research/steenlab .
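As an illustration of what an MS/MS preprocessing step can look like, the Python sketch below keeps only the most intense peaks in each m/z window. This is a widely used kind of filter, but whether it is among the 25 methods evaluated is not stated in the abstract; the function name, window size, and `top_n` default are assumptions.

```python
from collections import defaultdict


def keep_top_peaks(peaks, window=100.0, top_n=2):
    """Keep the top_n most intense peaks within each m/z window.

    peaks: list of (mz, intensity) pairs. Returns the kept peaks sorted by m/z.
    """
    by_window = defaultdict(list)
    for mz, intensity in peaks:
        by_window[int(mz // window)].append((mz, intensity))
    kept = []
    for window_peaks in by_window.values():
        # Sort by intensity, highest first, and keep the strongest top_n peaks.
        kept.extend(sorted(window_peaks, key=lambda peak: peak[1], reverse=True)[:top_n])
    return sorted(kept)


# Hypothetical spectrum: the weak peak at m/z 180 is dropped from the 100-200 window.
filtered = keep_top_peaks([(101.0, 5.0), (150.0, 50.0), (180.0, 1.0), (250.0, 3.0)])
```

Filtering per window rather than over the whole spectrum keeps informative low-intensity fragments in sparse regions, while still shrinking the peak list and hence file size and search time.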

14.
Only a small fraction of spectra acquired in LC-MS/MS runs matches peptides from target proteins upon database searches. The remaining spectra, operationally termed background spectra, originate from a variety of poorly controlled sources and affect the throughput and confidence of database searches. Here, we report an algorithm and its software implementation that rapidly removes background spectra, regardless of their precise origin. The method estimates the dissimilarity distance between screened MS/MS spectra and unannotated spectra from a partially redundant background library compiled from several control and blank runs. Filtering MS/MS queries enhanced the protein identification capacity when searches lacked spectrum to sequence matching specificity. In sequence-similarity searches it reduced, by 30-fold on average, the number of orphan hits, which were not explicitly related to background protein contaminants and required manual validation. Removing high quality background MS/MS spectra, while preserving in the data set the genuine spectra from target proteins, decreased the false positive rate of stringent database searches and improved the identification of low-abundance proteins.

15.
Identification of proteins by MS plays an important role in proteomics. A crucial step concerns the identification of peptides from MS/MS spectra. The X!Tandem Project ( http://www.thegpm.org/tandem ) supplies an open-source search engine for this purpose. In this study, we present an open-source Java library called XTandem Parser that parses X!Tandem XML result files into an easily accessible and fully functional object model ( http://xtandem‐parser.googlecode.com ). In addition, a graphical user interface is provided that functions as a usage example and an end-user visualization tool.

16.
Database-searching programs generally identify only a fraction of the spectra acquired in a standard LC/MS/MS study of digested proteins. Subtle variations in database-searching algorithms for assigning peptides to MS/MS spectra have been known to provide different identification results. To leverage this variation, a probabilistic framework is developed for combining the results of multiple search engines. The scores for each search engine are first independently converted into peptide probabilities. These probabilities can then be readily combined across search engines using Bayesian rules and the expectation maximization learning algorithm. A significant gain in the number of peptides identified with high confidence with each additional search engine is demonstrated using several data sets of increasing complexity, from a control protein mixture to a human plasma sample, searched using the SEQUEST, Mascot, and X!Tandem database-searching programs. The increased rate of peptide assignments also translates into a substantially larger number of protein identifications in LC/MS/MS studies compared to a typical analysis using a single database-search tool.
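To give a feel for how per-engine peptide probabilities might be merged with Bayes' rule, here is a deliberately simplified Python sketch. It assumes the engines are conditionally independent and that each probability was computed with the same prior; both are strong assumptions of this sketch, and the expectation maximization step mentioned in the abstract is not reproduced. The function name and the default prior are hypothetical.

```python
def combine_probabilities(probabilities, prior=0.5):
    """Combine per-engine probabilities that a peptide assignment is correct.

    Assumes conditional independence between engines and a shared prior in (0, 1).
    Each probability is turned into a likelihood ratio (posterior odds divided by
    prior odds), the ratios are multiplied, and the result is mapped back to a probability.
    """
    prior_odds = prior / (1.0 - prior)
    odds = prior_odds
    for p in probabilities:
        if p >= 1.0:
            return 1.0  # an engine that is certain makes the combined result certain
        odds *= (p / (1.0 - p)) / prior_odds
    return odds / (1.0 + odds)


# Two engines that each give 0.8 reinforce one another under these assumptions.
combined = combine_probabilities([0.8, 0.8])
```

With the default prior, two agreeing engines at 0.8 yield odds of 16 to 1, so agreement raises confidence above either individual probability. Real search engines are correlated, which is one reason a learned model is preferable to this naive product.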

17.
LC-MS/MS analysis on a linear ion trap LTQ mass spectrometer, combined with data processing, stringent, and sequence-similarity database searching tools, was employed in a layered manner to identify proteins in organisms with unsequenced genomes. Highly specific stringent searches (MASCOT) were applied as a first layer screen to identify either known (i.e. present in a database) proteins, or unknown proteins sharing identical peptides with related database sequences. Once the confidently matched spectra were removed, the remainder was filtered against a nonannotated library of background spectra, which cleaned the dataset of spectra of common protein and chemical contaminants. The rectified spectral dataset was further subjected to rapid batch de novo interpretation by PepNovo software, followed by the MS BLAST sequence-similarity search that used multiple redundant and partially accurate candidate peptide sequences. Importantly, a single dataset was acquired at uncompromised sensitivity with no need for manual selection of MS/MS spectra for subsequent de novo interpretation. This approach enabled a completely automated identification of novel proteins that were otherwise missed by conventional database searches.

18.
We demonstrate an approach for global quantitative analysis of protein mixtures using differential stable isotopic labeling of the enzyme-digested peptides combined with microbore liquid chromatography (LC) matrix-assisted laser desorption ionization (MALDI) mass spectrometry (MS). Microbore LC provides higher sample loading, compared to capillary LC, which facilitates the quantification of low abundance proteins in protein mixtures. In this work, microbore LC is combined with MALDI MS via a heated droplet interface. The compatibilities of two global peptide labeling methods (i.e., esterification of carboxylic groups and dimethylation of amine groups of peptides) with this LC-MALDI technique are evaluated. Using a quadrupole-time-of-flight mass spectrometer, MALDI spectra of the peptides in individual sample spots are obtained to determine the abundance ratio among pairs of differential isotopically labeled peptides. MS/MS spectra are subsequently obtained from the peptide pairs showing significant abundance differences to determine the sequences of selected peptides for protein identification. The peptide sequences determined from MS/MS database search are confirmed by using the overlaid fragment ion spectra generated from a pair of differentially labeled peptides. The effectiveness of this microbore LC-MALDI approach is demonstrated in the quantification and identification of peptides from a mixture of standard proteins as well as E. coli whole cell extract of known relative concentrations. It is shown that this approach provides a facile and economical means of comparing relative protein abundances from two proteome samples.

19.
Proteome identification using peptide-centric proteomics techniques is a routinely used analysis technique. One of the most powerful and popular methods for the identification of peptides from MS/MS spectra is protein database matching using search engines. Significance thresholding through false discovery rate (FDR) estimation by target/decoy searches is used to ensure the retention of predominantly confident assignments of MS/MS spectra to peptides. However, shortcomings have become apparent when such decoy searches are used to estimate the FDR. To study these shortcomings, we here introduce a novel kind of decoy database that contains isobaric mutated versions of the peptides that were identified in the original search. Because of the supervised way in which the entrapment sequences are generated, we call this a directed decoy database. Since the peptides found in our directed decoy database are thus specifically designed to look quite similar to the forward identifications, the limitations of the existing search algorithms in making correct calls in such strongly confusing situations can be analyzed. Interestingly, for the vast majority of confident peptide identifications, a directed decoy peptide-to-spectrum match can be found that has a better or equal match score than the forward match score, highlighting an important issue in the interpretation of peptide identifications in present-day high-throughput proteomics.
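One simple way to obtain an isobaric variant of a peptide is to swap two adjacent, different residues: the amino acid composition, and therefore the precursor mass, is unchanged, while the fragment ion series changes around the swap. The Python sketch below generates such variants. The abstract does not describe how the paper's directed decoys are actually mutated, so this scheme and the function name are assumptions for illustration only.

```python
def isobaric_swaps(peptide):
    """Return all distinct sequences made by swapping one pair of adjacent, different residues.

    Every variant has the same amino acid composition as the input and is
    therefore isobaric with it.
    """
    variants = set()
    for i in range(len(peptide) - 1):
        if peptide[i] != peptide[i + 1]:
            swapped = peptide[:i] + peptide[i + 1] + peptide[i] + peptide[i + 2:]
            variants.add(swapped)
    return sorted(variants)


# Hypothetical identified peptide and its single-swap isobaric neighbours.
decoys = isobaric_swaps("PEK")
```

Because such a variant shares the precursor mass and most fragment masses with the original identification, it is exactly the kind of near miss that tests whether a search engine's score can separate the true sequence from a close competitor.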

20.
Searching tandem mass spectra against a protein database has been a mainstream method for peptide identification. The goal of improving peptide identification results by ranking true Peptide-Spectrum Matches (PSMs) over their false counterparts has led to the development of various reranking algorithms. In peptide reranking, discriminative information is essential to distinguish true PSMs from false PSMs. Generally, most peptide reranking methods obtain discriminative information directly from database search scores or by training machine learning models. Information in the protein database and MS1 spectra (i.e., single stage MS spectra) is ignored. In this paper, we propose to use information in the protein database and MS1 spectra to rerank peptide identification results. To quantitatively analyze their effects on peptide reranking results, three peptide reranking methods are proposed: PPMRanker, PPIRanker, and MIRanker. PPMRanker only uses Protein-Peptide Map (PPM) information from the protein database, PPIRanker only uses Precursor Peak Intensity (PPI) information, and MIRanker employs both PPM information and PPI information. According to our experiments on a standard protein mixture data set, a human data set and a mouse data set, PPMRanker and MIRanker achieve better peptide reranking results than PeptideProphet, PeptideProphet+NSP (number of sibling peptides) and a score regularization method SRPI. The source codes of PPMRanker, PPIRanker, and MIRanker, and all supplementary documents are available at our website: http://bioinformatics.ust.hk/pepreranking/. Alternatively, these documents can also be downloaded from: http://sourceforge.net/projects/pepreranking/.
