Similar articles
1.
Searching spectral libraries in MS/MS is an important new approach to improving the quality of peptide and protein identification. The idea relies on the observation that ion intensities in an MS/MS spectrum of a given peptide are generally reproducible across experiments, and thus, matching between spectra from an experiment and the spectra of previously identified peptides stored in a spectral library can lead to better peptide identification compared to the traditional database search. However, the use of libraries is greatly limited by their coverage of peptide sequences: even for well‐studied organisms a large fraction of peptides have not been previously identified. To address this issue, we propose to expand spectral libraries by predicting the MS/MS spectra of peptides based on the spectra of peptides with similar sequences. We first demonstrate that the intensity patterns of dominant fragment ions between similar peptides tend to be similar. In accordance with this observation, we develop a neighbor‐based approach that first selects peptides that are likely to have spectra similar to the target peptide and then combines their spectra using a weighted K‐nearest neighbor method to accurately predict fragment ion intensities corresponding to the target peptide. This approach has the potential to predict spectra for every peptide in the proteome. When rigorous quality criteria are applied, we estimate that the method increases the coverage of spectral libraries available from the National Institute of Standards and Technology by 20–60%, although the values vary with peptide length and charge state. We find that the overall best search performance is achieved when spectral libraries are supplemented by the high quality predicted spectra.
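The neighbor-based prediction described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `sequence_similarity` metric (fraction of identical positions), the `Spectrum` representation (ion label mapped to relative intensity) and the default `k` are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: fragment ion label (e.g. "b2", "y5") -> relative intensity.
Spectrum = Dict[str, float]


def sequence_similarity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length peptides (assumed metric)."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def predict_spectrum(target: str, library: List[Tuple[str, Spectrum]], k: int = 3) -> Spectrum:
    """Predict fragment intensities as a similarity-weighted mean over the k nearest neighbours."""
    scored = [(sequence_similarity(target, seq), spec) for seq, spec in library]
    # Keep only neighbours with non-zero similarity, most similar first.
    neighbours = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)[:k]
    total_weight = sum(weight for weight, _ in neighbours)
    if total_weight == 0:
        return {}
    predicted: Spectrum = {}
    for weight, spec in neighbours:
        for ion, intensity in spec.items():
            predicted[ion] = predicted.get(ion, 0.0) + weight * intensity
    return {ion: value / total_weight for ion, value in predicted.items()}
```

An ion missing from a neighbour's spectrum contributes zero intensity in this sketch, so ions seen in only a few neighbours are down-weighted rather than dropped.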

2.
Protein identification has been greatly facilitated by database searches against protein sequences derived from product ion spectra of peptides. This approach is primarily based on the use of fragment ion mass information contained in an MS/MS spectrum. Unambiguous protein identification from a spectrum with low sequence coverage or poor spectral quality can be a major challenge. We present a two-dimensional (2D) mass spectrometric method in which the numbers of nitrogen atoms in the molecular ion and the fragment ions are used to provide additional discriminating power for much improved protein identification and de novo peptide sequencing. The nitrogen number is determined by analyzing the mass difference of corresponding peak pairs in overlaid spectra of ¹⁵N-labeled and unlabeled peptides. These peptides are produced by enzymatic or chemical cleavage of proteins from cells grown in ¹⁵N-enriched and normal media, respectively. It is demonstrated that, using 2D information, i.e., m/z and its associated nitrogen number, this method can not only confirm protein identification results generated by MS/MS database searching, but also identify peptides that cannot be identified by database searching alone. Examples are presented of analyzing Escherichia coli K12 extracts that yielded relatively poor MS/MS spectra, presumably from the digests of low abundance proteins, which can still give positive protein identification using this method. Additionally, this 2D MS method can facilitate spectral interpretation for de novo peptide sequencing and identification of posttranslational or other chemical modifications. We envision that this method should be particularly useful for proteome expression profiling of organelles or cells that can be grown in ¹⁵N-enriched media.
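The nitrogen-counting step can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `nitrogen_count` is hypothetical, labeling is assumed to be complete, and the peak pair is assumed to be already matched with a known charge state.

```python
# Mass difference between the 15N and 14N isotopes, in Da.
N15_MINUS_N14 = 15.0001089 - 14.0030740


def nitrogen_count(mz_unlabeled: float, mz_labeled: float, charge: int = 1) -> int:
    """Estimate the number of nitrogen atoms in an ion from a labeled/unlabeled peak pair.

    Assumes complete 15N incorporation, so each nitrogen shifts the neutral
    mass by N15_MINUS_N14 and the m/z by that amount divided by the charge.
    """
    mass_shift = (mz_labeled - mz_unlabeled) * charge
    return round(mass_shift / N15_MINUS_N14)
```

For example, a doubly charged pair separated by about 2.99 m/z units corresponds to a neutral mass shift of about 5.98 Da, which this sketch would report as six nitrogen atoms.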

3.
Spectral libraries have emerged as a viable alternative to protein sequence databases for peptide identification. These libraries contain previously detected peptide sequences and their corresponding tandem mass spectra (MS/MS). Search engines can then identify peptides by comparing experimental MS/MS scans to those in the library. Many of these algorithms employ the dot product score for measuring the quality of a spectrum-spectrum match (SSM). This scoring system does not offer a clear statistical interpretation and ignores fragment ion m/z discrepancies in the scoring. We developed a new spectral library search engine, Pepitome, which employs statistical systems for scoring SSMs. Pepitome outperformed the leading library search tool, SpectraST, when analyzing data sets acquired on three different mass spectrometry platforms. We characterized the reliability of spectral library searches by confirming shotgun proteomics identifications through RNA-Seq data. Applying spectral library and database searches on the same sample revealed their complementary nature. Pepitome identifications enabled the automation of quality analysis and quality control (QA/QC) for shotgun proteomics data acquisition pipelines.
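For reference, the dot product score that this abstract criticizes can be sketched in Python as below. This is a generic normalized dot product over binned peaks, not the scoring used by Pepitome or SpectraST; the function names and the 1 Da default bin width are assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def bin_spectrum(peaks: List[Peak], bin_width: float = 1.0) -> Dict[int, float]:
    """Sum peak intensities into fixed-width m/z bins."""
    binned: Dict[int, float] = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz / bin_width)] += intensity
    return binned


def normalized_dot_product(query: List[Peak], reference: List[Peak], bin_width: float = 1.0) -> float:
    """Cosine similarity between two binned spectra, in the range 0 to 1."""
    q = bin_spectrum(query, bin_width)
    r = bin_spectrum(reference, bin_width)
    dot = sum(q[b] * r[b] for b in q if b in r)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in r.values()))
    return dot / norm if norm else 0.0
```

Because peaks are compared only after binning, two peaks that fall in the same bin score identically regardless of how far apart their m/z values are, which is the insensitivity to m/z discrepancies noted in the abstract.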

4.
Peptide mass fingerprinting (PMF) is a valuable method for rapid and high-throughput protein identification using the proteomics approach. Automated search engines, such as MS-Fit, Mascot, ProFound, and PeptIdent, have facilitated protein identification through PMF. The potential to obtain a true MS protein identification result depends on the choice of algorithm as well as experimental factors that influence the information content in MS data. When mass spectral data are incomplete and/or have low mass accuracy, the “number of matches” approach may be inadequate for a useful identification. Several studies have evaluated factors influencing the quality of mass spectrometry (MS) experiments. Missed cleavages, posttranslational modifications of peptides and contaminants (e.g., keratin) are important factors that can affect the results of MS analyses by influencing the identification process as well as the quality of the MS spectra. We compared search engines frequently used to identify proteins from Homo sapiens and Halobacterium salinarum by evaluating factors, including database and mass tolerance, to develop an improved search engine for PMF. This study may provide information to help develop a more effective algorithm for protein identification in each species through PMF.

5.
Identification of metabolites is a major challenge in biological studies and relies in principle on mass spectrometry (MS) and nuclear magnetic resonance (NMR) methods. The increased sensitivity and stability of both NMR and MS systems have made dereplication of complex biological samples feasible. Metabolic databases can be of help in the identification process. Nonetheless, there is still a lack of adequate spectral databases that contain high quality spectra, but new developments in this area will assist in the (semi-)automated identification process in the near future. Here, we discuss new developments for the structural elucidation of low-abundance metabolites present in complex sample matrices. We describe how a recently developed combination of high resolution MS multistage fragmentation (MSⁿ) and high resolution one-dimensional (1D) proton (¹H) NMR of liquid chromatography coupled to solid phase extraction (LC–SPE) purified metabolites can circumvent the need for isolating extensive amounts of the compounds of interest to elucidate their structures. The LC–MS–SPE–NMR hardware configuration in conjunction with high quality databases facilitates complete structural elucidation of metabolites even at sub-microgram levels of compound in crude extracts. However, progress is still required to optimally exploit the power of an integrated MS and NMR approach. In particular, there is a need to improve and expand both MSⁿ and NMR spectral databases. Adequate and user-friendly software is required to assist in candidate selection based on the comparison of acquired MS and NMR spectral information with reference data. It is foreseen that these focal points will contribute to a better transfer and exploitation of structural information gained from diverse analytical platforms.

6.
Aims: To propose a universal sample preparation workflow for the identification of highly pathogenic bacteria by MALDI‐TOF MS. Methods and Results: Fifteen bacterial species, including highly virulent Gram‐positive (Bacillus anthracis and Clostridium botulinum) and Gram‐negative bacteria (Brucella melitensis, Burkholderia mallei, Francisella tularensis, Shigella dysenteriae, Vibrio cholerae, Yersinia pestis and Legionella pneumophila), were employed in the comparative study of four sample preparation methods compatible with MALDI‐TOF MS. The yield of bacterial proteins was determined by spectrophotometry, and the quality of the mass spectra, recorded in linear mode in the range of 2000–20 000 Da, was evaluated with respect to the information content (number of signals) and quality (S/N ratio). Conclusions: Based on the values of protein concentration and spectral quality, the method combining ethanol treatment with subsequent extraction with formic acid and acetonitrile was the most efficient sample preparation method for the identification of highly pathogenic bacteria using MALDI‐TOF MS. Significance and Impact of the Study: The ethanol/formic acid method generally shows the highest extraction efficacy and spectral quality, with no detrimental effect caused by storage. Thus, this can be considered as a universal sample preparation method for the identification of highly virulent micro‐organisms by MALDI‐TOF mass spectrometry.

7.
Rescoring of mass spectrometry (MS) search results using spectral predictors can strongly increase peptide spectrum match (PSM) identification rates. This approach is particularly effective when aiming to search MS data against large databases, for example, when dealing with nonspecific cleavage in immunopeptidomics or inflation of the reference database for noncanonical peptide identification. Here, we present inSPIRE (in silico Spectral Predictor Informed REscoring), a flexible and performant open-source rescoring pipeline built on Prosit MS spectral prediction, which is compatible with common database search engines. inSPIRE allows large-scale rescoring with data from multiple MS search files, increases sensitivity to minor differences in amino acid residue position, and can be applied to various MS sample types, including tryptic proteome digestions and immunopeptidomes. inSPIRE boosts PSM identification rates in immunopeptidomics, leading to better performance than the original Prosit rescoring pipeline, as confirmed by benchmarking of inSPIRE performance on ground truth datasets. The integration of various features in the inSPIRE backbone further boosts the PSM identification in immunopeptidomics, with a potential benefit for the identification of noncanonical peptides.

8.
Mass spectrometry (MS)-based shotgun proteomics allows protein identifications even in complex biological samples. Protein abundances can then be estimated from the counts of tandem MS (MS/MS) spectra attributable to each protein, provided one accounts for differential MS detectability of contributing peptides. We developed a method, APEX, which calculates Absolute Protein EXpression levels based upon learned correction factors, MS/MS spectral counts and each protein's probability of correct identification. This protocol describes APEX-based calculations in three parts. (i) Using training data, peptide sequences and their sequence properties, a model is built to estimate MS detectability (O_i) for any given protein. (ii) Absolute protein abundances are calculated from spectral counts, identification probabilities and the learned O_i values. (iii) Simple statistics allow calculation of differential expression in two distinct biological samples, i.e., measuring relative protein abundances. APEX-based protein abundances span 3-4 orders of magnitude and are applicable to mixtures of 100s to 1,000s of proteins.
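Part (ii) of the protocol can be sketched in Python as follows. The abstract names the three inputs (spectral count, identification probability, and the learned detectability O_i) but not the exact formula, so the normalization used here, in which each protein's corrected count is divided by the sum over all proteins and scaled by a total, is an assumption, as are the function and parameter names.

```python
from typing import Dict, Tuple

# Hypothetical input: protein name -> (spectral count n_i, identification probability p_i, detectability O_i).
ProteinInputs = Dict[str, Tuple[float, float, float]]


def apex_abundances(proteins: ProteinInputs, total_molecules: float = 1.0) -> Dict[str, float]:
    """Estimate abundances from detectability-corrected, probability-weighted spectral counts.

    Assumed form: abundance_i = total_molecules * (n_i * p_i / O_i) / sum_k (n_k * p_k / O_k).
    """
    corrected = {name: n * p / o for name, (n, p, o) in proteins.items()}
    total = sum(corrected.values())
    if total == 0:
        return {name: 0.0 for name in corrected}
    return {name: total_molecules * value / total for name, value in corrected.items()}
```

With `total_molecules` left at 1.0 the result is a set of fractions that sum to one; supplying an estimate of the total number of protein molecules per cell would turn them into absolute values.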

9.
A convenient synthesis of some homologous light isotope-coded affinity tags (ICAT-L) containing an acid-labile moiety between the affinity component biotin and an electrophilic polar linker is described. These light ICAT reagents give smooth mass spectral signals in tandem mass spectrometry (MS/MS) analyses of some commercially available cysteine-containing peptides. However, these ICAT molecules are designed for use in identification and relative quantification of whole or partially purified cellular and tissue proteomes. Since the biotin moiety can be readily cleaved off the reagent after mass tagging, undesired residual fragmentation patterns caused by biotin of derived peptides, as normally observed using biotin-containing ICAT reagents, are effectively eliminated. This strategy should enhance peptide sequence coverage significantly which, in turn, should result in improving the quality of data obtained during data-dependent peptide mass and tandem mass spectral analysis of whole proteomes.

10.
MOTIVATION: Peptide identification following tandem mass spectrometry (MS/MS) is usually achieved by searching for the best match between the mass spectrum of an unidentified peptide and model spectra generated from peptides in a sequence database. This methodology will be successful only if the peptide under investigation belongs to an available database. Our objective is to develop and test the performance of a heuristic optimization algorithm capable of dealing with some features commonly found in actual MS/MS spectra that tend to stop simpler deterministic solution approaches. RESULTS: We present the implementation of a Genetic Algorithm (GA) in the reconstruction of amino acid sequences using only spectral features, discuss some of the problems associated with this approach and compare its performance to a de novo sequencing method. The GA can potentially overcome some of the most problematic aspects associated with de novo analysis of real MS/MS data such as missing or unclearly defined peaks and may prove to be a valuable tool in the proteomics field. We assess the performance of our algorithm under conditions of perfect spectral information, in situations where key spectral features are missing, and using real MS/MS spectral data.

11.
Wenguang Shao, Kan Zhu, Henry Lam. Proteomics, 2013, 13(22): 3273-3283
Spectral library searching is a maturing approach for peptide identification from MS/MS, offering an alternative to traditional sequence database searching. Spectral library searching relies on direct spectrum‐to‐spectrum matching between the query data and the spectral library, which affords better discrimination of true and false matches, leading to improved sensitivity. However, due to the inherent diversity of the peak location and intensity profiles of real spectra, the resulting similarity score distributions often take on unpredictable shapes. This makes it difficult to model the scores of the false matches accurately, necessitating the use of decoy searching to sample the score distribution of the false matches. Here, we refined the similarity scoring in spectral library searching to enable the validation of spectral search results without the use of decoys. We rank‐transformed the peak intensities to standardize all spectra, making it possible to fit a parametric distribution to the scores of the nontop‐scoring spectral matches. The statistical significance of the top‐scoring match can then be estimated in a rigorous manner according to Extreme Value Theory. The overall result is a more robust and interpretable measure of the quality of the spectral match, which can be obtained without decoys. We tested this refined similarity scoring function on real datasets and demonstrated its effectiveness. This approach reduces search time, increases sensitivity, and extends spectral library searching to situations where decoy spectra cannot be readily generated, such as in searching unidentified and nonpeptide spectral libraries.
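The rank transformation mentioned in this abstract can be illustrated with a short Python sketch. It covers only the standardization step, not the parametric fit or the Extreme Value Theory estimate, and the details (rank 1 for the weakest peak, ties broken by original peak order) are assumptions rather than the authors' choices.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def rank_transform(peaks: List[Peak]) -> List[Peak]:
    """Replace each intensity by its rank among the peaks of the same spectrum.

    The weakest peak gets rank 1 and the strongest gets rank len(peaks).
    Ties are broken by original peak order because sorted() is stable.
    """
    order = sorted(range(len(peaks)), key=lambda i: peaks[i][1])
    ranked: List[Peak] = list(peaks)
    for rank, index in enumerate(order, start=1):
        ranked[index] = (peaks[index][0], float(rank))
    return ranked
```

After this step every spectrum with the same number of peaks has the same set of intensity values, so similarity scores no longer depend on the absolute intensity scale of the instrument.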

12.
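Data-independent acquisition (DIA) is a mass spectrometry acquisition technique that has developed rapidly in proteomics in recent years. It acquires MS/MS spectra by fragmenting, without bias, all precursor ions within an isolation window. In theory this allows deep coverage of protein samples, while also offering high throughput, high reproducibility and high sensitivity. Existing DIA acquisition methods fall into three categories: full-window fragmentation methods, sequential isolation-window fragmentation methods, and four-dimensional DIA acquisition methods (4D-DIA). Depending on the characteristics of the DIA data, the main data analysis methods fall into four categories: spectral library search, direct search against protein sequence databases, pseudo-MS/MS spectrum identification, and de novo sequencing. The resulting peptide identifications require a confidence assessment in two steps, re-ranking with machine learning methods and false discovery rate estimation for the reported result set, which together provide quality control of the analysis results. This article summarizes and reviews DIA acquisition methods, data analysis methods and software, and methods for assessing the confidence of identification results, and discusses future directions.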
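The false discovery rate estimation step can be illustrated with a minimal Python sketch. The abstract does not say which estimator is used; the target-decoy ratio below is a commonly used approach that is swapped in here as an assumption, and the function name and input layout are hypothetical.

```python
from typing import List, Tuple

# Hypothetical input: (score, is_decoy) for each peptide identification.
ScoredMatch = Tuple[float, bool]


def fdr_at_threshold(matches: List[ScoredMatch], threshold: float) -> float:
    """Estimate the FDR among target matches scoring at or above the threshold.

    Uses the simple target-decoy ratio: accepted decoys divided by accepted targets.
    """
    targets = sum(1 for score, is_decoy in matches if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in matches if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0
```

In practice the threshold would be lowered until the estimate reaches the desired level, for example 1%, and all target matches above it would be reported.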

13.
The unambiguous assignment of tandem mass spectra (MS/MS) to peptide sequences remains a key unsolved problem in proteomics. Spectral library search strategies have emerged as a promising alternative for peptide identification, in which MS/MS spectra are directly compared against a reference library of confidently assigned spectra. Two problems relate to library size. First, reference spectral libraries are limited to rediscovery of previously identified peptides and are not applicable to new peptides, because of their incomplete coverage of the human proteome. Second, problems arise when searching a spectral library the size of the entire human proteome. We observed that traditional dot product scoring methods do not scale well with spectral library size, showing reduction in sensitivity when library size is increased. We show that this problem can be addressed by optimizing scoring metrics for spectrum-to-spectrum searches with large spectral libraries. MS/MS spectra for the 1.3 million predicted tryptic peptides in the human proteome are simulated using a kinetic fragmentation model (MassAnalyzer version 2.1) to create a proteome-wide simulated spectral library. Searches of the simulated library increase MS/MS assignments by 24% compared with Mascot, when using probabilistic and rank-based scoring methods. The proteome-wide coverage of the simulated library leads to an 11% increase in unique peptide assignments, compared with parallel searches of a reference spectral library. Further improvement is attained when reference spectra and simulated spectra are combined into a hybrid spectral library, yielding 52% increased MS/MS assignments compared with Mascot searches. Our study demonstrates the advantages of using probabilistic and rank-based scores to improve performance of spectrum-to-spectrum search strategies.

14.
Hu Y, Li Y, Lam H. Proteomics, 2011, 11(24): 4702-4711
Spectral library searching is a promising alternative to sequence database searching in peptide identification from MS/MS spectra. The key advantage of spectral library searching is the utilization of more spectral features to improve score discrimination between good and bad matches, and hence sensitivity. However, the coverage of reference spectral libraries is limited by current experimental and computational methods. We developed a computational approach to expand the coverage of spectral libraries with semi-empirical spectra predicted from perturbing known spectra of similar sequences, such as those with single amino acid substitutions. We hypothesized that peptides of similar sequences should produce similar fragmentation patterns, at least in most cases. Our results confirm our hypothesis and specify when this approach can be applied. In actual spectral searching of real data sets, the sensitivity advantage of spectral library searching over sequence database searching can be mostly retained even when all real spectra are replaced by semi-empirical ones. We demonstrated the applicability of this approach by detecting several known non-synonymous single-nucleotide polymorphisms in three large human data sets by spectral searching.

15.
Formalin-fixed, paraffin-embedded (FFPE) tissue banks represent an invaluable resource for biomarker discovery. Recently, the combination of full-length protein extraction, GeLC-MS/MS analysis, and spectral counting quantification has been successfully applied to mine proteomic information from these tissues. However, several sources of variability affect these samples; among these, the duration of the fixation process is one of the most important and most easily controllable ones. To assess its influence on the quality of GeLC-MS/MS data, the impact of fixation time on the efficiency of full-length protein extraction and on the quality of label-free quantitative data was evaluated. Although proteins were successfully extracted from FFPE liver samples fixed for up to eight days, fixation time appeared to negatively influence both protein extraction yield and GeLC-MS/MS quantitative proteomic data. In particular, MS identification efficiency decreased with increasing fixation times. Moreover, amino acid modifications putatively induced by formaldehyde were detected and characterized. These results demonstrate that proteomic information can also be obtained from tissue samples fixed for relatively long times, but suggest that variations in fixation time need to be carefully taken into account when performing proteomic biomarker discovery studies on fixed tissue archives.

16.
Peptide identification by tandem mass spectrometry is the dominant proteomics workflow for protein characterization in complex samples. The peptide fragmentation spectra generated by these workflows exhibit characteristic fragmentation patterns that can be used to identify the peptide. In other fields, where the compounds of interest do not have the convenient linear structure of peptides, fragmentation spectra are identified by comparing new spectra with libraries of identified spectra, an approach called spectral matching. In contrast to sequence-based tandem mass spectrometry search engines used for peptides, spectral matching can make use of the intensities of fragment peaks in library spectra to assess the quality of a match. We evaluate a hidden Markov model approach (HMMatch) to spectral matching, in which many examples of a peptide's fragmentation spectrum are summarized in a generative probabilistic model that captures the consensus and variation of each peak's intensity. We demonstrate that HMMatch has good specificity and superior sensitivity, compared to sequence database search engines such as X!Tandem. HMMatch achieves good results from relatively few training spectra, is fast to train, and can evaluate many spectra per second. A statistical significance model permits HMMatch scores to be compared with each other, and with other peptide identification tools, on a unified scale. HMMatch shows a similar degree of concordance with X!Tandem, Mascot, and NIST's MS Search, as they do with each other, suggesting that each tool can assign peptides to spectra that the others miss. Finally, we show that it is possible to extrapolate HMMatch models beyond a single peptide's training spectra to the spectra of related peptides, expanding the application of spectral matching techniques beyond the set of peptides previously observed.

17.
Wagner C, Sefkow M, Kopka J. Phytochemistry, 2003, 62(6): 887-900
The non-supervised construction of a mass spectral and retention time index data base (MS/RI library) from a set of plant metabolic profiles covering major organs of potato (Solanum tuberosum), tobacco (Nicotiana tabacum), and Arabidopsis thaliana, was demonstrated. Typically 300-500 mass spectral components with a signal to noise ratio ≥ 75 were obtained from GC/EI-time-of-flight (TOF)-MS metabolite profiles of methoxyaminated and trimethylsilylated extracts. Profiles from non-sample controls contained approximately 100 mass spectral components. A MS/RI library of 6205 mass spectral components was accumulated and applied to automated identification of the model compounds galactonic acid, a primary metabolite, and 3-caffeoylquinic acid, a secondary metabolite. Neither MS nor RI alone was sufficient for unequivocal identification of unknown mass spectral components. However, library searches with single bait mass spectra of the respective reference substance allowed clear identification by mass spectral match and RI window. Moreover, the hit lists of mass spectral searches were demonstrated to comprise candidate components of highly similar chemical nature. The search for the model compound galactonic acid allowed identification of gluconic and gulonic acid among the top scoring mass spectral components. Equally successful was the exemplary search for 3-caffeoylquinic acid, which led to the identification of quinic acid and of the positional isomers 4-caffeoylquinic acid and 5-caffeoylquinic acid, among other still non-identified conjugates of caffeic and quinic acid. All identifications were verified by co-analysis of reference substances. Finally we applied hierarchical clustering to a complete set of pair-wise mass spectral comparisons of unknown components and reference substances with known chemical structure. We demonstrated that the resulting clustering tree depicted the chemical nature of the reference substances and that most of the nearest neighbours represented either identical components, as judged by co-elution, or conformational isomers exhibiting differential retention behaviour. Unknown components could be classified automatically by grouping with the respective branches and sub-branches of the clustering tree.
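The combined use of mass spectral match and retention index (RI) window can be sketched in Python as follows. This is an illustration only: the cosine score, the default `ri_window` and `min_match` values, and the library layout are assumptions, not the parameters or scoring used in the study.

```python
import math
from typing import Dict, List, Tuple

Spectrum = Dict[int, float]  # nominal m/z -> intensity
LibraryEntry = Tuple[str, Spectrum, float]  # (component name, spectrum, retention index)


def cosine(a: Spectrum, b: Spectrum) -> float:
    """Cosine similarity between two nominal-mass spectra."""
    dot = sum(a[mz] * b[mz] for mz in a if mz in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search_library(
    query_spectrum: Spectrum,
    query_ri: float,
    library: List[LibraryEntry],
    ri_window: float = 5.0,
    min_match: float = 0.8,
) -> List[Tuple[str, float]]:
    """Return library hits that pass both the RI window and the spectral match threshold."""
    hits = []
    for name, spectrum, ri in library:
        if abs(ri - query_ri) > ri_window:
            continue  # similar spectrum but different retention, e.g. an isomer
        score = cosine(query_spectrum, spectrum)
        if score >= min_match:
            hits.append((name, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

Dropping the RI filter would return chemically similar components with near-identical spectra, which matches the abstract's observation that neither criterion alone is sufficient for unequivocal identification.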

18.
19.
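Mass spectrometry-based proteomics is developing rapidly, and the volume of protein mass spectrometry data is growing exponentially. Finding identification methods that are fast, accurate and reproducible is an important task in this field. The spectral library search strategy directly compares experimental spectra with real spectra in a spectral library, making full use of peak abundances, unconventional fragmentation patterns and other spectral features. This makes searching faster and more accurate, and the strategy has become one of the mainstream identification methods in proteomics. This article introduces spectral library-based strategies for identifying proteomic mass spectrometry data, describes in depth two key steps, spectral library construction and spectral library searching, and discusses the progress and challenges of the spectral library strategy.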

20.
Despite recent mass spectrometry (MS)‐based breakthroughs, comprehensive ADP‐ribose (ADPr)‐acceptor amino acid identification and ADPr‐site localization remain challenging. Here, we report the establishment of an unbiased, multistep ADP‐ribosylome data analysis workflow that led to the identification of tyrosine as a novel ARTD1/PARP1‐dependent in vivo ADPr‐acceptor amino acid. MS analyses of in vitro ADP‐ribosylated proteins confirmed tyrosine as an ADPr‐acceptor amino acid in RPS3A (Y155) and HPF1 (Y238) and demonstrated that trans‐modification of RPS3A is dependent on HPF1. We provide an ADPr‐site Localization Spectra Database (ADPr‐LSD), which contains 288 high‐quality ADPr‐modified peptide spectra, to serve as ADPr spectral references for correct ADPr‐site localizations.
