Similar articles
20 similar articles found (search time: 0 ms)
1.
Spectral searching has drawn increasing interest as an alternative to sequence-database searching in proteomics. We developed and validated an open-source software toolkit, SpectraST, to enable proteomics researchers to build spectral libraries and to integrate this promising approach into their data-analysis pipelines. It allows individual researchers to condense raw data into spectral libraries, summarizing information about observed proteomes into a concise and retrievable format for future data analyses.

2.
3.
Shotgun proteomics experiments are dependent upon database search engines to identify peptides from tandem mass spectra. Many of these algorithms score potential identifications by evaluating the number of fragment ions matched between each peptide sequence and an observed spectrum. These systems, however, generally do not distinguish between matching an intense peak and matching a minor peak. We have developed a statistical model to score peptide matches that is based upon the multivariate hypergeometric distribution. This scorer, part of the "MyriMatch" database search engine, places greater emphasis on matching intense peaks. The probability that the best match for each spectrum has occurred by random chance can be employed to separate correct matches from random ones. We evaluated this software on data sets from three different laboratories employing three different ion trap instruments. Employing a novel system for testing discrimination, we demonstrate that stratifying peaks into multiple intensity classes improves the discrimination of scoring. We compare MyriMatch results to those of Sequest and X!Tandem, revealing that it is capable of higher discrimination than either of these algorithms. When minimal peak filtering is employed, performance plummets for a scoring model that does not stratify matched peaks by intensity. On the other hand, we find that MyriMatch discrimination improves as more peaks are retained in each spectrum. MyriMatch also scales well to tandem mass spectra from high-resolution mass analyzers. These findings may indicate limitations for existing database search scorers that count matched peaks without differentiating them by intensity. The software and source code are available under the Mozilla Public License at this URL: http://www.mc.vanderbilt.edu/msrc/bioinformatics/.
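
To make the scoring idea above concrete, the following is a minimal sketch of a multivariate hypergeometric probability. It is not MyriMatch's actual code. The class layout (intensity classes plus a class of empty m/z bins), the function names `mvh_probability` and `mvh_score`, and all numbers are assumptions chosen for illustration.

```python
from math import comb, log


def mvh_probability(class_sizes, matched_counts):
    """Multivariate hypergeometric probability of drawing matched_counts[i]
    items from class i when sum(matched_counts) items are drawn at random
    without replacement from all classes combined."""
    if len(class_sizes) != len(matched_counts):
        raise ValueError("class_sizes and matched_counts must have equal length")
    total = sum(class_sizes)
    drawn = sum(matched_counts)
    numerator = 1
    for size, count in zip(class_sizes, matched_counts):
        numerator *= comb(size, count)  # comb returns 0 when count > size
    return numerator / comb(total, drawn)


def mvh_score(class_sizes, matched_counts):
    """Negative log probability: rarer chance outcomes get higher scores."""
    return -log(mvh_probability(class_sizes, matched_counts))


# Hypothetical spectrum: 10 intense peaks, 40 minor peaks, 950 empty bins.
# A peptide with 20 predicted fragments matches either 5 intense or 5 minor peaks.
classes = [10, 40, 950]
print(mvh_score(classes, [5, 0, 15]))  # five matches, all to intense peaks
print(mvh_score(classes, [0, 5, 15]))  # five matches, all to minor peaks
```

Under these assumed numbers, matching five intense peaks is less likely by chance than matching five minor peaks, so it receives the higher score. A scorer that only counts matched peaks would rate the two cases equally.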

4.
5.
Peptide identification by tandem mass spectrometry is the dominant proteomics workflow for protein characterization in complex samples. The peptide fragmentation spectra generated by these workflows exhibit characteristic fragmentation patterns that can be used to identify the peptide. In other fields, where the compounds of interest do not have the convenient linear structure of peptides, fragmentation spectra are identified by comparing new spectra with libraries of identified spectra, an approach called spectral matching. In contrast to sequence-based tandem mass spectrometry search engines used for peptides, spectral matching can make use of the intensities of fragment peaks in library spectra to assess the quality of a match. We evaluate a hidden Markov model approach (HMMatch) to spectral matching, in which many examples of a peptide's fragmentation spectrum are summarized in a generative probabilistic model that captures the consensus and variation of each peak's intensity. We demonstrate that HMMatch has good specificity and superior sensitivity, compared to sequence database search engines such as X!Tandem. HMMatch achieves good results from relatively few training spectra, is fast to train, and can evaluate many spectra per second. A statistical significance model permits HMMatch scores to be compared with each other, and with other peptide identification tools, on a unified scale. HMMatch shows a similar degree of concordance with X!Tandem, Mascot, and NIST's MS Search, as they do with each other, suggesting that each tool can assign peptides to spectra that the others miss. Finally, we show that it is possible to extrapolate HMMatch models beyond a single peptide's training spectra to the spectra of related peptides, expanding the application of spectral matching techniques beyond the set of peptides previously observed.

6.
Previously, different approaches of spectral comparison were evaluated, and the spectral difference (SD) method was shown to be valuable for its linearity with spectral changes and its independence of data spacing (Anal. Biochem. 434 (2013) 153–165). In this note, we present an enhancement of the SD calculation, referred to as the “weighted spectral difference” (WSD), by implementing a weighting function based on relative signal magnitude. While maintaining the advantages of the SD method, WSD improves the method's sensitivity to spectral changes and its tolerance for baseline inclusion. Furthermore, a generalized formula is presented to unify further development of approaches to quantify spectral difference.

7.
Conditions and simple precautions are presented for carrying out highly reproducible and sensitive peptide mapping by thin-layer chromatography and subsequent electrophoresis of subnanomole amounts of tryptic digest on silica gel G or GHL plates. The fluorogenic reagent “fluorescamine” is employed for visualization under long-wavelength ultraviolet illumination. Permanent photorecording of high-contrast images, using readily available filters, is substituted for subjective hand scoring of plates. Contrast reversal is used to produce peptide maps suitable for half-tone reproduction.

8.
A notable inefficiency of shotgun proteomics experiments is the repeated rediscovery of the same identifiable peptides by sequence database searching methods, which often are time-consuming and error-prone. A more precise and efficient method, in which previously observed and identified peptide MS/MS spectra are catalogued and condensed into searchable spectral libraries to allow new identifications by spectral matching, is seen as a promising alternative. To that end, an open-source, functionally complete, high-throughput and readily extensible MS/MS spectral searching tool, SpectraST, was developed. A high-quality spectral library was constructed by combining the high-confidence identifications of millions of spectra taken from various data repositories and searched using four sequence search engines. The resulting library consists of over 30,000 spectra for Saccharomyces cerevisiae. Using this library, SpectraST vastly outperforms the sequence search engine SEQUEST in terms of speed and the ability to discriminate between good and bad hits. A unique advantage of SpectraST is its full integration into the popular Trans Proteomic Pipeline suite of software, which facilitates user adoption and provides important functionalities such as peptide and protein probability assignment, quantification, and data visualization. This method of spectral library searching is especially suited for targeted proteomics applications, offering superior performance to traditional sequence searching.

9.
During development of CGP56901, a monoclonal antibody (MAb) specific for a unique epitope on human IgE, the protein A‐purified IgG from one of the candidate production cell lines showed an additional minor heavy chain (H‐chain) band with a molecular weight slightly lower than that of the principal H‐chain band on SDS‐PAGE. The N‐terminal amino acid sequence of this minor H‐chain species indicated that at least the first 30 amino acids were identical to those of the antibody light‐chain (L‐chain) variable domain. More detailed studies using peptide mapping and amino acid sequencing analysis confirmed a crossover event between the V genes of the antibody. The crossover position is between Arg108 of the L chain and Ala124 of the H chain. This crossover resulted in a variant H chain, which had 16 fewer amino acid residues than the normal CGP56901 H chain. These results show that peptide mapping is a useful “first‐line” analytical tool in the characterization of the quality of the monoclonal antibody. © 1999 John Wiley & Sons, Inc. Biotechnol Bioeng 62: 485–488, 1999.

10.

Background

Peptide identification from tandem mass spectrometry (MS/MS) data is one of the most important problems in computational proteomics. This technique relies heavily on the accurate assessment of the quality of peptide-spectrum matches (PSMs). However, current MS technology and PSM scoring algorithms are far from perfect, leading to the generation of incorrect peptide-spectrum pairs. Thus, it is critical to develop new post-processing techniques that can distinguish true identifications from false identifications effectively.

Results

In this paper, we present a consistency-based PSM re-ranking method to improve the initial identification results. This method uses one additional assumption that two peptides belonging to the same protein should be correlated to each other. We formulate an optimization problem that embraces two objectives through regularization: the smoothing consistency among scores of correlated peptides and the fitting consistency between new scores and initial scores. This optimization problem can be solved analytically. The experimental study on several real MS/MS data sets shows that this re-ranking method improves the identification performance.

Conclusions

The score regularization method can be used as a general post-processing step for improving peptide identifications. Source code and data sets are available at: http://bioinformatics.ust.hk/SRPI.rar.
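
As a rough illustration of why a problem of this kind admits an analytical solution, consider one standard graph-regularization form. This is an assumed formulation, not necessarily the exact objective used in the paper. Here $y_i$ is the initial score of peptide $i$, $f_i$ its new score, $w_{ij} = w_{ji} \ge 0$ a weight that is nonzero when peptides $i$ and $j$ belong to the same protein, and $\mu > 0$ a trade-off parameter. All of these symbols are introduced here for illustration.

$$
\min_{f}\; \frac{1}{2}\sum_{i,j} w_{ij}\,(f_i - f_j)^2 \;+\; \mu \sum_{i} (f_i - y_i)^2
$$

The first term is the smoothing consistency and equals $f^{\top} L f$, where $L = D - W$ is the graph Laplacian and $D$ is the diagonal matrix with $D_{ii} = \sum_j w_{ij}$. The second term is the fitting consistency. Setting the gradient to zero gives

$$
2Lf + 2\mu\,(f - y) = 0 \;\Longrightarrow\; f^{*} = \mu\,(L + \mu I)^{-1}\,y .
$$

Because $L$ is positive semidefinite and $\mu > 0$, the matrix $L + \mu I$ is invertible, so the minimizer is unique.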

11.
Spectral library searching is an emerging approach to peptide identification from tandem mass spectra, a critical step in proteomic data analysis. In spectral library searching, a spectral library is first meticulously compiled from a large collection of previously observed peptide MS/MS spectra that are conclusively assigned to their corresponding amino acid sequences. An unknown spectrum is then identified by comparing it to all the candidates in the spectral library for the most similar match. This review discusses the basic principles of spectral library building and searching, describes its advantages and limitations, and provides a primer for researchers interested in adopting this new approach in their data analysis. It also discusses the future outlook on the evolution and utility of spectral libraries in the field of proteomics.
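
The comparison step described above can be sketched as follows. This is a toy illustration under stated assumptions, not the procedure of any particular tool: spectra are lists of (m/z, intensity) pairs, similarity is a normalized dot product over 1-Th bins, and the library contents, peptide labels, and function names are all hypothetical.

```python
from math import sqrt


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins: {bin_index: intensity}."""
    binned = {}
    for mz, intensity in peaks:
        index = int(mz / bin_width)
        binned[index] = binned.get(index, 0.0) + intensity
    return binned


def normalized_dot_product(a, b):
    """Cosine similarity between two binned spectra (0.0 if either is empty)."""
    dot = sum(value * b.get(index, 0.0) for index, value in a.items())
    norm_a = sqrt(sum(value * value for value in a.values()))
    norm_b = sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_library(query_peaks, library):
    """Return (label, similarity) of the library entry most similar to the query."""
    query = bin_spectrum(query_peaks)
    scored = [
        (label, normalized_dot_product(query, bin_spectrum(peaks)))
        for label, peaks in library.items()
    ]
    return max(scored, key=lambda pair: pair[1])


# Hypothetical two-entry library and an unknown spectrum.
library = {
    "PEPTIDEK": [(175.1, 30.0), (304.2, 100.0), (417.2, 55.0)],
    "SAMPLER": [(175.1, 80.0), (262.1, 100.0), (391.2, 20.0)],
}
query = [(175.1, 28.0), (304.2, 95.0), (417.3, 60.0)]
print(search_library(query, library))
```

A practical search would first restrict candidates by precursor mass and would assess the significance of the best match. Both steps are omitted here for brevity.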

12.
In order to maximize protein identification by peptide mass fingerprinting, noise peaks must be removed from spectra and recalibration is often required. The preprocessing of the spectra before database searching is essential but is time-consuming. Moreover, the optimal database search parameters often vary over a batch of samples. For high-throughput protein identification, these factors should be set automatically, with no or little human intervention. In the present work, automated batch filtering and recalibration using a statistical filter is described. The filter is combined with multiple data searches that are performed automatically. We show that, using several hundred protein digests, protein identification rates could be more than doubled, compared to standard database searching. Furthermore, automated large-scale in-gel digestion of proteins with endoproteinase LysC, and matrix-assisted laser desorption/ionization-time of flight (MALDI-TOF) analysis, followed by trypsin digestion and MALDI-TOF analysis were performed. Several proteins could be identified only after digestion with one of the enzymes, and some less significant protein identifications were confirmed after digestion with the other enzyme. The results indicate that identification of especially small and low-abundance proteins could be significantly improved after sequential digestions with two enzymes.

13.
14.
MOTIVATION: We reformulate the problem of comparing mass-spectra by mapping spectra to a vector space model. Our search method leverages a metric space indexing algorithm to produce an initial candidate set, which can be followed by any fine ranking scheme. RESULTS: We consider three distance measures integrated into a multi-vantage point index structure. Of these, a semi-metric fuzzy-cosine distance using peptide precursor mass constraints performs the best. The index acts as a coarse, lossless filter with respect to the SEQUEST and ProFound scoring schemes, reducing the number of distance computations and returned candidates for fine filtering to about 0.5% and 0.02% of the database respectively. The fuzzy cosine distance term improves specificity over a peptide precursor mass filter, reducing the number of returned candidates by an order of magnitude. Run time measurements suggest proportional speedups in overall search times. Using an implementation of ProFound's Bayesian score as an example of a fine filter on a test set of Escherichia coli protein fragmentation spectra, the top results of our sample system are consistent with those of SEQUEST.

15.
Confident peptide identification is one of the most important components in mass-spectrometry-based proteomics. We propose a method to properly combine the results from different database search methods to enhance the accuracy of peptide identifications. The database search methods included in our analysis are SEQUEST (v27 rev12), ProbID (v1.0), InsPecT (v20060505), Mascot (v2.1), X! Tandem (v2007.07.01.2), OMSSA (v2.0) and RAId_DbS. Using two data sets, one collected in profile mode and one collected in centroid mode, we tested the search performance of all 21 combinations of two search methods as well as all 35 possible combinations of three search methods. The results obtained from our study suggest that properly combining search methods does improve retrieval accuracy. In addition to performance results, we also describe the theoretical framework which in principle allows one to combine many independent scoring methods including de novo sequencing and spectral library searches. The correlations among different methods are also investigated in terms of common true positives, common false positives, and a global analysis. We find that the average correlation strength, between any pairwise combination of the seven methods studied, is usually smaller than the associated standard error. This indicates only weak correlation may be present among different methods and validates our approach in combining the search results. The usefulness of our approach is further confirmed by showing that the average cumulative number of false positive peptides agrees reasonably well with the combined E-value. The data related to this study are freely available upon request.
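
The counts of 21 pairs and 35 triples follow directly from choosing 2 or 3 of the 7 listed search methods. A short check, using only the method names given in the abstract:

```python
from itertools import combinations

methods = ["SEQUEST", "ProbID", "InsPecT", "Mascot", "X! Tandem", "OMSSA", "RAId_DbS"]

pairs = list(combinations(methods, 2))    # all unordered pairs of methods
triples = list(combinations(methods, 3))  # all unordered triples of methods

print(len(pairs), len(triples))  # 21 35
```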

16.
Yang Runmin, Zhu Daming. BMC Genomics, 2018, 19(7): 666-39

Background

Database search has been the main approach for proteoform identification by top-down tandem mass spectrometry. However, when the target proteoform that produced the spectrum contains post-translational modifications (PTMs) and/or mutations, it is quite time consuming to align a query spectrum against all protein sequences without any PTMs and mutations in a large database. Consequently, it is essential to develop efficient and sensitive filtering algorithms for speeding up database search.

Results

In this paper, we propose a spectrum graph matching (SGM) based protein sequence filtering method for top-down mass spectral identification. It uses the subspectra of a query spectrum to generate spectrum graphs and searches them against a protein database to report the best candidates. Whereas the sequence tag and gapped tag approaches need a preprocessing step to extract and select tags, the SGM filtering method circumvents this step, thus simplifying data processing. We evaluated the filtration efficiency of the SGM filtering method with various parameter settings on an Escherichia coli top-down mass spectrometry data set and compared the performances of the SGM filtering method and two tag-based filtering methods on a data set of MCF-7 cells.

Conclusions

Experimental results on the data sets show that the SGM filtering method achieves high sensitivity in protein sequence filtration. When coupled with a spectral alignment algorithm, the SGM filtering method significantly increases the number of identified proteoform-spectrum matches compared with the tag-based methods in top-down mass spectrometry data analysis.

17.
Many signal processing based methods for finding hidden periodicities in DNA sequences have primarily focused on assigning numerical values to the symbolic DNA sequence and then applying spectral analysis tools such as the short-time discrete Fourier transform (ST-DFT) to locate these repeats. The key results pertaining to this approach are however obtained using a very specific symbolic to numerical map, namely the so-called Voss representation. An important research problem is therefore to quantify the sensitivity of these results to the choice of the symbolic to numerical map. In this article, a novel algebraic approach to the periodicity detection problem is presented and provides a natural framework for studying the role of the symbolic to numerical map in finding these repeats. More specifically, we derive a new matrix-based expression of the DNA spectrum that comprises most of the widely used mappings in the literature as special cases, shows that the DNA spectrum is in fact invariant under all these mappings, and generates a necessary and sufficient condition for the invariance of the DNA spectrum to the symbolic to numerical map. Furthermore, the new algebraic framework decomposes the periodicity detection problem into several fundamental building blocks that are totally independent of each other. Sophisticated digital filters and/or alternate fast data transforms such as the discrete cosine and sine transforms can therefore always be incorporated in the periodicity detection scheme regardless of the choice of the symbolic to numerical map. Although the newly proposed framework is matrix based, identification of these periodicities can be achieved at a low computational cost.
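
For readers unfamiliar with the baseline approach mentioned above, here is a minimal sketch of the Voss representation followed by a plain discrete Fourier transform. It is a simplification: a full-length DFT is used instead of the short-time DFT, the transform is computed naively in O(n²), and the function names and test sequence are assumptions for illustration. It does not implement the article's matrix-based framework.

```python
import cmath


def voss_indicators(sequence):
    """Voss representation: one binary indicator sequence per nucleotide."""
    return {
        base: [1 if symbol == base else 0 for symbol in sequence]
        for base in "ACGT"
    }


def dna_power_spectrum(sequence):
    """Sum over the four indicator sequences of |DFT|^2 at each frequency index k."""
    n = len(sequence)
    spectrum = [0.0] * n
    for indicator in voss_indicators(sequence).values():
        for k in range(n):
            coefficient = sum(
                x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(indicator)
            )
            spectrum[k] += abs(coefficient) ** 2
    return spectrum


# A toy sequence with an exact period of 3: the spectrum peaks at k = n / 3.
sequence = "ATG" * 10
spectrum = dna_power_spectrum(sequence)
peak = max(range(1, len(sequence) // 2 + 1), key=lambda k: spectrum[k])
print(peak, len(sequence) // 3)
```

A period-p repeat in a length-n sequence shows up as a peak near frequency index n/p, which is why the toy sequence above peaks at index 10.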

18.
The endogenous cellular oncogene product, pp60c-src, exhibits a protein kinase activity, but is itself a phosphoprotein. Based on the assumption that pp60c-src might play a role in the control of cell proliferation, we have studied its behaviour as a substrate for phosphorylation known to occur when quiescent, serum-deprived cells are stimulated to enter the cell cycle following addition of either serum, platelet-derived growth factor or the phorbol ester derivative, 12-O-tetradecanoyl-phorbol-13-acetate. For this purpose a partial purification of pp60c-src by DEAE ion-exchange chromatography was combined with immune precipitation. A 2-4-fold increase in serine phosphorylation of pp60c-src was consistently observed after stimulation of quiescent cells to growth.

19.
We describe the creation of a mass spectral library composed of all identifiable spectra derived from the tryptic digest of the NISTmAb IgG1κ. The library is a unique reference spectral collection developed from over six million peptide-spectrum matches acquired by liquid chromatography-mass spectrometry (LC-MS) over a wide range of collision energy. Conventional one-dimensional (1D) LC-MS was used for various digestion conditions and 20- and 24-fraction two-dimensional (2D) LC-MS studies permitted in-depth analyses of single digests. Computer methods were developed for automated analysis of LC-MS isotopic clusters to determine the attributes for all ions detected in the 1D and 2D studies. The library contains a selection of over 12,600 high-quality tandem spectra of more than 3,300 peptide ions identified and validated by accurate mass, differential elution pattern, and expected peptide classes in peptide map experiments. These include a variety of biologically modified peptide spectra involving glycosylated, oxidized, deamidated, glycated, and N/C-terminal modified peptides, as well as artifacts. A complete glycation profile was obtained for the NISTmAb with spectra for 58% and 100% of all possible glycation sites in the heavy and light chains, respectively. The site-specific quantification of methionine oxidation in the protein is described. The utility of this reference library is demonstrated by the analysis of a commercial monoclonal antibody (adalimumab, Humira®), where 691 peptide ion spectra are identifiable in the constant regions, accounting for 60% coverage for both heavy and light chains. The NIST reference library platform may be used as a tool for facile identification of the primary sequence and post-translational modifications, as well as the recognition of LC-MS method-induced artifacts for human and recombinant IgG antibodies. Its development also provides a general method for creating comprehensive peptide libraries of individual proteins.  

20.
We describe the application of a peptide retention time reversed phase liquid chromatography (RPLC) prediction model previously reported (Petritis et al. Anal. Chem. 2003, 75, 1039) for improved peptide identification. The model uses peptide sequence information to generate a theoretical (predicted) elution time that can be compared with the observed elution time. Using data from a set of known proteins, the retention time parameter was incorporated into a discriminant function for use with tandem mass spectrometry (MS/MS) data analyzed with the peptide/protein identification program SEQUEST. For singly charged ions, the number of confident identifications increased by 12% when the elution time metric is included compared to when mass spectral data is the sole source of information in the context of a Drosophila melanogaster database. A 3-4% improvement was obtained for doubly and triply charged ions for the same biological system. Application to the larger Rattus norvegicus (rat) and human proteome databases resulted in an 8-9% overall increase in the number of confident identifications, when both the discriminant function and elution time are used. The effect of adding "runner-up" hits (peptide matches that are not the highest scoring for a spectrum) from SEQUEST is also explored, and we find that the number of confident identifications is further increased by 1% when these hits are also considered. Finally, application of the discriminant functions derived in this work with approximately 2.2 million spectra from over three hundred LC-MS/MS analyses of peptides from human plasma protein resulted in a 16% increase in confident peptide identifications (9022 vs 7779) using elution time information. Further improvements from the use of elution time information can be expected as both the experimental control of elution time reproducibility and the predictive capability are improved.
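
The reported 16% gain for the human plasma data can be checked from the two counts given in the abstract. The helper name `relative_increase` is introduced here only for illustration.

```python
def relative_increase(baseline, improved):
    """Percentage increase of `improved` over `baseline`."""
    return 100.0 * (improved - baseline) / baseline


# Confident identifications without and with elution time information.
print(f"{relative_increase(7779, 9022):.1f}%")  # 16.0%
```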

