Similar literature
 20 similar articles found; search took 15 ms
1.
The SwePep database is designed for endogenous peptides and mass spectrometry. It contains information about the peptides such as mass, pI, precursor protein and potential post-translational modifications. Here, we have improved and extended the SwePep database with tandem mass spectra, by adding a locally curated version of the global proteome machine database (GPMDB). In peptidomics practice, many peptide sequences are represented by multiple tandem mass spectra of differing quality. The new tandem mass spectra database in SwePep enables validation of low quality spectra using high quality tandem mass spectra. The validation is performed by comparing the fragmentation patterns of the two spectra using algorithms for calculating the correlation coefficient between the spectra. The present study is the first step in developing a tandem spectrum database for endogenous peptides that can be used for spectrum-to-spectrum identifications instead of peptide identifications using traditional protein sequence database searches.
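
The spectrum-to-spectrum comparison described in this abstract can be illustrated with a minimal sketch. It assumes each spectrum is a list of (m/z, intensity) pairs, bins the peaks at a fixed width, and computes a Pearson correlation coefficient between the binned intensity vectors. The function names, the 1.0 m/z bin width and the 2000 m/z upper limit are illustrative assumptions; the abstract does not state which correlation algorithm or binning SwePep actually uses.

```python
import math


def bin_spectrum(peaks, bin_width=1.0, max_mz=2000.0):
    """Sum peak intensities into fixed-width m/z bins (assumed binning scheme)."""
    n_bins = int(max_mz / bin_width) + 1
    binned = [0.0] * n_bins
    for mz, intensity in peaks:
        idx = int(mz / bin_width)
        if 0 <= idx < n_bins:
            binned[idx] += intensity
    return binned


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0  # a flat spectrum carries no pattern to correlate
    return cov / math.sqrt(var_x * var_y)


def spectrum_correlation(peaks_a, peaks_b, bin_width=1.0, max_mz=2000.0):
    """Correlation between the fragmentation patterns of two spectra."""
    return pearson(
        bin_spectrum(peaks_a, bin_width, max_mz),
        bin_spectrum(peaks_b, bin_width, max_mz),
    )


# Hypothetical peak lists: a high quality spectrum and a weaker copy of it.
high_quality = [(175.1, 100.0), (300.2, 50.0), (500.3, 80.0)]
low_quality = [(mz, 0.1 * intensity) for mz, intensity in high_quality]
unrelated = [(200.0, 100.0), (400.0, 50.0)]
```

Because the Pearson coefficient is insensitive to overall intensity scale, a weak spectrum with the same fragmentation pattern as a strong one still scores close to 1, which is the property that makes this kind of measure usable for validating low quality spectra against high quality ones.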

2.
Quantitative proteomics relies on accurate protein identification, which often is carried out by automated searching of a sequence database with tandem mass spectra of peptides. When these spectra contain limited information, automated searches may lead to incorrect peptide identifications. It is therefore necessary to validate the identifications by careful manual inspection of the mass spectra. Not only is this task time-consuming, but the reliability of the validation varies with the experience of the analyst. Here, we report a systematic approach to evaluating peptide identifications made by automated search algorithms. The method is based on the principle that the candidate peptide sequence should adequately explain the observed fragment ions. Also, the mass errors of neighboring fragments should be similar. To evaluate our method, we studied tandem mass spectra obtained from tryptic digests of E. coli and HeLa cells. Candidate peptides were identified with the automated search engine Mascot and subjected to the manual validation method. The method found correct peptide identifications that were given low Mascot scores (e.g., 20-25) and incorrect peptide identifications that were given high Mascot scores (e.g., 40-50). The method comprehensively detected false results from searches designed to produce incorrect identifications. Comparison of the tandem mass spectra of synthetic candidate peptides to the spectra obtained from the complex peptide mixtures confirmed the accuracy of the evaluation method. Thus, the evaluation approach described here could help boost the accuracy of protein identification, increase the number of peptides identified, and provide a step toward developing a more accurate next-generation algorithm for protein identification.

3.
4.
Peptide identification by tandem mass spectrometry is the dominant proteomics workflow for protein characterization in complex samples. The peptide fragmentation spectra generated by these workflows exhibit characteristic fragmentation patterns that can be used to identify the peptide. In other fields, where the compounds of interest do not have the convenient linear structure of peptides, fragmentation spectra are identified by comparing new spectra with libraries of identified spectra, an approach called spectral matching. In contrast to sequence-based tandem mass spectrometry search engines used for peptides, spectral matching can make use of the intensities of fragment peaks in library spectra to assess the quality of a match. We evaluate a hidden Markov model approach (HMMatch) to spectral matching, in which many examples of a peptide's fragmentation spectrum are summarized in a generative probabilistic model that captures the consensus and variation of each peak's intensity. We demonstrate that HMMatch has good specificity and superior sensitivity, compared to sequence database search engines such as X!Tandem. HMMatch achieves good results from relatively few training spectra, is fast to train, and can evaluate many spectra per second. A statistical significance model permits HMMatch scores to be compared with each other, and with other peptide identification tools, on a unified scale. HMMatch shows a similar degree of concordance with X!Tandem, Mascot, and NIST's MS Search, as they do with each other, suggesting that each tool can assign peptides to spectra that the others miss. Finally, we show that it is possible to extrapolate HMMatch models beyond a single peptide's training spectra to the spectra of related peptides, expanding the application of spectral matching techniques beyond the set of peptides previously observed.

5.
One of the challenges associated with large-scale proteome analysis using tandem mass spectrometry (MS/MS) and automated database searching is to reduce the number of false positive identifications without sacrificing the number of true positives found. In this work, a systematic investigation of the effect of 2MEGA labeling (N-terminal dimethylation after lysine guanidination) on the proteome analysis of a membrane fraction of an Escherichia coli cell extract by 2-dimensional liquid chromatography MS/MS is presented. By a large-scale comparison of MS/MS spectra of native peptides with those from the 2MEGA-labeled peptides, the labeled peptides were found to undergo facile fragmentation with enhanced a1 or a1-related (a1-17 and a1-45) ions derived from all N-terminal amino acids in the MS/MS spectra; these ions are usually difficult to detect in the MS/MS spectra of nonderivatized peptides. The 2MEGA labeling alleviated the biased detection of arginine-terminated peptides that is often observed in MALDI and ESI MS experiments. 2MEGA labeling was found not only to increase the number of peptides and proteins identified but also to generate enhanced a1 or a1-related ions as a constraint to reduce the number of false positive identifications. In total, 640 proteins were identified from the E. coli membrane fraction, with each protein identified based on peptide mass and sequence match of one or more peptides using the MASCOT database search algorithm from the MS/MS spectra generated by a quadrupole time-of-flight mass spectrometer. Among them, the subcellular locations of 336 proteins are presently known, including 258 membrane and membrane-associated proteins (76.8%). Among the classified proteins, there was a dramatic increase in the total number of integral membrane proteins identified in the 2MEGA-labeled sample (153 proteins) versus the unlabeled sample (77 proteins).

6.
Information about peptides and proteins in urine can be used to search for biomarkers of early stages of various diseases. The main technology currently used for identification of peptides and proteins is tandem mass spectrometry, in which peptides are identified by mass spectra of their fragmentation products. However, the fragmentation stage decreases the sensitivity of the analysis and increases its duration. We have developed a method for identification of human urinary proteins and peptides. This method, based on the accurate mass and time (AMT) tag approach, does not use tandem mass spectrometry. A database containing more than 1381 peptide AMT tags has been constructed. Software has been developed for filling the database with AMT tags, normalizing the chromatograms, applying the database to identify proteins and peptides, and estimating their quantities. New procedures for peptide identification by tandem mass spectra and the AMT tag database are proposed. The paper also lists novel proteins that have been identified in human urine for the first time.

7.
The identification of peptides that result from post-translational modifications is critical for understanding normal pathways of cellular regulation as well as identifying damage from, or exposures to, xenobiotics, i.e. the exposome. However, because of their low abundance in proteomes, effective detection of modified peptides by mass spectrometry (MS) typically requires enrichment to eliminate false identifications. We present a new method for confidently identifying peptides with mercury (Hg)-containing adducts that is based on the influence of mercury's seven stable isotopes on peptide isotope distributions detected by high-resolution MS. Using a pure protein and E. coli cultures exposed to phenyl mercuric acetate, we show that the pattern of peak heights in isotope distributions from primary MS single scans efficiently identified Hg adducts in data from chromatographic separation coupled with tandem mass spectrometry with sensitivity and specificity greater than 90%. Isotope distributions are independent of peptide identifications based on peptide fragmentation (e.g. by SEQUEST), so both methods can be combined to eliminate false positives. Summing peptide isotope distributions across multiple scans improved specificity to 99.4% and sensitivity above 95%, affording identification of an unexpected Hg modification. We also illustrate the theoretical applicability of the method for detection of several less common elements including the essential element, selenium, as selenocysteine in peptides.

8.
Assembling peptides identified from tandem mass spectra into a list of proteins, referred to as protein inference, is a critical step in proteomics research. Due to the existence of degenerate peptides and 'one-hit wonders', it is very difficult to determine which proteins are present in the sample. In this paper, we review existing protein inference methods and classify them according to the source of peptide identifications and the principle of algorithms. It is hoped that the readers will gain a good understanding of the current development in this field after reading this review and come up with new protein inference algorithms.

9.
Mass spectrometry-based quantitative proteomics has become an important component of biological and clinical research. Although such analyses typically assume that a protein's peptide fragments are observed with equal likelihood, only a few so-called 'proteotypic' peptides are repeatedly and consistently identified for any given protein present in a mixture. Using >600,000 peptide identifications generated by four proteomic platforms, we empirically identified >16,000 proteotypic peptides for 4,030 distinct yeast proteins. Characteristic physicochemical properties of these peptides were used to develop a computational tool that can predict proteotypic peptides for any protein from any organism, for a given platform, with >85% cumulative accuracy. Possible applications of proteotypic peptides include validation of protein identifications, absolute quantification of proteins, annotation of coding sequences in genomes, and characterization of the physical principles governing key elements of mass spectrometric workflows (e.g., digestion, chromatography, ionization and fragmentation).

10.
11.
Although peptide mass fingerprinting is currently the method of choice to identify proteins, the number of proteins available in databases is increasing constantly, and hence, the advantage of having sequence data on a selected peptide, in order to increase the effectiveness of database searching, is more crucial. Until recently, the ability to identify proteins based on the peptide sequence was essentially limited to the use of electrospray ionization tandem mass spectrometry (MS) methods. The recent development of new instruments with matrix-assisted laser desorption/ionization (MALDI) sources and true tandem mass spectrometry (MS/MS) capabilities creates the capacity to obtain high quality tandem mass spectra of peptides. In this work, using the new high resolution tandem time of flight MALDI-(TOF/TOF) mass spectrometer from Applied Biosystems, examples of successful identification and characterization of bovine heart proteins (SWISS-PROT entries: P02192, Q9XSC6, P13620) separated by two-dimensional electrophoresis and blotted onto polyvinylidene difluoride membrane are described. Tryptic protein digests were analyzed by MALDI-TOF to identify peptide masses afterward used for MS/MS. Subsequent high energy MALDI-TOF/TOF collision-induced dissociation spectra were recorded on selected ions. All data, both MS and MS/MS, were recorded on the same instrument. Tandem mass spectra were submitted to database searching using MS-Tag or were manually de novo sequenced. An interesting modification of a tryptophan residue, a "double oxidation", came to light during these analyses.

12.
MOTIVATION: Statistical evaluation of the confidence of peptide and protein identifications made by tandem mass spectrometry is a critical component for appropriately interpreting the experimental data and conducting downstream analysis. Although many approaches have been developed to assign confidence measures from different perspectives, a unified statistical framework that integrates the uncertainty of peptides and proteins is still missing. RESULTS: We developed a hierarchical statistical model (HSM) that jointly models the uncertainty of the identified peptides and proteins and can be applied to any scoring system. With data sets of a standard mixture and the yeast proteome, we demonstrate that the HSM offers a reliable or at least conservative false discovery rate (FDR) estimate for peptide and protein identifications. The probability measure of HSM also offers a powerful discriminating score for peptide identification. AVAILABILITY: The algorithm is available upon request from the authors.

13.
In high-throughput proteomics the development of computational methods and novel experimental strategies often rely on each other. In certain areas, mass spectrometry methods for data acquisition are ahead of computational methods to interpret the resulting tandem mass spectra. Particularly, although there are numerous situations in which a mixture tandem mass spectrum can contain fragment ions from two or more peptides, nearly all database search tools still make the assumption that each tandem mass spectrum comes from one peptide. Common examples include mixture spectra from co-eluting peptides in complex samples, spectra generated from data-independent acquisition methods, and spectra from peptides with complex post-translational modifications. We propose a new database search tool (MixDB) that is able to identify mixture tandem mass spectra from more than one peptide. We show that peptides can be reliably identified with up to 95% accuracy from mixture spectra while considering only 0.01% of all possible peptide pairs (four orders of magnitude speedup). Comparison with current database search methods indicates that our approach has better or comparable sensitivity and precision at identifying single-peptide spectra while simultaneously being able to identify 38% more peptides from mixture spectra at significantly higher precision.

14.
We report on the effectiveness of CID, HCD, and ETD for LC-FT MS/MS analysis of peptides using a tandem linear ion trap-Orbitrap mass spectrometer. A range of software tools and analysis parameters were employed to explore the use of CID, HCD, and ETD to identify peptides (isolated from human blood plasma) without the use of specific "enzyme rules". In the evaluation of an FDR-controlled SEQUEST scoring method, the use of accurate masses for fragments increased the number of identified peptides (by ~50%) compared to the use of conventional low accuracy fragment mass information, and CID provided the largest contribution to the identified peptide data sets compared to HCD and ETD. The FDR-controlled Mascot scoring method provided significantly fewer peptide identifications than SEQUEST (by 1.3-2.3 fold) and CID, HCD, and ETD provided similar contributions to identified peptides. Evaluation of de novo sequencing and the UStags method for more intense fragment ions revealed that HCD afforded more contiguous residues (e.g., ≥ 7 amino acids) than either CID or ETD. Both the FDR-controlled SEQUEST and Mascot scoring methods provided peptide data sets that were affected by the decoy database used and mass tolerances applied (e.g., identical peptides between data sets could be limited to ~70%), while the UStags method provided the most consistent peptide data sets (>90% overlap). The m/z ranges in which CID, HCD, and ETD contributed the largest number of peptide identifications were substantially overlapping. This work suggests that the three peptide ion fragmentation methods are complementary and that maximizing the number of peptide identifications benefits significantly from a careful match with the informatics tools and methods applied. These results also suggest that the decoy strategy may inaccurately estimate identification FDRs.
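
The decoy-based FDR control mentioned in this abstract can be sketched as follows. This is a simplified illustration of the common target-decoy estimate (decoy hits divided by target hits above a score threshold), not the procedure used in the study; the function names and the (score, is_decoy) input layout are assumptions made for the example.

```python
def fdr_at_threshold(psms, threshold):
    """Estimate FDR as decoys / targets among matches scoring >= threshold.

    psms is a list of (score, is_decoy) tuples (assumed layout).
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        # No target hits: the estimate is undefined if decoys passed, else zero.
        return float("inf") if decoys else 0.0
    return decoys / targets


def score_cutoff_for_fdr(psms, max_fdr=0.01):
    """Return the lowest score threshold whose estimated FDR is <= max_fdr.

    Returns None when no threshold satisfies the limit.
    """
    best = None
    for threshold in sorted({score for score, _ in psms}, reverse=True):
        if fdr_at_threshold(psms, threshold) <= max_fdr:
            best = threshold
    return best


# Hypothetical peptide-spectrum matches: (search score, matched a decoy sequence?)
example_psms = [
    (50, False), (45, False), (40, False),
    (35, True), (30, False), (25, True),
]
```

Because the estimate depends entirely on which decoy sequences happen to score well, changing the decoy database or the mass tolerances changes the accepted peptide set, which is consistent with the sensitivity to those settings that the abstract reports.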

15.
High-throughput proteomics is made possible by a combination of modern mass spectrometry instruments capable of generating many millions of tandem mass (MS(2)) spectra on a daily basis and the increasingly sophisticated associated software for their automated identification. Despite the growing accumulation of collections of identified spectra and the regular generation of MS(2) data from related peptides, the mainstream approach for peptide identification is still the nearly two decades old approach of matching one MS(2) spectrum at a time against a database of protein sequences. Moreover, database search tools overwhelmingly continue to require that users guess in advance a small set of 4-6 post-translational modifications that may be present in their data in order to avoid incurring substantial false positive and negative rates. The spectral networks paradigm for analysis of MS(2) spectra differs from the mainstream database search paradigm in three fundamental ways. First, spectral networks are based on matching spectra against other spectra instead of against protein sequences. Second, spectral networks find spectra from related peptides even before considering their possible identifications. Third, spectral networks determine consensus identifications from sets of spectra from related peptides instead of separately attempting to identify one spectrum at a time. Even though spectral networks algorithms are still in their infancy, they have already delivered the longest and most accurate de novo sequences to date, revealed a new route for the discovery of unexpected post-translational modifications and highly-modified peptides, enabled automated sequencing of cyclic non-ribosomal peptides with unknown amino acids and are now defining a novel approach for mapping the entire molecular output of biological systems that is suitable for analysis with tandem mass spectrometry. Here we review the current state of spectral networks algorithms and discuss possible future directions for automated interpretation of spectra from any class of molecules.

16.
This report describes the profiling of proteins in a sample prepared by laser capture microdissection (LCM) from a breast cancer cell line (SKBR-3). This experimental approach serves as a model system for proteomic studies on selected tissue samples and for studies of specific cell types. The captured cells were isolated in a dehydrated and reduced state and solubilized with a denaturing buffer. After dilution the protein mixture was digested with trypsin and the resulting peptide mixture was fractionated by reversed phase HPLC (RPLC) and analyzed on an ion trap mass spectrometer. A key part of this study is the combination of the LCM process with an extraction/digestion procedure that allowed effective solubilization of a significant part of the cellular sample in a single step. The identity of the peptides was determined by tandem mass spectrometry measurements in which the resulting spectra were compared with genomic and proteomic databases and protein identifications were made. While only peptides with a high probability assignment were used, the interpretation of mass spectral fragmentation patterns was also confirmed by manual interpretation of the spectra. Also, for the more abundant proteins the initial protein assignment from the best match peptide was strengthened by the observation of additional confirmatory peptide identifications. Another selection criterion was correlation of the mass spectrometric studies with clinical and genomic studies of potential cancer markers in tumor samples. This proteomic study allowed identification of the following proteins: human receptor protein kinase HER-2 or ERBB-2 and related kinases HER-3 and HER-4, the gene products from breast cancer type I and II susceptibility genes and cytoskeletal components such as cytokeratins 8, 18 and 19. Other proteins include fibroblast growth factor receptor variants (FGFR-2&4) and T-lymphoma invasion and metastasis inducing protein 1 (TIAM1). In addition, several nonreceptor protein kinases YES, FAK and JAK-1 and 3 were identified. Since the study was performed on a limited number of cells (approximately 10,000) it raises the possibility of such studies being performed on individual patient samples prepared by needle biopsy.

17.
Shotgun proteomics uses liquid chromatography-tandem mass spectrometry to identify proteins in complex biological samples. We describe an algorithm, called Percolator, for improving the rate of confident peptide identifications from a collection of tandem mass spectra. Percolator uses semi-supervised machine learning to discriminate between correct and decoy spectrum identifications, correctly assigning peptides to 17% more spectra from a tryptic Saccharomyces cerevisiae dataset, and up to 77% more spectra from non-tryptic digests, relative to a fully supervised approach.

18.
To interpret LC-MS/MS data in proteomics, most popular protein identification algorithms primarily use predicted fragment m/z values to assign peptide sequences to fragmentation spectra. The intensity information is often undervalued, because it is not as easy to predict and incorporate into algorithms. Nevertheless, the use of intensity to assist peptide identification is an attractive prospect and can potentially improve the confidence of matches and generate more identifications. On the basis of our previously reported study of fragmentation intensity patterns, we developed a protein identification algorithm, SeQuence IDentification (SQID), that makes use of the coarse intensity from a statistical analysis. The scoring scheme was validated by comparing with Sequest and X!Tandem using three data sets, and the results indicate an improvement in the number of identified peptides, including unique peptides that are not identified by Sequest or X!Tandem. The software and source code are available under the GNU GPL license at http://quiz2.chem.arizona.edu/wysocki/bioinformatics.htm.

19.
The utility and advantages of the recently introduced two-dimensional quadrupole ion trap mass spectrometer in proteomics over the traditional three-dimensional ion trap mass spectrometer have not been systematically characterized. Here we rigorously compared the performance of these two platforms by using over 100,000 tandem mass spectra acquired with identical complex peptide mixtures and acquisition parameters. Specifically we compared four factors that are critical for a successful proteomic study: 1) the number of proteins identified, 2) sequence coverage or the number of peptides identified for every protein, 3) the database matching SEQUEST Xcorr and Sp scores, and 4) the quality of the fragment ion series of peptides. We found a 4-6-fold increase in the number of peptides and proteins identified on the two-dimensional ion trap mass spectrometer as a direct result of improvement in all the other parameters examined. Interestingly more than 70% of the doubly and triply charged peptides, but not the singly charged peptides, showed better quality of fragmentation spectra on the two-dimensional ion trap. These results highlight specific advantages of the two-dimensional ion trap over the conventional three-dimensional ion traps for protein identification in proteomic experiments.

20.
Highly sensitive peptide fragmentation and identification in sequence databases is a cornerstone of proteomics. Previously, a two-layered strategy consisting of MALDI peptide mass fingerprinting followed by electrospray tandem mass spectrometry of the unidentified proteins has been successfully employed. Here, we describe a high-sensitivity/high-throughput system based on orthogonal MALDI tandem mass spectrometry (o-MALDI) and the automated recognition of fragments corresponding to the N- and C-terminal amino acid residues. Robotic deposition of samples onto hydrophobic anchor substrates is employed, and peptide spectra are acquired automatically. The pulsing feature of the QSTAR o-MALDI mass spectrometer enhances the low mass region of the spectra by approximately 1 order of magnitude. Software has been developed to automatically recognize characteristic features in the low mass region (such as the y1 ion of tryptic peptides), maintaining high mass accuracy even with very low count events. Typically, the sum of the N-terminal two ions (b2 ion), the third N-terminal ion (b3 ion), and the two C-terminal fragments of the peptide (y1 and y2) can be determined. Given mass accuracy in the low ppm range, peptide end sequencing on one or two tryptic peptides is sufficient to uniquely identify a protein from gel samples in the low silver-stained range.
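
The terminal fragments named in this abstract follow from standard fragment mass arithmetic, sketched below for singly protonated, unmodified peptides. A b ion is the sum of its N-terminal residue masses plus a proton, and a y ion is the sum of its C-terminal residue masses plus water plus a proton. The function names are hypothetical and the sketch is not the software described in the abstract; the residue values are standard monoisotopic masses.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def b_ion_mz(peptide, n):
    """m/z of the singly charged b ion covering the first n residues."""
    if not 1 <= n < len(peptide):
        raise ValueError("n must be between 1 and len(peptide) - 1")
    return sum(RESIDUE_MASS[aa] for aa in peptide[:n]) + PROTON


def y_ion_mz(peptide, n):
    """m/z of the singly charged y ion covering the last n residues."""
    if not 1 <= n < len(peptide):
        raise ValueError("n must be between 1 and len(peptide) - 1")
    return sum(RESIDUE_MASS[aa] for aa in peptide[-n:]) + WATER + PROTON


def terminal_ions(peptide):
    """Return the b2, b3, y1 and y2 m/z values used for peptide end sequencing."""
    return {
        "b2": b_ion_mz(peptide, 2),
        "b3": b_ion_mz(peptide, 3),
        "y1": y_ion_mz(peptide, 1),
        "y2": y_ion_mz(peptide, 2),
    }
```

Since tryptic peptides normally end in lysine or arginine, their y1 ion falls at roughly m/z 147.11 or 175.12, which is why it serves as a recognizable feature in the low mass region.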
