Similar articles
Found 20 similar articles (search time: 15 ms)
1.
Despite a recent surge of interest in database-independent peptide identifications, accurate de novo peptide sequencing remains an elusive goal. While the recently introduced spectral network approach resulted in accurate peptide sequencing in low-complexity samples, its success depends on the chance presence of spectra from overlapping peptides. On the other hand, while multistage mass spectrometry (collecting multiple MS3 spectra from each MS2 spectrum) can be applied to all spectra in a complex sample, there are currently no software tools for de novo peptide sequencing by multistage mass spectrometry. We describe a rigorous probabilistic framework for analyzing spectra of overlapping peptides and show how to apply it for multistage mass spectrometry. Our software results in both accurate de novo peptide sequencing from multistage mass spectra (despite the inferior quality of MS3 spectra) and improved interpretation of spectral networks. We further study the problem of de novo peptide sequencing with accurate parent mass (but inaccurate fragment masses), the protocol that may soon become the dominant mode of spectral acquisition. Most existing peptide sequencing algorithms (based on the spectrum graph approach) do not track the accurate parent mass and are thus not equipped for solving this problem. We describe a de novo peptide sequencing algorithm aimed at this experimental protocol and show that it improves the sequencing accuracy on both tandem and multistage mass spectrometry.

2.
De novo peptide sequencing via tandem mass spectrometry.   Total citations: 10 (self-citations: 0; citations by others: 10)
Peptide sequencing via tandem mass spectrometry (MS/MS) is one of the most powerful tools in proteomics for identifying proteins. Because complete genome sequences are accumulating rapidly, the recent trend in interpretation of MS/MS spectra has been database search. However, de novo MS/MS spectral interpretation remains an open problem typically involving manual interpretation by expert mass spectrometrists. We have developed a new algorithm, SHERENGA, for de novo interpretation that automatically learns fragment ion types and intensity thresholds from a collection of test spectra generated from any type of mass spectrometer. The test data are used to construct optimal path scoring in the graph representations of MS/MS spectra. A ranked list of high scoring paths corresponds to potential peptide sequences. SHERENGA is most useful for interpreting sequences of peptides resulting from unknown proteins and for validating the results of database search algorithms in fully automated, high-throughput peptide sequencing.
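The graph representation mentioned in this abstract can be illustrated with a minimal sketch. It is not SHERENGA: there is no learned scoring, and it assumes the peaks have already been converted to idealized prefix masses (the names `RESIDUE_MASS` and `spectrum_graph_paths` and the 0.01 Da tolerance are assumptions made for illustration). Nodes are masses, an edge joins two nodes whose difference matches a residue mass, and each path from the first to the last node spells a candidate sequence.

```python
from typing import Dict, List, Tuple

# Monoisotopic residue masses in Da. Isoleucine is omitted because it is
# indistinguishable from leucine by mass.
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def spectrum_graph_paths(prefix_masses: List[float], tol: float = 0.01) -> List[str]:
    """Enumerate every residue string that walks from the lowest to the highest node."""
    nodes = sorted(prefix_masses)
    # edges[i] holds (j, residue) when nodes[j] - nodes[i] matches a residue mass.
    edges: Dict[int, List[Tuple[int, str]]] = {i: [] for i in range(len(nodes))}
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            diff = nodes[j] - nodes[i]
            for aa, mass in RESIDUE_MASS.items():
                if abs(diff - mass) <= tol:
                    edges[i].append((j, aa))

    results: List[str] = []

    def walk(i: int, seq: str) -> None:
        # Depth-first enumeration of all source-to-sink paths.
        if i == len(nodes) - 1:
            results.append(seq)
            return
        for j, aa in edges[i]:
            walk(j, seq + aa)

    walk(0, "")
    return results
```

Feeding in the cumulative masses of the tripeptide GAS yields both `GAS` and `QS`, because G + A has almost exactly the mass of Q. A real tool would rank such paths with a scoring function rather than list them all.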

3.
Pitzer E, Masselot A, Colinge J. Proteomics 2007, 7(17): 3051-3054
De novo peptide sequencing algorithms are often tested on relatively small data sets made of excellent spectra. Since there are always more and more tandem mass spectra available, we have assembled six large, reliable, and diverse (three mass spectrometer types) data sets intended for such tests and we make them accessible via a web server. To exemplify their use we investigate the performance of Lutefisk, PepNovo, and PepNovoTag, three well-established peptide de novo sequencing programs.

4.
Despite significant advances in the identification of known proteins, the analysis of unknown proteins by MS/MS still remains a challenging open problem. Although Klaus Biemann recognized the potential of MS/MS for sequencing of unknown proteins in the 1980s, low throughput Edman degradation followed by cloning still remains the main method to sequence unknown proteins. The automated interpretation of MS/MS spectra has been limited by a focus on individual spectra and has not capitalized on the information contained in spectra of overlapping peptides. Indeed the powerful shotgun DNA sequencing strategies have not been extended to automated protein sequencing. We demonstrate, for the first time, the feasibility of automated shotgun protein sequencing of protein mixtures by utilizing MS/MS spectra of overlapping and possibly modified peptides generated via multiple proteases of different specificities. We validate this approach by generating highly accurate de novo reconstructions of multiple regions of various proteins in western diamondback rattlesnake venom. We further argue that shotgun protein sequencing has the potential to overcome the limitations of current protein sequencing approaches and thus catalyze the otherwise impractical applications of proteomics methodologies in studies of unknown proteins.

5.
The characterization by de novo peptide sequencing of the different nucleoside diphosphate kinase B (NDK B) proteins from all the commercial hakes and grenadiers belonging to the family Merlucciidae is reported. A classical proteomics approach, consisting of two-dimensional gel electrophoresis, tryptic in-gel digestion of the excised spots, MALDI-TOF MS, LC-MS/MS, and nanoESI-MS/MS analyses, was followed for the purification and characterization of the different isoforms of NDK B. Fragmentation spectra were used for de novo peptide sequencing. A high degree of homology was found between the sequences of all the species studied and the NDK B sequence from Gillichthys mirabilis, which is accessible in the protein databases. Particular attention was paid to the differential characterization of species-specific peptides that could be used for fish authentication purposes. These findings allowed us to propose a rapid and effective classification method, based on the detection of these biomarker peptides using the selective ion reaction monitoring (SIRM) scan mode in mass spectrometry.

6.
Protein identification has been greatly facilitated by database searches against protein sequences derived from product ion spectra of peptides. This approach is primarily based on the use of fragment ion mass information contained in an MS/MS spectrum. Unambiguous protein identification from a spectrum with low sequence coverage or poor spectral quality can be a major challenge. We present a two-dimensional (2D) mass spectrometric method in which the numbers of nitrogen atoms in the molecular ion and the fragment ions are used to provide additional discriminating power for much improved protein identification and de novo peptide sequencing. The nitrogen number is determined by analyzing the mass difference of corresponding peak pairs in overlaid spectra of (15)N-labeled and unlabeled peptides. These peptides are produced by enzymatic or chemical cleavage of proteins from cells grown in (15)N-enriched and normal media, respectively. It is demonstrated that, using 2D information, i.e., m/z and its associated nitrogen number, this method can not only confirm protein identification results generated by MS/MS database searching, but also identify peptides that cannot be identified by database searching alone. Examples are presented of analyzing Escherichia coli K12 extracts that yielded relatively poor MS/MS spectra, presumably from the digests of low abundance proteins, which can still give positive protein identification using this method. Additionally, this 2D MS method can facilitate spectral interpretation for de novo peptide sequencing and identification of posttranslational or other chemical modifications. We envision that this method should be particularly useful for proteome expression profiling of organelles or cells that can be grown in (15)N-enriched media.
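The nitrogen number described here follows from simple arithmetic: each nitrogen atom adds the 15N minus 14N mass difference (about 0.997035 Da) to the labeled ion. The sketch below is illustrative only; it assumes complete labeling and a known charge state, and the function names are hypothetical rather than taken from the paper.

```python
# Mass difference between 15N and 14N in Da.
N15_MINUS_N14 = 0.997035

# Nitrogen atoms per amino acid residue (backbone N plus side-chain N).
RESIDUE_NITROGENS = {
    "G": 1, "A": 1, "S": 1, "P": 1, "V": 1, "T": 1, "C": 1, "L": 1, "I": 1,
    "N": 2, "D": 1, "Q": 2, "K": 2, "E": 1, "M": 1, "H": 3, "F": 1, "R": 4,
    "Y": 1, "W": 2,
}


def nitrogen_count(mz_unlabeled: float, mz_labeled: float, charge: int = 1) -> int:
    """Estimate the number of nitrogen atoms from a labeled/unlabeled peak pair."""
    # Convert the m/z shift to a mass shift before dividing by the per-atom shift.
    mass_shift = (mz_labeled - mz_unlabeled) * charge
    return round(mass_shift / N15_MINUS_N14)


def peptide_nitrogens(sequence: str) -> int:
    """Count the nitrogen atoms in an unmodified peptide sequence."""
    return sum(RESIDUE_NITROGENS[aa] for aa in sequence)
```

A candidate sequence whose `peptide_nitrogens` value disagrees with the `nitrogen_count` measured for the same ion can be discarded, which is the extra dimension of discrimination the abstract refers to.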

7.
Determining glycan structures is vital to comprehend cell-matrix, cell-cell, and even intracellular biological events. Glycan sequencing, which determines the primary structure of a glycan using tandem mass spectrometry (MS/MS), remains one of the most important tasks in proteomics. Analogous to peptide de novo sequencing, glycan de novo sequencing determines the structure without the aid of a known glycan database. We show in this paper that glycan de novo sequencing is NP-hard. We then provide a heuristic algorithm and develop a software program to solve the problem in practical cases. Experiments on real MS/MS data of glycopeptides demonstrate that our heuristic algorithm gives satisfactory results on practical data.

8.
Thomas H, Shevchenko A. Proteomics 2008, 8(20): 4173-4177
Along with unequivocal hits produced by matching multiple MS/MS spectra to database sequences, LC-MS/MS analysis often yields a large number of hits of borderline statistical confidence. To simplify their validation, we propose to use rapid de novo interpretation of all acquired MS/MS spectra and, with the help of a simple software tool, display the candidate sequences together with each database search hit. We demonstrate that comparing hit database sequences and independent de novo interpretations of the same MS/MS spectra assists in rapid examination of ambiguous matches.

9.
MOTIVATION: Peptide identification following tandem mass spectrometry (MS/MS) is usually achieved by searching for the best match between the mass spectrum of an unidentified peptide and model spectra generated from peptides in a sequence database. This methodology will be successful only if the peptide under investigation belongs to an available database. Our objective is to develop and test the performance of a heuristic optimization algorithm capable of dealing with some features commonly found in actual MS/MS spectra that tend to stop simpler deterministic solution approaches. RESULTS: We present the implementation of a Genetic Algorithm (GA) in the reconstruction of amino acid sequences using only spectral features, discuss some of the problems associated with this approach and compare its performance to a de novo sequencing method. The GA can potentially overcome some of the most problematic aspects associated with de novo analysis of real MS/MS data such as missing or unclearly defined peaks and may prove to be a valuable tool in the proteomics field. We assess the performance of our algorithm under conditions of perfect spectral information, in situations where key spectral features are missing, and using real MS/MS spectral data.

10.
In tandem mass spectrometry (MS/MS), several different fragmentation techniques are possible, including collision-induced dissociation (CID), higher energy collisional dissociation (HCD), electron-capture dissociation (ECD), and electron transfer dissociation (ETD). When using pairs of spectra for de novo peptide sequencing, the most popular methods are designed for CID (or HCD) and ECD (or ETD) spectra because of the complementarity between them. Less attention has been paid to the use of CID and HCD spectra pairs. In this study, a new de novo peptide sequencing method is proposed for these spectra pairs. This method includes a CID and HCD spectra merging criterion and a parent mass correction step, along with improvements to our previously proposed algorithm for sequencing merged spectra. Three pairs of spectral datasets were used to investigate and compare the performance of the proposed method with other existing methods designed for single spectrum (HCD or CID) sequencing. Experimental results showed that full-length peptide sequencing accuracy was increased significantly by using spectra pairs in the proposed method, with the highest accuracy reaching 81.31%.

11.
For the identification of novel proteins using MS/MS, de novo sequencing software computes one or several possible amino acid sequences (called sequence tags) for each MS/MS spectrum. Those tags are then used to match, accounting for amino acid mutations, the sequences in a protein database. If the de novo sequencing gives correct tags, the homologs of the proteins can be identified by this approach, and software such as MS-BLAST is available for the matching. However, de novo sequencing very often gives only partially correct tags. The most common error is that a segment of amino acids is replaced by another segment with approximately the same mass. We developed a new efficient algorithm to match sequence tags with errors to database sequences for the purpose of protein and peptide identification. A software package, SPIDER, was developed and made available on the Internet for free public use. This paper describes the algorithms and features of the SPIDER software.
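The "same mass, different segment" error is easy to reproduce with residue masses alone. The sketch below is not SPIDER's matching algorithm; it only lists which single residues can be mistaken for a pair of residues within a chosen tolerance (the function name and the tolerance values are assumptions for illustration).

```python
from itertools import combinations_with_replacement
from typing import Dict, List

# Monoisotopic residue masses in Da (isoleucine omitted; it equals leucine).
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def isobaric_pairs(tol: float = 0.005) -> Dict[str, List[str]]:
    """Map each residue to the unordered residue pairs whose summed mass matches it."""
    matches: Dict[str, List[str]] = {}
    for a, b in combinations_with_replacement(sorted(RESIDUE_MASS), 2):
        pair_mass = RESIDUE_MASS[a] + RESIDUE_MASS[b]
        for aa, mass in RESIDUE_MASS.items():
            if abs(pair_mass - mass) <= tol:
                matches.setdefault(aa, []).append(a + b)
    return matches
```

With the default tolerance only N (as GG) and Q (as AG) are confusable. Loosening the tolerance to 0.05 Da, which is closer to low-resolution fragment data, also makes K, R and W confusable with residue pairs, so a tag matcher has to tolerate such swaps.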

12.
Because of the intrinsic physical properties of singly or doubly charged ions, MALDI-based CID on these peptide precursor ions tends to be incomplete, leaving a large number of MS/MS spectra unassigned or ambiguously identified. Consequently, the TOF/TOF high throughput capability may not be fully explored and utilized. Here, we describe a novel method for de novo sequence assignment of those MALDI TOF/TOF MS/MS spectra with incomplete or weak fragment ion series. In this approach, deuterium-labeled lysine and leucine precursors were used in parallel to mass-tag the proteome of a metastatic human hepatocellular carcinoma (HCC) cell line during in vivo cell culturing. These stable isotope precursor markers appear not only in terminal but also in internal MS/MS fragment ions, with the characteristic isotope pattern induced by multiple mass tagging in parallel. This enhanced signal specificity evidently resolved ambiguities in those sparse poor-quality TOF/TOF spectra by providing critical sequential links among MS/MS fragment ions. Our data-dependent approach was able to reduce many false-positives in current genome sequence-based peptide sequencing. With correspondingly developed algorithms, our approach is amenable to automation that will lead to more comprehensive and reliable identification of proteomes.

13.
In shotgun proteomics, protein identification by tandem mass spectrometry relies on bioinformatics tools. Despite recent improvements in identification algorithms, a significant number of high quality spectra remain unidentified for various reasons. Here we present ScanRanker, an open-source tool that evaluates the quality of tandem mass spectra via sequence tagging with reliable performance in data from different instruments. The superior performance of ScanRanker enables it not only to find unassigned high quality spectra that evade identification through database search but also to select spectra for de novo sequencing and cross-linking analysis. In addition, we demonstrate that the distribution of ScanRanker scores predicts the richness of identifiable spectra among multiple LC-MS/MS runs in an experiment, and ScanRanker scores assist the process of peptide assignment validation to increase confident spectrum identifications. The source code and executable versions of ScanRanker are available from http://fenchurch.mc.vanderbilt.edu.

14.
Protein identifications with borderline statistical confidence are typically produced by matching a few marginal quality MS/MS spectra to database peptide sequences and represent a significant bottleneck in the reliable and reproducible characterization of proteomes. Here, we present a method for rapid validation of borderline hits that circumvents the need for often biased manual inspection of raw MS/MS spectra. The approach takes advantage of the independent interpretation of corresponding MS/MS spectra by PepNovo de novo sequencing software followed by mass spectrometry-driven BLAST (MS BLAST) sequence-similarity database searches that utilize all partially inaccurate, degenerate and redundant candidate peptide sequences. In a case study involving the identification of more than 180 Caenorhabditis elegans proteins by nanoLC-MS/MS analysis on a linear ion trap LTQ mass spectrometer, the approach enabled rapid assignment (confirmation or rejection) of more than 70% of Mascot hits of borderline statistical confidence.

15.
Here we describe a method for protein identification and quantification using stable isotopes via in vivo metabolic labeling of the hyperthermophilic crenarchaeon Sulfolobus solfataricus. Stable isotope labeling for quantitative proteomics is becoming increasingly popular; however, its usefulness in protein identification has not been fully exploited. We use both 15N and 13C labeling to create three different versions of the same peptide, corresponding to the unlabeled, 15N and 13C labeled versions. The peptide then appears as three different peaks in a TOF-MS scan and three corresponding sets of MS/MS spectra are obtained. With this information, the elemental carbon and nitrogen compositions for each peptide and each fragment can be calculated. When this is used as a constraint in database searching and/or de novo sequencing, the confidence of a match is increased (for an example intact peptide, from 34 choices to 1). This makes the method a useful proteomic tool for both sequenced and unsequenced organisms. Furthermore, it allows for accurate protein quantitation (standard deviations over >4 peptides per protein were within 10%) of three phenotypes in one MS experiment. Abundances for each peptide are calculated by determining the relative areas of each of the three peaks in the TOF-MS spectrum.

16.
Liu J, Jiang J, Wu Z, Xie F. Journal of Proteomics 2012, 75(18): 5807-5821
Eight intact antimicrobial peptides were identified from the skin of Odorrana jingdongensis by de novo sequencing following low energy ESI CID Q-TOF MS/MS in positive mode with the help of Edman degradation and structural similarity analysis. We devised exact mass measurements to discriminate the K/Q amino acid residue in peptides between 2.0 kDa and 3.8 kDa. Moreover, cleavage at the C-S bond of the Met side chain was observed in all the spectra of the peptides containing a Met residue. We also found unusual cleavages within the intramolecular disulfide loop with high frequency. Our data revealed that the cleavage pathways are significantly different from those reported previously, which are similar to the cyclic peptide cleavage mode followed by the secondary cleavage at the C-S bond on oxidized Cys. Thus, our results strongly suggest that ion series generated from the cleavages within the intramolecular disulfide loop should be considered in both top-down sequencing and disulfide bridge location in the presence of a relatively high intensity MH(+)-28 ion marker. Furthermore, our activity data implied that different AMPs may use different strategies to kill microbes.
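To see why exact mass measurement is needed for the K/Q question, it helps to express the K minus Q mass difference (about 0.0364 Da) relative to the peptide mass. The following is a back-of-the-envelope sketch, not the authors' procedure; it assumes a single K/Q substitution, and the function name is hypothetical.

```python
# Monoisotopic residue masses in Da.
LYS_MASS = 128.09496
GLN_MASS = 128.05858
K_MINUS_Q = LYS_MASS - GLN_MASS  # roughly 0.0364 Da


def kq_shift_ppm(peptide_mass_da: float) -> float:
    """Relative mass shift, in ppm, caused by swapping one Q for one K."""
    return K_MINUS_Q / peptide_mass_da * 1e6


# Shifts at the two ends of the 2.0-3.8 kDa range quoted in the abstract.
low_end = kq_shift_ppm(2000.0)
high_end = kq_shift_ppm(3800.0)
```

The shift is about 18 ppm at 2.0 kDa and under 10 ppm at 3.8 kDa, so the measurement error has to stay well below those figures for the assignment to be trustworthy.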

17.
Mass spectrometry data generated in differential profiling of complex protein samples are classically exploited using database searches. In addition, quantitative profiling is performed by various methods, one of them using isotopically coded affinity tags, where one typically uses a light and a heavy tag. Here, we present a new algorithm, ICATcher, which detects pairs of light/heavy peptide MS/MS spectra independent of sequence databases. The method can be used for de novo sequencing and detection of posttranslational modifications. ICATcher is distributed as open source software.
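One database-free way to find light/heavy pairs is to look for precursor masses that differ by a whole number of tag mass differences. The sketch below is a simplified illustration and not the ICATcher algorithm, which the abstract does not detail. The default `delta` of 9.0302 Da is an assumption corresponding to a 13C9-labeled heavy tag; the function name, `max_tags` and `tol` are likewise hypothetical.

```python
from typing import List, Tuple


def find_light_heavy_pairs(
    masses: List[float],
    delta: float = 9.0302,  # assumed heavy-minus-light tag mass difference in Da
    max_tags: int = 3,      # assumed maximum number of tagged residues per peptide
    tol: float = 0.02,      # assumed precursor mass tolerance in Da
) -> List[Tuple[int, int, int]]:
    """Return (light_index, heavy_index, tag_count) for masses separated by n * delta."""
    # Visit precursors in order of increasing mass so the lighter partner comes first.
    order = sorted(range(len(masses)), key=lambda i: masses[i])
    pairs: List[Tuple[int, int, int]] = []
    for pos, light in enumerate(order):
        for heavy in order[pos + 1:]:
            diff = masses[heavy] - masses[light]
            n = round(diff / delta)
            if 1 <= n <= max_tags and abs(diff - n * delta) <= tol:
                pairs.append((light, heavy, n))
    return pairs
```

A practical tool would also compare the fragment spectra of each candidate pair, since two unrelated peptides can differ by this mass by chance.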

18.
The conventional approach in modern proteomics to identify proteins from limited information provided by molecular and fragment masses of their enzymatic degradation products carries an inherent risk of both false positive and false negative identifications. For reliable identification of even known proteins, complete de novo sequencing of their peptides is desired. The main problems of conventional sequencing based on tandem mass spectrometry are incomplete backbone fragmentation and the frequent overlap of fragment masses. In this work, the first proteomics-grade de novo approach is presented, where the above problems are alleviated by the use of the complementary fragmentation techniques CAD and ECD. Implementation of a high-current, large-area dispenser cathode as a source of low-energy electrons provided efficient ECD of doubly charged peptides, the most abundant species (65-80%) in a typical trypsin-based proteomics experiment. A new linear de novo algorithm is developed combining efficiency and speed, processing 1000 MS/MS data sets in 60 s on a conventional 3 GHz PC. More than 6% of all MS/MS data for doubly charged peptides yielded complete sequences, and another 13% gave nearly complete sequences with a maximum gap of two amino acid residues. These figures are comparable with the typical success rates (5-15%) of database identification. For peptides reliably found in the database (Mowse score ≥ 34), the agreement with de novo-derived full sequences was >95%. Full sequences were derived in 67% of the cases when full sequence information was present in MS/MS spectra. Thus the new de novo sequencing approach reached the same level of efficiency and reliability as conventional database-identification strategies.

19.
The recent proliferation of novel mass spectrometers such as Fourier transform, QTOF, and OrbiTrap marks a transition into the era of precision mass spectrometry, providing a 2 orders of magnitude boost to the mass resolution, as compared to low-precision ion-trap detectors. We investigate peptide de novo sequencing by precision mass spectrometry and explore some of the differences when compared to analysis of low-precision data. We demonstrate how the dramatically improved performance of de novo sequencing with precision mass spectrometry paves the way for novel approaches to peptide identification that are based on direct sequence lookups, rather than comparisons of spectra to a database. With the direct sequence lookup, it is not only possible to search a database very efficiently, but also to use the database in novel ways, such as searching for products of alternative splicing or products of fusion proteins in cancer. Our de novo sequencing software is available for download at http://peptide.ucsd.edu/.

20.
Identification of post-translational modifications (PTMs) is important to understanding the biological functions of proteins. MS/MS is a useful tool to identify PTMs. Most existing search tools are restricted to take only a few types of PTMs as input. Here we describe a new algorithm, called MOD(i) (pronounced "mod eye"), that rapidly searches for all known types of PTMs at once without limiting a multitude of modified sites in a peptide. MOD(i) introduces the notion of a tag chain, a combination structure made from multiple sequence tags, that effectively localizes modified regions within a spectrum and overcomes de novo sequencing errors common in tag-based approaches. MOD(i) showed its performance competence by identifying various types of PTMs in analysis of PTM-rich proteins such as glyceraldehyde-3-phosphate dehydrogenase and lens protein. We demonstrated that MOD(i) innovatively manages the computational complexity of identifying multiple PTMs in a peptide, which may exist in a greater variety than usually expected. In addition, it is suggested that MOD(i) has great potential to discover novel modifications.
