Similar articles
1.
Mass spectrometry (MS) analysis of peptides carrying post-translational modifications (PTMs) is challenging due to the instability of some modifications during MS analysis. However, glycopeptides as well as acetylated, methylated and other modified peptides release specific fragment ions during CID (collision-induced dissociation) and HCD (higher energy collisional dissociation) fragmentation. These fragment ions can be used to validate the presence of the PTM on the peptide. Here, we present PTM MarkerFinder, a software tool that takes advantage of such marker ions. PTM MarkerFinder screens the MS/MS spectra in the output of a database search (i.e., Mascot) for marker ions specific for selected PTMs. Moreover, it reports and annotates the HCD and the corresponding electron transfer dissociation (ETD) spectrum (when present), and summarizes information on the type, number, and ratios of marker ions found in the data set. In the present work, a sample containing enriched N-acetylhexosamine (HexNAc) glycopeptides from yeast has been analyzed by liquid chromatography-mass spectrometry on an LTQ Orbitrap Velos using both HCD and ETD fragmentation techniques. The identification result (Mascot .dat file) was submitted as input to PTM MarkerFinder and screened for HexNAc oxonium ions. The software output has been used for high-throughput validation of the identification results.
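The marker-ion screening described in this abstract can be illustrated with a minimal Python sketch. It is not PTM MarkerFinder's code: the function name `screen_spectrum`, the peak-list layout, and the 20 ppm tolerance are assumptions made for illustration. The m/z values are the commonly cited singly charged HexNAc oxonium ion and its typical fragment ions.

```python
from typing import Dict, List, Tuple

# Commonly cited singly charged HexNAc marker ions (monoisotopic m/z).
HEXNAC_MARKERS: Dict[str, float] = {
    "m/z 204.0867": 204.0867,  # HexNAc oxonium ion
    "m/z 186.0761": 186.0761,  # loss of one water
    "m/z 168.0655": 168.0655,  # loss of two waters
    "m/z 138.0550": 138.0550,
    "m/z 126.0550": 126.0550,
}


def screen_spectrum(
    peaks: List[Tuple[float, float]],
    markers: Dict[str, float] = HEXNAC_MARKERS,
    tol_ppm: float = 20.0,  # assumed tolerance, not taken from the paper
) -> Dict[str, float]:
    """Return {marker label: intensity relative to the base peak} for markers found."""
    if not peaks:
        return {}
    base_intensity = max(intensity for _, intensity in peaks)
    found: Dict[str, float] = {}
    for label, target in markers.items():
        tol = target * tol_ppm * 1e-6  # ppm tolerance converted to m/z units
        hits = [inten for mz, inten in peaks if abs(mz - target) <= tol]
        if hits:
            found[label] = max(hits) / base_intensity
    return found


# Hypothetical HCD peak list: (m/z, intensity)
example = [(138.0551, 50.0), (204.0869, 200.0), (500.3, 100.0)]
print(screen_spectrum(example))
```

Reporting relative intensities rather than a yes/no flag mirrors the abstract's mention of marker-ion ratios, although the ratios the tool itself reports may be defined differently.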

2.
Recent emergence of new mass spectrometry techniques (e.g. electron transfer dissociation, ETD) and improved availability of additional proteases (e.g. Lys-N) for protein digestion in high-throughput experiments raised the challenge of designing new algorithms for interpreting the resulting new types of tandem mass (MS/MS) spectra. Traditional MS/MS database search algorithms such as SEQUEST and Mascot were originally designed for collision induced dissociation (CID) of tryptic peptides and are largely based on expert knowledge about fragmentation of tryptic peptides (rather than machine learning techniques) to design CID-specific scoring functions. As a result, the performance of these algorithms is suboptimal for new mass spectrometry technologies or nontryptic peptides. We recently proposed the generating function approach (MS-GF) for CID spectra of tryptic peptides. In this study, we extend MS-GF to automatically derive scoring parameters from a set of annotated MS/MS spectra of any type (e.g. CID, ETD, etc.), and present a new database search tool MS-GFDB based on MS-GF. We show that MS-GFDB outperforms Mascot for ETD spectra or peptides digested with Lys-N. For example, in the case of ETD spectra, the number of tryptic and Lys-N peptides identified by MS-GFDB increased by a factor of 2.7 and 2.6 as compared with Mascot. Moreover, even following a decade of Mascot developments for analyzing CID spectra of tryptic peptides, MS-GFDB (that is not particularly tailored for CID spectra or tryptic peptides) resulted in 28% increase over Mascot in the number of peptide identifications. Finally, we propose a statistical framework for analyzing multiple spectra from the same precursor (e.g. CID/ETD spectral pairs) and assigning p values to peptide-spectrum-spectrum matches.
Since the introduction of electron capture dissociation (ECD) in 1998 (1), electron-based peptide dissociation technologies have played an important role in analyzing intact proteins and post-translational modifications (2). However, until recently, this research-grade technology was available only to a small number of laboratories because it was commercially unavailable, required experience for operation, and could be implemented only with expensive FT-ICR instruments. The discovery of electron-transfer dissociation (ETD) (3) enabled an ECD-like technology to be implemented in (relatively cheap) ion-trap instruments. Nowadays, many researchers are employing the ETD technology for tandem mass spectra generation (4–9).
Although the hardware technologies to generate ETD spectra are maturing rapidly, software technologies to analyze ETD spectra are still in their infancy. There are two major approaches to analyzing tandem mass spectra: de novo sequencing and database search. Both approaches find the best-scoring peptide either among all possible peptides (de novo sequencing) or among all peptides in a protein database (database search). Although de novo sequencing is emerging as an alternative to database search, database search remains a more accurate (and thus preferred) method of spectral interpretation, so here we focus on the database search approach.
Numerous database search engines are currently available, including SEQUEST (10), Mascot (11), OMSSA (12), X!Tandem (13), and InsPecT (14). However, most of them are inadequate for the analysis of ETD spectra because they are optimized for collision induced dissociation (CID) spectra that show different fragmentation propensities than those of ETD spectra. Additionally, the existing tandem mass spectrometry (MS/MS) tools are biased toward the analysis of tryptic peptides because trypsin is usually used for CID, and thus not suitable for the analysis of nontryptic peptides that are common for ETD. Therefore, even though some database search engines support the analysis of ETD spectra (e.g. SEQUEST, Mascot, and OMSSA), their performance remains suboptimal when it comes to analyzing ETD spectra. Recently, an ETD-specific database search tool (Z-Core) was developed; however, it does not significantly improve over OMSSA (15).
We present a new database search tool (MS-GFDB) that significantly outperforms existing database search engines in the analysis of ETD spectra, and performs equally well on nontryptic peptides. MS-GFDB employs the generating function approach (MS-GF) that computes rigorous p values of peptide-spectrum matches (PSMs) based on the spectrum-specific score histogram of all peptides (16). MS-GF p values are dependent only on the PSM (and not on the database), and thus can be used as an alternative scoring function for the database search.
Computing p values requires a scoring model evaluating qualities of PSMs. MS-GF adopts a probabilistic scoring model (MS-Dictionary scoring model) described in Kim et al., 2009 (17), considering multiple features including product ion types, peak intensities and mass errors. To define the parameters of this scoring model, MS-GF only needs a set of training PSMs. This set of PSMs can be obtained in a variety of ways: for example, one can generate CID/ETD pairs and use peptides identified by CID to form PSMs for ETD. Alternatively, one can generate spectra from a purified protein (when PSMs can be inferred from the accurate parent mass alone) or use a previously developed (not necessarily optimal) tool to generate training PSMs. From these training PSMs, MS-GF automatically derives scoring parameters without assuming any prior knowledge about the specifics of a particular peptide fragmentation method (e.g. ETD, CID, etc.) and/or proteolytic origin of the peptides. MS-GF was originally designed for the analysis of CID spectra, but now it has been extended to other types of spectra generated by various fragmentation techniques and/or various enzymes. We show that MS-GF can be successfully applied to novel types of spectra (e.g. ETD of Lys-N peptides (18, 19)) by simply retraining scoring parameters without any modification. Note that although the same scoring model is used for different types of spectra, the parameters derived to score different types of spectra are dissimilar.
We compared the performance of MS-GFDB with Mascot on a large ETD data set and found that it generated many more peptide identifications for the same false discovery rates (FDR). For example, at 1% peptide level FDR, MS-GFDB identified 9450 unique peptides from 81,864 ETD spectra of Lys-N peptides whereas Mascot only identified 3672 unique peptides, ≈160% increase in the number of peptide identifications (a similar improvement is observed for ETD spectra of tryptic peptides). MS-GFDB also showed a significant 28% improvement in the number of identified peptides from CID spectra of tryptic peptides (16,203 peptides as compared with 12,658 peptides identified by Mascot).
The ETD technology complements rather than replaces CID because both technologies have some advantages: CID for smaller peptides with small charges, ETD for larger and multiply charged peptides (20, 21). An alternative way to utilize ETD is to use it in conjunction with CID because CID and ETD generate complementary sequence information (20, 22, 23). ETD-enabled instruments often support generating both CID and ETD spectra (CID/ETD pairs) for the same peptide. Although the CID/ETD pairs promise a great improvement in peptide identification, the full potential of such pairs has not been fully realized yet. In the case of de novo sequencing, de novo sequencing tools utilizing CID/ETD pairs indeed result in more accurate de novo peptide sequencing than traditional CID-based algorithms (23, 24, 25). However, in the case of database search, the argument that the use of CID/ETD pairs improves peptide identifications remains poorly substantiated. A few tools have been developed to use CID/ETD (or CID/ECD) pairs for the database search but they are limited to preprocessing/postprocessing of the spectral data before or following running a traditional database search tool (26, 27). Nielsen et al., 2005 (22) pioneered the combined use of CID and ECD for the database search. Given a CID/ECD pair, they generated a combined spectrum comprised only of complementary pairs of peaks, and searched it with Mascot. However, this approach is hard to generalize to less accurate CID/ETD pairs generated by ion-trap instruments because there is a higher chance that the identified complementary pairs of peaks are spurious. More importantly, using traditional MS/MS tools (such as Mascot) for the database search of the combined spectrum is inappropriate, because they are not optimized for analyzing such combined spectra; a better approach would be to develop a new database search tool tailored for the combined spectrum. Recently, Molina et al., 2008 (26) studied database search of CID/ETD pairs using Spectrum Mill (Agilent Technologies, Santa Clara, CA) and came to a counterintuitive conclusion that using only CID spectra identifies 12% more unique peptides than using CID/ETD pairs. We believe that it is an acknowledgment of limitations of the traditional MS/MS database search tools for the analysis of multiple spectra generated from a single peptide.
In this paper, we modify the generating function approach for interpreting CID/ETD pairs and further apply it to improve the database search with CID/ETD pairs. In contrast to previous approaches, our scoring is specially designed to interpret CID/ETD pairs and can be generalized to analyzing any type of multiple spectra generated from a single peptide. When CID/ETD pairs from trypsin digests are used, MS-GFDB identified 13% and 27% more peptides compared with the case when only CID spectra and only ETD spectra are used, respectively. The difference was even more prominent when CID/ETD pairs from Lys-N digests were used, with 41% and 33% improvement over CID only and ETD only, respectively.
Assigning a p value to a PSM greatly helped researchers to evaluate the quality of peptide identifications. We now turn to the problem of assigning a p value to a peptide-spectrum-spectrum match (PS2M) when two spectra in PS2M are generated by different fragmentation technologies (e.g. ETD and CID). We argue that assigning statistical significance to a PS2M (or even PSnM) is a prerequisite for rigorous CID/ETD analyses. To our knowledge, MS-GFDB is the first tool to generate statistically rigorous p values of PSnMs.
The MS-GFDB executable and source code are available at the website of Center for Computational Mass Spectrometry at UCSD (http://proteomics.ucsd.edu). It takes a set of spectra (CID, ETD, or CID/ETD pairs) and a protein database as an input and outputs peptide matches. If the input is a set of CID/ETD pairs, it outputs the best scoring peptide matches and their p values (1) using only CID spectra, (2) using only ETD spectra, and (3) using combined spectra of CID/ETD pairs.
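The idea of a spectrum-specific score histogram over all peptides, from which a p value is read off, can be sketched with a toy dynamic program. This is a heavily simplified illustration and not the MS-GF implementation: integer residue masses, a uniform residue probability, the `node_score` dictionary (a score awarded whenever a peptide has a prefix of that mass), and both function names are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, List


def score_distribution(
    parent_mass: int,
    node_score: Dict[int, int],
    residue_masses: List[int],
) -> Dict[int, float]:
    """Toy generating function: probability mass of each total score over all
    residue strings whose integer masses sum exactly to parent_mass."""
    p = 1.0 / len(residue_masses)  # assumed uniform residue probability
    # dist[m][t]: probability that a random string has total mass m and score t
    dist = [defaultdict(float) for _ in range(parent_mass + 1)]
    dist[0][0] = 1.0
    for m in range(1, parent_mass + 1):
        s = node_score.get(m, 0)  # score contributed by a prefix of mass m
        for r in residue_masses:
            prev = m - r
            if prev < 0:
                continue
            for t, prob in dist[prev].items():
                dist[m][t + s] += p * prob
    return dict(dist[parent_mass])


def p_value(distribution: Dict[int, float], threshold: int) -> float:
    """Probability of a score at least as high as threshold."""
    return sum(prob for t, prob in distribution.items() if t >= threshold)


# Toy alphabet: nominal masses of G (57) and A (71); parent mass 128 admits GA and AG.
dist = score_distribution(128, {57: 3, 71: -1}, [57, 71])
print(dist, p_value(dist, 3))
```

Because the distribution depends only on the spectrum-derived scores and the parent mass, the resulting p value does not depend on which protein database is searched, which is the property the text highlights.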

3.
We report on the effectiveness of CID, HCD, and ETD for LC-FT MS/MS analysis of peptides using a tandem linear ion trap-Orbitrap mass spectrometer. A range of software tools and analysis parameters were employed to explore the use of CID, HCD, and ETD to identify peptides (isolated from human blood plasma) without the use of specific "enzyme rules". In the evaluation of an FDR-controlled SEQUEST scoring method, the use of accurate masses for fragments increased the number of identified peptides (by ~50%) compared to the use of conventional low accuracy fragment mass information, and CID provided the largest contribution to the identified peptide data sets compared to HCD and ETD. The FDR-controlled Mascot scoring method provided significantly fewer peptide identifications than SEQUEST (by 1.3-2.3 fold) and CID, HCD, and ETD provided similar contributions to identified peptides. Evaluation of de novo sequencing and the UStags method for more intense fragment ions revealed that HCD afforded more contiguous residues (e.g., ≥ 7 amino acids) than either CID or ETD. Both the FDR-controlled SEQUEST and Mascot scoring methods provided peptide data sets that were affected by the decoy database used and mass tolerances applied (e.g., identical peptides between data sets could be limited to ~70%), while the UStags method provided the most consistent peptide data sets (>90% overlap). The m/z ranges in which CID, HCD, and ETD contributed the largest number of peptide identifications were substantially overlapping. This work suggests that the three peptide ion fragmentation methods are complementary and that maximizing the number of peptide identifications benefits significantly from a careful match with the informatics tools and methods applied. These results also suggest that the decoy strategy may inaccurately estimate identification FDRs.
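For readers unfamiliar with decoy-based FDR control, the following Python sketch shows one common estimator (decoy hits divided by target hits above a score cutoff). It is not the procedure used in this study: the function name `score_cutoff_at_fdr`, the input layout, and the choice of estimator are assumptions, and tied scores are handled naively.

```python
from typing import List, Optional, Tuple


def score_cutoff_at_fdr(
    psms: List[Tuple[float, bool]],
    max_fdr: float = 0.01,
) -> Optional[float]:
    """psms holds (score, is_decoy) pairs. Returns the lowest score cutoff whose
    estimated FDR (decoys / targets at or above the cutoff) is within max_fdr,
    or None if no cutoff qualifies."""
    ranked = sorted(psms, key=lambda item: item[0], reverse=True)
    targets = 0
    decoys = 0
    cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= max_fdr:
            cutoff = score  # keep the most permissive cutoff that still passes
    return cutoff


# Hypothetical scored matches: True marks a decoy hit.
matches = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, False)]
print(score_cutoff_at_fdr(matches, max_fdr=0.34))
```

The estimate depends directly on how decoys are generated and counted, which is consistent with the abstract's observation that results varied with the decoy database used.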

4.
5.
Over the past decade peptide sequencing by collision induced dissociation (CID) has become the method of choice in mass spectrometry-based proteomics. The development of alternative fragmentation techniques such as electron transfer dissociation (ETD) has extended the possibilities within tandem mass spectrometry. Recent advances in instrumentation allow peptide fragment ions to be detected with high speed and sensitivity (e.g., in a 2D or 3D ion trap) or at high resolution and high mass accuracy (e.g., an Orbitrap or a ToF). Here, we describe a comprehensive experimental comparison of using ETD, ion-trap CID, and beam type CID (HCD) in combination with either linear ion trap or Orbitrap readout for the large-scale analysis of tryptic peptides. We investigate which combination of fragmentation technique and mass analyzer provides the best performance for the analysis of distinct peptide populations such as N-acetylated, phosphorylated, and tryptic peptides with up to two missed cleavages. We found that HCD provides more peptide identifications than CID and ETD for doubly charged peptides. In terms of Mascot score, ETD FT outperforms the other techniques for peptides with charge states higher than 2. Our data show that there is a trade-off between spectral quality and speed when using the Orbitrap for fragment ion detection. We conclude that a decision-tree regulated combination of higher-energy collisional dissociation (HCD) and ETD can improve the average Mascot score.
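A decision-tree regulated choice between HCD and ETD can be pictured with the short Python sketch below. The single charge-state rule is an assumption derived only from the two findings stated in this abstract (HCD for doubly charged peptides, ETD for charge states above 2); the decision tree used in the study may also depend on other precursor properties such as m/z, and the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable


def choose_fragmentation(charge: int) -> str:
    """Assumed rule: ETD for precursor charge states above 2, otherwise HCD."""
    if charge < 1:
        raise ValueError("precursor charge must be a positive integer")
    return "ETD" if charge > 2 else "HCD"


def plan_acquisition(charges: Iterable[int]) -> Counter:
    """Count how many precursors would be routed to each fragmentation method."""
    return Counter(choose_fragmentation(z) for z in charges)


# Hypothetical precursor charge states from one survey scan
print(plan_acquisition([2, 2, 3, 4, 2]))
```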

6.
In proteomic studies, assigning protein identity from organisms whose genomes are yet to be completely sequenced remains a challenging task. For these organisms, protein identification is typically based on cross species matching of amino acid sequence obtained from collision induced dissociation (CID) of peptides using mass spectrometry. The most direct approach of de novo sequencing is slow and often difficult, due to the complexity of the resultant CID spectra. For MALDI-MS, this problem has been addressed by using chemical derivatisation to direct peptide fragmentation, thereby simplifying CID spectra and facilitating de novo interpretation. In this study, milk whey proteins from the tammar wallaby (Macropus eugenii) were used to evaluate three chemical derivatisation methods compatible with MALDI MS/MS. These methods included (i) guanidination and sulfonation using chemically-assisted fragmentation (CAF), (ii) guanidination and sulfonation using 4-sulfophenyl isothiocyanate (SPITC) and (iii) derivatising the epsilon-amino group of lysine residues with Lys Tag 4H. Derivatisation with CAF and SPITC resulted in more protein identifications than Lys Tag 4H. Sulfonation using SPITC was the preferred method due to the low cost per experiment, the reactivity with both lysine and arginine terminated peptides and the resultant simplified MS/MS spectra.
* Australian Peptide Conference Issue.
** This project was funded by an ARC Linkage grant to Deane supported by TGR Biosciences and facilitated by access to the Australian Proteome Analysis Facility established under the Australian Government's Major National Research Facilities program.

7.
Kim MS, Pandey A. Proteomics 2012, 12(4-5): 530-542
Mass spectrometry has rapidly evolved to become the platform of choice for proteomic analysis. While CID remains the major fragmentation method for peptide sequencing, electron transfer dissociation (ETD) is emerging as a complementary method for the characterization of peptides and post-translational modifications (PTMs). Here, we review the evolution of ETD and some of its newer applications including characterization of PTMs, non-tryptic peptides and intact proteins. We will also discuss some of the unique features of ETD such as its complementarity with CID and the use of alternating CID/ETD along with issues pertaining to analysis of ETD data. The potential of ETD for applications such as multiple reaction monitoring and proteogenomics in the future will also be discussed.

8.
Large scale mass spectrometry analysis of N-linked glycopeptides is complicated by the inherent complexity of the glycan structures. Here, we evaluate a mass spectrometry approach for the targeted analysis of N-linked glycopeptides in complex mixtures that does not require prior knowledge of the glycan structures or pre-enrichment of the glycopeptides. Despite the complexity of N-glycans, the core of the glycan remains constant, comprising two N-acetylglucosamine and three mannose units. Collision-induced dissociation (CID) mass spectrometry of N-glycopeptides results in the formation of the N-acetylglucosamine (GlcNAc) oxonium ion and a [mannose+GlcNAc] fragment (in addition to other fragments resulting from cleavage within the glycan). In ion-trap CID, those ions are not detected due to the low m/z cutoff; however, they are detected following the beam-type CID known as higher energy collision dissociation (HCD) on the orbitrap mass spectrometer. The presence of these product ions following HCD can be used as triggers for subsequent electron transfer dissociation (ETD) mass spectrometry analysis of the precursor ion. The ETD mass spectrum provides peptide sequence information, which is unobtainable from HCD. A Lys-C digest of ribonuclease B and trypsin digest of immunoglobulin G were separated by ZIC-HILIC liquid chromatography and analyzed by HCD product ion-triggered ETD. The data were analyzed both manually and by search against protein databases by commonly used algorithms. The results show that the product ion-triggered approach shows promise for the field of glycoproteomics and highlight the requirement for more sophisticated data mining tools.
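The product ion-triggered logic in this abstract amounts to a yes/no decision per HCD spectrum, sketched below in Python. Real instruments make this decision in acquisition software; the function name, the 0.01 Da tolerance, and the 5% relative-intensity floor are assumptions for illustration. The two trigger masses are the commonly cited singly charged GlcNAc oxonium ion (m/z 204.0867) and the hexose+HexNAc fragment (m/z 366.1395).

```python
from typing import Dict, List, Tuple

# Singly charged trigger ions (monoisotopic m/z)
TRIGGER_IONS: Dict[str, float] = {
    "GlcNAc oxonium": 204.0867,
    "mannose+GlcNAc": 366.1395,
}


def should_trigger_etd(
    hcd_peaks: List[Tuple[float, float]],
    tol_da: float = 0.01,              # assumed mass tolerance
    min_rel_intensity: float = 0.05,   # assumed intensity floor (fraction of base peak)
) -> bool:
    """Return True if any trigger ion is present in the HCD spectrum above the floor."""
    if not hcd_peaks:
        return False
    base = max(intensity for _, intensity in hcd_peaks)
    for target in TRIGGER_IONS.values():
        for mz, intensity in hcd_peaks:
            if abs(mz - target) <= tol_da and intensity / base >= min_rel_intensity:
                return True
    return False


# Hypothetical HCD spectrum of a glycopeptide precursor
print(should_trigger_etd([(204.0870, 30.0), (850.4, 100.0)]))
```

An intensity floor is included because a trigger based on presence alone would fire on noise peaks; the value chosen here is arbitrary.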

9.
Wiesner J, Premsler T, Sickmann A. Proteomics 2008, 8(21): 4466-4483
Despite major advances in the field of proteomics, the analysis of PTMs still poses a major challenge, thus far preventing insights into the role and regulation of protein networks. Additionally, top-down sequencing of proteins is another powerful approach to reveal comprehensive information for biological function. A commonly used fragmentation technique in MS-based peptide sequencing is CID. As CID often fails in PTM analysis and performs best on doubly charged, short and middle-sized peptides, confident peptide identification may be hampered. A newly developed fragmentation technique, namely electron transfer dissociation (ETD), supports both PTM and top-down analysis, and generally results in more confident identification of long, highly charged or modified peptides. The following review presents the theoretical background of ETD and its technical implementation in mass analyzers. Furthermore, current improvements of ETD and approaches for PTM analysis and top-down sequencing are introduced. Alternating both fragmentation techniques, ETD and CID, increases the amount of information derived from peptide fragmentation, thereby enhancing both peptide sequence coverage and the confidence of peptide and protein identification.

10.
Isobaric tagging, via TMT or iTRAQ, is widely used in quantitative proteomics. To date, tandem mass spectrometric analysis of isobarically labeled peptides with hybrid ion trap–orbitrap (LTQ-OT) instruments has been mainly carried out with higher-energy C-trap dissociation (HCD) or pulsed q dissociation (PQD). HCD provides good fragmentation of the reporter ions, but peptide sequence-ion recovery is generally poor compared to collision-induced dissociation (CID). Herein, we describe an approach where CID and HCD spectra are combined. The approach efficiently ensures both identification and relative quantification of proteins. Tandem mass tags (TMTs) were used to label digests of human plasma and LC-MS/MS was performed with an LTQ-OT instrument. Different HCD collision energies were tested. The benefits of using CID and HCD together rather than HCD alone were demonstrated in terms of number of identifications, subsequent number of quantifiable proteins, and quantification accuracy. A program was developed to merge the peptide sequence-ion m/z range from CID spectra and the reporter-ion m/z range from HCD spectra, and alternatively to separate both spectral data into different files. As parallel CID in the LTQ barely affects the analysis duty cycle, the procedure should become a standard for quantitative analyses of proteins with isobaric tagging using LTQ-OT instruments.
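The merging step described in this abstract can be sketched in Python as follows. This is not the authors' program: the function name `merge_cid_hcd`, the peak-list layout, and the reporter window of m/z 126.0 to 131.5 (chosen to bracket 6-plex TMT reporter ions) are assumptions for illustration.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def merge_cid_hcd(
    cid_peaks: List[Peak],
    hcd_peaks: List[Peak],
    reporter_range: Tuple[float, float] = (126.0, 131.5),  # assumed reporter window
) -> List[Peak]:
    """Take reporter-ion peaks from the HCD spectrum and all other peaks from CID."""
    low, high = reporter_range
    reporters = [(mz, i) for mz, i in hcd_peaks if low <= mz <= high]
    sequence_ions = [(mz, i) for mz, i in cid_peaks if not (low <= mz <= high)]
    return sorted(reporters + sequence_ions)  # sorted by m/z


# Hypothetical spectra acquired for the same precursor
cid = [(100.0, 1.0), (127.1, 5.0), (300.2, 10.0)]
hcd = [(126.13, 50.0), (127.13, 60.0), (300.2, 3.0)]
print(merge_cid_hcd(cid, hcd))
```

Dropping CID peaks inside the reporter window avoids mixing the two intensity scales in the region used for quantification; whether the published program does the same is not stated in the abstract.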

11.
Panax ginseng is an important herb that has clear effects on the treatment of diverse diseases. To date, the natural peptide composition of this herb has remained unclear. Here, we conduct an extensive characterization of the Ginseng peptidome using MS-based data mining and sequencing. A screen of the charge states of precursor ions indicated that Ginseng is a peptide-rich herb in comparison with a number of commonly used herbs. The Ginseng peptides were then extracted and submitted to nano-LC-MS/MS analysis using different fragmentation modes, including CID, high-energy collisional dissociation, and electron transfer dissociation. Further database search and de novo sequencing allowed the identification of a total of 308 peptides, some of which might have important biological activities. This study illustrates the abundance and sequences of endogenous Ginseng peptides, thus providing more candidates for the screening of active compounds in future biological research and drug discovery studies.

12.
Full-length de novo sequencing from tandem mass (MS/MS) spectra of unknown proteins such as antibodies or proteins from organisms with unsequenced genomes remains a challenging open problem. Conventional algorithms designed to individually sequence each MS/MS spectrum are limited by incomplete peptide fragmentation or low signal to noise ratios and tend to result in short de novo sequences at low sequencing accuracy. Our shotgun protein sequencing (SPS) approach was developed to ameliorate these limitations by first finding groups of unidentified spectra from the same peptides (contigs) and then deriving a consensus de novo sequence for each assembled set of spectra (contig sequences). But whereas SPS enables much more accurate reconstruction of de novo sequences longer than can be recovered from individual MS/MS spectra, it still requires error-tolerant matching to homologous proteins to group smaller contig sequences into full-length protein sequences, thus limiting its effectiveness on sequences from poorly annotated proteins. Using low and high resolution CID and high resolution HCD MS/MS spectra, we address this limitation with a Meta-SPS algorithm designed to overlap and further assemble SPS contigs into Meta-SPS de novo contig sequences extending as long as 100 amino acids at over 97% accuracy without requiring any knowledge of homologous protein sequences. We demonstrate Meta-SPS using distinct MS/MS data sets obtained with separate enzymatic digestions and discuss how the remaining de novo sequencing limitations relate to MS/MS acquisition settings.Database search tools, such as Sequest (3), Mascot (4), and InsPecT (5), are the most frequently used methods for reliable protein identification in tandem mass (MS/MS) spectrometry based proteomics. These operate by separately matching each MS/MS spectrum to peptide sequences from reference protein databases where all proteins of interest are presumably contained. 
But this assumption often does not hold true as many important proteins, such as monoclonal antibodies, are not contained in any database because mechanisms of antibody variation (including genetic recombination and somatic hyper-mutation (6)) constantly create new proteins with novel unique sequences. These mechanisms of variation are the foundation of adaptive immune systems and have enabled highly successful antibody-based therapeutic strategies (7, 8). Nevertheless, such variation also means that antibody MS/MS spectra are typically impossible to identify via standard database search techniques whenever the corresponding sequences are not known in advance. An inherent drawback of database search strategies is that they are only as good as the database(s) being searched and incomplete databases often result in proteins being misidentified or left unidentified (9).Despite the importance of novel protein identification, few high-throughput methods have been developed for de novo sequencing of unknown proteins. Low-throughput Edman degradation is a well-known de novo sequencing approach that can accurately call amino acid sequences in N/C-terminal regions of unknown proteins but has drawbacks that make it unsuitable for sequencing proteins longer than 50 amino acids or proteins with post-translational modifications (10, 11). Many have recognized the potential of tandem mass spectrometry for protein sequencing. For example, in 1987 Johnson and Biemann (12) manually sequenced a complete protein from rabbit bone marrow. Meanwhile, automated de novo sequencing methods that rely on interpretations of individual MS/MS spectra are limited in that they typically cannot reconstruct long (8+ AA) sequences without mis-predicting 1 in 5 AA on average for low accuracy collision-induced dissociation (CID) spectra (13, 14). 
Recent advances in de novo peptide sequencing have improved sequencing accuracy to over 95% for high resolution higher energy collisional dissociation (HCD)1 spectra (15), but at limited sequence coverage (Chi H et al. report only 55% sequence coverage of peptides identified by database search). In fact, all current per-spectrum de novo sequencing strategies face a significant tradeoff between sequencing accuracy and coverage as spectra exhibiting complete peptide fragmentation rarely cover entire target proteins, yet are required to accurately reconstruct full-length peptide sequences. An alternative approach to separately sequencing individual spectra is to simultaneously interpret multiple MS/MS spectra from overlapping peptides. This Shotgun Protein Sequencing (SPS) paradigm differs from traditional algorithms by deriving consensus sequences from contigs - sets of multiple MS/MS spectra from distinct peptides with overlapping sequences (1, 16). Because SPS aggregates multiple spectra from overlapping peptides, protein sequences extending beyond the length of enzymatically digested peptides can be extracted from spectra with incomplete peptide fragmentation. Furthermore, SPS has been found to generate sequences that frequently cover 90–95+% of the target protein sequence(s) whereas mis-predicting only 1 out of every 20 amino acids on high resolution MS/MS spectra (2). But a remaining limitation of SPS is that it still generates fragmented sequences that do not singularly cover large regions of the target protein sequences, much less complete proteins: SPS sequences have an average length of 10–15 amino acids (depending on input data) and the longest recovered SPS de novo sequence is less than 45 amino acids long (1).The considerable limitations of de novo sequencing strategies have typically been addressed by attempting to circumvent them using error-tolerant matching to known protein sequences. 
One such strategy (17) is to generate short de novo sequence tags and then match them exactly to protein databases without requiring matching the N/C-term flanking masses (to allow for unexpected polymorphisms or post-translational modifications). Short sequence tags are usually derived from parts of the spectrum with high signal-to-noise ratios and typically have higher sequencing accuracy than full-length de novo sequences (18). This approach was later extended in MS-Shotgun (19) and continues to be a popular technique for speeding up database search tools (5, 2022). Homology matching of full length de novo sequences was first explored in CIDentify (23) and later in MS-BLAST (24) by searching de novo sequences using FASTA and WU-BLAST2 (respectively) to find homologous matches to sequences of related proteins; FASTS (25) also approached the problem using a modified version of FASTA. However, common de novo sequencing errors tend to produce sequences that are heavily penalized in pure sequence homology searches. For example, missing peaks in MS/MS spectra may easily cause GA subsequences to be reconstructed as Q or AG (same-mass sequences), thus making subsequent BLAST searches unlikely to succeed. This issue was partially considered in CIDentify and more thoroughly addressed in SPIDER (26) by explicitly modeling de novo sequencing errors together with BLOSUM scores in MS/MS-based sequence homology searches. In addition, OpenSea (27) further explored database matching of de novo sequences for analysis of unexpected post-translational modifications (PTMs). Finally, Shen et al. (28) used short unique de novo sequence tags, called UStags, to discover protein-localized PTMs.Recent approaches to homology matching of de novo sequences have built on genome assembly and sequencing techniques to achieve database-assisted full-length sequencing of unknown proteins. 
Comparative Shotgun Protein Sequencing (cSPS) complemented SPS assembly techniques with usage of error tolerant matching of de novo sequences to find overlapping SPS de novo sequences that are then further assembled into full-length protein sequences (2). cSPS was designed to support the sequencing of highly divergent proteins that have regions close enough in homology to transfer matches from a reference. cSPS was shown to enable de novo sequencing of monoclonal antibodies at 95+% sequencing accuracy, while simultaneously tolerating and identifying unexpected PTMs (29). In difference from cSPS, Champs (30) de novo sequences individual spectra to obtain putative peptide sequences, which are then mapped to homologous proteins to correct sequencing errors and reconstruct protein sequences with 100% accuracy and 99% coverage. However, Champs is designed to only map peptides that differ from the reference sequence by one or two amino acids and does not handle PTMs. As such, its sequencing accuracy is not directly comparable to that of cSPS as Champs was not designed to sequence highly divergent proteins (such as monoclonal antibodies) with multiple PTMs, insertions, deletions, and/or recombinations. GenoMS (31) extended the approaches in cSPS/Champs by explicitly modeling protein splice variants as paths in splice graphs where nodes represent translated exon regions (32). MS/MS spectra are first searched for exact sequence matches against all possible protein isoforms. The remaining unidentified MS/MS spectra are then aligned to the matched peptides and de novo sequenced to extend the matched sequences into novel regions. Reported sequences are 97–99% accurate and cover 96–99% of target proteins depending on sequence similarity between the novel and reference sequences (31). 
However, GenoMS de novo sequences are usually extended fewer than 3 amino acids beyond matched peptides because sequencing accuracy degrades as sequences are extended, thus preventing the consistent extension of long (10+ AA) sequences. Altogether, the use of homology matching approaches for full-length de novo protein sequencing continues to be limited by 1) requiring the previous knowledge of closely related protein sequences and 2) the inherent difficulties in statistically significant homology-tolerant matching of error-prone short de novo sequences. The Meta-SPS approach proposed here seeks to de novo sequence complete proteins, or long protein regions, without any use of a database. Meta-SPS builds upon SPS by treating SPS de novo sequences (contig sequences) as input spectra and further assembling them into longer de novo sequences (meta-contig sequences). We show that Meta-SPS extends de novo sequences to lengths over 100 AA while boosting sequencing accuracy to only 1 mistake per 40 amino acid predictions, thus enabling database-free de novo sequencing of completely novel proteins while also allowing error-tolerant matching approaches to support higher-divergence homologies (by searching longer, more accurate de novo sequences). Meta-SPS algorithms are demonstrated on CID and HCD MS/MS spectra and their limitations are discussed in relation to the underlying limitations of bottom-up tandem mass spectrometry.
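
The same-mass ambiguity mentioned in this abstract (GA reconstructed as Q or AG) can be checked with a small calculation. The sketch below is illustrative only and is not taken from any of the cited tools: the helper names `composition` and `mass` are hypothetical, and the residue formulas and monoisotopic element masses are standard values entered by hand.

```python
from collections import Counter

# Monoisotopic masses of the elements involved (Da).
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}

# Elemental composition of amino acid residues (amino acid minus H2O).
RESIDUES = {
    "G": Counter(C=2, H=3, N=1, O=1),   # glycine
    "A": Counter(C=3, H=5, N=1, O=1),   # alanine
    "Q": Counter(C=5, H=8, N=2, O=2),   # glutamine
}

def composition(seq):
    """Sum the elemental compositions of all residues in seq."""
    total = Counter()
    for aa in seq:
        total += RESIDUES[aa]
    return total

def mass(seq):
    """Monoisotopic residue mass of seq in Da."""
    return sum(MONO[el] * n for el, n in composition(seq).items())

for s in ("GA", "AG", "Q"):
    print(s, dict(composition(s)), round(mass(s), 4))
```

Because G + A and Q share the elemental composition C5H8N2O2, no mass measurement alone can separate them; only a fragment peak between the G and the A resolves the ambiguity, which is why missing peaks lead to this kind of de novo error.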

13.
Triply and doubly charged iTRAQ (isobaric tagging for relative and absolute quantitation) labeled peptide cations from a tryptic peptide mixture of bovine carbonic anhydrase II were subjected to electron transfer ion/ion reactions to investigate the effect of charge bearing modifications associated with iTRAQ on the fragmentation pattern. It was noted that electron transfer dissociation (ETD) of triply charged or activated ETD (ETD and supplemental collisional activation of intact electron transfer species) of doubly charged iTRAQ tagged peptide ions yielded extensive sequence information, in analogy with ETD of unmodified peptide ions. That is, addition of the fixed charge iTRAQ tag showed relatively little deleterious effect on the ETD performance of the modified peptides. ETD of the triply charged iTRAQ labeled peptide ions followed by collision-induced dissociation (CID) of the product ion at m/z 162 yielded the reporter ion at m/z 116, which is the reporter ion used for quantitation via CID of the same precursor ions. The reporter ion formed via the two-step activation process is expected to provide quantitative information similar to that directly produced from CID. A 103 Da neutral loss species observed in the ETD spectra of all the triply and doubly charged iTRAQ labeled peptide ions is unique to the 116 Da iTRAQ reagent, which implies that this process also has potential for quantitation of peptides/proteins. Therefore, ETD with or without supplemental collisional activation, depending on the precursor ion charge state, has the potential to directly identify and quantify the peptides/proteins simultaneously using existing iTRAQ reagents.

14.
Kim MS, Zhong J, Kandasamy K, Delanghe B, Pandey A. Proteomics. 2011;11(12):2568-2572
CID has become a routine method for fragmentation of peptides in shotgun proteomics, whereas electron transfer dissociation (ETD) has been described as a preferred method for peptides carrying labile PTMs. Though both of these fragmentation techniques have their obvious advantages, they also have their own drawbacks. By combining data from CID and ETD fragmentation, some of these disadvantages can potentially be overcome because of the complementarity of fragment ions produced. To evaluate alternating CID and ETD fragmentation, we analyzed a complex mixture of phosphopeptides on an LTQ-Orbitrap mass spectrometer. When the CID and ETD-derived spectra were searched separately, we observed 2504, 491, 2584, and 3249 phosphopeptide-spectrum matches from CID alone, ETD alone, decision tree-based CID/ETD, and alternating CID and ETD, respectively. Combining CID and ETD spectra prior to database searching should, intuitively, be superior to either method alone. However, when spectra from the alternating CID and ETD method were merged prior to database searching, we observed a reduction in the number of phosphopeptide-spectrum matches. The poorer identification rates observed after merging CID and ETD spectra are a reflection of a lack of optimized search algorithms for carrying out such searches and perhaps inherent weaknesses of this approach. Thus, although alternating CID and ETD experiments for phosphopeptide identification are desirable for increasing the confidence of identifications, merging spectra prior to database search has to be carefully evaluated further in the context of the various algorithms before adopting it as a routine strategy.

15.
Peptide identification using tandem mass spectrometry is a core technology in proteomics. Latest generations of mass spectrometry instruments enable the use of electron transfer dissociation (ETD) to complement collision induced dissociation (CID) for peptide fragmentation. However, a critical limitation to the use of ETD has been the lack of optimal database search software. Percolator is a post-search algorithm, which uses semi-supervised machine learning to improve the rate of peptide spectrum identifications (PSMs) together with providing reliable significance measures. We have previously interfaced the Mascot search engine with Percolator and demonstrated sensitivity and specificity benefits with CID data. Here, we report recent developments in the Mascot Percolator V2.0 software including an improved feature calculator and support for a wider range of ion series. The updated software is applied to the analysis of several CID and ETD fragmented peptide data sets. This version of Mascot Percolator increases the number of CID PSMs by up to 80% and ETD PSMs by up to 60% at a 0.01 q-value (1% false discovery rate) threshold over a standard Mascot search, notably recovering PSMs from high charge state precursor ions. The greatly increased number of PSMs and peptide coverage afforded by Mascot Percolator has enabled a fuller assessment of CID/ETD complementarity to be performed. Using a data set of CID and ETcaD spectral pairs, we find that at a 1% false discovery rate, the overlap in peptide identifications by CID and ETD is 83%, which is significantly higher than that obtained using either stand-alone Mascot (69%) or OMSSA (39%). We conclude that Mascot Percolator is a highly sensitive and accurate post-search algorithm for peptide identification and allows direct comparison of peptide identifications using multiple alternative fragmentation techniques.
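
The 0.01 q-value (1% false discovery rate) threshold quoted in this abstract can be illustrated with a generic target-decoy calculation. The sketch below is not the Percolator or Mascot implementation: the function name `q_values`, the `(score, is_decoy)` input layout, and the simple decoys/targets FDR estimate are all assumptions made for illustration, and tied scores are not treated specially.

```python
def q_values(psms):
    """Return one q-value per PSM, in input order.

    psms: list of (score, is_decoy) tuples, higher score = better match.
    """
    # Rank PSMs from best to worst score.
    order = sorted(range(len(psms)), key=lambda i: psms[i][0], reverse=True)

    # Estimated FDR at each rank: decoys accepted / targets accepted.
    fdrs = []
    targets = decoys = 0
    for i in order:
        if psms[i][1]:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the lowest FDR among this rank and all worse-scoring ranks.
    q = [0.0] * len(psms)
    running_min = float("inf")
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        q[order[rank]] = running_min
    return q

# Hypothetical toy data: three target hits and one decoy hit.
example = [(10.0, False), (9.0, False), (8.0, True), (7.0, False)]
accepted = [p for p, qv in zip(example, q_values(example))
            if qv <= 0.01 and not p[1]]
print(accepted)
```

Taking the running minimum makes the q-value monotone in the score, so accepting all PSMs with q ≤ 0.01 corresponds to a single score cutoff.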

16.
Ribosomal ribonucleic acid (RNA), transfer RNA and other biological or synthetic RNA polymers can contain nucleotides that have been modified by the addition of chemical groups. Traditional Sanger sequencing methods cannot establish the chemical nature and sequence of these modified-nucleotide containing oligomers. Mass spectrometry (MS) has become the conventional approach for determining the nucleotide composition, modification status and sequence of modified RNAs. Modified RNAs are analyzed by MS using collision-induced dissociation tandem mass spectrometry (CID MS/MS), which produces a complex dataset of oligomeric fragments that must be interpreted to identify and place modified nucleosides within the RNA sequence. Here we report the development of RoboOligo, an interactive software program for the robust analysis of data generated by CID MS/MS of RNA oligomers. There are three main functions of RoboOligo: (i) automated de novo sequencing via the local search paradigm; (ii) manual sequencing with real-time spectrum labeling and cumulative intensity scoring; and (iii) a hybrid approach, coined ‘variable sequencing’, which combines the user intuition of manual sequencing with the high-throughput sampling of automated de novo sequencing.

17.

Background  

High-quality MS/MS spectra of tryptic peptides often do not match any database entry because the genome is only partially sequenced; in such cases, protein identification requires de novo peptide sequencing. To achieve protein identification of the economically important but still unsequenced plant pathogenic oomycete Plasmopara halstedii, we first evaluated the performance of three different de novo peptide sequencing algorithms applied to digests of standard proteins using a quadrupole TOF (QStar Pulsar i).

18.
Collision‐activated dissociation and electron‐transfer dissociation (ETD) each produce spectra containing unique features. Though several database search algorithms (e.g. SEQUEST, MASCOT, and Open Mass Spectrometry Search Algorithm) have been modified to search ETD data, this consists chiefly of the ability to search for c‐ and z•‐ions; additional ETD‐specific features are often unaccounted for and may hinder identification. Removal of these features via spectral processing increased total search sensitivity by ~20% for both human and yeast data sets; unique peptide identifications increased by ~17% for the yeast data sets and ~16% for the human data set.

19.
The use of electron transfer dissociation (ETD) fragmentation for analysis of peptides eluting in liquid chromatography tandem mass spectrometry experiments is increasingly common and can allow identification of many peptides and proteins in complex mixtures. Peptide identification is performed through the use of search engines that attempt to match spectra to peptides from proteins in a database. However, software for the analysis of ETD fragmentation data is currently less developed than equivalent algorithms for the analysis of the more ubiquitous collision-induced dissociation fragmentation spectra. In this study, a new scoring system was developed for analysis of peptide ETD fragmentation data that varies the ion type weighting depending on the precursor ion charge state and peptide sequence. This new scoring regime was applied to the analysis of data from previously published results where four search engines (Mascot, Open Mass Spectrometry Search Algorithm (OMSSA), Spectrum Mill, and X!Tandem) were compared (Kandasamy, K., Pandey, A., and Molina, H. (2009) Evaluation of several MS/MS search algorithms for analysis of spectra derived from electron transfer dissociation experiments. Anal. Chem. 81, 7170–7180). Protein Prospector identified 80% more spectra at a 1% false discovery rate than the most successful alternative searching engine in this previous publication. These results suggest that other search engines would benefit from the application of similar rules. The recently developed fragmentation approach of electron transfer dissociation (ETD) has become a genuine alternative to the more ubiquitous collision-induced dissociation (CID) for high throughput and high sensitivity proteomic analysis (1–3). 
ETD (4) and the related fragmentation process electron capture dissociation (ECD) (5) have been demonstrated to have particular advantages for the analysis of large peptides and small proteins (6–8) as well as the analysis of peptides bearing labile post-translational modifications (9–11). The results achieved through ETD and ECD analysis have been shown to be highly complementary to those obtained through CID fragmentation analysis, both through increasing confidence in particular identifications of peptides and also by allowing identification of extra components in complex mixtures (10, 12, 13). As CID and ETD can be sequentially or alternatively performed on precursor ions in the same mass spectrometric run, it is expected that the combined use of these two fragmentation analysis techniques will become increasingly common to enable more comprehensive sample analysis. Software for analysis of CID spectra is significantly more advanced than that for ECD/ETD data. This is partly because the behavior of peptides under CID fragmentation is better characterized and understood so software has been developed that is better able to predict the fragment ions expected. The fragment ion types observed in ETD and ECD are largely known (5, 14, 15), but information about the frequency and peak intensities of the different ion types observed is less well documented. We recently performed a study to characterize how frequently the different fragment ion types are detected in ETD spectra when analyzing complex digest mixtures produced by proteolytic enzymes or chemical cleavage reagents of different sequence specificity (16). These results were analyzed with respect to precursor charge state and location of basic residues, which were both shown to be significant factors in controlling the fragment ion types observed. 
The results showed that ETD spectra of doubly charged precursor ions produced very different fragment ions depending on the location of a basic residue in the sequence. Based on this statistical analysis of ETD data from a diverse range of peptides (16), in the present study, a new scoring system was developed and implemented in the search engine Batch-Tag within Protein Prospector that adjusts the weighting for different fragment ion types based on the precursor charge state and the presence of basic amino acid residues at either peptide terminus. The results using this new scoring system were compared with the previous generation of Batch-Tag, which used ion score weightings based on the average frequency of observation of different fragment types in ETD spectra of tryptic peptides and used the same scoring irrespective of precursor charge and sequence. The performance of this new scoring was also compared with those reported by other search engines using results previously published from a large standard data set (17). The new scoring system allowed identification of significantly more spectra than achieved with the previous scoring system. It also assigned 80% more spectra than the most successful of the compared search engines when using the same false discovery rate threshold.

20.
The 4-plex iTRAQ platform was utilized to analyze the protein profiles in four stages of grapevine berry skin ripening, from pre-veraison to full ripening. Mass spectrometric data were acquired from three replicated analyses using a parallel acquisition method in an Orbitrap instrument by combining collision-induced dissociation (CID) and higher energy collision-induced dissociation (HCD) peptide ion fragmentations. As a result, the number of spectra suitable for peptide identification (either from CID or HCD) increased 5-fold in relation to those suitable for quantification (from HCD). Spectra were searched against an NCBInr protein database subset containing all the Vitis sequences, including those derived from whole genome sequencing. In general, 695 unique proteins were identified with more than one peptide, and 513 of them were quantified. The sequence annotation and GO term enrichment analysis assisted by the automatic annotation tool Blast2GO permitted a pathway analysis, which identified biological processes and metabolic pathways that are de-regulated throughout ripening. A detailed analysis of the function-related protein profiles helped discover a set of proteins of known Vitis gene origin as potential candidates to play key roles in grapevine berry quality, growth regulation and disease resistance.
