Similar literature
 20 similar documents found (search time: 546 ms)
1.
A novel computational approach, termed Search for Modified Peptides (SeMoP), for the unrestricted discovery and verification of peptide modifications in shotgun proteomic experiments using low resolution ion trap MS/MS spectra is presented. Various peptide modifications, including post-translational modifications, sequence polymorphisms, as well as sample handling-induced changes, can be identified using this approach. SeMoP utilizes a three-step strategy: (1) a standard database search to identify proteins in a sample; (2) an unrestricted search for modifications using a newly developed algorithm; and (3) a second standard database search targeted to specific modifications found using the unrestricted search. This targeted approach provides verification of discovered modifications and, due to increased sensitivity, a general increase in the number of peptides with the specific modification. The feasibility of the overall strategy has been first demonstrated in the analysis of 65 plasma proteins. Various sample handling induced modifications, such as beta-elimination of disulfide bridges and pyrocarbamidomethylation, as well as biologically induced modifications, such as phosphorylation and methylation, have been detected. A subsequent targeted Sequest search has been used to verify selected modifications, and a 4-fold increase in the number of modified peptides was obtained. In a second application, 1367 proteins of a cervical cancer cell line were processed, leading to detection of several novel amino acid substitutions. By conducting the search against a database of peptides derived from proteins with decoy sequences, a false discovery rate of less than 5% for the unrestricted search resulted. SeMoP is shown to be an effective and easily implemented approach for the discovery and verification of peptide modifications.  相似文献   
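The decoy-based error estimate mentioned at the end of this abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula SeMoP uses, so the ratio below (decoy hits divided by target hits above a score cutoff) is an assumption, and the function names `decoy_fdr` and `threshold_for_fdr` are hypothetical.

```python
def decoy_fdr(scores_target, scores_decoy, threshold):
    """Estimate FDR as (# decoy hits >= threshold) / (# target hits >= threshold).

    Assumption: this simple ratio stands in for whatever estimator the
    original tool uses; the abstract only states that decoy sequences were searched.
    """
    n_target = sum(s >= threshold for s in scores_target)
    n_decoy = sum(s >= threshold for s in scores_decoy)
    return n_decoy / n_target if n_target else 0.0


def threshold_for_fdr(scores_target, scores_decoy, max_fdr=0.05):
    """Return the lowest target score cutoff whose estimated FDR is below max_fdr."""
    for t in sorted(set(scores_target)):
        if decoy_fdr(scores_target, scores_decoy, t) < max_fdr:
            return t
    return None  # no cutoff reaches the requested FDR
```

A caller would pass the match scores from the target search and from the decoy search, then keep only matches scoring at or above the returned cutoff.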

2.
Chromatographed peptide signals form the basis of further data processing that eventually results in functional information derived from data‐dependent bottom‐up proteomics assays. We seek to rank LC/MS parent ions by the quality of their extracted ion chromatograms. Ranked extracted ion chromatograms act as an intuitive physical/chemical preselection filter to improve the quality of MS/MS fragment scans submitted for database search. We identify more than 4900 proteins when considering detector shifts of less than 7 ppm. High quality parent ions for which the database search yields no hits become candidates for subsequent unrestricted analysis for PTMs. Following this rational approach, we prioritize identification of more than 5000 spectrum matches from modified peptides and confirm the presence of acetaldehyde‐modified His/Lys. We present a logical workflow that scores data‐dependent selected ion chromatograms and leverages information about the semianalytical LC/LC dimension prior to MS. Our method can be successfully used to identify unexpected modifications in peptides with excellent chromatography characteristics, independent of fragmentation pattern and activation methods. We illustrate analysis of ion chromatograms detected in two different modes by RF linear ion trap and electrostatic field orbitrap.

3.
We describe a method to identify cross-linked peptides from complex samples and large protein sequence databases by combining isotopically tagged cross-linkers, chromatographic enrichment, targeted proteomics and a new search engine called xQuest. This software reduces the search space by an upstream candidate-peptide search before the recombination step. We showed that xQuest can identify cross-linked peptides from a total Escherichia coli lysate with an unrestricted database search.

4.
High-throughput proteomics is made possible by a combination of modern mass spectrometry instruments capable of generating many millions of tandem mass (MS(2)) spectra on a daily basis and the increasingly sophisticated associated software for their automated identification. Despite the growing accumulation of collections of identified spectra and the regular generation of MS(2) data from related peptides, the mainstream approach for peptide identification is still the nearly two decades old approach of matching one MS(2) spectrum at a time against a database of protein sequences. Moreover, database search tools overwhelmingly continue to require that users guess in advance a small set of 4-6 post-translational modifications that may be present in their data in order to avoid incurring substantial false positive and negative rates. The spectral networks paradigm for analysis of MS(2) spectra differs from the mainstream database search paradigm in three fundamental ways. First, spectral networks are based on matching spectra against other spectra instead of against protein sequences. Second, spectral networks find spectra from related peptides even before considering their possible identifications. Third, spectral networks determine consensus identifications from sets of spectra from related peptides instead of separately attempting to identify one spectrum at a time. Even though spectral networks algorithms are still in their infancy, they have already delivered the longest and most accurate de novo sequences to date, revealed a new route for the discovery of unexpected post-translational modifications and highly-modified peptides, enabled automated sequencing of cyclic non-ribosomal peptides with unknown amino acids and are now defining a novel approach for mapping the entire molecular output of biological systems that is suitable for analysis with tandem mass spectrometry. 
Here we review the current state of spectral networks algorithms and discuss possible future directions for automated interpretation of spectra from any class of molecules.

5.
Protein and peptide mass analysis and amino acid sequencing by mass spectrometry is widely used for identification and annotation of post-translational modifications (PTMs) in proteins. Modification-specific mass increments, neutral losses or diagnostic fragment ions in peptide mass spectra provide direct evidence for the presence of post-translational modifications, such as phosphorylation, acetylation, methylation or glycosylation. However, the commonly used database search engines are not always practical for exhaustive searches for multiple modifications and concomitant missed proteolytic cleavage sites in large-scale proteomic datasets, since the search space is dramatically expanded. We present a formal definition of the problem of searching databases with tandem mass spectra of peptides that are partially (sub-stoichiometrically) modified. In addition, an improved search algorithm and peptide scoring scheme that includes modification specific ion information from MS/MS spectra was implemented and tested using the Virtual Expert Mass Spectrometrist (VEMS) software. A set of 2825 peptide MS/MS spectra were searched with 16 variable modifications and 6 missed cleavages. The scoring scheme returned a large set of post-translationally modified peptides including precise information on modification type and position. The scoring scheme was able to extract and distinguish the near-isobaric modifications of trimethylation and acetylation of lysine residues based on the presence and absence of diagnostic neutral losses and immonium ions. In addition, the VEMS software contains a range of new features for analysis of mass spectrometry data obtained in large-scale proteomic experiments. Windows binaries are available at http://www.yass.sdu.dk/.  相似文献   

6.
Mass spectrometry (MS) analysis of peptides carrying post‐translational modifications is challenging due to the instability of some modifications during MS analysis. However, glycopeptides as well as acetylated, methylated and other modified peptides release specific fragment ions during CID (collision‐induced dissociation) and HCD (higher energy collisional dissociation) fragmentation. These fragment ions can be used to validate the presence of the PTM on the peptide. Here, we present PTM MarkerFinder, a software tool that takes advantage of such marker ions. PTM MarkerFinder screens the MS/MS spectra in the output of a database search (i.e., Mascot) for marker ions specific for selected PTMs. Moreover, it reports and annotates the HCD and the corresponding electron transfer dissociation (ETD) spectrum (when present), and summarizes information on the type, number, and ratios of marker ions found in the data set. In the present work, a sample containing enriched N‐acetylhexosamine (HexNAc) glycopeptides from yeast has been analyzed by liquid chromatography‐mass spectrometry on an LTQ Orbitrap Velos using both HCD and ETD fragmentation techniques. The identification result (Mascot .dat file) was submitted as input to PTM MarkerFinder and screened for HexNAc oxonium ions. The software output has been used for high‐throughput validation of the identification results.  相似文献   

7.
Protein identification has been greatly facilitated by database searches against protein sequences derived from product ion spectra of peptides. This approach is primarily based on the use of fragment ion mass information contained in a MS/MS spectrum. Unambiguous protein identification from a spectrum with low sequence coverage or poor spectral quality can be a major challenge. We present a two-dimensional (2D) mass spectrometric method in which the numbers of nitrogen atoms in the molecular ion and the fragment ions are used to provide additional discriminating power for much improved protein identification and de novo peptide sequencing. The nitrogen number is determined by analyzing the mass difference of corresponding peak pairs in overlaid spectra of (15)N-labeled and unlabeled peptides. These peptides are produced by enzymatic or chemical cleavage of proteins from cells grown in (15)N-enriched and normal media, respectively. It is demonstrated that, using 2D information, i.e., m/z and its associated nitrogen number, this method can, not only confirm protein identification results generated by MS/MS database searching, but also identify peptides that are not possible to identify by database searching alone. Examples are presented of analyzing Escherichia coli K12 extracts that yielded relatively poor MS/MS spectra, presumably from the digests of low abundance proteins, which can still give positive protein identification using this method. Additionally, this 2D MS method can facilitate spectral interpretation for de novo peptide sequencing and identification of posttranslational or other chemical modifications. We envision that this method should be particularly useful for proteome expression profiling of organelles or cells that can be grown in (15)N-enriched media.  相似文献   
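The nitrogen-counting step described in this abstract comes down to one division: the mass difference between a labeled and an unlabeled peak, divided by the mass difference between (15)N and (14)N. The sketch below assumes complete (15)N labeling and a known charge state; the function name `nitrogen_count` is hypothetical and not taken from the paper.

```python
# Monoisotopic mass difference between 15N and 14N in Da (about 0.997035).
N15_MINUS_N14 = 15.0001088989 - 14.0030740044


def nitrogen_count(mz_unlabeled, mz_labeled, charge=1):
    """Infer the number of nitrogen atoms from a 14N/15N peak pair.

    Assumptions: labeling is complete, both peaks carry the same charge,
    and the pair has already been matched in the overlaid spectra.
    """
    delta_mass = (mz_labeled - mz_unlabeled) * charge  # undo the m/z scaling
    return round(delta_mass / N15_MINUS_N14)
```

For a doubly charged ion the observed m/z spacing is half the mass difference, which is why the charge is multiplied back in before dividing.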

8.
MS2 library spectra are rich in reproducible information about peptide fragmentation patterns compared to theoretical spectra modeled by a sequence search tool. So far, spectrum library searches are mostly applied to detect peptides as they are present in the library. However, they also allow finding modified variants of the library peptides if the search is done with a large precursor mass window and an adapted Spectrum-Spectrum Match (SSM) scoring algorithm. We perform a thorough evaluation on the use of library spectra as opposed to theoretical peptide spectra for the identification of PTMs, analyzing spectra of a well-annotated modification-rich test data set compiled from public data repositories. These initial studies motivate the development of our modification tolerant spectrum library search tool QuickMod, designed to identify modified variants of the peptides listed in the spectrum library without any prior input from the user estimating the modifications present in the sample. We built the search algorithm of QuickMod after carefully testing different SSM similarity scores. The final spectrum scoring scheme uses a support vector machine (SVM) on a selection of scoring features to classify correct and incorrect SSM. After identification of a list of modified peptides at a given False Discovery Rate (FDR), the modifications need to be positioned on the peptide sequence. We present a rapid modification site assignment algorithm and evaluate its positioning accuracy. Finally, we demonstrate that QuickMod performs favorably in terms of speed and identification rate when compared to other software solutions for PTM analysis.  相似文献   

9.
The discovery of unanticipated protein modifications is one of the most challenging problems in proteomics. Whereas widely used algorithms such as Sequest and Mascot enable mapping of modifications when the mass and amino acid specificity are known, unexpected modifications cannot be identified with these tools. We have developed an algorithm and software called P-Mod, which enables discovery and sequence mapping of modifications to target proteins known to be represented in the analysis or identified by Sequest. P-Mod matches MS/MS spectra to peptide sequences in a search list. For spectra of modified peptides, P-Mod calculates mass differences between search peptide sequences and MS/MS precursors and localizes the mass shift to a sequence position in the peptide. Because modifications are detected as mass shifts, P-Mod does not require the user to guess at masses or sequence locations of modifications. P-Mod uses extreme value statistics to assign p value estimates to sequence-to-spectrum matches. The reported p values are scaled to account for the number of comparisons, so that error rates do not increase with the expanded search lists that result from incorporating potential peptide modifications. Combination of P-Mod searches from multiple LC-MS/MS analyses and multiple samples revealed previously unreported BSA modifications, including a novel decarboxymethylation or D-->G substitution at position 579 of the protein. P-Mod can serve a unique role in the identification of protein modifications both from exogenous and endogenous sources and may be useful for identifying modified protein forms as biomarkers for toxicity and disease processes.  相似文献   
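The two operations named in this abstract, computing the precursor-minus-peptide mass difference and localizing it to a sequence position, can be sketched as follows. This is an illustration under stated assumptions, not P-Mod's algorithm: it scores each candidate position by counting matched singly charged b and y ions, whereas P-Mod uses its own scoring with extreme value statistics. All function names are hypothetical, and the residue table covers only the 20 standard amino acids.

```python
# Monoisotopic residue masses in Da for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[a] for a in seq) + WATER


def fragment_mzs(seq, shift=0.0, position=None):
    """Singly charged b and y ions, with `shift` added at residue index `position`."""
    masses = [RESIDUE_MASS[a] for a in seq]
    if position is not None:
        masses[position] += shift
    ions = []
    for i in range(1, len(seq)):
        ions.append(sum(masses[:i]) + PROTON)          # b ion with i residues
        ions.append(sum(masses[i:]) + WATER + PROTON)  # complementary y ion
    return ions


def localize_shift(seq, precursor_mass, peaks, tol=0.01):
    """Return (mass shift, best residue index) for one candidate peptide.

    Assumption: the position that explains the most peaks wins; ties go to
    the lowest index. Real tools use more careful scoring.
    """
    shift = precursor_mass - peptide_mass(seq)

    def matched(position):
        theoretical = fragment_mzs(seq, shift, position)
        return sum(any(abs(t - p) <= tol for p in peaks) for t in theoretical)

    best = max(range(len(seq)), key=matched)
    return shift, best
```

Because the shift is taken directly from the precursor mass, no list of expected modifications is needed, which is the property the abstract emphasizes.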

10.
Protein phosphorylation, one of the most important protein post-translational modifications, is involved in various biological processes, and the identification of phosphorylated peptides (phosphopeptides) and their corresponding phosphorylation sites (phosphosites) will facilitate the understanding of the molecular mechanism and function of phosphorylation. Mass spectrometry (MS) provides a high-throughput technology that enables the identification of large numbers of phosphosites. PhoPepMass is designed to assist human phosphopeptide identification from MS data based on a specific database of phosphopeptide masses and a multivariate hypergeometric matching algorithm. It contains 244,915 phosphosites from several public sources. Moreover, the accurate masses of peptides and fragments with phosphosites were calculated. It is the first database that provides a systematic resource for the query of phosphosites on peptides and their corresponding masses. This allows researchers to search certain proteins for which phosphosites have been reported, to browse detailed phosphopeptide and fragment information, to match masses from MS analyses with a defined threshold to the corresponding phosphopeptide, and to compare proprietary phosphopeptide discovery results with results from previous studies. Additionally, database search software has been created and a “two-stage search strategy” is suggested to identify phosphopeptides from tandem mass spectra of proteomics data. We expect PhoPepMass to be a useful tool and a source of reference for proteomics researchers. PhoPepMass is available at https://www.scbit.org/phopepmass/index.html.
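Matching an observed mass to precomputed phosphopeptide masses within a defined threshold can be sketched as a sorted lookup. This is a minimal illustration, not the PhoPepMass implementation: the ppm tolerance, the `(mass, label)` library layout, and the function name `match_mass` are all assumptions, and the multivariate hypergeometric scoring mentioned in the abstract is not reproduced here.

```python
from bisect import bisect_left, bisect_right


def match_mass(observed, library, ppm=10.0):
    """Return labels of library entries within `ppm` of the observed mass.

    `library` is a hypothetical list of (mass, label) tuples sorted by mass.
    """
    tol = observed * ppm / 1e6                  # ppm tolerance converted to Da
    masses = [m for m, _ in library]
    lo = bisect_left(masses, observed - tol)    # first entry inside the window
    hi = bisect_right(masses, observed + tol)   # one past the last entry inside
    return [label for _, label in library[lo:hi]]
```

Binary search keeps each query logarithmic in the library size once the mass list is built; for repeated queries the `masses` list would be computed once outside the function.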

11.
Informatics for protein identification by mass spectrometry   Cited by: 3 (self-citations: 0, citations by others: 3)
High throughput protein analysis (i.e., proteomics) first became possible when sensitive peptide mass mapping techniques were developed, thereby allowing for the possibility of identifying and cataloging most 2D gel electrophoresis spots. Shortly thereafter a few groups pioneered the idea of identifying proteins by using peptide tandem mass spectra to search protein sequence databases. Hence, it became possible to identify proteins from very complex mixtures. One drawback to these latter techniques is that it is not entirely straightforward to make matches using tandem mass spectra of peptides that are modified or have sequences that differ slightly from what is present in the sequence database that is being searched. This has been part of the motivation behind automated de novo sequencing programs that attempt to derive a peptide sequence regardless of its presence in a sequence database. The sequence candidates thus generated are then subjected to homology-based database search programs (e.g., BLAST or FASTA). These homology search programs, however, were not developed with mass spectrometry in mind, and it became necessary to make minor modifications such that mass spectrometric ambiguities can be taken into account when comparing query and database sequences. Finally, this review will discuss the important issue of validating protein identifications. All of the search programs will produce a top ranked answer; however, only the credulous are willing to accept them carte blanche.  相似文献   

12.
The Paragon Algorithm, a novel database search engine for the identification of peptides from tandem mass spectrometry data, is presented. Sequence Temperature Values are computed using a sequence tag algorithm, allowing the degree of implication by an MS/MS spectrum of each region of a database to be determined on a continuum. Counter to conventional approaches, features such as modifications, substitutions, and cleavage events are modeled with probabilities rather than by discrete user-controlled settings to consider or not consider a feature. The use of feature probabilities in conjunction with Sequence Temperature Values allows for a very large increase in the effective search space with only a very small increase in the actual number of hypotheses that must be scored. The algorithm has a new kind of user interface that removes the user expertise requirement, presenting control settings in the language of the laboratory that are translated to optimal algorithmic settings. To validate this new algorithm, a comparison with Mascot is presented for a series of analogous searches to explore the relative impact of increasing search space probed with Mascot by relaxing the tryptic digestion conformance requirements from trypsin to semitrypsin to no enzyme and with the Paragon Algorithm using its Rapid mode and Thorough mode with and without tryptic specificity. Although they performed similarly for small search space, dramatic differences were observed in large search space. With the Paragon Algorithm, hundreds of biological and artifact modifications, all possible substitutions, and all levels of conformance to the expected digestion pattern can be searched in a single search step, yet the typical cost in search time is only 2-5 times that of conventional small search space. Despite this large increase in effective search space, there is no drastic loss of discrimination that typically accompanies the exploration of large search space.  相似文献   

13.
While tandem mass spectrometry (MS/MS) is routinely used to identify proteins from complex mixtures, certain types of proteins present unique challenges for MS/MS analyses. The major wheat gluten proteins, gliadins and glutenins, are particularly difficult to distinguish by MS/MS. Each of these groups contains many individual proteins with similar sequences that include repetitive motifs rich in proline and glutamine. These proteins have few cleavable tryptic sites, often resulting in only one or two tryptic peptides that may not provide sufficient information for identification. Additionally, there are less than 14,000 complete protein sequences from wheat in the current NCBInr release. In this paper, MS/MS methods were optimized for the identification of the wheat gluten proteins. Chymotrypsin and thermolysin as well as trypsin were used to digest the proteins and the collision energy was adjusted to improve fragmentation of chymotryptic and thermolytic peptides. Specialized databases were constructed that included protein sequences derived from contigs from several assemblies of wheat expressed sequence tags (ESTs), including contigs assembled from ESTs of the cultivar under study. Two different search algorithms were used to interrogate the database and the results were analyzed and displayed using a commercially available software package (Scaffold). We examined the effect of protein database content and size on the false discovery rate. We found that as database size increased above 30,000 sequences there was a decrease in the number of proteins identified. Also, the type of decoy database influenced the number of proteins identified. Using three enzymes, two search algorithms and a specialized database allowed us to greatly increase the number of detected peptides and distinguish proteins within each gluten protein group.  相似文献   

14.
Mass spectrometric analyses of protein digests produce large numbers of fragmentation spectra that are not identified by routine database searching strategies. Some of these spectra could be identified by development of improved search engines. However, many of these spectra represent fragmentation of peptide components bearing modifications that are not routinely considered in database searches. Here we present new software within Protein Prospector that allows comprehensive analysis of data sets by analyzing the data at increasing levels of depth. Analysis of published data sets is presented to illustrate that the software is not biased to any instrument types. The results show that these data sets contain many modified peptides. As well as searching for known modification types, Protein Prospector permits the detection and identification of unexpected or novel modifications by searching for any mass shift within a user-specified mass range to any chosen amino acid(s). Several modifications never previously reported in proteomics data were identified in these standard data sets using this mass modification searching approach.  相似文献   

15.
16.
In high-throughput proteomics the development of computational methods and novel experimental strategies often rely on each other. In certain areas, mass spectrometry methods for data acquisition are ahead of computational methods to interpret the resulting tandem mass spectra. Particularly, although there are numerous situations in which a mixture tandem mass spectrum can contain fragment ions from two or more peptides, nearly all database search tools still make the assumption that each tandem mass spectrum comes from one peptide. Common examples include mixture spectra from co-eluting peptides in complex samples, spectra generated from data-independent acquisition methods, and spectra from peptides with complex post-translational modifications. We propose a new database search tool (MixDB) that is able to identify mixture tandem mass spectra from more than one peptide. We show that peptides can be reliably identified with up to 95% accuracy from mixture spectra while considering only 0.01% of all possible peptide pairs (four orders of magnitude speedup). Comparison with current database search methods indicates that our approach has better or comparable sensitivity and precision at identifying single-peptide spectra while simultaneously being able to identify 38% more peptides from mixture spectra at significantly higher precision.

17.
Many software tools have been developed for the automated identification of peptides from tandem mass spectra. The accuracy and sensitivity of the identification software via database search are critical for successful proteomics experiments. A new database search tool, PEAKS DB, has been developed by incorporating the de novo sequencing results into the database search. PEAKS DB achieves significantly improved accuracy and sensitivity over two other commonly used software packages. Additionally, a new result validation method, decoy fusion, has been introduced to solve the issue of overconfidence that exists in the conventional target decoy method for certain types of peptide identification software.

18.
LC-MS/MS has demonstrated potential for detecting plant pathogens. Unlike PCR or ELISA, LC-MS/MS does not require pathogen-specific reagents for the detection of pathogen-specific proteins and peptides. However, the MS/MS approach we and others have explored does require a protein sequence reference database and database-search software to interpret tandem mass spectra. To evaluate the limitations of database composition on pathogen identification, we analyzed proteins from cultured Ustilago maydis, Phytophthora sojae, Fusarium graminearum, and Rhizoctonia solani by LC-MS/MS. When the search database did not contain sequences for a target pathogen, or contained sequences to related pathogens, target pathogen spectra were reliably matched to protein sequences from nontarget organisms, giving an illusion that proteins from nontarget organisms were identified. Our analysis demonstrates that when database-search software is used as part of the identification process, a paradox exists whereby additional sequences needed to detect a wide variety of possible organisms may lead to more cross-species protein matches and misidentification of pathogens.  相似文献   

19.
Zhang N, Li XJ, Ye M, Pan S, Schwikowski B, Aebersold R. Proteomics, 2005, 5(16): 4096-4106
In MS/MS experiments with automated precursor ion selection, only a fraction of sequencing attempts leads to the successful identification of a peptide. A number of reasons may contribute to this situation. They include poor fragmentation of the selected precursor ion, the presence of modified residues in the peptide, mismatches with sequence databases, and frequently, the concurrent fragmentation of multiple precursors in the same CID attempt. Current database search engines are incapable of correctly assigning the sequences of multiple precursors to such spectra. We have developed a search engine, ProbIDtree, which can identify multiple peptides from a CID spectrum generated by the concurrent fragmentation of multiple precursor ions. This is achieved by iterative database searching in which the submitted spectra are generated by subtracting the fragment ions assigned to a tentatively matched peptide from the acquired spectrum and in which each match is assigned a tentative probability score. Tentatively matched peptides are organized in a tree structure from which their adjusted probability scores are calculated and used to determine the correct identifications. The results using MALDI-TOF-TOF MS/MS data demonstrate that multiple peptides can be effectively identified simultaneously with high confidence using ProbIDtree.

20.
A database of high-mass accuracy tryptic peptides has been created. The database contains 15 897 unique, annotated MS/MS spectra. It is possible to search for peptides according to their mass, number of missed cleavages, and sequence motifs. All of the data contained in the database is downloadable, and each spectrum can be visualized. An example is presented of how the database can be used for studying peptide fragmentation. Fragmentation of different types of missed cleaved peptides has been studied, and the results can be used to improve identification of these types of peptides.
