The discovery of many noncanonical peptides detectable with sensitive mass spectrometry inside, outside, and on cells shepherded the development of novel methods for their identification, often not supported by a systematic benchmarking with other methods. We here propose iBench, a bioinformatic tool that can construct ground truth proteomics datasets and cognate databases, thereby generating a training court wherein methods, search engines, and proteomics strategies can be tested, and their performances estimated by the same tool. iBench can be coupled to the main database search engines, allows the selection of customized features of mass spectrometry spectra and peptides, provides standard benchmarking outputs, and is open source. The proof-of-concept application to tryptic proteome digestions, immunopeptidomes, and synthetic peptide libraries dissected the impact that noncanonical peptides could have on the identification of canonical peptides by Mascot search with rescoring via Percolator (Mascot+Percolator).  相似文献   

Many software tools have been developed for the automated identification of peptides from tandem mass spectra. The accuracy and sensitivity of the identification software via database search are critical for successful proteomics experiments. A new database search tool, PEAKS DB, has been developed by incorporating the de novo sequencing results into the database search. PEAKS DB achieves significantly improved accuracy and sensitivity over two other commonly used software packages. Additionally, a new result validation method, decoy fusion, has been introduced to solve the issue of overconfidence that exists in the conventional target decoy method for certain types of peptide identification software.  相似文献   

Rescoring of mass spectrometry (MS) search results using spectral predictors can strongly increase peptide spectrum match (PSM) identification rates. This approach is particularly effective when aiming to search MS data against large databases, for example, when dealing with nonspecific cleavage in immunopeptidomics or inflation of the reference database for noncanonical peptide identification. Here, we present inSPIRE (in silico Spectral Predictor Informed REscoring), a flexible and performant open-source rescoring pipeline built on Prosit MS spectral prediction, which is compatible with common database search engines. inSPIRE allows large-scale rescoring with data from multiple MS search files, increases sensitivity to minor differences in amino acid residue position, and can be applied to various MS sample types, including tryptic proteome digestions and immunopeptidomes. inSPIRE boosts PSM identification rates in immunopeptidomics, leading to better performance than the original Prosit rescoring pipeline, as confirmed by benchmarking of inSPIRE performance on ground truth datasets. The integration of various features in the inSPIRE backbone further boosts the PSM identification in immunopeptidomics, with a potential benefit for the identification of noncanonical peptides.  相似文献   

MS/MS and database searching has emerged as a valuable technology for rapidly analyzing protein expression, localization, and post-translational modifications. The probability-based search engine Mascot has found widespread use as a tool to correlate tandem mass spectra with peptides in a sequence database. Although the Mascot scoring algorithm provides a probability-based model for peptide identification, the independent peptide scores do not correlate with the significance of the proteins to which they match. Herein, we describe a heuristic method for organizing proteins identified at a specified false-discovery rate using Mascot-matched peptides. We call this method PROVALT, and it uses peptide matches from a random database to calculate false-discovery rates for protein identifications and reduces a complex list of peptide matches to a nonredundant list of homologous protein groups. This method was evaluated using Mascot-identified peptides from a Trypanosoma cruzi epimastigote whole-cell lysate, which was separated by multidimensional LC and analyzed by MS/MS. PROVALT was then compared with the two traditional methods of protein identification when using Mascot, the single peptide score and cumulative protein score methods, and was shown to be superior to both in regards to the number of proteins identified and the inclusion of lower scoring nonrandom peptide matches.  相似文献   

Here we report on a novel peptide library based method for HLA class II binding motif identification. The approach is based on water soluble HLA class II molecules and soluble dedicated peptide libraries. A high number of different synthetic peptides are competing to interact with a limited amount of HLA molecules, giving a selective force in the binding. The peptide libraries can be designed so that the sequence length, the alignment of binding registers, the numbers and composition of random positions are controlled, and also modified amino acids can be included. Selected library peptides bound to HLA are then isolated by size exclusion chromatography and sequenced by tandem mass spectrometry online coupled to liquid chromatography. The MS/MS data are subsequently searched against a library defined database using a search engine such as Mascot, followed by manual inspection of the results. We used two dodecamer and two decamer peptide libraries and HLA-DQ2.5 to test possibilities and limits of this method. The selected sequences which we identified in the fraction eluted from HLA-DQ2.5 showed a higher average of their predicted binding affinity values compared to the original peptide library. The eluted sequences fit very well with the previously described HLA-DQ2.5 peptide binding motif. This novel method, limited by library complexity and sensitivity of mass spectrometry, allows the analysis of several thousand synthetic sequences concomitantly in a simple water soluble format.  相似文献   

De novo interpretation of tandem mass spectrometry (MS/MS) spectra provides sequences for searching protein databases when limited sequence information is present in the database. Our objective was to define a strategy for this type of homology-tolerant database search. Homology searches, using MS-Homology software, were conducted with 20, 10, or 5 of the most abundant peptides from 9 proteins, based either on precursor trigger intensity or on total ion current, and allowing for 50%, 30%, or 10% mismatch in the search. Protein scores were corrected by subtracting a threshold score that was calculated from random peptides. The highest (p < .01) corrected protein scores (i.e., above the threshold) were obtained by submitting 20 peptides and allowing 30% mismatch. Using these criteria, protein identification based on ion mass searching using MS/MS data (i.e., Mascot) was compared with that obtained using homology search. The highest-ranking protein was the same using Mascot, homology search using the 20 most intense peptides, or homology search using all peptides, for 63.4% of 112 spots from two-dimensional polyacrylamide gel electrophoresis gels. For these proteins, the percent coverage was greatest using Mascot compared with the use of all or just the 20 most intense peptides in a homology search (25.1%, 18.3%, and 10.6%, respectively). Finally, 35% of de novo sequences completely matched the corresponding known amino acid sequence of the matching peptide. This percentage increased when the search was limited to the 20 most intense peptides (44.0%). After identifying the protein using MS-Homology, a peptide mass search may increase the percent coverage of the protein identified.  相似文献   

Mass spectrometry (MS) analysis of peptides carrying post‐translational modifications is challenging due to the instability of some modifications during MS analysis. However, glycopeptides as well as acetylated, methylated and other modified peptides release specific fragment ions during CID (collision‐induced dissociation) and HCD (higher energy collisional dissociation) fragmentation. These fragment ions can be used to validate the presence of the PTM on the peptide. Here, we present PTM MarkerFinder, a software tool that takes advantage of such marker ions. PTM MarkerFinder screens the MS/MS spectra in the output of a database search (i.e., Mascot) for marker ions specific for selected PTMs. Moreover, it reports and annotates the HCD and the corresponding electron transfer dissociation (ETD) spectrum (when present), and summarizes information on the type, number, and ratios of marker ions found in the data set. In the present work, a sample containing enriched N‐acetylhexosamine (HexNAc) glycopeptides from yeast has been analyzed by liquid chromatography‐mass spectrometry on an LTQ Orbitrap Velos using both HCD and ETD fragmentation techniques. The identification result (Mascot .dat file) was submitted as input to PTM MarkerFinder and screened for HexNAc oxonium ions. The software output has been used for high‐throughput validation of the identification results.  相似文献   
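The marker-ion screening described in this abstract can be illustrated with a minimal sketch. It is not PTM MarkerFinder's code: the function name, the peak-list layout, and the 10 ppm tolerance are assumptions made for illustration. The only marker used is the HexNAc oxonium ion at m/z 204.0867; other markers would have to be supplied by the user.

```python
# Hypothetical sketch of marker-ion screening; not the PTM MarkerFinder implementation.
HEXNAC_OXONIUM_MZ = 204.0867  # singly charged HexNAc oxonium ion


def find_marker_ions(peaks, marker_mzs, tol_ppm=10.0):
    """Return {marker m/z: summed intensity} for markers found in one MS/MS spectrum.

    peaks      -- list of (mz, intensity) tuples (assumed layout)
    marker_mzs -- iterable of marker-ion m/z values to look for
    tol_ppm    -- mass tolerance in parts per million (assumed default)
    """
    hits = {}
    for marker in marker_mzs:
        tol = marker * tol_ppm * 1e-6  # absolute tolerance in m/z units
        matched = [inten for mz, inten in peaks if abs(mz - marker) <= tol]
        if matched:
            hits[marker] = sum(matched)
    return hits


if __name__ == "__main__":
    spectrum = [(204.0869, 1000.0), (366.1400, 50.0), (500.0000, 10.0)]
    print(find_marker_ions(spectrum, [HEXNAC_OXONIUM_MZ]))
```

A spectrum for which the returned dictionary is empty carries no evidence for the marker at the chosen tolerance, which is the kind of signal that could be used to flag an identification for closer inspection.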

Peptide mass fingerprinting (PMF) is a valuable method for rapid and high-throughput protein identification using the proteomics approach. Automated search engines, such as MS-Fit, Mascot, ProFound, and PeptIdent, have facilitated protein identification through PMF. The potential to obtain a true MS protein identification result depends on the choice of algorithm as well as experimental factors that influence the information content in MS data. When mass spectral data are incomplete and/or have low mass accuracy, the “number of matches” approach may be inadequate for a useful identification. Several studies have evaluated factors influencing the quality of mass spectrometry (MS) experiments. Missed cleavages, posttranslational modifications of peptides, and contaminants (e.g., keratin) are important factors that can affect the results of MS analyses by influencing the identification process as well as the quality of the MS spectra. We compared search engines frequently used to identify proteins from Homo sapiens and Halobacterium salinarum by evaluating factors, including database and mass tolerance, to develop an improved search engine for PMF. This study may provide information to help develop a more effective algorithm for protein identification in each species through PMF.

Database search programs are essential tools for identifying peptides via mass spectrometry (MS) in shotgun proteomics. Simultaneously achieving high sensitivity and high specificity during a database search is crucial for improving proteome coverage. Here we present JUMP, a new hybrid database search program that generates amino acid tags and ranks peptide spectrum matches (PSMs) by an integrated score from the tags and pattern matching. In a typical run of liquid chromatography coupled with high-resolution tandem MS, more than 95% of MS/MS spectra can generate at least one tag, whereas the remaining spectra are usually too poor to derive genuine PSMs. To enhance search sensitivity, the JUMP program enables the use of tags as short as one amino acid. Using a target-decoy strategy, we compared JUMP with other programs (e.g. SEQUEST, Mascot, PEAKS DB, and InsPecT) in the analysis of multiple datasets and found that JUMP outperformed these preexisting programs. JUMP also permitted the analysis of multiple co-fragmented peptides from “mixture spectra” to further increase PSMs. In addition, JUMP-derived tags allowed partial de novo sequencing and facilitated the unambiguous assignment of modified residues. In summary, JUMP is an effective database search algorithm complementary to current search programs.
Peptide identification by tandem mass spectra is a critical step in mass spectrometry (MS)-based proteomics (1). Numerous computational algorithms and software tools have been developed for this purpose (2–6). These algorithms can be classified into three categories: (i) pattern-based database search, (ii) de novo sequencing, and (iii) hybrid search that combines database search and de novo sequencing. With the continuous development of high-performance liquid chromatography and high-resolution mass spectrometers, it is now possible to analyze almost all protein components in mammalian cells (7). In contrast to rapid data collection, it remains a challenge to extract accurate information from the raw data to identify peptides with low false positive rates (specificity) and minimal false negatives (sensitivity) (8).
Database search methods usually assign peptide sequences by comparing MS/MS spectra to theoretical peptide spectra predicted from a protein database, as exemplified in SEQUEST (9), Mascot (10), OMSSA (11), X!Tandem (12), Spectrum Mill (13), ProteinProspector (14), MyriMatch (15), Crux (16), MS-GFDB (17), Andromeda (18), BaMS2 (19), and Morpheus (20). Some other programs, such as SpectraST (21) and Pepitome (22), utilize a spectral library composed of experimentally identified and validated MS/MS spectra. These methods use a variety of scoring algorithms to rank potential peptide spectrum matches (PSMs) and select the top hit as a putative PSM. However, not all PSMs are correctly assigned. For example, false peptides may be assigned to MS/MS spectra with numerous noisy peaks and poor fragmentation patterns. If the samples contain unknown protein modifications, mutations, and contaminants, the related MS/MS spectra also result in false positives, as their corresponding peptides are not in the database. Other false positives may be generated simply by random matches. Therefore, it is of importance to remove these false PSMs to improve dataset quality. One common approach is to filter putative PSMs to achieve a final list with a predefined false discovery rate (FDR) via a target-decoy strategy, in which decoy proteins are merged with target proteins in the same database for estimating false PSMs (23–26). However, the true and false PSMs are not always distinguishable based on matching scores. It is a problem to set up an appropriate score threshold to achieve maximal sensitivity and high specificity (13, 27, 28).
De novo methods, including Lutefisk (29), PEAKS (30), NovoHMM (31), PepNovo (32), pNovo (33), Vonovo (34), and UniNovo (35), identify peptide sequences directly from MS/MS spectra. These methods can be used to derive novel peptides and post-translational modifications without a database, which is useful, especially when the related genome is not sequenced. High-resolution MS/MS spectra greatly facilitate the generation of peptide sequences in these de novo methods. However, because MS/MS fragmentation cannot always produce all predicted product ions, only a portion of collected MS/MS spectra have sufficient quality to extract partial or full peptide sequences, leading to lower sensitivity than achieved with the database search methods.
To improve the sensitivity of the de novo methods, a hybrid approach has been proposed to integrate peptide sequence tags into PSM scoring during database searches (36). Numerous software packages have been developed, such as GutenTag (37), InsPecT (38), Byonic (39), DirecTag (40), and PEAKS DB (41). These methods use peptide tag sequences to filter a protein database, followed by error-tolerant database searching. One restriction in most of these algorithms is the requirement of a minimum tag length of three amino acids for matching protein sequences in the database. This restriction reduces the sensitivity of the database search, because it filters out some high-quality spectra in which consecutive tags cannot be generated.
In this paper, we describe JUMP, a novel tag-based hybrid algorithm for peptide identification. The program is optimized to balance sensitivity and specificity during tag derivation and MS/MS pattern matching. JUMP can use all potential sequence tags, including tags consisting of only one amino acid. When we compared its performance to that of two widely used search algorithms, SEQUEST and Mascot, JUMP identified ∼30% more PSMs at the same FDR threshold. In addition, the program provides two additional features: (i) using tag sequences to improve modification site assignment, and (ii) analyzing co-fragmented peptides from mixture MS/MS spectra.
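The target-decoy filtering mentioned in this entry can be sketched as follows. This is a generic illustration under stated assumptions, not the procedure used by JUMP or any other program named here: the `PSM` class, the `filter_at_fdr` function, and the simple decoys/targets estimator are hypothetical choices, and score ties are not treated specially.

```python
# Hypothetical sketch of target-decoy FDR filtering on a concatenated search.
from dataclasses import dataclass


@dataclass
class PSM:
    spectrum_id: str
    score: float    # assumed: higher score means a better match
    is_decoy: bool  # True if the matched peptide comes from the decoy database


def filter_at_fdr(psms, fdr_threshold=0.01):
    """Return target PSMs from the largest score-ranked prefix whose estimated FDR
    (decoys / targets) does not exceed fdr_threshold."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    targets = decoys = 0
    keep = 0  # length of the accepted prefix of the ranked list
    for i, psm in enumerate(ranked, start=1):
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        if decoys / max(targets, 1) <= fdr_threshold:
            keep = i
    return [p for p in ranked[:keep] if not p.is_decoy]


if __name__ == "__main__":
    demo = [
        PSM("s1", 10.0, False),
        PSM("s2", 9.0, False),
        PSM("s3", 8.0, True),
        PSM("s4", 7.0, False),
        PSM("s5", 6.0, False),
    ]
    print([p.spectrum_id for p in filter_at_fdr(demo, fdr_threshold=0.01)])
```

Scanning the whole ranked list and keeping the longest acceptable prefix, rather than stopping at the first decoy, is a design choice of this sketch; it mirrors the passage's point that the score threshold trades sensitivity against specificity.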

A common problem encountered when performing large‐scale MS proteome analysis is the loss of information due to the high percentage of unassigned spectra. To determine the causes behind this loss we have analyzed the proteome of one of the smallest living bacteria that can be grown axenically, Mycoplasma pneumoniae (729 ORFs). The proteome of M. pneumoniae cells, grown in defined media, was analyzed by MS. An initial search with both Mascot and a species‐specific NCBInr database with common contaminants (NCBImpn), resulted in around 79% of the acquired spectra not having an assignment. The percentage of non‐assigned spectra was reduced to 27% after re‐analysis of the data with the PEAKS software, thereby increasing the proteome coverage of M. pneumoniae from the initial 60% to over 76%. Nonetheless, 33 413 spectra with assigned amino acid sequences could not be mapped to any NCBInr database protein sequence. Approximately, 1% of these unassigned peptides corresponded to PTMs and 4% to M. pneumoniae protein variants (deamidation and translation inaccuracies). The most abundant peptide sequence variants (Phe‐Tyr and Ala‐Ser) could be explained by alterations in the editing capacity of the corresponding tRNA synthases. About another 1% of the peptides not associated to any protein had repetitions of the same aromatic/hydrophobic amino acid at the N‐terminus, or had Arg/Lys at the C‐terminus. Thus, in a model system, we have maximized the number of assigned spectra to 73% (51 453 out of the 70 040 initial acquired spectra). All MS data have been deposited in the ProteomeXchange with identifier PXD002779 ( http://proteomecentral.proteomexchange.org/dataset/PXD002779 ).  相似文献   

Proteome identification using peptide-centric proteomics techniques is a routinely used analysis technique. One of the most powerful and popular methods for the identification of peptides from MS/MS spectra is protein database matching using search engines. Significance thresholding through false discovery rate (FDR) estimation by target/decoy searches is used to ensure the retention of predominantly confident assignments of MS/MS spectra to peptides. However, shortcomings have become apparent when such decoy searches are used to estimate the FDR. To study these shortcomings, we here introduce a novel kind of decoy database that contains isobaric mutated versions of the peptides that were identified in the original search. Because of the supervised way in which the entrapment sequences are generated, we call this a directed decoy database. Since the peptides found in our directed decoy database are thus specifically designed to look quite similar to the forward identifications, the limitations of the existing search algorithms in making correct calls in such strongly confusing situations can be analyzed. Interestingly, for the vast majority of confidently identified peptide identifications, a directed decoy peptide-to-spectrum match can be found that has a better or equal match score than the forward match score, highlighting an important issue in the interpretation of peptide identifications in present-day high-throughput proteomics.  相似文献   

Hundreds of ribosomally synthesized cyclopeptides have been isolated from all domains of life, the vast majority having been reported in the last 15 years. Studies of cyclic peptides have highlighted their exceptional potential both as stable drug scaffolds and as biomedicines in their own right. Despite this, computational techniques for cyclopeptide identification are still in their infancy, with many such peptides remaining uncharacterized. Tandem mass spectrometry has occupied a niche role in cyclopeptide identification, taking over from traditional techniques such as nuclear magnetic resonance spectroscopy (NMR). MS/MS studies require only picogram quantities of peptide (compared to milligrams for NMR studies) and are applicable to complex samples, abolishing the requirement for time-consuming chromatographic purification. While database search tools such as Sequest and Mascot have become standard tools for the MS/MS identification of linear peptides, they are not applicable to cyclopeptides, due to the parent mass shift resulting from cyclization and different fragmentation patterns of cyclic peptides. In this paper, we describe the development of a novel database search methodology to aid in the identification of cyclopeptides by mass spectrometry and evaluate its utility in identifying two peptide rings from Helianthus annuus, a bacterial cannibalism factor from Bacillus subtilis, and a θ-defensin from Rhesus macaque.  相似文献   

The time is ripe for staging the Human Immunopeptidome Project, whose goal is to analyze the full repertoires of peptides bound to the HLA molecules, in both health and disease. Mass spectrometry technologies have matured to enable comprehensive analyses of both the membrane-bound and the plasma soluble immunopeptidomes associated with each of the HLA allomorphs and the different diseases. The expected outcomes of such a project will include a basic understanding of the molecular mechanisms involved in the formation of immunopeptidomes, correlating them with their source cellular proteomes, definition of both the consensus motifs and the scope of each allomorph-specific immunopeptidome, and most importantly, identification of disease-related HLA peptides, which may eventually serve as biomarkers or immunotherapeutics. Ideally, the Human Immunopeptidome Project will become public and the gathered data will be shared as soon as possible. Other immunopeptidome projects, of other animals, will follow suit.

LC–MS/MS has become the standard platform for the characterization of immunopeptidomes, the collection of peptides naturally presented by major histocompatibility complex molecules to the cell surface. The protocols and algorithms used for immunopeptidomics data analysis are based on tools developed for traditional bottom‐up proteomics that address the identification of peptides generated by tryptic digestion. Such algorithms are generally not tailored to the specific requirements of MHC ligand identification and, as a consequence, immunopeptidomics datasets suffer from dismissal of informative spectral information and high false discovery rates. Here, a new pipeline for the refinement of peptide‐spectrum matches (PSM) is proposed, based on the assumption that immunopeptidomes contain a limited number of recurring peptide motifs, corresponding to MHC specificities. Sequence motifs are learned directly from the individual peptidome by training a prediction model on high‐confidence PSMs. The model is then applied to PSM candidates with lower confidence, and sequences that score significantly higher than random peptides are rescued as likely true ligands. The pipeline is applied to MHC class I immunopeptidomes from three different species, and it is shown that it can increase the number of identified ligands by up to 20–30%, while effectively removing false positives and products of co‐precipitation. Spectral validation using synthetic peptides confirms the identity of a large proportion of rescued ligands in the experimental peptidome.  相似文献   

The effectiveness of database search algorithms, such as Mascot, Sequest and ProteinPilot, is limited by the quality of the input spectra: spurious peaks in MS/MS spectra can jeopardize the correct identification of peptides or reduce their score significantly. Consequently, an efficient preprocessing of MS/MS spectra can increase the sensitivity of peptide identification at reduced file sizes and run time without compromising its specificity. We investigate the performance of 25 MS/MS preprocessing methods on various data sets and make software for improved preprocessing of mgf/dta-files freely available from http://hci.iwr.uni-heidelberg.de/mip/proteomics or http://www.childrenshospital.org/research/steenlab .

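An automated online nanoflow multidimensional liquid chromatography-tandem mass spectrometry method was used to separate and identify mouse liver plasma membrane proteins that had been isolated and enriched by sucrose density gradient centrifugation. A strong cation exchange column served as the first dimension and a reversed-phase column as the second, with a precolumn connected between the two to desalt and concentrate the peptides. Proteins were extracted from the plasma membrane with a detergent-containing solvent. The resulting plasma membrane proteins were digested, suitably acidified, and adsorbed onto the ion exchange column, then eluted stepwise with ammonium acetate solutions at 10 different concentrations. After desalting and concentration on the precolumn, the eluates entered a capillary reversed-phase column for separation, and the separated peptides passed directly into the ion source of the mass spectrometer for MS and MS/MS analysis. The acquired data were processed by computer and searched against a protein database with Mascot, which identified 126 proteins, 41 of them membrane proteins, including membrane-associated proteins and integral membrane proteins with multiple transmembrane regions. These results provide valuable groundwork for establishing suitable methods for plasma membrane proteomics and for building a plasma membrane protein database.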

Ahmad W, Li L, Deng Y. BMB Reports, 2008, 41(7): 516-522
The glycation of BSA leads to protein/peptide modifications that result in the formation of AGEs. AGEs react with the amino groups of N-terminal amino acid residues, particularly arginine and lysine residues. Enhanced AGE formation exists in the blood and tissues of diabetics, as well as in aging and other disorders. The identification of AGEs is of great importance. Mass spectrometry has been applied to identify and structurally elucidate AGEs. Here, we report on the identification of AGE-peptides and AGE-precursors based on relative mass changes as a result of specific AGE formation. HPLC-ESI-MS, ESI-MS/MS, and the Mascot database were used. The relative mass changes due to the specific type of AGE formation were added to the identified peptides, followed by a manual search of the glycated samples, which resulted in the identification of seven peptides for the formation of five AGEs, namely CML, pyrraline, imidazolone A, imidazolone B, and AFGP. Four glycated peptides (FPK, ECCDKPLLEK, IETMR, and HLVDEPQNLIK) were identified in the formation of AGE-precursors.

Soluble human leukocyte antigen class I (sHLA)‐peptide complexes have been suggested to play a role in the modulation of immune responses and in immune evasion of cancer cells. The set of peptides eluted from sHLA molecules could serve as biomarker for the monitoring of patients with cancer or other conditions. Here, we describe an improved sHLA peptidomics methodology resulting in the identification of 1816 to 2761 unique peptide sequences from triplicate analyses of serum or plasma taken from three healthy donors. More than 90% of the identified peptides were 8–11mers and 74% of these sequences were predicted to bind to cognate HLA alleles, confirming the quality of the resulting immunopeptidomes. In comparison to the HLA peptidome of cultured cells, the plasma‐derived peptides were predicted to have a higher stability in complex with the cognate HLA molecules and mainly derived from proteins of the plasma membrane or from the extracellular space. The sHLA peptidomes can efficiently be characterized by using the new methodology, thus serving as potential source of biomarkers in various pathological conditions.  相似文献   

A major challenge in the life sciences is the extraction of detailed molecular information from plants and animals that are not among the handful of exhaustively studied "model organisms." As a consequence, certain species with novel phenotypes are often ignored due to the lack of searchable databases, tractable genetics, stock centers, and more recently, a sequenced genome. Characterization of phenotype at the molecular level commonly relies on the identification of differentially expressed proteins by combining database searching with tandem mass spectrometry (MS) of peptides derived from protein fragmentation. However, the identification of short peptides from nonmodel organisms can be hampered by the lack of sufficient amino acid sequence homology with proteins in existing databases; therefore, a database search strategy that encompasses both identity and homology can provide stronger evidence than a single search alone. The use of multiple algorithms for database searches may also increase the probability of correct protein identification since it is unlikely that each program would produce false negative or positive hits for the same peptides. In this study, four software packages, Mascot, Pro ID, Sequest, and Pro BLAST, were compared in their ability to identify proteins from the thirteen-lined ground squirrel (Spermophilus tridecemlineatus), a hibernating mammal that lacks a completely sequenced genome. Our results show similarities as well as the degree of variability among different software packages when the identical protein database is searched. In the process of this study, we identified the up-regulation of succinyl CoA-transferase (SCOT) in the heart of hibernators. SCOT is the rate-limiting enzyme in the catabolism of ketone bodies, an important alternative fuel source during hibernation.  相似文献   

Database-searching programs generally identify only a fraction of the spectra acquired in a standard LC/MS/MS study of digested proteins. Subtle variations in database-searching algorithms for assigning peptides to MS/MS spectra have been known to provide different identification results. To leverage this variation, a probabilistic framework is developed for combining the results of multiple search engines. The scores for each search engine are first independently converted into peptide probabilities. These probabilities can then be readily combined across search engines using Bayesian rules and the expectation maximization learning algorithm. A significant gain in the number of peptides identified with high confidence with each additional search engine is demonstrated using several data sets of increasing complexity, from a control protein mixture to a human plasma sample, searched using the SEQUEST, Mascot, and X!Tandem database-searching programs. The increased rate of peptide assignments also translates into a substantially larger number of protein identifications in LC/MS/MS studies compared to a typical analysis using a single database-search tool.
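As a rough illustration of combining per-engine peptide probabilities, the sketch below applies a naive Bayes rule. It is a deliberately simplified stand-in, not the framework described in the abstract: it omits the expectation maximization step and assumes that the engines' evidence is conditionally independent and that every input probability was computed with the same prior. The function name and the default prior of 0.5 are hypothetical.

```python
# Hypothetical naive-Bayes combination of per-engine probabilities (no EM step).
import math


def combine_probabilities(probs, prior=0.5):
    """Combine P(correct | engine i) values into a single probability.

    Each probability is converted to log-odds, the shared prior log-odds is
    subtracted to recover that engine's log-likelihood ratio, and the ratios
    are summed on top of the prior (independence assumption).
    """
    eps = 1e-12  # clamp to avoid log(0) and division by zero
    prior_logit = math.log(prior / (1.0 - prior))
    logit = prior_logit
    for p in probs:
        p = min(max(p, eps), 1.0 - eps)
        logit += math.log(p / (1.0 - p)) - prior_logit
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    # Two engines that each report 0.9 reinforce one another.
    print(round(combine_probabilities([0.9, 0.9]), 4))
```

With a single engine the function returns that engine's probability unchanged, so any gain comes only from agreement between engines. Real search engines are correlated, so the independence assumption will tend to overstate confidence.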
