Similar literature
20 similar articles found.
1.
In a previous paper we introduced a novel model-based approach (OLAV) to the problem of identifying peptides via tandem mass spectrometry, for which early implementations showed promising performance. We recently further improved this performance to a remarkable level (1-2% false positive rate at 95% true positive rate) and characterized key properties of OLAV such as robustness and training set size. We present these results in a synthetic and coherent way along with detailed performance comparisons, a new scoring component making use of peptide amino acid composition, and new developments such as automatic parameter learning. Finally, we discuss the impact of OLAV on the automation of proteomics projects.

2.
De novo peptide sequencing via tandem mass spectrometry. (Cited 10 times in total; 0 self-citations, 10 citations by others)
Peptide sequencing via tandem mass spectrometry (MS/MS) is one of the most powerful tools in proteomics for identifying proteins. Because complete genome sequences are accumulating rapidly, the recent trend in interpretation of MS/MS spectra has been database search. However, de novo MS/MS spectral interpretation remains an open problem typically involving manual interpretation by expert mass spectrometrists. We have developed a new algorithm, SHERENGA, for de novo interpretation that automatically learns fragment ion types and intensity thresholds from a collection of test spectra generated from any type of mass spectrometer. The test data are used to construct optimal path scoring in the graph representations of MS/MS spectra. A ranked list of high scoring paths corresponds to potential peptide sequences. SHERENGA is most useful for interpreting sequences of peptides resulting from unknown proteins and for validating the results of database search algorithms in fully automated, high-throughput peptide sequencing.
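
The graph representation mentioned in this abstract can be illustrated with a minimal sketch. It is not the SHERENGA algorithm: it assumes the spectrum has already been reduced to candidate prefix masses with evidence scores (the `nodes` argument, a hypothetical input), uses a reduced residue alphabet, and simply finds the highest-scoring path by dynamic programming.

```python
# Monoisotopic residue masses (Da) for a reduced, illustrative alphabet.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "K": 128.09496, "E": 129.04259, "F": 147.06841,
    "R": 156.10111,
}


def best_path(nodes, total_mass, tol=0.02):
    """Return (score, sequence) of the best path from mass 0 to total_mass.

    nodes maps a candidate prefix mass to an evidence score (assumed input).
    Two masses are joined by an edge when their difference matches a residue
    mass within tol. Returns None if total_mass cannot be reached.
    """
    masses = sorted(set(nodes) | {0.0, total_mass})
    best = {0.0: (0.0, "")}
    # Masses are visited in ascending order and edges only point to larger
    # masses, so best[m] is final by the time m is expanded.
    for m in masses:
        if m not in best:
            continue
        score, seq = best[m]
        for n in masses:
            if n <= m:
                continue
            for aa, residue in RESIDUE_MASS.items():
                if abs((n - m) - residue) <= tol:
                    cand = (score + nodes.get(n, 0.0), seq + aa)
                    if n not in best or cand[0] > best[n][0]:
                        best[n] = cand
    return best.get(total_mass)
```

A ranked list of sequences, as described in the abstract, would require keeping the k best partial paths per node rather than only one.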

3.
In high-throughput proteomics the development of computational methods and that of novel experimental strategies often rely on each other. In certain areas, mass spectrometry methods for data acquisition are ahead of computational methods to interpret the resulting tandem mass spectra. Particularly, although there are numerous situations in which a mixture tandem mass spectrum can contain fragment ions from two or more peptides, nearly all database search tools still make the assumption that each tandem mass spectrum comes from one peptide. Common examples include mixture spectra from co-eluting peptides in complex samples, spectra generated from data-independent acquisition methods, and spectra from peptides with complex post-translational modifications. We propose a new database search tool (MixDB) that is able to identify mixture tandem mass spectra from more than one peptide. We show that peptides can be reliably identified with up to 95% accuracy from mixture spectra while considering only 0.01% of all possible peptide pairs (four orders of magnitude speedup). Comparison with current database search methods indicates that our approach has better or comparable sensitivity and precision at identifying single-peptide spectra while simultaneously being able to identify 38% more peptides from mixture spectra at significantly higher precision.

4.
The identification of peptides that result from post-translational modifications is critical for understanding normal pathways of cellular regulation as well as identifying damage from, or exposure to, xenobiotics (i.e., the exposome). However, because of their low abundance in proteomes, effective detection of modified peptides by mass spectrometry (MS) typically requires enrichment to eliminate false identifications. We present a new method for confidently identifying peptides with mercury (Hg)-containing adducts that is based on the influence of mercury's seven stable isotopes on peptide isotope distributions detected by high-resolution MS. Using a pure protein and E. coli cultures exposed to phenyl mercuric acetate, we show that the pattern of peak heights in isotope distributions from primary MS single scans efficiently identified Hg adducts in data from chromatographic separation coupled with tandem mass spectrometry with sensitivity and specificity greater than 90%. Isotope distributions are independent of peptide identifications based on peptide fragmentation (e.g. by SEQUEST), so both methods can be combined to eliminate false positives. Summing peptide isotope distributions across multiple scans improved specificity to 99.4% and sensitivity above 95%, affording identification of an unexpected Hg modification. We also illustrate the theoretical applicability of the method for detection of several less common elements including the essential element, selenium, as selenocysteine in peptides.

5.
Protein and peptide mass analysis and amino acid sequencing by mass spectrometry are widely used for identification and annotation of post-translational modifications (PTMs) in proteins. Modification-specific mass increments, neutral losses or diagnostic fragment ions in peptide mass spectra provide direct evidence for the presence of post-translational modifications, such as phosphorylation, acetylation, methylation or glycosylation. However, the commonly used database search engines are not always practical for exhaustive searches for multiple modifications and concomitant missed proteolytic cleavage sites in large-scale proteomic datasets, since the search space is dramatically expanded. We present a formal definition of the problem of searching databases with tandem mass spectra of peptides that are partially (sub-stoichiometrically) modified. In addition, an improved search algorithm and peptide scoring scheme that includes modification-specific ion information from MS/MS spectra was implemented and tested using the Virtual Expert Mass Spectrometrist (VEMS) software. A set of 2825 peptide MS/MS spectra was searched with 16 variable modifications and 6 missed cleavages. The scoring scheme returned a large set of post-translationally modified peptides including precise information on modification type and position. The scoring scheme was able to extract and distinguish the near-isobaric modifications of trimethylation and acetylation of lysine residues based on the presence and absence of diagnostic neutral losses and immonium ions. In addition, the VEMS software contains a range of new features for analysis of mass spectrometry data obtained in large-scale proteomic experiments. Windows binaries are available at http://www.yass.sdu.dk/.
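
A quick calculation shows why trimethylation and acetylation are called near-isobaric, and why diagnostic ions help. The sketch below uses the standard monoisotopic mass increments for the two modifications; the function name and the example peptide mass are illustrative assumptions and are not taken from the VEMS software.

```python
# Monoisotopic mass increments (Da) added to a lysine residue.
ACETYL = 42.010565     # C2H2O
TRIMETHYL = 42.046950  # C3H6

# The two modifications differ by only about 0.036 Da.
DELTA = TRIMETHYL - ACETYL


def ppm_gap(peptide_mass):
    """Mass gap between the two modified forms, in ppm of the peptide mass."""
    return DELTA / peptide_mass * 1e6


def resolvable_by_mass(peptide_mass, mass_accuracy_ppm):
    """True if the gap exceeds twice the instrument mass accuracy.

    The factor of two is a simple assumption: each form may be measured
    with an error of up to mass_accuracy_ppm in opposite directions.
    """
    return ppm_gap(peptide_mass) > 2 * mass_accuracy_ppm
```

For a hypothetical 1500 Da peptide the gap is roughly 24 ppm, so precursor mass alone separates the two forms only when mass accuracy is on the order of 10 ppm or better; fragment-level evidence does not depend on that accuracy.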

6.
Identification of fusion proteins has contributed significantly to our understanding of cancer progression, yielding important predictive markers and therapeutic targets. While fusion proteins can be potentially identified by mass spectrometry, all previously found fusion proteins were identified using genomic (rather than mass spectrometry) technologies. This lack of MS/MS applications in studies of fusion proteins is caused by the lack of computational tools that are able to interpret mass spectra from peptides covering unknown fusion breakpoints (fusion peptides). Indeed, the number of potential fusion peptides is so large that the existing MS/MS database search tools become impractical even in the case of small genomes. We explore computational approaches to identifying fusion peptides, propose an algorithm for solving the fusion peptide identification problem, and analyze the performance of this algorithm on simulated data. We further illustrate how this approach can be modified for human exon prediction.

7.
A new database, SwePep, specifically designed for endogenous peptides, has been constructed to significantly speed up the identification process from complex tissue samples utilizing mass spectrometry. In the identification process the experimental peptide masses are compared with the peptide masses stored in the database both with and without possible post-translational modifications. This intermediate identification step is fast and singles out peptides that are potential endogenous peptides and can later be confirmed with tandem mass spectrometry data. Successful applications of this methodology are presented. The SwePep database is a relational database developed using MySQL and Java. The database contains 4180 annotated endogenous peptides from different tissues originating from 394 different species as well as 50 novel peptides from brain tissue identified in our laboratory. Information about the peptides, including mass, isoelectric point, sequence, and precursor protein, is also stored in the database. This new approach holds great potential for removing the bottleneck that occurs during the identification process in the field of peptidomics. The SwePep database is available to the public.

8.
Information about peptides and proteins in urine can be used to search for biomarkers of early stages of various diseases. The main technology currently used for identification of peptides and proteins is tandem mass spectrometry, in which peptides are identified by the mass spectra of their fragmentation products. However, the fragmentation stage decreases the sensitivity of the analysis and increases its duration. We have developed a method for identification of human urinary proteins and peptides. This method, which is based on the accurate mass and time (AMT) tag method, does not use tandem mass spectrometry. A database containing more than 1381 AMT tags of peptides has been constructed. Software has been developed for filling the database with AMT tags, normalizing the chromatograms, applying the database to the identification of proteins and peptides, and estimating their quantities. New procedures for peptide identification using tandem mass spectra and the AMT tag database are proposed. The paper also lists novel proteins that have been identified in human urine for the first time.

9.
The relatively small numbers of proteins and fewer possible post-translational modifications in microbes provide a unique opportunity to comprehensively characterize their dynamic proteomes. We have constructed a PeptideAtlas (PA) covering 62.7% of the predicted proteome of the extremely halophilic archaeon Halobacterium salinarum NRC-1 by compiling approximately 636 000 tandem mass spectra from 497 mass spectrometry runs in 88 experiments. Analysis of the PA with respect to biophysical properties of constituent peptides, functional properties of parent proteins of detected peptides, and performance of different mass spectrometry approaches has highlighted plausible strategies for improving proteome coverage and selecting signature peptides for targeted proteomics. Notably, discovery of a significant correlation between absolute abundances of mRNAs and proteins has helped identify low abundance of proteins as the major limitation in peptide detection. Furthermore, we have discovered that iTRAQ labeling for quantitative proteomic analysis introduces a significant bias in peptide detection by mass spectrometry. Therefore, despite identifying at least one proteotypic peptide for almost all proteins in the PA, a context-dependent selection of proteotypic peptides appears to be the most effective approach for targeted proteomics.

10.
11.
In mass spectrometry-based proteomics, frequently hundreds of thousands of MS/MS spectra are collected in a single experiment. Of these, a relatively small fraction is confidently assigned to peptide sequences, whereas the majority of the spectra are not further analyzed. Spectra are not assigned to peptides for diverse reasons. These include deficiencies of the scoring schemes implemented in the database search tools, sequence variations (e.g. single nucleotide polymorphisms) or omissions in the database searched, post-translational or chemical modifications of the peptide analyzed, or the observation of sequences that are not anticipated from the genomic sequence (e.g. splice forms, somatic rearrangement, and processed proteins). To increase the amount of information that can be extracted from proteomic MS/MS datasets we developed a robust method that detects high quality spectra within the fraction of spectra unassigned by conventional sequence database searching and computes a quality score for each spectrum. We also demonstrate that iterative search strategies applied to such detected unassigned high quality spectra significantly increase the number of spectra that can be assigned from datasets and that biologically interesting new insights can be gained from existing data.

12.
Han X, He L, Xin L, Shan B, Ma B. Journal of Proteome Research, 2011, 10(7): 2930-2936
Tandem mass spectrometry (MS/MS) has been routinely used to identify peptides from a protein sequence database. To identify post-translationally modified peptides, most existing software requires the specification of a few possible modifications. However, such knowledge of possible modifications is not always available. In this paper, we describe a new algorithm for identifying modified peptides without requiring the user to specify the possible modifications; instead, all modifications from the Unimod database are considered. Meanwhile, several new techniques are employed to avoid the exponential growth of the search space, as well as to control the false discoveries due to this unrestricted search approach. Finally, a software tool, PeaksPTM, has been developed and has already achieved stronger performance than competing tools for unrestricted identification of post-translational modifications.

13.
We report a significantly enhanced bioinformatics suite and database for proteomics research called the Yale Protein Expression Database (YPED) that is used by investigators at more than 300 institutions worldwide. YPED meets the data management, archival, and analysis needs of high-throughput mass spectrometry-based proteomics research, ranging from a single laboratory, to a group of laboratories within and beyond an institution, to the entire proteomics community. The current version is a significant improvement over the first version in that it contains new modules for liquid chromatography–tandem mass spectrometry (LC–MS/MS) database search results, label and label-free quantitative proteomic analysis, and several scoring outputs for phosphopeptide site localization. In addition, we have added both peptide and protein comparative analysis tools to enable pairwise analysis of distinct peptides/proteins in each sample and of overlapping peptides/proteins between all samples in multiple datasets. We have also implemented a targeted proteomics module for automated multiple reaction monitoring (MRM)/selective reaction monitoring (SRM) assay development. We have linked YPED's database search results and both label-based and label-free fold-change analysis to the Skyline Panorama repository for online spectra visualization. In addition, we have built enhanced functionality to curate peptide identifications into an MS/MS peptide spectral library for all of our protein database search identification results.

14.
The identification of ubiquitin (Ub) and Ub‐like protein (Ubl) conjugation sites is important in understanding their roles in biological pathway regulations. However, unambiguously and sensitively identifying Ub/Ubl conjugation sites through high‐throughput MS remains challenging. We introduce an improved workflow for identifying Ub/Ubl conjugation sites based on the ChopNSpice and X!Tandem software. ChopNSpice is modified to generate Ub/Ubl conjugation peptides in the form of a cross‐link. A combinatorial FASTA database can be acquired using the modified ChopNSpice (MchopNSpice). The modified X!Tandem (UblSearch) introduces a new fragmentation model for the Ub/Ubl conjugation peptides to match unambiguously the MS/MS spectra with linear peptides or Ub/Ubl conjugation peptides using the combinatorial FASTA database. The novel workflow exhibited better performance in analyzing an Ub and Ubl spectral library and a large‐scale Trypanosoma cruzi small Ub‐related modifier dataset compared with the original ChopNSpice method. The proposed workflow is more suitable for processing large‐scale MS datasets of Ub/Ubl modification. MchopNSpice and UblSearch are freely available under the GNU General Public License v3.0 at http://sourceforge.net/projects/maublsearch.

15.
Mass spectrometric techniques for identification of proteins by "mass fingerprinting" (matching the masses of tryptic peptides from a protein digest to the theoretical peptides in a database) such as matrix-assisted laser desorption ionization-time of flight (MALDI-TOF) are rapidly growing in popularity as the demand for high throughput analysis of the proteome increases. This is due, in part, to the ability to automate the technique and the rapid rate with which mass spectra may be acquired. An important factor in the accuracy of the technique is the number of tryptic peptides that are identified by the various searching algorithms that exist. The greater the sequence coverage of the parent protein that is obtained, the higher the level of confidence in the resulting identification. One impediment to high levels of sequence coverage is the bias of MALDI-TOF mass spectrometry toward arginine-containing peptides. Increasing the sensitivity to lysine-containing peptides should increase the sequence coverage obtained. In order to achieve this result we have developed conditions to modify the epsilon-amine group of lysine in tryptic peptides with O-methylisourea. The conditions utilized result in the conversion of lysine to homoarginine with no modification of the amine terminus of the peptides. The sensitivity of MALDI-TOF mass spectrometry detection of peptides was increased dramatically following modification. The modification chemistry may be applied to tryptic peptide mixtures prior to desalting and spotting onto MALDI-TOF plates. This technique will be particularly useful for identifying proteins with a high lysine/arginine ratio.

16.
The accurate mass and time (AMT) tag strategy has been recognized as a powerful tool for high-throughput analysis in liquid chromatography–mass spectrometry (LC–MS)-based proteomics. Due to the complexity of the human proteome, this strategy requires highly accurate mass measurements for confident identifications. We have developed a method of building a reference map that allows relaxed criteria for mass errors yet delivers high confidence for peptide identifications. The samples used for generating the peptide database were produced by collecting cysteine-containing peptides from T47D cells and then fractionating the peptides using strong cation exchange chromatography (SCX). LC–tandem mass spectrometry (MS/MS) data from the SCX fractions were combined to create a comprehensive reference map. After the reference map was built, it was possible to skip the SCX step in further proteomic analyses. We found that the reference-driven identification increases the overall throughput and proteomic coverage by identifying peptides with low intensity or complex interference. The use of the reference map also facilitates the quantitation process by allowing extraction of peptide intensities of interest and incorporating models of theoretical isotope distribution.
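
The core of an AMT-style lookup can be sketched in a few lines. This is an illustrative sketch rather than the authors' method: the `AmtTag` record, the normalized elution time field, the tolerance defaults, and the example values in the note below are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AmtTag:
    """One reference-map entry (hypothetical layout)."""
    peptide: str
    mass: float  # monoisotopic mass, Da
    net: float   # normalized elution time, 0..1


def match_feature(mass, net, tags, ppm_tol=10.0, net_tol=0.02):
    """Return peptides whose tag matches an LC-MS feature.

    A tag matches only if BOTH the mass error (in ppm) and the elution
    time difference fall within their tolerances.
    """
    hits = []
    for tag in tags:
        ppm_err = (mass - tag.mass) / tag.mass * 1e6
        if abs(ppm_err) <= ppm_tol and abs(net - tag.net) <= net_tol:
            hits.append(tag.peptide)
    return hits
```

With made-up tags at 1000.0000 Da (time 0.30) and 1000.0050 Da (time 0.60), a feature at 1000.0040 Da and time 0.31 is within 10 ppm of both, but only the first tag also matches in elution time. This is how a second dimension allows a relaxed mass criterion without losing confidence.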

17.
Imaging mass spectrometry (IMS) has developed into a powerful tool allowing label-free detection of numerous biomolecules in situ. In contrast to shotgun proteomics, proteins/peptides can be detected directly from biological tissues and correlated with tissue morphology, leading to a gain of crucial clinical information. However, direct identification of the detected molecules is currently challenging for MALDI–IMS, thereby compelling researchers to use complementary techniques and resource-intensive experimental setups. Despite these strategies, sufficient information could not be extracted because of the lack of an optimum data combination strategy/software. Here, we introduce a new open-source software tool, ImShot, that aims at identifying peptides obtained in MALDI–IMS. This is achieved by combining information from IMS and shotgun proteomics (LC–MS) measurements of serial sections of the same tissue. The software takes advantage of a two-group comparison to determine the search space of IMS masses after deisotoping the corresponding spectra. Ambiguity in annotations of IMS peptides is eliminated by introduction of a novel scoring system that identifies the most likely parent protein of a detected peptide in the corresponding IMS dataset. Thanks to its modular structure, the software can also handle LC–MS data separately and display interactive enrichment plots and enriched Gene Ontology terms or cellular pathways. The software has been built as a desktop application with a conveniently designed graphical user interface to provide users with a seamless experience in data analysis. ImShot can run on all three major desktop operating systems and is freely available under the Massachusetts Institute of Technology (MIT) license.

18.
MOTIVATION: Tandem mass spectrometry combined with sequence database searching is one of the most powerful tools for protein identification. As thousands of spectra are generated by a mass spectrometer in one hour, the speed of database searching is critical, especially when searching against a large sequence database, when the peptide is generated by some unknown or non-specific enzyme, or even when the target peptides have post-translational modifications (PTM). In practice, about 70-90% of the spectra have no match in the database. Many believe that a significant portion of them are due to peptides from non-specific digestion by unknown enzymes or to amino acid modifications. In another case, scientists may choose to use non-specific enzymes such as pepsin or thermolysin for proteolysis in a proteomic study, because not all proteins are amenable to digestion by site-specific enzymes, and furthermore many digested peptides may not fall within the range of molecular weight suitable for mass spectrometry analysis. Interpreting mass spectra of these kinds costs database search engines a lot of computational time. OVERVIEW: The present study was designed to speed up the database searching process for both cases. More specifically, we employed an approach combining a suffix tree data structure and a spectrum graph. The suffix tree is used to preprocess the protein sequence database, while the spectrum graph is used to preprocess the tandem mass spectrum. We then search the suffix tree against the spectrum graph for candidate peptides. We design an efficient algorithm to compute a matching threshold with some statistical significance level, e.g. p = 0.01, for each spectrum, and use it to select candidate peptides. Then we rank these peptides using a SEQUEST-like scoring function. The algorithms were implemented and tested on experimental data. For post-translational modifications, we allow an arbitrary number of any modification to a protein.
AVAILABILITY: The executable program and other supplementary materials are available online at: http://hto-c.usc.edu:8000/msms/suffix/.
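
To make the non-specific digestion case concrete, the sketch below enumerates every substring of a protein whose mass matches a precursor mass, since without a site-specific enzyme any substring is a candidate peptide. It deliberately swaps in a much simpler technique than the one in the abstract: a sliding window over residue masses, not a suffix tree searched against a spectrum graph. The reduced alphabet and all names are assumptions made for illustration.

```python
# Monoisotopic residue masses (Da), reduced alphabet for illustration.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406,
}
WATER = 18.010565  # added once per peptide for the terminal H and OH


def candidates(protein, precursor_mass, tol=0.02):
    """Return substrings of protein whose peptide mass matches precursor_mass.

    Uses a two-pointer window: residue masses are positive, so the window
    start only ever moves forward and the scan is linear in protein length.
    """
    target = precursor_mass - WATER
    hits = []
    start, window = 0, 0.0
    for end, aa in enumerate(protein):
        window += RESIDUE_MASS[aa]
        # Shrink from the left while the window is too heavy.
        while window > target + tol and start <= end:
            window -= RESIDUE_MASS[protein[start]]
            start += 1
        if start <= end and abs(window - target) <= tol:
            hits.append(protein[start:end + 1])
    return hits
```

This scan must be repeated for every spectrum and every protein, which is the cost that preprocessing the database into an index such as a suffix tree is meant to reduce.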

19.
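A scoring method for identifying peptides from tandem mass spectra (Cited 1 time in total; 0 self-citations, 1 citation by others)
Several high-throughput scoring algorithms that evaluate the agreement between a tandem mass spectrum and the theoretical spectra of peptides in a database are currently used in shotgun proteomics research. In practice, however, these methods produce a large number of incorrect peptide identifications. Here we propose a new scoring algorithm for identifying peptide sequences from tandem mass spectra. The algorithm jointly considers the probability of occurrence of the different ion types in tandem mass spectra, the number of enzymatic cleavage sites in the peptide, the degree of matching between theoretical and experimental ions, and the matching pattern. Tests on a large tandem mass spectrometry dataset show that PepSearch, the software developed from this algorithm, achieves better identification accuracy than SEQUEST, currently the most widely used software. PepSearch can be downloaded from http://compbio.sibsnet.org/projects/pepsearch.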

20.
MOTIVATION: Tandem mass spectrometry (MS/MS) identifies protein sequences using database search engines, at the core of which is a score that measures the similarity between peptide MS/MS spectra and a protein sequence database. The TANDEM application was developed as a freely available database search engine for the proteomics research community. To extend TANDEM as a platform for further research on developing improved database scoring methods, we modified the software to allow users to redefine the scoring function and replace the native TANDEM scoring function while leaving the remaining core application intact. Redefinition is performed at run time so multiple scoring functions are available to be selected and applied from a single search engine binary. We introduce the implementation of the pluggable scoring algorithm and also provide implementations of two TANDEM compatible scoring functions, one previously described scoring function compatible with PeptideProphet and one very simple scoring function that quantitative researchers may use to begin their development. This extension builds on the open-source TANDEM project and will facilitate research into and dissemination of novel algorithms for matching MS/MS spectra to peptide sequences. The pluggable scoring schema is also compatible with related search applications P3 and Hunter, which are part of the X! suite of database matching algorithms. The pluggable scores and the X! suite of applications are all written in C++. AVAILABILITY: Source code for the scoring functions is available from http://proteomics.fhcrc.org
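
The idea of selecting a scoring function at run time while the rest of the engine stays unchanged can be sketched as a small registry. This is a Python illustration of the design only; TANDEM itself is written in C++, and none of the names below (`SCORERS`, `register`, the two scorers) come from its code base.

```python
from typing import Callable, Dict, Sequence

# A scorer takes observed peaks, theoretical fragment ions, and a tolerance.
ScoreFn = Callable[[Sequence[float], Sequence[float], float], float]
SCORERS: Dict[str, ScoreFn] = {}


def register(name):
    """Decorator that adds a scoring function to the registry under name."""
    def wrap(fn):
        SCORERS[name] = fn
        return fn
    return wrap


@register("shared_peaks")
def shared_peaks(observed, theoretical, tol):
    """Count theoretical ions that have an observed peak within tol."""
    return float(sum(any(abs(o - t) <= tol for o in observed)
                     for t in theoretical))


@register("fraction_matched")
def fraction_matched(observed, theoretical, tol):
    """Fraction of theoretical ions matched (0.0 if there are none)."""
    if not theoretical:
        return 0.0
    return shared_peaks(observed, theoretical, tol) / len(theoretical)


def score_candidates(observed, candidates, scorer_name, tol=0.5):
    """Rank candidate peptides using the scorer chosen by name at run time.

    candidates maps a peptide label to its theoretical fragment m/z values.
    """
    scorer = SCORERS[scorer_name]
    ranked = [(scorer(observed, ions, tol), pep)
              for pep, ions in candidates.items()]
    return sorted(ranked, reverse=True)
```

The search loop in `score_candidates` never changes; switching `scorer_name` is enough to change the ranking, which is the property the abstract describes for a single search engine binary.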
