首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 62 毫秒
1.
用于串联质谱鉴定多肽的计量方法   总被引:1,自引:0,他引:1  
目前已有多种对串联质谱与数据库中多肽的理论质谱的一致性进行评估的高通量计量算法用于鸟枪法蛋白质组学 (shotgunproteomics)研究。然而这些方法操作时存在大量错误的多肽鉴定。这里提出一种新的串联质谱识别多肽序列的计量算法。该算法综合考虑了串联质谱中不同离子出现的概率、多肽的酶切位点数、理论离子与实验离子的匹配程度和匹配模式。对大容量的串联质谱数据集的测试表明 ,根据算法开发的软件PepSearch比目前最常用的软件SEQUEST有更好的鉴定准确性。PepSearch可从http : compbio.sibsnet.org projects pepsearch下载。  相似文献   

2.
Spectral library searching is an emerging approach in peptide identifications from tandem mass spectra, a critical step in proteomic data analysis. In spectral library searching, a spectral library is first meticulously compiled from a large collection of previously observed peptide MS/MS spectra that are conclusively assigned to their corresponding amino acid sequence. An unknown spectrum is then identified by comparing it to all the candidates in the spectral library for the most similar match. This review discusses the basic principles of spectral library building and searching, describes its advantages and limitations, and provides a primer for researchers interested in adopting this new approach in their data analysis. It will also discuss the future outlook on the evolution and utility of spectral libraries in the field of proteomics.  相似文献   

3.
With the recent quick expansion of DNA and protein sequence databases, intensive efforts are underway to interpret the linear genetic information of DNA in terms of function, structure, and control of biological processes. The systematic identification and quantification of expressed proteins has proven particularly powerful in this regard. Large-scale protein identification is usually achieved by automated liquid chromatography-tandem mass spectrometry of complex peptide mixtures and sequence database searching of the resulting spectra [Aebersold and Goodlett, Chem. Rev. 2001, 101, 269-295]. As generating large numbers of sequence-specific mass spectra (collision-induced dissociation/CID) spectra has become a routine operation, research has shifted from the generation of sequence database search results to their validation. Here we describe in detail a novel probabilistic model and score function that ranks the quality of the match between tandem mass spectral data and a peptide sequence in a database. We document the performance of the algorithm on a reference data set and in comparison with another sequence database search tool. The software is publicly available for use and evaluation at http://www.systemsbiology.org/research/software/proteomics/ProbID.  相似文献   

4.
The promise of mass spectrometry as a tool for probing signal-transduction is predicated on reliable identification of post-translational modifications. Phosphorylations are key mediators of cellular signaling, yet are hard to detect, partly because of unusual fragmentation patterns of phosphopeptides. In addition to being accurate, MS/MS identification software must be robust and efficient to deal with increasingly large spectral data sets. Here, we present a new scoring function for the Inspect software for phosphorylated peptide tandem mass spectra for ion-trap instruments, without the need for manual validation. The scoring function was modeled by learning fragmentation patterns from 7677 validated phosphopeptide spectra. We compare our algorithm against SEQUEST and X!Tandem on testing and training data sets. At a 1% false positive rate, Inspect identified the greatest total number of phosphorylated spectra, 13% more than SEQUEST and 39% more than X!Tandem. Spectra identified by Inspect tended to score better in several spectral quality measures. Furthermore, Inspect runs much faster than either SEQUEST or X!Tandem, making desktop phosphoproteomics feasible. Finally, we used our new models to reanalyze a corpus of 423,000 LTQ spectra acquired for a phosphoproteome analysis of Saccharomyces cerevisiae DNA damage and repair pathways and discovered 43% more phosphopeptides than the previous study.  相似文献   

5.
Robust statistical validation of peptide identifications obtained by tandem mass spectrometry and sequence database searching is an important task in shotgun proteomics. PeptideProphet is a commonly used computational tool that computes confidence measures for peptide identifications. In this paper, we investigate several limitations of the PeptideProphet modeling approach, including the use of fixed coefficients in computing the discriminant search score and selection of the top scoring peptide assignment per spectrum only. To address these limitations, we describe an adaptive method in which a new discriminant function is learned from the data in an iterative fashion. We extend the modeling framework to go beyond the top scoring peptide assignment per spectrum. We also investigate the effect of clustering the spectra according to their spectrum quality score followed by cluster-specific mixture modeling. The analysis is carried out using data acquired from a mixture of purified proteins on four different types of mass spectrometers, as well as using a complex human serum data set. A special emphasis is placed on the analysis of data generated on high mass accuracy instruments.  相似文献   

6.

Background  

The key to mass-spectrometry-based proteomics is peptide identification, which relies on software analysis of tandem mass spectra. Although each search engine has its strength, combining the strengths of various search engines is not yet realizable largely due to the lack of a unified statistical framework that is applicable to any method.  相似文献   

7.
8.
Shotgun tandem mass spectrometry-based peptide sequencing using programs such as SEQUEST allows high-throughput identification of peptides, which in turn allows the identification of corresponding proteins. We have applied a machine learning algorithm, called the support vector machine, to discriminate between correctly and incorrectly identified peptides using SEQUEST output. Each peptide was characterized by SEQUEST-calculated features such as delta Cn and Xcorr, measurements such as precursor ion current and mass, and additional calculated parameters such as the fraction of matched MS/MS peaks. The trained SVM classifier performed significantly better than previous cutoff-based methods at separating positive from negative peptides. Positive and negative peptides were more readily distinguished in training set data acquired on a QTOF, compared to an ion trap mass spectrometer. The use of 13 features, including four new parameters, significantly improved the separation between positive and negative peptides. Use of the support vector machine and these additional parameters resulted in a more accurate interpretation of peptide MS/MS spectra and is an important step toward automated interpretation of peptide tandem mass spectrometry data in proteomics.  相似文献   

9.
Peptide identification via tandem mass spectrometry sequence database searching is a key method in the array of tools available to the proteomics researcher. The ability to rapidly and sensitively acquire tandem mass spectrometry data and perform peptide and protein identifications has become a commonly used proteomics analysis technique because of advances in both instrumentation and software. Although many different tandem mass spectrometry database search tools are currently available from both academic and commercial sources, these algorithms share similar core elements while maintaining distinctive features. This review revisits the mechanism of sequence database searching and discusses how various parameter settings impact the underlying search.  相似文献   

10.
Although tandem mass spectrometry (MS/MS) has become an integral part of proteomics, intensity patterns in MS/MS spectra are rarely weighted heavily in most widely used algorithms because they are not yet fully understood. Here a knowledge mining approach is demonstrated to discover fragmentation intensity patterns and elucidate the chemical factors behind such patterns. Fragmentation intensity information from 28 330 ion trap peptide MS/MS spectra of different charge states and sequences went through unsupervised clustering using a penalized K-means algorithm. Without any prior chemistry assumptions, four clusters with distinctive fragmentation patterns were obtained. A decision tree was generated to investigate peptide sequence motif and charge state status that caused these fragmentation patterns. This data-mining scheme is generally applicable for any large data sets. It bypasses the common prior knowledge constraints and reports on the overall peptide fragmentation behavior. It improves the understanding of gas-phase peptide dissociation and provides a foundation for new or improved protein identification algorithms.  相似文献   

11.
An important step in mass spectrometry (MS)-based proteomics is the identification of peptides by their fragment spectra. Regardless of the identification score achieved, almost all tandem-MS (MS/MS) spectra contain remaining peaks that are not assigned by the search engine. These peaks may be explainable by human experts but the scale of modern proteomics experiments makes this impractical. In computer science, Expert Systems are a mature technology to implement a list of rules generated by interviews with practitioners. We here develop such an Expert System, making use of literature knowledge as well as a large body of high mass accuracy and pure fragmentation spectra. Interestingly, we find that even with high mass accuracy data, rule sets can quickly become too complex, leading to over-annotation. Therefore we establish a rigorous false discovery rate, calculated by random insertion of peaks from a large collection of other MS/MS spectra, and use it to develop an optimized knowledge base. This rule set correctly annotates almost all peaks of medium or high abundance. For high resolution HCD data, median intensity coverage of fragment peaks in MS/MS spectra increases from 58% by search engine annotation alone to 86%. The resulting annotation performance surpasses a human expert, especially on complex spectra such as those of larger phosphorylated peptides. Our system is also applicable to high resolution collision-induced dissociation data. It is available both as a part of MaxQuant and via a webserver that only requires an MS/MS spectrum and the corresponding peptides sequence, and which outputs publication quality, annotated MS/MS spectra (www.biochem.mpg.de/mann/tools/). It provides expert knowledge to beginners in the field of MS-based proteomics and helps advanced users to focus on unusual and possibly novel types of fragment ions.In MS-based proteomics, peptides are matched to peptide sequences in databases using search engines (13). Statistical criteria are established for accepted versus rejected peptide spectra matches based on the search engine score, and usually a 99% certainty is required for reported peptides. The search engines typically only take sequence specific backbone fragmentation into account (i.e. a, b, and y ions) and some of their neutral losses. However, tandem mass spectra—especially of larger peptides—can be quite complex and contain a number of medium or even high abundance peptide fragments that are not annotated by the search engine result. This can result in uncertainty for the user—especially if only relatively few peaks are annotated—because it may reflect an incorrect identification. However, the most common cause of unlabeled peaks is that another peptide was present in the precursor selection window and was cofragmented. This has variously been termed “chimeric spectra” (46), or the problem of low precursor ion fraction (PIF)1 (7). Such spectra may still be identifiable with high confidence. The Andromeda search engine in MaxQuant, for instance, attempts to identify a second peptide in such cases (8, 9). However, even “pure” spectra (those with a high PIF) often still contain many unassigned peaks. These can be caused by different fragment types, such as internal ions, single or combined neutral losses as well as immonium and other ion types in the low mass region. A mass spectrometric expert can assign many or all of these peaks, based on expert knowledge of fragmentation and manual calculation of fragment masses, resulting in a higher degree of confidence for the identification. However, there are more and more practitioners of proteomics without in depth training or experience in annotating MS/MS spectra and such annotation would in any case be prohibitive for hundreds of thousands of spectra. Furthermore, even human experts may wrongly annotate a given peak—especially with low mass accuracy tandem mass spectra—or fail to consider every possibility that could have resulted in this fragment mass.Given the desirability of annotating fragment peaks to the highest degree possible, we turned to “Expert Systems,” a well-established technology in computer science. Expert Systems achieved prominence in the 1970s and 1980s and were meant to solve complex problems by reasoning about knowledge (10, 11). Interestingly, one of the first examples was developed by Nobel Prize winner Joshua Lederberg more than 40 years ago, and dealt with the interpretation of mass spectrometric data. The program''s name was Heuristic DENTRAL (12), and it was capable of interpreting the mass spectra of aliphatic ethers and their fragments. The hypotheses produced by the program described molecular structures that are plausible explanations of the data. To infer these explanations from the data, the program incorporated a theory of chemical stability that provided limiting constraints as well as heuristic rules.In general, the aim of an Expert System is to encode knowledge extracted from professionals in the field in question. This then powers a rule-based system that can be applied broadly and in an automated manner. A rule-based Expert System represents the information obtained from human specialists in the form of IF-THEN rules. These are used to perform operations on input data to reach appropriate conclusion. A generic Expert System is essentially a computer program that provides a framework for performing a large number of inferences in a predictable way, using forward or backward chains, backtracking, and other mechanisms (13). Therefore, in contrast to statistics based learning, the “expert program” does not know what it knows through the raw volume of facts in the computer''s memory. Instead, like a human expert, it relies on a reasoning-like process of applying an empirically derived set of rules to the data.Here we implemented an Expert System for the interpretation for high mass accuracy tandem mass spectrometry data of peptides. It was developed in an iterative manner together with human experts on peptide fragmentation, using the published literature on fragmentation pathways as well as large data sets of higher-energy collisional dissociation (HCD) (14) and collision-induced dissociation (CID) based peptide identifications. Our goal was to achieve an annotation performance similar or better than experienced mass spectrometrists (15), thus making comprehensively annotated peptide spectra available in large scale proteomics.  相似文献   

12.
We report on the effectiveness of CID, HCD, and ETD for LC-FT MS/MS analysis of peptides using a tandem linear ion trap-Orbitrap mass spectrometer. A range of software tools and analysis parameters were employed to explore the use of CID, HCD, and ETD to identify peptides (isolated from human blood plasma) without the use of specific "enzyme rules". In the evaluation of an FDR-controlled SEQUEST scoring method, the use of accurate masses for fragments increased the number of identified peptides (by ~50%) compared to the use of conventional low accuracy fragment mass information, and CID provided the largest contribution to the identified peptide data sets compared to HCD and ETD. The FDR-controlled Mascot scoring method provided significantly fewer peptide identifications than SEQUEST (by 1.3-2.3 fold) and CID, HCD, and ETD provided similar contributions to identified peptides. Evaluation of de novo sequencing and the UStags method for more intense fragment ions revealed that HCD afforded more contiguous residues (e.g., ≥ 7 amino acids) than either CID or ETD. Both the FDR-controlled SEQUEST and Mascot scoring methods provided peptide data sets that were affected by the decoy database used and mass tolerances applied (e.g., identical peptides between data sets could be limited to ~70%), while the UStags method provided the most consistent peptide data sets (>90% overlap). The m/z ranges in which CID, HCD, and ETD contributed the largest number of peptide identifications were substantially overlapping. This work suggests that the three peptide ion fragmentation methods are complementary and that maximizing the number of peptide identifications benefits significantly from a careful match with the informatics tools and methods applied. These results also suggest that the decoy strategy may inaccurately estimate identification FDRs.  相似文献   

13.
Despite the publication of several software tools for analysis of glycopeptide tandem mass spectra, there remains a lack of consensus regarding the most effective and appropriate methods. In part, this reflects problems with applying standard methods for proteomics database searching and false discovery rate calculation. While the analysis of small post-translational modifications (PTMs) may be regarded as an extension of proteomics database searching, glycosylation requires specialized approaches. This is because glycans are large and heterogeneous by nature, causing glycopeptides to exist as multiple glycosylated variants. Thus, the mass of the peptide cannot be calculated directly from that of the intact glycopeptide. In addition, the chemical nature of the glycan strongly influences product ion patterns observed for glycopeptides. As a result, glycopeptidomics requires specialized bioinformatics methods. We summarize the recent progress towards a consensus for effective glycopeptide tandem mass spectrometric analysis.  相似文献   

14.
数据非依赖采集(DIA)是蛋白质组学领域近年来快速发展的质谱采集技术,其通过无偏碎裂隔离窗口内的所有母离子采集二级谱图,理论上可实现蛋白质样品的深度覆盖,同时具有高通量、高重现性和高灵敏度的优点。现有的DIA数据采集方法可以分为全窗口碎裂方法、隔离窗口序列碎裂方法和四维DIA数据采集方法(4D-DIA)3大类。针对DIA数据的不同特点,主要数据解析方法包括谱库搜索方法、蛋白质序列库直接搜索方法、伪二级谱图鉴定方法和从头测序方法4大类。解析得到的肽段鉴定结果需要进行可信度评估,包括使用机器学习方法的重排序和对报告结果集合的假发现率估计两个步骤,实现对数据解析结果的质控。本文对DIA数据的采集方法、数据解析方法及软件和鉴定结果可信度评估方法进行了整理和综述,并展望了未来的发展方向。  相似文献   

15.
Hernandez P  Gras R  Frey J  Appel RD 《Proteomics》2003,3(6):870-878
In recent years, proteomics research has gained importance due to increasingly powerful techniques in protein purification, mass spectrometry and identification, and due to the development of extensive protein and DNA databases from various organisms. Nevertheless, current identification methods from spectrometric data have difficulties in handling modifications or mutations in the source peptide. Moreover, they have low performance when run on large databases (such as genomic databases), or with low quality data, for example due to bad calibration or low fragmentation of the source peptide. We present a new algorithm dedicated to automated protein identification from tandem mass spectrometry (MS/MS) data by searching a peptide sequence database. Our identification approach shows promising properties for solving the specific difficulties enumerated above. It consists of matching theoretical peptide sequences issued from a database with a structured representation of the source MS/MS spectrum. The representation is similar to the spectrum graphs commonly used by de novo sequencing software. The identification process involves the parsing of the graph in order to emphasize relevant sections for each theoretical sequence, and leads to a list of peptides ranked by a correlation score. The parsing of the graph, which can be a highly combinatorial task, is performed by a bio-inspired algorithm called Ant Colony Optimization algorithm.  相似文献   

16.
Liquid chromatography coupled tandem mass spectrometry (LC‐MS/MS) is an important technique for detecting peptides in proteomics studies. Here, we present an open source software tool, termed IPeak, a peptide identification pipeline that is designed to combine the Percolator post‐processing algorithm and multi‐search strategy to enhance the sensitivity of peptide identifications without compromising accuracy. IPeak provides a graphical user interface (GUI) as well as a command‐line interface, which is implemented in JAVA and can work on all three major operating system platforms: Windows, Linux/Unix and OS X. IPeak has been designed to work with the mzIdentML standard from the Proteomics Standards Initiative (PSI) as an input and output, and also been fully integrated into the associated mzidLibrary project, providing access to the overall pipeline, as well as modules for calling Percolator on individual search engine result files. The integration thus enables IPeak (and Percolator) to be used in conjunction with any software packages implementing the mzIdentML data standard. IPeak is freely available and can be downloaded under an Apache 2.0 license at https://code.google.com/p/mzidentml‐lib/ .  相似文献   

17.
The identification of proteins in proteomics experiments is usually based on mass information derived from tandem mass spectrometry data. To improve the performance of the identification algorithms, additional information available in the fragment peak intensity patterns has been shown to be useful. In this study, we consider the effect of iTRAQ labeling on the fragment peak intensity patterns of singly charged peptides from MALDI tandem MS data. The presence of an iTRAQ-modified basic group on the N-terminus leads to a more pronounced set of b-ion peaks and distinct changes in the abundance of specific peptide types. We performed a simple intensity prediction by using a decision-tree machine learning approach and were able to show that the relative ion abundance in a spectrum can be correctly predicted and distinguished from closely related sequences. This information will be useful for the development of improved method-specific intensity-based protein identification algorithms.  相似文献   

18.
One of the major challenges for large scale proteomics research is the quality evaluation of results. Protein identification from complex biological samples or experimental setups is often a manual and subjective task which lacks profound statistical evaluation. This is not feasible for high-throughput proteomic experiments which result in large datasets of thousands of peptides and proteins and their corresponding mass spectra. To improve the quality, reliability and comparability of scientific results, an estimation of the rate of erroneously identified proteins is advisable. Moreover, scientific journals increasingly stipulate that articles containing considerable MS data should be subject to stringent statistical evaluation. We present a newly developed easy-to-use software tool enabling quality evaluation by generating composite target-decoy databases usable with all relevant protein search engines. This tool, when used in conjunction with relevant statistical quality criteria, enables to reliably determine peptides and proteins of high quality, even for nonexperienced users (e.g. laboratory staff, researchers without programming knowledge). Different strategies for building decoy databases are implemented and the resulting databases are characterized and compared. The quality of protein identification in high-throughput proteomics is usually measured by the false positive rate (FPR), but it is shown that the false discovery rate (FDR) delivers a more meaningful, robust and comparable value.  相似文献   

19.
In high-throughput proteomics the development of computational methods and novel experimental strategies often rely on each other. In certain areas, mass spectrometry methods for data acquisition are ahead of computational methods to interpret the resulting tandem mass spectra. Particularly, although there are numerous situations in which a mixture tandem mass spectrum can contain fragment ions from two or more peptides, nearly all database search tools still make the assumption that each tandem mass spectrum comes from one peptide. Common examples include mixture spectra from co-eluting peptides in complex samples, spectra generated from data-independent acquisition methods, and spectra from peptides with complex post-translational modifications. We propose a new database search tool (MixDB) that is able to identify mixture tandem mass spectra from more than one peptide. We show that peptides can be reliably identified with up to 95% accuracy from mixture spectra while considering only a 0.01% of all possible peptide pairs (four orders of magnitude speedup). Comparison with current database search methods indicates that our approach has better or comparable sensitivity and precision at identifying single-peptide spectra while simultaneously being able to identify 38% more peptides from mixture spectra at significantly higher precision.  相似文献   

20.
Beer I  Barnea E  Ziv T  Admon A 《Proteomics》2004,4(4):950-960
Tandem mass spectrometry (MS/MS), coupled with liquid chromatography (LC), is a powerful tool for the analysis and comparison of complex protein and peptide mixtures. However, the extremely large amounts of data that result from the process are very complex and difficult to analyze. We show how the clustering of similar spectra from multiple LC-MS/MS runs can help in data management and improve the analysis of complex peptide mixtures. The major effect of spectrum clustering is the reduction of the huge amounts of data to a manageable size. As a result, analysis time is shorter and more data can be stored for further analysis. Furthermore, spectrum quality improvement allows the identification of more peptides with greater confidence, the comparison of complex peptide mixtures is facilitated, and the entire proteomics project is presented in concise form. Pep-Miner is an advanced software tool that implements these clustering-based applications. It proved useful in several comparative proteomics projects involving lung cancer cells and various other cell types. In one of these projects, Pep-Miner reduced 517 000 spectra to 20 900 clusters and identified 2518 peptides derived from 830 proteins. Clustering and identification lasted less than two hours on an IBM Thinkpad T23 computer (laptop). Pep-Miner's unique properties make it a very useful tool for large-scale shotgun proteomics projects.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号