Similar literature
20 similar documents found.
1.
Tandem mass spectrometry fragments a large number of molecules of the same peptide sequence into charged molecules of prefix and suffix peptide subsequences and then measures the mass/charge ratios of these ions. The de novo peptide sequencing problem is to reconstruct the peptide sequence from given tandem mass spectral data of k ions. By implicitly transforming the spectral data into an NC-spectrum graph G = (V, E) where |V| = 2k + 2, we can solve this problem in O(|V||E|) time and O(|V|^2) space using dynamic programming. For an ideal noise-free spectrum with only b- and y-ions, we improve the algorithm to O(|V| + |E|) time and O(|V|) space. Our approach can be further used to discover a modified amino acid in O(|V||E|) time. The algorithms have been implemented and tested on experimental data.
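
The node count |V| = 2k + 2 can be illustrated with a small sketch: each of the k observed peaks may be read either as a prefix (b) ion or as a suffix (y) ion, giving two candidate prefix masses per peak, plus the two endpoints. The code below is only an illustration under stated assumptions (singly charged ions, standard monoisotopic masses, a handful of residues, hypothetical function names); it is not the paper's algorithm.

```python
# Monoisotopic residue masses in Da for a few amino acids (standard values).
RESIDUE = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
           "V": 99.06841, "T": 101.04768, "L": 113.08406, "K": 128.09496,
           "E": 129.04259, "R": 156.10111}
WATER = 18.010565
PROTON = 1.007276


def by_ions(peptide):
    """Return singly charged b- and y-ion m/z lists for every backbone split."""
    total = sum(RESIDUE[aa] for aa in peptide)
    prefix = 0.0
    b, y = [], []
    for aa in peptide[:-1]:
        prefix += RESIDUE[aa]
        b.append(prefix + PROTON)                   # prefix fragment
        y.append(total - prefix + WATER + PROTON)   # complementary suffix fragment
    return b, y


def candidate_nodes(peaks, residue_total):
    """Two candidate prefix masses per peak plus the two endpoints (2k + 2 values)."""
    nodes = [0.0, residue_total]
    for mz in peaks:
        nodes.append(mz - PROTON)                            # peak read as a b-ion
        nodes.append(residue_total - (mz - WATER - PROTON))  # peak read as a y-ion
    return sorted(nodes)
```

For an ideal spectrum, every true prefix mass appears among the candidates, so a sequencing algorithm only has to choose a consistent path through them.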

2.
For the identification of peptides with tandem mass spectrometry (MS/MS), many software tools rely on the comparison between an experimental spectrum and a theoretically predicted spectrum. Consequently, the accurate prediction of the theoretical spectrum from a peptide sequence can potentially improve the peptide identification performance and is an important problem for mass spectrometry-based proteomics. In this study a new approach, called MS-Simulator, is presented for predicting the y-ion intensities in the spectrum of a given peptide. The new approach focuses on the accurate prediction of the relative intensity ratio between every two adjacent y-ions. The theoretical spectrum can then be derived from these ratios. The prediction of a ratio is a closed-form equation that involves up to five consecutive amino acids near the two y-ions and the two peptide termini. Compared with another existing spectrum prediction tool, MassAnalyzer, the new approach not only simplifies the computation, but also improves the prediction accuracy.
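
How a spectrum follows from adjacent-ion ratios can be sketched as a cumulative product. The snippet assumes each ratio is defined as I(y[i+1]) / I(y[i]) and that all ratios are positive; the function name is hypothetical, and the closed-form ratio predictor of MS-Simulator itself is not reproduced here.

```python
def intensities_from_ratios(ratios):
    """Turn adjacent-ion intensity ratios into relative intensities (max = 1)."""
    values = [1.0]                       # arbitrary starting intensity for the first ion
    for r in ratios:
        values.append(values[-1] * r)    # I[i+1] = I[i] * ratio[i]
    peak = max(values)
    return [v / peak for v in values]    # normalize to the base peak
```

Because only ratios are predicted, the absolute scale is arbitrary, which is why the result is normalized to the most intense ion.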

3.
In high-throughput proteomics the development of computational methods and novel experimental strategies often rely on each other. In certain areas, mass spectrometry methods for data acquisition are ahead of computational methods to interpret the resulting tandem mass spectra. Particularly, although there are numerous situations in which a mixture tandem mass spectrum can contain fragment ions from two or more peptides, nearly all database search tools still make the assumption that each tandem mass spectrum comes from one peptide. Common examples include mixture spectra from co-eluting peptides in complex samples, spectra generated from data-independent acquisition methods, and spectra from peptides with complex post-translational modifications. We propose a new database search tool (MixDB) that is able to identify mixture tandem mass spectra from more than one peptide. We show that peptides can be reliably identified with up to 95% accuracy from mixture spectra while considering only 0.01% of all possible peptide pairs (four orders of magnitude speedup). Comparison with current database search methods indicates that our approach has better or comparable sensitivity and precision at identifying single-peptide spectra while simultaneously being able to identify 38% more peptides from mixture spectra at significantly higher precision.

4.
We report on the effectiveness of CID, HCD, and ETD for LC-FT MS/MS analysis of peptides using a tandem linear ion trap-Orbitrap mass spectrometer. A range of software tools and analysis parameters were employed to explore the use of CID, HCD, and ETD to identify peptides (isolated from human blood plasma) without the use of specific "enzyme rules". In the evaluation of an FDR-controlled SEQUEST scoring method, the use of accurate masses for fragments increased the number of identified peptides (by ~50%) compared to the use of conventional low accuracy fragment mass information, and CID provided the largest contribution to the identified peptide data sets compared to HCD and ETD. The FDR-controlled Mascot scoring method provided significantly fewer peptide identifications than SEQUEST (by 1.3- to 2.3-fold) and CID, HCD, and ETD provided similar contributions to identified peptides. Evaluation of de novo sequencing and the UStags method for more intense fragment ions revealed that HCD afforded more contiguous residues (e.g., ≥ 7 amino acids) than either CID or ETD. Both the FDR-controlled SEQUEST and Mascot scoring methods provided peptide data sets that were affected by the decoy database used and mass tolerances applied (e.g., identical peptides between data sets could be limited to ~70%), while the UStags method provided the most consistent peptide data sets (>90% overlap). The m/z ranges in which CID, HCD, and ETD contributed the largest number of peptide identifications were substantially overlapping. This work suggests that the three peptide ion fragmentation methods are complementary and that maximizing the number of peptide identifications benefits significantly from a careful match with the informatics tools and methods applied. These results also suggest that the decoy strategy may inaccurately estimate identification FDRs.

5.
As experimental technologies for characterization of proteomes emerge, bioinformatic analysis of the data becomes essential. Separation and identification technologies currently based on two-dimensional gels/mass spectrometry provide the inherent analytical power required. This strategy involves protein spot digestion and accurate mass mapping together with computational interrogation of available databases for protein functional identification. When either no exact match is found or when the possible matches only partially account for molecular weights actually observed, peptide sequencing by tandem mass spectrometry has emerged as the methodology of choice to provide the basic additional information required. To evaluate the capabilities of bioinformatics methods employed for identifying homologs of a protein of interest, we attempted to identify the major proteins from the 20 S proteasome of Trypanosoma brucei using sequence information determined using mass spectrometry. The results suggest that neither the traditional query engines, BLAST and FASTA, nor specialized software developed for analysis of sequence information obtained by mass spectrometry is able to identify even closely related sequences at statistically significant scores. To address this deficit, new bioinformatics approaches were developed for concomitant use of the multiple fragments of short sequence typically available from methods of tandem mass spectrometry. These approaches rely on the occurrence of congruence across searches of multiple fragments from a single protein. This method resulted in sharply better statistical significance values for correct hits in the database output relative to that achieved for independent searches using single sequence fragments.

6.
The applicability of a trypsin-based monolithic bioreactor coupled on-line with LC/MS/MS for rapid proteolytic digestion and protein identification is described here. Dilute samples are passed through the bioreactor for generation of proteolytic fragments in less than 10 min. After digestion and peptide separation, electrospray ionization tandem mass spectrometry is used to generate a peptide map and to identify proteolytic peptides by correlating their fragmentation spectra with amino acid sequences from a protein database. By digesting picomoles of proteins, sufficient data from ESI and MS/MS were obtained to unambiguously identify proteins alone and in serum samples. This approach was also extended to locate mutation sites in beta-lactoglobulin A and B variants.

7.
Highly sensitive peptide fragmentation and identification in sequence databases is a cornerstone of proteomics. Previously, a two-layered strategy consisting of MALDI peptide mass fingerprinting followed by electrospray tandem mass spectrometry of the unidentified proteins has been successfully employed. Here, we describe a high-sensitivity/high-throughput system based on orthogonal MALDI tandem mass spectrometry (o-MALDI) and the automated recognition of fragments corresponding to the N- and C-terminal amino acid residues. Robotic deposition of samples onto hydrophobic anchor substrates is employed, and peptide spectra are acquired automatically. The pulsing feature of the QSTAR o-MALDI mass spectrometer enhances the low mass region of the spectra by approximately 1 order of magnitude. Software has been developed to automatically recognize characteristic features in the low mass region (such as the y1 ion of tryptic peptides), maintaining high mass accuracy even with very low count events. Typically, the sum of the N-terminal two ions (b2 ion), the third N-terminal ion (b3 ion), and the two C-terminal fragments of the peptide (y1 and y2) can be determined. Given mass accuracy in the low ppm range, peptide end sequencing on one or two tryptic peptides is sufficient to uniquely identify a protein from gel samples in the low silver-stained range.

8.
Bandeira N. BioTechniques 2007, 42(6): 687, 689, 691 passim
Significant technological advances have accelerated high-throughput proteomics to the automated generation of millions of tandem mass spectra on a daily basis. In such a setup, the desire for greater sequence coverage combines with standard experimental procedures to commonly yield multiple tandem mass spectra from overlapping peptides; typical observations include peptides differing by one or two terminal amino acids and spectra from modified and unmodified variants of the same peptides. In a departure from the traditional spectrum identification algorithms that analyze each tandem mass spectrum in isolation, spectral networks define a new computational approach that instead finds and simultaneously interprets sets of spectra from overlapping peptides. In shotgun protein sequencing, spectral networks capitalize on the redundant sequence information in the aligned spectra to deliver the longest and most accurate de novo sequences ever reported for ion trap data. Also, by combining spectra from multiple modified and unmodified variants of the same peptides, spectral networks are able to bypass the dominant guess/confirm approach to the identification of posttranslational modifications and alternatively discover modifications and highly modified peptides directly from experimental data. Open-source implementations of these algorithms may be downloaded from peptide.ucsd.edu.

9.
Mascot, a database-search algorithm, is used to deduce an amino acid sequence from a peptide tandem mass spectrum. The magnitude of the Ions score associated with each peptide mostly reflects the extent of b-y ion matching in a collision-induced dissociation spectrum. Recently, several studies have reported peptides identified with abnormally low Ions scores. While a majority of the spectra in these studies may be correctly assigned, low-scoring spectra could lack discernible b-y ion fragments needed to clearly delineate a peptide sequence. It appears that low-scoring identification may be predicated primarily on judgmental parent ion mass accuracy and that justification to include such low-scoring peptides may be based on inaccurate false discovery rate modeling. It is likely that additional scientific experimentation is needed or appropriate methodologies adopted before substandard fragment ion matching can be considered proof of peptide identification.

10.
Reliable statistical validation of peptide and protein identifications is a top priority in large-scale mass spectrometry-based proteomics. PeptideProphet is one of the computational tools commonly used for assessing the statistical confidence in peptide assignments to tandem mass spectra obtained using database search programs such as SEQUEST, MASCOT, or X! TANDEM. We present two flexible methods, the variable component mixture model and the semiparametric mixture model, that remove the restrictive parametric assumptions in the mixture modeling approach of PeptideProphet. Using a control protein mixture data set generated on a linear ion trap Fourier transform (LTQ-FT) mass spectrometer, we demonstrate that both methods improve on parametric models in terms of the accuracy of probability estimates and the power to detect correct identifications while controlling the false discovery rate to the same degree. The statistical approaches presented here require that the data set contain a sufficient number of decoy (known to be incorrect) peptide identifications, which can be obtained using the target-decoy database search strategy.
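
The role of decoy identifications can be illustrated with the simplest target-decoy estimate: above a score threshold, the number of decoy hits approximates the number of incorrect target hits. This is a generic sketch with hypothetical names and made-up input conventions, not the mixture models described in the abstract.

```python
def decoy_fdr(psms, threshold):
    """Estimate FDR at a score threshold from (score, is_decoy) pairs.

    Assumes targets and decoys were searched together and that decoy hits
    are as likely as incorrect target hits.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0
```

With few decoys above the threshold the estimate becomes coarse, which is one reason a sufficient number of decoy identifications is needed.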

11.
The sequence tag-based peptide identification methods are a promising alternative to the traditional database search approach. However, a more comprehensive analysis, optimization, and comparison with established methods are necessary before these methods can gain widespread use in the proteomics community. Using the InsPecT open source code base (Tanner et al., Anal. Chem. 2005, 77, 4626-39), we present an improved sequence tag generation method that directly incorporates multicharged fragment ion peaks present in many tandem mass spectra of higher charge states. We also investigate the performance of sequence tagging under different settings using control data sets generated on five different types of mass spectrometers, as well as using a complex phosphopeptide-enriched sample. We also demonstrate that additional modeling of InsPecT search scores using a semiparametric approach incorporating the accuracy of the precursor ion mass measurement provides additional improvement in the ability to discriminate between correct and incorrect peptide identifications. The overall superior performance of the sequence tag-based peptide identification method is demonstrated by comparison with a commonly used SEQUEST/PeptideProphet approach.

12.
Using proteomics to mine genome sequences (cited 2 times; self-citations: 0; citations by others: 2)
We present a method for mining unannotated or annotated genome sequences with proteomic data to identify open reading frames. The region of a genome coding for a protein sequence is identified by using information from the analysis of proteins and peptides with MALDI-TOF mass spectrometry. The raw genome sequence or any unassembled contigs of an organism are theoretically cleaved into a number of equal-sized but overlapping fragments, and these are then translated in all six frames into a series of virtual proteins. Each virtual protein is then subjected to a theoretical enzymatic digestion. Standard proteomic sample preparation methods are used to separate, array, and digest the proteins of interest to peptides. The masses of the resulting peptides are measured using mass spectrometry and compared to the theoretical peptide masses of the virtual proteins. The region of the genome responsible for coding for a particular protein can then be identified when there are a large number of hits between peptides from the protein and peptides from the virtual protein. The method makes no assumptions about the location of a protein in a particular gene sequence or the positions or types of start and stop codons. To illustrate this approach, all 773 proteins of Pseudomonas aeruginosa contained in SWISS-PROT were used to theoretically test the method and optimize parameters. Increasing the size of the virtual proteins results in an overall improvement in the ability to detect the coding region, at the cost of decreasing the sensitivity of the method for smaller proteins. Increasing the minimum number of matching peptides, lowering the mass error tolerance, or increasing the signal-to-noise ratio of the simulated mass spectrum improves the ability to detect coding regions. The method is further demonstrated on experimental data from Mycobacterium tuberculosis and is also shown to work with eukaryotic organisms (e.g., Homo sapiens).
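
The six-frame translation and theoretical digestion steps can be sketched as follows. The sketch assumes the standard genetic code, an uppercase A/C/G/T input, and trypsin as the enzyme (cleavage after K or R unless followed by P); the abstract does not name the enzyme, and all function names are hypothetical. Stop codons are kept as `*` rather than ending translation.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def revcomp(dna):
    """Reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def six_frames(dna):
    """Translate three forward and three reverse-strand reading frames."""
    frames = []
    for strand in (dna, revcomp(dna)):
        for offset in range(3):
            codons = (strand[i:i + 3] for i in range(offset, len(strand) - 2, 3))
            frames.append("".join(CODON[c] for c in codons))
    return frames


def tryptic_peptides(protein):
    """Cleave after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])   # trailing peptide without a cleavage site
    return peptides
```

The masses of the resulting virtual peptides would then be compared with the measured peptide masses to count hits per genome fragment.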

13.
A novel hybrid methodology for the automated identification of peptides via de novo integer linear optimization, local database search, and tandem mass spectrometry is presented in this article. A modified version of the de novo identification algorithm PILOT is utilized to construct accurate de novo peptide sequences. A modified version of the local database search tool FASTA is used to query these de novo predictions against the nonredundant protein database to resolve any low-confidence amino acids in the candidate sequences. The computational burden associated with performing several alignments is alleviated with the use of distributive computing. Extensive computational studies are presented for this new hybrid methodology, as well as comparisons with MASCOT for a set of 38 quadrupole time-of-flight (QTOF) and 380 OrbiTrap tandem mass spectra. The results for our proposed hybrid method for the OrbiTrap spectra are also compared with a modified version of PepNovo, which was trained for use on high-precision tandem mass spectra, and the tag-based method InsPecT. The de novo sequences of PILOT and PepNovo are also searched against the nonredundant protein database using CIDentify to compare with the alignments achieved by our modifications of FASTA. The comparative studies demonstrate the excellent peptide identification accuracy gained from combining the strengths of our de novo method, which is based on integer linear optimization, and database-driven search methods.

14.
We evaluate statistical models used in two-hypothesis tests for identifying peptides from tandem mass spectrometry data. The null hypothesis H(0), that a peptide matches a spectrum by chance, requires information on the probability of by-chance matches between peptide fragments and peaks in the spectrum. Likewise, the alternate hypothesis H(A), that the spectrum is due to a particular peptide, requires probabilities that the peptide fragments would indeed be observed if it were the causative agent. We compare models for these probabilities by determining the identification rates produced by the models using an independent data set. The initial models use different probabilities depending on fragment ion type, but uniform probabilities for each ion type across all of the labile bonds along the backbone. More sophisticated models for probabilities under both H(A) and H(0) are introduced that do not assume uniform probabilities for each ion type. In addition, the performance of these models using a standard likelihood model is compared to an information theory approach derived from the likelihood model. Also, a simple but effective model for incorporating peak intensities is described. Finally, a support-vector machine is used to discriminate between correct and incorrect identifications based on multiple characteristics of the scoring functions. The results are shown to reduce the misidentification rate significantly when compared to a benchmark cross-correlation-based approach.
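
A two-hypothesis score of this kind can be sketched as a log-likelihood ratio over the predicted fragments of one ion type. The sketch assumes independent fragments and a single uniform probability per hypothesis (the "initial model" case); the function name and the probability values used in any call are illustrative assumptions, not figures from the study.

```python
import math


def log_likelihood_ratio(matched, p_a, p_0):
    """Sum log P(observation | H_A) / P(observation | H_0) over fragments.

    matched: one boolean per predicted fragment (True if a peak matched).
    p_a:     probability that a fragment is observed if the peptide is correct.
    p_0:     probability that a fragment matches a peak by chance.
    """
    score = 0.0
    for hit in matched:
        if hit:
            score += math.log(p_a / p_0)
        else:
            score += math.log((1 - p_a) / (1 - p_0))
    return score
```

A positive score favors H(A); the more sophisticated models in the abstract would replace the two constants with bond- or position-specific probabilities.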

15.

Background

Liquid chromatography combined with tandem mass spectrometry is an important tool in proteomics for peptide identification. Liquid chromatography temporally separates the peptides in a sample. The peptides that elute one after another are analyzed via tandem mass spectrometry by measuring the mass-to-charge ratio of a peptide and its fragments. De novo peptide sequencing is the problem of reconstructing the amino acid sequence of a peptide from this measurement data. Past de novo sequencing algorithms solely consider the mass spectrum of the fragments for reconstructing a sequence.

Results

We propose to additionally exploit the information obtained from liquid chromatography. We study the problem of computing a sequence that is not only in accordance with the experimental mass spectrum, but also with the chromatographic retention time. We consider three models for predicting the retention time and develop algorithms for de novo sequencing for each model.

Conclusions

Based on an evaluation of two prediction models on experimental data from synthesized peptides, we conclude that the identification rates are improved by exploiting the chromatographic information. In our evaluation, we compare our algorithms using the retention time information with algorithms using the same scoring model, but not the retention time.

16.
A computer algorithm is described that utilizes both Edman and mass spectrometric data for simultaneous determination of the amino acid sequences of several peptides in a mixture. Gas phase sequencing of a peptide mixture results in a list of observed amino acids for each cycle of Edman degradation, which by itself may not be informative and typically requires reanalysis following additional chromatographic steps. Tandem mass spectrometry, on the other hand, has a proven ability to analyze sequences of peptides present in mixtures. However, mass spectrometric data may lack a complete set of sequence-defining fragment ions, so that more than one possible sequence may account for the observed fragment ions. A combination of the two types of data reduces the ambiguity inherent in each. The algorithm first utilizes the Edman data to determine all hypothetical sequences with a calculated mass equal to the observed mass of one of the peptides present in the mixture. These sequences are then assigned figures of merit according to how well each of them accounts for the fragment ions in the tandem mass spectrum of that peptide. The program was tested on tryptic and chymotryptic peptides from hen lysozyme, and the results are compared with those of another computer program that uses only mass spectral data for peptide sequencing. In order to assess the utility of this method, the program is tested using simulated mixtures of varying complexity and tandem mass spectra of varying quality.

17.
A new method for enhancing peptide ion identification in proteomics analyses using ion mobility data is presented. Ideally, direct comparisons of experimental drift times (t(D)) with a standard mobility database could be used to rank candidate peptide sequence assignments. Such a database would represent only a fraction of sequences in protein databases and significant difficulties associated with the verification of data for constituent peptide ions would exist. A method that employs intrinsic amino acid size parameters to obtain ion mobility predictions that can be used to rank candidate peptide ion assignments is proposed. Intrinsic amino acid size parameters have been determined for doubly charged peptide ions from an annotated yeast proteome. Predictions of ion mobilities using the intrinsic size parameters are more accurate than those obtained from a polynomial fit to t(D) versus molecular weight data. More than a 2-fold improvement in prediction accuracy has been observed for a group of arginine-terminated peptide ions 12 residues in length. The use of this predictive enhancement as a means to aid peptide ion identification is discussed, and a simple peptide ion scoring scheme is presented.

18.
Shotgun tandem mass spectrometry-based peptide sequencing using programs such as SEQUEST allows high-throughput identification of peptides, which in turn allows the identification of corresponding proteins. We have applied a machine learning algorithm, called the support vector machine, to discriminate between correctly and incorrectly identified peptides using SEQUEST output. Each peptide was characterized by SEQUEST-calculated features such as delta Cn and Xcorr, measurements such as precursor ion current and mass, and additional calculated parameters such as the fraction of matched MS/MS peaks. The trained SVM classifier performed significantly better than previous cutoff-based methods at separating positive from negative peptides. Positive and negative peptides were more readily distinguished in training set data acquired on a QTOF, compared to an ion trap mass spectrometer. The use of 13 features, including four new parameters, significantly improved the separation between positive and negative peptides. Use of the support vector machine and these additional parameters resulted in a more accurate interpretation of peptide MS/MS spectra and is an important step toward automated interpretation of peptide tandem mass spectrometry data in proteomics.

19.
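A scoring method for peptide identification by tandem mass spectrometry (cited 1 time; self-citations: 0; citations by others: 1)
A number of high-throughput scoring algorithms that evaluate the agreement between a tandem mass spectrum and the theoretical spectra of peptides in a database are currently used in shotgun proteomics research. In practice, however, these methods produce a large number of incorrect peptide identifications. Here we propose a new scoring algorithm for identifying peptide sequences from tandem mass spectra. The algorithm jointly considers the probability of occurrence of different ion types in the tandem mass spectrum, the number of enzymatic cleavage sites in the peptide, the degree of matching between theoretical and experimental ions, and the matching pattern. Tests on large tandem mass spectrometry data sets show that PepSearch, the software developed from this algorithm, achieves better identification accuracy than SEQUEST, currently the most widely used software. PepSearch can be downloaded from http://compbio.sibsnet.org/projects/pepsearch.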

20.
Identification of proteins from the mass spectra of peptide fragments generated by proteolytic cleavage using database searching has become one of the most powerful techniques in proteome science, capable of rapid and efficient protein identification. Using computer simulation, we have studied how the application of chemical derivatisation techniques may improve the efficiency of protein identification from mass spectrometric data. These approaches enhance ion yield and lead to the promotion of specific ions and fragments, yielding additional database search information. The impact of three alternative techniques has been assessed by searching representative proteome databases for both single proteins and simple protein mixtures. For example, by reliably promoting fragmentation of singly-charged peptide ions at aspartic acid residues after homoarginine derivatisation, 82% of yeast proteins can be unambiguously identified from a single typical peptide-mass datum, with a measured mass accuracy of 50 ppm, by using the associated secondary ion data. The extra search information also provides a means to confidently identify proteins in protein mixtures where only limited data are available. Furthermore, the inclusion of limited sequence information for the peptides can compensate and exceed the search efficiency available via high-accuracy searches of around 5 ppm, suggesting that this is a potentially useful approach for simple protein mixtures routinely obtained from two-dimensional gels.
