Similar articles
 20 similar articles found (search time: 476 ms)
1.
We report on a new de novo peptide sequencing algorithm that uses spectral graph partitioning. In this approach, relationships between m/z peaks are represented by attractive and repulsive springs, and the vibrational modes of the spring system are used to infer information about the peaks (such as "likely b-ion" or "likely y-ion"). We demonstrate the effectiveness of this approach by comparison with other de novo sequencers on test sets of ion-trap and QTOF spectra, including spectra of mixtures of peptides. On all datasets, we outperform the other sequencers. Along with spectral graph theory techniques, the new de novo sequencer EigenMS incorporates another improvement of independent interest: robust statistical methods for recalibration of time-of-flight mass measurements. Robust recalibration greatly outperforms simple least-squares recalibration, achieving about three times the accuracy for one QTOF dataset.
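The contrast between least-squares and robust recalibration mentioned in this abstract can be illustrated with a small sketch. The abstract does not say which robust method EigenMS uses, so the Theil-Sen estimator below is a stand-in chosen for illustration, and the calibration points, the linear error model, and all function names are hypothetical.

```python
from statistics import median


def least_squares_fit(xs, ys):
    """Ordinary least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def theil_sen_fit(xs, ys):
    """Robust line: median of all pairwise slopes, then median intercept.

    Theil-Sen is an assumption here, not the method named in the abstract.
    """
    slopes = [
        (ys[j] - ys[i]) / (xs[j] - xs[i])
        for i in range(len(xs))
        for j in range(i + 1, len(xs))
        if xs[j] != xs[i]
    ]
    slope = median(slopes)
    intercept = median([y - slope * x for x, y in zip(xs, ys)])
    return intercept, slope


# Hypothetical calibrant peaks: measured m/z and observed mass error (Da).
measured = [200.0, 400.0, 600.0, 800.0, 1000.0, 1200.0]
# Assumed true drift: error = 0.01 + 0.00002 * m/z.
errors = [0.01 + 0.00002 * mz for mz in measured]
errors[3] += 0.5  # one mis-assigned calibrant acts as a gross outlier

ols_intercept, ols_slope = least_squares_fit(measured, errors)
ts_intercept, ts_slope = theil_sen_fit(measured, errors)

print(f"least squares: slope={ols_slope:.2e}, intercept={ols_intercept:.4f}")
print(f"Theil-Sen:     slope={ts_slope:.2e}, intercept={ts_intercept:.4f}")
```

In this toy data set a single bad calibrant pulls the least-squares slope well away from the assumed drift, while the median-based fit is essentially unaffected.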

2.
In high-throughput proteomics the development of computational methods and novel experimental strategies often rely on each other. In certain areas, mass spectrometry methods for data acquisition are ahead of computational methods to interpret the resulting tandem mass spectra. Particularly, although there are numerous situations in which a mixture tandem mass spectrum can contain fragment ions from two or more peptides, nearly all database search tools still make the assumption that each tandem mass spectrum comes from one peptide. Common examples include mixture spectra from co-eluting peptides in complex samples, spectra generated from data-independent acquisition methods, and spectra from peptides with complex post-translational modifications. We propose a new database search tool (MixDB) that is able to identify mixture tandem mass spectra from more than one peptide. We show that peptides can be reliably identified with up to 95% accuracy from mixture spectra while considering only 0.01% of all possible peptide pairs (four orders of magnitude speedup). Comparison with current database search methods indicates that our approach has better or comparable sensitivity and precision at identifying single-peptide spectra while simultaneously being able to identify 38% more peptides from mixture spectra at significantly higher precision.

3.
The random accumulation of variations in the human genome over time implicitly encodes a history of how human populations have arisen, dispersed, and intermixed since we emerged as a species. Reconstructing that history is a challenging computational and statistical problem but has important applications both to basic research and to the discovery of genotype-phenotype correlations. We present a novel approach to inferring human evolutionary history from genetic variation data. We use the idea of consensus trees, a technique generally used to reconcile species trees from divergent gene trees, adapting it to the problem of finding robust relationships within a set of intraspecies phylogenies derived from local regions of the genome. Validation on both simulated and real data shows the method to be effective in recapitulating known true structure of the data closely matching our best current understanding of human evolutionary history. Additional comparison with results of leading methods for the problem of population substructure assignment verifies that our method provides comparable accuracy in identifying meaningful population subgroups in addition to inferring relationships among them. The consensus tree approach thus provides a promising new model for the robust inference of substructure and ancestry from large-scale genetic variation data.

4.
Peptide identification by tandem mass spectrometry is the dominant proteomics workflow for protein characterization in complex samples. The peptide fragmentation spectra generated by these workflows exhibit characteristic fragmentation patterns that can be used to identify the peptide. In other fields, where the compounds of interest do not have the convenient linear structure of peptides, fragmentation spectra are identified by comparing new spectra with libraries of identified spectra, an approach called spectral matching. In contrast to sequence-based tandem mass spectrometry search engines used for peptides, spectral matching can make use of the intensities of fragment peaks in library spectra to assess the quality of a match. We evaluate a hidden Markov model approach (HMMatch) to spectral matching, in which many examples of a peptide's fragmentation spectrum are summarized in a generative probabilistic model that captures the consensus and variation of each peak's intensity. We demonstrate that HMMatch has good specificity and superior sensitivity, compared to sequence database search engines such as X!Tandem. HMMatch achieves good results from relatively few training spectra, is fast to train, and can evaluate many spectra per second. A statistical significance model permits HMMatch scores to be compared with each other, and with other peptide identification tools, on a unified scale. HMMatch shows a similar degree of concordance with X!Tandem, Mascot, and NIST's MS Search, as they do with each other, suggesting that each tool can assign peptides to spectra that the others miss. Finally, we show that it is possible to extrapolate HMMatch models beyond a single peptide's training spectra to the spectra of related peptides, expanding the application of spectral matching techniques beyond the set of peptides previously observed.

5.
MOTIVATION: Ion-type identification is a fundamental problem in computational proteomics. Methods for accurate identification of ion types provide the basis for many mass spectrometry data interpretation problems, including (a) de novo sequencing, (b) identification of post-translational modifications and mutations and (c) validation of database search results. RESULTS: Here, we present a novel graph-theoretic approach for solving the problem of separating b ions from y ions in a set of tandem mass spectra. We represent each spectral peak as a node and consider two types of edges: type-1 edge connecting two peaks probably of the same ion types and type-2 edge connecting two peaks probably of different ion types. The problem of ion-separation is formulated and solved as a graph partition problem, which is to partition the graph into three subgraphs, representing b, y and other ions, respectively, through maximizing the total weight of type-1 edges while minimizing the total weight of type-2 edges within each partitioned subgraph. We have developed a dynamic programming algorithm for rigorously solving this graph partition problem and implemented it as a computer program PRIME (PaRtition of Ion types in tandem Mass spEctra). The tests on a large amount of simulated mass spectra and 19 sets of high-quality experimental Fourier transform ion cyclotron resonance tandem mass spectra indicate that an accuracy level of approximately 90% for the separation of b and y ions was achieved. AVAILABILITY: The executable code of PRIME is available upon request. CONTACT: xyn@bmb.uga.edu.
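One piece of evidence that two peaks belong to different ion types is mass complementarity: for singly charged fragments, a b ion and its partner y ion sum to the neutral peptide mass plus two proton masses. The sketch below illustrates only that relationship. It is not PRIME's edge definition or its dynamic programming algorithm, and the function names, the example peptide, and the tolerance are assumptions made for illustration.

```python
# Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def b_ions(peptide):
    """Singly charged b-ion m/z values b1..b(n-1) (N-terminal prefixes)."""
    masses, total = [], 0.0
    for aa in peptide[:-1]:
        total += RESIDUE_MASS[aa]
        masses.append(total + PROTON)
    return masses


def y_ions(peptide):
    """Singly charged y-ion m/z values y1..y(n-1) (C-terminal suffixes)."""
    masses, total = [], 0.0
    for aa in reversed(peptide[1:]):
        total += RESIDUE_MASS[aa]
        masses.append(total + WATER + PROTON)
    return masses


def complementary_pairs(peaks, parent_mass, tol=0.01):
    """Peak pairs whose sum matches neutral parent mass + 2 protons."""
    target = parent_mass + 2 * PROTON
    return [
        (p, q)
        for i, p in enumerate(peaks)
        for q in peaks[i + 1:]
        if abs(p + q - target) <= tol
    ]


peptide = "PEPTIDE"  # hypothetical example sequence
parent = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
peaks = sorted(b_ions(peptide) + y_ions(peptide))
pairs = complementary_pairs(peaks, parent)
for low, high in pairs:
    print(f"{low:9.4f} + {high:9.4f} = {low + high:9.4f}")
```

In a graph formulation like the one described, each such pair would be a natural candidate for an edge between peaks of different ion types; real spectra would also need handling of noise peaks, missing fragments, and higher charge states.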

6.
Identification of fusion proteins has contributed significantly to our understanding of cancer progression, yielding important predictive markers and therapeutic targets. While fusion proteins can be potentially identified by mass spectrometry, all previously found fusion proteins were identified using genomic (rather than mass spectrometry) technologies. This lack of MS/MS applications in studies of fusion proteins is caused by the lack of computational tools that are able to interpret mass spectra from peptides covering unknown fusion breakpoints (fusion peptides). Indeed, the number of potential fusion peptides is so large that the existing MS/MS database search tools become impractical even in the case of small genomes. We explore computational approaches to identifying fusion peptides, propose an algorithm for solving the fusion peptide identification problem, and analyze the performance of this algorithm on simulated data. We further illustrate how this approach can be modified for human exon prediction.

7.
A key problem in computational proteomics is distinguishing between correct and false peptide identifications. We argue that evaluating the error rates of peptide identifications is not unlike computing generating functions in combinatorics. We show that the generating functions and their derivatives (spectral energy and spectral probability) represent new features of tandem mass spectra that, similarly to Delta-scores, significantly improve peptide identifications. Furthermore, the spectral probability provides a rigorous solution to the problem of computing statistical significance of spectral identifications. The spectral energy/probability approach improves the sensitivity-specificity tradeoff of existing MS/MS search tools, addresses the notoriously difficult problem of "one-hit-wonders" in mass spectrometry, and often eliminates the need for decoy database searches. We therefore argue that the generating function approach has the potential to increase the number of peptide identifications in MS/MS searches.

8.
Despite a recent surge of interest in database-independent peptide identifications, accurate de novo peptide sequencing remains an elusive goal. While the recently introduced spectral network approach resulted in accurate peptide sequencing in low-complexity samples, its success depends on the chance of presence of spectra from overlapping peptides. On the other hand, while multistage mass spectrometry (collecting multiple MS3 spectra from each MS2 spectrum) can be applied to all spectra in a complex sample, there are currently no software tools for de novo peptide sequencing by multistage mass spectrometry. We describe a rigorous probabilistic framework for analyzing spectra of overlapping peptides and show how to apply it for multistage mass spectrometry. Our software results in both accurate de novo peptide sequencing from multistage mass spectra (despite the inferior quality of MS3 spectra) and improved interpretation of spectral networks. We further study the problem of de novo peptide sequencing with accurate parent mass (but inaccurate fragment masses), the protocol that may soon become the dominant mode of spectral acquisition. Most existing peptide sequencing algorithms (based on the spectrum graph approach) do not track the accurate parent mass and are thus not equipped for solving this problem. We describe a de novo peptide sequencing algorithm aimed at this experimental protocol and show that it improves the sequencing accuracy on both tandem and multistage mass spectrometry.

9.
Geometric algorithms for the analysis of 2D-electrophoresis gels.   Cited by: 1 (self-citations: 0, citations by others: 1)
In proteomics, two-dimensional gel electrophoresis (2-DE) is a separation technique for proteins. The resulting protein spots can be identified either by using picking robots and subsequent mass spectrometry or by visual cross inspection of a new gel image with an already analyzed master gel. Difficulties especially arise from inherent noise and irregular geometric distortions in 2-DE images. Aiming at the automated analysis of large series of 2-DE images, or at the even more difficult interlaboratory gel comparisons, the bottleneck is to solve the two most basic algorithmic problems with high quality: Identifying protein spots and computing a matching between two images. For the development of the analysis software CAROL at Freie Universität Berlin, we have reconsidered these two problems and obtained new solutions which rely on methods from computational geometry. Their novelties are: 1. Spot detection is also possible for complex regions formed by several "merged" (usually saturated) spots; 2. User-defined landmarks are not necessary for the matching. Furthermore, images for comparison are allowed to represent different parts of the entire protein pattern, which only partially "overlap." The implementation is done in a client server architecture to allow queries via the internet. We also discuss and point at related theoretical questions in computational geometry.

10.
Tandem mass spectrometry (MS/MS) has emerged as a cornerstone of proteomics owing in part to robust spectral interpretation algorithms. Widely used algorithms do not fully exploit the intensity patterns present in mass spectra. Here, we demonstrate that intensity pattern modeling improves peptide and protein identification from MS/MS spectra. We modeled fragment ion intensities using a machine-learning approach that estimates the likelihood of observed intensities given peptide and fragment attributes. From 1,000,000 spectra, we chose 27,000 with high-quality, nonredundant matches as training data. Using the same 27,000 spectra, intensity was similarly modeled with mismatched peptides. We used these two probabilistic models to compute the relative likelihood of an observed spectrum given that a candidate peptide is matched or mismatched. We used a 'decoy' proteome approach to estimate incorrect match frequency, and demonstrated that an intensity-based method reduces peptide identification error by 50-96% without any loss in sensitivity.

11.
MOTIVATION: Tandem mass spectrometry combined with sequence database searching is one of the most powerful tools for protein identification. As thousands of spectra are generated by a mass spectrometer in one hour, the speed of database searching is critical, especially when searching against a large sequence database, when the peptide is generated by some unknown or non-specific enzyme, or even when the target peptides have post-translational modifications (PTM). In practice, about 70-90% of the spectra have no match in the database. Many believe that a significant portion of them are due to peptides of non-specific digestions by unknown enzymes or amino acid modifications. In another case, scientists may choose to use some non-specific enzymes such as pepsin or thermolysin for proteolysis in proteomic study, in that not all proteins are amenable to be digested by some site-specific enzymes, and furthermore many digested peptides may not fall within the range of molecular weight suitable for mass spectrometry analysis. Interpreting mass spectra of these kinds will cost a lot of computational time of database search engines. OVERVIEW: The present study was designed to speed up the database searching process for both cases. More specifically, we employed an approach combining suffix tree data structure and spectrum graph. The suffix tree is used to preprocess the protein sequence database, while the spectrum graph is used to preprocess the tandem mass spectrum. We then search the suffix tree against the spectrum graph for candidate peptides. We design an efficient algorithm to compute a matching threshold with some statistical significance level, e.g. p = 0.01, for each spectrum, and use it to select candidate peptides. Then we rank these peptides using a SEQUEST-like scoring function. The algorithms were implemented and tested on experimental data. For post-translational modifications, we allow arbitrary number of any modification to a protein.
AVAILABILITY: The executable program and other supplementary materials are available online at: http://hto-c.usc.edu:8000/msms/suffix/.
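The spectrum graph idea in this abstract can be sketched as follows: peaks are nodes, and two peaks are connected when their mass difference matches an amino acid residue mass, so paths through the graph spell candidate sequence tags. This is a minimal illustration, not the authors' implementation: the suffix tree is replaced by a plain substring check, and the tolerance, peak list, protein strings, and function names are all hypothetical.

```python
# Monoisotopic residue masses in Da. I and L are isobaric, so only "L" is
# listed and database sequences are normalised to "L" before matching.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def build_spectrum_graph(peaks, tol=0.02):
    """Return sorted peaks and edges[i] = [(j, residue), ...] with j > i."""
    peaks = sorted(peaks)
    edges = {i: [] for i in range(len(peaks))}
    for i, low in enumerate(peaks):
        for j in range(i + 1, len(peaks)):
            diff = peaks[j] - low
            for residue, mass in RESIDUE_MASS.items():
                if abs(diff - mass) <= tol:
                    edges[i].append((j, residue))
    return peaks, edges


def longest_tag(edges, n_peaks):
    """Longest residue string readable along a path in the graph.

    Edges always point to a higher index, so a reverse sweep is a valid
    dynamic programme over this directed acyclic graph.
    """
    best = [""] * n_peaks
    for i in range(n_peaks - 1, -1, -1):
        for j, residue in edges[i]:
            candidate = residue + best[j]
            if len(candidate) > len(best[i]):
                best[i] = candidate
    return max(best, key=len)


def proteins_containing(tag, proteins):
    """Naive stand-in for the suffix tree lookup (I treated as L)."""
    return [p for p in proteins if tag in p.replace("I", "L")]


# Hypothetical b-ion ladder for the peptide PEPTIDE plus one noise peak.
observed = [98.06004, 227.10263, 300.0, 324.15539, 425.20307,
            538.28713, 653.31407]
sorted_peaks, graph = build_spectrum_graph(observed)
tag = longest_tag(graph, len(sorted_peaks))
hits = proteins_containing(tag, ["MKPEPTIDEK", "MAGICSEQ"])
print(tag, hits)
```

A suffix tree serves the same purpose as `proteins_containing` here but answers each tag query in time proportional to the tag length rather than the database size, which is where the speedup described in the abstract would come from.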

12.
In quantitative mass spectrometry-based proteomics, the metabolic incorporation of a single source of 15N-labeled nitrogen has many advantages over using stable isotope-labeled amino acids. However, the lack of a robust computational framework for analyzing the resulting spectra has impeded wide use of this approach. We have addressed this challenge by introducing a new computational methodology for analyzing 15N spectra in which quantification is integrated with identification. Application of this method to an Escherichia coli growth transition reveals significant improvement in quantification accuracy over previous methods.

13.
In the last two years, because of advances in protein separation and mass spectrometry, top-down mass spectrometry moved from analyzing single proteins to analyzing complex samples and identifying hundreds and even thousands of proteins. However, computational tools for database search of top-down spectra against protein databases are still in their infancy. We describe MS-Align+, a fast algorithm for top-down protein identification based on spectral alignment that enables searches for unexpected post-translational modifications. We also propose a method for evaluating statistical significance of top-down protein identifications and further benchmark various software tools on two top-down data sets from Saccharomyces cerevisiae and Salmonella typhimurium. We demonstrate that MS-Align+ significantly increases the number of identified spectra as compared with MASCOT and OMSSA on both data sets. Although MS-Align+ and ProSightPC have similar performance on the Salmonella typhimurium data set, MS-Align+ outperforms ProSightPC on the (more complex) Saccharomyces cerevisiae data set.

14.
Glycan decorations dictate protein functions and thus have crucial importance in life sciences. Previously glycoprotein analysis was mainly focused on the analysis of the liberated glycans allowing detailed structural, but lacking positional information. Analysis of intact glycopeptides required purified glycoproteins and manual interpretation of spectra. We developed an approach where mixtures of native glycopeptides were analyzed with tandem mass spectrometry and the spectra were analyzed with automated in silico workflows. The latter included combination of the original spectra, generation of a human N-glycopeptide library, matching the glycopeptide spectra to the theoretical peptide fragments, scoring the observations, predicting the glycan composition, which were then matched against the observed spectra, statistical validation of the results with target–decoy filtering, and finally the calculation of glycan structures. We verified this approach with the 150 serotransferrin glycopeptide spectra, where we automatically generated 105 putative interpretations from >109 theoretical glycopeptides. After scoring, 62 glycopeptide spectra obtained validated interpretation with concomitant amino acid sequences, glycan compositions, and structures. When applying this method to an unknown mixture of human plasma glycoproteins we identified 80 glycopeptides with their glycan compositions or structures. Instead of weeks and months of interpretation work of mass spectrometry files our automated workflow can be executed in few hours and provide information concomitantly from both the amino acid and glycan moieties of intact glycopeptides in mixtures. No advanced computational skills were needed to use these preformed and tested workflows. In case users want to add complexity to the analysis they are allowed to alter all parameters and rebuild the workflows.

15.
We propose a new approach to the problem of determining an ensemble of protein structures with a set of interatomic distance bounds in NMR protein modeling. Similarly to X-ray crystallography, we assume that the protein has an equilibrium structure and the atoms fluctuate around their equilibrium positions. Then, the problem can be formulated as a generalized distance geometry problem, to find the equilibrium positions and maximal possible fluctuation radii for the atoms in the protein, subject to the condition that the fluctuations should be within the given distance bounds. We describe the scientific background of the work, the motivation of the new approach and the formulation of the problem. We develop a geometric buildup algorithm for an approximate solution to the problem and present some preliminary test results as a first-step proof of concept. We also discuss related theoretical and computational issues and potential impacts of this work in NMR protein modeling.

16.
Detection of posttranslational modifications is expected to be one of the major future experimental challenges for proteomics. We describe herein a mass spectrometric procedure to screen for protein modifications by peptide mass fingerprinting that is based on post-data acquisition improvement of the mass accuracy by exporting the peptide mass values into analytical software for multipoint recalibration on recognized peaks. Subsequently, the calibrated peak mass data set is used in searching for modified peptides, i.e., peptides possessing specific mass deviations. In order to identify the location of Lys- and Gln-residues available for transglutaminase-catalyzed isopeptide bond formation, mammalian small heat shock proteins (sHsps) were screened for labeling with the two hexapeptide probes GQDPVR and GNDPVK in the presence of transglutaminase. Peptide modification due to cross-linking of the GQDPVR hexapeptide probe was detected for C-terminal Lys residues. Novel transglutaminase-susceptible Gln sites were identified in two sHsps (Q31/Q27 in Hsp20 and HspB2, respectively), by cross-linking of the GNDPVK hexapeptide probe. Deamidation of specific Gln residues was also detected, as well as an isopeptide derived from intramolecular Gln-Lys isopeptide bond formation. We conclude that peptide mass fingerprinting can be an efficient way of screening for various posttranslational modifications. Basically any instrumentation for MALDI mass spectrometry can be used, provided that post-data acquisition recalibration is applied.

17.
Reliable statistical validation of peptide and protein identifications is a top priority in large-scale mass spectrometry based proteomics. PeptideProphet is one of the computational tools commonly used for assessing the statistical confidence in peptide assignments to tandem mass spectra obtained using database search programs such as SEQUEST, MASCOT, or X! TANDEM. We present two flexible methods, the variable component mixture model and the semiparametric mixture model, that remove the restrictive parametric assumptions in the mixture modeling approach of PeptideProphet. Using a control protein mixture data set generated on a linear ion trap Fourier transform (LTQ-FT) mass spectrometer, we demonstrate that both methods improve parametric models in terms of the accuracy of probability estimates and the power to detect correct identifications controlling the false discovery rate to the same degree. The statistical approaches presented here require that the data set contain a sufficient number of decoy (known to be incorrect) peptide identifications, which can be obtained using the target-decoy database search strategy.
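The target-decoy strategy mentioned at the end of this abstract can be illustrated with the simplest counting estimate of the false discovery rate. This is not the variable component or semiparametric mixture model the abstract describes. It assumes target and decoy databases of equal size, so that the number of decoy hits above a score threshold approximates the number of incorrect target hits, and the scores and function names below are hypothetical.

```python
def fdr_at_threshold(psms, threshold):
    """Estimate FDR as (#decoy hits) / (#target hits) at or above threshold.

    psms is a list of (score, is_decoy) pairs, one per spectrum match.
    """
    targets = sum(1 for score, is_decoy in psms
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms
                 if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0
    return decoys / targets


def score_threshold_for_fdr(psms, max_fdr):
    """Lowest observed score whose estimated FDR does not exceed max_fdr."""
    for threshold in sorted({score for score, _ in psms}):
        if fdr_at_threshold(psms, threshold) <= max_fdr:
            return threshold
    return None


# Hypothetical peptide-spectrum match scores; True marks a decoy hit.
psms = [
    (9.1, False), (8.7, False), (8.2, False), (7.9, True), (7.5, False),
    (7.1, False), (6.8, True), (6.5, False), (6.0, True), (5.5, True),
]

print(fdr_at_threshold(psms, 7.0))
print(score_threshold_for_fdr(psms, 0.2))
```

Mixture-model approaches such as those in the abstract go further by assigning a probability to each individual identification rather than a single error rate to everything above a cutoff.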

18.
Mass spectrometry combined with database searching has become the preferred method for identifying proteins in proteomics projects. Proteins are digested by one or several enzymes to obtain peptides, which are analyzed by mass spectrometry. We introduce a new family of scoring schemes, named OLAV, aimed at identifying peptides in a database from their tandem mass spectra. OLAV scoring schemes are based on signal detection theory, and exploit mass spectrometry information more extensively than previously existing schemes. We also introduce a new concept of structural matching that uses pattern detection methods to better separate true from false positives. We show the superiority of OLAV scoring schemes compared to MASCOT, a widely used identification program. We believe that this work introduces a new way of designing scoring schemes that are especially adapted to high-throughput projects such as the GeneProt large-scale human plasma project, where it is impractical to check all identifications manually.

19.
We have developed an approach that allows peptide mass mapping by matrix-assisted laser desorption ionization-mass spectrometry of proteins visualized on a nitrocellulose membrane by immunochemical detection. Proteins are separated by sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE), electroblotted onto a nitrocellulose membrane and after blocking with a nonprotein-containing polymer such as polyvinylpyrrolidone 40 (PVP-40) or Tween 20, the proteins are stained with fount India ink. After incubation with primary and, if required, secondary peroxidase-coupled antibodies, immunochemically reactive proteins can be visualized using conventional enhanced chemiluminescence detection and assigned to the India ink-stained membrane by simple superposition. The proteins of interest are excised, submitted to "on-membrane" cleavage and the peptides are analyzed by mass spectrometry. Protein-based blocking reagents normally used in standard immunodetection protocols, such as skimmed milk, can be employed. We have obtained high-quality mass spectra of bovine serum albumin (BSA) detected on an immunoblot with an estimated amount of 100 fmol applied onto the gel, indicating the sensitivity of the present method. In addition, the approach is demonstrated with two other commercially available proteins, a serum protein, the successful identification of a tyrosine phosphorylated protein from total rat liver homogenate and serine phosphorylated proteins from an EcR 293 nuclear extract separated by two-dimensional (2-D) SDS-PAGE.

20.
The high-throughput nature of proteomics mass spectrometry is enabled by a productive combination of data acquisition protocols and the computational tools used to interpret the resulting spectra. One of the key components in mainstream protocols is the generation of tandem mass (MS/MS) spectra by peptide fragmentation using collision induced dissociation, the approach currently used in the large majority of proteomics experiments to routinely identify hundreds to thousands of proteins from single mass spectrometry runs. Complementary to these, alternative peptide fragmentation methods such as electron capture/transfer dissociation and higher-energy collision dissociation have consistently achieved significant improvements in the identification of certain classes of peptides, proteins, and post-translational modifications. Recognizing these advantages, mass spectrometry instruments now conveniently support fine-tuned methods that automatically alternate between peptide fragmentation modes for either different types of peptides or for acquisition of multiple MS/MS spectra from each peptide. But although these developments have the potential to substantially improve peptide identification, their routine application requires corresponding adjustments to the software tools and procedures used for automated downstream processing. This review discusses the computational implications of alternative and alternate modes of MS/MS peptide fragmentation and addresses some practical aspects of using such protocols for identification of peptides and post-translational modifications.
