Similar articles
 20 similar articles found (search time: 15 ms)
1.
Zhao Y, Lin YH. Proteomics, 2005, 5(4): 853-855
Instead of using the probability mean, a simple yet effective heuristic approach was employed to treat experimentally obtained tandem mass spectrometry (MS/MS) data for protein identification. The proposed approach is based on the total number (T) of identified experimental MS/MS data. To warrant the subsequent ranking, the total number of identified b- and y-type ions (Tb+y) must be greater than 50% of T. Peptides having the same T and Tb+y are either ranked by the contiguity of identified ions or discarded during identification. The search results agreed well with those of other protein identification tools.
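The filtering and ranking rule in this abstract can be illustrated with a short sketch. It is only a reading of the abstract, not the authors' code: the `Candidate` fields, the use of a "longest run of consecutive matched ions" as the contiguity measure, and the sort order (T first, then Tb+y, then contiguity) are all assumptions made for illustration.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    peptide: str
    total_matched: int  # T: number of experimental peaks identified
    by_matched: int     # Tb+y: identified b- and y-type ions
    longest_run: int    # assumed contiguity measure: longest run of consecutive ions


def rank_candidates(candidates):
    """Drop candidates whose b/y ions are not more than 50% of T, then rank."""
    kept = [
        c for c in candidates
        if c.total_matched > 0 and c.by_matched > 0.5 * c.total_matched
    ]
    # Assumed order: higher T first, then higher Tb+y, ties broken by contiguity.
    return sorted(
        kept,
        key=lambda c: (c.total_matched, c.by_matched, c.longest_run),
        reverse=True,
    )


if __name__ == "__main__":
    demo = [
        Candidate("AAA", 10, 4, 3),  # fails the 50% rule (4 is not > 5)
        Candidate("BBB", 10, 6, 2),
        Candidate("CCC", 10, 6, 4),  # same T and Tb+y as BBB, more contiguous
        Candidate("DDD", 12, 7, 1),
    ]
    for c in rank_candidates(demo):
        print(c)
```

The abstract also allows tied peptides to be discarded instead of ranked; that variant would replace the tie-break with a filter.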

2.

Background  

In proteomics experiments, database-search programs are the method of choice for protein identification from tandem mass spectra. As amino acid sequence databases grow, however, the computing resources required for these programs have become prohibitive, particularly in searches for modified proteins. Recently, different research groups have proposed methods that limit the number of spectra to be searched based on spectral quality, but rankings of spectral quality have thus far been based on arbitrary cut-off values. In this work, we develop a more readily interpretable spectral quality statistic by providing probability values for the likelihood that spectra will be identifiable.

3.
4.
We report on a new de novo peptide sequencing algorithm that uses spectral graph partitioning. In this approach, relationships between m/z peaks are represented by attractive and repulsive springs, and the vibrational modes of the spring system are used to infer information about the peaks (such as "likely b-ion" or "likely y-ion"). We demonstrate the effectiveness of this approach by comparison with other de novo sequencers on test sets of ion-trap and QTOF spectra, including spectra of mixtures of peptides. On all datasets, we outperform the other sequencers. Along with spectral graph theory techniques, the new de novo sequencer EigenMS incorporates another improvement of independent interest: robust statistical methods for recalibration of time-of-flight mass measurements. Robust recalibration greatly outperforms simple least-squares recalibration, achieving about three times the accuracy for one QTOF dataset.

5.
Shotgun proteomics experiments are dependent upon database search engines to identify peptides from tandem mass spectra. Many of these algorithms score potential identifications by evaluating the number of fragment ions matched between each peptide sequence and an observed spectrum. These systems, however, generally do not distinguish between matching an intense peak and matching a minor peak. We have developed a statistical model to score peptide matches that is based upon the multivariate hypergeometric distribution. This scorer, part of the "MyriMatch" database search engine, places greater emphasis on matching intense peaks. The probability that the best match for each spectrum has occurred by random chance can be employed to separate correct matches from random ones. We evaluated this software on data sets from three different laboratories employing three different ion trap instruments. Employing a novel system for testing discrimination, we demonstrate that stratifying peaks into multiple intensity classes improves the discrimination of scoring. We compare MyriMatch results to those of Sequest and X!Tandem, revealing that it is capable of higher discrimination than either of these algorithms. When minimal peak filtering is employed, performance plummets for a scoring model that does not stratify matched peaks by intensity. On the other hand, we find that MyriMatch discrimination improves as more peaks are retained in each spectrum. MyriMatch also scales well to tandem mass spectra from high-resolution mass analyzers. These findings may indicate limitations for existing database search scorers that count matched peaks without differentiating them by intensity. The software and source code are available under the Mozilla Public License at this URL: http://www.mc.vanderbilt.edu/msrc/bioinformatics/.
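To make the multivariate hypergeometric idea concrete, the sketch below evaluates the textbook probability of drawing a given number of matches from each intensity class. It is not the MyriMatch implementation: the function names and the class layout (most intense class first, class sizes counted in whatever units the caller chooses) are assumptions for illustration. A smaller probability means the match is less likely to be random, so the negative logarithm serves as a score.

```python
from math import comb, log


def mvh_probability(class_sizes, matched):
    """Multivariate hypergeometric probability of the observed match counts.

    class_sizes[i] is the number of items in intensity class i, and matched[i]
    is how many of the matched fragments fell into that class.
    """
    if len(class_sizes) != len(matched):
        raise ValueError("class_sizes and matched must have the same length")
    if any(k < 0 or k > size for size, k in zip(class_sizes, matched)):
        raise ValueError("each match count must lie between 0 and its class size")
    numerator = 1
    for size, k in zip(class_sizes, matched):
        numerator *= comb(size, k)
    return numerator / comb(sum(class_sizes), sum(matched))


def mvh_score(class_sizes, matched):
    """Negative log probability: larger means less likely to be a chance match."""
    return -log(mvh_probability(class_sizes, matched))


if __name__ == "__main__":
    sizes = [2, 3, 5]  # hypothetical classes: few intense peaks, many minor ones
    print(mvh_score(sizes, [2, 0, 0]))  # two matches, both to intense peaks
    print(mvh_score(sizes, [0, 0, 2]))  # two matches, both to minor peaks
```

Because the intense class is the smallest, matching two intense peaks is rarer by chance than matching two minor peaks, so it scores higher. This is the property the abstract describes as placing greater emphasis on intense peaks.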

6.
Despite the publication of several software tools for analysis of glycopeptide tandem mass spectra, there remains a lack of consensus regarding the most effective and appropriate methods. In part, this reflects problems with applying standard methods for proteomics database searching and false discovery rate calculation. While the analysis of small post-translational modifications (PTMs) may be regarded as an extension of proteomics database searching, glycosylation requires specialized approaches. This is because glycans are large and heterogeneous by nature, causing glycopeptides to exist as multiple glycosylated variants. Thus, the mass of the peptide cannot be calculated directly from that of the intact glycopeptide. In addition, the chemical nature of the glycan strongly influences product ion patterns observed for glycopeptides. As a result, glycopeptidomics requires specialized bioinformatics methods. We summarize the recent progress towards a consensus for effective glycopeptide tandem mass spectrometric analysis.

7.
In high-throughput proteomics, the development of computational methods and that of novel experimental strategies often rely on each other. In certain areas, mass spectrometry methods for data acquisition are ahead of the computational methods that interpret the resulting tandem mass spectra. In particular, although there are numerous situations in which a mixture tandem mass spectrum can contain fragment ions from two or more peptides, nearly all database search tools still assume that each tandem mass spectrum comes from one peptide. Common examples include mixture spectra from co-eluting peptides in complex samples, spectra generated from data-independent acquisition methods, and spectra from peptides with complex post-translational modifications. We propose a new database search tool (MixDB) that is able to identify mixture tandem mass spectra from more than one peptide. We show that peptides can be reliably identified with up to 95% accuracy from mixture spectra while considering only 0.01% of all possible peptide pairs (a speedup of four orders of magnitude). Comparison with current database search methods indicates that our approach has better or comparable sensitivity and precision at identifying single-peptide spectra while simultaneously being able to identify 38% more peptides from mixture spectra at significantly higher precision.

8.
It is shown that diacyl glycerophosphoric acid ester mixtures can be analysed by combining positive and negative FAB mass spectrometry with collision activation in a tandem mass spectrometer. The following results are obtained: the molecular masses of the single components; the nature of the residue bound to the phosphoric acid (choline, serine, etc.); the fatty acids present in the single components and (if different) their location at C-1/C-2; and a quantitative analysis both of the fatty acids in the mixture and of the various species making up the latter.

9.
Generating all plausible de novo interpretations of a peptide tandem mass (MS/MS) spectrum (Spectral Dictionary) and quickly matching them against the database represents a recently emerged alternative approach to peptide identification. However, the sizes of the Spectral Dictionaries grow quickly with peptide length, making their generation impractical for long peptides. We introduce Gapped Spectral Dictionaries (all plausible de novo interpretations with gaps) that can be easily generated for any peptide length, thus addressing the limitation of the Spectral Dictionary approach. We show that Gapped Spectral Dictionaries are small, which opens the possibility of using them to speed up MS/MS searches. Our MS-Gapped-Dictionary algorithm (based on Gapped Spectral Dictionaries) enables proteogenomics applications (such as searches in the six-frame translation of the human genome) that are prohibitively time-consuming with existing approaches. MS-Gapped-Dictionary generates gapped peptides that occupy a niche between accurate but short peptide sequence tags and long but inaccurate full-length peptide reconstructions. We show that, contrary to conventional wisdom, some high-quality spectra do not have good peptide sequence tags, and we introduce gapped tags that have advantages over the conventional peptide sequence tags in MS/MS database searches.

10.
Cross-linking technology combined with tandem mass spectrometry (MS-MS) is a powerful method that provides a rapid solution to the discovery of protein-protein interactions and protein structures. We studied the problem of detecting cross-linked peptides and cross-linked amino acids from tandem mass spectral data. Our method consists of two steps: the first step finds two protein subsequences whose mass sum equals a given mass measured from the mass spectrometry; and the second step finds the best cross-linked amino acids in these two peptide sequences that are optimally correlated to a given tandem mass spectrum. We designed fast and space-efficient algorithms for these two steps and implemented and tested them on experimental data of cross-linked hemoglobin proteins. An interchain cross-link between two beta subunits was found in two tandem mass spectra. The length of the cross-linker (7.7 Å) is very close to the actual distance (8.18 Å) obtained from the molecular structure in PDB.
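The first step described above (finding two subsequences whose masses sum to a measured mass) can be sketched as follows. This is a simple illustration, not the authors' space-efficient algorithm: it enumerates all contiguous subsequences, uses nominal (integer) residue masses instead of accurate monoisotopic values, and treats `linker_mass`, `tol`, and `max_len` as hypothetical parameters.

```python
from bisect import bisect_left, bisect_right

# Nominal (integer) residue masses; a real search would use monoisotopic masses.
NOMINAL = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "K": 128, "Q": 128, "E": 129,
    "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}
WATER = 18  # each free peptide carries one water


def subsequence_masses(seq, max_len):
    """Return (mass, start, end) for every contiguous subsequence up to max_len."""
    out = []
    for start in range(len(seq)):
        mass = WATER
        for end in range(start, min(len(seq), start + max_len)):
            mass += NOMINAL[seq[end]]
            out.append((mass, start, end + 1))
    return out


def find_pairs(seq_a, seq_b, target, linker_mass=0, tol=0, max_len=30):
    """Find subsequence pairs with mass(a) + mass(b) + linker_mass within tol of target."""
    from_a = subsequence_masses(seq_a, max_len)
    from_b = sorted(subsequence_masses(seq_b, max_len))
    b_masses = [mass for mass, _, _ in from_b]
    hits = []
    for mass_a, i, j in from_a:
        wanted = target - linker_mass - mass_a
        lo = bisect_left(b_masses, wanted - tol)
        hi = bisect_right(b_masses, wanted + tol)
        for _, k, l in from_b[lo:hi]:
            hits.append((seq_a[i:j], seq_b[k:l]))
    return hits


if __name__ == "__main__":
    print(find_pairs("GA", "SP", target=261))
```

Sorting one side and binary-searching for the complementary mass avoids comparing every pair directly. The second step in the abstract (scoring candidate cross-link sites against the spectrum) is not covered here.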

11.
Chemical ionization (CI), field ionization (FI) and field desorption (FD) are sometimes preferable to electron impact (EI) mass spectrometry as methods for obtaining abundant high-mass ions from lipids. FD often provides mass spectral information which is unobtainable by other methods, and is the best method for obtaining molecular weight information. Fragment ions are observed in the spectra from all the ionization methods, which provide structural information complementing that obtainable from an EI spectrum. Using CI, high-mass ions carrying a large proportion of the total ionization current can be monitored by selected ion monitoring, resulting in enhanced sensitivity for quantitative studies in some cases.

12.
13.
Mass spectrometry-based proteomics is a maturing discipline of biologic research that is experiencing substantial growth. Instrumentation has steadily improved over time with the advent of faster and more sensitive instruments collecting ever larger data files. Consequently, the computational process of matching a peptide fragmentation pattern to its sequence, traditionally accomplished by sequence database searching and more recently also by spectral library searching, has become a bottleneck in many mass spectrometry experiments. In both of these methods, the main rate-limiting step is the comparison of an acquired spectrum with all potential matches from a spectral library or sequence database. This is a highly parallelizable process because the core computational element can be represented as a simple but arithmetically intense multiplication of two vectors. In this paper, we present a proof of concept project taking advantage of the massively parallel computing available on graphics processing units (GPUs) to distribute and accelerate the process of spectral assignment using spectral library searching. This program, which we have named FastPaSS (for Fast Parallelized Spectral Searching), is implemented in CUDA (Compute Unified Device Architecture) from NVIDIA, which allows direct access to the processors in an NVIDIA GPU. Our efforts demonstrate the feasibility of GPU computing for spectral assignment, through implementation of the validated spectral searching algorithm SpectraST in the CUDA environment.
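The "multiplication of two vectors" mentioned above can be illustrated with a plain CPU sketch: each spectrum is binned into a fixed-length intensity vector, normalized, and compared with every library entry by dot product. This is not FastPaSS or SpectraST code; the bin width, vector length, and function names are assumptions, and no GPU is involved. Each library comparison is independent of the others, which is what makes the step easy to parallelize.

```python
from math import sqrt


def binned_vector(peaks, n_bins, bin_width=1.0):
    """Turn (m/z, intensity) peaks into a unit-length intensity vector."""
    vec = [0.0] * n_bins
    for mz, intensity in peaks:
        idx = int(mz / bin_width)
        if 0 <= idx < n_bins:
            vec[idx] += intensity
    norm = sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def best_match(query_peaks, library, n_bins=2000):
    """Score the query against every library spectrum; return (name, score)."""
    query = binned_vector(query_peaks, n_bins)
    scores = {
        name: dot(query, binned_vector(peaks, n_bins))
        for name, peaks in library.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    library = {  # hypothetical two-entry library
        "PEPTIDEA": [(100.2, 10.0), (200.4, 5.0)],
        "PEPTIDEB": [(150.1, 8.0), (300.3, 8.0)],
    }
    print(best_match([(100.3, 9.0), (200.1, 6.0)], library))
```

In practice the library vectors would be precomputed once; they are rebuilt per query here only to keep the sketch short.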

14.
Protein kinases constitute a large superfamily of enzymes with key regulatory functions in nearly all signal transmission processes of eukaryotic cells. However, due to their relatively low abundance compared with the vast majority of cellular proteins, currently available proteomics techniques do not permit the comprehensive biochemical characterization of protein kinases. To address these limitations, we have developed a prefractionation strategy that uses a combination of immobilized low molecular weight inhibitors for the selective affinity capture of protein kinases. This approach resulted in the direct purification of cell type-specific sets of expressed protein kinases, and more than 140 different members of this enzyme family could be detected by LC-MS/MS. Furthermore, the enrichment technique combined with phosphopeptide fractionation led to the identification of more than 200 different phosphorylation sites on protein kinases, which often remain occluded in global phosphoproteome analysis. As the phosphorylation states of protein kinases can provide a readout for the signaling activities within a cellular system, kinase-selective phosphoproteomics based on the procedures described here has the potential to become an important tool in signal transduction analysis.

15.
Tandem mass spectrometry (MS/MS) combined with database searching is currently the most widely used method for high-throughput peptide and protein identification. Many different algorithms, scoring criteria, and statistical models have been used to identify peptides and proteins in complex biological samples, and many studies, including our own, describe the accuracy of these identifications, using at best generic terms such as "high confidence." False positive identification rates for these criteria can vary substantially with changing organisms under study, growth conditions, sequence databases, experimental protocols, and instrumentation; therefore, study-specific methods are needed to estimate the accuracy (false positive rates) of these peptide and protein identifications. We present and evaluate methods for estimating false positive identification rates based on searches of randomized databases (reversed and reshuffled). We examine the use of separate searches of a forward then a randomized database and combined searches of a randomized database appended to a forward sequence database. Estimated error rates from randomized database searches are first compared against actual error rates from MS/MS runs of known protein standards. These methods are then applied to biological samples of the model microorganism Shewanella oneidensis strain MR-1. Based on the results obtained in this study, we recommend the use of combined searches of a reshuffled database appended to a forward sequence database as a means of providing quantitative estimates of false positive identification rates of peptides and proteins. This will allow researchers to set criteria and thresholds to achieve a desired error rate and provide the scientific community with direct and quantifiable measures of peptide and protein identification accuracy as opposed to vague assessments such as "high confidence."
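The recommended strategy (searching a reshuffled database appended to the forward database) can be sketched as below. The abstract does not state the estimator, so the formula used here, twice the number of decoy hits divided by all hits passing the threshold, is one common convention for combined searches and is an assumption. The `DECOY_` prefix and function names are likewise hypothetical.

```python
import random


def reshuffled(sequence, rng):
    """Return the sequence with its residues randomly permuted."""
    residues = list(sequence)
    rng.shuffle(residues)
    return "".join(residues)


def build_combined_db(proteins, seed=0):
    """Append one reshuffled decoy per forward protein to the database."""
    rng = random.Random(seed)
    combined = dict(proteins)
    for name, seq in proteins.items():
        combined["DECOY_" + name] = reshuffled(seq, rng)
    return combined


def estimated_fp_rate(hits, threshold):
    """Estimate the false positive rate among hits scoring >= threshold.

    hits is a list of (score, protein_name). Each decoy hit is assumed to
    imply one equally likely false hit among the forward sequences.
    """
    passing = [name for score, name in hits if score >= threshold]
    if not passing:
        return 0.0
    decoys = sum(name.startswith("DECOY_") for name in passing)
    return min(1.0, 2 * decoys / len(passing))


if __name__ == "__main__":
    db = build_combined_db({"P1": "MKTAYIAK"})
    print(db)
    hits = [(9.0, "P1"), (8.0, "P2"), (7.5, "DECOY_P1"), (7.0, "P3"), (3.0, "DECOY_P2")]
    print(estimated_fp_rate(hits, threshold=7.0))
```

Sweeping `threshold` and reading off the estimated rate is one way to choose a score cutoff that meets a desired error rate, which is the use the abstract proposes.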

16.

Background

As a promising way to transform medicine, mass spectrometry-based proteomics technologies have seen great progress in identifying disease biomarkers for clinical diagnosis and prognosis. However, there is a lack of effective feature selection methods that are able to capture essential data behaviors to achieve clinical-level disease diagnosis. Moreover, the field faces a challenge from data reproducibility: no two independent studies have been found to produce the same proteomic patterns. Such a reproducibility issue causes the identified biomarker patterns to lose repeatability and prevents them from real clinical use.

Methods

In this work, we propose a novel machine-learning algorithm, derivative component analysis (DCA), for high-dimensional mass spectral proteomic profiles. As an implicit feature selection algorithm, derivative component analysis examines input proteomics data in a multi-resolution approach by seeking its derivatives to capture latent data characteristics and conduct de-noising. We further demonstrate DCA's advantages in disease diagnosis by treating the input proteomics data as a profile biomarker and integrating DCA with support vector machines to tackle the reproducibility issue, and we compare it with state-of-the-art peers.

Results

Our results show that high-dimensional proteomics data are actually linearly separable under the proposed derivative component analysis (DCA). As a novel multi-resolution feature selection algorithm, DCA not only overcomes the weakness of traditional methods in discovering subtle data behavior, but also suggests an effective way to overcome the reproducibility problem of proteomics data and provides new techniques and insights in translational bioinformatics and machine learning. The DCA-based profile biomarker diagnosis makes clinical-level diagnostic performance reproducible across different proteomic data, and is more robust and systematic than existing biomarker-discovery-based diagnosis.

Conclusions

Our findings demonstrate the feasibility and power of the proposed DCA-based profile biomarker diagnosis in achieving high sensitivity and conquering the data reproducibility issue in serum proteomics. Furthermore, our proposed derivative component analysis suggests that gleaning subtle data characteristics and de-noising are essential in separating true signals from red herrings in high-dimensional proteomic profiles, and can be more important than conventional feature selection or dimension reduction. In particular, our profile biomarker diagnosis can be generalized to other omics data because derivative component analysis (DCA) is, by nature, a generic data analysis method.

17.
To interpret LC-MS/MS data in proteomics, most popular protein identification algorithms primarily use predicted fragment m/z values to assign peptide sequences to fragmentation spectra. The intensity information is often undervalued, because it is not as easy to predict and incorporate into algorithms. Nevertheless, the use of intensity to assist peptide identification is an attractive prospect and can potentially improve the confidence of matches and generate more identifications. On the basis of our previously reported study of fragmentation intensity patterns, we developed a protein identification algorithm, SeQuence IDentification (SQID), that makes use of the coarse intensity from a statistical analysis. The scoring scheme was validated by comparing with Sequest and X!Tandem using three data sets, and the results indicate an improvement in the number of identified peptides, including unique peptides that are not identified by Sequest or X!Tandem. The software and source code are available under the GNU GPL license at http://quiz2.chem.arizona.edu/wysocki/bioinformatics.htm.

18.
Although generating large amounts of proteomic data using tandem mass spectrometry has become routine, there is currently no single set of comprehensive tools for the rigorous analysis of tandem mass spectrometry results given the large variety of possible experimental aims. Currently available applications are typically designed for displaying proteins and posttranslational modifications from the point of view of the mass spectrometrist and are not versatile enough to allow investigators to develop biological models of protein function, protein structure, or cell state. In addition, storage and dissemination of mass spectrometry-based proteomic data are problems facing the scientific community. To address these issues, we have developed a relational database model that efficiently stores and manages large amounts of tandem mass spectrometry results. We have developed an integrated suite of multifunctional analysis software for interpreting, comparing, and displaying these results. Our system, Bioinformatic Graphical Comparative Analysis Tools (BIGCAT), allows sophisticated analysis of tandem mass spectrometry results in a biologically intuitive format and provides a solution to many data storage and dissemination issues.

19.
Evaluation of cellular processes and their changes at the level of protein expression and post-translational modifications may allow identification of novel proteins and the mechanisms involved in pathogenic processes. However, the number of proteins and, after tryptic digestion, of peptides from a single cell can be tremendously high. Separation and analysis of such complex peptide mixtures can be performed using multidimensional separation techniques such as two-dimensional gel electrophoresis or two-dimensional high-performance liquid chromatography (2-D-HPLC). The aim of this work was to establish a fully automated on-line 2-D-HPLC separation method with column switching for the separation of complex tryptic digests. A model mixture of five proteins as well as a nuclear matrix protein sample were digested with trypsin and separated using a strong cation exchange (SCX) column in the first dimension and nano reversed phase in the second dimension. Separated peptides were detected using an ion trap mass spectrometer. The advantages of this new fully automated method are rapid sample loading, the possibility of injecting large volumes and no introduction of salt into the mass spectrometer. Furthermore, column switching allows independent control and optimization of the two dimensions.

20.
The goal of many shotgun proteomics experiments is to determine the protein complement of a complex biological mixture. For many mixtures, most methodological approaches fall significantly short of this goal. Existing solutions to this problem typically subdivide the task into two stages: first identifying a collection of peptides with a low false discovery rate and then inferring from the peptides a corresponding set of proteins. In contrast, we formulate the protein identification problem as a single optimization problem, which we solve using machine learning methods. This approach is motivated by the observation that the peptide and protein level tasks are cooperative, and the solution to each can be improved by using information about the solution to the other. The resulting algorithm directly controls the relevant error rate, can incorporate a wide variety of evidence and, for complex samples, provides 18-34% more protein identifications than the current state-of-the-art approaches.
