Similar articles
 20 similar articles found (search time: 31 ms)
1.
High-throughput protein analysis by tandem mass spectrometry produces anywhere from thousands to millions of spectra that are used for peptide and protein identification. Although each spectrum corresponds to only one peptide ion charge state, database searches are typically repeated for multiple charge states because the resolution of many common mass spectrometers is not sufficient to determine the charge state. The resulting database searches are both error-prone and time-consuming. We describe a straightforward, accurate approach to charge state estimation (CHASTE). CHASTE relies on fragment ion peak distributions and uses logistic regression models to combine different measurements and improve its accuracy. CHASTE's performance was validated on data sets of known peptide dissociation spectra, obtained by replicate analyses of our previously developed protein standard mixture using ion trap mass spectrometers at different laboratories. CHASTE reduced the number of database searches needed by at least 60% and the number of redundant searches by at least 90%, with virtually no loss of information. This greatly alleviates one of the major bottlenecks in high-throughput peptide and protein identification. Thresholds and parameter estimates can be tailored to specific analysis situations, pipelines, and instruments. CHASTE is implemented in Java with both GUI and command-line interfaces.
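
The abstract does not list CHASTE's features or fitted coefficients, so the following Python sketch only illustrates the general idea of combining several fragment-peak measurements with a logistic model to decide which charge states are worth searching. It relies on the fact that a singly charged precursor cannot produce fragment peaks above its own m/z. The feature names, coefficients, and cut-offs are hypothetical.

```python
import math


def logistic(z: float) -> float:
    """Standard logistic function, mapping any real score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients; real values would be fitted to training spectra.
INTERCEPT = -3.0
WEIGHTS = {
    "frac_intensity_above_precursor": 6.0,  # share of ion current above precursor m/z
    "frac_peaks_above_precursor": 3.0,      # share of peaks above precursor m/z
}


def prob_multiply_charged(features: dict) -> float:
    """Combine the measurements into one probability of a multiply charged precursor."""
    z = INTERCEPT + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return logistic(z)


def charge_states_to_search(features: dict, low: float = 0.1, high: float = 0.9) -> list:
    """Search only +1, only +2/+3, or everything when the model is undecided."""
    p = prob_multiply_charged(features)
    if p < low:
        return [1]
    if p > high:
        return [2, 3]
    return [1, 2, 3]
```

Keeping an undecided band between the two cut-offs is one way to trade fewer searches against the risk of discarding the correct charge state.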

2.
We report the results of our work to facilitate protein identification using tandem mass spectra and protein sequence databases. We describe a parallel version of SEQUEST (SEQUEST-PVM) that is tolerant of arithmetic exceptions. The changes we report effectively isolate the search processes on slave nodes from each other. Therefore, if one of the slave nodes drops out of the cluster due to an error, the rest of the cluster will carry the search process to the end. SEQUEST has been widely used for protein identification. The modifications made to the code improve its stability and effectiveness in a high-throughput production environment. We evaluate the overhead associated with the parallelization of SEQUEST. A prior version of the software for preprocessing LC/MS/MS data attempted to differentiate the charge states of ions. Singly charged ions could be accurately identified, but the software was unable to reliably differentiate tandem mass spectra of +2 and +3 charge states. We have designed and implemented a computational approach to narrow the charge states of precursor ions from nominal-resolution ion-trap tandem mass spectra. The preprocessing code, 2to3, determines the charge state of the precursor ion using its mass-to-charge ratio (m/z) and the fragment ions contained in the tandem mass spectrum. For each possible charge state, the program calculates the number of fragment ions that account for the precursor ion m/z value. If either number is less than an empirically determined threshold value, the copy of the spectrum corresponding to that charge state is removed. If both numbers are higher than the threshold value, both the +2 and +3 copies of the spectrum are kept. We compare the results of protein identification experiments with and without 2to3. We show that by determining the charge state and eliminating poor-quality spectra, 2to3 decreases the number of spectral files to be searched without affecting the search results. This decrease reduces the computing requirements and the researcher effort needed to analyze the results.
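
The keep/remove rule described above can be sketched in Python as follows. The abstract does not say how fragment ions are counted, so the counting step used here is an assumption: it counts pairs of singly charged peaks whose m/z values sum to the neutral precursor mass plus two protons, as complementary b/y ions do. The function names, tolerance, and threshold are likewise hypothetical.

```python
PROTON = 1.00728  # proton mass in Da


def neutral_mass(precursor_mz: float, charge: int) -> float:
    """Neutral peptide mass implied by an assumed precursor charge."""
    return charge * (precursor_mz - PROTON)


def count_complementary_pairs(peaks_mz, precursor_mz, charge, tol=0.5):
    """Count peak pairs that sum to the value expected for this charge state."""
    target = neutral_mass(precursor_mz, charge) + 2 * PROTON
    peaks = sorted(peaks_mz)
    lo, hi, count = 0, len(peaks) - 1, 0
    while lo < hi:  # two-pointer scan over the sorted peak list
        total = peaks[lo] + peaks[hi]
        if abs(total - target) <= tol:
            count += 1
            lo += 1
            hi -= 1
        elif total < target:
            lo += 1
        else:
            hi -= 1
    return count


def charges_to_keep(peaks_mz, precursor_mz, threshold):
    """Keep each of +2/+3 only if its count reaches the threshold.

    Both kept: ambiguous spectrum. Neither kept: spectrum dropped as poor quality.
    """
    counts = {z: count_complementary_pairs(peaks_mz, precursor_mz, z) for z in (2, 3)}
    return [z for z in (2, 3) if counts[z] >= threshold]


peaks = [200.0, 300.0, 400.2, 599.9, 700.0, 800.0]
print(charges_to_keep(peaks, precursor_mz=500.0, threshold=2))  # prints [2]
```

In the example, three peak pairs sum to about 1000.0, the value expected for a +2 precursor at m/z 500, while none match the +3 expectation of about 1499.0, so only the +2 copy survives.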

3.
Shotgun proteomics using mass spectrometry is a powerful method for protein identification but suffers from limited sensitivity in complex samples. Integrating peptide identifications from multiple database search engines is a promising strategy to increase the number of peptide identifications and reduce the volume of unassigned tandem mass spectra. Existing methods pool statistical significance scores such as p-values or posterior probabilities of peptide-spectrum matches (PSMs) from multiple search engines after high-scoring peptides have been assigned to spectra, but these methods lack reliable control of identification error rates as data are integrated from different search engines. We developed a statistically coherent method for integrative analysis, termed MSblender. MSblender converts raw search scores from search engines into a probability score for every possible PSM and properly accounts for the correlation between search scores. The method reliably estimates false discovery rates and identifies more PSMs than any single search engine at the same false discovery rate. The additional identifications increase spectral counts for most proteins and allow quantification of proteins that would not have been quantified by individual search engines. We also demonstrate that enhanced quantification contributes to improved sensitivity in differential expression analyses.
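
As a rough illustration of how per-PSM probabilities translate into a false discovery rate, the Python sketch below uses the common convention that the expected FDR of an accepted set is the mean of (1 - p) over its members. This is a generic calculation rather than MSblender's published procedure; the function name and the example probabilities are made up.

```python
def psms_at_fdr(probabilities, max_fdr=0.01):
    """Return the largest top-ranked set of PSM probabilities whose
    expected FDR (mean of 1 - p over the set) stays within max_fdr."""
    ranked = sorted(probabilities, reverse=True)
    accepted = 0
    expected_false = 0.0
    for i, p in enumerate(ranked, start=1):
        expected_false += 1.0 - p  # expected number of wrong PSMs so far
        if expected_false / i <= max_fdr:
            accepted = i
    return ranked[:accepted]


example = [0.99, 0.999, 0.5, 0.98, 0.2]
print(len(psms_at_fdr(example, max_fdr=0.05)))  # prints 3
```

Because the list is sorted by decreasing probability, the running mean of (1 - p) never decreases, so the accepted set is always a prefix of the ranking.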

4.
Lin W  Wu FX  Shi J  Ding J  Zhang W 《Proteomics》2011,11(19):3773-3778
In our recent work on denoising, a linear combination of five features was used to adjust the peak intensities in tandem mass spectra. Although the method showed promise, the coefficients (weights) of the linear combination were fixed and determined empirically. In this paper, we propose an adaptive approach for estimating these weights. The proposed approach: (i) calculates the score for each peak in a data set with the previous, empirically determined weights, (ii) selects the training data set based on the scores of the peaks, (iii) applies linear discriminant analysis to the training data set and takes its solution as the new weights, (iv) recalculates the scores with the new weights, and (v) repeats (ii)-(iv) until the weights no longer change significantly. After obtaining the final weights, the proposed approach follows the previous method. The proposed approach was applied to two tandem mass spectra data sets, ISB (low resolution) and TOV-Q (high resolution), to evaluate its performance. The results show that about 66% of peaks (likely noise peaks) can be removed and that the number of peptides identified by MASCOT increases by 14% and 23.4% for the ISB and TOV-Q data sets, respectively, compared with the previous work.

5.
High-throughput identification of peptides in databases from tandem mass spectrometry data is a key technique in modern proteomics. Common approaches to interpreting large-scale peptide identification results are based on the statistical analysis of average score distributions, which are constructed from the set of best scores produced by large collections of MS/MS spectra using search engines such as SEQUEST. Other approaches calculate individual peptide identification probabilities on the basis of theoretical models or from single-spectrum score distributions constructed from the set of scores produced by each MS/MS spectrum. In this work, we study the mathematical properties of average SEQUEST score distributions by introducing the concept of spectrum quality and expressing these average distributions as compositions of single-spectrum distributions. We predict and demonstrate in practice that average score distributions are dominated by the quality distribution in the spectra collection, except in the low-probability region, where it is possible to predict the dependence of average probability on database size. Our analysis leads to a novel indicator, the probability ratio, which makes optimal use of the statistical information provided by the first and second best scores. The probability ratio is a non-parametric and robust indicator that makes classifying spectra by parameters such as charge state unnecessary and yields peptide identification performance, measured by false discovery rates, that is better than that obtained by other empirical statistical approaches. The probability ratio also compares favorably with statistical probability indicators obtained by constructing single-spectrum SEQUEST score distributions. Its robustness, conceptual simplicity, and ease of automation make the probability ratio algorithm a very attractive alternative for determining peptide identification confidences and error rates in high-throughput experiments.

6.
Spectral similarity is used as a proxy for structural similarity in many tandem mass spectrometry (MS/MS) based metabolomics analyses such as library matching and molecular networking. Although weaknesses in the relationship between spectral similarity scores and the true structural similarities have been described, little development of alternative scores has been undertaken. Here, we introduce Spec2Vec, a novel spectral similarity score inspired by a natural language processing algorithm, Word2Vec. Spec2Vec learns fragmental relationships within a large set of spectral data to derive abstract spectral embeddings that can be used to assess spectral similarities. Using data derived from GNPS MS/MS libraries including spectra for nearly 13,000 unique molecules, we show how Spec2Vec scores correlate better with structural similarity than cosine-based scores. We demonstrate the advantages of Spec2Vec in library matching and molecular networking. Spec2Vec is computationally more scalable, allowing structural analogue searches in large databases within seconds.
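
For context, the cosine-based baseline that Spec2Vec is compared against can be sketched in a few lines of Python. This is a plain binned cosine score, not Spec2Vec itself and not any particular library's implementation; the bin width and function names are assumptions chosen for illustration.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return binned


def cosine_score(peaks_a, peaks_b, bin_width=1.0):
    """Cosine of the angle between two binned intensity vectors (0 to 1)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


spectrum_1 = [(100.2, 1.0), (200.5, 2.0)]
spectrum_2 = [(100.4, 1.0), (250.0, 1.0)]
print(round(cosine_score(spectrum_1, spectrum_2), 4))  # prints 0.3162
```

A score of this kind only rewards peaks that fall in the same bin, which is one reason it can miss structurally related molecules whose fragments are shifted in mass.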

7.
The structural elucidation of small molecules using mass spectrometry plays an important role in modern life sciences and bioanalytical approaches. This review covers different soft and hard ionization techniques and figures of merit for modern mass spectrometers, such as mass resolving power, mass accuracy, isotopic abundance accuracy, accurate mass multiple-stage MS(n) capability, as well as hybrid mass spectrometric and orthogonal chromatographic approaches. The latter part discusses mass spectral data handling strategies, which include background and noise subtraction, adduct formation and detection, charge state determination, accurate mass measurements, elemental composition determinations, and complex data-dependent setups with ion maps and ion trees. The importance of mass spectral library search algorithms for tandem mass spectra and multiple-stage MS(n) mass spectra, as well as of mass spectral tree libraries that combine multiple-stage mass spectra, is outlined. The subsequent chapter discusses mass spectral fragmentation pathways, biotransformation reactions and drug metabolism studies, the simulation and generation of in silico mass spectra, expert systems for mass spectral interpretation, and the use of computational chemistry to explain gas-phase phenomena. A single chapter discusses data handling for hyphenated approaches, including mass spectral deconvolution for clean mass spectra, cheminformatics approaches and structure retention relationships, and retention index predictions for gas and liquid chromatography. The last section reviews the current state of electronic data sharing of mass spectra and discusses the importance of software development for the advancement of structure elucidation of small molecules. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1007/s12566-010-0015-9) contains supplementary material, which is available to authorized users.

8.
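Data-independent acquisition (DIA) is a mass spectrometry acquisition technique that has developed rapidly in proteomics in recent years. By fragmenting all precursor ions within an isolation window without bias to acquire MS/MS spectra, it can in theory achieve deep coverage of protein samples while offering high throughput, high reproducibility, and high sensitivity. Existing DIA acquisition methods fall into three categories: full-window fragmentation methods, sequential isolation-window fragmentation methods, and four-dimensional DIA acquisition methods (4D-DIA). Reflecting the different characteristics of DIA data, the main data analysis methods fall into four categories: spectral library search, direct search against protein sequence databases, identification via pseudo-MS/MS spectra, and de novo sequencing. The resulting peptide identifications must undergo confidence assessment, which comprises two steps: re-ranking with machine learning methods and false discovery rate estimation for the reported result set. Together these provide quality control of the analysis results. This article summarizes and reviews DIA acquisition methods, data analysis methods and software, and methods for assessing the confidence of identification results, and discusses future directions.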

9.

Background

In a single proteomic project, tandem mass spectrometers can produce hundreds of millions of tandem mass spectra. However, the majority of tandem mass spectra are of poor quality, and searching them for peptides wastes time. Quality assessment before database search is therefore very useful in the pipeline of protein identification via tandem mass spectra, especially for reducing search time and decreasing false identifications. Most existing methods for quality assessment are supervised machine learning methods based on a number of features that describe the quality of tandem mass spectra. These methods need training datasets in which the quality of every spectrum is known, and such datasets are usually unavailable for new data.

Results

This study proposes an unsupervised machine learning method for quality assessment of tandem mass spectra that requires no training dataset. The proposed method estimates the conditional probabilities of spectra being of high quality from the quality assessments based on individual features. The probabilities are estimated by solving a constrained optimization problem. An efficient algorithm is developed to solve the constrained optimization problem and is proved to be convergent. Experimental results on two datasets illustrate that if we search only the tandem spectra judged to be of high quality by the proposed method, we can save about 56% and 62% of database searching time while losing only a small number of high-quality spectra.

Conclusions

The results indicate that the proposed method performs well for the quality assessment of tandem mass spectra and that the way we estimate the conditional probabilities is effective.

10.
The ability to acquire structurally informative daughter ion spectra for individual peptides undergoing separation and analysis by continuous flow fast atom bombardment (CF FAB) is demonstrated. To illustrate the potential of this methodology, tryptic and chymotryptic digests of the 29-residue peptide glucagon were analyzed by CF FAB using mass spectrometric and tandem mass spectrometric detection in consecutive analyses. Daughter ion spectra were recorded using B/E linked scans for the major hydrolysis products observed by liquid chromatography/mass spectrometry. The peptide mixtures were separated by gradient capillary high-performance liquid chromatography with the FAB matrix being added post-column using a coaxial flow interface between the column and flow probe. The entire effluent (3 µL min⁻¹) was sampled by the mass spectrometer. Results obtained using less than 300 pmol of digested glucagon indicated several advantages to tandem mass spectrometric detection including the ability to confirm identities for products of enzymatic digestion and the potential use of this method for tandem sequence analysis of peptide mixtures.

11.
Two-dimensional gel electrophoresis-separated and excised haptoglobin alpha2-chain protein spots were subjected to in-gel digestion with trypsin. Previously unassigned peptide ion signals observed in mass spectrometric fingerprinting experiments were sequenced using the matrix-assisted laser desorption/ionization-quadrupole ion trap-time of flight (MALDI-QIT-TOF) mass spectrometer and showed that the haptoglobin alpha-chain derivative under study was cleaved nonspecifically by trypsin. Abundant cleavages occurred C-terminal to histidine residues at H23, H28, and H87. In addition, mild acidic hydrolysis leading to cleavage after the aspartic acid residue at D13 was observed. The uninterpreted tandem mass spectrometry (MS/MS) spectrum of the peptide with ion signal at 2620.19 was submitted to database search and yielded the identification of the corresponding peptide sequence comprising amino acids (aa) 65-87 of the haptoglobin alpha-chain protein. Also, the presence of a mixture of two tryptic peptides (mass-to-charge ratio m/z 1708.8; aa40-54 and aa99-113, respectively), which is caused by a tiny sequence variation between the two repeats in the haptoglobin alpha2-chain protein, was resolved by MS/MS fragmentation using the MALDI-QIT-TOF mass spectrometer. Advantageous features such as (i) easy parent ion creation, (ii) minimal sample consumption, and (iii) real collision-induced dissociation conditions were combined successfully to determine the amino acid sequences of the previously unassigned peptides. Hence, the novel mass spectrometric sequencing method applied here has proven effective for identification of distinct molecular protein structures.

12.
Correct phosphorylation site assignment is a critical aspect of phosphoproteomic analysis. Large-scale phosphopeptide data sets that are generated through liquid chromatography-coupled tandem mass spectrometry (LC-MS/MS) analysis often contain hundreds or thousands of phosphorylation sites that require validation. To this end, we have created PhosphoScore, an open-source assignment program that is compatible with phosphopeptide data from multiple MS levels (MS(n)). The algorithm takes into account both the match quality and normalized intensity of observed spectral peaks compared to a theoretical spectrum. PhosphoScore produced >95% correct MS(2) assignments from known synthetic data, >98% agreement with an established MS(2) assignment algorithm (Ascore), and >92% agreement with visual inspection of MS(3) and MS(4) spectra.

13.
Peptide identification by tandem mass spectrometry is the dominant proteomics workflow for protein characterization in complex samples. The peptide fragmentation spectra generated by these workflows exhibit characteristic fragmentation patterns that can be used to identify the peptide. In other fields, where the compounds of interest do not have the convenient linear structure of peptides, fragmentation spectra are identified by comparing new spectra with libraries of identified spectra, an approach called spectral matching. In contrast to sequence-based tandem mass spectrometry search engines used for peptides, spectral matching can make use of the intensities of fragment peaks in library spectra to assess the quality of a match. We evaluate a hidden Markov model approach (HMMatch) to spectral matching, in which many examples of a peptide's fragmentation spectrum are summarized in a generative probabilistic model that captures the consensus and variation of each peak's intensity. We demonstrate that HMMatch has good specificity and superior sensitivity, compared to sequence database search engines such as X!Tandem. HMMatch achieves good results from relatively few training spectra, is fast to train, and can evaluate many spectra per second. A statistical significance model permits HMMatch scores to be compared with each other, and with other peptide identification tools, on a unified scale. HMMatch shows a similar degree of concordance with X!Tandem, Mascot, and NIST's MS Search, as they do with each other, suggesting that each tool can assign peptides to spectra that the others miss. Finally, we show that it is possible to extrapolate HMMatch models beyond a single peptide's training spectra to the spectra of related peptides, expanding the application of spectral matching techniques beyond the set of peptides previously observed.

14.
A computer algorithm is described that utilizes both Edman and mass spectrometric data for simultaneous determination of the amino acid sequences of several peptides in a mixture. Gas phase sequencing of a peptide mixture results in a list of observed amino acids for each cycle of Edman degradation, which by itself may not be informative and typically requires reanalysis following additional chromatographic steps. Tandem mass spectrometry, on the other hand, has a proven ability to analyze sequences of peptides present in mixtures. However, mass spectrometric data may lack a complete set of sequence-defining fragment ions, so that more than one possible sequence may account for the observed fragment ions. A combination of the two types of data reduces the ambiguity inherent in each. The algorithm first utilizes the Edman data to determine all hypothetical sequences with a calculated mass equal to the observed mass of one of the peptides present in the mixture. These sequences are then assigned figures of merit according to how well each of them accounts for the fragment ions in the tandem mass spectrum of that peptide. The program was tested on tryptic and chymotryptic peptides from hen lysozyme, and the results are compared with those of another computer program that uses only mass spectral data for peptide sequencing. In order to assess the utility of this method the program is tested using simulated mixtures of varying complexity and tandem mass spectra of varying quality.
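
The two stages described above, enumerating Edman-compatible sequences that match an observed mass and then ranking them against the fragment ions, can be sketched in Python. This is a simplified illustration rather than the published program: the figure of merit used here (the fraction of predicted singly charged b and y ions found among the peaks), the tolerances, and all function names are assumptions. The residue masses are standard monoisotopic values for a small subset of amino acids.

```python
from itertools import product

# Monoisotopic residue masses in Da (subset, for illustration only).
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203,
                "L": 113.08406, "K": 128.09496, "R": 156.10111}
WATER = 18.01056
PROTON = 1.00728


def peptide_mass(seq):
    """Neutral monoisotopic mass of a linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def candidate_sequences(cycles, observed_mass, tol=0.5):
    """All one-residue-per-Edman-cycle sequences whose mass matches."""
    return ["".join(combo) for combo in product(*cycles)
            if abs(peptide_mass(combo) - observed_mass) <= tol]


def fragment_mzs(seq):
    """Predicted singly charged b and y ion m/z values."""
    ions = []
    for i in range(1, len(seq)):
        ions.append(sum(RESIDUE_MASS[a] for a in seq[:i]) + PROTON)          # b ion
        ions.append(sum(RESIDUE_MASS[a] for a in seq[i:]) + WATER + PROTON)  # y ion
    return ions


def figure_of_merit(seq, peaks, tol=0.5):
    """Fraction of predicted fragment ions present in the spectrum."""
    ions = fragment_mzs(seq)
    if not ions:
        return 0.0
    matched = sum(any(abs(p - ion) <= tol for p in peaks) for ion in ions)
    return matched / len(ions)


cycles = [["G", "A"], ["A", "G"], ["K", "R"]]  # residues seen in each Edman cycle
peaks = [58.03, 129.07, 147.11, 218.15]        # MS/MS peaks of the 274.16 Da peptide
hits = candidate_sequences(cycles, 274.164)
print(hits)                                              # prints ['GAK', 'AGK']
print(max(hits, key=lambda s: figure_of_merit(s, peaks)))  # prints GAK
```

In this toy mixture the mass alone leaves two candidates, GAK and AGK, and the fragment ions break the tie: all four predicted ions of GAK are observed, against two of four for AGK.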

15.
In shotgun proteomics, protein identification by tandem mass spectrometry relies on bioinformatics tools. Despite recent improvements in identification algorithms, a significant number of high quality spectra remain unidentified for various reasons. Here we present ScanRanker, an open-source tool that evaluates the quality of tandem mass spectra via sequence tagging with reliable performance in data from different instruments. The superior performance of ScanRanker enables it not only to find unassigned high quality spectra that evade identification through database search but also to select spectra for de novo sequencing and cross-linking analysis. In addition, we demonstrate that the distribution of ScanRanker scores predicts the richness of identifiable spectra among multiple LC-MS/MS runs in an experiment, and ScanRanker scores assist the process of peptide assignment validation to increase confident spectrum identifications. The source code and executable versions of ScanRanker are available from http://fenchurch.mc.vanderbilt.edu.

16.
Despite a recent surge of interest in database-independent peptide identifications, accurate de novo peptide sequencing remains an elusive goal. While the recently introduced spectral network approach resulted in accurate peptide sequencing in low-complexity samples, its success depends on the chance presence of spectra from overlapping peptides. On the other hand, while multistage mass spectrometry (collecting multiple MS3 spectra from each MS2 spectrum) can be applied to all spectra in a complex sample, there are currently no software tools for de novo peptide sequencing by multistage mass spectrometry. We describe a rigorous probabilistic framework for analyzing spectra of overlapping peptides and show how to apply it to multistage mass spectrometry. Our software yields both accurate de novo peptide sequencing from multistage mass spectra (despite the inferior quality of MS3 spectra) and improved interpretation of spectral networks. We further study the problem of de novo peptide sequencing with accurate parent mass (but inaccurate fragment masses), a protocol that may soon become the dominant mode of spectral acquisition. Most existing peptide sequencing algorithms (based on the spectrum graph approach) do not track the accurate parent mass and are thus not equipped for solving this problem. We describe a de novo peptide sequencing algorithm aimed at this experimental protocol and show that it improves the sequencing accuracy on both tandem and multistage mass spectrometry.

17.
Matrix-assisted laser desorption/ionization-mass spectrometry (MALDI-MS) is the pre-eminent technique for mass mapping of glycans. In order to make this technique practical for high-throughput screening, reliable automatic methods of annotating peaks must be devised. We describe an algorithm called Cartoonist that labels peaks in MALDI spectra of permethylated N-glycans with cartoons which represent the most plausible glycans consistent with the peak masses and the types of glycans being analyzed. There are three main parts to Cartoonist. (i) It selects annotations from a library of biosynthetically plausible cartoons. The library we currently use has about 2800 cartoons, but was constructed using only about 300 archetype cartoons entered by hand. (ii) It determines the precision and calibration of the machine used to generate the spectrum. It does this automatically based on the spectrum itself. (iii) It assigns a confidence score to each annotation. In particular, rather than making a binary yes/no decision when annotating a peak, it makes all plausible annotations and associates them with scores indicating the probability that they are correct.

18.
Curie-point pyrolysis (Py)-mass spectrometry has been used to differentiate 19 microorganisms by Gram type on the basis of the methyl esters of their fatty acid distribution. The mass spectra of gram-negative microorganisms were characterized by the presence of palmitoleic acid (C16:1) and oleic acid (C18:1), as well as a higher abundance of palmitic acid (C16:0) than pentadecanoic acid (C15:0). For gram-positive microorganisms, a signal of branched C15:0 (iso-C15:0 and/or anteiso-C15:0) more intense than that of palmitic acid was observed in the mass spectra. Principal components analysis of these mass spectral data segregated the microorganisms investigated in this study into three discrete clusters that correlated to their Gram reactions and pathogenicities. Further tandem mass spectrometric analysis demonstrated that the nature of the C15:0 fatty acid isomer (branched or normal) present in the mass spectrum of each microorganism was important for achieving the classification into three clusters.

19.
Introduction: The tandem mass spectrometer is a powerful tool for generating peptide (tandem) mass spectrum data for the analysis of complex biological protein mixtures in genomic-related disease cell lines. However, the majority of experimental tandem mass spectra cannot be interpreted by any database search engine. One of the main reasons is that the majority of experimental spectra are of too poor a quality to be interpretable. Trying to interpret these "un-interpretable" spectra is a waste of time. Therefore, it is worthwhile to determine the quality of mass spectra before any interpretation. Objectives: This paper proposes an approach to classifying tandem spectra into two groups: one with high quality and one with poor quality. Methods: The proposed approach has two steps. First, each spectrum is mapped to a feature vector that describes the quality of the spectrum. Then, a weighted K-means clustering method is applied to classify the tandem mass spectra. Results and Conclusion: Computational experiments illustrate that one cluster contains the majority of the high-quality spectra, while the other contains the majority of the poor-quality spectra. This result indicates that if we search only the spectra in the high-quality cluster, we can save the time otherwise spent searching the majority of poor-quality spectra while losing a minimal number of high-quality spectra. The software created for this work is available upon request.
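
The two-step method above (feature vectors, then weighted K-means) can be illustrated with a small pure-Python sketch. The abstract does not define the weighting, so this version assumes per-feature weights inside a squared Euclidean distance; the feature values, weights, starting centers, and function names are all hypothetical.

```python
def weighted_sq_dist(x, center, weights):
    """Squared Euclidean distance with one weight per feature."""
    return sum(w * (xi - ci) ** 2 for xi, ci, w in zip(x, center, weights))


def weighted_kmeans(points, weights, centers, max_iter=100):
    """Lloyd-style K-means using the weighted distance above.

    `centers` supplies the starting centers, so the result is deterministic.
    """
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)),
                key=lambda j: weighted_sq_dist(p, centers[j], weights))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave an empty cluster's center where it is
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Each spectrum is reduced to two made-up quality features in [0, 1].
spectra = [(0.9, 0.8), (0.8, 0.9), (0.1, 0.2), (0.2, 0.1)]
labels, centers = weighted_kmeans(spectra, weights=(1.0, 2.0),
                                  centers=[spectra[0], spectra[2]])
print(labels)  # prints [0, 0, 1, 1]
```

With two clusters, the one whose center has the higher feature values would be treated as the high-quality group and passed on to the database search.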

20.
Bandeira N 《BioTechniques》2007,42(6):687, 689, 691 passim
Significant technological advances have accelerated high-throughput proteomics to the automated generation of millions of tandem mass spectra on a daily basis. In such a setup, the desire for greater sequence coverage combines with standard experimental procedures to commonly yield multiple tandem mass spectra from overlapping peptides; typical observations include peptides differing by one or two terminal amino acids and spectra from modified and unmodified variants of the same peptides. In a departure from the traditional spectrum identification algorithms that analyze each tandem mass spectrum in isolation, spectral networks define a new computational approach that instead finds and simultaneously interprets sets of spectra from overlapping peptides. In shotgun protein sequencing, spectral networks capitalize on the redundant sequence information in the aligned spectra to deliver the longest and most accurate de novo sequences ever reported for ion trap data. Also, by combining spectra from multiple modified and unmodified variants of the same peptides, spectral networks are able to bypass the dominant guess/confirm approach to the identification of posttranslational modifications and instead discover modifications and highly modified peptides directly from experimental data. Open-source implementations of these algorithms may be downloaded from peptide.ucsd.edu.
