Similar articles
 20 similar articles found (search time: 15 ms)
1.
We derive the optimal number of peaks (defined as the minimum number that provides the required efficiency of spectra identification) in the theoretical spectra as a function of (i) the experimental accuracy, sigma, of the measured ratio m/z; (ii) experimental spectrum density; (iii) size of the database; (iv) number of peaks in the theoretical spectra; and (v) types of ions that the peaks represent. We show that if theoretical spectra are constructed from b and y ions alone, then for sigma = 0.5, which is typical for high-throughput data, peptide chains of eight amino acids or longer can be identified based on the positions of peaks alone, at a rate of false identification below 1%. To discriminate between shorter peptides, additional (e.g., intensity-inferred) information is necessary. We derive the dependence of the probability of false identification on the number of peaks in the theoretical spectra and on the types of ions that the peaks represent. Our results suggest that the class of mass spectrum identification problems for which more elaborate development of fragmentation rules (such as an intensity model) is required can be reduced to the problems that involve homologous peptides.
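The dependence of false identification on accuracy, spectrum density, database size, and number of peaks can be illustrated with a toy model. The sketch below is not the derivation from the abstract above. It assumes that experimental peaks are scattered as a Poisson process, that theoretical peaks match independently, and that a wrong candidate is "identified" when at least `k` of its `n` peaks match. All function names and parameters are hypothetical.

```python
import math

def random_peak_match_prob(sigma: float, density: float) -> float:
    """Chance that at least one random experimental peak lies within
    +/- sigma of a theoretical peak (assumption: Poisson-scattered peaks,
    `density` in peaks per m/z unit, `sigma` in m/z units)."""
    return 1.0 - math.exp(-2.0 * sigma * density)

def prob_at_least_k_matches(n: int, k: int, p: float) -> float:
    """Binomial tail: probability that k or more of n theoretical peaks
    match by chance, each independently with probability p."""
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i)
               for i in range(k, n + 1))

def false_identification_prob(n: int, k: int, sigma: float,
                              density: float, database_size: int) -> float:
    """Probability that at least one of `database_size` independent wrong
    candidates reaches k chance matches (a toy model, see lead-in)."""
    p = random_peak_match_prob(sigma, density)
    tail = prob_at_least_k_matches(n, k, p)
    return 1.0 - (1.0 - tail) ** database_size

# An 8-residue peptide has 7 b ions and 7 y ions, so n = 14.
estimate = false_identification_prob(n=14, k=10, sigma=0.5,
                                     density=0.05, database_size=100_000)
```

In this toy model the false identification probability grows with database size and spectrum density and falls as more matching peaks are required, which is the qualitative behaviour the abstract describes.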

2.
3.
4.
Matrix-assisted laser desorption/ionization-mass spectrometry (MALDI-MS) is the pre-eminent technique for mass mapping of glycans. In order to make this technique practical for high-throughput screening, reliable automatic methods of annotating peaks must be devised. We describe an algorithm called Cartoonist that labels peaks in MALDI spectra of permethylated N-glycans with cartoons that represent the most plausible glycans consistent with the peak masses and the types of glycans being analyzed. There are three main parts to Cartoonist. (i) It selects annotations from a library of biosynthetically plausible cartoons. The library we currently use has about 2800 cartoons, but was constructed using only about 300 archetype cartoons entered by hand. (ii) It determines the precision and calibration of the machine used to generate the spectrum. It does this automatically based on the spectrum itself. (iii) It assigns a confidence score to each annotation. In particular, rather than making a binary yes/no decision when annotating a peak, it makes all plausible annotations and associates them with scores indicating the probability that they are correct.

5.
In proteomics, tandem mass spectrometry is the key technology for peptide sequencing. However, partially due to the deficiency of peptide identification software, a large portion of the tandem mass spectra are discarded in almost all proteomics centers because they are not interpretable. The problem is more acute with the lower quality data from low-end but more popular devices such as ion trap instruments. In order to deal with noisy and low quality data, this paper develops a systematic machine learning approach to construct a robust linear scoring function, whose coefficients are determined by linear programming. A prototype, PRIMA, was implemented. When tested with large benchmarks of varying qualities, PRIMA consistently has higher accuracy than the commonly used software packages MASCOT, SEQUEST, and X! Tandem.

6.
Hu Y, Li Y, Lam H. Proteomics, 2011, 11(24): 4702-4711
Spectral library searching is a promising alternative to sequence database searching in peptide identification from MS/MS spectra. The key advantage of spectral library searching is the utilization of more spectral features to improve score discrimination between good and bad matches, and hence sensitivity. However, the coverage of reference spectral libraries is limited by current experimental and computational methods. We developed a computational approach to expand the coverage of spectral libraries with semi-empirical spectra predicted by perturbing known spectra of similar sequences, such as those with single amino acid substitutions. We hypothesized that peptides of similar sequences should produce similar fragmentation patterns, at least in most cases. Our results confirm our hypothesis and specify when this approach can be applied. In actual spectral searching of real data sets, the sensitivity advantage of spectral library searching over sequence database searching can be mostly retained even when all real spectra are replaced by semi-empirical ones. We demonstrated the applicability of this approach by detecting several known non-synonymous single-nucleotide polymorphisms in three large human data sets by spectral searching.
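The idea of perturbing a known spectrum can be sketched for the single-substitution case. The code below is a minimal illustration, not the authors' implementation. It assumes singly charged b and y ions only, unmodified residues, and that each annotated peak keeps its intensity while its m/z is recomputed for the substituted sequence. The names `fragment_mz` and `perturb_spectrum` are hypothetical.

```python
# Monoisotopic residue masses (Da), unmodified amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.00728
WATER = 18.01056

def fragment_mz(peptide: str) -> dict:
    """Singly charged b- and y-ion m/z values keyed by ion label."""
    n = len(peptide)
    ions = {}
    prefix = 0.0
    suffix = 0.0
    for i in range(1, n):
        prefix += MONO[peptide[i - 1]]   # first i residues
        suffix += MONO[peptide[n - i]]   # last i residues
        ions[f"b{i}"] = prefix + PROTON
        ions[f"y{i}"] = suffix + WATER + PROTON
    return ions

def perturb_spectrum(known: str, position: int, new_residue: str,
                     intensities: dict):
    """Build a semi-empirical spectrum for a single substitution.

    `intensities` maps ion labels of the known spectrum to intensities.
    Intensities are reused unchanged (an assumption); only the m/z of
    ions containing the substituted residue shifts.
    """
    new_peptide = known[:position] + new_residue + known[position + 1:]
    new_mz = fragment_mz(new_peptide)
    spectrum = {label: (new_mz[label], intensity)
                for label, intensity in intensities.items()
                if label in new_mz}
    return new_peptide, spectrum

new_peptide, spectrum = perturb_spectrum(
    "PEPTIDE", 2, "A", {"b2": 10.0, "b3": 20.0, "y4": 5.0, "y5": 7.0})
```

Only the fragments that contain the substituted residue (here b3 and y5) move, by the mass difference between the two residues. Fragments such as b2 and y4 stay where they were.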

7.
8.
9.
We report the results of our work to facilitate protein identification using tandem mass spectra and protein sequence databases. We describe a parallel version of SEQUEST (SEQUEST-PVM) that is tolerant toward arithmetic exceptions. The changes we report effectively separate search processes on slave nodes from each other. Therefore, if one of the slave nodes drops out of the cluster due to an error, the rest of the cluster will carry the search process to the end. SEQUEST has been widely used for protein identification. The modifications made to the code improve its stability and effectiveness in a high-throughput production environment. We evaluate the overhead associated with the parallelization of SEQUEST. A prior version of software to preprocess LC/MS/MS data attempted to differentiate the charge states of ions. Singly charged ions can be accurately identified, but the software was unable to reliably differentiate tandem mass spectra of +2 and +3 charge states. We have designed and implemented a computational approach to narrow charge states of precursor ions from nominal resolution ion-trap tandem mass spectra. The preprocessing code, 2to3, determines the charge state of the precursor ion using its mass-to-charge ratio (m/z) and fragment ions contained in the tandem mass spectrum. For each possible charge state, the program calculates the number of expected fragment ions that account for the precursor ion m/z value. If either number is less than an empirically determined threshold value, then the spectrum corresponding to that charge state is removed. If both numbers are higher than the threshold value, then +2 and +3 copies of the spectrum are kept. We present the comparison of results from protein identification experiments with and without using 2to3. It is shown that by determining the charge state and eliminating poor quality spectra, 2to3 decreases the number of spectral files to be searched without affecting the search results. The decrease reduces computer requirements and researcher efforts for analysis of the results.
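The thresholding logic described above can be sketched as follows. This is an illustration only and not the 2to3 code. It assumes that "fragment ions that account for the precursor" means pairs of singly charged fragments whose m/z values sum to the precursor's neutral mass plus two protons. The tolerance, the threshold, and the function names are hypothetical.

```python
PROTON = 1.00728

def count_complementary_pairs(peaks, precursor_mz, charge, tol=1.0):
    """Count fragment pairs consistent with the precursor at `charge`.

    Assumption: a singly charged b ion and its complementary y ion sum
    to (neutral mass + 2 protons), within `tol` m/z units.
    """
    neutral_mass = charge * (precursor_mz - PROTON)
    target = neutral_mass + 2.0 * PROTON
    mzs = sorted(peaks)
    lo, hi = 0, len(mzs) - 1
    count = 0
    while lo < hi:  # two-pointer scan over the sorted peak list
        total = mzs[lo] + mzs[hi]
        if abs(total - target) <= tol:
            count += 1
            lo += 1
            hi -= 1
        elif total < target:
            lo += 1
        else:
            hi -= 1
    return count

def plausible_charges(peaks, precursor_mz, threshold, charges=(2, 3)):
    """Keep each charge state whose pair count reaches the threshold.

    An empty result means neither charge state is supported, so the
    spectrum would be dropped as poor quality.
    """
    return [z for z in charges
            if count_complementary_pairs(peaks, precursor_mz, z) >= threshold]

# Synthetic +2 precursor with neutral mass 1000 Da and three b/y pairs.
peaks = [200.0, 802.01456, 300.0, 702.01456, 400.0, 602.01456]
kept = plausible_charges(peaks, precursor_mz=501.00728, threshold=2)
```

For the synthetic spectrum only the +2 interpretation is kept, so a single copy of the spectrum would be passed on to the database search instead of two.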

10.
Enrichment is essential for phosphoproteome analysis because phosphorylated proteins are usually present in cells in low abundance. Recently, titanium dioxide (TiO2) has been demonstrated to enrich phosphopeptides from simple peptide mixtures with high specificity; however, the technology has not been optimized. In the present study, significant non-specific binding was observed when proteome samples were applied to TiO2 columns. Column wash with an NH4Glu solution after loading peptide mixtures significantly increased the efficiency of TiO2 phosphopeptide enrichment, with a recovery of up to 84%. Also, for proteome samples, more than a 2-fold increase in unique phosphopeptide identifications was achieved. The use of NH4Glu for a TiO2 column wash does not significantly reduce the phosphopeptide recovery. A total of 858 phosphopeptides corresponding to 1034 distinct phosphosites were identified from HeLa cells using the improved TiO2 enrichment procedure in combination with data-dependent neutral loss nano-RPLC-MS2-MS3 analysis. While 41% and 35% of the phosphopeptides were identified only by MS2 and only by MS3, respectively, 24% were identified by both MS2 and MS3. Cross-validation of the phosphopeptide assignment by MS2 and MS3 scans resulted in the highest confidence in identification (99.5%). Many phosphosites identified in this study appear to be novel, including sites from antigen Ki-67, nucleolar phosphoprotein p130, and Treacle protein. The study also indicates that evaluation of confidence levels for phosphopeptide identification via the reversed sequence database searching strategy might underestimate the false positive rate.

11.
A protein precipitation, liquid chromatography/tandem mass spectrometry (LC/MS/MS) method has been developed and validated for the simultaneous determination of valganciclovir and its active metabolite ganciclovir in human plasma. The solvent system also served as a protein precipitation reagent. The chromatographic separation was achieved on an Aquasil C18 column (50 mm × 2.1 mm, 5 µm). A linear gradient mobile phase between 0.02% formic acid and methanol was used. Detection was by positive ion electrospray tandem mass spectrometry on a Sciex API3000. The standard curves, which ranged from 4 to 512 ng/mL for valganciclovir and from 0.1 to 12.8 µg/mL for ganciclovir, were fitted to a 1/x weighted quadratic regression model. The method proved to be accurate, specific, and sufficiently sensitive, and was successfully applied to a pharmacokinetic study.

12.
We report an isotope labeling shotgun proteome analysis strategy to validate the spectrum-to-sequence assignments generated by sequence-database searching for the construction of a more reliable MS/MS spectral library. This strategy is demonstrated in the analysis of the E. coli K12 proteome. In the workflow, E. coli cells were cultured in normal and ¹⁵N-enriched media. The differentially labeled proteins from the cell extracts were subjected to trypsin digestion and two-dimensional liquid chromatography quadrupole time-of-flight tandem mass spectrometry (2D-LC QTOF MS/MS) analysis. The MS/MS spectra of the two samples were individually searched using Mascot against the E. coli proteome database to generate lists of peptide sequence matches. The two data sets were compared by overlaying the spectra of unlabeled and labeled matches of the same peptide sequence for validation. Two cutoff filters, one based on the number of common fragment ions and the other on the similarity of intensity patterns among the common ions, were developed and applied to the overlaid spectral pairs to reject low quality or incorrectly assigned spectra. By examining 257,907 and 245,156 spectra acquired from the unlabeled and ¹⁵N-labeled samples, respectively, an experimentally validated MS/MS spectral library of tryptic peptides was constructed for E. coli K12 that consisted of 9,302 unique spectra with unique sequence and charge state, representing 7,763 unique peptide sequences. This E. coli spectral library could be readily expanded, and the overall strategy should be applicable to other organisms. Even with this relatively small library, it was shown that more peptides could be identified with higher confidence using the spectral search method than by sequence-database searching.
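The two cutoff filters can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Fragment ions are matched by annotation (for example "b3" or "y5") rather than by m/z, because labeling shifts the fragment masses. Cosine similarity stands in for the unspecified intensity-pattern measure, and the cutoff values are hypothetical.

```python
import math

def common_ions(unlabeled: dict, labeled: dict) -> list:
    """Ion labels (e.g. 'b3', 'y5') annotated in both spectra."""
    return sorted(set(unlabeled) & set(labeled))

def intensity_similarity(unlabeled: dict, labeled: dict) -> float:
    """Cosine similarity of intensities over the common ions.

    Cosine similarity is an assumed stand-in for the similarity measure,
    which the abstract does not specify.
    """
    shared = common_ions(unlabeled, labeled)
    if not shared:
        return 0.0
    dot = sum(unlabeled[ion] * labeled[ion] for ion in shared)
    norm_u = math.sqrt(sum(unlabeled[ion] ** 2 for ion in shared))
    norm_l = math.sqrt(sum(labeled[ion] ** 2 for ion in shared))
    if norm_u == 0.0 or norm_l == 0.0:
        return 0.0
    return dot / (norm_u * norm_l)

def passes_filters(unlabeled: dict, labeled: dict,
                   min_common: int = 4, min_similarity: float = 0.8) -> bool:
    """Accept a spectral pair only if both cutoffs are met.

    The default cutoffs are hypothetical placeholders.
    """
    enough_ions = len(common_ions(unlabeled, labeled)) >= min_common
    similar = intensity_similarity(unlabeled, labeled) >= min_similarity
    return enough_ions and similar

light = {"b2": 10.0, "b3": 40.0, "y3": 25.0, "y4": 80.0, "y5": 5.0}
heavy = {"b2": 12.0, "b3": 38.0, "y3": 30.0, "y4": 75.0, "y6": 9.0}
accepted = passes_filters(light, heavy)
```

A pair with too few shared annotations, or with shared ions whose relative intensities disagree, is rejected before the spectrum enters the library.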

13.
14.
The identification of proteins separated on two-dimensional gels is most commonly performed by trypsin digestion and subsequent matrix-assisted laser desorption ionization (MALDI) time-of-flight (TOF) mass spectrometry. Recently, atmospheric pressure (AP) MALDI coupled to an ion trap (IT) has emerged as a convenient method to obtain tandem mass spectra (MS/MS) from samples on MALDI target plates. In the present work, we investigated the feasibility of using the two methodologies in line as a standard method for protein identification. In this setup, the high mass accuracy MALDI-TOF spectra are used to calibrate the peptide precursor masses in the lower mass accuracy AP-MALDI-IT MS/MS spectra. Several software tools were developed to automate the analysis process. Two sets of MALDI samples, consisting of 142 and 421 gel spots, respectively, were analyzed in a highly automated manner. In the first set, the protein identification rate increased from 61% for MALDI-TOF only to 85% for MALDI-TOF combined with AP-MALDI-IT. In the second data set, the protein identification rate increased from 44% to 58%. AP-MALDI-IT MS/MS spectra were in general less effective than the MALDI-TOF spectra for protein identification, but the combination of the two methods clearly enhanced the confidence in protein identification.

15.
A novel hierarchical MS2/MS3 database search algorithm has been developed to analyze MS2/MS3 phosphopeptide proteomic data. The algorithm is incorporated in an automated database search program, MassMatrix. The algorithm matches experimental MS2 spectra against a supplied protein database to determine candidate peptide matches. It then matches the corresponding experimental MS3 spectra against those candidate peptide matches. The MS2 and MS3 spectra are used in concert to arrive at peptide matches with overall higher confidence, rather than combining MS2 and MS3 data searched separately. Receiver operating characteristic analysis showed that hierarchical MS2/MS3 database searches with MassMatrix had better sensitivity and specificity than the two-stage MS2/MS3 database searches obtained with MassMatrix, MASCOT, and X!Tandem. A greater number of true peptide matches at a given false rate were identified by use of this new algorithm for data collected on both LCQ and LTQ-FTICR mass spectrometers. The additional MS3 spectral data also improved the overall reliability and the number of true positives (TPs), because the TPs of the MS2/MS3 search results had higher scores than those of the MS2 search results.

16.
17.
The effectiveness of database search algorithms, such as Mascot, Sequest, and ProteinPilot, is limited by the quality of the input spectra: spurious peaks in MS/MS spectra can jeopardize the correct identification of peptides or reduce their score significantly. Consequently, efficient preprocessing of MS/MS spectra can increase the sensitivity of peptide identification at reduced file sizes and run time without compromising its specificity. We investigate the performance of 25 MS/MS preprocessing methods on various data sets and make software for improved preprocessing of mgf/dta files freely available from http://hci.iwr.uni-heidelberg.de/mip/proteomics or http://www.childrenshospital.org/research/steenlab.
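One common style of peak-list preprocessing can be sketched as follows. The abstract above does not list the 25 methods, so this example is a generic illustration and is not claimed to be one of them. It keeps the `n` most intense peaks in each m/z window. The function name and default values are hypothetical.

```python
from collections import defaultdict

def top_n_per_window(peaks, n=5, window=100.0):
    """Keep the n most intense peaks in each m/z window.

    `peaks` is a list of (mz, intensity) tuples. Windows are fixed bins
    of width `window` starting at m/z 0 (an assumption of this sketch).
    """
    bins = defaultdict(list)
    for mz, intensity in peaks:
        bins[int(mz // window)].append((mz, intensity))
    kept = []
    for members in bins.values():
        # Sort by intensity, strongest first, and keep the first n.
        members.sort(key=lambda peak: peak[1], reverse=True)
        kept.extend(members[:n])
    return sorted(kept)  # restore ascending m/z order

raw = [(101.0, 5.0), (120.0, 50.0), (150.0, 20.0), (199.0, 1.0), (250.0, 3.0)]
filtered = top_n_per_window(raw, n=2)
```

Filtering per window rather than globally keeps weak but informative fragments in sparse regions of the spectrum while still shrinking the peak list.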

18.
A database of high-mass-accuracy tryptic peptides has been created. The database contains 15,897 unique, annotated MS/MS spectra. It is possible to search for peptides according to their mass, number of missed cleavages, and sequence motifs. All of the data contained in the database are downloadable, and each spectrum can be visualized. An example is presented of how the database can be used for studying peptide fragmentation. Fragmentation of different types of peptides with missed cleavages has been studied, and the results can be used to improve identification of these types of peptides.
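The three search criteria (mass, number of missed cleavages, sequence motif) can be illustrated with a small in-memory sketch. It does not reflect the actual database or its interface. It assumes the usual trypsin rule (cleave after K or R unless the next residue is P), unmodified residues, and motifs given as regular expressions. All function names are hypothetical.

```python
import re

# Monoisotopic residue masses (Da), unmodified amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056

def peptide_mass(peptide: str) -> float:
    """Monoisotopic neutral mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER

def cleave(protein: str) -> list:
    """Fully cleaved tryptic pieces: cut after K/R unless followed by P."""
    pieces, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            pieces.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        pieces.append(protein[start:])
    return pieces

def digest(protein: str, max_missed: int = 2):
    """Yield (peptide, missed_cleavages) for up to max_missed skipped sites."""
    pieces = cleave(protein)
    for i in range(len(pieces)):
        for missed in range(max_missed + 1):
            if i + missed < len(pieces):
                yield "".join(pieces[i:i + missed + 1]), missed

def search(entries, mass=None, tol=0.01, missed=None, motif=None):
    """Filter (peptide, missed_cleavages) entries by any of the criteria."""
    hits = []
    for peptide, n_missed in entries:
        if mass is not None and abs(peptide_mass(peptide) - mass) > tol:
            continue
        if missed is not None and n_missed != missed:
            continue
        if motif is not None and not re.search(motif, peptide):
            continue
        hits.append((peptide, n_missed))
    return hits

entries = list(digest("AAKPGGRCCK", max_missed=1))
one_missed = search(entries, missed=1)
```

Each criterion is optional, so the same `search` call supports lookups by mass alone, by missed cleavages alone, or by any combination of the three.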

19.
It is an established fact that allelic variation and post-translational modifications create different variants of proteins, which are observed as isoelectric and size subspecies in two-dimensional gel-based proteomics. Here we explore the stromal proteome of spinach and Arabidopsis chloroplasts and show that clustering of mass spectra is a useful tool for investigating such variants and detecting modified peptides with amino acid substitutions or post-translational modifications. This study employs data mining by hierarchical clustering of MALDI-MS spectra, using the web version of the SPECLUST program (http://bioinfo.thep.lu.se/speclust.html). The tool can also be used to remove peaks of contaminating proteins and to improve protein identification, especially for species without a fully sequenced genome. Mutually exclusive peptide peaks within a cluster provide a good starting point for MS/MS investigation of modified peptides, here exemplified by the identification of an A-to-E substitution that accounts for the isoelectric heterogeneity in protein isoforms.

20.