Similar articles
 Found 20 similar articles; search time: 31 ms
1.
Shotgun tandem mass spectrometry-based peptide sequencing using programs such as SEQUEST allows high-throughput identification of peptides, which in turn allows the identification of corresponding proteins. We have applied a machine learning algorithm, called the support vector machine, to discriminate between correctly and incorrectly identified peptides using SEQUEST output. Each peptide was characterized by SEQUEST-calculated features such as delta Cn and Xcorr, measurements such as precursor ion current and mass, and additional calculated parameters such as the fraction of matched MS/MS peaks. The trained SVM classifier performed significantly better than previous cutoff-based methods at separating positive from negative peptides. Positive and negative peptides were more readily distinguished in training set data acquired on a QTOF, compared to an ion trap mass spectrometer. The use of 13 features, including four new parameters, significantly improved the separation between positive and negative peptides. Use of the support vector machine and these additional parameters resulted in a more accurate interpretation of peptide MS/MS spectra and is an important step toward automated interpretation of peptide tandem mass spectrometry data in proteomics.

2.
3.
4.
Nucleotide sequence of cloned cDNA coding for preproricin   Cited by: 20 (self-citations: 0, citations by others: 20)
The primary structure of a precursor protein that contains the toxic (A) and galactose-binding (B) chains of the castor bean lectin, ricin, has been deduced from the nucleotide sequence of cloned DNA complementary to preproricin mRNA. A cDNA library was constructed using maturing castor bean endosperm poly(A)-rich RNA enriched for lectin precursor mRNA by size fractionation. Clones containing lectin mRNA sequences were isolated by hybridization using as a probe a mixture of synthetic oligonucleotides representing all possible sequences for a peptide of the ricin B chain. The entire coding sequence of preproricin was deduced from two overlapping cDNA clones having inserts of 1614 and 1049 base pairs. The coding region (1695 base pairs) consists of a 24-amino-acid N-terminal signal sequence (molecular mass 2836 Da) preceding the A chain (267 amino acids, molecular mass 29 399 Da), which is joined to the B chain (262 amino acids, molecular mass 28 517 Da) by a 12-amino-acid linking region (molecular mass 1385 Da).
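
As a quick consistency check on the figures quoted in this abstract, the segment lengths can be summed and converted to base pairs. This is only an arithmetic sketch; the dictionary keys are labels chosen here for illustration.

```python
# Segment lengths in amino acids, taken from the abstract above.
segments = {
    "signal sequence": 24,
    "A chain": 267,
    "linking region": 12,
    "B chain": 262,
}

total_residues = sum(segments.values())
coding_bp = 3 * total_residues  # three nucleotides per codon

print(total_residues, coding_bp)  # 565 1695
```

The total of 565 codons corresponds to the 1695 base pairs reported for the coding region.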

5.
A computer algorithm is described that utilizes both Edman and mass spectrometric data for simultaneous determination of the amino acid sequences of several peptides in a mixture. Gas phase sequencing of a peptide mixture results in a list of observed amino acids for each cycle of Edman degradation, which by itself may not be informative and typically requires reanalysis following additional chromatographic steps. Tandem mass spectrometry, on the other hand, has a proven ability to analyze sequences of peptides present in mixtures. However, mass spectrometric data may lack a complete set of sequence-defining fragment ions, so that more than one possible sequence may account for the observed fragment ions. A combination of the two types of data reduces the ambiguity inherent in each. The algorithm first utilizes the Edman data to determine all hypothetical sequences with a calculated mass equal to the observed mass of one of the peptides present in the mixture. These sequences are then assigned figures of merit according to how well each of them accounts for the fragment ions in the tandem mass spectrum of that peptide. The program was tested on tryptic and chymotryptic peptides from hen lysozyme, and the results are compared with those of another computer program that uses only mass spectral data for peptide sequencing. To assess the utility of this method, the program was also tested using simulated mixtures of varying complexity and tandem mass spectra of varying quality.
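
The first step described above (enumerating sequences consistent with the Edman cycles and keeping those whose mass matches an observed peptide mass) might be sketched as follows. This is a minimal illustration, not the authors' program: the function names, the 0.01 Da tolerance, the toy residue table, and the assumption that every peptide in the mixture has the same length are all choices made here. The fragment-ion scoring step is not covered.

```python
from itertools import product

# Monoisotopic residue masses (Da) for a small illustrative subset of amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "V": 99.06841,
    "L": 113.08406, "K": 128.09496, "F": 147.06841,
}
WATER = 18.01056  # added once per peptide for the free N- and C-termini


def peptide_mass(seq):
    """Monoisotopic mass of a neutral, unmodified peptide."""
    return sum(RESIDUE_MASS[r] for r in seq) + WATER


def candidates(cycles, observed_mass, tol=0.01):
    """Return every sequence built from one residue per Edman cycle
    whose calculated mass lies within tol (Da) of observed_mass.

    cycles[i] is a string of the residues observed in Edman cycle i.
    """
    hits = []
    for combo in product(*cycles):
        seq = "".join(combo)
        if abs(peptide_mass(seq) - observed_mass) <= tol:
            hits.append(seq)
    return hits


# Hypothetical mixture of two tripeptides: each cycle shows two residues.
cycles = ["GS", "AL", "KF"]
print(candidates(cycles, 274.1641))  # ['GAK']
```

Of the eight sequences the three cycles allow, only one matches the observed mass, which illustrates how the mass constraint prunes the Edman ambiguity before any fragment ions are considered.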

6.
De novo interpretation of tandem mass spectrometry (MS/MS) spectra provides sequences for searching protein databases when limited sequence information is present in the database. Our objective was to define a strategy for this type of homology-tolerant database search. Homology searches, using MS-Homology software, were conducted with 20, 10, or 5 of the most abundant peptides from 9 proteins, based either on precursor trigger intensity or on total ion current, and allowing for 50%, 30%, or 10% mismatch in the search. Protein scores were corrected by subtracting a threshold score that was calculated from random peptides. The highest (p < .01) corrected protein scores (i.e., above the threshold) were obtained by submitting 20 peptides and allowing 30% mismatch. Using these criteria, protein identification based on ion mass searching using MS/MS data (i.e., Mascot) was compared with that obtained using homology search. The highest-ranking protein was the same using Mascot, homology search using the 20 most intense peptides, or homology search using all peptides, for 63.4% of 112 spots from two-dimensional polyacrylamide gel electrophoresis gels. For these proteins, the percent coverage was greatest using Mascot compared with the use of all or just the 20 most intense peptides in a homology search (25.1%, 18.3%, and 10.6%, respectively). Finally, 35% of de novo sequences completely matched the corresponding known amino acid sequence of the matching peptide. This percentage increased when the search was limited to the 20 most intense peptides (44.0%). After identifying the protein using MS-Homology, a peptide mass search may increase the percent coverage of the protein identified.

7.
Data-independent acquisition (DIA) methods have become increasingly popular in mass spectrometry–based proteomics because they enable continuous acquisition of fragment spectra for all precursors simultaneously. However, these advantages come with the challenge of correctly reconstructing the precursor–fragment relationships in these highly convoluted spectra for reliable identification and quantification. Here, we introduce a scan mode for the combination of trapped ion mobility spectrometry with parallel accumulation–serial fragmentation (PASEF) that seamlessly and continuously follows the natural shape of the ion cloud in ion mobility and peptide precursor mass dimensions. Termed synchro-PASEF, it increases the detected fragment ion current several-fold at sub-second cycle times. Consecutive quadrupole selection windows move synchronously through the mass and ion mobility range. In this process, the quadrupole slices through the peptide precursors, which separates fragment ion signals of each precursor into adjacent synchro-PASEF scans. This precisely defines precursor–fragment relationships in ion mobility and mass dimensions and effectively deconvolutes the DIA fragment space. Importantly, the partitioned parts of the fragment ion transitions provide a further dimension of specificity via a lock-and-key mechanism. This is also advantageous for quantification, where signals from interfering precursors in the DIA selection window do not affect all partitions of the fragment ion, so that only the specific parts need be retained for quantification. Overall, we establish the defining features of synchro-PASEF and explore its potential for proteomic analyses.

8.
Turkina MV, Villarejo A, Vener AV. FEBS Letters 2004, 564(1-2):104-108
The surface-exposed peptides were cleaved by trypsin from the photosynthetic thylakoid membranes isolated from the green alga Chlamydomonas reinhardtii. Two phosphorylated peptides, enriched from the peptide mixture and sequenced by nanospray quadrupole time-of-flight mass spectrometry, revealed overlapping sequences corresponding to the N-terminus of a nuclear-encoded chlorophyll a/b-binding protein CP29. In contrast to all known nuclear-encoded thylakoid proteins, the transit peptide in the mature algal CP29 was not removed but processed by methionine excision, N-terminal acetylation and phosphorylation on threonine 6. The importance of this phosphorylation site is proposed as the reason for the unique transit peptide retention.

9.
Tandem mass spectrometry has emerged as one of the most powerful high-throughput techniques for protein identification. Tandem mass spectrometry selects and fragments peptides of interest into N-terminal ions and C-terminal ions, and it measures the mass/charge ratios of these ions. The de novo peptide sequencing problem is to derive the peptide sequences from given tandem mass spectral data of k ion peaks without searching against protein databases. By transforming the spectral data into a matrix spectrum graph G = (V, E), where |V| = O(k^2) and |E| = O(k^3), we give the first polynomial time suboptimal algorithm that finds all the suboptimal solutions (peptides) in O(p|E|) time, where p is the number of solutions. The algorithm has been implemented and tested on experimental data. The program is available at http://hto-c.usc.edu:8000/msms/menu/denovo.htm.

10.
A number of different approaches have been proposed to predict elemental component formulas (or molecular formulas) of molecular ions in low and medium resolution mass spectra. Most of them rely on isotope patterns, enumerate all possible formulas for an ion, and exclude certain formulas violating chemical constraints. However, these methods cannot be well generalized to the component prediction of fragment ions in tandem mass spectra. In this paper, a new method, FFP (fragment ion formula prediction), is presented to predict elemental component formulas of fragment ions. In the FFP method, the prediction of the best formulas is converted into the minimization of the distance between theoretical and observed isotope patterns. Then, a novel local search model is proposed to generate a set of candidate formulas efficiently. After the search, FFP applies a new multiconstraint filtering to exclude as many invalid and improbable formulas as possible. FFP is experimentally compared with the previous enumeration methods, and shown to outperform them significantly. The results of this paper can help to improve the reliability of de novo identification of peptide sequences.

11.
The cDNAs corresponding to the mRNA encoding a polypeptide which is immunoreactive with the antisera specific to carcinoembryonic antigen (CEA) (1) were cloned. The amino acid sequences deduced from the nucleotide sequences of the cDNAs show that it is synthesized as a precursor with a signal peptide followed by 668 amino acids of the putative mature CEA peptide, whose N-terminal 24 amino acids and amino acids 286 to 295 exactly coincide with those known for N-terminal sequences of CEA (2) and NFA-1 (3), respectively. The first 108 N-terminal residues are followed by three very homologous repetitive domains of 178 residues each and then by 26 mostly hydrophobic residues which probably comprise a membrane anchor. Each repetitive domain contains 4 cysteines at precisely the same positions, and as many as 28 possible N-glycosylation sites are found in the CEA peptide region, in agreement with the high carbohydrate content of purified CEA.

12.
In view of the significance of Asn deamidation and Asp isomerization to isoAsp at certain sites for protein aging and turnover, it was desirable to challenge the extreme analytical power of electrospray tandem mass spectrometry (ESI-MS/MS) for the possibility of a site-specific detection of this posttranslational modification. For this purpose, synthetic L-Asp/L-isoAsp containing oligopeptide pairs were investigated by ESI-MS/MS and low-energy collision-induced dissociation (CID). Replacement of L-Asp by L-isoAsp resulted in the same kind of shifts for all 15 peptide pairs investigated: (1) the b/y intensity ratio of complementary b and y ions generated by cleavage of the (L-Asp/L-isoAsp)-X bond and of the X-(L-Asp/L-isoAsp) bond was decreased, and (2) the Asp immonium ion abundance at m/z 88 was also decreased. It is proposed that the isoAsp structure hampers the accepted mechanism of b-ion formation on both its N- and C-terminal side. The b/y ion intensity ratio and the relative immonium ion intensity vary considerably, depending on the peptide sequence, but the corresponding values are reproducible when recorded on the same instrument under identical instrumental settings. Thus, once the reference product ion spectra have been documented for a pair of synthetic peptides containing either L-Asp or L-isoAsp, these identify one or the other form. Characterization and relative quantification of L-Asp/L-isoAsp peptide mixtures are also possible as demonstrated for two sequences for which isoAsp formation has been described, namely myrG-D/isoD-AAAAK (deamidated peptide 1-7 of protein kinase A catalytic subunit) and VQ-D/isoD-GLR (deamidated peptide 41-46 of human procollagen alpha 1). Thus, the analytical procedures described may be helpful for the identification of suspected Asn deamidation and Asp isomerization sites in proteolytic digests of proteins.

13.
Methods for treating MS/MS data to achieve accurate peptide identification are currently the subject of much research activity. In this study we describe a new method for filtering MS/MS data and refining precursor masses that provides highly accurate analyses of massive sets of proteomics data. This method, coined "postexperiment monoisotopic mass filtering and refinement" (PE-MMR), consists of several data processing steps: 1) generation of lists of all monoisotopic masses observed in a whole LC/MS experiment, 2) clustering of monoisotopic masses of a peptide into unique mass classes (UMCs) based on their masses and LC elution times, 3) matching the precursor masses of the MS/MS data to a representative mass of a UMC, and 4) filtration of the MS/MS data based on the presence of corresponding monoisotopic masses and refinement of the precursor ion masses by the UMC mass. PE-MMR increases the throughput of proteomics data analysis, by efficiently removing "garbage" MS/MS data prior to database searching, and improves the mass measurement accuracies (i.e. 0.05 +/- 1.49 ppm for yeast data (from 4.46 +/- 2.81 ppm) and 0.03 +/- 3.41 ppm for glycopeptide data (from 4.8 +/- 7.4 ppm)) for an increased number of identified peptides. In proteomics analyses of glycopeptide-enriched samples, PE-MMR processing greatly reduces the degree of false glycopeptide identification by correctly assigning the monoisotopic masses for the precursor ions prior to database searching. By applying this technique to analyses of proteome samples of varying complexities, we demonstrate herein that PE-MMR is an effective and accurate method for treating massive sets of proteomics data.
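
Steps 3 and 4 above (matching a precursor mass to a UMC, then either refining it or filtering the spectrum out) could look roughly like the sketch below. It is an illustration under stated assumptions rather than the PE-MMR implementation: the function names and the 10 ppm tolerance are hypothetical, and the LC elution-time criterion used to build and match UMCs is left out.

```python
def ppm_error(measured, reference):
    """Signed mass error of measured relative to reference, in parts per million."""
    return (measured - reference) / reference * 1e6


def refine_precursor(precursor_mass, umc_masses, tol_ppm=10.0):
    """Return the closest UMC mass if it lies within tol_ppm of the precursor,
    otherwise None, meaning the MS/MS spectrum would be filtered out.

    tol_ppm is an illustrative value, not one taken from the abstract.
    """
    best = min(
        umc_masses,
        key=lambda m: abs(ppm_error(precursor_mass, m)),
        default=None,
    )
    if best is None or abs(ppm_error(precursor_mass, best)) > tol_ppm:
        return None
    return best


# Hypothetical representative UMC masses (Da).
umc_masses = [1000.0, 1500.75, 2000.5]
print(refine_precursor(1500.756, umc_masses))  # 1500.75 (about 4 ppm away)
print(refine_precursor(1800.0, umc_masses))    # None (no UMC within tolerance)
```

Returning `None` for unmatched precursors mirrors the filtering role described in the abstract, where spectra without a corresponding monoisotopic mass are dropped before database searching.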

14.
Protein identifications with borderline statistical confidence are typically produced by matching a few marginal-quality MS/MS spectra to database peptide sequences and represent a significant bottleneck in the reliable and reproducible characterization of proteomes. Here, we present a method for rapid validation of borderline hits that circumvents the need for often-biased manual inspection of raw MS/MS spectra. The approach takes advantage of the independent interpretation of corresponding MS/MS spectra by PepNovo de novo sequencing software followed by mass spectrometry-driven BLAST (MS BLAST) sequence-similarity database searches that utilize all partially inaccurate, degenerate and redundant candidate peptide sequences. In a case study involving the identification of more than 180 Caenorhabditis elegans proteins by nanoLC-MS/MS analysis on a linear ion trap LTQ mass spectrometer, the approach enabled rapid assignment (confirmation or rejection) of more than 70% of Mascot hits of borderline statistical confidence.

15.
De novo peptide sequencing via tandem mass spectrometry.   Cited by: 10 (self-citations: 0, citations by others: 10)
Peptide sequencing via tandem mass spectrometry (MS/MS) is one of the most powerful tools in proteomics for identifying proteins. Because complete genome sequences are accumulating rapidly, the recent trend in interpretation of MS/MS spectra has been database search. However, de novo MS/MS spectral interpretation remains an open problem typically involving manual interpretation by expert mass spectrometrists. We have developed a new algorithm, SHERENGA, for de novo interpretation that automatically learns fragment ion types and intensity thresholds from a collection of test spectra generated from any type of mass spectrometer. The test data are used to construct optimal path scoring in the graph representations of MS/MS spectra. A ranked list of high scoring paths corresponds to potential peptide sequences. SHERENGA is most useful for interpreting sequences of peptides resulting from unknown proteins and for validating the results of database search algorithms in fully automated, high-throughput peptide sequencing.

16.
A probability-based quantification framework is presented for the calculation of relative peptide and protein abundance in label-free and label-dependent LC-MS proteomics data. The results are accompanied by credible intervals and regulation probabilities. The algorithm takes into account data uncertainties via Poisson statistics modified by a noise contribution that is determined automatically during an initial normalization stage. Protein quantification relies on assignments of component peptides to the acquired data. These assignments are generally of variable reliability and may not be present across all of the experiments comprising an analysis. It is also possible for a peptide to be assigned to more than one protein in a given mixture. For these reasons the algorithm accepts a prior probability of peptide assignment for each intensity measurement. The model is constructed in such a way that outliers of any type can be automatically reweighted. Two discrete normalization methods can be employed. The first method is based on a user-defined subset of peptides, while the second method relies on the presence of a dominant background of endogenous peptides for which the concentration is assumed to be unaffected. Normalization is performed using the same computational and statistical procedures employed by the main quantification algorithm. The performance of the algorithm will be illustrated on example data sets, and its utility demonstrated for typical proteomics applications. The quantification algorithm supports relative protein quantification based on precursor and product ion intensities acquired by means of data-dependent methods, originating from all common isotopically-labeled approaches, as well as label-free ion intensity-based data-independent methods.

17.
The usability of a quadrupole-quadrupole-time-of-flight (QqTOF) instrument for the tandem mass spectrometric sequencing of oligodeoxynucleotides was investigated. The sample set consisted of 21 synthetic oligodeoxynucleotides ranging in length from 5 to 42 nucleotides. The sequences were randomly selected. For the majority of tested oligonucleotides, two or three different charge states were selected as precursor ions. Each precursor ion was fragmented applying several different collision voltages. Overall, 282 fragment ion mass spectra were acquired. Computer-aided interpretation of fragment ion mass spectra was accomplished with a recently introduced comparative sequencing algorithm (COMPAS). The applied version of COMPAS was specifically optimized for the interpretation of information-rich spectra obtained on the QqTOF. Sequences of oligodeoxynucleotides as large as 26-mers were correctly verified in >94% of cases (182 of 192 spectra acquired). Fragment ion mass spectra of larger oligonucleotides were not specific enough for sequencing. Because of the occurrence of extensive internal fragmentation causing low sequence coverage paired with a high probability of assigning fragment ions to wrong sequences, tandem mass spectra obtained from oligonucleotides consisting of 30 or more nucleotides could not be used for sequence verification, either manually or with COMPAS. © 2009 Wiley Periodicals, Inc. Biopolymers 91: 401–409, 2009. This article was originally published online as an accepted preprint. The “Published Online” date corresponds to the preprint version. You can request a copy of the preprint by emailing the Biopolymers editorial office at biopolymers@wiley.com

18.
Protonated molecular peptide ions and their product ions generated by tandem mass spectrometry appear as isotopologue clusters due to the natural isotopic variations of carbon, hydrogen, nitrogen, oxygen, and sulfur. Quantitation of the isotopic composition of peptides can be employed in experiments involving isotope effects, isotope exchange, and isotopic labeling by chemical reactions and in studies of metabolism by stable isotope incorporation. Both ion trap and quadrupole-time of flight mass spectrometry are shown to be capable of determining the isotopic composition of peptide product ions obtained by tandem mass spectrometry with both precision and accuracy. Tandem mass spectra of clusters of isotopologue ions obtained in profile mode are fit by nonlinear least squares to a series of Gaussian peaks which quantify the Mn/M0 values which define the isotopologue distribution (ID). To determine the isotopic composition of product ions from their ID, a new algorithm that predicts the Mn/M0 ratios and obviates the need to determine the intensity of all of the ions of an ID is developed. Consequently, a precise and accurate determination of the isotopic composition of a product ion may be obtained from only the initial values of the ID; however, the entire isotopologue cluster must be isolated prior to fragmentation. Following optimization of the molecular ion isolation width, fragmentation energy, and detector sensitivity, the presence of isotopic excess (2H, 13C, 15N, 18O) is readily determined within 1%. The ability to determine the isotopic composition of sequential product ions permits the isotopic composition of individual amino acid residues in the precursor ion to be determined.

19.
An Z, Chen Y, Koomen JM, Merkler DJ. Proteomics 2012, 12(2):173-182
Amidation is a post-translational modification found at the C-terminus of ~50% of all neuropeptide hormones. Cleavage of the C(α)-N bond of a C-terminal glycine yields the α-amidated peptide in a reaction catalyzed by peptidylglycine α-amidating monooxygenase (PAM). The mass of an α-amidated peptide decreases by 58 Da relative to its precursor. The amino acid sequences of an α-amidated peptide and its precursor differ only by the C-terminal glycine, meaning that the peptides exhibit similar RP-HPLC properties and tandem mass spectral (MS/MS) fragmentation patterns. Growth of cultured cells in the presence of a PAM inhibitor ensured the coexistence of α-amidated peptides and their precursors. A strategy was developed for precursor and α-amidated peptide pairing (PAPP): LC-MS/MS data of peptide extracts were scanned for peptide pairs that differed by 58 Da in mass, but had similar RP-HPLC retention times. The resulting peptide pairs were validated by checking for similar fragmentation patterns in their MS/MS data prior to identification by database searching or manual interpretation. This approach significantly reduced the number of spectra requiring interpretation, decreasing the computing time required for database searching and enabling manual interpretation of unidentified spectra. Reported here are the α-amidated peptides identified from AtT-20 cells using the PAPP method.
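
The pairing scan described above can be illustrated with a short sketch. It is not the authors' PAPP code: the function name, the tolerances, and the feature list are hypothetical, and the 58 Da shift is taken here as the monoisotopic mass of C2H2O2 (58.00548 Da), which is an assumption about how the nominal figure in the abstract would be applied to accurate-mass data. The MS/MS similarity check that validates each pair is not included.

```python
# Monoisotopic mass difference between a Gly-extended precursor and its
# alpha-amidated product, taken here as C2H2O2 (an assumption; the abstract
# only states the nominal value of 58 Da).
GLY_LOSS = 58.00548


def find_pairs(features, mass_tol=0.01, rt_tol=1.0):
    """Return (precursor, amidated) feature pairs.

    features is a list of (mass_Da, retention_time_min) tuples.
    mass_tol (Da) and rt_tol (min) are illustrative tolerances.
    """
    pairs = []
    for m_pre, rt_pre in features:
        for m_am, rt_am in features:
            mass_ok = abs((m_pre - m_am) - GLY_LOSS) <= mass_tol
            rt_ok = abs(rt_pre - rt_am) <= rt_tol
            if mass_ok and rt_ok:
                pairs.append(((m_pre, rt_pre), (m_am, rt_am)))
    return pairs


# Hypothetical LC-MS features: the first two form a pair; the last two
# differ by 58 Da but elute 15 minutes apart, so they are rejected.
features = [(1058.505, 20.3), (1000.500, 20.6), (1200.000, 35.0), (1141.995, 50.0)]
print(find_pairs(features))
# [((1058.505, 20.3), (1000.5, 20.6))]
```

The quadratic scan is adequate for a sketch; for a full LC-MS/MS run, sorting the features by mass and using a binary search for each candidate partner would be the more practical choice.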

20.
A novel hybrid methodology for the automated identification of peptides via de novo integer linear optimization, local database search, and tandem mass spectrometry is presented in this article. A modified version of the de novo identification algorithm PILOT is utilized to construct accurate de novo peptide sequences. A modified version of the local database search tool FASTA is used to query these de novo predictions against the nonredundant protein database to resolve any low-confidence amino acids in the candidate sequences. The computational burden associated with performing several alignments is alleviated with the use of distributive computing. Extensive computational studies are presented for this new hybrid methodology, as well as comparisons with MASCOT for a set of 38 quadrupole time-of-flight (QTOF) and 380 OrbiTrap tandem mass spectra. The results for our proposed hybrid method for the OrbiTrap spectra are also compared with a modified version of PepNovo, which was trained for use on high-precision tandem mass spectra, and the tag-based method InsPecT. The de novo sequences of PILOT and PepNovo are also searched against the nonredundant protein database using CIDentify to compare with the alignments achieved by our modifications of FASTA. The comparative studies demonstrate the excellent peptide identification accuracy gained from combining the strengths of our de novo method, which is based on integer linear optimization, and database-driven search methods.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  ICP license no. 京ICP备09084417号