Similar articles
1.
Heparin and heparan sulfate are very large linear polysaccharides that undergo a complex variety of modifications and are known to play important roles in human development, cell–cell communication and disease. Sequencing of highly sulfated glycosaminoglycan oligosaccharides like heparin and heparan sulfate by liquid chromatography-tandem mass spectrometry (LC-MS/MS) remains challenging because of the presence of multiple isomeric sequences in a complex mixture of oligosaccharides, the difficulties in separation of these isomers, and the facile loss of sulfates in MS/MS. We have previously introduced a method for structural sequencing of heparin/heparan sulfate oligosaccharides involving chemical derivatizations that replace labile sulfates with stable acetyl groups. This chemical derivatization scheme allows the use of reversed phase LC for high-resolution separation and MS/MS for sequencing of isomeric heparan sulfate oligosaccharides. However, because of the large number of analytes present in complex mixtures of heparin/HS oligosaccharides, the resulting LC-MS/MS data sets are large and cannot be annotated with existing glycomics software because of the specifically designed chemical derivatization strategy. We have developed a tool, called GAG-ID, to automate the interpretation of derivatized heparin/heparan sulfate LC-MS/MS data based on a modified multivariate hypergeometric distribution to weight the annotation of more intense peaks. The software is tested on a LC-MS/MS data set collected from a mixture of 21 synthesized heparan sulfate tetrasaccharides. 
By testing the discrimination of scoring with this system, we show that stratifying peaks into different intensity classes benefits the discrimination of scoring, and GAG-ID is able to properly assign all 21 synthetic tetrasaccharides in a defined mixture from a single LC-MS/MS run.
Heparin and heparan sulfate (HS) are involved in numerous physiological (1) and pathophysiological (2) processes, including cellular and organ development (3, 4), cancer (5, 6), and angiogenesis (7). Furthermore, heparin and heparan sulfate have been linked to regulation of cell growth (8), cell adhesion (9), inflammation and immune cell migration (10), neural development and regeneration (11, 12), and hemostasis (13). Heparin/HS is composed of variously sulfated hexuronic acid (1→4) D-glucosamine-repeating disaccharide building blocks, with heparin being a more heavily sulfated form of heparan sulfate (14). The uronic acid residue of heparin/HS may be either α-L-iduronic acid (IdoA) or β-D-glucuronic acid (GlcA) and can be unsubstituted or sulfated at the 2-O position. The modification reactions in heparin/HS biosynthesis are thought to occur in clusters along the chain, with regions devoid of sulfate separating the modified tracts. This arrangement gives rise to segments referred to as N-acetylated (NA), N-sulfated (NS), and mixed domains (NA/NS). The modification reactions often fail to go to completion, resulting in tremendous heterogeneity among the modified regions (15). Interestingly, the biological function of a particular region of heparin/HS is dictated primarily through the interactions that region has with specific effector proteins, and the specificity of these interactions is dictated by the pattern of modification of the heparin/HS region. 
This biologically essential microheterogeneity of heparin/HS makes sequencing of these oligosaccharide regions challenging because of the variable patterns of sulfation and acetylation, as well as the presence of epimers of uronic acid.
Tandem mass spectrometry (MS/MS) is an important tool for the structural characterization of carbohydrates, as it offers high sensitivity coupled with reproducible structural information (16, 17). However, a major challenge to the use of tandem mass spectrometry for structural sequencing of heparin/HS oligosaccharides is sulfate loss during fragmentation. As heparin/HS is collisionally activated, one of the most common fragmentation pathways is the loss of the sulfate modifications, resulting in a loss of structural information regarding the original site of sulfation. It has been shown that the loss of sulfate groups can be minimized by using a combination of charge state manipulation and metal ion adduction or by using alternative fragmentation methods instead of conventional collision induced dissociation, such as electron detachment dissociation and negative electron transfer dissociation (18–20). However, substantial difficulties remain in coupling this technology with separations technology capable of separating isomeric sequences. Our lab has developed a chemical derivatization strategy including sequential permethylation, desulfation, and pertrideuteroacetylation to allow successful separation and sequencing of mixtures of GAG oligosaccharide by LC-MS/MS. This method is attractive as it allows for electrospray-compatible separation of isomeric sequences and is able to fully sequence all sulfation and acetylation patterns using only glycosidic bond cleavages. 
However, the data from this derivatization method cannot be easily incorporated into current glycomics software, such as GlycoWorkbench (21, 22), because of the multistep derivatizations and lack of a confident scoring algorithm for evaluating matches (23).
Database searching approaches have been successfully shown in proteomic research as a key bioinformatics tool to link proteomic MS/MS spectra to peptide sequences from the protein database. The importance of scoring matches between peptide sequences and MS/MS spectra can be observed in the diversity of algorithms created for this purpose. Comparisons have been conducted by cross correlation (24), hypergeometric distribution (25, 26), Poisson distributions (27), Mowse scores (28), Bayesian statistics (29), dot products (30), and several other methods. Many of these algorithms score potential identifications by evaluating the number of fragment ions matched between each peptide sequence and an observed spectrum. However, these systems often do not distinguish between matching an intense peak and matching a minor peak. This does not benefit the discrimination of scoring, where matching the significant peaks in the spectrum should lead to a result being more reliable. Tabb and coworkers (31) have introduced an open-source program called MyriMatch, which uses a statistical model to score peptide matches and is based on multivariate hypergeometric distribution analysis. This program highlights the limitation of existing database search algorithms that count matched peaks without differentiating them by intensity. However, it is designed for proteomic research and modeled based on proteomic data sets.
The development of software tools in glycomics research is currently undergoing rapid changes, yet remains insufficient, especially in the fields of GAGs (32). Four software tools have recently been described for the targeted evaluation of GAG MS data. 
Venkatraman and coworkers (33) developed a systematic method to manually sequence oligosaccharides using sequential enzyme digestion from the target oligosaccharide, but the software reported lacks an advanced scoring system. Saad and Leary (34) refined this basic approach and introduced a program called heparin oligosaccharide sequencing tool (HOST) for automated sequencing using the results of tandem mass spectrometry for disaccharides produced by sequential enzyme digestion from the target oligosaccharide. The use of such a method is limited in application to structurally homogeneous samples and requires digestion with several heparin lyases. Later, Maxwell et al. (35) published an open-source program, GlycReSoft, for compositional annotation of multiple charged glycan ions from LC/MS data. Although this software aims to provide confident compositional analyses of oligosaccharides in complex data sets, it is limited to MS analysis to determine composition and not MS/MS data to provide oligosaccharide sequence.
Hu et al. (36) recently published HS-SEQ, the first comprehensive algorithm for HS de novo sequencing using high resolution negative electron transfer dissociation tandem mass spectra. Although the program successfully aims to optimize the sulfation patterns of GAG oligosaccharide without searching against a database, which does not exist because of the nontemplated nature of GAG biosynthesis, it is limited to analysis of a single compound instead of mixtures.
Heparin/HS oligosaccharides have linear structures composed of a finite and defined array of disaccharide sequences, analogous to peptide sequences. Therefore, we have pursued a strategy modeled on approaches currently used for analysis of peptide MS/MS data. 
However, the major difference is that peptide sequencing algorithms typically match the MS/MS spectra to a database of theoretical spectra derived from protein sequences computationally reconstructed from nucleic acid template sequencing, which does not exist for GAGs like heparin/HS. We have developed a theoretical sequence database, GAG-DB, which contains every possible derivatized sequence of heparin/HS for oligosaccharides up to dodecamer, and we use it as our database for spectral matching. We employed a multivariate hypergeometric distribution as the core scoring algorithm for matching experimental spectra to our comprehensive theoretical database. Our software, GAG-ID, scores oligosaccharide fragment ion matches against theoretical fragmentation patterns using a multivariate hypergeometric distribution scoring model in order to compute the probability of the match occurring by random chance for each pairing of candidate sequence and spectrum. Using theoretical HS sequence assignments to spectra generated from a defined mixture of 21 synthesized heparin/HS tetrasaccharides, the model is shown to produce probability-based scores that accurately identify the correct HS structure and discriminate correct from incorrect HS identifications.
This method for calculating the probability-based score that a synthesized oligosaccharide is present in the sample given the acquired mass spectrometric information is of great importance to glycosaminoglycan sequencing research. It is automated and does not fully rely on subjective “expert” judgment (manual validation). Furthermore, the searching time, which depends on the complexity of the sample and selected database size, is reasonable for the data set sizes typically encountered. Finally, our GAG-ID software, coupled with our previously published heparin/HS derivatization LC-MS/MS method, makes high-throughput sequencing of heparin/HS oligosaccharide mixtures possible.
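The intensity-stratified scoring described in abstract 1 can be illustrated with a plain multivariate hypergeometric calculation. The sketch below is not GAG-ID's modified model, whose details the abstract does not give. It assumes the m/z range is divided into bins, that each bin is either empty or holds a peak of one intensity class, and that the score is the negative log probability of the observed match counts arising by chance. The function name `mvh_score` and the bin counts are hypothetical.

```python
from math import comb, log

def mvh_score(class_sizes, matched):
    """Negative log of the multivariate hypergeometric point probability.

    class_sizes[i] is the number of m/z bins in class i (for example high,
    medium and low intensity peaks, plus one class for empty bins).
    matched[i] is how many predicted fragments of a candidate sequence
    fell into bins of class i.
    """
    if len(class_sizes) != len(matched):
        raise ValueError("one match count is needed per class")
    if any(k < 0 or k > size for size, k in zip(class_sizes, matched)):
        raise ValueError("match counts must lie between 0 and the class size")
    # log P = sum_i log C(K_i, k_i) - log C(N, n)
    log_p = sum(log(comb(size, k)) for size, k in zip(class_sizes, matched))
    log_p -= log(comb(sum(class_sizes), sum(matched)))
    return -log_p

# Hypothetical spectrum: 5 intense, 10 medium, 20 weak peaks, 965 empty bins.
bins = [5, 10, 20, 965]
# Two candidates, each with 8 predicted fragments and 7 matched peaks.
mostly_intense = [4, 2, 1, 1]
mostly_weak = [0, 1, 6, 1]
score_intense = mvh_score(bins, mostly_intense)
score_weak = mvh_score(bins, mostly_weak)
```

Both candidates match seven peaks, but the one that matches intense peaks receives the higher score because the intense class is small and is therefore harder to hit by chance. A scorer that only counts matched peaks would not separate the two.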

2.
We present MassSieve, a Java‐based platform for visualization and parsimony analysis of single and comparative LC‐MS/MS database search engine results. The success of mass spectrometric peptide sequence assignment algorithms has led to the need for a tool to merge and evaluate the increasing data set sizes that result from LC‐MS/MS‐based shotgun proteomic experiments. MassSieve supports reports from multiple search engines with differing search characteristics, which can increase peptide sequence coverage and/or identify conflicting or ambiguous spectral assignments.  相似文献   

3.
A major limitation in identifying peptides from complex mixtures by shotgun proteomics is the ability of search programs to accurately assign peptide sequences using mass spectrometric fragmentation spectra (MS/MS spectra). Manual analysis is used to assess borderline identifications; however, it is error-prone and time-consuming, and criteria for acceptance or rejection are not well defined. Here we report a Manual Analysis Emulator (MAE) program that evaluates results from search programs by implementing two commonly used criteria: 1) consistency of fragment ion intensities with predicted gas phase chemistry and 2) whether a high proportion of the ion intensity (proportion of ion current (PIC)) in the MS/MS spectra can be derived from the peptide sequence. To evaluate chemical plausibility, MAE utilizes similarity (Sim) scoring against theoretical spectra simulated by MassAnalyzer software (Zhang, Z. (2004) Prediction of low-energy collision-induced dissociation spectra of peptides. Anal. Chem. 76, 3908-3922) using known gas phase chemical mechanisms. The results show that Sim scores provide significantly greater discrimination between correct and incorrect search results than achieved by Sequest XCorr scoring or Mascot Mowse scoring, allowing reliable automated validation of borderline cases. To evaluate PIC, MAE simplifies the DTA text files summarizing the MS/MS spectra and applies heuristic rules to classify the fragment ions. MAE output also provides data mining functions, which are illustrated by using PIC to identify spectral chimeras, where two or more peptide ions were sequenced together, as well as cases where fragmentation chemistry is not well predicted.  相似文献   
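The proportion of ion current (PIC) criterion in abstract 3 can be sketched as the fraction of total peak intensity that lies within a mass tolerance of some fragment predicted from the peptide sequence. This is a simplified reading: MAE itself applies heuristic rules to classify fragment ions, which are not reproduced here. The function name, the peak representation and the 0.5 m/z tolerance are assumptions made for illustration.

```python
def proportion_of_ion_current(peaks, theoretical_mz, tol=0.5):
    """Fraction of summed intensity explained by predicted fragment m/z values.

    peaks is a list of (mz, intensity) pairs; theoretical_mz is a list of
    fragment m/z values predicted from the candidate sequence.
    """
    total = sum(intensity for _, intensity in peaks)
    if total == 0:
        return 0.0
    explained = sum(
        intensity
        for mz, intensity in peaks
        if any(abs(mz - t) <= tol for t in theoretical_mz)
    )
    return explained / total

# Hypothetical spectrum: two of three peaks are explained by the candidate.
example_peaks = [(100.0, 50.0), (200.1, 30.0), (350.0, 20.0)]
example_pic = proportion_of_ion_current(example_peaks, [100.2, 200.0])
```

A low value for an otherwise well-scoring match is the kind of signal the abstract associates with spectral chimeras, where a second co-fragmented peptide contributes unexplained intensity.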

4.
Database-searching programs generally identify only a fraction of the spectra acquired in a standard LC/MS/MS study of digested proteins. Subtle variations in database-searching algorithms for assigning peptides to MS/MS spectra have been known to provide different identification results. To leverage this variation, a probabilistic framework is developed for combining the results of multiple search engines. The scores for each search engine are first independently converted into peptide probabilities. These probabilities can then be readily combined across search engines using Bayesian rules and the expectation maximization learning algorithm. A significant gain in the number of peptides identified with high confidence with each additional search engine is demonstrated using several data sets of increasing complexity, from a control protein mixture to a human plasma sample, searched using SEQUEST, Mascot, and X! Tandem database-searching programs. The increased rate of peptide assignments also translates into a substantially larger number of protein identifications in LC/MS/MS studies compared to a typical analysis using a single database-search tool.  相似文献   
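The step in abstract 4 that combines per-engine peptide probabilities "using Bayesian rules" can be sketched under a strong simplifying assumption: that the engines are conditionally independent and share the same prior probability of a correct assignment. The expectation maximization learning described in the abstract is omitted. The function name and the default prior of 0.5 are hypothetical.

```python
def combine_probabilities(probabilities, prior=0.5):
    """Combine per-engine posterior probabilities for one peptide assignment.

    Assumes conditional independence between engines and a shared prior,
    so each engine contributes its odds relative to the prior odds.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    odds = prior_odds
    for p in probabilities:
        if not 0.0 < p < 1.0:
            raise ValueError("probabilities must lie strictly between 0 and 1")
        odds *= (p / (1.0 - p)) / prior_odds
    return odds / (1.0 + odds)

# Two engines that each report 0.8 reinforce one another.
agreeing = combine_probabilities([0.8, 0.8])
# Engines that disagree symmetrically cancel out.
conflicting = combine_probabilities([0.8, 0.2])
```

Under these assumptions two agreeing engines give a combined probability above either individual value, which is one way to read the reported gain in confident identifications with each additional search engine. Real search engines are correlated, so the independence assumption overstates the gain.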

5.
A key challenge to investigations into the functional roles of glycosaminoglycans (GAGs) in biological systems is the difficulty in achieving sensitive, stable, and reproducible mass spectrometric analysis. GAGs are linear carbohydrates with domains that vary in the extent of sulfation, acetylation, and uronic acid epimerization. It is of particular importance to determine spatial and temporal variations of GAG domain structures in biological tissues. In order to analyze GAGs from tissue, it is useful to couple MS with an on‐line separation system. The purposes of the separation system are both to remove components that inhibit GAG ionization and to enable the analysis of very complex mixtures. This contribution presents amide–silica hydrophilic interaction chromatography (HILIC) in a chip‐based format for LC/MS of heparin, heparan sulfate (HS) GAGs. The chip interface yields robust performance in the negative ion mode that is essential for GAGs and other acidic glycan classes while the built‐in trapping cartridge reduces background from the biological tissue matrix. The HILIC chromatographic separation is based on a combination of the glycan chain lengths and the numbers of hydrophobic acetate (Ac) groups and acidic sulfate groups. In summary, chip based amide‐HILIC LC/MS is an enabling technology for GAG glycomics profiling.  相似文献   

6.
Development of robust statistical methods for validation of peptide assignments to tandem mass (MS/MS) spectra obtained using database searching remains an important problem. PeptideProphet is one of the commonly used computational tools available for that purpose. An alternative simple approach for validation of peptide assignments is based on addition of decoy (reversed, randomized, or shuffled) sequences to the searched protein sequence database. The probabilistic modeling approach of PeptideProphet and the decoy strategy can be combined within a single semisupervised framework, leading to improved robustness and higher accuracy of computed probabilities even in the case of most challenging data sets. We present a semisupervised expectation-maximization (EM) algorithm for constructing a Bayes classifier for peptide identification using the probability mixture model, extending PeptideProphet to incorporate decoy peptide matches. Using several data sets of varying complexity, from control protein mixtures to a human plasma sample, and using three commonly used database search programs, SEQUEST, MASCOT, and TANDEM/k-score, we illustrate that more accurate mixture estimation leads to an improved control of the false discovery rate in the classification of peptide assignments.  相似文献   
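The decoy strategy mentioned in abstract 6 can be illustrated with the simplest decoy-count estimate of the false discovery rate. This is not the semisupervised EM mixture model the abstract presents. It assumes a decoy database of the same size as the target database, so that the number of decoy hits above a score threshold approximates the number of incorrect target hits. The function name and data layout are hypothetical.

```python
def decoy_fdr(matches, threshold):
    """Estimate FDR among target matches scoring at or above threshold.

    matches is a list of (score, is_decoy) pairs from a target-decoy search.
    """
    targets = sum(1 for score, is_decoy in matches if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in matches if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0
    return decoys / targets

# Hypothetical search results, sorted by score for readability.
example_matches = [
    (9.0, False), (8.0, False), (7.0, True),
    (6.0, False), (5.0, False), (4.0, True),
]
fdr_at_5 = decoy_fdr(example_matches, 5.0)
fdr_at_8 = decoy_fdr(example_matches, 8.0)
```

Raising the threshold trades identifications for a lower estimated error rate. The mixture-model approach in the abstract aims to give per-assignment probabilities rather than only this global estimate.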

7.

Background  

In proteomic analysis, MS/MS spectra acquired by a mass spectrometer are assigned to peptides by database searching algorithms such as SEQUEST. The assignments of peptides to MS/MS spectra by the SEQUEST searching algorithm are characterized by several scores, including Xcorr, ΔCn, Sp, Rsp, and matched ion count. Filtering criteria based on several of these scores are used to separate correct identifications from random assignments. However, these filtering criteria have not yet been well optimized.

8.
The effectiveness of database search algorithms, such as Mascot, Sequest and ProteinPilot is limited by the quality of the input spectra: spurious peaks in MS/MS spectra can jeopardize the correct identification of peptides or reduce their score significantly. Consequently, an efficient preprocessing of MS/MS spectra can increase the sensitivity of peptide identification at reduced file sizes and run time without compromising its specificity. We investigate the performance of 25 MS/MS preprocessing methods on various data sets and make software for improved preprocessing of mgf/dta‐files freely available from http://hci.iwr.uni‐heidelberg.de/mip/proteomics or http://www.childrenshospital.org/research/steenlab .  相似文献   

9.
We report an isotope labeling shotgun proteome analysis strategy to validate the spectrum-to-sequence assignments generated by using sequence-database searching for the construction of a more reliable MS/MS spectral library. This strategy is demonstrated in the analysis of the E. coli K12 proteome. In the workflow, E. coli cells were cultured in normal and (15)N-enriched media. The differentially labeled proteins from the cell extracts were subjected to trypsin digestion and two-dimensional liquid chromatography quadrupole time-of-flight tandem mass spectrometry (2D-LC QTOF MS/MS) analysis. The MS/MS spectra of the two samples were individually searched using Mascot against the E. coli proteome database to generate lists of peptide sequence matches. The two data sets were compared by overlaying the spectra of unlabeled and labeled matches of the same peptide sequence for validation. Two cutoff filters, one based on the number of common fragment ions and another one on the similarity of intensity patterns among the common ions, were developed and applied to the overlaid spectral pairs to reject the low quality or incorrectly assigned spectra. By examining 257,907 and 245,156 spectra acquired from the unlabeled and (15)N-labeled samples, respectively, an experimentally validated MS/MS spectral library of tryptic peptides was constructed for E. coli K12 that consisted of 9,302 unique spectra with unique sequence and charge state, representing 7,763 unique peptide sequences. This E. coli spectral library could be readily expanded, and the overall strategy should be applicable to other organisms. Even with this relatively small library, it was shown that more peptides could be identified with higher confidence using the spectral search method than by sequence-database searching.  相似文献   
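The two cutoff filters in abstract 9 (number of common fragment ions, and similarity of intensity patterns among the common ions) can be sketched as follows. The abstract does not name the similarity measure or the thresholds, so cosine similarity and the cutoff values below are assumptions. Spectra are keyed by fragment ion label rather than m/z, because the labeled and unlabeled fragments of the same ion differ in mass.

```python
from math import sqrt

def intensity_similarity(spec_a, spec_b):
    """Cosine similarity of intensities over the ions present in both spectra.

    Each spectrum is a dict mapping a fragment ion label (e.g. "y3")
    to its intensity.
    """
    shared = set(spec_a) & set(spec_b)
    dot = sum(spec_a[ion] * spec_b[ion] for ion in shared)
    norm_a = sqrt(sum(spec_a[ion] ** 2 for ion in shared))
    norm_b = sqrt(sum(spec_b[ion] ** 2 for ion in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def passes_filters(spec_a, spec_b, min_common=4, min_similarity=0.8):
    """Apply both cutoffs; the default thresholds are illustrative only."""
    common = len(set(spec_a) & set(spec_b))
    return common >= min_common and intensity_similarity(spec_a, spec_b) >= min_similarity

# Hypothetical unlabeled and labeled spectra of the same peptide.
unlabeled = {"b2": 10.0, "y1": 20.0, "y2": 30.0, "y3": 40.0}
labeled = {"b2": 20.0, "y1": 40.0, "y2": 60.0, "y3": 80.0, "y4": 5.0}
# A spectrum sharing the same ions but with a very different pattern.
mismatched = {"b2": 40.0, "y1": 1.0, "y2": 1.0, "y3": 1.0}
```

Here `unlabeled` and `labeled` share four ions with proportional intensities and pass both filters, while `mismatched` shares the same ions but fails the similarity cutoff.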

10.
A novel software tool named PTM-Explorer has been applied to LC-MS/MS datasets acquired within the Human Proteome Organisation (HUPO) Brain Proteome Project (BPP). PTM-Explorer enables automatic identification of peptide MS/MS spectra that were not explained in typical sequence database searches. The main focus was detection of PTMs, but PTM-Explorer also detects unspecific peptide cleavage, mass measurement errors, experimental modifications, amino acid substitutions, transpeptidation products and unknown mass shifts. To avoid a combinatorial problem, the search is restricted to a set of selected protein sequences, which stem from previous protein identifications using a common sequence database search. Prior to application to the HUPO BPP data, PTM-Explorer was evaluated on thoroughly manually characterized LC-MS/MS data sets from Alpha-A-Crystallin gel spots obtained from mouse eye lens. Besides various PTMs including phosphorylation, a wealth of experimental modifications and unspecific cleavage products were successfully detected, completing the primary structure information of the measured proteins. Our results indicate that a large number of MS/MS spectra that currently remain unidentified in standard database searches contain valuable information that can only be elucidated using suitable software tools.

11.
Proteome identification using peptide-centric proteomics techniques is a routinely used analysis technique. One of the most powerful and popular methods for the identification of peptides from MS/MS spectra is protein database matching using search engines. Significance thresholding through false discovery rate (FDR) estimation by target/decoy searches is used to ensure the retention of predominantly confident assignments of MS/MS spectra to peptides. However, shortcomings have become apparent when such decoy searches are used to estimate the FDR. To study these shortcomings, we here introduce a novel kind of decoy database that contains isobaric mutated versions of the peptides that were identified in the original search. Because of the supervised way in which the entrapment sequences are generated, we call this a directed decoy database. Since the peptides found in our directed decoy database are thus specifically designed to look quite similar to the forward identifications, the limitations of the existing search algorithms in making correct calls in such strongly confusing situations can be analyzed. Interestingly, for the vast majority of confidently identified peptide identifications, a directed decoy peptide-to-spectrum match can be found that has a better or equal match score than the forward match score, highlighting an important issue in the interpretation of peptide identifications in present-day high-throughput proteomics.  相似文献   

12.
Glycoproteins fulfill many indispensable biological functions, and changes in protein glycosylation have been observed in various diseases. Improved analytical methods are needed to allow a complete characterization of this complex and common post-translational modification. In this study, we present a workflow for the analysis of the microheterogeneity of N-glycoproteins that couples hydrophilic interaction and nanoreverse-phase C18 chromatography to tandem QTOF mass spectrometric analysis. A glycan database search program, GlycoPeptideSearch, was developed to match N-glycopeptide MS/MS spectra with the glycopeptides comprised of a glycan drawn from the GlycomeDB glycan structure database and a peptide from a user-specified set of potentially glycosylated peptides. Application of the workflow to human haptoglobin and hemopexin, two microheterogeneous N-glycoproteins, identified a total of 57 distinct site-specific glycoforms in the case of haptoglobin and 14 site-specific glycoforms of hemopexin. Using glycan oxonium ions and peptide-characteristic glycopeptide fragment ions and by collapsing topologically redundant glycans, the search software was able to make unique N-glycopeptide assignments for 51% of assigned spectra, with the remaining assignments primarily representing isobaric topological rearrangements. The optimized workflow, coupled with GlycoPeptideSearch, is expected to make high-throughput semiautomated glycopeptide identification feasible for a wide range of users.  相似文献   

13.
Querying MS/MS spectra against a database containing only proteotypic peptides reduces data analysis time due to reduction of database size. Despite the speed advantage, this search strategy is challenged by issues of statistical significance and coverage. The former requires separating systematically significant identifications from less confident identifications, while the latter arises when the underlying peptide is not present, due to single amino acid polymorphisms (SAPs) or post-translational modifications (PTMs), in the proteotypic peptide libraries searched. To address both issues simultaneously, we have extended RAId's knowledge database to include proteotypic information, utilized RAId's statistical strategy to assign statistical significance to proteotypic peptides, and modified RAId's programs to allow for consideration of proteotypic information during database searches. The extended database alleviates the coverage problem since all annotated modifications, even those that occurred within proteotypic peptides, may be considered. Taking into account the likelihoods of observation, the statistical strategy of RAId provides accurate E-value assignments regardless whether a candidate peptide is proteotypic or not. The advantage of including proteotypic information is evidenced by its superior retrieval performance when compared to regular database searches.  相似文献   

14.
Reliable statistical validation of peptide and protein identifications is a top priority in large-scale mass spectrometry based proteomics. PeptideProphet is one of the computational tools commonly used for assessing the statistical confidence in peptide assignments to tandem mass spectra obtained using database search programs such as SEQUEST, MASCOT, or X! TANDEM. We present two flexible methods, the variable component mixture model and the semiparametric mixture model, that remove the restrictive parametric assumptions in the mixture modeling approach of PeptideProphet. Using a control protein mixture data set generated on a linear ion trap Fourier transform (LTQ-FT) mass spectrometer, we demonstrate that both methods improve parametric models in terms of the accuracy of probability estimates and the power to detect correct identifications controlling the false discovery rate to the same degree. The statistical approaches presented here require that the data set contain a sufficient number of decoy (known to be incorrect) peptide identifications, which can be obtained using the target-decoy database search strategy.

15.
Alves G, Ogurtsov AY, Yu YK. PLoS One 2010, 5(11): e15438
Statistically meaningful comparison/combination of peptide identification results from various search methods is impeded by the lack of a universal statistical standard. Providing an E-value calibration protocol, we demonstrated earlier the feasibility of translating either the score or heuristic E-value reported by any method into the textbook-defined E-value, which may serve as the universal statistical standard. This protocol, although robust, may lose spectrum-specific statistics and might require a new calibration when changes in experimental setup occur. To mitigate these issues, we developed a new MS/MS search tool, RAId_aPS, that is able to provide spectrum-specific E-values for additive scoring functions. Given a selection of scoring functions out of RAId score, K-score, Hyperscore and XCorr, RAId_aPS generates the corresponding score histograms of all possible peptides using dynamic programming. Using these score histograms to assign E-values enables a calibration-free protocol for accurate significance assignment for each scoring function. RAId_aPS features four different modes: (i) compute the total number of possible peptides for a given molecular mass range, (ii) generate the score histogram given a MS/MS spectrum and a scoring function, (iii) reassign E-values for a list of candidate peptides given a MS/MS spectrum and the scoring functions chosen, and (iv) perform database searches using selected scoring functions. In modes (iii) and (iv), RAId_aPS is also capable of combining results from different scoring functions using spectrum-specific statistics. The web link is http://www.ncbi.nlm.nih.gov/CBBresearch/Yu/raid_aps/index.html. Relevant binaries for Linux, Windows, and Mac OS X are available from the same page.
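Mode (i) in abstract 15, counting all possible peptides in a mass range, is a natural fit for dynamic programming. The sketch below is a toy version and not RAId_aPS itself: it uses nominal integer residue masses, ignores water and terminal groups, and restricts the alphabet to three residues (glycine 57, alanine 71, serine 87) to keep the example checkable by hand. All names are hypothetical.

```python
def count_peptides_by_mass(residue_masses, max_mass):
    """counts[m] is the number of residue sequences whose masses sum to m.

    Sequences are ordered, so "GA" and "AG" are counted separately.
    """
    counts = [0] * (max_mass + 1)
    counts[0] = 1  # the empty sequence
    for mass in range(1, max_mass + 1):
        # A sequence of total mass `mass` ends in some residue r and is
        # preceded by any sequence of mass `mass - r`.
        counts[mass] = sum(
            counts[mass - r] for r in residue_masses if r <= mass
        )
    return counts

def count_in_range(residue_masses, low, high):
    """Total number of sequences with mass in [low, high], inclusive."""
    counts = count_peptides_by_mass(residue_masses, high)
    return sum(counts[low:high + 1])

toy_alphabet = [57, 71, 87]  # nominal masses of G, A, S
toy_total = count_in_range(toy_alphabet, 114, 144)
```

For this toy alphabet the range 114 to 144 contains GG, GA, AG, AA, GS and SG. The same recurrence extends from counting sequences to accumulating a histogram of additive scores, since each residue appended contributes an independent score increment.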

16.
We demonstrate an approach for global quantitative analysis of protein mixtures using differential stable isotopic labeling of the enzyme-digested peptides combined with microbore liquid chromatography (LC) matrix-assisted laser desorption ionization (MALDI) mass spectrometry (MS). Microbore LC provides higher sample loading, compared to capillary LC, which facilitates the quantification of low abundance proteins in protein mixtures. In this work, microbore LC is combined with MALDI MS via a heated droplet interface. The compatibilities of two global peptide labeling methods (i.e., esterification to carboxylic groups and dimethylation to amine groups of peptides) with this LC-MALDI technique are evaluated. Using a quadrupole-time-of-flight mass spectrometer, MALDI spectra of the peptides in individual sample spots are obtained to determine the abundance ratio among pairs of differential isotopically labeled peptides. MS/MS spectra are subsequently obtained from the peptide pairs showing significant abundance differences to determine the sequences of selected peptides for protein identification. The peptide sequences determined from MS/MS database search are confirmed by using the overlaid fragment ion spectra generated from a pair of differentially labeled peptides. The effectiveness of this microbore LC-MALDI approach is demonstrated in the quantification and identification of peptides from a mixture of standard proteins as well as E. coli whole cell extract of known relative concentrations. It is shown that this approach provides a facile and economical means of comparing relative protein abundances from two proteome samples.  相似文献   

17.
The Virtual Expert Mass Spectrometrist (VEMS) program package was developed for flexible, automated, and manual de novo tandem mass spectrometry (MS/MS) protein sequencing, and includes accessory programs for matrix-assisted laser desorption/ionization-mass spectrometry (MS) interpretation, and generation of protein and peptide databases. VEMS V2.0 has been developed into a fast tool for combining database-independent and -dependent protein assignments in an extended analysis of MS/MS-peptide data. MS or MS/MS data can be directly recalibrated after the first search by fitting the data to the best search result using polynomial equations. The score function is an improvement of known scoring algorithms and can be adapted for any MS instrument type. In addition, VEMS offers a novel statistical model for evaluating the significance of the protein assignment. The novel features are illustrated by the analysis of the fragmentation spectra obtained by liquid chromatography-MS/MS analysis of peptides from an anionic peroxidase enriched protein fraction from potato root tissue. The extended analysis mode resulted in the additional assignment of spectra for nine modified tryptic peptides and nine miscleaved peptides, in addition to the 45 spectra from regular tryptic peptides. Of the nine modified peptides, three were glycosylated.

18.
Ahrné E, Ohta Y, Nikitin F, Scherl A, Lisacek F, Müller M. Proteomics 2011, 11(20): 4085-4095
The relevance of libraries of annotated MS/MS spectra is growing with the amount of proteomic data generated in high-throughput experiments. These reference libraries provide a fast and accurate way to identify newly acquired MS/MS spectra. In the context of multiple hypotheses testing, the number of false-positive identifications expected in the final result list is controlled by calculating the false discovery rate (FDR). In a classical sequence search where experimental MS/MS spectra are compared with the theoretical peptide spectra calculated from a sequence database, the FDR is estimated by searching randomized or decoy sequence databases. Despite on-going discussion on how exactly the FDR has to be calculated, this method is widely accepted in the proteomic community. Recently, similar approaches to control the FDR of spectrum library searches were discussed. We present in this paper a detailed analysis of the similarity between spectra of distinct peptides to set the basis of our own solution for decoy library creation (DeLiberator). It differs from the previously published results in some key points, mainly in implementing new methods that prevent decoy spectra from being too similar to the original library spectra while keeping important features of real MS/MS spectra. Using different proteomic data sets and library creation methods, we evaluate our approach and compare it with alternative methods.

19.
Improvements in ion trap instrumentation have made n-dimensional mass spectrometry more practical. The overall goal of the study was to describe a model for making use of MS(2) and MS(3) information in mass spectrometry experiments. We present a statistical model for adjusting peptide identification probabilities based on the combined information obtained by coupling peptide assignments of consecutive MS(2) and MS(3) spectra. Using two data sets, a mixture of known proteins and a complex phosphopeptide-enriched sample, we demonstrate an increase in discriminating power of the adjusted probabilities compared with models using MS(2) or MS(3) data only. This work also addresses the overall value of generating MS(3) data as compared with an MS(2)-only approach with a focus on the analysis of phosphopeptide data.  相似文献   

20.
For bottom-up proteomics, there is a wide variety of database-searching algorithms in use for matching peptide sequences to tandem MS spectra. Likewise, there are numerous strategies being employed to produce a confident list of peptide identifications from the different search algorithm outputs. Here we introduce a grid-search approach for determining optimal database filtering criteria in shotgun proteomics data analyses that is easily adaptable to any search. Systematic Trial and Error Parameter Selection, referred to as STEPS, utilizes user-defined parameter ranges to test a wide array of parameter combinations to arrive at an optimal “parameter set” for data filtering, thus maximizing confident identifications. The benefits of this approach in terms of numbers of true-positive identifications are demonstrated using datasets derived from immunoaffinity-depleted blood serum and a bacterial cell lysate, two common proteomics sample types.
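The grid-search idea in abstract 20 can be sketched as an exhaustive loop over user-defined filter ranges. The abstract does not state which parameters STEPS varies or how it defines "optimal", so the two filters below (a minimum score and a maximum absolute mass error in ppm) and the objective (most target identifications under a decoy-estimated FDR cap) are assumptions for illustration. All names are hypothetical.

```python
from itertools import product

def grid_search(matches, score_cutoffs, ppm_cutoffs, max_fdr=0.05):
    """Return (target_count, min_score, max_ppm) for the best filter pair.

    matches is a list of dicts with keys "score", "ppm" and "decoy".
    Returns None when no combination satisfies the FDR cap.
    """
    best = None
    for min_score, max_ppm in product(score_cutoffs, ppm_cutoffs):
        kept = [
            m for m in matches
            if m["score"] >= min_score and abs(m["ppm"]) <= max_ppm
        ]
        targets = sum(1 for m in kept if not m["decoy"])
        decoys = len(kept) - targets
        if targets == 0 or decoys / targets > max_fdr:
            continue  # this combination is too permissive or keeps nothing
        if best is None or targets > best[0]:
            best = (targets, min_score, max_ppm)
    return best

# Hypothetical search output with two decoy hits.
example_search = [
    {"score": 50, "ppm": 1.0, "decoy": False},
    {"score": 40, "ppm": 2.0, "decoy": False},
    {"score": 30, "ppm": 8.0, "decoy": False},
    {"score": 25, "ppm": 9.0, "decoy": True},
    {"score": 20, "ppm": 3.0, "decoy": False},
    {"score": 20, "ppm": 2.0, "decoy": True},
]
best_filters = grid_search(example_search, [20, 30, 40], [5.0, 10.0])
```

In this toy data set the loosest score cutoff admits decoys, while the tightest mass tolerance discards a good target hit. The search settles on a score cutoff of 30 with a 10 ppm tolerance, which keeps three targets and no decoys.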
