20 similar documents found; search time: 15 ms
1.
The probability-based search engine MASCOT has been widely used to identify peptides and proteins in shotgun proteomics research. Most subsequent quality-control methods filter out ambiguous assignments according to the ion score and thresholds provided by MASCOT. Using the target-decoy database search strategy, we evaluated the performance of several filter methods on MASCOT search results and demonstrated that placing filter boundaries in a two-dimensional feature space, formed by the MASCOT ion score and its relative score, can improve the sensitivity of the filtering process. Furthermore, using a linear combination of several characteristics of the assigned peptides, including the MASCOT scores, 15 previously employed features, and some newly introduced features, we applied a Bayesian nonparametric model to MASCOT search results and validated more correctly identified peptides in control and complex data sets than could be validated by empirical score thresholds.
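A two-score filter tuned against a target-decoy FDR estimate can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method. The "relative score" is treated as an arbitrary second numeric feature of each peptide-spectrum match (PSM). The boundary is restricted to axis-aligned cut-offs searched on a grid. FDR is estimated as accepted decoys divided by accepted targets, which assumes a concatenated target-decoy search.

```python
from itertools import product


def decoy_fdr(psms, ion_cut, rel_cut):
    """Count accepted targets and estimate FDR for one rectangular filter.

    psms: list of (ion_score, relative_score, is_decoy) tuples.
    A PSM is accepted when both scores reach their cut-offs.
    FDR is estimated as accepted decoys / accepted targets.
    """
    targets = decoys = 0
    for ion, rel, is_decoy in psms:
        if ion >= ion_cut and rel >= rel_cut:
            if is_decoy:
                decoys += 1
            else:
                targets += 1
    fdr = decoys / targets if targets else 0.0
    return targets, fdr


def best_2d_filter(psms, ion_grid, rel_grid, max_fdr=0.01):
    """Grid-search the cut-off pair that keeps the most targets at FDR <= max_fdr.

    Returns (targets, ion_cut, rel_cut), or None if no pair qualifies.
    """
    best = None
    for ion_cut, rel_cut in product(ion_grid, rel_grid):
        targets, fdr = decoy_fdr(psms, ion_cut, rel_cut)
        if fdr <= max_fdr and (best is None or targets > best[0]):
            best = (targets, ion_cut, rel_cut)
    return best
```

Curved or oblique boundaries would replace the two `>=` comparisons with a different acceptance rule. The grid search itself would stay the same.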
2.
Improving LC-MS sensitivity through increases in chromatographic performance: comparisons of UPLC-ES/MS/MS to HPLC-ES/MS/MS (Total citations: 1; self-citations: 0; citations by others: 1)
Churchwell MI, Twaddle NC, Meeker LR, Doerge DR. Journal of chromatography. B, Analytical technologies in the biomedical and life sciences, 2005, 825(2):134-143
Recent technological advances have made available reversed-phase chromatographic media with a 1.7 µm particle size, along with a liquid-handling system that can operate such columns at much higher pressures. This technology, termed ultra performance liquid chromatography (UPLC), offers significant theoretical advantages in resolution, speed, and sensitivity for analytical determinations, particularly when coupled with mass spectrometers capable of high-speed acquisitions. This paper explores the differences in LC-MS performance through a side-by-side comparison of UPLC and HPLC for several methods previously optimized for HPLC-based separation and quantification of multiple analytes with maximum throughput. In general, UPLC produced significant improvements in method sensitivity, speed, and resolution. Sensitivity gains with UPLC were analyte-dependent and reached up to 10-fold, and gains in method speed reached up to 5-fold under conditions of comparable peak separation. Improvements in chromatographic resolution with UPLC were apparent from generally narrower peak widths and from a separation of diastereomers not possible using HPLC. Overall, the improvements in LC-MS method sensitivity, speed, and resolution provided by UPLC show that further advances can be made in analytical methodology to add significant value to hypothesis-driven research.
3.
Manual checking is commonly employed to validate phosphopeptide identifications from database searching of tandem mass spectra. This is very time-consuming and labor-intensive, especially as the number of phosphopeptide identifications grows. In this study, a simple automatic validation approach was developed for phosphopeptide identification by combining consecutive-stage mass spectrometry data with the target-decoy database searching strategy. Only phosphopeptides identified from both an MS2 spectrum and its corresponding MS3 spectrum were accepted for further filtering, which greatly improved the reliability of phosphopeptide identification. Before database searching, the spectra were validated for charge state and neutral-loss peak intensity, and invalid MS2/MS3 spectra were removed, which greatly reduced the database searching time. Sensitivity was significantly improved with the MS2/MS3 strategy: the number of identified phosphopeptides was 2.5 times that obtained by the conventional filter-based MS2 approach. Because a target-decoy database was used, the false discovery rate (FDR) of the identified phosphopeptides could be easily determined, and the determined FDR was shown to reflect the actual FDR precisely without any manual validation stage.
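The MS2/MS3 consensus rule and the decoy-based FDR estimate can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' pipeline. Each MS3 scan is assumed to be linked to its parent MS2 scan by the MS2 scan number, a hypothetical key. The two identifications are compared by base (unmodified) sequence, because the MS3 spectrum follows a neutral loss. FDR is estimated as the decoy/target ratio among accepted hits.

```python
def consensus_ms2_ms3(ms2_ids, ms3_ids):
    """Keep identifications supported by both an MS2 spectrum and its MS3 spectrum.

    Both arguments map the MS2 scan number (hypothetical key linking each
    MS3 scan to its parent MS2 scan) to a (base_sequence, is_decoy) tuple.
    """
    accepted = {}
    for scan, (sequence, is_decoy) in ms2_ids.items():
        ms3_hit = ms3_ids.get(scan)
        # Require the MS3 identification to name the same base sequence.
        if ms3_hit is not None and ms3_hit[0] == sequence:
            accepted[scan] = (sequence, is_decoy)
    return accepted


def estimated_fdr(accepted):
    """Decoy/target ratio among accepted identifications."""
    decoys = sum(1 for _, is_decoy in accepted.values() if is_decoy)
    targets = len(accepted) - decoys
    return decoys / targets if targets else 0.0
```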
4.
Robust statistical validation of peptide identifications obtained by tandem mass spectrometry and sequence database searching is an important task in shotgun proteomics. PeptideProphet is a commonly used computational tool that computes confidence measures for peptide identifications. In this paper, we investigate several limitations of the PeptideProphet modeling approach, including the use of fixed coefficients in computing the discriminant search score and selection of the top scoring peptide assignment per spectrum only. To address these limitations, we describe an adaptive method in which a new discriminant function is learned from the data in an iterative fashion. We extend the modeling framework to go beyond the top scoring peptide assignment per spectrum. We also investigate the effect of clustering the spectra according to their spectrum quality score followed by cluster-specific mixture modeling. The analysis is carried out using data acquired from a mixture of purified proteins on four different types of mass spectrometers, as well as using a complex human serum data set. A special emphasis is placed on the analysis of data generated on high mass accuracy instruments.
5.
A new database search algorithm has been developed to identify disulfide-linked peptides in tandem MS data sets. The algorithm is included in the newly developed tandem MS database search program MassMatrix and exploits its probabilistic scoring model to identify disulfide bonds in proteins and peptides. Proteins and peptides with disulfide bonds can be identified with high confidence without chemical reduction or other derivatization. The approach was tested on peptide and protein standards with known disulfide bonds, and all disulfide bonds in the standard set were identified by MassMatrix. The algorithm was further tested on bovine pancreatic ribonuclease A (RNaseA). The 4 native disulfide bonds in RNaseA were detected by MassMatrix, with multiple validated peptide matches and high statistical scores for each disulfide bond. Fifteen nonnative disulfide bonds were also observed in the protein digest under basic conditions (pH = 8.0) owing to disulfide bond interchange. After disulfide bond interchange was minimized during digestion (pH = 6.0), only one nonnative disulfide bond was observed. The MassMatrix algorithm offers an additional approach for discovering disulfide bonds from tandem mass spectrometry data.
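The mass bookkeeping behind a search for disulfide-linked peptide pairs can be sketched as follows. This is not MassMatrix's algorithm or scoring model. It only shows the precursor-mass filter such a search needs. Forming one S-S bond between two cysteine thiols removes two hydrogen atoms, so the linked pair weighs M1 + M2 - 2 × 1.007825 Da. The peptide names, masses, and the 10 ppm tolerance are illustrative assumptions.

```python
from itertools import combinations_with_replacement

H_ATOM = 1.00782503      # monoisotopic mass of a hydrogen atom (Da)
PROTON = 1.00727646688   # mass of a proton (Da)


def disulfide_linked_mass(m1, m2):
    """Neutral monoisotopic mass of two peptides joined by one disulfide bond."""
    # Oxidizing two cysteine thiols (-SH) to a disulfide (-S-S-) removes two H atoms.
    return m1 + m2 - 2 * H_ATOM


def precursor_mz(neutral_mass, charge):
    """m/z of the [M + zH]^z+ ion."""
    return (neutral_mass + charge * PROTON) / charge


def candidate_pairs(peptides, precursor_mass, tol_ppm=10.0):
    """List peptide pairs whose disulfide-linked mass matches the precursor.

    peptides: dict of name -> neutral monoisotopic mass of a Cys-containing peptide.
    Self-pairs are included because identical chains can also be linked.
    """
    hits = []
    for a, b in combinations_with_replacement(sorted(peptides), 2):
        linked = disulfide_linked_mass(peptides[a], peptides[b])
        if abs(linked - precursor_mass) / precursor_mass * 1e6 <= tol_ppm:
            hits.append((a, b))
    return hits
```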
6.
In this paper, we improve homology search performance by combining predicted protein secondary structures with protein sequences. Previous research suggested that the straightforward combination of predicted secondary structures did not improve homology search performance, mostly because of errors in the structure prediction. We solved this problem by taking into account the confidence scores output by the prediction programs.
7.
An evaluation, comparison, and accurate benchmarking of several publicly available MS/MS search algorithms: sensitivity and specificity analysis (Total citations: 1; self-citations: 0; citations by others: 1)
Kapp EA, Schütz F, Connolly LM, Chakel JA, Meza JE, Miller CA, Fenyo D, Eng JK, Adkins JN, Omenn GS, Simpson RJ. Proteomics, 2005, 5(13):3475-3490
MS/MS and associated database search algorithms are essential proteomic tools for identifying peptides. Given their widespread use, it is now time to perform a systematic analysis of the various algorithms currently in use. Using blood specimens from the HUPO Plasma Proteome Project, we evaluated five search algorithms with respect to their sensitivity and specificity and also accurately benchmarked them based on specified false-positive (FP) rates. Spectrum Mill and SEQUEST performed well in terms of sensitivity but were inferior to MASCOT, X!Tandem, and Sonar in terms of specificity. Overall, MASCOT, a probabilistic search algorithm, correctly identified the most peptides at a specified FP rate. The rescoring algorithm PeptideProphet enhanced the overall performance of the SEQUEST algorithm and provided predictable FP error rates. Ideally, score thresholds should be calculated for each peptide spectrum or, minimally, derived from a reversed-sequence search, as demonstrated in this study on a validated data set. The availability of open-source search algorithms, such as X!Tandem, makes it feasible to further improve the validation process (manual or automatic) on the basis of "consensus scoring", i.e., the use of multiple (at least two) search algorithms to reduce the number of FPs.
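Deriving a score threshold from a reversed-sequence search, as recommended above, can be sketched like this. The sketch assumes that forward and reversed databases were searched separately and are the same size. Under that assumption, the number of reversed hits above a cut-off estimates the number of false positives among forward hits above it. The FP rate is then estimated as their ratio, and the scores and target rate used here are illustrative.

```python
def threshold_at_fp_rate(target_scores, decoy_scores, max_fp_rate=0.05):
    """Return the lowest score cut-off whose estimated FP rate is <= max_fp_rate.

    Estimated FP rate at cut-off t = (#reversed hits >= t) / (#forward hits >= t).
    Candidate cut-offs are the observed forward scores, so the denominator is >= 1.
    Returns None if no cut-off qualifies.
    """
    for t in sorted(set(target_scores)):
        n_target = sum(s >= t for s in target_scores)
        n_decoy = sum(s >= t for s in decoy_scores)
        if n_decoy / n_target <= max_fp_rate:
            return t
    return None
```

Because the estimated rate need not fall monotonically with the cut-off, the function returns the first (lowest) qualifying cut-off rather than assuming all higher cut-offs also qualify.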
8.
Confident peptide identification is one of the most important components of mass-spectrometry-based proteomics. We propose a method to properly combine the results from different database search methods to enhance the accuracy of peptide identifications. The database search methods included in our analysis are SEQUEST (v27 rev12), ProbID (v1.0), InsPecT (v20060505), Mascot (v2.1), X! Tandem (v2007.07.01.2), OMSSA (v2.0) and RAId_DbS. Using two data sets, one collected in profile mode and one collected in centroid mode, we tested the search performance of all 21 combinations of two search methods as well as all 35 possible combinations of three search methods. The results of our study suggest that properly combining search methods does improve retrieval accuracy. In addition to performance results, we also describe the theoretical framework which in principle allows one to combine many independent scoring methods, including de novo sequencing and spectral library searches. The correlations among different methods are also investigated in terms of common true positives, common false positives, and a global analysis. We find that the average correlation strength between any pairwise combination of the seven methods studied is usually smaller than the associated standard error. This indicates that only weak correlation may be present among different methods and validates our approach of combining the search results. The usefulness of our approach is further confirmed by showing that the average cumulative number of false positive peptides agrees reasonably well with the combined E-value. The data related to this study are freely available upon request.
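One standard way to merge per-method significance values, assuming the methods are independent (which the weak correlations reported above support only approximately), is to multiply the p-values. The question is then how likely a product that small would be under the null. For n independent Uniform(0,1) p-values with product τ, that probability is τ · Σ_{k=0}^{n−1} (−ln τ)^k / k!, which is equivalent to Fisher's method. This sketch implements that textbook formula and is not necessarily the exact combination rule used in the study.

```python
import math


def combined_p_value(p_values):
    """P-value of the product of independent Uniform(0,1) p-values (Fisher's method).

    Returns tau * sum_{k=0}^{n-1} (-ln tau)^k / k!, where tau is the product.
    """
    n = len(p_values)
    tau = math.prod(p_values)
    if tau <= 0.0:
        return 0.0
    log_term = -math.log(tau)
    return tau * sum(log_term**k / math.factorial(k) for k in range(n))
```

A single method's p-value passes through unchanged (n = 1 returns τ). Two p-values of 0.1 combine to about 0.056 rather than the naive product 0.01, which corrects for having looked at two methods.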
9.
10.
Yu LR, Zhu Z, Chan KC, Issaq HJ, Dimitrov DS, Veenstra TD. Journal of proteome research, 2007, 6(11):4150-4162
Enrichment is essential for phosphoproteome analysis because phosphorylated proteins are usually present in cells in low abundance. Recently, titanium dioxide (TiO2) has been shown to enrich phosphopeptides from simple peptide mixtures with high specificity; however, the technology has not been optimized. In the present study, significant nonspecific binding was observed when proteome samples were applied to TiO2 columns. Washing the column with an NH4Glu solution after loading peptide mixtures significantly increased the efficiency of TiO2 phosphopeptide enrichment, with a recovery of up to 84%. Also, for proteome samples, a more than 2-fold increase in unique phosphopeptide identifications was achieved. The use of NH4Glu for the TiO2 column wash does not significantly reduce phosphopeptide recovery. A total of 858 phosphopeptides corresponding to 1034 distinct phosphosites were identified from HeLa cells using the improved TiO2 enrichment procedure in combination with data-dependent neutral-loss nano-RPLC-MS2-MS3 analysis. While 41% and 35% of the phosphopeptides were identified only by MS2 and only by MS3, respectively, 24% were identified by both MS2 and MS3. Cross-validation of the phosphopeptide assignments by MS2 and MS3 scans resulted in the highest confidence in identification (99.5%). Many phosphosites identified in this study appear to be novel, including sites from antigen Ki-67, nucleolar phosphoprotein p130, and Treacle protein. The study also indicates that evaluating confidence levels for phosphopeptide identification via the reversed-sequence database searching strategy might underestimate the false-positive rate.
11.
Higdon R, Kolker N, Picone A, van Belle G, Kolker E. Omics: a journal of integrative biology, 2004, 8(4):357-369
This study addresses the issue of peptide identification resulting from tandem mass spectrometry proteomics analysis followed by database search. This work shows that the Logistic Identification of Peptides (LIP) Index achieves high sensitivity and specificity for peptide classification relative to a manually verified "gold" standard and also accurately estimates the probability of a correct peptide match. The LIP Index is a weighted average of SEQUEST output variables based on logistic regression models and is a transparent, easy-to-use, inclusive, extendable, and statistically sound approach to classifying correct peptide identifications. Modifications, such as normalizing cross-correlation (Xcorr) for peptide length and adjusting for charge state and the number of tryptic termini, significantly improve the fit of the logistic regression models, as well as increase sensitivity and specificity. The LIP Index also incorporates previously developed statistical models for spectral quality assessment and peptide identification, which further improve sensitivity and specificity.
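The LIP Index is described as a logistic-regression weighting of SEQUEST output variables, and its general form can be sketched as below. The feature names and every coefficient are hypothetical placeholders, because the fitted values are not given here. In practice they would be estimated from a manually verified training set, and charge-state adjustments could enter as extra indicator features.

```python
import math

# Hypothetical coefficients for illustration only; real values would be fitted
# by logistic regression on a manually verified ("gold standard") data set.
EXAMPLE_COEFFS = {
    "intercept": -6.0,
    "xcorr_norm": 8.0,   # Xcorr after a (hypothetical) peptide-length normalization
    "delta_cn": 10.0,    # SEQUEST deltaCn
    "ntt": 1.0,          # number of tryptic termini (0, 1 or 2)
}


def lip_probability(features, coeffs=EXAMPLE_COEFFS):
    """Probability of a correct peptide match from a logistic model.

    features: dict of feature name -> value; every name must have a coefficient.
    """
    z = coeffs["intercept"] + sum(coeffs[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))
```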
12.
Mass spectrometry data are often corrupted by noise. It is very difficult to simultaneously detect low-abundance peaks and reduce false-positive peak detection caused by noise. In this paper, we propose to improve peak detection using an additional constraint: the consistent appearance of similar true peaks across multiple spectra. We observe that false-positive peaks in general do not repeat well across multiple spectra. When we align all the identified peaks (including false-positive ones) from multiple spectra, the false-positive peaks are not as consistent as true peaks. Thus, we propose to use information from other spectra to reduce false-positive peaks. The new method improves peak detection over traditional single-spectrum peak detection methods. Consequently, the discovery of cancer biomarkers also benefits from this improvement. Source code and additional data are available at: http://www.ece.ust.hk/~eeyu/mspeak.htm.
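The cross-spectrum consistency idea can be illustrated with a simple filter. It keeps a detected peak only if peaks at a similar m/z appear in enough of the spectra. This is a minimal sketch, not the authors' alignment procedure. The m/z tolerance, the required fraction of supporting spectra, and the brute-force pairwise matching are all assumptions chosen for clarity.

```python
def consistent_peaks(spectra_peaks, tol=0.5, min_fraction=0.5):
    """Keep peaks whose m/z recurs within `tol` in at least `min_fraction` of spectra.

    spectra_peaks: list of lists of detected peak m/z values, one list per spectrum.
    A peak's own spectrum counts toward its support.
    Returns the filtered peak lists in the same order.
    """
    n_spectra = len(spectra_peaks)
    filtered = []
    for peaks in spectra_peaks:
        kept = []
        for mz in peaks:
            # Number of spectra containing at least one peak within tol of mz.
            support = sum(
                1 for other in spectra_peaks if any(abs(mz - o) <= tol for o in other)
            )
            if support / n_spectra >= min_fraction:
                kept.append(mz)
        filtered.append(kept)
    return filtered
```

The pairwise scan is quadratic in the number of peaks. Sorting peaks and using binary search would scale better, but the filtering rule would be the same.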
13.
Statistically meaningful comparison/combination of peptide identification results from various search methods is impeded by the lack of a universal statistical standard. Using an E-value calibration protocol, we demonstrated earlier the feasibility of translating either the score or the heuristic E-value reported by any method into the textbook-defined E-value, which may serve as the universal statistical standard. This protocol, although robust, may lose spectrum-specific statistics and might require a new calibration when the experimental setup changes. To mitigate these issues, we developed a new MS/MS search tool, RAId_aPS, that is able to provide spectrum-specific E-values for additive scoring functions. Given a selection of scoring functions out of RAId score, K-score, Hyperscore and XCorr, RAId_aPS generates the corresponding score histograms of all possible peptides using dynamic programming. Using these score histograms to assign E-values enables a calibration-free protocol for accurate significance assignment for each scoring function. RAId_aPS features four different modes: (i) compute the total number of possible peptides for a given molecular mass range, (ii) generate the score histogram given an MS/MS spectrum and a scoring function, (iii) reassign E-values for a list of candidate peptides given an MS/MS spectrum and the scoring functions chosen, and (iv) perform database searches using selected scoring functions. In modes (iii) and (iv), RAId_aPS is also capable of combining results from different scoring functions using spectrum-specific statistics. The web link is http://www.ncbi.nlm.nih.gov/CBBresearch/Yu/raid_aps/index.html. Relevant binaries for Linux, Windows, and Mac OS X are available from the same page.
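Mode (i), counting all possible peptides in a mass range, rests on the same dynamic-programming idea used to build score histograms. The number of residue sequences with total mass m is the sum, over amino acids a, of the number with mass m − mass(a). The sketch below uses integer nominal residue masses to keep the table small. A real implementation would need finer mass bins and an extra score dimension to obtain score histograms, and both are omitted here. Leucine and isoleucine are counted as distinct residues.

```python
# Nominal (integer) monoisotopic residue masses of the 20 standard amino acids, in Da.
NOMINAL_RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128,
    "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}
WATER = 18  # nominal mass of the H2O that completes a peptide's termini


def count_peptides_in_range(lo, hi, residue_mass=NOMINAL_RESIDUE_MASS):
    """Count residue sequences whose nominal peptide mass lies in [lo, hi]."""
    max_residue_total = hi - WATER
    if max_residue_total < 0:
        return 0
    # counts[m] = number of ordered residue sequences with total residue mass m.
    counts = [0] * (max_residue_total + 1)
    counts[0] = 1  # the empty sequence
    for m in range(1, max_residue_total + 1):
        counts[m] = sum(counts[m - w] for w in residue_mass.values() if w <= m)
    start = max(lo - WATER, 1)  # exclude the empty sequence
    return sum(counts[start:max_residue_total + 1])
```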
14.
Indicator species are species that are used as ecological indicators of community or habitat types, environmental conditions, or environmental changes. In order to determine indicator species, the characteristic to be predicted is represented in the form of a classification of the sites, which is compared to the patterns of distribution of the species found at the sites. Indicator species analysis should take into account the fact that species have different niche breadths: if a species is related to the conditions prevailing in two or more groups of sites, an indicator species analysis undertaken on individual groups of sites may fail to reveal this association. In this paper, we suggest improving indicator species analysis by considering all possible combinations of groups of sites and selecting the combination for which the species can be best used as indicator. When using a correlation index, such as the point‐biserial correlation, the method yields the combination where the difference between the observed and expected abundance/frequency of the species is the largest. When an indicator value index (IndVal) is used, the method provides the set of site‐groups that best matches the observed distribution pattern of the species. We illustrate the advantages of the method in three different examples. Consideration of combinations of groups of sites provides an extra flexibility to qualitatively model the habitat preferences of the species of interest. The method also allows users to cross multiple classifications of the same sites, increasing the amount of information resulting from the analysis. When applied to community types, it allows one to distinguish those species that characterize individual types from those that characterize the relationships between them. This distinction is useful to determine the number of types that maximizes the number of indicator species.
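A search over combinations of site groups can be sketched with a simple presence/absence version of IndVal. Here specificity A is the share of the species' occurrences that fall in the chosen groups, fidelity B is the share of sites in those groups where the species occurs, and IndVal = A × B. This is a simplified sketch. It does not equalize group sizes, and it skips the trivial combination that contains every group; both choices are made here for brevity.

```python
from itertools import combinations


def indval(presence, groups, combo):
    """Presence/absence IndVal = A * B for one combination of site groups.

    presence: list of 0/1 per site; groups: list of group labels per site.
    """
    in_combo = [p for p, g in zip(presence, groups) if g in combo]
    total_present = sum(presence)
    if total_present == 0 or not in_combo:
        return 0.0
    a = sum(in_combo) / total_present  # specificity
    b = sum(in_combo) / len(in_combo)  # fidelity
    return a * b


def best_group_combination(presence, groups):
    """Return (IndVal, combination) maximizing IndVal over proper group combinations."""
    labels = sorted(set(groups))
    best = (0.0, ())
    for size in range(1, len(labels)):  # skip the trivial all-groups combination
        for combo in combinations(labels, size):
            value = indval(presence, groups, combo)
            if value > best[0]:
                best = (value, combo)
    return best


groups = ["A", "A", "B", "B", "C", "C"]
presence = [1, 1, 1, 1, 0, 0]
print(best_group_combination(presence, groups))  # (1.0, ('A', 'B'))
```

In the example, a species present at every site of groups A and B scores only 0.5 for either group alone but 1.0 for the combination A+B. That is the kind of association a single-group analysis would miss.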
15.
The identification of proteins separated on two-dimensional gels is most commonly performed by trypsin digestion and subsequent matrix-assisted laser desorption ionization (MALDI) with time-of-flight (TOF). Recently, atmospheric pressure (AP) MALDI coupled to an ion trap (IT) has emerged as a convenient method to obtain tandem mass spectra (MS/MS) from samples on MALDI target plates. In the present work, we investigated the feasibility of using the two methodologies in line as a standard method for protein identification. In this setup, the high mass accuracy MALDI-TOF spectra are used to calibrate the peptide precursor masses in the lower mass accuracy AP-MALDI-IT MS/MS spectra. Several software tools were developed to automate the analysis process. Two sets of MALDI samples, consisting of 142 and 421 gel spots, respectively, were analyzed in a highly automated manner. In the first set, the protein identification rate increased from 61% for MALDI-TOF only to 85% for MALDI-TOF combined with AP-MALDI-IT. In the second data set the increase in protein identification rate was from 44% to 58%. AP-MALDI-IT MS/MS spectra were in general less effective than the MALDI-TOF spectra for protein identification, but the combination of the two methods clearly enhanced the confidence in protein identification.
16.
The one-out-all-out (OOAO) approach for aggregating the assessments of single elements (e.g. species or ecosystem components) has found application in environmental policies such as the European Water Framework Directive and the Marine Strategy Framework Directive (MSFD). However, the OOAO has been challenged as too pessimistic, because positive assessment results become virtually impossible as the number of aggregated elements increases. This study presents a generic approach, the probabilistic one-out-all-out approach (pOOAO), which addresses this issue and thereby reconciles the OOAO with probabilistic aggregation methods. The pOOAO allows one to determine the minimum number of elements (KGES) that should meet their assessment benchmarks and thus achieve good status. By presetting a generic confidence level for each single assessment (e.g. 0.95), the binomial distribution can be used to obtain KGES for any number of assessed elements. The pOOAO can also accommodate the integration of assessments from multiple indicators within an element by adjusting the confidence level in relation to the number of integrated indicators. Depending on the generic confidence level and on the numbers of integrated indicators and aggregated elements, the pOOAO is either consistent with the OOAO or allows for a certain number of negative assessment results, which are attributed to statistical uncertainty and error propagation. The pOOAO is consistent with the OOAO if the desired confidence level in the single assessment results is very high (>0.99) and/or the number of aggregated elements and integrated indicators is low. Through this flexibility, the pOOAO could find wide application within integrated ecosystem assessment frameworks such as the MSFD, but it would require estimating the confidence level for each single assessment.
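The binomial step can be sketched as follows. The text above does not state how KGES is read off the distribution, so the criterion here is an assumption. Each element's assessment is treated as correct with probability equal to the confidence level. KGES is then taken as the largest k for which a system whose elements are all truly in good status would still fail the rule "at least k elements pass" with probability no greater than a tolerance alpha (0.05 here, a hypothetical choice).

```python
from math import comb


def binom_prob_below(k, n, p):
    """P(X < k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))


def k_ges(n_elements, confidence=0.95, alpha=0.05):
    """Largest k such that P(fewer than k pass | all elements truly good) <= alpha.

    The alpha criterion is a hypothetical choice for illustration.
    """
    # Try the strict OOAO rule (k = n) first, then relax it one element at a time.
    # k = 0 always qualifies because P(X < 0) = 0, so the loop always returns.
    for k in range(n_elements, -1, -1):
        if binom_prob_below(k, n_elements, confidence) <= alpha:
            return k
```

With a confidence level of 0.95 and 10 elements this gives KGES = 8, so two negative results are tolerated. With 0.995 and 5 elements it gives KGES = 5, the strict OOAO rule, which matches the statement that the two approaches coincide at very high confidence levels and few elements.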
17.
Wedge DC, Krishna R, Blackhurst P, Siepen JA, Jones AR, Hubbard SJ. Journal of proteome research, 2011, 10(4):2088-2094
Confident identification of peptides via tandem mass spectrometry underpins modern high-throughput proteomics. This has motivated considerable recent interest in the postprocessing of search engine results to increase confidence and calculate robust statistical measures, for example through the use of decoy databases to calculate false discovery rates (FDR). FDR-based analyses allow for multiple testing and can assign a single confidence value to both sets and individual peptide-spectrum matches (PSMs). We recently developed an algorithm for combining the results from multiple search engines, integrating FDRs for sets of PSMs made by different search engine combinations. Here we describe a web server and a downloadable application that make this routinely available to the proteomics community. The web server offers a range of outputs, including informative graphics to assess the confidence of the PSMs and any potential biases. The underlying pipeline also provides a basic protein inference step, integrating PSMs into protein ambiguity groups where peptides can be matched to more than one protein. Importantly, we have also implemented full support for the mzIdentML data standard, recently released by the Proteomics Standards Initiative, providing users with the ability to convert native formats to mzIdentML files, which are available to download.
18.
19.
A model for an antibody specific for the carcinoembryonic antigen (CEA) has been constructed using a method that combines the concept of canonical structures with conformational search. A conformational search technique is introduced that couples random generation of backbone loop conformations to a simulated annealing method for assigning side-chain conformations. This technique was used both to verify conformations selected from the set of known canonical structures and to explore conformations available to the H3 loop in CEA ab initio. Canonical structures are not available for H3 owing to its variability in length, sequence, and observed conformation in known antibody structures. Analysis of the conformational search results yielded three equally probable conformations for the H3 loop in CEA. Force field energies, solvation free energies, exposure of charged residues and burial of hydrophobic residues, and packing of hydrophobic residues at the base of the loop were used as selection criteria. The existence of three equally plausible structures may reflect the high degree of flexibility expected for an exposed loop of this length. The nature of the combining site and features that could be important to interaction with antigen are discussed.
20.
Tandem mass spectrometry-based proteomics experiments produce large amounts of raw data, and different database search engines are needed to reliably identify all the proteins from these data. Here, we present Compid, an easy-to-use software tool that can be used to integrate and compare protein identification results from two search engines, Mascot and Paragon. Additionally, Compid enables extraction of information from large Mascot result files that cannot be opened via the Web interface, and calculation of general statistical information about peptide and protein identifications in a data set. To demonstrate the usefulness of this tool, we used Compid to compare Mascot and Paragon database search results for a mitochondrial proteome sample of human keratinocytes. The reports generated by Compid can be exported and opened as Excel documents or as text files using configurable delimiters, allowing the analysis and further processing of Compid output with a multitude of programs. Compid is freely available and can be downloaded from http://users.utu.fi/lanatr/compid. It is released under an open source license (GPL), enabling modification of the source code. Its modular architecture allows for the creation of supplementary software components, e.g. to enable support for additional input formats and report categories.