Similar literature
Found 20 similar articles (search time: 46 ms)
1.
LC-MS experiments can generate large quantities of data, for which a variety of database search engines are available to make peptide and protein identifications. Decoy databases are becoming widely used to place statistical confidence in result sets, allowing the false discovery rate (FDR) to be estimated. Different search engines produce different identification sets, so employing more than one search engine could increase the number of peptides (and proteins) identified, provided an appropriate mechanism for combining data can be defined. We have developed a search-engine-independent score based on FDR, called the FDR Score, which allows peptide identifications from different search engines to be combined. The results demonstrate that the observed FDR is significantly different when analysing the set of identifications made by all three search engines, by each pair of search engines or by a single search engine. Our algorithm assigns identifications to groups according to the set of search engines that have made the identification, and re-assigns the score (combined FDR Score). The combined FDR Score can differentiate between correct and incorrect peptide identifications with high accuracy, allowing on average 35% more peptide identifications to be made at a fixed FDR than using a single search engine.
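The decoy-based FDR estimate that this abstract builds on can be illustrated with a minimal sketch. It is not the authors' FDR Score or their grouping algorithm; it only shows the generic target-decoy estimate, assuming a concatenated target/decoy search with an equally sized decoy database and scores where higher is better. The function name `q_values` and the input layout are hypothetical.

```python
from typing import List, Tuple


def q_values(psms: List[Tuple[float, bool]]) -> List[Tuple[float, bool, float]]:
    """Return (score, is_decoy, q_value) tuples sorted by descending score.

    psms: list of (score, is_decoy) pairs; higher score = better match.
    Assumption: FDR at a threshold is estimated as decoys / targets
    among all matches scoring at or above that threshold.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    # Running FDR estimate at each rank.
    fdrs = []
    targets = decoys = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the smallest FDR obtainable at this threshold or any looser one,
    # which makes the values monotone along the ranked list.
    qs = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qs.append(running_min)
    qs.reverse()

    return [(score, is_decoy, q) for (score, is_decoy), q in zip(ranked, qs)]


if __name__ == "__main__":
    example = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, False)]
    for score, is_decoy, q in q_values(example):
        print(score, "decoy" if is_decoy else "target", round(q, 2))
```

A per-engine list of q-values like this is the kind of input a combination scheme would start from; how the scores are then regrouped and re-assigned is specific to the paper and is not reproduced here.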

2.
Recently, we have developed a high-resolution two-dimensional separation strategy for the analysis of complex peptide mixtures. This methodology employs isoelectric focusing of peptides on immobilized pH gradient (IPG) gels in the first dimension, followed by reversed-phase chromatography in the second dimension, and subsequent tandem mass spectrometry analysis. The traditional approach to this mixture problem employs strong-cation-exchange (SCX) chromatography in the first dimension. Here, we present a direct comparison of these two first-dimensional techniques using complex protein samples derived from the testis of Rattus norvegicus. It was found that the use of immobilized pH gradients (narrow range pH 3.5-4.5) for peptide separation in the first dimension yielded 13% more protein identifications than the optimized off-line SCX approach (employing the entire pI range of the sample). In addition, the IPG technique allows for a much more efficient use of mass spectrometer analysis time. Separation of a tryptic digest derived from a rat testis sample on a narrow range pH gradient (over the 3.5-4.5 pH range) yielded 7626 and 2750 peptides and proteins, respectively. Peptide and protein identification was performed with high confidence using SEQUEST in combination with a data filtering program employing pI and statistics-based functions to remove false positives from the data.

3.
In the analysis of proteins in complex samples, pre-fractionation is imperative to obtain the necessary depth in the number of reliable protein identifications by mass spectrometry. Here we explore isoelectric focusing of peptides (peptide IEF) as an effective fractionation step that at the same time provides the added possibility to eliminate spurious peptide identifications by filtering for pI. Peptide IEF in IPG strips is fast and sharply confines peptides to their pI. We have systematically evaluated the contribution of pI filtering and accurate mass measurements to the total number of protein identifications in a complex protein mixture (Drosophila nuclear extract). At the same time, by varying Mascot identification cutoff scores, we have monitored the false positive rate among these identifications by searching reverse protein databases. In mass spectrometric analyses at low mass accuracy using an LTQ ion trap, false positive rates can be minimized by filtering out peptides that do not focus at their expected pI. Analyses using an LTQ-FT mass spectrometer deliver low false positive rates on their own owing to the high mass accuracy. In a direct comparison of peptide IEF with SDS-PAGE as a pre-fractionation step, IEF delivered 25% and 43% more proteins when identified using FT-MS and LTQ-MS, respectively. Cumulatively, 2190 non-redundant proteins were identified in the Drosophila nuclear extract at a false positive rate of 0.5%. Of these, 1751 proteins (80%) were identified after peptide IEF and FT-MS alone. Overall, we show that peptide IEF increases the confidence level of protein identifications and is more sensitive than SDS-PAGE.

4.
We describe the application of LC-MS without the use of stable isotope labeling for differential quantitative proteomic analysis of whole cell lysates of Shewanella oneidensis MR-1 cultured under aerobic and suboxic conditions. LC-MS/MS was used to initially identify peptide sequences, and LC-FTICR was used to confirm these identifications as well as measure relative peptide abundances. 2343 peptides covering 668 proteins were identified with high confidence and quantified. Among these proteins, a subset of 56 changed significantly according to statistical approaches such as statistical analysis of microarrays, whereas another subset of 56 that were annotated as performing housekeeping functions remained essentially unchanged in relative abundance. Numerous proteins involved in anaerobic energy metabolism exhibited up to a 10-fold increase in relative abundance when S. oneidensis was transitioned from aerobic to suboxic conditions.

5.
Shotgun proteomics using mass spectrometry is a powerful method for protein identification but suffers from limited sensitivity in complex samples. Integrating peptide identifications from multiple database search engines is a promising strategy to increase the number of peptide identifications and reduce the volume of unassigned tandem mass spectra. Existing methods pool statistical significance scores such as p-values or posterior probabilities of peptide-spectrum matches (PSMs) from multiple search engines after high-scoring peptides have been assigned to spectra, but these methods lack reliable control of identification error rates as data are integrated from different search engines. We developed a statistically coherent method for integrative analysis, termed MSblender. MSblender converts raw search scores from search engines into a probability score for every possible PSM and properly accounts for the correlation between search scores. The method reliably estimates false discovery rates and identifies more PSMs than any single search engine at the same false discovery rate. Increased identifications increment spectral counts for most proteins and allow quantification of proteins that would not have been quantified by individual search engines. We also demonstrate that enhanced quantification contributes to improved sensitivity in differential expression analyses.

6.
Proteomic discovery platforms generate both peptide expression information and protein identification information. Peptide expression data are used to determine which peptides are differentially expressed between study cohorts, and then these peptides are targeted for protein identification. In this paper, we demonstrate that peptide expression information is also a powerful tool for enhancing confidence in protein identification results. Specifically, we evaluate the following hypothesis: tryptic peptides originating from the same protein have similar expression profiles across samples in the discovery study. Evidence supporting this hypothesis is provided. This hypothesis is integrated into a protein identification tool, PIPER (Protein Identification and Peptide Expression Resolver), that reduces erroneous protein identifications below 5%. PIPER's utility is illustrated by application to a 72-sample biomarker discovery study where it is demonstrated that false positive protein identifications can be reduced below 5%. Consequently, it is recommended that PIPER methodology be incorporated into proteomic studies where both protein expression and identification data are collected.
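The hypothesis stated in this abstract, that peptides from the same protein share similar expression profiles across samples, can be made concrete with a small sketch. This is not the PIPER algorithm, whose details are not given here; it only assumes that "similar" is measured by Pearson correlation and scores each peptide by its mean correlation with the other peptides assigned to the same protein. All names (`pearson`, `profile_agreement`) and the toy data are hypothetical.

```python
from math import sqrt
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def profile_agreement(profiles: Dict[str, List[float]]) -> Dict[str, float]:
    """Mean correlation of each peptide with the other peptides of one protein.

    profiles: peptide name -> abundance across samples, for peptides that
    were all assigned to the same protein. A low value marks a peptide whose
    profile disagrees with its siblings, i.e. a candidate misassignment.
    """
    agreement = {}
    for name, profile in profiles.items():
        others = [pearson(profile, p) for other, p in profiles.items() if other != name]
        agreement[name] = sum(others) / len(others) if others else float("nan")
    return agreement


if __name__ == "__main__":
    toy = {
        "pepA": [1.0, 2.0, 3.0, 4.0],
        "pepB": [2.0, 4.0, 6.0, 8.0],
        "pepC": [4.0, 3.0, 2.0, 1.0],  # runs opposite to its siblings
    }
    scores = profile_agreement(toy)
    print(min(scores, key=scores.get))
```

In the toy data the third peptide trends opposite to the other two and receives the lowest agreement score, which is the kind of signal the hypothesis suggests could flag a doubtful assignment.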

7.
Proteomic approaches to biological research that will prove the most useful and productive require robust, sensitive, and reproducible technologies for both the qualitative and quantitative analysis of complex protein mixtures. Here we applied the isotope-coded affinity tag (ICAT) approach to quantitative protein profiling, in this case proteins that copurified with lipid raft plasma membrane domains isolated from control and stimulated Jurkat human T cells. With the ICAT approach, cysteine residues of the two related protein isolates were covalently labeled with isotopically normal and heavy versions of the same reagent, respectively. Following proteolytic cleavage of combined labeled proteins, peptides were fractionated by multidimensional chromatography and subsequently analyzed via automated tandem mass spectrometry. Individual tandem mass spectrometry spectra were searched against a human sequence database, and a variety of recently developed, publicly available software applications were used to sort, filter, analyze, and compare the results of two repetitions of the same experiment. In particular, robust statistical modeling algorithms were used to assign measures of confidence to both peptide sequences and the proteins from which they were likely derived, identified via the database searches. We show that by applying such statistical tools to the identification of T cell lipid raft-associated proteins, we were able to estimate the accuracy of peptide and protein identifications made. These tools also allow for determination of the false positive rate as a function of user-defined data filtering parameters, thus giving the user significant control over and information about the final output of large-scale proteomic experiments. With the ability to assign probabilities to all identifications, the need for manual verification of results is substantially reduced, thus making the rapid evaluation of large proteomic datasets possible. 
Finally, by repeating the experiment, information relating to the general reproducibility and validity of this approach to large-scale proteomic analyses was also obtained.

8.
To interpret LC-MS/MS data in proteomics, most popular protein identification algorithms primarily use predicted fragment m/z values to assign peptide sequences to fragmentation spectra. The intensity information is often undervalued, because it is not as easy to predict and incorporate into algorithms. Nevertheless, the use of intensity to assist peptide identification is an attractive prospect and can potentially improve the confidence of matches and generate more identifications. On the basis of our previously reported study of fragmentation intensity patterns, we developed a protein identification algorithm, SeQuence IDentification (SQID), that makes use of the coarse intensity from a statistical analysis. The scoring scheme was validated by comparison with Sequest and X!Tandem using three data sets, and the results indicate an improvement in the number of identified peptides, including unique peptides that are not identified by Sequest or X!Tandem. The software and source code are available under the GNU GPL license at http://quiz2.chem.arizona.edu/wysocki/bioinformatics.htm.

9.
Tandem mass spectrometry (MS/MS) combined with database searching is currently the most widely used method for high-throughput peptide and protein identification. Many different algorithms, scoring criteria, and statistical models have been used to identify peptides and proteins in complex biological samples, and many studies, including our own, describe the accuracy of these identifications, using at best generic terms such as "high confidence." False positive identification rates for these criteria can vary substantially with changing organisms under study, growth conditions, sequence databases, experimental protocols, and instrumentation; therefore, study-specific methods are needed to estimate the accuracy (false positive rates) of these peptide and protein identifications. We present and evaluate methods for estimating false positive identification rates based on searches of randomized databases (reversed and reshuffled). We examine the use of separate searches of a forward then a randomized database and combined searches of a randomized database appended to a forward sequence database. Estimated error rates from randomized database searches are first compared against actual error rates from MS/MS runs of known protein standards. These methods are then applied to biological samples of the model microorganism Shewanella oneidensis strain MR-1. Based on the results obtained in this study, we recommend the use of combined searches of a reshuffled database appended to a forward sequence database as a means of providing quantitative estimates of false positive identification rates of peptides and proteins. This will allow researchers to set criteria and thresholds to achieve a desired error rate and provide the scientific community with direct and quantifiable measures of peptide and protein identification accuracy as opposed to vague assessments such as "high confidence."
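The two randomized-database constructions named in this abstract (reversed and reshuffled), and the appended layout used for combined searches, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' tooling: proteins are held in a plain name-to-sequence dictionary, whole-protein sequences are reversed or shuffled, and the `DECOY_` prefix and function names are hypothetical.

```python
import random
from typing import Dict


def reversed_decoy(sequence: str) -> str:
    """Reverse the whole protein sequence."""
    return sequence[::-1]


def shuffled_decoy(sequence: str, rng: random.Random) -> str:
    """Randomly permute the residues, preserving amino acid composition."""
    residues = list(sequence)
    rng.shuffle(residues)
    return "".join(residues)


def append_decoys(
    forward: Dict[str, str],
    mode: str = "shuffle",
    seed: int = 0,
    prefix: str = "DECOY_",  # hypothetical naming convention
) -> Dict[str, str]:
    """Return the forward database with one decoy entry appended per protein."""
    rng = random.Random(seed)  # fixed seed keeps the decoy set reproducible
    combined = dict(forward)
    for name, sequence in forward.items():
        if mode == "reverse":
            decoy = reversed_decoy(sequence)
        elif mode == "shuffle":
            decoy = shuffled_decoy(sequence, rng)
        else:
            raise ValueError(f"unknown decoy mode: {mode!r}")
        combined[prefix + name] = decoy
    return combined


if __name__ == "__main__":
    forward_db = {"P1": "MKTAYIAK"}
    print(append_decoys(forward_db, mode="reverse"))
    print(append_decoys(forward_db, mode="shuffle"))
```

Searching the combined dictionary as a single database means target and decoy sequences compete for each spectrum, and hits carrying the decoy prefix can then be counted to estimate the false positive rate.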

10.
In recent years, a variety of approaches have been developed using decoy databases to empirically assess the error associated with peptide identifications from large-scale proteomics experiments. We have developed an approach for calculating the expected uncertainty associated with false-positive rate determination using concatenated reverse and forward protein sequence databases. After explaining the theoretical basis of our model, we compare predicted error with the results of experiments characterizing a series of mixtures containing known proteins. In general, results from characterization of known proteins show good agreement with our predictions. Finally, we consider how these approaches may be applied to more complicated data sets, as when peptides are separated by charge state prior to false-positive determination.

11.
Mixtures of moderate complexity were formed from 23 peptides and 12 proteins digested with trypsin, all individually characterized. These mixtures were analyzed with replicates in full and windowed m/z ranges using online high-performance reverse phase liquid chromatography coupled via electrospray ionization to an ion trap mass spectrometer. The resulting spectra were searched using SEQUEST against databases of different sizes and contents, and the confidence of the observed identifications was evaluated using our earlier statistical model. These data were then combined with biologically derived spectral data, searched, and further evaluated. All peptides but one and all proteins were identified with high confidence. Additionally, the presence and behavior of quadruply charged peptides were analyzed. The properties of the proposed peptide and protein mixtures as well as the performance of the statistical model were carefully investigated. These mixtures mimic the complexity seen in large-scale proteomics experiments, and are proposed to serve as quality assessment standards for future proteome studies.

12.
Comparing a protein's concentrations across two or more treatments is the focus of many proteomics studies. A frequent source of measurements for these comparisons is a mass spectrometry (MS) analysis of a protein's peptide ions separated by liquid chromatography (LC) following its enzymatic digestion. Alas, LC-MS identification and quantification of equimolar peptides can vary significantly due to their unequal digestion, separation, and ionization. This unequal measurability of peptides, the largest source of LC-MS nuisance variation, stymies confident comparison of a protein's concentration across treatments. Our objective is to introduce a mixed-effects statistical model for comparative LC-MS proteomics studies. We describe LC-MS peptide abundance with a linear model featuring pivotal terms that account for unequal peptide LC-MS measurability. We advance fitting this model to an often incomplete LC-MS data set with REstricted Maximum Likelihood (REML) estimation, producing estimates of model goodness-of-fit, treatment effects, standard errors, confidence intervals, and protein relative concentrations. We illustrate the model with an experiment featuring a known dilution series of a filamentous ascomycete fungus Trichoderma reesei protein mixture. For 781 of the 1546 T. reesei proteins with sufficient data coverage, the fitted mixed-effects models capably described the LC-MS measurements. The LC-MS measurability terms effectively accounted for this major source of uncertainty. Ninety percent of the relative concentration estimates were within 0.5-fold of the true relative concentrations. Akin to the common ratio method, this model also produced biased estimates, albeit less biased. Bias decreased significantly, both absolutely and relative to the ratio method, as the number of observed peptides per protein increased. 
Mixed-effects statistical modeling offers a flexible, well-established methodology for comparative proteomics studies integrating common experimental designs with LC-MS sample processing plans. It favorably accounts for the unequal LC-MS measurability of peptides and produces informative quantitative comparisons of a protein's concentration across treatments with objective measures of uncertainties.

13.
14.
Shotgun proteomics commonly uses database search engines such as Mascot to identify proteins from MS/MS spectra. False discovery rate (FDR) is often used to assess the confidence of peptide identifications. However, a widely accepted FDR of 1% sacrifices the sensitivity of peptide identification while improving the accuracy. This article details a machine learning approach combining a retention-time-based support vector regressor (RT-SVR) with q-value-based statistical analysis to improve peptide and protein identifications with high sensitivity and accuracy. The use of confident peptide identifications as training examples and careful feature selection ensures high R values (>0.900) for all models. Applying the RT-SVR model to Mascot results (p=0.10) increases the sensitivity of peptide identifications. The q value, as a function of the deviation between predicted and experimental RTs (ΔRT), is used to assess the significance of peptide identifications. We demonstrate that the peptide and protein identifications increase by up to 89.4% and 83.5%, respectively, for a specified q value of 0.01 when applying the method to proteomic analysis of the natural killer leukemia cell line (NKL). This study establishes an effective methodology and provides a platform for profiling confident proteomes in more relevant species as well as a future investigation of accurate protein quantification.

15.
A very popular approach in proteomics is the so-called "shotgun LC-MS/MS" strategy. In its most commonly used form, a total protein digest is separated by ion exchange fractionation in the first dimension followed by off- or on-line RP LC-MS/MS. We replaced the first dimension by isoelectric focusing in the liquid phase using the Off-Gel device producing 15 fractions. As peptides are separated by their isoelectric point in the first dimension and hydrophobicity in the second, those experimentally derived parameters (pI and R(T)) can be used for the validation of potentially identified peptides. We applied this strategy to a cellular extract of Drosophila Kc167 cells and identified peptides with two different database search engines, namely PHENYX and SEQUEST, with PeptideProphet validation of the SEQUEST results. PHENYX returned 7582 potential peptide identifications and SEQUEST 7629. The SEQUEST results were reduced to 2006 identifications by validation with PeptideProphet. Validation of the PeptideProphet, SEQUEST and PHENYX results by pI and R(T) parameters confirmed 1837 PeptideProphet identifications, while in the remainder of the SEQUEST results another 1130 peptides were found to be likely hits. Validation of the PHENYX results established a firm p-value threshold of <1 × 10^-4 that by itself sets the correct identification confidence to >95%, and a final count of 2034 highly confident peptide identifications was achieved after pI and R(T) validation. Although the PeptideProphet and PHENYX datasets have very high confidence, the overlap of common identifications was only 79.4%, which can be explained by the fact that data interpretation was done by searching different protein databases with two search engines that use different algorithms. The approach used in this study allowed for an automated and improved data validation process for shotgun proteomics projects producing MS/MS peptide identification results of very high confidence.

16.
A new database search algorithm has been developed to identify disulfide-linked peptides in tandem MS data sets. The algorithm is included in the newly developed tandem MS database search program, MassMatrix. The algorithm exploits the probabilistic scoring model in MassMatrix to achieve identification of disulfide bonds in proteins and peptides. Proteins and peptides with disulfide bonds can be identified with high confidence without chemical reduction or other derivatization. The approach was tested on peptide and protein standards with known disulfide bonds. All disulfide bonds in the standard set were identified by MassMatrix. The algorithm was further tested on bovine pancreatic ribonuclease A (RNaseA). The 4 native disulfide bonds in RNaseA were detected by MassMatrix with multiple validated peptide matches for each disulfide bond with high statistical scores. Fifteen nonnative disulfide bonds were also observed in the protein digest under basic conditions (pH = 8.0) due to disulfide bond interchange. After minimizing the disulfide bond interchange (pH = 6.0) during digestion, only one nonnative disulfide bond was observed. The MassMatrix algorithm offers an additional approach for the discovery of disulfide bonds from tandem mass spectrometry data.

17.
As a test case for optimizing how to perform proteomics experiments, we chose a yeast model system in which the UPF1 gene, a protein involved in nonsense-mediated mRNA decay, was knocked out by homologous recombination. The results from five complete isotope-coded affinity tag (ICAT) experiments were combined, two using matrix-assisted laser desorption/ionization (MALDI) tandem mass spectrometry (MS/MS) and three using electrospray MS/MS. We sought to assess the reproducibility of peptide identification and to develop an informatics structure that characterizes the identification process as well as possible, especially with regard to tenuous identifications. The cleavable form of the ICAT reagent system was used for quantification. Most proteins did not change significantly in expression as a consequence of the upf1 knockout. As expected, the Upf1 protein itself was down-regulated, and there were reproducible increases in expression of proteins involved in arginine biosynthesis. Initially, it seemed that about 10% of the proteins had changed in expression level, but after more thorough examination of the data it turned out that most of these apparent changes could be explained by artifacts of quantification caused by overlapping heavy/light pairs. About 700 proteins altogether were identified with high confidence and quantified. Many peptides with chemical modifications were identified, as well as peptides with noncanonical tryptic termini. Nearly all of these modified peptides corresponded to the most abundant yeast proteins, and some would otherwise have been attributed to "single hit" proteins at low confidence. To improve our confidence in the identifications, in MALDI experiments, the parent masses for the peptides were calibrated against nearby components. In addition, five novel parameters reflecting different aspects of identification were collected for each spectrum in addition to the Mascot score that was originally used. 
The interrelationship between these scoring parameters and confidence in protein identification is discussed.

18.
Reliable statistical validation of peptide and protein identifications is a top priority in large-scale mass spectrometry based proteomics. PeptideProphet is one of the computational tools commonly used for assessing the statistical confidence in peptide assignments to tandem mass spectra obtained using database search programs such as SEQUEST, MASCOT, or X! TANDEM. We present two flexible methods, the variable component mixture model and the semiparametric mixture model, that remove the restrictive parametric assumptions in the mixture modeling approach of PeptideProphet. Using a control protein mixture data set generated on a linear ion trap Fourier transform (LTQ-FT) mass spectrometer, we demonstrate that both methods improve on parametric models in terms of the accuracy of probability estimates and the power to detect correct identifications while controlling the false discovery rate to the same degree. The statistical approaches presented here require that the data set contain a sufficient number of decoy (known to be incorrect) peptide identifications, which can be obtained using the target-decoy database search strategy.

19.
In high-throughput mass spectrometry proteomics, peptides and proteins are not simply identified as present or not present in a sample; rather, the identifications are associated with differing levels of confidence. The false discovery rate (FDR) has emerged as an accepted means for measuring the confidence associated with identifications. We have developed the Systematic Protein Investigative Research Environment (SPIRE) for the purpose of integrating the best available proteomics methods. Two successful approaches to estimating the FDR for MS protein identifications are the MAYU and our current SPIRE methods. We present here a method to combine these two approaches to estimating the FDR for MS protein identifications into an integrated protein model (IPM). We illustrate the high-quality performance of this IPM approach through testing on two large publicly available proteomics datasets. MAYU and SPIRE show remarkable consistency in identifying proteins in these datasets. Still, IPM results in a more robust FDR estimation approach and additional identifications, particularly among low abundance proteins. IPM is now implemented as a part of the SPIRE system.

20.
Identification of novel diagnostic or therapeutic biomarkers from human blood plasma would benefit significantly from quantitative measurements of the proteome constituents over a range of physiological conditions. Herein we describe an initial demonstration of proteome-wide quantitative analysis of human plasma. The approach utilizes postdigestion trypsin-catalyzed 16O/18O peptide labeling, two-dimensional LC-FTICR mass spectrometry, and the accurate mass and time (AMT) tag strategy to identify and quantify peptides/proteins from complex samples. A peptide accurate mass and LC elution time AMT tag database was initially generated using MS/MS following extensive multidimensional LC separations to provide the basis for subsequent peptide identifications. The AMT tag database contains >8,000 putatively identified peptides, providing 938 confident plasma protein identifications. The quantitative approach was applied without depletion of high abundance proteins for comparative analyses of plasma samples from an individual prior to and 9 h after lipopolysaccharide (LPS) administration. Accurate quantification of changes in protein abundance was demonstrated by both 1:1 labeling of control plasma and the comparison between the plasma samples following LPS administration. A total of 429 distinct plasma proteins were quantified from the comparative analyses, and the protein abundances for 25 proteins, including several known inflammatory response mediators, were observed to change significantly following LPS administration.
