Similar literature
Found 20 similar articles; search took 15 ms
1.
Large-scale protein identifications from highly complex protein mixtures have recently been achieved using multidimensional liquid chromatography coupled with tandem mass spectrometry (LC/LC-MS/MS) and subsequent database searching with algorithms such as SEQUEST. Here, we describe a probability-based evaluation of false positive rates associated with peptide identifications from three different human proteome samples. Peptides from human plasma, human mammary epithelial cell (HMEC) lysate, and human hepatocyte (Huh)-7.5 cell lysate were separated by strong cation exchange (SCX) chromatography coupled offline with reversed-phase capillary LC-MS/MS analyses. The MS/MS spectra were first analyzed by SEQUEST, searching independently against both normal and sequence-reversed human protein databases, and the false positive rates of peptide identifications for the three proteome samples were then analyzed and compared. The observed false positive rates of peptide identifications for human plasma were significantly higher than those for the human cell lines when identical filtering criteria were used, suggesting that the false positive rates are significantly dependent on sample characteristics, particularly the number of proteins found within the detectable dynamic range. Two new sets of filtering criteria are proposed for human plasma and human cell lines, respectively, to provide an overall confidence of >95% for peptide identifications. The new criteria were compared, using a normalized elution time (NET) criterion (Petritis et al. Anal. Chem. 2003, 75, 1039-1048), with previously published criteria (Washburn et al. Nat. Biotechnol. 2001, 19, 242-247). The results demonstrate that the present criteria provide significantly higher levels of confidence for peptide identifications from mammalian proteomes without greatly decreasing the number of identifications.
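The reversed-database evaluation described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the estimator (reversed-database hits divided by normal-database hits passing the same filter) is one common convention and is assumed here, and the function name and score values are hypothetical.

```python
def estimate_false_positive_rate(forward_scores, reversed_scores, threshold):
    """Estimate the false positive rate at a given score threshold.

    Assumption: the number of reversed-database matches passing the filter
    approximates the number of incorrect matches among the normal-database
    hits passing the same filter (separate searches, as in the abstract).
    """
    n_forward = sum(1 for s in forward_scores if s >= threshold)
    n_reversed = sum(1 for s in reversed_scores if s >= threshold)
    if n_forward == 0:
        return 0.0  # nothing passes the filter, so nothing can be false
    return min(1.0, n_reversed / n_forward)


# Hypothetical search scores, for illustration only.
forward = [4.1, 3.8, 3.5, 3.2, 2.9, 2.6, 2.4, 2.1, 1.9, 1.7]
reverse = [2.5, 2.2, 1.8, 1.6, 1.5, 1.4, 1.3, 1.2, 1.1, 1.0]
rate = estimate_false_positive_rate(forward, reverse, threshold=2.0)
```

Raising the threshold until the estimated rate falls below 0.05 would correspond to the ">95% confidence" target mentioned above, under the same assumption.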

2.
Wang W, Guo T, Song T, Lee CS, Balgley BM. Proteomics 2007, 7(8): 1178-1187
As demonstrated in this study, a CIEF-based multidimensional separation platform not only is compatible with the detergent-based membrane protein preparation protocol, but also achieves both the largest yeast membrane proteome coverage and the most comprehensive analysis of the yeast proteome to date. By using a 1% false discovery rate for total peptide identifications, a total of 2513 distinct yeast proteins are identified from the SDS-solubilized fraction with an average of 5.4 peptides leading to each protein identification. Among proteins identified from the SDS-solubilized fraction, 407 proteins are predicted to contain two or more transmembrane domains using TMHMM (www.cbs.dtu.dk/services/TMHMM-2.0/), corresponding to 46% yeast membrane proteome coverage. Only four additional membrane proteins are identified in the soluble and urea-solubilized fractions, affirming the utility of SDS extraction for enriching the membrane proteome. By combining proteome results obtained from the soluble, urea-solubilized, and SDS-solubilized fractions, a single yeast proteome analysis yields the identification of 3632 distinct yeast proteins, corresponding to 55% theoretical yeast proteome coverage or 70% of proteins predicted to be expressed during log-phase growth in rich media.

3.
Shotgun proteomics using mass spectrometry is a powerful method for protein identification but suffers from limited sensitivity in complex samples. Integrating peptide identifications from multiple database search engines is a promising strategy to increase the number of peptide identifications and reduce the volume of unassigned tandem mass spectra. Existing methods pool statistical significance scores such as p-values or posterior probabilities of peptide-spectrum matches (PSMs) from multiple search engines after high scoring peptides have been assigned to spectra, but these methods lack reliable control of identification error rates as data are integrated from different search engines. We developed a statistically coherent method for integrative analysis, termed MSblender. MSblender converts raw search scores from search engines into a probability score for every possible PSM and properly accounts for the correlation between search scores. The method reliably estimates false discovery rates and identifies more PSMs than any single search engine at the same false discovery rate. Increased identifications increment spectral counts for most proteins and allow quantification of proteins that would not have been quantified by individual search engines. We also demonstrate that enhanced quantification contributes to improved sensitivity in differential expression analyses.

4.
We present a wrapper-based approach to estimate and control the false discovery rate for peptide identifications using the outputs from multiple commercially available MS/MS search engines. Features of the approach include the flexibility to combine output from multiple search engines with sequence and spectral derived features in a flexible classification model to produce a score associated with correct peptide identifications. This classification model score from a reversed database search is taken as the null distribution for estimating p-values and false discovery rates using a simple and established statistical procedure. Results from 10 analyses of rat sera on an LTQ-FT mass spectrometer indicate that the method is well calibrated for controlling the proportion of false positives in a set of reported peptide identifications while correctly identifying more peptides than rule-based methods using one search engine alone.
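The idea of using reversed-database scores as a null distribution can be illustrated with a short sketch. The abstract does not name the "simple and established statistical procedure"; the Benjamini-Hochberg adjustment used below is an assumption made for illustration, and all function names are hypothetical.

```python
from bisect import bisect_left


def empirical_p_values(target_scores, decoy_scores):
    """Empirical p-values with the decoy (reversed) scores as the null.

    p = (1 + number of decoy scores >= target score) / (1 + number of decoys)
    """
    null = sorted(decoy_scores)
    n = len(null)
    p_values = []
    for score in target_scores:
        n_greater_equal = n - bisect_left(null, score)
        p_values.append((n_greater_equal + 1) / (n + 1))
    return p_values


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Assumption: BH stands in for the unnamed procedure in the abstract.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Identifications whose adjusted value falls below a chosen level (for example 0.05) would then be reported at that estimated false discovery rate.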

5.
Peptide identification using tandem mass spectrometry is a core technology in proteomics. Latest generations of mass spectrometry instruments enable the use of electron transfer dissociation (ETD) to complement collision induced dissociation (CID) for peptide fragmentation. However, a critical limitation to the use of ETD has been optimal database search software. Percolator is a post-search algorithm, which uses semi-supervised machine learning to improve the rate of peptide spectrum identifications (PSMs) together with providing reliable significance measures. We have previously interfaced the Mascot search engine with Percolator and demonstrated sensitivity and specificity benefits with CID data. Here, we report recent developments in the Mascot Percolator V2.0 software including an improved feature calculator and support for a wider range of ion series. The updated software is applied to the analysis of several CID and ETD fragmented peptide data sets. This version of Mascot Percolator increases the number of CID PSMs by up to 80% and ETD PSMs by up to 60% at a 0.01 q-value (1% false discovery rate) threshold over a standard Mascot search, notably recovering PSMs from high charge state precursor ions. The greatly increased number of PSMs and peptide coverage afforded by Mascot Percolator has enabled a fuller assessment of CID/ETD complementarity to be performed. Using a data set of CID and ETcaD spectral pairs, we find that at a 1% false discovery rate, the overlap in peptide identifications by CID and ETD is 83%, which is significantly higher than that obtained using either stand-alone Mascot (69%) or OMSSA (39%). We conclude that Mascot Percolator is a highly sensitive and accurate post-search algorithm for peptide identification and allows direct comparison of peptide identifications using multiple alternative fragmentation techniques.
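The q-value threshold mentioned in this abstract can be illustrated with a small sketch. It covers only the q-value step from target and decoy scores and does not reproduce Percolator's semi-supervised learning; the function name and the input layout (score, is_decoy) are hypothetical.

```python
def q_values(psms):
    """Compute q-values for target PSMs from a target/decoy score list.

    psms: list of (score, is_decoy) tuples; higher scores are better.
    Returns (score, q_value) for target PSMs only, best score first.
    The FDR at each rank is estimated as decoys / targets seen so far;
    the q-value is the minimum FDR at that rank or any lower-scoring rank.
    """
    ranked = sorted(psms, key=lambda item: item[0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # Backward pass: enforce that q-values never increase with better scores.
    q = [0.0] * len(ranked)
    running_min = 1.0
    for i in range(len(ranked) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        q[i] = running_min

    return [(score, q[i]) for i, (score, is_decoy) in enumerate(ranked)
            if not is_decoy]
```

Keeping only the PSMs with a q-value of at most 0.01 corresponds to the 1% false discovery rate threshold quoted above.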

6.
In the analysis of proteins in complex samples, pre-fractionation is imperative to obtain the necessary depth in the number of reliable protein identifications by mass spectrometry. Here we explore isoelectric focusing of peptides (peptide IEF) as an effective fractionation step that at the same time provides the added possibility to eliminate spurious peptide identifications by filtering for pI. Peptide IEF in IPG strips is fast and sharply confines peptides to their pI. We have systematically evaluated the contribution of pI filtering and accurate mass measurements to the total number of protein identifications in a complex protein mixture (Drosophila nuclear extract). At the same time, by varying Mascot identification cutoff scores, we have monitored the false positive rate among these identifications by searching reverse protein databases. In mass spectrometric analyses at low mass accuracy using an LTQ ion trap, false positive rates can be minimized by filtering out peptides that do not focus at their expected pI. Analyses using an LTQ-FT mass spectrometer deliver low false positive rates by themselves owing to the high mass accuracy. In a direct comparison of peptide IEF with SDS-PAGE as a pre-fractionation step, IEF delivered 25% and 43% more proteins when identified using FT-MS and LTQ-MS, respectively. Cumulatively, 2190 nonredundant proteins were identified in the Drosophila nuclear extract at a false positive rate of 0.5%. Of these, 1751 proteins (80%) were identified after peptide IEF and FT-MS alone. Overall, we show that peptide IEF increases the confidence level of protein identifications and is more sensitive than SDS-PAGE.
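The pI filtering step described above might look like the following sketch. It assumes that a predicted pI is already available for each peptide and that each IPG fraction has a known nominal pI; the dictionary keys, the function name, and the 0.5 pH-unit tolerance are hypothetical and not taken from the abstract.

```python
def filter_by_pi(identifications, tolerance=0.5):
    """Keep identifications that focus near their expected pI.

    identifications: iterable of dicts with the (hypothetical) keys
        'peptide'      - peptide sequence
        'predicted_pi' - pI predicted from the sequence
        'fraction_pi'  - nominal pI of the IPG fraction the peptide came from
    tolerance: maximum allowed |predicted - observed| in pH units
        (an assumed value, to be tuned per experiment).
    """
    return [
        ident for ident in identifications
        if abs(ident["predicted_pi"] - ident["fraction_pi"]) <= tolerance
    ]


# Hypothetical identifications, for illustration only.
hits = [
    {"peptide": "PEPTIDEK", "predicted_pi": 4.2, "fraction_pi": 4.3},
    {"peptide": "LSSPATLK", "predicted_pi": 9.1, "fraction_pi": 4.3},
]
kept = filter_by_pi(hits)
```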

7.
Identification of proteins by MS/MS is performed by matching experimental mass spectra against calculated spectra of all possible peptides in a protein data base. The search engine assigns each spectrum a score indicating how well the experimental data complies with the expected one; a higher score means increased confidence in the identification. One problem is the false-positive identifications, which arise from incomplete data as well as from the presence of misleading ions in experimental mass spectra due to gas-phase reactions, stray ions, contaminants, and electronic noise. We employed a novel technique of reduction of false positives that is based on a combined use of orthogonal fragmentation techniques electron capture dissociation (ECD) and collisionally activated dissociation (CAD). Since ECD and CAD exhibit many complementary properties, their combined use greatly increased the analysis specificity, which was further strengthened by the high mass accuracy (approximately 1 ppm) afforded by Fourier transform mass spectrometry. The utility of this approach is demonstrated on a whole cell lysate from Escherichia coli. Analysis was made using the data-dependent acquisition mode. Extraction of complementary sequence information was performed prior to data base search using in-house written software. Only masses involved in complementary pairs in the MS/MS spectrum from the same or orthogonal fragmentation techniques were submitted to the data base search. ECD/CAD identified twice as many proteins at a fixed statistically significant confidence level with on average a 64% higher Mascot score. The confidence in protein identification was hereby increased by more than 1 order of magnitude. The combined ECD/CAD searches were on average 20% faster than CAD-only searches. A specially developed test with scrambled MS/MS data revealed that the amount of false-positive identifications was dramatically reduced by the combined use of CAD and ECD.

8.
MOTIVATION: Statistical evaluation of the confidence of peptide and protein identifications made by tandem mass spectrometry is a critical component for appropriately interpreting the experimental data and conducting downstream analysis. Although many approaches have been developed to assign confidence measures from different perspectives, a unified statistical framework that integrates the uncertainty of peptides and proteins is still missing. RESULTS: We developed a hierarchical statistical model (HSM) that jointly models the uncertainty of the identified peptides and proteins and can be applied to any scoring system. With data sets of a standard mixture and the yeast proteome, we demonstrate that the HSM offers a reliable or at least conservative false discovery rate (FDR) estimate for peptide and protein identifications. The probability measure of HSM also offers a powerful discriminating score for peptide identification. AVAILABILITY: The algorithm is available upon request from the authors.

9.
Proteomic discovery platforms generate both peptide expression information and protein identification information. Peptide expression data are used to determine which peptides are differentially expressed between study cohorts, and then these peptides are targeted for protein identification. In this paper, we demonstrate that peptide expression information is also a powerful tool for enhancing confidence in protein identification results. Specifically, we evaluate the following hypothesis: tryptic peptides originating from the same protein have similar expression profiles across samples in the discovery study. Evidence supporting this hypothesis is provided. This hypothesis is integrated into a protein identification tool, PIPER (Protein Identification and Peptide Expression Resolver), that reduces erroneous protein identifications below 5%. PIPER's utility is illustrated by application to a 72-sample biomarker discovery study where it is demonstrated that false positive protein identifications can be reduced below 5%. Consequently, it is recommended that PIPER methodology be incorporated into proteomic studies where both protein expression and identification data are collected.

10.

Background  

Protein identification using mass spectrometry is an important tool in many areas of the life sciences, and in proteomics research in particular. Increasing the number of proteins correctly identified is dependent on the ability to include new knowledge about the mass spectrometry fragmentation process into computational algorithms designed to separate true matches of peptides to unidentified mass spectra from spurious matches. This discrimination is achieved by computing a function of the various features of the potential match between the observed and theoretical spectra to give a numerical approximation of their similarity. It is these underlying "metrics" that determine the ability of a protein identification package to maximise correct identifications while limiting false discovery rates. There is currently no software available specifically for the simple implementation and analysis of arbitrary novel metrics for peptide matching and for the exploration of fragmentation patterns for a given dataset.

11.
Li N, Wu S, Zhang C, Chang C, Zhang J, Ma J, Li L, Qian X, Xu P, Zhu Y, He F. Proteomics 2012, 12(11): 1720-1725
In this study, we present a quality control tool named PepDistiller to facilitate the validation of MASCOT search results. By including the number of tryptic termini and integrating a refined false discovery rate (FDR) calculation method, we demonstrate the improved sensitivity of peptide identifications obtained from semitryptic search results. Based on the analysis of a complex data set, approximately 7% more peptide identifications were obtained using PepDistiller than using MASCOT Percolator. Moreover, the refined method generated lower FDR estimations than the percentage of incorrect target (PIT) fixed method applied in Percolator. Using a standard data set, we further demonstrated the increased accuracy of the refined FDR estimations relative to the PIT-fixed FDR estimations. PepDistiller is fast and convenient to use, and is freely available for academic access. The software can be downloaded from http://www.bprc.ac.cn/pepdistiller.

12.
13.
14.
The development of liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) has made it possible to characterize phosphopeptides in an increasingly large-scale and high-throughput fashion. However, extracting confident phosphopeptide identifications from the resulting large data sets in a similar high-throughput fashion remains difficult, as does rigorously estimating the false discovery rate (FDR) of a set of phosphopeptide identifications. This article describes a data analysis pipeline designed to address these issues. The first step is to reanalyze phosphopeptide identifications that contain ambiguous assignments for the incorporated phosphate(s) to determine the most likely arrangement of the phosphate(s). The next step is to employ an expectation maximization algorithm to estimate the joint distribution of the peptide scores. A linear discriminant analysis is then performed to determine how to optimally combine peptide scores (in this case, from SEQUEST) into a discriminant score that possesses the maximum discriminating power. Based on this discriminant score, the p- and q-values for each phosphopeptide identification are calculated, and the phosphopeptide identification FDR is then estimated. This data analysis approach was applied to data from a study of irradiated human skin fibroblasts to provide a robust estimate of FDR for phosphopeptides. The Phosphopeptide FDR Estimator software is freely available for download at http://ncrr.pnl.gov/software/.

15.
Researchers have several options when designing proteomics experiments. Primary among these are choices of experimental method, instrumentation and spectral interpretation software. To evaluate these choices on a proteome scale, we compared triplicate measurements of the yeast proteome by liquid chromatography tandem mass spectrometry (LC-MS/MS) using linear ion trap (LTQ) and hybrid quadrupole time-of-flight (QqTOF; QSTAR) mass spectrometers. Acquired MS/MS spectra were interpreted with Mascot and SEQUEST algorithms with and without the requirement that all returned peptides be tryptic. Using a composite target decoy database strategy, we selected scoring criteria yielding 1% estimated false positive identifications at maximum sensitivity for all data sets, allowing reasonable comparisons between them. These comparisons indicate that Mascot and SEQUEST yield similar results for LTQ-acquired spectra but less so for QSTAR spectra. Furthermore, low reproducibility between replicate data acquisitions made on one or both instrument platforms can be exploited to increase sensitivity and confidence in large-scale protein identifications.

16.
Tandem mass spectrometry is commonly used to identify peptides, typically by comparing their product ion spectra with those predicted from a protein sequence database and scoring these matches. The most reported quality metric for a set of peptide identifications is the false discovery rate (FDR), the fraction of expected false identifications in the set. This metric has so far only been used for completely sequenced organisms or known protein mixtures. We have investigated whether FDR estimations are also applicable in the case of partially sequenced organisms, where many high-quality spectra fail to identify the correct peptides because the latter are not present in the searched sequence database. Using real data from human plasma and simulated partial sequence databases derived from two complete human sequence databases with different levels of redundancy, we could demonstrate that the mixture model approach in PeptideProphet is robust for partial databases, particularly if used in combination with decoy sequences. We therefore recommend using this method when estimating the FDR and reporting peptide identifications from incompletely sequenced organisms.

17.
Proteome identification using peptide-centric proteomics techniques is a routinely used analysis technique. One of the most powerful and popular methods for the identification of peptides from MS/MS spectra is protein database matching using search engines. Significance thresholding through false discovery rate (FDR) estimation by target/decoy searches is used to ensure the retention of predominantly confident assignments of MS/MS spectra to peptides. However, shortcomings have become apparent when such decoy searches are used to estimate the FDR. To study these shortcomings, we here introduce a novel kind of decoy database that contains isobaric mutated versions of the peptides that were identified in the original search. Because of the supervised way in which the entrapment sequences are generated, we call this a directed decoy database. Since the peptides found in our directed decoy database are thus specifically designed to look quite similar to the forward identifications, the limitations of the existing search algorithms in making correct calls in such strongly confusing situations can be analyzed. Interestingly, for the vast majority of confidently identified peptide identifications, a directed decoy peptide-to-spectrum match can be found that has a better or equal match score than the forward match score, highlighting an important issue in the interpretation of peptide identifications in present-day high-throughput proteomics.

18.
Proteins do not carry out their functions alone. Instead, they often act by participating in macromolecular complexes and play different functional roles depending on the other members of the complex. It is therefore interesting to identify co-complex relationships. Although protein complexes can be identified in a high-throughput manner by experimental technologies such as affinity purification coupled with mass spectrometry (APMS), these large-scale datasets often suffer from high false positive and false negative rates. Here, we present a computational method that predicts co-complexed protein pair (CCPP) relationships using kernel methods from heterogeneous data sources. We show that a diffusion kernel based on random walks on the full network topology yields good performance in predicting CCPPs from protein interaction networks. In the setting of direct ranking, a diffusion kernel performs much better than the mutual clustering coefficient. In the setting of SVM classifiers, a diffusion kernel performs much better than a linear kernel. We also show that combination of complementary information improves the performance of our CCPP recognizer. A summation of three diffusion kernels based on two-hybrid, APMS, and genetic interaction networks and three sequence kernels achieves better performance than the sequence kernels or diffusion kernels alone. Inclusion of additional features achieves a still better ROC(50) of 0.937. Assuming a negative-to-positive ratio of 600:1, the final classifier achieves 89.3% coverage at an estimated false discovery rate of 10%. Finally, we applied our prediction method to two recently described APMS datasets. We find that our predicted positives are highly enriched with CCPPs that are identified by both datasets, suggesting that our method successfully identifies true CCPPs. An SVM classifier trained from heterogeneous data sources provides accurate predictions of CCPPs in yeast. This computational method thereby provides an inexpensive method for identifying protein complexes that extends and complements high-throughput experimental data.
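The dependence of the estimated false discovery rate on the assumed negative-to-positive ratio can be made explicit with a little arithmetic. The sketch below assumes that "coverage" means the true positive rate (sensitivity); that reading, and both function names, are assumptions for illustration rather than details given in the abstract.

```python
def fdr_from_rates(tpr, fpr, neg_to_pos_ratio):
    """FDR = FP / (FP + TP), per one positive and `neg_to_pos_ratio` negatives."""
    false_pos = fpr * neg_to_pos_ratio
    true_pos = tpr
    if false_pos + true_pos == 0:
        return 0.0
    return false_pos / (false_pos + true_pos)


def fpr_for_target_fdr(tpr, fdr, neg_to_pos_ratio):
    """False positive rate a classifier must reach to hit a target FDR."""
    return fdr * tpr / (neg_to_pos_ratio * (1.0 - fdr))


# Figures quoted in the abstract: 600:1 ratio, 89.3% coverage, 10% FDR.
implied_fpr = fpr_for_target_fdr(tpr=0.893, fdr=0.10, neg_to_pos_ratio=600)
```

With so many more negatives than positives, even a very small false positive rate contributes a large share of the predicted positives, which is why the ratio has to be stated alongside the FDR.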

19.
MS-based proteomics generates rapidly increasing amounts of precise and quantitative information. Analysis of individual proteomic experiments has made great strides, but the crucial ability to compare and store information across different proteome measurements still presents many challenges. For example, it has been difficult to avoid contamination of databases with low quality peptide identifications, to control for the inflation in false positive identifications when combining data sets, and to integrate quantitative data. Although, for example, the contamination with low quality identifications has been addressed by joint analysis of deposited raw data in some public repositories, we reasoned that there should be a role for a database specifically designed for high resolution and quantitative data. Here we describe a novel database termed MaxQB that stores and displays collections of large proteomics projects and allows joint analysis and comparison. We demonstrate the analysis tools of MaxQB using proteome data of 11 different human cell lines and 28 mouse tissues. The database-wide false discovery rate is controlled by adjusting the project specific cutoff scores for the combined data sets. The 11 cell line proteomes together identify proteins expressed from more than half of all human genes. For each protein of interest, expression levels estimated by label-free quantification can be visualized across the cell lines. Similarly, the expression rank order and estimated amount of each protein within each proteome are plotted. We used MaxQB to calculate the signal reproducibility of the detected peptides for the same proteins across different proteomes. Spearman rank correlation between peptide intensity and detection probability of identified proteins was greater than 0.8 for 64% of the proteome, whereas a minority of proteins have negative correlation. This information can be used to pinpoint false protein identifications, independently of peptide database scores. The information contained in MaxQB, including high resolution fragment spectra, is accessible to the community via a user-friendly web interface at http://www.biochem.mpg.de/maxqb.
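The Spearman rank correlation used in this abstract can be computed as the Pearson correlation of the ranks. The following is a minimal standard-library sketch of that definition, not MaxQB code; the function names are hypothetical and ties receive average ranks.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

A protein whose peptide intensities and detection probabilities give a low or negative value would, following the abstract, be a candidate for closer inspection.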

20.
Large-scale phosphoproteomic analysis employing liquid chromatography-tandem mass spectrometry (LC-MS/MS) often requires a significant amount of manual manipulation of phosphopeptide datasets in the post-acquisition phase. To assist in this process, we have created software, PhosphoPIC (PhosphoPeptide Identification and Compilation), which can perform a variety of useful functions including automated selection and compilation of phosphopeptide identifications from multiple MS levels, estimation of dataset false discovery rate, and application of appropriate cross-correlation (XCorr) filters. In addition, the output files generated by this program are compatible with downstream phosphorylation site assignment using the Ascore algorithm, as well as phosphopeptide quantification via QUOIL. In this report, we utilized this software to analyze phosphoproteins from short-term vasopressin-treated rat kidney inner medullary collecting duct (IMCD). A total of 925 phosphopeptides representing 173 unique proteins were identified from membrane-enriched fractions of IMCD with a false discovery rate of 1.5%. Of these proteins, 106 were found only in the membrane-enriched fraction of IMCD cells and not in whole IMCD cell lysates. These identifications included a number of well-studied ion and solute transporters including ClC-1, LAT4, MCT2, NBC3, and NHE1, all of which contained novel phosphorylation sites. Using a label-free quantification approach, we identified phosphoproteins that changed in abundance with vasopressin exposure including aquaporin-2 (AQP2), Hnrpa3, IP3 receptor 3, and pur-beta.
