1.
Robust statistical validation of peptide identifications obtained by tandem mass spectrometry and sequence database searching is an important task in shotgun proteomics. PeptideProphet is a commonly used computational tool that computes confidence measures for peptide identifications. In this paper, we investigate several limitations of the PeptideProphet modeling approach, including the use of fixed coefficients in computing the discriminant search score and selection of the top scoring peptide assignment per spectrum only. To address these limitations, we describe an adaptive method in which a new discriminant function is learned from the data in an iterative fashion. We extend the modeling framework to go beyond the top scoring peptide assignment per spectrum. We also investigate the effect of clustering the spectra according to their spectrum quality score followed by cluster-specific mixture modeling. The analysis is carried out using data acquired from a mixture of purified proteins on four different types of mass spectrometers, as well as using a complex human serum data set. A special emphasis is placed on the analysis of data generated on high mass accuracy instruments.
2.
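Research progress in quality control methods for shotgun protein identification  Cited by: 1 (self-citations: 0, citations by others: 1)
The shotgun tandem mass spectrometry strategy for protein identification is widely used in proteomics research because of its high reliability and efficiency. In this approach, the protein mixture is digested directly, peptides serve as the unit of identification, and the proteins actually present in the sample are then inferred from them. Because inferring peptides from mass spectra carries a certain false positive rate, and because direct digestion of the protein mixture loses the linkage between peptides and their parent proteins, some of the identified proteins are inevitably unreliable. Quality control of protein identification is therefore extremely important in proteomics research. It comprises two main classes of methods. The first is protein assembly from peptides: the most widely used approach, and the one shown to be most effective, applies the parsimony principle, that is, explaining all identified peptides with the fewest proteins; existing methods can be divided into Boolean and probabilistic types. The second is reliability assessment of the identified proteins, including the confidence of individual protein identifications and calculation of the false positive rate of the overall set of protein identifications. The current trend in quality control methods for protein identification is to integrate the various kinds of prior information that can aid protein identification and build generally applicable probabilistic statistical models.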
3.
4.
A framework for intelligent data acquisition and real-time database searching for shotgun proteomics
Graumann J Scheltema RA Zhang Y Cox J Mann M 《Molecular & cellular proteomics : MCP》2012,11(3):M111.013185
In the analysis of complex peptide mixtures by MS-based proteomics, many more peptides elute at any given time than can be identified and quantified by the mass spectrometer. This makes it desirable to optimally allocate peptide sequencing and narrow mass range quantification events. In computer science, intelligent agents are frequently used to make autonomous decisions in complex environments. Here we develop and describe a framework for intelligent data acquisition and real-time database searching and showcase selected examples. The intelligent agent is implemented in the MaxQuant computational proteomics environment, termed MaxQuant Real-Time. It analyzes data as it is acquired on the mass spectrometer, constructs isotope patterns and SILAC pair information as well as controls MS and tandem MS events based on real-time and prior MS data or external knowledge. Re-implementing a top10 method in the intelligent agent yields similar performance to the data dependent methods running on the mass spectrometer itself. We demonstrate the capabilities of MaxQuant Real-Time by creating a real-time search engine capable of identifying peptides "on-the-fly" within 30 ms, well within the time constraints of a shotgun fragmentation "topN" method. The agent can focus sequencing events onto peptides of specific interest, such as those originating from a specific gene ontology (GO) term, or peptides that are likely modified versions of already identified peptides. Finally, we demonstrate enhanced quantification of SILAC pairs whose ratios were poorly defined in survey spectra. MaxQuant Real-Time is flexible and can be applied to a large number of scenarios that would benefit from intelligent, directed data acquisition. Our framework should be especially useful for new instrument types, such as the quadrupole-Orbitrap, that are currently becoming available.
5.
M. L. Pridatchenko I. A. Tarasova V. Guryca A. S. Kononikhin C. Adams D. A. Tolmachev A. Yu. Agapov V. V. Evreinov I. A. Popov E. N. Nikolaev R. A. Zubarev A. V. Gorshkov C. D. Masselon M. V. Gorshkov 《Biochemistry. Biokhimiia》2009,74(11):1195-1202
Generation of a complex proteome database requires use of powerful analytical methods capable of following rapid changes in the proteome due to changing physiological and pathological states of the organism under study. One of the promising technologies in this regard is the use of so-called Accurate Mass and Time (AMT) tag peptide databases. Generation of an AMT database for a complex proteome requires combined efforts by many research groups and laboratories, but the chromatography data resulting from these efforts are tied to the particular experimental conditions and, in general, are not transferable from one platform to another. In this work, we consider an approach to solve this problem that is based on the generation of a universal scale for the chromatography data using a multiple-point normalization method. The method follows from the concept of linear correlation between chromatography data obtained over a wide range of separation parameters. The method is further tested for tryptic peptide mixtures with experimental data collected from mutual studies by different independent research groups using different separation protocols and mass spectrometry data processing tools.
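As a rough illustration of what a multiple-point linear normalization of retention times might look like, the sketch below fits a straight line through the retention times of reference peptides measured on one platform and on a common scale, then applies that line to other peptides. This is only an assumption-laden sketch: the function names, the use of ordinary least squares, and the example numbers are hypothetical and are not taken from the abstract.

```python
def fit_linear_map(platform_times, reference_times):
    """Least-squares fit of reference = slope * platform + intercept.

    Both arguments are equal-length lists of retention times for the same
    reference peptides (hypothetical calibration points).
    """
    n = len(platform_times)
    if n < 2 or n != len(reference_times):
        raise ValueError("need at least two paired calibration points")
    mean_x = sum(platform_times) / n
    mean_y = sum(reference_times) / n
    sxx = sum((x - mean_x) ** 2 for x in platform_times)
    if sxx == 0:
        raise ValueError("platform retention times must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(platform_times, reference_times))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def normalize(time, slope, intercept):
    """Map one platform-specific retention time onto the common scale."""
    return slope * time + intercept


# Hypothetical calibration peptides: times on one platform vs. the common scale.
slope, intercept = fit_linear_map([10.0, 20.0, 30.0], [25.0, 45.0, 65.0])
normalized = normalize(15.0, slope, intercept)
```

Using several calibration points rather than two makes the fit less sensitive to a single poorly measured reference peptide, which is presumably the motivation for a multiple-point scheme.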
6.
Booth JG Eilertson KE Olinares PD Yu H 《Molecular & cellular proteomics : MCP》2011,10(8):M110.007203
Recent developments in mass-spectrometry-based shotgun proteomics, especially methods using spectral counting, have enabled large-scale identification and differential profiling of complex proteomes. Most such proteomic studies are interested in identifying proteins, the abundance of which is different under various conditions. Several quantitative methods have recently been proposed and implemented for this purpose. Building on some techniques that are now widely accepted in the microarray literature, we developed and implemented a new method using a Bayesian model to calculate posterior probabilities of differential abundance for thousands of proteins in a given experiment simultaneously. Our Bayesian model is shown to deliver uniformly superior performance when compared with several existing methods.
7.
Here, we describe the novel use of a volatile surfactant, perfluorooctanoic acid (PFOA), for shotgun proteomics. PFOA was found to solubilize membrane proteins as effectively as sodium dodecyl sulfate (SDS). PFOA concentrations up to 0.5% (w/v) did not significantly inhibit trypsin activity. The unique features of PFOA allowed us to develop a single-tube shotgun proteomics method that used all volatile chemicals that could easily be removed by evaporation prior to mass spectrometry analysis. The experimental procedures involved: 1) extraction of proteins in 2% PFOA; 2) reduction of cystine residues with triethyl phosphine and their S-alkylation with iodoethanol; 3) trypsin digestion of proteins in 0.5% PFOA; 4) removal of PFOA by evaporation; and 5) LC-MS/MS analysis of the resulting peptides. The general applicability of the method was demonstrated with the membrane preparation of photoreceptor outer segments. We identified 75 proteins from 1 μg of the tryptic peptides in a single, 1-hour, LC-MS/MS run. About 67% of the proteins identified were classified as membrane proteins. We also demonstrate that a proteolytic (18)O labeling procedure can be incorporated after the PFOA removal step for quantitative proteomic experiments. The present method does not require sample clean-up devices such as solid-phase extractions and membrane filters, so no proteins/peptides are lost in any experimental steps. Thus, this single-tube shotgun proteomics method overcomes the major drawbacks of surfactant use in proteomic experiments.
8.
Li N Wu S Zhang C Chang C Zhang J Ma J Li L Qian X Xu P Zhu Y He F 《Proteomics》2012,12(11):1720-1725
In this study, we presented a quality control tool named PepDistiller to facilitate the validation of MASCOT search results. By including the number of tryptic termini, and integrating a refined false discovery rate (FDR) calculation method, we demonstrated the improved sensitivity of peptide identifications obtained from semitryptic search results. Based on the analysis of a complex data set, approximately 7% more peptide identifications were obtained using PepDistiller than using MASCOT Percolator. Moreover, the refined method generated lower FDR estimations than the percentage of incorrect target (PIT) fixed method applied in Percolator. Using a standard data set, we further demonstrated the increased accuracy of the refined FDR estimations relative to the PIT-fixed FDR estimations. PepDistiller is fast and convenient to use, and is freely available for academic access. The software can be downloaded from http://www.bprc.ac.cn/pepdistiller.
9.
Background
In proteomic analysis, MS/MS spectra acquired by a mass spectrometer are assigned to peptides by database searching algorithms such as SEQUEST. The assignments of peptides to MS/MS spectra by the SEQUEST searching algorithm are characterized by several scores, including Xcorr, ΔCn, Sp, Rsp, and matched ion count. Filtering criteria based on several of these scores are used to separate correct identifications from random assignments. However, these filtering criteria have not been well optimized to date.
10.
Paulo C Carvalho Juliana SG Fischer Emily I Chen John R YatesIII Valmir C Barbosa 《BMC bioinformatics》2008,9(1):316
Background
A goal of proteomics is to distinguish between states of a biological system by identifying protein expression differences. Liu et al. demonstrated a method to perform semi-relative protein quantitation in shotgun proteomics data by correlating the number of tandem mass spectra obtained for each protein, or "spectral count", with its abundance in a mixture; however, two issues have remained open: how to normalize spectral counting data and how to efficiently pinpoint differences between profiles. Moreover, Chen et al. recently showed how to increase the number of identified proteins in shotgun proteomics by analyzing samples with different MS-compatible detergents while performing proteolytic digestion. The latter introduced new challenges as seen from the data analysis perspective, since replicate readings are not acquired.
11.
Li J Su Z Ma ZQ Slebos RJ Halvey P Tabb DL Liebler DC Pao W Zhang B 《Molecular & cellular proteomics : MCP》2011,10(5):M110.006536
Shotgun proteomics data analysis usually relies on database search. However, commonly used protein sequence databases do not contain information on protein variants and thus prevent variant peptides and proteins from being identified. Including known coding variations into protein sequence databases could help alleviate this problem. Based on our recently published human Cancer Proteome Variation Database, we have created a protein sequence database that comprehensively annotates thousands of cancer-related coding variants collected in the Cancer Proteome Variation Database as well as noncancer-specific ones from the Single Nucleotide Polymorphism Database (dbSNP). Using this database, we then developed a data analysis workflow for variant peptide identification in shotgun proteomics. The high risk of false positive variant identifications was addressed by a modified false discovery rate estimation method. Analysis of colorectal cancer cell lines SW480, RKO, and HCT-116 revealed a total of 81 peptides that contain either noncancer-specific or cancer-related variations. Twenty-three out of 26 variants randomly selected from the 81 were confirmed by genomic sequencing. We further applied the workflow on data sets from three individual colorectal tumor specimens. A total of 204 distinct variant peptides were detected, and five carried known cancer-related mutations. Each individual showed a specific pattern of cancer-related mutations, suggesting potential use of this type of information for personalized medicine. Compatibility of the workflow has been tested with four popular database search engines including Sequest, Mascot, X!Tandem, and MyriMatch. In summary, we have developed a workflow that effectively uses existing genomic data to enable variant peptide detection in proteomics.
12.
13.
The target-decoy approach to estimating and controlling false discovery rate (FDR) has become a de facto standard in shotgun proteomics, and it has been applied at both the peptide-to-spectrum match (PSM) and protein levels. Current bioinformatics methods control either the PSM- or the protein-level FDR, but not both. In order to obtain the most reliable information from their data, users must employ one method when the number of tandem mass spectra exceeds the number of proteins in the database and another method when the reverse is true. Here we propose a simple variation of the standard target-decoy strategy that estimates and controls PSM and protein FDRs simultaneously, regardless of the relative numbers of spectra and proteins. We demonstrate that even if the final goal is a list of PSMs with a fixed low FDR and not a list of protein identifications, the proposed two-dimensional strategy offers advantages over a pure PSM-level strategy.
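For readers unfamiliar with the standard target-decoy estimate that this abstract builds on, a minimal sketch follows. It shows only the plain one-dimensional PSM-level estimate (decoy hits divided by target hits above a score threshold), not the two-dimensional strategy the abstract proposes. The function names, the input layout (score, is_decoy pairs), and the example scores are hypothetical.

```python
def target_decoy_fdr(psms, threshold):
    """Estimate FDR among target PSMs scoring at or above `threshold`.

    `psms` is a list of (score, is_decoy) pairs. The estimate is the number
    of decoy hits divided by the number of target hits passing the threshold.
    """
    targets = sum(1 for score, is_decoy in psms
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms
                 if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


def score_cutoff_for_fdr(psms, max_fdr):
    """Return the lowest observed score whose estimated FDR is <= max_fdr.

    Returns None if no threshold satisfies the requested FDR.
    """
    best = None
    for threshold in sorted({score for score, _ in psms}, reverse=True):
        if target_decoy_fdr(psms, threshold) <= max_fdr:
            best = threshold
    return best


# Hypothetical search results: (score, is_decoy).
example_psms = [(10, False), (9, False), (8, True),
                (7, False), (6, False), (5, True)]
cutoff = score_cutoff_for_fdr(example_psms, 0.25)
```

Scanning every observed score, rather than stopping at the first failure, matters because the estimated FDR is not guaranteed to be monotonic in the threshold.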
14.
The analysis by liquid chromatography coupled to tandem mass spectrometry of complex peptide mixtures, generated by proteolysis of protein samples, is the main proteomics method used today. The approach is based on the assumption that each protein present in a sample reproducibly and predictably generates a relatively small number of peptides that can be identified by mass spectrometry. In this study this assumption was examined by a targeted peptide sequencing strategy using inclusion lists to trigger peptide fragmentation attempts. It was found that the number of peptides observed from a single protein is at least one order of magnitude greater than previously assumed. This unexpected complexity of proteomics samples implies substantial technical challenges, explains some perplexing results in the proteomics literature, and prompts the need for developing alternative experimental strategies for the rapid and comprehensive analysis of proteomes.
15.
An optimization and comparison of trypsin digestion strategies for peptide/protein identifications by microLC-MS/MS with or without MS-compatible detergents in mixed organic-aqueous and aqueous systems was carried out in this study. We determine that adding MS-compatible detergents to proteolytic digestion protocols dramatically increases peptide and protein identifications in complex protein mixtures by shotgun proteomics. Protein solubilization and proteolytic efficiency are increased by including MS-compatible detergents in trypsin digestion buffers. A modified trypsin digestion protocol incorporating the MS-compatible detergents consistently identifies over 300 proteins from 5 μg of pancreatic cell lysates and generates a greater number of peptide identifications than trypsin digestion with urea when using LC-MS/MS. Furthermore, over 700 proteins were identified by merging protein identifications from trypsin digestion with three different MS-compatible detergents. We also observe that the use of mixed aqueous and organic solvent systems can influence protein identifications in combination with different MS-compatible detergents. Peptide mixtures generated from different MS-compatible detergents and buffer combinations show a significant difference in hydrophobicity. Our results show that protein digestion schemes incorporating MS-compatible detergents generate quantitative as well as qualitative changes in observed peptide identifications, which lead to increased protein identifications overall and potentially increased identification of low-abundance proteins.
16.
Forshed J Johansson HJ Pernemalm M Branca RM Sandberg A Lehtiö J 《Molecular & cellular proteomics : MCP》2011,10(10):M111.010264
We present a tool to improve quantitative accuracy and precision in mass spectrometry based on shotgun proteomics: protein quantification by peptide quality control, PQPQ. The method is based on the assumption that the quantitative pattern of peptides derived from one protein will correlate over several samples. Dissonant patterns arise either from outlier peptides or because of the presence of different protein species. By correlation analysis, protein quantification by peptide quality control identifies and excludes outliers and detects the existence of different protein species. Alternative protein species are then quantified separately. By validating the algorithm on seven data sets related to different cancer studies we show that data processing by protein quantification by peptide quality control improves the information output from shotgun proteomics. Data from two labeling procedures and three different instrumental platforms were included in the evaluation. With this unique method using both peptide sequence data and quantitative data we can improve the quantitative accuracy and precision on the protein level and detect different protein species.
17.
Koskinen VR Emery PA Creasy DM Cottrell JS 《Molecular & cellular proteomics : MCP》2011,10(6):M110.003822
A new result report for Mascot search results is described. A greedy set cover algorithm is used to create a minimal set of proteins, which is then grouped into families on the basis of shared peptide matches. Protein families with multiple members are represented by dendrograms, generated by hierarchical clustering using the score of the nonshared peptide matches as a distance metric. The peptide matches to the proteins in a family can be compared side by side to assess the experimental evidence for each protein. If the evidence for a particular family member is considered inadequate, the dendrogram can be cut to reduce the number of distinct family members.
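The greedy set cover step mentioned here can be sketched generically as follows: repeatedly pick the protein that explains the most still-unexplained peptides until every peptide is covered. This is a textbook greedy set cover under assumed inputs, not the Mascot implementation; the function name, the dictionary layout, and the alphabetical tie-breaking rule are hypothetical, and the family grouping and clustering steps are not shown.

```python
def greedy_protein_cover(protein_to_peptides):
    """Pick a small set of proteins that together explain all peptides.

    `protein_to_peptides` maps a protein identifier to the set of peptide
    sequences matched to it. Ties are broken alphabetically by protein
    identifier so the result is deterministic.
    """
    uncovered = set().union(*protein_to_peptides.values())
    chosen = []
    while uncovered:
        # max() returns the first maximal item, so sorting fixes tie-breaks.
        best = max(sorted(protein_to_peptides),
                   key=lambda p: len(protein_to_peptides[p] & uncovered))
        chosen.append(best)
        uncovered -= protein_to_peptides[best]
    return chosen


# Hypothetical peptide-to-protein evidence.
evidence = {
    "P1": {"a", "b", "c"},
    "P2": {"c", "d"},
    "P3": {"d"},
    "P4": {"a"},
}
minimal_proteins = greedy_protein_cover(evidence)
```

Greedy selection does not guarantee the smallest possible protein set, but it is a common practical approximation because exact minimum set cover is NP-hard.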
18.
Computational analysis of shotgun proteomics data  Cited by: 2 (self-citations: 0, citations by others: 2)
MacCoss MJ 《Current opinion in chemical biology》2005,9(1):88-94
Proteomics technology is progressing at an incredible rate. The latest generation of tandem mass spectrometers can now acquire tens of thousands of fragmentation spectra in a matter of hours. Furthermore, quantitative proteomics methods have been developed that incorporate a stable isotope-labeled internal standard for every peptide within a complex protein mixture for the measurement of relative protein abundances. These developments have opened the doors for 'shotgun' proteomics, yet have also placed a burden on the computational approaches that manage the data. With each new method that is developed, the quantity of data that can be derived from a single experiment increases. To deal with this increase, new computational approaches are being developed to manage the data and assess false positives. This review discusses current approaches for analyzing proteomics data by mass spectrometry and identifies present computational limitations and bottlenecks.
19.
Geromanos SJ Hughes C Golick D Ciavarini S Gorenstein MV Richardson K Hoyes JB Vissers JP Langridge JI 《Proteomics》2011,11(6):1189-1211
The computational simulation of complete proteomic data sets and their utility to validate detection and interpretation algorithms, to aid in the design of experiments and to assess protein and peptide false discovery rates is presented. The simulation software has been developed for emulating data originating from data-dependent and data-independent LC-MS workflows. Data from all types of commonly used hybrid mass spectrometers can be simulated. The algorithms are based on empirically derived physicochemical liquid and gas phase models for proteins and peptides. Sample composition in terms of complexity and dynamic range, as well as chromatographic, experimental and MS conditions, can be controlled and adjusted independently. The effect of on-column amounts, gradient length, mass resolution and ion mobility on search specificity will be demonstrated using tryptic peptides from human and yeast cellular lysates simulated over five orders of magnitude in dynamic range. Initial justification of the simulated data sets is achieved by comparing and contrasting the in silico simulated data to experimentally derived results from a 48 protein mixture, spanning a similar dynamic range of five orders of magnitude. Additionally, experimental data from replicate and dilution series experiments will be utilized to determine error rates at the peptide and protein level with respect to mass, area, retention and drift time. The data presented reveal a high degree of similarity at the ion detection, peptide and protein level when analyzed under similar conditions.
20.
Mass spectrometry has become a key technology for modern large-scale protein sequencing. Tandem mass spectrometry, the process of peptide ion dissociation followed by mass-to-charge ratio (m/z) analysis, is the critical component for peptide identification. Recent advances in mass spectrometry now permit two discrete, and complementary, types of peptide ion fragmentation: collision-activated dissociation (CAD) and electron transfer dissociation (ETD) on a single instrument. To exploit this complementarity and increase sequencing success rates, we designed and embedded a data-dependent decision tree algorithm (DT) to make unsupervised, real-time decisions of which fragmentation method to use based on precursor charge and m/z. Applying the DT to large-scale proteome analyses of Saccharomyces cerevisiae and human embryonic stem cells, we identified 53,055 peptides in total, which was greater than by using CAD (38,293) or ETD (39,507) alone. In addition, the DT method also identified 7,422 phosphopeptides, compared to either 2,801 (CAD) or 5,874 (ETD) phosphopeptides.