Similar articles (20 found)
1.
Computational analysis of shotgun proteomics data   Cited by: 2 (self-citations: 0, citations by others: 2)
Proteomics technology is progressing at an incredible rate. The latest generation of tandem mass spectrometers can now acquire tens of thousands of fragmentation spectra in a matter of hours. Furthermore, quantitative proteomics methods have been developed that incorporate a stable isotope-labeled internal standard for every peptide within a complex protein mixture for the measurement of relative protein abundances. These developments have opened the doors for 'shotgun' proteomics, yet have also placed a burden on the computational approaches that manage the data. With each new method that is developed, the quantity of data that can be derived from a single experiment increases. To deal with this increase, new computational approaches are being developed to manage the data and assess false positives. This review discusses current approaches for analyzing proteomics data by mass spectrometry and identifies present computational limitations and bottlenecks.

2.
3.
Spectral counting has become a commonly used approach for measuring protein abundance in label-free shotgun proteomics. At the same time, the development of data analysis methods has lagged behind. Currently most studies utilizing spectral counts rely on simple data transforms and post hoc corrections of conventional signal-to-noise ratio statistics. However, these adjustments can neither handle the bias toward high-abundance proteins nor deal with the drawbacks due to the limited number of replicates. We present a novel statistical framework (QSpec) for the significance analysis of differential expression with extensions to a variety of experimental design factors and adjustments for protein properties. Using synthetic and real experimental data sets, we show that the proposed method outperforms conventional statistical methods that search for differential expression for individual proteins. We illustrate the flexibility of the model by analyzing a data set with a complicated experimental design involving cellular localization and time course.

4.
5.
Here, we describe the novel use of a volatile surfactant, perfluorooctanoic acid (PFOA), for shotgun proteomics. PFOA was found to solubilize membrane proteins as effectively as sodium dodecyl sulfate (SDS). PFOA concentrations up to 0.5% (w/v) did not significantly inhibit trypsin activity. The unique features of PFOA allowed us to develop a single-tube shotgun proteomics method that used all volatile chemicals that could easily be removed by evaporation prior to mass spectrometry analysis. The experimental procedures involved: 1) extraction of proteins in 2% PFOA; 2) reduction of cystine residues with triethyl phosphine and their S-alkylation with iodoethanol; 3) trypsin digestion of proteins in 0.5% PFOA; 4) removal of PFOA by evaporation; and 5) LC-MS/MS analysis of the resulting peptides. The general applicability of the method was demonstrated with the membrane preparation of photoreceptor outer segments. We identified 75 proteins from 1 μg of the tryptic peptides in a single, 1-hour, LC-MS/MS run. About 67% of the proteins identified were classified as membrane proteins. We also demonstrate that a proteolytic ¹⁸O labeling procedure can be incorporated after the PFOA removal step for quantitative proteomic experiments. The present method does not require sample clean-up devices such as solid-phase extractions and membrane filters, so no proteins/peptides are lost in any experimental steps. Thus, this single-tube shotgun proteomics method overcomes the major drawbacks of surfactant use in proteomic experiments.

6.
7.
8.
Recent developments in mass-spectrometry-based shotgun proteomics, especially methods using spectral counting, have enabled large-scale identification and differential profiling of complex proteomes. Most such proteomic studies aim to identify proteins whose abundance differs between conditions. Several quantitative methods have recently been proposed and implemented for this purpose. Building on some techniques that are now widely accepted in the microarray literature, we developed and implemented a new method using a Bayesian model to calculate posterior probabilities of differential abundance for thousands of proteins in a given experiment simultaneously. Our Bayesian model is shown to deliver uniformly superior performance when compared with several existing methods.

9.
Beer I, Barnea E, Ziv T, Admon A. Proteomics, 2004, 4(4): 950-960
Tandem mass spectrometry (MS/MS), coupled with liquid chromatography (LC), is a powerful tool for the analysis and comparison of complex protein and peptide mixtures. However, the extremely large amounts of data that result from the process are very complex and difficult to analyze. We show how the clustering of similar spectra from multiple LC-MS/MS runs can help in data management and improve the analysis of complex peptide mixtures. The major effect of spectrum clustering is the reduction of the huge amounts of data to a manageable size. As a result, analysis time is shorter and more data can be stored for further analysis. Furthermore, spectrum quality improvement allows the identification of more peptides with greater confidence, the comparison of complex peptide mixtures is facilitated, and the entire proteomics project is presented in concise form. Pep-Miner is an advanced software tool that implements these clustering-based applications. It proved useful in several comparative proteomics projects involving lung cancer cells and various other cell types. In one of these projects, Pep-Miner reduced 517 000 spectra to 20 900 clusters and identified 2518 peptides derived from 830 proteins. Clustering and identification lasted less than two hours on an IBM Thinkpad T23 computer (laptop). Pep-Miner's unique properties make it a very useful tool for large-scale shotgun proteomics projects.
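The idea of collapsing similar spectra into clusters can be illustrated with a minimal sketch. This is not Pep-Miner's algorithm, which the abstract does not describe; it assumes a generic approach in which spectra are binned by m/z, compared by cosine similarity, and greedily assigned to the first sufficiently similar cluster. The function names, the 1.0 bin width, and the 0.8 similarity threshold are all hypothetical choices made for illustration.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Convert (m/z, intensity) peaks into a sparse vector keyed by m/z bin."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz / bin_width)] += intensity
    return dict(binned)


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_spectra(spectra, min_similarity=0.8):
    """Greedy single-pass clustering (an assumed scheme, not Pep-Miner's).

    Each spectrum joins the first cluster whose representative (its first
    member) is similar enough; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of spectrum indices.
    """
    clusters = []  # list of (representative_vector, member_indices)
    for idx, peaks in enumerate(spectra):
        vec = bin_spectrum(peaks)
        for representative, members in clusters:
            if cosine(representative, vec) >= min_similarity:
                members.append(idx)
                break
        else:
            clusters.append((vec, [idx]))
    return [members for _, members in clusters]


# Toy data: the first two spectra share fragment masses, the third does not.
spectra = [
    [(100.1, 10.0), (200.2, 50.0), (300.3, 20.0)],
    [(100.2, 12.0), (200.1, 48.0), (300.4, 22.0)],
    [(150.0, 30.0), (250.0, 30.0), (350.0, 5.0)],
]
print(cluster_spectra(spectra))
```

A production tool would also have to account for precursor mass, charge state, and noise peaks, none of which this sketch handles.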

10.
In the analysis of complex peptide mixtures by MS-based proteomics, many more peptides elute at any given time than can be identified and quantified by the mass spectrometer. This makes it desirable to optimally allocate peptide sequencing and narrow mass range quantification events. In computer science, intelligent agents are frequently used to make autonomous decisions in complex environments. Here we develop and describe a framework for intelligent data acquisition and real-time database searching and showcase selected examples. The intelligent agent is implemented in the MaxQuant computational proteomics environment, termed MaxQuant Real-Time. It analyzes data as it is acquired on the mass spectrometer, constructs isotope patterns and SILAC pair information as well as controls MS and tandem MS events based on real-time and prior MS data or external knowledge. Re-implementing a top10 method in the intelligent agent yields similar performance to the data dependent methods running on the mass spectrometer itself. We demonstrate the capabilities of MaxQuant Real-Time by creating a real-time search engine capable of identifying peptides "on-the-fly" within 30 ms, well within the time constraints of a shotgun fragmentation "topN" method. The agent can focus sequencing events onto peptides of specific interest, such as those originating from a specific gene ontology (GO) term, or peptides that are likely modified versions of already identified peptides. Finally, we demonstrate enhanced quantification of SILAC pairs whose ratios were poorly defined in survey spectra. MaxQuant Real-Time is flexible and can be applied to a large number of scenarios that would benefit from intelligent, directed data acquisition. Our framework should be especially useful for new instrument types, such as the quadrupole-Orbitrap, that are currently becoming available.
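The "topN" selection mentioned above can be sketched roughly as follows: from each survey scan, the N most intense precursors are picked for fragmentation. This is a simplified illustration, not the MaxQuant Real-Time implementation; the function name `select_top_n`, the exclusion set, and the m/z tolerance are hypothetical, and the exclusion step is an added assumption that the abstract does not mention.

```python
def select_top_n(survey_peaks, n=10, excluded=frozenset(), tolerance=0.01):
    """Return the m/z values of the n most intense precursors.

    survey_peaks: iterable of (m/z, intensity) pairs from one survey scan.
    excluded:     m/z values to skip, e.g. precursors already sequenced
                  (an assumed feature, added for illustration).
    tolerance:    absolute m/z window used to match against `excluded`.
    """
    candidates = [
        (mz, intensity)
        for mz, intensity in survey_peaks
        if not any(abs(mz - ex) <= tolerance for ex in excluded)
    ]
    # Most intense first; the slice keeps at most n precursors.
    candidates.sort(key=lambda peak: peak[1], reverse=True)
    return [mz for mz, _ in candidates[:n]]


survey = [(400.2, 1e5), (512.3, 5e5), (623.8, 2e5), (710.4, 8e4)]
print(select_top_n(survey, n=2))
print(select_top_n(survey, n=2, excluded={512.3}))
```

An intelligent agent of the kind described would replace the plain intensity ranking with a decision rule that draws on prior identifications or external knowledge.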

11.

Background  

A goal of proteomics is to distinguish between states of a biological system by identifying protein expression differences. Liu et al. demonstrated a method to perform semi-relative protein quantitation in shotgun proteomics data by correlating the number of tandem mass spectra obtained for each protein, or "spectral count", with its abundance in a mixture; however, two issues have remained open: how to normalize spectral counting data and how to efficiently pinpoint differences between profiles. Moreover, Chen et al. recently showed how to increase the number of identified proteins in shotgun proteomics by analyzing samples with different MS-compatible detergents while performing proteolytic digestion. The latter introduced new challenges as seen from the data analysis perspective, since replicate readings are not acquired.

12.
Orthogonal analysis of amino acid substitutions as a result of SNPs in existing proteomic datasets provides a critical foundation for the emerging field of population-based proteomics. Large-scale proteomics datasets, derived from shotgun tandem MS analysis of complex cellular protein mixtures, contain many unassigned spectra that may correspond to alternate alleles coded by SNPs. The purpose of this work was to identify tandem MS spectra in LC-MS/MS shotgun proteomics datasets that may represent coding nonsynonymous SNPs (nsSNP). To this end, we generated a tryptic peptide database created from allelic information found in NCBI's dbSNP. We searched this database with tandem MS spectra of tryptic peptides from DU4475 breast tumor cells that had been fractionated by pI in the first dimension and reverse-phase LC in the second dimension. In all we identified 629 nsSNPs, of which 36 were of alternate SNP alleles not found in the reference NCBI or IPI protein databases. Searches for SNP-peptides carry a high risk of false positives due both to mass shifts caused by modifications and to multiple representations of the same peptide within the genome. In this work, false positives were filtered using a novel peptide pI prediction algorithm and characterized using a decoy database developed by random substitution of similarly sized reference peptides. Secondary validation by sequencing of corresponding genomic DNA confirmed the presence of the predicted SNP in 8 of 10 SNP-peptides. This work highlights that the usefulness of interpreting unassigned spectra as polymorphisms is highly reliant on the ability to detect and filter false positives.
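Building a tryptic peptide database from allelic information can be sketched as an in silico digest of a variant protein sequence. The example below is an assumption-laden illustration, not the authors' pipeline: it uses the common rule that trypsin cleaves after K or R unless the next residue is P, allows no missed cleavages, and works on a made-up sequence. The function names are hypothetical.

```python
import re


def tryptic_digest(sequence):
    """Cleave after K or R unless followed by P; no missed cleavages."""
    # Zero-width split points (supported by re.split in Python 3.7+).
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def variant_peptides(sequence, position, alt_residue):
    """Return tryptic peptides found only in the variant protein.

    position is 1-based; alt_residue replaces the residue at that position.
    """
    variant = sequence[: position - 1] + alt_residue + sequence[position:]
    reference = set(tryptic_digest(sequence))
    return [p for p in tryptic_digest(variant) if p not in reference]


protein = "MKTAYRPLGKSNDER"  # made-up sequence for illustration
print(tryptic_digest(protein))
print(variant_peptides(protein, 4, "V"))   # A4V changes one peptide
print(variant_peptides(protein, 12, "K"))  # N12K creates a new cleavage site
```

Only the variant-specific peptides need to be added to the search database, since every other peptide is already covered by the reference proteome.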

13.
14.
The target-decoy approach to estimating and controlling false discovery rate (FDR) has become a de facto standard in shotgun proteomics, and it has been applied at both the peptide-to-spectrum match (PSM) and protein levels. Current bioinformatics methods control either the PSM- or the protein-level FDR, but not both. In order to obtain the most reliable information from their data, users must employ one method when the number of tandem mass spectra exceeds the number of proteins in the database and another method when the reverse is true. Here we propose a simple variation of the standard target-decoy strategy that estimates and controls PSM and protein FDRs simultaneously, regardless of the relative numbers of spectra and proteins. We demonstrate that even if the final goal is a list of PSMs with a fixed low FDR and not a list of protein identifications, the proposed two-dimensional strategy offers advantages over a pure PSM-level strategy.
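For orientation, the standard one-dimensional target-decoy estimate that this abstract builds on can be sketched as follows. It does not reproduce the proposed two-dimensional strategy, whose details the abstract does not give. The sketch assumes equally sized target and decoy databases, so that the number of decoy matches above a score threshold approximates the number of false target matches; the function name and the toy scores are hypothetical.

```python
def fdr_at_threshold(target_scores, decoy_scores, threshold):
    """Estimate the FDR among target matches scoring at or above threshold.

    Assumes the decoy database is the same size as the target database,
    so decoy hits stand in for false target hits one-to-one.
    """
    n_target = sum(score >= threshold for score in target_scores)
    n_decoy = sum(score >= threshold for score in decoy_scores)
    if n_target == 0:
        return 0.0  # nothing accepted, so nothing falsely accepted
    return min(1.0, n_decoy / n_target)


# Toy PSM scores (higher is better).
targets = [9.1, 8.4, 7.7, 6.9, 5.2, 4.8, 3.1]
decoys = [5.0, 3.3, 2.9, 1.4]
print(fdr_at_threshold(targets, decoys, 5.0))  # 1 decoy vs 5 targets
print(fdr_at_threshold(targets, decoys, 6.0))  # 0 decoys vs 4 targets
```

The same estimate can be made at the protein level by passing the best score per protein instead of per-PSM scores.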

15.
SUMMARY: We present an approach to statistically pinpoint differentially expressed proteins that have quantitation values near the quantitation threshold and are not identified in all replicates (marginal cases). Our method uses a Bayesian strategy to combine parametric statistics with an empirical distribution built from the reproducibility quality of the technical replicates. AVAILABILITY: The software is freely available for academic use at http://pcarvalho.com/patternlab.

16.
ABSTRACT

Introduction: The last decade has yielded significant developments in the field of proteomics, especially in mass spectrometry (MS) and data analysis tools. In particular, a shift from gel-based to MS-based proteomics has been observed, thereby providing a platform with which to construct proteome atlases for all life forms. Nevertheless, the analysis of plant proteomes, especially those of samples that contain high-abundance proteins (HAPs), such as soybean seeds, remains challenging.

Areas covered: Here, we review recent progress in soybean seed proteomics and highlight advances in HAPs depletion methods and peptide pre-fractionation, identification, and quantification methods. We also suggest a pipeline for future proteomic analysis, in order to increase the dynamic coverage of the soybean seed proteome.

Expert opinion: Because HAPs limit the dynamic resolution of the soybean seed proteome, the depletion of HAPs is a prerequisite of high-throughput proteome analysis, and owing to the use of two-dimensional gel electrophoresis-based proteomic approaches, few soybean seed proteins have been identified or characterized. Recent advances in proteomic technologies, which have significantly increased the proteome coverage of other plants, could be used to overcome the current complexity and limitation of soybean seed proteomics.

17.
The analysis by liquid chromatography coupled to tandem mass spectrometry of complex peptide mixtures, generated by proteolysis of protein samples, is the main proteomics method used today. The approach is based on the assumption that each protein present in a sample reproducibly and predictably generates a relatively small number of peptides that can be identified by mass spectrometry. In this study this assumption was examined by a targeted peptide sequencing strategy using inclusion lists to trigger peptide fragmentation attempts. It was found that the number of peptides observed from a single protein is at least one order of magnitude greater than previously assumed. This unexpected complexity of proteomics samples implies substantial technical challenges, explains some perplexing results in the proteomics literature, and prompts the need for developing alternative experimental strategies for the rapid and comprehensive analysis of proteomes.

18.
An optimization and comparison of trypsin digestion strategies for peptide/protein identifications by microLC-MS/MS with or without MS-compatible detergents in mixed organic-aqueous and aqueous systems was carried out in this study. We determine that adding MS-compatible detergents to proteolytic digestion protocols dramatically increases peptide and protein identifications in complex protein mixtures by shotgun proteomics. Protein solubilization and proteolytic efficiency are increased by including MS-compatible detergents in trypsin digestion buffers. A modified trypsin digestion protocol incorporating the MS-compatible detergents consistently identifies over 300 proteins from 5 μg of pancreatic cell lysates and generates a greater number of peptide identifications than trypsin digestion with urea when using LC-MS/MS. Furthermore, over 700 proteins were identified by merging protein identifications from trypsin digestion with three different MS-compatible detergents. We also observe that the use of mixed aqueous and organic solvent systems can influence protein identifications in combinations with different MS-compatible detergents. Peptide mixtures generated from different MS-compatible detergents and buffer combinations show a significant difference in hydrophobicity. Our results show that protein digestion schemes incorporating MS-compatible detergents generate quantitative as well as qualitative changes in observed peptide identifications, which lead to increased protein identifications overall and potentially increased identification of low-abundance proteins.

19.
Comparative LC-MS is a powerful method for detailed quantitative comparison of complex protein mixtures. Dedicated software is required for detection, matching, and alignment of peaks in multiple LC-MS datasets. However, retention time shifts, saturation effects, limitations of experimental accuracy, and possible occurrence of split peaks make it difficult for software to perfectly match all chromatograms. We describe a procedure to assess the above problems and show that dataset quality can be enhanced with the aid of cluster analysis.

20.
Granholm V, Käll L. Proteomics, 2011, 11(6): 1086-1093
The peptide identification process in shotgun proteomics is most frequently solved with search engines. Such search engines assign scores that reflect similarity between the measured fragmentation spectrum and the theoretical spectra of the peptides of a given database. However, the scores from most search engines do not have a direct statistical interpretation. To understand and make use of the significance of peptide identifications, one must thus be familiar with some statistical concepts. Here, we discuss different statistical scores used to show the confidence of an identification and a set of methods to estimate these scores. We also describe the variance of statistical scores and imperfections of scoring functions of peptide-spectrum matches.
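One way to turn raw search-engine scores into a statistical score is the q-value: the minimum estimated FDR at which a given identification would still be accepted. The abstract does not say which scores the article covers, so treating the q-value as an example is an assumption, as are the function name and the toy data. The sketch uses a simple target-decoy estimate with equally sized target and decoy databases.

```python
def q_values(psms):
    """Compute q-values for target PSMs from a mixed target/decoy list.

    psms: list of (score, is_decoy) pairs; higher scores are better.
    Returns (score, q_value) pairs for targets, best score first.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)

    # Estimated FDR when accepting everything down to each rank.
    fdrs = []
    n_target = n_decoy = 0
    for _, is_decoy in ranked:
        if is_decoy:
            n_decoy += 1
        else:
            n_target += 1
        fdrs.append(n_decoy / max(n_target, 1))

    # q-value: the smallest FDR at this rank or any lower-scoring rank,
    # which makes the result non-decreasing as the score drops.
    q_by_rank = [0.0] * len(ranked)
    running_min = float("inf")
    for i in range(len(ranked) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        q_by_rank[i] = min(running_min, 1.0)

    return [
        (score, q)
        for (score, is_decoy), q in zip(ranked, q_by_rank)
        if not is_decoy
    ]


psms = [(10, False), (9, False), (8, True), (7, False),
        (6, False), (5, True), (4, False)]
for score, q in q_values(psms):
    print(score, q)
```

Accepting every target PSM with a q-value at or below 0.25 in this toy list keeps the four best targets, with one decoy among them serving as the error estimate.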
