Similar articles
1.
Nesvizhskii AI 《Proteomics》2012,12(10):1639-1655
Analysis of protein interaction networks and protein complexes using affinity purification and mass spectrometry (AP/MS) is among the most commonly used and successful applications of proteomics technologies. One of the foremost challenges of AP/MS data is the large number of false-positive protein interactions present in unfiltered data sets. Here we review computational and informatics strategies for detecting specific protein interaction partners in AP/MS experiments, with a focus on incomplete (as opposed to genome-wide) interactome mapping studies. These strategies range from standard statistical approaches, to empirical scoring schemes optimized for a particular type of data, to advanced computational frameworks. The common denominator among these methods is the use of label-free quantitative information such as spectral counts or integrated peptide intensities that can be extracted from AP/MS data. We also discuss related issues such as combining multiple biological or technical replicates, and dealing with data generated using different tagging strategies. Computational approaches for benchmarking of scoring methods are discussed, and the need for generation of reference AP/MS data sets is highlighted. Finally, we discuss the possibility of more extended modeling of experimental AP/MS data, including integration with external information such as protein interaction predictions based on functional genomics data.

2.
This tutorial article introduces mass spectrometry (MS) for peptide fragmentation and protein identification. The current approaches being used for protein identification include top-down and bottom-up sequencing. Top-down sequencing, a relatively new approach that involves fragmenting intact proteins directly, is briefly introduced. Bottom-up sequencing, a traditional approach that fragments peptides in the gas phase after protein digestion, is discussed in more detail. The most widely used ion activation and dissociation process, gas-phase collision-activated dissociation (CAD), is discussed from a practical point of view. Infrared multiphoton dissociation (IRMPD) and electron capture dissociation (ECD) are introduced as two alternative dissociation methods. For spectral interpretation, the common fragment ion types in peptide fragmentation and their structures are introduced; the influence of instrumental methods on the fragmentation pathways and final spectra is discussed. A discussion is also provided on the complications in sample preparation for MS analysis. The final section of this article provides a brief review of recent research efforts on different algorithmic approaches being developed to improve protein identification searches.
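The common fragment ion types mentioned in this abstract can be illustrated with a short sketch that computes singly charged b- and y-ion m/z values for a peptide. This is only an illustration under stated assumptions, not code from the tutorial: the function name `fragment_ions` is hypothetical, only the 20 standard unmodified residues are handled, and the masses are standard monoisotopic values rounded to five decimals.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O, added to the C-terminal (y) fragment
PROTON = 1.007276   # charge carrier for a singly protonated ion


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    b_ions[i-1] is the N-terminal fragment with i residues;
    y_ions[i-1] is the C-terminal fragment with i residues.
    """
    masses = [MONO[aa] for aa in peptide]  # KeyError on unknown residues
    b_ions, y_ions = [], []
    for i in range(1, len(masses)):
        b_ions.append(sum(masses[:i]) + PROTON)
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)
    return b_ions, y_ions


if __name__ == "__main__":
    b, y = fragment_ions("SAMPLER")
    for i, (b_mz, y_mz) in enumerate(zip(b, y), start=1):
        print(f"b{i} = {b_mz:.4f}   y{i} = {y_mz:.4f}")
```

A useful sanity check on such a table is that each complementary pair (b_i and y_(n-i)) sums to the neutral peptide mass plus two protons.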

3.
Capture and analysis of quantitative proteomic data (cited by: 1; self-citations: 0; citations by others: 1)
Whilst the array of techniques available for quantitative proteomics continues to grow, the attendant bioinformatic software tools are similarly expanding in number. The capture and analysis of such quantitative data are crucial to the experiment, and the methods used to process the data will critically affect the quality of the results obtained. These tools must deal with a variety of issues, including identification of labelled and unlabelled peptide species, location of the corresponding MS scans in the experiment, construction of representative ion chromatograms, location of the true peptide ion chromatogram start and end, elimination of background signal in the mass spectrum and chromatogram, and calculation of both peptide and protein ratios/abundances. A variety of tools and approaches are available, in part restricted by the nature of the experiment to be performed and the available instrumentation. Currently, although there is no single consensus on precisely how to calculate protein and peptide abundances, many common themes have emerged which identify and reduce many of the key sources of error. These issues will be discussed, along with those relating to deposition of quantitative data. At present, mature data standards for quantitative proteomics are not yet available, although formats are beginning to emerge.
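The step of rolling peptide ratios up into a protein ratio can be sketched as below. This is a minimal illustration rather than a method taken from the article, which notes that no consensus calculation exists: the function name `protein_ratio` and the use of the median of log2 peptide ratios (chosen here to limit the influence of outlier peptides) are assumptions.

```python
import math
from statistics import median


def protein_ratio(peptide_ratios):
    """Summarise peptide-level ratios (heavy/light or sample/control)
    into one protein-level ratio.

    Ratios are averaged on a log2 scale so that a 2-fold increase and a
    2-fold decrease are treated symmetrically; the median makes the
    summary robust to a single badly quantified peptide.
    """
    logs = [math.log2(r) for r in peptide_ratios if r > 0]
    if not logs:
        raise ValueError("no positive peptide ratios to summarise")
    return 2 ** median(logs)


if __name__ == "__main__":
    # Three peptides from one protein; the third is an obvious outlier.
    print(protein_ratio([2.0, 2.0, 100.0]))
```

A mean of the raw ratios in the same example would be dominated by the outlier, which is one reason robust summaries are often preferred.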

4.
Tandem mass spectrometry (MS/MS) combined with database searching is currently the most widely used method for high-throughput peptide and protein identification. Many different algorithms, scoring criteria, and statistical models have been used to identify peptides and proteins in complex biological samples, and many studies, including our own, describe the accuracy of these identifications, using at best generic terms such as "high confidence." False positive identification rates for these criteria can vary substantially with changing organisms under study, growth conditions, sequence databases, experimental protocols, and instrumentation; therefore, study-specific methods are needed to estimate the accuracy (false positive rates) of these peptide and protein identifications. We present and evaluate methods for estimating false positive identification rates based on searches of randomized databases (reversed and reshuffled). We examine the use of separate searches of a forward then a randomized database and combined searches of a randomized database appended to a forward sequence database. Estimated error rates from randomized database searches are first compared against actual error rates from MS/MS runs of known protein standards. These methods are then applied to biological samples of the model microorganism Shewanella oneidensis strain MR-1. Based on the results obtained in this study, we recommend the use of combined searches of a reshuffled database appended to a forward sequence database as a means of providing quantitative estimates of false positive identification rates of peptides and proteins. This will allow researchers to set criteria and thresholds to achieve a desired error rate and provide the scientific community with direct and quantifiable measures of peptide and protein identification accuracy as opposed to vague assessments such as "high confidence."
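The combined (forward plus randomized) search strategy recommended here lends itself to a short sketch. The code below is an illustration under assumptions, not the authors' implementation: it uses one common convention in which false matches are assumed to land in the forward and randomized halves equally often, so the estimated false positive rate among accepted hits is 2 x decoys / (targets + decoys). The function names and the `(score, is_decoy)` input layout are hypothetical.

```python
def estimate_fdr(psms, threshold):
    """Estimate the false positive rate among matches scoring >= threshold.

    psms is a list of (score, is_decoy) pairs from a single search against
    a forward database with a randomized database appended.
    """
    accepted = [is_decoy for score, is_decoy in psms if score >= threshold]
    if not accepted:
        return 0.0
    decoys = sum(accepted)  # True counts as 1
    # Each decoy hit is assumed to imply one undetected false forward hit.
    return min(1.0, 2.0 * decoys / len(accepted))


def threshold_for_fdr(psms, max_fdr):
    """Return the lowest score threshold whose estimated rate is <= max_fdr,
    or None if no threshold qualifies."""
    best = None
    for t in sorted({score for score, _ in psms}, reverse=True):
        if estimate_fdr(psms, t) <= max_fdr:
            best = t
    return best


if __name__ == "__main__":
    # Toy data: higher scores are better, True marks a randomized-database hit.
    toy = [(10, False), (9, False), (8, False), (7, True), (6, False), (5, True)]
    print(threshold_for_fdr(toy, 0.1))
```

Choosing the threshold from the data in this way is what allows a study to report a quantitative error rate instead of a generic label such as "high confidence."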

5.
The computational simulation of complete proteomic data sets is presented, along with its utility for validating detection and interpretation algorithms, aiding the design of experiments, and assessing protein and peptide false discovery rates. The simulation software has been developed for emulating data originating from data-dependent and data-independent LC-MS workflows. Data from all types of commonly used hybrid mass spectrometers can be simulated. The algorithms are based on empirically derived physicochemical liquid and gas phase models for proteins and peptides. Sample composition in terms of complexity and dynamic range, as well as chromatographic, experimental and MS conditions, can be controlled and adjusted independently. The effect of on-column amounts, gradient length, mass resolution and ion mobility on search specificity will be demonstrated using tryptic peptides from human and yeast cellular lysates simulated over five orders of magnitude in dynamic range. Initial justification of the simulated data sets is achieved by comparing and contrasting the in silico simulated data to experimentally derived results from a 48 protein mixture spanning a similar dynamic range of five orders of magnitude. Additionally, experimental data from replicate and dilution series experiments will be utilized to determine error rates at the peptide and protein level with respect to mass, area, retention and drift time. The data presented reveal a high degree of similarity at the ion detection, peptide and protein level when analyzed under similar conditions.

6.
Tandem mass spectrometry-based proteomics is currently in great need of computational methods that facilitate the elimination of likely false positives in peptide and protein identification. In the last few years, a number of new peptide identification programs have been described, but scores or other significance measures reported by these programs cannot always be directly translated into an easy-to-interpret error rate measurement such as the false discovery rate. In this work we used generalized lambda distributions to model frequency distributions of database search scores computed by MASCOT, X!TANDEM with k-score plug-in, OMSSA, and InsPecT. From these distributions, we could successfully estimate p values and false discovery rates with high accuracy. From the set of peptide assignments reported by any of these engines, we also defined a generic protein scoring scheme that enabled accurate estimation of protein-level p values by simulation of random score distributions; this scheme was also found to yield good estimates of the protein-level false discovery rate. The performance of these methods was evaluated by searching four freely available data sets ranging from 40,000 to 285,000 MS/MS spectra.

7.
Citrullination is a posttranslational modification of arginine. It plays both a physiological role, for instance during apoptosis and epigenetics, and a pathological role in cancer or diseases of the central nervous system. Most research on citrullination to date focuses on its role in auto-immune diseases such as multiple sclerosis and rheumatoid arthritis. In this context, the exact knowledge of citrullination sites in a protein can provide invaluable information about the etiological importance of these citrullinated proteins. However, few techniques exist that can accurately detect citrullination on the peptide level. This review aims to give an overview of the different methods available to date for the detection of citrullinated proteins and peptides. These include 2D-SDS-PAGE and immunodetection, as well as specific mass spectrometry (MS) approaches, both labeled and unlabeled. These MS approaches have been developed to pinpoint the exact location of citrullination on the peptide level. Improving the currently existing detection strategies while focusing on the role of citrullinated proteins will be invaluable to elucidate the importance of this posttranslational modification in vivo.

8.
Computational analysis of shotgun proteomics data (cited by: 2; self-citations: 0; citations by others: 2)
Proteomics technology is progressing at an incredible rate. The latest generation of tandem mass spectrometers can now acquire tens of thousands of fragmentation spectra in a matter of hours. Furthermore, quantitative proteomics methods have been developed that incorporate a stable isotope-labeled internal standard for every peptide within a complex protein mixture for the measurement of relative protein abundances. These developments have opened the doors for 'shotgun' proteomics, yet have also placed a burden on the computational approaches that manage the data. With each new method that is developed, the quantity of data that can be derived from a single experiment increases. To deal with this increase, new computational approaches are being developed to manage the data and assess false positives. This review discusses current approaches for analyzing proteomics data by mass spectrometry and identifies present computational limitations and bottlenecks.

9.
Lipids, once thought to serve mainly energy-storage and structural purposes, have now gained immense recognition as a class of critical metabolites with versatile functions. The diversity and complexity of cellular lipids are the main challenge for the comprehensive analysis of a lipidome. Lipidomics, which aims at mapping all of the lipids in a cell, has expanded rapidly in recent years, mainly owing to recent advances in mass spectrometry (MS). Recently developed MS-based lipidomic approaches allow the quick profiling of hundreds of lipids in a crude lipid extract. With the aid of the latest computational tools/software (chemometrics), aberrant lipid metabolites or important signaling lipid(s) can be readily identified using unbiased lipid profiling approaches. Further tandem MS (MS/MS)-based lipidomic approaches, known as targeted approaches and able to convey structural information, hold promise for high-throughput lipidome analysis. In this review, I discuss the basic strategy for systems-level analysis of the lipidome in biomedical studies.

10.
The high-throughput nature of proteomics mass spectrometry is enabled by a productive combination of data acquisition protocols and the computational tools used to interpret the resulting spectra. One of the key components in mainstream protocols is the generation of tandem mass (MS/MS) spectra by peptide fragmentation using collision induced dissociation, the approach currently used in the large majority of proteomics experiments to routinely identify hundreds to thousands of proteins from single mass spectrometry runs. Complementary to these, alternative peptide fragmentation methods such as electron capture/transfer dissociation and higher-energy collision dissociation have consistently achieved significant improvements in the identification of certain classes of peptides, proteins, and post-translational modifications. Recognizing these advantages, mass spectrometry instruments now conveniently support fine-tuned methods that automatically alternate between peptide fragmentation modes for either different types of peptides or for acquisition of multiple MS/MS spectra from each peptide. But although these developments have the potential to substantially improve peptide identification, their routine application requires corresponding adjustments to the software tools and procedures used for automated downstream processing. This review discusses the computational implications of alternative and alternate modes of MS/MS peptide fragmentation and addresses some practical aspects of using such protocols for identification of peptides and post-translational modifications.

11.
Uni- or multidimensional microcapillary liquid chromatography (microLC) matrix-assisted laser desorption/ionization (MALDI) tandem mass spectrometry (MS/MS) approaches have gained significant attention for quantifying and identifying proteins in complex biological samples. The off-line coupling of microLC with MS quantitation and MS/MS identification methods makes new result-dependent workflows possible. A relational database is used to store the results from multiple high performance liquid chromatography runs, including information about MALDI plate positions and both peptide and protein quantitations and identifications. Unlike electrospray methodology, where all the decisions about which peptide to fragment must be made during peptide fractionation, in the MALDI experiments the samples are effectively "frozen in time". Therefore, additional MS and MS/MS spectra can be acquired to promote more accurate quantitation or additional identifications until reliable results are derived that meet experimental design criteria. In the case of what can be designated the expression-dependent workflow, quantitation can be detached from identification and only peak pairs with biologically relevant expression changes can be selected for further MS/MS analyses. Alternatively, additional MS/MS data can be acquired to confirm tentative peptide mass fingerprint hits in what is designated a search result-dependent workflow. In the MS data-dependent workflow, the goal is to collect as many meaningful spectra as possible by judiciously adjusting the acquisition parameters based on characteristics of the parent masses. This level of sophistication requires the development of innovative algorithms for these three result-dependent workflows that make MS and MS/MS analysis more efficient and also add confidence to experimental results.

12.
Monoclonal antibodies (mAbs) are powerful therapeutics, and their characterization has drawn considerable attention and urgency. Unlike small-molecule drugs (150–600 Da) that have rigid structures, mAbs (∼150 kDa) are engineered proteins that undergo complicated folding and can exist in a number of low-energy structures, posing a challenge for traditional methods in structural biology. Mass spectrometry (MS)-based biophysical characterization approaches can provide structural information, bringing high sensitivity, fast turnaround, and small sample consumption. This review outlines various MS-based strategies for protein biophysical characterization and then reviews how these strategies provide structural information of mAbs at the protein level (intact or top-down approaches), peptide, and residue level (bottom-up approaches), affording information on higher order structure, aggregation, and the nature of antibody complexes.

13.
Protein identification by mass spectrometry is mainly based on MS/MS spectra and the accuracy of molecular mass determination. However, the high complexity and dynamic range of proteomic samples from any species surpass the separation capacity and detection power of the most advanced multidimensional liquid chromatographs and mass spectrometers. Only a tiny portion of signals is selected for MS/MS experiments, and a considerable number of these still do not provide reliable peptide identification. In this article, an in silico analysis of a novel methodology for peptide and protein identification is described. The approach is based on mass accuracy, isoelectric point (pI), retention time (t(R)) and N-terminal amino acid determination as protein identification criteria, regardless of whether high-quality MS/MS spectra are available. When the methodology was combined with selective isolation methods, the number of unique peptides and identified proteins increased. Finally, to demonstrate the feasibility of the methodology, an OFFGEL-LC-MS/MS experiment was also implemented. We compared the more reliable peptides identified with MS/MS information against peptides identified with three experimental features (pI, t(R), molecular mass). Also, two theoretical assumptions from MS/MS identification (selective isolation of peptides and N-terminal amino acid) were analyzed. Our results show that, using the information provided by these features and selective isolation methods, we could find 93% of the high-confidence proteins identified by MS/MS with a false-positive rate lower than 5%.
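The idea of identifying a peptide from measured mass, pI and retention time, without an MS/MS spectrum, can be sketched as a simple tolerance filter over in silico candidates. Everything specific in this sketch is an assumption made for illustration: the `Candidate` record, the function name `match_candidates`, the default tolerance values, and the example numbers are hypothetical and are not taken from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A theoretical peptide with predicted properties (hypothetical record)."""
    sequence: str
    mass: float  # predicted monoisotopic mass, Da
    pi: float    # predicted isoelectric point
    rt: float    # predicted retention time, minutes


def match_candidates(obs_mass, obs_pi, obs_rt, candidates,
                     ppm_tol=10.0, pi_tol=0.5, rt_tol=2.0):
    """Return the candidates consistent with all three observed features.

    The tolerances are illustrative defaults only; realistic values depend
    on the instrument, the fractionation method and the predictors used.
    """
    hits = []
    for c in candidates:
        ppm_error = abs(obs_mass - c.mass) / c.mass * 1e6
        if (ppm_error <= ppm_tol
                and abs(obs_pi - c.pi) <= pi_tol
                and abs(obs_rt - c.rt) <= rt_tol):
            hits.append(c)
    return hits


if __name__ == "__main__":
    # Made-up candidates: the second fails on pI, the third on mass accuracy.
    db = [
        Candidate("PEPTIDEK", 1000.000, 4.5, 20.0),
        Candidate("KEDITPEP", 1000.005, 6.5, 20.5),
        Candidate("PEPTIDER", 1000.050, 4.5, 20.0),
    ]
    for hit in match_candidates(1000.002, 4.6, 20.5, db):
        print(hit.sequence)
```

An observation counts as identified only when exactly one candidate survives all filters, which is why each additional orthogonal feature increases the number of unique assignments.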

14.
Discovery of urinary biomarkers (cited by: 4; self-citations: 0; citations by others: 4)
A myriad of proteins and peptides can be identified in normal human urine. These are derived from a variety of sources including glomerular filtration of blood plasma, cell sloughing, apoptosis, proteolytic cleavage of cell surface glycosylphosphatidylinositol-linked proteins, and secretion of exosomes by epithelial cells. Mass spectrometry-based approaches to urinary protein and peptide profiling can, in principle, reveal changes in excretion rates of specific proteins/peptides that can have predictive value in the clinical arena, e.g. in the early diagnosis of disease, in classification of disease with regard to likely therapeutic responses, in assessment of prognosis, and in monitoring response to therapy. These approaches have potential value, not only in diseases of the kidney and urinary tract but also in systemic diseases that are associated with circulating small protein and peptide markers that can pass the glomerular filter. Most large scale biomarker discovery studies reported thus far have used one of two approaches to identify proteins and peptides whose excretion in urine changes in specific disease states: 1) two-dimensional electrophoresis with mass spectrometric and/or immunochemical identification of proteins and 2) top-down mass spectrometric methods (SELDI-TOF-MS and capillary electrophoresis-MS). These studies have been chiefly in the areas of nephrology, urology, and oncology. We review these applications, focusing on two areas of progress, viz. in bladder cancer and in acute rejection of renal transplants. Progress has been limited so far. However, with the advent of powerful LC-MS/MS methods along with methods for quantifying LC-MS/MS output, there is hope for an accelerated discovery and validation of disease biomarkers in urine.

15.
Patel K, Stein R, Benvenuti S, Zvelebil MJ 《Proteomics》2002,2(10):1464-1473
It is only recently that quantitative studies of differential proteome analysis (DPA) have become possible. In this paper the issues involved in quantitative DPA are discussed and novel tools to select features for identification by mass spectrometry (MS) are described. The problem of comparing two sets of gels on a global level is explored as well as how to find specific protein features that differentiate two sets of two-dimensional electrophoresis gels. The concept of a 'virtual' gel, derived from gene expression data, is introduced. The virtual gel enables the co-analysis of data from gene and protein expression. We discuss the value of such an approach, and consider what new information can be gained by using gene and protein expression together. These tools are illustrated by analysis of data from tandem gene and protein expression experiments. Features that are highlighted by the above methods are putative candidates for MS identification. Tools are described that integrate the process of feature selection, cutting, and MS analysis.

16.
Modern proteomics approaches include techniques to examine the expression, localization, modifications, and complex formation of proteins in cells. In order to address issues of protein function in vitro using classical biochemical and biophysical approaches, high-throughput methods of cloning the appropriate reading frames, and expressing and purifying proteins efficiently are an important goal of modern proteomics approaches. This process becomes more difficult as functional proteomics efforts focus on the proteins from higher organisms, since issues of correctly identifying intron-exon boundaries and efficiently expressing and solubilizing the (often) multi-domain proteins from higher eukaryotes are challenging. Recently, 12,000 open-reading-frame (ORF) sequences from Caenorhabditis elegans have become available for functional proteomics studies [Nat. Gen. 34 (2003) 35]. We have implemented a high-throughput screening procedure to express, purify, and analyze by mass spectrometry hexa-histidine-tagged C. elegans ORFs in Escherichia coli using metal affinity ZipTips. We find that over 65% of the expressed proteins are of the correct mass as analyzed by matrix-assisted laser desorption MS. Many of the remaining proteins indicated to be "incorrect" can be explained by high-throughput cloning or genome database annotation errors. This provides a general understanding of the expected error rates in such high-throughput cloning projects. The ZipTip purified proteins can be further analyzed under both native and denaturing conditions for functional proteomics efforts.

17.
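Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) is widely used in food microbiology testing and clinical microbial identification because it is fast, accurate, and high-throughput. Preprocessing and analysis of MALDI-TOF MS data are key steps in microbial identification: data processing extracts the characteristic peptide or protein information of a microorganism from large volumes of data, and supervised and unsupervised learning methods then classify and cluster this characteristic information, enabling the identification, typing, and homology analysis of microorganisms. This article reviews the statistical analysis methods and data analysis software applied in the identification of microorganisms by MALDI-TOF MS.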

18.
A key problem in computational proteomics is distinguishing between correct and false peptide identifications. We argue that evaluating the error rates of peptide identifications is not unlike computing generating functions in combinatorics. We show that the generating functions and their derivatives (spectral energy and spectral probability) represent new features of tandem mass spectra that, similarly to Delta-scores, significantly improve peptide identifications. Furthermore, the spectral probability provides a rigorous solution to the problem of computing statistical significance of spectral identifications. The spectral energy/probability approach improves the sensitivity-specificity tradeoff of existing MS/MS search tools, addresses the notoriously difficult problem of "one-hit-wonders" in mass spectrometry, and often eliminates the need for decoy database searches. We therefore argue that the generating function approach has the potential to increase the number of peptide identifications in MS/MS searches.
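The combinatorial flavour of a generating-function computation can be shown on a much simpler toy problem than the one in this abstract: counting how many amino acid sequences have a given integer (nominal) mass. This is not the spectral energy or spectral probability calculation described by the authors; it is only a sketch of the same kind of dynamic-programming recurrence, and the function name `count_sequences` is hypothetical. Leucine/isoleucine and glutamine/lysine share a nominal mass and are counted as distinct letters.

```python
# Nominal (integer) residue masses for the 20 standard amino acids.
NOMINAL = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128,
    "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def count_sequences(target):
    """Count ordered amino acid sequences whose residue masses sum to target.

    counts[m] is the coefficient of x**m in the generating function
    1 / (1 - sum_a x**mass(a)); it satisfies the recurrence
    counts[m] = sum_a counts[m - mass(a)], with counts[0] = 1.
    """
    if target < 0:
        raise ValueError("target mass must be non-negative")
    counts = [0] * (target + 1)
    counts[0] = 1  # the empty sequence
    for m in range(1, target + 1):
        counts[m] = sum(counts[m - r] for r in NOMINAL.values() if r <= m)
    return counts[target]


if __name__ == "__main__":
    # Nominal mass 128 is reached by Q, K, GA and AG.
    print(count_sequences(128))
```

The same table-filling pattern, extended with a score dimension, is what makes it feasible to reason about all candidate peptides at once instead of enumerating them.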

19.
High throughput identification of peptides in databases from tandem mass spectrometry data is a key technique in modern proteomics. Common approaches to interpret large scale peptide identification results are based on the statistical analysis of average score distributions, which are constructed from the set of best scores produced by large collections of MS/MS spectra by using search engines such as SEQUEST. Other approaches calculate individual peptide identification probabilities on the basis of theoretical models or from single-spectrum score distributions constructed from the set of scores produced by each MS/MS spectrum. In this work, we study the mathematical properties of average SEQUEST score distributions by introducing the concept of spectrum quality and expressing these average distributions as compositions of single-spectrum distributions. We predict and demonstrate in practice that average score distributions are dominated by the quality distribution in the spectra collection, except in the low probability region, where it is possible to predict the dependence of average probability on database size. Our analysis leads to a novel indicator, the probability ratio, which takes optimally into account the statistical information provided by the first and second best scores. The probability ratio is a non-parametric and robust indicator that makes spectra classification according to parameters such as charge state unnecessary and allows a peptide identification performance, on the basis of false discovery rates, that is better than that obtained by other empirical statistical approaches. The probability ratio also compares favorably with statistical probability indicators obtained by the construction of single-spectrum SEQUEST score distributions. Its robustness, conceptual simplicity, and ease of automation make the probability ratio algorithm a very attractive alternative for determining peptide identification confidences and error rates in high throughput experiments.

20.
The use of quantitative proteomics methods to study protein complexes has the potential to provide in-depth information on the abundance of different protein components as well as their modification state in various cellular conditions. To interrogate protein complex quantitation using shotgun proteomic methods, we have focused on the analysis of protein complexes using label-free multidimensional protein identification technology and studied the reproducibility of biological replicates. For these studies, we focused on three highly related and essential multi-protein enzymes, RNA polymerase I, II, and III from Saccharomyces cerevisiae. We found that label-free quantitation using spectral counting is highly reproducible at the protein and peptide level when analyzing RNA polymerase I, II, and III. In addition, we show that peptide sampling does not follow a random sampling model, and we show the need for advanced computational models to predict peptide detection probabilities. In order to address these issues, we used the APEX protocol to model the expected peptide detectability based on whole cell lysate acquired using the same multidimensional protein identification technology analysis used for the protein complexes. Neither method was able to predict the peptide sampling levels that we observed using replicate multidimensional protein identification technology analyses. In addition to the analysis of the RNA polymerase complexes, our analysis provides quantitative information about several RNAP associated proteins including the RNAPII elongation factor complexes DSIF and TFIIF. Our data show that DSIF and TFIIF are the most highly enriched RNAP accessory factors in Rpb3-TAP purifications and demonstrate our ability to measure low level associated protein abundance across biological replicates. In addition, our quantitative data support a model in which DSIF and TFIIF interact with RNAPII in a dynamic fashion in agreement with previously published reports.
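Label-free quantitation by spectral counting is often made comparable between proteins by normalizing counts for protein length. The sketch below uses the normalized spectral abundance factor (NSAF) as one such normalization; the abstract does not state which normalization was applied, so treating NSAF as the measure here is an assumption, and the function name `nsaf`, the protein labels and the counts in the example are hypothetical.

```python
def nsaf(spectral_counts, lengths):
    """Compute normalized spectral abundance factors for one run.

    spectral_counts maps protein id -> number of spectra assigned to it;
    lengths maps protein id -> sequence length in residues.
    Each protein's count is divided by its length (longer proteins yield
    more peptides), then scaled so that the values in the run sum to 1.
    """
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    if total == 0:
        raise ValueError("no spectra were assigned to any protein")
    return {p: value / total for p, value in saf.items()}


if __name__ == "__main__":
    # Made-up counts for two subunits of a purified complex.
    counts = {"subunit_A": 10, "subunit_B": 20}
    sizes = {"subunit_A": 100, "subunit_B": 400}
    for protein, value in nsaf(counts, sizes).items():
        print(f"{protein}: {value:.3f}")
```

Because the values sum to 1 within a run, they can be compared across biological replicates, which is the kind of reproducibility check the abstract describes.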
