2.
A major unmet need in LC-MS/MS-based proteomics analyses is a set of tools for quantitative assessment of system performance and evaluation of technical variability. Here we describe 46 system performance metrics for monitoring chromatographic performance, electrospray source stability, MS1 and MS2 signals, dynamic sampling of ions for MS/MS, and peptide identification. Applied to data sets from replicate LC-MS/MS analyses, these metrics displayed consistent, reasonable responses to controlled perturbations. The metrics typically displayed variations less than 10% and thus can reveal even subtle differences in performance of system components. Analyses of data from interlaboratory studies conducted under a common standard operating procedure identified outlier data and provided clues to specific causes. Moreover, interlaboratory variation reflected by the metrics indicates which system components vary the most between laboratories. Application of these metrics enables rational, quantitative quality assessment for proteomics and other LC-MS/MS analytical applications.

LC-MS/MS provides the most widely used technology platform for proteomics analyses of purified proteins, simple mixtures, and complex proteomes. In a typical analysis, protein mixtures are proteolytically digested, the peptide digest is fractionated, and the resulting peptide fractions are then analyzed by LC-MS/MS (1, 2). Database searches of the MS/MS spectra yield peptide identifications and, by inference and assembly, protein identifications. Depending on protein sample load and the extent of peptide fractionation used, LC-MS/MS analytical systems can generate from hundreds to thousands of peptide and protein identifications (3).
Many variations of LC-MS/MS analytical platforms have been described, and the performance of these systems is influenced by a number of experimental design factors (4).

Comparison of data sets obtained by LC-MS/MS analyses provides a means to evaluate the proteomic basis for biologically significant states or phenotypes. For example, data-dependent LC-MS/MS analyses of tumor and normal tissues enabled unbiased discovery of proteins whose expression is enhanced in cancer (5–7). Comparison of data-dependent LC-MS/MS data sets from phosphotyrosine peptides in drug-responsive and -resistant cell lines identified differentially regulated phosphoprotein signaling networks (8, 9). Similarly, activity-based probes and data-dependent LC-MS/MS analysis were used to identify differentially regulated enzymes in normal and tumor tissues (10). All of these approaches assume that the observed differences reflect differences in the proteomic composition of the samples analyzed rather than analytical system variability. The validity of this assumption is difficult to assess because of a lack of objective criteria for evaluating analytical system performance.

The problem of variability poses three practical questions for analysts using LC-MS/MS proteomics platforms. First, is the analytical system performing optimally for the reproducible analysis of complex proteomes? Second, can the sources of suboptimal performance and variability be identified, and can the impact of changes or improvements be evaluated? Third, can system performance metrics provide documentation to support the assessment of proteomic differences between biologically interesting samples?

Currently, the most commonly used measure of variability in LC-MS/MS proteomics analyses is the number of confident peptide identifications (11–13). Although consistency in numbers of identifications may indicate repeatability, the numbers do not indicate whether system performance is optimal or which components require optimization.
One well characterized source of variability in peptide identifications is the automated sampling of peptide ion signals for acquisition of MS/MS spectra by instrument control software, which results in stochastic sampling of lower abundance peptides (14). Variability certainly also arises from sample preparation methods (e.g. protein extraction and digestion). A largely unexplored source of variability is the performance of the core LC-MS/MS analytical system, which includes the LC system, the MS instrument, and system software. The configuration, tuning, and operation of these system components govern sample injection, chromatography, electrospray ionization, MS signal detection, and sampling for MS/MS analysis. These characteristics are all subject to manipulation by the operator and thus provide means to optimize system performance.

Here we describe the development of 46 metrics for evaluating the performance of LC-MS/MS system components. We have implemented a freely available software pipeline that generates these metrics directly from LC-MS/MS data files. We demonstrate their use in characterizing sources of variability in proteomics platforms, both for replicate analyses on a single instrument and in the context of large interlaboratory studies conducted by the National Cancer Institute-supported Clinical Proteomic Technology Assessment for Cancer (CPTAC) Network.
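The quality-control idea behind such metrics can be sketched as a simple replicate check: compute the coefficient of variation (CV) of each metric across replicate runs and flag anything above the roughly 10% band reported above. The metric names and values below are invented for illustration, not drawn from the paper's data.

```python
import statistics

# Hypothetical QC metric values from four replicate LC-MS/MS runs:
# metric name -> one value per run (all numbers invented).
runs = {
    "peak_width_median_s": [22.1, 21.8, 22.5, 23.0],
    "ms1_tic_median":      [1.9e7, 2.1e7, 1.8e7, 2.0e7],
    "msms_identified":     [4100, 3950, 4200, 2600],  # last run degraded
}

def cv_percent(values):
    """Coefficient of variation in percent: 100 * sample stdev / mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)

# Flag metrics whose replicate-to-replicate CV exceeds the ~10% band
# described for a stable system.
for name, values in runs.items():
    cv = cv_percent(values)
    status = "FLAG" if cv > 10.0 else "ok"
    print(f"{name:22s} CV = {cv:5.1f}%  {status}")
```

On these invented numbers only the identification count is flagged, which mirrors the point in the text: a stable chromatography metric can coexist with a degraded downstream component.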

3.
iTRAQ (isobaric tags for relative or absolute quantitation) is a mass spectrometry technology that allows quantitative comparison of protein abundance by measuring peak intensities of reporter ions released from iTRAQ-tagged peptides by fragmentation during MS/MS. However, current data analysis techniques for iTRAQ struggle to report reliable relative protein abundance estimates and suffer from problems of precision and accuracy. The precision of the data is affected by variance heterogeneity: low signal data have higher relative variability; however, low abundance peptides dominate data sets. Accuracy is compromised as ratios are compressed toward 1, leading to underestimation of the ratio. This study investigated both issues and proposed a methodology that combines the peptide measurements to give a robust protein estimate even when the data for the protein are sparse or at low intensity. Our data indicated that ratio compression arises from contamination during precursor ion selection, which occurs at a consistent proportion within an experiment and thus results in a linear relationship between expected and observed ratios. We proposed that a correction factor can be calculated from spiked proteins at known ratios. Then we demonstrated that variance heterogeneity is present in iTRAQ data sets irrespective of the analytical packages, LC-MS/MS instrumentation, and iTRAQ labeling kit (4-plex or 8-plex) used. We proposed using an additive-multiplicative error model for peak intensities in MS/MS quantitation and demonstrated that a variance-stabilizing normalization is able to address the error structure and stabilize the variance across the entire intensity range. The resulting uniform variance structure simplifies the downstream analysis.
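The correction-factor idea follows directly from the claimed linear relationship between expected and observed log-ratios: fit a line to spike-in calibration points, then invert it for sample measurements. This is a minimal sketch; the spike-in ratios below are invented values, not the paper's calibration data.

```python
import numpy as np

# Hypothetical spike-in calibration: proteins added at known ratios,
# with measured ratios compressed toward 1 (all values invented).
expected = np.log2([0.25, 0.5, 1.0, 2.0, 4.0])
observed = np.log2([0.40, 0.62, 1.0, 1.60, 2.55])

# If compression is a consistent proportion within an experiment, the
# relationship is linear on the log scale: observed = slope*expected + c.
slope, intercept = np.polyfit(expected, observed, 1)

def decompress(obs_log_ratio):
    """Invert the fitted linear compression for a sample log2 ratio."""
    return (obs_log_ratio - intercept) / slope

measured = np.log2(1.59)  # an observed, compressed sample ratio
print(f"corrected ratio = {2 ** decompress(measured):.2f}")  # ≈ 1.99
```

With these invented calibration points the fitted slope is about 0.67, so an observed 1.59-fold change decompresses to roughly 2-fold.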
Heterogeneity of variance consistent with an additive-multiplicative model has been reported in other MS-based quantitation, including fields outside of proteomics; consequently the variance-stabilizing normalization methodology has the potential to increase the capabilities of MS quantitation across diverse areas of biology and chemistry.

Different techniques are being used and developed in the field of proteomics to allow quantitative comparison of samples between one state and another. These can be divided into gel-based (1–4) or mass spectrometry-based (5–8) techniques. Comparative studies have found that each technique has strengths and weaknesses and plays a complementary role in proteomics (9, 10). There is significant interest in stable isotope labeling strategies for proteins or peptides because every measurement can then be made against an internal reference, which significantly increases the sensitivity for detecting changes in abundance. Isobaric labeling techniques such as tandem mass tags (11, 12) or isobaric tags for relative or absolute quantitation (iTRAQ) (13, 14) allow multiplexing of four, six, or eight separately labeled samples within one experiment. In contrast to most other quantitative proteomics methods, where precursor ion intensities are measured, the measurement and ensuing quantitation of iTRAQ reporter ions occurs after fragmentation of the precursor ion. Differentially labeled peptides are selected in MS as a single mass precursor ion because the size difference of the tags is equalized by a balance group. The reporter ions are only liberated in MS/MS after the reporter ion and balance groups fragment from the labeled peptides during CID.
iTRAQ has been applied to a wide range of biological applications, from bacteria under nitrate stress (15) to mouse models of cerebellar dysfunction (16).

For the majority of MS-based quantitation methods (including MS/MS-based methods like iTRAQ), the measurements are made at the peptide level and then combined to compute a summarized value for the protein from which they arose. An advantage is that the protein can be identified and quantified from data of multiple peptides, often with multiple values per distinct peptide, thereby enhancing confidence in both the identity and the abundance. However, the question arises of how to summarize the peptide readings to obtain an estimate of the protein ratio. This will involve some sort of averaging, and we need to consider the distribution of the data, in particular the following three aspects. (i) Are the data centered around a single mode (which would be related to the true protein quantitation), or are there phenomena that make them multimodal? (ii) Are the data approximately symmetric (non-skewed) around the mode? (iii) Are there outliers? In the case of multimodality, it is recommended that an effort be made to separate the various phenomena into their separate variables and to dissect the multimodality. Li et al. (17) developed ASAPRatio for ICAT data, which includes a complex data combination strategy. Peptide abundance ratios are calculated by combining data from multiple fractions across MS runs and then averaging across peptides to give an abundance ratio for each parent protein. GPS Explorer, a software package developed for iTRAQ, assumes normality in the peptide ratios for a protein once an outlier filter is applied (18). The iTRAQ package ProQuant assumes that peptide ratio data for a protein follow a log-normal distribution (19). Averaging can be via the mean (20), a weighted average (21, 22), or weighted correlation (23). Some of these methods try to take into account the varying precision of the peptide measurements.
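The simplest robust answer to the summarization question above is to average on the log scale with a statistic that tolerates outliers. This is a hypothetical minimal alternative to the weighted schemes cited in the text, not the paper's method:

```python
import math
import statistics

def protein_log_ratio(peptide_ratios):
    """Summarize peptide-level ratios into one protein-level estimate.

    Works on log2 ratios so up- and down-regulation are symmetric, and
    uses the median, which resists outliers without assuming normality.
    """
    logs = [math.log2(r) for r in peptide_ratios]
    return statistics.median(logs)

# Three well-behaved peptides near 2-fold, plus one outlier reading:
ratios = [1.9, 2.1, 2.0, 8.0]
log_est = protein_log_ratio(ratios)
print(f"protein ratio = {2 ** log_est:.2f}-fold")  # ≈ 2.05; outlier has little pull
```

A plain mean of the same four ratios would report about 3.5-fold, which illustrates why the distributional questions (modality, skew, outliers) matter before choosing an averaging rule.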
There are many different ideas of how to process peptide data, but as yet no systematic study has been completed to guide analysis and ensure the methods being utilized are appropriate.

The quality of a quantitation method can be considered in terms of precision, which refers to how well repeated measurements agree with each other, and accuracy, which refers to how much they on average deviate from the true value. Both of these types of variability are inherent to the measurement process. Precision is affected by random errors: non-reproducible and unpredictable fluctuations around the true value. (In)accuracy, by contrast, is caused by systematic biases that go consistently in the same direction. In iTRAQ, systematic biases can arise because of inconsistencies in iTRAQ labeling efficiency and protein digestion (22). Typically, ratiometric normalization has been used to address this tag bias: all peptide ratios are multiplied by a global normalization factor determined to center the ratio distribution on 1 (19, 22). Even after such normalization, concerns have been raised that iTRAQ has imperfect accuracy, with ratios shrunken toward 1, and this underestimation has been reported across multiple MS platforms (23–27). It has been suggested that this underestimation arises from co-eluting peptides with similar m/z values, which are co-selected during ion selection and co-fragmented during CID (23, 27). As the majority of these will be at a 1:1 ratio across the reporter ion tags (as required for normalization in iTRAQ experiments), they will contribute a background value equally to each of the iTRAQ reporter ion signals and diminish the computed ratios.

With regard to random errors, iTRAQ data are seen to exhibit heterogeneity of variance; that is, the variance of the signal depends on its mean. In particular, the coefficient of variation (CV) is higher in data from low intensity peaks than in data from high intensity peaks (16, 22, 23).
This has also been observed in other MS-based quantitation techniques when quantifying from the MS signal (28–30). Different approaches have been proposed to model the variance heterogeneity. Pavelka et al. (31) used a power law global error model in conjunction with quantitation data derived from spectral counts. Other authors have proposed that the higher CV at low signal arises from the majority of MS instrumentation measuring ion counts as whole numbers (32). Anderle et al. (28) described a two-component error model in which Poisson statistics of ion counts measured as whole numbers dominate at the low intensity end of the dynamic range and multiplicative effects dominate at the high intensity end, and demonstrated its fit to label-free LC-MS quantitation data. Earlier, in the 1990s, Rocke and Lorenzato (29) proposed a two-component additive-multiplicative error model in an environmental toxin monitoring study utilizing gas chromatography MS.

How can the variance heterogeneity be addressed in the data analysis? Some of the current approaches include outlier removal (18, 25), weighted means (21, 22), inclusion filters (16, 22), logarithmic transformation (19), and weighted correlation analysis (23). Outlier removal methods, for example using Dixon's test, assume a normal distribution for which there is little empirical basis. The inclusion filter method, where low intensity data are excluded, reduces the protein coverage considerably if the heterogeneity is to be significantly reduced. The weighted mean method lets high intensity readings contribute more to the estimate than low intensity readings. Filtering, outlier removal, and weighted methods are of limited use for peptides for which only a few low intensity readings were made; however, such cases typically dominate the data sets. Even with a logarithmic transformation, heterogeneity has been reported for iTRAQ data (16, 19, 22).
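The two-component behavior described above can be made concrete with a small simulation of the additive-multiplicative model: additive noise dominates the CV at low intensity, while the CV settles onto the multiplicative floor at high intensity. The parameter values are illustrative assumptions, not fitted to any real data set.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_signal(mu, n=20000, sigma_add=50.0, sigma_mult=0.10):
    """Two-component (Rocke-Lorenzato style) model: y = mu*exp(eta) + eps.

    eta is the multiplicative error term that dominates at high intensity;
    eps is the additive error term that dominates at low intensity.
    Parameter values are invented for illustration.
    """
    eta = rng.normal(0.0, sigma_mult, n)
    eps = rng.normal(0.0, sigma_add, n)
    return mu * np.exp(eta) + eps

cvs = {}
for mu in (100.0, 1000.0, 100000.0):
    y = simulate_signal(mu)
    cvs[mu] = 100.0 * y.std() / y.mean()
    print(f"mean intensity {mu:8.0f}  CV = {cvs[mu]:5.1f}%")
# CV falls toward the ~10% multiplicative floor as intensity grows.
```

This reproduces the qualitative observation in the text: a plain log transform cannot remove the extra variance of the low-intensity regime because the additive component does not scale with the mean.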
Current methods struggle to address the issue and to maintain sensitivity.

Here we investigate the data analysis issues that relate to precision and accuracy in quantitation and propose a robust methodology that is designed to make use of all data without ad hoc filtering rules. The additive-multiplicative model mentioned above motivates the so-called generalized logarithm transformation, a transformation that addresses heterogeneity of variance by approximately stabilizing the variance of the transformed signal across its whole dynamic range (33). Huber et al. (33) provided an open source software package, variance-stabilizing normalization (VSN), that determines the data-dependent transformation parameters. Here we report that the application of this transformation is beneficial for the analysis of iTRAQ data. We investigated the error structure of iTRAQ quantitation data using different peak identification and quantitation packages, LC-MS/MS data collection systems, and both the 4-plex and 8-plex iTRAQ systems. The usefulness of the VSN transformation for addressing heterogeneity of variance was demonstrated. Furthermore, we considered the correlations between multiple, peptide-level readings for the same protein and proposed a method to summarize them into a protein abundance estimate. We considered same-same comparisons to assess the magnitude of experimental variability and then used a set of complex biological samples whose biology has been well characterized to assess the power of the method to detect true differential abundance. We assessed the accuracy of the system with a four-protein mixture at known ratios spanning a fold change range of 1–4. From this, we proposed a methodology to address the accuracy issues of iTRAQ.
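One common parameterization of the generalized logarithm is glog(y) = log((y + sqrt(y² + c²))/2); it behaves like log(y) for y much larger than c, is roughly linear near zero, and is defined even for negative values. A sketch, under the assumption that c is simply handed in (VSN estimates it from the data):

```python
import numpy as np

def glog(y, c):
    """Generalized logarithm: log((y + sqrt(y**2 + c**2)) / 2).

    Approaches log(y) for y >> c and a linear function near zero, so the
    variance of the transformed signal is approximately constant across
    the dynamic range. c = 500 below is an assumption matched to the
    simulated noise, not a recommended default.
    """
    y = np.asarray(y, dtype=float)
    return np.log((y + np.sqrt(y**2 + c**2)) / 2.0)

rng = np.random.default_rng(1)
sds = {}
for mu in (100.0, 1000.0, 100000.0):
    # additive + multiplicative noise, as in the two-component model
    y = mu * np.exp(rng.normal(0, 0.10, 20000)) + rng.normal(0, 50.0, 20000)
    sd_log = np.std(np.log(np.clip(y, 1.0, None)))  # plain log needs clipping
    sd_glog = np.std(glog(y, 500.0))
    sds[mu] = (sd_log, sd_glog)
    print(f"mu={mu:8.0f}  sd(log)={sd_log:.3f}  sd(glog)={sd_glog:.3f}")
```

The standard deviation after glog is nearly identical at all three intensity levels, whereas the plain logarithm inflates the spread of the low-intensity data: the uniform variance structure that, as the text notes, simplifies downstream analysis.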

4.
Quantitative analysis of discovery-based proteomic workflows now relies on high-throughput large-scale methods for identification and quantitation of proteins and post-translational modifications. Advancements in label-free quantitative techniques, using either data-dependent or data-independent mass spectrometric acquisitions, have coincided with improved instrumentation featuring greater precision, increased mass accuracy, and faster scan speeds. We recently reported on a new quantitative method called MS1 Filtering (Schilling et al. (2012) Mol. Cell. Proteomics 11, 202–214) for processing data-independent MS1 ion intensity chromatograms from peptide analytes using the Skyline software platform. In contrast, data-independent acquisitions from MS2 scans, or SWATH, can quantify all fragment ion intensities when reference spectra are available. As each SWATH acquisition cycle typically contains an MS1 scan, these two independent label-free quantitative approaches can be acquired in a single experiment. Here, we have expanded the capability of Skyline to extract both MS1 and MS2 ion intensity chromatograms from a single SWATH data-independent acquisition in an Integrated Dual Scan Analysis approach. The performance of both MS1 and MS2 data was examined in simple and complex samples using standard concentration curves. Cases of interferences in MS1 and MS2 ion intensity data were assessed, as were the differentiation and quantitation of phosphopeptide isomers in MS2 scan data. In addition, we demonstrated an approach for optimization of SWATH m/z window sizes to reduce interferences using MS1 scans as a guide. Finally, a correlation analysis was performed on both MS1 and MS2 ion intensity data obtained from SWATH acquisitions on a complex mixture using a linear model that automatically removes signals containing interferences. 
This work demonstrates the practical advantages of properly acquiring and processing MS1 precursor data in addition to MS2 fragment ion intensity data in a data-independent acquisition (SWATH), and provides an approach to simultaneously obtain independent measurements of relative peptide abundance from a single experiment.

Mass spectrometry is the leading technology for large-scale identification and quantitation of proteins and post-translational modifications (PTMs) in biological systems (1, 2). Although several types of experimental designs are employed in such workflows, most large-scale applications use data-dependent acquisitions (DDA), where peptide precursors are first identified in the MS1 scan and one or more peaks are then selected for subsequent fragmentation to generate their corresponding MS2 spectra. In experiments using DDA, one can employ either chemical/metabolic labeling or label-free strategies for relative quantitation of peptides (and proteins) (3, 4). Depending on the type of labeling approach employed, i.e. metabolic labeling with SILAC or postmetabolic labeling with ICAT or isobaric tags such as iTRAQ or TMT, the relative quantitation of these peptides is made using either MS1 or MS2 ion intensity data (4–7). Label-free quantitative techniques have until recently been based entirely on integrated ion intensity measurements of precursors in the MS1 scan or, in the case of spectral counting, on the number of assigned MS2 spectra (3, 8, 9).

Label-free approaches have recently generated more widespread interest (10–12), in part because of their adaptability to a wide range of proteomic workflows, including human samples that are not amenable to most metabolic labeling techniques, or where chemical labeling may be cost prohibitive and/or interfere with subsequent enrichment steps (11, 13).
However, the use of DDA for label-free quantitation is also susceptible to several limitations, including irreproducibility caused by under-sampling, variable digestion efficiency, and misidentifications (14, 15). Moreover, low ion abundance may prevent peptide selection, especially in complex samples (14). These limitations often present challenges in data analysis when making comparisons across samples, or when a peptide is sampled in only one of the study conditions.

To address the challenges in obtaining more comprehensive sampling in MS1 space, Purvine et al. first demonstrated the ability to obtain sequence information from peptides fragmented across the entire m/z range using “shotgun or parallel collision-induced dissociation (CID)” on an orthogonal time of flight instrument (16). Shortly thereafter, Venable et al. reported a data-independent acquisition methodology that limits the complexity of the MS2 scan by using a segmented approach for the sequential isolation and fragmentation of all peptides in a defined precursor window (e.g. 10 m/z) using an ion trap mass spectrometer (17). However, the implementation of this DIA technique suffered from technical limitations of instruments available at that time, including slow acquisition rates and low MS2 resolution, which made systematic product ion extraction problematic. To alleviate the challenge of long duty cycles in DIA, researchers at the Waters Corporation adopted an alternative approach of rapidly switching between low (MS1) and high energy (MS2) scans and then using proprietary software to align peptide precursor and fragment ion information to determine peptide sequences (18, 19). Recent mass spectrometry innovations in efficient high-speed scanning capabilities, together with high-resolution data acquisition of both MS1 and MS2 scans, and multiplexing of scan windows have overcome many of these limitations (10, 20, 21).
Moreover, the simultaneous development of novel software solutions for extracting ion intensity chromatograms based on spectral libraries has enabled the use of DIA for large-scale label-free quantitation of multiple peptide analytes (21, 22). In addition to targeting specific peptides from a previously generated peptide spectral library, the data can also be reexamined (i.e. post-acquisition) for additional peptides of interest as new reference data emerge. On the SCIEX TripleTOF 5600, a quadrupole orthogonal time-of-flight mass spectrometer, this technique has been optimized and extended to what is called “SWATH MS2” based on a combination of new technical and software improvements (10, 22).

In a DIA experiment, an MS1 survey scan is carried out across the mass range followed by a SWATH MS2 acquisition series; however, the cycle time of the MS1 scan is dramatically shortened compared with DDA-type experiments. The Q1 quadrupole is set to transmit a wider window, typically Δ25 m/z, to the collision cell in incremental steps over the full mass range. The MS/MS spectra produced during a SWATH MS2 acquisition are therefore of much greater complexity, as they are a composite of all fragment ions produced from peptide analytes whose molecular ions fall within the selected MS1 m/z window. The cycle of data-independent MS1 survey scans and SWATH MS2 scans is repeated throughout the entire LC-MS acquisition. Fragment ion information contained in these SWATH MS2 spectra can be used to uniquely identify specific peptides by comparisons to reference spectra or spectral libraries. Moreover, the ion intensities of these fragment ions can also be used for quantitation.
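The incremental Q1 stepping described above can be sketched as a window generator. This is a simplified illustration: real methods often use variable window widths, and the precursor range, width, and 1 m/z overlap used here are assumptions, not an instrument recommendation.

```python
def swath_windows(start, stop, width, overlap=1.0):
    """Fixed-width SWATH Q1 isolation windows across a precursor range.

    Adjacent windows share a small overlap so precursors at window
    boundaries are not lost; the final window is truncated at `stop`.
    """
    windows = []
    low = start
    while low < stop:
        high = min(low + width, stop)
        windows.append((low, high))
        if high >= stop:
            break
        low = high - overlap  # step back by the overlap
    return windows

wins = swath_windows(400.0, 1200.0, 25.0)
print(f"{len(wins)} windows, first {wins[0]}, last {wins[-1]}")
```

Every MS2 spectrum acquired in a cycle is then a composite of all peptides whose precursors fall in one of these windows, which is exactly why narrowing the windows (at the cost of more steps per cycle) reduces interference.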
Although MS2 typically increases selectivity and reduces the chemical noise often observed in MS1 scans, quantifying peptides from SWATH MS2 scans can be problematic because of the presence of interferences in one or more fragment ions or decreased ion intensity of MS2 scans as compared with the MS1 precursor ion abundance.

To partially alleviate some of these limitations in SWATH MS2 scan quantitation, it is potentially advantageous to exploit MS1 ion intensity data, which are acquired independently as part of each SWATH scan cycle. Recently, our laboratories and others have developed label-free quantitation tools for data-dependent acquisitions (11, 12, 23) using MS1 ion intensity data. For example, the MS1 Filtering algorithm uses expanded features in the open source software application Skyline (11, 24). Skyline MS1 Filtering processes precursor ion intensity chromatograms of peptide analytes from full scan mass spectral data acquired during data-dependent acquisitions by LC-MS/MS. New graphical tools were developed within Skyline to enable visual inspection and manual interrogation and integration of extracted ion chromatograms across multiple acquisitions. MS1 Filtering was subsequently shown to have excellent linear response across several orders of magnitude with limits of detection in the low attomole range (11). We, and others, have demonstrated the utility of this method for carrying out large-scale quantitation of peptide analytes across a range of applications (25–28). However, quantifying peptides based on MS1 precursor ion intensities can be compromised by a low signal-to-noise ratio. This is particularly the case when quantifying low abundance peptides in a complex sample where the MS1 ion “background” signal is high, or when chromatograms contain interferences or partial overlap of multiple target precursor ions.

Currently, MS1 scans are underutilized or even deemphasized by some vendors during DIA workflows.
However, we believe an opportunity exists to improve data-independent acquisition (DIA) experiments by including MS1 ion intensity data in the final data processing of LC-MS/MS acquisitions. Therefore, to address this possibility, we have adapted Skyline to efficiently extract and process both precursor and product ion chromatograms for label-free quantitation across multiple samples. The graphical tools and features originally developed for SRM and MS1 Filtering experiments have been expanded to process DIA data sets from multiple vendors including SCIEX, Thermo, Waters, Bruker, and Agilent. These expanded features provide a single platform for data mining of targeted proteomics using both the MS1 and MS2 scans that we call Integrated Dual Scan Analysis, or IDSA. As a test of this approach, a series of SWATH MS2 acquisitions of simple and complex mixtures was analyzed on a SCIEX TripleTOF 5600 mass spectrometer. We also investigated the use of MS2 scans for differentiating a case of phosphopeptide isomers that are indistinguishable at the MS1 level. In addition, we investigated whether smaller SWATH m/z windows would provide more reliable quantitative data in these cases by reducing the number of potential interferences. Lastly, we performed a statistical assessment of the accuracy and reproducibility of the estimated (log) fold change of mitochondrial lysates from mouse liver at different concentration levels to better assess the overall value of acquiring MS1 and MS2 data in combination and as independent measurements during DIA experiments.
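The chromatogram-extraction step underlying both MS1 Filtering and MS2 fragment quantitation can be sketched as follows. The scan structure here is a deliberately simplified stand-in for a real reader (such as those in pymzML or pyteomics), and the scans, m/z values, and 20 ppm tolerance are invented for illustration; this is not Skyline's implementation.

```python
def extract_xic(scans, target_mz, ppm_tol=20.0):
    """Extract an ion chromatogram for one m/z from centroided scans.

    `scans` is a list of (retention_time, [(mz, intensity), ...]) tuples.
    At each time point, intensities of all peaks within the ppm tolerance
    of the target m/z are summed.
    """
    tol = target_mz * ppm_tol / 1e6
    xic = []
    for rt, peaks in scans:
        signal = sum(i for mz, i in peaks if abs(mz - target_mz) <= tol)
        xic.append((rt, signal))
    return xic

scans = [
    (10.0, [(500.2501, 1200.0), (731.4, 300.0)]),
    (10.5, [(500.2499, 4800.0)]),
    (11.0, [(500.2612, 900.0)]),  # ~22 ppm away: excluded
]
xic = extract_xic(scans, 500.2500)
print(xic)  # [(10.0, 1200.0), (10.5, 4800.0), (11.0, 0)]
```

The same routine applied to MS1 scans yields a precursor chromatogram and, applied to the SWATH MS2 scans of the matching isolation window, a fragment chromatogram, which is the essence of processing both scan types from a single acquisition.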

5.
Comprehensive proteomic profiling of biological specimens usually requires multidimensional chromatographic peptide fractionation prior to mass spectrometry. However, this approach can suffer from poor reproducibility because of the lack of standardization and automation of the entire workflow, thus compromising the performance of quantitative proteomic investigations. To address these variables we developed an online peptide fractionation system comprising a multiphasic liquid chromatography (LC) chip that integrates reversed phase and strong cation exchange chromatography upstream of the mass spectrometer (MS). We showed the superiority of this system for standardizing discovery and targeted proteomic workflows using cancer cell lysates and nondepleted human plasma. Five-step multiphase chip LC-MS/MS acquisition showed clear advantages over analyses of unfractionated samples by identifying more peptides, consuming less sample, and often improving the lower limits of quantitation, all in a highly reproducible, automated, online configuration. We further showed that multiphase chip LC fractionation provided a facile means to detect many N- and C-terminal peptides (including the acetylated N terminus) that are challenging to identify in complex tryptic peptide matrices because of less favorable ionization characteristics. Given that as much as 95% of peptides were detected in only a single salt fraction from cell lysates, we exploited this high reproducibility and coupled it with multiple reaction monitoring on a high-resolution MS instrument (MRM-HR). This approach increased target analyte peak area and improved lower limits of quantitation without negatively influencing variance or bias. Further, we showed a strategy to use multiphase LC chip fractionation LC-MS/MS for ion library generation to integrate with SWATH™ data-independent acquisition quantitative workflows.
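The single-fraction statistic quoted above is straightforward to compute from identification results: count how many distinct peptides appear in exactly one salt fraction. The identifications below are hypothetical (the sequences are common tryptic peptides used only as placeholders), not the study's data.

```python
from collections import defaultdict

# Hypothetical (fraction, peptide) identifications from a five-step
# multiphasic chip LC run; sequences are illustrative placeholders.
ids = [
    (1, "LVNELTEFAK"), (1, "YLYEIAR"),
    (2, "AEFVEVTK"), (2, "YLYEIAR"),  # YLYEIAR seen in two fractions
    (3, "HLVDEPQNLIK"), (4, "QTALVELLK"), (5, "LGEYGFQNALIVR"),
]

fractions_per_peptide = defaultdict(set)
for frac, pep in ids:
    fractions_per_peptide[pep].add(frac)

single = sum(1 for f in fractions_per_peptide.values() if len(f) == 1)
pct = 100.0 * single / len(fractions_per_peptide)
print(f"{pct:.0f}% of peptides confined to a single fraction")
```

A high value of this statistic is what makes the fractionation scheme usable for scheduled targeted acquisition: each target peptide only needs to be monitored in its one expected fraction.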
All MS data are available via ProteomeXchange with identifier PXD001464.

Mass spectrometry-based proteomic quantitation is an essential technique used for contemporary, integrative biological studies. Whether used in discovery experiments or for targeted biomarker applications, quantitative proteomic studies require high reproducibility at many levels: reproducible run-to-run peptide detection, reproducible peptide quantitation, reproducible depth of proteome coverage, and, ideally, a high degree of cross-laboratory analytical reproducibility. Mass spectrometry-centered proteomics has evolved steadily over the past decade and is now mature enough to derive extensive draft maps of the human proteome (1, 2). Nonetheless, a key requirement yet to be realized is to ensure that quantitative proteomics can be carried out in a timely manner while satisfying the aforementioned challenges associated with reproducibility. This is especially important for recent developments using data-independent MS quantitation and multiple reaction monitoring on high-resolution MS (MRM-HR), as both are highly dependent on LC peptide retention time reproducibility and precursor detectability while attempting to maximize proteome coverage (3). Strategies usually employed to increase the depth of proteome coverage utilize various sample fractionation methods, including gel-based separation, affinity enrichment or depletion, protein or peptide chemical modification-based enrichment, and various peptide chromatography methods, particularly ion exchange chromatography (4–10). In comparison to an unfractionated “naive” sample, the trade-off in using these enrichment/fractionation approaches is a higher risk of sample losses, the introduction of undesired chemical modifications (e.g. oxidation, deamidation, N-terminal lactam formation), the potential for result skewing and bias, and the considerable time and human resources required to perform the sample preparation tasks.
Online-coupled approaches aim to minimize those risks and address resource constraints. A widely practiced example of the benefits of online sample fractionation has been the decade-long use of combining strong cation exchange chromatography (SCX) with C18 reversed-phase (RP) for peptide fractionation (known as MudPIT, multidimensional protein identification technology), where SCX and RP are performed under the same buffer conditions and the SCX elution is performed with volatile organic cations compatible with reversed phase separation (11). This approach greatly increases analyte detection while avoiding sample handling losses. The MudPIT approach has been widely used for discovery proteomics (12–14), and we have previously shown that multiphasic separations also have utility for targeted proteomics when configured for selected reaction monitoring MS (SRM-MS). We showed substantial advantages of MudPIT-SRM-MS, with reduced ion suppression, increased peak areas, and lower limits of detection (LLOD) compared with conventional RP-SRM-MS (15).

To improve the reproducibility of proteomic workflows, increase throughput, and minimize sample loss, numerous microfluidic devices have been developed and integrated for proteomic applications (16, 17). These devices can broadly be classified into two groups: (1) microfluidic chips for peptide separation (18–25) and (2) proteome reactors that combine enzymatic processing with peptide-based fractionation (26–30). Because of the small dimensions of these devices, they readily integrate into nanoLC workflows. Various applications have been described, including increasing proteome coverage (22, 27, 28) and targeting of phosphopeptides (24, 31, 32), glycopeptides, and released glycans (29, 33, 34).

In this work, we set out to take advantage of the benefits of multiphasic peptide separations and address the reproducibility needs required for high-throughput comparative proteomics using a variety of workflows.
We integrated a multiphasic SCX and RP column in a “plug-and-play” microfluidic chip format for online fractionation, eliminating the need for users to make minimal dead-volume connections between traps and columns. We show the flexibility of this format in providing robust peptide separation and reproducibility across conventional and emerging mass spectrometry workflows. This was undertaken by coupling the multiphasic liquid chromatography (LC) chip to a fast-scanning Q-ToF mass spectrometer for data-dependent MS/MS, data-independent MS (SWATH), and targeted proteomics using MRM-HR, showing clear advantages for repeatable analyses compared with conventional proteomic workflows.  相似文献
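The reproducibility requirements discussed above can be made concrete with simple replicate metrics. The sketch below is illustrative only (not the authors' pipeline); all peptide values are hypothetical. It computes the coefficient of variation (CV) of a peptide's peak areas and its retention-time spread across replicate injections.

```python
# Illustrative sketch only (not the authors' pipeline): two simple
# run-to-run reproducibility metrics -- the coefficient of variation (CV)
# of a peptide's peak areas and its retention-time spread across replicates.
from statistics import mean, stdev

def cv_percent(values):
    """Sample coefficient of variation, in percent."""
    return 100.0 * stdev(values) / mean(values)

# Hypothetical peak areas and retention times for one peptide
# over four replicate injections.
areas = [1.02e6, 0.97e6, 1.05e6, 0.99e6]
rts = [42.1, 42.3, 42.0, 42.2]  # minutes

area_cv = cv_percent(areas)
rt_spread = max(rts) - min(rts)
print(f"peak-area CV: {area_cv:.2f}%, RT spread: {rt_spread:.2f} min")
```

In practice such per-peptide CVs would be aggregated over thousands of peptides to compare workflows or laboratories.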

6.
A complete understanding of the biological functions of large signaling peptides (>4 kDa) requires comprehensive characterization of their amino acid sequences and post-translational modifications, which presents significant analytical challenges. In the past decade, there has been great success with mass spectrometry-based de novo sequencing of small neuropeptides. However, these approaches are less applicable to larger neuropeptides because of the inefficient fragmentation of peptides larger than 4 kDa and their lower endogenous abundance. The conventional proteomics approach focuses on large-scale determination of protein identities via database searching, lacking the ability for in-depth elucidation of individual amino acid residues. Here, we present a multifaceted MS approach for identification and characterization of large crustacean hyperglycemic hormone (CHH)-family neuropeptides, a class of peptide hormones that play central roles in the regulation of many important physiological processes of crustaceans. Six crustacean CHH-family neuropeptides (8–9.5 kDa), including two novel peptides with extensive disulfide linkages and PTMs, were fully sequenced without reference to genomic databases. High-definition de novo sequencing was achieved by a combination of bottom-up, off-line top-down, and on-line top-down tandem MS methods. Statistical evaluation indicated that these methods provided complementary information for sequence interpretation and increased the local identification confidence of each amino acid. Further investigations by MALDI imaging MS mapped the spatial distribution and colocalization patterns of various CHH-family neuropeptides in the neuroendocrine organs, revealing that the two CHH subfamilies are involved in distinct signaling pathways. Neuropeptides and hormones comprise a diverse class of signaling molecules involved in numerous essential physiological processes, including analgesia, reward, food intake, learning, and memory (1). 
Disorders of the neurosecretory and neuroendocrine systems influence many pathological processes. For example, obesity results from a failure of energy homeostasis in association with endocrine alterations (2, 3). Previous work from our lab, using crustaceans as model organisms, found that multiple neuropeptides are implicated in the control of food intake, including RFamides, tachykinin-related peptides, RYamides, and pyrokinins (4–6). Crustacean hyperglycemic hormone (CHH) family neuropeptides play a central role in the energy homeostasis of crustaceans (7–17). The hyperglycemic response of the CHHs was first reported after injection of crude eyestalk extract in crustaceans. Based on their preprohormone organization, the CHH family can be grouped into two subfamilies: subfamily-I, containing CHH, and subfamily-II, containing molt-inhibiting hormone (MIH) and mandibular organ-inhibiting hormone (MOIH). The preprohormones of subfamily-I have a CHH precursor-related peptide (CPRP) that is cleaved off during processing, whereas the preprohormones of subfamily-II lack the CPRP (9). Uncovering their physiological functions will provide new insights into the neuroendocrine regulation of energy homeostasis. Characterization of CHH-family neuropeptides is challenging. They comprise more than 70 amino acids and often contain multiple post-translational modifications (PTMs) and complex disulfide bridge connections (7). In addition, physiological concentrations of these peptide hormones are typically below the picomolar level, and most crustacean species do not have available genome and proteome databases to assist MS-based sequencing. MS-based neuropeptidomics provides a powerful tool for rapid discovery and analysis of a large number of endogenous peptides from the brain and the central nervous system. Our group and others have greatly expanded the peptidomes of many model organisms (3, 18–33). 
For example, we have discovered more than 200 neuropeptides in a simple crustacean model system, with several neuropeptide families consisting of as many as 20–40 members (5, 6, 25–31, 34). However, the majority of these neuropeptides are small peptides of 5–15 amino acid residues, leaving a gap in the identification of larger signaling peptides from organisms without sequenced genomes. The observed lack of larger peptide hormones can be attributed to the lack of effective de novo sequencing strategies for neuropeptides larger than 4 kDa, which are inherently more difficult to fragment using conventional techniques (34–37). Although classical proteomics studies examine larger proteins, these tools are limited to identification based on database searching, with one or more peptides matching but without complete amino acid sequence coverage (36, 38). Large populations of neuropeptides of 4–10 kDa exist in the nervous systems of both vertebrates and invertebrates (9, 39, 40). Understanding their functional roles requires sufficient molecular knowledge and a unique analytical approach. Therefore, developing effective and reliable methods for de novo sequencing of large neuropeptides at the individual amino acid residue level fills an urgent gap in neurobiology. In this study, we present a multifaceted MS strategy aimed at high-definition de novo sequencing and comprehensive characterization of the CHH-family neuropeptides in the crustacean central nervous system. The high-definition de novo sequencing was achieved by a combination of three methods: (1) enzymatic digestion and LC-tandem mass spectrometry (MS/MS) bottom-up analysis to generate detailed sequences of proteolytic peptides; (2) off-line LC fractionation and subsequent top-down MS/MS to obtain high-quality fragmentation maps of intact peptides; and (3) on-line LC coupled to top-down MS/MS to allow rapid sequence analysis of low-abundance peptides. 
Combining the three methods overcomes the limitations of each, and thus offers complementary and high-confidence determination of amino acid residues. We report the complete sequence analysis of six CHH-family neuropeptides including the discovery of two novel peptides. With the accurate molecular information, MALDI imaging and ion mobility MS were conducted for the first time to explore their anatomical distribution and biochemical properties.  相似文献   
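The complementarity argument above can be illustrated with a toy calculation (hypothetical coverage intervals, not data from this study): a residue's local identification confidence grows with the number of methods whose fragment-ion coverage brackets it.

```python
# Toy illustration (hypothetical intervals, not data from this study) of why
# combining bottom-up, off-line top-down, and on-line top-down MS increases
# per-residue confidence: count how many methods' fragment coverage brackets
# each residue position.
def residue_support(length, intervals_per_method):
    support = [0] * length
    for intervals in intervals_per_method:
        for start, end in intervals:  # 0-based, end-exclusive
            for i in range(start, end):
                support[i] += 1
    return support

# Hypothetical coverage of a 10-residue stretch by the three methods.
bottom_up  = [(0, 6)]
offline_td = [(3, 10)]
online_td  = [(0, 4), (7, 10)]
support = residue_support(10, [bottom_up, offline_td, online_td])
print(support)
```

Positions supported by only one method (here, position 6) are exactly where a single-workflow assignment would be least trustworthy.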

7.
Understanding how a small brain region, the suprachiasmatic nucleus (SCN), can synchronize the body's circadian rhythms is an ongoing research area. This important time-keeping system requires a complex suite of peptide hormones and transmitters that remain incompletely characterized. Here, capillary liquid chromatography and FTMS have been coupled with tailored software for the analysis of endogenous peptides present in the SCN of the rat brain. After ex vivo processing of brain slices, peptide extraction, identification, and characterization from tandem FTMS data with <5-ppm mass accuracy produced a hyperconfident list of 102 endogenous peptides, including 33 previously unidentified peptides and 12 peptides that were post-translationally modified with amidation, phosphorylation, pyroglutamylation, or acetylation. This characterization of endogenous peptides from the SCN will aid in understanding the molecular mechanisms that mediate rhythmic behaviors in mammals. Central nervous system neuropeptides function in cell-to-cell signaling and are involved in many physiological processes such as circadian rhythms, pain, hunger, feeding, and body weight regulation (1–4). Neuropeptides are produced from larger protein precursors by the selective action of endopeptidases, which cleave at mono- or dibasic sites and then remove the C-terminal basic residues (1, 2). Some neuropeptides undergo functionally important post-translational modifications (PTMs), including amidation, phosphorylation, pyroglutamylation, or acetylation. These aspects of peptide synthesis impact the properties of neuropeptides, further expanding their diverse physiological implications. Therefore, unveiling new peptides and unreported peptide properties is critical to advancing our understanding of nervous system function. Historically, the analysis of neuropeptides was performed by Edman degradation, in which the N-terminal amino acid is sequentially removed. 
However, analysis by this method is slow and does not allow for sequencing of peptides containing N-terminal PTMs (5). Immunological techniques, such as radioimmunoassay and immunohistochemistry, are used for measuring relative peptide levels and spatial localization, but these methods only detect peptide sequences with known structure (6). More direct, high-throughput methods of analyzing brain regions can be used. Mass spectrometry, a rapid and sensitive method that has been used for the analysis of complex biological samples, can detect and identify the precise forms of neuropeptides without prior knowledge of peptide identity; these approaches make up the field of peptidomics (7–12). Direct tissue and single-neuron analysis by MALDI MS has enabled the discovery of hundreds of neuropeptides in the last decade, and neuronal homogenate analysis by fractionation and subsequent ESI or MALDI MS has yielded an equivalent number of new brain peptides (5). Several recent peptidome studies, including the work by Dowell et al. (10), have used the specificity of FTMS for peptide discovery (10, 13–15). Here, we combine the ability to fragment ions at ultrahigh mass accuracy (16) with a software pipeline designed for neuropeptide discovery. We use nanocapillary reversed-phase LC coupled to 12-tesla FTMS for the analysis of peptides present in the suprachiasmatic nucleus (SCN) of the rat brain. A relatively small, paired brain nucleus located at the base of the hypothalamus directly above the optic chiasm, the SCN contains a biological clock that generates circadian rhythms in behaviors and homeostatic functions (17, 18). The SCN comprises ∼10,000 cellular clocks that are integrated as a tissue-level clock which, in turn, orchestrates circadian rhythms throughout the brain and body. 
It is sensitive to incoming signals from the light-sensing retina and other brain regions, which cause temporal adjustments that align the SCN appropriately with changes in environmental or behavioral state. Previous physiological studies have implicated peptides as critical synchronizers of normal SCN function as well as mediators of SCN inputs, internal signal processing, and outputs; however, only a small number of peptides have been identified and explored in the SCN, leaving unresolved many circadian mechanisms that may involve peptide function. Most peptide expression in the SCN has only been studied through indirect antibody-based techniques (19–29), although we recently used MS approaches to characterize several peptides detected in SCN releasates (30). Previous studies indicate that the SCN expresses a rich diversity of peptides relative to other brain regions studied with the same techniques. Previously used immunohistochemical approaches are not only inadequate for comprehensively evaluating PTMs and alternate isoforms of known peptides but are also incapable of exhaustively examining the full peptide complement of this complex biological network of peptidergic inputs and intrinsic components. A comprehensive study of SCN peptidomics is required that utilizes high-resolution strategies for directly analyzing the peptide content of the neuronal networks comprising the SCN. In our study, the SCN was obtained from ex vivo coronal brain slices via tissue punch and subjected to multistage peptide extraction. The SCN tissue extract was analyzed by FTMS/MS, and the high-resolution MS and MS/MS data were processed using ProSightPC 2.0 (16), which allows the identification and characterization of peptides or proteins from high-mass-accuracy MS/MS data. In addition, the Sequence Gazer included in ProSightPC was used for manually determining PTMs (31, 32). 
As a result, a total of 102 endogenous peptides were identified, including 33 that were previously unidentified, and 12 PTMs (including amidation, phosphorylation, pyroglutamylation, and acetylation) were found. The present study is the first comprehensive peptidomics study identifying peptides present within the mammalian SCN. In fact, this is one of the first peptidome studies to work with discrete brain nuclei as opposed to larger brain structures, and it follows up on our recent report using an LC-ion trap for analysis of the peptides in the supraoptic nucleus (33); here, the use of FTMS allows a greater range of PTMs to be confirmed and allows higher confidence in the peptide assignments. This information on the peptides in the SCN will serve as a basis for more exhaustively exploring the extent to which previously unreported SCN neuropeptides may function in SCN regulation of mammalian circadian physiology.  相似文献
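The <5-ppm accuracy criterion behind a "hyperconfident" list reduces to a simple mass-error filter. A minimal sketch (candidate masses are hypothetical, and this is not the ProSightPC implementation):

```python
# Minimal sketch of a <5-ppm mass-accuracy filter for candidate
# identifications (hypothetical masses; not the ProSightPC implementation).
def ppm_error(observed, theoretical):
    return 1e6 * (observed - theoretical) / theoretical

def within_tolerance(observed, theoretical, tol_ppm=5.0):
    return abs(ppm_error(observed, theoretical)) <= tol_ppm

theoretical = 1084.5432  # Da, hypothetical monoisotopic peptide mass
print(within_tolerance(1084.5478, theoretical))  # ~4.2 ppm: retained
print(within_tolerance(1084.5550, theoretical))  # ~10.9 ppm: rejected
```

At 5 ppm the allowed window on a ~1 kDa peptide is only about ±0.005 Da, which is what makes the resulting identification list "hyperconfident".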

8.
Top-down mass spectrometry (MS)-based proteomics is arguably a disruptive technology for the comprehensive analysis of all proteoforms arising from genetic variation, alternative splicing, and posttranslational modifications (PTMs). However, the complexity of top-down high-resolution mass spectra presents a significant challenge for data analysis. In contrast to the well-developed software packages available for data analysis in bottom-up proteomics, the data analysis tools in top-down proteomics remain underdeveloped. Moreover, despite recent efforts to develop algorithms and tools for the deconvolution of top-down high-resolution mass spectra and the identification of proteins from complex mixtures, a multifunctional software platform, which allows for the identification, quantitation, and characterization of proteoforms with visual validation, is still lacking. Herein, we have developed MASH Suite Pro, a comprehensive software tool for top-down proteomics with multifaceted functionality. MASH Suite Pro is capable of processing high-resolution MS and tandem MS (MS/MS) data using two deconvolution algorithms to optimize protein identification results. In addition, MASH Suite Pro allows for the characterization of PTMs and sequence variations, as well as the relative quantitation of multiple proteoforms in different experimental conditions. The program also provides visualization components for validation and correction of the computational outputs. Furthermore, MASH Suite Pro facilitates data reporting and presentation via direct output of the graphics. Thus, MASH Suite Pro significantly simplifies and speeds up the interpretation of high-resolution top-down proteomics data by integrating tools for protein identification, quantitation, characterization, and visual validation into a customizable and user-friendly interface. 
We envision that MASH Suite Pro will play an integral role in advancing the burgeoning field of top-down proteomics. With well-developed algorithms and computational tools for mass spectrometry (MS) data analysis, peptide-based bottom-up proteomics has gained considerable popularity in the field of systems biology (1–9). Nevertheless, the bottom-up approach is suboptimal for the analysis of protein posttranslational modifications (PTMs) and sequence variants as a result of protein digestion (10). Alternatively, the protein-based top-down proteomics approach analyzes intact proteins, which provides a “bird's-eye” view of all proteoforms (11), including those arising from sequence variations, alternative splicing, and diverse PTMs, making it a disruptive technology for the comprehensive analysis of proteoforms (12–24). However, the complexity of top-down high-resolution mass spectra presents a significant challenge for data analysis. In contrast to the well-developed software packages available for processing data from bottom-up proteomics experiments, the data analysis tools in top-down proteomics remain underdeveloped. The initial step in the analysis of top-down proteomics data is deconvolution of high-resolution mass and tandem mass spectra. Thorough High-Resolution Analysis of Spectra by Horn (THRASH), the first algorithm developed for the deconvolution of high-resolution mass spectra (25), is still widely used. THRASH automatically detects and evaluates individual isotopomer envelopes by comparing the experimental isotopomer envelope with a theoretical envelope and reporting those that score higher than a user-defined threshold. Another commonly used algorithm, MS-Deconv, utilizes a combinatorial approach to address the difficulty of grouping MS peaks from overlapping isotopomer envelopes (26). 
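The envelope-comparison step described above can be sketched with a simplified stand-in (not the published THRASH algorithm; intensities are hypothetical): score an observed isotopomer envelope against a theoretical one and keep candidates above a user-defined threshold.

```python
# Simplified stand-in for the envelope-comparison idea in THRASH-style
# deconvolution (not the published algorithm): score an observed isotopomer
# envelope against a theoretical one with a normalized dot product and keep
# candidates scoring above a user-defined threshold.
import math

def envelope_score(observed, theoretical):
    dot = sum(o * t for o, t in zip(observed, theoretical))
    norm = math.sqrt(sum(o * o for o in observed)) * \
           math.sqrt(sum(t * t for t in theoretical))
    return dot / norm

# Hypothetical relative intensities of the first four isotope peaks.
theo = [1.00, 0.82, 0.41, 0.15]
obs = [0.97, 0.85, 0.38, 0.18]
score = envelope_score(obs, theo)
print(f"envelope fit score: {score:.3f}")
```

A real deconvolution routine would additionally fit the charge state and monoisotopic mass; the score shown is only the goodness-of-fit piece of that decision.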
More recently, UniDec, which employs a Bayesian approach to separate the mass and charge dimensions (27), has also been applied to the deconvolution of high-resolution spectra. Although these algorithms assist in data processing, the deconvolution results often contain a considerable number of misassigned peaks as a consequence of the complexity of the high-resolution MS and MS/MS data generated in top-down proteomics experiments. Errors such as these can undermine the accuracy of protein identification and PTM localization and thus necessitate the implementation of visual components that allow for the validation and manual correction of the computational outputs. Following spectral deconvolution, a typical top-down proteomics workflow incorporates identification, quantitation, and characterization of proteoforms; however, most of the recently developed data analysis tools for top-down proteomics, including ProSightPC (28, 29), Mascot Top Down (also known as Big-Mascot) (30), MS-TopDown (31), and MS-Align+ (32), focus almost exclusively on protein identification. ProSightPC was the first software tool specifically developed for top-down protein identification. This software utilizes “shotgun annotated” databases (33) that include all possible proteoforms containing user-defined modifications. Consequently, ProSightPC is not optimized for identifying PTMs that are not defined by the user. Additionally, the inclusion of all possible modified forms within the database dramatically increases the size of the database and thus limits the search speed (32). Mascot Top Down (30) is based on standard Mascot but enables database searching using a higher mass limit for the precursor ions (up to 110 kDa), which allows for the identification of intact proteins. Protein identification using Mascot Top Down is fundamentally similar to that used in bottom-up proteomics (34) and is therefore somewhat limited in terms of identifying unexpected PTMs. 
MS-TopDown (31) employs the spectral alignment algorithm (35), which matches top-down tandem mass spectra to proteins in the database without prior knowledge of the PTMs. Nevertheless, MS-TopDown lacks statistical evaluation of the search results and performs slowly when searching against large databases. MS-Align+ also utilizes spectral alignment for top-down protein identification (32). It is capable of identifying unexpected PTMs and allows for efficient filtering of candidate proteins when the top-down spectra are searched against a large protein database. MS-Align+ also provides statistical evaluation for the selection of proteoform spectrum matches (PrSMs) with high confidence. More recently, Top-Down Mass Spectrometry Based Proteoform Identification and Characterization (TopPIC) was developed (http://proteomics.informatics.iupui.edu/software/toppic/index.html). TopPIC is an updated version of MS-Align+ with increased spectral alignment speed and reduced computing requirements. In addition, MSPathFinder, developed by Kim et al., also allows for the rapid identification of proteins from top-down tandem mass spectra (http://omics.pnl.gov/software/mspathfinder) using spectral alignment. Although software tools employing spectral alignment, such as MS-Align+ and MSPathFinder, are particularly useful for top-down protein identification, these programs operate via the command line, making them difficult to use for those with limited knowledge of command syntax. Recently, new software tools have been developed for proteoform characterization (36, 37). Our group previously developed MASH Suite, a user-friendly interface for the processing, visualization, and validation of high-resolution MS and MS/MS data (36). Another software tool, ProSight Lite, developed recently by the Kelleher group (37), also allows characterization of protein PTMs. However, both of these software tools require prior knowledge of the protein sequence for the effective localization of PTMs. 
In addition, neither software tool can process data from liquid chromatography (LC)-MS and LC-MS/MS experiments, which limits their usefulness in large-scale top-down proteomics. Thus, despite these recent efforts, a multifunctional software platform enabling identification, quantitation, and characterization of proteins from top-down spectra, as well as visual validation and data correction, is still lacking. Herein, we report the development of MASH Suite Pro, an integrated software platform designed to incorporate tools for protein identification, quantitation, and characterization into a single comprehensive package for the analysis of top-down proteomics data. This program contains a user-friendly, customizable interface similar to the previously developed MASH Suite (36) but also has a number of new capabilities, including the ability to handle complex proteomics datasets from LC-MS and LC-MS/MS experiments, as well as the ability to identify unknown proteins and PTMs using MS-Align+ (32). Importantly, MASH Suite Pro also provides visualization components for the validation and correction of the computational outputs, which ensures accurate and reliable deconvolution of the spectra and localization of PTMs and sequence variations.  相似文献

9.
Knowledge of elaborate structures of protein complexes is fundamental for understanding their functions and regulations. Although cross-linking coupled with mass spectrometry (MS) has been presented as a feasible strategy for structural elucidation of large multisubunit protein complexes, this method has proven challenging because of technical difficulties in unambiguous identification of cross-linked peptides and determination of cross-linked sites by MS analysis. In this work, we developed a novel cross-linking strategy using a newly designed MS-cleavable cross-linker, disuccinimidyl sulfoxide (DSSO). DSSO contains two symmetric collision-induced dissociation (CID)-cleavable sites that allow effective identification of DSSO-cross-linked peptides based on their distinct fragmentation patterns unique to cross-linking types (i.e. interlink, intralink, and dead end). The CID-induced separation of interlinked peptides in MS/MS permits MS3 analysis of single peptide chain fragment ions with defined modifications (due to DSSO remnants) for easy interpretation and unambiguous identification using existing database searching tools. Integration of data analyses from three generated data sets (MS, MS/MS, and MS3) allows high confidence identification of DSSO cross-linked peptides. The efficacy of the newly developed DSSO-based cross-linking strategy was demonstrated using model peptides and proteins. In addition, this method was successfully used for structural characterization of the yeast 20 S proteasome complex. In total, 13 non-redundant interlinked peptides of the 20 S proteasome were identified, representing the first application of an MS-cleavable cross-linker for the characterization of a multisubunit protein complex. 
Given its effectiveness and simplicity, this cross-linking strategy can find a broad range of applications in elucidating the structural topology of proteins and protein complexes. Proteins form stable and dynamic multisubunit complexes under different physiological conditions to maintain cell viability and normal cell homeostasis. Detailed knowledge of protein interactions and protein complex structures is fundamental to understanding how individual proteins function within a complex and how the complex functions as a whole. However, structural elucidation of large multisubunit protein complexes has been difficult because of a lack of technologies that can effectively handle their dynamic and heterogeneous nature. Traditional methods such as nuclear magnetic resonance (NMR) analysis and X-ray crystallography can yield detailed information on protein structures; however, NMR spectroscopy requires large quantities of pure protein in a specific solvent, whereas X-ray crystallography is often limited by the crystallization process. In recent years, chemical cross-linking coupled with mass spectrometry (MS) has become a powerful method for studying protein interactions (1–3). Chemical cross-linking stabilizes protein interactions through the formation of covalent bonds and allows the detection of stable, weak, and/or transient protein-protein interactions in native cells or tissues (4–9). In addition to capturing protein interacting partners, many studies have shown that chemical cross-linking can yield low-resolution structural information about the constraints within a molecule (2, 3, 10) or protein complex (11–13). The application of chemical cross-linking, enzymatic digestion, and subsequent mass spectrometric and computational analyses for the elucidation of three-dimensional protein structures offers distinct advantages over traditional methods because of its speed, sensitivity, and versatility. 
Identification of cross-linked peptides provides distance constraints that aid in constructing the structural topology of proteins and/or protein complexes. Although this approach has been successful, effective detection and accurate identification of cross-linked peptides, as well as unambiguous assignment of cross-linked sites, remain extremely challenging because of their low abundance and complicated fragmentation behavior in MS analysis (2, 3, 10, 14). Therefore, new reagents and methods are urgently needed to allow unambiguous identification of cross-linked products and to improve the speed and accuracy of data analysis to facilitate application in the structural elucidation of large protein complexes. A number of approaches have been developed to facilitate MS detection of low-abundance cross-linked peptides from complex mixtures. These include selective enrichment using affinity purification with biotinylated cross-linkers (15–17) and click chemistry with alkyne-tagged (18) or azide-tagged (19, 20) cross-linkers. In addition, Staudinger ligation has recently been shown to be effective for selective enrichment of azide-tagged cross-linked peptides (21). Apart from enrichment, detection of cross-linked peptides can be achieved with isotope-labeled (22–24), fluorescently labeled (25), and mass-tag-labeled cross-linking reagents (16, 26). These methods can identify cross-linked peptides by MS analysis, but interpretation of the data generated from interlinked peptides (two peptides connected by the cross-link) by automated database searching remains difficult. Several bioinformatics tools have thus been developed to interpret MS/MS data and determine interlinked peptide sequences from complex mixtures (12, 14, 27–32). Although promising, further developments are still needed to make such data analyses as robust and reliable as analyzing MS/MS data of single peptide sequences using existing database searching tools (e.g. 
Protein Prospector, Mascot, or SEQUEST). Various types of cleavable cross-linkers with distinct chemical properties have been developed to facilitate MS identification and characterization of cross-linked peptides. These include UV-photocleavable (33), chemically cleavable (19), isotopically coded cleavable (24), and MS-cleavable reagents (16, 26, 34–38). MS-cleavable cross-linkers have received considerable attention because the resulting cross-linked products can be identified based on their characteristic fragmentation behavior observed during MS analysis. Gas-phase cleavage sites result in the detection of a “reporter” ion (26), single peptide chain fragment ions (35–38), or both reporter and fragment ions (16, 34). In each case, further structural characterization of the peptide product ions generated during the cleavage reaction can be accomplished by subsequent MSn analysis. Among these linkers, the “fixed charge” sulfonium ion-containing cross-linker developed by Lu et al. (37) appears to be the most attractive, as it allows specific and selective fragmentation of cross-linked peptides regardless of their charge and amino acid composition, based on studies with model peptides. Despite the availability of multiple types of cleavable cross-linkers, most applications have been limited to the study of model peptides and single proteins. Additionally, complicated synthesis and fragmentation patterns have impeded most of the known MS-cleavable cross-linkers from wide adoption by the community. Here we describe the design and characterization of a novel and simple MS-cleavable cross-linker, DSSO, and its application to model peptides and proteins and to the yeast 20 S proteasome complex. In combination with new software developed for data integration, we were able to identify DSSO-cross-linked peptides from complex peptide mixtures with speed and accuracy. 
Given its effectiveness and simplicity, we anticipate a broader application of this MS-cleavable cross-linker in the study of structural topology of other protein complexes using cross-linking and mass spectrometry.  相似文献   
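The distinct fragmentation pattern that makes DSSO interlinks recognizable reduces to a simple mass relationship. The sketch below uses the alkene and sulfenic acid remnant masses commonly quoted for DSSO chemistry; the peptide masses are hypothetical, and this is an illustration rather than the authors' software.

```python
# Sketch of the mass relationships that make DSSO interlinks recognizable.
# On CID, the sulfoxide cleaves so that one peptide retains an alkene (A)
# remnant and the other a sulfenic acid (S) remnant; the remnant masses
# below are the values commonly quoted for DSSO chemistry, and the peptide
# masses are hypothetical.
ALKENE = 54.0106      # Da
SULFENIC = 103.9932   # Da
SPACER = ALKENE + SULFENIC  # intact DSSO bridge, ~158.0038 Da

def expected_fragment_pairs(mass_a, mass_b):
    """Neutral masses of the two complementary MS2 fragment pairs."""
    return [(mass_a + ALKENE, mass_b + SULFENIC),
            (mass_a + SULFENIC, mass_b + ALKENE)]

m1, m2 = 1025.532, 1480.761          # hypothetical peptide masses (Da)
precursor = m1 + m2 + SPACER         # interlinked precursor mass
pairs = expected_fragment_pairs(m1, m2)
# Each complementary pair sums back to the precursor mass -- the signature
# used to classify a spectrum as an interlink rather than a dead end.
print(precursor, pairs)
```

The separated single-chain fragments, each carrying a defined remnant modification, are what make the subsequent MS3 spectra searchable with standard database tools.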

10.
The success of high-throughput proteomics hinges on the ability of computational methods to identify peptides from tandem mass spectra (MS/MS). However, a common limitation of most peptide identification approaches is the nearly ubiquitous assumption that each MS/MS spectrum is generated from a single peptide. We propose a new computational approach for the identification of mixture spectra generated from more than one peptide. Capitalizing on the growing availability of large libraries of single-peptide spectra (spectral libraries), our quantitative approach is able to identify up to 98% of all mixture spectra from equally abundant peptides and automatically adjust to varying abundance ratios of up to 10:1. Furthermore, we show how theoretical bounds on spectral similarity avoid the need to compare each experimental spectrum against all possible combinations of candidate peptides (achieving speedups of over five orders of magnitude) and demonstrate that mixture spectra can be identified in a matter of seconds against proteome-scale spectral libraries. Although our approach was developed for and is demonstrated on peptide spectra, we argue that the generality of the methods allows for their direct application to other types of spectral libraries and mixture spectra. The success of tandem MS (MS/MS) approaches to peptide identification is partly due to advances in computational techniques allowing for the reliable interpretation of MS/MS spectra. Mainstream computational techniques fall mainly into two categories: database search approaches that score each spectrum against peptides in a sequence database (1–4) and de novo techniques that directly reconstruct the peptide sequence from each spectrum (5–8). 
The combination of these methods with advances in high-throughput MS/MS has promoted the accelerated growth of spectral libraries: collections of peptide MS/MS spectra whose identifications were validated by accepted statistical methods (9, 10) and often also manually confirmed by mass spectrometry experts. The similar concept of spectral archives was recently proposed to denote spectral libraries that also include “interesting” nonidentified spectra (11) (i.e. recurring spectra with good de novo reconstructions but no database match). The growing availability of these large collections of MS/MS spectra has reignited the development of alternative peptide identification approaches based on spectral matching (12–14) and alignment (15–17) algorithms. However, mainstream approaches were developed under the (often unstated) assumption that each MS/MS spectrum is generated from a single peptide. Although chromatographic procedures greatly contribute to making this a reasonable assumption, there are several situations where it is difficult or even impossible to separate pairs of peptides. Examples include certain permutations of the peptide sequence or post-translational modifications (see (18) for examples of co-eluting histone modification variants). In addition, innovative experimental setups have demonstrated the potential for increased throughput in peptide identification using mixture spectra; examples include data-independent acquisition (19), ion-mobility MS (20), and MSE strategies (21). To alleviate the algorithmic bottleneck in such scenarios, we describe a computational approach, M-SPLIT (mixture-spectrum partitioning using library of identified tandem mass spectra), that is able to reliably and efficiently identify peptides from mixture spectra generated from a pair of peptides. In brief, a mixture spectrum is modeled as a linear combination of two single-peptide spectra, and peptide identification is performed by searching against a spectral library. 
We show that efficient filtration and accurate branch-and-bound strategies can be used to avoid the huge computational cost of searching all possible pairs. Thus equipped, our approach is able to identify the correct matches by considering only a minuscule fraction of all possible matches. Beyond potentially enhancing the identification capabilities of current MS/MS acquisition setups, we argue that the availability of methods to reliably identify MS/MS spectra from mixtures of peptides could enable the collection of MS/MS data using accelerated chromatography setups to obtain the same or better peptide identification results in a fraction of the experimental time currently required for exhaustive peptide separation.
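The core model above, a mixture spectrum expressed as a linear combination of two single-peptide library spectra, admits a closed-form least-squares fit once spectra are represented on a shared m/z grid. The sketch below illustrates that idea only; the function name, the dense-vector representation, and the clamping of the mixing coefficient are assumptions for illustration, not M-SPLIT's actual implementation.

```python
import numpy as np

def best_mixture_fit(mix, a, b):
    """Fit mix ~= alpha*a + (1-alpha)*b by least squares and return
    (alpha, cosine similarity between the fitted model and mix).

    Minimizing ||(mix - b) - alpha*(a - b)||^2 gives the closed form
    alpha = (mix - b).(a - b) / ||a - b||^2, clamped to a physical ratio."""
    d = a - b
    denom = float(d @ d)
    alpha = 0.5 if denom == 0 else float((mix - b) @ d) / denom
    alpha = min(1.0, max(0.0, alpha))  # keep the mixing ratio in [0, 1]
    model = alpha * a + (1 - alpha) * b
    cos = float(mix @ model) / (np.linalg.norm(mix) * np.linalg.norm(model))
    return alpha, cos
```

In a library search, this fit would be scored for candidate pairs (a, b), with the branch-and-bound filtration described above pruning pairs whose similarity bound cannot beat the current best match.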

11.
The use of ultraviolet photodissociation (UVPD) for the activation and dissociation of peptide anions is evaluated for broader coverage of the proteome. To facilitate interpretation and assignment of the resulting UVPD mass spectra of peptide anions, the MassMatrix database search algorithm was modified to allow automated analysis of negative polarity MS/MS spectra. The new UVPD algorithms were built on the MassMatrix database search engine by adding specific fragmentation pathways for UVPD. The new UVPD fragmentation pathways in MassMatrix were rigorously and statistically optimized using two large data sets with high mass accuracy and high mass resolution for both MS1 and MS2 data acquired on an Orbitrap mass spectrometer for complex Halobacterium and HeLa proteome samples. Negative mode UVPD led to the identification of 3663 and 2350 peptides for the Halobacterium and HeLa tryptic digests, respectively, corresponding to 655 and 645 peptides that were unique when compared with electron transfer dissociation (ETD), higher-energy collisional dissociation, and collision-induced dissociation results for the same digests analyzed in the positive mode. In sum, 805 and 619 proteins were identified via UVPD for the Halobacterium and HeLa samples, respectively, with 49 and 50 unique proteins identified in contrast to the more conventional MS/MS methods. The algorithm also features automated charge determination for low mass accuracy data, precursor filtering (including intact charge-reduced peaks), and the ability to combine both positive and negative MS/MS spectra into a single search, and it is freely available to the public. The accuracy and specificity of the MassMatrix UVPD search algorithm were also assessed for low resolution, low mass accuracy data on a linear ion trap. 
Analysis of a known mixture of three mitogen-activated kinases yielded similar sequence coverage percentages for UVPD of peptide anions versus conventional collision-induced dissociation of peptide cations, and when these methods were combined into a single search, an increase of up to 13% sequence coverage was observed for the kinases. The ability to sequence peptide anions and cations in alternating scans in the same chromatographic run was also demonstrated. Because ETD has a significant bias toward identifying highly basic peptides, negative UVPD was used to improve the identification of the more acidic peptides in conjunction with positive ETD for the more basic species. In this case, tryptic peptides from the cytosolic section of HeLa cells were analyzed by polarity switching nanoLC-MS/MS utilizing ETD for cation sequencing and UVPD for anion sequencing. Relative to searching using ETD alone, positive/negative polarity switching significantly improved sequence coverages across identified proteins, resulting in a 33% increase in unique peptide identifications and more than twice the number of peptide spectral matches.

The advent of new high-performance tandem mass spectrometers equipped with the most versatile collision- and electron-based activation methods and ever more powerful database search algorithms has catalyzed tremendous progress in the field of proteomics (1–4). Despite these advances in instrumentation and methodologies, there are few methods that fully exploit the information available from the acidic proteome or acidic regions of proteins. Typical high-throughput, bottom-up workflows consist of the chromatographic separation of complex mixtures of digested proteins followed by online mass spectrometry (MS) and MSn analysis. This bottom-up approach remains the most popular strategy for protein identification, biomarker discovery, quantitative proteomics, and elucidation of post-translational modifications. 
To date, proteome characterization via mass spectrometry has overwhelmingly focused on the analysis of peptide cations (5), resulting in an inherent bias toward basic peptides that easily ionize under acidic mobile phase conditions and positive polarity MS settings. Given that ∼50% of peptides/proteins are naturally acidic (6) and that many of the most important post-translational modifications (e.g. phosphorylation, acetylation, sulfonation, etc.) significantly decrease the isoelectric points of peptides (7, 8), there is a compelling need for better analytical methodologies for characterization of the acidic proteome.

A principal reason for the shortage of methods for peptide anion characterization is the lack of MS/MS techniques suitable for the efficient and predictable dissociation of peptide anions. Although there is a growing array of new ion activation methods for the dissociation of peptides, most have been developed for the analysis of positively charged peptides. Collision-induced dissociation (CID) of peptide anions, for example, often yields unpredictable or uninformative fragmentation behavior, with spectra dominated by neutral losses from both precursor and product ions (9), resulting in insufficient peptide sequence information. The two most promising new electron-based methods, electron-capture dissociation and electron-transfer dissociation (ETD), are applicable only to positively charged ions, not to anions (10–13). Because of the known inadequacy of CID and the lack of feasibility of electron-capture dissociation and ETD for peptide anion sequencing, several alternative MSn methods have been developed recently. Electron detachment dissociation using high-energy electrons to induce backbone cleavages was developed for peptide anions (14, 15). 
Another new technique, negative ETD, entails reactions of radical cation reagents with peptide anions to promote electron transfer from the peptide to the reagent that causes radical-directed dissociation (16, 17). Activated-electron photodetachment dissociation, an MS3 technique, uses UV irradiation to produce intact peptide radical anions, which are then collisionally activated (18, 19). Although they represent inroads in the characterization of peptide anions, these methods also suffer from several significant shortcomings. Electron detachment dissociation and activated-electron photodetachment dissociation are both low-efficiency methods that require long averaging cycles and activation times that range from half a second to multiple seconds, impeding the integration of these methods with chromatographic timescales (14–19). In addition, the fragmentation patterns frequently yield many high-abundance neutral losses from product ions, which clutter the spectra (14–17), and few sequence ions (14, 18, 19). Recently, we reported the use of 193-nm photons (ultraviolet photodissociation (UVPD)) for peptide anion activation, which was shown to yield rich and predictable fragmentation patterns with high sequence coverage on a fast liquid chromatographic timescale (20). This method showed promise for a range of peptide charge states (i.e. from 3- to 1-), as well as for both unmodified and phosphorylated species.

Several widely used or commercial database searching techniques are available for automated “bottom-up” analysis of peptide cations; SEQUEST (21), MASCOT (22), OMSSA (23), X! Tandem (24), and MASPIC (25) are all popular choices and yield comparable results (26). MassMatrix (27), a recently introduced searching algorithm, uses a mass accuracy sensitive probability-based scoring scheme for both the total number of matched product ions and the total abundance of matched products. 
This searching method also utilizes LC retention times to filter false positive peptide matches (28) and has been shown to yield results comparable to or better than those obtained with SEQUEST, MASCOT, OMSSA, and X! Tandem (29). Despite the ongoing innovation in automated peptide cation analysis, there is a lack of publicly available methods for automated peptide anion analysis.

In this work, we have modified the mass accuracy sensitive probabilistic MassMatrix algorithms to allow database searching of negative polarity MS/MS spectra. The algorithm is specific to the fragmentation behavior generated from 193-nm UVPD of peptide anions. The UVPD pathways in MassMatrix were rigorously and statistically optimized using two large data sets with high mass accuracy and high mass resolution for both MS1 and MS2 data acquired on an Orbitrap mass spectrometer for complex HeLa and Halobacterium proteome samples. For low mass accuracy/low mass resolution data, we also incorporated a charge-state-filtering algorithm that identifies the charge state of each MS/MS spectrum based on the fragmentation patterns prior to searching. MassMatrix not only can analyze both positive and negative polarity LC-MS/MS files separately, but also can combine files from different polarities and different dissociation methods into a single search, thus maximizing the information content for a given proteomics experiment. The explicit incorporation of mass accuracy in the scores for the UVPD MS/MS spectra of peptide anions increases peptide assignments and identifications. Finally, we showcase the utility of integrating MassMatrix searching with positive/negative polarity MS/MS switching (i.e. data-dependent positive ETD and negative UVPD during a single proteomic LC-MS/MS run). MassMatrix is available to the public as a free search engine online.
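Database searching of negative-mode spectra starts from theoretical fragment m/z values for the ion types favored by the activation method; 193-nm UVPD of deprotonated peptides has been reported to produce mainly a/x-type sequence ions. The sketch below computes singly deprotonated a- and x-ion m/z values from monoisotopic residue masses. The abbreviated residue table, the function name, and the restriction to a/x ions are assumptions for illustration, not MassMatrix's actual UVPD scoring chemistry.

```python
PROTON, H2O, CO, H2 = 1.007276, 18.010565, 27.994915, 2.015650
RESIDUE = {  # monoisotopic residue masses (abbreviated table)
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "V": 99.06841,
    "L": 113.08406, "D": 115.02694, "E": 129.04259, "K": 128.09496,
}

def a_x_anion_mz(seq):
    """[a_i - H]- and [x_(n-i) - H]- m/z values for a peptide sequence.

    Uses the standard singly protonated fragment formulas
    (a+ = sum(N-term residues) + H+ - CO; x+ = sum(C-term residues)
    + H2O + H+ + CO - H2) shifted down by two proton masses to reach
    the singly deprotonated (1-) charge state."""
    masses = [RESIDUE[r] for r in seq]
    ions = []
    for i in range(1, len(seq)):
        a_cat = sum(masses[:i]) + PROTON - CO
        x_cat = sum(masses[i:]) + H2O + PROTON + CO - H2
        ions.append((a_cat - 2 * PROTON, x_cat - 2 * PROTON))
    return ions
```

A search engine would match peaks in each negative-mode MS/MS spectrum against these theoretical series within the instrument's mass tolerance.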

12.
13.
The performance of 10 different normalization methods was evaluated on endogenous brain peptide data produced with label-free nano-LC-MS. Data sets originating from three different species (mouse, rat, and Japanese quail), each consisting of 35–45 individual LC-MS analyses, were used in the study. Each sample set contained both technical and biological replicates, and the LC-MS analyses were performed in a randomized block fashion. Peptides in all three data sets were found to display LC-MS analysis order-dependent bias. Global normalization methods correct this type of bias only to some extent. Only the novel normalization procedure RegrRun (linear regression followed by analysis order normalization) corrected for this type of bias. The RegrRun procedure performed the best of the normalization methods tested and decreased the median S.D. by 43% on average compared with raw data. This method also produced the smallest fraction of peptides with interblock differences while producing the largest fraction of differentially expressed peaks between treatment groups in all three data sets. Linear regression normalization (Regr) performed second best and decreased median S.D. by 38% on average compared with raw data. All other examined methods reduced median S.D. by 20–30% on average compared with raw data.

Peptidomics is defined as the analysis of the peptide content within an organism, tissue, or cell (1–3). The proteome and peptidome have common features, but there are also prominent differences. Proteomics generally identifies proteins by using the information of biologically inactive peptides derived from tryptic digestion, whereas peptidomics tries to identify endogenous peptides using single peptide sequence information only (4). Endogenous neuropeptides are peptides used for intercellular signaling that can act as neurotransmitters or neuromodulators in the nervous system. 
These polypeptides of 3–100 amino acids can be abundantly produced in large neural populations or in trace levels from single neurons (5) and are often generated through the cleavage of precursor proteins. However, unwanted peptides can also be created through post-mortem induced proteolysis (6). The latter aspect complicates the technical analysis of neuropeptides, as post-mortem conditions increase the number of degradation peptides. The possibility to detect, identify, and quantify low-abundance neuropeptides using label-free LC-MS techniques has improved with the development of new sample preparation techniques, including rapid heating of the tissue, which prevents protein degradation by inhibiting post-mortem proteolytic activity (7, 8).

It has been suggested by us (4, 5) and others (9) that comparing the peptidome between samples of e.g. diseased and normal tissue may lead to the discovery of biologically relevant peptides of certain pathological or pharmacological events. However, differences in relative peptide abundance measurements may not only originate from biological differences but also from systematic bias and noise. To reduce the effects of experimentally induced variability it is common to normalize the raw data. This concept is well known in the area of genomics studies using gene expression microarrays (10–12). As a consequence, many methods developed for microarray data have also been adapted for normalizing peptide data produced with LC-MS techniques (10–16). Normally the underlying assumption for applying these techniques is that the total or mean/median peak abundances should be equal across different experiments, in this case between LC-MS analyses. Global normalization methods determine a single normalization factor between experiments using either all peak abundances (13, 15, 16), a subset of peaks assumed to be similarly abundant between experiments (16), or spiked-in peptides serving as internal standards. 
In a study by Callister et al. (14), normalization methods for tryptic LC-FTICR-MS peptide data were compared. The authors concluded that global or iterative linear regression works best in most cases but also recommended that the best procedure should be selected for each data set individually. Methods used for normalizing LC-MS data have been reviewed previously (14, 17, 18), but to our knowledge only Callister et al. (14) have used small data sets to systematically evaluate such methods. None of these studies have targeted data of endogenous peptides.

In this study, the effects of 10 different normalization methods were evaluated on data produced by a nano-LC system coupled to an electrospray Q-TOF or linear trap quadrupole (LTQ) mass spectrometer. Normalization methods that originally were developed for gene expression data were used, and one novel method, linear regression followed by analysis order normalization (RegrRun), is presented. The normalization methods were evaluated using three data sets of endogenous brain peptides originating from three different species (mouse, rat, and Japanese quail), each consisting of 35–45 individual LC-MS analyses. Each data set contained both technical and biological replicates.
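The analysis-order bias that motivates RegrRun can be made concrete with a minimal sketch: regress the per-run median log abundance on LC-MS analysis order, remove the fitted drift, and re-center each run on the global median. This is one plausible reading of the regression step only; the function name and matrix layout are assumptions, and the published RegrRun procedure is not reproduced in full here.

```python
import numpy as np

def regr_run_normalize(log_abund):
    """log_abund: runs x peptides matrix of log2 abundances, with rows
    ordered by LC-MS analysis order. Removes a linear run-order trend
    in the per-run medians, then centers every run on the global median."""
    order = np.arange(log_abund.shape[0], dtype=float)
    med = np.median(log_abund, axis=1)
    slope, intercept = np.polyfit(order, med, 1)   # run-order drift
    detrended = log_abund - (intercept + slope * order)[:, None]
    # per-run centering back onto the global median of the input
    detrended += np.median(log_abund) - np.median(detrended, axis=1)[:, None]
    return detrended
```

On data with a purely linear intensity drift across the run sequence, this removes the drift exactly; curvilinear drifts would need the blockwise step the authors add on top of the regression.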

14.
Optimal performance of LC-MS/MS platforms is critical to generating high quality proteomics data. Although individual laboratories have developed quality control samples, there is no widely available performance standard of biological complexity (and associated reference data sets) for benchmarking of platform performance for analysis of complex biological proteomes across different laboratories in the community. Individual preparations of the yeast Saccharomyces cerevisiae proteome have been used extensively by laboratories in the proteomics community to characterize LC-MS platform performance. The yeast proteome is uniquely attractive as a performance standard because it is the most extensively characterized complex biological proteome and the only one associated with several large scale studies estimating the abundance of all detectable proteins. In this study, we describe a standard operating protocol for large scale production of the yeast performance standard and offer aliquots to the community through the National Institute of Standards and Technology where the yeast proteome is under development as a certified reference material to meet the long term needs of the community. Using a series of metrics that characterize LC-MS performance, we provide a reference data set demonstrating typical performance of commonly used ion trap instrument platforms in expert laboratories; the results provide a basis for laboratories to benchmark their own performance, to improve upon current methods, and to evaluate new technologies. 
Additionally, we demonstrate how the yeast reference, spiked with human proteins, can be used to benchmark the power of proteomics platforms for detection of differentially expressed proteins at different levels of concentration in a complex matrix, thereby providing a metric to evaluate and minimize preanalytical and analytical variation in comparative proteomics experiments.

Access to proteomics performance standards is essential for several reasons. First, to generate the highest quality data possible, proteomics laboratories routinely benchmark and perform quality control (QC) monitoring of the performance of their instrumentation using standards. Second, appropriate standards greatly facilitate the development of improvements in technologies by providing a timeless standard with which to evaluate new protocols or instruments that claim to improve performance. For example, it is common practice for an individual laboratory considering purchase of a new instrument to require the vendor to run “demo” samples so that data from the new instrument can be compared head to head with existing instruments in the laboratory. Third, large scale proteomics studies designed to aggregate data across laboratories can be facilitated by the use of a performance standard to measure reproducibility across sites or to compare the performance of different LC-MS configurations or sample processing protocols used between laboratories to facilitate development of optimized standard operating procedures (SOPs).

Most individual laboratories have adopted their own QC standards, which range from mixtures of known synthetic peptides to digests of bovine serum albumin or more complex mixtures of several recombinant proteins (1). However, because each laboratory performs QC monitoring in isolation, it is difficult to compare the performance of LC-MS platforms throughout the community.

Several standards for proteomics are available for request or purchase (2, 3). 
RM8327 is a mixture of three peptides developed as a reference material in collaboration between the National Institute of Standards and Technology (NIST) and the Association of Biomolecular Resource Facilities. Mixtures of 15–48 purified human proteins are also available, such as the HUPO (Human Proteome Organisation) Gold MS Protein Standard (Invitrogen), the Universal Proteomics Standard (UPS1; Sigma), and CRM470 from the European Union Institute for Reference Materials and Measurements. Although defined mixtures of peptides or proteins can address some benchmarking and QC needs, there is an additional need for more complex reference materials to fully represent the challenges of LC-MS data acquisition in complex matrices encountered in biological samples (2, 3).

Although it has not been widely distributed as a reference material, the yeast Saccharomyces cerevisiae proteome has been extensively used by the proteomics community to characterize the capabilities of a variety of LC-MS-based approaches (4–15). Yeast provides a uniquely attractive complex performance standard for several reasons. Yeast encodes a complex proteome consisting of ∼4,500 proteins expressed during normal growth conditions (7, 16–18). The concentration range of yeast proteins is sufficient to challenge the dynamic range of conventional mass spectrometers; the abundance of proteins ranges from fewer than 50 to more than 10⁶ molecules per cell (4, 15, 16). Additionally, it is the most extensively characterized complex biological proteome and the only one associated with several large scale studies estimating the abundance of all detectable proteins (5, 9, 16, 17, 19, 20) as well as LC-MS/MS data sets showing good correlation between LC-MS/MS detection efficiency and the protein abundance estimates (4, 11, 12, 15). Finally, it is inexpensive and easy to produce large quantities of yeast protein extract for distribution.

In this study, we describe large scale production of a yeast S. 
cerevisiae performance standard, which we offer to the community through NIST. Through a series of interlaboratory studies, we created a reference data set characterizing the yeast performance standard and defining reasonable performance of ion trap-based LC-MS platforms in expert laboratories using a series of performance metrics. This publicly available data set provides a basis for additional laboratories using the yeast standard to benchmark their own performance as well as to improve upon the current status by evolving protocols, improving instrumentation, or developing new technologies. Finally, we demonstrate how the yeast performance standard, spiked with human proteins, can be used to benchmark the power of proteomics platforms for detection of differentially expressed proteins at different levels of concentration in a complex matrix.
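To give a flavor of the benchmarking such a standard enables, repeatability across replicate runs of the reference material can be summarized as a percent coefficient of variation (CV) of any per-run metric, such as the number of peptide identifications. The metric choice, function name, and replicate counts below are illustrative assumptions, not the specific metrics defined in the study.

```python
import statistics

def replicate_cv(values):
    """Percent coefficient of variation across replicate LC-MS runs."""
    return 100.0 * statistics.stdev(values) / statistics.fmean(values)

# hypothetical peptide-identification counts from four replicate runs
# of the same performance standard on one instrument
peptide_ids = [2480, 2512, 2390, 2455]
```

A laboratory running the standard could compare its own CV against the interlaboratory reference data set; a value well above the reference range flags a system component to investigate.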

15.
A decoding algorithm is tested that mechanistically models the progressive alignments that arise as the mRNA moves past the rRNA tail during translation elongation. Each of these alignments provides an opportunity for hybridization between the single-stranded, 3′-terminal nucleotides of the 16S rRNA and the spatially accessible window of mRNA sequence, from which a free energy value can be calculated. Using this algorithm we show that a periodic, energetic pattern of frequency 1/3 is revealed. This periodic signal exists in the majority of coding regions of eubacterial genes, but not in the non-coding regions encoding the 16S and 23S rRNAs. Signal analysis reveals that the population of coding regions of each bacterial species has a mean phase that is correlated in a statistically significant way with species (G + C) content. These results suggest that the periodic signal could function as a synchronization signal for the maintenance of reading frame and that codon usage provides a mechanism for manipulation of signal phase.
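The frequency-1/3 component described above (one cycle per codon) can be extracted from a numeric series with a single discrete Fourier coefficient, from which both the amplitude and the phase used in the species comparison follow. Treating the per-alignment free-energy values as a plain numeric sequence, and the function name, are assumptions for illustration, not the authors' signal-analysis pipeline.

```python
import cmath

def period3_component(series):
    """Amplitude and phase of the discrete Fourier component at
    frequency 1/3, i.e. one cycle per three positions (per codon)."""
    z = sum(x * cmath.exp(-2j * cmath.pi * k / 3)
            for k, x in enumerate(series))
    return abs(z) / len(series), cmath.phase(z)
```

A strong amplitude at this frequency in coding sequence, and its absence in the rRNA genes, is the signature the abstract describes; the phase of z is the quantity correlated with (G + C) content.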

16.
Liquid chromatography-tandem mass spectrometry (LC-MS/MS)-based proteomics provides a wealth of information about proteins present in biological samples. In bottom-up LC-MS/MS-based proteomics, proteins are enzymatically digested into peptides prior to query by LC-MS/MS. Thus, the information directly available from the LC-MS/MS data is at the peptide level. If a protein-level analysis is desired, the peptide-level information must be rolled up into protein-level information. We propose a principal component analysis-based statistical method, ProPCA, for efficiently estimating relative protein abundance from bottom-up label-free LC-MS/MS data that incorporates both spectral count information and LC-MS peptide ion peak attributes, such as peak area, volume, or height. ProPCA may be used effectively with a variety of quantification platforms and is easily implemented. We show that ProPCA outperformed existing quantitative methods for peptide-protein roll-up, including spectral counting methods and other methods for combining LC-MS peptide peak attributes. The performance of ProPCA was validated using a data set derived from the LC-MS/MS analysis of a mixture of protein standards (the UPS2 proteomic dynamic range standard introduced by The Association of Biomolecular Resource Facilities Proteomics Standards Research Group in 2006). Finally, we applied ProPCA to a comparative LC-MS/MS analysis of digested total cell lysates prepared for LC-MS/MS analysis by alternative lysis methods and show that ProPCA identified more differentially abundant proteins than competing methods.

One of the fundamental goals of proteomics methods for the biological sciences is to identify and quantify all proteins present in a sample. LC-MS/MS-based proteomics methodologies offer a promising approach to this problem (1–3). These methodologies allow for the acquisition of a vast amount of information about the proteins present in a sample. 
However, extracting reliable protein abundance information from LC-MS/MS data remains challenging. In this work, we were primarily concerned with the analysis of data acquired using bottom-up label-free LC-MS/MS-based proteomics techniques where “bottom-up” refers to the fact that proteins are enzymatically digested into peptides prior to query by the LC-MS/MS instrument platform (4), and “label-free” indicates that analyses are performed without the aid of stable isotope labels. One challenge inherent in the bottom-up approach to proteomics is that information directly available from the LC-MS/MS data is at the peptide level. When a protein-level analysis is desired, as is often the case with discovery-driven LC-MS research, peptide-level information must be rolled up into protein-level information.

Spectral counting (5–10) is a straightforward and widely used example of peptide-protein roll-up for LC-MS/MS data. Information experimentally acquired in single stage (MS) and tandem (MS/MS) spectra may lead to the assignment of MS/MS spectra to peptide sequences in a database-driven or database-free manner using various peptide identification software platforms (SEQUEST (11) and Mascot (12), for instance); the identified peptide sequences correspond, in turn, to proteins. In principle, the number of tandem spectra matched to peptides corresponding to a certain protein, the spectral count (SC), is positively associated with the abundance of a protein (5). In spectral counting techniques, raw or normalized SCs are used as a surrogate for protein abundance. Spectral counting methods have been moderately successful in quantifying protein abundance and identifying significant proteins in various settings. 
However, SC-based methods do not make full use of information available from peaks in the LC-MS domain, and this necessarily leads to loss of efficiency.

Peaks in the LC-MS domain corresponding to peptide ion species are highly sensitive to differences in protein abundance (13, 14). Identifying LC-MS peaks that correspond to detected peptides and measuring quantitative attributes of these peaks (such as height, area, or volume) offers a promising alternative to spectral counting methods. These methods have become especially popular in applications using stable isotope labeling (15). However, challenges remain, especially in the label-free analysis of complex proteomics samples where complications in peak detection, alignment, and integration are a significant obstacle. In practice, alignment, identification, and quantification of LC-MS peptide peak attributes (PPAs) may be accomplished using recently developed peak matching platforms (16–18). A highly sensitive indicator of protein abundance may be obtained by rolling up PPA measurements into protein-level information (16, 19, 20). Existing peptide-protein roll-up procedures based on PPAs typically involve taking the mean of (possibly normalized) PPA measurements over all peptides corresponding to a protein to obtain a protein-level estimate of abundance. Despite the promise of PPA-based procedures for protein quantification, the performance of PPA-based methods may vary widely depending on the particular roll-up procedure used; furthermore, PPA-based procedures are limited by difficulties in accurately identifying and measuring peptide peak attributes. These two issues are related, as the latter affects the robustness of PPA-based roll-up methods. 
Indeed, existing peak matching and quantification platforms tend to result in PPA measurement data sets with substantial missingness (16, 19, 21), especially when working with very complex samples where substantial dynamic ranges and ion suppression are difficulties that must be overcome. Missingness may, in turn, lead to instability in protein-level abundance estimates. A good peptide-protein roll-up procedure that utilizes PPAs should account for this missingness and the resulting instability in a principled way. However, even in the absence of missingness, there is no consensus in the existing literature on peptide-protein roll-up for PPA measurements.

In this work, we propose ProPCA, a peptide-protein roll-up method for efficiently extracting protein abundance information from bottom-up label-free LC-MS/MS data. ProPCA is an easily implemented, unsupervised method that is related to principal component analysis (PCA) (22). ProPCA optimally combines SC and PPA data to obtain estimates of relative protein abundance. ProPCA addresses missingness in PPA measurement data in a unified way while capitalizing on the strengths of both SCs and PPA-based roll-up methods. In particular, ProPCA adapts to the quality of the available PPA measurement data. If the PPA measurement data are poor and, in the extreme case, no PPA measurements are available, then ProPCA is equivalent to spectral counting. On the other hand, if there is no missingness in the PPA measurement data set, then the ProPCA estimate is a weighted mean of PPA measurements and spectral counts where the weights are chosen to reflect the ability of spectral counts and each peptide to predict protein abundance.

Below, we assess the performance of ProPCA using a data set obtained from the LC-MS/MS analysis of protein standards (the UPS2 proteomic dynamic range standard set manufactured by Sigma-Aldrich) and show that ProPCA outperformed other existing roll-up methods by multiple metrics. 
The applicability of ProPCA is not limited by the quantification platform used to obtain SCs and PPA measurements. To demonstrate this, we show that ProPCA continued to perform well when used with an alternative quantification platform. Finally, we applied ProPCA to a comparative LC-MS/MS analysis of digested total human hepatocellular carcinoma (HepG2) cell lysates prepared for LC-MS/MS analysis by alternative lysis methods. We show that ProPCA identified more differentially abundant proteins than competing methods.
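The PCA-based roll-up idea can be sketched as follows: assemble a proteins-by-features matrix whose columns are the spectral count and the per-peptide peak attributes, mean-impute missing attributes, and take each protein's projection onto the first principal component as its relative abundance score. This is a simplified illustration of the general technique; the function name, mean imputation, and sign convention are assumptions, not the published ProPCA estimator.

```python
import numpy as np

def pca_rollup_scores(sc, ppa):
    """sc: (proteins,) spectral counts; ppa: (proteins, peptides) peak
    attributes with NaN marking missing measurements. Returns a
    first-principal-component score per protein as relative abundance."""
    ppa = ppa.astype(float).copy()
    col_mean = np.nanmean(ppa, axis=0)
    rows, cols = np.where(np.isnan(ppa))
    ppa[rows, cols] = col_mean[cols]          # mean-impute missing peaks
    X = np.column_stack([sc, ppa]).astype(float)
    X -= X.mean(axis=0)                       # center features
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    score = X @ vt[0]                         # project onto first PC
    if np.corrcoef(score, sc)[0, 1] < 0:      # orient PC with abundance
        score = -score
    return score
```

Because the first component weights each feature by how much variance it explains, a noisy or heavily missing peak-attribute column contributes less, which mirrors the adaptivity described in the abstract.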

17.
The field of proteomics has evolved hand-in-hand with technological advances in LC-MS/MS systems, now enabling the analysis of very deep proteomes in a reasonable time. However, most applications do not deal with full cell or tissue proteomes but rather with restricted subproteomes relevant for the research context at hand or resulting from extensive fractionation. At the same time, investigation of many conditions or perturbations puts a strain on measurement capacity. Here, we develop a high-throughput workflow capable of dealing with large numbers of low or medium complexity samples and specifically aim at the analysis of 96-well plates in a single day (15 min per sample). We combine parallel sample processing with a modified liquid chromatography platform driving two analytical columns in tandem, which are coupled to a quadrupole Orbitrap mass spectrometer (Q Exactive HF). The modified LC platform eliminates idle time between measurements, and the high sequencing speed of the Q Exactive HF reduces required measurement time. We apply the pipeline to the yeast chromatin remodeling landscape and demonstrate quantification of 96 pull-downs of chromatin complexes in about 1 day. This is achieved with only 500 μg input material, enabling yeast cultivation in a 96-well format. Our system retrieved known complex members and the high throughput allowed probing with many bait proteins. Even alternative complex compositions were detectable in these very short gradients. Thus, sample throughput, sensitivity, and LC-MS/MS duty cycle are improved severalfold compared with established workflows. The pipeline can be extended to different types of interaction studies and to other medium complexity proteomes.

Shotgun proteomics is concerned with the identification and quantification of proteins (1–3). Prior to analysis, the proteins are digested into peptides, resulting in highly complex mixtures. 
To deal with this complexity, the peptides are separated by liquid chromatography followed by online analysis with mass spectrometry (MS), today facilitating the characterization of almost complete cell line proteomes in a short time (3–5). In addition to the characterization of entire proteomes, there is also a great demand for analyzing low or medium complexity samples. Given the trend toward a systems biology view, relatively large sets of samples often have to be measured. One such category of lower complexity protein mixtures occurs in the determination of physical interaction partners of a protein of interest, which requires the identification and quantification of the proteins “pulled down” or immunoprecipitated via a bait protein. Protein interactions are essential for almost all biological processes and orchestrate a cell's behavior by regulating enzymes, forming macromolecular assemblies, and functionalizing multiprotein complexes that are capable of more complex behavior than the sum of their parts. The human genome has almost 20,000 protein-coding genes, and it has been estimated that 80% of the proteins engage in complex interactions and that 130,000 to 650,000 protein interactions can take place in a human cell (6, 7). These numbers demonstrate a clear need for systematic and high-throughput mapping of protein–protein interactions (PPIs) to understand these complexes. The introduction of generic methods to detect PPIs, such as the yeast two-hybrid screen (Y2H) (8) or affinity purification combined with mass spectrometry (AP-MS)1 (9), has revolutionized the protein interactomics field. AP-MS in particular has emerged as an important tool to catalogue interactions with the aim of better understanding basic biochemical mechanisms in many different organisms (10–17). It can be performed under near-physiological conditions and is capable of identifying functional protein complexes (18).
In addition, the combination of affinity purification with quantitative mass spectrometry has greatly improved the discrimination of true interactors from unspecific background binders, a long-standing challenge in the AP-MS field (19–21). Nowadays, quantitative AP-MS is employed to address many different biological questions, such as detection of dynamic changes in PPIs upon perturbation (22–25) or the impact of posttranslational signaling on PPIs (26, 27). Recent developments even make it possible to provide abundances and stoichiometry information of the bait and prey proteins under study, combined with quantitative data from very deep cellular proteomes. Furthermore, sample preparation in AP-MS can now be performed in high-throughput formats capable of producing hundreds of samples per day. With such throughput in sample generation, the LC-MS/MS part of the AP-MS pipeline has become a major bottleneck for large studies, limiting throughput to a small fraction of the available samples. In principle, this limitation could be circumvented by multiplexing analysis via isotope-labeling strategies (28, 29) or by drastically reducing the measurement time per sample (30–32). The former strategy requires exquisite control of the processing steps and has not been widely implemented yet. The latter strategy depends on mass spectrometers with sufficiently high sequencing speed to deal with the pull-down in a very short time. Since its introduction about 10 years ago (33), the Orbitrap mass spectrometer has featured ever-faster sequencing capabilities, with the Q Exactive HF now reaching a peptide sequencing speed of up to 17 Hz (34). This should now make it feasible to substantially lower the amount of time spent per measurement. Although very short LC-MS/MS runs can in principle be used for high-throughput analyses, they usually lead to a drop in LC-MS duty cycle.
This is because each sample needs initial washing, loading, and equilibration steps, independent of gradient time, which take a substantial fraction of total run time on most LC setups, typically at least 15–20 min. To achieve a more efficient LC-MS duty cycle while maintaining high sensitivity, a second analytical column can be introduced. This enables the parallelization of several steps related to sample loading and to the LC operating steps, including valve switching. Such dual analytical column or “double-barrel” setups have been described for various applications and platforms (30, 35–39). Starting from the reported performance and throughput of workflows that are standard today (16, 21, 40–42), we asked if it would be possible to obtain a severalfold increase in both sample throughput and sensitivity, as well as a considerable reduction in overall wet lab costs and working time. Specifically, our goal was to quantify 96 medium complexity samples in a single day. Such a number of samples can be processed with a 96-well plate, which currently is the format of choice for highly parallelized sample preparation workflows, often with a high degree of automation. We investigated which advances were needed in sample preparation, liquid chromatography, and mass spectrometry. Based on our findings, we developed a parallelized platform for high-throughput sample preparation and LC-MS/MS analysis, which we applied to pull-down samples from the yeast chromatin remodeling landscape. The extent of retrieval of known complex members served as a quality control of the developed pipeline.
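The duty-cycle argument above can be made concrete with a small back-of-envelope model. The numbers below (10 min gradient, 15 min overhead) are illustrative assumptions, not the paper's measured values: a single column pays overhead serially, while a dual-column setup hides one column's overhead behind the other column's gradient.

```python
# Sketch of LC-MS duty cycle: fraction of wall-clock time the mass
# spectrometer spends acquiring gradient data, for single- vs
# dual-analytical-column ("double-barrel") configurations.

def duty_cycle(gradient_min, overhead_min, dual_column=False):
    """Fraction of wall-clock time spent on the analytical gradient."""
    if dual_column:
        # Washing/loading/equilibration of one column runs while the
        # other column's gradient is being acquired.
        wall = max(gradient_min, overhead_min)
    else:
        # Single column: overhead is paid serially before each gradient.
        wall = gradient_min + overhead_min
    return gradient_min / wall

single = duty_cycle(gradient_min=10, overhead_min=15)
dual = duty_cycle(gradient_min=10, overhead_min=15, dual_column=True)
print(round(single, 2), round(dual, 2))  # 0.4 0.67

# Throughput target check: 96 samples at 15 min wall-clock each.
print(96 * 15 / 60)  # 24.0 hours -> exactly one day
```

With these assumed numbers the dual-column setup lifts the duty cycle from 40% to 67%; once the per-sample overhead is no longer than the gradient, the duty cycle approaches 100%.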

18.
19.
A Boolean network is a model used to study the interactions between different genes in genetic regulatory networks. In this paper, we present several algorithms using gene ordering and feedback vertex sets to identify singleton attractors and small attractors in Boolean networks. We analyze the average-case time complexities of some of the proposed algorithms. For instance, it is shown that the outdegree-based ordering algorithm for finding singleton attractors runs, on average, in time exponentially faster than the naive O(2^n)-time algorithm that tests all states, where n is the number of genes and the maximum indegree is bounded. We performed extensive computational experiments on these algorithms, which resulted in good agreement with theoretical results. In contrast, we give a simple and complete proof showing that finding an attractor with the shortest period is NP-hard.
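The naive baseline mentioned in the abstract is easy to state in code: a singleton attractor of a synchronous Boolean network is a state x with f(x) = x, and the exhaustive algorithm simply tests all 2^n states. The three-gene network below is a hypothetical example, not taken from the paper.

```python
from itertools import product

def singleton_attractors(update_fns):
    """Return all fixed points (singleton attractors) of a synchronous
    Boolean network, by exhaustively testing all 2**n states."""
    n = len(update_fns)
    fixed = []
    for state in product((0, 1), repeat=n):
        # A singleton attractor maps to itself under one synchronous update.
        if tuple(f(state) for f in update_fns) == state:
            fixed.append(state)
    return fixed

# Hypothetical 3-gene network:
#   x1' = x2 AND x3,  x2' = x1,  x3' = NOT x1
fns = [
    lambda s: s[1] & s[2],
    lambda s: s[0],
    lambda s: 1 - s[0],
]
print(singleton_attractors(fns))  # [(0, 0, 1)]
```

This brute force runs in O(2^n) network evaluations; the gene-ordering and feedback-vertex-set algorithms the abstract describes prune this search so that, on average, far fewer states are examined.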

20.