Similar literature (20 results)
1.
Top-down mass spectrometry (MS)-based proteomics is arguably a disruptive technology for the comprehensive analysis of all proteoforms arising from genetic variation, alternative splicing, and posttranslational modifications (PTMs). However, the complexity of top-down high-resolution mass spectra presents a significant challenge for data analysis. In contrast to the well-developed software packages available for data analysis in bottom-up proteomics, the data analysis tools in top-down proteomics remain underdeveloped. Moreover, despite recent efforts to develop algorithms and tools for the deconvolution of top-down high-resolution mass spectra and the identification of proteins from complex mixtures, a multifunctional software platform, which allows for the identification, quantitation, and characterization of proteoforms with visual validation, is still lacking. Herein, we have developed MASH Suite Pro, a comprehensive software tool for top-down proteomics with multifaceted functionality. MASH Suite Pro is capable of processing high-resolution MS and tandem MS (MS/MS) data using two deconvolution algorithms to optimize protein identification results. In addition, MASH Suite Pro allows for the characterization of PTMs and sequence variations, as well as the relative quantitation of multiple proteoforms in different experimental conditions. The program also provides visualization components for validation and correction of the computational outputs. Furthermore, MASH Suite Pro facilitates data reporting and presentation via direct output of the graphics. Thus, MASH Suite Pro significantly simplifies and speeds up the interpretation of high-resolution top-down proteomics data by integrating tools for protein identification, quantitation, characterization, and visual validation into a customizable and user-friendly interface. 
We envision that MASH Suite Pro will play an integral role in advancing the burgeoning field of top-down proteomics.

With well-developed algorithms and computational tools for mass spectrometry (MS)1 data analysis, peptide-based bottom-up proteomics has gained considerable popularity in the field of systems biology (1–9). Nevertheless, the bottom-up approach is suboptimal for the analysis of protein posttranslational modifications (PTMs) and sequence variants as a result of protein digestion (10). Alternatively, the protein-based top-down proteomics approach analyzes intact proteins, which provides a “bird's-eye” view of all proteoforms (11), including those arising from sequence variations, alternative splicing, and diverse PTMs, making it a disruptive technology for the comprehensive analysis of proteoforms (12–24). However, the complexity of top-down high-resolution mass spectra presents a significant challenge for data analysis. In contrast to the well-developed software packages available for processing data from bottom-up proteomics experiments, the data analysis tools in top-down proteomics remain underdeveloped.

The initial step in the analysis of top-down proteomics data is deconvolution of high-resolution mass and tandem mass spectra. Thorough high-resolution analysis of spectra by Horn (THRASH), the first algorithm developed for the deconvolution of high-resolution mass spectra (25), is still widely used. THRASH automatically detects and evaluates individual isotopomer envelopes by comparing each experimental isotopomer envelope with a theoretical envelope and reporting those that score higher than a user-defined threshold. Another commonly used algorithm, MS-Deconv, utilizes a combinatorial approach to address the difficulty of grouping MS peaks from overlapping isotopomer envelopes (26).
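The THRASH-style scoring step described above, comparing an experimental isotopomer envelope against a theoretical one and keeping matches above a user-defined threshold, can be sketched in a few lines. This is a simplified illustration: the Poisson isotope model and the `lam_per_da` constant are assumptions for demonstration, not the published averagine parameters or any actual THRASH implementation.

```python
import math

def theoretical_envelope(mass, n_peaks=6, lam_per_da=0.000594):
    """Approximate an isotopomer envelope with a Poisson model.

    lam_per_da is an illustrative heavy-isotope incorporation rate
    per dalton (an assumption, not the averagine composition)."""
    lam = lam_per_da * mass
    probs = [math.exp(-lam) * lam**k / math.factorial(k) for k in range(n_peaks)]
    total = sum(probs)
    return [p / total for p in probs]

def envelope_score(experimental, theoretical):
    """Cosine similarity between experimental and theoretical envelopes."""
    dot = sum(e * t for e, t in zip(experimental, theoretical))
    norm_e = math.sqrt(sum(e * e for e in experimental))
    norm_t = math.sqrt(sum(t * t for t in theoretical))
    return dot / (norm_e * norm_t)

def accept_envelope(experimental, mass, threshold=0.9):
    """THRASH-style decision: keep envelopes scoring above the threshold."""
    theo = theoretical_envelope(mass, len(experimental))
    return envelope_score(experimental, theo) >= threshold
```

A perfectly matching envelope scores 1.0 and is accepted; a single isolated peak scores poorly against the broad theoretical distribution and is rejected.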
UniDec, which employs a Bayesian approach to separate the mass and charge dimensions (27), has also recently been applied to the deconvolution of high-resolution spectra. Although these algorithms assist in data processing, the deconvolution results often contain a considerable number of misassigned peaks as a consequence of the complexity of the high-resolution MS and MS/MS data generated in top-down proteomics experiments. Such errors can undermine the accuracy of protein identification and PTM localization and thus necessitate visual components that allow for validation and manual correction of the computational outputs.

Following spectral deconvolution, a typical top-down proteomics workflow incorporates identification, quantitation, and characterization of proteoforms; however, most of the recently developed data analysis tools for top-down proteomics, including ProSightPC (28, 29), Mascot Top Down (also known as Big-Mascot) (30), MS-TopDown (31), and MS-Align+ (32), focus almost exclusively on protein identification. ProSightPC was the first software tool specifically developed for top-down protein identification. It utilizes “shotgun annotated” databases (33) that include all possible proteoforms containing user-defined modifications. Consequently, ProSightPC is not optimized for identifying PTMs that the user has not defined. Additionally, the inclusion of all possible modified forms dramatically increases the size of the database and thus limits the search speed (32). Mascot Top Down (30) is based on standard Mascot but enables database searching with a higher mass limit for the precursor ions (up to 110 kDa), which allows for the identification of intact proteins. Protein identification using Mascot Top Down is fundamentally similar to that used in bottom-up proteomics (34) and is therefore somewhat limited in its ability to identify unexpected PTMs.
MS-TopDown (31) employs the spectral alignment algorithm (35), which matches top-down tandem mass spectra to proteins in the database without prior knowledge of the PTMs. Nevertheless, MS-TopDown lacks statistical evaluation of the search results and performs slowly when searching large databases. MS-Align+ also utilizes spectral alignment for top-down protein identification (32). It is capable of identifying unexpected PTMs and allows for efficient filtering of candidate proteins when the top-down spectra are searched against a large protein database. MS-Align+ also provides statistical evaluation for the selection of proteoform spectrum matches (PrSMs) with high confidence. More recently, Top-Down Mass Spectrometry Based Proteoform Identification and Characterization (TopPIC) was developed (http://proteomics.informatics.iupui.edu/software/toppic/index.html). TopPIC is an updated version of MS-Align+ with increased spectral alignment speed and reduced computing requirements. In addition, MSPathFinder, developed by Kim et al., also allows for the rapid identification of proteins from top-down tandem mass spectra (http://omics.pnl.gov/software/mspathfinder) using spectral alignment. Although software tools employing spectral alignment, such as MS-Align+ and MSPathFinder, are particularly useful for top-down protein identification, these programs operate via the command line, making them difficult to use for those with limited knowledge of command syntax.

Recently, new software tools have been developed for proteoform characterization (36, 37). Our group previously developed MASH Suite, a user-friendly interface for the processing, visualization, and validation of high-resolution MS and MS/MS data (36). Another software tool, ProSight Lite, developed recently by the Kelleher group (37), also allows characterization of protein PTMs. However, both of these tools require prior knowledge of the protein sequence for effective localization of PTMs.
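The core idea of spectral alignment, explaining observed fragments either unmodified or shifted by one unlocalized mass difference taken from the precursor, can be illustrated with a minimal sketch. All function names, the tolerance, and the single-shift simplification are assumptions for illustration; the actual MS-Align+ algorithm is considerably more general.

```python
def prefix_masses(residue_masses):
    """Cumulative N-terminal fragment masses for a protein sequence."""
    out, total = [], 0.0
    for m in residue_masses:
        total += m
        out.append(total)
    return out

def aligned_matches(observed, residue_masses, precursor_delta, tol=0.02):
    """Count fragments explained either unmodified or carrying the full
    precursor mass shift (one unlocalized PTM), as in spectral alignment.

    An observed fragment matches a theoretical prefix mass directly
    (PTM lies C-terminal to the cleavage site) or offset by
    precursor_delta (PTM lies N-terminal to it)."""
    theo = prefix_masses(residue_masses)
    hits = 0
    for obs in observed:
        if any(abs(obs - t) <= tol or abs(obs - (t + precursor_delta)) <= tol
               for t in theo):
            hits += 1
    return hits
```

With a +79.966 Da precursor difference (a phosphorylation-sized shift), all b-ion-like fragments of a modified peptide are explained even though the modification site is unknown; with no shift allowed, only the fragments N-terminal to the modification match.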
In addition, neither tool can process data from liquid chromatography (LC)-MS and LC-MS/MS experiments, which limits their usefulness in large-scale top-down proteomics. Thus, despite these recent efforts, a multifunctional software platform enabling identification, quantitation, and characterization of proteins from top-down spectra, as well as visual validation and data correction, is still lacking.

Herein, we report the development of MASH Suite Pro, an integrated software platform designed to incorporate tools for protein identification, quantitation, and characterization into a single comprehensive package for the analysis of top-down proteomics data. The program has a user-friendly, customizable interface similar to the previously developed MASH Suite (36) but also offers a number of new capabilities, including the ability to handle complex proteomics datasets from LC-MS and LC-MS/MS experiments, as well as the ability to identify unknown proteins and PTMs using MS-Align+ (32). Importantly, MASH Suite Pro also provides visualization components for the validation and correction of the computational outputs, which ensures accurate and reliable deconvolution of the spectra and localization of PTMs and sequence variations.

2.
Mass spectrometry-based proteomics is widely used for identifying and quantifying peptides and proteins. The breadth and sensitivity of peptide detection have been advanced by the advent of data-independent acquisition mass spectrometry. Analysis of such data, however, is challenging because of the complexity of fragment ion spectra, which contain contributions from multiple co-eluting precursor ions. We present SWATHProphet, software that identifies and quantifies peptide fragment ion traces in data-independent acquisition data, provides accurate probabilities to ensure results are correct, and automatically detects and removes contributions to quantitation originating from interfering precursor ions. Integration into the widely used open-source Trans-Proteomic Pipeline facilitates subsequent analyses, such as combining the results of multiple data sets for improved discrimination using iProphet and inferring sample proteins using ProteinProphet. This development should greatly help make data-independent acquisition mass spectrometry accessible to large numbers of users.

Mass spectrometry is widely used to identify and quantify protein samples. Proteins are typically cleaved into peptides (either enzymatically or chemically), separated by at least one-dimensional fractionation (e.g. liquid chromatography), and collisionally fragmented, and the fragment ions are detected by their unique m/z values in a mass spectrometer (1). Data-dependent acquisition (shotgun) selects individual precursor ions for fragmentation and is limited in its ability to consistently detect large numbers of peptides, particularly those of lower intensity (2). In contrast, selected reaction monitoring (SRM)1 is a targeted approach in which a known precursor ion and a set of fragment ions are monitored over time upon selection by mass filters in a triple quadrupole instrument.
The selected fragment ions, in conjunction with the parent ion, constitute a highly sensitive molecular assay specific for a precursor ion of interest. Although this strategy has been successfully applied in a large number of biological studies, it is limited by low throughput.

An alternative approach, data-independent acquisition (DIA), aims to overcome the low-throughput limitation of SRM while maintaining fully quantitative analyses. It selects all ions within a sliding m/z precursor window for fragmentation (3–7) and effectively creates a digital record of the complete peptide contents of the sample. Its increased sensitivity, however, is limited by the challenge of interpreting fragment ion spectra generated from multiple precursors. This can be done by spectral deconvolution followed by database search (1, 8) or by querying the data with preselected fragment ions from a spectral library, in a manner similar to targeted approaches such as SRM (3).

Software packages currently available for targeted analysis of DIA MS data with precursor ion assays contained within a spectral library include PeakView™ (Sciex, Framingham, MA) for data generated on a TripleTOF mass spectrometer. The proprietary Spectronaut (Biognosys AG, Zurich, Switzerland) and the open-source OpenSWATH software (9) are adaptations of the mProphet software suite (10) originally designed for SRM data, and the widely used SRM software Skyline (11) now also incorporates mProphet to handle DIA MS data. None of these programs, however, provides validation of results with computed probabilities or detection and removal of the fragment ion interferences that give rise to inaccurate quantitation and decreased sensitivity.

Here we present SWATHProphet, software that performs these functions in conjunction with a high-quality spectral library. SWATHProphet validates results with accurate probabilities of being correct.
These probabilities serve as input to downstream analyses in the highly developed Trans-Proteomic Pipeline (TPP) (12), such as combining the results of multiple runs for improved discrimination with iProphet (13) and inferring sample proteins with ProteinProphet (14). In addition, SWATHProphet uses these probabilities to help cope with complex spectra by automatically detecting fragment ion interferences and removing them in silico to yield accurate quantitation and adjusted probabilities.
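The sliding precursor-window scheme underlying DIA acquisition, and the co-isolation that produces chimeric fragment spectra, can be sketched as follows. The window bounds, width, and overlap are illustrative defaults, not the parameters of any particular instrument method.

```python
def dia_windows(mz_start=400.0, mz_end=1200.0, width=25.0, overlap=1.0):
    """Fixed-width DIA precursor isolation windows with a small overlap,
    the scheme a sliding-window acquisition steps through each cycle."""
    windows, low = [], mz_start
    while low < mz_end:
        high = min(low + width, mz_end)
        windows.append((low, high))
        if high >= mz_end:
            break
        low = high - overlap
    return windows

def windows_for(precursor_mz, windows):
    """All windows whose range covers a precursor m/z; every precursor
    co-isolated in the same window contributes to one chimeric spectrum."""
    return [w for w in windows if w[0] <= precursor_mz < w[1]]
```

A precursor falling in the 1 m/z overlap region is fragmented in two consecutive windows, which is one reason library-based scoring must tolerate redundant detections.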

3.
Selected reaction monitoring mass spectrometry (SRM-MS) is playing an increasing role in quantitative proteomics and biomarker discovery studies as a method for high-throughput candidate quantification and verification. Although SRM-MS offers advantages in sensitivity and quantification compared with other MS-based techniques, current SRM technologies are still challenged by the detection and quantification of low-abundance proteins (e.g. those present at ∼10 ng/ml or lower levels in blood plasma). Here we report enhanced detection sensitivity and reproducibility for SRM-based targeted proteomics achieved by coupling a nanospray ionization multicapillary inlet/dual electrodynamic ion funnel interface to a commercial triple quadrupole mass spectrometer. Because of the increased efficiency of ion transmission, significant enhancements in overall signal intensities and improved limits of detection were observed with the new interface compared with the original interface for SRM measurements of tryptic peptides from proteins spiked into non-depleted mouse plasma over a range of concentrations. Overall, average SRM peak intensities were increased by ∼70-fold. The average limit of detection for peptides also improved by ∼10-fold, with notably improved reproducibility of peptide measurements as indicated by reduced coefficients of variation. The ability to detect proteins ranging from 40 to 80 ng/ml within mouse plasma was demonstrated for all spiked proteins without front-end immunoaffinity depletion and fractionation.
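The figures of merit quoted above reduce to two standard definitions: fold improvement as a ratio of mean intensities, and the coefficient of variation as the relative spread of replicate measurements. A minimal sketch (the replicate values in the usage test are hypothetical, not the paper's data):

```python
import statistics

def cv_percent(values):
    """Coefficient of variation (%): sample standard deviation over the mean,
    the usual reproducibility metric for replicate peak areas."""
    return 100.0 * statistics.stdev(values) / statistics.fmean(values)

def fold_improvement(new_intensities, old_intensities):
    """Mean peak-intensity gain of a modified interface over the original,
    computed from matched SRM measurements."""
    return statistics.fmean(new_intensities) / statistics.fmean(old_intensities)
```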
This significant improvement in detection sensitivity for low-abundance proteins in complex matrices is expected to enhance a broad range of SRM-MS applications, including targeted protein and metabolite validation.

Although mass spectrometry (MS)-based proteomics is a promising high-throughput technology for biomarker discovery and validation (1–5), only a handful of cancer biomarkers have been approved by the United States Food and Drug Administration for clinical use in the last decade (6, 7). Assuming that low-abundance biomarkers do exist in the biofluids to be studied, the success of biomarker discovery efforts depends primarily on the sensitivity, accuracy, and robustness of the measurement technologies; the quality and size of patient cohorts and clinical samples; and execution within the context of an overall difficult and expensive path to clinical application that encompasses discovery, verification, and validation stages (1, 5, 8–10). A multiplexed assay platform increasingly considered for biomarker verification is selected reaction monitoring (SRM)1 by tandem mass spectrometry, using, e.g., a triple quadrupole (QqQ) mass spectrometer to attain high-throughput quantitative measurements of targeted proteins in complex matrices (1, 11, 12).

SRM utilizes two stages of mass filtering: a specific analyte ion of interest (precursor ion) is selected in the first stage, followed by a specific fragment ion derived from that precursor in the second stage after collision-activated dissociation. Typically, several transitions (precursor/fragment ion pairs) are monitored for greater selectivity and confidence in a targeted peptide assay, and large numbers of peptides can be monitored during a single LC-MS/MS analysis.
The two-stage mass selection by individual quadrupoles enables more rapid and continuous monitoring of specific ions derived from analytes of interest, such as peptides, and leads to significantly enhanced detection sensitivity and quantitative accuracy compared with broad (i.e. non-targeted) LC-MS or LC-MS/MS measurements (11, 12). Both the sensitivity and the selectivity of SRM-MS make this technique well suited for the targeted detection and quantification of low-abundance proteins in highly complex biofluids (13–16). The precision and reproducibility of SRM-based measurements of proteins in plasma across different laboratories have recently been assessed (17).

Despite its promise, present SRM measurements still do not provide sufficient sensitivity for reliable detection and quantification of low-abundance proteins in biofluids (e.g. those present in plasma at ∼10 ng/ml or lower levels), primarily because of high sample complexity and the large dynamic range of relative protein abundances (7, 18, 19). Given sufficient selectivity, the achievable sensitivity is generally related to the peptide MS and MS/MS signal intensities obtained. One of the key factors limiting peptide MS intensities is the significant ion losses encountered between the electrospray ionization (ESI) source and the interface to the mass spectrometer. In typical LC-ESI-MS interfaces, the mass spectrometer inlet (e.g. a heated capillary followed by a skimmer) presently provides total ion utilization and transmission efficiencies on the order of ∼1% (20), owing to a combination of limited ion sampling from the atmospheric pressure ion source into the inlet and inefficient transmission of ions entering the first reduced-pressure stage of the mass spectrometer.

The electrodynamic ion funnel (21), which was developed to efficiently capture, focus, and transmit ions to the high-vacuum region of the mass spectrometer, is expected to provide a large benefit to SRM analyses.
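The two-stage Q1/Q3 filtering that defines an SRM transition can be sketched as a simple tolerance match between scan events and the monitored precursor/fragment pairs. The tolerances and tuple layout here are assumptions for illustration, not instrument settings from the study.

```python
def match_transitions(scan_events, assay, precursor_tol=0.7, fragment_tol=0.7):
    """Two-stage mass filtering: a scan event counts toward a peptide assay
    only if both its precursor (Q1) and fragment (Q3) m/z fall within
    tolerance of a monitored transition."""
    hits = []
    for q1, q3, intensity in scan_events:
        for t_q1, t_q3 in assay:
            if abs(q1 - t_q1) <= precursor_tol and abs(q3 - t_q3) <= fragment_tol:
                hits.append((t_q1, t_q3, intensity))
    return hits
```

Requiring both filters to pass is what gives SRM its selectivity: an ion matching only the precursor window (or only the fragment window) contributes nothing to the assay signal.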
The original ion funnel interfaces, which operated at a maximum of ∼5 torr, enhanced signal intensities for a variety of MS analyzers (22–24) by replacing the inefficient skimmer interface. Although they achieved near-lossless ion transmission to high vacuum, losses at the atmospheric pressure interface went unmitigated. More recently, a high-pressure ion funnel interface capable of operating at ∼30 torr was introduced (25). The higher operating pressure accommodated greater gas loads and enabled more efficient ion sampling from atmospheric pressure through a multicapillary inlet. With a dual ion funnel interface comprising a high-pressure ion funnel with a heated multicapillary inlet followed by a standard ion funnel operated at 1–2 torr, highly efficient ion sampling from atmospheric pressure to high vacuum is readily achieved.

In this study, we report the enhanced sensitivity and reproducibility of SRM-based targeted proteomics measurements achieved by implementing a dual-stage electrodynamic ion funnel interface incorporating a multicapillary inlet on a triple quadrupole mass spectrometer. A series of LC-SRM-MS measurements were made using mouse plasma samples spiked with various concentrations of tryptic peptides from five standard proteins to evaluate the improvements in detection sensitivity and reproducibility attained with the modified interface relative to a standard Thermo (single capillary inlet/skimmer) interface. A ∼10-fold improvement in the limit of detection (LOD), as well as improved measurement reproducibility, was achieved.

4.
Optimal performance of LC-MS/MS platforms is critical to generating high quality proteomics data. Although individual laboratories have developed quality control samples, there is no widely available performance standard of biological complexity (and associated reference data sets) for benchmarking of platform performance for analysis of complex biological proteomes across different laboratories in the community. Individual preparations of the yeast Saccharomyces cerevisiae proteome have been used extensively by laboratories in the proteomics community to characterize LC-MS platform performance. The yeast proteome is uniquely attractive as a performance standard because it is the most extensively characterized complex biological proteome and the only one associated with several large scale studies estimating the abundance of all detectable proteins. In this study, we describe a standard operating protocol for large scale production of the yeast performance standard and offer aliquots to the community through the National Institute of Standards and Technology where the yeast proteome is under development as a certified reference material to meet the long term needs of the community. Using a series of metrics that characterize LC-MS performance, we provide a reference data set demonstrating typical performance of commonly used ion trap instrument platforms in expert laboratories; the results provide a basis for laboratories to benchmark their own performance, to improve upon current methods, and to evaluate new technologies. 
Additionally, we demonstrate how the yeast reference, spiked with human proteins, can be used to benchmark the power of proteomics platforms for detection of differentially expressed proteins at different concentrations in a complex matrix, thereby providing a metric to evaluate and minimize preanalytical and analytical variation in comparative proteomics experiments.

Access to proteomics performance standards is essential for several reasons. First, to generate the highest-quality data possible, proteomics laboratories routinely benchmark and perform quality control (QC)1 monitoring of their instrumentation using standards. Second, appropriate standards greatly facilitate the development of technological improvements by providing a fixed benchmark against which to evaluate new protocols or instruments that claim to improve performance. For example, it is common practice for a laboratory considering purchase of a new instrument to require the vendor to run “demo” samples so that data from the new instrument can be compared head to head with existing instruments in the laboratory. Third, large-scale proteomics studies designed to aggregate data across laboratories can be facilitated by the use of a performance standard to measure reproducibility across sites, or to compare the performance of the different LC-MS configurations or sample processing protocols used between laboratories, thereby facilitating the development of optimized standard operating procedures (SOPs).

Most individual laboratories have adopted their own QC standards, ranging from mixtures of known synthetic peptides to digests of bovine serum albumin or more complex mixtures of several recombinant proteins (1). However, because each laboratory performs QC monitoring in isolation, it is difficult to compare the performance of LC-MS platforms throughout the community. Several standards for proteomics are available for request or purchase (2, 3).
RM8327 is a mixture of three peptides developed as a reference material in a collaboration between the National Institute of Standards and Technology (NIST) and the Association of Biomolecular Resource Facilities. Mixtures of 15–48 purified human proteins are also available, such as the HUPO (Human Proteome Organisation) Gold MS Protein Standard (Invitrogen), the Universal Proteomics Standard (UPS1; Sigma), and CRM470 from the European Union Institute for Reference Materials and Measurements. Although defined mixtures of peptides or proteins can address some benchmarking and QC needs, more complex reference materials are needed to fully represent the challenges of LC-MS data acquisition in the complex matrices encountered in biological samples (2, 3).

Although it has not been widely distributed as a reference material, the proteome of the yeast Saccharomyces cerevisiae has been used extensively by the proteomics community to characterize the capabilities of a variety of LC-MS-based approaches (4–15). Yeast provides a uniquely attractive complex performance standard for several reasons. Yeast encodes a complex proteome consisting of ∼4,500 proteins expressed during normal growth conditions (7, 16–18). The concentration range of yeast proteins is sufficient to challenge the dynamic range of conventional mass spectrometers; protein abundances range from fewer than 50 to more than 10⁶ molecules per cell (4, 15, 16). Additionally, it is the most extensively characterized complex biological proteome and the only one associated with several large-scale studies estimating the abundance of all detectable proteins (5, 9, 16, 17, 19, 20), as well as with LC-MS/MS data sets showing good correlation between LC-MS/MS detection efficiency and the protein abundance estimates (4, 11, 12, 15). Finally, it is inexpensive and easy to produce large quantities of yeast protein extract for distribution.

In this study, we describe large-scale production of a yeast S. cerevisiae performance standard, which we offer to the community through NIST. Through a series of interlaboratory studies, we created a reference data set characterizing the yeast performance standard and defining reasonable performance of ion trap-based LC-MS platforms in expert laboratories using a series of performance metrics. This publicly available data set provides a basis for additional laboratories using the yeast standard to benchmark their own performance, as well as to improve upon the current status by evolving protocols, improving instrumentation, or developing new technologies. Finally, we demonstrate how the yeast performance standard, spiked with human proteins, can be used to benchmark the power of proteomics platforms for detection of differentially expressed proteins at different concentrations in a complex matrix.
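Benchmarking a laboratory's run against a shared reference data set reduces, in the simplest case, to set comparisons over peptide identifications. A hedged sketch of two such metrics; the 80% pass threshold is an arbitrary illustration, not a value from the study.

```python
def id_overlap(run_a, run_b):
    """Jaccard overlap of peptide identifications between two QC runs;
    a drop against a lab's historical value flags degraded performance."""
    a, b = set(run_a), set(run_b)
    return len(a & b) / len(a | b)

def within_benchmark(observed_ids, reference_ids, min_fraction=0.8):
    """Pass QC if a run recovers at least min_fraction of the peptide
    identifications reported in the community reference data set."""
    recovered = len(set(observed_ids) & set(reference_ids))
    return recovered >= min_fraction * len(set(reference_ids))
```

Richer performance metrics (mass accuracy, chromatographic peak width, retention-time drift) follow the same pattern: compute per run, compare against the reference distribution.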

5.
Comprehensive proteomic profiling of biological specimens usually requires multidimensional chromatographic peptide fractionation prior to mass spectrometry. However, this approach can suffer from poor reproducibility because of a lack of standardization and automation of the entire workflow, compromising the performance of quantitative proteomic investigations. To address these variables, we developed an online peptide fractionation system comprising a multiphasic liquid chromatography (LC) chip that integrates reversed-phase and strong cation exchange chromatography upstream of the mass spectrometer (MS). We showed the superiority of this system for standardizing discovery and targeted proteomic workflows using cancer cell lysates and non-depleted human plasma. Five-step multiphase chip LC-MS/MS acquisition showed clear advantages over analyses of unfractionated samples by identifying more peptides, consuming less sample, and often improving the lower limits of quantitation, all in a highly reproducible, automated, online configuration. We further showed that multiphase chip LC fractionation provides a facile means to detect many N- and C-terminal peptides (including acetylated N termini) that are challenging to identify in complex tryptic peptide matrices because of less favorable ionization characteristics. Given that as much as 95% of peptides were detected in only a single salt fraction from cell lysates, we exploited this high reproducibility and coupled it with multiple reaction monitoring on a high-resolution MS instrument (MRM-HR). This approach increased target analyte peak area and improved lower limits of quantitation without negatively influencing variance or bias. Further, we showed a strategy for using multiphase LC chip fractionation LC-MS/MS for ion library generation to integrate with SWATH™ data-independent acquisition quantitative workflows.
All MS data are available via ProteomeXchange with the identifier PXD001464.

Mass spectrometry-based proteomic quantitation is an essential technique for contemporary, integrative biological studies. Whether used in discovery experiments or for targeted biomarker applications, quantitative proteomic studies require high reproducibility at many levels: reproducible run-to-run peptide detection, reproducible peptide quantitation, reproducible depth of proteome coverage, and, ideally, a high degree of cross-laboratory analytical reproducibility. Mass spectrometry-centered proteomics has evolved steadily over the past decade and is now mature enough to derive extensive draft maps of the human proteome (1, 2). Nonetheless, a key requirement yet to be realized is ensuring that quantitative proteomics can be carried out in a timely manner while meeting the aforementioned reproducibility challenges. This is especially important for recent developments using data-independent MS quantitation and multiple reaction monitoring on high-resolution MS (MRM-HR)1, as both are highly dependent on LC peptide retention time reproducibility and precursor detectability while attempting to maximize proteome coverage (3). Strategies employed to increase the depth of proteome coverage utilize various sample fractionation methods, including gel-based separation, affinity enrichment or depletion, protein or peptide chemical modification-based enrichment, and various peptide chromatography methods, particularly ion exchange chromatography (4–10). In comparison to an unfractionated “naive” sample, the trade-off in using these enrichment/fractionation approaches is a higher risk of sample losses, the introduction of undesired chemical modifications (e.g. oxidation, deamidation, N-terminal lactam formation), the potential for result skewing and bias, and the considerable time and human resources required to perform the sample preparation tasks.
Online-coupled approaches aim to minimize those risks and address resource constraints. A widely practiced example of online sample fractionation is the decade-long use of strong cation exchange chromatography (SCX) combined with C18 reversed-phase (RP) peptide fractionation (known as MudPIT, multidimensional protein identification technology), in which SCX and RP are performed under the same buffer conditions and SCX elution is performed with volatile organic cations compatible with reversed-phase separation (11). This approach greatly increases analyte detection while avoiding sample handling losses. The MudPIT approach has been widely used for discovery proteomics (12–14), and we have previously shown that multiphasic separations also have utility for targeted proteomics when configured for selected reaction monitoring MS (SRM-MS): MudPIT-SRM-MS offered substantial advantages, with reduced ion suppression, increased peak areas, and lower limits of detection (LLOD) compared with conventional RP-SRM-MS (15).

To improve the reproducibility of proteomic workflows, increase throughput, and minimize sample loss, numerous microfluidic devices have been developed and integrated into proteomic applications (16, 17). These devices can broadly be classified into two groups: (1) microfluidic chips for peptide separation (18–25) and (2) proteome reactors that combine enzymatic processing with peptide-based fractionation (26–30). Because of their small dimensions, these devices readily integrate into nanoLC workflows. Various applications have been described, including increasing proteome coverage (22, 27, 28) and targeting phosphopeptides (24, 31, 32), glycopeptides, and released glycans (29, 33, 34).

In this work, we set out to take advantage of the benefits of multiphasic peptide separations and to address the reproducibility requirements of high-throughput comparative proteomics across a variety of workflows.
We integrated a multiphasic SCX and RP column in a “plug-and-play” microfluidic chip format for online fractionation, eliminating the need for users to make minimal-dead-volume connections between traps and columns. We show the flexibility of this format in providing robust, reproducible peptide separation across conventional and contemporary mass spectrometry workflows. This was undertaken by coupling the multiphasic liquid chromatography (LC) chip to a fast-scanning Q-ToF mass spectrometer for data-dependent MS/MS, data-independent MS (SWATH), and targeted proteomics using MRM-HR, showing clear advantages for repeatable analyses compared with conventional proteomic workflows.  相似文献   

6.
Quantitative proteome analyses suggest that the well-established stain colloidal Coomassie Blue, when used as an infrared dye, may provide sensitive post-electrophoretic in-gel protein detection that can rival even Sypro Ruby (SR). Considering the central role of two-dimensional gel electrophoresis in top-down proteomic analyses, a more cost-effective alternative such as Coomassie Blue could prove an important tool in ongoing refinements of this important analytical technique. To date, no systematic characterization of Coomassie Blue infrared fluorescence detection relative to detection with SR has been reported. Here, seven commercial Coomassie stain reagents and seven stain formulations described in the literature were systematically compared. The selectivity, threshold sensitivity, inter-protein variability, and linear dynamic range of Coomassie Blue infrared fluorescence detection were assessed in parallel with SR. Notably, several of the Coomassie stain formulations provided infrared fluorescence detection sensitivity to <1 ng of protein in-gel, slightly exceeding the performance of SR. The linear dynamic range of Coomassie Blue infrared fluorescence detection was found to significantly exceed that of SR. However, in two-dimensional gel analyses, because of a blunted fluorescence response, SR was able to detect a few additional protein spots, amounting to 0.6% of the detected proteome. Thus, although both detection methods have their advantages and disadvantages, the differences between the two appear to be small. Coomassie Blue infrared fluorescence detection is thus a viable alternative for gel-based proteomics, offering detection comparable to SR and more reliable quantitative assessments, at a fraction of the cost. Gel electrophoresis is an accessible, widely applicable, and mature protein-resolving technology. 
Two-dimensional gel electrophoresis (2DE)1, the original top-down approach to proteomic analyses, offers among its many attributes a resolving power that ensures it remains an effective analytical technology despite the appearance of alternatives. However, in-gel detection remains a limiting factor for gel-based analyses; available technology generally permits the detection and quantification of only relatively abundant proteins (35). Many critical components in normal physiology and disease may be several orders of magnitude less abundant and thus below the detection threshold of in-gel stains, or indeed of most techniques. Pre- and post-fractionation technologies have been developed to address this central issue in proteomics, but these are not without limitations (15). Thus improved detection methods for gel-based proteomics continue to be a high priority, and the literature is rich with different in-gel detection methods and innovative improvements (6–34). This history of iterative refinement presents a wealth of choices when selecting a detection strategy for a gel-based proteomic analysis (35). Perhaps the best known in-gel detection method is the ubiquitous Coomassie Blue (CB) stain; CB has served as a gel stain and protein quantification reagent for over 40 years. Though affordable, robust, easy to use, and compatible with mass spectrometry (MS), CB staining is relatively insensitive. In traditional organic solvent formulations, CB detects ∼10 ng of protein in-gel, and some reports suggest poorer sensitivity (27, 29, 36, 37). Sensitivity is hampered by relatively high background staining because of nonspecific retention of dye within the gel matrix (32, 36, 38, 39). The development of colloidal CB (CCB) formulations largely addressed these limitations (12); the concentration of soluble CB was carefully controlled by sequestering the majority of the dye into colloidal particles, mediated by pH, solvent, and the ionic strength of the solution. 
Minimizing soluble dye concentration and penetration of the gel matrix mitigated background staining, and the introduction of phosphoric acid into the staining reagent enhanced dye-protein interactions (8, 12, 40), contributing to an in-gel staining sensitivity of 5–10 ng protein, with some formulations reportedly yielding sensitivities of 0.1–1 ng (8, 12, 22, 39, 41, 42). Thus CCB achieved higher sensitivity than traditional CB staining yet maintained all the advantages of the latter, including low cost and compatibility with existing densitometric detection instruments and MS. Although surpassed by newer methods, the practical advantages of CCB ensure that it remains one of the most common gel stains in use. Fluorescent stains have become the routine, sensitive alternative to visible dyes. Among these, the ruthenium-organometallic family of dyes has been widely applied; the most commercially well-known is Sypro Ruby (SR), which is purported to interact noncovalently with primary amines in proteins (15, 18, 19, 43). Chief among the attributes of these dyes is their high sensitivity: in-gel detection limits of <1 ng for some proteins have been reported for SR (6, 9, 14, 44, 45). Moreover, SR staining has been reported to yield a greater linear dynamic range (LDR) and reduced interprotein variability (IPV) compared with CCB and silver stains (15, 19, 46–49). SR is easy to use, fully MS-compatible, and relatively forgiving of variations in initial conditions (6, 15). The chief drawback remains high cost; SR and related stains are notoriously expensive and beyond the budget of many laboratories. 
Furthermore, despite some small cost advantage relative to SR, none of the available alternatives has been consistently and quantitatively demonstrated to substantially improve on the performance of SR under practical conditions (9, 50). Notably, there is evidence to suggest that CCB staining is not fundamentally insensitive, but rather that its sensitivity has been limited by traditional densitometric detection (50, 51). When excited in the near-IR at ∼650 nm, protein-bound CB in-gel emits light in the range of 700–800 nm. Until recently, the lack of low-cost, widely available, and sufficiently sensitive infrared (IR)-capable imaging instruments prevented mainstream adoption of in-gel CB infrared fluorescence detection (IRFD); advances in imaging technology are now making such instruments far more accessible. Initial reports suggested that IRFD of CB-stained gels provided greater sensitivity than traditional densitometric detection (50, 51). Using CB R250, in-gel IRFD was reported to detect as little as 2 ng of protein in-gel, with an LDR of about an order of magnitude (2 to 20 ng, or 10 to 100 ng in separate gels), beyond which the fluorescent response saturated into the μg range (51). Using the G250 dye variant, CB-IRFD of 2D gels was determined to detect ∼3 times as many proteins as densitometric imaging, and a comparable number of proteins to SR (50). This study also concluded that CB-IRFD yielded a significantly higher signal-to-background ratio (S/BG) than SR, providing initial evidence that CB-IRFD may be superior to SR in some aspects of stain performance (50). Despite this initial evidence of the viability of CB-IRFD as an in-gel protein detection method, a detailed characterization of this technology has not yet been reported. Here a more thorough, quantitative characterization of CB-IRFD is described, establishing its lowest limit of detection (LLD), IPV, and LDR in comparison to SR. 
Finally, a wealth of modifications and enhancements of CCB formulations has been reported (8, 12, 21, 24, 26, 29, 40, 41, 52–54), and likewise there are many commercially available CCB stain formulations. To date, none of these formulations has been compared quantitatively in terms of relative performance when detected using IRF. As a general detection method for gel-based proteomics, CB-IRFD was found to provide comparable or even slightly superior performance to SR according to most criteria, including sensitivity and selectivity (50). Furthermore, in terms of LDR, CB-IRFD showed distinct advantages over SR. However, assessing proteomes resolved by 2DE revealed critical distinctions between CB-IRFD and SR in terms of protein quantification versus threshold detection: neither stain could be considered unequivocally superior to the other by all criteria. Nonetheless, IRFD proved the most sensitive method of detecting CB-stained protein in-gel, enabling high-sensitivity detection without the need for expensive reagents or even commercial formulations. Overall, CB-IRFD is a viable alternative to SR and other mainstream fluorescent stains, mitigating the high cost of large-scale gel-based proteomic analyses and making high-sensitivity gel-based proteomics accessible to all labs. With improvements to CB formulations and/or image acquisition instruments, the performance of this detection technology may be further enhanced.  相似文献   
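The linear dynamic range comparisons above can be made concrete with a toy calculation (hypothetical numbers, not data from this study): given a dilution series, the LDR is the ratio of the largest to smallest protein load over which the signal stays proportional to load within some tolerance.

```python
# Toy sketch of estimating a stain's linear dynamic range (LDR) from a
# dilution series. Loads and signals below are invented for illustration.

def linear_dynamic_range(loads, signals, tol=0.10):
    """Return the max load ratio over which signal/load stays within
    +/- tol of its value at the lowest load (i.e. remains linear)."""
    base = signals[0] / loads[0]          # response per ng at lowest load
    upper = loads[0]
    for load, sig in zip(loads, signals):
        if abs(sig / load - base) / base <= tol:
            upper = load                  # still in the linear regime
        else:
            break                         # response has saturated
    return upper / loads[0]

loads   = [1, 10, 100, 1000, 10000]       # ng protein loaded
signals = [5, 50, 500, 5000, 20000]       # response saturates at 10 ug

print(linear_dynamic_range(loads, signals))  # 1000.0, i.e. three orders
```

A saturating stain (like the blunted SR response described above) would break out of the loop earlier and report a smaller LDR.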

7.
The data-independent acquisition (DIA) approach has recently been introduced as a novel mass spectrometric method that promises to combine the high-content aspect of shotgun proteomics with the reproducibility and precision of selected reaction monitoring. Here, we evaluate whether SWATH-MS-type DIA effectively translates into better protein profiling than established shotgun proteomics. We implemented a novel DIA method on the widely used Orbitrap platform and used retention-time-normalized (iRT) spectral libraries for targeted data extraction using Spectronaut. We call this combination hyper reaction monitoring (HRM). Using a controlled sample set, we show that HRM outperformed shotgun proteomics both in the number of consistently identified peptides across multiple measurements and in the quantification of differentially abundant proteins. The reproducibility of HRM in peptide detection was above 98%, resulting in quasi-complete data sets, compared with 49% for shotgun proteomics. Utilizing HRM, we profiled acetaminophen (APAP)1-treated three-dimensional human liver microtissues. An early onset of relevant proteome changes was revealed at subtoxic doses of APAP. Further, we detected and quantified, for the first time, human NAPQI-protein adducts that might be relevant for the toxicity of APAP. The adducts were identified on four mitochondrial oxidative stress-related proteins (GATM, PARK7, PRDX6, and VDAC2) and two other proteins (ANXA2 and FTCD). Our findings imply that DIA should be the preferred method for quantitative protein profiling. Quantitative mass spectrometry is a powerful and widely used approach to identify differentially abundant proteins, e.g. for proteome profiling and biomarker discovery (1). Several tens of thousands of peptides and thousands of proteins can be routinely identified from a single sample injection in shotgun proteomics (2). Shotgun proteomics, however, is limited by low analytical reproducibility. 
This is due to the complexity of the samples, which results in undersampling (supplemental Fig. 1), and to the fact that the acquisition of MS2 spectra is often triggered outside of the elution peak apex. As a result, only 17% of the detectable peptides are typically fragmented, and less than 60% of those are identified. This translates into reliable identification of only 10% of the detectable peptides (3). The overlap of peptide identification across technical replicates is typically 35–60% (4), which results in inconsistent peptide quantification. As an alternative to shotgun proteomics, selected reaction monitoring (SRM) enables quantification of up to 200–300 peptides at very high reproducibility, accuracy, and precision (5–8). Data-independent acquisition (DIA), a novel acquisition type, overcomes the semistochastic nature of shotgun proteomics (9–18). Spectra are acquired according to a predefined schema instead of depending on the data. Targeted analysis of DIA data was introduced with SWATH-MS (19). For the originally published SWATH-MS, the mass spectrometer cycles through 32 predefined, contiguous, 25-Thomson-wide precursor windows and records high-resolution fragment ion spectra (19). This results in a comprehensive measurement of all detectable precursors in the selected mass range. The main novelty of SWATH-MS was in the analysis of the collected DIA data: predefined fragment ions are extracted using precompiled spectrum libraries, which results in SRM-like data. Such targeted analyses are now enabled by several publicly available computational tools, in particular Spectronaut2, Skyline (20), and OpenSWATH (21). The accuracy of peptide identification is evaluated based on the mProphet method (22). We introduce a novel SWATH-MS-type DIA workflow termed hyper reaction monitoring (HRM) (reviewed in (23)) implemented on a Thermo Scientific Q Exactive platform. 
It consists of comprehensive DIA acquisition and targeted data analysis with retention-time-normalized spectral libraries (24). Its high accuracy of peptide identification and quantification is due to three aspects. First, we developed a novel, improved DIA method. Second, we reimplemented the mProphet (22) approach in the software Spectronaut (www.spectronaut.org). Third, we developed large, optimized, retention-time-normalized (iRT) spectral libraries. We compared HRM and state-of-the-art shotgun proteomics in terms of the ability to discover differentially abundant proteins. For this purpose, we used a “profiling standard sample set” with 12 non-human proteins spiked at known absolute concentrations into a stable human cell line protein extract. This resulted in quasi-complete data sets for HRM and the detection of a larger number of differentially abundant proteins compared with shotgun proteomics. We utilized HRM to identify changes in the proteome of primary three-dimensional human liver microtissues after APAP exposure (25–27). These primary hepatocytes exhibit active drug metabolism. With starting material of only 12,000 cells per sample, the abundance of 2,830 proteins was quantified over an APAP concentration range. Six novel NAPQI-cysteine protein adducts that might be relevant for the toxicity of APAP were found and quantified, mainly on mitochondrion-related proteins.  相似文献   
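The 32 × 25-Thomson precursor window scheme of the originally published SWATH-MS, and the shotgun sampling arithmetic quoted in the abstract above, can be sketched in a few lines (a toy illustration; the 400–1200 m/z range is that of the original SWATH-MS publication, and the fractions are the figures cited in the text):

```python
# Toy sketch (not the authors' software): build the contiguous 25-Th
# isolation windows of the original SWATH-MS scheme, then check the
# arithmetic behind the quoted shotgun identification rates.

def swath_windows(start_mz=400.0, width=25.0, n_windows=32):
    """Return (lower, upper) m/z bounds of contiguous precursor windows."""
    return [(start_mz + i * width, start_mz + (i + 1) * width)
            for i in range(n_windows)]

windows = swath_windows()
assert windows[0] == (400.0, 425.0)
assert windows[-1] == (1175.0, 1200.0)   # 32 windows cover 400-1200 m/z

# Shotgun sampling arithmetic from the text: ~17% of detectable peptides
# are fragmented and <60% of those are identified, i.e. roughly 10% of
# detectable peptides yield reliable identifications.
frac_fragmented = 0.17
frac_identified = 0.60
print(round(frac_fragmented * frac_identified, 2))  # prints 0.1
```

Every precursor in the covered range falls into exactly one window each cycle, which is why DIA avoids the semistochastic sampling of data-dependent acquisition.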

8.
The increasing scale and complexity of quantitative proteomics studies complicate the subsequent analysis of the acquired data. Untargeted label-free quantification, based either on feature intensities or on spectral counting, is a method that scales particularly well with respect to the number of samples. It is thus an excellent alternative to labeling techniques. In order to profit from this scalability, however, data analysis has to cope with large amounts of data, process them automatically, and perform a thorough statistical analysis to achieve reliable results. We review the state of the art with respect to computational tools for label-free quantification in untargeted proteomics. The two fundamental approaches are feature-based quantification, relying on the summed-up mass spectrometric intensity of peptides, and spectral counting, which relies on the number of MS/MS spectra acquired for a certain protein. We review the current algorithmic approaches underlying some widely used software packages and briefly discuss the statistical strategies for analyzing the data. Over recent decades, mass spectrometry has become the analytical method of choice in most proteomics studies (e.g. Refs. 1–4). A standard mass spectrometric workflow allows for both protein identification and protein quantification (5) in some form. For a long time, the technology was used mainly for qualitative assessments of protein mixtures, namely, to assess whether a specific protein is in the sample or not. However, for the majority of interesting research questions, especially in the field of systems biology, this binary information (present or not) is not sufficient (6). The necessity of more detailed information on protein expression levels drives the field of quantitative proteomics (7, 8), which enables the integration of proteomics data with other data sources and allows network-centered studies, as reviewed in Ref. 9. 
Recent studies show that mass-spectrometry-based quantitative proteomics experiments can provide quantitative information (relative or absolute) for large parts, if not the entire set, of expressed proteins (10–12). Since the isotope-coded affinity tag protocol was first published in 1999 (13), numerous labeling strategies have found their way into the field of quantitative proteomics (14). These include isotope-coded protein labeling (15), metabolic labeling (16, 17), and isobaric tags (18, 19). Comprehensive overviews of different quantification strategies can be found in Refs. 20 and 21. Because of the shortcomings of labeling strategies, label-free methods are increasingly gaining the interest of proteomics researchers (22, 23). In label-free quantification, no label is introduced to any of the samples. All samples are analyzed in separate LC/MS experiments, and the individual peptide properties of the individual measurements are then compared. Regardless of the quantification strategy, computational approaches for data analysis have become the critical final step of the proteomics workflow. Overviews of existing computational approaches in proteomics are provided in Refs. 24 and 25. The computational label-free quantification workflow is visualized in Fig. 1. Comparing peptide quantities using mass spectrometry remains a difficult task, because mass spectrometers have different response values for different chemical entities, and thus a direct comparison of different peptides is not possible. The computational analysis of a label-free quantitative data set consists of several steps that are mainly split into raw data signal processing and quantification. Signal processing steps comprise data reduction procedures such as baseline removal, denoising, and centroiding.
Fig. 1. The sample cohort that can be analyzed via label-free proteomics is not limited in size. 
Each sample is processed separately through the sample preparation and data acquisition pipeline; for data analysis, the data from the different LC/MS runs are combined. These steps can be accomplished in modular building blocks, or the entire analysis can be performed using monolithic analysis software. Recently, it has been shown that it is beneficial to combine modular blocks from different software tools into a consensus pipeline (26). The same study also illustrates the diversity of methods that are modularized by different software tools. In another recent publication, monolithic software packages are compared (27). In that study, the authors identify a set of seven metrics: detection sensitivity, detection consistency, intensity consistency, intensity accuracy, detection accuracy, statistical capability, and quantification accuracy. Despite the lack of independence among these metrics and the loose reporting of software parameter settings, such comparative studies are of great interest to the field of quantitative proteomics. A general conclusion from these studies is that the choice of software might, to a certain degree, affect the final results of the study. Absolute quantification of peptides and proteins using intensity-based label-free methods is possible and can be done with excellent accuracy if standard addition is used. With the help of known concentrations, calibration lines can be drawn, and absolute protein quantities can be directly inferred from these calibration measurements (28). Furthermore, it has been suggested that peptide peak intensities can be predicted and absolute quantities derived from these predictions (29). 
However, the limited accuracy of predictions or the need for peptides of known concentrations limits these approaches to selected proteins/peptides only and prevents their use on a proteome-wide scale. Spectral counting methods have also been used for the estimation of absolute concentrations on a global scale (30), albeit at drastically reduced accuracy relative to intensity-based methods. In one study, the authors used a mixture of 48 proteins with known concentrations and predicted the absolute copy numbers of thousands of proteins based on that mixture. Despite the fact that large, proteome-wide data sets will dilute the effects of different peptide detectabilities at the individual protein level, such methods will always be limited in their accuracy of quantification. The generic nature of label-free quantification means it is not restricted to any model system and can also be employed with tissue or body fluids (31, 32). However, the label-free approach is more sensitive to technical deviations between LC/MS runs, as information is compared between different measurements. Therefore, the reproducibility of the analytical platform is crucial for successful label-free quantification. The recent success of label-free quantification could only be accomplished through significant improvements in algorithms (33–36). An increasingly large collection of software tools for label-free proteomics has been published as open-source applications or has entered the market as commercially available packages. This review aims to outline the computational methods that are generally implemented by these software tools. Furthermore, we illustrate the strengths and weaknesses of different tools. The review provides an information resource for the broad proteomics audience and does not illustrate all algorithmic details of the individual tools.  相似文献   
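The standard-addition calibration described above can be sketched in a few lines (a minimal illustration with invented intensities, not any specific tool's implementation): fit a line to signal versus spiked amount, then read the endogenous amount from the intercept-to-slope ratio.

```python
# Toy sketch of absolute quantification by standard addition. Known
# amounts of a reference peptide are spiked into the sample; intensity
# is fit to a line, and the endogenous amount is the signal at zero
# addition divided by the response per unit amount (the slope).

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Spiked amounts (fmol) and observed peak intensities (arbitrary units);
# perfectly linear toy data for clarity.
added = [0.0, 10.0, 20.0, 40.0]
intensity = [50.0, 150.0, 250.0, 450.0]

slope, intercept = fit_line(added, intensity)
endogenous = intercept / slope   # amount that produced the x = 0 signal
print(endogenous)                # 5.0 fmol in this toy example
```

Real data would of course carry noise, so the quality of the fit (and replicate spikes) matters; the point is only that known concentrations anchor the otherwise relative intensity scale.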

9.
A Boolean network is a model used to study the interactions between different genes in genetic regulatory networks. In this paper, we present several algorithms using gene ordering and feedback vertex sets to identify singleton attractors and small attractors in Boolean networks. We analyze the average-case time complexities of some of the proposed algorithms. For instance, it is shown that the outdegree-based ordering algorithm for finding singleton attractors runs, on average, in time much faster than the naive O(2^n)-time exhaustive algorithm, where n is the number of genes; the achievable bound depends on the maximum indegree of the network. We performed extensive computational experiments on these algorithms, which resulted in good agreement with the theoretical results. In addition, we give a simple and complete proof that finding an attractor with the shortest period is NP-hard.  相似文献   
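The singleton-attractor search discussed above can be made concrete with the naive enumeration (a minimal sketch, not the paper's algorithms, which instead prune partial gene assignments in a chosen ordering): a singleton attractor is simply a state x with f(x) == x.

```python
# Toy sketch: exhaustively enumerate all 2^n states of a Boolean network
# and keep the fixed points (singleton attractors). The paper's
# gene-ordering algorithms avoid visiting most of these states by
# discarding partial assignments that already violate f_i(x) == x_i.

from itertools import product

def singleton_attractors(funcs, n):
    """funcs[i] maps a full state tuple to the next value of gene i."""
    return [x for x in product((0, 1), repeat=n)
            if all(funcs[i](x) == x[i] for i in range(n))]

# Example 3-gene network: x0' = x1 AND x2, x1' = x0, x2' = NOT x1.
funcs = [lambda x: x[1] & x[2],
         lambda x: x[0],
         lambda x: 1 - x[1]]

print(singleton_attractors(funcs, 3))  # [(0, 0, 1)]
```

The exhaustive loop is O(2^n) in the number of states visited, which is exactly the baseline the outdegree-based ordering improves on in the average case.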

10.
The use of electron transfer dissociation (ETD) fragmentation for the analysis of peptides eluting in liquid chromatography tandem mass spectrometry experiments is increasingly common and can allow identification of many peptides and proteins in complex mixtures. Peptide identification is performed through the use of search engines that attempt to match spectra to peptides from proteins in a database. However, software for the analysis of ETD fragmentation data is currently less developed than equivalent algorithms for the analysis of the more ubiquitous collision-induced dissociation fragmentation spectra. In this study, a new scoring system was developed for the analysis of peptide ETD fragmentation data that varies the ion type weighting depending on the precursor ion charge state and peptide sequence. This new scoring regime was applied to the analysis of data from previously published results in which four search engines (Mascot, Open Mass Spectrometry Search Algorithm (OMSSA), Spectrum Mill, and X!Tandem) were compared (Kandasamy, K., Pandey, A., and Molina, H. (2009) Evaluation of several MS/MS search algorithms for analysis of spectra derived from electron transfer dissociation experiments. Anal. Chem. 81, 7170–7180). Protein Prospector identified 80% more spectra at a 1% false discovery rate than the most successful alternative search engine in this previous publication. These results suggest that other search engines would benefit from the application of similar rules. The recently developed fragmentation approach of electron transfer dissociation (ETD)1 has become a genuine alternative to the more ubiquitous collision-induced dissociation (CID) for high-throughput and high-sensitivity proteomic analysis (1–3). 
ETD (4) and the related fragmentation process electron capture dissociation (ECD) (5) have been demonstrated to have particular advantages for the analysis of large peptides and small proteins (6–8) as well as the analysis of peptides bearing labile post-translational modifications (9–11). The results achieved through ETD and ECD analysis have been shown to be highly complementary to those obtained through CID fragmentation analysis, both by increasing confidence in particular peptide identifications and by allowing identification of extra components in complex mixtures (10, 12, 13). As CID and ETD can be performed sequentially or alternately on precursor ions in the same mass spectrometric run, it is expected that the combined use of these two fragmentation analysis techniques will become increasingly common, enabling more comprehensive sample analysis. Software for the analysis of CID spectra is significantly more advanced than that for ECD/ETD data. This is partly because the behavior of peptides under CID fragmentation is better characterized and understood, so software has been developed that is better able to predict the expected fragment ions. The fragment ion types observed in ETD and ECD are largely known (5, 14, 15), but information about the frequency and peak intensities of the different ion types is less well documented. We recently performed a study to characterize how frequently the different fragment ion types are detected in ETD spectra when analyzing complex digest mixtures produced by proteolytic enzymes or chemical cleavage reagents of different sequence specificity (16). These results were analyzed with respect to precursor charge state and the location of basic residues, both of which were shown to be significant factors in controlling the fragment ion types observed. 
The results showed that ETD spectra of doubly charged precursor ions produced very different fragment ions depending on the location of a basic residue in the sequence.Based on this statistical analysis of ETD data from a diverse range of peptides (16), in the present study, a new scoring system was developed and implemented in the search engine Batch-Tag within Protein Prospector that adjusts the weighting for different fragment ion types based on the precursor charge state and the presence of basic amino acid residues at either peptide terminus. The results using this new scoring system were compared with the previous generation of Batch-Tag, which used ion score weightings based on the average frequency of observation of different fragment types in ETD spectra of tryptic peptides and used the same scoring irrespective of precursor charge and sequence. The performance of this new scoring was also compared with those reported by other search engines using results previously published from a large standard data set (17). The new scoring system allowed identification of significantly more spectra than achieved with the previous scoring system. It also assigned 80% more spectra than the most successful of the compared search engines when using the same false discovery rate threshold.  相似文献   
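The charge-dependent ion weighting idea can be sketched as follows. All weights, tolerances, and ion lists below are hypothetical illustrations, not Protein Prospector's actual values: the point is only that the contribution of a matched c or z ion to the score depends on the precursor charge state.

```python
# Toy sketch of charge-state-dependent PSM scoring for ETD data.
# ION_WEIGHTS is invented for illustration; e.g. here 2+ precursors
# reward z ions more heavily than c ions, while 3+ weighs them equally.

ION_WEIGHTS = {2: {"c": 1.0, "z": 2.0},
               3: {"c": 1.5, "z": 1.5}}

def score_psm(observed, theoretical, charge, tol=0.02):
    """observed: list of peak m/z values; theoretical: {ion_type: [m/z]}."""
    weights = ION_WEIGHTS.get(charge, ION_WEIGHTS[3])
    score = 0.0
    for ion_type, mzs in theoretical.items():
        for mz in mzs:
            # Credit each theoretical ion found within the tolerance,
            # weighted by its ion type for this charge state.
            if any(abs(mz - peak) <= tol for peak in observed):
                score += weights[ion_type]
    return score

theo = {"c": [148.08, 261.16, 358.21], "z": [175.12, 272.17, 385.25]}
obs  = [148.08, 175.13, 272.17, 500.00]
print(score_psm(obs, theo, charge=2))  # 1.0 (c1) + 2.0 + 2.0 (z1, z2) = 5.0
```

Extending this with terminal-basic-residue rules, as the study above describes, would simply make the weight lookup depend on the peptide sequence as well as the charge.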

11.
12.
Database search programs are essential tools for identifying peptides via mass spectrometry (MS) in shotgun proteomics. Simultaneously achieving high sensitivity and high specificity during a database search is crucial for improving proteome coverage. Here we present JUMP, a new hybrid database search program that generates amino acid tags and ranks peptide spectrum matches (PSMs) by an integrated score from the tags and pattern matching. In a typical run of liquid chromatography coupled with high-resolution tandem MS, more than 95% of MS/MS spectra can generate at least one tag, whereas the remaining spectra are usually too poor to derive genuine PSMs. To enhance search sensitivity, the JUMP program enables the use of tags as short as one amino acid. Using a target-decoy strategy, we compared JUMP with other programs (e.g. SEQUEST, Mascot, PEAKS DB, and InsPecT) in the analysis of multiple datasets and found that JUMP outperformed these preexisting programs. JUMP also permitted the analysis of multiple co-fragmented peptides from “mixture spectra” to further increase PSMs. In addition, JUMP-derived tags allowed partial de novo sequencing and facilitated the unambiguous assignment of modified residues. In summary, JUMP is an effective database search algorithm complementary to current search programs. Peptide identification by tandem mass spectra is a critical step in mass spectrometry (MS)-based1 proteomics (1). Numerous computational algorithms and software tools have been developed for this purpose (2–6). These algorithms can be classified into three categories: (i) pattern-based database search, (ii) de novo sequencing, and (iii) hybrid search that combines database search and de novo sequencing. With the continuous development of high-performance liquid chromatography and high-resolution mass spectrometers, it is now possible to analyze almost all protein components in mammalian cells (7). 
In contrast to rapid data collection, it remains a challenge to extract accurate information from the raw data to identify peptides with low false positive rates (specificity) and minimal false negatives (sensitivity) (8). Database search methods usually assign peptide sequences by comparing MS/MS spectra to theoretical peptide spectra predicted from a protein database, as exemplified in SEQUEST (9), Mascot (10), OMSSA (11), X!Tandem (12), Spectrum Mill (13), ProteinProspector (14), MyriMatch (15), Crux (16), MS-GFDB (17), Andromeda (18), BaMS2 (19), and Morpheus (20). Some other programs, such as SpectraST (21) and Pepitome (22), utilize a spectral library composed of experimentally identified and validated MS/MS spectra. These methods use a variety of scoring algorithms to rank potential peptide spectrum matches (PSMs) and select the top hit as a putative PSM. However, not all PSMs are correctly assigned. For example, false peptides may be assigned to MS/MS spectra with numerous noisy peaks and poor fragmentation patterns. If the samples contain unknown protein modifications, mutations, or contaminants, the related MS/MS spectra also result in false positives, as their corresponding peptides are not in the database. Other false positives may be generated simply by random matches. Therefore, it is important to remove these false PSMs to improve dataset quality. One common approach is to filter putative PSMs to achieve a final list with a predefined false discovery rate (FDR) via a target-decoy strategy, in which decoy proteins are merged with target proteins in the same database for estimating false PSMs (23–26). However, true and false PSMs are not always distinguishable based on matching scores. 
Setting an appropriate score threshold that achieves both maximal sensitivity and high specificity is therefore a problem (13, 27, 28). De novo methods, including Lutefisk (29), PEAKS (30), NovoHMM (31), PepNovo (32), pNovo (33), Vonovo (34), and UniNovo (35), identify peptide sequences directly from MS/MS spectra. These methods can be used to derive novel peptides and post-translational modifications without a database, which is useful especially when the related genome is not sequenced. High-resolution MS/MS spectra greatly facilitate the generation of peptide sequences in these de novo methods. However, because MS/MS fragmentation cannot always produce all predicted product ions, only a portion of the collected MS/MS spectra have sufficient quality to extract partial or full peptide sequences, leading to lower sensitivity than is achieved with database search methods. To improve the sensitivity of de novo methods, a hybrid approach has been proposed to integrate peptide sequence tags into PSM scoring during database searches (36). Numerous software packages have been developed, such as GutenTag (37), InsPecT (38), Byonic (39), DirecTag (40), and PEAKS DB (41). These methods use peptide tag sequences to filter a protein database, followed by error-tolerant database searching. One restriction in most of these algorithms is the requirement of a minimum tag length of three amino acids for matching protein sequences in the database. This restriction reduces the sensitivity of the database search, because it filters out some high-quality spectra in which consecutive tags cannot be generated. In this paper, we describe JUMP, a novel tag-based hybrid algorithm for peptide identification. The program is optimized to balance sensitivity and specificity during tag derivation and MS/MS pattern matching. JUMP can use all potential sequence tags, including tags consisting of only one amino acid. 
When we compared its performance to that of two widely used search algorithms, SEQUEST and Mascot, JUMP identified ∼30% more PSMs at the same FDR threshold. The program also provides two further features: (i) the use of tag sequences to improve modification-site assignment, and (ii) the analysis of co-fragmented peptides from mixture MS/MS spectra.  相似文献
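The target-decoy FDR filtering described above can be sketched in a few lines. This is an illustrative sketch of the general strategy, not JUMP's actual scoring code; the score values below are made up for demonstration.

```python
def filter_psms_by_fdr(psms, fdr_threshold=0.01):
    """Filter peptide-spectrum matches to a predefined FDR.

    psms: list of (score, is_decoy) tuples, where decoy hits come from a
    reversed/shuffled database merged with the target database.
    The FDR at a score cutoff is estimated as (#decoy hits) / (#target hits).
    """
    # Scan cutoffs from the best score downward, keeping the largest
    # accepted set whose estimated FDR stays at or below the threshold.
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    accepted = []
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdr = decoys / max(targets, 1)
        if fdr <= fdr_threshold:
            # Keep only target hits above (and including) this cutoff.
            accepted = [p for p in ranked[:targets + decoys] if not p[1]]
    return accepted

# Hypothetical PSMs: high-scoring targets pass; decoys inflate the FDR below.
psms = [(95, False), (90, False), (88, False), (60, True), (55, False), (50, True)]
print(len(filter_psms_by_fdr(psms, 0.1)))
```

Note that this simple estimator assumes a concatenated target-decoy search; separate-search variants scale the decoy count differently.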

13.
The success of high-throughput proteomics hinges on the ability of computational methods to identify peptides from tandem mass spectra (MS/MS). However, a common limitation of most peptide identification approaches is the nearly ubiquitous assumption that each MS/MS spectrum is generated from a single peptide. We propose a new computational approach for the identification of mixture spectra generated from more than one peptide. Capitalizing on the growing availability of large libraries of single-peptide spectra (spectral libraries), our quantitative approach is able to identify up to 98% of all mixture spectra from equally abundant peptides and automatically adjust to varying abundance ratios of up to 10:1. Furthermore, we show how theoretical bounds on spectral similarity avoid the need to compare each experimental spectrum against all possible combinations of candidate peptides (achieving speedups of over five orders of magnitude) and demonstrate that mixture spectra can be identified in a matter of seconds against proteome-scale spectral libraries. Although our approach was developed for and is demonstrated on peptide spectra, we argue that the generality of the methods allows for their direct application to other types of spectral libraries and mixture spectra. The success of tandem MS (MS/MS1) approaches to peptide identification is partly due to advances in computational techniques allowing for the reliable interpretation of MS/MS spectra. Mainstream computational techniques mainly fall into two categories: database search approaches that score each spectrum against peptides in a sequence database (1–4) or de novo techniques that directly reconstruct the peptide sequence from each spectrum (5–8). 
The combination of these methods with advances in high-throughput MS/MS has promoted the accelerated growth of spectral libraries, collections of peptide MS/MS spectra whose identifications were validated by accepted statistical methods (9, 10) and often also manually confirmed by mass spectrometry experts. The similar concept of spectral archives was also recently proposed to denote spectral libraries that include "interesting" nonidentified spectra (11) (i.e., recurring spectra with good de novo reconstructions but no database match). The growing availability of these large collections of MS/MS spectra has reignited the development of alternative peptide identification approaches based on spectral matching (12–14) and alignment (15–17) algorithms. However, mainstream approaches were developed under the (often unstated) assumption that each MS/MS spectrum is generated from a single peptide. Although chromatographic procedures greatly contribute to making this a reasonable assumption, there are several situations in which it is difficult or even impossible to separate pairs of peptides. Examples include certain permutations of the peptide sequence or post-translational modifications (see (18) for examples of co-eluting histone modification variants). In addition, innovative experimental setups have demonstrated the potential for increased throughput in peptide identification using mixture spectra; examples include data-independent acquisition (19), ion-mobility MS (20), and MSE strategies (21). To alleviate the algorithmic bottleneck in such scenarios, we describe a computational approach, M-SPLIT (mixture-spectrum partitioning using library of identified tandem mass spectra), that is able to reliably and efficiently identify peptides from mixture spectra generated from a pair of peptides. In brief, a mixture spectrum is modeled as a linear combination of two single-peptide spectra, and peptide identification is done by searching against a spectral library. 
We show that efficient filtration and accurate branch-and-bound strategies can be used to avoid the huge computational cost of searching all possible pairs. Thus equipped, our approach is able to identify the correct matches by considering only a minuscule fraction of all possible matches. Beyond potentially enhancing the identification capabilities of current MS/MS acquisition setups, we argue that the availability of methods to reliably identify MS/MS spectra from mixtures of peptides could enable the collection of MS/MS data using accelerated chromatography setups to obtain the same or better peptide identification results in a fraction of the experimental time currently required for exhaustive peptide separation.  相似文献   
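The linear-combination model at the heart of this approach can be illustrated numerically: given two candidate library spectra binned onto a shared m/z grid, the mixing coefficients are recovered by least squares. This is a toy sketch with made-up intensity vectors, not the M-SPLIT implementation (which additionally uses filtration and branch-and-bound search over the library).

```python
import numpy as np

def fit_mixture(mixture, spec_a, spec_b):
    """Fit mixture ≈ c1*spec_a + c2*spec_b by least squares over a shared
    m/z binning; returns the coefficients and the residual norm."""
    A = np.column_stack([spec_a, spec_b])
    coef, _, _, _ = np.linalg.lstsq(A, mixture, rcond=None)
    coef = np.clip(coef, 0, None)  # intensities cannot be negative
    return coef, np.linalg.norm(mixture - A @ coef)

# Toy spectra binned to the same five m/z bins (hypothetical intensities).
a = np.array([10.0, 0.0, 5.0, 0.0, 2.0])
b = np.array([0.0, 8.0, 0.0, 4.0, 1.0])
mix = 0.7 * a + 0.3 * b  # a 7:3 abundance ratio, within the 10:1 range above
coef, res = fit_mixture(mix, a, b)
print(np.round(coef, 3), round(float(res), 6))
```

A low residual norm indicates that the candidate pair explains the mixture spectrum well; scoring all pairs this way is what the branch-and-bound bounds make tractable.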

14.
Top-down proteomics is emerging as a viable method for the routine identification of hundreds to thousands of proteins. In this work we report the largest top-down study to date, with the identification of 1,220 proteins from the transformed human cell line H1299 at a false discovery rate of 1%. Multiple separation strategies were utilized, including the focused isolation of mitochondria, resulting in significantly improved proteome coverage relative to previous work. In all, 347 mitochondrial proteins were identified, including ∼50% of the mitochondrial proteome below 30 kDa and over 75% of the subunits constituting the large complexes of oxidative phosphorylation. Three hundred of the identified proteins were found to be integral membrane proteins containing between 1 and 12 transmembrane helices, requiring no specific enrichment or modified LC-MS parameters. Over 5,000 proteoforms were observed, many harboring post-translational modifications, including over a dozen proteins containing lipid anchors (some previously unknown) and many others with phosphorylation and methylation modifications. Comparison between untreated and senescent H1299 cells revealed several changes to the proteome, including the hyperphosphorylation of HMGA2. This work illustrates the burgeoning ability of top-down proteomics to characterize large numbers of intact proteoforms in a high-throughput fashion.Although traditional bottom-up approaches to mass-spectrometry-based proteomics are capable of identifying thousands of protein groups from a complex mixture, proteolytic digestion can result in the loss of information pertaining to post-translational modifications and sequence variants (1, 2). 
The recent implementation of top-down proteomics in a high-throughput format using either Fourier transform ion cyclotron resonance (3–5) or Orbitrap instruments (6, 7) has shown an increasing scale of applicability while preserving information on combinatorial modifications and highly related sequence variants. For example, the identification of over 500 bacterial proteins helped researchers find covalent switches on cysteines (7), and over 1,000 proteins were identified from human cells (3). Such advances have driven the detection of whole protein forms, now simply called proteoforms (8), with several laboratories now seeking to tie these to specific functions in cell and disease biology (9–11). The term "proteoform" denotes a specific primary structure of an intact protein molecule that arises from a specific gene and refers to a precise combination of genetic variation, splice variants, and post-translational modifications. Whereas special attention is required in order to accomplish gene- and variant-specific identifications via the bottom-up approach, top-down proteomics routinely links proteins to specific genes without the problem of protein inference. However, the fully automated characterization of whole proteoforms still represents a significant challenge in the field. Another major challenge is to extend the top-down approach to the study of whole integral membrane proteins, whose hydrophobicity can often limit their analysis via LC-MS (5, 12–16). Though integral membrane proteins are often difficult to solubilize, the long stretches of sequence information provided by fragmentation of their transmembrane domains in the gas phase can actually aid in their identification (5, 13). In parallel to the early days of bottom-up proteomics a decade ago (17–21), in this work we brought the latest methods for top-down proteomics into combination with subcellular fractionation and cellular treatments to expand coverage of the human proteome. 
We utilized multiple dimensions of separation and an Orbitrap Elite mass spectrometer to achieve large-scale interrogation of intact proteins derived from H1299 cells. For this focus issue on post-translational modifications, we report this summary of findings from the largest implementation of top-down proteomics to date, which resulted in the identification of 1,220 proteins and thousands more proteoforms. We also applied the platform to H1299 cells induced into senescence by treatment with the DNA-damaging agent camptothecin.  相似文献   

15.
Previous studies have shown that protein-protein interactions among splicing factors may play an important role in pre-mRNA splicing. We report here identification and functional characterization of a new splicing factor, Sip1 (SC35-interacting protein 1). Sip1 was initially identified by virtue of its interaction with SC35, a splicing factor of the SR family. Sip1 interacts with not only several SR proteins but also with U1-70K and U2AF65, proteins associated with 5′ and 3′ splice sites, respectively. The predicted Sip1 sequence contains an arginine-serine-rich (RS) domain but does not have any known RNA-binding motifs, indicating that it is not a member of the SR family. Sip1 also contains a region with weak sequence similarity to the Drosophila splicing regulator suppressor of white apricot (SWAP). An essential role for Sip1 in pre-mRNA splicing was suggested by the observation that anti-Sip1 antibodies depleted splicing activity from HeLa nuclear extract. Purified recombinant Sip1 protein, but not other RS domain-containing proteins such as SC35, ASF/SF2, and U2AF65, restored the splicing activity of the Sip1-immunodepleted extract. Addition of U2AF65 protein further enhanced the splicing reconstitution by the Sip1 protein. Deficiency in the formation of both A and B splicing complexes in the Sip1-depleted nuclear extract indicates an important role of Sip1 in spliceosome assembly. Together, these results demonstrate that Sip1 is a novel RS domain-containing protein required for pre-mRNA splicing and that the functional role of Sip1 in splicing is distinct from those of known RS domain-containing splicing factors.Pre-mRNA splicing takes place in spliceosomes, the large RNA-protein complexes containing pre-mRNA, U1, U2, U4/6, and U5 small nuclear ribonucleoprotein particles (snRNPs), and a large number of accessory protein factors (for reviews, see references 21, 22, 37, 44, and 48). 
It is increasingly clear that the protein factors are important for pre-mRNA splicing and that studies of these factors are essential for further understanding of the molecular mechanisms of pre-mRNA splicing. Most mammalian splicing factors have been identified by biochemical fractionation and purification (3, 15, 19, 31–36, 45, 69–71, 73), by using antibodies recognizing splicing factors (8, 9, 16, 17, 61, 66, 67, 74), and by sequence homology (25, 52, 74). Splicing factors containing arginine-serine-rich (RS) domains have emerged as important players in pre-mRNA splicing. These include members of the SR family, both subunits of U2 auxiliary factor (U2AF), and the U1 snRNP protein U1-70K (for reviews, see references 18, 41, and 59). Drosophila alternative splicing regulators transformer (Tra), transformer 2 (Tra2), and suppressor of white apricot (SWAP) also contain RS domains (20, 40, 42). RS domains in these proteins play important roles in pre-mRNA splicing (7, 71, 75), in nuclear localization of these splicing proteins (23, 40), and in protein-RNA interactions (56, 60, 64). Previous studies by us and others have demonstrated that one mechanism whereby SR proteins function in splicing is to mediate specific protein-protein interactions among spliceosomal components and between general splicing factors and alternative splicing regulators (1, 1a, 6, 10, 27, 63, 74, 77). Such protein-protein interactions may play critical roles in splice site recognition and association (for reviews, see references 4, 18, 37, 41, 47, and 59). Specific interactions among the splicing factors also suggest that it is possible to identify new splicing factors by their interactions with known splicing factors. Here we report identification of a new splicing factor, Sip1, by its interaction with the essential splicing factor SC35. The predicted Sip1 protein sequence contains an RS domain and a region with sequence similarity to the Drosophila splicing regulator, SWAP. 
We have expressed and purified recombinant Sip1 protein and raised polyclonal antibodies against the recombinant Sip1 protein. The anti-Sip1 antibodies specifically recognize a protein migrating at a molecular mass of approximately 210 kDa in HeLa nuclear extract. The anti-Sip1 antibodies efficiently deplete Sip1 protein from the nuclear extract, and the Sip1-depleted extract is inactive in pre-mRNA splicing. Addition of recombinant Sip1 protein can partially restore splicing activity to the Sip1-depleted nuclear extract, indicating an essential role of Sip1 in pre-mRNA splicing. Other RS domain-containing proteins, including SC35, ASF/SF2, and U2AF65, cannot substitute for Sip1 in reconstituting splicing activity of the Sip1-depleted nuclear extract. However, addition of U2AF65 further increases splicing activity of Sip1-reconstituted nuclear extract, suggesting that there may be a functional interaction between Sip1 and U2AF65 in nuclear extract.  相似文献

16.
A variety of high-throughput methods have made it possible to generate detailed temporal expression data for a single gene or large numbers of genes. Common methods for analysis of these large data sets can be problematic. One challenge is the comparison of temporal expression data obtained from different growth conditions where the patterns of expression may be shifted in time. We propose the use of wavelet analysis to transform the data obtained under different growth conditions to permit comparison of expression patterns from experiments that have time shifts or delays. We demonstrate this approach using detailed temporal data for a single bacterial gene obtained under 72 different growth conditions. This general strategy can be applied in the analysis of data sets of thousands of genes under different conditions.  相似文献

17.
Hybrid quadrupole time-of-flight (QTOF) mass spectrometry is one of the two major principles used in proteomics. Although based on simple fundamentals, it has over the last decades greatly evolved in terms of achievable resolution, mass accuracy, and dynamic range. The Bruker impact platform of QTOF instruments takes advantage of these developments, and here we develop and evaluate the impact II for shotgun proteomics applications. Adaptation of our heated liquid chromatography system achieved very narrow peptide elution peaks. The impact II is equipped with a new collision cell with both axial and radial ion ejection, more than doubling ion extraction at high tandem MS frequencies. The new reflectron and detector improve resolving power compared with the previous model by up to 80%, i.e. to 40,000 at m/z 1222. We analyzed the ion current from the inlet capillary and found very high transmission (>80%) up to the collision cell. Simulation and measurement indicated 60% transfer into the flight tube. We adapted MaxQuant for QTOF data, improving absolute average mass deviations to better than 1.45 ppm. More than 4800 proteins can be identified in a single run of HeLa digest with a 90 min gradient. The workflow achieved high technical reproducibility (R2 > 0.99) and accurate fold-change determination in spike-in experiments in complex mixtures. Using label-free quantification we rapidly quantified haploid against diploid yeast and characterized overall proteome differences in mouse cell lines originating from different tissues. 
Finally, after high-pH reversed-phase fractionation we identified 9515 proteins in a triplicate measurement of HeLa peptide mixture and 11,257 proteins in single measurements of cerebellum, the highest proteome coverage reported with a QTOF instrument so far. Building on the fundamental advance of the soft ionization techniques electrospray ionization and matrix-assisted laser desorption/ionization (1, 2), MS-based proteomics has advanced tremendously over the last two decades (3–6). Bottom-up, shotgun proteomics is usually performed in a liquid chromatography-tandem MS (LC-MS/MS)1 format, where nanoscale liquid chromatography is coupled through electrospray ionization to an instrument capable of measuring a mass spectrum and fragmenting the recognized precursor peaks on the chromatographic time scale. Fundamental challenges of shotgun proteomics include the very large numbers of peptides that elute over relatively short periods and peptide abundances that vary by many orders of magnitude. Developments in mass spectrometers toward higher sensitivity, sequencing speed, and resolution were needed and helped to address these critical challenges (7, 8). Especially the introduction of the Orbitrap mass analyzers has advanced the state of the art of the field because of their very high resolution and mass accuracy (9, 10). A popular configuration couples a quadrupole mass filter for precursor selection to the Orbitrap analyzer in a compact benchtop format (11–13). In addition to the improvements in MS instrumentation, there have been key advances in the entire proteomics workflow, from sample preparation through improved LC systems to computational proteomics (14–16). Together, such advances are making shotgun proteomics increasingly comprehensive, and deep analyses can now be performed in a reasonable time (13, 17–19). 
Nevertheless, complete analysis of all expressed proteins in a complex system remains extremely challenging, and complete measurement of all the peptides produced in shotgun proteomics may not even be possible in principle (20, 21). Therefore, an urgent need for continued improvements in proteomics technology remains. Besides the Orbitrap analyzer and other ion trap technologies, the main alternative MS technology is time-of-flight (TOF), a technology that has been used for many decades in diverse fields. The configuration employed in proteomics laboratories combines a quadrupole mass filter, via a collision cell and orthogonal acceleration unit, with a reflectron and a multichannel plate (MCP) detector (22). TOF scans are generated in much less than a millisecond (ms), and a number of these "pulses" are added to obtain an MS or MS/MS spectrum with the desired signal-to-noise ratio. Our own laboratory used such a quadrupole time-of-flight (QTOF) instrument as the main workhorse in proteomics for many years, but then switched to high-resolution trapping instruments because of their superior resolution and mass accuracy. However, TOF technology has fundamental attractions, such as the extremely high scan speed and the absence of space charge, which limits the number of usable ions in all trapping instruments. In principle, the high spectral rate makes TOF instruments capable of making use of the majority of ions, thus promising optimal sensitivity, dynamic range, and hence quantification. It also means that TOF can naturally be interfaced with ion mobility devices, which typically separate ions on the ms time scale. Data-independent acquisition strategies such as MSE, in which all precursors are fragmented simultaneously (23, 24), or SWATH, in which the precursor ion window is rapidly cycled through the entire mass range (25), also make use of the high scanning speed offered by QTOF instruments. 
It appears that QTOFs are set to make a comeback in proteomics, with recent examples showing impressive depth of coverage of complex proteomes. For instance, using a variant of the MSE method, identification of 5468 proteins was reported in HeLa cells in single shots and small sample amounts (26). In another report, employing ion mobility for better transmission of fragment ions to the detector led to the identification of up to 7548 proteins in human ovary tissue (27). In this paper, we describe the impact II™, a benchtop QTOF instrument from Bruker Daltonics, and its use in shotgun proteomics. This QTOF instrument is a member of an instrument family first introduced in 2008, which consists of the compact, the impact, and the maXis. The original impact was introduced in 2011 and was followed by the impact HD, which was equipped with a better digitizer, expanding the dynamic range of the detector. With the impact II, which became commercially available in 2014, we aimed to achieve a resolution and sequencing speed adequate for demanding shotgun proteomics experiments. To achieve this we developed an improved collision cell, orthogonal accelerator scheme, reflectron, and detector. Here we measure the ion transmission characteristics of this instrument and the resolution and mass accuracy actually realized in typical proteomics experiments. Furthermore, we investigated the attainable proteome coverage in single-shot analysis and asked whether QTOF performance is now sufficient for very deep characterization of complex cell line and tissue proteomes.  相似文献
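The resolution and mass-accuracy figures quoted above follow from two simple relations: ppm error is the relative m/z deviation scaled by 10^6, and resolving power R = m/Δm fixes the expected peak width at half maximum. A minimal sketch with illustrative numbers (not measured data from the instrument):

```python
def ppm_error(measured_mz, theoretical_mz):
    """Relative mass deviation in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6

def fwhm_from_resolution(mz, resolving_power):
    """Expected peak full width at half maximum, from R = m / delta_m."""
    return mz / resolving_power

# A 1 mmu deviation at m/z 1222 is under 1 ppm, comfortably within the
# ~1.45 ppm average absolute deviation quoted above.
print(round(ppm_error(1222.0010, 1222.0000), 3))
# At the reported resolving power of 40,000, a peak at m/z 1222 is only
# about 0.03 Th wide at half maximum.
print(round(fwhm_from_resolution(1222.0, 40000), 5))
```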

18.
A decoding algorithm is tested that mechanistically models the progressive alignments that arise as the mRNA moves past the rRNA tail during translation elongation. Each of these alignments provides an opportunity for hybridization between the single-stranded, 3′-terminal nucleotides of the 16S rRNA and the spatially accessible window of mRNA sequence, from which a free energy value can be calculated. Using this algorithm we show that a periodic, energetic pattern of frequency 1/3 is revealed. This periodic signal exists in the majority of coding regions of eubacterial genes, but not in the non-coding regions encoding the 16S and 23S rRNAs. Signal analysis reveals that the population of coding regions of each bacterial species has a mean phase that is correlated in a statistically significant way with species (G+C) content. These results suggest that the periodic signal could function as a synchronization signal for the maintenance of reading frame and that codon usage provides a mechanism for manipulation of signal phase.  相似文献
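A periodicity of frequency 1/3 like the one described here can be detected with a discrete Fourier transform of a per-position signal. The sketch below uses a synthetic codon-periodic series, not the paper's hybridization free-energy values:

```python
import numpy as np

def dominant_frequency(signal):
    """Return the nonzero frequency (cycles per position) with the
    largest DFT magnitude."""
    centered = signal - np.mean(signal)
    spectrum = np.abs(np.fft.rfft(centered))
    freqs = np.fft.rfftfreq(len(signal))
    peak = int(np.argmax(spectrum[1:])) + 1  # skip the DC component
    return freqs[peak]

# Synthetic free-energy-like series with codon (period-3) structure plus noise.
rng = np.random.default_rng(0)
n = 300
signal = np.tile([-1.0, 0.2, 0.5], n // 3) + rng.normal(0, 0.1, n)
print(round(float(dominant_frequency(signal)), 4))
```

For a coding region the dominant nonzero frequency lands on 1/3; the complex phase of that same Fourier component is the per-gene quantity the abstract correlates with (G+C) content.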
