Similar Articles
1.
Based on the conventional data-dependent acquisition strategy of shotgun proteomics, we present a new workflow, DeMix, which significantly increases the efficiency of peptide identification for in-depth shotgun analysis of complex proteomes. Capitalizing on the high resolution and mass accuracy of Orbitrap-based tandem mass spectrometry, we developed a simple deconvolution method of “cloning” chimeric tandem spectra for cofragmented peptides. In addition to the database search, a simple rescoring scheme utilizes mass accuracy and converts the unwanted cofragmentation events into a surprising advantage of multiplexing. With the combination of cloning and rescoring, we obtained on average nine peptide-spectrum matches per second on a Q-Exactive workbench, whereas the actual MS/MS acquisition rate was close to seven spectra per second. This efficiency boost to 1.24 identified peptides per MS/MS spectrum enabled analysis of over 5000 human proteins in single-dimensional LC-MS/MS shotgun experiments with only a two-hour gradient. These findings suggest a change in the dominant “one MS/MS spectrum–one peptide” paradigm for data acquisition and analysis in shotgun data-dependent proteomics. DeMix also demonstrated higher robustness than conventional approaches, with lower variation among the results of consecutive LC-MS/MS runs.

Shotgun proteomics analysis based on a combination of high performance liquid chromatography and tandem mass spectrometry (MS/MS) (1) has achieved remarkable speed and efficiency (2–7). In a single four-hour high performance liquid chromatography-MS/MS run, over 40,000 peptides and 5000 proteins can be identified using a high-resolution Orbitrap mass spectrometer with data-dependent acquisition (DDA) (2, 3). However, in a typical LC-MS analysis of unfractionated human cell lysate, over 100,000 individual peptide isotopic patterns can be detected (4), which corresponds to the simultaneous elution of hundreds of peptides. With this complexity, a mass spectrometer needs to achieve a ≥25 Hz MS/MS acquisition rate to fully sample all the detectable peptides, and ≥17 Hz to cover the reasonably abundant ones (4). Although this acquisition rate is reachable by modern time-of-flight (TOF) instruments, the reported DDA identification results do not encompass all expected peptides. Recently, the next-generation Orbitrap instrument, working at a 20 Hz MS/MS acquisition rate, demonstrated nearly full profiling of the yeast proteome using an 80 min gradient, which opened the way for comprehensive analysis of the human proteome in a time-efficient manner (5).

During high performance liquid chromatography-MS/MS DDA analysis of complex samples, the high density of co-eluting peptides results in a high probability that two or more peptides overlap within an MS/MS isolation window. With the commonly used ±1.0–2.0 Th isolation windows, most MS/MS spectra are chimeric (4, 8–10), with cofragmenting precursors being naturally multiplexed. However, as has been discussed previously (9, 10), cofragmentation events are currently ignored in most conventional analysis workflows. According to the prevailing assumption of “one MS/MS spectrum–one peptide,” chimeric MS/MS spectra are generally unwelcome in DDA, because the product ions from different precursors may interfere with the assignment of MS/MS fragment identities, increasing the rate of false discoveries in the database search (8, 9). 
In some studies, the precursor isolation width was set as narrow as ±0.35 Th to prevent unwanted ions from being coselected, fragmented or detected (4, 5).On the contrary, multiplexing by cofragmentation is considered to be one of the solid advantages in data-independent acquisition (DIA) (1013). In several commonly used DIA methods, the precursor ion selection windows are set much wider than in DDA: from 25 Th as in SWATH (12), to extremely broad range as in AIF (13). In order to use the benefit of MS/MS multiplexing in DDA, several approaches have been proposed to deconvolute chimeric MS/MS spectra. In “alternative peptide identification” method implemented in Percolator (14), a machine learning algorithm reranks and rescores peptide-spectrum matches (PSMs) obtained from one or more MS/MS search engines. But the deconvolution in Percolator is limited to cofragmented peptides with masses differing from the target peptide by the tolerance of the database search, which can be as narrow as a few ppm. The “active demultiplexing” method proposed by Ledvina et al. (15) actively separates MS/MS data from several precursors using masses of complementary fragments. However, higher-energy collisional dissociation often produces MS/MS spectra with too few complementary pairs for reliable peptide identification. The “MixDB” method introduces a sophisticated new search engine, also with a machine learning algorithm (9). And the “second peptide identification” method implemented in Andromeda/MaxQuant workflow (16) submits the same dataset to the search engine several times based on the list of chromatographic peptide features, subtracting assigned MS/MS peaks after each identification round. This approach is similar to the ProbIDTree search engine that also performed iterative identification while removing assigned peaks after each round of identification (17).One important factor for spectral deconvolution that has not been fully utilized in most conventional workflows is the excellent mass accuracy achievable with modern high-resolution mass spectrometry (18). An Orbitrap Fourier-transform mass spectrometer can provide mass accuracy in the range of hundreds of ppb (parts per billion) for mass peaks with high signal-to-noise (S/N) ratio (19). However, the mass error of peaks with lower S/N ratios can be significantly higher and exceed 1 ppm. Despite this dependence of the mass accuracy from the S/N level, most MS and MS/MS search engines only allow users to set hard cut-off values for the mass error tolerances. Moreover, some search engines do not provide the option of choosing a relative error tolerance for MS/MS fragments. Such negligent treatment of mass accuracy reduces the analytical power of high accuracy experiments (18).Identification results coming from different MS/MS search engines are sometimes not consistent because of different statistical assumptions used in scoring PSMs. Introduction of tools integrating the results of different search engines (14, 20, 21) makes the data interpretation even more complex and opaque for the user. The opposite trend—simplification of MS/MS data interpretation—is therefore a welcome development. For example, an extremely straightforward algorithm recently proposed by Wenger et al. 
(22) demonstrated a surprisingly high performance in peptide identification, even though it is only marginally more complex than simply counting the number of matches to theoretical fragment peaks in high-resolution MS/MS, without any a priori statistical assumptions.

In order to take advantage of the natural multiplexing of MS/MS spectra in DDA, as well as to properly utilize the high mass accuracy of Orbitrap-based mass spectrometry, we developed a simple and robust data analysis workflow, DeMix. It is presented in Fig. 1 as an expansion of the conventional workflow. Principles of some of the processes used by the workflow are borrowed from other approaches, including custom-made mass peak centroiding (20), chromatographic feature detection (19, 20), and a two-pass database search in which a first, limited pass provides a “software lock mass” for mass scale recalibration (23).

Fig. 1. An overview of the DeMix workflow that expands the conventional workflow, shown by the dashed line. Processes are colored in purple for TOPP, red for the search engine (Morpheus/Mascot/MS-GF+), and blue for in-house programs.

In the DeMix workflow, the deconvolution of chimeric MS/MS spectra consists of simply “cloning” an MS/MS spectrum whenever a potential cofragmented peptide is detected. The list of candidate peptide precursors is generated from chromatographic feature detection, as in the MaxQuant/Andromeda workflow (16, 19), but using The OpenMS Proteomics Pipeline (TOPP) (20, 24). During cloning, the precursor is replaced by the new candidate, but no changes are made to the MS/MS fragment list, so the cloned MS/MS spectra remain chimeric. Processing such spectra requires a search engine tolerant to the presence of unassigned peaks, as such peaks are always expected when multiple precursors cofragment. We therefore chose Morpheus (22) as the search engine. Based on the original search algorithm, we implemented a reworked scoring scheme, Morpheus-AS (advanced scoring), which inherits all the basic principles of Morpheus but makes deeper use of the high mass accuracy of the data. This kind of database search removes the need for spectral processing that physically separates MS/MS data into multiple subspectra (15) or consecutively subtracts assigned peaks (16, 17).

Although the DeMix workflow is largely a combination of known approaches, it provides a remarkable improvement over the state of the art. On our Orbitrap Q-Exactive workbench, testing on a benchmark dataset of two-hour single-dimension LC-MS/MS experiments from HeLa cell lysate, we identified on average 1.24 peptides per MS/MS spectrum, breaking the “one MS/MS spectrum–one peptide” paradigm at the level of the whole data set. At 1% false discovery rate (FDR), we obtained on average nine PSMs per second (at an actual acquisition rate of ca. seven MS/MS spectra per second) and detected 40 human proteins per minute.
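The cloning step lends itself to a compact illustration. Below is a minimal sketch assuming spectra are plain immutable records and that co-isolated candidate precursors come from upstream chromatographic feature detection; the `MsmsSpectrum` and `clone_chimeric` names, the isolation half-width, and the peak representation are illustrative assumptions, not the actual DeMix/TOPP data structures.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

@dataclass(frozen=True)
class MsmsSpectrum:
    precursor_mz: float
    precursor_charge: int
    peaks: Tuple[Tuple[float, float], ...]  # (m/z, intensity); never modified by cloning

def clone_chimeric(spectrum: MsmsSpectrum,
                   candidate_precursors: List[Tuple[float, int]],
                   isolation_half_width: float = 1.0) -> List[MsmsSpectrum]:
    """Return the original spectrum plus one clone per co-isolated candidate
    precursor (from chromatographic feature detection) that falls inside the
    isolation window but is not the originally selected precursor."""
    clones = [spectrum]
    for mz, charge in candidate_precursors:
        inside_window = abs(mz - spectrum.precursor_mz) <= isolation_half_width
        same_as_selected = (charge == spectrum.precursor_charge
                            and abs(mz - spectrum.precursor_mz) < 0.01)
        if inside_window and not same_as_selected:
            # Only the precursor is reassigned; the fragment list stays chimeric,
            # so the downstream search engine must tolerate unassigned peaks.
            clones.append(replace(spectrum, precursor_mz=mz, precursor_charge=charge))
    return clones
```

Each clone then passes through the same database search as a normal spectrum, which is why a search engine tolerant of unassigned peaks (here, Morpheus) is required.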

2.
Isobaric labeling techniques coupled with high-resolution mass spectrometry have been widely employed in proteomic workflows requiring relative quantification. For each high-resolution tandem mass spectrum (MS/MS), isobaric labeling techniques can be used not only to quantify the peptide from different samples by reporter ions, but also to identify the peptide it is derived from. Because the ions related to isobaric labeling may act as noise in database searching, the MS/MS spectrum should be preprocessed before peptide or protein identification. In this article, we demonstrate that there are a lot of high-frequency, high-abundance isobaric related ions in the MS/MS spectrum, and removing isobaric related ions combined with deisotoping and deconvolution in MS/MS preprocessing procedures significantly improves the peptide/protein identification sensitivity. The user-friendly software package TurboRaw2MGF (v2.0) has been implemented for converting raw TIC data files to mascot generic format files and can be downloaded for free from https://github.com/shengqh/RCPA.Tools/releases as part of the software suite ProteomicsTools. The data have been deposited to the ProteomeXchange with identifier PXD000994.Mass spectrometry-based proteomics has been widely applied to investigate protein mixtures derived from tissue, cell lysates, or from body fluids (1, 2). Liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS)1 is the most popular strategy for protein/peptide mixtures analysis in shotgun proteomics (3). Large-scale protein/peptide mixtures are separated by liquid chromatography followed by online detection by tandem mass spectrometry. The capabilities of proteomics rely greatly on the performance of the mass spectrometer. With the improvement of MS technology, proteomics has benefited significantly from the high-resolution and excellent mass accuracy (4). In recent years, based on the higher efficiency of higher energy collision dissociation (HCD), a new “high–high” strategy (high-resolution MS as well as MS/MS(tandem MS)) has been applied instead of the “high–low” strategy (high-resolution MS, i.e. in Orbitrap, and low-resolution MS/MS, i.e. in ion trap) to obtain high quality tandem MS/MS data as well as full MS in shotgun proteomics. Both full MS scans and MS/MS scans can be performed, and the whole cycle time of MS detection is very compatible with the chromatographic time scale (5).High-resolution measurement is one of the most important features in mass spectrometric application. In this high–high strategy, high-resolution and accurate spectra will be achieved in tandem MS/MS scans as well as full MS scans, which makes isotopic peaks distinguishable from one another, thus enabling the easy calculation of precise charge states and monoisotopic mass. During an LC-MS/MS experiment, a multiply charged precursor ion (peptide) is usually isolated and fragmented, and then the multiple charge states of the fragment ions are generated and collected. After full extraction of peak lists from original tandem mass spectra, the commonly used search engines (i.e. Mascot (6), Sequest (7)) have no capability to distinguish isotopic peaks and recognize charge states, so all of the product ions are considered as all charge state hypotheses during the database search for protein identification. These multiple charge states of fragment ions and their isotopic cluster peaks can be incorrectly assigned by the search engine, which can cause false peptide identification. 
To overcome this issue, data preprocessing of the high-resolution MS/MS spectra is required before submitting them for identification. There are usually two major preprocessing steps used for high-resolution MS/MS data: deisotoping and deconvolution (8, 9). Deisotoping removes all isotopic peaks of an isotopic cluster except the monoisotopic peak. Deconvolution translates multiply charged ions to singly charged ions and also accumulates the intensity of each fragment ion by summing the intensities of its multiply charged states. After these two preprocessing steps, the resulting spectra are simpler and cleaner and allow more precise database searching and accurate bioinformatics analysis.

With the capacity to analyze multiple samples simultaneously, stable isotope labeling approaches have been widely used in quantitative proteomics. Stable isotope labeling approaches are categorized as metabolic labeling (SILAC, stable isotope labeling by amino acids in cell culture) and chemical labeling (10, 11). The peptides labeled by the SILAC approach are quantified by precursor ions in full MS spectra, whereas peptides that have been isobarically labeled using chemical means are quantified by reporter ions in MS/MS spectra. There are two similar isobaric chemical labeling methods: (1) isobaric tag for relative and absolute quantification (iTRAQ), and (2) tandem mass tag (TMT) (12, 13). These reagents contain an amino-reactive group that specifically reacts with N-terminal amino groups and epsilon-amino groups of lysine residues to label digested peptides in a typical shotgun proteomics experiment. There are four different multiplexing formats of isobaric tags: TMT two-plex, iTRAQ four-plex, TMT six-plex, and iTRAQ eight-plex (12–16). The number before “plex” denotes the number of samples that can be analyzed simultaneously by mass spectrometry. Peptides labeled with different isotopic variants of the tag show identical or similar masses and appear as a single peak in full scans. This single peak may be selected for subsequent MS/MS analysis. In an MS/MS scan, the masses of the reporter ions (114 to 117 for iTRAQ four-plex, 113 to 121 for iTRAQ eight-plex, and 126 to 131 for TMT six-plex upon CID or HCD activation) are associated with the corresponding samples, and their intensities represent the relative abundances of the labeled peptides. Meanwhile, the other ions in the MS/MS spectra can be used for peptide identification. Because of this multiplexing capability, isobaric labeling methods combined with bottom-up proteomics have been widely applied for accurate quantification of proteins on a global scale (14, 17–19). Although mostly associated with peptide labeling, these isobaric labeling methods have also been applied at the protein level (20–23).

For the proteomic analysis of isobarically labeled peptides/proteins in the “high–high” MS strategy, the common consensus is that accurate reporter ions contribute to more accurate quantification. However, there is no evidence to show how the ions related to isobaric labeling affect peptide/protein identification and which preprocessing steps should be taken for high-resolution isobarically labeled MS/MS spectra. To demonstrate the effectiveness and importance of preprocessing, we examined how combinations of preprocessing steps improved peptide/protein identification sensitivity in database searching. 
Several combinations of data-preprocessing steps were applied for high-throughput data analysis, including deisotoping to retain only monoisotopic mass peaks, deconvolution of ions with multiple charge states, and preservation of the top 10 peaks in every 100 Dalton mass range. After systematic analysis of high-resolution isobarically labeled spectra, we further processed the spectra and removed interfering ions that were not related to the peptide. Our results suggested that the preprocessing of isobarically labeled high-resolution tandem mass spectra significantly improved the peptide/protein identification sensitivity.
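Two of the preprocessing steps described above, charge deconvolution to singly protonated equivalents and the top-10-peaks-per-100-Da filter, together with removal of an assumed reporter-ion m/z region, can be sketched as follows; the function names, the 112.5–131.5 m/z bounds, and the list-of-tuples peak representation are illustrative assumptions rather than the TurboRaw2MGF implementation.

```python
from collections import defaultdict

PROTON = 1.007276  # proton mass in Da

def to_singly_charged(mz: float, charge: int) -> float:
    """Charge deconvolution: convert an ion observed at m/z with charge z
    to the m/z of its singly protonated equivalent."""
    return mz * charge - (charge - 1) * PROTON

def remove_reporter_region(peaks, low=112.5, high=131.5):
    """Drop peaks in the assumed iTRAQ/TMT reporter-ion m/z region so they do
    not act as noise in the database search; quantification reads the reporter
    intensities from the raw spectrum before this step."""
    return [(mz, inten) for mz, inten in peaks if not (low <= mz <= high)]

def top_n_per_window(peaks, n=10, window=100.0):
    """Keep the n most intense peaks in every `window`-Da slice of m/z."""
    bins = defaultdict(list)
    for mz, inten in peaks:
        bins[int(mz // window)].append((mz, inten))
    kept = []
    for bucket in bins.values():
        bucket.sort(key=lambda p: p[1], reverse=True)  # most intense first
        kept.extend(bucket[:n])
    return sorted(kept)  # back in m/z order
```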

3.
Database search programs are essential tools for identifying peptides via mass spectrometry (MS) in shotgun proteomics. Simultaneously achieving high sensitivity and high specificity during a database search is crucial for improving proteome coverage. Here we present JUMP, a new hybrid database search program that generates amino acid tags and ranks peptide spectrum matches (PSMs) by an integrated score from the tags and pattern matching. In a typical run of liquid chromatography coupled with high-resolution tandem MS, more than 95% of MS/MS spectra can generate at least one tag, whereas the remaining spectra are usually too poor to derive genuine PSMs. To enhance search sensitivity, the JUMP program enables the use of tags as short as one amino acid. Using a target-decoy strategy, we compared JUMP with other programs (e.g. SEQUEST, Mascot, PEAKS DB, and InsPecT) in the analysis of multiple datasets and found that JUMP outperformed these preexisting programs. JUMP also permitted the analysis of multiple co-fragmented peptides from “mixture spectra” to further increase PSMs. In addition, JUMP-derived tags allowed partial de novo sequencing and facilitated the unambiguous assignment of modified residues. In summary, JUMP is an effective database search algorithm complementary to current search programs.Peptide identification by tandem mass spectra is a critical step in mass spectrometry (MS)-based1 proteomics (1). Numerous computational algorithms and software tools have been developed for this purpose (26). These algorithms can be classified into three categories: (i) pattern-based database search, (ii) de novo sequencing, and (iii) hybrid search that combines database search and de novo sequencing. With the continuous development of high-performance liquid chromatography and high-resolution mass spectrometers, it is now possible to analyze almost all protein components in mammalian cells (7). In contrast to rapid data collection, it remains a challenge to extract accurate information from the raw data to identify peptides with low false positive rates (specificity) and minimal false negatives (sensitivity) (8).Database search methods usually assign peptide sequences by comparing MS/MS spectra to theoretical peptide spectra predicted from a protein database, as exemplified in SEQUEST (9), Mascot (10), OMSSA (11), X!Tandem (12), Spectrum Mill (13), ProteinProspector (14), MyriMatch (15), Crux (16), MS-GFDB (17), Andromeda (18), BaMS2 (19), and Morpheus (20). Some other programs, such as SpectraST (21) and Pepitome (22), utilize a spectral library composed of experimentally identified and validated MS/MS spectra. These methods use a variety of scoring algorithms to rank potential peptide spectrum matches (PSMs) and select the top hit as a putative PSM. However, not all PSMs are correctly assigned. For example, false peptides may be assigned to MS/MS spectra with numerous noisy peaks and poor fragmentation patterns. If the samples contain unknown protein modifications, mutations, and contaminants, the related MS/MS spectra also result in false positives, as their corresponding peptides are not in the database. Other false positives may be generated simply by random matches. Therefore, it is of importance to remove these false PSMs to improve dataset quality. 
One common approach is to filter putative PSMs to achieve a final list with a predefined false discovery rate (FDR) via a target-decoy strategy, in which decoy proteins are merged with target proteins in the same database for estimating false PSMs (23–26). However, true and false PSMs are not always distinguishable based on matching scores. Setting an appropriate score threshold that achieves both maximal sensitivity and high specificity therefore remains a problem (13, 27, 28).

De novo methods, including Lutefisk (29), PEAKS (30), NovoHMM (31), PepNovo (32), pNovo (33), Vonovo (34), and UniNovo (35), identify peptide sequences directly from MS/MS spectra. These methods can be used to derive novel peptides and post-translational modifications without a database, which is especially useful when the related genome has not been sequenced. High-resolution MS/MS spectra greatly facilitate the generation of peptide sequences in these de novo methods. However, because MS/MS fragmentation cannot always produce all predicted product ions, only a portion of the collected MS/MS spectra have sufficient quality to extract partial or full peptide sequences, leading to lower sensitivity than that achieved with database search methods.

To improve the sensitivity of the de novo methods, a hybrid approach has been proposed to integrate peptide sequence tags into PSM scoring during database searches (36). Numerous software packages have been developed, such as GutenTag (37), InsPecT (38), Byonic (39), DirecTag (40), and PEAKS DB (41). These methods use peptide tag sequences to filter a protein database, followed by error-tolerant database searching. One restriction in most of these algorithms is the requirement of a minimum tag length of three amino acids for matching protein sequences in the database. This restriction reduces the sensitivity of the database search, because it filters out some high-quality spectra in which consecutive tags cannot be generated.

In this paper, we describe JUMP, a novel tag-based hybrid algorithm for peptide identification. The program is optimized to balance sensitivity and specificity during tag derivation and MS/MS pattern matching. JUMP can use all potential sequence tags, including tags consisting of only one amino acid. When we compared its performance to that of two widely used search algorithms, SEQUEST and Mascot, JUMP identified ∼30% more PSMs at the same FDR threshold. The program also provides two additional features: (i) using tag sequences to improve modification site assignment, and (ii) analyzing co-fragmented peptides from mixture MS/MS spectra.
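The core of tag generation, including tags as short as one amino acid, can be sketched as a search for peak pairs whose m/z difference matches a residue mass; the tolerance handling, the residue table (Leu/Ile collapsed, no fixed modifications), and the function name are illustrative simplifications, not the JUMP scoring code.

```python
# Monoisotopic residue masses in Da; "L" stands for Leu/Ile, which are isobaric.
RESIDUE_MASSES = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259,
    "M": 131.04049, "H": 137.05891, "F": 147.06841, "R": 156.10111,
    "Y": 163.06333, "W": 186.07931,
}

def one_residue_tags(peak_mzs, tol_ppm=10.0):
    """Return (lower_mz, upper_mz, residue) triples for every pair of fragment
    peaks whose m/z gap matches a residue mass within tol_ppm; chaining such
    edges yields longer tags."""
    peaks = sorted(peak_mzs)
    tags = []
    for i, lo in enumerate(peaks):
        for hi in peaks[i + 1:]:
            gap = hi - lo
            if gap > 190:  # heavier than any residue, no need to look further
                break
            for aa, mass in RESIDUE_MASSES.items():
                if abs(gap - mass) <= mass * tol_ppm * 1e-6:
                    tags.append((lo, hi, aa))
    return tags
```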

4.
Quantitative analysis of discovery-based proteomic workflows now relies on high-throughput large-scale methods for identification and quantitation of proteins and post-translational modifications. Advancements in label-free quantitative techniques, using either data-dependent or data-independent mass spectrometric acquisitions, have coincided with improved instrumentation featuring greater precision, increased mass accuracy, and faster scan speeds. We recently reported on a new quantitative method called MS1 Filtering (Schilling et al. (2012) Mol. Cell. Proteomics 11, 202–214) for processing data-independent MS1 ion intensity chromatograms from peptide analytes using the Skyline software platform. In contrast, data-independent acquisitions from MS2 scans, or SWATH, can quantify all fragment ion intensities when reference spectra are available. As each SWATH acquisition cycle typically contains an MS1 scan, these two independent label-free quantitative approaches can be acquired in a single experiment. Here, we have expanded the capability of Skyline to extract both MS1 and MS2 ion intensity chromatograms from a single SWATH data-independent acquisition in an Integrated Dual Scan Analysis approach. The performance of both MS1 and MS2 data was examined in simple and complex samples using standard concentration curves. Cases of interferences in MS1 and MS2 ion intensity data were assessed, as were the differentiation and quantitation of phosphopeptide isomers in MS2 scan data. In addition, we demonstrated an approach for optimization of SWATH m/z window sizes to reduce interferences using MS1 scans as a guide. Finally, a correlation analysis was performed on both MS1 and MS2 ion intensity data obtained from SWATH acquisitions on a complex mixture using a linear model that automatically removes signals containing interferences. This work demonstrates the practical advantages of properly acquiring and processing MS1 precursor data in addition to MS2 fragment ion intensity data in a data-independent acquisition (SWATH), and provides an approach to simultaneously obtain independent measurements of relative peptide abundance from a single experiment.Mass spectrometry is the leading technology for large-scale identification and quantitation of proteins and post-translational modifications (PTMs)1 in biological systems (1, 2). Although several types of experimental designs are employed in such workflows, most large-scale applications use data-dependent acquisitions (DDA) where peptide precursors are first identified in the MS1 scan and one or more peaks are then selected for subsequent fragmentation to generate their corresponding MS2 spectra. In experiments using DDA, one can employ either chemical/metabolic labeling or label-free strategies for relative quantitation of peptides (and proteins) (3, 4). Depending on the type of labeling approach employed, i.e. metabolic labeling with SILAC or postmetabolic labeling with ICAT or isobaric tags such as iTRAQ or TMT, the relative quantitation of these peptides are made using either MS1 or MS2 ion intensity data (47). 
Label-free quantitative techniques have until recently been based entirely on integrated ion intensity measurements of precursors in the MS1 scan, or in the case of spectral counting the number of assigned MS2 spectra (3, 8, 9).Label-free approaches have recently generated more widespread interest (1012), in part because of their adaptability to a wide range of proteomic workflows, including human samples that are not amenable to most metabolic labeling techniques, or where chemical labeling may be cost prohibitive and/or interfere with subsequent enrichment steps (11, 13). However the use of DDA for label-free quantitation is also susceptible to several limitations including insufficient reproducibility because of under-sampling, digestion efficiency, as well as misidentifications (14, 15). Moreover, low ion abundance may prohibit peptide selection, especially in complex samples (14). These limitations often present challenges in data analysis when making comparisons across samples, or when a peptide is sampled in only one of the study conditions.To address the challenges in obtaining more comprehensive sampling in MS1 space, Purvine et al. first demonstrated the ability to obtain sequence information from peptides fragmented across the entire m/z range using “shotgun or parallel collision-induced dissociation (CID)” on an orthogonal time of flight instrument (16). Shortly thereafter Venable et al. reported on a data independent acquisition methodology to limit the complexity of the MS2 scan by using a segmented approach for the sequential isolation and fragmentation of all peptides in a defined precursor window (e.g. 10 m/z) using an ion trap mass spectrometer (17). However, the proper implementation of this DIA technique suffered from technical limitations of instruments available at that time, including slow acquisition rates and low MS2 resolution that made systematic product ion extraction problematic. To alleviate the challenge of long duty cycles in DIAs, researchers at the Waters Corporation adopted an alternative approach by rapidly switching between low (MS1) and high energy (MS2) scans and then using proprietary software to align peptide precursor and fragment ion information to determine peptide sequences (18, 19). Recent mass spectrometry innovations in efficient high-speed scanning capabilities, together with high-resolution data acquisition of both MS1 and MS2 scans, and multiplexing of scan windows have overcome many of these limitations (10, 20, 21). Moreover, the simultaneous development of novel software solutions for extracting ion intensity chromatograms based on spectral libraries has enabled the use of DIA for large-scale label free quantitation of multiple peptide analytes (21, 22). In addition to targeting specific peptides from a previously generated peptide spectral library, the data can also be reexamined (i.e. post-acquisition) for additional peptides of interest as new reference data emerges. On the SCIEX TripleTOF 5600, a quadrupole orthogonal time-of-flight mass spectrometer, this technique has been optimized and extended to what is called ‘SWATH MS2′ based on a combination of new technical and software improvements (10, 22).In a DIA experiment a MS1 survey scan is carried out across the mass range followed by a SWATH MS2 acquisition series, however the cycle time of the MS1 scan is dramatically shortened compared with DDA type experiments. 
The Q1 quadrupole is set to transmit a wider window, typically Δ25 m/z, to the collision cell in incremental steps over the full mass range. Therefore the MS/MS spectra produced during a SWATH MS2 acquisition are of much greater complexity as the MS/MS spectra are a composite of all fragment ions produced from peptide analytes with molecular ions within the selected MS1 m/z window. The cycle of data independent MS1 survey scans and SWATH MS2 scans is repeated throughout the entire LC-MS acquisition. Fragment ion information contained in these SWATH MS2 spectra can be used to uniquely identify specific peptides by comparisons to reference spectra or spectral libraries. Moreover, ion intensities of these fragment ions can also be used for quantitation. Although MS2 typically increases selectivity and reduces the chemical noise often observed in MS1 scans, quantifying peptides from SWATH MS2 scans can be problematic because of the presence of interferences in one or more fragment ions or decreased ion intensity of MS2 scans as compared with the MS1 precursor ion abundance.To partially alleviate some of these limitations in SWATH MS2 scan quantitation it is potentially advantageous to exploit MS1 ion intensity data, which is acquired independently as part of each SWATH scan cycle. Recently, our laboratories and others have developed label free quantitation tools for data dependent acquisitions (11, 12, 23) using MS1 ion intensity data. For example, the MS1 Filtering algorithm uses expanded features in the open source software application Skyline (11, 24). Skyline MS1 Filtering processes precursor ion intensity chromatograms of peptide analytes from full scan mass spectral data acquired during data dependent acquisitions by LC MS/MS. New graphical tools were developed within Skyline to enable visual inspection and manual interrogation and integration of extracted ion chromatograms across multiple acquisitions. MS1 Filtering was subsequently shown to have excellent linear response across several orders of magnitude with limits of detection in the low attomole range (11). We, and others, have demonstrated the utility of this method for carrying out large-scale quantitation of peptide analytes across a range of applications (2528). However, quantifying peptides based on MS1 precursor ion intensities can be compromised by a low signal-to-noise ratio. This is particularly the case when quantifying low abundance peptides in a complex sample where the MS1 ion “background” signal is high, or when chromatograms contain interferences, or partial overlap of multiple target precursor ions.Currently MS1 scans are underutilized or even deemphasized by some vendors during DIA workflows. However, we believe an opportunity exists that would improve data-independent acquisitions (DIA) experiments by including MS1 ion intensity data in the final data processing of LC-MS/MS acquisitions. Therefore, to address this possibility, we have adapted Skyline to efficiently extract and process both precursor and product ion chromatograms for label free quantitation across multiple samples. The graphical tools and features originally developed for SRM and MS1 Filtering experiments have been expanded to process DIA data sets from multiple vendors including SCIEX, Thermo, Waters, Bruker, and Agilent. These expanded features provide a single platform for data mining of targeted proteomics using both the MS1 and MS2 scans that we call Integrated Dual Scan Analysis, or IDSA. 
As a test of this approach, a series of SWATH MS2 acquisitions of simple and complex mixtures was analyzed on an SCIEX TripleTOF 5600 mass spectrometer. We also investigated the use of MS2 scans for differentiating a case of phosphopeptide isomers that are indistinguishable at the MS1 level. In addition, we investigated whether smaller SWATH m/z windows would provide more reliable quantitative data in these cases by reducing the number of potential interferences. Lastly, we performed a statistical assessment of the accuracy and reproducibility of the estimated (log) fold change of mitochondrial lysates from mouse liver at different concentration levels to better assess the overall value of acquiring MS1 and MS2 data in combination and as independent measurements during DIA experiments.
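The dual-scan extraction described above can be illustrated with a minimal sketch, assuming in-memory scan lists and a precomputed mapping from SWATH isolation windows to their MS2 scans; the `xic` and `dual_scan_xics` names, the scan tuple layout, and the 20 ppm tolerance are illustrative assumptions rather than Skyline's implementation.

```python
def xic(scans, target_mz, tol_ppm=20.0):
    """Extracted ion chromatogram: per-scan summed intensity within
    +/- tol_ppm of target_mz. A scan is (retention_time, [(mz, intensity), ...])."""
    tol = target_mz * tol_ppm * 1e-6
    return [(rt, sum(i for mz, i in peaks if abs(mz - target_mz) <= tol))
            for rt, peaks in scans]

def dual_scan_xics(ms1_scans, swath_ms2_scans, precursor_mz, fragment_mzs):
    """swath_ms2_scans maps each isolation window (low_mz, high_mz) to its MS2
    scans; the window that isolated the precursor supplies the fragment traces."""
    ms1_trace = xic(ms1_scans, precursor_mz)
    ms2_traces = {}
    for (low, high), ms2_scans in swath_ms2_scans.items():
        if low <= precursor_mz < high:
            ms2_traces = {f: xic(ms2_scans, f) for f in fragment_mzs}
            break
    return ms1_trace, ms2_traces
```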

5.
The success of high-throughput proteomics hinges on the ability of computational methods to identify peptides from tandem mass spectra (MS/MS). However, a common limitation of most peptide identification approaches is the nearly ubiquitous assumption that each MS/MS spectrum is generated from a single peptide. We propose a new computational approach for the identification of mixture spectra generated from more than one peptide. Capitalizing on the growing availability of large libraries of single-peptide spectra (spectral libraries), our quantitative approach is able to identify up to 98% of all mixture spectra from equally abundant peptides and automatically adjust to varying abundance ratios of up to 10:1. Furthermore, we show how theoretical bounds on spectral similarity avoid the need to compare each experimental spectrum against all possible combinations of candidate peptides (achieving speedups of over five orders of magnitude) and demonstrate that mixture-spectra can be identified in a matter of seconds against proteome-scale spectral libraries. Although our approach was developed for and is demonstrated on peptide spectra, we argue that the generality of the methods allows for their direct application to other types of spectral libraries and mixture spectra.The success of tandem MS (MS/MS1) approaches to peptide identification is partly due to advances in computational techniques allowing for the reliable interpretation of MS/MS spectra. Mainstream computational techniques mainly fall into two categories: database search approaches that score each spectrum against peptides in a sequence database (14) or de novo techniques that directly reconstruct the peptide sequence from each spectrum (58). The combination of these methods with advances in high-throughput MS/MS have promoted the accelerated growth of spectral libraries, collections of peptide MS/MS spectra the identification of which were validated by accepted statistical methods (9, 10) and often also manually confirmed by mass spectrometry experts. The similar concept of spectral archives was also recently proposed to denote spectral libraries including “interesting” nonidentified spectra (11) (i.e. recurring spectra with good de novo reconstructions but no database match). The growing availability of these large collections of MS/MS spectra has reignited the development of alternative peptide identification approaches based on spectral matching (1214) and alignment (1517) algorithms.However, mainstream approaches were developed under the (often unstated) assumption that each MS/MS spectrum is generated from a single peptide. Although chromatographic procedures greatly contribute to making this a reasonable assumption, there are several situations where it is difficult or even impossible to separate pairs of peptides. Examples include certain permutations of the peptide sequence or post-translational modifications (see (18) for examples of co-eluting histone modification variants). 
In addition, innovative experimental setups have demonstrated the potential for increased throughput in peptide identification using mixture spectra; examples include data-independent acquisition (19), ion-mobility MS (20), and MSE strategies (21).

To alleviate the algorithmic bottleneck in such scenarios, we describe a computational approach, M-SPLIT (mixture-spectrum partitioning using a library of identified tandem mass spectra), that is able to reliably and efficiently identify peptides from mixture spectra generated from a pair of peptides. In brief, a mixture spectrum is modeled as a linear combination of two single-peptide spectra, and peptide identification is done by searching against a spectral library. We show that efficient filtration and accurate branch-and-bound strategies can be used to avoid the huge computational cost of searching all possible pairs. Thus equipped, our approach is able to identify the correct matches by considering only a minuscule fraction of all possible matches. Beyond potentially enhancing the identification capabilities of current MS/MS acquisition setups, we argue that the availability of methods to reliably identify MS/MS spectra from mixtures of peptides could enable the collection of MS/MS data using accelerated chromatography setups to obtain the same or better peptide identification results in a fraction of the experimental time currently required for exhaustive peptide separation.
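The underlying model, a mixture spectrum approximated as a linear combination of two library spectra, can be sketched in a few lines, assuming all spectra have been binned onto a common m/z axis; the filtration and branch-and-bound machinery that makes the full library search tractable is omitted, and the names here are illustrative rather than M-SPLIT's code.

```python
import numpy as np

def mixture_fit(m, a, b):
    """Fit m ~ alpha*a + beta*b by least squares (coefficients clipped at zero,
    since intensities cannot be negative) and report the cosine similarity of
    the reconstruction. All three vectors share the same binned m/z axis."""
    basis = np.stack([a, b], axis=1)                 # shape (n_bins, 2)
    coeffs, *_ = np.linalg.lstsq(basis, m, rcond=None)
    coeffs = np.clip(coeffs, 0.0, None)
    recon = basis @ coeffs
    cosine = float(recon @ m) / (np.linalg.norm(recon) * np.linalg.norm(m) + 1e-12)
    return coeffs, cosine
```

In a library search, pairs of spectra whose reconstruction explains the mixture spectrum well (high cosine, small residual) become candidate identifications.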

6.
Top-down mass spectrometry (MS)-based proteomics is arguably a disruptive technology for the comprehensive analysis of all proteoforms arising from genetic variation, alternative splicing, and posttranslational modifications (PTMs). However, the complexity of top-down high-resolution mass spectra presents a significant challenge for data analysis. In contrast to the well-developed software packages available for data analysis in bottom-up proteomics, the data analysis tools in top-down proteomics remain underdeveloped. Moreover, despite recent efforts to develop algorithms and tools for the deconvolution of top-down high-resolution mass spectra and the identification of proteins from complex mixtures, a multifunctional software platform, which allows for the identification, quantitation, and characterization of proteoforms with visual validation, is still lacking. Herein, we have developed MASH Suite Pro, a comprehensive software tool for top-down proteomics with multifaceted functionality. MASH Suite Pro is capable of processing high-resolution MS and tandem MS (MS/MS) data using two deconvolution algorithms to optimize protein identification results. In addition, MASH Suite Pro allows for the characterization of PTMs and sequence variations, as well as the relative quantitation of multiple proteoforms in different experimental conditions. The program also provides visualization components for validation and correction of the computational outputs. Furthermore, MASH Suite Pro facilitates data reporting and presentation via direct output of the graphics. Thus, MASH Suite Pro significantly simplifies and speeds up the interpretation of high-resolution top-down proteomics data by integrating tools for protein identification, quantitation, characterization, and visual validation into a customizable and user-friendly interface. We envision that MASH Suite Pro will play an integral role in advancing the burgeoning field of top-down proteomics.With well-developed algorithms and computational tools for mass spectrometry (MS)1 data analysis, peptide-based bottom-up proteomics has gained considerable popularity in the field of systems biology (19). Nevertheless, the bottom-up approach is suboptimal for the analysis of protein posttranslational modifications (PTMs) and sequence variants as a result of protein digestion (10). Alternatively, the protein-based top-down proteomics approach analyzes intact proteins, which provides a “bird''s eye” view of all proteoforms (11), including those arising from sequence variations, alternative splicing, and diverse PTMs, making it a disruptive technology for the comprehensive analysis of proteoforms (1224). However, the complexity of top-down high-resolution mass spectra presents a significant challenge for data analysis. In contrast to the well-developed software packages available for processing data from bottom-up proteomics experiments, the data analysis tools in top-down proteomics remain underdeveloped.The initial step in the analysis of top-down proteomics data is deconvolution of high-resolution mass and tandem mass spectra. Thorough high-resolution analysis of spectra by horn (THRASH), which was the first algorithm developed for the deconvolution of high-resolution mass spectra (25), is still widely used. THRASH automatically detects and evaluates individual isotopomer envelopes by comparing the experimental isotopomer envelope with a theoretical envelope and reporting those that score higher than a user-defined threshold. 
Another commonly used algorithm, MS-Deconv, utilizes a combinatorial approach to address the difficulty of grouping MS peaks from overlapping isotopomer envelopes (26). Recently, UniDec, which employs a Bayesian approach to separate mass and charge dimensions (27), can also be applied to the deconvolution of high-resolution spectra. Although these algorithms assist in data processing, unfortunately, the deconvolution results often contain a considerable amount of misassigned peaks as a consequence of the complexity of the high-resolution MS and MS/MS data generated in top-down proteomics experiments. Errors such as these can undermine the accuracy of protein identification and PTM localization and, thus, necessitate the implementation of visual components that allow for the validation and manual correction of the computational outputs.Following spectral deconvolution, a typical top-down proteomics workflow incorporates identification, quantitation, and characterization of proteoforms; however, most of the recently developed data analysis tools for top-down proteomics, including ProSightPC (28, 29), Mascot Top Down (also known as Big-Mascot) (30), MS-TopDown (31), and MS-Align+ (32), focus almost exclusively on protein identification. ProSightPC was the first software tool specifically developed for top-down protein identification. This software utilizes “shotgun annotated” databases (33) that include all possible proteoforms containing user-defined modifications. Consequently, ProSightPC is not optimized for identifying PTMs that are not defined by the user(s). Additionally, the inclusion of all possible modified forms within the database dramatically increases the size of the database and, thus, limits the search speed (32). Mascot Top Down (30) is based on standard Mascot but enables database searching using a higher mass limit for the precursor ions (up to 110 kDa), which allows for the identification of intact proteins. Protein identification using Mascot Top Down is fundamentally similar to that used in bottom-up proteomics (34), and, therefore, it is somewhat limited in terms of identifying unexpected PTMs. MS-TopDown (31) employs the spectral alignment algorithm (35), which matches the top-down tandem mass spectra to proteins in the database without prior knowledge of the PTMs. Nevertheless, MS-TopDown lacks statistical evaluation of the search results and performs slowly when searching against large databases. MS-Align+ also utilizes spectral alignment for top-down protein identification (32). It is capable of identifying unexpected PTMs and allows for efficient filtering of candidate proteins when the top-down spectra are searched against a large protein database. MS-Align+ also provides statistical evaluation for the selection of proteoform spectrum match (PrSM) with high confidence. More recently, Top-Down Mass Spectrometry Based Proteoform Identification and Characterization (TopPIC) was developed (http://proteomics.informatics.iupui.edu/software/toppic/index.html). TopPIC is an updated version of MS-Align+ with increased spectral alignment speed and reduced computing requirements. In addition, MSPathFinder, developed by Kim et al., also allows for the rapid identification of proteins from top-down tandem mass spectra (http://omics.pnl.gov/software/mspathfinder) using spectral alignment. 
Although software tools employing spectral alignment, such as MS-Align+ and MSPathFinder, are particularly useful for top-down protein identification, these programs operate from the command line, making them difficult to use for researchers with limited knowledge of command-line syntax.

Recently, new software tools have been developed for proteoform characterization (36, 37). Our group previously developed MASH Suite, a user-friendly interface for the processing, visualization, and validation of high-resolution MS and MS/MS data (36). Another software tool, ProSight Lite, developed recently by the Kelleher group (37), also allows characterization of protein PTMs. However, both of these software tools require prior knowledge of the protein sequence for the effective localization of PTMs. In addition, neither software tool can process data from liquid chromatography (LC)-MS and LC-MS/MS experiments, which limits their usefulness in large-scale top-down proteomics. Thus, despite these recent efforts, a multifunctional software platform enabling identification, quantitation, and characterization of proteins from top-down spectra, as well as visual validation and data correction, is still lacking.

Herein, we report the development of MASH Suite Pro, an integrated software platform designed to incorporate tools for protein identification, quantitation, and characterization into a single comprehensive package for the analysis of top-down proteomics data. This program contains a user-friendly customizable interface similar to the previously developed MASH Suite (36) but also has a number of new capabilities, including the ability to handle complex proteomics datasets from LC-MS and LC-MS/MS experiments, as well as the ability to identify unknown proteins and PTMs using MS-Align+ (32). Importantly, MASH Suite Pro also provides visualization components for the validation and correction of the computational outputs, which ensures accurate and reliable deconvolution of the spectra and localization of PTMs and sequence variations.
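As a concrete illustration of the envelope evaluation at the heart of the THRASH-style deconvolution mentioned above, the sketch below scales a theoretical isotopic envelope onto the observed peak intensities and scores the agreement; generating the theoretical envelope (e.g. from an averagine model) and scanning candidate charge states are assumed to happen elsewhere, and the function name and scoring formula are illustrative, not the MASH Suite Pro algorithm.

```python
import numpy as np

def envelope_fit_score(experimental: np.ndarray, theoretical: np.ndarray) -> float:
    """Scale the theoretical isotopic envelope onto the experimental peak
    intensities by least squares and return a 0..1 goodness-of-fit score
    (1 minus the normalized residual)."""
    t = theoretical / theoretical.sum()
    scale = float(experimental @ t) / float(t @ t)   # optimal scale factor
    residual = experimental - scale * t
    return 1.0 - float(residual @ residual) / float(experimental @ experimental)
```

A candidate monoisotopic mass and charge assignment would be kept when this score exceeds a user-defined threshold, mirroring the thresholding described above.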

7.
In large-scale proteomic experiments, multiple peptide precursors are often cofragmented simultaneously in the same mixture tandem mass (MS/MS) spectrum. These spectra tend to elude current computational tools because of the ubiquitous assumption that each spectrum is generated from only one peptide. Therefore, tools that consider multiple peptide matches to each MS/MS spectrum can potentially improve the relatively low spectrum identification rate often observed in proteomics experiments. More importantly, data independent acquisition protocols promoting the cofragmentation of multiple precursors are emerging as alternative methods that can greatly improve the throughput of peptide identifications but their success also depends on the availability of algorithms to identify multiple peptides from each MS/MS spectrum. Here we address a fundamental question in the identification of mixture MS/MS spectra: determining the statistical significance of multiple peptides matched to a given MS/MS spectrum. We propose the MixGF generating function model to rigorously compute the statistical significance of peptide identifications for mixture spectra and show that this approach improves the sensitivity of current mixture spectra database search tools by a ≈30–390%. Analysis of multiple data sets with MixGF reveals that in complex biological samples the number of identified mixture spectra can be as high as 20% of all the identified spectra and the number of unique peptides identified only in mixture spectra can be up to 35.4% of those identified in single-peptide spectra.The advancement of technology and instrumentation has made tandem mass (MS/MS)1 spectrometry the leading high-throughput method to analyze proteins (1, 2, 3). In typical experiments, tens of thousands to millions of MS/MS spectra are generated and enable researchers to probe various aspects of the proteome on a large scale. Part of this success hinges on the availability of computational methods that can analyze the large amount of data generated from these experiments. The classical question in computational proteomics asks: given an MS/MS spectrum, what is the peptide that generated the spectrum? However, it is increasingly being recognized that this assumption that each MS/MS spectrum comes from only one peptide is often not valid. Several recent analyses show that as many as 50% of the MS/MS spectra collected in typical proteomics experiments come from more than one peptide precursor (4, 5). The presence of multiple peptides in mixture spectra can decrease their identification rate to as low as one half of that for MS/MS spectra generated from only one peptide (6, 7, 8). In addition, there have been numerous developments in data independent acquisition (DIA) technologies where multiple peptide precursors are intentionally selected to cofragment in each MS/MS spectrum (9, 10, 11, 12, 13, 14, 15). These emerging technologies can address some of the enduring disadvantages of traditional data-dependent acquisition (DDA) methods (e.g. low reproducibility (16)) and potentially increase the throughput of peptide identification 5–10 fold (4, 17). However, despite the growing importance of mixture spectra in various contexts, there are still only a few computational tools that can analyze mixture spectra from more than one peptide (18, 19, 20, 21, 8, 22). 
Our recent analysis indicated that current database search methods for mixture spectra still have relatively low sensitivity compared with their single-peptide counterparts, and that the main bottleneck is their limited ability to separate true matches from false positive matches (8). Traditionally, the problem of peptide identification from MS/MS spectra involves two sub-problems: 1) defining a peptide-spectrum match (PSM) scoring function that assigns each MS/MS spectrum to the peptide sequence that most likely generated the spectrum; and 2) given a set of top-scoring PSMs, selecting the subset that corresponds to statistically significant PSMs. Here we focus on the second problem, which is still an ongoing research question even for the case of single-peptide spectra (23, 24, 25, 26). Intuitively, the second problem is difficult because one needs to consider spectra across the whole data set (instead of comparing different peptide candidates against one spectrum as in the first problem) and PSM scoring functions are often not well-calibrated across different spectra (i.e. a PSM score of 50 may be good for one spectrum but poor for a different spectrum). Ideally, a scoring function will give high scores to all true PSMs and low scores to false PSMs regardless of the peptide or spectrum being considered. However, in practice, some spectra may receive higher scores than others simply because they have more peaks or because their precursor mass results in more peptide candidates being considered from the sequence database (27, 28). Therefore, a scoring function that accounts for spectrum- or peptide-specific effects can make the scores more comparable and thus help assess the confidence of identifications across different spectra. The MS-GF solution to this problem is to compute the per-spectrum statistical significance of each top-scoring PSM, which can be defined as the probability that a random peptide (out of all possible peptides within the parent mass tolerance) will match the spectrum with a score at least as high as that of the top-scoring PSM. This measures how good the current best match is in relation to all possible peptides matching the same spectrum, normalizing out any spectrum-specific effect of the scoring function. Intuitively, our proposed MixGF approach extends the MS-GF approach to calculate the statistical significance of the top pair of peptides matched from the database to a given mixture spectrum M (i.e. the significance of the top peptide–peptide spectrum match (PPSM)). As such, MixGF determines the probability that a random pair of peptides (out of all possible peptides within the parent mass tolerance) will match a given mixture spectrum with a score at least as high as that of the top-scoring PPSM.

Despite the theoretical attractiveness of computing statistical significance, it is generally prohibitive for any database search method to score all possible peptides against a spectrum. Therefore, earlier work in this direction focused on approximating this probability by assuming that the score distribution of all PSMs follows a certain analytical form, such as the normal, Poisson, or hypergeometric distribution (29, 30, 31). In practice, because score distributions are highly data-dependent and spectrum-specific, these model assumptions do not always hold. Other approaches tried to learn the score distribution empirically from the data (29, 27). However, one is most interested in the region of the score distribution where only a small fraction of false positives are allowed (typically at 1% FDR). 
This usually corresponds to the extreme tail of the distribution, where p values are on the order of 10^−9 or lower, and thus there are typically too few data points to accurately model the tail of the score distribution (32). More recently, Kim et al. (24) and Alves et al. (33), in parallel, proposed a generating function approach to compute the exact score distribution of random peptide matches for any spectrum without explicitly matching all peptides to the spectrum. Because it is an exact computation, no assumption is made about the form of the score distribution, and the tail of the distribution can be computed very accurately. As a result, this approach substantially improved the ability to separate true matches from false positive ones and led to a significant increase in the sensitivity of peptide identification over state-of-the-art database search tools for single-peptide spectra (24).

For mixture spectra, it is expected that the scores of the top-scoring match will be even less comparable across different spectra, because more than one peptide, and different numbers of peptides, can be matched to each spectrum at the same time. We extend the generating function approach (24) to rigorously compute the statistical significance of multiple-peptide-spectrum matches (mPSMs) and demonstrate its utility toward addressing the peptide identification problem in mixture spectra. In particular, we show how to extend the generating function approach to mixture spectra from two peptides. We focus on this relatively simple case of mixture spectra because it accounts for a large fraction of the mixture spectra present in traditional DDA workflows (5). This allows us to test and develop algorithmic concepts using readily available DDA data, because data with more complex mixture spectra, such as those from DIA workflows (11), are still not widely available in public repositories.
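The generating-function idea can be sketched for the single-peptide case with a small dynamic program over (prefix mass, score), assuming rescaled integer residue masses, a toy scoring model in which each prefix mass contributes a fixed score increment, and equally probable amino acids; MixGF's extension to pairs of peptides and its exact scoring model are not reproduced here.

```python
from collections import defaultdict

def spectral_probability(residue_masses, site_score, precursor_mass, top_score):
    """residue_masses: integer residue masses; site_score: dict mapping a prefix
    mass to the score gained if a peptide has a fragmentation site there;
    returns P(random peptide of this precursor mass scores >= top_score)."""
    # table[mass] maps an accumulated score to the number of peptide prefixes
    # whose residues sum to `mass` and reach that score.
    table = [defaultdict(float) for _ in range(precursor_mass + 1)]
    table[0][0] = 1.0
    for mass in range(1, precursor_mass + 1):
        gain = site_score.get(mass, 0)
        for r in residue_masses:
            if mass - r < 0:
                continue
            for score, count in table[mass - r].items():
                table[mass][score + gain] += count
    final = table[precursor_mass]
    total = sum(final.values())
    hits = sum(c for s, c in final.items() if s >= top_score)
    return hits / total if total else 0.0
```

Because the table enumerates the exact score distribution over all peptides of the given mass, the tail probability is computed directly rather than extrapolated from an assumed analytical form.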

8.
The field of proteomics has evolved hand-in-hand with technological advances in LC-MS/MS systems, now enabling the analysis of very deep proteomes in a reasonable time. However, most applications do not deal with full cell or tissue proteomes but rather with restricted subproteomes relevant for the research context at hand or resulting from extensive fractionation. At the same time, investigation of many conditions or perturbations puts a strain on measurement capacity. Here, we develop a high-throughput workflow capable of dealing with large numbers of low or medium complexity samples and specifically aim at the analysis of 96-well plates in a single day (15 min per sample). We combine parallel sample processing with a modified liquid chromatography platform driving two analytical columns in tandem, which are coupled to a quadrupole Orbitrap mass spectrometer (Q Exactive HF). The modified LC platform eliminates idle time between measurements, and the high sequencing speed of the Q Exactive HF reduces required measurement time. We apply the pipeline to the yeast chromatin remodeling landscape and demonstrate quantification of 96 pull-downs of chromatin complexes in about 1 day. This is achieved with only 500 μg input material, enabling yeast cultivation in a 96-well format. Our system retrieved known complex-members and the high throughput allowed probing with many bait proteins. Even alternative complex compositions were detectable in these very short gradients. Thus, sample throughput, sensitivity and LC/MS-MS duty cycle are improved severalfold compared with established workflows. The pipeline can be extended to different types of interaction studies and to other medium complexity proteomes.Shotgun proteomics is concerned with the identification and quantification of proteins (13). Prior to analysis, the proteins are digested into peptides, resulting in highly complex mixtures. To deal with this complexity, the peptides are separated by liquid chromatography followed by online analysis with mass spectrometry (MS), today facilitating the characterization of almost complete cell line proteomes in a short time (35). In addition to the characterization of entire proteomes, there is also a great demand for analyzing low or medium complexity samples. Given the trend toward a systems biology view, relatively larges sets of samples often have to be measured. One such category of lower complexity protein mixtures occurs in the determination of physical interaction partners of a protein of interest, which requires the identification and quantification of the proteins “pulled-down” or immunoprecipitated via a bait protein. Protein interactions are essential for almost all biological processes and orchestrate a cell''s behavior by regulating enzymes, forming macromolecular assemblies and functionalizing multiprotein complexes that are capable of more complex behavior than the sum of their parts. The human genome has almost 20,000 protein encoding genes, and it has been estimated that 80% of the proteins engage in complex interactions and that 130,000 to 650,000 protein interactions can take place in a human cell (6, 7). These numbers demonstrate a clear need for systematic and high-throughput mapping of protein–protein interactions (PPIs) to understand these complexes.The introduction of generic methods to detect PPIs, such as the yeast two-hybrid screen (Y2H) (8) or affinity purification combined with mass spectrometry (AP-MS)1 (9), have revolutionized the protein interactomics field. 
AP-MS in particular has emerged as an important tool to catalogue interactions with the aim of better understanding basic biochemical mechanisms in many different organisms (1017). It can be performed under near-physiological conditions and is capable of identifying functional protein complexes (18). In addition, the combination of affinity purification with quantitative mass spectrometry has greatly improved the discrimination of true interactors from unspecific background binders, a long-standing challenge in the AP-MS field (1921). Nowadays, quantitative AP-MS is employed to address many different biological questions, such as detection of dynamic changes in PPIs upon perturbation (2225) or the impact of posttranslational signaling on PPIs (26, 27). Recent developments even make it possible to provide abundances and stoichiometry information of the bait and prey proteins under study, combined with quantitative data from very deep cellular proteomes. Furthermore, sample preparation in AP-MS can now be performed in high-throughput formats capable of producing hundreds of samples per day. With such throughput in sample generation, the LC-MS/MS part of the AP-MS pipeline has become a major bottleneck for large studies, limiting throughput to a small fraction of the available samples. In principle, this limitation could be circumvented by multiplexing analysis via isotope-labeling strategies (28, 29) or by drastically reducing the measurement time per sample (3032). The former strategy requires exquisite control of the processing steps and has not been widely implemented yet. The latter strategy depends on mass spectrometers with sufficiently high sequencing speed to deal with the pull-down in a very short time. Since its introduction about 10 years ago (33), the Orbitrap mass spectrometer has featured ever-faster sequencing capabilities, with the Q Exactive HF now reaching a peptide sequencing speed of up to 17 Hz (34). This should now make it feasible to substantially lower the amount of time spent per measurement.Although very short LC-MS/MS runs can in principle be used for high-throughput analyses, they usually lead to a drop in LC-MS duty cycle. This is because each sample needs initial washing, loading, and equilibration steps, independent of gradient time, which takes a substantial percentage for most LC setups - typically at least 15–20 min. To achieve a more efficient LC-MS duty cycle, while maintaining high sensitivity, a second analytical column can be introduced. This enables the parallelization of several steps related to sample loading and to the LC operating steps, including valve switching. Such dual analytical column or “double-barrel: setups have been described for various applications and platforms (30, 3539).Starting from the reported performance and throughput of workflows that are standard today (16, 21, 4042), we asked if it would be possible to obtain a severalfold increase in both sample throughput and sensitivity, as well as a considerable reduction in overall wet lab costs and working time. Specifically, our goal was to quantify 96 medium complexity samples in a single day. Such a number of samples can be processed with a 96-well plate, which currently is the format of choice for highly parallelized sample preparation workflows, often with a high degree of automation. We investigated which advances were needed in sample preparation, liquid chromatography, and mass spectrometry. 
Based on our findings, we developed a parallelized platform for high-throughput sample preparation and LC-MS/MS analysis, which we applied to pull-down samples from the yeast chromatin remodeling landscape. The extent of retrieval of known complex members served as a quality control of the developed pipeline.  相似文献   
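Taking the figures quoted above at face value (a 15 min target per sample and 15 to 20 min of loading, washing, and equilibration overhead on a conventional single-column setup), a rough back-of-the-envelope calculation shows why a double-barrel arrangement is needed to reach 96 samples per day. The numbers below are illustrative assumptions, not measurements from the study.

```python
MINUTES_PER_DAY = 24 * 60

def samples_per_day(gradient_min, overhead_min, dual_column=False):
    """Samples measurable in 24 h.

    With a single column the overhead (loading, washing, equilibration) adds
    to every run; with two columns operated in tandem the overhead of one
    column is hidden behind the gradient of the other, so throughput is
    limited by whichever of the two phases is longer.
    """
    cycle = max(gradient_min, overhead_min) if dual_column else gradient_min + overhead_min
    return MINUTES_PER_DAY // cycle

print(samples_per_day(gradient_min=15, overhead_min=18))                    # single column: ~43/day
print(samples_per_day(gradient_min=15, overhead_min=18, dual_column=True))  # ~80/day
print(samples_per_day(gradient_min=15, overhead_min=15, dual_column=True))  # 96/day once overhead fits within the gradient
```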

9.
The combination of chemical cross-linking and mass spectrometry has recently been shown to constitute a powerful tool for studying protein–protein interactions and elucidating the structure of large protein complexes. However, computational methods for interpreting the complex MS/MS spectra from linked peptides are still in their infancy, making the high-throughput application of this approach largely impractical. Because of the lack of large annotated datasets, most current approaches do not capture the specific fragmentation patterns of linked peptides and therefore are not optimal for the identification of cross-linked peptides. Here we propose a generic approach to address this problem and demonstrate it using disulfide-bridged peptide libraries to (i) efficiently generate large mass spectral reference data for linked peptides at a low cost and (ii) automatically train an algorithm that can efficiently and accurately identify linked peptides from MS/MS spectra. We show that using this approach we were able to identify thousands of MS/MS spectra from disulfide-bridged peptides through comparison with proteome-scale sequence databases and significantly improve the sensitivity of cross-linked peptide identification. This allowed us to identify 60% more direct pairwise interactions between the protein subunits in the 20S proteasome complex than existing tools on cross-linking studies of the proteasome complexes. The basic framework of this approach and the MS/MS reference dataset generated should be valuable resources for the future development of new tools for the identification of linked peptides.The study of protein–protein interactions is crucial to understanding how cellular systems function because proteins act in concert through a highly organized set of interactions. Most cellular processes are carried out by large macromolecular assemblies and regulated through complex cascades of transient protein–protein interactions (1). In the past several years numerous high-throughput studies have pioneered the systematic characterization of protein–protein interactions in model organisms (24). Such studies mainly utilize two techniques: the yeast two-hybrid system, which aims at identifying binary interactions (5), and affinity purification combined with tandem mass spectrometry analysis for the identification of multi-protein assemblies (68). Together these led to a rapid expansion of known protein–protein interactions in human and other model organisms. Patche and Aloy recently estimated that there are more than one million interactions catalogued to date (9).But despite rapid progress, most current techniques allow one to determine only whether proteins interact, which is only the first step toward understanding how proteins interact. A more complete picture comes from characterizing the three-dimensional structures of protein complexes, which provide mechanistic insights that govern how interactions occur and the high specificity observed inside the cell. Traditionally the gold-standard methods used to solve protein structures are x-ray crystallography and NMR, and there have been several efforts similar to structural genomics (10) aiming to comprehensively solve the structures of protein complexes (11, 12). Although there has been accelerated growth of structures for protein monomers in the Protein Data Bank in recent years (11), the growth of structures for protein complexes has remained relatively small (9). 
Many factors, including their large size, transient nature, and dynamics of interactions, have prevented many complexes from being solved via traditional approaches in structural biology. Thus, the development of complementary analytical techniques with which to probe the structure of large protein complexes continues to evolve (1318).Recent developments have advanced the analysis of protein structures and interaction by combining cross-linking and tandem mass spectrometry (17, 1924). The basic idea behind this technique is to capture and identify pairs of amino acid residues that are spatially close to each other. When these linked pairs of residues are from the same protein (intraprotein cross-links), they provide distance constraints that help one infer the possible conformations of protein structures. Conversely, when pairs of residues come from different proteins (interprotein cross-links), they provide information about how proteins interact with one another. Although cross-linking strategies date back almost a decade (25, 26), difficulty in analyzing the complex MS/MS spectrum generated from linked peptides made this approach challenging, and therefore it was not widely used. With recent advances in mass spectrometry instrumentation, there has been renewed interest in employing this strategy to determine protein structures and identify protein–protein interactions. However, most studies thus far have been focused on purified protein complexes. With today''s mass spectrometers being capable of analyzing tens of thousands of spectra in a single experiment, it is now potentially feasible to extend this approach to the analysis of complex biological samples. Researchers have tried to realize this goal using both experimental and computational approaches. Indeed, a plethora of chemical cross-linking reagents are now available for stabilizing these complexes, and some are designed to allow for easier peptide identification when employed in concert with MS analysis (20, 27, 28). There have also been several recent efforts to develop computational methods for the automatic identification of linked peptides from MS/MS spectra (2936). However, because of the lack of large annotated training data, most approaches to date either borrow fragmentation models learned from unlinked, linear peptides or learn the fragmentation statistics from training data of limited size (30, 37), which might not generalize well across different samples. In some cases it is possible to generate relatively large training data, but it is often very labor intensive and involves hundreds of separate LC-MS/MS runs (36). Here, employing disulfide-bridged peptides as an example, we propose a novel method that uses a combinatorial peptide library to (a) efficiently generate a large mass spectral reference dataset for linked peptides and (b) use these data to automatically train our new algorithm, MXDB, which can efficiently and accurately identify linked peptides from MS/MS spectra.  相似文献   
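As a minimal illustration of the search problem for linked peptides, and not of the MXDB algorithm itself, the sketch below enumerates candidate pairs of cysteine-containing peptides whose disulfide-bridged mass (the sum of the two peptide masses minus two hydrogen atoms) matches an observed precursor within a tolerance. The candidate sequences and the tolerance are hypothetical.

```python
from itertools import combinations

# Monoisotopic residue masses (Da); one water is added per peptide chain.
MONO = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
        "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
        "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
        "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931}
WATER = 18.01056
HYDROGEN = 1.007825

def peptide_mass(seq):
    return sum(MONO[aa] for aa in seq) + WATER

def disulfide_pairs(peptides, precursor_mass, tol_ppm=10.0):
    """Return peptide pairs whose disulfide-bridged mass matches the precursor.

    Forming an S-S bridge between two cysteine-containing peptides removes
    two hydrogen atoms from the sum of the free peptide masses.
    """
    cys_peptides = [p for p in peptides if "C" in p]
    matches = []
    for a, b in combinations(cys_peptides, 2):
        linked = peptide_mass(a) + peptide_mass(b) - 2 * HYDROGEN
        if abs(linked - precursor_mass) / precursor_mass * 1e6 <= tol_ppm:
            matches.append((a, b, round(linked, 4)))
    return matches

if __name__ == "__main__":
    candidates = ["ACDK", "GCSVK", "LNCER", "PEPTIDE"]
    target = peptide_mass("ACDK") + peptide_mass("LNCER") - 2 * HYDROGEN  # simulated precursor
    print(disulfide_pairs(candidates, target))
```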

10.
Quantifying the similarity of spectra is an important task in various areas of spectroscopy, for example, to identify a compound by comparing sample spectra to those of reference standards. In mass spectrometry based discovery proteomics, spectral comparisons are used to infer the amino acid sequence of peptides. In targeted proteomics by selected reaction monitoring (SRM) or SWATH MS, predetermined sets of fragment ion signals integrated over chromatographic time are used to identify target peptides in complex samples. In both cases, confidence in peptide identification is directly related to the quality of spectral matches. In this study, we used sets of simulated spectra of well-controlled dissimilarity to benchmark different spectral comparison measures and to develop a robust scoring scheme that quantifies the similarity of fragment ion spectra. We applied the normalized spectral contrast angle score to quantify the similarity of spectra to objectively assess fragment ion variability of tandem mass spectrometric datasets, to evaluate portability of peptide fragment ion spectra for targeted mass spectrometry across different types of mass spectrometers and to discriminate target assays from decoys in targeted proteomics. Altogether, this study validates the use of the normalized spectral contrast angle as a sensitive spectral similarity measure for targeted proteomics, and more generally provides a methodology to assess the performance of spectral comparisons and to support the rational selection of the most appropriate similarity measure. The algorithms used in this study are made publicly available as an open source toolset with a graphical user interface.In “bottom-up” proteomics, peptide sequences are identified by the information contained in their fragment ion spectra (1). Various methods have been developed to generate peptide fragment ion spectra and to match them to their corresponding peptide sequences. They can be broadly grouped into discovery and targeted methods. In the widely used discovery (also referred to as shotgun) proteomic approach, peptides are identified by establishing peptide to spectrum matches via a method referred to as database searching. Each acquired fragment ion spectrum is searched against theoretical peptide fragment ion spectra computed from the entries of a specified sequence database, whereby the database search space is constrained to a user defined precursor mass tolerance (2, 3). The quality of the match between experimental and theoretical spectra is typically expressed with multiple scores. These include the number of matching or nonmatching fragments, the number of consecutive fragment ion matches among others. With few exceptions (47) commonly used search engines do not use the relative intensities of the acquired fragment ion signals even though this information could be expected to strengthen the confidence of peptide identification because the relative fragment ion intensity pattern acquired under controlled fragmentation conditions can be considered as a unique “fingerprint” for a given precursor. Thanks to community efforts in acquiring and sharing large number of datasets, the proteomes of some species are now essentially mapped out and experimental fragment ion spectra covering entire proteomes are increasingly becoming accessible through spectral databases (816). This has catalyzed the emergence of new proteomics strategies that differ from classical database searching in that they use prior spectral information to identify peptides. 
Those comprise inclusion list sequencing (directed sequencing), spectral library matching, and targeted proteomics (17). These methods explicitly use the information contained in empirical fragment ion spectra, including the fragment ion signal intensity to identify the target peptide. For these methods, it is therefore of highest importance to accurately control and quantify the degree of reproducibility of the fragment ion spectra across experiments, instruments, labs, methods, and to quantitatively assess the similarity of spectra. To date, dot product (1824), its corresponding arccosine spectral contrast angle (2527) and (Pearson-like) spectral correlation (2831), and other geometrical distance measures (18, 32), have been used in the literature for assessing spectral similarity. These measures have been used in different contexts including shotgun spectra clustering (19, 26), spectral library searching (18, 20, 21, 24, 25, 2729), cross-instrument fragmentation comparisons (22, 30) and for scoring transitions in targeted proteomics analyses such as selected reaction monitoring (SRM)1 (23, 31). However, to our knowledge, those scores have never been objectively benchmarked for their performance in discriminating well-defined levels of dissimilarities between spectra. In particular, similarity scores obtained by different methods have not yet been compared for targeted proteomics applications, where the sensitive discrimination of highly similar spectra is critical for the confident identification of targeted peptides.In this study, we have developed a method to objectively assess the similarity of fragment ion spectra. We provide an open-source toolset that supports these analyses. Using a computationally generated benchmark spectral library with increasing levels of well-controlled spectral dissimilarity, we performed a comprehensive and unbiased comparison of the performance of the main scores used to assess spectral similarity in mass spectrometry.We then exemplify how this method, in conjunction with its corresponding benchmarked perturbation spectra set, can be applied to answer several relevant questions for MS-based proteomics. As a first application, we show that it can efficiently assess the absolute levels of peptide fragmentation variability inherent to any given mass spectrometer. By comparing the instrument''s intrinsic fragmentation conservation distribution to that of the benchmarked perturbation spectra set, nominal values of spectral similarity scores can indeed be translated into a more directly understandable percentage of variability inherent to the instrument fragmentation. As a second application, we show that the method can be used to derive an absolute measure to estimate the conservation of peptide fragmentation between instruments or across proteomics methods. This allowed us to quantitatively evaluate, for example, the transferability of fragment ion spectra acquired by data dependent analysis in a first instrument into a fragment/transition assay list used for targeted proteomics applications (e.g. SRM or targeted extraction of data independent acquisition SWATH MS (33)) on another instrument. Third, we used the method to probe the fragmentation patterns of peptides carrying a post-translation modification (e.g. phosphorylation) by comparing the spectra of modified peptide with those of their unmodified counterparts. 
Finally, we used the method to determine the overall level of fragmentation conservation that is required to support target-decoy discrimination and peptide identification in targeted proteomics approaches such as SRM and SWATH MS.  相似文献   
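Assuming the usual definition of the normalized spectral contrast angle, namely one minus the arccosine of the dot product of the unit-normalized intensity vectors scaled by 2/π, a minimal implementation for two fragment-matched spectra might look as follows; the example intensities are invented.

```python
import math

def normalized_contrast_angle(intensities_a, intensities_b):
    """Spectral similarity of two fragment ion intensity vectors.

    The vectors must list intensities for the same ordered set of fragment
    ions (missing fragments entered as 0). The score is 1 for identical
    relative intensity patterns and approaches 0 for orthogonal ones.
    """
    if len(intensities_a) != len(intensities_b):
        raise ValueError("intensity vectors must be matched fragment-by-fragment")
    norm_a = math.sqrt(sum(x * x for x in intensities_a))
    norm_b = math.sqrt(sum(x * x for x in intensities_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    dot = sum(a * b for a, b in zip(intensities_a, intensities_b)) / (norm_a * norm_b)
    dot = min(1.0, max(-1.0, dot))        # guard against floating-point drift
    theta = math.acos(dot)                # spectral contrast angle
    return 1.0 - 2.0 * theta / math.pi    # normalized to [0, 1]

# Example: a reference transition pattern and a slightly perturbed measurement.
reference = [100.0, 80.0, 35.0, 20.0, 5.0]
measured = [95.0, 85.0, 30.0, 22.0, 0.0]
print(round(normalized_contrast_angle(reference, measured), 3))
```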

11.
12.
The analysis and management of MS data, especially those generated by data independent MS acquisition, exemplified by SWATH-MS, pose significant challenges for proteomics bioinformatics. The large size and vast amount of information inherent to these data sets need to be properly structured to enable an efficient and straightforward extraction of the signals used to identify specific target peptides. Standard XML based formats are not well suited to large MS data files, for example, those generated by SWATH-MS, and compromise high-throughput data processing and storing.We developed mzDB, an efficient file format for large MS data sets. It relies on the SQLite software library and consists of a standardized and portable server-less single-file database. An optimized 3D indexing approach is adopted, where the LC-MS coordinates (retention time and m/z), along with the precursor m/z for SWATH-MS data, are used to query the database for data extraction.In comparison with XML formats, mzDB saves ∼25% of storage space and improves access times by a factor of twofold up to even 2000-fold, depending on the particular data access. Similarly, mzDB shows also slightly to significantly lower access times in comparison with other formats like mz5. Both C++ and Java implementations, converting raw or XML formats to mzDB and providing access methods, will be released under permissive license. mzDB can be easily accessed by the SQLite C library and its drivers for all major languages, and browsed with existing dedicated GUIs. The mzDB described here can boost existing mass spectrometry data analysis pipelines, offering unprecedented performance in terms of efficiency, portability, compactness, and flexibility.The continuous improvement of mass spectrometers (14) and HPLC systems (510) and the rapidly increasing volumes of data they produce pose a real challenge to software developers who constantly have to adapt their tools to deal with different types and increasing sizes of raw files. Indeed, the file size of a single MS analysis evolved from a few MB to several GB in less than 10 years. The introduction of high throughput, high mass accuracy MS analyses in data dependent acquisitions (DDA)1 and the adoption of Data Independent Acquisition (DIA) approaches, for example, SWATH-MS (11), were significant factors in this development. The management of these huge data files is a major issue for laboratories and raw file public repositories, which need to regularly upgrade their storage solutions and capacity.The availability of XML (eXtensible Markup Language) standard formats (12, 13) enhanced data exchange among laboratories. However, XMLs causes the inflation of raw file size by a factor of two to three times compared with their original size. Vendor files, although lighter, are proprietary formats, often not compatible with operating systems other than Microsoft Windows. They do not generally interface with many open source software tools, and do not offer a viable solution for data exchange. In addition to size inflation, other disadvantages associated with the use of XML for the representation of raw data have been previously described in the literature (1417). These include the verbosity of language syntax, the lack of support for multidimensional chromatographic analyses, and the low performance showed during data processing. Although XML standards were originally conceived as a format for enabling data sharing in the community, they are commonly used as the input for MS data analysis. 
The latest software tools (18, 19) are usually only compatible with mzML files, which de facto limits the throughput of proteomic analyses. To tackle these issues, some independent laboratories developed open formats relying on binary specifications (14, 17, 20, 21) to optimize both file size and data processing performance. Similar efforts started more than ten years ago; among others, NetCDF version 4, first described in 2004, added support for a new data model called HDF5. Because it is particularly well suited to the representation of complex data, HDF5 was used in several scientific projects to store and efficiently access large volumes of bytes, as in the mz5 format (17). Compared with XML based formats, mz5 is much more efficient in terms of file size, memory footprint, and access time. Thus, after replacing the JCAMP text format more than 10 years ago, netCDF is nowadays a suitable alternative to XML based formats. Nonetheless, solutions for storing and indexing large amounts of data in a binary file are not limited to netCDF. For instance, it has been demonstrated that a relational model can represent raw data, as in the YAFMS format (14), which is based on SQLite, a technology that allows implementing a portable, self-contained, single-file database. Like mz5, YAFMS is clearly more efficient than XML in terms of file size and access times. Despite these improvements, a limitation of these new binary formats lies in the lack of a multi-indexing model to represent the bi-dimensional structure of LC-MS data. The inherently 2D indexing of LC-MS data can indeed be very useful when working with LC-MS/MS acquisition files. At the current state of the art, three main raw data access strategies can be identified across DDA and DIA approaches:
  • (1) Sequential reading of whole m/z spectra, for a systematic processing of the entire raw file. Use cases: file format conversion, peak picking, analysis of MS/MS spectra, and MS/MS peak list generation.
  • (2) Systematic processing of the data contained in specific m/z windows, across the entire chromatographic gradient. Use cases: extraction of XICs on the whole chromatographic gradient and MS features detection.
  • (3) Random access to a small region of the LC-MS map (a few spectra or an m/z window of consecutive spectra). Use cases: data visualization, targeted extraction of XICs on a small time range, and targeted extraction of a subset of spectra.
The adoption of a certain data access strategy depends upon the particular data analysis algorithms, which can perform signal extraction mainly by unsupervised or supervised approaches. Unsupervised approaches (18, 22–25) recognize LC-MS features on the basis of patterns like the theoretical isotope distribution, the shape of the elution peaks, etc. Conversely, supervised approaches (29–33) implement peak picking as targeted data access driven by a priori knowledge of peptide coordinates (m/z, retention time, and precursor m/z for DIA), which are provided by extraction lists from the identification search engine or by transition lists in targeted proteomics (34). Data access overhead can vary significantly, according to the specific algorithm, data size, and length of the extraction list. In the unsupervised approach, feature detection is based first on the analysis of the full set of MS spectra and then on the grouping of the peaks detected in adjacent MS scans; thus, optimized sequential spectra access is required. In the supervised approach, peptide XICs are extracted using their a priori coordinates, and therefore sequential spectra access is not a suitable solution; for instance, MS spectra shared by different peptides would be loaded multiple times, leading to highly redundant data reloading. Even though sophisticated caching mechanisms can reduce the impact of this issue, they would increase memory consumption. It is thus preferable to perform targeted access to specific MS spectra by leveraging an index in the time dimension. However, this would still be a sub-optimal solution because of redundant loads of full MS spectra, whereas only a small spectral window centered on the peptide m/z is of interest. Thus the quantification of tens of thousands of peptides (32, 33) requires appropriate data access methods to cope with the repetitive and high load of MS data. We therefore deem that an ideal file format should show comparable efficiency regardless of the particular use case. In order to achieve this flexibility and efficiency for any data access, we developed a new solution featuring multiple indexing strategies: the mzDB format (i.e. m/z database). Like the YAFMS format, mzDB is implemented using SQLite, which is commonly adopted in several computational projects and is compatible with most programming languages. In contrast to the mz5 and YAFMS formats, where each spectrum is referenced by a single index entry, mzDB has an internal data structure allowing multidimensional data indexing, which results in efficient queries along both the time and m/z dimensions. This makes mzDB specifically suited to the processing of large-scale LC-MS/MS data. In particular, the multidimensional data-indexing model was extended for SWATH-MS data, where a third index is given by the m/z of the precursor ion, in addition to the RT and m/z of the fragment ions. In order to show its efficiency for all described data access strategies, mzDB was compared with the mzML format, which is the official XML standard, and the latest mz5 binary format, which has already been compared with many existing file formats (17). Results show that mzDB outperforms other formats on most comparisons, except in sequential reading benchmarks, where mz5 and mzDB are comparable.
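The schema below is a heavily simplified, hypothetical sketch of this kind of multidimensional indexing, not the actual mzDB specification: peaks are grouped into bounding boxes that tile the retention time × m/z plane, so a targeted XIC extraction only reads the tiles overlapping the query window.

```python
import sqlite3

# Hypothetical mzDB-like layout: peaks grouped into RT x m/z bounding boxes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bounding_box (
    bb_id  INTEGER PRIMARY KEY,
    rt_min REAL, rt_max REAL,
    mz_min REAL, mz_max REAL
);
CREATE TABLE peak (
    bb_id     INTEGER REFERENCES bounding_box(bb_id),
    rt        REAL,
    mz        REAL,
    intensity REAL
);
CREATE INDEX idx_bb_rt_mz ON bounding_box(rt_min, rt_max, mz_min, mz_max);
""")

def insert_tile(bb_id, rt_range, mz_range, peaks):
    conn.execute("INSERT INTO bounding_box VALUES (?,?,?,?,?)",
                 (bb_id, rt_range[0], rt_range[1], mz_range[0], mz_range[1]))
    conn.executemany("INSERT INTO peak VALUES (?,?,?,?)",
                     [(bb_id, rt, mz, i) for rt, mz, i in peaks])

def extract_xic(mz_lo, mz_hi, rt_lo, rt_hi):
    """Targeted XIC extraction: only tiles overlapping the query window are read."""
    return conn.execute("""
        SELECT p.rt, SUM(p.intensity)
        FROM bounding_box b JOIN peak p ON p.bb_id = b.bb_id
        WHERE b.mz_max >= ? AND b.mz_min <= ?
          AND b.rt_max >= ? AND b.rt_min <= ?
          AND p.mz BETWEEN ? AND ?
          AND p.rt BETWEEN ? AND ?
        GROUP BY p.rt ORDER BY p.rt""",
        (mz_lo, mz_hi, rt_lo, rt_hi, mz_lo, mz_hi, rt_lo, rt_hi)).fetchall()

insert_tile(1, (0.0, 30.0), (400.0, 500.0),
            [(10.0, 450.12, 1e5), (10.5, 450.13, 2e5), (11.0, 450.12, 1.5e5)])
insert_tile(2, (0.0, 30.0), (500.0, 600.0), [(10.0, 550.30, 9e4)])
print(extract_xic(450.10, 450.15, 5.0, 20.0))
```

For SWATH-MS data, the same idea extends naturally by adding a precursor isolation-window column to the bounding-box table and filtering on it first.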
mzDB access performance, portability, and compactness, as well as its compliance to the PSI controlled vocabulary make it complementary to existing solutions for both the storage and exchange of mass spectrometry data and will eventually address the issues related to data access overhead during their processing. mzDB can therefore enhance existing mass spectrometry data analysis pipelines, offering unprecedented performance and therefore possibilities.  相似文献   

13.
A complete understanding of the biological functions of large signaling peptides (>4 kDa) requires comprehensive characterization of their amino acid sequences and post-translational modifications, which presents significant analytical challenges. In the past decade, there has been great success with mass spectrometry-based de novo sequencing of small neuropeptides. However, these approaches are less applicable to larger neuropeptides because of the inefficient fragmentation of peptides larger than 4 kDa and their lower endogenous abundance. The conventional proteomics approach focuses on large-scale determination of protein identities via database searching, lacking the ability for in-depth elucidation of individual amino acid residues. Here, we present a multifaceted MS approach for identification and characterization of large crustacean hyperglycemic hormone (CHH)-family neuropeptides, a class of peptide hormones that play central roles in the regulation of many important physiological processes of crustaceans. Six crustacean CHH-family neuropeptides (8–9.5 kDa), including two novel peptides with extensive disulfide linkages and PTMs, were fully sequenced without reference to genomic databases. High-definition de novo sequencing was achieved by a combination of bottom-up, off-line top-down, and on-line top-down tandem MS methods. Statistical evaluation indicated that these methods provided complementary information for sequence interpretation and increased the local identification confidence of each amino acid. Further investigations by MALDI imaging MS mapped the spatial distribution and colocalization patterns of various CHH-family neuropeptides in the neuroendocrine organs, revealing that two CHH-subfamilies are involved in distinct signaling pathways.Neuropeptides and hormones comprise a diverse class of signaling molecules involved in numerous essential physiological processes, including analgesia, reward, food intake, learning and memory (1). Disorders of the neurosecretory and neuroendocrine systems influence many pathological processes. For example, obesity results from failure of energy homeostasis in association with endocrine alterations (2, 3). Previous work from our lab used crustaceans as model organisms found that multiple neuropeptides were implicated in control of food intake, including RFamides, tachykinin related peptides, RYamides, and pyrokinins (46).Crustacean hyperglycemic hormone (CHH)1 family neuropeptides play a central role in energy homeostasis of crustaceans (717). Hyperglycemic response of the CHHs was first reported after injection of crude eyestalk extract in crustaceans. Based on their preprohormone organization, the CHH family can be grouped into two sub-families: subfamily-I containing CHH, and subfamily-II containing molt-inhibiting hormone (MIH) and mandibular organ-inhibiting hormone (MOIH). The preprohormones of the subfamily-I have a CHH precursor related peptide (CPRP) that is cleaved off during processing; and preprohormones of the subfamily-II lack the CPRP (9). Uncovering their physiological functions will provide new insights into neuroendocrine regulation of energy homeostasis.Characterization of CHH-family neuropeptides is challenging. They are comprised of more than 70 amino acids and often contain multiple post-translational modifications (PTMs) and complex disulfide bridge connections (7). 
In addition, physiological concentrations of these peptide hormones are typically below picomolar level, and most crustacean species do not have available genome and proteome databases to assist MS-based sequencing.MS-based neuropeptidomics provides a powerful tool for rapid discovery and analysis of a large number of endogenous peptides from the brain and the central nervous system. Our group and others have greatly expanded the peptidomes of many model organisms (3, 1833). For example, we have discovered more than 200 neuropeptides with several neuropeptide families consisting of as many as 20–40 members in a simple crustacean model system (5, 6, 2531, 34). However, a majority of these neuropeptides are small peptides with 5–15 amino acid residues long, leaving a gap of identifying larger signaling peptides from organisms without sequenced genome. The observed lack of larger size peptide hormones can be attributed to the lack of effective de novo sequencing strategies for neuropeptides larger than 4 kDa, which are inherently more difficult to fragment using conventional techniques (3437). Although classical proteomics studies examine larger proteins, these tools are limited to identification based on database searching with one or more peptides matching without complete amino acid sequence coverage (36, 38).Large populations of neuropeptides from 4–10 kDa exist in the nervous systems of both vertebrates and invertebrates (9, 39, 40). Understanding their functional roles requires sufficient molecular knowledge and a unique analytical approach. Therefore, developing effective and reliable methods for de novo sequencing of large neuropeptides at the individual amino acid residue level is an urgent gap to fill in neurobiology. In this study, we present a multifaceted MS strategy aimed at high-definition de novo sequencing and comprehensive characterization of the CHH-family neuropeptides in crustacean central nervous system. The high-definition de novo sequencing was achieved by a combination of three methods: (1) enzymatic digestion and LC-tandem mass spectrometry (MS/MS) bottom-up analysis to generate detailed sequences of proteolytic peptides; (2) off-line LC fractionation and subsequent top-down MS/MS to obtain high-quality fragmentation maps of intact peptides; and (3) on-line LC coupled to top-down MS/MS to allow rapid sequence analysis of low abundance peptides. Combining the three methods overcomes the limitations of each, and thus offers complementary and high-confidence determination of amino acid residues. We report the complete sequence analysis of six CHH-family neuropeptides including the discovery of two novel peptides. With the accurate molecular information, MALDI imaging and ion mobility MS were conducted for the first time to explore their anatomical distribution and biochemical properties.  相似文献   
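One way to picture the complementarity argument is to tally, for each residue, whether backbone cleavages on both of its sides were observed in at least one of the three experiments. The sketch below uses invented cleavage maps and is not the authors' statistical evaluation.

```python
def residue_coverage(length, cleavage_sites):
    """Mark residues bracketed by observed backbone cleavages.

    cleavage_sites: positions i (1..length-1) at which a fragment ion pair
    (e.g. b_i / y_(length-i) or c_i / z_(length-i)) was observed. A residue is
    'locally confirmed' when cleavages on both of its sides were seen.
    """
    sites = set(cleavage_sites) | {0, length}   # termini count as boundaries
    return [(pos - 1 in sites and pos in sites) for pos in range(1, length + 1)]

def merge(*coverages):
    """A residue is confirmed if any one method confirms it."""
    return [any(flags) for flags in zip(*coverages)]

# Hypothetical cleavage maps for a 10-residue stretch from three experiments.
bottom_up = residue_coverage(10, {1, 2, 3, 7, 8})
offline_top_down = residue_coverage(10, {3, 4, 5, 6})
online_top_down = residue_coverage(10, {6, 7, 9})
combined = merge(bottom_up, offline_top_down, online_top_down)
print(sum(combined), "of 10 residues locally confirmed by at least one method")
```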

14.
Glycosylation is one of the most common and important protein modifications in biological systems. Many glycoproteins naturally occur at low abundances, which makes comprehensive analysis extremely difficult. Additionally, glycans are highly heterogeneous, which further complicates analysis in complex samples. Lectin enrichment has been commonly used, but each lectin is inherently specific to one or several carbohydrates, and thus no single or collection of lectin(s) can bind to all glycans. Here we have employed a boronic acid-based chemical method to universally enrich glycopeptides. The reaction between boronic acids and sugars has been extensively investigated, and it is well known that the interaction between boronic acid and diols is one of the strongest reversible covalent bond interactions in an aqueous environment. This strong covalent interaction provides a great opportunity to catch glycopeptides and glycoproteins by boronic acid, whereas the reversible property allows their release without side effects. More importantly, the boronic acid-diol recognition is universal, which provides great capability and potential for comprehensively mapping glycosylation sites in complex biological samples. By combining boronic acid enrichment with PNGase F treatment in heavy-oxygen water and MS, we have identified 816 N-glycosylation sites in 332 yeast proteins, among which 675 sites were well-localized with greater than 99% confidence. The results demonstrated that the boronic acid-based chemical method can effectively enrich glycopeptides for comprehensive analysis of protein glycosylation. A general trend seen within the large data set was that there were fewer glycosylation sites toward the C termini of proteins. Of the 332 glycoproteins identified in yeast, 194 were membrane proteins. Many proteins get glycosylated in the high-mannose N-glycan biosynthetic and GPI anchor biosynthetic pathways. Compared with lectin enrichment, the current method is more cost-efficient, generic, and effective. This method can be extensively applied to different complex samples for the comprehensive analysis of protein glycosylation.Glycosylation is an extremely important protein modification that frequently regulates protein folding, trafficking, and stability. It is also involved in a wide range of cellular events (1) such as immune response (2, 3), cell proliferation (4), cell-cell interactions (5), and signal transduction (6). Aberrant protein glycosylation is believed to have a direct correlation with the development of several diseases, including diabetes, infectious diseases, and cancer (711). Secretory proteins frequently get glycosylated, including those in body fluids such as blood, saliva, and urine (12, 13). Samples containing these proteins can be easily obtained and used for diagnostic and therapeutic purposes. Several glycoproteins have previously been identified as biomarkers, including Her2/Neu in breast cancer (14), prostate-specific antigen (PSA) in prostate cancer (15), and CA125 in ovarian cancer (16, 17), which highlights the clinical importance of identifying glycoproteins as indicators or biomarkers of diseases. Therefore, effective methods for systematic analysis of protein glycosylation are essential to understand the mechanisms of glycobiology, identify drug targets and discover biomarkers.Approximately half of mammalian cell proteins are estimated to be glycosylated at any given time (18). 
There have been many reports regarding identification of protein glycosylation sites and elucidation of glycan structures (1930). Glycan structure analysis can lead to potential therapeutic and diagnostic applications (31, 32), but it is also critical to identify which proteins are glycosylated as well as the sites at which the modification occurs. Despite progress in recent years, the large-scale analysis of protein glycosylation sites using MS-based proteomics methods is still a challenge. Without an effective enrichment method, the low abundance of glycoproteins prohibits the identification of the majority of sites using the popular intensity-dependent MS sequence method.About a decade ago, a very beautiful and elegant method based on hydrazide chemistry was developed to enrich glycopeptides. Hydrazide conjugated beads reacted with aldehydes formed from the oxidation of cis-diols in glycans (33). This method has been extensively applied to many different types of biological samples (3441). Besides the hydrazide-based enrichment method, lectins have also been frequently used to enrich glycopeptides or glycoproteins before MS analysis (28, 29, 4246). However, there are many different types of lectins, and each is specific to certain glycans (47, 48). Therefore, no combination of lectins can bind to all glycosylated peptides or proteins, which prevents comprehensive analysis of protein glycosylation. Because of the complexity of biological samples, effective enrichment methods are critical for the comprehensive analysis of protein glycosylation before MS analysis.One common feature of all glycoproteins and glycopeptides is that they contain multiple hydroxyl groups in their glycans. From a chemistry point of view, this can be exploited to effectively enrich them. Ideally, chemical enrichment probes must have both strong and specific interactions with multiple hydroxyl groups. The reaction between boronic acids and 1,2- or 1,3-cis-diols in sugars has been extensively studied (4952) and applied for the small-scale analysis of glycoproteins (5355). Furthermore, boronate affinity chromatography has been employed for the analysis of nonenzymatically glycated peptides (56, 57). Boronic acid-based chemical enrichment methods are expected to have great potential for global analysis of glycopeptides when combined with modern MS-based proteomics techniques. However, the method has not yet been used for the comprehensive analysis of protein N-glycosylation in complex biological samples (58).Yeast is an excellent model biological system that has been extensively used in a wide range of experiments. Last year, two papers reported the large-scale analysis of protein N-glycosylation in yeast (59, 60). In one study, a new MS-based method was developed based on N-glycopeptide mass envelopes with a pattern via metabolic incorporation of a defined mixture of N-acetylglucosamine isotopologs into N-glycans. Peptides with the recoded envelopes were specifically targeted for fragmentation, facilitating high confidence site mapping (59). Using this method, 133 N-glycosylation sites were confidently identified in 58 yeast proteins. When combined with an effective enrichment method, this MS-based analysis will provide a more complete coverage of the N-glycoproteome. 
The other work combined lectin enrichment with digestion by two enzymes (Glu-C and trypsin) to increase the peptide coverage, and 516 well-localized N-glycosylation sites were identified in 214 yeast proteins by MS (60). Here we have comprehensively identified protein N-glycosylation sites in yeast by combining a boronic acid-based chemical enrichment method with MS-based proteomics techniques. Magnetic beads conjugated with boronic acid were systematically optimized to selectively enrich glycosylated peptides from yeast whole cell lysates. The enriched peptides were subsequently treated with Peptide-N4-(N-acetyl-beta-glucosaminyl)asparagine amidase (PNGase F)1 in heavy-oxygen water. Finally, peptides were analyzed by an on-line LC-MS system. Over 800 protein N-glycosylation sites were identified in the yeast proteome, which clearly demonstrates that the boronic acid-based chemical method is an effective enrichment method for large-scale analysis of protein glycosylation by MS.  相似文献
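For the downstream site-localization step, a small sketch of the expected mass signature may be helpful: PNGase F deamidates the formerly glycosylated asparagine (≈ +0.984 Da), and carrying out the reaction in heavy-oxygen water adds a further ≈ +2.004 Da per site. The peptide sequence and helper names below are illustrative only.

```python
import re

DEAMIDATION = 0.98402       # Asn -> Asp, monoisotopic mass shift (Da)
O18_MINUS_O16 = 2.00425     # extra shift when the new carboxyl oxygen is 18O

# N-glycosylation sequon: Asn, any residue except Pro, then Ser or Thr.
SEQUON = re.compile(r"N[^P][ST]")

def glycosite_candidates(peptide):
    """Return 0-based positions of Asn residues sitting in an N-X(!P)-S/T sequon."""
    return [m.start() for m in SEQUON.finditer(peptide)]

def expected_shift(n_sites, heavy_water=True):
    """Mass added to a deglycosylated peptide carrying n formerly glycosylated Asn."""
    per_site = DEAMIDATION + (O18_MINUS_O16 if heavy_water else 0.0)
    return n_sites * per_site

peptide = "LNFTGSNVSK"        # hypothetical tryptic peptide with two sequons
sites = glycosite_candidates(peptide)
print(sites)                                  # [1, 6]
print(round(expected_shift(len(sites)), 5))   # ~5.97654 Da for two labeled sites
```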

15.
16.
Hydroxyl radical footprinting based MS for protein structure assessment has the goal of understanding ligand induced conformational changes and macromolecular interactions, for example, protein tertiary and quaternary structure, but the structural resolution provided by typical peptide-level quantification is limiting. In this work, we present experimental strategies using tandem-MS fragmentation to increase the spatial resolution of the technique to the single residue level to provide a high precision tool for molecular biophysics research. Overall, in this study we demonstrated an eightfold increase in structural resolution compared with peptide level assessments. In addition, to provide a quantitative analysis of residue based solvent accessibility and protein topography as a basis for high-resolution structure prediction, we illustrate strategies of data transformation using the relative reactivity of side chains as a normalization strategy and predict side-chain surface area from the footprinting data. We tested the methods by examination of Ca2+-calmodulin, showing highly significant correlations between surface area and side-chain contact predictions for individual side chains and the crystal structure. Tandem ion based hydroxyl radical footprinting-MS provides quantitative high-resolution protein topology information in solution that can fill existing gaps in structure determination for large proteins and macromolecular complexes. Hydroxyl radical footprinting (HRF)1 is valuable for assessing the structure of macromolecules. Single nucleotide resolution data enabled by the similar reactivity of the OH radical with each and every backbone position has helped solve important problems in the nucleic acids field, such as understanding RNA folding and ribosome assembly (15). Applications of HRF to probe protein structure are a subset of a family of structural MS approaches, including the use of reversible deuterium labeling or irreversible covalent labeling, including labeling with OH radicals (6–13). Hydrogen-deuterium exchange MS (HDX-MS) is particularly suited to measure secondary and tertiary structure stability through backbone exchange, whereas HRF-MS has been effective at measuring the relative solvent accessibility of specific amino acid side chains mediated by intramolecular tertiary and intermolecular quaternary structure interactions. Hydroxyl radicals can be generated by a variety of methods; in each case the chemistry has been shown to be quite similar, and the radicals react with side chains of surface residues, resulting in well characterized oxidation products (7, 10, 11). As up to 18 side chains are potential probes, the overall protein coverage and resolution of the method are theoretically high. Both HDX-MS and HRF-MS utilize a “bottom-up” proteomics approach where proteins are digested to peptide states after labeling, and mass shifts of the resultant peptides are read out to pinpoint sites of conformational change. Although this usually provides 90% or more coverage across the entire protein length, the structural resolution is in fact limited by the size of the peptide fragments, and the data report the average behavior of the individual residues across the entire peptide, which is typically in the range of 5–20 residues (14).
MS2 based quantification is in principle a general solution to the problem of increasing structural resolution, and has been attempted for HDX-MS, but the scrambling of the labels in the gas phase has been difficult to overcome using collision induced dissociation (15, 16). Alternative approaches for HDX-MS site localization, like electron transfer dissociation to achieve single residue resolution have potential promise but are typically limited to larger peptides that can access higher charge states easily (17, 18). MS2 strategies to enhance the resolution for covalent labeling experiments have been attempted with some success, as scrambling is not a limitation in covalent labeling experiments (7, 1921). On the other hand, MS1 based strategies to enhance structural resolution for both HDX and covalent labeling approaches using overlapping protease fragments are also a promising route to providing subpeptide resolution in many cases (7, 2027).In this work, we present a coupled set of high-throughput experimental and computational approaches to extend previous MS2 based HRF-MS strategies and provide a quantitative topographical structure assessment for proteins at the individual side chain level. The combined approach permits quantification of modifications through examination of a tandem-ion based ladder of peptide fragments and combining the ion abundances from both MS1 and MS2 quantification. The high-resolution information is transformed using the knowledge of the relative reactivity of side chains to predict side-chain surface area for the structurally well-characterized Ca2+ bound form of Calmodulin (CaM). In addition, we explored a statistical approach using random forest regression methods to predict solvent accessible surface area at the residue level. Overall, these studies provide a novel approach to provide high-resolution single-residue surface accessibility data with at least eightfold higher spatial resolution than peptide based measures for accurate protein topography predictions.  相似文献   
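A minimal numerical sketch of the reactivity normalization idea follows. The intensities and relative reactivity values are invented placeholders (tabulated rate constants should be used in practice), and the functions are illustrative rather than the authors' pipeline.

```python
# Hypothetical relative reactivities of selected side chains toward the
# hydroxyl radical (arbitrary units; real tabulated rate constants differ).
RELATIVE_REACTIVITY = {"M": 100.0, "W": 80.0, "Y": 40.0, "F": 20.0, "L": 10.0, "K": 5.0}

def modification_fraction(oxidized, unoxidized):
    """Fraction of the residue-resolved ion signal carrying the +16 Da label."""
    total = oxidized + unoxidized
    return oxidized / total if total > 0 else 0.0

def normalized_accessibility(residue, oxidized, unoxidized):
    """Reactivity-normalized footprinting signal for one side chain.

    Dividing the observed modification extent by the side chain's intrinsic
    reactivity leaves a quantity that tracks solvent accessibility rather
    than the chemistry of the residue.
    """
    return modification_fraction(oxidized, unoxidized) / RELATIVE_REACTIVITY[residue]

# Toy MS1/MS2-derived intensities: a relatively buried Met versus an exposed Leu.
print(normalized_accessibility("M", oxidized=2.0e5, unoxidized=8.0e5))  # 0.002
print(normalized_accessibility("L", oxidized=1.5e5, unoxidized=3.5e5))  # 0.03
```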

17.
The past 15 years have seen significant progress in LC-MS/MS peptide sequencing, including the advent of successful de novo and database search methods; however, analysis of glycopeptide and, more generally, glycoconjugate spectra remains a much more open problem, and much annotation is still performed manually. This is partly because glycans, unlike peptides, need not be linear chains and are instead described by trees. In this study, we introduce SweetSEQer, an extremely simple open source tool for identifying potential glycopeptide MS/MS spectra. We evaluate SweetSEQer on manually curated glycoconjugate spectra and on negative controls, and we demonstrate high quality filtering that can be easily improved for specific applications. We also demonstrate a high overlap between peaks annotated by experts and peaks annotated by SweetSEQer, as well as demonstrate inferred glycan graphs consistent with canonical glycan tree motifs. This study presents a novel tool for annotating spectra and producing glycan graphs from LC-MS/MS spectra. The tool is evaluated and shown to perform similarly to an expert on manually curated data. Protein glycosylation is a common modification, affecting ∼50% of all expressed proteins (1). Glycosylation affects critical biological functions, including cell-cell recognition, circulating half-life, substrate binding, immunogenicity, and others (2). Regrettably, determining the exact role glycosylation plays in different biological contexts is slowed by a dearth of analytical methods and of appropriate software. Such software is crucial for performing, and for aiding experts in, the data analysis of complex glycosylation. Glycopeptides are highly heterogeneous in regard to glycan composition, glycan structure, and linkage stereochemistry, in addition to the tens of thousands of possible peptides. The analysis of protein glycosylation is often segmented into three distinct types of mass spectrometry experiments, which together help to resolve this complexity. The first analyzes enzymatically or chemically released glycans (which may or may not be chemically modified), and the second determines glycosylation sites after release of glycans from peptides. The third determines the glycosylation sites and the glycans on those sites simultaneously, by MS of intact glycopeptides. Frequently, researchers will perform all three types of analysis, with the first two types providing information about possible combinations of glycan structures and peptides that could be found in the third experiment. Using this MS1 information, the problem is reduced to matching observed masses with a combinatorial pool of all possible glycans and all possible glycosylated peptides within a sample; however, this combinatorial approach alone is insufficient (3), and tandem mass spectrometry can provide copious additional information to help resolve the glycopeptide content from complex samples. The similar problem of inferring peptide sequences from MS/MS spectra has received considerably more attention. Peptide inference is more constrained than glycan inference, because the chain of MS/MS peaks corresponds to a linear peptide sequence; given an MS/MS spectrum, the linear peptide sequence can be inferred through brute force or dynamic programming via de novo methods (46) as described in Ref. 7.
Additionally, the possible search space of peptides can be dramatically lowered by using database searching (821) as described in Ref. 7, which compares the MS/MS spectrum to the predicted spectra from only those peptides resulting from a protein database or translated open reading frames (ORFs) of a genomic database.The possible search space of glycans is larger than the search space of peptides because, in contrast to linear peptide chains, glycans may form branching trees. Identifying glycans using database search methodologies is impractical, as it is impractical to define the database when the detailed activities of the set of glycosyltransferases are not defined. Generating an overly large database would artificially inflate the set of incompletely characterized spectra, and too small of a search space would lead to inaccurate results. Furthermore, as glycosylation is not a template-driven process, no clear choice for a database matching approach is available, and de novo sequencing is therefore a more appropriate approach.As a result, few desirable software options are available for the high throughput analysis of tandem mass spectrometry data from intact glycopeptides (as noted in a recent review (22)). In fact, manual annotation of spectra is still commonplace, despite being slow and despite the potential for disagreement between different experts. Some available software requires user-defined lists of glycan and/or peptide masses as input, which is suboptimal from a sample consumption and throughput perspective (23, 24). These lists must typically be generated by parallel experiments or simply hypothesized a priori, meaning omissions in either list may affect the results. Furthermore, some software does not work on batched input files, meaning each spectrum must be analyzed separately (23, 2528). Moreover, there is an even greater lack of open source software for glycoproteomics, so modifying the existing software for the researchers individual applications is not easily achieved. The one open source tool that we know of (GlypID) is applicable only to the analysis of glycopeptide spectra acquired from a very specialized workflow, which requires MS1, CID, and higher-energy C-trap type dissociation (HCD) spectra (29). With that approach, oxonium ions from HCD spectra are necessary to predict the glycan class; potential peptide lists are queried by precursor m/z values (requiring accurate a priori knowledge of all modifications), and possible theoretical “N-linked” precursor m/z values are used to select candidate spectra (using templates, unlike de novo characterization). As a result, the tool is specialized and limited to analysis of “N-linked” glycopeptide spectra from very specific experimental setups.Free, open-source glycoproteomic software capable of batch analysis of general tandem mass spectrometry spectra of glycoconjugates is sorely needed. In this work, we present SweetSEQer, a tool for de novo analysis of tandem mass spectra of glycoconjugates (the most general class of spectra containing fragmentation involving sugars). Furthermore, because SweetSEQer is so general and simple, and because it does not require specific experimental setup, it is widely applicable to the analysis of general glycoconjugate spectra (e.g. it is already applicable to “O-linked” glycopeptide and glycoconjugate spectra). 
Moreover, because it is open source and does not rely on external software, it not only eschews solving problems like MS1 deisotoping, it can also be easily customized and even used to augment and complement existing tools like GlypID (and, because we do not use a “copyleft” software license, our algorithm and code can even be added to non-open source and proprietary variants). SweetSEQer's performance was tested on a validated, manually annotated set of glycoconjugate identifications from a urinary glycoproteomics study. Specificity was demonstrated by showing a low identification rate on negative control spectra from Escherichia coli. Annotated structures are shown to be consistent with those of a human expert by demonstrating a high overlap in identified glycan fragment ions, as well as a consistency between SweetSEQer's predicted glycan graph and glycan chains produced by an expert. Our simple object-oriented Python implementation is freely available (Apache 2.0 license) online.  相似文献
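The core graph-building step, connecting MS/MS peaks that differ by a monosaccharide residue mass, can be sketched in a few lines. This is only the skeleton of the idea, with an invented toy spectrum, and omits SweetSEQer's actual filtering and scoring.

```python
from itertools import combinations

MONOSACCHARIDE = {            # monoisotopic residue masses (Da)
    "Hex": 162.0528,
    "HexNAc": 203.0794,
    "dHex": 146.0579,         # e.g. fucose
    "NeuAc": 291.0954,
}

def glycan_peak_graph(peaks, tol=0.02):
    """Connect MS/MS peaks whose m/z difference equals a monosaccharide mass.

    peaks: list of (mz, intensity) tuples (singly charged fragments assumed).
    Returns a list of edges (mz_low, mz_high, sugar) forming the glycan graph.
    """
    edges = []
    for (mz1, _), (mz2, _) in combinations(sorted(peaks), 2):
        delta = mz2 - mz1
        for sugar, mass in MONOSACCHARIDE.items():
            if abs(delta - mass) <= tol:
                edges.append((mz1, mz2, sugar))
    return edges

# Toy spectrum: a glycopeptide Y-ion series extended by HexNAc and two Hex residues,
# plus one unrelated noise peak.
spectrum = [(1200.50, 3e4), (1403.58, 2.5e4), (1565.63, 2e4), (1727.68, 1e4),
            (987.40, 5e3)]
for edge in glycan_peak_graph(spectrum):
    print(edge)
```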

18.
Liquid chromatography-tandem mass spectrometry (LC-MS/MS)-based proteomics provides a wealth of information about proteins present in biological samples. In bottom-up LC-MS/MS-based proteomics, proteins are enzymatically digested into peptides prior to query by LC-MS/MS. Thus, the information directly available from the LC-MS/MS data is at the peptide level. If a protein-level analysis is desired, the peptide-level information must be rolled up into protein-level information. We propose a principal component analysis-based statistical method, ProPCA, for efficiently estimating relative protein abundance from bottom-up label-free LC-MS/MS data that incorporates both spectral count information and LC-MS peptide ion peak attributes, such as peak area, volume, or height. ProPCA may be used effectively with a variety of quantification platforms and is easily implemented. We show that ProPCA outperformed existing quantitative methods for peptide-protein roll-up, including spectral counting methods and other methods for combining LC-MS peptide peak attributes. The performance of ProPCA was validated using a data set derived from the LC-MS/MS analysis of a mixture of protein standards (the UPS2 proteomic dynamic range standard introduced by The Association of Biomolecular Resource Facilities Proteomics Standards Research Group in 2006). Finally, we applied ProPCA to a comparative LC-MS/MS analysis of digested total cell lysates prepared for LC-MS/MS analysis by alternative lysis methods and show that ProPCA identified more differentially abundant proteins than competing methods.One of the fundamental goals of proteomics methods for the biological sciences is to identify and quantify all proteins present in a sample. LC-MS/MS-based proteomics methodologies offer a promising approach to this problem (13). These methodologies allow for the acquisition of a vast amount of information about the proteins present in a sample. However, extracting reliable protein abundance information from LC-MS/MS data remains challenging. In this work, we were primarily concerned with the analysis of data acquired using bottom-up label-free LC-MS/MS-based proteomics techniques where “bottom-up” refers to the fact that proteins are enzymatically digested into peptides prior to query by the LC-MS/MS instrument platform (4), and “label-free” indicates that analyses are performed without the aid of stable isotope labels. One challenge inherent in the bottom-up approach to proteomics is that information directly available from the LC-MS/MS data is at the peptide level. When a protein-level analysis is desired, as is often the case with discovery-driven LC-MS research, peptide-level information must be rolled up into protein-level information.Spectral counting (510) is a straightforward and widely used example of peptide-protein roll-up for LC-MS/MS data. Information experimentally acquired in single stage (MS) and tandem (MS/MS) spectra may lead to the assignment of MS/MS spectra to peptide sequences in a database-driven or database-free manner using various peptide identification software platforms (SEQUEST (11) and Mascot (12), for instance); the identified peptide sequences correspond, in turn, to proteins. In principle, the number of tandem spectra matched to peptides corresponding to a certain protein, the spectral count (SC),1 is positively associated with the abundance of a protein (5). In spectral counting techniques, raw or normalized SCs are used as a surrogate for protein abundance. 
Spectral counting methods have been moderately successful in quantifying protein abundance and identifying significant proteins in various settings. However, SC-based methods do not make full use of information available from peaks in the LC-MS domain, and this leads to a loss of efficiency. Peaks in the LC-MS domain corresponding to peptide ion species are highly sensitive to differences in protein abundance (13, 14). Identifying LC-MS peaks that correspond to detected peptides and measuring quantitative attributes of these peaks (such as height, area, or volume) offers a promising alternative to spectral counting methods. These methods have become especially popular in applications using stable isotope labeling (15). However, challenges remain, especially in the label-free analysis of complex proteomics samples where complications in peak detection, alignment, and integration are a significant obstacle. In practice, alignment, identification, and quantification of LC-MS peptide peak attributes (PPAs) may be accomplished using recently developed peak matching platforms (16–18). A highly sensitive indicator of protein abundance may be obtained by rolling up PPA measurements into protein-level information (16, 19, 20). Existing peptide-protein roll-up procedures based on PPAs typically involve taking the mean of (possibly normalized) PPA measurements over all peptides corresponding to a protein to obtain a protein-level estimate of abundance. Despite the promise of PPA-based procedures for protein quantification, the performance of PPA-based methods may vary widely depending on the particular roll-up procedure used; furthermore, PPA-based procedures are limited by difficulties in accurately identifying and measuring peptide peak attributes. These two issues are related, as the latter issue affects the robustness of PPA-based roll-up methods. Indeed, existing peak matching and quantification platforms tend to result in PPA measurement data sets with substantial missingness (16, 19, 21), especially when working with very complex samples where substantial dynamic ranges and ion suppression are difficulties that must be overcome. Missingness may, in turn, lead to instability in protein-level abundance estimates. A good peptide-protein roll-up procedure that utilizes PPAs should account for this missingness and the resulting instability in a principled way. However, even in the absence of missingness, there is no consensus in the existing literature on peptide-protein roll-up for PPA measurements. In this work, we propose ProPCA, a peptide-protein roll-up method for efficiently extracting protein abundance information from bottom-up label-free LC-MS/MS data. ProPCA is an easily implemented, unsupervised method that is related to principal component analysis (PCA) (22). ProPCA optimally combines SC and PPA data to obtain estimates of relative protein abundance. ProPCA addresses missingness in PPA measurement data in a unified way while capitalizing on strengths of both SCs and PPA-based roll-up methods. In particular, ProPCA adapts to the quality of the available PPA measurement data. If the PPA measurement data are poor and, in the extreme case, no PPA measurements are available, then ProPCA is equivalent to spectral counting. 
On the other hand, if there is no missingness in the PPA measurement data set, then the ProPCA estimate is a weighted mean of PPA measurements and spectral counts where the weights are chosen to reflect the ability of spectral counts and each peptide to predict protein abundance. Below, we assess the performance of ProPCA using a data set obtained from the LC-MS/MS analysis of protein standards (UPS2 proteomic dynamic range standard set manufactured by Sigma-Aldrich) and show that ProPCA outperformed other existing roll-up methods by multiple metrics. The applicability of ProPCA is not limited by the quantification platform used to obtain SCs and PPA measurements. To demonstrate this, we show that ProPCA continued to perform well when used with an alternative quantification platform. Finally, we applied ProPCA to a comparative LC-MS/MS analysis of digested total human hepatocellular carcinoma (HepG2) cell lysates prepared for LC-MS/MS analysis by alternative lysis methods. We show that ProPCA identified more differentially abundant proteins than competing methods.
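As a rough illustration of a PCA-style peptide-to-protein roll-up, the sketch below assembles spectral counts and peptide peak areas for one protein into a samples-by-features matrix, imputes missing peak areas with column means, and uses the first principal component score as a relative abundance estimate. This is a minimal reading of the general approach, not the published ProPCA algorithm; the imputation strategy and example values are assumptions.

import numpy as np

def pca_rollup(spectral_counts, peptide_peak_areas):
    """spectral_counts: (n_samples,) spectral counts for one protein.
    peptide_peak_areas: (n_samples, n_peptides) peak areas, NaN where not measured.
    Returns one relative-abundance score per sample (first principal component)."""
    X = np.column_stack([spectral_counts, peptide_peak_areas]).astype(float)
    col_means = np.nanmean(X, axis=0)                     # simplest possible missing-value handling
    X = np.where(np.isnan(X), col_means, X)
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)    # center and scale features
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    w = vt[0]
    w = w if w.sum() >= 0 else -w                          # orient so larger score ~ higher abundance
    return X @ w

# Hypothetical protein observed in 4 samples with 3 quantified peptides
sc = np.array([12, 8, 20, 15])
ppa = np.array([[1.0e6, 2.1e6, np.nan],
                [0.7e6, 1.5e6, 0.9e6],
                [2.2e6, np.nan, 1.8e6],
                [1.5e6, 2.9e6, 1.2e6]])
print(pca_rollup(sc, ppa))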

19.
The use of ultraviolet photodissociation (UVPD) for the activation and dissociation of peptide anions is evaluated for broader coverage of the proteome. To facilitate interpretation and assignment of the resulting UVPD mass spectra of peptide anions, the MassMatrix database search algorithm was modified to allow automated analysis of negative polarity MS/MS spectra. The new UVPD algorithms were developed based on the MassMatrix database search engine by adding specific fragmentation pathways for UVPD. The new UVPD fragmentation pathways in MassMatrix were rigorously and statistically optimized using two large data sets with high mass accuracy and high mass resolution for both MS1 and MS2 data acquired on an Orbitrap mass spectrometer for complex Halobacterium and HeLa proteome samples. Negative mode UVPD led to the identification of 3663 and 2350 peptides for the Halo and HeLa tryptic digests, respectively, corresponding to 655 and 645 peptides that were unique when compared with electron transfer dissociation (ETD), higher-energy collision-induced dissociation, and collision-induced dissociation results for the same digests analyzed in the positive mode. In sum, 805 and 619 proteins were identified via UVPD for the Halobacterium and HeLa samples, respectively, with 49 and 50 unique proteins identified in contrast to the more conventional MS/MS methods. The algorithm also features automated charge determination for low mass accuracy data, precursor filtering (including intact charge-reduced peaks), and the ability to combine both positive and negative MS/MS spectra into a single search, and it is freely open to the public. The accuracy and specificity of the MassMatrix UVPD search algorithm were also assessed for low resolution, low mass accuracy data on a linear ion trap. Analysis of a known mixture of three mitogen-activated kinases yielded similar sequence coverage percentages for UVPD of peptide anions versus conventional collision-induced dissociation of peptide cations, and when these methods were combined into a single search, an increase of up to 13% sequence coverage was observed for the kinases. The ability to sequence peptide anions and cations in alternating scans in the same chromatographic run was also demonstrated. Because ETD has a significant bias toward identifying highly basic peptides, negative UVPD was used to improve the identification of the more acidic peptides in conjunction with positive ETD for the more basic species. In this case, tryptic peptides from the cytosolic fraction of HeLa cells were analyzed by polarity switching nanoLC-MS/MS utilizing ETD for cation sequencing and UVPD for anion sequencing. Relative to searching using ETD alone, positive/negative polarity switching significantly improved sequence coverages across identified proteins, resulting in a 33% increase in unique peptide identifications and more than twice the number of peptide spectral matches. The advent of new high-performance tandem mass spectrometers equipped with the most versatile collision- and electron-based activation methods and ever more powerful database search algorithms has catalyzed tremendous progress in the field of proteomics (1–4). Despite these advances in instrumentation and methodologies, there are few methods that fully exploit the information available from the acidic proteome or acidic regions of proteins. 
Typical high-throughput, bottom-up workflows consist of the chromatographic separation of complex mixtures of digested proteins followed by online mass spectrometry (MS) and MSn analysis. This bottom-up approach remains the most popular strategy for protein identification, biomarker discovery, quantitative proteomics, and elucidation of post-translational modifications. To date, proteome characterization via mass spectrometry has overwhelmingly focused on the analysis of peptide cations (5), resulting in an inherent bias toward basic peptides that easily ionize under acidic mobile phase conditions and positive polarity MS settings. Given that ∼50% of peptides/proteins are naturally acidic (6) and that many of the most important post-translational modifications (e.g. phosphorylation, acetylation, sulfonation, etc.) significantly decrease the isoelectric points of peptides (7, 8), there is a compelling need for better analytical methodologies for characterization of the acidic proteome. A principal reason for the shortage of methods for peptide anion characterization is the lack of MS/MS techniques suitable for the efficient and predictable dissociation of peptide anions. Although there is a growing array of new ion activation methods for the dissociation of peptides, most have been developed for the analysis of positively charged peptides. Collision-induced dissociation (CID) of peptide anions, for example, often yields unpredictable or uninformative fragmentation behavior, with spectra dominated by neutral losses from both precursor and product ions (9), resulting in insufficient peptide sequence information. The two most promising new electron-based methods, electron-capture dissociation and electron-transfer dissociation (ETD), are applicable only to positively charged ions, not to anions (10–13). Because of the known inadequacy of CID and the lack of feasibility of electron-capture dissociation and ETD for peptide anion sequencing, several alternative MSn methods have been developed recently. Electron detachment dissociation using high-energy electrons to induce backbone cleavages was developed for peptide anions (14, 15). Another new technique, negative ETD, entails reactions of radical cation reagents with peptide anions to promote electron transfer from the peptide to the reagent that causes radical-directed dissociation (16, 17). Activated-electron photodetachment dissociation, an MS3 technique, uses UV irradiation to produce intact peptide radical anions, which are then collisionally activated (18, 19). Although they represent inroads in the characterization of peptide anions, these methods also suffer from several significant shortcomings. Electron detachment dissociation and activated-electron photodetachment dissociation are both low-efficiency methods that require long averaging cycles and activation times that range from half a second to multiple seconds, impeding the integration of these methods with chromatographic timescales (14–19). In addition, the fragmentation patterns frequently yield many high-abundance neutral losses from product ions, which clutter the spectra (14–17), and few sequence ions (14, 18, 19). Recently, we reported the use of 193-nm photons (ultraviolet photodissociation (UVPD)) for peptide anion activation, which was shown to yield rich and predictable fragmentation patterns with high sequence coverage on a fast liquid chromatographic timeline (20). This method showed promise for a range of peptide charge states (i.e. 
from 3- to 1-), as well as for both unmodified and phosphorylated species. Several widely used or commercial database searching techniques are available for automated “bottom-up” analysis of peptide cations; SEQUEST (21), MASCOT (22), OMSSA (23), X! Tandem (24), and MASPIC (25) are all popular choices and yield comparable results (26). MassMatrix (27), a recently introduced searching algorithm, uses a mass accuracy sensitive probability-based scoring scheme for both the total number of matched product ions and the total abundance of matched products. This searching method also utilizes LC retention times to filter false positive peptide matches (28) and has been shown to yield results comparable to or better than those obtained with SEQUEST, MASCOT, OMSSA, and X! Tandem (29). Despite the ongoing innovation in automated peptide cation analysis, there is a lack of publicly available methods for automated peptide anion analysis. In this work, we have modified the mass accuracy sensitive probabilistic MassMatrix algorithms to allow database searching of negative polarity MS/MS spectra. The algorithm is specific to the fragmentation behavior generated from 193-nm UVPD of peptide anions. The UVPD pathways in MassMatrix were rigorously and statistically optimized using two large data sets with high mass accuracy and high mass resolution for both MS1 and MS2 data acquired on an Orbitrap mass spectrometer for complex HeLa and Halo proteome samples. For low mass accuracy/low mass resolution data, we also incorporated a charge-state-filtering algorithm that identifies the charge state of each MS/MS spectrum based on the fragmentation patterns prior to searching. MassMatrix can not only analyze positive and negative polarity LC-MS/MS files separately but also combine files from different polarities and different dissociation methods into a single search, thus maximizing the information content for a given proteomics experiment. The explicit incorporation of mass accuracy in the scores for the UVPD MS/MS spectra of peptide anions increases peptide assignments and identifications. Finally, we showcase the utility of integrating MassMatrix searching with positive/negative polarity MS/MS switching (i.e. data-dependent positive ETD and negative UVPD during a single proteomic LC-MS/MS run). MassMatrix is available to the public as a free search engine online.
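The abstract emphasizes mass-accuracy-sensitive scoring. The snippet below is a generic illustration of weighting fragment matches by their ppm error, not the MassMatrix scoring function; the fragment values, tolerance, and Gaussian weighting are assumptions made for the example.

import math

def match_fragments(theoretical_mz, observed_peaks, ppm_tol=10.0):
    """theoretical_mz: predicted fragment m/z values (singly charged assumed).
    observed_peaks: list of (mz, intensity).
    Returns (n_matched, score); smaller ppm errors contribute more to the score."""
    n_matched, score = 0, 0.0
    for t in theoretical_mz:
        best = None
        for mz, _ in observed_peaks:
            err_ppm = abs(mz - t) / t * 1e6
            if err_ppm <= ppm_tol and (best is None or err_ppm < best):
                best = err_ppm
        if best is not None:
            n_matched += 1
            # Gaussian-style weight: near-zero error counts ~1, error near the tolerance counts less
            score += math.exp(-0.5 * (best / (ppm_tol / 2.0)) ** 2)
    return n_matched, score

# Hypothetical spectrum: only the first theoretical fragment has a peak within 10 ppm
print(match_fragments([456.1234, 789.4321],
                      [(456.1240, 3e4), (600.2000, 1e4), (789.4500, 2e4)]))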

20.
Significant progress in instrumentation and sample preparation approaches has recently expanded the potential of MALDI imaging mass spectrometry to the analysis of phospholipids and other endogenous metabolites naturally occurring in tissue specimens. Here we explore some of the requirements necessary for the successful analysis and imaging of phospholipids from thin tissue sections of various dimensions by MALDI time-of-flight mass spectrometry. We address methodology issues relative to the imaging of whole-body sections such as those cut from model laboratory animals, sections of intermediate dimensions typically prepared from individual organs, as well as the requirements for imaging areas of interest from these sections at a cellular scale spatial resolution. We also review existing limitations of MALDI imaging MS technology relative to compound identification. Finally, we conclude with a perspective on important issues relative to data exploitation and management that need to be solved to maximize biological understanding of the tissue specimen investigated. Since its introduction in the late 1990s (1), MALDI imaging mass spectrometry (MS) technology has witnessed a phenomenal expansion. Initially introduced for the mapping of intact proteins from fresh frozen tissue sections (2), imaging MS is now routinely applied to a wide range of different compounds including peptides, proteins, lipids, metabolites, and xenobiotics (3–7). Numerous compound-specific sample preparation protocols and analytical strategies have been developed. These include tissue sectioning and handling (8–14), automated matrix deposition approaches and data acquisition strategies (15–21), and the emergence of in situ tissue chemistries (22–25). Originally performed on sections cut from fresh frozen tissue specimens, methodologies incorporating an in situ enzymatic digestion step prior to matrix application have been optimized to access the proteome locked in formalin-fixed paraffin-embedded tissue biopsies (25–29). The possibility to use tissues preserved using non-cross-linking approaches has also been demonstrated (30–32). These methodologies are of high importance for the study of numerous diseases because they potentially allow the retrospective analysis for biomarker validation and discovery of the millions of tissue biopsies currently stored worldwide in tissue banks and repositories. In the past decade, instrumentation for imaging MS has also greatly evolved. Whereas the first MS images were collected with time-of-flight instruments (TOF) capable of repetition rates of a few hertz, modern systems are today capable of acquiring data in the kilohertz range and above with improved sensitivity, mass resolving power, and accuracy, significantly reducing acquisition time and improving image quality (33, 34). Beyond time-of-flight analyzers, other MALDI-based instruments have been used such as ion traps (35–37), QqTOF instruments (38–40), and trap-TOF (16, 41). Ion mobility technology has also been used in conjunction with imaging MS (42–44). More recently, MALDI FT/ICR and Orbitrap mass spectrometers have been demonstrated to be extremely valuable instruments for the performance of imaging MS at very high mass resolving power (45–47). These non-TOF-based systems have proven to be extremely powerful for the imaging of lower molecular weight compounds such as lipids, drugs, and metabolites. 
Home-built instrumentation and analytical approaches to probe tissues at higher spatial resolution (1–10 μm) have also been described (48–50). In parallel to instrumentation developments, automated data acquisition, image visualization, and processing software packages have now also been developed by most manufacturers. To date, a wide range of biological systems have been studied using imaging MS as a primary methodology. Of strong interest are the organization and identification of the molecular composition of diseased tissues in direct correlation with the underlying histology and how it differs from healthy tissues. Such an approach has been used for the study of cancers (51–54), neurologic disorders (55–57), and other diseases (58, 59). The clinical potential of the imaging MS technology is enormous (7, 60, 61). Results give insights into the onset and progression of diseases, identify novel sets of disease-specific markers, and can provide a molecular confirmation of diagnosis as well as aid in outcome prediction (62–64). Imaging MS has also been extensively used to study the development, functioning, and aging of different organs such as the kidney, prostate, epididymis, and eye lens (65–70). Beyond the study of isolated tissues or organs, whole-body sections from several model animals such as leeches, mice, and rats have been investigated (71–74). For these analyses, specialized instrumentation and protocols are necessary for tissue sectioning and handling (72, 73). Whole-body imaging MS opens the door to the study of the localization and accumulation of administered pharmaceuticals and their known metabolites at the level of entire organisms as well as the monitoring of their efficacy or toxicity as a function of time or dose (72, 73, 75, 76). There is considerable interest in determining the identification and localization of small biomolecules such as lipids in tissues because they are involved in many essential biological functions including cell signaling, energy storage, and membrane structure and function. Defects in lipid metabolism play a role in many diseases such as muscular dystrophy and cardiovascular disease. Phospholipids in tissues have been intensively studied by several groups (37, 40, 77–83). In this respect, for optimal recovery of signal, several variables such as the choice of matrix for both imaging and fragmentation, solvent system, and instrument polarity have been investigated (20, 84). Particularly, the use of lithium cation adducts to facilitate phospholipid identification by tandem MS directly from tissue has also been reported (85). Of significant interest is the recent emergence of two new solvent-free matrix deposition approaches that perform exceptionally well for phospholipid imaging analyses. The first approach, described by Hankin et al. (86), consists of depositing the matrix on the sections through a sublimation process. The described sublimation system consists of sublimation glassware, a heated sand or oil bath (100–200 °C), and a primary vacuum pump (∼5 × 10⁻² torr). Within a few minutes of initiating the sublimation process, an exceptionally homogeneous film of matrix forms on the section. The thickness of the matrix may be controlled by regulating pressure, temperature, and sublimation time. The second approach, described by Puolitaival et al. (87), uses a fine mesh sieve (≤20 μm) to filter finely ground matrix onto the tissue sections. 
Agitation of the sieve results in passage of the matrix through the mesh and the deposition of a fairly homogeneous layer of submicrometer matrix crystals on the surface of the sections. The matrix density on the sections is controlled by direct observation using a standard light microscope. This matrix deposition approach was also found to be ideal for imaging certain drug compounds (88, 89). Both strategies allow very rapid production of homogeneous matrix coatings on tissue sections with a fairly inexpensive setup. Signal recovery was found to be comparable with that obtained by conventional spray deposition. With an appropriately sized sublimation device or sieve, larger sections with dimensions of several centimeters, such as those cut from mouse or rat whole bodies, can also be rapidly and homogeneously coated. Here we present several examples of MALDI imaging MS of phospholipids from tissue sections using TOF mass spectrometers over a wide range of dimensions from whole-body sections (several centimeters), to individual organs (several millimeters), down to high spatial resolution imaging of selected tissue areas (hundreds of micrometers) at 10-μm lateral resolution and below. For all of these dimension ranges, technological considerations and practical aspects are discussed. In light of the imaging MS results, we also address issues faced in compound identification by tandem MS analysis performed directly on the sections. Finally, we discuss under “Perspective” our vision of the future of the field as well as the technological improvements and analytical tools that still need to be developed.
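The abstract notes that laser repetition rate and lateral resolution largely determine how long an imaging run takes. The arithmetic below is a back-of-the-envelope illustration with hypothetical numbers, not figures from the paper.

def imaging_run(width_mm, height_mm, pixel_um, shots_per_pixel, laser_hz):
    """Estimate pixel count and acquisition time for a rectangular imaged area."""
    pixels_x = int(width_mm * 1000 / pixel_um)
    pixels_y = int(height_mm * 1000 / pixel_um)
    n_pixels = pixels_x * pixels_y
    hours = n_pixels * shots_per_pixel / laser_hz / 3600.0
    return n_pixels, hours

# Hypothetical 10 mm x 10 mm organ section at 10 um pitch, 100 shots per pixel, 1 kHz laser
print(imaging_run(10, 10, 10, 100, 1000))   # -> (1000000, ~27.8 hours)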
