Similar literature
1.
Cross-linking/mass spectrometry resolves protein–protein interactions or protein folds by help of distance constraints. Cross-linkers with specific properties such as isotope-labeled or collision-induced dissociation (CID)-cleavable cross-linkers are in frequent use to simplify the identification of cross-linked peptides. Here, we analyzed the mass spectrometric behavior of 910 unique cross-linked peptides in high-resolution MS1 and MS2 from published data and validate the observation by a ninefold larger set from currently unpublished data to explore if detailed understanding of their fragmentation behavior would allow computational delivery of information that otherwise would be obtained via isotope labels or CID cleavage of cross-linkers. Isotope-labeled cross-linkers reveal cross-linked and linear fragments in fragmentation spectra. We show that fragment mass and charge alone provide this information, alleviating the need for isotope-labeling for this purpose. Isotope-labeled cross-linkers also indicate cross-linker-containing, albeit not specifically cross-linked, peptides in MS1. We observed that acquisition can be guided to better than twofold enrich cross-linked peptides with minimal losses based on peptide mass and charge alone. By help of CID-cleavable cross-linkers, individual spectra with only linear fragments can be recorded for each peptide in a cross-link. We show that cross-linked fragments of ordinary cross-linked peptides can be linearized computationally and that a simplified subspectrum can be extracted that is enriched in information on one of the two linked peptides. This allows identifying candidates for this peptide in a simplified database search as we propose in a search strategy here. 
We conclude that the specific behavior of cross-linked peptides in mass spectrometers can be exploited to relax the requirements on cross-linkers.

Cross-linking/mass spectrometry extends the use of mass-spectrometry-based proteomics from identification (1, 2), quantification (3), and characterization of protein complexes (4) into resolving protein structures and protein–protein interactions (5–8). Chemical reagents (cross-linkers) covalently connect amino acid pairs that are within a cross-linker-specific distance range in the native three-dimensional structure of a protein or protein complex. A cross-linking/mass spectrometry experiment is typically conducted in four steps: (1) cross-linking of the target protein or complex, (2) protein digestion (usually with trypsin), (3) LC-MS analysis, and (4) database search. The digested peptide mixture consists of linear and cross-linked peptides, and the latter can be enriched by strong cation exchange (9) or size exclusion chromatography (10). Cross-linked peptides are of high value as they provide direct information on the structure and interactions of proteins.

Cross-linked peptides fragment under collision-induced dissociation (CID) conditions primarily into b- and y-ions, as do their linear counterparts. An important difference regarding database searches between linear and cross-linked peptides stems from not knowing which peptides might be cross-linked. Therefore, one has to consider each single peptide and all pairwise combinations of peptides in the database. Having n peptides leads to (n² + n)/2 possible pairwise combinations. This leads to two major challenges: with increasing size of the database, both search time and the risk of identifying false positives increase. 
One way of circumventing these problems is to use MS2-cleavable cross-linkers (11, 12), at the cost of limited experimental design and choice of cross-linker.

In a first database search approach (13), all pairwise combinations of peptides in a database were considered in a concatenated and linearized form. Thereby, all possible single bond fragments are considered in one of the two database entries per peptide pair, and the cross-link can be identified by a normal protein identification algorithm. Already, the second search approach split the peptides for the purpose of their identification (14). Linear fragments were used to retrieve candidate peptides from the database that are then matched based on the known mass of the cross-linked pair and scored as a pair against the spectrum. Isotope-labeled cross-linkers were used to sort the linear and cross-linked fragments apart. Many other search tools and approaches have been developed since (10, 15–19); see (20) for a more detailed list, at least some of which follow the general idea of an open modification search (21–24).

As a general concept for open modification search of cross-linked peptides, cross-linked peptides represent two peptides, each with an unknown modification given by the mass of the other peptide and the cross-linker. One identifies both peptides individually and then matches them based on knowing the mass of the cross-linked pair (14, 22, 24). Alternatively, one peptide is identified first and, using that peptide and the cross-linker as a modification mass, the second peptide is identified from the database (21, 23). An important element of the open modification search approach is that it essentially converts the quadratic search space of the cross-linked peptides into a linear search space of modified peptides. 
Still, many peptides and many modification positions have to be considered, especially when working with large databases or when using highly reactive cross-linkers with limited amino acid selectivity (25).

We hypothesize that detailed knowledge of the fragmentation behavior of cross-linked peptides might reveal ways to improve the identification of cross-linked peptides. Detailed analyses of the fragmentation behavior of linear peptides exist (26–28), and the analysis of the fragmentation behavior of cross-linked peptides has guided the design of scores (24, 29). Further, cross-link-specific ions have been observed from higher energy collision dissociation (HCD) data (30). Isotope-labeled cross-linkers are used to distinguish cross-linked from linear fragments, generally in low-resolution MS2 of cross-linked peptides (14).

We compared the mass spectrometric behavior of cross-linked peptides to that of linear peptides, using 910 high-resolution fragment spectra matched to unique cross-linked peptides from multiple different public datasets at 5% peptide-spectrum match (PSM) false discovery rate (FDR). In addition, we repeated all experiments with a larger sample set that contains 8,301 spectra, also including data from ongoing studies from our lab (Supplemental material S9-S12). This paper presents the mass spectrometric signature of cross-linked peptides that we identified in our analysis and the resulting heuristics that are incorporated into an integrated strategy for the analysis and identification of cross-linked peptides. We present computational strategies that indicate the possibility of alleviating the need for mass-spectrometrically restricted cross-linker choice.
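
The search-space argument in this entry can be made concrete with a small sketch. It is not the authors' search strategy: it only illustrates the (n² + n)/2 count and the idea of matching two candidate peptides through the known mass of the cross-linked pair. The function names, the tolerance, and all masses in the usage note are hypothetical values chosen for illustration.

```python
from bisect import bisect_left, bisect_right


def pair_search_space(n):
    """Number of pairwise combinations for n peptides: (n^2 + n) / 2."""
    return (n * n + n) // 2


def match_pairs(peptide_masses, precursor_mass, linker_mass, tol=0.01):
    """Find peptide pairs whose summed mass explains a cross-linked precursor.

    Returns index pairs (i, j) with i <= j such that
    mass[i] + mass[j] + linker_mass lies within `tol` Da of precursor_mass.
    Each peptide needs only a lookup of its complement mass in a sorted
    list, so the full pair matrix is never enumerated.
    """
    # Sort indices by mass so complements can be found by binary search.
    order = sorted(range(len(peptide_masses)), key=lambda i: peptide_masses[i])
    sorted_masses = [peptide_masses[i] for i in order]
    target = precursor_mass - linker_mass  # summed mass of the two peptides

    hits = []
    for a, mass in enumerate(sorted_masses):
        lo = bisect_left(sorted_masses, target - mass - tol)
        hi = bisect_right(sorted_masses, target - mass + tol)
        # Start at `a` so each unordered pair is reported once.
        for b in range(max(lo, a), hi):
            i, j = order[a], order[b]
            hits.append((min(i, j), max(i, j)))
    return sorted(hits)
```

As a usage illustration with made-up numbers, `match_pairs([1000.5, 1500.25, 800.4, 1200.0], 2338.568, 138.068)` pairs the first and fourth peptides, while `pair_search_space(1000)` shows that a database of 1,000 peptides already yields 500,500 combinations.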

2.
3.
Database search programs are essential tools for identifying peptides via mass spectrometry (MS) in shotgun proteomics. Simultaneously achieving high sensitivity and high specificity during a database search is crucial for improving proteome coverage. Here we present JUMP, a new hybrid database search program that generates amino acid tags and ranks peptide spectrum matches (PSMs) by an integrated score from the tags and pattern matching. In a typical run of liquid chromatography coupled with high-resolution tandem MS, more than 95% of MS/MS spectra can generate at least one tag, whereas the remaining spectra are usually too poor to derive genuine PSMs. To enhance search sensitivity, the JUMP program enables the use of tags as short as one amino acid. Using a target-decoy strategy, we compared JUMP with other programs (e.g. SEQUEST, Mascot, PEAKS DB, and InsPecT) in the analysis of multiple datasets and found that JUMP outperformed these preexisting programs. JUMP also permitted the analysis of multiple co-fragmented peptides from “mixture spectra” to further increase PSMs. In addition, JUMP-derived tags allowed partial de novo sequencing and facilitated the unambiguous assignment of modified residues. In summary, JUMP is an effective database search algorithm complementary to current search programs.

Peptide identification by tandem mass spectra is a critical step in mass spectrometry (MS)-based proteomics (1). Numerous computational algorithms and software tools have been developed for this purpose (2–6). These algorithms can be classified into three categories: (i) pattern-based database search, (ii) de novo sequencing, and (iii) hybrid search that combines database search and de novo sequencing. With the continuous development of high-performance liquid chromatography and high-resolution mass spectrometers, it is now possible to analyze almost all protein components in mammalian cells (7). 
In contrast to rapid data collection, it remains a challenge to extract accurate information from the raw data to identify peptides with low false positive rates (specificity) and minimal false negatives (sensitivity) (8).

Database search methods usually assign peptide sequences by comparing MS/MS spectra to theoretical peptide spectra predicted from a protein database, as exemplified in SEQUEST (9), Mascot (10), OMSSA (11), X!Tandem (12), Spectrum Mill (13), ProteinProspector (14), MyriMatch (15), Crux (16), MS-GFDB (17), Andromeda (18), BaMS2 (19), and Morpheus (20). Some other programs, such as SpectraST (21) and Pepitome (22), utilize a spectral library composed of experimentally identified and validated MS/MS spectra. These methods use a variety of scoring algorithms to rank potential peptide spectrum matches (PSMs) and select the top hit as a putative PSM. However, not all PSMs are correctly assigned. For example, false peptides may be assigned to MS/MS spectra with numerous noisy peaks and poor fragmentation patterns. If the samples contain unknown protein modifications, mutations, and contaminants, the related MS/MS spectra also result in false positives, as their corresponding peptides are not in the database. Other false positives may be generated simply by random matches. Therefore, it is of importance to remove these false PSMs to improve dataset quality. One common approach is to filter putative PSMs to achieve a final list with a predefined false discovery rate (FDR) via a target-decoy strategy, in which decoy proteins are merged with target proteins in the same database for estimating false PSMs (23–26). However, the true and false PSMs are not always distinguishable based on matching scores. 
It is a problem to set up an appropriate score threshold to achieve maximal sensitivity and high specificity (13, 27, 28).

De novo methods, including Lutefisk (29), PEAKS (30), NovoHMM (31), PepNovo (32), pNovo (33), Vonovo (34), and UniNovo (35), identify peptide sequences directly from MS/MS spectra. These methods can be used to derive novel peptides and post-translational modifications without a database, which is useful, especially when the related genome is not sequenced. High-resolution MS/MS spectra greatly facilitate the generation of peptide sequences in these de novo methods. However, because MS/MS fragmentation cannot always produce all predicted product ions, only a portion of collected MS/MS spectra have sufficient quality to extract partial or full peptide sequences, leading to lower sensitivity than achieved with the database search methods.

To improve the sensitivity of the de novo methods, a hybrid approach has been proposed to integrate peptide sequence tags into PSM scoring during database searches (36). Numerous software packages have been developed, such as GutenTag (37), InsPecT (38), Byonic (39), DirecTag (40), and PEAKS DB (41). These methods use peptide tag sequences to filter a protein database, followed by error-tolerant database searching. One restriction in most of these algorithms is the requirement of a minimum tag length of three amino acids for matching protein sequences in the database. This restriction reduces the sensitivity of the database search, because it filters out some high-quality spectra in which consecutive tags cannot be generated.

In this paper, we describe JUMP, a novel tag-based hybrid algorithm for peptide identification. The program is optimized to balance sensitivity and specificity during tag derivation and MS/MS pattern matching. JUMP can use all potential sequence tags, including tags consisting of only one amino acid. 
When we compared its performance to that of two widely used search algorithms, SEQUEST and Mascot, JUMP identified ∼30% more PSMs at the same FDR threshold. In addition, the program provides two additional features: (i) using tag sequences to improve modification site assignment, and (ii) analyzing co-fragmented peptides from mixture MS/MS spectra.
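
The target-decoy filtering mentioned in this entry can be sketched in a few lines. This is a generic illustration, not JUMP's scoring or filtering code. It assumes each PSM is a `(score, is_decoy)` tuple, that higher scores are better, and that the FDR is estimated as decoys divided by targets above the cutoff (one common convention among several). The function name is hypothetical.

```python
def fdr_score_cutoff(psms, max_fdr=0.01):
    """Return the lowest score cutoff whose estimated FDR is <= max_fdr.

    psms: iterable of (score, is_decoy) tuples; higher score = better match.
    The FDR at a cutoff is estimated as (#decoys) / (#targets) among PSMs
    scoring at or above it. Returns None if no cutoff qualifies.
    Assumption: scores are distinct; ties are not handled specially.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    targets = decoys = 0
    cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Keep extending the accepted list while the estimate stays low.
        if targets > 0 and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff
```

With a stricter `max_fdr` the cutoff moves up and fewer PSMs are accepted, which is the sensitivity versus specificity trade-off the entry describes.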

4.
Oxidative modifications of protein tyrosines have been implicated in multiple human diseases. Among these modifications, elevations in levels of 3,4-dihydroxyphenylalanine (DOPA), a major product of hydroxyl radical addition to tyrosine, have been observed in a number of pathologies. Here we report the first proteome survey of endogenous site-specific modifications, i.e. DOPA and its further oxidation product dopaquinone, in mouse brain and heart tissues. Results from LC-MS/MS analyses included 50 and 14 DOPA-modified tyrosine sites identified from brain and heart, respectively, whereas only a few nitrotyrosine-containing peptides, a more commonly studied marker of oxidative stress, were detectable, suggesting a much higher abundance of DOPA modification as compared with tyrosine nitration. Moreover, 20 and 12 dopaquinone-modified peptides were observed from brain and heart, respectively; nearly one-fourth of these peptides were also observed with DOPA modification on the same sites. For both tissues, these modifications are preferentially found in mitochondrial proteins with metal binding properties, consistent with metal-catalyzed hydroxyl radical formation from mitochondrial superoxide and hydrogen peroxide. These modifications also link to a number of mitochondrially associated and other signaling pathways. Furthermore, many of the modification sites were common sites of previously reported tyrosine phosphorylation, suggesting potential disruption of signaling pathways. Collectively, the results suggest that these modifications are linked with mitochondrially derived oxidative stress and may serve as sensitive markers for disease pathologies.

Generation of reactive oxygen species (ROS) and reactive nitrogen species is a normal consequence of aerobic metabolism that, in excess, results in oxidative stress that further leads to oxidative modification of proteins, lipids, and DNA, events that may lead to altered cellular function and even cell death (1, 2). 
Chronic oxidative stress is well recognized as having a central role in disease and is responsible for both direct alteration of biomolecular structure-function and compensatory changes in cellular processes (1–4). It is increasingly recognized that oxidative modifications of proteins can serve as potential biomarkers indicative of the physiological states and changes that occur during disease progression. Thus, the ability to quantitatively measure specific protein oxidation products has the potential to provide the means to monitor the physiological state of a tissue or organism, in particular any progression toward pathology. Taking Parkinson disease (PD) as an example, a number of oxidative modifications on proteins pertinent to PD have been identified, further supporting the potential importance of oxidative modifications to disease pathogenesis (5).

Many oxidative modifications on specific amino acid residues, such as protein carbonylation (6), cysteine S-nitrosylation (7–9), cysteine oxidation to sulfinic or sulfonic acid (10–12), methionine oxidation (13, 14), and tyrosine nitration (15–21) within complex protein mixtures, have been detected by MS-based proteomics; however, their low abundance levels within complex proteomes often hinder confident identification of these potentially significant modifications (22). For example, tyrosine nitration is a well studied post-translational modification mediated by peroxynitrite (ONOO⁻) or nitrogen dioxide (·NO₂), which commonly occur in cells during oxidative stress and inflammation; however, only a small number of nitrotyrosine proteins have been identified from a given proteome sample because of insufficient analytical sensitivity and the chance of incorrect peptide assignments (19, 23). 
With recent advances in high resolution MS that provide high mass measurement accuracy, the ability to confidently identify modified peptides has been significantly enhanced (24).

Hydroxyl radical (HO·) is one of the most reactive and major species generated under aerobic conditions in biological systems (1, 25, 26). Among several HO·-mediated oxidative modifications, the protein tyrosine modification 3,4-dihydroxyphenylalanine (DOPA) has been reported as a major product and index of HO· attack on tyrosine residues in proteins (Fig. 1) (27, 28). DOPA is also formed on protein tyrosine residues via controlled enzymatic pathways through enzymes such as tyrosinase or tyrosine hydroxylase (28). Once formed, protein-bound DOPA has the potential to initiate further oxidative reactions through binding and reducing transition metals or through redox cycling between catechol and quinone (dopaquinone) forms (29, 30). Recent studies have suggested that protein-bound DOPA is involved in triggering antioxidant defenses (30) and mediating oxidative damage to DNA (31). Moreover, elevated levels of protein-bound DOPA have been reported in several diseases, including atherosclerosis, cataracts, and myocardial disease, and in PD patients undergoing levodopa therapy (26, 32–36). However, the specific DOPA-modified proteins, which could provide mechanistic knowledge of the progression of these diseases, have not been identified (27, 28). The ability to identify site-specific protein modifications should lead to a better understanding of the role of DOPA modification in disease pathologies as well as new molecular signatures or therapeutic targets for diseases.

Fig. 1. DOPA and dopaquinone formation from tyrosine.

Therefore, in this study, we demonstrate the ability to identify site-specific DOPA and dopaquinone (DQ) modifications on protein tyrosine residues in normal mouse brain and heart tissues and their relative stoichiometries that are present in vivo under non-stressed conditions. Such endogenous protein modifications were detected using LC-MS/MS. The results from this global proteomics survey suggest that HO· in tissues under normal conditions is generated largely from the mitochondria and metal-binding proteins where the resulting DOPA/DQ modifications have the potential to disrupt mitochondrial respiration as well as alter tyrosine phosphorylation signaling pathways such as 14-3-3-mediated signaling in brain tissue.

5.
6.
A complete understanding of the biological functions of large signaling peptides (>4 kDa) requires comprehensive characterization of their amino acid sequences and post-translational modifications, which presents significant analytical challenges. In the past decade, there has been great success with mass spectrometry-based de novo sequencing of small neuropeptides. However, these approaches are less applicable to larger neuropeptides because of the inefficient fragmentation of peptides larger than 4 kDa and their lower endogenous abundance. The conventional proteomics approach focuses on large-scale determination of protein identities via database searching, lacking the ability for in-depth elucidation of individual amino acid residues. Here, we present a multifaceted MS approach for identification and characterization of large crustacean hyperglycemic hormone (CHH)-family neuropeptides, a class of peptide hormones that play central roles in the regulation of many important physiological processes of crustaceans. Six crustacean CHH-family neuropeptides (8–9.5 kDa), including two novel peptides with extensive disulfide linkages and PTMs, were fully sequenced without reference to genomic databases. High-definition de novo sequencing was achieved by a combination of bottom-up, off-line top-down, and on-line top-down tandem MS methods. Statistical evaluation indicated that these methods provided complementary information for sequence interpretation and increased the local identification confidence of each amino acid. Further investigations by MALDI imaging MS mapped the spatial distribution and colocalization patterns of various CHH-family neuropeptides in the neuroendocrine organs, revealing that two CHH-subfamilies are involved in distinct signaling pathways.

Neuropeptides and hormones comprise a diverse class of signaling molecules involved in numerous essential physiological processes, including analgesia, reward, food intake, learning and memory (1). 
Disorders of the neurosecretory and neuroendocrine systems influence many pathological processes. For example, obesity results from failure of energy homeostasis in association with endocrine alterations (2, 3). Previous work from our lab, using crustaceans as model organisms, found that multiple neuropeptides were implicated in control of food intake, including RFamides, tachykinin related peptides, RYamides, and pyrokinins (4–6).

Crustacean hyperglycemic hormone (CHH) family neuropeptides play a central role in energy homeostasis of crustaceans (7–17). Hyperglycemic response of the CHHs was first reported after injection of crude eyestalk extract in crustaceans. Based on their preprohormone organization, the CHH family can be grouped into two sub-families: subfamily-I containing CHH, and subfamily-II containing molt-inhibiting hormone (MIH) and mandibular organ-inhibiting hormone (MOIH). The preprohormones of the subfamily-I have a CHH precursor related peptide (CPRP) that is cleaved off during processing; and preprohormones of the subfamily-II lack the CPRP (9). Uncovering their physiological functions will provide new insights into neuroendocrine regulation of energy homeostasis.

Characterization of CHH-family neuropeptides is challenging. They are comprised of more than 70 amino acids and often contain multiple post-translational modifications (PTMs) and complex disulfide bridge connections (7). In addition, physiological concentrations of these peptide hormones are typically below picomolar level, and most crustacean species do not have available genome and proteome databases to assist MS-based sequencing.

MS-based neuropeptidomics provides a powerful tool for rapid discovery and analysis of a large number of endogenous peptides from the brain and the central nervous system. Our group and others have greatly expanded the peptidomes of many model organisms (3, 18–33). 
For example, we have discovered more than 200 neuropeptides with several neuropeptide families consisting of as many as 20–40 members in a simple crustacean model system (5, 6, 25–31, 34). However, a majority of these neuropeptides are small peptides 5–15 amino acid residues long, leaving a gap in identifying larger signaling peptides from organisms without a sequenced genome. The observed lack of larger size peptide hormones can be attributed to the lack of effective de novo sequencing strategies for neuropeptides larger than 4 kDa, which are inherently more difficult to fragment using conventional techniques (34–37). Although classical proteomics studies examine larger proteins, these tools are limited to identification based on database searching with one or more peptides matching without complete amino acid sequence coverage (36, 38).

Large populations of neuropeptides from 4–10 kDa exist in the nervous systems of both vertebrates and invertebrates (9, 39, 40). Understanding their functional roles requires sufficient molecular knowledge and a unique analytical approach. Therefore, developing effective and reliable methods for de novo sequencing of large neuropeptides at the individual amino acid residue level is an urgent gap to fill in neurobiology. In this study, we present a multifaceted MS strategy aimed at high-definition de novo sequencing and comprehensive characterization of the CHH-family neuropeptides in crustacean central nervous system. The high-definition de novo sequencing was achieved by a combination of three methods: (1) enzymatic digestion and LC-tandem mass spectrometry (MS/MS) bottom-up analysis to generate detailed sequences of proteolytic peptides; (2) off-line LC fractionation and subsequent top-down MS/MS to obtain high-quality fragmentation maps of intact peptides; and (3) on-line LC coupled to top-down MS/MS to allow rapid sequence analysis of low abundance peptides. 
Combining the three methods overcomes the limitations of each, and thus offers complementary and high-confidence determination of amino acid residues. We report the complete sequence analysis of six CHH-family neuropeptides including the discovery of two novel peptides. With the accurate molecular information, MALDI imaging and ion mobility MS were conducted for the first time to explore their anatomical distribution and biochemical properties.

7.
Posttranslational modifications of proteins increase the complexity of the cellular proteome and enable rapid regulation of protein functions in response to environmental changes. Protein ubiquitylation is a central regulatory posttranslational modification that controls numerous biological processes including proteasomal degradation of proteins, DNA damage repair and innate immune responses. Here we combine high-resolution mass spectrometry with single-step immunoenrichment of di-glycine modified peptides for mapping of endogenous putative ubiquitylation sites in murine tissues. We identify more than 20,000 unique ubiquitylation sites on proteins involved in diverse biological processes. Our data reveal that ubiquitylation regulates core signaling pathways common for each of the studied tissues. In addition, we discover that ubiquitylation regulates tissue-specific signaling networks. Many tissue-specific ubiquitylation sites were obtained from brain, highlighting the complexity and unique physiology of this organ. We further demonstrate that different di-glycine-lysine-specific monoclonal antibodies exhibit sequence preferences, and that their complementary use increases the depth of ubiquitylation site analysis, thereby providing a more unbiased view of protein ubiquitylation.

Ubiquitin is a small 76-amino-acid protein that is conjugated to the ε-amino group of lysines in a highly orchestrated enzymatic cascade involving ubiquitin activating (E1), ubiquitin conjugating (E2), and ubiquitin ligase (E3) enzymes (1). Ubiquitylation is involved in the regulation of diverse cellular processes including protein degradation (2, 3, 4), DNA damage repair (5, 6), DNA replication (7), cell surface receptor endocytosis, and innate immune signaling (8, 9). Deregulation of protein ubiquitylation is implicated in the development of cancer and neurodegenerative diseases (10, 11). 
Inhibitors targeting the ubiquitin proteasome system are used in the treatment of hematologic malignancies such as multiple myeloma (12, 13).

Recent developments in mass spectrometry (MS)-based proteomics have greatly expedited proteome-wide analysis of posttranslational modifications (PTMs) (14–17). Large-scale mapping of ubiquitylation sites by mass spectrometry is based on the identification of the di-glycine remnant that results from trypsin digestion of ubiquitylated proteins and remains attached to ubiquitylated lysines (18). Recently, two monoclonal antibodies were developed that specifically recognize di-glycine remnant modified peptides, enabling their efficient enrichment from complex peptide mixtures (19, 20). These antibodies have been used to identify thousands of endogenous ubiquitylation sites in human cells, and to quantify site-specific changes in ubiquitylation in response to different cellular perturbations (20–22). It should be noted that the di-glycine remnant is not specific for proteins modified by ubiquitin: proteins modified by NEDD8 or ISG15 generate an identical di-glycine remnant on modified lysines, making it impossible to distinguish between these modifications by mass spectrometry. However, expression of NEDD8 in mouse tissues was shown to be developmentally down-regulated (23), and ISG15 expression in bovine tissues is low in the absence of interferon stimulation (24). In cell culture experiments it was shown that a great majority of sites identified using di-glycine-lysine-specific antibodies stems from ubiquitylated peptides (20).

The rates of cell proliferation and protein turnover in mammals vary dramatically between different tissues. Immortalized cell lines, often derived from cancer, are selected for high proliferation rates and fail to represent the complex conditions in tissues. Tissue proteomics can help to gain a more comprehensive understanding of physiological processes in multicellular organisms. 
Analysis of tissue proteome and PTMs can provide important insights into tissue-specific processes and signaling networks that regulate these processes (25–32). In addition, development of mass spectrometric methods for analysis of PTMs in diseased tissues might lead to the identification of biomarkers.

In this study, we combined high-resolution mass spectrometry with immunoenrichment of di-glycine modified peptides to investigate endogenous ubiquitylation sites in murine tissues. We identified more than 20,000 ubiquitylation sites from five different murine tissues and report the largest ubiquitylation dataset obtained from mammalian tissues to date. Furthermore, we compared the performance of the two monoclonal di-glycine-lysine-specific antibodies available for enrichment of ubiquitylated peptides, and reveal their relative preferences for different amino acids flanking ubiquitylation sites.
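
To illustrate how the di-glycine remnant described in this entry shows up in a database search, the sketch below computes a peptide's monoisotopic mass with an optional Gly-Gly addition on selected lysines. It is an illustration only, not the study's pipeline: the function name and interface are hypothetical, the residue masses are standard monoisotopic values rounded to five decimals, and the remnant is taken as two glycine residues (2 × 57.02146 Da).

```python
# Standard monoisotopic residue masses in Da, rounded to five decimals.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O added for the free peptide termini
DIGLY = 114.04293  # Gly-Gly remnant left on lysine after trypsin digestion


def peptide_mass(sequence, diglycine_sites=()):
    """Monoisotopic mass of a peptide, with di-glycine on the given lysines.

    diglycine_sites holds 0-based positions in `sequence`; each must be a
    lysine (K), otherwise ValueError is raised.
    """
    mass = WATER + sum(RESIDUE_MASS[residue] for residue in sequence)
    for position in diglycine_sites:
        if sequence[position] != "K":
            raise ValueError(f"position {position} is not a lysine")
        mass += DIGLY
    return mass
```

The remnant has the same elemental composition as an asparagine residue, which is why the two constants coincide in the table above. The mass alone also cannot tell whether the remnant came from ubiquitin, NEDD8, or ISG15, as the entry notes.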

8.
Knowledge of elaborate structures of protein complexes is fundamental for understanding their functions and regulations. Although cross-linking coupled with mass spectrometry (MS) has been presented as a feasible strategy for structural elucidation of large multisubunit protein complexes, this method has proven challenging because of technical difficulties in unambiguous identification of cross-linked peptides and determination of cross-linked sites by MS analysis. In this work, we developed a novel cross-linking strategy using a newly designed MS-cleavable cross-linker, disuccinimidyl sulfoxide (DSSO). DSSO contains two symmetric collision-induced dissociation (CID)-cleavable sites that allow effective identification of DSSO-cross-linked peptides based on their distinct fragmentation patterns unique to cross-linking types (i.e. interlink, intralink, and dead end). The CID-induced separation of interlinked peptides in MS/MS permits MS3 analysis of single peptide chain fragment ions with defined modifications (due to DSSO remnants) for easy interpretation and unambiguous identification using existing database searching tools. Integration of data analyses from three generated data sets (MS, MS/MS, and MS3) allows high confidence identification of DSSO cross-linked peptides. The efficacy of the newly developed DSSO-based cross-linking strategy was demonstrated using model peptides and proteins. In addition, this method was successfully used for structural characterization of the yeast 20 S proteasome complex. In total, 13 non-redundant interlinked peptides of the 20 S proteasome were identified, representing the first application of an MS-cleavable cross-linker for the characterization of a multisubunit protein complex. 
Given its effectiveness and simplicity, this cross-linking strategy can find a broad range of applications in elucidating the structural topology of proteins and protein complexes. Proteins form stable and dynamic multisubunit complexes under different physiological conditions to maintain cell viability and normal cell homeostasis. Detailed knowledge of protein interactions and protein complex structures is fundamental to understanding how individual proteins function within a complex and how the complex functions as a whole. However, structural elucidation of large multisubunit protein complexes has been difficult because of a lack of technologies that can effectively handle their dynamic and heterogeneous nature. Traditional methods such as nuclear magnetic resonance (NMR) analysis and x-ray crystallography can yield detailed information on protein structures; however, NMR spectroscopy requires large quantities of pure protein in a specific solvent, whereas x-ray crystallography is often limited by the crystallization process. In recent years, chemical cross-linking coupled with mass spectrometry (MS) has become a powerful method for studying protein interactions (1–3). Chemical cross-linking stabilizes protein interactions through the formation of covalent bonds and allows the detection of stable, weak, and/or transient protein-protein interactions in native cells or tissues (4–9). In addition to capturing protein interacting partners, many studies have shown that chemical cross-linking can yield low resolution structural information about the constraints within a molecule (2, 3, 10) or protein complex (11–13). The application of chemical cross-linking, enzymatic digestion, and subsequent mass spectrometric and computational analyses for the elucidation of three-dimensional protein structures offers distinct advantages over traditional methods because of its speed, sensitivity, and versatility. 
Identification of cross-linked peptides provides distance constraints that aid in constructing the structural topology of proteins and/or protein complexes. Although this approach has been successful, effective detection and accurate identification of cross-linked peptides as well as unambiguous assignment of cross-linked sites remain extremely challenging due to their low abundance and complicated fragmentation behavior in MS analysis (2, 3, 10, 14). Therefore, new reagents and methods are urgently needed to allow unambiguous identification of cross-linked products and to improve the speed and accuracy of data analysis to facilitate its application in structural elucidation of large protein complexes. A number of approaches have been developed to facilitate MS detection of low abundance cross-linked peptides from complex mixtures. These include selective enrichment using affinity purification with biotinylated cross-linkers (15–17) and click chemistry with alkyne-tagged (18) or azide-tagged (19, 20) cross-linkers. In addition, Staudinger ligation has recently been shown to be effective for selective enrichment of azide-tagged cross-linked peptides (21). Apart from enrichment, detection of cross-linked peptides can be achieved by isotope-labeled (22–24), fluorescently labeled (25), and mass tag-labeled cross-linking reagents (16, 26). These methods can identify cross-linked peptides with MS analysis, but interpretation of the data generated from interlinked peptides (two peptides connected with the cross-link) by automated database searching remains difficult. Several bioinformatics tools have thus been developed to interpret MS/MS data and determine interlinked peptide sequences from complex mixtures (12, 14, 27–32). Although promising, further developments are still needed to make such data analyses as robust and reliable as analyzing MS/MS data of single peptide sequences using existing database searching tools (e.g. 
Protein Prospector, Mascot, or SEQUEST). Various types of cleavable cross-linkers with distinct chemical properties have been developed to facilitate MS identification and characterization of cross-linked peptides. These include UV photocleavable (33), chemical cleavable (19), isotopically coded cleavable (24), and MS-cleavable reagents (16, 26, 34–38). MS-cleavable cross-linkers have received considerable attention because the resulting cross-linked products can be identified based on their characteristic fragmentation behavior observed during MS analysis. Gas-phase cleavage sites result in the detection of a “reporter” ion (26), single peptide chain fragment ions (35–38), or both reporter and fragment ions (16, 34). In each case, further structural characterization of the peptide product ions generated during the cleavage reaction can be accomplished by subsequent MSn analysis. Among these linkers, the “fixed charge” sulfonium ion-containing cross-linker developed by Lu et al. (37) appears to be the most attractive as it allows specific and selective fragmentation of cross-linked peptides regardless of their charge and amino acid composition based on their studies with model peptides. Despite the availability of multiple types of cleavable cross-linkers, most of the applications have been limited to the study of model peptides and single proteins. Additionally, complicated synthesis and fragmentation patterns have impeded most of the known MS-cleavable cross-linkers from wide adoption by the community. Here we describe the design and characterization of a novel and simple MS-cleavable cross-linker, DSSO, and its application to model peptides and proteins and the yeast 20 S proteasome complex. In combination with new software developed for data integration, we were able to identify DSSO-cross-linked peptides from complex peptide mixtures with speed and accuracy. 
Given its effectiveness and simplicity, we anticipate a broader application of this MS-cleavable cross-linker in the study of structural topology of other protein complexes using cross-linking and mass spectrometry.
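The idea that CID separates an interlinked peptide pair into two single-chain fragments can be illustrated with a small mass-bookkeeping sketch. It assumes (this is not stated in the abstract) that the inputs are neutral, deconvoluted masses and that a clean cleavage yields two complementary fragments whose masses add up to the precursor mass; secondary losses such as water are ignored. The function names and the 10 ppm default are hypothetical choices for illustration only.

```python
from itertools import combinations


def ppm_diff(observed, expected):
    """Signed deviation of observed from expected, in parts per million."""
    return (observed - expected) / expected * 1e6


def find_complementary_pairs(fragment_masses, precursor_mass, tol_ppm=10.0):
    """Return pairs of MS/MS fragment masses that sum to the precursor mass.

    Assumption: masses are neutral (charge-deconvoluted) and the two
    fragments released by cleaving the linker are complementary.
    """
    pairs = []
    for a, b in combinations(sorted(fragment_masses), 2):
        if abs(ppm_diff(a + b, precursor_mass)) <= tol_ppm:
            pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    # Made-up masses, chosen only so that two pairs are complementary.
    fragments = [1000.5, 1500.25, 1200.0, 1300.75]
    print(find_complementary_pairs(fragments, 2500.75))
```

Each pair found this way would be a candidate for the MS3 step, where a single peptide chain carrying a linker remnant is sequenced with an ordinary database search.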

9.
Mass spectrometry is a powerful alternative to antibody-based methods for the analysis of histone post-translational modifications (marks). A key development in this approach was the deliberate propionylation of histones to improve sequence coverage across the lysine-rich and hydrophilic tails that bear most modifications. Several marks continue to be problematic however, particularly di- and tri-methylated lysine 4 of histone H3 which we found to be subject to substantial and selective losses during sample preparation and liquid chromatography-mass spectrometry. We developed a new method employing a “one-pot” hybrid chemical derivatization of histones, whereby an initial conversion of free lysines to their propionylated forms under mild aqueous conditions is followed by trypsin digestion and labeling of new peptide N termini with phenyl isocyanate. High resolution mass spectrometry was used to collect qualitative and quantitative data, and a novel web-based software application (Fishtones) was developed for viewing and quantifying histone marks in the resulting data sets. Recoveries of 53 methyl, acetyl, and phosphoryl marks on histone H3.1 were improved by an average of threefold overall, and over 50-fold for H3K4 di- and tri-methyl marks. The power of this workflow for epigenetic research and drug discovery was demonstrated by measuring quantitative changes in H3K4 trimethylation induced by small molecule inhibitors of lysine demethylases and siRNA knockdown of epigenetic modifiers ASH2L and WDR5. The field of epigenetics has become important in drug discovery as many diseases have been linked to aberrations in chromatin and changes of histone post-translational modifications (PTMs) (1, 2). The core histones (H2A, H2B, H3, and H4 and their variants) undergo a multitude of PTMs. 
Some, like lysine acetylation, lysine mono-, di-, and trimethylation, and serine/threonine phosphorylation are well documented, with over 100 distinct, albeit generally low abundance, modifications reported for H3 alone (3). Mass spectrometry provides an alternative to antibody-based methods for detecting and quantifying histone PTMs, as the latter are prone to problems of specificity and epitope occlusion (4, 5). The most commonly applied approach to date is known as “bottom-up” mass spectrometry and involves an initial processing of the histones into smaller peptides (6). A key development in histone PTM analysis was the deliberate chemical modification of histone tail lysines by propionic anhydride, preventing digestion of these Lys- and Arg-rich domains into peptides too short or hydrophilic to be detected in reverse-phase liquid chromatography-mass spectrometry experiments (7–9). Despite this advance, some marks like H3K4 di- and tri-methylation remain problematic; in several examples from the recent literature the H3K4me3 mark is detected either only by means of specifically targeted methods (5), with larger quantitative variation than other marks (10), or not reported among detected marks at all (3, 11–13). Alternative approaches include top-down or middle-down mass spectrometry, in which entire histones, or large segments thereof, are analyzed directly (14–16), but these techniques still suffer from relatively poor sensitivity in comparison to bottom-up workflows, and must contend with the full combinatorial complexity of histone PTMs (17). The H3K4me3 mark is of low natural abundance, having a very restricted genomic localization strongly associated with active gene promoters and enhancers (18, 19), and aberrant activities of writers and erasers of that mark are associated with a variety of diseases (1, 2). Difficulties in its quantitation thus hinder the investigation of both fundamental biology and the discovery of lifesaving drugs. 
We therefore undertook a re-evaluation of the bottom-up histone PTM workflow, streamlining sample preparation and investigating sources of bias or sample loss. Alternatives to the standard propionylation technique were also explored, resulting in a new hybrid chemical modification workflow yielding across-the-board improvements in recovery of peptides from the N-terminal tail of histone H3, and dramatically improved detection of hydrophilic peptides with marks like H3K4me2/me3.

10.
We present the first comprehensive capillary electrophoresis electrospray ionization mass spectrometry (CESI-MS) analysis of post-translational modifications derived from H1 and core histones. Using a capillary electrophoresis system equipped with a sheathless high-sensitivity porous sprayer and nano–liquid chromatography electrospray ionization mass spectrometry (nano-LC-ESI-MS) as two complementary techniques, we characterized H1 histones isolated from rat testis. Without any pre-separation of the perchloric acid extraction, a total of 70 different modified peptides, including 50 phosphopeptides, were identified in the rat linker histones H1.0, H1a-H1e, and H1t. Out of the 70 modified H1 histone peptides, 27 peptides could be identified with CESI-MS only, and 11 solely with LC-ESI-MS. Immobilized metal-affinity chromatography enrichment prior to MS analysis yielded a total of 55 phosphopeptides; 22 of these peptides could be identified only by CESI-MS, and 19 only by LC-ESI-MS, showing the complementarity of the two techniques. We mapped 42 H1 modification sites, including 31 phosphorylation sites, of which 8 were novel sites. For the analysis of core histones, we chose a different strategy. In a first step, the sulfuric-acid-extracted core histones were pre-separated using reverse-phase high-performance liquid chromatography. Individual rat testis core histone fractions obtained in this way were digested and analyzed via bottom-up CESI-MS. This approach yielded the identification of 42 different modification sites including acetylation (lysine and Nα-terminal); mono-, di-, and trimethylation; and phosphorylation. When we applied CESI-MS for the analysis of intact core histone subtypes from butyrate-treated mouse tumor cells, we were able to rapidly detect their degree of modification, and we found this method very useful for the separation of isobaric trimethyl and acetyl modifications. 
Taken together, our results highlight the need for additional techniques for the comprehensive analysis of post-translational modifications. CESI-MS is a promising new proteomics tool as demonstrated by this, the first comprehensive analysis of histone modifications, using rat testis as an example. Histones are the most intensively studied group of basic nuclear proteins and are of great importance with regard to the organization of chromatin structure and control of gene activity. They are highly conserved during evolution, binding to and condensing eukaryotic chromosomal DNA to form chromatin. The fundamental chromatin subunit is the nucleosome, in which 166 bp of DNA are wrapped around a core histone octamer and a further ∼40 bp constitute the linker between one nucleosome core and the next. The histone octamer contains two molecules of each of the core histones H2A, H2B, H3, and H4. A fifth type of histone, referred to as linker histone (H1, H5), binds to both the DNA on the outer surface of nucleosomes and the linker DNA. There are numerous microsequence variants of linker and core histones (except H4) differing only slightly in primary sequence. In rat testis, for example, six somatic H1 subtypes, designated as H1a, H1b, H1c, H1d, H1e, and H1.0, as well as germ cell specific subtypes (i.e. H1t, H1T2, and HILS1), have been identified (1–3). Under various biological conditions, all histone proteins, for both linker and core histones, are subjected to post-translational modifications, including phosphorylation, acetylation, methylation, ubiquitination, deamidation, glycosylation, and ADP-ribosylation, which have a great influence on the epigenetic control of gene expression (4–6). The multitude of histone proteins resulting from closely related sequence variants and post-translational modifications, as well as their highly basic nature combined with hydrophobic properties, provides a major analytical challenge in current proteomics research. 
Over the past several years, considerable efforts have been expended to develop methods to identify the specific sites of histone modifications. Mass spectrometry (MS) coupled to liquid chromatography (LC) is the dominant technique for their characterization (7–14). However, because histone proteins contain up to nearly 35% basic amino acids, the analysis of histone peptides is still problematic, as digestion with many commonly used enzymes (e.g. trypsin, Lys-C, etc.) causes the formation of many short and polar peptides that poorly interact with the reverse-phase (RP) material and go undetected by conventional liquid chromatography electrospray ionization mass spectrometry (LC-ESI-MS). To overcome this problem, chemical derivatization such as propionylation is often applied (15, 16). Capillary electrophoresis (CE) overcomes this disadvantage; this technique allows separations based on the mass-to-charge ratio of peptides and does not utilize their hydrophobic nature as a separation principle. The methods of electrophoresis and LC and their applicability for histone analysis have been reviewed in detail by Lindner (17). CE has proven to be a remarkably powerful method for separating individual histones and their modified forms based on their different electrophoretic mobilities. Using a bare fused silica capillary and hydroxypropylmethyl cellulose (HPMC) as a buffer additive in order to avoid undesired protein adsorption, different core and linker histones and their multiply phosphorylated and acetylated forms were successfully separated via capillary zone electrophoresis (CZE) (18–22). So far, no data have been published about the identification of histone modifications by means of capillary electrophoresis electrospray ionization mass spectrometry (CESI-MS). 
LC is given preference over CE because of the difficulty of achieving on-line interfacing of CE with MS that allows stable electrospray processes without compromising the quality of separation or the detection sensitivity. However, CE-MS is a promising technique with constantly increasing importance, as documented by numerous articles (23–26). Various interfaces have been constructed to improve CESI-MS coupling (27, 28). Sheathflow interfaces are the most widely used, and although the drawback of having to dilute the analyte is inherent in this kind of interface, they offer stable electrophoretic separations and allow greater versatility in the choice of background electrolyte (BGE) and the range of flow rates (29–32). Sheathless interfaces have generated interest because no sheath liquid is added, which leads to enhanced detection sensitivity (33, 34). However, they have not been used frequently because of their limited robustness and lack of well-established interfaces and routine analysis protocols. The most widely used method for establishing the terminating electrical contact is coating the outer surface of the CE capillary tip with a conductive material (35–37). Unfortunately, the lifetimes of such coatings are generally very limited, as they suffer from deterioration under the influence of the high voltages applied. A recently published concept of a sheathless interface based on a separation capillary with a porous tip acting as a nanospray emitter overcomes these disadvantages (38). The capillary tip is etched using hydrofluoric acid until the capillary wall becomes so thin and porous that an electric contact can be established. The performance of this methodology, which combines the low-flow characteristics of CE with an integrated ESI source, is described in Refs. 39–41. Applications such as the analysis of intact proteins (42), protein–protein and protein–metal complexes (43), and ribosomal protein digests from E. coli (44) have been published. 
Method-inherent advantages of CESI-MS are highly efficient separations, low flow rates leading to reduced ion suppression, and greater sensitivity (40). In contrast to nano-LC, no column equilibration is needed, there are no gradient effects, and the instrumentation is less maintenance-intensive. Our group recently described important features of CESI-MS and reported the comparison of this method with LC-ESI-MS for the analysis of a 5% perchloric acid extraction of rat testis consisting mainly of different histone H1 subtypes (39). The performance of both techniques was evaluated regarding analysis time, protein sequence coverage, and number and molecular mass distribution of the identified peptides. The CESI-MS method provided shorter analysis times, narrower peaks yielding high signals, and the identification of a greater number of low molecular mass range peptides than LC-ESI-MS (39). In the current study, we investigated the analysis of post-translationally modified peptides, particularly phosphopeptides, obtained from endoproteinase Arg-C digested histones from rat testis; this organ contains the whole set of somatic and germ cell specific H1 histones, as well as numerous modified core histone proteins. CESI-MS and LC-ESI-MS were compared regarding the number and type of identified modified peptides. Without any pre-separation of the perchloric acid extraction, we found numerous known and novel modification sites in linker histones. In addition, immobilized metal-affinity chromatography (IMAC) experiments were utilized to enrich phosphopeptides prior to MS analysis. CESI-MS was also used for the rapid identification of post-translational modifications (PTMs) of rat testis core histones, which were pre-fractionated via RP-HPLC and digested with Arg-C. 
Using core histones from butyrate-treated mouse erythroleukemia cells, we further demonstrated that our method achieves excellent separations of intact histone subtypes and their multiply modified forms and enables the detection of the extent of PTMs in a fast and reproducible way. Our work represents the first detailed characterization of modified linker and core histone peptides and clearly demonstrates that CESI-MS is a promising alternative tool for epigenetic studies.
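The abstract notes that CESI-MS helps separate trimethyl and acetyl modifications, which have the same nominal mass. A short calculation shows how small the mass gap is. The sketch below uses standard monoisotopic atomic masses and the elemental compositions usually assigned to the two mass shifts (C2H2O for acetyl, C3H6 for trimethyl); the 1500 Da peptide mass and all function names are hypothetical and serve only as an illustration.

```python
# Monoisotopic atomic masses in daltons.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}


def composition_mass(composition):
    """Monoisotopic mass of an elemental composition such as {'C': 2, 'H': 2}."""
    return sum(MONOISOTOPIC[element] * count for element, count in composition.items())


# Net mass added to a lysine residue by each modification.
ACETYL = composition_mass({"C": 2, "H": 2, "O": 1})
TRIMETHYL = composition_mass({"C": 3, "H": 6})


def mass_gap_ppm(peptide_mass):
    """Acetyl/trimethyl mass gap relative to a peptide of the given mass, in ppm."""
    return (TRIMETHYL - ACETYL) / peptide_mass * 1e6


if __name__ == "__main__":
    print(f"acetyl shift:    {ACETYL:.5f} Da")
    print(f"trimethyl shift: {TRIMETHYL:.5f} Da")
    print(f"gap at 1500 Da:  {mass_gap_ppm(1500.0):.1f} ppm")
```

The gap is a few hundredths of a dalton, so mass alone only distinguishes the two forms at high mass accuracy, whereas acetylation also removes the positive charge of the lysine side chain and therefore changes electrophoretic mobility.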

11.
Understanding how a small brain region, the suprachiasmatic nucleus (SCN), can synchronize the body's circadian rhythms is an ongoing research area. This important time-keeping system requires a complex suite of peptide hormones and transmitters that remain incompletely characterized. Here, capillary liquid chromatography and FTMS have been coupled with tailored software for the analysis of endogenous peptides present in the SCN of the rat brain. After ex vivo processing of brain slices, peptide extraction, identification, and characterization from tandem FTMS data with <5-ppm mass accuracy produced a hyperconfident list of 102 endogenous peptides, including 33 previously unidentified peptides, and 12 peptides that were post-translationally modified with amidation, phosphorylation, pyroglutamylation, or acetylation. This characterization of endogenous peptides from the SCN will aid in understanding the molecular mechanisms that mediate rhythmic behaviors in mammals. Central nervous system neuropeptides function in cell-to-cell signaling and are involved in many physiological processes such as circadian rhythms, pain, hunger, feeding, and body weight regulation (1–4). Neuropeptides are produced from larger protein precursors by the selective action of endopeptidases, which cleave at mono- or dibasic sites and then remove the C-terminal basic residues (1, 2). Some neuropeptides undergo functionally important post-translational modifications (PTMs), including amidation, phosphorylation, pyroglutamylation, or acetylation. These aspects of peptide synthesis impact the properties of neuropeptides, further expanding their diverse physiological implications. Therefore, unveiling new peptides and unreported peptide properties is critical to advancing our understanding of nervous system function. Historically, the analysis of neuropeptides was performed by Edman degradation in which the N-terminal amino acid is sequentially removed. 
However, analysis by this method is slow and does not allow for sequencing of the peptides containing N-terminal PTMs (5). Immunological techniques, such as radioimmunoassay and immunohistochemistry, are used for measuring relative peptide levels and spatial localization, but these methods only detect peptide sequences with known structure (6). More direct, high throughput methods of analyzing brain regions can be used. Mass spectrometry, a rapid and sensitive method that has been used for the analysis of complex biological samples, can detect and identify the precise forms of neuropeptides without prior knowledge of peptide identity, with these approaches making up the field of peptidomics (7–12). The direct tissue and single neuron analysis by MALDI MS has enabled the discovery of hundreds of neuropeptides in the last decade, and the neuronal homogenate analysis by fractionation and subsequent ESI or MALDI MS has yielded an equivalent number of new brain peptides (5). Several recent peptidome studies, including the work by Dowell et al. (10), have used the specificity of FTMS for peptide discovery (10, 13–15). Here, we combine the ability to fragment ions at ultrahigh mass accuracy (16) with a software pipeline designed for neuropeptide discovery. We use nanocapillary reversed-phase LC coupled to 12 Tesla FTMS for the analysis of peptides present in the suprachiasmatic nucleus (SCN) of rat brain. A relatively small, paired brain nucleus located at the base of the hypothalamus directly above the optic chiasm, the SCN contains a biological clock that generates circadian rhythms in behaviors and homeostatic functions (17, 18). The SCN comprises ∼10,000 cellular clocks that are integrated as a tissue level clock which, in turn, orchestrates circadian rhythms throughout the brain and body. 
It is sensitive to incoming signals from the light-sensing retina and other brain regions, which cause temporal adjustments that align the SCN appropriately with changes in environmental or behavioral state. Previous physiological studies have implicated peptides as critical synchronizers of normal SCN function as well as mediators of SCN inputs, internal signal processing, and outputs; however, only a small number of peptides have been identified and explored in the SCN, leaving unresolved many circadian mechanisms that may involve peptide function. Most peptide expression in the SCN has only been studied through indirect antibody-based techniques (19–29), although we recently used MS approaches to characterize several peptides detected in SCN releasates (30). Previous studies indicate that the SCN expresses a rich diversity of peptides relative to other brain regions studied with the same techniques. Previously used immunohistochemical approaches are not only inadequate for comprehensively evaluating PTMs and alternate isoforms of known peptides but are also incapable of exhaustively examining the full peptide complement of this complex biological network of peptidergic inputs and intrinsic components. A comprehensive study of SCN peptidomics is required that utilizes high resolution strategies for directly analyzing the peptide content of the neuronal networks comprising the SCN. In our study, the SCN was obtained from ex vivo coronal brain slices via tissue punch and subjected to multistage peptide extraction. The SCN tissue extract was analyzed by FTMS/MS, and the high resolution MS and MS/MS data were processed using ProSightPC 2.0 (16), which allows the identification and characterization of peptides or proteins from high mass accuracy MS/MS data. In addition, the Sequence Gazer included in ProSightPC was used for manually determining PTMs (31, 32). 
As a result, a total of 102 endogenous peptides were identified, including 33 that were previously unidentified, and 12 PTMs (including amidation, phosphorylation, pyroglutamylation, and acetylation) were found. The present study is the first comprehensive peptidomics study for identifying peptides present within the mammalian SCN. In fact, this is one of the first peptidome studies to work with discrete brain nuclei as opposed to larger brain structures and follows up on our recent report using LC-ion trap for analysis of the peptides in the supraoptic nucleus (33); here, the use of FTMS allows a greater range of PTMs to be confirmed and allows higher confidence in the peptide assignments. This information on the peptides in the SCN will serve as a basis to more exhaustively explore the extent that previously unreported SCN neuropeptides may function in SCN regulation of mammalian circadian physiology.
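The <5-ppm mass accuracy quoted in the abstract refers to the relative deviation between a measured and a theoretical mass. A minimal sketch of that check follows; the function names and the example m/z values are hypothetical, and this is not the filtering logic of ProSightPC, only the underlying arithmetic.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error of an observation, in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz, theoretical_mz, tol_ppm=5.0):
    """True when the absolute mass error is below the tolerance (default 5 ppm)."""
    return abs(ppm_error(observed_mz, theoretical_mz)) < tol_ppm


if __name__ == "__main__":
    # Made-up values: a 4 ppm and a 6 ppm deviation at m/z 1000.
    for observed in (1000.004, 1000.006):
        print(observed, round(ppm_error(observed, 1000.0), 2),
              within_tolerance(observed, 1000.0))
```

At m/z 1000 a 5 ppm window is only 0.005 wide, which is why such a tolerance sharply limits the number of candidate peptides that can match a measured mass by chance.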

12.
13.
The localization of phosphorylation sites in peptide sequences is a challenging problem in large-scale phosphoproteomics analysis. The intense neutral loss peaks and the coexistence of multiple serine/threonine and/or tyrosine residues are limiting factors for objectively scoring site patterns across thousands of peptides. Various computational approaches for phosphorylation site localization have been proposed, including Ascore, Mascot Delta score, and ProteinProspector, yet few address direct estimation of the false localization rate (FLR) in each experiment. Here we propose LuciPHOr, a modified target-decoy-based approach that uses mass accuracy and peak intensities for site localization scoring and FLR estimation. Accurate estimation of the FLR is a difficult task at the individual-site level because the degree of uncertainty in localization varies significantly across different peptides. LuciPHOr carries out simultaneous localization on all candidate sites in each peptide and estimates the FLR based on the target-decoy framework, where decoy phosphopeptides generated by placing artificial phosphorylation(s) on non-candidate residues compete with the non-decoy phosphopeptides. LuciPHOr also reports approximate site-level confidence scores for all candidate sites as a means to localize additional sites from multiphosphorylated peptides in which localization can be partially achieved. Unlike the existing tools, LuciPHOr is compatible with any search engine output processed through the Trans-Proteomic Pipeline. We evaluated the performance of LuciPHOr in terms of the sensitivity and accuracy of FLR estimates using two synthetic phosphopeptide libraries and a phosphoproteomic dataset generated from complex mouse brain samples. Phosphorylation is a common and essential form of post-translational regulation that has been extensively studied via mass spectrometry (1–5). 
However, tandem mass spectra produced from phosphorylated peptides can be difficult to interpret because of their relatively low abundance within the cell and the presence of intense neutral loss peaks in the MS/MS spectra (6, 7). Correctly determining which residue bears the phosphate group is typically a tedious and error-prone process. Most commonly used database search tools for peptide identification from MS/MS spectra are not optimized for site localization of a post-translational modification, nor do they provide any confidence score for the assigned site. In addition, manual verification of the modification sites is a time-consuming process that requires expertise in mass spectrometry. As a result, the challenges of site localization have been acknowledged by the proteomics community, including within the latest version of the data publication guidelines of this journal (8). A number of computational approaches that localize phosphorylation sites have been reported in the literature, enabling automated phosphoproteomic analysis (reviewed in Ref. 9). These tools either rescore the MS/MS spectra to assign confidence measures for individual sites based on site-determining ions (10–15) or derive localization scores directly from the search engine output (16, 17). Ascore, a representative tool in the rescoring category, scores each candidate phosphosite based upon the peaks representing the site-determining ions and subsequently reports a confidence score for the phosphopeptide sequence (11). This algorithm uses the binomial distribution to compute the probability of a random (incorrect) localization for each candidate site in each spectrum. PhosphoRS extends the scoring approach of Ascore by adjusting the probability of random peak matching based on the density of peaks in different regions of each spectrum (18). 
In contrast, the Mascot Delta score (MD-score) determines the confidence of phosphosite localization on peptides as the difference in Mascot ion scores between the highest scoring phosphopeptide (the peptide reported by the search engine) and the next best scoring phosphopermutation (same peptide sequence, alternative phosphorylation site (17)). Thus, the MD-score represents the second type of approach, which, instead of rescoring MS/MS spectra for the purpose of improved site localization, derives the scores directly from the database search engine output. A similar idea was implemented in the SLIP score using a modified version of the Batch-Tag search engine of the ProteinProspector suite (16) and in the variable modification localization score of the proprietary software Spectrum Mill (9). These tools, however, apply the logic of delta scoring for individual sites, not for the whole peptide; this is an important consideration in the case of multiply phosphorylated peptides. Although these tools have significantly improved the quality of published phosphopeptide identification data, several important issues remain. The level of uncertainty in modification site localization varies significantly across different peptides depending on the total number of candidate sites and the number of phosphorylated residues on the peptide. This, in turn, makes it difficult to compare localization scores between different peptides. Secondly, few algorithms provide a direct estimation of the false localization rate (FLR) in filtered data. Thirdly, most existing algorithms are tied to specific search engines and/or require proprietary libraries (e.g. Ascore and MD-score were developed for SEQUEST and Mascot, respectively; PhosphoRS requires proprietary libraries from Thermo Scientific). This makes it difficult to access these tools and to compare their performance. Here we present LuciPHOr, an alternative approach for site localization and direct FLR estimation. 
We introduce a novel scoring approach that utilizes both peak intensity and mass accuracy to aid the computation of an objective score for phosphosite determination and dynamically adapts to characteristic peak properties in different types of instrumentation and fragmentation methods. LuciPHOr computes the scores for phosphosite permutations and associated FLR estimates for the best scoring prediction at the peptide level. It also reports site-level scores for multiphosphorylated peptides, with an acknowledgment that it is difficult to rigorously estimate the FLR in such cases. We also highlight the practical utility of LuciPHOr, which is capable of processing the results of any database search tool (including commonly used search engines X! Tandem (19), SEQUEST (20), and Mascot (21)) that is supported by the widely used Trans-Proteomic Pipeline (TPP) (22). We benchmark LuciPHOr using two previously published datasets generated using synthetic phosphopeptide libraries and demonstrate similar or better performance relative to the existing methods. We also demonstrate the high accuracy of the FLR estimated by LuciPHOr obtained using a target-decoy modification site framework. Lastly, the performance of LuciPHOr is further investigated using a complex mouse brain dataset, and we also discuss the issue of site-level scoring in the analysis of multiphosphorylated peptides.
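The two families of localization scores described in this entry can be illustrated with a minimal Python sketch. It is not the code of Ascore, PhosphoRS, or the MD-score: the function names, the parameter `p_random` (the assumed chance that a single site-determining ion matches a peak at random), and the `-10 * log10` transform are assumptions made for illustration only.

```python
from math import comb, log10


def chance_match_probability(n_ions: int, n_matched: int, p_random: float) -> float:
    """Binomial probability of matching at least n_matched of n_ions
    site-determining ions purely by chance."""
    if not 0.0 < p_random < 1.0:
        raise ValueError("p_random must lie strictly between 0 and 1")
    if not 0 <= n_matched <= n_ions:
        raise ValueError("n_matched must be between 0 and n_ions")
    return sum(
        comb(n_ions, k) * p_random**k * (1.0 - p_random) ** (n_ions - k)
        for k in range(n_matched, n_ions + 1)
    )


def binomial_site_score(n_ions: int, n_matched: int, p_random: float) -> float:
    """Turn the chance probability into a score (assumed -10*log10 convention):
    the less likely a random match, the higher the score."""
    return -10.0 * log10(chance_match_probability(n_ions, n_matched, p_random))


def delta_score(permutation_scores: list[float]) -> float:
    """Delta-style score: best search-engine score minus the score of the
    next best site permutation of the same peptide."""
    if len(permutation_scores) < 2:
        raise ValueError("need at least two site permutations to compare")
    ranked = sorted(permutation_scores, reverse=True)
    return ranked[0] - ranked[1]
```

For example, `delta_score([30.0, 42.5, 12.0])` compares the best permutation (42.5) with the runner-up (30.0), while `binomial_site_score(6, 5, 0.05)` rewards matching five of six site-determining ions when random matches are rare.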

14.
Given the ease of whole genome sequencing with next-generation sequencers, structural and functional gene annotation is now purely based on automated prediction. However, errors in gene structure are frequent, the correct determination of start codons being one of the main concerns. Here, we combine protein N termini derivatization using (N-Succinimidyloxycarbonylmethyl)tris(2,4,6-trimethoxyphenyl)phosphonium bromide (TMPP Ac-OSu) as a labeling reagent with the COmbined FRActional DIagonal Chromatography (COFRADIC) sorting method to enrich labeled N-terminal peptides for mass spectrometry detection. Protein digestion was performed in parallel with three proteases to obtain a reliable automatic validation of protein N termini. The analysis of these N-terminal enriched fractions by high-resolution tandem mass spectrometry allowed the annotation refinement of 534 proteins of the model marine bacterium Roseobacter denitrificans OCh114. This study is especially efficient regarding mass spectrometry analytical time. Of the 534 validated N termini, 480 confirmed existing gene annotations, 41 highlighted erroneous start codon annotations, and five revealed totally new mis-annotated genes; the mass spectrometry data also suggested the existence of multiple start sites for eight different genes, a result that challenges the current view of protein translation initiation. Finally, we identified several proteins for which classical genome homology-driven annotation was inconsistent, questioning the validity of automatic annotation pipelines and emphasizing the need for complementary proteomic data. All data have been deposited to the ProteomeXchange with identifier PXD000337.
Recent developments in mass spectrometry and bioinformatics have established proteomics as a common and powerful technique for identifying and quantifying proteins at a very broad scale, but also for characterizing their post-translational modifications and interaction networks (1, 2).
In addition to the avalanche of proteomic data currently being reported, many genome sequences are established using next-generation sequencing, fostering proteomic investigations of new cellular models. Proteogenomics is a relatively recent field in which high-throughput proteomic data is used to verify coding regions within model genomes to refine the annotation of their sequences (2–8). Because genome annotation is now fully automated, the need for accurate annotation for model organisms with experimental data is crucial. Many projects related to genome re-annotation of microorganisms with the help of proteomics have been recently reported, such as for Mycoplasma pneumoniae (9), Rhodopseudomonas palustris (10), Shewanella oneidensis (11), Thermococcus gammatolerans (12), Deinococcus deserti (13), Salmonella typhimurium (14), Mycobacterium tuberculosis (15, 16), Shigella flexneri (17), Ruegeria pomeroyi (18), and Candida glabrata (19), as well as for higher organisms such as Anopheles gambiae (20) and Arabidopsis thaliana (4, 5).
The most frequently reported problem in automatic annotation systems is the correct identification of the translational start codon (21–23). The error rate depends on the primary annotation system, but also on the organism, as reported for Halobacterium salinarum and Natronomonas pharaonis (24), Deinococcus deserti (21), and Ruegeria pomeroyi (18), where the error rate is estimated above 10%. Identification of a correct translational start site is essential for the genetic and biochemical analysis of a protein because errors can seriously impact subsequent biological studies. If the N terminus is not correctly identified, the protein will be considered in either a truncated or extended form, leading to errors in bioinformatic analyses (e.g. during the prediction of its molecular weight, isoelectric point, cellular localization) and major difficulties during its experimental characterization.
For example, a truncated protein may be heterologously produced as an unfolded polypeptide recalcitrant to structure determination (25). Moreover, N-terminal modifications, which are poorly documented in annotation databases, may occur (26, 27).
Unfortunately, the poor polypeptide sequence coverage obtained for the numerous low abundance proteins in current shotgun MS/MS proteomic studies implies that the overall detection of N-terminal peptides obtained in proteogenomic studies is relatively low. Different methods for establishing the most extensive list of protein N termini, grouped under the so-called “N-terminomics” theme, have been proposed to selectively enrich or improve the detection of these peptides (2, 28, 29). Large N-terminome studies have recently been reported based on resin-assisted enrichment of N-terminal peptides (30) or terminal amine isotopic labeling of substrates (TAILS) coupled to depletion of internal peptides with a water-soluble aldehyde-functionalized polymer (31–35). Among the numerous N-terminal-oriented methods (2), specific labeling of the N terminus of intact proteins with N-tris(2,4,6-trimethoxyphenyl)phosphonium acetyl succinamide (TMPP-Ac-OSu) has proven reliable (21, 36–39). TMPP-derivatized N-terminal peptides have interesting properties for further LC-MS/MS analysis: (1) an increase in hydrophobicity because of the trimethoxyphenyl moiety added to the peptides, increasing their retention times in reverse phase chromatography, (2) improvement of their ionization because of the introduction of a positively charged group, and (3) a much simpler fragmentation pattern in tandem mass spectrometry.
Other reported approaches rely on acetylation, followed by trypsin digestion, and then biotinylation of free amino groups (40); guanidination of lysine lateral chains followed by N-biotinylation of the N termini and trypsin digestion (41); or reductive amination of all free amino groups with formaldehyde preceding trypsin digestion (42). Recently, we applied the TMPP method to the proteome of the Deinococcus deserti bacterium isolated from upper sand layers of the Sahara desert (13). This method enabled the detection of N-terminal peptides allowing the confirmation of 278 translation initiation codons, the correction of 73 translation starts, and the identification of non-canonical translation initiation codons (21). However, most TMPP-labeled N-terminal peptides are hidden among the more abundant internal peptides generated after proteolysis of a complex proteome, precluding their detection. This results in disproportionately fewer N-terminal validations, that is, 5 and 8% of total polypeptides coded in the theoretical proteomes of Mycobacterium smegmatis (37) and Deinococcus deserti (21) with a total of 342 and 278 validations, respectively.
An interesting chromatographic method to fractionate peptide mixtures for gel-free high-throughput proteome analysis has been developed in recent years and applied to various topics (43, 44). This technique, known as COmbined FRActional DIagonal Chromatography (COFRADIC), uses a double chromatographic separation with a chemical reaction in between to change the physico-chemical properties of the extraneous peptides to be resolved from the peptides of interest. Its previous applications include the separation of methionine-containing peptides (43), N-terminal peptide enrichment (45, 46), sulfur amino acid-containing peptides (47), and phosphorylated peptides (48).
COFRADIC was identified as the best method for identification of N-terminal peptides of two archaea, resulting in the identification of 240 polypeptides (9% of the theoretical proteome) for Halobacterium salinarum and 220 (8%) for Natronomonas pharaonis (24).
Taking advantage of the specificity of TMPP labeling, the resolving power of COFRADIC for enrichment, and the increase in information through the use of multiple proteases, we performed the proteogenomic analysis of a marine bacterium from the Roseobacter clade, namely Roseobacter denitrificans OCh114. This novel approach allowed us to validate and correct 534 unique proteins (13% of the theoretical proteome) with TMPP-labeled N-terminal signatures obtained using high-resolution tandem mass spectrometry. We corrected 41 annotations and detected five new open reading frames in the R. denitrificans genome. We further identified eight distinct proteins showing direct evidence for multiple start sites.

15.
Little is known about the nature of post mortem degradation of proteins and peptides on a global level, the so-called degradome. This is especially true for nonneural tissues. Degradome properties in relation to sampling procedures on different tissues are of great importance for the studies of, for instance, post translational modifications and/or the establishment of clinical biobanks. Here, snap freezing of fresh (<2 min post mortem time) mouse liver and pancreas tissue is compared with rapid heat stabilization with regard to effects on the proteome (using two-dimensional differential in-gel electrophoresis) and peptidome (using label free liquid chromatography). We report several proteins and peptides that exhibit heightened degradation sensitivity, for instance superoxide dismutase in liver, and peptidyl-prolyl cis-trans isomerase and insulin C-peptides in pancreas. Tissue sampling based on snap freezing produces a greater amount of degradation products and lower levels of endogenous peptides than rapid heat stabilization. We also demonstrate that solely snap freezing related degradation can be attenuated by subsequent heat stabilization. We conclude that tissue sampling involving a rapid heat stabilization step is preferable to freezing with regard to proteomic and peptidomic sample quality.
The evolving maturation of the field of proteomics has, in the same way as in genomics, highlighted the need for better sampling procedures and sample preparation methodologies to minimize the effect of post mortem alterations. The aspect of sample quality is not new in any way and is relevant in most biomedical fields but has only lately started to receive adequate attention. The main factors influencing sample quality are the storage temperature of the body until tissue removal (foremost a problem in clinical settings and extraction of less accessible tissue samples from model organisms) and the post mortem interval (PMI) (1–3).
Post mortem degradation during the PMI is a well-known problem when studying endogenous peptides (2, 3) and has also been proven to affect the results of polypeptide (here defined as proteins larger than 10 kDa) studies (3–8). PMI degradation has mainly been studied on human or mouse brain tissue, using two-dimensional electrophoresis (2-DE), SDS-PAGE, and immunoblotting (1, 3–12). There are also a few proteomic studies on muscle tissue degradation in livestock (13–16).
We and others have previously explored the effect of focused microwave irradiation with regard to sample quality, demonstrating that this method is more reliable than snap freezing in liquid nitrogen, especially with regard to post-translational modification (PTM) stability (2, 3, 17–20). An alternative method based on cryostat dissection with subsequent heat treatment through boiling has also been reported to improve endogenous peptide sample quality (21). Besides focused microwave irradiation, which is specifically used for rodent brain tissue sampling, we have also demonstrated the efficiency of rapid heat stabilization through conductivity with regard to sample degradation (3, 22). Although somewhat constrained by its dependence on how quickly the tissue is harvested from the body, the latter procedure has the added advantage that it can be used on any type of tissue and species, fresh as well as frozen. This study will compare effects of sampling procedures on the liver and pancreas degradome following rapid heat stabilization, the more traditional snap freezing, or the combination of snap freezing with subsequent heat stabilization.
To summarize, this study investigated the effects of post mortem degradation in pancreas and liver. Both tissues are well studied because of their multiple functions in the body and their involvement in different diseases such as diabetes or hepatocarcinoma.
Pancreas is especially interesting in this context as it displays endocrine secretion of peptides, and exocrine secretion of digestive enzymes, the latter making it a protease rich tissue. We used both two-dimensional difference in gel electrophoresis (2D-DIGE) and label free liquid chromatography mass spectrometry (LC-MS) based differential peptide display (2, 18), the latter to better investigate changes in small molecular fragments that are not easily detectable by gel-based methods. 2D-DIGE is an unrivaled methodology to characterize alterations in isoform patterns, which is an important aspect considering that post-translational modifications (PTMs) such as phosphorylations are especially sensitive to post mortem influence within a few minutes PMI (3). The peptidomics approach has been used in several studies to point out early post mortem changes and protein degradation that tissue undergoes following sampling and is therefore a well-suited method (3, 18, 22).

16.
17.
The use of electron transfer dissociation (ETD) fragmentation for analysis of peptides eluting in liquid chromatography tandem mass spectrometry experiments is increasingly common and can allow identification of many peptides and proteins in complex mixtures. Peptide identification is performed through the use of search engines that attempt to match spectra to peptides from proteins in a database. However, software for the analysis of ETD fragmentation data is currently less developed than equivalent algorithms for the analysis of the more ubiquitous collision-induced dissociation fragmentation spectra. In this study, a new scoring system was developed for analysis of peptide ETD fragmentation data that varies the ion type weighting depending on the precursor ion charge state and peptide sequence. This new scoring regime was applied to the analysis of data from previously published results where four search engines (Mascot, Open Mass Spectrometry Search Algorithm (OMSSA), Spectrum Mill, and X!Tandem) were compared (Kandasamy, K., Pandey, A., and Molina, H. (2009) Evaluation of several MS/MS search algorithms for analysis of spectra derived from electron transfer dissociation experiments. Anal. Chem. 81, 7170–7180). Protein Prospector identified 80% more spectra at a 1% false discovery rate than the most successful alternative searching engine in this previous publication. These results suggest that other search engines would benefit from the application of similar rules.
The recently developed fragmentation approach of electron transfer dissociation (ETD) has become a genuine alternative to the more ubiquitous collision-induced dissociation (CID) for high throughput and high sensitivity proteomic analysis (1–3).
ETD (4) and the related fragmentation process electron capture dissociation (ECD) (5) have been demonstrated to have particular advantages for the analysis of large peptides and small proteins (6–8) as well as the analysis of peptides bearing labile post-translational modifications (9–11). The results achieved through ETD and ECD analysis have been shown to be highly complementary to those obtained through CID fragmentation analysis, both through increasing confidence in particular identifications of peptides and also by allowing identification of extra components in complex mixtures (10, 12, 13). As CID and ETD can be sequentially or alternatively performed on precursor ions in the same mass spectrometric run, it is expected that the combined use of these two fragmentation analysis techniques will become increasingly common to enable more comprehensive sample analysis.
Software for analysis of CID spectra is significantly more advanced than that for ECD/ETD data. This is partly because the behavior of peptides under CID fragmentation is better characterized and understood so software has been developed that is better able to predict the fragment ions expected. The fragment ion types observed in ETD and ECD are largely known (5, 14, 15), but information about the frequency and peak intensities of the different ion types observed is less well documented.
We recently performed a study to characterize how frequently the different fragment ion types are detected in ETD spectra when analyzing complex digest mixtures produced by proteolytic enzymes or chemical cleavage reagents of different sequence specificity (16). These results were analyzed with respect to precursor charge state and location of basic residues, which were both shown to be significant factors in controlling the fragment ion types observed.
The results showed that ETD spectra of doubly charged precursor ions produced very different fragment ions depending on the location of a basic residue in the sequence.
Based on this statistical analysis of ETD data from a diverse range of peptides (16), in the present study, a new scoring system was developed and implemented in the search engine Batch-Tag within Protein Prospector that adjusts the weighting for different fragment ion types based on the precursor charge state and the presence of basic amino acid residues at either peptide terminus. The results using this new scoring system were compared with the previous generation of Batch-Tag, which used ion score weightings based on the average frequency of observation of different fragment types in ETD spectra of tryptic peptides and used the same scoring irrespective of precursor charge and sequence. The performance of this new scoring was also compared with those reported by other search engines using results previously published from a large standard data set (17). The new scoring system allowed identification of significantly more spectra than achieved with the previous scoring system. It also assigned 80% more spectra than the most successful of the compared search engines when using the same false discovery rate threshold.
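The idea of choosing fragment ion weights from the precursor charge state and the position of basic residues can be sketched as a simple lookup. This is only an illustration of the structure of such a rule, not the Batch-Tag implementation: the weight values are placeholders rather than figures from the study, treating K, R, and H as the basic residues is an assumption, and all names are hypothetical.

```python
# Residues treated as basic here (an assumption for this sketch).
BASIC_RESIDUES = frozenset("KRH")

# Placeholder weights keyed by (precursor charge, basic terminus).
# The numbers are invented for illustration and carry no experimental meaning.
ION_WEIGHTS = {
    (2, "C"): {"c": 0.5, "z": 1.0},
    (2, "N"): {"c": 1.0, "z": 0.5},
}
# Fallback used when no specific rule is defined.
DEFAULT_WEIGHTS = {"c": 1.0, "z": 1.0}


def basic_terminus(peptide: str) -> str:
    """Report which terminus carries a basic residue: 'N', 'C', 'both', or 'none'."""
    at_n = peptide[0] in BASIC_RESIDUES
    at_c = peptide[-1] in BASIC_RESIDUES
    if at_n and at_c:
        return "both"
    if at_n:
        return "N"
    if at_c:
        return "C"
    return "none"


def weighted_score(peptide: str, charge: int, matched_ions: dict[str, int]) -> float:
    """Sum matched ion counts, each scaled by the weight selected for this
    precursor charge state and basic-residue position."""
    weights = ION_WEIGHTS.get((charge, basic_terminus(peptide)), DEFAULT_WEIGHTS)
    return sum(weights.get(ion, 0.0) * count for ion, count in matched_ions.items())
```

With this layout, the same set of matched ions, for example `{"c": 2, "z": 4}`, yields different scores for `"PEPTIDEK"` and `"KPEPTIDE"` at charge 2, which mirrors the dependence on charge state and sequence described above.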

18.
N-terminal acetylation (Nt-acetylation) is a highly abundant protein modification in eukaryotes catalyzed by N-terminal acetyltransferases (NATs), which transfer an acetyl group from acetyl coenzyme A to the alpha amino group of a nascent polypeptide. Nt-acetylation has emerged as an important protein modifier, steering protein degradation, protein complex formation and protein localization. Very recently, it was reported that some human proteins could carry a propionyl group at their N-terminus. Here, we investigated the generality of N-terminal propionylation by analyzing its proteome-wide occurrence in yeast and we identified 10 unique in vivo Nt-propionylated N-termini. Furthermore, by performing differential N-terminome analysis of a control yeast strain (yNatA), a yeast NatA deletion strain (yNatAΔ) or a yeast NatA deletion strain expressing human NatA (hNatA), we were able to demonstrate that in vivo Nt-propionylation of several proteins, displaying a NatA type substrate specificity profile, depended on the presence of either yeast or human NatA. Furthermore, in vitro Nt-propionylation assays using synthetic peptides, propionyl coenzyme A, and either purified human NATs or immunoprecipitated human NatA, clearly demonstrated that NATs are Nt-propionyltransferases (NPTs) per se. We here demonstrate for the first time that Nt-propionylation can occur in yeast and thus is an evolutionarily conserved process, and that the NATs are multifunctional enzymes acting as NPTs in vivo and in vitro, in addition to their main role as NATs, and their potential function as lysine acetyltransferases (KATs) and noncatalytic regulators.
Modifications greatly increase the diversity of a cell's proteome, which is otherwise confined by the natural amino acids.
As more than 80% of human proteins, more than 70% of plant and fly proteins and more than 60% of yeast proteins are N-terminally acetylated (Nt-acetylated), this modification represents one of the most common protein modifications in eukaryotes (1–5). Recent studies have pointed to distinct functional consequences of Nt-acetylation (6): creating degradation signals recognized by a ubiquitin ligase of a new branch of the N-end rule pathway (7), preventing translocation across the endoplasmic reticulum membrane (8), and mediating protein complex formation (9). Nt-acetylation further appears to be essential for life in higher eukaryotes; for instance, a mutation in the major human N-terminal acetyltransferase (NAT), hNatA, was recently shown to be the cause of Ogden syndrome by which male infants are underdeveloped and die in infancy (10). Unlike lysine acetylation, Nt-acetylation is considered an irreversible process, and further, to mainly occur on the ribosome during protein synthesis (11–15). In yeast and humans, three NAT complexes are responsible for the majority of Nt-acetylation; NatA, NatB and NatC, each of which has a defined substrate specificity (16). NatA acetylates Ser-, Ala-, Gly-, Thr-, Val- and Cys- N-termini generated on removal of the initiator methionine (iMet) (1, 17–19). NatB and NatC acetylate N-termini in which the iMet is followed by an acidic (20–23) or a hydrophobic residue respectively (24–26). Naa40p/NatD was shown to acetylate the Ser-starting N-termini of histones H2A and H4 (27, 28). NatE, composed of the catalytic Naa50p (Nat5p) has substrate specificity toward iMet succeeded by a hydrophobic amino acid (29, 30). As largely the same Nt-acetylation patterns are found in yeast and humans, it was believed that the NAT-machineries were conserved in general (31).
However, the recently discovered higher eukaryotic specific NAT, Naa60p/NatF, was found to display a partially distinct substrate specificity in part explaining the higher degree of Nt-acetylation in higher versus lower eukaryotes (4).
Human NatA is composed of two main subunits: the catalytic subunit hNaa10p and the auxiliary subunit, hNaa15p that is presumably responsible for anchoring the complex to the ribosome (14, 19). The chaperone-like HYPK protein is also stably associated with the NatA subunits and may be essential for efficient NatA activity (32). In addition, hNaa50p was shown to be physically associated with hNatA, however it is believed not to affect NatA activity (14, 33, 34). hNaa50p was also shown to exhibit Nε-acetyltransferase (KAT) activity (29), however, the structure of hNaa50p with its peptide substrate bound strongly indicates that the peptide binding pocket is specifically suited to accommodate N-terminal peptides, as opposed to lysine residues (35). The human NatA subunits are associated with ribosomes, but interestingly, significant fractions are also nonribosomal (19, 30, 32). Of further note, the catalytic subunits, hNaa10p and hNaa50p, were also found to partially act independently of the hNatA complex (30, 36).
Recent studies have identified novel in vivo acyl modifications of proteins. Mass spectrometry data of affinity-enriched acetyllysine-containing peptides from HeLa cells showed the presence of propionylated and butyrylated lysines in histone H4 peptides (37). Similar analyses also showed the presence of propionylated lysines in p53, p300 and CREB-binding protein (38) besides the yeast histones H2B, H3 and H4 (39). Propionylated or butyrylated residues differ by only one or two extra methyl moieties as compared with their acetylated counterparts, thereby adding more hydrophobicity and bulkiness to the affected residue.
To date, no distinct propionyl- or butyryltransferases responsible for these modifications have been identified. However, by using propionyl coenzyme A (Prop-CoA) or butyryl coenzyme A (But-CoA) as donors in the enzyme reaction, it was shown that some of the previously characterized lysine acetyltransferases (KATs) are able to respectively catalyze propionylation and butyrylation of lysine residues both in vitro (37, 40–42) and in vivo (38, 41). Similarly, it has been shown that lysine deacetylases also are capable of catalyzing depropionylation (40, 41, 43, 44) and debutyrylation (44) (see review (45)).
Interestingly, mass spectrometry data also suggested that propionylated N-termini are present in human cell lines (46, 47). Until today, an N-terminal propionyl transferase (NPT) catalyzing N-terminal propionylation (Nt-propionylation) has to our knowledge not been identified.
In this study, we hypothesized that NATs might have the ability to act as NPTs. In vitro experiments using purified hNaa10p, hNaa50p or immunoprecipitated human NatA complex indeed confirmed their intrinsic capacity to catalyze Nt-propionylation toward synthetic peptides. NatA was also found capable of Nt-butyrylation in vitro. By means of N-terminomics, we further investigated the presence of yeast Nt-propionylated proteins in vivo. Indeed, we found evidence for Nt-propionylation being a naturally occurring modification in yeast. Interestingly, in a yeast strain lacking NatA, we observed a loss in Nt-propionylation and Nt-acetylation for several NatA substrates, as compared with a control yeast strain expressing endogenous NatA or a strain ectopically expressing hNatA. Thus, besides acting as NATs, yeast and human NatA can act as NPTs and we thus demonstrate for the first time that NATs have the capacity of both acetylating and propionylating protein N-termini in vivo and in vitro.

19.
20.
A crucial component of the analysis of shotgun proteomics datasets is the search engine, an algorithm that attempts to identify the peptide sequence from the parent molecular ion that produced each fragment ion spectrum in the dataset. There are many different search engines, both commercial and open source, each employing a somewhat different technique for spectrum identification. The set of high-scoring peptide-spectrum matches for a defined set of input spectra differs markedly among the various search engine results; individual engines each provide unique correct identifications among a core set of correlative identifications. This has led to the approach of combining the results from multiple search engines to achieve improved analysis of each dataset. Here we review the techniques and available software for combining the results of multiple search engines and briefly compare the relative performance of these techniques.
The most commonly used proteomics approach, shotgun proteomics, has become an invaluable tool for the high-throughput characterization of proteins in biological samples (1). This workflow relies on the combination of protein digestion, liquid chromatography (LC) separation, tandem mass spectrometry (MS/MS), and sophisticated data analysis in its aim to derive an accurate and complete set of peptides and their inferred proteins that are present in the sample being studied. Although many variations are possible, the typical workflow begins with the digestion of proteins into peptides with a protease, typically trypsin. The resulting peptide mixture is first separated via LC and then subjected to mass spectrometry (MS) analysis. The MS instrument acquires fragment ion spectra on a subset of the peptide precursor ions that it measures.
From the MS/MS spectra that measure the abundance and mass of the peptide ion fragments, peptides present in the mixture are identified and proteins are inferred by means of downstream computational analysis.
The informatics component of the shotgun proteomics workflow is crucial for proper data analysis (2), and a wide variety of tools have emerged for this purpose (3). The typical informatics workflow can be summarized in a few steps: conversion from vendor proprietary formats to an open format, high-throughput interpretation of the MS/MS spectra with a search engine, and statistical validation of the results with estimation of the false discovery rate at a selected score threshold. Various tools for measuring relative peptide abundances may be applied, depending on the type of quantitation technique applied in the experiment. Finally, the proteins present, and their abundance in the sample, are inferred based on the peptide identifications.
One of the most computationally intensive and diverse steps in the computational analysis workflow is the use of a search engine to interpret the MS/MS spectra in order to determine the best matching peptide ion identifications (4), termed peptide-spectrum matches (PSMs). There are three main types of engines: sequence search engines such as X!Tandem (5), Mascot (6), SEQUEST (7), MyriMatch (8), MS-GFDB (9), and OMSSA (10), which attempt to match acquired spectra with theoretical spectra generated from possible peptide sequences contained in a protein sequence list; spectral library search engines such as SpectraST (11), X!Hunter (12), and Bibliospec (13), which attempt to match spectra with a library of previously observed and identified spectra; and de novo search engines such as PEAKS (14), PepNovo (15), and Lutefisk (16), which attempt to derive peptide identifications based on the MS/MS spectrum peak patterns alone, without reference sequences or previous spectra (17).
Additionally, elements of de novo sequencing (short sequence tag extraction) and database searching have been combined to create hybrid search engines such as InSpecT (18) and PEAKS-DB (19).
The goal of this review is to evaluate the potential improvement made possible by combining the search results of multiple search engines. On their own, most of the common search engines perform well on typical datasets, with the results having significant overlap between the algorithms (20); and yet, the degree to which there is divergence in the results of different search engines remains quite high. Disagreement between search engines, where multiple different peptide sequences are identified with high confidence, is quite rare. It is much more common to observe different engines being in agreement on the correct identification, yet with neither of the identifications having a probability high enough to allow it to pass the selected error criterion when analyzed independently. When the results are analyzed together, the agreement on the identification might propel the PSM to pass the same error criterion. In cases when only one engine scores the PSM highly enough that it passes an acceptance threshold, these identifications are reported within the acceptable error rate. Also, some engines use unique methods to consider peptides or modifications not considered by other engines. Even if the experimenter is careful to choose similar search parameters when running multiple tools, different search engines will allow one to set non-identical search parameters, which contributes to reduced overlap between the search results. Spectral library search engines tend to be far more sensitive and specific than sequence search engines, but only for peptide ions for which there is a spectrum in the library.
Given that different search engines excel at identifying different subsets of PSMs, it seems natural to combine the power of multiple search engines to achieve a single, better result.
Many algorithms and software tools have emerged that combine search results (21–28), each demonstrating an improved final result over any individual search engine alone. Such improved results come at the cost of the increased complexity of managing multiple searches in an analysis pipeline, as well as a several-fold increase in computational time in what is already the most computationally expensive step. However, with the ever-growing availability of fast computers, computing clusters, and cloud computing resources, researchers now have within reach the ability to quickly search their MS/MS data using several of the still-growing number of search engine algorithms. In some cases the open-source search engines are quite similar to their commercial alternatives; for example, Comet (29) is very similar to SEQUEST. Given the significant amount of time the average researcher takes to design an experiment, process the samples, and acquire the data, it is natural that a researcher would wish to maximize the number and confidence of peptide and protein identifications in each dataset with a rigorous computational analysis. Furthermore, when using label-free spectral counting for abundance analysis, maximizing the number of PSMs increases the dynamic range and accuracy of the quantitative approach (23). Therefore, the demand to use multiple search engines and integrate their results with the goal of maximizing the amount of information gleaned from each dataset is expected to continue growing. Also emerging are software tools that use several iterative database searching passes of the same data, combining multiple database search tools, searches with different post-translational modifications, and searches against different databases in an attempt to use each specific tool under ideal conditions, utilizing each for its specific strengths and integrating the results (30). Some relevant aspects of such strategies are discussed by Tharakan et al.
(31). In the following sections, we review the various approaches and software programs available to assist with the merging of results from different search engines. We also provide a performance comparison of the various approaches described here on a test dataset to assess the expected performance gains from the various described methods.
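
The observation that agreement between engines can lift a PSM over an error threshold that neither engine reaches alone can be illustrated with a small Python sketch. This is not any of the tools reviewed in the abstract: the function name `combine_psms`, the input layout, and the 0.9 threshold are hypothetical, and the combination rule (treating engines as independent, so the combined probability is 1 minus the product of the individual error probabilities) is a deliberately naive assumption. Real engines analyze the same spectra and are correlated, so this rule overstates confidence.

```python
from collections import defaultdict


def combine_psms(engine_results, threshold=0.9):
    """Naively merge per-engine PSM probabilities (illustrative only).

    engine_results: {engine_name: {spectrum_id: (peptide, probability)}}
    Returns {spectrum_id: (peptide, combined_probability)} for PSMs
    whose combined probability reaches the threshold.
    """
    # Group the probabilities each engine assigned, per spectrum and peptide.
    votes = defaultdict(lambda: defaultdict(list))
    for psms in engine_results.values():
        for spectrum_id, (peptide, prob) in psms.items():
            votes[spectrum_id][peptide].append(prob)

    accepted = {}
    for spectrum_id, by_peptide in votes.items():
        best_peptide, best_prob = None, 0.0
        for peptide, probs in by_peptide.items():
            # Assumed independence: all supporting engines must be wrong
            # for the identification to be wrong.
            all_wrong = 1.0
            for p in probs:
                all_wrong *= 1.0 - p
            combined = 1.0 - all_wrong
            if combined > best_prob:
                best_peptide, best_prob = peptide, combined
        if best_prob >= threshold:
            accepted[spectrum_id] = (best_peptide, best_prob)
    return accepted


# Hypothetical results from two engines on three spectra.
results = combine_psms({
    "engine_a": {"s1": ("PEPTIDEK", 0.7), "s2": ("LVNELTEFAK", 0.95), "s3": ("AAAK", 0.6)},
    "engine_b": {"s1": ("PEPTIDEK", 0.8), "s3": ("GGGK", 0.5)},
})
```

In this toy input, spectrum `s1` is accepted only because both engines agree on the same peptide, `s2` is accepted on the strength of a single engine, and `s3` is rejected because the engines disagree and neither identification is confident enough.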