Similar literature
1.
2.
3.
A decoding algorithm is tested that mechanistically models the progressive alignments that arise as the mRNA moves past the rRNA tail during translation elongation. Each of these alignments provides an opportunity for hybridization between the single-stranded, 3′-terminal nucleotides of the 16S rRNA and the spatially accessible window of mRNA sequence, from which a free energy value can be calculated. Using this algorithm, we show that a periodic energetic pattern of frequency 1/3 is revealed. This periodic signal exists in the majority of coding regions of eubacterial genes, but not in the non-coding regions encoding the 16S and 23S rRNAs. Signal analysis reveals that the population of coding regions of each bacterial species has a mean phase that is correlated in a statistically significant way with species (G + C) content. These results suggest that the periodic signal could function as a synchronization signal for the maintenance of reading frame and that codon usage provides a mechanism for manipulation of signal phase.
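The frequency-1/3 claim in the abstract above can be illustrated with a single-frequency Fourier analysis. The sketch below is only an illustration under stated assumptions: the `energies` values are synthetic (not hybridization free energies computed from real sequences), and the function names `fourier_component`, `power`, and `phase` are hypothetical. It is not the authors' decoding algorithm; it only shows how power and phase at frequency 1/3 could be read off a per-nucleotide signal.

```python
import cmath
import math

def fourier_component(signal, freq):
    """Complex Fourier coefficient of the mean-centred signal at `freq` cycles per position."""
    mean = sum(signal) / len(signal)
    return sum((x - mean) * cmath.exp(-2j * math.pi * freq * k)
               for k, x in enumerate(signal))

def power(signal, freq):
    """Spectral power at `freq`, normalised by signal length."""
    return abs(fourier_component(signal, freq)) ** 2 / len(signal)

def phase(signal, freq):
    """Phase (radians) of the component at `freq`."""
    return cmath.phase(fourier_component(signal, freq))

# Synthetic, assumed per-nucleotide energy values that repeat every codon (3 positions).
energies = [-1.0, 0.2, 0.5] * 30

print("power at 1/3:", power(energies, 1 / 3))
print("power at 1/5:", power(energies, 1 / 5))
print("phase at 1/3:", phase(energies, 1 / 3))
```

Shifting the signal by one position rotates the phase at frequency 1/3 by 2π/3, which is why the phase can carry reading-frame information in a signal of this kind.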

4.
5.
A Boolean network is a model used to study the interactions between different genes in genetic regulatory networks. In this paper, we present several algorithms using gene ordering and feedback vertex sets to identify singleton attractors and small attractors in Boolean networks. We analyze the average-case time complexities of some of the proposed algorithms. For instance, it is shown that, for a bounded maximum indegree, the outdegree-based ordering algorithm for finding singleton attractors is much faster than the naive algorithm, which examines all 2^n states of a network with n genes. We performed extensive computational experiments on these algorithms, which resulted in good agreement with theoretical results. In contrast, we give a simple and complete proof showing that finding an attractor with the shortest period is NP-hard.
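For orientation, the naive baseline mentioned in this abstract can be sketched as an exhaustive search over all 2^n states for fixed points of the update rule. This is only the baseline, not the outdegree-based ordering algorithm of the paper, and the three-gene `network` below is a hypothetical example invented for illustration.

```python
from itertools import product

def singleton_attractors(update_functions):
    """Return every state that maps to itself under synchronous update.

    `update_functions[i]` takes the full state tuple and returns the
    next value (0 or 1) of gene i. All 2^n states are enumerated.
    """
    n = len(update_functions)
    attractors = []
    for state in product((0, 1), repeat=n):
        next_state = tuple(int(f(state)) for f in update_functions)
        if next_state == state:
            attractors.append(state)
    return attractors

# Hypothetical 3-gene network (maximum indegree 2).
network = [
    lambda s: s[1] and s[2],  # gene 0 = gene 1 AND gene 2
    lambda s: s[0] or s[2],   # gene 1 = gene 0 OR gene 2
    lambda s: s[2],           # gene 2 keeps its own value
]

print(singleton_attractors(network))
```

The ordering-based algorithms described in the abstract improve on this baseline by fixing gene values one at a time and abandoning partial assignments that already contradict a fixed point.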

6.
Allergenic proteins such as grass pollen and house dust mite (HDM) proteins are known to trigger hypersensitivity reactions of the immune system, leading to what is commonly known as allergy. Key allergenic proteins, including sequence variants, have been identified, but characterization of their post-translational modifications (PTMs) is still limited. Here, we present a detailed PTM characterization of a series of the main and clinically relevant allergens used in allergy tests and vaccines. We employ Orbitrap-based mass spectrometry with complementary fragmentation techniques (HCD/ETD) for site-specific PTM characterization by bottom-up analysis. In addition, top-down mass spectrometry is utilized for targeted analysis of individual proteins, revealing hitherto unknown PTMs of HDM allergens. We demonstrate the presence of lysine-linked polyhexose glycans and asparagine-linked N-acetylhexosamine glycans on HDM allergens. Moreover, we identified more complex glycan structures than previously reported on the major grass pollen group 1 and 5 allergens, implicating important roles for carbohydrates in allergen recognition and response by the immune system. The new findings are important for understanding basic disease-causing mechanisms at the cellular level, which ultimately may pave the way for instigating novel approaches for targeted desensitization strategies and improved allergy vaccines. Allergic respiratory disease is a global health problem, and current clinical guidelines recommend a combination of allergen avoidance, pharmacotherapy, and allergen-specific immunotherapy for treatment (1–4). At present, allergy testing and vaccines are based on isolated crude antigen preparations from natural sources (e.g. HDM and pollens), but a move toward recombinant allergen design is ongoing (5, 6).
This could have important functional implications because the production host will determine the repertoire of post-translational modifications (PTMs), and in particular the glycan modifications, presented on allergens. The carbohydrate structures found on allergens are in most cases not found in mammals and therefore frequently lead to the induction of IgE antibodies directed against cross-reactive carbohydrate determinants (CCDs) (7–11). Moreover, glycans may be directly involved in promoting uptake by targeting allergens to carbohydrate-binding lectin receptors on antigen-presenting cells (APCs) (12–14). Therefore, a full structural characterization of the glycans on the natural allergens is a prerequisite for understanding both antibody reactivity and lectin receptor-mediated allergen recognition and modulation of the immune response (15, 16). Furthermore, a detailed characterization of PTMs of allergens is important for standardization of allergen products for diagnostic purposes as well as for vaccine use (17, 18). Although many major allergens and their etiology have been characterized in some detail, structural information on, for example, their immunologically important PTM status is still incomplete (19–21). Mass spectrometry-based technologies offer sensitive and accurate analyses for identification and characterization of proteins. The common proteomics workflow typically adopts the bottom-up approach, i.e. in vitro proteolytic digestion of proteins followed by nanoflow liquid chromatography-tandem mass spectrometry (nLC-MS/MS) for protein identification and PTM characterization. Electron- or collision-driven fragmentation techniques, e.g. electron transfer dissociation (ETD) (22) or higher energy collisional dissociation (HCD) (23), have enabled accurate identification of peptides from purified proteins, e.g. allergens (21, 24), or from complex biological samples (25–27), with concurrent characterization of their PTMs.
One advantage of bottom-up mass spectrometry is the ability to resolve modified peptides within a narrow chromatographic time frame, thereby enabling in-depth characterization of site-specific features, e.g. glycoforms, on peptides. This peptide-level information is subsequently used to generate a protein-level view on the PTM status for a given protein. Importantly, the PTM connectivity of the protein (28) is lost upon proteolytic digestion, and alternative approaches are often required for comprehensive characterization of all proteoforms (29). Top-down mass spectrometry has emerged as an alternative approach to bottom-up proteomics, offering complementary MS and MS/MS information that may be used for protein identification and characterization (30, 31). With top-down MS, intact proteins are typically analyzed by high-resolution FTMS and characterized at the MS/MS level by CID, HCD, ECD, or ETD. This technique provides instant protein-level information on analytes, e.g. sequence variants, amino acid substitutions, PTMs, etc., which can be verified at the MS/MS level by different fragmentation modes. The combination of bottom-up and top-down mass spectrometry is therefore a powerful tool for the identification and characterization of proteins. Here, we combine top-down and bottom-up mass spectrometry for comprehensive characterization of seven major allergens as a first step toward unraveling the molecular mode of action of allergens with complex PTMs. By these methods, we demonstrate hitherto unknown PTMs of HDM allergens and identify more complex glycan structures than previously reported on the major grass pollen group 1 and 5 allergens. The new findings implicate important roles for carbohydrates in allergen recognition and response by the immune system.

7.
A complete understanding of the biological functions of large signaling peptides (>4 kDa) requires comprehensive characterization of their amino acid sequences and post-translational modifications, which presents significant analytical challenges. In the past decade, there has been great success with mass spectrometry-based de novo sequencing of small neuropeptides. However, these approaches are less applicable to larger neuropeptides because of the inefficient fragmentation of peptides larger than 4 kDa and their lower endogenous abundance. The conventional proteomics approach focuses on large-scale determination of protein identities via database searching, lacking the ability for in-depth elucidation of individual amino acid residues. Here, we present a multifaceted MS approach for identification and characterization of large crustacean hyperglycemic hormone (CHH)-family neuropeptides, a class of peptide hormones that play central roles in the regulation of many important physiological processes of crustaceans. Six crustacean CHH-family neuropeptides (8–9.5 kDa), including two novel peptides with extensive disulfide linkages and PTMs, were fully sequenced without reference to genomic databases. High-definition de novo sequencing was achieved by a combination of bottom-up, off-line top-down, and on-line top-down tandem MS methods. Statistical evaluation indicated that these methods provided complementary information for sequence interpretation and increased the local identification confidence of each amino acid. Further investigations by MALDI imaging MS mapped the spatial distribution and colocalization patterns of various CHH-family neuropeptides in the neuroendocrine organs, revealing that two CHH-subfamilies are involved in distinct signaling pathways. Neuropeptides and hormones comprise a diverse class of signaling molecules involved in numerous essential physiological processes, including analgesia, reward, food intake, learning and memory (1).
Disorders of the neurosecretory and neuroendocrine systems influence many pathological processes. For example, obesity results from failure of energy homeostasis in association with endocrine alterations (2, 3). Previous work from our lab, using crustaceans as model organisms, found that multiple neuropeptides were implicated in the control of food intake, including RFamides, tachykinin-related peptides, RYamides, and pyrokinins (4–6). Crustacean hyperglycemic hormone (CHH) family neuropeptides play a central role in energy homeostasis of crustaceans (7–17). The hyperglycemic response of the CHHs was first reported after injection of crude eyestalk extract in crustaceans. Based on their preprohormone organization, the CHH family can be grouped into two subfamilies: subfamily-I containing CHH, and subfamily-II containing molt-inhibiting hormone (MIH) and mandibular organ-inhibiting hormone (MOIH). The preprohormones of subfamily-I have a CHH precursor-related peptide (CPRP) that is cleaved off during processing, whereas preprohormones of subfamily-II lack the CPRP (9). Uncovering their physiological functions will provide new insights into neuroendocrine regulation of energy homeostasis. Characterization of CHH-family neuropeptides is challenging. They comprise more than 70 amino acids and often contain multiple post-translational modifications (PTMs) and complex disulfide bridge connections (7). In addition, physiological concentrations of these peptide hormones are typically below the picomolar level, and most crustacean species do not have available genome and proteome databases to assist MS-based sequencing. MS-based neuropeptidomics provides a powerful tool for rapid discovery and analysis of a large number of endogenous peptides from the brain and the central nervous system. Our group and others have greatly expanded the peptidomes of many model organisms (3, 18–33).
For example, we have discovered more than 200 neuropeptides, with several neuropeptide families consisting of as many as 20–40 members, in a simple crustacean model system (5, 6, 25–31, 34). However, a majority of these neuropeptides are small peptides 5–15 amino acid residues long, leaving a gap in the identification of larger signaling peptides from organisms without a sequenced genome. The observed lack of larger peptide hormones can be attributed to the lack of effective de novo sequencing strategies for neuropeptides larger than 4 kDa, which are inherently more difficult to fragment using conventional techniques (34–37). Although classical proteomics studies examine larger proteins, these tools are limited to identification based on database searching, with one or more peptides matching but without complete amino acid sequence coverage (36, 38). Large populations of neuropeptides from 4–10 kDa exist in the nervous systems of both vertebrates and invertebrates (9, 39, 40). Understanding their functional roles requires sufficient molecular knowledge and a unique analytical approach. Therefore, developing effective and reliable methods for de novo sequencing of large neuropeptides at the individual amino acid residue level is an urgent need in neurobiology. In this study, we present a multifaceted MS strategy aimed at high-definition de novo sequencing and comprehensive characterization of the CHH-family neuropeptides in the crustacean central nervous system. The high-definition de novo sequencing was achieved by a combination of three methods: (1) enzymatic digestion and LC-tandem mass spectrometry (MS/MS) bottom-up analysis to generate detailed sequences of proteolytic peptides; (2) off-line LC fractionation and subsequent top-down MS/MS to obtain high-quality fragmentation maps of intact peptides; and (3) on-line LC coupled to top-down MS/MS to allow rapid sequence analysis of low-abundance peptides.
Combining the three methods overcomes the limitations of each, and thus offers complementary and high-confidence determination of amino acid residues. We report the complete sequence analysis of six CHH-family neuropeptides including the discovery of two novel peptides. With the accurate molecular information, MALDI imaging and ion mobility MS were conducted for the first time to explore their anatomical distribution and biochemical properties.

8.
9.
Posttranslational modifications of proteins increase the complexity of the cellular proteome and enable rapid regulation of protein functions in response to environmental changes. Protein ubiquitylation is a central regulatory posttranslational modification that controls numerous biological processes including proteasomal degradation of proteins, DNA damage repair and innate immune responses. Here we combine high-resolution mass spectrometry with single-step immunoenrichment of di-glycine modified peptides for mapping of endogenous putative ubiquitylation sites in murine tissues. We identify more than 20,000 unique ubiquitylation sites on proteins involved in diverse biological processes. Our data reveal that ubiquitylation regulates core signaling pathways common to each of the studied tissues. In addition, we discover that ubiquitylation regulates tissue-specific signaling networks. Many tissue-specific ubiquitylation sites were obtained from brain, highlighting the complexity and unique physiology of this organ. We further demonstrate that different di-glycine-lysine-specific monoclonal antibodies exhibit sequence preferences, and that their complementary use increases the depth of ubiquitylation site analysis, thereby providing a more unbiased view of protein ubiquitylation. Ubiquitin is a small 76-amino-acid protein that is conjugated to the ε-amino group of lysines in a highly orchestrated enzymatic cascade involving ubiquitin activating (E1), ubiquitin conjugating (E2), and ubiquitin ligase (E3) enzymes (1). Ubiquitylation is involved in the regulation of diverse cellular processes including protein degradation (2, 3, 4), DNA damage repair (5, 6), DNA replication (7), cell surface receptor endocytosis, and innate immune signaling (8, 9). Deregulation of protein ubiquitylation is implicated in the development of cancer and neurodegenerative diseases (10, 11).
Inhibitors targeting the ubiquitin proteasome system are used in the treatment of hematologic malignancies such as multiple myeloma (12, 13). Recent developments in mass spectrometry (MS)-based proteomics have greatly expedited proteome-wide analysis of posttranslational modifications (PTMs) (14–17). Large-scale mapping of ubiquitylation sites by mass spectrometry is based on the identification of the di-glycine remnant that results from trypsin digestion of ubiquitylated proteins and remains attached to ubiquitylated lysines (18). Recently, two monoclonal antibodies were developed that specifically recognize di-glycine remnant modified peptides, enabling their efficient enrichment from complex peptide mixtures (19, 20). These antibodies have been used to identify thousands of endogenous ubiquitylation sites in human cells, and to quantify site-specific changes in ubiquitylation in response to different cellular perturbations (20–22). It should be noted that the di-glycine remnant is not specific to proteins modified by ubiquitin: proteins modified by NEDD8 or ISG15 generate an identical di-glycine remnant on modified lysines, making it impossible to distinguish between these modifications by mass spectrometry. However, expression of NEDD8 in mouse tissues was shown to be developmentally down-regulated (23), and ISG15 expression in bovine tissues is low in the absence of interferon stimulation (24). In cell culture experiments it was shown that a great majority of sites identified using di-glycine-lysine-specific antibodies stem from ubiquitylated peptides (20). The rates of cell proliferation and protein turnover in mammals vary dramatically between different tissues. Immortalized cell lines, often derived from cancer, are selected for high proliferation rates and fail to represent the complex conditions in tissues. Tissue proteomics can help to gain a more comprehensive understanding of physiological processes in multicellular organisms.
Analysis of tissue proteomes and PTMs can provide important insights into tissue-specific processes and the signaling networks that regulate these processes (25–32). In addition, development of mass spectrometric methods for analysis of PTMs in diseased tissues might lead to the identification of biomarkers. In this study, we combined high-resolution mass spectrometry with immunoenrichment of di-glycine modified peptides to investigate endogenous ubiquitylation sites in murine tissues. We identified more than 20,000 ubiquitylation sites from five different murine tissues and report the largest ubiquitylation dataset obtained from mammalian tissues to date. Furthermore, we compared the performance of the two monoclonal di-glycine-lysine-specific antibodies available for enrichment of ubiquitylated peptides, and reveal their relative preferences for different amino acids flanking ubiquitylation sites.
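The di-glycine remnant described in this entry shows up in the mass spectrometer as a fixed mass shift on the modified lysine. The sketch below computes monoisotopic peptide masses with and without that remnant, assuming standard monoisotopic residue masses rounded to five decimals; the helper name `peptide_mass` and the example sequence are hypothetical and not taken from the study.

```python
# Standard monoisotopic residue masses in Da (rounded to five decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
# Two glycine residues left on the lysine side chain after tryptic digestion.
DIGLY_SHIFT = 2 * RESIDUE_MASS["G"]

def peptide_mass(sequence, diglycine_sites=0):
    """Monoisotopic neutral mass of a peptide carrying `diglycine_sites` K-GG remnants."""
    if diglycine_sites > sequence.count("K"):
        raise ValueError("more di-glycine sites than lysines")
    backbone = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    return backbone + diglycine_sites * DIGLY_SHIFT

peptide = "LIFAGKQLEDGR"  # example sequence chosen for illustration
print(peptide_mass(peptide))
print(peptide_mass(peptide, diglycine_sites=1))
```

Because the shift is determined only by the two glycine residues, it is the same whichever modifier left the remnant, which is consistent with the text's point that ubiquitin, NEDD8, and ISG15 sites cannot be told apart by mass alone.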

10.
11.
12.
Understanding how a small brain region, the suprachiasmatic nucleus (SCN), can synchronize the body's circadian rhythms is an ongoing research area. This important time-keeping system requires a complex suite of peptide hormones and transmitters that remain incompletely characterized. Here, capillary liquid chromatography and FTMS have been coupled with tailored software for the analysis of endogenous peptides present in the SCN of the rat brain. After ex vivo processing of brain slices, peptide extraction, identification, and characterization from tandem FTMS data with <5-ppm mass accuracy produced a hyperconfident list of 102 endogenous peptides, including 33 previously unidentified peptides, and 12 peptides that were post-translationally modified with amidation, phosphorylation, pyroglutamylation, or acetylation. This characterization of endogenous peptides from the SCN will aid in understanding the molecular mechanisms that mediate rhythmic behaviors in mammals. Central nervous system neuropeptides function in cell-to-cell signaling and are involved in many physiological processes such as circadian rhythms, pain, hunger, feeding, and body weight regulation (1–4). Neuropeptides are produced from larger protein precursors by the selective action of endopeptidases, which cleave at mono- or dibasic sites and then remove the C-terminal basic residues (1, 2). Some neuropeptides undergo functionally important post-translational modifications (PTMs), including amidation, phosphorylation, pyroglutamylation, or acetylation. These aspects of peptide synthesis impact the properties of neuropeptides, further expanding their diverse physiological implications. Therefore, unveiling new peptides and unreported peptide properties is critical to advancing our understanding of nervous system function. Historically, the analysis of neuropeptides was performed by Edman degradation, in which the N-terminal amino acid is sequentially removed.
However, analysis by this method is slow and does not allow for sequencing of the peptides containing N-terminal PTMs (5). Immunological techniques, such as radioimmunoassay and immunohistochemistry, are used for measuring relative peptide levels and spatial localization, but these methods only detect peptide sequences with known structure (6). More direct, high-throughput methods of analyzing brain regions can be used. Mass spectrometry, a rapid and sensitive method that has been used for the analysis of complex biological samples, can detect and identify the precise forms of neuropeptides without prior knowledge of peptide identity, with these approaches making up the field of peptidomics (7–12). The direct tissue and single neuron analysis by MALDI MS has enabled the discovery of hundreds of neuropeptides in the last decade, and the neuronal homogenate analysis by fractionation and subsequent ESI or MALDI MS has yielded an equivalent number of new brain peptides (5). Several recent peptidome studies, including the work by Dowell et al. (10), have used the specificity of FTMS for peptide discovery (10, 13–15). Here, we combine the ability to fragment ions at ultrahigh mass accuracy (16) with a software pipeline designed for neuropeptide discovery. We use nanocapillary reversed-phase LC coupled to 12 Tesla FTMS for the analysis of peptides present in the suprachiasmatic nucleus (SCN) of rat brain. A relatively small, paired brain nucleus located at the base of the hypothalamus directly above the optic chiasm, the SCN contains a biological clock that generates circadian rhythms in behaviors and homeostatic functions (17, 18). The SCN comprises ∼10,000 cellular clocks that are integrated as a tissue-level clock which, in turn, orchestrates circadian rhythms throughout the brain and body.
It is sensitive to incoming signals from the light-sensing retina and other brain regions, which cause temporal adjustments that align the SCN appropriately with changes in environmental or behavioral state. Previous physiological studies have implicated peptides as critical synchronizers of normal SCN function as well as mediators of SCN inputs, internal signal processing, and outputs; however, only a small number of peptides have been identified and explored in the SCN, leaving unresolved many circadian mechanisms that may involve peptide function. Most peptide expression in the SCN has only been studied through indirect antibody-based techniques (19–29), although we recently used MS approaches to characterize several peptides detected in SCN releasates (30). Previous studies indicate that the SCN expresses a rich diversity of peptides relative to other brain regions studied with the same techniques. Previously used immunohistochemical approaches are not only inadequate for comprehensively evaluating PTMs and alternate isoforms of known peptides but are also incapable of exhaustively examining the full peptide complement of this complex biological network of peptidergic inputs and intrinsic components. A comprehensive study of SCN peptidomics is required that utilizes high-resolution strategies for directly analyzing the peptide content of the neuronal networks comprising the SCN. In our study, the SCN was obtained from ex vivo coronal brain slices via tissue punch and subjected to multistage peptide extraction. The SCN tissue extract was analyzed by FTMS/MS, and the high-resolution MS and MS/MS data were processed using ProSightPC 2.0 (16), which allows the identification and characterization of peptides or proteins from high mass accuracy MS/MS data. In addition, the Sequence Gazer included in ProSightPC was used for manually determining PTMs (31, 32).
As a result, a total of 102 endogenous peptides were identified, including 33 that were previously unidentified, and 12 PTMs (including amidation, phosphorylation, pyroglutamylation, and acetylation) were found. The present study is the first comprehensive peptidomics study for identifying peptides present within the mammalian SCN. In fact, this is one of the first peptidome studies to work with discrete brain nuclei as opposed to larger brain structures and follows up on our recent report using LC-ion trap for analysis of the peptides in the supraoptic nucleus (33); here, the use of FTMS allows a greater range of PTMs to be confirmed and allows higher confidence in the peptide assignments. This information on the peptides in the SCN will serve as a basis to more exhaustively explore the extent that previously unreported SCN neuropeptides may function in SCN regulation of mammalian circadian physiology.
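The <5-ppm mass accuracy criterion quoted in this entry is a relative error between an observed and a theoretical value. A minimal sketch of that check follows; the function names and the example m/z values are assumptions made for illustration and are not part of ProSightPC or of the authors' pipeline.

```python
def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6

def within_tolerance(observed, theoretical, tolerance_ppm=5.0):
    """True when the absolute error does not exceed `tolerance_ppm`."""
    return abs(ppm_error(observed, theoretical)) <= tolerance_ppm

# Hypothetical values: a theoretical m/z and two candidate measurements.
theoretical_mz = 1000.0
for observed_mz in (1000.004, 1000.010):
    print(observed_mz,
          round(ppm_error(observed_mz, theoretical_mz), 2),
          within_tolerance(observed_mz, theoretical_mz))
```

At m/z 1000, a 5-ppm window corresponds to only ±0.005, which is why such a threshold sharply limits the number of candidate peptides that can match a given measurement.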

13.
Insulin plays a central role in the regulation of vertebrate metabolism. The hormone, the post-translational product of a single-chain precursor, is a globular protein containing two chains, A (21 residues) and B (30 residues). Recent advances in human genetics have identified dominant mutations in the insulin gene causing permanent neonatal-onset diabetes mellitus (DM) (1–4). The mutations are predicted to block folding of the precursor in the endoplasmic reticulum (ER) of pancreatic β-cells. Although expression of the wild-type allele would in other circumstances be sufficient to maintain homeostasis, studies of a corresponding mouse model (5–7) suggest that the misfolded variant perturbs wild-type biosynthesis (8, 9). Impaired β-cell secretion is associated with ER stress, distorted organelle architecture, and cell death (10). These findings have renewed interest in insulin biosynthesis (11–13) and the structural basis of disulfide pairing (14–19). Protein evolution is constrained not only by structure and function but also by susceptibility to toxic misfolding.

14.
A variety of high-throughput methods have made it possible to generate detailed temporal expression data for a single gene or large numbers of genes. Common methods for analysis of these large data sets can be problematic. One challenge is the comparison of temporal expression data obtained from different growth conditions where the patterns of expression may be shifted in time. We propose the use of wavelet analysis to transform the data obtained under different growth conditions to permit comparison of expression patterns from experiments that have time shifts or delays. We demonstrate this approach using detailed temporal data for a single bacterial gene obtained under 72 different growth conditions. This general strategy can be applied in the analysis of data sets of thousands of genes under different conditions.
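One way a wavelet representation can make time-shifted profiles comparable is to summarize each profile by its energy at every scale of an undecimated Haar transform. The sketch below is an assumption-laden illustration, not the transform used in the paper: it treats the series as circular, uses the Haar wavelet, and runs on synthetic profiles; the function names are hypothetical.

```python
def haar_scale_energies(series, levels):
    """Detail energy per scale of an undecimated (a trous) Haar transform.

    The series is treated as circular, so the energies do not change
    when the whole profile is circularly shifted in time.
    """
    n = len(series)
    approx = list(series)
    energies = []
    for level in range(levels):
        step = 2 ** level
        detail = [(approx[t] - approx[(t - step) % n]) / 2 for t in range(n)]
        approx = [(approx[t] + approx[(t - step) % n]) / 2 for t in range(n)]
        energies.append(sum(d * d for d in detail))
    return energies

def circular_shift(series, lag):
    """Delay a series by `lag` time points, wrapping around the end."""
    lag %= len(series)
    return series[-lag:] + series[:-lag]

# Synthetic expression profiles (arbitrary units), for illustration only.
profile = [0, 0, 1, 3, 6, 8, 6, 3, 1, 0, 0, 0, 0, 0, 0, 0]
delayed = circular_shift(profile, 3)
spike = [0, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

print(haar_scale_energies(profile, 3))
print(haar_scale_energies(delayed, 3))
print(haar_scale_energies(spike, 3))
```

The delayed copy yields the same per-scale energies as the original, while the differently shaped spike does not. Real time courses are not circular, so edge handling would need more care in practice.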

15.
Top-down mass spectrometry (MS)-based proteomics is arguably a disruptive technology for the comprehensive analysis of all proteoforms arising from genetic variation, alternative splicing, and posttranslational modifications (PTMs). However, the complexity of top-down high-resolution mass spectra presents a significant challenge for data analysis. In contrast to the well-developed software packages available for data analysis in bottom-up proteomics, the data analysis tools in top-down proteomics remain underdeveloped. Moreover, despite recent efforts to develop algorithms and tools for the deconvolution of top-down high-resolution mass spectra and the identification of proteins from complex mixtures, a multifunctional software platform, which allows for the identification, quantitation, and characterization of proteoforms with visual validation, is still lacking. Herein, we have developed MASH Suite Pro, a comprehensive software tool for top-down proteomics with multifaceted functionality. MASH Suite Pro is capable of processing high-resolution MS and tandem MS (MS/MS) data using two deconvolution algorithms to optimize protein identification results. In addition, MASH Suite Pro allows for the characterization of PTMs and sequence variations, as well as the relative quantitation of multiple proteoforms in different experimental conditions. The program also provides visualization components for validation and correction of the computational outputs. Furthermore, MASH Suite Pro facilitates data reporting and presentation via direct output of the graphics. Thus, MASH Suite Pro significantly simplifies and speeds up the interpretation of high-resolution top-down proteomics data by integrating tools for protein identification, quantitation, characterization, and visual validation into a customizable and user-friendly interface. 
We envision that MASH Suite Pro will play an integral role in advancing the burgeoning field of top-down proteomics. With well-developed algorithms and computational tools for mass spectrometry (MS) data analysis, peptide-based bottom-up proteomics has gained considerable popularity in the field of systems biology (1–9). Nevertheless, the bottom-up approach is suboptimal for the analysis of protein posttranslational modifications (PTMs) and sequence variants as a result of protein digestion (10). Alternatively, the protein-based top-down proteomics approach analyzes intact proteins, which provides a “bird's eye” view of all proteoforms (11), including those arising from sequence variations, alternative splicing, and diverse PTMs, making it a disruptive technology for the comprehensive analysis of proteoforms (12–24). However, the complexity of top-down high-resolution mass spectra presents a significant challenge for data analysis. In contrast to the well-developed software packages available for processing data from bottom-up proteomics experiments, the data analysis tools in top-down proteomics remain underdeveloped. The initial step in the analysis of top-down proteomics data is deconvolution of high-resolution mass and tandem mass spectra. Thorough high-resolution analysis of spectra by Horn (THRASH), which was the first algorithm developed for the deconvolution of high-resolution mass spectra (25), is still widely used. THRASH automatically detects and evaluates individual isotopomer envelopes by comparing the experimental isotopomer envelope with a theoretical envelope and reporting those that score higher than a user-defined threshold. Another commonly used algorithm, MS-Deconv, utilizes a combinatorial approach to address the difficulty of grouping MS peaks from overlapping isotopomer envelopes (26).
UniDec, a more recent algorithm that employs a Bayesian approach to separate the mass and charge dimensions (27), can also be applied to the deconvolution of high-resolution spectra. Although these algorithms assist in data processing, the deconvolution results often contain a considerable number of misassigned peaks as a consequence of the complexity of the high-resolution MS and MS/MS data generated in top-down proteomics experiments. Such errors can undermine the accuracy of protein identification and PTM localization and thus necessitate visual components that allow for the validation and manual correction of the computational outputs. Following spectral deconvolution, a typical top-down proteomics workflow incorporates identification, quantitation, and characterization of proteoforms; however, most of the recently developed data analysis tools for top-down proteomics, including ProSightPC (28, 29), Mascot Top Down (also known as Big-Mascot) (30), MS-TopDown (31), and MS-Align+ (32), focus almost exclusively on protein identification. ProSightPC was the first software tool specifically developed for top-down protein identification. This software utilizes "shotgun annotated" databases (33) that include all possible proteoforms containing user-defined modifications. Consequently, ProSightPC is not optimized for identifying PTMs that are not defined by the user. Additionally, the inclusion of all possible modified forms dramatically increases the size of the database and thus limits the search speed (32). Mascot Top Down (30) is based on standard Mascot but enables database searching using a higher mass limit for the precursor ions (up to 110 kDa), which allows for the identification of intact proteins. Protein identification using Mascot Top Down is fundamentally similar to that used in bottom-up proteomics (34) and is therefore somewhat limited in terms of identifying unexpected PTMs. 
MS-TopDown (31) employs the spectral alignment algorithm (35), which matches the top-down tandem mass spectra to proteins in the database without prior knowledge of the PTMs. Nevertheless, MS-TopDown lacks statistical evaluation of the search results and performs slowly when searching against large databases. MS-Align+ also utilizes spectral alignment for top-down protein identification (32). It is capable of identifying unexpected PTMs and allows for efficient filtering of candidate proteins when the top-down spectra are searched against a large protein database. MS-Align+ also provides statistical evaluation for the selection of high-confidence proteoform spectrum matches (PrSMs). More recently, Top-Down Mass Spectrometry Based Proteoform Identification and Characterization (TopPIC) was developed (http://proteomics.informatics.iupui.edu/software/toppic/index.html). TopPIC is an updated version of MS-Align+ with increased spectral alignment speed and reduced computing requirements. In addition, MSPathFinder, developed by Kim et al., also allows for the rapid identification of proteins from top-down tandem mass spectra (http://omics.pnl.gov/software/mspathfinder) using spectral alignment. Although software tools employing spectral alignment, such as MS-Align+ and MSPathFinder, are particularly useful for top-down protein identification, these programs run from the command line, making them difficult to use for those with limited knowledge of command syntax. Recently, new software tools have been developed for proteoform characterization (36, 37). Our group previously developed MASH Suite, a user-friendly interface for the processing, visualization, and validation of high-resolution MS and MS/MS data (36). Another software tool, ProSight Lite, developed recently by the Kelleher group (37), also allows characterization of protein PTMs. However, both of these software tools require prior knowledge of the protein sequence for the effective localization of PTMs. 
In addition, neither software tool can process data from liquid chromatography (LC)-MS and LC-MS/MS experiments, which limits their usefulness in large-scale top-down proteomics. Thus, despite these recent efforts, a multifunctional software platform enabling identification, quantitation, and characterization of proteins from top-down spectra, as well as visual validation and data correction, is still lacking. Herein, we report the development of MASH Suite Pro, an integrated software platform designed to incorporate tools for protein identification, quantitation, and characterization into a single comprehensive package for the analysis of top-down proteomics data. This program contains a user-friendly customizable interface similar to the previously developed MASH Suite (36) but also has a number of new capabilities, including the ability to handle complex proteomics datasets from LC-MS and LC-MS/MS experiments, as well as the ability to identify unknown proteins and PTMs using MS-Align+ (32). Importantly, MASH Suite Pro also provides visualization components for the validation and correction of the computational outputs, which ensures accurate and reliable deconvolution of the spectra and localization of PTMs and sequence variations.
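The envelope-scoring idea described for THRASH above (compare an observed isotopomer envelope with a theoretical one and keep those scoring above a user-defined threshold) can be illustrated with a minimal Python sketch. This is not the THRASH implementation: the cosine-similarity score, the function names, and the default threshold of 0.9 are assumptions chosen for illustration, and the theoretical envelope is taken as an input rather than computed from an elemental composition. The neutral-mass conversion assumes a positive ion of the form [M + zH]z+.

```python
import math

PROTON_MASS = 1.007276466  # mass of a proton in Da


def neutral_mass(mz, charge):
    """Neutral mass of a positive ion [M + zH]z+ observed at m/z `mz`."""
    return charge * (mz - PROTON_MASS)


def envelope_score(observed, theoretical):
    """Cosine similarity between two equal-length intensity lists.

    Illustrative stand-in for a fit score; not the scoring used by THRASH.
    """
    if len(observed) != len(theoretical):
        raise ValueError("envelopes must have the same number of peaks")
    dot = sum(o * t for o, t in zip(observed, theoretical))
    norm = math.sqrt(sum(o * o for o in observed)) * math.sqrt(
        sum(t * t for t in theoretical)
    )
    return dot / norm if norm else 0.0


def accept_envelopes(candidates, threshold=0.9):
    """Return labels of candidates whose score meets the threshold.

    `candidates` is a list of (label, observed, theoretical) tuples;
    `threshold` plays the role of the user-defined cutoff (hypothetical default).
    """
    return [
        label
        for label, observed, theoretical in candidates
        if envelope_score(observed, theoretical) >= threshold
    ]
```

A low threshold admits more misassigned envelopes, which is the kind of error the visual validation step in the text is meant to catch.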

16.
Mathematical tools developed in the context of Shannon information theory were used to analyze the meaning of the BLOSUM score, which was split into three components termed the BLOSUM spectrum (or BLOSpectrum). These relate respectively to the sequence convergence (the stochastic similarity of the two protein sequences), to the background frequency divergence (typicality of the amino acid probability distribution in each sequence), and to the target frequency divergence (compliance of the amino acid variations between the two sequences with the protein model implicit in the BLOCKS database). This treatment sharpens the protein sequence comparison, providing a rationale for the biological significance of the obtained score, and helps to identify weakly related sequences. Moreover, the BLOSpectrum can guide the choice of the most appropriate scoring matrix, tailoring it to the evolutionary divergence associated with the two sequences, or indicate whether a compositionally adjusted matrix could perform better.
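For orientation, the quantity being decomposed is a log-odds score: a BLOSUM-style entry compares the target frequency of an aligned residue pair with the frequency expected from the background distribution. The sketch below shows only this standard definition in Python; it does not reproduce the three-way BLOSpectrum split, which the abstract does not spell out. The function names, the toy two-letter alphabet, and the use of ordered pairs (so that the expected frequency is simply the product of the two background frequencies) are assumptions made for illustration.

```python
import math


def log_odds_score(q_ij, p_i, p_j, scale=2.0):
    """Log-odds score for one residue pair, in units of 1/scale bits.

    q_ij is the target frequency of the ordered pair (i, j); p_i and p_j
    are background frequencies. scale=2.0 gives half-bit units.
    """
    return scale * math.log2(q_ij / (p_i * p_j))


def expected_score(target, background, scale=2.0):
    """Target-weighted average score over all ordered pairs.

    With scale=1.0 this is the relative entropy (in bits) between the
    target distribution and the product of background frequencies.
    """
    total = 0.0
    for (a, b), q in target.items():
        if q > 0:
            total += q * log_odds_score(q, background[a], background[b], scale)
    return total


# Toy example with a hypothetical two-letter alphabet.
background = {"A": 0.5, "B": 0.5}
target = {("A", "A"): 0.4, ("B", "B"): 0.4, ("A", "B"): 0.1, ("B", "A"): 0.1}
average = expected_score(target, background)
```

In this toy case identical pairs are over-represented relative to background, so they score positively, mismatches score negatively, and the average is positive.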

17.
Decomposing a biological sequence into its functional regions is an important prerequisite to understanding the molecule. Using multiple alignments of the sequences, we evaluate a segmentation based on the type of statistical variation pattern at each of the aligned sites. To describe such a more general pattern, we introduce multipattern consensus regions as segmented regions based on conserved as well as interdependent patterns. Thus the proposed consensus region considers patterns that are statistically significant and extends a local neighborhood. To show its relevance in protein sequence analysis, a cancer suppressor gene called p53 is examined. The results show significant associations between the detected regions and the tendency of mutations, location on the 3D structure, and heritable cancer factors that can be inferred from human twin studies.

18.
19.
20.
Mycobacterium tuberculosis (Mtb), the causative agent of human tuberculosis, remains one of the most prevalent human pathogens and a major cause of mortality worldwide. The metabolic network is a central mediator and defining feature of the pathogenicity of Mtb. Increasing evidence suggests that lysine succinylation dynamically regulates enzymes in carbon metabolism in both bacteria and human cells; however, its extent and function in Mtb remain unexplored. Here, we performed a global succinylome analysis of the virulent Mtb strain H37Rv by using high-accuracy nano-LC-MS/MS in combination with the enrichment of succinylated peptides from digested cell lysates and subsequent peptide identification. In total, 1545 lysine succinylation sites on 626 proteins were identified in this pathogen. The identified succinylated proteins are involved in various biological processes, and a large proportion of the succinylation sites are present on proteins in the central metabolism pathway. Site-specific mutations showed that succinylation is a negative regulatory modification on the enzymatic activity of acetyl-CoA synthetase. Molecular dynamics simulations demonstrated that succinylation affects the conformational stability of acetyl-CoA synthetase, which is critical for its enzymatic activity. Further functional studies showed that CobB, a sirtuin-like deacetylase in Mtb, functions as a desuccinylase of acetyl-CoA synthetase in in vitro assays. Together, our findings reveal widespread roles for lysine succinylation in regulating metabolism and diverse processes in Mtb. Our data provide a rich resource for functional analyses of lysine succinylation and facilitate the dissection of metabolic networks in this life-threatening pathogen. Post-translational modifications (PTMs) are complex and fundamental mechanisms modulating diverse protein properties and functions, and have been associated with almost all known cellular pathways and disease processes (1, 2). 
Among the hundreds of different PTMs, acylations at lysine residues, such as acetylation (3–6), malonylation (7, 8), crotonylation (9, 10), propionylation (11–13), butyrylation (11, 13), and succinylation (7, 14–16), are crucial for the functional regulation of many prokaryotic and eukaryotic proteins. Because these lysine PTMs depend on acyl-CoA metabolic intermediates, such as acetyl-CoA (Ac-CoA), succinyl-CoA, and malonyl-CoA, lysine acylation could provide a mechanism to respond to changes in the energy status of the cell and regulate energy metabolism and the key metabolic pathways in diverse organisms (17, 18). Among these lysine PTMs, lysine succinylation is a highly dynamic and regulated PTM defined as the transfer of a succinyl group (-CO-CH2-CH2-CO-) to a lysine residue of a protein molecule (8). It was recently identified and comprehensively validated in both bacterial and mammalian cells (8, 14, 16). It was also identified in core histones, suggesting that lysine succinylation may regulate the functions of histones and affect chromatin structure and gene expression (7). Accumulating evidence suggests that lysine succinylation is a widespread and important PTM in both eukaryotes and prokaryotes and regulates diverse cellular processes (16). System-wide studies involving lysine-succinylated peptide immunoprecipitation and liquid chromatography-mass spectrometry (LC-MS/MS) have been used to analyze bacteria (E. coli) (14, 16), yeast (S. cerevisiae), human (HeLa) cells, and mouse embryonic fibroblasts and liver cells (16, 19). These succinylome studies have generated large data sets of lysine-succinylated proteins in both eukaryotes and prokaryotes and demonstrated the diverse cellular functions of this PTM. Notably, lysine succinylation is widespread among diverse mitochondrial metabolic enzymes that are involved in fatty acid metabolism, amino acid degradation, and the tricarboxylic acid cycle (19, 20). 
Thus, lysine succinylation is reported as a functional PTM with the potential to impact mitochondrial metabolism and coordinate different metabolic pathways in human cells and bacteria (14, 19–22). Mycobacterium tuberculosis (Mtb), the causative agent of tuberculosis (TB), is a major cause of mortality worldwide and claims more human lives annually than any other bacterial pathogen (23). About one third of the world's population is infected with Mtb, and in 2012 there were nearly 1.3 million deaths and 8.6 million new cases of TB worldwide (24). Mtb remains a major threat to global health, especially in developing countries. The emergence of multidrug-resistant (MDR) and extensively drug-resistant (XDR) Mtb, together with TB and HIV co-infection, has further worsened the situation (25–27). Among bacterial pathogens, Mtb has a distinctive life cycle spanning different environments and developmental stages (28). In particular, Mtb can exist in dormant or active states in the host, leading to asymptomatic latent TB infection or active TB disease (29). To achieve these different physiologic states, Mtb developed a mechanism to sense diverse signals from the host and to coordinately regulate multiple cellular processes and pathways (30, 31). Mtb has evolved its metabolic network to both maintain and propagate its survival as a species within humans (32–35). It is well accepted that the metabolic network is a central mediator and defining feature of the pathogenicity of Mtb (23, 36–38). Knowledge of the regulation of metabolic pathways used by Mtb during infection is therefore important for understanding its pathogenicity, and can also guide the development of novel drug therapies (39). On the other hand, increasing evidence suggests that lysine succinylation dynamically regulates enzymes in carbon metabolism in both bacteria and human cells (14, 19–22). 
It is tempting to speculate that lysine succinylation may play an important regulatory role in metabolic processes in Mtb. However, to the best of our knowledge, no succinylated protein in Mtb has been identified, presenting a major obstacle to understanding the regulatory roles of lysine succinylation in this life-threatening pathogen. To fill this gap in our knowledge, we have initiated a systematic study of the identities and functional roles of the succinylated proteins in Mtb. Because Mtb H37Rv is the first sequenced Mtb strain (40) and has been extensively used for studies dissecting the roles of individual genes in pathogenesis (41), it was selected as a test case. We analyzed the succinylome of Mtb H37Rv by using high-accuracy nano-LC-MS/MS in combination with the enrichment of succinylated peptides from digested cell lysates and subsequent peptide identification. In total, 1545 lysine succinylation sites on 626 proteins were identified in this pathogen. The identified succinylated proteins are involved in various biological processes and are particularly enriched in metabolic processes. A large proportion of the succinylation sites are present on proteins in the central metabolism pathway. We further dissected the regulatory role of succinylation on acetyl-CoA synthetase (Acs) via site-specific mutagenesis analysis and molecular dynamics (MD) simulations, which showed that reversible lysine succinylation could inhibit the activity of Acs. Further functional studies showed that CobB, a sirtuin-like deacetylase in Mtb, functions as a deacetylase and as a desuccinylase of Acs in in vitro assays. Together, our findings provide significant insights into the range of functions regulated by lysine succinylation in Mtb.
