Similar literature
20 similar records found
1.
Full-length de novo sequencing from tandem mass (MS/MS) spectra of unknown proteins such as antibodies or proteins from organisms with unsequenced genomes remains a challenging open problem. Conventional algorithms designed to individually sequence each MS/MS spectrum are limited by incomplete peptide fragmentation or low signal-to-noise ratios and tend to result in short de novo sequences at low sequencing accuracy. Our shotgun protein sequencing (SPS) approach was developed to ameliorate these limitations by first finding groups of unidentified spectra from the same peptides (contigs) and then deriving a consensus de novo sequence for each assembled set of spectra (contig sequences). But whereas SPS enables much more accurate reconstruction of de novo sequences longer than can be recovered from individual MS/MS spectra, it still requires error-tolerant matching to homologous proteins to group smaller contig sequences into full-length protein sequences, thus limiting its effectiveness on sequences from poorly annotated proteins. Using low and high resolution CID and high resolution HCD MS/MS spectra, we address this limitation with a Meta-SPS algorithm designed to overlap and further assemble SPS contigs into Meta-SPS de novo contig sequences extending as long as 100 amino acids at over 97% accuracy without requiring any knowledge of homologous protein sequences. We demonstrate Meta-SPS using distinct MS/MS data sets obtained with separate enzymatic digestions and discuss how the remaining de novo sequencing limitations relate to MS/MS acquisition settings. Database search tools, such as Sequest (3), Mascot (4), and InsPecT (5), are the most frequently used methods for reliable protein identification in tandem mass (MS/MS) spectrometry based proteomics. These operate by separately matching each MS/MS spectrum to peptide sequences from reference protein databases where all proteins of interest are presumably contained.
But this assumption often does not hold true as many important proteins, such as monoclonal antibodies, are not contained in any database because mechanisms of antibody variation (including genetic recombination and somatic hypermutation (6)) constantly create new proteins with novel unique sequences. These mechanisms of variation are the foundation of adaptive immune systems and have enabled highly successful antibody-based therapeutic strategies (7, 8). Nevertheless, such variation also means that antibody MS/MS spectra are typically impossible to identify via standard database search techniques whenever the corresponding sequences are not known in advance. An inherent drawback of database search strategies is that they are only as good as the database(s) being searched and incomplete databases often result in proteins being misidentified or left unidentified (9). Despite the importance of novel protein identification, few high-throughput methods have been developed for de novo sequencing of unknown proteins. Low-throughput Edman degradation is a well-known de novo sequencing approach that can accurately call amino acid sequences in N/C-terminal regions of unknown proteins but has drawbacks that make it unsuitable for sequencing proteins longer than 50 amino acids or proteins with post-translational modifications (10, 11). Many have recognized the potential of tandem mass spectrometry for protein sequencing. For example, in 1987 Johnson and Biemann (12) manually sequenced a complete protein from rabbit bone marrow. Meanwhile, automated de novo sequencing methods that rely on interpretations of individual MS/MS spectra are limited in that they typically cannot reconstruct long (8+ AA) sequences without mis-predicting 1 in 5 AA on average for low accuracy collision-induced dissociation (CID) spectra (13, 14).
Recent advances in de novo peptide sequencing have improved sequencing accuracy to over 95% for high resolution higher energy collisional dissociation (HCD) spectra (15), but at limited sequence coverage (Chi et al. report only 55% sequence coverage of peptides identified by database search). In fact, all current per-spectrum de novo sequencing strategies face a significant tradeoff between sequencing accuracy and coverage, as spectra exhibiting complete peptide fragmentation rarely cover entire target proteins, yet are required to accurately reconstruct full-length peptide sequences. An alternative approach to separately sequencing individual spectra is to simultaneously interpret multiple MS/MS spectra from overlapping peptides. This Shotgun Protein Sequencing (SPS) paradigm differs from traditional algorithms by deriving consensus sequences from contigs: sets of multiple MS/MS spectra from distinct peptides with overlapping sequences (1, 16). Because SPS aggregates multiple spectra from overlapping peptides, protein sequences extending beyond the length of enzymatically digested peptides can be extracted from spectra with incomplete peptide fragmentation. Furthermore, SPS has been found to generate sequences that frequently cover 90–95+% of the target protein sequence(s) while mis-predicting only 1 out of every 20 amino acids on high resolution MS/MS spectra (2). But a remaining limitation of SPS is that it still generates fragmented sequences that do not singularly cover large regions of the target protein sequences, much less complete proteins: SPS sequences have an average length of 10–15 amino acids (depending on input data) and the longest recovered SPS de novo sequence is less than 45 amino acids long (1). The considerable limitations of de novo sequencing strategies have typically been addressed by attempting to circumvent them using error-tolerant matching to known protein sequences.
One such strategy (17) is to generate short de novo sequence tags and then match them exactly to protein databases without requiring matching the N/C-term flanking masses (to allow for unexpected polymorphisms or post-translational modifications). Short sequence tags are usually derived from parts of the spectrum with high signal-to-noise ratios and typically have higher sequencing accuracy than full-length de novo sequences (18). This approach was later extended in MS-Shotgun (19) and continues to be a popular technique for speeding up database search tools (5, 20–22). Homology matching of full length de novo sequences was first explored in CIDentify (23) and later in MS-BLAST (24) by searching de novo sequences using FASTA and WU-BLAST2 (respectively) to find homologous matches to sequences of related proteins; FASTS (25) also approached the problem using a modified version of FASTA. However, common de novo sequencing errors tend to produce sequences that are heavily penalized in pure sequence homology searches. For example, missing peaks in MS/MS spectra may easily cause GA subsequences to be reconstructed as Q or AG (same-mass sequences), thus making subsequent BLAST searches unlikely to succeed. This issue was partially considered in CIDentify and more thoroughly addressed in SPIDER (26) by explicitly modeling de novo sequencing errors together with BLOSUM scores in MS/MS-based sequence homology searches. In addition, OpenSea (27) further explored database matching of de novo sequences for analysis of unexpected post-translational modifications (PTMs). Finally, Shen et al. (28) used short unique de novo sequence tags, called UStags, to discover protein-localized PTMs. Recent approaches to homology matching of de novo sequences have built on genome assembly and sequencing techniques to achieve database-assisted full-length sequencing of unknown proteins.
Comparative Shotgun Protein Sequencing (cSPS) complemented SPS assembly techniques with error-tolerant matching of de novo sequences to find overlapping SPS de novo sequences that are then further assembled into full-length protein sequences (2). cSPS was designed to support the sequencing of highly divergent proteins that have regions close enough in homology to transfer matches from a reference. cSPS was shown to enable de novo sequencing of monoclonal antibodies at 95+% sequencing accuracy, while simultaneously tolerating and identifying unexpected PTMs (29). In contrast to cSPS, Champs (30) de novo sequences individual spectra to obtain putative peptide sequences, which are then mapped to homologous proteins to correct sequencing errors and reconstruct protein sequences with 100% accuracy and 99% coverage. However, Champs is designed to only map peptides that differ from the reference sequence by one or two amino acids and does not handle PTMs. As such, its sequencing accuracy is not directly comparable to that of cSPS, as Champs was not designed to sequence highly divergent proteins (such as monoclonal antibodies) with multiple PTMs, insertions, deletions, and/or recombinations. GenoMS (31) extended the approaches in cSPS/Champs by explicitly modeling protein splice variants as paths in splice graphs where nodes represent translated exon regions (32). MS/MS spectra are first searched for exact sequence matches against all possible protein isoforms. The remaining unidentified MS/MS spectra are then aligned to the matched peptides and de novo sequenced to extend the matched sequences into novel regions. Reported sequences are 97–99% accurate and cover 96–99% of target proteins depending on sequence similarity between the novel and reference sequences (31).
However, GenoMS de novo sequences are usually extended less than 3 amino acids beyond matched peptides because sequencing accuracy degrades as sequences are extended, thus preventing the consistent extension of long (10+ AA) sequences. Altogether, the use of homology matching approaches for full-length de novo protein sequencing continues to be limited by 1) requiring prior knowledge of closely related protein sequences and 2) the inherent difficulties in statistically significant homology-tolerant matching of error-prone short de novo sequences. The Meta-SPS approach proposed here seeks to de novo sequence complete proteins, or long protein regions, without any use of a database. Meta-SPS builds upon SPS by treating SPS de novo sequences (contig sequences) as input spectra and further assembling them into longer de novo sequences (meta-contig sequences). We show that Meta-SPS extends de novo sequences to lengths over 100 AA while boosting sequencing accuracy to only 1 mistake per 40 amino acid predictions, thus enabling database-free de novo sequencing of completely novel proteins while also allowing error-tolerant matching approaches to support higher-divergence homologies (by searching longer, more accurate de novo sequences). Meta-SPS algorithms are demonstrated on CID and HCD MS/MS spectra and their limitations are discussed in relation to the underlying limitations of bottom-up tandem mass spectrometry.
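
Two quantitative points in the abstract above can be checked with a short sketch: the same-mass ambiguity between GA, AG, and Q, and the conversion of "1 mistake per N predictions" into a percentage accuracy. This is only an illustration and not part of Meta-SPS or any tool named in the abstract. The residue masses are standard monoisotopic values supplied here as an assumption, and the function names (`peptide_residue_mass`, `same_mass`, `accuracy_from_error_rate`) and the tolerance values are hypothetical.

```python
# Standard monoisotopic residue masses in daltons.
# Assumption: these values are not given in the abstract; they are supplied for illustration.
RESIDUE_MASS = {
    "G": 57.02146,   # glycine
    "A": 71.03711,   # alanine
    "Q": 128.05858,  # glutamine
    "K": 128.09496,  # lysine
}


def peptide_residue_mass(sequence: str) -> float:
    """Sum the residue masses of a short amino acid sequence."""
    return sum(RESIDUE_MASS[aa] for aa in sequence)


def same_mass(seq_a: str, seq_b: str, tolerance_da: float = 0.001) -> bool:
    """Return True if two sequences differ in mass by at most tolerance_da."""
    difference = abs(peptide_residue_mass(seq_a) - peptide_residue_mass(seq_b))
    return difference <= tolerance_da


def accuracy_from_error_rate(errors: int, predictions: int) -> float:
    """Convert 'errors mistakes per predictions calls' into a fractional accuracy."""
    return 1.0 - errors / predictions


if __name__ == "__main__":
    # GA and AG contain the same residues, and G + A has the same elemental
    # composition as Q, so a missing fragment peak cannot separate them by mass.
    print(same_mass("GA", "AG"))
    print(same_mass("GA", "Q"))
    # Q and K differ by about 0.036 Da: separable only with a tight tolerance.
    print(same_mass("Q", "K"))
    print(same_mass("Q", "K", tolerance_da=0.5))
    # Error rates quoted in the text: 1 in 5, 1 in 20, and 1 in 40.
    for n in (5, 20, 40):
        print(n, accuracy_from_error_rate(1, n))
```

Under these assumptions, 1 mistake in 40 predictions corresponds to 97.5% accuracy, which agrees with the "over 97%" figure quoted for Meta-SPS, while 1 in 20 and 1 in 5 correspond to 95% and 80%.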

2.
To study the soybean plasma membrane proteome under osmotic stress, two methods were used: a gel‐based and an LC‐MS/MS‐based proteomics method. Two‐day‐old seedlings were subjected to 10% PEG for 2 days. Plasma membranes were purified from seedlings using a two‐phase partitioning method and their purity was verified by measuring ATPase activity. Using the gel‐based proteomics method, four and eight protein spots were identified as up‐ and downregulated, respectively, whereas in the nanoLC‐MS/MS approach, 11 and 75 proteins were identified as up‐ and downregulated, respectively, under PEG treatment. Among the osmotic stress‐responsive proteins, most of the transporter proteins and all proteins with a high number of transmembrane helices, as well as low‐abundance proteins, could be identified by the LC‐MS/MS‐based method. Three homologues of plasma membrane H+‐ATPase, which are transporter proteins involved in ion efflux, were upregulated under osmotic stress. Gene expression of this protein was increased after 12 h of stress exposure. Among the identified proteins, seven were common to the two proteomics techniques; of these, calnexin was highly upregulated. Accumulation of calnexin in the plasma membrane was confirmed by immunoblot analysis. These results suggest that under hyperosmotic conditions, calnexin accumulates in the plasma membrane and ion efflux accelerates through upregulation of the plasma membrane H+‐ATPase protein.

3.
4.
Biomarkers, 2013, 18(4): 352-361
Objective: To identify plasma protein biomarkers of cervical high-grade squamous intraepithelial lesion (HSIL) in Uyghur women using a proteomics approach.

Methods: Plasma protein samples from Uyghur women with HSIL and chronic cervicitis were analyzed by 2D HPLC, followed by detection of target proteins with a Linear Trap Quadrupole mass spectrometer (LTQ MS/MS).

Results: Using 2D HPLC, we detected three upregulated protein peaks and one downregulated protein peak representing protein constituents that distinguish HSIL from controls, and we identified 31 target proteins by LTQ MS/MS. Further confirmatory analysis with the online software IPA® 8.7 and ELISA showed APOA1 and mTOR to be potential biomarkers.

Conclusions: A distinct plasma proteomic profile may be associated with HSIL in Uyghur women.

5.
Pyridostatin (PDS) is a well-known G-quadruplex (G4) inducer and stabilizer, yet its target genes have remained unclear. Herein, applying an MS proteomics strategy, we revealed that PDS significantly downregulated 22 proteins but upregulated 16 proteins in HeLa cancer cells, and that the genes of both groups contain a number of potential G4 sequences, implying that PDS regulation of gene expression is far more complicated than inducing/stabilizing G4 structures. The PDS-downregulated proteins consequently upregulated 6 proteins to activate cyclin and cell cycle regulation, suggesting that PDS itself is not a potential anticancer agent, at least toward HeLa cancer cells. Importantly, SUB1, which encodes human positive cofactor and DNA lesion sensor PC4, was downregulated by 4.76-fold. Further studies demonstrated that the downregulation of PC4 dramatically promoted the cytotoxicity of trans-[PtCl2(NH3)(thiazole)] (trans-PtTz) toward HeLa cells to a level similar to that of cisplatin, attributable to retarding the PC4-mediated repair of 1,3-trans-PtTz crosslinked DNA lesions. These findings not only provide new insights into the biological functions of PDS but also suggest a strategy for the rational design of novel multi-targeting platinum anticancer drugs via conjugation of PDS as a ligand to the coordination scaffold of transplatin for battling drug resistance to cisplatin.

6.
7.
Staphylococcus aureus is a highly successful human pathogen responsible for a wide range of infections. This study provides insights into the virulence, pathogenicity, and antimicrobial resistance determinants of methicillin‐susceptible and methicillin‐resistant S. aureus (MSSA; MRSA) recovered from non‐healthcare environments. Three environmental MSSA and three environmental MRSA are selected for proteomic profiling using isobaric tag for relative and absolute quantitation tandem mass spectrometry (iTRAQ MS/MS). Gene Ontology annotation and Kyoto Encyclopedia of Genes and Genomes pathway annotation are applied to interpret the functions of the proteins detected. 792 proteins are identified in MSSA and MRSA. Comparative analysis of MRSA and MSSA reveals that 8 out of 792 proteins are upregulated and 156 are downregulated. Proteins that have differences in abundance are predominantly involved in catalytic and binding activity. Among 164 differentially abundant proteins, 29 are involved in pathogenesis, antimicrobial resistance, stress response, mismatch repair, and cell wall synthesis. Twenty‐two proteins associated with pathogenicity including SPA, SBI, CLFA, and DLT are upregulated in MRSA. Moreover, the upregulated pathogenic protein ENTC2 in MSSA is determined to be a superantigen, potentially capable of triggering toxic shock syndrome in the host. Enhanced pathogenicity, antimicrobial resistance, and stress response are observed in MRSA compared to MSSA.

8.
Wang C, Liu Y, Li H, Xu WJ, Zhang H, Peng XX. Journal of Proteomics, 2012, 75(4): 1263-1275
We have used differential sub-proteomic methodologies to detect the regulation of Edwardsiella tarda outer membrane (OM) protein expression during interaction with fish and human plasma, which is the critical step in bacterial invasion of internal organs via blood circulation. Seven and nine OM proteins were differentially expressed in response to fish and human plasma stress, respectively. Six proteins, TolB2, ETAE_2935, ETAE_0245, EvpA, ETAE_2675 and OmpA, were shared between the two plasma treatments and showed similar changes. Except for EvpA, a protein known to be involved in bacterial pathogenesis and stress sensing, the others are reported here for the first time to be related to bacterial invasion and infection. Four of them, the upregulated ETAE_0245 and OmpA and the downregulated ETAE_2675 and ETAE_2935, were selected for investigation of immune protection. The upregulated OmpA and ETAE_0245 were able to induce bactericidal antibodies in mice. These findings demonstrate that using differential proteomic methodologies to follow protein expression regulation during host-pathogen interaction, combined with bacterial challenge after immunization with the altered proteins, is a valid approach for identifying new vaccine candidates and nicely complements other high-throughput mining strategies used for vaccine discovery.

9.
Ribosomal ribonucleic acid (RNA), transfer RNA and other biological or synthetic RNA polymers can contain nucleotides that have been modified by the addition of chemical groups. Traditional Sanger sequencing methods cannot establish the chemical nature and sequence of these modified-nucleotide containing oligomers. Mass spectrometry (MS) has become the conventional approach for determining the nucleotide composition, modification status and sequence of modified RNAs. Modified RNAs are analyzed by MS using collision-induced dissociation tandem mass spectrometry (CID MS/MS), which produces a complex dataset of oligomeric fragments that must be interpreted to identify and place modified nucleosides within the RNA sequence. Here we report the development of RoboOligo, an interactive software program for the robust analysis of data generated by CID MS/MS of RNA oligomers. There are three main functions of RoboOligo: (i) automated de novo sequencing via the local search paradigm; (ii) manual sequencing with real-time spectrum labeling and cumulative intensity scoring; and (iii) a hybrid approach, coined ‘variable sequencing’, which combines the user intuition of manual sequencing with the high-throughput sampling of automated de novo sequencing.

10.
11.
12.

Background

The MS4A gene family in humans includes CD20 (MS4A1), FcRβ (MS4A2), Htm4 (MS4A3), and at least 13 other syntenic genes encoding membrane proteins, most having characteristic tetraspanning topology. Expression of MS4A genes is variable in tissues throughout the body; however, several are limited to cells in the hematopoietic system where they have known roles in immune cell functions. Genes in the small TMEM176 group share significant sequence similarity with MS4A genes and there is evidence of immune function of at least one of the encoded proteins. In this study, we examined the evolutionary history of the MS4A/TMEM176 families as well as tissue expression of the phylogenetically earliest members, in order to investigate their possible origins in immune cells.

Principal Findings

Orthologs of human MS4A genes were found only in mammals; however, MS4A gene homologs were found in most jawed vertebrates. TMEM176 genes were found only in mammals and bony fish. Several unusual MS4A genes having 2 or more tandem MS4A sequences were identified in the chicken (Gallus gallus) and early mammals (opossum, Monodelphis domestica and platypus, Ornithorhynchus anatinus). A large number of highly conserved MS4A and TMEM176 genes was found in zebrafish (Danio rerio). The most primitive organism identified to have MS4A genes was spiny dogfish (Squalus acanthias). Tissue expression of MS4A genes in S. acanthias and D. rerio showed no evidence of expression restricted to the hematopoietic system.

Conclusions/Significance

Our findings suggest that MS4A genes first appeared in cartilaginous fish with expression outside of the immune system, and have since diversified in many species into their modern forms with expression and function in both immune and nonimmune cells.

13.
A set of proteins that changed their levels of synthesis during growth of Acidithiobacillus ferrooxidans ATCC 19859 on metal sulfides, thiosulfate, elemental sulfur, and ferrous iron was characterized by using two-dimensional polyacrylamide gel electrophoresis. N-terminal amino acid sequencing and mass spectrometry analysis of these proteins allowed their identification and the localization of the corresponding genes in the available genomic sequence of A. ferrooxidans ATCC 23270. The genomic context around several of these genes suggests their involvement in the energetic metabolism of A. ferrooxidans. Two groups of proteins could be distinguished. The first consisted of proteins highly upregulated by growth on sulfur compounds (and downregulated by growth on ferrous iron): a 44-kDa outer membrane protein, an exported 21-kDa putative thiosulfate sulfur transferase protein, a 33-kDa putative thiosulfate/sulfate binding protein, a 45-kDa putative capsule polysaccharide export protein, and a putative 16-kDa protein of unknown function. The second group of proteins comprised those downregulated by growth on sulfur (and upregulated by growth on ferrous iron): rusticyanin, a cytochrome c552, a putative phosphate binding protein (PstS), the small and large subunits of ribulose biphosphate carboxylase, and a 30-kDa putative CbbQ protein, among others. The results suggest in general a separation of the iron and sulfur utilization pathways. Rusticyanin, in addition to being highly expressed on ferrous iron, was also newly synthesized, as determined by metabolic labeling, although at lower levels, during growth on sulfur compounds and iron-free metal sulfides. 
During growth on metal sulfides containing iron, such as pyrite and chalcopyrite, both proteins upregulated on ferrous iron and those upregulated on sulfur compounds were synthesized, indicating that the two energy-generating pathways are induced simultaneously depending on the kind and concentration of oxidizable substrates available.

14.

Background  

Often, high-quality MS/MS spectra of tryptic peptides do not match any database entry because genomes are only partially sequenced, and protein identification therefore requires de novo peptide sequencing. To achieve protein identification for the economically important but still unsequenced plant pathogenic oomycete Plasmopara halstedii, we first evaluated the performance of three different de novo peptide sequencing algorithms applied to digests of standard proteins using a quadrupole TOF (QStar Pulsar i).

15.
This study profiled the plasma proteins of patients infected by the 2011 H1N1 influenza virus. Differential protein expression was identified in plasma obtained from noninfected control subjects (n = 15) and H1N1‐infected subjects (n = 15). Plasma proteins were separated by a 2DE large gel system and identified by nano‐ultra performance LC‐MS. Western blot assays were performed to validate proteins. Eight plasma proteins were upregulated and six proteins were downregulated among 3316 plasma proteins in the H1N1‐infected group as compared with the control group. Of 14 up‐ and downregulated proteins, nine plasma proteins were validated by Western blot analysis. Putative protein FAM 157A, leucine‐rich alpha 2 glycoprotein, serum amyloid A protein, and dual oxidase 1 showed significant differential expression. The identified plasma proteins could be potential candidates for biomarkers of H1N1 influenza viral infection. Further studies are needed to develop these proteins as diagnostic biomarkers.

16.
Glial cells are responsible for a wide range of functions in the nervous system of vertebrates. The myelinated nervous systems of extant elasmobranchs have the longest independent history of all gnathostomes. Much is known about the development of glia in other jawed vertebrates, but research in elasmobranchs is just beginning to reveal the mechanisms guiding neurodevelopment. This study examines the development of glial cells in the bamboo shark, Chiloscyllium punctatum, by identifying the expression pattern of several classic glial and myelin proteins. We show for the first time that glial development in the bamboo shark (C. punctatum) embryo closely follows that observed in other vertebrates and that neural development seems to proceed at a faster rate in the PNS than in the CNS. In addition, we observed more myelinated tracts in the PNS than in the CNS, and as early as stage 32, suggesting that the ontogeny of myelin in sharks is closer to that of osteichthyans than agnathans.

17.
A proteomic approach was used to uncover the inducible molecular defense mechanism of cotton root occurring during the compatible interaction with Thielaviopsis basicola. Microscopic observation of cotton root inoculated with a suspension of conidia showed that this necrotrophic hemibiotroph fungus interacts with the plant and completes its life cycle in our experimental system. 2‐DE analysis of root extracts taken at 1, 3, 5, and 7 days postinoculation and cluster analysis of the protein expression levels showed four major profiles (constant, upregulated, one slightly downregulated, and one dramatically downregulated). Spots significantly (p<0.05) upregulated were analyzed by LC‐MS/MS and identified using MASCOT MS/MS ion search software and associated databases. These proteins included defense and stress related proteins, such as pathogenesis‐related proteins and proteins likely to be involved in the oxidative burst, sugar, and nitrogen metabolism as well as amino acid and isoprenoid synthesis. While many of the identified proteins are common components of the defense response of most plants, a proteasome subunit and a protein reported to be induced only in cotton root following Meloidogyne incognita infection were also identified.

18.
The Burkholderia cepacia complex is a group of Burkholderia species that are opportunistic pathogens causing high mortality rates in patients with cystic fibrosis. An environmental stress often encountered by these soil-dwelling and pathogenic bacteria is limitation of phosphorus, an element essential for cellular processes. Here, we describe cellular and extracellular proteins differentially regulated between phosphate-deplete (0 mM, no added phosphate) and phosphate-replete (1 mM) growth conditions using a comparative proteomics (LC–MS/MS) approach. We observed that a total of 128 and 65 unique proteins were downregulated and upregulated, respectively, in the B. cenocepacia proteome. Of those downregulated proteins, many have functions in amino acid transport/metabolism. We have identified 24 upregulated proteins that are directly/indirectly involved in inorganic phosphate or organic phosphorus acquisition. Also, proteins involved in virulence and antimicrobial resistance were differentially regulated, suggesting B. cenocepacia experiences a dramatic shift in metabolism under these stress conditions. Overall, this study provides a baseline for further research into the biology of Burkholderia in response to phosphorus stress.

19.
20.
Embryonic diapause is a temporary suspension of development at any stage of embryogenesis, which prolongs the gestation period, allowing parturition to occur in conditions that are more suitable for newborns. This reproductive trait is widespread among all vertebrates, including elasmobranchs. Although it has only been confirmed in two elasmobranchs (Rhizoprionodon taylori and Dasyatis say), evidence indicates that at least 14 species of rays and two sharks undergo diapause, suggesting that this form of reproduction exists within a wide range of elasmobranch reproductive modes, including lecithotrophs and matrotrophs. Where it has been studied, embryogenesis is arrested at the blastodisc stage and preserved in the uterus for periods from four to 10 months. Many questions about the biology of most diapausing species remain unanswered, but it is clear that species benefit differently from this reproductive trait. As in other vertebrates, it is likely that environmental cues and hormones (especially progesterone and prolactin) are involved in the control of diapause in elasmobranchs; however, rigorous testing of current hypotheses remains to be carried out.
