Similar literature
 20 similar documents found (search time: 15 ms)
1.
Vipera lebetina venom contains a specific coagulant factor X activator (VLFXA) that cleaves the Arg52-Ile53 bond in the heavy chain of human factor X. VLFXA is a glycoprotein composed of a heavy chain (HC) and two light chains (LC) linked by disulfide bonds. The complete amino acid sequences of the three chains of the factor X activator from V. lebetina snake venom are deduced from the nucleotide sequences of cDNAs encoding these chains. The full-length cDNA (2347 bp) sequence of the HC encodes an open reading frame (ORF) of 612 amino acids that includes a signal peptide, a propeptide, and a mature metalloproteinase with disintegrin-like and cysteine-rich domains. The light chain LC1 contains 123 and LC2 135 amino acid residues. Both light chains belong to the class of C-type lectin-like proteins. The N-termini of the VLFXA chains and the inner sequences of peptide fragments determined from the protein by liquid chromatography-electrospray ionization tandem mass spectrometry (LC MS/MS) are 100% identical to the sequences deduced from the cDNA. The molecular masses of tryptic fragments of VLFXA chains analyzed by matrix-assisted laser desorption ionization time of flight mass spectrometry (MALDI-TOF MS) also confirm the protein sequences deduced from the cDNAs. These are the first cloned factor X activator heavy and light chains. We demonstrate that the heavy and light chains are synthesized from different genes.

2.
3.
4.
LC MS/MS has become an established technology in proteomic studies, and with the maturation of the technology the bottleneck has shifted from data generation to data validation and mining. To address this bottleneck we developed Experimental Peptide Identification Repository (EPIR), which is an integrated software platform for storage, validation, and mining of LC MS/MS-derived peptide evidence. EPIR is a cumulative data repository where precursor ions are linked to peptide assignments and protein associations returned by a search engine (e.g. Mascot, Sequest, or PepSea). Any number of datasets can be parsed into EPIR and subsequently validated and mined using a set of software modules that overlay the database. These include a peptide validation module, a protein grouping module, a generic module for extracting quantitative data, a comparative module, and additional modules for extracting statistical information. In the present study, the utility of EPIR and associated software tools is demonstrated on LC MS/MS data derived from a set of model proteins and complex protein mixtures derived from MCF-7 breast cancer cells. Emphasis is placed on the key strengths of EPIR, including the ability to validate and mine multiple combined datasets, and presentation of protein-level evidence in concise, nonredundant protein groups that are based on shared peptide evidence.
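
The idea of nonredundant protein groups based on shared peptide evidence can be illustrated with a minimal sketch. It is not EPIR's actual grouping module: the function name `group_proteins`, the input layout, and the rule "proteins with identical peptide sets form one group" are assumptions made for illustration only.

```python
from collections import defaultdict


def group_proteins(protein_to_peptides):
    """Group proteins whose sets of identified peptides are identical.

    protein_to_peptides: dict mapping a protein accession to the set of
    peptide sequences assigned to it (hypothetical input layout).
    """
    groups = defaultdict(list)
    for protein, peptides in protein_to_peptides.items():
        # Proteins that cannot be told apart by their peptide evidence
        # share the same frozenset key and land in the same group.
        groups[frozenset(peptides)].append(protein)
    # Sort members and groups so the output is deterministic.
    return sorted(sorted(members) for members in groups.values())


evidence = {
    "P1": {"AAK", "LLR"},
    "P2": {"AAK", "LLR"},  # indistinguishable from P1
    "P3": {"GGK"},
}
print(group_proteins(evidence))
```

Here P1 and P2 are reported as a single group because no peptide distinguishes them, while P3 forms its own group.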

5.
Candidate protein biomarker discovery by fully automatic integration of Orbitrap full MS1 spectral peptide profiling and X!Tandem MS2 peptide sequencing is investigated by analyzing mass spectra from brain tumor samples using Peptrix. Potential protein candidate biomarkers found for angiogenesis are compared with those previously reported in the literature and obtained from previous Fourier transform ion cyclotron resonance (FT-ICR) peptide profiling. Lower mass accuracy of peptide masses measured by Orbitrap compared to those measured by FT-ICR is compensated by the larger number of detected masses separated by liquid chromatography (LC), which can be directly linked to protein identifications. The number of peptide sequences divided by the number of unique sequences is 9248/6911 ≈ 1.3. Peptide sequences appear 1.3 times redundant per up-regulated protein on average in the peptide profile matrix, and do not always appear up-regulated due to tailing in LC retention time (40%), modifications (40%) and mass determination errors (20%). Significantly up-regulated proteins found by integration of X!Tandem are described in the literature as tumor markers and some are linked to angiogenesis. New potential biomarkers are found, but need to be validated independently. Eventually more proteins could be found by actively involving MS2 sequence information in the creation of the MS1 peptide profile matrix.

6.
7.
Bacterial lipoproteins are a diverse and functionally important group of proteins that are amenable to bioinformatic analyses because of their unique signal peptide features. Here we have used a dataset of sequences of experimentally verified lipoproteins of Gram-positive bacteria to refine our previously described lipoprotein recognition pattern (G+LPP). Sequenced bacterial genomes can be screened for putative lipoproteins using the G+LPP pattern. The sequences identified can then be validated using online tools for lipoprotein sequence identification. We have used our protein sequence datasets to evaluate six online tools for efficacy of lipoprotein sequence identification. Our analyses demonstrate that LipoP performs best individually but that a consensus approach, incorporating outputs from predictors of general signal peptide properties, is most informative.

8.
Computer programs that can be used for the design of synthetic genes and that are run on an Apple Macintosh computer are described. These programs determine nucleic acid sequences encoding amino acid sequences. They select DNA sequences based on codon usage as specified by the user, and determine the placement of base changes that can be used to create restriction enzyme sites without altering the amino acid sequence. A new algorithm for finding restriction sites by translating the restriction endonuclease target sequence in all three reading frames and then searching the given peptide or protein amino acid sequence with these short restriction enzyme peptide sequences is described. Examples are given for the creation of synthetic DNA sequences for the bovine prethrombin-2 and ribonuclease A genes. Received on October 18, 1988; accepted on December 9, 1988.
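
The restriction-site search described in this abstract can be sketched roughly as follows. This is an illustrative reconstruction and not the published program: the function names, the use of `N` padding to represent the partial codons at the ends of the site, and the standard genetic code table are all assumptions of this sketch.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def residues_for(pattern):
    """Amino acids encodable by a codon pattern in which 'N' means any base."""
    options = [BASES if base == "N" else base for base in pattern]
    return {CODON_TABLE[a + b + c] for a, b, c in product(*options)} - {"*"}


def site_peptide_patterns(site):
    """Translate a recognition site in three frames into lists of residue sets."""
    patterns = []
    for offset in range(3):
        # Pad with N so the site is embedded in whole codons for this frame.
        padded = "N" * offset + site
        padded += "N" * (-len(padded) % 3)
        patterns.append(
            [residues_for(padded[i:i + 3]) for i in range(0, len(padded), 3)]
        )
    return patterns


def find_sites(protein, site):
    """Return (residue index, frame offset) pairs where the site could be
    encoded without changing the amino acid sequence."""
    hits = []
    for offset, pattern in enumerate(site_peptide_patterns(site)):
        for i in range(len(protein) - len(pattern) + 1):
            if all(protein[i + j] in allowed for j, allowed in enumerate(pattern)):
                hits.append((i, offset))
    return hits


# EcoRI recognizes GAATTC; in frame 0 it reads GAA TTC, i.e. Glu-Phe.
print(find_sites("MEFK", "GAATTC"))
```

In the example, the dipeptide Glu-Phe at residue index 1 can be encoded as GAA TTC, so an EcoRI site can be placed there by codon choice alone.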

9.
We present MassSieve, a Java‐based platform for visualization and parsimony analysis of single and comparative LC‐MS/MS database search engine results. The success of mass spectrometric peptide sequence assignment algorithms has led to the need for a tool to merge and evaluate the increasing data set sizes that result from LC‐MS/MS‐based shotgun proteomic experiments. MassSieve supports reports from multiple search engines with differing search characteristics, which can increase peptide sequence coverage and/or identify conflicting or ambiguous spectral assignments.
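
Parsimony analysis in general asks for a small set of proteins that explains all identified peptides. The sketch below uses a greedy set-cover heuristic as a stand-in; it is an assumption of this example and is not claimed to be the algorithm MassSieve implements. The function name and input layout are hypothetical.

```python
def parsimonious_proteins(protein_to_peptides):
    """Greedily pick proteins until every identified peptide is explained.

    Greedy set cover is a heuristic: it gives a small protein list, not
    necessarily the smallest possible one.
    """
    uncovered = set().union(*protein_to_peptides.values())
    chosen = []
    while uncovered:
        # Take the protein explaining the most still-unexplained peptides;
        # iterating over sorted names makes ties resolve alphabetically.
        best = max(
            sorted(protein_to_peptides),
            key=lambda p: len(protein_to_peptides[p] & uncovered),
        )
        chosen.append(best)
        uncovered -= protein_to_peptides[best]
    return chosen


hits = {
    "A": {"pep1", "pep2", "pep3"},
    "B": {"pep1", "pep2"},  # fully explained by A
    "C": {"pep4"},
}
print(parsimonious_proteins(hits))
```

Protein B is left out because every peptide assigned to it is already explained by protein A.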

10.
Mass spectrometry (MS) was used in this study for the large-scale proteomic identification and verification of protein-encoding genes present in the silkworm (Bombyx mori) genome. Peptide sequences identified by MS were compared with those from an open reading frame (ORF) library of the B. mori genome and a cDNA library, to validate the coding attributes of ORFs. Two databases were created. The first was based on a 9× draft sequence of the silkworm genome and contained 14,632 putative proteins. The second was based on a B. mori pupal cDNA library containing 3,187 putative proteins of at least 30 amino acid residues in length. A total of 81,000 peptide sequences with a threshold score of 60% were generated by the MS/MS analysis, and 55,400 of these were chosen for a sequence alignment. By searching these two databases, 6,649 and 250 proteins were matched, which accounted for approximately 45.4% and 7.8% of the peptide sequences and putative proteins, respectively. Further analyses carried out by several bioinformatic tools suggested that the matches included proteins with predicted transmembrane domains (1,393) and preproteins with a signal peptide (976). These results provide a fundamental understanding of the expression and function of silkworm proteins.
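
As a quick arithmetic check, the two reported percentages are reproduced when the matched proteins are divided by the number of putative proteins in each database. That reading of the figures is an inference from the numbers in the abstract, and the helper name below is hypothetical.

```python
def percent_matched(matched, total):
    """Share of database entries that were matched, in percent (one decimal)."""
    return round(100 * matched / total, 1)


# Genome-derived database: 6,649 of 14,632 putative proteins matched.
print(percent_matched(6649, 14632))
# Pupal cDNA-derived database: 250 of 3,187 putative proteins matched.
print(percent_matched(250, 3187))
```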

11.
Accurate estimation of biological diversity in environmental DNA samples using high-throughput amplicon pyrosequencing must account for errors generated by PCR and sequencing. We describe a novel approach to distinguish the underlying sequence diversity in environmental DNA samples from errors that uses information on the abundance distribution of similar sequences across independent samples, as well as the frequency and diversity of sequences within individual samples. We have further refined this approach into a bioinformatics pipeline, Amplicon Pyrosequence Denoising Program (APDP), that is able to process raw sequence datasets into a set of validated sequences in formats compatible with commonly used downstream analysis packages. We demonstrate, by sequencing complex environmental samples and mock communities, that APDP is effective for removing errors from deeply sequenced datasets comprising biological and technical replicates, and can efficiently denoise single-sample datasets. APDP provides more conservative diversity estimates for complex datasets than other approaches; however, for some applications this may provide a more accurate and appropriate level of resolution, and result in greater confidence that returned sequences reflect the diversity of the underlying sample.

12.
Towards an analysis of the rice mitochondrial proteome   Total citations: 32, self-citations: 0, citations by others: 32
Purified rice (Oryza sativa) mitochondrial proteins have been arrayed by isoelectric focusing/polyacrylamide gel electrophoresis (PAGE), by blue-native (BN) PAGE, and by reverse-phase high-performance liquid chromatography (LC) separation (LC-mass spectrometry [MS]). From these protein arrays, we have identified a range of rice mitochondrial proteins, including hydrophilic/hydrophobic proteins (grand average of hydropathicity = -1.27 to +0.84), highly basic and acidic proteins (isoelectric point = 4.0-12.5), and proteins over a large molecular mass range (6.7-252 kD), using proteomic approaches. BN PAGE provided a detailed picture of electron transport chain protein complexes. A total of 232 protein spots from isoelectric focusing/PAGE and BN PAGE separations were excised, trypsin digested, and analyzed by tandem MS (MS/MS). Using this dataset, 149 of the protein spots (the products of 91 nonredundant genes) were identified by searching translated rice open reading frames from genomic sequence and six-frame translated rice expressed sequence tags. Sequence comparison allowed us to assign functions to a subset of 85 proteins, including many of the major function categories expected for this organelle. A further six spots were matched to rice sequences for which no specific function has yet been determined. Complete digestion of mitochondrial proteins with trypsin yielded a peptide mixture that was analyzed directly by reverse-phase LC via organic solvent elution from a C-18 column (LC-MS). These data yielded 170 MS/MS spectra that matched 72 sequence entries from open reading frame and expressed sequence tag databases. Forty-five of these were obtained using LC-MS alone, whereas 28 proteins were identified by both LC-MS and gel-based separations. In total, 136 nonredundant rice proteins were identified, including a new set of 23 proteins of unknown function located in plant mitochondria. We also report the first direct identification, to our knowledge, of PPR (pentatricopeptide repeat) proteins in the plant mitochondrial proteome. This dataset provides the first extensive picture, to our knowledge, of mitochondrial functions in a model monocot plant.

13.
Accurate protein identification in large-scale proteomics experiments relies upon a detailed, accurate protein catalogue, which is derived from predictions of open reading frames based on genome sequence data. Integration of mass spectrometry-based proteomics data with computational proteome predictions from environmental metagenomic sequences has been challenging because of the variable overlap between proteomic datasets and corresponding short-read nucleotide sequence data. In this study, we have benchmarked several strategies for increasing microbial peptide spectral matching in metaproteomic datasets using protein predictions generated from matched metagenomic sequences from the same human fecal samples. Additionally, we investigated the impact of mass spectrometry-based filters (high mass accuracy, delta correlation), and de novo peptide sequencing on the number and robustness of peptide-spectrum assignments in these complex datasets. In summary, we find that high mass accuracy peptide measurements searched against non-assembled reads from DNA sequencing of the same samples significantly increased identifiable proteins without sacrificing accuracy.

14.
The nucleotide and partial amino acid sequence of toxic shock syndrome toxin-1   Total citations: 37, self-citations: 0, citations by others: 37
The nucleotide sequence of toxic shock syndrome toxin-1 (TSST-1) has been determined. In addition, one-third of the predicted amino acid sequence was confirmed by amino acid sequence analysis of cyanogen bromide-generated TSST-1 protein fragments. The DNA sequencing results identified a 708-base pair open reading frame starting with an ATG, 7 base pairs downstream from a Shine-Dalgarno sequence, and terminating at a UAA stop codon. Amino acid analysis of the intact protein defined the NH2 terminus of the mature protein and located the cleavage point for the signal peptide (Ala/Ser). The signal peptide contained the first 40 amino acids and had characteristic structural similarities with other bacterial signal peptides. The coding sequence of the mature protein was 585 base pairs (194 amino acids) in length, and the molecular weight of the predicted protein was 22,049. This is in good agreement with the previously reported molecular weight of TSST-1 (22,000), as determined by sodium dodecyl sulfate-polyacrylamide gel electrophoresis. NH2-terminal amino acid sequence analysis performed on isolated TSST-1 CNBr fragments determined the position of the peptides in the TSST-1 sequence and verified the predicted amino acid sequence in those positions. Computer analyses of the amino acid sequence showed that TSST-1 has little or no sequence homology with biologically related toxins, streptococcal pyrogenic exotoxin A, and staphylococcal enterotoxins B and C.

15.
W K Wang, K Kruus, J H Wu. Journal of Bacteriology, 1993, 175(5):1293-1302
Clostridium thermocellum ATCC 27405 produces an extracellular cellulase system capable of hydrolyzing crystalline cellulose. The enzyme system involves a multicomponent protein aggregate (the cellulosome) with a total molecular weight in the millions, impeding mechanistic studies. However, two major components of the aggregate, SS (M(r) = 82,000) and SL (M(r) = 250,000), which act synergistically to hydrolyze crystalline cellulose, have been identified (J. H. D. Wu, W. H. Orme-Johnson, and A. L. Demain, Biochemistry 27:1703-1709, 1988). To further study this synergism, we cloned and sequenced the gene (celS) coding for the SS (CelS) protein by using a degenerate, inosine-containing oligonucleotide probe whose sequence was derived from the N-terminal amino acid sequence of the CelS protein. The open reading frame of celS consisted of 2,241 bp encoding 741 amino acid residues. It encoded the N-terminal amino acid sequence and two internal peptide sequences determined for the native CelS protein. A putative ribosome binding site was identified at the 5' end of the gene. A putative signal peptide of 27 amino acid residues was adjacent to the N terminus of the CelS protein. The predicted molecular weight of the secreted protein was 80,670. The celS gene contained a conserved reiterated sequence encoding 24 amino acid residues found in proteins encoded by many other clostridial cel or xyn genes. A palindromic structure was found downstream from the open reading frame. The celS gene is unique among the known cel genes of C. thermocellum. However, it is highly homologous to the partial open reading frame found in C. cellulolyticum and in Caldocellum saccharolyticum, indicating that these genes belong to a new family of cel genes.

16.
To improve the utility of increasingly large numbers of available unannotated and initially poorly annotated genomic sequences for proteome analysis, we demonstrate that effective protein identification can be made on a large and unannotated genome. The strategy developed is to translate the unannotated genome sequence into amino acid sequence encoding putative proteins in all six reading frames, to identify peptides by tandem mass spectrometry (MS/MS), to localize them on the genome sequence, and to preliminarily annotate the protein via a similarity search by BLAST. These tasks have been optimized and automated. Optimization to obtain multiple peptide matches in effect extends the searchable region and results in more robust protein identification. The viability of this strategy is demonstrated with the identification of 223 cilia proteins in the unicellular eukaryotic model organism Tetrahymena thermophila, whose initial genomic sequence draft was released in November 2003. To the best of our knowledge, this is the first demonstration of large-scale protein identification based on such a large, unannotated genome. Of the 223 cilia proteins, 84 have no similarity to proteins in NCBI's nonredundant (nr) database. This methodology allows identifying the locations of the genes encoding these novel proteins, which is a necessary first step to downstream functional genomic experimentation.
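
The first and third steps of this strategy (six-frame translation, then localizing an identified peptide on the genome) can be sketched as below. This is a minimal illustration rather than the authors' pipeline: the function names and frame labels are hypothetical, and the sketch uses the standard genetic code for simplicity. Tetrahymena uses a variant nuclear code, so a real analysis would need the appropriate table.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (an assumption of this sketch), codons ordered TTT, TTC, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate complete codons; codons with unknown bases become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Return translations for frames +1..+3 and -1..-3."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse)):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def locate_peptide(dna, peptide):
    """List (frame, residue index) for the first hit of a peptide in each frame."""
    return [
        (name, protein.find(peptide))
        for name, protein in six_frame_translation(dna).items()
        if peptide in protein
    ]


print(six_frame_translation("ATGGCCTAA"))
print(locate_peptide("ATGGCCTAA", "MA"))
```

A peptide identified by MS/MS that matches only one frame at one position points directly to the genomic region encoding it.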

17.
Lutz S, Fast W, Benkovic SJ. Protein Engineering, 2002, 15(12):1025-1030
The identification of a nucleic acid sequence's correct reading frame has important implications for homology-independent protein engineering techniques such as incremental truncation and SCRATCHY. We report the development and experimental implementation of a general in-frame selection system, pSALect, a plasmid vector that utilizes two marker sequences flanking the DNA of interest. This dual selection approach overcomes inconsistencies observed with traditional C-terminally fused reporter proteins. In the pSALect vector, sequences of interest are positioned between an N-terminal Tat-signal sequence and a C-terminal beta-lactamase reporter. In-frame selection of the resulting three-domain protein is performed by growing colonies on ampicillin-containing plates, requiring full-length translation in order to link covalently the signal sequence to the lactamase for export into the periplasm. This dual selection scheme has been validated successfully using defined sequences of glycinamide ribonucleotide formyltransferases (GARTs) from Escherichia coli and human and, in contrast to C-terminal fusion systems, proved effective when applied towards the selection of in-frame constructs in an incremental truncation library.

18.
19.
20.
The discovery of unanticipated protein modifications is one of the most challenging problems in proteomics. Whereas widely used algorithms such as Sequest and Mascot enable mapping of modifications when the mass and amino acid specificity are known, unexpected modifications cannot be identified with these tools. We have developed an algorithm and software called P-Mod, which enables discovery and sequence mapping of modifications to target proteins known to be represented in the analysis or identified by Sequest. P-Mod matches MS/MS spectra to peptide sequences in a search list. For spectra of modified peptides, P-Mod calculates mass differences between search peptide sequences and MS/MS precursors and localizes the mass shift to a sequence position in the peptide. Because modifications are detected as mass shifts, P-Mod does not require the user to guess at masses or sequence locations of modifications. P-Mod uses extreme value statistics to assign p value estimates to sequence-to-spectrum matches. The reported p values are scaled to account for the number of comparisons, so that error rates do not increase with the expanded search lists that result from incorporating potential peptide modifications. Combination of P-Mod searches from multiple LC-MS/MS analyses and multiple samples revealed previously unreported BSA modifications, including a novel decarboxymethylation or D→G substitution at position 579 of the protein. P-Mod can serve a unique role in the identification of protein modifications from both exogenous and endogenous sources and may be useful for identifying modified protein forms as biomarkers for toxicity and disease processes.
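
The mass-difference step described above can be illustrated with a small calculation. This sketch covers only the precursor mass shift, not P-Mod's localization or scoring, and it is not P-Mod's code: the function names are hypothetical, and the monoisotopic residue masses are standard reference values rounded to five decimals.

```python
# Monoisotopic residue masses in Da (standard values, rounded).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide for the free termini
PROTON = 1.00728


def peptide_mass(sequence):
    """Monoisotopic neutral mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def precursor_neutral_mass(mz, charge):
    """Neutral mass implied by a precursor m/z and charge state."""
    return (mz - PROTON) * charge


def mass_shift(sequence, mz, charge):
    """Observed precursor mass minus the mass of the unmodified search peptide."""
    return precursor_neutral_mass(mz, charge) - peptide_mass(sequence)


# Toy example: a precursor that really is AGK, searched against ADK.
observed_mz = (peptide_mass("AGK") + 2 * PROTON) / 2
print(round(mass_shift("ADK", observed_mz, 2), 4))
```

The shift of about -58.005 Da in the toy example equals the difference between the glycine and aspartate residue masses, which is the kind of signal that a D→G substitution or a decarboxymethylation would produce.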


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  京ICP备09084417号