Similar Articles
1.
Bacterial lipoproteins are a diverse and functionally important group of proteins that are amenable to bioinformatic analyses because of their unique signal peptide features. Here we have used a dataset of sequences of experimentally verified lipoproteins of Gram-positive bacteria to refine our previously described lipoprotein recognition pattern (G+LPP). Sequenced bacterial genomes can be screened for putative lipoproteins using the G+LPP pattern, and the sequences identified can then be validated using online tools for lipoprotein sequence identification. We have used our protein sequence datasets to evaluate six online tools for their efficacy in identifying lipoprotein sequences. Our analyses demonstrate that LipoP () performs best individually but that a consensus approach, incorporating outputs from predictors of general signal peptide properties, is most informative.

2.
We have developed an automated method for predicting signal peptide sequences and their cleavage sites in eukaryotic and bacterial protein sequences. It is a two-layer predictor: the first-layer prediction engine identifies a query protein as secretory or non-secretory; if the protein is secretory, the second-layer prediction engine is invoked automatically to identify the cleavage site of its signal peptide. The new predictor is called Signal-CF, where C stands for "coupling" and F for "fusion": Signal-CF incorporates the subsite coupling effects along a protein sequence and fuses, through a voting system, the results derived from many scaled windows of different widths. Signal-CF achieves high prediction success rates with short computation time, and is therefore particularly useful for the analysis of large-scale datasets. Signal-CF is freely available as a web server at http://chou.med.harvard.edu/bioinf/Signal-CF/ or http://202.120.37.186/bioinf/Signal-CF/.

3.
Chou KC. Proteins, 2001, 42(1): 136-139
Protein signal sequences play a central role in the targeting and translocation of nearly all secreted proteins and many integral membrane proteins in both prokaryotes and eukaryotes. Knowledge of signal sequences has become a crucial tool for pharmaceutical scientists who genetically modify bacteria, plants, and animals to produce effective drugs. To use such a tool effectively, however, one first needs a fast and effective method to identify this "zipcode" entity; the need is heightened both by the huge amount of unprocessed sequence data available and by the industrial demand for more effective vehicles for producing proteins in recombinant systems. In view of this, a sequence-encoded algorithm was developed to identify signal sequences and predict their cleavage sites. For 1,939 secretory proteins and 1,440 non-secretory proteins, the rate of correct prediction is 90.14% in the self-consistency test and 90.13% in the jackknife test. These encouraging results indicate that signal sequences share some common features, even though they lack similarity in sequence, length, and even composition, and that they can be predicted with considerable accuracy.
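The abstract reports two evaluation protocols. In a self-consistency test the predictor is trained and tested on the same data; in a jackknife (leave-one-out) test each protein is withheld in turn, the predictor is retrained on the remaining proteins, and the withheld protein is then predicted. The Python sketch below shows both protocols. The nearest-centroid classifier on amino acid composition and the toy sequences are hypothetical stand-ins, not the sequence-encoded algorithm of the paper.

```python
from collections import Counter
from typing import Callable, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
Example = tuple[str, str]          # (sequence, label)
Predictor = Callable[[str], str]   # maps a sequence to a predicted label


def composition(seq: str) -> list[float]:
    """20-dimensional amino acid composition (fraction of each residue)."""
    counts = Counter(seq)
    return [counts[aa] / len(seq) for aa in AMINO_ACIDS]


def squared_distance(u: list[float], v: list[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


def train_nearest_centroid(examples: Sequence[Example]) -> Predictor:
    """Hypothetical toy classifier: pick the label whose mean composition is closest."""
    grouped: dict[str, list[list[float]]] = {}
    for seq, label in examples:
        grouped.setdefault(label, []).append(composition(seq))
    centroids = {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in grouped.items()
    }

    def predict(seq: str) -> str:
        vec = composition(seq)
        return min(centroids, key=lambda lab: squared_distance(vec, centroids[lab]))

    return predict


def self_consistency(examples: Sequence[Example],
                     train: Callable[[Sequence[Example]], Predictor]) -> float:
    """Train and test on the same data."""
    model = train(examples)
    return sum(model(seq) == label for seq, label in examples) / len(examples)


def jackknife(examples: Sequence[Example],
              train: Callable[[Sequence[Example]], Predictor]) -> float:
    """Leave-one-out: each example is predicted by a model trained on all the others."""
    examples = list(examples)
    correct = 0
    for i, (seq, label) in enumerate(examples):
        model = train(examples[:i] + examples[i + 1:])
        correct += model(seq) == label
    return correct / len(examples)


data = [  # hypothetical toy sequences
    ("MKKLLLLAAG", "secretory"),
    ("MKKILLAVAG", "secretory"),
    ("MDEEKRRQDE", "non-secretory"),
    ("MSDEKQRREE", "non-secretory"),
]
print(self_consistency(data, train_nearest_centroid), jackknife(data, train_nearest_centroid))
# prints 1.0 1.0 on this toy set
```

On real data the jackknife rate is usually the more conservative of the two, because the withheld protein never influences its own prediction.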

4.
Wall-less prokaryotes in the genus Mycoplasma include over 90 species of infectious agents whose pathogenicity for humans and other animals is currently being assessed. Molecular characterization of surface proteins is critical in this regard but is hampered by the lack of genetic systems in these organisms. We used TnphoA transposition to systematically mutagenize, in Escherichia coli, a genomic plasmid library constructed from Mycoplasma fermentans, a potential human pathogen. The strategy circumvented problems of expressing mycoplasma genes containing UGA (Trp) codons and relied on the construction of the vector pG7ZCW, designed to reduce TnphoA transposition into vector sequences. Functional phoA gene fusions directly identified genes encoding 19 putative membrane-associated proteins of M. fermentans. Sequences of fusion constructs defined three types of export sequence: (1) non-cleavable, membrane-spanning sequences; (2) signal peptides with signal peptidase (SPase) I-like cleavage sites; and (3) signal peptides with SPase II-like lipoprotein cleavage sites which, like most other mycoplasmal lipoprotein signals analysed to date, differed from those of several Gram-negative and Gram-positive eubacteria in lacking a Leu residue at the −3 position. Antibodies to synthetic peptides deduced from two fusions to predicted lipoproteins identified corresponding amphiphilic membrane proteins of 57 kDa and 78 kDa expressed in the mycoplasma. The P57 sequence contained a proline-rich N-terminal region analogous to an adhesin of Mycoplasma gallisepticum. The P78 protein was identical to a serologically defined phase-variant surface lipoprotein. TnphoA mutagenesis provides an efficient means of systematically characterizing functionally diverse lipoproteins and other exported proteins in mycoplasmas.
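The UGA problem arises because mycoplasmas read UGA (TGA in DNA) as tryptophan, whereas the standard code used by E. coli reads it as a stop codon, so a mycoplasma gene expressed in E. coli is truncated at its first UGA. A minimal Python sketch comparing the standard code (NCBI translation table 1) with the Mycoplasma/Spiroplasma code (NCBI table 4, which reassigns TGA to Trp); the ORF is a hypothetical example.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated TTT, TTC, TTA, TTG, TCT, ... GGG.
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def codon_table(uga_is_trp: bool = False) -> dict[str, str]:
    """Codon -> one-letter amino acid ('*' = stop)."""
    table = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), STANDARD_AA)}
    if uga_is_trp:
        # Mycoplasma/Spiroplasma code (NCBI table 4): UGA (TGA in DNA) encodes Trp.
        table["TGA"] = "W"
    return table


def translate(dna: str, table: dict[str, str]) -> str:
    """Translate in frame 0, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = table[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


orf = "ATGAAATGAGGTTGCTAA"  # hypothetical ORF with an internal TGA
print(translate(orf, codon_table()))                 # MK    (truncated, as in E. coli)
print(translate(orf, codon_table(uga_is_trp=True)))  # MKWGC (full length, as in mycoplasma)
```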

5.
A number of computational tools are available for detecting signal peptides, but their ability to locate signal peptide cleavage sites varies significantly and is often unsatisfactory. We characterized a set of 270 secreted recombinant human proteins by automated Edman analysis and used the verified cleavage sites to evaluate the success rate of several computational prediction programs. An examination of amino acid frequencies in the N-terminal region of the data set showed a preference for proline and glutamine and a bias against tyrosine. Comparison of the data set with the SWISS-PROT database revealed a high percentage of discrepancies with cleavage site annotations that had been generated computationally. The best program for predicting signal sequences was SignalP 2.0-NN, with an accuracy of 78.1% for cleavage site recognition. The new data set can be used to refine prediction algorithms, and we have built an improved profile hidden Markov model for signal peptides based on the new data.
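A minimal sketch of this kind of evaluation, assuming cleavage sites are recorded as the signal peptide length (cleavage after that residue) and that a prediction counts as correct only on an exact match; the protein IDs and sequences are hypothetical. The second function tallies residues at the start of each mature protein, the region in which the abstract reports the proline/glutamine preference.

```python
from collections import Counter
from typing import Optional


def cleavage_site_accuracy(verified: dict[str, int],
                           predicted: dict[str, Optional[int]]) -> float:
    """Fraction of proteins whose predicted site exactly matches the verified one.
    A site is the signal peptide length; a missing or None prediction counts as wrong."""
    hits = sum(1 for pid, site in verified.items() if predicted.get(pid) == site)
    return hits / len(verified)


def mature_n_terminal_counts(sequences: dict[str, str], verified: dict[str, int],
                             window: int = 3) -> Counter:
    """Count residues in the first `window` positions of each mature protein."""
    counts: Counter = Counter()
    for pid, site in verified.items():
        # 0-based index `site` is the first residue after the cleavage site.
        counts.update(sequences[pid][site:site + window])
    return counts


sequences = {"a": "MKLLAQPV", "b": "MLLAAPQG"}  # hypothetical toy sequences
verified = {"a": 5, "b": 5}
predicted = {"a": 5, "b": 4}
print(cleavage_site_accuracy(verified, predicted))  # 0.5
print(mature_n_terminal_counts(sequences, verified, window=2))
```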

6.
Lipid modification of the N-terminal Cys residue (N-acyl-S-diacylglyceryl-Cys) has been found to be an essential, ubiquitous, and unique bacterial post-translational modification. This modification anchors even highly hydrophilic proteins to the membrane, where they carry out a variety of functions important for bacteria, including pathogenesis. Hence, being able to identify such proteins is of great value. To this end, we have created a comprehensive database of bacterial lipoproteins, called DOLOP, which contains information and links to molecular details for about 278 distinct lipoproteins and predicted lipoproteins from 234 completely sequenced bacterial genomes. The website also features a tool that applies a predictive algorithm to determine whether a user-supplied sequence contains a lipoprotein signal sequence. The experimentally verified lipoproteins have been classified into functional classes and, more importantly, functional domain assignments made with hidden Markov models from the SUPERFAMILY database are provided for the predicted lipoproteins. Other features include primary sequence analysis, signal sequence analysis, a search facility, and an information exchange facility that allows researchers to share results on newly characterized lipoproteins. The website, along with additional information on the biosynthetic pathway, statistics on predicted lipoproteins, and related figures, is available at http://www.mrc-lmb.cam.ac.uk/genomes/dolop/.

7.
Lipoproteins in bacteria   Times cited: 78; self-citations: 0; cited by others: 78
Covalent modification of membrane proteins with lipids appears to be ubiquitous in all living cells. The major outer membrane (Braun's) lipoprotein of E. coli, the prototype of bacterial lipoproteins, is first synthesized as a precursor protein. Analysis of the signal sequences of 26 distinct lipoprotein precursors has revealed a consensus sequence for the lipoprotein modification/processing site, Leu-(Ala, Ser)-(Gly, Ala)-Cys at positions −3 to +1, which would represent the cleavage region of about three-fourths of all lipoprotein signal sequences in bacteria. Unmodified prolipoprotein carrying this consensus sequence undergoes sequential modification and processing reactions catalyzed by glyceryl transferase, O-acyl transferase(s), prolipoprotein signal peptidase (signal peptidase II), and N-acyl transferase to form the mature lipoprotein. Like all exported proteins, lipoproteins require functional SecA, SecY, and SecD proteins for export. Thus all precursor proteins are exported through a common pathway accessible to both signal peptidase I and signal peptidase II. The rapidly growing list of lipid-modified proteins in both prokaryotic and eukaryotic cells indicates that lipoproteins comprise a diverse group of structurally and functionally distinct proteins that share a common structural feature derived from a common biosynthetic pathway.
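The Leu-(Ala, Ser)-(Gly, Ala)-Cys consensus maps directly onto a regular expression. A minimal Python sketch, assuming the lipobox must lie within the first 40 residues (a hypothetical cutoff); the test sequences are hypothetical. Because the consensus covers only about three-fourths of lipoprotein signal sequences, and mycoplasmal lipoprotein signals often lack Leu at −3 (entry 4 above), a strict pattern like this will miss some true lipoproteins.

```python
import re
from typing import Optional

# Consensus from the text: Leu at -3, Ala/Ser at -2, Gly/Ala at -1, Cys at +1.
LIPOBOX = re.compile(r"L[AS][GA]C")


def find_lipobox(seq: str, max_cys_position: int = 40) -> Optional[int]:
    """Return the 1-based position of the lipobox Cys (the +1 residue of the mature
    lipoprotein) or None. Only the N-terminal `max_cys_position` residues are searched;
    the default of 40 is an assumption, not part of the consensus."""
    match = LIPOBOX.search(seq[:max_cys_position])
    # match.end() is the 0-based index just past the Cys, i.e. the Cys's 1-based position.
    return match.end() if match else None


print(find_lipobox("MKKTLIALAVLLSGLLAGCSSNQ"))  # 19
print(find_lipobox("MKKTLIALAVLLSGAQAEDSSNQ"))  # None
```

Signal peptidase II cleaves immediately before the returned Cys, so the position minus one is the predicted signal peptide length.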

8.
Two genes, rlpA and rlpB, located in the leuS-dacA region (15 min) of the Escherichia coli chromosome, were shown to encode two rare lipoproteins by expression of the subcloned genes in a maxicell system. The formation of these two proteins was inhibited by globomycin, an inhibitor of the signal peptidase for the known lipoproteins of E. coli. In each case, this inhibition was accompanied by the formation of a new protein with slightly lower mobility on sodium dodecyl sulfate-polyacrylamide gel electrophoresis, which we presume to be a prolipoprotein with an N-terminal signal peptide similar to those of the bacterial major lipoproteins and the lysis proteins of some bacteriocins. Incorporation of 3H-labeled palmitate and glycerol into the two lipoproteins was also observed. DNA sequencing showed that the two lipoprotein genes could encode signal peptides of 17 amino acids (rlpA lipoprotein) and 18 amino acids (rlpB lipoprotein). The deduced mature peptides consist of 345 amino acids (Mr 35,614, rlpA lipoprotein) and 175 amino acids (Mr 19,445, rlpB lipoprotein), each with an N-terminal cysteine to which thioglyceride and N-fatty acyl residues may be attached. These two lipoproteins may be important in cell duplication.

9.
We have developed a new method for the identification of signal peptides and their cleavage sites based on neural networks trained on separate sets of prokaryotic and eukaryotic sequences. The method performs significantly better than previous prediction schemes, and can easily be applied to genome-wide data sets. Discrimination between cleaved signal peptides and uncleaved N-terminal signal-anchor sequences is also possible, though with lower precision. Predictions can be made on a publicly available WWW server: http://www.cbs.dtu.dk/services/SignalP/.

10.
Tjalsma H, van Dijl JM. Proteomics, 2005, 5(17): 4472-4482
The availability of complete bacterial genome sequences allows proteome-wide predictions of exported proteins that are potentially retained in the cytoplasmic membranes of the corresponding organisms. In practice, however, computer-assisted prediction has major difficulty distinguishing (Sec-type) signal peptides that direct exported proteins into the growth medium from lipoprotein signal peptides or amino-terminal membrane anchors that cause protein retention in the membrane. In the present studies, aimed at improving methods to predict protein retention in the bacterial cytoplasmic membrane, we compared sets of membrane-attached and extracellular proteins of Bacillus subtilis that were recently identified through proteomics approaches. The results showed that three classes of membrane-attached proteins can be distinguished. Two classes comprise 43 lipoproteins and 48 proteins with an amino-terminal transmembrane segment, respectively. Remarkably, a third class comprises 31 proteins that are retained in the membrane despite the presence of typical Sec-type signal peptides with consensus signal peptidase recognition sites. This unprecedented finding indicates that unknown mechanisms are involved in the membrane retention of this class of proteins. A further novelty is a consensus sequence indicative of the release of certain lipoproteins from the membrane by proteolytic shaving. Finally, using non-overlapping sets of secreted and membrane-retained proteins, we assessed the accuracy of different signal peptide prediction algorithms. A majority-vote approach increased the accuracy of predicting protein retention in the membrane to 82%. Our findings provide important leads for the future identification of surface proteins from pathogenic bacteria, which are attractive candidate infection markers and potential targets for drugs or vaccines.
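The majority-vote idea can be sketched as follows: each predictor casts one call per protein, the most frequent call wins, and ties are left undecided. Predictor names, labels, and calls below are hypothetical, and the tie rule is an assumption; the paper's exact voting scheme may differ.

```python
from collections import Counter


def majority_vote(calls: dict[str, str], tie_label: str = "undecided") -> str:
    """Combine per-predictor calls for one protein; ties (or no calls) give `tie_label`."""
    ranked = Counter(calls.values()).most_common()
    if not ranked or (len(ranked) > 1 and ranked[0][1] == ranked[1][1]):
        return tie_label
    return ranked[0][0]


def accuracy(consensus: dict[str, str], truth: dict[str, str]) -> float:
    """Fraction of proteins whose consensus call matches the reference label."""
    return sum(consensus[pid] == label for pid, label in truth.items()) / len(truth)


# Hypothetical predictor names and calls for three proteins.
per_protein = {
    "prot1": {"tool_a": "membrane", "tool_b": "membrane", "tool_c": "secreted"},
    "prot2": {"tool_a": "secreted", "tool_b": "secreted", "tool_c": "secreted"},
    "prot3": {"tool_a": "secreted", "tool_b": "membrane", "tool_c": "membrane"},
}
truth = {"prot1": "membrane", "prot2": "secreted", "prot3": "secreted"}

consensus = {pid: majority_vote(calls) for pid, calls in per_protein.items()}
print(consensus)
print(accuracy(consensus, truth))  # 2 of 3 proteins called correctly
```

Using an odd number of predictors avoids ties for two-class calls.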

11.
Escherichia coli contains several lipoproteins in addition to the major outer membrane lipoprotein (Ichihara, S., Hussain, M., and Mizushima, S. (1981) J. Biol. Chem. 256, 3125-3129). We cloned the gene for one of these new lipoproteins using a synthetic 15-mer oligonucleotide probe identical to the DNA sequence at the signal peptide cleavage site of the major lipoprotein. The DNA sequence of the cloned gene revealed an open reading frame encoding a 272-amino acid protein with a signal peptide of 23 amino acid residues. The amino acid sequence of the putative cleavage site region of the signal peptide, -Leu-Leu-Ala-Gly-Cys-, is identical to that of the major lipoprotein. When the cloned gene was expressed in E. coli, a gene product with an apparent molecular weight of approximately 29,000 was identified, in good agreement with the calculated molecular weight (27,800). The product was labeled with [3H]glycerol, and a precursor of higher molecular weight accumulated when cells were treated with globomycin, a specific inhibitor of prolipoprotein signal peptidase. We therefore designated the gene product lipoprotein-28. Unlike the major lipoprotein, lipoprotein-28 was found to be localized in the cytoplasmic membrane. A possible orientation of lipoprotein-28 in the E. coli envelope is discussed.

12.
Signal-3L: A 3-layer approach for predicting signal peptides   Times cited: 3; self-citations: 0; cited by others: 3
Functioning as an "address tag" that directs nascent proteins to their proper cellular and extracellular locations, signal peptides have become a crucial tool in finding new drugs or reprogramming cells for gene therapy. To use such a tool effectively and in a timely manner, however, one first needs an automated method for rapidly and accurately identifying the signal peptide of a given nascent protein. With the avalanche of new protein sequences generated in the post-genomic era, this challenge has become even more urgent. In this paper, we have developed a novel method for predicting signal peptide sequences and their cleavage sites in human, plant, animal, eukaryotic, Gram-positive, and Gram-negative protein sequences. The new predictor, called Signal-3L, consists of three prediction engines working on three progressively deeper layers: (1) identifying a query protein as secretory or non-secretory with an ensemble classifier formed by fusing many individual OET-KNN (optimized evidence-theoretic K nearest neighbor) classifiers operating in PseAA (pseudo amino acid) composition spaces of various dimensions; (2) selecting a set of candidate signal peptide cleavage sites for a query secretory protein with a subsite-coupled discrimination algorithm; and (3) determining the final cleavage site by fusing the global sequence alignment outcomes for each of these candidates through a voting system. Signal-3L achieves high prediction success rates with short computation time, and is therefore particularly useful for the analysis of large-scale datasets.
Signal-3L is freely available as a web server at http://chou.med.harvard.edu/bioinf/Signal-3L/ or http://202.120.37.186/bioinf/Signal-3L. To further support related research, the site also provides a downloadable file of the signal peptides identified by Signal-3L for all Swiss-Prot entries that lack signal peptide annotations, or are annotated with uncertain terms, but are classified by Signal-3L as secretory. This large-scale file, prepared with Microsoft Excel and named "Tab-Signal-3L.xls", will be updated once a year to include new protein entries and to reflect the continuing development of Signal-3L.
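Layer 1 of Signal-3L represents proteins by pseudo amino acid (PseAA) composition. Below is a minimal sketch of the commonly cited type-1 formulation: the 20 residue frequencies are extended with `lam` sequence-order correlation factors τ_j (the mean, over residue pairs j positions apart, of the averaged squared difference of standardized physicochemical properties), with weight `w`, and the whole vector is normalized to sum to 1. The property scale in the example is a hypothetical toy; real use needs published scales (for example hydrophobicity, hydrophilicity and side-chain mass), and the OET-KNN ensemble is not shown.

```python
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def standardize(scale: dict[str, float]) -> dict[str, float]:
    """Rescale a residue property to zero mean and unit variance over the 20 residues."""
    values = [scale[aa] for aa in AMINO_ACIDS]
    mu, sd = mean(values), pstdev(values)
    return {aa: (scale[aa] - mu) / sd for aa in AMINO_ACIDS}


def pseaa(seq: str, scales: list[dict[str, float]], lam: int = 3, w: float = 0.05) -> list[float]:
    """Type-1 PseAA vector: 20 composition terms followed by `lam` sequence-order terms.
    Assumes `seq` contains only the 20 standard residues."""
    if not 0 <= lam < len(seq):
        raise ValueError("lam must satisfy 0 <= lam < len(seq)")
    std = [standardize(s) for s in scales]

    def theta(a: str, b: str) -> float:
        # Averaged squared difference of the standardized properties of two residues.
        return mean((s[b] - s[a]) ** 2 for s in std)

    freqs = [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]
    # tau_j: mean coupling between residues j positions apart (j = 1..lam).
    taus = [mean(theta(seq[i], seq[i + j]) for i in range(len(seq) - j))
            for j in range(1, lam + 1)]
    denom = sum(freqs) + w * sum(taus)
    return [f / denom for f in freqs] + [w * t / denom for t in taus]


# Hypothetical toy property scale (NOT a real physicochemical scale), for illustration only.
toy_scale = {aa: float(i) for i, aa in enumerate(AMINO_ACIDS)}
vec = pseaa("MKKLLLLAAGCSSNAK", [toy_scale], lam=4)
print(len(vec), round(sum(vec), 6))  # 24 1.0
```

Varying `lam` changes the vector dimension (20 + lam), which is one plausible reading of "PseAA composition spaces of various dimensions".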

13.
MOTIVATION: Motif detection is an important component of the classification and annotation of protein sequences. A method for aligning motifs with an amino acid sequence is introduced. The motifs can be described by the secondary (e.g. functional, biophysical) characteristics of a signal or pattern to be detected. The results produced are based on the statistical relevance of the alignment. The method was designed to avoid the problems of over-fitting, biological interpretation and mathematical soundness encountered in other currently available methods. RESULTS: The method was tested on lipoprotein signals in B. subtilis and yielded stable results. The signal predictions were consistent with those of other methods where literature was available. AVAILABILITY: An implementation of the motif alignment, refining and bootstrapping is available for public use online at http://www.expasy.org/tools/patoseq/

14.
Proteinases play critical roles in both intra- and extracellular processes by binding and cleaving their protein substrates. The cleavage can be either non-specific, as part of degradation during protein catabolism, or highly specific, as part of proteolytic cascades and signal transduction events. Identifying these targets is extremely challenging. Current computational approaches for predicting cleavage sites are very limited because they mainly represent amino acid sequences as patterns or frequency matrices. In this work, we developed a novel predictor based on the Random Forest (RF) algorithm, using the maximum relevance minimum redundancy (mRMR) method followed by incremental feature selection (IFS). Features describing physicochemical/biochemical properties, sequence conservation, residual disorder, amino acid occurrence frequency, secondary structure and solvent accessibility were used to represent the peptides concerned. We compared our method with existing tools for predicting possible cleavage sites in candidate substrates and found that it makes much more reliable predictions in terms of overall prediction accuracy. In addition, the predictor can be applied to a wide range of proteinases.
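The mRMR step can be illustrated with a greedy ranking on discrete features: at each step, the feature with the highest mutual information with the class, minus its mean mutual information with the features already selected, is added. This is a sketch of the difference form of mRMR on hypothetical toy data, assuming discretized features. In IFS, the ranked list would then be evaluated by training the classifier (a Random Forest in the paper) on the top 1, top 2, ... features and keeping the best-performing prefix; that step is not shown.

```python
from collections import Counter
from math import log2


def mutual_information(x: list, y: list) -> float:
    """Mutual information (bits) between two discrete variables from paired samples."""
    n = len(x)
    joint, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum(c / n * log2(c * n / (px[a] * py[b])) for (a, b), c in joint.items())


def mrmr_score(name: str, features: dict[str, list], relevance: dict[str, float],
               selected: list[str]) -> float:
    """Relevance to the target minus mean redundancy with the already selected features."""
    if not selected:
        return relevance[name]
    redundancy = sum(mutual_information(features[name], features[s]) for s in selected)
    return relevance[name] - redundancy / len(selected)


def mrmr_rank(features: dict[str, list], target: list, k: int) -> list[str]:
    """Greedily select up to k feature names in mRMR order."""
    relevance = {name: mutual_information(col, target) for name, col in features.items()}
    selected: list[str] = []
    remaining = list(features)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda name: mrmr_score(name, features, relevance, selected))
        selected.append(best)
        remaining.remove(best)
    return selected


target = [0, 0, 0, 0, 1, 1, 1, 1]
features = {  # hypothetical toy features
    "f1": [0, 0, 0, 1, 1, 1, 1, 1],  # informative about the target
    "f2": [0, 0, 0, 1, 1, 1, 1, 1],  # exact copy of f1: relevant but fully redundant
    "f3": [0, 0, 1, 1, 0, 0, 1, 1],  # carries no information about the target
}
print(mrmr_rank(features, target, k=3))  # ['f1', 'f3', 'f2']
```

A pure relevance ranking would place the duplicate f2 second; the redundancy penalty pushes it to the end.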

15.
The membrane-bound 5'-nucleotidase of Vibrio parahaemolyticus is unique in requiring Cl− for activity. We cloned and sequenced the nutA gene encoding this 5'-nucleotidase. It contains an open reading frame of 1,680 nucleotides capable of encoding a protein of 560 amino acid residues. The first 21 residues at the N terminus appear to form a signal peptide. The rest of the polypeptide (539 residues) is hydrophilic, with a calculated molecular weight of 60,008, in good agreement with the value of 63 kDa determined by sodium dodecyl sulfate-polyacrylamide gel electrophoresis for the 5'-nucleotidase derived from the cloned nutA gene. We attempted to determine the N-terminal amino acid sequence of the purified enzyme, but the N-terminal residue appeared to be blocked. Because this 5'-nucleotidase can be solubilized from membrane vesicles with detergent, it may be a lipoprotein. The amino acid sequence around the possible cleavage site of the 5'-nucleotidase is homologous to the cleavage site sequences of the lipoproteins of Escherichia coli and other bacteria. The amino acid sequence shows high (about 60%) homology with the sequence of the periplasmic 5'-nucleotidase of E. coli (uridine diphosphate sugar hydrolase, the product of the ushA gene). It also contains regions showing some homology with the nucleotide-binding sites of many nucleotide-binding proteins.

16.
17.
18.
MOTIVATION: Data representation and encoding are essential for the classification of protein sequences with artificial neural networks (ANN). Biophysical properties are appropriate for low-dimensional encoding of protein sequence data. In general, however, there is no a priori knowledge of which properties are relevant for extracting representative features. RESULTS: An adaptive encoding artificial neural network (ACN) for the recognition of sequence patterns is described. In this approach, the parameters for sequence encoding are optimized by an evolutionary algorithm within the same process as the weight vectors. The method is applied to the prediction of signal peptide cleavage sites in human secretory proteins and compared with an established signal peptide predictor. CONCLUSION: Knowledge of physico-chemical properties is not necessary for training an ACN. The advantages are a low-dimensional data representation, which leads to computational efficiency, easy evaluation of the detected features, and high prediction accuracy. Availability: A cleavage site prediction server is located at Humboldt University: http://itb.biologie.hu-berlin.de/~jo/sig-cleave/ACNpredictor.cgi Contact: jo@itb.hu-berlin.de; berndj@zedat.fu-berlin.de

19.
The nucleotide sequence of the tcpC gene has been determined. It encodes a 53,995-Da protein precursor with a signal sequence and cleavage site typical of a number of outer membrane lipoproteins, which are cleaved by the equivalent of signal peptidase II (Lsp) of Escherichia coli. The tcpC gene is positioned so that it is predicted to be translationally coupled to its 5' and 3' flanking genes, tcpY and tcpD, respectively, indicating that it forms part of an operon. Given the lipoprotein signal sequence and several hydrophobic domains, TcpC is likely to be a surface-anchored trans-outer membrane lipoprotein.

20.
SPEPlip: the detection of signal peptide and lipoprotein cleavage sites   Times cited: 2; self-citations: 0; cited by others: 2
SUMMARY: SPEPlip is a neural network-based method, trained and tested on a set of experimentally derived signal peptides from eukaryotes and prokaryotes. SPEPlip identifies the presence of sorting signals and predicts their cleavage sites. Its cross-validated accuracy is similar to that of other available programs: the false-positive rate is 4% for prokaryotes and 6% for eukaryotes, and the false-negative rate is 3% in both cases. When applied to a set of 409 prokaryotic lipoproteins, SPEPlip assigns 97% of the chains to the signal peptide class. However, by integrating SPEPlip with a regular expression search utility based on the PROSITE pattern, we can successfully discriminate signal peptide-containing chains from lipoproteins. We therefore propose the combined method for detecting signal peptide-containing chains and lipoproteins and discriminating between them. AVAILABILITY: It can be accessed through the web page at http://gpcr.biocomp.unibo.it/predictors/
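The integration step can be sketched with a small converter from basic PROSITE pattern syntax to a Python regular expression, plus a hypothetical post-processing rule that relabels a chain called a signal peptide by the network when its N-terminal region matches a lipoprotein pattern. The example pattern is the Leu-(Ala, Ser)-(Gly, Ala)-Cys consensus from entry 7 written in PROSITE syntax, not necessarily the PROSITE entry used by SPEPlip; the label names, the 40-residue window, and the test sequences are assumptions.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a basic PROSITE pattern to a Python regular expression.

    Handles x, [..] and {..} classes, repeats such as (3) or (2,4), and < / > anchors
    on whole elements; other PROSITE syntax (e.g. '>' inside brackets) is not handled.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]
        if element.endswith(">"):
            suffix, element = "$", element[:-1]
        core, low, high = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element).groups()
        if core == "x":
            regex = "."
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"
        else:
            regex = core  # single residue or [..] class: same meaning in Python's re
        if low:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + regex + suffix)
    return "".join(parts)


def relabel_lipoproteins(seq: str, nn_label: str, lipo_pattern: str, window: int = 40) -> str:
    """Hypothetical rule: a 'signal_peptide' call becomes 'lipoprotein' if the N-terminal
    window matches the lipoprotein pattern; other calls are returned unchanged."""
    if nn_label == "signal_peptide" and re.search(prosite_to_regex(lipo_pattern), seq[:window]):
        return "lipoprotein"
    return nn_label


print(prosite_to_regex("L-[AS]-[GA]-C."))    # L[AS][GA]C
print(prosite_to_regex("<M-x(2,4)-{P}-C"))   # ^M.{2,4}[^P]C
print(relabel_lipoproteins("MKKTLIALAVLLSGLLAGCSSNQ", "signal_peptide", "L-[AS]-[GA]-C."))
```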
