Similar articles
Found 20 similar articles (search time: 86 ms)
1.
A new method for predicting signal sequence cleavage sites.   Total citations: 655 (self-citations: 20; citations by others: 635)
A new method for identifying secretory signal sequences and for predicting the site of cleavage between a signal sequence and the mature exported protein is described. The predictive accuracy is estimated to be around 75-80% for both prokaryotic and eukaryotic proteins.

2.
We have developed a new method for the identification of signal peptides and their cleavage sites based on neural networks trained on separate sets of prokaryotic and eukaryotic sequences. The method performs significantly better than previous prediction schemes, and can easily be applied to genome-wide data sets. Discrimination between cleaved signal peptides and uncleaved N-terminal signal-anchor sequences is also possible, though with lower precision. Predictions can be made on a publicly available WWW server: http://www.cbs.dtu.dk/services/SignalP/.

3.
Gram-positive bacteria have been widely investigated for their great capacity to secrete proteins, such as those involved in gene expression, bacterial surface display and bacterial pathogenesis. The N-terminal signal peptide of a secretory protein is responsible for the translocation of the polypeptide through the cytoplasmic membrane. Recently, signal peptide prediction has become a major task in bioinformatics, and many programs with different algorithms have been developed to predict signal peptides. In this paper, five prediction programs (SignalP 3.0, PrediSi, Phobius, SOSUIsignal and SIG-Pred) were selected to evaluate their prediction accuracy for signal peptides and cleavage sites using 509 unbiased and experimentally verified Gram-positive protein sequences. The results showed that SignalP was the most accurate program in signal peptide (96% accuracy) and cleavage site (83%) prediction. Prediction performance could be further improved by combining multiple methods into a consensus prediction, which would increase the accuracy to 98% and decrease the false positives to zero. When the consensus method was used to predict Bacillus extracellular proteins identified by proteomics, more new signal peptides were successfully identified. It could be concluded that the consensus method would be useful for making the prediction of signal peptides more reliable.
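
The abstract above does not say how the five programs were combined, so the following is only a minimal sketch of one plausible consensus rule: a strict majority vote over per-program yes/no calls. The function name `consensus_call`, the `min_agree` parameter and the example calls are hypothetical and are not taken from the paper.

```python
def consensus_call(calls, min_agree=None):
    """Combine yes/no signal-peptide calls from several predictors.

    calls: dict mapping a predictor name to True (signal peptide predicted)
           or False (no signal peptide predicted).
    min_agree: number of positive calls required; defaults to a strict
               majority. This threshold is an assumption, not the rule
               used in the paper.
    """
    votes = list(calls.values())
    if min_agree is None:
        min_agree = len(votes) // 2 + 1  # strict majority
    return sum(votes) >= min_agree


# Hypothetical calls for one protein from the five programs named above.
example = {
    "SignalP": True,
    "PrediSi": True,
    "Phobius": False,
    "SOSUIsignal": True,
    "SIG-Pred": False,
}
print(consensus_call(example))
```

Raising `min_agree` trades sensitivity for fewer false positives, which is the direction of the effect the abstract reports for consensus prediction.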

4.
Presecretory signal peptides of 39 proteins from diverse prokaryotic and eukaryotic sources have been compared. Although varying in length and amino acid composition, the labile peptides share a hydrophobic core of approximately 12 amino acids. A positively charged residue (Lys or Arg) usually precedes the hydrophobic core. Core termination is defined by the occurrence of a charged residue, a sequence of residues which may induce a beta-turn in a polypeptide, or an interruption in potential alpha-helix or beta-extended strand structure. The hydrophobic cores contain, by weight average, 37% Leu: 15% Ala: 10% Val: 10% Phe: 7% Ile plus 21% other hydrophobic amino acids arranged in a non-random sequence. Following the hydrophobic cores (aligned by their last residue) a highly non-random and localized distribution of Ala is apparent within the initial eight positions following the core: (formula; see text) Coincident with this observation, Ala-X-Ala is the most frequent sequence preceding signal peptidase cleavage. We propose the existence of a signal peptidase recognition sequence A-X-B with the preferred cleavage site located after the sixth amino acid following the core sequence. Twenty-two of the above 27 underlined Ala residues would participate as A or B in peptidase cleavage. Position A includes the larger aliphatic amino acids, Leu, Val and Ile, as well as the residues already found at B (principally Ala, Gly and Ser). Since a preferred cleavage site can be discerned from carboxyl and not amino terminal alignment of the hydrophobic cores it is proposed that the carboxyl ends are oriented inward toward the lumen of the endoplasmic reticulum where cleavage is thought to occur. This orientation coupled with the predicted beta-turn typically found between the core and the cleavage site implies reverse hairpin insertion of the signal sequence. 
The structural features which we describe should help identify signal peptides and cleavage sites in presumptive amino acid sequences derived from DNA sequences.
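
The A-X-B recognition sequence proposed above can be illustrated with a short scan. This is a sketch under stated assumptions: position A accepts Leu, Val, Ile, Ala, Gly or Ser, position B accepts Ala, Gly or Ser (the abstract says "principally", so the real sets may be wider), and cleavage is taken to occur immediately after B. The function name `axb_candidates` and the toy sequence are hypothetical.

```python
# Residue sets read off the abstract; "principally Ala, Gly and Ser" means
# these sets are an approximation, not an exhaustive definition.
A_RESIDUES = set("LVIAGS")
B_RESIDUES = set("AGS")


def axb_candidates(seq):
    """Return 0-based indices at which the mature protein could start.

    Index i is reported when the three residues before it form A-X-B,
    i.e. seq[i-3] is an A residue and seq[i-1] is a B residue.
    """
    seq = seq.upper()
    return [
        i
        for i in range(3, len(seq))
        if seq[i - 3] in A_RESIDUES and seq[i - 1] in B_RESIDUES
    ]


# Toy sequence: basic residue, hydrophobic core, then Ala-Gln-Ala.
print(axb_candidates("MKLLLLLLAQADE"))  # [9, 11]
```

The motif alone yields several candidates, so it works as a filter; the abstract additionally places the preferred site after the sixth residue following the hydrophobic core, which this sketch does not model.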

5.
Given a raw protein sequence, knowing its subcellular location is an important step toward understanding its function and designing further experiments. A novel method is proposed for the prediction of protein subcellular locations from sequences. For four categories of eukaryotic proteins the overall predictive accuracy is 82.0%, 2.6% higher than that obtained using an SVM approach. For three subcellular locations of prokaryotic proteins, an overall accuracy of 89.9% is obtained. In accordance with the architecture of cells, a hierarchical prediction approach is designed. Based on amino acid composition, extracellular proteins and intracellular proteins can be identified with an accuracy of 97%.

6.
Neural networks have been trained to predict the subcellular location of proteins in prokaryotic or eukaryotic cells from their amino acid composition. For three possible subcellular locations in prokaryotic organisms a prediction accuracy of 81% can be achieved. Assigning a reliability index, 33% of the predictions can be made with an accuracy of 91%. For eukaryotic proteins (excluding plant sequences) an overall prediction accuracy of 66% for four locations was achieved, with 33% of the sequences being predicted with an accuracy of 82% or better. With the subcellular location restricting a protein's possible function, this method should be a useful tool for the systematic analysis of genome data and is available via a server on the world wide web.
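
Several of the methods listed here take amino acid composition as input. A minimal sketch of that feature vector follows: the fraction of each of the 20 standard amino acids in a sequence. The ordering of `AMINO_ACIDS` and the decision to ignore non-standard letters are assumptions made for illustration; the abstracts do not specify either.

```python
# The 20 standard amino acids in alphabetical one-letter order (assumed order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq):
    """Return a 20-element list of residue fractions for a protein sequence.

    Letters outside the 20 standard amino acids (for example X or B) are
    skipped, so the fractions sum to 1 whenever at least one standard
    residue is present.
    """
    residues = [c for c in seq.upper() if c in AMINO_ACIDS]
    n = len(residues)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / n for a in AMINO_ACIDS]


vector = aa_composition("MKKLLAAG")
print(dict(zip(AMINO_ACIDS, vector))["K"])  # 0.25
```

A vector of this kind is what a neural network or SVM classifier would receive as its input features.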

7.
Signal peptides and transmembrane helices both contain a stretch of hydrophobic amino acids. This common feature makes it difficult for signal peptide and transmembrane helix predictors to correctly assign identity to stretches of hydrophobic residues near the N-terminal methionine of a protein sequence. The inability to reliably distinguish between N-terminal transmembrane helix and signal peptide is an error with serious consequences for the prediction of protein secretory status or transmembrane topology. In this study, we report a new method for differentiating protein N-terminal signal peptides and transmembrane helices. Based on the sequence features extracted from hydrophobic regions (amino acid frequency, hydrophobicity, and the start position), we set up discriminant functions and examined them on non-redundant datasets with jackknife tests. This method can incorporate other signal peptide prediction methods and achieve higher prediction accuracy. For Gram-negative bacterial proteins, 95.7% of N-terminal signal peptides and transmembrane helices can be correctly predicted (coefficient 0.90). Given a sensitivity of 90%, transmembrane helices can be identified from signal peptides with a precision of 99% (coefficient 0.92). For eukaryotic proteins, 94.2% of N-terminal signal peptides and transmembrane helices can be correctly predicted with coefficient 0.83. Given a sensitivity of 90%, transmembrane helices can be identified from signal peptides with a precision of 87% (coefficient 0.85). The method can be used to complement current transmembrane protein prediction and signal peptide prediction methods to improve their prediction accuracies.

8.
A number of computational tools are available for detecting signal peptides, but their abilities to locate the signal peptide cleavage sites vary significantly and are often less than satisfactory. We characterized a set of 270 secreted recombinant human proteins by automated Edman analysis and used the verified cleavage sites to evaluate the success rate of a number of computational prediction programs. An examination of the frequency of amino acids in the N-terminal region of the data set showed a preference for proline and glutamine but a bias against tyrosine. The data set was compared to the SWISS-PROT database and revealed a high percentage of discrepancies with cleavage site annotations that were computationally generated. The best program for predicting signal sequences was found to be SignalP 2.0-NN with an accuracy of 78.1% for cleavage site recognition. The new data set can be utilized for refining prediction algorithms, and we have built an improved version of a profile hidden Markov model for signal peptides based on the new data.

9.
MOTIVATION: Data representation and encoding are essential for classification of protein sequences with artificial neural networks (ANN). Biophysical properties are appropriate for low dimensional encoding of protein sequence data. However, in general there is no a priori knowledge of the relevant properties for extraction of representative features. RESULTS: An adaptive encoding artificial neural network (ACN) for recognition of sequence patterns is described. In this approach parameters for sequence encoding are optimized within the same process as the weight vectors by an evolutionary algorithm. The method is applied to the prediction of signal peptide cleavage sites in human secretory proteins and compared with an established predictor for signal peptides. CONCLUSION: Knowledge of physico-chemical properties is not necessary for training an ACN. The advantage is a low dimensional data representation leading to computational efficiency, easy evaluation of the detected features, and high prediction accuracy. AVAILABILITY: A cleavage site prediction server is located at the Humboldt University: http://itb.biologie.hu-berlin.de/~jo/sig-cleave/ACNpredictor.cgi CONTACT: jo@itb.hu-berlin.de; berndj@zedat.fu-berlin.de

10.
Signal-3L: A 3-layer approach for predicting signal peptides   Total citations: 3 (self-citations: 0; citations by others: 3)
Functioning as an "address tag" that directs nascent proteins to their proper cellular and extracellular locations, signal peptides have become a crucial tool in finding new drugs or reprogramming cells for gene therapy. To effectively and timely use such a tool, however, the first important thing is to develop an automated method for rapidly and accurately identifying the signal peptide for a given nascent protein. With the avalanche of new protein sequences generated in the post-genomic era, the challenge has become even more urgent and critical. In this paper, we have developed a novel method for predicting signal peptide sequences and their cleavage sites in human, plant, animal, eukaryotic, Gram-positive, and Gram-negative protein sequences, respectively. The new predictor is called Signal-3L that consists of three prediction engines working, respectively, for the following three progressively deepening layers: (1) identifying a query protein as secretory or non-secretory by an ensemble classifier formed by fusing many individual OET-KNN (optimized evidence-theoretic K nearest neighbor) classifiers operated in various dimensions of PseAA (pseudo amino acid) composition spaces; (2) selecting a set of candidates for the possible signal peptide cleavage sites of a query secretory protein by a subsite-coupled discrimination algorithm; (3) determining the final cleavage site by fusing the global sequence alignment outcome for each of the aforementioned candidates through a voting system. Signal-3L is featured by high success prediction rates with short computational time, and hence is particularly useful for the analysis of large-scale datasets. 
Signal-3L is freely available as a web-server at http://chou.med.harvard.edu/bioinf/Signal-3L/ or http://202.120.37.186/bioinf/Signal-3L, where, to further support the demand of the related areas, the signal peptides identified by Signal-3L for all the protein entries in Swiss-Prot databank that do not have signal peptide annotations or are annotated with uncertain terms but are classified by Signal-3L as secretory proteins are provided in a downloadable file. The large-scale file is prepared with Microsoft Excel and named "Tab-Signal-3L.xls", and will be updated once a year to include new protein entries and reflect the continuous development of Signal-3L.

11.
We have developed an automated method for predicting signal peptide sequences and their cleavage sites in eukaryotic and bacterial protein sequences. It is a 2-layer predictor: the 1st-layer prediction engine is to identify a query protein as secretory or non-secretory; if it is secretory, the process will be automatically continued with the 2nd-layer prediction engine to further identify the cleavage site of its signal peptide. The new predictor is called Signal-CF, where C stands for "coupling" and F for "fusion", meaning that Signal-CF is formed by incorporating the subsite coupling effects along a protein sequence and by fusing the results derived from many width-different scaled windows through a voting system. Signal-CF is featured by high success prediction rates with short computational time, and hence is particularly useful for the analysis of large-scale datasets. Signal-CF is freely available as a web-server at http://chou.med.harvard.edu/bioinf/Signal-CF/ or http://202.120.37.186/bioinf/Signal-CF/.

12.
Signal peptides are short peptides located at the N-terminus of secreted proteins. They characteristically have three domains: a basic region at the N-terminus (n-region), a central hydrophobic core (h-region) and a carboxy-terminal cleavage region (c-region). Although hundreds of different signal peptides have been identified, it has not been completely understood how their features enable signal peptides to influence protein expression. Antibody-derived signal peptides are often used to prepare recombinant antibodies expressed by eukaryotic cells, especially Chinese hamster ovary (CHO) cells. However, when prokaryotic Escherichia coli (E. coli) are utilized in drug discovery processes, such as for phage display selection or antibody humanization, signal peptides have been selected separately due to the differences in the expression systems between the species. In this study, we successfully established a signal peptide that enables a functional antibody to be expressed in both prokaryotic and eukaryotic cells by focusing on the importance of having an Ala residue in the c-region of the signal sequence. We found that changing Ser to Ala at only two positions significantly augmented the anti-HER2 antigen binding fragment (Fab) expression in E. coli. In addition, this altered signal peptide also retained the ability to express functional anti-HER2 antibody in CHO cells. Taken together, the present findings indicate that the signal peptide can promote functional antibody expression in both prokaryotic E. coli and eukaryotic CHO cells. This finding will contribute to the understanding of signal peptides and accelerate therapeutic antibody research.

13.
MOTIVATION: Subcellular localization is a key functional characteristic of proteins. A fully automatic and reliable prediction system for protein subcellular localization is needed, especially for the analysis of large-scale genome sequences. RESULTS: In this paper, Support Vector Machine has been introduced to predict the subcellular localization of proteins from their amino acid compositions. The total prediction accuracies reach 91.4% for three subcellular locations in prokaryotic organisms and 79.4% for four locations in eukaryotic organisms. Predictions by our approach are robust to errors in the protein N-terminal sequences. This new approach provides superior prediction performance compared with existing algorithms based on amino acid composition and can be a complementary method to other existing methods based on sorting signals. AVAILABILITY: A web server implementing the prediction method is available at http://www.bioinfo.tsinghua.edu.cn/SubLoc/. SUPPLEMENTARY INFORMATION: Supplementary material is available at http://www.bioinfo.tsinghua.edu.cn/SubLoc/.

14.
Wang M, Yang J, Chou KC. Amino Acids, 2005, 28(4): 395-402
Summary. Owing to the importance of signal peptides for studying the molecular mechanisms of genetic diseases, reprogramming cells for gene therapy, and finding new drugs for healing a specific defect, there is great demand for a fast and accurate method to identify signal peptides. Introduction of the so-called {−3,−1, +1} coupling model (Chou, K. C.: Protein Engineering, 2001, 14–2, 75–79) has made it possible to take into account the coupling effect among some key subsites and hence can significantly enhance the prediction quality of the peptide cleavage site. Based on the subsite coupling model, a kind of string kernel for protein sequences is introduced. Because they integrate biologically relevant prior knowledge, the constructed string kernels can be used by any kernel-based method. A support vector machine (SVM) is thus built to predict the cleavage site of signal peptides from the protein sequences. The current approach is compared with the classical weight matrix method. At small false positive ratios, our method outperforms the classical weight matrix method, indicating that the current approach may at least serve as a powerful complementary tool to other existing methods for predicting the signal peptide cleavage site. The software that generated the results reported in this paper is available upon request, and will appear at http://www.pami.sjtu.edu.cn/wm. An erratum to this article is available at .

15.
As the knowledge of protein signal peptides can be used to reprogram cells in a desired way for gene therapy, signal peptides have become a crucial tool for researchers to design new drugs for targeting a particular organelle to correct a specific defect. To effectively use such a technique, however, we have to develop an automated method for rapidly and accurately predicting signal peptides and their cleavage sites, particularly in the post-genomic era when the number of protein sequences is increasing explosively. To realize this, the first important thing is to discriminate secretory proteins from non-secretory proteins. On the basis of the Needleman-Wunsch algorithm, we proposed a new alignment kernel function. The novel approach can be effectively used to extract the statistical properties of protein sequences for machine learning, leading to a higher prediction success rate.
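
The abstract above builds its kernel on the Needleman-Wunsch algorithm but does not give the kernel itself. As a sketch of the underlying step only, the function below computes the Needleman-Wunsch global alignment score by dynamic programming. The simple match/mismatch/gap scoring and the name `needleman_wunsch_score` are assumptions; a protein application would more likely use a substitution matrix, and turning the score into a valid kernel takes further work that is not shown here.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of sequences a and b.

    Uses a linear gap penalty and keeps only two rows of the dynamic
    programming table, since only the final score is needed.
    """
    # Row 0: aligning the empty prefix of a against prefixes of b costs gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of a against the empty string
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            gap_in_b = prev[j] + gap      # a[i-1] aligned to a gap
            gap_in_a = curr[j - 1] + gap  # b[j-1] aligned to a gap
            curr.append(max(diagonal, gap_in_b, gap_in_a))
        prev = curr
    return prev[-1]


print(needleman_wunsch_score("MKKLLA", "MKKLLA"))  # 6
print(needleman_wunsch_score("ACG", "AG"))         # 0
```

Pairwise scores of this kind can be collected into a similarity matrix over a set of sequences, which is the form of input a kernel-based learner expects.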

16.
Information on protein subcellular location plays an important role in molecular cell biology. Prediction of the subcellular location of proteins will help to understand their functions and interactions. In this paper, a different mode of pseudo amino acid composition was proposed to represent protein samples for predicting their subcellular localization via the following procedure: based on the optimal splice site of each protein sequence, we divided a sequence into a sorting signal part and a mature protein part, and extracted sequence features from each part separately. Then, the combined features were fed into the SVM classifier to perform the prediction. By the jackknife test on a benchmark dataset in which none of the proteins included has more than 90% pairwise sequence identity to any other, the overall accuracies achieved by the method are 94.5% and 90.3% for prokaryotic and eukaryotic proteins, respectively. The results indicate that the prediction quality of our method is quite satisfactory. It is anticipated that the current method may serve as an alternative approach to the existing prediction methods.

17.
Many secreted and membrane proteins have amino-terminal leader peptides which are essential for their insertion across the membrane bilayer. These precursor proteins, whether from prokaryotic or eukaryotic sources, can be processed to their mature forms in vitro by bacterial leader peptidase. While different leader peptides have shared features, they do not share a unique sequence at the cleavage site. To examine the requirements for substrate recognition by leader peptidase, we have truncated M13 procoat, a membrane protein precursor, from both the amino- and carboxy-terminal ends with specific proteases or chemical cleavage agents. The fragments isolated from these reactions were assayed as substrates for leader peptidase. A 16 amino acid residue peptide which spans the leader peptidase cleavage site is accurately cleaved. Neither the basic amino-terminal region nor most of the hydrophobic central region of the leader peptide is essential for accurate cleavage.

18.
Li Y, Wen Z, Zhou C, Tan F, Li M. Peptides, 2008, 29(9): 1498-1504
The signal peptide has a pivotal role in the translocation of secretory proteins. Some models have been designed to predict its cleavage site. It is reported that the cleavage site is related to the neighboring sequence environment, i.e., the hydrophobic core (h-region) and the specific patterns in the c-region. In some studies, this finding does facilitate the prediction of the cleavage site. However, in these models, sequence environment information is used only as a model input, and no detailed investigation into its effect on the prediction of the cleavage site has been made. In this work, we analyze the constraint placed on the cleavage site by the hydrophobic core of the signal peptide and then use it to improve the performance of signal peptide cleavage site prediction. Our model is designed as follows: firstly, a sliding window is used to scan each sample and an artificial neural network (ANN) is employed to give cleavage site/non-cleavage site scores. Then, based on an estimated hydrophobic h-region, a correcting function is proposed to improve the prediction result, in which the sequence environment is taken into account. A trend of cleavage site is indicated by our analysis for each position, which is consistent with experimental findings. Through this correcting step, the improvement in prediction accuracy is over 7%. It therefore demonstrates that the neighboring sequence environment is helpful for determination of the cleavage site. The program, written in Matlab, can be downloaded from http://www.scucic.cn/combined model/source code.html.
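
The first stage described above, scanning a sequence with a sliding window and scoring each window, can be sketched as follows. The window sizes, the padding character and both function names are hypothetical, and `score_fn` stands in for the trained ANN; the paper's network and its h-region correcting function are not reproduced here.

```python
def sliding_windows(seq, left=4, right=2, pad="X"):
    """Return (boundary, window) pairs for every possible cleavage boundary.

    Boundary i means cleavage between seq[i-1] and seq[i]. Each window holds
    `left` residues before the boundary and `right` residues after it, with
    `pad` filling positions that fall outside the sequence.
    """
    padded = pad * left + seq + pad * right
    width = left + right
    return [(i, padded[i:i + width]) for i in range(len(seq) + 1)]


def best_boundary(seq, score_fn, left=4, right=2):
    """Return the boundary whose window gets the highest score from score_fn.

    score_fn is a placeholder for a trained classifier; ties go to the
    earliest boundary because max() keeps the first maximal item.
    """
    return max(sliding_windows(seq, left, right), key=lambda w: score_fn(w[1]))[0]


# Toy scorer (an assumption for illustration): count alanines in the window.
print(sliding_windows("MKAL", left=2, right=1))
print(best_boundary("MKAL", lambda w: w.count("A"), left=2, right=1))  # 2
```

A correction step of the kind the abstract describes would adjust these raw window scores before the maximum is taken, for example by penalizing boundaries that fall inside the estimated h-region.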

19.
There are approximately 10^9 proteins in a cell. A hotspot in bioinformatics is how to identify a protein's subcellular localization, if its sequence is known. In this paper, a method using fast Fourier transform-based support vector machine is developed to predict the subcellular localization of proteins from their physicochemical properties and structural parameters. The prediction accuracies reached 83% in prokaryotic organisms and 84% in eukaryotic organisms with the substitution model of the c-p-v matrix (c, composition; p, polarity; and v, molecular volume). The overall prediction accuracy was also evaluated using the "leave-one-out" jackknife procedure. The influence of the substitution model on prediction accuracy has also been discussed in the work. The source code of the new program is available on request from the authors.

20.
Defective Escherichia coli signal peptides function in yeast   Total citations: 3 (self-citations: 2; citations by others: 1)
To investigate structural characteristics important for eukaryotic signal peptide function in vivo, a hybrid gene with interchangeable signal peptides was cloned into yeast. The hybrid gene encoded nine residues from the amino terminus of the major Escherichia coli lipoprotein, attached to the amino terminus of the entire mature E. coli beta-lactamase sequence. To this sequence were attached sequences encoding the nonmutant E. coli lipoprotein signal peptide, or lipoprotein signal peptide mutants lacking an amino-terminal cationic charge, with shortened hydrophobic core, with altered potential helicity, or with an altered signal-peptide cleavage site. These signal-peptide mutants exhibited altered processing and secretion in E. coli. Using the GAL10 promoter, production of all hybrid proteins was induced to constitute 4-5% of the total yeast protein. Hybrid proteins with mutant signal peptides that show altered processing and secretion in E. coli, were processed and translocated to a similar degree as the non-mutant hybrid protein in yeast (approximately 36% of the total hybrid protein). Both non-mutant and mutant signal peptides appeared to be removed at the same unique site between cysteine 21 and serine 22, one residue from the E. coli signal peptidase II processing site. The mature lipo-beta-lactamase was translocated across the cytoplasmic membrane into the yeast periplasm. Thus the protein secretion apparatus in yeast recognizes the lipoprotein signal sequence in vivo but displays a specificity towards altered signal sequences which differs from that of E. coli.
