Similar Literature
20 similar documents found.
1.
A combined transmembrane topology and signal peptide prediction method   Total citations: 31, self-citations: 0, citations by others: 31
An inherent problem in transmembrane protein topology prediction and signal peptide prediction is the high similarity between the hydrophobic regions of a transmembrane helix and those of a signal peptide, which leads to cross-reaction between the two types of prediction. To improve predictions further, it is therefore important to build a predictor that aims to discriminate between the two classes. In addition, topology information can be gained when a signal peptide leading a transmembrane protein is successfully predicted, since it dictates that the N terminus of the mature protein must be on the non-cytoplasmic side of the membrane. Here, we present Phobius, a combined transmembrane protein topology and signal peptide predictor. The predictor is based on a hidden Markov model (HMM) that models the different sequence regions of a signal peptide and the different regions of a transmembrane protein in a series of interconnected states. Training was done on a newly assembled and curated dataset. Compared to TMHMM and SignalP, errors coming from cross-prediction between transmembrane segments and signal peptides were reduced substantially by Phobius. False classifications of signal peptides were reduced from 26.1% to 3.9% and false classifications of transmembrane helices were reduced from 19.0% to 7.7%. Phobius was applied to the proteomes of Homo sapiens and Escherichia coli. Here we also noted a drastic reduction of false classifications compared to TMHMM/SignalP, suggesting that Phobius is well suited for whole-genome annotation of signal peptides and transmembrane regions. The method is available at as well as at
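
The cross-reaction described above arises because both signal peptides and transmembrane helices appear as hydrophobic stretches. The sketch below is not Phobius (which uses a full HMM); it only illustrates the underlying problem with a plain Kyte-Doolittle sliding window. The window size of 9 and the threshold of 1.6 are illustrative assumptions, not values from the paper.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=9):
    """Mean hydropathy of every full window along the sequence."""
    seq = seq.upper()
    return [
        sum(KD[aa] for aa in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def hydrophobic_segments(seq, window=9, threshold=1.6):
    """Half-open (start, end) spans covered by windows at or above threshold."""
    segments, start = [], None
    profile = hydropathy_profile(seq, window)
    for i, score in enumerate(profile):
        if score >= threshold and start is None:
            start = i  # first window of a hydrophobic run
        elif score < threshold and start is not None:
            segments.append((start, i - 1 + window))  # end of last good window
            start = None
    if start is not None:
        segments.append((start, len(profile) - 1 + window))
    return segments


# A leucine run flanked by charged residues is flagged either way:
# hydropathy alone cannot say whether it is a signal peptide or a TM helix.
print(hydrophobic_segments("D" * 10 + "L" * 20 + "D" * 10))  # [(8, 32)]
```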

2.
Improved prediction of signal peptides: SignalP 3.0   Total citations: 63, self-citations: 0, citations by others: 63
We describe improvements to the currently most popular method for predicting classically secreted proteins, SignalP. SignalP consists of two different predictors based on neural network and hidden Markov model algorithms, and both components have been updated. Motivated by the idea that the cleavage site position and the amino acid composition of the signal peptide are correlated, we included new features as input to the neural network. This addition, combined with a thorough error correction of a new data set, has improved the performance of the predictor significantly over SignalP version 2. In version 3, the correctness of cleavage site predictions has increased notably for all three organism groups: eukaryotes, Gram-negative bacteria and Gram-positive bacteria. The accuracy of cleavage site prediction has increased by 6-17% over the previous version, whereas the improvement in signal peptide discrimination is mainly due to the elimination of false-positive predictions, as well as to the introduction of a new discrimination score for the neural network. The new method has been benchmarked against other available methods. Predictions can be made at the publicly available web server
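
The abstract mentions per-position neural network outputs and a new discrimination score. A minimal sketch, assuming per-position cleavage (C) and signal-region (S) scores are already available: a combined Y-score is taken as the geometric mean of C and the local drop in S, the predicted cleavage site is the maximum of Y, and the discrimination score is the average of the mean S over the predicted signal peptide and the maximum Y. The window `d`, the exact drop formula and the function name `combine_scores` are assumptions for illustration, not SignalP 3.0's published code.

```python
import math


def combine_scores(c_scores, s_scores, d=2):
    """Return (site, y_max, s_mean, d_score) from per-position C and S scores.

    Assumed form: Y_i = sqrt(C_i * max(0, drop in mean S across position i)).
    site is the 0-based index of the first residue of the predicted mature protein.
    """
    y_scores = []
    for i in range(len(c_scores)):
        before = s_scores[max(0, i - d):i]
        after = s_scores[i:i + d]
        if not before or not after:
            y_scores.append(0.0)  # cannot measure an S-score drop at the ends
            continue
        drop = sum(before) / len(before) - sum(after) / len(after)
        y_scores.append(math.sqrt(c_scores[i] * max(0.0, drop)))
    y_max = max(y_scores)
    site = y_scores.index(y_max)
    # Mean S-score over the residues of the predicted signal peptide
    s_mean = sum(s_scores[:site]) / site if site > 0 else 0.0
    d_score = (s_mean + y_max) / 2  # simple average, an assumption here
    return site, y_max, s_mean, d_score


# Hypothetical scores: high S for 20 residues, then low; a C-score peak at 20.
s = [0.9] * 20 + [0.1] * 10
c = [0.0] * 30
c[20] = 0.9
print(combine_scores(c, s))
```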

3.
4.
A number of computational tools are available for detecting signal peptides, but their ability to locate signal peptide cleavage sites varies significantly and is often less than satisfactory. We characterized a set of 270 secreted recombinant human proteins by automated Edman analysis and used the verified cleavage sites to evaluate the success rate of a number of computational prediction programs. An examination of amino acid frequencies in the N-terminal region of the data set showed a preference for proline and glutamine but a bias against tyrosine. Comparison of the data set with the SWISS-PROT database revealed a high percentage of discrepancies with cleavage site annotations that had been computationally generated. The best program for predicting signal sequences was SignalP 2.0-NN, with an accuracy of 78.1% for cleavage site recognition. The new data set can be used to refine prediction algorithms, and we have built an improved profile hidden Markov model for signal peptides based on the new data.
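
Evaluating predictors against Edman-verified sites comes down to counting exact matches. A minimal sketch, assuming predicted and verified sites are stored as dictionaries keyed by hypothetical protein IDs, with each site given as the position of the first residue of the mature protein:

```python
def cleavage_site_accuracy(predicted, verified):
    """Fraction of verified proteins whose predicted cleavage site matches exactly.

    predicted: {protein_id: site or None}; verified: {protein_id: site}.
    Proteins missing from `predicted` count as wrong.
    """
    if not verified:
        raise ValueError("need at least one verified cleavage site")
    correct = sum(1 for pid, site in verified.items() if predicted.get(pid) == site)
    return correct / len(verified)


# Hypothetical IDs and sites
verified = {"p1": 22, "p2": 18, "p3": 25, "p4": 30}
predicted = {"p1": 22, "p2": 19, "p3": 25, "p4": None}
print(cleavage_site_accuracy(predicted, verified))  # 0.5
```

An exact-match criterion is the strictest choice; a tolerance of a few residues could be added if off-by-one predictions should count as partially correct.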

5.
Gram-positive bacteria have been widely investigated for their large capacity to secrete proteins, such as those involved in gene expression, bacterial surface display and bacterial pathogenesis. The N-terminal signal peptide of a secretory protein is responsible for translocating the polypeptide across the cytoplasmic membrane. Signal peptide prediction has recently become a major task in bioinformatics, and many programs based on different algorithms have been developed to predict signal peptides. In this paper, five prediction programs (SignalP 3.0, PrediSi, Phobius, SOSUIsignal and SIG-Pred) were evaluated for their accuracy in predicting signal peptides and cleavage sites, using 509 unbiased and experimentally verified Gram-positive protein sequences. SignalP was the most accurate program for both signal peptide (96% accuracy) and cleavage site (83%) prediction. Prediction performance could be improved further by combining multiple methods into a consensus prediction, which increased the accuracy to 98% and reduced false positives to zero. When the consensus method was used to predict signal peptides in Bacillus extracellular proteins identified by proteomics, additional new signal peptides were successfully identified. We conclude that the consensus method can make signal peptide prediction more reliable.
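
The abstract does not spell out the consensus rule. A minimal sketch assuming a vote threshold: a signal peptide is called only if at least `min_votes` programs agree, and the cleavage site is the one predicted most often among those programs. The input format and `consensus_call` are hypothetical.

```python
from collections import Counter


def consensus_call(calls, min_votes):
    """calls: {program: (has_signal_peptide, cleavage_site_or_None)}.

    Returns (True, most_common_site) if at least `min_votes` programs call a
    signal peptide, otherwise (False, None).
    """
    positive_sites = [site for has_sp, site in calls.values() if has_sp]
    if len(positive_sites) < min_votes:
        return False, None
    site_counts = Counter(site for site in positive_sites if site is not None)
    site = site_counts.most_common(1)[0][0] if site_counts else None
    return True, site


# Hypothetical outputs of the five programs for one protein
calls = {
    "SignalP": (True, 28),
    "PrediSi": (True, 28),
    "Phobius": (True, 27),
    "SOSUIsignal": (False, None),
    "SIG-Pred": (True, 28),
}
print(consensus_call(calls, min_votes=3))  # (True, 28)
print(consensus_call(calls, min_votes=5))  # (False, None)
```

Raising `min_votes` trades sensitivity for fewer false positives.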

6.
The accuracy of current signal peptide predictors is outstanding. The most successful predictors are based on neural networks and hidden Markov models, reaching a sensitivity of 99% and an accuracy of 95%. Here, we demonstrate that the popular BLASTP alignment tool can be tuned for signal peptide prediction, reaching the same high level of prediction success. Alignment-based techniques provide additional benefits. In spite of their high success rates, signal peptide predictors still yield false predictions: simple sequences such as polyvaline, for example, are predicted as signal peptides. The general architecture of learning systems makes it difficult to trace the cause of such problems. False predictions of this kind can be recognized or avoided altogether by using sequence comparison techniques. Based on these results, we have implemented a public web service called Signal-BLAST. Predictions returned by Signal-BLAST are transparent and easy to analyze. AVAILABILITY: Signal-BLAST is available online at http://sigpep.services.came.sbg.ac.at/signalblast.html.
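
Signal-BLAST transfers annotations from aligned reference sequences. The sketch below swaps BLASTP for Python's `difflib.SequenceMatcher` similarity ratio purely to stay self-contained; it shows the nearest-neighbour idea and why a low-complexity query such as polyvaline gets no prediction when nothing in the reference set resembles it. The reference format, the sequences and the 0.5 cutoff are assumptions.

```python
from difflib import SequenceMatcher


def nearest_reference(query, references, min_similarity=0.5):
    """Return (name, similarity, cleavage_site) of the most similar reference,
    or None if no reference reaches min_similarity.

    references: list of (name, n_terminal_sequence, cleavage_site_or_None).
    """
    best = None
    for name, ref_seq, site in references:
        similarity = SequenceMatcher(None, query, ref_seq).ratio()
        if best is None or similarity > best[1]:
            best = (name, similarity, site)
    if best is None or best[1] < min_similarity:
        return None  # nothing similar enough: no prediction is made
    return best


# Hypothetical reference set: one secreted, one non-secreted N terminus
references = [
    ("refA", "MKKTAIAIAVALAGFATVAQA", 21),
    ("refB", "MSDNELLKQLEAHGIDV", None),
]
print(nearest_reference("MKKTAIAIAVALAGFSTVAQA", references))
print(nearest_reference("V" * 20, references))  # None
```

Because the answer is a named reference hit, the reason for each prediction can be inspected directly, which is the transparency argument made above.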

7.
Locating proteins in the cell using TargetP, SignalP and related tools   Total citations: 9, self-citations: 0, citations by others: 9
Determining the subcellular localization of a protein is an important first step toward understanding its function. Here, we describe the properties of three well-known N-terminal sequence motifs directing proteins to the secretory pathway, mitochondria and chloroplasts, and sketch a brief history of methods to predict subcellular localization based on these sorting signals and other sequence properties. We then outline how to use a number of internet-accessible tools to arrive at a reliable subcellular localization prediction for eukaryotic and prokaryotic proteins. In particular, we provide detailed step-by-step instructions for the coupled use of the amino-acid sequence-based predictors TargetP, SignalP, ChloroP and TMHMM, which are all hosted at the Center for Biological Sequence Analysis, Technical University of Denmark. In addition, we describe and provide web references to other useful subcellular localization predictors. Finally, we discuss predictive performance measures in general and the performance of TargetP and SignalP in particular.
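
Performance comparisons of this kind typically rely on sensitivity, specificity and the Matthews correlation coefficient (MCC). A minimal sketch computing them from confusion-matrix counts (the counts in the example are hypothetical):

```python
import math


def performance(tp, fp, tn, fn):
    """Sensitivity, specificity and Matthews correlation coefficient (MCC)."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any margin is zero; report 0 by convention
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity, "mcc": mcc}


print(performance(tp=90, fp=10, tn=90, fn=10))
```

MCC is often preferred over plain accuracy because it stays informative when positives and negatives are very unbalanced.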

8.
In this paper we describe an improved neural network method to predict T-cell class I epitopes. A novel input representation has been developed, consisting of a combination of sparse encoding, Blosum encoding, and input derived from hidden Markov models. We demonstrate that combining several neural networks derived using different sequence-encoding schemes performs better than neural networks derived using a single sequence-encoding scheme. The new method is shown to perform substantially better than other methods. Using mutual information calculations, we show that peptides that bind to the HLA A*0204 complex display a signal of higher-order sequence correlations. Neural networks are ideally suited to integrating such higher-order correlations when predicting binding affinity. It is this feature, combined with the use of several neural networks derived from different and novel sequence-encoding schemes and the ability of neural networks to be trained on data consisting of continuous binding affinities, that gives the new method its improved performance. The difference in predictive performance between the neural network methods and the matrix-driven methods is most significant for peptides that bind strongly to the HLA molecule, confirming that the signal of higher-order sequence correlation is most strongly present in high-binding peptides. Finally, we use the method to predict T-cell epitopes for the genome of hepatitis C virus and discuss possible applications of the prediction method to guide rational vaccine design.
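
Sparse encoding represents each residue as a 20-dimensional one-hot vector; Blosum encoding would replace that vector with the residue's row of a BLOSUM matrix (not reproduced here). A minimal sketch of the sparse variant for a 9-mer, the typical class I peptide length; the alphabet ordering is an arbitrary choice:

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sparse_encode(peptide):
    """One-hot ("sparse") encoding: 20 inputs per residue, exactly one set to 1."""
    vector = []
    for aa in peptide.upper():
        if aa not in AMINO_ACIDS:
            raise ValueError(f"non-standard residue: {aa!r}")
        vector.extend(1.0 if aa == ref else 0.0 for ref in AMINO_ACIDS)
    return vector


x = sparse_encode("SLYNTVATL")  # a 9-mer gives 180 network inputs
print(len(x), sum(x))  # 180 9.0
```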

9.
Machine learning techniques have improved the prediction of secretory proteins from protein, genomic and expressed sequence tag (EST) sequences. Artificial neural networks, physical sequence analysis using high-performance optimization, and hidden Markov models identify highly variable signal peptides (the vehicles of protein transport across the endoplasmic reticulum membrane), transmembrane segments, and specific extracellular and intracellular domains as indicators of possible roles in intercellular and intracellular chemical signaling pathways. The major role of peptide hormones, blood coagulation factors, carcinogenesis agents and other secretory proteins in orchestrating multicellular life points to pharmacological potential for curing major diseases and to numerous biotechnological applications.

10.
A hidden Markov model (HMM) has been used to describe, predict, identify and generate secretory signal peptide sequences. The relative strengths of artificial secretory signals emitted from the human signal peptide HMM (SP-HMM), as measured by their effectiveness in directing alkaline phosphatase secretion, correlate with their HMM bit scores. In effect, signal strength reflects closeness to the consensus. An HMM bit score of 8 was experimentally determined to be the threshold for discriminating signal sequences from non-secretory ones. An artificial signal sequence generated by the SP-HMM with the maximum model bit score (HMM + 38) was selected as an ideal human signal sequence. This signal peptide (secrecon) directs strong protein secretion and expression. We further ranked the signal strengths of the signal peptides of known human secretory proteins by SP-HMM bit score. The applications of high-scoring HMM signals in recombinant protein production and protein engineering are discussed.
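
An HMM bit score is a log2-odds ratio of the sequence under the model versus a null model. A minimal sketch, assuming ungapped alignment to match states only (so the score reduces to a sum of per-position log-odds), a uniform background of 1/20 and a hypothetical probability floor for residues a state never emits, then applying the 8-bit threshold reported above. The toy model is not the SP-HMM.

```python
import math

BACKGROUND = 1 / 20  # assumed uniform null model over 20 amino acids
FLOOR = 1e-4         # hypothetical probability for residues a state never emits


def bit_score(seq, match_emissions):
    """Ungapped log2-odds score of seq against per-position match-state emissions."""
    if len(seq) != len(match_emissions):
        raise ValueError("sequence length must equal the number of match states")
    return sum(
        math.log2(state.get(aa, FLOOR) / BACKGROUND)
        for aa, state in zip(seq, match_emissions)
    )


def passes_threshold(seq, match_emissions, threshold=8.0):
    """Apply the 8-bit cutoff reported for the SP-HMM."""
    return bit_score(seq, match_emissions) >= threshold


# Toy 4-state model that strongly prefers leucine at every position
toy_model = [{"L": 0.8}] * 4
print(bit_score("LLLL", toy_model), passes_threshold("LLLL", toy_model))
print(passes_threshold("DDDD", toy_model))  # False
```

Each matching leucine contributes log2(0.8 / 0.05) = 4 bits, so the closer a sequence is to the model's consensus, the higher its score, matching the "closeness to the consensus" interpretation above.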

11.
We have developed a new method for the identification of signal peptides and their cleavage sites based on neural networks trained on separate sets of prokaryotic and eukaryotic sequences. The method performs significantly better than previous prediction schemes, and can easily be applied to genome-wide data sets. Discrimination between cleaved signal peptides and uncleaved N-terminal signal-anchor sequences is also possible, though with lower precision. Predictions can be made on a publicly available WWW server: http://www.cbs.dtu.dk/services/SignalP/.

12.
In this review we discuss recent insights, obtained from well-characterized model systems, into the factors that determine the orientation and tilt angles of transmembrane peptides in lipid bilayers. We will compare tilt angles of synthetic peptides with those of natural peptides and proteins, and we will discuss how tilt can be modulated by hydrophobic mismatch between the thickness of the bilayer and the length of the membrane-spanning part of the peptide or protein. In particular, we will focus on results obtained with tryptophan-flanked model peptides (WALP peptides) as a case study to illustrate possible consequences of hydrophobic mismatch in molecular detail and to highlight the importance of peptide dynamics for the experimental determination of tilt angles. We will conclude by discussing some future prospects and challenges concerning the use of simple peptide/lipid model systems as a tool to understand membrane structure and function.

13.
BACKGROUND: A variety of methods for prediction of peptide binding to major histocompatibility complex (MHC) have been proposed. These methods are based on binding motifs, binding matrices, hidden Markov models (HMM), or artificial neural networks (ANN). There has been little prior work on the comparative analysis of these methods. MATERIALS AND METHODS: We performed a comparison of the performance of six methods applied to the prediction of two human MHC class I molecules, including binding matrices and motifs, ANNs, and HMMs. RESULTS: The selection of the optimal prediction method depends on the amount of available data (the number of peptides of known binding affinity to the MHC molecule of interest), the biases in the data set and the intended purpose of the prediction (screening of a single protein versus mass screening). When little or no peptide data are available, binding motifs are the most useful alternative to random guessing or use of a complete overlapping set of peptides for selection of candidate binders. As the number of known peptide binders increases, binding matrices and HMM become more useful predictors. ANN and HMM are the predictive methods of choice for MHC alleles with more than 100 known binding peptides. CONCLUSION: The ability of bioinformatic methods to reliably predict MHC binding peptides, and thereby potential T-cell epitopes, has major implications for clinical immunology, particularly in the area of vaccine design.
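
Binding matrices score a peptide by summing position-specific contributions for each residue. A minimal sketch with a hypothetical 3-position matrix (real class I matrices usually cover 9 positions), scanning a protein for its highest-scoring windows:

```python
def matrix_score(peptide, matrix):
    """Sum of position-specific contributions; unlisted residues contribute 0."""
    return sum(position.get(aa, 0.0) for aa, position in zip(peptide, matrix))


def rank_windows(protein, matrix):
    """All windows of the matrix length, best-scoring first (ties keep order)."""
    k = len(matrix)
    windows = [protein[i:i + k] for i in range(len(protein) - k + 1)]
    return sorted(windows, key=lambda w: matrix_score(w, matrix), reverse=True)


# Hypothetical matrix favouring L at position 1 and V at position 3
toy_matrix = [{"L": 1.0}, {}, {"V": 2.0}]
print(rank_windows("ALAVLKV", toy_matrix)[:2])  # ['LAV', 'LKV']
```

Because each position contributes independently, such a matrix cannot capture the higher-order correlations that neural networks and HMMs can, which is consistent with those methods pulling ahead as more binding data become available.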

14.
We describe and validate a new membrane protein topology prediction method, TMHMM, based on a hidden Markov model. We present a detailed analysis of TMHMM's performance, and show that it correctly predicts 97-98% of the transmembrane helices. Additionally, TMHMM can discriminate between soluble and membrane proteins with both specificity and sensitivity better than 99%, although the accuracy drops when signal peptides are present. This high degree of accuracy allowed us to reliably predict integral membrane proteins in a large collection of genomes. Based on these predictions, we estimate that 20-30% of all genes in most genomes encode membrane proteins, which is in agreement with previous estimates. We further discovered that proteins with N(in)-C(in) topologies are strongly preferred in all examined organisms except Caenorhabditis elegans, where the large number of 7TM receptors increases the counts for N(out)-C(in) topologies. We discuss the possible relevance of this finding for our understanding of membrane protein assembly mechanisms. A TMHMM prediction service is available at http://www.cbs.dtu.dk/services/TMHMM/.
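
Topology classes such as N(in)-C(in) can be tallied from predicted topologies. A minimal sketch, assuming TMHMM-style topology strings in which `i`/`o` mark inside/outside loops and `start-end` pairs mark helices (for example `i7-29o44-66i`, optionally preceded by a `Topology=` label); proteins with no predicted helix are skipped:

```python
import re
from collections import Counter


def classify_topology(topology):
    """Return (number_of_helices, 'N(x)-C(y)') for a string like 'i7-29o44-66i'."""
    topology = topology.split("=")[-1]  # drop a leading 'Topology=' label if present
    sides = re.findall(r"[io]", topology)
    helices = re.findall(r"(\d+)-(\d+)", topology)
    if len(sides) != len(helices) + 1:
        raise ValueError(f"unexpected topology string: {topology!r}")
    label = {"i": "in", "o": "out"}
    return len(helices), f"N({label[sides[0]]})-C({label[sides[-1]]})"


def count_topologies(topologies):
    """Tally N/C-terminal orientations over predicted membrane proteins only."""
    counts = Counter()
    for topology in topologies:
        n_helices, orientation = classify_topology(topology)
        if n_helices > 0:  # skip proteins with no predicted helix
            counts[orientation] += 1
    return counts


print(count_topologies(["i7-29o44-66i", "o10-32i", "i5-27o50-72i", "o"]))
```

Note that an even number of helices always yields matching N- and C-terminal sides, so a preference for N(in)-C(in) is partly a preference for even helix counts.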

15.
Computational models complement laboratory experimentation for efficient identification of MHC-binding peptides and T-cell epitopes. Methods for prediction of MHC-binding peptides include binding motifs, quantitative matrices, artificial neural networks, hidden Markov models, and molecular modelling. Models derived by these methods have been successfully used for prediction of T-cell epitopes in cancer, autoimmunity, infectious disease, and allergy. For maximum benefit, the use of computer models must be treated as experiments analogous to standard laboratory procedures and performed according to strict standards. This requires careful selection of data for model building, and adequate testing and validation. A range of web-based databases and MHC-binding prediction programs are available. Although some available prediction programs for particular MHC alleles have reasonable accuracy, there is no guarantee that all models produce good quality predictions. In this article, we present and discuss a framework for modelling, testing, and applications of computational methods used in predictions of T-cell epitopes.

16.
Retention times in HPLC yield valuable information for the identification of various analytes, and the prediction of peptide retention is useful for identifying peptides/proteins in LC-MS-based proteomics. Informatics methods capable of solving nonlinear problems, such as artificial neural networks and support vector machines, have made possible accurate modeling of the quantitative structure-retention relationships of peptides (including large polymers) up to 5 kDa, to which classical linear models cannot be applied, as well as proteome-wide prediction of peptide retention. Proteome-wide retention prediction combined with accurate mass information facilitates the identification of peptides in complex proteomic samples. In this review, we address recent developments in solid informatics methods and their application to peptide retention properties in 'bottom-up' shotgun proteomics. We also describe future prospects for the standardization and application of retention times.

17.
Proteins destined for secretion or membrane compartments possess signal peptides for insertion into the membrane. The signal peptide is therefore critical for the localization and function of cell surface receptors and ligands that mediate cell-cell communication. About 4% of all human proteins listed in the UniProt database have signal peptide domains at their N termini. A comprehensive literature survey was performed to retrieve functional and disease-associated genetic variants in the signal peptide domains of human proteins. In 21 human proteins, we identified 26 disease-associated mutations within their signal peptide domains, 14 of which have been experimentally shown to impair signal peptide function and thus influence protein transport. We used SignalP 3.0 predictions to characterize the differences in signal peptide prediction score between the mutant and wild-type alleles of each mutation, as well as of 189 previously uncharacterized single nucleotide polymorphisms (SNPs) located in the signal peptide domains of 165 human proteins. Comparing the signal peptide prediction outcomes of mutations and SNPs implicated SNPs that potentially impact signal peptide function, and thus the cellular localization of the human proteins. Most of the top candidate proteins were membrane and secreted proteins associated with molecular transport, cell signaling and cell-to-cell interaction processes. This is the first study that systematically characterizes genetic variation occurring in the signal peptides of all human proteins. It represents a useful strategy for prioritizing SNPs occurring within the signal peptide domains of human proteins. Functional evaluation of the candidates identified herein may reveal effects on major cellular processes, including immune cell function, cell recognition and adhesion, and signal transduction.

18.
Gene prediction and function inference for SARS-CoV (BJ01)   Total citations: 1, self-citations: 1, citations by others: 1
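Based on a survey of the SARS-CoV literature, we point out shortcomings in existing gene prediction and functional studies. To support the development of effective drugs and vaccines, we re-performed gene prediction and function inference for SARS-CoV (BJ01). Twelve gene prediction methods were compared on known genes of the genus Coronavirus, and the four best-performing methods (Heuristic models, Gene Identification, ZCURVE-CoV and ORF FINDER) were selected to predict genes. ATGpr was then used to assess the likelihood of the first start codon and whether it conforms to the Kozak rule, and transcription regulatory sequences were searched, in order to improve the accuracy of gene prediction. In total, 34 ORFs were predicted; after excluding 13 that were identical or only slightly different from those in NCBI and the related literature, 21 possible novel genes longer than 50 amino acids were obtained. For the predicted proteins, ProtParam was used to analyze their physicochemical properties, SignalP to determine whether they carry signal peptides, BLAST and FASTA to search for similar sequences, and TMPred, TMHMM, PFAM and HMMTOP to analyze domains or motifs, in order to improve the reliability of function inference. Based on which of the four gene prediction methods supported each gene, the match scores and expect values against known genes of other coronaviruses, and the length differences between known and predicted genes, the 21 possible novel genes were classified into four categories by likelihood. The results are also discussed.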
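
The ORF-finding step can be illustrated with a deliberately simplified scanner. A minimal sketch, assuming the standard stop codons, the forward strand only, the first in-frame ATG as the start, and the 50-amino-acid cutoff from the abstract; the tools named above (e.g. ORF FINDER, ZCURVE-CoV) are considerably more elaborate:

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_aa=50):
    """ATG-initiated ORFs on the forward strand in all three frames.

    Returns sorted 0-based half-open (start, end) spans that include the stop
    codon; only ORFs encoding at least `min_aa` amino acids are kept.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == "ATG":
                for j in range(i + 3, len(dna) - 2, 3):
                    if dna[j:j + 3] in STOP_CODONS:
                        if (j - i) // 3 >= min_aa:
                            orfs.append((i, j + 3))
                        i = j  # continue scanning after this stop codon
                        break
                else:
                    break  # no in-frame stop downstream: no complete ORF left
            i += 3
    return sorted(orfs)


# Synthetic sequence: ATG + 60 x GCT + TAA starting at index 2 (61 amino acids)
dna = "CC" + "ATG" + "GCT" * 60 + "TAA" + "GG"
print(find_orfs(dna))  # [(2, 188)]
```

Taking the first ATG in each open frame yields the longest ORF; checking whether that start codon sits in a favourable Kozak context, as the abstract describes, would be a separate filtering step.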

19.
Signal peptides and transmembrane helices both contain a stretch of hydrophobic amino acids. This common feature makes it difficult for signal peptide and transmembrane helix predictors to correctly assign identity to stretches of hydrophobic residues near the N-terminal methionine of a protein sequence. The inability to reliably distinguish between N-terminal transmembrane helix and signal peptide is an error with serious consequences for the prediction of protein secretory status or transmembrane topology. In this study, we report a new method for differentiating protein N-terminal signal peptides and transmembrane helices. Based on the sequence features extracted from hydrophobic regions (amino acid frequency, hydrophobicity, and the start position), we set up discriminant functions and examined them on non-redundant datasets with jackknife tests. This method can incorporate other signal peptide prediction methods and achieve higher prediction accuracy. For Gram-negative bacterial proteins, 95.7% of N-terminal signal peptides and transmembrane helices can be correctly predicted (coefficient 0.90). Given a sensitivity of 90%, transmembrane helices can be identified from signal peptides with a precision of 99% (coefficient 0.92). For eukaryotic proteins, 94.2% of N-terminal signal peptides and transmembrane helices can be correctly predicted with coefficient 0.83. Given a sensitivity of 90%, transmembrane helices can be identified from signal peptides with a precision of 87% (coefficient 0.85). The method can be used to complement current transmembrane protein prediction and signal peptide prediction methods to improve their prediction accuracies.
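
A minimal sketch of the feature-plus-discriminant idea with jackknife (leave-one-out) testing. It swaps the paper's discriminant functions for a nearest-centroid rule, and the features (region start, hydrophobic fraction, charged fraction), the hydrophobic residue set and the sample vectors are assumptions that only loosely mirror the features listed above:

```python
HYDROPHOBIC = set("AILMFVW")  # assumed hydrophobic residue set
CHARGED = set("DEKR")


def region_features(seq, start, end):
    """[start position, hydrophobic fraction, charged fraction] of seq[start:end]."""
    region = seq[start:end]
    hydrophobic = sum(aa in HYDROPHOBIC for aa in region) / len(region)
    charged = sum(aa in CHARGED for aa in region) / len(region)
    return [float(start), hydrophobic, charged]


def centroid(rows):
    return [sum(column) / len(rows) for column in zip(*rows)]


def predict(x, centroids):
    """Label of the nearest class centroid by squared Euclidean distance."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(x, centroids[label])),
    )


def jackknife_accuracy(samples):
    """Leave-one-out accuracy; samples is a list of (feature_vector, label)."""
    correct = 0
    for i, (x, label) in enumerate(samples):
        grouped = {}
        for x_train, y_train in samples[:i] + samples[i + 1:]:
            grouped.setdefault(y_train, []).append(x_train)
        centroids = {y: centroid(rows) for y, rows in grouped.items()}
        if predict(x, centroids) == label:
            correct += 1
    return correct / len(samples)


# Hypothetical, clearly separable feature vectors for the two classes
samples = [
    ([2.0, 0.6, 0.0], "SP"), ([3.0, 0.7, 0.1], "SP"), ([2.5, 0.65, 0.0], "SP"),
    ([20.0, 0.9, 0.0], "TM"), ([25.0, 0.95, 0.0], "TM"), ([22.0, 0.85, 0.05], "TM"),
]
print(region_features("MKKLLLLLLLLAA", 3, 13))  # [3.0, 1.0, 0.0]
print(jackknife_accuracy(samples))  # 1.0
```

In practice the features would need scaling, since the start position otherwise dominates the distance.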

20.
Imai K, Nakai K. Proteomics, 2010, 10(22): 3970-3983
Since the proposal of the signal hypothesis on protein subcellular sorting, a number of computational analyses have been performed in this field. A typical example is the development of algorithms that predict the subcellular localization sites of input protein sequences. In this review, we mainly focus on the biological grounds of the prediction methods rather than on algorithmic issues, because we believe the former will be more fruitful for future development. Recent advances in the study of protein sorting signals will hopefully be incorporated into future prediction methods. Unfortunately, many state-of-the-art methods are published without sufficient objective testing. In fact, a simple test employed in this article shows that the performance of specifically developed predictors is not significantly better than that of a homology search. We suspect that this is a general problem associated with the interpretation of genome sequences, which have evolved through gene duplication and speciation.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Filing No. 09084417