首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 312 毫秒
1.
MOTIVATION: Subcellular localization is a key functional characteristic of proteins. A fully automatic and reliable prediction system for protein subcellular localization is needed, especially for the analysis of large-scale genome sequences. RESULTS: In this paper, Support Vector Machine has been introduced to predict the subcellular localization of proteins from their amino acid compositions. The total prediction accuracies reach 91.4% for three subcellular locations in prokaryotic organisms and 79.4% for four locations in eukaryotic organisms. Predictions by our approach are robust to errors in the protein N-terminal sequences. This new approach provides superior prediction performance compared with existing algorithms based on amino acid composition and can be a complementary method to other existing methods based on sorting signals. AVAILABILITY: A web server implementing the prediction method is available at http://www.bioinfo.tsinghua.edu.cn/SubLoc/. SUPPLEMENTARY INFORMATION: Supplementary material is available at http://www.bioinfo.tsinghua.edu.cn/SubLoc/.  相似文献   

2.
ABSTRACT: BACKGROUND: Understanding protein subcellular localization is a necessary component toward understanding the overall function of a protein. Numerous computational methods have been published over the past decade, with varying degrees of success. Despite the large number of published methods in this area, only a small fraction of them are available for researchers to use in their own studies. Of those that are available, many are limited by predicting only a small number of major organelles in the cell. Additionally, the majority of methods predict only a single location, even though it is known that a large fraction of the proteins in eukaryotic species shuttle between locations to carry out their function. FINDINGS: We present a software package and a web server for predicting subcellular localization of protein sequences based on the ngLOC method. ngLOC is an n-gram-based Bayesian classifier that predicts subcellular localization of proteins both in prokaryotes and eukaryotes. The overall prediction accuracy varies from 89.8% to 91.4% across species. This program can predict 11 distinct locations each in plant and animal species. ngLOC also predicts 4 and 5 distinct locations on gram-positive and gram-negative bacterial datasets, respectively. CONCLUSIONS: ngLOC is a generic method that can be trained by data from a variety of species or classes for predicting protein subcellular localization. The standalone software is freely available for academic use under GNU GPL, and the ngLOC web server is also accessible at http://ngloc.unmc.edu.  相似文献   

3.
Neural networks have been trained to predict the subcellular location of proteins in prokaryotic or eukaryotic cells from their amino acid composition. For three possible subcellular locations in prokaryotic organisms a prediction accuracy of 81% can be achieved. Assigning a reliability index, 33% of the predictions can be made with an accuracy of 91%. For eukaryotic proteins (excluding plant sequences) an overall prediction accuracy of 66% for four locations was achieved, with 33% of the sequences being predicted with an accuracy of 82% or better. With the subcellular location restricting a protein's possible function, this method should be a useful tool for the systematic analysis of genome data and is available via a server on the world wide web.  相似文献   

4.
5.
MOTIVATION: Currently available methods for the prediction of subcellular location of mitochondrial proteins rely largely on the presence of mitochondrial targeting signals in the protein sequences. However, a large fraction of mitochondrial proteins lack such signals, making those tools ineffective for genome-scale prediction of mitochondria-targeted proteins. Here, we propose a method for genome-scale prediction of nucleus-encoded mitochondrial proteins. The new method, MITOPRED, is based on the Pfam domain occurrence patterns and the amino acid compositional differences between mitochondrial and non-mitochondrial proteins. RESULTS: MITOPRED could predict mitochondrial proteins with 100% specificity at a 44% sensitivity rate and with 67% specificity at 99% sensitivity. Additionally, it was sufficiently robust to predict mitochondrial proteins across different eukaryotic species with similar accuracy. Based on Matthews correlation coefficient measure, the prediction performance of MITOPRED is clearly superior (0.73) to those of the two popular methods TargetP (0.51) and PSORT (0.53). Using this method, we predicted the nucleus-encoded mitochondrial proteins from six complete genomes (three invertebrate, two vertebrate and one plant species) and estimated the total number in each genome. In human, our method estimated the existence of 1362 mitochondrial proteins corresponding to 4.8% of the total proteome. AVAILABILITY: MITOPRED program is freely accessible at http://mitopred.sdsc.edu. Source code is available on request from the authors. SUPPLEMENTARY INFORMATION: Training data sets are also available at http://mitopred.sdsc.edu  相似文献   

6.
Tantoso E  Li KB 《Amino acids》2008,35(2):345-353
Identifying a protein's subcellular localization is an important step to understand its function. However, the involved experimental work is usually laborious, time consuming and costly. Computational prediction hence becomes valuable to reduce the inefficiency. Here we provide a method to predict protein subcellular localization by using amino acid composition and physicochemical properties. The method concatenates the information extracted from a protein's N-terminal, middle and full sequence. Each part is represented by amino acid composition, weighted amino acid composition, five-level grouping composition and five-level dipeptide composition. We divided our dataset into training and testing set. The training set is used to determine the best performing amino acid index by using five-fold cross validation, whereas the testing set acts as the independent dataset to evaluate the performance of our model. With the novel representation method, we achieve an accuracy of approximately 75% on independent dataset. We conclude that this new representation indeed performs well and is able to extract the protein sequence information. We have developed a web server for predicting protein subcellular localization. The web server is available at http://aaindexloc.bii.a-star.edu.sg .  相似文献   

7.
Revealing the subcellular location of newly discovered protein sequences can bring insight to their function and guide research at the cellular level. The rapidly increasing number of sequences entering the genome databanks has called for the development of automated analysis methods. Currently, most existing methods used to predict protein subcellular locations cover only one, or a very limited number of species. Therefore, it is necessary to develop reliable and effective computational approaches to further improve the performance of protein subcellular prediction and, at the same time, cover more species. The current study reports the development of a novel predictor called MSLoc-DT to predict the protein subcellular locations of human, animal, plant, bacteria, virus, fungi, and archaea by introducing a novel feature extraction approach termed Amino Acid Index Distribution (AAID) and then fusing gene ontology information, sequential evolutionary information, and sequence statistical information through four different modes of pseudo amino acid composition (PseAAC) with a decision template rule. Using the jackknife test, MSLoc-DT can achieve 86.5, 98.3, 90.3, 98.5, 95.9, 98.1, and 99.3% overall accuracy for human, animal, plant, bacteria, virus, fungi, and archaea, respectively, on seven stringent benchmark datasets. Compared with other predictors (e.g., Gpos-PLoc, Gneg-PLoc, Virus-PLoc, Plant-PLoc, Plant-mPLoc, ProLoc-Go, Hum-PLoc, GOASVM) on the gram-positive, gram-negative, virus, plant, eukaryotic, and human datasets, the new MSLoc-DT predictor is much more effective and robust. Although the MSLoc-DT predictor is designed to predict the single location of proteins, our method can be extended to multiple locations of proteins by introducing multilabel machine learning approaches, such as the support vector machine and deep learning, as substitutes for the K-nearest neighbor (KNN) method. As a user-friendly web server, MSLoc-DT is freely accessible at http://bioinfo.ibp.ac.cn/MSLOC_DT/index.html.  相似文献   

8.
Proteins may simultaneously exist at, or move between, two or more different subcellular locations. Proteins with multiple locations or dynamic feature of this kind are particularly interesting because they may have some very special biological functions intriguing to investigators in both basic research and drug discovery. For instance, among the 6408 human protein entries that have experimentally observed subcellular location annotations in the Swiss-Prot database (version 50.7, released 19-Sept-2006), 973 ( approximately 15%) have multiple location sites. The number of total human protein entries (except those annotated with "fragment" or those with less than 50 amino acids) in the same database is 14,370, meaning a gap of (14,370-6408)=7962 entries for which no knowledge is available about their subcellular locations. Although one can use the computational approach to predict the desired information for the gap, so far all the existing methods for predicting human protein subcellular localization are limited in the case of single location site only. To overcome such a barrier, a new ensemble classifier, named Hum-mPLoc, was developed that can be used to deal with the case of multiple location sites as well. Hum-mPLoc is freely accessible to the public as a web server at http://202.120.37.186/bioinf/hum-multi. Meanwhile, for the convenience of people working in the relevant areas, Hum-mPLoc has been used to identify all human protein entries in the Swiss-Prot database that do not have subcellular location annotations or are annotated as being uncertain. The large-scale results thus obtained have been deposited in a downloadable file prepared with Microsoft Excel and named "Tab_Hum-mPLoc.xls". This file is available at the same website and will be updated twice a year to include new entries of human proteins and reflect the continuous development of Hum-mPLoc.  相似文献   

9.
Predicting subcellular localization with AdaBoost Learner   总被引:1,自引:0,他引:1  
Protein subcellular localization, which tells where a protein resides in a cell, is an important characteristic of a protein, and relates closely to the function of proteins. The prediction of their subcellular localization plays an important role in the prediction of protein function, genome annotation and drug design. Therefore, it is an important and challenging role to predict subcellular localization using bio-informatics approach. In this paper, a robust predictor, AdaBoost Learner is introduced to predict protein subcellular localization based on its amino acid composition. Jackknife cross-validation and independent dataset test were used to demonstrate that Adaboost is a robust and efficient model in predicting protein subcellular localization. As a result, the correct prediction rates were 74.98% and 80.12% for the Jackknife test and independent dataset test respectively, which are higher than using other existing predictors. An online server for predicting subcellular localization of proteins based on AdaBoost classifier was available on http://chemdata.shu. edu.cn/sl12.  相似文献   

10.
MOTIVATION: Many important biological processes such as cell signaling, transport of membrane-impermeable molecules, cell-cell communication, cell recognition and cell adhesion are mediated by membrane proteins. Unfortunately, as these proteins are not water soluble, it is extremely hard to experimentally determine their structure. Therefore, improved methods for predicting the structure of these proteins are vital in biological research. In order to improve transmembrane topology prediction, we evaluate the combined use of both integrated signal peptide prediction and evolutionary information in a single algorithm. RESULTS: A new method (MEMSAT3) for predicting transmembrane protein topology from sequence profiles is described and benchmarked with full cross-validation on a standard data set of 184 transmembrane proteins. The method is found to predict both the correct topology and the locations of transmembrane segments for 80% of the test set. This compares with accuracies of 62-72% for other popular methods on the same benchmark. By using a second neural network specifically to discriminate transmembrane from globular proteins, a very low overall false positive rate (0.5%) can also be achieved in detecting transmembrane proteins. AVAILABILITY: An implementation of the described method is available both as a web server (http://www.psipred.net) and as downloadable source code from http://bioinf.cs.ucl.ac.uk/memsat. Both the server and source code files are free to non-commercial users. Benchmark and training data are also available from http://bioinf.cs.ucl.ac.uk/memsat.  相似文献   

11.
Prediction of protein subcellular locations using fuzzy k-NN method   总被引:7,自引:0,他引:7  
MOTIVATION: Protein localization data are a valuable information resource helpful in elucidating protein functions. It is highly desirable to predict a protein's subcellular locations automatically from its sequence. RESULTS: In this paper, fuzzy k-nearest neighbors (k-NN) algorithm has been introduced to predict proteins' subcellular locations from their dipeptide composition. The prediction is performed with a new data set derived from version 41.0 SWISS-PROT databank, the overall predictive accuracy about 80% has been achieved in a jackknife test. The result demonstrates the applicability of this relative simple method and possible improvement of prediction accuracy for the protein subcellular locations. We also applied this method to annotate six entirely sequenced proteomes, namely Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Oryza sativa, Arabidopsis thaliana and a subset of all human proteins. AVAILABILITY: Supplementary information and subcellular location annotations for eukaryotes are available at http://166.111.30.65/hying/fuzzy_loc.htm  相似文献   

12.
Chang JM  Su EC  Lo A  Chiu HS  Sung TY  Hsu WL 《Proteins》2008,72(2):693-710
Prediction of protein subcellular localization (PSL) is important for genome annotation, protein function prediction, and drug discovery. Many computational approaches for PSL prediction based on protein sequences have been proposed in recent years for Gram-negative bacteria. We present PSLDoc, a method based on gapped-dipeptides and probabilistic latent semantic analysis (PLSA) to solve this problem. A protein is considered as a term string composed by gapped-dipeptides, which are defined as any two residues separated by one or more positions. The weighting scheme of gapped-dipeptides is calculated according to a position specific score matrix, which includes sequence evolutionary information. Then, PLSA is applied for feature reduction, and reduced vectors are input to five one-versus-rest support vector machine classifiers. The localization site with the highest probability is assigned as the final prediction. It has been reported that there is a strong correlation between sequence homology and subcellular localization (Nair and Rost, Protein Sci 2002;11:2836-2847; Yu et al., Proteins 2006;64:643-651). To properly evaluate the performance of PSLDoc, a target protein can be classified into low- or high-homology data sets. PSLDoc's overall accuracy of low- and high-homology data sets reaches 86.84% and 98.21%, respectively, and it compares favorably with that of CELLO II (Yu et al., Proteins 2006;64:643-651). In addition, we set a confidence threshold to achieve a high precision at specified levels of recall rates. When the confidence threshold is set at 0.7, PSLDoc achieves 97.89% in precision which is considerably better than that of PSORTb v.2.0 (Gardy et al., Bioinformatics 2005;21:617-623). Our approach demonstrates that the specific feature representation for proteins can be successfully applied to the prediction of protein subcellular localization and improves prediction accuracy. Besides, because of the generality of the representation, our method can be extended to eukaryotic proteomes in the future. The web server of PSLDoc is publicly available at http://bio-cluster.iis.sinica.edu.tw/~ bioapp/PSLDoc/.  相似文献   

13.
SUMMARY: We developed a web server PSLpred for predicting subcellular localization of gram-negative bacterial proteins with an overall accuracy of 91.2%. PSLpred is a hybrid approach-based method that integrates PSI-BLAST and three SVM modules based on compositions of residues, dipeptides and physico-chemical properties. The prediction accuracies of 90.7, 86.8, 90.3, 95.2 and 90.6% were attained for cytoplasmic, extracellular, inner-membrane, outer-membrane and periplasmic proteins, respectively. Furthermore, PSLpred was able to predict approximately 74% of sequences with an average prediction accuracy of 98% at RI = 5. AVAILABILITY: PSLpred is available at http://www.imtech.res.in/raghava/pslpred/  相似文献   

14.
Huang B  Schroeder M 《Gene》2008,422(1-2):14-21
Predicting protein interaction interfaces and protein complexes are two important related problems. For interface prediction, there are a number of tools, such as PPI-Pred, PPISP, PINUP, Promate, and SPPIDER, which predict enzyme-inhibitor interfaces with success rates of 23% to 55% and other interfaces with 10% to 28% on a benchmark dataset of 62 complexes. Here, we develop, metaPPI, a meta server for interface prediction. It significantly improves prediction success rates to 70% for enzyme-inhibitor and 44% for other interfaces. As shown with Promate, predicted interfaces can be used to improve protein docking. Here, we follow this idea using the meta server instead of individual predictions. We confirm that filtering with predicted interfaces significantly improves candidate generation in rigid-body docking based on shape complementarity. Finally, we show that the initial ranking of candidate solutions in rigid-body docking can be further improved for the class of enzyme-inhibitor complexes by a geometrical scoring which rewards deep pockets. A web server of metaPPI is available at scoppi.tu-dresden.de/metappi. The source code of our docking algorithm BDOCK is also available at www.biotec.tu-dresden.de/~bhuang/bdock.  相似文献   

15.
State-of-the-art methods for topology of α-helical membrane proteins are based on the use of time-consuming multiple sequence alignments obtained from PSI-BLAST or other sources. Here, we examine if it is possible to use the consensus of topology prediction methods that are based on single sequences to obtain a similar accuracy as the more accurate multiple sequence-based methods. Here, we show that TOPCONS-single performs better than any of the other topology prediction methods tested here, but ~6% worse than the best method that is utilizing multiple sequence alignments. AVAILABILITY AND IMPLEMENTATION: TOPCONS-single is available as a web server from http://single.topcons.net/ and is also included for local installation from the web site. In addition, consensus-based topology predictions for the entire international protein index (IPI) is available from the web server and will be updated at regular intervals.  相似文献   

16.
Cheng J  Randall A  Baldi P 《Proteins》2006,62(4):1125-1132
Accurate prediction of protein stability changes resulting from single amino acid mutations is important for understanding protein structures and designing new proteins. We use support vector machines to predict protein stability changes for single amino acid mutations leveraging both sequence and structural information. We evaluate our approach using cross-validation methods on a large dataset of single amino acid mutations. When only the sign of the stability changes is considered, the predictive method achieves 84% accuracy-a significant improvement over previously published results. Moreover, the experimental results show that the prediction accuracy obtained using sequence alone is close to the accuracy obtained using tertiary structure information. Because our method can accurately predict protein stability changes using primary sequence information only, it is applicable to many situations where the tertiary structure is unknown, overcoming a major limitation of previous methods which require tertiary information. The web server for predictions of protein stability changes upon mutations (MUpro), software, and datasets are available at http://www.igb.uci.edu/servers/servers.html.  相似文献   

17.
MOTIVATION: beta-turns play an important role from a structural and functional point of view. beta-turns are the most common type of non-repetitive structures in proteins and comprise on average, 25% of the residues. In the past numerous methods have been developed to predict beta-turns in a protein. Most of these prediction methods are based on statistical approaches. In order to utilize the full potential of these methods, there is a need to develop a web server. RESULTS: This paper describes a web server called BetaTPred, developed for predicting beta-TURNS in a protein from its amino acid sequence. BetaTPred allows the user to predict turns in a protein using existing statistical algorithms. It also allows to predict different types of beta-TURNS e.g. type I, I', II, II', VI, VIII and non-specific. This server assists the users in predicting the consensus beta-TURNS in a protein. AVAILABILITY: The server is accessible from http://imtech.res.in/raghava/betatpred/  相似文献   

18.
TMpro is a transmembrane (TM) helix prediction algorithm that uses language processing methodology for TM segment identification. It is primarily based on the analysis of statistical distributions of properties of amino acids in transmembrane segments. This article describes the availability of TMpro on the internet via a web interface. The key features of the interface are: (i) output is generated in multiple formats including a user-interactive graphical chart which allows comparison of TMpro predicted segment locations with other labeled segments input by the user, such as predictions from other methods. (ii) Up to 5000 sequences can be submitted at a time for prediction. (iii) TMpro is available as a web server and is published as a web service so that the method can be accessed by users as well as other services depending on the need for data integration. Availability: http://linzer.blm.cs.cmu.edu/tmpro/ (web server and help), http://blm.sis.pitt.edu:8080/axis/services/TMProFetcherService (web service).  相似文献   

19.
It is well known that protein subcellular localizations are closely related to their functions. Although many computational methods and tools are available from Internet, it is still necessary to develop new algorithms in this filed to gain a better understanding of the complex mechanism of plant subcellular localization. Here, we provide a new web server named PSCL for plant protein subcellular localization prediction by employing optimized functional domains. After feature optimization, 848 optimal functional domains from InterPro were obtained to represent each protein. By calculating the distances to each of the seven categories, PSCL showing the possibilities of a protein located into each of those categories in ascending order. Toward our dataset, PSCL achieved a first-order predicted accuracy of 75.7% by jackknife test. Gene Ontology enrichment analysis showing that catalytic activity, cellular process and metabolic process are strongly correlated with the localization of plant proteins. Finally, PSCL, a Linux Operate System based web interface for the predictor was designed and is accessible for public use at http://pscl.biosino.org/.  相似文献   

20.
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号