Similar literature
1.
Many species of Gram-negative bacteria are pathogenic and can cause disease in a host organism. This pathogenic capability is usually associated with certain components in Gram-negative cells. Therefore, developing an automated method for fast and reliable prediction of Gram-negative protein subcellular location will allow us not only to annotate gene products in a timely manner, but also to screen candidates for drug discovery. However, protein subcellular location prediction is a very difficult problem, particularly when more location sites need to be covered and when unknown query proteins do not have significant homology to proteins of known subcellular locations. PSORT-B, a recently updated version of PSORT that is widely used for predicting Gram-negative protein subcellular location, covers only five location sites. Also, the data set used to train PSORT-B contains many proteins with high degrees of sequence identity within the same location group and, hence, may bear a strong homology bias. To overcome these problems, a new predictor, called "Gneg-PLoc", was developed. By fusing many basic classifiers, each trained with a stringent data set containing proteins with strictly less than 25% sequence identity to one another in the same location group, the new predictor can cover eight subcellular locations: cytoplasm, extracellular space, fimbrium, flagellum, inner membrane, nucleoid, outer membrane, and periplasm. In comparison with PSORT-B, the new predictor not only covers more subcellular locations, but also yields remarkably higher success rates. Gneg-PLoc is available as a Web server at http://202.120.37.186/bioinf/Gneg. To support people working in the relevant areas, a downloadable file is provided at the same Web site listing the results identified by Gneg-PLoc for 49,907 Gram-negative protein entries in the Swiss-Prot database that have no subcellular location annotations or are annotated with uncertain terms. The large-scale results will be updated twice a year to cover the new entries of Gram-negative bacterial proteins and reflect the new development of Gneg-PLoc.

2.
One of the critical challenges in predicting protein subcellular localization is how to deal with proteins that have multiple location sites. Unfortunately, so far, no efforts have been made in this regard except for one that focused on proteins in budding yeast only. In most existing predictors, multiple-site proteins are either excluded from consideration or assumed not to exist. Actually, proteins may simultaneously exist at, or move between, two or more different subcellular locations. For instance, according to the Swiss-Prot database (version 50.7, released 19-Sept-2006), among the 33,925 eukaryotic protein entries that have experimentally observed subcellular location annotations, 2715 have multiple location sites; that is, about 8% bear the multiplex feature. Proteins with multiple locations or dynamic features of this kind are particularly interesting because they may have some very special biological functions intriguing to investigators in both basic research and drug discovery. Meanwhile, according to the same Swiss-Prot database, the total number of eukaryotic protein entries (except those annotated with "fragment" or those with fewer than 50 amino acids) is 90,909, leaving a gap of (90,909-33,925) = 56,984 entries for which no knowledge is available about their subcellular locations. Although one can use a computational approach to predict the missing information, so far, all the existing methods for predicting eukaryotic protein subcellular localization are limited to the case of a single location site. To overcome this barrier, a new ensemble classifier, named Euk-mPLoc, was developed that can also deal with the case of multiple location sites. Euk-mPLoc is freely accessible to the public as a Web server at http://202.120.37.186/bioinf/euk-multi. Meanwhile, to support people working in the relevant areas, Euk-mPLoc has been used to identify all eukaryotic protein entries in the Swiss-Prot database that do not have subcellular location annotations or are annotated as being uncertain. The large-scale results thus obtained have been deposited at the same Web site as a downloadable file prepared with Microsoft Excel and named "Tab_Euk-mPLoc.xls". Furthermore, to include new entries of eukaryotic proteins and reflect the continuous development of Euk-mPLoc in both coverage scope and prediction accuracy, we will update the downloadable file as well as the predictor in a timely manner, and keep users informed by publishing a short note in the Journal and making an announcement on the Web page.
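The two figures quoted in the abstract above (about 8% multiplex proteins and a gap of 56,984 entries) can be checked with a few lines of Python. This is only a sanity check of the stated arithmetic; the variable names are illustrative and not taken from the paper.

```python
# Counts quoted in the abstract (Swiss-Prot version 50.7).
annotated_entries = 33_925   # eukaryotic entries with experimentally observed locations
multi_site_entries = 2_715   # of those, entries with multiple location sites
total_entries = 90_909       # all eukaryotic entries after filtering

# Share of annotated entries that have more than one location site.
multiplex_fraction = multi_site_entries / annotated_entries

# Entries without any experimentally observed location annotation.
unannotated_entries = total_entries - annotated_entries

print(f"multiplex share: {multiplex_fraction:.1%}")
print(f"unannotated entries: {unannotated_entries}")
```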

3.
Prediction of protein subcellular location is a meaningful task that has attracted much attention in recent years. Many protein subcellular location predictors have been developed that can deal only with single-location proteins. However, some proteins may belong to two or even more subcellular locations. It is important to develop predictors that are able to deal with multiplex proteins, because these proteins have extremely useful implications in both basic biological research and drug discovery. Considering that the number of methods dealing with multiplex proteins is limited, it is meaningful to explore new methods that can predict the subcellular location of proteins with both single and multiple sites. Applying different feature extraction methods and different prediction models to different benchmark datasets may yield more general conclusions. In this paper, two different feature extraction methods and two different neural network models were applied to three benchmark datasets of different kinds of proteins, i.e., datasets constructed specially for Gram-positive bacterial proteins, plant proteins and virus proteins. These benchmark datasets have different numbers of location sites. The results show that the RBF neural network clearly outperforms the BP neural network on these datasets regardless of which feature extraction method is chosen.

4.
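Existing protein subcellular localization methods are designed for water-soluble proteins and are not suitable for transmembrane proteins, while dedicated transmembrane topology predictors are not designed for subcellular localization. This paper improves the model structure of the transmembrane topology predictor TMPHMMLoc and designs a new second-order hidden Markov model. Model parameters are estimated with the Baum-Welch algorithm generalized to the second-order model, and the models built for the individual subcellular locations are integrated into a single predictor. Test results on the dataset show that the method performs significantly better than the support vector machine and fuzzy k-nearest neighbor methods designed for soluble proteins, and also better than the hidden Markov model method proposed in TMPHMMLoc, making it an effective method for predicting the subcellular localization of transmembrane proteins.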

5.
One of the fundamental tasks in biology is to identify the functions of all proteins to reveal the primary machinery of a cell. Knowledge of the subcellular locations of proteins will provide key hints to reveal their functions and to understand the intricate pathways that regulate biological processes at the cellular level. Protein subcellular location prediction has been extensively studied in the past two decades. Many methods have been developed based on protein primary sequences as well as protein-protein interaction networks. In this paper, we propose to use the protein-protein interaction network as an infrastructure to integrate existing sequence-based predictors. When predicting the subcellular locations of a given protein, not only the protein itself but also all its interacting partners are considered. Unlike existing methods, our method requires neither comprehensive knowledge of the protein-protein interaction network nor experimentally annotated subcellular locations for most proteins in the network. Besides, our method can be used as a framework to integrate multiple predictors. Our method achieved an absolute-true rate of 56% on the human proteome, which is higher than that of the state-of-the-art methods.
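The absolute-true rate quoted above is a strict multi-label metric: a prediction counts as correct only when the predicted set of locations is identical to the true set. The sketch below shows that definition as it is commonly used; the function name and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Iterable, Sequence


def absolute_true_rate(true_labels: Sequence[Iterable[str]],
                       predicted_labels: Sequence[Iterable[str]]) -> float:
    """Fraction of proteins whose predicted location set equals the true set exactly."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("both sequences must have the same length")
    if not true_labels:
        raise ValueError("at least one protein is required")
    exact = sum(1 for t, p in zip(true_labels, predicted_labels) if set(t) == set(p))
    return exact / len(true_labels)


# Hypothetical toy data: four proteins, two of them predicted exactly right.
truth = [{"nucleus"}, {"cytoplasm", "nucleus"}, {"mitochondrion"}, {"plasma membrane"}]
preds = [{"nucleus"}, {"cytoplasm"}, {"mitochondrion"}, {"nucleus", "plasma membrane"}]
rate = absolute_true_rate(truth, preds)
print(rate)
```

A partially correct prediction (one of two locations found, or one extra location predicted) earns no credit under this metric, which is why absolute-true rates tend to be lower than per-label accuracies.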

6.
Predicting protein subcellular locations has attracted much attention in the past decade. However, one of the most challenging problems is that many proteins have been found to exist simultaneously in, or move between, two or more different cell components in a eukaryotic cell. Few previous predictors were able to deal with such multiplex proteins, although they have extremely important implications for future drug discovery in terms of their specific subcellular targeting. Approximately 20% of the human proteome consists of such multiplex proteins with multiple sample labels. In order to handle such multiplex human proteins efficiently, we have developed a novel multi-label (ML) learning and prediction framework called ML-PLoc, which decomposes the multi-label prediction problem into multiple independent binary classification problems. ML-PLoc is constructed based on the support vector machine (SVM) and sequential evolution information. Experimental results show that ML-PLoc can achieve an overall accuracy of 64.6% and a recall ratio of 67.2% on a benchmark dataset consisting of 14 human subcellular locations, and is very powerful for dealing with multiplex proteins. The current approach represents a new strategy for dealing with multi-label biological problems. ML-PLoc software is freely available for academic use at: http://www.csbio.sjtu.edu.cn/bioinf/ML-PLoc.
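Decomposing a multi-label problem into independent binary problems, as described above, can be sketched as follows. This is a minimal illustration and not the ML-PLoc implementation: a nearest-centroid rule stands in for the SVM used in the paper, and all function names, feature vectors and labels are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

Vector = Sequence[float]
Model = Tuple[List[float], List[float]]  # (positive centroid, negative centroid)


def _centroid(rows: List[Vector]) -> List[float]:
    return [sum(col) / len(rows) for col in zip(*rows)]


def _sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_binary_relevance(X: List[Vector], Y: List[Set[str]]) -> Dict[str, Model]:
    """Fit one independent yes/no model per label (stand-in for one SVM per label)."""
    models: Dict[str, Model] = {}
    for label in sorted(set().union(*Y)):
        pos = [x for x, y in zip(X, Y) if label in y]
        neg = [x for x, y in zip(X, Y) if label not in y]
        if pos and neg:  # a label needs both classes to be learnable
            models[label] = (_centroid(pos), _centroid(neg))
    return models


def predict_labels(models: Dict[str, Model], x: Vector) -> Set[str]:
    """Ask every binary model independently; the answer is the set of 'yes' labels."""
    return {label for label, (pos_c, neg_c) in models.items()
            if _sq_dist(x, pos_c) < _sq_dist(x, neg_c)}


# Hypothetical 2-D features; the last two proteins carry both labels.
X = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9), (1.0, 1.0), (0.9, 0.9)]
Y = [{"cytoplasm"}, {"cytoplasm"}, {"nucleus"}, {"nucleus"},
     {"cytoplasm", "nucleus"}, {"cytoplasm", "nucleus"}]
models = train_binary_relevance(X, Y)
print(predict_labels(models, (1.0, 1.0)))
```

Because each label is decided separately, a protein can receive zero, one, or several locations, which is what makes this decomposition suitable for multiplex proteins.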

7.

Background  

Knowledge of subcellular localization of proteins is crucial to proteomics, drug target discovery and systems biology since localization and biological function are highly correlated. In recent years, numerous computational prediction methods have been developed. Nevertheless, there is still a need for prediction methods that show more robustness and higher accuracy.

8.
MOTIVATION: The subcellular location of a protein is closely correlated with its function. Thus, computational prediction of subcellular locations from amino acid sequence information would help the annotation and functional prediction of protein-coding genes in complete genomes. We have developed a method based on support vector machines (SVMs). RESULTS: We considered 12 subcellular locations in eukaryotic cells: chloroplast, cytoplasm, cytoskeleton, endoplasmic reticulum, extracellular medium, Golgi apparatus, lysosome, mitochondrion, nucleus, peroxisome, plasma membrane, and vacuole. We constructed a data set of proteins with known locations from the SWISS-PROT database. A set of SVMs was trained to predict the subcellular location of a given protein based on its amino acid, amino acid pair, and gapped amino acid pair compositions. The predictors based on these different compositions were then combined using a voting scheme. Results obtained through 5-fold cross-validation tests showed an improvement in prediction accuracy over the algorithm based on the amino acid composition only. This prediction method is available via the Internet.
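The three sequence encodings named above (amino acid, amino acid pair, and gapped amino acid pair compositions) and a simple voting step can be sketched as below. This is an illustrative reading of those terms, not the authors' code: the function names, the example sequence and the plain majority vote are assumptions, and the abstract does not specify the exact voting rule.

```python
from collections import Counter
from itertools import product
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq: str) -> Dict[str, float]:
    """Relative frequency of each standard residue in the sequence."""
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return {aa: counts[aa] / len(seq) for aa in AMINO_ACIDS}


def gapped_pair_composition(seq: str, gap: int = 0) -> Dict[str, float]:
    """Frequency of residue pairs separated by `gap` positions (gap=0: adjacent pairs)."""
    step = gap + 1
    if len(seq) <= step:
        raise ValueError("sequence too short for this gap")
    pairs = Counter(seq[i] + seq[i + step] for i in range(len(seq) - step))
    total = sum(pairs.values())
    return {a + b: pairs[a + b] / total for a, b in product(AMINO_ACIDS, repeat=2)}


def majority_vote(predictions: List[str]) -> str:
    """Combine the outputs of several predictors; ties go to the first-seen label."""
    return Counter(predictions).most_common(1)[0][0]


seq = "MKKLLA"  # hypothetical toy sequence
aac = amino_acid_composition(seq)             # 20 features
pair = gapped_pair_composition(seq)           # 400 features, adjacent pairs
gapped = gapped_pair_composition(seq, gap=1)  # 400 features, one residue skipped
print(majority_vote(["nucleus", "cytoplasm", "nucleus"]))
```

In a setup like the one described, each feature dictionary would feed its own classifier, and the per-classifier predictions would then be passed to the voting step.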

9.
Information on the subcellular localization of proteins is crucially important for revealing their biological functions in a cell, the basic unit of life. With the avalanche of protein sequences generated in the postgenomic age, it is highly desirable to develop computational tools for the timely identification of their subcellular locations based on sequence information alone. The current study is focused on Gram-negative bacterial proteins. Although considerable efforts have been made in protein subcellular prediction, the problem is far from being solved. This is because mounting evidence indicates that many Gram-negative bacterial proteins exist in two or more location sites. Unfortunately, most existing methods can deal with single-location proteins only. Actually, proteins with multiple locations may have some special biological functions important for both basic research and drug design. In this study, by using multi-label theory, we developed a new predictor called “pLoc-mGneg” for predicting the subcellular localization of Gram-negative bacterial proteins with both single and multiple locations. Rigorous cross-validation on a high-quality benchmark dataset indicated that the proposed predictor is remarkably superior to “iLoc-Gneg”, the state-of-the-art predictor for the same purpose. For the convenience of most experimental scientists, a user-friendly web-server for the novel predictor has been established at http://www.jci-bioinfo.cn/pLoc-mGneg/, by which users can easily get their desired results without the need to go through the complicated mathematics involved.

10.
MOTIVATION: Identifying the destination or localization of proteins is key to understanding their function and facilitating their purification. A number of existing computational prediction methods are based on sequence analysis. However, these methods are limited in scope, accuracy and most particularly breadth of coverage. Rather than using sequence information alone, we have explored the use of database text annotations from homologs and machine learning to substantially improve the prediction of subcellular location. RESULTS: We have constructed five machine-learning classifiers for predicting subcellular localization of proteins from animals, plants, fungi, Gram-negative bacteria and Gram-positive bacteria, which are 81% accurate for fungi and 92-94% accurate for the other four categories. These are the most accurate subcellular predictors across the widest set of organisms ever published. Our predictors are part of the Proteome Analyst web-service.

11.
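Using 500 proteins from tea (Camellia sinensis (L.) O. Ktze.) leaves as the dataset, the reliability and sensitivity of four programs for predicting subcellular localization, TargetP, WoLF PSORT, LocTree and Plant-mPLoc, were compared. The results show that the prediction reliability of all four programs exceeded 80%, in the order TargetP > LocTree > WoLF PSORT > Plant-mPLoc. LocTree had the highest sensitivity for cytoplasmic and secreted proteins but the lowest for chloroplast proteins; Plant-mPLoc was the most sensitive for nuclear proteins but the least sensitive for cytoplasmic proteins; TargetP was the most sensitive for chloroplast proteins but can distinguish only three subcellular compartments; WoLF PSORT had the lowest sensitivity for secreted proteins but was relatively sensitive for all other proteins. Based on these results, the study proposes reasonable recommendations for the use of the four programs.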

12.
The study of rat proteins is an indispensable task in experimental medicine and drug development. The function of a rat protein is closely related to its subcellular location. On this basis, we constructed a benchmark dataset of rat proteins and developed a combined approach for predicting the subcellular localization of rat proteins. From the protein primary sequence, multiple sequential features are obtained by using discrete Fourier analysis, a position conservation scoring function and the increment of diversity, and these sequential features are selected as input parameters of the support vector machine. In the jackknife test, the overall success rate of prediction is 95.6% on the rat protein dataset. When our method is applied to the apoptosis protein dataset and the Gram-negative bacterial protein dataset with the jackknife test, the overall success rates are 89.9% and 96.4%, respectively. These results indicate that our proposed method is quite promising and may play a complementary role to the existing predictors in this area.
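The jackknife test mentioned above is leave-one-out cross-validation: each protein is held out once, the predictor is trained on all the others, and the overall success rate is the share of held-out proteins predicted correctly. The sketch below shows the procedure only. A 1-nearest-neighbour rule stands in for the support vector machine used in the paper, and the function names and toy data are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, location label)


def nearest_neighbor_predict(train: List[Sample], x: Sequence[float]) -> str:
    """Stand-in classifier: label of the closest training sample (squared Euclidean)."""
    return min(train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x)))[1]


def jackknife_success_rate(data: List[Sample],
                           predict: Callable[[List[Sample], Sequence[float]], str]) -> float:
    """Hold out each sample once, train on the rest, and count correct predictions."""
    correct = 0
    for i, (x, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]  # every sample except the held-out one
        if predict(rest, x) == label:
            correct += 1
    return correct / len(data)


# Hypothetical 2-D features; the last sample lies closer to the other class.
data = [((0.0, 0.0), "cytoplasm"), ((0.1, 0.0), "cytoplasm"),
        ((1.0, 1.0), "nucleus"), ((0.9, 1.0), "nucleus"),
        ((0.5, 0.4), "nucleus")]
rate = jackknife_success_rate(data, nearest_neighbor_predict)
print(rate)
```

Unlike k-fold cross-validation with random splits, the jackknife result for a given dataset and predictor is unique, which is one reason it is widely reported in this literature.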

13.
The subcellular localization of a protein can provide important information about its function within the cell. As eukaryotic cells and particularly mammalian cells are characterized by a high degree of compartmentalization, most protein activities can be assigned to particular cellular compartments. The categorization of proteins by their subcellular localization is therefore one of the essential goals of the functional annotation of the human genome. We previously performed a subcellular localization screen of 52 proteins encoded on human chromosome 21. In the current study, we compared the experimental localization data to the in silico results generated by nine leading software packages with different prediction resolutions. The comparison revealed striking differences between the programs in the accuracy of their subcellular protein localization predictions. Our results strongly suggest that the recently developed predictors utilizing multiple prediction methods tend to provide significantly better performance than purely sequence-based or homology-based predictions.

14.
Knowledge of protein subcellular localization is vitally important for both basic research and drug development. With the avalanche of protein sequences emerging in the post-genomic age, it is highly desirable to develop computational tools for the timely and effective identification of their subcellular localization based on sequence information alone. Recently, a predictor called “pLoc-mGpos” was developed for identifying the subcellular localization of Gram-positive bacterial proteins. Its performance is overwhelmingly better than that of the other predictors for the same purpose, particularly in dealing with multi-label systems in which some proteins, called “multiplex proteins”, may simultaneously occur in two or more subcellular locations. Although it is indeed a very powerful predictor, more efforts are definitely needed to further improve it. This is because pLoc-mGpos was trained on an extremely skewed dataset in which some subsets (subcellular locations) were over 11 times the size of the others. Accordingly, it cannot avoid the bias caused by such an uneven training dataset. To alleviate this bias, we have developed a new, bias-reducing predictor called pLoc_bal-mGpos by quasi-balancing the training dataset. Rigorous target jackknife tests on exactly the same experiment-confirmed dataset have indicated that the proposed new predictor is remarkably superior to pLoc-mGpos, the existing state-of-the-art predictor for identifying the subcellular localization of Gram-positive bacterial proteins. To maximize the convenience for most experimental scientists, a user-friendly web-server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc_bal-mGpos/, by which users can easily get their desired results without the need to go through the detailed mathematics.

15.
The field of bacterial cell biology has been revolutionized in the last decade by improvements in imaging capabilities which have revealed that bacterial cells, previously thought to be non-compartmentalized, possess an intricate higher-order organization. Many bacterial proteins localize to specific subcellular domains and regulate the spatial deployment of other proteins, DNA and lipids. Recently, the surprising discovery was made that bacterial RNA molecules are also specifically localized. However, the mechanisms that underlie bacterial cell architecture are just starting to be unraveled. The limited number of distribution patterns observed thus far for bacterial proteins and RNAs, and the similarity between the patterns exhibited by these macromolecules, suggest that the processes that underlie their localization are inextricably linked. We discuss these spatial arrangements and the insights that they provide on processes, such as localized translation, protein complex formation, and crosstalk between bacterial machineries.

16.
Many proteins bear multi-locational characteristics, and this phenomenon is closely related to biological function. However, most of the existing methods can only deal with single-location proteins. Therefore, an automatic and reliable ensemble classifier for protein subcellular multi-localization is needed. We propose a new ensemble classifier combining the KNN (K-nearest neighbour) and SVM (support vector machine) algorithms to predict the subcellular localization of eukaryotic, Gram-negative bacterial and viral proteins based on the general form of Chou's pseudo amino acid composition, i.e., GO (gene ontology) annotations, dipeptide composition and AmPseAAC (Amphiphilic pseudo amino acid composition). This ensemble classifier was developed by fusing many basic individual classifiers through a voting system. The overall prediction accuracies obtained by the KNN-SVM ensemble classifier are 95.22, 93.47 and 80.72% for the eukaryotic, Gram-negative bacterial and viral proteins, respectively. Our prediction accuracies are significantly higher than those of previous methods and reveal that our strategy better predicts the subcellular locations of multi-location proteins.

17.
Predicting subcellular localization with AdaBoost Learner
Protein subcellular localization, which tells where a protein resides in a cell, is an important characteristic of a protein and relates closely to its function. The prediction of subcellular localization plays an important role in the prediction of protein function, genome annotation and drug design. Therefore, predicting subcellular localization with a bioinformatics approach is an important and challenging task. In this paper, a robust predictor, AdaBoost Learner, is introduced to predict protein subcellular localization based on amino acid composition. Jackknife cross-validation and an independent dataset test were used to demonstrate that AdaBoost is a robust and efficient model for predicting protein subcellular localization. The correct prediction rates were 74.98% and 80.12% for the jackknife test and the independent dataset test, respectively, which are higher than those of other existing predictors. An online server for predicting the subcellular localization of proteins based on the AdaBoost classifier is available at http://chemdata.shu.edu.cn/sl12.

18.
Neural networks have been trained to predict the subcellular location of proteins in prokaryotic or eukaryotic cells from their amino acid composition. For three possible subcellular locations in prokaryotic organisms a prediction accuracy of 81% can be achieved. Assigning a reliability index, 33% of the predictions can be made with an accuracy of 91%. For eukaryotic proteins (excluding plant sequences) an overall prediction accuracy of 66% for four locations was achieved, with 33% of the sequences being predicted with an accuracy of 82% or better. With the subcellular location restricting a protein's possible function, this method should be a useful tool for the systematic analysis of genome data and is available via a server on the world wide web.

19.
Methods for predicting bacterial protein subcellular localization
The computational prediction of the subcellular localization of bacterial proteins is an important step in genome annotation and in the search for novel vaccine or drug targets. Since the 1991 release of PSORT I, the first comprehensive algorithm to predict bacterial protein localization, many other localization prediction tools have been developed. These methods offer significant improvements in predictive performance over PSORT I, and the accuracy of some methods now rivals that of certain high-throughput laboratory methods for protein localization identification.

20.
Although numerous efforts have been made to predict the subcellular locations of proteins based on their sequence information, it remains a challenging problem, particularly when query proteins may have the multiplex character, i.e., they simultaneously exist at, or move between, two or more different subcellular location sites. Most of the existing methods were established on the assumption that a protein has one, and only one, subcellular location. Actually, recent evidence has indicated an increasing number of human proteins having multiple subcellular locations. Multiplex proteins of this kind should not be ignored because they may bear some special biological functions worthy of our attention. Based on the accumulation-label scale, a new predictor, called iLoc-Hum, was developed for identifying the subcellular localization of human proteins with both single and multiple location sites. As a demonstration, jackknife cross-validation was performed with iLoc-Hum on a benchmark dataset of human proteins that covers the following 14 location sites: centrosome, cytoplasm, cytoskeleton, endoplasmic reticulum, endosome, extracellular, Golgi apparatus, lysosome, microsome, mitochondrion, nucleus, peroxisome, plasma membrane, and synapse, where some proteins belong to two, three or four locations but none has 25% or higher pairwise sequence identity to any other in the same subset. For such a complicated and stringent system, the overall success rate achieved by iLoc-Hum was 76%, which is remarkably higher than that of any of the existing predictors that also have the capacity to deal with this kind of system. Further comparisons were also made via two independent datasets; all indicated that the success rates of iLoc-Hum were significantly higher than those of its counterparts. As a user-friendly web-server, iLoc-Hum is freely accessible to the public at or . For the convenience of most experimental scientists, a step-by-step guide is provided on how to use the web-server to get the desired results by choosing either a straightforward submission or a batch submission, without the need to follow the complicated mathematical equations involved.
