Similar literature
 20 similar articles found (search time: 890 ms)
1.
Automated prediction of bacterial protein subcellular localization is an important tool for genome annotation and drug discovery. PSORT has been one of the most widely used computational methods for such bacterial protein analysis; however, it has not been updated since it was introduced in 1991. In addition, neither PSORT nor any of the other computational methods available make predictions for all five of the localization sites characteristic of Gram-negative bacteria. Here we present PSORT-B, an updated version of PSORT for Gram-negative bacteria, which is available as a web-based application at http://www.psort.org. PSORT-B examines a given protein sequence for amino acid composition, similarity to proteins of known localization, presence of a signal peptide, transmembrane alpha-helices and motifs corresponding to specific localizations. A probabilistic method integrates these analyses, returning a list of five possible localization sites with associated probability scores. PSORT-B, designed to favor high precision (specificity) over high recall (sensitivity), attained an overall precision of 97% and recall of 75% in 5-fold cross-validation tests, using a dataset we developed of 1443 proteins of experimentally known localization. This dataset, the largest of its kind, is freely available, along with the PSORT-B source code (under GNU General Public License).
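To make the precision/recall trade-off in the abstract above concrete, here is a minimal Python sketch. It assumes a predictor that returns one score per localization site and reports a site only when the top score clears a cutoff; the function names, the score scale and the cutoff are hypothetical and are not taken from the PSORT-B source code.

```python
from typing import Dict, List, Optional, Tuple


def call_site(scores: Dict[str, float], cutoff: float) -> Optional[str]:
    """Return the top-scoring site, or None (no call) if it is below the cutoff."""
    site, best = max(scores.items(), key=lambda item: item[1])
    return site if best >= cutoff else None


def precision_recall(calls: List[Optional[str]], truths: List[str]) -> Tuple[float, float]:
    """Precision counts only proteins that received a call; recall counts all proteins."""
    made = [(c, t) for c, t in zip(calls, truths) if c is not None]
    correct = sum(1 for c, t in made if c == t)
    precision = correct / len(made) if made else 0.0
    recall = correct / len(truths) if truths else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical scores for one protein across five Gram-negative sites.
    example = {
        "cytoplasm": 9.1,
        "inner membrane": 0.2,
        "periplasm": 0.3,
        "outer membrane": 0.2,
        "extracellular": 0.2,
    }
    print(call_site(example, cutoff=7.5))
```

Raising the cutoff makes the predictor abstain more often, which tends to raise precision and lower recall; that is the direction of the trade-off the abstract describes.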

2.
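Using 500 leaf proteins from tea (Camellia sinensis (L.) O. Ktze.) as a dataset, this study compared the reliability and sensitivity of four programs, TargetP, WoLF PSORT, LocTree and Plant-mPLoc, for predicting subcellular localization. The prediction reliability of all four programs exceeded 80%, ranking TargetP > LocTree > WoLF PSORT > Plant-mPLoc. LocTree was the most sensitive for cytoplasmic and secreted proteins but the least sensitive for chloroplast proteins; Plant-mPLoc was the most sensitive for nuclear proteins but the least sensitive for cytoplasmic proteins; TargetP was the most sensitive for chloroplast proteins but could distinguish only three subcellular compartments; WoLF PSORT was the least sensitive for secreted proteins but relatively sensitive for all other proteins. Based on these results, the study offers recommendations for using the four programs.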

3.
Secreted protein prediction system combining CJ-SPHMM, TMHMM, and PSORT   Total citations: 4 (self-citations: 0; citations by others: 4)
To increase the coverage of secreted protein prediction, we describe a combination strategy. Instead of using a single method, we combine the Hidden Markov Model (HMM)-based methods CJ-SPHMM and TMHMM with PSORT in secreted protein prediction. CJ-SPHMM is an HMM-based signal peptide prediction method, while TMHMM is an HMM-based transmembrane (TM) protein prediction algorithm. With CJ-SPHMM and TMHMM, proteins with a predicted signal peptide and without predicted TM regions are taken as putative secreted proteins. This HMM-based approach predicts secreted proteins with Ac (Accuracy) at 0.82 and Cc (Correlation coefficient) at 0.75, which are similar to PSORT with Ac at 0.82 and Cc at 0.76. When we further complement the HMM-based method, i.e., CJ-SPHMM + TMHMM, with PSORT in secreted protein prediction, the Ac value is increased to 0.86 and the Cc value is increased to 0.81. Taking this combination strategy to search for putative secreted proteins in the International Protein Index (IPI) maintained at the European Bioinformatics Institute (EBI), we constructed a putative human secretome with 5235 proteins. The prediction system described here can also be applied to predicting secreted proteins from other vertebrate proteomes. Availability: The CJ-SPHMM and predicted secreted proteins are available at: ftp://ftp.cbi.pku.edu.cn/pub/secreted-protein/
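The decision rule in the abstract above (a predicted signal peptide and no predicted TM region) can be sketched in Python as follows. The `Calls` record and both function names are hypothetical stand-ins for the outputs of CJ-SPHMM, TMHMM and PSORT. The abstract does not say how the PSORT call is merged with the HMM-based call, so the union used here is an assumption chosen because the stated goal is higher coverage.

```python
from dataclasses import dataclass


@dataclass
class Calls:
    """Per-protein boolean calls; hypothetical stand-ins for real tool output."""
    has_signal_peptide: bool  # e.g. from a signal peptide predictor such as CJ-SPHMM
    has_tm_region: bool       # e.g. from a TM predictor such as TMHMM
    psort_secreted: bool      # e.g. PSORT assigns the protein to a secreted class


def hmm_secreted(calls: Calls) -> bool:
    """Rule stated in the abstract: signal peptide present and no TM region."""
    return calls.has_signal_peptide and not calls.has_tm_region


def combined_secreted(calls: Calls) -> bool:
    """ASSUMPTION: combine by union, so either route marks the protein as secreted."""
    return hmm_secreted(calls) or calls.psort_secreted


if __name__ == "__main__":
    protein = Calls(has_signal_peptide=True, has_tm_region=False, psort_secreted=False)
    print(combined_secreted(protein))
```

A union can only add putative secreted proteins relative to either method alone, so it trades some specificity for coverage; an intersection would do the opposite.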

4.
MOTIVATION: There is a scarcity of efficient computational methods for predicting protein subcellular localization in eukaryotes. Currently available methods have several limitations and are inadequate for genome-scale predictions. Here, we present a new prediction method, pTARGET, that can predict proteins targeted to nine different subcellular locations in eukaryotic animal species. RESULTS: The nine subcellular locations predicted by pTARGET include cytoplasm, endoplasmic reticulum, extracellular/secretory, Golgi, lysosomes, mitochondria, nucleus, plasma membrane and peroxisomes. Predictions are based on the location-specific protein functional domains and the amino acid compositional differences across different subcellular locations. Overall, this method can predict 68-87% of the true positives at accuracy rates of 96-99%. Comparison of the prediction performance against PSORT showed that pTARGET prediction rates are higher by 11-60% in 6 of the 8 locations tested. Moreover, the pTARGET method is robust enough for genome-scale prediction of protein subcellular localizations since it does not rely on the presence of signal or target peptides. AVAILABILITY: A public web server based on the pTARGET method is accessible at the URL http://bioinformatics.albany.edu/~ptarget. Datasets used for developing pTARGET can be downloaded from this web server. Source code will be available on request from the corresponding author.

5.
Meta-prediction seeks to harness the combined strengths of multiple predicting programs with the hope of achieving predicting performance surpassing that of all existing predictors in a defined problem domain. We investigated meta-prediction for the four-compartment eukaryotic subcellular localization problem. We compiled an unbiased subcellular localization dataset of 1693 nuclear, cytoplasmic, mitochondrial and extracellular animal proteins from Swiss-Prot 50.2. Using this dataset, we assessed the predicting performance of 12 predictors from eight independent subcellular localization predicting programs: ELSPred, LOCtree, PLOC, Proteome Analyst, PSORT, PSORT II, SubLoc and WoLF PSORT. Gorodkin correlation coefficient (GCC) was one of the performance measures. Proteome Analyst is the best individual subcellular localization predictor tested in this four-compartment prediction problem, with GCC = 0.811. A reduced voting strategy eliminating six of the 12 predictors yields a meta-predictor (RAW-RAG-6) with GCC = 0.856, substantially better than all tested individual subcellular localization predictors (P = 8.2 × 10⁻⁶, Fisher's Z-transformation test). The improvement in performance persists when the meta-predictor is tested with data not used in its development. This and similar voting strategies, when properly applied, are expected to produce meta-predictors with outstanding performance in other life sciences problem domains.
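As a rough illustration of a reduced voting strategy like the one in the abstract above, the Python sketch below takes a plurality vote over a chosen subset of predictors. It is not the RAW-RAG-6 procedure, whose details the abstract does not give. The function name, the predictor labels and the tie-break rule (the first kept predictor wins) are all assumptions.

```python
from collections import Counter
from typing import Dict, Sequence


def vote(predictions: Dict[str, str], keep: Sequence[str]) -> str:
    """Plurality vote over the predictors listed in `keep`.

    `predictions` maps a predictor name to the compartment it predicts for
    one protein. Ties go to the label that appears first in `keep` order
    (an assumed tie-break, not taken from the paper).
    """
    kept = [predictions[name] for name in keep if name in predictions]
    if not kept:
        raise ValueError("none of the kept predictors produced a prediction")
    counts = Counter(kept)
    top = max(counts.values())
    for label in kept:
        if counts[label] == top:
            return label
    raise AssertionError("unreachable: some label always reaches the top count")


if __name__ == "__main__":
    # Made-up outputs from four hypothetical predictors for a single protein.
    outputs = {
        "predictor_a": "nuclear",
        "predictor_b": "cytoplasmic",
        "predictor_c": "nuclear",
        "predictor_d": "extracellular",
    }
    print(vote(outputs, keep=["predictor_a", "predictor_b", "predictor_c"]))
```

Dropping predictors from `keep` is the "reduced" part: which ones to drop would be chosen on development data and then checked on held-out data, as the abstract reports doing.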

6.
Newly synthesized proteins in eukaryotic cells can only function well after they are accurately transported to specific organelles. The establishment of protein databases and the development of programs have accelerated the study of protein subcellular locations, but comparisons and evaluations of the prediction accuracy of these programs in plants are lacking. In this study, we built a random test set of maize proteins to evaluate the accuracy of six commonly used subcellular location programs: iLoc-Plant, Plant-mPLoc, CELLO, WoLF PSORT, SherLoc2, and Predotar. Our results showed that prediction accuracy varied greatly depending on the programs and subcellular locations involved. The programs using homology search methods (iLoc-Plant and Plant-mPLoc) performed better than those using feature search methods (CELLO, WoLF PSORT, SherLoc2, and Predotar). In particular, iLoc-Plant achieved an 84.9% accuracy for proteins whose subcellular locations have been experimentally determined and a 74.3% accuracy for all of the proteins in the test set. Regarding locations, the highest prediction accuracies were obtained for the nucleus, followed by the cytoplasm, mitochondria, plastids, endoplasmic reticulum, and vacuoles, while the lowest were obtained for cell membrane, secreted, and multiple-location proteins. We discussed the accuracy of the six programs in this article. This study will assist plant biologists in choosing appropriate programs to predict the location of proteins and provide clues regarding their function, especially for hypothetical or novel proteins.

7.
Here we report identification of a 2269-base-pair full-length cDNA, CYP97E1, encoding a novel cytochrome P450 protein from the marine diatom Skeletonema costatum. The CYP97E1 protein contains 659 amino acids (Mr 74,200) and is the largest P450 isoform described to date. Our BLAST homology search and parsimony analysis showed that CYP97E1 shared high sequence identity (>40%) and genetic relatedness, respectively, with the CYP97B isoforms from different plant species. CYP97E1 was predicted by PSORT (a protein localization site prediction program) to be a cytosolic protein. Northern hybridization analysis indicated that CYP97E1 expression in S. costatum was not significantly affected by 2,4-dichlorophenol, suggesting that CYP97E1 may not be involved in 2,4-dichlorophenol detoxification in this diatom.

8.
The attainment of a complete map-based sequence for rice (Oryza sativa) is clearly a major milestone for the research community. Identifying the localization of encoded proteins is the key to understanding their functional characteristics and facilitating their purification. Our proposed method, RSLpred, is an effort in this direction for genome-scale subcellular prediction of encoded rice proteins. First, support vector machine (SVM)-based modules were developed using traditional amino acid, dipeptide (i+1) and four-part amino acid composition, achieving overall accuracies of 81.43, 80.88 and 81.10%, respectively. Second, a similarity search-based module was developed using the position-specific iterated basic local alignment search tool (PSI-BLAST) and achieved 68.35% accuracy. Another module, developed using evolutionary information of a protein sequence extracted from the position-specific scoring matrix, achieved an accuracy of 87.10%. In this study, a large number of modules were developed using various encoding schemes such as higher-order dipeptide composition, N- and C-terminal composition, split amino acid composition and hybrid information. To benchmark RSLpred, it was tested on an independent set of rice proteins, where it outperformed widely used prediction methods such as TargetP, Wolf-PSORT, PA-SUB, Plant-Ploc and ESLpred. To assist the plant research community, an online web tool, 'RSLpred', has been developed for subcellular prediction of query rice proteins; it is freely accessible at http://www.imtech.res.in/raghava/rslpred.
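The amino acid and dipeptide (i+1) composition features named in the abstract above are commonly defined as residue frequencies (20 values) and adjacent-pair frequencies (400 values). The Python sketch below follows those common definitions; it is not the RSLpred code. The function names are hypothetical, and dropping non-standard residues before pairing is a simplifying assumption.

```python
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def aa_composition(seq: str) -> List[float]:
    """Fraction of each standard residue, in AMINO_ACIDS order (20 features)."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / len(residues) for a in AMINO_ACIDS]


def dipeptide_composition(seq: str) -> List[float]:
    """Fraction of each adjacent residue pair (i, i+1), in DIPEPTIDES order (400 features).

    Simplification: non-standard residues are dropped first, so residues on
    either side of a dropped one are treated as adjacent.
    """
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    pairs = [a + b for a, b in zip(residues, residues[1:])]
    if not pairs:
        return [0.0] * len(DIPEPTIDES)
    return [pairs.count(d) / len(pairs) for d in DIPEPTIDES]


if __name__ == "__main__":
    features = aa_composition("MKTAYIAKQR") + dipeptide_composition("MKTAYIAKQR")
    print(len(features))  # 420 features per sequence
```

Fixed-length vectors like these are what allow sequences of different lengths to be fed to a classifier such as an SVM.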

9.
10.
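A strategy combining bioinformatic prediction with experimental validation was used to screen for and identify novel human secreted protein genes. The UniProt protein database was analyzed with programs including SignalP, SOSUI, PSORT and BLAST, and 14 genes of unknown function were selected for experimental validation. The full-length coding sequences of the 14 genes were cloned by RT-PCR and inserted into the eukaryotic expression vector pcDNA3.1(-)/Myc-His. Western blotting and immunofluorescence analysis detected expression of 7 of the genes. One of these was expressed in the nucleus, while the other 6 were expressed only in the cytoplasm; the expression products of 4 of the genes were detectable in the cell culture medium, identifying them as 4 novel secreted protein genes.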

11.
Subcellular localization of messenger RNAs (mRNAs), as a prevalent mechanism, gives precise and efficient control of the translation process. There is mounting evidence for the important roles of this process in a variety of cellular events. Computational methods for mRNA subcellular localization prediction provide a useful approach for studying mRNA functions. However, few computational methods have been designed for mRNA subcellular localization prediction, and their performance has room for improvement. In particular, no tool is yet available for predicting the localization of mRNAs that have multiple localization annotations. In this paper, we propose a multi-head self-attention method, DM3Loc, for multi-label mRNA subcellular localization prediction. Evaluation results show that DM3Loc outperforms existing methods and tools in general. Furthermore, DM3Loc has the interpretation ability to analyze RNA-binding protein motifs and key signals on mRNAs for subcellular localization. Our analyses found hundreds of instances of mRNA isoform-specific subcellular localizations and many significantly enriched gene functions for mRNAs in different subcellular localizations.

12.
Many proteins bear multi-locational characteristics, and this phenomenon is closely related to biological function. However, most of the existing methods can only deal with single-location proteins. Therefore, an automatic and reliable ensemble classifier for protein subcellular multi-localization is needed. We propose a new ensemble classifier combining the KNN (K-nearest neighbour) and SVM (support vector machine) algorithms to predict the subcellular localization of eukaryotic, Gram-negative bacterial and viral proteins based on the general form of Chou's pseudo amino acid composition, i.e., GO (gene ontology) annotations, dipeptide composition and AmPseAAC (Amphiphilic pseudo amino acid composition). This ensemble classifier was developed by fusing many basic individual classifiers through a voting system. The overall prediction accuracies obtained by the KNN-SVM ensemble classifier are 95.22, 93.47 and 80.72% for the eukaryotic, Gram-negative bacterial and viral proteins, respectively. Our prediction accuracies are significantly higher than those of previous methods and show that our strategy better predicts the subcellular locations of multi-location proteins.

13.
MOTIVATION: Currently available methods for the prediction of subcellular location of mitochondrial proteins rely largely on the presence of mitochondrial targeting signals in the protein sequences. However, a large fraction of mitochondrial proteins lack such signals, making those tools ineffective for genome-scale prediction of mitochondria-targeted proteins. Here, we propose a method for genome-scale prediction of nucleus-encoded mitochondrial proteins. The new method, MITOPRED, is based on the Pfam domain occurrence patterns and the amino acid compositional differences between mitochondrial and non-mitochondrial proteins. RESULTS: MITOPRED could predict mitochondrial proteins with 100% specificity at a 44% sensitivity rate and with 67% specificity at 99% sensitivity. Additionally, it was sufficiently robust to predict mitochondrial proteins across different eukaryotic species with similar accuracy. Based on Matthews correlation coefficient measure, the prediction performance of MITOPRED is clearly superior (0.73) to those of the two popular methods TargetP (0.51) and PSORT (0.53). Using this method, we predicted the nucleus-encoded mitochondrial proteins from six complete genomes (three invertebrate, two vertebrate and one plant species) and estimated the total number in each genome. In human, our method estimated the existence of 1362 mitochondrial proteins corresponding to 4.8% of the total proteome. AVAILABILITY: MITOPRED program is freely accessible at http://mitopred.sdsc.edu. Source code is available on request from the authors. SUPPLEMENTARY INFORMATION: Training data sets are also available at http://mitopred.sdsc.edu
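The Matthews correlation coefficient cited in the abstract above is computed from a binary confusion matrix. The Python sketch below uses the standard formulas for MCC, sensitivity and specificity. The abstract does not spell out how it defines "specificity", so the textbook definition used here is an assumption, and the counts in the example are invented.

```python
import math


def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tn: int, fp: int) -> float:
    """True negative rate: TN / (TN + FP). Textbook definition (assumed here)."""
    return tn / (tn + fp) if (tn + fp) else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


if __name__ == "__main__":
    # Invented counts: 45 TP, 45 TN, 5 FP, 5 FN.
    print(sensitivity(45, 5), specificity(45, 5), mcc(45, 45, 5, 5))
```

MCC ranges from -1 to 1 and stays informative when the two classes differ greatly in size, which is the usual situation when mitochondrial proteins are a small fraction of a proteome.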

14.
Survivin is an inhibitor of apoptosis protein (IAP) that is markedly overexpressed in most cancers. We identified two novel functionally divergent splice variants, i.e. non-antiapoptotic survivin-2B and antiapoptotic survivin-deltaEx3. Because survivin-2B might be a naturally occurring antagonist of antiapoptotic survivin variants, we analyzed the subcellular distribution of these proteins. PSORT II analysis predicted a preferential cytoplasmic localization of survivin and survivin-2B, but a preferential nuclear localization of survivin-deltaEx3. GFP-tagged survivin variants confirmed the predicted subcellular localization and additionally revealed a cell cycle-dependent nuclear accumulation of survivin-deltaEx3. Moreover, a bipartite nuclear localization signal found exclusively in survivin-deltaEx3 may support cytoplasmic clearance of survivin-deltaEx3. In contrast to the known association between survivin and microtubules or centromeres during mitosis, no corresponding co-localization became evident for survivin-deltaEx3 or survivin-2B. In conclusion, our study provided data on a differential subcellular localization of functionally divergent survivin variants, suggesting that survivin isoforms may perform different functions in distinct subcellular compartments and distinct phases of the cell cycle.

15.
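Predicting protein subcellular localization using increment of diversity combined with a support vector machine   Total citations: 3 (self-citations: 0; citations by others: 3)
Zhao Yu, Zhao Judong, Yao Long. 生物信息学 (Chinese Journal of Bioinformatics) 2010, 8(3): 237-239, 244
Functional annotation of unknown proteins is a major goal of proteomics. One key annotation is the prediction of protein subcellular localization. This paper applies a method combining increment of diversity with a support vector machine (ID_SVM) to predict five classes of subcellular localization sites for Gram-positive bacterial proteins. In an independent test, the overall prediction success rate was 89.66%. The results show that the ID_SVM algorithm considerably improves the prediction success rate.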
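The "ID" in ID_SVM refers to the increment of diversity (离散增量). A minimal Python sketch of the commonly used definition follows: for a count vector with entries n_i and total N, the diversity is D = N log N - sum(n_i log n_i), and the increment of diversity between two sources X and Y is ID(X, Y) = D(X + Y) - D(X) - D(Y). The abstract does not state which counts are used or how the ID values are passed to the SVM, so the function names and the example counts are assumptions.

```python
import math
from typing import Sequence


def diversity(counts: Sequence[float]) -> float:
    """D = N log N - sum(n_i log n_i), with 0 log 0 taken as 0."""
    total = sum(counts)
    if total <= 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in counts if n > 0)


def increment_of_diversity(x: Sequence[float], y: Sequence[float]) -> float:
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); small when the two count profiles are similar."""
    if len(x) != len(y):
        raise ValueError("count vectors must have the same length")
    combined = [a + b for a, b in zip(x, y)]
    return diversity(combined) - diversity(x) - diversity(y)


if __name__ == "__main__":
    # Invented residue-count profiles: a query protein and one class's training set.
    query = [3, 1, 0, 2]
    class_profile = [30, 12, 1, 18]
    print(increment_of_diversity(query, class_profile))
```

Under this definition a query protein is closest to the class whose profile gives the smallest ID. One plausible design, assumed here rather than taken from the paper, is to compute one ID value per class and feed that short vector to the SVM.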

16.

Background  

Gene Ontology (GO) annotation, which describes the function of genes and gene products across species, has recently been used to predict protein subcellular and subnuclear localization. Existing GO-based prediction methods for protein subcellular localization use the known accession numbers of query proteins to obtain their annotated GO terms. An accurate prediction method for predicting subcellular localization of novel proteins without known accession numbers, using only the input sequence, is worth developing.

17.
Lee K, Kim DW, Na D, Lee KH, Lee D. Nucleic Acids Research 2006, 34(17): 4655-4666
Subcellular localization is one of the key functional characteristics of proteins. An automatic and efficient prediction method for protein subcellular localization is highly desirable owing to the need for large-scale genome analysis. From a machine learning point of view, a dataset of protein localization has several characteristics: the dataset has too many classes (there are more than 10 localizations in a cell), it is a multi-label dataset (a protein may occur in several different subcellular locations), and it is too imbalanced (the number of proteins in each localization is remarkably different). Even though much previous work has been done on the prediction of protein subcellular localization, none of it tackles these characteristics effectively at the same time. Thus, a new computational method for protein localization is needed for more reliable outcomes. To address the issue, we present a protein localization predictor based on D-SVDD (PLPD) for the prediction of protein localization, which can find the likelihood of a specific localization of a protein more easily and more correctly. Moreover, we introduce three measurements for the more precise evaluation of a protein localization predictor. Based on results for various datasets derived from the experiments of Huh et al. (2003), the proposed PLPD method represents a different approach that might play a complementary role to the existing methods, such as the Nearest Neighbor method and the discriminate covariant method. Finally, after finding a good boundary for each localization using the 5184 classified proteins as training data, we predicted 138 proteins whose subcellular localizations could not be clearly observed by the experiments of Huh et al. (2003).

18.
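Preliminary analysis of secreted proteins in the Neurospora crassa genome   Total citations: 4 (self-citations: 0; citations by others: 4)
This article reports a predictive analysis of the 10,082 amino acid sequences published in the Neurospora crassa whole-genome database, using the signal peptide prediction programs SignalP v3.0 and PSORT, the transmembrane helix prediction programs TMHMM v2.0 and THUMBUP, the GPI-anchor site prediction program big-PI Predictor, and the subcellular protein localization prediction program TargetP v1.01. The results indicate that 437 proteins in N. crassa are secreted proteins. The open reading frames (ORFs) encoding these proteins range from 252 bp to 6,604 bp, with a mean of 1,433 bp, and the signal peptides of the secreted proteins are 15 to 59 amino acids long. Of the 437 secreted proteins, 205 have functional descriptions, mainly covering various enzymes, cellular energy production, transport, self-repair and defense. The biochemical processes in which these proteins take part may occur in the periplasmic space outside the membrane or outside the cell, serving nutrient uptake and the response to the environment in this species.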

19.

Background  

Protein subcellular localization is crucial for genome annotation, protein function prediction, and drug discovery. Determination of subcellular localization using experimental approaches is time-consuming; thus, computational approaches become highly desirable. Extensive studies of localization prediction have led to the development of several methods including composition-based and homology-based methods. However, their performance might be significantly degraded if homologous sequences are not detected. Moreover, methods that integrate various features could suffer from the problem of low coverage in high-throughput proteomic analyses due to the lack of information to characterize unknown proteins.

20.
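Preliminary analysis of secreted proteins in the whole genome of the potato late blight pathogen   Total citations: 1 (self-citations: 0; citations by others: 1)
Zhou XG, Hou SM, Chen DW, Tao N, Ding YM, Sun ML, Zhang SS. 遗传 (Hereditas) 2011, 33(7): 785-793
Drawing on the whole-genome sequence of the potato late blight pathogen, and combining computational and bioinformatic methods, the proteins of the pathogen were analyzed to lay a foundation for clarifying the molecular mechanism of pathogen-host interaction. The signal peptide prediction programs SignalP v3.0 and PSORT, the transmembrane helix prediction programs TMHMM-2.0 and THUMBUP, the GPI-anchor site prediction program big-PI Predictor, and the subcellular protein localization prediction program TargetP v1.01 were applied to the 22,658 published protein amino acid sequences of the pathogen's whole genome. Of the proteins encoded by the genome, 671 were found to be potential secreted proteins, 3.0% of the total. Of these, 45 secreted proteins have functional descriptions, with functions involving cellular metabolism, signal transduction and related processes; in addition, some secreted proteins resemble elicitors and may be related to the virulence of the pathogen.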

