20 similar documents found; search time: 0 ms
1.
Large-scale plant protein subcellular location prediction (total citations: 1; self-citations: 0; citations by others: 1)
Current plant genome sequencing projects have called for the development of novel and powerful high-throughput tools for timely annotation of the subcellular location of uncharacterized plant proteins. In view of this, an ensemble classifier, Plant-PLoc, formed by fusing many basic individual classifiers, has been developed for large-scale subcellular location prediction of plant proteins. Each of the basic classifiers was engineered by the K-Nearest Neighbor (KNN) rule. Plant-PLoc discriminates plant proteins among the following 11 subcellular locations: (1) cell wall, (2) chloroplast, (3) cytoplasm, (4) endoplasmic reticulum, (5) extracell, (6) mitochondrion, (7) nucleus, (8) peroxisome, (9) plasma membrane, (10) plastid, and (11) vacuole. As a demonstration, predictions were performed on a stringent benchmark dataset in which none of the proteins included has ≥25% sequence identity to any other in the same subcellular location, to avoid homology bias. The overall success rate thus obtained was 32-51% higher than the rates obtained by previous methods on the same benchmark dataset. The essence of Plant-PLoc in enhancing prediction quality and its significance in biological applications are discussed. Plant-PLoc is freely accessible to the public as a web-server at http://202.120.37.186/bioinf/plant. Furthermore, for public convenience, the results predicted by Plant-PLoc for all plant protein entries in the Swiss-Prot database that lack subcellular location annotations, or are annotated as uncertain, are provided in a downloadable file at the same website. The large-scale results will be updated twice a year to include new plant protein entries and to reflect the continuous development of Plant-PLoc.
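The fusion idea described for Plant-PLoc can be illustrated with a minimal sketch. It assumes each basic classifier is a plain KNN rule on a numeric feature vector, that the members differ only in their choice of k, and that they are fused by simple majority voting. The abstract does not specify these details, so the `ks` values, the Euclidean metric and the toy data are illustrative assumptions, not Plant-PLoc's actual configuration.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Basic KNN rule: majority label among the k training vectors closest to `query`.

    `train` is a list of (feature_vector, location_label) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def ensemble_predict(train, query, ks=(1, 3, 5)):
    """Fuse several basic KNN classifiers (one per k, an assumption) by majority vote."""
    member_votes = Counter(knn_predict(train, query, k) for k in ks)
    return member_votes.most_common(1)[0][0]

# Hypothetical two-dimensional toy features, for illustration only.
train = [
    ((0.0, 0.0), "nucleus"), ((0.0, 1.0), "nucleus"), ((1.0, 0.0), "nucleus"),
    ((10.0, 10.0), "vacuole"), ((10.0, 11.0), "vacuole"), ((11.0, 10.0), "vacuole"),
]
print(ensemble_predict(train, (0.5, 0.5)))  # nucleus
```

Each member votes independently, so an individual member that is misled by a noisy neighbourhood can be outvoted by the others; this is the usual motivation for fusing many weak KNN rules.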
2.
The localization of a protein in a cell is closely correlated with its biological function. With the explosion of protein sequences entering databanks, it is highly desirable to develop an automated method that can rapidly identify their subcellular locations. This will expedite the annotation process, providing timely and useful information for both basic research and industrial applications. In view of this, a powerful predictor has been developed by hybridizing the gene ontology approach [Nat. Genet. 25 (2000) 25], the functional domain composition approach [J. Biol. Chem. 277 (2002) 45765], and the pseudo-amino acid composition approach [Proteins Struct. Funct. Genet. 43 (2001) 246; Erratum: ibid. 44 (2001) 60]. As a showcase, a recently constructed dataset [Bioinformatics 19 (2003) 1656] was used for demonstration. The dataset contains 7589 proteins classified into 12 subcellular locations: chloroplast, cytoplasmic, cytoskeleton, endoplasmic reticulum, extracellular, Golgi apparatus, lysosomal, mitochondrial, nuclear, peroxisomal, plasma membrane, and vacuolar. The overall success rate obtained by jackknife cross-validation was 92%. This is so far the highest success rate achieved on this dataset by following an objective and rigorous cross-validation procedure.
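The jackknife test mentioned here is leave-one-out cross-validation: each protein is withheld in turn, predicted from all the others, and the success rate is the fraction predicted correctly. The sketch below shows only that evaluation loop; the hybrid gene-ontology / domain / pseudo-amino-acid predictor is not reproduced, and the 1-nearest-neighbour stand-in and toy data are assumptions for illustration.

```python
import math

def nearest_neighbor_label(train, query):
    """1-NN rule, used here only as a stand-in predictor (assumption)."""
    return min(train, key=lambda item: math.dist(item[0], query))[1]

def jackknife_success_rate(dataset, classify=nearest_neighbor_label):
    """Leave each sample out in turn, predict it from the rest, and return the fraction correct."""
    correct = 0
    for i, (features, label) in enumerate(dataset):
        held_in = dataset[:i] + dataset[i + 1:]  # every sample except the i-th
        if classify(held_in, features) == label:
            correct += 1
    return correct / len(dataset)

# Hypothetical toy dataset of (feature_vector, location) pairs.
dataset = [
    ((0.0, 0.0), "mitochondrial"), ((0.0, 1.0), "mitochondrial"),
    ((5.0, 5.0), "nuclear"), ((5.0, 6.0), "nuclear"),
]
print(jackknife_success_rate(dataset))  # 1.0
```

Because no sample is ever used to predict itself, the jackknife rate is a stricter estimate than the self-consistency (resubstitution) rate.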
3.
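Prediction of protein subcellular compartments by similarity alignment (total citations: 1; self-citations: 0; citations by others: 1)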
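[Objective] To predict the subcellular compartment to which a protein belongs, providing a basis for further study of its biological function. [Methods] Amino acid composition, dipeptide composition, and pseudo-amino acid composition of protein sequences were used as sequence features, and a K-nearest neighbor (KNN) classification algorithm improved by BLAST alignment was used to predict the subcellular compartment of each protein sequence. [Results] Under the jackknife test, the success rates for the three features were 91.5%, 91.5%, and 89.3% on dataset CH317 and 93.9%, 92.9%, and 89.8% on dataset ZD98, respectively. [Conclusion] The BLAST-alignment-improved KNN algorithm is an effective method for predicting protein subcellular compartments.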
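Two of the three sequence features named above, amino acid composition and dipeptide composition, can be computed as in the following sketch. It assumes the 20 standard residues and simply skips pairs containing non-standard letters; the pseudo-amino acid composition and the BLAST-based decision step are not reproduced here.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs

def amino_acid_composition(seq):
    """20-dimensional vector: fraction of the sequence made up of each standard residue."""
    seq = seq.upper()
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]

def dipeptide_composition(seq):
    """400-dimensional vector: frequency of each adjacent residue pair."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    if not pairs:  # a single residue has no adjacent pairs
        return [0.0] * len(DIPEPTIDES)
    for pair in pairs:
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
    return [counts[dp] / len(pairs) for dp in DIPEPTIDES]

aac = amino_acid_composition("AACD")
dpc = dipeptide_composition("AACD")
print(aac[AMINO_ACIDS.index("A")], dpc[DIPEPTIDES.index("AC")])  # 0.5 0.3333333333333333
```

Dipeptide composition captures some local order information (which residue follows which) that plain amino acid composition discards, at the cost of a 20-fold larger feature vector.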
4.
MOTIVATION: Protein localization data are a valuable information resource for elucidating protein functions. It is highly desirable to predict a protein's subcellular location automatically from its sequence. RESULTS: In this paper, the fuzzy k-nearest neighbors (k-NN) algorithm is introduced to predict protein subcellular locations from dipeptide composition. Prediction was performed on a new dataset derived from version 41.0 of the SWISS-PROT databank, and an overall predictive accuracy of about 80% was achieved in a jackknife test. The result demonstrates the applicability of this relatively simple method and the possibility of improving prediction accuracy for protein subcellular locations. We also applied this method to annotate six entirely sequenced proteomes, namely Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Oryza sativa, Arabidopsis thaliana, and a subset of all human proteins. AVAILABILITY: Supplementary information and subcellular location annotations for eukaryotes are available at http://166.111.30.65/hying/fuzzy_loc.htm
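Unlike crisp k-NN, fuzzy k-NN returns a degree of membership in every candidate location. The sketch below uses the standard inverse-distance fuzzy weighting with fuzzifier `m`, and assumes crisp (0/1) memberships for the training proteins; the abstract does not state how the authors initialised training memberships or chose k and m, so those settings and the 2-D toy vectors (standing in for 400-dimensional dipeptide compositions) are assumptions.

```python
import math
from collections import defaultdict

def fuzzy_knn_memberships(train, query, k=3, m=2.0, eps=1e-12):
    """Fuzzy k-NN with crisp training labels (assumption).

    Each of the k nearest neighbours votes for its own class with weight
    1 / d**(2 / (m - 1)); votes are normalised so memberships sum to 1.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = defaultdict(float)
    for features, label in nearest:
        d = math.dist(features, query)
        votes[label] += 1.0 / (d ** (2.0 / (m - 1.0)) + eps)  # eps avoids division by zero
    total = sum(votes.values())
    return {label: v / total for label, v in votes.items()}

# Hypothetical toy vectors, for illustration only.
train = [((0.0, 0.0), "cytoplasm"), ((1.0, 0.0), "cytoplasm"), ((3.0, 0.0), "nucleus")]
memberships = fuzzy_knn_memberships(train, (0.5, 0.0), k=3)
print(max(memberships, key=memberships.get))  # cytoplasm
```

The final prediction is the location with the highest membership, but the membership values themselves can also serve as a confidence score for each annotation.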
5.
Prediction of protein subcellular locations by incorporating quasi-sequence-order effect (total citations: 16; self-citations: 0; citations by others: 16)
Chou KC. Biochemical and Biophysical Research Communications, 2000, 278(2): 477-483
Incorporating the sequence-order effect is a key and logical step for improving the prediction quality of protein subcellular location, but it is also a very difficult problem. This is because the number of possible sequence-order patterns in proteins is extremely large, which poses a formidable barrier to constructing an effective training dataset for statistical treatment based on current knowledge. That is why most existing prediction algorithms operate on amino acid composition alone. In this paper, based on the physicochemical distance between amino acids, a set of sequence-order-coupling numbers was introduced to reflect the sequence-order effect, or, in rigorous terms, the quasi-sequence-order effect. Furthermore, the covariant discriminant algorithm recently developed by Chou and Elrod (Protein Eng. 12, 107-118, 1999) was augmented so that prediction can use both the sequence-order-coupling numbers and the amino acid composition as input. A remarkable improvement in prediction quality was observed with the augmented covariant discriminant algorithm. The approach described here represents a promising step forward in incorporating the sequence-order effect into protein subcellular location prediction. It is anticipated that the approach may also have a series of impacts on the prediction of other protein features by statistical methods.
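A sequence-order-coupling number of rank d sums the squared physicochemical distance between every pair of residues that lie d positions apart, as the quasi-sequence-order approach is commonly written: tau_d = sum over i = 1..N-d of dist(R_i, R_{i+d})^2. The sketch below implements that definition with a pluggable distance function. The real physicochemical distance matrix is not reproduced; `toy_distance` is a hypothetical placeholder.

```python
def sequence_order_coupling_numbers(seq, distance, max_lag):
    """Return [tau_1, ..., tau_max_lag] with tau_d = sum_{i=1}^{N-d} distance(R_i, R_{i+d})**2."""
    n = len(seq)
    if not 1 <= max_lag < n:
        raise ValueError("max_lag must satisfy 1 <= max_lag < len(seq)")
    return [
        sum(distance(seq[i], seq[i + d]) ** 2 for i in range(n - d))
        for d in range(1, max_lag + 1)
    ]

def toy_distance(a, b):
    """Hypothetical placeholder: 0 for identical residues, 1 otherwise.

    A real implementation would look up a physicochemical distance matrix instead.
    """
    return 0.0 if a == b else 1.0

print(sequence_order_coupling_numbers("ACAC", toy_distance, 2))  # [3.0, 0.0]
```

In the toy example the lag-1 pairs (A,C), (C,A), (A,C) all differ, giving 3.0, while the lag-2 pairs (A,A), (C,C) are identical, giving 0.0; this periodicity is exactly the kind of order information that composition alone cannot see.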
6.
Prediction of membrane protein types and subcellular locations (total citations: 12; self-citations: 0; citations by others: 12)
Membrane proteins are classified according to two different schemes. In scheme 1, they are discriminated among the following five types: (1) type I single-pass transmembrane, (2) type II single-pass transmembrane, (3) multipass transmembrane, (4) lipid chain-anchored membrane, and (5) GPI-anchored membrane proteins. In scheme 2, they are discriminated among the following nine locations: (1) chloroplast, (2) endoplasmic reticulum, (3) Golgi apparatus, (4) lysosome, (5) mitochondria, (6) nucleus, (7) peroxisome, (8) plasma membrane, and (9) vacuole. An algorithm is formulated for predicting the type or location of a given membrane protein based on its amino acid composition. The overall rates of correct prediction obtained by self-consistency and jackknife tests, as well as by an independent dataset test, were around 76-81% for the five-type classification and 66-70% for the nine-location classification. Furthermore, classification and prediction were also conducted between inner and outer membrane proteins; the corresponding rates were 88-91%. These results imply that the types of membrane proteins, as well as their cellular locations and other attributes, are closely correlated with their amino acid composition. It is anticipated that the classification schemes and prediction algorithm can expedite the functional determination of new proteins. The concept and method may also be useful in prioritizing genes and proteins identified by genomics efforts as potential molecular targets for drug design.
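The abstract does not name the composition-based algorithm, so the following is only a simpler stand-in that shows what "prediction from amino acid composition" means in practice: each class is summarised by its mean composition vector, and a new protein is assigned to the class with the nearest mean (Euclidean distance). This is a minimum-distance classifier, not the paper's method, and the class names and sequences are hypothetical toy data.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """20-dimensional amino acid composition (fraction of each standard residue)."""
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]

def class_centroids(training):
    """Mean composition vector per class; `training` maps label -> list of sequences."""
    centroids = {}
    for label, seqs in training.items():
        vectors = [composition(s) for s in seqs]
        centroids[label] = [sum(column) / len(vectors) for column in zip(*vectors)]
    return centroids

def predict_class(seq, centroids):
    """Assign the class whose mean composition is closest in Euclidean distance."""
    x = composition(seq)
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))

# Hypothetical toy training data; real sequences and labels would come from a curated dataset.
training = {
    "toy_type_A": ["LLIVLLIVAA", "IVLLAILVLL"],
    "toy_type_B": ["DEKRSTDEKR", "KRDESTKRDE"],
}
centroids = class_centroids(training)
print(predict_class("LLVVIIAALL", centroids))  # toy_type_A
```

Covariance-aware variants replace the Euclidean distance with a Mahalanobis-type distance that accounts for correlations between residue frequencies within each class.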
7.
Expert Review of Proteomics, 2013, 10(3): 227-237
In the last two decades, predicting protein subcellular locations has become a hot topic in bioinformatics. A number of algorithms and online services have been developed to computationally assign a subcellular location to a given protein sequence. With the progress of many proteome projects, more and more proteins are annotated with more than one subcellular location. However, multisite prediction has been considered in only a handful of recent studies, which share several common challenges. In this special report, the authors discuss what these challenges are, why they are important, and how existing studies have addressed them. Finally, a vision of the future of multisite protein subcellular location prediction is given.
8.
9.
Felise HB, Nguyen HV, Pfuetzner RA, Barry KC, Jackson SR, Blanc MP, Bronstein PA, Kline T, Miller SI. Cell Host & Microbe, 2008, 4(4): 325-336
Bacterial virulence mechanisms are attractive targets for antibiotic development because they are required for the pathogenesis of numerous global infectious disease agents. The bacterial secretion systems used to assemble the surface structures that promote adherence and deliver protein virulence effectors to host cells could comprise one such therapeutic target. In this study, we developed and performed a high-throughput screen of small molecule libraries and identified one compound, a 2-imino-5-arylidene thiazolidinone that blocked secretion and virulence functions of a wide array of animal and plant Gram-negative bacterial pathogens. This compound inhibited type III secretion-dependent functions, with the exception of flagellar motility, and type II secretion-dependent functions, suggesting that its target could be an outer membrane component conserved between these two secretion systems. This work provides a proof of concept that compounds with a broad spectrum of activity against Gram-negative bacterial secretion systems could be developed to prevent and treat bacterial diseases.
10.
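Prediction of protein subcellular localization with an ensemble of improved KNN algorithms (total citations: 1; self-citations: 0; citations by others: 1)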
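Multiple similarity-alignment K-nearest neighbor (KNN) classifiers are combined with the AdaBoost algorithm to predict protein subcellular localization. The similarity-alignment KNN algorithm uses amino acid composition, dipeptide composition, and pseudo-amino acid composition, respectively, as protein sequence features, and uses BLAST alignment in the KNN decision stage to determine a protein's subcellular location. Under the jackknife test, the AdaBoost ensemble classifier was applied to the three protein sequence features; the highest prediction success rates among the three features on datasets CH317 and Gram1253 were 92.4% and 93.1%, respectively. The results show that the AdaBoost ensemble of improved KNN classifiers is an effective method for predicting protein subcellular localization.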
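The boosting step can be sketched with SAMME, a standard multiclass form of AdaBoost. This sketch assumes the base learners are already-trained classifiers (for example, one similarity-alignment KNN per feature type) that AdaBoost only selects and weights each round; the abstract does not say which AdaBoost variant the authors used, and the threshold classifiers below are hypothetical stand-ins.

```python
import math
from collections import defaultdict

def samme_fit(classifiers, samples, labels, n_classes, rounds=3):
    """Multiclass AdaBoost (SAMME) over a fixed pool of already-trained classifiers.

    Each round picks the pool member with the lowest weighted error, gives it
    weight alpha = log((1 - err) / err) + log(K - 1), and up-weights its mistakes.
    """
    n = len(samples)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        best_err, best_clf, best_preds = None, None, None
        for clf in classifiers:
            preds = [clf(x) for x in samples]
            err = sum(w for w, p, y in zip(weights, preds, labels) if p != y)
            if best_err is None or err < best_err:
                best_err, best_clf, best_preds = err, clf, preds
        err = min(max(best_err, 1e-10), 1.0 - 1e-10)  # keep log() finite
        if err >= 1.0 - 1.0 / n_classes:  # no better than random guessing
            break
        alpha = math.log((1.0 - err) / err) + math.log(n_classes - 1)
        ensemble.append((alpha, best_clf))
        weights = [w * math.exp(alpha) if p != y else w
                   for w, p, y in zip(weights, best_preds, labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def samme_predict(ensemble, x):
    """Weighted vote: the class with the largest summed alpha wins."""
    scores = defaultdict(float)
    for alpha, clf in ensemble:
        scores[clf(x)] += alpha
    return max(scores, key=scores.get)

# Hypothetical base classifiers on a 1-D toy feature.
def by_threshold(x):
    return "cytoplasm" if x < 3 else "membrane"

def always_cytoplasm(x):
    return "cytoplasm"

samples = [0, 1, 2, 3, 4, 5]
labels = ["cytoplasm"] * 3 + ["membrane"] * 3
ensemble = samme_fit([always_cytoplasm, by_threshold], samples, labels, n_classes=2)
print(samme_predict(ensemble, 4))  # membrane
```

The log(K - 1) term is what distinguishes SAMME from two-class AdaBoost: it lets a classifier contribute as long as it beats random guessing among K classes, rather than requiring better than 50% accuracy.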
11.
The computational prediction of the subcellular localization of bacterial proteins is an important step in genome annotation and in the search for novel vaccine or drug targets. Since the 1991 release of PSORT I, the first comprehensive algorithm to predict bacterial protein localization, many other localization prediction tools have been developed. These methods offer significant improvements in predictive performance over PSORT I, and the accuracy of some methods now rivals that of certain high-throughput laboratory methods for identifying protein localization.
12.
Given a raw protein sequence, knowing its subcellular location is an important step toward understanding its function and designing further experiments. A novel method is proposed for predicting protein subcellular locations from sequence. For four categories of eukaryotic proteins, the overall predictive accuracy is 82.0%, 2.6% higher than that obtained with an SVM approach. For three subcellular locations of prokaryotic proteins, an overall accuracy of 89.9% is obtained. In accordance with the architecture of cells, a hierarchical prediction approach is designed. Based on amino acid composition, extracellular and intracellular proteins can be distinguished with an accuracy of 97%.
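The hierarchical approach mirrors cell architecture: a first-level classifier separates extracellular from intracellular proteins, and only intracellular proteins are passed to a second-level classifier that picks a compartment. The routing logic can be sketched as below; the two toy rules (cysteine fraction, Lys/Arg fraction) and their thresholds are hypothetical placeholders, not the method's actual classifiers.

```python
from typing import Callable, Dict

Classifier = Callable[[str], str]

def hierarchical_predict(seq: str, top: Classifier, children: Dict[str, Classifier]) -> str:
    """Route a sequence through a two-level decision tree mirroring cell architecture."""
    coarse = top(seq)              # level 1: e.g. extracellular vs intracellular
    refine = children.get(coarse)  # level 2 exists only for branches that split further
    return coarse if refine is None else refine(seq)

# Hypothetical toy rules, for illustration only; real levels would be trained classifiers.
def toy_top(seq: str) -> str:
    return "extracellular" if seq.count("C") / len(seq) > 0.1 else "intracellular"

def toy_intracellular(seq: str) -> str:
    basic_fraction = sum(seq.count(aa) for aa in "KR") / len(seq)
    return "nuclear" if basic_fraction > 0.2 else "cytoplasmic"

children = {"intracellular": toy_intracellular}
print(hierarchical_predict("KKRAAAAAAA", toy_top, children))  # nuclear
```

Splitting the problem this way lets each level use the features best suited to its own decision, and an error at the easy first level (reported at 97% accuracy) bounds how much the finer level can contribute.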
13.
A graphic representation of protein sequence and predicting the subcellular locations of prokaryotic proteins (total citations: 1; self-citations: 0; citations by others: 1)
The Zp curve, a three-dimensional space-curve representation of a protein's primary sequence based on the hydrophobicity and charge properties of the amino acid residues along the sequence, is proposed. Using Zp parameters extracted from the three components of the Zp curve together with the Bayes discriminant algorithm, the subcellular locations of prokaryotic proteins were predicted. An accuracy of 81.5% in the cross-validation test was achieved using 13 parameters extracted from the curve for a database of 997 prokaryotic proteins. This result is slightly better than that of the neural network method (80.9%) based on amino acid composition for the same database. By combining amino acid composition with the Zp parameters, an overall predictive accuracy of 89.6% can be achieved, about 3% higher than that of the Bayes discriminant algorithm based on amino acid composition alone for the same database. Prediction was also performed with a larger dataset derived from version 39 of the SWISS-PROT databank and with two datasets of differing sequence similarity. Even for the dataset without sequence similarity, the improvement in the cross-validation test can reach 4.4%. The results indicate that the Zp parameters are effective in representing the information within a protein's primary sequence. This method of extracting information from primary structure may be useful in other areas of protein research.
14.
Background
The subcellular location of a protein is closely related to its function. It would be worthwhile to develop a method to predict the subcellular location of a given protein when only its amino acid sequence is known. Although many efforts have been made to predict subcellular location from sequence information alone, further research is needed to improve prediction accuracy.
15.
16.
Prediction of protein subcellular localization is a challenging problem, particularly when the system concerned contains both singleplex and multiplex proteins. In this paper, by introducing the "multi-label scale" and hybridizing gene ontology information with sequential evolution information, a novel predictor called iLoc-Gneg is developed for predicting the subcellular localization of gram-negative bacterial proteins with both single-location and multiple-location sites. To facilitate comparison, the same stringent benchmark dataset used to estimate the accuracy of Gneg-mPLoc was adopted to demonstrate the power of iLoc-Gneg. The dataset contains 1,392 gram-negative bacterial proteins classified into the following eight locations: (1) cytoplasm, (2) extracellular, (3) fimbrium, (4) flagellum, (5) inner membrane, (6) nucleoid, (7) outer membrane, and (8) periplasm. Of the 1,392 proteins, 1,328 each have only one subcellular location and the other 64 each have two subcellular locations, but none of the proteins included has pairwise sequence identity to any other in the same subset (subcellular location). The overall success rate of iLoc-Gneg by the jackknife test on this stringent benchmark dataset was over 91%, about 6% higher than that of Gneg-mPLoc. As a user-friendly web-server, iLoc-Gneg is freely accessible to the public at http://icpr.jci.edu.cn/bioinfo/iLoc-Gneg. A step-by-step guide is also provided on how to use the web-server to get the desired results. Furthermore, for the user's convenience, the iLoc-Gneg web-server accepts batch job submission, which is not available in the existing version of the Gneg-mPLoc web-server. It is anticipated that iLoc-Gneg may become a useful high-throughput tool for molecular cell biology, proteomics, systems biology, and drug development.
17.
Uptake of the fluorescent probe 1-N-phenylnaphthylamine (NPN), as adapted to an automated spectrofluorometer enabling multiwell reading of microtitre plates, was applied to determine permeability changes in Gram-negative bacteria. An intact outer membrane is a permeability barrier, and excludes hydrophobic substances such as NPN but, once damaged, it can allow the entry of NPN to the phospholipid layer, resulting in prominent fluorescence. With Escherichia coli O157, Pseudomonas aeruginosa, and Salmonella typhimurium as test organisms and ethylenediaminetetraacetic acid and sodium hexametaphosphate as the model permeabilizers, quantitative and highly reproducible NPN uptake levels were obtained that differed characteristically between the test bacteria. Furthermore, citric acid was shown to be a potent permeabilizer at millimolar concentrations, its effect being partly (Ps. aeruginosa, Salm. typhimurium) or almost totally (E. coli O157) abolished by MgCl2, suggesting that part of the action occurs by chelation. Sodium citrate induced weak NPN uptake, which was totally abolished by MgCl2. In conclusion, the NPN uptake assay with the automated spectrofluorometer serves as a convenient method in analysing and quantifying the effects of external agents, including potential food preservatives, on Gram-negative bacteria.
18.
Fold assignments for newly sequenced genomes are among the most important and interesting applications of the booming field of protein structure prediction. We present a brief survey and discussion of such assignments completed to date, using as an example several fold assignment projects for proteins from the Escherichia coli genome. This review focuses on the steps necessary to go beyond simple assignment projects and into the development of tools that extend our understanding of the functions of proteins in newly sequenced genomes. This paper also discusses several problems seldom addressed in the literature, such as domain prediction, complementary predictions (e.g., transmembrane regions and flexible regions), and cross-correlation of predictions from different servers. The influence of sequence and structure database growth on prediction success is also addressed. Finally, we discuss the perspectives of the field in the context of massive sequence and structure determination projects, as well as the development of novel prediction methods.
19.
Twelve different porins from the gram-negative bacteria Escherichia coli, Salmonella typhimurium, Pseudomonas aeruginosa, and Yersinia pestis were reconstituted into lipid bilayer membranes. Most of the porins, except outer membrane protein P, formed large, water-filled, ion-permeable channels with a single-channel conductance between 1.5 and 6 nS in 1 M KCl. The ions used for probing the pore structure had the same relative mobilities while moving through the porin pore as they did while moving in free solution. Thus the single-channel conductances of the individual porins could be used to estimate the effective channel diameters of these porins, yielding values ranging from 1.0 to 2.0 nm. Zero-current potential measurements in the presence of salt gradients across lipid bilayer membranes containing individual porins gave results that were consistent with the conclusions drawn from the single-channel experiments. For all porins except protein P, the channels exhibited a greater cation selectivity for less mobile anions and a greater anion selectivity for less mobile cations, which again indicated that the ions were moving inside the pores in a fashion similar to their movement in the aqueous phase. Three porins, PhoE and NmpC of E. coli and protein P of P. aeruginosa, formed anion-selective pores. PhoE and NmpC were only weakly anion selective, and their selectivity was dependent on the mobility of the ions. In contrast, cations were unable to enter the selectivity filter of the protein P channel. This resulted in a high anion selectivity for all salts tested in this study. The other porins examined, including all of the known constitutive porins of the four gram-negative bacteria studied, were cation selective with a 3- to 40-fold preference for K+ ions over Cl- ions.
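Because ions move through these pores with their free-solution mobilities, a common simple model treats the pore as a cylinder filled with bulk electrolyte, so that G = sigma * pi * d^2 / (4 * l) and therefore d = sqrt(4 * G * l / (pi * sigma)). The sketch below applies that model; the bulk conductivity of 1 M KCl (taken as roughly 11.2 S/m) and the pore length of 6 nm are assumptions for illustration, not values stated in the abstract, and access resistance is ignored.

```python
import math

def pore_diameter(conductance_s, conductivity_s_per_m=11.2, length_m=6e-9):
    """Diameter (m) of a cylinder of bulk electrolyte with the given conductance (S).

    Assumes G = sigma * pi * d**2 / (4 * l) with access resistance ignored.
    Default sigma (about 11.2 S/m for 1 M KCl) and l = 6 nm are assumptions.
    """
    return math.sqrt(4.0 * conductance_s * length_m / (math.pi * conductivity_s_per_m))

for g_ns in (1.5, 6.0):
    d_nm = pore_diameter(g_ns * 1e-9) * 1e9
    print(f"G = {g_ns} nS -> d = {d_nm:.2f} nm")
# G = 1.5 nS -> d = 1.01 nm
# G = 6.0 nS -> d = 2.02 nm
```

Under these assumptions the reported conductance range of 1.5 to 6 nS maps to diameters of roughly 1.0 to 2.0 nm. Since d scales with the square root of G, a fourfold range in conductance corresponds to only a twofold range in diameter.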
20.
Prediction of protein subcellular locations by support vector machines using compositions of amino acids and amino acid pairs (total citations: 14; self-citations: 0; citations by others: 14)
MOTIVATION: The subcellular location of a protein is closely correlated with its function. Thus, computational prediction of subcellular locations from amino acid sequence information would help the annotation and functional prediction of protein-coding genes in complete genomes. We have developed a method based on support vector machines (SVMs). RESULTS: We considered 12 subcellular locations in eukaryotic cells: chloroplast, cytoplasm, cytoskeleton, endoplasmic reticulum, extracellular medium, Golgi apparatus, lysosome, mitochondrion, nucleus, peroxisome, plasma membrane, and vacuole. We constructed a dataset of proteins with known locations from the SWISS-PROT database. A set of SVMs was trained to predict the subcellular location of a given protein based on its amino acid, amino acid pair, and gapped amino acid pair compositions. The predictors based on these different compositions were then combined using a voting scheme. Results obtained through 5-fold cross-validation tests showed an improvement in prediction accuracy over the algorithm based on amino acid composition only. This prediction method is available via the Internet.
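The gapped amino acid pair composition generalises dipeptide composition by counting residue pairs separated by a fixed number of intervening positions. The sketch below computes that 400-dimensional feature for a given gap and combines per-feature predictions by simple majority vote. The SVMs themselves are omitted, and the exact voting rule used by the authors is not stated, so plain majority voting with first-seen tie-breaking is an assumption.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIR_INDEX = {"".join(p): i for i, p in enumerate(product(AMINO_ACIDS, repeat=2))}

def gapped_pair_composition(seq, gap):
    """400-dim frequencies of residue pairs (seq[i], seq[i + gap + 1]).

    gap = 0 gives ordinary dipeptide composition; larger gaps skip residues between the pair.
    """
    seq = seq.upper()
    vec = [0.0] * len(PAIR_INDEX)
    pairs = [seq[i] + seq[i + gap + 1] for i in range(len(seq) - gap - 1)]
    for pair in pairs:
        idx = PAIR_INDEX.get(pair)
        if idx is not None:  # ignore pairs with non-standard residues
            vec[idx] += 1.0
    return [v / len(pairs) for v in vec] if pairs else vec

def majority_vote(predictions):
    """Combine per-feature predictions by simple majority (ties: first seen wins)."""
    return Counter(predictions).most_common(1)[0][0]

features = gapped_pair_composition("ACAC", gap=1)
print(features[PAIR_INDEX["AA"]], features[PAIR_INDEX["CC"]])  # 0.5 0.5
print(majority_vote(["nucleus", "cytoplasm", "nucleus"]))  # nucleus
```

With gap = 1, "ACAC" yields the pairs (A, A) and (C, C), which an ordinary dipeptide count would never see; features at several gaps give separate SVMs complementary views of the same sequence.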