Similar Articles
1.
Subcellular localization is a key functional characteristic of proteins. It is determined by signals encoded in the protein sequence. Because experimental determination of subcellular localization is laborious, a number of computational methods have been developed to predict protein location from sequence. However, predictions made by different methods often disagree, and it is not always clear which algorithm performs best for a given cellular compartment. We benchmarked the primary subcellular localization predictors for proteins from Gram-negative bacteria (PSORTb3, PSLpred, CELLO, and SOSUI-GramN) on a common dataset of 1056 proteins. We found that PSORTb3 performs best on average but is outperformed by other methods in predicting extracellular proteins. This motivated us to develop a meta-predictor that combines the primary methods through logistic regression models, to take advantage of their combined strengths and to eliminate their individual weaknesses. MetaLocGramN runs the primary methods and, based on their output, classifies protein sequences into one of the five major localizations of the Gram-negative bacterial cell: cytoplasm, plasma membrane, periplasm, outer membrane, and extracellular space. MetaLocGramN achieves an average Matthews correlation coefficient of 0.806, i.e. 12% better than the best individual primary method. MetaLocGramN is a meta-predictor specialized in predicting subcellular localization for proteins from Gram-negative bacteria. According to our benchmark, it performs better than all the other tools run independently. MetaLocGramN is a web and SOAP server available free of charge to all academic users at http://iimcb.genesilico.pl/MetaLocGramN. This article is part of a Special Issue entitled: Computational Methods for Protein Interaction and Structural Prediction.
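The abstract reports an average Matthews correlation coefficient (MCC) across the five compartments. The sketch below shows one common way to compute such an average. It assumes that each compartment is scored one-vs-rest and that the five per-compartment values are averaged without weighting. The abstract does not state MetaLocGramN's exact averaging scheme, and the function names here are illustrative.

```python
from math import sqrt

# The five compartments named in the abstract.
SITES = ["cytoplasm", "plasma membrane", "periplasm", "outer membrane", "extracellular"]

def mcc_one_vs_rest(true_labels, pred_labels, site):
    """Matthews correlation coefficient for one compartment scored against all others."""
    tp = fp = tn = fn = 0
    for t, p in zip(true_labels, pred_labels):
        if p == site and t == site:
            tp += 1
        elif p == site:
            fp += 1
        elif t == site:
            fn += 1
        else:
            tn += 1
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: report 0 when any marginal count is zero (MCC is undefined there).
    return (tp * tn - fp * fn) / denom if denom else 0.0

def average_mcc(true_labels, pred_labels, sites=SITES):
    """Unweighted mean of the per-compartment MCC values (assumed averaging scheme)."""
    return sum(mcc_one_vs_rest(true_labels, pred_labels, s) for s in sites) / len(sites)

if __name__ == "__main__":
    truth = ["cytoplasm", "periplasm", "outer membrane", "extracellular", "plasma membrane"]
    print(average_mcc(truth, truth))  # a perfect prediction gives 1.0
```

Unlike plain accuracy, MCC also uses true negatives. This makes it less flattering for small compartments such as the extracellular space.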

2.
Han MJ  Yun H  Lee JW  Lee YH  Lee SY  Yoo JS  Kim JY  Kim JF  Hur CG 《Proteomics》2011,11(7):1213-1227
Escherichia coli K-12 and B strains have been the most widely employed strains in both scientific studies and industrial applications. Recently, the complete genome sequences of two representative descendants of E. coli B, REL606 and BL21(DE3), have been determined. Here, we report subproteome reference maps of E. coli B REL606, obtained by analyzing the cytoplasmic, periplasmic, inner membrane, outer membrane, and extracellular proteomes on the basis of the genome information, using both experimental and computational approaches. Among a total of 3487 spots, 651 proteins, including 410 non-redundant proteins, were identified and characterized by 2-DE and LC-MS/MS; they comprise 440 cytoplasmic, 45 periplasmic, 50 inner membrane, 61 outer membrane, and 55 extracellular proteins. In addition, the subcellular localizations of all 4205 ORFs of E. coli B were predicted by combining computational prediction methods. Subcellular localizations were newly assigned to 1812 (43.09%) proteins of currently unknown function. The computational predictions were also compared with the experimental results, giving an overall precision and recall of 92.16% and 92.16%, respectively. This work represents the most comprehensive analysis of the E. coli B subproteomes and will be useful as a reference for proteome profiling studies under various conditions. The complete proteome data are available online (http://ecolib.kaist.ac.kr).
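The reported precision and recall are identical (92.16%). One situation in which this happens automatically is micro-averaging over single-label predictions where every protein receives exactly one prediction. In that case, each error counts once as a false positive for the predicted compartment and once as a false negative for the true compartment. The abstract does not say how its averages were computed, so this is only one possible explanation. The function below is a hypothetical illustration.

```python
from collections import Counter

def micro_precision_recall(true_labels, pred_labels):
    """Micro-averaged precision and recall over all compartments.

    Every misprediction is counted twice: as a false positive for the predicted
    compartment and as a false negative for the true one.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(true_labels, pred_labels):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    total_tp = sum(tp.values())
    called = total_tp + sum(fp.values())
    relevant = total_tp + sum(fn.values())
    precision = total_tp / called if called else 0.0
    recall = total_tp / relevant if relevant else 0.0
    return precision, recall

if __name__ == "__main__":
    truth = ["cytoplasm", "cytoplasm", "periplasm", "outer membrane"]
    pred = ["cytoplasm", "periplasm", "periplasm", "outer membrane"]
    print(micro_precision_recall(truth, pred))  # (0.75, 0.75)
```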

3.
Gram-negative bacteria have five major subcellular localization sites: the cytoplasm, the periplasm, the inner membrane, the outer membrane, and the extracellular space. The subcellular location of a protein can provide valuable information about its function. With the rapid growth of genome sequence data, an automated and accurate tool for predicting subcellular localization is increasingly needed. We present an approach to predicting subcellular localization for Gram-negative bacteria. The method uses support vector machines trained on multiple feature vectors based on n-peptide compositions. On a standard data set of 1443 proteins, the overall prediction accuracy reaches 89%, which, to the best of our knowledge, is the highest prediction rate reported so far. Our prediction accuracy is 14% higher than that of the recently developed multimodular PSORT-B. Because of its simplicity, this approach can easily be extended to other organisms and should be a useful tool for high-throughput, large-scale analysis of proteomic and genomic data.
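The feature vectors described above can be sketched as n-peptide (n-residue window) compositions. This minimal version makes three assumptions: n = 1 and 2, normalization by the number of windows, and skipping windows that contain non-standard residues. The abstract does not give the exact orders or normalization used.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def npeptide_composition(seq, n):
    """Relative frequency of each of the 20**n n-peptides, in lexicographic order."""
    keys = ["".join(p) for p in product(AMINO_ACIDS, repeat=n)]
    index = {k: i for i, k in enumerate(keys)}
    counts = [0] * len(keys)
    total = 0
    for i in range(len(seq) - n + 1):
        j = index.get(seq[i:i + n])
        if j is not None:  # windows with non-standard residues (e.g. X) are skipped
            counts[j] += 1
            total += 1
    if total == 0:
        return [0.0] * len(keys)
    return [c / total for c in counts]

def feature_vectors(seq, orders=(1, 2)):
    """One composition vector per peptide length; each could train its own SVM."""
    return {n: npeptide_composition(seq, n) for n in orders}
```

The vector length grows as 20**n (20, 400, 8000, ...). This growth is one practical reason to keep n small and to train a separate classifier per order rather than concatenating everything.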

4.
5.
Xiao X  Wu ZC  Chou KC 《PloS one》2011,6(6):e20592
Prediction of protein subcellular localization is a challenging problem, particularly when the system concerned contains both singleplex and multiplex proteins. In this paper, by introducing the "multi-label scale" and hybridizing gene ontology information with sequential evolution information, we develop a novel predictor called iLoc-Gneg for predicting the subcellular localization of Gram-negative bacterial proteins with both single-location and multiple-location sites. To facilitate comparison, the same stringent benchmark dataset used to estimate the accuracy of Gneg-mPLoc was adopted to demonstrate the power of iLoc-Gneg. The dataset contains 1,392 Gram-negative bacterial proteins classified into the following eight locations: (1) cytoplasm, (2) extracellular, (3) fimbrium, (4) flagellum, (5) inner membrane, (6) nucleoid, (7) outer membrane, and (8) periplasm. Of the 1,392 proteins, 1,328 have only one subcellular location and the other 64 have two subcellular locations, but none of the proteins included has pairwise sequence identity to any other in the same subset (subcellular location). The overall success rate of iLoc-Gneg in the jackknife test on this stringent benchmark dataset was over 91%, about 6% higher than that of Gneg-mPLoc. As a user-friendly web server, iLoc-Gneg is freely accessible to the public at http://icpr.jci.edu.cn/bioinfo/iLoc-Gneg. A step-by-step guide is also provided on how to use the web server to obtain the desired results. Furthermore, for users' convenience, the iLoc-Gneg web server accepts batch job submission, which is not available in the existing version of the Gneg-mPLoc web server. It is anticipated that iLoc-Gneg may become a useful high-throughput tool for molecular cell biology, proteomics, systems biology, and drug development.
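The jackknife test mentioned above is leave-one-out cross-validation: each protein is predicted by a model built from all the other proteins. The sketch below makes three assumptions. First, labels are sets of locations, so multiplex proteins fit naturally. Second, a prediction counts only if the predicted set matches the true set exactly; this is a strict choice, and iLoc-Gneg's own multi-label scoring may differ. Third, the classifier is a hypothetical 1-nearest-neighbour stand-in, not iLoc-Gneg's GO- and evolution-based predictor.

```python
from math import dist

def nearest_neighbor_predict(train, x):
    """Hypothetical stand-in classifier: copy the location set of the closest training vector."""
    _, locations = min(train, key=lambda item: dist(item[0], x))
    return locations

def jackknife_success_rate(samples, predict=nearest_neighbor_predict):
    """Leave-one-out (jackknife) test over (feature_vector, frozenset_of_locations) pairs.

    Each sample is predicted by a model that has seen all the other samples; a
    prediction counts as correct only if it matches the true location set exactly.
    """
    hits = 0
    for i, (x, true_locations) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]
        if predict(train, x) == true_locations:
            hits += 1
    return hits / len(samples)
```

Because every protein is held out once, the jackknife result is deterministic for a given dataset. This is one reason it is often preferred to random k-fold splits when results are compared across papers.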

6.
MOTIVATION: Subcellular localization is a key functional characteristic of proteins. A fully automatic and reliable prediction system for protein subcellular localization is needed, especially for the analysis of large-scale genome sequences. RESULTS: In this paper, a support vector machine (SVM) is introduced to predict the subcellular localization of proteins from their amino acid composition. The total prediction accuracy reaches 91.4% for three subcellular locations in prokaryotic organisms and 79.4% for four locations in eukaryotic organisms. Predictions by our approach are robust to errors in the protein N-terminal sequences. This new approach provides better prediction performance than existing algorithms based on amino acid composition and can complement existing methods based on sorting signals. AVAILABILITY: A web server implementing the prediction method is available at http://www.bioinfo.tsinghua.edu.cn/SubLoc/. SUPPLEMENTARY INFORMATION: Supplementary material is available at http://www.bioinfo.tsinghua.edu.cn/SubLoc/.
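To make "SVM on amino acid composition" concrete, the sketch below trains a binary linear SVM on 20-dimensional composition vectors. It substitutes the Pegasos subgradient method, written from scratch, for the solver a real study would use (for example LIBSVM). It also assumes a linear kernel, a fixed visiting order, and toy sequences. None of these details come from the abstract. A multi-class predictor would combine several such binary classifiers.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aa_composition(seq):
    """20-dimensional amino acid composition (fraction of each standard residue)."""
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * 20

def train_linear_svm(X, y, lam=0.01, epochs=500):
    """Pegasos-style subgradient training of a linear SVM (hinge loss, L2 penalty).

    X: list of feature vectors; y: labels in {-1, +1}. A constant 1.0 feature is
    appended so the last weight acts as a (regularized) bias term.
    """
    Xb = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    t = 0
    for _ in range(epochs):
        for x, label in zip(Xb, y):  # fixed order keeps the sketch deterministic
            t += 1
            eta = 1.0 / (lam * t)
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1 - eta * lam) * wi for wi in w]  # shrink: gradient of the L2 term
            if margin < 1:  # hinge loss is active: step toward the example
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w

def predict(w, x):
    """Sign of the linear decision function, with the bias feature appended."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1

if __name__ == "__main__":
    seqs = ["LLLLLIIIII", "LLLIIIVVVA", "KKKKKEEEEE", "KKKEEEDDDR"]  # toy data
    labels = [1, 1, -1, -1]
    w = train_linear_svm([aa_composition(s) for s in seqs], labels)
    print([predict(w, aa_composition(s)) for s in seqs])
```

Composition ignores residue order entirely. This is consistent with the abstract's observation that such predictions are robust to errors in the N-terminal sequence, since a few wrong residues barely change a 20-value frequency vector.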

7.
Chang JM  Su EC  Lo A  Chiu HS  Sung TY  Hsu WL 《Proteins》2008,72(2):693-710
Prediction of protein subcellular localization (PSL) is important for genome annotation, protein function prediction, and drug discovery. Many computational approaches to PSL prediction from protein sequences have been proposed in recent years for Gram-negative bacteria. We present PSLDoc, a method based on gapped-dipeptides and probabilistic latent semantic analysis (PLSA), to solve this problem. A protein is treated as a term string composed of gapped-dipeptides, which are defined as any two residues separated by one or more positions. The gapped-dipeptides are weighted according to a position-specific score matrix, which incorporates sequence evolutionary information. PLSA is then applied for feature reduction, and the reduced vectors are input to five one-versus-rest support vector machine classifiers. The localization site with the highest probability is assigned as the final prediction. A strong correlation between sequence homology and subcellular localization has been reported (Nair and Rost, Protein Sci 2002;11:2836-2847; Yu et al., Proteins 2006;64:643-651). To evaluate the performance of PSLDoc properly, a target protein can be assigned to a low- or a high-homology data set. PSLDoc's overall accuracy on the low- and high-homology data sets reaches 86.84% and 98.21%, respectively, which compares favorably with that of CELLO II (Yu et al., Proteins 2006;64:643-651). In addition, we set a confidence threshold to achieve high precision at specified levels of recall. When the confidence threshold is set at 0.7, PSLDoc achieves a precision of 97.89%, considerably better than that of PSORTb v.2.0 (Gardy et al., Bioinformatics 2005;21:617-623). Our approach demonstrates that this feature representation for proteins can be successfully applied to the prediction of protein subcellular localization and improves prediction accuracy. Moreover, because the representation is general, our method can be extended to eukaryotic proteomes in the future. The PSLDoc web server is publicly available at http://bio-cluster.iis.sinica.edu.tw/~bioapp/PSLDoc/.
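A gapped-dipeptide, as defined above, is two residues separated by one or more positions. The sketch below extracts these "terms" from a sequence, writing each as `A<d>B`, where d is the number of intervening residues. It makes two simplifying assumptions. First, the maximum gap (`max_gap`) is a hypothetical parameter. Second, terms are simply counted, whereas PSLDoc weights them using a position-specific score matrix; that weighting and the PLSA step are omitted here.

```python
from collections import Counter

def gapped_dipeptides(seq, max_gap=3):
    """Count gapped-dipeptide terms 'A<d>B': residue A, then d intervening residues, then B."""
    terms = Counter()
    for d in range(1, max_gap + 1):
        # The pair (i, i + d + 1) must fit inside the sequence.
        for i in range(len(seq) - d - 1):
            terms[f"{seq[i]}{d}{seq[i + d + 1]}"] += 1
    return terms
```

For example, `gapped_dipeptides("MKTAY", 3)` yields the terms M1T, K1A, T1Y, M2A, K2Y, and M3Y, once each. Treating these terms like words in a document is what lets a text-mining technique such as PLSA be applied to proteins.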

8.
Mimicking cellular sorting improves prediction of subcellular localization   Times cited: 27 (self-citations: 0; citations by others: 27)
Predicting the native subcellular compartment of a protein is an important step toward elucidating its function. Here we introduce LOCtree, a hierarchical system combining support vector machines (SVMs) and other prediction methods. LOCtree predicts the subcellular compartment of a protein by mimicking the mechanism of cellular sorting and exploiting a variety of sequence and predicted structural features in its input. Currently, LOCtree does not predict localization for membrane proteins, since the compositional properties of membrane proteins differ significantly from those of non-membrane proteins. Although any information about function can be used by the system, we present performance estimates that are valid when only the amino acid sequence of a protein is known. When evaluated on a non-redundant test set, LOCtree achieved sustained levels of 74% accuracy for non-plant eukaryotes, 70% for plants, and 84% for prokaryotes. We rigorously benchmarked LOCtree against the best alternative methods for localization prediction, and LOCtree outperformed all other methods in nearly all benchmarks. Localization assignments made with LOCtree agreed quite well with data from recent large-scale experiments. Our preliminary analysis of a few completely sequenced organisms, namely human (Homo sapiens), yeast (Saccharomyces cerevisiae), and weed (Arabidopsis thaliana), suggested that over 35% of all non-membrane proteins are nuclear, about 20% are retained in the cytosol, and every fifth protein in the weed resides in the chloroplast.
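"Mimicking cellular sorting" means deciding a protein's fate through a sequence of binary questions, the way the cell routes it. The sketch below walks such a tree. The tree shape, the feature names (`signal_peptide`, `nuclear_score`, `mito_score`), and the 0.5 cutoffs are all hypothetical. In LOCtree each decision is made by a trained SVM, and the actual branches, which differ by kingdom, are not given in the abstract.

```python
def classify(tree, protein):
    """Walk a sorting-style decision tree from the root to a leaf compartment.

    A node is either a leaf (a compartment name) or a tuple
    (question, predicate, subtree_if_yes, subtree_if_no).
    """
    while not isinstance(tree, str):
        _question, predicate, if_yes, if_no = tree
        tree = if_yes if predicate(protein) else if_no
    return tree

# Hypothetical tree; each lambda stands in for a trained binary classifier.
SORTING_TREE = (
    "secretory pathway?", lambda p: p["signal_peptide"],
    "extracellular",
    ("nuclear?", lambda p: p["nuclear_score"] > 0.5,
     "nucleus",
     ("mitochondrial?", lambda p: p["mito_score"] > 0.5,
      "mitochondrion",
      "cytosol")),
)
```

One design consequence is that each node only has to solve a two-way problem. Errors, however, propagate: a protein misrouted at the root can never reach the correct leaf.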

9.
The genome sequence of Bacillus subtilis was published in 1997, and since then many other bacterial genomes have been sequenced, among them that of Bacillus licheniformis in 2004. B. subtilis and B. licheniformis are closely related and share similar saprophytic lifestyles in the soil. Both species can secrete numerous proteins into the surrounding medium, enabling them to use high-molecular-weight substances, which are abundant in soils, as nutrient sources. The availability of complete genome sequences allows prediction of the proteins that contain signals for secretion into the extracellular milieu, as well as of the proteins that form the secretion machinery needed for protein translocation across the cytoplasmic membrane. Proteomics is the method of choice for confirming the predicted subcellular localization of proteins. The extracellular proteomes of B. subtilis and B. licheniformis have been analyzed under different growth conditions, allowing comparisons of the extracellular proteomes and conclusions about similarities and differences in the protein secretion mechanisms of the two species.

10.
K Nakai  M Kanehisa 《Proteins》1991,11(2):95-110
We have developed an expert system that uses various kinds of knowledge, organized as "if-then" rules, to predict protein localization sites in Gram-negative bacteria from amino acid sequence information alone. We considered four localization sites: the cytoplasm, the inner (cytoplasmic) membrane, the periplasm, and the outer membrane. Most rules were derived from experimental observations. For example, an inner membrane protein is recognized by the presence of either a hydrophobic stretch in the predicted mature protein or an uncleavable N-terminal signal sequence. Lipoproteins are first recognized by a consensus pattern and are then assumed to reside in either the inner or the outer membrane. These two possibilities are further discriminated by examining an acidic residue in the mature N-terminal portion. Furthermore, we found an empirical rule by which periplasmic and outer membrane proteins were successfully discriminated by their different amino acid compositions. Overall, our system correctly predicted 83% of the localization sites of proteins in our database.
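A minimal if-then cascade in the spirit of the rules above can be sketched as follows. It assumes several things:

- a simplified lipobox consensus `[LVI][ASTVI][GAS]C` searched in the first 40 residues;
- "Asp right after the lipid-modified cysteine" as the acidic-residue test;
- Kyte-Doolittle hydropathy with a 19-residue window and a 1.6 cutoff as the hydrophobic-stretch test.

The signal-sequence rule and the periplasm-versus-outer-membrane composition rule are not modeled, so this toy never predicts "periplasm". It is not PSORT's actual rule set.

```python
import re

# Kyte-Doolittle hydropathy values.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}

# Simplified lipobox consensus (an assumption, not PSORT's exact pattern).
LIPOBOX = re.compile(r"[LVI][ASTVI][GAS]C")

def has_hydrophobic_stretch(seq, window=19, threshold=1.6):
    """True if any window's mean hydropathy exceeds the threshold."""
    for i in range(len(seq) - window + 1):
        segment = seq[i:i + window]
        if sum(KD.get(a, 0.0) for a in segment) / window > threshold:
            return True
    return False

def predict_site(seq):
    """Toy rule cascade: lipoprotein rules first, then the hydrophobic-stretch rule."""
    match = LIPOBOX.search(seq[:40])
    if match:
        cys = match.end() - 1          # lipid-modified cysteine (+1 of the mature protein)
        plus2 = seq[cys + 1:cys + 2]   # the residue right after it (+2)
        return "inner membrane" if plus2 == "D" else "outer membrane"
    if has_hydrophobic_stretch(seq):
        return "inner membrane"
    return "cytoplasm"
```

Rule order matters in such a system. The lipoprotein check runs first so that the hydrophobic signal peptide of a lipoprotein is not mistaken for a transmembrane segment.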

11.
Automated prediction of bacterial protein subcellular localization is an important tool for genome annotation and drug discovery. PSORT has been one of the most widely used computational methods for such bacterial protein analysis; however, it has not been updated since its introduction in 1991. In addition, neither PSORT nor any other available computational method makes predictions for all five localization sites characteristic of Gram-negative bacteria. Here we present PSORT-B, an updated version of PSORT for Gram-negative bacteria, which is available as a web-based application at http://www.psort.org. PSORT-B examines a given protein sequence for amino acid composition, similarity to proteins of known localization, presence of a signal peptide, transmembrane alpha-helices, and motifs corresponding to specific localizations. A probabilistic method integrates these analyses and returns a list of the five possible localization sites with associated probability scores. PSORT-B, designed to favor high precision (specificity) over high recall (sensitivity), attained an overall precision of 97% and recall of 75% in 5-fold cross-validation tests, using a dataset we developed of 1443 proteins of experimentally known localization. This dataset, the largest of its kind, is freely available, along with the PSORT-B source code (under the GNU General Public License).
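Favoring precision over recall usually means that the predictor abstains when it is not confident. The sketch below shows the generic mechanism. It reports the top-scoring site only if that site's probability reaches a cutoff, and otherwise returns "Unknown". Precision is then computed over the proteins that received a call, and recall over all proteins. The cutoff values and site names are hypothetical, and PSORT-B's own decision logic is not described in the abstract.

```python
def predict_with_threshold(site_probs, threshold):
    """Return the top-scoring site, or 'Unknown' if its probability is below the threshold."""
    site, p = max(site_probs.items(), key=lambda kv: kv[1])
    return site if p >= threshold else "Unknown"

def precision_recall(true_sites, predictions):
    """Precision over proteins that received a call; recall over all proteins."""
    called = [(t, p) for t, p in zip(true_sites, predictions) if p != "Unknown"]
    correct = sum(1 for t, p in called if t == p)
    precision = correct / len(called) if called else 0.0
    recall = correct / len(true_sites) if true_sites else 0.0
    return precision, recall

if __name__ == "__main__":
    probs = [{"cytoplasmic": 0.9, "periplasmic": 0.1},
             {"periplasmic": 0.55, "outer membrane": 0.45},
             {"extracellular": 0.8, "outer membrane": 0.2}]
    truth = ["cytoplasmic", "outer membrane", "extracellular"]
    for cutoff in (0.0, 0.7):
        preds = [predict_with_threshold(p, cutoff) for p in probs]
        print(cutoff, precision_recall(truth, preds))
```

In this toy run, raising the cutoff removes the one uncertain (and wrong) call. Precision therefore rises from 2/3 to 1.0, while recall stays at 2/3.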

12.

Background

Knowing the subcellular localization of a new protein sequence is very important for understanding its function. As the number of new genomes has increased dramatically in recent years, a reliable and efficient system for predicting protein subcellular location is urgently needed.

Results

Esub8 was developed to predict the subcellular localizations of eukaryotic proteins based on amino acid composition. In this research, proteins are classified into the following eight groups: chloroplast, cytoplasm, extracellular, Golgi apparatus, lysosome, mitochondria, nucleus, and peroxisome. Because subcellular localization is a typical classification problem, a one-against-one (1-v-1) multi-class support vector machine was introduced to construct the classifier. Unlike previous methods, ours also takes the order information of protein sequences into account, in a different way. Our method was tested on three-location prediction for prokaryotic proteins and four-location prediction for eukaryotic proteins on Reinhardt's dataset, and the results were compared with those of several other methods. The total prediction accuracies of the two tests are both 100% by the self-consistency test, and 92.9% and 84.14%, respectively, by the jackknife test. Esub8 also performs well: its total prediction accuracy is 100% by the self-consistency test and 87% by the jackknife test.

Conclusions

Our method represents a different approach to predicting protein subcellular localization and achieved satisfactory results; furthermore, we believe Esub8 will be a useful tool for predicting protein subcellular localizations in eukaryotic organisms.
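A one-against-one scheme with eight classes trains 8 × 7 / 2 = 28 binary classifiers, one per pair of compartments. It then combines their decisions, most commonly by voting, which is what this sketch assumes. The binary classifiers here are hypothetical nearest-centroid stand-ins rather than SVMs. Vote ties are broken by class order, which is an arbitrary but deterministic choice.

```python
from collections import Counter
from itertools import combinations

def one_vs_one_predict(x, classes, pairwise):
    """Majority vote over the k*(k-1)/2 pairwise classifiers in `pairwise`.

    `pairwise[(a, b)]` is a function of x that returns either a or b.
    Ties go to the class listed first in `classes`.
    """
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[pairwise[(a, b)](x)] += 1
    return max(classes, key=lambda c: votes[c])

def make_centroid_pairwise(centroids):
    """Hypothetical binary classifiers: pick the nearer of two class centroids."""
    def classifier_for(a, b):
        def decide(x):
            da = sum((xi - ci) ** 2 for xi, ci in zip(x, centroids[a]))
            db = sum((xi - ci) ** 2 for xi, ci in zip(x, centroids[b]))
            return a if da <= db else b
        return decide
    return {(a, b): classifier_for(a, b) for a, b in combinations(list(centroids), 2)}
```

Usage: build `pairwise = make_centroid_pairwise(centroids)` and call `one_vs_one_predict(x, list(centroids), pairwise)`. Each binary problem sees only two compartments' training data, which keeps the individual SVMs small even when the number of classes grows.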

13.
Subcellular location is an important functional annotation of proteins. An automatic, reliable, and efficient prediction system for protein subcellular localization is necessary for large-scale genome analysis. This paper describes a protein subcellular localization method that extracts features from protein profiles rather than from amino acid sequences. A protein profile represents a protein family and discards the part of the sequence information that is not conserved throughout the family; it is therefore more sensitive than the amino acid sequence. The amino acid compositions of the whole profile and of the profile's N-terminus are extracted to train and test probabilistic neural network classifiers. On two benchmark datasets, the overall accuracies of the proposed method reach 89.1% and 68.9%, respectively. The prediction results show that the proposed method performs better than methods based on amino acid sequences. Its prediction results are also compared with those of SubLoc on two redundancy-reduced datasets.
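The two feature sets described above can be sketched as follows. Take a profile as a list of per-position rows of 20 residue frequencies. The "composition" of the whole profile, or of its N-terminus, is then the mean row over the corresponding positions. A probabilistic neural network (PNN) is essentially a Parzen-window classifier: it averages a Gaussian kernel over each class's training vectors and picks the class with the largest score. The N-terminal length (20) and the kernel width `sigma` are hypothetical, since the abstract gives neither.

```python
from math import exp

def profile_features(profile, n_terminal=20):
    """Mean residue-frequency vector over the whole profile, then over its first rows.

    `profile` is a list of per-position rows, each with 20 amino acid frequencies.
    """
    def mean_rows(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    return mean_rows(profile) + mean_rows(profile[:n_terminal])

def pnn_classify(x, training, sigma=0.1):
    """PNN: per-class average Gaussian kernel to the training vectors; pick the largest."""
    scores = {}
    for label, vectors in training.items():
        kernel_sum = sum(exp(-sum((a - b) ** 2 for a, b in zip(x, v)) / (2 * sigma ** 2))
                         for v in vectors)
        scores[label] = kernel_sum / len(vectors)
    return max(scores, key=scores.get)
```

A PNN has no iterative training: "training" just stores the vectors. The main tuning choice is `sigma`. A small value behaves like nearest-neighbour classification, while a large value smooths each class toward its mean.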

14.
Prediction of protein subcellular localization   Times cited: 6 (self-citations: 0; citations by others: 6)
Yu CS  Chen YC  Lu CH  Hwang JK 《Proteins》2006,64(3):643-651
Because a protein's function is usually related to its subcellular localization, the ability to predict subcellular localization directly from protein sequences is useful for inferring protein function. Recent years have seen surging interest in the development of novel computational tools to predict subcellular localization. At present, these approaches, based on a wide range of algorithms, have achieved varying degrees of success for specific organisms and for certain localization categories. A number of authors have noticed that sequence similarity is useful in predicting subcellular localization. For example, Nair and Rost (Protein Sci 2002;11:2836-2847) carried out an extensive analysis of the relation between sequence similarity and identity in subcellular localization and found a close relationship between them above a certain similarity threshold. However, many existing benchmark data sets used to assess prediction accuracy contain highly homologous sequences; some comprise sequences with up to 80-90% sequence identity. Using these benchmark test data will inevitably lead to overestimation of the performance of the methods considered. Here, we develop an approach based on a two-level support vector machine (SVM) system: the first level comprises a number of SVM classifiers, each based on a specific type of feature vector derived from sequences; the second-level SVM classifier functions as a jury machine that generates the probability distribution of decisions over the possible localizations. We compare our approach with a global sequence alignment approach and other existing approaches on two benchmark data sets, one comprising prokaryotic sequences and the other eukaryotic sequences. Furthermore, we carried out all-against-all sequence alignment for several data sets to investigate the relationship between sequence homology and subcellular localization. Our results, which are consistent with previous studies, indicate that the homology search approach performs well down to 30% sequence identity, although its performance deteriorates considerably for sequences sharing lower sequence identity. A data set with high levels of homology will inevitably lead to a biased assessment of the performance of predictive approaches, especially those relying on homology search or sequence annotations. Our two-level SVM classification system does not rely on homology search; therefore, its performance remains relatively unaffected by sequence homology. Compared with other approaches, our approach performed significantly better. Furthermore, we also develop a practical hybrid method, which combines the two-level SVM classifier and the homology search method, as a general tool for the sequence annotation of subcellular localization.
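Statements such as "performs well down to 30% sequence identity" depend on how identity is computed from an alignment. The sketch below computes percent identity from a Needleman-Wunsch global alignment. It assumes simple scoring (match +1, mismatch -1, linear gap -1) and defines identity as identical aligned pairs divided by alignment length, gaps included. Real studies typically use substitution matrices such as BLOSUM and affine gap penalties, and other identity definitions (for example, dividing by the shorter sequence) give different numbers.

```python
def global_identity(a, b, match=1, mismatch=-1, gap=-1):
    """Percent identity from a Needleman-Wunsch global alignment of sequences a and b."""
    n, m = len(a), len(b)

    def sub(x, y):
        return match if x == y else mismatch

    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner, preferring the diagonal move on ties.
    i, j, identical, length = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            if a[i - 1] == b[j - 1]:
                identical += 1
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        length += 1
    return 100.0 * identical / length if length else 0.0
```

For example, `global_identity("ACDE", "ACDF")` returns 75.0. Running this over all pairs in a data set gives the all-against-all identity distribution used to judge how homologous a benchmark is.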

15.
Comprehensive knowledge of protein subcellular localization is an important step toward understanding protein function. Recent advances in systems biology have allowed more accurate methods to be developed for characterizing proteins at the subcellular localization level. In this study, an analysis method was developed to characterize the topological and biological properties of cytoplasmic, inner membrane, outer membrane, and periplasmic proteins in Escherichia coli (E. coli). Statistically significant differences were found in all topological and biological properties among proteins in different subcellular localizations. In addition, we analyzed the differences in the compositions of the 20 amino acids among the four protein categories and found significant differences in all 20 amino acid compositions. These findings may be helpful for understanding the relationship between protein subcellular localization and biological function.
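The abstract does not name the statistical test used. As one assumption-light way to check whether a property (for example, the fraction of leucine) differs between two localization groups, the sketch below implements an exact two-sided permutation test on the difference in group means. It enumerates every way of relabelling the pooled values, so it is only practical for small groups. The example values are hypothetical.

```python
from itertools import combinations

def exact_permutation_test(group_a, group_b):
    """Two-sided exact permutation test on the difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    def mean(values):
        return sum(values) / len(values)

    observed = abs(mean(group_a) - mean(group_b))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(mean(a) - mean(b)) >= observed - 1e-12:  # tolerance for float rounding
            extreme += 1
    return extreme / total
```

With two completely separated groups of four values, for example `[0.10, 0.11, 0.12, 0.13]` versus `[0.20, 0.21, 0.22, 0.23]`, only the observed split and its mirror image are as extreme. The p-value is therefore 2/70, about 0.029.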

16.
Chimeras created by fusing the monomeric red fluorescent protein (RFP) to a bacterial lipoprotein signal peptide (lipoRFPs) were visualized in the cell envelope by epifluorescence microscopy. Plasmolysis of the bacteria separated the inner and outer membranes, allowing the specific subcellular localization of lipoRFPs to be determined in situ. When equipped with the canonical inner membrane lipoprotein retention signal CDSR, lipoRFP was located in the inner membrane in Escherichia coli, whereas the outer membrane sorting signal CSSR caused lipoRFP to localize to the outer membrane. CFSR-RFP was also routed to the outer membrane, but CFNSR-RFP was located in the inner membrane, consistent with previous data showing that this sequence functions as an inner membrane retention signal. These four lipoproteins exhibited identical localization patterns in a panel of members of the family Enterobacteriaceae, showing that the lipoprotein sorting rules are conserved in these bacteria and validating the use of E. coli as a model system. Although most predicted inner membrane lipoproteins in these bacteria have an aspartate residue after the fatty acylated N-terminal cysteine residue, alternative signals such as CFN can and probably do function in parallel, as indicated by the existence of putative inner membrane lipoproteins with this sequence at their N termini.
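The sorting signals tested in this abstract can be written down as a small lookup. Asp at position +2 (as in CDSR) or the CFN motif retains the lipoprotein in the inner membrane. Otherwise (as in CSSR and CFSR), the lipoprotein goes to the outer membrane. This is a hedged sketch that encodes only those reported cases. Actual sorting also depends on other determinants, so it is not a general predictor.

```python
def lipoprotein_destination(mature_n_terminus):
    """Predict inner vs. outer membrane from the first residues of a mature lipoprotein.

    Encodes only the signals reported in the abstract: Asp at +2 (e.g. CDSR) or the
    CFN motif means inner membrane retention; other sequences (e.g. CSSR, CFSR) are
    treated as outer membrane.
    """
    seq = mature_n_terminus.upper()
    if not seq.startswith("C"):
        raise ValueError("a mature lipoprotein should start with the lipid-modified Cys (+1)")
    if seq[1:2] == "D" or seq.startswith("CFN"):
        return "inner membrane"
    return "outer membrane"
```

Checking the four constructs from the abstract, CDSR and CFNSR map to the inner membrane, while CSSR and CFSR map to the outer membrane.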

17.
The identification of exported proteins with gene fusions to invasin   Times cited: 2 (self-citations: 0; citations by others: 2)
Exported proteins are integral to understanding the biology of bacterial organisms. They have special significance in pathogenesis research because they can mediate critical interactions between pathogens and eukaryotic cell surfaces. Furthermore, they frequently serve as targets for vaccines and diagnostic tests. The commonly used genetic assays for identifying exported proteins use fusions to alkaline phosphatase or beta-lactamase. These systems are not ideal for identifying outer membrane proteins because they also identify a large number of inner membrane proteins. We addressed this problem by developing a gene fusion system that preferentially identifies proteins that contain cleavable signal sequences and are released from the inner membrane. This system selects fusions that restore outer membrane localization to an amino-terminally truncated Yersinia pseudotuberculosis invasin derivative. In the present study, a variety of Salmonella typhimurium proteins that localize beyond the inner membrane were identified with gene fusions to this invasin derivative. Previously undescribed proteins identified include ones that share homology with components of fimbrial operons, multidrug resistance efflux pumps, and a haemolysin. All of the positive clones analysed contain cleavable signal sequences. Moreover, over 40% of the genes identified encode putative outer membrane proteins. This system has several features that may make it especially useful in the study of genetically intractable organisms.

18.
J M Gennity  H Kim    M Inouye 《Journal of bacteriology》1992,174(7):2095-2101
The lipid-modified nine-residue amino-terminal sequence of the mature form of the major outer membrane lipoprotein of Escherichia coli contains information that is responsible for sorting to either the inner or outer membrane. Fusion of this sorting sequence to beta-lactamase is sufficient for localization of the resultant lipo-beta-lactamase to the outer membrane (J. Ghrayeb and M. Inouye, J. Biol. Chem. 259:463-467, 1984). Substitution of the serine adjacent to the amino-terminal lipid-modified cysteine residue of the sorting sequence with the negatively charged residue aspartate causes inner membrane localization (K. Yamaguchi, F. Yu, and M. Inouye, Cell 53:423-432, 1988). Fusion of the aspartate-containing nine-residue inner membrane localization signal to the normally outer membrane lipoprotein bacteriocin release protein does cause partial localization to the inner membrane. However, a single replacement of the glutamine adjacent to the amino-terminal lipid-modified cysteine residue of bacteriocin release protein with aspartate causes no inner membrane localization. Therefore, an aspartate residue itself lacks the information necessary for inner membrane sorting when removed from the structural context provided by the additional eight residues of the sorting sequence. Although the aspartate-containing inner membrane sorting sequence causes an almost quantitative localization to the inner membrane when fused to the otherwise soluble protein beta-lactamase, this sequence cannot prevent significant outer membrane localization when fused to proteins (bacteriocin release protein and OmpA) normally found in the outer membrane. Therefore, structural determinants in addition to the amino-terminal sorting sequence influence the membrane localization of lipoproteins.

19.
Porphyromonas gingivalis secretes gingipains, endopeptidases that are important virulence factors of this bacterium. Gingipains are transported across the inner membrane via the Sec system and then across the outer membrane via an unidentified pathway. This latter transport step has been suggested to be mediated by a novel protein secretion pathway. In the present study, we report a novel candidate essential factor for this latter transport step. The PG0027 gene of P. gingivalis W83 encodes a novel protein, PG27. In a PG0027 deletion mutant (83K10), the activities of Arg-gingipain and Lys-gingipain were severely reduced, whereas the activities of the secreted exopeptidases DPPIV, DPP-7, and PTP-A were unaffected. Protein localization was investigated by cell-surface biotinylation, subcellular fractionation, and immunoblot analysis. In wild-type W83, Arg-gingipains in the membrane fraction were detected as cell-surface proteins. In contrast, in 83K10, Arg-gingipains were trapped in the periplasm and hardly secreted into the extracellular milieu. A cell-surface biotinylation experiment suggested that PG27 is exposed on the cell surface; however, subcellular fractionation experiments detected PG27 in both the inner and outer membrane fractions. Taken together, these results suggest that PG27 is a unique membrane protein essential for a novel secretion pathway.

20.
Many species of Gram-negative bacteria are pathogens that can cause disease in a host organism. This pathogenic capability is usually associated with certain components of Gram-negative cells. Therefore, an automated method for fast and reliable prediction of Gram-negative protein subcellular location would allow us not only to annotate gene products in a timely manner but also to screen candidates for drug discovery. However, protein subcellular location prediction is a very difficult problem, particularly when more location sites are involved and when query proteins have no significant homology to proteins of known subcellular location. PSORT-B, a recently updated version of PSORT widely used for predicting Gram-negative protein subcellular location, covers only five location sites. In addition, the data set used to train PSORT-B contains many proteins with high degrees of sequence identity within the same location group and may therefore carry a strong homology bias. To overcome these problems, a new predictor called "Gneg-PLoc" has been developed. It fuses many basic classifiers, each trained on a stringent data set containing proteins with strictly less than 25% sequence identity to one another within the same location group, and it covers eight subcellular locations: cytoplasm, extracellular space, fimbrium, flagellum, inner membrane, nucleoid, outer membrane, and periplasm. Compared with PSORT-B, the new predictor not only covers more subcellular locations but also yields remarkably higher success rates. Gneg-PLoc is available as a web server at http://202.120.37.186/bioinf/Gneg. To meet the needs of researchers in related areas, a downloadable file is provided at the same website listing the results identified by Gneg-PLoc for 49,907 Gram-negative protein entries in the Swiss-Prot database that have no subcellular location annotations or are annotated with uncertain terms. These large-scale results will be updated twice a year to cover new Gram-negative bacterial protein entries and to reflect new developments in Gneg-PLoc.
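Gneg-PLoc is described as fusing many basic classifiers, but the abstract does not specify the fusion rule. The sketch below assumes the simplest option, weighted majority voting over the labels proposed by the individual classifiers. The weights and the tie-breaking rule (the label proposed first wins) are hypothetical choices.

```python
from collections import Counter

def fuse_predictions(predictions, weights=None):
    """Combine the labels proposed by several basic classifiers by (weighted) voting.

    Ties go to the label proposed first, an arbitrary but deterministic choice.
    """
    weights = weights or [1.0] * len(predictions)
    tally = Counter()
    first_seen = {}
    for i, (label, w) in enumerate(zip(predictions, weights)):
        tally[label] += w
        first_seen.setdefault(label, i)
    return max(tally, key=lambda lab: (tally[lab], -first_seen[lab]))
```

For example, `fuse_predictions(["periplasm", "cytoplasm", "periplasm"])` returns "periplasm". Fusion helps mainly when the basic classifiers make different mistakes, which is why ensembles are usually built from classifiers trained on different features or parameter settings.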

