Similar literature
 20 similar articles found (search time: 31 ms)
1.
Huang WL  Tung CW  Huang HL  Hwang SF  Ho SY 《Bio Systems》2007,90(2):573-581
Accurate prediction of protein subnuclear localization relies on the cooperation between informative features and classifier design. Support vector machine (SVM) based learning methods have been shown to be effective for predicting protein subcellular and subnuclear localizations. This study proposes an evolutionary support vector machine (ESVM) based classifier with automatic selection from a large set of physicochemical composition (PCC) features to design an accurate system for predicting protein subnuclear localization, named ProLoc. ESVM, which combines an inheritable genetic algorithm with SVM, can automatically determine the best number m of PCC features and simultaneously identify m out of 526 PCC features. To evaluate ESVM, this study uses two datasets, SNL6 and SNL9, which have 504 proteins localized in 6 subnuclear compartments and 370 proteins localized in 9 subnuclear compartments, respectively. Using leave-one-out cross-validation, ProLoc with the selected m=33 and m=28 PCC features achieves accuracies of 56.37% for SNL6 and 72.82% for SNL9, respectively. These are better than the 51.4% of the SVM-based system using k-peptide composition features on SNL6 and the 64.32% of an optimized evidence-theoretic k-nearest neighbor classifier using pseudo amino acid composition on SNL9.

2.
Li L  Zhang Y  Zou L  Li C  Yu B  Zheng X  Zhou Y 《PloS one》2012,7(1):e31057
With the rapid increase of protein sequences in the post-genomic age, it is challenging to develop accurate and automated methods for reliably and quickly predicting their subcellular localizations. Many approaches have been tried so far, but most of them use only a single algorithm. In this paper, we propose an ensemble classifier that combines the KNN (k-nearest neighbor) and SVM (support vector machine) algorithms through a voting system to predict the subcellular localization of eukaryotic proteins. With the one-versus-one strategy, the overall prediction accuracies are 78.17%, 89.94% and 75.55% for three benchmark datasets of eukaryotic proteins. The improved prediction accuracies reveal that GO annotations and hydrophobicity of amino acids help to predict subcellular locations of eukaryotic proteins.
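The voting step of such an ensemble can be illustrated with a minimal sketch. The abstract does not state how votes are weighted or how ties are resolved, so the unweighted majority vote and the "first prediction wins" tie-break below are assumptions, and `majority_vote` is a hypothetical helper name rather than anything from the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the most base classifiers.

    `predictions` holds one predicted label per base classifier
    (for example, several KNN and SVM models). Ties are broken in
    favour of the label that appears first in the list (an assumption).
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


# Three hypothetical base classifiers vote on one protein.
print(majority_vote(["nucleus", "cytoplasm", "nucleus"]))
```

A weighted vote, for instance by each classifier's cross-validation accuracy, would be an equally plausible reading of "voting system".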

3.
Predicting subcellular localization with AdaBoost Learner (cited 1 time: 0 self-citations, 1 by others)
Protein subcellular localization, which tells where a protein resides in a cell, is an important characteristic of a protein and relates closely to its function. The prediction of subcellular localization plays an important role in the prediction of protein function, genome annotation and drug design. Therefore, predicting subcellular localization with a bioinformatics approach is an important and challenging task. In this paper, a robust predictor, AdaBoost Learner, is introduced to predict protein subcellular localization based on amino acid composition. Jackknife cross-validation and an independent dataset test were used to demonstrate that AdaBoost is a robust and efficient model for predicting protein subcellular localization. The correct prediction rates were 74.98% and 80.12% for the jackknife test and the independent dataset test, respectively, which are higher than those of other existing predictors. An online server for predicting subcellular localization of proteins based on the AdaBoost classifier was made available at http://chemdata.shu.edu.cn/sl12.

4.
Measuring the properties of endogenous cell proteins, such as expression level, subcellular localization, and turnover rates, on a whole proteome level remains a major challenge in the postgenome era. Quantitative methods for measuring mRNA expression do not reliably predict corresponding protein levels and provide little or no information on other protein properties. Here we describe a combined pulse-labeling, spatial proteomics and data analysis strategy to characterize the expression, localization, synthesis, degradation, and turnover rates of endogenously expressed, untagged human proteins in different subcellular compartments. Using quantitative mass spectrometry and stable isotope labeling with amino acids in cell culture, a total of 80,098 peptides from 8,041 HeLa proteins were quantified, and their spatial distribution between the cytoplasm, nucleus and nucleolus determined and visualized using specialized software tools developed in PepTracker. Using information from ion intensities and rates of change in isotope ratios, protein abundance levels and protein synthesis, degradation and turnover rates were calculated for the whole cell and for the respective cytoplasmic, nuclear, and nucleolar compartments. Expression levels of endogenous HeLa proteins varied by up to seven orders of magnitude. The average turnover rate for HeLa proteins was ~20 h. Turnover rate did not correlate with either molecular weight or net charge, but did correlate with abundance, with highly abundant proteins showing longer than average half-lives. Fast turnover proteins had overall a higher frequency of PEST motifs than slow turnover proteins but no general correlation was observed between amino or carboxyl terminal amino acid identities and turnover rates. A subset of proteins was identified that exist in pools with different turnover rates depending on their subcellular localization. 
This strongly correlated with subunits of large, multiprotein complexes, suggesting a general mechanism whereby their assembly is controlled in a different subcellular location to their main site of function.

5.
MOTIVATION: Subcellular localization is a key functional characteristic of proteins. A fully automatic and reliable prediction system for protein subcellular localization is needed, especially for the analysis of large-scale genome sequences. RESULTS: In this paper, Support Vector Machine has been introduced to predict the subcellular localization of proteins from their amino acid compositions. The total prediction accuracies reach 91.4% for three subcellular locations in prokaryotic organisms and 79.4% for four locations in eukaryotic organisms. Predictions by our approach are robust to errors in the protein N-terminal sequences. This new approach provides superior prediction performance compared with existing algorithms based on amino acid composition and can be a complementary method to other existing methods based on sorting signals. AVAILABILITY: A web server implementing the prediction method is available at http://www.bioinfo.tsinghua.edu.cn/SubLoc/. SUPPLEMENTARY INFORMATION: Supplementary material is available at http://www.bioinfo.tsinghua.edu.cn/SubLoc/.

6.
Here we report a systematic approach for predicting subcellular localization (cytoplasm, mitochondrial, nuclear, and plasma membrane) of human proteins. First, support vector machine (SVM)-based modules for predicting subcellular localization using traditional amino acid and dipeptide (i + 1) composition achieved overall accuracy of 76.6 and 77.8%, respectively. PSI-BLAST, when carried out using a similarity-based search against a nonredundant database of experimentally annotated proteins, yielded 73.3% accuracy. To gain further insight, a hybrid module (hybrid1) was developed based on amino acid composition, dipeptide composition, and similarity information and attained better accuracy of 84.9%. In addition, SVM modules based on different higher-order dipeptides, i.e., i + 2, i + 3, and i + 4, were also constructed for the prediction of subcellular localization of human proteins, and overall accuracy of 79.7, 77.5, and 77.1% was accomplished, respectively. Furthermore, another SVM module hybrid2 was developed using traditional dipeptide (i + 1) and higher-order dipeptide (i + 2, i + 3, and i + 4) compositions, which gave an overall accuracy of 81.3%. We also developed SVM module hybrid3 based on amino acid composition, traditional and higher-order dipeptide compositions, and PSI-BLAST output and achieved an overall accuracy of 84.4%. A Web server HSLPred (www.imtech.res.in/raghava/hslpred/ or bioinformatics.uams.edu/raghava/hslpred/) has been designed to predict subcellular localization of human proteins using the above approaches.
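The traditional (i + 1) and higher-order (i + 2, i + 3, i + 4) dipeptide compositions can be sketched as one function with a gap parameter. This is a rough illustration and not the HSLPred code: it assumes that "i + k" means pairing the residue at position i with the residue at position i + k, that pairs containing non-standard residues are skipped, and that counts are normalized by the number of valid pairs. The function name is hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order: AA, AC, AD, ...
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def gapped_dipeptide_composition(sequence, gap=1):
    """Return 400 frequencies of residue pairs at positions (i, i + gap).

    gap=1 gives the traditional dipeptide composition; gap=2, 3, 4
    give the higher-order variants.
    """
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    for i in range(len(sequence) - gap):
        pair = sequence[i] + sequence[i + gap]
        if pair in counts:  # skip pairs with non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(PAIRS)
    return [counts[p] / total for p in PAIRS]


features = gapped_dipeptide_composition("ACACA", gap=2)
print(len(features))  # one value per ordered residue pair
```

Concatenating the vectors for several gaps would give one possible input for a combined module such as hybrid2, although the abstract does not say how the features were actually combined.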

7.
This paper presents a composite multi-layer classifier system for predicting the subcellular localization of proteins based on their amino acid sequence. The work is an extension of our previous predictor PProwler v1.1 which is itself built upon the series of predictors SignalP and TargetP. In this study we outline experiments conducted to improve the classifier design. The major improvement came from using Support Vector machines as a "smart gate" sorting the outputs of several different targeting peptide detection networks. Our final model (PProwler v1.2) gives MCC values of 0.873 for non-plant and 0.849 for plant proteins. The model improves upon the accuracy of our previous subcellular localization predictor (PProwler v1.1) by 2% for plant data (which represents 7.5% improvement upon TargetP).

8.
Shi JY  Zhang SW  Pan Q  Zhou GP 《Amino acids》2008,35(2):321-327
In the post-genome age, there is an urgent need to develop reliable and effective computational methods to predict the subcellular localization of the rapidly growing number of newly found proteins. Here, a novel method of pseudo amino acid (PseAA) composition, the so-called “amino acid composition distribution” (AACD), is introduced. First, a protein sequence is divided equally into multiple segments. Then, the amino acid composition of each segment is calculated in series. After that, each protein sequence can be represented by a feature vector. Finally, the feature vectors of all sequences thus obtained are input into multi-class support vector machines to predict the subcellular localization. The results show that AACD is quite effective in representing protein sequences for the purpose of predicting protein subcellular localization.
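The feature construction described in this abstract (split the sequence into equal segments, compute each segment's amino acid composition, concatenate) can be sketched as follows. The number of segments and the handling of lengths that do not divide evenly are not given in the abstract, so the default of three segments and the integer-division boundaries are assumptions, and both function names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(segment):
    """Fraction of each of the 20 standard amino acids in `segment`."""
    if not segment:
        return [0.0] * len(AMINO_ACIDS)
    return [segment.count(aa) / len(segment) for aa in AMINO_ACIDS]


def segmented_composition(sequence, n_segments=3):
    """Concatenate the compositions of `n_segments` consecutive segments.

    Segment k covers sequence[k*n//n_segments : (k+1)*n//n_segments],
    so segment lengths differ by at most one residue (an assumption).
    """
    n = len(sequence)
    features = []
    for k in range(n_segments):
        start = k * n // n_segments
        end = (k + 1) * n // n_segments
        features.extend(composition(sequence[start:end]))
    return features


vector = segmented_composition("AAACCC", n_segments=2)
print(len(vector))  # 20 values per segment
```

Unlike a single whole-sequence composition, this vector keeps coarse positional information: a protein with an alanine-rich first half and a cysteine-rich second half is distinguished from its reverse.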

9.
Many proteins bear multi-locational characteristics, and this phenomenon is closely related to biological function. However, most of the existing methods can only deal with single-location proteins. Therefore, an automatic and reliable ensemble classifier for protein subcellular multi-localization is needed. We propose a new ensemble classifier combining the KNN (K-nearest neighbour) and SVM (support vector machine) algorithms to predict the subcellular localization of eukaryotic, Gram-negative bacterial and viral proteins based on the general form of Chou's pseudo amino acid composition, i.e., GO (gene ontology) annotations, dipeptide composition and AmPseAAC (Amphiphilic pseudo amino acid composition). This ensemble classifier was developed by fusing many basic individual classifiers through a voting system. The overall prediction accuracies obtained by the KNN-SVM ensemble classifier are 95.22, 93.47 and 80.72% for the eukaryotic, Gram-negative bacterial and viral proteins, respectively. Our prediction accuracies are significantly higher than those by previous methods and reveal that our strategy better predicts subcellular locations of multi-location proteins.

10.
Xu Y  Colletti KS  Pari GS 《Journal of virology》2002,76(17):8931-8938
The UL84 open reading frame encodes a protein that is required for origin-dependent DNA replication and interacts with the immediate-early protein IE2 in lytically infected cells. Transfection of UL84 expression constructs showed that UL84 localized to the nucleus of transfected cells in the absence of any other viral proteins and displayed a punctate speckled fluorescent staining pattern. Cotransfection of all the human cytomegalovirus replication proteins and oriLyt, along with pUL84-EGFP, showed that UL84 colocalized with UL44 (polymerase accessory protein) in replication compartments. Experiments using infected human fibroblasts demonstrated that UL84 also colocalized with UL44 and IE2 in viral replication compartments in infected cells. A nuclear localization signal was identified using plasmid constructs expressing truncation mutants of the UL84 protein in transient transfection assays. Transfection assays showed that UL84 failed to localize to the nucleus when 200 amino acids of the N terminus were deleted. Inspection of the UL84 amino acid sequence revealed a consensus putative nuclear localization signal between amino acids 160 and 171 (PEKKKEKQEKK) of the UL84 protein.

11.
Li S  Ehrhardt DW  Rhee SY 《Plant physiology》2006,141(2):527-539
Cells are organized into a complex network of subcellular compartments that are specialized for various biological functions. Subcellular location is an important attribute of protein function. To facilitate systematic elucidation of protein subcellular location, we analyzed experimentally verified protein localization data of 1,300 Arabidopsis (Arabidopsis thaliana) proteins. The 1,300 experimentally verified proteins are distributed among 40 different compartments, with most of the proteins localized to four compartments: mitochondria (36%), nucleus (28%), plastid (17%), and cytosol (13.3%). About 19% of the proteins are found in multiple compartments, in which a high proportion (36.4%) is localized to both cytosol and nucleus. Characterization of the overrepresented Gene Ontology molecular functions and biological processes suggests that the Golgi apparatus and peroxisome may play more diverse functions but are involved in more specialized processes than other compartments. To support systematic empirical determination of protein subcellular localization using a technology called fluorescent tagging of full-length proteins, we developed a database and Web application to provide preselected green fluorescent protein insertion position and primer sequences for all Arabidopsis proteins to study their subcellular localization and to store experimentally verified protein localization images, videos, and their annotations of proteins generated using the fluorescent tagging of full-length proteins technology. The database can be searched, browsed, and downloaded using a Web browser at http://aztec.stanford.edu/gfp/. The software can also be downloaded from the same Web site for local installation.

12.
Wu S  Wan P  Li J  Li D  Zhu Y  He F 《Proteomics》2006,6(2):449-455
Multi-modality of pI distribution is a common feature in different whole proteomes. Some researchers have attributed it to proteins with different subcellular locations, interpreting it as a result of natural selection. We explored the pI distribution of predicted proteomes (including animals, plants, bacteria and archaea) and of a random proteome [random protein sequences constructed according to the specific amino acid composition and molecular weight (MW) distribution of the human predicted proteome]. Our results suggest that the multi-modality is the result of discrete pK(R) values for different amino acids. The amino acid composition and MW distribution of a proteome also contribute to the specific pI distribution. Although protein subcellular location was related to pI value, our analyses revealed that, in comparison with the random proteome, neither the multi-modality phenomenon nor the distribution bias of pI values is caused by subcellular location. It seems that the multi-modal distribution is just a mathematical curiosity. The blank region near the neutral pI was caused by the absence of amino acids with neutral pK(R), and suggests that the selection of amino acids with ionizable side chains might be restricted by the requirement for a special pH environment during the origin of life. From this point of view, the special distribution was the result of natural selection.

13.

Background

Subcellular localization of a new protein sequence is very important and fruitful for understanding its function. As the number of new genomes has dramatically increased over recent years, a reliable and efficient system to predict protein subcellular location is urgently needed.

Results

Esub8 was developed to predict protein subcellular localizations for eukaryotic proteins based on amino acid composition. In this research, the proteins are classified into the following eight groups: chloroplast, cytoplasm, extracellular, Golgi apparatus, lysosome, mitochondria, nucleus and peroxisome. Subcellular localization is a typical classification problem; consequently, a one-against-one (1-v-1) multi-class support vector machine was introduced to construct the classifier. Unlike previous methods, ours considers the order information of protein sequences by a different method. Our method is tested on predicting three subcellular localizations for prokaryotic proteins and four subcellular localizations for eukaryotic proteins on Reinhardt's dataset. The results are then compared to several other methods. For both tests, the total prediction accuracy is 100% by the self-consistency test, and 92.9% and 84.14%, respectively, by the jackknife test. Esub8 also provides excellent results: the total prediction accuracies are 100% by the self-consistency test and 87% by the jackknife test.

Conclusions

Our method represents a different approach for predicting protein subcellular localization and achieved a satisfactory result; furthermore, we believe Esub8 will be a useful tool for predicting protein subcellular localizations in eukaryotic organisms.
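The one-against-one scheme named in the Results can be sketched independently of any SVM library: one binary decision is made for every pair of classes, and the class that collects the most pairwise wins is returned. In the sketch below, `pairwise_winner` and `toy_winner` are hypothetical stand-ins for trained binary SVMs and are not part of Esub8; only the eight location names come from the abstract.

```python
from collections import Counter
from itertools import combinations

LOCATIONS = [
    "chloroplast", "cytoplasm", "extracellular", "Golgi apparatus",
    "lysosome", "mitochondria", "nucleus", "peroxisome",
]


def one_vs_one_predict(classes, pairwise_winner, sample):
    """Run one binary decision per class pair and return the top-voted class."""
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[pairwise_winner(a, b, sample)] += 1
    return votes.most_common(1)[0][0]


def toy_winner(a, b, sample):
    # Hypothetical stand-in for a trained binary SVM: the class with the
    # higher score in `sample` wins the pairwise comparison.
    return a if sample.get(a, 0.0) >= sample.get(b, 0.0) else b


scores = {"nucleus": 0.9, "cytoplasm": 0.4}
print(one_vs_one_predict(LOCATIONS, toy_winner, scores))
```

With eight classes this scheme needs 8 × 7 / 2 = 28 binary classifiers.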

14.
This paper introduces a new subcellular localization system (TSSub) for eukaryotic proteins. This system extracts features from both profiles and amino acid sequences. Four different features are extracted from profiles by four probabilistic neural network (PNN) classifiers, respectively (the amino acid composition from whole profiles; the amino acid composition from the N-terminus of profiles; the dipeptide composition from whole profiles and the amino acid composition from fragments of profiles). In addition, a support vector machine (SVM) classifier is added to implement the residue-couple feature extracted from amino acid sequences. The results from the five classifiers are fused by an additional SVM classifier. The overall accuracies of TSSub reach 93.0 and 77.4% on Reinhardt and Hubbard's eukaryotic protein dataset and Huang and Li's eukaryotic protein dataset, respectively. Comparison with existing methods shows that TSSub provides better prediction performance. AVAILABILITY: The web server is available from http://166.111.24.5/webtools/TSSub/index.html.

15.
The technique of proteome analysis using two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) has the power to monitor global changes that occur in the protein complement of tissues and subcellular compartments. In this study, the proteins of rice were cataloged, a rice proteome database was constructed, and a functional characterization of some of the identified proteins was undertaken. Proteins extracted from various tissues and subcellular compartments in rice were separated by 2D-PAGE and an image analyzer was used to construct a display of the proteins. The Rice Proteome Database contains 23 reference maps based on 2D-PAGE of proteins from various rice tissues and subcellular compartments. These reference maps comprise 13129 identified proteins, and the amino acid sequences of 5092 proteins are entered in the database. Major proteins involved in growth or stress responses were identified using the proteome approach. Some of these proteins, including a β-tubulin, calreticulin, and ribulose-1,5-bisphosphate carboxylase/oxygenase activase in rice, have unexpected functions. The information obtained from the Rice Proteome Database will aid in cloning the genes for and predicting the function of unknown proteins.

16.
MOTIVATION: Functional annotation of unknown proteins is a major goal in proteomics. A key annotation is the prediction of a protein's subcellular localization. Numerous prediction techniques have been developed, typically focusing on a single underlying biological aspect or predicting a subset of all possible localizations. An important step is taken towards emulating the protein sorting process by capturing and bringing together biologically relevant information, and addressing the clear need to improve prediction accuracy and localization coverage. RESULTS: Here we present a novel SVM-based approach for predicting subcellular localization, which integrates N-terminal targeting sequences, amino acid composition and protein sequence motifs. We show how this approach improves the prediction based on N-terminal targeting sequences, by comparing our method TargetLoc against existing methods. Furthermore, MultiLoc performs considerably better than comparable methods predicting all major eukaryotic subcellular localizations, and shows better or comparable results to methods that are specialized on fewer localizations or for one organism. AVAILABILITY: http://www-bs.informatik.uni-tuebingen.de/Services/MultiLoc/

17.
We present a method called ngLOC, an n-gram-based Bayesian classifier that predicts the localization of a protein sequence over ten distinct subcellular organelles. A tenfold cross-validation result shows an accuracy of 89% for sequences localized to a single organelle, and 82% for those localized to multiple organelles. An enhanced version of ngLOC was developed to estimate the subcellular proteomes of eight eukaryotic organisms: yeast, nematode, fruitfly, mosquito, zebrafish, chicken, mouse, and human.

18.
The bovine herpesvirus 1 (BHV-1) tegument protein VP22 is predominantly localized in the nucleus after viral infection. To analyze subcellular localization in the absence of other viral proteins, a plasmid expressing BHV-1 VP22 fused to enhanced yellow fluorescent protein (EYFP) was constructed. The transient expression of VP22 fused to EYFP in COS-7 cells confirmed the predominant nuclear localization of VP22. Analysis of the amino acid sequence of VP22 revealed that it does not have a classical nuclear localization signal (NLS). However, by constructing a series of deletion derivatives, we mapped the nuclear targeting domain of BHV-1 VP22 to amino acids (aa) 121 to 139. Furthermore, a 4-aa motif, 130PRPR133, was able to direct EYFP and an EYFP dimer (dEYFP) or trimer (tEYFP) predominantly into the nucleus, whereas a deletion or mutation of this arginine-rich motif abrogated the nuclear localization property of VP22. Thus, 130PRPR133 is a functional nonclassical NLS. Since we observed that the C-terminal 68 aa of VP22 mediated the cytoplasmic localization of EYFP, an analysis was performed on these C-terminal amino acid sequences, and a leucine-rich motif, 204LDRMLKSAAIRIL216, was detected. Replacement of the leucines in this putative nuclear export signal (NES) with neutral amino acids resulted in an exclusive nuclear localization of VP22. Furthermore, this motif was able to localize EYFP and dEYFP in the cytoplasm, and the nuclear export function of this NES could be blocked by leptomycin B. This demonstrates that this leucine-rich motif is a functional NES. These data represent the first identification of a functional NLS and NES in a herpesvirus VP22 homologue.

19.
Wang H  Zhang J  Qiu W  Han GS  Carman GM  Adeli K 《FEBS letters》2011,585(12):1979-1984
Lipin-1 proteins are phosphatidic acid phosphatases (PAPs) catalyzing the conversion from phosphatidic acid (PA) to diacylglycerol (DG). Two alternative splicing isoforms, lipin-1α and -1β, are localized at different subcellular compartments. A third splicing isoform, lipin-1γ was recently cloned and its subcellular localization is unknown. Here, we demonstrate that lipin-1γ is localized to lipid droplets (LDs), an association mediated by a hydrophobic, lipin-1γ-specific domain. Additional expression of lipin-1γ altered LD morphology without affecting the triacylglycerol (TG) level. In human tissues, lipin-1γ is the main lipin-1 isoform expressed in normal human brain, suggesting a specialized role in regulating brain lipid metabolism.

20.
The serine/threonine protein kinase Sgk1 (serum- and glucocorticoid-inducible kinase 1) is characterized by a short half-life and has been implicated in the control of a large variety of functions in different subcellular compartments and tissues. Here, we analysed the influence of the N-terminus of Sgk1 on protein turnover and subcellular localization. Using green fluorescent protein-tagged Sgk1 deletion variants, we identified amino acids 17-32 to function as an anchor for the OMM (outer mitochondrial membrane). Subcellular fractionation of mouse tissue revealed a predominant localization of Sgk1 to the mitochondrial fraction. A cytosolic orientation of the kinase at the OMM was determined by in vitro import of Sgk1 and protease protection assays. Pulse-chase experiments showed that half-life and subcellular localization of Sgk1 are inseparable and determined by identical amino acids. Our results provide evidence that Sgk1 is primarily localized to the OMM and shed new light on the role of Sgk1 in the control of cellular function.
