Similar literature
1.
The nucleus guides the life processes of cells. Many of the nuclear proteins participating in these processes tend to concentrate in subnuclear compartments. The subnuclear localization of nuclear proteins is hence important for a deeper understanding of the construction and functions of the nucleus. Recently, Gene Ontology (GO) annotation has been used for prediction of subnuclear localization. However, the effective use of GO terms in solving sequence-based prediction problems remains challenging, especially when query protein sequences have no accession number or annotated GO term. This study uses BLAST to obtain homologs of query proteins with known accession numbers in order to retrieve GO terms for sequence-based subnuclear localization prediction. A prediction method, PGAC, which involves mining informative GO terms associated with amino acid composition features, is proposed to design a support vector machine-based classifier. PGAC yields 55 informative GO terms with training and test accuracies of 85.7% and 76.3%, respectively, using a data set SNL_35 (561 proteins in 9 localizations) with 35% sequence identity. Upon comparison with Nuc-PLoc, which combines amphiphilic pseudo amino acid composition of a protein with its position-specific scoring matrix, PGAC using the data set SNL_80 yields a leave-one-out cross-validation accuracy of 81.1%, which is better than that of Nuc-PLoc, 67.4%. Experimental results show that the set of informative GO terms is an effective set of features for protein subnuclear localization. The prediction server based on PGAC has been implemented at http://iclab.life.nctu.edu.tw/prolocgac.
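The amino acid composition features mentioned in the abstract above can be illustrated with a minimal sketch. It only shows the generic 20-dimensional composition vector; the function name and the choice to skip non-standard residues are assumptions for illustration, not details of PGAC.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code (alphabetical order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the 20 residue frequencies of a protein sequence.

    Frequencies follow the order of AMINO_ACIDS. Characters that are not
    standard residues (for example 'X') are ignored; that is an assumption
    of this sketch.
    """
    residues = [r for r in sequence.upper() if r in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard residues")
    counts = Counter(residues)
    return [counts[aa] / len(residues) for aa in AMINO_ACIDS]
```

A vector of this kind could be concatenated with GO-term indicator features before being passed to a classifier.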

2.
Facing the explosion of newly generated protein sequences in the post-genomic era, we are challenged to develop an automated method for fast and reliable annotation of their subcellular locations. Knowledge of subcellular locations of proteins can provide useful hints for revealing their functions and understanding how they interact with each other in cellular networking. Unfortunately, it is both expensive and time-consuming to determine the localization of an uncharacterized protein in a living cell purely based on experiments. To tackle the challenge, a novel hybridization classifier was developed by fusing many basic individual classifiers through a voting system. The "engine" of these basic classifiers was operated by the OET-KNN (Optimized Evidence-Theoretic K-Nearest Neighbor) rule. As a demonstration, predictions were performed with the fusion classifier for proteins among the following 16 localizations: (1) cell wall, (2) centriole, (3) chloroplast, (4) cyanelle, (5) cytoplasm, (6) cytoskeleton, (7) endoplasmic reticulum, (8) extracell, (9) Golgi apparatus, (10) lysosome, (11) mitochondria, (12) nucleus, (13) peroxisome, (14) plasma membrane, (15) plastid, and (16) vacuole. To eliminate redundancy and homology bias, none of the proteins investigated here had ≥25% sequence identity to any other in the same subcellular location. The overall success rates thus obtained via the jackknife cross-validation test and independent dataset test were 81.6% and 83.7%, respectively, which were 46 to 63% higher than those achieved by other existing methods on the same benchmark datasets.
The study also clarifies that the high success rates obtained by the fusion classifier are by no means a trivial utilization of GO annotations, as is sometimes misinterpreted: a huge number of proteins have accession numbers and corresponding GO numbers but still have unknown subcellular locations, and the percentage of proteins with GO annotations indicating their subcellular components is even lower than the percentage of proteins with known subcellular location annotations in the Swiss-Prot database. It is anticipated that the powerful fusion classifier may also become a very useful high-throughput tool in characterizing other attributes of proteins according to their sequences, such as enzyme class, membrane protein type, and nuclear receptor subfamily, among many others. A web server, called "Euk-OET-PLoc", has been designed at http://202.120.37.186/bioinf/euk-oet for the public to predict subcellular locations of eukaryotic proteins by the fusion OET-KNN classifier.
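Fusing individual classifiers through a voting system, as described in the abstract above, can be sketched as a weighted vote over the labels returned by the base classifiers. The function name, the optional weights, and the tie-breaking rule are assumptions for illustration; the abstract does not specify how Euk-OET-PLoc weights or breaks ties.

```python
def fuse_by_vote(predictions, weights=None):
    """Return the label with the largest total vote.

    predictions: one predicted label per base classifier.
    weights: optional per-classifier vote weights (default: 1.0 each).
    Ties go to the label that received a vote first (an assumption).
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    if weights is None:
        weights = [1.0] * len(predictions)
    if len(weights) != len(predictions):
        raise ValueError("weights and predictions must have equal length")

    tally = {}
    for label, weight in zip(predictions, weights):
        tally[label] = tally.get(label, 0.0) + weight
    # dicts preserve insertion order and max() keeps the first maximum,
    # so the earliest-voted label wins a tie.
    return max(tally, key=tally.get)
```

For example, three base classifiers voting `["nucleus", "cytoplasm", "nucleus"]` with equal weights would yield `"nucleus"`.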

3.
Jiang X  Wei R  Zhao Y  Zhang T 《Amino acids》2008,34(4):669-675
Knowledge of subnuclear localization in eukaryotic cells is essential for understanding the function of the nucleus. Developing prediction methods and tools for protein subnuclear localization has become an important research field in protein science because of the special characteristics of the cell nucleus. In this study, a novel approach is proposed to predict protein subnuclear localization. Each protein sample is represented by a pseudo amino acid (PseAA) composition based on the concept of approximate entropy (ApEn), which reflects the complexity of a time series. A novel ensemble classifier is designed that incorporates three AdaBoost classifiers. The base classifier algorithms in the three AdaBoost classifiers are decision stumps, a fuzzy K-nearest-neighbor classifier, and radial-basis support vector machines, respectively. Different PseAA compositions are used as input data for the different AdaBoost classifiers in the ensemble. A genetic algorithm is used to optimize the dimension and weight factor of the PseAA composition. Two datasets often used in published work are used to validate the performance of the proposed approach. The results of the jackknife cross-validation test are higher and more balanced than those of other methods on the same datasets. The promising results indicate that the proposed approach is effective and practical, and it might become a useful tool for predicting protein subnuclear localization. The Matlab software and supplementary materials are freely available by contacting the corresponding author.
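The approximate entropy (ApEn) mentioned in the abstract above can be sketched for a plain numeric series as follows. This is the standard textbook definition, not the authors' Matlab code; how a protein sequence is mapped to numbers, and the default values of `m` and `r`, are assumptions of this sketch. Here `r` is an absolute tolerance, whereas in practice it is often chosen as a fraction of the series' standard deviation.

```python
import math


def approximate_entropy(series, m=2, r=0.2):
    """Approximate entropy ApEn(m, r) of a numeric series.

    m: embedding dimension (window length).
    r: absolute tolerance for two windows to count as similar.
    Lower values indicate a more regular, predictable series.
    """
    n = len(series)
    if n <= m + 1:
        raise ValueError("series is too short for the chosen m")

    def phi(dim):
        windows = [series[i:i + dim] for i in range(n - dim + 1)]
        total = 0.0
        for w in windows:
            # Count windows within Chebyshev distance r of w. The window
            # matches itself, so the count is never zero.
            matches = sum(
                1 for v in windows
                if max(abs(a - b) for a, b in zip(w, v)) <= r
            )
            total += math.log(matches / len(windows))
        return total / len(windows)

    return phi(m) - phi(m + 1)
```

A constant series gives an ApEn of zero, and a strictly periodic series gives a value close to zero.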

4.
Shen HB  Chou KC 《Amino acids》2007,32(4):483-488
Predicting membrane protein type is both an important and challenging topic in current molecular and cellular biology. This is because knowledge of membrane protein type often provides useful clues for determining, or sheds light upon, the function of an uncharacterized membrane protein. With the explosion of newly found protein sequences in the post-genomic era, there is great demand for a computational method that quickly and reliably identifies the types of membrane proteins according to their primary sequences. In this paper, a novel classifier, the so-called "ensemble classifier", was introduced. It is formed by fusing a set of nearest neighbor (NN) classifiers, each of which is defined in a different pseudo amino acid composition space. The type for a query protein is determined by the outcome of voting among these constituent individual classifiers. It was demonstrated through the self-consistency test, jackknife test, and independent dataset test that the ensemble classifier outperformed other existing classifiers widely used in the biological literature. It is anticipated that the idea of the ensemble classifier can also be used to improve the prediction quality in classifying other attributes of proteins according to their sequences.
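A pseudo amino acid composition space of a given dimension, as referred to in the abstract above, can be sketched as the 20 residue frequencies plus `lam` sequence-order correlation terms. This is a simplified illustration: the published formulation combines several physicochemical properties, while this sketch uses a single one (the Kyte-Doolittle hydropathy scale) as a stand-in. The parameter names `lam` and `weight` and their defaults are assumptions.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values, used here as the single property.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def pseudo_aa_composition(sequence, lam=3, weight=0.05):
    """Return a (20 + lam)-dimensional PseAA-style vector that sums to 1."""
    seq = [r for r in sequence.upper() if r in HYDROPATHY]
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")

    # Standardize the property over the 20 amino acids.
    values = list(HYDROPATHY.values())
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    h = {aa: (v - mean) / std for aa, v in HYDROPATHY.items()}

    freqs = [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]

    # theta_k: mean squared property difference between residues k apart.
    thetas = []
    for k in range(1, lam + 1):
        pairs = len(seq) - k
        thetas.append(
            sum((h[seq[i]] - h[seq[i + k]]) ** 2 for i in range(pairs)) / pairs
        )

    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]
```

Varying `lam` produces representations of different dimension, and one nearest-neighbor classifier could be trained in each before voting.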

5.
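Understanding where proteins localize within the nucleus of eukaryotic cells is important for the functional annotation of newly discovered proteins. With the rapid increase in the number of protein sequences in protein databases, predicting protein subnuclear localization by computational methods has become a hot topic in protein science. Based on the discrete pseudo amino acid composition model proposed by Chou, a new method for predicting protein subnuclear localization is proposed. The approximate entropy of the protein sequence is computed as an additional feature to construct the pseudo amino acid composition, which represents the features of the protein sequence, and the AdaBoost classification algorithm is used as the prediction tool. Compared with previously reported subnuclear localization prediction methods, this method achieves higher accuracy.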

6.
Li L  Zhang Y  Zou L  Li C  Yu B  Zheng X  Zhou Y 《PloS one》2012,7(1):e31057
With the rapid increase of protein sequences in the post-genomic age, it is challenging to develop accurate and automated methods for reliably and quickly predicting their subcellular localizations. Many approaches have been tried so far, but most used only a single algorithm. In this paper, we propose an ensemble classifier of KNN (k-nearest neighbor) and SVM (support vector machine) algorithms to predict the subcellular localization of eukaryotic proteins based on a voting system. The overall prediction accuracies by the one-versus-one strategy are 78.17%, 89.94% and 75.55% for three benchmark datasets of eukaryotic proteins. The improved prediction accuracies reveal that GO annotations and hydrophobicity of amino acids help to predict subcellular locations of eukaryotic proteins.

7.
One of the fundamental goals in cell biology and proteomics is to identify the functions of proteins in the context of compartments that organize them in the cellular environment. Knowledge of subcellular locations of proteins can provide key hints for revealing their functions and understanding how they interact with each other in cellular networking. Unfortunately, it is both time-consuming and expensive to determine the localization of an uncharacterized protein in a living cell purely based on experiments. With the avalanche of newly found protein sequences emerging in the post-genomic era, we are facing a critical challenge, that is, how to develop an automated method to quickly and reliably identify their subcellular locations so that they can be used in a timely manner for basic research and drug discovery. In view of this, an ensemble classifier was developed by the approach of fusing many basic individual classifiers through a voting system. Each of these basic classifiers was trained in a different dimension of the amphiphilic pseudo amino acid composition (Chou [2005] Bioinformatics 21: 10-19). As a demonstration, predictions were performed with the fusion classifier for proteins among the following 14 localizations: (1) cell wall, (2) centriole, (3) chloroplast, (4) cytoplasm, (5) cytoskeleton, (6) endoplasmic reticulum, (7) extracellular, (8) Golgi apparatus, (9) lysosome, (10) mitochondria, (11) nucleus, (12) peroxisome, (13) plasma membrane, and (14) vacuole. The overall success rates thus obtained via the resubstitution test, jackknife test, and independent dataset test were all significantly higher than those by the existing classifiers. It is anticipated that the novel ensemble classifier may also become a very useful vehicle in classifying other attributes of proteins according to their sequences, such as membrane protein type, enzyme family/sub-family, G-protein coupled receptor (GPCR) type, and structural class, among many others.
The fusion ensemble classifier will be available at www.pami.sjtu.edu.cn/people/hbshen.

8.
Huang WL  Tung CW  Huang HL  Hwang SF  Ho SY 《Bio Systems》2007,90(2):573-581
Accurate prediction methods of protein subnuclear localizations rely on the cooperation between informative features and classifier design. Support vector machine (SVM) based learning methods have been shown to be effective for predictions of protein subcellular and subnuclear localizations. This study proposes an evolutionary support vector machine (ESVM) based classifier with automatic selection from a large set of physicochemical composition (PCC) features to design an accurate system for predicting protein subnuclear localization, named ProLoc. ESVM using an inheritable genetic algorithm combined with SVM can automatically determine the best number m of PCC features and identify m out of 526 PCC features simultaneously. To evaluate ESVM, this study uses two datasets SNL6 and SNL9, which have 504 proteins localized in 6 subnuclear compartments and 370 proteins localized in 9 subnuclear compartments. Using a leave-one-out cross-validation, ProLoc utilizing the selected m=33 and 28 PCC features has accuracies of 56.37% for SNL6 and 72.82% for SNL9, which are better than 51.4% for the SVM-based system using k-peptide composition features applied on SNL6, and 64.32% for an optimized evidence-theoretic k-nearest neighbor classifier utilizing pseudo amino acid composition applied on SNL9, respectively.

9.
Numerous studies have identified key binding partners and functional activities of nuclear tumor-suppressor proteins such as the retinoblastoma protein, p53 and BRCA1. Historically, less attention has been given to the subnuclear locations of these proteins. Here, we describe several recent studies that promote the view that regulated association with subcompartments of the nucleus is inherent to tumor-suppressor function.

10.
The retroviral transforming gene v-myb encodes a 45,000-Mr nuclear transforming protein (p45v-myb). p45v-myb is a truncated and mutated version of a 75,000-Mr protein encoded by the chicken c-myb gene (p75c-myb). Like its viral counterpart, p75c-myb is located in the cell nucleus. As a first step in identifying nuclear targets involved in cellular transformation by v-myb and in c-myb function, we determined the subnuclear locations of p45v-myb and p75c-myb. Approximately 80 to 90% of the total p45v-myb and p75c-myb present in nuclei was released from nuclei at low salt concentrations, exhibited DNA-binding activity, and was attached to nucleoprotein particles when released from the nuclei after digestion with nuclease. A minor portion of approximately 10 to 20% of the total p45v-myb and p75c-myb remained tightly associated with the nuclei even in the presence of 2 M NaCl. These observations suggest that both proteins are associated with two nuclear substructures tentatively identified as the chromatin and the nuclear matrix. The function of myb proteins may therefore depend on interactions with several nuclear targets.

11.
12.
Shen HB  Yang J  Chou KC 《Amino acids》2007,33(1):57-67
With the avalanche of newly found protein sequences emerging in the post-genomic era, it is highly desirable to develop an automated method for quickly and reliably identifying their subcellular locations because knowledge thus obtained can provide key clues for revealing their functions and understanding how they interact with each other in cellular networking. However, predicting subcellular location of eukaryotic proteins is a challenging problem, particularly when unknown query proteins do not have significant homology to proteins of known subcellular locations and when more locations need to be covered. To cope with the challenge, protein samples are formulated by hybridizing the information derived from the gene ontology database and amphiphilic pseudo amino acid composition. Based on such a representation, a novel ensemble hybridization classifier was developed by fusing many basic individual classifiers through a voting system. Each of these basic classifiers was engineered by the KNN (K-Nearest Neighbor) principle. As a demonstration, a new benchmark dataset was constructed that covers the following 18 localizations: (1) cell wall, (2) centriole, (3) chloroplast, (4) cyanelle, (5) cytoplasm, (6) cytoskeleton, (7) endoplasmic reticulum, (8) extracell, (9) Golgi apparatus, (10) hydrogenosome, (11) lysosome, (12) mitochondria, (13) nucleus, (14) peroxisome, (15) plasma membrane, (16) plastid, (17) spindle pole body, and (18) vacuole. To avoid homology bias, none of the proteins included has ≥25% sequence identity to any other in the same subcellular location. The overall success rates thus obtained via the 5-fold and jackknife cross-validation tests were 81.6 and 80.3%, respectively, which were 40-50% higher than those achieved by other existing methods on the same strict dataset. The powerful predictor, named "Euk-PLoc", is available as a web-server at http://202.120.37.186/bioinf/euk.
Furthermore, to support the needs of people working in the relevant areas, a downloadable file will be provided at the same website to list the results predicted by Euk-PLoc for all eukaryotic protein entries (excluding fragments) in the Swiss-Prot database that do not have subcellular location annotations or are annotated as being uncertain. The large-scale results will be updated twice a year to include the new entries of eukaryotic proteins and reflect the continuous development of Euk-PLoc.
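The jackknife (leave-one-out) cross-validation test reported in the abstract above can be sketched with a plain 1-nearest-neighbor rule. This is a generic illustration under simple assumptions (squared Euclidean distance, feature vectors as lists of numbers); it is not the Euk-PLoc classifier, and the function names are hypothetical.

```python
def nearest_neighbor_label(query, samples):
    """Return the label of the sample closest to query.

    samples: list of (feature_vector, label) pairs.
    Distance is squared Euclidean (an assumption of this sketch).
    """
    closest = min(
        samples,
        key=lambda s: sum((a - b) ** 2 for a, b in zip(query, s[0])),
    )
    return closest[1]


def jackknife_accuracy(samples):
    """Leave each sample out in turn, predict it from the rest,
    and return the fraction predicted correctly."""
    if len(samples) < 2:
        raise ValueError("at least two samples are required")
    correct = 0
    for i, (vector, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        if nearest_neighbor_label(vector, rest) == label:
            correct += 1
    return correct / len(samples)
```

Because every sample is held out exactly once, the jackknife test gives a unique result for a given dataset, unlike a random 5-fold split.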

13.
Many nuclear proteins, including the nuclear receptor co-repressor (NCoR) protein, are localized to specific regions of the cell nucleus, and this subnuclear positioning is preserved when NCoR is expressed in cells as a fusion to a fluorescent protein (FP). To determine how specific factors may influence the subnuclear organization of NCoR requires an unbiased approach to the selection of cells for image analysis. Here, we use the co-expression of the monomeric red FP (mRFP) to select cells that also express NCoR labeled with yellow FP (YFP). The transfected cells are selected for imaging based on the diffuse cellular mRFP signal without prior knowledge of the subnuclear organization of the co-expressed YFP-NCoR. The images acquired of the expressed FPs are then analyzed using an automated image analysis protocol that identifies regions of interest (ROIs) using a set of empirically determined rules. The relative expression levels of both fluorescent proteins are estimated, and YFP-NCoR subnuclear organization is quantified based on the mean focal body size and relative intensity. The selected ROIs are tagged with an identifier and annotated with the acquired data. This integrated image analysis protocol is an unbiased method for the precise and consistent measurement of thousands of ROIs from hundreds of individual cells in the population.

14.
15.
The past decade has witnessed an explosion in the growth of proteomics. The completion of numerous genome sequences, the development of powerful protein analytical technologies, as well as the design of innovative bioinformatics tools have marked the beginning of a new post-genomic era. Proteomics, the large-scale analysis of proteins in an organism, organ or organelle encompasses different aspects: (1) the identification, analysis of post-translational modifications and quantification of proteins; (2) the study of protein-protein interactions; and (3) the functional analysis of interactome networks. Here, we briefly summarize the emerging analytical tools and databases that are paving the way for studying Drosophila development by proteomic approaches.

16.
17.
Signal-3L: A 3-layer approach for predicting signal peptides
Functioning as an "address tag" that directs nascent proteins to their proper cellular and extracellular locations, signal peptides have become a crucial tool in finding new drugs or reprogramming cells for gene therapy. To use such a tool effectively and in a timely manner, however, the first important thing is to develop an automated method for rapidly and accurately identifying the signal peptide for a given nascent protein. With the avalanche of new protein sequences generated in the post-genomic era, the challenge has become even more urgent and critical. In this paper, we have developed a novel method for predicting signal peptide sequences and their cleavage sites in human, plant, animal, eukaryotic, Gram-positive, and Gram-negative protein sequences, respectively. The new predictor, called Signal-3L, consists of three prediction engines working, respectively, for the following three progressively deepening layers: (1) identifying a query protein as secretory or non-secretory by an ensemble classifier formed by fusing many individual OET-KNN (optimized evidence-theoretic K nearest neighbor) classifiers operated in various dimensions of PseAA (pseudo amino acid) composition spaces; (2) selecting a set of candidates for the possible signal peptide cleavage sites of a query secretory protein by a subsite-coupled discrimination algorithm; (3) determining the final cleavage site by fusing the global sequence alignment outcome for each of the aforementioned candidates through a voting system. Signal-3L features high prediction success rates with short computational time, and hence is particularly useful for the analysis of large-scale datasets.
Signal-3L is freely available as a web-server at http://chou.med.harvard.edu/bioinf/Signal-3L/ or http://202.120.37.186/bioinf/Signal-3L, where, to further support the demand of the related areas, the signal peptides identified by Signal-3L for all the protein entries in the Swiss-Prot databank that do not have signal peptide annotations or are annotated with uncertain terms but are classified by Signal-3L as secretory proteins are provided in a downloadable file. The large-scale file is prepared with Microsoft Excel and named "Tab-Signal-3L.xls", and will be updated once a year to include new protein entries and reflect the continuous development of Signal-3L.

18.
The heterogeneous nuclear RNP (hnRNP) A1 protein is one of the major pre-mRNA/mRNA binding proteins in eukaryotic cells and one of the most abundant proteins in the nucleus. It is localized to the nucleoplasm and it also shuttles between the nucleus and the cytoplasm. The amino acid sequence of A1 contains two RNP motif RNA-binding domains (RBDs) at the amino terminus and a glycine-rich domain at the carboxyl terminus. This configuration, designated 2x RBD-Gly, is representative of perhaps the largest family of hnRNP proteins. Unlike most nuclear proteins characterized so far, A1 (and most 2x RBD-Gly proteins) does not contain a recognizable nuclear localization signal (NLS). We have found that a segment of ca. 40 amino acids near the carboxyl end of the protein (designated M9) is necessary and sufficient for nuclear localization; attaching this segment to the bacterial protein beta-galactosidase or to pyruvate kinase completely localized these otherwise cytoplasmic proteins to the nucleus. The RBDs and another RNA binding motif found in the glycine-rich domain, the RGG box, are not required for A1 nuclear localization. M9 is a novel type of nuclear localization domain as it does not contain sequences similar to classical basic-type NLS. Interestingly, sequences similar to M9 are found in other nuclear RNA-binding proteins including hnRNP A2.

19.
The human polyomavirus JC (JCV) replicates in the nuclei of infected cells. Here we report that JCV virions are efficiently assembled at nuclear domain 10 (ND10), which is also known as promyelocytic leukemia (PML) nuclear bodies. The major capsid protein VP1, the minor capsid proteins VP2 and VP3, and a regulatory protein called agnoprotein were coexpressed from a polycistronic expression vector in COS-7 cells. We found that VP1 accumulated to distinct subnuclear domains in the presence of VP2/VP3 and agnoprotein, while VP1 expressed alone was distributed both in the cytoplasm and in the nucleus. Mutation analysis revealed that discrete intranuclear accumulation of VP1 requires the presence of either VP2 or VP3. However, VP2 or VP3 expressed in the absence of VP1 showed diffuse, not discrete, nuclear localization. The C-terminal sequence of VP2/VP3 contains two basic regions, GPNKKKRRK (cluster 1) and KRRSRSSRS (cluster 2). The deletion of cluster 2 abolished the accumulation of VP1 to distinct subnuclear domains. Deletion of the C-terminal 34 residues of VP2/VP3, including both cluster 1 and cluster 2, caused VP1 to localize both in the cytoplasm and in the nucleus. Using immunoelectron microscopy of cells that coexpressed VP1, VP2/VP3, and agnoprotein, we detected the assembly of virus-like particles in discrete locations along the inner nuclear periphery. Both in oligodendrocytes of the human brain and in transfected cells, discrete nuclear domains for VP1 accumulation were identified as ND10, which contains the PML protein. These results indicate that major and minor capsid proteins cooperatively accumulate in ND10, where they are efficiently assembled into virions.

20.