Similar literature
1.

Background  

Gene Ontology (GO) annotation, which describes the function of genes and gene products across species, has recently been used to predict protein subcellular and subnuclear localization. Existing GO-based prediction methods for protein subcellular localization use the known accession numbers of query proteins to obtain their annotated GO terms. An accurate method for predicting the subcellular localization of novel proteins without known accession numbers, using only the input sequence, is worth developing.

2.
Li FM  Li QZ 《Amino acids》2008,34(1):119-125
Summary. The subnuclear localization of nuclear proteins is very important for an in-depth understanding of the construction and function of the nucleus. The pseudo amino acid composition (PseAA), originally introduced by K. C. Chou, can incorporate much more information about a protein sequence than the classical amino acid composition and thereby significantly enhances the power of a discrete model to predict various attributes of a protein. Based on the amino acid composition and PseAA, an algorithm combining the increment of diversity with an improved quadratic discriminant analysis is proposed to predict protein subnuclear location. For 504 single-localization proteins, the overall predictive success rate and correlation coefficient in the jackknife test are 75.4% and 0.629; for an independent set of 92 multi-localization proteins, the success rate is 80.4%. For 406 single-localization nuclear proteins with ≤25% sequence identity, the jackknife test gives an overall prediction accuracy of 77.1%. Authors’ address: Qian-Zhong Li, Laboratory of Theoretical Biophysics, Department of Physics, College of Sciences and Technology, Inner Mongolia University, Hohhot 010021, China
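As a rough illustration of the pseudo amino acid composition mentioned above, the following minimal sketch builds a (20 + λ)-dimensional vector in the general form described by Chou: 20 normalized residue frequencies followed by λ sequence-order correlation factors. It is a simplification under stated assumptions, not the method of the paper: it uses a single physicochemical property (the Kyte-Doolittle hydropathy scale) rather than several, the names `pseaa_composition`, `lam` and `w` are hypothetical, and the increment-of-diversity and quadratic discriminant steps are not shown.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values (assumed here as the only property).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Standardize the property to zero mean and unit variance over the 20 residues.
_mu = sum(KD.values()) / len(KD)
_sd = (sum((v - _mu) ** 2 for v in KD.values()) / len(KD)) ** 0.5
H = {aa: (v - _mu) / _sd for aa, v in KD.items()}


def pseaa_composition(sequence, lam=3, w=0.05):
    """Return a (20 + lam)-dimensional pseudo amino acid composition vector."""
    seq = [aa for aa in sequence.upper() if aa in H]  # drop non-standard symbols
    n = len(seq)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")

    counts = Counter(seq)
    freqs = [counts[aa] / n for aa in AMINO_ACIDS]

    # k-th tier correlation factor: mean squared property difference
    # between residues that are k positions apart.
    thetas = []
    for k in range(1, lam + 1):
        diffs = [(H[seq[i]] - H[seq[i + k]]) ** 2 for i in range(n - k)]
        thetas.append(sum(diffs) / len(diffs))

    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]
```

With `lam=0` the vector reduces to the classical amino acid composition; larger `lam` adds sequence-order information at the cost of more dimensions.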

3.
Huang WL  Tung CW  Huang HL  Hwang SF  Ho SY 《Bio Systems》2007,90(2):573-581
Accurate prediction methods for protein subnuclear localization rely on the cooperation between informative features and classifier design. Support vector machine (SVM) based learning methods have been shown to be effective for predicting protein subcellular and subnuclear localization. This study proposes an evolutionary support vector machine (ESVM) based classifier with automatic selection from a large set of physicochemical composition (PCC) features to design an accurate system for predicting protein subnuclear localization, named ProLoc. ESVM, which uses an inheritable genetic algorithm combined with SVM, can automatically determine the best number m of PCC features and identify m out of 526 PCC features simultaneously. To evaluate ESVM, this study uses two datasets, SNL6 and SNL9, which have 504 proteins localized in 6 subnuclear compartments and 370 proteins localized in 9 subnuclear compartments, respectively. Using leave-one-out cross-validation, ProLoc utilizing the selected m=33 and 28 PCC features has accuracies of 56.37% for SNL6 and 72.82% for SNL9, which are better than 51.4% for the SVM-based system using k-peptide composition features applied on SNL6, and 64.32% for an optimized evidence-theoretic k-nearest neighbor classifier utilizing pseudo amino acid composition applied on SNL9, respectively.

4.
We have characterized the interaction and nuclear localization of the nucleocapsid (N) protein and phosphoprotein (P) of sonchus yellow net nucleorhabdovirus. Expression studies with plant and yeast cells revealed that both N and P are capable of independent nuclear import. Site-specific mutagenesis and deletion analyses demonstrated that N contains a carboxy-terminal bipartite nuclear localization signal (NLS) located between amino acids 465 and 481 and that P contains a karyophilic region between amino acids 40 and 124. The N NLS was fully capable of functioning outside of the context of the N protein and was able to direct the nuclear import of a synthetic protein fusion consisting of green fluorescent protein fused to glutathione S-transferase (GST). Expression and mapping studies suggested that the karyophilic domain in P is located within the N-binding domain. Coexpression of N and P drastically affected their localization patterns relative to those of individually expressed proteins and resulted in a shift of both proteins to a subnuclear region. Yeast two-hybrid and GST pulldown experiments verified the N-P and P-P interactions, and deletion analyses have identified the N and P interacting domains. N NLS mutants were not transported to the nucleus by import-competent P, presumably because N binding masks the P NLS. Taken together, our results support a model for independent entry of N and P into the nucleus followed by associations that mediate subnuclear localization.

5.
Jiang X  Wei R  Zhao Y  Zhang T 《Amino acids》2008,34(4):669-675
Knowledge of subnuclear localization in eukaryotic cells is essential for understanding the life function of the nucleus. Because of the special characteristics of the cell nucleus, developing methods and tools for predicting protein subnuclear localization has become an important research field in protein science. In this study, a novel approach is proposed to predict protein subnuclear localization. Each protein sample is represented by a Pseudo Amino Acid (PseAA) composition based on the concept of approximate entropy (ApEn), which reflects the complexity of a time series. A novel ensemble classifier is designed that incorporates three AdaBoost classifiers. The base classifier algorithms in the three AdaBoost classifiers are decision stumps, a fuzzy K nearest neighbors classifier, and radial basis-support vector machines, respectively. Different PseAA compositions are used as input data for the different AdaBoost classifiers in the ensemble. A genetic algorithm is used to optimize the dimension and weight factor of the PseAA composition. Two datasets often used in published works are used to validate the performance of the proposed approach. The results of the jackknife cross-validation test are higher and more balanced than those of other methods on the same datasets. The promising results indicate that the proposed approach is effective and practical. It might become a useful tool in protein subnuclear localization. The software in Matlab and supplementary materials are freely available by contacting the corresponding author.
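The approximate entropy used above as a complexity measure can be sketched as follows. This is a minimal pure-Python version of the standard definition ApEn(m, r) = φ(m) − φ(m+1), not the authors' Matlab software. The function name and defaults are hypothetical, `r` is treated as an absolute tolerance, and it is assumed that the protein sequence has already been mapped to a numeric series (for example, one physicochemical value per residue).

```python
import math


def approximate_entropy(series, m=2, r=0.2):
    """Approximate entropy ApEn(m, r) of a numeric series.

    m is the embedding (window) length and r the absolute tolerance
    within which two windows count as similar (Chebyshev distance).
    """
    n = len(series)
    if n <= m + 1:
        raise ValueError("series is too short for the chosen m")

    def phi(dim):
        windows = [series[i:i + dim] for i in range(n - dim + 1)]
        total = 0.0
        for a in windows:
            # Count windows within tolerance r of window a (self-match included,
            # so the count is never zero and the logarithm is always defined).
            matches = sum(
                1 for b in windows
                if max(abs(x - y) for x, y in zip(a, b)) <= r
            )
            total += math.log(matches / len(windows))
        return total / len(windows)

    return phi(m) - phi(m + 1)
```

A perfectly regular series gives a value near zero, while an irregular series gives a larger value; this is the sense in which ApEn "reflects the complexity of a time series". The double loop is quadratic in the series length, which is acceptable for typical protein lengths.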

6.

Background  

The completion of the various genome sequencing projects has resulted in the accumulation of a massive amount of gene sequence information. This calls for a large-scale computational method for predicting protein localization from sequence. The localization of a protein can provide valuable information about its molecular function, as well as the biological pathway in which it participates. Predicting the localization of a protein at the subnuclear level is a challenging task. In our previous work we proposed an SVM-based system using protein sequence information for this prediction task. In this work, we assess protein similarity with Gene Ontology (GO) and then improve the performance of the system by adding a nearest neighbor classifier module that uses a similarity measure derived from the GO annotation terms for protein sequences.

7.
8.
A rat liver nuclear insoluble protein fraction was analyzed to investigate candidate proteins participating in nuclear architecture formation. Proteins were subjected to two-dimensional separation by reversed-phase HPLC in 60% formic acid and SDS/PAGE. The method produced good resolution of insoluble proteins. One hundred and thirty-eight proteins were separated, and 28 of these were identified. The identified proteins included one novel protein, seven known nuclear proteins and 12 known nuclear matrix proteins. The novel 36 kDa protein was further investigated for its subnuclear localization. The human ortholog of the protein was expressed in Escherichia coli and antibodies were raised against the recombinant protein. Exclusive localization of the protein to the nuclear insoluble protein fraction was confirmed by cell fractionation followed by immunoblotting. Immunostaining of mouse C3H cells suggested that the 36 kDa protein was a constituent of an insoluble macromolecular complex spread throughout the interchromatin space of the nucleus. The protein was designated 'interchromatin space protein of 36 kDa', ISP36.

9.
Mammalian topoisomerase II isoforms alpha and beta are diverged in their C-terminal domain (CTD), but both isoforms complement the yeast top2 mutation. In this study, mammalian topoisomerase IIalpha-CTD and IIbeta-CTD were tagged with yellow fluorescent protein (YFP), expressed in yeast cells, and their localization was examined. YFP tagged-topoisomerase IIalpha-CTD was distributed evenly throughout the nucleus, while YFP tagged-topoisomerase IIbeta-CTD was sequestered into a subnuclear compartment. Deletion analysis revealed that two regions (amino acids 1207-1234 and 1513-1573) of the topoisomerase IIbeta-CTD are essential for specific localization of the beta isoform: if either of the two regions is removed, the mutant topoisomerase IIbeta-CTD distributes evenly throughout the nucleus. The data suggest that yeast cells distinguish the nuclear and subnuclear localization signals associated with these two mammalian topoisomerase II isoforms.

10.
We have compared a novel sequence-structure matching technique, FORESST, for detecting remote homologs to three existing sequence-based methods: local amino acid sequence similarity by BLASTP, hidden Markov models (HMMs) of sequences of protein families using SAM, and HMMs based on sequence motifs identified using meta-MEME. FORESST compares predicted secondary structures to a library of structural families of proteins, using HMMs. Altogether 45 proteins from nine structural families in the database CATH were used in a cross-validated test of the fold assignment accuracy of each method. Local sequence similarity of a query sequence to a protein family is measured by the highest segment pair (HSP) score. Each of the HMM-based approaches (FORESST, MEME, amino acid sequence-based HMM) yielded a log-odds score for the query sequence. In order to make a fair comparison among these methods, the scores for each method were converted to Z-scores in a uniform way by comparing the raw scores of a query protein with the corresponding scores for a set of unrelated proteins. Z-scores were analyzed as a function of the maximum pairwise sequence identity (MPSID) of the query sequence to sequences used in training the model. For MPSID above 20%, the Z-scores increase linearly with MPSID for the sequence-based methods but remain roughly constant for FORESST. Below 15%, average Z-scores are close to zero for the sequence-based methods, whereas the FORESST method yielded average Z-scores of 1.8 and 1.1, using observed and predicted secondary structures, respectively. This demonstrates the advantage of the sequence-structure method for detecting remote homologs.
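The score normalization described above can be sketched in a few lines. The abstract does not give the exact formula, so this assumes the usual definition: the raw score of the query minus the mean score of the unrelated proteins, divided by their sample standard deviation. The function name `z_score` is hypothetical.

```python
from statistics import mean, stdev


def z_score(raw_score, unrelated_scores):
    """Convert a raw score to a Z-score against a background of unrelated proteins.

    unrelated_scores must contain at least two values (required by stdev).
    """
    mu = mean(unrelated_scores)
    sigma = stdev(unrelated_scores)  # sample standard deviation (an assumption)
    if sigma == 0:
        raise ValueError("background scores have zero variance")
    return (raw_score - mu) / sigma
```

Because each method's scores are scaled by that method's own background distribution, an HSP score and a log-odds score become comparable on a common scale, which is what allows the comparison across BLASTP, SAM, meta-MEME and FORESST.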

11.
Proteins located in appropriate cellular compartments are of paramount importance to exert their biological functions. Prediction of protein subcellular localization by computational methods is required in the post-genomic era. Recent studies have been focusing on predicting not only single-location proteins but also multi-location proteins. However, most of the existing predictors are far from effective for tackling the challenges of multi-label proteins. This article proposes an efficient multi-label predictor, namely mPLR-Loc, based on penalized logistic regression and adaptive decisions for predicting both single- and multi-location proteins. Specifically, for each query protein, mPLR-Loc exploits the information from the Gene Ontology (GO) database by using its accession number (AC) or the ACs of its homologs obtained via BLAST. The frequencies of GO occurrences are used to construct feature vectors, which are then classified by an adaptive decision-based multi-label penalized logistic regression classifier. Experimental results based on two recent stringent benchmark datasets (virus and plant) show that mPLR-Loc remarkably outperforms existing state-of-the-art multi-label predictors. In addition to being able to rapidly and accurately predict subcellular localization of single- and multi-label proteins, mPLR-Loc can also provide probabilistic confidence scores for the prediction decisions. For readers’ convenience, the mPLR-Loc server is available online (http://bioinfo.eie.polyu.edu.hk/mPLRLocServer).
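The feature construction step above (frequencies of GO occurrences turned into a vector) might look roughly like the sketch below. It is only an illustration of the idea, not the mPLR-Loc implementation: the function name, the tiny vocabulary and the example hits are hypothetical, and the database lookup, BLAST search and penalized logistic regression classifier are not shown.

```python
from collections import Counter


def go_frequency_vector(go_hits, vocabulary):
    """Count how often each GO term in `vocabulary` occurs among `go_hits`.

    go_hits:    GO term IDs retrieved for a query protein (repeats allowed,
                e.g. the same term annotated on several homologs).
    vocabulary: ordered list of GO term IDs defining the feature dimensions.
    """
    counts = Counter(go_hits)
    return [counts[term] for term in vocabulary]


# Illustrative vocabulary: nucleus, cytoplasm, mitochondrion.
vocabulary = ["GO:0005634", "GO:0005737", "GO:0005739"]
hits = ["GO:0005634", "GO:0005739", "GO:0005634"]
features = go_frequency_vector(hits, vocabulary)
```

Terms outside the vocabulary are ignored and absent terms contribute zero, so every protein maps to a vector of the same length regardless of how many annotations it has.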

12.
13.
The serine/arginine-rich (SR) proteins are major actors in the regulation of pre-mRNA splicing. Their functions are closely related to their intracellular spatial organization. The RS domain and phosphorylation status of SR proteins are two critical factors in determining their subcellular distribution. Mammalian Transformer-2β (Tra2β) protein, a member of the SR protein family, is known to play multiple important roles in development and disease. In the present study, we characterized the subcellular and subnuclear localization of Tra2β protein and its related mechanisms. The results demonstrated that in the brain the nuclear and cytoplasmic localization of Tra2β was correlated with its phosphorylation status. Using deletion mutation analysis, we showed that the nuclear localization of Tra2β was determined by multiple nuclear localization signals (NLSs) in the RS domains. Point-mutation analysis disclosed that phosphorylation of serine residues in the NLSs inhibited the function of the NLSs in directing Tra2β to the nucleus. In addition, we identified at least two nuclear speckle localization signals within the RS1 domain, but not in the RS2 domain. The nuclear speckle localization signals determined the localization of RS1 domain-containing proteins to the nuclear speckle. The function of the signals did not depend on the presence of serine residues. The results provide new insight into the mechanisms by which the subcellular and subnuclear localization of Tra2β protein is regulated.

14.
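Knowledge of protein localization within the nucleus of eukaryotic cells is important for the functional annotation of newly discovered proteins. With the rapid growth in the number of protein sequences in protein databases, using computational methods to predict protein subnuclear localization has become a hot topic in protein science. Based on the discrete model of pseudo amino acid composition proposed by Chou, a new method for predicting protein subnuclear localization is proposed. The approximate entropy of the protein sequence is computed as an additional feature to construct the pseudo amino acid composition, which represents the protein sequence features, and the AdaBoost classification algorithm is used as the prediction tool. Compared with previously reported subnuclear localization prediction methods, this method achieves higher accuracy.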

15.
The nucleus is the brain of eukaryotic cells that guides the life processes of the cell by issuing key instructions. For an in-depth understanding of the biochemical processes of the nucleus, knowledge of the localization of nuclear proteins is very important. With the avalanche of protein sequences generated in the post-genomic era, it is highly desirable to develop an automated method for rapidly annotating the subnuclear locations of numerous newly found nuclear protein sequences so that they can be used in a timely manner for basic research and drug discovery. In view of this, a novel approach is developed for predicting the protein subnuclear location. It is featured by introducing a powerful classifier, the optimized evidence-theoretic K-nearest classifier, and using the pseudo amino acid composition [K.C. Chou, PROTEINS: Structure, Function, and Genetics, 43 (2001) 246], which can incorporate a considerable amount of sequence-order effects, to represent protein samples. As a demonstration, identifications were performed for 370 nuclear proteins among the following 9 subnuclear locations: (1) Cajal body, (2) chromatin, (3) heterochromatin, (4) nuclear diffuse, (5) nuclear pore, (6) nuclear speckle, (7) nucleolus, (8) PcG body, and (9) PML body. The overall success rates thus obtained by both the re-substitution test and jackknife cross-validation test are significantly higher than those by existing classifiers on the same working dataset. It is anticipated that the powerful approach may also become a useful high throughput vehicle to bridge the huge gap occurring in the post-genomic era between the number of gene sequences in databases and the number of gene products that have been functionally characterized. The OET-KNN classifier will be available at www.pami.sjtu.edu.cn/people/hbshen.

16.
The constitutive photomorphogenesis 1 (COP1) protein of Arabidopsis thaliana accumulates in discrete subnuclear foci. To better understand the role of subnuclear architecture in COP1-mediated gene expression, we investigated the structural motifs of COP1 that mediate its localization to subnuclear foci using mutational analysis with green fluorescent protein as a reporter. In a transient expression assay, a subnuclear localization signal consisting of 58 residues between amino acids 120 and 177 of COP1 was able to confer speckled localization onto the heterologous nuclear NIa protein from tobacco etch virus. The subnuclear localization signal overlaps two previously characterized motifs, a cytoplasmic localization signal and a putative alpha-helical coiled-coil domain that has been implicated in COP1 dimerization. Moreover, phenotypically lethal mutations in the carboxyl-terminal WD-40 repeats inhibited localization to subnuclear foci, consistent with a functional role for the accumulation of COP1 at subnuclear sites.

17.
Antibodies against the loosely bound subnuclear fraction (0.35 M NaCl-extractable subnuclear fraction) of rat brain were raised in rabbits, and the distribution of the main antigenic determinants was followed among subcellular fractions of nervous tissue and among homologous nuclear preparations from different tissues. By immunofluorescence a localization restricted to the nucleus was observed, and by microcomplement fixation the antigens appeared to specifically enrich the fraction under examination, being poorly detectable in cytosol, nuclear sap, or deoxyribonucleoproteins of rat brain. No significant cross-reaction was observed by complement fixation with homologous preparations from muscle, liver, kidney, spleen, lung, or thymus of rat, whereas the 0.35 M NaCl-extracted subnuclear fraction from rat testis exhibited an immunoreactivity, although lower than that for brain proteins. After trypsin or ribonuclease treatment, the main antigenic determinants appeared to be protein in nature. The subnuclear fraction under examination, which is believed to be relevant to gene regulation, appears to contain protein antigens mainly concentrated in the nervous system.

18.
Protein domain prediction is often the preliminary step in both experimental and computational protein research. Here we present a new method to predict the domain boundaries of a multidomain protein from its amino acid sequence using a fuzzy mean operator. Using the nr-sequence database together with a reference protein set (RPS) containing known domain boundaries, the operator is used to assign a likelihood value for each residue of the query sequence as belonging to a domain boundary. This procedure robustly identifies contiguous boundary regions. For a dataset with a maximum sequence identity of 30%, the average domain prediction accuracy of our method is 97% for one-domain proteins and 58% for multidomain proteins. The presented model is capable of using new sequence/structure information without re-parameterization after each RPS update. When tested on a current database using a four-year-old RPS and on a database that contains different domain definitions than those used to train the models, our method consistently yielded the same accuracy while two other published methods did not. A comparison with other domain prediction methods used in the CASP7 competition indicates that our method performs better than existing sequence-based methods.

19.
20.
Nuclear localization of proteins is a crucial element in the dynamic life of the cell. It is complicated by the massive diversity of targeting signals and the existence of proteins that shuttle between the nucleus and cytoplasm. Nevertheless, a majority of subcellular localization tools that predict nuclear proteins have been developed without involving dual localized proteins in the data sets. Hence, in general, the existing models are focused on predicting statically nuclear proteins, rather than nuclear localization itself. We present an independent analysis of existing nuclear localization predictors, using a nonredundant data set extracted from Swiss-Prot R50.0. We demonstrate that accuracy on truly novel proteins is lower than that of previous estimations, and that existing models generalize poorly to dual localized proteins. We have developed a model trained to identify nuclear proteins including dual localized proteins. The results suggest that using more recent data and including dual localized proteins improves the overall prediction. The final predictor NUCLEO operates with a realistic success rate of 0.70 and a correlation coefficient of 0.38, as established on the independent test set. (NUCLEO is available at http://pprowler.itee.uq.edu.au.)
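For readers unfamiliar with the two figures reported above, the sketch below shows how a success rate and a correlation coefficient are commonly computed from a binary confusion matrix. It assumes that "success rate" means overall accuracy and that "correlation coefficient" means the Matthews correlation coefficient; the abstract does not state either definition explicitly, and the function names are hypothetical.

```python
import math


def success_rate(tp, tn, fp, fn):
    """Fraction of all predictions that are correct (overall accuracy)."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary classifier.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

The two measures can diverge on unbalanced test sets: a predictor can reach a high success rate by favoring the majority class while its correlation coefficient stays low, which is one reason both are usually reported together.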
