Similar literature
 Found 20 similar articles; search took 11 ms
1.

Background  

Protein subcellular localization is crucial for genome annotation, protein function prediction, and drug discovery. Determination of subcellular localization using experimental approaches is time-consuming; thus, computational approaches become highly desirable. Extensive studies of localization prediction have led to the development of several methods including composition-based and homology-based methods. However, their performance might be significantly degraded if homologous sequences are not detected. Moreover, methods that integrate various features could suffer from the problem of low coverage in high-throughput proteomic analyses due to the lack of information to characterize unknown proteins.

2.

Background  

Predicting the subcellular localization of proteins is important for determining the function of proteins. Previous work on predicting protein localization in Gram-negative bacteria has obtained good results. However, these methods had relatively low accuracy for the localization of extracellular proteins. This paper studies ways to improve the accuracy of predicting extracellular localization in Gram-negative bacteria.

3.
The function of a protein is intimately tied to its subcellular localization. Although localizations have been measured for many yeast proteins through systematic GFP fusions, similar studies in other branches of life are still forthcoming. In the interim, various machine-learning methods have been proposed to predict localization using physical characteristics of a protein, such as amino acid content, hydrophobicity, side-chain mass and domain composition. However, there has been comparatively little work on predicting localization using protein networks. Here, we predict protein localizations by integrating an extensive set of protein physical characteristics over a protein's extended protein-protein interaction neighborhood, using a classification framework called 'Divide and Conquer k-Nearest Neighbors' (DC-kNN). These predictions achieve significantly higher accuracy than two well-known methods for predicting protein localization in yeast. Using new GFP imaging experiments, we show that the network-based approach can extend and revise previous annotations made from high-throughput studies. Finally, we show that our approach remains highly predictive in higher eukaryotes such as fly and human, in which most localizations are unknown and the protein network coverage is less substantial.
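The idea of pooling a protein's physical characteristics over its interaction neighborhood before classifying it can be illustrated with a minimal sketch. This is not DC-kNN itself: the abstract does not describe the divide-and-conquer step, so the code below swaps in a plain k-nearest-neighbors vote. The function names, the averaging rule, and the toy data layout are all assumptions made for illustration.

```python
import math
from collections import Counter


def neighborhood_features(protein, features, network):
    """Average a feature vector over a protein and its direct interaction partners.

    features: dict mapping protein name -> list of floats (hypothetical layout).
    network:  dict mapping protein name -> iterable of partner names.
    Partners without a feature vector are ignored.
    """
    members = [protein] + [n for n in network.get(protein, ()) if n in features]
    dim = len(features[protein])
    return [sum(features[m][d] for m in members) / len(members) for d in range(dim)]


def knn_predict(query, labelled, k=3):
    """Plain kNN: majority vote among the k labelled vectors closest to query.

    labelled: list of (vector, localization_label) pairs.
    """
    nearest = sorted(labelled, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

A caller would first build the pooled vector with `neighborhood_features` and then pass it to `knn_predict` together with pooled vectors of proteins whose localization is already known.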

4.
Here we report a systematic approach for predicting subcellular localization (cytoplasm, mitochondrial, nuclear, and plasma membrane) of human proteins. First, support vector machine (SVM)-based modules for predicting subcellular localization using traditional amino acid and dipeptide (i + 1) composition achieved overall accuracy of 76.6 and 77.8%, respectively. PSI-BLAST, when carried out using a similarity-based search against a nonredundant database of experimentally annotated proteins, yielded 73.3% accuracy. To gain further insight, a hybrid module (hybrid1) was developed based on amino acid composition, dipeptide composition, and similarity information and attained better accuracy of 84.9%. In addition, SVM modules based on higher-order dipeptides, i.e., i + 2, i + 3, and i + 4, were also constructed for the prediction of subcellular localization of human proteins, and overall accuracies of 79.7, 77.5, and 77.1% were achieved, respectively. Furthermore, another SVM module, hybrid2, was developed using traditional dipeptide (i + 1) and higher-order dipeptide (i + 2, i + 3, and i + 4) compositions, which gave an overall accuracy of 81.3%. We also developed the SVM module hybrid3 based on amino acid composition, traditional and higher-order dipeptide compositions, and PSI-BLAST output and achieved an overall accuracy of 84.4%. A Web server HSLPred (www.imtech.res.in/raghava/hslpred/ or bioinformatics.uams.edu/raghava/hslpred/) has been designed to predict subcellular localization of human proteins using the above approaches.
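The amino acid and dipeptide compositions mentioned in this abstract are simple frequency vectors, and a short sketch may make the (i + 1) versus (i + 2), (i + 3), (i + 4) distinction concrete. The function names and the normalization below are assumptions for illustration, not HSLPred's actual code; residues outside the 20 standard amino acids are simply ignored here.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """Fraction of each of the 20 standard residues (20 features)."""
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS)
    return [counts[a] / total if total else 0.0 for a in AMINO_ACIDS]


def dipeptide_composition(seq, gap=1):
    """Fraction of each ordered residue pair at positions (i, i + gap); 400 features.

    gap=1 gives the traditional dipeptide composition; gap=2, 3, 4 give the
    higher-order variants described as i + 2, i + 3, and i + 4.
    """
    pairs = Counter(seq[i] + seq[i + gap] for i in range(len(seq) - gap))
    total = sum(pairs[k] for k in DIPEPTIDES)
    return [pairs[k] / total if total else 0.0 for k in DIPEPTIDES]
```

A hybrid feature vector in the spirit of hybrid2 could then be formed by concatenating `dipeptide_composition(seq, gap)` for gap = 1 to 4 before passing it to a classifier; how the original modules combine and scale their inputs is not stated in the abstract.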

5.
MOTIVATION: Each protein performs its functions within some specific locations in a cell. This subcellular location is important for understanding protein function and for facilitating its purification. There are now many computational techniques for predicting location based on sequence analysis and database information from homologs. A few recent techniques use text from biological abstracts: our goal is to improve the prediction accuracy of such text-based techniques. We identify three techniques for improving text-based prediction: a rule for ambiguous abstract removal, a mechanism for using synonyms from the Gene Ontology (GO) and a mechanism for using the GO hierarchy to generalize terms. We show that these three techniques can significantly improve the accuracy of protein subcellular location predictors that use text extracted from PubMed abstracts whose references are recorded in Swiss-Prot.

6.
We present a new method for conducting protein structure similarity searches, which improves on the efficiency of some existing techniques. Our method is grounded in the theory of differential geometry on 3D space curve matching. We generate shape signatures for proteins that are invariant, localized, robust, compact, and biologically meaningful. The invariance of the shape signatures allows us to improve similarity searching efficiency by adopting a hierarchical coarse-to-fine strategy. We index the shape signatures using an efficient hashing-based technique. With the help of this technique we screen out unlikely candidates and perform detailed pairwise alignments only for a small number of candidates that survive the screening process. Unlike other hashing-based techniques, our technique employs domain-specific information (not just geometric information) in constructing the hash key, and hence is more tuned to the domain of biology. Furthermore, the invariance, localization, and compactness of the shape signatures allow us to utilize a well-known local sequence alignment algorithm for aligning two protein structures. One measure of the efficacy of the proposed technique is that we were able to perform structure alignment queries 36 times faster (on average) than a well-known method while keeping the quality of the query results at an approximately similar level.

7.
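Predicting protein subcellular localization is important for studying protein function, interactions, and regulatory mechanisms. Based on a reduction of the amino acid alphabet according to physicochemical and structural properties, this paper describes the local and global information of a sequence with "composition", "transition", and "distribution" features and, together with numerical statistical features of amino acid hydrophobicity and hydrophilicity, proposes a new protein feature representation (NSBH). Three classifiers, KNN, SVM, and a BP neural network, were each used to predict protein subcellular localization, and the results of the individual methods and of feature fusion were compared, showing that the fused feature representation combined with the SVM classifier achieves better prediction accuracy. The effect of different parameters on the experimental results is also discussed in detail, and the experiments and comparisons demonstrate the effectiveness of the method.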
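The "composition", "transition", and "distribution" features named in this abstract can be sketched for a single reduced alphabet. The three-class hydrophobicity grouping below is a commonly used one and is an assumption here, as are the function names and the exact percentile convention for the distribution descriptor; the abstract does not specify which reductions or conventions NSBH uses.

```python
import math

# Assumed three-class reduction by hydrophobicity (illustrative only).
GROUPS = {"polar": "RKEDQN", "neutral": "GASTPHY", "hydrophobic": "CLVIMFW"}
LABEL = {aa: str(i + 1) for i, members in enumerate(GROUPS.values()) for aa in members}


def reduce_sequence(seq):
    """Map each standard residue to its class label '1', '2' or '3'."""
    return "".join(LABEL[a] for a in seq if a in LABEL)


def composition(red):
    """Fraction of each class in the reduced sequence (3 features)."""
    n = len(red)
    return [red.count(c) / n if n else 0.0 for c in "123"]


def transition(red):
    """Fraction of adjacent positions that switch between two classes (3 features)."""
    pairs = list(zip(red, red[1:]))
    n = len(pairs)
    out = []
    for a, b in (("1", "2"), ("1", "3"), ("2", "3")):
        hits = sum(1 for p in pairs if p in ((a, b), (b, a)))
        out.append(hits / n if n else 0.0)
    return out


def distribution(red):
    """Relative position of the first, 25%, 50%, 75% and last occurrence of each class (15 features)."""
    n = len(red)
    out = []
    for c in "123":
        pos = [i + 1 for i, x in enumerate(red) if x == c]
        if not pos:
            out.extend([0.0] * 5)
            continue
        for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
            k = max(1, math.ceil(frac * len(pos)))  # index of the k-th occurrence
            out.append(pos[k - 1] / n)
    return out
```

Concatenating the three outputs gives 21 features per reduced alphabet; repeating this for several property-based reductions yields a fused vector of the kind the abstract feeds to KNN, SVM, or a BP neural network.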

8.
9.
Chang JM, Su EC, Lo A, Chiu HS, Sung TY, Hsu WL. Proteins 2008;72(2):693-710
Prediction of protein subcellular localization (PSL) is important for genome annotation, protein function prediction, and drug discovery. Many computational approaches for PSL prediction based on protein sequences have been proposed in recent years for Gram-negative bacteria. We present PSLDoc, a method based on gapped-dipeptides and probabilistic latent semantic analysis (PLSA) to solve this problem. A protein is considered as a term string composed of gapped-dipeptides, which are defined as any two residues separated by one or more positions. The weighting scheme of gapped-dipeptides is calculated according to a position specific score matrix, which includes sequence evolutionary information. Then, PLSA is applied for feature reduction, and reduced vectors are input to five one-versus-rest support vector machine classifiers. The localization site with the highest probability is assigned as the final prediction. It has been reported that there is a strong correlation between sequence homology and subcellular localization (Nair and Rost, Protein Sci 2002;11:2836-2847; Yu et al., Proteins 2006;64:643-651). To properly evaluate the performance of PSLDoc, a target protein can be classified into low- or high-homology data sets. PSLDoc's overall accuracy of low- and high-homology data sets reaches 86.84% and 98.21%, respectively, and it compares favorably with that of CELLO II (Yu et al., Proteins 2006;64:643-651). In addition, we set a confidence threshold to achieve a high precision at specified levels of recall rates. When the confidence threshold is set at 0.7, PSLDoc achieves 97.89% in precision which is considerably better than that of PSORTb v.2.0 (Gardy et al., Bioinformatics 2005;21:617-623). Our approach demonstrates that the specific feature representation for proteins can be successfully applied to the prediction of protein subcellular localization and improves prediction accuracy. Besides, because of the generality of the representation, our method can be extended to eukaryotic proteomes in the future. The web server of PSLDoc is publicly available at http://bio-cluster.iis.sinica.edu.tw/~bioapp/PSLDoc/.
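The final decision step described here, taking the site with the highest one-versus-rest probability and withholding a prediction below a confidence threshold, can be sketched as follows. The function names, the site labels in the usage note, and the definition of recall as correct predictions over all proteins are assumptions for illustration; PSLDoc's own scoring code is not shown in the abstract.

```python
def predict_with_threshold(probabilities, threshold=0.7):
    """Return the site with the highest probability, or None if it is below threshold.

    probabilities: dict mapping localization site -> probability reported by
    that site's one-versus-rest classifier (hypothetical layout).
    """
    site, p = max(probabilities.items(), key=lambda kv: kv[1])
    return site if p >= threshold else None


def precision_and_recall(predictions, truths):
    """Precision over answered proteins; recall over all proteins (assumed definitions)."""
    answered = [(p, t) for p, t in zip(predictions, truths) if p is not None]
    correct = sum(1 for p, t in answered if p == t)
    precision = correct / len(answered) if answered else 0.0
    recall = correct / len(truths) if truths else 0.0
    return precision, recall
```

For example, a protein scored `{"cytoplasmic": 0.55, "periplasmic": 0.45}` would be left unassigned at a threshold of 0.7. Raising the threshold generally trades recall for precision, which is the trade-off the abstract refers to.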

10.
The ability to predict the subcellular localization of a protein from its sequence is of great importance, as it provides information about the protein's function. We present a computational tool, PredSL, which utilizes neural networks, Markov chains, profile hidden Markov models, and scoring matrices for the prediction of the subcellular localization of proteins in eukaryotic cells from the N-terminal amino acid sequence. It aims to classify proteins into five groups: chloroplast, thylakoid, mitochondrion, secretory pathway, and "other". When tested in a fivefold cross-validation procedure, PredSL demonstrates 86.7% and 87.1% overall accuracy for the plant and non-plant datasets, respectively. Compared with TargetP, which is the most widely used method to date, and LumenP, the results of PredSL are comparable in most cases. When tested on the experimentally verified proteins of the Saccharomyces cerevisiae genome, PredSL performs comparably to, if not better than, any available algorithm for the same task. Furthermore, PredSL is the only method capable of predicting these subcellular localizations that is available as a stand-alone application through the URL: http://bioinformatics.biol.uoa.gr/PredSL/.

11.
MOTIVATION: Functional annotation of unknown proteins is a major goal in proteomics. A key annotation is the prediction of a protein's subcellular localization. Numerous prediction techniques have been developed, typically focusing on a single underlying biological aspect or predicting a subset of all possible localizations. An important step is taken towards emulating the protein sorting process by capturing and bringing together biologically relevant information, and addressing the clear need to improve prediction accuracy and localization coverage. RESULTS: Here we present a novel SVM-based approach for predicting subcellular localization, which integrates N-terminal targeting sequences, amino acid composition and protein sequence motifs. We show how this approach improves the prediction based on N-terminal targeting sequences, by comparing our method TargetLoc against existing methods. Furthermore, MultiLoc performs considerably better than comparable methods predicting all major eukaryotic subcellular localizations, and shows better or comparable results to methods that are specialized on fewer localizations or for one organism. AVAILABILITY: http://www-bs.informatik.uni-tuebingen.de/Services/MultiLoc/

12.
13.
We evaluated the efficiency of the best linear unbiased predictor (BLUP) and the influence of the use of similarity in state (SIS) and similarity by descent (SBD) in the prediction of untested maize hybrids. Nine inbred lines of maize were crossed using a randomized complete diallel method. These materials were genotyped with 48 microsatellite markers (SSR) associated with the QTL regions for grain yield. Estimates of four coefficients of SIS and four coefficients of SBD were used to construct the additive genetic and dominance matrices, which were later used in combination with the BLUP for predicting genotypic values and specific combining ability (SCA) in unanalyzed hybrids under simulated unbalance. The correlations between the predicted genotypic values and the observed means, depending on the degree of unbalance, ranged from 0.48 to 0.99 for SIS and from 0.40 to 0.99 for SBD. The correlations obtained for the SCA ranged from 0.26 to 0.98 using SIS and from 0.001 to 0.990 using SBD. The predictions using SBD were also less biased than the SIS predictions, that is, closer to the observed values, but were less efficient in ranking the genotypes. Although SIS showed a bias due to overestimation of relatedness, this type of coefficient may be used where low SBD values are detected in the group of parents, because of its greater efficiency in ranking the candidate hybrids.

14.
The subcellular localization of a protein can provide important information about its function within the cell. As eukaryotic cells and particularly mammalian cells are characterized by a high degree of compartmentalization, most protein activities can be assigned to particular cellular compartments. The categorization of proteins by their subcellular localization is therefore one of the essential goals of the functional annotation of the human genome. We previously performed a subcellular localization screen of 52 proteins encoded on human chromosome 21. In the current study, we compared the experimental localization data to the in silico results generated by nine leading software packages with different prediction resolutions. The comparison revealed striking differences between the programs in the accuracy of their subcellular protein localization predictions. Our results strongly suggest that the recently developed predictors utilizing multiple prediction methods tend to provide significantly better performance than purely sequence-based or homology-based predictions.

15.
We explored a novel approach to the functional regulation of nuclear proteins: altering their subcellular localization. To anchor a nuclear protein, beta-galactosidase with the nuclear localization signal of SV40 (nbeta-gal), within the cytoplasm, nbeta-gal was fused to the transmembrane domain of granulocyte colony-stimulating factor receptor (G-CSFR), a membrane protein. To liberate the nbeta-gal portion from the fusion protein, we used a protease derived from a plant virus, whose recognition sequence was inserted between the G-CSFR and nbeta-gal. Western analysis showed that the chimeric protein was cleaved in the presence of the protease in 293 cells and that the fusion protein without the recognition sequence remained intact. This chimeric protein was localized exclusively in the cytoplasm as visualized by X-gal staining and immunofluorescence microscopy. In contrast, when expressed together with the protease, beta-gal was predominantly detected in the nuclei. Moreover, we isolated 293-cell clones constitutively expressing the protease, indicating that this protease is not cytotoxic. These results suggest that the viral protease-mediated alteration of subcellular localization can potentially regulate the function of nuclear proteins.

16.
Automated sequence annotation is a major goal of the post-genomic era, with hundreds of genomes in the databases, from both prokaryotes and eukaryotes. While the number of fully sequenced chromosomes from microbial organisms has increased exponentially in the last decade to more than 600, presently we know the whole DNA content of only 25 eukaryotic organisms, including Homo sapiens. However, the process of genome annotation is far from being completed. This is particularly relevant in eukaryotes, whose cells contain several subcellular compartments, or organelles, enclosed by membranes, where different relevant functions are performed. Translocation across the membrane into the organelles is a highly regulated and complex cellular process. Indeed, different proteins and/or protein isoforms, originating from genes by alternative splicing, may be conveyed to different cell compartments, depending on their specific role in the cell. During recent years the prediction of subcellular localization (SL) by computational means has been an active research area. Several methods are presently available based on different notions and addressing different aspects of SL. This review provides a short overview of the best-performing methods described in the literature, highlighting their predictive capabilities and different applications.

17.
Protein tertiary structure prediction using a branch and bound algorithm   Cited by: 2 (self-citations: 0, citations by others: 2)
We report a new method for predicting protein tertiary structure from sequence and secondary structure information. The predictions result from global optimization of a potential energy function, including van der Waals, hydrophobic, and excluded volume terms. The optimization algorithm, which is based on the αBB method developed by Floudas and coworkers (Costas and Floudas, J Chem Phys 1994;100:1247-1261), uses a reduced model of the protein and is implemented in both distance and dihedral angle space, enabling a side-by-side comparison of methodologies. For a set of eight small proteins, representing the three basic types (all alpha, all beta, and mixed alpha/beta), the algorithm locates low-energy native-like structures (less than 6 Å root mean square deviation from the native coordinates) starting from an unfolded state. Serial and parallel implementations of this methodology are discussed.

18.
The uapC gene of Aspergillus nidulans belongs to a family of nucleobase-specific transporters conserved in prokaryotic and eukaryotic organisms. We report the use of immunological and green fluorescent protein-based strategies to study protein expression and subcellular distribution of UapC. A chimeric protein containing a plant-adapted green fluorescent protein (sGFP) fused to the C-terminus of UapC was shown to be functional in vivo, as it complements a triple mutant (i.e., uapC(-) uapA(-) azgA(-)) unable to grow on uric acid as the sole nitrogen source. UapC-GFP is located in the plasma membrane and, secondarily, in internal structures observed as fluorescent dots. A strong correlation was found between cellular levels of UapC-GFP fluorescence and known patterns of uapC gene expression. This work represents the first in vivo study of protein expression and subcellular localization of a filamentous fungal nucleobase transporter.

19.
Tobamoviruses represent a well-characterized system used to examine viral infection, whereas Arabidopsis is a choice plant for most genetic experiments. It would be useful to combine both approaches into one experimental system for virus–plant interaction. Most tobamoviruses, however, are not pathogenic in Arabidopsis. Here, we describe infection of Arabidopsis by a recently discovered crucifer-infecting turnip vein clearing tobamovirus (TVCV). Using this system, we determined patterns and kinetics of viral local and systemic movement within Arabidopsis plants. Localization studies showed that the virus infects both vegetative and reproductive plant tissues. However, there may be a transport barrier between the seed coat and the embryo which virions cannot cross, preventing seed transmission of TVCV. The ability to move both locally and systemically in Arabidopsis, causing mild and fast-developing symptoms but allowing survival and fertility of the infected plants, distinguishes TVCV infection of Arabidopsis as a model system to study virus–plant interaction.

20.

Background  

Knowing the subcellular location of proteins provides clues to their function as well as the interconnectivity of biological processes. Dozens of tools are available for predicting protein location in the eukaryotic cell. Each tool performs well on certain data sets, but their predictions often disagree for a given protein. Since the individual tools each have particular strengths, we set out to integrate them in a way that optimally exploits their potential. The method we present here is applicable to various subcellular locations, but tailored for predicting whether or not a protein is localized in mitochondria. Knowledge of the mitochondrial proteome is relevant to understanding the role of this organelle in global cellular processes.
