1.

ProLoc: prediction of protein subnuclear localization using SVM with automatic selection from physicochemical composition features (Total citations: 3; self-citations: 0; citations by others: 3)

Huang WL Tung CW Huang HL Hwang SF Ho SY 《Bio Systems》2007,90(2):573-581

Accurate prediction methods of protein subnuclear localizations rely on the cooperation between informative features and classifier design. Support vector machine (SVM) based learning methods have been shown to be effective for predictions of protein subcellular and subnuclear localizations. This study proposes an evolutionary support vector machine (ESVM) based classifier with automatic selection from a large set of physicochemical composition (PCC) features to design an accurate system for predicting protein subnuclear localization, named ProLoc. ESVM, using an inheritable genetic algorithm combined with SVM, can automatically determine the best number m of PCC features and identify m out of 526 PCC features simultaneously. To evaluate ESVM, this study uses two datasets, SNL6 and SNL9, which have 504 proteins localized in 6 subnuclear compartments and 370 proteins localized in 9 subnuclear compartments. Using a leave-one-out cross-validation, ProLoc utilizing the selected m=33 and 28 PCC features has accuracies of 56.37% for SNL6 and 72.82% for SNL9, which are better than 51.4% for the SVM-based system using k-peptide composition features applied on SNL6, and 64.32% for an optimized evidence-theoretic k-nearest neighbor classifier utilizing pseudo amino acid composition applied on SNL9, respectively.

2.

PairProSVM: protein subcellular localization based on local pairwise profile alignment and SVM

Mak MW Guo J Kung SY 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2008,5(3):416-422

The subcellular locations of proteins are important functional annotations. An effective and reliable subcellular localization method is necessary for proteomics research. This paper introduces a new method, PairProSVM, to automatically predict the subcellular locations of proteins. The profiles of all protein sequences in the training set are constructed by PSI-BLAST and the pairwise profile-alignment scores are used to form feature vectors for training a support vector machine (SVM) classifier. It was found that PairProSVM outperforms the methods that are based on sequence alignment and amino-acid compositions even if most of the homologous sequences have been removed. This paper also demonstrates that the performance of PairProSVM is sensitive (and somewhat proportional) to the degree to which its kernel matrix meets Mercer's condition. PairProSVM was evaluated on Reinhardt and Hubbard's, Huang and Li's, and Gardy et al.'s protein datasets. The overall accuracies on these three datasets reach 99.3%, 76.5%, and 91.9%, respectively, which are higher than or comparable to those obtained by sequence alignment and by the methods compared in this paper.

3.

TESTLoc: protein subcellular localization prediction from EST data

Yao-Qing Shen Gertraud Burger 《BMC bioinformatics》2010,11(1):563

Background

The eukaryotic cell has an intricate architecture with compartments and substructures dedicated to particular biological processes. Knowing the subcellular location of proteins not only indicates how bio-processes are organized in different cellular compartments, but also contributes to unravelling the function of individual proteins. Computational localization prediction is possible based on sequence information alone, and has been successfully applied to proteins from virtually all subcellular compartments and all domains of life. However, we realized that current prediction tools do not perform well on partial protein sequences such as those inferred from Expressed Sequence Tag (EST) data, limiting the exploitation of the large and taxonomically most comprehensive body of sequence information from eukaryotes.

4.

'Unite and conquer': enhanced prediction of protein subcellular localization by integrating multiple specialized tools

Yao Qing Shen Gertraud Burger 《BMC bioinformatics》2007,8(1):420

Background

Knowing the subcellular location of proteins provides clues to their function as well as the interconnectivity of biological processes. Dozens of tools are available for predicting protein location in the eukaryotic cell. Each tool performs well on certain data sets, but their predictions often disagree for a given protein. Since the individual tools each have particular strengths, we set out to integrate them in a way that optimally exploits their potential. The method we present here is applicable to various subcellular locations, but tailored for predicting whether or not a protein is localized in mitochondria. Knowledge of the mitochondrial proteome is relevant to understanding the role of this organelle in global cellular processes.

5.

PSORT-B: Improving protein subcellular localization prediction for Gram-negative bacteria (Total citations: 2; self-citations: 0; citations by others: 2)

Gardy JL Spencer C Wang K Ester M Tusnády GE Simon I Hua S deFays K Lambert C Nakai K Brinkman FS 《Nucleic acids research》2003,31(13):3613-3617

Automated prediction of bacterial protein subcellular localization is an important tool for genome annotation and drug discovery. PSORT has been one of the most widely used computational methods for such bacterial protein analysis; however, it has not been updated since it was introduced in 1991. In addition, neither PSORT nor any of the other computational methods available make predictions for all five of the localization sites characteristic of Gram-negative bacteria. Here we present PSORT-B, an updated version of PSORT for Gram-negative bacteria, which is available as a web-based application at http://www.psort.org. PSORT-B examines a given protein sequence for amino acid composition, similarity to proteins of known localization, presence of a signal peptide, transmembrane alpha-helices and motifs corresponding to specific localizations. A probabilistic method integrates these analyses, returning a list of five possible localization sites with associated probability scores. PSORT-B, designed to favor high precision (specificity) over high recall (sensitivity), attained an overall precision of 97% and recall of 75% in 5-fold cross-validation tests, using a dataset we developed of 1443 proteins of experimentally known localization. This dataset, the largest of its kind, is freely available, along with the PSORT-B source code (under GNU General Public License).
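The trade-off between precision and recall mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a simplified single-label setting in which a predictor may abstain (return `None`) when it is not confident; the function name `precision_recall` and the toy labels are hypothetical, and this is not necessarily how the PSORT-B authors computed their figures.

```python
def precision_recall(true_labels, predicted_labels):
    """Overall precision and recall when the predictor may abstain.

    A predicted label of None means "no prediction made" (an assumption
    of this sketch). Precision is measured over the predictions that were
    actually made; recall is measured over all proteins.
    """
    made = [(t, p) for t, p in zip(true_labels, predicted_labels) if p is not None]
    correct = sum(1 for t, p in made if t == p)
    precision = correct / len(made) if made else 0.0
    recall = correct / len(true_labels) if true_labels else 0.0
    return precision, recall


# Toy data: four proteins, one abstention, one wrong prediction.
true_sites = ["cytoplasm", "periplasm", "outer membrane", "extracellular"]
predicted_sites = ["cytoplasm", None, "outer membrane", "periplasm"]
precision, recall = precision_recall(true_sites, predicted_sites)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Abstaining on uncertain cases removes them from the precision denominator but still counts against recall, which is one way a predictor can favor precision over recall.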

6.

PROlocalizer: integrated web service for protein subcellular localization prediction

Kirsti Laurila Mauno Vihinen 《Amino acids》2011,40(3):975-980

Subcellular localization is an important protein property, which is related to function, interactions and other features. As experimental determination of the localization can be tedious, especially for large numbers of proteins, a number of prediction tools have been developed. We developed the PROlocalizer service that integrates 11 individual methods to predict altogether 12 localizations for animal proteins. The method allows the submission of a number of proteins and mutations and generates a detailed informative document of the prediction and obtained results. PROlocalizer is available at .

7.

Sequence encoding techniques in protein subcellular localization prediction

王正华 张振慧 王勇献 《生物信息学》2007,5(2):82-85,89

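Encoding protein sequences is one of the key techniques in subcellular localization prediction. This paper reviews the existing protein sequence encoding algorithms in some detail, and points out some open problems in sequence encoding and possible directions for future development.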

8.

A novel representation for apoptosis protein subcellular localization prediction using support vector machine

Li Zhang Dachao Li 《Journal of theoretical biology》2009,259(2):361-99

Apoptosis, or programmed cell death, plays an important role in the development of an organism. Obtaining information on the subcellular location of apoptosis proteins is very helpful for understanding the apoptosis mechanism. In this paper, based on the concept that the position distribution information of amino acids is closely related to the structure and function of proteins, we introduce the concept of distance frequency [Matsuda, S., Vert, J.P., Ueda, N., Toh, H., Akutsu, T., 2005. A novel representation of protein sequences for prediction of subcellular location using support vector machines. Protein Sci. 14, 2804-2813] and propose a novel way to calculate distance frequencies. In order to calculate the local features, each protein sequence is separated into p parts with the same length in our paper. Then we use the novel representation of protein sequences and adopt support vector machine to predict subcellular location. The overall prediction accuracy is significantly improved by jackknife test.
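The step of separating a sequence into p parts can be sketched as follows. This is only an illustration of the partitioning idea: the abstract does not say how sequences whose length is not divisible by p are handled, so giving the leftover residues to the first parts is an assumption of this sketch, as is the function name `split_into_parts`. The distance-frequency calculation itself is not reproduced here.

```python
def split_into_parts(sequence, p):
    """Split a sequence into p contiguous parts of (nearly) equal length.

    Assumption of this sketch: when len(sequence) is not divisible by p,
    the first (len(sequence) % p) parts each receive one extra residue.
    """
    if p < 1 or p > len(sequence):
        raise ValueError("p must be between 1 and the sequence length")
    base, extra = divmod(len(sequence), p)
    parts = []
    start = 0
    for i in range(p):
        size = base + (1 if i < extra else 0)
        parts.append(sequence[start:start + size])
        start += size
    return parts


parts = split_into_parts("MKKLLAVTGE", 3)
print(parts)
```

Local features would then be computed on each part separately and concatenated into one feature vector.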

9.

SCLpred: protein subcellular localization prediction by N-to-1 neural networks

Mooney C Wang YH Pollastri G 《Bioinformatics (Oxford, England)》2011,27(20):2812-2819

10.

Support vector machine approach for protein subcellular localization prediction (Total citations: 47; self-citations: 0; citations by others: 47)

Hua S Sun Z 《Bioinformatics (Oxford, England)》2001,17(8):721-728

MOTIVATION: Subcellular localization is a key functional characteristic of proteins. A fully automatic and reliable prediction system for protein subcellular localization is needed, especially for the analysis of large-scale genome sequences. RESULTS: In this paper, Support Vector Machine has been introduced to predict the subcellular localization of proteins from their amino acid compositions. The total prediction accuracies reach 91.4% for three subcellular locations in prokaryotic organisms and 79.4% for four locations in eukaryotic organisms. Predictions by our approach are robust to errors in the protein N-terminal sequences. This new approach provides superior prediction performance compared with existing algorithms based on amino acid composition and can be a complementary method to other existing methods based on sorting signals. AVAILABILITY: A web server implementing the prediction method is available at http://www.bioinfo.tsinghua.edu.cn/SubLoc/. SUPPLEMENTARY INFORMATION: Supplementary material is available at http://www.bioinfo.tsinghua.edu.cn/SubLoc/.
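Amino acid composition, the feature type named in the abstract above, is commonly represented as a 20-dimensional vector of residue frequencies. The following is a minimal sketch of that encoding; the function name `amino_acid_composition`, the residue ordering, and the choice to ignore non-standard symbols are assumptions of this sketch rather than details taken from the paper. Such a vector would then be passed to a classifier such as an SVM.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in the sequence.

    Symbols outside the 20 standard residues (e.g. X) are ignored,
    which is an assumption of this sketch.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


vector = amino_acid_composition("MKKLLA")
print(len(vector), round(vector[AMINO_ACIDS.index("K")], 3))
```

Because the vector depends only on residue counts, it is unaffected by residue order, which is consistent with the abstract's remark that composition-based predictions tolerate errors in the N-terminal sequence better than methods relying on sorting signals.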

11.

Protein networks markedly improve prediction of subcellular localization in multiple eukaryotic species (Total citations: 1; self-citations: 0; citations by others: 1)

Lee K Chuang HY Beyer A Sung MK Huh WK Lee B Ideker T 《Nucleic acids research》2008,36(20):e136

The function of a protein is intimately tied to its subcellular localization. Although localizations have been measured for many yeast proteins through systematic GFP fusions, similar studies in other branches of life are still forthcoming. In the interim, various machine-learning methods have been proposed to predict localization using physical characteristics of a protein, such as amino acid content, hydrophobicity, side-chain mass and domain composition. However, there has been comparatively little work on predicting localization using protein networks. Here, we predict protein localizations by integrating an extensive set of protein physical characteristics over a protein's extended protein-protein interaction neighborhood, using a classification framework called 'Divide and Conquer k-Nearest Neighbors' (DC-kNN). These predictions achieve significantly higher accuracy than two well-known methods for predicting protein localization in yeast. Using new GFP imaging experiments, we show that the network-based approach can extend and revise previous annotations made from high-throughput studies. Finally, we show that our approach remains highly predictive in higher eukaryotes such as fly and human, in which most localizations are unknown and the protein network coverage is less substantial.

12.

SherLoc: high-accuracy prediction of protein subcellular localization by integrating text and protein sequence data (Total citations: 1; self-citations: 0; citations by others: 1)

Shatkay H Höglund A Brady S Blum T Dönnes P Kohlbacher O 《Bioinformatics (Oxford, England)》2007,23(11):1410-1417

MOTIVATION: Knowing the localization of a protein within the cell helps elucidate its role in biological processes, its function and its potential as a drug target. Thus, subcellular localization prediction is an active research area. Numerous localization prediction systems are described in the literature; some focus on specific localizations or organisms, while others attempt to cover a wide range of localizations. RESULTS: We introduce SherLoc, a new comprehensive system for predicting the localization of eukaryotic proteins. It integrates several types of sequence and text-based features. While applying the widely used support vector machines (SVMs), SherLoc's main novelty lies in the way in which it selects its text sources and features, and integrates those with sequence-based features. We test SherLoc on previously used datasets, as well as on a new set devised specifically to test its predictive power, and show that SherLoc consistently improves on previous reported results. We also report the results of applying SherLoc to a large set of yet-unlocalized proteins. AVAILABILITY: SherLoc, along with Supplementary Information, is available at: http://www-bs.informatik.uni-tuebingen.de/Services/SherLoc/

13.

PredSL: A Tool for the N-terminal Sequence-based Prediction of Protein Subcellular Localization

Petsalaki EI Bagos PG Litou ZI Hamodrakas SJ 《基因组蛋白质组与生物信息学报(英文版)》2006,4(1):48-55

The ability to predict the subcellular localization of a protein from its sequence is of great importance, as it provides information about the protein's function. We present a computational tool, PredSL, which utilizes neural networks, Markov chains, profile hidden Markov models, and scoring matrices for the prediction of the subcellular localization of proteins in eukaryotic cells from the N-terminal amino acid sequence. It aims to classify proteins into five groups: chloroplast, thylakoid, mitochondrion, secretory pathway, and "other". When tested in a fivefold cross-validation procedure, PredSL demonstrates 86.7% and 87.1% overall accuracy for the plant and non-plant datasets, respectively. Compared with TargetP, which is the most widely used method to date, and LumenP, the results of PredSL are comparable in most cases. When tested on the experimentally verified proteins of the Saccharomyces cerevisiae genome, PredSL performs comparably to, if not better than, any available algorithm for the same task. Furthermore, PredSL is the only method capable of predicting these subcellular localizations that is available as a stand-alone application through the URL: http://bioinformatics.biol.uoa.gr/PredSL/.

14.

MultiLoc: prediction of protein subcellular localization using N-terminal targeting sequences, sequence motifs and amino acid composition (Total citations: 10; self-citations: 0; citations by others: 10)

Höglund A Dönnes P Blum T Adolph HW Kohlbacher O 《Bioinformatics (Oxford, England)》2006,22(10):1158-1165

MOTIVATION: Functional annotation of unknown proteins is a major goal in proteomics. A key annotation is the prediction of a protein's subcellular localization. Numerous prediction techniques have been developed, typically focusing on a single underlying biological aspect or predicting a subset of all possible localizations. An important step is taken towards emulating the protein sorting process by capturing and bringing together biologically relevant information, and addressing the clear need to improve prediction accuracy and localization coverage. RESULTS: Here we present a novel SVM-based approach for predicting subcellular localization, which integrates N-terminal targeting sequences, amino acid composition and protein sequence motifs. We show how this approach improves the prediction based on N-terminal targeting sequences, by comparing our method TargetLoc against existing methods. Furthermore, MultiLoc performs considerably better than comparable methods predicting all major eukaryotic subcellular localizations, and shows better or comparable results to methods that are specialized on fewer localizations or for one organism. AVAILABILITY: http://www-bs.informatik.uni-tuebingen.de/Services/MultiLoc/

15.

PSCL: predicting protein subcellular localization based on optimal functional domains

Wang K Hu LL Shi XH Dong YS Li HP Wen TQ 《Protein and peptide letters》2012,19(1):15-22

It is well known that protein subcellular localizations are closely related to their functions. Although many computational methods and tools are available on the Internet, it is still necessary to develop new algorithms in this field to gain a better understanding of the complex mechanism of plant subcellular localization. Here, we provide a new web server named PSCL for plant protein subcellular localization prediction by employing optimized functional domains. After feature optimization, 848 optimal functional domains from InterPro were obtained to represent each protein. By calculating the distances to each of the seven categories, PSCL shows the possibilities of a protein being located in each of those categories in ascending order. On our dataset, PSCL achieved a first-order prediction accuracy of 75.7% by jackknife test. Gene Ontology enrichment analysis shows that catalytic activity, cellular process and metabolic process are strongly correlated with the localization of plant proteins. Finally, PSCL, a Linux operating system based web interface for the predictor, was designed and is accessible for public use at http://pscl.biosino.org/.

16.

Refining protein subcellular localization

Scott MS Calafell SJ Thomas DY Hallett MT 《PLoS computational biology》2005,1(6):e66

The study of protein subcellular localization is important to elucidate protein function. Even in well-studied organisms such as yeast, experimental methods have not been able to provide a full coverage of localization. The development of bioinformatic predictors of localization can bridge this gap. We have created a Bayesian network predictor called PSLT2 that considers diverse protein characteristics, including the combinatorial presence of InterPro motifs and protein interaction data. We compared the localization predictions of PSLT2 to high-throughput experimental localization datasets. Disagreements between these methods generally involve proteins that transit through or reside in the secretory pathway. We used our multi-compartmental predictions to refine the localization annotations of yeast proteins primarily by distinguishing between soluble lumenal proteins and soluble proteins peripherally associated with organelles. To our knowledge, this is the first tool to provide this functionality. We used these sub-compartmental predictions to characterize cellular processes on an organellar scale. The integration of diverse protein characteristics and protein interaction data in an appropriate setting can lead to high-quality detailed localization annotations for whole proteomes. This type of resource is instrumental in developing models of whole organelles that provide insight into the extent of interaction and communication between organelles and help define organellar functionality.

17.

Improving subcellular localization prediction using text classification and the gene ontology (Total citations: 1; self-citations: 0; citations by others: 1)

Fyshe A Liu Y Szafron D Greiner R Lu P 《Bioinformatics (Oxford, England)》2008,24(21):2512-2517

MOTIVATION: Each protein performs its functions within some specific locations in a cell. This subcellular location is important for understanding protein function and for facilitating its purification. There are now many computational techniques for predicting location based on sequence analysis and database information from homologs. A few recent techniques use text from biological abstracts: our goal is to improve the prediction accuracy of such text-based techniques. We identify three techniques for improving text-based prediction: a rule for ambiguous abstract removal, a mechanism for using synonyms from the Gene Ontology (GO) and a mechanism for using the GO hierarchy to generalize terms. We show that these three techniques can significantly improve the accuracy of protein subcellular location predictors that use text extracted from PubMed abstracts whose references are recorded in Swiss-Prot.

18.

Gene ontology based transfer learning for protein subcellular localization

Suyu Mei Wang Fei Shuigeng Zhou 《BMC bioinformatics》2011,12(1):44

Background

Prediction of protein subcellular localization generally involves many complex factors, and using only one or two aspects of data information may not tell the true story. For this reason, some recent predictive models are deliberately designed to integrate multiple heterogeneous data sources for exploiting multi-aspect protein feature information. Gene ontology, hereinafter referred to as GO, uses a controlled vocabulary to depict biological molecules or gene products in terms of biological process, molecular function and cellular component. With the rapid expansion of annotated protein sequences, gene ontology has become a general protein feature that can be used to construct predictive models in computational biology. Existing models generally either concatenated the GO terms into a flat binary vector or applied majority-vote based ensemble learning for protein subcellular localization, both of which cannot estimate the individual discriminative abilities of the three aspects of gene ontology.

19.

MultiLoc2: integrating phylogeny and Gene Ontology terms improves subcellular protein localization prediction (Total citations: 2; self-citations: 0; citations by others: 2)

Torsten Blum Sebastian Briesemeister Oliver Kohlbacher 《BMC bioinformatics》2009,10(1):274

Background

Knowledge of subcellular localization of proteins is crucial to proteomics, drug target discovery and systems biology since localization and biological function are highly correlated. In recent years, numerous computational prediction methods have been developed. Nevertheless, there is still a need for prediction methods that show more robustness and higher accuracy.

20.

ProLoc-GO: Utilizing informative Gene Ontology terms for sequence-based prediction of protein subcellular localization

Wen-Lin Huang Chun-Wei Tung Shih-Wen Ho Shiow-Fen Hwang Shinn-Ying Ho 《BMC bioinformatics》2008,9(1):80

Background

Gene Ontology (GO) annotation, which describes the function of genes and gene products across species, has recently been used to predict protein subcellular and subnuclear localization. Existing GO-based prediction methods for protein subcellular localization use the known accession numbers of query proteins to obtain their annotated GO terms. An accurate prediction method for predicting subcellular localization of novel proteins without known accession numbers, using only the input sequence, is worth developing.