
1.

TESTLoc: protein subcellular localization prediction from EST data

Yao-Qing Shen Gertraud Burger 《BMC bioinformatics》2010,11(1):563

Background

The eukaryotic cell has an intricate architecture with compartments and substructures dedicated to particular biological processes. Knowing the subcellular location of proteins not only indicates how bio-processes are organized in different cellular compartments, but also contributes to unravelling the function of individual proteins. Computational localization prediction is possible based on sequence information alone, and has been successfully applied to proteins from virtually all subcellular compartments and all domains of life. However, we realized that current prediction tools do not perform well on partial protein sequences such as those inferred from Expressed Sequence Tag (EST) data, limiting the exploitation of the large and taxonomically most comprehensive body of sequence information from eukaryotes.

2.

Adjusted geometric-mean: a novel performance measure for imbalanced bioinformatics datasets learning

Batuwita R Palade V 《Journal of bioinformatics and computational biology》2012,10(4):1250003

One common and challenging problem faced by many bioinformatics applications, such as promoter recognition, splice site prediction, RNA gene prediction, drug discovery and protein classification, is the imbalance of the available datasets. In most of these applications, the positive data examples are largely outnumbered by the negative data examples, which often leads to the development of sub-optimal prediction models having a high negative recognition rate (Specificity = SP) and a low positive recognition rate (Sensitivity = SE). When class imbalance learning methods are applied, the SE is usually increased at the expense of some reduction in the SP. In this paper, we point out that in these data-imbalanced bioinformatics applications, the goal of applying class imbalance learning methods should be to raise the SE as much as possible while keeping the reduction of SP as small as possible. We explain that the existing performance measures used in class imbalance learning can still produce sub-optimal models with respect to this classification goal. In order to overcome these problems, we introduce a new performance measure called Adjusted Geometric-mean (AGm). The experimental results obtained on ten real-world imbalanced bioinformatics datasets demonstrate that the AGm metric can achieve a lower rate of reduction of SP than the existing performance metrics when increasing the SE through class imbalance learning methods. This characteristic of the AGm metric makes it more suitable for achieving the proposed classification goal in learning from imbalanced bioinformatics datasets.
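The quantities named in this abstract can be illustrated with a minimal Python sketch. It computes SE, SP and the ordinary geometric mean (the square root of SE times SP) from confusion-matrix counts. The counts are hypothetical, and the AGm formula itself is not stated in the abstract, so it is not reproduced here.

```python
import math


def sensitivity_specificity(tp, fn, tn, fp):
    """Return (SE, SP) computed from confusion-matrix counts."""
    se = tp / (tp + fn) if (tp + fn) else 0.0  # positive recognition rate
    sp = tn / (tn + fp) if (tn + fp) else 0.0  # negative recognition rate
    return se, sp


def geometric_mean(se, sp):
    """Plain geometric mean of SE and SP (not the adjusted AGm)."""
    return math.sqrt(se * sp)


# Hypothetical imbalanced dataset: 50 positives, 950 negatives.
tp, fn, tn, fp = 10, 40, 940, 10
se, sp = sensitivity_specificity(tp, fn, tn, fp)
accuracy = (tp + tn) / (tp + fn + tn + fp)

print(f"SE={se:.3f} SP={sp:.3f} accuracy={accuracy:.3f} Gm={geometric_mean(se, sp):.3f}")
```

In this made-up case the accuracy is high even though only one positive in five is recognized, which is the kind of sub-optimal model the abstract describes.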

3.

Ensuring reliable datasets for environmental models and forecasts (total citations: 2; self-citations: 0; citations by others: 2)

Emery R. Boose Aaron M. Ellison Leon J. Osterweil Lori A. Clarke Rodion Podorozhny Julian L. Hadley Alexander Wise David R. Foster 《Ecological Informatics》2007,2(3):237-247


4.

Biomolecule function: no reliable prediction from cell culture (total citations: 1; self-citations: 0; citations by others: 1)

Kolter T Magin TM Sandhoff K 《Traffic (Copenhagen, Denmark)》2000,1(10):803-804


5.

Characterization and prediction of protein nucleolar localization sequences

Michelle S. Scott François-Michel Boisvert Mark D. McDowall Angus I. Lamond Geoffrey J. Barton 《Nucleic acids research》2010,38(21):7388-7399

Although the nucleolar localization of proteins is often believed to be mediated primarily by non-specific retention to core nucleolar components, many examples of short nucleolar targeting sequences have been reported in recent years. In this article, 46 human nucleolar localization sequences (NoLSs) were collated from the literature and subjected to statistical analysis. Of the residues in these NoLSs 48% are basic, whereas 99% of the residues are predicted to be solvent-accessible with 42% in α-helix and 57% in coil. The sequence and predicted protein secondary structure of the 46 NoLSs were used to train an artificial neural network to identify NoLSs. At a true positive rate of 54%, the predictor’s overall false positive rate (FPR) is estimated to be 1.52%, which can be broken down to FPRs of 0.26% for randomly chosen cytoplasmic sequences, 0.80% for randomly chosen nucleoplasmic sequences and 12% for nuclear localization signals. The predictor was used to predict NoLSs in the complete human proteome and 10 of the highest scoring previously unknown NoLSs were experimentally confirmed. NoLSs are a prevalent type of targeting motif that is distinct from nuclear localization signals and that can be computationally predicted.

6.

Antibody-protein interactions: benchmark datasets and prediction tools evaluation

Julia V Ponomarenko Philip E Bourne 《BMC structural biology》2007,7(1):64

Background

The ability to predict antibody binding sites (also known as antigenic determinants or B-cell epitopes) for a given protein is a precursor to new vaccine design and diagnostics. Among the various methods of B-cell epitope identification, X-ray crystallography is one of the most reliable. Computational methods for B-cell epitope prediction exist that use these experimental data. As the number of structures of antibody-protein complexes grows, further interest in prediction methods using 3D structure is anticipated. This work aims to establish a benchmark for 3D structure-based epitope prediction methods.

7.

PROlocalizer: integrated web service for protein subcellular localization prediction

Kirsti Laurila Mauno Vihinen 《Amino acids》2011,40(3):975-980

Subcellular localization is an important protein property, which is related to function, interactions and other features. As experimental determination of the localization can be tedious, especially for large numbers of proteins, a number of prediction tools have been developed. We developed the PROlocalizer service that integrates 11 individual methods to predict altogether 12 localizations for animal proteins. The method allows the submission of a number of proteins and mutations and generates a detailed informative document of the prediction and obtained results. PROlocalizer is available at .

8.

PSORT-B: Improving protein subcellular localization prediction for Gram-negative bacteria (total citations: 2; self-citations: 0; citations by others: 2)

Gardy JL Spencer C Wang K Ester M Tusnády GE Simon I Hua S deFays K Lambert C Nakai K Brinkman FS 《Nucleic acids research》2003,31(13):3613-3617

Automated prediction of bacterial protein subcellular localization is an important tool for genome annotation and drug discovery. PSORT has been one of the most widely used computational methods for such bacterial protein analysis; however, it has not been updated since it was introduced in 1991. In addition, neither PSORT nor any of the other computational methods available make predictions for all five of the localization sites characteristic of Gram-negative bacteria. Here we present PSORT-B, an updated version of PSORT for Gram-negative bacteria, which is available as a web-based application at http://www.psort.org. PSORT-B examines a given protein sequence for amino acid composition, similarity to proteins of known localization, presence of a signal peptide, transmembrane alpha-helices and motifs corresponding to specific localizations. A probabilistic method integrates these analyses, returning a list of five possible localization sites with associated probability scores. PSORT-B, designed to favor high precision (specificity) over high recall (sensitivity), attained an overall precision of 97% and recall of 75% in 5-fold cross-validation tests, using a dataset we developed of 1443 proteins of experimentally known localization. This dataset, the largest of its kind, is freely available, along with the PSORT-B source code (under GNU General Public License).
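The trade-off between precision and recall mentioned in this abstract can be sketched in a few lines of Python. The example below is not PSORT-B's probabilistic method: it assumes a hypothetical predictor that returns per-site scores and abstains when its top score falls below a threshold, and all scores, labels and function names are made up for illustration.

```python
def predict_with_threshold(scores, threshold):
    """Return the top-scoring site, or None (abstain) if its score is below threshold."""
    site, score = max(scores.items(), key=lambda kv: kv[1])
    return site if score >= threshold else None


def precision_recall(true_labels, predicted_labels):
    """Precision over the predictions actually made; recall over all proteins."""
    made = [(t, p) for t, p in zip(true_labels, predicted_labels) if p is not None]
    correct = sum(1 for t, p in made if t == p)
    precision = correct / len(made) if made else 0.0
    recall = correct / len(true_labels) if true_labels else 0.0
    return precision, recall


# Hypothetical proteins with known localization and made-up site scores.
true_sites = ["cytoplasm", "periplasm", "outer membrane", "extracellular"]
site_scores = [
    {"cytoplasm": 0.90, "periplasm": 0.10},
    {"cytoplasm": 0.50, "periplasm": 0.45},       # low-confidence and wrong
    {"outer membrane": 0.80, "extracellular": 0.20},
    {"extracellular": 0.55, "periplasm": 0.45},   # low-confidence but right
]

lenient = [predict_with_threshold(s, 0.0) for s in site_scores]
strict = [predict_with_threshold(s, 0.75) for s in site_scores]

print("lenient:", precision_recall(true_sites, lenient))
print("strict: ", precision_recall(true_sites, strict))
```

Raising the threshold removes the wrong low-confidence call, which lifts precision, but it also discards a correct one, which lowers recall.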

9.

Sequence encoding techniques in protein subcellular localization prediction

王正华 张振慧 王勇献 《生物信息学》2007,5(2):82-85,89

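The encoding of protein sequences is one of the key techniques in subcellular localization prediction. This paper describes the existing protein sequence encoding algorithms in some detail, and points out some open problems in sequence encoding and possible directions for future development.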

10.

SherLoc: high-accuracy prediction of protein subcellular localization by integrating text and protein sequence data (total citations: 1; self-citations: 0; citations by others: 1)

Shatkay H Höglund A Brady S Blum T Dönnes P Kohlbacher O 《Bioinformatics (Oxford, England)》2007,23(11):1410-1417

MOTIVATION: Knowing the localization of a protein within the cell helps elucidate its role in biological processes, its function and its potential as a drug target. Thus, subcellular localization prediction is an active research area. Numerous localization prediction systems are described in the literature; some focus on specific localizations or organisms, while others attempt to cover a wide range of localizations. RESULTS: We introduce SherLoc, a new comprehensive system for predicting the localization of eukaryotic proteins. It integrates several types of sequence and text-based features. While applying the widely used support vector machines (SVMs), SherLoc's main novelty lies in the way in which it selects its text sources and features, and integrates those with sequence-based features. We test SherLoc on previously used datasets, as well as on a new set devised specifically to test its predictive power, and show that SherLoc consistently improves on previous reported results. We also report the results of applying SherLoc to a large set of yet-unlocalized proteins. AVAILABILITY: SherLoc, along with Supplementary Information, is available at: http://www-bs.informatik.uni-tuebingen.de/Services/SherLoc/

11.

SCLpred: protein subcellular localization prediction by N-to-1 neural networks

Mooney C Wang YH Pollastri G 《Bioinformatics (Oxford, England)》2011,27(20):2812-2819


12.

Subcellular localization prediction with new protein encoding schemes

Oğul H Mumcuoğlu EU 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2007,4(2):227-232

Subcellular localization is one of the key properties in functional annotation of proteins. Support vector machines (SVMs) have been widely used for automated prediction of subcellular localizations. Existing methods differ in the protein encoding schemes used. In this study, we present two methods for protein encoding to be used for SVM-based subcellular localization prediction: n-peptide compositions with reduced amino acid alphabets for larger values of n and pairwise sequence similarity scores based on whole sequence and N-terminal sequence. We tested the methods on a common benchmarking data set that consists of 2,427 eukaryotic proteins with four localization sites. In 5-fold cross-validation tests, the encoding with n-peptide compositions provided accuracies of 84.5, 88.9, 66.3, and 94.3 percent for cytoplasmic, extracellular, mitochondrial, and nuclear proteins, with an overall accuracy of 87.1 percent. The second method provided 83.6, 87.7, 87.9, and 90.5 percent accuracies for individual locations and 87.8 percent overall accuracy. A hybrid system, which we called PredLOC, makes a final decision based on the results of the two presented methods and achieved an overall accuracy of 91.3 percent, which is better than that of many existing methods. The new system also outperformed recent methods in experiments conducted on a new, unique SWISSPROT test set.
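The first encoding named in this abstract, n-peptide composition over a reduced amino acid alphabet, can be sketched in Python as follows. The four-group reduction used here is a hypothetical grouping chosen for illustration; the abstract does not say which reduced alphabets the authors used.

```python
from collections import Counter
from itertools import product

# Hypothetical 4-letter reduction of the 20 standard amino acids (an assumption).
GROUPS = {
    "H": "AVLIMFWC",  # hydrophobic
    "P": "STNQYGP",   # polar or small
    "B": "KRH",       # basic
    "A": "DE",        # acidic
}
REDUCE = {aa: group for group, members in GROUPS.items() for aa in members}


def reduce_sequence(seq):
    """Map each residue to its group letter, skipping unknown symbols."""
    return "".join(REDUCE[aa] for aa in seq.upper() if aa in REDUCE)


def npeptide_composition(seq, n):
    """Return frequencies of all n-peptides over the reduced alphabet, in sorted key order."""
    reduced = reduce_sequence(seq)
    keys = ["".join(p) for p in product(sorted(GROUPS), repeat=n)]
    counts = Counter(reduced[i:i + n] for i in range(len(reduced) - n + 1))
    total = sum(counts.values())
    return [counts[k] / total if total else 0.0 for k in keys]


vec = npeptide_composition("MKKLLDE", 2)
print(len(vec), sum(vec))
```

The point of the reduction is feature-vector size: with the full 20-letter alphabet, n = 3 already gives 8,000 features, whereas a 4-letter alphabet gives only 1,024 features at n = 5, which is what makes larger values of n practical.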

13.

ProLoc: prediction of protein subnuclear localization using SVM with automatic selection from physicochemical composition features (total citations: 3; self-citations: 0; citations by others: 3)

Huang WL Tung CW Huang HL Hwang SF Ho SY 《Bio Systems》2007,90(2):573-581

Accurate prediction methods of protein subnuclear localizations rely on the cooperation between informative features and classifier design. Support vector machine (SVM) based learning methods have been shown to be effective for predictions of protein subcellular and subnuclear localizations. This study proposes an evolutionary support vector machine (ESVM) based classifier with automatic selection from a large set of physicochemical composition (PCC) features to design an accurate system for predicting protein subnuclear localization, named ProLoc. ESVM using an inheritable genetic algorithm combined with SVM can automatically determine the best number m of PCC features and identify m out of 526 PCC features simultaneously. To evaluate ESVM, this study uses two datasets SNL6 and SNL9, which have 504 proteins localized in 6 subnuclear compartments and 370 proteins localized in 9 subnuclear compartments. Using a leave-one-out cross-validation, ProLoc utilizing the selected m=33 and 28 PCC features has accuracies of 56.37% for SNL6 and 72.82% for SNL9, which are better than 51.4% for the SVM-based system using k-peptide composition features applied on SNL6, and 64.32% for an optimized evidence-theoretic k-nearest neighbor classifier utilizing pseudo amino acid composition applied on SNL9, respectively.

14.

The prediction of protein subcellular localization from sequence: a shortcut to functional genome annotation. (total citations: 2; self-citations: 0; citations by others: 2)

Rita Casadio Pier Luigi Martelli Andrea Pierleoni 《Briefings in Functional Genomics and Prot》2008,7(1):63-73

Automated sequence annotation is a major goal of the post-genomic era, with hundreds of genomes in the databases, from both prokaryotes and eukaryotes. While the number of fully sequenced chromosomes from microbial organisms increased exponentially in the last decade to above 600, presently we know the whole DNA content of only 25 eukaryotic organisms, including Homo sapiens. However, the process of genome annotation is far from being completed. This is particularly relevant in eukaryotes, whose cells contain several subcellular compartments, or organelles, enclosed by membranes, where different relevant functions are performed. Translocation across the membrane into the organelles is a highly regulated and complex cellular process. Indeed, different proteins and/or protein isoforms, originating from genes by alternative splicing, may be conveyed to different cell compartments, depending on their specific role in the cell. During recent years the prediction of subcellular localization (SL) by computational means has been an active research area. Several methods are presently available, based on different notions and addressing different aspects of SL. This review provides a short overview of the best-performing methods described in the literature, highlighting their predictive capabilities and different applications.

15.

Comparative genomics for reliable protein-function prediction from genomic data

Huynen MA Snel B van Noort V 《Trends in genetics : TIG》2004,20(8):340-344

Genomic data provide invaluable, yet unreliable information about protein function. However, if the overlap in information among various genomic datasets is taken into account, one observes an increase in the reliability of the protein-function predictions that can be made. Recently published approaches achieved this either by comparing the same type of data from multiple species (horizontal comparative genomics) or by using subtle, Bayesian methods to compare different types of genomic data from a single species (vertical comparative genomics). In this article, we discuss these methods, illustrating horizontal comparative genomics by comparing yeast two-hybrid (Y2H) data from Saccharomyces cerevisiae with Y2H data from Drosophila melanogaster, and illustrating vertical comparative genomics by comparing RNA expression data with proteomic data from Plasmodium falciparum.

16.

pSLIP: SVM based protein subcellular localization prediction using multiple physicochemical properties

Deepak Sarda Gek Huey Chua Kuo-Bin Li Arun Krishnan 《BMC bioinformatics》2005,6(1):152

Background

Protein subcellular localization is an important determinant of protein function and hence, reliable methods for prediction of localization are needed. A number of prediction algorithms have been developed based on amino acid compositions or on the N-terminal characteristics (signal peptides) of proteins. However, such approaches lead to a loss of contextual information. Moreover, where information about the physicochemical properties of amino acids has been used, the methods employed to exploit that information are less than optimal and could use the information more effectively.

17.

Support vector machine approach for protein subcellular localization prediction (total citations: 47; self-citations: 0; citations by others: 47)

Hua S Sun Z 《Bioinformatics (Oxford, England)》2001,17(8):721-728

MOTIVATION: Subcellular localization is a key functional characteristic of proteins. A fully automatic and reliable prediction system for protein subcellular localization is needed, especially for the analysis of large-scale genome sequences. RESULTS: In this paper, Support Vector Machine has been introduced to predict the subcellular localization of proteins from their amino acid compositions. The total prediction accuracies reach 91.4% for three subcellular locations in prokaryotic organisms and 79.4% for four locations in eukaryotic organisms. Predictions by our approach are robust to errors in the protein N-terminal sequences. This new approach provides superior prediction performance compared with existing algorithms based on amino acid composition and can be a complementary method to other existing methods based on sorting signals. AVAILABILITY: A web server implementing the prediction method is available at http://www.bioinfo.tsinghua.edu.cn/SubLoc/. SUPPLEMENTARY INFORMATION: Supplementary material is available at http://www.bioinfo.tsinghua.edu.cn/SubLoc/.
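The input representation described in this abstract, the amino acid composition of a protein, can be sketched in Python as a 20-dimensional frequency vector. This shows only the feature extraction step under simple assumptions; the SVM training itself is not shown, and the example sequence is made up.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def amino_acid_composition(seq):
    """Return the fraction of each standard amino acid in seq, in AMINO_ACIDS order."""
    residues = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    n = len(residues)
    return [residues.count(aa) / n if n else 0.0 for aa in AMINO_ACIDS]


# Hypothetical sequence, and the same sequence with its first five residues lost.
full = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVV"
truncated = full[5:]

comp_full = amino_acid_composition(full)
comp_trunc = amino_acid_composition(truncated)
largest_shift = max(abs(a - b) for a, b in zip(comp_full, comp_trunc))

print(f"largest change in any residue frequency: {largest_shift:.4f}")
```

Because the vector summarizes the whole sequence, losing or corrupting a few N-terminal residues shifts each frequency only slightly, which is consistent with the robustness to N-terminal errors that the abstract reports.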

18.

Genome-wide protein localization prediction strategies for gram negative bacteria

Romine MF 《BMC genomics》2011,12(Z1):S1


19.

MultiLoc2: integrating phylogeny and Gene Ontology terms improves subcellular protein localization prediction (total citations: 2; self-citations: 0; citations by others: 2)

Torsten Blum Sebastian Briesemeister Oliver Kohlbacher 《BMC bioinformatics》2009,10(1):274

Background

Knowledge of subcellular localization of proteins is crucial to proteomics, drug target discovery and systems biology since localization and biological function are highly correlated. In recent years, numerous computational prediction methods have been developed. Nevertheless, there is still a need for prediction methods that show more robustness and higher accuracy.

20.

PSORTb v.2.0: expanded prediction of bacterial protein subcellular localization and insights gained from comparative proteome analysis (total citations: 15; self-citations: 0; citations by others: 15)

Gardy JL Laird MR Chen F Rey S Walsh CJ Ester M Brinkman FS 《Bioinformatics (Oxford, England)》2005,21(5):617-623

MOTIVATION: PSORTb v.1.1 is the most precise bacterial localization prediction tool available. However, the program's predictive coverage and recall are low and the method is only applicable to Gram-negative bacteria. The goals of the present work are as follows: increase PSORTb's coverage while maintaining the existing precision level, expand it to include Gram-positive bacteria and then carry out a comparative analysis of localization. RESULTS: An expanded database of proteins of known localization and new modules using frequent subsequence-based support vector machines was introduced into PSORTb v.2.0. The program attains a precision of 96% for Gram-positive and Gram-negative bacteria and predictive coverage comparable to other tools for whole proteome analysis. We show that the proportion of proteins at each localization is remarkably consistent across species, even in species with varying proteome size. AVAILABILITY: Web-based version: http://www.psort.org/psortb. Standalone version: Available through the website under GNU General Public License. CONTACT: psort-mail@sfu.ca, brinkman@sfu.ca SUPPLEMENTARY INFORMATION: http://www.psort.org/psortb/supplementaryinfo.html.