mPLR-Loc: An adaptive decision multi-label classifier based on penalized logistic regression for protein subcellular localization prediction |
| |
Affiliation: | 1. Immunogenetic Laboratory, Cisanello Hospital, via Paradisa 2, 56124 Pisa, Italy; 2. Department of Physiological Science, University of Pisa, Pisa, Italy; 1. Quantitative Model Organism Proteomics (Q-MOP), Institute of Molecular Life Sciences, University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland; 2. Institute of Molecular Systems Biology, ETH Zurich, Auguste-Piccard-Hof 1, 8093 Zurich, Switzerland; 3. Biozentrum, University of Basel, Klingelbergstrasse 50/70, 4056 Basel, Switzerland |
| |
Abstract: | Proteins must be located in the appropriate cellular compartments to carry out their biological functions. Computational prediction of protein subcellular localization is therefore required in the post-genomic era. Recent studies have focused on predicting not only single-location proteins but also multi-location proteins. However, most existing predictors remain far from effective on multi-label proteins. This article proposes an efficient multi-label predictor, mPLR-Loc, based on penalized logistic regression and adaptive decisions, for predicting both single- and multi-location proteins. Specifically, for each query protein, mPLR-Loc retrieves information from the Gene Ontology (GO) database using the protein's accession number (AC) or the ACs of its homologs obtained via BLAST. The frequencies of GO term occurrences are used to construct feature vectors, which are then classified by an adaptive-decision multi-label penalized logistic regression classifier. Experimental results on two recent stringent benchmark datasets (virus and plant) show that mPLR-Loc remarkably outperforms existing state-of-the-art multi-label predictors. Besides predicting the subcellular localization of single- and multi-label proteins rapidly and accurately, mPLR-Loc also provides probabilistic confidence scores for its prediction decisions. For readers' convenience, the mPLR-Loc server is available online (http://bioinfo.eie.polyu.edu.hk/mPLRLocServer). |
| |
Keywords: | Protein subcellular localization; Multi-location proteins; Adaptive decision; Logistic regression; Multi-label classification |
This article is indexed in ScienceDirect and other databases. |
|
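The pipeline summarized in the abstract (GO term frequencies as features, one penalized logistic regression per location, and an adaptive decision over the resulting probabilities) can be illustrated with a minimal Python sketch. Everything specific here is an assumption rather than the authors' implementation: the function names, the toy training proteins, the choice of an L2 penalty fitted by plain gradient descent, and the decision rule, which keeps every location whose probability is at least `theta` times the highest probability. The abstract does not state the actual penalty or decision rule used by mPLR-Loc.

```python
import math

# Hypothetical two-term GO vocabulary (nucleus, cytoplasm) for illustration only.
VOCABULARY = ["GO:0005634", "GO:0005737"]

def go_frequency_vector(go_terms, vocabulary):
    """Count how often each vocabulary GO term occurs among a protein's GO terms."""
    return [float(go_terms.count(term)) for term in vocabulary]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_penalized_lr(X, y, penalty=0.1, lr=0.5, epochs=500):
    """Fit one binary logistic regression with an (assumed) L2 penalty on the weights."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [penalty * wj for wj in w]  # gradient of the L2 penalty term
        grad_b = 0.0                         # the bias is left unpenalized
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict_locations(x, models, theta=0.5):
    """Assumed adaptive decision: keep labels scoring at least theta * top score."""
    scores = {
        label: sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
        for label, (w, b) in models.items()
    }
    cutoff = theta * max(scores.values())
    return {label for label, s in scores.items() if s >= cutoff}, scores

# Toy training set (invented): GO terms per protein and its known locations.
training = [
    (["GO:0005634", "GO:0005634"], {"nucleus"}),
    (["GO:0005737", "GO:0005737"], {"cytoplasm"}),
    (["GO:0005634", "GO:0005737"], {"nucleus", "cytoplasm"}),
    (["GO:0005634"] * 3, {"nucleus"}),
    (["GO:0005737"] * 3, {"cytoplasm"}),
]
X = [go_frequency_vector(terms, VOCABULARY) for terms, _ in training]

# One-vs-rest: train an independent binary classifier for each location.
models = {
    label: train_penalized_lr(X, [1.0 if label in locs else 0.0 for _, locs in training])
    for label in ("nucleus", "cytoplasm")
}

query = go_frequency_vector(["GO:0005634", "GO:0005737"], VOCABULARY)
predicted, confidence = predict_locations(query, models)
print(sorted(predicted), confidence)
```

Because the cutoff scales with the top score, the rule always returns at least one location and adds further locations only when their probabilities are comparable to the best one. The per-location probabilities play the role of the confidence scores mentioned in the abstract.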