Protein sequences classification by means of feature extraction with substitution matrices |
| |
Authors: | Rabie Saidi; Engelbert Mephu Nguifo |
| |
Institution: | 1. LIMOS - Blaise Pascal University, Clermont University, Clermont-Ferrand, France; 2. LIMOS, CNRS UMR, Aubière, France; 3. Department of Computer Science - FSJ, University of Jendouba, Jendouba, Tunisia; 4. URPAH - FST, University of Tunis El Manar, Tunis, Tunisia; 5. Department of Computer Science - FSG, University of Gafsa, Gafsa, Tunisia |
| |
Abstract: | Background: This paper deals with the preprocessing of protein sequences for supervised classification. Motif extraction is one way to
address that task. It has been widely used to encode biological sequences into feature vectors, so that well-known
machine-learning classifiers, which require this format, can be applied. However, designing a suitable feature space for a set of proteins
is not a trivial task. To this end, we propose a novel encoding method that uses amino-acid substitution matrices to
define similarity between motifs during the extraction step. |
| |
Keywords: | |
This article is indexed by SpringerLink and other databases. |
|
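The general idea in the abstract, encoding sequences as feature vectors over motifs that are grouped by substitution-matrix similarity, can be illustrated with a minimal sketch. This is not the authors' algorithm: the greedy grouping rule, the function names (`build_feature_space`, `encode`), the fixed motif length `k`, the `threshold` parameter, and above all the `TOY_SCORES` values are assumptions made for illustration. A real application would presumably use an established matrix such as BLOSUM or PAM.

```python
# Toy substitution scores: illustrative values only, NOT a real BLOSUM/PAM matrix.
TOY_SCORES = {
    ("A", "A"): 4, ("G", "G"): 4, ("L", "L"): 4, ("I", "I"): 4,
    ("A", "G"): 1, ("L", "I"): 2,
}


def score(a, b):
    """Symmetric lookup; pairs not listed get an assumed penalty of -2."""
    return TOY_SCORES.get((a, b), TOY_SCORES.get((b, a), -2))


def motif_similarity(m1, m2):
    """Sum of position-wise substitution scores for two equal-length motifs."""
    return sum(score(a, b) for a, b in zip(m1, m2))


def extract_kmers(seq, k):
    """All overlapping substrings of length k, in order of appearance."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_feature_space(sequences, k, threshold):
    """Greedily keep a motif as a new feature unless it is similar enough
    (score >= threshold) to a motif that was already kept."""
    representatives = []
    for seq in sequences:
        for motif in extract_kmers(seq, k):
            if not any(motif_similarity(motif, r) >= threshold
                       for r in representatives):
                representatives.append(motif)
    return representatives


def encode(seq, representatives, k, threshold):
    """Count each k-mer of seq under the first representative it resembles."""
    vector = [0] * len(representatives)
    for motif in extract_kmers(seq, k):
        for j, rep in enumerate(representatives):
            if motif_similarity(motif, rep) >= threshold:
                vector[j] += 1
                break
    return vector


features = build_feature_space(["ALA", "GIG"], k=2, threshold=3)
vector = encode("GIG", features, k=2, threshold=3)
```

With these toy scores, the motifs `GI` and `IG` are absorbed by `AL` and `LA` respectively, so the feature space has two motifs rather than four. The resulting fixed-length vectors are in the format that standard classifiers expect.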