Automated Alphabet Reduction for Protein Datasets |
| |
Authors: | Jaume Bacardit, Michael Stout, Jonathan D Hirst, Alfonso Valencia, Robert E Smith and Natalio Krasnogor |
| |
Institution: | (1) ASAP research group, School of Computer Science, University of Nottingham, Jubilee Campus, Wollaton Road, Nottingham, NG8 1BB, UK; (2) MYCIB, School of Biosciences, University of Nottingham, Sutton Bonington, LE12 5RD, UK; (3) School of Chemistry, University of Nottingham, University Park, Nottingham, NG7 2RD, UK; (4) Spanish National Cancer Research Centre, Melchor Fdez Almagro, 3, 28029 Madrid, Spain; (5) Dept. of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK |
| |
Abstract: | Background: We investigate automated and generic alphabet reduction techniques for protein structure prediction datasets. Reducing alphabet
cardinality without losing key biochemical information opens the door to potentially faster machine learning, data mining
and optimization applications in structural bioinformatics. Furthermore, reduced but informative alphabets often yield,
for example, more compact and human-readable classification and clustering rules. In this paper we propose a robust and sophisticated
alphabet reduction protocol based on mutual information and state-of-the-art optimization techniques. |
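To make the idea of a mutual-information-driven alphabet reduction concrete, the following is a minimal sketch, not the authors' protocol. It assumes that a reduced alphabet is a mapping from each of the 20 amino acids to one of `n_groups` group labels, and that a candidate alphabet is scored by the mutual information between the reduced residue symbol and a per-residue target class (for example, a discretized structural property). The function names (`mutual_information`, `fitness`, `random_search`) and the plain random-search optimizer are hypothetical stand-ins; the abstract does not specify the optimization technique, the target, or any residue windowing used in the paper.

```python
import math
import random
from collections import Counter

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits between two paired symbol sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        mi += pxy * math.log2(pxy / ((px[x] / n) * (py[y] / n)))
    return mi


def reduce_sequence(residues, mapping):
    """Rewrite residues in the reduced alphabet defined by `mapping` (amino acid -> group)."""
    return [mapping[r] for r in residues]


def fitness(mapping, residues, labels):
    """Hypothetical score: how much the reduced symbols still tell us about the target labels."""
    return mutual_information(reduce_sequence(residues, mapping), labels)


def random_search(residues, labels, n_groups, iterations=200, seed=0):
    """Hypothetical optimizer: sample random groupings and keep the most informative one.

    Stands in for the (unspecified here) optimization technique used in the paper.
    """
    rng = random.Random(seed)
    best_map, best_fit = None, -1.0
    for _ in range(iterations):
        mapping = {aa: rng.randrange(n_groups) for aa in AMINO_ACIDS}
        score = fitness(mapping, residues, labels)
        if score > best_fit:
            best_map, best_fit = mapping, score
    return best_map, best_fit
```

As a usage example on made-up toy data, label each residue 1 if it belongs to an assumed hydrophobic set and 0 otherwise; a two-letter alphabet that separates exactly that set retains all of the information the full 20-letter alphabet carries about the label. Because a reduced alphabet is a function of the original residue, its score can never exceed the mutual information of the full alphabet, which is why the search looks for the smallest grouping that loses as little of it as possible.

```python
hydrophobic = set("AILMFVW")  # assumed toy grouping, for illustration only
residues = list(AMINO_ACIDS) * 5
labels = [1 if r in hydrophobic else 0 for r in residues]

two_letter = {aa: (1 if aa in hydrophobic else 0) for aa in AMINO_ACIDS}
print(fitness(two_letter, residues, labels), mutual_information(residues, labels))
best_map, best_fit = random_search(residues, labels, n_groups=2)
```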
| |
Keywords: | |
This article is indexed in SpringerLink and other databases. |
|