(1) Swiss Institute of Bioinformatics, Computational Cancer Genomics Group – ISREC, Ch. des Boveresses 155, 1066 Epalinges, Switzerland; (2) Swiss Institute of Bioinformatics, Vital IT Group, BEP-UNIL, 1015 Lausanne, Switzerland
Abstract:
Background
Whole-genome sequencing projects are rapidly producing an enormous number of new sequences. Consequently, almost every protein family now contains hundreds of members. It has thus become necessary to develop tools that classify protein sequences automatically, quickly and reliably. The difficulty of this task is intimately linked to the mechanisms by which protein sequences diverge, i.e. simultaneous residue substitutions, insertions and/or deletions, and whole-domain reorganisations (duplication, swapping, fusion).