(1) Bioinformatics Group, Centre for Infectious Disease, Institute of Cell and Molecular Science, Queen Mary's School of Medicine and Dentistry, University of London, 32 Newark St, London E1 2AA, UK
Abstract:
Background
Annotation of sequences that share little similarity to sequences of known function remains a major obstacle in genome annotation. Some of the best methods of detecting remote relationships between protein sequences are based on matching sequence profiles. We analyse the superfamily-specific performance of sequence profile-profile matching. Our benchmark consists of a set of 16 protein superfamilies that are highly diverse at the sequence level. We relate the performance to the number of sequences in the profiles, the profile diversity and the extent of structural conservation in the superfamily.