GFam: a platform for automatic annotation of gene families |
| |
Authors: | Rajkumar Sasidharan, Tamás Nepusz, David Swarbreck, Eva Huala, Alberto Paccanaro |
| |
Affiliation: | 1. Department of Molecular, Cell and Developmental Biology, University of California at Los Angeles, Los Angeles, CA 90095, USA; 2. Department of Plant Biology, Carnegie Institution for Science, Stanford, CA 94305, USA; 3. Department of Computer Science, Center for Systems and Synthetic Biology, Royal Holloway University of London, Egham Hill, Egham, Surrey, TW20 0EX, UK; and 4. Bioinformatics, The Genome Analysis Centre, Norwich Research Park, Colney, Norwich, NR4 7UH, Norfolk, UK |
| |
Abstract: | We have developed GFam, a platform for automatic annotation of gene/protein families. GFam provides a framework for genome initiatives and model organism resources to build domain-based families and derive meaningful functional labels, and it offers a seamless approach to propagating functional annotation across periodic genome updates. GFam is a hybrid approach: it uses a greedy algorithm to chain component domains from InterPro annotation provided by its 12 member resources, followed by a sequence-based connected component analysis of unannotated sequence regions, to derive a consensus domain architecture for each sequence and subsequently generate families based on common architectures. For the proteome of Arabidopsis, our integrated approach increases sequence coverage by 7.2 percentage points and residue coverage by 14.6 percentage points relative to the best single constituent database within InterPro. The true power of GFam lies in maximizing the annotation provided by the different InterPro data sources, which offer resource-specific coverage for different regions of a sequence. GFam's capability to capture higher sequence and residue coverage can be useful for genome annotation, comparative genomics and functional studies. GFam is general-purpose software and can be used for any collection of protein sequences. The software is open source and can be obtained from http://www.paccanarolab.org/software/gfam/. |
| |
Keywords: | |
|
|
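The abstract describes chaining component domains greedily into a per-sequence architecture and then grouping sequences that share an architecture. The Python sketch below illustrates that general idea only; it is not GFam's implementation. The longest-first selection rule, the `max_overlap` parameter, the `Domain` type, and all sequence and domain identifiers are assumptions made for illustration, and the connected component analysis of unannotated regions is omitted.

```python
from collections import defaultdict
from typing import NamedTuple


class Domain(NamedTuple):
    """A domain hit on a sequence (hypothetical record layout)."""
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def overlap(a: Domain, b: Domain) -> int:
    """Number of residues shared by two domain hits."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def greedy_architecture(domains, max_overlap=0):
    """Greedily keep the longest hits that do not overlap already chosen ones.

    Returns the chosen domain names ordered by position on the sequence.
    The longest-first rule is an assumption, not GFam's documented criterion.
    """
    chosen = []
    by_length = sorted(domains, key=lambda d: (-(d.end - d.start + 1), d.start, d.name))
    for d in by_length:
        if all(overlap(d, c) <= max_overlap for c in chosen):
            chosen.append(d)
    return tuple(d.name for d in sorted(chosen, key=lambda d: d.start))


def group_by_architecture(annotations):
    """Map each architecture (tuple of domain names) to the sequences that have it."""
    families = defaultdict(list)
    for seq_id, domains in annotations.items():
        families[greedy_architecture(domains)].append(seq_id)
    return dict(families)


# Made-up annotations: domA and domB compete for the same region of protA.
annotations = {
    "protA": [Domain("domA", 10, 260), Domain("domB", 12, 255), Domain("domC", 300, 322)],
    "protB": [Domain("domA", 5, 250), Domain("domC", 280, 302)],
    "protC": [Domain("domC", 1, 23)],
}

families = group_by_architecture(annotations)
for architecture, members in families.items():
    print(" + ".join(architecture), "->", ", ".join(members))
```

In this toy input, protA and protB end up in the same family because the shorter, overlapping domB hit is discarded in favour of domA, while protC forms a family of its own.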