Abstract: | A growing number of solved protein structures display an elongated structural domain, denoted here as alpha-rod, composed of stacked pairs of anti-parallel alpha-helices. Alpha-rods are flexible and expose a large surface, which makes them suitable for protein interaction. Although most likely originating by tandem duplication of a two-helix unit, their detection using sequence similarity between repeats is poor. Here, we show that alpha-rod repeats can be detected using a neural network. The network detects more repeats than are identified by domain databases using multiple profiles, with a low level of false positives (<10%). We identify alpha-rod repeats in approximately 0.4% of proteins in eukaryotic genomes. We then investigate the results for all human proteins, identifying alpha-rod repeats for the first time in six protein families, including proteins STAG1-3, SERAC1, and PSMD1-2 & 5. We also characterize a short version of these repeats in eight protein families of Archaeal, Bacterial, and Fungal species. Finally, we demonstrate the utility of these predictions in directing experimental work to demarcate three alpha-rods in huntingtin, a protein mutated in Huntington's disease. Using yeast two-hybrid analysis and an immunoprecipitation technique, we show that the huntingtin fragments containing alpha-rods associate with each other. This is the first definition of domains in huntingtin and the first validation of predicted interactions between fragments of huntingtin, which sets up directions toward functional characterization of this protein. An implementation of the repeat detection algorithm is available as a Web server with a simple graphical output: http://www.ogic.ca/projects/ard. This can be further visualized using BiasViz, a graphic tool for representation of multiple sequence alignments. |