Abstract: | The complete nucleotide sequence and exon/intron structure of the rat embryonic skeletal muscle myosin heavy chain (MHC) gene have been determined. The gene spans 24 × 10³ bases of DNA and is split into 41 exons. The exons encode a 6035-nucleotide (nt) mRNA consisting of 90 nt of 5' untranslated sequence, 5820 nt of protein-coding sequence and 125 nt of 3' untranslated sequence. The rat embryonic MHC polypeptide is encoded by exons 3 to 41 and contains 1939 amino acid residues, with a calculated Mr of 223,900. Its amino acid sequence displays the structural features typical of all sarcomeric MHCs, i.e. an amino-terminal "globular" head region and a carboxy-terminal alpha-helical rod portion that has the characteristics of a coiled coil with a superimposed 28-residue repeat pattern, interrupted at only four positions by "skip" residues. The complex structure of the rat embryonic MHC gene and the conservation of intron positions between this and other MHC genes indicate a highly split ancestral sarcomeric MHC gene. Introns in the rat embryonic gene interrupt the coding sequence at the boundaries between the proteolytic subfragments of the head, but not at the head/rod junction or between the 28-residue repeats within the rod. There is therefore little evidence for exon shuffling and intron-dependent evolution by gene duplication as the mechanism that generated the ancestral MHC gene. Rather, the observed structure of the rod-encoding portion of present-day MHC genes must be accounted for either by intron insertion into a previously non-split ancestral MHC rod gene consisting of multiple tandemly arranged 28-residue-encoding repeats, or by convergent evolution of an originally non-repetitive ancestral MHC rod gene. |
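The reported lengths are internally consistent, and a short check makes the bookkeeping explicit. The sketch below sums the three mRNA segments, converts the 5820-nt coding sequence into codons, and derives the mean residue mass implied by the stated Mr. It assumes, as an inference from the numbers rather than a statement in the abstract, that the coding length includes the stop codon (1940 codons for 1939 residues). The helper names are illustrative only.

```python
# Segment lengths (nt) and protein figures as reported in the abstract.
UTR5_NT = 90       # 5' untranslated sequence
CDS_NT = 5820      # protein-coding sequence (assumed here to include the stop codon)
UTR3_NT = 125      # 3' untranslated sequence
RESIDUES = 1939    # amino acid residues in the polypeptide
MR = 223_900       # calculated relative molecular mass


def mrna_length(utr5: int, cds: int, utr3: int) -> int:
    """Total mRNA length as the sum of the three reported segments."""
    return utr5 + cds + utr3


def codons(cds: int) -> int:
    """Number of codons in a coding sequence, which must be a multiple of 3."""
    if cds % 3 != 0:
        raise ValueError("coding length is not a multiple of 3")
    return cds // 3


def mean_residue_mass(mr: float, residues: int) -> float:
    """Average mass per residue implied by the total Mr (in daltons)."""
    return mr / residues


if __name__ == "__main__":
    n_codons = codons(CDS_NT)
    print(mrna_length(UTR5_NT, CDS_NT, UTR3_NT))   # 6035
    print(n_codons, n_codons - 1)                  # 1940 1939 (sense codons = residues)
    print(round(mean_residue_mass(MR, RESIDUES), 1))  # 115.5
```

The segment sum reproduces the 6035-nt total, and subtracting one stop codon from the 1940 codons gives exactly the 1939 residues stated.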