Regional Context in the Alignment of Biological Sequence Pairs |
| |
Authors: | Raymond Sammut, Gavin Huttley |
| |
Institution: | (1) Department of Genome Biology, John Curtin School of Medical Research, Building 54, The Australian National University, Canberra, ACT, 0200, Australia |
| |
Abstract: | Sequence divergence derives from either point substitution or indel (insertion or deletion) processes. We investigated the
rates of these two processes in both protein-coding and non-protein-coding DNA. We aligned sequence pairs using two pair-hidden Markov
models (PHMMs) conjoined by one silent state. Each PHMM had its own set of parameters to model the rates in its respective
region. The aim was to test the hypothesis that the indel mutation rate mimics the point mutation rate. That is, indels should be
found less often in conserved regions (slow point substitution rate) and more often in non-conserved regions (fast point substitution
rate). Both polypeptides and rRNA molecules in our data exhibited a clear distinction between slow and fast rates of the two
processes. The slow and fast rates served as surrogates for conserved and non-conserved secondary structure components, respectively.
With polypeptides, we found that both the fast indel rate and the fast replacement rate were co-located with hydrophilic residues.
We also found that the average concordance of our alignments with the corresponding curated alignments improved markedly when
the model allowed either of the two fast rates to co-locate with hydrophilic residues. With rRNA molecules, our model did not
detect co-location between the fast indel rate and the fast substitution rate. Nevertheless, coupling the indel rates with
the point substitution rates across the two regions markedly increased model fit. This result suggests that rRNA pairwise
alignments should be modeled by allowing the two processes to vary simultaneously and independently in the two regions. |
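The model topology described in the abstract (two PHMMs joined by one silent state, each with its own parameters) can be pictured with a minimal sketch. Everything in it is an illustrative assumption rather than the authors' implementation: the state names, the `gap_open`, `gap_extend` and `leave` parameters, the choice to enter each region through its match state, and all numeric values are hypothetical.

```python
# Hypothetical two-region pair-HMM transition table.
# Each region has a match state (M) and two gap states (X, Y);
# a single silent state S connects the slow and fast regions.
REGIONS = ("slow", "fast")

# Illustrative values only; none are taken from the paper.
EXAMPLE_PARAMS = {
    "slow": {"gap_open": 0.01, "gap_extend": 0.3, "leave": 0.05},
    "fast": {"gap_open": 0.08, "gap_extend": 0.5, "leave": 0.05},
}


def build_transitions(params, p_enter_slow=0.5):
    """Return a dict of dicts: T[source][target] = transition probability."""
    # The silent state emits nothing; it only routes into one region.
    T = {"S": {"M_slow": p_enter_slow, "M_fast": 1.0 - p_enter_slow}}
    for region in REGIONS:
        p = params[region]
        stay = 1.0 - p["leave"]  # probability of remaining in this region
        m, x, y = f"M_{region}", f"X_{region}", f"Y_{region}"
        T[m] = {
            m: stay * (1.0 - 2.0 * p["gap_open"]),
            x: stay * p["gap_open"],
            y: stay * p["gap_open"],
            "S": p["leave"],
        }
        # Gap states either extend the gap, return to match, or leave.
        T[x] = {
            m: stay * (1.0 - p["gap_extend"]),
            x: stay * p["gap_extend"],
            "S": p["leave"],
        }
        T[y] = {
            m: stay * (1.0 - p["gap_extend"]),
            y: stay * p["gap_extend"],
            "S": p["leave"],
        }
    return T


T = build_transitions(EXAMPLE_PARAMS)
```

Because each region carries its own gap parameters, the indel rate can differ between the slow and fast regions, which is the kind of variation the hypothesis in the abstract concerns. Substitution (emission) parameters are omitted from the sketch and would be region-specific in the same way.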
| |
This article has been indexed by PubMed, SpringerLink, and other databases. |
|