Evolutionary distances for protein-coding sequences: modeling site-specific residue frequencies |
| |
Authors: | Halpern, AL; Bruno, WJ |
| |
Affiliation: | Los Alamos National Laboratory, New Mexico, USA. ahalpern@ender.unm.edu |
| |
Abstract: | Estimation of evolutionary distances from coding sequences must take into account protein-level selection to avoid relative underestimation of longer evolutionary distances. Current modeling of selection via site-to-site rate heterogeneity generally neglects another aspect of selection, namely position-specific amino acid frequencies. These frequencies determine the maximum dissimilarity expected for highly diverged but functionally and structurally conserved sequences, and hence are crucial for estimating long distances. We introduce a codon-level model of coding sequence evolution in which position-specific amino acid frequencies are free parameters. In our implementation, these are estimated from an alignment using methods described previously. We use simulations to demonstrate the importance and feasibility of modeling such behavior; our model produces linear distance estimates over a wide range of distances, while several alternative models underestimate long distances relative to short distances. Site-to-site differences in rates, as well as synonymous/nonsynonymous and first/second/third-codon-position differences, arise as a natural consequence of the site-to-site differences in amino acid frequencies. |
| |
Keywords: | |
This article is indexed in Oxford and other databases. |
|