Estimating the fraction of invariable codons with a capture-recapture method |
| |
Authors: | Arend Sidow, Trang Nguyen, Terence P. Speed |
| |
Institution: | (1) Department of Molecular and Cell Biology, University of California at Berkeley, 401 Barker Hall, 94720 Berkeley, CA, USA; (2) Department of Statistics, University of California at Berkeley, 367 Evans Hall, 94720 Berkeley, CA, USA |
| |
Abstract: | A codon-based approach to estimating the number of variable sites in a protein is presented. If the first and second codon positions are assumed to be replacement positions, a capture-recapture model can be used to estimate the number of variable codons from each pair of aligned homologous sequences. The capture-recapture estimate is compared with a maximum likelihood estimate of the number of variable codons and with previous approaches that estimate the number of variable sites (not codons) in a sequence. Computer simulations show the circumstances under which the capture-recapture estimate can be used to correct biases in distance matrices. Analysis of published sequences of two genes, calmodulin and serum albumin, shows that distance corrections based on a capture-recapture estimate of the number of variable sites can differ considerably from corrections that assume the number of variable sites equals the total number of positions in the sequence. |
Offprint requests to: | A. Sidow |
| |
Keywords: | Protein-coding sequences; DNA sequences; Evolution; Evolutionary rates; Rate heterogeneity; Maximum likelihood; Statistical testing |
This article is indexed in SpringerLink and other databases. |
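The abstract does not spell out the estimator itself. One common capture-recapture formulation, assumed here and not taken from the paper, treats the codons that differ at the first position between two aligned sequences as the first "capture" (n1), the codons that differ at the second position as the second capture (n2), and the codons that differ at both positions as "recaptures" (m). The Lincoln-Petersen estimate of the number of variable codons is then n1 * n2 / m. The minimal sketch below uses hypothetical function names, ignores multiple substitutions, and assumes that changes at the two positions are independent within variable codons. It also includes Chapman's bias-corrected variant, which remains defined when m = 0. The paper's exact estimator may differ.

```python
def count_codon_differences(seq_a: str, seq_b: str) -> tuple[int, int, int]:
    """Hypothetical helper: for two aligned coding sequences, count codons that
    differ at position 1 (n1), at position 2 (n2), and at both positions (m)."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned and a whole number of codons long")
    n1 = n2 = m = 0
    for i in range(0, len(seq_a), 3):
        d1 = seq_a[i] != seq_b[i]          # first codon position differs
        d2 = seq_a[i + 1] != seq_b[i + 1]  # second codon position differs
        n1 += d1
        n2 += d2
        m += d1 and d2                     # "recaptured": differs at both
    return n1, n2, m


def lincoln_petersen(n1: int, n2: int, m: int) -> float:
    """Classic two-sample capture-recapture estimate of population size,
    here read as the number of variable codons (an assumption)."""
    if m == 0:
        raise ValueError("no codon differs at both positions; estimate is undefined")
    return n1 * n2 / m


def chapman(n1: int, n2: int, m: int) -> float:
    """Chapman's bias-corrected variant of the Lincoln-Petersen estimator."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


# Toy example: 4 codons; codon 1 differs at position 1, codons 2 and 3 at
# position 2, codon 4 at both positions.
n1, n2, m = count_codon_differences("ATGGCTAAACCC", "GTGGATACATTC")
print(n1, n2, m)                   # 2 3 1
print(lincoln_petersen(n1, n2, m))  # 6.0
print(chapman(n1, n2, m))          # 5.0
```

Here only 4 codons differ between the two sequences, but the estimate (6 codons) exceeds that count because some variable codons are expected not to have changed at either position in this particular comparison. Under this reading, the estimated number of variable codons, rather than the full sequence length, would serve as the denominator in a distance correction.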