Information-theoretical entropy as a measure of sequence variability.
Authors:P S Shenkin  B Erman  L D Mastrandrea
Affiliation:Department of Chemistry, Barnard College, New York, New York 10027.
Abstract:We propose the use of the information-theoretical entropy, $S = -\sum_i p_i \log_2 p_i$, as a measure of variability at a given position in a set of aligned sequences. Here $p_i$ is the fraction of times the i-th type appears at that position. For protein sequences, the sum has up to 20 terms; for nucleotide sequences, up to 4 terms; and for codon sequences, up to 61 terms. We compare S and Vs, a related measure, in detail with Vk, the traditional measure of immunoglobulin sequence variability, both in the abstract and as applied to the immunoglobulins. We conclude that S has desirable mathematical properties that Vk lacks and has intuitive and statistical meanings that accord well with the notion of variability. We find that Vk and the S-based measures are highly correlated for the immunoglobulins. We show by analysis of sequence data and by means of a mathematical model that this correlation is due to a strong tendency for the frequency of occurrence of amino acid types at a given position to be log-linear. It is not known whether the immunoglobulins are typical or atypical of protein families in this regard, nor is the origin of the observed rank-frequency distribution obvious, although we discuss several possible etiologies.
Keywords:information theory  entropy  variability  sequence comparison  immunoglobulins  antibodies
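
As an illustration of the entropy measure defined in the abstract, the following is a minimal Python sketch, not the authors' code. The function names `column_entropy` and `alignment_entropies` and the toy alignment are hypothetical, and the sketch assumes equal-length sequences with no special treatment of gap characters.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy, in bits, of the symbols seen at one aligned position."""
    counts = Counter(column)
    total = sum(counts.values())
    # p_i = n_i / total; p * log2(1/p) equals -p * log2(p).
    # Types that never occur at this position contribute nothing to the sum.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def alignment_entropies(sequences):
    """Entropy at every position of a list of equal-length aligned sequences."""
    # zip(*sequences) yields one tuple of symbols per alignment column.
    return [column_entropy(col) for col in zip(*sequences)]


# Hypothetical toy alignment (assumption: no gaps, all sequences the same length).
aligned = ["ACDA", "ACEA", "ACFC", "ACGC"]
print(alignment_entropies(aligned))  # [0.0, 0.0, 2.0, 1.0]
```

A fully conserved position gives S = 0, and S reaches its maximum when all types are equally frequent: log2(20) ≈ 4.32 bits for amino acids, 2 bits for nucleotides, and log2(61) ≈ 5.93 bits for codons.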