Graphic analysis of codon usage strategy in 1490 human proteins
Authors: Chun-Ting Zhang and Kuo-Chen Chou
Institution: (1) Computational Chemistry, Upjohn Research Laboratories, Kalamazoo, Michigan 49001
Abstract: The frequencies of bases A (adenine), C (cytosine), G (guanine), and T (thymine) at codon position $i$, denoted by $a_i$, $c_i$, $g_i$, and $t_i$, respectively ($i = 1, 2, 3$), have been calculated and plotted for the 1490 human proteins in a recently compiled codon usage table for primate genes. Based on the characteristic graphs thus obtained, an overall picture of codon base distribution is provided and its biological implications are discussed. At the first codon position, G is the dominant base in most cases, and the relationship $g_1 > a_1 > c_1 > t_1$ generally holds. At the second codon position, A is generally the dominant base and G the least frequent one, with the relationship $a_2 > t_2 > c_2 > g_2$. At the third codon position, $g_3 + c_3$ varies from 0.27 to 1, and the relationship $c_3 > g_3 > a_3 = t_3$ roughly holds in the majority of cases. Interestingly, if the average frequencies of bases A, C, G, and T are defined as 
$$\bar a = \frac{a_1 + a_2 + a_3}{3}, \quad \bar c = \frac{c_1 + c_2 + c_3}{3}, \quad \bar g = \frac{g_1 + g_2 + g_3}{3}, \quad \text{and} \quad \bar t = \frac{t_1 + t_2 + t_3}{3}$$
, respectively, we find that 
$$\bar a^2  + \bar c^2  +  \bar g^2  +  \bar t^2< \tfrac{1}{3}$$
is valid almost without exception. Such a characteristic inequality might reflect some inherent rule of codon usage, although its biological implication is unclear. An important advantage of introducing graphic methods is that they make it possible to grasp the essential features of a huge amount of data by direct, intuitive examination. The method used here allows one to see means and variances and to spot outliers. This is particularly useful for finding and classifying similarity patterns and relationships in data sets of long sequences, such as DNA coding sequences. The method also holds great potential for studying molecular evolution from the viewpoint of the genetic code, as such data have been accumulating rapidly and are expected to continue growing at a much faster pace.

On sabbatical leave from the Department of Physics, Tianjin University, Tianjin, China.
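A short derivation, added as a reading aid rather than taken from the paper, shows what range the characteristic sum can take. Because $a_i + c_i + g_i + t_i = 1$ at each codon position, the averages also satisfy $\bar a + \bar c + \bar g + \bar t = 1$, and completing the square gives

$$\bar a^2 + \bar c^2 + \bar g^2 + \bar t^2 = \tfrac{1}{4} + \left(\bar a - \tfrac{1}{4}\right)^2 + \left(\bar c - \tfrac{1}{4}\right)^2 + \left(\bar g - \tfrac{1}{4}\right)^2 + \left(\bar t - \tfrac{1}{4}\right)^2 .$$

The sum therefore lies between $\tfrac{1}{4}$ (all four bases equally frequent on average) and 1 (a single base only). One way to read the inequality $\bar a^2 + \bar c^2 + \bar g^2 + \bar t^2 < \tfrac{1}{3}$ is as the equivalent condition

$$\left(\bar a - \tfrac{1}{4}\right)^2 + \left(\bar c - \tfrac{1}{4}\right)^2 + \left(\bar g - \tfrac{1}{4}\right)^2 + \left(\bar t - \tfrac{1}{4}\right)^2 < \tfrac{1}{12},$$

that is, the averaged base composition stays within a Euclidean distance of $1/\sqrt{12} \approx 0.29$ of the uniform composition.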
Keywords: Codon position; DNA bases; characteristic inequality; mapping point
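A minimal Python sketch of the per-protein quantities described in the abstract, assuming each protein is given as a mapping from codon to count (one row of a codon usage table). The function names `positional_frequencies`, `base_order`, `average_frequencies`, and `characteristic_sum` are hypothetical, and the toy counts below are made up for illustration; they are not data from the 1490-protein table. The plotting step of the graphic method is not reproduced here.

```python
BASES = "ACGT"


def positional_frequencies(codon_counts: dict[str, int]) -> dict[str, list[float]]:
    """Return {base: [f1, f2, f3]}, the frequency of each base at codon positions 1-3.

    `codon_counts` maps a codon (e.g. "GCC") to how often it occurs in one protein's
    coding sequence. U is treated as T so RNA-style tables are accepted (an assumption).
    """
    counts = [dict.fromkeys(BASES, 0) for _ in range(3)]
    total = 0
    for codon, n in codon_counts.items():
        codon = codon.upper().replace("U", "T")
        if len(codon) != 3 or any(base not in BASES for base in codon):
            raise ValueError(f"invalid codon: {codon!r}")
        for position, base in enumerate(codon):
            counts[position][base] += n
        total += n
    if total == 0:
        raise ValueError("codon table is empty")
    # Every codon contributes exactly one base per position, so each position sums to 1.
    return {base: [counts[p][base] / total for p in range(3)] for base in BASES}


def base_order(freqs: dict[str, list[float]], position: int) -> str:
    """Bases from most to least frequent at codon position 1, 2, or 3."""
    return "".join(sorted(BASES, key=lambda b: freqs[b][position - 1], reverse=True))


def average_frequencies(freqs: dict[str, list[float]]) -> dict[str, float]:
    """Average over the three positions, e.g. a_bar = (a_1 + a_2 + a_3) / 3."""
    return {base: sum(freqs[base]) / 3 for base in BASES}


def characteristic_sum(avg: dict[str, float]) -> float:
    """a_bar^2 + c_bar^2 + g_bar^2 + t_bar^2, the quantity compared against 1/3."""
    return sum(value ** 2 for value in avg.values())


if __name__ == "__main__":
    # Toy counts for illustration only; not taken from the primate codon usage table.
    toy = {"GCC": 5, "AAG": 3, "CTG": 4, "GAT": 2, "TAC": 1}
    freqs = positional_frequencies(toy)
    for position in (1, 2, 3):
        print(position, base_order(freqs, position))
    s = characteristic_sum(average_frequencies(freqs))
    print(f"sum of squares = {s:.4f}, below 1/3: {s < 1 / 3}")
```

Frequencies are normalized by the total codon count, so the four frequencies at each position sum to 1; that normalization is what makes the averages $\bar a, \bar c, \bar g, \bar t$ sum to 1 as well. Computing one such set of values per protein gives the per-protein data that a graphic method could then plot across the whole table.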
This article is indexed in SpringerLink and other databases.