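Analysis of correlation of local GC level in human protein coding genes
Cite this article: Chen Xianggui, Hu Jun, Yang Xiao. Analysis of correlation of local GC level in human protein coding genes[J]. Hereditas, 2008, 30(9): 1169-1174. DOI: 10.3724/SP.J.1005.2008.01169
Authors: Chen Xianggui  Hu Jun  Yang Xiao
Affiliation: School of Bioengineering, Xihua University, Chengdu 610039
Funding: Sichuan Province Applied Basic Research Program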
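Abstract: GC content is an important feature of the base composition of genomic DNA sequences and carries information about gene structure, function, and evolution. In this study, the DNA sequences of 7,992 nonredundant human protein-coding genes were retrieved from public databases, and the local GC content of different gene regions and the correlations among them were analyzed. The results show that local GC content is heterogeneous within genes: the 5′ untranslated region has the highest GC level (62.56%), whereas the 3′ untranslated region has the lowest (43.97%). The GC content of the 3′ flanking sequence is a good proxy for the GC level of the long DNA segment in which the gene resides. Although the GC content of open reading frames is higher than that of introns, 3′ untranslated regions, and 3′ flanking sequences, the GC contents of these four regions are all highly correlated with one another. The mean GC content at the third codon position (GC3) is 58.09%, significantly higher than at the first and second codon positions, and it is highly correlated with the GC level of the open reading frame, with a correlation coefficient of 0.91. GC3 also correlates well with the GC levels of introns, 3′ untranslated regions, and 3′ flanking sequences; the slope of the linear regression of GC3 on the GC content of the 3′ flanking sequence is 1.25. GC3 can therefore serve as a sensitive indicator of changes in the GC level of the region in which a gene lies. By contrast, the GC levels of the first and second codon positions, the 5′ flanking sequence, and the 5′ untranslated region are only weakly correlated with the GC levels of other gene regions. These results suggest that the bases at the third codon position of protein-coding regions, in introns, in 3′ untranslated regions, and in 3′ flanking sequences may have undergone similar evolutionary processes, whereas the first and second codon positions, 5′ flanking sequences, and 5′ untranslated regions have undergone different mutation and selection because of functional requirements.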

Keywords: human protein coding genes  correlation  local GC content
Received: 2007-12-26
Revised: 2008-01-18
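
The abstract reports GC content for whole regions and for each codon position (GC3 being the third) but does not say how ambiguous bases or incomplete codons were handled. The following is a minimal sketch under stated assumptions: GC content is taken as the percentage of G and C among unambiguous bases, and the ORF is assumed to be in frame with a length divisible by three. The function names `gc_content` and `codon_position_gc` are illustrative and do not come from the paper.

```python
def gc_content(seq: str) -> float:
    """Return the percentage of G and C among unambiguous bases (A, C, G, T).

    Assumption: ambiguous symbols such as N are ignored rather than
    counted in the denominator.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / total


def codon_position_gc(orf: str) -> tuple:
    """Return (GC1, GC2, GC3) in percent for an in-frame ORF.

    Assumption: the ORF starts at codon position 1 and its length is a
    multiple of three; anything else is rejected instead of guessed at.
    """
    if len(orf) == 0 or len(orf) % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    # orf[i::3] collects every base that sits at codon position i + 1.
    return tuple(gc_content(orf[i::3]) for i in range(3))
```

Calling `codon_position_gc` on each ORF and `gc_content` on each intron, untranslated region, or flanking sequence would give one value per gene and region, which is the kind of table the correlation analysis in the abstract operates on.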

Analysis of correlation of local GC level in human protein coding genes
CHEN Xiang-Gui, HU Jun, YANG Xiao. Analysis of correlation of local GC level in human protein coding genes[J]. Hereditas, 2008, 30(9): 1169-1174. DOI: 10.3724/SP.J.1005.2008.01169
Authors: CHEN Xiang-Gui  HU Jun  YANG Xiao
Affiliation: School of Bioengineering, Xihua University, Chengdu 610039, China
Abstract: GC level is an important feature of genomic base composition and carries information about the structure, function, and evolution of genes. In this paper, the DNA sequences of 7,992 nonredundant human protein-coding genes were retrieved from public databases, and the local GC levels of different sequence regions and the correlations between them were analyzed. The results showed that GC levels differ markedly between sequence regions. The 5' untranslated regions were the most GC-rich, with an average GC content of 62.56%, and the 3' untranslated regions were the most GC-poor, with an average GC content of 43.97%. The GC content of the 3' flanking sequences closely matched the GC level of the large DNA fragments in which the genes are located. Although the GC content of open reading frames (ORFs) was higher than that of introns, 3' untranslated regions, and 3' flanking sequences, the GC contents of these four regions were highly correlated with one another. The average GC content at the third codon position (GC3) was 58.09%, significantly higher than at the first and second positions, and was highly correlated with the GC content of ORFs, with a correlation coefficient of 0.91. GC3 was also strongly correlated with the GC contents of introns, 3' untranslated regions, and 3' flanking sequences. Moreover, the linear regression of GC3 against the GC content of 3' flanking sequences yielded a slope of 1.25. GC3 is therefore a sensitive indicator of changes in GC level in the genomic region where a gene is located. In contrast, the GC levels of 5' flanking sequences, 5' untranslated regions, and the first and second codon positions were only weakly correlated with those of other regions. These results suggest that third codon positions, introns, 3' untranslated regions, and 3' flanking sequences may have evolved similarly, whereas first and second codon positions, 5' flanking sequences, and 5' untranslated regions have been subject to different mutation and selection because of their functional constraints.
Keywords: human protein coding genes  correlation  local GC content
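
The abstract summarizes relationships between regions with a correlation coefficient (0.91 between GC3 and ORF GC content) and a regression slope (1.25 for GC3 against 3' flanking GC content). It does not state which estimator was used, so the sketch below assumes the Pearson correlation coefficient and ordinary least squares with GC3 as the dependent variable. The names `pearson_r` and `regression_slope` are illustrative, not taken from the paper.

```python
from math import sqrt


def _centered_sums(x, y):
    """Return (Sxy, Sxx, Syy): sums of centered products and squares."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy, sxx, syy


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    sxy, sxx, syy = _centered_sums(x, y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def regression_slope(x, y):
    """Ordinary least-squares slope of y regressed on x.

    For the comparison in the abstract, x would be the 3' flanking GC
    content and y would be GC3 (an assumption about variable roles).
    """
    sxy, sxx, _ = _centered_sums(x, y)
    if sxx == 0:
        raise ValueError("slope is undefined when x is constant")
    return sxy / sxx
```

A slope above 1, as reported for GC3, means that GC3 changes by more than one percentage point for each percentage point of change in flanking GC content, which is why the abstract calls it a sensitive indicator.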