In Silico Cloning of C17orf32, a Novel Human Gene, and Verification of Its Coding Region by RT-PCR

Fund Project:

This work was supported by a grant from the China Postdoctoral Science Foundation (2920011217608062000).

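    Summary:

    Using a workflow that combines bioinformatics with experimental verification, the cDNA of the novel human gene C17orf32 was successfully cloned (GenBank accession numbers: AY074907 and TPA: BK000260). The complete open reading frame (ORF, 31~657 bp) of the C17orf32 cDNA (627 bp) differs by only one base from positions 25~651 of the ORF (25~807 bp) of the hypothetical human gene LOC124919. Three lines of evidence, namely RT-PCR verification with cDNA sequencing, BLAST searches of the human expressed sequence tag (EST) database, and analysis of gene organization, all support the C17orf32 sequence and do not support the LOC124919 coding sequence. The C17orf32 genomic sequence is 4.610 kb long and contains 6 exons and 5 introns; the full-length cDNA is 1 679 bp, and the ORF spans all 6 exons. The translation start of the ORF conforms to the Kozak rule, there is an in-frame stop codon upstream of the ORF start codon, and the ORF is followed by 2 polyadenylation signals and a PolyA tail. The successful cloning of C17orf32 shows that XM-058865, the model reference sequence of LOC124919 (the gene encoding the hypothetical human protein XP-058865, predicted by the NCBI GENOME Annotation Project in December 2001), contains an error: a base G is wrongly inserted between positions 406 and 407 of the C17orf32 cDNA. This insertion changes the protein sequence encoded by the ORF after amino acid 125 and yields a polypeptide of 260 amino acids. Computer-annotated coding sequences of the human genome should therefore be treated with caution. The workflow established here should help to discover more novel functional human genes.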
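The coordinates quoted in this summary can be cross-checked with simple arithmetic. The sketch below is only an illustration and not part of the published analysis. It assumes that positions are 1-based and inclusive and that each quoted ORF interval includes its stop codon; the function names are hypothetical.

```python
# Coordinates quoted in the summary (assumed 1-based, inclusive cDNA positions).
C17ORF32_ORF = (31, 657)
LOC124919_ORF = (25, 807)
INSERT_AFTER = 406  # the extra G lies between positions 406 and 407 of the C17orf32 cDNA


def orf_length(start, end):
    """Length in bp of an inclusive 1-based interval."""
    return end - start + 1


def residues(start, end):
    """Amino acids encoded, assuming the interval ends with one stop codon."""
    length = orf_length(start, end)
    if length % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return length // 3 - 1


def codons_before_insert(orf_start, insert_after):
    """Complete codons, and leftover bases, upstream of the insertion point."""
    return divmod(insert_after - orf_start + 1, 3)


print(orf_length(*C17ORF32_ORF))  # 627
print(residues(*C17ORF32_ORF))    # 208
print(residues(*LOC124919_ORF))   # 260
print(codons_before_insert(C17ORF32_ORF[0], INSERT_AFTER))  # (125, 1)
```

Under these assumptions the numbers agree with the text: 627 bp encode 208 residues plus a stop codon, the 25~807 bp interval encodes 260 residues, and the insertion falls one base into codon 126, so the first 125 amino acids are unaffected.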

    Abstract:

    A novel human gene encoding a protein of 208 amino acids was identified and characterized; HGNC has assigned it the symbol C17orf32 and the name chromosome 17 open reading frame 32. The full-length cDNA of 1 679 bp for C17orf32 was cloned through a BLAST search of public databases, following the identification of a 1 119 bp cDNA obtained by fully automated EST assembly with the SiClone software (created by Chen RS and Ling LJ, to be released on their website) on a ShenWei Ⅳ-type supercomputer. Structurally, C17orf32 has one calcitonin / CGRP / IAPP family signature from amino acid 16 to 169, one dihydroorotase signature from amino acid 43 to 117, one tyrosine kinase phosphorylation site from amino acid 68 to 75, and one bipartite nuclear localization signal from amino acid 28 to 45. These motifs suggest that the gene may be biologically important. Genomic organization analysis shows that the C17orf32 gene comprises six exons, ranging in size from 43 to 1 101 bp, and five introns, ranging in size from 163 to 1 124 bp, and spans 4.61 kb. All exon/intron boundaries are consistent with the GT/AG rule, and consensus sequences surrounding the splice boundaries are found as well. The C17orf32 gene is located on accession NT-010808.7 on human chromosome 17 and is linked only with LOC124919, a hypothetical human gene with an 889 bp mRNA (XM-058865) encoding the hypothetical protein XP-058865 of 260 amino acids. The sequence of LOC124919 has not been verified experimentally. Furthermore, the full-length ORF, 627 bp of cDNA from 31 to 657 bp, was cloned by RT-PCR from single-stranded cDNA of the human gastric adenocarcinoma MGC803 cell line and sequenced; the result is fully identical to the sequence obtained by in silico cloning. Thus, the C17orf32 gene (GenBank accession numbers AY074907 and TPA: BK000260), cloned in silico solely by bioinformatics analyses, is confirmed by nucleotide sequencing.
The full-length cDNA sequence of 1 679 bp exhibits very good overall homology to that of the 899 bp mRNA of LOC124919, with 99% identity over 78% of the total window at the nucleotide level and 57% identity over 57% of the total window at the protein level. However, the base G at position 401 of the LOC124919 cDNA is a redundant insertion, which causes a reading frame shift and the translation of an alternative protein. The inserted G of LOC124919 is not supported by the experimental clone, is fully rejected by human EST alignment, and is shown to be redundant by analysis of the genomic GT/AG organization. The C17orf32 gene has 9 putative promoters with probabilities of 58%~97%, two TATA boxes, a stop codon upstream of the ORF, and two PolyA signals and a PolyA tail downstream of the ORF, and it conforms to the Kozak rule around the translation start of the ORF. Based on the above results, it can be concluded that a complete novel human gene has been obtained. The full-length gene sequence exhibits little overall homology to any known protein at either the nucleotide or the amino acid level. The two related proteins, with 31% identity (in 29% of the total window) and 18% identity (in 18% of the total window) over the full-length protein, respectively, are the hypothetical Caenorhabditis elegans protein F09E5.11.p of 221 amino acids and the polyphosphate kinase of the filamentous nitrogen-fixing cyanobacterium Anabaena sp. strain PCC 7120, of 736 amino acids. Taken together, by combining bioinformatics analyses with experimental verification, the novel human gene C17orf32 was successfully cloned and verified by a series of theoretical and experimental evidence. The strategy will be helpful in discovering more novel human genes, and even in correcting errors in NCBI GENOME ANNOTATION PROJECT REFSEQs such as LOC124919, a model reference sequence predicted from NCBI contig NT-010808 by automated computational analysis using a gene prediction method.
Therefore, human genome coding regions annotated by computer should be used with caution.
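The effect of a single redundant base on the reading frame, as described in the abstract, can be illustrated with a toy example. The sequence below is hypothetical and is not the C17orf32 or predicted mRNA sequence; the helper names are also illustrative. The sketch assumes the standard genetic code and 1-based positions.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cdna):
    """Translate from the first base until a stop codon or the end of the sequence."""
    protein = []
    for i in range(0, len(cdna) - 2, 3):
        aa = CODON_TABLE[cdna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def insert_base(cdna, after_position, base):
    """Return the sequence with `base` inserted after a 1-based position."""
    return cdna[:after_position] + base + cdna[after_position:]


REFERENCE = "ATGGCTGAAGGTTGCTAA"         # hypothetical toy ORF, not real data
MUTANT = insert_base(REFERENCE, 7, "G")  # one extra G shifts the frame

print(translate(REFERENCE))  # MAEGC
print(translate(MUTANT))     # MAGRLL
```

In this toy case the residues upstream of the insertion are unchanged, every downstream codon is read in a shifted frame, and the original stop codon is no longer in frame, so the product changes in both sequence and length.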

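Cite this article

Zhang Deli, Ding Peiguo, Ling Lunjiang, Chen Runsheng, Ma Dalong. In Silico Cloning of C17orf32, a Novel Human Gene, and Verification of Its Coding Region by RT-PCR [J]. Progress in Biochemistry and Biophysics, 2002, 29(4): 543-549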

History
  • Received: 2002-03-08
  • Last revised: 2002-03-28