
Transposable element mining and composition analysis based on raw cyanobacterial whole-genome sequencing data
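Cite this article: Xiao P, Li RH. Transposable element mining and composition analysis based on raw cyanobacterial whole-genome sequencing data [J]. Hereditas (Beijing), 2011, 33(6): 654-660. DOI: 10.3724/SP.J.1005.2011.00654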
Authors: Xiao P, Li RH
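Affiliations: 1. Key Laboratory of Aquatic Biodiversity and Conservation, Chinese Academy of Sciences, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China; Graduate University of Chinese Academy of Sciences, Beijing 100049, China
2. Key Laboratory of Aquatic Biodiversity and Conservation, Chinese Academy of Sciences, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China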
Funding: State Key Laboratory of Freshwater Ecology and Biotechnology project; National Basic Research Program of China (973 Program) project
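Abstract: Next-generation sequencing technologies and comparisons of whole-genome diversity are active research areas in modern biology and information science. Analysis of the transposable elements in a genome has become an important part of comparative genomics. At present, the types, numbers and composition of transposable elements are usually mined and analyzed from fully assembled whole-genome sequences. The post-processing and assembly of the massive numbers of short reads that precede this step remain a blind spot in genome research. In addition, repeats, which consist mainly of transposable elements, are inevitably misassembled or lost during assembly, and this makes systematic analysis of transposable elements uncertain. This study aimed to establish an analysis pipeline for the composition of transposable elements, mainly insertion sequences (IS), in a randomly simulated Roche 454 raw sequencing data set built from the whole genome of Microcystis aeruginosa NIES 843. The most reliable results came from a scheme that divided the candidate sequences from the nucleotide-probe scan into three groups, each with its own amino-acid detection threshold. Under this scheme, cyanobacterial transposable elements made up 10.38% of the M. aeruginosa NIES 843 genome and belonged to 14 IS families and 66 IS subfamilies. These results were compared with those of two earlier pipelines based on the fully assembled genome. Abundance and family/subfamily composition did not differ significantly, and at the sequence level a high proportion of the transposable element sequences overlapped. This confirms that the pipeline is reliable and practical for transposable element analysis of raw high-throughput sequencing data.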

Keywords: cyanobacterial genome; insertion sequence; IS family; transposable element; Roche 454 raw sequencing data
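The abstract describes grouping probe-scan candidates and applying a separate amino-acid threshold to each group. The Python below is a minimal sketch of that step under stated assumptions. It is not the authors' pipeline. The group labels, the score metric (`aa_score`), the values in `AA_THRESHOLDS`, and the function names are all hypothetical. Abundance is approximated as the share of raw-read bases that fall in accepted candidate reads.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One raw read flagged by the nucleotide-probe scan (fields are hypothetical)."""
    read_id: str
    read_length: int  # bases in the raw read
    group: str        # one of the three candidate groups (labels are placeholders)
    aa_score: float   # assumed metric, e.g. % identity of the best transposase hit
    family: str       # IS family of that hit
    subfamily: str    # IS subfamily of that hit


# Hypothetical per-group amino-acid thresholds; the study's actual values are not given.
AA_THRESHOLDS = {"group_1": 30.0, "group_2": 50.0, "group_3": 70.0}


def accepted(candidates: list[Candidate]) -> list[Candidate]:
    """Keep candidates whose score passes the threshold of their own group."""
    return [c for c in candidates if c.aa_score >= AA_THRESHOLDS[c.group]]


def summarize(candidates: list[Candidate], total_read_bases: int) -> tuple[float, int, int]:
    """Return (IS share of sequenced bases in %, #families, #subfamilies).

    Counting whole accepted reads is a simplification. With roughly uniform
    coverage, the base share approximates the IS fraction of the genome.
    """
    kept = accepted(candidates)
    is_bases = sum(c.read_length for c in kept)
    n_families = len({c.family for c in kept})
    n_subfamilies = len({(c.family, c.subfamily) for c in kept})
    return 100.0 * is_bases / total_read_bases, n_families, n_subfamilies


if __name__ == "__main__":
    toy = [
        Candidate("r1", 100, "group_1", 35.0, "famA", "subA1"),  # passes 30
        Candidate("r2", 200, "group_2", 45.0, "famA", "subA2"),  # fails 50
        Candidate("r3", 150, "group_3", 80.0, "famB", "subB1"),  # passes 70
    ]
    print(summarize(toy, total_read_bases=1000))  # (25.0, 2, 2)
```

A per-group threshold lets a lenient cutoff apply where probe evidence is already strong and a strict one where it is weak. A single global cutoff cannot do this. Which group should get which cutoff is left open here.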

Cyanobacterial genome transposable element mining and analysis based on 454 deep-sequencing data set
Xiao Peng,Li Ren-Hui. Cyanobacterial genome transposable element mining and analysis based on 454 deep-sequencing data set[J]. Hereditas, 2011, 33(6): 654-660. DOI: 10.3724/SP.J.1005.2011.00654
Authors:Xiao Peng  Li Ren-Hui
Affiliation:Key Laboratory of Aquatic Biodiversity and Conservation Biology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China. xp@ihb.ac.cn
Abstract: Next-generation sequencing (NGS) and comparative genome analysis have recently attracted considerable research attention. Analyses of transposable element composition and abundance are important parts of genome studies. These analyses have generally been based on completely assembled genomes. However, post-processing and assembly of the huge number of short reads produced by the 454 sequencer often run into problems. Moreover, repeats, which consist largely of transposable elements, are frequently misassembled or lost, and this makes the results uncertain. This study aimed to construct a framework that automatically analyzes insertion sequence (IS) abundance and composition in a simulated Roche 454 deep-sequencing data set. The data set represents 33-fold coverage of the Microcystis aeruginosa NIES 843 genome. The most reliable result came from dividing the IS element candidates into three classes, each with a separate transposase detection threshold. Under this setting, IS elements accounted for 10.38% of the simulated data set and belonged to 14 IS families and 66 IS subfamilies. These results showed no significant difference from two previous analyses based on the assembled M. aeruginosa NIES 843 genome. The IS element sequences also overlapped with those analyses to a high percentage, which indicates that the framework is reliable.
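Keywords: cyanobacterial genome; insertion sequence; IS family; transposable element; Roche 454 raw sequencing data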
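The "33-fold coverage" figure follows the usual mean-coverage relation C = N·L/G, where N reads of length L cover a genome of length G. The Python below is offered only as an illustration. It computes the read count for a target coverage and draws a toy random data set from an assembled genome sequence. Several details are assumptions, not the study's simulation settings: the genome length (roughly 5.8 Mb), the 400 bp read length, the function names, and the error-free uniform sampling.

```python
import math
import random

# Complement table for reverse-complementing reads drawn from the minus strand.
_COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def reads_for_coverage(genome_length: int, read_length: int, coverage: float) -> int:
    """Smallest read count N with N * L / G >= C (mean-coverage relation)."""
    return math.ceil(coverage * genome_length / read_length)


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def simulate_reads(genome: str, read_length: int, coverage: float, seed: int = 0) -> list[str]:
    """Draw error-free fixed-length reads at uniform random positions and strands.

    Real 454 data also have variable read lengths and homopolymer errors,
    which this sketch deliberately ignores.
    """
    if not 0 < read_length <= len(genome):
        raise ValueError("read_length must be between 1 and the genome length")
    rng = random.Random(seed)
    n_reads = reads_for_coverage(len(genome), read_length, coverage)
    last_start = len(genome) - read_length
    reads = []
    for _ in range(n_reads):
        start = rng.randint(0, last_start)  # both bounds inclusive
        read = genome[start:start + read_length]
        if rng.random() < 0.5:  # take the minus strand half of the time
            read = reverse_complement(read)
        reads.append(read)
    return reads


if __name__ == "__main__":
    # Assumed values: ~5.8 Mb genome, 400 bp reads, 33-fold coverage.
    print(reads_for_coverage(5_800_000, 400, 33))  # 478500
    rng = random.Random(1)
    toy_genome = "".join(rng.choice("ACGT") for _ in range(10_000))
    reads = simulate_reads(toy_genome, 400, 33)
    print(len(reads), sum(map(len, reads)) / len(toy_genome))  # 825 33.0
```

Simulating reads from a finished genome means every read has a known origin. This is what makes it possible to compare results from raw reads against results from the assembled sequence.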
This article is indexed in CNKI, Wanfang Data, PubMed and other databases.