
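Construction of a Quality Evaluation System for Illumina-Solexa Sequencing Data
Cite this article:SONG Lin-lin, GU Zhao-hui, WEI Chao-chun, CHEN Sai-juan. Construction of a Quality Evaluation System for Illumina-Solexa Sequencing Data [J]. Biomagnetism, 2009(15):2899-2902,2912.
Author names:SONG Lin-lin  GU Zhao-hui  WEI Chao-chun  CHEN Sai-juan
Author affiliations:[1] School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240 [2] School of Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240 [3] State Key Laboratory for Medical Genomics, Shanghai Institute of Hematology, Ruijin Hospital affiliated to School of Medicine, Shanghai Jiao Tong University, Shanghai 200025 [4] Shanghai Center for Bioinformation Technology, Shanghai 200235
Abstract:Objective: To study data analysis and quality evaluation methods suited to next-generation sequencing data, which are large in volume and short in read length. Methods: Published sequencing data from the Illumina-Solexa platform were used as the study material. The MAQ software was used to align the reads to the whole human genome sequence, and, taking exon regions as an example, sequencing data quality was evaluated at the site level. Results: By combining existing software with a linear algorithm developed in this work, a quality evaluation system for sequencing data was built that includes alignment and assembly. After alignment, the raw reads were found to cover 127,113,378 sites in total, involving 64,868 exons on 24 chromosomes. Of these exons, 0.50% had every site sequenced, and 3.98% had a mean per-site sequencing depth greater than or equal to 1. Conclusion: A data analysis and quality evaluation method based on the Illumina-Solexa sequencing platform was successfully constructed, and it is also applicable to other second-generation sequencing platforms. On the basis of the quality evaluation, researchers can refine their sequencing experiment design and carry out SNP and mutation screening and subsequent functional studies.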

Keywords:next-generation sequencing  Illumina-Solexa sequencing platform  MAQ alignment software  sequencing quality evaluation

Evaluation System for Sequencing Data Generated by Illumina-Solexa Platform
SONG Lin-lin,GU Zhao-hui,WEI Chao-chun,CHEN Sai-juan.Evaluation System for Sequencing Data Generated by Illumina-Solexa Platform[J].Biomagnetism,2009(15):2899-2902,2912.
Authors:SONG Lin-lin  GU Zhao-hui  WEI Chao-chun  CHEN Sai-juan
Institution:1 School of Life Sciences & Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China; 2 School of Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China; 3 State Key Laboratory for Medical Genomics, Shanghai Institute of Hematology, Ruijin Hospital affiliated to School of Medicine, Shanghai Jiao Tong University, Shanghai 200025, China; 4 Shanghai Center for Bioinformation Technology, Shanghai 200235, China
Abstract:Objective: To handle the very large number of short sequences produced by next-generation sequencing (NGS) platforms, and to establish a systematic method for evaluating the quality of the sequenced data. Methods: The raw data in this study are short sequences acquired from the Illumina-Solexa sequencing platform. The MAQ software was used to align these sequences to the human genomic sequence and to extract per-site information from the alignment result. An algorithm was then developed to evaluate the sequencing output in specific regions, such as exons, at the site level. Results: By combining existing software with a linear algorithm developed by the authors, an evaluation system was established that covers alignment, assembly and assessment of the sequencing data. In the analysis, the raw sequencing data covered 127,113,378 sites and 64,868 exons on the 24 human chromosomes. Of the covered exons, 0.50% were covered at every site and 3.98% had an average depth of at least 1. Conclusions: Although this method is based on data from the Illumina-Solexa sequencing platform, the analysis software and quality-checking algorithm could also be used for other next-generation sequencing platforms with little adaptation. Based on the analysis of the sequencing data, researchers can improve their experimental design and strengthen the reliability of SNP and mutation screening results for further functional study.
Keywords:Next Generation Sequencing  Illumina-Solexa sequencing platform  MAQ alignment software  Quality assessment
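
The two exon-level metrics reported in the abstract (the share of covered exons in which every site is sequenced, and the share whose average depth is at least 1) can be illustrated with a minimal Python sketch. This is not the authors' linear algorithm, which the abstract does not describe. The `Exon` and `ExonCoverage` types, the `exon_coverage` and `summarize` functions, the 1-based inclusive coordinates, and the choice of "exons with at least one sequenced site" as the denominator are all assumptions made for illustration. Per-site depths are assumed to have already been extracted from the alignment into a dictionary.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Exon:
    chrom: str
    start: int  # assumed 1-based, inclusive
    end: int    # assumed 1-based, inclusive


@dataclass(frozen=True)
class ExonCoverage:
    exon: Exon
    length: int
    covered_sites: int
    mean_depth: float


# Hypothetical input format: (chromosome, position) -> read depth at that site.
Depth = Dict[Tuple[str, int], int]


def exon_coverage(exon: Exon, depth: Depth) -> ExonCoverage:
    """Count sequenced sites and the mean depth over one exon."""
    length = exon.end - exon.start + 1
    covered = 0
    total_depth = 0
    for pos in range(exon.start, exon.end + 1):
        d = depth.get((exon.chrom, pos), 0)  # unsequenced sites count as depth 0
        if d > 0:
            covered += 1
        total_depth += d
    return ExonCoverage(exon, length, covered, total_depth / length)


def summarize(exons: Iterable[Exon], depth: Depth) -> Dict[str, float]:
    """Summarize coverage over exons that have at least one sequenced site."""
    stats = [exon_coverage(e, depth) for e in exons]
    touched = [s for s in stats if s.covered_sites > 0]
    if not touched:
        return {
            "exons_touched": 0,
            "fully_covered_fraction": 0.0,
            "mean_depth_ge_1_fraction": 0.0,
        }
    fully = sum(1 for s in touched if s.covered_sites == s.length)
    deep = sum(1 for s in touched if s.mean_depth >= 1.0)
    return {
        "exons_touched": len(touched),
        "fully_covered_fraction": fully / len(touched),
        "mean_depth_ge_1_fraction": deep / len(touched),
    }
```

For whole-genome data a dictionary keyed by site would be memory-hungry; a per-chromosome array or a single sorted sweep over sites and exon boundaries would scale better, but the metrics themselves stay the same.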
This article is indexed in VIP (维普) and other databases.