Effects of GC Bias in Next-Generation-Sequencing Data on De Novo Genome Assembly
Authors: Yen-Chun Chen, Tsunglin Liu, Chun-Hui Yu, Tzen-Yuh Chiang, Chi-Chuan Hwang
Affiliation: 1. Department of Engineering Science, National Cheng Kung University, Tainan, Taiwan; 2. Institute of Bioinformatics and Biosignal Transduction, National Cheng Kung University, Tainan, Taiwan; 3. Department of Life Sciences, National Cheng Kung University, Tainan, Taiwan; 4. Supercomputing Research Center, National Cheng Kung University, Tainan, Taiwan; University of Georgia, United States of America
Abstract: Next-generation-sequencing (NGS) has revolutionized the field of genome assembly because of its much higher data throughput and much lower cost compared with traditional Sanger sequencing. However, NGS poses new computational challenges to de novo genome assembly. Among these challenges, GC bias in NGS data is known to complicate genome assembly, but it is not clear to what extent GC bias affects genome assembly in general. In this work, we conduct a systematic analysis of the effects of GC bias on genome assembly. Our analyses reveal that GC bias lowers assembly completeness only when the degree of GC bias is above a threshold. At a strong GC bias, the assembly fragmentation can be explained by the low coverage of reads in the GC-poor or GC-rich regions of a genome. This effect is observed for all the assemblers under study. Increasing the total amount of NGS data thus rescues the assembly fragmentation caused by GC bias. However, the amount of data needed for a full rescue depends on the distribution of GC content. Both low and high coverage depths due to GC bias lower the accuracy of assembly. These pieces of information provide guidance toward a better de novo genome assembly in the presence of GC bias.
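The reasoning in the abstract (GC-extreme regions receive fewer reads, so more total data is needed before the worst-covered region becomes assemblable) can be illustrated with a minimal Python sketch. The linear bias model, the function names, and the parameters `optimum`, `strength`, and `min_depth` are all assumptions made for illustration; they are not the model or the code used in the paper.

```python
def gc_fraction(seq):
    """Fraction of G/C bases among the A/C/G/T bases of seq."""
    seq = seq.upper()
    acgt = sum(1 for b in seq if b in "ACGT")
    gc = sum(1 for b in seq if b in "GC")
    return gc / acgt if acgt else 0.0


def windowed_gc(seq, window):
    """GC fraction of consecutive, non-overlapping windows."""
    return [gc_fraction(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, window)]


def relative_coverage(gc, optimum=0.5, strength=0.0):
    """Hypothetical linear bias: sampling weight drops with the
    distance of a window's GC fraction from an assumed optimum."""
    return max(0.0, 1.0 - strength * abs(gc - optimum))


def mean_depth_needed(gc_values, min_depth, strength, optimum=0.5):
    """Mean sequencing depth needed so that the worst-covered window
    still reaches min_depth, assuming reads are distributed across
    windows in proportion to relative_coverage."""
    rel = [relative_coverage(g, optimum, strength) for g in gc_values]
    worst = min(rel)
    if worst == 0.0:
        return float("inf")  # some window is never sampled
    return min_depth * (sum(rel) / len(rel)) / worst


# Toy genome: one AT-only, one GC-only, and one balanced window.
genome = "AT" * 50 + "GC" * 50 + "ATGC" * 25
gc_values = windowed_gc(genome, 100)
unbiased = mean_depth_needed(gc_values, min_depth=10, strength=0.0)
biased = mean_depth_needed(gc_values, min_depth=10, strength=1.0)
```

Under this toy model the required mean depth depends on both the bias strength and the spread of GC values across windows, which mirrors the abstract's statement that the amount of data needed for a full rescue depends on the distribution of GC content.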