首页 | 本学科首页   官方微博 | 高级检索  
   检索      


Reference-free SNP calling: improved accuracy by preventing incorrect calls from repetitive genomic regions
Authors:Jinzhuang Dou  Xiqiang Zhao  Xiaoteng Fu  Wenqian Jiao  Nannan Wang  Lingling Zhang  Xiaoli Hu  Shi Wang  Zhenmin Bao
Institution:1. Key Laboratory of Marine Genetics and Breeding, College of Marine Life Sciences, Ocean University of China, 5 Yushan Road, Qingdao, 266003, China
2. College of Mathematical sciences, Ocean University of China, 238 Songling Road, Qingdao, 266003, China
Abstract:ABSTRACT: BACKGROUND: Single nucleotide polymorphisms (SNPs) are the most abundant type of genetic variation in eukaryotic genomes and have recently become the marker of choice in a wide variety of ecological and evolutionary studies. The advent of next-generation sequencing (NGS) technologies has made it possible to efficiently genotype a large number of SNPs in the non-model organisms with no or limited genomic resources. Most NGS-based genotyping methods require a reference genome to perform accurate SNP calling. Little effort, however, has yet been devoted to developing or improving algorithms for accurate SNP calling in the absence of a reference genome. RESULTS: Here we describe an improved maximum likelihood (ML) algorithm called iML, which can achieve high genotyping accuracy for SNP calling in the non-model organisms without a reference genome. The iML algorithm incorporates the mixed Poisson/normal model to detect composite read clusters and can efficiently prevent incorrect SNP calls resulting from repetitive genomic regions. Through analysis of simulation and real sequencing datasets, we demonstrate that in comparison with ML or a threshold approach, iML can remarkably improve the accuracy of de novo SNP genotyping and is especially powerful for the reference-free genotyping in diploid genomes with high repeat contents. CONCLUSIONS: The iML algorithm can efficiently prevent incorrect SNP calls resulting from repetitive genomic regions, and thus outperforms the original ML algorithm by achieving much higher genotyping accuracy. Our algorithm is therefore very useful for accurate de novo SNP genotyping in the non-model organisms without a reference genome.
Keywords:
本文献已被 PubMed SpringerLink 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号