Ultra-fast local-haplotype variant calling using paired-end DNA-sequencing data reveals somatic mosaicism in tumor and normal blood samples |
| |
Authors: | Subhajit Sengupta, Kamalakar Gulukota, Yitan Zhu, Carole Ober, Katherine Naughton, William Wentworth-Sheilds, Yuan Ji |
| |
Institution: | 1. Program of Computational Genomics & Medicine, NorthShore University HealthSystem, Evanston, IL 60201, USA; 2. Center for Molecular Medicine, NorthShore University HealthSystem, Evanston, IL 60201, USA; 3. Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA; 4. Department of Health Studies, University of Chicago, Chicago, IL 60637, USA |
| |
Abstract: | Somatic mosaicism refers to the existence of somatic mutations in a fraction of somatic cells in a single biological sample. Its importance has mainly been discussed in theory, although experimental work linking somatic mosaicism to disease diagnosis has started to emerge. Through novel statistical modeling of paired-end DNA-sequencing data from blood-derived DNA of healthy donors and from tumor samples, we present an ultra-fast computational pipeline, LocHap, that searches for multiple single nucleotide variants (SNVs) that are scaffolded by the same reads. We refer to scaffolded SNVs as local haplotypes (LH). When an LH exhibits more than two genotypes, we call it a local haplotype variant (LHV). The presence of LHVs is considered evidence of somatic mosaicism because a genetically homogeneous cell population will not harbor LHVs. Applying LocHap to whole-genome and whole-exome sequence data in DNA from normal blood and tumor samples, we find widespread LHVs across the genome. Importantly, we find more LHVs in tumor samples than in normal samples, and more in older adults than in younger ones. We confirm the existence of LHVs and somatic mosaicism by validation studies in normal blood samples. LocHap is publicly available at http://www.compgenome.org/lochap. |
| |
Keywords: | |
|
|
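The LH/LHV definition in the abstract can be illustrated with a minimal Python sketch. It is not the LocHap implementation: the abstract describes statistical modeling, whereas this example swaps in a plain read-count threshold. The function names, the `min_reads` parameter, and the dictionary representation of a read pair (position mapped to observed base) are all assumptions made for illustration.

```python
from collections import Counter

def count_local_haplotypes(reads, snv_positions):
    """Count allele combinations over reads that scaffold every SNV position.

    `reads` is a list of dicts mapping genomic position -> observed base for
    one read pair (a hypothetical, simplified representation).
    """
    counts = Counter()
    for read in reads:
        # Only read pairs covering all positions link the SNVs together.
        if all(pos in read for pos in snv_positions):
            counts[tuple(read[pos] for pos in snv_positions)] += 1
    return counts

def is_candidate_lhv(counts, min_reads=2):
    """Flag a local haplotype with more than two supported allele combinations.

    `min_reads` is an assumed, crude guard against sequencing errors; it
    stands in for the statistical model mentioned in the abstract.
    """
    supported = [hap for hap, n in counts.items() if n >= min_reads]
    return len(supported) > 2

# Toy data: two SNV positions (100 and 150) scaffolded by the same read pairs.
reads = (
    [{100: "A", 150: "C"}] * 3
    + [{100: "G", 150: "T"}] * 3
    + [{100: "A", 150: "T"}] * 2   # third combination, supported by two pairs
    + [{100: "G", 150: "C"}]       # single pair, below the assumed threshold
    + [{100: "A"}]                 # covers one SNV only, so it is ignored
)

counts = count_local_haplotypes(reads, [100, 150])
candidate = is_candidate_lhv(counts)
```

In this toy input, three allele combinations each have at least two supporting read pairs, so the pair of SNVs is flagged as a candidate LHV. Dropping the two `A`/`T` read pairs would leave only two supported combinations, which is what a genetically homogeneous diploid sample is expected to show.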