首页 | 本学科首页   官方微博 | 高级检索  
     


Detecting genomic deletions from high-throughput sequence data with unsupervised learning
Authors:Li  Xin  Wu  Yufeng
Affiliation:1.Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, 20892, USA
;2.Cancer Genomics Research Laboratory, Frederick National Laboratory for Cancer Research, Leidos Biomedical Research Inc, Frederick, MD, 21702, USA
;3.Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, 06269, USA
;
Abstract:Background

Structural variation (SV), which ranges from 50 bp to (sim) 3 Mb in size, is an important type of genetic variations. Deletion is a type of SV in which a part of a chromosome or a sequence of DNA is lost during DNA replication. Three types of signals, including discordant read-pairs, reads depth and split reads, are commonly used for SV detection from high-throughput sequence data. Many tools have been developed for detecting SVs by using one or multiple of these signals.

Results

In this paper, we develop a new method called EigenDel for detecting the germline submicroscopic genomic deletions. EigenDel first takes advantage of discordant read-pairs and clipped reads to get initial deletion candidates, and then it clusters similar candidates by using unsupervised learning methods. After that, EigenDel uses a carefully designed approach for calling true deletions from each cluster. We conduct various experiments to evaluate the performance of EigenDel on low coverage sequence data.

Conclusions

Our results show that EigenDel outperforms other major methods in terms of improving capability of balancing accuracy and sensitivity as well as reducing bias. EigenDel can be downloaded from https://github.com/lxwgcool/EigenDel.

Keywords:
本文献已被 SpringerLink 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号