Mako: A Graph-based Pattern Growth Approach to Detect Complex Structural Variants
Authors: Jiadong Lin, Xiaofei Yang, Walter Kosters, Tun Xu, Yanyan Jia, Songbo Wang, Qihui Zhu, Mallory Ryan, Li Guo, Chengsheng Zhang, The Human Genome Structural Variation Consortium, Charles Lee, Scott E. Devine, Evan E. Eichler, Kai Ye
Institutions: 1. Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA; 2. European Molecular Biology Laboratory (EMBL), Genome Biology Unit, D-69117 Heidelberg, Germany; 3. New York Genome Center, New York, NY 10013, USA; 4. Center for Genomic Medicine, Massachusetts General Hospital, Department of Neurology, Harvard Medical School, Boston, MA 02114, USA; 5. Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA; 6. Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, D-40225 Düsseldorf, Germany; 7. Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195-5065, USA; 8. Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), D-69120 Heidelberg, Germany; 9. The Jackson Laboratory for Genomic Medicine, Farmington, CT 06030, USA; 10. Bionano Genomics, San Diego, CA 92121, USA; 11. Department of Genetics and Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35294, USA; 12. Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA; 13. Department of Computer & Information Sciences, Temple University, Philadelphia, PA 19122, USA; 14. Pacific Biosciences of California, Inc., Menlo Park, CA 94025, USA; 15. Washington University, St. Louis, MO 63108, USA; 16. European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
1. School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an 710049, China; 2. MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an 710049, China; 3. Genome Institute, the First Affiliated Hospital of Xi’an Jiaotong University, Xi’an 710061, China; 4. Leiden Institute of Advanced Computer Science, Faculty of Science, Leiden University, Leiden 2311EZ, Netherlands; 5. School of Computer Science and Technology, Faculty of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an 710049, China; 6. The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA; 7. Precision Medicine Center, the First Affiliated Hospital of Xi’an Jiaotong University, Xi’an 710061, China; 8. Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA; 9. Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98119, USA; 10. Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA; 11. The School of Life Science and Technology, Xi’an Jiaotong University, Xi’an 710049, China
Abstract: Complex structural variants (CSVs) are genomic alterations that have more than two breakpoints and are considered the simultaneous occurrence of multiple simple structural variants. However, the compounded mutational signals of CSVs are difficult to detect with the commonly used model-match strategy. As a result, progress in CSV discovery has been limited compared with that for simple structural variants. Here, we systematically analyzed the multi-breakpoint connection feature of CSVs and proposed Mako, which uses a bottom-up guided, model-free strategy to detect CSVs from paired-end short-read sequencing data. Specifically, we implemented a graph-based pattern growth approach, in which the graph depicts potential breakpoint connections and pattern growth enables CSV detection without pre-defined models. Comprehensive evaluations on both simulated and real datasets revealed that Mako outperformed other algorithms. Notably, the validation rate of CSVs on real data, based on experimental and computational validation as well as manual inspection, is around 70%, and the median breakpoint shifts for experimental and computational validation are 13 bp and 26 bp, respectively. Moreover, the Mako CSV subgraph effectively characterized the breakpoint connections of each CSV event and uncovered a total of 15 CSV types, including two novel types: adjacent segment swap and tandem dispersed duplication. Further analysis of these CSVs also revealed the impact of sequence homology on CSV formation. Mako is publicly available at https://github.com/xjtu-omics/Mako.
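The graph-based pattern growth idea in the abstract can be illustrated with a minimal Python sketch. This is not Mako's implementation: the `Breakpoint` type, the two edge sources (read-supported links such as split or discordant reads, plus same-chromosome adjacency within a hypothetical `max_gap` of 500 bp), and the breadth-first growth rule are all assumptions made for illustration. Growth starts from a seed breakpoint and repeatedly adds connected breakpoints without consulting any predefined SV model; a grown pattern is flagged as a CSV candidate when it holds more than two breakpoints, following the definition given in the abstract.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Breakpoint:
    """Hypothetical candidate breakpoint inferred from abnormally aligned reads."""
    bid: str
    chrom: str
    pos: int


def build_breakpoint_graph(breakpoints, read_links, max_gap=500):
    """Build an undirected graph of potential breakpoint connections.

    Edges come from two assumed sources:
      * read_links: pairs of breakpoint ids joined by split or discordant reads;
      * adjacency: consecutive breakpoints on the same chromosome within max_gap bp.
    """
    graph = {bp.bid: set() for bp in breakpoints}
    for a, b in read_links:
        graph[a].add(b)
        graph[b].add(a)
    # Sort by (chrom, pos) so genomically neighbouring breakpoints are consecutive.
    ordered = sorted(breakpoints, key=lambda bp: (bp.chrom, bp.pos))
    for left, right in zip(ordered, ordered[1:]):
        if left.chrom == right.chrom and right.pos - left.pos <= max_gap:
            graph[left.bid].add(right.bid)
            graph[right.bid].add(left.bid)
    return graph


def grow_patterns(graph):
    """Grow each pattern from an unvisited seed by repeatedly adding connected nodes.

    No SV model is consulted: a pattern is the set of breakpoints reachable
    from its seed (a simplified, assumed growth rule).
    """
    visited = set()
    patterns = []
    for seed in sorted(graph):
        if seed in visited:
            continue
        visited.add(seed)
        pattern = {seed}
        frontier = deque([seed])
        while frontier:
            node = frontier.popleft()
            for nxt in sorted(graph[node]):
                if nxt not in visited:
                    visited.add(nxt)
                    pattern.add(nxt)
                    frontier.append(nxt)
        patterns.append(sorted(pattern))
    return patterns


def classify(pattern):
    """More than two breakpoints -> complex SV candidate; otherwise simple."""
    return "complex" if len(pattern) > 2 else "simple"


if __name__ == "__main__":
    bps = [
        Breakpoint("a", "chr1", 1000),
        Breakpoint("b", "chr1", 1300),  # within 500 bp of "a" -> adjacency edge
        Breakpoint("c", "chr1", 5000),
        Breakpoint("d", "chr2", 200),
        Breakpoint("e", "chr2", 9000),
    ]
    links = [("a", "c"), ("d", "e")]  # hypothetical read-supported connections
    graph = build_breakpoint_graph(bps, links)
    for p in grow_patterns(graph):
        print(p, classify(p))
    # ['a', 'b', 'c'] complex
    # ['d', 'e'] simple
```

Because this simplified growth rule has no scoring or filtering, each grown pattern is just a connected subgraph; a practical caller would need further rules (for example, read-support thresholds) that are not specified here.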
Keywords: Next-generation sequencing; Complex structural variant; Pattern growth; Graph mining; Formation mechanism
This article is indexed in ScienceDirect and other databases.