首页 | 本学科首页   官方微博 | 高级检索  
     


3DMolNavi: A web-based retrieval and navigation tool for flexible molecular shape comparison
Authors:Yu-Shen Liu  Meng Wang  Jean-Claude Paul  Karthik Ramani
Affiliation:1. Department of Biostatistics, Harvard School of Public Health, 677 Huntington Avenue, Boston, Massachusetts, USA
2. Institute for Genomic Mathematics, University of Bonn, Bonn, Germany
3. German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany
Abstract:

Background

As Next-Generation Sequencing data becomes available, existing hardware environments do not provide sufficient storage space and computational power to store and process the data due to their enormous size. This is and will be a frequent problem that is encountered everyday by researchers who are working on genetic data. There are some options available for compressing and storing such data, such as general-purpose compression software, PBAT/PLINK binary format, etc. However, these currently available methods either do not offer sufficient compression rates, or require a great amount of CPU time for decompression and loading every time the data is accessed.

Results

Here, we propose a novel and simple algorithm for storing such sequencing data. We show that, the compression factor of the algorithm ranges from 16 to several hundreds, which potentially allows SNP data of hundreds of Gigabytes to be stored in hundreds of Megabytes. We provide a C++ implementation of the algorithm, which supports direct loading and parallel loading of the compressed format without requiring extra time for decompression. By applying the algorithm to simulated and real datasets, we show that the algorithm gives greater compression rate than the commonly used compression methods, and the data-loading process takes less time. Also, The C++ library provides direct-data-retrieving functions, which allows the compressed information to be easily accessed by other C++ programs.

Conclusions

The SpeedGene algorithm enables the storage and the analysis of next generation sequencing data in current hardware environment, making system upgrades unnecessary.
Keywords:
本文献已被 SpringerLink 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号