Simplifier: a web tool to eliminate redundant NGS contigs |
| |
Authors: | Rommel Thiago Jucá Ramos, Adriana Ribeiro Carneiro, Vasco Azevedo, Maria Paula Schneider, Debmalya Barh, Artur Silva |
| |
Affiliation: | 1 Instituto de Ciências Biológicas, Universidade Federal do Pará, Belém, PA, Brazil; 2 Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil; 3 Centre for Genomics and Applied Gene Technology, Institute of Integrative Omics and Applied Biotechnology (IIOAB), Nonakuri, Purba Medinipur, WB-721172, India |
| |
Abstract: | Modern genomic sequencing technologies produce a large amount of data with reduced cost per base; however, this data consists of short reads. This reduction in the size of the reads, compared to those obtained with previous methodologies, presents new challenges, including a need for efficient algorithms for the assembly of genomes from short reads and for resolving repetitions. Additionally, after ab initio assembly, curation of the hundreds or thousands of contigs generated by assemblers demands considerable time and computational resources. We developed Simplifier, a stand-alone software tool that selectively eliminates redundant sequences from the collection of contigs generated by ab initio assembly of genomes. Application of Simplifier to data generated by assembly of the genome of Corynebacterium pseudotuberculosis strain 258 reduced the number of contigs generated by ab initio methods from 8,004 to 5,272, a reduction of 34.14%; in addition, N50 increased from 1 kb to 1.5 kb. Processing the contigs of Escherichia coli DH10B with Simplifier reduced the mate-paired library by 17.47% and the fragment library by 23.91%. In tests with data from prokaryotic organisms, Simplifier removed redundant sequences from datasets produced by assemblers, thereby reducing the effort required for finalization of genome assembly. Availability: Simplifier is available at http://www.genoma.ufpa.br/rramos/softwares/simplifier.xhtml and requires Sun JDK 6 or higher. |
| |
Keywords: | NGS sequencing; ab initio assembly of genomes; redundant sequences |
|
|
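
The abstract reports two summary metrics: the percentage reduction in contig count and the N50. As a minimal sketch of how such metrics are commonly computed (this is not Simplifier's own code, and the function names `n50` and `percent_reduction` as well as the toy contig lengths are hypothetical choices made for illustration), one could write:

```python
def n50(lengths):
    """Return the N50 of a set of contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def percent_reduction(before, after):
    """Return the reduction from `before` to `after` as a percentage of `before`."""
    if before <= 0:
        raise ValueError("`before` must be positive")
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    # Toy contig lengths in bp (hypothetical, not data from the paper).
    toy_lengths = [100, 200, 300, 400, 500]
    print(n50(toy_lengths))             # 400
    print(percent_reduction(200, 150))  # 25.0
```

Removing short redundant contigs lowers the total base count while leaving the long contigs in place, which is one reason N50 can rise after redundancy filtering, as reported in the abstract.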