TWARIT: an extremely rapid and efficient approach for phylogenetic classification of metagenomic sequences |
Authors: | Reddy Rachamalla Maheedhar; Mohammed Monzoorul Haque; Mande Sharmila S |
Affiliation: | Bio-sciences R&D Division, TCS Innovation Labs, Tata Research Development & Design Centre, 54-B, Hadapsar Industrial Estate, Pune, 411013, India. maheedhar@atc.tcs.com |
Abstract: | Phylogenetic assignment of individual sequence reads to their respective taxa, referred to as 'taxonomic binning', is a key step of metagenomic analysis. Existing binning methods are limited with respect to either speed or the accuracy/specificity of binning. Given these limitations, a method that can bin vast amounts of metagenomic sequence data rapidly, efficiently and at low computational cost could profoundly influence metagenomic analysis in settings with limited computational resources. We introduce TWARIT, a hybrid binning algorithm that combines short-read alignment with composition-based signature sorting to achieve rapid binning without compromising binning accuracy or specificity. TWARIT was validated on simulated and real-world metagenomes, and the results show significantly lower overall binning times than those of existing methods. Furthermore, the binning accuracy and specificity of TWARIT were comparable or superior to those of existing methods. A web server implementing the TWARIT algorithm is available at http://metagenomics.atc.tcs.com/Twarit/ |
Abbreviations: | BLAST, basic local alignment search tool; HPBA, hit-pair based assignment; SSBA, signature sorting based assignment; BWA, Burrows–Wheeler alignment; NCBI, National Center for Biotechnology Information; LCA, least common ancestor; bp, base pair(s); 256D, 256-dimensional; RC-centroids, reference cluster centroids; SS-score, signature similarity score; TA-flag, taxonomic assignment flag; MCT, most common taxon; FAMeS, fidelity of analysis of metagenomic samples; simHC, high-complexity simulated metagenome; simMC, medium-complexity simulated metagenome; simLC, low-complexity simulated metagenome; MEGAN, metagenome analyzer; SOrt-ITEMS, sequence orthology based approach for improved taxonomic estimation of metagenomic sequences; AMD, acid mine drainage; WGS scaffolds, whole genome shotgun scaffolds; GB, gigabyte(s); RAM, random access memory; min, minute(s) |
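The abstract describes only the composition-based half of TWARIT, and only as "signature sorting". The abbreviations add that it involves 256-dimensional (256D) signatures, reference cluster centroids (RC-centroids) and a signature similarity score (SS-score).

The sketch below illustrates that general idea under loudly stated assumptions, not as TWARIT's actual method:
- it takes the 256 dimensions to be tetranucleotide frequencies (4^4 = 256);
- it uses cosine similarity as a hypothetical stand-in for the SS-score;
- it assigns a read to the taxon whose centroid is most similar.

All function names, the `min_score` cutoff and the toy sequences are illustrative inventions. The short-read alignment stage (BWA/HPBA) is not modelled.

```python
from __future__ import annotations

import math
from itertools import product

# Assumption: the "256D" signature is taken to be tetranucleotide
# frequencies, since there are 4**4 = 256 possible DNA 4-mers.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]
INDEX = {kmer: i for i, kmer in enumerate(TETRAMERS)}


def signature_256d(seq: str) -> list[float]:
    """Normalized tetranucleotide frequency vector of `seq`.

    Windows containing non-ACGT characters (e.g. N) are skipped; a
    sequence with no valid window yields an all-zero vector.
    """
    counts = [0.0] * len(TETRAMERS)
    seq = seq.upper()
    for i in range(len(seq) - 3):
        idx = INDEX.get(seq[i:i + 4])
        if idx is not None:
            counts[idx] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def centroid(vectors: list[list[float]]) -> list[float]:
    """Component-wise mean of the signatures in one reference cluster."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity, used here as a hypothetical stand-in for SS-score."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def assign_by_signature(read: str, centroids: dict[str, list[float]],
                        min_score: float = 0.0) -> str | None:
    """Return the taxon whose centroid is most similar to `read`.

    Returns None (read left unassigned) if no centroid scores above
    `min_score`; `min_score` is an illustrative cutoff, not a TWARIT value.
    """
    sig = signature_256d(read)
    best_taxon, best_score = None, min_score
    for taxon, ref in centroids.items():
        score = similarity(sig, ref)
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon


# Toy reference clusters (made-up sequences, for illustration only).
centroids = {
    "taxonA": centroid([signature_256d("ACGTACGTACGTACGTACGT"),
                        signature_256d("ACGTACGTACGTACGT")]),
    "taxonB": centroid([signature_256d("AAAAAAAAAAAAAAAAAAAA")]),
}
print(assign_by_signature("ACGTACGTACGT", centroids))  # taxonA
```

Cosine similarity ignores vector magnitude, so reads of different lengths are compared on composition alone. TWARIT's real SS-score, its centroid construction and how it orders the alignment and signature stages may all differ from this sketch.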
This article is indexed in ScienceDirect, PubMed and other databases.