MSClust: A Multi-Seeds based Clustering algorithm for microbiome profiling using 16S rRNA sequence
Authors: Wei Chen, Yongmei Cheng, Clarence Zhang, Shaowu Zhang, Hongyu Zhao
Affiliation: 1. College of Automation, Northwestern Polytechnical University, 710072 Xi'an, China; 2. Department of Biostatistics, Yale School of Public Health, New Haven, CT 06510, United States; 3. Keck Biotechnology Laboratory, Biostatistics Resource, Yale School of Medicine, New Haven, CT 06510, United States
Abstract: Recent developments in next-generation sequencing technologies have led to the rapid accumulation of 16S rRNA sequences for microbiome profiling. One key step in data processing is to cluster short sequences into operational taxonomic units (OTUs). Although many methods have been proposed for OTU inference, a major challenge is balancing inference accuracy against computational efficiency: accuracy is often sacrificed to accommodate the need to analyze large numbers of sequences. Inspired by the hierarchical clustering method and a modified greedy network clustering algorithm, we propose a novel multi-seeds based heuristic clustering method, named MSClust, for OTU inference. MSClust first adaptively selects multiple seeds, rather than a single seed, for each candidate cluster; the reads are then processed using a greedy clustering strategy. Through many numerical examples, we demonstrate that MSClust uses less memory and achieves better biological accuracy than existing heuristic clustering methods while preserving efficiency and scalability.
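The abstract describes the method only at a high level, so the following is a minimal Python sketch of the general idea of greedy clustering with several seeds per cluster, not the MSClust implementation. Everything specific in it is an assumption made for illustration: the function names `identity` and `multi_seed_greedy`, the position-wise identity score (a toy stand-in for a real alignment-based similarity), the `threshold` and `max_seeds` parameters, the rule of averaging similarity over a cluster's seeds, and the rule of promoting early members to seeds. The abstract does not say how MSClust selects seeds adaptively.

```python
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Fraction of matching positions (toy similarity, assumed for illustration)."""
    n = max(len(a), len(b))
    if n == 0:
        return 1.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / n


def multi_seed_greedy(
    reads: List[str], threshold: float = 0.97, max_seeds: int = 3
) -> List[Dict[str, List[str]]]:
    """Greedy clustering where each cluster is represented by up to max_seeds seeds.

    Hypothetical rules (not taken from the paper):
    - a read joins the cluster with the highest mean identity to its seeds,
      provided that mean is at least `threshold`;
    - otherwise the read starts a new cluster and becomes its first seed;
    - a joining read is promoted to seed while the cluster has fewer than
      `max_seeds` seeds.
    """
    clusters: List[Dict[str, List[str]]] = []
    for read in reads:
        best = None
        best_score = 0.0
        for cluster in clusters:
            seeds = cluster["seeds"]
            score = sum(identity(read, s) for s in seeds) / len(seeds)
            if score >= threshold and score > best_score:
                best, best_score = cluster, score
        if best is None:
            # No existing cluster is similar enough: open a new one.
            clusters.append({"seeds": [read], "members": [read]})
        else:
            best["members"].append(read)
            if len(best["seeds"]) < max_seeds and read not in best["seeds"]:
                best["seeds"].append(read)
    return clusters


if __name__ == "__main__":
    toy_reads = ["AAAAAAAAAA", "AAAAAAAAAT", "CCCCCCCCCC", "CCCCCCCCCG", "AAAAAAAATT"]
    for c in multi_seed_greedy(toy_reads, threshold=0.8):
        print(len(c["seeds"]), "seeds,", len(c["members"]), "members")
```

Comparing each read against a few seeds rather than one representative is what distinguishes this from single-seed greedy clustering: the decision to admit a read depends less on which read happened to found the cluster. The cost is a few extra comparisons per cluster, bounded by `max_seeds`.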
Keywords: Clustering algorithms; Operational taxonomic unit (OTU); Next-generation sequencing; Seeds-selection; 16S rRNA reads
This article is indexed in ScienceDirect and other databases.