A non-negative matrix factorization framework for identifying modular patterns in metagenomic profile data
Authors: Xingpeng Jiang, Joshua S. Weitz, Jonathan Dushoff (contact: dushoff@mcmaster.ca)
Affiliations: (1) Department of Biology, McMaster University, Hamilton, Ontario, Canada; (2) School of Biology and School of Physics, Georgia Institute of Technology, Atlanta, GA, USA; (3) M. G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, Ontario, Canada.
Abstract: Metagenomic studies sequence DNA directly from environmental samples to explore the structure and function of complex microbial and viral communities. Individual short pieces of sequenced DNA ("reads") are classified into (putative) taxonomic or metabolic groups, which are then analyzed for patterns across samples. Analysis of the resulting read matrices is at the core of using metagenomic data to make inferences about ecosystem structure and function. Non-negative matrix factorization (NMF) is a numerical technique for approximating high-dimensional data points as non-negative linear combinations of non-negative components. It is thus well suited to interpreting observed samples as combinations of different components. We develop, test and apply an NMF-based framework to analyze metagenomic read matrices. In particular, we introduce a method for choosing the NMF degree (the number of components) in the presence of overlap between components, and apply spectral-reordering techniques to NMF-based similarity matrices to aid visualization. Using synthetic data sets, we show that our method can robustly identify the appropriate degree and disentangle overlapping contributions. We then examine and discuss the NMF decomposition of a metabolic profile matrix extracted from 39 publicly available metagenomic samples, and identify canonical sample types, including one associated with coral ecosystems and one associated with highly saline ecosystems, among others. We also identify specific associations between pathways and canonical environments, and explore how alternative choices of decomposition facilitate analysis of read matrices at a finer scale.
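To make the basic factorization concrete, here is a minimal sketch of NMF applied to a small read matrix (rows as pathways, columns as samples; this orientation is an assumption). The abstract does not state which NMF algorithm or loss the authors use, so this sketch assumes the Lee-Seung multiplicative updates for squared Frobenius error. It does not reproduce the authors' degree-selection method or their spectral reordering. The helper names (`matmul`, `transpose`, `nmf`, `frobenius_error`) and the toy matrix are hypothetical and made up for illustration; plain Python lists keep the example dependency-free.

```python
import random


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def nmf(v, k, iterations=2000, seed=0, eps=1e-9):
    """Approximate a non-negative m x n matrix V as W (m x k) times H (k x n).

    Assumption: Lee-Seung multiplicative updates minimizing ||V - WH||_F^2.
    Starting from positive W and H, every update multiplies by a ratio of
    non-negative quantities, so W and H stay non-negative throughout.
    """
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    h = [[rng.random() + eps for _ in range(n)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(m)]
    return w, h


def frobenius_error(v, w, h):
    """Frobenius norm of the residual V - WH."""
    wh = matmul(w, h)
    return sum((a - b) ** 2
               for row_v, row_wh in zip(v, wh)
               for a, b in zip(row_v, row_wh)) ** 0.5


# Hypothetical toy data: 4 pathways x 4 samples built from two made-up
# components, so the matrix has an exact non-negative rank-2 factorization.
w_true = [[3.0, 0.0], [2.0, 0.5], [0.0, 4.0], [0.5, 3.0]]
h_true = [[1.0, 0.0, 0.6, 0.9], [0.0, 1.0, 0.4, 0.1]]
v = matmul(w_true, h_true)

for k in (1, 2):
    w, h = nmf(v, k)
    # Column j of H gives sample j's weights on the k components.
    print(k, round(frobenius_error(v, w, h), 4))
```

Comparing reconstruction error across degrees, as the loop does, is only a naive way to see how the degree matters: error always tends to fall as components are added, which is why a dedicated selection method (such as the one the authors introduce for overlapping components) is needed on real data.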
This article is indexed in PubMed, SpringerLink and other databases.