Similar literature
Found 20 similar articles (search time: 31 ms)
1.
Jingru Zhang  Wei Lin 《Biometrics》2019,75(4):1098-1108
Clustered multinomial data are prevalent in a variety of applications such as microbiome studies, where metagenomic sequencing data are summarized as multinomial counts for a large number of bacterial taxa per subject. Count normalization with ad hoc zero adjustment tends to result in poor estimates of abundances for taxa with zero or small counts. To account for heterogeneity and overdispersion in such data, we suggest using the logistic normal multinomial (LNM) model with an arbitrary correlation structure to simultaneously estimate the taxa compositions by borrowing information across subjects. We overcome the computational difficulties in high dimensions by developing a stochastic approximation EM algorithm with Hamiltonian Monte Carlo sampling for scalable parameter estimation in the LNM model. The ill‐conditioning problem due to unstructured covariance is further mitigated by a covariance‐regularized estimator with a condition number constraint. The advantages of the proposed methods are illustrated through simulations and an application to human gut microbiome data.

2.
One goal of sequencing-based metagenomic community analysis is the quantitative taxonomic assessment of microbial community compositions. In particular, relative quantification of taxa is of high relevance for metagenomic diagnostics or microbial community comparison. However, the majority of existing approaches quantify at low resolution (e.g. at phylum level), rely on the existence of special genes (e.g. 16S), or have severe problems discerning species with highly similar genome sequences. Yet problems such as metagenomic diagnostics require accurate quantification at the species level. We developed Genome Abundance Similarity Correction (GASiC), a method to estimate true genome abundances via read alignment by considering reference genome similarities in a non-negative LASSO approach. We demonstrate GASiC’s superior performance over existing methods on simulated benchmark data as well as on real data. In addition, we present applications to datasets from both bacterial DNA and viral RNA sources. We further discuss our approach as an alternative to PCR-based DNA quantification.
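The similarity-correction idea in this abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes that the reads observed on each reference are a linear mix of the true abundances weighted by a genome similarity matrix, and it solves a plain non-negative least-squares problem by projected gradient descent instead of the non-negative LASSO named in the abstract. The function name `correct_abundances`, the similarity values, and the counts are all hypothetical.

```python
def correct_abundances(similarity, observed, steps=5000, lr=0.1):
    """Minimise ||A c - r||^2 subject to c >= 0 by projected gradient descent.

    similarity[i][j] (A) is an assumed fraction of reads from genome j that
    align to reference i; observed[i] (r) is the read count seen on reference i.
    """
    n = len(observed)
    c = [0.0] * n
    for _ in range(steps):
        # Residual of the current estimate: A c - r.
        residual = [
            sum(similarity[i][j] * c[j] for j in range(n)) - observed[i]
            for i in range(n)
        ]
        # Gradient direction A^T (A c - r); the factor 2 is absorbed into lr.
        grad = [
            sum(similarity[i][j] * residual[i] for i in range(n))
            for j in range(n)
        ]
        # Gradient step followed by projection onto the non-negative orthant.
        c = [max(0.0, c[j] - lr * grad[j]) for j in range(n)]
    return c


# Hypothetical example: genomes 0 and 1 are 80% similar, genome 2 is unrelated.
similarity = [
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
true_counts = [100.0, 0.0, 50.0]  # genome 1 is absent from the sample
observed = [
    sum(similarity[i][j] * true_counts[j] for j in range(3)) for i in range(3)
]
estimated = correct_abundances(similarity, observed)
```

In this toy case the absent genome 1 still collects 80 reads through its similarity to genome 0, and the constrained fit attributes them back to genome 0. A LASSO penalty would add a sparsity term to the same objective.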

3.
Prevailing 16S rRNA gene-amplicon methods for characterizing the bacterial microbiome of wildlife are economical, but result in coarse taxonomic classifications, are subject to primer and 16S copy number biases, and do not allow for direct estimation of microbiome functional potential. While deep shotgun metagenomic sequencing can overcome many of these limitations, it is prohibitively expensive for large sample sets. Here we evaluated the ability of shallow shotgun metagenomic sequencing to characterize taxonomic and functional patterns in the faecal microbiome of a model population of feral horses (Sable Island, Canada). Since 2007, this unmanaged population has been the subject of an individual-based, long-term ecological study. Using deep shotgun metagenomic sequencing, we determined the sequencing depth required to accurately characterize the horse microbiome. In comparing conventional vs. high-throughput shotgun metagenomic library preparation techniques, we validate the use of more cost-effective laboratory methods. Finally, we characterize similarities between 16S amplicon and shallow shotgun characterization of the microbiome, and demonstrate that the latter recapitulates biological patterns first described in a published amplicon data set. Unlike for amplicon data, we further demonstrate how shallow shotgun metagenomic data provide useful insights regarding microbiome functional potential which support previously hypothesized diet effects in this study system.

4.
The ability to predict human phenotypes and identify biomarkers of disease from metagenomic data is crucial for the development of therapeutics for microbiome-associated diseases. However, metagenomic data is commonly affected by technical variables unrelated to the phenotype of interest, such as sequencing protocol, which can make it difficult to predict phenotype and find biomarkers of disease. Supervised methods to correct for background noise, originally designed for gene expression and RNA-seq data, are commonly applied to microbiome data but may be limited because they cannot account for unmeasured sources of variation. Unsupervised approaches address this issue, but current methods are limited because they are ill-equipped to deal with the unique aspects of microbiome data, which is compositional, highly skewed, and sparse. We perform a comparative analysis of different denoising transformations in combination with supervised correction methods, as well as an unsupervised principal component correction approach that is presently used in other domains but has not been applied to microbiome data to date. We find that the unsupervised principal component correction approach is comparable to the supervised approaches in reducing false discovery of biomarkers, with the added benefit of not needing to know the sources of variation a priori. However, in prediction tasks, it appears to improve prediction only when technical variables account for the majority of variance in the data. As new and larger metagenomic datasets become increasingly available, background noise correction will become essential for generating reproducible microbiome analyses.

5.
Given the advent of massively parallel DNA sequencing, the human microbiome is now analyzed comprehensively by metagenomic approaches. However, the inter- and intra-individual variability and stability of the human microbiome remain poorly characterized, particularly at the intra-day level. This issue is of crucial importance for studies examining the effects of the microbiome on human health. Here, we focused on the bacteriome of oral plaques, for which repeated, time-controlled sampling is feasible. Eighty-one supragingival plaque samples were collected from healthy individuals, examining multiple sites within the mouth at three time points (forenoon, evening, and night) over the course of 3 days. Bacterial composition was estimated by 16S rRNA sequencing and species-level profiling, resulting in identification of a total of 162 known bacterial species. We found that species compositions and their relative abundances were similar within individuals, but not between sampling times or tooth types. This suggests that species-level oral bacterial composition differs significantly between individuals, although the number of subjects is limited and intra-individual variation also occurs. The majority of detected bacterial species (98.2%; 159/162), however, did not fluctuate over the course of the day, implying a largely stable oral microbiome on an intra-day time scale. In fact, the stability of this data set enabled us to estimate potential interactions between rare bacteria, with 40 co-occurrences supported by the existing literature. In summary, the present study provides a valuable basis for studies of the human microbiome, with significant implications in terms of biological and clinical outcomes.

6.
Massive DNA sequencing studies have expanded our insights and understanding of the ecological and functional characteristics of the gut microbiome. Advanced sequencing technologies allow us to understand the close association of the gut microbiome with human health and critical illnesses. In the future, analyses of the gut microbiome will provide key information associated with individual human health, which will help provide personalized health care for diseases. Numerous molecular biological analysis tools have been rapidly developed and employed for gut microbiome research; however, methodological differences among researchers lead to inconsistent data, limiting extensive sharing of data. It is therefore essential to standardize the current methodologies and establish appropriate pipelines for human gut microbiome research. Herein, we review the methods and procedures currently available for studying the human gut microbiome, including fecal sample collection, metagenomic DNA extraction, massive DNA sequencing, and data analyses with bioinformatics. We believe that this review will contribute to the progress of gut microbiome research in the clinical and practical aspects of human health.

7.

With the increasing availability of microbiome 16S data, network estimation has become a useful approach to studying the interactions between microbial taxa. Network estimation on a set of variables is frequently explored using graphical models, in which the relationship between two variables is modeled via their conditional dependency given the other variables. Various methods for sparse inverse covariance estimation have been proposed to estimate graphical models in the high-dimensional setting, including graphical lasso. However, current methods do not address the compositional count nature of microbiome data, where abundances of microbial taxa are not directly measured, but are reflected by the observed counts in an error-prone manner. Adding to the challenge is that the sum of the counts within each sample, termed “sequencing depth,” is an experimental technicality that carries no biological information but can vary drastically across samples. To address these issues, we develop a new approach to network estimation, called BC-GLASSO (bias-corrected graphical lasso), which models the microbiome data using a logistic normal multinomial distribution with the sequencing depths explicitly incorporated, corrects the bias of the naive empirical covariance estimator arising from the heterogeneity in sequencing depths, and builds the inverse covariance estimator via graphical lasso. We demonstrate the advantage of BC-GLASSO over current approaches to microbial interaction network estimation under a variety of simulation scenarios. We also illustrate the efficacy of our method in an application to a human microbiome data set.
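To make the role of sequencing depth in a logistic normal multinomial model concrete, here is a minimal generative sketch. It assumes an additive log-ratio parameterisation with the last taxon as reference, which the abstract does not specify, and the mean vector, Cholesky factor, and depths are hypothetical values chosen only for illustration. It shows data generation only, not the BC-GLASSO bias correction or the graphical lasso step.

```python
import math
import random


def alr_inverse(z):
    """Map K-1 real log-ratios to K proportions (last taxon is the reference)."""
    exps = [math.exp(v) for v in z] + [1.0]
    total = sum(exps)
    return [e / total for e in exps]


def sample_lnm(mu, chol, depth, rng):
    """Draw z ~ N(mu, L L^T), set p = alr_inverse(z), then counts ~ Multinomial(depth, p)."""
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    # Correlated Gaussian latent vector via the lower-triangular factor L.
    z = [
        mu[i] + sum(chol[i][j] * eps[j] for j in range(i + 1))
        for i in range(len(mu))
    ]
    p = alr_inverse(z)
    # Multinomial draw: assign each of `depth` reads to a taxon with probability p.
    counts = [0] * len(p)
    for taxon in rng.choices(range(len(p)), weights=p, k=depth):
        counts[taxon] += 1
    return p, counts


rng = random.Random(0)
mu = [1.0, 0.0]                  # hypothetical mean log-ratios for 3 taxa
chol = [[0.5, 0.0], [0.3, 0.4]]  # hypothetical Cholesky factor of the covariance
depths = (100, 1000, 10000)      # sequencing depth differs per sample
samples = [sample_lnm(mu, chol, depth, rng) for depth in depths]
```

The depth enters only through the multinomial draw, so samples with small depth give noisier empirical proportions for the same latent composition. This is the kind of depth heterogeneity that the abstract says biases the naive empirical covariance estimator.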


8.
Next-generation sequencing technologies have opened up an unprecedented opportunity for microbiology by enabling the culture-independent genetic study of complex microbial communities, which were so far largely unknown. The analysis of metagenomic data is challenging: potentially, one is faced with a sample containing a mixture of many different bacterial species whose genomes have not necessarily been sequenced beforehand. In the simpler case of the analysis of 16S ribosomal RNA metagenomic data, for which databases of reference sequences are known, we survey the computational challenges to be solved in order to characterize and quantify a sample. In particular, we examine two aspects: how the necessary adoption of new tools geared towards high-throughput analysis impacts the quality of the results, and how well various established methods perform in assigning sequence reads to microbial species, with and without taking taxonomic information into account.

9.
Background: Metagenomic sequencing is a complex sampling procedure from unknown mixtures of many genomes. Having metagenome data with known genome compositions is essential both for benchmarking bioinformatics software and for investigating the influence of various factors on the data. Compared to data from real microbiome samples or from defined microbial mock communities, simulated data with proper computational models are better for this purpose as they provide more flexibility for controlling multiple factors. Methods: We developed a non-uniform metagenomic sequencing simulation system (nuMetaSim) that is capable of mimicking various factors in real metagenomic sequencing to reflect multiple properties of real data with customizable parameter settings. Results: We generated 9 comprehensive metagenomic datasets with different composition complexity from 203 bacterial genomes and 2 archaeal genomes related to the human intestinal system. Conclusion: The data can serve as benchmarks for comparing the performance of different methods in different situations, and the software package allows users to generate simulation data that can better reflect the specific properties of their scenarios.

10.
The growing threat of antimicrobial resistance (AMR) calls for new epidemiological surveillance methods, as well as a deeper understanding of how antimicrobial resistance genes (ARGs) have been transmitted around the world. The large pool of sequencing data available in public repositories provides an excellent resource for monitoring the temporal and spatial dissemination of AMR in different ecological settings. However, only a limited number of research groups globally have the computational resources to analyze such data. We retrieved 442 Tbp of sequencing reads from 214,095 metagenomic samples from the European Nucleotide Archive (ENA) and aligned them using a uniform approach against ARGs and 16S/18S rRNA genes. Here, we present the results of this extensive computational analysis and share the counts of reads aligned. Over 6.76 × 10^8 read fragments were assigned to ARGs and 3.21 × 10^9 to rRNA genes, where we observed distinct differences in both the abundance of ARGs and the link between microbiome and resistome compositions across various sampling types. This collection is another step towards establishing global surveillance of AMR and can serve as a resource for further research into the environmental spread and dynamic changes of ARGs.

The growing threat of antimicrobial resistance (AMR) calls for new epidemiological surveillance methods and a deeper understanding of how resistance genes are transmitted around the world. This study presents a large-scale remapping of sequencing reads of publicly available metagenomic datasets that can be used to monitor the global prevalence of AMR genes.

11.
Human-associated microbial communities exert tremendous influence over human health and disease. With modern metagenomic sequencing methods it is now possible to follow the relative abundance of microbes in a community over time. These microbial communities exhibit rich ecological dynamics, and an important goal of microbial ecology is to infer the ecological interactions between species directly from sequence data. Any algorithm for inferring ecological interactions must overcome three major obstacles: 1) a correlation between the abundances of two species does not imply that those species are interacting, 2) the sum constraint on the relative abundances obtained from metagenomic studies makes it difficult to infer the parameters in time-series models, and 3) errors due to experimental uncertainty, or mis-assignment of sequencing reads into operational taxonomic units, bias inferences of species interactions due to a statistical problem called “errors-in-variables”. Here we introduce an approach, Learning Interactions from MIcrobial Time Series (LIMITS), that overcomes these obstacles. LIMITS uses sparse linear regression with bootstrap aggregation to infer a discrete-time Lotka-Volterra model for microbial dynamics. We tested LIMITS on synthetic data and showed that it could reliably infer the topology of the inter-species ecological interactions. We then used LIMITS to characterize the species interactions in the gut microbiomes of two individuals and found that the interaction networks varied significantly between individuals. Furthermore, we found that the interaction networks of the two individuals are dominated by distinct “keystone species”, Bacteroides fragilis and Bacteroides stercoris, that have a disproportionate influence on the structure of the gut microbiome even though they are only found in moderate abundance.
Based on our results, we hypothesize that the abundances of certain keystone species may be responsible for individuality in the human gut microbiome.
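The reason linear regression applies to a discrete-time Lotka-Volterra model can be shown with a short sketch. It assumes a Ricker-type form, x_i(t+1) = x_i(t) · exp(r_i + Σ_j a_ij x_j(t)), which the abstract does not spell out, so that log(x_i(t+1) / x_i(t)) is linear in the abundances. The sketch uses ordinary least squares on noise-free synthetic data. It omits the sparsity and bootstrap aggregation that LIMITS relies on, and every parameter value and function name is hypothetical.

```python
import math


def simulate(r, a, x0, steps):
    """Iterate x_i(t+1) = x_i(t) * exp(r_i + sum_j a_ij * x_j(t))."""
    traj = [list(x0)]
    for _ in range(steps):
        x = traj[-1]
        growth = [
            r[i] + sum(a[i][j] * x[j] for j in range(len(x)))
            for i in range(len(x))
        ]
        traj.append([x[i] * math.exp(growth[i]) for i in range(len(x))])
    return traj


def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row_idx: abs(aug[row_idx][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row_idx in range(col + 1, n):
            factor = aug[row_idx][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row_idx][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_species(traj, i):
    """Least-squares fit of [r_i, a_i1, ..., a_in] from log growth ratios."""
    rows = [[1.0] + x for x in traj[:-1]]  # intercept column plus abundances
    y = [math.log(traj[t + 1][i] / traj[t][i]) for t in range(len(traj) - 1)]
    n = len(rows[0])
    gram = [[sum(row[p] * row[q] for row in rows) for q in range(n)] for p in range(n)]
    moment = [sum(row[p] * yt for row, yt in zip(rows, y)) for p in range(n)]
    return solve(gram, moment)  # normal equations: (F^T F) beta = F^T y


r = [0.5, 0.3]                    # hypothetical growth rates
a = [[-0.5, -0.2], [0.1, -0.4]]   # hypothetical interaction coefficients
traj = simulate(r, a, x0=[0.1, 2.0], steps=30)
fitted = [fit_species(traj, i) for i in range(2)]
```

Each entry of `fitted` holds the recovered growth rate followed by that species' row of the interaction matrix. With measurement noise or many species, the unregularised fit degrades, which is where a sparse, bagged regression of the kind the abstract describes would come in.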

12.
Microbial communities carry out the majority of the biochemical activity on the planet, and they play integral roles in processes including metabolism and immune homeostasis in the human microbiome. Shotgun sequencing of such communities' metagenomes provides information complementary to organismal abundances from taxonomic markers, but the resulting data typically comprise short reads from hundreds of different organisms and are at best challenging to assemble comparably to single-organism genomes. Here, we describe an alternative approach to infer the functional and metabolic potential of a microbial community metagenome. We determined the gene families and pathways present or absent within a community, as well as their relative abundances, directly from short sequence reads. We validated this methodology using a collection of synthetic metagenomes, recovering the presence and abundance both of large pathways and of small functional modules with high accuracy. We subsequently applied this method, HUMAnN, to the microbial communities of 649 metagenomes drawn from seven primary body sites on 102 individuals as part of the Human Microbiome Project (HMP). This provided a means to compare functional diversity and organismal ecology in the human microbiome, and we determined a core of 24 ubiquitously present modules. Core pathways were often implemented by different enzyme families within different body sites, and 168 functional modules and 196 metabolic pathways varied in metagenomic abundance specifically to one or more niches within the microbiome. These included glycosaminoglycan degradation in the gut, as well as phosphate and amino acid transport linked to host phenotype (vaginal pH) in the posterior fornix. An implementation of our methodology is available at http://huttenhower.sph.harvard.edu/humann. 
This provides a means to accurately and efficiently characterize microbial metabolic pathways and functional modules directly from high-throughput sequencing reads, enabling the determination of community roles in the HMP cohort and in future metagenomic studies.

13.
Scientific research is shedding light on the interaction of the gut microbiome with the human host and on its role in human health. Existing machine learning methods have shown great potential in discriminating healthy from diseased microbiome states. Most of them leverage shotgun metagenomic sequencing to extract gut microbial species-relative abundances or strain-level markers. Each of these gut microbial profiling modalities showed diagnostic potential when tested separately; however, no existing approach combines them in a single predictive framework. Here, we propose the Multimodal Variational Information Bottleneck (MVIB), a novel deep learning model capable of learning a joint representation of multiple heterogeneous data modalities. MVIB achieves competitive classification performance while being faster than existing methods. Additionally, MVIB offers interpretable results. Our model adopts an information theoretic interpretation of deep neural networks and computes a joint stochastic encoding of different input data modalities. We use MVIB to predict whether human hosts are affected by a certain disease by jointly analysing gut microbial species-relative abundances and strain-level markers. MVIB is evaluated on human gut metagenomic samples from 11 publicly available disease cohorts covering 6 different diseases. We achieve high performance (0.80 < ROC AUC < 0.95) on 5 cohorts and at least medium performance on the remaining ones. We adopt a saliency technique to interpret the output of MVIB and identify the most relevant microbial species and strain-level markers to the model’s predictions. We also perform cross-study generalisation experiments, where we train and test MVIB on different cohorts of the same disease, and overall we achieve comparable results to the baseline approach, i.e. the Random Forest. Further, we evaluate our model by adding metabolomic data derived from mass spectrometry as a third input modality. 
Our method is scalable with respect to input data modalities and has an average training time of < 1.4 seconds. The source code and the datasets used in this work are publicly available.

14.
The upper respiratory tract microbiome has an important role in respiratory health. Influenza A is a common viral infection that challenges that health, and a well-recognized sequela is bacterial pneumonia. Given this connection, we sought to characterize the upper respiratory tract microbiota of individuals suffering from the pandemic H1N1 influenza A outbreak of 2009 and determine if microbiome profiles could be correlated with patient characteristics. We determined the microbial profiles of 65 samples from H1N1 patients by cpn60 universal target amplification and sequencing. Profiles were examined at the phylum and nearest neighbor “species” levels using the characteristics of patient gender, age, originating health authority, sample type and designation (STAT/non-STAT). At the phylum level, Actinobacteria-, Firmicutes- and Proteobacteria-dominated microbiomes were observed, with none of the patient characteristics showing significant profile composition differences. At the nearest neighbor “species” level, the upper respiratory tract microbiomes were composed of 13-20 “species” and showed a trend towards increasing diversity with patient age. Interestingly, at an individual level, most patients had one to three organisms dominant in their microbiota. A limited number of discrete microbiome profiles were observed, shared among influenza patients regardless of patient status variables. To assess the validity of analyses derived from sequence read abundance, several bacterial species were quantified by quantitative PCR and compared to the abundance of cpn60 sequence read counts obtained in the study. A strong positive correlation between read abundance and absolute bacterial quantification was observed. This study represents the first examination of the upper respiratory tract microbiome using a target other than the 16S rRNA gene and to our knowledge, the first thorough examination of this microbiome during a viral infection.

15.
Xia LC  Cram JA  Chen T  Fuhrman JA  Sun F 《PloS one》2011,6(12):e27992
Accurate estimation of microbial community composition based on metagenomic sequencing data is fundamental for subsequent metagenomics analysis. Prevalent estimation methods are mainly based on directly summarizing alignment results or variants thereof, and often result in biased and/or unstable estimates. We have developed a unified probabilistic framework (named GRAMMy) by explicitly modeling read assignment ambiguities, genome size biases and read distributions along the genomes. A maximum likelihood method is employed to compute the Genome Relative Abundance of microbial communities using Mixture Model theory (GRAMMy). GRAMMy has been demonstrated to give estimates that are accurate and robust across both simulated and real read benchmark datasets. We applied GRAMMy to a collection of 34 metagenomic read sets from four metagenomics projects and identified 99 frequent species (minimally 0.5% abundant in at least 50% of the datasets) in the human gut samples. Our results show substantial improvements over previous studies, such as adjusting the over-estimated abundance for Bacteroides species for human gut samples, by providing a new reference-based strategy for metagenomic sample comparisons. GRAMMy can be used flexibly with many read assignment tools (mapping, alignment or composition-based), even with low-sensitivity mapping results from huge short-read datasets. It will be increasingly useful as an accurate and robust tool for abundance estimation with the growing size of read sets and the expanding database of reference genomes.
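A generic mixture-model EM conveys how ambiguous read assignments and genome size can be handled together. The sketch below is an illustration under simplifying assumptions, not GRAMMy itself: each read carries a hypothetical per-genome match likelihood, the EM estimates the fraction of reads generated by each genome, and dividing by genome length converts read fractions to relative genome abundance. Read distributions along the genomes, which the abstract also mentions, are not modelled, and all names and numbers are made up.

```python
def em_read_fractions(read_likelihoods, n_genomes, iterations=100):
    """EM for mixing proportions pi_g given per-read likelihoods P(read | genome g)."""
    pi = [1.0 / n_genomes] * n_genomes
    for _ in range(iterations):
        totals = [0.0] * n_genomes
        for lik in read_likelihoods:
            # E-step: posterior probability that this read came from each genome.
            weights = [pi[g] * lik[g] for g in range(n_genomes)]
            norm = sum(weights)
            for g in range(n_genomes):
                totals[g] += weights[g] / norm
        # M-step: mixing proportion is the average responsibility.
        pi = [t / len(read_likelihoods) for t in totals]
    return pi


def genome_relative_abundance(pi, genome_lengths):
    """Correct read fractions for genome size: longer genomes yield more reads per cell."""
    per_length = [p / length for p, length in zip(pi, genome_lengths)]
    total = sum(per_length)
    return [v / total for v in per_length]


# Hypothetical reads: 60 map only to genome A, 20 only to genome B,
# and 20 map equally well to both.
reads = [[1.0, 0.0]] * 60 + [[0.0, 1.0]] * 20 + [[1.0, 1.0]] * 20
pi = em_read_fractions(reads, n_genomes=2)
abundance = genome_relative_abundance(pi, genome_lengths=[4_000_000, 2_000_000])
```

Here the ambiguous reads are split in proportion to the current estimates rather than discarded or counted twice, giving read fractions of 0.75 and 0.25. Because genome A is assumed to be twice as long as genome B, the length-corrected relative abundances become 0.6 and 0.4.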

16.
Microbes are ubiquitously distributed in nature, and recent culture-independent studies have highlighted the significance of gut microbiota in human health and disease. Fecal DNA is the primary source for the majority of human gut microbiome studies. However, further improvement is needed to obtain fecal metagenomic DNA with sufficient amount and good quality but low host genomic DNA contamination. In the current study, we demonstrate a quick, robust, unbiased, and cost-effective method for the isolation of high molecular weight (23 kb) metagenomic DNA (260/280 ratio 1.8) with a good yield (55.8 ± 3.8 ng/mg of feces). We also confirm that there is very low human genomic DNA contamination (eubacterial:human genomic DNA marker genes = 2^27.9:1) in the human feces. The newly developed method performs robustly for fresh as well as stored fecal samples, as demonstrated by 16S rRNA gene sequencing using 454 FLX+. Moreover, 16S rRNA gene analysis indicated that, compared to other DNA extraction methods tested, the fecal metagenomic DNA isolated with the current methodology retains species richness and does not show microbial diversity biases, which is further confirmed by qPCR with a known quantity of spike-in genomes. Overall, our data highlight a protocol with a balance between quality, amount, user-friendliness, and cost effectiveness that is suitable for culture-independent analysis of the human gut microbiome, which provides a robust solution to overcome key issues associated with fecal metagenomic DNA isolation in human gut microbiome studies.

17.
Microbial community succession was examined over a two-year period using spatially and temporally coordinated water chemistry measurements, metagenomic sequencing, phylogenetic binning and de novo metagenomic assembly in the extreme hypersaline habitat of Lake Tyrrell, Victoria, Australia. Relative abundances of Haloquadratum-related sequences were positively correlated with co-varying concentrations of potassium, magnesium and sulfate, but not sodium, chloride or calcium ions, while relative abundances of Halorubrum, Haloarcula, Halonotius, Halobaculum and Salinibacter-related sequences correlated negatively with Haloquadratum and these same ionic factors. Nanohaloarchaea and Halorhabdus-related sequence abundances were inversely correlated with each other, but not other taxonomic groups. These data, along with predicted gene functions from nearly-complete assembled population metagenomes, suggest different ecological phenotypes for Nanohaloarchaea and Halorhabdus-related strains versus other community members. Nucleotide percent G+C compositions were consistently lower in community metagenomic reads from summer versus winter samples. The same seasonal G+C trends were observed within taxonomically binned read subsets from each of seven different genus-level archaeal groups. Relative seasonal abundances were also linked to percent G+C for assembled population genomes. Together, these data suggest that extreme ionic conditions may exert selective pressure on archaeal populations at the level of genomic nucleotide composition, thus contributing to seasonal successional processes. Despite the unavailability of cultured representatives for most of the organisms identified in this study, effective coordination of physical and biological measurements has enabled discovery and quantification of unexpected taxon-specific, environmentally mediated factors influencing microbial community structure.

18.
The various ecological habitats in the human body provide microbes a wide array of nutrient sources and survival challenges. Advances in technology such as DNA sequencing have allowed a deeper perspective into the molecular function of the human microbiota than has been achievable in the past. Here we aimed to examine the enzymes that cleave complex carbohydrates (CAZymes) in the human microbiome in order to determine (i) whether the CAZyme profiles of bacterial genomes are more similar within body sites or bacterial families and (ii) the sugar degradation and utilization capabilities of microbial communities inhabiting various human habitats. Upon examination of 493 bacterial reference genomes from 12 human habitats, we found that sugar degradation capabilities of taxa are more similar to others in the same bacterial family than to those inhabiting the same habitat. Yet, the analysis of 520 metagenomic samples from five major body sites shows that even when the community composition varies, the CAZyme profiles are very similar within a body site, suggesting that the observed functional profile and microbial habitation have adapted to the local carbohydrate composition. When broad sugar utilization was compared within the five major body sites, the gastrointestinal tract contained the highest potential for total sugar degradation, while dextran and peptidoglycan degradation were highest in oral and vaginal sites, respectively. Our analysis suggests that the carbohydrate composition of each body site has a profound influence and probably constitutes one of the major driving forces that shapes the community composition and therefore the CAZyme profile of the local microbial communities, which in turn reflects the microbiome fitness to a body site.

19.

Background

Characterizing the biogeography of the microbiome of healthy humans is essential for understanding microbial associated diseases. Previous studies mainly focused on a single body habitat from a limited set of subjects. Here, we analyzed one of the largest microbiome datasets to date and generated a biogeographical map that annotates the biodiversity, spatial relationships, and temporal stability of 22 habitats from 279 healthy humans.

Results

We identified 929 genera from more than 24 million 16S rRNA gene sequences of 22 habitats, and we provide a baseline of inter-subject variation for healthy adults. The oral habitat has the most stable microbiota with the highest alpha diversity, while the skin and vaginal microbiota are less stable and show lower alpha diversity. The level of biodiversity in one habitat is independent of the biodiversity of other habitats in the same individual. The abundances of a given genus at a body site in which it dominates do not correlate with the abundances at body sites where it is not dominant. Additionally, we observed that the human microbiota exhibits both cosmopolitan and endemic features. Finally, comparing datasets from different projects revealed a project-based clustering pattern, emphasizing the significance of standardization of metagenomic studies.

Conclusions

The data presented here extend the definition of the human microbiome by providing a more complete and accurate picture of human microbiome biogeography, addressing questions best answered by a large dataset of subjects and body sites that are deeply sampled by sequencing.

20.
The availability of metagenomic sequencing data, generated by sequencing DNA pooled from multiple microbes living jointly, has increased sharply in the last few years with developments in sequencing technology. Characterizing the contents of metagenomic samples is a challenging task, which has been extensively attempted by both supervised and unsupervised techniques, each with its own limitations. Common to practically all the methods is the processing of single samples only; when multiple samples are sequenced, each is analyzed separately and the results are combined. In this paper we propose to perform a combined analysis of a set of samples in order to obtain a better characterization of each of the samples, and provide two applications of this principle. First, we use an unsupervised probabilistic mixture model to infer hidden components shared across metagenomic samples. We incorporate the model in a novel framework for studying association of microbial sequence elements with phenotypes, analogous to the genome-wide association studies performed on human genomes. We demonstrate that stratification may result in false discoveries of such associations, and that the components inferred by the model can be used to correct for this stratification. Second, we propose a novel read clustering (also termed "binning") algorithm which operates on multiple samples simultaneously, leveraging the assumption that the different samples contain the same microbial species, possibly in different proportions. We show that integrating information across multiple samples yields more precise binning on each of the samples. Moreover, for both applications we demonstrate that given a fixed depth of coverage, the average per-sample performance generally increases with the number of sequenced samples as long as the per-sample coverage is high enough.
