Similar documents
1.

Background

The 16S rRNA gene-based amplicon sequencing analysis is widely used to determine the taxonomic composition of microbial communities. Once the taxonomic composition of each community is obtained, evolutionary relationships among taxa are inferred by a phylogenetic tree. Thus, the combined representation of taxonomic composition and phylogenetic relationships among taxa is a powerful method for understanding microbial community structure; however, applying phylogenetic tree-based representation with information on the abundance of thousands or more taxa in each community is a difficult task. For this purpose, we previously developed the tool VITCOMIC (VIsualization tool for Taxonomic COmpositions of MIcrobial Community), which is based on the genome-sequenced microbes’ phylogenetic information. Here, we introduce VITCOMIC2, which incorporates substantive improvements over VITCOMIC that were necessary to address several issues associated with 16S rRNA gene-based analysis of microbial communities.

Results

We developed VITCOMIC2 to provide (i) sequence identity searches against broad reference taxa including uncultured taxa; (ii) normalization of 16S rRNA gene copy number differences among taxa; (iii) rapid sequence identity searches by applying the graphics processing unit-based sequence identity search tool CLAST; (iv) accurate taxonomic composition inference and nearly full-length 16S rRNA gene sequence reconstructions for metagenomic shotgun sequencing; and (v) an interactive user interface for simultaneous representation of the taxonomic composition of microbial communities and phylogenetic relationships among taxa. We validated the accuracy of processes (ii) and (iv) by using metagenomic shotgun sequencing data from a mock microbial community.
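
Item (ii), copy-number normalization, corrects for the fact that a taxon carrying several 16S rRNA operons contributes proportionally more reads than a single-copy taxon present at the same cell abundance. A common way to do this is to divide each taxon's read count by its assumed copy number and then renormalize. The sketch below is a minimal illustration under that assumption, not VITCOMIC2's actual code. The taxa, copy numbers and the fallback of one copy for unknown taxa are hypothetical.

```python
def normalize_by_copy_number(read_counts, copy_numbers, default_copies=1.0):
    """Convert per-taxon 16S read counts into copy-number-corrected relative abundances.

    Taxa missing from copy_numbers are assumed (hypothetically) to carry default_copies.
    """
    corrected = {}
    for taxon, reads in read_counts.items():
        copies = copy_numbers.get(taxon, default_copies)
        corrected[taxon] = reads / copies  # approximate number of genomes (cells)
    total = sum(corrected.values())
    if total == 0:
        return {taxon: 0.0 for taxon in corrected}
    return {taxon: value / total for taxon, value in corrected.items()}


# Hypothetical example: taxon A carries 7 rRNA operons, taxon B carries 1.
print(normalize_by_copy_number({"A": 700, "B": 100}, {"A": 7, "B": 1}))  # {'A': 0.5, 'B': 0.5}
```

Without the correction, taxon A would appear to make up 87.5% of the community even though both taxa contribute the same estimated number of genomes.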

Conclusions

The improvements incorporated into VITCOMIC2 enable users to acquire an intuitive understanding of microbial community composition based on the 16S rRNA gene sequence data obtained from both metagenomic shotgun and amplicon sequencing.

2.
Shotgun metagenomic sequencing does not depend on gene-targeted primers or PCR amplification; thus, it is not affected by primer bias or chimeras. However, searching for rRNA genes in large shotgun Illumina data sets is computationally expensive, and no approach exists for unsupervised community analysis of small-subunit (SSU) rRNA gene fragments retrieved from shotgun data. We present a pipeline, SSUsearch, that achieves faster identification of small-subunit rRNA gene fragments and enables unsupervised community analysis with shotgun data. It also includes classification and copy number correction, and its output can be used by traditional amplicon analysis platforms. Applied to shotgun metagenome data, this pipeline yielded higher diversity estimates than amplicon data but retained the grouping of samples in ordination analyses. We applied this pipeline to soil samples with paired shotgun and amplicon data and confirmed bias against Verrucomicrobia in a commonly used V6-V8 primer set; we also discovered likely bias against Actinobacteria and for Verrucomicrobia in a commonly used V4 primer set. This pipeline can use all variable regions in SSU rRNA and can also be applied to large-subunit (LSU) rRNA genes to confirm community structure. The pipeline scales to large amounts of soil metagenomic data (5 Gb memory and 5 central processing unit hours to process 38 Gb [1 lane] of trimmed Illumina HiSeq2500 data) and is freely available at https://github.com/dib-lab/SSUsearch under a BSD license.

3.
The ribosomal small subunit (SSU) rRNA gene has emerged as an important genetic marker for taxonomic identification in environmental sequencing datasets. In addition to being present in the nucleus of eukaryotes and the core genome of prokaryotes, the gene is also found in the mitochondria of eukaryotes and in the chloroplasts of photosynthetic eukaryotes. These three sets of genes are conceptually paralogous and should in most situations not be aligned and analyzed jointly. To identify the origin of SSU sequences in complex sequence datasets has hitherto been a time-consuming and largely manual undertaking. However, the present study introduces Metaxa (), an automated software tool to extract full-length and partial SSU sequences from larger sequence datasets and assign them to an archaeal, bacterial, nuclear eukaryote, mitochondrial, or chloroplast origin. Using data from reference databases and from full-length organelle and organism genomes, we show that Metaxa detects and scores SSU sequences for origin with very low proportions of false positives and negatives. We believe that this tool will be useful in microbial and evolutionary ecology as well as in metagenomics.

4.
In microbial ecology, a fundamental question relates to how community diversity and composition change in response to perturbation. Most studies have had limited ability to deeply sample community structure (e.g. Sanger-sequenced 16S rRNA libraries), or have had limited taxonomic resolution (e.g. studies based on 16S rRNA hypervariable region sequencing). Here, we combine the higher taxonomic resolution of near-full-length 16S rRNA gene amplicons with the economics and sensitivity of short-read sequencing to assay the abundance and identity of organisms that represent as little as 0.01% of sediment bacterial communities. We used a new version of EMIRGE optimized for large data size to reconstruct near-full-length 16S rRNA genes from amplicons sheared and sequenced with Illumina technology. The approach allowed us to differentiate the community composition among samples acquired before perturbation, after acetate amendment shifted the predominant metabolism to iron reduction, and once sulfate reduction began. Results were highly reproducible across technical replicates, and identified specific taxa that responded to the perturbation. All samples contain very high alpha diversity and abundant organisms from phyla without cultivated representatives. Surprisingly, at the time points measured, there was no strong loss of evenness, despite the selective pressure of acetate amendment and change in the terminal electron accepting process. However, community membership was altered significantly. The method allows for sensitive, accurate profiling of the “long tail” of low abundance organisms that exist in many microbial communities, and can resolve population dynamics in response to environmental change.

5.
I present the results of a culture-independent survey of soil bacterial communities from serpentine soils and adjacent nonserpentine comparator soils using a variety of newly developed phylogenetically based statistical tools. The study design included site-based replication of the serpentine-to-nonserpentine community comparison over a regional scale (approximately 100 km) in Northern California and Southern Oregon by producing 16S rRNA clone libraries from pairs of samples taken on either side of the serpentine-nonserpentine edaphic boundary at three geographical sites. At the division level, the serpentine and nonserpentine communities were similar to each other and to previous data from forest soils. Comparisons of both richness and Shannon diversity produced no significant differences between any of the libraries, but the vast majority of phylogenetically based tests were significant, even with only 50 sequences per library. These results suggest that most samples were distinct, consisting of a collection of lineages generally not found in other samples. The pattern of results showed that serpentine communities tended to be more similar to each other than they were to nonserpentine communities, and these differences were at a lower taxonomic scale. Comparisons of two nonserpentine communities generally showed differences, and some results suggest that the geographical site may control community composition as well. These results show the power of phylogenetic tests to discern differences between 16S rRNA libraries compared to tests that discard DNA data to bin sequences into operational taxonomic units, and they stress the importance of replication at larger scales for inferences regarding microbial biogeography.
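
Richness and Shannon diversity are computed from OTU counts alone, so they ignore which lineages are present. The minimal sketch below uses the natural-log form of the Shannon index on hypothetical OTU labels. It shows how two libraries that share no OTUs can still receive identical values, which is consistent with the point above that OTU-binning tests can miss differences that phylogenetic tests detect.

```python
import math
from collections import Counter


def richness_and_shannon(otu_labels):
    """Return (observed OTU richness, Shannon index H' using the natural log)."""
    counts = Counter(otu_labels)
    n = sum(counts.values())
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return len(counts), h


# Two hypothetical clone libraries that share no OTUs at all...
library_1 = ["otu1", "otu1", "otu2", "otu2"]
library_2 = ["otu7", "otu7", "otu9", "otu9"]
# ...still receive identical richness and Shannon values.
print(richness_and_shannon(library_1) == richness_and_shannon(library_2))  # True
```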

7.
No single diversity index adequately summarises species diversity, because diversity is a multidimensional concept and should therefore be quantified using a compound statistical measure. Here, we present the DER algorithm, available as an R package on CRAN and as an RWizard application at http://www.ipez.es/RWizard. This algorithm provides tools for differentiating assemblages on the basis of five components of diversity: rarity, heterogeneity, evenness, taxonomic/phylogenetic diversity and functional diversity. For all samples, the algorithm calculates a total of 39 of the most commonly used indices. All indices of all samples are rescaled to the range 0 to 1, and the algorithm calculates the polar coordinates of all samples for every possible combination of the five groups of indices. Thus, for each combination, one index from each group (rarity, heterogeneity [species richness is included in this group], evenness, taxonomic/phylogenetic diversity and functional diversity) is used to calculate the polar coordinates of all samples. Once the polar coordinates of the samples have been obtained for each combination, the convex hull and the mean Euclidean distance between samples are calculated. The algorithm selects the combination of indices with the highest mean of the convex hull and the mean Euclidean distance between samples; priority is therefore given to maximising dispersion between samples. The polar coordinates of the selected combination are depicted in a diagram from which the differences between assemblages in terms of rarity, heterogeneity, evenness, taxonomic/phylogenetic diversity and functional diversity can be determined.
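
The selection loop described above can be sketched as follows. This is a hypothetical illustration, not the DER package's code. The abstract does not say how the five rescaled indices become polar coordinates, so the sketch assumes that each component sits on one of five equally spaced axes (angles 2πk/5) and that the five vectors are summed into one (x, y) point per sample. It also assumes that "convex hull" means the hull's area. Index names and values are made up.

```python
import itertools
import math


def min_max_scale(values):
    """Rescale index values to [0, 1]; a constant index maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def to_plane(five_values):
    """ASSUMPTION: sum five radial vectors on equally spaced axes to get one point."""
    x = sum(r * math.cos(2 * math.pi * k / 5) for k, r in enumerate(five_values))
    y = sum(r * math.sin(2 * math.pi * k / 5) for k, r in enumerate(five_values))
    return (x, y)


def convex_hull_area(points):
    """Andrew's monotone-chain convex hull, then the shoelace formula for its area."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return 0.0

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    hull = lower[:-1] + upper[:-1]
    area = 0.0
    for i in range(len(hull)):
        x1, y1 = hull[i]
        x2, y2 = hull[(i + 1) % len(hull)]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0


def mean_pairwise_distance(points):
    pairs = list(itertools.combinations(points, 2))
    if not pairs:
        return 0.0
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def select_combination(groups):
    """groups: five dicts (rarity, heterogeneity, evenness, taxonomic/phylogenetic,
    functional), each mapping a hypothetical index name to its raw values per sample.
    Returns (score, chosen index names, sample points) for the most dispersed combination."""
    scaled = [{name: min_max_scale(vals) for name, vals in g.items()} for g in groups]
    best = None
    for combo in itertools.product(*[list(g) for g in scaled]):
        n_samples = len(scaled[0][combo[0]])
        points = [to_plane([scaled[k][combo[k]][s] for k in range(5)])
                  for s in range(n_samples)]
        score = (convex_hull_area(points) + mean_pairwise_distance(points)) / 2.0
        if best is None or score > best[0]:
            best = (score, combo, points)
    return best
```

Because every combination is scored, the cost grows with the product of the group sizes. That is manageable for a few dozen indices spread over five groups.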

8.
Functional analysis of a clinical microbiome helps elucidate the mechanisms by which microbiome perturbation can cause a phenotypic change in the patient. The direct approach for analysing the functional capacity of the microbiome is shotgun metagenomics. An inexpensive alternative is to collect 16S rRNA gene profiles and then indirectly infer the abundance of functional genes. This inference approach has been implemented in the PICRUSt and Tax4Fun software tools. However, those tools have important limitations: they rely on outdated functional databases and uncertain phylogenetic trees, and they require very specific data pre-processing protocols. Here we introduce Piphillin, a straightforward algorithm that is independent of any proposed phylogenetic tree, leverages contemporary functional databases and is not tied to any particular data pre-processing protocol. When all three inference tools were evaluated against actual shotgun metagenomics, Piphillin was superior to both PICRUSt and Tax4Fun in predicting gene composition in human clinical samples (p<0.01 and p<0.001, respectively), and Piphillin's ability to predict disease associations with specific gene orthologs showed a 15% increase in balanced accuracy compared with PICRUSt. For laboratory animal samples, no tool showed a performance advantage over the others, and for environmental samples all three produced unsatisfactory predictions. Our results demonstrate that functional inference using the direct method implemented in Piphillin is preferable for clinical biospecimens. Piphillin is publicly available for academic use at http://secondgenome.com/Piphillin.
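
The general idea of inferring gene content directly from a 16S profile without a phylogenetic tree can be sketched as follows. This is a hypothetical illustration, not Piphillin's published implementation. It assumes that each OTU has already been matched to its most similar reference genome (for example, by a sequence identity search), that the genome's 16S copy number is known, and that gene-family counts per genome come from a functional database. All identifiers are made up.

```python
from collections import defaultdict


def infer_gene_abundance(otu_abundance, otu_to_genome, genome_copy_number, genome_gene_counts):
    """Predict gene-family abundances from a 16S OTU table (hypothetical sketch).

    otu_abundance:      OTU -> read count
    otu_to_genome:      OTU -> matched reference genome (unmatched OTUs are absent)
    genome_copy_number: genome -> 16S rRNA gene copy number
    genome_gene_counts: genome -> {gene family -> copies per genome}
    """
    predicted = defaultdict(float)
    for otu, reads in otu_abundance.items():
        genome = otu_to_genome.get(otu)
        if genome is None:
            continue  # no sufficiently similar reference genome; OTU is dropped
        cells = reads / genome_copy_number[genome]  # copy-number-corrected abundance
        for gene, count in genome_gene_counts[genome].items():
            predicted[gene] += cells * count
    return dict(predicted)


profile = infer_gene_abundance(
    {"otu1": 40, "otu2": 10, "otu3": 5},
    {"otu1": "genomeA", "otu2": "genomeB"},
    {"genomeA": 4, "genomeB": 1},
    {"genomeA": {"gene_family_1": 1, "gene_family_2": 2}, "genomeB": {"gene_family_2": 1}},
)
print(profile)  # {'gene_family_1': 10.0, 'gene_family_2': 30.0}
```

Because the mapping is a direct lookup into current genomes, updating the reference database only changes `otu_to_genome` and `genome_gene_counts`. No tree has to be rebuilt.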

9.
16S rRNA amplicon analysis and shotgun metagenome sequencing are the two main culture-independent strategies for exploring the genetic landscape of various microbial communities. Recently, numerous studies have employed these two approaches together, but the downstream data analyses were performed separately, which always generated incongruent or conflicting signals in both taxonomic and functional classifications. Here we propose a novel approach, RiboFR-Seq (Ribosomal RNA gene flanking region sequencing), for simultaneously capturing ribosomal RNA variable regions and their flanking protein-coding genes. Through extensive testing on a clonal bacterial strain, the salivary microbiome and bacterial epibionts of marine kelp, we demonstrated that RiboFR-Seq could detect the vast majority of bacteria not only in well-studied microbiomes but also in novel communities with limited reference genomes. Combined with classical amplicon sequencing and shotgun metagenome sequencing, RiboFR-Seq can link the annotations of 16S rRNA genes and metagenomic contigs to reach a consensus classification. By recognizing almost all 16S rRNA gene copies, the RiboFR-Seq approach can effectively reduce the taxonomic abundance bias resulting from 16S rRNA copy number variation. We believe that RiboFR-Seq, which provides an integrated view of 16S rRNA profiles and metagenomes, will help us better understand diverse microbial communities.

10.

Background

Taxonomic profiling of microbial communities is often performed using small subunit ribosomal RNA (SSU) amplicon sequencing (16S or 18S), while environmental shotgun sequencing is often focused on functional analysis. Large shotgun datasets contain a significant number of SSU sequences, and these can be exploited to perform an unbiased SSU-based taxonomic analysis.

Results

Here we present a new program called RiboTagger that identifies and extracts taxonomically informative ribotags located in a specified variable region of the SSU gene in a high-throughput fashion.
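
The idea of pulling a short, taxonomically informative tag from a fixed position in a variable region can be sketched as follows. This is a hypothetical illustration, not RiboTagger's actual algorithm. It assumes that the region is located by an exact match to a conserved anchor sequence, that the tag is the next `tag_length` bases, and that both strands of each shotgun read are scanned. The anchor, tag length and reads are made-up toy values.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def extract_ribotags(reads, anchor, tag_length):
    """Return the tag_length bases immediately downstream of anchor in each read.

    Both strands are checked; reads without a complete tag after the anchor are skipped.
    """
    tags = []
    for read in reads:
        for strand in (read, reverse_complement(read)):
            pos = strand.find(anchor)
            if pos == -1:
                continue
            start = pos + len(anchor)
            tag = strand[start:start + tag_length]
            if len(tag) == tag_length:
                tags.append(tag)
                break  # count each read at most once
    return tags


# Hypothetical reads: forward hit, reverse-strand hit, and a read truncated after the anchor.
reads = ["TTAAACCCGGTTAA", "GGAACCGGGTTT", "AAACCCGG"]
print(Counter(extract_ribotags(reads, anchor="AAACCC", tag_length=4)))  # Counter({'GGTT': 2})
```

Counting identical tags gives a taxon-abundance table without assembling or aligning full SSU sequences, which is one reason tag-based approaches can be fast.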

Conclusions

RiboTagger permits fast recovery of SSU rRNA sequences from shotgun nucleic acid surveys of complex microbial communities. The program targets all three domains of life, exhibits high sensitivity and specificity and is substantially faster than comparable programs.

11.
The dynamic expansion of the taxonomic knowledge base is fundamental to further developments in biotechnology and sustainable conservation strategies. The vast array of software tools for numerical taxonomy and probabilistic identification, in conjunction with automated systems for data generation, is allowing the construction of large computerised strain databases. New techniques available for the generation of chemical and molecular data, associated with new software tools for data analysis, are leading to a quantum leap in bacterial systematics. The easy exchange of data through an interactive and highly distributed global computer network, such as the Internet, is facilitating the dissemination of taxonomic data. Relevant information for comparative sequence analysis, ribotyping, and protein and DNA electrophoretic pattern analysis is available online through computerised networks. Several software packages are available for the analysis of molecular data. Nomenclatural and taxonomic Authority Files are available from different sources together with strain-specific information. The increasing availability of public domain software is leading to the establishment and integration of public domain databases all over the world, and is promoting co-operative research projects on a scale never seen before.

12.
PCR-based surveys of microbial communities commonly use regions of the small-subunit ribosomal RNA (SSU rRNA) gene to determine taxonomic membership and estimate total diversity. Here we show that the length of the target amplicon has a significant effect on assessments of microbial richness and community membership. Using operational taxonomic unit (OTU)- and taxonomy-based tools, we compared the V6 hypervariable region of the bacterial SSU rRNA gene across three amplicon libraries of c. 100, 400 and 1000 base pairs (bp) from each of two hydrothermal vent fluid samples. We found that the smallest amplicon libraries contained more unique sequences, had higher diversity estimates and showed a different community structure than the other two libraries from each sample. We hypothesize that a combination of polymerase dissociation, cloning bias and mispriming due to secondary structure accounts for the differences. While this relationship is not linear, it is clear that the smallest amplicon libraries contained more different types of sequences and, accordingly, more diverse members of the community. Because divergent and lower-abundance taxa can be more readily detected with smaller amplicons, smaller amplicons may provide better assessments of total community diversity and taxonomic membership than longer amplicons in molecular studies of microbial communities.

13.
Microbiology has long studied the ways in which subtle genetic differences between closely related microbial strains can have profound impacts on their phenotypes and on those of their surrounding environments and communities. Despite the growth in high-throughput microbial community profiling, however, such strain-level differences remain challenging to detect. Even once they are detected, few quantitative approaches have been well validated for associating strain variants from microbial communities with phenotypes of interest, such as medication usage, treatment efficacy, host environment, or health. First, the term “strain” itself is not used consistently when defining a highly resolved taxonomic or genomic unit within a microbial community. Second, identifying such strains directly from shotgun metagenomes is computationally difficult, with several possible reference- and assembly-based approaches available, each with different sensitivity/specificity trade-offs. Finally, statistical challenges arise in using any of the resulting strain profiles for downstream analyses, which can include strain tracking, phylogenetic analysis, or genetic association studies. We provide an in-depth discussion of recently available computational tools that can be applied to this task, as well as of statistical models and of gaps in performing and interpreting any of these three main types of studies using strain-resolved shotgun metagenomic profiling of microbial communities.

14.
Tangherlini M., Miralto M., Colantuono C., Sangiovanni M., Dell'Anno A., Corinaldesi C., Danovaro R., Chiusano M. L. BMC Bioinformatics, 2018, 19(15): 443-143

Background

Environmental metagenomics is a challenging approach that is exponentially spreading in the scientific community to investigate taxonomic diversity and possible functions of the biological components. The massive amount of sequence data produced, often endowed with rich environmental metadata, needs suitable computational tools to fully explore the embedded information. Bioinformatics plays a key role in providing methodologies to manage, process and mine molecular data, integrated with environmental metagenomics collections. One such relevant example is represented by the Tara Ocean Project.

Results

We considered the Tara 16S miTAGs released by the consortium, representing raw sequences from a shotgun metagenomics approach with similarities to 16S rRNA genes. We generated assembled 16S rDNA sequences, which were classified according to their length, the possible presence of chimeric reads, and their putative taxonomic affiliation. The dataset was included in GLOSSary (the GLobal Ocean 16S Subunit web accessible resource), a bioinformatics platform to organize environmental metagenomics data. The aims of this work were: i) to present alternative computational approaches to manage challenging metagenomics data; ii) to set up user-friendly web-based platforms that allow the integration of environmental metagenomics sequences and their associated metadata; iii) to implement an appropriate bioinformatics platform supporting the analysis of 16S rDNA sequences using reference datasets, such as the SILVA database. We organized the data in a next-generation NoSQL “schema-less” database, allowing flexible organization of large amounts of data and supporting native geospatial queries. A web interface was developed to permit interactive exploration and visual geographical localization of the data, either raw miTAG reads or 16S contigs, from our processing pipeline. Information on unassembled sequences is also available. The taxonomic affiliations of contigs and miTAGs, and the spatial distribution of the sampling sites and their associated sequence libraries, as contained in the Tara metadata, can be explored through a query interface that allows both textual and visual investigation. In addition, all the sequence data were made available in a dedicated BLAST-based web application alongside the SILVA collection.
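
A "native geospatial query" typically asks which sampling sites lie within a given distance of a point. The sketch below answers that question in plain Python with the haversine great-circle formula. It is not GLOSSary's code and uses no database API. It assumes a spherical Earth of radius 6371 km, and the site names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # spherical-Earth approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def sites_within(sites, lat, lon, radius_km):
    """Names of sampling sites (name -> (lat, lon)) within radius_km of a query point."""
    return sorted(
        name for name, (slat, slon) in sites.items()
        if haversine_km(lat, lon, slat, slon) <= radius_km
    )


# Hypothetical sampling sites.
sites = {"SITE_A": (0.0, 0.0), "SITE_B": (0.0, 1.0), "SITE_C": (10.0, 10.0)}
print(sites_within(sites, 0.0, 0.0, 200))  # ['SITE_A', 'SITE_B']
```

A geospatially indexed database answers the same question without scanning every site, but the distance criterion is the same.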

Conclusions

GLOSSary provides an expandable bioinformatics environment, able to support the scientific community in current and forthcoming environmental metagenomics analyses.

15.
A major challenge in the field of shotgun metagenomics is the accurate identification of organisms present within a microbial community, based on classification of short sequence reads. Though existing microbial community profiling methods have attempted to rapidly classify the millions of reads output from modern sequencers, the combination of incomplete databases, similarity among otherwise divergent genomes, errors and biases in sequencing technologies, and the large volumes of sequencing data required for metagenome sequencing has led to unacceptably high false discovery rates (FDR). Here, we present the application of a novel, gene-independent and signature-based metagenomic taxonomic profiling method with significantly and consistently smaller FDR than any other available method. Our algorithm circumvents false positives using a series of non-redundant signature databases and examines Genomic Origins Through Taxonomic CHAllenge (GOTTCHA). GOTTCHA was tested and validated on 20 synthetic and mock datasets ranging in community composition and complexity, was applied successfully to data generated from spiked environmental and clinical samples, and robustly demonstrates superior performance compared with other available tools.
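
When a profiler is evaluated on a synthetic or mock community, the true membership is known. The false discovery rate can then be computed as the fraction of reported taxa that are not actually present. The minimal sketch below computes FDR together with sensitivity, which should be reported alongside it, on hypothetical taxon names. It is a generic evaluation metric, not part of GOTTCHA itself.

```python
def false_discovery_rate(predicted, truth):
    """Fraction of reported taxa that are absent from the known community."""
    predicted, truth = set(predicted), set(truth)
    if not predicted:
        return 0.0
    return len(predicted - truth) / len(predicted)


def sensitivity(predicted, truth):
    """Fraction of truly present taxa that were reported."""
    predicted, truth = set(predicted), set(truth)
    if not truth:
        return 0.0
    return len(predicted & truth) / len(truth)


mock_truth = {"taxonA", "taxonB", "taxonC"}  # hypothetical mock community
reported = {"taxonA", "taxonB", "taxonD"}    # hypothetical profiler output
print(false_discovery_rate(reported, mock_truth), sensitivity(reported, mock_truth))
```

A profiler can drive FDR toward zero by reporting very few taxa, which is why both numbers are needed.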

16.
Introduction: The study of microbial communities based on the combined analysis of genomic and proteomic data – called metaproteogenomics – has gained increased research attention in recent years. This relatively young field aims to elucidate the functional and taxonomic interplay of proteins in microbiomes and its implications for human health and the environment.

Areas covered: This article reviews bioinformatics methods and software tools dedicated to the analysis of data from metaproteomics and metaproteogenomics experiments. In particular, it focuses on the creation of tailored protein sequence databases, on the optimal use of database search algorithms including methods of error rate estimation, and finally on taxonomic and functional annotation of peptide and protein identifications.

Expert opinion: Recently, various promising strategies and software tools have been proposed for handling typical data analysis issues in metaproteomics. However, severe challenges remain that are highlighted and discussed in this article; these include: (i) robust false-positive assessment of peptide and protein identifications, (ii) complex protein inference against a background of highly redundant data, (iii) taxonomic and functional post-processing of identification data, and finally, (iv) the assessment and provision of metrics and tools for quantitative analysis.
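
A widely used way to estimate error rates for peptide identifications is the target-decoy strategy. Spectra are searched against real (target) and reversed or shuffled (decoy) sequences, and the ratio of decoy to target hits above a score threshold estimates the FDR. The minimal sketch below finds the most permissive threshold that keeps this estimate at or below a chosen level. It ignores tied scores and the refinements that real search engines apply, and the peptide-spectrum match (PSM) scores are hypothetical.

```python
def target_decoy_threshold(psms, max_fdr=0.01):
    """psms: list of (score, is_decoy). Returns the lowest score threshold at which the
    estimated FDR (decoys / targets among PSMs scoring >= threshold) is <= max_fdr,
    or None if no threshold satisfies it."""
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    best = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            best = score
    return best


# Hypothetical PSMs: (search-engine score, matched a decoy sequence?)
psms = [(10, False), (9, False), (8, False), (7, True), (6, False), (5, True)]
print(target_decoy_threshold(psms, max_fdr=0.25))  # 6
```

In metaproteomics, very large and redundant target databases inflate the number of decoy hits, which is one reason the article lists robust false-positive assessment as an open challenge.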


17.
Because of technological limitations, primer and amplification biases in targeted sequencing of 16S rRNA genes have obscured the true microbial diversity underlying environmental samples. However, metagenomic shotgun sequencing provides 16S rRNA gene fragment data that are naturally immune to priming biases and thus have the potential to uncover the true structure of microbial communities by giving more accurate predictions of operational taxonomic units (OTUs). Nonetheless, the lack of a statistically rigorous comparison between 16S rRNA gene fragments and other data types makes it difficult to interpret previously reported results based on 16S rRNA gene fragments. Therefore, in the present work, we established a standard analysis pipeline that helps determine whether differences in the data are real or merely due to technical bias. This pipeline was built using simulated data to find optimal mapping and OTU prediction methods. Comparison of the simulated datasets revealed that, with our proposed pipeline, a 16S rRNA gene fragment longer than 150 bp provides the same accuracy as a full-length 16S rRNA sequence. This relationship could serve as a good starting point for experimental design and makes it possible to compare 16S rRNA gene fragment-based surveys with targeted 16S rRNA sequencing-based surveys.

18.
The application of whole-genome shotgun sequencing to microbial communities represents a major development in metagenomics, the study of uncultured microbes via the tools of modern genomic analysis. In the past year, whole-genome shotgun sequencing projects of prokaryotic communities from an acid mine biofilm, the Sargasso Sea, Minnesota farm soil, three deep-sea whale falls, and deep-sea sediments have been reported, adding to previously published work on viral communities from marine and fecal samples. The interpretation of this new kind of data poses a wide variety of exciting and difficult bioinformatics problems. The aim of this review is to introduce the bioinformatics community to this emerging field by surveying existing techniques and promising new approaches for several of the most interesting of these computational problems.

19.
The deep sequencing of 16S rRNA genes amplified by universal primers has revolutionized our understanding of microbial communities by allowing characterization of the diversity of the uncultured majority. However, some universal primers also amplify eukaryotic rRNA genes, which decreases the efficiency of sequencing prokaryotic 16S rRNA genes and may lead to mischaracterization of the diversity of the microbial community. In this study, we compared 16S rRNA gene sequences from genome-sequenced strains and identified candidates for non-degenerate universal primers that could be used to amplify prokaryotic 16S rRNA genes. For the 50 identified candidates, we calculated their coverage of prokaryotic and eukaryotic rRNA genes, including those from uncultured taxa and eukaryotic organelles, and identified a novel universal primer set, 342F-806R, that covers many prokaryotic, but not eukaryotic, rRNA genes. This primer set was validated by amplifying 16S rRNA genes from a soil metagenomic sample and then pyrosequencing them on the Roche 454 platform. The same sample was also used for pyrosequencing of amplicons generated with a commonly used primer set, 338F-533R, and for shotgun metagenomic sequencing on the Illumina platform. Our comparison of the taxonomic compositions inferred by the three sequencing experiments indicated that the non-degenerate 342F-806R primer set can characterize the taxonomic composition of the microbial community without substantial bias, and it is expected to be applicable to the analysis of a wide variety of microbial communities.
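
Primer coverage is the fraction of reference rRNA sequences that a primer pair would amplify. The minimal sketch below handles non-degenerate primers and requires exact matches. It counts a sequence, given 5'→3' on the sense strand, as covered when it contains the forward primer followed downstream by the reverse complement of the reverse primer. Real coverage analyses usually tolerate some mismatches. The primers and sequences are hypothetical toy values, not the actual 342F-806R sequences.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def primer_pair_coverage(forward, reverse, sequences):
    """Fraction of sense-strand sequences matched exactly by a non-degenerate primer pair."""
    if not sequences:
        return 0.0
    rc_reverse = reverse_complement(reverse)  # reverse primer site as seen on the sense strand
    covered = 0
    for seq in sequences:
        f = seq.find(forward)
        if f == -1:
            continue
        if seq.find(rc_reverse, f + len(forward)) != -1:
            covered += 1
    return covered / len(sequences)


# Hypothetical reference sets: a good universal prokaryotic primer pair should give
# high coverage on the first set and low coverage on the second.
prokaryotic = ["ACGGTTTTGCAA", "ACGGTTTT", "GCAAACGG", "AAAA"]
eukaryotic = ["TTTTTTTT", "ACGGAAAA"]
print(primer_pair_coverage("ACGG", "TTGC", prokaryotic))  # 0.25
print(primer_pair_coverage("ACGG", "TTGC", eukaryotic))   # 0.0
```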

20.

Background

Previously, we demonstrated that dietary protein:carbohydrate ratio dramatically affects the fecal microbial taxonomic structure of kittens using targeted 16S gene sequencing. The present study, using the same fecal samples, applied deep Illumina shotgun sequencing to identify the diet-associated functional potential and analyze taxonomic changes of the feline fecal microbiome.

Methodology & Principal Findings

Fecal samples from kittens fed one of two diets differing in protein and carbohydrate content (high-protein, low-carbohydrate, HPLC; and moderate-protein, moderate-carbohydrate, MPMC) were collected at 8, 12 and 16 weeks of age (n = 6 per group). A total of 345.3 gigabases of sequence were generated from 36 samples, with 99.75% of annotated sequences identified as bacterial. At the genus level, 26% and 39% of reads were annotated for HPLC- and MPMC-fed kittens, respectively, with HPLC-fed cats showing greater species richness and microbial diversity. Two phyla, ten families and fifteen genera accounted for more than 80% of the sequences at each taxonomic level in both diet groups, consistent with the previous taxonomic study. Significantly different abundances between diet groups were observed for 324 genera (56% of all genera identified), demonstrating widespread diet-induced changes in microbial taxonomic structure. Diversity was not affected over time. Functional analysis identified 2,013 putative enzyme function groups that differed (p<0.000007) between the two dietary groups; these were associated with 194 pathways, which formed five discrete clusters based on average relative abundance. Of those pathways, ten contained more enzyme functions with significant diet effects than expected by chance (p<0.022). Six pathways were related to amino acid biosynthesis and metabolism, linking changes in dietary protein with functional differences in the gut microbiome.
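
The abstract does not name the test used to decide that a pathway "contained more enzyme functions with significant diet effects than expected by chance". A common choice for this kind of over-representation question is a one-sided hypergeometric test, equivalent to a one-sided Fisher's exact test. The minimal sketch below uses hypothetical counts.

```python
from math import comb


def hypergeom_enrichment_p(total, total_sig, pathway_size, pathway_sig):
    """One-sided P(X >= pathway_sig) when pathway_size functions are drawn without
    replacement from `total` functions, of which `total_sig` are significant."""
    upper = min(pathway_size, total_sig)
    denom = comb(total, pathway_size)
    tail = sum(
        comb(total_sig, k) * comb(total - total_sig, pathway_size - k)
        for k in range(pathway_sig, upper + 1)
    )
    return tail / denom


# Hypothetical: 10 enzyme functions in total, 5 diet-significant; a pathway of 2
# functions in which both are significant.
print(hypergeom_enrichment_p(10, 5, 2, 2))  # 10/45, about 0.222
```

With many pathways tested at once, the resulting p-values would normally also be corrected for multiple testing.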

Conclusions

These data indicate that feline feces-derived microbiomes have large structural and functional differences relating to the dietary protein:carbohydrate ratio and highlight the impact of diet early in life.
