Similar literature
20 similar documents found.
1.
2.
Next‐generation sequencing technologies give access to large sets of data, which are extremely useful in the study of microbial diversity based on the 16S rRNA gene. However, the production of such large data sets is not only marred by technical biases and sequencing noise but also increases computation time and disc space use. To improve the accuracy of OTU predictions and overcome computation, storage and noise issues, recent studies and tools have suggested removing all single reads and low‐abundance OTUs, considering them as noise. Although the effect of applying an OTU abundance threshold on α‐ and β‐diversity has been well documented, the consequences of removing single reads have been poorly studied. Here, we test the effect of singleton read filtering (SRF) on microbial community composition using in silico simulated data sets as well as sequencing data from synthetic and real communities displaying different levels of diversity and abundance profiles. Scalability to large data sets is also assessed using a complete MiSeq run. We show that SRF drastically reduces the chimera content and computational time, enabling the analysis of a complete MiSeq run in just a few minutes. Moreover, SRF accurately determines the actual community diversity: the differences in α‐ and β‐community diversity obtained with SRF and standard procedures are much smaller than the intrinsic variability of technical and biological replicates.
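
The singleton read filtering described in this abstract can be illustrated with a minimal sketch. It assumes that a "single read" means a sequence observed exactly once after exact dereplication; the abstract does not spell out the definition, and the function name `singleton_read_filter` is hypothetical rather than taken from the authors' tool.

```python
from collections import Counter


def singleton_read_filter(reads):
    """Dereplicate reads and keep only sequences seen more than once.

    Assumption: a singleton is a unique sequence with abundance 1
    after exact-match dereplication.
    """
    counts = Counter(reads)  # sequence -> number of identical reads
    return {seq: n for seq, n in counts.items() if n > 1}


# Toy reads: "ACGA" and "GGCA" each occur once and are dropped.
reads = ["ACGT", "ACGT", "ACGA", "TTGC", "TTGC", "TTGC", "GGCA"]
kept = singleton_read_filter(reads)
print(kept)
```

Dropping singletons before clustering shrinks the input, which is one plausible reason for the reduced computation time the abstract reports.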

3.
Recent developments in next-generation sequencing technologies have led to rapid accumulation of 16S rRNA sequences for microbiome profiling. One key step in data processing is to cluster short sequences into operational taxonomic units (OTUs). Although many methods have been proposed for OTU inference, a major challenge is the balance between inference accuracy and computational efficiency, where inference accuracy is often sacrificed to accommodate the need to analyze large numbers of sequences. Inspired by the hierarchical clustering method and a modified greedy network clustering algorithm, we propose a novel multi-seeds based heuristic clustering method, named MSClust, for OTU inference. MSClust first adaptively selects multiple seeds instead of one seed for each candidate cluster, and the reads are then processed using a greedy clustering strategy. Through many numerical examples, we demonstrate that MSClust uses less memory and achieves better biological accuracy than existing heuristic clustering methods while preserving efficiency and scalability.

4.
In spite of technical advances that have provided orders-of-magnitude increases in sequencing coverage, microbial ecologists still grapple with how to interpret the genetic diversity represented by the 16S rRNA gene. Two widely used approaches put sequences into bins based on either their similarity to reference sequences (i.e., phylotyping) or their similarity to other sequences in the community (i.e., operational taxonomic units [OTUs]). In the present study, we investigate three issues related to the interpretation and implementation of OTU-based methods. First, we confirm the conventional wisdom that it is impossible to create an accurate distance-based threshold for defining taxonomic levels and instead advocate for a consensus-based method of classifying OTUs. Second, using a taxonomic-independent approach, we show that the average neighbor clustering algorithm produces more robust OTUs than other hierarchical and heuristic clustering algorithms. Third, we demonstrate several steps to reduce the computational burden of forming OTUs without sacrificing the robustness of the OTU assignment. Finally, by blending these solutions, we propose a new heuristic that has a minimal effect on the robustness of OTUs and significantly reduces the necessary time and memory requirements. The ability to quickly and accurately assign sequences to OTUs and then obtain taxonomic information for those OTUs will greatly improve OTU-based analyses and overcome many of the challenges encountered with phylotype-based methods.

5.
AJ Pinto, L Raskin. PLoS ONE 2012, 7(8): e43093
As 16S rRNA gene targeted massively parallel sequencing has become a common tool for microbial diversity investigations, numerous advances have been made to minimize the influence of sequencing and chimeric PCR artifacts through rigorous quality control measures. However, there has been little effort towards understanding the effect of multi-template PCR biases on microbial community structure. In this study, we used three bacterial and three archaeal mock communities consisting of, respectively, 33 bacterial and 24 archaeal 16S rRNA gene sequences combined in different proportions to compare the influences of (1) sequencing depth, (2) sequencing artifacts (sequencing errors and chimeric PCR artifacts), and (3) biases in multi-template PCR, towards the interpretation of community structure in pyrosequencing datasets. We also assessed the influence of each of these three variables on α- and β-diversity metrics that rely on the number of OTUs alone (richness) and those that include both membership and the relative abundance of detected OTUs (diversity). As part of this study, we redesigned bacterial and archaeal primer sets that target the V3-V5 region of the 16S rRNA gene, along with multiplexing barcodes, to permit simultaneous sequencing of PCR products from the two domains. We conclude that the benefits of deeper sequencing efforts extend beyond greater OTU detection and result in higher precision in β-diversity analyses by reducing the variability between replicate libraries, despite the presence of more sequencing artifacts. Additionally, spurious OTUs resulting from sequencing errors have a significant impact on richness or shared-richness based α- and β-diversity metrics, whereas metrics that utilize community structure (including both richness and relative abundance of OTUs) are minimally affected by spurious OTUs. However, the greatest obstacle to accurately evaluating community structure is error in the estimated mean relative abundance of each detected OTU, caused by biases associated with multi-template PCR reactions.

6.
Jiang XT, Zhang H, Sheng HF, Wang Y, He Y, Zou F, Zhou HW. PLoS ONE 2012, 7(1): e30230
Clustering 16S/18S rRNA amplicon sequences into operational taxonomic units (OTUs) is a critical step for the bioinformatic analysis of microbial diversity. Here, we report a pipeline for selecting OTUs with a relatively low computational demand and a high degree of accuracy. This pipeline is referred to as two-stage clustering (TSC) because it divides tags into two groups according to their abundance and clusters them sequentially. The more abundant group is clustered using a hierarchical algorithm similar to that in ESPRIT, which has a high degree of accuracy but is computationally costly for large datasets. The rarer group, which includes the majority of tags, is then heuristically clustered to improve efficiency. To further improve the computational efficiency and accuracy, two preclustering steps are implemented. To maintain clustering accuracy, all tags are grouped into an OTU depending on their pairwise Needleman-Wunsch distance. This method not only improved the computational efficiency but also mitigated the spurious OTU estimation from 'noise' sequences. In addition, OTUs clustered using TSC showed comparable or improved performance in beta-diversity comparisons relative to existing OTU selection methods. This study suggests that the distribution of sequencing datasets is a useful property for improving the computational efficiency and increasing the clustering accuracy of the high-throughput sequencing of PCR amplicons. The software and user guide are freely available at http://hwzhoulab.smu.edu.cn/paperdata/.
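
To make the pairwise Needleman-Wunsch distance mentioned in this abstract concrete, here is a small sketch of one possible definition. It is not the TSC implementation: the scoring parameters (match +1, mismatch -1, linear gap -1) and the choice to define distance as the fraction of alignment columns containing a mismatch or a gap are assumptions made for illustration, and `nw_distance` is a hypothetical name.

```python
def nw_distance(a, b, match=1, mismatch=-1, gap=-1):
    """Global-alignment distance between two sequences.

    Assumption: distance = (mismatch columns + gap columns) / alignment length,
    taken from one optimal Needleman-Wunsch alignment.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the dynamic-programming score matrix.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back one optimal path, counting columns and differences.
    i, j = n, m
    columns = differing = 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            differing += a[i - 1] != b[j - 1]
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            differing += 1
            i -= 1
        else:
            differing += 1
            j -= 1
        columns += 1
    return differing / columns if columns else 0.0


print(nw_distance("ACGT", "ACGA"))  # one mismatch in four columns
print(nw_distance("ACGT", "AGT"))   # one gap in four columns
```

Under this definition, two tags would fall into the same OTU when their distance is at or below a chosen cutoff; the abstract does not state which cutoff TSC uses.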

7.
MOTIVATION: With the advancements of next-generation sequencing technology, it is now possible to study samples directly obtained from the environment. Particularly, 16S rRNA gene sequences have been frequently used to profile the diversity of organisms in a sample. However, such studies still struggle to determine both the number of operational taxonomic units (OTUs) and their relative abundance in a sample. RESULTS: To address these challenges, we propose an unsupervised Bayesian clustering method termed Clustering 16S rRNA for OTU Prediction (CROP). CROP can find clusters based on the natural organization of data without setting a hard cut-off threshold (3%/5%) as required by hierarchical clustering methods. By applying our method to several datasets, we demonstrate that CROP is robust against sequencing errors and that it produces more accurate results than conventional hierarchical clustering methods. AVAILABILITY AND IMPLEMENTATION: Source code freely available at the following URL: http://code.google.com/p/crop-tingchenlab/, implemented in C++ and supported on Linux and MS Windows.

8.
In this study, the diversity of bacteria associated with the endemic freshwater sponge Lubomirskia baicalensis, collected from the Southern Basin of Lake Baikal, was investigated for the first time using cultivation-independent approaches. In total, 102 bacterial 16S rRNA clones were screened using restriction fragment length polymorphism (RFLP) and 30 were selected for sequencing. BLASTN and phylogenetic analysis based on near full length 16S rDNA sequences showed that 22 operational taxonomic units (OTUs) were clustered in six known phyla: Actinobacteria (8 OTUs), alpha-Proteobacteria (4 OTUs), beta-Proteobacteria (4 OTUs), Verrucomicrobia (4 OTUs), Nitrospiracea (1 OTU) and Bacteroidetes (1 OTU). Remarkably, all phylotypes were affiliated with uncultured microorganisms; however, all alpha-Proteobacteria sequences were closely related to bacteria derived from the freshwater sponge Spongilla lacustris. Our results reveal a high diversity in the L. baicalensis bacterial community and provide an insight into microbial ecology and diversity within freshwater sponges inhabiting the ancient Lake Baikal ecosystem.

9.
10.
Analysis of microbial community structure by multivariate ordination methods, using data obtained by high‐throughput sequencing of amplified markers (i.e., DNA metabarcoding), often requires clustering of DNA sequences into operational taxonomic units (OTUs). Parameters for the clustering procedure tend not to be justified but are set by tradition rather than being based on explicit knowledge. In this study, we explore the extent to which ordination results are affected by variation in parameter settings for the clustering procedure. Amplicon sequence data from nine microbial community studies, representing different sampling designs, spatial scales and ecosystems, were subjected to clustering into OTUs at seven different similarity thresholds (clustering thresholds) ranging from 87% to 99% sequence similarity. The 63 data sets thus obtained were subjected to parallel DCA and GNMDS ordinations. The resulting community structures were highly similar across all clustering thresholds. We explain this pattern by the existence of strong ecological structuring gradients and phylogenetically diverse sets of abundant OTUs that are highly stable across clustering thresholds. Removing low‐abundance, rare OTUs had negligible effects on community patterns. Our results indicate that microbial data sets with a clear gradient structure are highly robust to choice of sequence clustering threshold.

11.
DNA metabarcoding is a promising method for describing communities and estimating biodiversity. This approach uses high‐throughput sequencing of targeted markers to identify species in a complex sample. By convention, sequences are clustered at a predefined sequence divergence threshold (often 3%) into operational taxonomic units (OTUs) that serve as a proxy for species. However, variable levels of interspecific marker variation across taxonomic groups make clustering sequences from a phylogenetically diverse dataset into OTUs at a uniform threshold problematic. In this study, we use mock zooplankton communities to evaluate the accuracy of species richness estimates when following conventional protocols to cluster hypervariable sequences of the V4 region of the small subunit ribosomal RNA gene (18S) into OTUs. By including individually tagged single specimens and “populations” of various species in our communities, we examine the impact of intra‐ and interspecific diversity on OTU clustering. Communities consisting of single individuals per species generated a correspondence of 59–84% between OTU number and species richness at a 3% divergence threshold. However, when multiple individuals per species were included, the correspondence between OTU number and species richness dropped to 31–63%. Our results suggest that intraspecific variation in this marker can often exceed 3%, such that a single species does not always correspond to one OTU. We advocate the need to apply group‐specific divergence thresholds when analyzing complex and taxonomically diverse communities, but also encourage the development of additional filtering steps that allow identification of artifactual rRNA gene sequences or pseudogenes that may generate spurious OTUs.

12.
Next‐generation sequencing is a common method for analysing microbial community diversity and composition. Configuring an appropriate sequence processing strategy within the variety of tools and methods is a nontrivial task and can considerably influence the resulting community characteristics. We analysed the V4 region of 18S rRNA gene sequences of marine samples by 454‐pyrosequencing. During this process, we generated several data sets with QIIME, mothur, and a custom‐made pipeline based on DNAStar and the phylogenetic tree‐based PhyloAssigner. For all processing strategies, default parameter settings and punctual variations were used. Our results revealed strong differences in total number of operational taxonomic units (OTUs), indicating that sequence preprocessing and clustering had a major impact on protist diversity estimates. However, diversity estimates of the abundant biosphere (abundance of ≥1%) were reproducible for all conducted processing pipeline versions. A qualitative comparison of diatom genera emphasized strong differences between the pipelines, with phylogenetic placement of sequences coming closest to light microscopy‐based diatom identification. We conclude that diversity studies using different sequence processing strategies are comparable if the focus is on higher taxonomic levels, and if abundance thresholds are used to filter out OTUs of the rare biosphere.

13.
Marilley Laurent, Vogt Gudrun, Blanc Michel, Aragno Michel. Plant and Soil 1998, 198(2): 219-224
The rhizosphere of Trifolium repens and Lolium perenne was divided into three fractions: the bulk soil, the soil adhering to the roots and the washed roots (rhizoplane and endorhizosphere). After isolation and purification of DNA from these fractions, 16S rDNA was amplified by PCR and cloned to obtain a collection of 16S rRNA genes representative of the bacterial communities of these three fractions. The genes were then characterized by PCR restriction analysis. Each different profile was used to define an operational taxonomic unit (OTU). The numbers of OTUs and the numbers of clones among these OTUs allowed a diversity index to be calculated. The number of OTUs decreased as root proximity increased and a few OTUs became dominant, resulting in a lower diversity index. In the root fraction of T. repens, the restriction profile of the dominant OTU matched the theoretical profile of the 16S rRNA gene of Rhizobium leguminosarum. This study showed that plant roots create a selective environment for microbial populations.

14.
Because of technological limitations, the primer and amplification biases in targeted sequencing of 16S rRNA genes have veiled the true microbial diversity underlying environmental samples. However, the protocol of metagenomic shotgun sequencing provides 16S rRNA gene fragment data with natural immunity against the biases raised during priming and thus the potential of uncovering the true structure of the microbial community by giving more accurate predictions of operational taxonomic units (OTUs). Nonetheless, the lack of statistically rigorous comparison between 16S rRNA gene fragments and other data types makes it difficult to interpret previously reported results using 16S rRNA gene fragments. Therefore, in the present work, we established a standard analysis pipeline that would help confirm whether the differences in the data are true or are just due to potential technical bias. This pipeline is built by using simulated data to find optimal mapping and OTU prediction methods. The comparison between simulated datasets revealed a relationship between 16S rRNA gene fragments and full-length 16S rRNA sequences: with our proposed pipeline, a 16S rRNA gene fragment longer than 150 bp provides the same accuracy as a full-length 16S rRNA sequence. This could serve as a good starting point for experimental design and makes comparison between 16S rRNA gene fragment-based and targeted 16S rRNA sequencing-based surveys possible.

15.
The rhizosphere is populated by a numerous and diverse array of rhizobacteria, and many impact productivity in largely unknown ways. Here we characterize the rhizobacterial community in a wheat variety categorized according to shoot biomass using 16S rRNA pyrosequencing abundance data. Plants were grown in homogenized field soil under greenhouse conditions, and DNA was extracted and pyrosequenced, resulting in 29,007 quality sequences. Operational taxonomic units (OTUs) that were significantly associated with biomass productivity were identified using an exact test adjusted for the false-discovery rate. The productivity deviation expressed as a percentage of the total mean square for regression (PMSR) was determined for each OTU. Out of 719 OTUs, 42 showed significant positive associations and 39 showed significant negative associations (q value, ≤0.05). OTUs with the greatest net positive associations, by genus, were as follows: Duganella, OTU 43 and OTU 3; Janthinobacterium, OTU 278; Pseudomonas, OTU 588; and Cellvibrio, OTU 1847. Those with negative associations were as follows: Bacteria, OTU 273; Chryseobacterium, OTU 508; Proteobacteria, OTU 249; and Enterobacter, OTU 357. Shoot biomass productivity was strongly correlated with the balance between the overall abundances of positive- and negative-productivity-associated OTUs. High-productivity rhizospheres contained 9.2 significant positives for every negatively associated rhizobacterium, while low-productivity rhizospheres showed 2.3 significant negatives for every positively associated rhizobacterium. Overall rhizobacterial community diversity as measured by the Chao1, Shannon, and Simpson indexes was nonlinearly related to productivity, closely fitting a wavelike cubic equation. We conclude that shoot biomass productivity is strongly related to the ratio of positive- to negative-productivity-associated rhizobacteria in the rhizosphere. This study identifies significant OTUs composing the productive and unproductive rhizobacterial communities.
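
The Chao1, Shannon, and Simpson indexes named in this abstract can be computed from a vector of per-OTU read counts. The sketch below uses common textbook forms and is not taken from the study: Chao1 is the bias-corrected estimator, Shannon uses the natural logarithm, and Simpson is reported as 1 minus the sum of squared proportions. Which variants the authors used is not stated, so these choices are assumptions.

```python
import math


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over OTUs with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson(counts):
    """Gini-Simpson index 1 - sum(p_i^2) (one of several Simpson variants)."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)


def chao1(counts):
    """Bias-corrected Chao1: S_obs + f1 * (f1 - 1) / (2 * (f2 + 1)).

    f1 and f2 are the numbers of OTUs seen exactly once and exactly twice.
    """
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Toy community: six OTUs, three singletons, one doubleton.
counts = [10, 5, 2, 1, 1, 1]
print(chao1(counts), shannon(counts), simpson(counts))
```

Chao1 depends only on singleton and doubleton counts beyond the observed richness, so it is sensitive to the rare OTUs that other abstracts in this list discuss filtering out.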

16.
Analysis of microbial communities by high-throughput pyrosequencing of SSU rRNA gene PCR amplicons has transformed microbial ecology research and led to the observation that many communities contain a diverse assortment of rare taxa, a phenomenon termed the Rare Biosphere. Multiple studies have investigated the effect of pyrosequencing read quality on operational taxonomic unit (OTU) richness for contrived communities, yet there is limited information on the fidelity of community structure estimates obtained through this approach. Given that PCR biases are widely recognized, and further unknown biases may arise from the sequencing process itself, a priori assumptions about the neutrality of the data generation process are at best unvalidated. Furthermore, post-sequencing quality control algorithms have not been explicitly evaluated for the accuracy of recovered representative sequences and its impact on downstream analyses, reducing useful discussion on pyrosequencing reads to their diversity and abundances. Here we report on community structures and sequences recovered for in vitro-simulated communities consisting of twenty 16S rRNA gene clones tiered at known proportions. PCR amplicon libraries of the V3-V4 and V6 hypervariable regions from the in vitro-simulated communities were sequenced using the Roche 454 GS FLX Titanium platform. Commonly used quality control protocols resulted in the formation of OTUs with >1% abundance composed entirely of erroneous sequences, while over-aggressive clustering approaches obfuscated real, expected OTUs. The pyrosequencing process itself did not appear to impose significant biases on overall community structure estimates, although the detection limit for rare taxa may be affected by PCR amplicon size and quality control approach employed. Meanwhile, PCR biases associated with the initial amplicon generation may impose greater distortions in the observed community structure.

17.
Next-generation DNA sequencing (NGS) approaches are rapidly surpassing Sanger sequencing for characterizing the diversity of natural microbial communities. Despite this rapid transition, few comparisons exist between Sanger sequences and the generally much shorter reads of NGS. Operational taxonomic units (OTUs) derived from full-length (Sanger sequencing) and pyrotag (454 sequencing of the V9 hypervariable region) sequences of 18S rRNA genes from 10 global samples were analyzed in order to compare the resulting protistan community structures and species richness. Pyrotag OTUs called at 98% sequence similarity yielded numbers of OTUs that were similar overall to those for full-length sequences when the latter were called at 97% similarity. Singleton OTUs strongly influenced estimates of species richness but not the higher-level taxonomic composition of the community. The pyrotag and full-length sequence data sets had slightly different taxonomic compositions of rhizarians, stramenopiles, cryptophytes, and haptophytes, but the two data sets had similarly high compositions of alveolates. Pyrotag-based OTUs were often derived from sequences that mapped to multiple full-length OTUs at 100% similarity. Thus, pyrotags sequenced from a single hypervariable region might not be appropriate for establishing protistan species-level OTUs. However, nonmetric multidimensional scaling plots constructed with the two data sets yielded similar clusters, indicating that beta diversity analysis results were similar for the Sanger and NGS sequences. Short pyrotag sequences can provide holistic assessments of protistan communities, although care must be taken in interpreting the results. The longer reads (>500 bp) that are now becoming available through NGS should provide powerful tools for assessing the diversity of microbial eukaryotic assemblages.

18.
Operational taxonomic units (OTUs) are conventionally defined at a phylogenetic distance (0.03 for species, 0.05 for genus, 0.10 for family) based on full-length 16S rRNA gene sequences. However, partial sequences (700 bp or shorter) have been used in most studies. This discord may affect analysis of diversity and species richness because sequence divergence is not distributed evenly along the 16S rRNA gene. In this study, we compared a set each of bacterial and archaeal 16S rRNA gene sequences of nearly full length with multiple sets of different partial 16S rRNA gene sequences derived therefrom (approximately 440-700 bp), at conventional and alternative distance levels. Our objective was to identify partial sequence region(s) and distance level(s) that allow more accurate phylogenetic analysis of partial 16S rRNA genes. Our results showed that no partial sequence region could estimate OTU richness or define OTUs as reliably as nearly full-length genes. However, the V1-V4 regions can provide more accurate estimates than others. For analysis of archaea, we recommend the V1-V3 and the V4-V7 regions and clustering of species-level OTUs at 0.03 and 0.02 distances, respectively. For analysis of bacteria, the V1-V3 and the V1-V4 regions should be targeted, with species-level OTUs being clustered at 0.04 distance in both cases.

19.
Enhanced biological phosphorus removal (EBPR) relies on diverse but specialized microbial communities to mediate the cycling and ultimate removal of phosphorus from municipal wastewaters. However, little is known about microbial activity and dynamics in relation to process fluctuations in EBPR ecosystems. Here, we monitored temporal changes in microbial community structure and potential activity across each bioreactor zone in a pilot‐scale EBPR treatment plant by examining the ratio of small subunit ribosomal RNA (SSU rRNA) to SSU rRNA gene (rDNA) over a 120 day study period. Although the majority of operational taxonomic units (OTUs) in the EBPR ecosystem were rare, many maintained high potential activities based on SSU rRNA : rDNA ratios, suggesting that rare OTUs contribute substantially to protein synthesis potential in EBPR ecosystems. Few significant differences in OTU abundance and activity were observed between bioreactor redox zones, although differences in temporal activity were observed among phylogenetically cohesive OTUs. Moreover, observed temporal activity patterns could not be explained by measured process parameters, suggesting that other ecological drivers, such as grazing or viral lysis, modulated community interactions. Taken together, these results point towards complex interactions selected for within the EBPR ecosystem and highlight a previously unrecognized functional potential among low abundance microorganisms in engineered ecosystems.
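
The SSU rRNA : rDNA ratio used in this abstract as a proxy for potential activity can be sketched as a ratio of relative abundances per OTU. This is an assumed formulation: the abstract does not say how the ratio was normalized, and the function name `activity_ratios` and the toy counts are hypothetical.

```python
def activity_ratios(rrna_counts, rdna_counts):
    """Per-OTU ratio of rRNA relative abundance to rDNA relative abundance.

    Assumption: each library is normalized to relative abundance first.
    OTUs with no rDNA reads are skipped because the ratio is undefined.
    """
    rrna_total = sum(rrna_counts.values())
    rdna_total = sum(rdna_counts.values())
    ratios = {}
    for otu, dna in rdna_counts.items():
        if dna == 0:
            continue
        rna_rel = rrna_counts.get(otu, 0) / rrna_total
        dna_rel = dna / rdna_total
        ratios[otu] = rna_rel / dna_rel
    return ratios


# Toy data: OTU_b is rare in the rDNA library but well represented in rRNA.
rrna = {"OTU_a": 50, "OTU_b": 50}
rdna = {"OTU_a": 90, "OTU_b": 10}
print(activity_ratios(rrna, rdna))
```

In this toy case the rare OTU has a ratio well above 1, which mirrors the abstract's observation that rare OTUs can maintain high potential activity.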

20.
The rare bacterial biosphere (RBB) is a large and probably predominant sector of bacterial diversity, which is specifically represented by small populations. Although some RBB components have been characterized phenotypically (actualistic objects), it has been mainly described as a set of virtual objects, i.e., of the 16S rRNA gene sequences from environmental DNA samples, which are grouped into phylotypes (operational taxonomic units, OTUs). The upper OTU threshold for RBB is presently not standardized. It is usually ~1% of the sum of OTU sequences in the metagenome library, or five sequences per OTU in absolute values. The analyzed RBB objects include (1) virtual and actualistic objects; (2) autochthonous and allochthonous forms; (3) vegetative and differentiated cells; (4) dead bacteria and free DNA; and (5) artifacts and informational gaps. The RBB phenomenon has not been sufficiently explained. According to some concepts, the RBB objects are rare due to the restrictive action of unfavorable environmental factors. According to others, they utilize a successful adaptive strategy of low abundance, which facilitates higher genetic diversity, dispersal and colonization of new niches, and microbial conversion of specific substrates. Since RBB was revealed only in the early 2000s and is still poorly studied, its role in organic evolution and its place in the ecosystems should be determined by future research. The information on the RBB composition, distribution, and functions will be important for bacteriology, while some cultured species may be of basic or applied importance.
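
The two rarity thresholds quoted in this abstract (about 1% of all OTU sequences, or five sequences per OTU) can be applied to a count table as follows. This is a sketch under assumptions: the abstract notes that the threshold is not standardized, so treating both bounds as inclusive is a choice made here, and the function name `rare_otus` is hypothetical.

```python
def rare_otus(otu_counts, max_fraction=None, max_reads=None):
    """Return the set of OTUs at or below a relative or absolute threshold.

    Exactly one of max_fraction (e.g. 0.01) or max_reads (e.g. 5) should be
    given. Both bounds are treated as inclusive, which is an assumption.
    """
    if (max_fraction is None) == (max_reads is None):
        raise ValueError("give exactly one of max_fraction or max_reads")
    total = sum(otu_counts.values())
    if max_fraction is not None:
        return {o for o, c in otu_counts.items() if c / total <= max_fraction}
    return {o for o, c in otu_counts.items() if c <= max_reads}


# Toy library of 1000 reads.
counts = {"A": 950, "B": 40, "C": 8, "D": 2}
print(sorted(rare_otus(counts, max_fraction=0.01)))
print(sorted(rare_otus(counts, max_reads=5)))
```

The two rules disagree on OTU "C" here (0.8% of reads but 8 sequences), which illustrates why the lack of a standard threshold matters when comparing studies.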
