Similar literature
 20 similar documents found
1.
Recent studies of 16S rRNA sequences through next-generation sequencing have revolutionized our understanding of microbial community composition and structure. One common approach to exploring the genetic diversity of a microbial community with these data is to cluster the 16S rRNA sequences into Operational Taxonomic Units (OTUs) based on sequence similarity. The inferred OTUs can then be used to estimate species diversity, composition, and richness. Although a number of methods have been developed and are commonly used to cluster sequences into OTUs, relatively little guidance is available on their relative performance and on the choice of key parameters for each method. In this study, we conducted a comprehensive evaluation of ten existing OTU inference methods. We found that the appropriate dissimilarity value for defining distinct OTUs depends not only on the specific method but also on the sample complexity. For data sets with low complexity, all the algorithms need a higher dissimilarity threshold to define OTUs. Some methods, such as CROP and SLP, are more robust to the specific choice of threshold than others, especially for shorter reads. For high-complexity data sets, hierarchical clustering methods need a stricter dissimilarity threshold to define OTUs because the commonly used dissimilarity threshold of 3% often leads to an underestimation of the number of OTUs. In general, hierarchical clustering methods perform better at lower dissimilarity thresholds. Our results show that sequence abundance plays an important role in OTU inference. We conclude that care is needed in choosing both a dissimilarity threshold and an abundance threshold for OTU inference.
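
To make the role of the dissimilarity threshold concrete, the following is a minimal sketch of threshold-based clustering. It is not one of the ten methods evaluated in the abstract above: it assumes reads that are already aligned to equal length, uses the plain fraction of mismatched positions as the dissimilarity, and assigns each read greedily to the first OTU whose seed lies within the threshold. The function names `dissimilarity` and `greedy_otus` are hypothetical.

```python
def dissimilarity(a: str, b: str) -> float:
    """Fraction of mismatched positions between two equal-length aligned reads."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def greedy_otus(reads: list[str], threshold: float = 0.03) -> list[list[str]]:
    """Greedy seed-based clustering (an illustrative assumption, not a published tool).

    Each read joins the first OTU whose seed (its first member) is within
    `threshold` dissimilarity; otherwise it starts a new OTU.
    """
    otus: list[list[str]] = []
    for read in reads:
        for otu in otus:
            if dissimilarity(read, otu[0]) <= threshold:
                otu.append(read)
                break
        else:
            # No existing seed was close enough, so this read seeds a new OTU.
            otus.append([read])
    return otus
```

Running `greedy_otus` on the same reads with a stricter threshold (for example 0.01 instead of 0.03) can only keep or increase the number of OTUs in this toy scheme, which is the kind of threshold sensitivity the abstract discusses.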

2.
Operational taxonomic units (OTUs) are conventionally defined at a phylogenetic distance (0.03 for species, 0.05 for genus, 0.10 for family) based on full-length 16S rRNA gene sequences. However, partial sequences (700 bp or shorter) have been used in most studies. This discord may affect analysis of diversity and species richness because sequence divergence is not distributed evenly along the 16S rRNA gene. In this study, we compared a set each of bacterial and archaeal 16S rRNA gene sequences of nearly full length with multiple sets of different partial 16S rRNA gene sequences derived therefrom (approximately 440-700 bp), at conventional and alternative distance levels. Our objective was to identify partial sequence region(s) and distance level(s) that allow more accurate phylogenetic analysis of partial 16S rRNA genes. Our results showed that no partial sequence region could estimate OTU richness or define OTUs as reliably as nearly full-length genes. However, the V1-V4 regions can provide more accurate estimates than others. For analysis of archaea, we recommend the V1-V3 and the V4-V7 regions and clustering of species-level OTUs at 0.03 and 0.02 distances, respectively. For analysis of bacteria, the V1-V3 and the V1-V4 regions should be targeted, with species-level OTUs being clustered at 0.04 distance in both cases.

3.
4.
Because of technological limitations, the primer and amplification biases in targeted sequencing of 16S rRNA genes have veiled the true microbial diversity underlying environmental samples. However, metagenomic shotgun sequencing provides 16S rRNA gene fragment data that are naturally immune to the biases introduced during priming, and thus has the potential to uncover the true structure of a microbial community by giving more accurate predictions of operational taxonomic units (OTUs). Nonetheless, the lack of statistically rigorous comparison between 16S rRNA gene fragments and other data types makes it difficult to interpret previously reported results obtained with 16S rRNA gene fragments. Therefore, in the present work, we established a standard analysis pipeline that helps confirm whether differences in the data are real or merely due to potential technical bias. This pipeline was built by using simulated data to find optimal mapping and OTU prediction methods. The comparison between simulated datasets revealed that a 16S rRNA gene fragment longer than 150 bp provides the same accuracy as a full-length 16S rRNA sequence under our proposed pipeline. This result could serve as a good starting point for experimental design and makes comparison between 16S rRNA gene fragment-based and targeted 16S rRNA sequencing-based surveys possible.

5.
DNA metabarcoding is a promising method for describing communities and estimating biodiversity. This approach uses high‐throughput sequencing of targeted markers to identify species in a complex sample. By convention, sequences are clustered at a predefined sequence divergence threshold (often 3%) into operational taxonomic units (OTUs) that serve as a proxy for species. However, variable levels of interspecific marker variation across taxonomic groups make clustering sequences from a phylogenetically diverse dataset into OTUs at a uniform threshold problematic. In this study, we use mock zooplankton communities to evaluate the accuracy of species richness estimates when following conventional protocols to cluster hypervariable sequences of the V4 region of the small subunit ribosomal RNA gene (18S) into OTUs. By including individually tagged single specimens and “populations” of various species in our communities, we examine the impact of intra‐ and interspecific diversity on OTU clustering. Communities consisting of single individuals per species generated a correspondence of 59–84% between OTU number and species richness at a 3% divergence threshold. However, when multiple individuals per species were included, the correspondence between OTU number and species richness dropped to 31–63%. Our results suggest that intraspecific variation in this marker can often exceed 3%, such that a single species does not always correspond to one OTU. We advocate the need to apply group‐specific divergence thresholds when analyzing complex and taxonomically diverse communities, but also encourage the development of additional filtering steps that allow identification of artifactual rRNA gene sequences or pseudogenes that may generate spurious OTUs.

6.
Published polymerase chain reaction primer sets for detecting the 16S rRNA gene and the gene encoding hydrazine oxidoreductase (hzo) in anammox bacteria were compared by using the same coastal marine sediment samples. While four previously reported primer sets developed to detect the 16S rRNA gene showed varying specificities between 12% and 77%, an optimized primer combination resulted in up to 98% specificity, and the recovered anammox 16S rRNA gene sequences shared >95% sequence identity with published sequences from anammox bacteria in the Candidatus “Scalindua” group. Furthermore, four primer sets used in detecting the hzo gene of anammox bacteria were highly specific (up to 92%) and efficient, and the newly designed primer set in this study amplified longer hzo gene segments suitable for phylogenetic analysis. The optimized primer set for the 16S rRNA gene and the newly designed primer set for the hzo gene were successfully applied to identify anammox bacteria from marine sediments of an aquaculture zone, a coastal wetland, and the deep ocean, three ecosystems that form a gradient of anthropogenic impact. Results indicated a broad distribution of anammox bacteria with highly niche-specific community structure within each marine ecosystem.

7.
16S rRNA gene analysis is the most convenient and robust method for microbiome studies. Inaccurate taxonomic assignment of bacterial strains could have deleterious effects, as all downstream analyses rely heavily on the accurate assessment of microbial taxonomy. The use of mock communities to check the reliability of the results has been suggested. However, the mock communities used in most studies often represent only a small fraction of taxa and serve mostly to validate the sequencing run and estimate sequencing artifacts. Moreover, the large number of databases and tools available for classification and taxonomic assignment of the 16S rRNA gene makes it challenging to select the best-suited method for a particular dataset. In the present study, we used authentic and validly published 16S rRNA gene type strain sequences (full length, V3-V4 region) and analyzed them using a widely used QIIME pipeline along with different parameters of OTU clustering and QIIME compatible databases. Data Analysis Measures (DAM) revealed a high discrepancy in ratifying the taxonomy at different taxonomic hierarchies. Beta diversity analysis showed clear segregation of different DAMs. Limited differences were observed in reference data set analysis using partial (V3-V4) and full-length 16S rRNA gene sequences, which indicates the reliability of partial 16S rRNA gene sequences in microbiome studies. Our analysis also highlights common discrepancies observed at various taxonomic levels using various methods and databases.

8.
Taxonomic classification of the thousands to millions of 16S rRNA gene sequences generated in microbiome studies is often achieved using a naïve Bayesian classifier (for example, the Ribosomal Database Project II (RDP) classifier), due to favorable trade-offs among automation, speed and accuracy. The resulting classification depends on the reference sequences and taxonomic hierarchy used to train the model; although the influence of primer sets and classification algorithms has been explored in detail, the influence of the training set has not been characterized. We compared classification results obtained using three different publicly available databases as training sets, applied to five different bacterial 16S rRNA gene pyrosequencing data sets (generated from human body, mouse gut, python gut, soil and anaerobic digester samples). We observed numerous advantages to using the largest, most diverse training set available, which we constructed from the Greengenes (GG) bacterial/archaeal 16S rRNA gene sequence database and the latest GG taxonomy. Phylogenetic clusters of previously unclassified experimental sequences were identified with notable improvements (for example, 50% reduction in reads unclassified at the phylum level in mouse gut, soil and anaerobic digester samples), especially for phylotypes belonging to specific phyla (Tenericutes, Chloroflexi, Synergistetes and Candidate phyla TM6, TM7). Trimming the reference sequences to the primer region resulted in systematic improvements in classification depth, with the greatest gains at higher confidence thresholds. Phylotypes unclassified at the genus level represented a greater proportion of the total community variation than classified operational taxonomic units in mouse gut and anaerobic digester samples, underscoring the need for greater diversity in existing reference databases.

9.
Next‐generation sequencing is a common method for analysing microbial community diversity and composition. Configuring an appropriate sequence processing strategy within the variety of tools and methods is a nontrivial task and can considerably influence the resulting community characteristics. We analysed the V4 region of 18S rRNA gene sequences of marine samples by 454‐pyrosequencing. In this process, we generated several data sets with QIIME, mothur, and a custom‐made pipeline based on DNAStar and the phylogenetic tree‐based PhyloAssigner. For all processing strategies, default parameter settings and selected variations were used. Our results revealed strong differences in the total number of operational taxonomic units (OTUs), indicating that sequence preprocessing and clustering had a major impact on protist diversity estimates. However, diversity estimates of the abundant biosphere (abundance of ≥1%) were reproducible for all conducted processing pipeline versions. A qualitative comparison of diatom genera revealed strong differences between the pipelines; phylogenetic placement of sequences came closest to light microscopy‐based diatom identification. We conclude that diversity studies using different sequence processing strategies are comparable if the focus is on higher taxonomic levels, and if abundance thresholds are used to filter out OTUs of the rare biosphere.
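
The abundance threshold mentioned in the abstract above can be sketched as a simple relative-abundance filter. This is only an illustration under stated assumptions: the abstract does not say whether the 1% cut-off was applied per sample or to pooled counts, so the sketch filters a single table of read counts, and the name `filter_rare_otus` is hypothetical.

```python
def filter_rare_otus(counts: dict[str, int], min_fraction: float = 0.01) -> dict[str, int]:
    """Keep only OTUs whose relative abundance is at least `min_fraction`.

    `counts` maps an OTU label to its read count in one sample (assumption:
    filtering is done per sample, which the abstract does not specify).
    """
    total = sum(counts.values())
    if total == 0:
        return {}
    return {otu: n for otu, n in counts.items() if n / total >= min_fraction}
```

With the default of 0.01, an OTU holding 5 of 1000 reads would be dropped as part of the rare biosphere, while one holding 95 of 1000 reads would be kept.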

10.
Exploring the metabolic characteristics of indigenous PAH degraders is critical to understanding the PAH bioremediation mechanism in the natural environment. While stable-isotope probing (SIP) is a viable method to identify functional microorganisms in complex environments, the metabolic characteristics of uncultured degraders are still elusive. Here, we investigated naphthalene (NAP) biodegradation in petroleum-polluted soils by combining SIP, amplicon sequencing and metagenome binning. Based on the SIP and amplicon sequencing results, an uncultured Gammaproteobacterium sp. was identified as the key NAP degrader. Additionally, the assembled genome of this uncultured degrader was successfully obtained from the 13C-DNA metagenomes by matching its 16S rRNA gene with the SIP-identified OTU sequence. Meanwhile, a number of NAP-degrading genes encoding naphthalene/PAH dioxygenases were identified in this genome, further confirming the direct involvement of this indigenous degrader in NAP degradation. The degrader contained genes related to the metabolism of several carbon sources, energy substances and vitamins, which may help explain why such microorganisms resist cultivation and may ultimately help achieve their cultivation. Our findings provide novel information on the mechanisms of in situ PAH biodegradation and add to current knowledge on the cultivation of non-culturable microorganisms by combining SIP and metagenome binning.

11.
MOTIVATION: With the advancements of next-generation sequencing technology, it is now possible to study samples directly obtained from the environment. Particularly, 16S rRNA gene sequences have been frequently used to profile the diversity of organisms in a sample. However, such studies are still taxed to determine both the number of operational taxonomic units (OTUs) and their relative abundance in a sample. RESULTS: To address these challenges, we propose an unsupervised Bayesian clustering method termed Clustering 16S rRNA for OTU Prediction (CROP). CROP can find clusters based on the natural organization of data without setting a hard cut-off threshold (3%/5%) as required by hierarchical clustering methods. By applying our method to several datasets, we demonstrate that CROP is robust against sequencing errors and that it produces more accurate results than conventional hierarchical clustering methods. Availability and Implementation: Source code freely available at the following URL: http://code.google.com/p/crop-tingchenlab/, implemented in C++ and supported on Linux and MS Windows.

12.
Analysis of microbial community structure by multivariate ordination methods, using data obtained by high‐throughput sequencing of amplified markers (i.e., DNA metabarcoding), often requires clustering of DNA sequences into operational taxonomic units (OTUs). Parameters for the clustering procedure tend not to be justified but are set by tradition rather than being based on explicit knowledge. In this study, we explore the extent to which ordination results are affected by variation in parameter settings for the clustering procedure. Amplicon sequence data from nine microbial community studies, representing different sampling designs, spatial scales and ecosystems, were subjected to clustering into OTUs at seven different similarity thresholds (clustering thresholds) ranging from 87% to 99% sequence similarity. The 63 data sets thus obtained were subjected to parallel DCA and GNMDS ordinations. The resulting community structures were highly similar across all clustering thresholds. We explain this pattern by the existence of strong ecological structuring gradients and phylogenetically diverse sets of abundant OTUs that are highly stable across clustering thresholds. Removing low‐abundance, rare OTUs had negligible effects on community patterns. Our results indicate that microbial data sets with a clear gradient structure are highly robust to choice of sequence clustering threshold.

13.
Aim: To study the genetic diversity of Chromobacterium haemolyticum isolates recovered from a natural tropical lake. Methods and Results: A set of 31 isolates was recovered from a bacterial freshwater community by conventional plating methods and subjected to genetic and phenotypic characterization. The 16S ribosomal RNA (rRNA) gene phylogeny revealed that the isolates were most closely related to C. haemolyticum. In addition to the molecular data, our isolates exhibited strong β‐haemolytic activity, were nonviolacein producers and utilized i‐inositol, d‐mannitol and d‐sorbitol, in contrast with the other known chromobacteria. Evaluation of the genetic diversity in the 16S rRNA gene, tRNA intergenic spacers (tDNA) and 16S‐23S internal transcribed spacers (ITS) unveiled different levels of genetic heterogeneity in the population, which were also observed with repetitive extragenic palindromic (rep)‐PCR genomic fingerprinting using the BOX‐AR1 primer. tDNA‐ and ITS‐PCR analyses were partially congruent with the 16S rRNA gene phylogeny. The isolates exhibited high resistance to β‐lactam antibiotics. Conclusion: The genetic heterogeneity of the population was revealed by 16S rRNA gene sequence, ITS and BOX‐PCR analysis. Significance and Impact of the Study: This study provides for the first time an insight into the genetic diversity of isolates phylogenetically close to C. haemolyticum.

14.
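[Background] For studying the diversity of ammonia-oxidizing archaea (AOA) in environmental samples, the amoA functional gene offers higher specificity and resolution as a molecular marker than the 16S rRNA gene, and it more accurately reflects the population structure and distribution of AOA in environmental samples. However, the analysis of high-throughput amoA amplicon sequencing is currently limited by two factors: first, there is no corresponding amoA gene reference database; second, the species-level similarity threshold for AOA amoA genes is unknown, so there is no clear threshold for delineating species-level operational taxonomic units (OTUs) during analysis. [Objective] To establish a method for analyzing AOA diversity based on amoA functional gene sequences, and to provide a reference for the analysis of functional microbial diversity based on high-throughput sequencing. [Methods] An AOA amoA gene reference database was constructed from the 34 AOA strains obtained to date by isolation and purification or by enrichment culture, together with environmental amoA gene sequences deposited in functional gene databases. The species-level similarity threshold for the amoA gene was determined by correlating the pairwise amoA gene similarities between strains with the corresponding 16S rRNA gene similarities. Using the MOTHUR software platform, the reference database and the threshold were applied to a diversity analysis of amoA gene sequences from a vertical water-column profile in the South China Sea. [Results] An archaeal amoA gene reference database containing 26 091 sequences was constructed, and 89% was established as the threshold for delineating species-level OTUs of archaeal amoA genes. The diversity analysis of the South China Sea water samples clearly showed the population structure and phylogenetic relationships of AOA at different water depths and effectively revealed differences in their vertical distribution. [Conclusion] A method for analyzing AOA diversity based on high-throughput amoA gene sequencing was established; it can effectively analyze AOA diversity in environmental samples.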
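
One way to picture the threshold determination described in the abstract above is a straight-line fit between paired similarities. This is a sketch under assumptions, not the authors' procedure: the abstract only says that pairwise amoA and 16S rRNA similarities were correlated, so the ordinary least-squares fit, the 97% 16S rRNA species cut-off used as the default, and the names `fit_line` and `marker_threshold` are all hypothetical choices made here for illustration.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def marker_threshold(similarity_16s: list[float],
                     similarity_marker: list[float],
                     cutoff_16s: float = 97.0) -> float:
    """Marker-gene similarity predicted at a given 16S rRNA similarity cut-off.

    Inputs are paired percent similarities for the same strain pairs.
    The 97% default is an assumed species-level 16S rRNA cut-off; the
    abstract does not state which value the authors used.
    """
    slope, intercept = fit_line(similarity_16s, similarity_marker)
    return slope * cutoff_16s + intercept
```

In practice the input lists would hold one entry per strain pair, and the returned value would be read as a candidate species-level similarity threshold for the marker gene.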

15.
A multi‐locus approach was used to examine the DNA sequences of 10 nominal species of blackfly in the Simulium subgenus Gomphostilbia (Diptera: Simuliidae) in Malaysia. Molecular data were acquired from partial DNA sequences of the mitochondria‐encoded cytochrome c oxidase subunit I (COI), 12S rRNA and 16S rRNA genes, and the nuclear‐encoded 18S rRNA and 28S rRNA genes. No single gene, nor the concatenated gene set, resolved all species or all relationships. However, all morphologically established species were supported by at least one gene. The multi‐locus sequence analysis revealed two distinct evolutionary lineages, conforming to the morphotaxonomically recognized Simulium asakoae and Simulium ceylonicum species groups.

16.
Methods to estimate microbial diversity have developed rapidly in an effort to understand the distribution and diversity of microorganisms in natural environments. For bacterial communities, the 16S rRNA gene is the phylogenetic marker gene of choice, but most studies select only a specific region of the 16S rRNA gene to estimate bacterial diversity. Whereas biases derived from DNA extraction, primer choice and PCR amplification are well documented, we here address how the choice of variable region can influence a wide range of standard ecological metrics, such as species richness, phylogenetic diversity, β-diversity and rank-abundance distributions. We used Illumina paired-end sequencing to estimate the bacterial diversity of 20 natural lakes across Switzerland derived from three trimmed variable 16S rRNA regions (V3, V4, V5). Species richness, phylogenetic diversity, community composition, β-diversity, and rank-abundance distributions differed significantly between 16S rRNA regions. Overall, patterns of diversity quantified by the V3 and V5 regions were more similar to one another than to those assessed by the V4 region. Similar results were obtained when analyzing the datasets with different sequence similarity thresholds during sequence clustering and when the same analysis was applied to a reference dataset of sequences from the Greengenes database. In addition, we measured species richness from the same lake samples using ARISA fingerprinting, but did not find a strong relationship between species richness estimated by Illumina and ARISA. We conclude that the selection of 16S rRNA region significantly influences the estimation of bacterial diversity and species distributions, and that caution is warranted when comparing data from different variable regions as well as when using different sequencing techniques.

17.
The exploration of microbial communities by sequencing 16S rRNA genes has expanded with low-cost, high-throughput sequencing instruments. Illumina-based 16S rRNA gene sequencing has recently gained popularity over 454 pyrosequencing due to its lower costs, higher accuracy and greater throughput. Although recent reports suggest that Illumina and 454 pyrosequencing provide similar beta diversity measures, it remains to be demonstrated that pre-existing 454 pyrosequencing workflows can transfer directly from 454 to Illumina MiSeq sequencing by simply changing the sequencing adapters of the primers. In this study, we modified 454 pyrosequencing primers targeting the V4-V5 hyper-variable regions of the 16S rRNA gene to be compatible with Illumina sequencers. Microbial communities from cows, humans, leeches, mice, sewage, and termites and a mock community were analyzed by 454 and MiSeq sequencing of the V4-V5 region and MiSeq sequencing of the V4 region. Our analysis revealed that reference-based OTU clustering alone introduced biases compared to de novo clustering, preventing certain taxa from being observed in some samples. Based on this we devised and recommend an analysis pipeline that includes read merging, contaminant filtering, and reference-based clustering followed by de novo OTU clustering, which produces diversity measures consistent with de novo OTU clustering analysis. Low levels of dataset contamination with Illumina sequencing were discovered that could affect analyses that require highly sensitive approaches. While moving to Illumina-based sequencing platforms promises to provide deeper insights into the breadth and function of microbial diversity, our results show that care must be taken to ensure that sequencing and processing artifacts do not obscure true microbial diversity.

18.
Highly conserved regions are attractive targets for detection and quantitation by PCR, but designing species-specific primer sets can be difficult. Ultimately, almost all primer sets are designed based upon literature searches in public domain databases, such as the National Center for Biotechnology Information (NCBI). Prudence suggests that the researcher needs to evaluate as many sequences as available for designing species-specific PCR primers. In this report, we aligned 11, 9, and 16 DNA sequences entered for Stachybotrys spp. rRNA, tri5, and β-tubulin regions, respectively. Although we were able to align and determine consensus primer sets for the 9 tri5 and the 16 β-tubulin sequences, there was no consensus sequence that could be derived from alignment of the 11 rRNA sequences. However, by judicious clustering of the sequences that aligned well, we were able to design three sets of primers for the rRNA region of S. chartarum. The two primer sets for tri5 and β-tubulin produced satisfactory PCR results for all four strains of S. chartarum used in this study whereas only one rRNA primer set of three produced similar satisfactory results. Ultimately, we were able to show that rRNA copy number is approximately 2-log greater than for tri5 and β-tubulin in the four strains of S. chartarum tested.

19.

Background

The intra- and inter-species genetic diversity of bacteria and the absence of ‘reference’, or the most representative, sequences of individual species present a significant challenge for sequence-based identification. The aims of this study were to determine the utility, and compare the performance of several clustering and classification algorithms to identify the species of 364 sequences of 16S rRNA gene with a defined species in GenBank, and 110 sequences of 16S rRNA gene with no defined species, all within the genus Nocardia.

Methods

A total of 364 16S rRNA gene sequences of Nocardia species were studied. In addition, 110 16S rRNA gene sequences assigned only to the Nocardia genus level at the time of submission to GenBank were used for machine learning classification experiments. Different clustering algorithms were compared with a novel algorithm or the linear mapping (LM) of the distance matrix. Principal Components Analysis was used for the dimensionality reduction and visualization.

Results

The LM algorithm achieved the highest performance and classified the set of 364 16S rRNA sequences into 80 clusters, the majority of which (83.52%) corresponded with the original species. The most representative 16S rRNA sequences for individual Nocardia species have been identified as ‘centroids’ in respective clusters from which the distances to all other sequences were minimized; 110 16S rRNA gene sequences with identifications recorded only at the genus level were classified using machine learning methods. Simple kNN machine learning demonstrated the highest performance and classified Nocardia species sequences with an accuracy of 92.7% and a mean frequency of 0.578.

Conclusion

The identification of centroids of 16S rRNA gene sequence clusters using novel distance matrix clustering enables the identification of the most representative sequences for each individual species of Nocardia and allows the quantitation of inter- and intra-species variability.
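
The notion of a cluster "centroid" as the sequence from which distances to all other members are minimized can be sketched as a medoid search over a precomputed distance matrix. This is an illustrative reading of the abstract, not the authors' linear mapping (LM) algorithm: it assumes the quantity minimized is the sum of distances, and the name `medoid` is hypothetical.

```python
def medoid(distances: list[list[float]], members: list[int]) -> int:
    """Index of the cluster member with the smallest total distance to the others.

    `distances` is a symmetric pairwise distance matrix over all sequences;
    `members` lists the row/column indices that belong to one cluster.
    Ties are broken in favour of the member listed first.
    """
    return min(members, key=lambda i: sum(distances[i][j] for j in members))
```

Calling `medoid` once per cluster yields one representative sequence per cluster, which matches the role the abstract assigns to centroids.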

20.
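Application of the 16S rRNA gene in microbial ecology   Total citations: 10 (self-citations: 0, citations by others: 10)
The 16S rRNA (small subunit ribosomal RNA) gene is the most commonly used molecular marker (biomarker) in the phylogenetic classification of prokaryotic microorganisms and is widely used in microbial ecology research. In recent years, with continuing advances in high-throughput sequencing technology and data analysis methods, a large body of research based on the 16S rRNA gene has driven rapid progress in microbial ecology. However, using the 16S rRNA gene as a molecular marker also raises a number of problems, such as horizontal gene transfer, heterogeneity among multiple gene copies, differences in amplification efficiency, and the choice of data analysis methods, all of which affect the accuracy of analyses of microbial community composition and diversity. This article summarizes current progress in using the 16S rRNA gene to analyze microbial community composition and diversity, focusing on the main existing problems and the development of the various analysis methods, especially the experimental and data-processing issues related to high-throughput sequencing.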
