Similar articles
1.
Removing Noise From Pyrosequenced Amplicons   Total citations: 2, self-citations: 0, citations by others: 2

Background  

In many environmental genomics applications a homologous region of DNA from a diverse sample is first amplified by PCR and then sequenced. The next generation sequencing technology, 454 pyrosequencing, has allowed much larger read numbers from PCR amplicons than ever before. This has revolutionised the study of microbial diversity as it is now possible to sequence a substantial fraction of the 16S rRNA genes in a community. However, there is a growing realisation that because of the large read numbers and the lack of consensus sequences it is vital to distinguish noise from true sequence diversity in this data. Otherwise this leads to inflated estimates of the number of types or operational taxonomic units (OTUs) present. Three sources of error are important: sequencing error, PCR single base substitutions and PCR chimeras. We present AmpliconNoise, a development of the PyroNoise algorithm that is capable of separately removing 454 sequencing errors and PCR single base errors. We also introduce a novel chimera removal program, Perseus, that exploits the sequence abundances associated with pyrosequencing data. We use data sets where samples of known diversity have been amplified and sequenced to quantify the effect of each of the sources of error on OTU inflation and to validate these algorithms.

2.
G C Wang, Y Wang. Applied microbiology, 1997, 63(12): 4645-4650
PCR is routinely used in amplification and cloning of rRNA genes from environmental DNA samples for studies of microbial community structure and identification of novel organisms. There have been concerns about generation of chimeric sequences as a consequence of PCR coamplification of highly conserved genes, because such sequences may lead to reports of nonexistent organisms. To quantify the frequency of chimeric molecule formation, mixed genomic DNAs from eight actinomycete species whose 16S rRNA sequences had been determined were used for PCR coamplification of 16S rRNA genes. A large number of cloned 16S ribosomal DNAs were examined by sequence analysis, and chimeric molecules were identified by multiple-sequence alignment with reference species. Here, we report that the level of occurrence of chimeric sequences after 30 cycles of PCR amplification was 32%. We also show that PCR-induced chimeras were formed between different rRNA gene copies from the same organism. Because of the wide use of PCR for direct isolation of 16S rRNA sequences from environmental DNA to assess microbial diversity, the extent of chimeric molecule formation deserves serious attention.

3.
Due to potential sequencing errors in pyrosequencing data, species richness and diversity indices of microbial systems can be miscalculated. The "traditional" sequence refinement method (e.g., filtering on length, primer errors, and ambiguous nucleotides) is not sufficient to account for overestimations. Recent in silico and single-organism studies have revealed the importance of sequence quality scores in the estimation of ecological indices; however, this is the first study to compare quality-score stringencies across four regions of the SSU rRNA gene sequence (V1V2, V3, V4, and V6) with actual environmental samples compared directly to corresponding clone libraries produced from the same primer sets. The nucleic acid sequences determined via pyrosequencing were subjected to varying quality-score cutoffs that ranged from 25 to 32, and at each quality-score cutoff, either 10 or 15% of the nucleotides were allowed to be below the cutoff. When species richness estimates were compared for the tested samples, the cutoff values of Q27(15%), Q30(10%), and Q32(15%) for V1V2, V4, and V6, respectively, gave estimates similar to those obtained with clone libraries and Sanger sequencing. The most stringent Q tested (Q32(10%)) was not enough to account for species richness inflation of the V3 region pyrosequence data. Results indicated that quality-score assessment greatly improved estimates of ecological indices for environmental samples (species richness and α-diversity) and that the effect of quality-score filtering was region-dependent.
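
The filtering rule described in this abstract (a quality-score cutoff plus a tolerated fraction of bases below it) can be illustrated with a minimal sketch. This is not the authors' code: the function name `passes_quality`, its parameters, and the example scores are hypothetical, and the scores are assumed to be per-base Phred values already parsed from the reads.

```python
from typing import Sequence


def passes_quality(quals: Sequence[int], cutoff: int, max_frac_below: float) -> bool:
    """Return True if at most `max_frac_below` of the bases score below `cutoff`.

    Hypothetical helper: Q27(15%) in the abstract's notation would correspond
    to cutoff=27, max_frac_below=0.15 under this reading of the rule.
    """
    if not quals:
        # An empty read carries no information, so it is rejected.
        return False
    below = sum(1 for q in quals if q < cutoff)
    return below / len(quals) <= max_frac_below


# Made-up Phred scores for a single 10-base read.
read = [35, 34, 33, 31, 30, 29, 28, 26, 36, 37]

lenient = passes_quality(read, cutoff=27, max_frac_below=0.15)  # 1 of 10 bases below Q27
strict = passes_quality(read, cutoff=30, max_frac_below=0.10)   # 3 of 10 bases below Q30
print(lenient, strict)
```

Under these assumptions the same read is kept at the lenient setting and dropped at the strict one, which is the kind of stringency trade-off the study compares across gene regions.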

4.
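Application of the 16S rRNA gene in microbial ecology   Total citations: 10, self-citations: 0, citations by others: 10
The 16S rRNA (small subunit ribosomal RNA) gene is the most commonly used molecular marker (biomarker) in phylogenetic and taxonomic studies of prokaryotic microorganisms and is widely applied in microbial ecology research. In recent years, continuing advances in high-throughput sequencing technologies and data analysis methods have enabled a large body of 16S rRNA gene-based research that has driven rapid progress in microbial ecology. However, using the 16S rRNA gene as a molecular marker also raises a number of problems, such as horizontal gene transfer, heterogeneity among multiple gene copies, differences in amplification efficiency, and the choice of data analysis methods, all of which affect the accuracy of analyses of microbial community composition and diversity. This paper summarizes current progress in using the 16S rRNA gene to analyze microbial community composition and diversity, focusing on the main outstanding problems and the development of analysis methods, particularly the experimental and data-processing issues associated with high-throughput sequencing.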

5.
The formation of chimeric sequences can create significant methodological bias in PCR‐based DNA metabarcoding analyses. During mixed‐template amplification of barcoding regions, chimera formation is frequent and well documented. However, profiling of fungal communities typically uses the more variable rDNA region ITS. Due to a larger research community, tools for chimera detection have been developed mainly for the 16S/18S markers. However, these tools are widely applied to the ITS region without verification of their performance. We examined the rate of chimera formation during amplification and 454 sequencing of the ITS2 region from fungal mock communities of different complexities. We evaluated the chimera‐detecting ability of two common chimera‐checking algorithms: perseus and uchime. Large proportions of the chimeras reported were false positives. No false negatives were found in the data set. Verified chimeras accounted for only 0.2% of the total ITS2 reads, which is considerably less than what is typically reported in 16S and 18S metabarcoding analyses. Verified chimeric ‘parent sequences’ had significantly higher per cent identity to one another than to random members of the mock communities. Community complexity increased the rate of chimera formation. GC content was higher around the verified chimeric break points, potentially facilitating chimera formation through base pair mismatching in the neighbouring regions of high similarity in the chimeric region. We conclude that the hypervariable nature of the ITS region seems to buffer the rate of chimera formation in comparison with other, less variable barcoding regions, due to shorter regions of high sequence similarity.

6.
Next-generation DNA sequencing (NGS) approaches are rapidly surpassing Sanger sequencing for characterizing the diversity of natural microbial communities. Despite this rapid transition, few comparisons exist between Sanger sequences and the generally much shorter reads of NGS. Operational taxonomic units (OTUs) derived from full-length (Sanger sequencing) and pyrotag (454 sequencing of the V9 hypervariable region) sequences of 18S rRNA genes from 10 global samples were analyzed in order to compare the resulting protistan community structures and species richness. Pyrotag OTUs called at 98% sequence similarity yielded numbers of OTUs that were similar overall to those for full-length sequences when the latter were called at 97% similarity. Singleton OTUs strongly influenced estimates of species richness but not the higher-level taxonomic composition of the community. The pyrotag and full-length sequence data sets had slightly different taxonomic compositions of rhizarians, stramenopiles, cryptophytes, and haptophytes, but the two data sets had similarly high compositions of alveolates. Pyrotag-based OTUs were often derived from sequences that mapped to multiple full-length OTUs at 100% similarity. Thus, pyrotags sequenced from a single hypervariable region might not be appropriate for establishing protistan species-level OTUs. However, nonmetric multidimensional scaling plots constructed with the two data sets yielded similar clusters, indicating that beta diversity analysis results were similar for the Sanger and NGS sequences. Short pyrotag sequences can provide holistic assessments of protistan communities, although care must be taken in interpreting the results. The longer reads (>500 bp) that are now becoming available through NGS should provide powerful tools for assessing the diversity of microbial eukaryotic assemblages.

7.
It is important to estimate the true microbial diversities accurately for a comparative microbial diversity analysis among various ecological settings in ecological models. Despite drastically increasing amounts of 16S rRNA gene targeting pyrosequencing data, sampling and data interpretation for comparative analysis have not yet been standardized. For more accurate bacterial diversity analyses, the influences of soil heterogeneity and sequence resolution on bacterial diversity estimates were investigated using pyrosequencing data of oak and pine forest soils with focus on the bacterial 16S rRNA gene. Soil bacterial community sets were phylogenetically clustered into two separate groups by forest type. Rarefaction curves showed that bacterial communities sequenced from the DNA mixtures and the DNAs of the soil mixtures had midsize richness compared with other samples. Richness and diversity estimates were highly variable depending on the sequence read numbers. Bacterial richness estimates (ACE, Chao 1 and Jack) of the forest soils had positive linear relationships with the sequence read number. Bacterial diversity estimates (NPShannon, Shannon and the inverse Simpson) of the forest soils were also positively correlated with the sequence read number. One-way ANOVA showed that sequence resolution significantly affected the α-diversity indices (P<0.05), but the soil heterogeneity did not (P>0.05). For an unbiased evaluation, richness and diversity estimates should be calculated and compared from subsets of the same size.
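
The recommendation to compare richness from subsets of the same size can be sketched as follows. This is an illustrative sketch, not the procedure used in the study: `rarefy` and `chao1` are hypothetical helper names, the OTU counts are invented, and the Chao 1 variant shown is the commonly used bias-corrected form, S_obs + F1(F1 - 1) / (2(F2 + 1)), where F1 and F2 are the numbers of singleton and doubleton OTUs.

```python
import random
from collections import Counter
from typing import Dict


def rarefy(counts: Dict[str, int], depth: int, seed: int = 0) -> Dict[str, int]:
    """Randomly subsample `depth` reads without replacement from an OTU table row."""
    reads = [otu for otu, n in counts.items() for _ in range(n)]
    if depth > len(reads):
        raise ValueError("depth exceeds the number of reads in the sample")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return dict(Counter(rng.sample(reads, depth)))


def chao1(counts: Dict[str, int]) -> float:
    """Bias-corrected Chao 1 richness estimate from OTU abundances."""
    abund = [n for n in counts.values() if n > 0]
    s_obs = len(abund)
    f1 = sum(1 for n in abund if n == 1)  # singleton OTUs
    f2 = sum(1 for n in abund if n == 2)  # doubleton OTUs
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Invented OTU counts for two samples sequenced to different depths.
sample_a = {"otu1": 120, "otu2": 40, "otu3": 5, "otu4": 2, "otu5": 1}
sample_b = {"otu1": 30, "otu2": 12, "otu3": 2, "otu6": 1}

# Rarefy both samples to the depth of the shallower one before comparing.
depth = min(sum(sample_a.values()), sum(sample_b.values()))
for name, sample in (("A", sample_a), ("B", sample_b)):
    sub = rarefy(sample, depth)
    print(name, sum(sub.values()), round(chao1(sub), 2))
```

Because the subsample is random, a more careful analysis would repeat the draw many times and average the estimates; a single fixed-seed draw is used here only to keep the example short.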

8.
Analysis of microbial communities by high-throughput pyrosequencing of SSU rRNA gene PCR amplicons has transformed microbial ecology research and led to the observation that many communities contain a diverse assortment of rare taxa, a phenomenon termed the Rare Biosphere. Multiple studies have investigated the effect of pyrosequencing read quality on operational taxonomic unit (OTU) richness for contrived communities, yet there is limited information on the fidelity of community structure estimates obtained through this approach. Given that PCR biases are widely recognized, and further unknown biases may arise from the sequencing process itself, a priori assumptions about the neutrality of the data generation process are at best unvalidated. Furthermore, post-sequencing quality control algorithms have not been explicitly evaluated for the accuracy of recovered representative sequences and its impact on downstream analyses, reducing useful discussion on pyrosequencing reads to their diversity and abundances. Here we report on community structures and sequences recovered for in vitro-simulated communities consisting of twenty 16S rRNA gene clones tiered at known proportions. PCR amplicon libraries of the V3-V4 and V6 hypervariable regions from the in vitro-simulated communities were sequenced using the Roche 454 GS FLX Titanium platform. Commonly used quality control protocols resulted in the formation of OTUs with >1% abundance composed entirely of erroneous sequences, while over-aggressive clustering approaches obfuscated real, expected OTUs. The pyrosequencing process itself did not appear to impose significant biases on overall community structure estimates, although the detection limit for rare taxa may be affected by PCR amplicon size and quality control approach employed. Meanwhile, PCR biases associated with the initial amplicon generation may impose greater distortions in the observed community structure.

9.
Analyses of degraded DNA are typically hampered by contamination, especially when employing universal primers such as commonly used in environmental DNA studies. In addition to false-positive results, the amplification of contaminant DNA may cause false-negative results because of competition, or bias, during the PCR. In this study, we test the utility of human-specific blocking primers in mammal diversity analyses of ancient permafrost samples from Siberia. Using quantitative PCR (qPCR) on human and mammoth DNA, we first optimized the design and concentration of blocking primer in the PCR. Subsequently, 454 pyrosequencing of ancient permafrost samples amplified with and without the addition of blocking primer revealed that DNA sequences from a diversity of mammalian representatives of the Beringian megafauna were retrieved only when the blocking primer was added to the PCR. Notably, we observe the first retrieval of woolly rhinoceros (Coelodonta antiquitatis) DNA from ancient permafrost cores. In contrast, reactions without blocking primer resulted in complete dominance by human DNA sequences. These results demonstrate that in ancient environmental analyses, the PCR can be biased towards the amplification of contaminant sequences to such an extent that retrieval of the endogenous DNA is severely restricted. The application of blocking primers is a promising tool to avoid this bias and can greatly enhance the quantity and the diversity of the endogenous DNA sequences that are amplified.

10.
Accurate estimation of biological diversity in environmental DNA samples using high-throughput amplicon pyrosequencing must account for errors generated by PCR and sequencing. We describe a novel approach to distinguish the underlying sequence diversity in environmental DNA samples from errors that uses information on the abundance distribution of similar sequences across independent samples, as well as the frequency and diversity of sequences within individual samples. We have further refined this approach into a bioinformatics pipeline, Amplicon Pyrosequence Denoising Program (APDP) that is able to process raw sequence datasets into a set of validated sequences in formats compatible with commonly used downstream analyses packages. We demonstrate, by sequencing complex environmental samples and mock communities, that APDP is effective for removing errors from deeply sequenced datasets comprising biological and technical replicates, and can efficiently denoise single-sample datasets. APDP provides more conservative diversity estimates for complex datasets than other approaches; however, for some applications this may provide a more accurate and appropriate level of resolution, and result in greater confidence that returned sequences reflect the diversity of the underlying sample.

11.
Soil fungal communities underneath willow canopies that had established on the forefront of a receding glacier were analyzed by cloning the polymerase chain reaction (PCR)-amplified partial small subunit (18S) of the ribosomal (rRNA) genes. Congruence between two sets of fungus-specific primers targeting the same gene region was analyzed by comparisons of inferred neighbor-joining topologies. The importance of chimeric sequences was evaluated by Chimera Check (Ribosomal Database Project) and by data reanalyses after omission of potentially chimeric regions at the 5'- and 3'-ends of the cloned amplicons. Diverse communities of fungi representing Ascomycota, Basidiomycota, Chytridiomycota, and Zygomycota were detected. Ectomycorrhizal fungi comprised a major component in the early plant communities in primary successional ecosystems, as both primer sets frequently detected basidiomycetes (Russulaceae and Thelephoraceae) forming mycorrhizal symbioses. Various ascomycetes (Ophiostomatales, Pezizales, and Sordariales) of uncertain function dominated the clone libraries amplified from the willow canopy soil with one set of primers, whereas the clone libraries of the amplicons generated with the second primer set were dominated by basidiomycetes. Accordingly, primer bias is an important factor in fungal community analyses using DNA extracted from environmental samples. A large proportion (>30%) of the cloned sequences were concluded to be chimeric based on their changing positions in inferred phylogenies after omission of possibly chimeric data. Many chimeric sequences were positioned basal to existing classes of fungi, suggesting that PCR artifacts may cause frequent discovery of new, higher level taxa (order, class) in direct PCR analyses. Longer extension times during the PCR amplification and a smaller number of PCR cycles are necessary precautions to allow collection of reliable environmental sequence data.

12.
Microbial ecology has been profoundly advanced by the ability to profile complex microbial communities by sequencing of marker genes amplified from environmental samples. However, inclusion of appropriate controls is vital to revealing the limitations and biases of this technique. “Mock community” samples, in which the composition and relative abundances of community members are known, are particularly valuable for guiding library preparation and data processing decisions. I generated a set of three mock communities using 19 different fungal taxa and demonstrate their utility by contrasting amplicon sequencing data obtained for the same communities under modifications to PCR conditions during library preparation. Increasing the number of PCR cycles elevated rates of chimera formation, and of errors in the final data set. Extension time during PCR had little impact on chimera formation, error rate or observed community structure. Polymerase fidelity impacted error rates significantly. Despite a high error rate, a master mix optimized to minimize amplification bias yielded profiles that were most similar to the true community structure. Bias against particular taxa differed among ITS1 vs. ITS2 loci. Preclustering nearly identical reads substantially reduced error rates, but did not improve similarity to the expected community structure. Inaccuracies in amplicon sequence‐based estimates of fungal community structure were associated with amplification bias and size selection processes, as well as variable culling rates among reads from different taxa. In some cases, the numerically dominant taxon was completely absent from final data sets, highlighting the need for further methodological improvements to avoid biased observations of community profiles.

13.
Analyses of the structure and function of microbial communities are highly constrained by the diversity of organisms present within most environmental samples. A common approach is to rely almost entirely on DNA sequence data for estimates of microbial diversity, but to date there is no objective method of clustering sequences into groups that is grounded in evolutionary theory of what constitutes a biological lineage. The general mixed Yule-coalescent (GMYC) model uses a likelihood-based approach to distinguish population-level processes within lineages from processes associated with speciation and extinction, thus identifying a distinct point where extant lineages became independent. Using two independent surveys of DNA sequences associated with a group of ubiquitous plant-symbiotic fungi, we compared estimates of species richness derived using the GMYC model to those based on operational taxonomic units (OTUs) defined by fixed levels of sequence similarity. The model predicted lower species richness in these surveys than did traditional methods of sequence similarity. Here, we show for the first time that groups delineated by the GMYC model better explained variation in the distribution of fungi in relation to putative niche-based variables associated with host species identity, edaphic factors, and aspects of how the sampled ecosystems were managed. Our results suggest the coalescent-based GMYC model successfully groups environmental sequences of fungi into clusters that are ecologically more meaningful than more arbitrary approaches for estimating species richness.

14.
Unique DNA sequences are present in all species and can be used as biomarkers for the detection of cells from that species. These DNA sequences can most easily be detected using the polymerase chain reaction (PCR), which allows very small quantities of target DNA sequence to be amplified even when the target is mixed with large amounts of nontarget DNA. PCR amplification of DNA markers that are present in a wide range of species has proven very useful for studies of species diversity in environmental samples. The taxonomic range of species to be identified from environmental samples may often need to be restricted to simplify downstream analyses and to ensure that less abundant sequences are amplified. Group-specific PCR primer sets are one means of specifying the range of taxa that produce an amplicon in a PCR. We have developed a range of group-specific PCR primers for studying the prey diversity found in predator stomach contents and scats. These primers, their design and their application to studying prey diversity and identity in predator diet are described.

15.
Next‐generation sequencing technologies have provided unprecedented insights into fungal diversity and ecology. However, intrinsic biases and insufficient quality control in next‐generation methods can lead to difficult‐to‐detect errors in estimating fungal community richness, distributions and composition. The aim of this study was to examine how tissue storage prior to DNA extraction, primer design and various quality‐control approaches commonly used in 454 amplicon pyrosequencing might influence ecological inferences in studies of endophytic and endolichenic fungi. We first contrast 454 data sets generated contemporaneously from subsets of the same plant and lichen tissues that were stored in CTAB buffer, dried in silica gel or freshly frozen prior to DNA extraction. We show that storage in silica gel markedly limits the recovery of sequence data and yields a small fraction of the diversity observed by the other two methods. Using lichen mycobiont sequences as internal positive controls, we next show that despite careful filtering of raw reads and utilization of current best‐practice OTU clustering methods, homopolymer errors in sequences representing rare taxa artificially increased estimates of richness c. 15‐fold in a model data set. Third, we show that inferences regarding endolichenic diversity can be improved using a novel primer that reduces amplification of the mycobiont. Together, our results provide a rationale for selecting tissue treatment regimes prior to DNA extraction, demonstrate the efficacy of reducing mycobiont amplification in studies of the fungal microbiomes of lichen thalli and highlight the difficulties in differentiating true information about fungal biodiversity from methodological artefacts.

16.
The rDNA internal transcribed spacer (ITS) region has been accepted as a DNA barcoding marker for fungi and is widely used in phylogenetic studies; however, intragenomic ITS variability has been observed in a broad range of taxa, including prokaryotes, plants, animals, and fungi, and this variability has the potential to inflate species richness estimates in molecular investigations of environmental samples. In this study 454 amplicon pyrosequencing of the ITS1 region was applied to 99 phylogenetically diverse axenic single‐spore cultures of fungi (Dikarya: Ascomycota and Basidiomycota) to investigate levels of intragenomic variation. Three species (one Basidiomycota and two Ascomycota), in addition to a positive control species known to contain ITS paralogs, displayed levels of molecular variation indicative of intragenomic variation; taxon inflation due to presumed intragenomic variation was ≈9%. Intragenomic variability in the ITS region appears to be widespread but relatively rare in fungi (≈3–5% of species investigated in this study), suggesting this problem may have minor impacts on species richness estimates relative to PCR and/or pyrosequencing errors. Our results indicate that 454 amplicon pyrosequencing represents a powerful tool for investigating levels of ITS intragenomic variability across taxa, which may be valuable for better understanding the fundamental mechanisms underlying concerted evolution of repetitive DNA regions.

17.
"Barcode-tagged" PCR primers used for multiplex amplicon sequencing generate a thus-far-overlooked amplification bias that produces variable terminal restriction fragment length polymorphism (T-RFLP) and pyrosequencing data from the same environmental DNA template. We propose a simple two-step PCR approach that increases reproducibility and consistently recovers higher genetic diversity in pyrosequencing libraries.

18.
Pyrosequencing of an artificially assembled nematode community of known nematode species at known densities allowed us to characterize the potential extent of chimera problems in multi-template eukaryotic samples. Chimeras were confirmed to be very common, making up to 17% of all high quality pyrosequencing reads and exceeding 40% of all OCTUs (operationally clustered taxonomic units). Typically, chimeric OCTUs were made up of single or double reads, but very well covered OCTUs were also present. As expected, the majority of chimeras were formed between two DNA molecules of nematode origin, but a small proportion involved a nematode and a fragment of another eukaryote origin. In addition, examples of a combination of three or even four different template origins were observed. All chimeras were associated with the presence of conserved regions, with 80% of all recombinants following a conserved region of about 25 bp. While there was a positive influence of species abundance on the overall number of chimeras, the influence of specific-species identity was less apparent. We also suggest that the problem is not nematode exclusive, but instead applies to other eukaryotes typically accompanying nematodes (e.g. fungi, rotifers, tardigrades). An analysis of real environmental samples revealed the presence of chimeras for all eukaryotic taxa in patterns similar to that observed in artificial nematode communities. This information warrants caution for biodiversity studies utilizing a step of PCR amplification of complex DNA samples. When unrecognized, abundant chimeric sequences lead to falsely inflated estimates of eukaryotic biodiversity.

19.
Efficient methods for constructing 16S tag amplicon libraries for pyrosequencing are needed for the rapid and thorough screening of infectious bacterial diversity from host tissue samples. Here we have developed a double‐nested PCR methodology that generates 16S tag amplicon libraries from very small amounts of bacteria/host samples. This methodology was tested for 133 kidney samples from the lake whitefish Coregonus clupeaformis (Salmonidae) sampled in five different lake populations. The double‐nested PCR efficiency was compared with two other PCR strategies: single primer pair amplification and simple nested PCR. The double‐nested PCR was the only amplification strategy to provide highly specific amplification of bacterial DNA. The resulting 16S amplicon libraries were synthesized and pyrosequenced using 454 FLX technology to analyse the variation of pathogenic bacteria abundance. The proportion of the community sequenced was very high (Good’s coverage estimator; mean = 95.4%). Furthermore, there were no significant differences of sequence coverage among samples. Finally, the occurrence of chimeric amplicons was very low. Therefore, the double‐nested PCR approach provides a rapid, informative and cost‐effective method for screening fish immunobiomes and is most likely applicable to other low‐density microbiomes as well.
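
Good's coverage estimator, cited in this abstract, is commonly computed as C = 1 - F1/N, where F1 is the number of OTUs observed exactly once and N is the total number of reads. The sketch below assumes that standard definition; the function name `goods_coverage` and the counts are hypothetical and are not taken from the study.

```python
from typing import Iterable


def goods_coverage(otu_counts: Iterable[int]) -> float:
    """Good's coverage: 1 - (number of singleton OTUs / total number of reads)."""
    counts = [n for n in otu_counts if n > 0]
    total = sum(counts)
    if total == 0:
        raise ValueError("coverage is undefined for a sample with no reads")
    singletons = sum(1 for n in counts if n == 1)
    return 1.0 - singletons / total


# Invented per-OTU read counts for one sample: 100 reads, 5 singleton OTUs.
sample = [50, 30, 15, 1, 1, 1, 1, 1]
print(f"Good's coverage: {goods_coverage(sample):.1%}")
```

A value near 1 means that few reads belong to OTUs seen only once, so further sequencing of the same library would be expected to reveal relatively few new OTUs.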

20.
Metabarcoding of environmental samples on second‐generation sequencing platforms has rapidly become a valuable tool for ecological studies. A fundamental assumption of this approach is the reliance on being able to track tagged amplicons back to the samples from which they originated. In this study, we address the problem of sequences in metabarcoding sequencing outputs with false combinations of used tags (tag jumps). Unless these sequences can be identified and excluded from downstream analyses, tag jumps creating sequences with false, but already used tag combinations, can cause incorrect assignment of sequences to samples and artificially inflate diversity. In this study, we document and investigate tag jumping in metabarcoding studies on Illumina sequencing platforms by amplifying mixed‐template extracts obtained from bat droppings and leech gut contents with tagged generic arthropod and mammal primers, respectively. We found that an average of 2.6% and 2.1% of sequences had tag combinations, which could be explained by tag jumping in the leech and bat diet study, respectively. We suggest that tag jumping can happen during blunt‐ending of pools of tagged amplicons during library build and as a consequence of chimera formation during bulk amplification of tagged amplicons during library index PCR. We argue that tag jumping and contamination between libraries represents a considerable challenge for Illumina‐based metabarcoding studies, and suggest measures to avoid false assignment of tag jumping‐derived sequences to samples.
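
One way to picture how tag jumps can be quantified is to count reads whose forward/reverse tag combination was never assigned to any sample. The sketch below is an illustration under that assumption, not the authors' pipeline: `tag_jump_summary`, the tag names, and the read list are all hypothetical. As the abstract notes, jumps that happen to recreate a combination already in use cannot be caught this way.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

TagPair = Tuple[str, str]  # (forward tag, reverse tag)


def tag_jump_summary(
    read_tags: Iterable[TagPair], used_pairs: Set[TagPair]
) -> Tuple[Dict[TagPair, int], Dict[TagPair, int], float]:
    """Split reads into assigned vs. unassigned tag combinations.

    Returns (kept counts, suspect counts, fraction of reads that are suspect).
    """
    counts = Counter(read_tags)
    kept = {pair: n for pair, n in counts.items() if pair in used_pairs}
    suspect = {pair: n for pair, n in counts.items() if pair not in used_pairs}
    total = sum(counts.values())
    frac_suspect = sum(suspect.values()) / total if total else 0.0
    return kept, suspect, frac_suspect


# Hypothetical design: two samples, each with its own tag combination.
used = {("F1", "R1"), ("F2", "R2")}
reads = [("F1", "R1")] * 3 + [("F2", "R2")] * 2 + [("F1", "R2")]

kept, suspect, frac = tag_jump_summary(reads, used)
print(kept)
print(suspect)
print(round(frac, 3))
```

Leaving some tag combinations deliberately unused in the experimental design is what makes this kind of count possible, since any read carrying such a combination cannot have come from a real sample.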
