Similar literature
Found 20 similar articles; search took 78 ms
1.
DNA metabarcoding is a promising method for describing communities and estimating biodiversity. This approach uses high‐throughput sequencing of targeted markers to identify species in a complex sample. By convention, sequences are clustered at a predefined sequence divergence threshold (often 3%) into operational taxonomic units (OTUs) that serve as a proxy for species. However, variable levels of interspecific marker variation across taxonomic groups make clustering sequences from a phylogenetically diverse dataset into OTUs at a uniform threshold problematic. In this study, we use mock zooplankton communities to evaluate the accuracy of species richness estimates when following conventional protocols to cluster hypervariable sequences of the V4 region of the small subunit ribosomal RNA gene (18S) into OTUs. By including individually tagged single specimens and “populations” of various species in our communities, we examine the impact of intra‐ and interspecific diversity on OTU clustering. Communities consisting of single individuals per species generated a correspondence of 59–84% between OTU number and species richness at a 3% divergence threshold. However, when multiple individuals per species were included, the correspondence between OTU number and species richness dropped to 31–63%. Our results suggest that intraspecific variation in this marker can often exceed 3%, such that a single species does not always correspond to one OTU. We advocate the need to apply group‐specific divergence thresholds when analyzing complex and taxonomically diverse communities, but also encourage the development of additional filtering steps that allow identification of artifactual rRNA gene sequences or pseudogenes that may generate spurious OTUs.
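The fixed-threshold clustering described in this abstract can be illustrated with a minimal Python sketch. It assumes the sequences are already aligned to equal length and uses a simple greedy, centroid-based scheme with uncorrected p-distance. The function names `p_distance` and `greedy_otus` are hypothetical, and this is not the pipeline used by the authors.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs: list[str], threshold: float = 0.03) -> list[list[str]]:
    """Greedy clustering: each sequence joins the first OTU whose centroid
    (the first sequence placed in that OTU) lies within `threshold`."""
    otus: list[list[str]] = []
    for s in seqs:
        for otu in otus:
            if p_distance(s, otu[0]) <= threshold:
                otu.append(s)
                break
        else:
            # No existing centroid is close enough, so start a new OTU.
            otus.append([s])
    return otus
```

With a single uniform threshold, two variants of one species that differ by more than 3% land in separate OTUs, which is the kind of richness inflation the abstract reports.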

2.
Environmental DNA studies targeting multiple taxa using metabarcoding provide remarkable insights into levels of species diversity in any habitat. The main drawbacks are the presence of primer bias and difficulty in identifying rare species. We tested a DNA sequence‐capture method in parallel with the metabarcoding approach to reveal possible advantages of one method over the other. Both approaches were performed using the same eDNA samples and the same 18S and COI regions, followed by high throughput sequencing. Metabarcoded eDNA libraries were PCR amplified with one primer pair from 18S and COI genes. DNA sequence‐capture libraries were enriched with 3,639 baits targeting the same gene regions. We tested amplicon sequence variants (ASVs) and operational taxonomic units (OTUs) in silico approaches for both markers and methods, using for this purpose the metabarcoding data set. ASV methods uncovered more species for the COI gene, whereas the opposite occurred for the 18S gene, suggesting that clustering reads into OTUs could bias diversity richness especially using 18S with relaxed thresholds. Additionally, metabarcoding and DNA sequence‐capture recovered 80%–90% of the control sample species. DNA sequence‐capture was 8× more expensive, nonetheless it identified 1.5× more species for COI and 13× more genera for 18S than metabarcoding. Both approaches offer reliable results, sharing ca. 40% species and 72% families and retrieve more taxa when nuclear and mitochondrial markers are combined. eDNA metabarcoding is quite well established and low‐cost, whereas DNA‐sequence capture for biodiversity assessment is still in its infancy, is more time‐consuming but provides more taxonomic assignments.

3.
Effective and precise grouping of highly similar sequences remains a major bottleneck in the evaluation of high-throughput sequencing datasets. Amplicon sequence variants (ASVs) offer a promising alternative that may supersede the widely used operational taxonomic units (OTUs) in environmental sequencing studies. We compared the performance of a recently developed pipeline based on the algorithm DADA2 for obtaining ASVs against a pipeline based on the algorithm SWARM for obtaining OTUs. Illumina-sequencing of 29 individual ciliate species resulted in up to 11 ASVs per species, while SWARM produced up to 19 OTUs per species. To improve the congruency between species diversity and molecular diversity, we applied sequence similarity networks (SSNs) for second-level sequence grouping into network sequence clusters (NSCs). At 100% sequence similarity in SWARM-SSNs, NSC numbers decreased from 7.9-fold overestimation without abundance filter, to 4.5-fold overestimation when an abundance filter was applied. For the DADA2-SSN approach, NSC numbers decreased from 3.5-fold to 3-fold overestimation. Rand index cluster analyses predicted best binning results between 97% and 94% sequence similarity for both DADA2-SSNs and SWARM-SSNs. Depending on the ecological questions addressed in an environmental sequencing study with protists we recommend ASVs as replacement for OTUs, best in combination with SSNs.
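The Rand index mentioned in this abstract scores how well a clustering agrees with a known grouping, here species identity. The sketch below computes the plain (unadjusted) Rand index from two label lists. The function name `rand_index` is hypothetical, and the abstract does not say which variant or software the authors used.

```python
from itertools import combinations


def rand_index(labels_a: list, labels_b: list) -> float:
    """Fraction of item pairs on which two partitions agree, i.e. pairs that
    are grouped together in both or kept apart in both."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two items")
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)
```

A value of 1.0 means the sequence clusters reproduce the species grouping exactly; splitting one species across several clusters lowers the score.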

4.
Next‐generation sequencing is a common method for analysing microbial community diversity and composition. Configuring an appropriate sequence processing strategy within the variety of tools and methods is a nontrivial task and can considerably influence the resulting community characteristics. We analysed the V4 region of 18S rRNA gene sequences of marine samples by 454‐pyrosequencing. Along this process, we generated several data sets with QIIME, mothur, and a custom‐made pipeline based on DNAStar and the phylogenetic tree‐based PhyloAssigner. For all processing strategies, default parameter settings and punctual variations were used. Our results revealed strong differences in total number of operational taxonomic units (OTUs), indicating that sequence preprocessing and clustering had a major impact on protist diversity estimates. However, diversity estimates of the abundant biosphere (abundance of ≥1%) were reproducible for all conducted processing pipeline versions. A qualitative comparison of diatom genera emphasized strong differences between the pipelines in which phylogenetic placement of sequences came closest to light microscopy‐based diatom identification. We conclude that diversity studies using different sequence processing strategies are comparable if the focus is on higher taxonomic levels, and if abundance thresholds are used to filter out OTUs of the rare biosphere.
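The abundance threshold mentioned in this abstract (keeping only OTUs at ≥1% relative abundance) can be sketched as a simple filter over per-OTU read counts. The function name `filter_abundant` and the dictionary layout are assumptions made for illustration, not part of any of the pipelines named above.

```python
def filter_abundant(otu_counts: dict[str, int], min_fraction: float = 0.01) -> dict[str, int]:
    """Keep OTUs whose share of all reads in the sample is at least `min_fraction`."""
    total = sum(otu_counts.values())
    if total == 0:
        return {}
    return {otu: n for otu, n in otu_counts.items() if n / total >= min_fraction}
```

For example, in a sample of 1,000 reads an OTU with 5 reads (0.5%) would be dropped as part of the rare biosphere, while one with 95 reads (9.5%) would be kept.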

5.
Urmas Kõljalg  R. Henrik Nilsson  Kessy Abarenkov  Leho Tedersoo  Andy F. S. Taylor  Mohammad Bahram  Scott T. Bates  Thomas D. Bruns  Johan Bengtsson‐Palme  Tony M. Callaghan  Brian Douglas  Tiia Drenkhan  Ursula Eberhardt  Margarita Dueñas  Tine Grebenc  Gareth W. Griffith  Martin Hartmann  Paul M. Kirk  Petr Kohout  Ellen Larsson  Björn D. Lindahl  Robert Lücking  María P. Martín  P. Brandon Matheny  Nhu H. Nguyen  Tuula Niskanen  Jane Oja  Kabir G. Peay  Ursula Peintner  Marko Peterson  Kadri Põldmaa  Lauri Saag  Irja Saar  Arthur Schüßler  James A. Scott  Carolina Senés  Matthew E. Smith  Ave Suija  D. Lee Taylor  M. Teresa Telleria  Michael Weiss  Karl‐Henrik Larsson. Molecular Ecology, 2013, 22(21): 5271-5277
The nuclear ribosomal internal transcribed spacer (ITS) region is the formal fungal barcode and in most cases the marker of choice for the exploration of fungal diversity in environmental samples. Two problems are particularly acute in the pursuit of satisfactory taxonomic assignment of newly generated ITS sequences: (i) the lack of an inclusive, reliable public reference data set and (ii) the lack of means to refer to fungal species, for which no Latin name is available in a standardized stable way. Here, we report on progress in these regards through further development of the UNITE database (http://unite.ut.ee) for molecular identification of fungi. All fungal species represented by at least two ITS sequences in the international nucleotide sequence databases are now given a unique, stable name of the accession number type (e.g. Hymenoscyphus pseudoalbidus|GU586904|SH133781.05FU), and their taxonomic and ecological annotations were corrected as far as possible through a distributed, third‐party annotation effort. We introduce the term ‘species hypothesis’ (SH) for the taxa discovered in clustering on different similarity thresholds (97–99%). An automatically or manually designated sequence is chosen to represent each such SH. These reference sequences are released (http://unite.ut.ee/repository.php) for use by the scientific community in, for example, local sequence similarity searches and in the QIIME pipeline. The system and the data will be updated automatically as the number of public fungal ITS sequences grows. We invite everybody in the position to improve the annotation or metadata associated with their particular fungal lineages of expertise to do so through the new Web‐based sequence management system in UNITE.

6.
Microalgae in the division Haptophyta may be difficult to identify to species by microscopy because they are small and fragile. Here, we used high‐throughput sequencing to explore the diversity of haptophytes in outer Oslofjorden, Skagerrak, and supplemented this with electron microscopy. Nano‐ and picoplanktonic subsurface samples were collected monthly for 2 yr, and the haptophytes were targeted by amplification of RNA/cDNA with Haptophyta‐specific 18S ribosomal DNA V4 primers. Pyrosequencing revealed higher species richness of haptophytes than previously observed in the Skagerrak by microscopy. From ca. 400,000 reads we obtained 156 haptophyte operational taxonomic units (OTUs) after rigorous filtering and 99.5% clustering. The majority (84%) of the OTUs matched environmental sequences not linked to a morphological species, most of which were affiliated with the order Prymnesiales. Phylogenetic analyses including Oslofjorden OTUs and available cultured and environmental haptophyte sequences showed that several of the OTUs matched sequences forming deep‐branching lineages, potentially representing novel haptophyte classes. Pyrosequencing also retrieved cultured species not previously reported by microscopy in the Skagerrak. Electron microscopy revealed species not yet genetically characterised and some potentially novel taxa. This study contributes to linking genotype to phenotype within this ubiquitous and ecologically important protist group, and reveals great, unknown diversity.

7.
8.
Next‐generation DNA sequencing has enabled a rapid expansion in the size of molecular fungal ecology studies employing the nuclear internal transcribed spacer (ITS) region. Many sequence‐processing pipelines and protocols require sequence clustering to generate operational taxonomic units (OTUs) based on sequence similarity as a step to reduce total data quantity and complexity prior to taxonomic assignment. However, the consequences of ITS sequence clustering in regard to sample taxonomic coverage have not been carefully examined. Here we demonstrate that typically used clustering thresholds for fungal ITS sequences result in statistically significant losses in taxonomic coverage. Analyses using environmentally derived fungal sequences indicated an average of 3.1% of species went undetected (P < 0.05) if the sequences were denoised and clustered at a 97% threshold prior to taxonomic assignment. Additionally, an in silico analysis using a reference fungal ITS database suggested that approximately 25% of species went undetected if the sequences were clustered prior to taxonomic assignment. Finally, analysis of sequences derived from pure‐cultured fungal isolates of known identity indicated sequence denoising and clustering were not critical in improving identification accuracy.

9.
Biomonitoring approaches and investigations of many ecological questions require assessments of the biodiversity of a given habitat. Small organisms, ranging from protozoans to metazoans, are of great ecological importance and comprise a major share of the planet's biodiversity but they are extremely difficult to identify, due to their minute body sizes and indistinct structures. Thus, most biodiversity studies that include small organisms draw on several methods for species delimitation, ranging from traditional microscopy to molecular techniques. In this study, we compared the efficiency of these methods by analyzing a community of nematodes. Specifically, we evaluated the performances of traditional morphological identification, single‐specimen barcoding (Sanger sequencing), and metabarcoding in the identification of 1500 nematodes from sediment samples. The molecular approaches were based on the analysis of the 28S ribosomal large and 18S small subunits (LSU and SSU). The morphological analysis resulted in the determination of 22 nematode species. Barcoding identified a comparable number of operational taxonomic units (OTUs) based on 28S rDNA (n = 20) and fewer OTUs based on 18S rDNA (n = 12). Metabarcoding identified a higher OTU number but fewer amplicon sequence variants (ASVs) (n = 48 OTUs, n = 17 ASVs for 28S rDNA, and n = 31 OTUs, n = 6 ASVs for 18S rDNA). Between the three approaches (morphology, barcoding, and metabarcoding), only three species (13.6%) were shared. This lack of taxonomic resolution hinders reliable community identifications to the species level. Further database curation will ensure the effective use of molecular species identification.

10.
Different second‐generation sequencing technologies may have taxon‐specific biases when DNA metabarcoding prey in predator faeces. Our major objective was to examine differences in prey recovery from bat guano across two different sequencing workflows using the same faecal DNA extracts. We compared results between the Ion Torrent PGM and the Illumina MiSeq with similar library preparations and the same analysis pipeline. We focus on repeatability and provide an R Notebook in an effort towards transparency for future methodological improvements. Full documentation of each step enhances the accessibility of our analysis pipeline. We tagged DNA from insectivorous bat faecal samples, targeted the arthropod cytochrome c oxidase I minibarcode region and sequenced the product on both second‐generation sequencing platforms. We developed an analysis pipeline with a high operational taxonomic unit (OTU) clustering threshold (i.e., ≥98.5%) followed by copy number filtering to avoid merging rare but genetically similar prey into the same OTUs. With this workflow, we detected 297 unique prey taxa, of which 74% were identified at the species level. Of these, 104 (35%) prey OTUs were detected by both platforms, 176 (59%) OTUs were detected by the Illumina MiSeq system only, and 17 (6%) OTUs were detected using the Ion Torrent system only. Costs were similar between platforms but the Illumina MiSeq recovered six times more reads and four additional insect orders than did Ion Torrent. The considerations we outline are particularly important for long‐term ecological monitoring; a more standardized approach will facilitate comparisons between studies and allow faster recognition of changes within ecological communities.

11.
Analysis of microbial community structure by multivariate ordination methods, using data obtained by high‐throughput sequencing of amplified markers (i.e., DNA metabarcoding), often requires clustering of DNA sequences into operational taxonomic units (OTUs). Parameters for the clustering procedure tend not to be justified but are set by tradition rather than being based on explicit knowledge. In this study, we explore the extent to which ordination results are affected by variation in parameter settings for the clustering procedure. Amplicon sequence data from nine microbial community studies, representing different sampling designs, spatial scales and ecosystems, were subjected to clustering into OTUs at seven different similarity thresholds (clustering thresholds) ranging from 87% to 99% sequence similarity. The 63 data sets thus obtained were subjected to parallel DCA and GNMDS ordinations. The resulting community structures were highly similar across all clustering thresholds. We explain this pattern by the existence of strong ecological structuring gradients and phylogenetically diverse sets of abundant OTUs that are highly stable across clustering thresholds. Removing low‐abundance, rare OTUs had negligible effects on community patterns. Our results indicate that microbial data sets with a clear gradient structure are highly robust to choice of sequence clustering threshold.

12.
Promoted by the barcoding approach, mitochondrial DNA is more than ever used as a molecular marker to identify species boundaries. Yet, it has been repeatedly argued that it may be poorly suited for this purpose, especially in insects where mitochondria are often associated with invasive intracellular bacteria that may promote their introgression. Here, we inform this debate by assessing how divergent nuclear genomes can be when mitochondrial barcodes indicate very high proximity. To this end, we obtained RAD‐seq data from 92 barcode‐based species‐like units (operational taxonomic units [OTUs]) spanning four insect orders. In 100% of the cases, the observed median nuclear divergence was lower than 2%, a value that was recently estimated as one below which nuclear gene flow is not uncommon. These results suggest that although mitochondria may occasionally leak between species, this process is rare enough in insects to make DNA barcoding a reliable tool for clustering specimens into species‐like units.

13.
14.
Metabarcoding has the potential to become a rapid, sensitive, and effective approach for identifying species in complex environmental samples. Accurate molecular identification of species depends on the ability to generate operational taxonomic units (OTUs) that correspond to biological species. Due to the sometimes enormous estimates of biodiversity using this method, there is a great need to test the efficacy of data analysis methods used to derive OTUs. Here, we evaluate the performance of various methods for clustering length‐variable 18S amplicons from complex samples into OTUs using a mock community and a natural community of zooplankton species. We compare analytic procedures consisting of a combination of (1) stringent and relaxed data filtering, (2) singleton sequences included and removed, (3) three commonly used clustering algorithms (mothur, UCLUST, and UPARSE), and (4) three methods of treating alignment gaps when calculating sequence divergence. Depending on the combination of methods used, the number of OTUs varied by nearly two orders of magnitude for the mock community (60–5068 OTUs) and three orders of magnitude for the natural community (22–22191 OTUs). The use of relaxed filtering and the inclusion of singletons greatly inflated OTU numbers without increasing the ability to recover species. Our results also suggest that the method used to treat gaps when calculating sequence divergence can have a great impact on the number of OTUs. Our findings are particularly relevant to studies that cover taxonomically diverse species and employ markers such as rRNA genes in which length variation is extensive.

15.
16.
Current methods to identify unknown insect (class Insecta) cytochrome c oxidase (COI barcode) sequences often rely on thresholds of distances that can be difficult to define, sequence similarity cut‐offs, or monophyly. Some of the most commonly used metagenomic classification methods do not provide a measure of confidence for the taxonomic assignments they provide. The aim of this study was to use a naïve Bayesian classifier (Wang et al. Applied and Environmental Microbiology, 2007; 73: 5261) to automate taxonomic assignments for large batches of insect COI sequences such as data obtained from high‐throughput environmental sequencing. This method provides rank‐flexible taxonomic assignments with an associated bootstrap support value, and it is faster than the BLAST‐based methods commonly used in environmental sequence surveys. We have developed and rigorously tested the performance of three different training sets using leave‐one‐out cross‐validation, two field data sets, and targeted testing of Lepidoptera, Diptera and Mantodea sequences obtained from the Barcode of Life Data system. We found that type I error rates, incorrect taxonomic assignments with a high bootstrap support, were already relatively low but could be lowered further by ensuring that all query taxa are actually present in the reference database. Choosing bootstrap support cut‐offs according to query length and summarizing taxonomic assignments to more inclusive ranks can also help to reduce error while retaining the maximum number of assignments. Additionally, we highlight gaps in the taxonomic and geographic representation of insects in public sequence databases that will require further work by taxonomists to improve the quality of assignments generated using any method.

17.
The advent of next generation sequencing has coincided with a growth in interest in using these approaches to better understand the role of the structure and function of the microbial communities in human, animal, and environmental health. Yet, use of next generation sequencing to perform 16S rRNA gene sequence surveys has resulted in considerable controversy surrounding the effects of sequencing errors on downstream analyses. We analyzed 2.7×10^6 reads distributed among 90 identical mock community samples, which were collections of genomic DNA from 21 different species with known 16S rRNA gene sequences; we observed an average error rate of 0.0060. To improve this error rate, we evaluated numerous methods of identifying bad sequence reads, identifying regions within reads of poor quality, and correcting base calls and were able to reduce the overall error rate to 0.0002. Implementation of the PyroNoise algorithm provided the best combination of error rate, sequence length, and number of sequences. Perhaps more problematic than sequencing errors was the presence of chimeras generated during PCR. Because we knew the true sequences within the mock community and the chimeras they could form, we identified 8% of the raw sequence reads as chimeric. After quality filtering the raw sequences and using the Uchime chimera detection program, the overall chimera rate decreased to 1%. The chimeras that could not be detected were largely responsible for the identification of spurious operational taxonomic units (OTUs) and genus-level phylotypes. The number of spurious OTUs and phylotypes increased with sequencing effort indicating that comparison of communities should be made using an equal number of sequences. Finally, we applied our improved quality-filtering pipeline to several benchmarking studies and observed that even with our stringent data curation pipeline, biases in the data generation pipeline and batch effects were observed that could potentially confound the interpretation of microbial community data.

18.
High‐throughput DNA metabarcoding of amplicon sizes below 500 bp has revolutionized the analysis of environmental microbial diversity. However, these short regions contain limited phylogenetic signal, which makes it impractical to use environmental DNA in full phylogenetic inferences. This lesser phylogenetic resolution of short amplicons may be overcome by new long‐read sequencing technologies. To test this idea, we amplified soil DNA and used PacBio Circular Consensus Sequencing (CCS) to obtain an ~4500‐bp region spanning most of the eukaryotic small subunit (18S) and large subunit (28S) ribosomal DNA genes. We first treated the CCS reads with a novel curation workflow, generating 650 high‐quality operational taxonomic units (OTUs) containing the physically linked 18S and 28S regions. To assign taxonomy to these OTUs, we developed a phylogeny‐aware approach based on the 18S region that showed greater accuracy and sensitivity than similarity‐based methods. The taxonomically annotated OTUs were then combined with available 18S and 28S reference sequences to infer a well‐resolved phylogeny spanning all major groups of eukaryotes, allowing us to accurately derive the evolutionary origin of environmental diversity. A total of 1,019 sequences were included, of which a majority (58%) corresponded to the new long environmental OTUs. The long reads also allowed us to directly investigate the relationships among environmental sequences themselves, which represents a key advantage over the placement of short reads on a reference phylogeny. Together, our results show that long amplicons can be treated in a full phylogenetic framework to provide greater taxonomic resolution and a robust evolutionary perspective to environmental DNA.

19.
Identification of North Sea molluscs with DNA barcoding   Cited by: 1 (self-citations: 0, citations by others: 1)
Sequence‐based specimen identification, known as DNA barcoding, is a common method complementing traditional morphology‐based taxonomic assignments. The fundamental resource in DNA barcoding is the availability of a taxonomically reliable sequence database to use as a reference for sequence comparisons. Here, we provide a reference library including 579 sequences of the mitochondrial cytochrome c oxidase subunit I for 113 North Sea mollusc species. We tested the efficacy of this library by simulating a sequence‐based specimen identification scenario using Best Match, Best Close Match (BCM) and All Species Barcode (ASB) criteria with three different threshold values. Each identification result was compared with our prior morphology‐based taxonomic assignments. Our simulation resulted in 87.7% congruent identifications (93.8% when excluding singletons). The highest number of congruent identifications was obtained with BCM and ASB and a 0.05 threshold. We also compared identifications with genetic clustering (Barcode Index Numbers, BINs) computed by the Barcode of Life Datasystem (BOLD). About 68% of our morphological identifications were congruent with BINs created by BOLD. Forty‐nine sequences were clustered in 16 discordant BINs, and these were divided in two classes: sequences from different species clustered in a single BIN and conspecific sequences divided in more BINs. Whereas former incongruences were probably caused by BOLD entries in need of a taxonomic update, the latter incongruences regarded taxa requiring further investigations. These include species with amphi‐Atlantic distribution, whose genetic structure should be evaluated over their entire range to produce a reliable sequence‐based identification system.
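The Best Close Match criterion named in this abstract can be sketched as follows, based on how the criterion is commonly described in the barcoding literature: the query takes the species name of its nearest reference barcode, but only if that distance is within the threshold, and the result is ambiguous when equally near barcodes belong to different species. The function names, the return labels, and the use of uncorrected p-distance on pre-aligned sequences are assumptions for illustration; the abstract does not specify the authors' distance model or software.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def best_close_match(query: str, reference: list[tuple[str, str]],
                     threshold: float = 0.05) -> str:
    """Return a species name, "ambiguous", or "no match" for `query`.

    `reference` is a list of (species_name, aligned_sequence) pairs.
    """
    if not reference:
        return "no match"
    dists = [(p_distance(query, seq), species) for species, seq in reference]
    best = min(d for d, _ in dists)
    if best > threshold:
        # Nearest barcode is too distant to trust the identification.
        return "no match"
    nearest = {species for d, species in dists if d == best}
    return nearest.pop() if len(nearest) == 1 else "ambiguous"
```

Dropping the threshold check turns this into the plain Best Match criterion, which always returns the nearest species however distant it is.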

20.
Biomonitoring underpins the environmental assessment of freshwater ecosystems and guides management and conservation. Current methodology for surveys of (macro)invertebrates uses coarse taxonomic identification where species‐level resolution is difficult to obtain. Next‐generation sequencing of entire assemblages (metabarcoding) provides a new approach for species detection, but requires further validation. We used metabarcoding of invertebrate assemblages with two fragments of the cox1 “barcode” and partial nuclear ribosomal (SSU) genes, to assess the effects of a pesticide spill in the River Kennet (southern England). Operational taxonomic unit (OTU) recovery was tested under 72 parameters (read denoising, filtering, pair merging and clustering). Similar taxonomic profiles were obtained under a broad range of parameters. The SSU marker recovered Platyhelminthes and Nematoda, missed by cox1, while Rotifera were only amplified with cox1. A reference set was created from all available barcode entries for Arthropoda in the BOLD database and clustered into OTUs. The River Kennet metabarcoding produced matches to 207 of these reference OTUs, five times the number of species recognized with morphological monitoring. The increase was due to the following: greater taxonomic resolution (e.g., splitting a single morphotaxon “Chironomidae” into 55 named OTUs); splitting of Linnaean binomials into multiple molecular OTUs; and the use of a filtration‐flotation protocol for extraction of minute specimens (meiofauna). Community analyses revealed strong differences between “impacted” vs. “control” samples, detectable with each gene marker, for each major taxonomic group, and for meio‐ and macrofaunal samples separately. Thus, highly resolved taxonomic data can be extracted at a fraction of the time and cost of traditional nonmolecular methods, opening new avenues for freshwater invertebrate biodiversity monitoring and molecular ecology.

