Similar articles
 20 similar articles found (search time: 15 ms)
1.
Micro‐organisms account for most of the Earth's biodiversity and yet remain largely unknown. The complexity and diversity of microbial communities present in clinical and environmental samples can now be robustly investigated in record time and at record low cost thanks to recent advances in high‐throughput DNA sequencing (HTS). Here, we develop metaBIT, an open‐source computational pipeline that automates routine microbial profiling of shotgun HTS data. Customizable by the user at different stringency levels, it performs robust taxonomy‐based assignment and relative abundance calculation of microbial taxa, as well as cross‐sample statistical analyses of microbial diversity distributions. We demonstrate the versatility of metaBIT on a range of published HTS data sets sampled from the environment (soil and seawater) and the human body (skin and gut), as well as from archaeological specimens. We present the diversity of outputs provided by the pipeline for the visualization of microbial profiles (barplots, heatmaps) and for their characterization and comparison (diversity indices, hierarchical clustering and principal coordinates analyses). We show that metaBIT allows automatic, fast and user‐friendly profiling of the microbial DNA present in HTS shotgun data sets. The applications of metaBIT are vast, from monitoring laboratory errors and contamination, to reconstructing past and present microbiota, to detecting candidate species, including pathogens.
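As an illustration of the relative abundance and diversity index outputs mentioned in the abstract above, the following minimal sketch computes per-taxon proportions and the Shannon index from read counts. It is not metaBIT's own code: the function names and the example counts are hypothetical and chosen only for illustration.

```python
import math

def relative_abundance(counts):
    """Convert per-taxon read counts into proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads assigned to any taxon")
    return {taxon: n / total for taxon, n in counts.items()}

def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over taxa with p_i > 0."""
    props = relative_abundance(counts)
    return -sum(p * math.log(p) for p in props.values() if p > 0)

# Hypothetical read counts for one sample (illustrative values only).
sample = {"Bacteroides": 500, "Prevotella": 300, "Escherichia": 200}
abundances = relative_abundance(sample)
diversity = shannon_index(sample)
```

A sample dominated by a single taxon gives an index of 0, and the index grows as reads are spread more evenly over more taxa.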

2.
High‐throughput sequencing methods have become a routine analysis tool in environmental sciences as well as in the public and private sectors. These methods provide vast amounts of data, which need to be analysed in several steps. Although the bioinformatic analyses can be performed using several public tools, many analytical pipelines offer too few options for optimal analysis of more complicated or customized designs. Here, we introduce PipeCraft, a flexible and handy bioinformatics pipeline with a user‐friendly graphical interface that links several public tools for analysing amplicon sequencing data. Users are able to customize the pipeline by selecting the most suitable tools and options to process raw sequences from Illumina, Pacific Biosciences, Ion Torrent and Roche 454 sequencing platforms. We described the design and options of PipeCraft and evaluated its performance by analysing data sets from three different sequencing platforms. We demonstrated that PipeCraft is able to process large data sets within 24 hr. The graphical user interface and the automated links between various bioinformatics tools enable easy customization of the workflow. All analytical steps and options are recorded in log files and are easily traceable.

3.
The purpose of this review is to present the most common and emerging DNA‐based methods used to generate data for biodiversity and biomonitoring studies. As environmental assessment and monitoring programmes may require biodiversity information at multiple levels, we pay particular attention to the DNA metabarcoding method and discuss a number of bioinformatic tools and considerations for producing DNA‐based indicators using operational taxonomic units (OTUs), taxa at a variety of ranks and community composition. By developing the capacity to harness the advantages provided by the newest technologies, investigators can “scale up” by increasing the number of samples and replicates processed, the frequency of sampling over time and space, and even the depth of sampling, for example by sequencing more reads per sample or more markers per sample. The ability to scale up is made possible by the reduced hands‐on time and cost per sample provided by the newest kits, platforms and software tools. Results gleaned from broad‐scale monitoring will provide opportunities to address key scientific questions linked to biodiversity and its dynamics across time and space, as well as being more relevant for policymakers, enabling science‐based decision‐making and providing a greater socio‐economic impact. As genomic approaches are continually evolving, we provide this guide to methods used in biodiversity genomics.

4.
Molecular identification of mixed‐species pollen samples has a range of applications in various fields of research. To date, such molecular identification has primarily been carried out via amplicon sequencing, but whole‐genome shotgun (WGS) sequencing of pollen DNA has potential advantages, including (1) more genetic information per sample and (2) the potential for better quantitative matching. In this study, we tested the performance of WGS sequencing methodology and publicly available reference sequences in identifying species and quantifying their relative abundance in pollen mock communities. Using mock communities previously analyzed with DNA metabarcoding, we sequenced approximately 200 Mbp for each sample using Illumina HiSeq and MiSeq. Taxonomic identifications were based on the Kraken k‐mer identification method with reference libraries constructed from full‐genome and short read archive data from the NCBI database. We found WGS to be a reliable method for taxonomic identification of pollen, with near 100% identification of species in mixtures, but it generated higher rates of false positives (reads not identified to the correct taxon at the required taxonomic level) relative to rbcL and ITS2 amplicon sequencing. For quantification of relative species abundance, WGS data provided a stronger correlation between pollen grain proportion and sequence read proportion, but diverged more from a 1:1 relationship, likely due to the higher rate of false positives. Currently, a limitation of WGS‐based pollen identification is the lack of representation of plant diversity in publicly available genome databases. As databases improve and costs drop, we expect that genomics methods will eventually become the methods of choice for species identification and quantification of mixed‐species pollen samples.

5.
Current biodiversity assessment and biomonitoring are largely based on the morphological identification of selected bioindicator taxa. Recently, several attempts have been made to use eDNA metabarcoding as an alternative tool. However, until now, most applied metabarcoding studies have been based on the taxonomic assignment of sequences that provides reference to morphospecies ecology. Usually, only a small portion of metabarcoding data can be used due to a limited reference database and a lack of phylogenetic resolution. Here, we investigate the possibility of overcoming these limitations using a taxonomy‐free approach that allows a molecular index to be computed directly from eDNA data without any reference to morphotaxonomy. As a case study, we use the benthic diatom index, commonly used for monitoring the biological quality of rivers and streams. We analysed 87 epilithic samples from Swiss rivers, the ecological status of which was established based on the microscopic identification of diatom species. We compared the diatom index derived from eDNA data obtained with or without taxonomic assignment. Our taxonomy‐free approach yields promising results by providing a correct assessment for 77% of examined sites. The main advantage of this method is that almost 95% of OTUs could be used for index calculation, compared to 35% in the case of the taxonomic assignment approach. Its main limitations are under‐sampling and the need to calibrate the index based on the microscopic assessment of diatom communities. However, once calibrated, the taxonomy‐free molecular index can be easily standardized and applied in routine biomonitoring, as a complementary tool allowing fast and cost‐effective assessment of the biological quality of watercourses.

6.
Faecal samples are of great value as a non‐invasive means to gather information on the genetics, distribution, demography, diet and parasite infestation of endangered species. Direct shotgun sequencing of faecal DNA could give information on these simultaneously, but this approach is largely untested. Here, we used two faecal samples to characterize the diet of two red‐shanked douc langurs (Pygathrix nemaeus) that were fed known foliage, fruits, vegetables and cereals. Illumina HiSeq produced ~74 and 67 million paired reads for these samples, of which ~10 000 (0.014%) and ~44 000 (0.066%), respectively, were of chloroplast origin. Sequences were matched against a database of available chloroplast ‘barcodes’ for angiosperms. The results were compared with ‘metabarcoding’ using PCR amplification of the P6 loop of trnL. Metagenomics identified seven and nine of the likely 16 diet plants, while six and five were identified by metabarcoding. Metabarcoding produced thousands of reads consistent with the known diet, but the barcodes were too short to identify several plant species to genus. Metagenomics utilized multiple, longer barcodes that, combined, had greater power of identification. However, rare diet items were not recovered. Read numbers for diet species in metagenomic and metabarcoding data were correlated, indicating that both are useful for determining relative sequence abundance. Metagenomic reads were uniformly distributed across the chloroplast genomes; thus, if chloroplast genomes were used as reference, the precision of identifications and species recovery would improve further. Metagenomics also recovered the host mitochondrial genome and numerous intestinal parasite sequences in addition to generating data useful for characterizing the microbiome.

7.
High‐throughput sequencing (HTS) of PCR amplicons is becoming the method of choice to sequence one or several targeted loci for phylogenetic and DNA barcoding studies. Although the development of HTS has allowed rapid generation of massive amounts of DNA sequence data, preparing amplicons for HTS remains a rate‐limiting step. For example, HTS platforms require platform‐specific adapter sequences to be present at the 5′ and 3′ ends of the DNA fragment to be sequenced. In addition, short multiplex identifier (MID) tags are typically added to allow multiple samples to be pooled in a single HTS run. Existing methods to incorporate HTS adapters and MID tags into PCR amplicons are either inefficient, requiring multiple enzymatic reactions and clean‐up steps, or costly when applied to multiple samples or loci (fusion primers). We describe a method to amplify a target locus and add HTS adapters and MID tags via a linker sequence using a single PCR. We demonstrate our approach by generating reference sequence data for two mitochondrial loci (COI and 16S) for a diverse suite of insect taxa. Our approach provides a flexible, cost‐effective and efficient method to prepare amplicons for HTS.

8.
Studies on foraging partitioning in pollinators can provide critical information for understanding food‐web niches and pollination functions, thus aiding conservation. Metabarcoding based on PCR amplification and high‐throughput sequencing has seen increasing application in characterizing pollen loads carried by pollinators. However, amplification bias across taxa could lead to unpredictable artefacts in the estimation of pollen composition. We examined the efficacy of a genome‐skimming method based on direct shotgun sequencing in quantifying mixed pollen, using mock samples (five and 14 mocks of flower and bee pollen, respectively). The results demonstrated a high level of repeatability and accuracy in identifying pollen from mixtures of varied species ratios. All pollen species were detected in all mocks, and pollen frequencies estimated from the number of sequence reads of each species were significantly correlated with pollen count proportions (linear model, R² = 86.7%, p = 2.2e−16). For >97% of the mixed taxa, pollen proportion could be quantified by sequencing to the correct order of magnitude, even for species which constituted only 0.2% of the total pollen. In addition, DNA extracted from pollen grains equivalent to those collected from a single honeybee corbicula was sufficient for genome‐skimming. We conclude that genome‐skimming is a feasible approach to identifying and quantifying mixed pollen samples. By providing reliable and sensitive taxon identification and relative abundance, this method is expected to improve our understanding in studies that involve plant–pollinator interactions, such as pollen preference in corbiculate bees, pollen diet analyses and identification of landscape pollen resource use from beehives.
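The comparison of read‐based pollen frequencies with pollen count proportions in the abstract above can be sketched as follows. This is only an illustration of a simple least‐squares fit and its coefficient of determination, not the authors' analysis: the helper names and all numbers in the example are hypothetical.

```python
def read_proportions(read_counts):
    """Convert per-species read counts into proportions of the total."""
    total = sum(read_counts)
    return [n / total for n in read_counts]

def r_squared(x, y):
    """R^2 of the least-squares line y = a + b*x.

    For a simple linear regression this equals the squared Pearson correlation.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return (sxy ** 2) / (sxx * syy)

# Hypothetical mock community: pollen grain proportions and mapped read counts.
pollen_props = [0.50, 0.30, 0.15, 0.05]
reads = [4800, 3300, 1400, 500]
read_props = read_proportions(reads)
fit_quality = r_squared(pollen_props, read_props)
```

A value of `fit_quality` close to 1 means the read proportions track the pollen proportions linearly; it does not by itself show that the relationship is 1:1.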

9.
Blue Catfish (Ictalurus furcatus) are an invasive, yet economically important, species in the Chesapeake Bay. However, their impact on the trophic ecology of this system is not well understood. In order to provide in‐depth analysis of predation by Blue Catfish, we identified prey items using high‐throughput DNA sequencing (HTS) of entire gastrointestinal tracts from 134 samples using two genetic markers, mitochondrial cytochrome c oxidase I (COI) and the nuclear 18S ribosomal RNA gene. We compared our HTS results to a more traditional “hybrid” approach that coupled morphological identification with DNA barcoding. The hybrid study was conducted on additional Blue Catfish samples (n = 617 stomachs) collected from the same location and season in the previous year. Taxonomic representation with HTS vastly surpassed that achieved with the hybrid methodology in Blue Catfish. Significantly, our HTS study identified several instances of at‐risk and invasive species consumption not identified using the hybrid method, supporting the hypothesis that previous studies using morphological methods may greatly underestimate consumption of critical species. Finally, we report the novel finding that Blue Catfish diet diversity inversely correlates with daily flow rates, perhaps due to higher mobility and prey‐seeking behaviors exhibited during lower flow.

10.
High‐throughput sequencing has been proposed as a method to genotype microsatellites and overcome the four main technical drawbacks of capillary electrophoresis: amplification artifacts, imprecise sizing, length homoplasy, and limited multiplex capability. The objective of this project was to test a high‐throughput amplicon sequencing approach to fragment analysis of short tandem repeats and characterize its advantages and disadvantages against traditional capillary electrophoresis. We amplified and sequenced 12 muskrat microsatellite loci from 180 muskrat specimens and analyzed the sequencing data for precision of allele calling, propensity for amplification or sequencing artifacts, and evidence of length homoplasy. Of the 294 total alleles we detected by sequencing, only 164 would have been detected by capillary electrophoresis, as the remaining 130 alleles (44%) would have been hidden by length homoplasy. The ability to detect a greater number of unique alleles resulted in the ability to resolve greater population genetic structure. The primary advantages of fragment analysis by sequencing are the ability to precisely size fragments, resolve length homoplasy, multiplex many individuals and many loci into a single high‐throughput run, and compare data across projects and across laboratories (present and future) with minimal technical calibration. A significant disadvantage of fragment analysis by sequencing is that the method is only practical and cost‐effective when performed on batches of several hundred samples with multiple loci. Future work is needed to optimize throughput while minimizing costs and to update existing microsatellite allele calling and analysis programs to accommodate sequence‐aware microsatellite data.
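Length homoplasy, as described in the abstract above, means that alleles with different sequences share the same fragment length, so sizing by capillary electrophoresis counts them as one. The sketch below illustrates the idea and re‐derives the reported 44% from the abstract's own totals. The function name and the three example sequences are hypothetical; only the numbers 294 and 164 come from the abstract.

```python
def count_alleles(sequences):
    """Return (alleles distinguishable by sequence, alleles distinguishable by length)."""
    by_sequence = set(sequences)
    by_length = {len(s) for s in by_sequence}
    return len(by_sequence), len(by_length)

# Hypothetical alleles at one locus: the first two are both 12 bp long
# but differ by one base, so sizing alone cannot tell them apart.
alleles = ["ACACACACACAC", "ACACACATACAC", "ACACACACAC"]
n_by_sequence, n_by_length = count_alleles(alleles)
hidden_in_example = n_by_sequence - n_by_length

# Totals reported in the abstract: 294 alleles by sequencing, 164 by length.
reported_hidden = 294 - 164
reported_fraction = reported_hidden / 294
```

In the toy locus, sequencing resolves three alleles where sizing resolves two. For the reported totals, 130 of 294 alleles is about 44%, matching the figure in the abstract.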

11.
One consistent finding among studies using shotgun metagenomics to analyze whole viral communities is that most viral sequences show no significant homology to known sequences. Thus, bioinformatic analyses based on sequence collections such as GenBank nr, which are largely composed of sequences from known organisms, tend to ignore a majority of sequences within most shotgun viral metagenome libraries. Here we describe a bioinformatic pipeline, the Viral Informatics Resource for Metagenome Exploration (VIROME), that emphasizes the classification of viral metagenome sequences (predicted open-reading frames) based on homology search results against both known and environmental sequences. Functional and taxonomic information is derived from five annotated sequence databases which are linked to the UniRef 100 database. Environmental classifications are obtained from hits against a custom database, MetaGenomes On-Line, which contains 49 million predicted environmental peptides. Each predicted viral metagenomic ORF run through the VIROME pipeline is placed into one of seven ORF classes; thus, every sequence receives a meaningful annotation. Additionally, the pipeline includes quality control measures to remove contaminating and poor-quality sequences and assesses the potential amount of cellular DNA contamination in a viral metagenome library by screening for rRNA genes. Access to the VIROME pipeline and analysis results is provided through a web-application interface that is dynamically linked to a relational back-end database. The VIROME web-application interface is designed to allow users flexibility in retrieving sequences (reads, ORFs, predicted peptides) and search results for focused secondary analyses.

12.
Dietary changes linked to the availability of anthropogenic food resources can have complex implications for species and ecosystems, especially when species are in decline. Here, we use recently developed primers targeting the ITS2 region of plants to characterize diet from faecal samples of four UK columbids, with particular focus on the European turtle dove (Streptopelia turtur), a rapidly declining obligate granivore. We examine dietary overlap between species (potential competition), associations with body condition in turtle doves and spatiotemporal variation in diet. We identified 143 taxonomic units, of which we classified 55% to species, another 34% to genus and the remaining 11% to family. We found significant dietary overlap between all columbid species, with the highest between turtle doves and stock doves (Columba oenas), then between turtle doves and woodpigeons (Columba palumbus). The lowest overlap was between woodpigeons and collared doves (Streptopelia decaocto). We show considerable change in columbid diets compared to previous studies, probably reflecting opportunistic foraging behaviour by columbids within a highly anthropogenically modified landscape, although our data for non‐turtle doves should be considered preliminary. Nestling turtle doves in better condition had a higher dietary proportion of taxonomic units from natural arable plant species and a lower proportion of taxonomic units from anthropogenic food resources such as garden bird seed mixes and brassicas. This suggests that breeding ground conservation strategies for turtle doves should include provision of anthropogenic seeds for adults early in the breeding season, coupled with habitat rich in accessible seeds from arable plants once chicks have hatched.

13.
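Dietary analysis is an important research tool in animal nutritional ecology and can be used to address scientific questions such as the associations between animals and environmental factors, the relationships between predators and prey, and animal species diversity. In recent years, DNA metabarcoding based on next-generation sequencing has been widely applied in many areas of ecological research, greatly promoting the development of interdisciplinary life sciences. In animal dietary analysis, DNA metabarcoding offers high resolution, high efficiency and low sample requirements, and has important application prospects. This review summarizes research progress in the ecological application of DNA metabarcoding-based animal dietary analysis, outlines the principles of DNA metabarcoding and the methods of dietary analysis, and focuses on applications in ecological research areas such as the conservation of rare and endangered animals, biodiversity monitoring and agricultural pest control. It concludes with a summary of the current problems of DNA metabarcoding in animal dietary analysis and an outlook on its application prospects.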

14.
The development of DNA sequencing methods for characterizing microbial communities has evolved rapidly over the past decades. To evaluate more traditional, as well as newer methodologies for DNA library preparation and sequencing, we compared fosmid, short-insert shotgun and 454 pyrosequencing libraries prepared from the same metagenomic DNA samples. GC content was elevated in all fosmid libraries, compared with shotgun and 454 libraries. Taxonomic composition of the different libraries suggested that this was caused by a relative underrepresentation of dominant taxonomic groups with low GC content, notably Prochlorales and the SAR11 cluster, in fosmid libraries. While these abundant taxa had a large impact on library representation, we also observed a positive correlation between taxon GC content and fosmid library representation in other low-GC taxa, suggesting a general trend. Analysis of gene category representation in different libraries indicated that the functional composition of a library was largely a reflection of its taxonomic composition, and no additional systematic biases against particular functional categories were detected at the level of sequencing depth in our samples. Another important but less predictable factor influencing the apparent taxonomic and functional library composition was the read length afforded by the different sequencing technologies. Our comparisons and analyses provide a detailed perspective on the influence of library type on the recovery of microbial taxa in metagenomic libraries and underscore the different uses and utilities of more traditional, as well as contemporary ‘next-generation’ DNA library construction and sequencing technologies for exploring the genomics of the natural microbial world.

15.
Many applications in molecular ecology require the ability to match specific DNA sequences from single- or mixed-species samples with a diagnostic reference library. Widely used methods for DNA barcoding and metabarcoding employ PCR and amplicon sequencing to identify taxa based on target sequences, but the target-specific enrichment capabilities of CRISPR-Cas systems may offer advantages in some applications. We identified 54,837 CRISPR-Cas guide RNAs that may be useful for enriching chloroplast DNA across phylogenetically diverse plant species. We tested a subset of 17 guide RNAs in vitro to enrich plant DNA strands ranging in size from diagnostic DNA barcodes of 1,428 bp to entire chloroplast genomes of 121,284 bp. We used an Oxford Nanopore sequencer to evaluate sequencing success based on both single- and mixed-species samples, which yielded mean chloroplast sequence lengths of 2,530–11,367 bp, depending on the experiment. In comparison to mixed-species experiments, single-species experiments yielded more on-target sequence reads and greater mean pairwise identity between contigs and the plant species' reference genomes. Nevertheless, the mixed-species experiments yielded sufficient data to provide a ≥48-fold increase in sequence length and better estimates of relative abundance for a commercially prepared mixture of plant species compared to DNA metabarcoding based on the chloroplast trnL-P6 marker. Prior work developed CRISPR-based enrichment protocols for long-read sequencing, and our experiments pioneered their use for plant DNA barcoding and chloroplast assemblies, which may have advantages over workflows that require PCR and short-read sequencing. Future work would benefit from continuing to develop in vitro and in silico methods for CRISPR-based analyses of mixed-species samples, especially when the appropriate reference genomes for contig assembly cannot be known a priori.

16.
High‐throughput sequencing (HTS) technologies generate millions of sequence reads from DNA/RNA molecules rapidly and cost‐effectively, enabling single investigator laboratories to address a variety of ‘omics’ questions in nonmodel organisms and fundamentally changing the way genomic approaches are used to advance biological research. One major challenge posed by HTS is the complexity and difficulty of data quality control (QC). While QC issues associated with sample isolation, library preparation and sequencing are well known and protocols for their handling are widely available, the QC of the actual sequence reads generated by HTS is often overlooked. HTS‐generated sequence reads can contain various errors, biases and artefacts whose identification and amelioration can greatly impact subsequent data analysis. However, a systematic survey of QC procedures for HTS data is still lacking. In this review, we begin by presenting standard ‘health check‐up’ QC procedures recommended for HTS data sets and establishing what ‘healthy’ HTS data look like. We next proceed by classifying errors, biases and artefacts present in HTS data into three major types of ‘pathologies’, discussing their causes and symptoms and illustrating with examples their diagnosis and impact on downstream analyses. We conclude this review by offering examples of successful ‘treatment’ protocols and recommendations on standard practices and treatment options. Notwithstanding the speed with which HTS technologies (and consequently their pathologies) change, we argue that careful QC of HTS data is an important, yet often neglected, aspect of their application in molecular ecology, and lay the groundwork for developing an HTS data QC ‘best practices’ guide.

17.
High‐throughput DNA analyses are increasingly being used to detect rare mutations in moderately sized genomes. These methods have yielded genome mutation rates that are markedly higher than those obtained using pre‐genomic strategies. Recent work in a variety of organisms has shown that mutation rate is strongly affected by sequence context and genome position. These observations suggest that high‐throughput DNA analyses will ultimately allow researchers to identify trans‐acting factors and cis sequences that underlie mutation rate variation. Such work should provide insights on how mutation rate variability can impact genome organization and disease progression.

18.
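Application of high-throughput sequencing technology in wildlife dietary analysis   Cited by: 2 (self-citations: 0; citations by others: 2)
Liu Gang, Ning Yu, Xia Xiaofei, Gong Minghao. Acta Ecologica Sinica (《生态学报》), 2018, 38(9): 3347-3356
Diet is an important and closely watched topic in animal ecology, and methods of dietary analysis, constrained by technology and scope of application, are continually being improved and updated. With the development of high-throughput sequencing, the technology has gradually been extended to the dietary analysis of wildlife, greatly improving the efficiency of dietary analysis and broadening its range of application. Although high-throughput sequencing has clear advantages for dietary analysis in terms of data volume, sensitivity and resolution, it involves many steps and is subject to complex influencing factors, so its application to dietary analysis remains a relatively under-researched field. This paper outlines the basic workflow of applying high-throughput sequencing to dietary analysis; summarizes research trends in food composition analysis, intraspecific and interspecific dietary relationships, and the relationships of food with habitat and behaviour; analyses how PCR, contamination and quantitative analysis affect the applicability of the technology; proposes corresponding solutions and recommendations; and discusses its application prospects.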

19.
The present study aimed to estimate the clinical performance of non‐invasive prenatal testing (NIPT) based on high‐throughput sequencing for the detection of foetal chromosomal deletions and duplications. A total of 6348 pregnant women receiving NIPT by high‐throughput sequencing were included in our study. All had conceived naturally, and none had a multiple pregnancy (twins, triplets or more). Individuals showing abnormalities in NIPT received invasive ultrasound‐guided amniocentesis for chromosomal karyotype and microarray analysis at 18‐24 weeks of pregnancy. Detection results for foetal chromosomal deletions and duplications were compared between high‐throughput sequencing and chromosomal karyotype and microarray analysis. Thirty‐eight individuals were identified as showing 51 chromosomal deletions/duplications via high‐throughput sequencing. In subsequent chromosomal karyotype and microarray analysis, 34 subchromosomal deletions/duplications were identified in 26 pregnant women. The observed deletions and duplications ranged from 1.05 to 17.98 Mb. Detection accuracy for these deletions and duplications was 66.7%. Twenty‐one deletions and duplications were found to be correlated with known abnormalities. NIPT based on high‐throughput sequencing is able to identify foetal chromosomal deletions and duplications, but its sensitivity and specificity were not explored. Further progress should be made to reduce false‐positive results.
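The reported detection accuracy is consistent with the share of sequencing‐based calls that were confirmed by karyotype and microarray analysis. The worked equation below rests on an assumption, since the abstract does not state the formula: that the 34 confirmed events are counted against the 51 events called by NIPT.

```latex
\[
\text{detection accuracy}
  = \frac{\text{events confirmed by karyotype/microarray}}{\text{events called by NIPT}}
  = \frac{34}{51} \approx 66.7\%
\]
```

Read this way, the figure describes how often a positive NIPT call was confirmed, which fits the abstract's remark that sensitivity and specificity were not explored.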

20.
  1. Increasing access to next‐generation sequencing (NGS) technologies is revolutionizing the life sciences. In disease ecology, NGS‐based methods have the potential to provide higher‐resolution data on communities of parasites found in individual hosts as well as host populations.
  2. Here, we demonstrate how a novel analytical method, utilizing high‐throughput sequencing of PCR amplicons, can be used to explore variation in blood‐borne parasite (Theileria—Apicomplexa: Piroplasmida) communities of African buffalo at higher resolutions than has been obtained with conventional molecular tools.
  3. Results reveal temporal patterns of synchronized and opposite fluctuations of prevalence and relative abundance of Theileria spp. within the host population, suggesting heterogeneous transmission across taxa. Furthermore, we show that the community composition of Theileria spp. and their subtypes varies considerably between buffalo, with differences in composition reflected in mean and variance of overall parasitemia, thereby showing potential to elucidate previously unexplained contrasts in infection outcomes for host individuals.
  4. Importantly, our methods are generalizable as they can be utilized to describe blood‐borne parasite communities in any host species. Furthermore, our methodological framework can be adapted to any parasite system given the appropriate genetic marker.
  5. The findings of this study demonstrate how a novel NGS‐based analytical approach can provide fine‐scale, quantitative data, unlocking opportunities for discovery in disease ecology.
