Similar literature
Found 20 similar articles; search time: 15 ms
1.
Next‐generation sequencing is a common method for analysing microbial community diversity and composition. Configuring an appropriate sequence processing strategy within the variety of tools and methods is a nontrivial task and can considerably influence the resulting community characteristics. We analysed the V4 region of 18S rRNA gene sequences of marine samples by 454‐pyrosequencing. In the process, we generated several data sets with QIIME, mothur, and a custom‐made pipeline based on DNAStar and the phylogenetic tree‐based PhyloAssigner. For all processing strategies, default parameter settings and selected variations were used. Our results revealed strong differences in the total number of operational taxonomic units (OTUs), indicating that sequence preprocessing and clustering had a major impact on protist diversity estimates. However, diversity estimates of the abundant biosphere (abundance of ≥1%) were reproducible for all conducted processing pipeline versions. A qualitative comparison of diatom genera emphasized strong differences between the pipelines, among which phylogenetic placement of sequences came closest to light microscopy‐based diatom identification. We conclude that diversity studies using different sequence processing strategies are comparable if the focus is on higher taxonomic levels, and if abundance thresholds are used to filter out OTUs of the rare biosphere.
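
The abundance threshold mentioned in this abstract (keeping only OTUs at ≥1% relative abundance) can be illustrated with a minimal Python sketch. The function name, the dictionary layout of the OTU table and the example counts are assumptions made for illustration; they are not taken from QIIME, mothur or the authors' pipeline.

```python
def filter_abundant_otus(otu_counts, min_fraction=0.01):
    """Keep OTUs whose relative abundance is at least min_fraction.

    otu_counts: dict mapping a (hypothetical) OTU label to its read count.
    """
    total = sum(otu_counts.values())
    if total == 0:
        return {}
    return {otu: n for otu, n in otu_counts.items() if n / total >= min_fraction}


# Invented example counts: OTU_3 makes up 0.5% of reads and is filtered out.
counts = {"OTU_1": 950, "OTU_2": 45, "OTU_3": 5}
abundant = filter_abundant_otus(counts)
print(sorted(abundant))
```

A filter of this kind is applied per sample here; whether the threshold should instead be applied across pooled samples is a design choice the abstract does not specify.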

2.
Sequencing whole genomes has become a standard research tool in many disciplines including Molecular Ecology, but the rapid technological advances in combination with several competing platforms have resulted in a confusing diversity of formats. This lack of standard formats causes several problems, such as undocumented preprocessing steps or the loss of information in downstream software tools, which do not account for the specifics of the different available formats. ReadTools is an open‐source Java toolkit designed to standardize and preprocess read data from different platforms. It manages FASTQ‐ and SAM‐formatted inputs while dealing with platform‐specific peculiarities and provides a standard SAM‐compliant output. The code and executable are available at https://github.com/magicDGS/ReadTools .

3.
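Analysis of microbial community diversity based on 16S rRNA gene sequencing   Total citations: 6; self-citations: 1; citations by others: 5
Research on microbial community diversity is important for mining microbial resources, exploring the functions of microbial communities, and clarifying the relationships between microbial communities and their habitats. With the introduction of the metagenome concept and the rapid development of sequencing technologies, 16S rRNA gene sequencing has been widely applied in studies of microbial community diversity. This article systematically reviews four key steps in the 16S rRNA gene sequencing analysis workflow: selection of the sequencing platform, selection of the amplified region, preprocessing of sequencing data, and diversity analysis methods. It discusses the problems and challenges these steps face and outlines future research directions, with the aim of providing a reference for research on microbial community diversity.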

4.
Genotyping of multilocus gene families, such as the major histocompatibility complex (MHC), may be challenging because of problems with assigning alleles to loci and copy number variation among individuals. Simultaneous amplification and genotyping of multiple loci may be necessary, and in such cases, next-generation deep amplicon sequencing offers great promise as a genotyping method of choice. Here, we describe jMHC, a computer program developed for analysing and assisting in the visualization of deep amplicon sequencing data. The software operates on FASTA files; therefore, output from any sequencing technology may be used. jMHC was designed specifically for MHC studies but it may be useful for analysing amplicons derived from other multigene families or for genotyping other polymorphic systems. The program is written in Java with a user-friendly graphical user interface (GUI) and can be run on Microsoft Windows, Linux OS and Mac OS.

5.
Next‐generation sequencing (NGS) technologies are revolutionizing the fields of biology and medicine as powerful tools for amplicon sequencing (AS). Using combinations of primers and barcodes, it is possible to sequence targeted genomic regions with deep coverage for hundreds, even thousands, of individuals in a single experiment. This is extremely valuable for the genotyping of gene families in which locus‐specific primers are often difficult to design, such as the major histocompatibility complex (MHC). The utility of AS is, however, limited by the high intrinsic sequencing error rates of NGS technologies and other sources of error such as polymerase amplification or chimera formation. Correcting these errors requires extensive bioinformatic post‐processing of NGS data. Amplicon Sequence Assignment (amplisas) is a tool that performs analysis of AS results in a simple and efficient way, while offering customization options for advanced users. amplisas is designed as a three‐step pipeline consisting of (i) read demultiplexing, (ii) unique sequence clustering and (iii) erroneous sequence filtering. Allele sequences and frequencies are retrieved in Excel spreadsheet format, making them easy to interpret. amplisas performance has been successfully benchmarked against previously published genotyped MHC data sets obtained with various NGS technologies.

6.
Next‐generation technologies generate an overwhelming amount of gene sequence data. Efficient annotation tools are required to make these data amenable to functional genomics analyses. The Mercator pipeline automatically assigns functional terms to protein or nucleotide sequences. It uses the MapMan ‘BIN’ ontology, which is tailored for functional annotation of plant ‘omics’ data. The classification procedure performs parallel sequence searches against reference databases, compiles the results and computes the most likely MapMan BINs for each query. In the current version, the pipeline relies on manually curated reference classifications originating from the three reference organisms (Arabidopsis, Chlamydomonas, rice), various other plant species that have a reviewed SwissProt annotation, and more than 2000 protein domain and family profiles at InterPro, CDD and KOG. Functional annotations predicted by Mercator achieve accuracies above 90% when benchmarked against manual annotation. In addition to mapping files for direct use in the visualization software MapMan, Mercator provides graphical overview charts, detailed annotation information in a convenient web browser interface and a MapMan‐to‐GO translation table to export results as GO terms. Mercator is available free of charge via http://mapman.gabipd.org/web/guest/app/Mercator .

7.
Characterization of highly duplicated genes, such as genes of the major histocompatibility complex (MHC), where multiple loci often co‐amplify, has until recently been hindered by insufficient read depths per amplicon. Here, we used ultra‐deep Illumina sequencing to resolve genotypes at exon 3 of MHC class I genes in the sedge warbler (Acrocephalus schoenobaenus). We sequenced 24 individuals in two replicates and used these data, as well as a simulated data set, to test the effect of amplicon coverage (range: 500–20 000 reads per amplicon) on the repeatability of genotyping using four different genotyping approaches. A third replicate employed unique barcoding to assess the extent of tag jumping, that is, swapping of individual tag identifiers, which may confound genotyping. The reliability of MHC genotyping increased with coverage and approached or exceeded 90% within‐method repeatability of allele calling at coverages of >5000 reads per amplicon. We found generally high agreement between genotyping methods, especially at high coverages. High reliability of the tested genotyping approaches was further supported by our analysis of the simulated data set, although the genotyping approach relying primarily on replication of variants in independent amplicons proved sensitive to repeatable errors. According to the most repeatable genotyping method, the number of co‐amplifying variants per individual ranged from 19 to 42. Tag jumping was detectable, but at such low frequencies that it did not affect the reliability of genotyping. We thus demonstrate that gene families with many co‐amplifying genes can be reliably genotyped using HTS, provided that there is sufficient per amplicon coverage.

8.
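张军毅, 朱冰川, 徐超, 丁啸, 李俊锋, 张学工, 陆祖宏. 《生态学杂志》, 2015, 26(11): 3545-3553
With the advent of next-generation DNA sequencing technology, the metagenomes of multiple DNA samples can be analysed in parallel; in particular, sequencing that uses the hypervariable regions of the 16S rRNA gene as molecular markers has become the simplest and most effective method for studying microbial diversity. At present, the read lengths of second-generation high-throughput sequencing cannot cover the full length of the 16S rRNA gene, so an effective hypervariable region must be chosen for sequencing. For more than a decade, there has been no uniform standard for selecting 16S rRNA gene hypervariable regions. This paper analyses the commonly used selection strategies and points out that differing environmental conditions are one of the important factors affecting the choice of hypervariable region. On this basis, it proposes reference guidelines for hypervariable region selection and recommends that the selected region be evaluated effectively.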

9.
DNA barcodes are useful for species discovery and species identification, but obtaining barcodes currently requires a well‐equipped molecular laboratory and is time‐consuming, and/or expensive. We here address these issues by developing a barcoding pipeline for Oxford Nanopore MinION™ and demonstrating that one flow cell can generate barcodes for ~500 specimens despite the high basecall error rates of MinION™ reads. The pipeline overcomes these errors by first summarizing all reads for the same tagged amplicon as a consensus barcode. Consensus barcodes are overall mismatch‐free but retain indel errors that are concentrated in homopolymeric regions. They are addressed with an optional error correction pipeline that is based on conserved amino acid motifs from publicly available barcodes. The effectiveness of this pipeline is documented by analysing reads from three MinION™ runs that represent three different stages of MinION™ development. They generated data for (i) 511 specimens of a mixed Diptera sample, (ii) 575 specimens of ants and (iii) 50 specimens of Chironomidae. The run based on the latest chemistry yielded MinION™ barcodes for 490 of the 511 specimens which were assessed against reference Sanger barcodes (N = 471). Overall, the MinION™ barcodes have an accuracy of 99.3%–100% with the number of ambiguous bases after correction ranging from <0.01% to 1.5% depending on which correction pipeline is used. We demonstrate that it requires ~2 hr of sequencing to gather all information needed for obtaining reliable barcodes for most specimens (>90%). We estimate that up to 1,000 barcodes can be generated in one flow cell and that the cost per barcode can be …
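
The idea of summarizing all reads for one tagged amplicon as a consensus barcode can be sketched as a per-column majority vote. This is only an illustration under strong assumptions: the reads are taken to be already aligned to equal length with `-` marking gaps, and the function name, the `min_support` parameter and the example reads are hypothetical. The abstract does not describe how the published pipeline builds its consensus, so this should not be read as the authors' method.

```python
from collections import Counter


def consensus(aligned_reads, min_support=0.5, ambiguous="N"):
    """Majority-vote consensus over pre-aligned, equal-length reads.

    A column whose most common symbol is not supported by more than
    min_support of the reads is reported as `ambiguous`. Columns whose
    majority symbol is a gap ('-') are dropped from the consensus.
    """
    if not aligned_reads:
        return ""
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to the same length")

    out = []
    for column in zip(*aligned_reads):
        base, count = Counter(column).most_common(1)[0]
        if count / len(aligned_reads) <= min_support:
            base = ambiguous
        if base != "-":
            out.append(base)
    return "".join(out)


# Invented reads: one substitution error and one spurious insertion.
reads = ["ACGT-A", "ACGTTA", "ACCT-A"]
print(consensus(reads))
```

With enough reads per amplicon, random substitution errors are outvoted column by column, which is consistent with the abstract's remark that consensus barcodes are largely mismatch-free; systematic homopolymer indels would not be removed by a vote of this kind.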

10.
  1. Increasing access to next‐generation sequencing (NGS) technologies is revolutionizing the life sciences. In disease ecology, NGS‐based methods have the potential to provide higher‐resolution data on communities of parasites found in individual hosts as well as host populations.
  2. Here, we demonstrate how a novel analytical method, utilizing high‐throughput sequencing of PCR amplicons, can be used to explore variation in blood‐borne parasite (Theileria—Apicomplexa: Piroplasmida) communities of African buffalo at higher resolutions than has been obtained with conventional molecular tools.
  3. Results reveal temporal patterns of synchronized and opposite fluctuations of prevalence and relative abundance of Theileria spp. within the host population, suggesting heterogeneous transmission across taxa. Furthermore, we show that the community composition of Theileria spp. and their subtypes varies considerably between buffalo, with differences in composition reflected in mean and variance of overall parasitemia, thereby showing potential to elucidate previously unexplained contrasts in infection outcomes for host individuals.
  4. Importantly, our methods are generalizable as they can be utilized to describe blood‐borne parasite communities in any host species. Furthermore, our methodological framework can be adapted to any parasite system given the appropriate genetic marker.
  5. The findings of this study demonstrate how a novel NGS‐based analytical approach can provide fine‐scale, quantitative data, unlocking opportunities for discovery in disease ecology.

11.
12.
DNA metabarcoding offers new perspectives in biodiversity research. This recently developed approach to ecosystem study relies heavily on the use of next‐generation sequencing (NGS) and thus calls upon the ability to deal with huge sequence data sets. The obitools package satisfies this requirement thanks to a set of programs specifically designed for analysing NGS data in a DNA metabarcoding context. Their capacity to filter and edit sequences while taking into account taxonomic annotation helps to set up tailor‐made analysis pipelines for a broad range of DNA metabarcoding applications, including biodiversity surveys or diet analyses. The obitools package is distributed as an open source software available on the following website: http://metabarcoding.org/obitools . A Galaxy wrapper is available on the GenOuest core facility toolshed: http://toolshed.genouest.org .

13.
14.
Natural history collections are unparalleled repositories of geographical and temporal variation in faunal conditions. Molecular studies offer an opportunity to uncover much of this variation; however, genetic studies of historical museum specimens typically rely on extracting highly degraded and chemically modified DNA samples from skins, skulls or other dried samples. Despite this limitation, obtaining short fragments of DNA sequences using traditional PCR amplification of DNA has been the primary method for genetic study of historical specimens. Few laboratories have succeeded in obtaining genome-scale sequences from historical specimens and then only with considerable effort and cost. Here, we describe a low-cost approach using high-throughput next-generation sequencing to obtain reliable genome-scale sequence data from a traditionally preserved mammal skin and skull using a simple extraction protocol. We show that single-nucleotide polymorphisms (SNPs) from the genome sequences obtained independently from the skin and from the skull are highly repeatable compared to a reference genome.

15.
16.
17.
The choice of technology and bioinformatics approach is critical in obtaining accurate and reliable information from next‐generation sequencing (NGS) experiments. An increasing number of software tools and methodological guidelines are being published, but deciding upon which approach and experimental design to use can depend on the particularities of the species and on the aims of the study. This leaves researchers unable to make informed decisions on these central questions. To address these issues, we developed pipeliner – a tool to evaluate, by simulation, the performance of NGS pipelines in resequencing studies. Pipeliner provides a graphical interface allowing the users to write and test their own bioinformatics pipelines with publicly available or custom software. It computes a number of statistics summarizing the performance in SNP calling, including the recovery, sensitivity and false discovery rate for heterozygous and homozygous SNP genotypes. Pipeliner can be used to answer many practical questions, for example, for a limited amount of NGS effort, how many more reliable SNPs can be detected by doubling coverage and halving sample size or what is the false discovery rate provided by different SNP calling algorithms and options. Pipeliner thus allows researchers to carefully plan their study's sampling design and compare the suitability of alternative bioinformatics approaches for their specific study systems. Pipeliner is written in C++ and is freely available from http://github.com/brunonevado/Pipeliner .
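
Two of the statistics named in this abstract, sensitivity and false discovery rate, can be computed from a simulated truth set and a set of called SNPs as in the following Python sketch. It uses the textbook definitions (sensitivity = TP / (TP + FN), FDR = FP / (TP + FP)); the abstract does not state Pipeliner's exact formulas, and the function name and the example positions are assumptions for illustration only.

```python
def snp_call_stats(true_snps, called_snps):
    """Compare called SNP positions against a known (e.g. simulated) truth set."""
    true_snps, called_snps = set(true_snps), set(called_snps)
    tp = len(true_snps & called_snps)   # true SNPs that were called
    fp = len(called_snps - true_snps)   # calls that are not real SNPs
    fn = len(true_snps - called_snps)   # true SNPs that were missed
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    fdr = fp / (tp + fp) if (tp + fp) else float("nan")
    return {"tp": tp, "fp": fp, "fn": fn, "sensitivity": sensitivity, "fdr": fdr}


# Invented positions: 2 of 4 true SNPs recovered, 1 of 3 calls is false.
stats = snp_call_stats(true_snps={100, 200, 300, 400}, called_snps={100, 200, 500})
print(stats)
```

Running the same comparison for several simulated coverages or sample sizes gives the kind of trade-off table the abstract describes, although genotype-level statistics (heterozygous versus homozygous) would need genotypes rather than positions alone.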

18.
Microsatellites are widely used in population genetics to uncover recent evolutionary events. They are typically genotyped using a capillary sequencer, whose capacity is usually limited to 9, at most 12, loci per run, and whose output is tedious to analyse by hand. With the rise of next‐generation sequencing (NGS), a much larger number of loci and individuals are available from sequencing: for example, on a single run of a GS Junior, 28 loci from 96 individuals are sequenced with 30X coverage. We have developed an algorithm to automatically and efficiently genotype microsatellites from a collection of reads sorted by individual (e.g. specific PCR amplifications of a locus or a collection of reads that encompass a locus of interest). As the sequencing and the PCR amplification introduce artefactual insertions or deletions, the set of reads from a single microsatellite allele shows several length variants. The algorithm infers, without alignment, the true unknown allele(s) of each individual from the observed distributions of microsatellite lengths of all individuals. MicNeSs, a Python implementation of the algorithm, can be used to genotype any microsatellite locus from any organism and has been tested on 454 pyrosequencing data of several loci from fruit flies (a model species) and red deer (a nonmodel species). Without any parallelization, it automatically genotypes 22 loci from 441 individuals in 11 hours on a standard computer. The comparison of MicNeSs inferences to the standard method shows an excellent agreement, with some differences illustrating the pros and cons of both methods.

19.
Recent developments in next generation sequencing technologies have led to the rapid accumulation of 16S rRNA sequences for microbiome profiling. One key step in data processing is to cluster short sequences into operational taxonomic units (OTUs). Although many methods have been proposed for OTU inference, a major challenge is the balance between inference accuracy and computational efficiency, where inference accuracy is often sacrificed to accommodate the need to analyze large numbers of sequences. Inspired by the hierarchical clustering method and a modified greedy network clustering algorithm, we propose a novel multi-seeds based heuristic clustering method, named MSClust, for OTU inference. MSClust first adaptively selects multi-seeds instead of one seed for each candidate cluster, and the reads are then processed using a greedy clustering strategy. Through many numerical examples, we demonstrate that MSClust uses less memory and achieves better biological accuracy than existing heuristic clustering methods while preserving efficiency and scalability.
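
For context, the single-seed greedy heuristic that this abstract contrasts MSClust with can be sketched in a few lines of Python. This is explicitly not MSClust: it keeps one seed per cluster, and it simplifies sequence identity to a position-by-position comparison of equal-length strings. The function names, the 0.97 default threshold and the example sequences are assumptions chosen for illustration.

```python
def identity(a, b):
    """Fraction of matching positions; 0.0 for empty or unequal-length inputs."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs, threshold=0.97):
    """Single-seed greedy clustering.

    Each sequence joins the first cluster whose seed it matches at or above
    `threshold`; otherwise it becomes the seed of a new cluster.
    """
    clusters = []  # list of (seed, members) pairs, in order of creation
    for seq in seqs:
        for seed, members in clusters:
            if identity(seed, seq) >= threshold:
                members.append(seq)
                break
        else:
            clusters.append((seq, [seq]))
    return clusters


# Invented sequences: the second differs from the first at one of ten sites.
example = ["AAAAAAAAAA", "AAAAAAAAAT", "CCCCCCCCCC"]
for seed, members in greedy_cluster(example, threshold=0.8):
    print(seed, len(members))
```

Because each cluster is represented by a single seed, the result depends on input order and on how representative the seed is, which is the kind of limitation a multi-seed approach is meant to reduce.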

20.
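[Background] High-throughput sequencing has been widely applied across all areas of environmental microbiology research. The emergence of sequencing platforms based on different principles, and of numerous biotechnology companies, has given research teams distinctive sequencing support, meeting different research needs while also producing a wide variety of sequencing data. Whether data generated on different sequencing platforms, or by different sequencing companies using the same platform, are interchangeable has long been a concern of researchers. [Objective] To investigate how different sequencing environments and different sequencing depths affect experimental data for the same samples on the MiSeq platform, and to further explore the causes of the differences and their effects on experimental results. [Methods] Sediment samples were collected from Songmenshan, Nanjishan, Raohe and Baishazhou in Poyang Lake; high-throughput sequencing of the V3-V4 region of the 16S rRNA gene was performed at two companies at different sequencing depths, and the two data sets were compared. [Results] The distribution patterns of microbial community structure across sampling sites reflected by the two data sets were highly similar, but differences in rare microorganisms caused them to be separated into two clusters in PCoA and cluster analysis. Network correlation analysis showed that data set B, which had the greater sequencing depth, reflected more complex interactions among microorganisms, and that some rare microorganisms such as Deferribacteres and Lentisphaerae play important roles in the community. Functional prediction with METAGENassist found differences between the data sets in 14 functional categories, including Atrazine metabolism, Chitin degradation, Sulfate reducer and Nitrogen fixation. [Conclusion] The effects of different sequencing environments on experimental data can be reduced or even eliminated through data quality control, whereas differences in sequencing depth significantly affect sequencing data. This effect is mainly that greater sequencing depth significantly increases the richness of rare microorganisms, which in turn helps improve our understanding of the overall function of environmental microbial communities.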
