Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
Constructing mixtures of tagged or bar-coded DNAs for sequencing is an important requirement for the efficient use of next-generation sequencers in applications where limited sequence data are required per sample. There are many applications in which next-generation sequencing can be used effectively to sequence large mixed samples; an example is the characterization of microbial communities where ≤1,000 sequences per sample are adequate to address research questions. Thus, it is possible to examine hundreds to thousands of samples per run on massively parallel next-generation sequencers. However, the cost savings from efficient utilization of sequence capacity are realized only if the production and management costs associated with construction of multiplex pools are also scalable. One critical step in multiplex pool construction is the normalization process, whereby equimolar amounts of each amplicon are mixed. Here we compare three approaches (spectroscopy, size-restricted spectroscopy, and quantitative binding) for normalization of large, multiplex amplicon pools for performance and efficiency. We found that the quantitative binding approach was superior and represents an efficient scalable process for construction of very large, multiplex pools with hundreds and perhaps thousands of individual amplicons included. We demonstrate the increased sequence diversity identified with higher throughput. Massively parallel sequencing can dramatically accelerate microbial ecology studies by allowing appropriate replication of sequence acquisition to account for temporal and spatial variations. Further, population studies to examine genetic variation, which require even lower levels of sequencing, should be possible where thousands of individual bar-coded amplicons are examined in parallel.

Emergent technologies that generate DNA sequence data are designed primarily to perform resequencing projects at reasonable cost. The result is a substantial decrease in per base costs from traditional methods. However, these next-generation platforms do not readily accommodate projects that require obtaining moderate amounts of sequence from large numbers of samples. These platforms also have per run costs that are significant and generally preclude large numbers of single-sample, nonmultiplexed runs. One example of research that is not readily supported is rRNA-directed metagenomics study of some human clinical samples or environmental rRNA analysis of samples from communities with low community diversity that require only thousands of sequences. Thus, strategies to utilize next-generation DNA sequencers efficiently for applications that require lower throughput are critical to capitalize on the efficiency and cost benefits of next-generation sequencing platforms.

Directed metagenomics based on amplification of rRNA genes is an important tool to characterize microbial communities in various environmental and clinical settings. In diverse environmental samples, large numbers of sequences are required to fully characterize the microbial communities (15). However, a lower number of sequences is generally adequate to answer specific research questions. In addition, the levels of diversity in human clinical samples are usually lower than what is observed in environmental samples (for example, see reference 7).

The Roche 454 genome sequencer system FLX pyrosequencer (which we will refer to as 454 FLX hereafter) is the most useful platform for rRNA-directed metagenomics because it currently provides the longest read lengths of any next-generation sequencing platform (1, 14). Computational analysis has shown that the 250-nucleotide read length (available from the 454 FLX-LR chemistry) is adequate for identification of bacteria if the amplified region is properly positioned within variable regions of the small-subunit rRNA (SSU-rRNA) gene (9, 10).

In this study, we used the 454 FLX-LR genome sequencing platform and chemistry, which provides >400,000 sequences of ∼250 bp per run. After we conducted this study, a new reagent set (454 FLX-XLR titanium chemistry) was released, which further increases reads to >1,000,000 and read lengths to >400 bp (Roche). The 454 FLX platform dramatically reduces per base costs of obtaining sequence, and physical separation into between 2 and 16 lanes is available; this physical separation on the plate reduces overall sequencing output by up to 40% when 16 lanes are compared with 2 lanes. For applications where modest sequencing depth (∼1,000 sequences per sample) is adequate to address research questions, physical separation does not allow adequate sample multiplexing because even a 1/16 454 FLX-LR plate run is expected to produce ∼15,000 reads. Further, the utility of the platform as a screening tool at 16-plex is limited by cost per run.

A solution to make next-generation sequencing economical for projects such as rRNA-directed metagenomics is to use bar-coded primers to multiplex amplicon pools so they can be sequenced together and computationally separated afterward (6). For this strategy to succeed, the DNA concentrations of the individual amplicons in the multiplex pools must be precisely normalized, particularly when large numbers of pooled samples are sequenced in parallel. There are several potential methods available for normalizing concentrations of amplicons included in multiplex pools, but the relative and absolute performance of each approach has not been compared.

In this study, we present a direct quantitative comparison of three available methods for amplicon pool normalization for downstream next-generation sequencing. The central goal of the study was to identify the most effective method for normalizing multiplex pools containing >100 individual amplicons. We evaluated each pooling approach by 454 sequencing and compared the observed frequencies of sequences from different pooled bar-coded amplicons. From these data, we determined the efficacy of each method based on the following factors: (i) how well normalized the sequences within the pool were, (ii) the proportion of samples failing to meet a minimum threshold of sequences per sample, and (iii) the overall efficiency (speed and labor required) of the process to multiplex samples.
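The normalization step described in this abstract, mixing equimolar amounts of each amplicon, comes down to a unit conversion followed by a division. The sketch below is illustrative only and is not one of the three methods compared in the study. It assumes concentrations are available in ng/µL and amplicon lengths in bp, and it uses the common approximation of about 660 g/mol per base pair of double-stranded DNA. The function names and sample values are hypothetical. The last helper restates the abstract's capacity arithmetic: a run of roughly 400,000 reads at about 1,000 reads per sample leaves room for on the order of 400 samples.

```python
AVG_BP_MASS = 660.0  # assumed average mass of one dsDNA base pair, in g/mol


def nanomolar(conc_ng_per_ul, length_bp):
    """Convert a dsDNA mass concentration (ng/uL) to molarity (nM)."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * length_bp)


def pooling_volumes(samples, fmol_each):
    """Volume (uL) of each amplicon needed to contribute fmol_each femtomoles.

    samples maps a (hypothetical) sample name to a tuple of
    (concentration in ng/uL, amplicon length in bp).
    """
    volumes = {}
    for name, (conc, length) in samples.items():
        nm = nanomolar(conc, length)   # 1 nM is the same as 1 fmol/uL
        volumes[name] = fmol_each / nm
    return volumes


def max_samples_per_run(reads_per_run, reads_per_sample):
    """Upper bound on samples per run if reads were spread perfectly evenly."""
    return reads_per_run // reads_per_sample
```

In practice the pool is never perfectly even, which is why the study also tracks the proportion of samples that fall below a minimum read threshold.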

2.
Removing Noise From Pyrosequenced Amplicons   Cited by: 2 (self-citations: 0, citations by others: 2)

Background  

In many environmental genomics applications a homologous region of DNA from a diverse sample is first amplified by PCR and then sequenced. The next generation sequencing technology, 454 pyrosequencing, has allowed much larger read numbers from PCR amplicons than ever before. This has revolutionised the study of microbial diversity as it is now possible to sequence a substantial fraction of the 16S rRNA genes in a community. However, there is a growing realisation that because of the large read numbers and the lack of consensus sequences it is vital to distinguish noise from true sequence diversity in these data. Failing to do so leads to inflated estimates of the number of types or operational taxonomic units (OTUs) present. Three sources of error are important: sequencing error, PCR single base substitutions and PCR chimeras. We present AmpliconNoise, a development of the PyroNoise algorithm that is capable of separately removing 454 sequencing errors and PCR single base errors. We also introduce a novel chimera removal program, Perseus, that exploits the sequence abundances associated with pyrosequencing data. We use data sets where samples of known diversity have been amplified and sequenced to quantify the effect of each of the sources of error on OTU inflation and to validate these algorithms.

3.
Parallel tagged sequencing on the 454 platform   Cited by: 2 (self-citations: 0, citations by others: 2)
Parallel tagged sequencing (PTS) is a molecular barcoding method designed to adapt the recently developed high-throughput 454 parallel sequencing technology for use with multiple samples. Unlike other barcoding methods, PTS can be applied to any type of double-stranded DNA (dsDNA) sample, including shotgun DNA libraries and pools of PCR products, and requires no amplification or gel purification steps. The method relies on attaching sample-specific barcoding adapters, which include sequence tags and a restriction site, to blunt-end repaired DNA samples by ligation and strand-displacement. After pooling multiple barcoded samples, molecules without sequence tags are effectively excluded from sequencing by dephosphorylation and restriction digestion, and using the tag sequences, the source of each DNA sequence can be traced. This protocol allows for sequencing 300 or more complete mitochondrial genomes on a single 454 GS FLX run, or twenty-five 6-kb plasmid sequences on only one-sixteenth of a plate. Most of the reactions can be performed in a multichannel setup on 96-well reaction plates, allowing for processing up to several hundred samples in a few days.

4.
5.
DNA barcoding is an efficient method to identify specimens and to detect undescribed/cryptic species. Sanger sequencing of individual specimens is the standard approach in generating large‐scale DNA barcode libraries and identifying unknowns. However, the Sanger sequencing technology is, in some respects, inferior to next‐generation sequencers, which are capable of producing millions of sequence reads simultaneously. Additionally, direct Sanger sequencing of DNA barcode amplicons, as practiced in most DNA barcoding procedures, is hampered by the need for relatively high‐target amplicon yield, coamplification of nuclear mitochondrial pseudogenes, confusion with sequences from intracellular endosymbiotic bacteria (e.g. Wolbachia) and instances of intraindividual variability (i.e. heteroplasmy). Any of these situations can lead to failed Sanger sequencing attempts or ambiguity of the generated DNA barcodes. Here, we demonstrate the potential application of next‐generation sequencing platforms for parallel acquisition of DNA barcode sequences from hundreds of specimens simultaneously. To facilitate retrieval of sequences obtained from individual specimens, we tag individual specimens during PCR amplification using unique 10‐mer oligonucleotides attached to DNA barcoding PCR primers. We employ 454 pyrosequencing to recover full‐length DNA barcodes of 190 specimens using 12.5% capacity of a 454 sequencing run (i.e. two lanes of a 16 lane run). We obtained an average of 143 sequence reads for each individual specimen. The sequences produced are full‐length DNA barcodes for all but one of the included specimens. In a subset of samples, we also detected Wolbachia, nontarget species, and heteroplasmic sequences. Next‐generation sequencing is of great value because of its protocol simplicity, greatly reduced cost per barcode read, faster throughput and added information content.
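The tagging scheme in this abstract (a unique 10-mer on each specimen's PCR primer) implies a computational step that sorts pooled reads back to their specimen of origin. A minimal sketch of that step follows, assuming the tag sits at the very start of each read and must match exactly. The function name, the "unassigned" label and the tag-to-specimen mapping are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def demultiplex(reads, tag_to_specimen, tag_len=10):
    """Group reads by their leading tag and trim the tag from each read.

    Reads whose first tag_len bases match no known tag are collected
    under the key "unassigned".
    """
    bins = defaultdict(list)
    for read in reads:
        tag, insert = read[:tag_len], read[tag_len:]
        specimen = tag_to_specimen.get(tag, "unassigned")
        bins[specimen].append(insert)
    return dict(bins)
```

Exact matching is the simplest possible rule. A production pipeline would typically also tolerate a small number of sequencing errors within the tag, which this sketch does not attempt.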

6.
The characterization of bacterial communities using DNA sequencing has revolutionized our ability to study microbes in nature and discover the ways in which microbial communities affect ecosystem functioning and human health. Here we describe Serial Illumina Sequencing (SI-Seq): a method for deep sequencing of the bacterial 16S rRNA gene using next-generation sequencing technology. SI-Seq serially sequences portions of the V5, V6 and V7 hypervariable regions from barcoded 16S rRNA amplicons using an Illumina short-read genome analyzer. SI-Seq obtains taxonomic resolution similar to 454 pyrosequencing for a fraction of the cost, and can produce hundreds of thousands of reads per sample even with very high multiplexing. We validated SI-Seq using single species and mock community controls, and via a comparison to cystic fibrosis lung microbiota sequenced using 454 FLX Titanium. Our control runs show that SI-Seq has a dynamic range of at least five orders of magnitude, can classify >96% of sequences to the genus level, and performs just as well as 454 and paired-end Illumina methods in estimation of standard microbial ecology diversity measurements. We illustrate the utility of SI-Seq in a pilot sample of central airway secretion samples from cystic fibrosis patients.

7.
Rapid advances in sequencing technology have changed the experimental landscape of microbial ecology. In the last 10 years, the field has moved from sequencing hundreds of 16S rRNA gene fragments per study using clone libraries to the sequencing of millions of fragments per study using next-generation sequencing technologies from 454 and Illumina. As these technologies advance, it is critical to assess the strengths, weaknesses, and overall suitability of these platforms for the interrogation of microbial communities. Here, we present an improved method for sequencing variable regions within the 16S rRNA gene using Illumina's MiSeq platform, which is currently capable of producing paired 250-nucleotide reads. We evaluated three overlapping regions of the 16S rRNA gene that vary in length (i.e., V34, V4, and V45) by resequencing a mock community and natural samples from human feces, mouse feces, and soil. By titrating the concentration of 16S rRNA gene amplicons applied to the flow cell and using a quality score-based approach to correct discrepancies between reads used to construct contigs, we were able to reduce error rates by as much as two orders of magnitude. Finally, we reprocessed samples from a previous study to demonstrate that large numbers of samples could be multiplexed and sequenced in parallel with shotgun metagenomes. These analyses demonstrate that our approach can provide data that are at least as good as that generated by the 454 platform while providing considerably higher sequencing coverage for a fraction of the cost.
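The abstract mentions a quality score-based approach to resolving disagreements between the two reads that form a contig, without giving the rule. One plausible form of such a rule is sketched below under explicit assumptions: the overlapping portions of the two reads are already aligned base for base (the reverse read has been reverse-complemented), qualities are Phred scores, and a disagreement is resolved in favour of the higher-quality base only when the scores differ by at least `min_diff`. The function name and the default threshold are hypothetical and may differ from what the authors used.

```python
def merge_overlap(seq1, qual1, seq2, qual2, min_diff=6):
    """Build a consensus for two aligned, equal-length overlapping read segments.

    Where the bases agree, keep the base. Where they disagree, keep the
    base with the higher quality score if the scores differ by at least
    min_diff (an assumed threshold); otherwise emit the ambiguity code "N".
    """
    if not (len(seq1) == len(qual1) == len(seq2) == len(qual2)):
        raise ValueError("sequences and quality lists must have equal length")
    consensus = []
    for b1, q1, b2, q2 in zip(seq1, qual1, seq2, qual2):
        if b1 == b2:
            consensus.append(b1)
        elif abs(q1 - q2) >= min_diff:
            consensus.append(b1 if q1 > q2 else b2)
        else:
            consensus.append("N")
    return "".join(consensus)
```

Emitting "N" rather than guessing keeps low-confidence positions visible, so that downstream filtering can discard contigs with too many ambiguous bases.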

8.
PCR permits the exponential and sequence-specific amplification of DNA, even from minute starting quantities. PCR is a fundamental step in preparing DNA samples for high-throughput sequencing. However, there are errors associated with PCR-mediated amplification. Here we examine the effects of four important sources of error (bias, stochasticity, template switches and polymerase errors) on sequence representation in low-input next-generation sequencing libraries. We designed a pool of diverse PCR amplicons with a defined structure, and then used Illumina sequencing to search for signatures of each process. We further developed quantitative models for each process, and compared predictions of these models to our experimental data. We find that PCR stochasticity is the major force skewing sequence representation after amplification of a pool of unique DNA amplicons. Polymerase errors become very common in later cycles of PCR but have little impact on the overall sequence distribution as they are confined to small copy numbers. PCR template switches are rare and confined to low copy numbers. Our results provide a theoretical basis for removing distortions from high-throughput sequencing data. In addition, our findings on PCR stochasticity will have particular relevance to quantification of results from single cell sequencing, in which sequences are represented by only one or a few molecules.
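The finding that stochasticity dominates when templates start from one or a few molecules can be illustrated with a simple branching-process simulation. This is a textbook-style sketch, not the quantitative model developed in the study: it assumes every molecule is copied independently with the same probability (`efficiency`) in each cycle, and all names and parameter values are hypothetical.

```python
import random


def simulate_pcr(initial_copies, cycles, efficiency, rng):
    """Simulate per-template copy numbers after a number of PCR cycles.

    initial_copies is a list of starting molecule counts, one per template.
    In every cycle each molecule is duplicated with probability `efficiency`.
    rng is a random.Random instance, passed in so that runs are reproducible.
    """
    counts = list(initial_copies)
    for _ in range(cycles):
        counts = [
            n + sum(rng.random() < efficiency for _ in range(n))
            for n in counts
        ]
    return counts
```

Running, for example, `simulate_pcr([1] * 20, 10, 0.8, random.Random(1))` gives twenty templates that all started from a single molecule but end with different copy numbers, because chance failures in the earliest cycles are carried through every later cycle. Keep the cycle count small, since the loop visits every molecule in every cycle.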

9.
Pyrosequencing of 16S rRNA gene amplicons on the 454 FLX Titanium platform has been widely used to analyze microbiomes in various environments. However, different results may stem from variations among sequencing runs or among sequencing facilities. This study aimed to evaluate these variations between different pyrosequencing runs by sequencing 16S rRNA gene amplicon libraries generated from three sets of rumen samples twice each on the 454 FLX Titanium system at two independent sequencing facilities. Similar relative abundances were found for predominant taxa represented by large numbers of sequence reads but not for minor taxa represented by small numbers of sequence reads. The two sequencing facilities revealed different bacterial profiles with respect to both predominant taxa and minor taxa, including the most predominant genus Prevotella, the family Lachnospiraceae, and the phylum Proteobacteria. Differences in primers used to generate amplicon libraries may be a major source of variations in microbiome profiling. Because different primers and regions of 16S rRNA genes are often used by different researchers, significant variations likely exist among studies. Quantitative interpretation for relative abundance of taxa, especially minor taxa, from prevalence of sequence reads and comparisons of results from different studies should be done with caution.

10.
Current efforts to recover the Neandertal and mammoth genomes by 454 DNA sequencing demonstrate the sensitivity of this technology. However, routine 454 sequencing applications still require microgram quantities of initial material. This is due to a lack of effective methods for quantifying 454 sequencing libraries, necessitating expensive and labour-intensive procedures when sequencing ancient DNA and other poor DNA samples. Here we report a 454 sequencing library quantification method based on quantitative PCR that effectively eliminates these limitations. We estimated both the molecule numbers and the fragment size distributions in sequencing libraries derived from Neandertal DNA extracts, SAGE ditags and bonobo genomic DNA, obtaining optimal sequencing yields without performing any titration runs. Using this method, 454 sequencing can routinely be performed from as little as 50 pg of initial material without titration runs, thereby drastically reducing costs while increasing the scope of sample throughput and protocol development on the 454 platform. The method should also apply to Illumina/Solexa and ABI/SOLiD sequencing, and should therefore help to widen the accessibility of all three platforms.

11.
12.
Researchers face a significant problem in PCR amplification of DNA fragments with high GC content. Analysis of these regions is important because many regulatory regions of different genes, and their first exons, are GC-rich. There are a large number of protocols for amplification of GC-rich DNA, some of which perform well but are costly. Most of the economical protocols fail to perform consistently, especially on products with >80% GC content and a size of >300 bp. One of these protocols requires multiple additions of DNA polymerase during thermal cycling, which rules out its use when a large number of samples have to be amplified. We have established a method for simultaneous amplification of specific PCR products from a large number of human DNA samples using general laboratory reagents. These amplicons have GC contents ranging from 65–85% and sizes up to 870 bp. The protocol uses a PCR buffer containing co-solvents including 2-mercaptoethanol and bovine serum albumin for amplification of DNA. A specific thermal cycling profile is also used which incorporates a high annealing temperature in the first 7 cycles of the reactions. The PCR products are suitable for different molecular biology applications including sequencing.

13.
Complex polyploid crop genomes can be recalcitrant towards conventional DNA sequencing approaches for allele mining in candidate genes for valuable traits. In the past, this has greatly complicated the transfer of knowledge on promising candidate genes from model plants to even closely related polyploid crops. Next-generation sequencing offers diverse solutions to overcome such difficulties. Here, we present a method for multiplexed 454 sequencing in gene-specific PCR amplicons that can simultaneously address multiple homologues of given target genes. We devised a simple two-step PCR procedure employing a set of barcoded M13/T7 universal fusion primers that enable a cost-effective and efficient amplification of large numbers of target gene amplicons. Sequencing-ready amplicons are generated that can be simultaneously sequenced in pools comprising multiple amplicons from multiple genotypes. High-depth sequencing allows resolution of the resulting sequence reads into contigs representing multiple homologous loci, with only insignificant off-target capture of paralogues or PCR artefacts. In a case study, the procedure was tested in the complex polyploid genome of Brassica napus for a set of nine genes identified in Arabidopsis as candidates for regulation of seed development and oil content. Up to six copies of these genes were expected in B. napus. SNP discovery was performed by pooled multiplex sequencing of 30 amplicons in 20 diverse B. napus accessions with interesting trait variation for oil content, providing a basis for comparative mapping to relevant quantitative trait loci and for subsequent marker-assisted breeding.

14.
To date we have little knowledge of how accurate next-generation sequencing (NGS) technologies are in sequencing repetitive sequences beyond known limitations to accurately sequence homopolymers. Only a handful of previous reports have evaluated the potential of NGS for sequencing short tandem repeats (microsatellites) and no empirical study has compared and evaluated the performance of more than one NGS platform with the same dataset. Here we examined yeast microsatellite variants from both long-read (454-sequencing) and short-read (Illumina) NGS platforms and compared these to data derived through Sanger sequencing. In addition, we investigated any locus-specific biases and differences that might have resulted from variability in microsatellite repeat number, repeat motif or type of mutation. Out of 112 insertion/deletion variants identified among 45 microsatellite amplicons in our study, we found 87.5% agreement between the 454-platform and Sanger sequencing in frequency of variant detection after Benjamini-Hochberg correction for multiple tests. For a subset of 21 microsatellite amplicons derived from Illumina sequencing, the results of short-read platform were highly consistent with the other two platforms, with 100% agreement with 454-sequencing and 93.6% agreement with the Sanger method after Benjamini-Hochberg correction. We found that the microsatellite attributes copy number, repeat motif and type of mutation did not have a significant effect on differences seen between the sequencing platforms. We show that both long-read and short-read NGS platforms can be used to sequence short tandem repeats accurately, which makes it feasible to consider the use of these platforms in high-throughput genotyping. It appears the major requirement for achieving both high accuracy and rare variant detection in microsatellite genotyping is sufficient read depth coverage. 
This might be a challenge because each platform generates a consistent pattern of non-uniform sequence coverage, which, as our study suggests, may affect some types of tandem repeats more than others.
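The agreement figures in this abstract are reported after Benjamini-Hochberg correction for multiple tests. For readers unfamiliar with the procedure, a minimal sketch of the standard step-up adjustment follows. It is a generic implementation of the published method, not the authors' analysis code, and the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Each p-value is scaled by m / rank, where m is the number of tests and
    rank is its 1-based position in ascending order. A running minimum,
    taken from the largest p-value downwards, keeps the adjusted values
    monotone and no greater than 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):   # walk from the largest p-value to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A test is then called significant at false discovery rate q when its adjusted p-value is at most q.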

15.
Bi-PROF     
The use of next generation sequencing has expanded our view on whole mammalian methylome patterns. In particular, it provides a genome-wide insight into local DNA methylation diversity at the single nucleotide level and enables the examination of single chromosome sequence sections with sufficient statistical power. We describe a bisulfite-based sequence profiling pipeline, Bi-PROF, which is based on the 454 GS-FLX Titanium technology and yields up to one million sequence stretches at single base pair resolution without laborious subcloning. To illustrate the performance of the experimental workflow connected to a bioinformatics program pipeline (BiQ Analyzer HT) we present a test analysis set of 68 different epigenetic marker regions (amplicons) in five individual patient-derived xenograft tissue samples of colorectal cancer and one healthy colon epithelium sample as a control. After the 454 GS-FLX Titanium run, sequence read processing and sample decoding, the obtained alignments are quality controlled and statistically evaluated. Comprehensive methylation pattern interpretation (profiling) assessed by analyzing 10^2 to 10^4 sequence reads per amplicon allows an unprecedented deep view on pattern formation and methylation marker heterogeneity in tissues affected by complex diseases like cancer.

16.
Screening large numbers of target regions in multiple DNA samples for sequence variation is an important application of next-generation sequencing but an efficient method to enrich the samples in parallel has yet to be reported. We describe an advanced method that combines DNA samples using indexes or barcodes prior to target enrichment to facilitate this type of experiment. Sequencing libraries for multiple individual DNA samples, each incorporating a unique 6-bp index, are combined in equal quantities, enriched using a single in-solution target enrichment assay and sequenced in a single reaction. Sequence reads are parsed based on the index, allowing sequence analysis of individual samples. We show that the use of indexed samples does not impact on the efficiency of the enrichment reaction. For three- and nine-indexed HapMap DNA samples, the method was found to be highly accurate for SNP identification. Even with sequence coverage as low as 8x, 99% of sequence SNP calls were concordant with known genotypes. Within a single experiment, this method can sequence the exonic regions of hundreds of genes in tens of samples for sequence and structural variation using as little as 1 μg of input DNA per sample.

17.
Ultra-deep sequencing (UDS) of amplicons is a major application for next-generation sequencing technologies, even more so for the 454 Genome Sequencer FLX. Especially for this application, errors that might be introduced during any of the sample processing or data analysis steps should be avoided or at least recognized, as they might lead to aberrant sequence variant calling. Since 454 pyrosequencing relies on PCR-driven target amplification, it is key to differentiate errors introduced during the amplification step from genuine minority variants. To this end, optimal primer design is imperative because primer selection, primer dimer formation, and nonspecific binding may all affect the quality and outcome of amplicon-based deep sequencing. Also, other intrinsic PCR characteristics including amplification drift and the formation of secondary structures may influence sequencing data quality. We illustrate these phenomena using real life case studies and propose experimental and analytical evidence-based solutions for effective practice. Furthermore, because accuracy of the DNA polymerase is vital for reliable UDS results, a comparative analysis of error profiles from seven different DNA polymerases was performed and experimentally assessed in parallel by 454 sequencing. Finally, intra- and interrun variability evaluation of the 454 sequencing protocol revealed highly reproducible results in amplicon-based UDS.

18.
Genotyping by sequencing (GBS) is a restriction enzyme based targeted approach developed to reduce the genome complexity and discover genetic markers when a priori sequence information is unavailable. Sufficient coverage at each locus is essential to distinguish heterozygous from homozygous sites accurately. The number of GBS samples able to be pooled in one sequencing lane is limited by the number of restriction sites present in the genome and the read depth required at each site per sample for accurate calling of single-nucleotide polymorphisms. Loci bias was observed using a slight modification of the Elshire et al. method: some restriction enzyme sites were represented in higher proportions while others were poorly represented or absent. This bias could be due to the quality of genomic DNA, the endonuclease and ligase reaction efficiency, the distance between restriction sites, the preferential amplification of small library restriction fragments, or bias towards cluster formation of small amplicons during the sequencing process. To overcome these issues, we have developed a GBS method based on randomly tagging genomic DNA (rtGBS). By randomly landing on the genome, we can, with less bias, find restriction sites that are far apart, and undetected by the standard GBS (stdGBS) method. The study comprises two types of biological replicates: six different kiwifruit plants and two independent DNA extractions per plant; and three types of technical replicates: four samples of each DNA extraction, stdGBS vs. rtGBS methods, and two independent library amplifications, each sequenced in separate lanes. A statistically significant unbiased distribution of restriction fragment size by rtGBS showed that this method targeted 49% (39,145) of BamHI sites shared with the reference genome, compared to only 14% (11,513) by stdGBS.

19.
Assessing phytoplankton diversity is of primary importance for both basic and applied ecological studies. Following the advances in molecular methods, phytoplankton studies are switching from using classical microscopy to high throughput sequencing approaches. However, methodological comparisons of these approaches have rarely been reported. In this study, we compared the two methods, using a unique dataset of multiple water samples taken from a natural freshwater environment. Environmental DNA was extracted from 300 water samples collected weekly during 20 years, followed by high throughput sequencing of amplicons from the 16S and 18S rRNA hypervariable regions. For each water sample, phytoplankton diversity was also estimated using light microscopy. Our study indicates that species compositions detected by light microscopy and 454 high throughput sequencing do not always match. High throughput sequencing detected more rare species and picoplankton than light microscopy, and thus gave a better assessment of phytoplankton diversity. However, when compared to light microscopy, high throughput sequencing of 16S and 18S rRNA amplicons did not adequately identify phytoplankton at the species level. In summary, our study recommends a combined strategy using both morphological and molecular techniques.

20.
Multilocus sequence typing (MLST) is a widely used system for typing microorganisms by sequence analysis of housekeeping genes. The main advantage of MLST in comparison to other typing techniques is the unambiguity and transferability of sequence data. However, a main disadvantage is the high cost of DNA sequencing. Here we introduce a high-throughput MLST (HiMLST) method that employs next-generation sequencing (NGS) technology (Roche 454) to generate large quantities of high-quality MLST data at low cost. The HiMLST protocol consists of two steps. In the first step MLST target genes are amplified by PCR in multi-well plates. During this PCR the amplicons of each bacterial isolate are provided with a unique DNA barcode, the multiplex identifier (MID). In the second step all amplicons are pooled and sequenced in a single NGS run. The MLST profile of each individual isolate can be retrieved easily using its unique MID. With HiMLST we have profiled 575 isolates of Legionella pneumophila, Staphylococcus aureus, Pseudomonas aeruginosa and Streptococcus pneumoniae in mixed species HiMLST experiments. In conclusion, the introduction of HiMLST paves the way for broad use of MLST as a high-quality and cost-effective method for typing microbial species.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  Beijing ICP No. 09084417 (京ICP备09084417号)