Similar Literature
A total of 20 similar articles were found (search time: 31 ms).
1.
Recent advances in sequencing technology allow for accurate detection of mitochondrial sequence variants, even those in low abundance at heteroplasmic sites. Considerable sequencing cost savings can be achieved by enriching samples for mitochondrial (relative to nuclear) DNA. Reduction in nuclear DNA (nDNA) content can also help to avoid false positive variants resulting from nuclear mitochondrial sequences (numts). We isolate intact mitochondrial organelles from both human cell lines and blood components using two separate methods: a magnetic bead binding protocol and differential centrifugation. DNA is extracted and further enriched for mitochondrial DNA (mtDNA) by an enzyme digest. Only 1 ng of the purified DNA is necessary for library preparation and next-generation sequencing (NGS) analysis. Enrichment methods are assessed and compared using mtDNA (versus nDNA) content as a metric, measured by real-time quantitative PCR and NGS read analysis. Among the various strategies examined, the optimal is differential centrifugation isolation followed by exonuclease digest. This strategy yields >35% mtDNA reads in blood and cell lines, which corresponds to a several-hundred-fold enrichment over baseline. The strategy also avoids false variant calls that, as we show, can be induced by the long-range PCR approaches that are the current standard in enrichment procedures. This optimization allows mtDNA enrichment for efficient and accurate massively parallel sequencing, enabling NGS from samples with small amounts of starting material. This will decrease costs by increasing the number of samples that may be multiplexed, ultimately facilitating efforts to better understand mitochondria-related diseases.
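As a rough illustration of the enrichment metric used above — the fraction of NGS reads mapping to the mitochondrial genome, and its fold change over an unenriched baseline — here is a minimal sketch with hypothetical read counts (in practice the counts could come from something like `samtools idxstats` on an aligned BAM):

```python
# Sketch: mtDNA enrichment metric from alignment read counts.
# All counts are hypothetical placeholders.

def mtdna_fraction(mt_reads: int, nuclear_reads: int) -> float:
    """Fraction of mapped reads derived from mtDNA."""
    return mt_reads / (mt_reads + nuclear_reads)

# Hypothetical example: unenriched whole-cell DNA vs. the
# centrifugation + exonuclease-digest preparation.
baseline = mtdna_fraction(mt_reads=2_000, nuclear_reads=1_998_000)   # ~0.1%
enriched = mtdna_fraction(mt_reads=380_000, nuclear_reads=620_000)   # ~38%

print(f"baseline mtDNA fraction: {baseline:.4%}")
print(f"enriched mtDNA fraction: {enriched:.2%}")
print(f"fold enrichment: {enriched / baseline:.0f}x")
```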

2.

Background

The processing and analysis of the large-scale data generated by next-generation sequencing (NGS) experiments is challenging and is a burgeoning area of methods development. Several new bioinformatics tools have been developed for calling sequence variants from NGS data. Here, we validate the variant calls produced by these tools and compare their relative accuracy to determine which data processing pipeline is optimal.

Results

We developed a unified pipeline for processing NGS data that encompasses four modules: mapping, filtering, realignment and recalibration, and variant calling. We processed 130 subjects from an ongoing whole exome sequencing study through this pipeline. To evaluate the accuracy of each module, we conducted a series of comparisons between the single nucleotide variant (SNV) calls from the NGS data and either gold-standard Sanger sequencing on a total of 700 variants or array genotyping data on a total of 9,935 single-nucleotide polymorphisms. A head-to-head comparison showed that the Genome Analysis Toolkit (GATK) provided more accurate calls than SAMtools (positive predictive value of 92.55% vs. 80.35%, respectively). Realignment of mapped reads and recalibration of base quality scores before SNV calling proved to be crucial for accurate variant calling. The GATK HaplotypeCaller algorithm for variant calling outperformed the UnifiedGenotyper algorithm. We also showed a relationship between mapping quality, read depth and allele balance, and SNV call accuracy. However, if best practices are used in data processing, then additional filtering based on these metrics provides little gain, and accuracies of >99% are achievable.

Conclusions

Our findings will help to determine the best approach for processing NGS data to confidently call variants for downstream analyses. To enable others to implement and replicate our results, all of our code is freely available at http://metamoodics.org/wes.
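A minimal sketch of the head-to-head accuracy comparison described above: computing positive predictive value and sensitivity for a caller's SNV set against gold-standard calls. The variant keys and counts below are hypothetical, not the study's data:

```python
# Sketch: comparing caller output against a gold standard.
# Variants are keyed as (chrom, pos, ref, alt); all data hypothetical.

def ppv_and_sensitivity(calls: set, truth: set) -> tuple[float, float]:
    tp = len(calls & truth)          # called and confirmed
    fp = len(calls - truth)          # called but not confirmed
    fn = len(truth - calls)          # confirmed but missed
    ppv = tp / (tp + fp) if calls else 0.0
    sensitivity = tp / (tp + fn) if truth else 0.0
    return ppv, sensitivity

truth = {("chr1", 1000, "A", "G"), ("chr2", 500, "C", "T"), ("chr7", 42, "G", "A")}
gatk_calls = {("chr1", 1000, "A", "G"), ("chr2", 500, "C", "T"),
              ("chr5", 77, "T", "C")}  # last call is a false positive

ppv, sens = ppv_and_sensitivity(gatk_calls, truth)
print(f"PPV: {ppv:.2%}, sensitivity: {sens:.2%}")
```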

3.
Next-generation sequencing (NGS) has caused a revolution in biology. NGS requires the preparation of libraries in which (fragments of) DNA or RNA molecules are fused with adapters, followed by PCR amplification and sequencing. It is evident that robust library preparation methods that produce a representative, non-biased source of nucleic acid material from the genome under investigation are of crucial importance. Nevertheless, it has become clear that NGS libraries for all types of applications contain biases that compromise the quality of NGS datasets and can lead to their erroneous interpretation. A detailed knowledge of the nature of these biases will be essential for careful interpretation of NGS data on the one hand, and will help in finding ways to improve library quality or to develop bioinformatics tools that compensate for the bias on the other. In this review we discuss the literature on bias in the most common NGS library preparation protocols, both for DNA sequencing (DNA-seq) and for RNA sequencing (RNA-seq). Strikingly, almost all steps of the various protocols have been reported to introduce bias, especially in the case of RNA-seq, which is technically more challenging than DNA-seq. For each type of bias we discuss methods for improvement, with a view to providing some useful advice to the researcher who wishes to convert any kind of raw nucleic acid into an NGS library.

4.
The importance of next-generation sequencing (NGS) in cancer research is rising as access to this key technology becomes easier for researchers. The sequence data created by NGS technologies must be processed by various bioinformatics algorithms within a pipeline in order to convert raw data to meaningful information. Mapping and variant calling are the two main steps of these analysis pipelines, and many algorithms are available for each. Detailed benchmarking of these algorithms in different scenarios is therefore crucial for the efficient utilization of sequencing technologies. In this study, we compared the performance of twelve pipelines (three mapping and four variant discovery algorithms) with recommended settings to capture single nucleotide variants. We observed significant discrepancy in variant calls among the tested pipelines for different heterogeneity levels in real and simulated samples, with overall high specificity and low sensitivity. In addition to the individual evaluation of pipelines, we also constructed and tested pipeline combinations. In these analyses, we observed that certain pipelines complement each other much better than others and display performance superior to any individual pipeline. This suggests that adhering to a single pipeline is not optimal for cancer sequencing analysis, and that sample heterogeneity should be considered in algorithm optimization.
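The pipeline-combination idea can be sketched as simple set operations over call sets: intersection favors specificity, union favors sensitivity. Pipeline names and calls below are hypothetical placeholders:

```python
# Sketch: combining SNV call sets from multiple mapping/calling pipelines.
# All pipeline names and variant calls are hypothetical.

calls = {
    "bwa+gatk":     {("chr1", 100), ("chr1", 250), ("chr3", 77)},
    "bowtie2+gatk": {("chr1", 100), ("chr3", 77), ("chr9", 12)},
    "bwa+varscan":  {("chr1", 100), ("chr1", 250)},
}

consensus = set.intersection(*calls.values())  # high specificity
combined  = set.union(*calls.values())         # high sensitivity

print("called by all pipelines:", sorted(consensus))
print("called by any pipeline:", sorted(combined))
```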

5.
Background

Compared to classical genotyping, targeted next-generation sequencing (tNGS) can be custom-designed to interrogate entire genomic regions of interest, in order to detect novel as well as known variants. To bring down the per-sample cost, one approach is to pool barcoded NGS libraries before sample enrichment. Still, we lack a complete understanding of how this multiplexed tNGS approach and the varying performance of the ever-evolving analytical tools can affect the quality of variant discovery. Therefore, we evaluated the impact of different software tools and analytical approaches on the discovery of single nucleotide polymorphisms (SNPs) in multiplexed tNGS data. To generate our own test model, we combined a sequence capture method with NGS in three experimental stages of increasing complexity (E. coli genes, multiplexed E. coli, and multiplexed HapMap BRCA1/2 regions).

Results

We successfully enriched barcoded NGS libraries instead of genomic DNA, achieving reproducible coverage profiles (Pearson correlation coefficients of up to 0.99) across multiplexed samples, with <10% strand bias. However, the SNP calling quality was substantially affected by the choice of tools and mapping strategy. With the aim of reducing computational requirements, we compared conventional whole-genome mapping and SNP calling with a new, faster approach: target-region mapping with subsequent 'read-backmapping' to the whole genome to reduce the false detection rate. Consequently, we developed a combined mapping pipeline, which includes standard tools (BWA, SAMtools, etc.), and tested it on public HiSeq2000 exome data from the 1000 Genomes Project. Our pipeline saved 12 hours of run time per HiSeq2000 exome sample and detected ~5% more SNPs than the conventional whole-genome approach. This suggests that more potential novel SNPs may be discovered using both approaches than with the conventional approach alone.

Conclusions

We recommend applying our general 'two-step' mapping approach for more efficient SNP discovery in tNGS. Our study has also shown the benefit of computing inter-sample SNP concordances and inspecting read alignments in order to attain more confident results.
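The reproducibility check reported above (Pearson correlation of coverage profiles across multiplexed samples) reduces to a small computation; a hedged sketch with hypothetical per-target coverage vectors, using numpy:

```python
# Sketch: pairwise Pearson correlation of per-target coverage profiles
# across multiplexed samples. Coverage values are hypothetical.
import numpy as np

coverage = {
    "sample_A": np.array([120, 85, 300, 40, 210], dtype=float),
    "sample_B": np.array([115, 90, 310, 38, 205], dtype=float),
    "sample_C": np.array([130, 80, 280, 45, 220], dtype=float),
}

names = list(coverage)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        r = np.corrcoef(coverage[a], coverage[b])[0, 1]
        print(f"Pearson r({a}, {b}) = {r:.3f}")
```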

6.
Next-generation sequencing (NGS) is performed in fields ranging from agriculture to clinical medicine, and many sequencing platforms are available for obtaining accurate and consistent results. However, these platforms can show amplification bias that affects variant calls in personal genomes. Here, we sequenced whole genomes and whole exomes from ten Korean individuals using Illumina and Ion Proton, respectively, to assess the vulnerability and accuracy of each NGS platform in GC-rich and GC-poor regions. Overall, a total of 1013 Gb of reads from Illumina and ~39.1 Gb of reads from Ion Proton were analyzed using a BWA-GATK variant calling pipeline. In conjunction with the VQSR tool and detailed filtering strategies, we obtained high-quality variants. Finally, ten variants each from the Illumina-only, Ion Proton-only, and intersection call sets were selected for Sanger validation. The validation results revealed that the Illumina platform showed higher accuracy than Ion Proton. The described filtering methods are advantageous for large population-based whole genome studies designed to identify common and rare variations associated with complex diseases.
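A common way to make the GC-rich/GC-poor comparison concrete is to bin genomic windows by GC fraction and report mean depth per bin; a minimal sketch with hypothetical (GC, depth) window pairs:

```python
# Sketch: mean sequencing depth as a function of window GC content,
# a simple readout of GC-dependent amplification bias.
# (window_gc, depth) pairs are hypothetical.
from collections import defaultdict

windows = [(0.28, 35.1), (0.31, 38.0), (0.42, 41.2), (0.45, 40.7),
           (0.58, 33.5), (0.63, 24.9), (0.71, 15.2), (0.74, 12.8)]

bins = defaultdict(list)
for gc, depth in windows:
    bins[round(gc, 1)].append(depth)   # bin windows in 10% GC steps

for gc_bin in sorted(bins):
    depths = bins[gc_bin]
    print(f"GC ~{gc_bin:.0%}: mean depth {sum(depths) / len(depths):.1f}")
```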

7.
Next-generation sequencing (NGS) is increasingly used for diet analyses; however, it may not always describe diet samples well. One reason is that diet samples contain mixtures of food DNA in different amounts, as well as consumer DNA, which can reduce the amount of food DNA characterized. Detections will therefore depend on the relative amount and identity of each type of DNA. For such samples, diagnostic PCR will most likely give more reliable results, as its detection probability is only marginally dependent on other co-present DNA. We investigated the reliability of each method to test (a) whether predatory beetle regurgitates, expected to be low in consumer DNA, allow prey sequences to be retrieved using general barcoding primers that co-amplify the consumer DNA, and (b) the sequencing depth or replication needed for NGS and diagnostic PCR to give stable results. When consumer DNA is co-amplified, NGS is better suited to discovering the range of possible prey than to comparing co-occurrences of diet species between samples, as retested samples repeatedly differed in prey detections with this approach. This shows that samples were incompletely described, as prey detected by diagnostic PCR were frequently missed by NGS. Because the sequencing depth needed to reliably describe the diet in such samples becomes very high, the cost-efficiency and reliability of diagnostic PCR make it better suited for testing large sample sets, especially if the targeted prey taxa are thought to be of ecological importance, as diagnostic PCR gave more nested and consistent results in repeated testing of the same sample.

8.
Next‐generation sequencing (NGS) technology has extraordinarily enhanced the scope of research in the life sciences. To broaden the application of NGS to systems that were previously difficult to study, we present protocols for processing faecal and swab samples into amplicon libraries amenable to Illumina sequencing. We developed and tested a novel metagenomic DNA extraction approach using solid phase reversible immobilization (SPRI) beads on Western Bluebird (Sialia mexicana) samples stored in RNAlater. Compared with the MO BIO PowerSoil Kit, the current standard for the Human and Earth Microbiome Projects, the SPRI‐based method produced comparable 16S rRNA gene PCR amplification from faecal extractions but significantly greater DNA quality, quantity and PCR success for both cloacal and oral swab samples. We furthermore modified published protocols for preparing highly multiplexed Illumina libraries with minimal sample loss and without post‐adapter ligation amplification. Our library preparation protocol was successfully validated on three sets of heterogeneous amplicons (16S rRNA gene amplicons from SPRI and PowerSoil extractions as well as control arthropod COI gene amplicons) that were sequenced across three independent, 250‐bp, paired‐end runs on Illumina's MiSeq platform. Sequence analyses revealed largely equivalent results from the SPRI and PowerSoil extractions. Our comprehensive strategies focus on maximizing efficiency and minimizing costs. In addition to increasing the feasibility of using minimally invasive sampling and NGS capabilities in avian research, our methods are notably not avian‐specific and thus applicable to many research programmes that involve DNA extraction and amplicon sequencing.

9.
Amplification by polymerase chain reaction is often used in the preparation of template DNA molecules for next-generation sequencing. Amplification increases the number of available molecules for sequencing but changes the representation of the template molecules in the amplified product and introduces random errors. Such changes in representation hinder applications requiring accurate quantification of template molecules, such as allele calling or estimation of microbial diversity. We present a simple method to count the number of template molecules using degenerate bases and show that it improves genotyping accuracy and removes noise from PCR amplification. This method can be easily added to existing DNA library preparation techniques and can improve the accuracy of variant calling.
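A minimal sketch of molecule counting with degenerate bases (unique molecular identifiers, UMIs): reads sharing a mapping position and UMI collapse to one template molecule, and a per-molecule consensus vote suppresses PCR errors. The read records below are hypothetical:

```python
# Sketch: counting template molecules via UMIs (degenerate bases).
# Each read is (mapping_position, umi, base_at_variant_site); data hypothetical.
from collections import Counter, defaultdict

reads = [
    (1000, "ACGT", "A"), (1000, "ACGT", "A"), (1000, "ACGT", "G"),  # one molecule; "G" is a PCR/sequencing error
    (1000, "TTAG", "G"), (1000, "TTAG", "G"),                        # one molecule
    (1000, "CCAA", "A"),                                             # one molecule
]

molecules = defaultdict(Counter)
for pos, umi, base in reads:
    molecules[(pos, umi)][base] += 1

# One consensus base per unique template molecule.
consensus = [counts.most_common(1)[0][0] for counts in molecules.values()]
print(f"{len(molecules)} template molecules; allele counts: {Counter(consensus)}")
```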

10.
Chemical mutagenesis is routinely used to create large numbers of rare mutations in plant and animal populations, which can subsequently be subjected to selection for beneficial traits and phenotypes that enable the characterization of gene functions. Several next-generation sequencing (NGS)-based target enrichment methods have been developed for the detection of mutations in target DNA regions. However, most of these methods aim to sequence a large number of target regions from a small number of individuals. Here, we demonstrate an effective and affordable strategy for the discovery of rare mutations in a large sodium azide-induced mutant rice population (F2). The integration of multiplex, semi-nested PCR with NGS library construction allowed for the amplification of multiple target DNA fragments for sequencing. The 8 × 8 × 8 tridimensional DNA sample pooling strategy enabled us to obtain DNA sequences of 512 individuals while sequencing only 24 pooled samples. A stepwise filtering procedure was then developed to eliminate most of the false positives expected to arise through sequencing error, and the application of a simple Student's t-test against position-prone error allowed for the discovery of 16 mutations from 36 enriched targeted DNA fragments of 1024 mutagenized rice plants, all without any false calls.
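One plausible reading of the t-test filter against position-prone error: compare a candidate pool's variant-allele frequency with the background frequencies seen at the same position in all other pools. The frequencies, threshold, and the use of scipy.stats.ttest_1samp below are illustrative assumptions, not the authors' exact procedure:

```python
# Sketch: screening a candidate pool-level variant call against
# position-specific background error using a one-sample t-test.
# All frequencies are hypothetical.
from scipy import stats

# Variant-allele frequency in the pool that triggered the call:
candidate_freq = 0.062

# Same position, all other pools (background error at this site):
background = [0.004, 0.007, 0.003, 0.006, 0.005, 0.008, 0.004, 0.006]

t, p = stats.ttest_1samp(background, popmean=candidate_freq)
print(f"t = {t:.1f}, p = {p:.2e}")
if p < 1e-4 and candidate_freq > max(background):
    print("candidate passes the position-prone-error filter")
```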

11.
Next-generation sequencing (NGS) technologies provide the potential for developing high-throughput and low-cost platforms for clinical diagnostics. A limiting factor for clinical applications of genomic NGS is the downstream bioinformatics analysis required for data interpretation. We have developed an integrated approach for end-to-end clinical NGS data analysis, from variant detection to functional profiling. Robust bioinformatics pipelines were implemented for genome alignment and for single nucleotide polymorphism (SNP), small insertion/deletion (InDel), and copy number variation (CNV) detection in whole exome sequencing (WES) data from the Illumina platform. Quality-control metrics were analyzed at each step of the pipeline using a validated training dataset to ensure data integrity for clinical applications. We annotated the variants with data regarding the disease population and variant impact. Custom algorithms were developed to filter variants based on criteria such as variant quality, inheritance pattern, and impact on protein function. The developed clinical variant pipeline links the identified rare variants to the Integrated Genome Viewer for visualization in a genomic context and to the Protein Information Resource's iProXpress for rich protein and disease information. With the application of our system of annotations, prioritizations, inheritance filters, and functional profiling and analysis, we have created a unique methodology for downstream variant filtering that empowers clinicians and researchers to interpret more effectively the relevance of genomic alterations within a rare genetic disease.
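A hedged sketch of the downstream filtering step: a chain of criteria over variant quality, population frequency, inheritance pattern, and predicted protein impact. Field names and thresholds are hypothetical, not the pipeline's actual schema:

```python
# Sketch: downstream variant filtering on quality, rarity, and predicted
# protein impact. Field names and thresholds are hypothetical.

variants = [
    {"id": "chr1:1000A>G", "qual": 812, "af_popmax": 0.00001,
     "inheritance": "de_novo", "impact": "missense"},
    {"id": "chr2:500C>T", "qual": 35, "af_popmax": 0.02,
     "inheritance": "inherited", "impact": "synonymous"},
]

def passes(v, min_qual=50, max_af=0.001,
           impacts=("missense", "nonsense", "frameshift", "splice")):
    return (v["qual"] >= min_qual
            and v["af_popmax"] <= max_af      # rare in reference populations
            and v["impact"] in impacts)       # likely to alter protein function

candidates = [v["id"] for v in variants if passes(v)]
print("candidate rare variants:", candidates)
```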

12.
Next-generation sequencing (NGS) is an emerging technology that is becoming relevant for the genotyping of clinical samples. Here, we assessed the stability of amplicon sequencing from formalin-fixed paraffin-embedded (FFPE) and paired frozen samples from colorectal cancer metastases using different analysis pipelines. A total of 212 amplicon regions in 48 cancer-related genes were sequenced with the Illumina MiSeq using DNA isolated from resection specimens from 17 patients with colorectal cancer liver metastases. From ten of these patients, paired fresh-frozen and routinely processed FFPE tissue was available for comparative study. Sample quality of FFPE tissues was determined by the amount of amplifiable DNA using qPCR; sequencing libraries were evaluated using a Bioanalyzer. Three bioinformatic pipelines were compared for the analysis of the amplicon sequencing data. Selected hot spot mutations were reviewed using Sanger sequencing. In the sequenced samples from 16 patients, 29 non-synonymous coding mutations were identified in eleven genes. Most frequent were mutations in TP53 (10), APC (7), PIK3CA (3) and KRAS (2). A high concordance between FFPE and paired frozen tissue samples was observed in the ten matched samples, with 21 identical mutation calls and only two mutations differing. Comparison of these results with two other commonly used variant calling tools, however, showed high discrepancies. Hence, amplicon sequencing can potentially be used to identify hot spot mutations in colorectal cancer metastases in frozen and FFPE tissue. However, remarkable differences exist among the results of different variant calling tools, which are not only related to DNA sample quality. Our study highlights the need for standardization and benchmarking of variant calling pipelines, which will be required for translational and clinical applications.

13.
Who is eating what: diet assessment using next generation sequencing
The analysis of food webs and their dynamics facilitates understanding of the mechanistic processes behind community ecology and ecosystem functions. Having accurate techniques for determining dietary ranges and components is critical for this endeavour. While visual analyses and early molecular approaches are highly labour intensive and often lack resolution, recent DNA-based approaches potentially provide more accurate methods for dietary studies. A suite of approaches have been used based on the identification of consumed species by characterization of DNA present in gut or faecal samples. In one approach, a standardized DNA region (DNA barcode) is PCR amplified, amplicons are sequenced and then compared to a reference database for identification. Initially, this involved sequencing clones from PCR products, and studies were limited in scale because of the costs and effort required. The recent development of next generation sequencing (NGS) has made this approach much more powerful, by allowing the direct characterization of dozens of samples with several thousand sequences per PCR product, and has the potential to reveal many consumed species simultaneously (DNA metabarcoding). Continual improvement of NGS technologies, on-going decreases in costs and current massive expansion of reference databases make this approach promising. Here we review the power and pitfalls of NGS diet methods. We present the critical factors to take into account when choosing or designing a suitable barcode. Then, we consider both technical and analytical aspects of NGS diet studies. Finally, we discuss the validation of data accuracy including the viability of producing quantitative data.
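The identification step of DNA metabarcoding can be sketched as a nearest-match lookup against a reference barcode database; this toy uses Hamming distance over equal-length sequences, whereas real pipelines use alignment- or k-mer-based search (e.g., BLAST). Sequences and taxa below are invented:

```python
# Sketch: assigning metabarcoding reads to taxa by nearest reference barcode.
# Toy equal-length sequences; reference entries are invented.

reference_db = {
    "ACGTACGTGG": "Lumbricus terrestris",
    "ACGTTCGAGG": "Aphis fabae",
    "TTGTACGACC": "Collembola sp.",
}

def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))

def assign(read: str, max_dist: int = 1):
    best_taxon, best_d = None, max_dist + 1
    for barcode, taxon in reference_db.items():
        d = hamming(read, barcode)
        if d < best_d:
            best_taxon, best_d = taxon, d
    return best_taxon  # None if no reference within max_dist

for read in ["ACGTACGTGG", "ACGTTCGAGG", "GGGGGGGGGG"]:
    print(read, "->", assign(read))
```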

14.
Next-Generation Sequencing (NGS) technologies have revolutionised research in many fields of genetics. The ability to sequence many individuals from one or multiple populations at a genomic scale has greatly enhanced population genetics studies and made it a data-driven discipline. Recently, researchers have proposed statistical modelling to address the genotyping uncertainty associated with NGS data. However, an ongoing debate is whether it is more beneficial to increase the number of sequenced individuals or the per-sample sequencing depth when estimating genetic variation. Through extensive simulations, I assessed the accuracy of estimating nucleotide diversity, detecting polymorphic sites, and predicting population structure under different experimental scenarios. Results show that the greatest accuracy for estimating population genetics parameters is achieved by employing a large sample size, even when single individuals are sequenced at low depth. Under some circumstances, the minimum sequencing depth for obtaining accurate estimates of allele frequencies and identifying polymorphic sites is 2×, at which both alleles are more likely to have been sequenced. On the other hand, inferences of population structure are more accurate at very large sample sizes, even with extremely low sequencing depth. This all points to the conclusion that, under various experimental scenarios, large sample sizes at low sequencing depth are desirable for achieving high accuracy in cost-limited population genetics studies. These findings will help researchers design their experimental set-ups and guide further investigation on the effect of protocol design for genetic research.
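The 2× intuition has a simple closed form: if reads at a heterozygous diploid site come evenly from both alleles, the probability that depth d captures both alleles is 1 − 2(1/2)^d, which is 0.5 at d = 2. A short sketch checking the formula by simulation:

```python
# Sketch: probability that sequencing depth d samples both alleles of a
# heterozygous diploid site (reads drawn 50/50 from the two alleles).
import random

def p_both_alleles(d: int) -> float:
    # P(all d reads from one allele) = 2 * (1/2)**d
    return 1 - 2 * (0.5 ** d)

def simulate(d: int, trials: int = 100_000) -> float:
    hits = sum(len({random.randint(0, 1) for _ in range(d)}) == 2
               for _ in range(trials))
    return hits / trials

for d in (1, 2, 3, 4, 8):
    print(f"depth {d}: exact {p_both_alleles(d):.3f}, simulated {simulate(d):.3f}")
```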

15.
The unprecedented increase in the throughput of DNA sequencing driven by next-generation technologies now allows efficient analysis of the complete protein-coding regions of genomes (exomes) for multiple samples in a single sequencing run. However, sample preparation and targeted enrichment of multiple samples has become a rate-limiting and costly step in high-throughput genetic analysis. Here we present an efficient protocol for parallel library preparation and targeted enrichment of pooled multiplexed bar-coded samples. The procedure is compatible with microarray-based and solution-based capture approaches. The high flexibility of this method allows multiplexing of 3-5 samples for whole-exome experiments, 20 samples for targeted footprints of 5 Mb and 96 samples for targeted footprints of 0.4 Mb. From library preparation to post-enrichment amplification, including hybridization time, the protocol takes 5-6 d for array-based enrichment and 3-4 d for solution-based enrichment. Our method provides a cost-effective approach for a broad range of applications, including targeted resequencing of large sample collections (e.g., follow-up genome-wide association studies), and whole-exome or custom mini-genome sequencing projects. This protocol gives details for a single-tube procedure, but scaling to a manual or automated 96-well plate format is possible and discussed.

16.
We describe a method for pooling and sequencing DNA from a large number of individual samples while preserving information regarding sample identity. DNA from 576 individuals was arranged into four 12-row by 12-column matrices and then pooled by row and by column, resulting in 96 total pools with 12 individuals in each pool. Pooling of DNA was carried out in a two-dimensional fashion, such that DNA from each individual is present in exactly one row pool and exactly one column pool. By considering the variants observed in the rows and columns of a matrix, we are able to trace rare variants back to the specific individuals that carry them. The pooled DNA samples were enriched over a 250 kb region previously identified by GWAS to significantly predispose individuals to lung cancer. All 96 pools (12 row and 12 column pools from each of 4 matrices) were barcoded and sequenced on an Illumina HiSeq 2000 instrument with an average depth of coverage greater than 4,000×. Verification based on Ion PGM sequencing confirmed the presence of 91.4% of confidently classified SNVs assayed. In this way, each individual sample is sequenced in multiple pools, providing more accurate variant calling than a single pool or a multiplexed approach. This provides a powerful method for rare variant detection in regions of interest at a reduced cost to the researcher.
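The two-dimensional decoding logic, sketched minimally: a variant seen in exactly one row pool and one column pool maps to the single individual at their intersection, while multiple carriers in one matrix yield an ambiguous candidate set that needs follow-up. Pool indices below are hypothetical:

```python
# Sketch: tracing a rare variant back to individuals in a 12x12 pooling matrix.
# The individual at (row r, column c) is present in exactly row pool r and
# column pool c, so positive pools identify candidate carriers.

def decode(positive_rows: set[int], positive_cols: set[int]):
    """Candidate carriers = Cartesian product of positive rows and columns."""
    candidates = {(r, c) for r in positive_rows for c in positive_cols}
    unambiguous = len(positive_rows) == 1 and len(positive_cols) == 1
    return candidates, unambiguous

print(decode({3}, {7}))        # single carrier: individual (3, 7), unambiguous
print(decode({3, 5}, {7, 9}))  # two carriers -> 4 candidates, needs follow-up
```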

17.
Yuan B, Wang J, Cao H, Sun R, Wang Y. Nucleic Acids Research 2011, 39(14):5945-5954
Human cells are constantly exposed to environmental and endogenous agents that can induce DNA damage. Understanding the implications of these DNA modifications in the etiology of human diseases requires examining how these DNA lesions block DNA replication and induce mutations in cells. All previously reported shuttle vector-based methods for investigating the cytotoxic and mutagenic properties of DNA lesions in cells are low-throughput: plasmids containing individual lesions are transfected into cells one lesion at a time, and the products from the replication of individual lesions are analyzed separately. The advent of next-generation sequencing (NGS) technology has enabled investigators to design approaches that were previously not technically feasible or affordable. In this study, we developed a new method employing NGS, together with shuttle vector technology, to provide a multiplexed, quantitative assessment of how DNA lesions perturb the efficiency and accuracy of DNA replication in cells. Using this method, we examined the replication of four carboxymethylated DNA lesions and two oxidatively induced bulky DNA lesions, the (5'S) diastereomers of 8,5'-cyclo-2'-deoxyguanosine (cyclo-dG) and 8,5'-cyclo-2'-deoxyadenosine (cyclo-dA), in five different strains of Escherichia coli. We further validated the results obtained from NGS using previously established methods. Taken together, the newly developed method provides a high-throughput and readily affordable way to assess quantitatively how DNA lesions compromise the efficiency and fidelity of DNA replication in cells.

18.
Next‐generation sequencing (NGS) is emerging as an efficient and cost‐effective tool in population genomic analyses of nonmodel organisms, allowing simultaneous resequencing of many regions of multi‐genomic DNA from multiplexed samples. Here, we detail our synthesis of protocols for targeted resequencing of mitochondrial and nuclear loci by generating indexed genomic libraries for multiplexing up to 100 individuals in a single sequencing pool, and then enriching the pooled library using custom DNA capture arrays. Our use of DNA sequence from one species to capture and enrich the sequencing libraries of another species (i.e. cross‐species DNA capture) indicates that efficient enrichment occurs when sequences are up to about 12% divergent, allowing us to take advantage of genomic information in one species to sequence orthologous regions in related species. In addition to a complete mitochondrial genome on each array, we have included between 43 and 118 nuclear loci for low‐coverage sequencing of between 18 kb and 87 kb of DNA sequence per individual for single nucleotide polymorphisms discovery from 50 to 100 individuals in a single sequencing lane. Using this method, we have generated a total of over 500 whole mitochondrial genomes from seven cetacean species and green sea turtles. The greater variation detected in mitogenomes relative to short mtDNA sequences is helping to resolve genetic structure ranging from geographic to species‐level differences. These NGS and analysis techniques have allowed for simultaneous population genomic studies of mtDNA and nDNA with greater genomic coverage and phylogeographic resolution than has previously been possible in marine mammals and turtles.

19.
Molecular diagnosis of monogenic diabetes and obesity is of paramount importance for both the patient and society, as it enables personalized medicine associated with a better life and eventually saves health care spending. Clinical genetics laboratories are currently switching from Sanger sequencing to next-generation sequencing (NGS) approaches, but choosing the optimal protocols is not easy. Here, we compared the sequencing coverage of 43 genes involved in monogenic forms of diabetes and obesity, and the variant detection rates, resulting from four enrichment methods based on sonication of DNA (Agilent SureSelect, RainDance Technologies) or on enzymatic DNA fragmentation (Illumina Nextera, Agilent HaloPlex). We analyzed the coding exons and untranslated regions of the 43 genes involved in monogenic diabetes and obesity. We found that none of the methods yet achieves full sequencing coverage of the gene targets. Nonetheless, the RainDance, SureSelect and HaloPlex enrichment methods led to the best sequencing coverage of the targets, while the Nextera method resulted in the poorest coverage. Although the sequencing coverage was high, we unexpectedly found that the HaloPlex method missed 20% of the variants detected by the three other methods, and Nextera missed 10%. The question of which NGS technique for genetic diagnosis yields the highest diagnostic rate is frequently discussed in the literature, and the answer is still unclear. Here, we showed that the RainDance enrichment method, as well as SureSelect, both based on sonication of DNA, resulted in good sequencing quality and variant detection, while the use of enzymes to fragment DNA (HaloPlex or Nextera) may not be the best strategy for obtaining accurate sequencing.
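A brief sketch of the kind of coverage comparison underlying such benchmarks: for each enrichment method, the fraction of targeted bases covered at or above a minimum depth. The per-base depth vectors are hypothetical stand-ins for real BAM-derived coverage:

```python
# Sketch: fraction of targeted bases covered at >= min_depth, per enrichment
# method. Depth vectors are hypothetical.

def covered_fraction(depths, min_depth=30):
    return sum(d >= min_depth for d in depths) / len(depths)

methods = {
    "SureSelect": [55, 60, 28, 70, 35, 42, 90, 31],
    "HaloPlex":   [80, 95, 0, 110, 5, 88, 120, 60],   # uneven: gaps despite high peaks
    "Nextera":    [25, 30, 18, 22, 40, 15, 28, 33],
}

for name, depths in methods.items():
    print(f"{name}: {covered_fraction(depths):.0%} of target bases >= 30x")
```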
