期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Coval: Improving Alignment Quality and Variant Calling Accuracy for Next-Generation Sequencing Data

Shunichi Kosugi Satoshi Natsume Kentaro Yoshida Daniel MacLean Liliana Cano Sophien Kamoun Ryohei Terauchi 《PloS one》2013,8(10)

Accurate identification of DNA polymorphisms using next-generation sequencing technology is challenging because of a high rate of sequencing error and incorrect mapping of reads to reference genomes. Currently available short read aligners and DNA variant callers suffer from these problems. We developed the Coval software to improve the quality of short read alignments. Coval is designed to minimize the incidence of spurious alignment of short reads, by filtering mismatched reads that remained in alignments after local realignment and error correction of mismatched reads. The error correction is executed based on the base quality and allele frequency at the non-reference positions for an individual or pooled sample. We demonstrated the utility of Coval by applying it to simulated genomes and experimentally obtained short-read data of rice, nematode, and mouse. Moreover, we found an unexpectedly large number of incorrectly mapped reads in ‘targeted’ alignments, where the whole genome sequencing reads had been aligned to a local genomic segment, and showed that Coval effectively eliminated such spurious alignments. We conclude that Coval significantly improves the quality of short-read sequence alignments, thereby increasing the calling accuracy of currently available tools for SNP and indel identification. Coval is available at http://sourceforge.net/projects/coval105/. 相似文献

2.

Meta-QC-Chain: Comprehensive and Fast Quality Control Method for Metagenomic Data

Qian Zhou Xiaoquan Su Gongchao Jing Kang Ning 《基因组蛋白质组与生物信息学报(英文版)》2014,12(1):52-56

Next-generation sequencing(NGS) technology has revolutionized and significantly impacted metagenomic research.However,the NGS data usually contains sequencing artifacts such as low-quality reads and contaminating reads,which will significantly compromise downstream analysis.Many quality control(QC) tools have been proposed,however,few of them have been verified to be suitable or efficient for metagenomic data,which are composed of multiple genomes and are more complex than other kinds of NGS data.Here we present a metagenomic data QC method named Meta-QC-Chain.Meta-QC-Chain combines multiple QC functions:technical tests describe input data status and identify potential errors,quality trimming filters poor sequencing-quality bases and reads,and contamination screening identifies higher eukaryotic species,which are considered as contamination for metagenomic data.Most computing processes are optimized based on parallel programming.Testing on an 8-GB real dataset showed that Meta-QC-Chain trimmed low sequencing-quality reads and contaminating reads,and the whole quality control procedure was completed within 20 min.Therefore,Meta-QC-Chain provides a comprehensive,useful and high-performance QC tool for metagenomic data.Meta-QC-Chain is publicly available for free at:http://computationalbioenergy.org/meta-qc-chain.html. 相似文献

3.

ConPADE: Genome Assembly Ploidy Estimation from Next-Generation Sequencing Data

Gabriel R. A. Margarido David Heckerman 《PLoS computational biology》2015,11(4)

As a result of improvements in genome assembly algorithms and the ever decreasing costs of high-throughput sequencing technologies, new high quality draft genome sequences are published at a striking pace. With well-established methodologies, larger and more complex genomes are being tackled, including polyploid plant genomes. Given the similarity between multiple copies of a basic genome in polyploid individuals, assembly of such data usually results in collapsed contigs that represent a variable number of homoeologous genomic regions. Unfortunately, such collapse is often not ideal, as keeping contigs separate can lead both to improved assembly and also insights about how haplotypes influence phenotype. Here, we describe a first step in avoiding inappropriate collapse during assembly. In particular, we describe ConPADE (Contig Ploidy and Allele Dosage Estimation), a probabilistic method that estimates the ploidy of any given contig/scaffold based on its allele proportions. In the process, we report findings regarding errors in sequencing. The method can be used for whole genome shotgun (WGS) sequencing data. We also show applicability of the method for variant calling and allele dosage estimation. Results for simulated and real datasets are discussed and provide evidence that ConPADE performs well as long as enough sequencing coverage is available, or the true contig ploidy is low. We show that ConPADE may also be used for related applications, such as the identification of duplicated genes in fragmented assemblies, although refinements are needed. 相似文献

4.

Quantifying Population Genetic Differentiation from Next-Generation Sequencing Data

Matteo Fumagalli Filipe G. Vieira Thorfinn Sand Korneliussen Tyler Linderoth Emilia Huerta-Sánchez Anders Albrechtsen Rasmus Nielsen 《Genetics》2013,195(3):979-992

Over the past few years, new high-throughput DNA sequencing technologies have dramatically increased speed and reduced sequencing costs. However, the use of these sequencing technologies is often challenged by errors and biases associated with the bioinformatical methods used for analyzing the data. In particular, the use of naïve methods to identify polymorphic sites and infer genotypes can inflate downstream analyses. Recently, explicit modeling of genotype probability distributions has been proposed as a method for taking genotype call uncertainty into account. Based on this idea, we propose a novel method for quantifying population genetic differentiation from next-generation sequencing data. In addition, we present a strategy for investigating population structure via principal components analysis. Through extensive simulations, we compare the new method herein proposed to approaches based on genotype calling and demonstrate a marked improvement in estimation accuracy for a wide range of conditions. We apply the method to a large-scale genomic data set of domesticated and wild silkworms sequenced at low coverage. We find that we can infer the fine-scale genetic structure of the sampled individuals, suggesting that employing this new method is useful for investigating the genetic relationships of populations sampled at low coverage. 相似文献

5.

Fast and Efficient XML Data Access for Next-Generation Mass Spectrometry

Hannes L. R?st Uwe Schmitt Ruedi Aebersold Lars Malmstr?m 《PloS one》2015,10(4)

Motivation

In mass spectrometry-based proteomics, XML formats such as mzML and mzXML provide an open and standardized way to store and exchange the raw data (spectra and chromatograms) of mass spectrometric experiments. These file formats are being used by a multitude of open-source and cross-platform tools which allow the proteomics community to access algorithms in a vendor-independent fashion and perform transparent and reproducible data analysis. Recent improvements in mass spectrometry instrumentation have increased the data size produced in a single LC-MS/MS measurement and put substantial strain on open-source tools, particularly those that are not equipped to deal with XML data files that reach dozens of gigabytes in size.

Results

Here we present a fast and versatile parsing library for mass spectrometric XML formats available in C++ and Python, based on the mature OpenMS software framework. Our library implements an API for obtaining spectra and chromatograms under memory constraints using random access or sequential access functions, allowing users to process datasets that are much larger than system memory. For fast access to the raw data structures, small XML files can also be completely loaded into memory. In addition, we have improved the parsing speed of the core mzML module by over 4-fold (compared to OpenMS 1.11), making our library suitable for a wide variety of algorithms that need fast access to dozens of gigabytes of raw mass spectrometric data.

Availability

Our C++ and Python implementations are available for the Linux, Mac, and Windows operating systems. All proposed modifications to the OpenMS code have been merged into the OpenMS mainline codebase and are available to the community at https://github.com/OpenMS/OpenMS. 相似文献

6.

High-Throughput Microdissection for Next-Generation Sequencing

Avi Z. Rosenberg Michael D. Armani Patricia A. Fetsch Liqiang Xi Tina Thu Pham Mark Raffeld Yun Chen Neil O’Flaherty Rebecca Stussman Adele R. Blackler Qiang Du Jeffrey C. Hanson Mark J. Roth Armando C. Filie Michael H. Roh Michael R. Emmert-Buck Jason D. Hipp Michael A. Tangrea 《PloS one》2016,11(3)

相似文献

7.

Screening for SNPs with Allele-Specific Methylation Based on Next-Generation Sequencing Data

Bo Hu Yuan Ji Yaomin Xu Angela H. Ting 《Statistics in biosciences》2013,5(1):179-197

Allele-specific methylation (ASM) has long been studied but mainly documented in the context of genomic imprinting and X chromosome inactivation. Taking advantage of the next-generation sequencing technology, we conduct a high-throughput sequencing experiment with four prostate cell lines to survey the whole genome and identify single-nucleotide polymorphisms (SNPs) with ASM. A Bayesian approach is proposed to model the counts of short reads for each SNP conditional on its genotypes of multiple subjects, leading to a posterior probability of ASM. We flag SNPs with high posterior probabilities of ASM by accounting for multiple comparisons based on posterior false discovery rates. Applying the Bayesian approach to the in-house prostate cell line data, we identify 269 SNPs as candidates of ASM. A simulation study is carried out to demonstrate the quantitative performance of the proposed approach. 相似文献

8.

NeSSM: A Next-Generation Sequencing Simulator for Metagenomics

Ben Jia Liming Xuan Kaiye Cai Zhiqiang Hu Liangxiao Ma Chaochun Wei 《PloS one》2013,8(10)

Background

Metagenomics can reveal the vast majority of microbes that have been missed by traditional cultivation-based methods. Due to its extremely wide range of application areas, fast metagenome sequencing simulation systems with high fidelity are in great demand to facilitate the development and comparison of metagenomics analysis tools.

Results

We present here a customizable metagenome simulation system: NeSSM (Next-generation Sequencing Simulator for Metagenomics). Combining complete genomes currently available, a community composition table, and sequencing parameters, it can simulate metagenome sequencing better than existing systems. Sequencing error models based on the explicit distribution of errors at each base and sequencing coverage bias are incorporated in the simulation. In order to improve the fidelity of simulation, tools are provided by NeSSM to estimate the sequencing error models, sequencing coverage bias and the community composition directly from existing metagenome sequencing data. Currently, NeSSM supports single-end and pair-end sequencing for both 454 and Illumina platforms. In addition, a GPU (graphics processing units) version of NeSSM is also developed to accelerate the simulation. By comparing the simulated sequencing data from NeSSM with experimental metagenome sequencing data, we have demonstrated that NeSSM performs better in many aspects than existing popular metagenome simulators, such as MetaSim, GemSIM and Grinder. The GPU version of NeSSM is more than one-order of magnitude faster than MetaSim.

Conclusions

NeSSM is a fast simulation system for high-throughput metagenome sequencing. It can be helpful to develop tools and evaluate strategies for metagenomics analysis and it’s freely available for academic users at http://cbb.sjtu.edu.cn/~ccwei/pub/software/NeSSM.php. 相似文献

9.

An Integrated Approach for Analyzing Clinical Genomic Variant Data from Next-Generation Sequencing

Erin L. Crowgey Deborah L. Stabley Chuming Chen Hongzhan Huang Katherine M. Robbins Shawn W. Polson Katia Sol-Church Cathy H. Wu 《Journal of biomolecular techniques》2015,26(1):19-28

Next-generation sequencing (NGS) technologies provide the potential for developing high-throughput and low-cost platforms for clinical diagnostics. A limiting factor to clinical applications of genomic NGS is downstream bioinformatics analysis for data interpretation. We have developed an integrated approach for end-to-end clinical NGS data analysis from variant detection to functional profiling. Robust bioinformatics pipelines were implemented for genome alignment, single nucleotide polymorphism (SNP), small insertion/deletion (InDel), and copy number variation (CNV) detection of whole exome sequencing (WES) data from the Illumina platform. Quality-control metrics were analyzed at each step of the pipeline by use of a validated training dataset to ensure data integrity for clinical applications. We annotate the variants with data regarding the disease population and variant impact. Custom algorithms were developed to filter variants based on criteria, such as quality of variant, inheritance pattern, and impact of variant on protein function. The developed clinical variant pipeline links the identified rare variants to Integrated Genome Viewer for visualization in a genomic context and to the Protein Information Resource’s iProXpress for rich protein and disease information. With the application of our system of annotations, prioritizations, inheritance filters, and functional profiling and analysis, we have created a unique methodology for downstream variant filtering that empowers clinicians and researchers to interpret more effectively the relevance of genomic alterations within a rare genetic disease. 相似文献

10.

MAFsnp: A Multi-Sample Accurate and Flexible SNP Caller Using Next-Generation Sequencing Data

Jiyuan Hu Tengfei Li Zidi Xiu Hong Zhang 《PloS one》2015,10(8)

Most existing statistical methods developed for calling single nucleotide polymorphisms (SNPs) using next-generation sequencing (NGS) data are based on Bayesian frameworks, and there does not exist any SNP caller that produces p-values for calling SNPs in a frequentist framework. To fill in this gap, we develop a new method MAFsnp, a Multiple-sample based Accurate and Flexible algorithm for calling SNPs with NGS data. MAFsnp is based on an estimated likelihood ratio test (eLRT) statistic. In practical situation, the involved parameter is very close to the boundary of the parametric space, so the standard large sample property is not suitable to evaluate the finite-sample distribution of the eLRT statistic. Observing that the distribution of the test statistic is a mixture of zero and a continuous part, we propose to model the test statistic with a novel two-parameter mixture distribution. Once the parameters in the mixture distribution are estimated, p-values can be easily calculated for detecting SNPs, and the multiple-testing corrected p-values can be used to control false discovery rate (FDR) at any pre-specified level. With simulated data, MAFsnp is shown to have much better control of FDR than the existing SNP callers. Through the application to two real datasets, MAFsnp is also shown to outperform the existing SNP callers in terms of calling accuracy. An R package “MAFsnp” implementing the new SNP caller is freely available at http://homepage.fudan.edu.cn/zhangh/softwares/. 相似文献

11.

Computel: Computation of Mean Telomere Length from Whole-Genome Next-Generation Sequencing Data

Lilit Nersisyan Arsen Arakelyan 《PloS one》2015,10(4)

Telomeres are the ends of eukaryotic chromosomes, consisting of consecutive short repeats that protect chromosome ends from degradation. Telomeres shorten with each cell division, leading to replicative cell senescence. Deregulation of telomere length homeostasis is associated with the development of various age-related diseases and cancers. A number of experimental techniques exist for telomere length measurement; however, until recently, the absence of tools for extracting telomere lengths from high-throughput sequencing data has significantly obscured the association of telomere length with molecular processes in normal and diseased conditions. We have developed Computel, a program in R for computing mean telomere length from whole-genome next-generation sequencing data. Computel is open source, and is freely available at https://github.com/lilit-nersisyan/computel. It utilizes a short-read alignment-based approach and integrates various popular tools for sequencing data analysis. We validated it with synthetic and experimental data, and compared its performance with the previously available software. The results have shown that Computel outperforms existing software in accuracy, independence of results from sequencing conditions, stability against inherent sequencing errors, and better ability to distinguish pure telomeric sequences from interstitial telomeric repeats. By providing a highly reliable methodology for determining telomere lengths from whole-genome sequencing data, Computel should help to elucidate the role of telomeres in cellular health and disease. 相似文献

12.

Next-Generation Sequencing Analysis of MiRNA Expression in Control and FSHD Myogenesis

Veronica Colangelo Stéphanie Fran?ois Giulia Soldà Raffaella Picco Francesca Roma Enrico Ginelli Raffaella Meneveri 《PloS one》2014,9(10)

Emerging evidence has demonstrated that miRNA sequences can regulate skeletal myogenesis by controlling the process of myoblast proliferation and differentiation. However, at present a deep analysis of miRNA expression in control and FSHD myoblasts during differentiation has not yet been derived. To close this gap, we used a next-generation sequencing (NGS) approach applied to in vitro myogenesis. Furthermore, to minimize sample genetic heterogeneity and muscle-type specific patterns of gene expression, miRNA profiling from NGS data was filtered with FC≥4 (log₂FC≥2) and p-value<0.05, and its validation was derived by qRT-PCR on myoblasts from seven muscle districts. In particular, control myogenesis showed the modulation of 38 miRNAs, the majority of which (34 out 38) were up-regulated, including myomiRs (miR-1, -133a, -133b and -206). Approximately one third of the modulated miRNAs were not previously reported to be involved in muscle differentiation, and interestingly some of these (i.e. miR-874, -1290, -95 and -146a) were previously shown to regulate cell proliferation and differentiation. FSHD myogenesis evidenced a reduced number of modulated miRNAs than healthy muscle cells. The two processes shared nine miRNAs, including myomiRs, although with FC values lower in FSHD than in control cells. In addition, FSHD cells showed the modulation of six miRNAs (miR-1268, -1268b, -1908, 4258, -4508- and -4516) not evidenced in control cells and that therefore could be considered FSHD-specific, likewise three novel miRNAs that seem to be specifically expressed in FSHD myotubes. These data further clarify the impact of miRNA regulation during control myogenesis and strongly suggest that a complex dysregulation of miRNA expression characterizes FSHD, impairing two important features of myogenesis: cell cycle and muscle development. The derived miRNA profiling could represent a novel molecular signature for FSHD that includes diagnostic biomarkers and possibly therapeutic targets. 相似文献

13.

DDBJ Read Annotation Pipeline: A Cloud Computing-Based Pipeline for High-Throughput Analysis of Next-Generation Sequencing Data

Hideki Nagasaki Takako Mochizuki Yuichi Kodama Satoshi Saruhashi Shota Morizaki Hideaki Sugawara Hajime Ohyanagi Nori Kurata Kousaku Okubo Toshihisa Takagi Eli Kaminuma Yasukazu Nakamura 《DNA research》2013,20(4):383-390

High-performance next-generation sequencing (NGS) technologies are advancing genomics and molecular biological research. However, the immense amount of sequence data requires computational skills and suitable hardware resources that are a challenge to molecular biologists. The DNA Data Bank of Japan (DDBJ) of the National Institute of Genetics (NIG) has initiated a cloud computing-based analytical pipeline, the DDBJ Read Annotation Pipeline (DDBJ Pipeline), for a high-throughput annotation of NGS reads. The DDBJ Pipeline offers a user-friendly graphical web interface and processes massive NGS datasets using decentralized processing by NIG supercomputers currently free of charge. The proposed pipeline consists of two analysis components: basic analysis for reference genome mapping and de novo assembly and subsequent high-level analysis of structural and functional annotations. Users may smoothly switch between the two components in the pipeline, facilitating web-based operations on a supercomputer for high-throughput data analysis. Moreover, public NGS reads of the DDBJ Sequence Read Archive located on the same supercomputer can be imported into the pipeline through the input of only an accession number. This proposed pipeline will facilitate research by utilizing unified analytical workflows applied to the NGS data. The DDBJ Pipeline is accessible at http://p.ddbj.nig.ac.jp/. 相似文献

14.

Stepwise Threshold Clustering: A New Method for Genotyping MHC Loci Using Next-Generation Sequencing Technology

William E. Stutz Daniel I. Bolnick 《PloS one》2014,9(7)

Genes of the vertebrate major histocompatibility complex (MHC) are of great interest to biologists because of their important role in immunity and disease, and their extremely high levels of genetic diversity. Next generation sequencing (NGS) technologies are quickly becoming the method of choice for high-throughput genotyping of multi-locus templates like MHC in non-model organisms.Previous approaches to genotyping MHC genes using NGS technologies suffer from two problems:1) a “gray zone” where low frequency alleles and high frequency artifacts can be difficult to disentangle and 2) a similar sequence problem, where very similar alleles can be difficult to distinguish as two distinct alleles. Here were present a new method for genotyping MHC loci – Stepwise Threshold Clustering (STC) – that addresses these problems by taking full advantage of the increase in sequence data provided by NGS technologies. Unlike previous approaches for genotyping MHC with NGS data that attempt to classify individual sequences as alleles or artifacts, STC uses a quasi-Dirichlet clustering algorithm to cluster similar sequences at increasing levels of sequence similarity. By applying frequency and similarity based criteria to clusters rather than individual sequences, STC is able to successfully identify clusters of sequences that correspond to individual or similar alleles present in the genomes of individual samples. Furthermore, STC does not require duplicate runs of all samples, increasing the number of samples that can be genotyped in a given project. We show how the STC method works using a single sample library. We then apply STC to 295 threespine stickleback (Gasterosteus aculeatus) samples from four populations and show that neighboring populations differ significantly in MHC allele pools. We show that STC is a reliable, accurate, efficient, and flexible method for genotyping MHC that will be of use to biologists interested in a variety of downstream applications. 相似文献

15.

A Scalable and Accurate Targeted Gene Assembly Tool (SAT-Assembler) for Next-Generation Sequencing Data

Yuan Zhang Yanni Sun James R. Cole 《PLoS computational biology》2014,10(8)

Gene assembly, which recovers gene segments from short reads, is an important step in functional analysis of next-generation sequencing data. Lacking quality reference genomes, de novo assembly is commonly used for RNA-Seq data of non-model organisms and metagenomic data. However, heterogeneous sequence coverage caused by heterogeneous expression or species abundance, similarity between isoforms or homologous genes, and large data size all pose challenges to de novo assembly. As a result, existing assembly tools tend to output fragmented contigs or chimeric contigs, or have high memory footprint. In this work, we introduce a targeted gene assembly program SAT-Assembler, which aims to recover gene families of particular interest to biologists. It addresses the above challenges by conducting family-specific homology search, homology-guided overlap graph construction, and careful graph traversal. It can be applied to both RNA-Seq and metagenomic data. Our experimental results on an Arabidopsis RNA-Seq data set and two metagenomic data sets show that SAT-Assembler has smaller memory usage, comparable or better gene coverage, and lower chimera rate for assembling a set of genes from one or multiple pathways compared with other assembly tools. Moreover, the family-specific design and rapid homology search allow SAT-Assembler to be naturally compatible with parallel computing platforms. The source code of SAT-Assembler is available at https://sourceforge.net/projects/sat-assembler/. The data sets and experimental settings can be found in supplementary material. 相似文献

16.

VarWalker: Personalized Mutation Network Analysis of Putative Cancer Genes from Next-Generation Sequencing Data

Peilin Jia Zhongming Zhao 《PLoS computational biology》2014,10(2)

A major challenge in interpreting the large volume of mutation data identified by next-generation sequencing (NGS) is to distinguish driver mutations from neutral passenger mutations to facilitate the identification of targetable genes and new drugs. Current approaches are primarily based on mutation frequencies of single-genes, which lack the power to detect infrequently mutated driver genes and ignore functional interconnection and regulation among cancer genes. We propose a novel mutation network method, VarWalker, to prioritize driver genes in large scale cancer mutation data. VarWalker fits generalized additive models for each sample based on sample-specific mutation profiles and builds on the joint frequency of both mutation genes and their close interactors. These interactors are selected and optimized using the Random Walk with Restart algorithm in a protein-protein interaction network. We applied the method in >300 tumor genomes in two large-scale NGS benchmark datasets: 183 lung adenocarcinoma samples and 121 melanoma samples. In each cancer, we derived a consensus mutation subnetwork containing significantly enriched consensus cancer genes and cancer-related functional pathways. These cancer-specific mutation networks were then validated using independent datasets for each cancer. Importantly, VarWalker prioritizes well-known, infrequently mutated genes, which are shown to interact with highly recurrently mutated genes yet have been ignored by conventional single-gene-based approaches. Utilizing VarWalker, we demonstrated that network-assisted approaches can be effectively adapted to facilitate the detection of cancer driver genes in NGS data. 相似文献

17.

Correction: PCR-Free Enrichment of Mitochondrial DNA from Human Blood and Cell Lines for High Quality Next-Generation DNA Sequencing

Meetha P. Gould Colleen M. Bosworth Sarah McMahon Sneha Grandhi Brian T. Grimberg Thomas LaFramboise 《PloS one》2016,11(5)

相似文献

18.

A Microfluidic DNA Library Preparation Platform for Next-Generation Sequencing

Hanyoup Kim Mais J. Jebrail Anupama Sinha Zachary W. Bent Owen D. Solberg Kelly P. Williams Stanley A. Langevin Ronald F. Renzi James L. Van De Vreugde Robert J. Meagher Joseph S. Schoeniger Todd W. Lane Steven S. Branda Michael S. Bartsch Kamlesh D. Patel 《PloS one》2013,8(7)

Next-generation sequencing (NGS) is emerging as a powerful tool for elucidating genetic information for a wide range of applications. Unfortunately, the surging popularity of NGS has not yet been accompanied by an improvement in automated techniques for preparing formatted sequencing libraries. To address this challenge, we have developed a prototype microfluidic system for preparing sequencer-ready DNA libraries for analysis by Illumina sequencing. Our system combines droplet-based digital microfluidic (DMF) sample handling with peripheral modules to create a fully-integrated, sample-in library-out platform. In this report, we use our automated system to prepare NGS libraries from samples of human and bacterial genomic DNA. E. coli libraries prepared on-device from 5 ng of total DNA yielded excellent sequence coverage over the entire bacterial genome, with >99% alignment to the reference genome, even genome coverage, and good quality scores. Furthermore, we produced a de novo assembly on a previously unsequenced multi-drug resistant Klebsiella pneumoniae strain BAA-2146 (KpnNDM). The new method described here is fast, robust, scalable, and automated. Our device for library preparation will assist in the integration of NGS technology into a wide variety of laboratories, including small research laboratories and clinical laboratories. 相似文献

19.

PCR-Free Enrichment of Mitochondrial DNA from Human Blood and Cell Lines for High Quality Next-Generation DNA Sequencing

Meetha P. Gould Colleen M. Bosworth Sarah McMahon Sneha Grandhi Brian T. Grimerg Thomas LaFramboise 《PloS one》2015,10(10)

Recent advances in sequencing technology allow for accurate detection of mitochondrial sequence variants, even those in low abundance at heteroplasmic sites. Considerable sequencing cost savings can be achieved by enriching samples for mitochondrial (relative to nuclear) DNA. Reduction in nuclear DNA (nDNA) content can also help to avoid false positive variants resulting from nuclear mitochondrial sequences (numts). We isolate intact mitochondrial organelles from both human cell lines and blood components using two separate methods: a magnetic bead binding protocol and differential centrifugation. DNA is extracted and further enriched for mitochondrial DNA (mtDNA) by an enzyme digest. Only 1 ng of the purified DNA is necessary for library preparation and next generation sequence (NGS) analysis. Enrichment methods are assessed and compared using mtDNA (versus nDNA) content as a metric, measured by using real-time quantitative PCR and NGS read analysis. Among the various strategies examined, the optimal is differential centrifugation isolation followed by exonuclease digest. This strategy yields >35% mtDNA reads in blood and cell lines, which corresponds to hundreds-fold enrichment over baseline. The strategy also avoids false variant calls that, as we show, can be induced by the long-range PCR approaches that are the current standard in enrichment procedures. This optimization procedure allows mtDNA enrichment for efficient and accurate massively parallel sequencing, enabling NGS from samples with small amounts of starting material. This will decrease costs by increasing the number of samples that may be multiplexed, ultimately facilitating efforts to better understand mitochondria-related diseases. 相似文献

20.

Correction: Targeted Next-Generation Sequencing for Clinical Diagnosis of 561 Mendelian Diseases

Yanqiu Liu Xiaoming Wei Xiangdong Kong Xueqin Guo Yan Sun Jianfen Man Lique Du Hui Zhu Zelan Qu Ping Tian Bing Mao Yun Yang 《PloS one》2015,10(9)

相似文献