
1.

Coverage Bias and Sensitivity of Variant Calling for Four Whole-genome Sequencing Technologies

Nora Rieber Marc Zapatka Bärbel Lasitschka David Jones Paul Northcott Barbara Hutter Natalie Jäger Marcel Kool Michael Taylor Peter Lichter Stefan Pfister Stephan Wolf Benedikt Brors Roland Eils 《PloS one》2013,8(6)

The emergence of high-throughput, next-generation sequencing technologies has dramatically altered the way we assess genomes in population genetics and in cancer genomics. Currently, there are four commonly used whole-genome sequencing platforms on the market: Illumina’s HiSeq2000, Life Technologies’ SOLiD 4 and its completely redesigned 5500xl SOLiD, and Complete Genomics’ technology. A number of earlier studies have compared a subset of those sequencing platforms or compared those platforms with Sanger sequencing, which is prohibitively expensive for whole-genome studies. Here we present a detailed comparison of the performance of all currently available whole-genome sequencing platforms, especially regarding their ability to call SNVs and to evenly cover the genome and specific genomic regions. Unlike earlier studies, we base our comparison on four different samples, allowing us to assess the between-sample variation of the platforms. We find a pronounced GC bias in GC-rich regions for Life Technologies’ platforms, with Complete Genomics performing best here, while we see the least bias in GC-poor regions for HiSeq2000 and 5500xl. HiSeq2000 gives the most uniform coverage and displays the least sample-to-sample variation. In contrast, Complete Genomics exhibits by far the smallest fraction of bases not covered, while the SOLiD platforms reveal remarkable shortcomings, especially in covering CpG islands. When comparing the performance of the four platforms for calling SNPs, HiSeq2000 and Complete Genomics achieve the highest sensitivity, while the SOLiD platforms show the lowest false positive rate. Finally, we find that integrating sequencing data from different platforms offers the potential to combine the strengths of different technologies. In summary, our results detail the strengths and weaknesses of all four whole-genome sequencing platforms. They indicate application areas that call for a specific sequencing platform and rule out others.
This helps to identify the proper sequencing platform for whole-genome studies with different application scopes.
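The GC-bias comparison in this abstract relates sequencing depth to local GC content. The Python sketch below shows one common way to summarize that relationship. It assumes the genome has already been split into fixed-size windows with a known mean depth per window. The function names and the ten-bin scheme are illustrative choices, not the authors' pipeline.

```python
import math
from collections import defaultdict


def gc_fraction(window: str) -> float:
    """Fraction of G/C among unambiguous bases; NaN if the window is all N."""
    window = window.upper()
    acgt = sum(window.count(base) for base in "ACGT")
    if acgt == 0:
        return math.nan
    return (window.count("G") + window.count("C")) / acgt


def gc_bias_profile(windows, depths, n_bins=10):
    """Map each GC bin index to (mean depth in bin) / (mean depth overall).

    Values near 1.0 suggest unbiased coverage for that GC range; values well
    below 1.0 flag under-covered GC-rich or GC-poor windows.
    """
    if len(windows) != len(depths):
        raise ValueError("windows and depths must have the same length")
    pairs = [(gc_fraction(w), d) for w, d in zip(windows, depths)]
    pairs = [(gc, d) for gc, d in pairs if not math.isnan(gc)]  # drop all-N windows
    if not pairs:
        return {}
    overall = sum(d for _, d in pairs) / len(pairs)
    if overall == 0:
        return {}
    sums, counts = defaultdict(float), defaultdict(int)
    for gc, depth in pairs:
        b = min(int(gc * n_bins), n_bins - 1)  # GC = 1.0 falls into the top bin
        sums[b] += depth
        counts[b] += 1
    return {b: (sums[b] / counts[b]) / overall for b in sorted(sums)}


def uncovered_fraction(per_base_depths):
    """Fraction of positions with zero read depth."""
    if not per_base_depths:
        return 0.0
    return sum(1 for d in per_base_depths if d == 0) / len(per_base_depths)
```

For example, `gc_bias_profile(["GGCC", "ATAT", "GCAT"], [10, 30, 20])` returns `{0: 1.5, 5: 1.0, 9: 0.5}`. Here the GC-poor window is over-covered and the GC-rich window is under-covered relative to the mean.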

2.

Application of Genotyping-by-Sequencing on Semiconductor Sequencing Platforms: A Comparison of Genetic and Reference-Based Marker Ordering in Barley

Martin Mascher Shuangye Wu Paul St. Amand Nils Stein Jesse Poland 《PloS one》2013,8(10)

The rapid development of next-generation sequencing platforms has enabled the use of sequencing for routine genotyping across a range of genetics studies and breeding applications. Genotyping-by-sequencing (GBS), a low-cost, reduced-representation sequencing method, is becoming a common approach for whole-genome marker profiling in many species. With sequencing technologies developing quickly, adapting current GBS methodologies to new platforms will let future studies leverage these advances. To test new semiconductor sequencing platforms for GBS, we genotyped a barley recombinant inbred line (RIL) population. Based on a previous GBS approach, we designed barcode and adapter sets for the Ion Torrent platforms. Four sets of 24-plex libraries, comprising 94 RILs and the two parents, were constructed and sequenced on two Ion platforms. In parallel, a 96-plex library of the same RILs was sequenced on the Illumina HiSeq 2000. We applied two different computational pipelines to analyze the sequencing data: the reference-independent TASSEL pipeline and a reference-based pipeline using SAMtools. Sequence contigs positioned on the integrated physical and genetic map were used for read mapping and variant calling. We found high agreement in genotype calls between the different platforms and high concordance between genetic and reference-based marker order. There was, however, a paucity of SNPs jointly discovered by the different pipelines, indicating a strong effect of alignment and filtering parameters on SNP discovery. We show the utility of the current barley genome assembly as a framework for developing very low-cost genetic maps, facilitating high-resolution genetic mapping and negating the need to develop de novo genetic maps for future studies in barley.
Through demonstration of GBS on semiconductor sequencing platforms, we conclude that the GBS approach is amenable to a range of platforms and can easily be modified as new sequencing technologies, analysis tools and genomic resources develop.
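Multiplexed GBS libraries like these carry inline barcodes, so the first analysis step is assigning reads back to samples. The Python sketch below shows that step under two assumptions: each read begins with a sample barcode, and the barcode is immediately followed by a restriction-site remnant. The barcodes and the default remnant `TGCAG` are hypothetical placeholders, not the adapter sets designed in this study.

```python
def demultiplex(reads, barcodes, remnant="TGCAG"):
    """Assign reads to samples by an inline barcode plus cut-site remnant.

    barcodes: dict of sample name -> barcode sequence (hypothetical values).
    remnant: sequence assumed to follow the barcode (placeholder default).
    Returns a dict of sample -> reads with the barcode trimmed off (the
    remnant is kept); unmatched reads are collected under the key None.
    """
    # Check longer barcodes first, a defensive choice when barcode lengths vary.
    ordered = sorted(barcodes.items(), key=lambda item: len(item[1]), reverse=True)
    assigned = {sample: [] for sample in barcodes}
    assigned[None] = []
    for read in reads:
        for sample, barcode in ordered:
            if read.startswith(barcode + remnant):
                assigned[sample].append(read[len(barcode):])
                break
        else:
            # No barcode + remnant prefix matched this read.
            assigned[None].append(read)
    return assigned
```

Real pipelines usually also tolerate barcode mismatches and trim adapters at the 3′ end. This sketch only does exact prefix matching.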

3.

Performance comparison of exome DNA sequencing technologies

Clark MJ Chen R Lam HY Karczewski KJ Chen R Euskirchen G Butte AJ Snyder M 《Nature biotechnology》2011,29(10):908-914

Whole exome sequencing by high-throughput sequencing of target-enriched genomic DNA (exome-seq) has become common in basic and translational research as a means of interrogating the interpretable part of the human genome at relatively low cost. We present a comparison of three major commercial exome sequencing platforms from Agilent, Illumina and Nimblegen applied to the same human blood sample. Our results suggest that the Nimblegen platform, which is the only one to use high-density overlapping baits, covers fewer genomic regions than the other platforms but requires the least amount of sequencing to sensitively detect small variants. Agilent and Illumina are able to detect a greater total number of variants with additional sequencing. Illumina captures untranslated regions, which are not targeted by the Nimblegen and Agilent platforms. We also compare exome sequencing and whole genome sequencing (WGS) of the same sample, demonstrating that exome sequencing can detect additional small variants missed by WGS.

4.

High-Throughput, Amplicon-Based Sequencing of the CREBBP Gene as a Tool to Develop a Universal Platform-Independent Assay

Marc W. Fuellgrabe Dietrich Herrmann Henrik Knecht Sven Kuenzel Michael Kneba Christiane Pott Monika Brüggemann 《PloS one》2015,10(6)

High-throughput sequencing technologies are widely used to analyse genomic variants or rare mutational events in many fields of genomic research. New or adapted platforms and technologies are developing rapidly, enabling amplicon-based analysis of single target genes or even whole-genome sequencing within a short period of time. Each sequencing platform is characterized by well-defined types of errors, resulting from different steps in the sequencing workflow. Here we describe a universal method to prepare amplicon libraries that can be used for sequencing on different high-throughput sequencing platforms. We have sequenced distinct exons of the CREB binding protein (CREBBP) gene and analysed the output resulting from three major deep-sequencing platforms. Platform-specific errors were adjusted according to the results of sequence analysis from the remaining platforms. Additionally, bioinformatic methods are described to determine platform-dependent errors. In summary, we present a platform-independent, cost-efficient and time-saving method that can be used as an alternative to commercially available sample-preparation kits.

5.

Comparison of Sequencing Platforms for Single Nucleotide Variant Calls in a Human Sample

Aakrosh Ratan Webb Miller Joseph Guillory Jeremy Stinson Somasekar Seshagiri Stephan C. Schuster 《PloS one》2013,8(2)

Next-generation sequencing platforms coupled with advanced bioinformatic tools enable re-sequencing of the human genome at high speed and with large cost savings. We compare sequencing platforms from Roche/454 (GS FLX), Illumina/HiSeq (HiSeq 2000), and Life Technologies/SOLiD (SOLiD 3 ECC) for their ability to identify single nucleotide substitutions in whole genome sequences from the same human sample. We report on significant GC-related bias observed in the data sequenced on Illumina and SOLiD platforms. The differences in the variant calls were investigated with regard to coverage and sequencing error. Some of the variants called by only one or two of the platforms were experimentally tested using mass spectrometry, a method that is independent of DNA sequencing. We establish several causes, specific to each platform, for why variants remained unreported. We report the indels called using the three sequencing technologies, and from the obtained results we conclude that sequencing human genomes with more than a single platform and multiple libraries is beneficial when a high level of accuracy is required.

6.

Application of High-Throughput Sequencing Technology to Bacterial Drug Resistance

Chen Huijuan (陈慧娟) Liu Qiqi (刘琪琦) 《Chinese Journal of Biochemistry and Molecular Biology》2022,38(7):865-874

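Bacterial drug resistance has become one of the major threats to global public health. Rapidly and accurately determining the characteristics, mechanisms, and transmission patterns of bacterial resistance is important both for treating disease and for controlling the spread of resistant bacteria. High-throughput sequencing can examine the status of many gene sequences in parallel and has been widely applied to the detection of bacterial resistance. Its current applications in this field fall mainly into three categories: whole-genome sequencing, targeted region sequencing, and metagenomic sequencing. The platforms used are mainly second-generation platforms such as Illumina, Ion Torrent and BGI, and third-generation platforms such as Pacific Biosciences and Oxford Nanopore. The accuracy of predicting resistance phenotypes from resistance genes depends largely on mature, specialized resistance gene databases. The establishment and refinement of general-purpose, specialized, and hidden Markov model-based resistance gene databases has laid a solid foundation for applying high-throughput sequencing to bacterial resistance. This review briefly introduces high-throughput sequencing technologies, data analysis methods, and the corresponding sequencing platforms as applied to bacterial drug resistance, and also describes the current state of bacterial resistance databases.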

7.

The Accuracy, Feasibility and Challenges of Sequencing Short Tandem Repeats Using Next-Generation Sequencing Platforms

Monika Zavodna Andrew Bagshaw Rudiger Brauning Neil J. Gemmell 《PloS one》2014,9(12)

To date we have little knowledge of how accurate next-generation sequencing (NGS) technologies are in sequencing repetitive sequences beyond known limitations to accurately sequence homopolymers. Only a handful of previous reports have evaluated the potential of NGS for sequencing short tandem repeats (microsatellites), and no empirical study has compared and evaluated the performance of more than one NGS platform with the same dataset. Here we examined yeast microsatellite variants from both long-read (454-sequencing) and short-read (Illumina) NGS platforms and compared these to data derived through Sanger sequencing. In addition, we investigated any locus-specific biases and differences that might have resulted from variability in microsatellite repeat number, repeat motif or type of mutation. Out of 112 insertion/deletion variants identified among 45 microsatellite amplicons in our study, we found 87.5% agreement between the 454-platform and Sanger sequencing in frequency of variant detection after Benjamini-Hochberg correction for multiple tests. For a subset of 21 microsatellite amplicons derived from Illumina sequencing, the results of the short-read platform were highly consistent with those of the other two platforms, with 100% agreement with 454-sequencing and 93.6% agreement with the Sanger method after Benjamini-Hochberg correction. We found that the microsatellite attributes (copy number, repeat motif and type of mutation) did not have a significant effect on differences seen between the sequencing platforms. We show that both long-read and short-read NGS platforms can be used to sequence short tandem repeats accurately, which makes it feasible to consider the use of these platforms in high-throughput genotyping. It appears that the major requirement for achieving both high accuracy and rare variant detection in microsatellite genotyping is sufficient read depth coverage.
This might be a challenge because each platform generates a consistent pattern of non-uniform sequence coverage, which, as our study suggests, may affect some types of tandem repeats more than others.
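The agreement figures above are reported after Benjamini-Hochberg correction. For reference, here is a minimal Python sketch of the standard Benjamini-Hochberg step-up procedure. The abstract does not say which per-variant test produced the p-values, so the inputs are simply assumed to be independent p-values.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans, True where the null hypothesis is rejected
    while controlling the false discovery rate at level alpha."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected
```

Because the rule is step-up, a p-value can be rejected even when it misses its own threshold, as long as a larger p-value further down the ranking passes. For `[0.03, 0.02, 0.5]`, both 0.02 and 0.03 are rejected at `alpha=0.05`.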

8.

A platform-independent method for detecting errors in metagenomic sequencing data: DRISEE

Keegan KP Trimble WL Wilkening J Wilke A Harrison T D'Souza M Meyer F 《PLoS computational biology》2012,8(6):e1002541

We provide a novel method, DRISEE (duplicate read inferred sequencing error estimation), to assess sequencing quality (alternatively referred to as "noise" or "error") within and/or between sequencing samples. DRISEE provides positional error estimates that can be used to inform read trimming within a sample. It also provides global (whole sample) error estimates that can be used to identify samples with high or varying levels of sequencing error that may confound downstream analyses, particularly in the case of studies that utilize data from multiple sequencing samples. For shotgun metagenomic data, we believe that DRISEE provides estimates of sequencing error that are more accurate and less constrained by technical limitations than existing methods that rely on reference genomes or the use of scores (e.g. Phred). Here, DRISEE is applied to (non-amplicon) data sets from both the 454 and Illumina platforms. The DRISEE error estimate is obtained by analyzing sets of artifactual duplicate reads (ADRs), a known by-product of both sequencing platforms. We present DRISEE as an open-source, platform-independent method to assess sequencing error in shotgun metagenomic data, and utilize it to discover previously uncharacterized error in de novo sequence data from the 454 and Illumina sequencing platforms.
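DRISEE rests on the idea that artifactual duplicate reads should be identical, so any differences among them measure sequencing error. The Python sketch below illustrates that idea under simplifying assumptions:

- Duplicates are approximated as reads sharing an identical prefix.
- The `prefix_len` and `min_bin` defaults are illustrative choices.
- Only substitutions against a per-position majority consensus are counted; indels are ignored.

The function name `duplicate_read_error` is hypothetical, and this is not the published DRISEE implementation.

```python
from collections import Counter, defaultdict


def duplicate_read_error(reads, prefix_len=50, min_bin=20):
    """Simplified duplicate-read error estimate (substitutions only).

    Groups reads by identical prefix, builds a majority consensus for each
    position after the prefix, and counts mismatches against it.
    Returns (per_position_error, overall_error); positions are 0-based
    offsets from the read start.
    """
    bins = defaultdict(list)
    for read in reads:
        if len(read) > prefix_len:
            bins[read[:prefix_len]].append(read)

    mismatches, totals = Counter(), Counter()
    for members in bins.values():
        if len(members) < min_bin:  # too few duplicates for a stable consensus
            continue
        max_len = max(len(r) for r in members)
        for pos in range(prefix_len, max_len):
            column = [r[pos] for r in members if len(r) > pos]
            consensus, _ = Counter(column).most_common(1)[0]
            totals[pos] += len(column)
            mismatches[pos] += sum(1 for base in column if base != consensus)

    per_position = {pos: mismatches[pos] / totals[pos] for pos in sorted(totals)}
    n_bases = sum(totals.values())
    overall = sum(mismatches.values()) / n_bases if n_bases else 0.0
    return per_position, overall
```

The per-position output corresponds to the positional estimates the abstract mentions for guiding trimming. The single overall number corresponds to the whole-sample estimate.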

9.

Ultra-high-throughput microbial community analysis on the Illumina HiSeq and MiSeq platforms

J Gregory Caporaso Christian L Lauber William A Walters Donna Berg-Lyons James Huntley Noah Fierer Sarah M Owens Jason Betley Louise Fraser Markus Bauer Niall Gormley Jack A Gilbert Geoff Smith Rob Knight 《The ISME journal》2012,6(8):1621-1624

DNA sequencing continues to decrease in cost with the Illumina HiSeq2000 generating up to 600 Gb of paired-end 100 base reads in a ten-day run. Here we present a protocol for community amplicon sequencing on the HiSeq2000 and MiSeq Illumina platforms, and apply that protocol to sequence 24 microbial communities from host-associated and free-living environments. A critical question as more sequencing platforms become available is whether biological conclusions derived on one platform are consistent with what would be derived on a different platform. We show that the protocol developed for these instruments successfully recaptures known biological results, and additionally that biological conclusions are consistent across sequencing platforms (the HiSeq2000 versus the MiSeq) and across the sequenced regions of amplicons.

10.

The effects of read length, quality and quantity on microsatellite discovery and primer development: from Illumina to PacBio

Na Wei Jordan B. Bemmels Christopher W. Dick 《Molecular ecology resources》2014,14(5):953-965

The advent of next-generation sequencing (NGS) technologies has transformed the way microsatellites are isolated for ecological and evolutionary investigations. Recent attempts to employ NGS for microsatellite discovery have used the 454, Illumina, and Ion Torrent platforms, but other methods including single-molecule real-time DNA sequencing (Pacific Biosciences or PacBio) remain viable alternatives. We outline a workflow from sequence quality control to microsatellite marker validation in three plant species using PacBio circular consensus sequencing (CCS). We then evaluate the performance of PacBio CCS in comparison with other NGS platforms for microsatellite isolation, through simulations that focus on variations in read length, read quantity and sequencing error rate. Although quality control of CCS reads reduced microsatellite yield by around 50%, hundreds of microsatellite loci that are expected to have improved conversion efficiency to functional markers were retrieved for each species. The simulations quantitatively validate the advantages of long reads and emphasize the detrimental effects of sequencing errors on NGS-enabled microsatellite development. In view of the continuing improvement in read length on NGS platforms, sequence quality and the corresponding strategies of quality control will become the primary factors to consider for effective microsatellite isolation. Among current options, PacBio CCS may be optimal for rapid, small-scale microsatellite development due to its flexibility in scaling sequencing effort, while platforms such as Illumina MiSeq will provide cost-efficient solutions for multispecies microsatellite projects.
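Microsatellite discovery from reads starts with scanning each sequence for tandem repeats. The Python sketch below does this for perfect repeats using a regular-expression backreference. The motif-length range (2 to 6 bp) and the five-repeat minimum are assumed thresholds, not those used by the authors. Dedicated repeat finders also handle imperfect and compound repeats, which this sketch does not.

```python
import re


def find_microsatellites(seq, min_unit=2, max_unit=6, min_repeats=5):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    The lazy quantifier tries the shortest motif first at each position,
    so 'ACACAC...' is reported as 'AC' rather than 'ACAC'.
    """
    seq = seq.upper()
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        if len(set(motif)) == 1:
            # e.g. 'AA' repeated: a homopolymer, not a microsatellite here
            continue
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits
```

For the sequence `"GG" + "AC" * 6 + "T" + "ATG" * 5 + "C"`, this reports `[(2, "AC", 6), (15, "ATG", 5)]`. A candidate locus would then still need flanking sequence long enough for primer design.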

11.

Revising a Personal Genome by Comparing and Combining Data from Two Different Sequencing Platforms

Deokhoon Kim Woo-Yeon Kim Sun-Young Lee Sung-Yeoun Lee Hongseok Yun Soo-Yong Shin Jungyoun Lee Yoojin Hong Youngmi Won Seong-Jin Kim Yong Seok Lee Sung-Min Ahn 《PloS one》2013,8(4)

For the robust practice of genomic medicine, sequencing results must be compatible, regardless of the sequencing technologies and algorithms used. Presently, genome sequencing is still an imprecise science and is complicated by differences in the chemistry, coverage, alignment, and variant-calling algorithms. We identified ∼3.33 million single nucleotide variants (SNVs) and ∼3.62 million SNVs in the SJK genome using SOLiD and Illumina data, respectively. Approximately 3 million SNVs were concordant between the two platforms while 68,532 SNVs were discordant; 219,616 SNVs were SOLiD-specific and 516,080 SNVs were Illumina-specific (i.e., platform-specific). Concordant, discordant, and platform-specific SNVs were further analyzed and characterized. Overall, a large portion of heterozygous SNVs that were discordant with genotyping calls of single nucleotide polymorphism chips were highly confident. Approximately 70% of the platform-specific SNVs were located in regions containing repetitive sequences. Such platform-specificity may arise from differences between platforms, with regard to read length (36 bp and 72 bp vs. 50 bp), insert size (∼100–300 bp vs. ∼1–2 kb), sequencing chemistry (sequencing-by-synthesis using single nucleotides vs. ligation-based sequencing using oligomers), and sequencing quality. When data from the two platforms were merged for variant calling, the proportion of callable regions of the reference genome increased to 99.66%, which was 1.43% higher than the average callability of the two platforms, representing ∼40 million bases. In this study, we compared the differences in sequencing results between two sequencing platforms. Approximately 90% of the SNVs were concordant between the two platforms, yet ∼10% of the SNVs were either discordant or platform-specific, indicating that each platform had its own strengths and weaknesses. 
When data from the two platforms were merged, both the overall callability of the reference genome and the overall accuracy of the SNVs improved, demonstrating the likelihood that a re-sequenced genome can be revised using complementary data.
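The four-way split of SNVs in this abstract (concordant, discordant, and specific to each platform) can be expressed as set operations on call sets keyed by position. The Python sketch below assumes that each platform's calls have already been reduced to a dict from (chromosome, position) to an unphased genotype string. The input format and function names are hypothetical.

```python
def normalize_genotype(genotype: str) -> str:
    """Order alleles so that 'G/A' and 'A/G' compare equal."""
    return "/".join(sorted(genotype.upper().split("/")))


def classify_snvs(calls_a, calls_b):
    """Split SNV calls from two platforms into four classes.

    calls_a, calls_b: dicts mapping (chrom, pos) -> genotype like 'A/G'.
    A site called by both platforms is concordant if the genotypes match
    and discordant otherwise; the rest are platform-specific.
    """
    a = {site: normalize_genotype(gt) for site, gt in calls_a.items()}
    b = {site: normalize_genotype(gt) for site, gt in calls_b.items()}
    shared = a.keys() & b.keys()
    concordant = {site for site in shared if a[site] == b[site]}
    return {
        "concordant": concordant,
        "discordant": shared - concordant,
        "only_a": a.keys() - b.keys(),
        "only_b": b.keys() - a.keys(),
    }
```

The reported counts are consistent with this split. About 3.33 million SOLiD calls, minus 68,532 discordant and 219,616 SOLiD-specific SNVs, leaves roughly 3.04 million concordant sites. About 3.62 million Illumina calls, minus 68,532 and 516,080, also leaves roughly 3.04 million. Both match the "approximately 3 million" concordant SNVs stated.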

12.

Direct Chloroplast Sequencing: Comparison of Sequencing Platforms and Analysis Tools for Whole Chloroplast Barcoding

Marta Brozynska Agnelo Furtado Robert James Henry 《PloS one》2014,9(10)

Direct sequencing of total plant DNA using next generation sequencing technologies generates a whole chloroplast genome sequence that has the potential to provide a barcode for use in plant and food identification. Advances in DNA sequencing platforms may make this an attractive approach for routine plant identification. The HiSeq (Illumina) and Ion Torrent (Life Technologies) sequencing platforms were used to sequence total DNA from rice to identify polymorphisms in the whole chloroplast genome sequence of a wild rice plant relative to cultivated rice (cv. Nipponbare). Consensus chloroplast sequences were produced by mapping sequence reads to the reference rice chloroplast genome or by de novo assembly and mapping of the resulting contigs to the reference sequence. A total of 122 polymorphisms (SNPs and indels) between the wild and cultivated rice chloroplasts were predicted by these different sequencing and analysis methods. Of these, a total of 102 polymorphisms including 90 SNPs were predicted by both platforms. Indel calls varied more between sequencing methods, with almost all discrepancies found in homopolymers. The Ion Torrent platform gave no apparent false SNPs but was less reliable for indels. The methods should be suitable for routine barcoding using appropriate combinations of sequencing platform and data analysis.
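The abstract traces almost all indel discrepancies to homopolymers. One simple way to flag such indels is to measure the length of the homopolymer run at the indel position in the reference. The Python sketch below does this using several assumptions for illustration:

- positions are 0-based;
- the base after the reported position is also checked;
- the `min_run=4` threshold is arbitrary.

```python
def homopolymer_run_length(seq: str, pos: int) -> int:
    """Length of the run of identical bases covering 0-based position pos."""
    base = seq[pos]
    start = pos
    while start > 0 and seq[start - 1] == base:
        start -= 1
    end = pos
    while end + 1 < len(seq) and seq[end + 1] == base:
        end += 1
    return end - start + 1


def indel_in_homopolymer(seq: str, pos: int, min_run: int = 4) -> bool:
    """True if the reference base at pos, or the base right after it,
    belongs to a homopolymer of at least min_run bases.

    Checking pos + 1 as well covers insertions reported just before a run.
    """
    positions = [p for p in (pos, pos + 1) if p < len(seq)]
    return any(homopolymer_run_length(seq, p) >= min_run for p in positions)
```

Flagged indels could then be reported separately or given stricter support thresholds, since homopolymer length is a known weak point of several platforms.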

13.

Validation of multiple single nucleotide variation calls by additional exome analysis with a semiconductor sequencer to supplement data of whole-genome sequencing of a human population

Ikuko N Motoike Mitsuyo Matsumoto Inaho Danjoh Fumiki Katsuoka Kaname Kojima Naoki Nariai Yukuto Sato Yumi Yamaguchi-Kabata Shin Ito Hisaaki Kudo Ichiko Nishijima Satoshi Nishikawa Xiaoqing Pan Rumiko Saito Sakae Saito Tomo Saito Matsuyuki Shirota Kaoru Tsuda Junji Yokozawa Kazuhiko Igarashi Naoko Minegishi Osamu Tanabe Nobuo Fuse Masao Nagasaki Kengo Kinoshita Jun Yasuda Masayuki Yamamoto 《BMC genomics》2014,15(1)

Background

Validation of single nucleotide variations in whole-genome sequencing is critical for studying disease-related variations in large populations. A combination of different types of next-generation sequencers for analyzing individual genomes may be an efficient means of validating multiple single nucleotide variations calls simultaneously.

Results

Here, we analyzed 12 independent Japanese genomes using two next-generation sequencing platforms: the Illumina HiSeq 2500 platform for whole-genome sequencing (average depth 32.4×), and the Ion Proton semiconductor sequencer for whole exome sequencing (average depth 109×). Single nucleotide polymorphism (SNP) calls based on the Illumina Human Omni 2.5-8 SNP chip data were used as the reference. We compared the variant calls for the 12 samples, and found that the concordance between the two next-generation sequencing platforms varied between 83% and 97%.

Conclusions

Our results show the versatility and usefulness of the combination of exome sequencing with whole-genome sequencing in studies of human population genetics and demonstrate that combining data from multiple sequencing platforms is an efficient approach to validate and supplement SNP calls.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-673) contains supplementary material, which is available to authorized users.

14.

Performance Comparison of Digital microRNA Profiling Technologies Applied on Human Breast Cancer Cell Lines

Erik Knutsen Tonje Fiskaa Anita Ursvik Tor Erik Jørgensen Maria Perander Eiliv Lund Ole Morten Seternes Steinar D. Johansen Morten Andreassen 《PloS one》2013,8(10)

MicroRNA profiling represents an important first step in deducing individual RNA-based regulatory function in a cell, tissue, or at a specific developmental stage. Currently there are several different platforms to choose from in order to make the initial miRNA profiles. In this study we investigate recently developed digital microRNA high-throughput technologies. Four different platforms were compared: next generation SOLiD ligation sequencing and Illumina HiSeq sequencing, hybridization-based NanoString nCounter, and miRCURY locked nucleic acid RT-qPCR. For all four technologies, full microRNA profiles were generated from human cell lines that represent noninvasive and invasive tumorigenic breast cancer. This study reports the correlation between platforms, as well as a more extensive analysis of the accuracy and sensitivity of data generated when using different platforms, and important considerations when verifying results by the use of additional technologies. We found all the platforms to be highly capable for microRNA analysis. Furthermore, the two NGS platforms and RT-qPCR all have equally high sensitivity, and the fold change accuracy is independent of individual miRNA concentration for NGS and RT-qPCR. Based on these findings we propose new guidelines and considerations when performing microRNA profiling.

15.

Next generation sequencing based approaches to epigenomics

Hirst M Marra MA 《Briefings in functional genomics》2010,9(5-6):455-465

Next generation sequencing has brought epigenomic studies to the forefront of current research. The power of massively parallel sequencing coupled to innovative molecular and computational techniques has allowed researchers to profile the epigenome at resolutions that were unimaginable only a few years ago. With early proof of concept studies published, the field is now moving into the next phase where the importance of method standardization and rigorous quality control are becoming paramount. In this review we will describe methodologies that have been developed to profile the epigenome using next generation sequencing platforms. We will discuss these in terms of library preparation, sequence platforms and analysis techniques.

16.

Deep-sequencing protocols influence the results obtained in small-RNA sequencing

Toedling J Servant N Ciaudo C Farinelli L Voinnet O Heard E Barillot E 《PloS one》2012,7(2):e32724

Second-generation sequencing is a powerful method for identifying and quantifying small-RNA components of cells. However, little attention has been paid to the effects of the choice of sequencing platform and library preparation protocol on the results obtained. We present a thorough comparison of small-RNA sequencing libraries generated from the same embryonic stem cell lines, using different protocols and different sequencing platforms representing the three major second-generation sequencing technologies. We have analysed and compared the expression of microRNAs, as well as populations of small RNAs derived from repetitive elements. Although libraries correlate well between sequencing platforms, we found qualitative and quantitative variations in the results depending on the protocol used. Thus, when comparing libraries from different biological samples, it is strongly recommended to use the same sequencing platform and protocol in order to ensure the biological relevance of the comparisons.

17.

Generations of sequencing technologies

Pettersson E Lundeberg J Ahmadian A 《Genomics》2009,93(2):105-111

Advancements in the field of DNA sequencing are changing the scientific horizon and promising an era of personalized medicine for improved human health. Although platforms are improving at the rate of Moore's Law, thereby reducing sequencing costs by a factor of two or three each year, we find ourselves at a point in history where individual genomes are starting to appear but the cost is still too high for routine sequencing of whole genomes. These needs will be met by miniaturized and parallelized platforms that consume less sample and template, thereby increasing speed and reducing costs. Current massively parallel, state-of-the-art systems provide significantly improved throughput over Sanger systems, and future single-molecule approaches will continue the exponential improvements in the field.

18.

Read length versus Depth of Coverage for Viral Quasispecies Reconstruction

Osvaldo Zagordi Martin Däumer Christian Beisel Niko Beerenwinkel 《PloS one》2012,7(10)

Recent advances in sequencing technology have opened up unprecedented opportunities in many application areas. Virus samples can now be sequenced efficiently with very deep coverage to infer the genetic diversity of the underlying virus populations. Several sequencing platforms with different underlying technologies and performance characteristics are available for viral diversity studies. Here, we investigate how the differences between two common platforms provided by 454/Roche and Illumina affect viral diversity estimation and the reconstruction of viral haplotypes. Using a mixture of ten HIV clones sequenced with both platforms and additional simulation experiments, we assessed the trade-off between sequencing coverage, read length, and error rate. For fixed costs, short Illumina reads can be generated at higher coverage and allow for detecting variants at lower frequencies. They can also be sufficient to assess the diversity of the sample if sequences are dissimilar enough, but, in general, assembly of full-length haplotypes is feasible only with the longer 454/Roche reads. The quantitative comparison highlights the advantages and disadvantages of both platforms and provides guidance for the design of viral diversity studies.
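The coverage side of this trade-off can be illustrated with a binomial model of how many reads carry a low-frequency variant. The Python sketch below assumes that reads sample haplotypes independently and that there are no sequencing errors. The `min_reads` detection threshold is a hypothetical parameter, not one taken from the study.

```python
from math import comb


def detection_probability(depth: int, freq: float, min_reads: int) -> float:
    """P(at least min_reads of depth reads carry a variant at frequency freq).

    Simple binomial model: reads sample haplotypes independently and
    sequencing errors are ignored.
    """
    p_below = sum(
        comb(depth, k) * freq**k * (1 - freq) ** (depth - k)
        for k in range(min_reads)
    )
    return 1.0 - p_below
```

Take a variant at 1% frequency and a hypothetical requirement of five supporting reads. The probability is below 1% at 100× depth but above 99% at 2,000×. This mirrors the abstract's point that cheaper, higher-coverage short reads reach lower variant frequencies. The model says nothing about full-length haplotype assembly, where read length matters.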

19.

Performance comparison of whole-genome sequencing platforms

Lam HY Clark MJ Chen R Chen R Natsoulis G O'Huallachain M Dewey FE Habegger L Ashley EA Gerstein MB Butte AJ Ji HP Snyder M 《Nature biotechnology》2012,30(1):78-82

Whole-genome sequencing is becoming commonplace, but the accuracy and completeness of variant calling by the most widely used platforms from Illumina and Complete Genomics have not been reported. Here we sequenced the genome of an individual with both technologies to a high average coverage of ~76×, and compared their performance with respect to sequence coverage and calling of single-nucleotide variants (SNVs), insertions and deletions (indels). Although 88.1% of the ~3.7 million unique SNVs were concordant between platforms, there were tens of thousands of platform-specific calls located in genes and other genomic regions. In contrast, 26.5% of indels were concordant between platforms. Target enrichment validated 92.7% of the concordant SNVs, whereas validation by genotyping array revealed a sensitivity of 99.3%. The validation experiments also suggested that >60% of the platform-specific variants were indeed present in the genome. Our results have important implications for understanding the accuracy and completeness of the genome sequencing platforms.

20.

Evaluation of next generation sequencing platforms for population targeted sequencing studies

Olivier Harismendy Pauline C Ng Robert L Strausberg Xiaoyun Wang Timothy B Stockwell Karen Y Beeson Nicholas J Schork Sarah S Murray Eric J Topol Samuel Levy Kelly A Frazer 《Genome biology》2009,10(3):R32-13

Background

Next generation sequencing (NGS) platforms are currently being utilized for targeted sequencing of candidate genes or genomic intervals to perform sequence-based association studies. To evaluate these platforms for this application, we analyzed human sequence generated by the Roche 454, Illumina GA, and the ABI SOLiD technologies for the same 260 kb in four individuals.

Results

Local sequence characteristics contribute to systematic variability in sequence coverage (>100-fold difference in per-base coverage), resulting in patterns for each NGS technology that are highly correlated between samples. A comparison of the base calls to 88 kb of overlapping ABI 3730xL Sanger sequence generated for the same samples showed that the NGS platforms all have high sensitivity, identifying >95% of variant sites. At high coverage depth, base-calling errors are systematic, resulting from local sequence contexts; as the coverage is lowered, additional 'random sampling' errors in base calling occur.

Conclusions

Our study provides important insights into systematic biases and data variability that need to be considered when utilizing NGS platforms for population targeted sequencing studies.