Found 20 similar articles (search time: 31 ms)
1.
Grace Logan Graham L Freimanis David J King Begoña Valdazo-González Katarzyna Bachanek-Bankowska Nicholas D Sanderson Nick J Knowles Donald P King Eleanor M Cottam 《BMC genomics》2014,15(1)
Background
Next-Generation Sequencing (NGS) is revolutionizing molecular epidemiology by providing new approaches to undertake whole genome sequencing (WGS) in diagnostic settings for a variety of human and veterinary pathogens. Previous sequencing protocols have been subject to biases such as those encountered during PCR amplification and cell culture, or are restricted by the need for large quantities of starting material. We describe here a simple and robust methodology for the generation of whole genome sequences on the Illumina MiSeq. This protocol is specific for foot-and-mouth disease virus (FMDV) or other polyadenylated RNA viruses and circumvents both the use of PCR and the requirement for large amounts of initial template.
Results
The protocol was successfully validated using five FMDV positive clinical samples from the 2001 epidemic in the United Kingdom, as well as a panel of representative viruses from all seven serotypes. In addition, this protocol was successfully used to recover 94% of an FMDV genome that had previously been identified as cell culture negative. Genome sequences from three other non-FMDV polyadenylated RNA viruses (EMCV, ERAV, VESV) were also obtained with minor protocol amendments. We calculated that a minimum coverage depth of 22 reads was required to produce an accurate consensus sequence for FMDV O. This was achieved in 5 FMDV/O/UKG isolates and the type O FMDV from the serotype panel, with the exception of the 5′ genomic termini and the area immediately flanking the poly(C) region.
Conclusions
We have developed a universal WGS method for FMDV and other polyadenylated RNA viruses. This method works successfully from a limited quantity of starting material and eliminates the requirement for genome-specific PCR amplification. This protocol has the potential to generate consensus-level sequences within a routine high-throughput diagnostic environment.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-828) contains supplementary material, which is available to authorized users.
2.
Background
Bacterial DNA contamination in PCR reagents has been a long-standing problem that hampers the adoption of broad-range PCR in clinical and applied microbiology, particularly in the detection of low-abundance bacteria. Although several DNA decontamination protocols have been reported, they all suffer from compromised PCR efficiency or detection limits. To date, no satisfactory solution has been found.
Methodology/Principal Findings
We herein describe a method that solves this long-standing problem by employing a broad-range primer extension-PCR (PE-PCR) strategy that obviates the need for DNA decontamination. In this method, we first devise a fusion probe having a 3′-end complementary to the template bacterial sequence and a 5′-end non-bacterial tag sequence. We then hybridize the probes to template DNA, carry out primer extension and remove the excess probes using an optimized enzyme mix of Klenow DNA polymerase and exonuclease I. This strategy allows the templates to be distinguished from the PCR reagent contaminants and selectively amplified by PCR. To prove the concept, we spiked the PCR reagents with Staphylococcus aureus genomic DNA and applied PE-PCR to amplify template bacterial DNA. The spiking DNA neither interfered with template DNA amplification nor caused false-positive reactions. Broad-range PE-PCR amplification of the 16S rRNA gene was also validated, and minute quantities of template DNA (10–100 fg) were detectable without false positives. When adapted to real-time and high-resolution melting (HRM) analytical platforms, the unique melting profiles of the PE-PCR product can be used as molecular fingerprints to further identify individual bacterial species.
Conclusions/Significance
Broad-range PE-PCR is simple, efficient, and completely obviates the need to decontaminate PCR reagents. When coupled with real-time and HRM analyses, it offers a new avenue for bacterial species identification with a limited source of bacterial DNA, making it suitable for use in clinical and applied microbiology laboratories.
3.
Sébastien Rodrigue Rex R. Malmstrom Aaron M. Berlin Bruce W. Birren Matthew R. Henn Sallie W. Chisholm 《PloS one》2009,4(9)
Background
Single-cell genome sequencing has the potential to allow the in-depth exploration of the vast genetic diversity found in uncultured microbes. We used the marine cyanobacterium Prochlorococcus as a model system for addressing important challenges facing high-throughput whole genome amplification (WGA) and complete genome sequencing of individual cells.
Methodology/Principal Findings
We describe a pipeline that enables single-cell WGA on hundreds of cells at a time while virtually eliminating non-target DNA from the reactions. We further developed a post-amplification normalization procedure that mitigates extreme variations in sequencing coverage associated with multiple displacement amplification (MDA), and demonstrated that the procedure increased sequencing efficiency and facilitated genome assembly. We report genome recovery as high as 99.6% with reference-guided assembly, and 95% with de novo assembly starting from a single cell. We also analyzed the impact of chimera formation during MDA on de novo assembly, and discuss strategies to minimize the presence of incorrectly joined regions in contigs.
Conclusions/Significance
The methods described in this paper will be useful for sequencing genomes of individual cells from a variety of samples.
4.
Eva C Berglund Carl Mårten Lindqvist Shahina Hayat Elin Övernäs Niklas Henriksson Jessica Nordlund Per Wahlberg Erik Forestier Gudmar Lönnerholm Ann-Christine Syvänen 《BMC genomics》2013,14(1)
Background
Target enrichment and resequencing is a widely used approach for identification of cancer genes and genetic variants associated with diseases. Although cost effective compared to whole genome sequencing, analysis of many samples constitutes a significant cost, which could be reduced by pooling samples before capture. Another limitation to the number of cancer samples that can be analyzed is often the amount of available tumor DNA. We evaluated the performance of whole genome amplified DNA and the power to detect subclonal somatic single nucleotide variants in non-indexed pools of cancer samples using the HaloPlex technology for target enrichment and next generation sequencing.
Results
We captured a set of 1528 putative somatic single nucleotide variants and germline SNPs, which were identified by whole genome sequencing, with the HaloPlex technology and sequenced to a depth of 792–1752. We found that the allele fractions of the analyzed variants are well preserved during whole genome amplification and that capture specificity or variant calling is not affected. We detected a large majority of the known single nucleotide variants present uniquely in one sample with allele fractions as low as 0.1 in non-indexed pools of up to ten samples. We also identified and experimentally validated six novel variants in the samples included in the pools.
Conclusion
Our work demonstrates that whole genome amplified DNA can be used for target enrichment equally well as genomic DNA and that accurate variant detection is possible in non-indexed pools of cancer samples. These findings show that analysis of a large number of samples is feasible at low cost, even when only small amounts of DNA are available, which significantly increases the chances of identifying recurrent mutations in cancer samples.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-14-856) contains supplementary material, which is available to authorized users.
5.
Background
Influenza viruses exist as a large group of closely related viral genomes, also called quasispecies. The composition of this influenza viral quasispecies can be determined by an accurate and sensitive sequencing technique and data analysis pipeline. We compared the suitability of two benchtop next-generation sequencers for whole genome influenza A quasispecies analysis: the Illumina MiSeq sequencing-by-synthesis and the Ion Torrent PGM semiconductor sequencing technique.
Results
We first compared the accuracy and sensitivity of both sequencers using plasmid DNA and different ratios of wild type and mutant plasmid. Illumina MiSeq sequencing reads were one and a half times more accurate than those of the Ion Torrent PGM. The majority of sequencing errors were substitutions on the Illumina MiSeq and insertions and deletions, mostly in homopolymer regions, on the Ion Torrent PGM. To evaluate the suitability of the two techniques for determining the genome diversity of influenza A virus, we generated plasmid-derived PR8 virus and grew this virus in vitro. We also optimized an RT-PCR protocol to obtain uniform coverage of all eight genomic RNA segments. The sequencing reads obtained with both sequencers could successfully be assembled de novo into the segmented influenza virus genome. After mapping of the reads to the reference genome, we found that the detection limit for reliable recognition of variants in the viral genome required a frequency of 0.5% or higher. This threshold exceeds the background error rate resulting from the RT-PCR reaction and the sequencing method. Most of the variants in the PR8 virus genome were present in hemagglutinin, and these mutations were detected by both sequencers.
Conclusions
Our approach underlines the power and limitations of two commonly used next-generation sequencers for the analysis of influenza virus gene diversity. We conclude that the Illumina MiSeq platform is better suited for detecting variant sequences whereas the Ion Torrent PGM platform has a shorter turnaround time. The data analysis pipeline that we propose here will also help to standardize variant calling in small RNA genomes based on next-generation sequencing data.
6.
7.
Hui Yang Natalia Volfovsky Alison Rattray Xiongfong Chen Hisashi Tanaka Jeffrey Strathern 《BMC genomics》2014,15(1)
Background
Closely spaced long inverted repeats, also known as DNA palindromes, can undergo intrastrand annealing to form DNA hairpins. The ability to form these hairpins results in genome instability, difficulties in maintaining clones in Escherichia coli and major problems for most DNA sequencing approaches. Because of their role in genomic instability and gene amplification in some human cancers, it is important to develop systematic approaches to detect and characterize DNA palindromes.
Results
We developed a new protocol to identify palindromes that couples the S1 nuclease treated Cot0 DNA (GAPF) with high-throughput sequencing (GAP-Seq). Unlike earlier protocols, it does not involve restriction enzymatic digestion prior to DNA snap-back, thereby preserving longer DNA sequences. It also indicates the location of the novel junction, which can then be recovered. Using the MCF-7 breast cancer cell line as a proof of principle, we identified 35 palindrome candidates and physically characterized the top 5 candidates and their junctions. Because this protocol eliminates many of the false positives that plague earlier techniques, we have improved palindrome identification.
Conclusions
The GAP-Seq approach underscores the importance of developing new tools for identifying and characterizing palindromes, and provides a new strategy to systematically assess palindromes in genomes. It will be useful for studying human cancers and other diseases associated with palindromes.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-394) contains supplementary material, which is available to authorized users.
8.
9.
Kristina Warton Vita Lin Tina Navin Nicola J Armstrong Warren Kaplan Kevin Ying Brian Gloss Helena Mangs Shalima S Nair Neville F Hacker Robert L Sutherland Susan J Clark Goli Samimi 《BMC genomics》2014,15(1)
Background
Free circulating DNA (fcDNA) has many potential clinical applications, due to the non-invasive way in which it is collected. However, because of the low concentration of fcDNA in blood, genome-wide analysis carries many technical challenges that must be overcome before fcDNA studies can reach their full potential. There are currently no definitive standards for fcDNA collection, processing and whole-genome sequencing. We report novel detailed methodology for the capture of high-quality methylated fcDNA, library preparation and downstream genome-wide Next-Generation Sequencing. We also describe the effects of sample storage, processing and scaling on fcDNA recovery and quality.
Results
Use of serum versus plasma, and storage of blood prior to separation resulted in genomic DNA contamination, likely due to leukocyte lysis. Methylated fcDNA fragments were isolated from 5 donors using a methyl-binding protein-based protocol and appear as a discrete band of ~180 bases. This discrete band allows minimal sample loss at the size restriction step in library preparation for Next-Generation Sequencing, allowing for high-quality sequencing from minimal amounts of fcDNA. Following sequencing, we obtained 37×10^6 to 86×10^6 unique mappable reads, representing more than 50% of total mappable reads. The methylation status of 9 genomic regions as determined by DNA capture and sequencing was independently validated by clonal bisulphite sequencing.
Conclusions
Our optimized methods provide high-quality methylated fcDNA suitable for whole-genome sequencing, and allow good library complexity and accurate sequencing, despite using less than half of the recommended minimum input DNA.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-476) contains supplementary material, which is available to authorized users.
10.
A Whole Genome Amplification Method to Generate Long Fragments from Low Quantities of Genomic DNA (total citations: 5; self-citations: 0; citations by others: 5)
Several whole genome amplification strategies have been developed to preamplify the entire genome from minimal amounts of DNA for subsequent molecular genetic analysis. However, none of these techniques has proven to amplify long products from very low (nanogram or picogram) quantities of genomic DNA. Here we report a new whole genome amplification protocol using a degenerate primer (DOP-PCR) that generates products up to about 10 kb in length from less than 1 ng genomic template DNA. This new protocol (LL-DOP-PCR) allows in the subsequent PCR the specific amplification, with high fidelity, of DNA fragments that are more than 1 kb in length. LL-DOP-PCR provides significantly better coverage for microsatellites and unique sequences in comparison to a conventional DOP-PCR method.
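To put the "nanogram or picogram" inputs mentioned in this abstract into perspective, the following minimal sketch estimates how many genome copies a given DNA mass contains. It is an illustration only, not part of the cited protocol: the abstract does not name the species, so the human haploid genome size (about 3.1 Gb) is an assumption, as is the commonly used approximation of 650 g/mol per base pair of double-stranded DNA. The function names are hypothetical.

```python
AVOGADRO = 6.022e23       # molecules per mole
BP_MOLAR_MASS = 650.0     # g/mol per base pair (common approximation for dsDNA)
HUMAN_HAPLOID_BP = 3.1e9  # assumption: species is not stated in the abstract


def genome_mass_pg(genome_size_bp: float) -> float:
    """Approximate mass, in picograms, of one copy of a double-stranded genome."""
    grams = genome_size_bp * BP_MOLAR_MASS / AVOGADRO
    return grams * 1e12  # grams -> picograms


def genome_copies(dna_mass_ng: float, genome_size_bp: float) -> float:
    """Approximate number of genome copies contained in dna_mass_ng of DNA."""
    mass_pg = dna_mass_ng * 1000.0  # nanograms -> picograms
    return mass_pg / genome_mass_pg(genome_size_bp)


if __name__ == "__main__":
    for mass_ng in (1.0, 0.1, 0.01):
        copies = genome_copies(mass_ng, HUMAN_HAPLOID_BP)
        print(f"{mass_ng} ng -> about {copies:.0f} haploid genome copies")
```

Under these assumptions, 1 ng of human DNA corresponds to roughly 300 haploid genome copies, and 10 pg to only about three, so at picogram inputs each locus is represented by a handful of template molecules.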
11.
Habib A Shojaei Saadi Christian Vigneault Mehdi Sargolzaei Dominic Gagné Éric Fournier Béatrice de Montera Jacques Chesnais Patrick Blondin Claude Robert 《BMC genomics》2014,15(1)
Background
Genome-wide profiling of single-nucleotide polymorphisms is receiving increasing attention as a method of pre-implantation genetic diagnosis in humans and of commercial genotyping of pre-transfer embryos in cattle. However, the very small quantity of genomic DNA in biopsy material from early embryos poses daunting technical challenges. A reliable whole-genome amplification (WGA) procedure would greatly facilitate the procedure.
Results
Several PCR-based and non-PCR-based WGA technologies, namely multiple displacement amplification, quasi-random primed library synthesis followed by PCR, ligation-mediated PCR, and single-primer isothermal amplification, were tested in combination with different DNA extraction protocols for various quantities of genomic DNA input. The efficiency of each method was evaluated by comparing the genotypes obtained from 15 cultured cells (representative of an embryonic biopsy) to unamplified reference gDNA. The gDNA input, gDNA extraction method and amplification technology were all found to be critical for successful genome-wide genotyping. The selected WGA platform was then tested on embryo biopsies (n = 226), comparing their results to those of biopsies collected after birth. Although WGA inevitably leads to a random loss of information and to the introduction of erroneous genotypes, following genomic imputation the resulting genetic indices from both sources of DNA were highly correlated (r = 0.99, P<0.001).
Conclusion
It is possible to generate high-quality DNA in sufficient quantities for successful genome-wide genotyping starting from an early embryo biopsy. However, imputation from parental and population genotypes is a requirement for completing and correcting genotypic data. Together, judicious selection of the WGA platform, careful handling of the samples and genomic imputation make it possible to perform extremely reliable genomic evaluations for pre-transfer embryos.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-889) contains supplementary material, which is available to authorized users.
12.
Asan Chunyu Geng Yan Chen Kui Wu Qingle Cai Yu Wang Yongshan Lang Hongzhi Cao Huangming Yang Jian Wang Xiuqing Zhang 《PloS one》2012,7(9)
Background
The relatively short read lengths from next generation sequencing (NGS) technologies still pose a challenge for de novo assembly of complex mammal genomes. One important solution is to use paired-end (PE) sequence information experimentally obtained from long-range DNA fragments (>1 kb). Here, we characterize and extend a long-range PE library construction method based on direct intra-molecule ligation (or molecular linker-free circularization) for NGS.
Results
We found that the method performs stably for PE sequencing of 2- to 5-kb DNA fragments, and can be extended to 10–20 kb (and even, in extreme cases, up to ∼35 kb). We also characterized the impact of low quality input DNA on the method, and developed a whole-genome amplification (WGA) based protocol using limited input DNA (<1 µg). Using this PE dataset, we accurately assembled the YanHuang (YH) genome, the first sequenced Asian genome, into a scaffold N50 size of >2 Mb, which is over 100 times greater than the initial size produced with only small-insert PE reads (17 kb). In addition, we mapped two 7- to 8-kb insertions in the YH genome using the larger insert sizes of the long-range PE data.
Conclusions
In conclusion, we demonstrate here the effectiveness of this long-range PE sequencing method and its use for the de novo assembly of a large, complex genome using NGS short reads.
13.
Giovanna Carpi Katharine S. Walter Stephen J. Bent Anne Gatewood Hoen Maria Diuk-Wasser Adalgisa Caccone 《BMC genomics》2015,16(1)
Background
Rapid and accurate retrieval of whole genome sequences of human pathogens from disease vectors or animal reservoirs will enable fine-resolution studies of pathogen epidemiological and evolutionary dynamics. However, next generation sequencing technologies have not yet been fully harnessed for the study of vector-borne and zoonotic pathogens, due to the difficulty of obtaining high-quality pathogen sequence data directly from field specimens with a high ratio of host to pathogen DNA.
Results
We addressed this challenge by using custom probes for multiplexed hybrid capture to enrich for and sequence 30 Borrelia burgdorferi genomes from field samples of its arthropod vector. Hybrid capture enabled sequencing of nearly the complete genome (~99.5%) of the Borrelia burgdorferi pathogen with 132-fold coverage, and identification of up to 12,291 single nucleotide polymorphisms per genome.
Conclusions
The proposed culture-independent method enables efficient whole genome capture and sequencing of pathogens directly from arthropod vectors, thus making population genomic study of vector-borne and zoonotic infectious diseases economically feasible and scalable. Furthermore, given the similarities of invertebrate field specimens to other mixed DNA templates characterized by a high ratio of host to pathogen DNA, we discuss the potential applicability of hybrid capture for genomic study across diverse study systems.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1634-x) contains supplementary material, which is available to authorized users.
14.
Michael J. Reagin Theresa L. Giesler Alia L. Merla Jeanine M. Resetar-Gerke Kinga M. Kapolka J. Anthony Mamone 《Journal of biomolecular techniques》2003,14(2):143-148
Preparing plasmid templates for DNA sequencing is the most time-consuming step in the sequencing process. Current template preparation methods rely on a labor-intensive, multistep procedure that takes up to 24 h and produces templates of varying quality and quantity. The TempliPhi™ DNA Sequencing Template Amplification Kit eliminates the requirement for extended bacterial growth prior to sequencing and saves laboratory personnel hands-on time by eliminating the centrifugation and transfer steps currently required by older preparatory methods. In addition, costly purification filters and columns are not necessary, as amplified product can be added directly to a sequencing reaction. Starting material can be any circular template from a colony, culture, glycerol stock, or plaque. Based on rolling circle amplification and employing bacteriophage Phi29 DNA polymerase, the method can produce 3–5 μg of template directly from a single bacterial colony in as little as 4 h. Implementation of these procedures in a laboratory or core sequencing facility can decrease the cost of tips, plates, and other plasticware while increasing throughput.
15.
Matthew R. Henn Matthew B. Sullivan Nicole Stange-Thomann Marcia S. Osburne Aaron M. Berlin Libusha Kelly Chandri Yandava Chinnappa Kodira Qiandong Zeng Michael Weiand Todd Sparrow Sakina Saif Georgia Giannoukos Sarah K. Young Chad Nusbaum Bruce W. Birren Sallie W. Chisholm 《PloS one》2010,5(2)
Background
Bacterial viruses (phages) play a critical role in shaping microbial populations as they influence both host mortality and horizontal gene transfer. As such, they have a significant impact on local and global ecosystem function and human health. Despite their importance, little is known about the genomic diversity harbored in phages, as methods to capture complete phage genomes have been hampered by the lack of knowledge about the target genomes, and difficulties in generating sufficient quantities of genomic DNA for sequencing. Of the approximately 550 phage genomes currently available in the public domain, fewer than 5% are marine phage.
Methodology/Principal Findings
To advance the study of phage biology through comparative genomic approaches we used marine cyanophage as a model system. We compared DNA preparation methodologies (DNA extraction directly from either phage lysates or CsCl purified phage particles), and sequencing strategies that utilize either Sanger sequencing of a linker amplification shotgun library (LASL) or of a whole genome shotgun library (WGSL), or 454 pyrosequencing methods. We demonstrate that genomic DNA sample preparation directly from a phage lysate, combined with 454 pyrosequencing, is best suited for phage genome sequencing at scale, as this method is capable of capturing complete continuous genomes with high accuracy. In addition, we describe an automated annotation informatics pipeline that delivers high-quality annotation and yields few false positives and negatives in ORF calling.
Conclusions/Significance
These DNA preparation, sequencing and annotation strategies enable a high-throughput approach to the burgeoning field of phage genomics.
16.
Mohammed-Amin Madoui Stefan Engelen Corinne Cruaud Caroline Belser Laurie Bertrand Adriana Alberti Arnaud Lemainque Patrick Wincker Jean-Marc Aury 《BMC genomics》2015,16(1)
Background
Long-read sequencing technologies were launched a few years ago, and in contrast with short-read sequencing technologies, they offered a promise of solving assembly problems for large and complex genomes. Moreover, by providing long-range information, they could also solve haplotype phasing. However, existing long-read technologies still have several limitations that complicate their use for most research laboratories, as well as in large and/or complex genome projects. In 2014, Oxford Nanopore released the MinION® device, a small and low-cost single-molecule nanopore sequencer, which offers the possibility of sequencing long DNA fragments.
Results
The assembly of long reads generated using the Oxford Nanopore MinION® instrument is challenging, as existing assemblers were not implemented to deal with long reads exhibiting close to 30% errors. Here, we present a hybrid approach developed to take advantage of data generated using the MinION® device. We sequenced a well-known bacterium, Acinetobacter baylyi ADP1, and applied our method to obtain a highly contiguous (one single contig) and accurate genome assembly even in repetitive regions, in contrast to an Illumina-only assembly. Our hybrid strategy was able to generate NaS (Nanopore Synthetic-long) reads up to 60 kb that aligned entirely and with no error to the reference genome and that spanned highly conserved repetitive regions. The average accuracy of NaS reads reached 99.99% without losing the initial size of the input MinION® reads.
Conclusions
We describe the NaS tool, a hybrid approach allowing the sequencing of microbial genomes using the MinION® device. Our method, based ideally on 20x and 50x of NaS and Illumina reads respectively, provides an efficient and cost-effective way of sequencing microbial or small eukaryotic genomes in a very short time even in small facilities. Moreover, we demonstrated that although the Oxford Nanopore technology is a relatively new sequencing technology, currently with a high error rate, it is already useful in the generation of high-quality genome assemblies.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1519-z) contains supplementary material, which is available to authorized users.
17.
Matthew Parker Xiang Chen Armita Bahrami James Dalton Michael Rusch Gang Wu John Easton Nai-Kong Cheung Michael Dyer Elaine R Mardis Richard K Wilson Charles Mullighan Richard Gilbertson Suzanne J Baker Gerard Zambetti David W Ellison James R Downing Jinghui Zhang 《Genome biology》2012,13(12):R113
Background
Telomeres are the protective arrays of tandem TTAGGG sequence and associated proteins at the termini of chromosomes. Telomeres shorten at each cell division due to the end-replication problem and are maintained above a critical threshold in malignant cancer cells to prevent cellular senescence or apoptosis. With the recent advances in massive parallel sequencing, assessing telomere content in the context of other cancer genomic aberrations becomes an attractive possibility. We present the first comprehensive analysis of telomeric DNA content change in tumors using whole-genome sequencing data from 235 pediatric cancers.
Results
To measure telomeric DNA content, we counted telomeric reads containing TTAGGGx4 or CCCTAAx4 and normalized to the average genomic coverage. Changes in telomeric DNA content in tumor genomes were clustered using a Bayesian Information Criterion to determine loss, no change, or gain. Using this approach, we found that the pattern of telomeric DNA alteration varies dramatically across the landscape of pediatric malignancies: telomere gain was found in 32% of solid tumors, 4% of brain tumors and 0% of hematopoietic malignancies. The results were validated by three independent experimental approaches and reveal significant association of telomere gain with the frequency of somatic sequence mutations and structural variations.
Conclusions
Telomere DNA content measurement using whole-genome sequencing data is a reliable approach that can generate useful insights into the landscape of the cancer genome. Measuring the change in telomeric DNA during malignant progression is likely to be a useful metric when considering telomeres in the context of the whole genome.
18.
Romain Blanc-Mathieu Bram Verhelst Evelyne Derelle Stephane Rombauts François-Yves Bouget Isabelle Carré Annie Chateau Adam Eyre-Walker Nigel Grimsley Hervé Moreau Benoit Piégu Eric Rivals Wendy Schackwitz Yves Van de Peer Gwenaël Piganeau 《BMC genomics》2014,15(1)
Background
Cost effective next generation sequencing technologies now enable the production of genomic datasets for many novel planktonic eukaryotes, representing an understudied reservoir of genetic diversity. Ostreococcus tauri is the smallest free-living photosynthetic eukaryote known to date, a coccoid green alga that was first isolated in 1995 in a lagoon by the Mediterranean sea. Its simple features, ease of culture and the sequencing of its 13 Mb haploid nuclear genome have promoted this microalga as a new model organism for cell biology. Here, we investigated the quality of genome assemblies of Illumina GAIIx 75 bp paired-end reads from Ostreococcus tauri, thereby also improving the existing assembly and showing the genome to be stably maintained in culture.
Results
The 3 assemblers used, ABySS, CLCBio and Velvet, produced 95% complete genomes in 1402 to 2080 scaffolds with a very low rate of misassembly. Reciprocally, these assemblies improved the original genome assembly by filling in 930 gaps. Combined with additional analysis of raw reads and PCR sequencing effort, 1194 gaps have been solved in total adding up to 460 kb of sequence. Mapping of RNAseq Illumina data on this updated genome led to a twofold reduction in the proportion of multi-exon protein coding genes, representing 19% of the total 7699 protein coding genes. The comparison of the DNA extracted in 2001 and 2009 revealed the fixation of 8 single nucleotide substitutions and 2 deletions during the approximately 6000 generations in the lab. The deletions either knocked out or truncated two predicted transmembrane proteins, including a glutamate-receptor like gene.
Conclusion
High coverage (>80 fold) paired-end Illumina sequencing enables a high quality 95% complete genome assembly of a compact ~13 Mb haploid eukaryote. This genome sequence has remained stable for 6000 generations of lab culture.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-1103) contains supplementary material, which is available to authorized users.
19.