1.

Efficient and accurate whole genome assembly and methylome profiling of E. coli

Jason G Powers Victor J Weigman Jenny Shu John M Pufky Donald Cox Patrick Hurban 《BMC genomics》2013,14(1)

Background

With the price of next-generation sequencing steadily decreasing, bacterial genome assembly is now accessible to a wide range of researchers. It is therefore necessary to understand the best methods for generating a genome assembly, specifically, which combination of sequencing and bioinformatics strategies results in the most accurate assemblies. Here, we sequence three E. coli strains on the Illumina MiSeq, Life Technologies Ion Torrent PGM, and Pacific Biosciences RS. We then perform genome assemblies on all three datasets alone or in combination to determine the best methods for the assembly of bacterial genomes.

Results

Three E. coli strains – BL21(DE3), Bal225, and DH5α – were sequenced to a depth of 100× on the MiSeq and Ion Torrent machines and to at least 125× on the PacBio RS. Four assembly methods were examined and compared. The previously published BL21(DE3) genome [GenBank:AM946981.2] allowed us to evaluate the accuracy of each of the BL21(DE3) assemblies. BL21(DE3) PacBio-only assemblies resulted in a 90% reduction in contigs versus short-read-only assemblies, while N50 numbers increased by over 7-fold. Strikingly, the number of SNPs in PacBio-only assemblies was less than half that seen with short-read assemblies (~20 SNPs vs. ~50 SNPs), and indels also saw dramatic reductions (~2 indels >5 bp in PacBio-only assemblies vs. ~12 for short-read-only assemblies). Assemblies that used a mixture of PacBio and short-read data generally fell in between these two extremes. Use of PacBio sequencing reads also allowed us to call covalent base modifications for the three strains. Each of the strains used here had a known covalent base modification genotype, which was confirmed by PacBio sequencing.
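The N50 statistic quoted above can be illustrated with a minimal Python sketch. It uses the common definition (the contig length at which the longest contigs together cover at least half of the assembly); the function name and the contig lengths are hypothetical and are not data from the study.

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    if not contig_lengths:
        raise ValueError("need at least one contig")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths in bp (illustration only, not study data).
fragmented = [400_000, 300_000, 200_000, 100_000, 50_000, 50_000]
contiguous = [1_000_000, 100_000]

print(n50(fragmented))  # 300000
print(n50(contiguous))  # 1000000
```

Both toy assemblies have the same total length, so the difference in N50 reflects contiguity alone: fewer, longer contigs give a higher value.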

Conclusion

Using data generated solely from the Pacific Biosciences RS, we were able to generate the most complete and accurate de novo assemblies of E. coli strains. We found that the addition of other sequencing technology data offered no improvements over use of PacBio data alone. In addition, the sequencing data from the PacBio RS allowed for sensitive and specific calling of covalent base modifications.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-14-675) contains supplementary material, which is available to authorized users.

2.

Analysis of the genetic diversity of influenza A viruses using next-generation DNA sequencing

Silvie Van den Hoecke Judith Verhelst Marnik Vuylsteke Xavier Saelens 《BMC genomics》2015,16(1)

Background

Influenza viruses exist as a large group of closely related viral genomes, also called quasispecies. The composition of this influenza viral quasispecies can be determined by an accurate and sensitive sequencing technique and data analysis pipeline. We compared the suitability of two benchtop next-generation sequencers for whole genome influenza A quasispecies analysis: the Illumina MiSeq sequencing-by-synthesis and the Ion Torrent PGM semiconductor sequencing technique.

Results

We first compared the accuracy and sensitivity of both sequencers using plasmid DNA and different ratios of wild type and mutant plasmid. Illumina MiSeq sequencing reads were one and a half times more accurate than those of the Ion Torrent PGM. The majority of sequencing errors were substitutions on the Illumina MiSeq and insertions and deletions, mostly in homopolymer regions, on the Ion Torrent PGM. To evaluate the suitability of the two techniques for determining the genome diversity of influenza A virus, we generated plasmid-derived PR8 virus and grew this virus in vitro. We also optimized an RT-PCR protocol to obtain uniform coverage of all eight genomic RNA segments. The sequencing reads obtained with both sequencers could successfully be assembled de novo into the segmented influenza virus genome. After mapping of the reads to the reference genome, we found that the detection limit for reliable recognition of variants in the viral genome required a frequency of 0.5% or higher. This threshold exceeds the background error rate resulting from the RT-PCR reaction and the sequencing method. Most of the variants in the PR8 virus genome were present in hemagglutinin, and these mutations were detected by both sequencers.
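The 0.5% detection limit described above amounts to a frequency filter on per-site allele counts. The following is a rough sketch of that idea only, not the authors' pipeline; the function name, the input layout, and the read counts are assumptions made for illustration.

```python
def call_variants(allele_counts, min_frequency=0.005):
    """Keep variants whose observed frequency meets the threshold.

    allele_counts maps (position, alt_base) -> (alt_reads, total_reads).
    This layout is a hypothetical one chosen for the example.
    """
    calls = {}
    for key, (alt_reads, depth) in allele_counts.items():
        if depth == 0:
            continue  # no coverage, so no frequency can be estimated
        frequency = alt_reads / depth
        if frequency >= min_frequency:
            calls[key] = frequency
    return calls


# Hypothetical counts: 0.6% passes the 0.5% threshold, 0.2% does not.
counts = {
    (100, "A"): (60, 10_000),
    (250, "G"): (20, 10_000),
    (300, "T"): (0, 0),
}
calls = call_variants(counts)
print(calls)
```

Variants below the threshold are treated as indistinguishable from the RT-PCR and sequencing background error, which is why they are dropped rather than reported with low confidence.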

Conclusions

Our approach underlines the power and limitations of two commonly used next-generation sequencers for the analysis of influenza virus gene diversity. We conclude that the Illumina MiSeq platform is better suited for detecting variant sequences whereas the Ion Torrent PGM platform has a shorter turnaround time. The data analysis pipeline that we propose here will also help to standardize variant calling in small RNA genomes based on next-generation sequencing data.

3.

Transcriptome sequencing and annotation of the polychaete Hermodice carunculata (Annelida,Amphinomidae)

Shaadi Mehr Aida Verdes Rob DeSalle John Sparks Vincent Pieribone David F Gruber 《BMC genomics》2015,16(1)

4.

Non-referenced genome assembly from epigenomic short-read data

Antony Kaspi Mark Ziemann Samuel T Keating Ishant Khurana Timothy Connor Briana Spolding Adrian Cooper Ross Lazarus Ken Walder Paul Zimmet Assam El-Osta 《Epigenetics》2014,9(10):1329-1338

Current computational methods used to analyze changes in DNA methylation and chromatin modification rely on sequenced genomes. Here we describe a pipeline for the detection of these changes from short-read sequence data that does not require a reference genome. Open source software packages were used for sequence assembly, alignment, and measurement of differential enrichment. The method was evaluated by comparing results with reference-based results showing a strong correlation between chromatin modification and gene expression. We then used our de novo sequence assembly to build the DNA methylation profile for the non-referenced Psammomys obesus genome. The pipeline described uses open source software for fast annotation and visualization of unreferenced genomic regions from short-read data.

5.

A draft genome of field pennycress (Thlaspi arvense) provides tools for the domestication of a new winter biofuel crop

Kevin M. Dorn Johnathon D. Fankhauser Donald L. Wyse M. David Marks 《DNA research》2015,22(2):121-131

Field pennycress (Thlaspi arvense L.) is being domesticated as a new winter cover crop and biofuel species for the Midwestern United States that can be double-cropped between corn and soybeans. A genome sequence will enable the use of new technologies to make improvements in pennycress. To generate a draft genome, a hybrid sequencing approach was used to generate 47 Gb of DNA sequencing reads from both the Illumina and PacBio platforms. These reads were used to assemble 6,768 genomic scaffolds. The draft genome was annotated using the MAKER pipeline, which identified 27,390 predicted protein-coding genes, with almost all of these predicted peptides having significant sequence similarity to Arabidopsis proteins. A comprehensive analysis of pennycress gene homologues involved in glucosinolate biosynthesis, metabolism, and transport pathways revealed high sequence conservation compared with other Brassicaceae species, and helps validate the assembly of the pennycress gene space in this draft genome. Additional comparative genomic analyses indicate that the knowledge gained from years of basic Brassicaceae research will serve as a powerful tool for identifying gene targets whose manipulation can be predicted to result in improvements for pennycress.

6.

De Novo sequencing and transcriptome analysis for Tetramorium bicarinatum: a comprehensive venom gland transcriptome analysis from an ant species

Wafa Bouzid Marion Verdenaud Christophe Klopp Frédéric Ducancel Céline Noirot Angélique Vétillard 《BMC genomics》2014,15(1)

7.

The oak gene expression atlas: insights into Fagaceae genome evolution and the discovery of genes regulated during bud dormancy release

Isabelle Lesur Grégoire Le Provost Pascal Bento Corinne Da Silva Jean-Charles Leplé Florent Murat Saneyoshi Ueno Jérôme Bartholomé Céline Lalanne François Ehrenmann Céline Noirot Christian Burban Valérie Léger Joelle Amselem Caroline Belser Hadi Quesneville Michael Stierschneider Silvia Fluch Lasse Feldhahn Mika Tarkka Sylvie Herrmann François Buscot Christophe Klopp Antoine Kremer Jérôme Salse Jean-Marc Aury Christophe Plomion 《BMC genomics》2015,16(1)

8.

PLEK: a tool for predicting long non-coding RNAs and messenger RNAs based on an improved k-mer scheme

Aimin Li Junying Zhang Zhongyin Zhou 《BMC bioinformatics》2014,15(1)

9.

Comprehensive transcriptome analysis of Crocus sativus for discovery and expression of genes involved in apocarotenoid biosynthesis

Shoib Ahmad Baba Tabasum Mohiuddin Swaraj Basu Mohit Kumar Swarnkar Aubid Hussain Malik Zahoor Ahmed Wani Nazia Abbas Anil Kumar Singh Nasheeman Ashraf 《BMC genomics》2015,16(1)

10.

Genome assembly using Nanopore-guided long and error-free DNA reads

Mohammed-Amin Madoui Stefan Engelen Corinne Cruaud Caroline Belser Laurie Bertrand Adriana Alberti Arnaud Lemainque Patrick Wincker Jean-Marc Aury 《BMC genomics》2015,16(1)

Background

Long-read sequencing technologies were launched a few years ago and, in contrast with short-read sequencing technologies, they offered a promise of solving assembly problems for large and complex genomes. Moreover, by providing long-range information, they could also solve haplotype phasing. However, existing long-read technologies still have several limitations that complicate their use for most research laboratories, as well as in large and/or complex genome projects. In 2014, Oxford Nanopore released the MinION® device, a small and low-cost single-molecule nanopore sequencer, which offers the possibility of sequencing long DNA fragments.

Results

The assembly of long reads generated using the Oxford Nanopore MinION® instrument is challenging, as existing assemblers were not designed to deal with long reads exhibiting error rates close to 30%. Here, we present a hybrid approach developed to take advantage of data generated using the MinION® device. We sequenced a well-known bacterium, Acinetobacter baylyi ADP1, and applied our method to obtain a highly contiguous (one single contig) and accurate genome assembly even in repetitive regions, in contrast to an Illumina-only assembly. Our hybrid strategy was able to generate NaS (Nanopore Synthetic-long) reads up to 60 kb that aligned entirely and with no error to the reference genome and that spanned highly conserved repetitive regions. The average accuracy of NaS reads reached 99.99% without losing the initial size of the input MinION® reads.

Conclusions

We described the NaS tool, a hybrid approach allowing the sequencing of microbial genomes using the MinION® device. Our method, based ideally on 20x of NaS reads and 50x of Illumina reads, provides an efficient and cost-effective way of sequencing microbial or small eukaryotic genomes in a very short time, even in small facilities. Moreover, we demonstrated that although the Oxford Nanopore technology is a relatively new sequencing technology, currently with a high error rate, it is already useful in the generation of high-quality genome assemblies.
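The 20x and 50x coverage targets can be translated into approximate read counts with the standard relation coverage = (number of reads × mean read length) / genome size. The sketch below assumes a genome of 3.6 Mb and mean read lengths of 250 bp and 10 kb; these figures are illustrative assumptions, not values reported in the abstract.

```python
import math


def reads_needed(genome_size_bp, coverage, mean_read_length_bp):
    """Number of reads required to reach a target mean coverage,
    using coverage = reads * mean_read_length / genome_size."""
    return math.ceil(genome_size_bp * coverage / mean_read_length_bp)


# Assumed values for illustration only (not taken from the abstract).
GENOME_SIZE_BP = 3_600_000

short_reads = reads_needed(GENOME_SIZE_BP, coverage=50, mean_read_length_bp=250)
long_reads = reads_needed(GENOME_SIZE_BP, coverage=20, mean_read_length_bp=10_000)

print(short_reads)  # 720000
print(long_reads)   # 7200
```

The relation gives mean coverage only; real runs need a margin for uneven coverage and reads discarded during filtering.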

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1519-z) contains supplementary material, which is available to authorized users.

11.

A pipeline for the de novo assembly of the Themira biloba (Sepsidae: Diptera) transcriptome using a multiple k-mer length approach

Dacotah Melicher Alex S Torson Ian Dworkin Julia H Bowsher 《BMC genomics》2014,15(1)

12.

Transcriptome analysis of northern elephant seal (Mirounga angustirostris) muscle tissue provides a novel molecular resource and physiological insights

Jane I Khudyakov Likit Preeyanon Cory D Champagne Rudy M Ortiz Daniel E Crocker 《BMC genomics》2015,16(1)

13.

Tissue-specific transcriptome assemblies of the marine medaka Oryzias melastigma and comparative analysis with the freshwater medaka Oryzias latipes

Keng Po Lai Jing-Woei Li Simon Yuan Wang Jill Man-Ying Chiu Anna Tse Karen Lau Si Lok Doris Wai-Ting Au William Ka-Fai Tse Chris Kong-Chu Wong Ting-Fung Chan Richard Yuen-Chong Kong Rudolf Shiu-Sun Wu 《BMC genomics》2015,16(1)

14.

Comparative genome analysis of Pediococcus damnosus LMG 28219, a strain well-adapted to the beer environment

Isabel Snauwaert Pieter Stragier Luc De Vuyst Peter Vandamme 《BMC genomics》2015,16(1)

15.

Rapid hybrid de novo assembly of a microbial genome using only short reads: Corynebacterium pseudotuberculosis I19 as a case study

Cerdeira LT Carneiro AR Ramos RT de Almeida SS D'Afonseca V Schneider MP Baumbach J Tauch A McCulloch JA Azevedo VA Silva A 《Journal of microbiological methods》2011,86(2):218-223

Due to the advent of the so-called Next-Generation Sequencing (NGS) technologies, the amount of monetary and temporal resources required for whole-genome sequencing has been reduced by several orders of magnitude. Sequence reads can be assembled either by anchoring them directly onto an available reference genome (classical reference assembly), or can be concatenated by overlap (de novo assembly). The latter strategy is preferable because it tends to maintain the architecture of the genome sequence; however, depending on the NGS platform used, the shortness of read lengths causes tremendous problems in the subsequent genome assembly phase, impeding closure of the entire genome sequence. To address the problem, we developed a multi-pronged hybrid de novo strategy combining De Bruijn graph and Overlap-Layout-Consensus methods, which was used to assemble from short reads the entire genome of Corynebacterium pseudotuberculosis strain I19, a bacterium with immense importance in veterinary medicine that causes Caseous Lymphadenitis in ruminants, principally ovines and caprines. Briefly, contigs were assembled de novo from the short reads and were only oriented using a reference genome by anchoring. Remaining gaps were closed using iterative anchoring of short reads by craning to gap flanks. Finally, we compare the genome sequence assembled using our hybrid strategy to a classical reference assembly using the same data as input and show that, with the availability of a reference genome, it pays off to use the hybrid de novo strategy rather than a classical reference assembly, because more genome sequences are preserved using the former.
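As a rough illustration of the De Bruijn graph idea mentioned above, the Python sketch below builds a graph from k-mers and follows unambiguous edges to recover a contig. It covers only that one component, on toy error-free reads; it is not the authors' hybrid pipeline, and the function names and reads are hypothetical.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Nodes are (k-1)-mers; each k-mer in a read adds an edge
    from its prefix (k-1)-mer to its suffix (k-1)-mer."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous(graph, start):
    """Extend a contig from start while the current node has exactly
    one successor, stopping at branches, dead ends, or cycles."""
    path = start
    node = start
    seen = {start}
    while len(graph.get(node, ())) == 1:
        (next_node,) = graph[node]
        if next_node in seen:
            break  # avoid looping forever on a cycle
        path += next_node[-1]
        seen.add(next_node)
        node = next_node
    return path


# Hypothetical overlapping reads (toy data, error-free).
reads = ["ATGGCG", "GGCGTA", "CGTACC"]
graph = de_bruijn_graph(reads, k=4)
contig = walk_unambiguous(graph, "ATG")
print(contig)  # ATGGCGTACC
```

Real assemblers additionally handle sequencing errors, repeats that create branches, and reverse complements, none of which this sketch attempts.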

16.

De novo assembly and characterisation of the field pea transcriptome using RNA-Seq

Shimna Sudheesh Timothy I. Sawbridge Noel OI Cogan Peter Kennedy John W. Forster Sukhjiwan Kaur 《BMC genomics》2015,16(1)

17.

De Novo Assembly of Chickpea Transcriptome Using Short Reads for Gene Discovery and Marker Identification (total citations: 5; self-citations: 0; citations by others: 5)

Rohini Garg Ravi K. Patel Akhilesh K. Tyagi Mukesh Jain 《DNA research》2011,18(1):53-63

18.

Identification of putative candidate genes involved in cuticle formation in Prunus avium (sweet cherry) fruit (total citations: 1; self-citations: 0; citations by others: 1)

Alkio M Jonas U Sprink T van Nocker S Knoche M 《Annals of botany》2012,110(1):101-112

19.

Evaluation and optimisation of indel detection workflows for ion torrent sequencing of the BRCA1 and BRCA2 genes

Zhen Xuan Yeo Joshua Chee Leong Wong Steven G Rozen Ann Siew Gek Lee 《BMC genomics》2014,15(1)

Background

The Ion Torrent PGM is a popular benchtop sequencer that shows promise in replacing conventional Sanger sequencing as the gold standard for mutation detection. Despite the PGM’s reported high accuracy in calling single nucleotide variations, it tends to generate many false positive calls in detecting insertions and deletions (indels), which may hinder its utility for clinical genetic testing.

Results

Recently, the proprietary analytical workflow for the Ion Torrent sequencer, Torrent Suite (TS), underwent a series of upgrades. We evaluated three major upgrades of TS by calling indels in the BRCA1 and BRCA2 genes. Our analysis revealed that false negative indels could be generated by TS under both default calling parameters and parameters adjusted for maximum sensitivity. However, indel calling with the same data using the open source variant callers GATK and SAMtools showed that false negatives could be minimised with the use of appropriate bioinformatics analysis. Furthermore, we identified two variant calling measures, Quality-by-Depth (QD) and VARiation of the Width of gaps and inserts (VARW), which substantially reduced false positive indels, including non-homopolymer associated errors, without compromising sensitivity. In our best case scenario, which involved the TMAP aligner and SAMtools, we achieved 100% sensitivity, 99.99% specificity and a 29% False Discovery Rate (FDR) in indel calling from all 23 samples, which is a good performance for mutation screening using PGM.
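The three figures reported above follow from the standard confusion-matrix definitions, and a short sketch shows how very high specificity can coexist with a sizeable FDR when true indels are rare. The counts below are hypothetical and chosen only for illustration; they are not the study's data.

```python
def screening_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, false discovery rate) from
    counts of true/false positives and true/false negatives."""
    sensitivity = tp / (tp + fn)   # fraction of real indels that were called
    specificity = tn / (tn + fp)   # fraction of non-indel sites left uncalled
    fdr = fp / (tp + fp)           # fraction of calls that are wrong
    return sensitivity, specificity, fdr


# Hypothetical counts: few real indels among many reference sites.
sensitivity, specificity, fdr = screening_metrics(tp=5, fp=2, tn=19_993, fn=0)

print(f"sensitivity={sensitivity:.2%}")
print(f"specificity={specificity:.2%}")
print(f"FDR={fdr:.2%}")
```

Because specificity is computed over the very large number of true negative sites, a handful of false positives barely moves it, whereas the FDR is computed over the small set of calls and is therefore much more sensitive to them.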

Conclusions

New versions of TS, BWA and GATK have shown improvements in indel calling sensitivity and specificity over their older counterparts. However, the variant caller of TS exhibits a lower sensitivity than GATK and SAMtools. Our findings demonstrate that although indel calling from PGM sequences may appear to be noisy at first glance, proper computational indel calling analysis is able to maximize both the sensitivity and specificity at the single base level, paving the way for the usage of this technology for future clinical genetic testing.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-516) contains supplementary material, which is available to authorized users.

20.

Differential genome evolution and speciation of Coix lacryma-jobi L. and Coix aquatica Roxb. hybrid guangxi revealed by repetitive sequence analysis and fine karyotyping

Zexi Cai Huijun Liu Qunyan He Mingwei Pu Jian Chen Jinsheng Lai Xuexian Li Weiwei Jin 《BMC genomics》2014,15(1)

Abstract

Background

Coix, Sorghum and Zea are closely related plant genera in the subtribe Maydeae. Coix comprises 9–11 species with different ploidy levels (2n = 10, 20, 30, and 40). The exclusively cultivated C. lacryma-jobi L. (2n = 20) is widely used in East and Southeast Asia for food and medicinal applications. Three fertile cytotypes (2n = 10, 20, and 40) have been reported for C. aquatica Roxb. One sterile cytotype (2n = 30) closely related to C. aquatica has been recently found in Guangxi of China. This putative hybrid has been named C. aquatica HG (Hybrid Guangxi). The genome composition and the evolutionary history of C. lacryma-jobi and C. aquatica HG are largely unclear.

Results

About 76% of the genome of C. lacryma-jobi and 73% of the genome of C. aquatica HG are repetitive DNA sequences, as shown by low coverage genome sequencing followed by similarity-based cluster analysis. In addition, long terminal repeat (LTR) retrotransposable elements are the dominant repetitive sequences in these two genomes, and the proportions of many repetitive sequences in the whole genome varied greatly between the two species, indicating their evolutionary divergence. We also found that a novel 102 bp variant of the centromeric satellite repeat CentX and two other satellites appeared only in C. aquatica HG. The results from FISH analysis with repeat probe cocktails and the data from chromosome pairing at meiotic metaphase showed that C. lacryma-jobi is likely a diploidized paleotetraploid species and C. aquatica HG is possibly a recently formed hybrid. Furthermore, C. lacryma-jobi and C. aquatica HG shared more co-existing repeat families and higher sequence similarity with Sorghum than with Zea.

Conclusions

The composition and abundance of repetitive sequences are divergent between the genomes of C. lacryma-jobi and C. aquatica HG. The results from fine karyotyping analysis and chromosome pairing suggested that C. lacryma-jobi underwent diploidization during evolution and that C. aquatica HG is a recently formed hybrid. The genome-wide comparison of repetitive sequences indicated that the repeats in Coix were more similar to those in Sorghum than to those in Zea, which is consistent with the phylogenetic relationship reported by previous work.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-1025) contains supplementary material, which is available to authorized users.