1.

Identification of Optimum Sequencing Depth Especially for De Novo Genome Assembly of Small Genomes Using Next Generation Sequencing Data

Aarti Desai Veer Singh Marwah Akshay Yadav Vineet Jha Kishor Dhaygude Ujwala Bangar Vivek Kulkarni Abhay Jere 《PloS one》2013,8(4)

Next Generation Sequencing (NGS) is a disruptive technology that has found widespread acceptance in the life sciences research community. The high throughput and low cost of sequencing have encouraged researchers to undertake ambitious genomic projects, especially in de novo genome sequencing. Currently, NGS systems generate sequence data as short reads, and de novo genome assembly using these short reads is computationally very intensive. Owing to the lower cost of sequencing and higher throughput, NGS systems now provide the ability to sequence genomes at high depth. However, no report is currently available that highlights the impact of high sequence depth on genome assembly using real data sets and multiple assembly algorithms. Some recent studies have evaluated the impact of sequence coverage, error rate and average read length on genome assembly using multiple assembly algorithms; however, these evaluations were performed using simulated datasets. One limitation of simulated datasets is that variables such as error rate, read length and coverage, which are known to impact genome assembly, are carefully controlled. Hence, this study was undertaken to identify the minimum depth of sequencing required for de novo assembly of different sized genomes using graph based assembly algorithms and real datasets. Illumina reads for E. coli (4.6 Mb), S. kudriavzevii (11.18 Mb) and C. elegans (100 Mb) were assembled using SOAPdenovo, Velvet, ABySS, Meraculous and IDBA-UD. Our analysis shows that 50X is the optimum read depth for assembling these genomes using all assemblers except Meraculous, which requires 100X read depth. Moreover, our analysis shows that de novo assembly from 50X read data requires only 6–40 GB RAM depending on the genome size and assembly algorithm used. We believe that this information can be extremely valuable for researchers in designing experiments and multiplexing, which will enable optimum utilization of sequencing as well as analysis resources.
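To make the depth figures in this abstract concrete, the following is a minimal sketch of the standard coverage relation, depth = (number of reads × read length) / genome size. The genome sizes come from the abstract. The 100 bp read length and the function names are assumptions chosen for illustration only and are not taken from the paper.

```python
import math


def reads_for_depth(genome_size_bp: int, depth: float, read_length_bp: int) -> int:
    """Smallest number of reads whose total bases reach the target depth."""
    return math.ceil(genome_size_bp * depth / read_length_bp)


def depth_from_reads(n_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Average sequencing depth implied by a given number of reads."""
    return n_reads * read_length_bp / genome_size_bp


# Genome sizes as quoted in the abstract (in base pairs).
GENOMES = {
    "E. coli": 4_600_000,
    "S. kudriavzevii": 11_180_000,
    "C. elegans": 100_000_000,
}

READ_LENGTH_BP = 100  # ASSUMPTION: illustrative read length, not from the paper
TARGET_DEPTH = 50     # the 50X depth reported as optimal for most assemblers

if __name__ == "__main__":
    for name, size in GENOMES.items():
        n = reads_for_depth(size, TARGET_DEPTH, READ_LENGTH_BP)
        print(f"{name}: {n:,} reads of {READ_LENGTH_BP} bp for {TARGET_DEPTH}X")
```

Under the assumed read length, the same two functions can be used to check how many samples could share one sequencing run before the per-sample depth drops below the target.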

2.

Hybrid De Novo Genome Assembly Using MiSeq and SOLiD Short Read Data

Tsutomu Ikegami Toyohiro Inatsugi Isao Kojima Myco Umemura Hiroko Hagiwara Masayuki Machida Kiyoshi Asai 《PloS one》2015,10(4)

A hybrid de novo assembly pipeline was constructed to utilize both MiSeq and SOLiD short read data in combination in the assembly. The short read data were converted to a standard format of the pipeline, and were supplied to the pipeline components such as ABySS and SOAPdenovo. The assembly pipeline proceeded through several stages, and either MiSeq paired-end data, SOLiD mate-paired data, or both could be specified as input data at each stage separately. The pipeline was examined on the filamentous fungus Aspergillus oryzae RIB40, by aligning the assembly results against the reference sequences. Using both the MiSeq and the SOLiD data in the hybrid assembly, the alignment length was improved by a factor of 3 to 8, compared with the assemblies using either one of the data types. The number of reproduced gene cluster regions encoding secondary metabolite biosyntheses (SMB) was also improved by the hybrid assemblies. These results imply that the MiSeq data with long read length are essential to construct accurate nucleotide sequences, while the SOLiD mate-paired reads with long insertion length enhance long-range arrangements of the sequences. The pipeline was also tested on the actinomycete Streptomyces avermitilis MA-4680, whose genome is known to have high GC content. Although the quality of the SOLiD reads was too low to perform any meaningful assemblies by themselves, the alignment length to the reference was improved by a factor of 2, compared with the assembly using only the MiSeq data.

3.

Enhanced De Novo Assembly of High Throughput Pyrosequencing Data Using Whole Genome Mapping

Fatma Onmus-Leone Jun Hang Robert J. Clifford Yu Yang Matthew C. Riley Robert A. Kuschner Paige E. Waterman Emil P. Lesho 《PloS one》2013,8(4)

Despite major advances in next-generation sequencing, assembly of sequencing data, especially data from novel microorganisms or re-emerging pathogens, remains constrained by the lack of suitable reference sequences. De novo assembly is the best approach to achieve an accurate finished sequence, but multiple sequencing platforms or paired-end libraries are often required to achieve full genome coverage. In this study, we demonstrated a method to assemble complete bacterial genome sequences by integrating shotgun Roche 454 pyrosequencing with optical whole genome mapping (WGM). The whole genome restriction map (WGRM) was used as the reference to scaffold de novo assembled sequence contigs through a stepwise process. Large de novo contigs were placed in the correct order and orientation through alignment to the WGRM. De novo contigs that were not aligned to the WGRM were merged into scaffolds using contig branching structure information. These extended scaffolds were then aligned to the WGRM to identify the overlaps to be eliminated and the gaps and mismatches to be resolved with unused contigs. The process was repeated until a sequence with full coverage and alignment with the whole genome map was achieved. Using this method we were able to achieve 100% WGRM coverage without a paired-end library. We assembled complete sequences for three distinct genetic components of a clinical isolate of Providencia stuartii: a bacterial chromosome, a novel blaNDM-1 plasmid, and a novel bacteriophage, without separately purifying them to homogeneity.

4.

Focus on Weed Control: De Novo Assembly and Characterization of the Transcriptome of the Parasitic Weed Dodder Identifies Genes Associated with Plant Parasitism

Aashish Ranjan Yasunori Ichihashi Moran Farhi Kristina Zumstein Brad Townsley Rakefet David-Schwartz Neelima R. Sinha 《Plant physiology》2014,166(3):1186-1199

5.

Effects of GC Bias in Next-Generation-Sequencing Data on De Novo Genome Assembly

Yen-Chun Chen Tsunglin Liu Chun-Hui Yu Tzen-Yuh Chiang Chi-Chuan Hwang 《PloS one》2013,8(4)

Next-generation sequencing (NGS) has revolutionized the field of genome assembly because of its much higher data throughput and much lower cost compared with traditional Sanger sequencing. However, NGS poses new computational challenges to de novo genome assembly. Among the challenges, GC bias in NGS data is known to complicate genome assembly. However, it is not clear to what extent GC bias affects genome assembly in general. In this work, we conduct a systematic analysis of the effects of GC bias on genome assembly. Our analyses reveal that GC bias only lowers assembly completeness when the degree of GC bias is above a threshold. At a strong GC bias, the assembly fragmentation due to GC bias can be explained by the low coverage of reads in the GC-poor or GC-rich regions of a genome. This effect is observed for all the assemblers under study. Increasing the total amount of NGS data thus rescues the assembly fragmentation caused by GC bias. However, the amount of data needed for a full rescue depends on the distribution of GC contents. Both low and high coverage depths due to GC bias lower the accuracy of assembly. These pieces of information provide guidance toward a better de novo genome assembly in the presence of GC bias.
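The GC-poor and GC-rich regions mentioned in this abstract can be located by computing GC content over windows of a sequence. The sketch below is only an illustration of that general idea and is not the method used in the paper; the function names, the non-overlapping default windows, and the toy sequence are all assumptions.

```python
from typing import List, Optional, Tuple


def gc_fraction(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) in seq."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0  # no unambiguous bases; report 0.0 rather than divide by zero
    return (seq.count("G") + seq.count("C")) / called


def windowed_gc(seq: str, window: int, step: Optional[int] = None) -> List[Tuple[int, float]]:
    """GC fraction of each full-length window as (start offset, GC fraction)."""
    step = step or window  # default: non-overlapping windows
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    toy = "AAAATTTTGGGGCCCCATGC"  # hypothetical sequence for illustration
    for start, gc in windowed_gc(toy, window=4):
        print(start, f"{gc:.2f}")
```

Pairing each window's GC fraction with its observed read depth would give one simple way to see whether coverage drops at the extremes of the GC distribution.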

6.

BinPacker: Packing-Based De Novo Transcriptome Assembly from RNA-seq Data

Juntao Liu Guojun Li Zheng Chang Ting Yu Bingqiang Liu Rick McMullen Pengyin Chen Xiuzhen Huang 《PLoS computational biology》2016,12(2)

7.

High-Throughput Sequencing and De Novo Assembly of the Isatis indigotica Transcriptome

Xiaoqing Tang Yunhua Xiao Tingting Lv Fangquan Wang QianHao Zhu Tianqing Zheng Jie Yang 《PloS one》2014,9(9)

8.

Complete Genome Sequence of Amycolatopsis mediterranei S699 Based on De Novo Assembly via a Combinatorial Sequencing Strategy

Biao Tang Wei Zhao Huajun Zheng Ying Zhuo Lixin Zhang Guo-Ping Zhao 《Journal of bacteriology》2012,194(20):5699-5700

The genome of Amycolatopsis mediterranei S699 was resequenced and assembled de novo. Comparison with the previously released S699 sequence and with that of A. mediterranei U32 showed that the two S699 genomes differ by about 10 kb of major indels, and the differences are likely attributable to their different assembly strategies.

9.

ConPADE: Genome Assembly Ploidy Estimation from Next-Generation Sequencing Data

Gabriel R. A. Margarido David Heckerman 《PLoS computational biology》2015,11(4)

As a result of improvements in genome assembly algorithms and the ever decreasing costs of high-throughput sequencing technologies, new high quality draft genome sequences are published at a striking pace. With well-established methodologies, larger and more complex genomes are being tackled, including polyploid plant genomes. Given the similarity between multiple copies of a basic genome in polyploid individuals, assembly of such data usually results in collapsed contigs that represent a variable number of homoeologous genomic regions. Unfortunately, such collapse is often not ideal, as keeping contigs separate can lead both to improved assembly and also insights about how haplotypes influence phenotype. Here, we describe a first step in avoiding inappropriate collapse during assembly. In particular, we describe ConPADE (Contig Ploidy and Allele Dosage Estimation), a probabilistic method that estimates the ploidy of any given contig/scaffold based on its allele proportions. In the process, we report findings regarding errors in sequencing. The method can be used for whole genome shotgun (WGS) sequencing data. We also show applicability of the method for variant calling and allele dosage estimation. Results for simulated and real datasets are discussed and provide evidence that ConPADE performs well as long as enough sequencing coverage is available, or the true contig ploidy is low. We show that ConPADE may also be used for related applications, such as the identification of duplicated genes in fragmented assemblies, although refinements are needed.

10.

MIDDAS-M: Motif-Independent De Novo Detection of Secondary Metabolite Gene Clusters through the Integration of Genome Sequencing and Transcriptome Data

Myco Umemura Hideaki Koike Nozomi Nagano Tomoko Ishii Jin Kawano Noriko Yamane Ikuko Kozone Katsuhisa Horimoto Kazuo Shin-ya Kiyoshi Asai Jiujiang Yu Joan W. Bennett Masayuki Machida 《PloS one》2013,8(12)

11.

Sequencing and De Novo Assembly of the Asian Clam (Corbicula fluminea) Transcriptome Using the Illumina GAIIx Method

Huihui Chen Jinmiao Zha Xuefang Liang Jihong Bu Miao Wang Zijian Wang 《PloS one》2013,8(11)

12.

De Novo Assembly and Characterization of the Transcriptome of Seagrass Zostera marina Using Illumina Paired-End Sequencing

Fanna Kong Hong Li Peipei Sun Yang Zhou Yunxiang Mao 《PloS one》2014,9(11)

13.

Sequencing and De Novo Assembly of the Transcriptome of the Glassy-Winged Sharpshooter (Homalodisca vitripennis)

Raja Sekhar Nandety Shizuo G. Kamita Bruce D. Hammock Bryce W. Falk 《PloS one》2013,8(12)

14.

Sequencing and De Novo Assembly of the Western Tarnished Plant Bug (Lygus hesperus) Transcriptome

J. Joe Hull Scott M. Geib Jeffrey A. Fabrick Colin S. Brent 《PloS one》2013,8(1)

15.

RNA-Seq Analysis of Cocos nucifera: Transcriptome Sequencing and De Novo Assembly for Subsequent Functional Genomics Approaches

Haikuo Fan Yong Xiao Yaodong Yang Wei Xia Annaliese S. Mason Zhihui Xia Fei Qiao Songlin Zhao Haoru Tang 《PloS one》2013,8(3)

16.

Next-Generation Sequencing and De Novo Assembly, Genome Organization, and Comparative Genomic Analyses of the Genomes of Two Helicobacter pylori Isolates from Duodenal Ulcer Patients in India

Narender Kumar Asish K. Mukhopadhyay Rajashree Patra Ronita De Ramani Baddam Sabiha Shaik Jawed Alam Suma Tiruvayipati Niyaz Ahmed 《Journal of bacteriology》2012,194(21):5963-5964

The prevalence of different H. pylori genotypes in various geographical regions indicates region-specific adaptations during the course of evolution. Complete genomes of H. pylori from countries with high infection burdens, such as India, have not yet been described. Herein we present genome sequences of two H. pylori strains, NAB47 and NAD1, from India. In this report, we briefly mention the sequencing and finishing approaches, genome assembly with downstream statistics, and important features of the two draft genomes, including their phylogenetic status. We believe that these genome sequences and the comparative genomics emanating thereupon will help us to clearly understand the ancestry and biology of the Indian H. pylori genotypes, and this will be helpful in solving the so-called Indian enigma, by which high infection rates do not corroborate the minuscule number of serious outcomes observed, including gastric cancer.

17.

De Novo Assembly of the Complete Genome of an Enhanced Electricity-Producing Variant of Geobacter sulfurreducens Using Only Short Reads

Harish Nagarajan Jessica E. Butler Anna Klimes Yu Qiu Karsten Zengler Joy Ward Nelson D. Young Barbara A. Methé Bernhard Ø. Palsson Derek R. Lovley Christian L. Barrett 《PloS one》2010,5(6)

State-of-the-art DNA sequencing technologies are transforming the life sciences due to their ability to generate nucleotide sequence information with a speed and quantity that is unapproachable with traditional Sanger sequencing. Genome sequencing is a principal application of this technology, where the ultimate goal is the full and complete sequence of the organism of interest. Due to the nature of the raw data produced by these technologies, a full genomic sequence attained without the aid of Sanger sequencing has yet to be demonstrated. We have successfully developed a four-phase strategy for using only next-generation sequencing technologies (Illumina and 454) to assemble a complete microbial genome de novo. We applied this approach to completely assemble the 3.7 Mb genome of a rare Geobacter variant (KN400) that is capable of unprecedented current production at an electrode. Two key components of our strategy enabled us to achieve this result. First, we integrated the two data types early in the process to maximally leverage their complementary characteristics. Second, we used the output of different short read assembly programs so as to leverage the complementary nature of their different underlying algorithms or of their different implementations of the same underlying algorithm. The significance of our result is that it demonstrates a general approach for maximizing the efficiency and success of genome assembly projects as new sequencing technologies and new assembly algorithms are introduced. The general approach is a meta strategy, wherein sequencing data are integrated as early as possible and in particular ways and wherein multiple assembly algorithms are judiciously applied such that the deficiencies in one are complemented by another.

18.

Sequencing, De Novo Assembly and Annotation of the Colorado Potato Beetle, Leptinotarsa decemlineata, Transcriptome

Abhishek Kumar Leonardo Congiu Leena Lindström Saija Piiroinen Michele Vidotto Alessandro Grapputo 《PloS one》2014,9(1)

19.

Characterization of Liaoning Cashmere Goat Transcriptome: Sequencing, De Novo Assembly, Functional Annotation and Comparative Analysis

Hongliang Liu Tingting Wang Jinke Wang Fusheng Quan Yong Zhang 《PloS one》2013,8(10)

20.

Rapid Genome Mapping in Nanochannel Arrays for Highly Complete and Accurate De Novo Sequence Assembly of the Complex Aegilops tauschii Genome

Alex R. Hastie Lingli Dong Alexis Smith Jeff Finklestein Ernest T. Lam Naxin Huo Han Cao Pui-Yan Kwok Karin R. Deal Jan Dvorak Ming-Cheng Luo Yong Gu Ming Xiao 《PloS one》2013,8(2)

Next-generation sequencing (NGS) technologies have enabled high-throughput and low-cost generation of sequence data; however, de novo genome assembly remains a great challenge, particularly for large genomes. NGS short reads are often insufficient to create large contigs that span repeat sequences and to facilitate unambiguous assembly. Plant genomes are notorious for containing high quantities of repetitive elements, which, combined with huge genome sizes, have made accurate assembly of these large and complex genomes intractable thus far. Using two-color genome mapping of tiling bacterial artificial chromosome (BAC) clones on nanochannel arrays, we completed high-confidence assembly of a 2.1-Mb, highly repetitive region in the large and complex genome of Aegilops tauschii, the D-genome donor of hexaploid wheat (Triticum aestivum). Genome mapping is based on direct visualization of sequence motifs on single DNA molecules hundreds of kilobases in length. With the genome map as a scaffold, we anchored unplaced sequence contigs, validated the initial draft assembly, and resolved instances of misassembly, some involving contigs <2 kb long, to dramatically improve the assembly from 75% to 95% complete.