1.

Evaluating characteristics of de novo assembly software on 454 transcriptome data: a simulation approach

Mundry M Bornberg-Bauer E Sammeth M Feulner PG 《PloS one》2012,7(2):e31410

2.

Pollux: platform independent error correction of single and mixed genomes

Eric Marinier Daniel G Brown Brendan J McConkey 《BMC bioinformatics》2014,16(1)

Background

Second-generation sequencers generate millions of relatively short, but error-prone, reads. These errors make sequence assembly and other downstream projects more challenging. Correcting these errors improves the quality of assemblies and projects which benefit from error-free reads.

Results

We have developed a general-purpose error corrector that corrects errors introduced by Illumina, Ion Torrent, and Roche 454 sequencing technologies and can be applied to single- or mixed-genome data. In addition to correcting substitution errors, we locate and correct insertion, deletion, and homopolymer errors while remaining sensitive to low-coverage areas of sequencing projects. Using published data sets, we correct 94% of Illumina MiSeq errors, 88% of Ion Torrent PGM errors, and 85% of Roche 454 GS Junior errors. Introduced errors are 20 to 70 times rarer than successfully corrected errors. Furthermore, we show that the quality of assemblies improves when reads are corrected by our software.

Conclusions

Pollux is highly effective at correcting errors across platforms, and consistently performs as well as or better than currently available error correction software. Pollux provides general-purpose error correction and may be used in applications with or without assembly.

3.

Sequencing, De Novo Assembly and Annotation of the Colorado Potato Beetle, Leptinotarsa decemlineata, Transcriptome

Abhishek Kumar Leonardo Congiu Leena Lindström Saija Piiroinen Michele Vidotto Alessandro Grapputo 《PloS one》2014,9(1)

4.

The fat body transcriptomes of the yellow fever mosquito Aedes aegypti, pre- and post- blood meal

Price DP Nagarajan V Churbanov A Houde P Milligan B Drake LL Gustafson JE Hansen IA 《PloS one》2011,6(7):e22573

5.

Predicting the functional repertoire of an organism from unassembled RNA-seq data

Manuel Landesfeind Peter Meinicke 《BMC genomics》2014,15(1)

6.

Is the whole greater than the sum of its parts? De novo assembly strategies for bacterial genomes based on paired-end sequencing

Ting-Wen Chen Ruei-Chi Gan Yi-Feng Chang Wei-Chao Liao Timothy H. Wu Chi-Ching Lee Po-Jung Huang Cheng-Yang Lee Yi-Ywan M. Chen Cheng-Hsun Chiu Petrus Tang 《BMC genomics》2015,16(1)

Background

Whole genome sequence construction is becoming increasingly feasible because of advances in next generation sequencing (NGS), including increasing throughput and read length. By simply overlapping paired-end reads, we can obtain longer reads with higher accuracy, which can facilitate the assembly process. However, the influences of different library sizes and assembly methods on paired-end sequencing-based de novo assembly remain poorly understood.

Results

We used 250 bp Illumina MiSeq paired-end reads of different library sizes generated from genomic DNA from Escherichia coli DH1 and Streptococcus parasanguinis FW213 to compare the assembly results of different library sizes and assembly approaches. Our data indicate that overlapping paired-end reads can increase read accuracy but sometimes causes insertions or deletions. Regarding genome assembly, merged reads only outcompete original paired-end reads when coverage depth is low, and larger libraries tend to yield better assembly results. These results imply that distance information is the most critical factor during assembly. Our results also indicate that when depth is sufficiently high, assembly from subsets can sometimes produce better results.
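To make "overlapping paired-end reads" concrete, the following Python sketch merges one read pair by looking for the longest end overlap. It is a toy illustration, not the tool used in the study: the function names `reverse_complement` and `merge_pair`, and the parameters `min_overlap` and `max_mismatch_rate`, are assumptions introduced here, and base qualities are ignored.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq))


def merge_pair(read1: str, read2: str, min_overlap: int = 10,
               max_mismatch_rate: float = 0.1):
    """Merge a forward read with its mate if their ends overlap.

    Hypothetical helper: returns the merged sequence, or None when no
    overlap of at least min_overlap bases passes the mismatch threshold.
    """
    mate = reverse_complement(read2)  # bring the mate onto the forward strand
    longest = min(len(read1), len(mate))
    # Try the longest overlap first, so the first hit is the longest accepted one.
    for overlap in range(longest, min_overlap - 1, -1):
        tail, head = read1[-overlap:], mate[:overlap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_rate * overlap:
            # Keep read1's bases inside the overlap; a real merger would
            # choose between the two reads using base qualities.
            return read1 + mate[overlap:]
    return None
```

Because the sketch accepts the first overlap that passes the threshold, a repeat near the read ends can produce a wrong merge of the wrong length, which is one way merging can introduce the insertions or deletions the abstract mentions.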

Conclusions

In summary, this study provides systematic evaluations of de novo assembly from paired-end sequencing data. Among the assembly strategies, we find that overlapping paired-end reads is not always beneficial for bacterial genome assembly and should be avoided or used with caution, especially for genomes containing a high fraction of repetitive sequences. Because increasing numbers of projects aim at bacterial genome sequencing, our study provides valuable suggestions for the field of genomic sequence construction.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1859-8) contains supplementary material, which is available to authorized users.

7.

Comparison of Metatranscriptomic Samples Based on k-Tuple Frequencies

Ying Wang Lin Liu Lina Chen Ting Chen Fengzhu Sun 《PloS one》2014,9(1)

8.

Transcriptome analysis of female and male Xiphophorus maculatus Jp 163 A

Zhang Z Wang Y Wang S Liu J Warren W Mitreva M Walter RB 《PloS one》2011,6(4):e18379

9.

Plantagora: modeling whole genome sequencing and assembly of plant genomes

Barthelson R McFarlin AJ Rounsley SD Young S 《PloS one》2011,6(12):e28436

Background

Genomics studies are being revolutionized by the next generation sequencing technologies, which have made whole genome sequencing much more accessible to the average researcher. Whole genome sequencing with the new technologies is a developing art that, despite the large volumes of data that can be produced, may still fail to provide a clear and thorough map of a genome. The Plantagora project was conceived to address specifically the gap between having the technical tools for genome sequencing and knowing precisely the best way to use them.

Methodology/Principal Findings

For Plantagora, a platform was created for generating simulated reads from several different plant genomes of different sizes. The resulting read files mimicked either 454 or Illumina reads, with varying paired end spacing. Thousands of datasets of reads were created, most derived from our primary model genome, rice chromosome one. All reads were assembled with different software assemblers, including Newbler, Abyss, and SOAPdenovo, and the resulting assemblies were evaluated by an extensive battery of metrics chosen for these studies. The metrics included both statistics of the assembly sequences and fidelity-related measures derived by alignment of the assemblies to the original genome source for the reads. The results were presented in a website, which includes a data graphing tool, all created to help the user compare rapidly the feasibility and effectiveness of different sequencing and assembly strategies prior to testing an approach in the lab. Some of our own conclusions regarding the different strategies were also recorded on the website.

Conclusions/Significance

Plantagora provides a substantial body of information for comparing different approaches to sequencing a plant genome, and some conclusions regarding some of the specific approaches. Plantagora also provides a platform of metrics and tools for studying the process of sequencing and assembly further.

10.

Next generation sequencing provides rapid access to the genome of Puccinia striiformis f. sp. tritici, the causal agent of wheat stripe rust (total citations: 2; self-citations: 0; citations by others: 2)

Cantu D Govindarajulu M Kozik A Wang M Chen X Kojima KK Jurka J Michelmore RW Dubcovsky J 《PloS one》2011,6(8):e24230

Background

The wheat stripe rust fungus (Puccinia striiformis f. sp. tritici, PST) is responsible for significant yield losses in wheat production worldwide. In spite of its economic importance, the PST genomic sequence is not currently available. Fortunately Next Generation Sequencing (NGS) has radically improved sequencing speed and efficiency with a great reduction in costs compared to traditional sequencing technologies. We used Illumina sequencing to rapidly access the genomic sequence of the highly virulent PST race 130 (PST-130).

Methodology/Principal Findings

We obtained nearly 80 million high quality paired-end reads (>50x coverage) that were assembled into 29,178 contigs (64.8 Mb), which provide an estimated coverage of at least 88% of the PST genes and are available through GenBank. Extensive micro-synteny with the Puccinia graminis f. sp. tritici (PGTG) genome and high sequence similarity with annotated PGTG genes support the quality of the PST-130 contigs. We characterized the transposable elements present in the PST-130 contigs and, using an ab initio gene prediction program, identified and tentatively annotated 22,815 putative coding sequences. We provide examples of the use of comparative approaches to improve gene annotation for both PST and PGTG and to identify candidate effectors. Finally, the assembled contigs provided an inventory of PST repetitive elements, which were annotated and deposited in Repbase.

Conclusions/Significance

The assembly of the PST-130 genome and the predicted proteins provide useful resources to rapidly identify and clone PST genes and their regulatory regions. Although the automatic gene prediction has limitations, we show that a comparative genomics approach using multiple rust species can greatly improve the quality of gene annotation in these species. The PST-130 sequence will also be useful for comparative studies within PST as more races are sequenced. This study illustrates the power of NGS for rapid and efficient access to genomic sequence in non-model organisms.

11.

Pseudo-Sanger sequencing: massively parallel production of long and near error-free reads using NGS technology

Jue Ruan Lan Jiang Zechen Chong Qiang Gong Heng Li Chunyan Li Yong Tao Caihong Zheng Weiwei Zhai David Turissini Charles H Cannon Xuemei Lu Chung-I Wu 《BMC genomics》2013,14(1)

Background

Next generation sequencing (NGS) technology offers ultra-high throughput, but its reads are remarkably short compared with those of conventional Sanger sequencing. Paired-end NGS can computationally extend the read length, but the inherent gaps between the paired ends cause considerable practical inconvenience. Now that Illumina paired-end sequencing can read both ends of 600 bp or even 800 bp DNA fragments, filling in the gaps between paired ends to produce accurate long reads is an intriguing but challenging problem.

Results

We have developed a new technology, referred to as pseudo-Sanger (PS) sequencing. It fills in the gaps between paired ends and can generate near error-free sequences equivalent in length to conventional Sanger reads, but with the high throughput of next generation sequencing. The major novelty of the PS method is that the gap filling is based on local assembly of paired-end reads that overlap at either end. Thus, we are able to fill in the gaps in repetitive genomic regions correctly. PS sequencing starts with short reads from NGS platforms, using a series of paired-end libraries of stepwise decreasing insert sizes. A computational method is introduced to transform these special paired-end reads into long and near error-free PS sequences, which correspond in length to those with the largest insert sizes. The PS construction has three advantages over untransformed reads: gap filling, error correction and heterozygote tolerance. Among the many applications of the PS construction is de novo genome assembly, which we tested in this study. Assembly of PS reads from a non-isogenic strain of Drosophila melanogaster yields an N50 contig of 190 kb, a 5-fold improvement over the existing de novo assembly methods and a 3-fold advantage over the assembly of long reads from 454 sequencing.
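Several abstracts in this list report assembly quality as an N50 contig size. As a reminder of what that statistic measures, here is a minimal Python sketch of the standard definition; the function name `n50` is chosen here for illustration and is not taken from any of the listed papers.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the largest length L such that contigs of length >= L
    together contain at least half of the total assembled bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length  # accumulate bases from the longest contig down
        if running >= half_total:
            return length
```

A reported "5-fold improvement" in N50 is then simply the ratio of this value between two assemblies of the same data. Note that N50 says nothing about correctness: joining contigs wrongly also raises it.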

Conclusions

Our method generated near error-free long reads from NGS paired-end sequencing. We demonstrated that de novo assembly can benefit substantially from these Sanger-like reads. In addition, the long reads could be applied to tasks such as structural variation detection and metagenomics.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-14-711) contains supplementary material, which is available to authorized users.

12.

De novo Assembly and Characterization of the Barnyardgrass (Echinochloa crus-galli) Transcriptome Using Next-Generation Pyrosequencing

Xia Yang Xin-Yan Yu Yong-Feng Li 《PloS one》2013,8(7)

13.

Exploring the switchgrass transcriptome using second-generation sequencing technology

Wang Y Zeng X Iyer NJ Bryant DW Mockler TC Mahalingam R 《PloS one》2012,7(3):e34225

14.

De Novo Transcriptome Hybrid Assembly and Validation in the European Earwig (Dermaptera, Forficula auricularia)

Anne C. Roulin Min Wu Samuel Pichon Roberto Arbore Simone Kühn-Bühlmann Mathias Kölliker Jean-Claude Walser 《PloS one》2014,9(4)

15.

Transcriptomic analysis of Siberian ginseng (Eleutherococcus senticosus) to discover genes involved in saponin biosynthesis

Hwan-Su Hwang Hyoshin Lee Yong Eui Choi 《BMC genomics》2015,16(1)

16.

A de novo expression profiling of Anopheles funestus, malaria vector in Africa, using 454 pyrosequencing

Gregory R Darby AC Irving H Coulibaly MB Hughes M Koekemoer LL Coetzee M Ranson H Hemingway J Hall N Wondji CS 《PloS one》2011,6(2):e17418

17.

Transcriptomics of in vitro immune-stimulated hemocytes from the Manila clam Ruditapes philippinarum using high-throughput sequencing

Moreira R Balseiro P Planas JV Fuste B Beltran S Novoa B Figueras A 《PloS one》2012,7(4):e35009

18.

Genome analysis of a major urban malaria vector mosquito, Anopheles stephensi

《Genome biology》2014,15(9)

19.

Development of Molecular Resources for an Intertidal Clam, Sinonovacula constricta, Using 454 Transcriptome Sequencing

Donghong Niu Lie Wang Fanyue Sun Zhanjiang Liu Jiale Li 《PloS one》2013,8(7)

20.

QuorUM: An Error Corrector for Illumina Reads

Guillaume Marçais James A. Yorke Aleksey Zimin 《PloS one》2015,10(6)

Motivation

Illumina sequencing data can provide high coverage of a genome by relatively short (most often 100 bp to 150 bp) reads at a low cost. Even with a low (advertised 1%) error rate, 100× coverage Illumina data on average has an error in some read at every base in the genome. These errors make handling the data more complicated because they result in a large number of low-count erroneous k-mers in the reads. However, there is enough information in the reads to correct most of the sequencing errors, thus making subsequent use of the data (e.g. for mapping or assembly) easier. Here we use the term “error correction” to denote the reduction in errors due to both changes in individual bases and trimming of unusable sequence. We developed error correction software called QuorUM. QuorUM is mainly aimed at error correcting Illumina reads for subsequent assembly. It is designed around the novel idea of minimizing the number of distinct erroneous k-mers in the output reads while preserving the most true k-mers, and we introduce a composite statistic π that measures how successful we are at achieving this dual goal. We evaluate the performance of QuorUM by correcting actual Illumina reads from genomes for which a reference assembly is available.
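The two quantitative ideas in this paragraph can be illustrated with a short Python sketch: the expected number of erroneous base calls per genome position (coverage times error rate, so 100 × 0.01 = 1), and the way sequencing errors show up as low-count k-mers. This is a toy illustration and not QuorUM's algorithm; the function names and the `min_count` threshold are assumptions introduced here.

```python
from collections import Counter


def expected_errors_per_position(coverage, error_rate):
    """Expected number of erroneous base calls covering one genome position."""
    return coverage * error_rate


def count_kmers(reads, k):
    """Count every k-mer (length-k substring) across a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def low_count_kmers(counts, min_count=2):
    """Return the k-mers seen fewer than min_count times (candidate errors).

    The threshold is a hypothetical choice for this sketch; real correctors
    derive it from the data, and true k-mers in low-coverage regions can
    also fall below it.
    """
    return {kmer for kmer, n in counts.items() if n < min_count}
```

For example, with three copies of the read `ACGTAC` and one copy carrying a single substitution, `ACGAAC`, counting 4-mers leaves the three k-mers that span the substituted base with a count of one, while every true k-mer has a count of three.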

Results

We produce trimmed and error-corrected reads that result in assemblies with longer contigs and fewer errors. We compared QuorUM against several published error correctors and found that it is the best performer in most metrics we use. QuorUM is efficiently implemented, making use of current multi-core computing architectures, and it is suitable for large data sets (1 billion bases checked and corrected per day per core). We also demonstrate that a third-party assembler (SOAPdenovo) benefits significantly from using QuorUM error-corrected reads. QuorUM error-corrected reads result in a factor of 1.1 to 4 improvement in N50 contig size compared to using the original reads with SOAPdenovo for the data sets investigated.

Availability

QuorUM is distributed as an independent software package and as a module of the MaSuRCA assembly software. Both are available under the GPL open source license at http://www.genome.umd.edu.

Contact

gmarcais@umd.edu