期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Optimizing de novo common wheat transcriptome assembly using short-read RNA-Seq data

J Duan C Xia G Zhao J Jia X Kong 《BMC genomics》2012,13(1):392

相似文献

2.

Non-referenced genome assembly from epigenomic short-read data

Antony Kaspi Mark Ziemann Samuel T Keating Ishant Khurana Timothy Connor Briana Spolding Adrian Cooper Ross Lazarus Ken Walder Paul Zimmet Assam El-Osta 《Epigenetics》2014,9(10):1329-1338

Current computational methods used to analyze changes in DNA methylation and chromatin modification rely on sequenced genomes. Here we describe a pipeline for the detection of these changes from short-read sequence data that does not require a reference genome. Open source software packages were used for sequence assembly, alignment, and measurement of differential enrichment. The method was evaluated by comparing results with reference-based results showing a strong correlation between chromatin modification and gene expression. We then used our de novo sequence assembly to build the DNA methylation profile for the non-referenced Psammomys obesus genome. The pipeline described uses open source software for fast annotation and visualization of unreferenced genomic regions from short-read data. 相似文献

3.

De novo assembly of bacterial transcriptomes from RNA-seq data

Brian Tjaden 《Genome biology》2015,16(1)

相似文献

4.

Non-referenced genome assembly from epigenomic short-read data

《Epigenetics》2013,8(10):1329-1338

Current computational methods used to analyze changes in DNA methylation and chromatin modification rely on sequenced genomes. Here we describe a pipeline for the detection of these changes from short-read sequence data that does not require a reference genome. Open source software packages were used for sequence assembly, alignment, and measurement of differential enrichment. The method was evaluated by comparing results with reference-based results showing a strong correlation between chromatin modification and gene expression. We then used our de novo sequence assembly to build the DNA methylation profile for the non-referenced Psammomys obesus genome. The pipeline described uses open source software for fast annotation and visualization of unreferenced genomic regions from short-read data. 相似文献

5.

An ensemble strategy that significantly improves de novo assembly of microbial genomes from metagenomic next-generation sequencing data

Xutao Deng Samia N. Naccache Terry Ng Scot Federman Linlin Li Charles Y. Chiu Eric L. Delwart 《Nucleic acids research》2015,43(7):e46

Next-generation sequencing (NGS) approaches rapidly produce millions to billions of short reads, which allow pathogen detection and discovery in human clinical, animal and environmental samples. A major limitation of sequence homology-based identification for highly divergent microorganisms is the short length of reads generated by most highly parallel sequencing technologies. Short reads require a high level of sequence similarities to annotated genes to confidently predict gene function or homology. Such recognition of highly divergent homologues can be improved by reference-free (de novo) assembly of short overlapping sequence reads into larger contigs. We describe an ensemble strategy that integrates the sequential use of various de Bruijn graph and overlap-layout-consensus assemblers with a novel partitioned sub-assembly approach. We also proposed new quality metrics that are suitable for evaluating metagenome de novo assembly. We demonstrate that this new ensemble strategy tested using in silico spike-in, clinical and environmental NGS datasets achieved significantly better contigs than current approaches. 相似文献

6.

Efficient de novo assembly of large and complex genomes by massively parallel sequencing of Fosmid pools

Andrey Alexeyenko Bj?rn Nystedt Francesco Vezzi Ellen Sherwood Rosa Ye Bjarne Knudsen Martin Simonsen Benjamin Turner Pieter de Jong Cheng-Cang Wu Joakim Lundeberg 《BMC genomics》2014,15(1)

Background

Sampling genomes with Fosmid vectors and sequencing of pooled Fosmid libraries on the Illumina platform for massive parallel sequencing is a novel and promising approach to optimizing the trade-off between sequencing costs and assembly quality.

Results

In order to sequence the genome of Norway spruce, which is of great size and complexity, we developed and applied a new technology based on the massive production, sequencing, and assembly of Fosmid pools (FP). The spruce chromosomes were sampled with ~40,000 bp Fosmid inserts to obtain around two-fold genome coverage, in parallel with traditional whole genome shotgun sequencing (WGS) of haploid and diploid genomes. Compared to the WGS results, the contiguity and quality of the FP assemblies were high, and they allowed us to fill WGS gaps resulting from repeats, low coverage, and allelic differences. The FP contig sets were further merged with WGS data using a novel software package GAM-NGS.

Conclusions

By exploiting FP technology, the first published assembly of a conifer genome was sequenced entirely with massively parallel sequencing. Here we provide a comprehensive report on the different features of the approach and the optimization of the process.We have made public the input data (FASTQ format) for the set of pools used in this study:ftp://congenie.org/congenie/Nystedt_2013/Assembly/ProcessedData/FosmidPools/.(alternatively accessible via http://congenie.org/downloads).The software used for running the assembly process is available at http://research.scilifelab.se/andrej_alexeyenko/downloads/fpools/.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-439) contains supplementary material, which is available to authorized users. 相似文献

7.

A biologist's guide to de novo genome assembly using next-generation sequence data: A test with fungal genomes

Haridas S Breuill C Bohlmann J Hsiang T 《Journal of microbiological methods》2011,86(3):368-375

We offer a guide to de novo genome assembly¹ using sequence data generated by the Illumina platform for biologists working with fungi or other organisms whose genomes are less than 100 Mb in size. The guide requires no familiarity with sequencing assembly technology or associated computer programs. It defines commonly used terms in genome sequencing and assembly; provides examples of assembling short-read genome sequence data for four strains of the fungus Grosmannia clavigera using four assembly programs; gives examples of protocols and software; and presents a commented flowchart that extends from DNA preparation for submission to a sequencing center, through to processing and assembly of the raw sequence reads using freely available operating systems and software. 相似文献

8.

Identifying structural variation in haploid microbial genomes from short-read resequencing data using breseq

Jeffrey E Barrick Geoffrey Colburn Daniel E Deatherage Charles C Traverse Matthew D Strand Jordan J Borges David B Knoester Aaron Reba Austin G Meyer 《BMC genomics》2014,15(1)

相似文献

9.

Feature-by-feature--evaluating de novo sequence assembly

Vezzi F Narzisi G Mishra B 《PloS one》2012,7(2):e31002

The whole-genome sequence assembly (WGSA) problem is among one of the most studied problems in computational biology. Despite the availability of a plethora of tools (i.e., assemblers), all claiming to have solved the WGSA problem, little has been done to systematically compare their accuracy and power. Traditional methods rely on standard metrics and read simulation: while on the one hand, metrics like N50 and number of contigs focus only on size without proportionately emphasizing the information about the correctness of the assembly, comparisons performed on simulated dataset, on the other hand, can be highly biased by the non-realistic assumptions in the underlying read generator. Recently the Feature Response Curve (FRC) method was proposed to assess the overall assembly quality and correctness: FRC transparently captures the trade-offs between contigs' quality against their sizes. Nevertheless, the relationship among the different features and their relative importance remains unknown. In particular, FRC cannot account for the correlation among the different features. We analyzed the correlation among different features in order to better describe their relationships and their importance in gauging assembly quality and correctness. In particular, using multivariate techniques like principal and independent component analysis we were able to estimate the "excess-dimensionality" of the feature space. Moreover, principal component analysis allowed us to show how poorly the acclaimed N50 metric describes the assembly quality. Applying independent component analysis we identified a subset of features that better describe the assemblers performances. We demonstrated that by focusing on a reduced set of highly informative features we can use the FRC curve to better describe and compare the performances of different assemblers. Moreover, as a by-product of our analysis, we discovered how often evaluation based on simulated data, obtained with state of the art simulators, lead to not-so-realistic results. 相似文献

10.

The Inferelator: an algorithm for learning parsimonious regulatory networks from systems-biology data sets de novo 总被引：1，自引：0，他引：1

Bonneau R Reiss DJ Shannon P Facciotti M Hood L Baliga NS Thorsson V 《Genome biology》2006,7(5):R36-16

相似文献

11.

Identification of interleukin genes in Pogona vitticeps using a de novo transcriptome assembly from RNA-seq data

Alexandra Livernois Kristine Hardy Renae Domaschenz Alexie Papanicolaou Arthur Georges Stephen D Sarre Sudha Rao Tariq Ezaz Janine E Deakin 《Immunogenetics》2016,68(9):719-731

相似文献

12.

Starting from scratch: de novo kinetochore assembly in vertebrates

Bowers SR Mellone BG 《The EMBO journal》2011,30(19):3882-3884

相似文献

13.

Comparative analysis of de novo transcriptome assembly

Kaitlin Clarke Yi Yang Ronald Marsh LingLin Xie Ke K. Zhang 《中国科学：生命科学英文版》2013,56(2):156-162

相似文献

14.

Structural variation in two human genomes mapped at single-nucleotide resolution by whole genome de novo assembly

Li Y Zheng H Luo R Wu H Zhu H Li R Cao H Wu B Huang S Shao H Ma H Zhang F Feng S Zhang W Du H Tian G Li J Zhang X Li S Bolund L Kristiansen K de Smith AJ Blakemore AI Coin LJ Yang H Wang J Wang J 《Nature biotechnology》2011,29(8):723-730

Here we use whole-genome de novo assembly of second-generation sequencing reads to map structural variation (SV) in an Asian genome and an African genome. Our approach identifies small- and intermediate-size homozygous variants (1-50 kb) including insertions, deletions, inversions and their precise breakpoints, and in contrast to other methods, can resolve complex rearrangements. In total, we identified 277,243 SVs ranging in length from 1-23 kb. Validation using computational and experimental methods suggests that we achieve overall <6% false-positive rate and <10% false-negative rate in genomic regions that can be assembled, which outperforms other methods. Analysis of the SVs in the genomes of 106 individuals sequenced as part of the 1000 Genomes Project suggests that SVs account for a greater fraction of the diversity between individuals than do single-nucleotide polymorphisms (SNPs). These findings demonstrate that whole-genome de novo assembly is a feasible approach to deriving more comprehensive maps of genetic variation. 相似文献

15.

Evaluating the fidelity of de novo short read metagenomic assembly using simulated data

Pignatelli M Moya A 《PloS one》2011,6(5):e19984

A frequent step in metagenomic data analysis comprises the assembly of the sequenced reads. Many assembly tools have been published in the last years targeting data coming from next-generation sequencing (NGS) technologies but these assemblers have not been designed for or tested in multi-genome scenarios that characterize metagenomic studies. Here we provide a critical assessment of current de novo short reads assembly tools in multi-genome scenarios using complex simulated metagenomic data. With this approach we tested the fidelity of different assemblers in metagenomic studies demonstrating that even under the simplest compositions the number of chimeric contigs involving different species is noticeable. We further showed that the assembly process reduces the accuracy of the functional classification of the metagenomic data and that these errors can be overcome raising the coverage of the studied metagenome. The results presented here highlight the particular difficulties that de novo genome assemblers face in multi-genome scenarios demonstrating that these difficulties, that often compromise the functional classification of the analyzed data, can be overcome with a high sequencing effort. 相似文献

16.

Assessing the benefits of using mate-pairs to resolve repeats in de novo short-read prokaryotic assemblies

Joshua?Wetzel Carl?Kingsford Mihai?Pop Email author 《BMC bioinformatics》2011,12(1):95

Background

Next-generation sequencing technologies allow genomes to be sequenced more quickly and less expensively than ever before. However, as sequencing technology has improved, the difficulty of de novo genome assembly has increased, due in large part to the shorter reads generated by the new technologies. The use of mated sequences (referred to as mate-pairs) is a standard means of disambiguating assemblies to obtain a more complete picture of the genome without resorting to manual finishing. Here, we examine the effectiveness of mate-pair information in resolving repeated sequences in the DNA (a paramount issue to overcome). While it has been empirically accepted that mate-pairs improve assemblies, and a variety of assemblers use mate-pairs in the context of repeat resolution, the effectiveness of mate-pairs in this context has not been systematically evaluated in previous literature. 相似文献

17.

Bridger: a new framework for de novo transcriptome assembly using RNA-seq data

Zheng Chang Guojun Li Juntao Liu Yu Zhang Cody Ashby Deli Liu Carole L Cramer Xiuzhen Huang 《Genome biology》2015,16(1)

相似文献

18.

Evaluation of de novo transcriptome assemblies from RNA-Seq data

Bo Li Nathanael Fillmore Yongsheng Bai Mike Collins James A Thomson Ron Stewart Colin N Dewey 《Genome biology》2014,15(12)

相似文献

19.

Anchored pseudo-de novo assembly of human genomes identifies extensive sequence variation from unmapped sequence reads

Joshua?J.?Faber-Hammond Kim?H.?Brown Email author View author&#;s OrcID profile 《Human genetics》2016,135(7):727-740

The human genome reference (HGR) completion marked the genomics era beginning, yet despite its utility universal application is limited by the small number of individuals used in its development. This is highlighted by the presence of high-quality sequence reads failing to map within the HGR. Sequences failing to map generally represent 2–5 % of total reads, which may harbor regions that would enhance our understanding of population variation, evolution, and disease. Alternatively, complete de novo assemblies can be created, but these effectively ignore the groundwork of the HGR. In an effort to find a middle ground, we developed a bioinformatic pipeline that maps paired-end reads to the HGR as separate single reads, exports unmappable reads, de novo assembles these reads per individual and then combines assemblies into a secondary reference assembly used for comparative analysis. Using 45 diverse 1000 Genomes Project individuals, we identified 351,361 contigs covering 195.5 Mb of sequence unincorporated in GRCh38. 30,879 contigs are represented in multiple individuals with ~40 % showing high sequence complexity. Genomic coordinates were generated for 99.9 %, with 52.5 % exhibiting high-quality mapping scores. Comparative genomic analyses with archaic humans and primates revealed significant sequence alignments and comparisons with model organism RefSeq gene datasets identified novel human genes. If incorporated, these sequences will expand the HGR, but more importantly our data highlight that with this method low coverage (~10–20×) next-generation sequencing can still be used to identify novel unmapped sequences to explore biological functions contributing to human phenotypic variation, disease and functionality for personal genomic medicine. 相似文献

20.

Combining de novo and reference-guided assembly with scaffold_builder

Genivaldo?GZ?Silva Bas?E?Dutilh Keri?Elkins Robert?Schmieder Elizabeth?A?Dinsdale Robert?A?Edwards

Email author

《Source code for biology and medicine》

Genome sequencing has become routine, however genome assembly still remains a challenge despite the computational advances in the last decade. In particular, the abundance of repeat elements in genomes makes it difficult to assemble them into a single complete sequence. Identical repeats shorter than the average read length can generally be assembled without issue. However, longer repeats such as ribosomal RNA operons cannot be accurately assembled using existing tools. The application Scaffold_builder was designed to generate scaffolds – super contigs of sequences joined by N-bases – based on the similarity to a closely related reference sequence. This is independent of mate-pair information and can be used complementarily for genome assembly, e.g. when mate-pairs are not available or have already been exploited. Scaffold_builder was evaluated using simulated pyrosequencing reads of the bacterial genomes Escherichia coli 042, Lactobacillus salivarius UCC118 and Salmonella enterica subsp. enterica serovar Typhi str. P-stx-12. Moreover, we sequenced two genomes from Salmonella enterica serovar Typhimurium LT2 G455 and Salmonella enterica serovar Typhimurium SDT1291 and show that Scaffold_builder decreases the number of contig sequences by 53% while more than doubling their average length. Scaffold_builder is written in Python and is available at http://edwards.sdsu.edu/scaffold_builder. A web-based implementation is additionally provided to allow users to submit a reference genome and a set of contigs to be scaffolded. 相似文献