期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Comparative analysis of de novo transcriptome assembly

Kaitlin Clarke Yi Yang Ronald Marsh LingLin Xie Ke K. Zhang 《中国科学：生命科学英文版》2013,56(2):156-162

相似文献

2.

Identification of Optimum Sequencing Depth Especially for De Novo Genome Assembly of Small Genomes Using Next Generation Sequencing Data

Aarti Desai Veer Singh Marwah Akshay Yadav Vineet Jha Kishor Dhaygude Ujwala Bangar Vivek Kulkarni Abhay Jere 《PloS one》2013,8(4)

Next Generation Sequencing (NGS) is a disruptive technology that has found widespread acceptance in the life sciences research community. The high throughput and low cost of sequencing has encouraged researchers to undertake ambitious genomic projects, especially in de novo genome sequencing. Currently, NGS systems generate sequence data as short reads and de novo genome assembly using these short reads is computationally very intensive. Due to lower cost of sequencing and higher throughput, NGS systems now provide the ability to sequence genomes at high depth. However, currently no report is available highlighting the impact of high sequence depth on genome assembly using real data sets and multiple assembly algorithms. Recently, some studies have evaluated the impact of sequence coverage, error rate and average read length on genome assembly using multiple assembly algorithms, however, these evaluations were performed using simulated datasets. One limitation of using simulated datasets is that variables such as error rates, read length and coverage which are known to impact genome assembly are carefully controlled. Hence, this study was undertaken to identify the minimum depth of sequencing required for de novo assembly for different sized genomes using graph based assembly algorithms and real datasets. Illumina reads for E.coli (4.6 MB) S.kudriavzevii (11.18 MB) and C.elegans (100 MB) were assembled using SOAPdenovo, Velvet, ABySS, Meraculous and IDBA-UD. Our analysis shows that 50X is the optimum read depth for assembling these genomes using all assemblers except Meraculous which requires 100X read depth. Moreover, our analysis shows that de novo assembly from 50X read data requires only 6–40 GB RAM depending on the genome size and assembly algorithm used. We believe that this information can be extremely valuable for researchers in designing experiments and multiplexing which will enable optimum utilization of sequencing as well as analysis resources. 相似文献

3.

Stacking up RADSeq assembly programs: From complete hit to completely abysmal

Annarita Marrano Alice E. Palmer Brook T. Moyers 《Molecular ecology resources》2020,20(2):357-359

Decreasing sequencing costs have driven a rapid expansion of novel genotyping methods. One of these methods is the exploitation of restriction enzyme cut sites to generate genome‐wide but reduced representation sequencing libraries (RRLs), alternatively termed genotyping by sequencing or restriction‐site associated DNA sequencing. Without a reference genome, the resulting short sequence reads must be assembled de novo. There are many possible assembly programs, most not explicitly developed for RRL data, and we know little of their effectiveness. In this issue of Molecular Ecology Resources, LaCava et al. (2020) systematically evaluate six commonly used programs and two commonly varied parameters for complete and accurate assembly of RRLs, using simulated double digests of Homo sapiens and Arabidopsis thaliana genomes with varied mutation rates and types. The authors find substantial variation in performance across assembly programs. The most consistently high‐performing assembler is infrequently used in their literature survey (CD‐HIT; Li and Godzik, 2006), while several others fail to produce complete, accurate assemblies under many conditions. LaCava et al. additionally recommend best practices in parameter choice and evaluation of future assembly programs—advice that molecular ecologists working to assemble sequences of all kinds should take to heart. 相似文献

4.

Chromosome‐level hybrid de novo genome assemblies as an attainable option for nonmodel insects

Coline C. Jaworski Carson W. Allan Luciano M. Matzkin 《Molecular ecology resources》2020,20(5):1277-1293

The emergence of third‐generation sequencing (3GS; long‐reads) is bringing closer the goal of chromosome‐size fragments in de novo genome assemblies. This allows the exploration of new and broader questions on genome evolution for a number of nonmodel organisms. However, long‐read technologies result in higher sequencing error rates and therefore impose an elevated cost of sufficient coverage to achieve high enough quality. In this context, hybrid assemblies, combining short‐reads and long‐reads, provide an alternative efficient and cost‐effective approach to generate de novo, chromosome‐level genome assemblies. The array of available software programs for hybrid genome assembly, sequence correction and manipulation are constantly being expanded and improved. This makes it difficult for nonexperts to find efficient, fast and tractable computational solutions for genome assembly, especially in the case of nonmodel organisms lacking a reference genome or one from a closely related species. In this study, we review and test the most recent pipelines for hybrid assemblies, comparing the model organism Drosophila melanogaster to a nonmodel cactophilic Drosophila, D. mojavensis. We show that it is possible to achieve excellent contiguity on this nonmodel organism using the dbg2olc pipeline. 相似文献

5.

Evaluating de Bruijn Graph Assemblers on 454 Transcriptomic Data

Xianwen Ren Tao Liu Jie Dong Lilian Sun Jian Yang Yafang Zhu Qi Jin 《PloS one》2012,7(12)

相似文献

6.

Assembly algorithms for next-generation sequencing data

Jason R. Miller Sergey Koren Granger Sutton 《Genomics》2010,95(6):315-327

相似文献

7.

Lessons from assembling UCEs: A comparison of common methods and the case of Clavinomia (Halictidae)

Silas Bossert Alain Pauly Bryan N. Danforth Michael C. Orr Elizabeth A. Murray 《Molecular ecology resources》2024,24(3):e13925

Sequence data assembly is a foundational step in high-throughput sequencing, with untold consequences for downstream analyses. Despite this, few studies have interrogated the many methods for assembling phylogenomic UCE data for their comparative efficacy, or for how outputs may be impacted. We study this by comparing the most commonly used assembly methods for UCEs in the under-studied bee lineage Nomiinae and a representative sampling of relatives. Data for 63 UCE-only and 75 mixed taxa were assembled with five methods, including ABySS, HybPiper, SPAdes, Trinity and Velvet, and then benchmarked for their relative performance in terms of locus capture parameters and phylogenetic reconstruction. Unexpectedly, Trinity and Velvet trailed the other methods in terms of locus capture and DNA matrix density, whereas SPAdes performed favourably in most assessed metrics. In comparison with SPAdes, the guided-assembly approach HybPiper generally recovered the highest quality loci but in lower numbers. Based on our results, we formally move Clavinomia to Dieunomiini and render Epinomia once more a subgenus of Dieunomia. We strongly advise that future studies more closely examine the influence of assembly approach on their results, or, minimally, use better-performing assembly methods such as SPAdes or HybPiper. In this way, we can move forward with phylogenomic studies in a more standardized, comparable manner. 相似文献

8.

Comparing de novo and reference-based transcriptome assembly strategies by applying them to the blood-sucking bug Rhodnius prolixus

《Insect biochemistry and molecular biology》2016

相似文献

9.

BioNano genome mapping of individual chromosomes supports physical mapping and sequence assembly in complex plant genomes

下载免费PDF全文

Helena Staňková Alex R. Hastie Saki Chan Jan Vrána Zuzana Tulpová Marie Kubaláková Paul Visendi Satomi Hayashi Mingcheng Luo Jacqueline Batley David Edwards Jaroslav Doležel Hana Šimková 《Plant biotechnology journal》2016,14(7):1523-1531

The assembly of a reference genome sequence of bread wheat is challenging due to its specific features such as the genome size of 17 Gbp, polyploid nature and prevalence of repetitive sequences. BAC‐by‐BAC sequencing based on chromosomal physical maps, adopted by the International Wheat Genome Sequencing Consortium as the key strategy, reduces problems caused by the genome complexity and polyploidy, but the repeat content still hampers the sequence assembly. Availability of a high‐resolution genomic map to guide sequence scaffolding and validate physical map and sequence assemblies would be highly beneficial to obtaining an accurate and complete genome sequence. Here, we chose the short arm of chromosome 7D (7DS) as a model to demonstrate for the first time that it is possible to couple chromosome flow sorting with genome mapping in nanochannel arrays and create a de novo genome map of a wheat chromosome. We constructed a high‐resolution chromosome map composed of 371 contigs with an N50 of 1.3 Mb. Long DNA molecules achieved by our approach facilitated chromosome‐scale analysis of repetitive sequences and revealed a ~800‐kb array of tandem repeats intractable to current DNA sequencing technologies. Anchoring 7DS sequence assemblies obtained by clone‐by‐clone sequencing to the 7DS genome map provided a valuable tool to improve the BAC‐contig physical map and validate sequence assembly on a chromosome‐arm scale. Our results indicate that creating genome maps for the whole wheat genome in a chromosome‐by‐chromosome manner is feasible and that they will be an affordable tool to support the production of improved pseudomolecules. 相似文献

10.

RADProc: A computationally efficient de novo locus assembler for population studies using RADseq data

Praveen Nadukkalam Ravindran Paul Bentzen Ian R. Bradbury Robert G. Beiko 《Molecular ecology resources》2019,19(1):272-282

Restriction site‐associated DNA sequencing (RADseq) is a powerful tool for genotyping of individuals, but the identification of loci and assignment of sequence reads is a crucial and often challenging step. The optimal parameter settings for a given de novo RADseq assembly vary between data sets and can be difficult and computationally expensive to determine. Here, we introduce RADProc, a software package that uses a graph data structure to represent all sequence reads and their similarity relationships. Storing sequence–comparison results in a graph eliminates unnecessary and redundant sequence similarity calculations. De novo locus formation for a given parameter set can be performed on the precomputed graph, making parameter sweeps far more efficient. RADProc also uses a clustering approach for faster nucleotide‐distance calculation. The performance of RADProc compares favourably with that of the widely used Stacks software. The run‐time comparisons between RADProc and Stacks for 32 different parameter settings using 20 green‐crab (Carcinus maenas) samples showed that RADProc took as little as 2 hr 40 min compared to 78 hr by Stacks, while 16 brown trout (Salmo trutta L.) samples were processed by RADProc and Stacks in 23 and 263 hr, respectively. Comparisons of the de novo loci formed, and catalog built using both the methods demonstrate that the improvement in processing speeds achieved by RADProc does not affect much the actual loci formed and the results of downstream analyses based on those loci. 相似文献

11.

Restriction site‐associated DNA sequencing,genotyping error estimation and de novo assembly optimization for population genetic inference

A. Mastretta‐Yanes N. Arrigo N. Alvarez T. H. Jorgensen D. Piñero B. C. Emerson 《Molecular ecology resources》2015,15(1):28-41

Restriction site‐associated DNA sequencing (RADseq) provides researchers with the ability to record genetic polymorphism across thousands of loci for nonmodel organisms, potentially revolutionizing the field of molecular ecology. However, as with other genotyping methods, RADseq is prone to a number of sources of error that may have consequential effects for population genetic inferences, and these have received only limited attention in terms of the estimation and reporting of genotyping error rates. Here we use individual sample replicates, under the expectation of identical genotypes, to quantify genotyping error in the absence of a reference genome. We then use sample replicates to (i) optimize de novo assembly parameters within the program Stacks, by minimizing error and maximizing the retrieval of informative loci; and (ii) quantify error rates for loci, alleles and single‐nucleotide polymorphisms. As an empirical example, we use a double‐digest RAD data set of a nonmodel plant species, Berberis alpina, collected from high‐altitude mountains in Mexico. 相似文献

12.

A chromosomal genomics approach to assess and validate the desi and kabuli draft chickpea genome assemblies

Pradeep Ruperao Chon‐Kit Kenneth Chan Sarwar Azam Miroslava Karafiátová Satomi Hayashi Jana Čížková Rachit K. Saxena Hana Šimková Chi Song Jan Vrána Annapurna Chitikineni Paul Visendi Pooran M. Gaur Teresa Millán Karam B. Singh Bunyamin Taran Jun Wang Jacqueline Batley Jaroslav Doležel Rajeev K. Varshney David Edwards 《Plant biotechnology journal》2014,12(6):778-786

With the expansion of next‐generation sequencing technology and advanced bioinformatics, there has been a rapid growth of genome sequencing projects. However, while this technology enables the rapid and cost‐effective assembly of draft genomes, the quality of these assemblies usually falls short of gold standard genome assemblies produced using the more traditional BAC by BAC and Sanger sequencing approaches. Assembly validation is often performed by the physical anchoring of genetically mapped markers, but this is prone to errors and the resolution is usually low, especially towards centromeric regions where recombination is limited. New approaches are required to validate reference genome assemblies. The ability to isolate individual chromosomes combined with next‐generation sequencing permits the validation of genome assemblies at the chromosome level. We demonstrate this approach by the assessment of the recently published chickpea kabuli and desi genomes. While previous genetic analysis suggests that these genomes should be very similar, a comparison of their chromosome sizes and published assemblies highlights significant differences. Our chromosomal genomics analysis highlights short defined regions that appear to have been misassembled in the kabuli genome and identifies large‐scale misassembly in the draft desi genome. The integration of chromosomal genomics tools within genome sequencing projects has the potential to significantly improve the construction and validation of genome assemblies. The approach could be applied both for new genome assemblies as well as published assemblies, and complements currently applied genome assembly strategies. 相似文献

13.

Two Low Coverage Bird Genomes and a Comparison of Reference-Guided versus De Novo Genome Assemblies

Daren C. Card Drew R. Schield Jacobo Reyes-Velasco Matthew K. Fujita Audra L. Andrew Sara J. Oyler-McCance Jennifer A. Fike Diana F. Tomback Robert P. Ruggiero Todd A. Castoe 《PloS one》2014,9(9)

As a greater number and diversity of high-quality vertebrate reference genomes become available, it is increasingly feasible to use these references to guide new draft assemblies for related species. Reference-guided assembly approaches may substantially increase the contiguity and completeness of a new genome using only low levels of genome coverage that might otherwise be insufficient for de novo genome assembly. We used low-coverage (∼3.5–5.5x) Illumina paired-end sequencing to assemble draft genomes of two bird species (the Gunnison Sage-Grouse, Centrocercus minimus, and the Clark''s Nutcracker, Nucifraga columbiana). We used these data to estimate de novo genome assemblies and reference-guided assemblies, and compared the information content and completeness of these assemblies by comparing CEGMA gene set representation, repeat element content, simple sequence repeat content, and GC isochore structure among assemblies. Our results demonstrate that even lower-coverage genome sequencing projects are capable of producing informative and useful genomic resources, particularly through the use of reference-guided assemblies. 相似文献

14.

Selecting Superior De Novo Transcriptome Assemblies: Lessons Learned by Leveraging the Best Plant Genome

Loren A. Honaas Eric K. Wafula Norman J. Wickett Joshua P. Der Yeting Zhang Patrick P. Edger Naomi S. Altman J. Chris Pires James H. Leebens-Mack Claude W. dePamphilis 《PloS one》2016,11(1)

相似文献

15.

An Integrated Pipeline for de Novo Assembly of Microbial Genomes

Andrew Tritt Jonathan A. Eisen Marc T. Facciotti Aaron E. Darling 《PloS one》2012,7(9)

Remarkable advances in DNA sequencing technology have created a need for de novo genome assembly methods tailored to work with the new sequencing data types. Many such methods have been published in recent years, but assembling raw sequence data to obtain a draft genome has remained a complex, multi-step process, involving several stages of sequence data cleaning, error correction, assembly, and quality control. Successful application of these steps usually requires intimate knowledge of a diverse set of algorithms and software. We present an assembly pipeline called A5 (Andrew And Aaron''s Awesome Assembly pipeline) that simplifies the entire genome assembly process by automating these stages, by integrating several previously published algorithms with new algorithms for quality control and automated assembly parameter selection. We demonstrate that A5 can produce assemblies of quality comparable to a leading assembly algorithm, SOAPdenovo, without any prior knowledge of the particular genome being assembled and without the extensive parameter tuning required by the other assembly algorithm. In particular, the assemblies produced by A5 exhibit 50% or more reduction in broken protein coding sequences relative to SOAPdenovo assemblies. The A5 pipeline can also assemble Illumina sequence data from libraries constructed by the Nextera (transposon-catalyzed) protocol, which have markedly different characteristics to mechanically sheared libraries. Finally, A5 has modest compute requirements, and can assemble a typical bacterial genome on current desktop or laptop computer hardware in under two hours, depending on depth of coverage. 相似文献

16.

An improved genome of the model marine alga Ostreococcus tauri unfolds by assessing Illumina de novo assemblies

Romain Blanc-Mathieu Bram Verhelst Evelyne Derelle Stephane Rombauts Fran?ois-Yves Bouget Isabelle Carré Annie Chateau Adam Eyre-Walker Nigel Grimsley Hervé Moreau Benoit Piégu Eric Rivals Wendy Schackwitz Yves Van de Peer Gwena?l Piganeau 《BMC genomics》2014,15(1)

Background

Cost effective next generation sequencing technologies now enable the production of genomic datasets for many novel planktonic eukaryotes, representing an understudied reservoir of genetic diversity. O. tauri is the smallest free-living photosynthetic eukaryote known to date, a coccoid green alga that was first isolated in 1995 in a lagoon by the Mediterranean sea. Its simple features, ease of culture and the sequencing of its 13 Mb haploid nuclear genome have promoted this microalga as a new model organism for cell biology. Here, we investigated the quality of genome assemblies of Illumina GAIIx 75 bp paired-end reads from Ostreococcus tauri, thereby also improving the existing assembly and showing the genome to be stably maintained in culture.

Results

The 3 assemblers used, ABySS, CLCBio and Velvet, produced 95% complete genomes in 1402 to 2080 scaffolds with a very low rate of misassembly. Reciprocally, these assemblies improved the original genome assembly by filling in 930 gaps. Combined with additional analysis of raw reads and PCR sequencing effort, 1194 gaps have been solved in total adding up to 460 kb of sequence. Mapping of RNAseq Illumina data on this updated genome led to a twofold reduction in the proportion of multi-exon protein coding genes, representing 19% of the total 7699 protein coding genes. The comparison of the DNA extracted in 2001 and 2009 revealed the fixation of 8 single nucleotide substitutions and 2 deletions during the approximately 6000 generations in the lab. The deletions either knocked out or truncated two predicted transmembrane proteins, including a glutamate-receptor like gene.

Conclusion

High coverage (>80 fold) paired-end Illumina sequencing enables a high quality 95% complete genome assembly of a compact ~13 Mb haploid eukaryote. This genome sequence has remained stable for 6000 generations of lab culture.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-1103) contains supplementary material, which is available to authorized users. 相似文献

17.

真菌基因组de novo测序组装的方法与实践

李丽翠曲积彬曾昭清 Tom Hsiang 余知和《基因组学与应用生物学》2020,39(1):173-180

真菌基因组较其他真核生物基因组结构简单,长度短,易于测序、组装与注释,因此真菌基因组是研究真核生物基因组的模型。为研究真菌基因组组装策略,本研究基于Illumina HiSeq测序平台对烟曲霉菌株An16007基因组测序,分别使用5种de novo组装软件ABySS、SOAP-denovo、Velvet、MaSuRCA和IDBA-UD组装基因组,然后通过Augustus软件进行基因预测,BUSCO软件评估组装结果。研究发现,5种组装软件对基因组组装结果不同,ABySS组装的基因组较其他4种组装软件具有更高的完整性和准确性,且预测的基因数量较高,因此,ABySS更适合本研究基因组的组装。本研究提供了真菌de novo测序、组装及组装质量评估的技术流程,为基因组<100 Mb的真菌或其他生物基因组的研究提供参考。相似文献

18.

A NGS approach to the encrusting Mediterranean sponge Crella elegans (Porifera,Demospongiae, Poecilosclerida): transcriptome sequencing,characterization and overview of the gene expression along three life cycle stages

A. R. Pérez‐Porro D. Navarro‐Gómez M. J. Uriz G. Giribet 《Molecular ecology resources》2013,13(3):494-509

相似文献

19.

Anchoring and ordering NGS contig assemblies by population sequencing (POPSEQ)

Martin Mascher Gary J. Muehlbauer Daniel S. Rokhsar Jarrod Chapman Jeremy Schmutz Kerrie Barry María Muñoz‐Amatriaín Timothy J. Close Roger P. Wise Alan H. Schulman Axel Himmelbach Klaus F.X. Mayer Uwe Scholz Jesse A. Poland Nils Stein Robbie Waugh 《The Plant journal : for cell and molecular biology》2013,76(4):718-727

Next‐generation whole‐genome shotgun assemblies of complex genomes are highly useful, but fail to link nearby sequence contigs with each other or provide a linear order of contigs along individual chromosomes. Here, we introduce a strategy based on sequencing progeny of a segregating population that allows de novo production of a genetically anchored linear assembly of the gene space of an organism. We demonstrate the power of the approach by reconstructing the chromosomal organization of the gene space of barley, a large, complex and highly repetitive 5.1 Gb genome. We evaluate the robustness of the new assembly by comparison to a recently released physical and genetic framework of the barley genome, and to various genetically ordered sequence‐based genotypic datasets. The method is independent of the need for any prior sequence resources, and will enable rapid and cost‐efficient establishment of powerful genomic information for many species. 相似文献

20.

Pebble and Rock Band: Heuristic Resolution of Repeats and Scaffolding in the Velvet Short-Read de Novo Assembler

Daniel R. Zerbino Gayle K. McEwen Elliott H. Margulies Ewan Birney 《PloS one》2009,4(12)

Background

Despite the short length of their reads, micro-read sequencing technologies have shown their usefulness for de novo sequencing. However, especially in eukaryotic genomes, complex repeat patterns are an obstacle to large assemblies.

Principal Findings

We present a novel heuristic algorithm, Pebble, which uses paired-end read information to resolve repeats and scaffold contigs to produce large-scale assemblies. In simulations, we can achieve weighted median scaffold lengths (N50) of above 1 Mbp in Bacteria and above 100 kbp in more complex organisms. Using real datasets we obtained a 96 kbp N50 in Pseudomonas syringae and a unique 147 kbp scaffold of a ferret BAC clone. We also present an efficient algorithm called Rock Band for the resolution of repeats in the case of mixed length assemblies, where different sequencing platforms are combined to obtain a cost-effective assembly.

Conclusions

These algorithms extend the utility of short read only assemblies into large complex genomes. They have been implemented and made available within the open-source Velvet short-read de novo assembler. 相似文献