1.

Evaluating de Bruijn Graph Assemblers on 454 Transcriptomic Data

Xianwen Ren Tao Liu Jie Dong Lilian Sun Jian Yang Yafang Zhu Qi Jin 《PloS one》2012,7(12)

2.

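ZHAO Lei Zachary LARSON-RABIN CHEN Si-Yun GUO Zhen-Hua 《植物分类与资源学报》 (Plant Diversity and Resources) 2012,34(5):487-501

Using RNA-Seq short-read data from the high-throughput Illumina sequencing platform for F1-generation fruit flies and cultivated rice, obtained from public databases, we compared eight de novo transcriptome assemblers (ABySS, Velvet, SOAPdenovo, Oases, Trinity, Multiple-k, T-IDBA and Trans-ABySS). Of the two classes of software, those based on a single k-mer and those based on multiple k-mers, Trinity and Trans-ABySS respectively showed the best assembly performance, while the other programs performed similarly to one another. We also found that multiple k-mer methods assemble more total bases than single k-mer methods, but even with the best multiple k-mer assembler the quality of the resulting data was lower than researchers would expect. We therefore propose the "ETM" optimization method, which incorporates the multiple k-mer approach into Trinity so that it combines the best assembly performance with the advantages of multiple k-mers; test results show that the method offers some improvement. Our results give users a basis for choosing suitable software and are of significance for advancing transcriptome research based on high-throughput Illumina sequencing.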

3.

Comparing De Novo Transcriptome Assemblers Using Illumina RNA-Seq Reads

ZHAO Lei Zachary LARSON-RABIN CHEN Si-Yun GUO Zhen-Hua 《Plant Diversity》2012,34(5):487-501

4.

The Impacts of Read Length and Transcriptome Complexity for De Novo Assembly: A Simulation Study

Zheng Chang Zhenjia Wang Guojun Li 《PloS one》2014,9(4)

5.

Selecting Superior De Novo Transcriptome Assemblies: Lessons Learned by Leveraging the Best Plant Genome

Loren A. Honaas Eric K. Wafula Norman J. Wickett Joshua P. Der Yeting Zhang Patrick P. Edger Naomi S. Altman J. Chris Pires James H. Leebens-Mack Claude W. dePamphilis 《PloS one》2016,11(1)

6.

MetaVelvet-SL: an extension of the Velvet assembler to a de novo metagenomic assembler utilizing supervised learning

Afiahayati Kengo Sato Yasubumi Sakakibara 《DNA research》2015,22(1):69-77

The assembly of multiple genomes from mixed sequence reads is a bottleneck in metagenomic analysis. A single-genome assembly program (assembler) is not capable of resolving metagenome sequences, so assemblers designed specifically for metagenomics have been developed. MetaVelvet is an extension of the single-genome assembler Velvet. It has been proved to generate assemblies with higher N50 scores and higher quality than single-genome assemblers such as Velvet and SOAPdenovo when applied to metagenomic sequence reads and is frequently used in this research community. One important open problem for MetaVelvet is its low accuracy and sensitivity in detecting chimeric nodes in the assembly (de Bruijn) graph, which prevents the generation of longer contigs and scaffolds. We have tackled this problem of classifying chimeric nodes using supervised machine learning to significantly improve the performance of MetaVelvet and developed a new tool, called MetaVelvet-SL. A Support Vector Machine is used for learning the classification model based on 94 features extracted from candidate nodes. In extensive experiments, MetaVelvet-SL outperformed the original MetaVelvet and other state-of-the-art metagenomic assemblers, IDBA-UD, Ray Meta and Omega, to reconstruct accurate longer assemblies with higher N50 scores for both simulated data sets and real data sets of human gut microbial sequences.
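The N50 score that this abstract uses as its contiguity measure can be illustrated with a minimal Python sketch. It assumes the common definition (the contig length L such that contigs of length at least L contain at least half of all assembled bases); the function name `n50` and the example lengths are hypothetical and are not taken from the paper.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (common definition, assumed)."""
    if not contig_lengths:
        raise ValueError("need at least one contig")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest until half of all bases are covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths, for illustration only.
example_lengths = [80, 70, 50, 40, 30, 20]
print(n50(example_lengths))
```

Because N50 depends only on the length distribution, a higher value says nothing by itself about correctness, which is why the abstract reports accuracy alongside it.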

7.

Optimal assembly strategies of transcriptome related to ploidies of eukaryotic organisms

Bin He Shirong Zhao Yuehong Chen Qinghua Cao Changhe Wei Xiaojie Cheng Yizheng Zhang 《BMC genomics》2015,16(1)

8.

Comparative de novo transcriptome analysis and metabolic pathway studies of Citrus paradisi flavedo from naive stage to ripened stage

Maulik Patel Toral Manvar Sachin Apurwa Arpita Ghosh Tanushree Tiwari Surendra K. Chikara 《Molecular biology reports》2014,41(5):3071-3080

9.

Comparison of De Novo Transcriptome Assemblers and k-mer Strategies Using the Killifish, Fundulus heteroclitus

Satshil B. Rana Frank J. Zadlock IV Ziping Zhang Wyatt R. Murphy Carolyn S. Bentivegna 《PloS one》2016,11(4)

10.

MetaVelvet: an extension of Velvet assembler to de novo metagenome assembly from short sequence reads

Toshiaki Namiki Tsuyoshi Hachiya Hideaki Tanaka Yasubumi Sakakibara 《Nucleic acids research》2012,40(20):e155

An important step in ‘metagenomics’ analysis is the assembly of multiple genomes from mixed sequence reads of multiple species in a microbial community. Most conventional pipelines use a single-genome assembler with carefully optimized parameters. A limitation of a single-genome assembler for de novo metagenome assembly is that sequences of highly abundant species are likely misidentified as repeats in a single genome, resulting in a number of small fragmented scaffolds. We extended a single-genome assembler for short reads, known as ‘Velvet’, to metagenome assembly, which we called ‘MetaVelvet’, for mixed short reads of multiple species. Our fundamental concept was to first decompose a de Bruijn graph constructed from mixed short reads into individual sub-graphs, and second, to build scaffolds based on each decomposed de Bruijn sub-graph as an isolate species genome. We made use of two features, the coverage (abundance) difference and graph connectivity, for the decomposition of the de Bruijn graph. For simulated datasets, MetaVelvet succeeded in generating significantly higher N50 scores than any single-genome assemblers. MetaVelvet also reconstructed relatively low-coverage genome sequences as scaffolds. On real datasets of human gut microbial read data, MetaVelvet produced longer scaffolds and increased the number of predicted genes.
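The two features this abstract names, k-mer coverage and graph connectivity, both come from the de Bruijn graph built from the reads. The following Python sketch shows only that construction step under simplifying assumptions (error-free reads, a single strand, no reverse complements); it is not MetaVelvet's decomposition algorithm, and `build_de_bruijn` and the toy reads are hypothetical names and data.

```python
from collections import Counter, defaultdict


def build_de_bruijn(reads, k):
    """Build a toy de Bruijn graph: nodes are (k-1)-mers, edges are k-mers.

    Returns (edges, coverage): edges maps each prefix (k-1)-mer to the set of
    suffix (k-1)-mers it connects to; coverage counts occurrences of each k-mer.
    """
    coverage = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            coverage[read[i:i + k]] += 1
    edges = defaultdict(set)
    for kmer in coverage:
        edges[kmer[:-1]].add(kmer[1:])
    return edges, coverage


# Two overlapping toy reads, for illustration only.
edges, coverage = build_de_bruijn(["ACGTAC", "CGTACG"], k=3)
print(dict(edges))
print(coverage)
```

In a mixed-species sample, k-mers from an abundant genome would show systematically higher counts in `coverage` than those from a rare one, which is the kind of signal the abstract describes using to split the graph.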

11.

Paired de Bruijn graphs: a novel approach for incorporating mate pair information into genome assemblers

Medvedev P Pham S Chaisson M Tesler G Pevzner P 《Journal of computational biology》2011,18(11):1625-1634

The recent proliferation of next generation sequencing with short reads has enabled many new experimental opportunities but, at the same time, has raised formidable computational challenges in genome assembly. One of the key advances that has led to an improvement in contig lengths has been mate pairs, which facilitate the assembly of repeating regions. Mate pairs have been algorithmically incorporated into most next generation assemblers as various heuristic post-processing steps to correct the assembly graph or to link contigs into scaffolds. Such methods have allowed the identification of longer contigs than would be possible with single reads; however, they can still fail to resolve complex repeats. Thus, improved methods for incorporating mate pairs will have a strong effect on contig length in the future. Here, we introduce the paired de Bruijn graph, a generalization of the de Bruijn graph that incorporates mate pair information into the graph structure itself instead of analyzing mate pairs at a post-processing step. This graph has the potential to be used in place of the de Bruijn graph in any de Bruijn graph based assembler, maintaining all other assembly steps such as error-correction and repeat resolution. Through assembly results on simulated perfect data, we argue that this can effectively improve the contig sizes in assembly.
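The basic unit of such a graph is a pair of k-mers taken at a fixed separation, which can be sketched in Python as follows. This is a minimal illustration that assumes an exactly known gap of `gap` bases between the two k-mers (the "perfect data" setting the abstract mentions); the function name `paired_kmers` and its parameters are hypothetical and do not reproduce the paper's notation.

```python
def paired_kmers(sequence, k, gap):
    """List (left, right) k-mer pairs separated by exactly `gap` bases.

    Assumption: the distance between mates is known exactly, as in
    simulated perfect data; real insert sizes vary.
    """
    span = 2 * k + gap  # total footprint of one pair on the sequence
    pairs = []
    for i in range(len(sequence) - span + 1):
        left = sequence[i:i + k]
        right = sequence[i + k + gap:i + span]
        pairs.append((left, right))
    return pairs


# Toy sequence, for illustration only.
for left, right in paired_kmers("ACGTACGT", k=2, gap=1):
    print(left, right)
```

Two occurrences of the same k-mer in different repeat copies will usually be paired with different partners, so treating the pair as the label keeps them distinct in the graph.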

12.

Comparative study of de novo assembly and genome-guided assembly strategies for transcriptome reconstruction based on RNA-Seq

BingXin Lu ZhenBing Zeng TieLiu Shi 《Science China Life Sciences》2013,56(2):143-155

13.

An ensemble strategy that significantly improves de novo assembly of microbial genomes from metagenomic next-generation sequencing data

Xutao Deng Samia N. Naccache Terry Ng Scot Federman Linlin Li Charles Y. Chiu Eric L. Delwart 《Nucleic acids research》2015,43(7):e46

Next-generation sequencing (NGS) approaches rapidly produce millions to billions of short reads, which allow pathogen detection and discovery in human clinical, animal and environmental samples. A major limitation of sequence homology-based identification for highly divergent microorganisms is the short length of reads generated by most highly parallel sequencing technologies. Short reads require a high level of sequence similarities to annotated genes to confidently predict gene function or homology. Such recognition of highly divergent homologues can be improved by reference-free (de novo) assembly of short overlapping sequence reads into larger contigs. We describe an ensemble strategy that integrates the sequential use of various de Bruijn graph and overlap-layout-consensus assemblers with a novel partitioned sub-assembly approach. We also proposed new quality metrics that are suitable for evaluating metagenome de novo assembly. We demonstrate that this new ensemble strategy tested using in silico spike-in, clinical and environmental NGS datasets achieved significantly better contigs than current approaches.

14.

Construction of a Public CHO Cell Line Transcript Database Using Versatile Bioinformatics Analysis Pipelines

Oliver Rupp Jennifer Becker Karina Brinkrolf Christina Timmermann Nicole Borth Alfred Pühler Thomas Noll Alexander Goesmann 《PloS one》2014,9(1)

15.

Accuracy of de novo assembly of DNA sequences from double‐digest libraries varies substantially among software

Melanie E. F. LaCava Ellen O. Aikens Libby C. Megna Gregg Randolph Charley Hubbard C. Alex Buerkle 《Molecular ecology resources》2020,20(2):360-370

Advances in DNA sequencing have made it feasible to gather genomic data for non‐model organisms and large sets of individuals, often using methods for sequencing subsets of the genome. Several of these methods sequence DNA associated with endonuclease restriction sites (various RAD and GBS methods). For use in taxa without a reference genome, these methods rely on de novo assembly of fragments in the sequencing library. Many of the software options available for this application were originally developed for other assembly types and we do not know their accuracy for reduced representation libraries. To address this important knowledge gap, we simulated data from the Arabidopsis thaliana and Homo sapiens genomes and compared de novo assemblies by six software programs that are commonly used or promising for this purpose (ABySS, CD‐HIT, Stacks, Stacks2, Velvet and VSEARCH). We simulated different mutation rates and types of mutations, and then applied the six assemblers to the simulated data sets, varying assembly parameters. We found substantial variation in software performance across simulations and parameter settings. ABySS failed to recover any true genome fragments, and Velvet and VSEARCH performed poorly for most simulations. Stacks and Stacks2 produced accurate assemblies of simulations containing SNPs, but the addition of insertion and deletion mutations decreased their performance. CD‐HIT was the only assembler that consistently recovered a high proportion of true genome fragments. Here, we demonstrate the substantial difference in the accuracy of assemblies from different software programs and the importance of comparing assemblies that result from different parameter settings.

16.

Comparisons of De Novo Transcriptome Assemblers in Diploid and Polyploid Species Using Peanut (Arachis spp.) RNA-Seq Data

Ratan Chopra Gloria Burow Andrew Farmer Joann Mudge Charles E. Simpson Mark D. Burow 《PloS one》2014,9(12)

17.

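Methods and practice of de novo sequencing and assembly of fungal genomes

Li Licui Qu Jibin Zeng Zhaoqing Tom Hsiang Yu Zhihe 《基因组学与应用生物学》 (Genomics and Applied Biology) 2020,39(1):173-180

Fungal genomes are structurally simpler and shorter than those of other eukaryotes and are easy to sequence, assemble and annotate, which makes them a model for studying eukaryotic genomes. To investigate assembly strategies for fungal genomes, this study sequenced the genome of Aspergillus fumigatus strain An16007 on the Illumina HiSeq platform, assembled it with five de novo assemblers (ABySS, SOAPdenovo, Velvet, MaSuRCA and IDBA-UD), predicted genes with Augustus and evaluated the assemblies with BUSCO. The five assemblers produced different assemblies. The ABySS assembly was more complete and more accurate than those of the other four assemblers and yielded a higher number of predicted genes, so ABySS was the most suitable assembler for the genome in this study. The study provides a technical workflow for de novo sequencing, assembly and assembly quality assessment in fungi, and offers a reference for genome studies of fungi or other organisms with genomes <100 Mb.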

18.

De Novo Assembly of the Transcriptome of the Non-Model Plant Streptocarpus rexii Employing a Novel Heuristic to Recover Locus-Specific Transcript Clusters

Matteo Chiara David S. Horner Alberto Spada 《PloS one》2013,8(12)

19.

Inferring bona fide transfrags in RNA-Seq derived-transcriptome assemblies of non-model organisms

Stanley Kimbung Mbandi Uljana Hesse Peter van Heusden Alan Christoffels 《BMC bioinformatics》2015,16(1)

20.

Identification of Optimum Sequencing Depth Especially for De Novo Genome Assembly of Small Genomes Using Next Generation Sequencing Data

Aarti Desai Veer Singh Marwah Akshay Yadav Vineet Jha Kishor Dhaygude Ujwala Bangar Vivek Kulkarni Abhay Jere 《PloS one》2013,8(4)

Next Generation Sequencing (NGS) is a disruptive technology that has found widespread acceptance in the life sciences research community. The high throughput and low cost of sequencing has encouraged researchers to undertake ambitious genomic projects, especially in de novo genome sequencing. Currently, NGS systems generate sequence data as short reads and de novo genome assembly using these short reads is computationally very intensive. Due to lower cost of sequencing and higher throughput, NGS systems now provide the ability to sequence genomes at high depth. However, currently no report is available highlighting the impact of high sequence depth on genome assembly using real data sets and multiple assembly algorithms. Recently, some studies have evaluated the impact of sequence coverage, error rate and average read length on genome assembly using multiple assembly algorithms; however, these evaluations were performed using simulated datasets. One limitation of using simulated datasets is that variables such as error rates, read length and coverage which are known to impact genome assembly are carefully controlled. Hence, this study was undertaken to identify the minimum depth of sequencing required for de novo assembly for different sized genomes using graph based assembly algorithms and real datasets. Illumina reads for E. coli (4.6 MB), S. kudriavzevii (11.18 MB) and C. elegans (100 MB) were assembled using SOAPdenovo, Velvet, ABySS, Meraculous and IDBA-UD. Our analysis shows that 50X is the optimum read depth for assembling these genomes using all assemblers except Meraculous which requires 100X read depth. Moreover, our analysis shows that de novo assembly from 50X read data requires only 6–40 GB RAM depending on the genome size and assembly algorithm used. We believe that this information can be extremely valuable for researchers in designing experiments and multiplexing which will enable optimum utilization of sequencing as well as analysis resources.
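To make a figure such as 50X concrete when planning a run, the usual average-depth relationship (depth = number of reads × read length / genome size) can be inverted to estimate how many reads are needed. The sketch below assumes that relationship and uniform read length; the function name `reads_for_depth` and the 100 bp read length are hypothetical choices, and only the 4.6 MB genome size and 50X depth come from the abstract.

```python
def reads_for_depth(genome_size_bp, depth, read_length_bp):
    """Estimate the number of reads needed to reach an average depth.

    Assumes depth = reads * read_length / genome_size and rounds up.
    """
    if genome_size_bp <= 0 or depth <= 0 or read_length_bp <= 0:
        raise ValueError("all arguments must be positive")
    total_bases = depth * genome_size_bp
    # Integer ceiling division avoids floating-point rounding issues.
    return -(-total_bases // read_length_bp)


# E. coli genome size and 50X depth from the abstract; 100 bp reads assumed.
print(reads_for_depth(4_600_000, 50, 100))
```

This is an average only: actual coverage varies along the genome, so the estimate is a lower bound for planning rather than a guarantee that every region reaches the target depth.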