
1.

De novo assembly of bacterial transcriptomes from RNA-seq data

Brian Tjaden 《Genome biology》2015,16(1)


2.

A new approach for annotation of transposable elements using small RNA mapping

Moaine El Baidouri Kyung Do Kim Brian Abernathy Siwaret Arikit Florian Maumus Olivier Panaud Blake C. Meyers Scott A. Jackson 《Nucleic acids research》2015,43(13):e84

Transposable elements (TEs) are mobile genomic DNA sequences found in most organisms. They so densely populate the genomes of many eukaryotic species that they are often the major constituents. With the rapid generation of many plant genome sequencing projects over the past few decades, there is an urgent need for improved TE annotation as a prerequisite for genome-wide studies. Analogous to the use of RNA-seq for gene annotation, we propose a new method for de novo TE annotation that uses as a guide 24 nt-siRNAs that are a part of TE silencing pathways. We use this new approach, called TASR (for Transposon Annotation using Small RNAs), for de novo annotation of TEs in Arabidopsis, rice and soybean and demonstrate that this strategy can be successfully applied for de novo TE annotation in plants. The executable Perl code is available for download from: http://tasr-pipeline.sourceforge.net/

3.

BinPacker: Packing-Based De Novo Transcriptome Assembly from RNA-seq Data

Juntao Liu Guojun Li Zheng Chang Ting Yu Bingqiang Liu Rick McMullen Pengyin Chen Xiuzhen Huang 《PLoS computational biology》2016,12(2)


4.

ChiloKey, an interactive identification tool for the geophilomorph centipedes of Europe (Chilopoda, Geophilomorpha)

Lucio Bonato Alessandro Minelli Massimo Lopresti Pierfilippo Cerretti 《ZooKeys》2014,(443):1-9

ChiloKey is a matrix-based, interactive key to all 179 species of Geophilomorpha (Chilopoda) recorded from Europe, including species of uncertain identity and those whose morphology is only partially known. The key is intended to assist in the identification of subadult and adult specimens, by means of microscopy and simple dissection techniques whenever necessary. The key is freely available through the web at: http://www.biologia.unipd.it/chilokey/ and at http://www.interactive-keys.eu/chilokey/.

5.

Compression of next-generation sequencing reads aided by highly efficient de novo assembly

Daniel C. Jones Walter L. Ruzzo Xinxia Peng Michael G. Katze 《Nucleic acids research》2012,40(22):e171

We present Quip, a lossless compression algorithm for next-generation sequencing data in the FASTQ and SAM/BAM formats. In addition to implementing reference-based compression, we have developed, to our knowledge, the first assembly-based compressor, using a novel de novo assembly algorithm. A probabilistic data structure is used to dramatically reduce the memory required by traditional de Bruijn graph assemblers, allowing millions of reads to be assembled very efficiently. Read sequences are then stored as positions within the assembled contigs. This is combined with statistical compression of read identifiers, quality scores, alignment information and sequences, effectively collapsing very large data sets to <15% of their original size with no loss of information. Availability: Quip is freely available under the 3-clause BSD license from http://cs.washington.edu/homes/dcjones/quip.
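The idea of storing read sequences as positions within assembled contigs can be illustrated with a small sketch. This is not Quip's implementation: the function names `encode_read` and `decode_read` and the tuple record layout are hypothetical, exact substring search stands in for the real alignment step, and the statistical compression stage is left out.

```python
from typing import List, Tuple, Union

# A record is either a reference into a contig or the raw read itself.
Record = Union[Tuple[str, int, int, int], Tuple[str, str]]


def encode_read(read: str, contigs: List[str]) -> Record:
    """Replace a read by (contig index, offset, length) if it occurs exactly
    in some contig; otherwise keep the raw sequence so nothing is lost."""
    for index, contig in enumerate(contigs):
        offset = contig.find(read)
        if offset != -1:
            return ("ref", index, offset, len(read))
    return ("raw", read)


def decode_read(record: Record, contigs: List[str]) -> str:
    """Recover the original read sequence from its record."""
    if record[0] == "ref":
        _, index, offset, length = record
        return contigs[index][offset:offset + length]
    return record[1]


contigs = ["ACGTTGCAAGGCT", "TTTTCCCCGGGG"]
reads = ["TTGCAAG", "CCCCGG", "NNNNACG"]
records = [encode_read(read, contigs) for read in reads]
restored = [decode_read(record, contigs) for record in records]
```

Reads that match a contig shrink to three small integers, while reads that do not match fall back to the raw sequence, which keeps the round trip lossless.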

6.

Identification of novel fusion genes in lung cancer using breakpoint assembly of transcriptome sequencing data

《Genome biology》2015,16(1)


7.

DIDA: Distributed Indexing Dispatched Alignment

Hamid Mohamadi Benjamin P Vandervalk Anthony Raymond Shaun D Jackman Justin Chu Clay P Breshears Inanc Birol 《PloS one》2015,10(4)

One essential application in bioinformatics that is affected by the high-throughput sequencing data deluge is the sequence alignment problem, where nucleotide or amino acid sequences are queried against targets to find regions of close similarity. When queries are too many and/or targets are too large, the alignment process becomes computationally challenging. This is usually addressed by preprocessing techniques, where the queries and/or targets are indexed for easy access while searching for matches. When the target is static, such as in an established reference genome, the cost of indexing is amortized by reusing the generated index. However, when the targets are non-static, such as contigs in the intermediate steps of a de novo assembly process, a new index must be computed for each run. To address such scalability problems, we present DIDA, a novel framework that distributes the indexing and alignment tasks into smaller subtasks over a cluster of compute nodes. It provides a workflow beyond the common practice of embarrassingly parallel implementations. DIDA is a cost-effective, scalable and modular framework for the sequence alignment problem in terms of memory usage and runtime. It can be employed in large-scale alignments to draft genomes and intermediate stages of de novo assembly runs. The DIDA source code, sample files and user manual are available through http://www.bcgsc.ca/platform/bioinfo/software/dida. The software is released under the British Columbia Cancer Agency License (BCCA), and is free for academic use.
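The general pattern of splitting a target set, indexing each part separately and merging the per-part hits can be sketched as follows. This is a single-process illustration only, not DIDA's code or workflow: all function names are hypothetical, the index is a plain k-mer dictionary, matching is exact, and the loop over partitions stands in for work that would run on separate compute nodes.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def partition_targets(targets: Dict[str, str], n_parts: int) -> List[Dict[str, str]]:
    """Spread target sequences round-robin over n_parts partitions."""
    parts: List[Dict[str, str]] = [dict() for _ in range(n_parts)]
    for i, (name, seq) in enumerate(sorted(targets.items())):
        parts[i % n_parts][name] = seq
    return parts


def build_index(part: Dict[str, str], k: int) -> Dict[str, List[Tuple[str, int]]]:
    """Map every k-mer in a partition to its (target name, position) list."""
    index: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for name, seq in part.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((name, pos))
    return index


def find_exact_hits(query: str, part: Dict[str, str], index, k: int) -> List[Tuple[str, int]]:
    """Seed with the query's first k-mer, then verify the full exact match."""
    return [(name, pos) for name, pos in index.get(query[:k], [])
            if part[name].startswith(query, pos)]


def align_over_partitions(query: str, targets: Dict[str, str],
                          n_parts: int, k: int) -> List[Tuple[str, int]]:
    """Index and search each partition independently, then merge the hits."""
    hits: List[Tuple[str, int]] = []
    for part in partition_targets(targets, n_parts):  # one iteration per "node"
        hits.extend(find_exact_hits(query, part, build_index(part, k), k))
    return sorted(hits)


targets = {"c1": "ACGTACGT", "c2": "TTACGTAA", "c3": "GGGGGG"}
hits = align_over_partitions("ACGTA", targets, n_parts=2, k=4)
```

Because each partition is indexed and searched independently, no step needs an index of the full target set in memory at once, which is the property that makes the work distributable.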

8.

Bridger: a new framework for de novo transcriptome assembly using RNA-seq data

Zheng Chang Guojun Li Juntao Liu Yu Zhang Cody Ashby Deli Liu Carole L Cramer Xiuzhen Huang 《Genome biology》2015,16(1)


9.

Sequencing and Characterization of Striped Venus Transcriptome Expand Resources for Clam Fishery Genetics

Alessandro Coppe Stefania Bortoluzzi Giulia Murari Ilaria Anna Maria Marino Lorenzo Zane Chiara Papetti 《PloS one》2012,7(9)


10.

SHRiMP: Accurate Mapping of Short Color-space Reads

Stephen M. Rumble Phil Lacroute Adrian V. Dalca Marc Fiume Arend Sidow Michael Brudno 《PLoS computational biology》2009,5(5)

The development of Next Generation Sequencing technologies, capable of sequencing hundreds of millions of short reads (25–70 bp each) in a single run, is opening the door to population genomic studies of non-model species. In this paper we present SHRiMP - the SHort Read Mapping Package: a set of algorithms and methods to map short reads to a genome, even in the presence of a large amount of polymorphism. Our method is based upon a fast read mapping technique, separate thorough alignment methods for regular letter-space as well as AB SOLiD (color-space) reads, and a statistical model for false positive hits. We use SHRiMP to map reads from a newly sequenced Ciona savignyi individual to the reference genome. We demonstrate that SHRiMP can accurately map reads to this highly polymorphic genome, while confirming high heterozygosity of C. savignyi in this second individual. SHRiMP is freely available at http://compbio.cs.toronto.edu/shrimp.

11.

A multi-split mapping algorithm for circular RNA, splicing, trans-splicing and fusion detection

Steve Hoffmann Christian Otto Gero Doose Andrea Tanzer David Langenberger Sabina Christ Manfred Kunz Lesca M Holdt Daniel Teupser Jörg Hackermüller Peter F Stadler 《Genome biology》2014,15(2):R34

Numerous high-throughput sequencing studies have focused on detecting conventionally spliced mRNAs in RNA-seq data. However, non-standard RNAs arising through gene fusion, circularization or trans-splicing are often neglected. We introduce a novel, unbiased algorithm to detect splice junctions from single-end cDNA sequences. In contrast to other methods, our approach accommodates multi-junction structures. Our method compares favorably with competing tools for conventionally spliced mRNAs and, with a gain of up to 40% of recall, systematically outperforms them on reads with multiple splits, trans-splicing and circular products. The algorithm is integrated into our mapping tool segemehl (http://www.bioinf.uni-leipzig.de/Software/segemehl/).

12.

Corset: enabling differential gene expression analysis for de novo assembled transcriptomes

Nadia M Davidson Alicia Oshlack 《Genome biology》2014,15(7)


13.

Optimal assembly strategies of transcriptome related to ploidies of eukaryotic organisms

Bin He Shirong Zhao Yuehong Chen Qinghua Cao Changhe Wei Xiaojie Cheng Yizheng Zhang 《BMC genomics》2015,16(1)


14.

Evaluation of de novo transcriptome assemblies from RNA-Seq data

Bo Li Nathanael Fillmore Yongsheng Bai Mike Collins James A Thomson Ron Stewart Colin N Dewey 《Genome biology》2014,15(12)


15.

Probabilistic error correction for RNA sequencing

Hai-Son Le Marcel H. Schulz Brenna M. McCauley Veronica F. Hinman Ziv Bar-Joseph 《Nucleic acids research》2013,41(10):e109


16.

PANADA: Protein Association Network Annotation, Determination and Analysis

Alberto J. M. Martin Ian Walsh Tomás Di Domenico Ivan Mičetić Silvio C. E. Tosatto 《PloS one》2013,8(11)

Increasingly large numbers of proteins require methods for functional annotation. This is typically based on pairwise inference from the homology of either protein sequence or structure. Recently, similarity networks have been presented to leverage both the ability to visualize relationships between proteins and assess the transferability of functional inference. Here we present PANADA, a novel toolkit for the visualization and analysis of protein similarity networks in Cytoscape. Networks can be constructed based on pairwise sequence or structural alignments either on a set of proteins or, alternatively, by database search from a single sequence. The PANADA web server, a downloadable executable, examples and extensive help files are available at: http://protein.bio.unipd.it/panada/.

17.

FastUniq: A Fast De Novo Duplicates Removal Tool for Paired Short Reads

Haibin Xu Xiang Luo Jun Qian Xiaohui Pang Jingyuan Song Guangrui Qian Jinhui Chen Shilin Chen 《PloS one》2012,7(12)

The presence of duplicates introduced by PCR amplification is a major issue in paired short reads from next-generation sequencing platforms. These duplicates might have a serious impact on research applications, such as scaffolding in whole-genome sequencing and discovering large-scale genome variations, and are usually removed. We present FastUniq as a fast de novo tool for removal of duplicates in paired short reads. FastUniq identifies duplicates by comparing sequences between read pairs and does not require complete genome sequences as prerequisites. FastUniq is capable of simultaneously handling reads with different lengths and results in highly efficient running time, which increases linearly at an average speed of 87 million reads per 10 minutes. FastUniq is freely available at http://sourceforge.net/projects/fastuniq/.
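Reference-free duplicate removal by comparing the sequences of read pairs can be sketched in a few lines. This is a minimal illustration under stated assumptions, not FastUniq's algorithm: the function name `remove_duplicate_pairs` is hypothetical, a pair counts as a duplicate only if both mates match an earlier pair exactly, and the handling of reads of different lengths mentioned in the abstract is not modelled.

```python
from typing import Iterable, List, Tuple

ReadPair = Tuple[str, str]


def remove_duplicate_pairs(pairs: Iterable[ReadPair]) -> List[ReadPair]:
    """Keep the first occurrence of each (mate 1, mate 2) sequence pair,
    preserving input order. No reference genome is consulted."""
    seen = set()
    unique: List[ReadPair] = []
    for mate1, mate2 in pairs:
        key = (mate1, mate2)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


pairs = [
    ("ACGT", "TTGA"),
    ("ACGT", "TTGA"),  # exact duplicate of the first pair
    ("ACGT", "GGCC"),  # same mate 1 but different mate 2: kept
    ("CCAA", "TTGA"),
]
unique_pairs = remove_duplicate_pairs(pairs)
```

Keying on both mates together matters: two pairs that share only one mate come from different fragments and are both retained.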

18.

A Scalable and Accurate Targeted Gene Assembly Tool (SAT-Assembler) for Next-Generation Sequencing Data

Yuan Zhang Yanni Sun James R. Cole 《PLoS computational biology》2014,10(8)

Gene assembly, which recovers gene segments from short reads, is an important step in functional analysis of next-generation sequencing data. Lacking quality reference genomes, de novo assembly is commonly used for RNA-Seq data of non-model organisms and metagenomic data. However, heterogeneous sequence coverage caused by heterogeneous expression or species abundance, similarity between isoforms or homologous genes, and large data size all pose challenges to de novo assembly. As a result, existing assembly tools tend to output fragmented contigs or chimeric contigs, or have high memory footprint. In this work, we introduce a targeted gene assembly program SAT-Assembler, which aims to recover gene families of particular interest to biologists. It addresses the above challenges by conducting family-specific homology search, homology-guided overlap graph construction, and careful graph traversal. It can be applied to both RNA-Seq and metagenomic data. Our experimental results on an Arabidopsis RNA-Seq data set and two metagenomic data sets show that SAT-Assembler has smaller memory usage, comparable or better gene coverage, and lower chimera rate for assembling a set of genes from one or multiple pathways compared with other assembly tools. Moreover, the family-specific design and rapid homology search allow SAT-Assembler to be naturally compatible with parallel computing platforms. The source code of SAT-Assembler is available at https://sourceforge.net/projects/sat-assembler/. The data sets and experimental settings can be found in supplementary material.

19.

PLEK: a tool for predicting long non-coding RNAs and messenger RNAs based on an improved k-mer scheme

Aimin Li Junying Zhang Zhongyin Zhou 《BMC bioinformatics》2014,15(1)


20.

Optimizing Information in Next-Generation-Sequencing (NGS) Reads for Improving De Novo Genome Assembly

Tsunglin Liu Cheng-Hung Tsai Wen-Bin Lee Jung-Hsien Chiang 《PloS one》2013,8(7)

Next-Generation-Sequencing is advantageous because of its much higher data throughput and much lower cost compared with the traditional Sanger method. However, NGS reads are shorter than Sanger reads, making de novo genome assembly very challenging. Because genome assembly is essential for all downstream biological studies, great efforts have been made to enhance the completeness of genome assembly, which requires the presence of long reads or long distance information. To improve de novo genome assembly, we develop a computational program, ARF-PE, to increase the length of Illumina reads. ARF-PE takes as input Illumina paired-end (PE) reads and recovers the original DNA fragments from whose two ends the paired reads were obtained. On the PE data of four bacteria, ARF-PE recovered >87% of the DNA fragments and achieved >98% of perfect DNA fragment recovery. Using Velvet, SOAPdenovo, Newbler, and CABOG, we evaluated the benefits of recovered DNA fragments to genome assembly. For all four bacteria, the recovered DNA fragments increased the assembly contiguity. For example, the N50 lengths of the P. brasiliensis contigs assembled by SOAPdenovo and Newbler increased from 80,524 bp to 166,573 bp and from 80,655 bp to 193,388 bp, respectively. ARF-PE also increased assembly accuracy in many cases. On the PE data of two fungi and a human chromosome, ARF-PE doubled and tripled the N50 length. However, the assembly accuracies dropped, but still remained >91%. In general, ARF-PE can increase both assembly contiguity and accuracy for bacterial genomes. For complex eukaryotic genomes, ARF-PE is promising because it raises assembly contiguity. But future error correction is needed for ARF-PE to also increase the assembly accuracy. ARF-PE is freely available at http://140.116.235.124/~tliu/arf-pe/.
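The N50 length used above as the contiguity measure follows a standard definition: it is the largest length L such that contigs of length at least L together cover at least half of the total assembly length. A minimal sketch of that computation is shown here; the function name `n50` and the toy contig lengths are illustrative and are not taken from the paper.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of the sum.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("N50 is undefined for an empty assembly")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        if 2 * running >= total:  # integer comparison avoids float rounding
            return length
    return lengths[-1]  # unreachable for positive lengths; kept for safety


toy_assembly = [2, 3, 4, 5, 6]      # total length 20, half is 10
toy_n50 = n50(toy_assembly)         # 6 + 5 = 11 reaches 10 at the contig of length 5
```

Merging short contigs into longer ones raises N50 even when the total assembled length is unchanged, which is why the measure is read as contiguity rather than completeness.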