期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

BinPacker: Packing-Based De Novo Transcriptome Assembly from RNA-seq Data

Juntao Liu Guojun Li Zheng Chang Ting Yu Bingqiang Liu Rick McMullen Pengyin Chen Xiuzhen Huang 《PLoS computational biology》2016,12(2)

相似文献

2.

PHYSICO2: an UNIX based standalone procedure for computation of physicochemical,window-dependent and substitution based evolutionary properties of protein sequences along with automated block preparation tool,version 2

Shyamashree Banerjee Parth Sarthi Sen Gupta Arnab Nayek Sunit Das Vishma Pratap Sur Pratyay Seth Rifat Nawaz Ul Islam Amal K Bandyopadhyay 《Bioinformation》2015,11(7):366-368

AvailabilityPHYSICO2: is freely available at http://sourceforge.net/projects/physico2/ along with its documentation at https://sourceforge.net/projects/physico2/files/Documentation.pdf/download for all users. 相似文献

3.

PLEK: a tool for predicting long non-coding RNAs and messenger RNAs based on an improved k-mer scheme

Aimin Li Junying Zhang Zhongyin Zhou 《BMC bioinformatics》2014,15(1)

相似文献

4.

Compression of FASTQ and SAM Format Sequencing Data

James K. Bonfield Matthew V. Mahoney 《PloS one》2013,8(3)

Storage and transmission of the data produced by modern DNA sequencing instruments has become a major concern, which prompted the Pistoia Alliance to pose the SequenceSqueeze contest for compression of FASTQ files. We present several compression entries from the competition, Fastqz and Samcomp/Fqzcomp, including the winning entry. These are compared against existing algorithms for both reference based compression (CRAM, Goby) and non-reference based compression (DSRC, BAM) and other recently published competition entries (Quip, SCALCE). The tools are shown to be the new Pareto frontier for FASTQ compression, offering state of the art ratios at affordable CPU costs. All programs are freely available on SourceForge. Fastqz: https://sourceforge.net/projects/fastqz/, fqzcomp: https://sourceforge.net/projects/fqzcomp/, and samcomp: https://sourceforge.net/projects/samcomp/. 相似文献

5.

AutoAssemblyD: a graphical user interface system for several genome assemblers

Adonney Allan de Oliveira Veras Pablo Henrique Caracciolo Gomes de Sá Vasco Azevedo Artur Silva Rommel Thiago Jucá Ramos 《Bioinformation》2013,9(16):840-841

Next-generation sequencing technologies have increased the amount of biological data generated. Thus, bioinformatics has become important because new methods and algorithms are necessary to manipulate and process such data. However, certain challenges have emerged, such as genome assembly using short reads and high-throughput platforms. In this context, several algorithms have been developed, such as Velvet, Abyss, Euler-SR, Mira, Edna, Maq, SHRiMP, Newbler, ALLPATHS, Bowtie and BWA. However, most such assemblers do not have a graphical interface, which makes their use difficult for users without computing experience given the complexity of the assembler syntax. Thus, to make the operation of such assemblers accessible to users without a computing background, we developed AutoAssemblyD, which is a graphical tool for genome assembly submission and remote management by multiple assemblers through XML templates.

Availability

AssemblyD is freely available at https://sourceforge.net/projects/autoassemblyd. It requires Sun jdk 6 or higher. 相似文献

6.

A Scalable and Accurate Targeted Gene Assembly Tool (SAT-Assembler) for Next-Generation Sequencing Data

Yuan Zhang Yanni Sun James R. Cole 《PLoS computational biology》2014,10(8)

Gene assembly, which recovers gene segments from short reads, is an important step in functional analysis of next-generation sequencing data. Lacking quality reference genomes, de novo assembly is commonly used for RNA-Seq data of non-model organisms and metagenomic data. However, heterogeneous sequence coverage caused by heterogeneous expression or species abundance, similarity between isoforms or homologous genes, and large data size all pose challenges to de novo assembly. As a result, existing assembly tools tend to output fragmented contigs or chimeric contigs, or have high memory footprint. In this work, we introduce a targeted gene assembly program SAT-Assembler, which aims to recover gene families of particular interest to biologists. It addresses the above challenges by conducting family-specific homology search, homology-guided overlap graph construction, and careful graph traversal. It can be applied to both RNA-Seq and metagenomic data. Our experimental results on an Arabidopsis RNA-Seq data set and two metagenomic data sets show that SAT-Assembler has smaller memory usage, comparable or better gene coverage, and lower chimera rate for assembling a set of genes from one or multiple pathways compared with other assembly tools. Moreover, the family-specific design and rapid homology search allow SAT-Assembler to be naturally compatible with parallel computing platforms. The source code of SAT-Assembler is available at https://sourceforge.net/projects/sat-assembler/. The data sets and experimental settings can be found in supplementary material. 相似文献

7.

Generation of Artificial FASTQ Files to Evaluate the Performance of Next-Generation Sequencing Pipelines

Matthew Frampton Richard Houlston 《PloS one》2012,7(11)

Pipelines for the analysis of Next-Generation Sequencing (NGS) data are generally composed of a set of different publicly available software, configured together in order to map short reads of a genome and call variants. The fidelity of pipelines is variable. We have developed ArtificialFastqGenerator, which takes a reference genome sequence as input and outputs artificial paired-end FASTQ files containing Phred quality scores. Since these artificial FASTQs are derived from the reference genome, it provides a gold-standard for read-alignment and variant-calling, thereby enabling the performance of any NGS pipeline to be evaluated. The user can customise DNA template/read length, the modelling of coverage based on GC content, whether to use real Phred base quality scores taken from existing FASTQ files, and whether to simulate sequencing errors. Detailed coverage and error summary statistics are outputted. Here we describe ArtificialFastqGenerator and illustrate its implementation in evaluating a typical bespoke NGS analysis pipeline under different experimental conditions. ArtificialFastqGenerator was released in January 2012. Source code, example files and binaries are freely available under the terms of the GNU General Public License v3.0. from https://sourceforge.net/projects/artfastqgen/. 相似文献

8.

An assembly and alignment-free method of phylogeny reconstruction from next-generation sequencing data

Huan Fan Anthony R. Ives Yann Surget-Groba Charles H. Cannon 《BMC genomics》2015,16(1)

Background

Next-generation sequencing technologies are rapidly generating whole-genome datasets for an increasing number of organisms. However, phylogenetic reconstruction of genomic data remains difficult because de novo assembly for non-model genomes and multi-genome alignment are challenging.

Results

To greatly simplify the analysis, we present an Assembly and Alignment-Free (AAF) method (https://sourceforge.net/projects/aaf-phylogeny) that constructs phylogenies directly from unassembled genome sequence data, bypassing both genome assembly and alignment. Using mathematical calculations, models of sequence evolution, and simulated sequencing of published genomes, we address both evolutionary and sampling issues caused by direct reconstruction, including homoplasy, sequencing errors, and incomplete sequencing coverage. From these results, we calculate the statistical properties of the pairwise distances between genomes, allowing us to optimize parameter selection and perform bootstrapping. As a test case with real data, we successfully reconstructed the phylogeny of 12 mammals using raw sequencing reads. We also applied AAF to 21 tropical tree genome datasets with low coverage to demonstrate its effectiveness on non-model organisms.

Conclusion

Our AAF method opens up phylogenomics for species without an appropriate reference genome or high sequence coverage, and rapidly creates a phylogenetic framework for further analysis of genome structure and diversity among non-model organisms.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1647-5) contains supplementary material, which is available to authorized users. 相似文献

9.

Compression of Large genomic datasets using COMRAD on Parallel Computing Platform

Christopher Leela Biji Manu K Madhu Vineetha Vishnu Satheesh Kumar K Vijayakumar Achuthsankar S Nair 《Bioinformation》2015,11(5):267-271

The big data storage is a challenge in a post genome era. Hence, there is a need for high performance computing solutions for managing large genomic data. Therefore, it is of interest to describe a parallel-computing approach using message-passing library for distributing the different compression stages in clusters. The genomic compression helps to reduce the on disk“foot print” of large data volumes of sequences. This supports the computational infrastructure for a more efficient archiving. The approach was shown to find utility in 21 Eukaryotic genomes using stratified sampling in this report. The method achieves an average of 6-fold disk space reduction with three times better compression time than COMRAD.

Availability

The source codes are written in C using message passing libraries and are available at https:// sourceforge.net/ projects/ comradmpi/files / COMRADMPI/ 相似文献

10.

A Computational Model for Predicting RNase H Domain of Retrovirus

Sijia Wu Xinman Zhang Jiuqiang Han 《PloS one》2016,11(8)

RNase H (RNH) is a pivotal domain in retrovirus to cleave the DNA-RNA hybrid for continuing retroviral replication. The crucial role indicates that RNH is a promising drug target for therapeutic intervention. However, annotated RNHs in UniProtKB database have still been insufficient for a good understanding of their statistical characteristics so far. In this work, a computational RNH model was proposed to annotate new putative RNHs (np-RNHs) in the retroviruses. It basically predicts RNH domains through recognizing their start and end sites separately with SVM method. The classification accuracy rates are 100%, 99.01% and 97.52% respectively corresponding to jack-knife, 10-fold cross-validation and 5-fold cross-validation test. Subsequently, this model discovered 14,033 np-RNHs after scanning sequences without RNH annotations. All these predicted np-RNHs and annotated RNHs were employed to analyze the length, hydrophobicity and evolutionary relationship of RNH domains. They are all related to retroviral genera, which validates the classification of retroviruses to a certain degree. In the end, a software tool was designed for the application of our prediction model. The software together with datasets involved in this paper can be available for free download at https://sourceforge.net/projects/rhtool/files/?source=navbar. 相似文献

11.

GMATo: A novel tool for the identification and analysis of microsatellites in large genomes

Xuewen Wang Peng Lu Zhaopeng Luo 《Bioinformation》2013,9(10):541-544

Simple Sequence Repeats (SSR), also called microsatellite, is very useful for genetic marker development and genome application. The increasing whole sequences of more and more large genomes provide sources for SSR mining in silico. However currently existing SSR mining tools can’t process large genomes efficiently and generate no or poor statistics. Genome-wide Microsatellite Analyzing Tool (GMATo) is a novel tool for SSR mining and statistics at genome aspects. It is faster and more accurate than existed tools SSR Locator and MISA. If a DNA sequence was too long, it was chunked to short segments at several Mb followed by motifs generation and searching using Perl powerful pattern match function. Matched loci data from each chunk were then merged to produce final SSR loci information. Only one input file is required which contains raw fasta DNA sequences and output files in tabular format list all SSR loci information and statistical distribution at four classifications. GMATo was programmed in Java and Perl with both graphic and command line interface, either executable alone in platform independent manner with full parameters control. Software GMATo is a powerful tool for complete SSR characterization in genomes at any size.

Availability

The soft GMATo is freely available at http://sourceforge.net/projects/gmato/files/?source=navbar or on contact 相似文献

12.

CStone: A de novo transcriptome assembler for short-read data that identifies non-chimeric contigs based on underlying graph structure

Raquel Linheiro John Archer 《PLoS computational biology》2021,17(11)

相似文献

13.

Mobster: accurate detection of mobile element insertions in next generation sequencing data

Djie Tjwan Thung Joep de Ligt Lisenka EM Vissers Marloes Steehouwer Mark Kroon Petra de Vries Eline P Slagboom Kai Ye Joris A Veltman Jayne Y Hehir-Kwa 《Genome biology》2014,15(10)

Mobile elements are major drivers in changing genomic architecture and can cause disease. The detection of mobile elements is hindered due to the low mappability of their highly repetitive sequences. We have developed an algorithm, called Mobster, to detect non-reference mobile element insertions in next generation sequencing data from both whole genome and whole exome studies. Mobster uses discordant read pairs and clipped reads in combination with consensus sequences of known active mobile elements. Mobster has a low false discovery rate and high recall rate for both L1 and Alu elements. Mobster is available at http://sourceforge.net/projects/mobster.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-014-0488-x) contains supplementary material, which is available to authorized users. 相似文献

14.

FastUniq: A Fast De Novo Duplicates Removal Tool for Paired Short Reads

Haibin Xu Xiang Luo Jun Qian Xiaohui Pang Jingyuan Song Guangrui Qian Jinhui Chen Shilin Chen 《PloS one》2012,7(12)

The presence of duplicates introduced by PCR amplification is a major issue in paired short reads from next-generation sequencing platforms. These duplicates might have a serious impact on research applications, such as scaffolding in whole-genome sequencing and discovering large-scale genome variations, and are usually removed. We present FastUniq as a fast de novo tool for removal of duplicates in paired short reads. FastUniq identifies duplicates by comparing sequences between read pairs and does not require complete genome sequences as prerequisites. FastUniq is capable of simultaneously handling reads with different lengths and results in highly efficient running time, which increases linearly at an average speed of 87 million reads per 10 minutes. FastUniq is freely available at http://sourceforge.net/projects/fastuniq/. 相似文献

15.

A new approach for annotation of transposable elements using small RNA mapping

Moaine El?Baidouri Kyung Do Kim Brian Abernathy Siwaret Arikit Florian Maumus Olivier Panaud Blake C. Meyers Scott A. Jackson 《Nucleic acids research》2015,43(13):e84

Transposable elements (TEs) are mobile genomic DNA sequences found in most organisms. They so densely populate the genomes of many eukaryotic species that they are often the major constituents. With the rapid generation of many plant genome sequencing projects over the past few decades, there is an urgent need for improved TE annotation as a prerequisite for genome-wide studies. Analogous to the use of RNA-seq for gene annotation, we propose a new method for de novo TE annotation that uses as a guide 24 nt-siRNAs that are a part of TE silencing pathways. We use this new approach, called TASR (for Transposon Annotation using Small RNAs), for de novo annotation of TEs in Arabidopsis, rice and soybean and demonstrate that this strategy can be successfully applied for de novo TE annotation in plants.Executable PERL is available for download from: http://tasr-pipeline.sourceforge.net/ 相似文献

16.

ADSBET2: Automated Determination of Salt-Bridge Energy-Terms version 2

Arnab Nayek Parth Sarthi Sen Gupta Shyamashree Banerjee Vishma Pratap Sur Pratyay Seth Sunit Das Rifat Nawaz Ul Islam Amal Kumar Bandyopadhyay 《Bioinformation》2015,11(8):413-415

AvailabilityADSBET2 is freely available at http://sourceforge.net/projects/ADSBET2/ for all users. 相似文献

17.

Identification of novel fusion genes in lung cancer using breakpoint assembly of transcriptome sequencing data

《Genome biology》2015,16(1)

相似文献

18.

SBION2: Analyses of Salt Bridges from Multiple Structure Files,Version 2

Parth Sarthi Sen Gupta Arnab Nayek Shyamashree Banerjee Pratyay Seth Sunit Das Vishma Pratap Sur Chittran Roy Amal Kumar Bandyopadhyay 《Bioinformation》2015,11(1):39-42

AvailabilitySBION2 is freely available at http://sourceforge.net/projects/sbion2/ for academic users 相似文献

19.

De novo assembly of bacterial transcriptomes from RNA-seq data

Brian Tjaden 《Genome biology》2015,16(1)

相似文献

20.

CPAG: software for leveraging pleiotropy in GWAS to reveal similarity between human traits links plasma fatty acids and intestinal inflammation

Liuyang Wang Stefan H. Oehlers Scott T. Espenschied John F. Rawls David M. Tobin Dennis C. Ko 《Genome biology》2015,16(1)

Meta-analyses of genome-wide association studies (GWAS) have demonstrated that the same genetic variants can be associated with multiple diseases and other complex traits. We present software called CPAG (Cross-Phenotype Analysis of GWAS) to look for similarities between 700 traits, build trees with informative clusters, and highlight underlying pathways. Clusters are consistent with pre-defined groups and literature-based validation but also reveal novel connections. We report similarity between plasma palmitoleic acid and Crohn''s disease and find that specific fatty acids exacerbate enterocolitis in zebrafish. CPAG will become increasingly powerful as more genetic variants are uncovered, leading to a deeper understanding of complex traits. CPAG is freely available at www.sourceforge.net/projects/CPAG/.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-015-0722-1) contains supplementary material, which is available to authorized users. 相似文献