1.

Compression of FASTQ and SAM Format Sequencing Data

James K. Bonfield Matthew V. Mahoney 《PloS one》2013,8(3)

Storage and transmission of the data produced by modern DNA sequencing instruments has become a major concern, which prompted the Pistoia Alliance to pose the SequenceSqueeze contest for compression of FASTQ files. We present several compression entries from the competition, Fastqz and Samcomp/Fqzcomp, including the winning entry. These are compared against existing algorithms for both reference based compression (CRAM, Goby) and non-reference based compression (DSRC, BAM) and other recently published competition entries (Quip, SCALCE). The tools are shown to be the new Pareto frontier for FASTQ compression, offering state of the art ratios at affordable CPU costs. All programs are freely available on SourceForge. Fastqz: https://sourceforge.net/projects/fastqz/, fqzcomp: https://sourceforge.net/projects/fqzcomp/, and samcomp: https://sourceforge.net/projects/samcomp/.

2.

PHYSICO2: an UNIX based standalone procedure for computation of physicochemical, window-dependent and substitution based evolutionary properties of protein sequences along with automated block preparation tool, version 2

Shyamashree Banerjee Parth Sarthi Sen Gupta Arnab Nayek Sunit Das Vishma Pratap Sur Pratyay Seth Rifat Nawaz Ul Islam Amal K Bandyopadhyay 《Bioinformation》2015,11(7):366-368

Availability

PHYSICO2 is freely available at http://sourceforge.net/projects/physico2/ along with its documentation at https://sourceforge.net/projects/physico2/files/Documentation.pdf/download for all users.

3.

Metagenomic Profiling of Known and Unknown Microbes with MicrobeGPS

Martin S. Lindner Bernhard Y. Renard 《PloS one》2015,10(2)

Microbial community profiling identifies and quantifies organisms in metagenomic sequencing data using either reference based or unsupervised approaches. However, current reference based profiling methods only report the presence and abundance of single reference genomes that are available in databases. Since only a small fraction of environmental genomes is represented in genomic databases, these approaches entail the risk of false identifications and often suggest a higher precision than justified by the data. Therefore, we developed MicrobeGPS, a novel metagenomic profiling approach that overcomes these limitations. MicrobeGPS is the first method that identifies microbiota in the sample and estimates their genomic distances to known reference genomes. With this strategy, MicrobeGPS identifies organisms down to the strain level and highlights possibly inaccurate identifications when the correct reference genome is missing. We demonstrate on three metagenomic datasets with different origin that our approach successfully avoids misleading interpretation of results and additionally provides more accurate results than current profiling methods. Our results indicate that MicrobeGPS can enable reference based taxonomic profiling of complex and less characterized microbial communities. MicrobeGPS is open source and available from https://sourceforge.net/projects/microbegps/ as source code and binary distribution for Windows and Linux operating systems.

4.

Compression of Large genomic datasets using COMRAD on Parallel Computing Platform

Christopher Leela Biji Manu K Madhu Vineetha Vishnu Satheesh Kumar K Vijayakumar Achuthsankar S Nair 《Bioinformation》2015,11(5):267-271

Big data storage is a challenge in the post-genome era. Hence, there is a need for high performance computing solutions for managing large genomic data. Therefore, it is of interest to describe a parallel-computing approach using a message-passing library for distributing the different compression stages across clusters. Genomic compression helps to reduce the on-disk “footprint” of large volumes of sequence data. This supports the computational infrastructure for more efficient archiving. The approach was shown to find utility in 21 eukaryotic genomes using stratified sampling in this report. The method achieves an average 6-fold disk space reduction with three times better compression time than COMRAD.

Availability

The source codes are written in C using message passing libraries and are available at https://sourceforge.net/projects/comradmpi/files/COMRADMPI/

5.

qpure: A Tool to Estimate Tumor Cellularity from Genome-Wide Single-Nucleotide Polymorphism Profiles

Sarah Song Katia Nones David Miller Ivon Harliwong Karin S. Kassahn Mark Pinese Marina Pajic Anthony J. Gill Amber L. Johns Matthew Anderson Oliver Holmes Conrad Leonard Darrin Taylor Scott Wood Qinying Xu Felicity Newell Mark J. Cowley Jianmin Wu Peter Wilson Lynn Fink Andrew V. Biankin Nic Waddell Sean M. Grimmond John V. Pearson 《PloS one》2012,7(9)

Tumour cellularity, the relative proportion of tumour and normal cells in a sample, affects the sensitivity of mutation detection, copy number analysis, cancer gene expression and methylation profiling. Tumour cellularity is traditionally estimated by pathological review of sectioned specimens; however this method is both subjective and prone to error due to heterogeneity within lesions and cellularity differences between the sample viewed during pathological review and tissue used for research purposes. In this paper we describe a statistical model to estimate tumour cellularity from SNP array profiles of paired tumour and normal samples using shifts in SNP allele frequency at regions of loss of heterozygosity (LOH) in the tumour. We also provide qpure, a software implementation of the method. Our experiments showed that there is a medium correlation 0.42 (p-value = 0.0001) between tumour cellularity estimated by qpure and pathology review. Interestingly there is a high correlation 0.87 (p-value 2.2e-16) between cellularity estimates by qpure and deep Ion Torrent sequencing of known somatic KRAS mutations; and a weaker correlation 0.32 (p-value = 0.004) between Ion Torrent sequencing and pathology review. This suggests that qpure may be a more accurate predictor of tumour cellularity than pathology review. qpure can be downloaded from https://sourceforge.net/projects/qpure/.

6.

SBION2: Analyses of Salt Bridges from Multiple Structure Files, Version 2

Parth Sarthi Sen Gupta Arnab Nayek Shyamashree Banerjee Pratyay Seth Sunit Das Vishma Pratap Sur Chittran Roy Amal Kumar Bandyopadhyay 《Bioinformation》2015,11(1):39-42

Availability

SBION2 is freely available at http://sourceforge.net/projects/sbion2/ for academic users.

7.

Bridger: a new framework for de novo transcriptome assembly using RNA-seq data

Zheng Chang Guojun Li Juntao Liu Yu Zhang Cody Ashby Deli Liu Carole L Cramer Xiuzhen Huang 《Genome biology》2015,16(1)

8.

COHCAP: an integrative genomic pipeline for single-nucleotide resolution DNA methylation analysis

Charles D. Warden Heehyoung Lee Joshua D. Tompkins Xiaojin Li Charles Wang Arthur D. Riggs Hua Yu Richard Jove Yate-Ching Yuan 《Nucleic acids research》2013,41(11):e117

COHCAP (City of Hope CpG Island Analysis Pipeline) is an algorithm to analyze single-nucleotide resolution DNA methylation data produced by either an Illumina methylation array or targeted bisulfite sequencing. The goal of the COHCAP algorithm is to identify CpG islands that show a consistent pattern of methylation among CpG sites. COHCAP is currently the only DNA methylation package that provides integration with gene expression data to identify a subset of CpG islands that are most likely to regulate downstream gene expression, and it can generate lists of differentially methylated CpG islands with ∼50% concordance with gene expression from both cell line data and heterogeneous patient data. For example, this article describes known breast cancer biomarkers (such as estrogen receptor) with a negative correlation between DNA methylation and gene expression. COHCAP also provides visualization for quality control metrics, regions of differential methylation and correlation between methylation and gene expression. This software is freely available at https://sourceforge.net/projects/cohcap/.

9.

Allele-specific copy-number discovery from whole-genome and whole-exome sequencing

WeiBo Wang Wei Wang Wei Sun James J. Crowley Jin P. Szatkiewicz 《Nucleic acids research》2015,43(14):e90

Copy-number variants (CNVs) are a major form of genetic variation and a risk factor for various human diseases, so it is crucial to accurately detect and characterize them. It is conceivable that allele-specific reads from high-throughput sequencing data could be leveraged to both enhance CNV detection and produce allele-specific copy number (ASCN) calls. Although statistical methods have been developed to detect CNVs using whole-genome sequence (WGS) and/or whole-exome sequence (WES) data, information from allele-specific read counts has not yet been adequately exploited. In this paper, we develop an integrated method, called AS-GENSENG, which incorporates allele-specific read counts in CNV detection and estimates ASCN using either WGS or WES data. To evaluate the performance of AS-GENSENG, we conducted extensive simulations, generated empirical data using existing WGS and WES data sets and validated predicted CNVs using an independent methodology. We conclude that AS-GENSENG not only predicts accurate ASCN calls but also improves the accuracy of total copy number calls, owing to its unique ability to exploit information from both total and allele-specific read counts while accounting for various experimental biases in sequence data. Our novel, user-friendly and computationally efficient method and a complete analytic protocol are freely available at https://sourceforge.net/projects/asgenseng/.

10.

A Scalable and Accurate Targeted Gene Assembly Tool (SAT-Assembler) for Next-Generation Sequencing Data

Yuan Zhang Yanni Sun James R. Cole 《PLoS computational biology》2014,10(8)

Gene assembly, which recovers gene segments from short reads, is an important step in functional analysis of next-generation sequencing data. Lacking quality reference genomes, de novo assembly is commonly used for RNA-Seq data of non-model organisms and metagenomic data. However, heterogeneous sequence coverage caused by heterogeneous expression or species abundance, similarity between isoforms or homologous genes, and large data size all pose challenges to de novo assembly. As a result, existing assembly tools tend to output fragmented contigs or chimeric contigs, or have high memory footprint. In this work, we introduce a targeted gene assembly program SAT-Assembler, which aims to recover gene families of particular interest to biologists. It addresses the above challenges by conducting family-specific homology search, homology-guided overlap graph construction, and careful graph traversal. It can be applied to both RNA-Seq and metagenomic data. Our experimental results on an Arabidopsis RNA-Seq data set and two metagenomic data sets show that SAT-Assembler has smaller memory usage, comparable or better gene coverage, and lower chimera rate for assembling a set of genes from one or multiple pathways compared with other assembly tools. Moreover, the family-specific design and rapid homology search allow SAT-Assembler to be naturally compatible with parallel computing platforms. The source code of SAT-Assembler is available at https://sourceforge.net/projects/sat-assembler/. The data sets and experimental settings can be found in supplementary material.

11.

AutoAssemblyD: a graphical user interface system for several genome assemblers

Adonney Allan de Oliveira Veras Pablo Henrique Caracciolo Gomes de Sá Vasco Azevedo Artur Silva Rommel Thiago Jucá Ramos 《Bioinformation》2013,9(16):840-841

Next-generation sequencing technologies have increased the amount of biological data generated. Thus, bioinformatics has become important because new methods and algorithms are necessary to manipulate and process such data. However, certain challenges have emerged, such as genome assembly using short reads and high-throughput platforms. In this context, several algorithms have been developed, such as Velvet, Abyss, Euler-SR, Mira, Edna, Maq, SHRiMP, Newbler, ALLPATHS, Bowtie and BWA. However, most such assemblers do not have a graphical interface, which makes their use difficult for users without computing experience given the complexity of the assembler syntax. Thus, to make the operation of such assemblers accessible to users without a computing background, we developed AutoAssemblyD, which is a graphical tool for genome assembly submission and remote management by multiple assemblers through XML templates.

Availability

AssemblyD is freely available at https://sourceforge.net/projects/autoassemblyd. It requires Sun jdk 6 or higher.

12.

ArrayPlex: distributed, interactive and programmatic access to genome sequence, annotation, ontology, and analytical toolsets

Patrick J Killion Vishwanath R Iyer 《Genome biology》2008,9(11):R159

ArrayPlex is a software package that centrally provides a large number of flexible toolsets useful for functional genomics, including microarray data storage, quality assessments, data visualization, gene annotation retrieval, statistical tests, genomic sequence retrieval and motif analysis. It uses a client-server architecture based on open source components, provides graphical, command-line, and programmatic access to all needed resources, and is extensible by virtue of a documented application programming interface. ArrayPlex is available at http://sourceforge.net/projects/arrayplex/.

13.

VDJtools: Unifying Post-analysis of T Cell Receptor Repertoires

Mikhail Shugay Dmitriy V. Bagaev Maria A. Turchaninova Dmitriy A. Bolotin Olga V. Britanova Ekaterina V. Putintseva Mikhail V. Pogorelyy Vadim I. Nazarov Ivan V. Zvyagin Vitalina I. Kirgizova Kirill I. Kirgizov Elena V. Skorobogatova Dmitriy M. Chudakov 《PLoS computational biology》2015,11(11)

Despite the growing number of immune repertoire sequencing studies, the field still lacks software for analysis and comprehension of this high-dimensional data. Here we report VDJtools, a complementary software suite that solves a wide range of T cell receptor (TCR) repertoires post-analysis tasks, provides a detailed tabular output and publication-ready graphics, and is built on top of a flexible API. Using TCR datasets for a large cohort of unrelated healthy donors, twins, and multiple sclerosis patients we demonstrate that VDJtools greatly facilitates the analysis and leads to sound biological conclusions. VDJtools software and documentation are available at https://github.com/mikessh/vdjtools.

14.

EXCAVATOR: detecting copy number variants from whole-exome sequencing data

Alberto Magi Lorenzo Tattini Ingrid Cifola Romina D’Aurizio Matteo Benelli Eleonora Mangano Cristina Battaglia Elena Bonora Ants Kurg Marco Seri Pamela Magini Betti Giusti Giovanni Romeo Tommaso Pippucci Gianluca De Bellis Rosanna Abbate Gian Franco Gensini 《Genome biology》2013,14(10):R120

We developed a novel software tool, EXCAVATOR, for the detection of copy number variants (CNVs) from whole-exome sequencing data. EXCAVATOR combines a three-step normalization procedure with a novel heterogeneous hidden Markov model algorithm and a calling method that classifies genomic regions into five copy number states. We validate EXCAVATOR on three datasets and compare the results with three other methods. These analyses show that EXCAVATOR outperforms the other methods and is therefore a valuable tool for the investigation of CNVs in large-scale projects, as well as in clinical research and diagnostics. EXCAVATOR is freely available at http://sourceforge.net/projects/excavatortool/.

15.

Generation of Artificial FASTQ Files to Evaluate the Performance of Next-Generation Sequencing Pipelines

Matthew Frampton Richard Houlston 《PloS one》2012,7(11)

Pipelines for the analysis of Next-Generation Sequencing (NGS) data are generally composed of a set of different publicly available software, configured together in order to map short reads of a genome and call variants. The fidelity of pipelines is variable. We have developed ArtificialFastqGenerator, which takes a reference genome sequence as input and outputs artificial paired-end FASTQ files containing Phred quality scores. Since these artificial FASTQs are derived from the reference genome, it provides a gold-standard for read-alignment and variant-calling, thereby enabling the performance of any NGS pipeline to be evaluated. The user can customise DNA template/read length, the modelling of coverage based on GC content, whether to use real Phred base quality scores taken from existing FASTQ files, and whether to simulate sequencing errors. Detailed coverage and error summary statistics are outputted. Here we describe ArtificialFastqGenerator and illustrate its implementation in evaluating a typical bespoke NGS analysis pipeline under different experimental conditions. ArtificialFastqGenerator was released in January 2012. Source code, example files and binaries are freely available under the terms of the GNU General Public License v3.0 from https://sourceforge.net/projects/artfastqgen/.
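
The idea of deriving paired-end FASTQ records from a reference can be illustrated with a minimal Python sketch. This is not ArtificialFastqGenerator's code: the function names, the uniform sampling of template positions, the constant Phred score of 30, and the Phred+33 quality encoding are all assumptions made for illustration, and GC-based coverage modelling and error simulation are left out.

```python
import random

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def fastq_record(name, seq, phred=30):
    """Format one FASTQ record with a constant quality (Phred+33 assumed)."""
    qual = chr(phred + 33) * len(seq)
    return f"@{name}\n{seq}\n+\n{qual}\n"


def paired_reads(reference, template_len, read_len, n_pairs, seed=0):
    """Sample templates uniformly from the reference and emit read pairs.

    Read 1 is the start of the template; read 2 is the start of the
    template's reverse complement, i.e. the opposite end of the fragment.
    """
    rng = random.Random(seed)
    pairs = []
    for i in range(n_pairs):
        start = rng.randrange(len(reference) - template_len + 1)
        template = reference[start:start + template_len]
        read1 = template[:read_len]
        read2 = reverse_complement(template)[:read_len]
        pairs.append((fastq_record(f"read{i}/1", read1),
                      fastq_record(f"read{i}/2", read2)))
    return pairs
```

Because every read is cut from a known position in the reference, the true alignment of each pair is known in advance, which is what makes such data usable as a gold standard.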

16.

DNA-COMPACT: DNA COMpression Based on a Pattern-Aware Contextual Modeling Technique

Pinghao Li Shuang Wang Jihoon Kim Hongkai Xiong Lucila Ohno-Machado Xiaoqian Jiang 《PloS one》2013,8(11)

Genome data are becoming increasingly important for modern medicine. As the rate of increase in DNA sequencing outstrips the rate of increase in disk storage capacity, the storage and data transferring of large genome data are becoming important concerns for biomedical researchers. We propose a two-pass lossless genome compression algorithm, which highlights the synthesis of complementary contextual models, to improve the compression performance. The proposed framework could handle genome compression with and without reference sequences, and demonstrated performance advantages over best existing algorithms. The method for reference-free compression led to bit rates of 1.720 and 1.838 bits per base for bacteria and yeast, which were approximately 3.7% and 2.6% better than the state-of-the-art algorithms. Regarding performance with reference, we tested on the first Korean personal genome sequence data set, and our proposed method demonstrated a 189-fold compression rate, reducing the raw file size from 2986.8 MB to 15.8 MB at a comparable decompression cost with existing algorithms. DNAcompact is freely available at https://sourceforge.net/projects/dnacompact/ for research purposes.
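
The figures quoted in this abstract can be checked with a few lines of Python. The file sizes and bit rates come from the abstract; the comparison against a naive 2-bits-per-base encoding of the four-letter DNA alphabet is an added baseline for orientation only and is not the comparison behind the abstract's 3.7% and 2.6% figures, which are relative to other compressors.

```python
# Reference-based result reported in the abstract.
raw_mb = 2986.8
compressed_mb = 15.8
fold = raw_mb / compressed_mb  # about 189, matching the reported 189-fold rate


def saving_vs_2bit(bits_per_base):
    """Fractional saving relative to a naive 2-bit-per-base encoding.

    The 2-bit baseline is an assumption for illustration (four symbols
    A, C, G, T need log2(4) = 2 bits each without any modelling).
    """
    return 1.0 - bits_per_base / 2.0


bacteria_saving = saving_vs_2bit(1.720)  # roughly 14% below 2 bits per base
yeast_saving = saving_vs_2bit(1.838)     # roughly 8% below 2 bits per base
```

The contrast between the two regimes is the point: without a reference the gain over plain 2-bit packing is modest, whereas a reference sequence lets most of the genome be encoded as differences.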

17.

BioSWR – Semantic Web Services Registry for Bioinformatics

Dmitry Repchevsky Josep Ll. Gelpi 《PloS one》2014,9(9)

18.

MutAid: Sanger and NGS Based Integrated Pipeline for Mutation Identification, Validation and Annotation in Human Molecular Genetics

Ram Vinay Pandey Stephan Pabinger Albert Kriegner Andreas Weinhäusel 《PloS one》2016,11(2)

Traditional Sanger sequencing as well as Next-Generation Sequencing have been used for the identification of disease causing mutations in human molecular research. The majority of currently available tools are developed for research and explorative purposes and often do not provide a complete, efficient, one-stop solution. As the focus of currently developed tools is mainly on NGS data analysis, no integrative solution for the analysis of Sanger data is provided and consequently a one-stop solution to analyze reads from both sequencing platforms is not available. We have therefore developed a new pipeline called MutAid to analyze and interpret raw sequencing data produced by Sanger or several NGS sequencing platforms. It performs format conversion, base calling, quality trimming, filtering, read mapping, variant calling, variant annotation and analysis of Sanger and NGS data under a single platform. It is capable of analyzing reads from multiple patients in a single run to create a list of potential disease causing base substitutions as well as insertions and deletions. MutAid has been developed for expert and non-expert users and supports four sequencing platforms including Sanger, Illumina, 454 and Ion Torrent. Furthermore, for NGS data analysis, five read mappers including BWA, TMAP, Bowtie, Bowtie2 and GSNAP and four variant callers including GATK-HaplotypeCaller, SAMTOOLS, Freebayes and VarScan2 pipelines are supported. MutAid is freely available at https://sourceforge.net/projects/mutaid.

19.

Sliding Box Docking: a new stand-alone tool for managing docking-based virtual screening along the DNA helix axis

Andrelly Martins-José 《Bioinformation》2013,9(14):750-751

Sliding Box Docking is a program that manages simulations of ligand docking at different defined positions of a three-dimensional DNA structure. The procedure is similar to inverse docking, which is a method that performs docking simulations of a single ligand in the active sites of different targets. Sliding Box Docking manages docking simulations of one ligand into a box that slides along the DNA helix axis in regular steps. For each box position a score is calculated using the separate Autodock Vina software, and the results are automatically plotted. The evaluation of ligand interaction at different DNA locations can highlight the specificity of ligands for different DNA sequences. When assessing the affinity between ligands and AT base pairs, results for docking simulations with a test set that included berenil, distamycin, Hoechst 33258, and netropsin were as expected, agreeing well with affinities previously described in the literature.

Availability

Binaries are freely available at https://sourceforge.net/projects/slidingboxdocki
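
The sliding-box procedure described in this abstract (a box moved along the helix axis in regular steps, with one score per position) can be sketched in Python as follows. This is an illustration only, not the program's code: the helix axis is approximated by a straight segment between two points, the function names are hypothetical, and `score_fn` is a placeholder callback standing in for whatever docking run produces the score at each position.

```python
import math


def box_centers(axis_start, axis_end, step):
    """Return box centres spaced `step` apart along a straight axis segment."""
    length = math.dist(axis_start, axis_end)
    direction = [(e - s) / length for s, e in zip(axis_start, axis_end)]
    n_boxes = int(length // step) + 1
    return [
        tuple(s + d * step * i for s, d in zip(axis_start, direction))
        for i in range(n_boxes)
    ]


def scan(axis_start, axis_end, step, score_fn):
    """Evaluate the placeholder scoring callback at every box position."""
    return [(center, score_fn(center))
            for center in box_centers(axis_start, axis_end, step)]
```

Plotting the returned scores against position along the axis gives the kind of profile the abstract refers to, where sequence-specific binding appears as a dip or peak at particular box positions.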

20.

cnvHiTSeq: integrative models for high-resolution copy number variation detection and genotyping using population sequencing data

Evangelos Bellos Michael R Johnson Lachlan J M Coin 《Genome biology》2012,13(12):R120

Recent advances in sequencing technologies provide the means for identifying copy number variation (CNV) at an unprecedented resolution. A single next-generation sequencing experiment offers several features that can be used to detect CNV, yet current methods do not incorporate all available signatures into a unified model. cnvHiTSeq is an integrative probabilistic method for CNV discovery and genotyping that jointly analyzes multiple features at the population level. By combining evidence from complementary sources, cnvHiTSeq achieves high genotyping accuracy and a substantial improvement in CNV detection sensitivity over existing methods, while maintaining a low false discovery rate. cnvHiTSeq is available at http://sourceforge.net/projects/cnvhitseq