
1.

Compression of FASTQ and SAM Format Sequencing Data

James K. Bonfield Matthew V. Mahoney 《PloS one》2013,8(3)

Storage and transmission of the data produced by modern DNA sequencing instruments have become a major concern, which prompted the Pistoia Alliance to pose the SequenceSqueeze contest for compression of FASTQ files. We present several compression entries from the competition, Fastqz and Samcomp/Fqzcomp, including the winning entry. These are compared against existing algorithms for both reference-based compression (CRAM, Goby) and non-reference-based compression (DSRC, BAM) and other recently published competition entries (Quip, SCALCE). The tools are shown to be the new Pareto frontier for FASTQ compression, offering state-of-the-art ratios at affordable CPU costs. All programs are freely available on SourceForge. Fastqz: https://sourceforge.net/projects/fastqz/, fqzcomp: https://sourceforge.net/projects/fqzcomp/, and samcomp: https://sourceforge.net/projects/samcomp/.

2.

Bridger: a new framework for de novo transcriptome assembly using RNA-seq data

Zheng Chang Guojun Li Juntao Liu Yu Zhang Cody Ashby Deli Liu Carole L Cramer Xiuzhen Huang 《Genome biology》2015,16(1)


3.

PHYSICO2: an UNIX based standalone procedure for computation of physicochemical, window-dependent and substitution based evolutionary properties of protein sequences along with automated block preparation tool, version 2

Shyamashree Banerjee Parth Sarthi Sen Gupta Arnab Nayek Sunit Das Vishma Pratap Sur Pratyay Seth Rifat Nawaz Ul Islam Amal K Bandyopadhyay 《Bioinformation》2015,11(7):366-368

Availability: PHYSICO2 is freely available to all users at http://sourceforge.net/projects/physico2/, along with its documentation at https://sourceforge.net/projects/physico2/files/Documentation.pdf/download.

4.

BinPacker: Packing-Based De Novo Transcriptome Assembly from RNA-seq Data

Juntao Liu Guojun Li Zheng Chang Ting Yu Bingqiang Liu Rick McMullen Pengyin Chen Xiuzhen Huang 《PLoS computational biology》2016,12(2)


5.

Performance evaluation measures for protein complex prediction

Asma Ivazeh Javad Zahiri Maseud Rahgozar Sriganesh Srihari 《Genomics》2019,111(6):1483-1492

Protein complexes play a dominant role in cellular organization and function. Prediction of protein complexes from the network of physical interactions between proteins (PPI networks) has thus become one of the important research areas. Recently, many computational approaches have been developed to identify these complexes. Various performance assessment measures have been proposed for evaluating the efficiency of these methods. However, there are many inconsistencies in the definitions and usage of the measures across the literature. To address this issue, we have gathered and presented the most important performance evaluation measures and developed a tool, named CompEvaluator, to critically assess the protein complex prediction methods. The tool and documentation are publicly available at https://sourceforge.net/projects/compevaluator/files/.

6.

Multiple Co-Evolutionary Networks Are Supported by the Common Tertiary Scaffold of the LacI/GalR Proteins

Daniel J. Parente Liskin Swint-Kruse 《PloS one》2013,8(12)


7.

A Computational Model for Predicting RNase H Domain of Retrovirus

Sijia Wu Xinman Zhang Jiuqiang Han 《PloS one》2016,11(8)

RNase H (RNH) is a pivotal domain in retroviruses that cleaves the DNA-RNA hybrid so that retroviral replication can continue. This crucial role indicates that RNH is a promising drug target for therapeutic intervention. However, the RNHs annotated in the UniProtKB database are still too few for a good understanding of their statistical characteristics. In this work, a computational RNH model was proposed to annotate new putative RNHs (np-RNHs) in the retroviruses. It predicts RNH domains by recognizing their start and end sites separately with an SVM method. The classification accuracy rates are 100%, 99.01% and 97.52% for the jack-knife, 10-fold cross-validation and 5-fold cross-validation tests, respectively. Subsequently, this model discovered 14,033 np-RNHs after scanning sequences without RNH annotations. All these predicted np-RNHs and annotated RNHs were employed to analyze the length, hydrophobicity and evolutionary relationship of RNH domains. They are all related to retroviral genera, which validates the classification of retroviruses to a certain degree. Finally, a software tool was designed for the application of our prediction model. The software, together with the datasets involved in this paper, is available for free download at https://sourceforge.net/projects/rhtool/files/?source=navbar.

8.

DNA-COMPACT: DNA COMpression Based on a Pattern-Aware Contextual Modeling Technique

Pinghao Li Shuang Wang Jihoon Kim Hongkai Xiong Lucila Ohno-Machado Xiaoqian Jiang 《PloS one》2013,8(11)

Genome data are becoming increasingly important for modern medicine. As the rate of increase in DNA sequencing outstrips the rate of increase in disk storage capacity, the storage and transfer of large genome data are becoming important concerns for biomedical researchers. We propose a two-pass lossless genome compression algorithm, which highlights the synthesis of complementary contextual models, to improve the compression performance. The proposed framework can handle genome compression with and without reference sequences, and demonstrated performance advantages over the best existing algorithms. The method for reference-free compression led to bit rates of 1.720 and 1.838 bits per base for bacteria and yeast, which were approximately 3.7% and 2.6% better than the state-of-the-art algorithms. Regarding performance with reference, we tested on the first Korean personal genome sequence data set, and our proposed method demonstrated a 189-fold compression rate, reducing the raw file size from 2986.8 MB to 15.8 MB at a decompression cost comparable with existing algorithms. DNAcompact is freely available at https://sourceforge.net/projects/dnacompact/ for research purposes.
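The figures quoted in this abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration, not part of DNA-COMPACT: the function names are hypothetical, and the comparison against a fixed 2-bit-per-base encoding (the naive cost of storing A, C, G and T) is an assumption added here, not a baseline the abstract uses.

```python
# Figures quoted in the abstract above.
raw_mb = 2986.8        # raw Korean personal genome data set, in MB
compressed_mb = 15.8   # size after reference-based compression, in MB


def compression_ratio(raw_size: float, compressed_size: float) -> float:
    """Return how many times smaller the compressed file is."""
    return raw_size / compressed_size


def saving_vs_two_bit(bits_per_base: float) -> float:
    """Fractional saving relative to a naive 2-bit-per-base encoding.

    The 2-bit baseline is an assumption of this sketch (four symbols,
    fixed-length code); the abstract compares against other compressors.
    """
    return 1.0 - bits_per_base / 2.0


ratio = compression_ratio(raw_mb, compressed_mb)
print(f"compression ratio: {ratio:.1f}-fold")

for organism, bpb in (("bacteria", 1.720), ("yeast", 1.838)):
    saving = saving_vs_two_bit(bpb)
    print(f"{organism}: {bpb} bits/base, {saving:.1%} below 2 bits/base")
```

The ratio comes out at roughly 189, which matches the 189-fold figure reported for the reference-based case.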

9.

Compression of Large genomic datasets using COMRAD on Parallel Computing Platform

Christopher Leela Biji Manu K Madhu Vineetha Vishnu Satheesh Kumar K Vijayakumar Achuthsankar S Nair 《Bioinformation》2015,11(5):267-271

Big data storage is a challenge in the post-genome era. Hence, there is a need for high-performance computing solutions for managing large genomic data. Therefore, it is of interest to describe a parallel-computing approach that uses a message-passing library to distribute the different compression stages across clusters. Genomic compression helps to reduce the on-disk "footprint" of large volumes of sequence data. This supports the computational infrastructure for more efficient archiving. In this report, the approach was shown to be useful on 21 eukaryotic genomes using stratified sampling. The method achieves an average 6-fold disk space reduction with three times better compression time than COMRAD.

Availability

The source code is written in C using message-passing libraries and is available at https://sourceforge.net/projects/comradmpi/files/COMRADMPI/

10.

Generation of Artificial FASTQ Files to Evaluate the Performance of Next-Generation Sequencing Pipelines

Matthew Frampton Richard Houlston 《PloS one》2012,7(11)

Pipelines for the analysis of Next-Generation Sequencing (NGS) data are generally composed of a set of different publicly available software tools, configured together in order to map short reads of a genome and call variants. The fidelity of pipelines is variable. We have developed ArtificialFastqGenerator, which takes a reference genome sequence as input and outputs artificial paired-end FASTQ files containing Phred quality scores. Since these artificial FASTQs are derived from the reference genome, they provide a gold standard for read alignment and variant calling, thereby enabling the performance of any NGS pipeline to be evaluated. The user can customise DNA template/read length, the modelling of coverage based on GC content, whether to use real Phred base quality scores taken from existing FASTQ files, and whether to simulate sequencing errors. Detailed coverage and error summary statistics are output. Here we describe ArtificialFastqGenerator and illustrate its implementation in evaluating a typical bespoke NGS analysis pipeline under different experimental conditions. ArtificialFastqGenerator was released in January 2012. Source code, example files and binaries are freely available under the terms of the GNU General Public License v3.0 from https://sourceforge.net/projects/artfastqgen/.
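The general idea of deriving paired-end FASTQ records from a reference can be sketched as follows. This is a minimal illustration and not ArtificialFastqGenerator's implementation: the helper names are hypothetical, the reference is a random toy sequence, every base gets the same Phred score (encoded with the common Phred+33 offset), and GC-dependent coverage and sequencing errors are left out.

```python
import random

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def phred_to_char(q: int) -> str:
    """Encode a Phred score as a character using the Phred+33 offset."""
    return chr(q + 33)


def make_pair(reference: str, start: int, template_len: int, read_len: int,
              quality: int, name: str) -> tuple[str, str]:
    """Build one pair of FASTQ records from a template cut out of the reference.

    Read 1 is the first read_len bases of the template; read 2 is the
    reverse complement of its last read_len bases.
    """
    template = reference[start:start + template_len]
    read1 = template[:read_len]
    read2 = reverse_complement(template[-read_len:])
    qual = phred_to_char(quality) * read_len  # uniform quality (assumption)
    rec1 = f"@{name}/1\n{read1}\n+\n{qual}\n"
    rec2 = f"@{name}/2\n{read2}\n+\n{qual}\n"
    return rec1, rec2


# Toy reference; a real run would load a genome from a FASTA file.
rng = random.Random(0)
reference = "".join(rng.choice("ACGT") for _ in range(200))

# Tile templates along the reference at a fixed step.
pairs = [
    make_pair(reference, s, template_len=60, read_len=25, quality=30,
              name=f"read{s}")
    for s in range(0, 140, 20)
]
print(pairs[0][0], end="")
print(pairs[0][1], end="")
```

Because each read's origin in the reference is known by construction, the output of an aligner run on such files can be scored against the true positions, which is the gold-standard property the abstract describes.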

11.

Allele-specific copy-number discovery from whole-genome and whole-exome sequencing

WeiBo Wang Wei Wang Wei Sun James J. Crowley Jin P. Szatkiewicz 《Nucleic acids research》2015,43(14):e90

Copy-number variants (CNVs) are a major form of genetic variation and a risk factor for various human diseases, so it is crucial to accurately detect and characterize them. It is conceivable that allele-specific reads from high-throughput sequencing data could be leveraged to both enhance CNV detection and produce allele-specific copy number (ASCN) calls. Although statistical methods have been developed to detect CNVs using whole-genome sequence (WGS) and/or whole-exome sequence (WES) data, information from allele-specific read counts has not yet been adequately exploited. In this paper, we develop an integrated method, called AS-GENSENG, which incorporates allele-specific read counts in CNV detection and estimates ASCN using either WGS or WES data. To evaluate the performance of AS-GENSENG, we conducted extensive simulations, generated empirical data using existing WGS and WES data sets and validated predicted CNVs using an independent methodology. We conclude that AS-GENSENG not only predicts accurate ASCN calls but also improves the accuracy of total copy number calls, owing to its unique ability to exploit information from both total and allele-specific read counts while accounting for various experimental biases in sequence data. Our novel, user-friendly and computationally efficient method and a complete analytic protocol are freely available at https://sourceforge.net/projects/asgenseng/.

12.

qpure: A Tool to Estimate Tumor Cellularity from Genome-Wide Single-Nucleotide Polymorphism Profiles

Sarah Song Katia Nones David Miller Ivon Harliwong Karin S. Kassahn Mark Pinese Marina Pajic Anthony J. Gill Amber L. Johns Matthew Anderson Oliver Holmes Conrad Leonard Darrin Taylor Scott Wood Qinying Xu Felicity Newell Mark J. Cowley Jianmin Wu Peter Wilson Lynn Fink Andrew V. Biankin Nic Waddell Sean M. Grimmond John V. Pearson 《PloS one》2012,7(9)

Tumour cellularity, the relative proportion of tumour and normal cells in a sample, affects the sensitivity of mutation detection, copy number analysis, cancer gene expression and methylation profiling. Tumour cellularity is traditionally estimated by pathological review of sectioned specimens; however, this method is both subjective and prone to error due to heterogeneity within lesions and cellularity differences between the sample viewed during pathological review and the tissue used for research purposes. In this paper we describe a statistical model to estimate tumour cellularity from SNP array profiles of paired tumour and normal samples using shifts in SNP allele frequency at regions of loss of heterozygosity (LOH) in the tumour. We also provide qpure, a software implementation of the method. Our experiments showed a moderate correlation of 0.42 (p-value = 0.0001) between tumour cellularity estimated by qpure and pathology review. Interestingly, there is a high correlation of 0.87 (p-value 2.2e-16) between cellularity estimates by qpure and deep Ion Torrent sequencing of known somatic KRAS mutations, and a weaker correlation of 0.32 (p-value = 0.004) between Ion Torrent sequencing and pathology review. This suggests that qpure may be a more accurate predictor of tumour cellularity than pathology review. qpure can be downloaded from https://sourceforge.net/projects/qpure/.

13.

Metagenomic Profiling of Known and Unknown Microbes with MicrobeGPS

Martin S. Lindner Bernhard Y. Renard 《PloS one》2015,10(2)

Microbial community profiling identifies and quantifies organisms in metagenomic sequencing data using either reference based or unsupervised approaches. However, current reference based profiling methods only report the presence and abundance of single reference genomes that are available in databases. Since only a small fraction of environmental genomes is represented in genomic databases, these approaches entail the risk of false identifications and often suggest a higher precision than justified by the data. Therefore, we developed MicrobeGPS, a novel metagenomic profiling approach that overcomes these limitations. MicrobeGPS is the first method that identifies microbiota in the sample and estimates their genomic distances to known reference genomes. With this strategy, MicrobeGPS identifies organisms down to the strain level and highlights possibly inaccurate identifications when the correct reference genome is missing. We demonstrate on three metagenomic datasets with different origin that our approach successfully avoids misleading interpretation of results and additionally provides more accurate results than current profiling methods. Our results indicate that MicrobeGPS can enable reference based taxonomic profiling of complex and less characterized microbial communities. MicrobeGPS is open source and available from https://sourceforge.net/projects/microbegps/ as source code and binary distribution for Windows and Linux operating systems.

14.

Proteome-wide profiling of carbonylated proteins and carbonylation sites in HeLa cells under mild oxidative stress conditions

《Free radical biology & medicine》2014

A number of oxidative protein modifications have been well characterized during the past decade. Presumably, reversible oxidative posttranslational modifications (PTMs) play a significant role in redox signaling pathways, whereas irreversible modifications including reactive protein carbonyl groups are harmful, as their levels are typically increased during aging and in certain diseases. Despite compelling evidence linking protein carbonylation to numerous disorders, the underlying molecular mechanisms at the proteome remain to be identified. Recent advancements in analysis of PTMs by mass spectrometry provided new insights into the mechanisms of protein carbonylation, such as protein susceptibility and exact modification sites, but only for a limited number of proteins. Here we report the first proteome-wide study of carbonylated proteins including modification sites in HeLa cells for mild oxidative stress conditions. The analysis relied on our recent strategy utilizing mass spectrometry-based enrichment of carbonylated peptides after DNPH derivatization. Thus a total of 210 carbonylated proteins containing 643 carbonylation sites were consistently identified in three replicates. Most carbonylation sites (284, 44.2%) resulted from oxidation of lysine residues (aminoadipic semialdehyde). Additionally, 121 arginine (18.8%), 121 threonine (18.8%), and 117 proline residues (18.2%) were oxidized to reactive carbonyls. The sequence motifs were significantly enriched for lysine and arginine residues near carbonylation sites (±10 residues). Gene Ontology analysis revealed that 80% of the carbonylated proteins originated from organelles, 50% enrichment of which was demonstrated for the nucleus. Moreover, functional interactions between carbonylated proteins of kinetochore/spindle machinery and centrosome organization were significantly enriched. One-third of the 210 carbonylated proteins identified here are regulated during apoptosis.

15.

BioSWR – Semantic Web Services Registry for Bioinformatics

Dmitry Repchevsky Josep Ll. Gelpi 《PloS one》2014,9(9)


16.

ADSBET2: Automated Determination of Salt-Bridge Energy-Terms version 2

Arnab Nayek Parth Sarthi Sen Gupta Shyamashree Banerjee Vishma Pratap Sur Pratyay Seth Sunit Das Rifat Nawaz Ul Islam Amal Kumar Bandyopadhyay 《Bioinformation》2015,11(8):413-415

Availability: ADSBET2 is freely available at http://sourceforge.net/projects/ADSBET2/ for all users.

17.

AutoAssemblyD: a graphical user interface system for several genome assemblers

Adonney Allan de Oliveira Veras Pablo Henrique Caracciolo Gomes de Sá Vasco Azevedo Artur Silva Rommel Thiago Jucá Ramos 《Bioinformation》2013,9(16):840-841

Next-generation sequencing technologies have increased the amount of biological data generated. Thus, bioinformatics has become important because new methods and algorithms are necessary to manipulate and process such data. However, certain challenges have emerged, such as genome assembly using short reads and high-throughput platforms. In this context, several algorithms have been developed, such as Velvet, Abyss, Euler-SR, Mira, Edna, Maq, SHRiMP, Newbler, ALLPATHS, Bowtie and BWA. However, most such assemblers do not have a graphical interface, which makes their use difficult for users without computing experience given the complexity of the assembler syntax. Thus, to make the operation of such assemblers accessible to users without a computing background, we developed AutoAssemblyD, which is a graphical tool for genome assembly submission and remote management by multiple assemblers through XML templates.

Availability

AssemblyD is freely available at https://sourceforge.net/projects/autoassemblyd. It requires Sun JDK 6 or higher.

18.

PLEK: a tool for predicting long non-coding RNAs and messenger RNAs based on an improved k-mer scheme

Aimin Li Junying Zhang Zhongyin Zhou 《BMC bioinformatics》2014,15(1)


19.

EXCAVATOR: detecting copy number variants from whole-exome sequencing data

Alberto Magi Lorenzo Tattini Ingrid Cifola Romina D’Aurizio Matteo Benelli Eleonora Mangano Cristina Battaglia Elena Bonora Ants Kurg Marco Seri Pamela Magini Betti Giusti Giovanni Romeo Tommaso Pippucci Gianluca De Bellis Rosanna Abbate Gian Franco Gensini 《Genome biology》2013,14(10):R120

We developed a novel software tool, EXCAVATOR, for the detection of copy number variants (CNVs) from whole-exome sequencing data. EXCAVATOR combines a three-step normalization procedure with a novel heterogeneous hidden Markov model algorithm and a calling method that classifies genomic regions into five copy number states. We validate EXCAVATOR on three datasets and compare the results with three other methods. These analyses show that EXCAVATOR outperforms the other methods and is therefore a valuable tool for the investigation of CNVs in large-scale projects, as well as in clinical research and diagnostics. EXCAVATOR is freely available at http://sourceforge.net/projects/excavatortool/.

20.

SBION2: Analyses of Salt Bridges from Multiple Structure Files, Version 2

Parth Sarthi Sen Gupta Arnab Nayek Shyamashree Banerjee Pratyay Seth Sunit Das Vishma Pratap Sur Chittran Roy Amal Kumar Bandyopadhyay 《Bioinformation》2015,11(1):39-42

Availability: SBION2 is freely available at http://sourceforge.net/projects/sbion2/ for academic users.