期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Compression of FASTQ and SAM Format Sequencing Data

James K. Bonfield Matthew V. Mahoney 《PloS one》2013,8(3)

Storage and transmission of the data produced by modern DNA sequencing instruments has become a major concern, which prompted the Pistoia Alliance to pose the SequenceSqueeze contest for compression of FASTQ files. We present several compression entries from the competition, Fastqz and Samcomp/Fqzcomp, including the winning entry. These are compared against existing algorithms for both reference based compression (CRAM, Goby) and non-reference based compression (DSRC, BAM) and other recently published competition entries (Quip, SCALCE). The tools are shown to be the new Pareto frontier for FASTQ compression, offering state of the art ratios at affordable CPU costs. All programs are freely available on SourceForge. Fastqz: https://sourceforge.net/projects/fastqz/, fqzcomp: https://sourceforge.net/projects/fqzcomp/, and samcomp: https://sourceforge.net/projects/samcomp/. 相似文献

2.

cnvHiTSeq: integrative models for high-resolution copy number variation detection and genotyping using population sequencing data

Evangelos Bellos Michael R Johnson Lachlan J M Coin 《Genome biology》2012,13(12):R120

Recent advances in sequencing technologies provide the means for identifying copy number variation (CNV) at an unprecedented resolution. A single next-generation sequencing experiment offers several features that can be used to detect CNV, yet current methods do not incorporate all available signatures into a unified model. cnvHiTSeq is an integrative probabilistic method for CNV discovery and genotyping that jointly analyzes multiple features at the population level. By combining evidence from complementary sources, cnvHiTSeq achieves high genotyping accuracy and a substantial improvement in CNV detection sensitivity over existing methods, while maintaining a low false discovery rate. cnvHiTSeq is available at http://sourceforge.net/projects/cnvhitseq 相似文献

3.

Piecewise Polynomial Representations of Genomic Tracks

Maxime Tarabichi Vincent Detours Tomasz Konopka 《PloS one》2012,7(11)

相似文献

4.

Compression of Large genomic datasets using COMRAD on Parallel Computing Platform

Christopher Leela Biji Manu K Madhu Vineetha Vishnu Satheesh Kumar K Vijayakumar Achuthsankar S Nair 《Bioinformation》2015,11(5):267-271

The big data storage is a challenge in a post genome era. Hence, there is a need for high performance computing solutions for managing large genomic data. Therefore, it is of interest to describe a parallel-computing approach using message-passing library for distributing the different compression stages in clusters. The genomic compression helps to reduce the on disk“foot print” of large data volumes of sequences. This supports the computational infrastructure for a more efficient archiving. The approach was shown to find utility in 21 Eukaryotic genomes using stratified sampling in this report. The method achieves an average of 6-fold disk space reduction with three times better compression time than COMRAD.

Availability

The source codes are written in C using message passing libraries and are available at https:// sourceforge.net/ projects/ comradmpi/files / COMRADMPI/ 相似文献

5.

NESmapper: Accurate Prediction of Leucine-Rich Nuclear Export Signals Using Activity-Based Profiles

Shunichi Kosugi Hiroshi Yanagawa Ryohei Terauchi Satoshi Tabata 《PLoS computational biology》2014,10(9)

The nuclear export of proteins is regulated largely through the exportin/CRM1 pathway, which involves the specific recognition of leucine-rich nuclear export signals (NESs) in the cargo proteins, and modulates nuclear–cytoplasmic protein shuttling by antagonizing the nuclear import activity mediated by importins and the nuclear import signal (NLS). Although the prediction of NESs can help to define proteins that undergo regulated nuclear export, current methods of predicting NESs, including computational tools and consensus-sequence-based searches, have limited accuracy, especially in terms of their specificity. We found that each residue within an NES largely contributes independently and additively to the entire nuclear export activity. We created activity-based profiles of all classes of NESs with a comprehensive mutational analysis in mammalian cells. The profiles highlight a number of specific activity-affecting residues not only at the conserved hydrophobic positions but also in the linker and flanking regions. We then developed a computational tool, NESmapper, to predict NESs by using profiles that had been further optimized by training and combining the amino acid properties of the NES-flanking regions. This tool successfully reduced the considerable number of false positives, and the overall prediction accuracy was higher than that of other methods, including NESsential and Wregex. This profile-based prediction strategy is a reliable way to identify functional protein motifs. NESmapper is available at http://sourceforge.net/projects/nesmapper.

This is a PLOS Computational Biology Software Article

相似文献

6.

PHYSICO2: an UNIX based standalone procedure for computation of physicochemical,window-dependent and substitution based evolutionary properties of protein sequences along with automated block preparation tool,version 2

Shyamashree Banerjee Parth Sarthi Sen Gupta Arnab Nayek Sunit Das Vishma Pratap Sur Pratyay Seth Rifat Nawaz Ul Islam Amal K Bandyopadhyay 《Bioinformation》2015,11(7):366-368

AvailabilityPHYSICO2: is freely available at http://sourceforge.net/projects/physico2/ along with its documentation at https://sourceforge.net/projects/physico2/files/Documentation.pdf/download for all users. 相似文献

7.

qpure: A Tool to Estimate Tumor Cellularity from Genome-Wide Single-Nucleotide Polymorphism Profiles

Sarah Song Katia Nones David Miller Ivon Harliwong Karin S. Kassahn Mark Pinese Marina Pajic Anthony J. Gill Amber L. Johns Matthew Anderson Oliver Holmes Conrad Leonard Darrin Taylor Scott Wood Qinying Xu Felicity Newell Mark J. Cowley Jianmin Wu Peter Wilson Lynn Fink Andrew V. Biankin Nic Waddell Sean M. Grimmond John V. Pearson 《PloS one》2012,7(9)

Tumour cellularity, the relative proportion of tumour and normal cells in a sample, affects the sensitivity of mutation detection, copy number analysis, cancer gene expression and methylation profiling. Tumour cellularity is traditionally estimated by pathological review of sectioned specimens; however this method is both subjective and prone to error due to heterogeneity within lesions and cellularity differences between the sample viewed during pathological review and tissue used for research purposes. In this paper we describe a statistical model to estimate tumour cellularity from SNP array profiles of paired tumour and normal samples using shifts in SNP allele frequency at regions of loss of heterozygosity (LOH) in the tumour. We also provide qpure, a software implementation of the method. Our experiments showed that there is a medium correlation 0.42 (-value = 0.0001) between tumor cellularity estimated by qpure and pathology review. Interestingly there is a high correlation 0.87 (-value 2.2e-16) between cellularity estimates by qpure and deep Ion Torrent sequencing of known somatic KRAS mutations; and a weaker correlation 0.32 (-value = 0.004) between IonTorrent sequencing and pathology review. This suggests that qpure may be a more accurate predictor of tumour cellularity than pathology review. qpure can be downloaded from https://sourceforge.net/projects/qpure/. 相似文献

8.

BinPacker: Packing-Based De Novo Transcriptome Assembly from RNA-seq Data

Juntao Liu Guojun Li Zheng Chang Ting Yu Bingqiang Liu Rick McMullen Pengyin Chen Xiuzhen Huang 《PLoS computational biology》2016,12(2)

相似文献

9.

SEWAL: an open-source platform for next-generation sequence analysis and visualization

Jason N. Pitt Indika Rajapakse Adrian R. Ferré-D’Amaré 《Nucleic acids research》2010,38(22):7908-7915

Next-generation DNA sequencing platforms provide exciting new possibilities for in vitro genetic analysis of functional nucleic acids. However, the size of the resulting data sets presents computational and analytical challenges. We present an open-source software package that employs a locality-sensitive hashing algorithm to enumerate all unique sequences in an entire Illumina sequencing run (∼10⁸ sequences). The algorithm results in quasilinear time processing of entire Illumina lanes (∼10⁷ sequences) on a desktop computer in minutes. To facilitate visual analysis of sequencing data, the software produces three-dimensional scatter plots similar in concept to Sewall Wright and John Maynard Smith’s adaptive or fitness landscape. The software also contains functions that are particularly useful for doped selections such as mutation frequency analysis, information content calculation, multivariate statistical functions (including principal component analysis), sequence distance metrics, sequence searches and sequence comparisons across multiple Illumina data sets. Source code, executable files and links to sample data sets are available at http://www.sourceforge.net/projects/sewal. 相似文献

10.

ADSBET2: Automated Determination of Salt-Bridge Energy-Terms version 2

Arnab Nayek Parth Sarthi Sen Gupta Shyamashree Banerjee Vishma Pratap Sur Pratyay Seth Sunit Das Rifat Nawaz Ul Islam Amal Kumar Bandyopadhyay 《Bioinformation》2015,11(8):413-415

AvailabilityADSBET2 is freely available at http://sourceforge.net/projects/ADSBET2/ for all users. 相似文献

11.

LoFreq: a sequence-quality aware,ultra-sensitive variant caller for uncovering cell-population heterogeneity from high-throughput sequencing datasets

Andreas Wilm Pauline Poh Kim Aw Denis Bertrand Grace Hui Ting Yeo Swee Hoe Ong Chang Hua Wong Chiea Chuen Khor Rosemary Petric Martin Lloyd Hibberd Niranjan Nagarajan 《Nucleic acids research》2012,40(22):11189-11201

The study of cell-population heterogeneity in a range of biological systems, from viruses to bacterial isolates to tumor samples, has been transformed by recent advances in sequencing throughput. While the high-coverage afforded can be used, in principle, to identify very rare variants in a population, existing ad hoc approaches frequently fail to distinguish true variants from sequencing errors. We report a method (LoFreq) that models sequencing run-specific error rates to accurately call variants occurring in <0.05% of a population. Using simulated and real datasets (viral, bacterial and human), we show that LoFreq has near-perfect specificity, with significantly improved sensitivity compared with existing methods and can efficiently analyze deep Illumina sequencing datasets without resorting to approximations or heuristics. We also present experimental validation for LoFreq on two different platforms (Fluidigm and Sequenom) and its application to call rare somatic variants from exome sequencing datasets for gastric cancer. Source code and executables for LoFreq are freely available at http://sourceforge.net/projects/lofreq/. 相似文献

12.

Detection of Homologous Recombination Events in Bacterial Genomes

Wei-Bung Wang Tao Jiang Shea Gardner 《PloS one》2013,8(10)

We study the detection of mutations, sequencing errors, and homologous recombination events (HREs) in a set of closely related microbial genomes. We base the model on single nucleotide polymorphisms (SNPs) and break the genomes into blocks to handle the rearrangement problem. Then we apply a dynamic programming algorithm to model whether changes within each block are likely a result of mutations, sequencing errors, or HREs. Results from simulation experiments show that we can detect 31%–61% of HREs and the precision of our detection is about 48%–90% depending on the rates of mutation and missing data. The HREfinder software for predicting HREs in a set of whole genomes is available as open source (http://sourceforge.net/projects/hrefinder/). 相似文献

13.

SBION2: Analyses of Salt Bridges from Multiple Structure Files,Version 2

Parth Sarthi Sen Gupta Arnab Nayek Shyamashree Banerjee Pratyay Seth Sunit Das Vishma Pratap Sur Chittran Roy Amal Kumar Bandyopadhyay 《Bioinformation》2015,11(1):39-42

AvailabilitySBION2 is freely available at http://sourceforge.net/projects/sbion2/ for academic users 相似文献

14.

An improved Greengenes taxonomy with explicit ranks for ecological and evolutionary analyses of bacteria and archaea 总被引：1，自引：0，他引：1

Daniel McDonald Morgan N Price Julia Goodrich Eric P Nawrocki Todd Z DeSantis Alexander Probst Gary L Andersen Rob Knight Philip Hugenholtz 《The ISME journal》2012,6(3):610-618

Reference phylogenies are crucial for providing a taxonomic framework for interpretation of marker gene and metagenomic surveys, which continue to reveal novel species at a remarkable rate. Greengenes is a dedicated full-length 16S rRNA gene database that provides users with a curated taxonomy based on de novo tree inference. We developed a ‘taxonomy to tree'' approach for transferring group names from an existing taxonomy to a tree topology, and used it to apply the Greengenes, National Center for Biotechnology Information (NCBI) and cyanoDB (Cyanobacteria only) taxonomies to a de novo tree comprising 408 315 sequences. We also incorporated explicit rank information provided by the NCBI taxonomy to group names (by prefixing rank designations) for better user orientation and classification consistency. The resulting merged taxonomy improved the classification of 75% of the sequences by one or more ranks relative to the original NCBI taxonomy with the most pronounced improvements occurring in under-classified environmental sequences. We also assessed candidate phyla (divisions) currently defined by NCBI and present recommendations for consolidation of 34 redundantly named groups. All intermediate results from the pipeline, which includes tree inference, jackknifing and transfer of a donor taxonomy to a recipient tree (tax2tree) are available for download. The improved Greengenes taxonomy should provide important infrastructure for a wide range of megasequencing projects studying ecosystems on scales ranging from our own bodies (the Human Microbiome Project) to the entire planet (the Earth Microbiome Project). The implementation of the software can be obtained from http://sourceforge.net/projects/tax2tree/. 相似文献

15.

detectIR: A Novel Program for Detecting Perfect and Imperfect Inverted Repeats Using Complex Numbers and Vector Calculation

Congting Ye Guoli Ji Lei Li Chun Liang 《PloS one》2014,9(11)

Inverted repeats are present in abundance in both prokaryotic and eukaryotic genomes and can form DNA secondary structures – hairpins and cruciforms that are involved in many important biological processes. Bioinformatics tools for efficient and accurate detection of inverted repeats are desirable, because existing tools are often less accurate and time consuming, sometimes incapable of dealing with genome-scale input data. Here, we present a MATLAB-based program called detectIR for the perfect and imperfect inverted repeat detection that utilizes complex numbers and vector calculation and allows genome-scale data inputs. A novel algorithm is adopted in detectIR to convert the conventional sequence string comparison in inverted repeat detection into vector calculation of complex numbers, allowing non-complementary pairs (mismatches) in the pairing stem and a non-palindromic spacer (loop or gaps) in the middle of inverted repeats. Compared with existing popular tools, our program performs with significantly higher accuracy and efficiency. Using genome sequence data from HIV-1, Arabidopsis thaliana, Homo sapiens and Zea mays for comparison, detectIR can find lots of inverted repeats missed by existing tools whose outputs often contain many invalid cases. detectIR is open source and its source code is freely available at: https://sourceforge.net/projects/detectir. 相似文献

16.

Bridger: a new framework for de novo transcriptome assembly using RNA-seq data

Zheng Chang Guojun Li Juntao Liu Yu Zhang Cody Ashby Deli Liu Carole L Cramer Xiuzhen Huang 《Genome biology》2015,16(1)

相似文献

17.

Mobster: accurate detection of mobile element insertions in next generation sequencing data

Djie Tjwan Thung Joep de Ligt Lisenka EM Vissers Marloes Steehouwer Mark Kroon Petra de Vries Eline P Slagboom Kai Ye Joris A Veltman Jayne Y Hehir-Kwa 《Genome biology》2014,15(10)

Mobile elements are major drivers in changing genomic architecture and can cause disease. The detection of mobile elements is hindered due to the low mappability of their highly repetitive sequences. We have developed an algorithm, called Mobster, to detect non-reference mobile element insertions in next generation sequencing data from both whole genome and whole exome studies. Mobster uses discordant read pairs and clipped reads in combination with consensus sequences of known active mobile elements. Mobster has a low false discovery rate and high recall rate for both L1 and Alu elements. Mobster is available at http://sourceforge.net/projects/mobster.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-014-0488-x) contains supplementary material, which is available to authorized users. 相似文献

18.

High Resolution Systematic Digital Histological Quantification of Cardiac Fibrosis and Adipose Tissue in Phospholamban p.Arg14del Mutation Associated Cardiomyopathy

Johannes M. I. H. Gho René van Es Nikolas Stathonikos Magdalena Harakalova Wouter P. te Rijdt Albert J. H. Suurmeijer Jeroen F. van der Heijden Nicolaas de Jonge Steven A. J. Chamuleau Roel A. de Weger Folkert W. Asselbergs Aryan Vink 《PloS one》2014,9(4)

相似文献

19.

Multiple Co-Evolutionary Networks Are Supported by the Common Tertiary Scaffold of the LacI/GalR Proteins

Daniel J. Parente Liskin Swint-Kruse 《PloS one》2013,8(12)

相似文献

20.

FastUniq: A Fast De Novo Duplicates Removal Tool for Paired Short Reads

Haibin Xu Xiang Luo Jun Qian Xiaohui Pang Jingyuan Song Guangrui Qian Jinhui Chen Shilin Chen 《PloS one》2012,7(12)

The presence of duplicates introduced by PCR amplification is a major issue in paired short reads from next-generation sequencing platforms. These duplicates might have a serious impact on research applications, such as scaffolding in whole-genome sequencing and discovering large-scale genome variations, and are usually removed. We present FastUniq as a fast de novo tool for removal of duplicates in paired short reads. FastUniq identifies duplicates by comparing sequences between read pairs and does not require complete genome sequences as prerequisites. FastUniq is capable of simultaneously handling reads with different lengths and results in highly efficient running time, which increases linearly at an average speed of 87 million reads per 10 minutes. FastUniq is freely available at http://sourceforge.net/projects/fastuniq/. 相似文献