首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
We describe methclone, a novel method to identify epigenetic loci that harbor large changes in the clonality of their epialleles (epigenetic alleles). Methclone efficiently analyzes genome-wide DNA methylation sequencing data. We quantify the changes using a composition entropy difference calculation and also introduce a new measure of global clonality shift, loci with epiallele shift per million loci covered, which enables comparisons between different samples to gauge overall epiallelic dynamics. Finally, we demonstrate the utility of methclone in capturing functional epiallele shifts in leukemia patients from diagnosis to relapse. Methclone is open-source and freely available at https://code.google.com/p/methclone.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-014-0472-5) contains supplementary material, which is available to authorized users.  相似文献   

2.
It is computationally challenging to detect variation by aligning single-molecule sequencing (SMS) reads, or contigs from SMS assemblies. One approach to efficiently align SMS reads is sparse dynamic programming (SDP), where optimal chains of exact matches are found between the sequence and the genome. While straightforward implementations of SDP penalize gaps with a cost that is a linear function of gap length, biological variation is more accurately represented when gap cost is a concave function of gap length. We have developed a method, lra, that uses SDP with a concave-cost gap penalty, and used lra to align long-read sequences from PacBio and Oxford Nanopore (ONT) instruments as well as de novo assembly contigs. This alignment approach increases sensitivity and specificity for SV discovery, particularly for variants above 1kb and when discovering variation from ONT reads, while having runtime that are comparable (1.05-3.76×) to current methods. When applied to calling variation from de novo assembly contigs, there is a 3.2% increase in Truvari F1 score compared to minimap2+htsbox. lra is available in bioconda (https://anaconda.org/bioconda/lra) and github (https://github.com/ChaissonLab/LRA).  相似文献   

3.
Gene assembly, which recovers gene segments from short reads, is an important step in functional analysis of next-generation sequencing data. Lacking quality reference genomes, de novo assembly is commonly used for RNA-Seq data of non-model organisms and metagenomic data. However, heterogeneous sequence coverage caused by heterogeneous expression or species abundance, similarity between isoforms or homologous genes, and large data size all pose challenges to de novo assembly. As a result, existing assembly tools tend to output fragmented contigs or chimeric contigs, or have high memory footprint. In this work, we introduce a targeted gene assembly program SAT-Assembler, which aims to recover gene families of particular interest to biologists. It addresses the above challenges by conducting family-specific homology search, homology-guided overlap graph construction, and careful graph traversal. It can be applied to both RNA-Seq and metagenomic data. Our experimental results on an Arabidopsis RNA-Seq data set and two metagenomic data sets show that SAT-Assembler has smaller memory usage, comparable or better gene coverage, and lower chimera rate for assembling a set of genes from one or multiple pathways compared with other assembly tools. Moreover, the family-specific design and rapid homology search allow SAT-Assembler to be naturally compatible with parallel computing platforms. The source code of SAT-Assembler is available at https://sourceforge.net/projects/sat-assembler/. The data sets and experimental settings can be found in supplementary material.  相似文献   

4.
PathVisio is a commonly used pathway editor, visualization and analysis software. Biological pathways have been used by biologists for many years to describe the detailed steps in biological processes. Those powerful, visual representations help researchers to better understand, share and discuss knowledge. Since the first publication of PathVisio in 2008, the original paper was cited more than 170 times and PathVisio was used in many different biological studies. As an online editor PathVisio is also integrated in the community curated pathway database WikiPathways.Here we present the third version of PathVisio with the newest additions and improvements of the application. The core features of PathVisio are pathway drawing, advanced data visualization and pathway statistics. Additionally, PathVisio 3 introduces a new powerful extension systems that allows other developers to contribute additional functionality in form of plugins without changing the core application.PathVisio can be downloaded from http://www.pathvisio.org and in 2014 PathVisio 3 has been downloaded over 5,500 times. There are already more than 15 plugins available in the central plugin repository. PathVisio is a freely available, open-source tool published under the Apache 2.0 license (http://www.apache.org/licenses/LICENSE-2.0). It is implemented in Java and thus runs on all major operating systems. The code repository is available at http://svn.bigcat.unimaas.nl/pathvisio. The support mailing list for users is available on https://groups.google.com/forum/#!forum/wikipathways-discuss and for developers on https://groups.google.com/forum/#!forum/wikipathways-devel.
This is a PLOS Computational Biology software article.
  相似文献   

5.
6.
Bisulfite sequencing (BS-seq) is the gold standard for studying genome-wide DNA methylation. We developed MOABS to increase the speed, accuracy, statistical power and biological relevance of BS-seq data analysis. MOABS detects differential methylation with 10-fold coverage at single-CpG resolution based on a Beta-Binomial hierarchical model and is capable of processing two billion reads in 24 CPU hours. Here, using simulated and real BS-seq data, we demonstrate that MOABS outperforms other leading algorithms, such as Fisher’s exact test and BSmooth. Furthermore, MOABS analysis can be easily extended to differential 5hmC analysis using RRBS and oxBS-seq. MOABS is available at http://code.google.com/p/moabs/.  相似文献   

7.
DNA methylation is a chemical modification of cytosine bases that is pivotal for gene regulation, cellular specification and cancer development. Here, we describe an R package, methylKit, that rapidly analyzes genome-wide cytosine epigenetic profiles from high-throughput methylation and hydroxymethylation sequencing experiments. methylKit includes functions for clustering, sample quality visualization, differential methylation analysis and annotation features, thus automating and simplifying many of the steps for discerning statistically significant bases or regions of DNA methylation. Finally, we demonstrate methylKit on breast cancer data, in which we find statistically significant regions of differential methylation and stratify tumor subtypes. methylKit is available at http://code.google.com/p/methylkit.  相似文献   

8.
Scaffolding, i.e. ordering and orienting contigs is an important step in genome assembly. We present a method for scaffolding using second generation sequencing reads based on likelihoods of genome assemblies. A generative model for sequencing is used to obtain maximum likelihood estimates of gaps between contigs and to estimate whether linking contigs into scaffolds would lead to an increase in the likelihood of the assembly. We then link contigs if they can be unambiguously joined or if the corresponding increase in likelihood is substantially greater than that of other possible joins of those contigs. The method is implemented in a tool called Swalo with approximations to make it efficient and applicable to large datasets. Analysis on real and simulated datasets reveals that it consistently makes more or similar number of correct joins as other scaffolders while linking very few contigs incorrectly, thus outperforming other scaffolders and demonstrating that substantial improvement in genome assembly may be achieved through the use of statistical models. Swalo is freely available for download at https://atifrahman.github.io/SWALO/.  相似文献   

9.
Clustering analysis is an important tool in studying gene expression data. The Bayesian hierarchical clustering (BHC) algorithm can automatically infer the number of clusters and uses Bayesian model selection to improve clustering quality. In this paper, we present an extension of the BHC algorithm. Our Gaussian BHC (GBHC) algorithm represents data as a mixture of Gaussian distributions. It uses normal-gamma distribution as a conjugate prior on the mean and precision of each of the Gaussian components. We tested GBHC over 11 cancer and 3 synthetic datasets. The results on cancer datasets show that in sample clustering, GBHC on average produces a clustering partition that is more concordant with the ground truth than those obtained from other commonly used algorithms. Furthermore, GBHC frequently infers the number of clusters that is often close to the ground truth. In gene clustering, GBHC also produces a clustering partition that is more biologically plausible than several other state-of-the-art methods. This suggests GBHC as an alternative tool for studying gene expression data.The implementation of GBHC is available at https://sites.google.com/site/gaussianbhc/  相似文献   

10.
When working on an ongoing genome sequencing and assembly project, it is rather inconvenient when gene identifiers change from one build of the assembly to the next. The gene labelling system described here, UniqTag, addresses this common challenge. UniqTag assigns a unique identifier to each gene that is a representative k-mer, a string of length k, selected from the sequence of that gene. Unlike serial numbers, these identifiers are stable between different assemblies and annotations of the same data without requiring that previous annotations be lifted over by sequence alignment. We assign UniqTag identifiers to ten builds of the Ensembl human genome spanning eight years to demonstrate this stability. The implementation of UniqTag in Ruby and an R package are available at https://github.com/sjackman/uniqtag sjackman/uniqtag. The R package is also available from CRAN: install.packages ("uniqtag"). Supplementary material and code to reproduce it is available at https://github.com/sjackman/uniqtag-paper.  相似文献   

11.
12.
13.
Many tumors are composed of genetically divergent cell subpopulations. We report SubcloneSeeker, a package capable of exhaustive identification of subclone structures and evolutionary histories with bulk somatic variant allele frequency measurements from tumor biopsies. We present a statistical framework to elucidate whether specific sets of mutations are present within the same subclones, and the order in which they occur. We demonstrate how subclone reconstruction provides crucial information about tumorigenesis and relapse mechanisms; guides functional study by variant prioritization, and has the potential as a rational basis for informed therapeutic strategies for the patient. SubcloneSeeker is available at: https://github.com/yiq/SubcloneSeeker.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-014-0443-x) contains supplementary material, which is available to authorized users.  相似文献   

14.

Background

Next generation sequencing technology has allowed efficient production of draft genomes for many organisms of interest. However, most draft genomes are just collections of independent contigs, whose relative positions and orientations along the genome being sequenced are unknown. Although several tools have been developed to order and orient the contigs of draft genomes, more accurate tools are still needed.

Results

In this study, we present a novel reference-based contig assembly (or scaffolding) tool, named as CAR, that can efficiently and more accurately order and orient the contigs of a prokaryotic draft genome based on a reference genome of a related organism. Given a set of contigs in multi-FASTA format and a reference genome in FASTA format, CAR can output a list of scaffolds, each of which is a set of ordered and oriented contigs. For validation, we have tested CAR on a real dataset composed of several prokaryotic genomes and also compared its performance with several other reference-based contig assembly tools. Consequently, our experimental results have shown that CAR indeed performs better than all these other reference-based contig assembly tools in terms of sensitivity, precision and genome coverage.

Conclusions

CAR serves as an efficient tool that can more accurately order and orient the contigs of a prokaryotic draft genome based on a reference genome. The web server of CAR is freely available at http://genome.cs.nthu.edu.tw/CAR/ and its stand-alone program can also be downloaded from the same website.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0381-3) contains supplementary material, which is available to authorized users.  相似文献   

15.
MiRNAs play important roles in many diseases including cancers. However computational prediction of miRNA target genes is challenging and the accuracies of existing methods remain poor. We report mirMark, a new machine learning-based method of miRNA target prediction at the site and UTR levels. This method uses experimentally verified miRNA targets from miRecords and mirTarBase as training sets and considers over 700 features. By combining Correlation-based Feature Selection with a variety of statistical or machine learning methods for the site- and UTR-level classifiers, mirMark significantly improves the overall predictive performance compared to existing publicly available methods. MirMark is available from https://github.com/lanagarmire/MirMark.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-014-0500-5) contains supplementary material, which is available to authorized users.  相似文献   

16.
Protein designers use a wide variety of software tools for de novo design, yet their repertoire still lacks a fast and interactive all-atom search engine. To solve this, we have built the Suns program: a real-time, atomic search engine integrated into the PyMOL molecular visualization system. Users build atomic-level structural search queries within PyMOL and receive a stream of search results aligned to their query within a few seconds. This instant feedback cycle enables a new “designability”-inspired approach to protein design where the designer searches for and interactively incorporates native-like fragments from proven protein structures. We demonstrate the use of Suns to interactively build protein motifs, tertiary interactions, and to identify scaffolds compatible with hot-spot residues. The official web site and installer are located at http://www.degradolab.org/suns/ and the source code is hosted at https://github.com/godotgildor/Suns (PyMOL plugin, BSD license), https://github.com/Gabriel439/suns-cmd (command line client, BSD license), and https://github.com/Gabriel439/suns-search (search engine server, GPLv2 license).
This is a PLOS Computational Biology Software Article
  相似文献   

17.
Tumors often contain multiple subpopulations of cancerous cells defined by distinct somatic mutations. We describe a new method, PhyloWGS, which can be applied to whole-genome sequencing data from one or more tumor samples to reconstruct complete genotypes of these subpopulations based on variant allele frequencies (VAFs) of point mutations and population frequencies of structural variations. We introduce a principled phylogenic correction for VAFs in loci affected by copy number alterations and we show that this correction greatly improves subclonal reconstruction compared to existing methods. PhyloWGS is free, open-source software, available at https://github.com/morrislab/phylowgs.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-015-0602-8) contains supplementary material, which is available to authorized users.  相似文献   

18.
We describe MetAMOS, an open source and modular metagenomic assembly and analysis pipeline. MetAMOS represents an important step towards fully automated metagenomic analysis, starting with next-generation sequencing reads and producing genomic scaffolds, open-reading frames and taxonomic or functional annotations. MetAMOS can aid in reducing assembly errors, commonly encountered when assembling metagenomic samples, and improves taxonomic assignment accuracy while also reducing computational cost. MetAMOS can be downloaded from: https://github.com/treangen/MetAMOS.  相似文献   

19.
We describe an open-source kPAL package that facilitates an alignment-free assessment of the quality and comparability of sequencing datasets by analyzing k-mer frequencies. We show that kPAL can detect technical artefacts such as high duplication rates, library chimeras, contamination and differences in library preparation protocols. kPAL also successfully captures the complexity and diversity of microbiomes and provides a powerful means to study changes in microbial communities. Together, these features make kPAL an attractive and broadly applicable tool to determine the quality and comparability of sequence libraries even in the absence of a reference sequence. kPAL is freely available at https://github.com/LUMC/kPAL.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-014-0555-3) contains supplementary material, which is available to authorized users.  相似文献   

20.
RNA-binding proteins (RBPs) bind to their target RNA molecules by recognizing specific RNA sequences and structural contexts. The development of CLIP-seq and related protocols has made it possible to exhaustively identify RNA fragments that bind to RBPs. However, no efficient bioinformatics method exists to reveal the structural specificities of RBP–RNA interactions using these data. We present CapR, an efficient algorithm that calculates the probability that each RNA base position is located within each secondary structural context. Using CapR, we demonstrate that several RBPs bind to their target RNA molecules under specific structural contexts. CapR is available at https://sites.google.com/site/fukunagatsu/software/capr.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号