Similar Articles
1.
We present GobyWeb, a web-based system that facilitates the management and analysis of high-throughput sequencing (HTS) projects. The software provides integrated support for a broad set of HTS analyses and offers a simple plugin extension mechanism. Analyses currently supported include quantification of gene expression for messenger and small RNA sequencing, estimation of DNA methylation (e.g., reduced representation bisulfite sequencing and whole-genome methyl-seq), and detection of pathogens in sequenced data. In contrast to previous pipelines developed for the analysis of HTS data, GobyWeb requires significantly less storage space, runs analyses efficiently on a parallel grid, and scales gracefully to tens or hundreds of multi-gigabyte samples, yet can be used effectively by any researcher comfortable with a web browser. We evaluated the performance of the software and found that it either outperforms or performs similarly to programs developed for specialized analyses of HTS data. Most biologists who took a one-hour GobyWeb training session were readily able to analyze RNA-Seq data with state-of-the-art analysis tools. GobyWeb can be obtained at http://gobyweb.campagnelab.org and is freely available for non-commercial use. GobyWeb plugins are distributed as source code under the open-source LGPL3 license to facilitate code inspection, reuse and independent extensions (http://github.com/CampagneLaboratory/gobyweb2-plugins).

3.
Analysis of bisulfite sequencing data usually involves two tasks: calling methylated cytosines (mCs) in a sample, and detecting differentially methylated regions (DMRs) between paired samples. Although numerous tools have been proposed for mC calling, methods for DMR detection remain limited. Here, we present Bisulfighter, a new software package for detecting mCs and DMRs from bisulfite sequencing data. Bisulfighter combines the LAST alignment tool for mC calling with a novel framework for DMR detection based on hidden Markov models (HMMs). Unlike previous approaches that depend on empirically chosen parameters, Bisulfighter can use the expectation-maximization algorithm for HMMs to adjust the parameters for each dataset. We conducted extensive experiments in which the accuracy of mC calling and DMR detection was evaluated on simulated data with various mC contexts, read qualities, sequencing depths and DMR lengths, as well as on real data from a wide range of biological processes. We demonstrate that Bisulfighter consistently achieves better accuracy than other published tools, providing greater sensitivity for mCs with fewer false positives, more precise estimates of mC levels, more exact DMR locations, and better agreement of DMRs with gene expression and DNase I hypersensitivity. The source code is available at http://epigenome.cbrc.jp/bisulfighter.
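To make the HMM idea concrete, the sketch below segments per-CpG methylation differences into non-DMR (state 0) and DMR (state 1) by Viterbi decoding of a two-state HMM with Gaussian emissions. It is a minimal illustration, not Bisulfighter's model: the state means, standard deviations and the `p_stay` self-transition probability are hypothetical fixed values, whereas Bisulfighter re-estimates its parameters with EM.

```python
import math


def log_gauss(x, mu, sd):
    """Log density of a Normal(mu, sd) at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mu) ** 2 / (2 * sd * sd)


def viterbi_dmr(diffs, mu=(0.0, 0.5), sd=(0.1, 0.15), p_stay=0.95):
    """Label each CpG 0 (non-DMR) or 1 (DMR) along the genome.

    diffs: per-CpG methylation-level differences (sample A minus sample B).
    mu, sd, p_stay: hypothetical fixed HMM parameters (no EM here).
    """
    if not diffs:
        return []
    n_states = 2
    log_stay, log_switch = math.log(p_stay), math.log(1 - p_stay)
    # Uniform initial state distribution.
    score = [math.log(0.5) + log_gauss(diffs[0], mu[s], sd[s]) for s in range(n_states)]
    back = []  # back[t][s] = best previous state when in state s at step t+1
    for x in diffs[1:]:
        ptr, new_score = [], []
        for s in range(n_states):
            cands = [score[p] + (log_stay if p == s else log_switch) for p in range(n_states)]
            best = max(range(n_states), key=lambda p: cands[p])
            ptr.append(best)
            new_score.append(cands[best] + log_gauss(x, mu[s], sd[s]))
        back.append(ptr)
        score = new_score
    # Trace back the most likely state path.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

For example, `viterbi_dmr([0.0, 0.02, -0.01, 0.55, 0.6, 0.48, 0.01, 0.0])` marks the three CpGs with a difference near 0.5 as a single DMR; the `p_stay` term is what discourages fragmenting a region on one noisy CpG.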

4.
We introduce CoReCo, a novel computational approach for comparative metabolic reconstruction, and provide genome-scale metabolic network models for 49 important fungal species. Leveraging the exponential growth in the availability of sequenced genomes, our method reconstructs genome-scale gapless metabolic networks simultaneously for a large number of species by integrating sequence data in a probabilistic framework. High reconstruction accuracy is demonstrated by comparisons with the well-curated Saccharomyces cerevisiae consensus model and with large-scale knock-out experiments. Our comparative approach is particularly useful when the available sequence data are of low quality and when reconstructing evolutionarily distant species. Moreover, the reconstructed networks are fully carbon mapped, allowing their use in ¹³C flux analysis. Because these fungi include some of the most important production organisms in industrial biotechnology, we demonstrate the functionality and usability of the reconstructed models with computational steady-state biomass production experiments. In contrast to many existing reconstruction techniques, only minimal manual effort is required before the reconstructed models can be used in flux balance experiments. CoReCo is available at http://esaskar.github.io/CoReCo/.

5.

Background

Immunoglobulins (IG) and T cell receptors (TR) play a key role in antigen recognition during the adaptive immune response. Recent progress in next-generation sequencing technologies has made deep profiling of T cell receptor repertoires possible. However, specialised software is required for the rational analysis of the massive data generated by next-generation sequencing.

Results

Here we introduce tcR, a new R package that provides a platform for the advanced analysis of T cell receptor repertoires, including diversity measures, identification of shared T cell receptor sequences, computation of gene usage statistics, and other widely used methods. The tool has proven its utility in recent research studies.
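Two of the analyses mentioned above, repertoire diversity and shared clonotypes, reduce to simple computations over clonotype counts. The Python sketch below uses the standard Shannon entropy and inverse Simpson index; it does not reproduce tcR's R functions or their names, and the CDR3 strings in the usage note are hypothetical.

```python
import math
from collections import Counter


def clonotype_counts(repertoire):
    """Count reads (or cells) per clonotype, e.g. per CDR3 amino-acid sequence."""
    return list(Counter(repertoire).values())


def shannon_entropy(counts):
    """Shannon entropy (natural log) of the clonotype frequency distribution."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def inverse_simpson(counts):
    """Inverse Simpson index: the 'effective number' of equally abundant clonotypes."""
    total = sum(counts)
    return 1.0 / sum((c / total) ** 2 for c in counts)


def shared_clonotypes(rep_a, rep_b):
    """Clonotypes observed in both repertoires."""
    return set(rep_a) & set(rep_b)
```

For instance, a repertoire of four equally abundant clonotypes has an inverse Simpson index of 4 and an entropy of ln 4, while `shared_clonotypes(["CASSL", "CASSF"], ["CASSF", "CASRG"])` returns `{"CASSF"}`.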

Conclusions

tcR is an R package for the advanced analysis of T cell receptor repertoires after the primary TR sequences have been extracted from raw sequencing reads. The stable version can be installed directly from the Comprehensive R Archive Network (http://cran.r-project.org/mirrors.html). The source code and development version are available on the tcR GitHub page (http://imminfo.github.io/tcr/), along with full documentation and typical usage examples.

6.
Systems biologists aim to decipher the structure and dynamics of signaling and regulatory networks underpinning cellular responses; synthetic biologists can use this insight to alter existing networks or engineer de novo ones. Both tasks will benefit from an understanding of which structural and dynamic features of networks can emerge from evolutionary processes, through which intermediary steps these arise, and whether they embody general design principles. As natural evolution at the level of network dynamics is difficult to study, in silico evolution of network models can provide important insights. However, current tools used for in silico evolution of network dynamics are limited to ad hoc computer simulations and models. Here we introduce BioJazz, an extendable, user-friendly tool for simulating the evolution of dynamic biochemical networks. Unlike previous tools for in silico evolution, BioJazz allows for the evolution of cellular networks with unbounded complexity by combining rule-based modeling with an encoding of networks that is akin to a genome. We show that BioJazz can be used to implement biologically realistic selective pressures and allows exploration of the space of network architectures and dynamics that implement prescribed physiological functions. BioJazz is provided as an open-source tool to facilitate its further development and use. Source code and user manuals are available at: http://oss-lab.github.io/biojazz and http://osslab.lifesci.warwick.ac.uk/BioJazz.aspx.
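The general pattern behind in silico evolution with a genome-like encoding is a mutation-selection loop. The sketch below is a generic, hypothetical illustration in Python: the genome is a bit string, and the toy `fitness` function stands in for the network simulation and selective pressure that BioJazz would evaluate. It does not reflect BioJazz's encoding, rule-based models or API.

```python
import random


def evolve(fitness, genome_len=20, pop_size=30, generations=50, mut_rate=0.05, seed=0):
    """Evolve bit-string genomes by point mutation and truncation selection.

    fitness: callable scoring a genome (hypothetical stand-in for simulating
    the encoded network and scoring its dynamics).
    Returns the best genome and the best fitness recorded at each generation.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(genome_len)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        # Keep the fitter half unchanged (elitism), refill with mutated copies.
        survivors = pop[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            parent = rng.choice(survivors)
            child = [1 - g if rng.random() < mut_rate else g for g in parent]
            children.append(child)
        pop = survivors + children
    pop.sort(key=fitness, reverse=True)
    history.append(fitness(pop[0]))
    return pop[0], history
```

With `best, history = evolve(sum)` (a toy "count the ones" objective), the best fitness never decreases across generations because survivors are carried over unchanged; a realistic setup would replace `sum` with a score derived from simulated network dynamics.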

9.
Scaffolding, i.e., ordering and orienting contigs, is an important step in genome assembly. We present a method for scaffolding with second-generation sequencing reads based on the likelihood of genome assemblies. A generative model of sequencing is used to obtain maximum likelihood estimates of the gaps between contigs and to estimate whether linking contigs into scaffolds would increase the likelihood of the assembly. We then link contigs if they can be joined unambiguously or if the corresponding increase in likelihood is substantially greater than that of other possible joins of those contigs. The method is implemented, with approximations that make it efficient and applicable to large datasets, in a tool called Swalo. Analysis of real and simulated datasets reveals that it consistently makes a similar or greater number of correct joins than other scaffolders while linking very few contigs incorrectly, thus outperforming them and demonstrating that substantial improvements in genome assembly may be achieved through statistical models. Swalo is freely available for download at https://atifrahman.github.io/SWALO/.
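As a simplified illustration of a maximum likelihood gap estimate, assume insert sizes follow a Normal(μ, σ) distribution and ignore truncation and other effects a full model would handle. For each read pair linking two contigs, let s be the part of the insert lying on the contigs; the insert is then s + g for gap g, and maximizing the Gaussian log-likelihood gives g = μ minus the mean of s, independent of σ. This is a textbook sketch under those assumptions, not Swalo's generative model.

```python
import math


def log_likelihood(gap, spans, mu, sigma):
    """Log-likelihood of a gap given observed on-contig spans of linking pairs.

    Assumes insert size ~ Normal(mu, sigma) and insert = span + gap.
    """
    return sum(
        -0.5 * math.log(2 * math.pi * sigma ** 2) - (s + gap - mu) ** 2 / (2 * sigma ** 2)
        for s in spans
    )


def ml_gap(spans, mu):
    """Closed-form MLE of the gap under the Normal model (sigma cancels out).

    A negative value suggests the contigs overlap.
    """
    return mu - sum(spans) / len(spans)
```

For spans of 300, 320 and 280 bp with a 500 bp mean insert, `ml_gap` returns 200.0, and `log_likelihood` is lower at any other gap value.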

10.
DNA methylation is an important epigenetic modification involved in gene regulation, which can now be measured using whole-genome bisulfite sequencing. However, cost, complexity of the data, and lack of comprehensive analytical tools are major challenges that keep this technology from becoming widely applied. Here we present BSmooth, an alignment, quality control and analysis pipeline that provides accurate and precise results even with low coverage data, appropriately handling biological replicates. BSmooth is open source software, and can be downloaded from http://rafalab.jhsph.edu/bsmooth.
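The reason smoothing helps with low-coverage data is that neighboring CpGs tend to have similar methylation, so each estimate can borrow reads from nearby sites. The sketch below is a deliberately simplified local-constant version: a tricube kernel over genomic distance, with methylated and total read counts pooled so that well-covered CpGs carry more weight. BSmooth itself fits a local likelihood model; the `bandwidth` value here is a hypothetical choice.

```python
def tricube(u):
    """Tricube kernel weight, zero outside |u| < 1."""
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def smooth_methylation(positions, methylated, coverage, bandwidth):
    """Kernel-smoothed methylation level at each CpG.

    positions: genomic coordinates of CpGs; methylated/coverage: read counts.
    Counts are pooled within the window, so low-coverage CpGs are pulled
    toward their well-covered neighbors.
    """
    smoothed = []
    for p0 in positions:
        num = den = 0.0
        for p, m, n in zip(positions, methylated, coverage):
            k = tricube((p - p0) / bandwidth)
            num += k * m
            den += k * n
        smoothed.append(num / den if den > 0 else float("nan"))
    return smoothed
```

For example, a CpG covered by a single unmethylated read between two fully methylated, well-covered neighbors 10 bp away gets a smoothed level above 0.9 with `bandwidth=50`, rather than the raw estimate of 0.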

11.
DNA methylation is an epigenetic modification critical for normal development and diseases. Genome-wide DNA methylation can be determined at single-nucleotide resolution by sequencing bisulfite-treated DNA with next-generation high-throughput sequencing. However, aligning bisulfite short reads to a reference genome remains challenging, as only a limited proportion of them (around 50–70%) can be aligned uniquely; a significant proportion, known as multireads, map to multiple locations and are thus discarded from downstream analyses, causing financial waste and biased methylation inference. To address this issue, we developed a Bayesian model that assigns multireads to their most likely locations based on the posterior probability derived from information hidden in uniquely aligned reads. Analyses of both simulated data and real hairpin bisulfite sequencing data show that our method can effectively assign approximately 70% of the multireads to their best locations with up to 90% accuracy, leading to a significant increase in overall mapping efficiency. Moreover, the assignment model performs robustly at low coverage depth, making it particularly attractive given the prohibitive cost of bisulfite sequencing. Additionally, results show that longer reads help improve the performance of the assignment model. The assignment model is also robust to varying degrees of methylation and varying sequencing error rates. Finally, incorporating prior knowledge of mutation rate and context-specific methylation level into the assignment model increases inference accuracy. The assignment model is implemented in the BAM-ABS package and freely available at https://github.com/zhanglabvt/BAM_ABS.
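A minimal sketch of posterior-based multiread assignment follows, assuming a simplified likelihood: at each candidate location, uniquely aligned reads provide methylation levels for the CpGs the multiread covers, and the read's methylated/unmethylated calls are treated as independent Bernoulli draws from those levels. The uniform default prior, the `min_posterior` cutoff and the `eps` clamp are hypothetical choices; BAM-ABS's full model (including mutation rates and sequencing errors) is not reproduced here.

```python
import math


def assign_multiread(read_calls, candidate_levels, prior=None, min_posterior=0.9, eps=1e-3):
    """Assign a multiread to its most probable location, if confident enough.

    read_calls: 0/1 methylation calls at the CpGs covered by the read.
    candidate_levels: per candidate location, methylation levels at those
    CpGs estimated from uniquely aligned reads.
    Returns (best_index or None, posterior probabilities).
    """
    k = len(candidate_levels)
    prior = prior or [1.0 / k] * k
    log_post = []
    for levels, pi in zip(candidate_levels, prior):
        ll = math.log(pi)
        for x, p in zip(read_calls, levels):
            p = min(max(p, eps), 1 - eps)  # avoid log(0)
            ll += math.log(p if x else 1 - p)
        log_post.append(ll)
    # Normalise in log space for numerical stability.
    m = max(log_post)
    weights = [math.exp(v - m) for v in log_post]
    total = sum(weights)
    post = [w / total for w in weights]
    best = max(range(k), key=lambda i: post[i])
    return (best if post[best] >= min_posterior else None), post
```

A read that is methylated at all four CpGs is assigned to a candidate whose unique reads show about 95% methylation rather than one showing about 5%; if two candidates look identical, the posterior splits evenly and the read is left unassigned.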

12.
Existing methods for identifying structural variants (SVs) from short-read datasets are inaccurate. This complicates disease-gene identification and efforts to understand the consequences of genetic variation. In response, we have created Wham (Whole-genome Alignment Metrics) to provide a single, integrated framework for both structural variant calling and association testing, thereby bypassing many of the difficulties that currently frustrate attempts to employ SVs in association testing. Here we describe Wham, benchmark it against three other widely used SV identification tools (Lumpy, Delly and SoftSearch), and demonstrate Wham’s ability to identify SVs and associate them with phenotypes using data from humans, domestic pigeons, and vaccinia virus. Wham and all associated software are covered under the MIT License and can be freely downloaded from GitHub (https://github.com/zeeev/wham), with documentation on a wiki (http://zeeev.github.io/wham/). For community support, please post questions to https://www.biostars.org/.
This is a PLOS Computational Biology Software paper.

13.
Advances in biotechnology have resulted in large-scale studies of DNA methylation. A differentially methylated region (DMR) is a genomic region with multiple adjacent CpG sites that exhibit different methylation statuses among multiple samples. Many so-called “supervised” methods have been established to identify DMRs between two or more comparison groups. Methods for identifying DMRs without reference to phenotypic information are, however, less well studied. We propose an alternative “unsupervised” approach in which DMRs in the studied samples are identified while accounting for the natural dependence structure of methylation measurements between neighboring probes on tiling arrays. Through a simulation study, we investigated the effects of dependencies between neighboring probes on DMR detection and found that many spurious signals are produced if the methylation data at each probe are analyzed independently. In contrast, our newly proposed method successfully corrected for this effect, with a well-controlled false positive rate and comparable sensitivity. By applying it to two real datasets, we demonstrated that our method can provide a global picture of methylation variation in the studied samples. R source code implementing the proposed method is freely available at http://www.csjfann.ibms.sinica.edu.tw/eag/programlist/ICDMR/ICDMR.html.
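Why dependence between neighboring probes inflates false positives can be seen from a standard first-order autoregressive (AR(1)) approximation, used here as a swapped-in illustration rather than the authors' method: if adjacent probes have lag-1 correlation ρ, n probes carry roughly n(1 − ρ)/(1 + ρ) independent observations, so tests that assume n independent probes overstate the evidence.

```python
def lag1_autocorrelation(values):
    """Sample lag-1 autocorrelation of measurements ordered along the genome."""
    n = len(values)
    mean = sum(values) / n
    num = sum((values[i] - mean) * (values[i + 1] - mean) for i in range(n - 1))
    den = sum((v - mean) ** 2 for v in values)
    return num / den


def effective_sample_size(n, rho):
    """Approximate number of independent probes under an AR(1) model."""
    return n * (1 - rho) / (1 + rho)
```

With ρ = 0.5, 100 correlated probes behave roughly like 33 independent ones, which is why treating them as independent produces spurious DMR signals.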

14.
DNA modifications such as methylation and DNA damage can play critical regulatory roles in biological systems. Single molecule, real time (SMRT) sequencing technology generates DNA sequences as well as DNA polymerase kinetic information that can be used for the direct detection of DNA modifications. We demonstrate that local sequence context has a strong impact on DNA polymerase kinetics in the neighborhood of the incorporation site during DNA synthesis, which makes it possible to estimate the expected kinetic rate of the enzyme at the incorporation site from kinetic rate information collected in existing SMRT sequencing data (historical data) covering the same local sequence contexts. We developed an empirical Bayesian hierarchical model for incorporating historical data. Our results show that the model can greatly increase DNA modification detection accuracy and reduce the coverage required for control data. For some DNA modifications with a strong signal, a control sample is not even needed when historical data are used in its place. Thus, sequencing costs can be greatly reduced by using the model. We implemented the model in an R package named seqPatch, which is available at https://github.com/zhixingfeng/seqPatch.
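The way historical data can stand in for, or strengthen, a control sample can be sketched with a normal-normal empirical Bayes model, assumed here for illustration: historical data for a sequence context give a prior mean and variance for the expected interpulse duration (IPD), and control reads at the site update that prior. With no control reads the estimate falls back to the historical prior. The variance parameters and the z-score comparison are hypothetical simplifications, not seqPatch's hierarchical model.

```python
import math


def shrunk_control_mean(control_ipds, prior_mean, prior_var, read_var):
    """Posterior mean and variance of the expected IPD at one site.

    prior_mean, prior_var: from historical data for the same sequence context.
    read_var: assumed per-read IPD variance.
    """
    n = len(control_ipds)
    if n == 0:
        return prior_mean, prior_var  # historical data replace the control
    xbar = sum(control_ipds) / n
    data_var = read_var / n
    post_var = 1.0 / (1.0 / prior_var + 1.0 / data_var)
    post_mean = post_var * (prior_mean / prior_var + xbar / data_var)
    return post_mean, post_var


def modification_z(native_ipds, control_mean, control_var, read_var):
    """z-score of the native-sample mean IPD against the expected (control) IPD."""
    n = len(native_ipds)
    mean = sum(native_ipds) / n
    return (mean - control_mean) / math.sqrt(control_var + read_var / n)
```

Four control reads with mean IPD 2.0 and a historical prior of 1.0 (equal weight here) give an expected IPD of 1.5; a native mean well above that yields a large positive z-score, suggesting a modified base.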

15.
Identifying copy number variants (CNVs) can provide diagnoses for patients and important biological insights into human health and disease. Current exome and targeted sequencing approaches cannot detect clinically and biologically relevant CNVs outside their target area. Up to 70% of sequencing reads from exome and targeted sequencing fall outside the targeted regions. We present SavvyCNV, a new tool that exploits this ‘free data’, using off-target reads from exome and targeted sequencing to call germline CNVs genome-wide. We benchmarked SavvyCNV against five state-of-the-art CNV callers using truth sets generated from genome sequencing data and Multiplex Ligation-dependent Probe Amplification assays. SavvyCNV called CNVs with high precision and recall, outperforming the five other tools at calling CNVs genome-wide using off-target or on-target reads from targeted panel and exome sequencing. We then applied SavvyCNV to clinical samples sequenced using a targeted panel and were able to call previously undetected clinically relevant CNVs, highlighting the utility of this tool in the diagnostic setting. SavvyCNV outperforms existing tools for calling CNVs from off-target reads. It can call CNVs genome-wide from targeted panel and exome data, increasing the utility and diagnostic yield of these tests. SavvyCNV is freely available at https://github.com/rdemolgen/SavvySuite.
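The basic principle of depth-based CNV calling from off-target reads can be sketched as follows: count reads in large genomic bins, normalise each sample by its total read count, and compare each bin with the median of a set of reference samples. A ratio near 0.5 suggests a heterozygous deletion and near 1.5 a duplication. The thresholds below are hypothetical, and SavvyCNV's own normalisation and calling steps are not reproduced.

```python
from statistics import median


def normalise(counts):
    """Convert per-bin read counts to fractions of the sample's total reads."""
    total = sum(counts)
    return [c / total for c in counts]


def call_bins(sample, references, del_cut=0.75, dup_cut=1.25):
    """Label each bin DEL, DUP, NORMAL, or NA (no reference reads).

    sample: per-bin read counts for the test sample.
    references: list of per-bin read counts for reference samples.
    """
    s = normalise(sample)
    refs = [normalise(r) for r in references]
    calls = []
    for i, frac in enumerate(s):
        ref = median([r[i] for r in refs])
        if ref == 0:
            calls.append("NA")
            continue
        ratio = frac / ref
        if ratio < del_cut:
            calls.append("DEL")
        elif ratio > dup_cut:
            calls.append("DUP")
        else:
            calls.append("NORMAL")
    return calls
```

For example, `call_bins([100, 100, 50, 150], [[100, 100, 100, 100]] * 3)` labels the third bin as a deletion and the fourth as a duplication; in practice, calls would also be merged across consecutive bins.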

16.

Background

Exome sequencing allows researchers to study the human genome in unprecedented detail. Among the many types of variants detectable through exome sequencing, one of the most overlooked is the internal deletion of exons. Internal exon deletions (IEDs) are the absence of consecutive exons in a gene. Such deletions can have significant biological meaning, yet they are often too short to be considered copy number variation. There is therefore a need for efficient detection of such deletions in exome sequencing data.

Results

We present ExonDel, a tool specially designed to detect homozygous exon deletions efficiently. We tested ExonDel on exome sequencing data generated from 16 breast cancer cell lines and identified both novel and known IEDs. Subsequently, we verified our findings using RNAseq and PCR technologies. Further comparisons with multiple sequencing-based CNV tools showed that ExonDel is capable of detecting unique IEDs not found by other CNV tools.
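A homozygous internal exon deletion leaves a characteristic signature in exome data: a run of consecutive exons with essentially no coverage, flanked on both sides by well-covered exons of the same gene. The sketch below scans per-exon mean depth for such runs. The `max_cov` and `min_flank_cov` thresholds are hypothetical, and the statistics ExonDel actually uses are not described here.

```python
def internal_exon_deletions(exon_coverage, max_cov=1.0, min_flank_cov=10.0):
    """Find runs of near-zero-coverage exons flanked by covered exons.

    exon_coverage: mean read depth per exon, in transcript order.
    Returns (first_index, last_index) for each candidate internal deletion.
    """
    runs = []
    i, n = 0, len(exon_coverage)
    while i < n:
        if exon_coverage[i] <= max_cov:
            j = i
            while j + 1 < n and exon_coverage[j + 1] <= max_cov:
                j += 1
            # Internal: a sufficiently covered exon must exist on both sides.
            if (i > 0 and j < n - 1
                    and exon_coverage[i - 1] >= min_flank_cov
                    and exon_coverage[j + 1] >= min_flank_cov):
                runs.append((i, j))
            i = j + 1
        else:
            i += 1
    return runs
```

For a gene with exon depths `[50, 40, 0, 0.5, 45, 60]`, the third and fourth exons are reported as one candidate deletion, `(2, 3)`; a low-coverage first or last exon is not reported, since it could reflect capture failure rather than an internal deletion.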

Conclusions

ExonDel is an efficient way to screen for novel and known IEDs using exome sequencing data. ExonDel and its source code can be downloaded freely at https://github.com/slzhao/ExonDel.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-332) contains supplementary material, which is available to authorized users.

17.
Studies describing intricate patterns of DNA methylation in nematodes and ciliates are controversial because of uncertainty about the genomic evolutionary conservation of DNA methylation enzymes. See related research articles http://genomebiology.com/2012/13/10/R99 and http://genomebiology.com/2012/13/10/R100

18.
Protein stability is a fundamental molecular property enabling organisms to adapt to their biological niches. How this is facilitated, and whether there are kingdom-specific or more general universal strategies, is unknown. A principal obstacle to addressing this issue is that the vast majority of proteins lack annotation, specifically thermodynamic annotation, beyond the amino acid and chromosome information derived from genome sequencing. To address this gap and facilitate future investigation of large-scale patterns of protein stability and dynamics within and between organisms, we applied a unique ensemble-based thermodynamic characterization of protein folds to a substantial portion of extant sequenced genomes. Using this approach, we compiled a database resource focused on the position-specific variation in protein stability. Interrogation of the database reveals: 1) domains of life exhibit distinguishing thermodynamic features, with eukaryotes particularly different from both archaea and bacteria; 2) the optimal growth temperature of an organism is proportional to the average apolar enthalpy of its proteome; 3) intrinsic disorder content is also proportional to the apolar enthalpy (but, unexpectedly, not to the predicted stability at 25 °C); and 4) secondary structure and global stability information of individual proteins is extractable. We hypothesize that wider access to residue-specific thermodynamic information of proteomes will result in a deeper understanding of the mechanisms driving functional adaptation and protein evolution. Our database is free for download at https://afc-science.github.io/thermo-env-atlas/ (last accessed January 18, 2022).

19.
Tn-seq is a high-throughput technique for the analysis of transposon mutant libraries. Tn-seq Explorer was developed as a convenient and easy-to-use package of tools for exploring Tn-seq data. In a typical application, the user will have obtained a collection of sequence reads adjacent to transposon insertions in a reference genome. The reads are first aligned to the reference genome using one of the tools available for this task. Tn-seq Explorer reads the alignment and the gene annotation, and provides the user with a set of tools to investigate the data and identify possibly essential or advantageous genes as those that contain significantly low counts of transposon insertions. Emphasis is placed on providing flexibility in selecting the parameters and methodology most appropriate for each particular dataset. Tn-seq Explorer is written in Java as a menu-driven, stand-alone application. It was tested on Windows, Mac OS, and Linux operating systems. The source code is distributed under the terms of the GNU General Public License. The program and the source code are available for download at http://www.cmbl.uga.edu/downloads/programs/Tn_seq_Explorer/ and https://github.com/sina-cb/Tn-seqExplorer.
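One simple way to flag genes with "significantly low counts of transposon insertions" is a one-sided Poisson test: under uniform insertion, a gene of length L expects λ = L × (total insertions / genome length) insertions, and P(X ≤ observed) measures how surprisingly few it has. This is a swapped-in illustration; Tn-seq Explorer's own scoring methods and parameters are not reproduced. The CDF is computed in log space to avoid underflow for long genes.

```python
import math


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed in log space."""
    if lam == 0:
        return 1.0
    logs = [-lam + i * math.log(lam) - math.lgamma(i + 1) for i in range(k + 1)]
    m = max(logs)
    return min(1.0, math.exp(m) * sum(math.exp(v - m) for v in logs))


def essentiality_pvalues(genes, total_insertions, genome_length):
    """One-sided p-values for unusually few insertions per gene.

    genes: mapping of gene name -> (gene length in bp, observed insertion count).
    Assumes insertions are uniformly distributed along the genome.
    """
    density = total_insertions / genome_length
    return {name: poisson_cdf(count, density * length)
            for name, (length, count) in genes.items()}
```

With 5,000 insertions in a 100 kb genome, a 1 kb gene with no insertions gets p ≈ e^-50 (a candidate essential gene), whereas a 1 kb gene with the expected 50 insertions gets p above 0.5.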


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Filing No. 09084417