Similar Articles
20 similar articles found.
1.
Patterns of linkage disequilibrium, homoplasy, and incompatibility are difficult to interpret because they depend on several factors, including the recombination process and the population structure. Here we introduce a novel model-based framework to infer recombination properties from such summary statistics in bacterial genomes. The underlying model is sequentially Markovian so that data can be simulated very efficiently, and we use approximate Bayesian computation techniques to infer parameters. As this does not require us to calculate the likelihood function, the model can be easily extended to investigate less probed aspects of recombination. In particular, we extend our model to account for the bias in the recombination process whereby closely related bacteria recombine more often with one another. We show that this model provides a good fit to a data set of Bacillus cereus genomes and estimate several recombination properties, including the rate of bias in recombination. All the methods described in this article are implemented in a software package that is freely available for download at http://code.google.com/p/clonalorigin/.
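
The abstract relies on approximate Bayesian computation (ABC), which replaces likelihood evaluation with forward simulation. A minimal sketch of plain ABC rejection sampling follows, assuming a toy simulator (the hypothetical `simulate_summary`, a per-site Bernoulli process) in place of the authors' sequentially Markovian model, a uniform prior, and a single summary statistic; the real method uses richer statistics and is not reproduced here.

```python
import random
import statistics

def simulate_summary(rate, n_sites, rng):
    """Toy simulator (hypothetical stand-in for the real model): the fraction of
    n_sites showing a recombination-like signal when each site is affected
    independently with probability `rate`."""
    hits = sum(1 for _ in range(n_sites) if rng.random() < rate)
    return hits / n_sites

def abc_rejection(observed, n_draws=5000, n_sites=100, epsilon=0.02, seed=1):
    """ABC rejection sampling: keep prior draws whose simulated summary
    statistic lies within `epsilon` of the observed one. The likelihood is
    never evaluated; only forward simulation is needed."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        rate = rng.uniform(0.0, 0.5)                # draw from a uniform prior
        sim = simulate_summary(rate, n_sites, rng)  # simulate under that value
        if abs(sim - observed) <= epsilon:          # compare summaries
            accepted.append(rate)
    return accepted

posterior = abc_rejection(observed=0.12)
print(f"accepted {len(posterior)} draws, posterior mean {statistics.mean(posterior):.3f}")
```

Because only `simulate_summary` touches the model, swapping in a more elaborate simulator (for example, one with biased recombination) leaves the inference loop unchanged, which is the extensibility argument the abstract makes.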

2.
DNA methylation plays a central role in genomic regulation and disease. Sodium bisulfite treatment (SBT) causes unmethylated cytosines to be sequenced as thymine, which allows methylation levels to be reflected in the number of ‘C’-‘C’ alignments covering reference cytosines. Di-base color reads produced by Life Technologies’ SOLiD sequencer give unreliable results when translated to bases, because a single sequencing error affects the downstream sequence. We describe FadE, an algorithm to accurately determine genome-wide methylation rates directly in color or nucleotide space. FadE uses SBT unmethylated and untreated data to determine background error rates and incorporates them into a model that uses Newton–Raphson optimization to estimate the methylation rate and provide a credible interval describing its distribution at every reference cytosine. We sequenced two slides of a bisulfite-converted fragment library from a human fibroblast cell line with the SOLiD sequencer to investigate genome-wide methylation levels. FadE reported widespread differences in methylation levels across CpG islands and a large number of differentially methylated regions adjacent to genes. These results compare favorably with an investigation of the same cell line using nucleotide-space reads at higher coverage, suggesting that FadE is an accurate method for estimating genome-wide methylation from color or nucleotide reads. http://code.google.com/p/fade/.
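
FadE is described as combining background error rates with a Newton–Raphson estimate of the methylation rate. The sketch below shows that idea on a deliberately simplified model (an assumption, not FadE's published model, with no color-space handling and no credible interval): each of n reads covering a cytosine shows 'C' with probability q = err + p(1 - 2·err), where p is the methylation rate and err a background error rate taken from control data. The function name `methylation_mle` is hypothetical.

```python
def methylation_mle(k, n, err, p0=0.5, tol=1e-10, max_iter=100):
    """Newton-Raphson maximum-likelihood estimate of the methylation rate p.

    Simplified model (assumption): each of n >= 1 reads covering the cytosine
    shows 'C' with probability q = err + p * (1 - 2 * err); k reads show 'C'.
    Requires 0 < err < 0.5.
    """
    a = 1.0 - 2.0 * err
    p = p0
    for _ in range(max_iter):
        q = err + p * a
        grad = a * (k / q - (n - k) / (1.0 - q))                # d logL / dp
        hess = -a * a * (k / q ** 2 + (n - k) / (1.0 - q) ** 2)  # d2 logL / dp2
        p_new = min(max(p - grad / hess, 1e-9), 1.0 - 1e-9)     # stay inside (0, 1)
        if abs(p_new - p) < tol:
            return p_new
        p = p_new
    return p

# 9 of 10 reads retain 'C' at a site with a 1% background error rate.
print(methylation_mle(9, 10, err=0.01))
```

In this simplified model the answer also has the closed form (k/n - err)/(1 - 2·err), clipped to (0, 1), which makes it easy to check; an iterative solver only pays off once the model gains terms (such as color-space error propagation) that lack a closed-form maximum.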

3.
DNA methylation is a chemical modification of cytosine bases that is pivotal for gene regulation, cellular specification and cancer development. Here, we describe an R package, methylKit, that rapidly analyzes genome-wide cytosine epigenetic profiles from high-throughput methylation and hydroxymethylation sequencing experiments. methylKit includes functions for clustering, sample quality visualization, differential methylation analysis and annotation features, thus automating and simplifying many of the steps for discerning statistically significant bases or regions of DNA methylation. Finally, we demonstrate methylKit on breast cancer data, in which we find statistically significant regions of differential methylation and stratify tumor subtypes. methylKit is available at http://code.google.com/p/methylkit.
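
Differential methylation at a single base is often framed as a 2×2 table of methylated and unmethylated read counts from two samples. As a generic illustration (not methylKit's code, which is an R package with its own statistics), here is a stdlib-only two-sided Fisher's exact test on such a table; the function name and the example counts are hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table
        [[a, b],   # sample 1: methylated, unmethylated reads
         [c, d]]   # sample 2: methylated, unmethylated reads
    The p-value sums the hypergeometric probabilities of all tables with the
    same margins that are no more likely than the observed table."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):  # P(top-left cell == x) given the fixed margins
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)

# Hypothetical CpG: 18/20 reads methylated in sample 1 vs 6/20 in sample 2.
print(fisher_exact_two_sided(18, 2, 6, 14))
```

The small relative tolerance when comparing probabilities guards against floating-point ties, so tables exactly as likely as the observed one are counted.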

4.
5.
With the rapid and steady increase in next-generation sequencing data output, the mapping of short reads has become a major data-analysis bottleneck. On a single computer, it can take several days to map the vast quantity of reads produced from a single Illumina HiSeq lane. To ameliorate this bottleneck we present a new tool, DistMap: a modular, scalable and integrated workflow to map reads in the Hadoop distributed computing framework. DistMap is easy to use, currently supports nine different short-read mapping tools and can be run on all Unix-based operating systems. It accepts reads in FASTQ format as input and outputs mapped reads in SAM/BAM format. DistMap supports both paired-end and single-end reads, allowing it to map read data produced by different sequencing platforms. DistMap is available from http://code.google.com/p/distmap/
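
Distributing read mapping works because each read is mapped independently, so the input can be cut into chunks that workers process in parallel, as long as no 4-line FASTQ record is split. The sketch below illustrates only that record-aligned chunking; it is an assumption about the general approach, not DistMap's Hadoop implementation, and the function names are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, ...]  # (header, sequence, '+' line, qualities)

def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield 4-line FASTQ records; a record must never be split across chunks."""
    it = iter(lines)
    while True:
        block = [line.rstrip("\n") for line in islice(it, 4)]
        if not block:
            return
        if len(block) != 4 or not block[0].startswith("@") or not block[2].startswith("+"):
            raise ValueError(f"malformed FASTQ record: {block!r}")
        yield tuple(block)

def chunk_records(records: Iterable[Record], chunk_size: int) -> Iterator[List[Record]]:
    """Group records into fixed-size chunks; each chunk can be mapped by an
    independent worker and the per-chunk SAM outputs merged afterwards."""
    it = iter(records)
    while chunk := list(islice(it, chunk_size)):
        yield chunk

fastq_lines = [
    "@r1\n", "ACGT\n", "+\n", "IIII\n",
    "@r2\n", "GGCC\n", "+\n", "IIII\n",
    "@r3\n", "TTAA\n", "+\n", "IIII\n",
]
for i, chunk in enumerate(chunk_records(read_fastq(fastq_lines), chunk_size=2)):
    print(i, [rec[0] for rec in chunk])
```

For paired-end data the same idea applies to the two mate files read in lockstep (for example with `zip` over two `read_fastq` iterators), so that mates always land in chunks with the same index.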

6.
Repetitive sequences are biologically and clinically important because they can influence traits and disease, but repeats are challenging to analyse using short-read sequencing technology. We present RepeatSeq, a tool for genotyping microsatellite repeats that uses Bayesian model selection guided by an empirically derived error model incorporating sequence and read properties. We then apply RepeatSeq to high-coverage genomes from the 1000 Genomes Project to evaluate its performance and accuracy. The software uses common formats, such as VCF, for compatibility with existing genome analysis pipelines. Source code and binaries are available at http://github.com/adaptivegenome/repeatseq.
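
Bayesian model selection for a microsatellite genotype can be sketched as scoring every candidate diploid genotype against the repeat counts observed in spanning reads and picking the most probable one. The sketch below assumes a toy stutter model (the hypothetical `read_prob`: exact with probability 1 - err, off by one unit otherwise) and a uniform prior over genotypes built from observed lengths; RepeatSeq's empirically derived error model is not reproduced.

```python
import math
from itertools import combinations_with_replacement

def read_prob(observed: int, true_len: int, err: float) -> float:
    """Toy stutter model (assumption): a read reports the true repeat count with
    probability 1 - err, or is off by one unit with probability err / 2 each way."""
    diff = abs(observed - true_len)
    if diff == 0:
        return 1.0 - err
    if diff == 1:
        return err / 2.0
    return 1e-6  # small floor for larger discrepancies

def repeat_genotype_posterior(obs, err=0.05):
    """Posterior over diploid genotypes (a, b), a <= b, built from the observed
    repeat counts, under a uniform prior; each read comes from either allele
    with probability 1/2. The MAP genotype is the selected model."""
    candidates = sorted(set(obs))
    log_like = {}
    for a, b in combinations_with_replacement(candidates, 2):
        log_like[(a, b)] = sum(
            math.log(0.5 * read_prob(x, a, err) + 0.5 * read_prob(x, b, err))
            for x in obs)
    m = max(log_like.values())                        # log-sum-exp normalisation
    z = sum(math.exp(v - m) for v in log_like.values())
    return {g: math.exp(v - m) / z for g, v in log_like.items()}

reads = [10] * 8 + [12] * 7 + [11]   # hypothetical repeat counts from 16 reads
post = repeat_genotype_posterior(reads)
best = max(post, key=post.get)
print(best, round(post[best], 3))
```

The single read reporting 11 units is explained as stutter rather than as a third allele, which is the kind of decision an explicit error model is meant to make.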

7.
Clustering analysis is an important tool in studying gene expression data. The Bayesian hierarchical clustering (BHC) algorithm can automatically infer the number of clusters and uses Bayesian model selection to improve clustering quality. In this paper, we present an extension of the BHC algorithm. Our Gaussian BHC (GBHC) algorithm represents data as a mixture of Gaussian distributions. It uses a normal-gamma distribution as a conjugate prior on the mean and precision of each Gaussian component. We tested GBHC on 11 cancer datasets and 3 synthetic datasets. The results on the cancer datasets show that, in sample clustering, GBHC on average produces a partition that is more concordant with the ground truth than those obtained from other commonly used algorithms. Furthermore, GBHC frequently infers a number of clusters close to the ground truth. In gene clustering, GBHC also produces a partition that is more biologically plausible than those of several other state-of-the-art methods. This suggests that GBHC is a useful alternative tool for studying gene expression data. The implementation of GBHC is available at https://sites.google.com/site/gaussianbhc/
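
The conjugate normal-gamma prior is what makes BHC-style merge decisions cheap: the mean and precision can be integrated out in closed form. With μ | λ ~ N(μ0, (κ0λ)^-1) and λ ~ Gamma(α0, β0) (shape, rate), the marginal likelihood of n points is Γ(αn)/Γ(α0) · β0^α0 / βn^αn · (κ0/κn)^(1/2) · (2π)^(-n/2), where κn = κ0 + n, αn = α0 + n/2 and βn = β0 + ½Σ(xi - x̄)² + κ0·n·(x̄ - μ0)²/(2κn). A one-dimensional sketch follows; the hyperparameter values are hypothetical, and the comparison at the end omits BHC's Dirichlet-process weighting, so it is not GBHC's full merge rule.

```python
import math
from typing import Sequence, Tuple

Params = Tuple[float, float, float, float]  # (mu, kappa, alpha, beta)

def posterior_params(xs: Sequence[float], prior: Params) -> Params:
    """Normal-gamma posterior hyperparameters after observing xs (len >= 1)."""
    mu0, k0, a0, b0 = prior
    n = len(xs)
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)           # within-sample scatter
    kn = k0 + n
    mun = (k0 * mu0 + n * mean) / kn
    an = a0 + n / 2.0
    bn = b0 + 0.5 * ss + k0 * n * (mean - mu0) ** 2 / (2.0 * kn)
    return mun, kn, an, bn

def log_marginal_likelihood(xs: Sequence[float], prior: Params) -> float:
    """log p(xs) with mean and precision integrated out under the prior."""
    _, k0, a0, b0 = prior
    _, kn, an, bn = posterior_params(xs, prior)
    n = len(xs)
    return (math.lgamma(an) - math.lgamma(a0)
            + a0 * math.log(b0) - an * math.log(bn)
            + 0.5 * (math.log(k0) - math.log(kn))
            - 0.5 * n * math.log(2.0 * math.pi))

prior = (0.0, 1.0, 2.0, 2.0)   # hypothetical (mu0, kappa0, alpha0, beta0)
a, b = [0.1, -0.2, 0.05], [5.0, 5.3, 4.8]
merged = log_marginal_likelihood(a + b, prior)
separate = log_marginal_likelihood(a, prior) + log_marginal_likelihood(b, prior)
print(f"merged {merged:.2f} vs separate {separate:.2f}")
```

For well-separated groups like these, keeping the clusters apart scores higher than merging them; for multidimensional expression data one would, under an independence assumption, sum this quantity over features.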

8.
Both 454 and Ion Torrent sequencers can produce large amounts of long, high-quality sequencing reads. However, because both methods sequence homopolymers in one cycle, they both suffer from homopolymer uncertainty and incorporation asynchronization. During mapping, such sequencing errors can shift alignments around homopolymers and thus induce spurious mismatches, which have become a critical barrier to the accurate detection of single nucleotide polymorphisms (SNPs). In this article, we propose a hidden Markov model (HMM) that statistically and explicitly formulates homopolymer sequencing errors as overcalls, undercalls, insertions and deletions. We use a hierarchical model to describe the sequencing and base-calling processes, and we estimate the parameters of the HMM from resequencing data with an expectation-maximization algorithm. Based on the HMM, we develop a realignment-based SNP-calling program, termed PyroHMMsnp, which realigns read sequences around homopolymers according to the error model and then infers the underlying genotype using a Bayesian approach. Simulation experiments show that PyroHMMsnp performs exceptionally well compared with other tools across various sequencing coverages in terms of sensitivity, specificity and F1 measure. Analysis of human resequencing data shows that PyroHMMsnp predicts 12.9% more SNPs than Samtools while achieving higher specificity. (http://code.google.com/p/pyrohmmsnp/).
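
The final step, inferring a genotype with a Bayesian approach after realignment, can be illustrated with a standard diploid genotype posterior. This is a generic sketch, not PyroHMMsnp's model: it assumes a uniform substitution error rate, a simple heterozygosity prior, and already-realigned base calls at one site, and the function names are hypothetical. The homopolymer HMM itself is not shown.

```python
import math

def base_prob(base: str, allele: str, err: float) -> float:
    """P(observed base | true allele) under a uniform substitution error (assumption)."""
    return 1.0 - err if base == allele else err / 3.0

def snp_genotype_posterior(bases, ref, alt, err=0.01, het=0.001):
    """Posterior over diploid genotypes at one site, given realigned base calls.
    Prior (assumption): P(ref/alt) = het, P(alt/alt) = het / 2, rest ref/ref."""
    priors = {"RR": 1.0 - 1.5 * het, "RA": het, "AA": het / 2.0}
    alleles = {"RR": (ref, ref), "RA": (ref, alt), "AA": (alt, alt)}
    log_post = {}
    for g, (a1, a2) in alleles.items():
        # each read samples either chromosome with probability 1/2
        log_post[g] = math.log(priors[g]) + sum(
            math.log(0.5 * base_prob(b, a1, err) + 0.5 * base_prob(b, a2, err))
            for b in bases)
    m = max(log_post.values())                      # normalise in log space
    z = sum(math.exp(v - m) for v in log_post.values())
    return {g: math.exp(v - m) / z for g, v in log_post.items()}

post = snp_genotype_posterior("A" * 6 + "G" * 5 + "T", ref="A", alt="G")
print(max(post, key=post.get), post)
```

Realignment matters upstream of this step: if homopolymer errors shift an alignment, the base calls fed into the posterior are wrong, and no prior can correct them.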

9.
10.

Background

Dynamic visual exploration of detailed pathway information can help researchers digest and interpret complex mechanisms and genomic datasets.

Results

ChiBE is a free, open-source software tool for visualizing, querying, and analyzing human biological pathways in BioPAX format. The recently released version 2 can search for neighborhoods, paths between molecules, and common regulators/targets of molecules, on large integrated cellular networks in the Pathway Commons database as well as in local BioPAX models. Resulting networks can be automatically laid out for visualization using a graphically rich, process-centric notation. Profiling data from the cBioPortal for Cancer Genomics and expression data from the Gene Expression Omnibus can be overlaid on these networks.

Conclusions

ChiBE’s new capabilities are organized around a genomics-oriented workflow and offer a unique comprehensive pathway analysis solution for genomics researchers. The software is freely available at http://code.google.com/p/chibe.
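
ChiBE's neighborhood, path and common-regulator queries run on BioPAX networks. The sketch below illustrates the graph logic behind such queries on a plain directed graph, as an assumption for illustration only: ChiBE's BioPAX-based queries and Pathway Commons access are not reproduced, and the molecule names A to F and X are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Set

Graph = Dict[str, Set[str]]  # directed edges: regulator -> targets

def neighborhood(g: Graph, start: str, depth: int) -> Set[str]:
    """All nodes reachable from `start` within `depth` downstream steps."""
    seen, frontier = {start}, {start}
    for _ in range(depth):
        frontier = {t for n in frontier for t in g.get(n, ())} - seen
        seen |= frontier
    return seen

def shortest_path(g: Graph, src: str, dst: str) -> Optional[List[str]]:
    """Breadth-first search for one shortest directed path src -> dst."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:     # walk back to the source
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in g.get(node, ()):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

def common_regulators(g: Graph, targets: Set[str]) -> Set[str]:
    """Nodes with a direct edge to every molecule in `targets`."""
    return {n for n, outs in g.items() if targets <= outs}

g = {"A": {"B", "C"}, "B": {"D"}, "C": {"D", "E"}, "D": {"F"}, "X": {"D", "E"}}
print(neighborhood(g, "A", 2), shortest_path(g, "A", "F"), common_regulators(g, {"D", "E"}))
```

Breadth-first search guarantees the returned path has the fewest edges; on large integrated networks the same queries are usually bounded by a depth limit, as in `neighborhood`, to keep results readable.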

11.
Bisulfite sequencing (BS-seq) is the gold standard for studying genome-wide DNA methylation. We developed MOABS to increase the speed, accuracy, statistical power and biological relevance of BS-seq data analysis. MOABS detects differential methylation with 10-fold coverage at single-CpG resolution based on a Beta-Binomial hierarchical model and is capable of processing two billion reads in 24 CPU hours. Here, using simulated and real BS-seq data, we demonstrate that MOABS outperforms other leading algorithms, such as Fisher’s exact test and BSmooth. Furthermore, MOABS analysis can be easily extended to differential 5hmC analysis using RRBS and oxBS-seq. MOABS is available at http://code.google.com/p/moabs/.
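
The Beta-Binomial model named above lets the methylation level itself vary (for example across biological replicates) instead of being a fixed probability, which widens the distribution of methylated-read counts. A minimal sketch of the beta-binomial log-probability next to a plain binomial follows; the parameter values are hypothetical, and MOABS's hierarchical model, priors and test statistic are not reproduced.

```python
import math

def log_beta(a: float, b: float) -> float:
    """log B(a, b) via log-gamma, stable for large arguments."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_binomial_logpmf(k: int, n: int, alpha: float, beta: float) -> float:
    """log P(k methylated reads out of n) when the methylation level itself
    follows Beta(alpha, beta), e.g. varying across biological replicates."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)

def binomial_logpmf(k: int, n: int, p: float) -> float:
    """Plain binomial with a fixed methylation level p, for comparison."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log(1.0 - p)

# Same mean methylation (0.4) at 10x coverage; the beta-binomial puts more
# probability on extreme counts, reflecting between-replicate variability.
for k in (0, 4, 10):
    print(k, math.exp(binomial_logpmf(k, 10, 0.4)),
          math.exp(beta_binomial_logpmf(k, 10, 2.0, 3.0)))
```

A test built on the plain binomial (or on Fisher's exact test, which pools reads) treats this extra variability as signal, which is one reason replicate-aware models can be better calibrated.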

12.
PathVisio is a commonly used pathway editor, visualization and analysis software. Biological pathways have been used by biologists for many years to describe the detailed steps in biological processes. These powerful visual representations help researchers to better understand, share and discuss knowledge. Since the first publication of PathVisio in 2008, the original paper has been cited more than 170 times and PathVisio has been used in many different biological studies. As an online editor, PathVisio is also integrated into the community-curated pathway database WikiPathways.

Here we present the third version of PathVisio with the newest additions and improvements of the application. The core features of PathVisio are pathway drawing, advanced data visualization and pathway statistics. Additionally, PathVisio 3 introduces a powerful new extension system that allows other developers to contribute additional functionality in the form of plugins without changing the core application.

PathVisio can be downloaded from http://www.pathvisio.org, and in 2014 PathVisio 3 was downloaded over 5,500 times. More than 15 plugins are already available in the central plugin repository. PathVisio is a freely available, open-source tool published under the Apache 2.0 license (http://www.apache.org/licenses/LICENSE-2.0). It is implemented in Java and thus runs on all major operating systems. The code repository is available at http://svn.bigcat.unimaas.nl/pathvisio. The support mailing list for users is available at https://groups.google.com/forum/#!forum/wikipathways-discuss and for developers at https://groups.google.com/forum/#!forum/wikipathways-devel.
This is a PLOS Computational Biology software article.

13.

15.
The introduction of affordable, consumer-oriented 3-D printers is a milestone in the current “maker movement,” which has been heralded as the next industrial revolution. Combined with free and open sharing of detailed design blueprints and accessible development tools, rapid prototypes of complex products can now be assembled in one’s own garage, a game-changer reminiscent of the early days of personal computing. At the same time, 3-D printing has also allowed the scientific and engineering community to build the “little things” that help a lab get up and running much faster and more easily than ever before.

Applications of 3-D printing technologies (Fig. 1A, Box 1) have become as diverse as the types of materials that can be used for printing. Replacement parts at the International Space Station may be printed in orbit from durable plastics or metals, while back on Earth the food industry is starting to explore the same basic technology to fold strings of chocolate into custom-shaped confectionery. Also, consumer-oriented laser-cutting technology makes it very easy to cut raw materials such as sheets of plywood, acrylic, or aluminum into complex shapes within seconds. The range of possibilities comes to light when those mechanical parts are combined with off-the-shelf electronics, low-cost microcontrollers like Arduino boards [1], and single-board computers such as a Beagleboard [2] or a Raspberry Pi [3]. After an initial investment of typically less than a thousand dollars (e.g., to set up a 3-D printer), the only other materials needed to build virtually anything include a few hundred grams of plastic (approximately US$30/kg), cables, and basic electronic components [4,5].

Fig 1. Examples of open 3-D printed laboratory tools. A1, Components for laboratory tools, such as the base for a micromanipulator [18] shown here, can be rapidly prototyped using 3-D printing. A2, The printed parts can be easily combined with an off-the-shelf continuous-rotation servo-motor (bottom) to motorize the main axis. B1, A 3-D printable micropipette [8], designed in OpenSCAD [19], shown in full (left) and in cross-section (right). B2, The pipette consists of the printed parts (blue), two biro fillings with the spring, an off-the-shelf piece of tubing to fit the tip, and one screw used as a spacer. B3, Assembly is completed with a laboratory glove or balloon spanned between the two main printed parts and sealed with tape to create an airtight bottom chamber continuous with the pipette tip. Accuracy is ±2–10 μl depending on printer precision, and the total capacity of the system is easily adjusted using two variables listed in the source code, or via the “Customizer” plugin on the Thingiverse link [8]. See also Table 1.

Box 1. Glossary

Open source

A collective license that defines terms of free availability and redistribution of published source material. Terms include free and unrestricted distribution, as well as full access to source code/blueprints/circuit board designs and derived works. For details, see http://opensource.org.

Maker movement

Technology-oriented extension of the traditional “Do-it-Yourself (DIY)” movement, typically denoting specific pursuits in electronics, CNC (computer numerical control) tools such as mills and laser cutters, as well as 3-D printing and related technologies.

3-D printing

Technology to generate three-dimensional objects from raw materials based on computer models. Most consumer-oriented 3-D printers print in plastic by locally melting a strand of raw material at the tip (“hot-end”) and “drawing” a 3-D object in layers. Plastic materials include acrylonitrile butadiene styrene (ABS) and polylactic acid (PLA). Many variations of 3-D printers exist, including those based on laser polymerization or fusion of resins or powdered raw materials (e.g., metal or ceramic printers).

Arduino boards

Inexpensive and consumer-oriented microcontroller boards built around simple processors. These boards offer a variety of interfaces (serial ports, I2C and CAN bus, etc.), μs-timers, and multiple general-purpose input-output (GPIO) pins suitable for running simple, time-precise programs to control custom-built electronics.

Single board computers

Inexpensive single-board computers capable of running a mature operating system with a graphical user interface, such as Linux. Like microcontroller boards, they offer a variety of hardware interfaces and GPIO pins to control custom-built electronics.

It therefore comes as no surprise that these technologies are also routinely used by research scientists and, especially, educators aiming to customize existing lab equipment or even build sophisticated lab equipment from scratch for a mere fraction of what commercial alternatives cost [6]. Designs for such “Open Labware” include simple mechanical adaptors [7], micropipettes (Fig. 1B) [8], and an egg-whisk–based centrifuge [9], as well as more sophisticated equipment such as an extracellular amplifier for neurophysiological experiments [10], a thermocycler for PCR [11], or a two-photon microscope [12]. At the same time, conceptually related approaches are also being pursued in chemistry [13–15] and materials science [16,17]. See also Table 1.
| Area | Project | Source |
| --- | --- | --- |
| Microscopy | Smartphone Microscope | http://www.instructables.com/id/10-Smartphone-to-digital-microscope-conversion |
| | iPad Microscope | http://www.thingiverse.com/thing:31632 |
| | Raspberry Pi Microscope | http://www.thingiverse.com/thing:385308 |
| | Foldscope | http://www.foldscope.com/ |
| Molecular Biology | Thermocycler (PCR) | http://openpcr.org/ |
| | Water bath | http://blog.labfab.cc/?p=47 |
| | Centrifuge | http://www.thingiverse.com/thing:151406 |
| | Dremelfuge | http://www.thingiverse.com/thing:1483 |
| | Colorimeter | http://www.thingiverse.com/thing:73910 |
| | Micropipette | http://www.thingiverse.com/thing:255519 |
| | Gel Comb | http://www.thingiverse.com/thing:352873 |
| | Hot Plate | http://www.instructables.com/id/Programmable-Temperature-Controller-Hot-Plate/ |
| | Magnetic Stirrer | http://www.instructables.com/id/How-to-Build-a-Magnetic-Stirrer/ |
| Electrophysiology | Waveform Generator | http://www.instructables.com/id/Arduino-Waveform-Generator/ |
| | Open EEG | https://www.olimex.com/Products/EEG/OpenEEG/ |
| | Mobile ECG | http://mobilecg.hu/ |
| | Extracellular amplifier | https://backyardbrains.com/products/spikerBox |
| | Micromanipulator | http://www.thingiverse.com/thing:239105 |
| | Open Ephys | http://open-ephys.org/ |
| Other | Syringe pump | http://www.thingiverse.com/thing:210756 |
| | Translational Stage | http://www.thingiverse.com/thing:144838 |
| | Vacuum pump | http://www.instructables.com/id/The-simplest-vacuum-pump-in-the-world/ |
| | Skinner Box | http://www.kscottz.com/open-skinner-box-pycon-2014/ |

See also S1 Data.

16.
Comparison of Metatranscriptomic Samples Based on k-Tuple Frequencies     
Ying Wang, Lin Liu, Lina Chen, Ting Chen, Fengzhu Sun. PLoS ONE, 2014, 9(1)

17.
PD5: A General Purpose Library for Primer Design Software     
Michael C. Riley, Wayne Aubrey, Michael Young, Amanda Clare. PLoS ONE, 2013, 8(11)

Background

Complex PCR applications for large genome-scale projects require fast, reliable and often highly sophisticated primer design software applications. Presently, such applications use pipelining methods to utilise many third-party applications, which involves file parsing, interfacing and data conversion and is slow and prone to error. A fully integrated suite of software tools for primer design would considerably improve the development time, the processing speed, and the reliability of bespoke primer design software applications.

Results

The PD5 software library is an open-source collection of classes and utilities that provides a complete set of software building blocks for primer design and analysis. It is written in object-oriented C++ with an emphasis on classes suitable for efficient and rapid development of bespoke primer design programs. The modular design of the library simplifies the development of specific applications as well as integration with existing third-party software where necessary. We demonstrate several applications created using this library that have already proved effective, but we view the project as a dynamic environment for building primer design software, open to future development by the bioinformatics community. Therefore, the PD5 software library is published under the terms of the GNU General Public License, which guarantees access to source code and allows redistribution and modification.

Conclusions

The PD5 software library is downloadable from Google Code and the accompanying Wiki includes instructions and examples: http://code.google.com/p/primer-design
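
To make "building blocks for primer design" concrete, here are a few elementary ones sketched in Python. PD5 itself is a C++ library and this is not its API; the function names are hypothetical, the melting temperature uses the Wallace rule (2 °C per A/T, 4 °C per G/C), which is only a rough estimate for short oligonucleotides, and the filter thresholds are illustrative assumptions.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]

def gc_content(seq: str) -> float:
    """Fraction of G and C bases."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)

def wallace_tm(seq: str) -> float:
    """Wallace rule: Tm = 2*(A+T) + 4*(G+C) degrees C; rough, short oligos only."""
    s = seq.upper()
    return 2.0 * (s.count("A") + s.count("T")) + 4.0 * (s.count("G") + s.count("C"))

def passes_basic_filters(seq: str, gc_range=(0.4, 0.6), tm_range=(52.0, 64.0)) -> bool:
    """Hypothetical thresholds, for illustration only."""
    return (gc_range[0] <= gc_content(seq) <= gc_range[1]
            and tm_range[0] <= wallace_tm(seq) <= tm_range[1])

primer = "AGCTTGCATGCCTGCAGGTC"
print(reverse_complement(primer), gc_content(primer), wallace_tm(primer))
```

Small composable functions like these are what a library-based design aims to provide, so a bespoke program can combine them directly instead of piping files between third-party tools.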

18.
BioCluster: Tool for Identification and Clustering of Enterobacteriaceae Based on Biochemical Data
Ahmed Abdullah, S.M. Sabbir Alam, Munawar Sultana, M. Anwar Hossain. Genomics, Proteomics & Bioinformatics, 2015, 13(3): 192–199
Presumptive identification of different Enterobacteriaceae species is routinely achieved based on biochemical properties. Traditional practice involves manually comparing each biochemical property of the unknown sample with known reference samples and inferring its identity from the maximum similarity pattern with the known samples. This process is labor-intensive, time-consuming, error-prone, and subjective. Therefore, automating the sorting and similarity calculation would be advantageous. Here we present a MATLAB-based graphical user interface (GUI) tool named BioCluster. This tool was designed for automated clustering and identification of Enterobacteriaceae based on biochemical test results. In this tool, we used two types of algorithms, i.e., traditional hierarchical clustering (HC) and Improved Hierarchical Clustering (IHC), a modified algorithm developed specifically for the clustering and identification of Enterobacteriaceae species. IHC takes into account the variability in the results of 1–47 biochemical tests within the Enterobacteriaceae family. The tool also provides different options to optimize the clustering in a user-friendly way. Using computer-generated synthetic data and some real data, we have demonstrated that BioCluster has high accuracy in clustering and identifying enterobacterial species based on biochemical test data. This tool can be freely downloaded at http://microbialgen.du.ac.bd/biocluster/.
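
The manual procedure described above, matching an unknown isolate to the reference with the most similar biochemical profile, is easy to automate. The sketch below shows maximum-similarity identification with a simple matching coefficient that skips tests marked as variable in the reference; it is an illustration under stated assumptions, not BioCluster's IHC algorithm, and the profiles and species labels are hypothetical.

```python
from typing import Dict, Sequence, Tuple

# One character per biochemical test: '+' positive, '-' negative,
# 'v' variable within the reference species.
Profile = Sequence[str]

def similarity(unknown: Profile, reference: Profile) -> float:
    """Simple matching coefficient over the tests that are not variable in the
    reference; variable tests are skipped instead of counted as mismatches."""
    informative = [(u, r) for u, r in zip(unknown, reference) if r != "v"]
    if not informative:
        return 0.0
    return sum(u == r for u, r in informative) / len(informative)

def identify(unknown: Profile, references: Dict[str, Profile]) -> Tuple[str, float]:
    """Return the reference profile most similar to the unknown isolate."""
    best = max(references, key=lambda name: similarity(unknown, references[name]))
    return best, similarity(unknown, references[best])

# Hypothetical five-test reference profiles (not real species data).
refs = {"species_1": "++-+-", "species_2": "--+-+", "species_3": "+v-++"}
print(identify("++-++", refs))
```

Skipping variable tests keeps a reference species from being penalized for a result it is known to vary on, which is the concern that motivates accounting for test variability.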

19.
A Novel Role of the N Terminus of Budding Yeast Histone H3 Variant Cse4 in Ubiquitin-Mediated Proteolysis     
Wei Chun Au, Anthony R. Dawson, David W. Rawson, Sara B. Taylor, Richard E. Baker, Munira A. Basrai. Genetics, 2013, 194(2): 513–518
Understanding the molecular basis of common traits is a primary challenge of modern genetics. One model holds that rare mutations in many genetic backgrounds may often phenocopy one another, together explaining the prevalence of the resulting trait in the population. For the vast majority of phenotypes, the role of rare variants and the evolutionary forces that underlie them are unknown. In this work, we use a population of Saccharomyces paradoxus yeast as a model system for the study of common trait variation. We observed an unusual flocculation and invasive-growth phenotype in one-third of S. paradoxus strains, which were otherwise unrelated. In crosses with each strain in turn, these morphologies segregated as a recessive Mendelian phenotype, mapping either to IRA1 or to IRA2, yeast homologs of the hypermutable human neurofibromatosis gene NF1. The causal IRA1 and IRA2 haplotypes were of distinct evolutionary origin and, in addition to their morphological effects, were associated with hundreds of stress-resistance and growth traits, both beneficial and disadvantageous, across S. paradoxus. Single-gene molecular genetic analyses confirmed variant IRA1 and IRA2 haplotypes as causal for these growth characteristics, many of which were independent of morphology. Our data make clear that common growth and morphology traits in yeast result from a suite of variants in master regulators, which function as a mutation-driven switch between phenotypic states.

20.
Fast and Sensitive Alignment of Microbial Whole Genome Sequencing Reads to Large Sequence Datasets on a Desktop PC: Application to Metagenomic Datasets and Pathogen Identification     
Lőrinc S. Pongor, Roberto Vera, Balázs Ligeti. PLoS ONE, 2014, 9(7)
Next-generation sequencing (NGS) of metagenomic samples is becoming a standard approach to detect individual species or pathogenic strains of microorganisms. Computer programs used in the NGS community have to balance speed against sensitivity; as a result, species- or strain-level identification is often inaccurate, and low-abundance pathogens can sometimes be missed. We have developed Taxoner, an open-source taxon assignment pipeline that includes a fast aligner (e.g. Bowtie2) and a comprehensive DNA sequence database. We tested the program on simulated datasets as well as experimental data from Illumina, IonTorrent, and Roche 454 sequencing platforms. We found that Taxoner performs as well as, and often better than, BLAST, but requires two orders of magnitude less running time, meaning that it can be run on desktop or laptop computers. Taxoner is slower than approaches that use small marker databases but is more sensitive due to its comprehensive reference database. In addition, it can be easily tuned to specific applications using small tailored databases. When applied to metagenomic datasets, Taxoner can provide a functional summary of the genes mapped as well as strain-level identification. Taxoner is written in C for Linux operating systems. The code and documentation are available for research applications at http://code.google.com/p/taxoner.
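
When a read aligns equally well to references from several taxa, a taxon assignment pipeline has to decide how specific a label to report. One common strategy, shown below, is to assign the lowest common ancestor (LCA) of all best hits; this is offered as a general illustration, not as Taxoner's documented procedure, and the taxonomy is a toy one with most intermediate ranks omitted.

```python
from typing import Dict, Iterable, List

def lineage(taxon: str, parent: Dict[str, str]) -> List[str]:
    """Path from `taxon` up to the root; the root has no entry in `parent`."""
    path = [taxon]
    while taxon in parent:
        taxon = parent[taxon]
        path.append(taxon)
    return path

def lowest_common_ancestor(hits: Iterable[str], parent: Dict[str, str]) -> str:
    """Deepest taxon shared by the lineages of all equally good hits.
    Assumes a non-empty list of hits in a single-rooted taxonomy."""
    lineages = [lineage(t, parent)[::-1] for t in hits]  # root -> leaf order
    lca = lineages[0][0]
    for level in zip(*lineages):
        if any(name != level[0] for name in level):
            break
        lca = level[0]
    return lca

# Toy taxonomy with most intermediate ranks omitted.
parent = {
    "Escherichia coli": "Escherichia", "Escherichia": "Enterobacteriaceae",
    "Salmonella enterica": "Salmonella", "Salmonella": "Enterobacteriaceae",
    "Enterobacteriaceae": "Bacteria",
    "Bacillus cereus": "Bacillus", "Bacillus": "Bacteria",
}
print(lowest_common_ancestor(["Escherichia coli", "Salmonella enterica"], parent))
```

The trade-off is the one the abstract describes: a larger reference database yields more hits per read, which can push the LCA toward a less specific rank, while a small tailored database gives sharper but potentially biased assignments.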
